Site Reliability Engineer

Link Group

2 days ago

Remote

Web Development

About the Role

We are looking for a Senior Site Reliability Engineer who will take end-to-end ownership of reliability for AI-driven applications and pipelines. This is a hands-on engineering role, not a coordination or ticket-driven position. The ideal candidate actively diagnoses and resolves production issues and automates their remediation, rather than only designing solutions.

Requirements

5+ years as an SRE, Production Engineer, or Platform Engineer
Strong incident management & root cause analysis (RCA) experience
Hands-on with: Azure DevOps, Kubernetes, Datadog, Azure, CI/CD
Proactive and self-driven, with an ownership mindset
Experience in production environments
Nice to have: AI/LLM pipelines, Grafana

Responsibilities

Build and maintain monitoring, alerting, and dashboards
Lead incident response & root cause analysis
Ensure reliability and performance of AI pipelines
Standardize telemetry (latency, failures, throughput)
Optimize CI/CD and release quality
Work with engineering teams to reduce recurring incidents
