[Remote] Senior Site Reliability Engineer (SRE) / Platform Reliability Engineer
Note: This is a remote position open to candidates in the USA. A reputed company is seeking a Senior Site Reliability Engineer (SRE) to lead reliability engineering initiatives for large-scale, mission-critical healthcare platforms. The role involves defining reliability KPIs, driving observability strategies, and leading incident response for enterprise platforms.
Responsibilities
- Define and monitor reliability KPIs, SLIs, and SLOs
- Drive observability and monitoring strategies across distributed systems
- Lead incident response, root cause analysis (RCA), and reliability improvements
- Build automation for infrastructure and CI/CD pipelines
- Partner with stakeholders on SLA and service-level management
- Support modernization of enterprise platforms
Skills
- Proven experience implementing SRE frameworks in large enterprise environments
- Strong background supporting distributed systems
- Java, Spring Boot
- Azure, GCP, GKE
- Kubernetes, CI/CD
- SQL
- AppDynamics, Splunk, Grafana
- Experience with Prometheus is a plus
- Healthcare or PBM platform experience
- Platform Engineering or Reliability Engineering background