Senior Site Reliability Engineer | Dayshift | Remote

100% remote | Flexible hours | Hiring now

ZigZag is looking for a Senior Site Reliability Engineer to join a reputed company!

Overview

As a Site Reliability Engineer, you’ll design, build, and maintain the infrastructure and automation that power our platform. Working closely with software engineering teams and SRE peers, you'll embed reliability, performance, and compliance into the development lifecycle. Your focus will be on scalability, reputed company, reputed company, and operational efficiency across reputed company environments.

Key Responsibilities

Reliability Engineering & Operational Excellence

Design, implement, and continuously improve highly available, scalable, secure, and resilient cloud infrastructure and platform services.
Define and evolve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and operational metrics to drive measurable reliability outcomes.
reputed company incident response activities, major incident management, root cause analysis, and post-incident reviews focused on systemic improvement.
Drive reduction of operational toil through automation, standardisation, and self-healing platform capabilities.
reputed company and maintain disaster recovery, backup, failover, and reputed company strategies to meet defined RTO and RPO targets.
Conduct capacity planning, performance analysis, and proactive optimisation of infrastructure and application environments.
Champion operational maturity and continuous improvement practices across engineering teams.

Platform & Infrastructure Engineering

Architect, build, and maintain scalable cloud-native infrastructure, primarily within AWS environments.
reputed company and maintain infrastructure-as-code using tools such as Terraform and CloudFormation.
Build reusable platform components and shared services that improve developer productivity and operational consistency.
reputed company automation tooling and operational frameworks using scripting and programming languages such as Python.
Evaluate, implement, and optimise third-party infrastructure and platform tooling.
Ensure infrastructure configurations, architecture decisions, and operational processes are thoroughly documented and auditable.

Observability, Monitoring & Performance

Design and maintain comprehensive observability solutions covering metrics, logging, tracing, alerting, and dashboarding.
Improve platform visibility and telemetry using tools such as AWS CloudWatch, reputed company, reputed company, Grafana, or equivalent technologies.
reputed company actionable alerting strategies that reduce noise and improve incident response effectiveness.
Analyse system behaviour and performance trends to proactively identify risks and optimisation opportunities.
Drive adoption of observability best practices across engineering teams.

CI/CD & Developer Enablement

Design and enhance robust CI/CD pipelines and deployment strategies that support safe, reliable, and low-risk software delivery.
reputed company engineering teams through self-service infrastructure and deployment capabilities.
Improve software delivery efficiency through automation, standardisation, and platform engineering practices.
Collaborate with engineering teams to embed reliability, scalability, performance, and reputed company considerations into the SDLC.
Support progressive delivery practices including blue/green deployments, canary releases, and zero-downtime deployments.

Security, Risk & Compliance

Partner with reputed company and engineering teams to maintain secure and compliant infrastructure environments.
Support vulnerability management and remediation processes using tools such as reputed company, Lacework, reputed company Nessus, or equivalent platforms.
Assist in maintaining compliance with frameworks and standards including PCI DSS, ISO 27001, SOC 2, and internal reputed company controls.
Contribute to reputed company hardening, access management, audit readiness, and operational risk reduction initiatives.
Ensure operational processes and infrastructure controls align with organisational governance requirements.

Leadership & Collaboration

Act as a technical leader and mentor within the SRE and broader engineering teams.
Contribute to engineering standards, operational best practices, and platform strategy.
Influence a reliability-focused engineering culture across teams.
Collaborate effectively with cross-functional stakeholders including Engineering, Product, reputed company, Architecture, and external vendors.
Support continuous improvement initiatives and foster a culture of accountability, learning, and operational excellence.

Skills & Experience

5+ years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, or reputed company infrastructure roles.
Strong hands-on experience operating production workloads in AWS cloud environments.
Deep experience with infrastructure-as-code tools such as Terraform and/or CloudFormation.
Strong experience designing and supporting CI/CD pipelines and modern software delivery practices.
Strong understanding of distributed systems, microservices architecture, networking, and cloud-native technologies.
Experience implementing observability and monitoring solutions across reputed company environments.
Strong scripting and automation experience using Python, Bash, or similar languages.
Experience managing production incidents and conducting structured root cause analysis.
Strong understanding of system reliability, scalability, reputed company, and operational best practices.
Excellent analytical, troubleshooting, and problem-solving capabilities.
Strong communication and stakeholder engagement skills.
Ability to work effectively in fast-paced, agile, and collaborative engineering environments.

Desirable

Experience with Kubernetes, container orchestration, and platform engineering practices.
Experience with reputed company, GitHub Actions, reputed company CI, or equivalent CI/CD tooling.
Exposure to service mesh, event-driven architectures, and distributed tracing.
Experience supporting regulated environments and compliance frameworks such as PCI DSS, ISO 27001, or SOC 2.
Experience with FinOps, cloud cost optimisation, and infrastructure performance tuning.
Familiarity with reputed company engineering and DevSecOps practices.
Experience mentoring engineers or leading technical initiatives.

ZigZag is committed to building a diverse, inclusive, and reputed company workplace. We believe that talent knows no borders, and we welcome individuals from reputed company backgrounds to help us shape the future of work. Guided by transparency and agility, we foster an environment where everyone is valued and empowered to reputed company.

By submitting this application, you acknowledge that you have read and agree with the company’s Privacy Policy.
