Site Reliability Engineer | Dayshift | Remote
ZigZag is looking for a Site Reliability Engineer to join a reputed company. As a Site Reliability Engineer, you'll design, build, and maintain the infrastructure and automation that power our platform. Working closely with software engineering teams and SRE peers, you'll embed reliability, performance, and compliance into the development lifecycle. Your focus will be on scalability and operational efficiency across our environments.
Key Responsibilities

Infrastructure and Platform Engineering
- Design, build, and maintain scalable and reliable infrastructure and platform services.
- Develop and maintain infrastructure-as-code (e.g., CloudFormation, Terraform).
- Build custom automation workflows and internal tools to support infrastructure provisioning, monitoring, and incident response (e.g., Python with libraries such as boto3 for AWS automation).
- Liaise with vendors to assess and implement third-party solutions.
- Maintain well-documented system configurations to support maintainability and compliance.

Reliability and Operations
- Monitor system performance and availability using observability tools (e.g., Sumo Logic, AWS CloudWatch).
- Create and maintain dashboards and monitoring solutions that offer deep insight into platform health and support rapid incident diagnosis.
- Automate operational processes (e.g., deployments, failovers, scaling) to reduce toil.
- Participate in incident response activities, including postmortems and root cause analysis, to drive continual improvement.
- Continuously evolve and maintain SLOs and SLIs, balancing development velocity against system reliability.
- Work as part of a highly engaged team of SREs to ensure the stability, performance, cost-effectiveness, and observability of our environments.

Build, Deploy, and Development Enablement
- Design and implement robust CI/CD pipelines and zero-downtime deployment strategies.
- Build efficient and reliable build systems that give development teams self-service deployment capabilities.
- Collaborate with engineering teams to embed reliability, scalability, and performance best practices into the SDLC.

Security and Compliance
- Maintain and monitor vulnerability scanning systems (e.g., Nessus, Lacework), working closely with Software Engineering teams to ensure the platform remains secure and up to date.
- Carry out recurring tasks such as reporting, maintaining registers, and ensuring compliance with internal standards.
- Support the organisation in maintaining PCI-DSS certification by ensuring infrastructure is securely configured and well-documented.

Skills & Experience

Essential
- 2+ years of experience in an SRE role or similar (e.g., DevOps Engineer).
- Experience managing an AWS environment and working in a SaaS business.
- Strong knowledge and experience of infrastructure-as-code.
- Experience building and supporting robust CI/CD pipelines.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, agile environment.

Desirable
- Experience with distributed systems and microservice architecture.
- Exposure to compliance frameworks (PCI-DSS, ISO 27001).

ZigZag is committed to building a diverse and inclusive workplace. We believe that talent knows no borders, and we welcome individuals from all backgrounds to help us shape the future of work. Guided by transparency and agility, we foster an environment where everyone is valued and empowered.

By submitting this application, you acknowledge that you have read and agree with the company's Privacy Policy.