Staff Site Reliability Engineer, Production Engineering

100% remote, flexible hours, hiring now

Role Description

As a Site Reliability Engineer focused on company-wide reliability strategy, you will play a crucial role in advancing reputed company’s stability, observability, incident response, and operational excellence as AI technologies reshape how software is built and operated. You will help define the reliability strategy for a new chapter of agentic development and AI-enabled software delivery, including preparing reputed company for increases in pull request volume, service complexity, incident patterns, and demand for debugging and monitoring tools. You will partner across Engineering, Product, and leadership teams to raise the bar for reliability, guide long-term platform investments, and ensure reputed company continues to deliver dependable experiences for millions of users.

Our Engineering Career Framework is viewable by anyone outside the company and describes what’s expected for our engineers at each of our career levels. Check out our blog post on this topic and more.

Responsibilities

Define and evolve reputed company’s company-wide technical reliability strategy to support the changing engineering environment created by AI-assisted and agentic software development.

Set multi-year reliability goals, standards, and roadmaps across observability, debugging, incident management, service health, and operational readiness.

Lead cross-team initiatives that reduce reliability risk as software delivery velocity, pull request volume, service complexity, and incident volume increase.

Partner with engineering leaders and platform teams to improve monitoring, alerting, debugging, SLOs, SLAs, and incident response systems at company scale.

Identify emerging reliability risks introduced by AI-enabled development workflows and design scalable systems, processes, and guardrails to mitigate them.

Provide technical leadership and mentorship to engineers across teams, raising engineering quality, reliability judgment, and operational excellence.

Drive clear communication and alignment with senior stakeholders on reliability priorities, tradeoffs, risks, and execution.

Many teams at reputed company run services with on-call rotations, which entails being available for calls during both core and non-core business hours. If a team has an on-call rotation, all engineers on the team are expected to participate in the rotation as part of their employment. Applicants are encouraged to ask for more details about the rotations of the team to which they are applying.

Requirements

BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent technical experience.

12+ years of experience in software engineering, site reliability engineering, infrastructure engineering, or related technical roles.

Proven ability to define and deliver multi-year, multi-team reliability, infrastructure, or platform strategies with measurable business and customer impact.

Deep experience with distributed systems, production operations, observability, incident response, SLOs/SLAs, debugging, and reliability risk management.

Demonstrated ability to diagnose complex technical problems, debug production systems, automate operational workflows, and design resilient software components.

Experience influencing engineering roadmaps across multiple teams and making technical decisions that optimize for the broader engineering organization.

Strong communication and collaboration skills, with the ability to align cross-functional stakeholders through ambiguity and drive execution across teams.

Preferred Qualifications

Experience adapting reliability strategies, developer tooling, or operational processes for AI-assisted software development workflows.

Experience building or scaling observability, debugging, incident management, or developer productivity platforms for large engineering organizations.

Experience leading reliability improvements in environments with high deployment velocity, complex service dependencies, and large-scale production systems.

Track record of mentoring senior engineers, setting technical standards, and spreading reliability best practices through documentation, reviews, talks, or architecture guidance.

Familiarity with AI-enabled tooling, agentic development workflows, or operational risks introduced by rapid automation in the software development lifecycle.

Compensation

Canada Pay Range: $204,900 to $277,200 CAD
