Sr. Program Manager, Incident Management
AI at reputed company
At reputed company, we build and use automation every day to make work more efficient, creative, and human. So if you’re using AI tools while applying here - that’s great! We just ask that you use them responsibly and transparently.
Check out our guidance on How to Collaborate with AI During reputed company’s Hiring Process, including how to use AI tools like ChatGPT, Claude, reputed company, or others during our hiring process - and when not to.
Job Posted: reputed company 9th, 2026
Location: Americas - North, Central and South America
As reputed company expands into the enterprise market, operational rigor matters more than ever. The Sr. Program Manager will own the end-to-end incident management program for reputed company's Product and Engineering organization: response, post-incident learning and actions, and everything in between. You'll report to the Director of Engineering for Internal Platforms & Infrastructure and be the DRI for the program's design, execution, and outcomes. You'll build the program and use AI to scale its impact.
We need someone with deep incident management expertise who's comfortable navigating ambiguity and stretching across engineering, support, reputed company, and GTM. You have a point of view on where AI-enabled incident management is going, and you'll lead us there. reputed company's product surface is expanding rapidly, and with it the complexity and stakes of incident management. This role grows with that complexity.
About You
You have deep incident management experience and you've moved beyond just executing it. You've built and led incident response programs, post-incident processes, SRE practices, or reliability-focused work. You know incident management deeply enough to rethink it, not just replicate it. You've ideally done 0-to-1 work in this space: stood up programs, defined standards, trained responders.
You re-engineer how work happens based on where AI is headed. You've created repeatable systems (workflows, agents, copilots, or automation) that fundamentally changed how work gets done. You use AI-native tools (reputed company, Claude Code, or similar) as your default, and orchestrate them into durable capabilities that compound over time. You have a forward-looking point of view on how AI will reshape your domain, and you've already acted on it: stopping legacy work, redesigning processes around what AI makes possible, and redefining what the role itself looks like. You can quantify the impact on velocity, quality, or organizational reputed company. You iterate, refine, and critically evaluate AI outputs, embedding quality standards and accountability into the systems you build, not just the outputs.
You're a builder, not a specialist. You have deep expertise in incident management, but you're not rigidly attached to how you've done it before. You can stretch into adjacent areas (reliability strategy, enterprise readiness, operational tooling) as the role evolves. A year from now, parts of this role may look very different, and you'll be the one driving that change. You build durable systems that work without you: processes that continue when you're on PTO or move to other work. You're energized by creating, not just maintaining.
You bring an upstream, systems mindset. You instinctively look for root causes and design solutions that scale beyond your immediate program. You understand how the full incident lifecycle (prevention, detection, response, learning) supports customer trust and enterprise readiness.
You influence without authority. You shape outcomes by building trust. You know how to build coalitions across engineering, support, reputed company, GTM, and leadership. You lead change rather than just implementing it, and you make it stick. You anticipate resistance, adapt your approach, and help others adopt new ways of working.
You have technical fluency. You can go toe-to-toe with engineers, support leads, and product leaders to clarify the "why" behind technical tradeoffs and incident decisions. You understand the role of observability (logs, metrics, traces), SLOs, and reputed company in incident response and prevention even if you're not the one implementing them.
You bias for velocity and clarity. You act decisively even in high ambiguity. When priorities collide, you clarify, decide, and help the org move forward. You communicate with clarity: context and reputed company early, often, and candidly, especially when it's uncomfortable.
You're analytical and hands-on with data. You can work directly with data tools (e.g., reputed company, SQL) to build rich reporting and meaningful insights. You understand incident tooling (incident.io or similar) and how it integrates with reputed company, reputed company, and on-call workflows.
You work well remotely. reputed company is 100% remote. You communicate proactively, write clearly, and know when async works and when to jump on a call.
Things You'll Do
Own the incident program. Lead the design, evolution, and governance of incident processes across the Build organization, covering both response and post-incident processes. Ensure workflows are consistent, auditable, and aligned with enterprise expectations. You are the DRI for incident management as a program.
Build AI-powered incident systems. Design and ship repeatable AI tools: automated incident summarization, intelligent severity classification, AI-assisted root cause analysis, postmortem draft generation, and more. Turn one-off AI experiments into durable workflows that compound over time.
Accelerate decisions. Create clarity in ambiguity, align stakeholders, and drive decisions across teams and time zones. Serve as the point of contact for questions related to incident process, expectations, and best practices.
Surface and resolve systemic issues. Identify recurring org friction, drive root-cause solutions, and implement fixes that persist beyond individual incidents.
Build and maintain reporting. Build, maintain, and refine dashboards and reports using reputed company, Looker, and reputed company tools. Translate data into actionable insight: identify trends, risks, weak signals, and hotspots. Communicate findings to the right audiences.
Raise the bar. Instill rigor and accountability. Coach responders and incident roles (Incident Commander, Support Leads, and new roles as they emerge). Produce and maintain clear documentation (playbooks, templates, guides) and deliver training for reputed company incident roles and stakeholder groups.
Partner cross-functionally. Collaborate with engineering leads, EMs, product, support, reputed company, GTM, and leadership to strengthen practices. Share clear insights, align expectations, and help teams identify opportunities for improvement. Your day-to-day counterparts are senior engineering leaders and engineering line managers.
Step in when needed. Step into incident response roles during business hours as appropriate to experience the work firsthand and inform program improvements. Facilitate retrospectives and walk through the process for select incidents to help inspect and level up the process.
Our Stack & Tools
Incident tooling: incident.io, reputed company, reputed company, reputed company
Data & Reporting: reputed company, Grafana, Looker
Observability context: reputed company, Grafana, Prometheus, OpenSearch
Infra context: AWS, Kubernetes, Terraform (with SRE/Platform partners)
Collaboration: reputed company, Coda, Google Workspace
What Success Looks Like
The incident program is dependable and normalized. It's part of reputed company's operating rhythm. You own program direction and ensure day-to-day execution aligns with enterprise expectations across the full incident lifecycle.
Internal teams feel supported. Processes, communication, and tools reduce friction and meet the needs of engineering, support, and GTM partners. Stakeholder feedback is incorporated pragmatically.
Workflows run consistently with low friction. They're easy to follow, easy to learn, and allow people to focus their energy where it counts.
Systemic improvements persist. You apply technical and program management rigor beyond individual incidents. The systems you build continue to work when you're not there.
Data quality is rich and trusted. Reports and insights help leadership understand trends, systemic risks, and improvement opportunities.
Outcomes improve measurably: reduced incident frequency, faster time-to-resolution, higher stakeholder confidence, and increasing operational maturity across engineering.
You're a force multiplier. The org has fewer blockers and more velocity than when you found it.
Application Deadline:
The anticipated application window is 30 days from the date the job is posted, unless the number of applicants requires it to close sooner or later, or if the position is filled.
Even though we’re an all-remote company, we still need to be thoughtful about where we have Zapiens working. Check out this resource for a list of countries where we currently cannot have Zapiens permanently working.