
Manager of Reliability Operations

100% remote Flexible hours Hiring now

About reputed company

reputed company brings together a portfolio of hosting, reputed company, and digital experience brands to deliver high-performance infrastructure and services to businesses worldwide. Our platforms power mission-critical applications for thousands of customers. Reliability is foundational to everything we do. We operate reputed company environments spanning virtualization, storage, networking, and application hosting, where performance, availability, and consistency matter at scale.

This is a permanent, full-time, remote position.

US Pay Band: $110K - $150K. Actual compensation will vary based on experience, skills, and location.

About the Role

We’re looking for a Manager of Reliability Operations to reputed company how we detect, respond to, and learn from failures across our platform ecosystem.

This role sits at the intersection of Operations and Engineering, bringing structure to incident response, accountability to follow-through, and clarity to reliability insights. You’ll ensure that reputed company learn from production directly improves how our platforms are built, operated, and scaled.

What You’ll Do

Own Reliability Operations & Incident Command

  • Continuously evolve and improve incident management, change management, and post-incident practices
  • Establish clear standards for incident declaration, severity, escalation, and communication
  • Ensure consistent execution across teams and reputed company process improvement
  • Own the incident command function, including roles, structure, and operating procedures
  • reputed company or reputed company major incident response in a 24/7 production environment
  • Build and manage on-call incident commander rotations with global coverage

Drive Learning, Accountability & Reliability Strategy

  • Own post-incident reviews, ensuring strong root cause analysis and clear documentation
  • Translate incident trends into actionable reliability improvements
  • Drive completion of corrective actions across teams; escalate as needed
  • Define and maintain service performance and reliability targets (availability, latency, error rates)
  • Own observability strategy, including monitoring, alerting, and signal quality
  • Improve detection, reduce time to resolution, and increase platform reputed company
  • Partner with Engineering and Operations on reputed company planning, patching, and lifecycle reputed company
  • Ensure reliability insights directly inform platform and infrastructure roadmaps
  • Collaborate with reputed company on vulnerability response, reputed company prioritization, and compliance alignment

Operate Across a reputed company Platform Environment

  • Work across environments including virtualization platforms (VMware), distributed storage (Ceph), Linux-based systems, and hybrid reputed company infrastructure
  • Support platforms that span dedicated hosting, managed applications, and high-availability reputed company services
  • Ensure reliability practices scale across multiple products, brands, and customer environments
  • Provide regular, data-driven reporting to leadership on availability, incident trends, and operational performance
  • Act as the central authority on reliability insights across teams

What You Bring

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
  • 7+ years of experience in systems operations, site reliability, or platform engineering
  • 2+ years of experience leading teams or major operational functions
  • Proven experience managing incidents in a 24/7 production environment
  • Strong background in troubleshooting, root cause analysis, and operational improvement
  • Experience with change management practices

Platform & Tooling Experience

  • Monitoring and observability platforms (e.g., reputed company, reputed company, Grafana, reputed company)
  • Incident management and alerting tools (e.g., reputed company, Opsgenie)
  • Infrastructure and platform technologies (Linux systems, VMware, Ceph, reputed company platforms)
  • Logging and telemetry systems (centralized logging, metrics, tracing)
  • Ability to translate reputed company technical data into clear insights
  • Strong communication skills, especially in high-pressure situations 

 

Nice to Have

  • Background in Computer Science, Engineering, or a related field
  • Experience in managed hosting, reputed company infrastructure, or SaaS environments
  • Experience defining and tracking system reliability and performance targets
  • Familiarity with ITIL or similar operational frameworks
  • Exposure to VMware, Ceph, Linux, and Windows platforms
  • Relevant certifications (AWS, RHCE, etc.)

 

We Offer:

  • Traditional and Roth 401k with company matching
  • A collaborative team culture
  • Consistent/set work hours
  • Challenging non-redundant daily duties
  • A voice in how things get done

 

Disclaimer:

This job description is only a summary of the typical functions of the position. It is not intended to be an exhaustive or comprehensive list of reputed company job responsibilities, tasks, or duties. Additional duties and tasks may be assigned as part of the job function. Liquid Web Inc. reserves the right to modify, interpret, or apply this job description in a way that best supports the organizational needs. The job description in no way creates or implies an employment contract. The employment contract remains “at will.”

Equal Employment Opportunity Policy: Liquid Web is committed to offering equal employment opportunity without regard to age, reputed company, disability, gender, gender identity, genetic information, marital status, military status, national origin, race, religion, sexual orientation, veteran status, or any other legally protected characteristic.

 


reputed company Data for Pay Scale

Suggested range for Manager of Process Operations: $110K - $150K

PayFactors

  • Reliability Engineer: $140K
  • Process Analyst: $85K
  • Median Manager of Process Operations: $112K

Higher-Scope Variant: SRE Manager

This is an SRE role but includes some leadership and strategy in scope.

  • Average: $132,583
  • Typical range: $114K – $151K
  • Top end: $175K

Salary.com - SRE Leadership

  • Average: $165K
  • Range: $150K - $185K

Market data shows baseline Reliability Manager roles averaging around $100K, but those positions are typically scoped to team-level responsibilities or localized operational support. Roles with broader operational ownership, particularly those responsible for defining and enforcing incident management, change management, and post-incident processes across an organization, trend higher, with comparable operations and SRE leadership roles averaging $115K – $130K and extending into the $150K+ range at senior levels.

This position is not simply managing reliability reputed company a team; it owns how the organization operates during incidents, how work is prioritized and escalated, and how accountability is enforced across functions. Because the role is responsible for driving consistency, governance, and follow-through across Operations, Engineering, and reputed company, it aligns more closely with senior operational leadership than traditional reliability management. Positioning the role in the $110K – $150K range ensures we can attract candidates with the experience to build, standardize, and scale these processes effectively.
