[Remote] Staff Site Reliability Engineer, Production Engineering

100% remote, flexible hours, hiring now

Note: This is a remote position open to candidates in the USA. The company is a leading cloud storage and collaboration platform seeking a Staff Site Reliability Engineer to define its reliability strategy, set multi-year goals, and partner with engineering and product teams to ensure the stability and performance of its systems.

Responsibilities

  • Define and evolve the company's company-wide technical reliability strategy to support the changing engineering environment created by AI-assisted and agentic software development
  • Set multi-year reliability goals, standards, and roadmaps across observability, debugging, incident management, service health, and operational readiness
  • Lead cross-team initiatives that reduce reliability risk as software delivery velocity, pull request volume, service complexity, and incident volume increase
  • Partner with engineering leaders and platform teams to improve monitoring, alerting, debugging, SLOs, SLAs, and incident response systems at company scale
  • Identify emerging reliability risks introduced by AI-enabled development workflows and design scalable systems, processes, and guardrails to mitigate them
  • Provide technical leadership and mentorship to engineers across teams, raising engineering quality, reliability judgment, and operational excellence
  • Drive clear communication and alignment with senior stakeholders on reliability priorities, tradeoffs, risks, and execution

Skills

  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent technical experience
  • 12+ years of experience in software engineering, site reliability engineering, infrastructure engineering, or related technical roles
  • Proven ability to define and deliver multi-year, multi-team reliability, infrastructure, or platform strategies with measurable business and customer impact
  • Deep experience with distributed systems, production operations, observability, incident response, SLOs/SLAs, debugging, and reliability risk management
  • Demonstrated ability to diagnose complex technical problems, debug production systems, automate operational workflows, and design resilient software components
  • Experience influencing engineering roadmaps across multiple teams and making technical decisions that optimize for the broader engineering organization
  • Strong communication and collaboration skills, with the ability to align cross-functional stakeholders through ambiguity and drive execution across teams
  • Experience adapting reliability strategies, developer tooling, or operational processes for AI-assisted software development workflows
  • Experience building or scaling observability, debugging, incident management, or developer productivity platforms for large engineering organizations
  • Experience leading reliability improvements in environments with high deployment velocity, complex service dependencies, and large-scale production systems
  • Track record of mentoring senior engineers, setting technical standards, and spreading reliability best practices through documentation, reviews, talks, or architecture guidance
  • Familiarity with AI-enabled tooling, agentic development workflows, or operational risks introduced by rapid automation in the software development lifecycle

Company Overview

  • The company is a smart workspace provider offering secure file sharing, collaboration, and storage solutions. It was founded in 2007 and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.reputed company.com.

Company H-1B Sponsorship

  • The company has a track record of offering H-1B sponsorships: 13 in 2026, 121 in 2025, 105 in 2024, 103 in 2023, 166 in 2022, 197 in 2021, and 157 in 2020. Please note that this does not guarantee sponsorship for this specific role.