Remote SRE Jobs – Senior Site Reliability Engineer (Remote) – $130k‑$170k USD – Full‑Time – Escondido, California – Cloud/DevOps, Kubernetes, Terraform, Prometheus

Who we are

We are a mid-stage SaaS company that grew from a garage prototype into a platform serving more than 200 enterprise customers worldwide. Our flagship product, an API-driven data pipeline, processes about 15 TB of events per day, and we guarantee customers 99.9% uptime. The engineering culture is built on blunt feedback, data-driven post-mortems, and a focus on reliability. While the code lives in the cloud, our operational decisions are made by a small, tight-knit crew spread across the globe.

Why this role exists now

In the last 12 months we expanded into three new cloud regions (AWS us-east-1, a second AWS US region, and GCP europe-west1) to shave latency for European clients. That expansion pushed our monthly alert volume from about 2,800 to about 5,200, and our MTTR climbed from 12 minutes to 18 minutes because the on-call rotation was stretched thin. The leadership team decided it was time to double down on site reliability: we need a senior engineer who can own the reliability roadmap, coach the junior members, and reduce alert fatigue.

Where you’ll sit (virtually)

Although the job is remote, we have a legal entity in Escondido, California that handles payroll, benefits, and compliance. You’ll be part of a “virtual office” that meets daily in the #sre-hub chat channel, holds a weekly video-call huddle, and gathers for a quarterly in-person meetup hosted in Escondido, California when travel permits. Being anchored to Escondido, California helps us stay compliant with local tax regulations and gives you a community of other remote professionals who live in the same time zone.

The team you’ll join

- Size & composition: 12 engineers total (5 senior SREs, 4 junior reliability engineers, 2 platform developers, and 1 manager).
- Metrics: 99.92% uptime over the past quarter, 5,200 alerts processed per month, 18-minute average MTTR, 0.2% alert fatigue (defined as more than 3 alerts per incident).
- SLA commitments: 99.9% availability for customer-facing APIs, 99.7% for internal data-processing pipelines.
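To make the availability targets above concrete, the short Python sketch below converts a percentage into a downtime budget. The 30-day month and 90-day quarter are assumptions for illustration (the posting does not state a measurement window), and the function name `downtime_budget_minutes` is hypothetical.

```python
def downtime_budget_minutes(availability_pct: float, window_days: int) -> float:
    """Return the minutes of downtime allowed by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    # Round to two decimals to hide floating-point noise.
    return round(window_minutes * (100 - availability_pct) / 100, 2)


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 90-day quarter.
    for label, target in [("customer-facing APIs", 99.9), ("internal pipelines", 99.7)]:
        print(f"{label}: {downtime_budget_minutes(target, 30)} min per 30 days")
    # Downtime implied by the 99.92% uptime reported for the past quarter.
    print(f"last quarter: {downtime_budget_minutes(99.92, 90)} min of downtime")
```

Under these assumptions, a 99.9% target allows about 43.2 minutes of downtime per 30 days, and a 99.7% target about 129.6 minutes.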

What you’ll do day‑to‑day

1. Own reliability initiatives – Define and ship SLOs for new services, write error-budget policies, and track them in Grafana dashboards.
2. Incident ownership – Lead the response during high-severity incidents, drive the post-mortem narrative, and ensure actionable remediation items are filed in JIRA within 24 hours.
3. Automation & tooling – Write Terraform modules to provision Kubernetes clusters, build Helm charts for microservices, and turn manual run-books into reproducible Ansible playbooks.
4. Capacity planning – Run quarterly load tests using Locust, model growth with Python scripts, and present forecasts to product leadership.
5. Mentorship – Pair up with junior SREs for “bug-hunting” sessions, run monthly reliability workshops, and contribute to our internal “SRE Playbook”.
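As a rough illustration of the growth modelling mentioned under capacity planning, the sketch below projects daily event volume with compound monthly growth. The 15 TB/day starting point comes from this posting; the 5% monthly growth rate is a made-up placeholder, and `project_daily_volume_tb` is a hypothetical name, not a description of the team's actual scripts.

```python
def project_daily_volume_tb(current_tb: float, monthly_growth: float, months: int) -> list[float]:
    """Project daily volume (TB) for each month, assuming compound growth."""
    if months < 0:
        raise ValueError("months must be non-negative")
    # Index 0 is today's volume; index m is the projection after m months.
    return [current_tb * (1 + monthly_growth) ** m for m in range(months + 1)]


if __name__ == "__main__":
    # 15 TB/day is from the posting; 5% monthly growth is purely hypothetical.
    forecast = project_daily_volume_tb(15.0, 0.05, 12)
    for month, volume in enumerate(forecast):
        print(f"month {month:2d}: {volume:6.1f} TB/day")
```

A real forecast would fit the growth rate to historical data and validate it against load-test results rather than assume a constant rate.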

Who we think will thrive

- 5+ years of production-grade experience with Linux/Unix, networking, and cloud infrastructure (AWS or GCP).
- Deep familiarity with monitoring stacks: Prometheus, Grafana, Alertmanager, and log aggregation such as Splunk or ELK.
- Infrastructure-as-Code: Terraform ≥ 0.13, Helm ≥ 3, and Ansible.
- Container orchestration: Running production workloads on Kubernetes (experience with EKS or GKE).
- Programming: Comfortable writing Python or Go for automation; Bash scripting is a given.
- Incident response: You can stay calm under pressure, triage noisy alerts, and maintain a clear incident timeline.
- Communication: Able to explain complex reliability concepts to product managers and non-technical stakeholders in plain language.

Tools & tech stack (the ones we actually use)

- Cloud – AWS (EC2, RDS, S3, and others) and GCP (Compute Engine, Cloud SQL, Pub/Sub).
- Container – Docker ≥ 20, Kubernetes ≥ 1.24, Helm ≥ 3.5.
- IaC – Terraform ≥ 1.0, Ansible ≥ 2.9.
- CI/CD – GitHub Actions, Jenkins, and one other tool for legacy pipelines.
- Monitoring – Prometheus, Grafana, Alertmanager, and one other tool for some legacy services.
- Logging – Splunk, Elasticsearch-Kibana stack, Loki.
- Incident response – Opsgenie and one other paging tool (we’re consolidating onto a single tool).
- Version control – Git hosting with private repos and branch protection rules.
- Collaboration – Chat (primary communication channel), Confluence (knowledge base), JIRA (ticketing).

On‑call rhythm & expectations

Our on-call schedule is a 7-day rotation with a 48-hour backup window. Each engineer handles roughly 350 alerts per month, averaging about 2 incidents per week. We have a “no-call-out-of-hours” policy for holidays: the next engineer in the rotation covers the entire period, and the team shares the load. During an incident you’ll have a clear run-book, but we also encourage “play-by-play” documentation in chat to help the rest of the crew follow along.

Compensation & benefits (the numbers, no fluff)

- Base salary: $130,000 – $170,000 USD, depending on experience and location.
- Annual bonus: Up to 15% of base salary, tied to reliability KPIs (SLO compliance, MTTR improvement).
- Equity: 0.05% – 0.15% RSU pool, vested over four years with a one-year cliff.
- Health: Full medical, dental, and vision for employee + 1 dependent, including telehealth.
- Retirement: 401(k) match up to 5% of salary.
- Time off: 20 days PTO + 10 company holidays, plus a “recovery week” after each major incident.
- Learning budget: $2,500 per year for courses, conferences, or certifications (we’ll reimburse even for remote-only events).
- Equipment: Choice of MacBook Pro or Linux workstation, dual-monitor setup shipped to your home office, and a $150 monthly stipend for internet.

Why you’ll love working with us

- Impact-first: Your work directly influences the experience of thousands of end users; a single reliability improvement can translate to millions of dollars saved for a client.
- Autonomy: We give you ownership of the reliability roadmap. You decide where to invest engineering effort, not a product manager.
- Culture of candor: Post-mortems are blameless, data-driven chronicles that we read aloud in our weekly “Reliability Round-up”. Everyone’s voice is heard, from junior engineers to the CTO.
- Remote-first: While we are legally anchored in Escondido, California, you can work from anywhere in the United States. Our “remote-first” policy means we never require you to be in a physical office, except for the optional quarterly meetup in Escondido, California.

A human moment

> “I still remember the night we were down for 12 minutes because a misconfigured Prometheus scrape configuration blew up the entire cluster. We all gathered on a conference call, one teammate in his kitchen, another on a balcony in Escondido, California. Within 5 minutes we had a rollback plan, and by the time the sunrise hit the roof of the office building in Escondido, California, the service was back up. It reminded me why we do this work: every alert is a chance to protect a real user’s workflow.” – *Alex, Senior SRE*

Application process

1. Resume & cover letter: Send us a brief note (no longer than one page) explaining a reliability challenge you solved and why you’re drawn to remote work anchored in Escondido, California.
2. Screening call (30 min): With the hiring manager to discuss your background, expectations, and the role’s day-to-day.
3. Technical deep-dive (1 hr): Live problem-solving session covering incident response, Terraform debugging, and a short coding exercise in Python.
4. Team interview (45 min): Meet two senior SREs for a cultural-fit conversation and a walk-through of a recent post-mortem.
5. Final interview (30 min): With the VP of Engineering to discuss career growth, leadership philosophy, and remote-first policies.

If you pass all steps, we’ll extend an offer within a week and kick off the onboarding process, which includes a “welcome kit” shipped to your home, a first-day meeting with your mentor, and a 2-week “shadow” period where you sit on the on-call rotation with a senior partner.

Closing note

We are not looking for a résumé-checker; we want someone who feels a genuine pull toward making systems resilient, who enjoys digging into Prometheus queries at 2 am, and who values transparent communication as much as technical depth. If you see yourself improving our MTTR, lowering alert fatigue, and shaping a reliability culture that scales as fast as our product, we’d love to hear from you.

*Ready to join a team that treats reliability as a craft, not a checkbox? Apply today and let’s build something that stays up when the world needs it most.*
