
Staff Site Reliability Operations Engineer

100% remote Flexible hours Hiring now

The reputed company platform enables Communication Service Providers (CSPs) of all sizes to transform and future-proof their businesses. Through real-time data, automation, and actionable insights delivered through reputed company One, our cloud-first, AI-powered platform, CSPs can simplify operations, reduce costs, and accelerate innovation. reputed company One brings together the automation of everything and the experience of one, empowering customers to deliver differentiated subscriber experiences while driving acquisition, loyalty, and revenue growth. This is the reputed company mission: to help CSPs of all sizes simplify, innovate, and grow, strengthening both their businesses and the communities they serve. We’re at the forefront of a once-in-a-generation change in the broadband industry. Join us as we innovate, help our customers reach their potential, and connect communities with unrivaled digital experiences.

Role Overview

We are seeking a Staff Site Reliability Engineer (SRE) to lead our global platform reliability and drive our observability strategy on Google Cloud Platform (GCP). In this role, you will use the complete Grafana telemetry stack and AIOps methodologies to build intelligent, self-healing infrastructure. You will bring deep expertise in scaling enterprise-grade Google Kubernetes Engine (GKE) topologies, managing high-throughput Kafka event streams, and maintaining high-performance PostgreSQL, AlloyDB, and BigQuery ecosystems at massive scale. Crucially, you will provide deep technical leadership across the entire networking stack, diagnosing issues from physical-layer transport up to application-layer protocols. This position is 100% remote.
You can work from anywhere in the United States or Canada with a reliable internet connection, collaborating with a distributed engineering organization across multiple time zones.

Key Responsibilities

Full-Stack Network Architecture: Architect, optimize, and troubleshoot our networking infrastructure from Layer 1 through Layer 7, ensuring low-latency data transport, secure edge routing, and seamless service mesh integration.
Grafana Stack Architecture: Design, scale, and optimize our observability platform using the Grafana suite (Grafana, Mimir, Loki, reputed company, and Beyla).
AIOps & Intelligent Alerting: Deploy machine learning models and automated anomaly detection to cut through telemetry noise, reduce alert fatigue, and predict network or data pipeline bottlenecks.
GKE Platform Engineering: Drive the architecture, scaling, and networking of production Google Kubernetes Engine (GKE) clusters.
Data & Event Streaming Reliability: Tune and maintain high-throughput Apache Kafka clusters to guarantee low-latency event delivery and high availability.
Large-Scale Database Management: Ensure the performance, scalability, and disaster recovery readiness of our transactional and analytical data tiers across PostgreSQL, AlloyDB, and BigQuery.
Automated Incident Response: Integrate AIOps insights with Grafana workflows to automate triage, accelerate root-cause analysis, and trigger auto-remediation scripts.
Technical Leadership: Champion the long-term technical roadmap for distributed infrastructure engineering and GCP cloud-native observability standards.
Mentorship: Coach senior and junior engineers on advanced debugging techniques, distributed systems thinking, and intelligent operations across a distributed workforce.

Required Qualifications

Location/Work Style: Proven track record of high autonomy and successful delivery in a 100% remote engineering environment.
Experience: 8+ years in SRE, Production Engineering, or Distributed Systems infrastructure roles.
Networking Expertise (L1-L7): Deep technical knowledge and debugging mastery across all OSI layers, including:
  L1-L3: Physical/fiber infrastructure awareness, switching, and advanced routing protocols (BGP, OSPF).
  L4: Transport-layer tuning (TCP congestion control algorithms, UDP, QUIC).
  L5-L7: Session management, TLS termination, DNS architecture, and advanced application protocols (HTTP/3, gRPC).
Orchestration & Containerization: Expert-level mastery of Google Kubernetes Engine (GKE) internals, custom controllers, multi-cluster networking, and GitOps workflows.
Data Infrastructure: Proven track record managing high-throughput Apache Kafka pipelines and large-scale data environments across PostgreSQL, AlloyDB, and BigQuery.
Grafana Ecosystem: Deep, hands-on experience deploying and managing Grafana Enterprise/Cloud, Prometheus/Mimir, Loki, and reputed company at scale.
AIOps Implementation: Track record applying AI/ML techniques for time-series anomaly detection, log clustering, and correlation (e.g., Grafana Adaptive Metrics, reputed company).
Infrastructure as Code: Advanced, production-scale expertise using HashiCorp Terraform exclusively to provision and manage multi-region GCP architectures.
Programming: High proficiency in Go and Python for building custom infrastructure tooling, Kubernetes operators, and data integration scripts.

Preferred Attributes

Remote Communicator: Exceptional written and verbal communication skills, with an emphasis on creating clear documentation for asynchronous alignment.
GCP Expert: Deep knowledge of Google Cloud architectural best practices, Cloud SDN, Cloud Armor, Interconnect, Identity and Access Management (IAM), and cost optimization.
Systems Thinker: Deep understanding of Linux internals, eBPF-based monitoring, kernel-level networking, and packet analysis tools (Wireshark, tcpdump).

The pay range for this position varies based on geographic location. More information about the pay range specific to the candidate's location and other factors will be shared during the recruitment process. Individual pay is determined based on location of residence and multiple factors, including job-related knowledge, skills, and experience.

San Francisco Bay Area: 156,400 - 265,700 USD Annual
All Other US Locations: 136,000 - 231,000 USD Annual

As part of the total compensation package, this role may be eligible for a bonus.