[Remote] Senior Site Reliability Engineer

100% remote | Flexible hours | Hiring now

Note: This is a remote position open to candidates in the USA. The company helps innovators turn their ideas into reality through software. It is seeking a Senior Site Reliability Engineer to build and operate reliable, secure, and scalable cloud services for its GovCloud products, with a focus on improving production services and establishing operational excellence practices.

Responsibilities

  • Serve as a primary owner for the reliability, availability, performance, operability, and security of one or more production services
  • Deploy, operate, maintain, and continuously improve production services running in the company's GovCloud environments
  • Partner with engineering teams to ensure services are designed with reliability, scalability, security, and operability in mind
  • Define and operate reliability practices such as SLOs/SLIs, error budgets, production readiness reviews, service reviews, and operational health reviews
  • Build automation to improve deployment safety, operational efficiency, incident response, and service recovery
  • Design, build, and maintain software, automation, and tooling that improve the reliability, scalability, and efficiency of production systems
  • Implement and improve monitoring, alerting, logging, tracing, and observability capabilities across supported services
  • Lead and participate in incident response, troubleshooting, and post-incident reviews focused on learning and continuous improvement
  • Create and maintain operational documentation, runbooks, and recovery procedures
  • Scale and enhance resilience testing and Gameday practices to validate system behavior, recovery capabilities, and operational readiness
  • Continuously identify and eliminate operational toil through software engineering, automation, and process improvement
  • Ensure supported services remain compliant with the company's security, privacy, and regulatory requirements, including FedRAMP and related controls where applicable
  • Participate in a 24x7 on-call rotation for production services
  • Function effectively in a fast-paced environment while helping establish and mature operational excellence practices for the company's GovCloud

Skills

  • B.S. or higher in Computer Science, Engineering, or a related technical discipline, or equivalent practical experience
  • 7+ years of experience in Site Reliability Engineering, Software Engineering, Platform Engineering, Cloud Infrastructure, or Production Operations
  • Experience operating and supporting customer-facing production services in large-scale cloud environments
  • Strong understanding of reliability engineering principles, including SLOs/SLIs, observability, incident management, capacity planning, production readiness, and automation
  • Experience with AWS, Azure, or other public cloud platforms
  • Experience developing automation using languages such as Python, Go, Java, PowerShell, Bash, or similar
  • Experience with Infrastructure as Code, CI/CD pipelines, deployment automation, and modern cloud operations practices
  • Understanding of security, compliance, and operational risk management in production environments
  • Strong written and verbal communication skills
  • 10+ years of experience operating highly available, customer-facing production systems
  • Experience with AWS GovCloud, FedRAMP, IL4/IL5, or other regulated cloud environments
  • Experience supporting services with stringent availability, reliability, and security requirements
  • Experience with containers, Kubernetes, cloud-native architectures, APIs, load balancing, networking, DNS, and distributed systems
  • Experience with observability platforms such as Splunk, CloudWatch, or similar technologies
  • Experience operating databases, storage platforms, messaging systems, and caching technologies
  • Experience designing and implementing operational automation at scale
  • Experience leading or participating in Gamedays, disaster recovery exercises, resilience testing, or operational readiness reviews
  • Strong incident management experience, including technical leadership during major incidents and stakeholder communication
  • Strong collaboration skills and ability to work effectively across engineering, security, compliance, and operations teams
  • Passion for building reliable, secure, and scalable systems that customers can trust

Benefits

  • Annual cash bonuses
  • Commissions for sales roles
  • Stock grants
  • A comprehensive benefits package

Company Overview

  • The company develops 3D design software for the architecture, engineering, construction, and media industries. Founded in 1982, it is headquartered in San Francisco, California, USA, and has more than 10,000 employees. Its website is http://www.reputed company.com.