Site Reliability Engineer (Data)
Job title: Site Reliability Engineer (Data) in San Francisco, CA at reputed company
Company: reputed company
Job description:

About reputed company

We're humans who simply think computers should do more work.

At reputed company, we're not just making software; we're building a platform to help millions of businesses globally scale with [...]. Our mission is to make automation work for everyone by delivering products that delight [...]. You'll collaborate with [...] people, use the latest tools, and enjoy the flexibility of remote work. Your work will directly fuel [...], and as they grow, so will you.

Job Posted: May 9, 2025
Location: Americas

Hi there! Are you passionate about building reliable systems that help data teams [...] at scale?

reputed company is looking for a Site Reliability Engineer to join our Data Platforms team. In this role, you'll work alongside our existing SRE to improve the reliability, observability, and operational maturity of the modern data stack that powers internal products and customer-facing [...] across reputed company. From orchestrating workflows in [...] to tuning performance in our data infrastructure, you'll play a key role in keeping our data ecosystem healthy, scalable, and developer-friendly.

About You

You're experienced, but still growing. You have 4+ years of experience in Site Reliability Engineering roles. You've worked in production environments, solved real incidents, and shipped platform improvements, but you're also eager to learn and grow alongside a thoughtful, distributed team.

You know the cloud, and how to keep it healthy. You're familiar with cloud-native architecture and services (we use AWS). You've helped teams build and maintain reliable workflows using tools like Terraform, and you understand the tradeoffs behind infrastructure decisions.

You're observability- and incident-driven. You know how to detect issues before customers feel them.
You believe in rich metrics, structured logs, and smart alerting. You've contributed to incident response processes and helped teams learn from failure.

You bring an automation- and AI-first mindset. You're not afraid to write code (Python, TypeScript, or Bash are all great) and believe deeply in Infrastructure as Code. You lean into tools, automation, and AI to reduce toil, improve deployment confidence, and free up teams to focus on meaningful work. You're open to experimenting with AI tools to decrease toil and increase your impact.

You're a strong communicator in a remote-first world. You can clearly describe problems, propose solutions, and write clean documentation others can follow. You're comfortable collaborating asynchronously with cross-functional teams and support partners.

Things You'll Do

[...] reliability for our modern data stack - Help support and evolve our data platforms (including [...], Airflow, and our LLMOps tooling) with reliability best practices and clear operational standards.

Improve observability and alerting - Partner with engineering teams to implement monitoring and alerting that supports ownership, reduces noise, and improves incident response metrics like MTTD (Mean Time to Detect) and MTTR (Mean Time to Resolve).

Automate and optimize operations - Build and maintain infrastructure-as-code, job orchestration logic, and internal tooling that reduce manual work and improve system [...].

Participate in on-call and incident response - Share in our on-call rotation (about one week per quarter) and work alongside others to improve postmortems, retrospectives, and mitigation strategies.

Contribute to security and compliance readiness - Help evolve our access controls, auditability, and deployment practices in support of growing needs like sensitive data [...] compliance.

Be a partner, not a gatekeeper - Work closely with Data Engineers, ML Engineers, and
Backend Engineers to ensure platforms are reliable and empowering to use.

Bonus Points

(Not required, but nice to have!)

Experience with tools like Airflow, [...], or Kubernetes.
Experience with [...] administration, cost governance, or workspace [...].
Familiarity with data lake architecture (e.g., [...] Lake, [...] Catalog).
Exposure to compliance-driven environments (HIPAA, SOC 2, etc.).
Demonstrated AI [...], whether it's applying AI for troubleshooting, documentation, automation, or infrastructure tooling.

How to Apply

At reputed company, we believe that diverse perspectives and experiences make us better, which is why we have a non-standard application process designed to promote inclusion and equity. We're looking for the best fit for each of our roles, regardless of the type of companies in your background, so we encourage you to apply even if your skills and experiences don't exactly match the job description. All we ask is that you answer a few in-depth questions in our application that would typically be asked at the start of an interview process. This helps speed things up by letting us get to know you and your skillset a bit better right out of the gate. Please be sure to answer each question; the resume and CV fields are optional.

Education is not a requirement for our roles; however, if you receive an offer, you will need to include your most recent educational experience as part of our background check process.

After you apply, you are going to hear back from us, even if we don't see an immediate fit with reputed company. In fact, throughout the process, we strive to never go more than seven days without letting you know the status of your application.
We know we'll make mistakes from time to time, so if you ever have questions about where you stand or about the process, just ask your recruiter!

reputed company is an equal-opportunity employer and we're excited to work with talented and empathetic people of all identities. reputed company does not discriminate based on someone's identity in any aspect of hiring or employment as required by law and in line with our commitment to Diversity, Inclusion, Belonging and Equity. Our [...] provides a [...] for the kind of company we strive to be, and we celebrate our differences because those differences are what allow us to build a product that serves a global user base. reputed company will consider all qualified applicants, including those with criminal histories, consistent with applicable laws.

reputed company prioritizes the security of our customers' information and is dedicated to adhering to all applicable data privacy laws.

reputed company is committed to inclusion. As part of this commitment, reputed company welcomes applications from individuals with disabilities and will work to provide reasonable accommodations. If reasonable accommodations are needed to participate in the job application or interview process, please contact [...].

Application Deadline: The anticipated application window is 30 days from the date the job is posted, unless the number of applicants requires it to close sooner or later, or if the position is filled.

Even though we're an all-remote company, we still need to be thoughtful about where we have Zapiens working. Check out [...] for a list of countries where we currently cannot have Zapiens permanently working.

Expected salary:
Location: San Francisco, CA