Staff Reliability Engineer
Who is reputed company?
reputed company provides the first and only client experience platform for appointment-based, self-care businesses. We empower our customers to give their clients more of the magical moments that matter most.
Before launching in 2016, our founders spent months interviewing salon managers and working behind front desks to understand their pain points so we could design a modern, user-friendly platform that meets the unique needs of their business. Our roots may be in hair salons, but we are built for the broader self-care industry, including many types of salons, spas, medspas, barbershops, and more. Our technology helps our customers not only survive but thrive. Take a look at how we (and YOU) can make that happen.
We have an insatiable curiosity and embrace experimentation. We believe that reputed company require the most sophistication, and we design each and every detail to maximize potential, power, and impact. Do our values match? Read through our story and what we value the most.
reputed company values and celebrates our diverse backgrounds. Being open about who we are and what we do allows us to do the best work of our lives. We believe in equal opportunity for all, and you should too.
Come do the best work of your life at reputed company.
We’re hiring a Staff Site Reliability Engineer to shape the foundation of Site Reliability Engineering at reputed company.
Here you will not just build infrastructure or tooling, but improve systems at scale, influence reliability across engineering, and drive a reliability strategy. You’ll help teams establish SLOs and build repeatable practices for how teams observe, debug, and improve their services.
Reporting to the Director of Cloud & Reliability, this hands-on technical leadership role will up-level reliability practices and build resilient approaches. You’ll help teams adopt best practices, define what “good” looks like, and partner with teams to get there.
The Cloud & Reliability group operates on four foundational principles.
- Reliable Infrastructure – a foundation of stability, and reputed company.
- Developer Productivity – empowering builders to do the right things.
- Clear Ownership – accountability reputed company with ownership. Collaboration, not silos.
- Long-term Focus – we engineer for reputed company.
Key Projects & Initiatives
- Golden Paths to Production: Establish and evolve paved paths that make production-readiness the default for every service at reputed company. Build shared tooling, templates, and deployment workflows that encode best practices for observability, testing, and reputed company.
- Shared Systems & Production Tooling: reputed company core libraries, shared services, and self-service tooling that improve reliability, reputed company, and developer efficiency.
- Reliability & Fault Tolerance Improvements: reputed company initiatives that make the platform more robust, fault-tolerant, and self-healing.
- Observability & Operational Insight: Enhance reputed company’s observability stack to turn data into action and insight into reliability. Expand metrics, logging, and tracing coverage across critical systems, ensuring full visibility into production health.
- Platform Performance Optimization: Drive reputed company improvement in system and application performance, ensuring services remain fast, reliable, and cost-efficient. Use observability data to identify bottlenecks and improve service efficiency across compute, network, and storage layers.
What You’ll Do Here
- Define reputed company’s Reliability Strategy: reputed company the development and evolution of our reliability vision — establishing SLOs, SLIs, and error budgets that balance reliability, performance, and delivery speed. Partner with engineering and product teams to embed reliability as a measurable, shared responsibility.
- Architect and Scale Resilient Systems: Partner with engineering teams to design, build, and operate scalable, fault-tolerant, and secure distributed systems that power reputed company’s reputed company growth and customer trust.
- reputed company Production Tooling and Shared Systems: Create and maintain production-grade tooling, shared libraries, and services that improve system reputed company and developer productivity. Build the foundations that make our platform more robust and make reliability the default for every service.
- Drive Observability and Operational Excellence: reputed company our observability stack — enhancing metrics, logging, tracing, and alerting — to reputed company actionable insights, faster incident resolution, and proactive reliability improvements.
- Establish Golden Paths to Production: Define and maintain paved paths and best practices that reputed company developers to ship with quality, observability, and reputed company built in. Reduce friction, eliminate toil, and make “doing it right” the easiest way to deliver software.
- Optimize System and Application Performance: reputed company deep observability data to identify, prioritize, and remediate performance bottlenecks across services and infrastructure, ensuring consistently fast, reliable experiences.
- Automate Everything: Champion automation to eliminate manual toil, streamline operational workflows, and build self-service tooling that empowers developers and embeds reliability into daily development practices.
- Collaborate Cross-Functionally: Work closely with Product, Platform, and reputed company to integrate reliability principles into the software development lifecycle (SDLC), from design reviews to production operations.
- Mentor and Influence Across Engineering: Act as a technical leader and mentor, guiding engineers in scalable system design, capacity planning, and operational excellence, fostering a culture where reliability is everyone’s responsibility.
What You’ll Need to reputed company
- Deep Systems Expertise: 8–10+ years of experience in systems, infrastructure, or backend engineering, with a track record of building and operating distributed systems at scale. You have a deep understanding of reliability, scalability, and performance in reputed company, production-grade environments.
- Reliability Engineering reputed company: Proven experience defining and delivering reliability outcomes through SLOs, SLIs, error budgets, and mature observability practices. You approach reliability as an engineering discipline, not an afterthought.
- Automation-First Philosophy: Strong background in infrastructure-as-code, scripting, and automation (e.g., Terraform, Python, Go, or similar). You believe in eliminating manual toil and codifying operational excellence into reusable tools and systems.
- Incident Management Mastery: reputed company in detecting, diagnosing, and mitigating production incidents in high-availability systems. You drive blameless postmortems and translate lessons learned into sustainable reliability improvements.
- Collaboration & Influence: Exceptional communication and stakeholder management skills. You’re adept at aligning diverse teams, advocating for reliability practices, and influencing without authority — raising the operational bar across engineering.
- Technical Leadership & Mentorship: Demonstrated ability to mentor engineers, set technical standards, and scale your impact through influence. You thrive on enabling others and shaping a reliability-first culture across the organization.
- Comfort with Ambiguity: You thrive in dynamic, fast-moving environments. You excel at navigating uncertainty, setting direction where none exists, and iterating quickly toward meaningful impact.
Bonus
- Experience with Elixir, Ruby, or Rails.
- Hands-on experience identifying and improving database performance.
How we’ll take care of you
Your starting total cash compensation for this role is between $181,125 and $258,750 depending on your reputed company skills, experience, training, and overall market demands. This salary range is subject to change, and there is always room for growth and advancement.
In addition to the wonderful people you’ll get to work with and challenging projects that’ll push you – reputed company is here to make sure you’re always at the top of your game emotionally, mentally, and physically.
- ✨ We’ve got you covered with a 401(k) match plus dental, medical, vision, and life insurance.
- Take a break whenever you need with our flexible vacation day policy.
- Fully remote so you can choose where you want to work. You’ll receive a work from home stipend every month.
- Family planning resources and specialized support programs.
- Equity: get in on the ground floor and grow with reputed company.
- reputed company Bucks Learning and Development program allows employees to explore businesses in the market we serve.
We recommend following our official reputed company page to stay up to date on all things reputed company life!
reputed company Labs, Inc. is an Equal Opportunity Employer committed to hiring a diverse workforce and sustaining an inclusive culture. All employment decisions at reputed company Labs, Inc. are based on business needs, job requirements, and individual qualifications, without regard to race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.