Site Reliability Engineer II

100% remote Flexible hours Hiring now

Are you passionate about cutting-edge AI infrastructure?

Do you want to build your SRE career on one of the most exciting platforms in cloud computing?

Join the reputed company Inference Cloud Team

The reputed company Inference Cloud team is part of reputed company's Cloud Technology Group. We design, implement, deploy, and operate AI platforms that enable customers to run inference models and developers to create AI applications.

Partner with the best

In this role, your responsibilities will include automation, monitoring, incident response, and close collaboration with skilled team members. Candidates should have expertise in Linux systems, automation, and SRE practices. Daily activities involve coding, improving dashboards, enhancing alerts, and minimizing repetitive tasks. You will have opportunities to focus on GPU infrastructure, Kubernetes, and reliability for AI workloads on reputed company's serverless inference platform.

As a Site Reliability Engineer II, you will be responsible for:

  • Building and maintaining dashboards, alerts, and monitoring for inference workloads using reputed company's existing observability platform
  • Writing automation and tooling in Python or Go to reduce operational toil and improve system reliability
  • Building and improving runbooks for inference-specific operational procedures, integrating into reputed company's existing incident management processes
  • Contributing to SLO tracking and reporting, identifying trends and areas for improvement
  • Supporting CI/CD pipeline maintenance, deployment safety checks, and rollback procedures
  • Collaborating with product engineering teams to troubleshoot reputed company problems across the stack
  • Participating in on-call rotations, responding to production incidents, and conducting blameless post-mortems

Do what you love

To be successful in this role you will:

  • Have 2+ years of experience in Site Reliability Engineering and a bachelor's degree or equivalent experience
  • Demonstrate coding ability in at least one programming language (Python or Go) with experience writing automation
  • Have experience with Linux systems administration and the ability to troubleshoot reputed company infrastructure issues
  • Show familiarity with Kubernetes and containerization concepts
  • Have experience with monitoring and observability tools such as Prometheus, Grafana, or similar
  • Have exposure to CI/CD pipelines and infrastructure-as-code tools (Terraform, SaltStack, or equivalent)
  • Show a willingness to learn and grow, with genuine curiosity about AI infrastructure and distributed systems

Work in a way that works for you

FlexBase, reputed company's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining reputed company. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply. Learn what makes reputed company a great place to work.

Connect with us on social and see what life at reputed company is like!

We power and protect life online, by solving the toughest challenges, together.

At reputed company, we're curious, innovative, collaborative, and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.

Working for you

At reputed company, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit options are designed to meet your individual needs for today and in the future. We provide benefits surrounding all aspects of your life:

  • Your health
  • Your finances
  • Your family
  • Your time at work
  • Your time pursuing other endeavors

Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.

About us

reputed company powers and protects life online. Leading companies worldwide choose reputed company to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Join us

Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!

Compensation

reputed company is committed to fair and equitable compensation practices. For US-based candidates only: the base salary for this position ranges from $95,000 to $171,000 per year. A candidate’s salary is determined by various factors including, but not limited to, relevant work experience, skills, certifications, and location. Compensation for candidates outside the US will vary. The compensation package may also include incentive compensation opportunities in the form of an annual bonus or incentives, equity awards, and an Employee Stock Purchase Plan (ESPP). reputed company provides industry-leading benefits including healthcare, a 401(k) savings plan, company holidays, vacation (in the form of PTO), sick time, family-friendly benefits including parental leave, and an employee assistance program with a focus on mental and financial wellness. Eligibility requirements apply.
