Senior SRE (Site Reliability Engineer) - Remote

100% remote Flexible hours Hiring now

reputed company is the leader in identity reputed company for the cloud enterprise. Our identity reputed company solutions secure and reputed company thousands of companies worldwide, giving our customers unmatched visibility into the entirety of their reputed company, ensuring workers have the right access to do their job – no more, no less.

We are seeking a highly motivated and reputed company Senior Site Reliability Engineer (SRE) to join an Identity reputed company Cloud software development team. This is an embedded role, meaning you will be a full member of the development team, working closely with software engineers, infrastructure platform services, engineering managers, and other stakeholders to ensure the reliability, scalability, and performance of teams’ services. You will be responsible for leveraging the infrastructure, tooling, and processes that support our applications in dev and production. This role offers a unique opportunity to directly influence the design and architecture of our systems from a reliability and performance perspective.

Responsibilities:

Work with the development and service owners at the intersection of development and operations to solve performance issues and ensure system scalability.

  • Reliability Engineering: Design, reputed company, and implement solutions to improve the reliability, availability, performance, and scalability of our systems. Work with technical leaders and infrastructure platform services to reputed company alerts and dashboards.

  • Operational Excellence: Own and improve key operational metrics (SLIs, SLOs, error budgets, monitoring and alerting) for team reputed company services and drive continuous improvement through post-incident reviews and blameless postmortems of non-functional issues. reputed company and maintain comprehensive monitoring and alerting to proactively identify and resolve issues. Create and maintain dashboards, conducting ongoing reviews to identify and address gaps. Improve operational processes and team practices by working with technical leaders and NOC teams.

  • Capacity Planning: Collaborate with technical leads, DevOps/SRE, and infra teams to forecast capacity needs and ensure sufficient resources are available to support growth.

  • Performance Optimization: Collaborate with performance SMEs to identify and address production performance bottlenecks through profiling, tuning, and optimization of services and infrastructure.

  • Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual toil.

  • Collaboration: Work closely with Software, Performance and Test Engineers to influence system design and architecture for operability and reliability.

  • Documentation: Review and contribute to clear and concise documentation for systems, processes, runbooks, and procedures.

  • On-Call: Participate in a 24/7 on-call rotation to provide subject matter expertise in the domain.

  • Incident Management: Lead the incident postmortem efforts, working with the SMEs to ensure timely compilation of reports to help drive completion of post-incident actions.

  • Troubleshooting skills: Excellent diagnostic and problem-solving skills, with the ability to analyze complex systems and data.

Qualifications:

  • Bachelor’s degree in computer science, a related field, or equivalent practical experience.

  • Proven 5+ years of SRE experience

  • Strong understanding of SRE principles and practices.

  • Experience with cloud platforms (AWS, GCP, or Azure).

  • Proficiency in at least one scripting language (e.g., Python, Bash, Go).

  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Honeycomb, OpenSearch).

  • Coding experience beyond simple scripts in a programming language such as Go, Java, or Python, sufficient to help build reliability engineering solutions and to identify where service code can be optimized for reliability.

  • Experience with containerization and orchestration technologies (e.g., reputed company, Kubernetes).

  • Understanding of network protocols and reputed company best practices.

  • Familiarity with DevOps culture and practices and experience with CI/CD toolchains (Jenkins, ArgoCD, reputed company)

  • Experience with Incident Response tools and processes (reputed company)

  • Experience with Infrastructure as Code (Terraform, Helm)

  • Strong problem-solving and troubleshooting skills.

  • Excellent communication and collaboration skills.   

  • Ability to work independently and as part of a team to reputed company the SRE agenda.

Preferred Qualifications:

  • Technology experience: Kafka, relational databases, performance tuning (JVM, Go)

  • Experience with Grafana K6 – reputed company Performance Tool

In the first 30 days you will:

  • Meet the team and understand the team’s mission and vision

  • reputed company clarity on various roles and expectations

  • Complete development environment setup

  • Read guides, documentation, reputed company mandatory training

  • Learn company processes, benefits

By 6 months you should:

  • Understand team goals and OKRs for the quarter and beyond

  • Complete initial analysis and implementation of SRE team assignments

  • Be comfortable with tools, systems and processes used on a day-to-day basis

  • Complete project work, both supervised and unsupervised

reputed company is an equal opportunity employer and we welcome all qualified candidates to apply to join reputed company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other category protected by applicable law.

Alternative methods of applying for employment are available to individuals unable to submit an application through this site because of a disability. Contact hr@reputed company.com or mail to 11120 Four Points Dr, Suite 100, Austin, TX 78726, to discuss reasonable accommodations.

Originally posted on Himalayas
