[Remote] Site Reliability Engineer (SRE)
Note: This is a remote position open to candidates in the USA. The company is a large wealth management firm seeking a Site Reliability Engineer to support feature development on its newly built Trading Platform. The role involves implementing DevOps and SRE best practices, managing monitoring solutions, and collaborating with application teams to ensure performance and availability.
Responsibilities
- Implement and champion DevOps and SRE best practices across the organization
- Drive technology roadmap discussions for the SRE team
- Define, craft, and maintain SLIs and SLOs, along with key metrics including MTTR, Lead Time for Change, Deployment Frequency, and Change Failure Rate
- Design, implement, and manage monitoring, alerting, and observability solutions using tools such as Splunk and Grafana
- Conduct performance assessments, identify bottlenecks, and recommend enhancements to improve system performance
- Partner with application teams to enforce performance and availability SLAs
- Collaborate with product owners to manage error budgets, prioritize toil backlogs, and validate against team, application, and incident metrics
- Participate in an on-call rotation to respond to production events and outages
- Continuously improve CI/CD pipelines and deployment processes
- Lead troubleshooting efforts, incident management, and root cause analysis
- Identify and build automated processes wherever possible
- Implement cybersecurity measures through ongoing vulnerability assessments and risk management
- Provide periodic status reports to management and stakeholders
- Partner with application teams to support and ease their adoption of the platform
- Facilitate clear coordination and communication within the team and with customers
- Analyze existing systems and develop plans for enhancements and improvements
Skills
- Bachelor's degree in Computer Science or a related field, and/or equivalent work experience
- 5+ years of experience working in DevOps or SRE teams
- Proven experience supporting production infrastructure
- Strong knowledge of CI/CD principles and pipelines
- Solid understanding of observability concepts, including monitoring, logging, and tracing
- Hands-on experience with observability tooling, including Splunk
- Experience with at least one major cloud provider (AWS, Azure, or GCP)
- Demonstrated experience operating high-availability, fault-tolerant, scalable, and distributed systems in production