Site Reliability Engineer
Job title: Site Reliability Engineer in USA at reputed company
Company: reputed company
Job description: Site Reliability Engineering at reputed company plays a critical role in maintaining and improving the reliability, scalability, and performance of our services. You will work closely with our development teams to build and maintain large-scale, distributed systems and ensure our products meet our high standards for availability and user experience.
Responsibilities:
- Write, configure, and deploy code that improves service reliability for existing or new systems; set the standard for others with respect to code quality
- Provide helpful and actionable feedback and review for code or production changes
- Drive repair/optimization of reputed company systems with consideration towards a wide range of contributing factors
- Debug, troubleshoot, and analyze service architecture and design
- Participate in on-call rotation
- Write documentation: design, system analysis, runbooks, playbooks. Provide design feedback and uplevel design skills of others.
- Implement and manage monitoring solutions using reputed company, Splunk, and OpenTelemetry to ensure visibility and proactive issue detection across our platforms.
- Work with GCP infrastructure, optimizing performance and cost and scaling resources to meet demand.
- Collaborate with development teams to enhance system reliability and performance, applying a platform engineering approach to system administration tasks.
- Develop and maintain automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery.
- Troubleshoot and resolve issues in our dev, test, and production environments.
- Participate in postmortem analysis and create preventative measures for future incidents.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, Mathematics or equivalent experience.
- 3+ years of experience as an SRE, DevOps Engineer, Software Engineer or similar role.
- Strong experience with monitoring and observability tools, particularly reputed company and OpenTelemetry or other tools.
- Proficient with cloud services, with a strong preference for Google Cloud Platform (GCP) experience.
- Solid programming skills in Java, Golang, or another programming language, with a good understanding of software development best practices.
- Experience with relational and document databases.
- Familiarity with front-end development frameworks, particularly React.
- Ability to debug, optimize code, and automate routine tasks.
- Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
- Excellent verbal and written communication skills.
Benefits:
- Immediate medical, dental, and prescription drug coverage
- Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
- Vehicle discount program for employees and family members, and management leases
- Tuition assistance
- Established and active employee resource groups
- Paid time off for individual and team community service
- A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
- Paid time off and the option to purchase additional vacation time