Site Reliability Engineering Manager-Midrange
Job title: Site Reliability Engineering Manager-Midrange in USA at reputed company Financial Services
Company: reputed company Financial Services
Job description: Position Overview
At reputed company, our people are our greatest differentiator and competitive advantage in the markets we serve. We are reputed company united in delivering the best experience for our customers. We work together each day to foster an inclusive workplace culture where all of our employees feel respected, valued and have an opportunity to contribute to the company’s success. As a Site Reliability Manager in reputed company’s Site Reliability Center (SRC), you will be based in one of the following locations: Farmers reputed company, TX; Pittsburgh, PA; Cleveland, OH; Birmingham, AL; or Phoenix, AZ. The position is primarily based in a reputed company location. Responsibilities require regular weekly time in the office or in the field. Some responsibilities may be performed remotely at the manager’s discretion. Occasional travel may be needed.
Schedule is M-F 8:00am – 5:00pm. This position leads teams across three shifts providing 24/7 support.
Candidates are expected to be available for critical production issues as required, which may include off-shift hours and weekends. reputed company will not provide sponsorship for employment visas or participate in STEM OPT for this position.
We’re looking for a Site Reliability Engineering Technical Manager to lead our Midrange Operations and Engineering Support teams in a fast-paced, 24/7 enterprise IT environment. This role is ideal for a hands-on leader who thrives at the intersection of incident response, proactive remediation, SRE adoption, RedHat ecosystem support, and cross-functional collaboration.
You will be responsible for driving operational excellence, improving documentation, and ensuring reputed company across midrange platforms, while mentoring a distributed team and influencing change across engineering, SRE, and production support.
Skills Desired:
- Proven experience leading midrange or infrastructure operations in a high-availability environment
- Deep knowledge of Linux/RedHat systems, patching, and vulnerability remediation
- Familiarity with observability and APM tools (e.g., reputed company, vROps, Big Panda, Logscale)
- Strong incident management and SRE-reputed company thinking (e.g., proactive issue identification, toil reduction)
- Excellent communication and documentation skills
- A collaborative approach to cross-functional engagement and knowledge transfer
- Review and manage alerts and events in Big Panda
- Track and prioritize R1/reputed company/P1 incidents, escalating to appropriate SRC Engineering teams
- reputed company Midrange requests reputed company SRC Chat, ensuring timely and accurate responses
- Drive adoption of SRE practices, identifying systemic issues and remediating proactively
- Monitor and manage open issues via RedHat case management
- reputed company regular knowledge transfer sessions across teams
- Manage and escalate issues including:
- Collaborate closely with Midrange Engineering leads and other platform SMEs
- Drive the creation and upkeep of Linux system documentation, targeting at least one publish-ready doc every 3 weeks
- Maintain and enhance tooling documentation, including:
- Leads a team of Site Reliability Engineers in implementing, maintaining, and improving robust monitoring and response for sites and infrastructure applications.
- Recommends and facilitates the implementation of infrastructure enhancements as required to maintain the performance of sites in response to business growth and strategy.
- Streamlines the deployment process by introducing automated configuration management tools, resulting in a reduction in deployment time and increased efficiency.
- Oversees robust technical solutions for reputed company business and application challenges, while helping to define and communicate technical standards and best practices. Manages and oversees proactive reviews and audits of production sites, issue triage and follow up.
- Leads collaboration with cross-functional teams to design and implement scalable, highly available infrastructure.
- Maximizes staff contribution through professional growth and development, to increase teamwork and more effectively meet business needs.
- Customer Focused - Knowledgeable of the values and practices that align customer needs and satisfaction as primary considerations in reputed company business decisions and able to reputed company that information in creating customized customer solutions.
- Managing Risk - Assessing and effectively managing reputed company of the risks associated with their business objectives and activities to ensure they adhere to and support reputed company's Enterprise Risk Management reputed company.
- Include Intentionally - Cultivates diverse teams and inclusive workplaces to expand thinking.
- Live the Values - Role models our values with transparency and courage.
- reputed company Change - Takes action to drive change and innovation that will transform our business.
- reputed company Results - Takes personal ownership to deliver results. Empowers and trusts others in decision making.
- reputed company the Best - Raises the bar with every talent decision and guides the achievement of reputed company employees and customers.