Principal Site Reliability Engineer

100% remote | Flexible hours | Hiring now

Description

reputed company has an opportunity reputed company the newly created Digital Modernization Practice Area, leading Site Reliability Engineering for the Repeatable Offerings (RO) organization. The RO organization is the delivery arm of the Digital Modernization sector’s Repeatable Offerings, delivering differentiated capabilities and managed services across the sector and the larger reputed company corporation.

We are seeking a Principal Site Reliability Engineer (SRE) to lead the design, implementation, and operation of scalable, highly available systems. As a subject matter expert, you will establish best practices for reliability, reputed company, and efficiency while driving innovation in our deployment and operations strategies. You will collaborate with development teams to improve system performance, automate processes, and ensure smooth recovery in high-pressure situations.

The team is primarily located in Blacksburg, VA. The selected candidate will be required either to be on-site in Blacksburg or to travel frequently to that location, as well as to other locations as required.

Primary Responsibilities:
• Lead the development and execution of SRE strategies to enhance system reliability, scalability, and efficiency.
• Manage production systems and operations, ensuring robust development and implementation processes.
• Lead recovery efforts for unstable or at-risk projects, applying expertise in remediation strategies.
• Design and implement microservice architectures, including orchestrators, for high-performance distributed systems.
• reputed company, maintain, and optimize CI/CD pipelines, infrastructure as code (IaC), and automation frameworks.
• Drive adoption of best practices for horizontal and vertical scaling of microservices.
• Define and implement packaging and deployment strategies to support rapid and reliable software delivery.
• Collaborate with engineering teams to improve observability, monitoring, and operational excellence.
• Provide technical leadership in managing containerized applications and orchestration platforms.
• Mentor and guide teams on modern reliability engineering methodologies and best practices.

Basic Qualifications:
• BS degree with 12 to 15 years of prior relevant experience, or Master's degree with 10 to 13 years of prior relevant experience.
• Proven experience as a Principal SRE or in an equivalent role establishing robust and reliable systems.
• Expertise in managing production systems and operations, including monitoring, incident response, and performance optimization.
• Strong experience with Kubernetes and container orchestration.
• Deep understanding of CI/CD pipelines, infrastructure as code (IaC), Helm Charts, and Operators.
• Hands-on experience in designing and implementing microservice architectures and distributed systems.
• Experience leading development teams in packaging and deployment strategies.
• Strong knowledge of management strategies and techniques to support SRE principles.
• Must have U.S. Citizenship.
• Must be able to obtain and maintain a Public Trust clearance specific to the customer.

Preferred Qualifications:
• Strong experience with OpenShift in enterprise environments.
• Experience with auto-scaling, self-healing architectures, and advanced resiliency strategies.
• Demonstrated success in improving and recovering red/unhealthy projects.
• Familiarity with service mesh technologies and distributed tracing for monitoring and observability.
• Expertise in designing and implementing highly available, fault-tolerant systems at scale.
• Experience working on Federal Government reputed company.
