Advanced Site Reliability/DevOps Engineer - (Multiple positions available)

100% remote, flexible hours, hiring now

About the position

Specialize in developing scalable methods for building, deploying, and supporting cloud, on-prem, and store-focused enterprise services and systems. Work closely with Software Engineers to deploy and operate solutions, automate and streamline processes, build and maintain tools for deployment and monitoring of the platform, and troubleshoot and resolve issues across environments while guiding and mentoring other members of the team. Design and build infrastructure and systems that provide high levels of scalability, reliability, and performance for Kroger's stack, while balancing maintainability, reliability, and operational excellence. Work with the engineering team to continuously implement and improve reliable and speedy build environments for DEV and QA, provide timely build status updates, and automate as much as possible to improve efficiency and quality. Promote innovation, out-of-the-box thinking, teamwork, and self-organization. Ensure traceability, observability, and retrievability of system behavior. Build logging, monitoring, and alerting systems to identify bottlenecks and assist with debugging, analysis, and optimization in cloud, on-prem, and store environments. Improve operational efficiency through automation and the deployment or development of new tools. Experiment with and recommend new technologies that simplify or improve Kroger's stack. Craft solid and clearly explained designs, playbooks, and documentation for consumption by teammates and the larger engineering organization. Determine methods and procedures on new assignments; may coordinate activities of other personnel. Participate in an off-hours on-call rotation and perform periodic off-hours work during maintenance windows. Duties may be located at any Kroger Co. office throughout the U.S. Telecommuting from a home office is authorized pursuant to company policy.

Responsibilities

Develop scalable methods for building, deploying, and supporting cloud, on-prem, and store-focused enterprise services and systems.
Work closely with Software Engineers to deploy and operate solutions.
Automate and streamline processes, build and maintain tools for deployment.
Monitor the platform and troubleshoot and resolve issues across environments.
Guide and mentor other members on the team.
Design and build infrastructure & systems for scalability, reliability, and performance.
Implement and improve reliable and speedy build environments for DEV & QA.
Provide timely build status updates and automate processes to improve efficiency and quality.
Promote innovation, teamwork, and self-organization.
Ensure traceability, observability, and retrievability of system behavior.
Build logging, monitoring, and alerting systems to identify bottlenecks.
Improve operational efficiency through automation and development of new tools.
Experiment with and recommend new technologies.
Craft designs, playbooks, and documentation for teammates and the engineering organization.
Determine methods and procedures on new assignments and coordinate activities of other personnel.
Participate in an off-hours on-call rotation and perform periodic off-hours work during maintenance windows.

Requirements

Bachelor's degree in Computer Science or a closely related STEM field plus at least 6 years of experience in cloud Site Reliability Engineering, DevOps, or Infrastructure, OR a Master's degree in Computer Science or a closely related STEM field plus at least 3 years of experience in cloud Site Reliability Engineering, DevOps, or Infrastructure.
3+ years of experience with messaging technologies such as Kafka, RabbitMQ, or SQS.
3+ years of experience with infrastructure software tools such as Ansible or Terraform.
3+ years of experience with containerization tools such as Docker or Kubernetes.
3+ years of experience with CI/CD using Jenkins, Spinnaker, Azure DevOps, or TeamCity.
3+ years of experience managing system observability using tools such as ELK, Azure Monitor, or Grafana.
2+ years of experience implementing automation and monitoring using scripting and other tools.
Any amount of experience with always-on, high-volume web server stacks; Azure/GCP PaaS and Azure/GCP networking; and provisioning native Managed Apps and CI/CD pipelines.
Any amount of experience supporting omni-channel experiences.
