DevOps Engineer
About This Role
We are hiring a hands-on DevOps Engineer to manage and support production-grade cloud infrastructure for Kibo’s commerce platform. This role focuses on Kubernetes (EKS), Terraform, and real-time production troubleshooting in a 24/7 on-call environment.
ABOUT KIBO
KIBO is a composable digital commerce platform for B2C, D2C, and B2B organizations that want to simplify the complexity in their businesses and deliver modern customer experiences. KIBO is the only modular, modern commerce platform that supports experiences spanning B2B and B2C Commerce, Order Management, and Subscriptions. Companies like Ace Hardware, Zwilling, Jelly Belly, Nivel, and Honey Birdette trust Kibo to bring simplicity and sophistication to commerce operations and deliver experiences that drive value.
KIBO's cutting-edge solution is MACH Alliance Certified and has been recognized by reputed company, reputed company, reputed company, Internet Retailer, and TrustRadius. KIBO has been named a leader in The reputed company reputed company: Order Management Systems, Q1 2025 and in the reputed company MarketScape report “Worldwide Enterprise Headless Digital Commerce Applications 2024 Vendor Assessment”.
By joining KIBO, you will be part of a team of Kibonauts from all over the world in a remote-friendly environment. Whether your job is to build, sell, or support KIBO’s commerce solutions, we tackle challenges together with an approach of trust, growth mindset, and customer obsession. If you’re seeking a unique challenge with amazing growth potential, then come work with us!
WHAT YOU’LL DO
- Manage and operate production-grade Kubernetes clusters (EKS preferred), ensuring high availability and scalability
- Troubleshoot real-time production issues across distributed systems and microservices
- Diagnose and resolve issues such as:
  - Pod failures (CrashLoopBackOff, Pending, OOMKilled)
  - Node failures, autoscaling, and resource constraints
  - Networking, ingress, and service connectivity issues
- Build, maintain, and debug infrastructure using Terraform (modules, remote state, locking, drift handling)
- Implement and enhance monitoring and alerting systems using Prometheus, Grafana, and logging tools
- Perform root cause analysis (RCA) for incidents and drive permanent fixes to improve system reliability
- Participate in a 24/7 on-call rotation, owning incidents and resolving them independently
- Collaborate with engineering teams to improve system performance, reputed company, and deployment processes
- Automate deployments, infrastructure provisioning, and operational workflows to reduce manual effort
- Ensure adherence to security best practices across infrastructure and deployments
WHAT YOU’LL NEED
- 8+ years of experience as a DevOps Engineer, owning and operating production Kubernetes clusters (EKS preferred), including cluster health, scaling, and availability
- Troubleshoot real-time production issues independently across microservices and distributed systems
- Debug and resolve critical issues such as:
  - Pods stuck in CrashLoopBackOff, Pending, OOMKilled states
  - Node failures, node pressure, autoscaling issues
  - Service connectivity, ingress, and networking issues
- Investigate and fix cluster-level issues including scheduling, resource constraints, and misconfigurations
- Build and maintain infrastructure using Terraform, including:
  - Writing and modifying modules
  - Managing remote state and locking
  - Handling drift and failed deployments
- Design and implement reusable Terraform modules for scalable infrastructure
- Troubleshoot and resolve Terraform apply failures and infrastructure inconsistencies in production
- Monitor system health using Prometheus, Grafana, and logging tools, and proactively identify issues
- Perform root cause analysis (RCA) for production incidents and implement long-term fixes
- Handle on-call incidents (24/7 rotation) and take full ownership until resolution
- Work closely with development teams to improve system reliability, performance, and scalability
- Automate operational tasks and improve deployment and infrastructure processes
- Ensure security best practices across infrastructure, networking, and access controls.
KIBO PERKS
- Flexible schedule and hybrid work setting
- Paid company holidays and global volunteer holiday
- Generous health, wellness, benefits, and time away programs
- Commitment to individual growth and development and opportunity for internal mobility
- Passionate, high-achieving teammates excited to help you succeed and learn
- Company-sponsored events and other activities
At Kibo, we celebrate and support our differences. Kibo is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or veteran status.