[Remote] Staff Site Reliability Engineer - Kubernetes
Note: This is a remote role open to candidates in the USA. The company is focused on securing identities in the AI era and is seeking a Staff Site Reliability Engineer to build and manage Kubernetes platforms. The role involves architecting reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimizing costs and automation.
Responsibilities
- Kubernetes Platform Creation: Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms. Ensure clusters are optimized for production workloads and operational efficiency
- AWS Infrastructure Management: Build, manage, and optimize AWS cloud infrastructure, including EKS, S3, VPCs, RDS, IAM, and more. Implement AWS best practices for cost management and scaling
- Helm Management: Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters. Create, maintain, and manage Helm charts for production-ready deployments
- Karpenter Implementation: Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands
- Istio Service Mesh Management: Configure and manage Istio to provide service-to-service communication, security, and observability across the Kubernetes clusters. Enable fine-grained traffic management, service discovery, and policy enforcement
- Platform Automation & Scaling: Automate the deployment, scaling, and management of infrastructure and applications. Work with CI/CD pipelines to ensure a seamless flow from development to production with minimal downtime
- Incident Management & Troubleshooting: Respond to incidents, and troubleshoot and resolve system issues related to performance and availability in a timely and effective manner
- Security & Compliance: Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks
- Documentation & Knowledge Sharing: Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices. Promote knowledge sharing across teams
Skills
- 4+ years of experience with Kubernetes/Helm
- 4+ years of experience with Terraform
- 5+ years of experience with AWS
- Experience with multi-region cloud environments
- Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) and solid understanding of cloud-native architectures
- Strong expertise in Kubernetes platform creation, management, and optimization (e.g., setting up highly available clusters, networking, and storage)
- Hands-on experience with Helm for Kubernetes application deployment and management
- Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimizing resource usage
- Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features
- Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, Terraform, Ansible, Spinnaker)
- Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation
- Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack
- Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks)
- Familiarity with containerization principles
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent professional experience)
- Certifications (Preferred): CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or AWS Certified DevOps Engineer are highly desirable
Benefits
- Equity (where applicable)
- Bonus
- Benefits, including health, dental and vision insurance
- 401(k)
- Flexible spending account
- Paid leave (including PTO and parental leave)
- An in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one
Company Overview