[Remote] Staff Platform Engineer
Note: This is a remote position open to candidates in the USA. The company focuses on innovative technologies and services in the streaming industry. They are seeking a Staff Platform Engineer to build and operate the systems powering their ad-serving and streaming platforms, ensuring reliability and efficiency while enabling engineering teams to deliver high-quality video and advertising experiences.
Responsibilities
- Define, architect, and set standards for composable IaC (CDK or Terraform) patterns for Cloud Infrastructure (EKS)
- Drive the adoption and implementation of composable, idempotent, multi-environment GitOps workflows
- Optimize scalability and cost-per-performance using metrics-driven automation and autoscaling technologies such as Karpenter
- Build and maintain observability across the platform using Prometheus, Grafana, and distributed tracing
- Lead cross-functional efforts with application teams to define SLOs and reputed company models for mission-critical services
- Lead major production incident response efforts, drive blameless postmortems, and mentor other engineers on production incident response, postmortems, and reliability reviews
- Create and operate resilient CI/CD pipelines for safe, rapid deployments and rollbacks
- Champion automation, low-toil operations, and a culture of continuous improvement
- Build agentic workflows that use the team-wide context layer and operational data to accelerate development without compromising reliability
- Act as a technical leader and mentor to the team, with a strong ability to listen, evaluate, and give constructive feedback
Skills
- Demonstrated expertise in architecting, deploying, and maintaining high-throughput, low-latency distributed systems in cloud production environments, ideally in Platform, SRE, or DevOps roles. Prior ownership of a stateful deployment
- Deep systems-level coding skills for automation and systems development (Go, Python, or TypeScript)
- Proven experience operating Kubernetes at scale (EKS preferred) and applying IaC patterns (CDK, Terraform)
- Working knowledge of GitOps and reconciliation loops in Kubernetes controllers
- Solid experience with CI/CD systems (GitHub Actions, AWS CodePipeline)
- Expert in defining, designing, and optimizing global monitoring and alerting pipelines (PromQL, metrics correlation, alert noise reduction)
- Experience with large-scale streaming or ad-serving workloads, including HTTP-based delivery (oRTB, VAST), event streaming (Kafka), and AWS network architecture (VPC, load balancers, peering)
- Understanding of cloud security best practices (IAM, encryption, network segmentation, zero trust)
- Proven ability to conduct deep performance analysis, tuning, and optimization across the entire infrastructure stack to achieve optimal cost-per-performance and latency targets
- Experience with advanced GitOps patterns using Argo Workflows
- Exposure to extended observability tools such as Loki
- Experience designing or operating distributed, high-throughput stateful data pipelines to ensure reliable, scalable, and cost-efficient data flow
Benefits
- Competitive Salary & Equity
- Strong Medical, Dental and Vision Benefits, 100% paid by the company
- Remote first policy
- Flexible Time Off
- 10 US Holidays
- 401(k) Matching
- Pre-Tax Savings Plans, HSA & FSA
- Ginger and Aaptiv subscriptions for mental and physical wellness
- OneMedical subscription for 24/7 convenient medical care
- Paid Maternity and Parental Leave for family additions
- Discounted PetPlan and easy at-home access to COVID testing with empowerDX
- $1k Work From Home Stipend to set up your office
- Medical, Dental, Vision, Life, Disability
- 401(k) Retirement Plan
- Unlimited Discretionary Time Off
- 10 paid holidays per year
- 80 hours per year