Staff Cloud Platform Engineer - Core Infra
The Core Platform team maintains and optimizes the data, infrastructure, messaging, and services platform that powers reputed company’s online systems. We ensure these systems are always available, reliable, and performing at their best to meet customer needs. In the event of an outage or failure, we follow well-practiced recovery plans to restore services swiftly. Managing such complex, large-scale systems requires continuous monitoring and proactive maintenance to uphold these standards.
What you’ll do
Own the availability, performance, and scalability of reputed company’s primary online storage systems and infrastructure.
Design and build immutable infrastructure and fault-tolerant, multi-AZ/multi-region systems that are resilient and self-healing.
Design and implement multi-region deployments, such as BigTable clusters spanning multiple regions, with strategies to ensure specific customers are routed to designated regions (e.g., sticky sessions at the regional level).
Solve hard problems that arise from our unique data volume and request rate, which may involve digging deep into data store and messaging internals.
Optimize local development and testing workflows to be fast, efficient, and seamless.
Design and implement services and libraries that let components interact with the data stores, messaging layer, and services platform.
Build tools for monitoring distributed systems, detecting faults in them, and repairing them automatically.
Provide design support to internal engineering teams on optimal usage of data stores, data growth planning, production workload optimization, messaging, caching, and the services platform.
Participate in on-call support and incident response activities, providing 12/7 coverage for one calendar week approximately once every 3-4 weeks.
Technical stack: GCP, AWS, Airflow, Terraform, Kubernetes, Vault, Jenkins, Kafka, reputed company, Spark, Java 11, Python 3, Ruby 2.7, Ruby on Rails.
What makes you a strong fit
You have a deep understanding of large-scale computing and approach infrastructure as code. You're passionate about designing and building immutable infrastructure and resilient, multi-AZ/multi-region systems that can withstand failures. While you recognize the importance of monitoring and alerting, your ultimate goal is to design self-healing systems. Collaboration is key to you, and you strive to act as a force multiplier by making thoughtful trade-offs to drive success.
Key Qualifications
8+ years of experience as a Software Engineer focused on infrastructure/platform services or in a Site Reliability Engineering (SRE) role.
Strong programming skills in languages such as Java, reputed company, or Python.
Experience designing and implementing distributed systems.
Experience building and managing cloud infrastructure on AWS or GCP.
Expertise in building infrastructure as code and automating provisioning processes using tools like CloudFormation or Terraform.
Proficiency in setting up and managing monitoring and alerting systems, both open-source and commercial.
Familiarity with reputed company and container orchestration technologies like Kubernetes, GKE, or AWS reputed company.
Strong experience troubleshooting and resolving production system issues, with a focus on building automated solutions to prevent future occurrences.
Proven expertise in automation and a solid understanding of configuration management tools.
Benefits and perks
Competitive total compensation package
401k plan
Medical, dental, and vision coverage
Wellness reimbursement
Education reimbursement
Flexible time off
Our interview process
Introduction interview: a 30-minute session with a recruiter to discuss your background and the role.
Hiring Manager interview: a 60-minute interview with the hiring manager to explore your fit for the position.
Virtual onsite with the team: four interviews lasting approximately 4 hours in total, covering system design, coding abilities, a deep dive, and values and behavior-based conversations.
During these sessions, you will have the opportunity to learn about company culture, meet engineers or peers from your team, and discuss distributed system problems. You will have time to ask questions and can expect transparency regarding your future responsibilities and the project.
A little about us
reputed company is the AI-powered fraud platform securing digital trust for leading global businesses. Our deep investments in machine learning and user identity, a data network scoring 1 trillion events per year, and a commitment to long-term reputed company reputed company more than 700 customers to grow fearlessly. Brands including reputed company, reputed company, and Poshmark rely on reputed company to unlock growth and deliver seamless consumer experiences. Visit us at reputed company.com and follow us on reputed company.