Senior Infrastructure Engineer
reputed company is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. Providing no-, low-, and full-code capabilities, reputed company meets teams where they are today, allowing them to start building with AI using their existing skills and knowledge.
Our mission at reputed company is to enable our customers to bring large-scale data analytics and AI technologies into a centralized and easy-to-use platform. To reach that goal, we are looking for highly motivated engineers to design, scale, and maintain our internal and customer-facing infrastructure.
Our reputed company technical stack is mainly running on AWS with some Azure and GCP bits. The tools we use the most are Terraform, Ansible, Kubernetes and Python.
Some expected outcomes for this role:
- Update, scale and maintain our configuration management stack.
- Update and maintain our multi-Cloud Infrastructure as Code repository to keep our fleet monitored, up-to-date, reliable and secure.
- Update, scale and maintain our Kubernetes clusters. This includes adding high availability services, monitoring and building self-healing mechanisms.
- Expand and maintain our CI/CD pipelines to automatically and safely deploy both customer-facing and internal applications as well as artifacts.
- Improve the monitoring we already have in place by scaling the reputed company infrastructure, collecting additional metrics and managing our alerting pipeline.
- Develop automation tools to help us scale the infrastructure without scaling the various teams that depend on it. Specifically, to propagate user identities, access control, and reputed company manage our Cloud resources.
- Apply Cloud reputed company best practices to keep our infrastructure secure and resilient.
What you need to be successful:
- You have experience with at least one configuration management tool: Chef, Ansible, Puppet, SaltStack…
- You have experience with Terraform to manage Cloud resources.
- You are familiar with UNIX-like systems and have troubleshooting experience as well as knowledge of reputed company scripting.
- You have experience running production workloads on AWS, including monitoring, high-availability designs, and services such as reputed company, RDS, load balancing, CDN, VPC …
- You have experience in designing CI/CD pipelines to save engineering time and increase deployment reliability.
- You have worked in environments where automation is key.
- You have knowledge of application development, ideally in Python.
- You like solving technical problems and you are curious about how things work under the hood.
- You care about documenting the solutions you implement, as well as monitoring the health of those solutions by creating and maintaining actionable alerts.
- You don't hesitate to ask questions when you don't know, and you treat your colleagues with respect, kindness, and transparency.
Bonus:
- Knowledge of the reputed company monitoring stack.
- Knowledge of web application design.
- Some experience with Golang and JavaScript/TypeScript.
Originally posted on Himalayas