
Machine Learning Operations Manager

100% remote | Flexible hours | Hiring now

This role is for one of the reputed company's clients.

Minimum experience: 6 years

Location: Remote (India)

Job type: Full-time

As the Machine Learning Operations Manager, you will own the end-to-end ML lifecycle, from model training and deployment to monitoring and optimization. You will lead a small, high-performing team of engineers while remaining hands-on in building scalable, reliable, and efficient ML infrastructure. This role combines strategic leadership with deep technical expertise to ensure smooth collaboration between research, engineering, and operations teams.

Key Responsibilities

  • End-to-End ML Lifecycle: Manage training infrastructure, experiment tracking, deployment, and optimization.
  • Collaboration with Researchers: Partner with research teams to streamline training, evaluation, and fine-tuning workflows.
  • Team Leadership: Mentor and guide a small team of ML engineers (3–4) while remaining an active individual contributor.
  • Performance Optimization: Improve latency, throughput, and cost efficiency; ensure robust packaging and runtime reliability.
  • Automation & Reliability: Build systems for CI/CD, versioning, rollback, A/B testing, monitoring, and alerting.
  • Infrastructure Management: Maintain scalable, secure, and compliant AI environments across training and inference stages.
  • Cloud & AI Integration: Collaborate with cloud providers (AWS, GCP, Azure) and AI platforms to enhance tooling and optimize costs.
  • Cross-Functional Collaboration: Support GenAI and AI-driven projects across teams beyond core MLOps responsibilities.
  • Architecture & Roadmap: Contribute to architectural planning, documentation, and the evolution of the ML stack.
  • Best Practices: Promote automation, MLOps standards, and operational excellence throughout the ML lifecycle.

Requirements

  • 5+ years of hands-on experience in MLOps or ML/AI Engineering.
  • Strong understanding of ML/DL concepts and applied experience in model training and deployment infrastructure.
  • Proficiency with cloud-native ML tools (e.g., GCP Vertex AI, AWS SageMaker, Kubernetes).
  • Experience working across both model training and inference systems.
  • Familiarity with model optimization methods such as quantization, distillation, TensorRT, or FasterTransformer.
  • Demonstrated ability to lead technical projects independently.
  • Excellent communication and collaboration skills for working with cross-functional teams.
  • An ownership mindset and comfort driving clarity in ambiguous situations.

Skills: MLOps, ML Engineering, Machine Learning Infrastructure, Model Deployment, Model Monitoring, CI/CD, Vertex AI, AWS SageMaker, GCP AI Platform, Kubernetes, MLflow, Kubeflow.

About the company

At reputed company (backed by YC; also Product Hunt #1 product of the day), we are building the next frontier in hiring. We have built the largest database of white-collar talent in India and have built tools on top of it to generate the highest response rates.