Member of Engineering – Pre-training, Data Engineering

100% remote Flexible hours Hiring now

Job Description:

  • Build and maintain high-performance pipelines for trillions of tokens.
  • Deliver diverse, high-quality datasets for pre-training foundation models.
  • Work closely with other teams, such as Pretraining, Posttraining, Evals, and Product, to ensure alignment on the quality of the models delivered.

Requirements:

  • Strong background in building production-grade, distributed data systems for machine learning, with experience in:
      • Orchestration: Slurm, Airflow, or Dagster
      • Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
      • Infra: Git, reputed company, k8s, cloud managed services
      • Batched inference (e.g., vLLM)
  • Performance obsession, especially with large-scale GPU clusters and distributed pipelines
  • Expert-level Python knowledge and the ability to write clean, maintainable code
  • Strong algorithmic foundations
  • Proficiency with libraries like Polars, Dask, or PySpark

Nice to have:

  • Experience in building trillion-scale SOTA pretraining datasets
  • Experience translating research to production at scale
  • Experience with OCR, web crawling, or evals
  • Prior experience pre-training LLMs

Benefits:

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you and dependents
  • Company-provided equipment
  • Wellbeing, always-be-learning, and home office allowances
  • Frequent team get-togethers
  • Great, diverse & inclusive, people-first culture
