
Member of Engineering – Pre-training, Data Engineering

100% remote Flexible hours Hiring now

Job Description:

  • Build and maintain high-performance pipelines for trillions of tokens.
  • Deliver diverse, high-quality datasets for pre-training foundation models.
  • Work closely with other teams, such as Pretraining, Posttraining, Evals, and Product, to ensure alignment on the quality of the models delivered.

Requirements:

  • Strong background in building production-grade, distributed data systems for machine learning, with experience in:
      • Orchestration: Slurm, Airflow, or Dagster
      • Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
      • Infra: Git, Docker, k8s, cloud managed services
      • Batched inference (e.g., vLLM)
  • Performance obsession, especially with large-scale GPU clusters and distributed pipelines
  • Expert-level Python knowledge and ability to write clean and maintainable code
  • Strong algorithmic foundations
  • Proficiency with libraries like Polars, Dask, or PySpark
  • Nice to have:
      • Experience in building trillion-scale SOTA pretraining datasets
      • Experience translating research to production at scale
      • Experience with OCR, web crawling, or evals
      • Prior experience pre-training LLMs

Benefits:

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you and dependents
  • Company-provided equipment
  • Wellbeing, always-be-learning and home office allowances
  • Frequent team get-togethers
  • Great, diverse & inclusive people-first culture

