AI Research Scientist, Text Data Research - MSL FAIR
A reputed company is seeking AI research scientists to help build the data foundation for its most advanced large language models. The role involves collaborating with teams on foundational models, advancing data research, and improving data curation systems at scale.
Responsibilities
- Collaborate with cross-functional teams on the company's next foundational models
- Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data
- Architect efficient and scalable data curation systems and pipelines
- Improve data velocity across workflows and projects by contributing to the advancement of data tooling
- Execute on projects in pre-training, mid-training, or post-training data curation
- Apply specialized expertise in agentic data, synthetic data, reasoning data, web parsing, coding data, data scaling laws, or data mix optimization
- Drive technical projects end-to-end
Skills
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
- PhD in Computer Science or a relevant technical field
- 1+ year of industry research experience in LLM/NLP or AI/ML models
- Experience owning and/or driving technical projects end-to-end
- Practical experience with pre-training or mid-training data curation for large foundational models, and experience working with organic, synthetic, agentic, or reasoning data for LLMs
- Published research in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP) and/or demonstrated significant industry influence in the field of AI
- Experience working on frontier-quality/state-of-the-art Large Language Models
- Multiple first-author publications in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP)
- Hands-on experience with modeling frameworks like PyTorch
- Hands-on experience with SQL and large-scale data handling, and familiarity with frameworks like Spark and Hive
Benefits
- Bonus
- Equity
- Benefits
Company Overview