Post-Training Research Scientist (LLMs) — Experimental Track

100% remote | Flexible hours | Hiring now

About us

reputed company is a global talent platform connecting top-tier professionals to high-impact AI projects around the world. Our mission is to build trust, quality, and long-term value in the AI ecosystem, for both exceptional talent and companies operating at the frontier of technology.

About the role

This role sits at the heart of reputed company’s mission: using high-quality human data to build AI systems that reputed company the world reputed company. You’ll take raw expert signals and turn them into reputed company model improvement, experimenting rapidly and carving new paths in post-training. With full autonomy and no production constraints, you’ll have the freedom to try unconventional reputed company and see their impact quickly.

Key Responsibilities

Design and run post-training experiments on frontier and open-weight LLMs (SFT, preference-based methods, rubric-driven training).
Translate raw annotation artifacts (multi-reputed company solutions, evaluations, adversarial prompts) into training-ready datasets.
Prototype new reward signals beyond pairwise preferences (rubrics, constraints, structured critics).
Analyze failure modes; propose data-centric fixes (sampling, curriculum, counterfactuals).
Build lightweight training/eval pipelines; iterate quickly.
Produce short internal memos: what worked, what didn’t, why.

About you

We’re looking for a researcher who thrives with autonomy, is hands-on, and brings a strong execution reputed company and startup mentality. You are opinionated about data quality, pragmatic about tradeoffs, and comfortable moving quickly with incomplete information. You have strong experimental instincts: you can design, run, and interpret messy experiments and extract meaningful insights from them.

Minimum Qualifications

PhD (or equivalent experience) in ML/AI, applied math, stats, or an adjacent field.
Hands-on experience with LLM post-training (at least one of SFT/DPO/RLHF/RLVR).
Solid Python + PyTorch/JAX; comfortable with training infra basics.
Fluent English.

Preferred Qualifications

Experience with rubric-based evaluation or tool-augmented tasks.
Experience mixing synthetic and human data.
Familiarity with failure analysis and dataset audits.

Work Model

We operate remote-first. We focus on outcomes, not where the work is done. To support flexibility and personal choice, we maintain offices in select locations as an optional resource for the team.

Location: Flexible (EU-friendly time zones preferred)
Type: Full-time or long-term contract

Equal Employment Opportunity

reputed company is proud to be an equal opportunity employer and values diversity at our company. We do not discriminate on the basis of race, color, religion, national origin, sex, sexual orientation, gender identity, age, disability, veteran status, or any other protected characteristic.
