Mathematics Model Prompt Evaluator

100% remote, flexible hours, hiring now

Role Overview

We are seeking expert mathematicians to author and verify high-quality open-ended prompts for AI model evaluation. You will craft and review challenging, unambiguous mathematical problems across core subdomains, assessing AI reasoning quality and helping establish rigorous evaluation standards for frontier language models.

**You will be assigned one of two task types:**

**Authoring Task**

Create 5 original, open-ended prompts from your assigned subdomain at varying difficulty levels (undergraduate, advanced undergraduate, or graduate/professional). Prompts should elicit responses, such as chain-of-thought reasoning or proof construction, whose quality requires human judgment to evaluate.

**Verification Task**

Review 5 authored prompts for clarity, scope alignment, difficulty accuracy, and uniqueness. Edit prompts and difficulty ratings where needed.

**Mathematics Subdomains Covered**

Probability & Statistics, Algebra (incl. Linear Algebra), Ordinary/Partial Differential Equations & Dynamical Systems, Geometry, Graph Theory, Number Theory.

**Key Responsibilities**

- Author clear, unambiguous, open-ended mathematical prompts that elicit evaluable AI responses
- Verify prompts are within the scope of the assigned subdomain and correctly rated for difficulty
- Ensure all 5 prompts in a task are sufficiently distinct from one another with varying difficulty levels
- Apply expert judgment to assess the depth and quality of mathematical reasoning required
- Edit prompts and difficulty assignments where standards are not met

**Ideal Qualifications**

- Master's degree or higher in Mathematics, Applied Mathematics, Statistics, or a closely related field
- 2–6 years of professional or research experience in a quantitative field
- Strong command of graduate-level mathematical concepts including proof writing, analysis, and formal reasoning
- Experience in academic research, mathematical competition design, or quantitative industry roles is a plus
- Excellent written English and ability to craft precise, well-scoped technical questions

**More About the Opportunity**

- Expected commitment: 10+ hours/week
- Asynchronous, fully remote work
