Part-time Benchmarking Engineer for Revolutionary LLM Evaluation - Remote

100% remote | Flexible hours | Hiring now

Unlock Your Potential as a Benchmarking Engineer at Vals AI

Join the pioneering team at Vals AI, where we're redefining the landscape of Large Language Model (LLM) evaluation for enterprise applications. As a part-time Benchmarking Engineer, you'll play a crucial role in shaping the future of AI adoption in business. With a competitive salary and the flexibility of remote work, this is an opportunity to grow your career while making a significant impact.

About Vals AI: Revolutionizing LLM Evaluation

Vals AI is at the forefront of creating the infrastructure and certification for automatically auditing LLM applications, ensuring they're ready for enterprise consumption. Our mission is to accelerate the automation of knowledge work by building the barometer for whether AI is useful. With a strong foundation in NLP evaluation research from Stanford and $5M in funding from top investors, we're poised to make a substantial difference.

The Role: Part-time Benchmarking Engineer

We're seeking a highly skilled and curious Benchmarking Engineer to maintain and improve our public LLM benchmarks. This role requires a strong technical background, excellent communication skills, and the ability to work independently. You'll have significant ownership of our benchmarking site and the agency to propose new benchmarks based on your own hypotheses.

Key Responsibilities:

Creating New Datasets: Collaborate with our data annotators and partner groups to create new, private datasets that will drive the evaluation of LLMs.
Running Benchmarks: Execute our existing benchmarks against new models as they emerge and compile the results.
Analyzing Results: Write comprehensive, free-text analyses of the raw quantitative results, answering critical questions about model performance, pricing, and error types.
Social Media and Reporting: Craft engaging posts for Twitter and other social media that describe our findings, and maintain our social media presence.
Script Maintenance: Maintain the scripts used to run benchmarks against our datasets, ensuring efficiency and accuracy.

Requirements: The Skills We're Looking For

Python Expertise: Deep experience with Python, as it's the primary language used for this role.
Communication Skills: Strong writing and communication skills to distill technical findings into easily consumable reports for non-technical audiences.
Team Collaboration: Experience working in teams, including development sprints, Git best practices, and reviewing pull requests.
Availability: Approximately 20 hours a week, with flexibility to adapt to spikes in workload when new models are released.

Nice to Have: Additional Skills That Shine

LLM Familiarity: Knowledge of LLM methods and developments, with an interest in the space.
ML Research Experience: Background in ML research or data science, ensuring scientific rigor in our benchmarks.

What We Offer: A Rewarding Environment

At Vals AI, we offer a highly competitive salary, the option to work from our SF office (with lunch, dinner, snacks, coffee, and drinks provided), and the opportunity to grow into a full-time role. Our culture is built on a foundation of intelligence, ownership, intensity, and a solutions-oriented mindset.

Our Tech Stack: The Tools You'll Work With

Our frontend is built with React and TSX, while our backend uses Django. Our infrastructure is hosted on AWS, providing a robust and scalable environment for our applications.

Culture and Values: What Drives Us

We value intelligence over pedigree, ownership, and intensity in execution. We're looking for individuals who see solutions rather than problems and are driven to build them. Our team has a diverse background, with experience from top companies such as Palantir and HRT, and a collective record of over 300 citations in published work.

Career Growth and Learning Benefits

As a Benchmarking Engineer at Vals AI, you'll have the opportunity to:

Work on cutting-edge LLM evaluation techniques.
Collaborate with a talented team of engineers and researchers.
Develop your skills in Python, data analysis, and technical writing.
Contribute to the growth and development of our benchmarking platform.

Compensation and Perks

We offer a competitive salary, commensurate with experience. Additional benefits include:

Optional office work in SF with amenities.
Opportunity to transition into a full-time role.
A dynamic, innovative work environment.

Conclusion: Join the Vals AI Journey

If you're passionate about LLMs, evaluation techniques, and driving the adoption of AI in enterprise settings, we want to hear from you. As a part-time Benchmarking Engineer, you'll be at the heart of our mission to create the industry standard for evaluating LLM applications. Apply now to be part of this exciting journey and shape the future of AI.

If you think you have what it takes to be a part of our innovative team, submit your application today. We look forward to hearing from talented individuals like you.
