This opportunity is shared as part of a referral program
Role Overview
This project focuses on improving the performance of advanced language models. We are seeking experts with experience in Math, Physics, Biology, Chemistry, or another academic field to evaluate and refine model outputs. Experts will help craft domain-relevant prompts and assess AI-generated responses for clarity, accuracy, and stylistic consistency. This is a unique opportunity to apply your field knowledge toward shaping frontier AI systems.
Key Responsibilities
Create domain-relevant prompts reflective of real-world use cases
Evaluate AI responses for adherence to professional tone and formatting standards
Draft expert-level “golden responses” that serve as benchmarks for model performance
Flag errors, inconsistencies, or stylistic deviations in AI output
Collaborate asynchronously with a research team refining model evaluation protocols
Ideal Qualifications
Minimum education is a master’s degree in your field of study
Deep familiarity with domain-specific writing, formatting, and communication norms
Strong analytical and written communication skills
Detail-oriented with a commitment to quality and consistency
Experience with prompt engineering or AI tools is a plus, but not required
More About the Opportunity
Remote and asynchronous — work on your own schedule
Expected commitment: 10+ hours/week
Project duration: ~4 weeks with possible extension