Latest Writings
BLOG & PUBLICATIONS
The AI Evaluation Chart Crisis
Documentation
The charts used to showcase model performance expose a broader problem in the AI evaluation ecosystem: competitive benchmarking is not balanced by statistical rigor.
The Science of Evaluations: Workstream Kickoff Post
Research
Announcing a research-driven initiative that brings together a community of researchers to strengthen the science of AI evaluations.