Latest Writings

BLOG & PUBLICATIONS

The AI Evaluation Chart Crisis

Documentation

The charts used to showcase model performance point to a broader issue in the AI evaluation ecosystem: a lack of balance between competitive benchmarking and statistical rigor.

The Science of Evaluations: Workstream Kickoff Post

Research

Announcing a research-driven initiative that brings together a community of researchers to strengthen the science of AI evaluations.
