Latest Writings
BLOG & PUBLICATIONS
Every Eval Ever: Toward a Common Language for AI Eval Reporting
Infrastructure
The multistakeholder coalition EvalEval launches Every Eval Ever, a shared format and central eval repository. We're working to resolve AI evaluation fragmentation, improving formatting, settings, ...
The Hidden Social Costs of AI
Research
As AI continues to grow more powerful, who carries the hidden social costs of its effects?
The AI Evaluation Chart Crisis
Documentation
Charts used to showcase performance reveal broader issues in the AI evaluation ecosystem: an imbalance between competitive benchmarking and statistical rigor.
The Science of Evaluations: Workstream Kickoff Post
Research
Announcing the launch of a research-driven community initiative to strengthen the science of AI evaluations.