Latest Writings

Every Eval Ever: Toward a Common Language for AI Eval Reporting

Infrastructure

The multistakeholder coalition EvalEval launches Every Eval Ever, a shared format and central eval repository. We're working to resolve AI evaluation fragmentation, improving formatting, settings, ...

Learn more →

The Hidden Social Costs of AI

Research

As AI grows more powerful, who bears its hidden social costs?

Learn more →

The AI Evaluation Chart Crisis

Documentation

Charts used to showcase performance expose a broader problem in the AI evaluation ecosystem: an imbalance between competitive benchmarking and statistical rigor.

Learn more →

The Science of Evaluations: Workstream Kickoff Post

Research

Announcing a research-driven initiative from a community of researchers to strengthen the science of AI evaluations.

Learn more →