2026 ACL Workshop on Evaluating AI in Practice

Methodological Rigor, Sociotechnical Perspectives, and Community Collaborations

📖 Background

As Generative AI systems are increasingly integrated into real-world products and decision-making pipelines, evaluation has become a central yet challenging component of responsible AI development [1]. While evaluation research has advanced rapidly, gaps persist between research and practice: model developers often prioritize scalability and integration into development workflows, while evaluation researchers emphasize rigor, validity, and sociotechnical considerations. The EvalEval coalition’s multi-author effort mapping first- and third-party social impact evaluations [2] shows a clear division of labor: developers underreport or deprioritize key impacts such as environmental costs, data provenance, and labor practices, while third-party evaluators provide broader but necessarily incomplete coverage, leaving critical gaps in accountability and comparability.

This workshop focuses on AI evaluation in practice, centering the tensions and collaborations between model developers and evaluation researchers. Through a call for papers on contemporary challenges spanning methodological rigor, sociotechnical perspectives, scalability, community-informed evaluation, and real-world use, alongside invited panels, the workshop aims to surface practical insights from across the evaluation ecosystem. The workshop will also host a shared task for building a unifying, standardized database of LLM evaluations, encouraging shared infrastructure and actionable evaluation practices.

📝 Topics

Themes for submission include, but are not limited to:

1. Evaluation Methodology and Measurement Theory

  • Conceptualization: Operationalization and construct definition issues in evaluations of generative AI;
  • Validity: Construct, convergent, and discriminant validity of existing evaluations;
  • Reliability: Robustness, consistency, and generalizability of evaluation methods;
  • Metrics: Design, selection, and limitations of evaluation metrics, benchmarks, and scoring methods;
  • Reproducibility: Cross-model and cross-context reproducibility, standardization, and aggregation of evaluation results.

2. Evaluation Infrastructure, Cost, and Stakeholders

  • Infrastructure: Evaluation harnesses, tooling, platforms, and scalability of evaluation setups [3];
  • Financial costs: Monetary costs of evaluations and documentation frameworks for tracking them;
  • Documentation: Transparency and reporting standards for evaluation processes and their limitations;
  • Stakeholders: Who evaluates, the relationship between evaluators and system developers, and the role of independent and third-party audits.

3. Evaluating Sociotechnical Impacts

The state of sociotechnical evaluations of generative AI systems, drawing on categories such as those in the EvalEval social impact taxonomy [1]:

  • Bias, stereotypes, and representational harms: Evaluations of stereotypes, disparate performance, inequality, marginalization, and community erasure;
  • Cultural values and sensitive content: Evaluations of linguistic diversity, sensitive content, trustworthiness, overreliance on outputs, and imposing norms and values;
  • Privacy and data protection: Evaluations of memorization, data leakage, contextual integrity, and personal privacy and sense of self;
  • Labor and creativity: Evaluations of data and content moderation labor, intellectual property, ownership, and labor market impacts;
  • Ecosystem and environment: Evaluations of environmental costs, carbon emissions, and widening resource gaps.

📄 Submission Guidelines

We welcome the following types of submissions:

Paper lengths:

  • 📑 Full papers: 6–8 pages (excluding references and supplementary material)
  • 📝 Short papers: Up to 4 pages (excluding references and supplementary material)
  • 📋 Tiny papers: Up to 2 pages (excluding references and supplementary material), e.g., extended abstracts

Submission types:

  • 🔬 Research papers presenting original empirical or theoretical results
  • 💡 Positions & Provocations that introduce novel perspectives or challenge conventional wisdom around broader social impact evaluation for generative AI

All submission types are welcome at any length tier. Accepted papers will be presented as posters, with a subset selected for spotlight oral presentation. All papers will be assessed based on their relevance to the workshop themes.

Papers should be submitted in the conference-provided format. The review process will be two-way anonymized; therefore, all identifying information must be removed from submissions.

📚 Publication

Submissions may include unpublished work as well as previously published or accepted papers, provided they do not violate dual-submission policies. For example, ACL 2026 Main and Findings papers, as well as ARR submissions, are eligible.

Submissions are non-archival by default, but upon acceptance authors may opt in to archival publication. All accepted paper titles will be made publicly available.

📅 Important Dates

  • Submission Deadline: March 19, 2026 (extended from March 12, 2026)
  • Notification of Acceptance: April 28, 2026
  • Camera-ready paper due: May 14, 2026
  • Workshop at ACL in San Diego: July 3-4, 2026

All deadlines are specified in Anywhere on Earth (AoE).

🚀 Submission Site

All submissions must be made via OpenReview.

🔍 Reviewer Recruitment

To support a fair, high-quality, and sustainable review process, we adopt a reciprocal reviewing expectation. Authors submitting papers to the workshop will be expected to serve as reviewers, helping ensure sufficient reviewing capacity and timely feedback for all submissions. Authors will indicate their willingness to review during paper submission through the OpenReview system.

🧑‍🔬 Program Chairs

  • Mubashara Akhtar, ETH Zurich & ETH AI Center
  • Jan Batzner, Weizenbaum Institute, Technical University Munich
  • Leshem Choshen, MIT, IBM Research, MIT-IBM Watson AI Lab
  • Avijit Ghosh, Hugging Face
  • Usman Gohar, Iowa State University
  • Jennifer Mickel, Eleuther AI
  • Ichhya Pant, Independent
  • Zeerak Talat, University of Edinburgh

🏛️ Organizers

EvalEval Coalition

📬 Workshop Contact

❓ FAQ

I’m waiting for my ARR decision — can I still submit to EvalEval? Yes! If your paper is later accepted at ACL, you would simply choose our non-archival option.

I am already presenting my evaluation work at ACL 2026 Main or Findings. Can I still submit? Yes, we are happy to have you in the room with us!

My paper does not fit into the topics listed in the Call for Papers. Is it enough to propose a new benchmark? Submissions should engage with at least one of our main topic areas outlined in the Call for Papers: (1) Evaluation Methodology and Measurement Theory, (2) Evaluation Infrastructure, Cost, and Stakeholders, and (3) Evaluating Sociotechnical Impacts.

Can I also submit in the ICML format? No, please use the ARR formatting.

Can I attend this workshop online? The workshop is in-person at ACL 2026 in San Diego. At least one author of each accepted paper must present on-site.

My position paper is 6 pages. Does that work? Yes, all submission types (research and positions/provocations) are welcome at any of the three length tiers.

What makes a good position paper? Positions were a highlight of our last NeurIPS workshop. For instance, Cintaqia et al. (2025), “Stop the Nonconsensual Use of Nude Images in Research,” was first presented at EvalEval and later accepted as a full NeurIPS position paper.

  1. Solaiman, Talat et al. (2025). “Evaluating the Social Impact of Generative AI Systems in Systems and Society.” In The Oxford Handbook of the Foundations and Regulation of Generative AI.

  2. Reuel, Ghosh, Chim et al. (2025). “Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations.” Preprint.

  3. ACL Shared Task