2026 ACL Workshop on Evaluating AI in Practice

Methodological Rigor, Sociotechnical Perspectives, and Community Collaborations

Background

As generative AI systems are increasingly integrated into real-world products and decision-making pipelines, evaluation has become a central yet challenging component of responsible AI development. While evaluation research has advanced rapidly, gaps persist between research and practice: model developers often prioritize scalability and integration into development workflows, while evaluation researchers emphasize rigor, validity, and sociotechnical considerations. Recent work by Reuel et al. from the EvalEval coalition, mapping first- and third-party social impact evaluations, shows a clear division of labor: developers underreport or deprioritize key impacts such as environmental costs, data provenance, and labor practices, while third-party evaluators provide broader but necessarily incomplete coverage, leaving critical gaps in accountability and comparability.

This workshop focuses on AI evaluation in practice, centering the tensions and collaborations between model developers and evaluation researchers. Through a call for papers on contemporary challenges spanning methodological rigor, sociotechnical perspectives, scalability, community-informed evaluation, and real-world use, alongside invited panels, the workshop aims to surface practical insights from across the evaluation ecosystem. The workshop will also host a shared task to foster collaborative development of a robust Evaluation Cards framework, encouraging shared infrastructure and actionable evaluation practices.

Topics

Themes for submission include, but are not limited to:

1. Evaluation Foundations and Validity

  • Conceptualization and operationalization issues in evaluations of generative AI;
  • Construct and context validity of existing evaluations;
  • Reliability, robustness, consistency, and generalization of evaluation methods;
  • Ethical and consequential validity of evaluations;
  • State of sociotechnical evaluations, including but not limited to the following categories from Solaiman et al. 2025:
    • Bias, stereotypes, and representational harms;
    • Cultural values and sensitive content;
    • Disparate performance across populations and contexts;
    • Trustworthiness, autonomy, and overreliance on model outputs;
    • Trust in media and information ecosystems;
    • Privacy, data protection, and personal sense of self;
    • Inequality, marginalization, community erasure, and violence;
    • Long-term amplification and embedding of marginalization;
    • Concentration of authority, militarization, surveillance, and weaponization;
    • Imposing norms and values;
    • Labor, creativity, intellectual property, and ownership;
    • AI sovereignty and the Global South.

2. Evaluation Costs and Infrastructure

  • Financial costs of evaluations;
  • Environmental costs and carbon emissions;
  • Documentation frameworks for financial and environmental evaluation costs;
  • Data and content moderation labor and well-being;
  • Infrastructure for evaluations, including evaluation harnesses and tooling.

3. Second-Order and Community-Centered Impacts

  • Second-order harms, including representational, economic, and cultural impacts;
  • Linguistic and cultural values in evaluation design;
  • Community-centered and participatory evaluation approaches;
  • Disparate performance and privacy-aware evaluation methods;
  • Economic impacts and labor market effects;
  • Ecosystem and environmental impacts;
  • Widening resource gaps.

Submission Guidelines

We welcome two types of papers:

  • Regular papers: Full-length research papers of 6–8 pages (excluding references and supplementary material);
  • Tiny papers: Short research, position, or “provocation” papers up to 2 pages (excluding references and supplementary material).

Tiny papers may take either of the following forms:

  • Extended abstracts: short but complete research papers presenting original or interesting results;
  • Provocations: papers that introduce novel perspectives or challenge conventional wisdom around broader social impact evaluation for generative AI.

Accepted papers will be presented as posters, with a subset selected for spotlight oral presentation. There is no preference between regular and tiny paper submissions; all papers will be assessed based on their relevance to the workshop themes.

Papers should be submitted in the conference-provided format. The review process will be double-blind; therefore, all identifying information must be removed from submissions. Submissions may include unpublished work as well as previously published or accepted papers, provided they do not violate dual-submission policies. Contributions are non-archival, and accepted papers will be made publicly available.

Important Dates

  • Submission Deadline: March 12, 2026
  • Notification of Acceptance: April 28, 2026
  • Camera-ready paper due: May 14, 2026
  • Pre-recorded video due (hard deadline): June 4, 2026
  • Workshop at ACL in San Diego: July 3-4, 2026

All deadlines are specified in Anywhere on Earth (AoE).

Submission Site

Submissions should be made via OpenReview.

Reviewer Recruitment

To support a fair, high-quality, and sustainable review process, we adopt a reciprocal reviewing expectation: authors submitting papers to the workshop are expected to serve as reviewers, helping ensure sufficient reviewing capacity and timely feedback for all submissions. Authors must confirm this commitment during paper submission through the OpenReview system.

Program Chairs

  • Mubashara Akhtar, ETH Zurich & ETH AI Center
  • Jan Batzner, Weizenbaum Institute, Technical University Munich
  • Leshem Choshen, MIT, IBM Research, MIT-IBM Watson AI Lab
  • Avijit Ghosh, Hugging Face
  • Usman Gohar, Iowa State University
  • Jennifer Mickel, EleutherAI
  • Ichhya Pant, Independent
  • Zeerak Talat, University of Edinburgh

Organizers

EvalEval Coalition

Workshop Contact

  • Ichhya Pant: ipant@gwu.edu
  • Jennifer Mickel: jamickel@utexas.edu
  • Jan Batzner: jan.batzner@weizenbaum-institut.de