Cochrane Evaluation of (Semi-) Automated Review Methods (CESAR): Protocol for an adaptive platform study within reviews.

Authors

Gerald Gartlehner, Susan Banda, Max Callaghan, Jo-Ana Chase, Andreea Dobrescu, Angelika Eisele-Metzger, Ella Flemyng, Sean Gardner, Ursula Griebler, Bartosz Helfer, Pawel Jemiolo, Biljana Macura, Jan C Minx, Anna Noel-Storr, Noosheen Rajabzadeh Tahmasebi, Amin Sharifan, Joerg J Meerpohl, James Thomas

Published in

Journal of Clinical Epidemiology. Pages 112390. Jun 19, 2026. Epub Jun 19, 2026.

Abstract

Artificial intelligence (AI) has the potential to improve the efficiency of evidence synthesis and reduce human error. However, robust methods for evaluating rapidly evolving AI tools within the practical workflows of evidence synthesis remain underdeveloped. This protocol describes a study design for assessing the effectiveness, efficiency, and usability of AI tools in comparison to traditional human-only workflows in the context of Cochrane systematic reviews.
Members of the Cochrane Evaluation of (Semi-) Automated Review Methods (CESAR) project developed an adaptive platform study-within-a-review (SWAR) design, modeled after clinical platform trials. This design employs a master protocol to concurrently evaluate multiple AI tools (interventions) against a standard human-only process (control) across three key review tasks: title and abstract screening, full-text screening, and data extraction. The adaptive framework allows for the addition or removal of AI tools based on interim performance analyses without necessitating a restart of the study. Performance will be assessed using metrics such as accuracy (sensitivity, specificity, precision), efficiency (time on task), response stability, impact of errors, and usability, in alignment with Responsible use of AI in evidence SynthEsis (RAISE) principles.
The study will generate comparative data about the performance and usability of specific AI tools employed in a semi- or fully automated manner relative to standard human effort. The protocol provides a flexible framework for the assessment of AI tools in evidence synthesis, addressing the limitations of static, one-time evaluations.
This study protocol presents a novel methodological approach to addressing the challenges of evaluating AI tools for evidence syntheses. By validating entire workflows rather than individual technologies, the findings will establish an evidence base for determining the viability of integrating AI into evidence-synthesis workflows. The adaptive design of this study is flexible and can be adopted by other investigators, ensuring that the evaluation framework remains relevant as new tools emerge.
Doctors and researchers rely on systematic reviews, which are thorough summaries of all available research on a health topic, to guide decisions about patient care. However, creating these reviews is a slow and demanding process, often taking more than a year to finish. Artificial intelligence (AI) tools could help speed up this work and reduce human errors, but there are currently no reliable ways to test how well these tools perform in real-world settings.
This paper describes the design of a study that will rigorously test how well AI tools perform when used in actual systematic review workflows, specifically within Cochrane Reviews. The study will compare AI-assisted methods with the traditional approach, where two trained researchers independently complete each step. It will look at three main tasks: choosing which studies might be relevant based on their titles and abstracts, reading the full-text publication to confirm which studies should be included, and extracting important information from those studies.
A key strength of this study is its flexible design. Instead of testing just one AI tool at a single point in time, the study allows researchers to add or remove AI tools as new ones become available, similar to how some modern drug trials are run. This approach helps the study keep up with the fast pace of AI development.
Researchers will assess the AI tools based on their accuracy, the time they save, how consistent their results are, and how easy they are to use. The ultimate goal of this study is to give the research community strong evidence about when and how AI can be safely and effectively used in systematic reviews to help summarize medical research.
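The accuracy metrics named in the abstract (sensitivity, specificity, precision) can be illustrated with a minimal sketch. It assumes that screening decisions are recorded as include/exclude booleans and that the human-only process serves as the reference standard. The function name, the data layout, and the example decisions are hypothetical and are not taken from the protocol.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningMetrics:
    sensitivity: float  # share of truly relevant records the tool included
    specificity: float  # share of truly irrelevant records the tool excluded
    precision: float    # share of tool-included records that were relevant


def screening_metrics(reference: Sequence[bool],
                      predicted: Sequence[bool]) -> ScreeningMetrics:
    """Compare tool decisions against reference decisions (True = include).

    Assumption: `reference` holds the human-only decisions, which are
    treated here as the reference standard.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # correctly included
    fn = sum(1 for r, p in pairs if r and not p)      # relevant but missed
    fp = sum(1 for r, p in pairs if not r and p)      # wrongly included
    tn = sum(1 for r, p in pairs if not r and not p)  # correctly excluded

    def ratio(numerator: int, denominator: int) -> float:
        # An undefined ratio (empty denominator) is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return ScreeningMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        precision=ratio(tp, tp + fp),
    )


# Hypothetical decisions for eight records (not study data).
human = [True, True, True, False, False, False, False, False]
tool = [True, True, False, True, False, False, False, False]
metrics = screening_metrics(human, tool)
```

In this made-up example the tool misses one of three relevant records and wrongly includes one of five irrelevant ones, so sensitivity and precision are both 2/3 and specificity is 4/5.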

PMID: 42320766
Bibliographic data and abstract were imported from PubMed on 20 Jun 2026.


