Authors
Joelle N Eaves, Angeline A Needs, Daniel R Woldring
Published in
Journal of Chemical Information and Modeling. Jun 24, 2026. Epub Jun 24, 2026.
Abstract
Protein-ligand binding affinity prediction (PLBAP) models are routinely benchmarked on the CASF-2016 data set, with the Pearson correlation coefficient (PCC) as a common measure of scoring power. Published PCC values are frequently reused as baselines for cross-study comparisons. This practice implicitly assumes that published pipelines remain runnable and that reported metrics can be independently verified. To examine this assumption, we conducted a systematic reproducibility audit of 50 PLBAP models published between 2021 and 2024 that reported CASF-2016 scoring power. For each model, we attempted to reproduce the authors' CASF-2016 inference using only publicly available code, documentation, and pretrained weights. To scaffold this audit and to offer a reusable resource for the community, we introduce a minimal five-item reproducibility checklist for PLBAP pipelines, organized around the artifacts a researcher requires to independently rerun inference: (1) a license; (2) preprocessing and featurization, (3) training, and (4) inference code; and (5) pretrained model weights. We find that only 17 of the 50 pipelines satisfied all five checklist items and were consistently runnable. Of those 17 runnable models, only nine were statistically reproducible (53% of runnable models, or 18% of all 50 audited models). We propose the checklist as a lightweight community standard for future PLBAP releases, document common gaps, and highlight practices that most reliably enabled independent reproduction.
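As an illustration only, the five checklist items and the PCC metric named in the abstract could be sketched in Python as follows. The class name `ReleaseChecklist`, its field names, and the `pearson` helper are hypothetical and are not taken from the paper; the abstract also does not define its criterion for "statistically reproducible", so none is implemented here.

```python
from dataclasses import dataclass, fields
from math import sqrt


@dataclass
class ReleaseChecklist:
    """Hypothetical record of the five artifacts listed in the abstract."""
    license: bool
    preprocessing_code: bool   # preprocessing and featurization code
    training_code: bool
    inference_code: bool
    pretrained_weights: bool

    def missing(self):
        """Return the names of checklist items that are not satisfied."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_complete(self):
        """True only when all five items are present."""
        return not self.missing()


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up example: a release that ships everything except pretrained weights.
release = ReleaseChecklist(True, True, True, True, False)
print(release.is_complete(), release.missing())

# Proportions reported in the abstract: 9 of 17 runnable, 9 of 50 overall.
print(round(100 * 9 / 17), round(100 * 9 / 50))
```

In practice, `pearson` would be applied to predicted and experimental binding affinities for the CASF-2016 complexes, and the resulting value compared against the published one under whatever tolerance or statistical test the auditor chooses.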
PMID:
42341287
Bibliographic data and abstract were imported from PubMed on 25 Jun 2026.