Authors
Prudencio Tossou, Cas Wognum, Michael Craig, Hadrien Mary, Emmanuel Noutahi
Published in
Journal of Chemical Information and Modeling. Feb 01, 2024. Epub Feb 01, 2024.
Abstract
This study presents a rigorous framework for investigating molecular out-of-distribution (MOOD) generalization in drug discovery. The concept of MOOD is first clarified through a problem specification showing that the covariate shifts encountered during real-world deployment can be characterized by the distribution of sample distances to the training set. We find that these shifts can cause performance to drop by up to 60% and uncertainty calibration to degrade by up to 40%. This motivates a splitting protocol that aims to close the gap between deployment and testing. Using this protocol, we then conduct a thorough investigation of how model design, model selection, and data set characteristics affect MOOD performance and uncertainty calibration. We find that appropriate representations and algorithms with built-in uncertainty estimation are crucial for improving both performance and uncertainty calibration. This study sets itself apart through its exhaustiveness and opens an exciting avenue for benchmarking meaningful algorithmic progress in molecular scoring.
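The idea of characterizing covariate shift by distances to the training set, and of choosing a split whose test set mimics deployment, can be illustrated with a small sketch. The abstract does not specify the distance metric or how candidate splits are compared, so everything below is an assumption: fingerprints are represented as sets of "on" bit indices (a real pipeline would compute them with a cheminformatics toolkit such as RDKit), distance is the Tanimoto distance to the nearest training neighbor, and candidate splits are scored by a 1-D Wasserstein distance between their test-to-train distances and the deployment-to-dataset distances. The function names and the split labels `"random"` and `"cluster"` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical stand-in for a binary molecular fingerprint: the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) distance 1 - |a & b| / |a | b|; 0.0 if both are empty."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def distances_to_train(queries: Sequence[Fingerprint], train: Sequence[Fingerprint]) -> List[float]:
    """Distance from each query molecule to its nearest neighbor in the training set."""
    return [min(tanimoto_distance(q, t) for t in train) for q in queries]


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """1-D Wasserstein distance between two non-empty samples (area between their empirical CDFs)."""
    u_s, v_s = sorted(u), sorted(v)
    points = sorted(set(u_s) | set(v_s))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u_s, left) / len(u_s)
        cdf_v = bisect_right(v_s, left) / len(v_s)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def select_splitter(
    splits: Dict[str, Tuple[List[Fingerprint], List[Fingerprint]]],
    deployment: List[Fingerprint],
    dataset: List[Fingerprint],
) -> str:
    """Return the split whose test-to-train distances best match deployment-to-dataset distances."""
    target = distances_to_train(deployment, dataset)
    scores = {
        name: wasserstein_1d(distances_to_train(test, train), target)
        for name, (train, test) in splits.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    a, b = frozenset({1, 2, 3}), frozenset({1, 2, 4})
    c, d = frozenset({7, 8, 9}), frozenset({7, 8, 10})
    deployment = [frozenset({20, 21})]  # far from every molecule in the data set
    splits = {
        "random": ([a, c], [b, d]),   # each test molecule has a close training neighbor
        "cluster": ([a, b], [c, d]),  # test molecules come from a cluster unseen in training
    }
    print(select_splitter(splits, deployment, [a, b, c, d]))  # cluster
```

In this toy case the random split places every test molecule at distance 0.5 from the training set, well below the deployment distance of 1.0, so it underestimates the shift; the cluster split reproduces the deployment profile and is selected. The Wasserstein score is one assumed choice; any divergence between distance distributions would fit the same scheme.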
PMID: 38300258
Bibliographic data and abstract were imported from PubMed on 01 Feb 2024.