

Predicting High-Risk Colorectal Polyps in African Americans Using Pre-colonoscopy Clinical Features: Machine Learning Model Development and Temporal Validation.

Created on 24 Jun 2026

Authors

Basheer Qolomany, Mrinalini Deverapall, Adeyinka Laiyemo, Zaki Sherif, Mori Yuichi, Omer Ahmed, Hassan Brim, Hassan Ashktorab

Published in

Digestive Diseases and Sciences. Jun 23, 2026. Epub Jun 23, 2026.

Abstract

Risk stratification for advanced colorectal polyps typically relies on colonoscopy and/or pathology findings. However, there is growing interest in whether noninvasive features available before colonoscopy can help identify patients at higher risk. Such approaches may enhance clinical decision-making by prioritizing surveillance for the individuals most likely to harbor high-risk polyps, while potentially reducing unnecessary procedures in lower-risk patients. Importantly, the use of noninvasive, pre-procedural information may also promote more equitable access to risk stratification, particularly in settings where colonoscopy resources are limited or unevenly distributed. We aimed to develop and externally validate machine learning models to predict high-risk colorectal polyps using only noninvasive, pre-colonoscopy demographic, clinical, and behavioral features in a diverse, predominantly African American, urban cohort.
We conducted a retrospective cohort study using demographic, lifestyle, and comorbidity data from patients who underwent colonoscopy at Howard University Hospital to develop and validate several machine learning models for predicting high-risk colorectal polyps: neural networks, random forest, support vector machines (SVM), Naïve Bayes, logistic regression, decision trees, k-nearest neighbors (KNN), and XGBoost. High-risk polyps (HRP) were defined as villous or tubulovillous adenomas, high-grade dysplasia, polyps ≥10 mm in size, and/or the presence of ≥3 polyps per procedure; all other cases were classified as low-risk polyps (LRP). The dataset included 4,681 patients from 2015 to 2022, used for model development and internal validation, and 1,562 patients from 2023 to 2024, used for external (temporal) validation. Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision-recall curve (PR-AUC), accuracy, precision, recall, and F1 score. Model interpretability and feature contributions were assessed using SHapley Additive exPlanations (SHAP).
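To make the outcome definition and the temporal split concrete, the Python sketch below labels each procedure as HRP or LRP and partitions records by year. It is illustrative only: the `Procedure` record, its field names, and the histology strings are hypothetical (the abstract does not describe the study's data schema), and the size and count cut-offs are assumed to be inclusive (≥10 mm, ≥3 polyps).

```python
from dataclasses import dataclass

# Hypothetical per-procedure record; the study's actual variable names and
# coding of pathology results are not described in the abstract.
@dataclass
class Procedure:
    patient_id: str
    year: int                  # calendar year of the colonoscopy
    histologies: list          # e.g. ["tubular", "tubulovillous"]
    high_grade_dysplasia: bool
    max_polyp_size_mm: float   # size of the largest polyp, in mm
    polyp_count: int           # number of polyps found in the procedure

ADVANCED_HISTOLOGIES = {"villous", "tubulovillous"}

def is_high_risk(p: Procedure) -> bool:
    """True if any HRP criterion is met (inclusive >= thresholds are assumed)."""
    return (
        any(h in ADVANCED_HISTOLOGIES for h in p.histologies)
        or p.high_grade_dysplasia
        or p.max_polyp_size_mm >= 10
        or p.polyp_count >= 3
    )

def temporal_split(procs):
    """Split by year: 2015-2022 for development, 2023-2024 for temporal validation."""
    dev = [p for p in procs if 2015 <= p.year <= 2022]
    ext = [p for p in procs if 2023 <= p.year <= 2024]
    return dev, ext

# Toy records, invented purely for illustration.
procs = [
    Procedure("a", 2016, ["tubular"], False, 6.0, 1),
    Procedure("b", 2019, ["tubulovillous"], False, 4.0, 1),
    Procedure("c", 2023, ["tubular"], False, 12.0, 1),
    Procedure("d", 2024, ["hyperplastic"], False, 3.0, 3),
]
dev, ext = temporal_split(procs)
labels = {p.patient_id: is_high_risk(p) for p in procs}
print([p.patient_id for p in dev], [p.patient_id for p in ext], labels)
```

Splitting by calendar year rather than at random is what makes the second cohort a temporal test: any shift in patient mix or data recording after 2022 is seen only at evaluation time.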
Overall predictive performance was moderate using noninvasive pre-colonoscopy features. The neural network demonstrated the strongest overall discrimination, achieving the highest internal validation performance (ROC-AUC 0.78, PR-AUC 0.75, accuracy 0.72), but showed reduced performance in the external cohort (ROC-AUC 0.67, accuracy 0.66), suggesting potential overfitting or temporal feature drift. In contrast, simpler models including Naïve Bayes, SVM, and XGBoost exhibited lower internal performance (ROC-AUC 0.54-0.59) but more stable generalization to the external cohort (ROC-AUC 0.52-0.63; accuracy approximately 0.53-0.60). Model interpretability analysis using SHAP identified age, smoking status, sex, occupation, race, colonoscopy indication, and family history of colorectal cancer as the most influential predictors, highlighting contributions from both traditional clinical and sociodemographic factors.
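The results above compare models by ROC-AUC, PR-AUC, accuracy, and related metrics. As a rough, stdlib-only sketch of how these quantities are computed from predicted probabilities, the code below assumes binary labels (1 = HRP, 0 = LRP) and a 0.5 decision threshold; the abstract does not state the threshold, the software used, or which PR-AUC estimator was applied, so average precision here is one common choice rather than the authors' method. The toy labels and scores are invented.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive scores above a random
    negative (Mann-Whitney form); ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise average precision, one common PR-AUC estimator
    (tied scores keep their input order here)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at each rank where a positive appears
    return total / n_pos if n_pos else 0.0

def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall, and F1 after thresholding the scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels and scores, invented for illustration only.
y = [1, 1, 0, 0]
s = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(y, s)
ap = average_precision(y, s)
m = threshold_metrics(y, s)

# Internal-minus-external ROC-AUC gap for the neural network, using the
# values reported in the abstract (0.78 internal, 0.67 external).
nn_gap = round(0.78 - 0.67, 2)
print(auc, ap, m, nn_gap)
```

Applying `roc_auc` separately to the 2015-2022 and 2023-2024 cohorts yields the internal and external figures whose difference signals the overfitting or temporal drift discussed above; for the neural network that gap is 0.78 minus 0.67, or 0.11.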
Prediction of HRP using routine pre-colonoscopy data is feasible but demonstrates limited generalizability across cohorts. These findings highlight the clinical potential and limitations of pre-procedural risk modeling, especially in diverse, underserved populations. Integration of additional data modalities may be required to achieve clinically robust and equitable prediction tools.

PMID: 42337207
Bibliographic data and abstract were imported from PubMed on 24 Jun 2026.
