Authors
Senlong Hou, Jiyu Jiang, Xue Li, Mingze Yang, Qilin Chen, Xueyan Ma, Xiaohong Gu
Published in
Journal of Medical Internet Research. Volume 28. Pages e94775. Jul 02, 2026. Epub Jul 02, 2026.
Abstract
Health problems associated with energy metabolism imbalance have received increasing attention. Our team was the first to propose the concept of gastrointestinal heat retention syndrome (GHRS). Triggered by high-calorie diets and excessive energy intake, GHRS is strongly associated with recurrent respiratory infections and other pediatric conditions. Existing research on GHRS has focused primarily on risk factor analyses, and no interpretable exploratory self-assessment model that estimates the probability of GHRS is currently available for routine household use.
This study aims to develop and validate an interpretable machine learning model to help caregivers conduct exploratory self-assessments of the probability that children meet the criteria for pediatric GHRS and to provide a practical assessment tool for household use.
This study conducted a questionnaire survey of kindergarten children in Longgang District, Shenzhen, China. Samples with missing information on GHRS-related symptoms and signs were excluded. Independent correlates were identified using univariate logistic regression, collinearity testing, Least Absolute Shrinkage and Selection Operator (LASSO) regression, and multivariable logistic regression. After handling missing data and preprocessing, the dataset was randomly divided into training and test sets in an 8:2 ratio. SMOTETomek (Synthetic Minority Oversampling Technique combined with Tomek Links) was included as a conservative resampling step in the training pipeline, and optimal features were selected using a combination of Pearson correlation analysis and recursive feature elimination. In addition, a Random Forest (RF) sensitivity analysis without SMOTETomek was performed, and the minimum required sample size was estimated according to the events per variable (EPV) rule. Seven machine learning models were developed and evaluated in terms of discrimination, calibration, and clinical utility. The SHAP (Shapley Additive Explanations) method was used to interpret the optimal model, which was subsequently deployed online using the Streamlit framework.
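As an illustration of the EPV rule mentioned above, the following minimal sketch computes a minimum sample size from a target number of events per candidate predictor. The abstract does not report the inputs behind the authors' own estimate, so the function name, the EPV threshold of 10, the predictor count, and the event fraction are all assumptions chosen for illustration; the sketch is not expected to reproduce the study's figure.

```python
import math


def min_sample_size_epv(n_predictors: int, event_fraction: float, epv: int = 10) -> int:
    """Minimum sample size under an events-per-variable (EPV) rule.

    n_predictors   -- number of candidate predictors (hypothetical here)
    event_fraction -- share of the sample in the less frequent outcome class
    epv            -- required events per predictor (10 is a common
                      convention, assumed here, not taken from the study)
    """
    if n_predictors <= 0 or epv <= 0:
        raise ValueError("n_predictors and epv must be positive")
    if not 0 < event_fraction <= 1:
        raise ValueError("event_fraction must be in (0, 1]")
    # Events needed overall, then scaled up by how rare the events are.
    min_events = epv * n_predictors
    return math.ceil(min_events / event_fraction)


# Hypothetical inputs: 60 candidate predictors, balanced outcome classes.
required_n = min_sample_size_epv(n_predictors=60, event_fraction=0.5)
print(required_n)
```

A lower event fraction raises the required sample size, because more participants are needed to observe the same number of events.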
A total of 120,198 questionnaires were collected, of which 108,447 were deemed valid and included in the analysis. The ratio of GHRS-positive to GHRS-negative cases was nearly balanced. The study identified 59 independent correlates of GHRS, including 10 protective factors and 49 risk factors. After data cleaning, a complete-case dataset of 49,798 participants was obtained, exceeding the minimum sample size requirement of 3747 cases based on the EPV rule. During internal validation, the RF model demonstrated acceptable discriminatory performance, stable calibration, and high net benefit in decision curve analysis, and was therefore selected as the primary analytical model. SHAP analysis identified 5 key predictive features. The resulting online tool collects information through 75 single-choice questions and automatically provides an estimated probability of GHRS together with lifestyle recommendations.
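For readers unfamiliar with decision curve analysis, net benefit at a threshold probability pt is conventionally defined as TP/N - (FP/N) * pt/(1 - pt), where TP and FP are the true and false positives among N participants when everyone with a predicted probability of at least pt is flagged. The sketch below applies that standard definition to made-up labels and probabilities; the function names and data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging everyone with predicted probability >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every participant regardless of the model."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Made-up example data (1 = meets criteria, 0 = does not).
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2]
model_nb = net_benefit(labels, probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(labels, threshold=0.5)
print(model_nb, treat_all_nb)
```

A model is usually considered useful at a given threshold when its net benefit exceeds both the flag-everyone strategy and the flag-no-one strategy, whose net benefit is 0.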
Using cross-sectional data, this study developed and validated an interpretable model that enables caregivers to perform exploratory self-assessments of the probability that children meet the study-specific GHRS scale criteria. As a household self-screening tool, the model gives caregivers an estimate of this probability from readily obtainable information.
PMID: 42392592
Bibliographic data and abstract were imported from PubMed on 03 Jul 2026.