Authors
Naruaki Ogasawara
Published in
Journal of gastrointestinal and liver diseases : JGLD. Volume 35. Issue 2. Pages 181-188. Jun 27, 2026. Epub Jun 27, 2026.
Abstract
Inflammatory bowel disease (IBD) case reports provide rich longitudinal insights but have rarely been analyzed using quantitative text-mining approaches. This study applied unsupervised machine learning to PubMed-indexed IBD case reports to identify long-term thematic structures spanning 60 years and evaluate whether major historical milestones in IBD care can be reconstructed from biomedical texts.
Case reports indexed under the keyword "inflammatory bowel disease" were retrieved from PubMed (1960-2025). Titles, keywords, and abstracts were concatenated and preprocessed before term frequency-inverse document frequency (TF-IDF) vectorization. Non-negative matrix factorization (NMF) was applied to extract latent topics, followed by KMeans clustering using the optimal topic number selected by silhouette evaluation (2-15 topics). Cluster characteristics were summarized using report counts and TF-IDF statistics. Top discriminative keywords were used to assign data-driven topic labels. All analyses were performed in Python 3.10.5 (PyCharm 2022.1.3) using pandas, numpy, scikit-learn, matplotlib, and seaborn.
A total of 18,458 case reports were analyzed. Across all time periods, two highly stable clusters consistently emerged, corresponding to Crohn's disease and ulcerative colitis. Early decades (1960-1989) emphasized pathology and complication-focused descriptions. Reports from the 1990s showed increasing terminology related to diagnosis and emerging therapies. From 2000 onward, infliximab-related and treatment-focused terms predominated, paralleling the rise of biologics. After 2010, clusters reflected diversified therapeutic strategies, including attention to extraintestinal manifestations and biologic or small-molecule therapies.
Unsupervised machine learning successfully reconstructed important historical changes in IBD management, demonstrating that a large case report text corpus captures the evolution of clinical concepts and treatment paradigms over 60 years.
PMID:
42365648
Bibliographic data and abstract were imported from PubMed on 29 Jun 2026.