Authors
Sunny Chung, Charles Kahi, Siddharth Singh
Published in
Research Square. Jun 26, 2026. Epub Jun 26, 2026.
Abstract
Citation-prediction studies often estimate citation counts using information that is unavailable at publication. We evaluated whether citation-risk outcomes can be predicted using only publication-time information: metadata, references, author history, and text available on or before publication. We assembled 9,424 original-research articles published from 2017 to 2022 across seven clinical gastroenterology journals using OpenAlex and PubMed. The primary reference-observed cohort included 8,409 articles with a parsed reference list. The primary outcome was ≤ 3 citations within 2 years; secondary outcomes were 0 citations within 3 years, ≤ 3 citations within 3 years, and > 20 citations within 2 years. Models compared a nonsemantic citation/reference/context baseline, author-history variables, whole-document title/abstract embeddings, role-segmented source-text embeddings, and reference-context distributional features. Evaluation used two held-out publication-year folds and three metrics: area under the precision-recall curve (PR-AUC), F1, and precision among the top-ranked 10% of predictions (precision@10%). For the primary outcome, the nonsemantic baseline achieved PR-AUC 0.818, F1 0.722, and precision@10% 0.935. Adding whole-document embeddings improved these to 0.828, 0.735, and 0.962, respectively. Structure-aware features did not improve the primary outcome but provided endpoint-specific gains for secondary outcomes. Author-history features showed standalone signal but did not improve the baseline. Pooled performance exceeded journal-local performance, indicating that citation-risk signal operated at the corpus level. These findings support publication-time-valid citation-risk modeling as a reproducible framework for studying evidence visibility within bounded literatures and motivate replication across other journal sets, specialties, and publication eras.
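As a minimal sketch of how the primary-outcome label and two of the reported metrics could be computed, the Python below labels an article as low-citation when it has ≤ 3 citations within 2 years, then scores a ranked list with precision among the top-ranked 10% and a step-wise PR-AUC estimate (average precision). The function names, the toy citation counts, and the risk scores are hypothetical and are not taken from the study; the abstract does not say how ties were handled or which PR-AUC estimator was used, so both choices here are assumptions.

```python
import math


def low_citation_label(citations_2y, threshold=3):
    """Return 1 if the article has <= threshold citations within 2 years, else 0."""
    return 1 if citations_2y <= threshold else 0


def precision_at_top_fraction(labels, scores, fraction=0.10):
    """Share of true positives among the top-ranked `fraction` of predictions.

    Assumption: ties in score keep their input order (Python's sort is stable).
    """
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equal length")
    # round() guards against float noise such as 0.1 * 30 == 3.0000000000000004
    k = max(1, math.ceil(round(fraction * len(labels), 9)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at each recalled positive
    return precision_sum / total_positives


if __name__ == "__main__":
    # Hypothetical 2-year citation counts and model risk scores for five articles.
    citations = [0, 5, 3, 25, 1]
    scores = [0.9, 0.8, 0.7, 0.2, 0.6]
    labels = [low_citation_label(c) for c in citations]
    print("labels:", labels)
    print("precision@40%:", precision_at_top_fraction(labels, scores, 0.4))
    print("average precision:", average_precision(labels, scores))
```

With only five toy articles the example uses a 40% cutoff so that the top slice holds two items; on a cohort of several thousand articles the default `fraction=0.10` would match the precision@10% metric named in the abstract.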
PMID:
42396484
Bibliographic data and abstract were imported from PubMed on 03 Jul 2026.