Publication-time-valid prediction of citation-risk outcomes in a bounded clinical specialty literature corpus.

Created on 03 Jul 2026

Authors

Sunny Chung, Charles Kahi, Siddharth Singh

Published in

Research Square. Jun 26, 2026. Epub Jun 26, 2026.

Abstract

Citation-prediction studies often estimate citation counts using information unavailable at publication. We evaluated whether citation-risk outcomes can be predicted using only publication-time information: metadata, references, author history, and text available on or before publication. We assembled 9,424 original-research articles published from 2017 to 2022 across seven clinical gastroenterology journals using OpenAlex and PubMed. The primary reference-observed cohort included the 8,409 articles with a parsed reference list. The primary outcome was ≤ 3 citations within 2 years; secondary outcomes were 0 citations within 3 years, ≤ 3 citations within 3 years, and > 20 citations within 2 years. Models compared a nonsemantic citation/reference/context baseline, author-history variables, whole-document title/abstract embeddings, role-segmented source-text embeddings, and reference-context distributional features. Evaluation used two held-out publication-year folds and reported PR-AUC (area under the precision-recall curve), F1, and precision among the top-ranked 10% of predictions (precision@10%). For the primary outcome, the nonsemantic baseline achieved PR-AUC 0.818, F1 0.722, and precision@10% 0.935. Adding whole-document embeddings improved these to 0.828, 0.735, and 0.962, respectively. Structure-aware features did not improve performance on the primary outcome but provided endpoint-specific gains for secondary outcomes. Author-history features showed standalone signal but did not improve on the baseline. Pooled performance exceeded journal-local performance, indicating that citation-risk signal operated at the corpus level. These findings support publication-time-valid citation-risk modeling as a reproducible framework for studying evidence visibility within bounded literatures and motivate replication across other journal sets, specialties, and publication eras.
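The abstract names its outcome and three evaluation metrics but not their exact computation. The Python sketch below shows one common way to derive the primary outcome label (≤ 3 citations within 2 years) and to score a single held-out fold. It is not the authors' code, and several details are assumptions: the function names (low_citation_label, average_precision, f1_at_threshold, precision_at_top_fraction) are hypothetical; PR-AUC is taken to be step-wise average precision; the F1 decision threshold of 0.5 is assumed; and k for precision@10% is assumed to be 10% of the fold rounded to the nearest integer. The citation counts and scores are toy values, not study data.

```python
from typing import List, Sequence


def low_citation_label(citations_in_window: int, max_citations: int = 3) -> int:
    """Primary outcome from the abstract: 1 if <= 3 citations within the window, else 0."""
    return int(citations_in_window <= max_citations)


def _rank_desc(scores: Sequence[float]) -> List[int]:
    """Indices sorted by descending score; ties keep input order (a simplification)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC (assumed): mean precision@rank taken at every positive."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return float("nan")
    hits, total = 0, 0.0
    for rank, i in enumerate(_rank_desc(scores), start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank
    return total / n_pos


def f1_at_threshold(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """F1 for the positive (low-citation) class; the 0.5 threshold is an assumption."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def precision_at_top_fraction(
    y_true: Sequence[int], scores: Sequence[float], fraction: float = 0.10
) -> float:
    """Share of true positives among the top-ranked fraction of predictions."""
    k = max(1, round(fraction * len(scores)))  # assumed: round to nearest integer
    top = _rank_desc(scores)[:k]
    return sum(y_true[i] for i in top) / k


# Toy fold (hypothetical, not study data): 2-year citation counts and risk scores.
citations_2y = [0, 12, 2, 3, 25, 8, 1, 5, 7, 30]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [low_citation_label(c) for c in citations_2y]

print(f"PR-AUC (AP): {average_precision(labels, scores):.3f}")  # 0.747
print(f"F1:          {f1_at_threshold(labels, scores):.3f}")  # 0.667
print(f"P@10%:       {precision_at_top_fraction(labels, scores):.3f}")  # 1.000
```

Average precision summarizes the precision-recall curve without interpolation; a trapezoidal PR-AUC, or a tie-aware ranking, would give slightly different numbers, so reproducing the reported values would require the study's exact metric definitions.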

PMID: 42396484
Bibliographic data and abstract were imported from PubMed on 03 Jul 2026.