Authors
Anran Li, Lingfei Qian, Mengmeng Du, Yu Yin, Yan Hu, Zihao Sun, Yihang Fu, Hyunjae Kim, Erica Stutz, Xuguang Ai, Qianqian Xie, Rui Zhu, Jimin Huang, Yifan Yang, Siru Liu, Yih-Chung Tham, Lucila Ohno-Machado, Hyunghoon Cho, Zhiyong Lu, Hua Xu, Qingyu Chen
Published in
Nature Communications. Jun 19, 2026. Epub Jun 19, 2026.
Abstract
Large language models (LLMs) have demonstrated significant potential in medicine, with many studies adapting them through continued pretraining or fine-tuning on medical data. However, a key question remains: to what extent do LLMs memorize medical training data, that is, recall or regenerate content seen during continued pretraining or fine-tuning? In this work, we investigate memorization of LLMs in medicine, assessing its prevalence (frequency), characteristics (what is memorized), volume (how much), and potential downstream impacts. We systematically analyze three common adaptation scenarios: (1) continued pretraining on medical corpora, (2) fine-tuning on standard medical benchmarks, and (3) fine-tuning on real-world clinical data, including over 13,000 unique inpatient records from Yale New Haven Health System. The results demonstrate that memorization is prevalent and significantly higher than in the general domain. Memorization has distinct characteristics during continued pretraining and fine-tuning, and it is persistent: up to 87% of content memorized during continued pretraining remains after fine-tuning. Memorization can be categorized into three types: beneficial (e.g., accurate recall of clinical guidelines), uninformative (e.g., templated language), and harmful (e.g., sensitive clinical content). We offer practical recommendations to facilitate beneficial memorization, minimize uninformative memorization, and mitigate harmful memorization to protect patient privacy and improve medical utility.
PMID:
42315854
Bibliographic data and abstract were imported from PubMed on 19 Jun 2026.