Authors
Zijun Nie, Fujian Zheng, Dejun Hu, Zhenzhen Fu, Chongjiang Cao
Published in
Analytical chemistry. Jul 01, 2026. Epub Jul 01, 2026.
Abstract
The annotation of dietary biomarkers is crucial for nutritional epidemiology. Untargeted liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is a powerful analytical approach, but annotating dietary biomarkers with it is hampered by the low specificity of existing public databases, which limits both annotation coverage and accuracy. To address this limitation, we developed a new database construction strategy and a dual-annotation workflow. We first used an automated text-mining pipeline based on large language models (LLMs) to parse 7339 scientific articles and their supplementary materials, creating the Dietary Metabolite Biomarker Database (DMBDB), which contains 4983 nonredundant biomarkers. The LLM workflow performed well, achieving an F1 score (the harmonic mean of precision and recall) of 0.9269 for biomarker name recognition. We then designed two complementary annotation strategies: (i) a specialized LC-MS database derived from DMBDB, incorporating predicted retention times and experimental MS/MS spectra for high-confidence matching, and (ii) a structure-guided molecular networking strategy (SGMNS) that uses DMBDB as background knowledge to annotate dietary biomarkers and their metabolites that lack spectral evidence. The framework was validated by untargeted LC-HRMS analysis of urine samples. The LC-MS database directly annotated 566 metabolites, and integrating it with SGMNS expanded the total number of annotations to 2078. Combining LLM-driven database construction with this dual-strategy annotation framework offers a powerful paradigm for high-coverage, high-accuracy dietary metabolomics.
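The abstract does not spell out the matching rules or the SGMNS algorithm, so the Python sketch below is only a rough illustration of what a two-tier annotation could look like. Tier (i) accepts a library hit only when precursor m/z (in ppm), retention time and MS/MS cosine similarity all agree; tier (ii) spreads those seed annotations breadth-first to unmatched features whose spectra resemble an already annotated neighbor, and tags the edge when the precursor mass difference matches a common conjugation (monoisotopic shifts for glucuronidation, sulfation and methylation). The tolerances (5 ppm, 0.5 min, cosine 0.7 and 0.6), the `Spectrum` class, the function names and the mass-shift table are all assumptions chosen for illustration. Plain spectral-similarity propagation is a generic stand-in here, not the authors' structure-guided SGMNS, which uses DMBDB structures as background knowledge.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical mass-shift table (Da): monoisotopic shifts of a few common conjugations.
BIOTRANSFORMATIONS = {
    "glucuronidation": 176.0321,
    "sulfation": 79.9568,
    "methylation": 14.0157,
}


@dataclass
class Spectrum:
    """A measured LC-MS feature or a library entry (illustrative structure)."""
    name: str
    precursor_mz: float
    rt: float                  # retention time in minutes (predicted for library entries)
    peaks: dict[float, float]  # fragment m/z -> intensity


def ppm_error(observed: float, theoretical: float) -> float:
    return abs(observed - theoretical) / theoretical * 1e6


def cosine(a: dict[float, float], b: dict[float, float], tol: float = 0.01) -> float:
    """Greedy cosine similarity; each peak in `b` is matched at most once."""
    used: set[float] = set()
    dot = 0.0
    for mz_a, int_a in sorted(a.items(), key=lambda p: -p[1]):
        candidates = [m for m in b if m not in used and abs(m - mz_a) <= tol]
        if candidates:
            best = max(candidates, key=lambda m: b[m])
            used.add(best)
            dot += int_a * b[best]
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def library_match(feature: Spectrum, library: list[Spectrum], ppm_tol: float = 5.0,
                  rt_tol: float = 0.5, min_cos: float = 0.7) -> list[tuple[str, float]]:
    """Tier (i): precursor m/z, retention time and MS/MS must all agree."""
    hits = []
    for ref in library:
        if ppm_error(feature.precursor_mz, ref.precursor_mz) > ppm_tol:
            continue
        if abs(feature.rt - ref.rt) > rt_tol:
            continue
        score = cosine(feature.peaks, ref.peaks)
        if score >= min_cos:
            hits.append((ref.name, score))
    return sorted(hits, key=lambda h: -h[1])


def mass_shift(delta: float, tol: float = 0.005) -> str | None:
    """Name the conjugation whose mass matches |delta|, if any."""
    for name, shift in BIOTRANSFORMATIONS.items():
        if abs(abs(delta) - shift) <= tol:
            return name
    return None


def propagate(features: list[Spectrum], seeds: dict[str, str],
              min_cos: float = 0.6) -> dict[str, tuple[str, int, str | None]]:
    """Tier (ii) stand-in: breadth-first spread of seed annotations.

    Returns feature name -> (seed annotation, hops from a seed, mass shift or None).
    """
    by_name = {f.name: f for f in features}
    result: dict[str, tuple[str, int, str | None]] = {
        name: (ann, 0, None) for name, ann in seeds.items()
    }
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for src_name in frontier:
            src = by_name[src_name]
            seed_ann, hops, _ = result[src_name]
            for f in features:
                # An edge requires spectral similarity; the mass shift only labels it.
                if f.name in result or cosine(src.peaks, f.peaks) < min_cos:
                    continue
                shift = mass_shift(f.precursor_mz - src.precursor_mz)
                result[f.name] = (seed_ann, hops + 1, shift)
                next_frontier.append(f.name)
        frontier = next_frontier
    return result
```

In use, features that pass `library_match` become the seeds passed to `propagate`. Anything reached only through the network is better reported at a lower confidence level than a direct library hit, in the same spirit as the abstract's distinction between direct LC-MS database annotations and the larger total obtained after adding SGMNS.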
PMID: 42384603
Bibliographic data and abstract were imported from PubMed on 02 Jul 2026.