Large Language Model-Generated Dietary Metabolite Biomarker Database Drives Deep Annotation of the Human Diet Metabolome.

Authors

Zijun Nie, Fujian Zheng, Dejun Hu, Zhenzhen Fu, Chongjiang Cao

Published in

Analytical chemistry. Jul 01, 2026. Epub Jul 01, 2026.

Abstract

The annotation of dietary biomarkers is crucial for nutritional epidemiology. While untargeted liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is a powerful analytical approach, the annotation of dietary biomarkers is hampered by the low specificity of existing public databases, which limits annotation coverage and accuracy. To address this limitation, we developed a novel database construction strategy and a dual-annotation workflow. We first employed an automated, large language model (LLM)-based text-mining pipeline to parse 7339 scientific articles and supplementary materials, creating the Dietary Metabolite Biomarker Database (DMBDB), which contains 4983 nonredundant biomarkers. The LLM workflow demonstrated high performance, achieving an F1 score of 0.9269 for biomarker name recognition. Subsequently, two complementary annotation strategies were designed: (i) a specialized LC-MS database derived from DMBDB, incorporating predicted retention times and experimental MS/MS spectra for high-confidence matching, and (ii) a structure-guided molecular networking strategy (SGMNS) that uses DMBDB as background knowledge to annotate dietary biomarkers and their metabolites lacking spectral evidence. The framework was validated using untargeted LC-HRMS analysis of urine samples. The LC-MS database directly annotated 566 metabolites, and integration with SGMNS expanded the total number of annotations to 2078. The LLM-driven database construction combined with the dual-strategy annotation framework provides a powerful paradigm for achieving high-coverage and high-accuracy dietary metabolomics.
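The abstract reports an F1 score for biomarker name recognition without describing how it was computed. As a minimal sketch of one common way such a score could be obtained, the Python below compares a set of extracted names against a reference set. The function name `f1_for_names`, the case-insensitive exact matching, and the example name lists are all illustrative assumptions, not the authors' evaluation protocol or data.

```python
def f1_for_names(predicted, gold):
    """Return (precision, recall, F1) for extracted names vs. a reference set.

    Assumption: names match if equal after trimming and lowercasing.
    The paper's actual matching rule is not stated in the abstract.
    """
    pred = {name.strip().lower() for name in predicted}
    ref = {name.strip().lower() for name in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example lists, for illustration only.
extracted = ["Proline betaine", "hippuric acid", "caffeine"]
reference = ["proline betaine", "Hippuric acid", "trigonelline", "daidzein"]
precision, recall, f1 = f1_for_names(extracted, reference)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

F1 is the harmonic mean of precision and recall, so a high value such as the reported 0.9269 requires both few spurious names and few missed names.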

PMID: 42384603
Bibliographic data and abstract were imported from PubMed on 02 Jul 2026.
