Authors
Dai, C., Gabriels, R., Bouwmeester, R., Larrea, A., Scheid, J., Webel, H., He, F., Martens, L., Kohlbacher, O., Bai, M., Xie, L., Sachsenberg, T., Perez-Riverol, Y.
Abstract
The growing volume of public proteomics datasets and the advent of novel machine learning (ML)-based methods create unprecedented opportunities for discovery through large-scale reanalysis. However, traditional desktop tools are increasingly insufficient for processing and integrating data at this scale. To address this challenge, we present a novel package, quantms-rescoring, that extends the cloud-native quantms workflow with a machine learning-based rescoring module. Unlike prior tools that rescore single-engine outputs, quantms-rescoring integrates multiple search engines (SAGE, COMET, and MSGF+), performs automatic model selection and fine-tuning, and scales reproducibly on cloud infrastructures. It relies on multiple fragment-ion intensity predictors (AlphaPeptDeep and MS2PIP) and a retention-time predictor (DeepLC) to improve results from multiple peptide database search engines, and it automatically selects, fine-tunes, and retrains the MS/MS intensity and retention-time models to find the best model for a given dataset. We applied the novel workflow to five representative datasets spanning DDA label-free quantification, tandem mass tag (TMT) 10-plex isobaric labeling of tumor proteomics data, immunopeptidomics, phosphoproteomics, and unseen lysine malonylation experiments. We achieved a 16% to 22.8% increase in identified spectra, along with the quantification of 2191 additional phosphorylated peptides and 1337 phosphosites. In the TMT-labeled clear cell renal cell carcinoma dataset, multiple search engines with quantms-rescoring identified 76 novel differentially expressed proteins. Additionally, 11,688 novel potential HLA-II binders were detected in the immunopeptidomics dataset by multiple search engines with quantms-rescoring. For unseen malonylation data, we reported more than 58.8% more malonylation PSMs and 30.5% more modification sites than COMET alone.
Together, these results show that multi-engine searches and machine learning-derived features can be combined in a scalable workflow that enhances identification, PTM localization, and quantification performance.
Preprint server:
bioRxiv
The authors list and abstract were imported from bioRxiv on 13 Jan 2026.