Resolution of recursive data corruption to transform T-cell epitope discovery

Authors

Preibisch, G., Tyrolski, M., Kucharski, P., Gizinski, S., Grzegorczyk, P., Moon, S., Kim, S., Zaro, B., Gambin, A.

Abstract

Accurate prediction of MHC class I-presented peptides is essential for any vaccine or T-cell therapy design, yet reported gains on in silico benchmarks have not translated into clinical successes. We show that this discrepancy comes from a methodological error: immunopeptidomics datasets are fundamentally contaminated by existing prediction models through prediction-based deconvolution and filtering, an iterative confirmation bias. An audit of the IEDB, the biggest database in the field, reveals that over 70% of published data was labeled by computational models rather than verified experimentally. This inflates in silico benchmarks while destroying real-world applicability on new data, effectively making it impossible to design new therapies. In silico simulation shows that iterative data corruption maintains high AUROC while top-of-list retrieval collapses. We reframe epitope discovery as a protein-centric learning-to-rank task and introduce deepMHCflare, a model evaluated exclusively on clean data. deepMHCflare achieves 0.80 Precision@4 on mono-allelic benchmarks versus 0.55-0.65 for gold-standard prediction models. Prospective, head-to-head in vivo tests further confirm this: in a preclinical cancer vaccine study, deepMHCflare identified two of four immunogenic peptides versus none of four for the field standard.
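
The contrast the abstract draws between AUROC and top-of-list retrieval can be illustrated with a toy calculation. The sketch below is not the authors' evaluation code: the function names, the record layout `(protein_id, score, label)`, and the toy scores are assumptions made for illustration, and the abstract does not state how Precision@4 is aggregated across proteins. Here it is averaged per protein, and proteins with fewer than `k` candidate peptides are still divided by `k`.

```python
from collections import defaultdict


def precision_at_k(scores, labels, k=4):
    """Fraction of the k highest-scoring peptides that are true positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def mean_precision_at_k_by_protein(records, k=4):
    """Rank peptides within each source protein, then average Precision@k.

    records: iterable of (protein_id, score, label) tuples (hypothetical layout).
    """
    by_protein = defaultdict(list)
    for protein_id, score, label in records:
        by_protein[protein_id].append((score, label))
    values = []
    for peptides in by_protein.values():
        scores = [score for score, _ in peptides]
        labels = [label for _, label in peptides]
        values.append(precision_at_k(scores, labels, k))
    return sum(values) / len(values)


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy ranking: 100 peptides, 4 true positives placed just below 4 false positives.
toy_scores = list(range(100, 0, -1))
toy_labels = [0] * 4 + [1] * 4 + [0] * 92

print(auroc(toy_scores, toy_labels))              # 92/96, about 0.958
print(precision_at_k(toy_scores, toy_labels, 4))  # 0.0
```

In this toy ranking, four false positives occupy the top four slots, so Precision@4 is 0 even though AUROC is about 0.96. A global ranking metric can therefore look strong while the short list that would actually be tested contains no true epitopes.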

Preprint server: bioRxiv
The author list and abstract were imported from bioRxiv on 02 Apr 2026.