Authors
Gupta, R.
Abstract
Recombinant protein production lets scientists insert custom DNA into host cells to produce specific proteins. Current industry-standard tools optimize recombinant DNA using the Codon Adaptation Index (CAI), yet this changes the local mRNA secondary structure and Minimum Free Energy (MFE) at the translation initiation region, creating hairpins that block ribosome loading and initiation and thus limit protein production. mRNA secondary structure around the start codon, which disrupts ribosome docking and initiation, is the true rate-limiting barrier to protein synthesis, and it is 22x more correlated with protein production than CAI. This project introduces TOM, a novel Transformer deep learning model that optimizes mRNA secondary structure at the critical translation initiation region. Over 40 million natural E. coli initiation sequences were downloaded; after rigorous data filtration, only 10,000 ideal, non-redundant, naturally occurring sequences were used to train the model. TOM is benchmarked on MFE, adenine count, codon usage, and a negative element analysis against industry-standard optimizers and significantly outperforms them. TOM's improvement of the MFE at the initiation region offers a significant increase in protein production by addressing the rate-limiting step of translation initiation.
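For readers unfamiliar with two of the benchmark metrics named in the abstract, the following is a minimal sketch of how CAI and adenine count are commonly computed. It is not the authors' code. The codon counts in `TOY_COUNTS` are hypothetical placeholders rather than real E. coli usage data, and the function names are illustrative. CAI is taken here in its usual form, the geometric mean of each codon's relative adaptiveness (its count divided by the count of the most frequent synonymous codon). MFE is not sketched because it requires an RNA folding package.

```python
import math

# Hypothetical toy codon counts for two amino acids; NOT real E. coli data.
TOY_COUNTS = {
    "K": {"AAA": 30, "AAG": 10},
    "L": {"CTG": 50, "CTC": 10, "TTA": 10},
}

def relative_adaptiveness(counts_by_aa):
    """w(codon) = count(codon) / highest count among its synonymous codons."""
    weights = {}
    for codon_counts in counts_by_aa.values():
        top = max(codon_counts.values())
        for codon, n in codon_counts.items():
            weights[codon] = n / top
    return weights

def cai(seq, weights):
    """Geometric mean of w over codons that have a weight.

    Codons missing from `weights` (here: anything outside the toy table,
    such as the ATG start codon) are skipped. Weights must be non-zero.
    """
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

def adenine_count(seq, start=0, end=None):
    """Number of adenines in seq[start:end], case-insensitive."""
    return seq[start:end].upper().count("A")

weights = relative_adaptiveness(TOY_COUNTS)
print(cai("ATGAAACTG", weights))   # both scored codons are the preferred ones
print(cai("ATGAAGCTG", weights))   # one rarer codon lowers the score
print(adenine_count("ATGAAACTG"))
```

Swapping a preferred codon for a rarer synonym lowers CAI without changing the encoded protein, which is why CAI-driven optimization can alter the nucleotide sequence, and hence the secondary structure, near the start codon.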
Preprint server:
bioRxiv
The authors list and abstract were imported from bioRxiv on 04 Nov 2025.