UTRGen: A unified framework for full-spectrum design of mRNA 5' UTRs

Created on 29 Jun 2026

Authors

Wang, Z., Chen, M., Zhu, X., Fang, X., Cheng, Z., Lang, M., Zhang, J., Huang, J., Li, X.

Abstract

The 5' untranslated region (5' UTR) is a key regulatory element that governs mRNA translation and protein output. However, existing computational methods typically address isolated tasks such as functional prediction or sequence optimization, limiting their ability to support rational design across the full 5' UTR engineering workflow. Here, we present UTRGen, a unified modeling framework for 5' UTRs that integrates sequence generation, multi-property prediction, and constrained function-guided design. UTRGen is pre-trained autoregressively on large-scale 5' UTR datasets from multiple species and subsequently adapted to diverse downstream regulatory tasks. Across systematic evaluations, UTRGen generates novel and diverse 5' UTRs while preserving sequence, structural, and functional characteristics of natural UTRs. After task-specific fine-tuning, UTRGen achieves state-of-the-art performance across 14 benchmark datasets, improving translation efficiency prediction by up to 11.1%, expression level prediction by up to 13.2%, and mean ribosome load prediction by up to 3.0% relative to the strongest baselines. It also achieves the best overall performance for internal ribosome entry site identification. To enable controllable design, we formulate function-guided 5' UTR design as a GRPO-based refinement process over a pre-trained autoregressive sequence prior, using composite rewards to encode functional objectives and biological constraints while regularizing toward the natural 5' UTR distribution. The resulting sequences show consistently improved predicted translation efficiency and expression levels across cellular contexts, and reveal interpretable sequence features associated with high activity, including reduced C content, fewer upstream AUGs, and depletion of inhibitory motifs. Together, our results establish a unified modeling strategy for 5' UTR design and lay a foundation for programmable control of translation.
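The abstract does not specify the reward terms or training details, so the following is only an illustrative sketch of how a composite reward and the group-relative advantage used in GRPO-style training might look. The function names, the penalty weights (`w_uaug`, `w_c`), and the choice of upstream-AUG count and C fraction as penalty terms are all assumptions made for illustration, loosely motivated by the features the abstract associates with high activity. The predicted translation efficiency is passed in as a plain number because the predictor itself is not described here, and the regularization toward the natural 5' UTR distribution is not shown.

```python
from statistics import mean, pstdev


def count_upstream_augs(utr: str) -> int:
    """Count AUG triplets at any position in the UTR (DNA 'T' is read as 'U')."""
    seq = utr.upper().replace("T", "U")
    return sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG")


def c_fraction(utr: str) -> float:
    """Fraction of cytosines in the sequence; 0.0 for an empty sequence."""
    seq = utr.upper()
    return seq.count("C") / len(seq) if seq else 0.0


def composite_reward(utr: str, predicted_te: float,
                     w_uaug: float = 0.5, w_c: float = 1.0) -> float:
    """Hypothetical composite reward: a functional score minus constraint penalties.

    predicted_te stands in for the output of some translation-efficiency
    predictor; the weights are arbitrary illustrative values.
    """
    return predicted_te - w_uaug * count_upstream_augs(utr) - w_c * c_fraction(utr)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one group of sampled sequences.

    GRPO scores each sample relative to the other samples drawn for the same
    prompt instead of using a learned value baseline. Population standard
    deviation is used here as a simple choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up sequences and made-up predictor scores, for demonstration only.
    group = [("GGAUGCCACC", 1.2), ("GGAAGUUACC", 1.1), ("CCCCAUGAUG", 1.4)]
    rewards = [composite_reward(seq, te) for seq, te in group]
    for (seq, _), adv in zip(group, group_relative_advantages(rewards)):
        print(seq, round(adv, 3))
```

In a full training loop these advantages would weight the policy-gradient update of the sequence generator, typically together with a divergence penalty that keeps the refined model close to the pre-trained prior.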

Preprint server: bioRxiv
The authors list and abstract were imported from bioRxiv on 29 Jun 2026.
