Authors
Qizhe, Z., Zhengyang, Z., Kepeng, L., Wang, J., Kaixuan, D., Xianglei, X., Wei, X., Xuehai, H.
Abstract
High-quality plant genome assemblies are rapidly increasing, but accurate structural annotation remains reliant on transcript and homology evidence, limiting applications in newly sequenced and non-model species. Here, we present PlantGeneAnn, a plant-optimized, strand-specific genome foundation model for ab initio gene structure annotation. Fine-tuned on only nine high-quality model plant annotations, PlantGeneAnn outperformed a multi-species model trained on 42 species, showing that annotation quality is more important than token volume. On a stringent 13-species benchmark covering rosids, asterids, and monocots, PlantGeneAnn surpassed four state-of-the-art baselines across five evaluation levels, from base-level classification to complete transcript recovery. It achieved higher intron precision and better captured complex gene structures. In zero-shot variant effect prediction, PlantGeneAnn identified cryptic splice donors and premature stop codons in maize and rice, with saturation mutagenesis confirming single-nucleotide, context-dependent sensitivity. It also retained generalizability for epigenomic track prediction, highlighting its value for pan-genomics, crop improvement, and non-model plant research.
Preprint server: bioRxiv
The author list and abstract were imported from bioRxiv on 27 Jun 2026.