Homology-aware cross-validation strategies for generalization assessment in RNA structure prediction

Authors

Bugnon, L., Kulemeyer, G., Gerard, M., Di Persia, L., Stegmayer, G., Milone, D. H.

Abstract

RNA secondary structure prediction is a fundamental challenge in bioinformatics, essential for understanding the functional roles of non-coding RNAs. Recently, deep learning models have transformed the field with impressive results, leading to critical discussions regarding the validity of current cross-validation strategies. On the one hand, traditional random partitioning yields overoptimistic results due to data leakage from uncontrolled homology. On the other hand, removing from the training set all sequences that exhibit even the slightest resemblance to the testing sequences penalizes learning-based methods by requiring generalization to completely out-of-distribution sequences. While it is very simple to remove sequences and retrain a machine-learned model, it is very difficult to remove the experimental data used for parameter tuning and the sequences used for the development of classical thermodynamic methods. Thus, these classical methods often benefit from implicit knowledge leakage. In this work we critically review existing cross-validation strategies for RNA secondary structure prediction: random splitting, clustering-based splitting, and leaving one RNA family out for testing. We analyze the advantages and limitations of each strategy and extend them towards future directions that ensure fair comparisons across the full range of sequence similarities, with the same rigor for both classical and learning-based methods.
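
The three strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the toy records, the field names `family` and `cluster`, and the helper functions are all hypothetical, and the cluster ids are assumed to come from a separate sequence-clustering step that is not shown here.

```python
import random

# Hypothetical toy dataset: ids, family labels and cluster ids are made up.
# Cluster ids are assumed to be precomputed by an external clustering step.
RECORDS = [
    {"id": "seq1", "family": "tRNA", "cluster": 0},
    {"id": "seq2", "family": "tRNA", "cluster": 0},
    {"id": "seq3", "family": "tRNA", "cluster": 1},
    {"id": "seq4", "family": "5S_rRNA", "cluster": 2},
    {"id": "seq5", "family": "5S_rRNA", "cluster": 2},
    {"id": "seq6", "family": "tmRNA", "cluster": 3},
]


def random_split(records, test_fraction=0.2, seed=0):
    """Random partitioning: homology is ignored, so close relatives
    of a test sequence may remain in the training set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def group_split(records, key, test_groups):
    """Hold out whole groups (clusters or families), so that no group
    contributes sequences to both the training and the test set."""
    train = [r for r in records if r[key] not in test_groups]
    test = [r for r in records if r[key] in test_groups]
    return train, test


def leave_one_family_out(records):
    """Yield one (family, train, test) fold per RNA family."""
    for family in sorted({r["family"] for r in records}):
        train, test = group_split(records, "family", {family})
        yield family, train, test


if __name__ == "__main__":
    train, test = random_split(RECORDS)
    print("random:", len(train), "train /", len(test), "test")

    train, test = group_split(RECORDS, "cluster", {0})
    print("cluster 0 held out:", [r["id"] for r in test])

    for family, train, test in leave_one_family_out(RECORDS):
        print(family, "held out:", [r["id"] for r in test])
```

In this sketch the clustering-based and family-based splits share one helper, because both amount to holding out entire groups; they differ only in how the groups are defined and therefore in how dissimilar the test sequences are from the training data.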

Preprint server: bioRxiv
The authors list and abstract were imported from bioRxiv on 01 Jul 2026.


