SOORENA: Self-lOOp containing or autoREgulatory Nodes in biological network Analysis

Created on 05 Nov 2025

Authors

Arar, H., Aldahdooh, J., Nickchi, P., Jafari, M.

Abstract

Autoregulatory mechanisms, in which proteins modify their own activity or expression, are fundamental components of biological regulatory systems but remain challenging to identify systematically within the scientific literature. Manual curation is outpaced by publication growth, and self-regulation is often described only implicitly. To address the lack of automated tools for identifying protein autoregulatory mechanisms, we present SOORENA, a two-stage transformer-based model designed to predict and classify such mechanisms within PubMed abstracts. In Stage 1, the model determines whether a publication describes any form of protein autoregulation. In Stage 2, positive instances are further classified into one of seven mechanistic categories: autophosphorylation, autoubiquitination, autocatalytic activity, autoinhibition, autolysis, autoinducer production, and autoregulation. SOORENA was fine-tuned from PubMedBERT using a curated dataset of 1,332 experimentally validated abstracts sourced from UniProt-referenced publications. On a held-out test set, Stage 1 achieved an accuracy of 96.0% and a precision of 97.8%, limiting the propagation of false positives to Stage 2. Stage 2 demonstrated robust performance across all classes, with an overall accuracy of 95.5% and a macro-F1 score of 96.2%, including perfect classification for the two least-represented categories. Error analysis revealed that most misclassifications occurred between mechanistically related categories, suggesting that the model's learned representations reflect underlying biological relationships. We deployed SOORENA as a Shiny app enabling interactive search, metadata-based filtering, and confidence-weighted prioritization of predictions alongside standardized ontology definitions to support scientific exploration. These results demonstrate that domain-specific language models can scale the discovery and curation of biologically critical self-regulatory mechanisms.
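
The two-stage design described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `classify_abstract`, `Prediction`, `toy_stage1`, and `toy_stage2`, the 0.5 decision threshold, and the keyword-based stand-in classifiers are all assumptions made for illustration. In practice each stage would be a fine-tuned PubMedBERT classifier; the sketch only shows the cascade logic, in which Stage 2 runs solely on abstracts that Stage 1 flags as positive.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# The seven mechanistic categories listed in the abstract.
# "autoregulation" is last so the toy Stage 2 checks specific terms first.
MECHANISMS = (
    "autophosphorylation",
    "autoubiquitination",
    "autocatalytic activity",
    "autoinhibition",
    "autolysis",
    "autoinducer production",
    "autoregulation",
)

# Hypothetical interfaces: Stage 1 returns P(abstract describes autoregulation),
# Stage 2 returns (predicted mechanism, confidence).
Stage1 = Callable[[str], float]
Stage2 = Callable[[str], Tuple[str, float]]


@dataclass
class Prediction:
    is_autoregulatory: bool
    stage1_confidence: float
    mechanism: Optional[str] = None
    stage2_confidence: Optional[float] = None


def classify_abstract(text: str, stage1: Stage1, stage2: Stage2,
                      threshold: float = 0.5) -> Prediction:
    """Run the cascade: Stage 2 is only called when Stage 1 is positive."""
    p_positive = stage1(text)
    if p_positive < threshold:  # threshold value is an assumption
        return Prediction(False, p_positive)
    mechanism, confidence = stage2(text)
    if mechanism not in MECHANISMS:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    return Prediction(True, p_positive, mechanism, confidence)


# Toy stand-ins for the fine-tuned models (keyword matching, illustration only).
def toy_stage1(text: str) -> float:
    return 0.9 if "auto" in text.lower() else 0.1


def toy_stage2(text: str) -> Tuple[str, float]:
    lowered = text.lower()
    for mechanism in MECHANISMS:
        if mechanism in lowered:
            return mechanism, 0.8
    return "autoregulation", 0.5  # fall back to the generic category


if __name__ == "__main__":
    example = "The kinase undergoes autophosphorylation at Tyr residues."
    print(classify_abstract(example, toy_stage1, toy_stage2))
```

Because Stage 2 never sees abstracts rejected by Stage 1, a high-precision Stage 1 keeps false positives from being assigned a mechanism, which is the property the abstract highlights.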

Preprint server: bioRxiv
The authors list and abstract were imported from bioRxiv on 05 Nov 2025.
