Combining knowledge distillation and neural networks to predict protein secondary structure.

Created on 01 Sep 2025

Authors

Lufei Zhao, Jingyi Li, Biao Zhang, Xuchu Jiang

Published in

Scientific Reports. Volume 15. Issue 1. Pages 32031. Aug 31, 2025. Epub Aug 31, 2025.

Abstract

The secondary structure of a protein serves as the foundation for constructing its three-dimensional (3D) structure, which in turn is critical for determining its function and role in biological processes. Therefore, accurately predicting secondary structure not only facilitates the understanding of a protein's 3D conformation but also provides essential insights into its interactions, functional mechanisms, and potential applications in biomedical research. Deep learning models are particularly effective in protein secondary structure prediction because of their ability to process complex sequence data and extract meaningful patterns, thereby increasing prediction accuracy and efficiency. This study proposes a combined model, ITBM-KD, which integrates an improved temporal convolutional network (TCN), bidirectional recurrent neural network (BiRNN), and multilayer perceptron (MLP) to increase the accuracy of protein secondary structure prediction for octapeptides and tripeptides. By combining one-hot encoding, word vector representation of physicochemical properties, and knowledge distillation with the ProtT5 model, the proposed model achieves excellent performance on multiple datasets. To evaluate its effectiveness, two classic datasets, TS115 and CB513, containing 115 and 513 protein datasets, respectively, were used. In addition, 15,078 protein data points collected from the PDB database from June 6, 2018, to June 6, 2020, were used to further verify the robustness and generalizability of the model. This study improves prediction accuracy and provides an essential model for understanding protein structure and function, especially in resource-limited settings.
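As an illustration of two ingredients named in the abstract, the sketch below shows one-hot encoding of an amino acid sequence and a soft-target distillation loss between a teacher and a student. It is a minimal sketch under stated assumptions, not the ITBM-KD implementation: the abstract does not say how knowledge is transferred from ProtT5, so the temperature-scaled KL divergence used here is a common generic formulation chosen for illustration, and the names `one_hot_encode`, `softmax`, and `distillation_loss` are hypothetical.

```python
import math

# The 20 standard amino acids; this alphabet ordering is an arbitrary choice for the example.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue.

    Residues outside the 20-letter alphabet (e.g. 'X') map to an all-zero vector.
    """
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    return rows


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Assumption: a generic soft-target loss scaled by T^2, used here only to
    illustrate the idea of a student matching a teacher's output distribution.
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    encoded = one_hot_encode("ACDX")
    print(len(encoded), len(encoded[0]))
    # Three-class logits (e.g. helix, strand, coil) for a single residue.
    print(distillation_loss([0.5, 1.5, 2.5], [1.0, 2.0, 3.0]))
```

In a setup like this, the distillation term would typically be added to the ordinary cross-entropy loss on the true secondary structure labels, with a weighting factor chosen by validation.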

PMID: 40887463
Bibliographic data and abstract were imported from PubMed on 01 Sep 2025.
