Analysis of phylogenetic signal in protein language model embeddings.

Created on 18 Jun 2026

Authors

Brendonas Stakauskas, Paweł Górecki

Published in

Scientific Reports. Jun 17, 2026. Epub Jun 17, 2026.

Abstract

Protein language models learn high-dimensional representations of amino acid sequences that capture structural, functional and evolutionary information without explicit modeling. In this study, we examine whether distances derived from such representations can be used for phylogenetic tree inference in a zero-shot setting. Using protein families from the PANTHER database and simulated datasets with controlled evolutionary parameters, we compare trees inferred from protein language model embedding distances to trees inferred using classical phylogenetic analysis techniques and to a transformer-based distance predictor trained under explicit evolutionary models. We show that in the zero-shot setting phylogenetic signal is largely lost when sequences are represented by one fixed-sized vector, resulting in poor recovery of tree topology and branch lengths. Accumulating distances across aligned residue-level embeddings substantially improves topological accuracy, particularly for MSA-aware models, and can even match the performance of models specifically trained to infer distances for tree inference. However, distances in protein language model embedding space do not reliably reproduce evolutionary branch lengths.
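The contrast the abstract draws between one fixed-size vector per sequence and distances accumulated over aligned residue-level embeddings can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the pooling scheme, the distance metric, or how gaps are handled, so mean pooling, Euclidean distance, skipping gapped columns, and the function names `pooled_distance` and `accumulated_distance` are hypothetical choices rather than the authors' method.

```python
import numpy as np


def pooled_distance(emb_a, emb_b):
    """Distance between two sequences, each collapsed to one vector.

    Assumption: mean pooling over residues, then Euclidean distance.
    emb_a, emb_b: arrays of shape (n_residues, dim), one row per residue.
    """
    return float(np.linalg.norm(emb_a.mean(axis=0) - emb_b.mean(axis=0)))


def accumulated_distance(emb_a, emb_b, aln_a, aln_b):
    """Sum of per-residue distances over aligned columns.

    Assumption: aln_a and aln_b are rows of the same alignment ('-' = gap),
    and columns where either sequence has a gap contribute nothing.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    i = j = 0  # residue indices into emb_a and emb_b
    total = 0.0
    for col_a, col_b in zip(aln_a, aln_b):
        has_a = col_a != "-"
        has_b = col_b != "-"
        if has_a and has_b:
            total += float(np.linalg.norm(emb_a[i] - emb_b[j]))
        i += int(has_a)
        j += int(has_b)
    return total


# Toy embeddings (made-up numbers, not model output).
emb_x = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
emb_y = emb_x[::-1].copy()  # same residue vectors in reverse order

d_pooled = pooled_distance(emb_x, emb_y)
d_accum = accumulated_distance(emb_x, emb_y, "ACD", "DCA")
```

In this toy case the two sequences contain the same residue vectors in a different order, so the pooled distance is zero while the accumulated distance is not. This shows one way position-specific differences can vanish under pooling; it is not a claim about why signal is lost in the study. A matrix of such pairwise distances could then be passed to any distance-based tree method, such as neighbor joining.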

PMID: 42310357
Bibliographic data and abstract were imported from PubMed on 18 Jun 2026.
