TEDLH: Domain HMMs for sensitive detection of remote homologues.

Created on 04 Jul 2026

Authors

Claudia Alvarez-Carreño, Anton S Petrov, Vaishali P Waman, Ian Sillitoe, Christine Orengo

Published in

Bioinformatics (Oxford, England). Jul 03, 2026. Epub Jul 03, 2026.

Abstract

The Encyclopedia of Domains (TED) provides domain annotations for proteins in the AlphaFold Protein Structure Database (AFDB) using a consensus of three state-of-the-art structure-based methods. We used these annotations to construct profile Hidden Markov models (HMMs), collectively forming the TED Library of HMMs (TEDLH). TEDLH enables sensitive sequence and profile searches, supporting systematic exploration of protein domain families and their evolutionary relationships.
TEDLH links 934,186 domain HMMs to experimentally determined CATH-PDB structures through direct (primary) and transitive (secondary and tertiary) relationships. Fewer than half of TEDLH HMMs are directly linked to a CATH-PDB domain; the remaining models are connected through transitive relationships. These transitive links extend coverage into more divergent regions of sequence space and better represent CATH superfamily diversity.
HMM-HMM comparisons within CATH superfamily 3.30.70.100 illustrate how transitive relationships expand sequence coverage. In this superfamily, 5,640 TEDLH HMMs are connected to 173 CATH-PDB representatives. Primary, secondary, and tertiary relationships progressively capture more divergent sequences (pairwise sequence identity <20%) that retain structural similarity (TM-score ≥0.6) and a conserved two-layer α/β sandwich core fold.
All-against-all HMM-HMM comparisons across TEDLH also reveal sequence similarities across the CATH hierarchy (cross-hits). At low query coverage (<50%), cross-hits are more frequent between CATH classes, architectures and topologies, whereas at higher coverage thresholds (≥70%) they predominantly occur between superfamilies. These cross-hits are not driven by superfamily size or sequence diversity and can provide guidance for CATH curation. As an example, analysis of cross-hits between superfamilies 2.170.130.30 and 3.10.20.30 reveals evolutionary relationships between these groups.
TEDLH is compatible with HH-suite3 and is available for local use from FigShare (https://doi.org/10.6084/m9.figshare.28531754).
Supplementary data are available at Bioinformatics online.
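The cross-hit analysis summarised in the abstract groups hits by the CATH level at which the two domains diverge and by query coverage. The following is a minimal Python sketch of that bookkeeping, not the authors' implementation. It assumes four-part CATH codes (class.architecture.topology.superfamily) and query coverage expressed as a percentage. The function names `divergence_level` and `coverage_bin` are hypothetical, and the "intermediate" label for coverage between 50% and 70% is an assumption, since the abstract names only the <50% and ≥70% ranges.

```python
from typing import Optional

# The four levels of a CATH superfamily code, in order.
LEVELS = ("class", "architecture", "topology", "superfamily")


def divergence_level(code_a: str, code_b: str) -> Optional[str]:
    """Return the first CATH level at which two codes differ.

    Returns None when both codes name the same superfamily,
    i.e. the hit is not a cross-hit.
    """
    parts_a = code_a.split(".")
    parts_b = code_b.split(".")
    if len(parts_a) != 4 or len(parts_b) != 4:
        raise ValueError("expected four-part CATH codes such as 3.30.70.100")
    for level, a, b in zip(LEVELS, parts_a, parts_b):
        if a != b:
            return level
    return None


def coverage_bin(query_coverage: float) -> str:
    """Bin query coverage (in percent) using the thresholds from the abstract.

    The 'intermediate' label for 50% <= coverage < 70% is an assumption.
    """
    if not 0.0 <= query_coverage <= 100.0:
        raise ValueError("query coverage must be a percentage in [0, 100]")
    if query_coverage < 50.0:
        return "low"
    if query_coverage >= 70.0:
        return "high"
    return "intermediate"


if __name__ == "__main__":
    # The two superfamilies named in the abstract differ already at class level.
    print(divergence_level("2.170.130.30", "3.10.20.30"))
    # The coverage value here is made up for illustration only.
    print(coverage_bin(82.5))
```

Under this convention, a cross-hit "between superfamilies" is one where the first three levels match and only the last differs, so `divergence_level` returns `"superfamily"` for it.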

PMID: 42398027
Bibliographic data and abstract were imported from PubMed on 04 Jul 2026.
