Authors
Claudia Alvarez-Carreño, Anton S Petrov, Vaishali P Waman, Ian Sillitoe, Christine Orengo
Published in
Bioinformatics (Oxford, England). Jul 03, 2026. Epub Jul 03, 2026.
Abstract
The Encyclopedia of Domains (TED) provides domain annotations for proteins in the AlphaFold Protein Structure Database (AFDB) using a consensus of three state-of-the-art structure-based methods. We used these annotations to construct profile Hidden Markov models (HMMs), collectively forming the TED Library of HMMs (TEDLH). TEDLH enables sensitive sequence and profile searches, supporting systematic exploration of protein domain families and their evolutionary relationships.
TEDLH links 934,186 domain HMMs to experimentally determined CATH-PDB structures through direct (primary) and transitive (secondary and tertiary) relationships. Fewer than half of TEDLH HMMs are directly linked to a CATH-PDB domain; the remaining models are connected through transitive relationships. These transitive links extend coverage into more divergent regions of sequence space and better represent CATH superfamily diversity.
HMM-HMM comparisons within CATH superfamily 3.30.70.100 illustrate how transitive relationships expand sequence coverage. In this superfamily, 5,640 TEDLH HMMs are connected to 173 CATH-PDB representatives. Primary, secondary, and tertiary relationships progressively capture more divergent sequences (pairwise sequence identity <20%) that retain structural similarity (TM-score ≥0.6) and a conserved two-layer α/β sandwich core fold.
All-against-all HMM-HMM comparisons across TEDLH also reveal sequence similarities across the CATH hierarchy (cross-hits). At low query coverage (<50%), cross-hits are more frequent between CATH classes, architectures and topologies, whereas at higher coverage thresholds (≥70%) they predominantly occur between superfamilies. These cross-hits are not driven by superfamily size or sequence diversity and can provide guidance for CATH curation. As an example, analysis of cross-hits between superfamilies 2.170.130.30 and 3.10.20.30 reveals evolutionary relationships between these groups.
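The two ideas in the results above, tiered links to CATH-PDB domains and cross-hits across the CATH hierarchy, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that primary, secondary and tertiary relationships correspond to breadth-first distances 1, 2 and 3 from a CATH-PDB domain in a graph of significant HMM-HMM hits, and that a cross-hit is labelled by the first CATH level at which two four-part identifiers differ. The names `link_tiers` and `cross_hit_level`, and all identifiers in the usage note, are hypothetical.

```python
from collections import deque

CATH_LEVELS = ("class", "architecture", "topology", "superfamily")


def link_tiers(edges, anchors, max_tier=3):
    """Assign each HMM a tier by breadth-first distance from the anchors.

    Assumption: tier 1 = direct (primary) hit to a CATH-PDB domain,
    tiers 2 and 3 = transitive (secondary, tertiary) links.
    `edges` is an iterable of (hmm_a, hmm_b) pairs treated as undirected.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    tier = {anchor: 0 for anchor in anchors}
    queue = deque(tier)
    while queue:
        node = queue.popleft()
        if tier[node] == max_tier:
            continue  # do not extend beyond the tertiary tier
        for nxt in neighbours.get(node, ()):
            if nxt not in tier:
                tier[nxt] = tier[node] + 1
                queue.append(nxt)
    # Report only the HMMs, not the anchor domains themselves.
    return {node: t for node, t in tier.items() if t > 0}


def cross_hit_level(query_id, target_id):
    """Return the first CATH level at which two C.A.T.H identifiers differ.

    Returns None when both identifiers name the same superfamily.
    """
    q_parts = query_id.split(".")
    t_parts = target_id.split(".")
    if len(q_parts) != 4 or len(t_parts) != 4:
        raise ValueError("expected four-part CATH identifiers such as 3.30.70.100")
    for level, q, t in zip(CATH_LEVELS, q_parts, t_parts):
        if q != t:
            return level
    return None
```

For example, with the hypothetical hits `[("hmm1", "pdbA"), ("hmm2", "hmm1"), ("hmm3", "hmm2"), ("hmm4", "hmm3")]` and anchor `"pdbA"`, `link_tiers` places `hmm1`, `hmm2` and `hmm3` in tiers 1, 2 and 3 and leaves `hmm4` unassigned. `cross_hit_level("3.30.70.100", "3.30.70.999")` returns `"superfamily"`; the second identifier is made up for illustration.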
TEDLH is compatible with HH-suite3 and is available from FigShare https://doi.org/10.6084/m9.figshare.28531754 for local use.
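As a rough sketch of local use, the Python snippet below assembles an HH-suite3 `hhsearch` command line for a query against a downloaded copy of the library. The flags `-i`, `-d`, `-o` and `-cpu` are standard `hhsearch` options; the query file name, the output file name and the database prefix `databases/tedlh` are assumptions, since the layout of the FigShare archive is not described here. The helper name `hhsearch_command` is hypothetical.

```python
import shlex


def hhsearch_command(query, db_prefix, out_hhr, cpu=1):
    """Build an argument list for HH-suite3's hhsearch.

    `db_prefix` is the database basename that HH-suite expects (without the
    _hhm.ffdata style suffixes); the value used below is a placeholder.
    """
    return [
        "hhsearch",
        "-i", query,       # query alignment or HMM
        "-d", db_prefix,   # database basename
        "-o", out_hhr,     # result file in .hhr format
        "-cpu", str(cpu),  # number of threads
    ]


cmd = hhsearch_command("query.a3m", "databases/tedlh", "query_vs_tedlh.hhr", cpu=4)
print(shlex.join(cmd))
# To run the search, pass the list to subprocess.run(cmd, check=True)
# on a machine where HH-suite3 is installed.
```

Building the command as a list rather than a single string avoids shell quoting problems if paths contain spaces.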
Supplementary data are available at Bioinformatics online.
PMID:
42398027
Bibliographic data and abstract were imported from PubMed on 04 Jul 2026.