Authors
Kenji Gerhardt, Carlos A Ruiz-Perez, Luis M Rodriguez-R, Chirag Jain, James M Tiedje, James R Cole, Konstantinos T Konstantinidis
Published in
Nucleic Acids Research. Volume 53, Issue 8. Apr 22, 2025.
Abstract
Estimation of whole-genome relatedness and taxonomic identification are two important bioinformatics tasks in describing environmental or clinical microbiomes. The genome-aggregate Average Nucleotide Identity (ANI) is routinely used to derive the relatedness of closely related (species-level) microbial and viral genomes, but it is not appropriate for more divergent genomes. Average Amino-acid Identity (AAI) can be used in the latter cases, but no current AAI implementation can efficiently compare thousands of genomes. Here we present FastAAI, a tool that estimates whole-genome pairwise relatedness from shared tetramers of universal proteins in a matter of microseconds, providing a speedup of up to 5 orders of magnitude compared with current methods for calculating AAI or alternative whole-genome metrics. Further, FastAAI resolves distantly related genomes, i.e., those related at the phylum level, with accuracy comparable to that of ribosomal RNA gene phylogeny, substantially improving on a known limitation of current AAI implementations. Our analysis of the resulting AAI matrices also indicated that bacterial lineages predominantly evolve gradually rather than in bursts of diversification, and that AAI thresholds defining classes, orders, and families are generally elusive. FastAAI therefore uniquely expands the toolbox for microbiome analysis and allows such analyses to scale to millions of genomes.
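The abstract states that FastAAI scores relatedness from shared tetramers of universal proteins. The following is a minimal sketch of that idea and does not reproduce the published tool. It assumes each genome is represented as a dict that maps a marker-protein name to its amino-acid sequence. For every protein that both genomes contain, it computes the Jaccard index of their tetramer sets and then averages those values. The function names and the toy protein names and sequences are hypothetical. The conversion of this score into an AAI estimate is not shown.

```python
def tetramers(seq: str, k: int = 4) -> set[str]:
    """Return the set of overlapping length-k substrings of a protein sequence."""
    seq = seq.upper()
    # Sequences shorter than k produce an empty range, so they yield an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_tetramer_jaccard(genome_a: dict[str, str], genome_b: dict[str, str]) -> float:
    """Average per-protein tetramer Jaccard index over marker proteins present in both genomes.

    Hypothetical helper: each genome is assumed to map marker-protein names to sequences.
    """
    shared = sorted(genome_a.keys() & genome_b.keys())
    if not shared:
        raise ValueError("genomes share no marker proteins")
    scores = [jaccard(tetramers(genome_a[p]), tetramers(genome_b[p])) for p in shared]
    return sum(scores) / len(scores)


# Toy example with made-up marker proteins; only "rpsB" is shared.
g1 = {"rpsB": "MKLVIAE", "rplC": "MSGRAT"}
g2 = {"rpsB": "MKLVIAQ", "recA": "MAIDE"}
print(mean_tetramer_jaccard(g1, g2))  # 3 shared tetramers / 5 distinct -> 0.6
```

The design compares set overlap only and never aligns sequences. Under this sketch's assumptions, that avoids the alignment step that makes conventional AAI calculations expensive.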
PMID: 40287826
Bibliographic data and abstract were imported from PubMed on 27 Apr 2025.