Baktfold: Sensitive protein functional annotation across the microbial tree of life using structural information

Created on 03 Apr 2026

Authors

Bouras, G., Lim, S. w., Durr, L., Vreugde, S., Goesmann, A., Edwards, R. A., Schwengers, O.

Abstract

The functional annotation of protein sequences has progressed tremendously in recent years, yet too many protein sequences still remain so-called hypothetical proteins after state-of-the-art genome annotation pipelines have been applied. Here, we introduce Baktfold, a new command-line software tool for the ultra-sensitive, fast and taxon-independent annotation of protein sequences across the microbial tree of life. Baktfold conducts sequential protein structure-based searches against four complementary structure databases. Protein sequences are transformed into Foldseek 3Di tokens via the ProstT5 protein language model and subsequently searched against the structure databases via Foldseek. All results are exported as GFF3 and INSDC-compliant flat files as well as comprehensive JSON files, which facilitate automated downstream analysis and are fully interoperable with the popular bacterial annotation tool Bakta. We assessed Baktfold's performance in terms of wall-clock runtime and functional annotation of hypothetical proteins from various sources, including bacterial and archaeal isolates, plasmids, metagenome-assembled genomes and micro-eukaryotes. When benchmarked on over three hundred thousand species representatives across the prokaryotic tree of life, Baktfold's median overall bacterial genome annotation rate is 87.8% compared to 72.9% with Bakta, while Baktfold's median bacterial annotation rate of remaining hypothetical proteins is 50.1% (n=290258). For archaea, Baktfold's overall median annotation rate is 71.5% compared to Prokka's 35.8%, with a median archaeal annotation rate of hypothetical proteins of 68.0% (n=14058), making Baktfold the most sensitive automated archaeal annotation method by far. Baktfold is implemented in Python 3 and runs on macOS and Linux systems. It is freely available under an MIT license at https://github.com/gbouras13/baktfold.
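The abstract reports annotation rates without defining how they are computed. As a rough illustration only, the sketch below derives one plausible version of such a rate from GFF3 text: the fraction of CDS features whose `product` attribute is something other than "hypothetical protein". The function name `annotation_rate`, the reliance on a `product` attribute and the example records are assumptions made for illustration; they are not taken from Baktfold, and the authors' exact metric may differ.

```python
from urllib.parse import unquote


def annotation_rate(gff3_text: str) -> float:
    """Fraction of CDS features with a product other than 'hypothetical protein'.

    Assumption: each CDS carries its function in a 'product' attribute,
    and unannotated proteins are labelled 'hypothetical protein'.
    """
    total = 0
    annotated = 0
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only count well-formed CDS feature lines
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # GFF3 attribute values may be percent-encoded.
        product = unquote(attrs.get("product", "")).strip().lower()
        total += 1
        if product and product != "hypothetical protein":
            annotated += 1
    return annotated / total if total else 0.0


# Hypothetical records, made up for this example.
EXAMPLE = "\n".join([
    "##gff-version 3",
    "contig_1\texample\tgene\t1\t300\t.\t+\t.\tID=gene_1",
    "contig_1\texample\tCDS\t1\t300\t.\t+\t0\tID=cds_1;product=DNA polymerase III subunit beta",
    "contig_1\texample\tCDS\t400\t900\t.\t-\t0\tID=cds_2;product=hypothetical protein",
    "contig_1\texample\tCDS\t1000\t1500\t.\t+\t0\tID=cds_3;product=ABC transporter permease",
    "contig_1\texample\tCDS\t1600\t2100\t.\t+\t0\tID=cds_4;product=Transcriptional regulator%2C LysR family",
])

print(f"{annotation_rate(EXAMPLE):.1%}")
```

Running the same function on the GFF3 output of two annotation runs of one genome would give a simple side-by-side comparison, under the assumptions stated above.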

Preprint server: bioRxiv
The authors list and abstract were imported from bioRxiv on 03 Apr 2026.
