A complete-genome view of phylum Omnitrophota and a multi-order capacity for very long proteins

Created on 10 Jun 2026

Authors

Nielsen, T. N., Lui, L. M.

Abstract

Phylum Omnitrophota (formerly candidate division OP3) is represented in public databases almost entirely by metagenome-assembled genomes; GTDB R232 contains two complete Omnitrophota assemblies. We present 176 complete and 53 high-quality Omnitrophota genomes from Oxford Nanopore metagenomes of Fennoscandian deep groundwater and the Baltic Sea water column, an 88-fold expansion of the complete Omnitrophota count. The 229 genomes resolve to 202 distinct species at 95% ANI; 162 of these have no conspecific match in the 714 NCBI HQ Omnitrophota MAGs. Phylogenomically, 171 of the 176 complete genomes fall in class Gorgyraeia (which contains the cultured episymbiont Velamenicoccus archaeovorus) and 5 in Omnitrophia, with multiple Gorgyraeia orders and families represented. Phylum Omnitrophota hosts many very long proteins, with the longest in our corpus reaching 147,155 AA; the long end of the length distribution is concentrated on Gorgyraeia contigs across multiple Gorgyraeia orders. Between 24% and 28% of GTDB-Tk-classified Omnitrophota contigs in the deep-groundwater and Baltic samples host at least one protein of 10 kAA or longer. Across the 916-protein long-protein domain-architecture catalog, 94% carry transmembrane helices or a signal peptide; the four complete-genome proteins above 100,000 amino acids are all in inner-membrane-anchored architectures, and the 147,155-AA protein has 147 TM helices. The 176 complete genomes share a uniform metabolic profile across the dominant orders: intact bacterial peptidoglycan biosynthesis alongside fragmentary TCA, incomplete electron transport, absent aerobic terminal oxidase, and partial cofactor and amino-acid biosynthesis. The profile matches the cultured V. archaeovorus phenotype and is consistent with a host-dependent episymbiotic lifestyle.
Hypervariable-region calling across the 229 chromosomes returns 1,909 candidate loci, distributed across 223 of them; ribosomal-protein and EF-Tu/EF-G content sits inside called HVRs on 150 of those 223 (67%), recovering across the collection the housekeeping-cargo integrations documented in Nielsen (2026b). All genomes, the OrthoFinder supermatrix and its ML tree, the 916-protein giant-protein domain-architecture catalog, and per-step scripts are released as a community resource at Zenodo (DOI [TBD]).
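
The derived figures quoted in the abstract (the 88-fold expansion, the 229-genome total, the class split, and the 67% share of HVR-bearing chromosomes) can be rechecked from the raw counts. The snippet below is only an illustrative sketch: the variable names are invented here, the inputs are the counts stated in the abstract, and the novel-species fraction is a ratio computed for this example rather than a figure reported by the authors.

```python
# Counts as stated in the abstract; variable names are illustrative only.
complete_genomes = 176
high_quality_genomes = 53
gtdb_complete_assemblies = 2      # complete Omnitrophota assemblies in GTDB R232
gorgyraeia_complete = 171
omnitrophia_complete = 5
species_total = 202
species_without_ncbi_match = 162
chromosomes_with_hvrs = 223
chromosomes_with_housekeeping_in_hvr = 150

# Derived quantities.
total_genomes = complete_genomes + high_quality_genomes
fold_expansion = complete_genomes / gtdb_complete_assemblies
class_split_total = gorgyraeia_complete + omnitrophia_complete
housekeeping_pct = 100 * chromosomes_with_housekeeping_in_hvr / chromosomes_with_hvrs
# Not reported in the abstract; computed here purely as an illustration.
novel_species_pct = 100 * species_without_ncbi_match / species_total

print(f"total genomes: {total_genomes}")
print(f"fold expansion of complete genomes: {fold_expansion:.0f}")
print(f"complete genomes accounted for by class: {class_split_total}")
print(f"HVR chromosomes with housekeeping cargo: {housekeeping_pct:.0f}%")
print(f"species without an NCBI conspecific: {novel_species_pct:.0f}%")
```

All of the reported ratios follow directly from the stated counts, so the check needs no external data.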

Preprint server: bioRxiv
The authors list and abstract were imported from bioRxiv on 10 Jun 2026.
