Atacformer: A transformer-based foundation model for analysis and interpretation of ATAC-seq data

Created on 05 Nov 2025

Authors

LeRoy, N. J., Zheng, G., Khoroshevskyi, O., Campbell, D. R., Zhang, A., Sheffield, N. C.

Abstract

Introduction: Chromatin accessibility profiling is an important tool for understanding gene regulation and cellular function. While public repositories house nearly 10,000 scATAC-seq experiments, unifying this data for meaningful analysis remains challenging. Existing tools struggle with the scale and complexity of scATAC-seq datasets, limiting tasks like clustering, cell-type annotation, and reference mapping. A promising solution is using foundation models adapted to specific tasks via transfer learning. While transfer learning has been applied to scRNA-seq, its potential for scATAC-seq remains underexplored.

Methods: We introduce Atacformer, a transformer-based foundation model for scATAC-seq data analysis. Unlike other models that only produce cell-level representations, Atacformer generates embeddings for individual cis-regulatory elements. Pre-trained on a large atlas of scATAC-seq experiments, Atacformer learns robust representations of genomic regulatory regions for downstream use. After pretraining, the model is fine-tuned for cell-type prediction and batch correction. We also integrated Atacformer with RNA-seq data to build a Contrastive RNA-ATAC Fine Tuning (CRAFT) model capable of cross-modal alignment and RNA imputation from ATAC data.

Results: Atacformer matches or exceeds leading scATAC-seq clustering tools in adjusted Rand index and runtime, with fine-tuned models achieving top performance across datasets. It processes raw fragment files end-to-end 80% faster than existing tools while preserving biological structure. Fine-tuned on bulk BED files, it recovers cell type and assay labels with >80% accuracy. We show how the Atacformer architecture produces contextualized embeddings of individual genomic regions, which we use to identify unannotated, cell-type-specific promoter elements directly from chromatin accessibility data.
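The abstract reports clustering quality as adjusted Rand index (ARI). For readers unfamiliar with the metric, the following is a minimal, dependency-free sketch of the standard ARI formula computed from a contingency table of true and predicted labels. It is an illustration only and not the authors' evaluation code; the function name `adjusted_rand_index` and the toy labels are assumptions introduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard adjusted Rand index between two flat labelings (illustrative helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the two labelings trivially agree

    # Contingency counts: cells are (true, pred) pairs, margins are per-label counts.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that fall together in each cell / row / column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2         # upper bound on agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster IDs differ from cell-type names, but the partition is identical.
cell_types = ["T", "T", "B", "B"]
clusters = [1, 1, 0, 0]
score = adjusted_rand_index(cell_types, clusters)
```

Because ARI compares partitions rather than label names, a clustering that groups cells exactly as the reference annotation does scores 1 regardless of how clusters are numbered, while a random assignment scores close to 0.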

Preprint server: bioRxiv
The authors list and abstract were imported from bioRxiv on 05 Nov 2025.
