ESGI: Efficient splitting of generic indices in single-cell sequencing data

Created on 07 Mar 2026

Authors

Stohn, T., van de Brug, N. D., Theodosiadou, A., Thijssen, B., Jastrzebski, K., Wessels, L. F. A., Bosdriesz, E.

Abstract

Single-cell sequencing technologies increasingly rely on complex nucleotide barcoding schemes to encode cellular identities, experimental conditions, and multiple molecular modalities within a single experiment. While demultiplexing, alignment, and UMI-based quantification form the core preprocessing steps that transform raw sequencing reads into analyzable single-cell data, existing pipelines are often tightly coupled to specific experimental designs and typically assume fixed barcode positions and substitution-only error models. As a result, many emerging assays employing combinatorial, variable-length, or multimodal barcoding designs require custom, hard-coded preprocessing solutions that are difficult to generalize and maintain. Here, we present ESGI (Efficient Splitting of Generic Indices), a flexible and extendable framework for demultiplexing and processing single-cell sequencing data with arbitrary barcode architectures. ESGI operates directly on raw FASTQ files using a generic barcode pattern specification, supports barcode matching with insertions and deletions via Levenshtein distance, accommodates variable-length barcodes, and provides detailed quality metrics for barcode assignment. ESGI optionally integrates genome alignment via STAR and performs feature quantification and UMI collapsing to generate cell-by-feature count matrices. ESGI is well documented and readily applicable to novel single-cell experiments. We demonstrate the versatility of ESGI across six datasets spanning four distinct single-cell technologies, including combinatorial indexing-based transcriptomic and multimodal assays, feature barcode-based protein measurements, and spatial barcoding data. Across these applications, ESGI robustly demultiplexes complex barcode designs that are not natively supported by existing pipelines, while producing results comparable to established workflows where applicable. 
Together, ESGI provides a general and future-proof solution for preprocessing single-cell sequencing data, enabling rapid adoption and analysis of emerging experimental designs.
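
The indel-tolerant barcode matching described in the abstract can be illustrated with a minimal Python sketch. This is not ESGI's implementation or interface: the function names `levenshtein` and `assign_barcode`, the `max_dist` threshold, and the policy of rejecting reads that are equally close to two barcodes are assumptions made here for illustration only. It shows how a Levenshtein-based comparison can assign an observed sequence to a whitelist of variable-length barcodes even when the read carries an insertion or deletion, which a substitution-only (Hamming) comparison cannot handle.

```python
from typing import Optional, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def assign_barcode(observed: str,
                   whitelist: Sequence[str],
                   max_dist: int = 1) -> Optional[Tuple[str, int]]:
    """Return (barcode, distance) for the unique closest whitelist entry
    within max_dist edits, or None if nothing matches or the best match
    is ambiguous. The tie-rejection policy is an assumption."""
    best, best_dist, tie = None, max_dist + 1, False
    for candidate in whitelist:
        d = levenshtein(observed, candidate)
        if d < best_dist:
            best, best_dist, tie = candidate, d, False
        elif d == best_dist and d <= max_dist:
            tie = True
    if best is None or tie:
        return None
    return best, best_dist


if __name__ == "__main__":
    # Hypothetical whitelist with barcodes of different lengths.
    whitelist = ["ACGTACGT", "TTGGCCAA", "GATTACA"]
    print(assign_barcode("ACGACGT", whitelist))   # one base deleted
    print(assign_barcode("GATTTACA", whitelist))  # one base inserted
    print(assign_barcode("CCCCCCCC", whitelist))  # no barcode within range
```

The pairwise scan over the whitelist is quadratic per comparison and linear in whitelist size, which is adequate for a demonstration but would likely need indexing or early termination at sequencing scale.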

Preprint server: bioRxiv
The author list and abstract were imported from bioRxiv on 07 Mar 2026.
