Authors
Niu, X., Wang, J., Wan, S.
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) is routinely used to build tissue atlases, resolve developmental trajectories, and characterize disease microenvironments. Yet many biologically and clinically meaningful populations, including transient progenitors, therapy-resistant tumor subclones, and antigen-specific lymphocytes, occur at very low frequencies (<1%) and are easily missed by standard clustering pipelines. Existing approaches often require extensive manual curation, rely on known marker genes, or trade off sensitivity against unacceptably high false-positive rates, because metrics such as the Gini index are insensitive to heavy-tailed distributions. A scalable, statistically grounded method is needed that sensitively detects rare populations while providing calibrated confidence and interpretable molecular signatures.
Results: We present PalmaClust, a graph-fusion clustering framework that repurposes the Palma ratio, a tail-sensitive inequality metric from the social sciences, to identify marker genes driven by extreme sparsity. PalmaClust constructs and fuses multiple K-nearest neighbor (KNN) graphs derived from complementary gene-selection statistics, including the Palma ratio, the Gini index, and the Fano factor. It also employs a local refinement strategy that re-prioritizes Palma-ranked genes within parent clusters. Benchmarking across diverse public scRNA-seq datasets shows that PalmaClust consistently outperforms state-of-the-art baselines, improving rare-class F1 scores by at least 20% (absolute) while maintaining high global clustering stability. Further studies demonstrate that the graph layer derived from the Palma ratio is essential for capturing ultra-rare signatures that the other views miss.
Availability: https://github.com/wan-mlab/PalmaClust.
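The abstract names three per-gene statistics used to build the KNN graph views. As a reference point only, the sketch below computes their standard textbook definitions for a single gene's expression vector across cells: the Palma ratio (share of total expression held by the top 10% of cells divided by the share held by the bottom 40%), the Gini index, and the Fano factor (variance over mean). It is not PalmaClust's code. The function names and the pseudocount `eps`, which keeps the Palma ratio finite when the bottom 40% of cells have zero counts (common for sparse genes), are assumptions.

```python
import numpy as np


def palma_ratio(x, eps=1e-8):
    """Standard Palma ratio: share of total expression held by the top 10% of
    cells divided by the share held by the bottom 40%.

    eps is a hypothetical pseudocount (an assumption, not from PalmaClust) that
    keeps the ratio finite when the bottom 40% of cells all have zero counts.
    """
    x = np.sort(np.asarray(x, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if total == 0:
        return 0.0  # an all-zero gene carries no signal
    k_top = max(1, int(0.1 * n))  # cells in the top decile
    k_bottom = int(0.4 * n)  # cells in the bottom 40%
    top_share = x[n - k_top:].sum() / total
    bottom_share = x[:k_bottom].sum() / total
    return float(top_share / (bottom_share + eps))


def gini(x):
    """Gini index via the sorted-values formula:
    G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with i = 1..n."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    total = x.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def fano(x):
    """Fano factor: population variance divided by mean (0 for an all-zero gene)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    return float(x.var() / m) if m > 0 else 0.0


if __name__ == "__main__":
    # Toy gene detected in 5 of 1,000 cells (0.5%), like a rare-population marker.
    rare = np.zeros(1000)
    rare[:5] = 10.0
    # Toy gene expressed at the same level in every cell.
    uniform = np.ones(1000)
    for name, gene in [("rare", rare), ("uniform", uniform)]:
        print(name, palma_ratio(gene), gini(gene), fano(gene))
```

For the rare toy gene, the bottom-40% share is exactly zero, so its Palma ratio is governed by `eps`; the abstract does not say how PalmaClust handles this case, so any real use should follow the repository's own implementation.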
Preprint server: bioRxiv
The authors list and abstract were imported from bioRxiv on 19 Mar 2026.