Authors
Surana, P., Dutta, P., Papineni, N., Sathian, R., Zhou, Z., Liu, H., Davuluri, R. V.
Abstract
Characterizing tissue-specific (TSp) gene expression is crucial for understanding development and disease; however, traditional expression-based methods often overlook the latent regulatory grammar embedded in non-coding DNA, particularly in distal promoter regions. Here, we introduce TSProm, a framework that specializes a DNA foundation model (DNABERT2) to decipher the long-range regulatory logic of TSp promoters at the gene isoform level. The contributions of our work are twofold. First, we propose a novel comparative design that trains two distinct models: model A, for general promoter biology, and model B, for TSp regulation. Together, these models enable the precise isolation of sequence motifs around the transcription start site that uniquely define tissue identity. Second, we introduce a comprehensive explainable AI (xAI) module that integrates attention-based discovery with model-agnostic SHAP analysis for a robust, cross-validated interpretation of learned features. Applying this framework to human brain, liver, and testis promoters, we identified and validated clinically relevant transcription factors (TFs) in brain, such as SP1, MYC, and HES6, and confirmed their known roles in diseases such as gliomas and neuroblastomas. Our analysis further revealed that C2H2 zinc finger proteins are a dominant feature of the global landscape of TSp gene regulation. TSProm provides a novel and interpretable framework for identifying TSp gene regulatory elements, offering powerful computational tools for the study of TSp gene regulation in normal and disease conditions.
Preprint server:
bioRxiv
The author list and abstract were imported from bioRxiv on 01 Nov 2025.