Authors
Shur, A., Tziony, I., Orenstein, Y.
Abstract
Minimizers are sampling schemes which are ubiquitous in almost any high-throughput sequencing analysis. Assuming a fixed alphabet of size {sigma}, a minimizer is defined by two positive integers k,w and a linear order {rho} on k-mers. A sequence is processed by a sliding window algorithm that chooses in each window of length w+k-1 its minimum k-mer with respect to {rho}. A key characteristic of a minimizer is its density, which is the expected frequency of chosen k-mers among all k-mers in a random infinite {sigma}-ary sequence. Minimizers of smaller density are preferred as they produce smaller samples, which lead to reduced runtime and memory usage in downstream applications. While the hardness of finding a minimizer of minimum density for given input parameters ({sigma},k,w) is unknown, it has a huge search space of ({sigma}^k)! and there is no known algorithm apart from a trivial brute-force search. In this paper, we tackle the minimum density problem for minimizers. We first formulate this problem as an ILP of size {Theta}(w{sigma}^(w+k)), which has worst-case solution time that is doubly-exponential in (k+w) under standard complexity assumptions. Our experiments show that an ILP solver terminates with an optimal solution only for very small k and w. We then present our main method, called OptMini, which computes an optimal minimizer in O(w*2^{{sigma}^k+O(k)}) time and thus is capable of processing large w values. In experiments, OptMini works much faster than the runtime predicts due to several additional tricks shrinking the search space without harming optimality. We use OptMini to compute minimum-density minimizers for ({sigma},k)[isin]{(2,2),(2,3),(2,4),(2,5),(2,6),(4,2)} and w[isin][2,3{sigma}^k], with the exception of certain w-ranges for k=6 and the single case of k=5, w=2. Finally, we derive conclusions and insights regarding the density values as a function of w, patterns in optimal minimizer orders, and the relation between minimum-size universal hitting sets and minimum-density minimizers.
Preprint server:
bioRxiv
The authors list and abstract were imported from bioRxiv on 29 Jan 2026.
Advertisement
Stats
- Recommendations n/a n/a positive of 0 vote(s)
- Views 11
- Comments 0