Sequence mappability is an important task in genome re-sequencing. In the
(k,m)-mappability problem, for a given sequence T of length n, our goal
is to compute a table whose $i$th entry is the number of indices $j \neq i$ such
that length-m substrings of T starting at positions i and j have at
most k mismatches. Previous works on this problem focused on heuristic
approaches to compute a rough approximation of the result or on the case of
k=1. We present several efficient algorithms for the general case of the
problem. Our main result is an algorithm that works in
$O(n \min\{m^k, \log^{k+1} n\})$ time and $O(n)$ space for
$k=O(1)$. It requires a careful adaptation of the technique of Cole
et al.~[STOC 2004] to avoid multiple counting of pairs of substrings. We also
show $O(n^2)$-time algorithms to compute all results for a fixed $m$
and all $k=0,\ldots,m$ or a fixed $k$ and all $m=k,\ldots,n-1$. Finally, we show
that the (k,m)-mappability problem cannot be solved in strongly subquadratic
time for $k, m = \Theta(\log n)$ unless the Strong Exponential Time Hypothesis
fails.

Comment: Accepted to SPIRE 201
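
To make the problem definition concrete, the following is a minimal brute-force sketch of the $(k,m)$-mappability table. It is not any of the algorithms summarized above: it simply compares every pair of length-$m$ substrings and therefore takes $O(n^2 m)$ time in the worst case. The function name `naive_mappability` and the use of 0-based positions are assumptions made for illustration only.

```python
from typing import List


def naive_mappability(text: str, m: int, k: int) -> List[int]:
    """Brute-force (k, m)-mappability table (illustrative baseline only).

    Entry i is the number of start positions j != i such that the
    length-m substrings of `text` starting at i and j differ in at
    most k positions. Positions are 0-based (an assumption of this sketch).
    """
    num_starts = len(text) - m + 1
    if m <= 0 or num_starts <= 0:
        return []

    counts = [0] * num_starts
    # The mismatch relation is symmetric, so each unordered pair (i, j)
    # is examined once and credited to both entries.
    for i in range(num_starts):
        for j in range(i + 1, num_starts):
            mismatches = 0
            for offset in range(m):
                if text[i + offset] != text[j + offset]:
                    mismatches += 1
                    if mismatches > k:
                        break  # already over budget; stop comparing
            if mismatches <= k:
                counts[i] += 1
                counts[j] += 1
    return counts


if __name__ == "__main__":
    # Substrings of "abab" with m = 2 are "ab", "ba", "ab".
    print(naive_mappability("abab", 2, 0))  # [1, 0, 1]
```

A baseline of this kind is mainly useful for checking the output of a faster implementation on short inputs.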