Efficient Computation of Sequence Mappability


Authors: Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Juliusz Straszyński
Publication date: 31 July 2018

Abstract

Sequence mappability is an important task in genome re-sequencing. In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, our goal is to compute a table whose $i$th entry is the number of indices $j \ne i$ such that length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristic approaches to compute a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that works in $\mathcal{O}(n \min\{m^k,\log^{k+1} n\})$ time and $\mathcal{O}(n)$ space for $k=\mathcal{O}(1)$. It requires a careful adaptation of the technique of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. We also show $\mathcal{O}(n^2)$-time algorithms to compute all results for a fixed $m$ and all $k=0,\ldots,m$ or a fixed $k$ and all $m=k,\ldots,n-1$. Finally we show that the $(k,m)$-mappability problem cannot be solved in strongly subquadratic time for $k,m = \Theta(\log n)$ unless the Strong Exponential Time Hypothesis fails.

Comment: Accepted to SPIRE 201
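To make the problem definition concrete, the following is a minimal brute-force sketch of the $(k,m)$-mappability table. It is not one of the algorithms from the paper: it simply compares every pair of length-$m$ substrings, so it runs in roughly $\mathcal{O}(n^2 m)$ time and is only useful as a reference for checking faster implementations on small inputs. The function names `hamming` and `mappability_naive` are hypothetical, and positions are 0-based here.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mappability_naive(text, m, k):
    """Brute-force (k, m)-mappability table (illustrative reference only).

    Entry i is the number of start positions j != i such that the
    length-m substrings of `text` starting at i and j have at most
    k mismatches. Positions are 0-based; the table has one entry per
    valid start position, i.e. len(text) - m + 1 entries.
    """
    n = len(text)
    starts = range(n - m + 1)  # empty when m > n
    table = []
    for i in starts:
        count = 0
        for j in starts:
            if j != i and hamming(text[i:i + m], text[j:j + m]) <= k:
                count += 1
        table.append(count)
    return table


# The length-2 substrings of "aaab" are "aa", "aa", "ab".
print(mappability_naive("aaab", 2, 0))  # [1, 1, 0]
print(mappability_naive("aaab", 2, 1))  # [2, 2, 2]
```

With $k=0$ only the two copies of "aa" match each other; allowing one mismatch also pairs each "aa" with "ab", so every entry becomes 2.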

Last time updated on 23/11/2022
