14 research outputs found
Efficient Computation of Sequence Mappability
Sequence mappability is an important task in genome re-sequencing. In the
(k, m)-mappability problem, for a given sequence T of length n, our goal
is to compute a table whose i-th entry is the number of indices j ≠ i such
that the length-m substrings of T starting at positions i and j have at
most k mismatches. Previous works on this problem focused on heuristic
approaches to compute a rough approximation of the result or on the case of
k = 1. We present several efficient algorithms for the general case of the
problem. Our main result is an algorithm that works in time and space for
. It requires a careful adaptation of the technique of Cole
et al.~[STOC 2004] to avoid multiple counting of pairs of substrings. We also
show O(n^2)-time algorithms to compute all results for a fixed m
and all k ∈ {0, …, m} or a fixed k and all m ∈ {k, …, n}. Finally we show
that the (k, m)-mappability problem cannot be solved in strongly subquadratic
time for k, m = Θ(log n) unless the Strong Exponential Time Hypothesis
fails. Comment: Accepted to SPIRE 2018
Internal Quasiperiod Queries
Internal pattern matching requires one to answer queries about factors of a
given string. Many results are known on answering internal period queries,
asking for the periods of a given factor. In this paper we investigate (for the
first time) internal queries asking for covers (also known as quasiperiods) of
a given factor. We propose a data structure that answers such queries in
time for the shortest cover and in time for a representation of all the covers, after time
and space preprocessing. Comment: To appear in the SPIRE 2020 proceedings
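
To make the query concrete, here is a minimal brute-force sketch in Python. It uses the standard definition of a cover (a string whose occurrences in S together cover every position of S) and simply recomputes the shortest cover of a factor from scratch. It is not the data structure proposed in the paper, and the function names `is_cover`, `shortest_cover`, and `internal_shortest_cover` are hypothetical names chosen for this illustration.

```python
def is_cover(c: str, s: str) -> bool:
    """Return True if every position of s lies inside some occurrence of c."""
    m = len(c)
    if m == 0 or not s.startswith(c):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are already covered
    for start in range(len(s) - m + 1):
        if start > covered_up_to:
            return False  # position covered_up_to cannot be covered any more
        if s.startswith(c, start):
            covered_up_to = start + m
    return covered_up_to == len(s)


def shortest_cover(s: str) -> str:
    """Try prefixes by increasing length; s is always a cover of itself."""
    for length in range(1, len(s) + 1):
        if is_cover(s[:length], s):
            return s[:length]
    return s  # only reached for the empty string


def internal_shortest_cover(s: str, i: int, j: int) -> str:
    """Naive internal query: shortest cover of the factor s[i:j] (0-based, j exclusive)."""
    return shortest_cover(s[i:j])
```

For example, `internal_shortest_cover("abaababaaba", 0, 11)` examines the whole string, while `internal_shortest_cover("abaababaaba", 0, 5)` examines only the factor `abaab`. Each naive query costs time at least quadratic in the factor length in the worst case, which is the cost a preprocessed data structure is meant to avoid.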
A lower bound for the coverability problem in acyclic pushdown VAS
We investigate the coverability problem for a one-dimensional restriction of pushdown vector addition systems with states. We improve the lower complexity bound to PSpace, even in the acyclic case.
Rectangular tile covers of 2D-strings
We consider tile covers of 2D-strings, which are a generalization of periodicity of 1D-strings. We say that a 2D-string A is a tile cover of a 2D-string S if S can be decomposed into non-overlapping 2D-strings, each of them equal to A or to A^T, where A^T is the transpose of A. We show that all tile covers of a 2D-string of size N can be computed in O(N^{1+ε}) time for any ε > 0. We also show a linear-time algorithm for computing all 1D-strings being tile covers of a 2D-string.
Efficient computation of sequence mappability
Sequence mappability is an important task in genome resequencing. In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose i-th entry is the number of indices j ≠ i such that the length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of k = 1. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for k = O(1), works in O(n) space and, with high probability, in O(n · min{m^k, log^k n}) time. Our algorithm requires a careful adaptation of the k-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop O(n^2)-time algorithms to compute all (k, m)-mappability tables for a fixed m and all k ∈ {0, …, m} or a fixed k and all m ∈ {k, …, n}. Finally, we show that, for k, m = Θ(log n), the (k, m)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.
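
The definition of the (k, m)-mappability table can be illustrated with a brute-force sketch in Python. It compares every pair of length-m substrings directly, so it runs in O(n^2 · m) time and is only a reference for the definition, not any of the algorithms from the paper. Positions are 0-based here, and the names `within_k_mismatches` and `naive_mappability` are hypothetical.

```python
def within_k_mismatches(t: str, i: int, j: int, m: int, k: int) -> bool:
    """True if t[i:i+m] and t[j:j+m] differ in at most k positions."""
    mismatches = 0
    for a, b in zip(t[i:i + m], t[j:j + m]):
        if a != b:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the budget is exceeded
    return True


def naive_mappability(t: str, k: int, m: int) -> list[int]:
    """Entry i counts the positions j != i whose length-m substring
    is within Hamming distance k of the one starting at i."""
    starts = range(len(t) - m + 1)
    return [
        sum(1 for j in starts if j != i and within_k_mismatches(t, i, j, m, k))
        for i in starts
    ]
```

For instance, with T = "aaab" and m = 2 the substrings are "aa", "aa", "ab", so `naive_mappability("aaab", 0, 2)` counts only exact repeats, while `naive_mappability("aaab", 1, 2)` also counts "ab" as a match for "aa".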
Circular pattern matching with k mismatches
We consider the circular pattern matching with k mismatches (k-CPM) problem in which one is to compute the minimal Hamming distance of every length-m substring of T and any cyclic rotation of P, if this distance is no more than k. It is a variation of the well-studied k-mismatch problem. A multitude of papers has been devoted