Highly Scalable Algorithms for Robust String Barcoding
String barcoding is a recently introduced technique for genomic-based
identification of microorganisms. In this paper we describe the engineering of
highly scalable algorithms for robust string barcoding. Our methods enable
distinguisher selection based on whole genomic sequences of hundreds of
microorganisms of up to bacterial size on a well-equipped workstation, and can
be easily parallelized to further extend the applicability range to thousands
of bacterial size genomes. Experimental results on both randomly generated and
NCBI genomic data show that whole-genome based selection results in a number of
distinguishers nearly matching the information theoretic lower bounds for the
problem.
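As a rough illustration of what distinguisher selection involves, here is a minimal sketch of a greedy heuristic for the plain (non-robust) version of the problem. It is not the scalable algorithm the abstract describes. The function names, the fixed substring length `k`, and the toy sequences are all assumptions made for the example. A distinguisher is taken to be a substring present in some genomes and absent from others, and for the plain problem the information-theoretic lower bound is ceil(log2 n) distinguishers for n genomes, since each genome needs a distinct binary barcode.

```python
from itertools import combinations
from math import ceil, log2


def candidate_substrings(genomes, k):
    """Map each length-k substring to the set of genome indices containing it."""
    occ = {}
    for i, g in enumerate(genomes):
        for j in range(len(g) - k + 1):
            occ.setdefault(g[j:j + k], set()).add(i)
    return occ


def greedy_barcode(genomes, k):
    """Greedily pick length-k substrings until every pair of genomes is separated."""
    occ = candidate_substrings(genomes, k)
    candidates = sorted(occ.items(), key=lambda kv: kv[0])  # deterministic order
    unresolved = set(combinations(range(len(genomes)), 2))
    chosen = []
    while unresolved:
        def gain(item):
            present = item[1]
            # A pair is separated if exactly one of its genomes contains the substring.
            return sum((a in present) != (b in present) for a, b in unresolved)

        best, present = max(candidates, key=gain)
        if gain((best, present)) == 0:
            raise ValueError("remaining pairs cannot be separated by length-k substrings")
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved
                      if (a in present) == (b in present)}
    return chosen


def barcodes(genomes, distinguishers):
    """Binary barcode per genome: one bit per distinguisher (1 = present)."""
    return ["".join("1" if d in g else "0" for d in distinguishers) for g in genomes]


def lower_bound(n):
    """Minimum number of binary distinguishers needed to tell n genomes apart."""
    return ceil(log2(n)) if n > 1 else 0


if __name__ == "__main__":
    toy = ["ACGT", "ACGG", "TTGA", "TTCA"]  # hypothetical toy sequences
    ds = greedy_barcode(toy, 2)
    print(ds, barcodes(toy, ds), lower_bound(len(toy)))
```

The greedy choice is the standard set-cover heuristic. It enumerates every candidate substring and every unresolved pair, which is exactly the cost that becomes prohibitive at whole-genome scale.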
Approximating Approximate Pattern Matching
Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the
approximate pattern matching problem asks for computation of a particular
distance function between $P$ and every $m$-substring of $T$. We
consider a $(1 \pm \varepsilon)$ multiplicative approximation variant of this
problem, for the $\ell_p$ distance function. In this paper, we describe two
$(1+\varepsilon)$-approximate algorithms with a runtime of
$\widetilde{O}(n/\varepsilon)$ for all (constant) non-negative values
of $p$. For constant $p \ge 1$ we show a deterministic
$(1+\varepsilon)$-approximation algorithm. Previously, such a runtime was known
only for the case of $\ell_1$ distance, by Gawrychowski and Uznański [ICALP
2018], and only with a randomized algorithm. For constant $0 \le p \le 1$ we
show a randomized algorithm for the $\ell_p$ distance, thereby providing a smooth
tradeoff between the algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for
Hamming distance (the case $p = 0$) and of Gawrychowski and Uznański for
$\ell_1$ distance.
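To make the problem statement concrete, the following is a minimal sketch of the exact, quadratic-time baseline: it computes the distance between the pattern and every pattern-length substring of the text. It is not one of the approximation algorithms from the abstract. The function name `lp_distances`, the numeric alphabet, and the reading of the distance as an $\ell_p$-style distance (with $p = 0$ treated as Hamming distance) are assumptions made for illustration.

```python
def lp_distances(text, pattern, p):
    """Exact distance between `pattern` and every len(pattern)-window of `text`.

    Assumption: p == 0 means Hamming distance (number of mismatches);
    p > 0 means (sum |a - b|^p)^(1/p). Runs in O(n * m) time.
    """
    m = len(pattern)
    out = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if p == 0:
            d = sum(1 for a, b in zip(window, pattern) if a != b)
        else:
            d = sum(abs(a - b) ** p for a, b in zip(window, pattern)) ** (1.0 / p)
        out.append(d)
    return out


if __name__ == "__main__":
    print(lp_distances([1, 2, 3, 4], [1, 3], 1))
    print(lp_distances([1, 2, 3, 4], [1, 3], 0))
```

An approximation algorithm in the sense of the abstract would return, for each window, a value within a small multiplicative factor of the corresponding entry of this list, while avoiding the O(n · m) cost of the loop above.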
Recognizing well-parenthesized expressions in the streaming model
Motivated by a concrete problem and with the goal of understanding the sense
in which the complexity of streaming algorithms is related to the complexity of
formal languages, we investigate the problem Dyck(s) of checking matching
parentheses, with s different types of parentheses.
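For reference, the membership problem itself can be sketched with the classical stack-based check below. This is the textbook baseline, not the streaming algorithm of the paper: in the worst case the stack holds a linear number of symbols, which is the space cost a streaming algorithm has to avoid. The function name `is_dyck` and the `pairs` parameter are assumptions made for the example.

```python
def is_dyck(word, pairs):
    """Return True if `word` is well-parenthesized.

    `pairs` maps each opening symbol to its closing symbol,
    e.g. {"(": ")", "[": "]"} for two types of parentheses.
    """
    closers = {close: open_ for open_, close in pairs.items()}
    stack = []
    for ch in word:
        if ch in pairs:
            stack.append(ch)  # remember the unmatched opener
        elif ch in closers:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != closers[ch]:
                return False
        else:
            raise ValueError(f"unexpected symbol: {ch!r}")
    return not stack  # every opener must have been closed


if __name__ == "__main__":
    two_types = {"(": ")", "[": "]"}
    print(is_dyck("([])()", two_types))
    print(is_dyck("([)]", two_types))
```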
We present a one-pass randomized streaming algorithm for Dyck(2) with space
$O(\sqrt{n}\log n)$, time per letter $\mathrm{polylog}(n)$, and one-sided error.
We prove that this one-pass algorithm is optimal, up to a $\mathrm{polylog}(n)$ factor,
even when two-sided error is allowed. For the lower bound, we prove a direct
sum result on hard instances by following the "information cost" approach, but
with a few twists. Indeed, we play a subtle game between public and private
coins. This mixture between public and private coins results from a balancing
act between the direct sum result and a combinatorial lower bound for the base
case.
Surprisingly, the space requirement shrinks drastically if we have access to
the input stream in reverse. We present a two-pass randomized streaming
algorithm for Dyck(2) with space $O((\log n)^2)$, time $\mathrm{polylog}(n)$ and
one-sided error, where the second pass is in the reverse direction. Both
algorithms can be extended to Dyck(s) since this problem is reducible to
Dyck(2) for a suitable notion of reduction in the streaming model.
Comment: 20 pages, 5 figures