
Highly Scalable Algorithms for Robust String Barcoding

Authors: DasGupta Bhaskar; Konwar Kishori M.; Mandoiu Ion I.; Shvartsman Alex A.
Publication date: 01/01/2005

String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms with genomes of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information-theoretic lower bounds for the problem.

arXiv.org e-Print Archive

CiteSeerX
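To make the distinguisher-selection problem above concrete, here is a minimal Python sketch, assuming fixed-length substrings as candidate distinguishers and a plain greedy set-multicover rule. It is not the engineered algorithm from the paper, and the names `candidate_substrings`, `greedy_barcode`, `barcode`, `length`, and `redundancy` are hypothetical. A genome's barcode is the presence/absence vector of the chosen substrings, and "robust" is modeled here (an assumption) as every pair of genomes differing in at least `redundancy` barcode positions. Since k binary distinguishers yield at most 2^k distinct barcodes, any valid barcode for n genomes needs at least ⌈log2 n⌉ distinguishers, which is the simplest information-theoretic bound of this kind.

```python
from itertools import combinations
from math import ceil, log2


def candidate_substrings(genomes: list[str], length: int) -> set[str]:
    """All substrings of one fixed length that occur in some genome.
    (Hypothetical candidate pool; the paper's candidate generation may differ.)"""
    return {g[i:i + length] for g in genomes for i in range(len(g) - length + 1)}


def greedy_barcode(genomes: list[str], length: int, redundancy: int = 1) -> list[str]:
    """Greedily add the substring that separates the most still-unsatisfied
    genome pairs, until each pair is separated `redundancy` times."""
    pairs = list(combinations(range(len(genomes)), 2))
    need = {pair: redundancy for pair in pairs}  # separations still required
    chosen: list[str] = []
    candidates = sorted(candidate_substrings(genomes, length))
    while any(need.values()):
        best, best_gain = None, 0
        for s in candidates:
            if s in chosen:  # a distinguisher counts only once
                continue
            present = [s in g for g in genomes]
            gain = sum(1 for (i, j) in pairs
                       if need[(i, j)] > 0 and present[i] != present[j])
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            raise ValueError("some genome pair cannot be separated")
        chosen.append(best)
        present = [best in g for g in genomes]
        for (i, j) in pairs:
            if need[(i, j)] > 0 and present[i] != present[j]:
                need[(i, j)] -= 1
    return chosen


def barcode(genome: str, distinguishers: list[str]) -> tuple[int, ...]:
    """Presence (1) / absence (0) of each distinguisher in the genome."""
    return tuple(int(s in genome) for s in distinguishers)
```

For example, `greedy_barcode(["ACGTAC", "ACGGTT", "TTACGA", "GGGTAC"], length=2)` returns a short list of 2-mers under which these four toy sequences receive pairwise distinct barcodes. The greedy loop re-tests every candidate against every genome in every round, so it only suits toy inputs, far from the whole-genome scale the abstract targets.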

Approximating Approximate Pattern Matching

Authors: Studený Jan; Uznański Przemysław
Publication date: 01/01/2019

Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the approximate pattern matching problem asks for computation of a particular distance function between $P$ and every $m$-substring of $T$. We consider a $(1\pm\varepsilon)$ multiplicative approximation variant of this problem, for the $\ell_p$ distance function. In this paper, we describe two $(1+\varepsilon)$-approximate algorithms with a runtime of $\widetilde{O}(n/\varepsilon)$ for all (constant) non-negative values of $p$. For constant $p \ge 1$ we show a deterministic $(1+\varepsilon)$-approximation algorithm. Previously, such a running time was known only for the case of $\ell_1$ distance, by Gawrychowski and Uznański [ICALP 2018], and only with a randomized algorithm. For constant $0 \le p \le 1$ we show a randomized algorithm for the $\ell_p$ distance, thereby providing a smooth tradeoff between the algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for Hamming distance (the case $p=0$) and of Gawrychowski and Uznański for $\ell_1$ distance.

arXiv.org e-Print Archive

Repository for Publications and Research Data
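The quantity being approximated in the abstract above can be pinned down with an exact, naive baseline. This Python sketch assumes the common definition $d_p = (\sum_j |T[i+j] - P[j]|^p)^{1/p}$ for $p > 0$ and the number of mismatching positions (Hamming distance) for $p = 0$; the paper may instead work with the $p$-th power, so treat the exact form as an assumption. The names `lp_profile` and `within_factor` are hypothetical. The code runs in $O(nm)$ time and only illustrates the problem and the $(1\pm\varepsilon)$ guarantee, not the $\widetilde{O}(n/\varepsilon)$ algorithms.

```python
from typing import Sequence


def lp_profile(text: Sequence[float], pattern: Sequence[float], p: float) -> list[float]:
    """Exact l_p distance between `pattern` and every length-m window of `text`.

    p == 0: number of mismatching positions (Hamming distance).
    p > 0:  (sum_j |text[i+j] - pattern[j]|^p)^(1/p)  (assumed definition).
    Naive O(n * m) baseline.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        raise ValueError("pattern must be non-empty and no longer than text")
    profile = []
    for i in range(n - m + 1):
        if p == 0:
            profile.append(float(sum(text[i + j] != pattern[j] for j in range(m))))
        else:
            s = sum(abs(text[i + j] - pattern[j]) ** p for j in range(m))
            profile.append(s ** (1.0 / p))
    return profile


def within_factor(estimate: float, exact: float, eps: float) -> bool:
    """True if `estimate` meets a (1 +/- eps) multiplicative guarantee."""
    return (1 - eps) * exact <= estimate <= (1 + eps) * exact
```

For instance, with text `[1, 2, 3, 4]` and pattern `[1, 3]`, `p = 1` gives the profile `[1.0, 1.0, 3.0]` and `p = 0` gives `[1.0, 1.0, 2.0]`. An approximation algorithm would be expected to return, for every window, a value accepted by `within_factor` against this exact profile.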

Recognizing well-parenthesized expressions in the streaming model

Authors: Magniez F.; Mathieu C.; Nayak A.
Publication date: 17/11/2009

Motivated by a concrete problem and with the goal of understanding the sense in which the complexity of streaming algorithms is related to the complexity of formal languages, we investigate the problem Dyck(s) of checking matching parentheses, with $s$ different types of parentheses. We present a one-pass randomized streaming algorithm for Dyck(2) with space $O(\sqrt{n}\log n)$, time per letter $\mathrm{polylog}(n)$, and one-sided error. We prove that this one-pass algorithm is optimal, up to a $\mathrm{polylog}(n)$ factor, even when two-sided error is allowed. For the lower bound, we prove a direct sum result on hard instances by following the "information cost" approach, but with a few twists. Indeed, we play a subtle game between public and private coins. This mixture between public and private coins results from a balancing act between the direct sum result and a combinatorial lower bound for the base case. Surprisingly, the space requirement shrinks drastically if we have access to the input stream in reverse. We present a two-pass randomized streaming algorithm for Dyck(2) with space $O((\log n)^2)$, time $\mathrm{polylog}(n)$ and one-sided error, where the second pass is in the reverse direction. Both algorithms can be extended to Dyck(s), since this problem is reducible to Dyck(2) for a suitable notion of reduction in the streaming model.

Comment: 20 pages, 5 figures

arXiv.org e-Print Archive

CiteSeerX
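For reference, the problem the streaming algorithms above solve has a simple offline solution with an explicit stack. This Python sketch, using the hypothetical names `is_dyck` and `pairs`, decides membership in Dyck(s) exactly, but its stack can hold up to about n/2 symbols in the worst case, i.e. linear space, whereas the one-pass algorithm in the abstract uses $O(\sqrt{n}\log n)$ space at the cost of randomization and one-sided error. With a single parenthesis type the stack degenerates to a depth counter of $O(\log n)$ bits, consistent with the abstract's focus on Dyck(2).

```python
def is_dyck(word: str, pairs: dict[str, str]) -> bool:
    """Exact offline membership test for Dyck(s), where s == len(pairs).

    `pairs` maps each closing symbol to its matching opening symbol.
    The stack may grow to about len(word) / 2 entries (linear space).
    """
    openers = set(pairs.values())
    stack: list[str] = []
    for ch in word:
        if ch in openers:
            stack.append(ch)  # remember the pending opening symbol
        elif ch in pairs:
            # A closing symbol must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return not stack  # every opener must have been closed
```

For example, with `pairs = {")": "(", "]": "["}` (that is, Dyck(2)), `is_dyck("([])[]", pairs)` returns `True`, while `is_dyck("([)]", pairs)` returns `False`.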