    Highly Scalable Algorithms for Robust String Barcoding

    String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection results in a number of distinguishers nearly matching the information-theoretic lower bounds for the problem.

    Approximating Approximate Pattern Matching

    Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the approximate pattern matching problem asks for computation of a particular distance function between $P$ and every $m$-substring of $T$. We consider a $(1\pm\varepsilon)$ multiplicative approximation variant of this problem, for the $\ell_p$ distance function. In this paper, we describe two $(1+\varepsilon)$-approximate algorithms with a runtime of $\widetilde{O}(\frac{n}{\varepsilon})$ for all (constant) non-negative values of $p$. For constant $p \ge 1$ we show a deterministic $(1+\varepsilon)$-approximation algorithm. Previously, such a runtime was known only for the case of $\ell_1$ distance, by Gawrychowski and Uznański [ICALP 2018], and only with a randomized algorithm. For constant $0 \le p \le 1$ we show a randomized algorithm for the $\ell_p$ distance, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for Hamming distance (case of $p=0$) and of Gawrychowski and Uznański for $\ell_1$ distance.
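
    To make the problem statement concrete, the following is a minimal sketch of the exact, brute-force computation that the approximation algorithms aim to beat. It is not the method of the paper: it runs in O(nm) time and returns exact values. The function name `lp_distances` is hypothetical, and the code assumes the usual convention that the $\ell_p$ distance for $p > 0$ is the $p$-th root of the sum of $p$-th powers of absolute differences, with $p = 0$ taken as the Hamming distance (number of mismatching positions).

```python
from typing import List, Sequence


def lp_distances(text: Sequence[float], pattern: Sequence[float], p: float) -> List[float]:
    """Exact l_p distance between `pattern` and every m-substring of `text`.

    Brute-force O(n*m) baseline, for illustration only.
    Assumed convention: p == 0 means Hamming distance (count of mismatches);
    p > 0 means (sum |t - q|^p)^(1/p).
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    result = []
    for i in range(n - m + 1):
        # Absolute differences between the pattern and the window starting at i.
        diffs = [abs(text[i + j] - pattern[j]) for j in range(m)]
        if p == 0:
            result.append(float(sum(1 for d in diffs if d != 0)))
        else:
            result.append(sum(d ** p for d in diffs) ** (1.0 / p))
    return result
```

    For example, `lp_distances([1, 2, 3, 4], [2, 3], 1)` compares the pattern with the windows `[1, 2]`, `[2, 3]`, and `[3, 4]`. A $(1\pm\varepsilon)$-approximate algorithm would be allowed to return any value within a factor $(1\pm\varepsilon)$ of each exact entry.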

    Recognizing well-parenthesized expressions in the streaming model

    Motivated by a concrete problem and with the goal of understanding the sense in which the complexity of streaming algorithms is related to the complexity of formal languages, we investigate the problem Dyck($s$) of checking matching parentheses, with $s$ different types of parenthesis. We present a one-pass randomized streaming algorithm for Dyck(2) with space $O(\sqrt{n}\log n)$, time per letter $\mathrm{polylog}(n)$, and one-sided error. We prove that this one-pass algorithm is optimal, up to a $\mathrm{polylog}(n)$ factor, even when two-sided error is allowed. For the lower bound, we prove a direct sum result on hard instances by following the "information cost" approach, but with a few twists. Indeed, we play a subtle game between public and private coins. This mixture between public and private coins results from a balancing act between the direct sum result and a combinatorial lower bound for the base case. Surprisingly, the space requirement shrinks drastically if we have access to the input stream in reverse. We present a two-pass randomized streaming algorithm for Dyck(2) with space $O((\log n)^2)$, time $\mathrm{polylog}(n)$ and one-sided error, where the second pass is in the reverse direction. Both algorithms can be extended to Dyck($s$) since this problem is reducible to Dyck(2) for a suitable notion of reduction in the streaming model. Comment: 20 pages, 5 figures
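
    As a point of reference for the Dyck($s$) problem, the following is a minimal sketch of the textbook stack-based check. It is not one of the streaming algorithms described above: it is deterministic and reads the input in one pass, but its stack can grow to linear size in the worst case, which is exactly the cost the sublinear-space algorithms avoid. The function name `is_dyck` and the default bracket alphabet are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional


def is_dyck(stream: Iterable[str], pairs: Optional[Dict[str, str]] = None) -> bool:
    """Check whether `stream` is a well-parenthesized word.

    `pairs` maps each opening symbol to its closing symbol; the default
    uses two bracket types, i.e. an instance of Dyck(2).
    Worst-case space is linear in the input length (the stack).
    """
    if pairs is None:
        pairs = {"(": ")", "[": "]"}
    closers = set(pairs.values())
    stack = []  # closing symbols we still expect, innermost last
    for symbol in stream:
        if symbol in pairs:
            stack.append(pairs[symbol])
        elif symbol in closers:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != symbol:
                return False
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    # Every opener must have been closed.
    return not stack
```

    For example, `is_dyck("([])[]")` accepts, while `is_dyck("([)]")` rejects because the two bracket types are interleaved.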