381 research outputs found

    Streaming Pattern Matching with d Wildcards

    Get PDF
    In the pattern matching with d wildcards problem we are given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol \u27?\u27. A wildcard character matches any other character. The goal is to establish for each m-length substring of T whether it matches P. In the streaming model variant of the pattern matching with d wildcards problem the text T arrives one character at a time and the goal is to report, before the next character arrives, if the last m characters match P while using only o(m) words of space. In this paper we introduce two new algorithms for the d wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant 0<=delta<=1. This algorithm uses ~O(d^{1-delta}) amortized time per character and ~O(d^{1+delta}) words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses O(d+log m) worst-case time per character and O(d log m) words of space

    Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams

    Get PDF
    Recently, there has been a growing focus in solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Sigma) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m+log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average

    Upper and lower bounds for dynamic data structures on strings

    Get PDF
    We consider a range of simply stated dynamic data structure problems on strings. An update changes one symbol in the input and a query asks us to compute some function of the pattern of length mm and a substring of a longer text. We give both conditional and unconditional lower bounds for variants of exact matching with wildcards, inner product, and Hamming distance computation via a sequence of reductions. As an example, we show that there does not exist an O(m1/2−Δ)O(m^{1/2-\varepsilon}) time algorithm for a large range of these problems unless the online Boolean matrix-vector multiplication conjecture is false. We also provide nearly matching upper bounds for most of the problems we consider.Comment: Accepted at STACS'1

    Querying XML data streams from wireless sensor networks: an evaluation of query engines

    Get PDF
    As the deployment of wireless sensor networks increase and their application domain widens, the opportunity for effective use of XML filtering and streaming query engines is ever more present. XML filtering engines aim to provide efficient real-time querying of streaming XML encoded data. This paper provides a detailed analysis of several such engines, focusing on the technology involved, their capabilities, their support for XPath and their performance. Our experimental evaluation identifies which filtering engine is best suited to process a given query based on its properties. Such metrics are important in establishing the best approach to filtering XML streams on-the-fly

    The k-mismatch problem revisited

    Get PDF
    We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No". We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows: 1) Our first result is a deterministic O(nk2log⁥k/m+npolylogm)O(n k^2\log{k} / m+n \text{polylog} m) time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 by Amihood Amir et al. 2) We then give a randomised and online algorithm which runs in the same time complexity but requires only O(k2polylogm)O(k^2\text{polylog} {m}) space in total. 3) Next we give a randomised (1+Ï”)(1+\epsilon)-approximation algorithm for the streaming k-mismatch problem which uses O(k2polylogm/Ï”2)O(k^2\text{polylog} m / \epsilon^2) space and runs in O(polylogm/Ï”2)O(\text{polylog} m / \epsilon^2) worst-case time per arriving symbol. 4) Finally we combine our new results to derive a randomised O(k2polylogm)O(k^2\text{polylog} {m}) space algorithm for the streaming k-mismatch problem which runs in O(klog⁥k+polylogm)O(\sqrt{k}\log{k} + \text{polylog} {m}) worst-case time per arriving symbol. This improves the best previous space complexity for streaming k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors)

    Quasi-Periodicity in Streams

    Get PDF
    In this work, we show two streaming algorithms for computing the length of the shortest cover of a string of length n. We start by showing a two-pass algorithm that uses O(log^2 n) space and then show a one-pass streaming algorithm that uses O(sqrt{n log n}) space. Both algorithms run in near-linear time. The algorithms are randomized and compute the answer incorrectly with probability inverse-polynomial in n. We also show that there is no sublinear-space streaming algorithm for computing the length of the shortest seed of a string
    • 

    corecore