
    The Swap Matching Problem Revisited

    In this paper, we revisit the much-studied problem of Pattern Matching with Swaps (Swap Matching problem, for short). We first present a graph-theoretic model, which opens a new and so far unexplored avenue to solve the problem. Then, using the model, we devise two efficient algorithms to solve the swap matching problem. The resulting algorithms are adaptations of the classic shift-and algorithm. For patterns whose length is similar to the word size of the target machine, both algorithms run in linear time, assuming a fixed alphabet. Comment: 23 pages, 3 figures and 17 tables.
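The abstract builds on the classic shift-and algorithm. As background only, here is a minimal sketch of plain shift-and for exact matching; it is not the paper's swap-matching adaptation, and the function name `shift_and` is chosen here for illustration. Python integers stand in for machine words, so the word-size restriction mentioned in the abstract does not show up in this sketch.

```python
def shift_and(text: str, pattern: str) -> list[int]:
    """Return the start indices of exact occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, c in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


print(shift_and("abracadabra", "abra"))
```

Each text character costs one shift, one OR and one AND, which is where the linear running time comes from when the pattern fits in a machine word.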

    Cartesian Tree Matching and Indexing

    We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called Cartesian suffix tree, and present an O(n) randomized time algorithm to build the Cartesian suffix tree. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree, called the parent-distance representation.
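To make the parent-distance representation concrete, the sketch below assumes the following definition, which the abstract itself does not spell out: entry i is the distance from i to the nearest earlier position holding a value less than or equal to the value at i, or 0 if no such position exists. It also assumes that two strings have the same Cartesian tree exactly when these representations are equal. The matcher is a naive O(nm) window comparison for illustration, not the O(n+m) algorithm the abstract describes; the names `parent_distance` and `cartesian_match_naive` are hypothetical.

```python
def parent_distance(s: list[int]) -> list[int]:
    """Assumed definition: pd[i] = i - j for the nearest j < i with s[j] <= s[i], else 0."""
    pd = []
    stack: list[int] = []  # indices whose values are non-decreasing from bottom to top
    for i, x in enumerate(s):
        # Discard earlier positions holding strictly larger values.
        while stack and s[stack[-1]] > x:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def cartesian_match_naive(text: list[int], pattern: list[int]) -> list[int]:
    """Start indices of windows whose parent-distance representation equals the pattern's."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [i for i in range(len(text) - m + 1)
            if parent_distance(text[i:i + m]) == target]


print(parent_distance([3, 1, 4, 1, 5]))
print(cartesian_match_naive([10, 2, 7, 2, 9, 1], [3, 1, 4, 1, 5]))
```

The stack pass computes the representation in linear time because each index is pushed and popped at most once.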

    A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance

    The algorithmic task of computing the Hamming distance between a given pattern of length m and each location in a text of length n, both over a general alphabet Σ, is one of the most fundamental algorithmic tasks in string algorithms. The fastest known runtime for exact computation is Õ(n√m). We recently introduced a complicated randomized algorithm for obtaining a (1 ± ε) approximation for each location in the text in O((n/ε) log(1/ε) log n log m log |Σ|) total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes O((n/ε) log n log m) time.
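As a reference point for the problem statement only, the following sketch computes the exact text-to-pattern Hamming distances by brute force in O(nm) time. It is neither the exact algorithm nor the randomized approximation mentioned in the abstract; the name `hamming_distances` is chosen here for illustration.

```python
def hamming_distances(text: str, pattern: str) -> list[int]:
    """For every alignment i, count the mismatches between pattern and text[i:i+m]."""
    n, m = len(text), len(pattern)
    return [
        sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        for i in range(n - m + 1)
    ]


print(hamming_distances("abcabd", "abd"))
```

A (1 ± ε) approximation, as described in the abstract, would return for each alignment a value within a factor of (1 ± ε) of the corresponding entry of this list.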

    String Matching and Indexing Based on Cartesian Trees

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2020. Kunsoo Park.
    We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called Cartesian suffix tree, and present an O(n) randomized time algorithm to build the Cartesian suffix tree. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree, called the parent-distance representation.
    Contents:
    Chapter 1 Introduction
    Chapter 2 Problem Definition: 2.1 Basic notations; 2.2 Cartesian tree matching
    Chapter 3 Single Pattern Matching in O(n + m) Time: 3.1 Parent-distance representation; 3.2 Computing parent-distance representation; 3.3 Failure function; 3.4 Text search; 3.5 Computing failure function; 3.6 Correctness and time complexity; 3.7 Cartesian tree signature
    Chapter 4 Multiple Pattern Matching in O((n + m) log k) Time: 4.1 Constructing the Aho-Corasick automaton; 4.2 Multiple pattern matching
    Chapter 5 Cartesian Suffix Tree in Randomized O(n) Time: 5.1 Defining Cartesian suffix tree; 5.2 Constructing Cartesian suffix tree
    Chapter 6 Conclusion
    Bibliography
    Abstract (in Korean)

    Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams

    Recently, there has been a growing focus on solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model, where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model, the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Σ) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m + log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM and PMWC problems in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.