6 research outputs found

    String Periods in the Order-Preserving Model

    The order-preserving model (op-model, in short) was introduced quite recently but has already attracted significant attention because of its applications in data analysis. We introduce several types of periods in this setting (op-periods). Then we give algorithms to compute these periods in time O(n), O(n log log n), O(n log^2 log n / log log log n), O(n log n) depending on the type of periodicity. In the most general variant the number of different periods can be as big as Omega(n^2), and a compact representation is needed. Our algorithms require novel combinatorial insight into the properties of such periods.

    String periods in the order-preserving model

    In the order-preserving model, two strings match if they share the same relative order between the characters at the corresponding positions. This model is quite recent, but it has already attracted significant attention because of its applications in data analysis. We introduce several types of periods in this setting (op-periods). Then we give algorithms to compute these periods in time O(n), O(n log log n), O(n log^2 log n / log log log n), O(n log n) depending on the type of periodicity. In the most general variant, the number of different op-periods can be as big as Ω(n^2), and a compact representation is needed. Our algorithms require novel combinatorial insight into the properties of op-periods. In particular, we characterize the Fine–Wilf property for coprime op-periods. © 2019 Elsevier Inc. Supported by ISF grants no. 824/17 and 1278/16 and by an ERC grant MPM under the EU's Horizon 2020 Research and Innovation Programme (grant no. 683064). Supported by the Ministry of Science and Higher Education of the Russian Federation, project 1.3253.2017. A part of this work was done during the workshop StringMasters in Warsaw 2017 that was sponsored by the Warsaw Center of Mathematics and Computer Science. The authors thank the participants of the workshop, especially Hideo Bannai and Shunsuke Inenaga, for helpful discussions.
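
    The matching rule stated in this abstract (same relative order at corresponding positions) can be illustrated with a minimal sketch. The function names `rank_shape` and `op_match` are hypothetical and not taken from the paper, and the sketch assumes that equal characters must also correspond to equal characters. It only checks whether two whole strings match; it does not compute op-periods.

```python
def rank_shape(s):
    """Replace every character by its rank among the distinct values of s."""
    order = {v: r for r, v in enumerate(sorted(set(s)))}
    return [order[v] for v in s]


def op_match(x, y):
    """Return True if x and y have the same relative order at all positions.

    Assumption: ties count, so x[i] == x[j] must hold exactly when
    y[i] == y[j]. Runs in O(n log n) time because of the sort.
    """
    return len(x) == len(y) and rank_shape(x) == rank_shape(y)


if __name__ == "__main__":
    print(op_match([10, 30, 20], [1, 9, 5]))  # same shape: low, high, middle
    print(op_match([10, 30, 20], [1, 5, 9]))  # different shape
```

    Comparing rank sequences is one simple way to test the condition; the algorithms in the paper are more refined and are not reproduced here.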

    Cartesian Tree Matching and Indexing

    We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called Cartesian suffix tree, and present an O(n) randomized time algorithm to build the Cartesian suffix tree. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree, called the parent-distance representation.
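
    The abstract names the parent-distance representation without defining it. The sketch below assumes the usual reading: the entry at position i is the distance to the nearest earlier position holding a value no larger than the one at i, or 0 if no such position exists, and two strings of equal length have the same Cartesian tree when these arrays coincide. The function names are hypothetical, and this is a whole-string comparison only, not the O(n+m) pattern matching algorithm of the paper.

```python
def parent_distance(s):
    """Compute the (assumed) parent-distance array of s in linear time.

    pd[i] = i - j, where j is the nearest index j < i with s[j] <= s[i],
    and pd[i] = 0 when no such index exists.
    """
    pd = []
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, v in enumerate(s):
        # Values strictly larger than v can never be the answer for a
        # later position, because position i is closer and smaller.
        while stack and s[stack[-1]] > v:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def cartesian_tree_match(x, y):
    """Return True if x and y have equal parent-distance arrays."""
    return len(x) == len(y) and parent_distance(x) == parent_distance(y)


if __name__ == "__main__":
    print(parent_distance([3, 1, 4, 1, 5]))
    print(cartesian_tree_match([3, 1, 4, 1, 5], [2, 1, 9, 1, 3]))
```

    Each index is pushed and popped at most once, so the computation takes O(n) time for a string of length n.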

    String Matching and Indexing Based on Cartesian Trees

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2020. 박근수. We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called Cartesian suffix tree, and present an O(n) randomized time algorithm to build the Cartesian suffix tree. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree, called the parent-distance representation.
    Contents: Chapter 1 Introduction. Chapter 2 Problem Definition: 2.1 Basic notations; 2.2 Cartesian tree matching. Chapter 3 Single Pattern Matching in O(n + m) Time: 3.1 Parent-distance representation; 3.2 Computing parent-distance representation; 3.3 Failure function; 3.4 Text search; 3.5 Computing failure function; 3.6 Correctness and time complexity; 3.7 Cartesian tree signature. Chapter 4 Multiple Pattern Matching in O((n + m) log k) Time: 4.1 Constructing the Aho-Corasick automaton; 4.2 Multiple pattern matching. Chapter 5 Cartesian Suffix Tree in Randomized O(n) Time: 5.1 Defining Cartesian suffix tree; 5.2 Constructing Cartesian suffix tree. Chapter 6 Conclusion. Bibliography. Abstract (in Korean).

    Computing Covers under Substring Consistent Equivalence Relations

    Covers are a kind of quasiperiodicity in strings. A string $C$ is a cover of another string $T$ if any position of $T$ is inside some occurrence of $C$ in $T$. The shortest and longest cover arrays of $T$ have the lengths of the shortest and longest covers of each prefix of $T$, respectively. The literature has proposed linear-time algorithms computing longest and shortest cover arrays taking border arrays as input. An equivalence relation $\approx$ over strings is called a substring consistent equivalence relation (SCER) iff $X \approx Y$ implies (1) $|X| = |Y|$ and (2) $X[i:j] \approx Y[i:j]$ for all $1 \le i \le j \le |X|$. In this paper, we generalize the notion of covers for SCERs and prove that existing algorithms to compute the shortest cover array and the longest cover array of a string $T$ under the identity relation will work for any SCERs taking the accordingly generalized border arrays. Comment: 16 pages.
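
    The definition of a cover given in this abstract can be checked directly with a small sketch. The function names `is_cover` and `shortest_cover` and the `equiv` parameter are hypothetical. The default `equiv` is plain equality (the identity relation); passing another relation is only an assumption about how the SCER generalization might be plugged in. This is a brute-force check, not the linear-time cover-array algorithms the abstract refers to.

```python
def is_cover(c, t, equiv=lambda a, b: a == b):
    """Return True if every position of t lies inside an occurrence of c.

    An occurrence is a substring of t of length len(c) that is equivalent
    to c under `equiv` (plain equality by default).
    """
    m = len(c)
    if m == 0 or m > len(t):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are known to be covered
    for start in range(len(t) - m + 1):
        if start > covered_up_to:
            return False  # position covered_up_to is left uncovered
        if equiv(t[start:start + m], c):
            covered_up_to = start + m
    return covered_up_to == len(t)


def shortest_cover(t):
    """Return the shortest prefix of t that covers t (t itself at worst)."""
    for k in range(1, len(t) + 1):
        if is_cover(t[:k], t):
            return t[:k]
    return t


if __name__ == "__main__":
    print(is_cover("aba", "ababa"))
    print(shortest_cover("ababa"))
```

    Trying every prefix costs quadratic time or worse; it is meant only to make the definition concrete.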

    35th Symposium on Theoretical Aspects of Computer Science: STACS 2018, February 28-March 3, 2018, Caen, France
