
    A Compact Index for Order-Preserving Pattern Matching

    Order-preserving pattern matching was introduced recently, but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same relative order as the pattern elements. For this problem, we consider the offline version, in which we build an index for the reference sequence so that subsequent searches can be completed very efficiently. We propose a space-efficient index that works well in practice despite its lack of good worst-case time bounds. Our solution is based on the new approach of decomposing the indexed sequence into an order component, containing ordering information, and a delta component, containing information on the absolute values. Experiments show that this approach is viable, is faster than the available alternatives, and is the first to offer both small space usage and fast retrieval. Comment: 16 pages. A preliminary version appeared in Proc. IEEE Data Compression Conference (DCC 2017), Snowbird, UT, USA, 201
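    The problem statement above can be pinned down with a brute-force baseline. This is a minimal Python sketch, not the authors' compact index: it compares the dense-rank signature of every length-m window with the pattern's signature, which costs O(nm log m) time for a text of length n. The names `order_signature` and `naive_opm` are hypothetical.

```python
# Illustrative brute-force order-preserving matching; names are hypothetical.
def order_signature(window):
    """Dense ranks: equal values share a rank, so <, = and > are all preserved."""
    ranks = {value: r for r, value in enumerate(sorted(set(window)))}
    return [ranks[v] for v in window]


def naive_opm(text, pattern):
    """Return every start i such that text[i:i+m] is order-isomorphic to pattern."""
    m = len(pattern)
    target = order_signature(pattern)
    return [i for i in range(len(text) - m + 1)
            if order_signature(text[i:i + m]) == target]


if __name__ == "__main__":
    prices = [10, 22, 15, 30, 20, 18, 27]
    # Windows (10, 22, 15) and (15, 30, 20) follow the low-high-middle shape of [1, 3, 2].
    print(naive_opm(prices, [1, 3, 2]))  # [0, 2]
```

    An index of the kind described in the abstract exists precisely to avoid this full scan of the reference sequence at query time.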

    A compact index for order-preserving pattern matching

    Order-preserving pattern matching has been introduced recently, but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same relative order as the pattern elements. For this problem, we consider the offline version, in which we build an index for the reference sequence so that subsequent searches can be completed very efficiently. We propose a space-efficient index that works well in practice despite its lack of good worst-case time bounds. Our solution is based on the new approach of decomposing the indexed sequence into an order component, containing ordering information, and a δ component, containing information on the absolute values. Experiments show that this approach is viable, is faster than the available alternatives, and is the first to offer both small space usage and fast retrieval.

    New Algorithms for δγ-Order Preserving Matching

    Context: Order-preserving matching considers the relative order of the elements of strings. However, its application areas require more flexibility in the matching paradigm. This paper advances in that direction, extending our previous work [27]. Method: We define δγ-order preserving matching as an approximate variant of order-preserving matching. We devise two solutions for it, based on segment trees and Fenwick trees: segtreeBA and bitBA. Results: We experimentally show the efficiency of our algorithms compared with those presented in [26] (naiveA and updateBA). We also present applications of our approach in music retrieval and stock market analysis. Conclusions: Even though the worst-case time complexity of the proposed algorithms, O(nm log m), is higher than the Θ(nm) time complexity of updateBA, their Ω(n log n) lower bound makes them more efficient in practice. Furthermore, real application examples show that our approach is useful for identifying similarity in music melodies and stock price trends.
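    Segment trees and Fenwick trees are typically used in this setting to answer rank queries (how many earlier values are smaller than the current one) in logarithmic time. The sketch below is a generic Fenwick tree used that way; it is not the authors' bitBA algorithm, whose details the abstract does not give, and the names `Fenwick` and `prefix_ranks` are hypothetical.

```python
# Generic Fenwick-tree rank counting; names are hypothetical, not bitBA itself.
class Fenwick:
    """Binary indexed tree over positions 1..n storing counts."""

    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta=1):
        # Update every node whose range covers position i.
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        # Sum of counts stored at positions 1..i.
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


def prefix_ranks(seq):
    """For each i, count the earlier elements seq[j] (j < i) with seq[j] < seq[i]."""
    # Coordinate-compress values to 1..d so the tree size depends only on distinct values.
    position = {v: k + 1 for k, v in enumerate(sorted(set(seq)))}
    tree = Fenwick(len(position))
    ranks = []
    for x in seq:
        p = position[x]
        ranks.append(tree.prefix_sum(p - 1))  # strictly smaller values seen so far
        tree.add(p)
    return ranks


if __name__ == "__main__":
    print(prefix_ranks([30, 10, 20, 40, 15]))  # [0, 0, 1, 3, 1]
```

    Each query and update costs O(log d) for d distinct values. For sequences of distinct values, two sequences of equal length are order-isomorphic exactly when their `prefix_ranks` outputs are equal, which is why such counts are a natural building block for order-preserving matching.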

    New Algorithms for δγ-Order Search

    Context: Order-preserving matching considers the relative order of the elements of strings. However, its application areas require more flexibility in the matching paradigm. This paper advances in that direction, extending our previous work [27]. Method: We define δγ-order preserving matching as an approximate variant of order-preserving matching. We devise two solutions for it, based on segment trees and Fenwick trees: segtreeBA and bitBA. Results: We experimentally show the efficiency of our algorithms compared with those presented in [26] (naiveA and updateBA). We also present applications of our approach in music retrieval and stock market analysis. Conclusions: Even though the worst-case time complexity of the proposed algorithms, O(nm log m), is higher than the Θ(nm) time complexity of updateBA, their Ω(n log n) lower bound makes them more efficient in practice. Furthermore, real application examples show that our approach is useful for identifying similarity in music melodies and stock price trends.

    Combinatorics and Algorithmics of Strings

    Edited in cooperation with Robert Mercaş. Strings (also known as sequences or words) form the most basic and natural data structure. They occur whenever information is electronically transmitted (as bit streams), when natural language text is spoken or written down (as words over, for example, the Latin alphabet), in hereditary transmission in living cells (through DNA sequences) or in protein synthesis (as sequences of amino acids), and in many other contexts. Given this universal form of representing information, the need to process strings is apparent and is in fact a core purpose of computer use. Algorithms to efficiently search through, analyze, (de-)compress, match, encode, and decode strings are therefore of chief interest. Combinatorial problems about strings lie at the core of such algorithmic questions, and many of them recur in the string-processing efforts of different application fields. http://drops.dagstuhl.de/opus/volltexte/2014/4552

    Circular pattern matching with k mismatches

    We consider the circular pattern matching with k mismatches (k-CPM) problem, in which one is to compute the minimal Hamming distance between every length-m substring of the text T and any cyclic rotation of the pattern P, if this distance is no more than k. It is a variation of the well-studied k-mismatch problem. A multitude of papers has been devoted
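    To make the problem statement concrete, here is a naive Python sketch that tries every rotation of P against every window of T, in O(nm^2) time for a text of length n. It only serves as a reference implementation of the definition, not as one of the efficient k-CPM algorithms; the function names and the dict output (start position mapped to minimal distance) are assumptions made for illustration.

```python
# Naive reference for k-CPM; names and output format are hypothetical.
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def circular_k_mismatch(text, pattern, k):
    """Map each start i to the minimal Hamming distance between text[i:i+m]
    and any rotation of pattern, keeping only distances <= k."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return {}
    rotations = [pattern[r:] + pattern[:r] for r in range(m)]
    result = {}
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        best = min(hamming(window, rot) for rot in rotations)
        if best <= k:
            result[i] = best
    return result


if __name__ == "__main__":
    # Rotations of "abc" are "abc", "bca" and "cab".
    print(circular_k_mismatch("abcdcab", "abc", 1))  # {0: 0, 1: 1, 3: 1, 4: 0}
```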

    Order-Preserving Matching Techniques for Numeric Strings

    Doctoral thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. Advisor: Kunsoo Park (박근수). String matching is a fundamental problem in computer science and has been extensively studied. Sometimes a string consists of numeric values instead of alphabet characters, and we are interested in trends in the text rather than in specific patterns. We introduce a new string matching problem called order-preserving matching on numeric strings, where a pattern matches a text substring of the same length if the relative orders in the substring coincide with those of the pattern. Order-preserving matching is applicable to many scenarios, such as stock price analysis and musical melody matching. In this thesis, we define order-preserving matching in numeric strings and present various representations of order relations, together with efficient order-preserving matching algorithms based on those representations. For single pattern matching, we give an O(n log m) time algorithm based on the KMP algorithm with the prefix representation, and optimize it further to obtain O(n + m log m) time with the nearest neighbor representation, where n and m are the lengths of the text and the pattern, respectively. For multiple pattern matching, we present an O((n + m) log m) time algorithm based on the Aho-Corasick algorithm with the prefix representation, where n is the text length and m is the sum of the lengths of the patterns. Our algorithms are first presented for binary order relations and then extended to ternary order relations; with these extensions, the time complexities achieved for binary order relations carry over to ternary order relations.

    Contents
    Abstract
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Background
        1.2 Contribution
        1.3 Related Work
        1.4 Organization
    Chapter 2 Order-Preserving Pattern Matching
        2.1 Preliminaries
            2.1.1 Definitions of Order Relations
            2.1.2 Number of Representations
            2.1.3 Problem Formulation
        2.2 O(n log m) Algorithm
            2.2.1 Prefix Representation
            2.2.2 KMP Failure Function
            2.2.3 Text Search
            2.2.4 Construction of KMP Failure Function
            2.2.5 Correctness and Time Complexity
        2.3 O(n + m log m) Algorithm
            2.3.1 Nearest Neighbor Representation
            2.3.2 Text Search
            2.3.3 Construction of KMP Failure Function
            2.3.4 Correctness and Time Complexity
            2.3.5 Generalized Order-Preserving Matching
            2.3.6 Remark on Alphabet Size
    Chapter 3 Order-Preserving Multiple Pattern Matching
        3.1 O((n + m) log m) Algorithm
            3.1.1 Aho-Corasick Automaton
            3.1.2 Aho-Corasick Failure Function
            3.1.3 Text Search
            3.1.4 Construction of Aho-Corasick Failure Function
            3.1.5 Correctness and Time Complexity
    Chapter 4 Extensions to Ternary Order Relations
        4.1 Preliminaries
        4.2 Extension of Prefix Representation
        4.3 Extension of Nearest Neighbor Representation
        4.4 Generalized Order-Preserving KMP Algorithm
    Chapter 5 Conclusion
    Bibliography
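    A compact Python sketch of a KMP-style order-preserving search in the spirit of the nearest neighbor representation described above. Assumptions: the pattern's values are distinct (the binary order relation case), and, for brevity, the representation is built in O(m^2) time instead of the O(m log m) stated in the thesis. The helper names `nearest_neighbors`, `extends` and `op_kmp` are hypothetical; the code is an illustration, not the thesis's implementation.

```python
# KMP-style order-preserving search; helper names are hypothetical.
# Assumes the pattern's values are distinct.
def nearest_neighbors(p):
    """For each i, indices j < i of the closest smaller (lo) and closest larger (hi)
    earlier values; -1 means no such value. O(m^2) construction for clarity."""
    m = len(p)
    lo, hi = [-1] * m, [-1] * m
    for i in range(m):
        for j in range(i):
            if p[j] < p[i] and (lo[i] == -1 or p[j] > p[lo[i]]):
                lo[i] = j
            if p[j] > p[i] and (hi[i] == -1 or p[j] < p[hi[i]]):
                hi[i] = j
    return lo, hi


def extends(seq, start, k, lo, hi):
    """Given seq[start:start+k] order-isomorphic to p[:k], test whether
    seq[start+k] keeps the window order-isomorphic to p[:k+1]."""
    x = seq[start + k]
    if lo[k] != -1 and not seq[start + lo[k]] < x:
        return False
    if hi[k] != -1 and not x < seq[start + hi[k]]:
        return False
    return True


def op_kmp(text, p):
    """Start positions of windows of text order-isomorphic to p."""
    m = len(p)
    if m == 0 or m > len(text):
        return []
    lo, hi = nearest_neighbors(p)
    # fail[i]: longest proper prefix of p order-isomorphic to a suffix of p[:i+1].
    fail, k = [0] * m, 0
    for i in range(1, m):
        while k > 0 and not extends(p, i - k, k, lo, hi):
            k = fail[k - 1]
        if extends(p, i - k, k, lo, hi):
            k += 1
        fail[i] = k
    matches, k = [], 0
    for i in range(len(text)):
        while k > 0 and not extends(text, i - k, k, lo, hi):
            k = fail[k - 1]
        if extends(text, i - k, k, lo, hi):
            k += 1
        if k == m:
            matches.append(i - m + 1)
            k = fail[m - 1]
    return matches


if __name__ == "__main__":
    print(op_kmp([1, 5, 3, 6, 2, 8, 4], [1, 3, 2]))  # [0, 4]
```

    As in classic KMP, each text position triggers amortized O(1) calls to `extends`, and each call inspects at most two earlier window entries, so the search phase runs in linear time once the representation is built.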

    Algorithms for Order-Preserving Matching

    String matching is a widely studied problem in computer science, with many recent developments. One fascinating problem considered lately is the order-preserving matching (OPM) problem. The task is to find all substrings of the text that have the same length and the same relative order as the pattern, where the relative order is the numerical order of the numbers in a string. The problem finds applications in areas involving time series or other series of numbers. More specifically, it is useful when one is interested in the relative order of the pattern rather than in the pattern itself; for example, stock market analysts can use it to study price movements. In addition to the OPM problem, we also studied its approximate variant. In approximate order-preserving matching, we search for substrings of the text whose relative order is similar to that of the pattern, i.e., matches the relative order of the pattern with at most k mismatches. For applications of order-preserving matching, approximate search is more meaningful than exact search. We developed various advanced solutions for the problem and its variant, with special emphasis on practical efficiency. In particular, we introduced a simple solution for the OPM problem using filtration and showed experimentally that it was effective and faster than previous solutions. In addition, we combined the Single Instruction Multiple Data (SIMD) instruction set architecture with filtration to develop solutions that were faster than our previous one. Moreover, we proposed another efficient solution without filtration using the SIMD architecture. We also presented an offline solution based on the FM-index scheme. Furthermore, we proposed practical solutions for the approximate order-preserving matching problem, one of which was the first solution for the problem that is sublinear on average.
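    The filtration idea can be illustrated with a small Python sketch. The specific encoding is an assumption made here for illustration: each sequence is mapped to a binary up/down string (1 where the next value is larger, 0 otherwise), candidate positions are found by exact matching of these strings, and each candidate is verified by comparing rank signatures. Every true occurrence survives the filter, because order-isomorphic windows have identical up/down strings, but the filter alone can admit false positives. This is not the authors' implementation (which also exploits SIMD), and the function names are hypothetical.

```python
# Filter-then-verify OPM with an assumed up/down encoding; names are hypothetical.
def updown(seq):
    """'1' where the next value is larger, '0' otherwise (equal or smaller)."""
    return "".join("1" if a < b else "0" for a, b in zip(seq, seq[1:]))


def dense_ranks(seq):
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return [ranks[v] for v in seq]


def opm_filter(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    tcode, pcode = updown(text), updown(pattern)
    target = dense_ranks(pattern)
    hits = []
    # Exact search for the pattern's code yields candidates (overlaps allowed).
    start = tcode.find(pcode)
    while start != -1 and start <= n - m:
        # Verification removes false positives that share only the up/down shape.
        if dense_ranks(text[start:start + m]) == target:
            hits.append(start)
        start = tcode.find(pcode, start + 1)
    return hits


if __name__ == "__main__":
    print(opm_filter([10, 22, 15, 30, 20, 18, 27], [1, 3, 2]))  # [0, 2]
    print(opm_filter([2, 9, 1], [1, 3, 2]))  # []: same "10" code, different order
```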

    Succinct Data Structures for Parameterized Pattern Matching and Related Problems

    Let T be a fixed text string of length n and P be a varying pattern string of length |P| ≤ n. Both T and P contain characters from a totally ordered alphabet Σ of size σ ≤ n. The suffix tree is the ubiquitous data structure for answering a pattern matching query: report all positions i in T such that T[i + k - 1] = P[k] for 1 ≤ k ≤ |P|. Compressed data structures support pattern matching queries using much less space than the suffix tree, mainly by relying on a crucial property of the leaves in the tree. Unfortunately, in many suffix tree variants (such as the parameterized suffix tree, the order-preserving suffix tree, and the 2-dimensional suffix tree), this property does not hold. Consequently, compressed representations of these suffix tree variants have been elusive. We present the first compressed data structures for two important variants of the pattern matching problem: (1) Parameterized Matching: report a position i in T if T[i + k - 1] = f(P[k]) for 1 ≤ k ≤ |P|, for a one-to-one function f that renames the characters in P to the characters in T[i, i + |P| - 1]; and (2) Order-preserving Matching: report a position i in T if T[i + j - 1] and T[i + k - 1] have the same relative order as P[j] and P[k] for 1 ≤ j < k ≤ |P|. For each of these two problems, the existing suffix tree variant requires O(n log n) bits of space and answers a query in O(|P| log σ + occ) time, where occ is the number of starting positions where a match exists. We present data structures that require O(n log σ) bits of space and answer a query in O((|P| + occ) poly(log n)) time. As a byproduct, we obtain compressed data structures for a few other variants and introduce two new techniques (of independent interest) for designing compressed data structures for pattern matching.
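    For the parameterized matching variant, a standard tool is Baker's prev encoding: each character is replaced by the distance to its previous occurrence (0 if it has none), and two equal-length strings are related by a one-to-one renaming f exactly when their prev encodings are equal. The naive Python sketch below applies this test to every window; it illustrates the matching condition only, not the compressed data structure of the paper, and the function names are hypothetical.

```python
# Naive parameterized matching via prev encoding; names are hypothetical.
def prev_encode(s):
    """Distance to the previous occurrence of each character, or 0 for a first occurrence."""
    last = {}
    out = []
    for i, c in enumerate(s):
        out.append(i - last[c] if c in last else 0)
        last[c] = i
    return out


def p_match_positions(text, pattern):
    """Starts i where text[i:i+m] equals pattern up to a one-to-one renaming."""
    m = len(pattern)
    target = prev_encode(pattern)
    # Re-encode each window: a first occurrence inside the window must become 0.
    return [i for i in range(len(text) - m + 1)
            if prev_encode(text[i:i + m]) == target]


if __name__ == "__main__":
    # "aba" matches "xyx" (a->x, b->y) and "aba" itself.
    print(p_match_positions("xyxzzaba", "aba"))  # [0, 5]
```

    Re-encoding every window costs O(nm) time overall; the point of indexes such as parameterized suffix trees is to avoid exactly this per-window recomputation.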