    Order-Preserving Matching Techniques for Numeric Strings

    Doctoral thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. Advisor: Kunsoo Park (박근수).
    String matching is a fundamental problem in computer science and has been studied extensively. Sometimes a string consists of numeric values instead of alphabet characters, and we are interested in trends in the text rather than in specific patterns. We introduce a new string matching problem called order-preserving matching on numeric strings. In this problem, a pattern matches a text substring of the same length if the relative order of the elements in the substring coincides with that of the pattern. Order-preserving matching applies to many scenarios, such as stock price analysis and musical melody matching. In this thesis, we define order-preserving matching on numeric strings, present various representations of order relations, and give efficient order-preserving matching algorithms based on those representations. For single pattern matching, we give an O(n log m) time algorithm based on the KMP algorithm using the prefix representation. We then optimize it further to O(n + m log m) time using the nearest neighbor representation, where n and m are the lengths of the text and the pattern, respectively. For multiple pattern matching, we present an O((n + m) log m) time algorithm based on the Aho-Corasick algorithm using the prefix representation, where n is the text length and m is the total length of the patterns. We first present our algorithms for binary order relations and then extend them to ternary order relations. With these extensions, the time complexities achieved for binary order relations carry over to ternary order relations.
    Contents
    Chapter 1 Introduction: 1.1 Background; 1.2 Contribution; 1.3 Related Work; 1.4 Organization
    Chapter 2 Order-Preserving Pattern Matching: 2.1 Preliminaries (2.1.1 Definitions of Order Relations; 2.1.2 Number of Representations; 2.1.3 Problem Formulation); 2.2 O(n log m) Algorithm (2.2.1 Prefix Representation; 2.2.2 KMP Failure Function; 2.2.3 Text Search; 2.2.4 Construction of KMP Failure Function; 2.2.5 Correctness and Time Complexity); 2.3 O(n + m log m) Algorithm (2.3.1 Nearest Neighbor Representation; 2.3.2 Text Search; 2.3.3 Construction of KMP Failure Function; 2.3.4 Correctness and Time Complexity; 2.3.5 Generalized Order-Preserving Matching; 2.3.6 Remark on Alphabet Size)
    Chapter 3 Order-Preserving Multiple Pattern Matching: 3.1 O((n + m) log m) Algorithm (3.1.1 Aho-Corasick Automaton; 3.1.2 Aho-Corasick Failure Function; 3.1.3 Text Search; 3.1.4 Construction of Aho-Corasick Failure Function; 3.1.5 Correctness and Time Complexity)
    Chapter 4 Extensions to Ternary Order Relations: 4.1 Preliminaries; 4.2 Extension of Prefix Representation; 4.3 Extension of Nearest Neighbor Representation; 4.4 Generalized Order-Preserving KMP Algorithm
    Chapter 5 Conclusion
    Bibliography
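The Python sketch below illustrates the single-pattern approach summarized above: KMP-style order-preserving matching driven by a nearest-neighbor representation. It is written from the abstract's description and is not code from the thesis. The function names (`nearest_neighbors`, `fits`, `failure_function`, `op_search`) are hypothetical, and ties are handled in the ternary (<, =, >) sense. The representation stored for each pattern position is an assumption and one plausible reading of "nearest neighbor representation". It consists of two earlier positions: the one holding the largest value that is smaller than or equal to the current value, and the one holding the smallest value that is strictly larger. Preprocessing sorts the pattern once in O(m log m) time. Each text step then compares the new element against at most two earlier window elements, which is consistent with the O(n + m log m) bound stated in the abstract.

```python
from typing import Callable, List, Optional, Sequence, Tuple

NN = List[Tuple[Optional[int], Optional[int]]]


def nearest_neighbors(p: Sequence[int]) -> NN:
    """For each i, return (lo, hi): lo is an index j < i holding the largest
    value <= p[i], hi is an index j < i holding the smallest value > p[i]
    (None if no such j). Sorting costs O(m log m); the unlinking pass is O(m)."""
    m = len(p)
    order = sorted(range(m), key=lambda k: (p[k], k))  # ties broken by position
    prev, nxt = [-1] * m, [-1] * m
    for r, k in enumerate(order):
        prev[k] = order[r - 1] if r > 0 else -1
        nxt[k] = order[r + 1] if r + 1 < m else -1
    nn: NN = [(None, None)] * m
    for i in range(m - 1, -1, -1):
        # Only positions 0..i are still linked, so i's list neighbours are the
        # nearest smaller-or-equal / strictly-larger values among positions < i.
        lo, hi = prev[i], nxt[i]
        nn[i] = (lo if lo >= 0 else None, hi if hi >= 0 else None)
        if lo >= 0:
            nxt[lo] = hi
        if hi >= 0:
            prev[hi] = lo
    return nn


def fits(get: Callable[[int], int], q: int, p: Sequence[int], nn: NN) -> bool:
    """Assuming window[0..q-1] is order-isomorphic to p[0..q-1], test whether
    window[0..q] is order-isomorphic to p[0..q]; get(k) returns window[k]."""
    lo, hi = nn[q]
    x = get(q)
    if lo is not None:
        if p[lo] == p[q]:
            if get(lo) != x:  # an equal value in p must stay equal
                return False
        elif not get(lo) < x:
            return False
    return hi is None or x < get(hi)


def failure_function(p: Sequence[int], nn: NN) -> List[int]:
    """fail[i] = length of the longest proper prefix of p that is
    order-isomorphic to a suffix of p[0..i]."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and not fits(lambda j: p[i - k + j], k, p, nn):
            k = fail[k - 1]
        if fits(lambda j: p[i - k + j], k, p, nn):
            k += 1
        fail[i] = k
    return fail


def op_search(t: Sequence[int], p: Sequence[int]) -> List[int]:
    """Start positions s such that t[s:s+len(p)] is order-isomorphic to p."""
    m = len(p)
    if m == 0 or m > len(t):
        return []
    nn = nearest_neighbors(p)
    fail = failure_function(p, nn)
    hits, q = [], 0
    for j in range(len(t)):
        while q > 0 and not fits(lambda k: t[j - q + k], q, p, nn):
            q = fail[q - 1]
        if fits(lambda k: t[j - q + k], q, p, nn):
            q += 1
        if q == m:
            hits.append(j - m + 1)
            q = fail[q - 1]
    return hits


print(op_search([5, 10, 7, 2, 8, 4, 9, 1], [1, 3, 2]))  # [0, 3]
```

The right-to-left unlinking pass explains why sorting dominates the preprocessing cost. Each position reads its two list neighbours and is then removed. As a result, every lookup sees only positions to its left, and no balanced search tree is needed.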

    Duel and sweep algorithm for order-preserving pattern matching

    Given a text $T$ and a pattern $P$ over an alphabet $\Sigma$, the classic exact matching problem searches for all occurrences of $P$ in $T$. Unlike exact matching, order-preserving pattern matching (OPPM) considers the relative order of elements rather than their actual values. In this paper, we propose an efficient algorithm for the OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in $O(n + m\log m)$ time in general. It runs in $O(n + m)$ time under the assumption that the characters of a string can be sorted in time linear in the length of the string. We also perform experiments and show that our algorithm is faster than the KMP-based algorithm. Finally, we introduce two-dimensional order-preserving pattern matching and give a duel-and-sweep algorithm for it. This algorithm runs in $O(n^2)$ time for the dueling stage and $O(n^2 m)$ time for the sweeping stage, with $O(m^3)$ preprocessing time.
    Comment: 13 pages, 5 figures
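The two-dimensional variant mentioned at the end of this abstract can be pinned down with a brute-force checker that follows the definition directly. This is not the duel-and-sweep algorithm. It compares the dense-rank encoding of every m x m block of the text against the encoding of the pattern, which takes O(n^2 m^2 log m) time for an n x n text. Several points are assumptions of this sketch: the pattern is square, every pair of cells is compared (flattened row-major), and agreement is required in the ternary (<, =, >) sense. The names `dense_ranks` and `op_match_2d` are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[int]]


def dense_ranks(values: Sequence[int]) -> Tuple[int, ...]:
    """Map each value to its rank among the distinct values; equal values share a rank.
    Two equal-length sequences get the same tuple iff all pairwise <, =, > relations agree."""
    rank = {v: r for r, v in enumerate(sorted(set(values)))}
    return tuple(rank[v] for v in values)


def op_match_2d(text: Grid, pattern: Grid) -> List[Tuple[int, int]]:
    """Top-left corners (r, c) of every m x m block of text whose cells are
    pairwise in the same <, =, > relation as the corresponding pattern cells."""
    m = len(pattern)
    rows, cols = len(text), len(text[0]) if text else 0
    target = dense_ranks([x for row in pattern for x in row])  # row-major flattening
    hits = []
    for r in range(rows - m + 1):
        for c in range(cols - m + 1):
            block = [text[r + i][c + j] for i in range(m) for j in range(m)]
            if dense_ranks(block) == target:
                hits.append((r, c))
    return hits
```

For example, `op_match_2d([[1, 2, 9], [8, 5, 3], [7, 6, 4]], [[1, 2], [4, 3]])` returns `[(0, 0)]`. A checker like this is mainly useful as a reference oracle when testing a faster implementation.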

    A Compact Index for Order-Preserving Pattern Matching

    Order-preserving pattern matching was introduced recently, but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same relative order as the pattern elements. For this problem we consider the offline version, in which we build an index for the reference sequence so that subsequent searches can be completed very efficiently. We propose a space-efficient index that works well in practice despite its lack of good worst-case time bounds. Our solution is based on a new approach that decomposes the indexed sequence into an order component, containing ordering information, and a delta component, containing information on the absolute values. Experiments show that this approach is viable and faster than the available alternatives. It is also the first approach to offer small space usage and fast retrieval simultaneously.
    Comment: 16 pages. A preliminary version appeared in Proc. IEEE Data Compression Conference (DCC 2017), Snowbird, UT, USA, 2017
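The following Python sketch makes the offline (indexed) setting concrete with a deliberately simple filter-and-verify index. It is not the paper's index. It does not split the sequence into order and delta components, and it is not compact, because it stores one position per window. The class `WindowSignatureIndex`, the window length `k`, and the helper `signature` are hypothetical. The sketch relies on one fact that holds for order-preserving matching in general: prefixes of order-isomorphic sequences are themselves order-isomorphic. Filtering on the first k pattern elements therefore cannot lose an occurrence.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def signature(values: Sequence[int]) -> Tuple[int, ...]:
    """Dense-rank encoding: equal for two sequences iff they are order-isomorphic."""
    rank = {v: r for r, v in enumerate(sorted(set(values)))}
    return tuple(rank[v] for v in values)


class WindowSignatureIndex:
    """Hypothetical filter-and-verify index (not the paper's order/delta design).

    Every length-k window of the reference is stored under its signature.
    A pattern of length m >= k can match ref[i:i+m] only if its first k
    elements are order-isomorphic to ref[i:i+k], so the lookup never misses
    an occurrence; each candidate is then verified in full."""

    def __init__(self, ref: Sequence[int], k: int) -> None:
        self.ref = list(ref)
        self.k = k
        self.table: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        for i in range(len(self.ref) - k + 1):
            self.table[signature(self.ref[i:i + k])].append(i)

    def search(self, pattern: Sequence[int]) -> List[int]:
        m = len(pattern)
        if m < self.k:
            raise ValueError("pattern must be at least k elements long")
        target = signature(pattern)
        # .get on a defaultdict does not insert a new key
        candidates = self.table.get(signature(pattern[:self.k]), [])
        return [i for i in candidates
                if i + m <= len(self.ref) and signature(self.ref[i:i + m]) == target]
```

For example, `WindowSignatureIndex([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], k=2).search([1, 2, 3])` returns `[3]`, the only strictly increasing run of length three. A larger k filters more sharply but rejects shorter patterns.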

    Order preserving pattern matching on trees and DAGs

    The order-preserving pattern matching (OPPM) problem is, given a pattern string $p$ and a text string $t$, to find all substrings of $t$ that have the same relative order as $p$. In this paper, we consider two variants of the OPPM problem in which a set of text strings is given as a tree or a DAG. We show that the OPPM problem for a single pattern $p$ of length $m$ and a text tree $T$ of size $N$ can be solved in $O(m + N)$ time if the characters of $p$ are drawn from an integer alphabet of polynomial size. The time complexity becomes $O(m \log m + N)$ if $p$ is over a general ordered alphabet. We then show that the OPPM problem for a single pattern and a text DAG is NP-complete.
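A naive baseline helps clarify what the tree variant asks for. The Python sketch below assumes that the text strings are read top-down along root-to-node paths. Under that assumption, each node ends at most one candidate path of length m: its m - 1 nearest ancestors followed by the node itself. The input format (`labels`, `children`) and the function name `op_match_tree` are hypothetical. The sketch runs in O(N * m log m) time, far from the O(m + N) bound claimed in the abstract, and it does not address the DAG case.

```python
from typing import Dict, List, Sequence, Tuple


def dense_ranks(values: Sequence[int]) -> Tuple[int, ...]:
    """Equal for two sequences iff they are order-isomorphic (<, =, > preserved)."""
    rank = {v: r for r, v in enumerate(sorted(set(values)))}
    return tuple(rank[v] for v in values)


def op_match_tree(labels: Dict[int, int], children: Dict[int, List[int]],
                  root: int, pattern: Sequence[int]) -> List[int]:
    """Nodes v such that the m labels on the downward path ending at v
    (v's m-1 nearest ancestors, then v) are order-isomorphic to pattern."""
    m = len(pattern)
    if m == 0:
        return []
    target = dense_ranks(pattern)
    hits: List[int] = []
    path: List[int] = []  # labels on the current root-to-node path
    stack: List[Tuple[int, bool]] = [(root, False)]
    while stack:  # iterative DFS, so deep trees do not hit the recursion limit
        node, leaving = stack.pop()
        if leaving:  # all descendants done: drop this node's label
            path.pop()
            continue
        path.append(labels[node])
        if len(path) >= m and dense_ranks(path[-m:]) == target:
            hits.append(node)
        stack.append((node, True))
        for child in reversed(children.get(node, [])):  # keep child order
            stack.append((child, False))
    return hits
```

As an example, take `labels = {0: 5, 1: 7, 2: 3, 3: 9, 4: 8, 5: 1}` and `children = {0: [1, 2], 1: [3, 4], 2: [5]}`. With root `0` and pattern `[1, 2, 3]`, the function returns `[3, 4]`, because the paths 5, 7, 9 and 5, 7, 8 are the only increasing ones.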