Order-Preserving Matching Techniques for Numeric Strings (수치 문자열의 순서를 보존하는 매칭 기법)
Thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. Advisor: Kunsoo Park.
String matching is a fundamental problem in computer science and has been extensively studied. Sometimes a string consists of numeric values rather than alphabet characters, and we are interested in trends in the text rather than in specific patterns. We introduce a new string matching problem called order-preserving matching on numeric strings, in which a pattern matches a text substring of the same length if the relative order of the elements in the substring coincides with that of the pattern. Order-preserving matching is applicable to many scenarios, such as stock price analysis and musical melody matching.
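To make the matching condition concrete, the brute-force checker below tests every window of the text against the pattern by comparing all pairs of positions with a three-way comparison (<, =, >). It is only a minimal sketch of the definition, not one of the thesis's algorithms, and the function names are hypothetical.

```python
from typing import List, Sequence


def is_order_isomorphic(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x and y have the same length and every pair of positions
    compares the same way (<, =, >) in both sequences."""
    if len(x) != len(y):
        return False
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # (a > b) - (a < b) yields -1, 0 or 1, i.e. a three-way comparison.
            if (x[i] > x[j]) - (x[i] < x[j]) != (y[i] > y[j]) - (y[i] < y[j]):
                return False
    return True


def naive_op_match(text: Sequence[float], pattern: Sequence[float]) -> List[int]:
    """Return every start index i such that text[i:i+m] is order-isomorphic
    to pattern. Brute force: O(n * m^2) time."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    return [i for i in range(len(text) - m + 1)
            if is_order_isomorphic(text[i:i + m], pattern)]
```

For example, with prices `[10, 20, 15, 30, 25, 40]` and the "up, then partly down" shape `[1, 3, 2]`, `naive_op_match` reports windows 0 (`10, 20, 15`) and 2 (`15, 30, 25`), even though no value of the pattern appears in the text.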
In this thesis, we define order-preserving matching on numeric strings, present various representations of order relations, and give efficient order-preserving matching algorithms based on those representations. For single pattern matching, we give an O(n log m) time algorithm based on the KMP algorithm that uses the prefix representation, and optimize it further to O(n + m log m) time using the nearest neighbor representation, where n and m are the lengths of the text and the pattern, respectively. For multiple pattern matching, we present an O((n + m) log m) time algorithm based on the Aho-Corasick algorithm that uses the prefix representation, where n is the text length and m is the sum of the pattern lengths. We first present our algorithms for binary order relations and then extend them to ternary order relations; with these extensions, the time complexities achieved for binary order relations carry over to ternary order relations.
Contents
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Contribution
1.3 Related Work
1.4 Organization
Chapter 2 Order-Preserving Pattern Matching
2.1 Preliminaries
2.1.1 Definitions of Order Relations
2.1.2 Number of Representations
2.1.3 Problem Formulation
2.2 O(n log m) Algorithm
2.2.1 Prefix Representation
2.2.2 KMP Failure Function
2.2.3 Text Search
2.2.4 Construction of KMP Failure Function
2.2.5 Correctness and Time Complexity
2.3 O(n + m log m) Algorithm
2.3.1 Nearest Neighbor Representation
2.3.2 Text Search
2.3.3 Construction of KMP Failure Function
2.3.4 Correctness and Time Complexity
2.3.5 Generalized Order-Preserving Matching
2.3.6 Remark on Alphabet Size
Chapter 3 Order-Preserving Multiple Pattern Matching
3.1 O((n + m) log m) Algorithm
3.1.1 Aho-Corasick Automaton
3.1.2 Aho-Corasick Failure Function
3.1.3 Text Search
3.1.4 Construction of Aho-Corasick Failure Function
3.1.5 Correctness and Time Complexity
Chapter 4 Extensions to Ternary Order Relations
4.1 Preliminaries
4.2 Extension of Prefix Representation
4.3 Extension of Nearest Neighbor Representation
4.4 Generalized Order-Preserving KMP Algorithm
Chapter 5 Conclusion
Bibliography
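The thesis abstract describes a KMP-style search that uses a nearest neighbor representation of the pattern. The sketch below shows one way these pieces can fit together; it is an illustration under stated assumptions, not the thesis's code. It assumes pairwise distinct values (binary order relations only), and the helper names are hypothetical. For each pattern position i it records the nearest smaller and nearest larger value among earlier positions; a match of the prefix of length k then extends by one element if and only if the new text value falls between the text values aligned with those two neighbors. In this sketch the neighbor table costs O(m log m) (one sort plus linked-list deletions) and each extension test is O(1), so the KMP scan is O(n + m) amortized.

```python
from typing import List, Sequence, Tuple


def nearest_neighbors(p: Sequence[float]) -> Tuple[List[int], List[int]]:
    """lmax[i]: index j < i holding the largest p[j] < p[i], or -1.
    lmin[i]: index j < i holding the smallest p[j] > p[i], or -1.
    Assumes the values of p are pairwise distinct."""
    m = len(p)
    order = sorted(range(m), key=lambda k: p[k])
    prev, nxt = [-1] * m, [-1] * m  # doubly linked list of indices in value order
    for r in range(1, m):
        prev[order[r]] = order[r - 1]
        nxt[order[r - 1]] = order[r]
    lmax, lmin = [-1] * m, [-1] * m
    # Delete indices right to left: when i is visited only 0..i remain, so its
    # list neighbours are its nearest smaller/larger values to the left.
    for i in range(m - 1, -1, -1):
        lmax[i], lmin[i] = prev[i], nxt[i]
        if prev[i] != -1:
            nxt[prev[i]] = nxt[i]
        if nxt[i] != -1:
            prev[nxt[i]] = prev[i]
    return lmax, lmin


def extends(s: Sequence[float], start: int, k: int,
            lmax: List[int], lmin: List[int]) -> bool:
    """Given s[start:start+k] order-isomorphic to p[:k], test s[start+k]."""
    x = s[start + k]
    if lmax[k] != -1 and not s[start + lmax[k]] < x:
        return False
    if lmin[k] != -1 and not x < s[start + lmin[k]]:
        return False
    return True


def failure_function(p: Sequence[float], lmax: List[int], lmin: List[int]) -> List[int]:
    """fail[i] = length of the longest proper order-preserving border of p[:i+1]."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and not extends(p, i - k, k, lmax, lmin):
            k = fail[k - 1]
        if extends(p, i - k, k, lmax, lmin):
            k += 1
        fail[i] = k
    return fail


def op_kmp_search(t: Sequence[float], p: Sequence[float]) -> List[int]:
    """Start indices of all windows of t that are order-isomorphic to p."""
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []
    lmax, lmin = nearest_neighbors(p)
    fail = failure_function(p, lmax, lmin)
    out, k = [], 0
    for i in range(n):
        while k > 0 and not extends(t, i - k, k, lmax, lmin):
            k = fail[k - 1]
        if extends(t, i - k, k, lmax, lmin):
            k += 1
        if k == m:
            out.append(i - m + 1)
            k = fail[k - 1]
    return out
```

For instance, `op_kmp_search([10, 20, 15, 30, 25, 40], [1, 3, 2])` returns `[0, 2]`. Equal values would need the ternary extension that the thesis treats separately.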
Duel and sweep algorithm for order-preserving pattern matching
Given a text and a pattern over an alphabet, the classic exact
matching problem searches for all occurrences of the pattern in the text.
Unlike exact matching, order-preserving pattern matching (OPPM)
considers the relative order of elements rather than their actual values. In
this paper, we propose an efficient algorithm for the OPPM problem using the
"duel-and-sweep" paradigm. Our algorithm runs in time in
general and time under the assumption that the characters in a string
can be sorted in linear time with respect to the string size. We also perform
experiments showing that our algorithm is faster than the KMP-based algorithm.
Finally, we introduce two-dimensional order-preserving pattern matching and
give a duel-and-sweep algorithm that runs in time for the duel stage and
time for the sweep stage, with preprocessing time.
Comment: 13 pages, 5 figures
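To illustrate the duel-and-sweep idea, here is a deliberately simplified sketch. It precomputes, naively, a witness pair for each shift of the pattern; two overlapping candidates then "duel" on that witness, which rules out at least one of them, and the surviving candidates are checked by brute force. It does not reproduce the paper's duel or sweep stages or their time bounds, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def cmp(a: float, b: float) -> int:
    """Three-way comparison: -1, 0 or 1."""
    return (a > b) - (a < b)


def is_order_isomorphic(x: Sequence[float], y: Sequence[float]) -> bool:
    n = len(x)
    return len(y) == n and all(cmp(x[i], x[j]) == cmp(y[i], y[j])
                               for i in range(n) for j in range(i + 1, n))


def witnesses(p: Sequence[float]) -> List[Optional[Tuple[int, int]]]:
    """w[a] = some (i, j) with j < m - a and cmp(p[i], p[j]) != cmp(p[i+a], p[j+a]),
    or None if p[:m-a] and p[a:] are order-isomorphic. Naive O(m^3)."""
    m = len(p)
    w: List[Optional[Tuple[int, int]]] = [None] * m
    for a in range(1, m):
        w[a] = next(((i, j) for i in range(m - a) for j in range(i + 1, m - a)
                     if cmp(p[i], p[j]) != cmp(p[i + a], p[j + a])), None)
    return w


def duel(t, p, w, x: int, y: int) -> int:
    """Candidates x < y overlap and w[y - x] exists; return the candidate
    that this witness does not rule out."""
    a = y - x
    i, j = w[a]
    c = cmp(t[y + i], t[y + j])
    # Candidate y aligns p[i], p[j] with these two text positions, candidate x
    # aligns p[a + i], p[a + j]; the witness guarantees that the two pattern
    # comparisons differ, so at most one candidate agrees with c.
    return y if c == cmp(p[i], p[j]) else x


def duel_then_verify(t: Sequence[float], p: Sequence[float]) -> List[int]:
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []
    w = witnesses(p)
    survivors: List[int] = []
    # Duel stage: each new candidate duels the most recent overlapping survivor(s).
    for y in range(n - m + 1):
        alive = True
        while survivors and y - survivors[-1] < m and w[y - survivors[-1]] is not None:
            if duel(t, p, w, survivors[-1], y) == y:
                survivors.pop()
            else:
                alive = False
                break
        if alive:
            survivors.append(y)
    # Simplified sweep stage: verify each survivor directly.
    return [x for x in survivors if is_order_isomorphic(t[x:x + m], p)]
```

Because a duel only discards a candidate that the witness proves cannot match, the final verification returns exactly the occurrences; what this sketch leaves out is an efficient organisation of both stages.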
A Compact Index for Order-Preserving Pattern Matching
Order-preserving pattern matching was introduced recently but it has already
attracted much attention. Given a reference sequence and a pattern, we want to
locate all substrings of the reference sequence whose elements have the same
relative order as the pattern elements. For this problem we consider the
offline version in which we build an index for the reference sequence so that
subsequent searches can be completed very efficiently. We propose a
space-efficient index that works well in practice despite its lack of good
worst-case time bounds. Our solution is based on the new approach of
decomposing the indexed sequence into an order component, containing ordering
information, and a delta component, containing information on the absolute
values. Experiments show that this approach is viable, faster than the
available alternatives, and the first to offer both small space usage and
fast retrieval.
Comment: 16 pages. A preliminary version appeared in Proc. IEEE Data
Compression Conference, DCC 2017, Snowbird, UT, USA, 201
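The abstract does not spell out the encoding, so the following is only a hypothetical toy illustration of splitting a sequence into an order component and a delta component, not the paper's index. It assumes integer values and fixed-size blocks: the order component of a block is its dense rank sequence, and the delta component is the smallest distinct value followed by the gaps between consecutive distinct values. The decomposition is lossless, and two blocks are order-isomorphic exactly when their order components are equal.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Block = Tuple[List[int], List[int]]


def decompose_block(block: Sequence[int]) -> Block:
    """Split a non-empty block into (order, delta).
    order: dense rank of each element (ordering information only).
    delta: smallest distinct value, then gaps between consecutive distinct
    values (what is needed to recover the absolute values)."""
    distinct = sorted(set(block))
    rank = {v: r for r, v in enumerate(distinct)}
    order = [rank[v] for v in block]
    delta = [distinct[0]] + [b - a for a, b in zip(distinct, distinct[1:])]
    return order, delta


def reconstruct_block(order: Sequence[int], delta: Sequence[int]) -> List[int]:
    values = list(accumulate(delta))  # prefix sums give back the distinct values
    return [values[r] for r in order]


def decompose(seq: Sequence[int], block_size: int) -> List[Block]:
    """Hypothetical fixed-size blocking; the last block may be shorter."""
    return [decompose_block(seq[i:i + block_size])
            for i in range(0, len(seq), block_size)]
```

For example, `decompose([5, 9, 7, 9, 1, 4], 3)` yields `[([0, 2, 1], [5, 2, 2]), ([2, 0, 1], [1, 3, 5])]`. In this toy encoding an order-preserving query inside a block only needs the order component, while the deltas are touched only when absolute values are required.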
Order preserving pattern matching on trees and DAGs
The order-preserving pattern matching (OPPM) problem is, given a pattern
string and a text string, to find all substrings of the text that have the
same relative orders as the pattern. In this paper, we consider two variants of the
OPPM problem in which a set of text strings is given as a tree or a DAG. We show
that the OPPM problem for a single pattern of length and a text tree
of size can be solved in time if the characters of the pattern are
drawn from an integer alphabet of polynomial size. The time complexity becomes
if the pattern is over a general ordered alphabet. We
then show that the OPPM problem for a single pattern and a text DAG is
NP-complete
- …
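As a baseline for the tree variant, the sketch below assumes (the abstract does not fix the convention) that the strings of a text tree are read downward along root-to-node paths, so an occurrence is a downward path of m nodes ending at some node. It walks the tree depth-first, keeps the values on the current root path in a stack, and compares dense-rank signatures, which also handles equal values. This naive check costs O(N m log m) for a tree of N nodes and is not the paper's algorithm; the names `oppm_on_tree`, `values` and `children` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def rank_signature(xs: Sequence[float]) -> Tuple[int, ...]:
    """Dense ranks: two sequences are order-isomorphic iff signatures match."""
    distinct = sorted(set(xs))
    rank = {v: r for r, v in enumerate(distinct)}
    return tuple(rank[v] for v in xs)


def oppm_on_tree(values: Dict[Hashable, float],
                 children: Dict[Hashable, List[Hashable]],
                 root: Hashable,
                 pattern: Sequence[float]) -> List[Hashable]:
    """Return the nodes v (in DFS preorder) such that the downward path of
    len(pattern) nodes ending at v is order-isomorphic to pattern."""
    m = len(pattern)
    if m == 0:
        return []
    target = rank_signature(pattern)
    hits: List[Hashable] = []
    path: List[float] = []       # values on the current root-to-node path
    stack = [(root, False)]      # (node, leaving?) for an iterative DFS
    while stack:
        node, leaving = stack.pop()
        if leaving:
            path.pop()           # backtrack out of node's subtree
            continue
        path.append(values[node])
        if len(path) >= m and rank_signature(path[-m:]) == target:
            hits.append(node)
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return hits
```

For a root 0 (value 10) with child 1 (value 20), which has children 2 (value 15) and 3 (value 5), the pattern `[1, 3, 2]` occurs only on the path ending at node 2.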