11,336 research outputs found
Duel and sweep algorithm for order-preserving pattern matching
Given a text and a pattern over alphabet , the classic exact
matching problem searches for all occurrences of pattern in text .
Unlike exact matching problem, order-preserving pattern matching (OPPM)
considers the relative order of elements, rather than their real values. In
this paper, we propose an efficient algorithm for OPPM problem using the
"duel-and-sweep" paradigm. Our algorithm runs in time in
general and time under an assumption that the characters in a string
can be sorted in linear time with respect to the string size. We also perform
experiments and show that our algorithm is faster that KMP-based algorithm.
Last, we introduce the two-dimensional order preserved pattern matching and
give a duel and sweep algorithm that runs in time for duel stage and
time for sweeping time with preprocessing time.Comment: 13 pages, 5 figure
Average-Case Optimal Approximate Circular String Matching
Approximate string matching is the problem of finding all factors of a text t
of length n that are at a distance at most k from a pattern x of length m.
Approximate circular string matching is the problem of finding all factors of t
that are at a distance at most k from x or from any of its rotations. In this
article, we present a new algorithm for approximate circular string matching
under the edit distance model with optimal average-case search time O(n(k + log
m)/m). Optimal average-case search time can also be achieved by the algorithms
for multiple approximate string matching (Fredriksson and Navarro, 2004) using
x and its rotations as the set of multiple patterns. Here we reduce the
preprocessing time and space requirements compared to that approach
Fast Exact String Pattern-matching Algorithms Adapted to the Characteristics of the Medical Language
Objective: The authors consider the problem of exact string pattern matching using algorithms that do not require any preprocessing. To choose the most appropriate algorithm, distinctive features of the medical language must be taken into account. The characteristics of medical language are emphasized in this regard, the best algorithm of those reviewed is proposed, and detailed evaluations of time complexity for processing medical texts are provided. Design: The authors first illustrate and discuss the techniques of various string pattern-matching algorithms. Next, the source code and the behavior of representative exact string pattern-matching algorithms are presented in a comprehensive manner to promote their implementation. Detailed explanations of the use of various techniques to improve performance are given. Measurements: Real-time measures of time complexity with English medical texts are presented. They lead to results distinct from those found in the computer science literature, which are typically computed with normally distributed texts. Results: The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested. Conclusion: The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorith
Implementation and comparison of Berry-Ravindran and Zhu- Takaoka exact string matching algorithms in Indonesian-Batak Toba dictionary
Indonesia has a variety of local languages, which is the Batak Toba language. This time, there are still some Batak Toba people who do not know speak Batak Toba language fluently. Nowadays, desktop based dictionary is one of reference that very efficiently used to learn a language and also to increase vocabulary. In making the dictionary application, string matching can be implemented for word-searching process. String matching have some algorithm, which is Berry – Ravindran algorithm and Zhu-Takaoka algorithm and will be implemented on the dictionary application. Zhu-Takaoka algorithm and Berry – Ravindran algorithm have two phases, which are the preprocessing phase and the searching phase. Preprocessing phase is a process to make the shifting values according to in pattern that input by user. To know the shifting value with Zhu-Takaoka algorithm, it’s need Zhu-Takaoka Bad Character (Ztbc) and Boyer-Moore Good Suffix (Bmgs). Then, Ztbc will be compared to Bmgs to get the maximum value of them that will be set as shifting value. While Berry-Ravindran algorithm, to know the shifting value is needed Berry-Ravindran Bad Character, which the two characters right of the text at the position m + 1 and m+ 2, is needed to determine the shifting value, where m is length of the pattern
Pattern Matching in Multiple Streams
We investigate the problem of deterministic pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated with one
of s streaming texts. The task at each time step is to report if there is a new
match between a fixed pattern of length m and a newly updated stream. As is
usual in the streaming context, the goal is to use as little space as possible
while still reporting matches quickly. We give almost matching upper and lower
space bounds for three distinct pattern matching problems. For exact matching
we show that the problem can be solved in constant time per arriving symbol and
O(m+s) words of space. For the k-mismatch and k-difference problems we give
O(k) time solutions that require O(m+ks) words of space. In all three cases we
also give space lower bounds which show our methods are optimal up to a single
logarithmic factor. Finally we set out a number of open problems related to
this new model for pattern matching.Comment: 13 pages, 1 figur
New Variants of Pattern Matching with Constants and Variables
Given a text and a pattern over two types of symbols called constants and
variables, the parameterized pattern matching problem is to find all
occurrences of substrings of the text that the pattern matches by substituting
a variable in the text for each variable in the pattern, where the substitution
should be injective. The function matching problem is a variant of it that
lifts the injection constraint. In this paper, we discuss variants of those
problems, where one can substitute a constant or a variable for each variable
of the pattern. We give two kinds of algorithms for both problems, a
convolution-based method and an extended KMP-based method, and analyze their
complexity.Comment: 15 pages, 2 figure
- …