194,337 research outputs found
The Swap Matching Problem Revisited
In this paper, we revisit the much studied problem of Pattern Matching with
Swaps (Swap Matching problem, for short). We first present a graph-theoretic
model, which opens a new and so far unexplored avenue to solve the problem.
Then, using the model, we devise two efficient algorithms to solve the swap
matching problem. The resulting algorithms are adaptations of the classic
shift-and algorithm. For patterns having length similar to the word-size of the
target machine, both the algorithms run in linear time considering a fixed
alphabet.Comment: 23 pages, 3 Figures and 17 Table
On the Comparison Complexity of the String Prefix-Matching Problem
In this paper we study the exact comparison complexity of the stringprefix-matching problem in the deterministic sequential comparison modelwith equality tests. We derive almost tight lower and upper bounds onthe number of symbol comparisons required in the worst case by on-lineprefix-matching algorithms for any fixed pattern and variable text. Unlikeprevious results on the comparison complexity of string-matching andprefix-matching algorithms, our bounds are almost tight for any particular pattern.We also consider the special case where the pattern and the text are thesame string. This problem, which we call the string self-prefix problem, issimilar to the pattern preprocessing step of the Knuth-Morris-Pratt string-matchingalgorithm that is used in several comparison efficient string-matchingand prefix-matching algorithms, including in our new algorithm.We obtain roughly tight lower and upper bounds on the number of symbolcomparisons required in the worst case by on-line self-prefix algorithms.Our algorithms can be implemented in linear time and space in thestandard uniform-cost random-access-machine model
Suffix Structures and Circular Pattern Problems
The suffix tree is a data structure used to represent all the suffixes in a string. However, a major problem with the suffix tree is its practical space requirement. In this dissertation, we propose an efficient data structure -- the virtual suffix tree (VST) -- which requires less space than other recently proposed data structures for suffix trees and suffix arrays. On average, the space requirement (including that for suffix arrays and suffix links) is 13.8n bytes for the regular VST, and 12.05n bytes in its compact form, where n is the length of the sequence.;Markov models are very popular for modeling complex sequences. In this dissertation, we present the probabilistic suffix array (PSA), a space-efficient alternative to the probabilistic suffix tree (PST) used to represent Markov models. The PSA provides all the capabilities of the PST, such as learning and prediction, and maintains the same linear time construction (linearity with respect to sequence length). The PSA, however, has a significantly smaller memory requirement than the PST, for both the construction stage, and at the time of usage.;Using the proposed suffix data structures, we study the circular pattern matching (CPM) problem. We provide a linear time, linear space algorithm to solve the exact circular pattern matching problem. We then present four algorithms to address the approximate circular pattern matching (ACPM) problem. Our bidirectional ACPM algorithm provides the best time complexity when compared with other algorithms proposed in the literature. Further, we define the circular pattern discovery (CPD) problem and present algorithms to solve this problem. Using the proposed circular pattern matching algorithms, we perform experiments on computational analysis and function prediction for multidomain proteins
Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences
The equidistant subsequence pattern matching problem is considered. Given a
pattern string and a text string , we say that is an
\emph{equidistant subsequence} of if is a subsequence of the text such
that consecutive symbols of in the occurrence are equally spaced. We can
consider the problem of equidistant subsequences as generalizations of
(sub-)cadences. We give bit-parallel algorithms that yield time
algorithms for finding -(sub-)cadences and equidistant subsequences.
Furthermore, and time algorithms, respectively for
equidistant and Abelian equidistant matching for the case , are shown.
The algorithms make use of a technique that was recently introduced which can
efficiently compute convolutions with linear constraints
Adaptive Multi-Pattern Fast Block-Matching Algorithm Based on Motion Classification Techniques
Motion estimation is the most time-consuming subsystem in a video codec. Thus, more efficient methods of motion estimation should be investigated. Real video sequences usually exhibit a wide-range of motion content as well as different degrees of detail, which become particularly difficult to manage by typical block-matching algorithms. Recent developments in the area of motion estimation have focused on the adaptation to video contents. Adaptive thresholds and multi-pattern search algorithms have shown to achieve good performance when they success to adjust to motion characteristics. This paper proposes an adaptive algorithm, called MCS, that makes use of an especially tailored classifier that detects some motion cues and chooses the search pattern that best fits to them. Specifically, a hierarchical structure of binary linear classifiers is proposed. Our experimental results show that MCS notably reduces the computational cost with respect to an state-of-the-art method while maintaining the qualityPublicad
- …