Duel and sweep algorithm for order-preserving pattern matching
Given a text and a pattern over an alphabet, the classic exact
matching problem searches for all occurrences of the pattern in the text.
Unlike the exact matching problem, order-preserving pattern matching (OPPM)
considers the relative order of elements, rather than their real values. In
this paper, we propose an efficient algorithm for the OPPM problem using the
"duel-and-sweep" paradigm. Our algorithm runs in time in
general and time under an assumption that the characters in a string
can be sorted in linear time with respect to the string size. We also perform
experiments and show that our algorithm is faster than a KMP-based algorithm.
Last, we introduce the two-dimensional order-preserving pattern matching problem and
give a duel and sweep algorithm that runs in time for the duel stage and
time for the sweep stage, with preprocessing time.
Comment: 13 pages, 5 figures
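To make the OPPM problem definition concrete, here is a naive baseline, not the duel-and-sweep algorithm from the paper. A text window matches when its values have the same relative order as the pattern's; this sketch tests that by comparing rank signatures (each value replaced by its rank among the distinct values of its window). The function names are illustrative assumptions, and for a text of length n and a pattern of length m the scan takes O(nm log m) time.

```python
from typing import List, Sequence, Tuple


def rank_signature(seq: Sequence[int]) -> Tuple[int, ...]:
    """Replace each value by its rank among the distinct values of seq."""
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return tuple(ranks[v] for v in seq)


def oppm_naive(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Start positions i where text[i:i+m] is order-isomorphic to pattern.

    Two sequences are order-isomorphic exactly when their rank signatures
    agree, since ranking preserves every pairwise <, = and > relation.
    """
    m = len(pattern)
    target = rank_signature(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if rank_signature(text[i:i + m]) == target
    ]


# Windows [5, 10, 7] and [8, 20, 9] follow the low-high-middle shape of [1, 3, 2].
print(oppm_naive([5, 10, 7, 8, 20, 9], [1, 3, 2]))  # [0, 3]
```

Note that the actual values never need to match: only the pairwise comparisons inside each window matter, which is what separates OPPM from exact matching.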
Deleting and Testing Forbidden Patterns in Multi-Dimensional Arrays
Understanding the local behaviour of structured multi-dimensional data is a
fundamental problem in various areas of computer science. As the amount of data
is often huge, it is desirable to obtain sublinear time algorithms, and
specifically property testers, to understand local properties of the data.
We focus on the natural local problem of testing pattern freeness: given a
large multi-dimensional array and a fixed pattern of the same dimension over a
finite alphabet, we say that the array is pattern-free if it does not contain a copy of
the forbidden pattern as a consecutive subarray. The distance of the array to
pattern-freeness is the fraction of its entries that need to be modified to make
it pattern-free. For any and any large enough pattern over
any alphabet, other than a very small set of exceptional patterns, we design a
tolerant tester that distinguishes between the case that the distance is at
least and the case that it is at most , with query
complexity and running time , where and
depend only on .
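The pattern-freeness definition can be checked directly in the two-dimensional case. The sketch below, with hypothetical helper names, lists every top-left corner where the pattern occurs as a consecutive subarray by exhaustive comparison. It reads the whole array, so it only illustrates the definition; it is not the sublinear tolerant tester described above.

```python
from typing import List, Tuple

Grid = List[List[int]]


def find_copies(array: Grid, pattern: Grid) -> List[Tuple[int, int]]:
    """Top-left corners (r, c) where pattern occurs as a consecutive subarray."""
    rows, cols = len(array), len(array[0])
    pr, pc = len(pattern), len(pattern[0])
    corners = []
    for r in range(rows - pr + 1):
        for c in range(cols - pc + 1):
            if all(
                array[r + i][c + j] == pattern[i][j]
                for i in range(pr)
                for j in range(pc)
            ):
                corners.append((r, c))
    return corners


def is_pattern_free(array: Grid, pattern: Grid) -> bool:
    """True if the array contains no copy of the forbidden pattern."""
    return not find_copies(array, pattern)


checkerboard = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(find_copies(checkerboard, [[0, 1], [1, 0]]))  # [(0, 0), (1, 1)]
print(is_pattern_free(checkerboard, [[1, 1]]))      # True
```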
To analyze the testers we establish several combinatorial results, including
the following multi-dimensional modification lemma, which might be of independent
interest: for any large enough pattern over any alphabet (excluding a small
set of exceptional patterns for the binary case), and any array containing
a copy of the pattern, one can delete this copy by modifying one of its locations
without creating new copies of the pattern in the array.
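On toy two-dimensional inputs, one can search by brute force for the kind of single-entry modification the lemma guarantees: change one location of a given copy so that the copy disappears and no copy appears that was not already present. The helper names and the tiny example are assumptions for illustration; a successful search on one instance says nothing about the lemma's general claim or its "large enough pattern" hypothesis.

```python
from typing import List, Optional, Sequence, Set, Tuple

Grid = List[List[int]]


def copies(array: Grid, pattern: Grid) -> Set[Tuple[int, int]]:
    """Set of top-left corners where pattern occurs as a consecutive subarray."""
    pr, pc = len(pattern), len(pattern[0])
    return {
        (r, c)
        for r in range(len(array) - pr + 1)
        for c in range(len(array[0]) - pc + 1)
        if all(array[r + i][c + j] == pattern[i][j]
               for i in range(pr) for j in range(pc))
    }


def delete_copy(array: Grid, pattern: Grid, alphabet: Sequence[int],
                corner: Tuple[int, int]) -> Optional[Tuple[int, int, int]]:
    """Search for one entry (row, col, new_symbol) inside the copy at `corner`
    whose change removes that copy without creating any new copy.
    Returns None if no single-entry change works."""
    before = copies(array, pattern)
    r0, c0 = corner
    for i in range(len(pattern)):
        for j in range(len(pattern[0])):
            r, c = r0 + i, c0 + j
            for sym in alphabet:
                if sym == array[r][c]:
                    continue
                changed = [row[:] for row in array]
                changed[r][c] = sym
                after = copies(changed, pattern)
                # The copy must vanish and every remaining copy must be old.
                if corner not in after and after <= before:
                    return (r, c, sym)
    return None


grid = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
print(delete_copy(grid, [[1, 0], [0, 0]], [0, 1], (0, 0)))  # (0, 0, 0)
```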
Our results address an open question of Fischer and Newman, who asked whether
there exist efficient testers for properties related to tight substructures in
multi-dimensional structured data. They serve as a first step towards a general
understanding of local properties of multi-dimensional arrays, as any such
property can be characterized by a fixed family of forbidden patterns.
Data Structure Lower Bounds for Document Indexing Problems
We study data structure problems related to document indexing and pattern
matching queries. Our main contribution is to show that the pointer machine
model of computation can be extremely useful in proving high and unconditional
lower bounds that cannot be obtained in any other known model of computation
with the current techniques. Often our lower bounds match the known space-query
time trade-off curve, and in fact, for all the problems considered, there is a
very good match between our lower bounds and the known upper
bounds, at least for some choice of input parameters. The problems that we
consider are set intersection queries (both the reporting variant and the
semi-group counting variant), indexing a set of documents for two-pattern
queries, forbidden-pattern queries, or queries with wildcards, and
indexing an input set of gapped patterns (or two-patterns) to find those
matching a document given at query time.
Comment: Full version of the conference version that appeared at ICALP 2016,
25 pages
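To pin down the query semantics, here are naive scan-based answers (hypothetical function names) for three of the query types listed: set intersection reporting, two-pattern queries (documents containing both patterns), and forbidden-pattern queries (documents containing the first pattern but not the second). These use no index at all; the lower bounds above concern data structures that use preprocessed space to answer such queries faster than a full scan.

```python
from typing import List, Set


def intersection_report(sets: List[Set[int]], i: int, j: int) -> List[int]:
    """Set intersection reporting: list the elements of sets[i] & sets[j]."""
    return sorted(sets[i] & sets[j])


def two_pattern_query(docs: List[str], p1: str, p2: str) -> List[int]:
    """Indices of documents containing both p1 and p2 as substrings."""
    return [k for k, d in enumerate(docs) if p1 in d and p2 in d]


def forbidden_pattern_query(docs: List[str], p_plus: str, p_minus: str) -> List[int]:
    """Indices of documents containing p_plus but not p_minus."""
    return [k for k, d in enumerate(docs) if p_plus in d and p_minus not in d]


docs = ["abracadabra", "banana", "cabana", "bandana"]
print(two_pattern_query(docs, "ab", "ana"))               # [2]
print(forbidden_pattern_query(docs, "an", "ab"))          # [1, 3]
print(intersection_report([{1, 2, 3}, {2, 3, 4}], 0, 1))  # [2, 3]
```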
Quantum pattern matching fast on average
The -dimensional pattern matching problem is to find an occurrence of a
pattern of length within a text of length , with . This task models various problems in text and
image processing, among other application areas. This work describes a quantum
algorithm which solves the pattern matching problem for random patterns and
texts in time . For
large this is super-polynomially faster than the best possible classical
algorithm, which requires time . The
algorithm is based on the use of a quantum subroutine for finding hidden shifts
in dimensions, which is a variant of algorithms proposed by Kuperberg.
Comment: 22 pages, 2 figures; v3: further minor changes, essentially published
version
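The abstract says the quantum algorithm relies on a subroutine for finding hidden shifts in d dimensions. As a classical point of reference only, the sketch below states the hidden shift problem over the cyclic group Z_n^d (our assumption for illustration) and solves it by exhaustive search: given oracles f and g with g(x) = f(x + s), it tries every candidate shift s. This costs up to n^(2d) oracle calls and shares nothing with Kuperberg-style quantum algorithms beyond the problem statement; the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Point = Tuple[int, ...]


def hidden_shift_bruteforce(f: Callable[[Point], int], g: Callable[[Point], int],
                            n: int, d: int) -> Optional[Point]:
    """Return the first s in Z_n^d (lexicographic order) with
    g(x) == f(x + s mod n) for every x, or None if no shift works."""
    points = list(product(range(n), repeat=d))
    for s in points:
        if all(g(x) == f(tuple((xi + si) % n for xi, si in zip(x, s)))
               for x in points):
            return s
    return None


N, D = 5, 2
SECRET = (2, 3)


def f(x: Point) -> int:
    # Injective on Z_5^2, so exactly one shift relates f and g.
    return x[0] * N + x[1]


def g(x: Point) -> int:
    return f(((x[0] + SECRET[0]) % N, (x[1] + SECRET[1]) % N))


print(hidden_shift_bruteforce(f, g, N, D))  # (2, 3)
```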
String Matching: Communication, Circuits, and Learning
String matching is the problem of deciding whether a given n-bit string contains a given k-bit pattern. We study the complexity of this problem in three settings.
- Communication complexity. For small k, we provide near-optimal upper and lower bounds on the communication complexity of string matching. For large k, our bounds leave open an exponential gap; we exhibit some evidence for the existence of a better protocol.
- Circuit complexity. We present several upper and lower bounds on the size of circuits with threshold and DeMorgan gates solving the string matching problem. Similarly to the above, our bounds are near-optimal for small k.
- Learning. We consider the problem of learning a hidden pattern of length at most k relative to the classifier that assigns 1 to every string that contains the pattern. We prove optimal bounds on the VC dimension and sample complexity of this problem.
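The learning setting can be made concrete with a toy consistent learner. Each candidate pattern p defines the classifier that outputs 1 exactly on strings containing p, and the sketch returns the shortest candidate (over an assumed small alphabet) that agrees with every labeled sample. The function names are hypothetical, and the enumeration is exponential in k; it illustrates the concept class, not the VC-dimension or sample-complexity bounds proved in the paper.

```python
from itertools import product
from typing import List, Optional, Tuple


def classify(pattern: str, x: str) -> int:
    """The classifier defined by `pattern`: 1 iff x contains pattern."""
    return int(pattern in x)


def consistent_pattern(samples: List[Tuple[str, int]], alphabet: str,
                       k: int) -> Optional[str]:
    """Shortest pattern of length 1..k whose classifier agrees with all samples."""
    for length in range(1, k + 1):
        for letters in product(alphabet, repeat=length):
            p = "".join(letters)
            if all(classify(p, x) == y for x, y in samples):
                return p
    return None


samples = [("0110", 1), ("0101", 0), ("1100", 1), ("0000", 0)]
print(consistent_pattern(samples, "01", 2))  # 11
```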
Fast Searching in Packed Strings
Given a pattern string and a text string, the (exact) string matching problem is to find all
positions of substrings in the text matching the pattern. The classical Knuth-Morris-Pratt
algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear
time, which is optimal if we can only read one character at a time. However,
most strings are stored in a computer in a packed representation with several
characters in a single word, giving us the opportunity to read multiple
characters simultaneously. In this paper we study the worst-case complexity of
string matching on strings given in packed representation. Let $m$ and $n$ be
the lengths of the pattern and the text, respectively, and let $\sigma$ denote the size of the
alphabet. On a standard unit-cost word-RAM with logarithmic word size we
present an algorithm using time $O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right)$.
Here $\mathrm{occ}$ is the number of occurrences of the pattern in the text. For $m = o(n)$ this improves the $O(n + m)$ bound of the Knuth-Morris-Pratt algorithm.
Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal since any
algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to
read the input and report all occurrences. The result is obtained by a novel
automaton construction based on the Knuth-Morris-Pratt algorithm combined with
a new compact representation of subautomata allowing an optimal
tabulation-based simulation.
Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM
200
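The idea of reading several characters per word can be illustrated by packing each character into b = max(1, ceil(log2 σ)) bits and comparing a whole packed pattern against a packed text window with one shift, one mask, and one comparison. This is only a sketch of packed representation under stated assumptions (hypothetical helper names, a packed pattern that fits in one 64-bit word); it still examines all n - m + 1 positions and is not the KMP-based automaton and tabulation technique described above. Python's unbounded integers stand in for machine words, so the sketch does not reflect word-RAM costs.

```python
from typing import List, Tuple


def pack(s: str, alphabet: str) -> Tuple[int, int]:
    """Pack s into one integer with b bits per character; s[0] is lowest."""
    b = max(1, (len(alphabet) - 1).bit_length())  # b = ceil(log2 sigma), at least 1
    code = {ch: i for i, ch in enumerate(alphabet)}
    value = 0
    for i, ch in enumerate(s):
        value |= code[ch] << (i * b)
    return value, b


def packed_find_all(text: str, pattern: str, alphabet: str,
                    word_bits: int = 64) -> List[int]:
    """All i with text[i:i+m] == pattern, comparing m packed characters at once."""
    t, b = pack(text, alphabet)
    p, _ = pack(pattern, alphabet)
    m = len(pattern)
    if m * b > word_bits:
        raise ValueError("sketch assumes the packed pattern fits in one word")
    mask = (1 << (m * b)) - 1
    # Shift window i down to the low bits, keep m*b bits, compare once.
    return [i for i in range(len(text) - m + 1)
            if ((t >> (i * b)) & mask) == p]


print(packed_find_all("abracadabra", "abra", "abcdr"))  # [0, 7]
```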
- …