
Data Structure Lower Bounds for Document Indexing Problems

Author: Afshani Peyman
Nielsen Jesper Sindahl
Publication date: 01/01/2016

We study data structure problems related to document indexing and pattern matching queries, and our main contribution is to show that the pointer machine model of computation can be extremely useful in proving high and unconditional lower bounds that cannot be obtained in any other known model of computation with the current techniques. Often our lower bounds match the known space-query time trade-off curve; in fact, for all the problems considered, there is a very good and reasonable match between our lower bounds and the known upper bounds, at least for some choice of input parameters. The problems that we consider are set intersection queries (both the reporting variant and the semi-group counting variant), indexing a set of documents for two-pattern queries, forbidden-pattern queries, or queries with wildcards, and indexing an input set of gapped patterns (or two-patterns) to find those matching a document given at query time. Comment: Full version of the conference version that appeared at ICALP 2016, 25 pages
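To make the query types concrete, here is a minimal Python sketch of naive baselines, assuming documents are plain Python strings, sets are Python `set`s, and the semi-group for the counting variant is integers under addition. It is not a data structure from the paper (which proves lower bounds); it only illustrates the query semantics at the space-minimal end of the trade-off, where every query scans the input. All function names are hypothetical.

```python
from typing import Dict, List, Set


def set_intersection_report(sets: List[Set[int]], i: int, j: int) -> Set[int]:
    """Reporting variant: every element shared by sets i and j."""
    return sets[i] & sets[j]


def set_intersection_count(sets: List[Set[int]], i: int, j: int,
                           weight: Dict[int, int]) -> int:
    """Semi-group counting variant, assuming integer weights combined by +."""
    return sum(weight[e] for e in sets[i] & sets[j])


def two_pattern_query(docs: List[str], p1: str, p2: str) -> List[int]:
    """Indices of documents that contain both p1 and p2."""
    return [k for k, d in enumerate(docs) if p1 in d and p2 in d]


def forbidden_pattern_query(docs: List[str], p_plus: str, p_minus: str) -> List[int]:
    """Indices of documents that contain p_plus but not p_minus."""
    return [k for k, d in enumerate(docs) if p_plus in d and p_minus not in d]
```

For example, with `docs = ["abracadabra", "cadabra", "abrupt"]`, the two-pattern query for `"abr"` and `"cad"` reports documents 0 and 1, while the forbidden-pattern query with `"cad"` forbidden reports only document 2.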

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Deleting and Testing Forbidden Patterns in Multi-Dimensional Arrays

Author: Ben-Eliezer Omri
Korman Simon
Reichman Daniel
Publication date: 01/01/2017

Understanding the local behaviour of structured multi-dimensional data is a fundamental problem in various areas of computer science. As the amount of data is often huge, it is desirable to obtain sublinear time algorithms, and specifically property testers, to understand local properties of the data. We focus on the natural local problem of testing pattern freeness: given a large $d$-dimensional array $A$ and a fixed $d$-dimensional pattern $P$ over a finite alphabet, we say that $A$ is $P$-free if it does not contain a copy of the forbidden pattern $P$ as a consecutive subarray. The distance of $A$ to $P$-freeness is the fraction of entries of $A$ that need to be modified to make it $P$-free. For any $\epsilon \in [0,1]$ and any large enough pattern $P$ over any alphabet, other than a very small set of exceptional patterns, we design a tolerant tester that distinguishes between the case that the distance is at least $\epsilon$ and the case that it is at most $a_d \epsilon$, with query complexity and running time $c_d \epsilon^{-1}$, where $a_d < 1$ and $c_d$ depend only on $d$. To analyze the testers we establish several combinatorial results, including the following $d$-dimensional modification lemma, which might be of independent interest: for any large enough pattern $P$ over any alphabet (excluding a small set of exceptional patterns for the binary case), and any array $A$ containing a copy of $P$, one can delete this copy by modifying one of its locations without creating new $P$-copies in $A$. Our results address an open question of Fischer and Newman, who asked whether there exist efficient testers for properties related to tight substructures in multi-dimensional structured data. They serve as a first step towards a general understanding of local properties of multi-dimensional arrays, as any such property can be characterized by a fixed family of forbidden patterns.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Duel and sweep algorithm for order-preserving pattern matching

Author: A Amir
D Gusfield
DE Knuth
J Kim
M Crochemore
M Kubica
MM Hasan
R Cole
RN Horspool
RS Boyer
S Cho
S Faro
T Chhabra
U Vishkin
Publication date: 26/05/2017

Given a text $T$ and a pattern $P$ over alphabet $\Sigma$, the classic exact matching problem searches for all occurrences of pattern $P$ in text $T$. Unlike the exact matching problem, order-preserving pattern matching (OPPM) considers the relative order of elements rather than their actual values. In this paper, we propose an efficient algorithm for the OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in $O(n + m\log m)$ time in general and $O(n + m)$ time under the assumption that the characters in a string can be sorted in linear time with respect to the string size. We also perform experiments and show that our algorithm is faster than the KMP-based algorithm. Finally, we introduce two-dimensional order-preserving pattern matching and give a duel-and-sweep algorithm that runs in $O(n^2)$ time for the duel stage and $O(n^2 m)$ time for the sweep stage, with $O(m^3)$ preprocessing time. Comment: 13 pages, 5 figures
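To pin down what "considers the relative order of elements" means, here is a naive Python sketch of OPPM, assuming the text and pattern are sequences of comparable numbers. It sorts the pattern positions by value once ($O(m \log m)$) and then checks each window by comparing only consecutive positions in that order, which suffices because order relations are transitive. This gives $O(nm)$ overall; it is a simple baseline, not the duel-and-sweep algorithm, and the function name is hypothetical.

```python
from typing import List, Sequence


def order_preserving_matches(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Start indices s where text[s:s+m] has the same relative order as pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Pattern positions sorted by value; ties keep their original order.
    order = sorted(range(m), key=lambda i: pattern[i])
    matches = []
    for s in range(n - m + 1):
        ok = True
        for a, b in zip(order, order[1:]):
            x, y = text[s + a], text[s + b]
            if pattern[a] == pattern[b]:
                ok = x == y  # equal pattern values need equal text values
            else:
                ok = x < y  # here pattern[a] < pattern[b] by construction
            if not ok:
                break
        if ok:
            matches.append(s)
    return matches
```

For the pattern `[1, 3, 2]` (smallest, largest, middle) in the text `[5, 10, 7, 8, 1, 4, 3]`, the windows `[5, 10, 7]` and `[1, 4, 3]` match, at offsets 0 and 4.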

arXiv.org e-Print Archive

Crossref

String Matching: Communication, Circuits, and Learning

Author: Golovnev Alexander
Reichman Daniel
Shinkar Igor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

String matching is the problem of deciding whether a given n-bit string contains a given k-bit pattern. We study the complexity of this problem in three settings.
- Communication complexity. For small k, we provide near-optimal upper and lower bounds on the communication complexity of string matching. For large k, our bounds leave open an exponential gap; we exhibit some evidence for the existence of a better protocol.
- Circuit complexity. We present several upper and lower bounds on the size of circuits with threshold and DeMorgan gates solving the string matching problem. Similarly to the above, our bounds are near-optimal for small k.
- Learning. We consider the problem of learning a hidden pattern of length at most k relative to the classifier that assigns 1 to every string that contains the pattern. We prove optimal bounds on the VC dimension and sample complexity of this problem.
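The learning setting can be made concrete with a brute-force consistent learner: given labelled binary strings, search all binary patterns of length 1 to k for one whose classifier (here called `h_p`, with $h_p(x) = 1$ iff $p$ occurs in $x$) agrees with every label. This is a hypothetical Python sketch for illustration only, not a method from the paper, and it runs in time exponential in k.

```python
from itertools import product
from typing import List, Optional, Tuple


def h_p(x: str, p: str) -> int:
    """The classifier for pattern p: 1 if p occurs in string x, else 0."""
    return int(p in x)


def consistent_pattern(samples: List[Tuple[str, int]], k: int) -> Optional[str]:
    """Shortest (then lexicographically first) binary pattern of length <= k
    consistent with all labelled samples, or None if there is none."""
    for length in range(1, k + 1):
        for bits in product("01", repeat=length):
            p = "".join(bits)
            if all(h_p(x, p) == label for x, label in samples):
                return p
    return None
```

With the samples `("0110", 1)`, `("0000", 0)`, `("1111", 0)`, no single bit is consistent, and the first consistent pattern of length 2 is `"01"`.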

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Algorithms for Computing Abelian Periods of Words

Author: Fici Gabriele
Lecroq Thierry
Lefebvre Arnaud
Prieur-Gaston Elise
Publication venue: 'Elsevier BV'
Publication date: 10/06/2013

Constantinescu and Ilie (Bulletin EATCS 89, 167–170, 2006) introduced the notion of an Abelian period of a word. A word of length $n$ over an alphabet of size $\sigma$ can have $\Theta(n^{2})$ distinct Abelian periods. The Brute-Force algorithm computes all the Abelian periods of a word in time $O(n^2 \times \sigma)$ using $O(n \times \sigma)$ space. We present an off-line algorithm based on a \sel function having the same worst-case theoretical complexity as the Brute-Force one, but outperforming it in practice. We then present on-line algorithms that also make it possible to compute all the Abelian periods of all the prefixes of $w$. Comment: Accepted for publication in Discrete Applied Mathematics
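The Brute-Force bounds of $O(n^2 \times \sigma)$ time and $O(n \times \sigma)$ space can be reproduced with prefix Parikh vectors (per-letter count vectors). The Python sketch below assumes the following reading of the Constantinescu and Ilie definition, which should be checked against the paper: $(h, p)$ with $0 \le h < p$ is an Abelian period of $w$ if $w = u_0 u_1 \cdots u_{k-1} u_k$ where $|u_0| = h$, the full blocks $u_1, \dots, u_{k-1}$ (at least one) all have length $p$ and the same Parikh vector, and the head $u_0$ and the tail $u_k$ (shorter than $p$) have Parikh vectors contained component-wise in it. For each $p$ there are $p$ heads, each with about $n/p$ blocks compared in $O(\sigma)$ time, so each $p$ costs $O(n\sigma)$ and the total is $O(n^2\sigma)$. The function name is hypothetical.

```python
from typing import List, Tuple


def abelian_periods(w: str) -> List[Tuple[int, int]]:
    """All Abelian periods (h, p) of w under the definition assumed above."""
    n = len(w)
    index = {c: i for i, c in enumerate(sorted(set(w)))}
    sigma = len(index)
    # pref[i][c] = occurrences of letter c in w[:i]; O(n * sigma) space.
    pref = [[0] * sigma]
    for ch in w:
        row = pref[-1][:]
        row[index[ch]] += 1
        pref.append(row)

    def parikh(i: int, j: int) -> List[int]:
        # Parikh vector of w[i:j] in O(sigma) time.
        return [b - a for a, b in zip(pref[i], pref[j])]

    def contained(u: List[int], v: List[int]) -> bool:
        return all(x <= y for x, y in zip(u, v))

    result = []
    for p in range(1, n + 1):
        for h in range(p):
            if h + p > n:  # at least one full block is required
                break
            block = parikh(h, h + p)
            if not contained(parikh(0, h), block):  # head check
                continue
            start, ok = h + p, True
            while start + p <= n:  # remaining full blocks
                if parikh(start, start + p) != block:
                    ok = False
                    break
                start += p
            if ok and contained(parikh(start, n), block):  # tail check
                result.append((h, p))
    return result
```

Under this definition, `abelian_periods("abab")` returns `[(0, 2), (1, 2), (0, 3), (1, 3), (0, 4)]`.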

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Palermo

Conditional Lower Bounds for Space/Time Tradeoffs

Author: A Abboud
A Amir
A Gajentaan
H Cohen
KG Larsen
M Patrascu
R Agarwal
T Kopelowitz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/07/2017

In recent years much effort has been directed towards achieving polynomial time lower bounds on algorithms for solving various well-known problems. A useful technique for showing such lower bounds is to prove them conditionally based on well-studied hardness assumptions such as 3SUM, APSP, SETH, etc. This line of research helps to obtain a better understanding of the complexity inside P. A related question asks to prove conditional space lower bounds on data structures that are constructed to solve certain algorithmic tasks after an initial preprocessing stage. This question has received little attention in previous research even though it has potentially strong impact. In this paper we address this question and show that surprisingly many of the well-studied hard problems that are known to have conditional polynomial time lower bounds are also hard with respect to space. This hardness is shown as a tradeoff between the space consumed by the data structure and the time needed to answer queries. The tradeoff may be either smooth or admit one or more singularity points. We reveal interesting connections between different space hardness conjectures and present matching upper bounds. We also apply these hardness conjectures to both static and dynamic problems and prove their conditional space hardness. We believe that this novel framework of polynomial space conjectures can play an important role in expressing polynomial space lower bounds of many important algorithmic problems. Moreover, it seems that it can also help in achieving a better understanding of the hardness of their corresponding problems in terms of time.
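The abstract frames hardness as a tradeoff between the space of the data structure and the time per query. As a hypothetical illustration (the problem choice is ours, not the abstract's), consider a 3SUM-flavoured indexing query: preprocess a set of integers so that a query $z$ asks whether two distinct input elements sum to $z$. The Python sketch below shows only the two trivial endpoints of the tradeoff curve; the conjectures discussed in the paper concern what is achievable in between, which this sketch does not address. Class names are hypothetical.

```python
from typing import Iterable, List


class PairSumTable:
    """Space-heavy endpoint: store all pairwise sums, O(n^2) space, O(1) expected query."""

    def __init__(self, xs: Iterable[int]) -> None:
        vals = list(xs)
        self.sums = {vals[i] + vals[j]
                     for i in range(len(vals)) for j in range(i + 1, len(vals))}

    def query(self, z: int) -> bool:
        return z in self.sums


class SortedTwoPointer:
    """Space-light endpoint: keep the sorted input, O(n) space, O(n) query time."""

    def __init__(self, xs: Iterable[int]) -> None:
        self.vals: List[int] = sorted(xs)

    def query(self, z: int) -> bool:
        i, j = 0, len(self.vals) - 1
        while i < j:  # classic two-pointer scan over distinct indices
            s = self.vals[i] + self.vals[j]
            if s == z:
                return True
            if s < z:
                i += 1
            else:
                j -= 1
        return False
```

Both structures answer the same queries; for the input `[1, 4, 6, 9]`, the query 15 succeeds (6 + 9) while 2 fails, since an element may not be paired with itself.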

arXiv.org e-Print Archive

Crossref

Quantum pattern matching fast on average

Author: Montanaro Ashley
Publication date: 26/08/2015

The $d$-dimensional pattern matching problem is to find an occurrence of a pattern of length $m \times \dots \times m$ within a text of length $n \times \dots \times n$, with $n \ge m$. This task models various problems in text and image processing, among other application areas. This work describes a quantum algorithm which solves the pattern matching problem for random patterns and texts in time $\widetilde{O}((n/m)^{d/2} 2^{O(d^{3/2}\sqrt{\log m})})$. For large $m$ this is super-polynomially faster than the best possible classical algorithm, which requires time $\widetilde{\Omega}((n/m)^d + n^{d/2})$. The algorithm is based on the use of a quantum subroutine for finding hidden shifts in $d$ dimensions, which is a variant of algorithms proposed by Kuperberg. Comment: 22 pages, 2 figures; v3: further minor changes, essentially published version
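For reference, here is a naive classical Python sketch of the $d$-dimensional problem defined above, assuming the text and pattern are given as $d$-fold nested lists. It compares the pattern against every offset in $O((n-m+1)^d m^d)$ time. It is neither the quantum algorithm nor the best classical one; it only pins down what "an occurrence" means. The function names are hypothetical.

```python
from itertools import product
from typing import Any, Optional, Tuple


def get(arr: Any, coord: Tuple[int, ...]) -> Any:
    """Read a d-dimensional nested list at the given coordinate tuple."""
    for c in coord:
        arr = arr[c]
    return arr


def find_occurrence(text: Any, pattern: Any, n: int, m: int,
                    d: int) -> Optional[Tuple[int, ...]]:
    """First offset (in lexicographic order) where pattern occurs in text, or None."""
    for offset in product(range(n - m + 1), repeat=d):
        # Check every pattern cell against the shifted text cell.
        if all(get(text, tuple(o + r for o, r in zip(offset, rel))) == get(pattern, rel)
               for rel in product(range(m), repeat=d)):
            return offset
    return None
```

With $d = 2$, the text `[[0, 1, 0], [1, 1, 0], [0, 0, 1]]` and the pattern `[[1, 0], [1, 0]]`, the first occurrence is at offset `(0, 1)`.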

arXiv.org e-Print Archive

CiteSeerX