Dictionary Matching with One Gap

Author: A. Amir
A.V. Aho
E. Ukkonen
E.M. McCreight
G. Kucherov
G. Myers
G. Navarro
G.S. Brodal
J.C. Naa
K. Fredriksson
M. Morgante
M. Zhang
M.S. Rahman
P. Bille
T. Haapasalo
Publication date: 01/01/2014

The dictionary matching with gaps problem is to preprocess a dictionary $D$ of $d$ gapped patterns $P_1,\ldots,P_d$ over alphabet $\Sigma$, where each gapped pattern $P_i$ is a sequence of subpatterns separated by bounded sequences of don't cares. Then, given a query text $T$ of length $n$ over alphabet $\Sigma$, the goal is to output all locations in $T$ in which a pattern $P_i\in D$, $1\leq i\leq d$, ends. There is renewed interest in the gapped matching problem stemming from cyber security. In this paper we solve the problem where all patterns in the dictionary have one gap with at least $\alpha$ and at most $\beta$ don't cares, where $\alpha$ and $\beta$ are given parameters. Specifically, we show that the dictionary matching with a single gap problem can be solved either with $O(d\log d + |D|)$ preprocessing time, $O(d\log^{\varepsilon} d + |D|)$ space and $O(n(\beta-\alpha)\log\log d\,\log^2\min\{d,\log|D|\} + occ)$ query time, or with $O(d^2 + |D|)$ preprocessing time and space and $O(n(\beta-\alpha) + occ)$ query time, where $occ$ is the number of patterns found. As far as we know, this is the best solution for this setting of the problem, where many overlaps may exist in the dictionary. Comment: A preliminary version was published at CPM 201

arXiv.org e-Print Archive

Crossref
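To make the problem statement of Dictionary Matching with One Gap concrete, here is a brute-force Python sketch. It is not the algorithm from the paper: it simply tries every text position, pattern and gap length, so it takes roughly $O(n \cdot d \cdot (\beta-\alpha+1) \cdot \ell)$ time for patterns of length at most $\ell$. Representing a one-gap pattern as a pair `(p1, p2)` of subpatterns, and the name `match_one_gap`, are our own illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical representation: a one-gap pattern p1 ?{alpha..beta} p2 is the pair (p1, p2).
Pattern = Tuple[str, str]


def match_one_gap(text: str, patterns: List[Pattern],
                  alpha: int, beta: int) -> List[Tuple[int, int]]:
    """Brute-force one-gap dictionary matching (illustrative, not the paper's method).

    Returns (end, i) for every 0-based text position `end` at which pattern i
    ends: p1, then between alpha and beta arbitrary characters, then p2,
    with p2 finishing at `end`.
    """
    hits: List[Tuple[int, int]] = []
    for end in range(len(text)):
        for i, (p1, p2) in enumerate(patterns):
            start2 = end + 1 - len(p2)  # where p2 must begin to finish at `end`
            if start2 < 0 or text[start2:end + 1] != p2:
                continue
            for gap in range(alpha, beta + 1):
                start1 = start2 - gap - len(p1)  # where p1 must begin for this gap
                if start1 >= 0 and text[start1:start1 + len(p1)] == p1:
                    hits.append((end, i))
                    break  # report each (end, pattern) pair once
    return hits


print(match_one_gap("abxxcd", [("ab", "cd"), ("x", "d")], 0, 3))  # prints [(5, 0), (5, 1)]
```

Both patterns end at position 5: `ab` and `cd` are separated by two don't cares, and `x` and `d` by one, both within the range $[0, 3]$.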

Pattern Matching in Multiple Streams

Author: A. Amir
D. Breslauer
F. Ergun
G.M. Landau
H. Karloff
K. Abrahamson
M. Ružić
R. Clifford
T.S. Jayram
Z. Galil
Publication date: 01/01/2012

We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report whether there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally we set out a number of open problems related to this new model for pattern matching. Comment: 13 pages, 1 figure

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository
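A sketch of the exact-matching case in the multiple-streams model, assuming a standard KMP automaton (our choice, not necessarily the paper's technique): the failure table for the pattern is shared, O(m) words, and each stream keeps only its current match length, one word per stream, which mirrors the O(m+s) space bound quoted above. Unlike the paper's result, this gives amortised rather than worst-case constant time per arriving symbol. The class name `MultiStreamMatcher` is hypothetical.

```python
from typing import Dict


class MultiStreamMatcher:
    """Exact matching of one non-empty pattern against s interleaved streams.

    One KMP failure table (O(m) words) is shared by all streams; each stream
    stores a single integer, the length of its longest pattern prefix that is
    a suffix of the stream so far. Time per symbol is amortised O(1).
    """

    def __init__(self, pattern: str) -> None:
        self.p = pattern
        m = len(pattern)
        self.fail = [0] * m  # fail[i]: longest proper border of pattern[:i+1]
        k = 0
        for i in range(1, m):
            while k > 0 and pattern[i] != pattern[k]:
                k = self.fail[k - 1]
            if pattern[i] == pattern[k]:
                k += 1
            self.fail[i] = k
        self.state: Dict[int, int] = {}  # stream id -> current match length

    def push(self, stream: int, c: str) -> bool:
        """Feed symbol c to the given stream; return True if a match ends here."""
        k = self.state.get(stream, 0)
        # Fall back after a full match or on a mismatch.
        while k > 0 and (k == len(self.p) or self.p[k] != c):
            k = self.fail[k - 1]
        if self.p[k] == c:
            k += 1
        self.state[stream] = k
        return k == len(self.p)


m = MultiStreamMatcher("aba")
events = [(0, "a"), (1, "a"), (0, "b"), (1, "b"), (0, "a"), (1, "c"), (0, "b"), (0, "a")]
print([m.push(s, c) for s, c in events])
# prints [False, False, False, False, True, False, False, True]
```

Stream 0 receives `ababa` and reports both overlapping occurrences of `aba`; stream 1 receives `abc` and reports none, even though its symbols are interleaved with those of stream 0.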

The complexity of the Multiple Pattern Matching Problem for random strings

Author: Bassino Frédérique
Rakotoarimalala Tsinjo
Sportiello Andrea
Publication date: 01/01/2017

We generalise a multiple string pattern matching algorithm, recently proposed by Fredriksson and Grabowski [J. Discr. Alg. 7, 2009], to deal with arbitrary dictionaries on an alphabet of size $s$. If $r_m$ is the number of words of length $m$ in the dictionary, and $\phi(r) = \max_m \ln(s\, m\, r_m)/m$, the complexity rate for the string characters to be read by this algorithm is at most $\kappa_{\mathrm{UB}}\,\phi(r)$ for some constant $\kappa_{\mathrm{UB}}$. On the other hand, we generalise the classical lower bound of Yao [SIAM J. Comput. 8, 1979], for the problem with a single pattern, to deal with arbitrary dictionaries, and determine it to be at least $\kappa_{\mathrm{LB}}\,\phi(r)$. This proves the optimality of the algorithm, improving and correcting previous claims. Comment: 25 pages, 4 figures

arXiv.org e-Print Archive

HAL-Paris 13
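The quantity $\phi(r)$ is easy to evaluate for a concrete dictionary. The sketch below assumes the dictionary is summarised by a mapping from word length $m$ to the count $r_m$, and takes the maximum only over lengths with $r_m > 0$, since the logarithm is undefined otherwise. The constants $\kappa_{\mathrm{UB}}$ and $\kappa_{\mathrm{LB}}$ are not given in the abstract, so only $\phi(r)$ itself is computed; the function name `phi` is our own.

```python
import math
from typing import Dict


def phi(s: int, r: Dict[int, int]) -> float:
    """phi(r) = max over word lengths m of ln(s * m * r_m) / m.

    s: alphabet size; r: maps word length m to r_m, the number of dictionary
    words of that length. Lengths with r_m == 0 are skipped (ln 0 is undefined).
    """
    return max(math.log(s * m * r_m) / m for m, r_m in r.items() if r_m > 0)


# Alphabet of size 4 with five words of length 10: ln(200) / 10, about 0.53.
print(phi(4, {10: 5}))
```

Because of the division by $m$, short words tend to dominate: with $s = 4$, a single word of length 2 gives $\ln 8 / 2 \approx 1.04$, which exceeds the contribution $\ln 3200 / 8 \approx 1.01$ of a hundred words of length 8.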

The k-mismatch problem revisited

Author: Clifford Raphaël
Fontaine Allyx
Porat Ely
Sach Benjamin
Starikovskaya Tatiana
Publication date: 27/08/2015

We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No". We study this problem both in the standard offline setting and as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows:
1) Our first result is a deterministic $O(nk^2\log k/m + n\operatorname{polylog} m)$ time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 by Amihood Amir et al.
2) We then give a randomised and online algorithm which runs in the same time complexity but requires only $O(k^2\operatorname{polylog} m)$ space in total.
3) Next we give a randomised $(1+\epsilon)$-approximation algorithm for the streaming k-mismatch problem which uses $O(k^2\operatorname{polylog} m/\epsilon^2)$ space and runs in $O(\operatorname{polylog} m/\epsilon^2)$ worst-case time per arriving symbol.
4) Finally we combine our new results to derive a randomised $O(k^2\operatorname{polylog} m)$ space algorithm for the streaming k-mismatch problem which runs in $O(\sqrt{k}\log k + \operatorname{polylog} m)$ worst-case time per arriving symbol. This improves the best previous space complexity for streaming k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).

arXiv.org e-Print Archive

Crossref

Explore Bristol Research
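For reference, here is a naive offline baseline for the k-mismatch output format described above: the Hamming distance at each alignment when it is at most k, otherwise "No". It takes O(nm) time in the worst case and only pins down the problem definition; it does not reflect any of the paper's algorithms. The function name `k_mismatch` is our own.

```python
from typing import List, Union


def k_mismatch(text: str, pattern: str, k: int) -> List[Union[int, str]]:
    """For each alignment i, return the Hamming distance between pattern and
    text[i:i+m] if it is at most k, else "No".

    Naive O(nm) baseline; stops scanning an alignment once k+1 mismatches are seen.
    """
    m = len(pattern)
    out: List[Union[int, str]] = []
    for i in range(len(text) - m + 1):
        d = 0
        for a, b in zip(pattern, text[i:i + m]):
            if a != b:
                d += 1
                if d > k:
                    break  # distance already exceeds k; no need to finish
        out.append(d if d <= k else "No")
    return out


print(k_mismatch("abcabd", "abd", 1))  # prints [1, 'No', 'No', 0]
```

The early exit changes only the constant factor: an alignment with few mismatches still requires a scan of all m characters.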

Data Structure Lower Bounds for Document Indexing Problems

Author: Afshani Peyman
Nielsen Jesper Sindahl
Publication date: 01/01/2016

We study data structure problems related to document indexing and pattern matching queries and our main contribution is to show that the pointer machine model of computation can be extremely useful in proving high and unconditional lower bounds that cannot be obtained in any other known model of computation with the current techniques. Often our lower bounds match the known space-query time trade-off curve and in fact for all the problems considered, there is a very good and reasonable match between our lower bounds and the known upper bounds, at least for some choice of input parameters. The problems that we consider are set intersection queries (both the reporting variant and the semi-group counting variant), indexing a set of documents for two-pattern queries, or forbidden-pattern queries, or queries with wild-cards, and indexing an input set of gapped-patterns (or two-patterns) to find those matching a document given at the query time. Comment: Full version of the conference version that appeared at ICALP 2016, 25 pages

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap

Author: Amir Amihood
Kopelowitz Tsvi
Levy Avivit
Pettie Seth
Porat Ely
Shalom B. Riva
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th International Symposium on Algorithms and Computation (ISAAC 2016)
Publication date: 01/01/2016

We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Online DMOG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap. In this paper, we demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of $\Omega(\delta(G_D)+op)$ time per text character, where $G_D$ is a bipartite graph that captures the structure of D, $\delta(G_D)$ is the degeneracy of this graph, and $op$ is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal. We also provide matching upper-bounds (up to sub-polynomial factors), in terms of the degeneracy, for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on $\delta(G_D)$. Our algorithms make use of graph orientations, together with some additional techniques. These algorithms are of practical interest since although $\delta(G_D)$ can be as large as $\sqrt{d}$, and even larger if $G_D$ is a multi-graph, it is typically a very small constant in practice. Finally, when $\delta(G_D)$ is large we are able to obtain even more efficient solutions.

Dagstuhl Research Online Publication Server
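The bounds in the abstract are stated in terms of the degeneracy $\delta(G_D)$, and the algorithms use graph orientations. A standard way to connect the two, shown here as a sketch and not as the paper's algorithm, is to repeatedly remove a minimum-degree vertex: the largest degree seen at removal time is the degeneracy, and orienting every edge from the earlier-removed endpoint to the later-removed one gives each vertex out-degree at most $\delta$. The abstract does not describe how $G_D$ is built from the dictionary, so the hypothetical function `degeneracy_orientation` takes an arbitrary edge list; it also collapses parallel edges, so it handles simple graphs only.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def degeneracy_orientation(edges: List[Edge]) -> Tuple[int, Dict[Hashable, List[Hashable]]]:
    """Return (degeneracy, orientation) of a simple undirected graph.

    The orientation maps each vertex to its out-neighbours; every vertex has
    out-degree at most the degeneracy. Parallel edges are merged.
    """
    adj: DefaultDict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {v: len(ns) for v, ns in adj.items()}  # degree among remaining vertices
    removed: Set[Hashable] = set()
    order: Dict[Hashable, int] = {}
    delta = 0
    # Simple O(V^2) min-degree peeling; a bucket queue would give O(V + E).
    for pos in range(len(adj)):
        v = min((x for x in adj if x not in removed), key=lambda x: deg[x])
        delta = max(delta, deg[v])
        order[v] = pos
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
    # Orient each edge toward the endpoint that was removed later.
    out = {v: [w for w in adj[v] if order[w] > order[v]] for v in adj}
    return delta, out


# A 4-cycle (complete bipartite K_{2,2}) has degeneracy 2.
delta, out = degeneracy_orientation([("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")])
print(delta)  # prints 2
```

The orientation is what makes degeneracy useful algorithmically: each edge is charged to exactly one endpoint, and no vertex is charged more than $\delta$ edges.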