79,713 research outputs found

Faster Pattern Matching under Edit Distance

Author: Charalampopoulos P.
Kociumaka T.
Wellnitz P.
Publication date: 01/01/2022

We consider the approximate pattern matching problem under the edit distance. Given a text $T$ of length $n$, a pattern $P$ of length $m$, and a threshold $k$, the task is to find the starting positions of all substrings of $T$ that can be transformed to $P$ with at most $k$ edits. More than 20 years ago, Cole and Hariharan [SODA'98, J. Comput.'02] gave an $\mathcal{O}(n+k^4 \cdot n/m)$-time algorithm for this classic problem, and this runtime has not been improved since. Here, we present an algorithm that runs in time $\mathcal{O}(n+k^{3.5}\sqrt{\log m \log k} \cdot n/m)$, thus breaking through this long-standing barrier. In the case where $n^{1/4+\varepsilon} \leq k \leq n^{2/5-\varepsilon}$ for some arbitrarily small positive constant $\varepsilon$, our algorithm improves over the state of the art by polynomial factors: it is polynomially faster than both the algorithm of Cole and Hariharan and the classic $\mathcal{O}(kn)$-time algorithm of Landau and Vishkin [STOC'86, J. Algorithms'89]. We observe that the bottleneck case of the alternative $\mathcal{O}(n+k^4 \cdot n/m)$-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz [FOCS'20] is when the text and the pattern are (almost) periodic. Our new algorithm reduces this case to a new dynamic problem (Dynamic Puzzle Matching), which we solve by building on tools developed by Tiskin [SODA'10, Algorithmica'15] for the so-called seaweed monoid of permutation matrices. Our algorithm relies only on a small set of primitive operations on strings and thus also applies to the fully-compressed setting (where text and pattern are given as straight-line programs) and to the dynamic setting (where we maintain a collection of strings under creation, splitting, and concatenation), improving over the state of the art.
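To make the problem statement in this abstract concrete, here is a minimal Python sketch of the textbook quadratic-time dynamic program for approximate matching under edit distance. It is not the algorithm of the paper, nor that of Cole and Hariharan or Landau and Vishkin; it runs in $\mathcal{O}(nm)$ time and only illustrates what "starting positions of substrings within $k$ edits of $P$" means. The function name `approx_match_starts` is a hypothetical choice for this illustration.

```python
def approx_match_starts(text: str, pattern: str, k: int) -> list[int]:
    """Return all i such that some substring of text starting at i
    is within edit distance k of pattern (naive O(n*m) sketch)."""
    # The column DP naturally reports *ending* positions of matches.
    # Running it on the reversed strings turns those into *starting*
    # positions in the original text.
    t, p = text[::-1], pattern[::-1]
    n, m = len(t), len(p)

    # col[j] = min edit distance between p[:j] and any substring of t
    # that ends at the character currently being processed.
    col = list(range(m + 1))
    starts = []
    for r, c in enumerate(t):
        prev_diag = col[0]  # col[0] stays 0: a match may begin anywhere
        for j in range(1, m + 1):
            old = col[j]
            cost = 0 if p[j - 1] == c else 1
            col[j] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # skip a text character
                         col[j - 1] + 1)    # skip a pattern character
            prev_diag = old
        if col[m] <= k:
            # Position r in the reversed text is position n-1-r in text.
            starts.append(n - 1 - r)
    return sorted(starts)
```

For example, `approx_match_starts("abcdefg", "cxe", 1)` returns `[2]`, because the substring `cde` starting at index 2 becomes `cxe` with one substitution.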

MPG.PuRe

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Author: A. Amir
E.W. Myers
G. Navarro
G.M. Landau
J. Kärkkäinen
J. Ziv
K. Thompson
M. Dietzfelbinger
M. Farach
P. Sellers
R. Cole
T.A. Welch
V. Mäkinen
Publication date: 01/01/2007

We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Online Research Database In Technology

On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree

Author: Equi Massimo
Grossi Roberto
Mäkinen Veli
Publication date: 08/07/2019

Exact pattern matching in labeled graphs is the problem of searching paths of a graph $G=(V,E)$ that spell the same string as the pattern $P[1..m]$. This basic problem can be found at the heart of more complex operations on variation graphs in computational biology, of query operations in graph databases, and of analysis operations in heterogeneous networks, where the nodes of some paths must match a sequence of labels or types. We describe a simple conditional lower bound that, for any constant $\epsilon>0$, an $O(|E|^{1 - \epsilon} \, m)$-time or an $O(|E| \, m^{1 - \epsilon})$-time algorithm for exact pattern matching on graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false. The result holds even if restricted to undirected graphs of maximum degree three or directed acyclic graphs of maximum sum of indegree and outdegree three. Although a conditional lower bound of this kind can somehow be derived from previous results (Backurs and Indyk, FOCS'16), we give a direct reduction from SETH for dissemination purposes, as the result might interest researchers from several areas, such as computational biology, graph databases, and graph mining, as mentioned before. Indeed, as approximate pattern matching on graphs can be solved in $O(|E|\,m)$ time, exact and approximate matching are thus equally hard (quadratic time) on graphs under the SETH assumption. In comparison, the same problems restricted to strings have linear time vs quadratic time solutions, respectively, where the latter ones have a matching SETH lower bound on computing the edit distance of two strings (Backurs and Indyk, STOC'15).

Comment: Using Lemma 12 and Lemma 13 might be enough to prove Lemma 14. However, the proof of Lemma 14 is correct if you assume that the graph used in the reduction is a DAG. Hence, since the problem is already quadratic for a DAG and a binary alphabet, it has to be quadratic also for a general graph and a binary alphabet
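The quadratic upper bound that this lower bound is measured against can be illustrated with a short Python sketch. It is a standard textbook-style dynamic program, not code from the paper: for each pattern position it keeps the set of nodes at which a matching walk can currently end, and relaxes every edge once per pattern character, giving $O(|E|\,m)$ time. The function name `graph_has_match`, the dict-of-labels representation, and the choice to allow repeated nodes on a walk are assumptions made for this illustration.

```python
def graph_has_match(labels: dict, edges: list, pattern: str) -> bool:
    """Return True if some walk in the node-labeled directed graph
    spells `pattern` (O(|E| * m) sketch; nodes may repeat)."""
    if not pattern:
        return True
    # Nodes whose label matches the first pattern character.
    active = {v for v, ch in labels.items() if ch == pattern[0]}
    for ch in pattern[1:]:
        # One pass over all edges per pattern character: extend every
        # partial match ending at v along (v, w) when w has label ch.
        active = {w for (v, w) in edges if v in active and labels[w] == ch}
        if not active:
            return False
    return bool(active)


# Hypothetical toy input: a directed 3-cycle labeled a, b, a.
toy_labels = {0: "a", 1: "b", 2: "a"}
toy_edges = [(0, 1), (1, 2), (2, 0)]
found = graph_has_match(toy_labels, toy_edges, "aba")  # walk 0 -> 1 -> 2
```

For an undirected graph, each edge would be listed in both directions under this representation.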

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Searching by approximate personal-name matching

Author: Camps Pare Rafael
Daude Ventura Jordi
Publication date: 01/01/2003

We discuss the design, building and evaluation of a method to access the information of a person, using his name as a search key, even if it contains deformations. We present a similarity function, the DEA function, based on the probabilities of the edit operations according to the letters involved and their position, and using a variable threshold. The efficacy of DEA is evaluated quantitatively, without human relevance judgments, and is found to be far superior to that of known methods. A very efficient approximate search technique for the DEA function, based on a compacted trie-tree structure, is also presented. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Cluster-Wise Ratio Tests for Fast Camera Localization

Author: Díaz Raúl
Fowlkes Charless C.
Publication date: 20/05/2017

Feature point matching for camera localization suffers from scalability problems. Even when feature descriptors associated with 3D scene points are locally unique, as coverage grows, similar or repeated features become increasingly common. As a result, the standard distance ratio-test used to identify reliable image feature points is overly restrictive and rejects many good candidate matches. We propose a simple coarse-to-fine strategy that uses conservative approximations to robust local ratio-tests that can be computed efficiently using global approximate k-nearest neighbor search. We treat these forward matches as votes in camera pose space and use them to prioritize back-matching within candidate camera pose clusters, exploiting feature co-visibility captured by clustering the 3D model camera pose graph. This approach achieves state-of-the-art camera localization results on a variety of popular benchmarks, outperforming several methods that use more complicated data structures and that make more restrictive assumptions on camera pose. We also carry out diagnostic analyses on a difficult test dataset containing globally repetitive structure that suggest our approach successfully adapts to the challenges of large-scale image localization.
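The "standard distance ratio-test" that the abstract calls overly restrictive can be sketched in a few lines of Python. This shows only the plain global ratio test, not the cluster-wise variant the paper proposes. The function name `ratio_test_matches`, the brute-force nearest-neighbor search, and the default threshold of 0.8 are assumptions made for this illustration.

```python
import math


def ratio_test_matches(queries, database, ratio=0.8):
    """Keep a query's nearest database descriptor only when it is
    clearly closer than the second nearest (plain ratio test)."""
    matches = []
    for qi, q in enumerate(queries):
        # Brute-force distances; a real system would use approximate k-NN.
        dists = sorted((math.dist(q, d), di) for di, d in enumerate(database))
        if len(dists) < 2:
            continue  # the test needs two neighbors to compare
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


# Hypothetical 2D "descriptors": the last two database entries are
# near-duplicates, as happens with repeated structure in large scenes.
toy_database = [(0.0, 0.0), (10.0, 0.0), (10.1, 0.0)]
toy_queries = [(0.1, 0.0), (10.05, 0.0)]
toy_matches = ratio_test_matches(toy_queries, toy_database)
```

In this toy input the first query passes the test, while the second is rejected because its two nearest neighbors are almost equally close, even though either would be a reasonable match. That is the failure mode the abstract describes.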

arXiv.org e-Print Archive

Crossref

Perceptually Motivated Shape Context Which Uses Shape Interiors

Author: Kakarala Ramakrishna
Premachandran Vittal
Publication date: 19/12/2012

In this paper, we identify some of the limitations of current-day shape matching techniques. We provide examples of how contour-based shape matching techniques cannot provide a good match for certain visually similar shapes. To overcome this limitation, we propose a perceptually motivated variant of the well-known shape context descriptor. We identify that the interior properties of the shape play an important role in object recognition and develop a descriptor that captures these interior properties. We show that our method can easily be augmented with any other shape matching algorithm. We also show from our experiments that the use of our descriptor can significantly improve the retrieval rates.

arXiv.org e-Print Archive

Adelaide Research & Scholarship

Approximate Hamming distance in a stream

Author: Clifford Raphael
Starikovskaya Tatiana
Publication date: 01/01/2016

We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an