
6,056 research outputs found

Simple, compact and robust approximate string dictionary

Authors: Belazzougui Djamal, Chegrane Ibrahim
Publication date: 22/08/2014

This paper is concerned with practical implementations of approximate string dictionaries that allow edit errors. In this problem, we have as input a dictionary $D$ of $d$ strings of total length $n$ over an alphabet of size $\sigma$. Given a bound $k$ and a pattern $x$ of length $m$, a query has to return all the strings of the dictionary which are at edit distance at most $k$ from $x$, where the edit distance between two strings $x$ and $y$ is defined as the cost of a minimum-cost sequence of edit operations that transforms $x$ into $y$. The cost of a sequence of operations is defined as the sum of the costs of the operations involved in the sequence. In this paper, we assume that each of these operations has unit cost and consider only three operations: deletion of one character, insertion of one character and substitution of a character by another. We present a practical implementation of the data structure we recently proposed, which works only for one error. We extend the scheme to $2\leq k<m$. Our implementation has many desirable properties: it has a very fast and space-efficient building algorithm. The dictionary data structure is compact and has fast and robust query time. Finally, our data structure is simple to implement as it only uses basic techniques from the literature, mainly hashing (linear probing and hash signatures) and succinct data structures (bitvectors supporting rank queries).

Comment: Accepted to a journal (19 pages, 2 figures
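The unit-cost edit distance and the query defined in the abstract above can be illustrated with a minimal sketch. This is a brute-force baseline under stated assumptions, not the hashing-based data structure the paper implements; the function names `edit_distance` and `query` are hypothetical and chosen only for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance with insertion, deletion and substitution."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def query(dictionary, x, k):
    """Return every dictionary string at edit distance at most k from x.

    Assumption: a linear scan over the dictionary, used here only to make
    the problem statement concrete.
    """
    return [s for s in dictionary if edit_distance(s, x) <= k]
```

For example, `query(["cat", "cart", "dog", "at"], "cat", 1)` scans all four strings and keeps those within one edit of the pattern. A practical index avoids this full scan, which is the point of the data structure described in the abstract.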

arXiv.org e-Print Archive

CiteSeerX

A practical index for approximate dictionary matching with few mismatches

Authors: Cisłak Aleksander, Grabowski Szymon
Publication date: 11/02/2016

Approximate dictionary matching is a classic string matching problem (checking if a query string occurs in a collection of strings) with applications in, e.g., spellchecking, online catalogs, geolocation, and web searchers. We present a surprisingly simple solution called a split index, which is based on the Dirichlet principle, for matching a keyword with few mismatches, and experimentally show that it offers competitive space-time tradeoffs. Our implementation in the C++ language is focused mostly on data compaction, which is beneficial for the search speed (e.g., by being cache friendly). We compare our solution with other algorithms and we show that it performs better for the Hamming distance. Query times in the order of 1 microsecond were reported for one mismatch for the dictionary size of a few megabytes on a medium-end PC. We also demonstrate that a basic compression technique consisting in $q$-gram substitution can significantly reduce the index size (up to 50% of the input text size for the DNA), while still keeping the query time relatively low
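The Dirichlet (pigeonhole) principle mentioned in the abstract above can be sketched for the one-mismatch case: if two equal-length words differ in at most one position and each is split into two parts, at least one part must match exactly. The toy class below assumes this two-part split and a plain hash map of buckets; the name `SplitIndex`, its layout and its methods are hypothetical and do not reproduce the authors' compact C++ implementation.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(ca != cb for ca, cb in zip(a, b))


class SplitIndex:
    """Toy index for Hamming distance at most 1 (illustrative assumption)."""

    def __init__(self, words):
        # Key: (word length, which part, part content) -> words sharing that part.
        self.buckets = defaultdict(list)
        for w in words:
            mid = len(w) // 2
            self.buckets[(len(w), 0, w[:mid])].append(w)
            self.buckets[(len(w), 1, w[mid:])].append(w)

    def search(self, pattern):
        """Return, sorted, all indexed words within one mismatch of pattern."""
        mid = len(pattern) // 2
        found = set()
        for side, piece in ((0, pattern[:mid]), (1, pattern[mid:])):
            # One part matches exactly; verify the whole candidate word.
            for w in self.buckets.get((len(pattern), side, piece), ()):
                if hamming(w, pattern) <= 1:
                    found.add(w)
        return sorted(found)
```

A query touches only the two buckets keyed by the pattern's own parts, so candidates are limited to words that already share half of the pattern. Only same-length words are compared, which matches the Hamming-distance setting of the abstract.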

arXiv.org e-Print Archive

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

The flat space S-matrix from the AdS/CFT correspondence?

Authors: Gary Michael, Giddings Steven B.
Publication venue: American Physical Society (APS)
Publication date: 24/04/2009

We investigate recovery of the bulk S-matrix from the AdS/CFT correspondence, at large radius. It was recently argued that some of the elements of the S-matrix might be read from CFT correlators, given a particular singularity structure of the latter, but leaving the question of more general S-matrix elements. Since in AdS/CFT, data must be specified on the boundary, we find certain limitations on the corresponding bulk wavepackets and on their localization properties. In particular, those we have found that approximately localize have low-energy tails, and corresponding power-law tails in position space. When their scattering is compared to that of "sharper" wavepackets typically used in scattering theory, one finds apparently significant differences, suggesting a possible lack of resolution via these wavepackets. We also give arguments that construction of the sharper wavepackets may require non-perturbative control of the boundary theory, and in particular of the N^2 matrix degrees of freedom. These observations thus raise interesting questions about what principle would guarantee the appropriate control, and about how a boundary CFT can accurately approximate the flat space S-matrix.

Comment: 26 pages. v2: typos fixed v3: minor improvements in discussio

arXiv.org e-Print Archive

CERN Document Server

Image Characterization and Classification by Physical Complexity

Authors: Delahaye Jean-Paul, Gaucherel Cedric, Zenil Hector
Publication date: 03/07/2011

We present a method for estimating the complexity of an image based on Bennett's concept of logical depth. Bennett identified logical depth as the appropriate measure of organized complexity, and hence as being better suited to the evaluation of the complexity of objects in the physical world. Its use results in a different, and in some sense a finer characterization than is obtained through the application of the concept of Kolmogorov complexity alone. We use this measure to classify images by their information content. The method provides a means for classifying and evaluating the complexity of objects by way of their visual representations. To the authors' knowledge, the method and application inspired by the concept of logical depth presented herein are being proposed and implemented for the first time.

Comment: 30 pages, 21 figure

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL Descartes

Oxford University Research Archive

HAL-CIRAD

Hal-Diderot