Simple, compact and robust approximate string dictionary
This paper is concerned with practical implementations of approximate string
dictionaries that allow edit errors. In this problem, we have as input a
dictionary of d strings of total length n over an alphabet of size
σ. Given a bound k and a pattern x of length m, a query has to
return all the strings of the dictionary which are at edit distance at most
k from x, where the edit distance between two strings x and y is defined as
the cost of a minimum-cost sequence of edit operations that transform x into y. The
cost of a sequence of operations is defined as the sum of the costs of the
operations involved in the sequence. In this paper, we assume that each of
these operations has unit cost and consider only three operations: deletion of
one character, insertion of one character and substitution of a character by
another. We present a practical implementation of the data structure we
recently proposed, which works only for one error. We extend the scheme to
two or more errors. Our implementation has many desirable properties: it has a very
fast and space-efficient building algorithm. The dictionary data structure is
compact and has fast and robust query time. Finally our data structure is
simple to implement as it only uses basic techniques from the literature,
mainly hashing (linear probing and hash signatures) and succinct data
structures (bitvectors supporting rank queries).
Comment: Accepted to a journal (19 pages, 2 figures)
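The unit-cost edit distance defined in this abstract can be illustrated with the textbook dynamic-programming recurrence. The sketch below is not the paper's hashing-based data structure. It is only a brute-force baseline that scans the whole dictionary, and the names `edit_distance` and `naive_query` are hypothetical, chosen here for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance (deletion, insertion, substitution)."""
    # prev[j] holds the distance between the current prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def naive_query(dictionary, pattern, k):
    """Return every dictionary string within edit distance k of pattern.

    Brute-force baseline (an assumption for illustration): it compares the
    pattern against every string, which is what an index is meant to avoid.
    """
    return [s for s in dictionary if edit_distance(s, pattern) <= k]
```

For example, `naive_query(["cat", "cart", "dog"], "cat", 1)` keeps "cat" (distance 0) and "cart" (one insertion) and rejects "dog".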
A practical index for approximate dictionary matching with few mismatches
Approximate dictionary matching is a classic string matching problem
(checking if a query string occurs in a collection of strings) with
applications in, e.g., spellchecking, online catalogs, geolocation, and web
searchers. We present a surprisingly simple solution called a split index,
which is based on the Dirichlet principle, for matching a keyword with few
mismatches, and experimentally show that it offers competitive space-time
tradeoffs. Our implementation in the C++ language is focused mostly on data
compaction, which is beneficial for the search speed (e.g., by being cache
friendly). We compare our solution with other algorithms and we show that it
performs better for the Hamming distance. Query times on the order of 1
microsecond were reported for one mismatch for a dictionary of a few
megabytes on a medium-end PC. We also demonstrate that a basic compression
technique consisting of q-gram substitution can significantly reduce the
index size (up to 50% of the input text size for DNA), while still keeping
the query time relatively low.
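The Dirichlet (pigeonhole) principle behind the split index can be sketched for the one-mismatch case: if a word is cut into two parts and differs from the pattern in at most one position, then at least one part matches exactly. The code below is a minimal illustration under assumptions of my own (a midpoint split, Python dictionaries as the hash tables, the hypothetical class name `SplitIndexSketch`). It does not reproduce the compact C++ layout the abstract describes.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; callers pass equal-length strings."""
    return sum(ca != cb for ca, cb in zip(a, b))


class SplitIndexSketch:
    """Toy index for Hamming distance <= 1 (hypothetical, for illustration)."""

    def __init__(self, words):
        self.by_left = defaultdict(list)
        self.by_right = defaultdict(list)
        for w in words:
            mid = len(w) // 2
            # Key on (length, part) so only equal-length words are candidates.
            self.by_left[(len(w), w[:mid])].append(w)
            self.by_right[(len(w), w[mid:])].append(w)

    def query(self, pattern):
        n, mid = len(pattern), len(pattern) // 2
        # Pigeonhole: with at most one mismatch, one half matches exactly.
        candidates = set(self.by_left.get((n, pattern[:mid]), []))
        candidates |= set(self.by_right.get((n, pattern[mid:]), []))
        # Verify each candidate, since the other half may differ anywhere.
        return sorted(w for w in candidates if hamming(w, pattern) <= 1)
```

With `idx = SplitIndexSketch(["cat", "cot", "dog", "cart"])`, the call `idx.query("cat")` inspects only the words sharing the half "c" or the half "at" and returns "cat" and "cot".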
The flat space S-matrix from the AdS/CFT correspondence?
We investigate recovery of the bulk S-matrix from the AdS/CFT correspondence,
at large radius. It was recently argued that some of the elements of the
S-matrix might be read from CFT correlators, given a particular singularity
structure of the latter, but leaving the question of more general S-matrix
elements. Since in AdS/CFT, data must be specified on the boundary, we find
certain limitations on the corresponding bulk wavepackets and on their
localization properties. In particular, those we have found that approximately
localize have low-energy tails, and corresponding power-law tails in position
space. When their scattering is compared to that of "sharper" wavepackets
typically used in scattering theory, one finds apparently significant
differences, suggesting a possible lack of resolution via these wavepackets. We
also give arguments that construction of the sharper wavepackets may require
non-perturbative control of the boundary theory, and in particular of the N^2
matrix degrees of freedom. These observations thus raise interesting questions
about what principle would guarantee the appropriate control, and about how a
boundary CFT can accurately approximate the flat space S-matrix.
Comment: 26 pages. v2: typos fixed; v3: minor improvements in discussion
Image Characterization and Classification by Physical Complexity
We present a method for estimating the complexity of an image based on
Bennett's concept of logical depth. Bennett identified logical depth as the
appropriate measure of organized complexity, and hence as being better suited
to the evaluation of the complexity of objects in the physical world. Its use
results in a different, and in some sense a finer characterization than is
obtained through the application of the concept of Kolmogorov complexity alone.
We use this measure to classify images by their information content. The method
provides a means for classifying and evaluating the complexity of objects by
way of their visual representations. To the authors' knowledge, the method and
application inspired by the concept of logical depth presented herein are being
proposed and implemented for the first time.
Comment: 30 pages, 21 figures