6,056 research outputs found

    Simple, compact and robust approximate string dictionary

    Full text link
    This paper is concerned with practical implementations of approximate string dictionaries that allow edit errors. In this problem, we have as input a dictionary DD of dd strings of total length nn over an alphabet of size σ\sigma. Given a bound kk and a pattern xx of length mm, a query has to return all the strings of the dictionary which are at edit distance at most kk from xx, where the edit distance between two strings xx and yy is defined as the minimum-cost sequence of edit operations that transform xx into yy. The cost of a sequence of operations is defined as the sum of the costs of the operations involved in the sequence. In this paper, we assume that each of these operations has unit cost and consider only three operations: deletion of one character, insertion of one character and substitution of a character by another. We present a practical implementation of the data structure we recently proposed and which works only for one error. We extend the scheme to 2k<m2\leq k<m. Our implementation has many desirable properties: it has a very fast and space-efficient building algorithm. The dictionary data structure is compact and has fast and robust query time. Finally our data structure is simple to implement as it only uses basic techniques from the literature, mainly hashing (linear probing and hash signatures) and succinct data structures (bitvectors supporting rank queries).Comment: Accepted to a journal (19 pages, 2 figures

    A practical index for approximate dictionary matching with few mismatches

    Get PDF
    Approximate dictionary matching is a classic string matching problem (checking if a query string occurs in a collection of strings) with applications in, e.g., spellchecking, online catalogs, geolocation, and web searchers. We present a surprisingly simple solution called a split index, which is based on the Dirichlet principle, for matching a keyword with few mismatches, and experimentally show that it offers competitive space-time tradeoffs. Our implementation in the C++ language is focused mostly on data compaction, which is beneficial for the search speed (e.g., by being cache friendly). We compare our solution with other algorithms and we show that it performs better for the Hamming distance. Query times in the order of 1 microsecond were reported for one mismatch for the dictionary size of a few megabytes on a medium-end PC. We also demonstrate that a basic compression technique consisting in qq-gram substitution can significantly reduce the index size (up to 50% of the input text size for the DNA), while still keeping the query time relatively low

    The flat space S-matrix from the AdS/CFT correspondence?

    Full text link
    We investigate recovery of the bulk S-matrix from the AdS/CFT correspondence, at large radius. It was recently argued that some of the elements of the S-matrix might be read from CFT correlators, given a particular singularity structure of the latter, but leaving the question of more general S-matrix elements. Since in AdS/CFT, data must be specified on the boundary, we find certain limitations on the corresponding bulk wavepackets and on their localization properties. In particular, those we have found that approximately localize have low-energy tails, and corresponding power-law tails in position space. When their scattering is compared to that of "sharper" wavepackets typically used in scattering theory, one finds apparently significant differences, suggesting a possible lack of resolution via these wavepackets. We also give arguments that construction of the sharper wavepackets may require non-perturbative control of the boundary theory, and particular of the N^2 matrix degrees of freedom. These observations thus raise interesting questions about what principle would guarantee the appropriate control, and about how a boundary CFT can accurately approximate the flat space S-matrix.Comment: 26 pages. v2: typos fixed v3: minor improvements in discussio

    Image Characterization and Classification by Physical Complexity

    Full text link
    We present a method for estimating the complexity of an image based on Bennett's concept of logical depth. Bennett identified logical depth as the appropriate measure of organized complexity, and hence as being better suited to the evaluation of the complexity of objects in the physical world. Its use results in a different, and in some sense a finer characterization than is obtained through the application of the concept of Kolmogorov complexity alone. We use this measure to classify images by their information content. The method provides a means for classifying and evaluating the complexity of objects by way of their visual representations. To the authors' knowledge, the method and application inspired by the concept of logical depth presented herein are being proposed and implemented for the first time.Comment: 30 pages, 21 figure
    corecore