    Simple, compact and robust approximate string dictionary

    This paper is concerned with practical implementations of approximate string dictionaries that allow edit errors. In this problem, we have as input a dictionary $D$ of $d$ strings of total length $n$ over an alphabet of size $\sigma$. Given a bound $k$ and a pattern $x$ of length $m$, a query has to return all the strings of the dictionary which are at edit distance at most $k$ from $x$, where the edit distance between two strings $x$ and $y$ is defined as the cost of a minimum-cost sequence of edit operations that transforms $x$ into $y$. The cost of a sequence of operations is defined as the sum of the costs of the operations involved in the sequence. In this paper, we assume that each of these operations has unit cost and consider only three operations: deletion of one character, insertion of one character and substitution of a character by another. We present a practical implementation of the data structure we recently proposed, which works only for one error. We extend the scheme to $2 \leq k < m$. Our implementation has many desirable properties: it has a very fast and space-efficient building algorithm. The dictionary data structure is compact and has fast and robust query time. Finally, our data structure is simple to implement as it only uses basic techniques from the literature, mainly hashing (linear probing and hash signatures) and succinct data structures (bitvectors supporting rank queries). Comment: Accepted to a journal (19 pages, 2 figures)
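
    The query defined in this abstract can be illustrated with a brute-force baseline. The sketch below is not the paper's hashing-based data structure; it only spells out the unit-cost edit distance and the "return every string within distance k" semantics. The function names `edit_distance` and `query` are hypothetical.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance (insertion, deletion, substitution)."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty string
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx from x
                curr[j - 1] + 1,           # insert cy into x
                prev[j - 1] + (cx != cy),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def query(dictionary, x, k):
    """Return all dictionary strings at edit distance at most k from x.

    Brute force: one full distance computation per dictionary string.
    """
    return [s for s in dictionary if edit_distance(s, x) <= k]


if __name__ == "__main__":
    words = ["cat", "cart", "dog", "cast"]
    print(query(words, "cat", 1))
```

    This baseline costs time proportional to m times the total dictionary length n per query, which is the cost a dedicated dictionary structure is meant to avoid.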

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently considerable effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
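
    As an illustration of the first category, a data-independent hash function, the sketch below uses random hyperplane hashing, a well-known locality sensitive family for angular similarity. It is chosen here only as an example and is not taken from the survey text; the names `make_hyperplane_hash` and `hamming` are hypothetical.

```python
import random


def make_hyperplane_hash(dim, n_bits, seed=0):
    """Build a hash function from n_bits random Gaussian hyperplanes.

    The hyperplanes are drawn without looking at any data, which is what
    makes this a data-independent (locality sensitive) scheme.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(v):
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple(
            int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0)
            for p in planes
        )

    return h


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    h = make_hyperplane_hash(dim=3, n_bits=16)
    v = [1.0, 2.0, 3.0]
    near = [1.1, 2.0, 2.9]
    far = [-1.0, -2.0, -3.0]
    print(hamming(h(v), h(near)), hamming(h(v), h(far)))
```

    Vectors separated by a small angle tend to agree on most bits, so Hamming distance between codes serves as a cheap proxy for angular distance during search.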

    Non-convex optimization for 3D point source localization using a rotating point spread function

    We consider the high-resolution imaging problem of 3D point source image recovery from 2D data using a method based on point spread function (PSF) engineering. The method involves a new technique, recently proposed by S. Prasad, based on the use of a rotating PSF with a single lobe to obtain depth from defocus. The amount of rotation of the PSF encodes the depth position of the point source. Applications include high-resolution single molecule localization microscopy as well as the problem addressed in this paper on localization of space debris using a space-based telescope. The localization problem is discretized on a cubical lattice where the coordinates of nonzero entries represent the 3D locations and the values of these entries the fluxes of the point sources. Finding the locations and fluxes of the point sources is a large-scale sparse 3D inverse problem. A new nonconvex regularization method with a data-fitting term based on Kullback-Leibler (KL) divergence is proposed for 3D localization for the Poisson noise model. In addition, we propose a new scheme of estimation of the source fluxes from the KL data-fitting term. Numerical experiments illustrate the efficiency and stability of the algorithms that are trained on a random subset of image data before being applied to other images. Our 3D localization algorithms can be readily applied to other kinds of depth-encoding PSFs as well. Comment: 28 pages
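
    The abstract does not give the exact form of its KL data-fitting term. As a hedged illustration, the sketch below computes the generalized KL divergence commonly used for Poisson-distributed data, sum(g * log(g / p) - g + p), where g is the observed image and p the predicted intensity. Treating p as the output of a forward model plus background is an assumption, and the name `kl_data_fit` is hypothetical.

```python
import math


def kl_data_fit(observed, predicted):
    """Generalized KL divergence between observed counts and predicted intensities.

    Computes sum(g * log(g / p) - g + p), with 0 * log(0) taken as 0.
    Up to terms independent of p, this is the Poisson negative log-likelihood.
    """
    total = 0.0
    for g, p in zip(observed, predicted):
        if p <= 0.0:
            raise ValueError("predicted intensities must be positive")
        total += p - g
        if g > 0.0:
            total += g * math.log(g / p)
    return total


if __name__ == "__main__":
    observed = [3.0, 0.0, 5.0]
    print(kl_data_fit(observed, [3.0, 0.5, 5.0]))  # close fit, small value
    print(kl_data_fit(observed, [1.0, 4.0, 1.0]))  # poor fit, larger value
```

    The term is zero when the prediction matches the data exactly and grows as the fit worsens, which is why it can serve as the data-fidelity part of a regularized objective.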