4,922 research outputs found
Simple, compact and robust approximate string dictionary
This paper is concerned with practical implementations of approximate string
dictionaries that allow edit errors. In this problem, we have as input a
dictionary of strings of total length n over an alphabet of size σ.
Given a bound k and a pattern p of length m, a query has to
return all the strings of the dictionary which are at edit distance at most
k from p, where the edit distance between two strings x and y is defined as
the minimum cost of a sequence of edit operations that transforms x into y. The
cost of a sequence of operations is defined as the sum of the costs of the
operations involved in the sequence. In this paper, we assume that each of
these operations has unit cost and consider only three operations: deletion of
one character, insertion of one character, and substitution of one character by
another. We present a practical implementation of the data structure we
recently proposed, which works only for one error, and we extend the scheme to
a general number of errors k. Our implementation has many desirable properties: it has a very
fast and space-efficient building algorithm. The dictionary data structure is
compact and has fast and robust query time. Finally our data structure is
simple to implement as it only uses basic techniques from the literature,
mainly hashing (linear probing and hash signatures) and succinct data
structures (bitvectors supporting rank queries).
Comment: Accepted to a journal (19 pages, 2 figures)
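The unit-cost edit distance with the three operations named above (deletion, insertion, substitution) is the classic Levenshtein distance, computable by dynamic programming. A minimal sketch (function name is illustrative, not from the paper):

```python
def edit_distance(x, y):
    # Unit-cost Levenshtein distance: minimum number of single-character
    # deletions, insertions and substitutions transforming x into y.
    m, n = len(x), len(y)
    prev = list(range(n + 1))  # distances from x[:0] to every prefix of y
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            curr[j] = min(prev[j] + 1,                           # deletion
                          curr[j - 1] + 1,                       # insertion
                          prev[j - 1] + (x[i - 1] != y[j - 1]))  # substitution or match
        prev = curr
    return prev[n]
```

A dictionary query with bound k would report every stored string s with `edit_distance(p, s) <= k`; the point of the paper's data structure is to answer this far faster than such a linear scan.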
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality-sensitive hashing. We divide the hashing
algorithms into two main categories: locality-sensitive hashing, which designs hash
functions without exploring the data distribution, and learning to hash, which
learns hash functions according to the data distribution. We review them from
various aspects, including hash function design, the distance measure, and the search
scheme in the hash coding space.
Non-convex optimization for 3D point source localization using a rotating point spread function
We consider the high-resolution imaging problem of 3D point source image
recovery from 2D data using a method based on point spread function (PSF)
engineering. The method involves a new technique, recently proposed by
S. Prasad, based on the use of a rotating PSF with a single lobe to obtain
depth from defocus. The amount of rotation of the PSF encodes the depth
position of the point source. Applications include high-resolution single
molecule localization microscopy as well as the problem addressed in this paper
on localization of space debris using a space-based telescope. The localization
problem is discretized on a cubical lattice where the coordinates of nonzero
entries represent the 3D locations and the values of these entries the fluxes
of the point sources. Finding the locations and fluxes of the point sources is
a large-scale sparse 3D inverse problem. A new nonconvex regularization method
with a data-fitting term based on Kullback-Leibler (KL) divergence is proposed
for 3D localization for the Poisson noise model. In addition, we propose a new
scheme of estimation of the source fluxes from the KL data-fitting term.
Numerical experiments illustrate the efficiency and stability of the algorithms
that are trained on a random subset of image data before being applied to other
images. Our 3D localization algorithms can be readily applied to other kinds of
depth-encoding PSFs as well.
Comment: 28 pages
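For a Poisson noise model, the natural data-fitting term is the generalized Kullback-Leibler divergence between the observed counts b and the forward-model prediction Ax, i.e. Σᵢ (Ax)ᵢ − bᵢ + bᵢ log(bᵢ/(Ax)ᵢ), which is the Poisson negative log-likelihood up to a constant. A minimal sketch (variable names and the eps safeguard are illustrative; the paper's exact formulation may differ):

```python
import math

def kl_divergence(ax, b, eps=1e-12):
    # Generalized KL data-fitting term for Poisson noise:
    #   D(b || Ax) = sum_i (Ax)_i - b_i + b_i * log(b_i / (Ax)_i),
    # with the convention 0 * log(0) = 0 for zero-count pixels.
    total = 0.0
    for a_i, b_i in zip(ax, b):
        total += a_i - b_i
        if b_i > 0:
            total += b_i * math.log(b_i / max(a_i, eps))
    return total
```

The term vanishes when the prediction matches the data exactly and is positive otherwise, so minimizing it over the sparse lattice of source positions and fluxes drives the reconstruction toward the observed image.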