
Simple, compact and robust approximate string dictionary

Authors: Belazzougui Djamal, Chegrane Ibrahim
Publication date: 22/08/2014

This paper is concerned with practical implementations of approximate string dictionaries that allow edit errors. In this problem, we have as input a dictionary $D$ of $d$ strings of total length $n$ over an alphabet of size $\sigma$. Given a bound $k$ and a pattern $x$ of length $m$, a query has to return all the strings of the dictionary which are at edit distance at most $k$ from $x$, where the edit distance between two strings $x$ and $y$ is defined as the cost of a minimum-cost sequence of edit operations that transforms $x$ into $y$. The cost of a sequence of operations is defined as the sum of the costs of the operations involved in the sequence. In this paper, we assume that each of these operations has unit cost and consider only three operations: deletion of one character, insertion of one character and substitution of a character by another. We present a practical implementation of the data structure we recently proposed, which works only for one error, and extend the scheme to $2 \leq k < m$. Our implementation has many desirable properties: it has a very fast and space-efficient building algorithm. The dictionary data structure is compact and has fast and robust query time. Finally, our data structure is simple to implement, as it only uses basic techniques from the literature, mainly hashing (linear probing and hash signatures) and succinct data structures (bitvectors supporting rank queries).

Comment: Accepted to a journal (19 pages, 2 figures
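
The query defined in this abstract can be illustrated with a brute-force baseline. The sketch below is not the hashing-based data structure the paper describes; it only computes the unit-cost edit distance by standard dynamic programming and scans the whole dictionary. The function names `edit_distance` and `naive_query` are hypothetical, chosen for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance (deletion, insertion, substitution)."""
    # prev[j] holds the distance between the current prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty string
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx from x
                curr[j - 1] + 1,           # insert cy into x
                prev[j - 1] + (cx != cy),  # substitute cx by cy, or match
            ))
        prev = curr
    return prev[-1]


def naive_query(dictionary, x, k):
    """Return every dictionary string at edit distance at most k from x.

    Assumption: a linear scan, used only as a reference baseline.
    """
    return [s for s in dictionary if edit_distance(s, x) <= k]


if __name__ == "__main__":
    print(naive_query(["cat", "cart", "dog"], "cat", 1))
```

A scan like this costs time proportional to the total dictionary length times the pattern length per query, which is the cost a dedicated dictionary structure is meant to avoid.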

arXiv.org e-Print Archive

CiteSeerX

Hashing for Similarity Search: A Survey

Authors: Ji Jianqiu, Shen Heng Tao, Song Jingkuan, Wang Jingdong
Publication date: 13/08/2014

Similarity search (nearest neighbor search) is the problem of finding the data items in a large database whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
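
As an illustration of the data-independent category, the sketch below uses random hyperplane hashing, one well-known locality sensitive hashing family for angular similarity. It is an assumption that this family is representative of what the survey covers; the names `make_hyperplanes`, `hash_code`, and `hamming` are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes without looking at the data."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vector, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Distance measure used in the hash coding space."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=3)
    query = hash_code([1.0, 2.0, 3.0], planes)
    near = hash_code([1.1, 2.0, 2.9], planes)
    far = hash_code([-1.0, -2.0, -3.0], planes)
    print(hamming(query, near), hamming(query, far))
```

Vectors separated by a small angle agree on most bits with high probability, so candidates can be ranked or filtered by Hamming distance between codes before any exact distance is computed.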

arXiv.org e-Print Archive

CiteSeerX

Non-convex optimization for 3D point source localization using a rotating point spread function

Authors: Chan Raymond, Nikolova Mila, Plemmons Robert, Prasad Sudhakar, Wang Chao
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 27/09/2018

We consider the high-resolution imaging problem of 3D point source image recovery from 2D data using a method based on point spread function (PSF) engineering. The method involves a new technique, recently proposed by S. Prasad, based on the use of a rotating PSF with a single lobe to obtain depth from defocus. The amount of rotation of the PSF encodes the depth position of the point source. Applications include high-resolution single molecule localization microscopy as well as the problem addressed in this paper on localization of space debris using a space-based telescope. The localization problem is discretized on a cubical lattice where the coordinates of nonzero entries represent the 3D locations and the values of these entries the fluxes of the point sources. Finding the locations and fluxes of the point sources is a large-scale sparse 3D inverse problem. A new nonconvex regularization method with a data-fitting term based on Kullback-Leibler (KL) divergence is proposed for 3D localization for the Poisson noise model. In addition, we propose a new scheme of estimation of the source fluxes from the KL data-fitting term. Numerical experiments illustrate the efficiency and stability of the algorithms that are trained on a random subset of image data before being applied to other images. Our 3D localization algorithms can be readily applied to other kinds of depth-encoding PSFs as well.

Comment: 28 page
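
For readers unfamiliar with KL data-fitting terms, the sketch below evaluates the generalized KL divergence commonly used with Poisson noise, summing `z - g + g * log(g / z)` over pixels, where `g` is the observed count and `z` the predicted intensity. It is an assumption that the paper's term takes this standard form; the abstract does not give it, and the name `kl_data_fit` is hypothetical.

```python
import math


def kl_data_fit(observed, predicted):
    """Generalized KL divergence between observed counts and predicted intensities.

    Assumption: the standard Poisson form, with the convention 0 * log(0) = 0.
    """
    total = 0.0
    for g, z in zip(observed, predicted):
        if z <= 0:
            raise ValueError("predicted intensities must be positive")
        total += z - g
        if g > 0:
            total += g * math.log(g / z)
    return total


if __name__ == "__main__":
    counts = [3.0, 0.0, 5.0]
    print(kl_data_fit(counts, [3.0, 1.0, 5.0]))  # mismatch only at the empty pixel
    print(kl_data_fit(counts, [2.0, 1.0, 6.0]))  # larger value for a worse fit
```

The term is zero when the prediction matches the observation exactly and positive otherwise, which is what makes it usable as a data-fitting penalty.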

arXiv.org e-Print Archive