1,065 research outputs found
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space
Indexability, concentration, and VC theory
Degrading performance of indexing schemes for exact similarity search in high
dimensions has long since been linked to histograms of distributions of
distances and other 1-Lipschitz functions getting concentrated. We discuss this
observation in the framework of the phenomenon of concentration of measure on
the structures of high dimension and the Vapnik-Chervonenkis theory of
statistical learning.Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded,
improved and corrected version of the SISAP'2010 invited paper, this e-print,
v3
Memory vectors for similarity search in high-dimensional spaces
We study an indexing architecture to store and search in a database of
high-dimensional vectors from the perspective of statistical signal processing
and decision theory. This architecture is composed of several memory units,
each of which summarizes a fraction of the database by a single representative
vector. The potential similarity of the query to one of the vectors stored in
the memory unit is gauged by a simple correlation with the memory unit's
representative vector. This representative optimizes the test of the following
hypothesis: the query is independent from any vector in the memory unit vs. the
query is a simple perturbation of one of the stored vectors.
Compared to exhaustive search, our approach finds the most similar database
vectors significantly faster without a noticeable reduction in search quality.
Interestingly, the reduction of complexity is provably better in
high-dimensional spaces. We empirically demonstrate its practical interest in a
large-scale image search scenario with off-the-shelf state-of-the-art
descriptors.Comment: Accepted to IEEE Transactions on Big Dat
Lokalno diskriminantna projekcija difuzije i njena primjena za prepoznavanje emocionalnog stanja iz govornog signala
The existing Diffusion Maps method brings diffusion to data samples by Markov random walk. In this paper, to provide a general solution form of Diffusion Maps, first, we propose the generalized single-graph-diffusion embedding framework on the basis of graph embedding framework. Second, by designing the embedding graph of the framework, an algorithm, namely Locally Discriminant Diffusion Projection (LDDP), is proposed for speech emotion recognition. This algorithm is the projection form of the improved Diffusion Maps, which includes both discriminant information and local information. The linear or kernelized form of LDDP (i.e., LLDDP or KLDDP) is used to achieve the dimensionality reduction of original speech emotion features. We validate the proposed algorithm on two widely used speech emotion databases, EMO-DB and eNTERFACE\u2705. The experimental results show that the proposed LDDP methods, including LLDDP and KLDDP, outperform some other state-of-the-art dimensionality reduction methods which are based on graph embedding or discriminant analysis.Postojeće metode mapiranja difuzije u uzorke podataka primjenjuju Markovljevu slučajnu šetnju. U ovom radu, kako bismo pružili općenito rješenje za mapiranje difuzije, prvo predlažemo generalizirano okruženje za difuziju jednog grafa, zasnovano na okruženju za primjenu grafova. Drugo, konstruirajući ugrađeni graf, predlažemo algoritam lokalno diskriminantne projekcije difuzije (LDDP) za prepoznavanje emocionalnog stanja iz govornog signala. Ovaj algoritam je projekcija poboljšane difuzijske mape koja uključuje diskriminantnu i lokalnu informaciju. Linearna ili jezgrovita formulacija LDDP-a (i.e., LLDDP ili KLDDP) koristi se u svrhu redukcije dimenzionalnosti originalnog skupa značajki za prepoznavanje emocionalnog stanja iz govornog signala. Predloženi algoritam testiran je nad dvama široko korištenim bazama podataka za prepoznavanje emocionalnog stanja iz govornog signala, EMO-DB i eNTERFACE\u2705. Eksperimentalni rezultati pokazuju kako predložena LDDP metoda, uključujući LLDDP i KLDDP, pokazuje bolje ponašanje od nekih drugih najsuvremenijih metoda redukcije dimenzionalnosti, zasnovanim na ugrađenim grafovima ili analizi diskriminantnosti
- …