    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploiting the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
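The locality sensitive hashing family described above can be illustrated with a minimal sketch of random-hyperplane hashing for cosine similarity (the classic SimHash-style scheme; all names, dimensions, and parameters below are illustrative, not taken from the survey):

```python
import numpy as np

def make_hyperplane_hash(dim, n_bits, rng):
    """Random-hyperplane LSH: each bit is the sign of a random projection."""
    planes = rng.standard_normal((n_bits, dim))
    def h(x):
        return tuple(bool(b) for b in (planes @ x) > 0)
    return h

rng = np.random.default_rng(0)
h = make_hyperplane_hash(dim=64, n_bits=16, rng=rng)

# Nearby vectors tend to agree on more hash bits than distant ones.
x = rng.standard_normal(64)
near = x + 0.01 * rng.standard_normal(64)   # small perturbation of x
far = rng.standard_normal(64)               # unrelated vector
agree_near = sum(a == b for a, b in zip(h(x), h(near)))
agree_far = sum(a == b for a, b in zip(h(x), h(far)))
```

Buckets keyed on these bit tuples then let a query be compared only against items sharing its code, rather than the whole database.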

    Sparser Johnson-Lindenstrauss Transforms

    We give two different and simple constructions for dimensionality reduction in ℓ2 via linear mappings that are sparse: only an O(ε)-fraction of entries in each column of our embedding matrices are non-zero to achieve distortion 1+ε with high probability, while still achieving the asymptotically optimal number of rows. These are the first constructions to provide subconstant sparsity for all values of parameters, improving upon previous works of Achlioptas (JCSS 2003) and Dasgupta, Kumar, and Sarlós (STOC 2010). Such distributions can be used to speed up applications where ℓ2 dimensionality reduction is used.
    Comment: v6: journal version, minor changes, added Remark 23; v5: modified abstract, fixed typos, added open problem section; v4: simplified section 4 by giving 1 analysis that covers both constructions; v3: proof of Theorem 25 in v2 was written incorrectly, now fixed; v2: Added another construction achieving same upper bound, and added proof of near-tight lower bound for DKS scheme
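The flavor of a sparse JL embedding can be sketched as follows: each column of the m×d matrix gets a fixed small number s of nonzero entries of value ±1/√s, so applying the map costs O(s) per input coordinate instead of O(m). The parameters below are illustrative toy values, not the paper's constructions:

```python
import numpy as np

def sparse_jl_matrix(m, d, s, rng):
    """m x d matrix: each column has exactly s nonzeros, each ±1/sqrt(s)."""
    A = np.zeros((m, d))
    for j in range(d):
        rows = rng.choice(m, size=s, replace=False)   # s distinct rows
        signs = rng.choice([-1.0, 1.0], size=s)       # random signs
        A[rows, j] = signs / np.sqrt(s)
    return A

rng = np.random.default_rng(1)
d, m, s = 1000, 200, 8
A = sparse_jl_matrix(m, d, s, rng)

# The embedding approximately preserves the Euclidean norm of a vector.
x = rng.standard_normal(d)
distortion = np.linalg.norm(A @ x) / np.linalg.norm(x)
```

Each column has unit ℓ2 norm by construction, so squared norms are preserved in expectation; the concentration analysis is where the actual paper's work lies.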

    Semi-Supervised Hashing for Large-Scale Search


    Compressed Fingerprint Matching and Camera Identification via Random Projections

    Sensor imperfections in the form of photo-response nonuniformity (PRNU) patterns are a well-established fingerprinting technique for linking pictures to the camera sensors that acquired them. The noise-like characteristics of the PRNU pattern make it a difficult object to compress, hindering many interesting applications that would require storing a large number of fingerprints or transmitting them over a band-limited channel for real-time camera matching. In this paper, we propose to use real-valued or binary random projections to effectively compress the fingerprints at a small cost in matching accuracy. The performance of randomly projected fingerprints is analyzed from a theoretical standpoint and experimentally verified on databases of real photographs. Practical issues concerning the complexity of implementing random projections are also addressed by using circulant matrices.
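The circulant trick mentioned above can be sketched in a few lines: a circulant projection matrix is determined by a single random row, and multiplying by it is a circular convolution computable with FFTs, so no n×m matrix is ever stored. The signals, sizes, and noise levels below are synthetic stand-ins, not the paper's data or exact method:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4096, 512   # full fingerprint length, compressed length (toy values)

fingerprint = rng.standard_normal(n)                  # stand-in for a PRNU pattern
noisy = fingerprint + 0.5 * rng.standard_normal(n)    # estimate from the same camera
other = rng.standard_normal(n)                        # fingerprint of another camera

c = rng.standard_normal(n)  # first column defines the whole circulant matrix

def circulant_project(x, c, m):
    """Multiply by a circulant matrix via FFT, then keep the first m coordinates."""
    full = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(x), n=len(x))
    return full[:m]

def ncc(a, b):
    """Normalized cross-correlation between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Correlation-based matching survives the projection: same-camera pairs
# still correlate strongly, different-camera pairs stay near zero.
p_ref = circulant_project(fingerprint, c, m)
match = ncc(p_ref, circulant_project(noisy, c, m))
mismatch = ncc(p_ref, circulant_project(other, c, m))
```

A binary variant in the spirit of the abstract would simply quantize the projected coefficients, e.g. `np.sign(p_ref)`, trading further storage for a little matching accuracy.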

    Siamese coding network and pair similarity prediction for near-duplicate image detection

    Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute-force approach is very computationally intensive, as it evaluates the similarity between the queried item and every item in the dataset. A typical application domain is an image-sharing website that checks for plagiarism or piracy every time a new image is uploaded. Among the various approaches, near-duplicate detection was effectively addressed by SimPair LSH (Fisichella et al., in Decker, Lhotská, Link, Spies, Wagner (eds) Database and expert systems applications, Springer, 2014). As the name suggests, SimPair LSH uses locality sensitive hashing (LSH): it computes and stores in advance a small set of near-duplicate pairs present in the dataset and uses them, via the triangle inequality, to reduce the candidate set returned for a given query. We develop an algorithm that predicts how much the candidate set will be reduced. We also develop a new, efficient method for near-duplicate image detection using a deep Siamese coding neural network that extracts effective features from images for building LSH indices. Extensive experiments on two benchmark datasets confirm the effectiveness of our deep Siamese coding network and prediction algorithm.
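The triangle-inequality pruning idea behind SimPair LSH can be sketched roughly as follows (a simplified illustration under assumed data structures, not the paper's actual algorithm):

```python
import numpy as np

def prune_candidates(query, candidates, vectors, near_pairs, threshold):
    """Illustrative triangle-inequality pruning in the spirit of SimPair LSH.

    near_pairs[a] lists precomputed pairs (b, d(a, b)). Once d(q, a) is
    computed, the triangle inequality gives d(q, b) >= d(q, a) - d(a, b);
    if that lower bound already exceeds the threshold, b is pruned
    without ever computing its distance to the query.
    """
    results, pruned = [], set()
    for cid in candidates:
        if cid in pruned:
            continue
        d = float(np.linalg.norm(query - vectors[cid]))
        if d <= threshold:
            results.append(cid)
        for other, d_pair in near_pairs.get(cid, []):
            if d - d_pair > threshold:   # lower bound on d(q, other)
                pruned.add(other)
    return results, pruned

# Tiny demo: item 1 is a stored near-duplicate of item 0, so once the
# query is seen to be far from item 0 it must also be far from item 1.
vectors = {0: np.array([0.0, 0.0]),
           1: np.array([0.1, 0.0]),
           2: np.array([10.0, 0.0])}
near_pairs = {0: [(1, 0.1)]}
results, pruned = prune_candidates(np.array([10.0, 0.0]),
                                   [0, 1, 2], vectors, near_pairs, 0.5)
```

In the demo, item 1 is discarded by the bound alone, and only item 2 is both examined and accepted.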