Adaptive Hash Retrieval with Kernel Based Similarity
Indexing methods have been widely used for fast data retrieval on large scale datasets. When the data are represented by high dimensional vectors, hashing is often used as an efficient solution for approximate similarity search. When a retrieval task does not involve supervised training data, most hashing methods aim at preserving data similarity defined by a distance metric on the feature vectors. Hash codes generated by these approaches normally maintain the Hamming distance of the data in accordance with the similarity function, but ignore the local details of the distribution of data. This objective is not suitable for k-nearest neighbor search, since the similarity to the nearest neighbors can vary significantly for different data samples. In this paper, we present a novel adaptive similarity measure which is consistent with k-nearest neighbor search, and prove that it leads to a valid kernel if the original similarity function is a kernel function. Next we propose a method which calculates hash codes using the kernel function. With a low-rank approximation, our hashing framework is more effective than existing methods that preserve similarity over an arbitrary kernel. The proposed similarity function, hashing framework, and their combination demonstrate significant improvement when compared with several alternative state-of-the-art methods.
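The general recipe this abstract builds on, kernelized hashing with a low-rank (anchor-based, Nystrom-style) approximation, can be sketched as follows. This is a minimal generic illustration, not the paper's algorithm: the RBF kernel, the anchor count, and the SVD-based projection are all assumptions chosen for brevity.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_hash_codes(X, anchors, n_bits, gamma=0.5):
    # Map each point into a kernel feature space w.r.t. a small anchor set
    # (a low-rank approximation of the full kernel matrix), then binarize
    # the top principal directions to obtain compact hash codes.
    K = rbf_kernel(X, anchors, gamma)          # (n, m) kernel features
    K = K - K.mean(axis=0, keepdims=True)      # center the features
    _, _, Vt = np.linalg.svd(K, full_matrices=False)
    proj = K @ Vt[:n_bits].T                   # project onto n_bits directions
    return (proj > 0).astype(np.uint8)         # sign binarization

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
anchors = X[rng.choice(100, 16, replace=False)]  # hypothetical anchor set
codes = kernel_hash_codes(X, anchors, n_bits=12)
print(codes.shape)  # (100, 12)
```

Using a small anchor set keeps the kernel computation at O(nm) rather than O(n^2), which is the practical motivation for low-rank approximations in kernel-based hashing.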
Deep Hashing Network for Unsupervised Domain Adaptation
In recent years, deep neural networks have emerged as a dominant machine
learning tool for a wide variety of application domains. However, training a
deep neural network requires a large amount of labeled data, which is an
expensive process in terms of time, labor and human expertise. Domain
adaptation or transfer learning algorithms address this challenge by leveraging
labeled data in a different, but related source domain, to develop a model for
the target domain. Further, the explosive growth of digital data has posed a
fundamental challenge concerning its storage and retrieval. Due to its storage
and retrieval efficiency, recent years have witnessed a wide application of
hashing in a variety of computer vision applications. In this paper, we first
introduce a new dataset, Office-Home, to evaluate domain adaptation algorithms.
The dataset contains images of a variety of everyday objects from multiple
domains. We then propose a novel deep learning framework that can exploit
labeled source data and unlabeled target data to learn informative hash codes,
to accurately classify unseen target data. To the best of our knowledge, this
is the first research effort to exploit the feature learning capabilities of
deep neural networks to learn representative hash codes to address the domain
adaptation problem. Our extensive empirical studies on multiple transfer tasks
corroborate the usefulness of the framework in learning efficient hash codes
which outperform existing competitive baselines for unsupervised domain
adaptation.
Comment: CVPR 201
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding the data
items in a large database whose distances to a query item are the smallest.
Various methods have been developed to address this problem, and recently much
effort has been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality sensitive hashing. We divide the hashing
algorithms into two main categories: locality sensitive hashing, which designs
hash functions without exploring the data distribution, and learning to hash,
which learns hash functions according to the data distribution. We review them
from various aspects, including hash function design, distance measures, and
search schemes in the hash coding space.
Towards Optimal Discrete Online Hashing with Balanced Similarity
When facing large-scale image datasets, online hashing serves as a promising
solution for online retrieval and prediction tasks. It encodes the online
streaming data into compact binary codes, and simultaneously updates the hash
functions to renew codes of the existing dataset. However, existing methods
update hash functions solely based on the new data batch, without
investigating the correlation between such new data and the existing dataset.
In addition, existing works update the hash functions using a relaxation
process in a corresponding approximated continuous space, and it remains
an open problem to directly apply discrete optimization in online hashing. In
this paper, we propose a novel supervised online hashing method, termed
Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above
problems in a unified framework. BSODH employs a well-designed hashing
algorithm to preserve the similarity between the streaming data and the
existing dataset via an asymmetric graph regularization. We further identify
the "data-imbalance" problem brought by the constructed asymmetric graph, which
restricts the application of discrete optimization in our problem. Therefore, a
novel balanced similarity is further proposed, which uses two equilibrium
factors to balance the similar and dissimilar weights and thereby enables
the use of discrete optimization. Extensive experiments conducted on three
widely-used benchmarks demonstrate the advantages of the proposed method over
the state-of-the-art methods.
Comment: 8 pages, 11 figures, conference
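The relaxation-based update scheme that this abstract contrasts with discrete optimization can be sketched roughly as follows. This is a toy illustration of the general idea, continuously relaxing binary codes with tanh and nudging the projection toward a batch similarity target; the objective, gradient approximation, and learning rate are all simplifying assumptions, not the BSODH method.

```python
import numpy as np

def online_hash_update(W, X_batch, S, lr=0.01):
    # One relaxation-based online step: treat B = tanh(X W) as a soft code
    # and move W so that the code inner products B B^T / bits track the
    # batch similarity S (+1 similar, -1 dissimilar). The tanh derivative
    # is collapsed to a single mean factor, a crude approximation.
    B = np.tanh(X_batch @ W)                      # (n, bits) soft codes
    bits = W.shape[1]
    R = B @ B.T / bits - S                        # residual vs. target
    G = X_batch.T @ (R @ B) * (1 - B ** 2).mean() # approximate gradient
    return W - lr * G

rng = np.random.default_rng(2)
W = rng.normal(size=(8, 16))                      # hash projection
for _ in range(5):                                # five streaming batches
    X = rng.normal(size=(10, 8))
    S = np.sign(X @ X.T)                          # toy similarity labels
    W = online_hash_update(W, X, S)
codes = (rng.normal(size=(4, 8)) @ W > 0).astype(np.uint8)
print(codes.shape)  # (4, 16)
```

Note how each update only ever sees the current batch, which is exactly the limitation the abstract raises: nothing in this sketch models the correlation between new data and the already-indexed dataset.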