Dual Asymmetric Deep Hashing Learning
Owing to its impressive learning power, deep learning has achieved remarkable
performance in supervised hash function learning. In this paper, we propose a
novel asymmetric supervised deep hashing method to preserve the semantic
structure among different categories and generate the binary codes
simultaneously. Specifically, two asymmetric deep networks are constructed to
reveal the similarity between each pair of images according to their semantic
labels. The deep hash functions are then learned through two networks by
minimizing the gap between the learned features and discrete codes.
Furthermore, since the binary codes in the Hamming space also should keep the
semantic affinity existing in the original space, another asymmetric pairwise
loss is introduced to capture the similarity between the binary codes and
real-value features. This asymmetric loss not only improves the retrieval
performance, but also contributes to a quick convergence at the training phase.
By taking advantage of the two-stream deep structures and two types of
asymmetric pairwise functions, an alternating algorithm is designed to optimize
the deep features and high-quality binary codes efficiently. Experimental
results on three real-world datasets substantiate the effectiveness and
superiority of our approach compared with the state of the art. Comment: 12 pages, 6 figures, 7 tables, 37 conference
Collaborative Learning for Extremely Low Bit Asymmetric Hashing
Hashing techniques are in great demand for a wide range of real-world
applications such as image retrieval and network compression. Nevertheless,
existing approaches can hardly guarantee satisfactory performance with
extremely low-bit (e.g., 4-bit) hash codes because of severe information loss
and the shrinkage of the discrete solution space. In this paper, we propose a
novel Collaborative Learning strategy that is tailored for generating
high-quality low-bit hash codes. The core idea is to jointly distill
bit-specific and informative representations for a group of pre-defined code
lengths. The learning of short hash codes among the group can benefit from the
manifold shared with other long codes, where multiple views from different hash
codes provide the supplementary guidance and regularization, making the
convergence faster and more stable. To achieve that, an asymmetric hashing
framework with two variants of multi-head embedding structures is derived,
termed Multi-head Asymmetric Hashing (MAH), leading to great efficiency of
training and querying. Extensive experiments on three benchmark datasets have
been conducted to verify the superiority of the proposed MAH, and have shown
that the 8-bit hash codes generated by MAH achieve of the Mean
Average Precision (MAP) score on the CIFAR-10 dataset, which significantly
surpasses the performance of the 48-bit codes of state-of-the-art methods in image
retrieval tasks.
Deep Supervised Hashing leveraging Quadratic Spherical Mutual Information for Content-based Image Retrieval
Several deep supervised hashing techniques have been proposed to allow for
efficiently querying large image databases. However, deep supervised image
hashing techniques are developed, to a great extent, heuristically, often
leading to suboptimal results. Contrary to this, we propose an efficient deep
supervised hashing algorithm that optimizes the learned codes using an
information-theoretic measure, the Quadratic Mutual Information (QMI). The
proposed method is adapted to the needs of large-scale hashing and information
retrieval leading to a novel information-theoretic measure, the Quadratic
Spherical Mutual Information (QSMI). Apart from demonstrating the effectiveness
of the proposed method under different scenarios and outperforming existing
state-of-the-art image hashing techniques, this paper provides a structured way
to model the process of information retrieval and develop novel methods adapted
to the needs of each application.
Learning to Hash for Indexing Big Data - A Survey
The explosive growth in big data has attracted much attention in designing
efficient indexing and search methods recently. In many critical applications
such as large-scale search and pattern matching, finding the nearest neighbors
to a query is a fundamental research problem. However, the straightforward
solution using exhaustive comparison is infeasible due to the prohibitive
computational complexity and memory requirement. In response, Approximate
Nearest Neighbor (ANN) search based on hashing techniques has become popular
due to its promising performance in both efficiency and accuracy. Prior
randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore
data-independent hash functions with random projections or permutations.
Despite elegant theoretical guarantees on the search quality in certain
metric spaces, the performance of randomized hashing has been shown to be
insufficient in many real-world applications. As a remedy, new approaches incorporating
data-driven learning methods in the development of advanced hash functions have
emerged. Such learning to hash methods exploit information such as data
distributions or class labels when optimizing the hash codes or functions.
Importantly, the learned hash codes are able to preserve the proximity of
neighboring data in the original feature spaces in the hash code spaces. The
goal of this paper is to provide readers with a systematic understanding of
the insights, pros, and cons of the emerging techniques. We provide a comprehensive
survey of the learning to hash framework and representative techniques of
various types, including unsupervised, semi-supervised, and supervised. In
addition, we also summarize recent hashing approaches utilizing the deep
learning models. Finally, we discuss the future direction and trends of
research in this area.
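
The data-independent baseline mentioned in this abstract, LSH with random projections, can be illustrated with a minimal sketch. It uses the sign of random hyperplane projections, which is one common LSH variant and not necessarily the one the survey analyzes; the function names `make_hyperplanes`, `hash_code`, and `hamming` are hypothetical and chosen for illustration only.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane normal per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vector, hyperplanes):
    """Return a tuple of bits: 1 where the vector lies on the non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0.0 else 0
        for plane in hyperplanes
    )


def hamming(code_a, code_b):
    """Count the positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))
```

Because the hyperplanes are drawn without looking at the data, the codes depend only on the direction of each vector. Learning-to-hash methods replace this random draw with hash functions fitted to the data distribution or to class labels.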
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
Because they ignore visual content as a ranking cue, methods that apply text
search techniques to visual retrieval may suffer from inconsistency between the
text words and the visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention in the recent two decades. Such a problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research. Comment: 22 pages
Deep Cross-Modal Hashing
Due to its low storage cost and fast query speed, cross-modal hashing (CMH)
has been widely used for similarity search in multimedia retrieval
applications. However, almost all existing CMH methods are based on
hand-crafted features which might not be optimally compatible with the
hash-code learning procedure. As a result, existing CMH methods with
hand-crafted features may not achieve satisfactory performance. In this paper,
we propose a novel cross-modal hashing method, called deep cross-modal hashing
(DCMH), by integrating feature learning and hash-code learning into the same
framework. DCMH is an end-to-end learning framework with deep neural networks,
one for each modality, to perform feature learning from scratch. Experiments on
two real datasets with text-image modalities show that DCMH can outperform
other baselines to achieve the state-of-the-art performance in cross-modal
retrieval applications. Comment: 12 pages
Clustering is Efficient for Approximate Maximum Inner Product Search
Efficient Maximum Inner Product Search (MIPS) is an important task that has a
wide applicability in recommendation systems and classification with a large
number of classes. Solutions based on locality-sensitive hashing (LSH) as well
as tree-based solutions have been investigated in the recent literature, to
perform approximate MIPS in sublinear time. In this paper, we compare these to
another extremely simple approach for solving approximate MIPS, based on
variants of the k-means clustering algorithm. Specifically, we propose to train
a spherical k-means, after having reduced the MIPS problem to a Maximum Cosine
Similarity Search (MCSS). Experiments on two standard recommendation system
benchmarks as well as on large vocabulary word embeddings, show that this
simple approach yields much higher speedups, for the same retrieval precision,
than current state-of-the-art hashing-based and tree-based methods. This simple
method also yields more robust retrievals when the query is corrupted by noise. Comment: 10 pages, Under review at ICLR 201
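
The reduction from MIPS to MCSS mentioned in this abstract can be sketched as follows. The abstract does not spell out the construction, so this sketch assumes one commonly used reduction: each database vector gets an extra coordinate that brings its norm up to the largest norm in the database, and the query gets an extra zero. Inner products with the query are unchanged, and since all augmented database vectors share the same norm, ranking by cosine similarity matches ranking by inner product. All function names are hypothetical, and the spherical k-means step is not shown.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def augment_database(vectors):
    """Append sqrt(M^2 - ||x||^2) to each vector, where M is the largest norm."""
    norms_sq = [dot(v, v) for v in vectors]
    max_sq = max(norms_sq)
    return [
        list(v) + [math.sqrt(max(max_sq - n, 0.0))]
        for v, n in zip(vectors, norms_sq)
    ]


def augment_query(query):
    """Append a zero so the extra database coordinate never affects the product."""
    return list(query) + [0.0]


def argmax_inner_product(query, vectors):
    return max(range(len(vectors)), key=lambda i: dot(query, vectors[i]))


def argmax_cosine(query, vectors):
    return max(range(len(vectors)), key=lambda i: cosine(query, vectors[i]))
```

On the raw vectors, the best cosine match and the best inner-product match can differ because long vectors are favored by the inner product. After augmentation, the two searches return the same item.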
Extreme Classification in Log Memory
We present Merged-Averaged Classifiers via Hashing (MACH) for
K-classification with ultra-large values of K. Compared to traditional
one-vs-all classifiers that require O(Kd) memory and inference cost, MACH needs
only O(d log K) memory (where d is the dimensionality) and requires only O(K log K + d
log K) operations for inference. MACH is a generic K-classification algorithm
with provable theoretical guarantees, which requires O(log K) memory without
any assumption on the relationship between classes. MACH uses universal hashing
to reduce classification with a large number of classes to few independent
classification tasks with small (constant) number of classes. We provide
a theoretical quantification of the discriminability-memory tradeoff. With MACH, we
can train on the ODP dataset with 100,000 classes and 400,000 features on a single
Titan X GPU, with a classification accuracy of 19.28%, which is the
best-reported accuracy on this dataset. Before this work, the best-performing
baseline was a one-vs-all classifier that requires 40 billion parameters (160 GB
model size) and achieves 9% accuracy. In contrast, MACH can achieve 9% accuracy
with a 480x reduction in the model size (a mere 0.3 GB). With MACH, we also
demonstrate complete training of the fine-grained ImageNet dataset (compressed size
104 GB), with 21,000 classes, on a single GPU. To the best of our knowledge,
this is the first work to demonstrate complete training of these extreme-class
datasets on a single Titan X.
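
The idea of hashing K classes into a few small classification tasks can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it assumes R independent hash functions of the form ((a*k + b) mod p) mod B, each mapping class ids to B buckets, and it assumes the score of a class is the average of the probabilities its buckets receive across the R small classifiers. The names `make_hashes`, `class_scores`, and `decode` are hypothetical, and the bucket probabilities, which would come from trained classifiers, are supplied by hand.

```python
import random

PRIME = 2_147_483_647  # the Mersenne prime 2^31 - 1, larger than any class id used here


def make_hashes(num_repetitions, num_buckets, seed=0):
    """Build independent hash functions from class ids to bucket ids."""
    rng = random.Random(seed)
    params = [
        (rng.randrange(1, PRIME), rng.randrange(0, PRIME))
        for _ in range(num_repetitions)
    ]
    return [
        (lambda k, a=a, b=b: ((a * k + b) % PRIME) % num_buckets)
        for a, b in params
    ]


def class_scores(bucket_probs, hashes, num_classes):
    """Average, for each class, the probability of its bucket in every repetition.

    bucket_probs[r][j] is the probability the r-th small classifier gives bucket j.
    """
    return [
        sum(probs[h(k)] for probs, h in zip(bucket_probs, hashes)) / len(hashes)
        for k in range(num_classes)
    ]


def decode(bucket_probs, hashes, num_classes):
    """Return the class id with the highest averaged bucket probability."""
    scores = class_scores(bucket_probs, hashes, num_classes)
    return max(range(num_classes), key=scores.__getitem__)
```

In this sketch, each of the R classifiers only distinguishes B buckets, so the stored model grows with R times B instead of with K. Decoding loops over all K classes and R repetitions, which is where a K-dependent inference term comes from.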
Snap and Find: Deep Discrete Cross-domain Garment Image Retrieval
With the increasing number of online stores, there is a pressing need for
intelligent search systems to understand the item photos snapped by customers
and search against large-scale product databases to find their desired items.
However, it is challenging for conventional retrieval systems to match up the
item photos captured by customers and the ones officially released by stores,
especially for garment images. To bridge the customer-provided and store-provided
garment photos, existing studies have widely exploited clothing
attributes (e.g., black) and landmarks (e.g., collar) to
learn a common embedding space for garment representations. Unfortunately, they
omit the sequential correlation of attributes and require a large amount of
human labor to label the landmarks. In this paper, we propose a deep
multi-task cross-domain hashing method termed DMCH, in which cross-domain
embedding and sequential attribute learning are modeled simultaneously.
Sequential attribute learning not only provides the semantic guidance for
embedding, but also generates rich attention on discriminative local details
(e.g., black buttons) of clothing items without requiring extra
landmark labels. This leads to promising performance and 306 boost on
efficiency when compared with the state-of-the-art models, which is
demonstrated through rigorous experiments on two public fashion datasets.
CNN-VWII: An Efficient Approach for Large-Scale Video Retrieval by Image Queries
This paper aims to solve the problem of large-scale video retrieval by a
query image. Firstly, we define the problem of top-k image to video query.
Then, we combine the merits of convolutional neural networks (CNN for short) and
the Bag of Visual Words (BoVW for short) module to design a model for video frame
information extraction and representation. In order to meet the requirements of
large-scale video retrieval, we propose a visual weighted inverted index (VWII
for short) and a related algorithm to improve the efficiency and accuracy of
the retrieval process. Comprehensive experiments show that our proposed technique
achieves substantial improvements (up to an order of magnitude speed up) over
the state-of-the-art techniques with similar accuracy. Comment: submitted to Pattern Recognition Letters
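
The abstract does not describe how the visual weighted inverted index is built, so the following is only a generic sketch of a weighted inverted index over visual words, not the VWII structure itself. It assumes each video is summarized as a mapping from visual word ids to weights and that a query image is scored against a video by summing the products of matching weights. The names `build_index` and `top_k` are hypothetical.

```python
import heapq
from collections import defaultdict


def build_index(video_words):
    """Map each visual word id to the (video_id, weight) pairs that contain it.

    video_words: {video_id: {visual_word_id: weight}}
    """
    index = defaultdict(list)
    for video_id, words in video_words.items():
        for word, weight in words.items():
            index[word].append((video_id, weight))
    return index


def top_k(index, query_words, k):
    """Score only videos that share a visual word with the query, then keep the best k."""
    scores = defaultdict(float)
    for word, query_weight in query_words.items():
        for video_id, weight in index.get(word, ()):
            scores[video_id] += query_weight * weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Only the posting lists of the query's visual words are visited, so videos that share no visual word with the query are never scored.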