333 research outputs found
Shared Predictive Cross-Modal Deep Quantization
With explosive growth of data volume and ever-increasing diversity of data
modalities, cross-modal similarity search, which conducts nearest neighbor
search across different modalities, has been attracting increasing interest.
This paper presents a deep compact code learning solution for efficient
cross-modal similarity search. Many recent studies have proven that
quantization-based approaches perform generally better than hashing-based
approaches on single-modal similarity search. In this paper, we propose a deep
quantization approach, which is among the early attempts of leveraging deep
neural networks into quantization-based cross-modal similarity search. Our
approach, dubbed shared predictive deep quantization (SPDQ), explicitly
formulates a shared subspace across different modalities and two private
subspaces for individual modalities, and representations in the shared subspace
and the private subspaces are learned simultaneously by embedding them to a
reproducing kernel Hilbert space, where the mean embedding of different
modality distributions can be explicitly compared. In addition, in the shared
subspace, a quantizer is learned to produce the semantics preserving compact
codes with the help of label alignment. Thanks to this novel network
architecture in cooperation with supervised quantization training, SPDQ can
preserve intramodal and intermodal similarities as much as possible and greatly
reduce quantization error. Experiments on two popular benchmarks corroborate
that our approach outperforms state-of-the-art methods
Correlation Hashing Network for Efficient Cross-Modal Retrieval
Hashing is widely applied to approximate nearest neighbor search for
large-scale multimodal retrieval with storage and computation efficiency.
Cross-modal hashing improves the quality of hash coding by exploiting semantic
correlations across different modalities. Existing cross-modal hashing methods
first transform data into low-dimensional feature vectors, and then generate
binary codes by another separate quantization step. However, suboptimal hash
codes may be generated since the quantization error is not explicitly minimized
and the feature representation is not jointly optimized with the binary codes.
This paper presents a Correlation Hashing Network (CHN) approach to cross-modal
hashing, which jointly learns good data representation tailored to hash coding
and formally controls the quantization error. The proposed CHN is a hybrid deep
architecture that constitutes a convolutional neural network for learning good
image representations, a multilayer perception for learning good text
representations, two hashing layers for generating compact binary codes, and a
structured max-margin loss that integrates all things together to enable
learning similarity-preserving and high-quality hash codes. Extensive empirical
study shows that CHN yields state of the art cross-modal retrieval performance
on standard benchmarks.Comment: 7 page
Simultaneous Feature Aggregating and Hashing for Large-scale Image Search
In most state-of-the-art hashing-based visual search systems, local image
descriptors of an image are first aggregated as a single feature vector. This
feature vector is then subjected to a hashing function that produces a binary
hash code. In previous work, the aggregating and the hashing processes are
designed independently. In this paper, we propose a novel framework where
feature aggregating and hashing are designed simultaneously and optimized
jointly. Specifically, our joint optimization produces aggregated
representations that can be better reconstructed by some binary codes. This
leads to more discriminative binary hash codes and improved retrieval accuracy.
In addition, we also propose a fast version of the recently-proposed Binary
Autoencoder to be used in our proposed framework. We perform extensive
retrieval experiments on several benchmark datasets with both SIFT and
convolutional features. Our results suggest that the proposed framework
achieves significant improvements over the state of the art.Comment: Accepted to CVPR 201
Unsupervised Multi-modal Hashing for Cross-modal retrieval
With the advantage of low storage cost and high efficiency, hashing learning
has received much attention in the domain of Big Data. In this paper, we
propose a novel unsupervised hashing learning method to cope with this open
problem to directly preserve the manifold structure by hashing. To address this
problem, both the semantic correlation in textual space and the locally
geometric structure in the visual space are explored simultaneously in our
framework. Besides, the `2;1-norm constraint is imposed on the projection
matrices to learn the discriminative hash function for each modality. Extensive
experiments are performed to evaluate the proposed method on the three publicly
available datasets and the experimental results show that our method can
achieve superior performance over the state-of-the-art methods.Comment: 4 pages, 4 figure
SADIH: Semantic-Aware DIscrete Hashing
Due to its low storage cost and fast query speed, hashing has been recognized
to accomplish similarity search in large-scale multimedia retrieval
applications. Particularly supervised hashing has recently received
considerable research attention by leveraging the label information to preserve
the pairwise similarities of data points in the Hamming space. However, there
still remain two crucial bottlenecks: 1) the learning process of the full
pairwise similarity preservation is computationally unaffordable and unscalable
to deal with big data; 2) the available category information of data are not
well-explored to learn discriminative hash functions. To overcome these
challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH)
framework, which aims to directly embed the transformed semantic information
into the asymmetric similarity approximation and discriminative hashing
function learning. Specifically, a semantic-aware latent embedding is
introduced to asymmetrically preserve the full pairwise similarities while
skillfully handle the cumbersome n times n pairwise similarity matrix.
Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the
data structures in the discriminative latent semantic space and perform data
reconstruction. Moreover, an efficient alternating optimization algorithm is
proposed to solve the resulting discrete optimization problem. Extensive
experimental results on multiple large-scale datasets demonstrate that our
SADIH can clearly outperform the state-of-the-art baselines with the additional
benefit of lower computational costs.Comment: Accepted by The Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19
Collaborative Learning for Extremely Low Bit Asymmetric Hashing
Hashing techniques are in great demand for a wide range of real-world
applications such as image retrieval and network compression. Nevertheless,
existing approaches could hardly guarantee a satisfactory performance with the
extremely low-bit (e.g., 4-bit) hash codes due to the severe information loss
and the shrink of the discrete solution space. In this paper, we propose a
novel \textit{Collaborative Learning} strategy that is tailored for generating
high-quality low-bit hash codes. The core idea is to jointly distill
bit-specific and informative representations for a group of pre-defined code
lengths. The learning of short hash codes among the group can benefit from the
manifold shared with other long codes, where multiple views from different hash
codes provide the supplementary guidance and regularization, making the
convergence faster and more stable. To achieve that, an asymmetric hashing
framework with two variants of multi-head embedding structures is derived,
termed as Multi-head Asymmetric Hashing (MAH), leading to great efficiency of
training and querying. Extensive experiments on three benchmark datasets have
been conducted to verify the superiority of the proposed MAH, and have shown
that the 8-bit hash codes generated by MAH achieve of the MAP (Mean
Average Precision (MAP)) score on the CIFAR-10 dataset, which significantly
surpasses the performance of the 48-bit codes by the state-of-the-arts in image
retrieval tasks
Deep Ordinal Hashing with Spatial Attention
Hashing has attracted increasing research attentions in recent years due to
its high efficiency of computation and storage in image retrieval. Recent works
have demonstrated the superiority of simultaneous feature representations and
hash functions learning with deep neural networks. However, most existing deep
hashing methods directly learn the hash functions by encoding the global
semantic information, while ignoring the local spatial information of images.
The loss of local spatial structure makes the performance bottleneck of hash
functions, therefore limiting its application for accurate similarity
retrieval. In this work, we propose a novel Deep Ordinal Hashing (DOH) method,
which learns ordinal representations by leveraging the ranking structure of
feature space from both local and global views. In particular, to effectively
build the ranking structure, we propose to learn the rank correlation space by
exploiting the local spatial information from Fully Convolutional Network (FCN)
and the global semantic information from the Convolutional Neural Network (CNN)
simultaneously. More specifically, an effective spatial attention model is
designed to capture the local spatial information by selectively learning
well-specified locations closely related to target objects. In such hashing
framework,the local spatial and global semantic nature of images are captured
in an end-to-end ranking-to-hashing manner. Experimental results conducted on
three widely-used datasets demonstrate that the proposed DOH method
significantly outperforms the state-of-the-art hashing methods
Snap and Find: Deep Discrete Cross-domain Garment Image Retrieval
With the increasing number of online stores, there is a pressing need for
intelligent search systems to understand the item photos snapped by customers
and search against large-scale product databases to find their desired items.
However, it is challenging for conventional retrieval systems to match up the
item photos captured by customers and the ones officially released by stores,
especially for garment images. To bridge the customer- and store- provided
garment photos, existing studies have been widely exploiting the clothing
attributes (\textit{e.g.,} black) and landmarks (\textit{e.g.,} collar) to
learn a common embedding space for garment representations. Unfortunately they
omit the sequential correlation of attributes and consume large quantity of
human labors to label the landmarks. In this paper, we propose a deep
multi-task cross-domain hashing termed \textit{DMCH}, in which cross-domain
embedding and sequential attribute learning are modeled simultaneously.
Sequential attribute learning not only provides the semantic guidance for
embedding, but also generates rich attention on discriminative local details
(\textit{e.g.,} black buttons) of clothing items without requiring extra
landmark labels. This leads to promising performance and 306 boost on
efficiency when compared with the state-of-the-art models, which is
demonstrated through rigorous experiments on two public fashion datasets
A Survey on Learning to Hash
Nearest neighbor search is a problem of finding the data points from the
database such that the distances from them to the query point are the smallest.
Learning to hash is one of the major solutions to this problem and has been
widely studied recently. In this paper, we present a comprehensive survey of
the learning to hash algorithms, categorize them according to the manners of
preserving the similarities into: pairwise similarity preserving, multiwise
similarity preserving, implicit similarity preserving, as well as quantization,
and discuss their relations. We separate quantization from pairwise similarity
preserving as the objective function is very different though quantization, as
we show, can be derived from preserving the pairwise similarities. In addition,
we present the evaluation protocols, and the general performance analysis, and
point out that the quantization algorithms perform superiorly in terms of
search accuracy, search time cost, and space cost. Finally, we introduce a few
emerging topics.Comment: To appear in IEEE Transactions On Pattern Analysis and Machine
Intelligence (TPAMI
A Comprehensive Survey on Cross-modal Retrieval
In recent years, cross-modal retrieval has drawn much attention due to the
rapid growth of multimodal data. It takes one type of data as the query to
retrieve relevant data of another type. For example, a user can use a text to
retrieve relevant pictures or videos. Since the query and its retrieved results
can be of different modalities, how to measure the content similarity between
different modalities of data remains a challenge. Various methods have been
proposed to deal with such a problem. In this paper, we first review a number
of representative methods for cross-modal retrieval and classify them into two
main groups: 1) real-valued representation learning, and 2) binary
representation learning. Real-valued representation learning methods aim to
learn real-valued common representations for different modalities of data. To
speed up the cross-modal retrieval, a number of binary representation learning
methods are proposed to map different modalities of data into a common Hamming
space. Then, we introduce several multimodal datasets in the community, and
show the experimental results on two commonly used multimodal datasets. The
comparison reveals the characteristic of different kinds of cross-modal
retrieval methods, which is expected to benefit both practical applications and
future research. Finally, we discuss open problems and future research
directions.Comment: 20 pages, 11 figures, 9 table
- …