HashGAN: Attention-aware Deep Adversarial Hashing for Cross Modal Retrieval
With the rapid growth of multi-modal data, hashing methods for cross-modal
retrieval have received considerable attention. Deep-networks-based cross-modal
hashing methods are appealing as they can integrate feature learning and hash
coding into end-to-end trainable frameworks. However, it is still challenging
to find content similarities between different modalities of data due to the
heterogeneity gap. To further address this problem, we propose an adversarial
hashing network with attention mechanism to enhance the measurement of content
similarities by selectively focusing on informative parts of multi-modal data.
The proposed new adversarial network, HashGAN, consists of three building
blocks: 1) the feature learning module to obtain feature representations, 2)
the generative attention module to generate an attention mask, which is used to
obtain the attended (foreground) and the unattended (background) feature
representations, 3) the discriminative hash coding module to learn hash
functions that preserve the similarities between different modalities. In our
framework, the generative module and the discriminative module are trained in
an adversarial way: the generator is trained so that the discriminator cannot
preserve the similarities of multi-modal data w.r.t. the background feature
representations, while the discriminator aims to preserve the similarities of
multi-modal data w.r.t. both the foreground and the background feature
representations. Extensive evaluations on several benchmark datasets
demonstrate that the proposed HashGAN brings substantial improvements over
other state-of-the-art cross-modal hashing methods.
Comment: 10 pages, 8 figures, 3 tables
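As a rough illustration of the generative attention module described above, a soft mask can split features into attended (foreground) and unattended (background) parts; the function name and toy arrays below are hypothetical, not from the paper's code:

```python
import numpy as np

def split_by_attention(features, mask):
    """Split features into attended (foreground) and unattended
    (background) parts using a soft attention mask in [0, 1]."""
    fg = mask * features          # attended (foreground) representation
    bg = (1.0 - mask) * features  # unattended (background) representation
    return fg, bg

features = np.array([0.5, -1.0, 2.0])
mask = np.array([0.9, 0.1, 0.5])
fg, bg = split_by_attention(features, mask)
# fg + bg reconstructs the original features
```

The discriminator then receives both parts, while the generator is rewarded when similarities cannot be preserved from the background part alone.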
Attribute-Guided Network for Cross-Modal Zero-Shot Hashing
Zero-Shot Hashing aims at learning a hashing model that is trained only on
instances from seen categories but can generalize well to those of unseen
categories. Typically, it is achieved by utilizing a semantic embedding space
to transfer knowledge from the seen domain to the unseen domain. Existing efforts
mainly focus on single-modal retrieval task, especially Image-Based Image
Retrieval (IBIR). However, as a highlighted research topic in the field of
hashing, cross-modal retrieval is more common in real-world applications. To
address the Cross-Modal Zero-Shot Hashing (CMZSH) retrieval task, we propose a
novel Attribute-Guided Network (AgNet), which can perform not only IBIR, but
also Text-Based Image Retrieval (TBIR). In particular, AgNet aligns different
modal data into a semantically rich attribute space, which bridges the gap
caused by modality heterogeneity and zero-shot setting. We also design an
effective strategy that exploits the attribute to guide the generation of hash
codes for image and text within the same network. Extensive experimental
results on three benchmark datasets (AwA, SUN, and ImageNet) demonstrate the
superiority of AgNet on both cross-modal and single-modal zero-shot image
retrieval tasks.
Comment: 9 pages, 8 figures
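The alignment step can be sketched as projecting each modality into a shared attribute space and binarizing by sign; the projection matrices below are random stand-ins for AgNet's learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned projections from each modality into a shared
# attribute space; random matrices stand in for trained networks.
W_img = rng.normal(size=(4, 3))  # image features (4-d) -> attribute space (3-d)
W_txt = rng.normal(size=(5, 3))  # text features (5-d)  -> attribute space (3-d)

def to_hash(x, W):
    """Map features into the attribute space, then binarize by sign."""
    a = x @ W                    # attribute-space embedding
    return np.where(a >= 0, 1, -1)

img_code = to_hash(rng.normal(size=4), W_img)
txt_code = to_hash(rng.normal(size=5), W_txt)
# Both modalities end up as same-length codes in {-1, +1}
```

Because both modalities are hashed from the same attribute space, image and text codes become directly comparable.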
Fusion-supervised Deep Cross-modal Hashing
Deep hashing has recently received attention in cross-modal retrieval for its
impressive advantages. However, existing hashing methods for cross-modal
retrieval cannot fully capture the heterogeneous multi-modal correlation and
exploit the semantic information. In this paper, we propose a novel
\emph{Fusion-supervised Deep Cross-modal Hashing} (FDCH) approach. Firstly,
FDCH learns unified binary codes through a fusion hash network with paired
samples as input, which effectively enhances the modeling of the correlation of
heterogeneous multi-modal data. Then, these high-quality unified hash codes
further supervise the training of the modality-specific hash networks for
encoding out-of-sample queries. Meanwhile, both pair-wise similarity
information and classification information are embedded in the hash networks
under one stream framework, which simultaneously preserves cross-modal
similarity and keeps semantic consistency. Experimental results on two
benchmark datasets demonstrate the state-of-the-art performance of FDCH.
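The two-stage idea above, unified codes from the fusion network supervising the modality-specific hash networks, might be sketched as follows, with a simple least-squares fit standing in for a trained modality-specific network (all names and data are illustrative):

```python
import numpy as np

def fit_modality_net(X, B):
    """Least-squares stand-in for a modality-specific hash network:
    find W so that sign(X @ W) approximates the unified codes B
    produced by the fusion network."""
    W, *_ = np.linalg.lstsq(X, B, rcond=None)
    return W

rng = np.random.default_rng(1)
X_img = rng.normal(size=(20, 8))       # image features for 20 paired samples
B = np.sign(rng.normal(size=(20, 4)))  # unified fusion codes (stand-in)

W = fit_modality_net(X_img, B)
codes = np.sign(X_img @ W)             # hash function for out-of-sample images
```

The key design point is that out-of-sample queries never need the paired sample: each modality-specific network is trained to reproduce the unified codes on its own.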
Triplet-Based Deep Hashing Network for Cross-Modal Retrieval
Given the benefits of its low storage requirements and high retrieval
efficiency, hashing has recently received increasing attention. In
particular, cross-modal hashing has been widely and successfully used in
multimedia similarity search applications. However, almost all existing
cross-modal hashing methods fail to obtain powerful hash codes because they
ignore the relative similarity between heterogeneous data, which carries
richer semantic information; this leads to unsatisfactory retrieval performance.
In this paper, we propose a triplet-based deep hashing (TDH) network for
cross-modal retrieval. First, we utilize the triplet labels, which describe
the relative relationships among three instances as supervision in order to
capture more general semantic correlations between cross-modal instances. We
then establish a loss function from the inter-modal view and the intra-modal
view to boost the discriminative abilities of the hash codes. Finally, graph
regularization is introduced into our proposed TDH method to preserve the
original semantic similarity between hash codes in Hamming space. Experimental
results show that our proposed method outperforms several state-of-the-art
approaches on two popular cross-modal datasets.
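A minimal sketch of the triplet supervision described above, computed on relaxed (real-valued) hash codes; the margin value is illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=2.0):
    """Hinge-style triplet loss: push the anchor-positive distance
    below the anchor-negative distance by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, margin + d_pos - d_neg)

# anchor (e.g., an image code); positive/negative (e.g., text codes)
a = np.array([1.0, 1.0, -1.0, 1.0])
p = np.array([1.0, 1.0, -1.0, -1.0])   # similar instance: small distance
n = np.array([-1.0, -1.0, 1.0, -1.0])  # dissimilar instance: large distance
loss = triplet_loss(a, p, n)
# here d_pos=4, d_neg=16, so the hinge is inactive and loss = 0
```

In TDH this ranking term is applied from both inter-modal and intra-modal views; the sketch shows only the core hinge.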
Transfer Adversarial Hashing for Hamming Space Retrieval
Hashing is widely applied to large-scale image retrieval due to its storage
and retrieval efficiency. Existing work on deep hashing assumes that the
database in the target domain is identically distributed with the training set
in the source domain. This paper relaxes this assumption to a transfer
retrieval setting, which allows the database and the training set to come from
different but relevant domains. However, the transfer retrieval setting will
introduce two technical difficulties: first, the hash model trained on the
source domain cannot work well on the target domain due to the large
distribution gap; second, the domain gap makes it difficult to concentrate the
database points to be within a small Hamming ball. As a consequence, transfer
retrieval performance within Hamming Radius 2 degrades significantly in
existing hashing methods. This paper presents Transfer Adversarial Hashing
(TAH), a new hybrid deep architecture that incorporates a pairwise
t-distribution cross-entropy loss to learn concentrated hash codes and an
adversarial network to align the data distributions between the source and
target domains. TAH can generate compact transfer hash codes for efficient
image retrieval on both source and target domains. Comprehensive experiments
validate that TAH yields state-of-the-art Hamming space retrieval performance
on standard datasets.
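Hamming-ball retrieval, which the abstract targets at Hamming radius 2, can be illustrated as follows (codes stored as plain ints used as bit strings):

```python
def hamming_ball_search(query, db_codes, radius=2):
    """Return indices of database codes whose Hamming distance to the
    query is at most `radius`; XOR exposes the differing bits."""
    return [i for i, code in enumerate(db_codes)
            if bin(query ^ code).count("1") <= radius]

db = [0b10110100, 0b10110111, 0b01001011]
hits = hamming_ball_search(0b10110110, db)
# distances are 1, 1, 7 -> indices 0 and 1 fall inside radius 2
```

Concentrating database codes inside such a small ball is what the paper's t-distribution loss is designed for; without it, few points land within radius 2 after a domain shift.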
Using Deep Cross Modal Hashing and Error Correcting Codes for Improving the Efficiency of Attribute Guided Facial Image Retrieval
With benefits of fast query speed and low storage cost, hashing-based image
retrieval approaches have garnered considerable attention from the research
community. In this paper, we propose a novel Error-Corrected Deep Cross Modal
Hashing (CMH-ECC) method which uses a bitmap specifying the presence of certain
facial attributes as an input query to retrieve relevant face images from the
database. In this architecture, we generate compact hash codes using an
end-to-end deep learning module, which effectively captures the inherent
relationships between the face and attribute modality. We also integrate our
deep learning module with forward error correction codes to further reduce the
distance between different modalities of the same subject. Specifically, the
properties of deep hashing and forward error correction codes are exploited to
design a cross modal hashing framework with high retrieval performance.
Experimental results using two standard datasets with facial attributes-image
modalities indicate that our CMH-ECC face image retrieval model outperforms
most of the current attribute-based face image retrieval approaches.
Comment: To be published in Proc. IEEE Global SIP 201
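As a toy illustration of how forward error correction can snap noisy hash bits back to valid codewords, here is majority-vote decoding of a simple repetition code; the paper's actual error-correcting code is presumably stronger, so treat this purely as a sketch of the principle:

```python
def ecc_decode(bits, rep=3):
    """Majority-vote decoding of a repetition code: each group of
    `rep` (possibly noisy) bits is snapped back to a single bit."""
    return [1 if sum(bits[i:i + rep]) * 2 > rep else 0
            for i in range(0, len(bits), rep)]

# 2-bit message [1, 0] encoded with a 3x repetition code, one bit flipped
noisy = [1, 0, 1, 0, 0, 0]
decoded = ecc_decode(noisy)
# the single flipped bit is corrected, recovering [1, 0]
```

Decoding both modalities' codes toward the same codeword set is what shrinks the cross-modal distance for the same subject.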
DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs
Due to its high storage and search efficiency, hashing has become prevalent
for large-scale similarity search. Particularly, deep hashing methods have
greatly improved the search performance under supervised scenarios. In
contrast, unsupervised deep hashing models can hardly achieve satisfactory
performance due to the lack of reliable supervisory similarity signals. To
address this issue, we propose a novel deep unsupervised hashing model, dubbed
DistillHash, which can learn a distilled data set consisting of data pairs
that carry confident similarity signals. Specifically, we investigate the
relationship between the initial noisy similarity signals learned from local
structures and the semantic similarity labels assigned by a Bayes optimal
classifier. We show that, under a mild assumption, some data pairs, whose
labels are consistent with those assigned by the Bayes optimal classifier, can
be potentially distilled. Inspired by this fact, we design a simple yet
effective strategy to distill data pairs automatically and further adopt a
Bayesian learning framework to learn hash functions from the distilled data
set. Extensive experimental results on three widely used benchmark datasets
show that the proposed DistillHash consistently achieves state-of-the-art
search performance.
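The distillation step, keeping only pairs with confident similarity signals, can be sketched with simple thresholds. The thresholds and data here are illustrative; the paper's criterion is derived from consistency with a Bayes optimal classifier rather than fixed cutoffs:

```python
def distill_pairs(sim, t_pos=0.8, t_neg=0.2):
    """Keep only pairs whose noisy similarity is confidently high or
    low; ambiguous pairs are dropped from the distilled training set."""
    distilled = []
    for (i, j), s in sim.items():
        if s >= t_pos:
            distilled.append((i, j, 1))   # confidently similar
        elif s <= t_neg:
            distilled.append((i, j, 0))   # confidently dissimilar
    return distilled

# noisy similarities from local structure (illustrative values)
sim = {(0, 1): 0.95, (0, 2): 0.5, (1, 2): 0.1}
pairs = distill_pairs(sim)
# (0, 2) is ambiguous and dropped
```

Hash functions are then learned only from the distilled pairs, sidestepping the unreliable middle of the similarity distribution.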
A Decade Survey of Content Based Image Retrieval using Deep Learning
Content-based image retrieval aims to find images similar to a query image
in a large-scale dataset. Generally, the similarity between the representative
features of the query image and those of the dataset images is used to rank the
images for retrieval. In the early days, various hand-designed feature
descriptors were investigated, based on visual cues such as color, texture,
and shape that represent the images. Over the past decade, however, deep
learning has emerged as a dominant alternative to hand-designed feature
engineering, as it learns features automatically from the data. This paper presents
a comprehensive survey of deep learning based developments in the past decade
for content based image retrieval. The categorization of existing
state-of-the-art methods from different perspectives is also performed for
greater understanding of the progress. The taxonomy used in this survey covers
different supervision, different networks, different descriptor types and
different retrieval types. A performance analysis is also performed using the
state-of-the-art methods, and insights are presented to help researchers
observe the progress and make the best choices. The survey presented in this
paper will support further research progress in image retrieval using deep
learning.
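The feature-similarity ranking that underlies content-based retrieval, whether the features are hand-designed or learned, can be sketched as cosine-similarity ranking over feature vectors (toy data):

```python
import numpy as np

def rank_by_similarity(query, database):
    """Rank database images by cosine similarity of their feature
    vectors to the query's features, most similar first."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity to each database item
    return np.argsort(-sims)      # indices, best match first

db = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
order = rank_by_similarity(np.array([1.0, 0.2]), db)
# item 0 is nearest the query direction, item 1 farthest
```

Deep learning changes how the vectors are produced, not this ranking step itself.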
MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval
Cross-modal retrieval has drawn wide interest for retrieval across different
modalities of data. However, existing DNN-based methods face the challenge
of insufficient cross-modal training data, which limits training
effectiveness and easily leads to overfitting. Transfer learning is meant to
relieve the problem of insufficient training data, but it mainly focuses on
knowledge transfer only from large-scale datasets as single-modal source domain
to single-modal target domain. Such large-scale single-modal datasets also
contain rich modal-independent semantic knowledge that can be shared across
different modalities. Besides, large-scale cross-modal datasets are very
labor-intensive to collect and label, so it is important to fully exploit the
knowledge in single-modal datasets for boosting cross-modal retrieval. This
paper proposes modal-adversarial hybrid transfer network (MHTN), which to the
best of our knowledge is the first work to realize knowledge transfer from
single-modal source domain to cross-modal target domain, and learn cross-modal
common representation. It is an end-to-end architecture with two subnetworks:
(1) Modal-sharing knowledge transfer subnetwork is proposed to jointly transfer
knowledge from a large-scale single-modal dataset in source domain to all
modalities in target domain with a star network structure, which distills
modal-independent supplementary knowledge for promoting cross-modal common
representation learning. (2) Modal-adversarial semantic learning subnetwork is
proposed to construct an adversarial training mechanism between common
representation generator and modality discriminator, making the common
representation discriminative for semantics but indiscriminative for modalities
to enhance cross-modal semantic consistency during transfer process.
Comprehensive experiments on 4 widely-used datasets show its effectiveness and
generality.
Comment: 12 pages, submitted to IEEE Transactions on Cybernetics
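The modal-adversarial mechanism, a discriminator guessing which modality a common representation came from while the generator tries to confuse it, can be sketched as two opposing losses. The names, the weighting `lam`, and the toy probabilities are illustrative, not MHTN's exact formulation:

```python
import numpy as np

def modality_discriminator_loss(pred_modality, true_modality):
    """Cross-entropy the discriminator minimizes when guessing which
    modality a common representation came from."""
    eps = 1e-9
    return -np.log(pred_modality[true_modality] + eps)

def generator_loss(semantic_loss, pred_modality, true_modality, lam=0.1):
    """The representation generator minimizes semantic loss while
    MAXIMIZING the discriminator's loss (adversarial term)."""
    return semantic_loss - lam * modality_discriminator_loss(
        pred_modality, true_modality)

# a confused discriminator (uniform guess) is what the generator wants
confident = np.array([0.9, 0.1])
confused = np.array([0.5, 0.5])
```

At equilibrium the common representation is discriminative for semantics but carries no usable modality signal.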
Joint Cluster Unary Loss for Efficient Cross-Modal Hashing
With the rapid growth of various types of multimodal data, cross-modal deep
hashing has received broad attention for solving cross-modal retrieval problems
efficiently. Most cross-modal hashing methods follow the traditional supervised
hashing framework, in which data pairs and data triplets are generated for
training, but the training procedure is inefficient because its complexity is
high for large-scale datasets. To address these
issues, we propose a novel and efficient cross-modal hashing algorithm in which
the unary loss is introduced. First, we introduce the Cross-Modal Unary
Loss (CMUL), whose lower complexity bridges the traditional triplet loss and
the classification-based unary loss. A more accurate bound of the triplet loss for
structured multilabel data is also proposed in CMUL. Second, we propose the
novel Joint Cluster Cross-Modal Hashing (JCCH) algorithm for efficient hash
learning, in which the CMUL is involved. The resultant hashcodes form several
clusters in which the hashcodes in the same cluster share similar semantic
information, and the heterogeneity gap on different modalities is diminished by
sharing the clusters. The proposed algorithm can be applied to various
types of data, and experiments on large-scale datasets show that the proposed
method is superior to or comparable with state-of-the-art cross-modal hashing
methods, while training with it is more efficient.
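A classification-style unary loss over cluster centers, in the spirit of CMUL's per-sample supervision, might look like the following sketch; the specific form, the centers, and the data are illustrative, not the paper's exact loss:

```python
import numpy as np

def unary_loss(code, centers, label):
    """Classification-style unary loss: pull a hash code toward its
    own cluster center and away from the nearest other center.
    Unlike pairwise/triplet losses, it touches one sample at a time."""
    d = np.sum((centers - code) ** 2, axis=1)  # distance to every center
    others = np.delete(d, label)
    return d[label] - np.min(others)           # negative when nearest center is its own

centers = np.array([[1.0, 1.0, -1.0],          # cluster center for class 0
                    [-1.0, -1.0, 1.0]])        # cluster center for class 1
code = np.array([0.8, 0.9, -1.0])              # relaxed hash code, class 0
loss = unary_loss(code, centers, label=0)
# loss is negative: the code already sits nearest its own cluster center
```

Because each sample is compared only against a fixed set of centers, the cost grows linearly in the number of samples rather than with the number of pairs or triplets.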