Fusion-supervised Deep Cross-modal Hashing
Deep hashing has recently received attention in cross-modal retrieval for its
impressive advantages. However, existing hashing methods for cross-modal
retrieval cannot fully capture the heterogeneous multi-modal correlation and
exploit the semantic information. In this paper, we propose a novel
Fusion-supervised Deep Cross-modal Hashing (FDCH) approach. Firstly,
FDCH learns unified binary codes through a fusion hash network with paired
samples as input, which effectively enhances the modeling of the correlation of
heterogeneous multi-modal data. Then, these high-quality unified hash codes
further supervise the training of the modality-specific hash networks for
encoding out-of-sample queries. Meanwhile, both pair-wise similarity
information and classification information are embedded in the hash networks
under a one-stream framework, which simultaneously preserves cross-modal
similarity and keeps semantic consistency. Experimental results on two
benchmark datasets demonstrate the state-of-the-art performance of FDCH.
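The two-stage idea described above (unified codes from paired samples, which then supervise modality-specific networks) can be illustrated with a minimal NumPy sketch. The abstract does not give FDCH's objective, so everything here is an assumption made for illustration: the linear fusion matrix `W`, the function names, the negative log-likelihood form of the pair-wise loss (a formulation common in deep hashing), and the squared-error supervision term.

```python
import numpy as np

def unified_codes(img_feat, txt_feat, W):
    """Fuse paired image/text features and binarize to {-1, +1} codes.

    W is a hypothetical linear fusion layer standing in for a fusion hash network.
    """
    fused = np.concatenate([img_feat, txt_feat], axis=1)  # (n, d_img + d_txt)
    return np.where(fused @ W >= 0, 1.0, -1.0)            # (n, n_bits)

def pairwise_nll(H, S):
    """Pair-wise similarity loss: negative log-likelihood of S (entries in {0, 1})
    under a logistic model of the scaled inner products of the rows of H."""
    theta = 0.5 * H @ H.T
    # log(1 + exp(theta)) - S * theta, computed stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, theta) - S * theta))

def supervision_loss(F, B):
    """Squared error pulling modality-specific outputs F toward unified codes B."""
    return float(np.mean((F - B) ** 2))
```

In this sketch, `unified_codes` plays the role of the fusion stage, and `supervision_loss` would be minimized separately for the image and text networks so that each can encode out-of-sample queries on its own.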
Task-adaptive Asymmetric Deep Cross-modal Hashing
Supervised cross-modal hashing aims to embed the semantic correlations of
heterogeneous modality data into the binary hash codes with discriminative
semantic labels. Because of its advantages in retrieval and storage efficiency,
it is widely used for efficient cross-modal retrieval. However,
existing studies treat the different cross-modal retrieval tasks identically
and simply learn the same pair of hash functions for them in a symmetric
way. Under such circumstances, the uniqueness of the different cross-modal
retrieval tasks is ignored, which may lead to sub-optimal performance.
Motivated by this, we present a Task-adaptive Asymmetric Deep Cross-modal
Hashing (TA-ADCMH) method in this paper. It can learn task-adaptive hash
functions for two sub-retrieval tasks via simultaneous modality representation
and asymmetric hash learning. Unlike previous cross-modal hashing approaches,
our learning framework jointly optimizes semantic preserving, which transforms
deep features of multimedia data into binary hash codes, and semantic
regression, which directly regresses the query modality representation to explicit
labels. With our model, the binary codes can effectively preserve semantic
correlations across different modalities while adaptively capturing the
query semantics. Experiments on two standard datasets demonstrate the
superiority of TA-ADCMH from many aspects.
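The combination of asymmetric hash learning and semantic regression described above can be sketched as a per-task loss. This is not the TA-ADCMH objective, which the abstract does not state. It assumes a generic asymmetric formulation in which continuous query outputs are matched against directly learned database codes, plus a linear regression from the query representation to labels. The names `asymmetric_loss`, `regression_loss`, `task_loss`, the projection `P`, and the weight `alpha` are hypothetical.

```python
import numpy as np

def asymmetric_loss(F_query, B_db, S):
    """Mean squared gap between query/database inner products and k * S.

    F_query: (n_q, k) continuous outputs of the query-modality network.
    B_db:    (n_d, k) binary database codes in {-1, +1}.
    S:       (n_q, n_d) similarity matrix with entries in {-1, +1}.
    """
    k = B_db.shape[1]
    return float(np.mean((F_query @ B_db.T - k * S) ** 2))

def regression_loss(F_query, P, labels):
    """Squared error of regressing the query representation onto label vectors."""
    return float(np.mean((F_query @ P - labels) ** 2))

def task_loss(F_query, B_db, S, P, labels, alpha=1.0):
    """Loss for one retrieval task; alpha is an assumed trade-off weight."""
    return asymmetric_loss(F_query, B_db, S) + alpha * regression_loss(F_query, P, labels)
```

Under these assumptions, a task-adaptive setup would evaluate `task_loss` once per sub-retrieval task (image-to-text and text-to-image), each with its own query network outputs, instead of sharing one symmetric pair of hash functions.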