1,311 research outputs found
Ranking-based Deep Cross-modal Hashing
Cross-modal hashing has been receiving increasing interests for its low
storage cost and fast query speed in multi-modal data retrievals. However, most
existing hashing methods are based on hand-crafted or raw level features of
objects, which may not be optimally compatible with the coding process.
Besides, these hashing methods are mainly designed to handle simple pairwise
similarity. The complex multilevel ranking semantic structure of instances
associated with multiple labels has not been well explored yet. In this paper,
we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH
firstly uses the feature and label information of data to derive a
semi-supervised semantic ranking list. Next, to expand the semantic
representation power of hand-crafted features, RDCMH integrates the semantic
ranking information into deep cross-modal hashing and jointly optimizes the
compatible parameters of deep feature representations and of hashing functions.
Experiments on real multi-modal datasets show that RDCMH outperforms other
competitive baselines and achieves the state-of-the-art performance in
cross-modal retrieval applications
Deep Binary Reconstruction for Cross-modal Hashing
With the increasing demand of massive multimodal data storage and
organization, cross-modal retrieval based on hashing technique has drawn much
attention nowadays. It takes the binary codes of one modality as the query to
retrieve the relevant hashing codes of another modality. However, the existing
binary constraint makes it difficult to find the optimal cross-modal hashing
function. Most approaches choose to relax the constraint and perform
thresholding strategy on the real-value representation instead of directly
solving the original objective. In this paper, we first provide a concrete
analysis about the effectiveness of multimodal networks in preserving the
inter- and intra-modal consistency. Based on the analysis, we provide a
so-called Deep Binary Reconstruction (DBRC) network that can directly learn the
binary hashing codes in an unsupervised fashion. The superiority comes from a
proposed simple but efficient activation function, named as Adaptive Tanh
(ATanh). The ATanh function can adaptively learn the binary codes and be
trained via back-propagation. Extensive experiments on three benchmark datasets
demonstrate that DBRC outperforms several state-of-the-art methods in both
image2text and text2image retrieval task.Comment: 8 pages, 5 figures, accepted by ACM Multimedia 201
Unsupervised Generative Adversarial Cross-modal Hashing
Cross-modal hashing aims to map heterogeneous multimedia data into a common
Hamming space, which can realize fast and flexible retrieval across different
modalities. Unsupervised cross-modal hashing is more flexible and applicable
than supervised methods, since no intensive labeling work is involved. However,
existing unsupervised methods learn hashing functions by preserving inter and
intra correlations, while ignoring the underlying manifold structure across
different modalities, which is extremely helpful to capture meaningful nearest
neighbors of different modalities for cross-modal retrieval. To address the
above problem, in this paper we propose an Unsupervised Generative Adversarial
Cross-modal Hashing approach (UGACH), which makes full use of GAN's ability for
unsupervised representation learning to exploit the underlying manifold
structure of cross-modal data. The main contributions can be summarized as
follows: (1) We propose a generative adversarial network to model cross-modal
hashing in an unsupervised fashion. In the proposed UGACH, given a data of one
modality, the generative model tries to fit the distribution over the manifold
structure, and select informative data of another modality to challenge the
discriminative model. The discriminative model learns to distinguish the
generated data and the true positive data sampled from correlation graph to
achieve better retrieval accuracy. These two models are trained in an
adversarial way to improve each other and promote hashing function learning.
(2) We propose a correlation graph based approach to capture the underlying
manifold structure across different modalities, so that data of different
modalities but within the same manifold can have smaller Hamming distance and
promote retrieval accuracy. Extensive experiments compared with 6
state-of-the-art methods verify the effectiveness of our proposed approach.Comment: 8 pages, accepted by 32th AAAI Conference on Artificial Intelligence
(AAAI), 201
- …