2 research outputs found
Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities
Cross-modal retrieval aims to retrieve relevant data across different
modalities (e.g., texts vs. images). The common strategy is to apply
element-wise constraints between manually labeled pair-wise items to guide the
generators to learn the semantic relationships between the modalities, so that
the similar items can be projected close to each other in the common
representation subspace. However, such constraints often fail to preserve the
semantic structure between unpaired but semantically similar items (e.g. the
unpaired items with the same class label are more similar than items with
different labels). To address the above problem, we propose a novel cross-modal
similarity transferring (CMST) method to learn and preserve the semantic
relationships between unpaired items in an unsupervised way. The key idea is to
learn the quantitative similarities in single-modal representation subspace,
and then transfer them to the common representation subspace to establish the
semantic relationships between unpaired items across modalities. Experiments
show that our method outperforms the state-of-the-art approaches both in the
class-based and pair-based retrieval tasks
Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion
With the development of web technology, multi-modal or multi-view data has
surged as a major stream for big data, where each modal/view encodes individual
property of data objects. Often, different modalities are complementary to each
other. Such fact motivated a lot of research attention on fusing the
multi-modal feature spaces to comprehensively characterize the data objects.
Most of the existing state-of-the-art focused on how to fuse the energy or
information from multi-modal spaces to deliver a superior performance over
their counterparts with single modal. Recently, deep neural networks have
exhibited as a powerful architecture to well capture the nonlinear distribution
of high-dimensional multimedia data, so naturally does for multi-modal data.
Substantial empirical studies are carried out to demonstrate its advantages
that are benefited from deep multi-modal methods, which can essentially deepen
the fusion from multi-modal deep feature spaces. In this paper, we provide a
substantial overview of the existing state-of-the-arts on the filed of
multi-modal data analytics from shallow to deep spaces. Throughout this survey,
we further indicate that the critical components for this field go to
collaboration, adversarial competition and fusion over multi-modal spaces.
Finally, we share our viewpoints regarding some future directions on this
field.Comment: Appearing at ACM TOMM, 26 page