Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval to learn hash functions in the absence of paired training samples
through the cycle consistency loss. Our proposed approach employs an adversarial
training scheme to learn a pair of hash functions that enable translation between
modalities while assuming the underlying semantic relationship. To endow the
hash codes with the semantics of each input-output pair, a cycle consistency loss is
added on top of the adversarial training to strengthen the correlations
between inputs and their corresponding outputs. Our approach is generative: it learns
hash functions such that the learned hash codes maximally correlate each
input-output correspondence and can also regenerate the inputs, so as to
minimize the information loss. Learning the hash embedding thus amounts
to jointly optimizing the parameters of the hash functions across modalities as
well as the associated generative models. Extensive experiments on a variety of
large-scale cross-modal data sets demonstrate that our proposed method achieves
better retrieval results than state-of-the-art methods.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other author
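As a rough illustration of the cycle consistency idea described in the abstract above, the following minimal sketch computes a cycle loss over two unpaired sample sets. The L1 distance, the function names (`l1`, `cycle_consistency_loss`) and the scaling "translators" are assumptions made for illustration only; the abstract does not specify the paper's actual loss form, hash functions, or networks.

```python
# Toy cycle-consistency loss. All names are hypothetical, not from the paper.

def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_consistency_loss(xs, ys, f_xy, g_yx):
    """Average L1 reconstruction error over both translation cycles.

    f_xy maps modality X to modality Y; g_yx maps Y back to X.
    xs and ys do not need to be paired with each other.
    """
    forward = sum(l1(g_yx(f_xy(x)), x) for x in xs) / len(xs)    # x -> y' -> x''
    backward = sum(l1(f_xy(g_yx(y)), y) for y in ys) / len(ys)   # y -> x' -> y''
    return forward + backward

# Stand-in translators (assumption): scaling by 2 and by 0.5 are exact inverses.
def double(v):
    return [2.0 * t for t in v]

def halve(v):
    return [0.5 * t for t in v]

xs = [[1.0, -2.0], [0.5, 3.0]]   # unpaired samples from modality X
ys = [[4.0, 0.0], [-1.0, 2.0]]   # unpaired samples from modality Y

perfect = cycle_consistency_loss(xs, ys, double, halve)     # cycles return the input
imperfect = cycle_consistency_loss(xs, ys, double, double)  # cycles do not invert
```

When the two mappings invert each other the loss is zero; any mismatch between a sample and its round-trip reconstruction is penalized, which is what ties inputs to their translated outputs without paired supervision.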
Multi-View Data Generation Without View Supervision
The development of high-dimensional generative models has recently gained a
great surge of interest with the introduction of variational auto-encoders and
generative adversarial neural networks. Different variants have been proposed
where the underlying latent space is structured, for example, based on
attributes describing the data to generate. We focus on a particular problem
where one aims at generating samples corresponding to a number of objects under
various views. We assume that the distribution of the data is driven by two
independent latent factors: the content, which represents the intrinsic
features of an object, and the view, which stands for the settings of a
particular observation of that object. Therefore, we propose a generative model
and a conditional variant built on such a disentangled latent space. This
approach allows us to generate realistic samples corresponding to various
objects in a high variety of views. Unlike many multi-view approaches, our
model doesn't need any supervision on the views but only on the content.
Compared to other conditional generation approaches that are mostly based on
binary or categorical attributes, we make no such assumption about the factors
of variation. Our model can be used on problems with a huge, potentially
infinite, number of categories. We evaluate it on four image datasets on
which we demonstrate the effectiveness of the model and its ability to
generalize.
Comment: Published as a conference paper at ICLR 201
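The two-factor latent space described in the abstract above can be pictured with a small sketch: a content vector is held fixed while the view vector is resampled, giving several "views" of the same object. The Gaussian priors, the dimensions, and the `toy_generator` function are placeholders assumed for illustration; they stand in for a learned generator and are not the paper's model.

```python
import random

def sample_latent(dim, rng):
    """Draw a latent vector from a standard Gaussian prior (an assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_generator(content, view):
    """Hypothetical stand-in for a learned generator G(content, view).

    It deterministically mixes every content coordinate with every view
    coordinate, so the output depends on both factors.
    """
    return [c + 0.1 * v for c in content for v in view]

rng = random.Random(0)

# One object: the content factor is sampled once and held fixed.
content = sample_latent(4, rng)

# Several observations of that object: only the view factor is resampled.
views = [sample_latent(2, rng) for _ in range(3)]
samples = [toy_generator(content, v) for v in views]
```

Because content and view are sampled independently, swapping in a new `content` vector while reusing the same `views` list would produce a different object observed under the same set of views.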