3 research outputs found
Adversarially Approximated Autoencoder for Image Generation and Manipulation
Regularized autoencoders learn latent codes whose distribution is constrained
by a regularizer, which gives them the ability to infer latent codes from
observations and to generate new samples from codes. However, they tend to
produce reconstructions that are not necessarily faithful reproductions of the
inputs. The main reason is that the learned latent code distribution is forced
to match a prior distribution while the true code distribution remains unknown.
To improve reconstruction quality and give the latent space a manifold
structure, this work presents a novel approach, the adversarially approximated
autoencoder (AAAE), which investigates the latent codes through adversarial
approximation. Instead of regularizing the latent codes by penalizing the
distance between the model and target distributions, AAAE learns the
autoencoder flexibly and approximates the latent space with a simpler
generator. The density ratio between the two code distributions is estimated
with a generative adversarial network (GAN) to enforce their similarity.
Additionally, the image space is regularized with an additional adversarial
regularizer. The proposed approach unifies two deep generative models for both
latent space inference and diverse generation. The learning scheme imposes no
regularization on the latent codes, which also encourages faithful
reconstruction. Extensive validation experiments on four real-world datasets
demonstrate the superior performance of AAAE. In comparison to
state-of-the-art approaches, AAAE generates samples of better quality and
shares the properties of regularized autoencoders, with a well-structured
latent manifold.
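To make the adversarial approximation concrete, here is a minimal sketch in
PyTorch of GAN-based density-ratio matching between an encoder's codes and a
simpler generator's codes. The architectures, dimensions, and optimizer
settings are hypothetical placeholders, not the authors' implementation.

import torch
import torch.nn as nn

latent_dim, noise_dim = 64, 32

# Hypothetical encoder, code generator, and critic; all sizes are placeholders.
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, latent_dim))
generator = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
# The critic's logit approximates the log density ratio between the encoder's
# code distribution and the generator's code distribution.
critic = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 1))

bce = nn.BCEWithLogitsLoss()
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)

def ratio_matching_step(x):
    """One adversarial step pulling the generator's codes toward the encoder's."""
    z_enc = encoder(x).detach()                        # codes inferred from data
    z_gen = generator(torch.randn(x.size(0), noise_dim))
    ones = torch.ones(x.size(0), 1)
    zeros = torch.zeros(x.size(0), 1)

    # Critic learns to separate encoder codes from generated codes, which
    # amounts to estimating the ratio of the two densities.
    loss_c = bce(critic(z_enc), ones) + bce(critic(z_gen.detach()), zeros)
    opt_c.zero_grad()
    loss_c.backward()
    opt_c.step()

    # Generator is updated so the estimated ratio approaches one everywhere,
    # i.e. its codes become indistinguishable from the encoder's.
    loss_g = bce(critic(z_gen), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

ratio_matching_step(torch.randn(16, 784))              # toy batch of flattened images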
Learning Discriminative Hashing Codes for Cross-Modal Retrieval based on Multi-view Features
Hashing techniques have been applied broadly in retrieval tasks due to their
low storage requirements and high processing speed. Many hashing methods based
on a single view have been extensively studied for information retrieval.
However, the representation capacity of a single view is insufficient and some
discriminative information is not captured, which limits the achievable
improvement. In this paper, we employ multiple views to represent images and
texts for enriching the feature information. Our framework exploits the
complementary information among multiple views to better learn the
discriminative compact hash codes. A discrete hashing learning framework that
jointly performs classifier learning and subspace learning is proposed to
complete multiple search tasks simultaneously. Our framework includes two
stages, namely a kernelization process and a quantization process.
Kernelization aims to find a common subspace where multi-view features can be
fused. The quantization stage is designed to learn discriminative unified
hashing codes. Extensive experiments are performed on single-label datasets
(Wiki and MMED) and multi-label datasets (MIRFlickr and NUS-WIDE), and the
experimental results indicate the superiority of our method compared with
state-of-the-art methods.
Comment: 28 pages, 10 figures, 13 tables. The paper is under consideration at
Pattern Analysis and Applications.
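As a rough illustration of the two-stage pipeline the abstract describes, the
NumPy sketch below kernelizes each view against a set of anchor points, fuses
the views in a common subspace, and sign-quantizes the result into unified
hash codes. The anchor counts, random projections, and sign quantizer are
illustrative assumptions; in the paper the subspace and codes are learned
jointly with the classifier.

import numpy as np

def kernelize(X, anchors, gamma=1.0):
    """RBF kernel map of one view's features against a set of anchor points."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)                      # (n_samples, n_anchors)

def fuse_views(kernel_feats, projections):
    """Project each view's kernelized features and sum into a common subspace."""
    return sum(K @ W for K, W in zip(kernel_feats, projections))

def quantize(Z):
    """Sign quantization: common-subspace embeddings -> unified hash codes."""
    return (Z > 0).astype(np.int8)                  # n_samples x n_bits in {0, 1}

# Toy usage: two views (e.g. image and text features), 16-bit codes.
rng = np.random.default_rng(0)
views = [rng.normal(size=(100, 512)), rng.normal(size=(100, 300))]
anchors = [v[rng.choice(100, 32, replace=False)] for v in views]
Ks = [kernelize(v, a) for v, a in zip(views, anchors)]
Ws = [rng.normal(size=(32, 16)) for _ in views]     # learned in practice
codes = quantize(fuse_views(Ks, Ws))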
Semantic Granularity Metric Learning for Visual Search
Deep metric learning has shown promising results in identification, retrieval,
and recognition across a variety of applications. However, existing methods
often do not consider the different granularities of visual similarity. In many
domains, images exhibit similarity at multiple granularities of visual semantic
concepts; in fashion, for example, similarity ranges from clothing of the exact
same instance, to a similar look or design, to a common category. The training
image triplets/pairs used for metric learning therefore inherently carry
different degrees of information, yet existing methods often treat them with
equal importance during training. This hinders capturing the underlying
granularities in feature similarity required for effective visual search.
In view of this, we propose a new deep semantic granularity metric learning
(SGML) method that develops a novel idea of leveraging an attribute semantic
space to capture different granularities of similarity, and then integrates
this information into deep metric learning. The proposed method simultaneously
learns image attributes and embeddings using multitask CNNs. The two tasks are
not only jointly optimized but are further linked by the semantic granularity
similarity mappings to leverage the correlations between them. To this end, we
propose a new soft-binomial deviance loss that effectively integrates the
degree of information in the training samples, which helps to capture visual
similarity at multiple granularities. Compared to recent ensemble-based
methods, our framework is conceptually elegant, computationally simple, and
provides better performance. We perform extensive experiments on benchmark
metric learning datasets and demonstrate that our method outperforms recent
state-of-the-art methods, e.g., a 1-4.5% improvement in Recall@1 over the
previous state of the art [1], [2] on the DeepFashion In-Shop dataset.
Comment: 10 pages, 10 figures
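The abstract does not give the form of the soft-binomial deviance loss, but a
plausible reading is standard binomial deviance on pairwise similarities with
each pair's contribution softened by a semantic-granularity weight derived
from shared attributes. The PyTorch sketch below follows that reading; the
weighting scheme and all hyperparameters (alpha, beta, c) are assumptions, not
the paper's formulation.

import torch

def soft_binomial_deviance(sim, label, granularity, alpha=2.0, beta=0.5, c=25.0):
    """
    sim:         (n_pairs,) cosine similarities between embedding pairs
    label:       (n_pairs,) 1.0 for positive pairs, 0.0 for negative pairs
    granularity: (n_pairs,) semantic similarity of the pair (e.g. fraction of
                 shared attributes), used to scale each pair's influence
    """
    # Standard binomial deviance: m = 1 for positives, m = -c for negatives.
    m = torch.where(label > 0.5, torch.ones_like(sim), -c * torch.ones_like(sim))
    base = torch.log1p(torch.exp(-alpha * (sim - beta) * m))
    # Pairs whose semantic granularity agrees with their label contribute more.
    w = torch.where(label > 0.5, granularity, 1.0 - granularity)
    return (w * base).mean()

# Toy usage with random normalized embeddings forming four pairs.
emb = torch.nn.functional.normalize(torch.randn(8, 128), dim=1)
sim = (emb[:4] * emb[4:]).sum(dim=1)
label = torch.tensor([1.0, 1.0, 0.0, 0.0])
gran = torch.tensor([0.9, 0.6, 0.2, 0.4])
loss = soft_binomial_deviance(sim, label, gran)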