608 research outputs found
Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions
Generative Adversarial Networks (GANs) are a novel class of deep generative
models that have recently gained significant attention. GANs learn complex and
high-dimensional distributions implicitly over images, audio, and data.
However, there exist major challenges in training GANs, namely mode
collapse, non-convergence and instability, due to inappropriate design of
network architecture, use of objective function and selection of optimization
algorithm. Recently, to address these challenges, several solutions for better
design and optimization of GANs have been investigated based on techniques of
re-engineered network architectures, new objective functions and alternative
optimization algorithms. To the best of our knowledge, there is no existing
survey that has particularly focused on broad and systematic developments of
these solutions. In this study, we perform a comprehensive survey of the
advancements in GAN design and optimization solutions proposed to handle GAN
challenges. We first identify key research issues within each design and
optimization technique and then propose a new taxonomy to structure solutions
by key research issues. In accordance with the taxonomy, we provide a detailed
discussion on different GAN variants proposed within each solution and their
relationships. Finally, based on the insights gained, we present promising
research directions in this rapidly growing field.
Comment: 42 pages, Figure 13, Table
Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning
The performance of generative zero-shot methods mainly depends on the quality
of generated features and how well the model facilitates knowledge transfer
between visual and semantic domains. The quality of generated features is a
direct consequence of the ability of the model to capture the multiple modes of
the underlying data distribution. To address these issues, we propose a new
two-level joint maximization idea to augment the generative network with an
inference network during training, which helps our model capture the multiple
modes of the data and generate features that better represent the underlying
data distribution. This provides strong cross-modal interaction for effective
transfer of knowledge between visual and semantic domains. Furthermore,
existing methods train the zero-shot classifier either on generated synthetic
image features or on latent embeddings produced by leveraging representation
learning. In this work, we unify these paradigms into a single model which, in
addition to synthesizing image features, also utilizes the representation
learning capabilities of the inference network to provide discriminative
features for the final zero-shot recognition task. We evaluate our approach on
four benchmark datasets, i.e., CUB, FLO, AWA1 and AWA2, against several
state-of-the-art methods, and show its performance. We also perform ablation
studies to analyze and understand our method more carefully for the Generalized
Zero-shot Learning task.
Comment: Under Submissio
Generalized Adversarially Learned Inference
Allowing effective inference of latent vectors while training GANs can
greatly increase their applicability in various downstream tasks. Recent
approaches, such as the ALI and BiGAN frameworks, develop methods of inference of
latent variables in GANs by adversarially training an image generator along
with an encoder to match two joint distributions of image and latent vector
pairs. We generalize these approaches to incorporate multiple layers of
feedback on reconstructions, self-supervision, and other forms of supervision
based on prior or learned knowledge about the desired solutions. We achieve
this by modifying the discriminator's objective to correctly identify more than
two joint distributions of tuples of an arbitrary number of random variables
consisting of images, latent vectors, and other variables generated through
auxiliary tasks, such as reconstruction and inpainting or as outputs of
suitable pre-trained models. We design a non-saturating maximization objective
for the generator-encoder pair and prove that the resulting adversarial game
corresponds to a global optimum that simultaneously matches all the
distributions. Within our proposed framework, we introduce a novel set of
techniques for providing self-supervised feedback to the model based on
properties, such as patch-level correspondence and cycle consistency of
reconstructions. Through comprehensive experiments, we demonstrate the
efficacy, scalability, and flexibility of the proposed approach for a variety
of tasks.
Comment: AAAI 2021 (accepted for publication