9,825 research outputs found

    Multi-Generator Generative Adversarial Nets

    We propose a new approach to training Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapse problem. The main intuition is to employ multiple generators instead of a single one as in the original GAN. The idea is simple, yet proves to be extremely effective at covering diverse data modes, easily overcoming mode collapse and delivering state-of-the-art results. A minimax formulation can be established among a classifier, a discriminator, and a set of generators, in a spirit similar to GAN. The generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or produced by the generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created by multiple generators, and one of them is then randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture GAN (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of the generators' distributions and the empirical data distribution is minimal, whilst the JSD among the generators' distributions is maximal, hence effectively avoiding mode collapse. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also scale efficiently to large datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, with individual generators specializing in capturing different types of objects.
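The sampling mechanism described in this abstract (several generators, one of which is picked at random to produce the final output) can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: the one-dimensional "generators", the uniform mixing weights, and the names `make_generator` and `sample_mixture` are all assumptions introduced here.

```python
import random

def make_generator(center, spread=0.1):
    """Return a toy 'generator' (hypothetical): maps noise z to a point near `center`."""
    def generator(z):
        return center + spread * z
    return generator

def sample_mixture(generators, weights, rng):
    """Draw one sample: choose a generator index by `weights`, then run it on noise."""
    k = rng.choices(range(len(generators)), weights=weights, k=1)[0]
    z = rng.gauss(0.0, 1.0)
    return k, generators[k](z)

rng = random.Random(0)
centers = (-2.0, 0.0, 2.0)                      # one toy data mode per generator
generators = [make_generator(c) for c in centers]
weights = [1 / 3, 1 / 3, 1 / 3]                 # uniform mixing proportions (assumed)

samples = [sample_mixture(generators, weights, rng) for _ in range(3000)]
# How often each generator was selected; roughly equal under uniform weights.
counts = [sum(1 for k, _ in samples if k == i) for i in range(len(generators))]
print(counts)
```

The index `k` returned with each sample is the label that a classifier, in the abstract's terms, would be trained to recover from the sample alone.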

    Multi-Task Generative Adversarial Nets with Shared Memory for Cross-Domain Coordination Control

    Generating sequential decision processes from huge amounts of measured process data is a future research direction for collaborative factory automation: it makes full use of online or offline process data to directly design flexible decision-making policies and evaluate their performance. The key challenges for the sequential decision process are generating sequential decision-making policies online and directly, and transferring knowledge across task domains. Most multi-task policy-generating algorithms suffer from insufficient cross-task sharing structure when applied to discrete-time nonlinear systems. This paper proposes multi-task generative adversarial nets with shared memory for cross-domain coordination control, which can generate sequential decision policies directly from the raw sensory input of all tasks and evaluate the performance of system actions online in discrete-time nonlinear systems. Experiments were undertaken using a professional flexible manufacturing testbed deployed within a smart factory of Weichai Power in China. Results on three groups of discrete-time nonlinear control tasks show that our proposed model can effectively improve the performance of a task with the help of other related tasks.

    Dual Discriminator Generative Adversarial Nets

    We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proves to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN), which, unlike GAN, has two discriminators; together with a generator, it is also analogous to a minimax game, wherein one discriminator gives high scores to samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced by the generated data, hence effectively avoiding the mode collapse problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach over baselines in generating diverse, good-quality samples, and the capability of our method to scale up to the ImageNet database.
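The complementary behavior of the KL and reverse KL divergences mentioned in this abstract can be seen on a small discrete example. The sketch below is an illustration under stated assumptions, not the D2GAN objective itself: the three-bin distributions and the weights `alpha` and `beta` in `combined` are made up for the demonstration.

```python
import math

def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def combined(data, model, alpha=1.0, beta=1.0):
    """Weighted sum of forward and reverse KL (weights are hypothetical)."""
    return alpha * kl(data, model) + beta * kl(model, data)

data = [0.49, 0.02, 0.49]        # two modes separated by a low-density bin
collapsed = [0.96, 0.02, 0.02]   # model that covers only the first mode
spread = [0.34, 0.32, 0.34]      # model that covers both modes but blurs them

for name, model in (("collapsed", collapsed), ("spread", spread)):
    print(name,
          "KL(data||model)=%.3f" % kl(data, model),
          "KL(model||data)=%.3f" % kl(model, data),
          "combined=%.3f" % combined(data, model))
```

In this toy case the forward KL penalizes the collapsed model more heavily, while the reverse KL penalizes the blurred model more heavily, which is the sense in which the two terms pull in complementary directions.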

    Adversarial Training for Community Question Answer Selection Based on Multi-scale Matching

    Community-based question answering (CQA) websites represent an important source of information. As a result, the problem of matching the most valuable answers to their corresponding questions has become an increasingly popular research topic. We frame this task as a binary (relevant/irrelevant) classification problem, and present an adversarial training framework to alleviate the label imbalance issue. We employ a generative model to iteratively sample a subset of challenging negative samples to fool our classification model. Both models are alternately optimized using the REINFORCE algorithm. The proposed method differs from previous ones, in which negative samples in the training set are used directly or uniformly down-sampled. Further, we propose using Multi-scale Matching, which explicitly inspects the correlation between words and n-grams at different levels of granularity. We evaluate the proposed method on the SemEval 2016 and SemEval 2017 datasets and achieve state-of-the-art or similar performance.

    Segmentation Guided Image-to-Image Translation with Adversarial Networks

    Recently, image-to-image translation, which aims to map images in one domain to another specific one, has received increasing attention. Existing methods mainly solve this task via a deep generative model and focus on exploring the relationship between different domains. However, these methods neglect higher-level and instance-specific information to guide the training process, leading to a great deal of unrealistic, low-quality generated images. Existing methods also lack spatial controllability during translation. To address these challenges, we propose a novel Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to further boost the generation performance and provide spatial mapping. In particular, a segmentor network is designed to impose semantic information on the generated images. Experimental results on a multi-domain face image translation task empirically demonstrate our ability to perform spatial modification and our superiority in image quality over several state-of-the-art methods. Comment: Accepted for publication in 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)

    3D-A-Nets: 3D Deep Dense Descriptor for Volumetric Shapes with Adversarial Networks

    Recently, researchers have been shifting their focus from hand-crafted 3D shape descriptors towards learned ones to better address the challenging issues of deformation and structural variation inherently present in 3D objects. 3D geometric data are often transformed to regular 3D voxel grids so that they can be fed more easily to a deep neural network architecture. However, the computational intractability of directly applying 3D convolutional nets to 3D volumetric data severely limits the efficiency (i.e. slow processing) and effectiveness (i.e. unsatisfactory accuracy) of processing 3D geometric data. In this paper, powered by a novel design of adversarial networks (3D-A-Nets), we develop a novel 3D deep dense shape descriptor (3D-DDSD) to address the challenging issues of efficient and effective 3D volumetric data processing. We develop a new definition of a 2D multilayer dense representation (MDR) of 3D volumetric data to extract a concise but geometrically informative shape description, and a novel design of adversarial networks that jointly trains a convolutional neural network (CNN), a recurrent neural network (RNN) and an adversarial discriminator. More specifically, the generator network produces 3D shape features that encourage the clustering of samples from the same category with the correct class label, whereas the discriminator network discourages the clustering by assigning them misleading adversarial class labels. By addressing the challenges posed by the computational inefficiency of directly applying CNNs to 3D volumetric data, 3D-A-Nets can learn a high-quality 3D-DDSD, which demonstrates superior performance on 3D shape classification and retrieval over other state-of-the-art techniques by a great margin. Comment: 8 pages, 8 figures

    Tensorizing Generative Adversarial Nets

    Generative Adversarial Network (GAN) and its variants exhibit state-of-the-art performance in the class of generative models. To capture higher-dimensional distributions, the common learning procedure requires high computational complexity and a large number of parameters. The problem of employing such a massive framework arises when deploying it on a platform with limited computational power, such as a mobile phone. In this paper, we present a new generative adversarial framework that represents each layer as a tensor structure connected by multilinear operations, aiming to reduce the number of model parameters by a large factor while preserving the generative performance and sample quality. To learn the model, we employ an efficient algorithm that alternately optimizes the discriminator and the generator. Experimental outcomes demonstrate that our model can achieve a high compression rate for model parameters, up to 35 times compared with the original GAN on the MNIST dataset. Comment: 4 pages, 3 figures
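To give a feel for why representing a layer with multilinear operations can shrink the parameter count, the sketch below compares a dense layer with a mode-wise factorized one. The abstract does not specify the tensor structure, so this assumes one factor matrix per tensor mode and ignores biases; the layer sizes and the helper names `dense_params` and `multilinear_params` are hypothetical, and the resulting ratio is not a figure from the paper.

```python
import math

def dense_params(in_modes, out_modes):
    """Weights of a fully connected layer on the flattened input (biases ignored)."""
    return math.prod(in_modes) * math.prod(out_modes)

def multilinear_params(in_modes, out_modes):
    """Weights when each tensor mode gets its own factor matrix (assumed structure)."""
    return sum(i * o for i, o in zip(in_modes, out_modes))

in_modes = (28, 28)    # e.g. an MNIST-sized image kept as a 28 x 28 tensor
out_modes = (32, 32)   # hypothetical hidden representation of shape 32 x 32

dense = dense_params(in_modes, out_modes)
factored = multilinear_params(in_modes, out_modes)
ratio = dense / factored
print(dense, factored, ratio)
```

The dense count grows with the product of all mode sizes, whereas the factored count grows with a sum of per-mode products, which is where the saving comes from in this toy setting.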

    Student's t-Generative Adversarial Networks

    Generative Adversarial Networks (GANs) perform very well in image generation, but they need a large amount of data to train the entire framework and often produce nonsensical results. We propose a new method based on the conditional GAN, which equips the latent noise with a mixture of Student's t-distributions with an attention mechanism, in addition to class information. The Student's t-distribution has long tails that can provide more diversity to the latent noise. Meanwhile, the discriminator in our model performs two tasks simultaneously: judging whether the images come from the true data distribution, and identifying the class of each generated image. The parameters of the mixture model can be learned along with those of the GAN. Moreover, we mathematically prove that any multivariate Student's t-distribution can be obtained by a linear transformation of a normal multivariate Student's t-distribution. Experiments comparing the proposed method with a typical GAN, DeliGAN and DCGAN indicate that our method performs very well at generating diverse and legible objects with limited data.
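The transformation property mentioned in this abstract can be illustrated with the usual construction of a multivariate Student's t sample. The sketch reads the "normal" t-distribution as the zero-location, identity-scale form, which is an assumption, and it applies an affine map (a linear map plus a shift). The values of `mu`, `L` and `nu`, and the helpers `standard_t` and `affine`, are illustrative and not taken from the paper.

```python
import math
import random

def standard_t(dim, nu, rng):
    """Sample a multivariate t with zero location, identity scale, nu degrees of freedom."""
    u = rng.gammavariate(nu / 2.0, 2.0)   # chi-square variate with nu degrees of freedom
    scale = math.sqrt(nu / u)             # shared heavy-tail factor for all coordinates
    return [rng.gauss(0.0, 1.0) * scale for _ in range(dim)]

def affine(x, mu, L):
    """Return mu + L x, with the matrix L given as a list of rows."""
    return [m + sum(l * xi for l, xi in zip(row, x)) for m, row in zip(mu, L)]

rng = random.Random(0)
nu = 5.0
mu = [1.0, -1.0]                 # target location (illustrative)
L = [[2.0, 0.0], [0.5, 1.0]]     # factor of the target scale matrix L L^T (illustrative)

samples = [affine(standard_t(2, nu, rng), mu, L) for _ in range(20000)]
mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
print(mean)
```

Because the heavy-tail factor is drawn once per sample and shared across coordinates, the transformed draws keep the same degrees of freedom while taking on the new location and scale.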

    Multi-View Image Generation from a Single-View

    This paper addresses a challenging problem: how to generate multi-view clothing images from only a single-view input. To generate realistic-looking images with different views from the input, we propose a new image generation model termed VariGANs that combines the strengths of variational inference and Generative Adversarial Networks (GANs). Our proposed VariGANs model generates the target image in a coarse-to-fine manner instead of in a single pass, which suffers from severe artifacts. It first performs variational inference to model the global appearance of the object (e.g., shape and color) and produce a coarse image with a different view. Conditioned on the generated low-resolution images, it then performs adversarial learning to fill in details and generate images whose details are consistent with the input. Extensive experiments conducted on two clothing datasets, MVC and DeepFashion, demonstrate that novel-view images generated by our model are more plausible than those generated by existing approaches, with a more consistent global appearance as well as richer and sharper details.

    Medical Image Generation using Generative Adversarial Networks

    Generative adversarial networks (GANs) are an unsupervised deep learning approach in the computer vision community that has gained significant attention over the last few years for identifying the internal structure of multimodal medical imaging data. The adversarial network simultaneously generates realistic medical images and corresponding annotations, which has proven useful in many cases such as image augmentation, image registration, medical image generation, image reconstruction, and image-to-image translation. These properties have drawn the attention of researchers in the field of medical image analysis, and we are witnessing rapid adoption in many novel and traditional applications. This chapter presents state-of-the-art progress in GAN-based clinical applications in medical image generation and cross-modality synthesis. The various GAN frameworks that have gained popularity in the interpretation of medical images, such as Deep Convolutional GAN (DCGAN), Laplacian GAN (LAPGAN), pix2pix, CycleGAN, and the unsupervised image-to-image translation model (UNIT), and that continue to improve their performance by incorporating additional hybrid architectures, are discussed. Further, some recent applications of these frameworks to image reconstruction and synthesis, and future research directions in the area, are covered. Comment: 19 pages, 3 figures, 5 tables