
    Diversity vs. Recognizability: Human-like generalization in one-shot generative models

    Robust generalization to new concepts has long been a distinctive feature of human intelligence. Recent progress in deep generative models, however, has led to neural architectures capable of synthesizing novel instances of unknown visual concepts from a single training example. A precise comparison between these models and humans has not been possible, because existing performance metrics for generative models (e.g., FID, IS, likelihood) are not appropriate for the one-shot generation scenario. Here, we propose a new framework to evaluate one-shot generative models along two axes: sample recognizability vs. diversity (i.e., intra-class variability). Using this framework, we perform a systematic evaluation of representative one-shot generative models on the Omniglot handwritten dataset. We first show that GAN-like and VAE-like models fall on opposite ends of the diversity-recognizability space. Extensive analyses of the effect of key model parameters further reveal that spatial attention and context integration contribute linearly to the diversity-recognizability trade-off, whereas disentanglement transports the model along a parabolic curve that could be used to maximize recognizability. Using the diversity-recognizability framework, we identify models and parameters that closely approximate human data.
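    The two axes described above lend themselves to a simple computation: recognizability can be scored by a one-shot classifier asked to match each generated sample back to its conditioning exemplar, and diversity by the spread of the samples' embeddings around their class mean. The sketch below illustrates this under assumed `encoder` and `classifier` objects; it is not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np

def diversity_recognizability(samples, exemplar, encoder, classifier):
    """Place samples generated from a single exemplar on the
    diversity-recognizability plane (illustrative sketch only).

    samples    : list of generated images conditioned on `exemplar`
    encoder    : assumed callable mapping an image to a feature vector
    classifier : assumed callable returning P(sample matches exemplar)
    """
    feats = np.stack([encoder(s) for s in samples])  # (N, D)

    # Diversity: mean distance of each embedding to the class centroid,
    # a proxy for intra-class variability.
    centroid = feats.mean(axis=0)
    diversity = np.linalg.norm(feats - centroid, axis=1).mean()

    # Recognizability: how reliably a one-shot classifier links each
    # sample back to the exemplar it was generated from.
    recognizability = float(np.mean([classifier(s, exemplar) for s in samples]))

    return diversity, recognizability
```

    Plotting many models in this plane is what allows GAN-like and VAE-like architectures to be placed at opposite ends of the trade-off.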

    One-Shot Learning using Mixture of Variational Autoencoders: a Generalization Learning approach

    Deep learning, although highly successful today, traditionally requires very large amounts of labeled data to perform well on classification tasks. In an attempt to address this problem, the one-shot learning paradigm, which makes use of just one labeled sample per class plus prior knowledge, has become increasingly important. In this paper, we propose a new one-shot learning method, dubbed MoVAE (Mixture of Variational AutoEncoders), to perform classification. Complementary to prior studies, MoVAE represents a shift of paradigm compared with the usual one-shot learning methods, as it does not use any prior knowledge. Instead, it starts from zero knowledge and one labeled sample per class. Afterward, by using unlabeled data and the generalization-learning concept (in a way, more as humans do), it is able to gradually improve its performance by itself. Moreover, if no unlabeled data are available, MoVAE can still perform well in one-shot learning classification. We demonstrate empirically the efficiency of our proposed approach on three datasets, i.e., handwritten digits (MNIST), fashion products (Fashion-MNIST), and handwritten characters (Omniglot), showing that MoVAE outperforms state-of-the-art one-shot learning algorithms.
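    A minimal sketch of the mixture idea, under assumed interfaces: one VAE per class is fit on its single labeled example, a test point is assigned to the class whose VAE explains it best, and unlabeled data are pseudo-labeled and folded back into training. The `.elbo()` and `.fit()` methods are hypothetical stand-ins, not the paper's exact architecture.

```python
import torch

def movae_classify(x, vaes):
    """Assign x to the class whose VAE gives it the highest ELBO."""
    scores = {label: vae.elbo(x) for label, vae in vaes.items()}
    return max(scores, key=scores.get)

def generalization_learning_step(vaes, unlabeled, threshold=0.9):
    """One round of self-improvement: pseudo-label confident unlabeled
    samples with the current mixture, then refit each class VAE on the
    samples assigned to it (illustrative sketch)."""
    labels = list(vaes)
    buckets = {label: [] for label in labels}
    for x in unlabeled:
        scores = torch.tensor([vaes[label].elbo(x) for label in labels])
        probs = torch.softmax(scores, dim=0)
        conf, idx = probs.max(dim=0)
        if conf.item() >= threshold:
            buckets[labels[idx.item()]].append(x)
    for label, xs in buckets.items():
        if xs:
            vaes[label].fit(xs)  # hypothetical per-class training call
    return vaes
```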

    Generative and Discriminative Text Classification with Recurrent Neural Networks

    We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that, although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However, we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts, the same pattern that Ng & Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings, which show that generative models substantially outperform discriminative models.
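    The generative side of this comparison reduces to a Bayes decision rule: fit one class-conditional language model per label and pick the class maximizing p(y) * p(x | y). The sketch below assumes per-class models exposing a `log_prob` method; the interface is illustrative, not the paper's code. Because adding a new class only requires training one additional language model, this rule extends naturally to the zero-shot and continual learning settings mentioned above.

```python
import math

def generative_classify(tokens, class_lms, class_priors):
    """Generative text classification: argmax over classes of
    log p(y) + log p(tokens | y), with one language model per class.

    class_lms    : dict label -> model with log_prob(tokens) (assumed
                   interface, e.g. a per-class LSTM language model)
    class_priors : dict label -> prior probability of the label
    """
    scores = {
        y: math.log(class_priors[y]) + class_lms[y].log_prob(tokens)
        for y in class_lms
    }
    return max(scores, key=scores.get)
```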

    Generative Adversarial Text to Image Synthesis

    Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal. In recent years, however, generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, deep convolutional generative adversarial networks (GANs) have begun to generate highly compelling images of specific categories, such as faces, album covers, and room interiors. In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. We demonstrate the capability of our model to generate plausible images of birds and flowers from detailed text descriptions.
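    The core architectural idea, conditioning a DCGAN-style generator on a learned text embedding, can be sketched as follows: the sentence embedding is projected to a low-dimensional code and concatenated with the noise vector before being upsampled to an image. Layer sizes and the 32x32 output resolution here are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Minimal sketch of a text-conditioned DCGAN-style generator."""

    def __init__(self, noise_dim=100, text_dim=1024, proj_dim=128):
        super().__init__()
        # Compress the sentence embedding before conditioning.
        self.project_text = nn.Sequential(
            nn.Linear(text_dim, proj_dim), nn.LeakyReLU(0.2))
        # Upsample the concatenated (noise + text code) vector to an image.
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim + proj_dim, 256, 4, 1, 0),
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())

    def forward(self, noise, text_embedding):
        cond = self.project_text(text_embedding)        # (B, proj_dim)
        z = torch.cat([noise, cond], dim=1)             # (B, noise+proj)
        return self.net(z.unsqueeze(-1).unsqueeze(-1))  # (B, 3, 32, 32)
```

    The matching discriminator in this family of models typically receives the same projected text code, concatenated with its image features, so that it can reject images that are realistic but mismatched to the description.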