52 research outputs found

    On the Equivalence Between Deep NADE and Generative Stochastic Networks

    Full text link
    Neural Autoregressive Distribution Estimators (NADEs) have recently been shown as successful alternatives for modeling high dimensional multimodal distributions. One issue associated with NADEs is that they rely on a particular order of factorization for P(x)P(\mathbf{x}). This issue has been recently addressed by a variant of NADE called Orderless NADEs and its deeper version, Deep Orderless NADE. Orderless NADEs are trained based on a criterion that stochastically maximizes P(x)P(\mathbf{x}) with all possible orders of factorizations. Unfortunately, ancestral sampling from deep NADE is very expensive, corresponding to running through a neural net separately predicting each of the visible variables given some others. This work makes a connection between this criterion and the training criterion for Generative Stochastic Networks (GSNs). It shows that training NADEs in this way also trains a GSN, which defines a Markov chain associated with the NADE model. Based on this connection, we show an alternative way to sample from a trained Orderless NADE that allows to trade-off computing time and quality of the samples: a 3 to 10-fold speedup (taking into account the waste due to correlations between consecutive samples of the chain) can be obtained without noticeably reducing the quality of the samples. This is achieved using a novel sampling procedure for GSNs called annealed GSN sampling, similar to tempering methods that combines fast mixing (obtained thanks to steps at high noise levels) with accurate samples (obtained thanks to steps at low noise levels).Comment: ECML/PKDD 201

    Learning visual representations with neural networks for video captioning and image generation

    Full text link
    La recherche sur les reĢseaux de neurones a permis de reĢaliser de larges progreĢ€s durant la dernieĢ€re deĢcennie. Non seulement les reĢseaux de neurones ont eĢteĢ appliqueĢs avec succeĢ€s pour reĢsoudre des probleĢ€mes de plus en plus complexes; mais ils sont aussi devenus lā€™approche dominante dans les domaines ouĢ€ ils ont eĢteĢ testeĢs tels que la compreĢhension du langage, les agents jouant aĢ€ des jeux de manieĢ€re automatique ou encore la vision par ordinateur, graĢ‚ce aĢ€ leurs capaciteĢs calculatoires et leurs efficaciteĢs statistiques. La preĢsente theĢ€se eĢtudie les reĢseaux de neurones appliqueĢs aĢ€ des probleĢ€mes en vision par ordinateur, ouĢ€ les repreĢsentations seĢmantiques abstraites jouent un roĢ‚le fondamental. Nous deĢmontrerons, aĢ€ la fois par la theĢorie et par lā€™expeĢrimentation, la capaciteĢ des reĢseaux de neurones aĢ€ apprendre de telles repreĢsentations aĢ€ partir de donneĢes, avec ou sans supervision. Le contenu de la theĢ€se est diviseĢ en deux parties. La premieĢ€re partie eĢtudie les reĢseaux de neurones appliqueĢs aĢ€ la description de videĢo en langage naturel, neĢcessitant lā€™apprentissage de repreĢsentation visuelle. Le premier modeĢ€le proposeĢ permet dā€™avoir une attention dynamique sur les diffeĢrentes trames de la videĢo lors de la geĢneĢration de la description textuelle pour de courtes videĢos. Ce modeĢ€le est ensuite ameĢlioreĢ par lā€™introduction dā€™une opeĢration de convolution reĢcurrente. Par la suite, la dernieĢ€re section de cette partie identifie un probleĢ€me fondamental dans la description de videĢo en langage naturel et propose un nouveau type de meĢtrique dā€™eĢvaluation qui peut eĢ‚tre utiliseĢ empiriquement comme un oracle afin dā€™analyser les performances de modeĢ€les concernant cette taĢ‚che. La deuxieĢ€me partie se concentre sur lā€™apprentissage non-superviseĢ et eĢtudie une famille de modeĢ€les capables de geĢneĢrer des images. En particulier, lā€™accent est mis sur les ā€œNeural Autoregressive Density Estimators (NADEs), une famille de modeĢ€les probabilistes pour les images naturelles. Ce travail met tout dā€™abord en eĢvidence une connection entre les modeĢ€les NADEs et les reĢseaux stochastiques geĢneĢratifs (GSN). De plus, une ameĢlioration des modeĢ€les NADEs standards est proposeĢe. DeĢnommeĢs NADEs iteĢratifs, cette ameĢlioration introduit plusieurs iteĢrations lors de lā€™infeĢrence du modeĢ€le NADEs tout en preĢservant son nombre de parameĢ€tres. DeĢbutant par une revue chronologique, ce travail se termine par un reĢsumeĢ des reĢcents deĢveloppements en lien avec les contributions preĢsenteĢes dans les deux parties principales, concernant les probleĢ€mes dā€™apprentissage de repreĢsentation seĢmantiques pour les images et les videĢos. De prometteuses directions de recherche sont envisageĢes.The past decade has been marked as a golden era of neural network research. Not only have neural networks been successfully applied to solve more and more challenging real- world problems, but also they have become the dominant approach in many of the places where they have been tested. These places include, for instance, language understanding, game playing, and computer vision, thanks to neural networksā€™ superiority in computational efficiency and statistical capacity. This thesis applies neural networks to problems in computer vision where high-level and semantically meaningful representations play a fundamental role. It demonstrates both in theory and in experiment the ability to learn such representations from data with and without supervision. The main content of the thesis is divided into two parts. The first part studies neural networks in the context of learning visual representations for the task of video captioning. Models are developed to dynamically focus on different frames while generating a natural language description of a short video. Such a model is further improved by recurrent convolutional operations. The end of this part identifies fundamental challenges in video captioning and proposes a new type of evaluation metric that may be used experimentally as an oracle to benchmark performance. The second part studies the family of models that generate images. While the first part is supervised, this part is unsupervised. The focus of it is the popular family of Neural Autoregressive Density Estimators (NADEs), a tractable probabilistic model for natural images. This work first makes a connection between NADEs and Generative Stochastic Networks (GSNs). The standard NADE is improved by introducing multiple iterations in its inference without increasing the number of parameters, which is dubbed iterative NADE. With a historical view at the beginning, this work ends with a summary of recent development for work discussed in the first two parts around the central topic of learning visual representations for images and videos. A bright future is envisioned at the end

    Deep Unsupervised Learning using Nonequilibrium Thermodynamics

    Full text link
    A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm

    Methods for Optimization and Regularization of Generative Models

    Get PDF
    This thesis studies the problem of regularizing and optimizing generative models, often using insights and techniques from kernel methods. The work proceeds in three main themes. Conditional score estimation. We propose a method for estimating conditional densities based on a rich class of RKHS exponential family models. The algorithm works by solving a convex quadratic problem for fitting the gradient of the log density, the score, thus avoiding the need for estimating the normalizing constant. We show the resulting estimator to be consistent and provide convergence rates when the model is well-specified. Structuring and regularizing implicit generative models. In a first contribution, we introduce a method for learning Generative Adversarial Networks, a class of Implicit Generative Models, using a parametric family of Maximum Mean Discrepancies (MMD). We show that controlling the gradient of the critic function defining the MMD is vital for having a sensible loss function. Moreover, we devise a method to enforce exact, analytical gradient constraints. As a second contribution, we introduce and study a new generative model suited for data with low intrinsic dimension embedded in a high dimensional space. This model combines two components: an implicit model, which can learn the low-dimensional support of data, and an energy function, to refine the probability mass by importance sampling on the support of the implicit model. We further introduce algorithms for learning such a hybrid model and for efficient sampling. Optimizing implicit generative models. We first study the Wasserstein gradient flow of the Maximum Mean Discrepancy in a non-parametric setting and provide smoothness conditions on the trajectory of the flow to ensure global convergence. We identify cases when this condition does not hold and propose a new algorithm based on noise injection to mitigate this problem. In a second contribution, we consider the Wasserstein gradient flow of generic loss functionals in a parametric setting. This flow is invariant to the model's parameterization, just like the Fisher gradient flows in information geometry. It has the additional benefit to be well defined even for models with varying supports, which is particularly well suited for implicit generative models. We then introduce a general framework for approximating the Wasserstein natural gradient by leveraging a dual formulation of the Wasserstein pseudo-Riemannian metric that we restrict to a Reproducing Kernel Hilbert Space. The resulting estimator is scalable and provably consistent as it relies on Nystrom methods
    • ā€¦
    corecore