    A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling

    This document aims to provide a review of learning with deep generative models (DGMs), a highly active area in machine learning and, more generally, artificial intelligence. This review is not meant to be a tutorial, but where necessary we provide self-contained derivations for completeness. The review has two features. First, although DGMs can be classified from different perspectives, we organize this review from the perspective of graphical modeling, because the learning methods for directed DGMs and undirected DGMs are fundamentally different. Second, we differentiate model definitions from model learning algorithms, since different learning algorithms can be applied to the same model, and one algorithm can be applied to learn different models. We thus separate model definition and model learning, with more emphasis on reviewing, differentiating and connecting different learning algorithms. We also discuss promising future research directions. Comment: add SN-GANs, SA-GANs, conditional generation (cGANs, AC-GANs). arXiv admin note: text overlap with arXiv:1606.00709, arXiv:1801.03558 by other author

    Stein Variational Message Passing for Continuous Graphical Models

    We propose a novel distributed inference algorithm for continuous graphical models, by extending Stein variational gradient descent (SVGD) to leverage the Markov dependency structure of the distribution of interest. Our approach combines SVGD with a set of structured local kernel functions defined on the Markov blanket of each node, which alleviates the curse of high dimensionality and simultaneously yields a distributed algorithm for decentralized inference tasks. We justify our method with theoretical analysis and show that the use of local kernels can be viewed as a new type of localized approximation that matches the target distribution on the conditional distributions of each node over its Markov blanket. Our empirical results show that our method outperforms a variety of baselines, including standard MCMC and particle message passing methods.
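
    As a point of reference for the abstract above, the following is a minimal sketch of plain SVGD, not the local-kernel message passing variant the paper proposes. It assumes 1-D particles, a fixed-bandwidth RBF kernel, and a Gaussian target N(2, 1); the function names (`svgd_step`, `gaussian_score`, `run_svgd`) and all parameter values are illustrative assumptions.

```python
import math


def svgd_step(particles, score, bandwidth=1.0, step_size=0.1):
    """One plain SVGD update for 1-D particles with an RBF kernel (assumed setup)."""
    n = len(particles)
    updated = []
    for x_i in particles:
        phi = 0.0
        for x_j in particles:
            diff = x_j - x_i
            k = math.exp(-diff * diff / (2.0 * bandwidth ** 2))
            # Gradient of k(x_j, x_i) with respect to x_j: the repulsive term
            # that keeps particles from collapsing onto a single mode.
            grad_k = -diff / bandwidth ** 2 * k
            # Kernel-weighted score (attraction) plus kernel gradient (repulsion).
            phi += k * score(x_j) + grad_k
        updated.append(x_i + step_size * phi / n)
    return updated


def gaussian_score(mu, sigma):
    """Score function d/dx log N(x; mu, sigma^2)."""
    return lambda x: -(x - mu) / sigma ** 2


def run_svgd(n_particles=20, n_steps=1000):
    # Deterministic initial particles spread evenly over [-1, 1].
    particles = [-1.0 + 2.0 * i / (n_particles - 1) for i in range(n_particles)]
    score = gaussian_score(2.0, 1.0)  # illustrative target N(2, 1)
    for _ in range(n_steps):
        particles = svgd_step(particles, score)
    return particles
```

    Calling `run_svgd()` moves the particles toward the target while the repulsive term keeps them spread out. In the paper's setting, the single global kernel would be replaced by kernels defined on each node's Markov blanket.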

    Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces

    In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms; (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees and remains in a stable region of the parameter space; (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner; and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization. We provide detailed answers to all these questions using the powerful framework of proximal operators. The key idea that emerges is the use of primal and dual spaces connected through a Legendre transform. This allows temporal difference updates to occur in dual spaces, yielding a variety of important technical advantages. The Legendre transform elegantly generalizes past algorithms for solving reinforcement learning problems, such as natural gradient methods, which we show relate closely to the previously unconnected framework of mirror descent methods. Equally importantly, proximal operator theory enables the systematic development of operator splitting methods that show how to safely and reliably decompose complex products of gradients that occur in recent variants of gradient-based temporal difference learning. This key technical innovation makes it possible to finally design "true" stochastic gradient methods for reinforcement learning. Finally, Legendre transforms enable a variety of other benefits, including modeling sparsity and domain geometry. Our work builds extensively on recent work on the convergence of saddle-point algorithms, and on the theory of monotone operators. Comment: 121 page
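
    The primal-dual idea mentioned above can be illustrated with a generic mirror descent step. This is a minimal sketch, not the paper's temporal difference algorithms: it assumes the negative-entropy mirror map on the probability simplex and a toy linear loss, and the names `entropic_mirror_descent_step` and `minimize_linear_loss` are hypothetical.

```python
import math


def entropic_mirror_descent_step(w, grad, step_size):
    """One mirror descent step with mirror map psi(w) = sum_i w_i * log(w_i)."""
    # Map the primal point to the dual space: theta = grad psi(w) = log(w) + 1.
    theta = [math.log(w_i) + 1.0 for w_i in w]
    # Take the gradient step in the dual space.
    theta = [t - step_size * g for t, g in zip(theta, grad)]
    # Map back with the gradient of the Legendre conjugate: exp(theta - 1).
    unnormalized = [math.exp(t - 1.0) for t in theta]
    # Normalizing is the Bregman (KL) projection onto the simplex.
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def minimize_linear_loss(costs, step_size=0.5, n_steps=50):
    """Minimize <costs, w> over the simplex, starting from uniform weights."""
    w = [1.0 / len(costs)] * len(costs)
    for _ in range(n_steps):
        # The gradient of a linear loss is the cost vector itself.
        w = entropic_mirror_descent_step(w, costs, step_size)
    return w
```

    For example, `minimize_linear_loss([1.0, 0.5, 2.0])` concentrates almost all weight on the second coordinate, the one with the smallest cost. Choosing a different mirror map changes the geometry of the update while leaving the dual-space step unchanged.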

    Convex variational methods for multiclass data segmentation on graphs

    Graph-based variational methods have recently been shown to be highly competitive for various classification problems of high-dimensional data, but are inherently difficult to handle from an optimization perspective. This paper proposes a convex relaxation for a certain set of graph-based multiclass data segmentation problems, featuring region homogeneity terms, supervised information and/or certain constraints or penalty terms acting on the class sizes. Particular applications include semi-supervised classification of high-dimensional data and unsupervised segmentation of unstructured 3D point clouds. Theoretical analysis indicates that the convex relaxation closely approximates the original NP-hard problems, and these observations are also confirmed experimentally. An efficient duality-based algorithm is developed that handles all constraints on the labeling function implicitly. Experiments on semi-supervised classification indicate consistently higher accuracies than related local minimization approaches, and considerably so when the training data are not uniformly distributed among the data set. The accuracies are also highly competitive against a wide range of other established methods on three benchmark datasets. Experiments on 3D point clouds acquired by a LaDAR in outdoor scenes demonstrate that the scenes can accurately be segmented into object classes such as vegetation, the ground plane and human-made structures.

    Variational reaction-diffusion systems for semantic segmentation

    A novel global energy model for multi-class semantic image segmentation is proposed that admits very efficient exact inference and derivative calculations for learning. Inference in this model is equivalent to MAP inference in a particular kind of vector-valued Gaussian Markov random field, and ultimately reduces to solving a linear system of linear PDEs known as a reaction-diffusion system. Solving this system can be achieved in time scaling near-linearly in the number of image pixels by reducing it to sequential FFTs, after a linear change of basis. The efficiency and differentiability of the model make it especially well-suited for integration with convolutional neural networks, even allowing it to be used in interior, feature-generating layers and stacked multiple times. Experimental results are shown demonstrating that the model can be employed profitably in conjunction with different convolutional net architectures, and that doing so compares favorably to joint training of a fully-connected CRF with a convolutional net.
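
    To make the Fourier-domain solve concrete, here is a small scalar, 1-D analogue, offered as a sketch under stated assumptions rather than the paper's vector-valued model. It assumes a periodic grid, the standard three-point discrete Laplacian, and a constant reaction coefficient, and it uses a naive O(n^2) DFT in place of an FFT to stay dependency-free. All function names are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def solve_reaction_diffusion(f, reaction=1.0):
    """Solve reaction*u[i] - (u[i-1] - 2*u[i] + u[i+1]) = f[i] on a periodic grid."""
    n = len(f)
    f_hat = dft(f)
    # The periodic Laplacian is diagonal in the Fourier basis with eigenvalues
    # 2*cos(2*pi*k/n) - 2, so the solve is a pointwise division per frequency.
    u_hat = [
        f_hat[k] / (reaction + 2.0 - 2.0 * math.cos(2.0 * math.pi * k / n))
        for k in range(n)
    ]
    return [z.real for z in dft(u_hat, inverse=True)]


def max_residual(u, f, reaction=1.0):
    """Largest absolute violation of the linear system, for checking a solution."""
    n = len(u)
    return max(
        abs(reaction * u[i] - (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n]) - f[i])
        for i in range(n)
    )
```

    With an FFT in place of `dft`, the same pointwise division gives the near-linear scaling the abstract refers to. A positive `reaction` keeps every denominator strictly positive, so the system is always solvable.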

    Discriminative Embeddings of Latent Variable Models for Structured Data

    Kernel classifiers and regressors designed for structured data, such as sequences, trees and graphs, have significantly advanced a number of interdisciplinary areas such as computational biology and drug design. Typically, kernels are designed beforehand for a data type, either exploiting statistics of the structures or making use of probabilistic generative models, and then a discriminative classifier is learned based on the kernels via convex optimization. However, such an elegant two-stage approach also limits kernel methods from scaling up to millions of data points and from exploiting discriminative information to learn feature representations. We propose structure2vec, an effective and scalable approach for structured data representation based on the idea of embedding latent variable models into feature spaces, and learning such feature spaces using discriminative information. Interestingly, structure2vec extracts features by performing a sequence of function mappings in a way similar to graphical model inference procedures, such as mean field and belief propagation. In applications involving millions of data points, we show that structure2vec runs 2 times faster and produces models that are 10,000 times smaller, while at the same time achieving state-of-the-art predictive performance. Comment: ICML 201

    A Tutorial on Deep Latent Variable Models of Natural Language

    There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these "deep latent variable" models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference. Comment: EMNLP 2018 Tutoria

    Advances in Variational Inference

    Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully used in various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
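
    The optimization problem behind VI can be stated compactly. The block below writes out the standard evidence decomposition and the mean field family; the symbols (observations x, latent variables z, variational distribution q, and M latent factors) are notation chosen here and do not appear in the abstract.

```latex
% Evidence decomposition: the ELBO plus the KL gap to the true posterior.
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)

% Mean field family: the variational distribution factorizes over the latents.
q(z) = \prod_{i=1}^{M} q_i(z_i)
```

    Because the KL term is non-negative and log p(x) does not depend on q, maximizing the ELBO over q is equivalent to minimizing the KL divergence to the posterior.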

    Optimal strategies for the control of autonomous vehicles in data assimilation

    We propose a method to compute optimal control paths for autonomous vehicles deployed for the purpose of inferring a velocity field. In addition to being advected by the flow, the vehicles are able to effect a fixed relative speed with arbitrary control over direction. It is this direction that is used as the basis for the locally optimal control algorithm presented here, with the objective formed from the variance trace of the expected posterior distribution. We present results for linear flows near hyperbolic fixed points.

    Bilevel approaches for learning of variational imaging models

    We review some recent learning approaches in variational imaging, based on bilevel optimisation, and emphasize the importance of their treatment in function space. The paper covers both analytical and numerical techniques. Analytically, we include results on the existence and structure of minimisers, as well as optimality conditions for their characterisation. Based on this information, Newton-type methods are studied for the solution of the problems at hand, combining them with sampling techniques in the case of large databases. The computational verification of the developed techniques is extensively documented, covering instances with different types of regularisers, several noise models, spatially dependent weights and large image databases.