A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling
This document aims to provide a review on learning with deep generative
models (DGMs), which is a highly active area in machine learning and, more
generally, artificial intelligence. This review is not meant to be a tutorial,
but when necessary, we provide self-contained derivations for completeness.
This review has two features. First, though there are different perspectives to
classify DGMs, we choose to organize this review from the perspective of
graphical modeling, because the learning methods for directed DGMs and
undirected DGMs are fundamentally different. Second, we differentiate model
definitions from model learning algorithms, since different learning algorithms
can be applied to solve the learning problem on the same model, and an
algorithm can be applied to learn different models. We thus separate model
definition and model learning, with more emphasis on reviewing, differentiating
and connecting different learning algorithms. We also discuss promising future
research directions.
Stein Variational Message Passing for Continuous Graphical Models
We propose a novel distributed inference algorithm for continuous graphical
models, by extending Stein variational gradient descent (SVGD) to leverage the
Markov dependency structure of the distribution of interest. Our approach
combines SVGD with a set of structured local kernel functions defined on the
Markov blanket of each node, which alleviates the curse of high dimensionality
and simultaneously yields a distributed algorithm for decentralized inference
tasks. We justify our method with theoretical analysis and show that the use of
local kernels can be viewed as a new type of localized approximation that
matches the target distribution on the conditional distributions of each node
over its Markov blanket. Our empirical results show that our method outperforms
a variety of baselines including standard MCMC and particle message passing
methods.
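As a point of reference for the method above, the sketch below implements one step of plain SVGD with a single global RBF kernel; the paper's contribution replaces this global kernel with local kernels defined on each node's Markov blanket. The function names and the median-heuristic bandwidth are our illustrative choices, not the paper's.

```python
import numpy as np

def svgd_step(X, grad_logp, step=0.1):
    """One Stein variational gradient descent update.

    X: (n, d) array of particles; grad_logp: maps (n, d) -> (n, d) scores.
    """
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]        # x_i - x_j, shape (n, n, d)
    sq = np.sum(diffs ** 2, axis=-1)             # squared pairwise distances
    h = np.median(sq) / np.log(n + 1) + 1e-8     # median-heuristic bandwidth
    K = np.exp(-sq / h)                          # RBF kernel matrix
    # Driving term: kernel-weighted average of the scores.
    drive = K @ grad_logp(X)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = (2/h) sum_j (x_i - x_j) k_ij.
    repulse = (2.0 / h) * np.sum(diffs * K[:, :, None], axis=1)
    return X + step * (drive + repulse) / n

# Toy usage: push badly initialized particles toward a standard 2-D Gaussian.
grad_logp = lambda X: -X                         # score of N(0, I)
X = np.random.randn(50, 2) * 3.0 + 5.0
for _ in range(500):
    X = svgd_step(X, grad_logp)
```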
Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces
In this paper, we set forth a new vision of reinforcement learning developed
by us over the past few years, one that yields mathematically rigorous
solutions to important questions that have long remained unresolved: (i) how
to design reliable, convergent, and robust reinforcement learning algorithms;
(ii) how to guarantee that reinforcement learning satisfies pre-specified
"safety" guarantees and remains in a stable region of the parameter space;
(iii) how to design "off-policy" temporal difference learning algorithms in a
reliable and stable manner; and finally (iv) how to integrate the study of
reinforcement learning into the rich theory of stochastic optimization. In
this paper, we provide detailed answers to all these questions
using the powerful framework of proximal operators.
The key idea that emerges is the use of primal-dual spaces connected through
a Legendre transform. This allows temporal difference updates to occur in dual
spaces, yielding a variety of important technical advantages. The
Legendre transform elegantly generalizes past algorithms for solving
reinforcement learning problems, such as natural gradient methods, which we
show relate closely to the previously unconnected framework of mirror descent
methods. Equally importantly, proximal operator theory enables the systematic
development of operator splitting methods that show how to safely and reliably
decompose complex products of gradients that occur in recent variants of
gradient-based temporal difference learning. This key technical innovation
makes it possible to finally design "true" stochastic gradient methods for
reinforcement learning. Finally, Legendre transforms enable a variety of other
benefits, including modeling sparsity and domain geometry. Our work builds
extensively on recent results on the convergence of saddle-point algorithms
and on the theory of monotone operators.
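For readers unfamiliar with the dual-space view, the following is the standard mirror descent update written in the primal-dual form the abstract alludes to; this is textbook material (Beck and Teboulle, 2003), not a reproduction of the paper's algorithms.

```latex
% Mirror descent via a Legendre transform: psi is a strongly convex mirror
% map, psi^* its convex conjugate, D_psi the induced Bregman divergence.
\theta_{t+1}
  = \nabla\psi^{*}\big(\nabla\psi(\theta_t) - \alpha_t \nabla f(\theta_t)\big)
  = \arg\min_{\theta}\;\alpha_t\langle\nabla f(\theta_t),\,\theta\rangle
      + D_{\psi}(\theta,\,\theta_t).
```

Choosing psi(theta) = (1/2)||theta||^2 recovers ordinary gradient descent, while an entropic psi gives exponentiated-gradient updates; the abstract's point is that temporal difference updates can likewise be carried out in the dual coordinates given by the gradient of psi.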
Convex variational methods for multiclass data segmentation on graphs
Graph-based variational methods have recently been shown to be highly competitive
for various classification problems of high-dimensional data, but are
inherently difficult to handle from an optimization perspective. This paper
proposes a convex relaxation for a certain set of graph-based multiclass data
segmentation problems, featuring region homogeneity terms, supervised
information and/or certain constraints or penalty terms acting on the class
sizes. Particular applications include semi-supervised classification of
high-dimensional data and unsupervised segmentation of unstructured 3D point
clouds. Theoretical analysis indicates that the convex relaxation closely
approximates the original NP-hard problems, and these observations are also
confirmed experimentally. An efficient duality-based algorithm is developed
that handles all constraints on the labeling function implicitly. Experiments
on semi-supervised classification indicate consistently higher accuracies than
related local minimization approaches, and considerably so when the training
data are not uniformly distributed among the data set. The accuracies are also
highly competitive against a wide range of other established methods on three
benchmark datasets. Experiments on 3D point clouds acquired by a LaDAR in
outdoor scenes demonstrate that the scenes can be accurately segmented into
object classes such as vegetation, the ground plane, and human-made
structures.
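To make the class of problems concrete, a generic simplex-relaxed energy of this type can be written as follows; this is a schematic form in our notation, and the paper's functional additionally features region homogeneity terms and class-size constraints.

```latex
% Schematic graph-based multiclass energy with the binary labels
% u_k(i) in {0,1} relaxed to the probability simplex Delta_K.
\min_{u}\;\sum_{k=1}^{K}\Big(\sum_{(i,j)\in E} w_{ij}\,\lvert u_k(i)-u_k(j)\rvert
   \;+\; \sum_{i\in V} f_k(i)\,u_k(i)\Big)
\quad\text{s.t.}\quad u(i)\in\Delta_K
   = \Big\{p\in[0,1]^{K} : \textstyle\sum_{k} p_k = 1\Big\},
```

where w_{ij} are edge weights and f_k encodes fidelity to the supervised information; relaxing the indicator constraint to the simplex is what turns the NP-hard combinatorial labeling into a convex problem.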
Variational reaction-diffusion systems for semantic segmentation
A novel global energy model for multi-class semantic image segmentation is
proposed that admits very efficient exact inference and derivative calculations
for learning. Inference in this model is equivalent to MAP inference in a
particular kind of vector-valued Gaussian Markov random field, and ultimately
reduces to solving a system of linear PDEs known as a reaction-diffusion
system. This system can be solved in time that scales near-linearly in the
number of image pixels by reducing it to sequential FFTs after a linear
change of basis. The efficiency and differentiability of the model make it
especially well-suited for integration with convolutional neural networks, even
allowing it to be used in interior, feature-generating layers and stacked
multiple times. Experimental results demonstrate that the model can be
employed profitably in conjunction with different convolutional net
architectures, and that doing so compares favorably to joint training of a
fully-connected CRF with a convolutional net.
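The near-linear-time claim rests on the fact that, once the channels are decoupled by a change of basis, each scalar equation becomes a constant-coefficient linear PDE that the Fourier transform diagonalizes. The snippet below solves the single-channel analogue, a screened diffusion equation (I - lam*Laplacian) u = f, on a periodic grid; the function name and the periodic boundary assumption are ours, not the paper's.

```python
import numpy as np

def solve_screened_diffusion(f, lam):
    """Solve (I - lam * Laplacian) u = f on a periodic grid with one FFT pair.

    Single-channel analogue of the reaction-diffusion solve; the full model
    couples label channels and is first decoupled by a linear change of basis.
    """
    H, W = f.shape
    ky = 2 * np.pi * np.fft.fftfreq(H)
    kx = 2 * np.pi * np.fft.fftfreq(W)
    # Spectrum of the Laplacian on the periodic grid: -(kx^2 + ky^2).
    lap = -(ky[:, None] ** 2 + kx[None, :] ** 2)
    u_hat = np.fft.fft2(f) / (1.0 - lam * lap)   # denominator >= 1, well-posed
    return np.real(np.fft.ifft2(u_hat))

# Usage: smooth a noisy per-class score map.
scores = np.random.randn(64, 64)
smoothed = solve_screened_diffusion(scores, lam=5.0)
```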
Discriminative Embeddings of Latent Variable Models for Structured Data
Kernel classifiers and regressors designed for structured data, such as
sequences, trees and graphs, have significantly advanced a number of
interdisciplinary areas such as computational biology and drug design.
Typically, kernels are designed beforehand for a given data type, either
exploiting statistics of the structures or making use of probabilistic
generative models, and a discriminative classifier is then learned on top of
the kernels via convex optimization. However, this elegant two-stage approach
has limited kernel methods' ability to scale to millions of data points and
to exploit discriminative information for learning feature representations.
We propose structure2vec, an effective and scalable approach for structured
data representation based on the idea of embedding latent variable models into
feature spaces, and learning such feature spaces using discriminative
information. Interestingly, structure2vec extracts features by performing a
sequence of function mappings in a way similar to graphical model inference
procedures, such as mean field and belief propagation. In applications
involving millions of data points, we showed that structure2vec runs 2 times
faster and produces models that are 10,000 times smaller, while at the same
time achieving state-of-the-art predictive performance.
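The "sequence of function mappings" can be pictured as the fixed-point iteration below, a schematic mean-field-style embedding update in the spirit of structure2vec; the parameter shapes, tanh nonlinearity, and sum pooling are illustrative assumptions, and the real model learns W1 and W2 discriminatively rather than fixing them.

```python
import numpy as np

def structure2vec_embed(X, adj, W1, W2, T=4):
    """Mean-field-style embedding iteration (schematic structure2vec variant).

    X:  (n, d_in) node features; adj: (n, n) binary adjacency matrix;
    W1: (d, d_in), W2: (d, d) parameters (random here, for illustration).
    """
    mu = np.zeros((X.shape[0], W1.shape[0]))
    for _ in range(T):
        # Each node aggregates its neighbors' current embeddings,
        # mimicking one round of mean-field message passing.
        mu = np.tanh(X @ W1.T + (adj @ mu) @ W2.T)
    return mu.sum(axis=0)  # pooled graph-level representation

# Toy usage on a 4-node path graph.
rng = np.random.default_rng(0)
adj = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], float)
X = rng.normal(size=(4, 3))
W1, W2 = rng.normal(size=(8, 3)), rng.normal(size=(8, 8))
g = structure2vec_embed(X, adj, W1, W2)
```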
A Tutorial on Deep Latent Variable Models of Natural Language
There has been much recent, exciting work on combining the complementary
strengths of latent variable models and deep learning. Latent variable modeling
makes it easy to explicitly specify model constraints through conditional
independence properties, while deep learning makes it possible to parameterize
these conditional likelihoods with powerful function approximators. While these
"deep latent variable" models provide a rich, flexible framework for modeling
many real-world phenomena, difficulties exist: deep parameterizations of
conditional likelihoods usually make posterior inference intractable, and
latent variable objectives often complicate backpropagation by introducing
points of non-differentiability. This tutorial explores these issues in depth
through the lens of variational inference.
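The central object in this setting is the evidence lower bound (ELBO), which replaces the intractable log-marginal likelihood with a tractable surrogate; in standard notation:

```latex
% ELBO for a deep latent variable model p_theta(x, z) = p_theta(x|z) p(z),
% with amortized variational posterior q_phi(z|x):
\log p_\theta(x)
  \;\ge\; \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big]
        - \mathrm{KL}\big(q_\phi(z\mid x)\,\big\|\,p(z)\big)
  \;=\; \mathrm{ELBO}(\theta,\phi;x).
```

The gap in the inequality is exactly KL(q_phi(z|x) || p_theta(z|x)), which is why maximizing the ELBO over phi performs approximate posterior inference.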
Advances in Variational Inference
Many modern unsupervised or semi-supervised machine learning algorithms rely
on Bayesian probabilistic models. These models are usually intractable and thus
require approximate inference. Variational inference (VI) lets us approximate a
high-dimensional Bayesian posterior with a simpler variational distribution by
solving an optimization problem. This approach has been successfully used in
various models and large-scale applications. In this review, we give an
overview of recent trends in variational inference. We first introduce standard
mean field variational inference, then review recent advances focusing on the
following aspects: (a) scalable VI, which includes stochastic approximations,
(b) generic VI, which extends the applicability of VI to a large class of
otherwise intractable models, such as non-conjugate models, (c) accurate VI,
which includes variational models beyond the mean field approximation or with
atypical divergences, and (d) amortized VI, which implements the inference over
local latent variables with inference networks. Finally, we provide a summary
of promising future research directions.
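As a concrete illustration of point (d), the sketch below computes a Monte Carlo ELBO estimate with the reparameterization trick, the mechanism behind amortized VI with inference networks; the encoder/decoder stand-ins and the linear-Gaussian toy model are our assumptions, not from the review.

```python
import numpy as np

def elbo_estimate(x, encoder, decoder, n_samples=8):
    """Monte Carlo ELBO with the reparameterization trick (amortized VI sketch).

    encoder(x) -> (mu, log_var) of a diagonal-Gaussian q(z|x);
    decoder(z, x) -> log p(x|z). Both are assumption-level stand-ins.
    """
    mu, log_var = encoder(x)
    std = np.exp(0.5 * log_var)
    eps = np.random.randn(n_samples, mu.size)
    z = mu + std * eps                               # z ~ q(z|x), reparameterized
    log_lik = np.mean([decoder(zi, x) for zi in z])  # E_q[log p(x|z)]
    # KL(q || p) in closed form for diagonal Gaussians against N(0, I).
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return log_lik - kl

# Toy stand-ins: a linear-Gaussian model, purely for illustration.
d_z, d_x = 2, 5
W = np.random.randn(d_x, d_z)
encoder = lambda x: (np.zeros(d_z), np.zeros(d_z))        # q(z|x) = N(0, I)
decoder = lambda z, x: -0.5 * np.sum((x - W @ z) ** 2)    # log N(x; Wz, I) up to const.
print(elbo_estimate(np.random.randn(d_x), encoder, decoder))
```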
Optimal strategies for the control of autonomous vehicles in data assimilation
We propose a method to compute optimal control paths for autonomous vehicles
deployed for the purpose of inferring a velocity field. In addition to being
advected by the flow, the vehicles are able to effect a fixed relative speed
with arbitrary control over direction. It is this direction that is used as the
basis for the locally optimal control algorithm presented here, with an
objective formed from the variance trace of the expected posterior
distribution. We present results for linear flows near hyperbolic fixed
points.
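In symbols (ours, not the paper's notation), the setup reads as an optimal design problem over the heading control:

```latex
% Vehicle kinematics and the locally optimal control objective: minimize the
% trace of the posterior covariance of the unknown velocity field v, given
% observations y_u collected along the controlled path.
\dot{x}(t) = v\big(x(t),t\big) + s\,\big(\cos u(t),\,\sin u(t)\big),
\qquad
u^{*} = \arg\min_{u(\cdot)}\;
  \mathbb{E}\Big[\operatorname{tr}\,\mathrm{Cov}\big(v \,\big|\, y_{u}\big)\Big],
```

where s is the fixed relative speed and u(t) the controlled heading; minimizing the variance trace corresponds to A-optimality in experimental design.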
Bilevel approaches for learning of variational imaging models
We review some recent learning approaches in variational imaging, based on
bilevel optimisation, and emphasize the importance of their treatment in
function space. The paper covers both analytical and numerical techniques.
Analytically, we include results on the existence and structure of minimisers,
as well as optimality conditions for their characterisation. Based on this
information, Newton-type methods are studied for the solution of the problems
at hand, combining them with sampling techniques in case of large databases.
The computational verification of the developed techniques is extensively
documented, covering instances with different types of regularisers, several
noise models, spatially dependent weights, and large image databases.
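The generic shape of the bilevel problems under review can be written as follows; this schematic uses a quadratic fidelity and a single regularisation weight lambda for concreteness, whereas the paper also treats spatially dependent weights and other noise models.

```latex
% Bilevel learning of a variational denoising model from ground-truth pairs
% (f_k, u_k^dag): the upper level scores reconstruction quality, the lower
% level is the variational imaging model itself.
\min_{\lambda \ge 0}\;\sum_{k}\big\|u_\lambda(f_k) - u_k^{\dagger}\big\|_{L^2}^{2}
\quad\text{s.t.}\quad
u_\lambda(f_k) \in \arg\min_{u}\;
  \tfrac{1}{2}\,\|u - f_k\|_{L^2}^{2} + \lambda\,R(u),
```

with R a convex regulariser such as total variation; treating this problem in function space is what makes the existence results and optimality conditions delicate.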