Efficient Gradient-Free Variational Inference using Policy Search
Inference from complex distributions is a common problem in machine learning and is needed for many Bayesian methods. We propose an efficient, gradient-free method for learning general GMM approximations of multimodal distributions based on recent insights from stochastic search methods. Our method establishes information-geometric trust regions to ensure efficient exploration of the sampling space and stability of the GMM updates, allowing for efficient estimation of multivariate Gaussian variational distributions. For GMMs, we apply a variational lower bound to decompose the learning objective into sub-problems given by learning the individual mixture components and the coefficients. The number of mixture components is adapted online to allow for arbitrarily exact approximations. We demonstrate on several domains that we can learn significantly better approximations than competing variational inference methods and that the quality of samples drawn from our approximations is on par with samples created by state-of-the-art MCMC samplers that require significantly more computational resources.
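As a point of reference, the quantity such GMM approximations optimize is the evidence lower bound, which can be estimated purely from samples of the mixture. The sketch below (plain NumPy/SciPy) illustrates that sample-based estimate only, not the authors' trust-region update; log_p_tilde is an assumed placeholder for the unnormalized, vectorized target density.

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def gmm_logpdf(x, weights, means, covs):
        # log q(x) for a Gaussian mixture with the given weights, means and covariances
        comp_logs = [np.log(w) + multivariate_normal.logpdf(x, m, c)
                     for w, m, c in zip(weights, means, covs)]
        return logsumexp(np.stack(comp_logs, axis=0), axis=0)

    def elbo_estimate(log_p_tilde, weights, means, covs, n_samples=1000, rng=None):
        # Monte Carlo estimate of E_q[log p_tilde(x) - log q(x)], a lower bound on the log normalizer
        rng = np.random.default_rng() if rng is None else rng
        ks = rng.choice(len(weights), size=n_samples, p=weights)
        xs = np.array([rng.multivariate_normal(means[k], covs[k]) for k in ks])
        return np.mean(log_p_tilde(xs) - gmm_logpdf(xs, weights, means, covs))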
Versatile Inverse Reinforcement Learning via Cumulative Rewards
Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
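The abstract specifies only that the recovered reward is a sum of iteratively trained discriminators; the following is a minimal structural sketch of that idea. The discriminator interface and the use of per-sample scalar scores are assumptions for illustration, not details taken from the paper.

    class SummedDiscriminatorReward:
        """Reward represented as the sum of scores from iteratively trained discriminators (sketch)."""

        def __init__(self):
            self.discriminators = []

        def add_discriminator(self, discriminator):
            # after each training iteration, the newly trained discriminator is appended
            self.discriminators.append(discriminator)

        def __call__(self, features):
            # hypothetical interface: each discriminator maps features to a scalar score
            return sum(d(features) for d in self.discriminators)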
Approximating multivariate posterior distribution functions from Monte Carlo samples for sequential Bayesian inference
An important feature of Bayesian statistics is the opportunity to do sequential inference: the posterior distribution obtained after seeing a dataset can be used as prior for a second inference. However, when Monte Carlo sampling methods are used for inference, we only have a set of samples from the posterior distribution. To do sequential inference, we then either have to evaluate the second posterior at only these locations and reweight the samples accordingly, or we can estimate a functional description of the posterior probability distribution from the samples and use that as prior for the second inference. Here, we investigated to what extent we can obtain an accurate joint posterior from two datasets if the inference is done sequentially rather than jointly, under the condition that each inference step is done using Monte Carlo sampling. To test this, we evaluated the accuracy of kernel density estimates, Gaussian mixtures, vine copulas and Gaussian processes in approximating posterior distributions, and then tested whether these approximations can be used in sequential inference. In low dimensionality, Gaussian processes are more accurate, whereas in higher dimensionality Gaussian mixtures or vine copulas perform better. In our test cases, posterior approximations are preferable over direct sample reweighting, although joint inference is still preferable over sequential inference. Since the performance is case-specific, we provide an R package, mvdens, with a unified interface for the density approximation methods.
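As an illustration of the sequential-inference recipe above, one of the density families mentioned, a Gaussian mixture, can be fitted to the first posterior's samples and reused as the log-prior for the second inference. This sketch uses Python with scikit-learn rather than the mvdens R package, and log_lik_second is a placeholder for the second dataset's log-likelihood.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_posterior_approximation(posterior_samples, n_components=5):
        # fit a functional description of the first posterior from its Monte Carlo samples
        return GaussianMixture(n_components=n_components, covariance_type="full").fit(posterior_samples)

    def sequential_log_posterior(theta, posterior_gmm, log_lik_second):
        # approximate prior: density fitted to the samples of the first posterior
        log_prior = posterior_gmm.score_samples(np.atleast_2d(theta))[0]
        return log_prior + log_lik_second(theta)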
Expected Information Maximization: Using the I-Projection for Mixture Density Estimation
Modelling highly multi-modal data is a challenging problem in machine learning. Most algorithms are based on maximizing the likelihood, which corresponds to the M(oment)-projection of the data distribution to the model distribution. The M-projection forces the model to average over modes it cannot represent. In contrast, the I(nformation)-projection ignores such modes in the data and concentrates on the modes the model can represent. Such behavior is appealing whenever we deal with highly multi-modal data where modelling single modes correctly is more important than covering all the modes. Despite this advantage, the I-projection is rarely used in practice due to the lack of algorithms that can efficiently optimize it based on data. In this work, we present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection solely based on samples for general latent variable models, where we focus on Gaussian mixture models and Gaussian mixtures of experts. Our approach applies a variational upper bound to the I-projection objective which decomposes the original objective into single objectives for each mixture component as well as for the coefficients, allowing an efficient optimization. Similar to GANs, our approach employs discriminators but uses a more stable optimization procedure based on a tight upper bound. We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches, and we illustrate the effectiveness of our approach for modelling multi-modal behavior on two pedestrian and traffic prediction datasets.
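For reference, the two projections contrasted above are the standard KL divergences minimized in opposite directions; with data distribution p and model q,

\[
q^{*}_{\mathrm{M}} = \arg\min_{q} \mathrm{KL}(p \,\|\, q) = \arg\min_{q} \mathbb{E}_{p(x)}\big[\log p(x) - \log q(x)\big], \qquad
q^{*}_{\mathrm{I}} = \arg\min_{q} \mathrm{KL}(q \,\|\, p) = \arg\min_{q} \mathbb{E}_{q(x)}\big[\log q(x) - \log p(x)\big].
\]

The expectation under q in the I-projection is what allows the model to ignore modes of p that it cannot represent, but it also means the objective is not a plain average over the data, which is the difficulty the sample-based upper bound in EIM is designed to address.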
Assisted Teleoperation in Changing Environments with a Mixture of Virtual Guides
Haptic guidance is a powerful technique to combine the strengths of humans and autonomous systems for teleoperation. The autonomous system can provide haptic cues to enable the operator to perform precise movements; the operator can interfere with the plan of the autonomous system, leveraging his/her superior cognitive capabilities. However, providing haptic cues such that the individual strengths are not impaired is challenging, because low forces provide little guidance, whereas strong forces can hinder the operator in realizing his/her plan. Based on variational inference, we learn a Gaussian mixture model (GMM) over trajectories to accomplish a given task. The learned GMM is used to construct a potential field which determines the haptic cues. The potential field smoothly changes during teleoperation based on our updated belief over the plans and their respective phases. Furthermore, new plans are learned online when the operator does not follow any of the proposed plans, or after changes in the environment. User studies confirm that our framework helps users perform teleoperation tasks more accurately than without haptic cues and, in some cases, faster. Moreover, we demonstrate the use of our framework to help a subject teleoperate a 7 DoF manipulator in a pick-and-place task.
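The abstract does not spell out the form of the potential field; one natural reading, which the sketch below assumes, is a potential equal to the negative log-density of the learned trajectory GMM, so that the haptic cue becomes a force pulling the operator toward high-density plans. This is an illustrative assumption, not the paper's exact construction.

    import numpy as np

    def gmm_guidance_force(x, weights, means, covs, gain=1.0):
        # Assumed potential U(x) = -log sum_k w_k N(x; mu_k, Sigma_k); haptic cue = -gain * grad U(x).
        log_terms, grads = [], []
        for w, mu, cov in zip(weights, means, covs):
            prec = np.linalg.inv(cov)
            diff = x - mu
            log_n = -0.5 * (diff @ prec @ diff + np.log(np.linalg.det(2.0 * np.pi * cov)))
            log_terms.append(np.log(w) + log_n)
            grads.append(-prec @ diff)          # gradient of log N(x; mu, Sigma)
        log_terms = np.array(log_terms)
        resp = np.exp(log_terms - log_terms.max())
        resp /= resp.sum()                      # responsibilities q(k | x)
        grad_log_density = sum(r * g for r, g in zip(resp, grads))
        return gain * grad_log_density          # attracts the operator toward high-density regions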
Information Maximizing Curriculum: A Curriculum-Based Approach for Training Mixtures of Experts
Mixtures of Experts (MoE) are known for their ability to learn complex conditional distributions with multiple modes. However, despite their potential, these models are challenging to train and often tend to produce poor performance, explaining their limited popularity. Our hypothesis is that this under-performance is a result of the commonly utilized maximum likelihood (ML) optimization, which leads to mode averaging and a higher likelihood of getting stuck in local maxima. We propose a novel curriculum-based approach to learning mixture models in which each component of the MoE is able to select its own subset of the training data for learning. This approach allows for independent optimization of each component, resulting in a more modular architecture that enables the addition and deletion of components on the fly, leading to an optimization less susceptible to local optima. The curricula can ignore data points from modes not represented by the MoE, reducing the mode-averaging problem. To achieve good data coverage, we couple the optimization of the curricula with a joint entropy objective and optimize a lower bound of this objective. We evaluate our curriculum-based approach on a variety of multimodal behavior learning tasks and demonstrate its superiority over competing methods for learning MoE models and conditional generative models.
Projections for Approximate Policy Iteration Algorithms
Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator; it has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring an increase of the policy return during the policy update often requires constraining the change in the action distribution. Several approximations exist in the literature to solve this constrained policy update problem. In this paper, we propose to improve over such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.
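As a small, self-contained example of the kind of projection discussed (restricted here to the mean of a Gaussian policy with fixed covariance, which is far simpler than the general projections the paper introduces), a violated KL trust region can be repaired by rescaling the mean change back onto the constraint boundary:

    import numpy as np

    def project_mean(old_mean, new_mean, cov, eps):
        # For fixed covariance, KL(new || old) = 0.5 * d^T Sigma^{-1} d with d = new_mean - old_mean.
        # If the trust-region bound eps is violated, rescale d so the KL lies exactly on the boundary.
        d = new_mean - old_mean
        kl = 0.5 * d @ np.linalg.solve(cov, d)
        if kl <= eps:
            return new_mean
        return old_mean + d * np.sqrt(eps / kl)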