29 research outputs found
Monte Carlo Chess
MCC, a UCT-based Chess engine, was created to test the performance of Monte-Carlo
Tree Search for the game of Chess. Mainly through modifications that increase the accuracy of the
simulation strategy, the performance of the base implementation was improved by approximately 864 Elo. MCC still performed too poorly to compete with Minimax-based chess programs or to suffer seriously from search traps.
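For context, UCT descends the search tree by applying the UCB1 rule at each node, which balances a child's average reward against how rarely it has been visited. The sketch below shows only that selection step and is not MCC's code; the function name `uct_select`, the `(visits, total_reward)` layout, and the exploration constant `c` are illustrative assumptions.

```python
import math

def uct_select(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children: list of (visits, total_reward) tuples (hypothetical layout).
    The parent visit count is approximated by the sum of child visits.
    """
    parent_visits = sum(visits for visits, _ in children)
    best_index, best_score = None, -math.inf
    for i, (visits, total_reward) in enumerate(children):
        if visits == 0:
            return i  # try every move once before comparing scores
        exploit = total_reward / visits
        explore = c * math.sqrt(math.log(parent_visits) / visits)
        if exploit + explore > best_score:
            best_index, best_score = i, exploit + explore
    return best_index
```

With equal visit counts the rule picks the child with the better average reward; a rarely visited child receives a large exploration bonus and can be preferred even with a lower average.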
Trust-region variational inference with Gaussian mixture models
Funding Information: This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 645582 (RoMaNS). Calculations for this research were conducted on the Lichtenberg high performance computer of the TU Darmstadt. Publisher Copyright: © 2020 Oleg Arenz, Mingjun Zhong and Gerhard Neumann. Peer reviewed
Expected Information Maximization: Using the I-Projection for Mixture Density Estimation
Modelling highly multi-modal data is a challenging problem in machine learning.
Most algorithms are based on maximizing the likelihood, which corresponds
to the M(oment)-projection of the data distribution to the model distribution.
The M-projection forces the model to average over modes it cannot represent.
In contrast, the I(nformation)-projection ignores such modes in the data and concentrates on the modes the model can represent.
Such behavior is appealing whenever we deal with highly multi-modal data where modelling single modes correctly is more important than covering all the modes.
Despite this advantage, the I-projection is rarely used in practice due to the lack of algorithms that can efficiently optimize it based on data.
In this work, we present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection solely based on samples for general latent variable models, where we focus on Gaussian mixture models and Gaussian mixtures of experts.
Our approach applies a variational upper bound to the I-projection objective which decomposes the original objective into single objectives for each mixture component as well as for the coefficients, allowing efficient optimization. Similar to GANs, our approach employs discriminators but uses a more stable optimization procedure, using a tight upper bound.
We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches and we illustrate the effectiveness of our approach for modelling multi-modal behavior on two pedestrian and traffic prediction datasets.
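The difference between the two projections can be checked numerically on a toy problem. The sketch below is not EIM; it only compares two hand-picked single-Gaussian models against an assumed bimodal target, using grid integration. The target parameters, the function names, and the integration range are all illustrative assumptions.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def target(x):
    # Assumed toy data distribution: two well-separated modes at -3 and +3.
    return 0.5 * normal_pdf(x, -3.0, 0.5) + 0.5 * normal_pdf(x, 3.0, 0.5)

def q_moment(x):
    # Moment-matched Gaussian: mean 0, variance 0.5**2 + 3**2 = 9.25.
    return normal_pdf(x, 0.0, math.sqrt(9.25))

def q_mode(x):
    # Gaussian that covers only the right-hand mode.
    return normal_pdf(x, 3.0, 0.5)

def kl(p, q, lo=-10.0, hi=10.0, n=4001):
    """Approximate KL(p || q) with a Riemann sum on a uniform grid."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * dx
        px, qx = p(x), q(x)
        total += px * (math.log(px) - math.log(qx)) * dx
    return total

# M-projection objective KL(data || model): the averaging model scores better.
m_moment, m_mode = kl(target, q_moment), kl(target, q_mode)
# I-projection objective KL(model || data): the single-mode model scores better.
i_moment, i_mode = kl(q_moment, target), kl(q_mode, target)
```

Under these assumptions, `m_moment < m_mode` while `i_mode < i_moment`: the likelihood-style objective rewards the broad Gaussian that places mass between the modes, whereas the I-projection objective rewards the Gaussian that represents one mode accurately.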
Assisted Teleoperation in Changing Environments with a Mixture of Virtual Guides
Haptic guidance is a powerful technique to combine the strengths of humans
and autonomous systems for teleoperation. The autonomous system can provide
haptic cues to enable the operator to perform precise movements; the operator
can interfere with the plan of the autonomous system leveraging his/her
superior cognitive capabilities. However, providing haptic cues such that the
individual strengths are not impaired is challenging because low forces provide
little guidance, whereas strong forces can hinder the operator in realizing
his/her plan. Based on variational inference, we learn a Gaussian mixture model
(GMM) over trajectories to accomplish a given task. The learned GMM is used to
construct a potential field which determines the haptic cues. The potential
field smoothly changes during teleoperation based on our updated belief over
the plans and their respective phases. Furthermore, new plans are learned
online when the operator does not follow any of the proposed plans, or after
changes in the environment. User studies confirm that our framework helps users
perform teleoperation tasks more accurately than without haptic cues and, in
some cases, faster. Moreover, we demonstrate the use of our framework to help a
subject teleoperate a 7 DoF manipulator in a pick-and-place task. Comment: 19 pages, 9 figures
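One simple way to turn a Gaussian mixture into a guidance force is to treat the negative log-density as a potential and push along its negative gradient. The abstract does not state how the potential field is built, so this construction is an assumption made for illustration, reduced to one dimension; `gmm_guidance_force` and `gain` are hypothetical names.

```python
import math

def gmm_guidance_force(x, weights, means, stds, gain=1.0):
    """Return gain * d/dx log p(x) for a 1-D Gaussian mixture p.

    Assumed potential: U(x) = -log p(x), so the force is -dU/dx.
    """
    # Log of each weighted component density at x.
    log_terms = [
        math.log(w) - 0.5 * ((x - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    ]
    # Responsibilities, computed with the max-shift trick for stability.
    max_log = max(log_terms)
    unnormalized = [math.exp(t - max_log) for t in log_terms]
    total = sum(unnormalized)
    responsibilities = [u / total for u in unnormalized]
    # Each component pulls toward its mean, scaled by its responsibility.
    return gain * sum(
        r * (m - x) / s ** 2 for r, m, s in zip(responsibilities, means, stds)
    )
```

In this sketch the force vanishes midway between two symmetric components and pulls toward the nearest mean elsewhere, which mirrors the trade-off in the abstract: `gain` sets how strongly the operator is guided.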
State-regularized policy search for linearized dynamical systems
Trajectory-Centric Reinforcement Learning and Trajectory
Optimization methods optimize a sequence of feedback controllers
by taking advantage of local approximations of
model dynamics and cost functions. Stability of the policy update
is a major issue for these methods, rendering them hard
to apply for highly nonlinear systems. Recent approaches
combine classical Stochastic Optimal Control methods with
information-theoretic bounds to control the step-size of the
policy update and could even be used to train nonlinear deep
control policies. These methods bound the relative entropy
between the new and the old policy to ensure a stable policy
update. However, despite the bound in policy space, the
state distributions of two consecutive policies can still differ
significantly, rendering the used local approximate models invalid.
To alleviate this issue we propose enforcing a relative
entropy constraint not only on the policy update, but also on
the update of the state distribution, around which the dynamics
and cost are being approximated. We present a derivation
of the closed-form policy update and show that our approach
outperforms related methods on two nonlinear and highly dynamic
simulated systems.
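The relative entropy bound mentioned above has a closed form when both distributions are Gaussian. The sketch below evaluates it for one-dimensional Gaussians and checks a candidate update against a bound; it does not implement the paper's policy update or its state-distribution constraint. The direction KL(new || old), the bound `epsilon`, and the function names are assumptions.

```python
import math

def kl_gaussian(mean_new, std_new, mean_old, std_old):
    """KL(N(mean_new, std_new^2) || N(mean_old, std_old^2)) in closed form."""
    return (
        math.log(std_old / std_new)
        + (std_new ** 2 + (mean_new - mean_old) ** 2) / (2.0 * std_old ** 2)
        - 0.5
    )

def within_trust_region(mean_new, std_new, mean_old, std_old, epsilon=0.1):
    """True if the candidate update stays inside the assumed KL bound."""
    return kl_gaussian(mean_new, std_new, mean_old, std_old) <= epsilon
```

For example, shifting the mean of a unit-variance Gaussian by 0.1 gives a divergence of 0.005 and passes a bound of 0.1, whereas shifting it by 1.0 gives 0.5 and is rejected.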
Probabilistic approach to physical object disentangling
Physically disentangling entangled objects from each other is a problem
encountered in waste segregation or in any task that requires disassembly of
structures. Often there are no object models, and, especially with cluttered
irregularly shaped objects, the robot cannot create a model of the scene due
to occlusion. One of our key insights is that based on previous sensory input
we are only interested in moving an object out of the disentanglement around
obstacles. That is, we only need to know where the robot can successfully move
in order to plan the disentangling. Due to the uncertainty, we integrate
information about blocked movements into a probability map. The map defines the
probability of the robot successfully moving to a specific configuration. Using
the failure probability of a sequence of movements as the cost, we can then plan and
execute disentangling iteratively. Since our approach circumvents only
previously encountered obstacles, new movements will yield information about
unknown obstacles that block movement until the robot has learned to circumvent
all obstacles and disentangling succeeds. In the experiments, we use a special
probabilistic version of the Rapidly exploring Random Tree (RRT) algorithm for
planning and demonstrate successful disentanglement of objects both in 2-D and
3-D simulation and on a KUKA LBR 7-DOF robot. Moreover, our approach
outperforms baseline methods.
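The cost described above can be illustrated with a small calculation. Assuming, for this sketch only, that the movements in a sequence succeed independently, the failure probability of the sequence is one minus the product of the per-movement success probabilities read from the map. The function names and the independence assumption are not taken from the paper.

```python
import math

def sequence_failure_probability(success_probs):
    """Probability that at least one movement in the sequence fails.

    success_probs: per-movement success probabilities, assumed independent.
    """
    return 1.0 - math.prod(success_probs)

def cheapest_sequence(candidates):
    """Pick the candidate sequence with the lowest failure probability."""
    return min(candidates, key=sequence_failure_probability)
```

Two movements with success probabilities 0.9 and 0.8 fail as a sequence with probability 1 - 0.72 = 0.28, so a planner using this cost prefers sequences that avoid configurations where movements were previously blocked.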
Trust-Region Variational Inference with Gaussian Mixture Models
Many methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can be subsequently used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search by using information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective enabling us to update the components independently. Our use of the lower bound ensures convergence to a stationary point of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multimodal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude fewer computational resources.
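The deletion step mentioned above can be sketched as dropping mixture components whose weight falls below a cutoff and renormalizing the remainder. This is an illustration only: the cutoff `threshold`, the function name `prune_components`, and the use of scalar means are assumptions, and the abstract does not specify how "negligible" is defined.

```python
def prune_components(weights, means, threshold=1e-3):
    """Drop components with weight below threshold and renormalize.

    Returns (new_weights, new_means); new_weights sums to 1.
    """
    kept = [(w, m) for w, m in zip(weights, means) if w >= threshold]
    if not kept:
        raise ValueError("threshold removes every component")
    total = sum(w for w, _ in kept)
    return [w / total for w, _ in kept], [m for _, m in kept]
```

Pruning keeps the per-iteration cost proportional to the number of components that still carry mass, which matters when new components are also being added online.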
Optimal Control and Inverse Optimal Control by Distribution Matching
Optimal control is a powerful approach to achieve optimal behavior. However, it typically requires a manual specification of a cost function which often contains several objectives, such as reaching goal positions at different time steps or energy efficiency. Manually trading off these objectives is often difficult and requires considerable engineering effort. In this paper, we present a new approach to specify optimal behavior. We directly specify the desired behavior by a distribution over future states or features of the states. For example, the experimenter could choose to reach certain mean positions with given accuracy/variance at specified time steps. Our approach also unifies optimal control and inverse optimal control in one framework. Given a desired state distribution, we estimate a cost function such that the optimal controller matches the desired distribution. If the desired distribution is estimated from expert demonstrations, our approach performs inverse optimal control. We evaluate our approach on several optimal and inverse optimal control tasks on non-linear systems using incremental linearizations similar to differential dynamic programming approaches.