Policy search with high-dimensional context variables
Direct contextual policy search methods learn to improve policy
parameters and simultaneously generalize these parameters
to different context or task variables. However, learning
from high-dimensional context variables, such as camera images,
is still a prominent problem in many real-world tasks.
A naive application of unsupervised dimensionality reduction
methods to the context variables, such as principal component
analysis, is insufficient as task-relevant input may be ignored.
In this paper, we propose a contextual policy search method in
the model-based relative entropy stochastic search framework
with integrated dimensionality reduction. We learn a model of
the reward that is locally quadratic in both the policy parameters
and the context variables. Furthermore, we perform supervised
linear dimensionality reduction on the context variables
by nuclear norm regularization. The experimental results
show that the proposed method outperforms naive dimensionality
reduction via principal component analysis and
a state-of-the-art contextual policy search method.
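The abstract above names nuclear norm regularization but does not say how the regularized problem is solved. As a minimal sketch of one standard building block, and not the authors' procedure, the following shows singular value soft-thresholding, which is the proximal operator of the nuclear norm. The function name `nuclear_norm_prox` and the threshold `tau` are illustrative assumptions.

```python
import numpy as np


def nuclear_norm_prox(matrix, tau):
    """Proximal operator of tau * (nuclear norm) at `matrix`.

    Shrinks every singular value by `tau` and clips at zero, which
    pushes the result towards low rank. Names are hypothetical; this
    is a generic building block, not the method from the abstract.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of u by the shrunken singular values, then recombine.
    return (u * s_shrunk) @ vt


# Usage: small singular values are removed, lowering the rank.
M = np.diag([3.0, 1.0, 0.2])
low_rank = nuclear_norm_prox(M, tau=0.5)
```

Inside a proximal gradient loop, a step like this would alternate with a gradient step on the data-fitting loss; that loop is omitted here because the abstract does not describe it.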
Empirical Evaluation of Contextual Policy Search with a Comparison-based Surrogate Model and Active Covariance Matrix Adaptation
Contextual policy search (CPS) is a class of multi-task reinforcement
learning algorithms that is particularly useful for robotic applications. A
recent state-of-the-art method is Contextual Covariance Matrix Adaptation
Evolution Strategies (C-CMA-ES). It is based on the standard black-box
optimization algorithm CMA-ES. There are two useful extensions of CMA-ES that
we will transfer to C-CMA-ES and evaluate empirically: ACM-ES, which uses a
comparison-based surrogate model, and aCMA-ES, which uses an active update of
the covariance matrix. We will show that improvements with these methods can be
impressive in terms of sample-efficiency, although this is not relevant any
more for the robotic domain.
Comment: Supplementary material for poster paper accepted at GECCO 2019;
https://doi.org/10.1145/3319619.332193
Active Sensing as Bayes-Optimal Sequential Decision Making
Sensory inference under conditions of uncertainty is a major problem in both
machine learning and computational neuroscience. An important but poorly
understood aspect of sensory processing is the role of active sensing. Here, we
present a Bayes-optimal inference and control framework for active sensing,
C-DAC (Context-Dependent Active Controller). Unlike previously proposed
algorithms that optimize abstract statistical objectives such as information
maximization (Infomax) [Butko & Movellan, 2010] or one-step look-ahead accuracy
[Najemnik & Geisler, 2005], our active sensing model directly minimizes a
combination of behavioral costs, such as temporal delay, response error, and
effort. We simulate these algorithms on a simple visual search task to
illustrate scenarios in which context-sensitivity is particularly beneficial
and optimization with respect to generic statistical objectives particularly
inadequate. Motivated by the geometric properties of the C-DAC policy, we
present both parametric and non-parametric approximations, which retain
context-sensitivity while significantly reducing computational complexity.
These approximations enable us to investigate the more complex problem
involving peripheral vision, and we notice that the difference between C-DAC
and statistical policies becomes even more evident in this scenario.
Comment: Scheduled to appear in UAI 201
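To make the inference half of such an active sensing framework concrete, here is a minimal sketch of a Bayesian belief update over target locations in a visual search task. The observation model is an assumption made for illustration: a binary observation equals 1 with probability `beta` when the fixated location holds the target and with probability `1 - beta` otherwise. The abstract does not specify these details, and the function name `update_belief` is hypothetical.

```python
def update_belief(belief, fixated, observation, beta=0.8):
    """Return the posterior over target locations after one observation.

    belief      -- prior probabilities, one per candidate location
    fixated     -- index of the location currently being looked at
    observation -- 0 or 1, a noisy cue about whether the target is there
    beta        -- assumed P(observation = 1 | fixated location is the target)
    """
    posterior = []
    for loc, prior in enumerate(belief):
        # Probability of seeing a 1 if the target were at `loc`.
        p_one = beta if loc == fixated else 1.0 - beta
        likelihood = p_one if observation == 1 else 1.0 - p_one
        posterior.append(prior * likelihood)
    total = sum(posterior)
    return [w / total for w in posterior]


# Usage: a positive cue at location 0 shifts belief towards location 0.
belief = update_belief([1 / 3, 1 / 3, 1 / 3], fixated=0, observation=1)
```

A control policy would then choose the next fixation, or stop and respond, based on this belief; that decision step is where a cost-based controller and an information-maximizing one would differ, and it is not sketched here.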
The OS* Algorithm: a Joint Approach to Exact Optimization and Sampling
Most current sampling algorithms for high-dimensional distributions are based
on MCMC techniques and are approximate in the sense that they are valid only
asymptotically. Rejection sampling, on the other hand, produces valid samples,
but is unrealistically slow in high-dimensional spaces. The OS* algorithm that we
propose is a unified approach to exact optimization and sampling, based on
incremental refinements of a functional upper bound, which combines ideas of
adaptive rejection sampling and of A* optimization search. We show that the
choice of the refinement can be done in a way that ensures tractability in
high-dimensional spaces, and we present first experiments in two different
settings: inference in high-order HMMs and in large discrete graphical models.
Comment: 21 page
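The idea of rejection sampling under an upper bound that is refined after each rejection can be illustrated on a small discrete distribution. The sketch below is a simplified toy version and not the OS* algorithm itself: it assumes an unnormalized target `p`, an initial bound `q` with `q[x] >= p[x]` everywhere, and a refinement that simply lowers `q[x]` to `p[x]` at the rejected point. All names are hypothetical.

```python
import random


def adaptive_rejection_sample(p, q, n_samples, rng):
    """Draw exact samples from the distribution proportional to `p`.

    p   -- unnormalized target weights
    q   -- initial upper bound, q[x] >= p[x] for every x
    rng -- a random.Random instance
    Returns the samples and the refined bound.
    """
    q = list(q)  # work on a copy so the caller's bound is untouched
    samples = []
    while len(samples) < n_samples:
        # Propose x with probability proportional to the current bound.
        x = rng.choices(range(len(q)), weights=q)[0]
        # Accept with probability p[x] / q[x].
        if rng.random() * q[x] < p[x]:
            samples.append(x)
        else:
            # Refine: tighten the bound at the rejected point.
            q[x] = p[x]
    return samples, q


# Usage: index 0 has zero target weight, so it is never returned.
samples, refined = adaptive_rejection_sample(
    p=[0.0, 1.0, 3.0], q=[4.0, 4.0, 4.0], n_samples=2000, rng=random.Random(0)
)
```

Each rejection makes the bound tighter, so the acceptance rate can only improve over time. In this toy setting the bound is stored as an explicit table, which would not be tractable in high-dimensional spaces.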
Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach
We systematically investigate the effect heterogeneity of job search
programmes for unemployed workers. To investigate possibly heterogeneous
employment effects, we combine non-experimental causal empirical models with
Lasso-type estimators. The empirical analyses are based on rich administrative
data from Swiss social security records. We find considerable heterogeneities
only during the first six months after the start of training. Consistent with
previous results of the literature, unemployed persons with fewer employment
opportunities profit more from participating in these programmes. Furthermore,
we also document heterogeneous employment effects by residence status. Finally,
we show the potential of easy-to-implement programme participation rules for
improving average employment effects of these active labour market programmes.
Learning Contact-Rich Manipulation Skills with Guided Policy Search
Autonomous learning of object manipulation skills can enable robots to
acquire rich behavioral repertoires that scale to the variety of objects found
in the real world. However, current motion skill learning methods typically
restrict the behavior to a compact, low-dimensional representation, limiting
its expressiveness and generality. In this paper, we extend a recently
developed policy search method \cite{la-lnnpg-14} and use it to learn a range
of dynamic manipulation behaviors with highly general policy representations,
without using known models or example demonstrations. Our approach learns a set
of trajectories for the desired motion skill by using iteratively refitted
time-varying linear models, and then unifies these trajectories into a single
control policy that can generalize to new situations. To enable this method to
run on a real robot, we introduce several improvements that reduce the sample
count and automate parameter selection. We show that our method can acquire
fast, fluent behaviors after only minutes of interaction time, and can learn
robust controllers for complex tasks, including putting together a toy
airplane, stacking tight-fitting lego blocks, placing wooden rings onto
tight-fitting pegs, inserting a shoe tree into a shoe, and screwing bottle caps
onto bottles.