196,021 research outputs found
Growing Action Spaces
In complex tasks, such as those with large combinatorial action spaces,
random exploration may be too inefficient to achieve meaningful learning
progress. In this work, we use a curriculum of progressively growing action
spaces to accelerate learning. We assume the environment is out of our control,
but that the agent may set an internal curriculum by initially restricting its
action space. Our approach uses off-policy reinforcement learning to estimate
optimal value functions for multiple action spaces simultaneously and
efficiently transfers data, value estimates, and state representations from
restricted action spaces to the full task. We show the efficacy of our approach
in proof-of-concept control tasks and on challenging large-scale StarCraft
micromanagement tasks with large, multi-agent action spaces.
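The idea of learning value functions for several nested action spaces from the same off-policy data can be sketched as below. This is a minimal tabular illustration, not the authors' implementation: the nested action sets in `ACTION_LEVELS`, the function names, and the hyperparameters are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical nested action spaces: level 0 is the most restricted,
# and each later level contains all actions of the previous one.
ACTION_LEVELS = [
    [0, 1],
    [0, 1, 2, 3],
    [0, 1, 2, 3, 4, 5, 6, 7],
]


def make_q_tables(num_levels):
    """One tabular Q-function per action-space level."""
    return [defaultdict(float) for _ in range(num_levels)]


def select_action(q_tables, state, level, epsilon, rng):
    """Epsilon-greedy choice restricted to the actions of the given level."""
    actions = ACTION_LEVELS[level]
    if rng.random() < epsilon:
        return rng.choice(actions)
    q = q_tables[level]
    return max(actions, key=lambda a: q[(state, a)])


def update_all_levels(q_tables, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-learning update to every level that contains `action`.

    Because Q-learning is off-policy, a transition collected while acting in
    a restricted space is also valid training data for the larger spaces.
    """
    for level, actions in enumerate(ACTION_LEVELS):
        if action not in actions:
            continue
        q = q_tables[level]
        if done:
            target = reward
        else:
            # Bootstrap with the best action available at this level only.
            best_next = max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])
```

A training loop would start with `level = 0` and raise it on some schedule; the schedule itself is left out because the abstract does not specify one.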
Protecting children from unhealthy food marketing: A comparative policy analysis in Australia, Fiji and Thailand
Restricting the marketing of unhealthy foods and beverages to children is a globally recommended policy measure to improve diets and health. The aim of the analysis was to identify opportunities to enable policy learning and shift beliefs of relevant actors, to engender policy progress on restrictions on marketing of unhealthy foods to children. We drew on the Advocacy Coalition Framework to thematically analyse data from qualitative policy interviews conducted in Australia (n = 24), Fiji (n = 10) and Thailand (n = 20). In all three countries two clear and opposing advocacy coalitions were evident within the policy subsystem related to regulation of unhealthy food marketing, which we termed the 'strengthen regulation' and 'minimal/self regulation' coalitions. Contributors to policy stasis on this issue were identified as tensions between public health and economic objectives of government, and limited formal and informal spaces for productive dialogue. The analysis also identified opportunities for policy learning that could enable policy progress on restrictions on marketing of unhealthy foods to children as: taking an incremental approach to policy change, defining permitted (rather than restricted) foods, investing in new public health expertise related to emerging marketing approaches and scaling up of monitoring of impacts. The insights from this study are likely to be relevant to many countries seeking to strengthen regulation of marketing to children, in response to recent global recommendations.
RODE: Learning Roles to Decompose Multi-Agent Tasks
Role-based learning holds the promise of achieving scalable multi-agent
learning by decomposing complex tasks using roles. However, it is largely
unclear how to efficiently discover such a set of roles. To solve this problem,
we propose to first decompose joint action spaces into restricted role action
spaces by clustering actions according to their effects on the environment and
other agents. Learning a role selector based on action effects makes role
discovery much easier because it forms a bi-level learning hierarchy -- the
role selector searches in a smaller role space and at a lower temporal
resolution, while role policies learn in significantly reduced primitive
action-observation spaces. We further integrate information about action
effects into the role policies to boost learning efficiency and policy
generalization. By virtue of these advances, our method (1) outperforms the
current state-of-the-art MARL algorithms on 10 of the 14 scenarios that
comprise the challenging StarCraft II micromanagement benchmark and (2)
achieves rapid transfer to new environments with three times the number of
agents. Demonstrative videos are available at
https://sites.google.com/view/rode-marl
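Decomposing an action space into role action spaces by clustering actions on their effects can be illustrated with a small sketch. Everything concrete here is an assumption: the effect vectors in `EXAMPLE_EFFECTS` are hand-written placeholders (the abstract does not say how effects are represented), and plain k-means with a deterministic farthest-point initialisation is used only as a stand-in clustering method.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def cluster_actions(effects, k, iterations=20):
    """Group actions into k roles with k-means on their effect vectors.

    effects: dict mapping action name -> effect vector (tuple of floats).
    Returns a dict mapping action name -> role index in range(k).
    """
    actions = sorted(effects)
    # Deterministic farthest-point initialisation of the k centres.
    centers = [effects[actions[0]]]
    while len(centers) < k:
        farthest = max(
            actions,
            key=lambda a: min(squared_distance(effects[a], c) for c in centers),
        )
        centers.append(effects[farthest])

    assignment = {}
    for _ in range(iterations):
        # Assign each action to its nearest centre.
        assignment = {
            a: min(range(k), key=lambda i: squared_distance(effects[a], centers[i]))
            for a in actions
        }
        # Move each centre to the mean of its members (skip empty clusters).
        for i in range(k):
            members = [effects[a] for a in actions if assignment[a] == i]
            if members:
                centers[i] = mean_vector(members)
    return assignment


def role_action_spaces(assignment, k):
    """Turn an action -> role mapping into one restricted action list per role."""
    roles = [[] for _ in range(k)]
    for action, role in sorted(assignment.items()):
        roles[role].append(action)
    return roles


# Hypothetical effect vectors: (effect on own position, effect on enemy health).
EXAMPLE_EFFECTS = {
    "move_north": (1.0, 0.0),
    "move_south": (0.9, 0.1),
    "attack_a": (0.0, 1.0),
    "attack_b": (0.1, 0.9),
}
```

Calling `role_action_spaces(cluster_actions(EXAMPLE_EFFECTS, 2), 2)` groups the two movement actions into one role and the two attack actions into the other; a role selector would then pick among roles, and each role policy would act only within its own list.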
+SPACES: Serious Games for Role-Playing Government Policies
The paper explores how role-play simulations can be used to support policy discussion and refinement in virtual worlds. Although the work described is set primarily within the context of policy formulation for government, the lessons learnt are applicable to online learning and collaboration within virtual environments. The paper describes how the +Spaces project is using both 2D and 3D virtual spaces to
engage with citizens to explore issues relevant to new government policies. It also focuses on the most challenging part of the project, which is to provide environments that can simulate some of the complexities of real life. Some examples of different approaches to simulation in virtual spaces are provided and the issues associated with them are further examined.
We conclude that the use of role-play simulations seems to offer the most benefits in terms of providing a generalizable framework for citizens to engage with real issues arising from future policy decisions. Role-plays have also been shown to be a useful tool for engaging learners in the complexities of real-world issues, often generating insights which would not be possible using more conventional techniques.
A Theory of Cheap Control in Embodied Systems
We present a framework for designing cheap control architectures for embodied
agents. Our derivation is guided by the classical problem of universal
approximation, whereby we explore the possibility of exploiting the agent's
embodiment for a new and more efficient universal approximation of behaviors
generated by sensorimotor control. This embodied universal approximation is
compared with the classical non-embodied universal approximation. To exemplify
our approach, we present a detailed quantitative case study for policy models
defined in terms of conditional restricted Boltzmann machines. In contrast to
non-embodied universal approximation, which requires an exponential number of
parameters, in the embodied setting we are able to generate all possible
behaviors with a drastically smaller model, thus obtaining cheap universal
approximation. We test and corroborate the theory experimentally with a
six-legged walking machine. The experiments show that the sufficient controller
complexity predicted by our theory is tight, which means that the theory has
direct practical implications. Keywords: cheap design, embodiment, sensorimotor
loop, universal approximation, conditional restricted Boltzmann machine. Comment: 27 pages, 10 figures
Model-Based Reinforcement Learning with Continuous States and Actions
Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
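The step of building a GP model for unknown transition dynamics can be sketched with a posterior-mean predictor over (state, action) inputs. This is only an illustration of GP regression on transitions, not GPDP itself: the squared-exponential kernel, its fixed hyperparameters, the toy linear dynamics in `TOY_INPUTS`, and all function names are assumptions.

```python
import math


def rbf_kernel(x, y, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_gp(inputs, targets, noise_var=1e-6):
    """Return weights (K + noise_var * I)^{-1} y for the GP posterior mean."""
    gram = [
        [rbf_kernel(a, b) + (noise_var if i == j else 0.0)
         for j, b in enumerate(inputs)]
        for i, a in enumerate(inputs)
    ]
    return solve_linear(gram, targets)


def predict_mean(inputs, weights, query):
    """Posterior mean at `query`: sum_i weights[i] * k(inputs[i], query)."""
    return sum(w * rbf_kernel(x, query) for x, w in zip(inputs, weights))


# Toy data (assumed): inputs are (state, action) pairs, targets are the
# observed state change under the made-up dynamics delta = 0.1 * action.
TOY_INPUTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, -1.0)]
TOY_TARGETS = [0.1 * action for _, action in TOY_INPUTS]
```

After `weights = fit_gp(TOY_INPUTS, TOY_TARGETS)`, the call `predict_mean(TOY_INPUTS, weights, (state, action))` gives a predicted state change that a planner could use in place of the true dynamics. A fuller treatment would also carry the predictive variance and fit the kernel hyperparameters to data.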
Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces
© 2007 EUCA. A novel online-computation approach to optimal control of nonlinear, noise-affected systems with continuous state and control spaces is presented. In the proposed algorithm, system noise is explicitly incorporated into the control decision. This leads to superior results compared to state-of-the-art nonlinear controllers that neglect this influence. The solution of an optimal nonlinear controller for a corresponding deterministic system is employed to find a meaningful state space restriction. This restriction is obtained by means of approximate state prediction using the noisy system equation. Within this constrained state space, an optimal closed-loop solution for a finite decision-making horizon (prediction horizon) is determined within an adaptively restricted optimization space. Interleaving stochastic dynamic programming and value function approximation yields a solution to the considered optimal control problem. The enhanced performance of the proposed discrete-time controller is illustrated by means of a scalar example system. Nonlinear model predictive control is applied to address approximate treatment of infinite-horizon problems by the finite-horizon controller.