
    Growing Action Spaces

    In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress. In this work, we use a curriculum of progressively growing action spaces to accelerate learning. We assume the environment is out of our control, but that the agent may set an internal curriculum by initially restricting its action space. Our approach uses off-policy reinforcement learning to estimate optimal value functions for multiple action spaces simultaneously and efficiently transfers data, value estimates, and state representations from restricted action spaces to the full task. We show the efficacy of our approach in proof-of-concept control tasks and on challenging large-scale StarCraft micromanagement tasks with large, multi-agent action spaces.
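    As an illustration of the curriculum described above, here is a minimal sketch pairing tabular off-policy Q-learning with a schedule of growing action sets. The environment interface, the `action_levels` schedule, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

def train_with_growing_actions(env, action_levels, episodes_per_level=200,
                               alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning over a curriculum of growing action sets.

    `env` is assumed to expose reset() -> state and step(a) -> (state,
    reward, done); `action_levels` lists action subsets, small to full.
    """
    Q = defaultdict(float)                 # shared table: Q[(state, action)]
    for actions in action_levels:          # curriculum: restrict, then grow
        for _ in range(episodes_per_level):
            s, done = env.reset(), False
            while not done:
                # Epsilon-greedy exploration within the restricted set.
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda act: Q[(s, act)])
                s2, r, done = env.step(a)
                # Off-policy target over the current restricted set; value
                # estimates transfer because Q is shared across levels.
                best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
    return Q
```

    Sharing a single Q-table across levels is the simplest possible transfer mechanism; the paper's function-approximation setting additionally transfers data and state representations.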

    Protecting children from unhealthy food marketing: A comparative policy analysis in Australia, Fiji and Thailand

    Restricting the marketing of unhealthy foods and beverages to children is a globally recommended policy measure to improve diets and health. The aim of this analysis was to identify opportunities to enable policy learning and shift the beliefs of relevant actors, in order to advance restrictions on the marketing of unhealthy foods to children. We drew on the Advocacy Coalition Framework to thematically analyse data from qualitative policy interviews conducted in Australia (n = 24), Fiji (n = 10) and Thailand (n = 20). In all three countries, two clear and opposing advocacy coalitions were evident within the policy subsystem related to regulation of unhealthy food marketing, which we termed the 'strengthen regulation' and 'minimal/self-regulation' coalitions. Contributors to policy stasis on this issue were identified as tensions between the public health and economic objectives of government, and limited formal and informal spaces for productive dialogue. The analysis also identified opportunities for policy learning that could enable policy progress: taking an incremental approach to policy change, defining permitted (rather than restricted) foods, investing in new public health expertise related to emerging marketing approaches, and scaling up monitoring of impacts. The insights from this study are likely to be relevant to many countries seeking to strengthen regulation of marketing to children in response to recent global recommendations.

    RODE: Learning Roles to Decompose Multi-Agent Tasks

    Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles. However, it is largely unclear how to efficiently discover such a set of roles. To solve this problem, we propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents. Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces. We further integrate information about action effects into the role policies to boost learning efficiency and policy generalization. By virtue of these advances, our method (1) outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark and (2) achieves rapid transfer to new environments with three times the number of agents. Demonstration videos are available at https://sites.google.com/view/rode-marl
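    A minimal sketch of the action-clustering step as summarized above: actions are grouped by their effect embeddings to form restricted per-role action spaces. The random placeholder embeddings and the use of `KMeans` are assumptions for illustration; in the actual method the effect representations are learned.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_actions, embed_dim, n_roles = 20, 16, 4

# Placeholder action-effect embeddings; in the actual method these come
# from a model trained to predict each action's effect.
action_effects = rng.normal(size=(n_actions, embed_dim))

# Cluster actions by effect to obtain restricted role action spaces.
labels = KMeans(n_clusters=n_roles, n_init=10,
                random_state=0).fit_predict(action_effects)
role_action_spaces = {role: np.flatnonzero(labels == role).tolist()
                      for role in range(n_roles)}
print(role_action_spaces)   # each role policy searches only its subset
```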

    +SPACES: Serious Games for Role-Playing Government Policies

    The paper explores how role-play simulations can be used to support policy discussion and refinement in virtual worlds. Although the work described is set primarily within the context of policy formulation for government, the lessons learnt are applicable to online learning and collaboration within virtual environments. The paper describes how the +Spaces project is using both 2D and 3D virtual spaces to engage with citizens to explore issues relevant to new government policies. It also focuses on the most challenging part of the project, which is to provide environments that can simulate some of the complexities of real life. Some examples of different approaches to simulation in virtual spaces are provided, and the issues associated with them are examined. We conclude that the use of role-play simulations seems to offer the most benefit in terms of providing a generalizable framework for citizens to engage with real issues arising from future policy decisions. Role-plays have also been shown to be a useful tool for engaging learners in the complexities of real-world issues, often generating insights that would not be possible using more conventional techniques.

    A Theory of Cheap Control in Embodied Systems

    We present a framework for designing cheap control architectures for embodied agents. Our derivation is guided by the classical problem of universal approximation, whereby we explore the possibility of exploiting the agent's embodiment for a new and more efficient universal approximation of behaviors generated by sensorimotor control. This embodied universal approximation is compared with the classical non-embodied universal approximation. To exemplify our approach, we present a detailed quantitative case study for policy models defined in terms of conditional restricted Boltzmann machines. In contrast to non-embodied universal approximation, which requires an exponential number of parameters, in the embodied setting we are able to generate all possible behaviors with a drastically smaller model, thus obtaining cheap universal approximation. We test and corroborate the theory experimentally with a six-legged walking machine. The experiments show that the sufficient controller complexity predicted by our theory is tight, which means that the theory has direct practical implications. Keywords: cheap design, embodiment, sensorimotor loop, universal approximation, conditional restricted Boltzmann machine. Comment: 27 pages, 10 figures.
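    To make the policy model concrete, here is a minimal sketch of sampling an action from a conditional restricted Boltzmann machine: sensor values condition a hidden layer, which in turn drives binary actuators. The layer sizes and random weights are placeholders, not parameters from the case study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_hidden, n_actuators = 8, 6, 4

# Random placeholder weights; a trained CRBM would learn these.
W_sh = rng.normal(scale=0.1, size=(n_sensors, n_hidden))    # sensor -> hidden
W_ha = rng.normal(scale=0.1, size=(n_hidden, n_actuators))  # hidden -> actuator
b_h, b_a = np.zeros(n_hidden), np.zeros(n_actuators)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_action(sensors):
    """One conditional pass: sensors -> hidden sample -> actuator sample."""
    h = (rng.random(n_hidden) < sigmoid(sensors @ W_sh + b_h)).astype(float)
    return (rng.random(n_actuators) < sigmoid(h @ W_ha + b_a)).astype(float)

sensors = rng.integers(0, 2, n_sensors).astype(float)  # binary sensor reading
print(sample_action(sensors))
```

    The point of the theory is that the number of hidden units needed for universal approximation can be kept small when the embodiment itself restricts which sensorimotor behaviors are reachable.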

    Model-Based Reinforcement Learning with Continuous States and Actions

    Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging; approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing-up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
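    A minimal sketch of the model-based idea as summarized above, assuming a toy 1-D system: fit a GP to observed transitions, then run value iteration over a grid of support states using the GP's mean prediction. This is a deterministic-mean simplification for illustration, not the full GPDP algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Learn toy 1-D dynamics x' = 0.9x + 0.5a + noise from random transitions.
X = rng.uniform(-1, 1, size=(100, 2))             # columns: state, action
y = 0.9 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.01, 100)
dynamics = GaussianProcessRegressor().fit(X, y)

states = np.linspace(-1, 1, 21)                   # support points for V
actions = np.linspace(-1, 1, 5)
V, gamma = np.zeros_like(states), 0.95

for _ in range(30):                               # approximate value iteration
    for i, s in enumerate(states):
        q = []
        for a in actions:
            s_next = dynamics.predict([[s, a]])[0]          # GP mean as model
            q.append(-s**2 - 0.1 * a**2 + gamma * np.interp(s_next, states, V))
        V[i] = max(q)
print(V)
```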

    Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces

    A novel online-computation approach to optimal control of nonlinear, noise-affected systems with continuous state and control spaces is presented. In the proposed algorithm, system noise is explicitly incorporated into the control decision, which leads to superior results compared to state-of-the-art nonlinear controllers that neglect this influence. The solution of an optimal nonlinear controller for a corresponding deterministic system is employed to find a meaningful state-space restriction, obtained by means of approximate state prediction using the noisy system equation. Within this constrained state space, an optimal closed-loop solution for a finite decision-making horizon (prediction horizon) is determined within an adaptively restricted optimization space. Interleaving stochastic dynamic programming and value function approximation yields a solution to the considered optimal control problem. The enhanced performance of the proposed discrete-time controller is illustrated by means of a scalar example system, and nonlinear model predictive control is applied so that the finite-horizon controller approximately handles infinite-horizon problems.
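    The following sketch illustrates the receding-horizon idea: a deterministic prediction restricts the state grid, and finite-horizon stochastic dynamic programming with value-function interpolation is solved over that restricted grid. The dynamics, noise model, and cost below are toy placeholders, not the controller from the paper.

```python
import numpy as np

def plan_first_action(x0, f, noise_sd, cost, horizon=5,
                      grid_halfwidth=1.0, n_grid=41, n_actions=11):
    """Return the first action of a finite-horizon stochastic DP plan."""
    actions = np.linspace(-1.0, 1.0, n_actions)
    # State-space restriction: predict the deterministic trajectory and
    # build a grid covering the region around start and predicted endpoint.
    center = x0
    for _ in range(horizon):
        center = f(center, 0.0)
    lo, hi = min(x0, center) - grid_halfwidth, max(x0, center) + grid_halfwidth
    grid = np.linspace(lo, hi, n_grid)

    V = np.zeros(n_grid)                              # terminal value = 0
    samples = noise_sd * np.array([-1.0, 0.0, 1.0])   # crude noise quadrature
    first_action = None
    for k in reversed(range(horizon)):                # backward DP recursion
        V_new, best_a = np.empty(n_grid), np.empty(n_grid)
        for i, x in enumerate(grid):
            # Expected cost-to-go via interpolation of V on the grid.
            q = [cost(x, a) + np.mean(np.interp(f(x, a) + samples, grid, V))
                 for a in actions]
            j = int(np.argmin(q))
            V_new[i], best_a[i] = q[j], actions[j]
        V = V_new
        if k == 0:                                    # stage-0 policy at x0
            first_action = np.interp(x0, grid, best_a)
    return first_action

# Toy usage: scalar nonlinear system x' = x + 0.1*sin(x) + a + noise.
u0 = plan_first_action(0.5, lambda x, a: x + 0.1 * np.sin(x) + a,
                       noise_sd=0.05, cost=lambda x, a: x**2 + 0.1 * a**2)
print(u0)
```

    In a model-predictive loop, only this first action would be applied before re-planning from the newly observed state.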