Credit assignment in multiple goal embodied visuomotor behavior

Authors: Ballard, Dana H.; Rothkopf, Constantin A.
Publication date: 01/01/2010

The intrinsic complexity of the brain can lead one to set aside issues related to its relationships with the body, but the field of embodied cognition emphasizes that understanding brain function at the system level requires one to address the role of the brain-body interface. It has only recently been appreciated that this interface performs huge amounts of computation that does not have to be repeated by the brain, and thus affords the brain great simplifications in its representations. In effect, the brain’s abstract states can refer to coded representations of the world created by the body. But even if the brain can communicate with the world through abstractions, the severe speed limitations in its neural circuitry mean that vast amounts of indexing must be performed during development so that appropriate behavioral responses can be rapidly accessed. One way this could happen would be if the brain used a decomposition whereby behavioral primitives could be quickly accessed and combined. This realization motivates our study of independent sensorimotor task solvers, which we call modules, in directing behavior. The issue we focus on herein is how an embodied agent can learn to calibrate such individual visuomotor modules while pursuing multiple goals. The biologically plausible standard for module programming is that of reinforcement given during exploration of the environment. However, this formulation contains a substantial issue when sensorimotor modules are used in combination: the credit for their overall performance must be divided among them. We show that this problem can be solved and that diverse task combinations are beneficial in learning and not a complication, as usually assumed. Our simulations show that fast algorithms are available that allot credit correctly and are insensitive to measurement noise.
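To make the credit-assignment problem concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. It assumes that the scalar reward received by a set of simultaneously active modules is the sum of unknown per-module rewards plus noise, and it lets each active module move its estimate toward the residual left after subtracting the other active modules' current estimates (an LMS-style update). The function name `assign_credit`, the learning rate `lr`, and the toy reward values are hypothetical.

```python
import random


def assign_credit(episodes, n_modules, lr=0.05):
    """Estimate each module's share of a reward that is only observed as a sum.

    episodes: list of (active, total) pairs, where `active` is the set of
    module indices running together and `total` is the single scalar reward
    they jointly received. (Hypothetical interface for illustration.)
    """
    est = [0.0] * n_modules
    for active, total in episodes:
        for i in sorted(active):
            # Residual: the part of the shared reward not already explained
            # by the other active modules' current estimates.
            others = sum(est[j] for j in active if j != i)
            est[i] += lr * ((total - others) - est[i])
    return est


# Toy data (hypothetical): three modules with true rewards 1, 2 and 3,
# active in pairs, with Gaussian measurement noise on the summed reward.
rng = random.Random(0)
true_rewards = [1.0, 2.0, 3.0]
pairs = [{0, 1}, {0, 2}, {1, 2}]
episodes = []
for _ in range(5000):
    active = rng.choice(pairs)
    total = sum(true_rewards[i] for i in active) + rng.gauss(0.0, 0.1)
    episodes.append((active, total))

estimates = assign_credit(episodes, n_modules=3)
print([round(e, 2) for e in estimates])
```

Because the three pairs overlap in different ways, the individual estimates become separable. If only the pair {0, 1} were ever active, any split of the same total would fit the data equally well, which illustrates why diverse task combinations help rather than hinder.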

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

Planning under risk and uncertainty

Author: Forsell, Nicklas
Publication date: 01/01/2009

This thesis concentrates on the optimization of large-scale management policies under conditions of risk and uncertainty. In paper I, we address the problem of solving large-scale spatial and temporal natural resource management problems. To model these types of problems, the framework of graph-based Markov decision processes (GMDPs) can be used. Two algorithms for computation of high-quality management policies are presented: the first is based on approximate linear programming (ALP) and the second is based on mean-field approximation and approximate policy iteration (MF-API). The applicability and efficiency of the algorithms were demonstrated by their ability to compute near-optimal management policies for two large-scale management problems. It was concluded that the two algorithms compute policies of similar quality. However, the MF-API algorithm should be used when both the policy and the expected value of the computed policy are required, while the ALP algorithm may be preferred when only the policy is required. In paper II, a number of reinforcement learning algorithms are presented that can be used to compute management policies for GMDPs when the transition function can only be simulated because its explicit formulation is unknown. Studies of the efficiency of the algorithms for three management problems led us to conclude that some of these algorithms were able to compute near-optimal management policies. In paper III, we used the GMDP framework to optimize long-term forestry management policies under stochastic wind-damage events. The model was demonstrated by a case study of an estate consisting of 1,200 ha of forest land, divided into 623 stands. We concluded that managing the estate according to the risk of wind damage increased the expected net present value (NPV) of the whole estate only slightly, by less than 2%, under different wind-risk assumptions. Most of the stands were managed in the same manner as when the risk of wind damage was not considered. However, the analysis rests on properties of the model that need to be refined before definite conclusions can be drawn.
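The MF-API algorithm is a form of approximate policy iteration. As a simplified illustration only, the Python sketch below runs exact tabular policy iteration on a hypothetical single-stand forestry MDP; it does not model the graph structure of a GMDP and implements neither the ALP nor the mean-field approximation. The states (young, mature), the actions (wait, harvest), the 10% wind-damage probability, the rewards, and the discount factor are all invented for illustration.

```python
def policy_iteration(P, R, gamma=0.95, tol=1e-10):
    """Exact policy iteration for a small tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    immediate reward for taking action a in state s.
    """
    n = len(P)
    policy = [0] * n

    def q_value(s, a, V):
        return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])

    while True:
        # Policy evaluation: in-place Bellman backups until the values settle.
        V = [0.0] * n
        while True:
            delta = 0.0
            for s in range(n):
                v = q_value(s, policy[s], V)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch to a greedy action only if strictly better.
        stable = True
        for s in range(n):
            q = [q_value(s, a, V) for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            if q[best] > q[policy[s]] + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical single-stand model: state 0 = young, 1 = mature;
# action 0 = wait, 1 = harvest.
YOUNG, MATURE = 0, 1
P = [
    [[(0.5, YOUNG), (0.5, MATURE)], [(1.0, YOUNG)]],  # young: may mature, or harvest
    [[(0.9, MATURE), (0.1, YOUNG)], [(1.0, YOUNG)]],  # mature: 10% wind damage if waiting
]
R = [
    [0.0, 1.0],   # harvesting a young stand yields little
    [0.0, 10.0],  # harvesting a mature stand yields more
]
policy, V = policy_iteration(P, R)
print(policy, [round(v, 2) for v in V])
```

With these made-up numbers the resulting policy waits while the stand is young and harvests once it is mature.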

Epsilon Open Archive

Scalable Planning and Learning for Multiagent POMDPs: Extended Version

Authors: Amato, Christopher; Oliehoek, Frans A.
Publication date: 19/12/2014

Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs, where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high-quality solutions to large multiagent planning and learning problems.
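Factored value functions make joint action selection tractable when the joint value decomposes into local terms. As a sketch of the general idea, not of the authors' planner, the Python code below assumes agents arranged on a chain whose joint value is the sum of pairwise terms Q_i(a_i, a_{i+1}). It maximizes that sum by dynamic programming (variable elimination along the chain) in time linear in the number of agents, then checks the result against brute-force enumeration of every joint action. The function names and the random payoff tables are hypothetical.

```python
import itertools
import random


def chain_argmax(local_q, n_actions):
    """Maximize sum_i local_q[i][a_i][a_(i+1)] over joint actions on a chain.

    local_q[i] is an n_actions x n_actions payoff table for the edge between
    agents i and i+1, so there are len(local_q) + 1 agents.
    """
    # best[a]: best total payoff of the edges to the right of the current
    # agent, given that this agent takes action a.
    best = [0.0] * n_actions
    choices = []
    for table in reversed(local_q):
        new_best, arg = [], []
        for a in range(n_actions):
            vals = [table[a][b] + best[b] for b in range(n_actions)]
            b_star = max(range(n_actions), key=vals.__getitem__)
            new_best.append(vals[b_star])
            arg.append(b_star)
        best = new_best
        choices.append(arg)
    choices.reverse()
    # Forward pass: choose agent 0's action, then follow the stored argmaxes.
    joint = [max(range(n_actions), key=best.__getitem__)]
    for arg in choices:
        joint.append(arg[joint[-1]])
    return best[joint[0]], joint


def joint_value(local_q, joint):
    return sum(q[joint[i]][joint[i + 1]] for i, q in enumerate(local_q))


def brute_force(local_q, n_actions):
    """Enumerate all n_actions ** n_agents joint actions (exponential cost)."""
    n_agents = len(local_q) + 1
    return max(
        (joint_value(local_q, joint), list(joint))
        for joint in itertools.product(range(n_actions), repeat=n_agents)
    )


rng = random.Random(1)
n_agents, n_actions = 6, 3
local_q = [[[rng.random() for _ in range(n_actions)] for _ in range(n_actions)]
           for _ in range(n_agents - 1)]
fast_value, fast_joint = chain_argmax(local_q, n_actions)
slow_value, _ = brute_force(local_q, n_actions)
print(round(fast_value, 4), round(slow_value, 4), fast_joint)
```

The dynamic program does n_actions**2 work per edge, whereas brute force enumerates 3**6 = 729 joint actions in this toy case; the gap grows exponentially as agents are added.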

arXiv.org e-Print Archive

University of Liverpool Repository

CiteSeerX

International Migration, Integration and Social Cohesion online publications

Association for the Advancement of Artificial Intelligence: AAAI Publications