Regret Bounds for Reinforcement Learning with Policy Advice
In some reinforcement learning problems an agent may be provided with a set
of input policies, perhaps learned from prior experience or provided by
advisors. We present a reinforcement learning with policy advice (RLPA)
algorithm which leverages this input set and learns to use the best policy in
the set for the reinforcement learning task at hand. We prove that RLPA has a
sub-linear regret of \tilde O(\sqrt{T}) relative to the best input policy, and
that both this regret and its computational complexity are independent of the
size of the state and action space. Our empirical simulations support our
theoretical analysis. This suggests RLPA may offer significant advantages in
large domains where some prior good policies are provided.
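To make the setting concrete, here is a minimal, hedged Python sketch of choosing among a set of input policies with an optimistic (UCB-style) index. It illustrates the general policy-advice idea rather than the paper's exact RLPA procedure, and the `run_policy_for_phase` environment hook is an assumption of this sketch.

```python
import math

def select_best_policy(policies, run_policy_for_phase, total_steps, phase_len=100):
    """UCB-style selection over a set of input policies.

    A simplified sketch of the general policy-advice idea (not the paper's
    exact RLPA procedure): each input policy is treated as an arm, run for a
    phase, and scored by its average per-step reward plus an optimistic bonus.
    `run_policy_for_phase(policy, n_steps)` is an assumed environment hook
    returning the total reward collected by `policy` over `n_steps`.
    """
    n = len(policies)
    counts = [0] * n          # phases each policy has been run
    means = [0.0] * n         # running average per-step reward
    steps_used = 0
    phase = 0
    while steps_used < total_steps:
        phase += 1

        def ucb(i):
            # Untried policies are selected first; otherwise use a UCB score.
            if counts[i] == 0:
                return float("inf")
            return means[i] + math.sqrt(2.0 * math.log(phase) / counts[i])

        i = max(range(n), key=ucb)
        reward = run_policy_for_phase(policies[i], phase_len)
        counts[i] += 1
        means[i] += (reward / phase_len - means[i]) / counts[i]
        steps_used += phase_len
    return max(range(n), key=lambda i: means[i])
```

Note that the bookkeeping depends only on the number of input policies and the number of phases, which mirrors the abstract's claim that regret and computation are independent of the size of the state and action space.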
Experimental analysis of sample-based maps for long-term SLAM
This paper presents a system for long-term SLAM (simultaneous localization and mapping) by mobile service robots and its experimental evaluation in a real dynamic environment. To deal with the stability-plasticity dilemma (the trade-off between adaptation to new patterns and preservation of old patterns), the environment is represented at multiple timescales simultaneously (5 in our experiments). A sample-based representation is
proposed, where older memories fade at different rates depending on the timescale, and robust statistics are used to interpret the samples. The dynamics of this representation are analysed in a five-week experiment, measuring the relative influence of short- and long-term memories over time, and further demonstrating the robustness of the approach.
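As an illustration of the sample-based, multi-timescale idea, the following hedged Python sketch keeps a bounded buffer of occupancy samples per timescale and uses a median as a simple robust statistic. The timescale periods, buffer size, and choice of the median are illustrative assumptions, not the paper's exact parameters.

```python
from collections import deque
import statistics

class MultiTimescaleCell:
    """One map cell observed at several timescales simultaneously.

    A hedged sketch of the general idea: each timescale keeps a bounded
    buffer of recent occupancy samples, subsampled so that longer
    timescales forget more slowly.
    """
    def __init__(self, periods=(1, 5, 25, 125, 625), buffer_size=20):
        self.periods = periods                       # record every k-th update
        self.buffers = [deque(maxlen=buffer_size) for _ in periods]
        self.t = 0

    def update(self, occupied: float):
        """Record a new observation (e.g. 1.0 = occupied, 0.0 = free)."""
        self.t += 1
        for period, buf in zip(self.periods, self.buffers):
            if self.t % period == 0:
                buf.append(occupied)

    def estimates(self):
        """Robust per-timescale occupancy estimates (median over samples)."""
        return [statistics.median(buf) if buf else None for buf in self.buffers]
```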
Probabilistic Inference for Fast Learning in Control
We provide a novel framework for very fast model-based reinforcement learning in continuous state and action spaces. The framework requires probabilistic models that explicitly characterize their levels of confidence. Within this framework, we use flexible, non-parametric models to describe the world based on previously collected experience. We demonstrate learning on the cart-pole problem in a setting where we provide very limited prior knowledge about the task. Learning progresses rapidly, and a good policy is found after only a handful of iterations.
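A hedged skeleton of such a model-based loop is sketched below. The hooks `env_rollout` and `improve_policy` are assumed placeholders, and the scikit-learn GP regressor stands in for whatever non-parametric probabilistic model the paper actually uses.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def model_based_rl_loop(env_rollout, improve_policy, policy, n_iterations=10):
    """Skeleton of a model-based RL loop built around a confidence-aware model.

    A hedged sketch of the overall framework (hooks are assumptions): a
    non-parametric probabilistic model (a GP regressor as a stand-in) is
    refit on all experience collected so far, and the policy is improved
    against that model rather than the real system.
    """
    X, Y = [], []                                  # (state, action) inputs -> next states
    for _ in range(n_iterations):
        inputs, targets = env_rollout(policy)      # one real trial on the system
        X.extend(inputs)
        Y.extend(targets)
        model = GaussianProcessRegressor().fit(np.asarray(X), np.asarray(Y))
        # model.predict(..., return_std=True) exposes predictive uncertainty,
        # which the policy-improvement step can use to penalise risky plans.
        policy = improve_policy(policy, model)
    return policy
```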
Fermionic Molecular Dynamics for nuclear dynamics and thermodynamics
A new Fermionic Molecular Dynamics (FMD) model based on a Skyrme functional
is proposed in this paper. After introducing the basic formalism, some first
applications to nuclear structure and nuclear thermodynamics are presented.
Comment: 5 pages, Proceedings of the French-Japanese Symposium, September 2008. To be published in Int. J. of Mod. Phys.
Actor-Critic Policy Learning in Cooperative Planning
In this paper, we introduce a method for learning and adapting cooperative control strategies in real-time stochastic domains. Our framework is an instance of the intelligent cooperative control architecture (iCCA). The agent starts by following the "safe" plan calculated by the planning module and incrementally adapts its policy to maximize the cumulative rewards. Actor-critic and the consensus-based bundle algorithm (CBBA) were employed as the building blocks of the iCCA framework. We demonstrate the performance of our approach by simulating limited-fuel unmanned aerial vehicles aiming for stochastic targets. In one experiment where the optimal solution can be calculated, the integrated framework boosted the optimality of the solution by an average of 10% when compared to running each of the modules individually, while keeping the computational load within the requirements for real-time implementation.
Sponsors: Boeing Scientific Research Laboratories; United States Air Force Office of Scientific Research (Grant FA9550-08-1-0086).
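For reference, a generic actor-critic update of the kind named as a building block might look like the hedged sketch below. It is not the iCCA integration itself, and the tabular softmax parameterisation is an illustrative assumption.

```python
import numpy as np

def actor_critic_update(theta, V, s, a, r, s_next,
                        alpha_actor=0.01, alpha_critic=0.1, gamma=0.99):
    """One tabular-critic / softmax-actor update step.

    A generic actor-critic sketch (illustrative of the building block, not
    the paper's iCCA integration): the critic's TD error updates the value
    table and also scales the actor's policy-gradient step.
    `theta` is an (n_states, n_actions) preference table, `V` a value table.
    """
    delta = r + gamma * V[s_next] - V[s]          # TD error
    V[s] += alpha_critic * delta                  # critic update
    prefs = theta[s]
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    grad = -probs                                 # d log pi(a|s) / d preferences
    grad[a] += 1.0
    theta[s] += alpha_actor * delta * grad        # actor update
    return delta
```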
Using artificial intelligence techniques for strategy generation in the Commons game
In this paper, we consider the use of artificial intelligence techniques to aid in the discovery of winning strategies for the Commons Game (CG). The game represents a common scenario in which multiple parties share the use of a self-replenishing resource. The resource deteriorates quickly if used indiscriminately. If used responsibly, however, the resource thrives. We consider the scenario in which one player uses hill climbing or particle swarm optimization to select the course of action, while the remaining N − 1 players use a fixed probability vector. We show that hill climbing and particle swarm optimization consistently generate winning strategies.
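A hedged sketch of this kind of search is shown below: the strategy is a probability vector over actions, and hill climbing perturbs one component at a time. The `evaluate` function (the focal player's score against the fixed opponents) is an assumed hook, not the paper's Commons Game implementation.

```python
import random

def hill_climb_strategy(evaluate, n_actions=5, iters=500, step=0.05):
    """Hill climbing over a probability-vector strategy.

    A hedged sketch: perturb one component, renormalise, and keep the
    change only if the focal player's score improves. `evaluate(strategy)`
    is an assumed scoring hook against the fixed-probability opponents.
    """
    def normalise(v):
        s = sum(v)
        return [x / s for x in v]

    strategy = normalise([random.random() for _ in range(n_actions)])
    best_score = evaluate(strategy)
    for _ in range(iters):
        candidate = strategy[:]
        i = random.randrange(n_actions)
        candidate[i] = max(1e-6, candidate[i] + random.uniform(-step, step))
        candidate = normalise(candidate)
        score = evaluate(candidate)
        if score > best_score:
            strategy, best_score = candidate, score
    return strategy, best_score
```

Particle swarm optimization would search the same probability-vector space, but with a population of candidate strategies instead of a single incumbent.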
What works? Interventions to reduce readmission after hip fracture: A rapid review of systematic reviews
Background: Hip fracture is a common serious injury in older people, and reducing readmission after hip fracture is a priority in many healthcare systems. Interventions which significantly reduce readmission after hip fracture have been identified, and the aim of this review is to collate and summarise the efficacy of these interventions in one place. Methods: In a rapid review of systematic reviews, one reviewer (ELS) searched the Ovid SP version of Medline and the Cochrane Database of Systematic Reviews. Titles and abstracts of 915 articles were reviewed. Nineteen systematic reviews were included. The same reviewer (ELS) used a data extraction sheet to capture data on interventions and their effect on readmission. A second reviewer (RK) verified data extraction in a random sample of four systematic reviews. Results were not meta-analysed. Odds and risk ratios are presented where available. Results: Three interventions significantly reduce readmission in elderly populations after hip fracture: personalised discharge planning, self-care and regional anaesthesia. Three interventions are not conclusively supported by evidence: oral nutritional supplementation, integration of care, and case management. Two interventions do not affect readmission after hip fracture: enhanced recovery pathways and comprehensive geriatric assessment. Conclusions: Three interventions are most effective at reducing readmissions in older people: discharge planning, self-care, and regional anaesthesia. Further work is needed to optimise interventions and ensure the most at-risk populations benefit from them, and to complete development work on interventions (e.g. interventions to reduce loneliness) and intervention components (e.g. adapting self-care interventions for dementia patients) which have not yet been fully tested.
SMART (Stochastic Model Acquisition with ReinforcemenT) learning agents: A preliminary report
We present a framework for building agents that learn using SMART, a system that combines stochastic model acquisition with reinforcement learning to enable an agent to model its environment through experience and subsequently form action selection policies using the acquired model. We extend an existing algorithm for the automatic creation of stochastic STRIPS operators [9] as a preliminary method of environment modelling. We then define the process of generating future states using these operators and an initial state, and finally show the process by which the agent can use the generated states to form a policy with a standard reinforcement learning algorithm. The potential of SMART is exemplified using the well-known predator-prey scenario. Results of applying SMART to this environment and directions for future work are discussed.
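To illustrate the kind of representation involved, the following hedged Python sketch defines a stochastic STRIPS-style operator and samples successor states from it. The field names and the example operator are illustrative assumptions, not the operators actually learned by SMART.

```python
import random
from dataclasses import dataclass, field

@dataclass
class StochasticOperator:
    """A stochastic STRIPS-style operator: preconditions plus a distribution
    over (add-list, delete-list) outcomes. Field names are illustrative."""
    name: str
    preconditions: frozenset
    outcomes: list = field(default_factory=list)   # (probability, add_set, delete_set)

    def applicable(self, state: frozenset) -> bool:
        return self.preconditions <= state

    def sample_successor(self, state: frozenset) -> frozenset:
        """Sample one successor state by drawing an outcome."""
        r, acc = random.random(), 0.0
        for p, add, delete in self.outcomes:
            acc += p
            if r <= acc:
                return (state - delete) | add
        return state   # numerical slack: fall back to no change

# Hypothetical example: a 'move' operator that succeeds with probability 0.8.
move = StochasticOperator(
    name="move(a,b)",
    preconditions=frozenset({"at(a)"}),
    outcomes=[(0.8, frozenset({"at(b)"}), frozenset({"at(a)"})),
              (0.2, frozenset(), frozenset())],
)
state = frozenset({"at(a)"})
print(move.sample_successor(state) if move.applicable(state) else state)
```

Rolling such operators forward from an initial state yields the generated future states on which a standard reinforcement learning algorithm can then be run.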
Bayesian Nonparametric Inverse Reinforcement Learning
Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function which explains the entire observation set. In practice, this leads to a computationally costly search over a large (typically infinite) space of complex reward functions. This paper proposes the notion that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Experimental results are given for simple examples showing comparable performance to other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (where the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure than previous methods, making it more computationally efficient in a real-world learning scenario where the state space is large but the demonstration set is small.
Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters
The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus one must balance between exploiting existing knowledge about the arms and obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel solution scheme for bandit problems with non-stationary normally distributed rewards. The scheme is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling Kalman filters, and on random sampling from these posteriors. Furthermore, it is able to track the better actions, thus supporting non-stationary bandit problems. Extensive experiments demonstrate that our scheme outperforms recently proposed bandit playing algorithms, not only in non-stationary environments but also in stationary environments. Furthermore, our scheme is robust to inexact parameter settings. We thus believe that our methodology opens avenues for obtaining improved novel solutions.
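A hedged sketch of the core mechanism, one scalar Kalman filter per arm with random-walk process noise and Thompson-style sampling from the posteriors, is given below. All parameter values and the `pull` environment hook are illustrative assumptions, not the paper's exact scheme.

```python
import random

class KalmanArm:
    """Posterior over one arm's (drifting) mean reward.

    A hedged sketch: a scalar Kalman filter whose state is the arm's mean
    reward, with random-walk process noise to track non-stationarity.
    """
    def __init__(self, obs_noise=1.0, process_noise=0.01):
        self.mean, self.var = 0.0, 100.0          # broad prior
        self.obs_noise, self.process_noise = obs_noise, process_noise

    def predict(self):
        self.var += self.process_noise            # drift inflates uncertainty

    def update(self, reward):
        gain = self.var / (self.var + self.obs_noise)
        self.mean += gain * (reward - self.mean)
        self.var *= (1.0 - gain)

    def sample(self):
        return random.gauss(self.mean, self.var ** 0.5)

def play(arms, pull, horizon=1000):
    """Each step: sample from every sibling filter's posterior, pull the arm
    with the largest sample, and update only that arm's filter.
    `pull(i)` is an assumed environment hook returning a reward."""
    total = 0.0
    for _ in range(horizon):
        for arm in arms:
            arm.predict()
        i = max(range(len(arms)), key=lambda k: arms[k].sample())
        r = pull(i)
        arms[i].update(r)
        total += r
    return total
```

Because the process noise keeps every arm's posterior from collapsing, the scheme continues to explore and can re-identify the best arm after the reward distributions drift.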