
    Adaptive Critics and the Basal Ganglia

    One of the most active areas of research in artificial intelligence is the study of learning methods by which “embedded agents” can improve performance while acting in complex dynamic environments. An agent, or decision maker, is embedded in an environment when it receives information from, and acts on, that environment in an ongoing closed-loop interaction. An embedded agent has to make decisions under time pressure and uncertainty, and it has to learn without the help of an ever-present knowledgeable teacher. Although the novelty of this emphasis may be inconspicuous to a biologist, animals being the prototypical embedded agents, it is a significant departure from the more traditional focus in artificial intelligence on reasoning within circumscribed domains removed from the flow of real-world events. One consequence of the embedded-agent view is the increasing interest in the learning paradigm called reinforcement learning (RL). Unlike the more widely studied supervised learning systems, which learn from a set of examples of correct input/output behavior, RL systems adjust their behavior with the goal of maximizing the frequency and/or magnitude of the reinforcing events they encounter over time. The core ideas of modern RL come from theories of animal classical and instrumental conditioning.
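
    A minimal sketch of the temporal-difference critic update that adaptive-critic architectures build on may help make the contrast with supervised learning concrete: the critic receives only a scalar reinforcement signal, never examples of correct behavior. The environment interface (reset, actions, step) and the learning constants below are illustrative assumptions, not part of the paper.

        import random

        # TD(0) critic sketch: learn state values from scalar reinforcement
        # alone, with no teacher supplying correct actions. The env interface
        # is hypothetical.
        def td_critic(env, episodes=100, alpha=0.1, gamma=0.95):
            V = {}                                     # state-value estimates
            for _ in range(episodes):
                s, done = env.reset(), False
                while not done:
                    a = random.choice(env.actions(s))  # exploratory policy
                    s2, r, done = env.step(a)
                    # TD error: reinforcement plus discounted prediction for
                    # the next state, minus the current prediction.
                    target = r + (0.0 if done else gamma * V.get(s2, 0.0))
                    V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
                    s = s2
            return V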

    Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning

    We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and we propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
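
    As a rough illustration of the relative-novelty idea, the sketch below scores each state in a recorded trajectory by comparing the novelty of the states that follow it to the novelty of the states that precede it; states with a high ratio are candidate subgoals ("doorways" into a new region), for which an option would then be generated. The window size and the inverse-square-root novelty measure are assumptions for illustration, not the paper's exact formulation.

        from collections import Counter

        def relative_novelty_scores(trajectory, window=5):
            visits = Counter(trajectory)
            novelty = lambda s: visits[s] ** -0.5   # rarely visited => more novel
            scores = {}
            for t in range(window, len(trajectory) - window):
                fwd = sum(novelty(s) for s in trajectory[t + 1:t + 1 + window])
                bwd = sum(novelty(s) for s in trajectory[t - window:t])
                s = trajectory[t]
                # High forward/backward novelty ratio => candidate subgoal.
                scores[s] = max(scores.get(s, 0.0), fwd / bwd)
            return scores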

    Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density

    This paper presents a method by which a reinforcement learning agent can automatically discover certain types of subgoals online. By creating useful new subgoals while learning, the agent is able to accelerate learning on the current task and to transfer its expertise to other, related tasks through the reuse of its ability to attain subgoals. The agent discovers subgoals based on commonalities across multiple paths to a solution. We cast the task of finding these commonalities as a multiple-instance learning problem and use the concept of diverse density to find solutions. We illustrate this approach using several gridworld tasks.
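
    The sketch below gives a simplified, count-based stand-in for the diverse-density computation over discrete states: successful trajectories act as positive bags and unsuccessful ones as negative bags, and a state scores highly when it appears on most successful paths and few unsuccessful ones. The full method uses a noisy-or probabilistic model; this counting simplification is an assumption for illustration only.

        def diverse_density(pos_bags, neg_bags):
            # Each bag is a set of states visited on one trajectory.
            candidates = set().union(*pos_bags, *neg_bags)
            dd = {}
            for s in candidates:
                p_pos = sum(s in bag for bag in pos_bags) / len(pos_bags)
                p_neg = sum(s in bag for bag in neg_bags) / max(len(neg_bags), 1)
                dd[s] = p_pos * (1.0 - p_neg)   # on successful paths, off failed ones
            return max(dd, key=dd.get), dd      # best candidate subgoal, all scores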

    Betweenness Centrality as a Basis for Forming Skills

    We show that betweenness centrality, a graph-theoretic measure widely used in social network analysis, provides a sound basis for autonomously forming useful high-level behaviors, or skills, from available primitives, the smallest behavioral units available to an autonomous agent.
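
    A small sketch of how this could look in practice, using the third-party networkx library (an implementation choice of ours; the paper does not prescribe one): build an interaction graph from observed transitions, compute betweenness centrality, and take the top-ranked states as skill targets.

        import networkx as nx

        def skill_targets(transitions, top_k=3):
            # transitions: iterable of (state, next_state) pairs observed
            # by the agent while executing its primitives.
            g = nx.DiGraph(transitions)
            centrality = nx.betweenness_centrality(g)
            # States that many shortest paths pass through are natural
            # targets for high-level skills.
            return sorted(centrality, key=centrality.get, reverse=True)[:top_k]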

    Accelerating Reinforcement Learning through the Discovery of Useful Subgoals

    An ability to adjust to changing environments and unforeseen circumstances is likely to be an important component of a successful autonomous space robot. This paper shows how to augment reinforcement learning algorithms with a method for automatically discovering certain types of subgoals online. By creating useful new subgoals while learning, the agent is able to accelerate learning on the current task and to transfer its expertise to related tasks through the reuse of its ability to attain subgoals. Subgoals are created based on commonalities across multiple paths to a solution. We cast the task of finding these commonalities as a multiple-instance learning problem and use the concept of diverse density to find solutions. We introduced this approach in [10]; here we present additional results for a simulated mobile robot task.

    Scaling MAP-Elites to Deep Neuroevolution

    Quality-Diversity (QD) algorithms, and MAP-Elites (ME) in particular, have proven very useful for a broad range of applications, including enabling real robots to recover quickly from joint damage, solving strongly deceptive maze tasks, and evolving robot morphologies to discover new gaits. However, present implementations of MAP-Elites and other QD algorithms seem to be limited to low-dimensional controllers with far fewer parameters than modern deep neural network models. In this paper, we propose to leverage the efficiency of Evolution Strategies (ES) to scale MAP-Elites to high-dimensional controllers parameterized by large neural networks. We design and evaluate a new hybrid algorithm called MAP-Elites with Evolution Strategies (ME-ES) for post-damage recovery in a difficult high-dimensional control task where traditional ME fails. Additionally, we show that ME-ES performs efficient exploration, on par with state-of-the-art exploration algorithms in high-dimensional control tasks with strongly deceptive rewards.
    Comment: Accepted to GECCO 2020
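
    A compressed sketch of the hybrid loop may clarify how the two components combine: a MAP-Elites archive keeps the best parameter vector per behaviour-descriptor cell, while an OpenAI-ES-style gradient estimate moves a copy of a selected elite before re-insertion. The evaluate signature (returning fitness and a descriptor in [0, 1]^k), the descriptor discretisation, and all constants are illustrative assumptions, not the authors' exact algorithm.

        import numpy as np

        def me_es(evaluate, dim, cells=10, iters=50, pop=20, sigma=0.05, lr=0.01):
            archive = {}                      # descriptor cell -> (fitness, params)
            theta = np.zeros(dim)
            for _ in range(iters):
                # ES gradient estimate from Gaussian perturbations of theta.
                eps = np.random.randn(pop, dim)
                fits = np.array([evaluate(theta + sigma * e)[0] for e in eps])
                theta = theta + lr * eps.T @ (fits - fits.mean()) / (pop * sigma)
                fitness, bd = evaluate(theta)
                cell = tuple((np.asarray(bd) * cells).astype(int))
                if cell not in archive or fitness > archive[cell][0]:
                    archive[cell] = (fitness, theta.copy())  # MAP-Elites insertion
                # Restart the ES from a randomly chosen elite in the archive.
                _, elite = list(archive.values())[np.random.randint(len(archive))]
                theta = elite.copy()
            return archive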

    Online Learning Adaptation Strategy for DASH Clients

    In this work, we propose an online adaptation logic for Dynamic Adaptive Streaming over HTTP (DASH) clients, where each client selects the representation that maximizes the long-term expected reward. The latter is defined as a combination of the decoded quality, the quality fluctuations, and the rebuffering events experienced by the user during playback. To solve this problem, we cast the selection of the optimal representations as a Markov Decision Process (MDP) optimization. The system dynamics required in the MDP model are a priori unknown and are therefore learned through a Reinforcement Learning (RL) technique. The developed learning process exploits a parallel learning technique that improves the learning rate and limits sub-optimal choices, leading to a fast yet accurate learning process that quickly converges to high and stable rewards; the efficiency of our controller is therefore not sacrificed for fast convergence. Simulation results show that our algorithm achieves a higher QoE than existing RL algorithms and heuristic solutions, increasing average quality while reducing quality fluctuations.
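
    A hedged sketch of the two core pieces such a client needs, the reward shaping and the tabular learning update, is given below. Only a basic Q-learning step is shown, not the paper's parallel learning technique; the penalty weights and the state encoding are illustrative assumptions.

        def reward(quality, prev_quality, rebuffer_s, w_switch=0.5, w_rebuf=4.0):
            # Decoded quality minus penalties for quality fluctuations and
            # rebuffering time, mirroring the reward described in the abstract.
            return (quality
                    - w_switch * abs(quality - prev_quality)
                    - w_rebuf * rebuffer_s)

        def q_update(Q, state, action, r, next_state, n_reps, alpha=0.1, gamma=0.9):
            # Tabular Q-learning step; `state` could encode, e.g., the buffer
            # occupancy bucket and the last selected representation level.
            best_next = max(Q.get((next_state, a), 0.0) for a in range(n_reps))
            q = Q.get((state, action), 0.0)
            Q[(state, action)] = q + alpha * (r + gamma * best_next - q)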