Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot
We address the problem of autonomously learning controllers for
vision-capable mobile robots. We extend McCallum's (1995) Nearest-Sequence
Memory algorithm to allow for general metrics over state-action trajectories.
We demonstrate the feasibility of our approach by successfully running our
algorithm on a real mobile robot. The algorithm is novel and unique in that it
(a) explores the environment and learns directly on a mobile robot without
using a hand-made computer model as an intermediate step, (b) does not require
manual discretization of the sensor input space, (c) works in piecewise
continuous perceptual spaces, and (d) copes with partial observability.
Together this allows learning from much less experience compared to previous
methods.
Comment: 14 pages, 8 figures
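The "general metrics over state-action trajectories" idea can be pictured with a toy sketch. The suffix metric, per-step metric, and data below are illustrative assumptions, not McCallum's exact formulation:

```python
# Toy illustration: compare the recent state-action history against
# stored trajectories using a pluggable metric over trajectory suffixes.
# All names, the metric, and the data are illustrative assumptions.
def suffix_distance(hist_a, hist_b, metric, horizon=3):
    """Sum a per-step metric over the aligned last `horizon` steps."""
    a, b = hist_a[-horizon:], hist_b[-horizon:]
    penalty = abs(len(a) - len(b)) * 1.0   # missing steps cost a fixed amount
    return penalty + sum(metric(x, y) for x, y in zip(reversed(a), reversed(b)))

def step_metric(sa1, sa2):
    """Distance on (state, action) pairs: absolute difference on states,
    a fixed mismatch cost on actions."""
    (s1, a1), (s2, a2) = sa1, sa2
    return abs(s1 - s2) + (0.0 if a1 == a2 else 0.5)

current = [(0.1, "fwd"), (0.9, "left"), (1.0, "fwd")]
stored = [
    [(0.0, "fwd"), (1.0, "left"), (1.1, "fwd")],    # similar experience
    [(5.0, "right"), (4.0, "right"), (3.0, "fwd")],  # dissimilar experience
]
nearest = min(stored, key=lambda h: suffix_distance(current, h, step_metric))
print(nearest is stored[0])  # the similar trajectory is nearest
```

A nearest-sequence method would then reuse the value estimates attached to the closest stored trajectories.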
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying files
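The central issues named above, trading off exploration and exploitation and learning from delayed reinforcement, show up even in the smallest settings. Below is a minimal tabular Q-learning sketch on a hypothetical five-state chain, not an example taken from the survey itself:

```python
import random

# Hypothetical toy problem: a five-state chain with reward only at the
# right end. Actions: 0 = step left, 1 = step right.
N_STATES = 5
ACTIONS = (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Deterministic chain dynamics; reward 1.0 on reaching the last state."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy_action(s):
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

random.seed(0)
for _ in range(500):                       # training episodes
    s, done, steps = 0, False, 0
    while not done and steps < 200:
        # Exploration vs. exploitation: random action with prob. EPSILON.
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy_action(s)
        s2, r, done = step(s, a)
        # One-step temporal-difference update handles delayed reinforcement:
        # reward at the goal propagates backward through the value table.
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, steps = s2, steps + 1

policy = [greedy_action(s) for s in range(N_STATES - 1)]
print(policy)  # the learned greedy policy should step right toward the reward
```

Even here the dilemma is visible: with EPSILON = 0, a tie-broken-left agent never discovers the distant reward.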
An Architecture for Multilevel Learning and Robotic Control based on Concept Generation
Robot and multi-robot systems are inherently complex systems, for which designing the programs to control their behaviours proves complicated. Moreover, control programs that have been successfully designed for a particular environment and task can become useless if either of these changes. It is for this reason that this thesis investigates the use of machine learning within robot and multi-robot systems. It explores an architecture for machine learning, applied to autonomous mobile robots, based on dividing the learning task into two individual but interleaved sub-tasks.
The first sub-task consists of finding an appropriate representation on which to base behaviour learning. The thesis explores the viability of using multidimensional classification techniques to generalise the original sensor and motor representations into abstract hierarchies of 'concepts'. To construct concepts the research used standard classification techniques, and experimented with a novel method of multidimensional data classification based on 'Q-analysis'. Results suggest that this may be a powerful new approach to concept learning.
The second sub-task consists of using the previously acquired concepts as the representation for behaviour learning. The thesis explores whether it is possible to learn robotic behaviours represented using concepts. Results show that it is possible to learn low-level behaviours such as navigation and higher-level ones such as ball passing in robot football.
The thesis concludes that the proposed architecture is viable for robotic behaviour learning and control, and that incorporating Q-analysis-based classification results in a promising new approach to the control of robot and multi-robot systems.
Closed-Loop Learning of Visual Control Policies
In this paper we present a general, flexible framework for learning mappings
from images to actions by interacting with the environment. The basic idea is
to introduce a feature-based image classifier in front of a reinforcement
learning algorithm. The classifier partitions the visual space according to the
presence or absence of a few highly informative local descriptors that are
incrementally selected in a sequence of attempts to remove perceptual aliasing.
We also address the problem of fighting overfitting in such a greedy algorithm.
Finally, we show how high-level visual features can be generated when the power
of local descriptors is insufficient for completely disambiguating the aliased
states. This is done by building a hierarchy of composite features that consist
of recursive spatial combinations of visual features. We demonstrate the
efficacy of our algorithms by solving three visual navigation tasks and a
visual version of the classical Car on the Hill control problem.
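The incremental selection of informative descriptors can be pictured with a toy greedy sketch; the samples, descriptor names, and scoring below are illustrative assumptions rather than the paper's actual method:

```python
from collections import Counter

# Toy greedy feature selection: pick, one descriptor at a time, the binary
# local descriptor whose presence/absence best separates observations that
# require different actions (i.e. removes perceptual aliasing).
# All data and names here are made up for illustration.
def aliasing(groups):
    """Count action disagreements within each perceptual cell."""
    total = 0
    for acts in groups.values():
        counts = Counter(acts)
        total += len(acts) - max(counts.values())
    return total

def greedy_select(samples, descriptors, n_features):
    """samples: list of (set_of_present_descriptors, required_action)."""
    chosen = []
    for _ in range(n_features):
        def score(d):
            groups = {}
            for feats, act in samples:
                key = tuple(f in feats for f in chosen + [d])
                groups.setdefault(key, []).append(act)
            return aliasing(groups)
        best = min((d for d in descriptors if d not in chosen), key=score)
        chosen.append(best)
    return chosen

samples = [({"door"}, "enter"), ({"door", "wall"}, "enter"),
           ({"wall"}, "turn"), (set(), "turn")]
chosen = greedy_select(samples, ["door", "wall", "floor"], 1)
print(chosen)  # "door" alone separates the two required actions
```

When no single descriptor removes the aliasing, the paper's composite-feature hierarchy would come into play; that part is not sketched here.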
A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation
In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, is unable to quickly build a complete and independent description of the entire environment, leaving the learning method to struggle with the difficult dilemma of choosing between two tasks: exploration and exploitation. The problem becomes more pronounced when the agent must deal with a dynamic environment whose configuration and/or parameters are constantly changing. In this paper, the problem is approached by first mapping the reinforcement learning scheme to a directed graph, within which the set of all states already explored continues to be exploited. We prove that the two tasks of exploration and exploitation eventually converge in the decision-making process, so there is no need to face the exploration vs. exploitation tradeoff as all existing reinforcement learning methods do. Rather, this observation indicates that a reinforcement learning scheme is essentially a search for the shortest path in a dynamic environment, which is readily tackled by a modified Floyd-Warshall algorithm proposed in the paper. Experimental results confirm that the proposed graph-based reinforcement learning algorithm significantly outperforms both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, rendering it an algorithm of choice in applications involving dynamic environments.
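The shortest-path view maps naturally onto the classical Floyd-Warshall algorithm. The sketch below shows the standard all-pairs version on a tiny made-up state graph; the paper's modification for dynamic environments is not reproduced here:

```python
# Standard Floyd-Warshall all-pairs shortest paths on a small directed
# graph. This is the textbook algorithm, not the paper's modified
# dynamic-environment variant; the example graph is made up.
INF = float("inf")

def floyd_warshall(n, edges):
    """edges: iterable of (u, v, w) directed weighted edges over n vertices."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):            # allow intermediate vertices 0..k
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Tiny maze-like state graph: two routes from state 0 to the goal state 3.
d = floyd_warshall(4, [(0, 1, 1), (1, 3, 5), (0, 2, 2), (2, 3, 1)])
print(d[0][3])  # shortest path 0 -> 2 -> 3 has cost 3
```

In a dynamic environment, edge weights change over time, which is where the paper's modified variant would amortise recomputation.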
Growing Action Spaces
In complex tasks, such as those with large combinatorial action spaces,
random exploration may be too inefficient to achieve meaningful learning
progress. In this work, we use a curriculum of progressively growing action
spaces to accelerate learning. We assume the environment is out of our control,
but that the agent may set an internal curriculum by initially restricting its
action space. Our approach uses off-policy reinforcement learning to estimate
optimal value functions for multiple action spaces simultaneously and
efficiently transfers data, value estimates, and state representations from
restricted action spaces to the full task. We show the efficacy of our approach
in proof-of-concept control tasks and on challenging large-scale StarCraft
micromanagement tasks with large, multi-agent action spaces.
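The notion of an internally restricted, progressively growing action space can be sketched as a simple curriculum wrapper. The class, stage schedule, and action names below are illustrative assumptions, not the paper's transfer machinery:

```python
# Illustrative sketch of an action-space curriculum: the agent starts with
# a restricted action set and unlocks the full set as training progresses.
# Stage thresholds and action names are made up for this example.
class ActionCurriculum:
    def __init__(self, full_actions, stages):
        # stages: list of (min_episode, number_of_actions_unlocked)
        self.full_actions = full_actions
        self.stages = sorted(stages)

    def available(self, episode):
        """Return the action subset unlocked at the given episode."""
        n = 1
        for min_ep, k in self.stages:
            if episode >= min_ep:
                n = k
        return self.full_actions[:n]

cur = ActionCurriculum(["noop", "move", "attack", "focus_fire"],
                       stages=[(0, 2), (100, 3), (300, 4)])
print(cur.available(0))    # ['noop', 'move']
print(cur.available(150))  # ['noop', 'move', 'attack']
print(cur.available(500))  # full action set
```

The paper's contribution lies in transferring data, value estimates, and state representations across these restricted spaces, which this wrapper does not attempt.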
VQQL. Applying vector quantization to reinforcement learning
Proceedings of: RoboCup-99: Robot Soccer World Cup III, July 27 to August 6, 1999, Stockholm, Sweden
Reinforcement learning has proven to be a set of successful techniques for finding optimal policies in uncertain and/or dynamic domains, such as the RoboCup. One of the problems of using such techniques appears with large state and action spaces, as is the case for the input information coming from the Robosoccer simulator. In this paper, we describe a new mechanism for solving the state generalization problem in reinforcement learning algorithms. This clustering mechanism is based on the vector quantization technique for signal analog-to-digital conversion and compression, and on the Generalized Lloyd Algorithm for the design of vector quantizers. Furthermore, we present the VQQL model, which integrates Q-Learning as the reinforcement learning technique and vector quantization as the state generalization technique. We show some results of applying this model to learning the interception skill for Robosoccer agents.
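The VQQL pipeline, as described, first designs a codebook with the Generalized Lloyd Algorithm and then treats codeword indices as discrete states for Q-Learning. Below is a minimal 1-D sketch of the quantization half, with made-up data and sizes:

```python
import random

# Minimal sketch of the vector-quantization half of VQQL: learn a small
# codebook over raw continuous observations with the Generalized Lloyd
# algorithm, then map each observation to a discrete codeword index.
# Data, dimensionality (1-D), and sizes are made up for illustration.
def lloyd(samples, k, iters=20):
    """Generalized Lloyd algorithm: k codewords for 1-D samples."""
    codebook = random.sample(samples, k)
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for x in samples:                    # nearest-codeword assignment
            i = min(range(k), key=lambda j: abs(x - codebook[j]))
            cells[i].append(x)
        for i, cell in enumerate(cells):     # centroid update
            if cell:
                codebook[i] = sum(cell) / len(cell)
    return codebook

def quantize(x, codebook):
    """Map a continuous observation to a discrete state index."""
    return min(range(len(codebook)), key=lambda i: abs(x - codebook[i]))

random.seed(1)
samples = [random.gauss(0, 1) for _ in range(200)] + \
          [random.gauss(10, 1) for _ in range(200)]
cb = lloyd(samples, 2)
print(sorted(round(c, 1) for c in cb))  # codewords near the two cluster means
```

Tabular Q-Learning can then be run directly over the discrete indices returned by quantize, which is the integration the VQQL model describes.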