
    Reinforcement Learning for Racecar Control

    This thesis investigates the use of reinforcement learning to learn to drive a racecar in the simulated environment of the Robot Automobile Racing Simulator. Real-life race driving is known to be difficult for humans, and expert human drivers use complex sequences of actions. There are a large number of variables, some of which change stochastically and all of which may affect the outcome. This makes driving a promising domain for testing and developing machine learning techniques that have the potential to be robust enough to work in the real world; the principles of the algorithms developed here may therefore be applicable to a range of problems. The investigation starts by finding a suitable data structure to represent the information learnt, which is tested using supervised learning. Reinforcement learning is then added and roughly tuned, and the supervised learning is removed. A simple tabular representation is found satisfactory; it avoids the difficulties of more complex methods and allows the investigation to concentrate on the essentials of learning. Various reward sources are tested, and a combination of three is found to produce the best performance. Exploration of the problem space is investigated: results show that exploration is essential, but that controlling how much of it is done is equally important. The learning episodes turn out to need to be very long, so the task is treated as continuous, with discounting used to limit the size of the stored values. Eligibility traces are used successfully to make the learning more efficient. The tabular representation is made more compact by hashing and more accurate by using smaller buckets; this slows the learning but produces better driving. The improvement given by a rough form of generalisation indicates that replacing the tabular method with a function approximator is warranted. These results show that reinforcement learning can work within the Robot Automobile Racing Simulator, and they lay the foundations for building a more efficient and competitive agent.
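
    As a concrete illustration of the tabular approach the abstract describes, the following is a minimal sketch of a discounted Sarsa(lambda) learner over a hashed, bucketed state table. The abstract does not name the exact algorithm, state variables, or reward terms, so the bucket size, action set, and update rule below are illustrative assumptions, not the thesis's actual design.

        import math
        import random
        from collections import defaultdict

        # Hedged sketch: discounted, continuing-task Sarsa(lambda) with a
        # hashed tabular value function. All constants are placeholders.
        ALPHA, GAMMA, LAMBDA, EPSILON = 0.2, 0.99, 0.9, 0.05
        ACTIONS = range(5)  # e.g. a handful of steering/throttle settings

        def bucket(state, size=0.5):
            """Discretise a continuous car state into fixed-size buckets,
            then hash the result to a compact table key. A smaller size
            gives a more accurate but slower-to-fill table."""
            return hash(tuple(math.floor(x / size) for x in state))

        Q = defaultdict(float)      # Q[(hashed_state, action)] -> value
        trace = defaultdict(float)  # eligibility traces, same keys

        def choose_action(state):
            """Epsilon-greedy: exploration is essential, but EPSILON
            controls how much of it is done."""
            s = bucket(state)
            if random.random() < EPSILON:
                return random.choice(list(ACTIONS))
            return max(ACTIONS, key=lambda a: Q[(s, a)])

        def sarsa_lambda_step(state, action, reward, next_state, next_action):
            """One on-policy TD update; discounting keeps the stored
            values bounded even though the task is treated as continuing."""
            s, s2 = bucket(state), bucket(next_state)
            delta = reward + GAMMA * Q[(s2, next_action)] - Q[(s, action)]
            trace[(s, action)] += 1.0  # accumulating trace
            for key in list(trace):
                Q[key] += ALPHA * delta * trace[key]
                trace[key] *= GAMMA * LAMBDA
                if trace[key] < 1e-4:  # prune tiny traces to stay compact
                    del trace[key]

    Shrinking the bucket size corresponds to the smaller, more accurate buckets mentioned in the abstract: the table grows and learning slows, but the resulting driving can be finer-grained.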

    Modeling Dynamical Systems with Structured Predictive State Representations

    Predictive state representations (PSRs) are a class of models that represent the state of a dynamical system as a set of predictions about future events. PSRs can model partially observable, stochastic dynamical systems, including any system that can be modeled by a finite partially observable Markov decision process (POMDP). Using PSR models can help an artificial intelligence agent learn an accurate model of its environment (which is a dynamical system) from its experience in that environment. Specifically, I present the suffix-history algorithm and demonstrate that it can learn PSR models that are generally more accurate than POMDP models learned from the same amount of experience. The suffix-history algorithm learns a type of PSR called the linear PSR. However, it is intractable to learn a linear PSR (or a POMDP) to model large systems, because these models do not take advantage of regularities or structure in the environment. Therefore, I present three new classes of PSR models that exploit different types of structure in an environment: hierarchical PSRs, factored PSRs, and multi-mode PSRs. Hierarchical PSRs exploit temporal structure in the environment, because a temporally abstract model can be simpler than a fully detailed model. I demonstrate that learning a hierarchical PSR is tractable in environments in which learning a single linear PSR is intractable. Factored PSRs model systems with vector-valued observations, exploiting conditional independence among the components of the observation vectors. Leveraging that conditional independence can lead to a factored PSR model that is exponentially smaller than an unstructured model of the same system. Finally, multi-mode PSRs model systems that switch among several modes of operation. The modes used by multi-mode PSRs are defined in terms of past and future observations, which leads to advantages both when learning the model and when using it to make predictions. For each class of structured PSR models, I develop a learning algorithm that scales to larger systems than the suffix-history algorithm but still leverages the advantage of predictive state for learning accurate models.

    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
    http://deepblue.lib.umich.edu/bitstream/2027.42/64601/1/bdwolfe_1.pd
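
    For concreteness, here is a minimal sketch of how a linear PSR maintains its predictive state. The update rule is the standard one from the PSR literature rather than anything specific to this thesis, and the parameters (one matrix M_ao and one vector m_ao per action-observation pair) are assumed to have been learned already, e.g. by something like the suffix-history algorithm.

        import numpy as np

        # Hedged sketch of the standard linear-PSR state update. The state
        # p is a vector of predictions for a set of core tests given the
        # history so far; M and m are dicts keyed by (action, observation).

        def update_state(p, a, o, M, m):
            """Advance the prediction vector after taking action a and
            observing o. p @ m[(a, o)] is Pr(o | history, a)."""
            prob_o = p @ m[(a, o)]
            return (p @ M[(a, o)]) / prob_o  # condition on the new history

        def test_probability(p, test, M, m):
            """Probability of a test, i.e. a sequence of (action,
            observation) pairs, as a chain of one-step predictions."""
            prob = 1.0
            for a, o in test:
                prob *= p @ m[(a, o)]
                p = update_state(p, a, o, M, m)
            return prob

    The structured models in the thesis keep this predictive-state idea while exploiting temporal, factored, or mode structure so that the parameterisation stays tractable for larger systems.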

    Using predictive representations to improve generalization in reinforcement learning

    The predictive representations hypothesis holds that particularly good generalization will result from representing the state of the world in terms of predictions about possible future experience. This hypothesis has been a central motivation behind recent research in, for example, PSRs and TD networks. In this paper we present the first explicit investigation of this hypothesis. We show in a reinforcement-learning example (a grid-world navigation task) that a predictive representation in tabular form can learn much faster than both the tabular explicit-state representation and a tabular history-based method.
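
    A minimal sketch of the idea, under assumed details: index a tabular value function by a predictive state (here, one-step predictions of whether each move hits a wall) instead of by the grid cell itself, so that cells with identical local predictions share table entries. The grid, features, and learning rule below are illustrative; the paper's predictive representations are richer (multi-step predictions), which avoids the heavy aliasing this one-step toy version would suffer.

        from collections import defaultdict

        # Hedged sketch: tabular Q-learning keyed by a predictive state.
        GRID = ["#####",
                "#..G#",
                "#.#.#",
                "#...#",
                "#####"]
        MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

        def predictive_state(cell):
            """One-step predictions: would each move hit a wall here?
            Cells that 'look' identical map to the same table key."""
            r, c = cell
            return tuple(GRID[r + dr][c + dc] == "#"
                         for dr, dc in MOVES.values())

        Q = defaultdict(float)  # keyed by (predictive_state, action)

        def q_update(cell, action, reward, next_cell,
                     alpha=0.1, gamma=0.95):
            """Standard Q-learning step; any generalization comes from
            the shared predictive keys, not from the update rule."""
            s, s2 = predictive_state(cell), predictive_state(next_cell)
            best_next = max(Q[(s2, a)] for a in MOVES)
            Q[(s, action)] += alpha * (reward + gamma * best_next
                                       - Q[(s, action)])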