Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
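Two of the central issues the survey names, trading off exploration against exploitation and learning from delayed reinforcement, can be illustrated with a minimal tabular Q-learning loop. The toy chain environment and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)

# Toy chain MDP (illustrative only): states 0..4, reward 1 on reaching state 4.
N_STATES = 5
ACTIONS = [-1, +1]          # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

for episode in range(200):
    s, done = 0, False
    while not done:
        # Exploration/exploitation trade-off: act greedily most of the time,
        # explore with probability EPSILON (ties broken at random).
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            best = max(Q[(s, act)] for act in ACTIONS)
            a = random.choice([act for act in ACTIONS if Q[(s, act)] == best])
        s2, r, done = step(s, a)
        # Temporal-difference update: delayed reward is credited to earlier
        # state-action pairs through the bootstrapped target.
        target = r + (0.0 if done else GAMMA * max(Q[(s2, act)] for act in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

After training, the greedy policy at every state moves right, even though reward arrives only at the end of the chain.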
Computationally Efficient Relational Reinforcement Learning
Relational Reinforcement Learning (RRL) is a technique that enables Reinforcement Learning (RL) agents to generalize from their experience, allowing them to learn over large or potentially infinite state spaces, to learn context-sensitive behaviors, and to learn to solve variable goals and transfer knowledge between similar situations. Prior RRL architectures are not sufficiently computationally efficient to see use outside of small, niche roles within larger Artificial Intelligence (AI) architectures. I present a novel online, incremental RRL architecture and an implementation that is orders of magnitude faster than its predecessors.
The first aspect of this architecture that I explore is a computationally efficient implementation of an adaptive Hierarchical Tile Coding (aHTC), a kind of Adaptive Tile Coding (ATC) in which more general tiles covering larger portions of the state-action space are retained as tiles covering smaller portions are introduced, using k-dimensional tries (k-d tries) to implement the value function for non-relational Temporal Difference (TD) methods. To achieve comparable performance for RRL, I implement the Rete algorithm to replace the k-d tries, due to its efficient handling of both the variable binding problem and variable numbers of actions. Tying aHTCs and Rete together, I present a rule grammar that both maps aHTCs onto Rete and allows the architecture to automatically extract relational features in order to support adaptation of the value function over time. I experiment with several refinement criteria, and with additional functionality with which my agents attempt to determine whether re-refinement using different features might allow them to better learn a near-optimal policy. I present optimal results using a value criterion for several variants of BlocksWorld, and I provide transfer results for BlocksWorld and a scalable Taxicab domain.
I additionally introduce a Higher Order Grammar (HOG) that grants online, incremental RRL agents the flexibility to introduce new variables and corresponding relations as needed in order to learn effective value functions. I evaluate agents that use the HOG on a version of Blocks World and on an Adventure task. In summary, I present a new online, incremental RRL architecture, a grammar to map aHTCs onto the Rete, and an implementation that is orders of magnitude faster than its predecessors.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/145859/1/bazald_1.pd
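Tile coding, the non-relational starting point that the aHTC extends with adaptive, hierarchical resolution, can be sketched in a few lines. The sketch below is a fixed (non-adaptive, non-relational) one-dimensional version; the names and sizes are illustrative assumptions, not the thesis's implementation.

```python
# Minimal fixed 1-D tile coding (illustrative sketch, not the aHTC itself).
N_TILINGS = 4          # overlapping, offset tilings provide generalization
TILES_PER_TILING = 10
LOW, HIGH = 0.0, 1.0

def active_tiles(x):
    """Return one active tile index per tiling for input x in [LOW, HIGH]."""
    width = (HIGH - LOW) / TILES_PER_TILING
    indices = []
    for t in range(N_TILINGS):
        offset = t * width / N_TILINGS       # each tiling is shifted slightly
        i = int((x - LOW + offset) / width)
        i = min(i, TILES_PER_TILING)         # clamp at the upper edge
        indices.append(t * (TILES_PER_TILING + 1) + i)
    return indices

weights = [0.0] * (N_TILINGS * (TILES_PER_TILING + 1))

def value(x):
    # The approximated value is the sum of the weights of the active tiles.
    return sum(weights[i] for i in active_tiles(x))

def train(x, target, alpha=0.1):
    """One gradient step: the error is spread over the active tiles."""
    err = target - value(x)
    for i in active_tiles(x):
        weights[i] += alpha * err / N_TILINGS

# Training at one point generalizes to nearby points via shared tiles.
for _ in range(50):
    train(0.50, 1.0)
```

Nearby inputs share some of the same tiles, so training at 0.50 raises the value at 0.56 as well, while distant inputs such as 0.9 are unaffected; the adaptive version in the thesis refines this fixed layout over time.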
A neural network-based trajectory planner for redundant systems using direct inverse modeling
Redundant (i.e., under-determined) systems cannot be trained effectively using direct inverse modeling with supervised learning, for reasons well outlined by Michael Jordan at MIT. There is a loophole, however, in Jordan's preconditions, which seems to allow just such an architecture. A robot path planner implementing a cerebellar-inspired habituation paradigm with such an architecture will be introduced. The system, called ARTFORMS, for Adaptive Redundant Trajectory Formation System, uses on-line training of multiple CMACs. CMACs are locally generalizing networks and have an a priori deterministic geometric input-space mapping. These properties, together with on-line learning and rapid convergence, satisfy the loophole conditions. Issues of stability/plasticity, presentation order and generalization, computational complexity, and subsumptive fusion of multiple networks are discussed.
Two implementations are described. The first is shown not to be goal directed enough for ultimate success. The second, which is highly successful, is made more goal directed by the addition of secondary training, which reduces the dimensionality of the problem by using a set of constraint equations. Running open loop with respect to posture (the system metric which reduces dimensionality) is seen to be the root cause of the first system's failure, not the use of the direct inverse method. In fact, several nice properties of direct inverse modeling contribute to the system's convergence speed, robustness and compliance.
The central problem used to demonstrate this method is the control of trajectory formation for a planar kinematic chain with a variable number of joints.
Finally, this method is extended to implement adaptive obstacle avoidance.
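Direct inverse modeling, the training scheme at the heart of this entry, can be sketched on a trivial non-redundant plant: issue a command, observe the outcome, and train the inverse model on the reversed (outcome, command) pair. Everything below (the plant, the linear learner, the gains) is an illustrative assumption; Jordan's objection, and the loophole ARTFORMS exploits, concern what happens when the plant is redundant and several commands produce the same outcome.

```python
# Direct inverse modeling in its simplest form (illustrative sketch only;
# this linear learner stands in for the locally generalizing CMACs of the
# actual system, and the plant is assumed for demonstration).
import random

random.seed(0)

def plant(u):
    """Forward model, unknown to the learner."""
    return 2.0 * u + 1.0

# Linear inverse model u_hat = a*y + b, trained online by gradient descent.
a, b = 0.0, 0.0
LR = 0.05

for _ in range(2000):
    u = random.uniform(-1.0, 1.0)   # "motor babbling": random command
    y = plant(u)                    # observed outcome
    err = u - (a * y + b)           # train the inverse on the pair (y -> u)
    a += LR * err * y
    b += LR * err

def inverse(y_desired):
    return a * y_desired + b
```

For this non-redundant plant the learned inverse recovers the command that produces a desired outcome; with a redundant plant, a globally generalizing learner would average incompatible commands, which is exactly the failure mode the locally generalizing CMAC architecture is meant to avoid.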
Intelligent flight control systems
The capabilities of flight control systems can be enhanced by designing them to emulate functions of natural intelligence. Intelligent control functions fall into three categories. Declarative actions involve decision-making, providing models for system monitoring, goal planning, and system/scenario identification. Procedural actions concern skilled behavior and have parallels in guidance, navigation, and adaptation. Reflexive actions are spontaneous, inner-loop responses for control and estimation. Intelligent flight control systems learn knowledge of the aircraft and its mission and adapt to changes in the flight environment. Cognitive models form an efficient basis for integrating 'outer-loop/inner-loop' control functions and for developing robust parallel-processing algorithms.
Error minimising gradients for improving cerebellar model articulation controller performance
In motion control applications where the desired trajectory velocity exceeds an actuator's maximum velocity limitations, large position errors will occur between the desired and actual trajectory responses. In these situations standard control approaches cannot predict the output saturation of the actuator, and thus the associated error summation cannot be minimised.
An adaptive feedforward control solution such as the Cerebellar Model Articulation Controller (CMAC) is able to provide an inherent level of prediction for these situations, moving the system output in the direction of the excessive desired velocity before actuator saturation occurs. However, the pre-empting level of a CMAC is not adaptive, and thus the optimal point in time to start moving the system output in the direction of the excessive desired velocity remains unsolved. While the CMAC can adaptively minimise an actuator's position error, minimising the summation of error over time created by the divergence of the desired and actual trajectory responses requires an additional adaptive level of control.
This thesis presents an improved method of training CMACs to minimise the summation of error over time created when the desired trajectory velocity exceeds the actuator's maximum velocity limitations. This improved method, called the Error Minimising Gradient Controller (EMGC), is able to adaptively modify a CMAC's training signal so that the CMAC will start to move the output of the system in the direction of the excessive desired velocity with an optimised pre-empting level. The EMGC was originally created to minimise the loss of linguistic information conveyed through an actuated series of concatenated hand-sign gestures reproducing deafblind sign language.
The EMGC concept, however, can be implemented on any system where the error summation associated with excessive desired velocities needs to be minimised, with the EMGC producing an improved output approximation over using a CMAC alone. In this thesis, the EMGC was tested and benchmarked against a combined feedforward/feedback controller using a CMAC and a PID controller. The EMGC was tested on an air-muscle actuator in a variety of situations comprising a position discontinuity in a continuous desired trajectory. Tested situations included various discontinuity magnitudes together with varying approach and departure gradient profiles.
Testing demonstrated that the addition of an EMGC can reduce a situation's error summation magnitude if the base CMAC controller has not already provided a sufficiently early pre-empting output in the direction of the situation. The addition of an EMGC to a CMAC produces an improved approximation of reproduced motion trajectories, minimising position error not only for a single sampling instance but also over time for periodic signals.
A wavelet-based CMAC for enhanced multidimensional learning
The CMAC (Cerebellar Model Articulation Controller) neural network has been successfully used in control systems and other applications for many years. The network structure is modular and associative, allowing rapid learning convergence and ease of implementation in either hardware or software. The rate of convergence of the network is determined largely by the choice of receptive field shape and the generalization parameter. This research contains a rigorous analysis of the rate of convergence of the standard CMAC, as well as the rate of convergence of networks using other receptive field shapes. The effects of decimation from state space to weight space are examined in detail. This analysis shows the CMAC to be an adaptive lowpass filter, where the filter dynamics are governed by the generalization parameter. A more general CMAC is derived using wavelet-based receptive fields and a controllable decimation scheme that is capable of convergence at any frequency within the Nyquist limits. The flexible decimation structure facilitates the optimization of computation for complex multidimensional problems. The stability of the wavelet-based CMAC is also examined.
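The role of the generalization parameter discussed above can be made concrete with a minimal classic (Albus-style, non-wavelet) one-dimensional CMAC; the sizes and names below are illustrative assumptions, not the implementation analyzed in this work.

```python
# Classic 1-D CMAC sketch (illustrative only). Each quantized input activates
# GEN adjacent receptive fields ("association cells"); the output is the sum
# of their weights, and training spreads the correction equally across the
# active cells. GEN is the generalization parameter: wider receptive fields
# give faster but coarser (more lowpass) learning.
GEN = 8                  # generalization parameter
N_CELLS = 64 + GEN       # weight table for inputs quantized to 0..63

weights = [0.0] * N_CELLS

def active_cells(x):
    q = int(x * 63)                  # quantize input in [0, 1]
    return range(q, q + GEN)         # GEN overlapping receptive fields

def output(x):
    return sum(weights[i] for i in active_cells(x))

def train(x, target, beta=1.0):
    err = target - output(x)
    for i in active_cells(x):
        weights[i] += beta * err / GEN   # spread correction over active cells

# A single training step reproduces the target exactly at the trained point;
# nearby inputs share cells and so generalize, distant ones are unaffected.
train(0.5, 1.0)
```

Inputs one quantization step apart share GEN-1 of their GEN cells, so the learned bump falls off linearly over a neighbourhood of width GEN, which is the local-generalization behaviour whose filtering properties the abstract analyzes.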
An Architecture for Multilevel Learning and Robotic Control based on Concept Generation
Robot and multi-robot systems are inherently complex systems, for which designing the programs to control their behaviours proves complicated. Moreover, control programs that have been successfully designed for a particular environment and task can become useless if either of these change. It is for this reason that this thesis investigates the use of machine learning within robot and multi-robot systems. It explores an architecture for machine learning, applied to autonomous mobile robots based on dividing the learning task into two individual but interleaved sub-tasks.
The first sub-task consists of finding an appropriate representation on which to base behaviour learning. The thesis explores the viability of using multidimensional classification techniques to generalise the original sensor and motor representations into abstract hierarchies of 'concepts'. To construct concepts the research used standard classification techniques, and experimented with a novel method of multidimensional data classification based on 'Q-analysis'. Results suggest that this may be a powerful new approach to concept learning.
The second sub-task consists of using the previously acquired concepts as the representation for behaviour learning. The thesis explores whether it is possible to learn robotic behaviours represented using concepts. Results show that it is possible to learn low-level behaviours such as navigation and higher-level ones such as ball passing in robot football.
The thesis concludes that the proposed architecture is viable for robotic behaviour learning and control, and that incorporating Q-analysis based classification results in a promising new approach to the control of robot and multi-robot systems.