223 research outputs found

    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
    Comment: See http://www.jair.org/ for any accompanying file
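The exploration/exploitation trade-off and learning from delayed reinforcement mentioned in this abstract can be illustrated with a minimal sketch. It assumes tabular Q-learning with epsilon-greedy action selection, which is one common technique and is not taken from the abstract itself; the function names `epsilon_greedy` and `q_update` and the table layout `Q[state][action]` are hypothetical.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon (explore), else the greedy one (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Ties are broken in favor of the lowest action index.
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(Q, s, a, r, s_next, alpha, gamma, terminal=False):
    """One tabular Q-learning step: move Q[s][a] toward r + gamma * max_a' Q[s_next][a']."""
    best_next = 0.0 if terminal else max(Q[s_next])
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Hypothetical two-state, two-action table (assumption for illustration only).
Q = [[0.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
# Reward is zero now, but the value of the next state propagates back (delayed reinforcement).
q_update(Q, s=0, a=1, r=0.0, s_next=1, alpha=0.5, gamma=0.9)
action = epsilon_greedy(Q[0], epsilon=0.0, rng=rng)  # epsilon = 0 means purely greedy
```

With `epsilon` set to 0 the agent only exploits its current estimates; raising it trades some immediate reward for information about untried actions.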

    Computationally Efficient Relational Reinforcement Learning

    Full text link
    Relational Reinforcement Learning (RRL) is a technique that enables Reinforcement Learning (RL) agents to generalize from their experience, allowing them to learn over large or potentially infinite state spaces, to learn context sensitive behaviors, and to learn to solve variable goals and to transfer knowledge between similar situations. Prior RRL architectures are not sufficiently computationally efficient to see use outside of small, niche roles within larger Artificial Intelligence (AI) architectures. I present a novel online, incremental RRL architecture and an implementation that is orders of magnitude faster than its predecessors. The first aspect of this architecture that I explore is a computationally efficient implementation of an adaptive Hierarchical Tile Coding (aHTC), a kind of Adaptive Tile Coding (ATC) in which more general tiles which cover larger portions of the state-action space are kept as ones that cover smaller portions of the state-action space are introduced, using k-dimensional tries (k-d tries) to implement the value function for non-relational Temporal Difference (TD) methods. In order to achieve comparable performance for RRL, I implement the Rete algorithm to replace my k-d tries due to its efficient handling of both the variable binding problem and variable numbers of actions. Tying aHTCs and Rete together, I present a rule grammar that both maps aHTCs onto Rete and allows the architecture to automatically extract relational features in order to support adaptation of the value function over time. I experiment with several refinement criteria and additional functionality with which my agents attempt to determine if rerefinement using different features might allow them to better learn a near optimal policy. I present optimal results using a value criterion for several variants of BlocksWorld. I provide transfer results for BlocksWorld and a scalable Taxicab domain. 
    I additionally introduce a Higher Order Grammar (HOG) that grants online, incremental RRL agents additional flexibility to introduce additional variables and corresponding relations as needed in order to learn effective value functions. I evaluate agents that use the HOG on a version of Blocks World and on an Adventure task. In summary, I present a new online, incremental RRL architecture, a grammar to map aHTCs onto the Rete, and an implementation that is orders of magnitude faster than its predecessors.
    PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/145859/1/bazald_1.pd
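The idea of an adaptive hierarchical tile coding, where coarse tiles are kept as finer ones are introduced, can be sketched in one dimension. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it uses a plain dictionary instead of k-d tries or Rete, refines the whole range at once rather than by a refinement criterion, and the class name `HierarchicalTiles1D` and its methods are hypothetical.

```python
class HierarchicalTiles1D:
    """Value function over [lo, hi) as a sum of weights of nested tiles.

    Level d splits the range into 2**d equal tiles. Refining adds a finer
    level while the coarser tiles (and what they have learned) are kept.
    """

    def __init__(self, lo, hi, lr=0.1):
        self.lo, self.hi, self.lr = lo, hi, lr
        self.depth = 0   # deepest level currently in use; level 0 is one tile
        self.w = {}      # (level, index) -> weight, created lazily

    def _tiles(self, x):
        frac = (x - self.lo) / (self.hi - self.lo)
        frac = min(max(frac, 0.0), 1.0 - 1e-12)  # clamp so hi maps to the last tile
        return [(d, int(frac * 2 ** d)) for d in range(self.depth + 1)]

    def value(self, x):
        return sum(self.w.get(t, 0.0) for t in self._tiles(x))

    def update(self, x, target):
        tiles = self._tiles(x)
        err = target - self.value(x)
        for t in tiles:  # share the correction equally among active tiles
            self.w[t] = self.w.get(t, 0.0) + self.lr * err / len(tiles)
        return err

    def refine(self):
        self.depth += 1  # existing coarse weights are left untouched


h = HierarchicalTiles1D(0.0, 1.0, lr=1.0)
h.update(0.25, 4.0)           # only the single coarse tile exists
coarse_guess = h.value(0.75)  # the coarse tile generalizes across the whole range
h.refine()
h.update(0.75, 0.0)           # the finer tile now lets the two halves differ
```

Because the coarse tile survives refinement, estimates for regions that have not yet been visited at the finer level still benefit from earlier experience.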

    A neural network-based trajectory planner for redundant systems using direct inverse modeling

    Redundant (i.e., under-determined) systems cannot be trained effectively using direct inverse modeling with supervised learning, for reasons well outlined by Michael Jordan at MIT. There is a loophole, however, in Jordan's preconditions, which seems to allow just such an architecture. A robot path planner implementing a cerebellar-inspired habituation paradigm with such an architecture will be introduced. The system, called ARTFORMS, for Adaptive Redundant Trajectory Formation System, uses on-line training of multiple CMACs. CMACs are locally generalizing networks, and have an a priori deterministic geometric input space mapping. These properties together with on-line learning and rapid convergence satisfy the loophole conditions. Issues of stability/plasticity, presentation order and generalization, computational complexity, and subsumptive fusion of multiple networks are discussed. Two implementations are described. The first is shown not to be goal directed enough for ultimate success. The second, which is highly successful, is made more goal directed by the addition of secondary training, which reduces the dimensionality of the problem by using a set of constraint equations. Running open loop with respect to posture (the system metric which reduces dimensionality) is seen to be the root cause of the first system's failure, not the use of the direct inverse method. In fact, several nice properties of direct inverse modeling contribute to the system's convergence speed, robustness and compliance. The central problem used to demonstrate this method is the control of trajectory formation for a planar kinematic chain with a variable number of joints. Finally, this method is extended to implement adaptive obstacle avoidance.

    Intelligent flight control systems

    The capabilities of flight control systems can be enhanced by designing them to emulate functions of natural intelligence. Intelligent control functions fall into three categories. Declarative actions involve decision-making, providing models for system monitoring, goal planning, and system/scenario identification. Procedural actions concern skilled behavior and have parallels in guidance, navigation, and adaptation. Reflexive actions are spontaneous, inner-loop responses for control and estimation. Intelligent flight control systems learn knowledge of the aircraft and its mission and adapt to changes in the flight environment. Cognitive models form an efficient basis for integrating 'outer-loop/inner-loop' control functions and for developing robust parallel-processing algorithms.

    Error minimising gradients for improving cerebellar model articulation controller performance

    In motion control applications where the desired trajectory velocity exceeds an actuator’s maximum velocity limitations, large position errors will occur between the desired and actual trajectory responses. In these situations standard control approaches cannot predict the output saturation of the actuator and thus the associated error summation cannot be minimised. An adaptive feedforward control solution such as the Cerebellar Model Articulation Controller (CMAC) is able to provide an inherent level of prediction for these situations, moving the system output in the direction of the excessive desired velocity before actuator saturation occurs. However, the pre-empting level of a CMAC is not adaptive, and thus the optimal point in time to start moving the system output in the direction of the excessive desired velocity remains unsolved. While the CMAC can adaptively minimise an actuator’s position error, the minimisation of the summation of error over time created by the divergence of the desired and actual trajectory responses requires an additional adaptive level of control. This thesis presents an improved method of training CMACs to minimise the summation of error over time created when the desired trajectory velocity exceeds the actuator’s maximum velocity limitations. This improved method, called the Error Minimising Gradient Controller (EMGC), is able to adaptively modify a CMAC’s training signal so that the CMAC will start to move the output of the system in the direction of the excessive desired velocity with an optimised pre-empting level. The EMGC was originally created to minimise the loss of linguistic information conveyed through an actuated series of concatenated hand sign gestures reproducing deafblind sign language.
    The EMGC concept, however, can be implemented on any system where the error summation associated with excessive desired velocities needs to be minimised, with the EMGC producing an improved output approximation over using a CMAC alone. In this thesis, the EMGC was tested and benchmarked against a combined feedforward/feedback controller using a CMAC and a PID controller. The EMGC was tested on an air-muscle actuator for a variety of situations comprising a position discontinuity in a continuous desired trajectory. Tested situations included various discontinuity magnitudes together with varying approach and departure gradient profiles. Testing demonstrated that the addition of an EMGC can reduce a situation’s error summation magnitude if the base CMAC controller has not already provided a sufficiently early pre-empting output in the direction of the situation. The addition of an EMGC to a CMAC produces an improved approximation of reproduced motion trajectories, not only minimising position error for a single sampling instance, but also over time for periodic signals.

    A wavelet-based CMAC for enhanced multidimensional learning

    The CMAC (Cerebellar Model Articulation Controller) neural network has been successfully used in control systems and other applications for many years. The network structure is modular and associative, allowing for rapid learning convergence with an ease of implementation in either hardware or software. The rate of convergence of the network is determined largely by the choice of the receptive field shape and the generalization parameter. This research contains a rigorous analysis of the rate of convergence with the standard CMAC, as well as the rate of convergence of networks using other receptive field shapes. The effects of decimation from state-space to weight space are examined in detail. This analysis shows CMAC to be an adaptive lowpass filter, where the filter dynamics are governed by the generalization parameter. A more general CMAC is derived using wavelet-based receptive fields and a controllable decimation scheme, that is capable of convergence at any frequency within the Nyquist limits. The flexible decimation structure facilitates the optimization of computation for complex multidimensional problems. The stability of the wavelet-based CMAC is also examined.
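To make the role of the generalization parameter concrete, here is a minimal sketch of a standard one-dimensional CMAC with rectangular receptive fields. It is not the wavelet-based network from this abstract; the class name `CMAC1D`, its parameters, and the equal-share LMS update are assumptions chosen for illustration.

```python
class CMAC1D:
    """Minimal 1-D CMAC with rectangular receptive fields (an illustrative sketch)."""

    def __init__(self, lo, hi, n_cells, g, lr=0.5):
        self.lo, self.hi = lo, hi
        self.n_cells = n_cells  # quantization cells across the input range
        self.g = g              # generalization parameter: weights active per input
        self.lr = lr
        # An input in cell q activates weights q .. q+g-1, so n_cells + g - 1 are needed.
        self.w = [0.0] * (n_cells + g - 1)

    def _active(self, x):
        frac = (x - self.lo) / (self.hi - self.lo)
        q = min(self.n_cells - 1, max(0, int(frac * self.n_cells)))
        return range(q, q + self.g)

    def predict(self, x):
        return sum(self.w[i] for i in self._active(x))

    def train(self, x, target):
        """LMS step: spread the output error equally over the active weights."""
        err = target - self.predict(x)
        for i in self._active(x):
            self.w[i] += self.lr * err / self.g
        return err


net = CMAC1D(lo=0.0, hi=32.0, n_cells=32, g=4, lr=1.0)
net.train(10.5, 2.0)
at_sample = net.predict(10.5)  # the trained point itself
neighbor = net.predict(11.5)   # one cell away: shares 3 of the 4 active weights
far_away = net.predict(14.5)   # g cells away: shares no weights
```

A single training sample changes the output only within `g` cells of the sample, with the influence falling off linearly. That local smoothing is one way to see why a larger `g` behaves like a lowpass filter with a lower cutoff.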