192 research outputs found

Hierarchically Clustered Adaptive Quantization CMAC and Its Learning Convergence

Author: Lai Edmund M-K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2007

No abstract available

Massey Research Online

A brief review of neural networks based learning and control and their applications for robots

Author: Jiang Yiming
Li Guang
Li Yanan
Na Jing
Yang Chenguang
Zhong Junpei
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

As an imitation of biological nervous systems, neural networks (NNs), which are characterized by powerful learning ability, have been employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article aims to give a brief review of state-of-the-art NNs for complex nonlinear systems. Recent progress of NNs in both theoretical developments and practical applications is investigated and surveyed. Specifically, NN-based robot learning and control applications are further reviewed, including NN-based robot manipulator control, NN-based human-robot interaction, and NN-based behavior recognition and generation.

Crossref

Directory of Open Access Journals

The University of Manchester - Institutional Repository

Queen Mary Research Online

Sussex Research Online

Reinforcement Learning: A Survey

Author: Kaelbling L. P.
Littman M. L.
Moore A. W.
Publication date: 01/01/1996

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file.

arXiv.org e-Print Archive

CiteSeerX

Computationally Efficient Relational Reinforcement Learning

Author: Bloch Mitchell
Publication date: 01/01/2018

Relational Reinforcement Learning (RRL) is a technique that enables Reinforcement Learning (RL) agents to generalize from their experience, allowing them to learn over large or potentially infinite state spaces, to learn context sensitive behaviors, and to learn to solve variable goals and to transfer knowledge between similar situations. Prior RRL architectures are not sufficiently computationally efficient to see use outside of small, niche roles within larger Artificial Intelligence (AI) architectures. I present a novel online, incremental RRL architecture and an implementation that is orders of magnitude faster than its predecessors. The first aspect of this architecture that I explore is a computationally efficient implementation of an adaptive Hierarchical Tile Coding (aHTC), a kind of Adaptive Tile Coding (ATC) in which more general tiles which cover larger portions of the state-action space are kept as ones that cover smaller portions of the state-action space are introduced, using k-dimensional tries (k-d tries) to implement the value function for non-relational Temporal Difference (TD) methods. In order to achieve comparable performance for RRL, I implement the Rete algorithm to replace my k-d tries due to its efficient handling of both the variable binding problem and variable numbers of actions. Tying aHTCs and Rete together, I present a rule grammar that both maps aHTCs onto Rete and allows the architecture to automatically extract relational features in order to support adaptation of the value function over time. I experiment with several refinement criteria and additional functionality with which my agents attempt to determine if rerefinement using different features might allow them to better learn a near optimal policy. I present optimal results using a value criterion for several variants of BlocksWorld. I provide transfer results for BlocksWorld and a scalable Taxicab domain. 
I additionally introduce a Higher Order Grammar (HOG) that grants online, incremental RRL agents additional flexibility to introduce additional variables and corresponding relations as needed in order to learn effective value functions. I evaluate agents that use the HOG on a version of Blocks World and on an Adventure task. In summary, I present a new online, incremental RRL architecture, a grammar to map aHTCs onto the Rete, and an implementation that is orders of magnitude faster than its predecessors. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145859/1/bazald_1.pd

Deep Blue Documents at the University of Michigan

Locomotion training of legged robots using hybrid machine learning techniques

Author: Doerschuk Peggy I.
Li Andrew L.
Simon William E.
Zhang Wen-Ran

In this study artificial neural networks and fuzzy logic are used to control the jumping behavior of a three-link uniped robot. The biped locomotion control problem is an increment of the uniped locomotion control. Study of legged locomotion dynamics indicates that a hierarchical controller is required to control the behavior of a legged robot. A structured control strategy is suggested which includes navigator, motion planner, biped coordinator and uniped controllers. A three-link uniped robot simulation is developed to be used as the plant. Neurocontrollers were trained both online and offline. In the case of on-line training, a reinforcement learning technique was used to train the neurocontroller to make the robot jump to a specified height. After several hundred iterations of training, the plant output achieved an accuracy of 7.4%. However, when jump distance and body angular momentum were also included in the control objectives, training time became impractically long. In the case of off-line training, a three-layered backpropagation (BP) network was first used with three inputs, three outputs and 15 to 40 hidden nodes. Pre-generated data were presented to the network with a learning rate as low as 0.003 in order to reach convergence. The low learning rate required for convergence resulted in a very slow training process which took weeks to learn 460 examples. After training, performance of the neurocontroller was rather poor. Consequently, the BP network was replaced by a Cerebellar Model Articulation Controller (CMAC) network. Subsequent experiments described in this document show that the CMAC network is more suitable to the solution of uniped locomotion control problems in terms of both learning efficiency and performance. A new approach is introduced in this report: a self-organizing multiagent cerebellar model for fuzzy-neural control of uniped locomotion is suggested to improve training efficiency.
This is currently being evaluated for a possible patent by NASA, Johnson Space Center. An alternative modular approach is also developed which uses separate controllers for each stage of the running stride. A self-organizing fuzzy-neural controller controls the height, distance and angular momentum of the stride. A CMAC-based controller controls the movement of the leg from the time the foot leaves the ground to the time of landing. Because the leg joints are controlled at each time step during flight, movement is smooth and obstacles can be avoided. Initial results indicate that this approach can yield fast, accurate results.

NASA Technical Reports Server

A neural network-based trajectory planner for redundant systems using direct inverse modeling

Author: Rudolph Franklin J
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/01/1992

Redundant (i.e., under-determined) systems cannot be trained effectively using direct inverse modeling with supervised learning, for reasons well outlined by Michael Jordan at MIT. There is a loophole, however, in Jordan's preconditions, which seems to allow just such an architecture. A robot path planner implementing a cerebellar-inspired habituation paradigm with such an architecture will be introduced. The system, called ARTFORMS, for Adaptive Redundant Trajectory Formation System, uses on-line training of multiple CMACs. CMACs are locally generalizing networks, and have an a priori deterministic geometric input space mapping. These properties together with on-line learning and rapid convergence satisfy the loophole conditions. Issues of stability/plasticity, presentation order and generalization, computational complexity, and subsumptive fusion of multiple networks are discussed. Two implementations are described. The first is shown not to be goal directed enough for ultimate success. The second, which is highly successful, is made more goal directed by the addition of secondary training, which reduces the dimensionality of the problem by using a set of constraint equations. Running open loop with respect to posture (the system metric which reduces dimensionality) is seen to be the root cause of the first system's failure, not the use of the direct inverse method. In fact, several nice properties of direct inverse modeling contribute to the system's convergence speed, robustness and compliance. The central problem used to demonstrate this method is the control of trajectory formation for a planar kinematic chain with a variable number of joints. Finally, this method is extended to implement adaptive obstacle avoidance.
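The two CMAC properties named in this abstract, local generalization and a fixed geometric input mapping, can be illustrated with a minimal sketch. It is not the ARTFORMS implementation: the class name `TinyCMAC`, the one-dimensional input range [0, 1), the number of tilings, and the learning rate are all assumptions chosen for illustration.

```python
class TinyCMAC:
    """Hypothetical 1-D CMAC: several offset tilings over [0, 1), one weight per tile."""

    def __init__(self, n_tilings=8, tiles_per_tiling=10, lr=0.5):
        self.n_tilings = n_tilings
        self.tiles = tiles_per_tiling
        self.lr = lr
        # One extra tile per tiling absorbs the offset shift at the upper edge.
        self.w = [[0.0] * (tiles_per_tiling + 1) for _ in range(n_tilings)]

    def active_tiles(self, x):
        """Fixed geometric mapping: input -> one active tile index per tiling."""
        width = 1.0 / self.tiles
        indices = []
        for t in range(self.n_tilings):
            offset = t * width / self.n_tilings  # each tiling is shifted slightly
            indices.append(int((x + offset) / width))
        return indices

    def predict(self, x):
        """Output is the sum of the weights of the active tiles."""
        return sum(self.w[t][i] for t, i in enumerate(self.active_tiles(x)))

    def train(self, x, target):
        """LMS update spread evenly over the active tiles only."""
        err = target - self.predict(x)
        step = self.lr * err / self.n_tilings
        for t, i in enumerate(self.active_tiles(x)):
            self.w[t][i] += step


cmac = TinyCMAC()
for _ in range(50):
    cmac.train(0.5, 1.0)

near = cmac.predict(0.52)  # shares most tiles with 0.5, so it moves too
far = cmac.predict(0.9)    # shares no tiles with 0.5, so it stays at zero
```

Because only the tiles covering the training input are updated, nearby inputs inherit part of the learned value while distant inputs are unaffected. The tile mapping itself is fixed before any training takes place.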

UNH Scholars' Repository

Learning for Coordination of Vision and Action

Author: Bajcsy Ruzena
Mitchell Tom
Salganicoff Marcos
Publication venue: ScholarlyCommons
Publication date: 23/11/1992

We define the problem of visuomotor coordination and identify bottleneck problems in the implementation of general purpose vision and action systems. We conjecture that machine learning methods provide a general purpose mechanism for combining specific visual and action modules in a task-independent way. We also maintain that successful learning systems reflect realities of the environment, exploit context information, and identify limitations in perceptual algorithms which cannot be captured by the designer. We then propose a multi-step find-and-fetch mobile robot search and retrieval task. This task illustrates where current learning approaches provide solutions and where future research opportunities exist.

ScholarlyCommons@Penn

Error minimising gradients for improving cerebellar model articulation controller performance

Author: Scarfe Peter Craig
Publication venue: Curtin University
Publication date: 01/01/2009

In motion control applications where the desired trajectory velocity exceeds an actuator’s maximum velocity limitations, large position errors will occur between the desired and actual trajectory responses. In these situations standard control approaches cannot predict the output saturation of the actuator and thus the associated error summation cannot be minimised. An adaptive feedforward control solution such as the Cerebellar Model Articulation Controller (CMAC) is able to provide an inherent level of prediction for these situations, moving the system output in the direction of the excessive desired velocity before actuator saturation occurs. However, the pre-empting level of a CMAC is not adaptive, and thus the optimal point in time to start moving the system output in the direction of the excessive desired velocity remains unsolved. While the CMAC can adaptively minimise an actuator’s position error, the minimisation of the summation of error over time created by the divergence of the desired and actual trajectory responses requires an additional adaptive level of control. This thesis presents an improved method of training CMACs to minimise the summation of error over time created when the desired trajectory velocity exceeds the actuator’s maximum velocity limitations. This improved method, called the Error Minimising Gradient Controller (EMGC), is able to adaptively modify a CMAC’s training signal so that the CMAC will start to move the output of the system in the direction of the excessive desired velocity with an optimised pre-empting level. The EMGC was originally created to minimise the loss of linguistic information conveyed through an actuated series of concatenated hand sign gestures reproducing deafblind sign language.
The EMGC concept, however, can be implemented on any system where the error summation associated with excessive desired velocities needs to be minimised, with the EMGC producing an improved output approximation over using a CMAC alone. In this thesis, the EMGC was tested and benchmarked against a feedforward / feedback combined controller using a CMAC and PID controller. The EMGC was tested on an air-muscle actuator for a variety of situations comprising a position discontinuity in a continuous desired trajectory. Tested situations included various discontinuity magnitudes together with varying approach and departure gradient profiles. Testing demonstrated that the addition of an EMGC can reduce a situation’s error summation magnitude if the base CMAC controller has not already provided a prior enough pre-empting output in the direction of the situation. The addition of an EMGC to a CMAC produces an improved approximation of reproduced motion trajectories, not only minimising position error for a single sampling instance, but also over time for periodic signals.

espace@Curtin