34,183 research outputs found

    Evolutionary Algorithms for Reinforcement Learning

    There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.

    Reactive with tags classifier system applied to real robot navigation

    7th IEEE International Conference on Emerging Technologies and Factory Automation. Barcelona, 18-21 October 1999. A reactive with tags classifier system (RTCS) is a special classifier system that combines the execution capabilities of symbolic systems with the learning capabilities of genetic algorithms. An RTCS is able to learn symbolic rules that generate sequences of actions by chaining rules across different time instants, and that react to new environmental situations by considering the last environmental situation when taking a decision. The capacity of the RTCS to learn good rules has been demonstrated on a robot navigation problem. Results show the suitability of this approach to the navigation problem and the coherence of the extracted rules.

    An enhanced classifier system for autonomous robot navigation in dynamic environments

    In many cases, a real robot application requires navigation in dynamic environments. The navigation problem involves two main tasks: avoiding obstacles and reaching a goal. Generally, this problem can be addressed by considering reactions and sequences of actions. Solving the navigation problem requires a complete controller that includes both actions and reactions. Machine learning techniques have been applied to learn these controllers. Classifier Systems (CS) have proven capable of continuous learning in these domains. However, CS have some problems in reactive systems. In this paper, a modified CS is proposed to overcome these problems. Two special mechanisms are included in the developed CS to allow the learning of both reactions and sequences of actions. The learning process has been divided into two main tasks: first, discriminating between the rules of a predefined set and, second, discovering new rules to obtain successful operation in dynamic environments. Different experiments have been carried out using a Khepera mini-robot to find a generalised solution. The results show the system's capacity for continuous learning and adaptation to new situations.

    Distributed ARTMAP

    Distributed coding at the hidden layer of a multi-layer perceptron (MLP) endows the network with memory compression and noise tolerance capabilities. However, an MLP typically requires slow off-line learning to avoid catastrophic forgetting in an open input environment. An adaptive resonance theory (ART) model is designed to guarantee stable memories even with fast on-line learning. However, ART stability typically requires winner-take-all coding, which may cause category proliferation in a noisy input environment. Distributed ARTMAP (dARTMAP) seeks to combine the computational advantages of MLP and ART systems in a real-time neural network for supervised learning. This system incorporates elements of the unsupervised dART model as well as new features, including a content-addressable memory (CAM) rule. Simulations show that dARTMAP retains fuzzy ARTMAP accuracy while significantly improving memory compression. The model's computational learning rules correspond to paradoxical cortical data. Office of Naval Research (N00014-95-1-0409, N00014-95-1-0657)

    Credit Assignment in Adaptive Evolutionary Algorithms

    In this paper, a new method for assigning credit to search operators is presented. Starting with the principle of optimizing search bias, search operators are selected based on an ability to create solutions that are historically linked to future generations. Using a novel framework for defining performance measurements, distributing credit for performance, and the statistical interpretation of this credit, a new adaptive method is developed and shown to outperform a variety of adaptive and non-adaptive competitors.
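
To make the idea of crediting search operators concrete, here is a minimal sketch of generic probability matching in a simple hill climber. It is not the method of the paper above: the credit here is just the immediate fitness gain, not a measure of historical linkage to future generations, and the names `update_probabilities`, `adaptive_search`, `p_min`, and `decay` are hypothetical choices made for illustration.

```python
import random


def update_probabilities(credits, p_min=0.05):
    """Probability matching: selection probability grows with credit,
    but never falls below the floor p_min (assumes len(credits) * p_min <= 1)."""
    k = len(credits)
    total = sum(credits)
    if total <= 0:
        # No operator has earned credit: fall back to a uniform choice.
        return [1.0 / k] * k
    return [p_min + (1 - k * p_min) * c / total for c in credits]


def adaptive_search(operators, fitness, x0, steps, decay=0.9, seed=0):
    """Maximise `fitness` by hill climbing, adapting operator usage on the fly.
    Each operator is a callable (x, rng) -> candidate solution."""
    rng = random.Random(seed)
    credits = [1.0] * len(operators)  # optimistic initial credit (assumption)
    x, fx = x0, fitness(x0)
    for _ in range(steps):
        probs = update_probabilities(credits)
        i = rng.choices(range(len(operators)), weights=probs)[0]
        y = operators[i](x, rng)
        fy = fitness(y)
        # Credit = immediate improvement, smoothed with an exponential average.
        gain = max(0.0, fy - fx)
        credits[i] = decay * credits[i] + (1 - decay) * gain
        if fy >= fx:  # accept only candidates that are at least as good
            x, fx = y, fy
    return x, fx, credits


if __name__ == "__main__":
    ops = [
        lambda x, rng: x + rng.gauss(0.0, 0.5),  # useful mutation
        lambda x, rng: x,                        # useless no-op
    ]
    best, best_f, credits = adaptive_search(ops, lambda x: -(x - 3.0) ** 2, 0.0, 300)
    print(best, best_f, credits)
```

The floor `p_min` keeps every operator selectable, so an operator that is unhelpful early in the search can still regain credit later.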

    A reactive approach to classifier systems

    IEEE International Conference on Systems, Man, and Cybernetics. San Diego, CA, 11-14 Oct. 1998. The navigation problem is how to reach a goal while avoiding obstacles in dynamic environments. This problem can be addressed by considering reactions and/or sequences of actions. Classifier Systems (CS) have proven capable of continuous learning; however, they have some problems in reactive systems. A modified CS is proposed to overcome these problems. Two special mechanisms are included in the developed CS to allow the learning of both reactions and sequences of actions. This learning process involves two main tasks: first, discriminating between rules and, second, discovering new rules to obtain successful operation in dynamic environments. Different experiments have been carried out using a Khepera mini-robot to find a generalized solution. The results show the system's capacity for continuous learning and adaptation to new situations.

    Applying classifier systems to learn the reactions in mobile robots

    The navigation problem is how to reach a goal while avoiding obstacles in dynamic environments. This problem can be addressed by considering reactions and sequences of actions. Classifier systems (CSs) have proven capable of continuous learning; however, they have some problems in reactive systems. A modified CS, namely a reactive classifier system (RCS), is proposed to overcome those problems. Two special mechanisms are included in the RCS: the absence of internal cycles inside the CS (no internal cycles) and the fusion of the environmental message with the messages posted to the message list in the previous instant (generation list through fusion). These mechanisms allow the learning of both reactions and sequences of actions. This learning process involves two main tasks: first, discriminating between rules and, second, discovering new rules to obtain successful operation in dynamic environments. Different experiments have been carried out using a Khepera mini-robot to find a generalized solution. The results show the system's capacity for continuous learning and adaptation to new situations.
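
The two mechanisms named in this abstract can be illustrated with a small sketch. It assumes a conventional classifier-system encoding that the abstract does not specify: fixed-length binary messages and ternary conditions over `0`, `1`, and the wildcard `#`. The `Rule` class and the `matches` and `step` functions are hypothetical names, and rule strengths, bidding, and the genetic algorithm are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    condition: str  # string over {'0', '1', '#'}; '#' matches either bit
    message: str    # message posted when the rule fires


def matches(condition, message):
    """True if every non-wildcard position of the condition equals the message bit."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message)
    )


def step(rules, env_message, previous_posted):
    """One time instant of the sketch.

    Fusion: the message list is the current environmental message plus the
    messages posted in the previous instant.
    No internal cycles: rules are matched against that list exactly once, so
    messages posted now can only trigger rules in the next instant.
    """
    fused = [env_message] + list(previous_posted)
    posted = []
    for rule in rules:
        if any(matches(rule.condition, m) for m in fused):
            posted.append(rule.message)
    return posted


if __name__ == "__main__":
    rules = [Rule("1#0", "011"), Rule("011", "111")]
    posted_t0 = step(rules, "100", [])         # only the first rule fires
    posted_t1 = step(rules, "000", posted_t0)  # second rule fires on the carried-over message
    print(posted_t0, posted_t1)
```

In the usage example the second rule never sees the environment directly. It fires one instant later on the message carried over by fusion, which is how a sequence of actions can be chained without cycling inside a single instant.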

    Learning sequences of rules using classifier systems with tags

    IEEE International Conference on Systems, Man, and Cybernetics. Tokyo, 12-15 October 1999. The objective of this paper was to obtain an encoding structure that would allow the genetic evolution of rules in such a manner that the number of rules and the relationships among them in a classifier system (CS) would be learnt during the evolution process. For this purpose, an area that allows the definition of rule groups has been added to the condition and message parts of the encoded rules. This area is called the internal tag. The term was coined because the system has some similarities with natural processes that take place in certain animal species, where the existence of tags allows them to communicate and recognize each other. Such a CS is called a tag classifier system (TCS). The TCS has been tested in the game of draughts and compared with the classical CS. The results show an improvement in CS performance.

    Sustainable Cooperative Coevolution with a Multi-Armed Bandit

    This paper proposes a self-adaptation mechanism to manage the resources allocated to the different species comprising a cooperative coevolutionary algorithm. The proposed approach relies on a dynamic extension to the well-known multi-armed bandit framework. At each iteration, the dynamic multi-armed bandit decides which species to evolve for a generation, using the history of progress made by the different species to guide the decision. We show experimentally, on a benchmark and a real-world problem, that evolving the different populations at different paces not only identifies solutions more rapidly, but also improves the capacity of cooperative coevolution to solve more complex problems. Comment: Accepted at GECCO 201
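
The bandit-driven choice of which species to evolve can be sketched as follows. This uses the plain UCB1 rule, not the dynamic extension the abstract refers to, and treats the "progress" of a species as a numeric reward supplied by a callback. `ucb1_select`, `run_bandit`, and `progress_fns` are hypothetical names introduced for illustration only.

```python
import math
import random


def ucb1_select(counts, mean_rewards, c=math.sqrt(2)):
    """Return the index of the arm (species) with the highest UCB1 score."""
    # Try every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        m + c * math.sqrt(math.log(total) / n)
        for m, n in zip(mean_rewards, counts)
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(progress_fns, iterations, seed=0):
    """Each iteration, pick one species, 'evolve' it for a generation, and
    record the progress it reports as the reward of that arm.
    Each entry of progress_fns is a callable rng -> reward (assumed in [0, 1])."""
    rng = random.Random(seed)
    k = len(progress_fns)
    counts = [0] * k
    means = [0.0] * k
    for _ in range(iterations):
        arm = ucb1_select(counts, means)
        reward = progress_fns[arm](rng)  # stand-in for one generation of evolution
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


if __name__ == "__main__":
    # Two hypothetical species: the second makes more progress per generation.
    counts, means = run_bandit([lambda rng: 0.1, lambda rng: 0.9], 200)
    print(counts, means)
```

With stationary rewards, as in this toy run, the species that reports more progress receives most of the generations. A real coevolutionary run is non-stationary, since a species' progress changes as the others evolve, which is the situation a dynamic bandit is meant to handle.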