
    Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system

    A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to neural networks. This paper presents results from an investigation into using discrete and fuzzy dynamical system representations within the XCSF learning classifier system. In particular, asynchronous random Boolean networks are used to represent the traditional condition-action production system rules in the discrete case, and asynchronous fuzzy logic networks in the continuous-valued case. It is shown that self-adaptive, open-ended evolution can be used to design an ensemble of such dynamical systems within XCSF to solve a number of well-known test problems.
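    The abstract gives no implementation detail, but a minimal, self-contained sketch of an asynchronous random Boolean network may help make the representation concrete; the network size, connectivity K, and per-pass random update order below are illustrative assumptions, not the paper's design.

```python
import random

class AsyncRandomBooleanNetwork:
    """Minimal asynchronous random Boolean network (RBN) sketch.

    Each node reads K randomly chosen nodes through a random Boolean
    truth table; nodes are updated one at a time in a random order
    (asynchronous updating), rather than all at once (synchronous).
    """

    def __init__(self, n_nodes=8, k=2, seed=None):
        self.rng = random.Random(seed)
        self.n = n_nodes
        # K distinct input nodes per node.
        self.inputs = [self.rng.sample(range(n_nodes), k) for _ in range(n_nodes)]
        # Random truth table: one output bit per input combination.
        self.tables = [[self.rng.randint(0, 1) for _ in range(2 ** k)]
                       for _ in range(n_nodes)]
        self.state = [self.rng.randint(0, 1) for _ in range(n_nodes)]

    def step(self):
        """One asynchronous pass: update each node once, in random order."""
        for i in self.rng.sample(range(self.n), self.n):
            idx = 0
            for bit_pos, j in enumerate(self.inputs[i]):
                idx |= self.state[j] << bit_pos
            self.state[i] = self.tables[i][idx]

net = AsyncRandomBooleanNetwork(seed=42)
net.step()
print(net.state)  # Boolean state vector after one asynchronous pass
```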

    Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning

    Learning classifier systems (LCSs) are population-based predictive systems that were originally envisioned as agents acting in reinforcement learning (RL) environments. These systems can suffer from population bloat and so are amenable to compaction techniques that try to strike a balance between population size and performance. A well-studied LCS architecture is XCSF, which in the RL setting acts as a Q-function approximator. We apply XCSF to a deterministic and a stochastic variant of the FrozenLake8x8 environment from OpenAI Gym, comparing its performance, in terms of function approximation error and policy accuracy, to the optimal Q-functions and policies produced by solving the environments via dynamic programming. We then introduce a novel compaction algorithm (Greedy Niche Mass Compaction, GNMC) and study its operation on XCSF's trained populations. Results show that, given a suitable parametrisation, GNMC preserves or even slightly improves function approximation error while yielding a significant reduction in population size. Reasonable preservation of policy accuracy also occurs, and we link this metric to the commonly used steps-to-goal metric in maze-like environments, illustrating how the metrics are complementary rather than competitive.
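    The abstract names GNMC but does not describe its mechanics. Purely as a loose illustration of the general idea of greedy, mass-based population compaction, here is a Python sketch; the mass definition (numerosity times fitness) and the retained-mass threshold are assumptions, not the paper's algorithm.

```python
def greedy_mass_compaction(population, mass_fraction=0.95):
    """Hypothetical greedy compaction sketch (not the paper's GNMC).

    Greedily keeps the highest-'mass' classifiers until a fixed
    fraction of the population's total mass is retained. Here
    mass = numerosity * fitness, which is an assumed proxy.
    """
    mass = lambda cl: cl["numerosity"] * cl["fitness"]
    total = sum(mass(cl) for cl in population)
    kept, acc = [], 0.0
    for cl in sorted(population, key=mass, reverse=True):
        kept.append(cl)
        acc += mass(cl)
        if acc >= mass_fraction * total:
            break
    return kept

pop = [{"numerosity": 4, "fitness": 0.9},
       {"numerosity": 2, "fitness": 0.7},
       {"numerosity": 1, "fitness": 0.2}]
print(greedy_mass_compaction(pop, mass_fraction=0.9))
```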

    Comparison of Adaptive Behaviors of an Animat in Different Markovian 2-D Environments Using XCS Classifier Systems

    The word "Animat" was introduced by Stewart W. Wilson in 1985 and became popular through the SAB conference series ("Simulation of Adaptive Behavior: From Animals to Animats") held between 1991 and 2010. Since the meaning of the term has evolved considerably over the years, it is important to specify that this thesis adopts the definition originally proposed by Wilson. Animat research is a subfield of evolutionary computation, machine learning, adaptive behavior, and artificial life. Its ultimate goal is to build artificial animals with limited sensory-motor capabilities that are nevertheless able to behave adaptively in order to survive in an unknown environment. Different scenarios of interaction between an animat and a given environment have been studied and reported in the literature. One such scenario is to treat the animat problem as a reinforcement learning problem (such as a Markov decision process) and solve it with Learning Classifier Systems (LCSs) that possess some generalization ability. A Learning Classifier System is a learning system that can learn simple strings of rules by interacting with the environment and receiving diverse payoffs (rewards). XCS (the eXtended Classifier System) [1], introduced by Wilson in 1995, is currently the most popular Learning Classifier System. It uses Q-learning to deal with credit assignment, and it separates the fitness variables used by the genetic algorithm from those linked to the credit assignment mechanism. In our research, we studied the performance of XCS, and of several of its variants, in managing an animat exploring different types of 2D environments in search of food. The 2D environments traditionally named WOODS1, WOODS2, and MAZE5 were studied, as well as several S2DM (Square 2D Maze) environments that we designed for this study. The XCS variants are XCSS (with the Specify operator, which allows removing detrimental rules) and XCSG (which applies gradient descent to the prediction values).
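    For readers unfamiliar with how XCS borrows from Q-learning, a minimal sketch of the credit assignment step follows. The discounted-max payoff target reflects Wilson's XCS formulation; the parameter values and the dictionary-based classifier representation are illustrative only.

```python
GAMMA = 0.71  # discount factor; 0.71 is the value Wilson used for multi-step problems
BETA = 0.2    # Widrow-Hoff learning rate

def update_action_set(action_set, reward, next_prediction_array):
    """Q-learning-style credit assignment for the previous action set.

    The payoff target is the immediate reward plus the discounted
    maximum of the next state's prediction array; each classifier's
    prediction moves toward that target by the Widrow-Hoff rule.
    """
    target = reward + GAMMA * max(next_prediction_array)
    for cl in action_set:
        cl["prediction"] += BETA * (target - cl["prediction"])

# Example: two classifiers advocating the same action are nudged toward
# the target 0.0 + 0.71 * 600.0 = 426.0.
action_set = [{"prediction": 500.0}, {"prediction": 450.0}]
update_action_set(action_set, reward=0.0, next_prediction_array=[600.0, 200.0])
print(action_set)  # predictions move to 485.2 and 445.2
```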

    A brief history of learning classifier systems: from CS-1 to XCS and its variants

    The direction set by Wilson's XCS is that modern Learning Classifier Systems can be characterized by their use of rule accuracy as the utility metric for the search algorithm(s) discovering useful rules. Such searching typically takes place within the restricted space of co-active rules for efficiency. This paper gives an overview of the evolution of Learning Classifier Systems up to XCS, and then of some of the subsequent adaptations of Wilson's algorithm to different types of learning.

    XCS Classifier System with Experience Replay

    XCS constitutes the most deeply investigated classifier system today. It bears strong potential and comes with inherent capabilities for mastering a variety of different learning tasks. Besides outstanding successes in various classification and regression tasks, XCS has also proved very effective in certain multi-step environments from the domain of reinforcement learning. Especially in the latter domain, recent advances have been driven mainly by algorithms that model their policies with deep neural networks, among which the Deep Q-Network (DQN) is a prominent representative. Experience Replay (ER) constitutes one of the crucial factors behind the DQN's success, since it facilitates stabilized training of the neural-network-based Q-function approximators. Surprisingly, XCS barely takes advantage of similar mechanisms that leverage the raw experiences stored so far. To bridge this gap, this paper investigates the benefits of extending XCS with ER. On the one hand, we demonstrate that for single-step tasks ER bears massive potential for improvements in terms of sample efficiency. On the downside, however, we reveal that the use of ER might further aggravate well-studied yet unsolved issues of XCS when applied to sequential decision problems demanding long action chains.
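    The abstract does not show how ER is wired into XCS; the following is a minimal, generic replay-buffer sketch in Python that conveys the mechanism DQN popularized (capacity and batch size are arbitrary illustrative choices).

```python
import random
from collections import deque

class ReplayBuffer:
    """Generic experience replay buffer (illustrative, not the paper's code).

    Stores raw (state, action, reward, next_state, done) transitions and
    replays uniformly sampled mini-batches, decoupling learning updates
    from the order in which experiences were encountered.
    """

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted first

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# In an XCS-with-ER loop, each environment transition would be stored,
# and extra reinforcement/update cycles would be run on sampled batches.
```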