12 research outputs found

    Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system

    A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to neural networks. This paper presents results from an investigation into using discrete and fuzzy dynamical system representations within the XCSF learning classifier system. In particular, asynchronous random Boolean networks are used to represent the traditional condition-action production system rules in the discrete case, and asynchronous fuzzy logic networks in the continuous-valued case. It is shown that self-adaptive, open-ended evolution can be used to design an ensemble of such dynamical systems within XCSF to solve a number of well-known test problems.
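    To make the representation concrete, the following is a minimal sketch of an asynchronous random Boolean network acting as a rule: input nodes are clamped to the sensory bits, one randomly chosen node is updated at a time, and designated output nodes are read back as the rule's match/action signal. The node counts, connectivity, and input/output conventions are illustrative assumptions, not the paper's actual encoding.

        import random

        class AsyncRBN:
            """Asynchronous random Boolean network used as a classifier rule (illustrative sketch)."""

            def __init__(self, n_nodes=8, k=2, seed=0):
                rng = random.Random(seed)
                self.rng = rng
                self.n = n_nodes
                # each node reads k randomly chosen nodes
                self.inputs = [[rng.randrange(n_nodes) for _ in range(k)] for _ in range(n_nodes)]
                # each node applies a random Boolean function stored as a truth table of size 2^k
                self.tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n_nodes)]
                self.state = [rng.randint(0, 1) for _ in range(n_nodes)]

            def step(self):
                # asynchronous update: a single randomly chosen node is updated per step
                i = self.rng.randrange(self.n)
                idx = 0
                for src in self.inputs[i]:
                    idx = (idx << 1) | self.state[src]
                self.state[i] = self.tables[i][idx]

            def evaluate(self, sensor_bits, settle_steps=50):
                # clamp the first nodes to the sensory input, let the network relax,
                # then read the last two nodes as (match, action) -- an assumed convention
                for _ in range(settle_steps):
                    for j, bit in enumerate(sensor_bits):
                        self.state[j] = bit
                    self.step()
                return self.state[-2], self.state[-1]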

    XCS Performance and Population Structure in Multi-Step Environments

    SIGLE - Available from British Library Document Supply Centre (DSC:DXN039134) / BLDSC - British Library Document Supply Centre, United Kingdom

    An overview of LCS research from 2021 to 2022


    Comparison of Adaptive Behaviors of an Animat in Different Markovian 2-D Environments Using XCS Classifier Systems

    The word "Animat" was introduced by Stewart W. Wilson in 1985 and became popular through the SAB conference series "Simulation of Adaptive Behavior: From Animals to Animats", held between 1991 and 2010. Since the meaning of the term has evolved considerably over the years, it is important to specify that this thesis adopts the definition originally proposed by Wilson. Animat research is a subfield of evolutionary computation, machine learning, adaptive behavior and artificial life. Its ultimate goal is to build artificial animals with limited sensory-motor capabilities that are nevertheless able to behave adaptively in order to survive in an unpredictable environment. Different scenarios of interaction between a given animat and a given environment have been studied and reported in the literature. One such scenario is to treat the animat problem as a reinforcement learning problem (such as a Markov decision process) and to solve it with Learning Classifier Systems (LCS) that have some generalization ability. A Learning Classifier System is a learning system that can learn simple chains of rules by interacting with the environment and receiving various payoffs (rewards). XCS (the eXtended Classifier System) [1], introduced by Wilson in 1995, is currently the most popular Learning Classifier System. It uses Q-learning to deal with the credit assignment problem, and it separates the fitness variables used by the genetic algorithm from those linked to the credit assignment mechanism. In our research, we studied the performance of XCS, and of several of its variants, in controlling an animat exploring different types of 2D environments in search of food. The traditional 2D environments WOODS1, WOODS2 and MAZE5 were studied, as well as several S2DM (Square 2D Maze) environments which we designed for this study. The XCS variants studied are XCSS (with the Specify operator, which narrows the scope of detrimental, overgeneral classifiers) and XCSG (which uses gradient descent on the prediction values).
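    For readers unfamiliar with the credit-assignment scheme mentioned above, the following is a minimal sketch of the Q-learning-like payoff update used in multi-step XCS experiments of this kind: the previously active action set is reinforced with the immediate reward plus the discounted maximum of the current prediction array. The parameter values and classifier fields are illustrative defaults, not those used in the thesis.

        from dataclasses import dataclass

        GAMMA = 0.71   # discount factor commonly used in Woods/Maze experiments
        BETA = 0.2     # learning rate

        @dataclass
        class Classifier:
            condition: str
            action: int
            prediction: float = 10.0
            error: float = 0.0
            fitness: float = 0.01

        def payoff_target(previous_reward, prediction_array):
            # prediction_array maps each action to its fitness-weighted prediction
            return previous_reward + GAMMA * max(prediction_array.values())

        def update_previous_action_set(action_set, target):
            # Widrow-Hoff style update of prediction and error for each classifier
            for cl in action_set:
                cl.prediction += BETA * (target - cl.prediction)
                cl.error += BETA * (abs(target - cl.prediction) - cl.error)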

    Design of Learning Classifier Systems Based on Learning Strategy (学習戦略に基づく学習分類子システムの設計)

    In Learning Classifier Systems (LCSs), the learning strategy, which defines how an LCS covers the state-action space of a problem, is one of the most fundamental design choices. However, there has been no intensive study of whether and how the learning strategy affects LCS performance, and as a result the current design methodology for LCSs does not carefully consider the type of learning strategy. This thesis argues for a design methodology for LCSs based on the learning strategy: it shows that the learning strategy is a design choice that determines the potential performance of an LCS, and it claims that LCSs should therefore be designed around it. First, the thesis empirically shows that the current design methodology, which does not consider the learning strategy, can fail to produce an LCS suited to a given problem; this supports the need for a strategy-based design methodology. Next, the thesis presents an example of how an LCS can be designed on the basis of the learning strategy, showing empirically that the adequate learning strategy depends on the type of problem difficulty, such as missing attributes. A covering-operator sketch is given after this abstract for reference. The thesis then draws an inclusive guideline explaining which learning strategy should be used for which type of problem difficulty. Finally, on an application of LCSs to human daily-activity recognition, the thesis shows that the adequate learning strategy chosen according to this guideline effectively improves the application's performance. The thesis concludes that the learning strategy is the design choice that determines the potential performance of an LCS; before designing any LCS or application, an adequate learning strategy should be selected first, because performance degrades when an inadequate strategy is applied to the problem at hand. In other words, LCSs should be designed on the basis of an adequate learning strategy. (The University of Electro-Communications, 201)
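    Since the learning strategy discussed above governs how an LCS covers the state-action space, a concrete point of reference is the standard XCS covering operator, sketched below with a ternary {0, 1, #} alphabet. The thesis's actual strategies are not reproduced here; the generalization probability and the classifier fields are illustrative assumptions.

        import random

        P_HASH = 0.33   # probability of generalizing a condition bit to '#'

        def cover(state_bits, action, rng=random):
            # create a classifier that matches the current state, with some bits generalized
            condition = ''.join('#' if rng.random() < P_HASH else b for b in state_bits)
            return {'condition': condition, 'action': action,
                    'prediction': 10.0, 'error': 0.0, 'fitness': 0.01}

        def matches(condition, state_bits):
            return all(c == '#' or c == b for c, b in zip(condition, state_bits))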

    Evolutionary Reinforcement Learning of Spoken Dialogue Strategies

    Institute for Communicating and Collaborative Systems. From a system developer's perspective, designing a spoken dialogue system can be a time-consuming and difficult process. A developer may spend a lot of time anticipating how a potential user might interact with the system and then deciding on the most appropriate system response. These decisions are encoded in a dialogue strategy, essentially a mapping between anticipated user inputs and appropriate system outputs. To reduce the time and effort associated with developing a dialogue strategy, recent work has concentrated on modelling the development of a dialogue strategy as a sequential decision problem. Using this model, reinforcement learning algorithms have been employed to generate dialogue strategies automatically. These algorithms learn strategies by interacting with simulated users. Some progress has been made with this method, but a number of important challenges remain. For instance, relatively little success has been achieved with the large state representations that are typical of real-life systems. Another crucial issue is the time and effort associated with the creation of simulated users. In this thesis, I propose an alternative to existing reinforcement learning methods of dialogue strategy development. More specifically, I explore how XCS, an evolutionary reinforcement learning algorithm, can be used to find dialogue strategies that cover large state spaces. Furthermore, I suggest that hand-coded simulated users are sufficient for the learning of useful dialogue strategies. I argue that the use of evolutionary reinforcement learning and hand-coded simulated users is an effective approach to the rapid development of spoken dialogue strategies. Finally, I substantiate this claim by evaluating a learned strategy with real users. Both the learned strategy and a state-of-the-art hand-coded strategy were integrated into an end-to-end spoken dialogue system. The dialogue system allowed real users to make flight enquiries using a live database for an Edinburgh-based airline. The performances of the learned and hand-coded strategies were compared. The evaluation results show that the learned strategy performs as well as the hand-coded one (81% and 77% task completion respectively) but takes much less time to design (two days instead of two weeks). Moreover, the learned strategy compares favourably with previous user evaluations of learned strategies.
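    To make the notion of a dialogue strategy as a state-to-action mapping concrete, the sketch below encodes a flight-enquiry dialogue state as three slot-status bits and matches it against XCS-style ternary conditions. The slot names, system acts and conditions are invented for illustration and are not the state representation used in the thesis.

        # dialogue state: (origin_filled, destination_filled, date_filled) as '0'/'1' bits
        STRATEGY = [
            ('0##', 'ask_origin'),        # origin still unknown
            ('10#', 'ask_destination'),   # origin known, destination unknown
            ('110', 'ask_date'),
            ('111', 'query_database'),
        ]

        def choose_system_act(state_bits):
            for condition, act in STRATEGY:
                if all(c == '#' or c == b for c, b in zip(condition, state_bits)):
                    return act
            return 'ask_origin'   # fallback

        assert choose_system_act('100') == 'ask_destination'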

    Feedback of delayed rewards in XCS for environments with aliasing states

    Wilson [13] showed how delayed reward feedback can be used to solve many multi-step problems for the widely used XCS learning classifier system. However, Wilson's method, based on backward propagation of discounted rewards as in Q-learning, runs into difficulties in environments with aliasing states, since the local reward function often does not converge. This paper describes a different approach to reward feedback, in which a layered reward scheme for XCS classifiers is learnt during training. We show that, with a relatively minor modification to XCS feedback, the approach not only solves problems such as Woods1 but can also solve aliasing-state problems such as Littman57, MiyazakiA and MazeB.
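    The convergence difficulty mentioned above can be illustrated numerically: under standard discounted feedback, two aliased positions that look identical to the animat but lie at different distances from the goal push the same classifier toward two different payoff targets, so its prediction oscillates between them. The figures below are illustrative only and do not reproduce the paper's layered reward scheme.

        GAMMA, BETA = 0.71, 0.2
        REWARD_AT_GOAL = 1000.0

        # ideal discounted payoffs for two aliased positions, e.g. 2 and 4 steps from the goal
        target_near = REWARD_AT_GOAL * GAMMA ** 1
        target_far = REWARD_AT_GOAL * GAMMA ** 3

        prediction = 0.0
        for _ in range(200):                       # alternate visits to the two aliased positions
            for target in (target_near, target_far):
                prediction += BETA * (target - prediction)

        # the prediction settles between the two targets instead of converging to either
        print(round(target_near, 1), round(target_far, 1), round(prediction, 1))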