18 research outputs found
Comparison of Adaptive Behaviors of an Animat in Different Markovian 2-D Environments Using XCS Classifier Systems
ABSTRACT: The word "animat" was introduced by Stewart W. Wilson in 1985 and became popular through the SAB conference series (Simulation of Adaptive Behavior: From Animals to Animats) held between 1991 and 2010. Since the meaning of the term has evolved considerably over the years, it is important to specify that this thesis adopts the definition originally proposed by Wilson. Animat research is a subfield of evolutionary computation, machine learning, adaptive behavior, and artificial life. Its ultimate goal is to build artificial animals with limited sensory-motor capabilities that are nevertheless able to behave adaptively in order to survive in an unpredictable environment. Different scenarios of interaction between a given animat and a given environment have been studied and reported in the literature. One such scenario is to treat the animat problem as a reinforcement learning problem (such as a Markov decision process) and to solve it with Learning Classifier Systems (LCS) possessing some generalization ability. A Learning Classifier System is a learning system that learns simple strings of rules by interacting with the environment and receiving various payoffs (rewards).
The XCS (eXtended Classifier System) [1], introduced by Wilson in 1995, is currently the most popular Learning Classifier System. It uses Q-learning to deal with the credit assignment problem, and it separates the fitness variables used by the genetic algorithm from those linked to the credit assignment mechanism. In our research, we studied the performance of XCS, and of several of its variants, in managing an animat exploring different types of 2D environments in search of food. The 2D environments traditionally named WOODS1, WOODS2, and MAZE5 were studied, as well as several S2DM (Square 2D Maze) environments that we designed for this study. The XCS variants are XCSS (with the "Specify" operator, which allows removing detrimental rules) and XCSG (using gradient descent on the prediction values).
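As a rough illustration of the credit-assignment mechanism mentioned above, the sketch below shows a Q-learning-style (Widrow-Hoff) payoff-prediction update of the kind XCS applies to the previous action set. The parameter values and the dictionary-based classifier layout are illustrative assumptions, not Wilson's exact implementation.

```python
GAMMA = 0.71   # discount factor often used in XCS "woods" experiments
BETA = 0.2     # learning rate for the Widrow-Hoff (delta-rule) update

def update_predictions(prev_action_set, reward, best_next_prediction):
    """Distribute the discounted payoff to the previous action set."""
    target = reward + GAMMA * best_next_prediction
    for cl in prev_action_set:
        # Move each classifier's payoff prediction toward the target.
        cl["prediction"] += BETA * (target - cl["prediction"])
    return prev_action_set

action_set = [{"prediction": 10.0}, {"prediction": 20.0}]
update_predictions(action_set, reward=0.0, best_next_prediction=50.0)
# predictions move toward the target 0 + 0.71 * 50 = 35.5
```

In a full XCS the same loop would also update each classifier's prediction error and fitness, which drive the genetic algorithm separately from the payoff prediction.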
Event-driven Hybrid Classifier Systems and Online Learning for Soccer Game Strategies
The field of robot soccer is a useful setting for the study of artificial intelligence and machine learning.
Design of Learning Classifier Systems Based on Learning Strategies
In Learning Classifier Systems (LCSs), the learning strategy, which defines how an LCS covers the state-action space of a problem, is one of the most fundamental design choices. There has been no intensive study of the learning strategy to clarify whether and how it affects the performance of LCSs. This gap has resulted in a current LCS design methodology that does not carefully consider the type of learning strategy. This thesis argues for an LCS design methodology based on the learning strategy: it shows that the learning strategy is an option that determines the potential performance of LCSs, and then claims that LCSs should be designed on the basis of the learning strategy in order to improve their performance. First, the thesis empirically shows that the current design methodology, which does not consider the learning strategy, can be insufficient for designing a proper LCS for a given problem. This supports the need for a design methodology based on the learning strategy. Next, the thesis presents an example of how an LCS can be designed on the basis of the learning strategy: it empirically shows that an adequate learning strategy, which improves LCS performance, can be chosen depending on the type of problem difficulty, such as missing attributes. The thesis then draws up an inclusive guideline explaining which learning strategy should be used to address which types of problem difficulty. Finally, on an application of LCS to a human daily-activity recognition problem, the thesis shows that the learning strategy deemed adequate by the guideline effectively improves the application's performance. The thesis concludes that the learning strategy is the design option that determines the potential performance of LCSs.
Thus, before designing any type of LCS, including its applications, an adequate learning strategy should be selected first, because performance degrades when an inadequate learning strategy is employed for the problem at hand. In other words, LCSs should be designed on the basis of an adequate learning strategy.
University of Electro-Communications, 201
XCS Algorithms for a Linear Combination of Discounted and Undiscounted Reward Markovian Decision Processes
ABSTRACT: Many studies have shown that combining individual predictors improves the accuracy of predictions in domains such as psychology, statistics, and management science. However, these studies have not tested combinations of reinforcement learning techniques. This study aims to develop an algorithm based on two iterative approximate forms of reinforcement learning within XCS. This algorithm, named MIXCS, combines the Q-learning and R-learning techniques to compute a linear combination of the payoff resulting from the agent's actions, as well as the correspondence between the system prediction and the actual action value. As such, MIXCS predicts the payoff to be expected for each action available to the agent.
We tested MIXCS in two two-dimensional grid environments, Environment1 and Environment2, which represent the financial-market actions of buying, selling, and holding, in order to evaluate the performance of an agent acting as a trader seeking a desired profit. We calculated the optimum average payoff for predicting the value of the next move in both environments and compared the results with those obtained by MIXCS. We obtained two results. First, the performance of MIXCS is close to the optimum average payoff in Environment1, but not in Environment2. Second, the agent reaches the maximum reward by taking selling actions in both environments.
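The linear combination of a discounted (Q-learning) and an undiscounted average-reward (R-learning) target can be sketched as follows. The mixing weight `w`, the discount `gamma`, and the handling of the average reward `rho` are illustrative assumptions, not the exact MIXCS formulation.

```python
def mixed_target(reward, max_next, rho, gamma=0.9, w=0.5):
    """Linear combination of a Q-learning and an R-learning target.

    reward: immediate payoff; max_next: best predicted payoff in the
    next state; rho: estimated average reward per step (R-learning).
    """
    discounted = reward + gamma * max_next            # Q-learning target
    undiscounted = reward - rho + max_next            # R-learning target
    return w * discounted + (1.0 - w) * undiscounted  # linear combination

# Example: reward 1.0, best next prediction 10.0, average reward 0.5
target = mixed_target(1.0, 10.0, 0.5)
# discounted = 10.0, undiscounted = 10.5, mix = 10.25
```

In a system like MIXCS, such a mixed target would then play the role of the payoff used to update classifier predictions, with `rho` itself updated incrementally from observed rewards.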
Learning classifier systems from first principles: A probabilistic reformulation of learning classifier systems from the perspective of machine learning
Learning Classifier Systems (LCS) are a family of rule-based machine learning methods. They aim at the autonomous production of potentially human-readable results that are the most compact generalized representation, whilst also maintaining high predictive accuracy, with a wide range of application areas, such as autonomous robotics, economics, and multi-agent systems. Their design is mainly approached heuristically and, even though their performance is competitive in regression and classification tasks, they do not meet their expected performance in sequential decision tasks, despite being initially designed for such tasks. It is our contention that improvement is hindered by a lack of theoretical understanding of their underlying mechanisms and dynamics.
EThOS - Electronic Theses Online Service, United Kingdom
Neuroevolutionary reinforcement learning for generalized control of simulated helicopters
This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional state and action spaces, the need to explore efficiently and learn on-line poses unusual challenges. We propose and evaluate several methods for three increasingly challenging variations of the task, including the method that won first place in the 2008 Reinforcement Learning Competition. The results demonstrate that (1) neuroevolution can be effective for complex on-line reinforcement learning tasks such as generalized helicopter hovering, (2) neuroevolution excels at finding effective helicopter hovering policies but not at learning helicopter models, (3) due to the difficulty of learning reliable models, model-based approaches to helicopter hovering are feasible only when domain expertise is available to aid the design of a suitable model representation, and (4) recent advances in efficient resampling can enable neuroevolution to tackle more aggressively generalized reinforcement learning tasks.
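A minimal sketch of the mutate-and-select loop at the heart of neuroevolution is given below. The toy fitness function, population sizes, and truncation selection are assumptions for illustration: the fitness stands in for an episode return from the helicopter simulator, and the weight vector stands in for a neural policy's parameters.

```python
import random

random.seed(0)

TARGET = [0.5, -0.3, 0.8]  # hypothetical optimal weights (toy problem)

def fitness(weights):
    # Toy stand-in for an episode return: reward for being close to TARGET.
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def evolve(pop_size=20, generations=30, sigma=0.1):
    # Start from random "policy" weight vectors.
    population = [[random.uniform(-1, 1) for _ in range(3)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 4]              # truncation selection
        # Elitism: keep parents, refill with Gaussian-perturbed offspring.
        population = parents + [
            [w + random.gauss(0.0, sigma) for w in random.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
    return max(population, key=fitness)

best = evolve()
```

An on-line variant, as the article's setting requires, would additionally have to budget fitness evaluations carefully, which is where the resampling advances mentioned in point (4) come in.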