64 research outputs found

    MILCS: A mutual information learning classifier system

    Get PDF
    This paper introduces a new variety of learning classifier system (LCS), called MILCS, which utilizes mutual information as fitness feedback. Unlike most LCSs, MILCS is specifically designed for supervised learning. MILCS's design draws on an analogy to the structural learning approach of cascade correlation networks. We present preliminary results, and contrast them to results from XCS. We discuss the explanatory power of the resulting rule sets, and introduce a new technique for visualizing explanatory power. Final comments include future directions for this research, including investigations in neural networks and other systems. Copyright 2007 ACM

    Incorporating feature ranking and evolutionary methods for the classification of high-dimensional DNA microarray gene expression data

    Get PDF
    Background: DNA microarray gene expression classification poses a challenging task to the machine learning domain. Typically, the dimensionality of gene expression data sets could go from several thousands to over 10,000 genes. A potential solution to this issue is using feature selection to reduce the dimensionality. Aim The aim of this paper is to investigate how we can use feature quality information to improve the precision of microarray gene expression classification tasks. Method: We propose two evolutionary machine learning models based on the eXtended Classifier System (XCS) and a typical feature selection methodology. The first one, which we call FS-XCS, uses feature selection for feature reduction purposes. The second model is GRD-XCS, which uses feature ranking to bias the rule discovery process of XCS. Results: The results indicate that the use of feature selection/ranking methods is essential for tackling high-dimensional classification tasks, such as microarray gene expression classification. However, the results also suggest that using feature ranking to bias the rule discovery process performs significantly better than using the feature reduction method. In other words, using feature quality information to develop a smarter learning procedure is more efficient than reducing the feature set. Conclusion: Our findings have shown that extracting feature quality information can assist the learning process and improve classification accuracy. On the other hand, relying exclusively on the feature quality information might potentially decrease the classification performance (e.g., using feature reduction). Therefore, we recommend a hybrid approach that uses feature quality information to direct the learning process by highlighting the more informative features, but at the same time not restricting the learning process to explore other features

    SupRB: A Supervised Rule-based Learning System for Continuous Problems

    Get PDF
    We propose the SupRB learning system, a new Pittsburgh-style learning classifier system (LCS) for supervised learning on multi-dimensional continuous decision problems. SupRB learns an approximation of a quality function from examples (consisting of situations, choices and associated qualities) and is then able to make an optimal choice as well as predict the quality of a choice in a given situation. One area of application for SupRB is parametrization of industrial machinery. In this field, acceptance of the recommendations of machine learning systems is highly reliant on operators' trust. While an essential and much-researched ingredient for that trust is prediction quality, it seems that this alone is not enough. At least as important is a human-understandable explanation of the reasoning behind a recommendation. While many state-of-the-art methods such as artificial neural networks fall short of this, LCSs such as SupRB provide human-readable rules that can be understood very easily. The prevalent LCSs are not directly applicable to this problem as they lack support for continuous choices. This paper lays the foundations for SupRB and shows its general applicability on a simplified model of an additive manufacturing problem.Comment: Submitted to the Genetic and Evolutionary Computation Conference 2020 (GECCO 2020

    Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning

    Full text link
    Learning classifier systems (LCSs) are population-based predictive systems that were originally envisioned as agents to act in reinforcement learning (RL) environments. These systems can suffer from population bloat and so are amenable to compaction techniques that try to strike a balance between population size and performance. A well-studied LCS architecture is XCSF, which in the RL setting acts as a Q-function approximator. We apply XCSF to a deterministic and stochastic variant of the FrozenLake8x8 environment from OpenAI Gym, with its performance compared in terms of function approximation error and policy accuracy to the optimal Q-functions and policies produced by solving the environments via dynamic programming. We then introduce a novel compaction algorithm (Greedy Niche Mass Compaction - GNMC) and study its operation on XCSF's trained populations. Results show that given a suitable parametrisation, GNMC preserves or even slightly improves function approximation error while yielding a significant reduction in population size. Reasonable preservation of policy accuracy also occurs, and we link this metric to the commonly used steps-to-goal metric in maze-like environments, illustrating how the metrics are complementary rather than competitive

    Facing online challenges using learning classifier systems

    Get PDF
    Els grans avenços en el camp de l’aprenentatge automàtic han resultat en el disseny de màquines competents que són capaces d’aprendre i d’extreure informació útil i original de l’experiència. Recentment, algunes d’aquestes tècniques d’aprenentatge s’han aplicat amb èxit per resoldre problemes del món real en àmbits tecnològics, mèdics, científics i industrials, els quals no es podien tractar amb tècniques convencionals d’anàlisi ja sigui per la seva complexitat o pel gran volum de dades a processar. Donat aquest èxit inicial, actualment els sistemes d’aprenentatge s’enfronten a problemes de complexitat més elevada, el que ha resultat en un augment de l’activitat investigadora entorn sistemes capaços d’afrontar nous problemes del món real eficientment i de manera escalable. Una de les famílies d’algorismes més prometedores en l’aprenentatge automàtic són els sistemes classificadors basats en algorismes genetics (LCSs), el funcionament dels quals s’inspira en la natura. Els LCSs intenten representar les polítiques d’actuació d’experts humans amb un conjunt de regles que s’empren per escollir les millors accions a realitzar en tot moment. Així doncs, aquests sistemes aprenen polítiques d’actuació de manera incremental a mida que van adquirint experiència a través de la informació nova que se’ls va presentant durant el temps. Els LCSs s’han aplicat, amb èxit, a camps tan diversos com la predicció de càncer de pròstata o el suport a la inversió en borsa, entre altres. A més en alguns casos s’ha demostrat que els LCSs realitzen tasques superant la precisió dels éssers humans. El propòsit d’aquesta tesi és explorar la naturalesa de l’aprenentatge online dels LCSs d’estil Michigan per a la mineria de grans quantitats de dades en forma de fluxos d’informació continus a alta velocitat i canviants en el temps. Molt sovint, l’extracció de coneixement a partir d’aquestes fonts de dades és clau per tal d’obtenir una millor comprensió dels processos que les dades estan descrivint. Així, aprendre d’aquestes dades planteja nous reptes a les tècniques tradicionals d’aprenentatge automàtic, les quals no estan dissenyades per tractar fluxos de dades continus i on els conceptes i els nivells de soroll poden variar amb el temps de forma arbitrària. La contribució de la present tesi pren l’eXtended Classifier System (XCS), el LCS d’estil Michigan més estudiat i un dels algoritmes d’aprenentatge automàtic més competents, com el punt de partida. D’aquesta manera els reptes abordats en aquesta tesi són dos: el primer desafiament és la construcció d’un sistema supervisat competent sobre el framework dels LCSs d’estil Michigan que aprèn dels fluxos de dades amb una capacitat de reacció ràpida als canvis de concepte i entrades amb soroll. Com moltes aplicacions científiques i industrials generen grans quantitats de dades sense etiquetar, el segon repte és aplicar les lliçons apreses per continuar amb el disseny de LCSs d’estil Michigan capaços de solucionar problemes online sense assumir una estructura a priori en els dades d’entrada.Los grandes avances en el campo del aprendizaje automático han resultado en el diseño de máquinas capaces de aprender y de extraer información útil y original de la experiencia. Recientemente alguna de estas técnicas de aprendizaje se han aplicado con éxito para resolver problemas del mundo real en ámbitos tecnológicos, médicos, científicos e industriales, los cuales no se podían tratar con técnicas convencionales de análisis ya sea por su complejidad o por el gran volumen de datos a procesar. Dado este éxito inicial, los sistemas de aprendizaje automático se enfrentan actualmente a problemas de complejidad cada vez m ́as elevada, lo que ha resultado en un aumento de la actividad investigadora en sistemas capaces de afrontar nuevos problemas del mundo real de manera eficiente y escalable. Una de las familias más prometedoras dentro del aprendizaje automático son los sistemas clasificadores basados en algoritmos genéticos (LCSs), el funcionamiento de los cuales se inspira en la naturaleza. Los LCSs intentan representar las políticas de actuación de expertos humanos usando conjuntos de reglas que se emplean para escoger las mejores acciones a realizar en todo momento. Así pues estos sistemas aprenden políticas de actuación de manera incremental mientras van adquiriendo experiencia a través de la nueva información que se les va presentando. Los LCSs se han aplicado con éxito en campos tan diversos como en la predicción de cáncer de próstata o en sistemas de soporte de bolsa, entre otros. Además en algunos casos se ha demostrado que los LCSs realizan tareas superando la precisión de expertos humanos. El propósito de la presente tesis es explorar la naturaleza online del aprendizaje empleado por los LCSs de estilo Michigan para la minería de grandes cantidades de datos en forma de flujos continuos de información a alta velocidad y cambiantes en el tiempo. La extracción del conocimiento a partir de estas fuentes de datos es clave para obtener una mejor comprensión de los procesos que se describen. Así, aprender de estos datos plantea nuevos retos a las técnicas tradicionales, las cuales no están diseñadas para tratar flujos de datos continuos y donde los conceptos y los niveles de ruido pueden variar en el tiempo de forma arbitraria. La contribución del la presente tesis toma el eXtended Classifier System (XCS), el LCS de tipo Michigan más estudiado y uno de los sistemas de aprendizaje automático más competentes, como punto de partida. De esta forma los retos abordados en esta tesis son dos: el primer desafío es la construcción de un sistema supervisado competente sobre el framework de los LCSs de estilo Michigan que aprende de flujos de datos con una capacidad de reacción rápida a los cambios de concepto y al ruido. Como muchas aplicaciones científicas e industriales generan grandes volúmenes de datos sin etiquetar, el segundo reto es aplicar las lecciones aprendidas para continuar con el diseño de nuevos LCSs de tipo Michigan capaces de solucionar problemas online sin asumir una estructura a priori en los datos de entrada.Last advances in machine learning have fostered the design of competent algorithms that are able to learn and extract novel and useful information from data. Recently, some of these techniques have been successfully applied to solve real-­‐world problems in distinct technological, scientific and industrial areas; problems that were not possible to handle by the traditional engineering methodology of analysis either for their inherent complexity or by the huge volumes of data involved. Due to the initial success of these pioneers, current machine learning systems are facing problems with higher difficulties that hamper the learning process of such algorithms, promoting the interest of practitioners for designing systems that are able to scalably and efficiently tackle real-­‐world problems. One of the most appealing machine learning paradigms are Learning Classifier Systems (LCSs), and more specifically Michigan-­‐style LCSs, an open framework that combines an apportionment of credit mechanism with a knowledge discovery technique inspired by biological processes to evolve their internal knowledge. In this regard, LCSs mimic human experts by making use of rule lists to choose the best action to a given problem situation, acquiring their knowledge through the experience. LCSs have been applied with relative success to a wide set of real-­‐ world problems such as cancer prediction or business support systems, among many others. Furthermore, on some of these areas LCSs have demonstrated learning capacities that exceed those of human experts for that particular task. The purpose of this thesis is to explore the online learning nature of Michigan-­‐style LCSs for mining large amounts of data in the form of continuous, high speed and time-­‐changing streams of information. Most often, extracting knowledge from these data is key, in order to gain a better understanding of the processes that the data are describing. Learning from these data poses new challenges to traditional machine learning techniques, which are not typically designed to deal with data in which concepts and noise levels may vary over time. The contribution of this thesis takes the extended classifier system (XCS), the most studied Michigan-­‐style LCS and one of the most competent machine learning algorithms, as the starting point. Thus, the challenges addressed in this thesis are twofold: the first challenge is building a competent supervised system based on the guidance of Michigan-­‐style LCSs that learns from data streams with a fast reaction capacity to changes in concept and noisy inputs. As many scientific and industrial applications generate vast amounts of unlabelled data, the second challenge is to apply the lessons learned in the previous issue to continue with the design of unsupervised Michigan-­‐style LCSs that handle online problems without assuming any a priori structure in input data
    corecore