11 research outputs found

    A brief history of learning classifier systems: from CS-1 to XCS and its variants

    Get PDF
    © 2015, Springer-Verlag Berlin Heidelberg. The direction set by Wilson’s XCS is that modern Learning Classifier Systems can be characterized by their use of rule accuracy as the utility metric for the search algorithm(s) discovering useful rules. Such searching typically takes place within the restricted space of co-active rules for efficiency. This paper gives an overview of the evolution of Learning Classifier Systems up to XCS, and then of some of the subsequent developments of Wilson’s algorithm to different types of learning

    An overview of LCS research from 2021 to 2022

    Get PDF

    Evolutionary computation for software testing

    Get PDF
    A variety of products undergo a transformation from a pure mechanical design to more and more software and electronic components. A polarized example are watches. Several decades ago they have been purely mechanical. Modern smart watches are almost completely electronic devices which heavily rely on software. Further, a smart watch offers a lot more features than just the information about the current time. This change had a crucial impact on how software is being developed. A first attempt to control the rising complexity was to move to agile development practices such as extreme programming or scrum. This rise in complexity is not only affecting the development process but also quality assurance and software testing. If a product contains more and more features then this leads to a higher number of tests necessary to ensure quality standards. Furthermore agile development practices work in an iterative manner which leads to repetitive testing that puts more effort on the testing team. We aimed within the thesis to ease the pain of testing. Thereby we examined a series of subproblems that arise. A key complexity is the number of test cases. We intended to reduce the number of test cases before they are executed manually or implemented as automated tests. Thereby we examined the test specification and based on the requirements coverage of the individual tests, we were able to identify redundant tests. We relied on a novel metaheuristic called GCAIS which we improved upon iteratively. Another task is to control the remaining complexity. Testing is often time crucial and an appropriate subset of the available tests must be chosen in order to get a quick insight into the status of the device under test. We examined this challenge in two different testing scenarios. The first scenario is located in semi-automated testing where engineers execute a set of automated tests locally and closely observe the behaviour of the system under test. We extended GCAIS to compute test suites that satisfy different criteria if provided with sufficient search time. The second use case is located in fully automated testing in a continuous integration (CI) setting. CI focuses on frequent software build cycles which also include testing. These builds contain a testing stage which greatly emphasizes speed. Thus there we also have to compute crucial tests. However, due to the nature of the process we have to continuously recompute a test suite for each build as the software and maybe even the test cases at hand have changed. Hence it is hard to compute the test suite ahead of time and these tests have to be determined as part of the CI execution. Thus we switched to a computational lightweight learning classifier system (LCS) to prioritize and select test cases. We integrated a series of innovations we made into an LCS known as XCSF such as continuous priorities, experience replay and transfer learning. This enabled us to outperform a state of the art artificial neural network which is used by companies such as Netflix. We further investigated how LCS can be made faster using parallelism. We developed generic approaches which may run on any multicore computing device. This is of interest for our CI use case as the build server's architecture is unknown. However, the methods are also independent of the concrete LCS and are not linked to our testing problem. We identified that many of the challenges that need to be faced in the CI use case have been tackled by Organic Computing (OC), for example the need to adapt to an ever changing environment. Hence we relied on OC design principles to create a system architecture which wraps the LCS developed and integrates it into existing CI processes. The final system is robust and highly autonomous. A side-effect of the high degree of autonomy is a high level of automatization which fits CI well. We also gave insight on the usability and delivery of the full system to our industrial partner. Test engineers can easily integrate it with a few lines of code and need no knowledge about LCS and OC in order to use it. Another implication of the developed system is that OC's ideas and design principles can also be employed outside the field of embedded systems. This shows that OC has a greater level of generality. The process of testing and correcting found errors is still only partially automated. We make a first step into automating the entire process and thereby take an analogy to the concept of self-healing of OC. As a first proof of concept of this school of thought we take a look at touch interfaces. There we can automatically manipulate the software to fulfill the specified behaviour. Thus only a minimalistic amount of manual work is required

    Subject Index

    Get PDF

    Facing online challenges using learning classifier systems

    Get PDF
    Els grans avenços en el camp de l’aprenentatge automàtic han resultat en el disseny de màquines competents que són capaces d’aprendre i d’extreure informació útil i original de l’experiència. Recentment, algunes d’aquestes tècniques d’aprenentatge s’han aplicat amb èxit per resoldre problemes del món real en àmbits tecnològics, mèdics, científics i industrials, els quals no es podien tractar amb tècniques convencionals d’anàlisi ja sigui per la seva complexitat o pel gran volum de dades a processar. Donat aquest èxit inicial, actualment els sistemes d’aprenentatge s’enfronten a problemes de complexitat més elevada, el que ha resultat en un augment de l’activitat investigadora entorn sistemes capaços d’afrontar nous problemes del món real eficientment i de manera escalable. Una de les famílies d’algorismes més prometedores en l’aprenentatge automàtic són els sistemes classificadors basats en algorismes genetics (LCSs), el funcionament dels quals s’inspira en la natura. Els LCSs intenten representar les polítiques d’actuació d’experts humans amb un conjunt de regles que s’empren per escollir les millors accions a realitzar en tot moment. Així doncs, aquests sistemes aprenen polítiques d’actuació de manera incremental a mida que van adquirint experiència a través de la informació nova que se’ls va presentant durant el temps. Els LCSs s’han aplicat, amb èxit, a camps tan diversos com la predicció de càncer de pròstata o el suport a la inversió en borsa, entre altres. A més en alguns casos s’ha demostrat que els LCSs realitzen tasques superant la precisió dels éssers humans. El propòsit d’aquesta tesi és explorar la naturalesa de l’aprenentatge online dels LCSs d’estil Michigan per a la mineria de grans quantitats de dades en forma de fluxos d’informació continus a alta velocitat i canviants en el temps. Molt sovint, l’extracció de coneixement a partir d’aquestes fonts de dades és clau per tal d’obtenir una millor comprensió dels processos que les dades estan descrivint. Així, aprendre d’aquestes dades planteja nous reptes a les tècniques tradicionals d’aprenentatge automàtic, les quals no estan dissenyades per tractar fluxos de dades continus i on els conceptes i els nivells de soroll poden variar amb el temps de forma arbitrària. La contribució de la present tesi pren l’eXtended Classifier System (XCS), el LCS d’estil Michigan més estudiat i un dels algoritmes d’aprenentatge automàtic més competents, com el punt de partida. D’aquesta manera els reptes abordats en aquesta tesi són dos: el primer desafiament és la construcció d’un sistema supervisat competent sobre el framework dels LCSs d’estil Michigan que aprèn dels fluxos de dades amb una capacitat de reacció ràpida als canvis de concepte i entrades amb soroll. Com moltes aplicacions científiques i industrials generen grans quantitats de dades sense etiquetar, el segon repte és aplicar les lliçons apreses per continuar amb el disseny de LCSs d’estil Michigan capaços de solucionar problemes online sense assumir una estructura a priori en els dades d’entrada.Los grandes avances en el campo del aprendizaje automático han resultado en el diseño de máquinas capaces de aprender y de extraer información útil y original de la experiencia. Recientemente alguna de estas técnicas de aprendizaje se han aplicado con éxito para resolver problemas del mundo real en ámbitos tecnológicos, médicos, científicos e industriales, los cuales no se podían tratar con técnicas convencionales de análisis ya sea por su complejidad o por el gran volumen de datos a procesar. Dado este éxito inicial, los sistemas de aprendizaje automático se enfrentan actualmente a problemas de complejidad cada vez m ́as elevada, lo que ha resultado en un aumento de la actividad investigadora en sistemas capaces de afrontar nuevos problemas del mundo real de manera eficiente y escalable. Una de las familias más prometedoras dentro del aprendizaje automático son los sistemas clasificadores basados en algoritmos genéticos (LCSs), el funcionamiento de los cuales se inspira en la naturaleza. Los LCSs intentan representar las políticas de actuación de expertos humanos usando conjuntos de reglas que se emplean para escoger las mejores acciones a realizar en todo momento. Así pues estos sistemas aprenden políticas de actuación de manera incremental mientras van adquiriendo experiencia a través de la nueva información que se les va presentando. Los LCSs se han aplicado con éxito en campos tan diversos como en la predicción de cáncer de próstata o en sistemas de soporte de bolsa, entre otros. Además en algunos casos se ha demostrado que los LCSs realizan tareas superando la precisión de expertos humanos. El propósito de la presente tesis es explorar la naturaleza online del aprendizaje empleado por los LCSs de estilo Michigan para la minería de grandes cantidades de datos en forma de flujos continuos de información a alta velocidad y cambiantes en el tiempo. La extracción del conocimiento a partir de estas fuentes de datos es clave para obtener una mejor comprensión de los procesos que se describen. Así, aprender de estos datos plantea nuevos retos a las técnicas tradicionales, las cuales no están diseñadas para tratar flujos de datos continuos y donde los conceptos y los niveles de ruido pueden variar en el tiempo de forma arbitraria. La contribución del la presente tesis toma el eXtended Classifier System (XCS), el LCS de tipo Michigan más estudiado y uno de los sistemas de aprendizaje automático más competentes, como punto de partida. De esta forma los retos abordados en esta tesis son dos: el primer desafío es la construcción de un sistema supervisado competente sobre el framework de los LCSs de estilo Michigan que aprende de flujos de datos con una capacidad de reacción rápida a los cambios de concepto y al ruido. Como muchas aplicaciones científicas e industriales generan grandes volúmenes de datos sin etiquetar, el segundo reto es aplicar las lecciones aprendidas para continuar con el diseño de nuevos LCSs de tipo Michigan capaces de solucionar problemas online sin asumir una estructura a priori en los datos de entrada.Last advances in machine learning have fostered the design of competent algorithms that are able to learn and extract novel and useful information from data. Recently, some of these techniques have been successfully applied to solve real-­‐world problems in distinct technological, scientific and industrial areas; problems that were not possible to handle by the traditional engineering methodology of analysis either for their inherent complexity or by the huge volumes of data involved. Due to the initial success of these pioneers, current machine learning systems are facing problems with higher difficulties that hamper the learning process of such algorithms, promoting the interest of practitioners for designing systems that are able to scalably and efficiently tackle real-­‐world problems. One of the most appealing machine learning paradigms are Learning Classifier Systems (LCSs), and more specifically Michigan-­‐style LCSs, an open framework that combines an apportionment of credit mechanism with a knowledge discovery technique inspired by biological processes to evolve their internal knowledge. In this regard, LCSs mimic human experts by making use of rule lists to choose the best action to a given problem situation, acquiring their knowledge through the experience. LCSs have been applied with relative success to a wide set of real-­‐ world problems such as cancer prediction or business support systems, among many others. Furthermore, on some of these areas LCSs have demonstrated learning capacities that exceed those of human experts for that particular task. The purpose of this thesis is to explore the online learning nature of Michigan-­‐style LCSs for mining large amounts of data in the form of continuous, high speed and time-­‐changing streams of information. Most often, extracting knowledge from these data is key, in order to gain a better understanding of the processes that the data are describing. Learning from these data poses new challenges to traditional machine learning techniques, which are not typically designed to deal with data in which concepts and noise levels may vary over time. The contribution of this thesis takes the extended classifier system (XCS), the most studied Michigan-­‐style LCS and one of the most competent machine learning algorithms, as the starting point. Thus, the challenges addressed in this thesis are twofold: the first challenge is building a competent supervised system based on the guidance of Michigan-­‐style LCSs that learns from data streams with a fast reaction capacity to changes in concept and noisy inputs. As many scientific and industrial applications generate vast amounts of unlabelled data, the second challenge is to apply the lessons learned in the previous issue to continue with the design of unsupervised Michigan-­‐style LCSs that handle online problems without assuming any a priori structure in input data

    Principled design of evolutionary learning sytems for large scale data mining

    Get PDF
    Currently, the data mining and machine learning fields are facing new challenges because of the amount of information that is collected and needs processing. Many sophisticated learning approaches cannot simply cope with large and complex domains, because of the unmanageable execution times or the loss of prediction and generality capacities that occurs when the domains become more complex. Therefore, to cope with the volumes of information of the current realworld problems there is a need to push forward the boundaries of sophisticated data mining techniques. This thesis is focused on improving the efficiency of Evolutionary Learning systems in large scale domains. Specifically the objective of this thesis is improving the efficiency of the Bioinformatic Hierarchical Evolutionary Learning (BioHEL) system, a system designed with the purpose of handling large domains. This is a classifier system that uses an Iterative Rule Learning approach to generate a set of rules one by one using consecutive Genetic Algorithms. This system have shown to be very competitive so far in large and complex domains. In particular, BioHEL has obtained very important results when solving protein structure prediction problems and has won related merits, such as being placed among the best algorithms for this purpose at the Critical Assessment of Techniques for Protein Structure Prediction (CASP) in 2008 and 2010, and winning the bronze medal at the HUMIES Awards for Human-competitive results in 2007. However, there is still a need to analyse this system in a principled way to determine how the current mechanisms work together to solve larger domains and determine the aspects of the system that can be improved towards this aim. To fulfil the objective of this thesis, the work is divided in two parts. In the first part of the thesis exhaustive experimentation was carried out to determine ways in which the system could be improved. From this exhaustive analysis three main weaknesses are pointed out: a) the problem-dependancy of parameters in BioHEL's fitness function, which results in having a system difficult to set up and which requires an extensive preliminary experimentation to determine the adequate values for these parameters; b) the execution time of the learning process, which at the moment does not use any parallelisation techniques and depends on the size of the training sets; and c) the lack of global supervision over the generated solutions which comes from the usage of the Iterative Rule Learning paradigm and produces larger rule sets in which there is no guarantee of minimality or maximal generality. The second part of the thesis is focused on tackling each one of the weaknesses abovementioned to have a system capable of handling larger domains. First a heuristic approach to set parameters within BioHEL's fitness function is developed. Second a new parallel evaluation process that runs on General Purpose Graphic Processing Units was developed. Finally, post-processing operators to tackle the generality and cardinality of the generated solutions are proposed. By means of these enhancements we managed to improve the BioHEL system to reduce both the learning and the preliminary experimentation time, increase the generality of the final solutions and make the system more accessible for end-users. Moreover, as the techniques discussed in this thesis can be easily extended to other Evolutionary Learning systems we consider them important additions to the research in this field towards tackling large scale domains

    Resource management and scalability of the XCSF learning classifier system

    Get PDF
    AbstractIt has been shown many times that the evolutionary online learning XCS classifier system is a robustly generalizing reinforcement learning system, which also yields highly competitive results in data mining applications. The XCSF version of the system is a real-valued function approximation system, which learns piecewise overlapping local linear models to approximate an iteratively sampled function. While the theory on the binary domain side goes as far as showing that XCS can PAC learn a slightly restricted set of k-DNF problems, theory for XCSF is still rather sparse. This paper takes the theory from the XCS side and projects it onto the real-valued XCSF domain. For a set of functions, in which fitness guidance is given, we even show that XCSF scales optimally with respect to the population size, requiring only a constant overhead to ensure that the evolutionary process can locally optimize the evolving structures. Thus, we provide foundations concerning scalability and resource management for XCSF. Furthermore, we reveal dimensions of problem difficulty for XCSF — and local linear learners in general — showing how structural alignment, that is, alignment of XCSF’s solution representation to the problem structure, can reduce the complexity of challenging problems by orders of magnitude
    corecore