12 research outputs found

    Scaling Genetic Algorithms to Large Distributed Datasets

    Analysing large-scale data brings promises of new levels of scientific discovery and economic value. However, the fact that such a volume of data is by its nature distributed and the need for new computational methods to be effective in the face of significant changes in data complexity and size has led to the need to develop large-scale data analytics. Genetic algorithms (GAs) have proven their flexibility in many application areas, and substantial research has been dedicated to improving their performance through parallelisation. In contrast with most previous efforts, we reject approaches based on the centralisation of data in the main memory of a single node or requiring remote access to shared/distributed memory. We focus instead on scenarios where data is partitioned across machines. In this partitioned scenario, we explore two parallelisation models: PDMS, inspired by the traditional master-slave model, and PDMD, based on island models. We adopt the two models to distribute BioHEL, a popular large-scale single-node GA classifier, using the Spark distributed data processing platform. We investigate the effect of GA control parameters (population size and migration frequency). We study the accuracy, time performance and scalability of the proposed models. Our results show that our distributed genetic algorithm design provides a good tradeoff between accuracy and time. We then extend the two models using automatic termination and population sizing to enhance the ease of use of the distributed genetic algorithm. Moreover, after testing this strategy on both models, we show that the applied automation offers a promising enhancement to the performance of the initially designed GA models.

    Design and Investigation of a Multi Agent Based XCS Learning Classifier System with Distributed Rules

    This thesis introduces and investigates a new kind of rule-based evolutionary online learning system. It addresses the problem of distributing the knowledge of a Learning Classifier System, which is represented by a population of classifiers. The result is an XCS-derived Learning Classifier System, 'XCS with Distributed Rules' (XCS-DR), that introduces independent, interacting agents to distribute the system's acquired knowledge evenly. The agents act collaboratively to solve the problem instances at hand. XCS-DR's design and architecture are explained, and its classification performance is evaluated and scrutinized in detail. Although it does not reach optimal performance compared to the original XCS, XCS-DR still yields satisfactory classification results. In the simple case of a single agent, the introduced system performs as accurately as XCS.

    Controlled self-organisation using learning classifier systems

    As the complexity of technical systems increases, breakdowns occur quite often. The mission of organic computing is to tame these challenges by providing degrees of freedom for self-organised behaviour. To achieve these goals, new methods have to be developed. The proposed observer/controller architecture constitutes one way to achieve controlled self-organisation. To improve its design, multi-agent scenarios are investigated. In particular, learning with learning classifier systems is addressed.

    Databook for human factors engineers. Volume 2 - Common formulas, metrics, definitions

    Human factors engineering manual including mathematical formulas, nomographs, conversion tables, units of measurement, and nomenclature

    Journal of Telecommunications and Information Technology, 2006, nr 4

    quarterly

    Proceedings of the 6th International Symposium on the Mediterranean Pig. October 11 – 13, 2007. Messina - Capo d’Orlando (ME), Italy

    These proceedings publish 79 communications presented in six sessions and one conference at the 6th Symposium on the Mediterranean Pig, as main lectures, oral presentations and posters. The major topics covered are the improvement and management of genetic resources, sanitary approaches in outdoor systems, feeding and rearing techniques, the quality of meat and meat products, and the traceability of typical products and their socio-economic dynamics. Particular attention is given to local pig breeds and their meat products, highlighting the importance of preserving biodiversity as well as the typicality of some unique pork products. The monitoring of pig parasitic diseases is examined, as are the non-conventional rearing systems used for typical pig breeds and their effects on pork quality. The importance of product traceability and the need to better understand the purchasing dynamics of typical pork products are also highlighted.

    Principled design of evolutionary learning systems for large scale data mining

    Currently, the data mining and machine learning fields are facing new challenges because of the amount of information that is collected and needs processing. Many sophisticated learning approaches simply cannot cope with large and complex domains, because of unmanageable execution times or the loss of prediction and generality capacities that occurs when the domains become more complex. Therefore, to cope with the volumes of information in current real-world problems, there is a need to push forward the boundaries of sophisticated data mining techniques. This thesis is focused on improving the efficiency of Evolutionary Learning systems in large-scale domains. Specifically, the objective of this thesis is to improve the efficiency of the Bioinformatic Hierarchical Evolutionary Learning (BioHEL) system, a system designed for handling large domains. This is a classifier system that uses an Iterative Rule Learning approach to generate a set of rules one by one using consecutive Genetic Algorithms. This system has been shown to be very competitive so far in large and complex domains. In particular, BioHEL has obtained very important results when solving protein structure prediction problems and has won related merits, such as being placed among the best algorithms for this purpose at the Critical Assessment of Techniques for Protein Structure Prediction (CASP) in 2008 and 2010, and winning the bronze medal at the HUMIES Awards for Human-competitive results in 2007. However, there is still a need to analyse this system in a principled way to determine how the current mechanisms work together to solve larger domains and to determine the aspects of the system that can be improved towards this aim. To fulfil the objective of this thesis, the work is divided into two parts. In the first part of the thesis, exhaustive experimentation was carried out to determine ways in which the system could be improved. From this exhaustive analysis, three main weaknesses are pointed out: a) the problem-dependency of parameters in BioHEL's fitness function, which makes the system difficult to set up and requires extensive preliminary experimentation to determine adequate values for these parameters; b) the execution time of the learning process, which at the moment does not use any parallelisation techniques and depends on the size of the training sets; and c) the lack of global supervision over the generated solutions, which comes from the usage of the Iterative Rule Learning paradigm and produces larger rule sets in which there is no guarantee of minimality or maximal generality. The second part of the thesis is focused on tackling each of the above-mentioned weaknesses to obtain a system capable of handling larger domains. First, a heuristic approach to set parameters within BioHEL's fitness function is developed. Second, a new parallel evaluation process that runs on General-Purpose Graphics Processing Units is developed. Finally, post-processing operators to tackle the generality and cardinality of the generated solutions are proposed. By means of these enhancements, we managed to improve the BioHEL system to reduce both the learning and the preliminary experimentation time, increase the generality of the final solutions and make the system more accessible for end-users. Moreover, as the techniques discussed in this thesis can be easily extended to other Evolutionary Learning systems, we consider them important additions to the research in this field towards tackling large-scale domains.