    Automating biomedical data science through tree-based pipeline optimization

    Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design. Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceedings

    EIGEN: Ecologically-Inspired GENetic Approach for Neural Network Structure Searching from Scratch

    Designing the structure of neural networks is considered one of the most challenging tasks in deep learning, especially when there is little prior knowledge about the task domain. In this paper, we propose an Ecologically-Inspired GENetic (EIGEN) approach that uses the concepts of succession, extinction, mimicry, and gene duplication to search for neural network structures from scratch, starting from a poorly initialized simple network and imposing few constraints during the evolution, as we assume no prior knowledge about the task domain. Specifically, we first use primary succession to rapidly evolve a population of poorly initialized neural network structures into a more diverse population, followed by a secondary succession stage for fine-grained searching based on the networks from the primary succession. Extinction is applied in both stages to reduce computational cost. Mimicry is employed during the entire evolution process to help the inferior networks imitate the behavior of a superior network, and gene duplication is used to duplicate the learned blocks of novel structures; both help to find better network structures. Experimental results show that our proposed approach can achieve similar or better performance compared to the existing genetic approaches with dramatically reduced computation cost. For example, the network discovered by our approach on the CIFAR-100 dataset achieves 78.1% test accuracy in under 120 GPU hours, compared to 77.0% test accuracy in more than 65,536 GPU hours in [35]. Comment: CVPR 201

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task, depending on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future. Comment: 14 pages, 5 figures, submitted for review to JML

    On-line multiobjective automatic control system generation by evolutionary algorithms

    Evolutionary algorithms are applied to the on-line generation of servo-motor control systems. In this paper, the evolving population of controllers is evaluated at run-time via hardware in the loop, rather than on a simulated model. Disturbances are also introduced at run-time in order to produce robust performance. Multiobjective optimisation of both PI and Fuzzy Logic controllers is considered. Finally, an on-line implementation of Genetic Programming is presented, based around the Simulink standard blockset. The on-line designed controllers are shown to be robust to both system noise and external disturbances while still demonstrating excellent steady-state and dynamic characteristics.

    Ant colony optimisation and local search for bin-packing and cutting stock problems

    The Bin Packing Problem and the Cutting Stock Problem are two related classes of NP-hard combinatorial optimization problems. Exact solution methods can only be used for very small instances, so for real-world problems, we have to rely on heuristic methods. In recent years, researchers have started to apply evolutionary approaches to these problems, including Genetic Algorithms and Evolutionary Programming. In the work presented here, we used an ant colony optimization (ACO) approach to solve both Bin Packing and Cutting Stock Problems. We present a pure ACO approach, as well as an ACO approach augmented with a simple but very effective local search algorithm. It is shown that the pure ACO approach can compete with existing evolutionary methods, whereas the hybrid approach can outperform the best-known hybrid evolutionary solution methods for certain problem classes. The hybrid ACO approach is also shown to require different parameter values from the pure ACO approach and to give a more robust performance across different problems with a single set of parameter values. The local search algorithm is also run with random restarts and shown to perform significantly worse than when combined with ACO.
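
    To make the idea of a "pure ACO approach" to bin packing more concrete, here is a minimal Python sketch. It is not the authors' implementation: the pheromone encoding (a trail value per pair of item sizes sharing a bin), the squared-fill fitness, and all names and parameter values (`aco_bin_packing`, `ants`, `rho`, `beta`, and so on) are assumptions chosen for illustration, and the local search step mentioned in the abstract is left out.

```python
import random
from collections import defaultdict
from itertools import combinations


def fitness(bins, capacity):
    """Assumed fitness: mean squared fill ratio, so fuller bins score higher."""
    return sum((sum(b) / capacity) ** 2 for b in bins) / len(bins)


def build_solution(items, capacity, tau, beta, rng):
    """One ant packs every item, filling one bin at a time."""
    remaining = list(items)
    bins = []
    while remaining:
        current, free = [], capacity
        while True:
            # Distinct item sizes that still fit into the open bin.
            fits = sorted({s for s in remaining if s <= free})
            if not fits:
                break
            weights = []
            for s in fits:
                # Trail: average pheromone between s and the sizes already in the bin.
                trail = (sum(tau[frozenset((s, t))] for t in current) / len(current)
                         if current else 1.0)
                # Heuristic: prefer larger items (size raised to beta).
                weights.append(trail * s ** beta)
            choice = rng.choices(fits, weights=weights)[0]
            current.append(choice)
            remaining.remove(choice)
            free -= choice
        bins.append(current)
    return bins


def aco_bin_packing(items, capacity, ants=10, iterations=30,
                    rho=0.9, beta=2.0, seed=0):
    """Return the best packing found as a list of bins (lists of item sizes)."""
    if any(s <= 0 or s > capacity for s in items):
        raise ValueError("every item size must be in (0, capacity]")
    if not items:
        return []
    rng = random.Random(seed)
    tau = defaultdict(lambda: 1.0)  # pheromone per unordered pair of sizes
    best, best_fit = None, -1.0
    for _ in range(iterations):
        for _ in range(ants):
            bins = build_solution(items, capacity, tau, beta, rng)
            fit = fitness(bins, capacity)
            if fit > best_fit:
                best, best_fit = bins, fit
        # Evaporate all trails, then reinforce pairs that share a bin in the best packing.
        for key in tau:
            tau[key] *= rho
        for b in best:
            for s, t in combinations(b, 2):
                tau[frozenset((s, t))] += best_fit
    return best


items = [7, 6, 5, 4, 4, 3, 3, 2, 2, 2, 1, 1]
packing = aco_bin_packing(items, capacity=10)
print(len(packing), packing)
```

    The item sizes sum to 40, so with a capacity of 10 no packing can use fewer than 4 bins. Being a stochastic heuristic, the sketch is not guaranteed to reach that bound.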