Automating biomedical data science through tree-based pipeline optimization
Over the past decade, data science and machine learning have grown from a
mysterious art form to a staple tool across a variety of fields in academia,
business, and government. In this paper, we introduce the concept of tree-based
pipeline optimization for automating one of the most tedious parts of machine
learning---pipeline design. We implement a Tree-based Pipeline Optimization
Tool (TPOT) and demonstrate its effectiveness on a series of simulated and
real-world genetic data sets. In particular, we show that TPOT can build
machine learning pipelines that achieve competitive classification accuracy and
discover novel pipeline operators---such as synthetic feature
constructors---that significantly improve classification accuracy on these data
sets. We also highlight the current challenges to pipeline optimization, such
as the tendency to produce pipelines that overfit the data, and suggest future
research paths to overcome these challenges. As such, this work represents an
early step toward fully automating machine learning pipeline design.
Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding
EIGEN: Ecologically-Inspired GENetic Approach for Neural Network Structure Searching from Scratch
Designing the structure of neural networks is considered one of the most
challenging tasks in deep learning, especially when there is little prior
knowledge about the task domain. In this paper, we propose an
Ecologically-Inspired GENetic (EIGEN) approach that uses the concepts of
succession, extinction, mimicry, and gene duplication to search for neural
network structures from scratch, starting from a poorly initialized simple
network and imposing few constraints during evolution, as we assume no prior knowledge about
the task domain. Specifically, we first use primary succession to rapidly
evolve a population of poorly initialized neural network structures into a more
diverse population, followed by a secondary succession stage for fine-grained
searching based on the networks from the primary succession. Extinction is
applied in both stages to reduce computational cost. Mimicry is employed during
the entire evolution process to help the inferior networks imitate the behavior
of a superior network and gene duplication is utilized to duplicate the learned
blocks of novel structures, both of which help to find better network
structures. Experimental results show that our proposed approach can achieve
similar or better performance compared to the existing genetic approaches with
dramatically reduced computation cost. For example, the network discovered by
our approach on the CIFAR-100 dataset achieves 78.1% test accuracy under 120 GPU
hours, compared to 77.0% test accuracy in more than 65,536 GPU hours in [35].
Comment: CVPR 201
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
The selection, development, or comparison of machine learning methods in data
mining can be a difficult task based on the target problem and goals of a
particular study. Numerous publicly available real-world and simulated
benchmark datasets have emerged from different sources, but their organization
and adoption as standards have been inconsistent. As such, selecting and
curating specific benchmarks remains an unnecessary burden on machine learning
practitioners and data scientists. The present study introduces an accessible,
curated, and developing public benchmark resource to facilitate identification
of the strengths and weaknesses of different machine learning methodologies. We
compare meta-features among the current set of benchmark datasets in this
resource to characterize the diversity of available data. Finally, we apply a
number of established machine learning methods to the entire benchmark suite
and analyze how datasets and algorithms cluster in terms of performance. This
work is an important first step towards understanding the limitations of
popular benchmarking suites and developing a resource that connects existing
benchmarking standards to more diverse and efficient standards in the future.
Comment: 14 pages, 5 figures, submitted for review to JML
On-line multiobjective automatic control system generation by evolutionary algorithms
Evolutionary algorithms are applied to the on-line generation of servo-motor control systems. In this paper, the evolving population of controllers is evaluated at run-time via hardware in the loop, rather than on a simulated model. Disturbances are also introduced at run-time in order to produce robust performance. Multiobjective optimisation of both PI and Fuzzy Logic controllers is considered. Finally, an on-line implementation of Genetic Programming is presented based around the Simulink standard blockset. The on-line designed controllers are shown to be robust to both system noise and external disturbances while still demonstrating excellent steady-state and dynamic characteristics.
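The multiobjective selection mentioned in this abstract can be illustrated with a Pareto-dominance filter. This is a minimal sketch, not the authors' method: the function names, the two objectives (settling time and overshoot, both minimised), and the numeric values are hypothetical and chosen only for illustration.

```python
def dominates(a, b):
    """Return True if objective vector a dominates b (all objectives minimised).

    a dominates b when it is no worse on every objective and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (settling_time_s, overshoot_fraction) scores for five
# candidate controllers; made-up numbers, not measured results.
candidates = [(1.0, 0.30), (1.5, 0.10), (2.0, 0.10), (1.2, 0.35), (3.0, 0.05)]
front = pareto_front(candidates)
print(front)
```

An evolutionary algorithm of this kind would typically prefer members of the non-dominated set when choosing parents, rather than collapsing the objectives into a single weighted score.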
Ant colony optimisation and local search for bin-packing and cutting stock problems
The Bin Packing Problem and the Cutting Stock Problem are two related classes of NP-hard combinatorial optimization problems. Exact solution methods can only be used for very small instances, so for real-world problems, we have to rely on heuristic methods. In recent years, researchers have started to apply evolutionary approaches to these problems, including Genetic Algorithms and Evolutionary Programming. In the work presented here, we used an ant colony optimization (ACO) approach to solve both Bin Packing and Cutting Stock Problems. We present a pure ACO approach, as well as an ACO approach augmented with a simple but very effective local search algorithm. It is shown that the pure ACO approach can compete with existing evolutionary methods, whereas the hybrid approach can outperform the best-known hybrid evolutionary solution methods for certain problem classes. The hybrid ACO approach is also shown to require different parameter values from the pure ACO approach and to give a more robust performance across different problems with a single set of parameter values. The local search algorithm is also run with random restarts and shown to perform significantly worse than when combined with ACO.
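A pure ACO loop for bin packing can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation described in the abstract: the pheromone encoding (one trail value per pair of item sizes sharing a bin), the fitness function (mean squared bin fill), the parameter names and defaults, and the function name `aco_bin_packing` are all assumptions, and the local search step of the hybrid approach is omitted.

```python
import math
import random


def aco_bin_packing(sizes, capacity, n_ants=10, n_iters=30, beta=2.0, rho=0.9, seed=0):
    """Pack items into bins with a simple ant colony optimisation loop."""
    if not sizes or min(sizes) <= 0 or max(sizes) > capacity:
        raise ValueError("sizes must be non-empty, positive, and no larger than capacity")
    rng = random.Random(seed)
    distinct = sorted(set(sizes))
    # Assumed encoding: tau[(a, b)] is the trail for sizes a <= b sharing a bin.
    tau = {(a, b): 1.0 for a in distinct for b in distinct if a <= b}

    def pheromone(a, b):
        return tau[(a, b) if a <= b else (b, a)]

    def fitness(bins):
        # Assumed fitness: mean squared fill ratio, which rewards full bins.
        return sum((sum(b) / capacity) ** 2 for b in bins) / len(bins)

    def build_solution():
        remaining = list(sizes)
        bins = []
        while remaining:
            current, free = [], capacity
            while True:
                candidates = sorted({s for s in remaining if s <= free})
                if not candidates:
                    break
                weights = []
                for s in candidates:
                    # Average trail between s and the items already in this bin.
                    trail = (sum(pheromone(s, t) for t in current) / len(current)
                             if current else 1.0)
                    weights.append(trail * s ** beta)  # heuristic favours large items
                chosen = rng.choices(candidates, weights=weights)[0]
                current.append(chosen)
                remaining.remove(chosen)
                free -= chosen
            bins.append(current)
        return bins

    best = None
    for _ in range(n_iters):
        solutions = [build_solution() for _ in range(n_ants)]
        iteration_best = max(solutions, key=fitness)
        if best is None or fitness(iteration_best) > fitness(best):
            best = iteration_best
        for key in tau:  # evaporation
            tau[key] *= rho
        reward = fitness(best)
        for b in best:  # reinforce size pairs that share a bin in the best packing
            for i, a in enumerate(b):
                for c in b[i + 1:]:
                    tau[(a, c) if a <= c else (c, a)] += reward
    return best


sizes = [4, 8, 1, 4, 2, 1, 7, 3, 6, 5]
packing = aco_bin_packing(sizes, capacity=10)
lower_bound = math.ceil(sum(sizes) / 10)
print(len(packing), "bins used; lower bound is", lower_bound)
```

The fixed seed makes a run repeatable. A hybrid variant in the spirit of the abstract would apply a local search to each ant's packing before the pheromone update.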