A Preliminary Study on the Use of Fuzzy Rough Set Based Feature Selection for Improving Evolutionary Instance Selection Algorithms

Author: Derrac Joaquín
Herrera Triguero Francisco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

In recent years, the increasing interest in fuzzy rough set theory has allowed the definition of novel accurate methods for feature selection. Although their stand-alone application can lead to the construction of high quality classifiers, they can be improved even more if other preprocessing techniques, such as instance selection, are considered. With the aim of enhancing the nearest neighbor classifier, we present a hybrid algorithm for instance and feature selection, where evolutionary search in the instances' space is combined with a fuzzy rough set based feature selection procedure. The preliminary results, contrasted through nonparametric statistical tests, suggest that our proposal can greatly improve the performance of the preprocessing techniques in isolation. Project TIN2008-06681-C06-01; Spanish Ministry of Education; Research Foundation - Flander
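As a rough illustration of what joint instance and feature selection means for a nearest neighbor classifier, the sketch below scores one candidate pair of binary masks with a 1-NN rule. It does not reproduce the paper's evolutionary search or its fuzzy rough set procedure; the function names, the masks, and the toy data are all assumptions made for this example.

```python
from typing import Sequence


def one_nn_predict(query: Sequence[float],
                   prototypes: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   feature_mask: Sequence[int]) -> int:
    """Return the label of the closest prototype, using only the kept features."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance restricted to features whose mask bit is 1.
        return sum((x - y) ** 2 for x, y, keep in zip(a, b, feature_mask) if keep)

    best = min(range(len(prototypes)), key=lambda i: dist(query, prototypes[i]))
    return labels[best]


def masked_accuracy(X, y, instance_mask, feature_mask, X_test, y_test) -> float:
    """Accuracy of 1-NN when only the selected instances and features are used."""
    protos = [x for x, keep in zip(X, instance_mask) if keep]
    labels = [l for l, keep in zip(y, instance_mask) if keep]
    if not protos:
        return 0.0  # an empty training subset cannot classify anything
    hits = sum(one_nn_predict(q, protos, labels, feature_mask) == t
               for q, t in zip(X_test, y_test))
    return hits / len(X_test)


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 9.0], [0.1, 0.0], [1.0, 0.2], [0.9, 8.5]]
y = [0, 0, 1, 1]
X_test = [[0.05, 8.0], [0.95, 0.5]]
y_test = [0, 1]

acc_all = masked_accuracy(X, y, [1, 1, 1, 1], [1, 1], X_test, y_test)
acc_selected = masked_accuracy(X, y, [1, 0, 1, 0], [1, 0], X_test, y_test)
```

In a hybrid scheme of the kind the abstract describes, a score like `masked_accuracy` could serve as part of the fitness that a search procedure tries to improve; how the paper actually defines its fitness is not stated in the abstract.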

Repositorio Institucional Universidad de Granada

Automating biomedical data science through tree-based pipeline optimization

Author: Andrews Peter C.
Kidd La Creis
Lavender Nicole A.
Moore Jason H.
Olson Randal S.
Urbanowicz Ryan J.
Publication date: 27/01/2016

Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design. Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding

arXiv.org e-Print Archive

Scipedia

Ensemble Learning for Free with Evolutionary Algorithms?

Author: Gagné Christian
Schoenauer Marc
Sebag Michèle
Tomassini Marco
Publication date: 01/01/2007

Evolutionary Learning proceeds by evolving a population of classifiers, from which it generally returns (with some notable exceptions) the single best-of-run classifier as final result. Meanwhile, Ensemble Learning, one of the most efficient approaches in supervised Machine Learning for the last decade, proceeds by building a population of diverse classifiers. Ensemble Learning with Evolutionary Computation thus receives increasing attention. The Evolutionary Ensemble Learning (EEL) approach presented in this paper features two contributions. First, a new fitness function, inspired by co-evolution and enforcing the classifier diversity, is presented. Further, a new selection criterion based on the classification margin is proposed. This criterion is used to extract the classifier ensemble from the final population only (Off-line) or incrementally along evolution (On-line). Experiments on a set of benchmark problems show that Off-line outperforms single-hypothesis evolutionary learning and state-of-the-art Boosting and generates smaller classifier ensembles.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Polytechnique

Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Author: Banzhaf W.
Bergstra J.
Feurer M.
Hastie T. J.
Snoek J.
Urbanowicz R. J.
Publication date: 19/03/2016

As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design. Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comment
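The Pareto optimization mentioned in the abstract trades accuracy against pipeline complexity. The sketch below is a generic non-dominated filter over two objectives, not TPOT's own code; the `Candidate` type, the use of operator count as the complexity measure, and the sample values are assumptions for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str          # hypothetical pipeline label
    accuracy: float    # objective 1: higher is better
    n_operators: int   # objective 2: lower is better (assumed complexity measure)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_operators <= b.n_operators
    strictly_better = a.accuracy > b.accuracy or a.n_operators < b.n_operators
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates (input order preserved)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up candidates, not results from the paper.
candidates = [
    Candidate("A", 0.90, 5),
    Candidate("B", 0.90, 2),
    Candidate("C", 0.85, 1),
    Candidate("D", 0.80, 3),
    Candidate("E", 0.93, 7),
]
front = pareto_front(candidates)
```

Here pipeline A drops out because B reaches the same accuracy with fewer operators, which is the sense in which a Pareto criterion favours compact pipelines without giving up accuracy.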

arXiv.org e-Print Archive

Crossref

Scipedia

A Self-adaptive Multipeak Artificial Immune Genetic Algorithm

Author: Jiang Fei
Li Qingzhao
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/06/2016

The genetic algorithm is a global probabilistic search algorithm developed by simulating biological natural selection and genetic evolution. It has excellent global search ability; in practical applications, however, it is prone to premature convergence. This paper proposes a self-adaptive multi-peak immune genetic algorithm (SMIGA) that integrates ideas from the biological immune system into the evolutionary process of the genetic algorithm. It uses self-adaptive dynamic vaccination and provides a downtime criterion, a selection strategy for the immune vaccine, and a construction method for immune operators, so as to steer the population toward the optimum and suppress degeneracy during optimization by using feature information in a selective and purposive manner. Simulation experiments show that the proposed method can better solve the optimization problem of multi-peak functions, achieve global optimum search, overcome the prematurity problem of the antibody population, and improve the effectiveness and robustness of optimization.
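For readers unfamiliar with the baseline that the abstract sets out to improve, here is a minimal sketch of a plain real-coded genetic algorithm on a multi-peak function. It is not SMIGA: there is no vaccination or immune operator, and the objective function, the parameter values, and the operator choices (tournament selection, arithmetic crossover, Gaussian mutation, one elite) are all assumptions for this example.

```python
import math
import random


def multipeak(x: float) -> float:
    """Illustrative multi-peak objective on [0, 1]; not taken from the paper."""
    return math.sin(5 * math.pi * x) ** 2 * math.exp(-2 * x)


def run_ga(pop_size: int = 30, generations: int = 40,
           mutation_sd: float = 0.05, seed: int = 0):
    """Maximise multipeak() and return (best x, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=multipeak, reverse=True)
        history.append(multipeak(pop[0]))
        children = [pop[0]]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(pop, 3), key=multipeak)
            p2 = max(rng.sample(pop, 3), key=multipeak)
            # Arithmetic crossover, then Gaussian mutation clipped to [0, 1].
            w = rng.random()
            child = w * p1 + (1 - w) * p2 + rng.gauss(0, mutation_sd)
            children.append(min(1.0, max(0.0, child)))
        pop = children
    best = max(pop, key=multipeak)
    history.append(multipeak(best))
    return best, history


best_x, best_history = run_ga()
```

Because the population tends to crowd around whichever peak it finds first, a loop like this can stall on a local optimum, which is the premature convergence the abstract refers to.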

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

Semantic variation operators for multidimensional genetic programming

Author: Cava William La
Fine Steven B.
James Gareth
McConaghy Trent
Muñoz Luis
Pedregosa Fabian
Silva Sara
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/04/2019

Multidimensional genetic programming represents candidate solutions as sets of programs, and thereby provides an interesting framework for exploiting building block identification. Towards this goal, we investigate the use of machine learning as a way to bias which components of programs are promoted, and propose two semantic operators to choose where useful building blocks are placed during crossover. A forward stagewise crossover operator we propose leads to significant improvements on a set of regression problems, and produces state-of-the-art results in a large benchmark study. We discuss this architecture and others in terms of their propensity for allowing heuristic search to utilize information during the evolutionary process. Finally, we look at the collinearity and complexity of the data representations that result from these architectures, with a view towards disentangling factors of variation in application. Comment: 9 pages, 8 figures, GECCO 201

arXiv.org e-Print Archive

Crossref

Temporal Feature Selection with Symbolic Regression

Author: Fusting Christopher Winter
Publication venue: UVM ScholarWorks
Publication date: 01/01/2017

Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find symbolic regression with the Range Terminal outperforms standard symbolic regression and Lasso regression. On the Arctic data set we find it outperforms standard symbolic regression, fails to beat the Lasso regression, but finds useful features describing the interaction between Land Surface Temperature, Snow, and seasonal vegetative growth in the Arctic.
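One way to picture a terminal that aggregates a variable over time is as a leaf node parameterised by a time window and an aggregation function. The sketch below is only a guess at that idea: the `range_terminal` function, its half-open window convention, and the sample series are hypothetical and not taken from the thesis.

```python
from statistics import mean
from typing import Callable, Sequence


def range_terminal(series: Sequence[float], start: int, end: int,
                   agg: Callable[[Sequence[float]], float] = mean) -> float:
    """Collapse series[start:end] into one value usable as an expression leaf."""
    window = series[start:end]
    if not window:
        raise ValueError("empty time window")
    return agg(window)


# Made-up time series, one value per time step.
temperature = [-5.0, -3.0, 0.0, 4.0, 9.0, 12.0]
snow = [1.0, 1.0, 0.5, 0.0, 0.0, 0.0]

# A candidate feature built from two such leaves:
# mean temperature over steps 2..4 times peak snow cover over steps 0..2.
feature = range_terminal(temperature, 2, 5) * range_terminal(snow, 0, 3, agg=max)
```

In a symbolic regression run, the window bounds and the aggregation function would presumably be among the things the search is free to vary, so that expressions can combine summaries of different periods.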

ScholarWorks @ UVM

A generic optimising feature extraction method using multiobjective genetic programming

Author: Rockett P.I
Zhang Y.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

In this paper, we present a generic, optimising feature extraction method using multiobjective genetic programming. We re-examine the feature extraction problem and show that effective feature extraction can significantly enhance the performance of pattern recognition systems with simple classifiers. A framework is presented to evolve optimised feature extractors that transform an input pattern space into a decision space in which maximal class separability is obtained. We have applied this method to real world datasets from the UCI Machine Learning and StatLog databases to verify our approach and compare our proposed method with other reported results. We conclude that our algorithm is able to produce classifiers of superior (or equivalent) performance to the conventional classifiers examined, suggesting removal of the need to exhaustively evaluate a large family of conventional classifiers on any new problem. (C) 2010 Elsevier B.V. All rights reserved

White Rose Research Online

Mother nature's tolerant ways: why non-genetic inheritance has nothing to do with evolution

Author: Dickins BJA
Dickins TE
Publication venue: 'Elsevier BV'
Publication date: 16/05/2007

Recently a number of theorists have suggested that evolution can use non-genetic or environmental inheritance to pass on adaptations (e.g. Mameli, 2004). Furthermore, it has been suggested that non-genetic, or environmental, factors can play a central role in the process of evolution that is not captured by the neo-Darwinian view, which places natural selection centre-stage (e.g. Odling-Smee, Laland & Feldman, 2003). In this paper we present and clarify neo-Darwinian theory and then take issue with the notions of contemporary gene-centred selection and inheritance that non-genetic inheritance theorists have used. We claim that they have misunderstood the distinction and relationship between intrinsic and extrinsic inheritance and we clarify this with a number of examples from the behavioural and biological sciences. According to this analysis there is no such thing as biologically independent non-genetic inheritance: all extrinsic inheritance is a consequence of traits and dispositions that are intrinsic to an organism, and intrinsic design can only be explained through neo-Darwinism. We point to the implications this view has for current conceptions of cultural evolution.

Nottingham Trent Institutional Repository (IRep)