Can Genetic Programming Do Manifold Learning Too?
Exploratory data analysis is a fundamental aspect of knowledge discovery that
aims to find the main characteristics of a dataset. Dimensionality reduction,
such as manifold learning, is often used to reduce the number of features in a
dataset to a manageable level for human interpretation. Despite this, most
manifold learning techniques explain nothing about the original features or the
true characteristics of a dataset. In this paper, we propose a genetic
programming approach to manifold learning called GP-MaL, which evolves
functional mappings from a high-dimensional space to a lower-dimensional space
through the use of interpretable trees. We show that GP-MaL is competitive with
existing manifold learning algorithms while producing models that can be
interpreted and re-used on unseen data. Several promising directions for future
research are also identified.
Comment: 16 pages, accepted in EuroGP '1
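To make the idea of an interpretable mapping that can be re-used on unseen data more concrete, here is a minimal Python sketch. It assumes a hypothetical tuple-based tree encoding and hand-written (not evolved) trees, with one tree per output dimension built over the original features. The names `evaluate`, `embed` and `neighbour_preservation` are illustrative. The neighbour-preservation score is an assumed quality measure and is not necessarily the fitness function GP-MaL actually uses.

```python
import math
import operator

def protected_div(a, b):
    # Common GP convention: return 1.0 instead of dividing by (near) zero.
    return a / b if abs(b) > 1e-9 else 1.0

# Hypothetical primitive set; GP-MaL's actual function set may differ.
PRIMITIVES = {"add": operator.add, "sub": operator.sub,
              "mul": operator.mul, "div": protected_div}

def evaluate(tree, x):
    """('f', i) reads original feature i, ('c', v) is a constant,
    (op, left, right) applies a binary primitive."""
    kind = tree[0]
    if kind == "f":
        return x[tree[1]]
    if kind == "c":
        return tree[1]
    return PRIMITIVES[kind](evaluate(tree[1], x), evaluate(tree[2], x))

def embed(trees, X):
    """One tree per output dimension: maps every row of X to len(trees) values."""
    return [[evaluate(t, row) for t in trees] for row in X]

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    ranked = sorted((math.dist(points[i], p), j)
                    for j, p in enumerate(points) if j != i)
    return {j for _, j in ranked[:k]}

def neighbour_preservation(X, Z, k):
    """Assumed measure: mean fraction of each point's k nearest neighbours in X
    that remain among its k nearest neighbours in the embedding Z."""
    return sum(len(knn(X, i, k) & knn(Z, i, k)) / k
               for i in range(len(X))) / len(X)

# Two hand-written trees mapping 4-D rows to 2-D (illustrative only).
trees = [("add", ("f", 0), ("f", 1)),
         ("div", ("f", 2), ("f", 3))]
X_train = [[1.0, 2.0, 3.0, 1.0], [1.5, 2.0, 3.0, 2.0],
           [8.0, 9.0, 1.0, 4.0], [7.5, 9.5, 2.0, 4.0]]
Z_train = embed(trees, X_train)
score = neighbour_preservation(X_train, Z_train, k=1)
# The same trees are applied unchanged to an unseen row.
Z_new = embed(trees, [[2.0, 3.0, 6.0, 0.0]])
```

Because each output dimension is an explicit expression such as `x0 + x1`, a reader can see which original features drive each axis of the embedding. This is the interpretability property the abstract emphasises.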
Feature Selection via Chaotic Antlion Optimization
Selecting a subset of relevant properties from a large set of features that describe a dataset is a challenging machine learning task. In biology, for instance, advances in available technologies enable the generation of a very large number of biomarkers that describe the data. Choosing the most informative markers while also performing high-accuracy classification can be a daunting task, particularly if the data are high-dimensional. A commonly adopted approach is to formulate the feature selection problem as a biobjective optimization problem, with the aim of maximizing the performance of the data analysis model (the quality of the fit to the training data) while minimizing the number of features used.
This work was partially supported by the IPROCOM Marie Curie initial training network, funded through the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme FP7/2007-2013 under REA grant agreement No. 316555, and by the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number PN-II-PT-PCCA-2011-3.2-0917. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
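The biobjective formulation of feature selection can be illustrated with a generic Pareto-dominance check in Python. This sketch does not implement chaotic antlion optimization. It only shows how candidate feature subsets compare when each is scored as a pair (classification error, number of selected features), with both objectives minimized. The candidate scores are hypothetical and are not results from the paper.

```python
from typing import List, Tuple

# (classification error, number of selected features): both are minimised,
# which is equivalent to maximising accuracy while minimising subset size.
Score = Tuple[float, int]

def dominates(a: Score, b: Score) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep the non-dominated scores, in input order."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]

# Hypothetical scores for five candidate subsets (not results from the paper).
candidates = [(0.10, 12), (0.08, 20), (0.12, 5), (0.15, 12), (0.08, 25)]
front = pareto_front(candidates)
```

Here `(0.15, 12)` is dropped because `(0.10, 12)` is more accurate with the same number of features. `(0.08, 25)` is dropped because `(0.08, 20)` matches its error using fewer features. The remaining subsets represent genuine trade-offs between accuracy and subset size.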
GEML: A Grammatical Evolution, Machine Learning Approach to Multi-class Classification
In this paper, we propose a hybrid approach to solving multi-class problems that combines evolutionary computation with elements of traditional machine learning. The method, Grammatical Evolution Machine Learning (GEML), adapts machine learning concepts from decision tree learning and clustering methods and integrates them into a Grammatical Evolution framework. We investigate the effectiveness of GEML on several supervised, semi-supervised and unsupervised multi-class problems and demonstrate its competitive performance compared with several well-known machine learning algorithms. The GEML framework evolves human-readable solutions that explain the logic behind their classification decisions, offering a significant advantage over existing paradigms for unsupervised and semi-supervised learning. In addition, we examine whether the performance of the algorithm can be improved by applying several ensemble techniques.
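Grammatical Evolution produces human-readable programs by mapping an integer genome onto a grammar. The Python sketch below shows the standard genotype-to-phenotype mapping: the leftmost non-terminal is expanded using production `codon mod number_of_choices`, and the genome wraps a bounded number of times. The toy if-then-else grammar, `ge_map`, and the `max_wraps` parameter are hypothetical illustrations, not GEML's grammar. Skipping codon consumption when a rule has only one production is a common convention but an assumption here.

```python
from typing import Dict, List, Optional

# Toy grammar (hypothetical, not GEML's): a nested if-then-else rule list.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<rule>": [["if ", "<cond>", " then ", "<cls>", " else ", "<rule>"], ["<cls>"]],
    "<cond>": [["x[", "<idx>", "] ", "<op>", " ", "<const>"]],
    "<idx>": [["0"], ["1"], ["2"]],
    "<op>": [["<"], [">="]],
    "<const>": [["0.5"], ["1.0"], ["2.0"]],
    "<cls>": [["A"], ["B"], ["C"]],
}

def ge_map(genome: List[int], grammar=GRAMMAR, start="<rule>",
           max_wraps=2) -> Optional[str]:
    """Expand the leftmost non-terminal each step, choosing the production
    (codon mod number of choices); return None if codons run out."""
    pending = [start]
    out: List[str] = []
    used = 0
    budget = len(genome) * (max_wraps + 1)  # codons available including wraps
    while pending:
        symbol = pending.pop(0)
        if symbol not in grammar:
            out.append(symbol)  # terminal: emit as-is
            continue
        choices = grammar[symbol]
        if len(choices) == 1:
            production = choices[0]  # no decision, so no codon is consumed
        else:
            if used >= budget:
                return None  # mapping did not finish: invalid individual
            production = choices[genome[used % len(genome)] % len(choices)]
            used += 1
        pending = list(production) + pending
    return "".join(out)

phenotype = ge_map([0, 1, 4, 2, 1, 1])
```

With this genome, the last `<cls>` is reached only after the genome wraps, and the mapping yields the rule `if x[1] < 2.0 then B else A`. A genome such as `[0, 0]` keeps choosing the recursive production and exhausts its codon budget, so it maps to `None`.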
Evolving genetic programming classifiers with loop structures
Loop structures are a fundamental flow-control construct in programming languages for repeating operations. However, they are not widely used in Genetic Programming because they introduce extra complexity into the search
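The abstract is truncated and does not describe the paper's actual loop constructs. The Python sketch below is therefore only a hypothetical illustration of one way a loop primitive can sit inside a GP classifier tree. A `loop_sum` node iterates over a range of input features, and an `item` leaf reads the current element. The node names, the clamping of loop bounds, and the "non-negative output means class 1" decision rule are all assumptions. Clamping is one simple way to keep every loop bounded, which limits the extra search complexity the abstract mentions.

```python
import operator

BINARY = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run(node, x, item=0.0):
    """Evaluate a GP tree on feature vector x; `item` is the element bound
    by the innermost enclosing loop (0.0 outside any loop)."""
    kind = node[0]
    if kind == "x":
        return x[node[1] % len(x)]  # wrap so any evolved index is valid
    if kind == "const":
        return node[1]
    if kind == "item":
        return item
    if kind == "loop_sum":
        _, start, end, body = node
        # Clamp the range to the input, so a loop never runs past len(x) steps.
        lo, hi = max(0, start), min(len(x), end)
        return sum(run(body, x, x[i]) for i in range(lo, hi))
    return BINARY[kind](run(node[1], x, item), run(node[2], x, item))

def classify(tree, x):
    """Assumed decision rule: non-negative output means class 1."""
    return 1 if run(tree, x) >= 0 else 0

# Hypothetical tree: sum of squares of features 0..2, minus a threshold of 10.
tree = ("sub",
        ("loop_sum", 0, 3, ("mul", ("item",), ("item",))),
        ("const", 10.0))
```

A single `loop_sum` node expresses an aggregate over several features. Without loops, that aggregate would need a separately evolved subtree for each feature.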
Extracting image features for classification by two-tier genetic programming
Image classification is a complex but important task, especially in machine vision and image analysis areas such as remote sensing and face recognition. One of the challenges in image classification is finding an optimal set of features for a particular task, because the choice of features directly affects classification performance. However, the quality of a feature is highly problem-dependent, and domain knowledge is often required. To address these issues, we introduce a Genetic Programming (GP) based image classification method, Two-Tier GP, which operates directly on raw pixels rather than on features. The first tier of a classifier automatically defines features from the raw image input, while the second tier makes the decision. Compared with conventional feature-based image classification methods, Two-Tier GP achieved better accuracies on a range of different tasks. Furthermore, when using the features defined by the first tier of these Two-Tier GP classifiers, conventional classification methods obtained higher accuracies than when classifying on manually designed features. Analysis of the evolved Two-Tier image classifiers shows that the programs capture genuine features and that the mechanism by which they achieve high accuracy can be revealed. The Two-Tier GP method has clear advantages in image classification, such as high accuracy, good interpretability and the removal of an explicit feature extraction process.
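The two-tier structure can be illustrated with a hand-written (not evolved) program in Python. The abstract only states that the first tier defines features from raw pixels and the second tier makes the decision. The specific choices below are assumptions: tier-1 nodes computing the mean or standard deviation of a rectangular pixel region, a tier-2 arithmetic expression, and the rule "non-negative output means class A". The function names are hypothetical.

```python
import statistics

def region(image, top, left, height, width):
    """Raw pixel values inside a rectangle, clipped to the image bounds."""
    return [p for row in image[top:top + height] for p in row[left:left + width]]

def mean_of(image, top, left, height, width):
    values = region(image, top, left, height, width)
    return statistics.fmean(values) if values else 0.0

def std_of(image, top, left, height, width):
    values = region(image, top, left, height, width)
    return statistics.pstdev(values) if values else 0.0

def two_tier_program(image):
    # Tier 1 (feature tier): aggregation nodes applied to raw pixels.
    f1 = mean_of(image, 0, 0, 2, 4)  # mean of the top two rows
    f2 = mean_of(image, 2, 0, 2, 4)  # mean of the bottom two rows
    f3 = std_of(image, 0, 0, 4, 4)   # contrast of the whole 4x4 image
    # Tier 2 (decision tier): arithmetic over the tier-1 features.
    return (f1 - f2) + 0.1 * f3

def classify(image):
    """Assumed decision rule: non-negative output means class A."""
    return "class A" if two_tier_program(image) >= 0 else "class B"

# A 4x4 image that is bright at the top and dark at the bottom.
img = [[9, 9, 9, 9], [9, 9, 9, 9], [1, 1, 1, 1], [1, 1, 1, 1]]
```

Each tier-1 node names a concrete pixel region and statistic. This makes it possible to inspect which parts of the image an evolved classifier relies on, in line with the interpretability the abstract reports. The tier-1 outputs could also be passed to a conventional classifier as constructed features.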
Two-Tier genetic programming: towards raw pixel-based image classification
Classifying images is of great importance in machine vision and image analysis applications such as object recognition and face detection. Conventional methods build classifiers on certain types of image features instead of raw pixels, because the dimensionality of raw inputs is often too large. Determining an optimal set of features for a particular task is usually the focus of conventional image classification methods. In this study, we propose a Genetic Programming (GP) method in which raw images are fed directly as the classification inputs. It is named Two-Tier GP because every classifier it evolves has two tiers: one for computing features from the raw pixel input and the other for making decisions. Relevant features are expected to be self-constructed by GP during the evolutionary process. This method is compared with feature-based image classification by GP and with another GP method that also aims to extract image features automatically. Four different classification tasks are used in the comparison, and the results show that Two-Tier GP achieves the highest accuracies. Further analysis of the evolved solutions reveals that they formulate genuine features which can classify target images accurately.