4,626 research outputs found

    Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

    Full text link
    Clustering is a difficult and widely-studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g. Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally pre-defined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this paper, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.Comment: 29 pages, accepted by Evolutionary Computation (Journal), MIT Pres

    A System for Accessible Artificial Intelligence

    Full text link
    While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers nor the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.Comment: 14 pages, 5 figures, submitted to Genetic Programming Theory and Practice 2017 worksho

    Application of multiobjective genetic programming to the design of robot failure recognition systems

    Get PDF
    We present an evolutionary approach using multiobjective genetic programming (MOGP) to derive optimal feature extraction preprocessing stages for robot failure detection. This data-driven machine learning method is compared both with conventional (nonevolutionary) classifiers and a set of domain-dependent feature extraction methods. We conclude MOGP is an effective and practical design method for failure recognition systems with enhanced recognition accuracy over conventional classifiers, independent of domain knowledge

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    Full text link
    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.Comment: 14 pages, 5 figures, submitted for review to JML

    Generating Redundant Features with Unsupervised Multi-Tree Genetic Programming

    Full text link
    Recently, feature selection has become an increasingly important area of research due to the surge in high-dimensional datasets in all areas of modern life. A plethora of feature selection algorithms have been proposed, but it is difficult to truly analyse the quality of a given algorithm. Ideally, an algorithm would be evaluated by measuring how well it removes known bad features. Acquiring datasets with such features is inherently difficult, and so a common technique is to add synthetic bad features to an existing dataset. While adding noisy features is an easy task, it is very difficult to automatically add complex, redundant features. This work proposes one of the first approaches to generating redundant features, using a novel genetic programming approach. Initial experiments show that our proposed method can automatically create difficult, redundant features which have the potential to be used for creating high-quality feature selection benchmark datasets.Comment: 16 pages, preprint for EuroGP '1

    Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges

    Get PDF
    A variety of methods have been applied to the architectural configuration and learning or training of artificial deep neural networks (DNN). These methods play a crucial role in the success or failure of the DNN for most problems and applications. Evolutionary Algorithms (EAs) are gaining momentum as a computationally feasible method for the automated optimisation and training of DNNs. Neuroevolution is a term which describes these processes of automated configuration and training of DNNs using EAs. While many works exist in the literature, no comprehensive surveys currently exist focusing exclusively on the strengths and limitations of using neuroevolution approaches in DNNs. Prolonged absence of such surveys can lead to a disjointed and fragmented field preventing DNNs researchers potentially adopting neuroevolutionary methods in their own research, resulting in lost opportunities for improving performance and wider application within real-world deep learning problems. This paper presents a comprehensive survey, discussion and evaluation of the state-of-the-art works on using EAs for architectural configuration and training of DNNs. Based on this survey, the paper highlights the most pertinent current issues and challenges in neuroevolution and identifies multiple promising future research directions.Comment: 20 pages (double column), 2 figures, 3 tables, 157 reference

    Optimizing Neural Architecture Search using Limited GPU Time in a Dynamic Search Space: A Gene Expression Programming Approach

    Full text link
    Efficient identification of people and objects, segmentation of regions of interest and extraction of relevant data in images, texts, audios and videos are evolving considerably in these past years, which deep learning methods, combined with recent improvements in computational resources, contributed greatly for this achievement. Although its outstanding potential, development of efficient architectures and modules requires expert knowledge and amount of resource time available. In this paper, we propose an evolutionary-based neural architecture search approach for efficient discovery of convolutional models in a dynamic search space, within only 24 GPU hours. With its efficient search environment and phenotype representation, Gene Expression Programming is adapted for network's cell generation. Despite having limited GPU resource time and broad search space, our proposal achieved similar state-of-the-art to manually-designed convolutional networks and also NAS-generated ones, even beating similar constrained evolutionary-based NAS works. The best cells in different runs achieved stable results, with a mean error of 2.82% in CIFAR-10 dataset (which the best model achieved an error of 2.67%) and 18.83% for CIFAR-100 (best model with 18.16%). For ImageNet in the mobile setting, our best model achieved top-1 and top-5 errors of 29.51% and 10.37%, respectively. Although evolutionary-based NAS works were reported to require a considerable amount of GPU time for architecture search, our approach obtained promising results in little time, encouraging further experiments in evolutionary-based NAS, for search and network representation improvements.Comment: Accepted for presentation at the IEEE Congress on Evolutionary Computation (IEEE CEC) 202

    A Hybrid GA-PSO Method for Evolving Architecture and Short Connections of Deep Convolutional Neural Networks

    Full text link
    Image classification is a difficult machine learning task, where Convolutional Neural Networks (CNNs) have been applied for over 20 years in order to solve the problem. In recent years, instead of the traditional way of only connecting the current layer with its next layer, shortcut connections have been proposed to connect the current layer with its forward layers apart from its next layer, which has been proved to be able to facilitate the training process of deep CNNs. However, there are various ways to build the shortcut connections, it is hard to manually design the best shortcut connections when solving a particular problem, especially given the design of the network architecture is already very challenging. In this paper, a hybrid evolutionary computation (EC) method is proposed to \textit{automatically} evolve both the architecture of deep CNNs and the shortcut connections. Three major contributions of this work are: Firstly, a new encoding strategy is proposed to encode a CNN, where the architecture and the shortcut connections are encoded separately; Secondly, a hybrid two-level EC method, which combines particle swarm optimisation and genetic algorithms, is developed to search for the optimal CNNs; Lastly, an adjustable learning rate is introduced for the fitness evaluations, which provides a better learning rate for the training process given a fixed number of epochs. The proposed algorithm is evaluated on three widely used benchmark datasets of image classification and compared with 12 peer Non-EC based competitors and one EC based competitor. The experimental results demonstrate that the proposed method outperforms all of the peer competitors in terms of classification accuracy

    A Genetic Programming Approach for Computer Vision: Classifying High-level Image Features from Convolutional Layers with an Evolutionary Algorithm

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data ScienceComputer Vision is a sub-field of Artificial Intelligence that provides a visual perception component to computers, mimicking human intelligence. One of its tasks is image classification and Convolutional Neural Networks (CNNs) have been the most implemented algorithm in the last few years, with few changes made to the fully-connected layer of those neural networks. Nonetheless, recent research has been showing their accuracy could be improved in certain cases by implementing other algorithms for the classification of high-level image features from convolutional layers. Thus, the main research question for this document is: To what extent does the substitution of the fully-connected layer in Convolutional Neural Networks for an evolutionary algorithm affect the performance of those CNN models? The proposed two-step approach in this study does the classification of high-level image features with a state-of-the-art GP-based algorithm for multiclass classification called M4GP. This is conducted using secondary data with different characteristics, to better benchmark the implementation and to carefully investigate different outcomes. Results indicate the new learning approach yielded similar performance in the dataset with a low number of output classes. However, none of the M4GP models was able to surpass the results of the fully-connected layers in terms of test accuracy. Even so, this might be an interesting route if one has a powerful computer and needs a very light classifier in terms of model size. The results help to understand in which situation it might be beneficial to perform a similar experimental setup, either in the context of a work project or concerning a novel research topic
    • …
    corecore