
    Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

    Clustering is a difficult and widely-studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g. Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally pre-defined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this paper, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics. Comment: 29 pages, accepted by Evolutionary Computation (Journal), MIT Pres
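The idea of clustering with a dataset-specific similarity function over a graph can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `evolved_similarity` is a hand-written stand-in for a function that genetic programming might produce (it uses only features 0 and 2), and thresholding the similarity graph and taking connected components is one simple graph-based scheme, not necessarily the one used in the paper.

```python
from itertools import combinations


def evolved_similarity(a, b):
    # Hypothetical stand-in for an evolved function: it selects features
    # 0 and 2 and combines their absolute differences into a similarity.
    return 1.0 / (1.0 + abs(a[0] - b[0]) + abs(a[2] - b[2]))


def cluster_by_similarity(points, similarity, threshold):
    """Link every pair whose similarity reaches the threshold and return
    the connected components of that graph as lists of point indices."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if similarity(points[i], points[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


points = [(0.0, 9.0, 0.0), (0.1, -5.0, 0.1), (5.0, 9.0, 5.0), (5.1, 0.0, 5.2)]
clusters = cluster_by_similarity(points, evolved_similarity, threshold=0.5)
```

Because the stand-in function ignores feature 1, the large differences in that feature do not affect which points are grouped together, which is the kind of tailoring the abstract attributes to feature selection.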

    Multi-region System Modelling by using Genetic Programming to Extract Rule Consequent Functions in a TSK Fuzzy System

    This paper aims to build a fuzzy system by means of genetic programming, which is used to extract the relevant function for each rule consequent through symbolic regression. The employed TSK fuzzy system is complemented with a variational Bayesian Gaussian mixture clustering method, which identifies the domain partition, simultaneously specifying the number of rules as well as the parameters of the fuzzy sets. The genetic programming approach is accompanied by an orthogonal least squares algorithm to extract robust rule consequent functions for the fuzzy system. The proposed model is validated with a synthetic surface, and then with real data from a gas turbine compressor map case, where it is compared with an adaptive neuro-fuzzy inference system model. The results demonstrate the efficacy of the proposed approach for modelling systems with small data or bifurcating dynamics, where the analytical equations are not available, such as those in a typical industrial setting. Research supported by EPSRC Grant EVES (EP/R029741/1). Zhang, Y.; Martínez-García, M.; Serrano, J.; Latimer, A. (2019). Multi-region System Modelling by using Genetic Programming to Extract Rule Consequent Functions in a TSK Fuzzy System. IEEE. 987-992. https://doi.org/10.1109/ICARM.2019.8834163
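As a rough illustration of how a TSK fuzzy system combines rule consequent functions, the sketch below computes the standard firing-strength-weighted average of the consequents for a single input. The Gaussian membership parameters and the two consequent functions are made-up values; in the paper the partition comes from variational Bayesian Gaussian mixture clustering and the consequents from genetic programming, neither of which is reproduced here.

```python
import math


def gaussian_membership(x, centre, sigma):
    # Degree to which x belongs to a fuzzy set with a Gaussian shape.
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def tsk_predict(x, rules):
    """TSK output: average of the rule consequents f(x), weighted by each
    rule's firing strength. Each rule is a (centre, sigma, f) tuple."""
    weights = [gaussian_membership(x, centre, sigma) for centre, sigma, _ in rules]
    total = sum(weights)
    # Note: total can underflow to 0.0 for inputs far from every centre;
    # this sketch does not guard against that case.
    return sum(w * f(x) for w, (_, _, f) in zip(weights, rules)) / total


# Hypothetical two-region model: a linear consequent near x = 0 and a
# quadratic one near x = 5 (illustrative stand-ins for evolved functions).
rules = [
    (0.0, 1.0, lambda x: 2.0 * x + 1.0),
    (5.0, 1.0, lambda x: x ** 2),
]
y_near_first_region = tsk_predict(0.0, rules)
y_near_second_region = tsk_predict(5.0, rules)
```

Near each region centre the output is dominated by that region's consequent, and between centres the two functions are blended smoothly.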

    Search based software engineering: Trends, techniques and applications

    © ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available from the link below. In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semiautomated solutions in situations typified by large complex problem spaces with multiple competing and conflicting objectives. This article provides a review and classification of literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied and highlights gaps in the literature and avenues for further research. EPSRC and E

    Regulatory motif discovery using a population clustering evolutionary algorithm

    This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
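For readers unfamiliar with position frequency matrix models, the following minimal sketch builds one from a few aligned binding sites and scans a sequence for its best-matching window. The example sites, the promoter string, and the product-of-frequencies score are illustrative assumptions; the sketch does not implement the paper's evolutionary search or its clustering of the population.

```python
def position_frequency_matrix(sites):
    """Count how often each base occurs at each position of the aligned,
    equal-length sites (assumed to contain only uppercase A, C, G, T)."""
    pfm = [{base: 0 for base in "ACGT"} for _ in range(len(sites[0]))]
    for site in sites:
        for pos, base in enumerate(site):
            pfm[pos][base] += 1
    return pfm


def best_match(pfm, sequence):
    """Slide the matrix along the sequence and return (start, score) for
    the first window with the highest product of per-position frequencies."""
    width = len(pfm)
    n_sites = sum(pfm[0].values())
    best_start, best_score = -1, -1.0
    for start in range(len(sequence) - width + 1):
        score = 1.0
        for pos in range(width):
            score *= pfm[pos][sequence[start + pos]] / n_sites
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical aligned sites resembling a TATA-like motif.
sites = ["TATA", "TATA", "TACA", "TATT"]
pfm = position_frequency_matrix(sites)
start, score = best_match(pfm, "GGTATAGG")
```

Practical motif scanners usually add pseudocounts and score against a background model so that a single unseen base does not force a window's score to zero; those refinements are omitted here.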