78 research outputs found

    Surrogate-assisted Genetic Programming with Simplified Models for Automated Design of Dispatching Rules

    No full text
    © 2013 IEEE. Automated design of dispatching rules for production systems has been an interesting research topic over the last several years. Machine learning, especially genetic programming (GP), has been a powerful approach to dealing with this design problem. However, intensive computational requirements, accuracy and interpretability are still its limitations. This paper aims at developing a new surrogate-assisted GP to help improve the quality of the evolved rules without significant computational costs. The experiments have verified the effectiveness and efficiency of the proposed algorithms as compared to those in the literature. Furthermore, new simplification and visualisation approaches have also been developed to improve the interpretability of the evolved rules. These approaches have shown great potential and proved to be a critical part of the automated design system

    GPGC: Genetic programming for automatic clustering using a flexible non-hyper-spherical graph-based approach

    No full text
    © 2017 ACM. Genetic programming (GP) has been shown to be very effective for performing data mining tasks. Despite this, it has seen relatively little use in clustering. In this work, we introduce a new GP approach for performing graph-based (GPGC) non-hyper-spherical clustering where the number of clusters is not required to be set in advance. The proposed GPGC approach is compared with a number of well known methods on a large number of data sets with a wide variety of shapes and sizes. Our results show that GPGC is the most generalisable of the tested methods, achieving good performance across all datasets. GPGC significantly outperforms all existing methods on the hardest ellipsoidal datasets, without needing the user to pre-define the number of clusters. To our knowledge, this is the first work which proposes using GP for graph-based clustering

    An automatic region detection and processing approach in genetic programming for binary image classification

    No full text
    © 2017 IEEE. In image classification, region detection is an effective approach to reducing the dimensionality of the image data but requires human intervention. Genetic Programming (GP) as an evolutionary computation technique can automatically identify important regions, and conduct feature extraction, feature construction and classification simultaneously. In this paper, an automatic region detection and processing approach in GP (GP-RDP) method is proposed for image classification. This approach is able to evolve important image operators to deal with detected regions for facilitating feature extraction and construction. To evaluate the performance of the proposed method, five recent GP methods and seven non-GP methods based on three types of image features are used for comparison on four image data sets. The results reveal that the proposed method can achieve comparable performance on easy data sets and significantly better performance on difficult data sets than the other comparable methods. To further demonstrate the interpretability and understandability of the proposed method, two evolved programs are analysed. The analysis shows the good interpretability of the GP-RDP method and proves that the GP-RDP method is able to identify prominent regions, evolve effective image operators to process these regions, extract and construct good features for efficient image classification

    A Tri-objective Method for Bi-objective Feature Selection in Classification

    No full text
    Minimizing the number of selected features and maximizing the classification performance are two main objectives in feature selection, which can be formulated as a bi-objective optimization problem. Due to the complex interactions between features, a solution (i.e., feature subset) with poor objective values does not mean that all the features it selects are useless, as some of them combined with other complementary features can greatly improve the classification performance. Thus, it is necessary to consider not only the performance of feature subsets in the objective space, but also their differences in the search space, to explore more promising feature combinations. To this end, this paper proposes a tri-objective method for bi-objective feature selection in classification, which solves a bi-objective feature selection problem as a tri-objective problem by considering the diversity (differences) between feature subsets in the search space as the third objective. The selection based on the converted tri-objective method can maintain a balance between minimizing the number of selected features, maximizing the classification performance, and exploring more promising feature subsets. Furthermore, a novel initialization strategy and an offspring reproduction operator are proposed to promote the diversity of feature subsets in the objective space and improve the search ability, respectively. The proposed algorithm is compared with five multi-objective-based feature selection methods, six typical feature selection methods, and two peer methods with diversity as a helper objective. Experimental results on 20 real-world classification datasets suggest that the proposed method outperforms the compared methods in most scenarios

    Rademacher Complexity for Enhancing the Generalization of Genetic Programming for Symbolic Regression

    No full text
    Model complexity has a close relationship with the generalization ability and the interpretability of the learned models. Simple models are more likely to generalize well and are easier to interpret. However, too much emphasis on minimizing complexity can prevent the discovery of more complex yet more accurate solutions. Genetic programming (GP) tends to generate overcomplex models that are difficult to interpret and do not generalize well. This work proposes a novel complexity measure based on the Rademacher complexity for GP for symbolic regression. The complexity of an evolved model is measured by the maximum correlation between the model and the Rademacher variables on the selected training instances. Taking minimizing the training error and the Rademacher complexity of the models as the two objectives, the proposed GP method has been shown to be much superior to the standard GP on generalization performance. Compared with GP equipped with two state-of-the-art complexity measures, the proposed method still shows a notable advantage in generating a better front consisting of individuals with lower generalization errors and simpler behavioral complexity. Further analyses reveal that compared with the state-of-the-art methods, the proposed GP method evolves models that are much closer to the target models in the model structure, and have better interpretability. © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
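
The correlation-based measure described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `rademacher_correlation`, the number of random draws, and the assumption that model outputs are already scaled to [-1, 1] are all hypothetical choices made here for illustration.

```python
import random

def rademacher_correlation(outputs, n_draws=100, seed=0):
    """Estimate how well a model's outputs can align with random +/-1 labels.

    `outputs` are the model's predictions on the chosen training instances,
    assumed (for this sketch) to be scaled to [-1, 1]. A larger value means
    the model can fit random noise more easily, i.e. it is more complex.
    """
    rng = random.Random(seed)
    n = len(outputs)
    best = 0.0
    for _ in range(n_draws):
        # One vector of Rademacher variables: independent uniform +/-1 draws.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Absolute average agreement between the outputs and the random signs.
        corr = abs(sum(s * y for s, y in zip(sigma, outputs)) / n)
        best = max(best, corr)
    return best
```

A constant-zero model scores 0 under this sketch, while a model whose outputs swing across the full [-1, 1] range can score close to 1 on small samples.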

    Confidence-based Ant Colony Optimization for Capacitated Electric Vehicle Routing Problem with Comparison of Different Encoding Schemes

    No full text
    The blossoming of electric vehicles gives rise to a new vehicle routing problem called the capacitated electric vehicle routing problem. Since charging is not as convenient as refueling, both the service of customers and the recharging of vehicles should be considered. In this paper, we propose a confidence-based bi-level ant colony optimization algorithm to solve the problem. It divides the whole problem into an upper-level sub-problem, the capacitated vehicle routing problem, and a lower-level sub-problem, the fixed routing vehicle charging problem. For the upper-level sub-problem, an ant colony optimization algorithm is used to generate customer service sequences. Both the direct encoding scheme and the order-first split-second encoding scheme are implemented to provide guidance on the scenarios to which each applies. For the lower-level sub-problem, a new heuristic called simple enumeration is proposed to generate recharging schedules for vehicles. Between the two sub-problems, a confidence-based selection method is proposed to select promising customer service sequences on which to conduct local search and lower-level optimization. By setting adaptive confidence thresholds, the inferior service sequences that have little chance to become the iteration best are eliminated during the execution. Experiments show that the proposed algorithm has reached the state-of-the-art level and updated eight best known solutions of the benchmark

    Particle swarm optimisation representations for simultaneous clustering and feature selection

    No full text
    © 2016 IEEE. Clustering, the process of grouping unlabelled data, is an important task in data analysis. It is regarded as one of the most difficult tasks due to the large search space that must be explored. Feature selection is commonly used to reduce the size of a search space, and evolutionary computation (EC) is a group of techniques which are known to give good solutions to difficult problems such as clustering or feature selection. However, there has been relatively little work done on simultaneous clustering and feature selection using EC methods. In this paper we compare medoid and centroid representations that allow particle swarm optimisation (PSO) to perform simultaneous clustering and feature selection. We propose several new techniques which improve clustering performance and ensure valid solutions are generated. Experiments are conducted on a variety of real-world and synthetic datasets in order to analyse the effectiveness of the PSO representations across several different criteria. We show that a medoid representation can achieve superior results compared to the widely used centroid representation

    Feature selection to improve generalization of genetic programming for high-dimensional symbolic regression

    No full text
    When learning from high-dimensional data for symbolic regression (SR), genetic programming (GP) typically cannot generalize well. Feature selection, as a data preprocessing method, can potentially contribute not only to improving the efficiency of learning algorithms but also to enhancing the generalization ability. However, in GP for high-dimensional SR, feature selection before learning is seldom considered. In this paper, we propose a new feature selection method based on permutation to select features for high-dimensional SR using GP. A set of experiments has been conducted to investigate the performance of the proposed method on the generalization of GP for high-dimensional SR. The regression results confirm the superior performance of the proposed method over the other examined feature selection methods. Further analysis indicates that the models evolved by the proposed method are more likely to contain only the truly relevant features and have better interpretability. © 2017 IEEE
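
The abstract does not spell out its permutation-based method, so the following is only a generic sketch of permutation importance, the common technique the name suggests. The helper names `mse` and `permutation_importance`, the use of mean squared error, and the repeat count are assumptions introduced here, not details from the paper.

```python
import random

def mse(model, X, y):
    """Mean squared error of `model` (a callable on one row) over the data."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Score each feature by how much shuffling its column increases the error.

    X is a list of rows (each a list of feature values). Features whose
    shuffling barely changes the error are candidates for removal.
    """
    rng = random.Random(seed)
    base = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increase += mse(model, X_perm, y) - base
        scores.append(increase / n_repeats)
    return scores
```

For example, with a model that computes `2 * row[0]` and ignores a second feature, the second feature's score is exactly zero, while the first feature's score is positive.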

    Using particle swarm optimisation and the silhouette metric to estimate the number of clusters, select features, and perform clustering

    No full text
    © Springer International Publishing AG 2017. One of the most difficult problems in clustering, the task of grouping similar instances in a dataset, is automatically determining the number of clusters that should be created. When a dataset has a large number of attributes (features), this task becomes even more difficult due to the relationship between the number of features and the number of clusters produced. One method of addressing this is feature selection, the process of selecting a subset of features to be used. Evolutionary computation techniques have been used very effectively for solving clustering problems, but have seen little use for simultaneously performing the three tasks of clustering, feature selection, and determining the number of clusters. Furthermore, the few methods that do exist have a number of limitations that affect their performance and scalability. In this work, we introduce a number of novel techniques for improving the performance of these three tasks using particle swarm optimisation and statistical techniques. We conduct a series of experiments across a range of datasets with clustering problems of varying difficulty. The results show our proposed methods achieve significantly better clustering performance than existing methods, while only using a small number of features and automatically determining the number of clusters more accurately

    Genetic Programming for Automatic Global and Local Feature Extraction to Image Classification

    No full text
    © 2018 IEEE. Feature extraction is an essential process in image classification. Existing feature extraction methods can extract important and discriminative image features but often require domain expertise and human intervention. Genetic Programming (GP) can automatically extract features which are more adaptive to different image classification tasks. However, the majority of GP-based methods only extract relatively simple features of one type, i.e. local or global, which are not effective and efficient for complex image classification. In this paper, a new GP method (GP-GLF) is proposed to achieve automatic and simultaneous global and local feature extraction for image classification. To extract discriminative image features, several effective and well-known feature extraction methods, such as HOG, SIFT and LBP, are employed as GP functions in global and local scenarios. A novel program structure is developed to allow GP-GLF to evolve descriptors that can synthesise feature vectors from the input image and the automatically detected regions using these functions. The performance of the proposed method is evaluated on four different image classification data sets of varying difficulty and compared with seven GP-based methods and a set of non-GP methods. Experimental results show that the proposed method achieves significantly better or similar performance compared with almost all the peer methods. Further analysis of the evolved programs shows the good interpretability of the GP-GLF method