92 research outputs found

    Document clustering with optimized unsupervised feature selection and centroid allocation

    An effective document clustering system can significantly improve the tasks of document analysis, grouping, and retrieval. The performance of a document clustering system mainly depends on document preparation and on the allocation of cluster positions. As achieving optimal document clustering is a combinatorial NP-hard optimization problem, non-traditional methods must be used to search for optimal or near-optimal solutions. During the allocation of cluster positions, or centroids, the extra text features that represent keywords in each document affect the clustering results. The large number of features needs to be reduced using dimensionality reduction techniques. Feature selection is an important step that can be used to remove redundant and inconsistent features. Owing to the large number of potential feature combinations, text feature selection is considered a complicated process. This thesis addresses the persistent drawbacks of current text feature selection methods, such as local optima and the absence of class labels for features. Both supervised and unsupervised feature selection methods were investigated. To optimize supervised feature selection and thereby improve document clustering, a memetic hybridization between filter and wrapper feature selection, known as Memetic Algorithm Feature Selection, was presented first. To deal with unlabelled features, an unsupervised feature selection method was also proposed. It integrates Simulated Annealing into a Differential Evolution global search, again combining the advantages of the wrapper and filter methods in a memetic scheme, but on an unsupervised basis. Two versions of this hybridization were proposed: the first, Differential Evolution Simulated Annealing, uses the standard mutation of Differential Evolution; the second, Dichotomous Differential Evolution Simulated Annealing, uses the dichotomous mutation of Differential Evolution. After feature selection, two centroid allocation methods were proposed. The first combines Chaotic Logistic Search with a Discrete Differential Evolution global search and was named Differential Evolution Memetic Clustering (DEMC). The second uses gradient search, with k-means as the local search, together with a modified Differential Harmony global search, and was named Memetic Differential Harmony Search (MDHS). To intensify the exploitation aspect of MDHS, a binomial crossover was added, and the improved method was named Crossover Memetic Differential Harmony Search (CMDHS). Test results using the F-measure, the Average Distance of Document to Cluster (ADDC), and nonparametric statistical tests showed the superiority of CMDHS over the baseline methods, namely HS, DHS, k-means, and MDHS. The tests also show that CMDHS is better than the earlier DEMC. Finally, CMDHS was compared with two current state-of-the-art methods, namely a Krill Herd (KH) based centroid allocation method and an Artificial Bee Colony (ABC) based method, and was found to outperform both in most cases.
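
    The memetic pattern behind MDHS/CMDHS can be made concrete with a short sketch: a harmony memory holds candidate centroid sets, each improvised candidate is refined by one k-means (Lloyd) step as the local search, and candidates compete on the ADDC objective. This is a minimal illustration in Python; all function names and parameter values are assumptions, and the binomial crossover that distinguishes CMDHS is omitted.

```python
import numpy as np

def addc(X, centroids):
    """Average Distance of Documents to Cluster: mean distance of each
    document to its nearest centroid (lower is better)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.min(axis=1).mean()

def kmeans_step(X, centroids):
    """One Lloyd iteration, used here as the gradient-style local search."""
    labels = np.linalg.norm(
        X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    for k in range(len(centroids)):
        if (labels == k).any():
            centroids[k] = X[labels == k].mean(axis=0)
    return centroids

def mdhs_sketch(X, k, hms=10, iters=200, hmcr=0.9, par=0.3, bw=0.05, seed=0):
    """Hypothetical MDHS-style centroid allocation: harmony improvisation
    for the global search plus a k-means step for local refinement."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    memory = [rng.uniform(lo, hi, size=(k, X.shape[1])) for _ in range(hms)]
    scores = [addc(X, h) for h in memory]
    for _ in range(iters):
        new = rng.uniform(lo, hi, size=(k, X.shape[1]))
        use_mem = rng.random((k, X.shape[1])) < hmcr         # memory consideration
        donor = memory[rng.integers(hms)]
        new[use_mem] = donor[use_mem]
        adjust = use_mem & (rng.random((k, X.shape[1])) < par)  # pitch adjustment
        new[adjust] += bw * rng.uniform(-1, 1, size=new.shape)[adjust]
        new = kmeans_step(X, new)                            # memetic local search
        worst = int(np.argmax(scores))
        s = addc(X, new)
        if s < scores[worst]:                                # replace worst harmony
            memory[worst], scores[worst] = new, s
    best = int(np.argmin(scores))
    return memory[best], scores[best]
```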

    Unsupervised Text Feature Selection using Memetic Dichotomous Differential Evolution

    Feature Selection (FS) methods have been studied extensively in the literature, and they are a crucial component of machine learning techniques. However, unsupervised text feature selection has not been well studied in document clustering problems. Feature selection can be modelled as an optimization problem because of the large number of possible feature subsets. In this paper, a memetic method that combines Differential Evolution (DE) with Simulated Annealing (SA) for unsupervised FS was proposed. Because each feature is represented by one of two values indicating its presence or absence, a binary version of Differential Evolution is required. A dichotomous DE was used for this binary version, and the proposed method is named Dichotomous Differential Evolution Simulated Annealing (DDESA). This method uses dichotomous mutation instead of the standard DE mutation, which makes it more effective for binary problems. The Mean Absolute Distance (MAD) filter was used as the internal evaluation measure for feature subsets in this paper. The proposed method was compared with other state-of-the-art methods, including the standard DE combined with SA (named DESA in this paper), on five benchmark datasets. The F-micro and F-macro (F-scores) and the Average Distance of Document to Cluster (ADDC) measures were used for evaluation, together with the Reduction Rate (RR). Test results showed that the proposed DDESA outperformed the other tested methods in unsupervised text feature selection.
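
    A minimal sketch of the DDESA idea, assuming a common dichotomous-mutation flavour in which bits where two random donors agree are inherited and disagreements are resolved randomly; the paper's exact operator may differ, and `mad_score` (mean absolute deviation of the selected columns) is only an illustrative stand-in for the MAD filter.

```python
import numpy as np

def dichotomous_mutation(pop, rng):
    """Assumed dichotomous mutation: bits on which two random donors agree
    are inherited; bits on which they disagree are drawn at random."""
    n, d = pop.shape
    r1, r2 = rng.integers(n, size=(2, n))
    agree = pop[r1] == pop[r2]
    return np.where(agree, pop[r1], rng.integers(0, 2, size=(n, d)))

def mad_score(X, mask):
    """Illustrative stand-in for the MAD filter: mean absolute deviation
    of the selected feature columns (higher = more informative)."""
    if mask.sum() == 0:
        return 0.0
    sel = X[:, mask.astype(bool)]
    return float(np.abs(sel - sel.mean(axis=0)).mean())

def ddesa_sketch(X, pop_size=30, iters=100, cr=0.5, t0=1.0, cool=0.95, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, d))
    fit = np.array([mad_score(X, ind) for ind in pop])
    t = t0
    for _ in range(iters):
        trials = np.where(rng.random((pop_size, d)) < cr,
                          dichotomous_mutation(pop, rng), pop)  # binomial crossover
        for i, trial in enumerate(trials):
            f = mad_score(X, trial)
            # SA acceptance: keep improvements, occasionally accept worse
            # trials to escape local optima; temperature cools each round.
            if f >= fit[i] or rng.random() < np.exp((f - fit[i]) / t):
                pop[i], fit[i] = trial, f
        t *= cool
    best = int(fit.argmax())
    return pop[best], fit[best]
```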

    A Survey of Evolutionary Continuous Dynamic Optimization Over Two Decades: Part B

    Many real-world optimization problems are dynamic. The field of dynamic optimization deals with such problems, where the search space changes over time. In this two-part paper, we present a comprehensive survey of research in evolutionary dynamic optimization for single-objective unconstrained continuous problems over the last two decades. In Part A of this survey, we propose a new taxonomy for the components of dynamic optimization algorithms, namely convergence detection, change detection, explicit archiving, diversity control, and population division and management. In comparison to the existing taxonomies, the proposed taxonomy covers some additional important components, such as convergence detection and computational resource allocation. Moreover, we significantly expand and improve the classifications of diversity control and multi-population methods, which are under-represented in the existing taxonomies. We then provide detailed technical descriptions and analysis of the different components according to the suggested taxonomy. Part B of this survey provides an in-depth analysis of the most commonly used benchmark problems, performance analysis methods, static optimization algorithms used as optimization components in dynamic optimization algorithms, and dynamic real-world applications. Finally, several opportunities for future work are pointed out.

    Intelligent Leukaemia Diagnosis with Bare-Bones PSO based Feature Optimization

    In this research, we propose an intelligent decision support system for acute lymphoblastic leukaemia (ALL) diagnosis using microscopic images. Two Bare-Bones Particle Swarm Optimization (BBPSO) algorithms are proposed to identify the most significant discriminative characteristics of healthy and blast cells to enable efficient ALL classification. The first BBPSO variant incorporates accelerated chaotic search mechanisms of food chasing and enemy avoidance to diversify the search and mitigate the premature convergence of the original BBPSO algorithm. The second BBPSO variant embeds both of the abovementioned new search mechanisms in a subswarm-based search. Evaluated on the ALL-IDB2 database, the two proposed algorithms achieve superior geometric mean performances of 94.94% and 96.25%, respectively, and significantly outperform other metaheuristic search and related methods for ALL classification.
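
    The bare-bones PSO underlying both variants replaces the usual velocity update with Gaussian sampling between each particle's personal best and the global best (Kennedy's 2003 formulation). The sketch below shows only this parameter-free core; the chaotic food-chasing and enemy-avoidance mechanisms and the subswarm scheme of the proposed variants are not reproduced.

```python
import numpy as np

def bbpso(objective, dim, n_particles=30, iters=100, seed=0):
    """Core Bare-Bones PSO: each new position is sampled from a Gaussian
    whose mean is midway between the personal and global bests and whose
    spread is their absolute difference, so no velocity terms are needed."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        mu = (pbest + gbest) / 2.0
        sigma = np.abs(pbest - gbest)      # collapses as the swarm converges
        pos = rng.normal(mu, sigma)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

# Example: minimize the sphere function in 10 dimensions.
best, best_val = bbpso(lambda x: float(np.sum(x ** 2)), dim=10)
```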

    OBKA-FS: an oppositional-based binary kidney-inspired search algorithm for feature selection

    Feature selection is a key step when building an automatic classification system. Numerous evolutionary algorithms have been applied to remove irrelevant features in order to make the classifier more accurate. The kidney-inspired search algorithm (KA) is a recent evolutionary algorithm. The original version of KA performed more effectively than other evolutionary algorithms. However, KA was proposed for continuous search spaces, whereas feature subset selection, like many optimization problems such as classification, requires a binary discrete space. Moreover, the movement of each solution is strongly influenced by the best-known solution found so far, denoted Sbest. This may be inadequate when Sbest lies near a local optimum, as it will direct the search process toward a suboptimal solution. In this study, a three-fold improvement of the existing KA is proposed. First, a binary version of the kidney-inspired algorithm (BKA-FS) for feature subset selection is introduced to improve classification accuracy in multi-class classification problems. Second, the proposed BKA-FS is combined with an oppositional-based initialization method in order to start from good initial solutions; this improved algorithm is denoted OBKA-FS. Third, a novel movement strategy based on mutual information (MI) is proposed, which gives OBKA-FS the ability to work in a discrete binary environment. For evaluation, experiments were conducted on ten UCI machine learning benchmark instances. Results show that OBKA-FS outperforms existing state-of-the-art evolutionary algorithms for feature selection. In particular, OBKA-FS obtained better accuracy with the same or fewer features, and higher dependency with less redundancy. The results thus confirm the high performance of the improved kidney-inspired algorithm in solving optimization problems such as feature selection.
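
    Oppositional-based initialization is straightforward to sketch: for every random solution x in [lo, hi], the opposite point lo + hi - x is also evaluated, and the better half of the combined pool seeds the population; for binary feature masks the opposite is simply the bit-wise complement. Names below are illustrative.

```python
import numpy as np

def opposition_init(pop_size, dim, lo, hi, fitness, rng):
    """Evaluate each random solution together with its opposite
    (lo + hi - x) and keep the best half of the combined pool."""
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    opp = lo + hi - pop               # reflection about the box centre
    both = np.vstack([pop, opp])
    scores = np.array([fitness(x) for x in both])
    keep = np.argsort(scores)[:pop_size]   # assumes minimization
    return both[keep], scores[keep]

def binary_opposite(mask):
    """For binary feature-selection masks the opposite is the complement."""
    return 1 - mask
```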

    A Survey on Evolutionary Computation Approaches to Feature Selection

    Feature selection is an important task in data mining and machine learning: it reduces the dimensionality of the data and increases the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task, mainly due to the large search space. A variety of methods have been applied to solve feature selection problems, among which evolutionary computation (EC) techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of the alternative approaches. This leads to a disjointed and fragmented field, with ultimately lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on EC for feature selection, identifying the contributions of the different algorithms. In addition, current issues and challenges are discussed to identify promising areas for future research.

    Evolving machine learning and deep learning models using evolutionary algorithms

    Despite great success in data mining, machine learning and deep learning models are still subject to significant obstacles when tackling real-life challenges, such as feature selection, initialization sensitivity, and hyperparameter optimization. The prevalence of these obstacles has severely constrained conventional machine learning and deep learning methods from fulfilling their potential. In this research, three evolving machine learning models and one evolving deep learning model are proposed to eliminate the above bottlenecks, i.e. to improve model initialization, enhance feature representation, and optimize model configuration, respectively, through hybridization between advanced evolutionary algorithms and conventional ML and DL methods. Specifically, two Firefly Algorithm based evolutionary clustering models are proposed to optimize cluster centroids in K-means and to overcome initialization sensitivity as well as local stagnation. Secondly, a Particle Swarm Optimization based evolving feature selection model is developed to automatically identify the most effective feature subset and reduce feature dimensionality for classification problems. Lastly, a Grey Wolf Optimizer based evolving Convolutional Neural Network-Long Short-Term Memory method is devised to automatically generate optimal topological and learning configurations for Convolutional Neural Network-Long Short-Term Memory networks on multivariate time series prediction problems. Moreover, a variety of tailored search strategies are proposed to eliminate the intrinsic limitations embedded in the search mechanisms of the three employed evolutionary algorithms, i.e. the dominance of the global best signal in Particle Swarm Optimization, the constraint of diagonal movement in the Firefly Algorithm, and the acute contraction of the search territory in the Grey Wolf Optimizer, respectively. The remedy strategies include the diversification of guiding signals, adaptive nonlinear search parameters, hybrid position updating mechanisms, and the enhancement of population leaders. As such, the enhanced Particle Swarm Optimization, Firefly Algorithm, and Grey Wolf Optimizer variants are more likely to attain global optimality on the complex search landscapes embedded in data mining problems, owing to their elevated search diversity and an improved trade-off between exploration and exploitation.
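
    As one concrete illustration of the wrapper-style evolving feature selection component, here is a minimal standard sigmoid-transfer binary PSO; the thesis's enhanced variant (diversified guiding signals and the other remedy strategies) is not reproduced, and `fitness` stands for any subset-quality measure to be maximized, e.g. cross-validated accuracy.

```python
import numpy as np

def binary_pso_fs(fitness, dim, n_particles=20, iters=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard binary PSO for feature selection: real-valued velocities
    are squashed by a sigmoid into bit-flip probabilities."""
    rng = np.random.default_rng(seed)
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    pos = (rng.random((n_particles, dim)) < 0.5).astype(int)
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))             # sigmoid transfer
        pos = (rng.random((n_particles, dim)) < prob).astype(int)
        vals = np.array([fitness(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest, float(pbest_val.max())
```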

    Improving K-means clustering with enhanced Firefly Algorithms

    In this research, we propose two variants of the Firefly Algorithm (FA), namely the inward intensified exploration FA (IIEFA) and the compound intensified exploration FA (CIEFA), for tackling the obstinate problems of initialization sensitivity and local optima traps in the K-means clustering model. To enhance both exploitation and exploration, matrix-based search parameters and dispersing mechanisms are incorporated into the two proposed FA models. In the IIEFA model, we first replace the attractiveness coefficient with a randomized control matrix to release the FA from the constraints of biological law; the exploitation capability in the neighbourhood is thereby elevated from a one-dimensional to a multi-dimensional search mechanism with enhanced diversity in search scopes, scales, and directions. In the CIEFA model, we additionally employ a dispersing mechanism that dispatches fireflies with high similarity to new positions outside the close neighbourhood to perform global exploration. This dispersing mechanism ensures sufficient variance between fireflies and thereby increases search efficiency. The ALL-IDB2 database, a skin lesion data set, and a total of 15 UCI data sets are employed to evaluate the efficiency of the proposed FA models on clustering tasks. The minimum Redundancy Maximum Relevance (mRMR) based feature selection method is also adopted to reduce feature dimensionality. The empirical results indicate that the proposed FA models demonstrate statistically significant superiority in both distance and performance measures for clustering tasks in comparison with conventional K-means clustering, five classical search methods, and five advanced FA variants.
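
    A minimal sketch of the move described in the abstract: standard FA pulls firefly i toward a brighter firefly j with one scalar attractiveness, whereas the IIEFA-style move replaces that scalar with an element-wise random control matrix so every dimension gets its own pull strength. Both functions and their parameters are illustrative.

```python
import numpy as np

def standard_fa_move(xi, xj, beta0, gamma, alpha, rng):
    """Standard FA: one scalar attractiveness decaying with distance."""
    r2 = float(np.sum((xi - xj) ** 2))
    beta = beta0 * np.exp(-gamma * r2)
    return xi + beta * (xj - xi) + alpha * (rng.random(xi.shape) - 0.5)

def iiefa_style_move(xi, xj, alpha, rng):
    """IIEFA-flavoured move: the scalar attractiveness is replaced by a
    randomized control matrix, so each dimension is pulled toward the
    brighter firefly xj by its own random coefficient."""
    beta = rng.random(xi.shape)        # element-wise random attractiveness
    return xi + beta * (xj - xi) + alpha * (rng.random(xi.shape) - 0.5)
```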

    A review of population-based metaheuristics for large-scale black-box global optimization: Part B

    This paper is the second part of a two-part survey series on large-scale global optimization. The first part covered the two major algorithmic approaches to large-scale optimization, namely decomposition methods and hybridization methods such as memetic algorithms and local search. In this part we focus on sampling and variation operators, approximation and surrogate modeling, initialization methods, and parallelization. We also cover a range of problem areas in relation to large-scale global optimization, such as multi-objective optimization, constraint handling, overlapping components, the component imbalance issue, benchmarks, and applications. The paper also includes a discussion of the pitfalls and challenges of current research and identifies several potential areas of future research.

    Hybrid harmony search algorithm for continuous optimization problems

    The Harmony Search (HS) algorithm has been extensively adopted in the literature to address optimization problems in many different fields, such as industrial design, civil engineering, and electrical and mechanical engineering. To ensure its search performance, HS requires extensive tuning of its four control parameters, namely the harmony memory size (HMS), harmony memory consideration rate (HMCR), pitch adjustment rate (PAR), and bandwidth (BW). However, the tuning process is often cumbersome and problem dependent, and no single setting fits all problems. Additionally, despite many useful works, HS and its variants still suffer from weak exploitation, which can lead to poor convergence. To address these issues, this thesis proposes to augment HS with adaptive parameter tuning using the Grey Wolf Optimizer (GWO). To enhance its exploitation, the thesis also proposes to adopt a new variant of the opposition-based learning (OBL) technique. Taken together, the proposed hybrid algorithm, called IHS-GWO, aims to address continuous optimization problems. IHS-GWO is evaluated on two standard benchmark sets and two real-world optimization problems. The first benchmark set consists of 24 classical unimodal and multimodal benchmark functions, whilst the second contains 30 state-of-the-art benchmark functions from the Congress on Evolutionary Computation (CEC). The two real-world optimization problems involve the three-bar truss and spring design. Statistical analysis of IHS-GWO's results against recent HS variants and other metaheuristics, using the Wilcoxon rank-sum and Friedman tests, demonstrates its superior performance.
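
    For reference, this is the vanilla HS improvisation step whose four parameters (HMS, HMCR, PAR, BW) IHS-GWO tunes adaptively; the GWO-based tuning and the OBL variant themselves are not sketched here.

```python
import numpy as np

def improvise(memory, hmcr, par, bw, lo, hi, rng):
    """One HS improvisation: each variable is taken from harmony memory
    with probability HMCR (and then pitch-adjusted with probability PAR
    by a step of size BW), otherwise sampled uniformly at random."""
    hms, dim = memory.shape
    new = rng.uniform(lo, hi, size=dim)
    use_mem = rng.random(dim) < hmcr
    picks = memory[rng.integers(hms, size=dim), np.arange(dim)]
    new[use_mem] = picks[use_mem]
    adjust = use_mem & (rng.random(dim) < par)
    new[adjust] += bw * rng.uniform(-1.0, 1.0, size=dim)[adjust]
    return np.clip(new, lo, hi)
```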