4 research outputs found

    Ant-based sorting and ACO-based clustering approaches: A review

    Data clustering is used in a number of fields, including statistics, bioinformatics, machine learning, exploratory data analysis, image segmentation, security, medical image analysis, web handling and mathematical programming. Its role is to group data into clusters with high similarity within clusters and high dissimilarity between clusters. This paper reviews the problems that affect clustering performance for deterministic and stochastic clustering approaches. In deterministic clustering, the problems are caused by sensitivity to the number of clusters provided. In stochastic clustering, problems are caused either by the absence of an optimal number of clusters or by the projection of data. The review focuses on ant-based sorting and ACO-based clustering, which suffer from slow convergence, non-robust results and entrapment in local optima. The results of this review can be used as a guide for researchers working in the area of data clustering, as it shows the strengths and weaknesses of both clustering approaches.
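    As a concrete illustration of the clustering objective this abstract describes, the short sketch below (not taken from the paper) computes two simple proxies for it: within-cluster scatter for the "high similarity within clusters" criterion and between-cluster scatter for the "high dissimilarity between clusters" criterion. The function name and the toy data are assumptions made for this example.

```python
# Illustrative sketch only: measuring the two criteria the review describes --
# cohesion within clusters and separation between clusters.
import numpy as np

def cohesion_and_separation(X, labels):
    """Return (within-cluster scatter SSE, between-cluster scatter SSB)."""
    overall_mean = X.mean(axis=0)
    sse, ssb = 0.0, 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        sse += ((members - centroid) ** 2).sum()                       # cohesion: lower is better
        ssb += len(members) * ((centroid - overall_mean) ** 2).sum()   # separation: higher is better
    return sse, ssb

# Toy usage with two well-separated groups
X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
labels = np.array([0, 0, 1, 1])
print(cohesion_and_separation(X, labels))
```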

    Balancing exploration and exploitation in ACS algorithms for data clustering

    Ant colony optimization (ACO) is a swarm algorithm inspired by various behaviors of ants. The algorithm minimizes deterministic imperfections by treating the clustering problem as an optimization problem. A balance between exploration and exploitation is necessary to produce optimal results. ACO for clustering (ACOC) is an ant colony system (ACS) algorithm inspired by the foraging behavior of ants and applied to clustering tasks. ACOC performs clustering based on random initial centroids, which are generated iteratively during the algorithm run. This makes the algorithm deviate from the clustering solution and perform a biased exploration. This study proposes a modified ACOC, called population ACOC (P-ACOC), to address this issue. The proposed P-ACOC allows the ants to process and update their own centroids during the algorithm run, thereby intensifying the search in the neighborhood before moving to another location. However, the algorithm quickly reaches premature convergence because the same clustering results are exploited during centroid updates. To resolve this issue, this study proposes a second modification that adds a restart strategy to balance exploration and exploitation in P-ACOC. Each time the algorithm begins to converge to the same clustering solution, the restart strategy is triggered to switch the algorithm's behavior from exploitation to exploration. The performance of the proposed algorithm is compared with that of several common clustering algorithms on real-world datasets. The results show that the accuracy of the proposed algorithm surpasses that of the other algorithms.
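    The sketch below shows the restart-strategy control flow the abstract describes, but only in outline: the inner step is a stand-in stochastic centroid perturbation rather than the actual ant colony system construction used in P-ACOC, and names such as `stagnation_limit` are assumptions made for this example.

```python
# Illustrative sketch only: a stagnation-triggered restart that switches the
# search from exploitation (small perturbations of the current centroids) back
# to exploration (re-drawing centroids from the data). Not the P-ACOC itself.
import numpy as np

def assign(X, centroids):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def sse(X, centroids, labels):
    return sum(((X[labels == k] - c) ** 2).sum() for k, c in enumerate(centroids))

def stochastic_clustering_with_restart(X, k, iters=200, stagnation_limit=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # random initial centroids
    best_c, best_cost = centroids.copy(), np.inf
    stagnation = 0
    for _ in range(iters):
        # Exploitation: small perturbation around the current centroids.
        candidate = centroids + rng.normal(scale=0.1, size=centroids.shape)
        labels = assign(X, candidate)
        cost = sse(X, candidate, labels)
        if cost < best_cost - 1e-9:
            best_c, best_cost, centroids = candidate, cost, candidate
            stagnation = 0
        else:
            stagnation += 1
        # Restart strategy: once the same solution keeps recurring, switch back
        # to exploration by re-drawing the centroids from the data.
        if stagnation >= stagnation_limit:
            centroids = X[rng.choice(len(X), k, replace=False)]
            stagnation = 0
    return best_c, best_cost

X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(30, 2)) for m in (0, 3, 6)])
print(stochastic_clustering_with_restart(X, k=3)[1])
```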

    Document clustering with optimized unsupervised feature selection and centroid allocation

    An effective document clustering system can significantly improve the tasks of document analysis, grouping, and retrieval. The performance of a document clustering system depends mainly on document preparation and the allocation of cluster positions. As achieving optimal document clustering is a combinatorial NP-hard optimization problem, it becomes essential to use non-traditional methods to look for optimal or near-optimal solutions. During the allocation of cluster positions, or the centroid allocation process, extra text features that represent keywords in each document affect the clustering results. A large number of features need to be reduced using dimensionality reduction techniques. Feature selection is an important step that can be used to remove redundant and inconsistent features. Because of the large number of potential feature combinations, text feature selection is considered a complicated process. The persistent drawbacks of current text feature selection methods, such as local optima and the absence of class labels for features, were addressed in this thesis. Both supervised and unsupervised feature selection methods were investigated. To improve document clustering by optimizing supervised feature selection, a memetic hybridization of filter and wrapper feature selection, known as Memetic Algorithm Feature Selection, was presented first. To deal with unlabelled features, an unsupervised feature selection method was also proposed. The proposed unsupervised feature selection method integrates Simulated Annealing into the global search performed by Differential Evolution. This combination also aims to combine the advantages of both the wrapper and filter methods in a memetic scheme, but on an unsupervised basis. Two versions of this hybridization were proposed. The first, named Differential Evolution Simulated Annealing, uses the standard mutation of Differential Evolution; the second, named Dichotomous Differential Evolution Simulated Annealing, uses the dichotomous mutation of Differential Evolution. After feature selection, two centroid allocation methods were proposed: the first combines Chaotic Logistic Search with a Discrete Differential Evolution global search and was named Differential Evolution Memetic Clustering (DEMC); the second is based on gradient search, using k-means as a local search with a modified Differential Harmony global search, and was named Memetic Differential Harmony Search (MDHS). To intensify the exploitation aspect of MDHS, a binomial crossover was added, and the improved method was named Crossover Memetic Differential Harmony Search (CMDHS). Test results using the F-measure, Average Distance of Document to Cluster (ADDC) and nonparametric statistical tests showed the superiority of CMDHS over the baseline methods, namely HS, DHS, k-means and MDHS. The tests also show that CMDHS is better than the DEMC proposed earlier. Finally, the proposed CMDHS was compared with two current state-of-the-art methods, namely a Krill Herd (KH) based centroid allocation method and an Artificial Bee Colony (ABC) based method, and was found to outperform both methods in most cases.
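    To make the memetic centroid-allocation idea concrete, the sketch below pairs a population-based global search over candidate centroid sets (a plain DE/rand/1 move with binomial crossover) with a single k-means update as the local refinement. It is not the CMDHS or MDHS implementation from the thesis; the population size, scale factor and crossover rate are assumptions made for this example.

```python
# Illustrative sketch only: a memetic loop where an evolutionary global move is
# followed by one k-means pass used as the local (exploitation) refinement.
import numpy as np

def assign(X, C):
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def sse(X, C):
    labels = assign(X, C)
    return sum(((X[labels == k] - c) ** 2).sum() for k, c in enumerate(C))

def kmeans_step(X, C):
    """One k-means centroid update used as the local refinement move."""
    labels = assign(X, C)
    return np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else c
                     for k, c in enumerate(C)])

def memetic_centroid_search(X, k, pop_size=10, gens=50, F=0.5, CR=0.9, seed=0):
    rng = np.random.default_rng(seed)
    pop = [X[rng.choice(len(X), k, replace=False)] for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            # DE/rand/1 mutation and binomial crossover over the centroid sets.
            mutant = pop[a] + F * (pop[b] - pop[c])
            mask = rng.random(pop[i].shape) < CR
            trial = np.where(mask, mutant, pop[i])
            trial = kmeans_step(X, trial)          # memetic local refinement
            if sse(X, trial) < sse(X, pop[i]):
                pop[i] = trial
    return min(pop, key=lambda C: sse(X, C))

X = np.vstack([np.random.default_rng(2).normal(m, 0.4, size=(40, 2)) for m in (0, 4, 8)])
best = memetic_centroid_search(X, k=3)
print(round(sse(X, best), 2))
```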