13 research outputs found

    Unsupervised text Feature Selection using memetic Dichotomous Differential Evolution

    Feature Selection (FS) methods have been studied extensively in the literature, and they are a crucial component of machine learning techniques. However, unsupervised text feature selection has not been well studied in document clustering problems. Feature selection can be modelled as an optimization problem because of the large number of possible solutions that might be valid. In this paper, a memetic method that combines Differential Evolution (DE) with Simulated Annealing (SA) for unsupervised FS is proposed. Because each feature is encoded with only two values, indicating its presence or absence, a binary version of differential evolution is used. A dichotomous DE serves as the binary version, and the proposed method is named Dichotomous Differential Evolution Simulated Annealing (DDESA). This method replaces the standard DE mutation with a dichotomous mutation, which is more effective for binary search spaces. The Mean Absolute Distance (MAD) filter was used as the internal evaluation measure for feature subsets. The proposed method was compared with other state-of-the-art methods, including the standard DE combined with SA (named DESA in this paper), using five benchmark datasets. The F-micro and F-macro measures (F-scores), the Average Distance of Document to Cluster (ADDC), and the Reduction Rate (RR) were used as evaluation measures. Test results showed that the proposed DDESA outperformed the other tested methods in unsupervised text feature selection.
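
    The abstract does not define the MAD filter, so the following is only a minimal sketch of how a binary feature mask might be scored with a mean-absolute-deviation style measure. It assumes that a feature's score is the mean absolute deviation of its values from the feature mean across documents, and that a subset's fitness is the average score of its selected features. The names `mad_scores` and `subset_fitness` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mad_scores(X):
    """Per-feature mean absolute deviation from the feature mean.

    X is a (documents x features) weight matrix, e.g. TF-IDF values.
    Assumption: this is the relevance score used by the filter.
    """
    X = np.asarray(X, dtype=float)
    return np.abs(X - X.mean(axis=0)).mean(axis=0)


def subset_fitness(X, mask):
    """Average MAD score of the features selected by a binary mask.

    Assumption: higher is better; an empty subset scores 0.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0
    return float(mad_scores(X)[mask].mean())


if __name__ == "__main__":
    # Toy matrix: feature 0 varies across documents, feature 1 is constant.
    X = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
    print(mad_scores(X))
    print(subset_fitness(X, [1, 0]), subset_fitness(X, [1, 1]))
```

    In a binary evolutionary search, each individual would be such a mask, and a score like this could rank candidate masks without any class labels.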

    An enhanced krill herd optimization technique used for classification problem

    This paper presents an enhanced krill herd (EKH) method intended to improve optimization for the classification problem in machine learning. As a global search optimization method, EKH allocates the best representation of the solution (a krill individual), while simulated annealing (SA) is used to modify the generated krill individuals (each individual represents a set of bits). The test results showed that the KH outperformed other methods using the external and internal evaluation measures.

    Text dimensionality reduction for document clustering using hybrid memetic feature selection

    In this paper, a document clustering method with a hybrid feature selection method is proposed. The proposed hybrid feature selection method integrates a Genetic-based wrapper method with a ranking filter. The method is named Memetic Algorithm-Feature Selection (MA-FS). MA-FS is combined with the K-means and Spherical K-means (SK-means) clustering methods to perform document clustering. For the purpose of comparison, another unsupervised feature selection method, Feature Selection Genetic Text Clustering (FSGATC), is used. Two real-world criminal report document sets were used in the comparisons, along with two popular benchmark datasets, Reuters and 20newsgroup. The F-Micro, F-Macro and Average Distance of Document to Cluster (ADDC) measures were used for evaluation. The test results showed that the MA-FS method outperformed the FSGATC method. It also outperformed the results obtained using the entire feature space (ALL).

    Differential evolution memetic document clustering using chaotic logistic local search

    In this paper, we propose a Memetic-based clustering method that improves the partitioning of document clustering. Our proposed method is named Differential Evolution Memetic Clustering (DEMC). Differential Evolution (DE) is used to select the best set of cluster centres (centroids), while the Chaotic Logistic Search (CLS) is used to enhance the best set of solutions found by DE. For the purpose of comparison, DEMC is compared with the basic DE, Differential Evolution Simulated Annealing (DESA) and Differential Evolution K-Means (DEKM) methods, as well as with traditional partitioning clustering using K-means. DEMC is also compared with the recently proposed Chaotic Gradient Artificial Bee Colony (CGABC) document clustering method. The Reuters-21578, a pair of the 20-news group, Classic 3 and TDT benchmark collection (TDT5) datasets, along with real-world six-event-crimes datasets, are used in the experiments. The results showed that the proposed DEMC outperformed the other methods in terms of the convergence rate measured by the fitness function (ADDC) and the compactness of the resulting clusters measured by the F-macro and F-micro measures.
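
    To illustrate the two ingredients named above, the following is a rough sketch of an ADDC-style fitness and a chaotic local search driven by the logistic map. It is not the authors' implementation. It assumes Euclidean distance, nearest-centroid assignment, a logistic map with parameter 4, and a fixed search radius around the incumbent solution. The function names, the `radius` and `steps` parameters, and the toy data are all hypothetical.

```python
import numpy as np


def addc(docs, centroids):
    """Average distance of documents to their cluster centroid.

    Each document is assigned to its nearest centroid; distances are
    averaged within each non-empty cluster, then across clusters.
    Assumption: Euclidean distance (the abstract does not specify).
    """
    docs = np.asarray(docs, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    nearest = dists.min(axis=1)
    per_cluster = [nearest[labels == k].mean()
                   for k in range(len(centroids)) if np.any(labels == k)]
    return float(np.mean(per_cluster))


def chaotic_local_search(best, fitness, steps=50, radius=0.5, seed=0):
    """Refine `best` with perturbations generated by the logistic map.

    The chaotic variables z are iterated as z <- 4 z (1 - z) and mapped
    to offsets in [-radius, radius]; a candidate replaces the incumbent
    only if it lowers the fitness (minimisation).
    """
    best = np.asarray(best, dtype=float).copy()
    best_fit = fitness(best)
    z = np.random.default_rng(seed).uniform(0.01, 0.99, size=best.shape)
    for _ in range(steps):
        z = 4.0 * z * (1.0 - z)
        candidate = best + radius * (2.0 * z - 1.0)
        cand_fit = fitness(candidate)
        if cand_fit < best_fit:
            best, best_fit = candidate, cand_fit
    return best, best_fit


if __name__ == "__main__":
    docs = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    start = np.array([[1.0, 0.0], [9.0, 3.0]])  # deliberately off-centre
    refined, fit = chaotic_local_search(start, lambda c: addc(docs, c))
    print(addc(docs, start), fit)
```

    In a memetic scheme of this kind, the global search (here DE) would supply `best`, and the local search would be applied to it between generations.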

    Enhancing digital forensic analysis using memetic algorithm feature selection method for document clustering

    Text clustering is an effective aid to crime investigation because it groups crime-related documents. This paper proposes a Memetic Algorithm Feature Selection (MAFS) approach to enhance the performance of document clustering algorithms used to partition crime reports and criminal news as well as some benchmark text datasets. Two clustering algorithms were selected to demonstrate the effectiveness of the proposed MAFS method: k-means and Spherical k-means (Spk). These clustering methods were chosen in order to observe their performance before and after applying a hybrid FS that uses a Memetic scheme. The proposed MAFS method combines a Genetic Algorithm-based wrapper FS with the Relief-F filter. The performance evaluation was based on the clustering outcomes before and after applying the proposed MAFS method. The test results showed that the performance of both k-means and Spk improved after applying MAFS.

    Text document clustering using memetic feature selection

    With the rapid growth in the volume of electronic documents, more sophisticated machine learning methods are needed to manage them. In this paper, a Memetic feature selection technique is proposed to improve the k-means and spherical k-means clustering algorithms. The proposed Memetic feature selection technique combines the wrapper inductive method with the filter ranking method. Internal and external clustering evaluation measures are used to assess the resulting clusters. The test results showed that, after using the proposed hybrid method, the resulting clusters were more accurate and more compact than the clusters obtained using the GA-selected features or the entire feature space.

    Adaptive crossover memetic differential harmony search for optimizing document clustering

    An Adaptive Crossover Memetic Differential Harmony Search (ACMDHS) method was developed for optimizing document clustering in this paper. Because of the complexity of the documents available today, allocating the centroids of the document clusters and finding the optimum clusters in the search space have become more difficult. One possible enhancement to document clustering is the use of the Harmony Search (HS) algorithm to optimize the search. As HS is highly dependent on its control parameters, a differential version of HS was introduced. In the modified version of HS, the Band Width (BW) parameter was replaced by another pitch adjustment technique because of the sensitivity of the BW parameter: the Differential Evolution (DE) mutation was used instead. The DE crossover was also used with the Differential HS for further search space exploitation, and the resulting global search is named Crossover DHS (CDHS). Moreover, the DE crossover (Cr) and mutation (F) parameters are dynamically tuned through the generations. Memetic optimization was used to enhance the local search capability of CDHS. The proposed ACMDHS was compared with other document clustering techniques using the HS, DHS, and K-means methods. It was also compared with its two other variants, the Memetic DHS (MDHS) and the Crossover Memetic Differential Harmony Search (CMDHS). In addition, two state-of-the-art clustering methods were considered in the comparisons: the Chaotic Gradient Artificial Bee Colony (CGABC) and the Differential Evolution Memetic Clustering (DEMC). The experimental results showed that the CMDHS variant (the non-adaptive version of ACMDHS) and ACMDHS were highly competitive with each other, and both were superior to all other methods.
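
    As an illustration of the idea of replacing the BW-based pitch adjustment with a DE-style mutation, the following sketch improvises one new harmony. It is an assumption-laden example, not the paper's algorithm: the function name `improvise`, the parameter names and defaults (`hmcr`, `par`, `f`), and the rule of adding a scaled difference of two randomly chosen memory members are all hypothetical choices made for demonstration.

```python
import numpy as np


def improvise(memory, lower, upper, hmcr=0.9, par=0.3, f=0.5, rng=None):
    """Build one new harmony from the harmony memory.

    memory: (harmony memory size x dimensions) array of stored solutions.
    hmcr:   probability of taking a value from memory.
    par:    probability of pitch-adjusting a value taken from memory.
    f:      DE scaling factor applied to the difference of two members.
    """
    rng = np.random.default_rng() if rng is None else rng
    memory = np.asarray(memory, dtype=float)
    hms, dim = memory.shape
    new = np.empty(dim)
    for j in range(dim):
        if rng.random() < hmcr:
            # Memory consideration: copy component j from a random member.
            new[j] = memory[rng.integers(hms), j]
            if rng.random() < par:
                # Pitch adjustment via a DE-style difference instead of BW.
                r1, r2 = rng.choice(hms, size=2, replace=False)
                new[j] += f * (memory[r1, j] - memory[r2, j])
        else:
            # Random selection within the search bounds.
            new[j] = rng.uniform(lower, upper)
    # Keep the harmony inside the search bounds.
    return np.clip(new, lower, upper)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    memory = rng.uniform(0.0, 1.0, size=(5, 3))
    print(improvise(memory, 0.0, 1.0, rng=rng))
```

    Because the step size comes from differences between stored harmonies, it shrinks as the memory converges, which removes the need to hand-tune a bandwidth value.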