107 research outputs found

    Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach

    Defining the correct number of clusters is one of the most fundamental tasks in graph clustering. For large graphs, this task becomes more challenging because of the lack of prior information. This paper presents an approach to this problem based on the Bat Algorithm, one of the most promising swarm intelligence algorithms. We call our solution Bat-Cluster (BC). The approach automates graph clustering by balancing global and local search processes. Simulations on four benchmark graphs of different sizes show that the proposed algorithm is efficient, can provide higher precision, and can exceed some best-known values.

    Adaptive firefly algorithm for hierarchical text clustering

    Text clustering is used by search engines to increase recall and precision in information retrieval. Because search engines operate on Internet content that is constantly being updated, there is a need for a clustering algorithm that groups items automatically without prior knowledge of the collection. Existing clustering methods have problems in determining the optimal number of clusters and in producing compact clusters. In this research, an adaptive hierarchical text clustering algorithm based on the Firefly Algorithm is proposed. The proposed Adaptive Firefly Algorithm (AFA) consists of three components: document clustering, cluster refining, and cluster merging. The first component introduces the Weight-based Firefly Algorithm (WFA), which automatically identifies initial centers and their clusters for any given text collection. To refine the obtained clusters, a second algorithm, termed Weight-based Firefly Algorithm with Relocate (WFAR), is proposed. This approach allows a pre-assigned document to be relocated into a newly created cluster. The third component, Weight-based Firefly Algorithm with Relocate and Merging (WFARM), aims to reduce the number of produced clusters by merging non-pure clusters into the pure ones. Experiments were conducted to compare the proposed algorithms against seven existing methods. The percentage of success in obtaining the optimal number of clusters by AFA is 100%, with purity and f-measure of 83% higher than the benchmarked methods. For the entropy measure, AFA produced the lowest value (0.78) among the compared methods. The results indicate that the Adaptive Firefly Algorithm can produce compact clusters. This research contributes to the text mining domain, as hierarchical text clustering facilitates document indexing and information retrieval.

    ANALISIS CLUSTER OTOMATIS MENGGUNAKAN ALGORITMA NOVEL MODIFIED DIFFERENTIAL EVOLUTION

    Cluster analysis is an important unsupervised learning problem and data mining technique. However, determining the final number of clusters is a challenging task. This study therefore proposes a novel modified differential evolution (NMDE) algorithm combined with the k-means algorithm (NMDE-k-means) for automatic cluster analysis. The algorithm determines the final number of clusters and groups the data automatically. In principle, NMDE performs a global search to find the number of clusters and the data partition, while k-means improves the performance of NMDE in determining the cluster centroids. Four well-known data sets, Iris, Wine, Glass, and Vowel, are used to validate the algorithm. The computational results show that it performs better than four other automatic clustering algorithms: improved automatic clustering based differential evolution (ACDE), automatic clustering using differential evolution and k-means (ACDE-k-means), and automatic clustering algorithms based on particle swarm optimization (PSO) and genetic algorithm (GA).

    Consensus clustering with differential evolution

    Consensus clustering algorithms are used to improve properties of traditional clustering methods, especially their accuracy and robustness. In this article, we introduce an approach that refines the set of initial partitions and uses a differential evolution algorithm to find the most valid solution. Properties of the algorithm are demonstrated on four benchmark datasets.

    Evaluation of Differential Evolution Algorithm with Various Mutation Strategies for Clustering Problems

    Pattern recognition based on Evolutionary Algorithms (EAs) has emerged as an alternative solution to data analysis problems, enhancing the efficiency and accuracy of mining processes. Differential Evolution (DE) is a competitive and powerful instance of EAs, and it has been used successfully for cluster analysis in recent years. The mutation strategy, one of the main processes of DE, uses scaled differences of individuals chosen randomly from the population to generate a mutant (trial) vector. The success of the DE algorithm in solving optimization problems relies heavily on the adopted mutation strategy. This paper presents an empirical study of the effectiveness of six frequently used mutation strategies for solving clustering problems. The experimental tests were conducted on the most widely used data sets for EA-based clustering, and the quality of cluster solutions and the convergence characteristics of the DE variants were evaluated. The results indicate that mutation strategies that use guidance information from the best solution manage to find more stable results, whereas random mutation strategies are able to find high-quality solutions with a slower convergence rate. This study aims to provide information and insights for developing better DE mutation schemes for clustering.
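
The contrast between random and best-guided mutation described in this abstract can be illustrated with a minimal sketch. It assumes the two textbook strategies usually written DE/rand/1 and DE/best/1; the abstract does not name the six strategies it evaluates, so the choice of these two, the function names, and the `scale` parameter (the usual factor F) are illustrative assumptions rather than the authors' code.

```python
import random


def mutate_rand_1(population, target_idx, scale, rng):
    """DE/rand/1 (assumed textbook form): v = x_r1 + F * (x_r2 - x_r3).

    r1, r2, r3 are distinct random indices, all different from target_idx.
    """
    candidates = [i for i in range(len(population)) if i != target_idx]
    r1, r2, r3 = rng.sample(candidates, 3)
    a, b, c = population[r1], population[r2], population[r3]
    return [ai + scale * (bi - ci) for ai, bi, ci in zip(a, b, c)]


def mutate_best_1(population, fitness, target_idx, scale, rng):
    """DE/best/1 (assumed textbook form): v = x_best + F * (x_r1 - x_r2).

    The best individual is the one with the lowest fitness (minimisation).
    """
    best_idx = min(range(len(population)), key=lambda i: fitness[i])
    candidates = [
        i for i in range(len(population)) if i not in (target_idx, best_idx)
    ]
    r1, r2 = rng.sample(candidates, 2)
    best, a, b = population[best_idx], population[r1], population[r2]
    return [xi + scale * (ai - bi) for xi, ai, bi in zip(best, a, b)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical population: each individual is a flat vector, e.g. encoded centroids.
    pop = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(6)]
    fit = [sum(x) for x in pop]  # placeholder fitness, not a clustering index
    print(mutate_rand_1(pop, 0, 0.5, rng))
    print(mutate_best_1(pop, fit, 0, 0.5, rng))
```

In a clustering setting each individual would typically encode a set of cluster centroids, and the fitness would be a cluster validity measure; both are left as placeholders here.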

    Text documents clustering using modified multi-verse optimizer

    In this study, a multi-verse optimizer (MVO) is utilised for the text document clustering (TDC) problem. TDC is treated as a discrete optimization problem, and an objective function based on the Euclidean distance is applied as the similarity measure. TDC divides the documents into clusters such that documents belonging to the same cluster are similar, whereas those belonging to different clusters are dissimilar. MVO, a recent metaheuristic optimization algorithm established for continuous optimization problems, can intelligently navigate different areas of the search space and search deeply in each area using a particular learning mechanism. The proposed algorithm is called MVOTDC, and it adapts the convergence behaviour of MVO operators to deal with discrete, rather than continuous, optimization problems. To evaluate MVOTDC, a comprehensive comparative study is conducted on six text document datasets with various numbers of documents and clusters. The quality of the final results is assessed using precision, recall, F-measure, entropy, accuracy, and purity measures. Experimental results reveal that the proposed method performs competitively in comparison with state-of-the-art algorithms. Statistical analysis is also conducted and shows that MVOTDC can produce significant results in comparison with three well-established methods.
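
Several abstracts in this list report purity as a quality measure. As a rough illustration, the sketch below computes purity under its common definition: each cluster is credited with its most frequent true class, and the credited counts are divided by the total number of documents. The function name and the toy labels are assumptions for illustration; the papers may compute the measure with different conventions.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Purity = (1 / N) * sum over clusters of the majority-class count."""
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not cluster_labels:
        raise ValueError("label lists must not be empty")
    # Count true classes inside each cluster.
    by_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_labels)


if __name__ == "__main__":
    # Toy example: cluster 0 holds two "sport" documents and one "tech" document.
    print(purity([0, 0, 0, 1], ["sport", "sport", "tech", "tech"]))
```

Purity rises trivially as the number of clusters grows, which is one reason it is usually reported alongside measures such as entropy and F-measure.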

    Development of a R package to facilitate the learning of clustering techniques

    This project explores the development of a tool, in the form of an R package, to ease the process of learning clustering techniques: how they work and what their pros and cons are. The tool should provide implementations of several clustering techniques together with explanations, so that students can become familiar with the characteristics of each algorithm by testing it against several datasets while deepening their understanding through the explanations. Additionally, these explanations should adapt to the input data, making the tool suitable not only for self-regulated learning but also for teaching. Grado en Ingeniería Informática

    GF-CLUST: A nature-inspired algorithm for automatic text clustering

    Text clustering is the task of grouping similar documents into a cluster while assigning dissimilar ones to other clusters. A well-known clustering method, the K-means algorithm, is extensively employed in many disciplines. However, determining the number of clusters is a major challenge when using K-means. This paper presents a new clustering algorithm, termed Gravity Firefly Clustering (GF-CLUST), that utilizes the Firefly Algorithm for dynamic document clustering. GF-CLUST can identify the appropriate number of clusters for a given text collection, which is a challenging problem in document clustering. It selects documents having strong force as centers and creates clusters based on cosine similarity. This is followed by selecting potential clusters and merging small clusters into them. Experiments on various document datasets, such as 20 Newsgroups, Reuters-21578, and the TREC collection, are conducted to evaluate the performance of the proposed GF-CLUST. In terms of purity, F-measure, and entropy, GF-CLUST outperforms existing clustering techniques such as K-means, Particle Swarm Optimization (PSO), and the Practical General Stochastic Clustering Method (pGSCM). Furthermore, the number of clusters obtained by GF-CLUST is closer to the actual number of clusters than that obtained by pGSCM.
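
The cluster-creation step mentioned in this abstract, grouping documents around chosen centers by cosine similarity, can be sketched as follows. This is only an illustration under stated assumptions: the centers are supplied by hand rather than selected by the force computation of GF-CLUST, documents are plain term-frequency vectors, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_to_centers(docs, center_indices):
    """Label each document with the position of its most similar center document."""
    labels = []
    for doc in docs:
        sims = [cosine_similarity(doc, docs[c]) for c in center_indices]
        labels.append(max(range(len(sims)), key=lambda k: sims[k]))
    return labels


if __name__ == "__main__":
    # Hypothetical term-frequency vectors over a three-word vocabulary.
    docs = [[1, 0, 0], [2, 0, 0], [0, 1, 1], [0, 2, 1]]
    print(assign_to_centers(docs, center_indices=[0, 2]))
```

Cosine similarity ignores vector length, so a long and a short document with the same word proportions are treated as identical, which is the usual motivation for preferring it over Euclidean distance on raw term counts.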