
    Clustering analysis of railway driving missions with niching

    A large number of applications require classifying or grouping data into a set of categories or clusters. The most popular clustering techniques for this purpose are K-means clustering and hierarchical clustering. However, both methods require the number of clusters to be set a priori. In this paper, a clustering method based on a niching genetic algorithm is presented, with the aim of finding the best compromise between maximizing the inter-cluster distance and minimizing the intra-cluster distance. This method is applied to three clustering benchmarks and to the classification of driving missions for railway applications.
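
The trade-off named in this abstract, between inter-cluster distance and intra-cluster distance, can be illustrated with a small scoring function. This is only a sketch: the abstract does not give the paper's fitness function, so the ratio used in `compromise_score`, the helper names, and the toy data are all assumptions made for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def intra_cluster_distance(clusters):
    """Mean distance from each point to the centroid of its own cluster."""
    total, count = 0.0, 0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) for p in pts)
        count += len(pts)
    return total / count

def inter_cluster_distance(clusters):
    """Smallest distance between any two cluster centroids (needs >= 2 clusters)."""
    cents = [centroid(pts) for pts in clusters]
    return min(math.dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])

def compromise_score(clusters):
    """Hypothetical fitness: higher when clusters are far apart and compact."""
    return inter_cluster_distance(clusters) / intra_cluster_distance(clusters)

# Two ways of splitting the same four points into two clusters.
good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]   # compact, well separated
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]    # stretched, overlapping
print(compromise_score(good), compromise_score(bad))
```

A genetic algorithm would use a score of this kind as the fitness of a candidate partition. The niching mechanism that maintains several candidate cluster counts at once is not shown here.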

    Hybrid optimization for k-means clustering learning enhancement

    In recent years, combinatorial optimization has been recognized as a critical problem in clustering algorithms, which must partition data in a way that optimizes clustering performance. The K-means algorithm is one of the best-known and most popular clustering algorithms: it is simple to implement and can solve the optimization problem with little extra information. However, the K-means algorithm suffers from a high error rate, high intra-cluster distance, and low accuracy. Researchers have therefore worked on these problems computationally, creating efficient solutions that lead to better data analysis through the K-means clustering algorithm. The aim of this study is to improve the accuracy of the K-means algorithm using hybrid and meta-heuristic methods. To this end, a meta-heuristic approach was proposed for hybridizing the K-means algorithm. It obtained better results by developing a hybrid Genetic Algorithm-K-means (GAK-means) method and a hybrid Particle Swarm Optimization-K-means (PSO-K-means) method. Finally, the meta-heuristics Genetic Algorithm-Particle Swarm Optimization (GAPSO) and Particle Swarm Optimization-Genetic Algorithm (PSOGA) were proposed on top of the K-means algorithm. The study proceeded in three phases. First, it developed a hybrid GA-based K-means algorithm with a new crossover algorithm based on the range of attributes, in order to decrease the number of errors and increase the accuracy rate. Then, a hybrid PSO-based K-means algorithm was proposed with a new calculation function based on the range of the domain, in order to decrease the intra-cluster distance and increase the accuracy rate. Finally, two meta-heuristic algorithms, GAPSO-K-means and PSOGA-K-means, were introduced by combining the proposed algorithms to increase the number of correct answers and improve the accuracy rate. The approach was evaluated on six standard integer data sets provided by the University of California Irvine (UCI). The findings confirmed that the hybrid optimization approach enhanced the performance of the K-means clustering algorithm. Although both GA-K-means and PSO-K-means improved on the results of the K-means algorithm, the GAPSO-K-means and PSOGA-K-means meta-heuristic algorithms outperformed the hybrid approaches. PSOGA-K-means was 5%-10% more accurate than the other methods on all data sets. The approach adopted in this study successfully increased the accuracy rate of the clustering analysis and decreased its error rate and intra-cluster distance.

    Incremental Genetic K-means Algorithm and its Application in Gene Expression Data Analysis

    Background: In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, and SOM, genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective clustering methods must be developed to process this growing amount of biological data. Results: In this paper, we propose a new clustering algorithm, the Incremental Genetic K-means Algorithm (IGKA). IGKA is an extension of our previously proposed clustering algorithm, the Fast Genetic K-means Algorithm (FGKA). IGKA outperforms FGKA when the mutation probability is small. The main idea of IGKA is to calculate the objective value, the Total Within-Cluster Variation (TWCV), and the cluster centroids incrementally whenever the mutation probability is small. IGKA inherits the salient feature of FGKA of always converging to the global optimum. A C program is freely available at http://database.cs.wayne.edu/proj/FGKA/index.htm. Conclusions: Our experiments indicate that, while the IGKA algorithm has a convergence pattern similar to FGKA, it has better time performance once the mutation probability decreases to some point. Finally, we used IGKA to cluster a yeast dataset and found that it increased the enrichment of genes of similar function within the cluster.
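
The idea of updating centroids and TWCV incrementally can be sketched with per-cluster sufficient statistics: a point count, a coordinate sum, and a sum of squared norms. The code below is not the authors' C program. The class name `ClusterStats`, its methods, and the toy data are hypothetical and only show why moving a single point does not require a full recomputation.

```python
class ClusterStats:
    """Per-cluster count, coordinate sum, and sum of squared norms."""

    def __init__(self, points, labels, k):
        dim = len(points[0])
        self.n = [0] * k                            # points per cluster
        self.s = [[0.0] * dim for _ in range(k)]    # coordinate sums
        self.q = [0.0] * k                          # sums of squared norms
        for p, lab in zip(points, labels):
            self._update(p, lab, +1)

    def _update(self, p, c, sign):
        # Add (sign=+1) or remove (sign=-1) one point in O(dim) time.
        self.n[c] += sign
        for d, v in enumerate(p):
            self.s[c][d] += sign * v
        self.q[c] += sign * sum(v * v for v in p)

    def move(self, p, src, dst):
        """Reassign point p from cluster src to cluster dst."""
        self._update(p, src, -1)
        self._update(p, dst, +1)

    def centroid(self, c):
        return [v / self.n[c] for v in self.s[c]]

    def twcv(self):
        # Sum over clusters of sum ||x||^2 - ||sum x||^2 / n, which equals the
        # total squared distance of points to their cluster centroids.
        total = 0.0
        for n, s, q in zip(self.n, self.s, self.q):
            if n:
                total += q - sum(v * v for v in s) / n
        return total


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
stats = ClusterStats(points, labels=[0, 0, 1, 1], k=2)
twcv_before = stats.twcv()
stats.move(points[1], src=0, dst=1)   # a "mutation" that reassigns one point
twcv_after = stats.twcv()
```

When only a few points change cluster, as happens with a small mutation probability, each change costs time proportional to the dimension rather than to the size of the dataset. That matches the abstract's remark that the incremental variant pays off when the mutation probability is small.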

    Clustering for binary data sets by using genetic algorithm-incremental K-means

    This research was initially driven by the lack of clustering algorithms that focus specifically on binary data. To close this gap, the research concentrated on a promising technique for analysing this type of data, namely Genetic Algorithms (GA). GA was combined with the Incremental K-means (IKM) algorithm to cluster binary data streams. In the resulting GAIKM algorithm, the objective function is based on a few sufficient statistics that can be calculated easily and quickly on binary numbers. Using IKM gives an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared to other clustering algorithms and to IKM itself. In conclusion, GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM), and K-means clustering, and it paves the way for future research involving missing data and outliers.

    Implementation of Feature Selection to Reduce the Number of Features in Determining the Initial Centroid of K-Means Algorithm

    Clustering is a data mining method that groups data based on its features or attributes. One reasonably popular clustering algorithm is K-Means. The K-Means algorithm is often optimized with methods such as the genetic algorithm (GA) to overcome the problem of choosing the initial random centroids. A large number of features in a dataset can reduce accuracy and increase the computation time of model execution. Feature selection reduces data dimensionality by removing the features that are less relevant for modeling. Therefore, this research implements feature selection on the K-Means algorithm optimized with the Dynamic Artificial Chromosome Genetic Algorithm (DAC GA). Experiments with ten datasets show that reducing the number of features with feature selection speeds up the computation time of the DAC GA to K-Means process by 17.5%. However, all experiments resulted in higher Sum of Square Distance (SSD) and Davies Bouldin Index (DBI) values in the clustering results with selected features.
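
For readers unfamiliar with the two quality measures named in this abstract, the sketch below computes both on toy data. It assumes SSD means the sum of squared distances from each point to its cluster centroid, which the abstract does not spell out, and it uses the standard Davies-Bouldin definition. The function names and the data are illustrative only.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def ssd(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total

def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / d(c_i, c_j)."""
    cents = [centroid(pts) for pts in clusters]
    scatter = [sum(math.dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# The same four points grouped two different ways.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]
print(ssd(tight), davies_bouldin(tight))
print(ssd(loose), davies_bouldin(loose))
```

Lower values are better for both measures, so the higher SSD and DBI reported with selected features indicate a loss in clustering quality alongside the gain in speed.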

    Application of data mining techniques in bioinformatics

    With the widespread use of databases and the explosive growth in their sizes, there is a need to utilize these massive volumes of data effectively. This is where data mining is useful: it searches databases to extract hidden patterns and hidden information, and it supports decision making and hypothesis testing. Bioinformatics, an emerging field that relies on large databases, can be searched effectively with data mining techniques to derive useful rules. Based on the type of knowledge that is mined, data mining techniques [1] can be broadly classified into association rules, decision trees, and clustering. Until recently, biology lacked the tools to analyze massive repositories of information such as the human genome database [3]. Data mining techniques are used effectively to extract meaningful relationships from these data. Data mining is especially used in microarray analysis, which studies the activity of different cells under different conditions. Two algorithms for each mining technique were implemented for a large database and compared with each other: 1. Association rule mining: (a) Apriori, (b) partition. 2. Clustering: (a) k-means, (b) k-medoids. 3. Classification rule mining: decision tree generation using (a) the Gini index, (b) the entropy value. Genetic algorithms were applied to the association and classification techniques. Further, the k-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering techniques [1] were applied to a microarray dataset and compared. The microarray dataset was downloaded from the internet using the Gene Array Analyzer Software (GAAS). The clustering was based on the signal color intensity of the genes in the microarray experiment. The following results were obtained: 1. Association: for smaller databases, the Apriori algorithm works better than the partition algorithm, and for larger databases the partition algorithm works better. 2. Clustering: with respect to the number of interchanges, the k-medoids algorithm works better than the k-means algorithm. 3. Classification: the results were similar for both indices (Gini index and entropy value). Applying a genetic algorithm improved the efficiency of the association and classification techniques. For the microarray dataset, DBSCAN was found to be less efficient than k-means when the database is small, but for a larger database DBSCAN is more accurate and efficient in terms of the number of clusters and the execution time. DBSCAN execution time increases linearly with database size and was much lower than that of k-means for the larger database. Because bioinformatics involves large datasets and the need to derive results from them, data mining techniques can be put to effective use in the field [2]. The techniques can be applied to find associations among genes, to cluster similar gene and protein sequences, and to draw decision trees that classify genes. Further, data mining techniques can be made more efficient by applying genetic algorithms, which greatly improve the search procedure and reduce the execution time.

    Image Grouping Based on Color and Texture Features Using FGKA Clustering (Fast Genetic K-Means Algorithm) for Image Matching

    Large collections of digital images are being created. Usually, the only way to search these collections has been through metadata (such as captions or keywords). This approach is ineffective and impractical, requires a large database, and gives inaccurate results. Recently, many image retrieval methods have been developed that use image content (color, shape, and texture), an approach known as Content-Based Image Retrieval (CBIR). The centroids produced by clustering HSV histogram and Gabor filter features with FGKA can be used as search parameters. FGKA combines the Genetic Algorithm and the K-means clustering algorithm, and it always converges to the global optimum. Image clustering and matching based on combined color and texture features perform better than approaches based on color only, on texture only, or on a non-clustering method. Keywords: Genetic Algorithm, K-Means Clustering, CBIR, HSV Histogram, Gabor Filter

    A multi-objective genetic graph-based clustering algorithm with memory optimization

    H. D. Menéndez, D. F. Barrero, and D. Camacho, "A multi-objective genetic graph-based clustering algorithm with memory optimization", in 2013 IEEE Congress on Evolutionary Computation (CEC), 2013, pp. 3174-3181.
    Clustering is one of the most versatile tools for data analysis. Over the last few years, clustering that seeks the continuity of data (as opposed to classical centroid-based approaches) has attracted increasing research interest. It is a challenging problem of remarkable practical interest. The most popular continuity clustering method is the Spectral Clustering algorithm, which is based on graph cuts: it first generates a Similarity Graph using a distance measure and then uses its Graph Spectrum to find the best cut. Memory consumption is a serious limitation of that algorithm: the Similarity Graph representation usually requires a very large matrix with a high memory cost. This work proposes a new algorithm, based on a previous implementation named Genetic Graph-based Clustering (GGC), that improves memory usage while maintaining the quality of the solution. The new algorithm, called Multi-Objective Genetic Graph-based Clustering (MOGGC), takes an evolutionary approach, introducing a Multi-Objective Genetic Algorithm to manage a reduced version of the Similarity Graph. The experimental validation shows that MOGGC increases memory efficiency while maintaining or improving the GGC results on the synthetic and real datasets used in the experiments. An experimental comparison with several classical clustering methods (EM, SC, and K-means) is included to show the efficiency of the proposed algorithm.
    This work has been partly supported by the Spanish Ministry of Science and Education under project TIN2010-19872.

    Implementation and Analysis of Content-Based Image Retrieval on X-Ray Images Using a Hierarchical Algorithm and the Fast Genetic K-Means Algorithm

    Image retrieval is the process of browsing, searching, and retrieving images from a large database of digital images. One well-known type of image retrieval is Content-Based Image Retrieval, which retrieves images using their visual features. One of the most important steps in a Content-Based Image Retrieval system is preprocessing in the form of image clustering, which speeds up retrieval from the image database and improves search accuracy. This final project uses a hierarchical algorithm and the Fast Genetic K-Means Algorithm for image clustering. Features are extracted from resized X-ray images using the Haar wavelet transform, and the images are then clustered by body part. Tests were carried out under several scenarios to examine the influence of the Fast Genetic K-Means operators on the TWCV value and on accuracy, and to evaluate the Content-Based Image Retrieval system using precision and recall. The results show that image clustering can be implemented using the hierarchical algorithm and the Fast Genetic K-Means Algorithm, with an accuracy of 83.75%, a precision of 0.72925, and a recall of 0.711. Keywords: image, clustering, Fast Genetic K-Means, image retrieval, hierarchical