
    A study of the effects of clustering and local search on radio network design: evolutionary computation approaches

    Eighth International Conference on Hybrid Intelligent Systems, Barcelona, 10-12 September 2008. The goal of this paper is twofold. First, we study how efficiently evolutionary computation techniques can solve the radio network design problem. To this end, we test several evolutionary computation techniques within the OPLINK experimental framework and compare them. Second, we propose a clustering approach and a 2-OPT local search to improve the results obtained by the evolutionary algorithms. The experiments provide empirical evidence that clustering-based techniques improve all the algorithms tested. Extensive computational tests, including ones without clustering and 2-OPT, are performed with three evolutionary algorithms: genetic algorithms, memetic algorithms and chromosome appearance probability matrix algorithms.

    A new unsupervised feature selection method for text clustering based on genetic algorithms

    Nowadays a vast amount of textual information is collected and stored in various databases around the world, including the Internet as the largest database of all. This rapid growth of published text means that even the most avid reader cannot hope to keep up with all the reading in a field, and consequently nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature. Text mining offers a solution to this problem by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. It involves analyzing a large collection of documents to discover previously unknown information. Text clustering is one of the most important areas in text mining; it includes text preprocessing, dimension reduction by selecting some terms (features), and finally clustering using the selected terms. Feature selection appears to be the most important step in the process. Conventional unsupervised feature selection methods define a measure of the discriminating power of terms in order to select proper terms from the corpus. However, the evaluation of terms in groups has not yet been investigated in reported work. In this paper a new and robust unsupervised feature selection approach is proposed that evaluates terms in groups. In addition, a new Modified Term Variance measuring method is proposed for evaluating groups of terms. Furthermore, a genetic based algorithm is designed and implemented for finding the most valuable groups of terms based on the new measure. These terms are then used to generate the final feature vector for the clustering process. In order to evaluate and justify our approach, the proposed method and a conventional term variance method are implemented and tested using the Reuters-21578 corpus collection. For a more accurate comparison, the methods have been tested on three corpora, and for each corpus the clustering task has been run ten times and the results averaged. The results of comparing these two methods are very promising and show that our method produces better average accuracy and F1-measure than the conventional term variance method.
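
    The conventional term variance baseline mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it uses one common formulation of term variance (the sum over documents of the squared deviation of a term's frequency from its mean frequency), whitespace tokenisation, and hypothetical function names `term_variance` and `select_terms`. It does not reproduce the Modified Term Variance measure or the group-based genetic search, which the abstract does not define.

```python
from collections import Counter


def term_variance(docs):
    """Score each term by the sum over documents of (tf - mean tf)^2.

    Assumption: plain whitespace tokenisation and raw term counts.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = set().union(*counts)
    n = len(docs)
    scores = {}
    for term in vocab:
        freqs = [c[term] for c in counts]  # Counter returns 0 for absent terms
        mean = sum(freqs) / n
        scores[term] = sum((f - mean) ** 2 for f in freqs)
    return scores


def select_terms(docs, k):
    """Keep the k highest-variance terms (ties broken alphabetically)."""
    scores = term_variance(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = ["a a b", "b", "a b c"]
print(select_terms(docs, 2))
```

    A term that occurs equally often in every document (here `b`) scores zero, which is why such a measure discards it as non-discriminating.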

    Fuzzy clustering of univariate and multivariate time series by genetic multiobjective optimization

    Given a set of time series, it is of interest to discover subsets that share similar properties. For instance, this may be useful for identifying and estimating a single model that may conveniently fit several time series, instead of performing the usual identification and estimation steps for each one. On the other hand, time series in the same cluster are related with respect to the measures assumed for cluster analysis and are suitable for building multivariate time series models. Though many approaches to clustering time series exist, in this view the most effective method seems to rely on choosing some features relevant to the problem at hand and seeking clusters according to their measurements, for instance the autoregressive coefficients, spectral measures or the eigenvectors of the covariance matrix. Some new indexes based on goodness-of-fit criteria are proposed in this paper for fuzzy clustering of multivariate time series. A general purpose fuzzy clustering algorithm may be used to estimate the proper cluster structure according to some internal criteria of cluster validity. Such indexes are known to measure definite, often conflicting, cluster properties: compactness or connectedness, for instance, or distribution, orientation, size and shape. It is argued that multiobjective optimization supported by genetic algorithms is a most effective choice in such a difficult context. In this paper we use the Xie-Beni index and the C-means functional as objective functions to evaluate the cluster validity in a multiobjective optimization framework. The concept of Pareto optimality in multiobjective genetic algorithms is used to evolve a set of potential solutions towards a set of optimal non-dominated solutions. Genetic algorithms are well suited to difficult optimization problems where the objective functions do not usually have good mathematical properties such as continuity, differentiability or convexity. In addition, genetic algorithms, as population based methods, may yield a complete Pareto front at each step of the iterative evolutionary procedure. The method is illustrated by means of a set of real data and an artificial multivariate time series data set.
    Keywords: Fuzzy clustering, Internal criteria of cluster validity, Genetic algorithms, Multiobjective optimization, Time series, Pareto optimality
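
    The two objective functions and the Pareto filtering named in this abstract can be sketched as below. This is a hedged illustration, not the authors' implementation: it assumes each time series has already been reduced to a numeric feature vector, uses the standard textbook forms of the fuzzy C-means functional and the Xie-Beni index with fuzzifier `m`, and treats both objectives as quantities to minimise. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_functional(data, centers, u, m=2.0):
    """Fuzzy C-means functional: sum_i sum_k u[i][k]^m * ||x_k - v_i||^2.

    u[i][k] is the membership of point k in cluster i.
    """
    return sum(u[i][k] ** m * sq_dist(x, v)
               for i, v in enumerate(centers)
               for k, x in enumerate(data))


def xie_beni(data, centers, u, m=2.0):
    """Xie-Beni index: compactness divided by n times the minimum
    squared separation between cluster centers (lower is better)."""
    sep = min(sq_dist(centers[i], centers[j])
              for i in range(len(centers))
              for j in range(i + 1, len(centers)))
    return fcm_functional(data, centers, u, m) / (len(data) * sep)


def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the non-dominated objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


data = [(0.0,), (1.0,), (4.0,), (5.0,)]
centers = [(0.5,), (4.5,)]
u = [[1, 1, 0, 0], [0, 0, 1, 1]]  # crisp memberships, for illustration only
print(fcm_functional(data, centers, u), xie_beni(data, centers, u))
```

    In a multiobjective genetic algorithm, each candidate partition would be scored with both functions and `pareto_front` applied to the resulting pairs at every generation.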

    Initial Centroid Determination Using Genetic Algorithm in Data Clustering

    K-Means clustering determines its initial centroids at random. Randomly generated centroids can leave K-Means trapped in a local optimum, which results in poor clustering quality. This work examines the effect of initial centroids obtained from a genetic algorithm, tested on data both with and without dimension reduction. With initial centroids obtained from the genetic algorithm, the quality of the clustering results increased by 54.9% on high-dimensional data and by 52.4% on data to which dimension reduction had been applied. This shows that K-Means clustering with initial centroids obtained from a genetic algorithm produces the best clusters, with significant results.
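
    A minimal sketch of the idea, assuming a very simple genetic algorithm: each individual is a list of `k` data points used as candidate centroids, fitness is the within-cluster sum of squared errors (lower is better), and the operators are tournament selection, one-point crossover, single-gene mutation and elitism. The abstract does not specify the encoding or operators, so these choices and all names and parameter values are assumptions for illustration.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(data, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)


def ga_initial_centroids(data, k, pop_size=20, generations=30,
                         mutation_rate=0.2, seed=0):
    """Evolve a set of k data points to use as initial centroids."""
    rng = random.Random(seed)
    population = [rng.sample(data, k) for _ in range(pop_size)]
    best = min(population, key=lambda ind: sse(data, ind))

    def pick():
        # Binary tournament selection on SSE.
        a, b = rng.sample(population, 2)
        return a if sse(data, a) <= sse(data, b) else b

    for _ in range(generations):
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, k) if k > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover (new list)
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice(data)
            children.append(child)
        population = children
        best = min(population, key=lambda ind: sse(data, ind))
    return best


def k_means(data, centroids, iterations=10):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [tuple(sum(col) / len(pts) for col in zip(*pts)) if pts else c
                     for pts, c in zip(clusters, centroids)]
    return centroids


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init = ga_initial_centroids(data, k=2)
print(init, k_means(data, init))
```

    Restricting candidate centroids to actual data points keeps the search space small; a real-valued encoding would be an equally plausible design.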

    An approach based on genetic algorithms for clustering classes in components

    The goal of this work is to create a model that allows software components (or subsystems, in the terminology of the Unified Process) to be identified from the design models, or more precisely, from the class diagrams (for the static aspects) and the interaction diagrams (for the dynamic aspects). The work also presents a genetic algorithm used for clustering classes into modules.

    Clustering for binary data sets by using genetic algorithm-incremental K-means

    This research was initially driven by the lack of clustering algorithms that focus specifically on binary data. To address this gap, a promising technique for analysing this type of data, namely Genetic Algorithms (GA), became the main subject of this research. GA was combined with the Incremental K-means (IKM) algorithm to cluster binary data streams. In GAIKM, the objective function is based on a few sufficient statistics that can be calculated easily and quickly on binary numbers. The use of IKM gives an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared to other clustering algorithms and to IKM itself. In conclusion, GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM) and K-means clustering, and paves the way for future research involving missing data and outliers.

    Development of genetic algorithm based classification and cluster analysis methods for analytical data

    Thesis (Doctoral), İzmir Institute of Technology, Chemistry, İzmir, 2009. Includes bibliographical references (leaves 151-158). Text in English; abstract in Turkish and English. xviii, 158 leaves. This study aimed to develop genetic algorithm based classification and clustering methods for spectral data. The developed methods hybridize a nature-inspired algorithm (genetic algorithms, GAs) with other classification or clustering methods. The first method was genetic algorithm based principal component analysis (GAPCAD), and the second was genetic algorithm based discriminant analysis (GADA). Both methods were applied to achieve the best discrimination between olive oil and vegetable oil samples. The samples were classified directly from their spectral data, obtained using near infrared spectrometry, Fourier transform infrared (FTIR) spectrometry, and spectrofluorometry. The GA was used to optimize the performance of the classification or clustering techniques on the training set, in order to maximize the correct classification of acceptable and unacceptable samples, or of samples with dissimilar properties, and to reduce the spectral data by wavelength selection. After GA optimization, the classification results for the training set were checked against a validation set. Lastly, the performance of both algorithms was compared to the results of PCA and SIMCA.

    A general framework of multi-population methods with clustering in undetectable dynamic environments

    Copyright @ 2011 IEEE. To solve dynamic optimization problems, multi-population methods are used to enhance the population diversity of an algorithm, with the aim of maintaining multiple populations in different sub-areas of the fitness landscape. Many experimental studies have shown that locating and tracking multiple relatively good optima, rather than a single global optimum, is an effective idea in dynamic environments. However, several challenges need to be addressed when multi-population methods are applied, e.g., how to create multiple populations, how to maintain them in different sub-areas, and how to deal with situations where changes cannot be detected or predicted. To address these issues, this paper investigates a hierarchical clustering method to locate and track multiple optima for dynamic optimization problems. To deal with undetectable dynamic environments, this paper applies the random immigrants method without change detection, based on a mechanism that can automatically reduce redundant individuals in the search space throughout the run. These methods are implemented in several research areas, including particle swarm optimization, genetic algorithms, and differential evolution. An experimental study is conducted on the moving peaks benchmark to compare performance with several other algorithms from the literature. The experimental results show the efficiency of the clustering method for locating and tracking multiple optima in comparison with other multi-population algorithms on the moving peaks benchmark.
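
    The random immigrants idea can be sketched in its generic textbook form: at every generation a fixed fraction of the population is replaced by freshly generated individuals, so diversity is maintained without any change detection. This is an assumption-laden illustration only. The paper's variant ties the replacement to a redundancy-reduction mechanism that the abstract does not describe, and the function name, the `ratio` parameter and the choice to replace the worst individuals are all hypothetical.

```python
import random


def random_immigrants(population, fitness, new_individual, ratio=0.2, rng=random):
    """Replace the worst `ratio` fraction of the population with random
    newcomers. Fitness is maximised; no change detection is involved."""
    n_new = int(len(population) * ratio)
    keep = len(population) - n_new
    survivors = sorted(population, key=fitness, reverse=True)[:keep]
    immigrants = [new_individual(rng) for _ in range(n_new)]
    return survivors + immigrants


# Toy usage: individuals are numbers in [0, 1] and fitness is the value itself.
pop = [0.1, 0.9, 0.4, 0.7, 0.2]
pop = random_immigrants(pop, fitness=lambda x: x,
                        new_individual=lambda rng: rng.random(), ratio=0.4)
print(len(pop))
```

    Calling this once per generation keeps the population size constant while continually injecting new search points.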