79,128 research outputs found

    Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering

    Because common clustering algorithms use the vector space model (VSM) to represent documents, the conceptual relationships between related terms that do not literally co-occur are ignored. A genetic algorithm-based clustering technique, named GA clustering, is proposed in this article in conjunction with ontology to overcome this problem. In general, ontology measures can be partitioned into two categories: thesaurus-based methods and corpus-based methods. We take advantage of the hierarchical structure and the broad-coverage taxonomy of WordNet as the thesaurus-based ontology. However, the corpus-based method is rather complicated to handle in practical applications. We propose a transformed latent semantic analysis (LSA) model as the corpus-based method in this paper. Moreover, two hybrid strategies, which combine the various similarity measures, are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and the LSA-based method, apparently outperforms the same algorithm with other similarity measures. Moreover, the superiority of the proposed GA clustering algorithm over the commonly used k-means algorithm and the standard GA is demonstrated by improvements in clustering performance.
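
    The VSM limitation described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it only computes plain cosine similarity over raw term counts, and the example sentences and the function name `cosine_similarity` are assumptions chosen for illustration. Two documents built from synonyms share no literal terms, so their similarity is zero even though they are conceptually close.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents using raw term counts (plain VSM)."""
    vec_a = Counter(doc_a.lower().split())
    vec_b = Counter(doc_b.lower().split())
    # Only terms that literally occur in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example documents: the first pair uses synonyms only,
# the second pair shares two literal terms.
sim_synonyms = cosine_similarity("car engine repair", "automobile motor maintenance")
sim_overlap = cosine_similarity("car engine repair", "car engine oil")
print(sim_synonyms, sim_overlap)
```

    An ontology-based or LSA-based measure, as proposed in the abstract, would be expected to assign the synonym pair a non-zero similarity.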

    Optimizing Software Clustering using Hybrid Bee Colony Approach

    Maintenance is the most expensive and complicated phase of the software development lifecycle. It becomes more cumbersome if the architecture of the software system is not available. Search-based optimization has been found to be a very efficient technique for recovering the architecture of such a system. In this paper, we propose a technique that combines the artificial honey bee swarm intelligence algorithm with a genetic algorithm to recover this architecture, which helps software maintainers carry out maintenance efficiently and effectively. To evaluate the approach, it has been applied to a few real-world module clustering problems. The results we obtained support our claim that this approach produces significantly better architectures than the existing approaches. Keywords: Artificial bee colony algorithm, Genetic algorithm, Software clustering, Software Modularization

    Clustering with shallow trees

    We propose a new method for hierarchical clustering based on the optimisation of a cost function over trees of limited depth, and we derive a message-passing method that solves it efficiently. The method and algorithm can be interpreted as a natural interpolation between two well-known approaches, namely single linkage and the recently presented Affinity Propagation. Using this general scheme, we analyze three structured biological/medical datasets (human population based on genetic information, proteins based on sequences, and verbal autopsies) and show that the interpolation technique provides new insight. Comment: 11 pages, 7 figures

    Enhancement of K-Parameter Using Hybrid Stratified Sampling and Genetic Algorithm

    Clustering is a technique used to group data into clusters based on their similarities. K-means is a clustering algorithm that assigns each object to the cluster whose center is closest, so that the members of each group are as similar as possible. K-means is also the most widely used clustering algorithm because it is easy to implement. However, the initial centroids in K-means are still selected randomly, so K-means is often trapped in a local minimum. A genetic algorithm is used in this research as a metaheuristic that helps K-means reach the global optimum. In addition, stratified sampling is used, which divides the population into homogeneous areas using stratification variables. The validation value of the proposed method on the iris dataset is 0.417, while that of K-means is 0.662.
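
    The sensitivity to initial centroids mentioned in this abstract can be shown with a small sketch. This is not the paper's hybrid method: it is a plain one-dimensional Lloyd iteration, and the data, the two initialisations and the function name `kmeans_1d` are assumptions made for illustration. The sum of squared errors (SSE) returned here is the kind of quantity a genetic algorithm could use as a fitness value when searching over initial centroids.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Run Lloyd's k-means on 1-D points from the given initial centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse


# Hypothetical data: three well-separated groups on a line.
points = [0, 1, 10, 11, 20, 21]

# A poor initialisation converges to a local minimum ...
bad_centroids, bad_sse = kmeans_1d(points, [0, 1, 10])
# ... while a good one recovers the three groups.
good_centroids, good_sse = kmeans_1d(points, [0, 10, 20])
print(bad_centroids, bad_sse)
print(good_centroids, good_sse)
```

    Both runs converge, but to different solutions, which is why a search over starting centroids can help.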

    Feature Selection for Image Retrieval based on Genetic Algorithm

    This paper describes the development and implementation of feature selection for content-based image retrieval (CBIR). We are working on a CBIR system with a new, efficient technique. In this system, we use multi-feature extraction covering colour, texture and shape. Three techniques are used for feature extraction: colour moments, the gray-level co-occurrence matrix and the edge histogram descriptor. To reduce the curse of dimensionality and find the best features in the feature set, we use feature selection based on a genetic algorithm. These features are divided into similar image classes using clustering, for fast retrieval and improved execution time. Clustering is done by the k-means algorithm. The experimental results show that feature selection using GA reduces the retrieval time and also increases the retrieval precision, and thus gives better and faster results than a normal image retrieval system. The results also show the precision and recall of the proposed approach compared with a previous approach for each image class. The CBIR system is more efficient and performs better when using feature selection based on a genetic algorithm.

    An Improved Differential Evolution Algorithm for Data Stream Clustering

    A few algorithms have been implemented by researchers for clustering data streams. Most of these algorithms require that the number of clusters (K) be fixed by the user based on the input data and kept fixed throughout the clustering process. Stream clustering has faced difficulties in choosing K. In this paper, we propose an efficient approach for data stream clustering that adopts an Improved Differential Evolution (IDE) algorithm. The IDE algorithm is a fast, powerful and efficient global optimization approach for automatic clustering. In our proposed approach, we additionally apply an entropy-based method for detecting concept drift in the data stream and thereby updating the clustering procedure online. We compared our proposed method with a Genetic Algorithm and identified it as an efficient optimization algorithm. The performance of our proposed technique is evaluated, and it yields an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30% and an F-measure of 88.60%.
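
    The abstract does not say how its entropy-based drift detector works, so the following is only one possible sketch under stated assumptions: the stream is split into non-empty windows of categorical labels, the Shannon entropy of each window is computed, and drift is flagged when the entropy changes by more than a threshold between consecutive windows. The function names, the window contents and the threshold value of 0.5 are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the label distribution in a non-empty window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def detect_drift(windows, threshold=0.5):
    """Return indices of windows whose entropy differs from the previous
    window's entropy by more than `threshold` (an assumed, tunable value)."""
    drift_points = []
    previous = shannon_entropy(windows[0])
    for index, window in enumerate(windows[1:], start=1):
        current = shannon_entropy(window)
        if abs(current - previous) > threshold:
            drift_points.append(index)
        previous = current
    return drift_points


# Hypothetical stream: the label distribution changes at the third window.
windows = [
    ["a", "a", "a", "a"],
    ["a", "a", "a", "a"],
    ["a", "b", "a", "b"],
    ["a", "b", "b", "a"],
]
drift_points = detect_drift(windows)
print(drift_points)
```

    In an online setting, a flagged window would be the point at which the clustering is updated.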

    Multiobjective optimization of cluster measures in Microarray Cancer data using Genetic Algorithm Based Fuzzy Clustering

    The field of biological and biomedical research has changed rapidly with the invention of microarray technology, which facilitates simultaneous monitoring of a large number of genes across different experimental conditions. In this report, a multiobjective genetic algorithm approach based on the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is proposed for fuzzy clustering of microarray cancer expression datasets; it encodes the cluster modes and simultaneously optimizes two factors, the fuzzy compactness and the fuzzy separation of the clusters. The multiobjective technique produces a set of non-dominated solutions. The approach identifies the solution, i.e. the individual chromosome, that gives the optimal value of the parameters.
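
    The notion of a non-dominated set used in this abstract can be sketched as follows. This is not NSGA-II itself (it omits ranking into successive fronts, crowding distance and the genetic operators); it only filters candidate solutions by Pareto dominance. It assumes that compactness is minimised and separation is maximised, and the candidate values and function names are hypothetical.

```python
def dominates(a, b):
    """Return True if solution a dominates solution b.

    Each solution is a (compactness, separation) pair. Assumption:
    lower compactness is better and higher separation is better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical (compactness, separation) values for five candidate clusterings.
candidates = [(1.0, 5.0), (2.0, 6.0), (2.5, 5.5), (3.0, 4.0), (1.5, 5.0)]
front = non_dominated(candidates)
print(front)
```

    Every member of the resulting front is a trade-off between the two objectives, so a further criterion is needed to pick a single solution from it.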