197 research outputs found

    Multiobjective Particle Swarm Optimization Based on PAM and Uniform Design

    In MOPSO (multiobjective particle swarm optimization), maintaining or increasing the diversity of the swarm helps the algorithm escape local optima. To this end, the PAM (Partitioning Around Medoids) clustering algorithm and uniform design are introduced to maintain, respectively, the diversity of the Pareto optimal solutions and the uniformity of the selected Pareto optimal solutions. In this paper, a novel algorithm, the multiobjective particle swarm optimization based on PAM and uniform design, is proposed. It differs from other algorithms in that PAM and uniform design are introduced to MOPSO for the first time. Experimental results on several test problems illustrate that the proposed algorithm is efficient.

    Multi-objective evolutionary algorithms for data clustering

    In this work we investigate the use of Multi-Objective metaheuristics for the data-mining task of clustering. We first investigate methods of evaluating the quality of clustering solutions. We then propose a new Multi-Objective clustering algorithm driven by multiple measures of cluster quality, and finally investigate the performance of different Multi-Objective clustering algorithms. In the context of clustering, a robust measure for evaluating clustering solutions is an important component of an algorithm. These Cluster Quality Measures (CQMs) should rely solely on the structure of the clustering solution. A robust CQM should have three properties: it should be able to reward a "good" clustering solution; it should decrease in value monotonically as the solution quality deteriorates; and it should be able to evaluate clustering solutions with varying numbers of clusters. We review existing CQMs and present an experimental evaluation of their robustness. We find that measures based on connectivity are more robust than other measures for cluster evaluation. We then introduce a new Multi-Objective Clustering algorithm (MOCA). The use of Multi-Objective optimisation in clustering is desirable because it permits the incorporation of multiple measures of cluster quality. Since the definition of what constitutes a good clustering is far from clear, it is beneficial to develop algorithms that allow for multiple CQMs to be accommodated. The selection of the clustering quality measures to use as objectives for MOCA is informed by our previous work with internal evaluation measures. We explain the implementation details and perform experimental work to establish its worth. We compare MOCA with k-means and find some promising results. We find that MOCA can generate a pool of clustering solutions that is more likely to contain the optimal clustering solution than the pool of solutions generated by k-means.
    We also perform an investigation into the performance of different implementations of MOEA algorithms for clustering. We find that representations of clustering based around centroids and medoids produce more desirable clustering solutions and Pareto fronts. We also find that mutation operators that greatly disrupt the clustering solutions lead to better exploration of the Pareto front, whereas mutation operators that modify the clustering solutions in a more moderate way lead to higher quality clustering solutions. We then perform more specific investigations into the performance of mutation operators, focussing on operators that promote clustering solution quality, exploration of the Pareto front and a hybrid combination. We use a number of techniques to assess the performance of the mutation operators as the algorithms execute. We confirm that a disruptive mutation operator leads to better exploration of the Pareto front and that mutation operators that modify the clustering solutions more moderately lead to the discovery of higher quality clustering solutions. We find that our implementation of a hybrid mutation operator does not lead to a marked improvement with respect to the other mutation operators but does show promise for future work.

    An analytics-based heuristic decomposition of a bilevel multiple-follower cutting stock problem

    This paper presents a new class of multiple-follower bilevel problems and a heuristic approach to solving them. In this new class of problems, the followers may be nonlinear, do not share constraints or variables, and are at most weakly constrained. This allows the leader variables to be partitioned among the followers. We show that current approaches for solving multiple-follower problems are unsuitable for our new class of problems and instead we propose a novel analytics-based heuristic decomposition approach. This approach uses Monte Carlo simulation and k-medoids clustering to reduce the bilevel problem to a single level, which can then be solved using integer programming techniques. The examples presented show that our approach produces better solutions and scales up better than the other approaches in the literature. Furthermore, for large problems, we combine our approach with the use of self-organising maps in place of k-medoids clustering, which significantly reduces the clustering times. Finally, we apply our approach to a real-life cutting stock problem. Here a forest harvesting problem is reformulated as a multiple-follower bilevel problem and solved using our approach. This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/228

    A Hybrid Heuristic for the k-medoids Clustering Problem

    Clustering is an important tool for data analysis, since it allows the exploration of datasets with little or no prior information. Its main goal is to group a set of data based on their similarity (or dissimilarity). A well-known mathematical formulation for clustering is the k-medoids problem. Current versions of k-medoids rely on heuristics, with good results reported in the literature. However, few methods have been proposed to analyze the quality of the partitions found by these heuristics. In this paper, we propose a hybrid Lagrangian heuristic for the k-medoids problem. We compare the performance of the proposed Lagrangian heuristic with other heuristics for the k-medoids problem found in the literature. Experimental results show that the proposed Lagrangian heuristic outperformed the other algorithms. UNIFESP, Inst Ciencia & Tecnol, BR-12230280 Sao Jose Dos Campos, SP, Brazil
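
    To make the k-medoids formulation concrete, the sketch below computes the objective (total distance from each point to its nearest medoid) and runs a simple alternating assign/update heuristic. This is not the hybrid Lagrangian heuristic of the paper, whose details the abstract does not give; the function names, the naive initialisation, and the precomputed distance matrix are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def total_cost(dist: Sequence[Sequence[float]], medoids: Sequence[int]) -> float:
    """k-medoids objective: sum of distances from each point to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def k_medoids(dist: Sequence[Sequence[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], float]:
    """Alternating heuristic on a precomputed distance matrix (illustrative only)."""
    n = len(dist)
    medoids = list(range(k))  # naive initialisation: first k points (assumption)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters: Dict[int, List[int]] = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: in each cluster, pick the member that minimises
        # the summed distance to the other members.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(m)
                continue
            best = min(members, key=lambda c: sum(dist[c][j] for j in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break  # medoid set is stable
        medoids = new_medoids
    return sorted(medoids), total_cost(dist, medoids)
```

    Such a heuristic only reaches a local optimum, which is why bounds on partition quality, as a Lagrangian approach can provide, are of interest.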

    Clustering Algorithms: Their Application to Gene Expression Data

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a tremendous opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to identify the appropriate clustering technique, one that offers stability and a high degree of accuracy in its analysis procedure.

    Multi-Objective Differential Evolution for Automatic Clustering with Application to Micro-Array Data Analysis

    This paper applies the Differential Evolution (DE) algorithm to the task of automatic fuzzy clustering in a Multi-objective Optimization (MO) framework. It compares the performance of two multi-objective variants of DE on the fuzzy clustering problem, where two conflicting fuzzy validity indices are simultaneously optimized. The resultant Pareto optimal set of solutions from each algorithm consists of a number of non-dominated solutions, from which the user can choose the most promising ones according to the problem specifications. A real-coded representation of the search variables, accommodating a variable number of cluster centers, is used for DE. The performance of the multi-objective DE variants has also been contrasted with that of the two best-known schemes of MO clustering, namely the Non Dominated Sorting Genetic Algorithm (NSGA II) and Multi-Objective Clustering with an unknown number of Clusters K (MOCK). Experimental results using six artificial and four real-life datasets of varying complexity indicate that DE holds immense promise as a candidate algorithm for devising MO clustering schemes.
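
    The notion of a non-dominated set used above can be sketched as follows. The code assumes both objectives are minimised and treats each candidate clustering as a tuple of objective values; the function names are hypothetical, and the sketch shows only the Pareto filter, not the DE variants, NSGA II, or MOCK.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one
    (all objectives are assumed to be minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

    For example, among the hypothetical objective pairs (1, 5), (2, 3), (3, 4), (4, 1) and (5, 5), the filter keeps (1, 5), (2, 3) and (4, 1); the user would then choose among these trade-offs.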

    Graph Based Sequence Clustering Through Multiobjective Evolutionary Algorithms

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2008. This dissertation focuses on the clustering of sequences represented as pairwise similarities through multiobjective evolutionary algorithms. A set of sequences can be expressed as a weighted, undirected graph in which each sequence becomes a vertex and the pairwise similarities or dissimilarities form the weights of the edges connecting the corresponding vertices. Through this representation, the sequence clustering problem becomes equivalent to graph partitioning, which is an NP-hard problem and can be tackled with evolutionary algorithms. To cluster sequences, a graph-based multiobjective evolutionary algorithm is proposed. By changing the evolutionary operators, objective functions, genetic representation, initialization method and the basic components of the multiobjective evolutionary algorithm, different variations of this algorithm are implemented. In order to determine the best variation for the sequence clustering problem, quality indicators with statistical tests and cluster validation indices are used.
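
    As a small illustration of the graph view described above, the sketch below treats a symmetric pairwise similarity matrix as a weighted undirected graph and evaluates one common graph-partitioning quantity, the total weight of edges cut by a given cluster labelling. The choice of cut weight and the name `cut_weight` are assumptions for illustration; the abstract does not state which objective functions the thesis uses.

```python
from typing import Sequence


def cut_weight(similarity: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Sum of similarities over vertex pairs placed in different clusters.

    similarity[i][j] is the edge weight between sequences i and j
    (assumed symmetric); labels[i] is the cluster of sequence i.
    """
    n = len(labels)
    return sum(similarity[i][j]
               for i in range(n)
               for j in range(i + 1, n)
               if labels[i] != labels[j])
```

    A partition that keeps highly similar sequences together cuts only low-weight edges, so a lower value indicates a better grouping under this particular measure.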

    Data clustering procedures: a general review

    In the age of data science, the clustering of various types of objects (e.g., documents, genes, customers) has become a key activity and many high-quality computer implementations are provided for this purpose by many general software packages. Clustering consists of grouping a set of objects in such a way that objects which are similar to one another according to some metric belong to the same group, named a cluster. It is one of the most valuable and used tasks of exploratory data mining and can be applied to a wide variety of fields. Research on the problem of clustering tends to be fragmented across pattern recognition, database, data mining, and machine learning communities. This work discusses the common techniques that are used in cluster analysis. These methodologies will be applied to data analysis in the framework of polymer processing. A. Manuela Gonçalves was partially financed by Portuguese Funds through FCT (Fundação para a Ciência e a Tecnologia) within the Projects UIDB/00013/2020 and UIDP/00013/2020 of CMAT-UM. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 734205 – H2020-MSCA-RISE-2016