23 research outputs found

    Evolutionary multiobjective clustering algorithms with ensemble for patient stratification

    The file attached to this record is the author's final peer reviewed version. Patient stratification has been studied widely to tackle subtype diagnosis problems for effective treatment. Because of the curse of dimensionality and the poor interpretability of the data, constructing a stratification model with high diagnostic ability and good generalization remains a long-standing challenge. To address these problems, this paper proposes two novel evolutionary multiobjective clustering algorithms with ensemble (NSGA-II-ECFE and MOEA/D-ECFE), with four cluster validity indices used as the objective functions. First, an effective ensemble construction method is developed to enrich the ensemble diversity. After that, an ensemble clustering fitness evaluation (ECFE) method is proposed to evaluate the ensembles by measuring the consensus clustering under those four objective functions. To generate the consensus clustering, ECFE exploits the hybrid co-association matrix from the ensembles and then dynamically selects a suitable clustering algorithm on that matrix. Multiple experiments have been conducted to demonstrate the effectiveness of the proposed algorithms in comparison with seven clustering algorithms, twelve ensemble clustering approaches, and two multiobjective clustering algorithms on 55 synthetic datasets and 35 real patient stratification datasets. The experimental results demonstrate the competitive edge of the proposed algorithms over the compared methods. Furthermore, the proposed algorithms are applied to identify cancer subtypes from five cancer-related single-cell RNA-seq datasets.
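
    The co-association idea mentioned in the abstract above can be illustrated with a minimal sketch. This is a plain co-association matrix (the fraction of base clusterings that place two points in the same cluster), not the paper's hybrid variant, whose construction the abstract does not describe. The function name `co_association` and the toy `ensemble` are assumptions made for illustration only.

```python
import numpy as np

def co_association(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    labelings = np.asarray(labelings)  # shape: (n_clusterings, n_points)
    # same[k, i, j] is True when clustering k puts points i and j together
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)

# Three hypothetical base clusterings of five points (illustrative data only)
ensemble = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
C = co_association(ensemble)
```

    Entries close to 1 mark pairs that the ensemble consistently groups together; a consensus clustering could then be obtained by running any similarity-based clustering method on `C`.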

    MOCF: A Multi-Objective Clustering Framework using an Improved Particle Swarm Optimization Algorithm

    Traditional clustering algorithms, such as K-Means, perform clustering with a single goal in mind. However, in many real-world applications, multiple objective functions must be considered at the same time. Furthermore, traditional clustering algorithms have drawbacks related to centroid selection, local optima, and convergence. Particle Swarm Optimization (PSO)-based clustering approaches were developed to address these shortcomings. PSO is inspired by the social behaviour of animals, particularly bird flocking and fish schooling. This paper proposes the Multi-Objective Clustering Framework (MOCF), an improved PSO-based framework, together with a PSO-based Multi-Objective Clustering (PSO-MOC) algorithm, which significantly improves clustering efficiency. The proposed framework's performance is evaluated using a variety of real-world datasets. To test the performance of the proposed algorithm, a prototype application was built using the Python data science platform. The empirical results showed that multi-objective clustering outperformed its single-objective counterparts.

    Density propagation based adaptive multi-density clustering algorithm

    This research was supported by the Science & Technology Development Foundation of Jilin Province (Grant Nos. 20160101259JC, 20180201045GX), the National Natural Science Foundation of China (Grant No. 61772227) and the Natural Science Foundation of Xinjiang Province (Grant No. 2015211C127). This research is also supported by the Engineering and Physical Sciences Research Council (EPSRC) funded project on New Industrial Systems: Manufacturing Immortality (EP/R020957/1). Peer reviewed. Publisher PD

    Multi-Objective Differential Evolution for Automatic Clustering with Application to Micro-Array Data Analysis

    This paper applies the Differential Evolution (DE) algorithm to the task of automatic fuzzy clustering in a Multi-objective Optimization (MO) framework. It compares the performances of two multi-objective variants of DE on the fuzzy clustering problem, where two conflicting fuzzy validity indices are simultaneously optimized. The resultant Pareto optimal set of solutions from each algorithm consists of a number of non-dominated solutions, from which the user can choose the most promising ones according to the problem specifications. A real-coded representation of the search variables, accommodating a variable number of cluster centers, is used for DE. The performances of the multi-objective DE variants have also been contrasted with those of the two best-known schemes of MO clustering, namely the Non-dominated Sorting Genetic Algorithm (NSGA-II) and Multi-Objective Clustering with an unknown number of Clusters K (MOCK). Experimental results using six artificial and four real-life datasets of varying complexity indicate that DE holds immense promise as a candidate algorithm for devising MO clustering schemes.
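
    To make the notion of a non-dominated (Pareto optimal) set in the abstract above concrete, here is a minimal sketch of a non-dominated filter. It assumes every objective is to be minimized; the function name `non_dominated` and the `scores` values are hypothetical and are not taken from the paper.

```python
import numpy as np

def non_dominated(points):
    """Return indices of non-dominated rows, assuming every objective is minimized."""
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        # p is dominated if some point is no worse on every objective
        # and strictly better on at least one
        no_worse = np.all(points <= p, axis=1)
        strictly_better = np.any(points < p, axis=1)
        if not np.any(no_worse & strictly_better):
            keep.append(i)
    return keep

# Hypothetical (index_1, index_2) scores for five candidate clusterings
scores = [[0.2, 0.9], [0.4, 0.5], [0.5, 0.6], [0.9, 0.1], [0.3, 0.95]]
front = non_dominated(scores)
```

    In this toy example the third and fifth candidates are each beaten on both objectives by another candidate, so only the remaining three form the front from which a user would choose.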

    Clustering: finding patterns in the darkness

    Machine learning is changing the world and fuelling Industry 4.0. These statistical methods focus on identifying patterns in data to provide an intelligent response to specific requests. Although understanding data tends to require expert knowledge to supervise the decision-making process, some techniques need no supervision. These unsupervised techniques can work blindly, but they rely on data similarity. One of the most popular areas in this field is clustering. Clustering groups data so that the elements of each cluster are strongly similar while the clusters themselves remain distinct from one another. This field started with the K-means algorithm, one of the most popular algorithms in machine learning with extensive applications. Currently, there are multiple strategies to deal with the clustering problem. This review introduces some of the classical algorithms, focusing significantly on algorithms based on evolutionary computation, and explains some current applications of clustering to large datasets.

    CLUSTERING PROBLEMS IN A MULTIOBJECTIVE FRAMEWORK

    A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

    Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, using more than one criterion as the objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two conflicting objective functions based on maximum data compactness in clusters (the degree of proximity of data) and maximum cluster separation (the degree of remoteness of clusters’ centers) is proposed. To solve this model, a recently proposed optimization method, the Multi-objective Improved Teaching Learning Based Optimization (MOITLBO) algorithm, is used. This algorithm is tested on several datasets and its clusters are compared with the results of some single-objective algorithms. Furthermore, a comparison with another multi-objective model on noisy data shows that the proposed model is robust to noise and thus can be used efficiently for multi-objective fuzzy clustering.
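
    The two criteria named in the abstract above, compactness and separation, can be sketched as follows. The abstract does not give the fuzzy formulations used in the paper, so this example assumes crisp (hard) assignments and Euclidean distance; the function names and the toy data are illustrative assumptions only.

```python
import numpy as np

def compactness(X, centers, labels):
    """Mean distance from each point to its assigned center (lower = more compact)."""
    return float(np.linalg.norm(X - centers[labels], axis=1).mean())

def separation(centers):
    """Smallest distance between any two centers (higher = better separated)."""
    diffs = centers[:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    upper = np.triu_indices(len(centers), k=1)  # each unordered pair once
    return float(dists[upper].min())

# Toy data: two tight groups far apart (illustrative values only)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])

comp = compactness(X, centers, labels)
sep = separation(centers)
```

    A multi-objective optimizer would treat the pair (`comp`, `sep`) as a vector to be traded off, minimizing the first and maximizing the second, rather than collapsing the two into a single score.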

    Autoencoder-assisted latent representation learning for survival prediction and multi-view clustering on multi-omics cancer subtyping

    Cancer subtyping (or cancer subtype identification) based on multi-omics data has played an important role in advancing diagnosis, prognosis and treatment, which has driven the development of advanced multi-view clustering algorithms. However, the high dimensionality and heterogeneity of multi-omics data strongly affect the performance of these methods. In this paper, we propose to learn an informative latent representation based on an autoencoder (AE) to naturally capture nonlinear omic features in lower dimensions, which is helpful for identifying the similarity of patients. Moreover, to take advantage of survival information or clinical information, a multi-omic survival analysis approach is embedded when integrating the similarity graph of heterogeneous data at the multi-omics level. Then, the clustering method is performed on the integrated similarity to generate subtype groups. In the experimental part, the effectiveness of the proposed framework is confirmed by evaluating five different multi-omics datasets taken from The Cancer Genome Atlas. The results show that the AE-assisted multi-omics clustering method can identify clinically significant cancer subtypes.