69,796 research outputs found

    Clustering ensemble method

    Get PDF
    A clustering ensemble aims to combine multiple clustering models to produce a better result than that of the individual clustering algorithms in terms of consistency and quality. In this paper, we propose a clustering ensemble algorithm with a novel consensus function named Adaptive Clustering Ensemble. It employs two similarity measures, cluster similarity and a newly defined membership similarity, and works adaptively through three stages. The first stage is to transform the initial clusters into a binary representation, and the second is to aggregate the initial clusters that are most similar based on the cluster similarity measure between clusters. This iterates itself adaptively until the intended candidate clusters are produced. The third stage is to further refine the clusters by dealing with uncertain objects to produce an improved final clustering result with the desired number of clusters. Our proposed method is tested on various real-world benchmark datasets and its performance is compared with other state-of-the-art clustering ensemble methods, including the Co-association method and the Meta-Clustering Algorithm. The experimental results indicate that on average our method is more accurate and more efficient

    Coupled clustering ensemble by exploring data interdependence

    Get PDF
    © 2018 ACM. Clustering ensembles combine multiple partitions of data into a single clustering solution. It is an effective technique for improving the quality of clustering results. Current clustering ensemble algorithms are usually built on the pairwise agreements between clusterings that focus on the similarity via consensus functions, between data objects that induce similarity measures from partitions and re-cluster objects, and between clusters that collapse groups of clusters into meta-clusters. In most of those models, there is a strong assumption on IIDness (i.e., independent and identical distribution), which states that base clusterings perform independently of one another and all objects are also independent. In the real world, however, objects are generally likely related to each other through features that are either explicit or even implicit. There is also latent but definite relationship among intermediate base clusterings because they are derived from the same set of data. All these demand a further investigation of clustering ensembles that explores the interdependence characteristics of data. To solve this problem, a new coupled clustering ensemble (CCE) framework that works on the interdependence nature of objects and intermediate base clusterings is proposed in this article. The main idea is to model the coupling relationship between objects by aggregating the similarity of base clusterings, and the interactive relationship among objects by addressing their neighborhood domains. Once these interdependence relationships are discovered, they will act as critical supplements to clustering ensembles. We verified our proposed framework by using three types of consensus function: clustering-based, object-based, and cluster-based. Substantial experiments on multiple synthetic and real-life benchmark datasets indicate that CCE can effectively capture the implicit interdependence relationships among base clusterings and among objects with higher clustering accuracy, stability, and robustness compared to 14 state-of-the-art techniques, supported by statistical analysis. In addition, we show that the final clustering quality is dependent on the data characteristics (e.g., quality and consistency) of base clusterings in terms of sensitivity analysis. Finally, the applications in document clustering, as well as on the datasets with much larger size and dimensionality, further demonstrate the effectiveness, efficiency, and scalability of our proposed models

    Microbial community pattern detection in human body habitats via ensemble clustering framework

    Full text link
    The human habitat is a host where microbial species evolve, function, and continue to evolve. Elucidating how microbial communities respond to human habitats is a fundamental and critical task, as establishing baselines of human microbiome is essential in understanding its role in human disease and health. However, current studies usually overlook a complex and interconnected landscape of human microbiome and limit the ability in particular body habitats with learning models of specific criterion. Therefore, these methods could not capture the real-world underlying microbial patterns effectively. To obtain a comprehensive view, we propose a novel ensemble clustering framework to mine the structure of microbial community pattern on large-scale metagenomic data. Particularly, we first build a microbial similarity network via integrating 1920 metagenomic samples from three body habitats of healthy adults. Then a novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is proposed and applied onto the network to detect clustering pattern. Extensive experiments are conducted to evaluate the effectiveness of our model on deriving microbial community with respect to body habitat and host gender. From clustering results, we observed that body habitat exhibits a strong bound but non-unique microbial structural patterns. Meanwhile, human microbiome reveals different degree of structural variations over body habitat and host gender. In summary, our ensemble clustering framework could efficiently explore integrated clustering results to accurately identify microbial communities, and provide a comprehensive view for a set of microbial communities. Such trends depict an integrated biography of microbial communities, which offer a new insight towards uncovering pathogenic model of human microbiome.Comment: BMC Systems Biology 201

    Pancancer analysis of DNA methylation-driven genes using MethylMix.

    Get PDF
    Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet, few algorithms exist that exploit this vast dataset to identify hypo- and hypermethylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated genes that are also predictive of transcription. We apply MethylMix to 12 individual cancer sites, and additionally combine all cancer sites in a pancancer analysis. We discover pancancer hypo- and hypermethylated genes and identify novel methylation-driven subgroups with clinical implications. MethylMix analysis on combined cancer sites reveals 10 pancancer clusters reflecting new similarities across malignantly transformed tissues
    corecore