Comparison of methods to identify modules in noisy or incomplete brain networks
Community structure, or "modularity," is a fundamentally important aspect of the organization of structural and functional brain networks, but its identification with community detection methods is confounded by noisy or missing connections. Although several methods have been used to account for missing data, the performance of these methods has not been compared quantitatively so far. In this study, we compared four different approaches to account for missing connections when identifying modules in binary and weighted networks using both Louvain and Infomap community detection algorithms. The four methods are "zeros," "row-column mean," "common neighbors," and "consensus clustering." Using Lancichinetti-Fortunato-Radicchi benchmark-simulated binary and weighted networks, we find that "zeros," "row-column mean," and "common neighbors" approaches perform well with both Louvain and Infomap, whereas "consensus clustering" performs well with Louvain but not Infomap. A similar pattern of results was observed with empirical networks from stereotactical electroencephalography data, except that "consensus clustering" outperforms other approaches on weighted networks with Louvain. Based on these results, we recommend any of the four methods when using Louvain on binary networks, whereas "consensus clustering" is superior with Louvain clustering of weighted networks. When using Infomap, "zeros" or "common neighbors" should be used for both binary and weighted networks. These findings provide a basis for accounting for noisy or missing connections when identifying modules in brain networks.
Williams, N.; Arnulfo, G.; Wang, S. H.; Nobili, L.; Palva, S.; Palva, J. M.
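As a rough illustration of two of the four approaches named above, the sketch below fills missing connections in a small weighted adjacency matrix. It is not the authors' code: the abstract does not define the methods, so the reading of "row-column mean" used here (average of the observed values in the same row and the same column) is an assumption, as are the function names and the use of `None` to mark a missing connection.

```python
from statistics import mean


def fill_zeros(adj):
    """'zeros' approach: treat every missing connection (None) as 0."""
    return [[0.0 if v is None else v for v in row] for row in adj]


def fill_row_column_mean(adj):
    """Assumed reading of 'row-column mean': replace a missing entry (i, j)
    with the mean of the observed values in row i and in column j."""
    n = len(adj)
    filled = [row[:] for row in adj]
    for i in range(n):
        for j in range(n):
            if adj[i][j] is None:
                observed = [adj[i][k] for k in range(n) if adj[i][k] is not None]
                observed += [adj[k][j] for k in range(n) if adj[k][j] is not None]
                # Fall back to 0 if the whole row and column are missing.
                filled[i][j] = mean(observed) if observed else 0.0
    return filled


# Hypothetical 3-node weighted network with one missing connection (0, 2).
adj = [
    [0.0, 1.0, None],
    [1.0, 0.0, 0.5],
    [None, 0.5, 0.0],
]
zeros_filled = fill_zeros(adj)
mean_filled = fill_row_column_mean(adj)
```

Either filled matrix could then be passed to a community detection routine such as Louvain or Infomap; that step is left out because it depends on the library used.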
Weighted consensus clustering for multiblock data
Consensus clustering approach to group brain connectivity matrices
A novel approach rooted in the notion of consensus clustering, a strategy
developed for community detection in complex networks, is proposed to cope with
the heterogeneity that characterizes connectivity matrices in health and
disease. The method can be summarized as follows:
(i) define, for each node, a distance matrix for the set of subjects by
comparing the connectivity pattern of that node in all pairs of subjects; (ii)
cluster the distance matrix for each node; (iii) build the consensus network
from the corresponding partitions; (iv) extract groups of subjects by finding
the communities of the consensus network thus obtained.
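Steps (iii) and (iv) above can be sketched as follows, assuming the per-node partitions of the subjects from step (ii) are already available. This is a minimal illustration rather than the authors' method: the function names and labels are hypothetical, and the final grouping uses thresholded connected components as a simple stand-in for community detection on the consensus network.

```python
def consensus_matrix(partitions):
    """Fraction of partitions in which each pair of subjects shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def groups_from_consensus(matrix, threshold=0.5):
    """Connected components of the graph that keeps edges above the threshold
    (a stand-in for community detection, chosen for brevity)."""
    n = len(matrix)
    group = [None] * n
    next_id = 0
    for start in range(n):
        if group[start] is not None:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] is None and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# One partition of five subjects per node (hypothetical cluster labels).
partitions = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0],
]
consensus = consensus_matrix(partitions)
subject_groups = groups_from_consensus(consensus)
```

Cluster labels only need to be consistent within a partition, which is why the third partition, with its labels swapped, still agrees with the first.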
Unlike previous implementations of consensus clustering, we
thus propose to use the consensus strategy to combine the information arising
from the connectivity patterns of each node. The proposed approach may be seen
either as an exploratory technique or as an unsupervised pre-training step to
help the subsequent construction of a supervised classifier. Applications to a
toy model and two real data sets show the effectiveness of the proposed
methodology, which represents heterogeneity of a set of subjects in terms of a
weighted network, the consensus matrix
Automated calibration of consensus weighted distance-based clustering approaches using sharp
In consensus clustering, a clustering algorithm is used in combination with a
subsampling procedure to detect stable clusters. Previous studies on both
simulated and real data suggest that consensus clustering outperforms native
algorithms. Here we extend consensus clustering to allow for attribute
weighting in the calculation of pairwise distances using existing regularised
approaches. We propose a procedure for the calibration of the number of
clusters (and regularisation parameter) by maximising a novel consensus score
calculated directly from consensus clustering outputs, making it extremely
computationally competitive. Our simulation study shows better clustering
performances of (i) models calibrated by maximising our consensus score
compared to existing calibration scores, and (ii) weighted compared to
unweighted approaches in the presence of features that do not contribute to
cluster definition. Application on real gene expression data measured in lung
tissue reveals clear clusters corresponding to different lung cancer subtypes.
The R package sharp (version 1.4.0) is available on CRAN
Automated calibration of consensus weighted distance-based clustering approaches using sharp
Motivation: In consensus clustering, a clustering algorithm is used in combination with a subsampling procedure to detect stable clusters. Previous studies on both simulated and real data suggest that consensus clustering outperforms native algorithms. Results: Here we extend consensus clustering to allow for attribute weighting in the calculation of pairwise distances using existing regularised approaches. We propose a procedure for the calibration of the number of clusters (and regularisation parameter) by maximising the sharp score, a novel stability score calculated directly from consensus clustering outputs, making it extremely computationally competitive. Our simulation study shows better clustering performances of (i) approaches calibrated by maximising the sharp score compared to existing calibration scores, and (ii) weighted compared to unweighted approaches in the presence of features that do not contribute to cluster definition. Application on real gene expression data measured in lung tissue reveals clear clusters corresponding to different lung cancer subtypes. Availability and implementation: The R package sharp (version ≥ 1.4.3) is available on CRAN at https://CRAN.R-project.org/package=sharp
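The subsampling idea described in this abstract can be sketched as follows. This does not use the sharp package or reproduce its score; it only shows how co-membership proportions are accumulated over subsamples. The toy one-dimensional clustering function, the parameter names, and the data are all assumptions made for illustration.

```python
import random


def largest_gap_split(values):
    """Toy 1-D clustering (hypothetical): two clusters, cut at the largest gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = 0 if rank <= cut else 1
    return labels


def subsampling_consensus(values, cluster_fn, n_iter=100, frac=0.67, seed=0):
    """Proportion of subsamples in which two items are clustered together,
    among the subsamples that contain both items."""
    rng = random.Random(seed)
    n = len(values)
    size = max(2, round(frac * n))
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_iter):
        subset = rng.sample(range(n), size)
        labels = cluster_fn([values[i] for i in subset])
        for a, i in enumerate(subset):
            for b, j in enumerate(subset):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    # Pairs that were never drawn together get 0.0 by convention.
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Two well-separated groups of three points each.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
stability = subsampling_consensus(values, largest_gap_split)
```

Entries close to 0 or 1 indicate pairs whose co-membership is stable across subsamples; calibration procedures like the one described above summarise this matrix into a single score per candidate number of clusters.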
Weighting Policies for Robust Unsupervised Ensemble Learning
Unsupervised ensemble learning, or consensus clustering, consists of finding the optimal combination strategy of individual partitions that is robust with respect to the selection of an algorithmic clustering pool. Despite its strong properties, this approach assigns the same weight to the contribution of each clustering to the final solution. We propose a weighting policy for this problem that is based on internal clustering quality measures and compare it against other modern approaches. Results on publicly available datasets show that weights can significantly improve accuracy while retaining the robustness properties. Since determining an appropriate number of clusters, which is a primary input for many clustering methods, is one of the significant challenges, we have used the same methodology to predict the correct or most suitable number of clusters as well. Among various methods, using internal validity indexes in conjunction with a suitable algorithm is one of the most popular ways to determine the appropriate number of clusters. Thus, we use weighted consensus clustering along with four different indexes: the Silhouette (SH), Calinski-Harabasz (CH), Davies-Bouldin (DB), and Consensus (CI) indexes. Our experiment indicates that weighted consensus clustering together with the chosen indexes is a useful method to determine the right or most appropriate number of clusters in comparison to individual clustering methods (e.g., k-means) and consensus clustering. Lastly, to decrease the variance of the proposed weighted consensus clustering, we borrow the idea of Markowitz portfolio theory and apply its core idea to the clustering domain. We aim to optimize the combination of individual clustering methods to minimize the variance of clustering accuracy. This is a new weighting policy to produce a partition with a lower variance, which might be crucial for a decision maker.
Our study shows that using the idea of Markowitz portfolio theory creates a partition with less variation in comparison to traditional consensus clustering and the proposed weighted consensus clustering
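To make the contrast between equal and quality-based weights concrete, the sketch below builds a weighted co-association matrix in which each base partition contributes in proportion to a quality score. The abstract does not specify the weighting policy, so the normalisation used here, the function name, and the scores are assumptions; scores are taken to be positive, which would require rescaling for indexes such as the Silhouette that can be negative.

```python
def weighted_consensus(partitions, scores):
    """Weighted co-association: each partition adds its normalised weight to
    every pair of items it places in the same cluster."""
    total = sum(scores)
    weights = [s / total for s in scores]  # assumed policy: proportional to score
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w
    return matrix


# Two base partitions of four items (hypothetical labels).
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
scores = [3.0, 1.0]  # hypothetical internal quality scores, higher is better
weighted = weighted_consensus(partitions, scores)
```

With equal scores this reduces to the usual unweighted consensus matrix, so the effect of a weighting policy can be checked by comparing the two.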
Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
probably better and more robust clustering and has been receiving increasing
attention in recent years. There are mainly two limitations in the
existing clustering ensemble approaches. Firstly, many approaches lack the
ability to weight the base clusterings without access to the original data and
can be affected significantly by the low-quality, or even ill clusterings.
Secondly, they generally focus on the instance level or cluster level in the
ensemble system and fail to integrate multi-granularity cues into a unified
model. To address these two limitations, this paper proposes to solve the
clustering ensemble problem via crowd agreement estimation and
multi-granularity link analysis. We present the normalized crowd agreement
index (NCAI) to evaluate the quality of base clusterings in an unsupervised
manner and thus weight the base clusterings in accordance with their clustering
validity. To explore the relationship between clusters, the source aware
connected triple (SACT) similarity is introduced with regard to their common
neighbors and the source reliability. Based on NCAI and multi-granularity
information collected among base clusterings, clusters, and data instances, we
further propose two novel consensus functions, termed weighted evidence
accumulation clustering (WEAC) and graph partitioning with multi-granularity
link analysis (GP-MGLA) respectively. The experiments are conducted on eight
real-world datasets. The experimental results demonstrate the effectiveness and
robustness of the proposed methods.
Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
Consensus clustering and functional interpretation of gene-expression data
Microarray analysis using clustering algorithms can suffer from lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas