Accuracy improvement in protein complex prediction from protein interaction networks by refining cluster overlaps
Background
Recent computational techniques have facilitated the analysis of genome-wide protein-protein interaction data for several model organisms. Various graph-clustering algorithms have been applied to protein interaction networks at the genomic scale to predict the entire set of potential protein complexes. In particular, density-based clustering algorithms that can generate overlapping clusters, i.e. clusters that share a set of nodes, are well suited to protein complex detection because each protein can be a member of multiple complexes. However, their accuracy is still limited by the complex overlap patterns of their output clusters.
Results
We present a systematic approach for refining the overlapping clusters identified from protein interaction networks. We have designed two novel metrics to assess cluster overlaps: overlap coverage and overlapping consistency. We then propose an overlap refinement algorithm that takes as input the clusters produced by existing density-based graph-clustering methods and generates a set of refined clusters by parameterizing these metrics. To evaluate protein complex prediction accuracy, we used the f-measure, comparing each refined cluster to known protein complexes. Experimental results with the yeast protein-protein interaction data sets from BioGRID and DIP demonstrate that protein complex prediction accuracy increases significantly after refining cluster overlaps.
Conclusions
This study validates the effectiveness of the proposed cluster overlap refinement approach for protein complex detection. Analyzing the overlaps of clusters from protein interaction networks is crucial for understanding the functional roles of proteins and the topological characteristics of functional systems.
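The abstract does not spell out how the f-measure is computed. A minimal sketch follows, assuming the common definition in which precision is the fraction of a predicted cluster's proteins found in a known complex and recall is the fraction of the complex recovered by the cluster; each cluster is scored against its best-matching complex. The function names and toy data are hypothetical, and the paper's exact variant (as well as its overlap coverage and overlapping consistency metrics) may differ.

```python
# Sketch of a cluster-vs-complex f-measure, assuming the common
# precision/recall definition on shared proteins (hypothetical helper names).

def f_measure(cluster: set, complex_: set) -> float:
    """Harmonic mean of precision and recall between a predicted cluster and a known complex."""
    overlap = len(cluster & complex_)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)   # fraction of the cluster found in the complex
    recall = overlap / len(complex_)     # fraction of the complex recovered by the cluster
    return 2 * precision * recall / (precision + recall)


def best_match_scores(clusters, complexes):
    """For each predicted cluster, the f-measure against its best-matching known complex."""
    return [max((f_measure(c, k) for k in complexes), default=0.0) for c in clusters]


# Hypothetical toy data: two overlapping predicted clusters that share protein "B".
clusters = [{"A", "B", "C"}, {"B", "D", "E", "F"}]
complexes = [{"A", "B", "C", "G"}, {"D", "E", "F"}]
print(best_match_scores(clusters, complexes))
```

Removing the shared protein B from the second cluster would raise its score to f_measure({"D", "E", "F"}, {"D", "E", "F"}) = 1.0, which hints at why refining overlaps can matter. This is only an illustration, not the paper's refinement algorithm.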
Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests that our framework may have a far-reaching impact across
all areas of science.
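To make the element-centric idea concrete, here is a simplified stand-in rather than the authors' measure. Each element gets an affinity distribution over elements (pick one of its clusters uniformly, then a member of that cluster uniformly), and two clusterings are compared element by element as one minus the total variation distance between those distributions. It handles flat overlapping clusterings only, not hierarchy, and the names affinity and element_similarity are hypothetical.

```python
from collections import defaultdict


def affinity(clustering):
    """Map each element to a one-step affinity distribution over elements.

    From element i, pick one of i's clusters uniformly at random, then pick a
    member of that cluster uniformly at random (i itself included).
    """
    memberships = defaultdict(list)
    for cluster in clustering:
        for e in cluster:
            memberships[e].append(cluster)
    aff = {}
    for i, clusters in memberships.items():
        dist = defaultdict(float)
        for c in clusters:
            for j in c:
                dist[j] += 1.0 / (len(clusters) * len(c))
        aff[i] = dict(dist)
    return aff


def element_similarity(clustering_a, clustering_b):
    """Per-element similarity: 1 - total variation distance between affinity distributions."""
    aff_a, aff_b = affinity(clustering_a), affinity(clustering_b)
    scores = {}
    for i in aff_a.keys() & aff_b.keys():
        pa, pb = aff_a[i], aff_b[i]
        tv = 0.5 * sum(abs(pa.get(j, 0.0) - pb.get(j, 0.0)) for j in pa.keys() | pb.keys())
        scores[i] = 1.0 - tv
    return scores


a = [{1, 2, 3}, {4, 5}]        # disjoint clustering
b = [{1, 2, 3}, {3, 4, 5}]     # element 3 now belongs to two clusters
scores = element_similarity(a, b)
print({e: round(s, 3) for e, s in sorted(scores.items())})
print(sum(scores.values()) / len(scores))   # mean similarity over elements
```

Because the score is reported per element, it shows where two clusterings disagree, not just by how much: here elements 1 and 2 keep identical neighbourhoods, while the shared element 3 and the elements 4 and 5 it now joins score lower.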
Overlapping stochastic block models with application to the French political blogosphere
Complex systems in nature and in society are often represented as networks,
describing the rich set of interactions between objects of interest. Many
deterministic and probabilistic clustering methods have been developed to
analyze such structures. Given a network, almost all of them partition the
vertices into disjoint clusters, according to their connection profile.
However, recent studies have shown that these techniques are too restrictive
and that most existing networks contain overlapping clusters. To
tackle this issue, we present in this paper the Overlapping Stochastic Block
Model. Our approach allows the vertices to belong to multiple clusters, and, to
some extent, generalizes the well-known Stochastic Block Model [Nowicki and
Snijders (2001)]. We show that the model is generically identifiable within
classes of equivalence and we propose an approximate inference procedure, based
on global and local variational techniques. Using toy data sets as well as the
French Political Blogosphere network and the transcriptional network of
Saccharomyces cerevisiae, we compare our work with other approaches.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org), at http://dx.doi.org/10.1214/10-AOAS382.
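A rough generative sketch of an overlapping block model follows. It assumes, for illustration only, that each vertex draws an independent binary membership for each of Q clusters (so a vertex may belong to several clusters or to none) and that an edge appears with probability given by a logistic function of the bilinear term Z_i^T W Z_j plus an intercept w0. The function sample_osbm and its parameters are hypothetical; the paper gives the model's exact parameterization and the variational inference procedure, which is not shown here.

```python
import math
import random


def sample_osbm(n, alpha, W, w0, seed=0):
    """Sample an undirected graph from a simplified overlapping block model.

    n     : number of vertices
    alpha : alpha[q] = probability that a vertex belongs to cluster q (independently per cluster)
    W     : Q x Q symmetric matrix of cluster-interaction weights (hypothetical parameterization)
    w0    : intercept, i.e. the baseline log-odds of an edge
    """
    rng = random.Random(seed)
    Q = len(alpha)
    # Each vertex gets a binary membership vector; several entries may be 1 (overlap).
    Z = [[1 if rng.random() < alpha[q] else 0 for q in range(Q)] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Log-odds of an edge: intercept plus the bilinear term Z_i^T W Z_j.
            logit = w0 + sum(Z[i][q] * W[q][l] * Z[j][l] for q in range(Q) for l in range(Q))
            p = 1.0 / (1.0 + math.exp(-logit))   # logistic link
            if rng.random() < p:
                edges.add((i, j))
    return Z, edges


W = [[4.0, -1.0], [-1.0, 4.0]]   # hypothetical: strong within-cluster affinity
Z, edges = sample_osbm(30, alpha=[0.5, 0.5], W=W, w0=-3.0)
overlapping = sum(1 for z in Z if sum(z) > 1)
print(f"{len(edges)} edges, {overlapping} vertices in more than one cluster")
```

Keeping W symmetric makes the edge probability for the pair (i, j) independent of their order, which is consistent with sampling each undirected pair once.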