581 research outputs found
Seeding for pervasively overlapping communities
In some social and biological networks, the majority of nodes belong to
multiple communities. It has recently been shown that a number of the
algorithms that are designed to detect overlapping communities do not perform
well in such highly overlapping settings. Here, we consider one class of these
algorithms, those which optimize a local fitness measure, typically by using a
greedy heuristic to expand a seed into a community. We perform synthetic
benchmarks which indicate that an appropriate seeding strategy becomes
increasingly important as the extent of community overlap increases. We find
that distinct cliques provide the best seeds. We find further support for this
seeding strategy with benchmarks on a Facebook network and the yeast
interactome. Comment: 8 Page
Detecting highly overlapping community structure by greedy clique expansion
In complex networks it is common for each node to belong to several
communities, implying a highly overlapping community structure. Recent advances
in benchmarking indicate that existing community assignment algorithms that are
capable of detecting overlapping communities perform well only when the extent
of community overlap is kept to modest levels. To overcome this limitation, we
introduce a new community assignment algorithm called Greedy Clique Expansion
(GCE). The algorithm identifies distinct cliques as seeds and expands these
seeds by greedily optimizing a local fitness function. We perform extensive
benchmarks on synthetic data to demonstrate that GCE's good performance is
robust across diverse graph topologies. Significantly, GCE is the only
algorithm to perform well on these synthetic graphs, in which every node
belongs to multiple communities. Furthermore, when put to the task of
identifying functional modules in protein interaction data, and college dorm
assignments in Facebook friendship data, we find that GCE performs
competitively. Comment: 10 pages, 7 Figures. Implementation source and binaries available at
http://sites.google.com/site/greedycliqueexpansion
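The seed-and-expand idea described in the two abstracts above can be sketched as follows. This is a minimal illustration, not the authors' GCE implementation. The abstracts do not state the fitness function, so the form k_in / (k_in + k_out)^alpha used here is an assumption, and the helper names (build_graph, fitness, maximal_cliques, expand_seed) are hypothetical. The sketch also omits the filtering of near-duplicate seeds and communities that the phrase "distinct cliques" implies.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected graph as a dict mapping each node to its neighbour set."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def fitness(graph, community, alpha=1.0):
    """Assumed local fitness: k_in / (k_in + k_out) ** alpha."""
    k_in = k_out = 0
    for node in community:
        for nbr in graph[node]:
            if nbr in community:
                k_in += 1   # every internal edge is counted from both ends
            else:
                k_out += 1  # edge leaving the community
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def maximal_cliques(graph, min_size=3):
    """Bron-Kerbosch enumeration (no pivoting) of maximal cliques."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def expand_seed(graph, seed, alpha=1.0):
    """Greedily add the neighbour that most improves fitness; stop when none does."""
    community = set(seed)
    while True:
        frontier = {n for c in community for n in graph[c]} - community
        best_node, best_fit = None, fitness(graph, community, alpha)
        for node in sorted(frontier):
            candidate_fit = fitness(graph, community | {node}, alpha)
            if candidate_fit > best_fit:
                best_node, best_fit = node, candidate_fit
        if best_node is None:
            return frozenset(community)
        community.add(best_node)


# Toy graph: two 4-cliques sharing node 3, plus node 7 attached to nodes 0 and 1.
edges = (list(combinations(range(4), 2))
         + list(combinations(range(3, 7), 2))
         + [(0, 7), (1, 7)])
graph = build_graph(edges)
communities = {expand_seed(graph, seed)
               for seed in maximal_cliques(graph, min_size=4)}
print(sorted(sorted(c) for c in communities))
```

In this toy graph the first clique seed absorbs node 7 while the second stays unchanged, and node 3 ends up in both communities, which is the kind of overlap a partitioning method cannot represent.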
Uncovering the overlapping community structure of complex networks in nature and society
Many complex systems in nature and society can be described in terms of
networks capturing the intricate web of connections among the units they are
made of. A key question is how to interpret the global organization of such
networks as the coexistence of their structural subunits (communities)
associated with more highly interconnected parts. Identifying these a priori
unknown building blocks (such as functionally related proteins, industrial
sectors and groups of people) is crucial to the understanding of the structural
and functional properties of networks. The existing deterministic methods used
for large networks find separated communities, whereas most of the actual
networks are made of highly overlapping cohesive groups of nodes. Here we
introduce an approach to analysing the main statistical features of the
interwoven sets of overlapping communities that makes a step towards uncovering
the modular structure of complex systems. After defining a set of new
characteristic quantities for the statistics of communities, we apply an
efficient technique for exploring overlapping communities on a large scale. We
find that overlaps are significant, and the distributions we introduce reveal
universal features of networks. Our studies of collaboration, word-association
and protein interaction graphs show that the web of communities has non-trivial
correlations and specific scaling properties. Comment: The free academic research software, CFinder, used for the
publication is available at the website of the publication:
http://angel.elte.hu/clusterin
Examination of the relationship between essential genes in PPI network and hub proteins in reverse nearest neighbor topology
Background: In many protein-protein interaction (PPI) networks, densely connected hub proteins are more likely to be essential proteins. This is referred to as the "centrality-lethality rule", which indicates that the topological placement of a protein in a PPI network is connected with its biological essentiality. Though such connections are observed in many PPI networks, the underlying topological properties for these connections are not yet clearly understood. Some suggested putative connections are the involvement of essential proteins in the maintenance of overall network connections, or that they play a role in essential protein clusters. In this work, we have attempted to examine the placement of essential proteins and the network topology from a different perspective by determining the correlation of protein essentiality and reverse nearest neighbor (RNN) topology. Results: The RNN topology is a weighted directed graph derived from the PPI network, and it is a natural representation of the topological dependences between proteins within the PPI network. As in the original PPI network, we have observed that essential proteins tend to be hub proteins in RNN topology. Additionally, essential genes are enriched in clusters containing many hub proteins in RNN topology (RNN protein clusters). Based on these two properties of essential genes in RNN topology, we have proposed a new measure, the RNN cluster centrality. Results from a variety of PPI networks demonstrate that RNN cluster centrality outperforms other centrality measures with regard to the proportion of selected proteins that are essential proteins. We also investigated the biological importance of RNN clusters. Conclusions: This study reveals that RNN cluster centrality provides the best correlation of protein essentiality and placement of proteins in the PPI network. Additionally, merged RNN clusters were found to be topologically important in that essential proteins are significantly enriched in RNN clusters, and biologically important because they play an important role in many Gene Ontology (GO) processes. http://deepblue.lib.umich.edu/bitstream/2027.42/78257/1/1471-2105-11-505.xml http://deepblue.lib.umich.edu/bitstream/2027.42/78257/2/1471-2105-11-505-S1.DOC http://deepblue.lib.umich.edu/bitstream/2027.42/78257/3/1471-2105-11-505.pdf Peer Reviewe
Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes
Complexes of physically interacting proteins constitute fundamental
functional units responsible for driving biological processes within cells. A
faithful reconstruction of the entire set of complexes is therefore essential
to understand the functional organization of cells. In this review, we discuss
the key contributions of computational methods developed to date
(approximately between 2003 and 2015) for identifying complexes from the
network of interacting proteins (PPI network). We evaluate in depth the
performance of these methods on PPI datasets from yeast, and highlight
challenges faced by these methods, in particular detection of sparse and small
or sub-complexes and discerning overlapping complexes. We describe methods
for integrating diverse information including expression profiles and 3D
structures of proteins with PPI networks to understand the dynamics of complex
formation, for instance, of time-based assembly of complex subunits and
formation of fuzzy complexes from intrinsically disordered proteins. Finally,
we discuss methods for identifying dysfunctional complexes in human diseases,
an application that is proving invaluable to understand disease mechanisms and
to discover novel therapeutic targets. We hope this review aptly commemorates a
decade of research on computational prediction of complexes and constitutes a
valuable reference for further advancements in this exciting area. Comment: 1 Tabl
Efficient and accurate greedy search methods for mining functional modules in protein interaction networks
Background: Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain core-attachment structures. Methods: In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches; however, it is computationally expensive. Many module detection approaches are based on the traditional hierarchical methods, which are also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge-weight-based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into a suitable set of modules. Results: The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate than other competing algorithms. Conclusions: Based on the new edge weight definition, the proposed algorithm takes advantage of the greedy search procedure to separate the network into a suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the computational time significantly while keeping high prediction accuracy.
Identifying Dynamic Protein Complexes Based on Gene Expression Profiles and PPI Networks
Identification of protein complexes from protein-protein interaction networks has become a key problem for understanding cellular life in the postgenomic era. Many computational methods have been proposed for identifying protein complexes. Up to now, the existing computational methods have mostly been applied to static PPI networks. However, proteins and their interactions are dynamic in reality. Identifying dynamic protein complexes is more meaningful and challenging. In this paper, a novel algorithm, named DPC, is proposed to identify dynamic protein complexes by integrating PPI data and gene expression profiles. According to the core-attachment assumption, proteins that are always active in the molecular cycle are regarded as core proteins. The protein-complex cores are identified from these always-active proteins by detecting dense subgraphs. Final protein complexes are extended from the protein-complex cores by adding attachments based on a topological character of “closeness” and dynamic meaning. The protein complexes produced by our algorithm DPC contain two parts: a static core expressed throughout the molecular cycle and short-lived dynamic attachments. The proposed algorithm DPC was applied to the data of Saccharomyces cerevisiae, and the experimental results show that DPC outperforms CMC, MCL, SPICi, HC-PIN, COACH, and Core-Attachment based on the validation of matching with known complexes and hF-measures
Overlapping modularity at the critical point of k-clique percolation
One of the most remarkable social phenomena is the formation of communities
in social networks corresponding to families, friendship circles, work teams,
etc. Since people usually belong to several different communities at the same
time, the induced overlaps result in an extremely complicated web of the
communities themselves. Thus, uncovering the intricate community structure of
social networks is a non-trivial task with great potential for practical
applications, and it has gained notable interest in recent years. The Clique
Percolation Method (CPM) is one of the earliest overlapping community finding
methods, which was already used in the analysis of several different social
networks. In this approach the communities correspond to k-clique percolation
clusters, and the general heuristic for setting the parameters of the method is
to tune the system just below the critical point of k-clique percolation.
However, this rule is based on simple physical principles and its validity was
never subject to quantitative analysis. Here we examine the quality of the
partitioning in the vicinity of the critical point using recently introduced
overlapping modularity measures. According to our results on real social- and
other networks, the overlapping modularities show a maximum close to the
critical point, justifying the original criteria for the optimal parameter
settings. Comment: 20 pages, 6 figure
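The notion of a k-clique percolation cluster used by CPM can be illustrated with a short sketch. It relies on the standard definition, which the abstract does not spell out: two k-cliques are adjacent when they share k-1 nodes, and a community is the union of all k-cliques reachable from one another through such adjacencies. The function names below are hypothetical, the clique enumeration is brute force and suitable only for tiny graphs, and this is not the CFinder implementation.

```python
from itertools import combinations


def k_cliques(graph, k):
    """All fully connected k-node subsets (brute force; small graphs only)."""
    return [frozenset(c) for c in combinations(sorted(graph), k)
            if all(v in graph[u] for u, v in combinations(c, 2))]


def k_clique_communities(graph, k):
    """Union the k-cliques that are chained together by sharing k-1 nodes."""
    cliques = k_cliques(graph, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return {frozenset(nodes) for nodes in groups.values()}


# Toy graph: triangles {0,1,2} and {1,2,3} share an edge; triangle {3,4,5}
# touches them only at node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

result = k_clique_communities(graph, 3)
print(sorted(sorted(c) for c in result))
```

With k = 3 the first two triangles merge into one community while the third stays separate, and node 3 belongs to both. Raising k makes the percolation condition stricter, which is the parameter the abstract proposes to tune near the critical point.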