581 research outputs found

Seeding for pervasively overlapping communities

Author: A. Mislove
Aaron McDaid
C. Lee
Conrad Lee
F. Luo
Fergal Reid
J. Baumes
Neil Hurley
R. Andersen
Publication venue: 'American Physical Society (APS)'
Publication date: 29/04/2011

In some social and biological networks, the majority of nodes belong to multiple communities. It has recently been shown that a number of the algorithms that are designed to detect overlapping communities do not perform well in such highly overlapping settings. Here, we consider one class of these algorithms, those which optimize a local fitness measure, typically by using a greedy heuristic to expand a seed into a community. We perform synthetic benchmarks which indicate that an appropriate seeding strategy becomes increasingly important as the extent of community overlap increases. We find that distinct cliques provide the best seeds. We find further support for this seeding strategy with benchmarks on a Facebook network and the yeast interactome. Comment: 8 pages

arXiv.org e-Print Archive

Crossref

Detecting highly overlapping community structure by greedy clique expansion

Author: Hurley Neil
Lee Conrad
McDaid Aaron
Reid Fergal
Publication date: 01/01/2010

In complex networks it is common for each node to belong to several communities, implying a highly overlapping community structure. Recent advances in benchmarking indicate that existing community assignment algorithms that are capable of detecting overlapping communities perform well only when the extent of community overlap is kept to modest levels. To overcome this limitation, we introduce a new community assignment algorithm called Greedy Clique Expansion (GCE). The algorithm identifies distinct cliques as seeds and expands these seeds by greedily optimizing a local fitness function. We perform extensive benchmarks on synthetic data to demonstrate that GCE's good performance is robust across diverse graph topologies. Significantly, GCE is the only algorithm to perform well on these synthetic graphs, in which every node belongs to multiple communities. Furthermore, when put to the task of identifying functional modules in protein interaction data, and college dorm assignments in Facebook friendship data, we find that GCE performs competitively. Comment: 10 pages, 7 Figures. Implementation source and binaries available at http://sites.google.com/site/greedycliqueexpansion

arXiv.org e-Print Archive

CiteSeerX

Research Repository UCD

Irish Universities
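
The seed-and-expand idea described in the Greedy Clique Expansion abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the fitness function k_in / (k_in + k_out)^alpha, the `min_clique` and `alpha` parameters, and all function names are assumptions chosen for illustration, and the step that discards near-duplicate communities is omitted.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj, r=frozenset(), p=None, x=frozenset()):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r
        return
    for v in list(p):
        yield from maximal_cliques(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}


def fitness(adj, s, alpha=1.0):
    """Assumed local fitness: internal degree over total degree ** alpha."""
    k_in = sum(len(adj[v] & s) for v in s)   # twice the internal edge count
    k_out = sum(len(adj[v] - s) for v in s)  # edges leaving the set
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand(adj, seed, alpha=1.0):
    """Greedily add the best frontier node while fitness strictly improves."""
    s = set(seed)
    while True:
        frontier = set().union(*(adj[v] for v in s)) - s
        best = max(frontier, key=lambda v: fitness(adj, s | {v}, alpha),
                   default=None)
        if best is None or fitness(adj, s | {best}, alpha) <= fitness(adj, s, alpha):
            return frozenset(s)
        s.add(best)


def greedy_clique_expansion(edges, min_clique=4, alpha=1.0):
    """Seed with maximal cliques of at least min_clique nodes, then expand."""
    adj = build_adj(edges)
    seeds = [c for c in maximal_cliques(adj) if len(c) >= min_clique]
    return {expand(adj, seed, alpha) for seed in seeds}


# Toy graph: two 4-cliques sharing node 3, plus node 7 attached to 0 and 1.
edges = (list(combinations(range(4), 2))
         + list(combinations(range(3, 7), 2))
         + [(0, 7), (1, 7)])
communities = greedy_clique_expansion(edges)
print(sorted(sorted(c) for c in communities))
# prints [[0, 1, 2, 3, 7], [3, 4, 5, 6]]
```

In this toy graph node 3 ends up in both communities, which is the kind of overlap the abstract is concerned with. The first seed absorbs node 7 because doing so raises the assumed fitness, while the second seed stays unchanged.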

Uncovering the overlapping community structure of complex networks in nature and society

Author: A-L Barabási
AC Gavin
BS Everitt
C Song
DJ Watts
E Ravasz
EI Boyle
F Radicchi
Gergely Palla
I Derényi
I Xenarios
Illés Farkas
Imre Derényi
J Scott
J-P Onnela
JFF Mendes
JM Cherry
K Faust
M Blatt
M Girvan
MEJ Newman
MG Everett
R Albert
RM Shiffrin
S Knudsen
S Kosub
S Warner
T Vicsek
Tamás Vicsek
V Spirin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/06/2005

Many complex systems in nature and society can be described in terms of networks capturing the intricate web of connections among the units they are made of. A key question is how to interpret the global organization of such networks as the coexistence of their structural subunits (communities) associated with more highly interconnected parts. Identifying these a priori unknown building blocks (such as functionally related proteins, industrial sectors and groups of people) is crucial to the understanding of the structural and functional properties of networks. The existing deterministic methods used for large networks find separated communities, whereas most of the actual networks are made of highly overlapping cohesive groups of nodes. Here we introduce an approach to analysing the main statistical features of the interwoven sets of overlapping communities that makes a step towards uncovering the modular structure of complex systems. After defining a set of new characteristic quantities for the statistics of communities, we apply an efficient technique for exploring overlapping communities on a large scale. We find that overlaps are significant, and the distributions we introduce reveal universal features of networks. Our studies of collaboration, word-association and protein interaction graphs show that the web of communities has non-trivial correlations and specific scaling properties. Comment: The free academic research software, CFinder, used for the publication is available at the website of the publication: http://angel.elte.hu/clusterin

arXiv.org e-Print Archive

Crossref

CERN Document Server

Examination of the relationship between essential genes in PPI network and hub proteins in reverse nearest neighbor topology

Author: Leong Hon Wai
Nesvizhskii Alexey I.
Ng Hoong Kee
Ning Kang
Srihari Sriganesh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Background: In many protein-protein interaction (PPI) networks, densely connected hub proteins are more likely to be essential proteins. This is referred to as the "centrality-lethality rule", which indicates that the topological placement of a protein in PPI network is connected with its biological essentiality. Though such connections are observed in many PPI networks, the underlying topological properties for these connections are not yet clearly understood. Some suggested putative connections are the involvement of essential proteins in the maintenance of overall network connections, or that they play a role in essential protein clusters. In this work, we have attempted to examine the placement of essential proteins and the network topology from a different perspective by determining the correlation of protein essentiality and reverse nearest neighbor topology (RNN). Results: The RNN topology is a weighted directed graph derived from PPI network, and it is a natural representation of the topological dependences between proteins within the PPI network. Similar to the original PPI network, we have observed that essential proteins tend to be hub proteins in RNN topology. Additionally, essential genes are enriched in clusters containing many hub proteins in RNN topology (RNN protein clusters). Based on these two properties of essential genes in RNN topology, we have proposed a new measure; the RNN cluster centrality. Results from a variety of PPI networks demonstrate that RNN cluster centrality outperforms other centrality measures with regard to the proportion of selected proteins that are essential proteins. We also investigated the biological importance of RNN clusters. Conclusions: This study reveals that RNN cluster centrality provides the best correlation of protein essentiality and placement of proteins in PPI network. Additionally, merged RNN clusters were found to be topologically important in that essential proteins are significantly enriched in RNN clusters, and biologically important because they play an important role in many Gene Ontology (GO) processes.
http://deepblue.lib.umich.edu/bitstream/2027.42/78257/1/1471-2105-11-505.xml
http://deepblue.lib.umich.edu/bitstream/2027.42/78257/2/1471-2105-11-505-S1.DOC
http://deepblue.lib.umich.edu/bitstream/2027.42/78257/3/1471-2105-11-505.pdf
Peer Reviewed

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Deep Blue Documents at the University of Michigan

ScholarBank@NUS

Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

Author: Patil Ashwini
Srihari Sriganesh
Wong Limsoon
Yong Chern Han
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed till date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight challenges faced by these methods, in particular detection of sparse and small or sub-complexes and discerning of overlapping complexes. We describe methods for integrating diverse information including expression profiles and 3D structures of proteins with PPI networks to understand the dynamics of complex formation, for instance, of time-based assembly of complex subunits and formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable to understand disease mechanisms and to discover novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Comment: 1 Table

arXiv.org e-Print Archive

Elsevier - Publisher Connector

University of Queensland eSpace

Efficient and accurate greedy search methods for mining functional modules in protein interaction networks

Author: A Gavin
B Adamcsek
Baoliu Ye
BS Everitt
C Brun
Chaojun Li
DJ Watts
F Luo
F Radicchi
G Palla
GD Bader
H Jeong
H Leung
HW Mewes
I Xenarios
J Wang
Jieyue He
L Gao
LF Wu
M Altaf-Ul-Amin
M Girvan
M Li
M Wu
MEJ Newman
SH Jung
SS Dwight
V Spirin
Wei Zhong
X Li
YR Cho
Z Dezso
Publication venue: BioMed Central
Publication date: 01/06/2012

Background: Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain Core/attachment structures. Methods: In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches; however, it is computationally expensive. Many module detection approaches are based on the traditional hierarchical methods, which are also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge weight based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into the suitable set of modules. Results: The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate as compared to other competing algorithms. Conclusions: Based on the new edge weight definition, the proposed algorithm takes advantage of the greedy search procedure to separate the network into the suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the computational time significantly while keeping high prediction accuracy.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Identifying Dynamic Protein Complexes Based on Gene Expression Profiles and PPI Networks

Author: Chen Weijie
Li Min
Pan Yi
Wang Jianxin
Wu Fang-Xiang
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2014

Identification of protein complexes from protein-protein interaction networks has become a key problem for understanding cellular life in the postgenomic era. Many computational methods have been proposed for identifying protein complexes. Up to now, the existing computational methods have mostly been applied to static PPI networks. However, proteins and their interactions are dynamic in reality. Identifying dynamic protein complexes is more meaningful and challenging. In this paper, a novel algorithm, named DPC, is proposed to identify dynamic protein complexes by integrating PPI data and gene expression profiles. According to the Core-Attachment assumption, those proteins which are always active in the molecular cycle are regarded as core proteins. The protein-complex cores are identified from these always active proteins by detecting dense subgraphs. Final protein complexes are extended from the protein-complex cores by adding attachments based on a topological character of “closeness” and dynamic meaning. The protein complexes produced by our algorithm DPC contain two parts: a static core expressed throughout the molecular cycle and short-lived dynamic attachments. The proposed algorithm DPC was applied to the data of Saccharomyces cerevisiae, and the experimental results show that DPC outperforms CMC, MCL, SPICi, HC-PIN, COACH, and Core-Attachment based on the validation of matching with known complexes and hF-measures.

ScholarWorks @ Georgia State University

Directory of Open Access Journals

PubMed Central

Filtering Gene Ontology semantic similarity for identifying protein complexes in large protein interaction networks

Author: Lin Hongfei
Wang Jian
Xie Dong
Yang Zhihao
Zhang Yijia
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Overlapping modularity at the critical point of k-clique percolation

Author: Palla Gergely
Toth Balint
Vicsek Tamas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/12/2012

One of the most remarkable social phenomena is the formation of communities in social networks corresponding to families, friendship circles, work teams, etc. Since people usually belong to several different communities at the same time, the induced overlaps result in an extremely complicated web of the communities themselves. Thus, uncovering the intricate community structure of social networks is a non-trivial task with great potential for practical applications, gaining a notable interest in the recent years. The Clique Percolation Method (CPM) is one of the earliest overlapping community finding methods, which was already used in the analysis of several different social networks. In this approach the communities correspond to k-clique percolation clusters, and the general heuristic for setting the parameters of the method is to tune the system just below the critical point of k-clique percolation. However, this rule is based on simple physical principles and its validity was never subject to quantitative analysis. Here we examine the quality of the partitioning in the vicinity of the critical point using recently introduced overlapping modularity measures. According to our results on real social- and other networks, the overlapping modularities show a maximum close to the critical point, justifying the original criteria for the optimal parameter settings. Comment: 20 pages, 6 figures

arXiv.org e-Print Archive

Crossref

Repository of the Academy's Library

ELTE Digital Institutional Repository (EDIT)
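
The notion of communities as k-clique percolation clusters, mentioned in the Clique Percolation Method abstract above, can be made concrete with a small brute-force sketch. It assumes the usual definition in which two k-cliques are adjacent when they share k - 1 nodes and a community is the union of a connected chain of adjacent k-cliques. The function name and the toy graph are illustrative assumptions, the exhaustive clique search only suits very small graphs, and the sketch does not cover the parameter tuning near the critical point that the abstract studies.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Return the node sets of k-clique percolation clusters (brute force)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Every k-subset of nodes whose members are pairwise adjacent is a k-clique.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]

    # Union-find over cliques: merge two cliques that share k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all cliques in one component.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return {frozenset(g) for g in groups.values()}


# Toy graph: two 4-cliques sharing node 3, plus a dangling edge 6-8.
toy_edges = (list(combinations(range(4), 2))
             + list(combinations(range(3, 7), 2))
             + [(6, 8)])
result = k_clique_communities(toy_edges, k=3)
print(sorted(sorted(c) for c in result))
# prints [[0, 1, 2, 3], [3, 4, 5, 6]]
```

Node 3 belongs to both communities, while node 8 lies in no triangle and is therefore left unassigned for k = 3.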