Search CORE

27 research outputs found

Modularity detection in protein-protein interaction networks

Author: Ananth Grama
Merril Gersten
Shankar Subramaniam
Tejaswini Narayanan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

BACKGROUND: Many recent studies have investigated modularity in biological networks, and its role in functional and structural characterization of constituent biomolecules. A technique that has shown considerable promise in the domain of modularity detection is the Newman and Girvan (NG) algorithm, which relies on the number of shortest paths between pairs of vertices in the network that traverse a given edge, referred to as the betweenness of that edge. The edge with the highest betweenness is iteratively eliminated from the network, with the betweenness of the remaining edges recalculated in every iteration. This generates a complete dendrogram, from which modules are extracted by applying a quality metric called modularity, denoted by Q. This exhaustive computation can be prohibitively expensive for large networks such as protein-protein interaction networks. In this paper, we present a novel optimization to the modularity detection algorithm, in the form of an efficient termination criterion based on a target edge betweenness value, at which the process of iterative edge removal may be terminated. RESULTS: We validate the robustness of our approach by applying our algorithm to real-world protein-protein interaction networks of yeast, C. elegans and Drosophila, and demonstrate that our algorithm consistently achieves significant computational gains in terms of reduced runtime, when compared to the NG algorithm. Furthermore, our algorithm produces modules comparable to those from the NG algorithm, qualitatively and quantitatively. We illustrate this using comparison metrics such as module distribution, module membership cardinality, modularity Q, and Jaccard Similarity Coefficient. CONCLUSIONS: We have presented an optimized approach for efficient modularity detection in networks. The intuition driving our approach is the extraction of holistic measures of centrality from graphs, which are representative of the inherent modular structure of the underlying network, and the application of those measures to efficiently guide the modularity detection process. We have empirically evaluated our approach in the specific context of real-world large-scale biological networks, and have demonstrated significant savings in computational time while maintaining comparable quality of detected modules.
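The edge-removal loop described in this abstract can be illustrated with a minimal sketch. It assumes an undirected, unweighted graph and a stopping rule of the form "stop once the highest edge betweenness is at or below a target value". The abstract does not say how the target is chosen, so `target_betweenness` and all function names here are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an undirected, unweighted graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every vertex pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components of the graph as a list of vertex sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        result.append(comp)
    return result


def detect_modules(edges, target_betweenness):
    """Remove the highest-betweenness edge until the target is reached."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while True:
        scores = edge_betweenness(adj)
        if not scores:
            break
        edge, best = max(scores.items(), key=lambda item: item[1])
        if best <= target_betweenness:  # hypothetical termination rule
            break
        u, v = tuple(edge)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
modules = detect_modules(toy_edges, target_betweenness=5.0)
```

On this toy graph the bridge carries all nine cross-triangle shortest paths, so it is the only edge removed before the loop stops, and the two triangles remain as modules.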

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Efficient Algorithms for Node Disjoint Subgraph Homeomorphism Determination

Author: He Zhengying
Wang Wei
Wu Wentao
Xiao Yanghua
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/09/2007

Recently, great efforts have been dedicated to research on the management of large-scale graph-based data such as the WWW, social networks, and biological networks. In the study of graph-based data management, the node disjoint subgraph homeomorphism relation between graphs is more suitable than (sub)graph isomorphism in many cases, especially in cases where node skipping and node mismatching are allowed. However, no efficient node disjoint subgraph homeomorphism determination (ndSHD) algorithms have been available. In this paper, we propose two computationally efficient ndSHD algorithms based on state-space search with backtracking, which employ many heuristics to prune the search space. Experimental results on synthetic data sets show that the proposed algorithms are efficient, require relatively little time in most of the test cases, can scale to large or dense graphs, and can accommodate more complex fuzzy matching cases. Comment: 15 pages, 11 figures, submitted to DASFAA 200

arXiv.org e-Print Archive

CiteSeerX

Multiple Hypothesis Testing in Pattern Discovery

Author: Garriga Gemma C.
Hanhijärvi Sami
Puolamäki Kai
Publication date: 01/01/2009

The problem of multiple hypothesis testing arises when more than one hypothesis is to be tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, simultaneously assessing the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis framework to be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test. Comment: 28 page
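For readers unfamiliar with FWER control, the following sketch shows the standard Holm-Bonferroni step-down procedure, which controls the FWER in the strong sense for a fixed set of hypotheses. It is not the method proposed in the paper, whose details the abstract does not give. The function name and the example p-values are illustrative assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Holm step-down procedure. P-values are visited from smallest
    to largest, and the k-th smallest (k starting at 0) is compared against
    alpha / (m - k). Testing stops at the first non-rejection.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break
    return reject


# Illustrative p-values, for example one per frequent itemset.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

Here the two smallest p-values (0.005 and 0.01) pass their thresholds of 0.0125 and about 0.0167, while 0.03 fails against 0.025, so testing stops there.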

arXiv.org e-Print Archive

Aaltodoc Publication Archive

Threshold-limited spreading in social networks with multiple initiators

Author: Korniss G.
Singh P.
Sreenivasan S.
Szymanski B. K.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/06/2013

A classical model for social-influence-driven opinion change is the threshold model. Here we study cascades of opinion change driven by threshold model dynamics in the case where multiple initiators trigger the cascade, and where all nodes possess the same adoption threshold $\phi$. Specifically, using empirical and stylized models of social networks, we study cascade size as a function of the initiator fraction $p$. We find that even for an arbitrarily high value of $\phi$, there exists a critical initiator fraction $p_c(\phi)$ beyond which the cascade becomes global. Network structure, in particular clustering, plays a significant role in this scenario. Similarly to the case of single-node or single-clique initiators studied previously, we observe that community structure within the network facilitates opinion spread to a larger extent than a homogeneous random network. Finally, we study the efficacy of different initiator selection strategies on the size of the cascade and the cascade window.
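The threshold dynamics can be sketched as a small deterministic simulation. This version assumes that a node adopts once the fraction of its neighbors that have adopted reaches the threshold, with synchronous updates. Whether the comparison is strict and how initiators are chosen are not stated in the abstract, so treat those details and the function name as assumptions.

```python
def threshold_cascade(adj, initiators, phi):
    """Run threshold-model dynamics until no further node adopts.

    adj maps each node to the set of its neighbors; phi is the common
    adoption threshold. Returns the final set of adopters.
    """
    active = set(initiators)
    while True:
        newly_active = set()
        for node, neighbors in adj.items():
            if node in active or not neighbors:
                continue
            fraction = sum(1 for n in neighbors if n in active) / len(neighbors)
            if fraction >= phi:  # assumed non-strict comparison
                newly_active.add(node)
        if not newly_active:
            return active
        active |= newly_active


# A five-node path graph: 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
single = threshold_cascade(path, {0}, phi=0.6)
double = threshold_cascade(path, {0, 2}, phi=0.6)
```

With a threshold of 0.6, a single initiator at node 0 cannot convert node 1 (only half of its neighbors have adopted), whereas initiators at nodes 0 and 2 together convert node 1. The cascade then stalls at node 3, which again sees only half of its neighbors active.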

arXiv.org e-Print Archive

CiteSeerX

Local Network Topology in Human Protein Interaction Data Predicts Functional Association

Author: A Hildebrand
A Vazquez
AD King
AJ Enright
AJ Walhout
AK Ramani
B Lehner
B Schwikowski
BA Hocevar
BJ Breitkreutz
C Stark
CT Chien
DS Goldberg
E Nabieva
F Li
GD Bader
H Jeong
Hua Li
J Geisler-Lee
J Laurikkala
J Rual
JA McMahon
JC Rain
JD Han
JG Abreu
L Giot
M Altaf-Ul-Amin
M Ashburner
M Deng
M Girvan
M Kanehisa
MP Samanta
N Przulj
P D'haeseleer
PK Datta
R Llewellyn
R Mazzarella
R Sharan
S Li
Shoudan Liang
Sridhar Hannenhalli
T Ito
TKB Gandhi
U Karaoz
U Stelzl
V Spirin
W Chen
Y Benjamini
Y Ikeda
Y Sun
Publication venue: Public Library of Science
Publication date: 29/07/2009

The use of high-throughput techniques to generate large volumes of protein-protein interaction (PPI) data has increased the need for methods that systematically and automatically suggest functional relationships among proteins. In a yeast PPI network, previous work has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional association. In this study we improved the prediction scheme by developing a new algorithm and applied it to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting function-associated protein pairs. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as benchmarks to compare and evaluate functional relevance. The application of our algorithms to human PPI data yielded 4,233 significant functional associations among 1,754 proteins. Further functional comparisons between them allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins, with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and made functional inferences from detailed analysis of one subcluster highly enriched in the TGF-β signaling pathway (P < 10^-50). Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in this post-genomic era.
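One simple way to quantify "an unusually large number of shared neighbors" is a hypergeometric tail probability. The sketch below is a generic illustration under that assumption. The abstract does not specify the paper's statistic or its hub correction, and the function names and toy network are hypothetical.

```python
from math import comb


def common_neighbors(adj, a, b):
    """Proteins that interact with both a and b."""
    return (adj[a] & adj[b]) - {a, b}


def shared_neighbor_pvalue(n_total, deg_a, deg_b, shared):
    """P(X >= shared) for X ~ Hypergeometric(n_total, deg_a, deg_b).

    Models b's deg_b neighbors as a random draw from n_total candidate
    proteins, deg_a of which are neighbors of a.
    """
    upper = min(deg_a, deg_b)
    tail = sum(
        comb(deg_a, k) * comb(n_total - deg_a, deg_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / comb(n_total, deg_b)


# Toy network: proteins A and B share all three of their neighbors.
toy = {
    "A": {"x", "y", "z"},
    "B": {"x", "y", "z"},
    "x": {"A", "B"},
    "y": {"A", "B"},
    "z": {"A", "B"},
}
shared = common_neighbors(toy, "A", "B")
p_value = shared_neighbor_pvalue(n_total=10, deg_a=3, deg_b=3, shared=len(shared))
```

A small p-value flags the pair as sharing more neighbors than expected by chance under this null model. Under the same model, highly connected hub proteins share many neighbors by chance, which is one reason a hub correction of the kind the abstract mentions matters.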

Public Library of Science (PLOS)

Crossref

PubMed Central

Associating Genes and Protein Complexes with Disease via Network Propagation

Author: A Hamosh
AD D'Andrea
AJ Enright
C Perez-Iratxeta
D Levine
D Zhou
DL Wheeler
Eytan Ruppin
F Thorel
HG Brunner
JF Rual
K Lage
K Tan
KG Becker
L Franke
M Oti
M Rebhan
MA van Driel
O Vanunu
Oded Magger
Oron Vanunu
R Ewing
R Sharan
RA George
Roded Sharan
S Karni
S Kohler
S Peri
Tomer Shlomi
U Stelzl
Wyeth W. Wasserman
X Wu
Y Benjamini
Publication venue: Public Library of Science
Publication date: 01/01/2010

A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and usage of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross-validation setting) to rank the true causal gene first for 34% of the diseases, and infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have been found already: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases closely match the known literature, suggesting several novel causal genes and protein complexes for further investigation.
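The combination of smoothness over the network and prior information can be illustrated with a generic iterative propagation scheme. The sketch assumes the commonly used update F ← αW'F + (1 − α)Y with symmetric degree normalization W'ᵢⱼ = Wᵢⱼ / √(dᵢdⱼ). The abstract itself does not state the update rule, so this form, the parameter values, and the gene names are assumptions rather than the published implementation.

```python
from math import sqrt


def propagate(weights, prior, alpha=0.5, iterations=50):
    """Iteratively smooth prior scores over a weighted, undirected network.

    weights maps node -> {neighbor: edge weight} (assumed symmetric);
    prior maps node -> prior evidence Y. alpha trades off network
    smoothness against fidelity to the prior.
    """
    degree = {u: sum(nbrs.values()) for u, nbrs in weights.items()}
    score = {u: prior.get(u, 0.0) for u in weights}
    for _ in range(iterations):
        updated = {}
        for u, nbrs in weights.items():
            flow = sum(
                w * score[v] / sqrt(degree[u] * degree[v])
                for v, w in nbrs.items()
            )
            updated[u] = alpha * flow + (1.0 - alpha) * prior.get(u, 0.0)
        score = updated
    return score


# Toy path network geneA - geneB - geneC, with prior evidence on geneA only.
network = {
    "geneA": {"geneB": 1.0},
    "geneB": {"geneA": 1.0, "geneC": 1.0},
    "geneC": {"geneB": 1.0},
}
scores = propagate(network, {"geneA": 1.0})
```

In the toy network the score decays with distance from the gene carrying prior evidence, so ranking genes by score puts network neighbors of known disease genes first.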

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

Author: Patil Ashwini
Srihari Sriganesh
Wong Limsoon
Yong Chern Han
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight challenges faced by these methods, in particular the detection of sparse and small complexes or sub-complexes and the discerning of overlapping complexes. We describe methods for integrating diverse information, including expression profiles and 3D structures of proteins, with PPI networks to understand the dynamics of complex formation, for instance, the time-based assembly of complex subunits and the formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable for understanding disease mechanisms and discovering novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Comment: 1 Tabl

arXiv.org e-Print Archive

Elsevier - Publisher Connector

University of Queensland eSpace

Bio::Homology::InterologWalk - A Perl module to build putative protein-protein interaction networks through interolog mapping

Author: A Ceol
A Valencia
A Wiles
AJ Vilella
AJ Walhout
Andrew P Jarman
B Aranda
B Lehner
BJ Breitkreutz
C Prieto
CS Pedamallu
CT Hittinger
D Bray
D Figeys
D Kemmer
DJ LaCount
E Chautard
F He
G Gallone
Giuseppe Gallone
H Hegyi
H Yu
HB Fraser
J Douglas Armstrong
J Goll
J Wojcik
JE Stajich
KR Brown
L Giot
L Matthews
LJ Jensen
LR Matthews
M Ashburner
M Michaut
M Persico
MD Adams
NJ Krogan
P Bork
P Flicek
P Kersey
P Shannon
PJ Kersey
R Sharan
RM Ewing
RT Fielding
S Kerrien
S Li
S Razick
S Wuchty
T Berggård
T Ian Simpson
TKB Gandhi
TW Huang
U Stelzl
X He
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Protein-protein interaction (PPI) data are widely used to generate network models that aim to describe the relationships between proteins in biological systems. The fidelity and completeness of such networks are primarily limited by the paucity of protein interaction information and by the restriction of most of these data to just a few widely studied experimental organisms. In order to extend the utility of existing PPIs, computational methods can be used that exploit functional conservation between orthologous proteins across taxa to predict putative PPIs or 'interologs'. To date, most interolog prediction efforts have been restricted to specific biological domains with fixed underlying data sources, and there are no software tools available that provide a generalised framework for 'on-the-fly' interolog prediction. RESULTS: We introduce Bio::Homology::InterologWalk, a Perl module to retrieve, prioritise and visualise putative protein-protein interactions through an orthology-walk method. The module uses orthology and experimental interaction data to generate putative PPIs and optionally collates meta-data into an Interaction Prioritisation Index that can be used to help prioritise interologs for further analysis. We show the application of our interolog prediction method to the genomic interactome of the fruit fly, Drosophila melanogaster. We analyse the resulting interaction networks and show that the method proposes new interactome members and interactions that are candidates for future experimental investigation. CONCLUSIONS: Our interolog prediction tool employs the Ensembl Perl API and PSICQUIC-enabled protein interaction data sources to generate up-to-date interologs 'on-the-fly'. This represents a significant advance on previous methods for interolog prediction as it allows the use of the latest orthology and protein interaction data for all of the genomes in Ensembl. The module outputs simple text files, making it easy to customise the results by post-processing, allowing the putative PPI datasets to be easily integrated into existing analysis workflows. The Bio::Homology::InterologWalk module, sample scripts and full documentation are freely available from the Comprehensive Perl Archive Network (CPAN) under the GNU Public license.
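The orthology-walk idea can be illustrated with plain lookup tables. The sketch below is written in Python rather than Perl and does not use or reproduce the Bio::Homology::InterologWalk API. It assumes three hypothetical mappings (forward orthologs, known interactions in the other species, and orthologs back to the species of interest), and every identifier is made up for illustration.

```python
def interolog_walk(genes, orthologs_forward, interactions, orthologs_back):
    """Propose putative interactions for genes in a species of interest.

    For each gene: step to its orthologs in another species, collect the
    known interaction partners of those orthologs, then step back to the
    partners' orthologs in the species of interest.
    """
    putative = set()
    for gene in genes:
        for ortholog in orthologs_forward.get(gene, ()):
            for partner in interactions.get(ortholog, ()):
                for mapped_back in orthologs_back.get(partner, ()):
                    # Store each pair in sorted order so that (a, b) and
                    # (b, a) are treated as the same interaction.
                    putative.add(tuple(sorted((gene, mapped_back))))
    return putative


# Hypothetical identifiers: fly genes mapped through worm interactions.
forward = {"flyA": ["wormA"]}
known_interactions = {"wormA": ["wormB"]}
backward = {"wormB": ["flyB"]}
pairs = interolog_walk(["flyA"], forward, known_interactions, backward)
```

In practice the three tables would be filled from orthology and interaction databases, and each candidate pair would carry the metadata needed for the prioritisation step the abstract describes.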

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Finding -regulatory genes and protein complexes modulating meiotic recombination hotspots of human, mouse and yeast

Publication venue: BioMed Central

Springer - Publisher Connector