
Royal Holloway Research Online

SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale

Author: A Biegert
A Murzin
A Paccanaro
A Ruepp
AJ Enright
Alberto Paccanaro
AY Ng
B Everitt
D Arthur
D Ballard
EL Hong
G Wang
JJ Forman
JM Chandonia
K Verkhedkar
LJ Jensen
M Ashburner
M Meilă
M Newman
N Kannan
O Krishnadev
P Pipenbacher
P Shannon
Rajkumar Sasidharan
RB Lehoucq
S van Dongen
SF Altschul
T Fruchterman
Tamás Nepusz
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background: An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. Results: SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences).
Conclusions: Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with external tools such as BLAST and Cytoscape, and it can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
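The abstract above does not spell out the spectral algorithm that SCPS uses, so the following is only a minimal, generic sketch of the underlying idea: splitting a set of items into two groups from the second eigenvector of a normalized pairwise-similarity matrix. The function name `spectral_bipartition`, the toy similarity values, and the use of plain power iteration are all assumptions made for illustration, not details taken from SCPS.

```python
import math


def spectral_bipartition(w, iters=500):
    """Two-way spectral split of items given a symmetric similarity matrix w.

    Illustrative sketch only (not the SCPS algorithm): it finds the second
    eigenvector of N = D^{-1/2} W D^{-1/2} by power iteration and splits the
    items by the sign of their component.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    if any(d <= 0 for d in deg):
        raise ValueError("every item needs at least one positive similarity")
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]

    # The leading eigenvector of N is known in closed form: sqrt(deg), normalized.
    total = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / total for d in deg]

    def apply_shifted(x):
        # Multiply by (N + I) / 2, whose eigenvalues all lie in [0, 1].
        out = []
        for i in range(n):
            s = sum(w[i][j] * inv_sqrt[j] * x[j] for j in range(n))
            out.append(0.5 * (inv_sqrt[i] * s + x[i]))
        return out

    def deflate_and_normalize(x):
        # Remove the component along v1, then rescale to unit length.
        proj = sum(a * b for a, b in zip(x, v1))
        x = [a - proj * b for a, b in zip(x, v1)]
        norm = math.sqrt(sum(a * a for a in x))
        return [a / norm for a in x]

    # Deterministic, non-uniform start vector (assumes it is not parallel to v1).
    x = deflate_and_normalize([float(i + 1) for i in range(n)])
    for _ in range(iters):
        x = deflate_and_normalize(apply_shifted(x))

    # Items on the same side of zero are placed in the same group.
    return [1 if value >= 0 else 0 for value in x]


# Hypothetical toy data: two tight groups of three items, weakly linked.
similarity = [
    [0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.05) for j in range(6)]
    for i in range(6)
]
labels = spectral_bipartition(similarity)
print(labels)  # items 0-2 share one label, items 3-5 share the other
```

A production tool would use a sparse eigensolver and choose the number of clusters automatically; the pure-Python power iteration here is chosen only to keep the sketch self-contained.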

Springer - Publisher Connector

Cerebral malaria: insights from host-parasite protein-protein interactions

Author: A Friedman
Aditya Rao
AS Weyrich
C Aurrecoechea
C Epp
CA Moxon
CH Brandts
D Camus
DN Männel
DR Drew
EA Nollen
ET Han
EW Sayers
F Lebrin
FM Omer
FP Davis
Gopalakrishnan Bulusu
HC van der Heyde
IM Medana
JA Lyon
JP Sokol
K Haldar
M Vignali
MA Mahdavi
Mayil K Kumar
MD Dyer
MD Hjelmeland
MS Oakley
NO Wilson
O Krishnadev
P Grellier
R Udomsangpetch
RW Mahley
S Frankland
S Ivens
SC Wassmer
SJ Chakravorty
SR Pavithra
T Okadome
T Spielmann
The UniProt Consortium
Thomas Joseph
TJ Hubbard
V Combes
WHO
Y David
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background: Cerebral malaria is a form of human malaria wherein Plasmodium falciparum-infected red blood cells adhere to the blood capillaries in the brain, potentially leading to coma and death. Interactions between parasite and host proteins are important in understanding the pathogenesis of this deadly form of malaria. It is, therefore, necessary to study available protein-protein interactions to identify lesser known interactions that could throw light on key events of cerebral malaria. Methods: Sequestration, haemostasis dysfunction, systemic inflammation and neuronal damage are key processes of cerebral malaria. Key events were identified from literature as being crucial to these processes. An integrated interactome was created using available experimental and predicted datasets as well as from literature. Interactions from this interactome were filtered based on Gene Ontology and tissue-specific annotations, and further analysed for relevance to the key events. Results: PfEMP1 presentation, platelet activation and astrocyte dysfunction were identified as the key events influencing the disease. 48896 host-parasite along with other host-parasite, host-host and parasite-parasite protein-protein interactions obtained from a disease-specific corpus were combined to form an integrated interactome. Filtering of the interactome resulted in five host-parasite PPI, six parasite-parasite and two host-host PPI. The analysis of these interactions revealed the potential significance of apolipoproteins and temperature/Hsp expression on efficient PfEMP1 presentation; role of MSP-1 in platelet activation; effect of parasite proteins in TGF-β regulation and the role of albumin in astrocyte dysfunction. Conclusions: This work links key host-parasite, parasite-parasite and host-host protein-protein interactions to key processes of cerebral malaria and generates hypotheses for disease pathogenesis based on a filtered interaction dataset. These hypotheses provide novel and significant insights to cerebral malaria.
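The filtering step described in the abstract above (keeping interactions whose Gene Ontology and tissue annotations match the processes of interest) can be pictured with a small sketch. The abstract does not give the exact rule, so the criterion used here, at least one matching GO term and at least one matching tissue, is an assumption, and every protein name, GO label and tissue label below is a hypothetical placeholder rather than data from the study.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    protein_a: str
    protein_b: str
    kind: str             # "host-parasite", "host-host" or "parasite-parasite"
    go_terms: frozenset   # annotation labels attached to the interacting pair
    tissues: frozenset    # tissues in which the pair is annotated as expressed


def filter_interactome(interactions, wanted_go, wanted_tissues):
    """Keep interactions sharing at least one GO term and one tissue with the
    wanted sets (an assumed rule; the abstract does not state the actual one)."""
    return [
        item for item in interactions
        if item.go_terms & wanted_go and item.tissues & wanted_tissues
    ]


# Hypothetical placeholder records, not real proteins or annotations.
interactome = [
    Interaction("host_A", "parasite_X", "host-parasite",
                frozenset({"go_term_1"}), frozenset({"brain"})),
    Interaction("host_B", "host_C", "host-host",
                frozenset({"go_term_2"}), frozenset({"liver"})),
    Interaction("parasite_Y", "parasite_Z", "parasite-parasite",
                frozenset({"go_term_1", "go_term_3"}), frozenset({"blood"})),
]

kept = filter_interactome(
    interactome,
    wanted_go=frozenset({"go_term_1"}),
    wanted_tissues=frozenset({"brain", "blood"}),
)
for item in kept:
    print(item.kind, item.protein_a, item.protein_b)
```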

Springer - Publisher Connector

Public Library of Science (PLOS)

Metabolome Based Reaction Graphs of M. tuberculosis and M. leprae: A Comparative Network Analysis

Author: A Wagner
AL Barabási
C Kettner
CE Barry III
CM Sassetti
D Bu
D Park
DA Fell
E Eisenberg
E Ravasz
Emmanouil Dermitzakis
ER Gansner
H Jeong
HW Ma
J Gross
J Jeong
J Park
J van Helden
J Zhao
J-DJ Han
JA Papin
JS Edwards
K Raman
K Takayama
Karthik Raman
Ketki D. Verkhedkar
KM Hall
KV Brinda
L Felix
N Kannan
Nagasuma R. Chandra
NM Luscombe
O Krishnadev
P Holme
P Uetz
R Albert
R Guimera
RK Sistla
RW Floyd
S Bilke
S Goto
S Vishveshwara
S Wuchty
SA Becker
SA Rahman
Saraswathi Vishveshwara
SM Patra
SS Shen-Orr
ST Cole
TY Kim
TZ Sen
V Kunin
YI Wolf
Z Liang
Publication venue: Public Library of Science
Publication date: 01/01/2007

BACKGROUND: Several types of networks, such as transcriptional, metabolic or protein-protein interaction networks of various organisms have been constructed, that have provided a variety of insights into metabolism and regulation. Here, we seek to exploit the reaction-based networks of three organisms for comparative genomics. We use concepts from spectral graph theory to systematically determine how differences in basic metabolism of organisms are reflected at the systems level and in the overall topological structures of their metabolic networks. METHODOLOGY/PRINCIPAL FINDINGS: Metabolome-based reaction networks of Mycobacterium tuberculosis, Mycobacterium leprae and Escherichia coli have been constructed based on the KEGG LIGAND database, followed by graph spectral analysis of the network to identify hubs as well as the sub-clustering of reactions. The shortest and alternate paths in the reaction networks have also been examined. Sub-cluster profiling demonstrates that reactions of the mycolic acid pathway in mycobacteria form a tightly connected sub-cluster. Identification of hubs reveals reactions involving glutamate to be central to mycobacterial metabolism, and pyruvate to be at the centre of the E. coli metabolome. The analysis of shortest paths between reactions has revealed several paths that are shorter than well established pathways. CONCLUSIONS: We conclude that severe downsizing of the leprae genome has not significantly altered the global structure of its reaction network but has reduced the total number of alternate paths between its reactions while keeping the shortest paths between them intact. The hubs in the mycobacterial networks that are absent in the human metabolome can be explored as potential drug targets. This work demonstrates the usefulness of constructing metabolome based networks of organisms and the feasibility of their analyses through graph spectral methods. 
The insights obtained from such studies provide a broad overview of the similarities and differences between organisms, taking comparative genomics studies to a higher dimension.
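As a rough illustration of the shortest-path and hub analyses mentioned in the abstract above, the sketch below treats a reaction network as an undirected graph, computes shortest path lengths by breadth-first search, and ranks nodes by degree. Degree ranking is a simple stand-in used here for brevity; the study identifies hubs through graph spectral analysis, which this sketch does not reproduce. The reaction labels `R1` to `R5` and the edge list are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency map from a list of (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_lengths(graph, source):
    """Number of steps from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def top_hubs(graph, k=1):
    """The k nodes with the most neighbours (degree used as a simple hub proxy)."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))[:k]


# Hypothetical toy network: nodes are reactions, edges link reactions that
# share a metabolite.
edges = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R1", "R4"), ("R4", "R5")]
graph = build_graph(edges)

print(shortest_path_lengths(graph, "R1"))
print(top_hubs(graph))
```

Removing an edge and rerunning the search is one simple way to see how the loss of alternate routes changes path lengths in such a toy network.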

Open Access Repository of IISc Research Publications

Functional clustering of yeast proteins from the protein-protein interaction network

Author: A Fernandez
A Kloczkowski
AC Gavin
AH Tong
AJ Morgan
AL Barabasi
AM Edwards
Andrzej Kloczkowski
AR Atilgan
AW Rives
BJ Breitkreutz
CL Barrett
D Gibson
DB Bu
E Anderson
E Malolepsza
E Yeger-Lotem
EH Davidson
G Shinar
GD Bader
H de Jong
H Jeong
H Salis
HW Mewes
HW Wu
I Bahar
I Farkas
I Salazar-Ciudad
IB Kuznetsov
J Hasty
J Ihmels
J-DJ Han
JB Pereira-Leal
JD Han
KJ Worsley
KP Rabitsch
KV Brinda
L Giot
LH Hartwell
M Duno
M Girvan
M Kaern
M Robinson
M Vidal
ME Wall
MY Lu
N Barkai
N Kannan
N Kashtan
ND Price
NJ Krogan
O Keskin
O Krishnadev
P Aloy
P Doruker
P Uetz
PJ Flory
R Milo
RK Sistla
Robert L Jernigan
S Asthana
S Itzkovitz
S Mangan
S Maslov
S Vishveshwara
SM Gomez
SM Patra
SS Shen-Orr
T Ito
Taner Z Sen
TZ Sen
U Alon
V Arnau
V Spirin
Y Ho
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The abundant data available for protein interaction networks have not yet been fully understood. New types of analyses are needed to reveal organizational principles of these networks to investigate the details of functional and regulatory clusters of proteins. RESULTS: In the present work, individual clusters identified by an eigenmode analysis of the connectivity matrix of the protein-protein interaction network in yeast are investigated for possible functional relationships among the members of the cluster. With our functional clustering we have successfully predicted several new protein-protein interactions that indeed have been reported recently. CONCLUSION: Eigenmode analysis of the entire connectivity matrix yields both a global and a detailed view of the network. We have shown that the eigenmode clustering not only is guided by the number of proteins with which each protein interacts, but also leads to functional clustering that can be applied to predict new protein interactions.
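To make the phrase "eigenmode analysis of the connectivity matrix" above more concrete, here is a minimal sketch that computes only the dominant eigenmode of a small adjacency matrix by power iteration. How the study turns eigenmodes into clusters is not described in the abstract, so that step is not attempted; the function name `dominant_eigenmode`, the shift by the identity matrix, and the five-node toy network are assumptions for illustration.

```python
import math


def dominant_eigenmode(adj, iters=1000):
    """Largest eigenvalue and its eigenvector for a symmetric 0/1 matrix.

    Uses power iteration on (A + I); the shift keeps the iteration from
    oscillating on bipartite networks and does not change the eigenvectors.
    """
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v.(A v) recovers the eigenvalue of A itself.
    av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(av, v))
    return eigenvalue, v


# Hypothetical toy network of five proteins; node 0 has the most partners.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
adj = [[0] * 5 for _ in range(5)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

eigenvalue, mode = dominant_eigenmode(adj)
ranking = sorted(range(5), key=lambda i: -mode[i])
print(round(eigenvalue, 3), ranking)
```

In this toy case the node with the most interaction partners carries the largest component of the dominant mode, which echoes the abstract's remark that the clustering is guided partly by the number of partners each protein has.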

Digital Repository @ Iowa State University (ISU)

Springer - Publisher Connector

Public Library of Science (PLOS)

Cross-Over between Discrete and Continuous Protein Structure Space: Insights into Automatic Classification and Networks of Protein Structures

Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find out that such violations present a well defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and it should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found out that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. 
As a domain set, we have selected a consensus set of 2,890 domains decomposed very similarly in SCOP and CATH. As an alignment algorithm, we used a global version of MAMMOTH developed in our group, which is both rapid and accurate. As a similarity measure, we used the size-normalized contact overlap, and as a clustering algorithm, we used average linkage. The resulting automatic classification at the cross-over point was more consistent than expert ones with respect to the structure similarity measure, with 86% of the clusters corresponding to subsets of either SCOP or CATH superfamilies and fewer than 5% containing domains in distinct folds according to both SCOP and CATH. Almost 15% of SCOP superfamilies and 10% of CATH superfamilies were split, consistent with the notion of fold change in protein evolution. These results were qualitatively robust for all choices that we tested, although we did not try to use alignment algorithms developed by other groups. Folds defined in SCOP and CATH would be completely joined in the regime of large transitivity violations where clustering is more arbitrary. Consistently, the agreement between SCOP and CATH at fold level was lower than their agreement with the automatic classification obtained using as a clustering algorithm, respectively, average linkage (for SCOP) or single linkage (for CATH). The networks representing significant evolutionary and structural relationships between clusters beyond the cross-over point may allow us to perform evolutionary, structural, or functional analyses beyond the limits of classification schemes. These networks and the underlying clusters are available at http://ub.cbm.uam.es/research/ProtNet.ph
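The transitivity argument above can be illustrated with a deliberately simplified count. The measure proposed in the paper is defined on the joins made by a clustering algorithm and is not given in this abstract, so the sketch below uses a cruder stand-in: among all triples where A is similar to B and B is similar to C at a chosen threshold, it reports the fraction in which A and C are not similar. The domain labels and similarity scores are hypothetical.

```python
from itertools import combinations


def transitivity_violation_rate(similarity, threshold):
    """Fraction of 'A~B and B~C' triples in which A and C are not similar.

    similarity maps unordered pairs, given as (a, b) tuples, to a score;
    two items count as similar when their score is >= threshold.
    """
    neighbours = {}
    for (a, b), score in similarity.items():
        if score >= threshold:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    triples = 0
    violations = 0
    for middle, linked in neighbours.items():
        # Every pair of items similar to `middle` should also be similar
        # to each other if the relation were transitive.
        for x, y in combinations(sorted(linked), 2):
            triples += 1
            if y not in neighbours.get(x, set()):
                violations += 1
    return violations / triples if triples else 0.0


# Hypothetical similarity scores between four domains.
scores = {
    ("a", "b"): 0.90, ("b", "c"): 0.80, ("a", "c"): 0.30,
    ("c", "d"): 0.85, ("b", "d"): 0.75, ("a", "d"): 0.20,
}

print(transitivity_violation_rate(scores, threshold=0.7))
print(transitivity_violation_rate(scores, threshold=0.1))
```

Scanning the threshold from high to low and plotting the rate is one way to look for the kind of cross-over the abstract describes, under the stated simplification.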

Secretaría de Estado de Cultura

Digital.CSIC

Computational recognition and analysis of hitherto uncharacterized nucleotide cyclase-like proteins in bacteria

Author: A Krogh
A Mitchell
A Rauch
A Sali
Abha Jain
AD Ketkar
AR Shenoy
C UniProt
G Ramakrishnan
Gayatri Ramakrishnan
J Pei
JD Watson
JJ Tesmer
JM Lew
K Tamura
LA Kelley
M Kanehisa
M Punta
Nagasuma Chandra
Narayanaswamy Srinivasan
O Krishnadev
PJ Kersey
R Overbeek
RC Edgar
RD Finn
RM Bennett-Lovsey
S Pronk
SB Pandit
SC Sinha
SR Eddy
SV Date
T Tatusova
TN Petersen
X Robert
Publication venue: Springer Science and Business Media LLC