Literature curation of protein interactions: measuring agreement across major public databases

Author: A. L. Turinsky
B. Turner
Bader
Bader
Bader
Bairoch
Ceol
Charbonnier
Chien
Collins
Cusick
Feki
Gavin
Gavin
Guldener
Hermjakob
Ho
Howe
I. M. Donaldson
Jensen
Kerrien
Kleiman
Krogan
Kuhner
Lehner
Leitner
Lievens
Mons
Orchard
Peri
Prieto
Razick
Rual
S. J. Wodak
S. Razick
Salwinski
Salwinski
Stark
Tong
Uetz
von Mering
Publication venue: Oxford University Press

Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community.
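
The kind of per-publication comparison described in this abstract can be illustrated with a minimal sketch. It assumes that each database's record of one publication is a set of protein pairs and that agreement is scored as a Jaccard ratio; the abstract does not specify the scoring actually used, and the names `jaccard`, `pairwise_agreement`, `db_a` and `db_b` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of items recorded by both sources out of all items recorded by either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(records):
    """Score every pair of databases for ONE publication.

    records maps a database name to the set of interactions it curated,
    each interaction being a frozenset of protein identifiers.
    (Assumed representation, not taken from the paper.)
    """
    scores = {}
    for db1, db2 in combinations(sorted(records), 2):
        inter1, inter2 = records[db1], records[db2]
        # Proteins are whatever identifiers appear in any curated interaction.
        prot1 = set().union(*inter1)
        prot2 = set().union(*inter2)
        scores[(db1, db2)] = {
            "interactions": jaccard(inter1, inter2),
            "proteins": jaccard(prot1, prot2),
        }
    return scores


# Hypothetical toy records for a single shared publication.
toy = {
    "db_a": {frozenset({"P1", "P2"}), frozenset({"P2", "P3"})},
    "db_b": {frozenset({"P1", "P2"}), frozenset({"P3", "P4"})},
}
result = pairwise_agreement(toy)
```

In this toy case the two sources share one of three distinct interactions but three of four distinct proteins, which mirrors the pattern in the abstract where protein-level agreement is higher than interaction-level agreement.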

Crossref

PubMed Central

The BioGRID Interaction Database: 2011 update

Author: A. Chatr-aryamontri
A. Winter
B.-J. Breitkreutz
Behrends
Bork
Breitkreutz
Breitkreutz
C. Stark
Cline
Costanzo
Drabkin
Hertz-Fowler
Howe
J. M. Rust
J. Nixon
K. Dolinski
K. Van Auken
Kerrien
L. Boucher
Leitner
M. S. Livstone
M. Tyers
Mering
M ller
R. Oughtred
Razick
T. Reguly
Wiederkehr
X. Shi
X. Wang
Yu
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2011

The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions.

CiteSeerX

Crossref

PubMed Central

Edinburgh Research Explorer

Caltech Authors

Recommended from our members

International Research Institute for Climate and Society

Author: Chandimala J.
Lyon B.
Ralapanawe V.
Razick S.
Tennakoon U.
Yahiya Z.
Zubair L.
Publication date: 01/01/2012

Although many natural disasters have hydro-meteorological antecedents, little advantage has been taken of the availability of weather and climate data, advanced diagnostics and seasonal predictions for disaster risk management. In this study, methodologies for use of hydro-meteorological data in hazard risk assessment are presented, laying the groundwork for future dynamic hazard predictions. A high-resolution assessment of natural hazards, vulnerability to hazards and of multi-hazard disaster risk has been carried out for Sri Lanka. Drought, flood, cyclone and landslide hazards, and vulnerability were identified using data from Sri Lankan government agencies. Drought and flood prone areas were mapped using rainfall data gridded at a resolution of 10 km. Cyclone and landslide hazardousness were mapped based on long-term historical incidence data. Indices for regional industrial development, infrastructure development and agricultural production were estimated based on proxies. An assessment of regional food insecurity from the World Food Programme was used in the analysis. Records of emergency relief were used in estimating a spatial proxy for disaster risk. A multi-hazardousness map was developed for Sri Lanka. The hazardousness estimates for drought, floods, cyclones and landslides were weighted for their associated disaster risk with proxies for economic losses to provide a risk map or a hotspots map. Our principal findings are summarized below. Useful hazard and vulnerability analysis can be carried out with the type of data that is available in-country. The hazardousness estimates for droughts, floods, cyclones and landslides show marked spatial variability. Vulnerability shows marked spatial variability as well. Thus, the resolution of analysis needs to match the resolution of spatial variations in relief, climate and other features. Higher-resolution information is needed in planning and action for disaster management.
Multi-hazard analysis brought out regions of high risk in Sri Lanka such as the Kegalle and Ratnapura Districts in the South West and Ampara, Batticaloa, Trincomalee, Mullaitivu and Killinochchi districts in the North-East and the districts of Nuwara Eliya, Badulla, Ampara and Matale that contain some of the sharpest hill slopes of the central mountain massifs. There is a distinct seasonality to risks posed by drought, floods, landslides and cyclones. Whereas the Eastern slopes regions have hotspots during the boreal fall and early winter, the Western slopes regions are risk prone in the summer and the early fall. Thus attention is warranted not only on Hot-Spots but also on “Hot-Seasons”. Climate data was useful in estimating hazardousness in the case of droughts, floods and cyclones and for estimating flood and landslide risk. The methodologies presented here for hazard analysis of floods and droughts present an explicit link between climate and hazard. The results from this study, coupled with the high-resolution seasonal climate prediction techniques developed in a related study, point the way to using historical, current and predictive climate information to inform disaster management policy and early warning systems. Climate, environmental and social change such as deforestation, urbanization and war affect the hazardousness and vulnerability. It is more difficult to quantify such changes than the baseline conditions. Our analysis was carried out for a period since 1960 that included a period of civil war after 1983. This war affected the North-East of the island in particular. To put things in context, while natural disasters accounted for 1,483 fatalities in this period, the civil wars accounted for over 65,000. War and conflict pose complications for hazard and vulnerability analysis. Yet, the vulnerabilities created by the war make such efforts to reduce disaster risks all the more important.
Technical details of our work have been included in a case study published by the World Bank and in journals listed in the outputs.

Columbia University Academic Commons

OrthoNets: simultaneous visual analysis of orthologs and their interaction neighborhoods across different organisms

Author: A. L. Turinsky
A. Merkoulovitch
B. Turner
Bateman
Costanzo
D. Roudeva
Gavin
J. Greenblatt
J. Vlasblom
Kouzarides
Krogan
O'Brien
Razick
S. J. Wodak
S. Pu
Shannon
Stelzl
Y. Hao
Yu
Zhu
Publication venue: Oxford University Press

Motivation: Protein interaction networks contain a wealth of biological information, but their large size often hinders cross-organism comparisons. We present OrthoNets, a Cytoscape plugin that displays protein–protein interaction (PPI) networks from two organisms simultaneously, highlighting orthology relationships and aggregating several types of biomedical annotations. OrthoNets also allows PPI networks derived from experiments to be overlaid on networks extracted from public databases, supporting the identification and verification of new interactors. Any newly identified PPIs can be validated by checking whether their orthologs interact in another organism.

Crossref

PubMed Central

iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database

Author: A Ceol
A Clauset
A Ruepp
A Stojmirovic
AL Turinsky
Antonio Mora
B Aranda
B Turner
C Alfarano
C Stark
G Csardi
GD Bader
I Xenarios
Ian M Donaldson
J Yu
KR Brown
P Braun
P Pagel
RM Ewing
S Kerrien
S Razick
TS Keshava Prasad
U Guldener
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment. Results: The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL. The user may choose between different methods for resolving redundancies in interaction data and how n-ary data is represented. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user choice of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis. Conclusions: The package allows the user to control how these issues are dealt with and communicate them via an R script written using the iRefR package. This will facilitate communication of methods, reproducibility of network analyses and further modification and comparison of methods by researchers.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NORA - Norwegian Open Research Archives

Reactome: a database of reactions, pathways and biological processes

Author: B. Jassal
B. May
C. Yung
Chen
D. Croft
Demir
Dutta
E. Birney
E. Schmidt
Frazer
Funahashi
G. Gopinath
G. O'Kelly
G. Wu
H. Hermjakob
I. Kalatskaya
Irwin
Jain
Kerrien
Killcoyne
L. Matthews
L. Stein
M. Caudy
M. Gillespie
Montecchi-Palazzi
N. Ndegwa
Novere
P. D'Eustachio
P. Garapati
Pico
R. Haw
Razick
S. Jupe
S. Mahajan
Sherry
V. Shamovsky
Warr
Wiegers
Wu
Wu
Publication venue: Oxford University Press
Publication date: 01/01/2011

Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.

CiteSeerX

City University of New York

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

A new, fast algorithm for detecting protein coevolution using maximum compatible cliques

Author: A Rodionov
A Valencia
AK Ramani
Alex Rodionov
Alexandr Bezginov
AM Altenhoff
D MacLeod
D Robinson
Elisabeth RM Tillier
ERM Tillier
ERM Tillier
F Pazos
F Pazos
GW Clark
J Felsenstein
J Felsenstein
Jonathan Rose
K Katoh
MK Kuhner
PRJ Östergård
R Jothi
RG Beiko
RM Karp
S Razick
T Sato
V Soria-Carrasco
W Li
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The MatrixMatchMaker (MMM) algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time. Results: In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM. Conclusions: MMMvII will thus allow for more extensive and intricate analyses of coevolution. Availability: An implementation of the MMMvII algorithm is available at: http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php
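
The general idea of recasting a largest-common-submatrix search as a clique search can be sketched as follows. This is an illustration only and not the MMMvII implementation: the relative tolerance rule, the function names `compatibility_graph` and `max_clique`, and the use of a plain Bron-Kerbosch search with pivoting are all assumptions made for the example.

```python
from itertools import combinations


def compatibility_graph(dist_a, dist_b, tol):
    """Build a graph whose nodes pair a row of matrix A with a row of matrix B.

    Two nodes are linked when they use distinct rows of both matrices and the
    corresponding distances agree within a relative tolerance `tol`
    (an assumed matching rule).
    """
    nodes = [(i, j) for i in range(len(dist_a)) for j in range(len(dist_b))]
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue
        da, db = dist_a[i1][i2], dist_b[j1][j2]
        if abs(da - db) <= tol * max(da, db):
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one maximum clique as a list."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


# Hypothetical distance matrices: the first three taxa match, the fourth does not.
A = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 6], [6, 6, 6, 0]]
B = [[0, 2, 4, 1], [2, 0, 4, 1], [4, 4, 0, 1], [1, 1, 1, 0]]
clique = max_clique(compatibility_graph(A, B, tol=0.1))
```

Each node in the returned clique pairs one row of `A` with one row of `B`, so the clique size is the size of the common submatrix found under the assumed tolerance.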

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Candidate gene prioritization by network analysis of differential expression using machine learning approaches

Author: A Subramanian
A Zanzoni
AJ Smola
AP Francisco
B Aranda
B Harr
Bart de Moor
C Saunders
C Stark
C von Mering
D Nitsch
D Zieker
Daniela Nitsch
F Chung
F Fouss
Fabian Ojeda
GC Cawley
GD Bader
H Yang
HY Chuang
J Chen
JA Hanley
Joana P Gonçalves
JW Park
K Lage
KR Brown
L Franke
L Gautier
L Salwinski
LC Tranchevent
M Liu
P Baldi
P Pagel
R Gupta
RA Irizarry
RI Kondor
RK Nibbe
S Aerts
S Köhler
S Mirkin
S Razick
S Vardhanabhuti
SE Choe
T Fawcett
WK Lim
Y Saad
Yves Moreau
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge (such as known disease genes or disease-related pathways) is available. Performing genetic studies frequently results in large lists of candidate genes of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results: We have proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure approach, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.
Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bio::Homology::InterologWalk - A Perl module to build putative protein-protein interaction networks through interolog mapping

Author: A Ceol
A Valencia
A Wiles
AJ Vilella
AJ Walhout
Andrew P Jarman
B Aranda
B Lehner
BJ Breitkreutz
C Prieto
CS Pedamallu
CT Hittinger
D Bray
D Figeys
D Kemmer
DJ LaCount
E Chautard
F He
G Gallone
Giuseppe Gallone
H Hegyi
H Yu
H Yu
HB Fraser
J Douglas Armstrong
J Goll
J Wojcik
JE Stajich
KR Brown
L Giot
L Matthews
LJ Jensen
LR Matthews
M Ashburner
M Michaut
M Persico
MD Adams
NJ Krogan
P Bork
P Flicek
P Kersey
P Shannon
PJ Kersey
R Sharan
RM Ewing
RT Fielding
S Kerrien
S Li
S Razick
S Wuchty
S Wuchty
T Berggård
T Ian Simpson
TKB Gandhi
TW Huang
TW Huang
U Stelzl
X He
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Protein-protein interaction (PPI) data are widely used to generate network models that aim to describe the relationships between proteins in biological systems. The fidelity and completeness of such networks are primarily limited by the paucity of protein interaction information and by the restriction of most of these data to just a few widely studied experimental organisms. In order to extend the utility of existing PPIs, computational methods can be used that exploit functional conservation between orthologous proteins across taxa to predict putative PPIs or 'interologs'. To date most interolog prediction efforts have been restricted to specific biological domains with fixed underlying data sources and there are no software tools available that provide a generalised framework for 'on-the-fly' interolog prediction. Results: We introduce Bio::Homology::InterologWalk, a Perl module to retrieve, prioritise and visualise putative protein-protein interactions through an orthology-walk method. The module uses orthology and experimental interaction data to generate putative PPIs and optionally collates meta-data into an Interaction Prioritisation Index that can be used to help prioritise interologs for further analysis. We show the application of our interolog prediction method to the genomic interactome of the fruit fly, Drosophila melanogaster. We analyse the resulting interaction networks and show that the method proposes new interactome members and interactions that are candidates for future experimental investigation. Conclusions: Our interolog prediction tool employs the Ensembl Perl API and PSICQUIC enabled protein interaction data sources to generate up to date interologs 'on-the-fly'. This represents a significant advance on previous methods for interolog prediction as it allows the use of the latest orthology and protein interaction data for all of the genomes in Ensembl.
The module outputs simple text files, making it easy to customise the results by post-processing, allowing the putative PPI datasets to be easily integrated into existing analysis workflows. The Bio::Homology::InterologWalk module, sample scripts and full documentation are freely available from the Comprehensive Perl Archive Network (CPAN) under the GNU Public license.
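
The orthology-walk idea described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Perl, does not use or reproduce the Bio::Homology::InterologWalk API, and replaces the Ensembl and PSICQUIC lookups with small in-memory dictionaries. The function name `predict_interologs`, the identifiers, and the choice to skip self-interactions are assumptions made for the example.

```python
def predict_interologs(query_genes, orthologs, known_ppis, back_orthologs):
    """Project known interactions from another species onto the query species.

    Walk: query gene -> ortholog in another species -> experimentally known
    interactors of that ortholog -> orthologs of those interactors back in
    the query species. Each resulting pair is a putative interaction.
    """
    predicted = set()
    for gene in query_genes:
        for orth in orthologs.get(gene, ()):
            for partner in known_ppis.get(orth, ()):
                for back in back_orthologs.get(partner, ()):
                    # Assumed simplification: ignore putative self-interactions.
                    if back != gene:
                        predicted.add(frozenset({gene, back}))
    return predicted


# Hypothetical lookup tables standing in for orthology and interaction services.
orthologs = {"fly_A": ["human_A"]}
known_ppis = {"human_A": ["human_B", "human_C"]}
back_orthologs = {"human_B": ["fly_B"], "human_C": ["fly_A"]}

putative = predict_interologs(["fly_A"], orthologs, known_ppis, back_orthologs)
```

Here `human_C` maps back to the query gene itself and is dropped, so the only putative interaction is between `fly_A` and `fly_B`. A fuller treatment would attach supporting metadata to each pair so that candidates can be prioritised, as the abstract describes.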

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer