Literature curation of protein interactions: measuring agreement across major public databases
Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community.
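As a rough illustration of how agreement between two databases on a shared publication might be scored, the sketch below uses a Sørensen-Dice overlap between the sets of interactions each database records. The abstract does not say which measure was used, so the choice of Dice, the function names and the toy records are all assumptions made for illustration.

```python
from itertools import combinations


def dice(a, b):
    """Sorensen-Dice overlap of two sets: 1.0 if identical, 0.0 if disjoint."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def interaction(p, q):
    """Represent an undirected binary interaction as an order-free pair."""
    return frozenset((p, q))


def mean_pairwise_agreement(records):
    """records: {publication_id: {database_name: set_of_interactions}}.

    Returns the mean Dice score over every pair of databases that
    curated the same publication.
    """
    scores = []
    for by_db in records.values():
        for db1, db2 in combinations(sorted(by_db), 2):
            scores.append(dice(by_db[db1], by_db[db2]))
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical toy data: two publications, each curated by two databases.
records = {
    "pub1": {
        "dbA": {interaction("P1", "P2"), interaction("P2", "P3")},
        "dbB": {interaction("P2", "P1")},
    },
    "pub2": {
        "dbA": {interaction("P4", "P5")},
        "dbB": {interaction("P4", "P5")},
    },
}
agreement = mean_pairwise_agreement(records)
print(round(agreement, 3))
```

Storing each interaction as a frozenset makes the comparison independent of the order in which the two proteins are listed. It does not address the harder sources of disagreement named in the abstract, such as isoform or organism assignment.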
The BioGRID Interaction Database: 2011 update
The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions.
International Research Institute for Climate and Society
Although many natural disasters have hydro-meteorological antecedents, little advantage has been taken of the availability of weather and climate data, advanced diagnostics and seasonal predictions for disaster risk management. In this study, methodologies for the use of hydro-meteorological data in hazard risk assessment are presented, laying the groundwork for future dynamic hazard predictions.
A high-resolution assessment of natural hazards, vulnerability to hazards and multi-hazard disaster risk has been carried out for Sri Lanka. Drought, flood, cyclone and landslide hazards, and vulnerability, were identified using data from Sri Lankan government agencies. Drought- and flood-prone areas were mapped using rainfall data gridded at a resolution of 10 km. Cyclone and landslide hazardousness were mapped based on long-term historical incidence data. Indices for regional industrial development, infrastructure development and agricultural production were estimated based on proxies. An assessment of regional food insecurity from the World Food Programme was used in the analysis. Records of emergency relief were used in estimating a spatial proxy for disaster risk. A multi-hazardousness map was developed for Sri Lanka. The hazardousness estimates for droughts, floods, cyclones and landslides were weighted for their associated disaster risk with proxies for economic losses to provide a risk map, or hotspots map. Our principal findings are summarized below.
Useful hazard and vulnerability analysis can be carried out with the type of data that is available in-country. The hazardousness estimates for droughts, floods, cyclones and landslides show marked spatial variability. Vulnerability shows marked spatial variability as well. Thus, the resolution of analysis needs to match the resolution of spatial variations in relief, climate and other features. Higher-resolution information is needed in planning and action for disaster management.
Multi-hazard analysis brought out regions of high risk in Sri Lanka such as the Kegalle and Ratnapura Districts in the South West and Ampara, Batticaloa, Trincomalee, Mullaitivu and Killinochchi districts in the North-East and the districts of Nuwara Eliya, Badulla, Ampara and Matale that contain some of the sharpest hill slopes of the central mountain massifs.
There is a distinct seasonality to the risks posed by droughts, floods, landslides and cyclones. Whereas the Eastern slopes regions have hotspots during the boreal fall and early winter, the Western slopes regions are risk prone in the summer and early fall. Thus attention is warranted not only on “Hot-Spots” but also on “Hot-Seasons”.
Climate data was useful in estimating hazardousness in the case of droughts, floods and cyclones and for estimating flood and landslide risk. The methodologies presented here for hazard analysis of floods and droughts present an explicit link between climate and hazard. The results from this study, coupled with the high-resolution seasonal climate prediction techniques developed in a related study, point the way to using historical, current and predictive climate information to inform disaster management policy and early warning systems.
Climate, environmental and social changes such as deforestation, urbanization and war affect hazardousness and vulnerability. It is more difficult to quantify such changes than the baseline conditions.
Our analysis was carried out for a period since 1960 that included a period of civil war after 1983. This war affected the North-East of the island in particular. To put things in context, while natural disasters accounted for 1,483 fatalities in this period, the civil wars accounted for over 65,000. Wars and conflict pose complications for hazard and vulnerability analysis. Yet the vulnerabilities created by the war make efforts to reduce disaster risks all the more important.
Technical details of our work have been included in a case study published by the World Bank and in journals listed in the outputs.
OrthoNets: simultaneous visual analysis of orthologs and their interaction neighborhoods across different organisms
Motivation: Protein interaction networks contain a wealth of biological information, but their large size often hinders cross-organism comparisons. We present OrthoNets, a Cytoscape plugin that displays protein–protein interaction (PPI) networks from two organisms simultaneously, highlighting orthology relationships and aggregating several types of biomedical annotations. OrthoNets also allows PPI networks derived from experiments to be overlaid on networks extracted from public databases, supporting the identification and verification of new interactors. Any newly identified PPIs can be validated by checking whether their orthologs interact in another organism.
iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database
Background: The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment. Results: The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL. The user may choose between different methods for resolving redundancies in interaction data and how n-ary data is represented. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user choice of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis. Conclusions: The package allows the user to control how these issues are dealt with and communicate them via an R-script written using the iRefR package; this will facilitate communication of methods, reproducibility of network analyses and further modification and comparison of methods by researchers.
Reactome: a database of reactions, pathways and biological processes
Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.
A new, fast algorithm for detecting protein coevolution using maximum compatible cliques
Background: The MatrixMatchMaker (MMM) algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time. Results: In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM. Conclusions: MMMvII will thus allow for more extensive and intricate analyses of coevolution. Availability: An implementation of the MMMvII algorithm is available at: http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php
Candidate gene prioritization by network analysis of differential expression using machine learning approaches
Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge (such as known disease genes or disease-related pathways) is available. Performing genetic studies frequently results in large lists of candidate genes, of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results: We have proposed three strategies for scoring disease candidate genes that rely on network-based machine learning approaches: kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%. Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
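To make the idea of heat kernel diffusion over a network concrete, the sketch below spreads differential-expression values along the edges of a small graph and ranks candidate genes by the diffused score. This is not the authors' implementation. The unnormalised Laplacian, the diffusion parameter beta, the truncated Taylor approximation of exp(-beta L), and the toy network are all assumptions chosen to keep the example short and dependency-free.

```python
def laplacian(n, edges):
    """Unnormalised graph Laplacian L = D - A as a dense list of lists."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def heat_diffuse(L, x, beta=0.5, terms=30):
    """Approximate exp(-beta * L) @ x with a truncated Taylor series."""
    n = len(x)
    result = list(x)  # k = 0 term of the series
    term = list(x)
    for k in range(1, terms + 1):
        # term_k = (-beta * L / k) @ term_{k-1}
        term = [
            -beta / k * sum(L[i][j] * term[j] for j in range(n))
            for i in range(n)
        ]
        result = [r + t for r, t in zip(result, term)]
    return result


# Hypothetical toy network: a path 0 - 1 - 2 - 3 where only gene 3
# is strongly differentially expressed.
edges = [(0, 1), (1, 2), (2, 3)]
expression = [0.0, 0.0, 0.0, 4.0]
candidates = [0, 2]

scores = heat_diffuse(laplacian(4, edges), expression)
ranking = sorted(candidates, key=lambda g: scores[g], reverse=True)
print(ranking)
```

In this toy case the candidate adjacent to the differentially expressed gene ranks above the distant one, which is the behaviour the abstract describes. A real analysis would use a sparse matrix library and a tuned beta.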
Bio::Homology::InterologWalk - A Perl module to build putative protein-protein interaction networks through interolog mapping
Background: Protein-protein interaction (PPI) data are widely used to generate network models that aim to describe the relationships between proteins in biological systems. The fidelity and completeness of such networks are primarily limited by the paucity of protein interaction information and by the restriction of most of these data to just a few widely studied experimental organisms. In order to extend the utility of existing PPIs, computational methods can be used that exploit functional conservation between orthologous proteins across taxa to predict putative PPIs or 'interologs'. To date most interolog prediction efforts have been restricted to specific biological domains with fixed underlying data sources and there are no software tools available that provide a generalised framework for 'on-the-fly' interolog prediction. Results: We introduce Bio::Homology::InterologWalk, a Perl module to retrieve, prioritise and visualise putative protein-protein interactions through an orthology-walk method. The module uses orthology and experimental interaction data to generate putative PPIs and optionally collates meta-data into an Interaction Prioritisation Index that can be used to help prioritise interologs for further analysis. We show the application of our interolog prediction method to the genomic interactome of the fruit fly, Drosophila melanogaster. We analyse the resulting interaction networks and show that the method proposes new interactome members and interactions that are candidates for future experimental investigation. Conclusions: Our interolog prediction tool employs the Ensembl Perl API and PSICQUIC-enabled protein interaction data sources to generate up-to-date interologs 'on-the-fly'. This represents a significant advance on previous methods for interolog prediction as it allows the use of the latest orthology and protein interaction data for all of the genomes in Ensembl. The module outputs simple text files, making it easy to customise the results by post-processing and allowing the putative PPI datasets to be easily integrated into existing analysis workflows. The Bio::Homology::InterologWalk module, sample scripts and full documentation are freely available from the Comprehensive Perl Archive Network (CPAN) under the GNU Public license.
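The orthology-walk idea can be sketched in a few lines: map a query gene to its orthologs in a reference species, collect the experimentally observed partners of those orthologs, then map each partner back to the query species. The Python below is only an illustration of that three-step logic, not the Perl module's API. The function name, the dictionary layout and all gene identifiers are hypothetical, and the real module retrieves this data from Ensembl and PSICQUIC services instead of in-memory tables.

```python
def interolog_walk(query, orthologs_fwd, interactions, orthologs_back):
    """Return putative interactions for `query` in its own species.

    orthologs_fwd:  {query_species_gene: set of reference-species orthologs}
    interactions:   {reference_gene: set of experimentally observed partners}
    orthologs_back: {reference_gene: set of query-species orthologs}

    Each result is (query, putative_partner, ortholog, observed_partner),
    so the supporting evidence travels with the prediction.
    """
    putative = set()
    for ortholog in orthologs_fwd.get(query, ()):           # forward walk
        for partner in interactions.get(ortholog, ()):      # known PPIs
            for candidate in orthologs_back.get(partner, ()):  # walk back
                putative.add((query, candidate, ortholog, partner))
    return putative


# Hypothetical identifiers for illustration only.
orthologs_fwd = {"flyA": {"humA"}}
interactions = {"humA": {"humB", "humC"}}
orthologs_back = {"humB": {"flyB"}}  # humC has no fly ortholog

interologs = interolog_walk("flyA", orthologs_fwd, interactions, orthologs_back)
print(sorted(interologs))
```

Keeping the ortholog and the observed partner in each result tuple leaves room for a later prioritisation step, in the spirit of the index the abstract mentions, without recomputing the walk.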
SoftPanel: a website for grouping diseases and related disorders for generation of customized panels
- …