67 research outputs found

    The BioGRID Interaction Database: 2011 update

    The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions.

    Finite Element Algorithms and Data Structures on Graphical Processing Units

    The finite element method (FEM) is one of the most commonly used techniques for the solution of partial differential equations on unstructured meshes. This paper discusses both the assembly and the solution phases of the FEM with special attention to the balance of computation and data movement. We present a GPU assembly algorithm that scales to arbitrary degree polynomials used as basis functions, at the expense of redundant computations. We show how the storage of the stiffness matrix affects the performance of both the assembly and the solution. We investigate two approaches: global assembly into the CSR and ELLPACK matrix formats and matrix-free algorithms, and show the trade-off between the amount of indexing data and stiffness data. We discuss the performance of different approaches in light of the implicit caches on Fermi GPUs and show a speedup over a two-socket 12-core CPU of up to 10 times in the assembly and up to 6 times in the solution phase. We present our sparse matrix-vector multiplication algorithms that are part of a conjugate gradient iteration and show that a matrix-free approach may be up to two times faster than global assembly approaches and up to 4 times faster than NVIDIA’s cuSPARSE library, depending on the preconditioner used.
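
The CSR and ELLPACK storage formats named in the abstract above can be illustrated with a minimal CPU-side sketch in Python. This is not the paper's GPU implementation; the function names and the small sample matrix are assumptions chosen only to show how the two layouts trade indexing data against padded value storage in a sparse matrix-vector product.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix to CSR: (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in `values`
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def dense_to_ell(dense):
    """Convert a dense matrix to ELLPACK: every row padded to the same width."""
    rows = [[(j, a) for j, a in enumerate(row) if a != 0] for row in dense]
    width = max((len(r) for r in rows), default=0)
    # Padding uses value 0 and column 0, so padded terms contribute nothing.
    values = [[a for _, a in r] + [0] * (width - len(r)) for r in rows]
    col_idx = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    return values, col_idx


def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x with A stored in CSR (one row_ptr lookup per row)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def ell_matvec(values, col_idx, x):
    """y = A x with A stored in ELLPACK (fixed trip count per row)."""
    return [
        sum(a * x[j] for a, j in zip(row_vals, row_cols))
        for row_vals, row_cols in zip(values, col_idx)
    ]


if __name__ == "__main__":
    # Hypothetical 3x3 tridiagonal matrix, similar in shape to a 1D stiffness matrix.
    A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]
    x = [1, 2, 3]
    print(csr_matvec(*dense_to_csr(A), x))
    print(ell_matvec(*dense_to_ell(A), x))
```

CSR stores exactly the nonzeros plus one row pointer per row, while ELLPACK stores padded rows of equal width, which gives every row the same loop length at the cost of storing zeros.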

    A human MAP kinase interactome.

    Mitogen-activated protein kinase (MAPK) pathways form the backbone of signal transduction in the mammalian cell. Here we applied a systematic experimental and computational approach to map 2,269 interactions between human MAPK-related proteins and other cellular machinery and to assemble these data into functional modules. Multiple lines of evidence including conservation with yeast supported a core network of 641 interactions. Using small interfering RNA knockdowns, we observed that approximately one-third of MAPK-interacting proteins modulated MAPK-mediated signaling. We uncovered the Na-H exchanger NHE1 as a potential MAPK scaffold, found links between HSP90 chaperones and MAPK pathways and identified MUC12 as the human analog to the yeast signaling mucin Msb2. This study makes available a large resource of MAPK interactions and clone libraries, and it illustrates a methodology for probing signaling networks based on functional refinement of experimentally derived protein-interaction maps.

    Reuse of structural domain–domain interactions in protein networks

    Background: Protein interactions are thought to be largely mediated by interactions between structural domains. Databases such as iPfam relate interactions in protein structures to known domain families. Here, we investigate how the domain interactions from the iPfam database are distributed in protein interactions taken from the HPRD, MPact, BioGRID, DIP and IntAct databases.
    Results: We find that known structural domain interactions can explain only 4–19% of the available protein interactions; nevertheless, this fraction is still significantly larger than expected by chance. There is a correlation between the frequency of a domain interaction and the connectivity of the proteins it occurs in. Furthermore, a large proportion of protein interactions can be attributed to a small number of domain interactions. We conclude that many, but not all, domain interactions constitute reusable modules of molecular recognition. A substantial proportion of domain interactions are conserved between E. coli, S. cerevisiae and H. sapiens. These domains are related to essential cellular functions, suggesting that many domain interactions were already present in the last universal common ancestor.
    Conclusion: Our results support the concept of domain interactions as reusable, conserved building blocks of protein interactions, but also highlight the limitations currently imposed by the small number of available protein structures.

    Generating confidence intervals on biological networks

    Background: In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to account for these dependencies in the statistical analysis. To this end we require some form of Null model of the network: generally, rewired replicates of the network are generated which preserve only the degree (number of interactions) of each node. We show that this can fail to capture important features of network structure, and may result in unrealistic significance levels, when potentially confounding additional information is available.
    Methods: We present a new network resampling Null model which takes into account the degree sequence as well as available biological annotations. Using gene ontology information as an illustration we show how this information can be accounted for in the resampling approach, and the impact such information has on the assessment of statistical significance of correlations and motif-abundances in the Saccharomyces cerevisiae protein interaction network. An algorithm, GOcardShuffle, is introduced to allow for the efficient construction of an improved Null model for network data.
    Results: We use the protein interaction network of S. cerevisiae; correlations between the evolutionary rates and expression levels of interacting proteins and their statistical significance were assessed for Null models which condition on different aspects of the available data. The novel GOcardShuffle approach results in a Null model for annotated network data which appears to describe the properties of real biological networks better.
    Conclusion: An improved approach to the statistical analysis of biological network data, which conditions on the available biological information, leads to qualitatively different results compared to approaches which ignore such annotations. In particular, we demonstrate that the effects of the biological organization of the network can be sufficient to explain the observed similarity of interacting proteins.
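
The conventional Null model that this abstract describes as the baseline (rewired replicates that preserve only the degree of each node) can be sketched in Python as a sequence of double edge swaps. This is not GOcardShuffle, which additionally conditions on annotations and whose details the abstract does not give; the function names, the retry limit and the sample graph are assumptions for illustration.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Map each node to its number of interactions."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def rewire_preserving_degree(edges, n_swaps, seed=None):
    """Return a rewired copy of a simple undirected graph.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b),
    so every node keeps its degree. Swaps that would create a self-loop or a
    duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    if len(edges) < 2:
        return edges
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # assumed cap so the loop always terminates
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


if __name__ == "__main__":
    # Hypothetical toy network: an 8-node ring with two chords.
    toy = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
    replicate = rewire_preserving_degree(toy, n_swaps=20, seed=1)
    print(degree_sequence(replicate) == degree_sequence(toy))
```

A network statistic computed on many such replicates gives an empirical null distribution; the abstract's point is that conditioning on degree alone may be insufficient when annotations confound the statistic.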

    Network-based functional enrichment

    Background: Many methods have been developed to infer and reason about molecular interaction networks. These approaches often yield networks with hundreds or thousands of nodes and up to an order of magnitude more edges. It is often desirable to summarize the biological information in such networks. A very common approach is to use gene function enrichment analysis for this task. A major drawback of this method is that it ignores information about the edges in the network being analyzed, i.e., it treats the network simply as a set of genes. In this paper, we introduce a novel method for functional enrichment that explicitly takes network interactions into account.
    Results: Our approach naturally generalizes Fisher’s exact test, a gene set-based technique. Given a function of interest, we compute the subgraph of the network induced by genes annotated to this function. We use the sequence of sizes of the connected components of this sub-network to estimate its connectivity. We estimate the statistical significance of the connectivity empirically by a permutation test. We present three applications of our method: i) determine which functions are enriched in a given network, ii) given a network and an interesting sub-network of genes within that network, determine which functions are enriched in the sub-network, and iii) given two networks, determine the functions for which the connectivity improves when we merge the second network into the first. Through these applications, we show that our approach is a natural alternative to network clustering algorithms.
    Conclusions: We presented a novel approach to functional enrichment that takes into account the pairwise relationships among genes annotated by a particular function. Each of the three applications discovers highly relevant functions. We used our methods to study biological data from three different organisms. Our results demonstrate the wide applicability of our methods. Our algorithms are implemented in C++ and are freely available under the GNU General Public License at our supplementary website. Additionally, all our input data and results are available at http://bioinformatics.cs.vt.edu/~murali/supplements/2011-incob-nbe/.
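
The procedure outlined in this abstract (induce a subgraph on the genes annotated to a function, take the sizes of its connected components, and assess connectivity with a permutation test) can be sketched in Python as follows. The abstract does not specify the connectivity statistic or the permutation scheme, so both are assumptions here: connectivity is taken as the fraction of annotated gene pairs that lie in the same component, and each permutation draws a random gene set of the same size. All function names are hypothetical, and this is not the authors' C++ implementation.

```python
import random


def build_adjacency(edges, nodes=()):
    """Adjacency sets for an undirected graph; `nodes` adds isolated genes."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def component_sizes(adj, nodes):
    """Sizes (largest first) of the components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj.get(u, ()):
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def connectivity(sizes):
    """Assumed statistic: fraction of node pairs that share a component."""
    n = sum(sizes)
    if n < 2:
        return 0.0
    return sum(s * (s - 1) for s in sizes) / (n * (n - 1))


def permutation_p_value(adj, annotated, n_perm=1000, seed=None):
    """Empirical p-value: how often a random gene set is at least as connected."""
    rng = random.Random(seed)
    universe = list(adj)
    k = len(set(annotated))
    observed = connectivity(component_sizes(adj, annotated))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, k)
        if connectivity(component_sizes(adj, sample)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


if __name__ == "__main__":
    # Hypothetical toy network: a path 1-2-3, an edge 4-5, and an isolated gene 6.
    adj = build_adjacency([(1, 2), (2, 3), (4, 5)], nodes=[6])
    print(component_sizes(adj, {1, 2, 3, 4, 6}))
    print(permutation_p_value(adj, {1, 2, 3}, n_perm=200, seed=0))
```

A small p-value under these assumptions would suggest that the annotated genes are more tightly connected in the network than a random gene set of the same size.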

    Identifying protein complexes directly from high-throughput TAP data with Markov random fields

    Background: Predicting protein complexes from experimental data remains a challenge due to limited resolution and stochastic errors of high-throughput methods. Current algorithms to reconstruct the complexes typically rely on a two-step process. First, they construct an interaction graph from the data, predominantly using heuristics, and subsequently cluster its vertices to identify protein complexes.
    Results: We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes based on Markov random fields explicitly incorporates false negative and false positive errors and exhibits a high robustness to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets show favorable results, particularly for larger unfiltered data sets. Additional information on predictions, including the source code under the GNU Public License, can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes.
    Conclusion: We can identify complexes in the data obtained from high-throughput experiments without prior elimination of proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated using maximum likelihood without a reference data set. This is particularly important for protein complex studies in organisms that do not have an established reference frame of known protein complexes.

    Facile whole mitochondrial genome resequencing from nipple aspirate fluid using MitoChip v2.0

    Background: Mutations in the mitochondrial genome (mtgenome) have been associated with many disorders, including breast cancer. Nipple aspirate fluid (NAF) from symptomatic women could potentially serve as a minimally invasive sample for breast cancer screening by detecting somatic mutations in this biofluid. This study is aimed at 1) demonstrating the feasibility of NAF recovery from symptomatic women, 2) examining the feasibility of sequencing the entire mitochondrial genome from NAF samples, 3) cross-validating the Human mitochondrial resequencing array 2.0 (MCv2), and 4) assessing the somatic mtDNA mutation rate in benign breast diseases as a potential tool for monitoring early somatic mutations associated with breast cancer.
    Methods: NAF and blood were obtained from women with symptomatic benign breast conditions, and we successfully assessed the mutation load in the entire mitochondrial genome of 19 of these women. DNA extracts from NAF were sequenced using the mitochondrial resequencing array MCv2 and by capillary electrophoresis (CE) methods as a quality comparison. Sequencing was performed independently at two institutions and the results compared. The germline mtDNA sequence determined using DNA isolated from the patient's blood (control) was compared to the mutations present in cellular mtDNA recovered from the patient's NAF.
    Results: From the cohort of 28 women recruited for this study, NAF was successfully recovered from 23 participants (82%). Twenty-two (96%) of the women produced fluids from both breasts. Twenty NAF samples and corresponding blood were chosen for this study. Except for one NAF sample, the whole mtgenome was successfully amplified using a single primer pair, or three pairs of overlapping primers. Comparison of MCv2 data from the two institutions demonstrates 99.200% concordance. Moreover, MCv2 data was 99.999% identical to CE sequencing, indicating that MCv2 is a reliable method to rapidly sequence the entire mtgenome. Four NAF samples contained somatic mutations.
    Conclusion: We have demonstrated that NAF is a suitable material for mtDNA sequence analysis using the rapid and reliable MCv2. Somatic mtDNA mutations present in NAF of women with benign breast diseases could potentially be used as risk factors for progression to breast cancer, but this will require a much larger study with clinical follow-up.

    Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins

    Background: Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenetic tree. In a similar way it has been argued that biological networks also induce correlations among sets of interacting genes or their protein products.
    Results: We develop suitable statistical resampling schemes that can incorporate these two potential sources of correlation into a single inferential framework. To illustrate our approach we apply it to protein interaction data in yeast and investigate whether the phylogenetic trees of interacting proteins in a panel of yeast species are more similar than would be expected by chance.
    Conclusions: While we find only negligible evidence for such increased levels of similarities, our statistical approach allows us to resolve the previously reported contradictory results on the levels of co-evolution induced by protein-protein interactions. We conclude with a discussion as to how we may employ the statistical framework developed here in further functional and evolutionary analyses of biological networks and systems.