71 research outputs found

    Batch solution of small PDEs with the OPS DSL

    In this paper we discuss the challenges and optimisation opportunities that arise when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, and show how support can be added for solving multiple systems, and how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS allow data structure transformations, as well as execution schedule transformations, to be applied automatically to deliver high performance on a variety of hardware platforms. We evaluate our work on an industrially representative finance simulation on Intel CPUs, as well as NVIDIA GPUs.

    The pseudo-mitochondrial genome influences mistakes in heteroplasmy interpretation

    BACKGROUND: Nuclear mitochondrial pseudogenes (numts) are a potential source of contamination during mitochondrial DNA PCR amplification. This possibility warrants careful experimental design and cautious interpretation of heteroplasmic results. RESULTS: Here we report the cloning and sequencing of numts loci, amplified from human tissue and rho-zero (ρ(0)) cells (control) with primers known to amplify the mitochondrial genome. This paper is the first to fully sequence 46 paralogous nuclear DNA fragments that represent the entire mitochondrial genome. This is a surprisingly small number, due primarily to the primer sets used in this study, because prior BLAST searches have suggested that nuclear DNA harbors between 400 and 1,500 paralogous mitochondrial DNA fragments. Our results indicate that multiple numts were amplified simultaneously with the mitochondrial genome and increased the load of pseudogene signal in PCR reactions. Further, the entire mitochondrial genome was represented by multiple copies of paralogous nuclear sequences. CONCLUSION: These findings suggest that mitochondrial genome disease-associated biomarkers must be rigorously authenticated to preclude any affiliation with paralogous nuclear pseudogenes. Importantly, the common perception that mitochondrial template "swamps" numts loci, precluding detectable amplification, depends on the region of the mitochondrial genome targeted by the PCR reaction and the number of pseudogene loci that may co-amplify. Cloning and relevant sequencing data will facilitate the correct interpretation. This is the first complete, wet-lab characterization of numts that represent the entire mitochondrial genome.

    Improved Network Performance via Antagonism: From Synthetic Rescues to Multi-drug Combinations

    Recent research shows that a faulty or sub-optimally operating metabolic network can often be rescued by the targeted removal of enzyme-coding genes, the exact opposite of what traditional gene therapy would suggest. Predictions go as far as to assert that certain gene knockouts can restore the growth of otherwise nonviable gene-deficient cells. Many questions follow from this discovery: What are the underlying mechanisms? How generalizable is this effect? What are the potential applications? Here, I will approach these questions from the perspective of compensatory perturbations on networks. Relations will be drawn between such synthetic rescues and naturally occurring cascades of reaction inactivation, as well as their analogues in physical and other biological networks. In particular, I will discuss how rescue interactions can lead to the rational design of antagonistic drug combinations that select against resistance and how they can illuminate medical research on cancer, antibiotics, and metabolic diseases. Comment: Online Open "Problems and Paradigms" article.

    Generating confidence intervals on biological networks

    BACKGROUND: In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes, and it will in general be necessary to account for these dependencies in the statistical analysis. To this end we require some form of Null model of the network: generally, rewired replicates of the network are generated which preserve only the degree (number of interactions) of each node. We show that this can fail to capture important features of network structure, and may result in unrealistic significance levels, when potentially confounding additional information is available. METHODS: We present a new network resampling Null model which takes into account the degree sequence as well as available biological annotations. Using gene ontology information as an illustration, we show how this information can be accounted for in the resampling approach, and the impact such information has on the assessment of statistical significance of correlations and motif abundances in the Saccharomyces cerevisiae protein interaction network. An algorithm, GOcardShuffle, is introduced to allow for the efficient construction of an improved Null model for network data. RESULTS: We use the protein interaction network of S. cerevisiae; correlations between the evolutionary rates and expression levels of interacting proteins and their statistical significance were assessed for Null models which condition on different aspects of the available data. The novel GOcardShuffle approach results in a Null model for annotated network data which appears to describe the properties of real biological networks better. CONCLUSION: An improved approach to the statistical analysis of biological network data, which conditions on the available biological information, leads to qualitatively different results compared to approaches which ignore such annotations. In particular, we demonstrate that the effects of the biological organization of the network can be sufficient to explain the observed similarity of interacting proteins.
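
    The degree-preserving rewiring that the abstract above describes as the usual Null model can be sketched as follows. This is a minimal illustration of the standard double-edge-swap procedure, assuming an undirected simple graph given as a list of node pairs; it is not the GOcardShuffle algorithm, and the function names and parameters are hypothetical.

```python
import random
from collections import Counter


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Return a rewired copy of an undirected simple graph.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and
    (c, b), so every node keeps its original degree. Swaps that would
    create a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against graphs with no valid swap
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible pairings
            c, d = d, c
        if len({a, b, c, d}) < 4:  # shared node: swap would be degenerate
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:  # would duplicate an edge
            continue
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def node_degrees(edges):
    """Count how many edges touch each node."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


if __name__ == "__main__":
    toy = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
    replicate = degree_preserving_rewire(toy, n_swaps=10, seed=42)
    print(node_degrees(replicate) == node_degrees(toy))
```

    A Null distribution for a network statistic would then be built by computing that statistic on many such replicates; conditioning additionally on annotations, as the abstract proposes, would restrict which swaps are allowed.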

    Identifying protein complexes directly from high-throughput TAP data with Markov random fields

    BACKGROUND: Predicting protein complexes from experimental data remains a challenge due to limited resolution and stochastic errors of high-throughput methods. Current algorithms to reconstruct the complexes typically rely on a two-step process. First, they construct an interaction graph from the data, predominantly using heuristics, and subsequently cluster its vertices to identify protein complexes. RESULTS: We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes based on Markov random fields explicitly incorporates false negative and false positive errors and exhibits a high robustness to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets show favorable results, particularly for larger unfiltered data sets. Additional information on predictions, including the source code under the GNU Public License, can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes. CONCLUSION: We can identify complexes in the data obtained from high-throughput experiments without prior elimination of proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated using maximum likelihood without a reference data set. This is particularly important for protein complex studies in organisms that do not have an established reference frame of known protein complexes.

    New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size

    BACKGROUND: As protein interactions mediate most cellular mechanisms, protein-protein interaction networks are essential in the study of cellular processes. Consequently, several large-scale interactome mapping projects have been undertaken, and protein-protein interactions are being distilled into databases through literature curation; yet protein-protein interaction data are still far from comprehensive, even in the model organism Saccharomyces cerevisiae. Estimating the interactome size is important for evaluating the completeness of current datasets, in order to measure the remaining efforts that are required. RESULTS: We examined the yeast interactome from a new perspective, by taking into account how thoroughly proteins have been studied. We discovered that the set of literature-curated protein-protein interactions is qualitatively different when restricted to proteins that have received extensive attention from the scientific community. In particular, these interactions are less often supported by yeast two-hybrid, and more often by more complex experiments such as biochemical activity assays. Our analysis showed that high-throughput and literature-curated interactome datasets are more correlated than commonly assumed, but that this bias can be corrected for by focusing on well-studied proteins. We thus propose a simple and reliable method to estimate the size of an interactome, combining literature-curated data involving well-studied proteins with high-throughput data. It yields an estimate of at least 37,600 direct physical protein-protein interactions in S. cerevisiae. CONCLUSIONS: Our method leads to higher and more accurate estimates of the interactome size, as it accounts for interactions that are genuine yet difficult to detect with commonly used experimental assays. This shows that we are even further from completing the yeast interactome map than previously expected.

    A genome-wide screen for essential yeast genes that affect telomere length maintenance

    Telomeres are structures composed of repetitive DNA and proteins that protect the chromosomal ends in eukaryotic cells from fusion or degradation, thus contributing to genomic stability. Although telomere length varies between species, in all organisms studied telomere length appears to be controlled by a dynamic equilibrium between elongating mechanisms (mainly addition of repeats by the enzyme telomerase) and nucleases that shorten the telomeric sequences. Two previous studies have analyzed a collection of yeast deletion strains (deleted for nonessential genes) and found over 270 genes that affect telomere length (Telomere Length Maintenance or TLM genes). Here we complete the list of TLM genes by analyzing a collection of strains carrying hypomorphic alleles of most essential genes (DAmP collection). We identify 87 essential genes that affect telomere length in yeast. These genes interact with the nonessential TLM genes in a significant manner, and provide new insights into the mechanisms involved in telomere length maintenance. The newly identified genes span a variety of cellular processes, including protein degradation, pre-mRNA splicing and DNA replication.

    Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins

    BACKGROUND: Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenetic tree. In a similar way it has been argued that biological networks also induce correlations among sets of interacting genes or their protein products. RESULTS: We develop suitable statistical resampling schemes that can incorporate these two potential sources of correlation into a single inferential framework. To illustrate our approach we apply it to protein interaction data in yeast and investigate whether the phylogenetic trees of interacting proteins in a panel of yeast species are more similar than would be expected by chance. CONCLUSIONS: While we find only negligible evidence for such increased levels of similarities, our statistical approach allows us to resolve the previously reported contradictory results on the levels of co-evolution induced by protein-protein interactions. We conclude with a discussion as to how we may employ the statistical framework developed here in further functional and evolutionary analyses of biological networks and systems.

    DroID: the Drosophila Interactions Database, a comprehensive resource for annotated gene and protein interactions

    BACKGROUND: Charting the interactions among genes and among their protein products is essential for understanding biological systems. A flood of interaction data is emerging from high throughput technologies, computational approaches, and literature mining methods. Quick and efficient access to this data has become a critical issue for biologists. Several excellent multi-organism databases for gene and protein interactions are available, yet most of these have understandable difficulty maintaining comprehensive information for any one organism. No single database, for example, includes all available interactions, integrated gene expression data, and comprehensive and searchable gene information for the important model organism, Drosophila melanogaster. DESCRIPTION: DroID, the Drosophila Interactions Database, is a comprehensive interactions database designed specifically for Drosophila. DroID houses published physical protein interactions, genetic interactions, and computationally predicted interactions, including interologs based on data for other model organisms and humans. All interactions are annotated with original experimental data and source information. DroID can be searched and filtered based on interaction information or a comprehensive set of gene attributes from Flybase. DroID also contains gene expression and expression correlation data that can be searched and used to filter datasets, for example, to focus a study on sub-networks of co-expressed genes. To address the inherent noise in interaction data, DroID employs an updatable confidence scoring system that assigns a score to each physical interaction based on the likelihood that it represents a biologically significant link. CONCLUSION: DroID is the most comprehensive interactions database available for Drosophila. To facilitate downstream analyses, interactions are annotated with original experimental information, gene expression data, and confidence scores. All data in DroID are freely available and can be searched, explored, and downloaded through three different interfaces, including a text based web site, a Java applet with dynamic graphing capabilities (IM Browser), and a Cytoscape plug-in. DroID is available at http://www.droidb.org.

    Inferring the role of transcription factors in regulatory networks

    BACKGROUND: Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well-studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. RESULTS: We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions), by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions. CONCLUSION: Our approach requires neither accurate expression levels nor time series. Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments is enough to determine a significant portion of regulatory effects. This is a key practical asset compared to statistical methods for network reconstruction. We demonstrate that our approach is able to provide accurate predictions, even when the network is incomplete and the data is noisy.