31 research outputs found

    Evolving proteins at Darwin's bicentenary

    A report of the Biochemical Society/Wellcome Trust meeting 'Protein Evolution - Sequences, Structures and Systems', Hinxton, UK, 26-27 January 2009

    SYSBIONS: nested sampling for systems biology.

    MOTIVATION: Model selection is a fundamental part of the scientific process in systems biology. Given a set of competing hypotheses, we routinely wish to choose the one that best explains the observed data. In the Bayesian framework, models are compared via Bayes factors (the ratio of evidences), where a model's evidence is the support given to the model by the data. A parallel interest is inferring the distribution of the parameters that define a model. Nested sampling is a method for the computation of a model's evidence and the generation of samples from the posterior parameter distribution. RESULTS: We present a C-based, GPU-accelerated implementation of nested sampling that is designed for biological applications. The algorithm follows a standard routine with optional extensions and additional features. We provide a number of methods for sampling from the prior subject to a likelihood constraint. AVAILABILITY AND IMPLEMENTATION: The software SYSBIONS is available from http://www.theosysbio.bio.ic.ac.uk/resources/sysbions/
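
The evidence computation described in this abstract can be illustrated with a minimal sketch of generic nested sampling. This is not the SYSBIONS implementation (which is C-based and GPU-accelerated); it is a toy in pure Python under stated assumptions: a one-dimensional standard Gaussian likelihood, a uniform prior on [-5, 5], and the hypothetical names `likelihood` and `nested_sampling`. For this toy problem the likelihood constraint L(theta) > L* is equivalent to |theta| < |theta*|, so sampling from the prior subject to the constraint can be done exactly. The analytic evidence is close to 0.1, which gives a reference value for the estimate.

```python
import math
import random


def likelihood(theta):
    """Standard Gaussian likelihood (an assumption made for this toy example)."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)


def nested_sampling(n_live=200, n_iter=1600, half_width=5.0, seed=0):
    """Estimate the evidence Z for a uniform prior on [-half_width, half_width]."""
    rng = random.Random(seed)
    # Draw the initial live points from the prior.
    live = [rng.uniform(-half_width, half_width) for _ in range(n_live)]
    evidence = 0.0
    x_prev = 1.0  # prior mass still enclosed by the likelihood contour
    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        worst = min(range(n_live), key=lambda k: likelihood(live[k]))
        l_star = likelihood(live[worst])
        # Deterministic approximation of the prior-mass shrinkage.
        x_curr = math.exp(-i / n_live)
        evidence += l_star * (x_prev - x_curr)
        x_prev = x_curr
        # Replace the worst point by a prior draw with L > l_star.
        # For this likelihood the constraint is simply |theta| < |theta*|.
        bound = abs(live[worst])
        live[worst] = rng.uniform(-bound, bound)
    # Add the contribution of the remaining live points.
    evidence += x_prev * sum(likelihood(t) for t in live) / n_live
    return evidence


if __name__ == "__main__":
    print("estimated evidence:", nested_sampling())
```

A Bayes factor between two models would then be the ratio of two such evidence estimates. In realistic problems the constrained prior cannot be sampled exactly, which is why the abstract mentions several alternative methods for that step.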

    A graph theoretical approach to data fusion.

    The rapid development of high throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. Therefore, it is increasingly being recognized that we can gain deeper understanding about underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or (for fast, approximate, and massive scale data fusion) can naturally switch to more heuristic modeling techniques. An advantage of the proposed approach is that each dataset can initially be modeled independently (in parallel), before applying a fast post-processing step to perform data integration. This allows us to incorporate new experimental data in an online fashion, without having to rerun all of the analysis. We first demonstrate the applicability of our tool on artificial data, and then on examples from the literature, which include yeast cell cycle, breast cancer and sporadic inclusion body myositis datasets

    Topological sensitivity analysis for systems biology.

    Mathematical models of natural systems are abstractions of much more complicated processes. Developing informative and realistic models of such systems typically involves suitable statistical inference methods, domain expertise, and a modicum of luck. Except for cases where physical principles provide sufficient guidance, it will also be generally possible to come up with a large number of potential models that are compatible with a given natural system and any finite amount of data generated from experiments on that system. Here we develop a computational framework to systematically evaluate potentially vast sets of candidate differential equation models in light of experimental and prior knowledge about biological systems. This topological sensitivity analysis enables us to evaluate quantitatively the dependence of model inferences and predictions on the assumed model structures. Failure to consider the impact of structural uncertainty introduces biases into the analysis and potentially gives rise to misleading conclusions

    Network motifs: structure does not determine function

    BACKGROUND: A number of publications have recently examined the occurrence and properties of the feed-forward motif in a variety of networks, including those that are of interest in genome biology, such as gene networks. The present work looks in some detail at the dynamics of the bi-fan motif, using systems of ordinary differential equations to model the populations of transcription factors, mRNA and protein, with the aim of extending our understanding of what appear to be important building blocks of gene network structure. RESULTS: We develop an ordinary differential equation model of the bi-fan motif and analyse variants of the motif corresponding to its behaviour under various conditions. In particular, we examine the effects of different steady and pulsed inputs to five variants of the bi-fan motif, based on evidence in the literature of bi-fan motifs found in Saccharomyces cerevisiae (commonly known as baker's yeast). Using this model, we characterize the dynamical behaviour of the bi-fan motif for a wide range of biologically plausible parameters and configurations. We find that there is no characteristic behaviour for the motif, and with the correct choice of parameters and of internal structure, very different, indeed even opposite behaviours may be obtained. CONCLUSION: Even with this relatively simple model, the bi-fan motif can exhibit a wide range of dynamical responses. This suggests that it is difficult to gain significant insights into biological function simply by considering the connection architecture of a gene network, or its decomposition into simple structural motifs. It is necessary to supplement such structural information by kinetic parameters, or dynamic time series experimental data, both of which are currently difficult to obtain
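
The claim that identical wiring can produce opposite behaviour can be illustrated with a deliberately simplified sketch. This is not the authors' model (theirs tracks transcription factors, mRNA and protein); it is a single-output toy in which two inputs regulate one target through Hill functions, and only the internal combination logic ("AND" versus "OR") differs. All names (`hill`, `gate`, `simulate`) and all parameter values are assumptions chosen for illustration.

```python
def hill(x, k=0.5, n=2):
    """Hill activation function; k and n are illustrative values."""
    return x**n / (k**n + x**n)


def gate(x1, x2, logic):
    """Combine two regulator inputs with AND-like or OR-like logic."""
    a, b = hill(x1), hill(x2)
    if logic == "AND":
        return a * b
    if logic == "OR":
        return 1.0 - (1.0 - a) * (1.0 - b)
    raise ValueError("logic must be 'AND' or 'OR'")


def simulate(logic, x1=1.0, x2=0.0, beta=1.0, alpha=1.0, dt=0.01, t_end=10.0):
    """Forward-Euler integration of dy/dt = beta * gate(x1, x2) - alpha * y."""
    y = 0.0
    for _ in range(int(round(t_end / dt))):
        y += dt * (beta * gate(x1, x2, logic) - alpha * y)
    return y


if __name__ == "__main__":
    # Same connection architecture and same steady inputs; only the logic differs.
    print("AND output:", simulate("AND"))
    print("OR output: ", simulate("OR"))
```

With one input switched on and the other off, the OR variant approaches a high steady state while the AND variant stays off, even though both share the same connection diagram. Kinetic parameters and internal logic, not wiring alone, determine the response in this toy.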

    Cellular population dynamics control the robustness of the stem cell niche.

    Within populations of cells, fate decisions are controlled by an indeterminate combination of cell-intrinsic and cell-extrinsic factors. In the case of stem cells, the stem cell niche is believed to maintain 'stemness' through communication and interactions between the stem cells and one or more other cell-types that contribute to the niche conditions. To investigate the robustness of cell fate decisions in the stem cell hierarchy and the role that the niche plays, we introduce simple mathematical models of stem and progenitor cells, their progeny and their interplay in the niche. These models capture the fundamental processes of proliferation and differentiation and allow us to consider alternative possibilities regarding how niche-mediated signalling feedback regulates the niche dynamics. Generalised stability analysis of these stem cell niche systems enables us to describe the stability properties of each model. We find that although the number of feasible states depends on the model, their probabilities of stability in general do not: stem cell-niche models are stable across a wide range of parameters. We demonstrate that niche-mediated feedback increases the number of stable steady states, and show how distinct cell states have distinct branching characteristics. The ecological feedback and interactions mediated by the stem cell niche thus lend (surprisingly) high levels of robustness to the stem and progenitor cell population dynamics. Furthermore, cell-cell interactions are sufficient for populations of stem cells and their progeny to achieve stability and maintain homeostasis. We show that the robustness of the niche - and hence of the stem cell pool in the niche - depends only weakly, if at all, on the complexity of the niche make-up: simple as well as complicated niche systems are capable of supporting robust and stable stem cell dynamics

    Statistical inference of the time-varying structure of gene-regulation networks

    BACKGROUND: Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models that stay constant over time to describe biological systems and their underlying molecular interactions. METHODS: To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). RESULTS: We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and efficiently exploits time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning. CONCLUSIONS: ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provides a natural starting point to learn and investigate their dynamics in greater detail.

    What the papers say: Text mining for genomics and systems biology

    Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources on reinventing the wheel simply because we can no longer maintain a reliable grasp on the published literature. Second, and perhaps more detrimental, judicious (or serendipitous) combination of knowledge from different scientific disciplines, which would require following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining -- the automated extraction of information from (electronically) published sources -- could potentially fulfil an important role -- but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect that the rate at which scientific results are published will decrease, text mining tools are now becoming essential in order to cope with, and derive maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward

    Model selection in systems biology depends on experimental design.

    Experimental design attempts to maximise the information available for modelling tasks. An optimal experiment allows the inferred models or parameters to be chosen with the highest expected degree of confidence. If the true system is faithfully reproduced by one of the models, the merit of this approach is clear - we simply wish to identify it and the true parameters with the most certainty. However, in the more realistic situation where all models are incorrect or incomplete, the interpretation of model selection outcomes and the role of experimental design need to be examined more carefully. Using a novel experimental design and model selection framework for stochastic state-space models, we perform high-throughput in-silico analyses on families of gene regulatory cascade models, to show that the selected model can depend on the experiment performed. We observe that experimental design thus makes confidence a criterion for model choice, but that this does not necessarily correlate with a model's predictive power or correctness. Finally, in the special case of linear ordinary differential equation (ODE) models, we explore how wrong a model has to be before it influences the conclusions of a model selection analysis

    Balancing the robustness and predictive performance of biomarkers.

    Recent studies have highlighted the importance of assessing the robustness of putative biomarkers identified from experimental data. This has given rise to the concept of stable biomarkers, which are ones that are consistently identified regardless of small perturbations to the data. Since stability is not by itself a useful objective, we present a number of strategies that combine assessments of stability and predictive performance in order to identify biomarkers that are both robust and diagnostically useful. Moreover, by wrapping these strategies around logistic regression classifiers regularized by the elastic net penalty, we are able to assess the effects of correlations between biomarkers upon their perceived stability. We use a synthetic example to illustrate the properties of our proposed strategies. In this example, we find that: (i) assessments of stability can help to reduce the number of false-positive biomarkers, although potentially at the cost of missing some true positives; (ii) combining assessments of stability with assessments of predictive performance can improve the true positive rate; and (iii) correlations between biomarkers can have adverse effects on their stability and hence must be carefully taken into account when undertaking biomarker discovery. We then apply our strategies in a proteomics context to identify a number of robust candidate biomarkers for the human disease HTLV1-associated myelopathy/tropical spastic paraparesis (HAM/TSP)