    Dynamic Bayesian networks in molecular plant science: inferring gene regulatory networks from multiple gene expression time series

    To understand the processes of growth and biomass production in plants, we ultimately need to elucidate the structure of the underlying regulatory networks at the molecular level. The advent of high-throughput postgenomic technologies has spurred substantial interest in reverse engineering these networks from data, and several techniques from machine learning and multivariate statistics have recently been proposed. The present article discusses the problem of inferring gene regulatory networks from gene expression time series, and we focus our exposition on the methodology of Bayesian networks. We describe dynamic Bayesian networks and explain their advantages over other statistical methods. We introduce a novel information sharing scheme, which allows us to infer gene regulatory networks from multiple sources of gene expression data more accurately. We illustrate and test this method on a set of synthetic data, using three different measures to quantify the network reconstruction accuracy. The main application of our method is related to the problem of circadian regulation in plants, where we aim to reconstruct the regulatory networks of nine circadian genes in Arabidopsis thaliana from four gene expression time series obtained under different experimental conditions

    Heterogeneous continuous dynamic Bayesian networks with flexible structure and inter-time segment information sharing

    Classical dynamic Bayesian networks (DBNs) are based on the homogeneous Markov assumption and cannot deal with heterogeneity and non-stationarity in temporal processes. Various approaches to relax the homogeneity assumption have recently been proposed. The present paper aims to address the shortcomings of three recent versions of heterogeneous DBNs along the following lines: (i) avoiding the need for data discretization, (ii) increasing the flexibility over a time-invariant network structure, (iii) avoiding over-flexibility and overfitting by introducing a regularization scheme based on inter-time segment information sharing. The improved method is evaluated on synthetic data and compared with alternative published methods on gene expression time series from Drosophila melanogaster.

    Inference in complex biological systems with Gaussian processes and parallel tempering

    Parameter inference in mathematical models of complex biological systems, expressed as coupled ordinary differential equations (ODEs), is a challenging problem. These models depend on kinetic parameters, which cannot all be measured and have to be ascertained in a different way. However, the computational costs associated with repeatedly solving the ODEs are often staggering, making many techniques impractical. Therefore, aimed at reducing this cost, new concepts using gradient matching have been proposed. This paper combines current adaptive gradient matching approaches, using Gaussian processes, with a parallel tempering scheme, in order to compare two different paradigms using the same nonlinear regression method. We use two ODE systems to assess our technique, showing an improvement over the recent method in Calderhead et al. (2008).

    ODE parameter inference using adaptive gradient matching with Gaussian processes

    Parameter inference in mechanistic models based on systems of coupled differential equations is a topical yet computationally challenging problem, due to the need to follow each parameter adaptation with a numerical integration of the differential equations. Techniques based on gradient matching, which aim to minimize the discrepancy between the slope of a data interpolant and the derivatives predicted from the differential equations, offer a computationally appealing shortcut to the inference problem. The present paper discusses a method based on nonparametric Bayesian statistics with Gaussian processes due to Calderhead et al. (2008), and shows how inference in this model can be substantially improved by consistently inferring all parameters from the joint distribution. We demonstrate the efficiency of our adaptive gradient matching technique on three benchmark systems, and perform a detailed comparison with the method in Calderhead et al. (2008) and the explicit ODE integration approach, both in terms of parameter inference accuracy and in terms of computational efficiency.
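The gradient matching idea described in this abstract can be illustrated with a minimal sketch. This is not the Gaussian process method of the paper: it assumes a hypothetical one-parameter toy model (exponential decay, `dx/dt = -theta * x`), uses finite differences of the data in place of a GP interpolant, and replaces Bayesian inference with a plain grid search over `theta`. All names (`ode_rhs`, `gradient_matching_estimate`, `theta_grid`) are illustrative assumptions.

```python
import numpy as np


def ode_rhs(x, theta):
    """Hypothetical toy model (an assumption): dx/dt = -theta * x."""
    return -theta * x


def gradient_matching_estimate(t, x, theta_grid):
    """Pick the theta whose ODE derivatives best match the data slope.

    The slope of the data is approximated by finite differences, which
    stand in here for the derivative of a smooth interpolant. No numerical
    integration of the ODE is needed at any point.
    """
    slope = np.gradient(x, t)
    # Sum of squared mismatches between data slope and ODE prediction.
    losses = [np.sum((slope - ode_rhs(x, th)) ** 2) for th in theta_grid]
    return theta_grid[int(np.argmin(losses))]


# Noise-free synthetic data generated with a true rate of 1.5.
t = np.linspace(0.0, 2.0, 201)
x = np.exp(-1.5 * t)

theta_grid = np.linspace(0.1, 3.0, 291)  # candidate values, step 0.01
theta_hat = gradient_matching_estimate(t, x, theta_grid)
print(f"estimated theta: {theta_hat:.2f}")
```

The point of the sketch is the cost structure: each candidate `theta` is scored with a single evaluation of the right-hand side on the observed states, rather than a full ODE solve. With noisy data the finite-difference slope becomes unreliable, which is where a smoothing interpolant such as a Gaussian process comes in.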

    Parameter inference in mechanistic models of cellular regulation and signalling pathways using gradient matching

    A challenging problem in systems biology is parameter inference in mechanistic models of signalling pathways. In the present article, we investigate an approach based on gradient matching and nonparametric Bayesian modelling with Gaussian processes. We evaluate the method on two biological systems, related to the regulation of PIF4/5 in Arabidopsis thaliana, and the JAK/STAT signal transduction pathway

    TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops

    Summary: TOPALi v2 simplifies and automates the use of several methods for the evolutionary analysis of multiple sequence alignments. Jobs are submitted from a Java graphical user interface as TOPALi web services to either run remotely on high-performance computing clusters or locally (with multiple cores supported). Methods available include model selection and phylogenetic tree estimation using the Bayesian inference and maximum likelihood (ML) approaches, in addition to recombination detection methods. The optimal substitution model can be selected for protein or nucleic acid (standard, or protein-coding using a codon position model) data using accurate statistical criteria derived from ML co-estimation of the tree and the substitution model. Phylogenetic software available includes PhyML, RAxML and MrBayes

    Reachability in Parametric Interval Markov Chains using Constraints

    Parametric Interval Markov Chains (pIMCs) are a specification formalism that extend Markov Chains (MCs) and Interval Markov Chains (IMCs) by taking into account imprecision in the transition probability values: transitions in pIMCs are labeled with parametric intervals of probabilities. In this work, we study the difference between pIMCs and other Markov Chain abstractions models and investigate the two usual semantics for IMCs: once-and-for-all and at-every-step. In particular, we prove that both semantics agree on the maximal/minimal reachability probabilities of a given IMC. We then investigate solutions to several parameter synthesis problems in the context of pIMCs -- consistency, qualitative reachability and quantitative reachability -- that rely on constraint encodings. Finally, we propose a prototype implementation of our constraint encodings with promising results

    Phylogenetic Detection of Recombination with a Bayesian Prior on the Distance between Trees

    Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition∶transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination