215 research outputs found

    Modelling transcriptional regulation with Gaussian processes

    A challenging problem in systems biology is the quantitative modelling of transcriptional regulation. Transcription factors (TFs), which are the key proteins at the centre of the regulatory processes, may be subject to post-translational modification, rendering them unobservable at the mRNA level, or they may be controlled outside of the subsystem being modelled. In both cases, a mechanistic model description of the regulatory system needs to be able to deal with latent activity profiles of the key regulators. A promising approach to deal with these difficulties is based on using Gaussian processes to define a prior distribution over the latent TF activity profiles. Inference is based on the principles of non-parametric Bayesian statistics, consistently inferring the posterior distribution of the unknown TF activities from the observed expression levels of potential target genes. The present work provides explicit solutions to the differential equations needed to model the data in this manner, as well as the derivatives needed for effective optimisation. The work further explores identifiability issues not fully shown in previous work and looks at how these can cause difficulties with inference. We subsequently look at how the method works on two different TFs, including how the model works with a more biologically realistic mechanistic model. Finally, we analyse the effect of more biologically realistic non-Gaussian noise on the biologically realistic model, showing how this can reduce the accuracy of the inference.

    Statistical Modelling of Cell Movement

    In this paper we demonstrate an application of the unscented Kalman filter in the context of cell movement, using a model defined in terms of stochastic differential equations (SDEs).

    Gradient matching methods for computational inference in mechanistic models for systems biology: a review and comparative analysis

    Parameter inference in mathematical models of biological pathways, expressed as coupled ordinary differential equations (ODEs), is a challenging problem in contemporary systems biology. Conventional methods involve repeatedly solving the ODEs by numerical integration, which is computationally onerous and does not scale up to complex systems. Aimed at reducing the computational costs, new concepts based on gradient matching have recently been proposed in the computational statistics and machine learning literature. In a preliminary smoothing step, the time series data are interpolated; then, in a second step, the parameters of the ODEs are optimised so as to minimise some metric measuring the difference between the slopes of the tangents to the interpolants and the time derivatives from the ODEs. In this way, the ODEs never have to be solved explicitly. This review provides a concise methodological overview of the current state-of-the-art methods for gradient matching in ODEs, followed by an empirical comparative evaluation based on a set of widely used and representative benchmark data.
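
The two-step idea described in this abstract can be illustrated with a minimal sketch. It is not taken from the review: the decay model dx/dt = -theta * x, the noise-free data, the use of central differences as a crude stand-in for the smoothing interpolant, and all names (`gradient_matching_decay`, `theta_hat`) are assumptions chosen to keep the example short.

```python
import math

def gradient_matching_decay(times, values):
    """Estimate theta in dx/dt = -theta * x without solving the ODE."""
    # Step 1 (smoothing stand-in): approximate the slope at each interior
    # time point by a central difference of the neighbouring observations.
    slopes = [
        (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        for i in range(1, len(times) - 1)
    ]
    states = values[1:-1]
    # Step 2 (matching): minimise sum_i (slope_i + theta * x_i)^2 over theta.
    # The ODE is linear in theta, so the least-squares minimiser is closed form.
    numerator = -sum(s * x for s, x in zip(slopes, states))
    denominator = sum(x * x for x in states)
    return numerator / denominator

# Hypothetical noise-free data generated from the true solution exp(-theta * t).
true_theta = 0.5
times = [0.1 * k for k in range(41)]
values = [math.exp(-true_theta * t) for t in times]

theta_hat = gradient_matching_decay(times, values)
print(f"estimated theta = {theta_hat:.4f} (true value {true_theta})")
```

With noisy data the finite differences would amplify the noise, which is why the methods under review use a proper smoother (for example splines or Gaussian processes) in the first step.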

    Inference in Nonlinear Systems with Unscented Kalman Filters

    An increasing number of scientific disciplines, most notably the life sciences and health care, have become more quantitative, describing complex systems with coupled nonlinear differential equations. While powerful algorithms for numerical simulations from these systems have been developed, statistical inference of the system parameters is still a challenging problem. A promising approach is based on the unscented Kalman filter (UKF), which has seen a variety of recent applications, from soft tissue mechanics to chemical kinetics. The present study investigates the dependence of the accuracy of parameter estimation on the initialisation. Based on three toy systems that capture typical features of real-world complex systems (limit cycles, chaotic attractors and intrinsic stochasticity), we carry out repeated simulations on a large range of independent data instantiations. Our study allows a quantification of the accuracy of inference, measured in terms of two alternative distance measures in function and parameter space, in dependence on the initial deviation from the ground truth.

    Network Reconstruction with Realistic Models

    We extend a recently proposed gradient-matching method for inferring interactions in complex systems described by differential equations in several respects: improved gradient inference; evaluation of the influence of the prior on kinetic parameters; comparative evaluation of two model selection paradigms, marginal likelihood versus DIC (divergence information criterion); comparative evaluation of different numerical procedures for computing the marginal likelihood; and extension of the methodology from protein phosphorylation to transcriptional regulation, based on a realistic simulation of the underlying molecular processes with Markov jump processes.

    Controversy in mechanistic modelling with Gaussian processes

    Parameter inference in mechanistic models based on non-affine differential equations is computationally onerous, and various faster alternatives based on gradient matching have been proposed. A particularly promising approach is based on nonparametric Bayesian modelling with Gaussian processes, which exploits the fact that a Gaussian process is closed under differentiation. However, two alternative paradigms have been proposed. The first paradigm, proposed at NIPS 2008 and AISTATS 2013, is based on a product of experts approach and a marginalization over the derivatives of the state variables. The second paradigm, proposed at ICML 2014, is based on a probabilistic generative model and a marginalization over the state variables. The claim has been made that this leads to better inference results. In the present article, we offer a new interpretation of the second paradigm, which highlights the underlying assumptions, approximations and limitations. In particular, we show that the second paradigm suffers from an intrinsic identifiability problem, which the first paradigm is not affected by.
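
The closure property mentioned in this abstract can be written out as a short worked statement. This is the standard textbook form, not notation taken from the article: the kernel $k$, the matrices $K$, $D$, $A$ and the zero prior mean are assumptions made for illustration, and $k$ is assumed to be twice differentiable.

```latex
% Prior on a state variable observed at times t_1, ..., t_n
x(t) \sim \mathcal{GP}\bigl(0,\; k(t, t')\bigr)

% Differentiation is linear, so the derivative is jointly Gaussian with x
\operatorname{cov}\bigl[\dot{x}(t),\, x(t')\bigr] = \frac{\partial k(t, t')}{\partial t},
\qquad
\operatorname{cov}\bigl[\dot{x}(t),\, \dot{x}(t')\bigr] = \frac{\partial^{2} k(t, t')}{\partial t\, \partial t'}

% With K_{ij} = k(t_i, t_j), D_{ij} = \partial k(t_i, t_j) / \partial t_i,
% and A_{ij} = \partial^2 k(t_i, t_j) / \partial t_i \partial t_j:
\dot{\mathbf{x}} \mid \mathbf{x} \sim
\mathcal{N}\bigl(D K^{-1} \mathbf{x},\; A - D K^{-1} D^{\top}\bigr)
```

The conditional distribution of the derivatives given the states is what gradient matching compares with the right-hand side of the differential equations.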

    Inference in Complex Systems Using Multi-Phase MCMC Sampling With Gradient Matching Burn-in

    We propose a novel method for parameter inference that builds on the current research in gradient matching surrogate likelihood spaces. Adopting a three-phase technique, we demonstrate that it is possible to obtain parameter estimates of limited bias whilst still adopting the paradigm of the computationally cheap surrogate approximation.

    Detection of recombination in DNA multiple alignments with hidden Markov models

    Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.

    Targeting Bayes factors with direct-path non-equilibrium thermodynamic integration

    Thermodynamic integration (TI) for computing marginal likelihoods is based on an inverse annealing path from the prior to the posterior distribution. In many cases, the resulting estimator suffers from high variability, which particularly stems from the prior regime. When comparing complex models with differences in a comparatively small number of parameters, intrinsic errors from sampling fluctuations may outweigh the differences in the log marginal likelihood estimates. In the present article, we propose a thermodynamic integration scheme that directly targets the log Bayes factor. The method is based on a modified annealing path between the posterior distributions of the two models compared, which systematically avoids the high variance prior regime. We combine this scheme with the concept of non-equilibrium TI to minimise discretisation errors from numerical integration. Results obtained on Bayesian regression models applied to standard benchmark data, and a complex hierarchical model applied to biopathway inference, demonstrate a significant reduction in estimator variance over state-of-the-art TI methods.
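
As a minimal sketch of the standard TI path from prior to posterior mentioned in the first sentence of this abstract (not the direct-path scheme the article proposes), the example below integrates the expected log-likelihood over inverse temperatures. The conjugate toy model, the uniform temperature ladder and all names are assumptions; the expectations are available analytically here, whereas in practice they would be estimated by MCMC at each temperature.

```python
import math

def expected_log_lik(t, y):
    """E[log L] under the power posterior (prior times L**t), analytic here."""
    # Toy model (an assumption): theta ~ N(0, 1), y | theta ~ N(theta, 1).
    # The power posterior is Gaussian with mean t*y/(1+t) and variance 1/(1+t).
    mean = t * y / (1.0 + t)
    var = 1.0 / (1.0 + t)
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * ((y - mean) ** 2 + var)

def log_marginal_ti(y, n_steps=200):
    """Trapezoidal rule over inverse temperatures t from 0 (prior) to 1 (posterior)."""
    ts = [k / n_steps for k in range(n_steps + 1)]
    fs = [expected_log_lik(t, y) for t in ts]
    return sum(
        0.5 * (fs[k] + fs[k + 1]) * (ts[k + 1] - ts[k]) for k in range(n_steps)
    )

y_obs = 1.5  # hypothetical single observation
ti_estimate = log_marginal_ti(y_obs)
# Exact log marginal likelihood for comparison: marginally, y ~ N(0, 2).
exact = -0.5 * math.log(2.0 * math.pi * 2.0) - y_obs ** 2 / 4.0
print(f"TI estimate: {ti_estimate:.6f}, exact: {exact:.6f}")
```

In this toy model the integrand changes fastest near t = 0, the prior end of the path, which is the regime the abstract identifies as the main source of estimator variability.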

    Multiphase MCMC sampling for parameter inference in nonlinear ordinary differential equations

    Traditionally, ODE parameter inference relies on solving the system of ODEs and assessing the fit of the estimated signal to the observations. However, nonlinear ODEs often do not permit closed form solutions. Using numerical methods to solve the equations results in prohibitive computational costs, particularly when one adopts a Bayesian approach in sampling parameters from a posterior distribution. With the introduction of gradient matching, we can abandon the need to numerically solve the system of equations. Inherent in these efficient procedures is an introduction of bias to the learning problem, as we no longer sample based on the exact likelihood function. This paper presents a multiphase MCMC approach that attempts to close the gap between efficiency and accuracy. By sampling using a surrogate likelihood, we accelerate convergence to the stationary distribution before sampling using the exact likelihood. We demonstrate that this method combines the efficiency of gradient matching and the accuracy of the exact likelihood scheme.