Modelling transcriptional regulation with Gaussian processes
A challenging problem in systems biology is the quantitative modelling of transcriptional regulation. Transcription factors (TFs), the key proteins at the centre of the regulatory processes, may be subject to post-translational modification, rendering them unobservable at the mRNA level, or they may be controlled outside of the subsystem being modelled. In both cases, a mechanistic model of the regulatory system needs to be able to deal with latent activity profiles of the key regulators. A promising approach to these difficulties is to use Gaussian processes to define a prior distribution over the latent TF activity profiles. Inference follows the principles of non-parametric Bayesian statistics, consistently inferring the posterior distribution of the unknown TF activities from the observed expression levels of potential target genes. The present work provides explicit solutions to the differential equations needed to model the data in this manner, as well as the derivatives needed for effective optimisation. It further explores identifiability issues not fully characterised in previous work and shows how these can cause difficulties for inference. We then apply the method to two different TFs, including a more biologically realistic mechanistic model. Finally, we analyse the effect of more biologically realistic non-Gaussian noise on this model, showing how it can reduce the accuracy of the inference
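The core modelling idea above — a Gaussian process prior over the unobserved TF activity profile — can be illustrated with a minimal sketch. This is not the paper's implementation; the squared-exponential kernel, its hyperparameters, and the time grid are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(t1, t2, variance=1.0, lengthscale=0.5):
    """Squared-exponential covariance between two sets of time points."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Time grid on which the latent TF activity f(t) is modelled.
t = np.linspace(0.0, 1.0, 50)
K = rbf_kernel(t, t)

# Draw sample paths from the GP prior f ~ N(0, K); a small jitter term
# keeps the covariance matrix numerically positive definite.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(t)), K + 1e-8 * np.eye(len(t)), size=3)
```

In the full model, this prior is combined with a likelihood linking f(t) to observed target-gene expression through the differential equations, and Bayes' rule yields a posterior over the latent activity.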
Statistical Modelling of Cell Movement
In this paper we demonstrate an application of the unscented Kalman filter in the context of cell movement, using a model defined in terms of stochastic differential equations (SDEs)
Gradient matching methods for computational inference in mechanistic models for systems biology: a review and comparative analysis
Parameter inference in mathematical models of biological pathways, expressed as coupled ordinary differential equations (ODEs), is a challenging problem in contemporary systems biology. Conventional methods involve repeatedly solving the ODEs by numerical integration, which is computationally onerous and does not scale up to complex systems. Aimed at reducing the computational costs, new concepts based on gradient matching have recently been proposed in the computational statistics and machine learning literature. In a preliminary smoothing step, the time series data are interpolated; then, in a second step, the parameters of the ODEs are optimised so as to minimise some metric measuring the difference between the slopes of the tangents to the interpolants, and the time derivatives from the ODEs. In this way, the ODEs never have to be solved explicitly. This review provides a concise methodological overview of the current state-of-the-art methods for gradient matching in ODEs, followed by an empirical comparative evaluation based on a set of widely used and representative benchmark data
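The two-step procedure described above can be sketched on a hypothetical one-dimensional example, dx/dt = -θx, where the smoothing step uses a polynomial interpolant and the matching step minimises the squared discrepancy between the interpolant's slopes and the ODE's time derivatives. The toy ODE, the polynomial degree, and the noise-free data are all illustrative assumptions:

```python
import numpy as np

# Hypothetical example: recover the decay rate theta of dx/dt = -theta * x
# by gradient matching, without ever integrating the ODE numerically.
theta_true = 2.0
t = np.linspace(0.0, 2.0, 40)
x_obs = np.exp(-theta_true * t)  # noise-free observations for illustration

# Step 1 (smoothing): fit an interpolant to the time series data.
coeffs = np.polyfit(t, x_obs, deg=6)
x_smooth = np.polyval(coeffs, t)
dx_smooth = np.polyval(np.polyder(coeffs), t)  # slopes of the interpolant

# Step 2 (matching): choose theta to minimise ||dx_smooth - f(x_smooth; theta)||^2,
# where f(x; theta) = -theta * x. The least-squares problem is linear in theta.
theta_hat = -np.sum(dx_smooth * x_smooth) / np.sum(x_smooth ** 2)
```

The ODE solver never appears: the parameter estimate comes entirely from matching slopes, which is the source of the computational savings the review discusses.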
Inference in Nonlinear Systems with Unscented Kalman Filters
An increasing number of scientific disciplines, most notably the life sciences and health care, have become more quantitative, describing complex systems with coupled nonlinear differential equations. While powerful algorithms for numerical simulations from these systems have been developed, statistical inference of the system parameters is still a challenging problem. A promising approach is based on the unscented Kalman filter (UKF), which has seen a variety of recent applications, from soft tissue mechanics to chemical kinetics. The present study investigates how the accuracy of parameter estimation depends on the initialisation. Using three toy systems that capture typical features of real-world complex systems (limit cycles, chaotic attractors and intrinsic stochasticity), we carry out repeated simulations on a large range of independent data instantiations. Our study allows a quantification of the accuracy of inference, measured in terms of two alternative distance measures in function and parameter space, as a function of the initial deviation from the ground truth
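The building block of the UKF is the unscented transform, which propagates a Gaussian through a nonlinearity via deterministically chosen sigma points. A minimal sketch follows; the scaling parameter kappa and the linear-map sanity check are illustrative assumptions, not the study's systems:

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate a Gaussian N(mean, cov) through a nonlinearity f using
    the standard 2n+1 sigma points of the unscented Kalman filter."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)
    sigma = np.vstack([mean, mean + L.T, mean - L.T])   # (2n+1, n) sigma points
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))   # weights sum to one
    w[0] = kappa / (n + kappa)
    y = np.array([f(s) for s in sigma])
    mean_y = w @ y
    cov_y = (w[:, None] * (y - mean_y)).T @ (y - mean_y)
    return mean_y, cov_y

# For a linear map the transform is exact, which gives a quick sanity check.
A = np.array([[2.0, 0.0], [0.5, 3.0]])
m_y, C_y = unscented_transform(np.array([1.0, 2.0]), np.eye(2), lambda x: A @ x)
```

In a full UKF, this transform is applied alternately to the state transition and observation functions, with parameters typically appended to the state vector for joint estimation, which is where the sensitivity to initialisation studied above enters.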
Addressing the shortcomings of three recent Bayesian methods for detecting interspecific recombination in DNA sequence alignments
We address a potential shortcoming of three probabilistic models for detecting interspecific recombination in DNA sequence alignments: the multiple change-point model (MCP) of Suchard et al. (2003), the dual multiple change-point model (DMCP) of Minin et al. (2005), and the phylogenetic factorial hidden Markov model (PFHMM) of Husmeier (2005). These models are based on the Bayesian paradigm, which requires the solution of an integral over the space of branch lengths. To render this integration analytically tractable, all three models make the same assumption that the vectors of branch lengths of the phylogenetic tree are independent among sites. While this approximation reduces the computational complexity considerably, we show that it leads to the systematic prediction of spurious topology changes in the Felsenstein zone, that is, the area in the branch lengths configuration space where maximum parsimony consistently infers the wrong topology due to long-branch attraction. We apply two Bayesian hypothesis tests, based on an inter- and an intra-model approach to estimating the marginal likelihood. We then propose a revised model that addresses these shortcomings, and compare it with the aforementioned models on a set of synthetic DNA sequence alignments systematically generated around the Felsenstein zone
Network Reconstruction with Realistic Models
We extend a recently proposed gradient-matching method for inferring interactions in complex systems described by differential equations in several respects: improved gradient inference; evaluation of the influence of the prior on kinetic parameters; comparative evaluation of two model selection paradigms, marginal likelihood versus DIC (deviance information criterion); comparative evaluation of different numerical procedures for computing the marginal likelihood; and extension of the methodology from protein phosphorylation to transcriptional regulation, based on a realistic simulation of the underlying molecular processes with Markov jump processes
Inference in Complex Systems Using Multi-Phase MCMC Sampling With Gradient Matching Burn-in
We propose a novel method for parameter inference that builds on current research in gradient-matching surrogate likelihood spaces. Adopting a three-phase technique, we demonstrate that it is possible to obtain parameter estimates with limited bias whilst still adopting the paradigm of the computationally cheap surrogate approximation
Detection of recombination in DNA multiple alignments with hidden Markov models
Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected
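The likelihood computation underlying such an HMM can be sketched with a scaled forward algorithm, where hidden states are tree topologies and a single global switch probability governs the transitions. The function name, the uniform-switching transition matrix, and the placeholder emission values are illustrative assumptions, not the paper's phylogenetic likelihoods:

```python
import numpy as np

def forward_loglik(emission, nu):
    """Scaled forward algorithm for an HMM over alignment sites.
    emission[t, s] is the likelihood of alignment column t under tree
    topology s; nu is the global probability of a recombination-induced
    topology switch between adjacent sites."""
    n_sites, n_states = emission.shape
    # Transition matrix: stay with probability 1 - nu, otherwise switch
    # uniformly to one of the other topologies.
    A = np.full((n_states, n_states), nu / (n_states - 1))
    np.fill_diagonal(A, 1.0 - nu)
    alpha = emission[0] / n_states          # uniform initial state distribution
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c
    for s in range(1, n_sites):
        alpha = emission[s] * (alpha @ A)
        c = alpha.sum()                     # rescaling avoids numerical underflow
        loglik += np.log(c)
        alpha = alpha / c
    return loglik
```

In the EM scheme described above, this forward pass (together with a backward pass) supplies the expected state occupancies and transitions needed to re-estimate the branch lengths and the recombination probability.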
Controversy in mechanistic modelling with Gaussian processes
Parameter inference in mechanistic models based on non-affine differential equations is computationally onerous, and various faster alternatives based on gradient matching have been proposed. A particularly promising approach is based on nonparametric Bayesian modelling with Gaussian processes, which exploits the fact that a Gaussian process is closed under differentiation. However, two alternative paradigms have been proposed. The first paradigm, proposed at NIPS 2008 and AISTATS 2013, is based on a product of experts approach and a marginalization over the derivatives of the state variables. The second paradigm, proposed at ICML 2014, is based on a probabilistic generative model and a marginalization over the state variables, and the claim has been made that this leads to better inference results. In the present article, we offer a new interpretation of the second paradigm, which highlights the underlying assumptions, approximations and limitations. In particular, we show that the second paradigm suffers from an intrinsic identifiability problem which does not affect the first paradigm
Targeting Bayes factors with direct-path non-equilibrium thermodynamic integration
Thermodynamic integration (TI) for computing marginal likelihoods is based on an inverse annealing path from the prior to the posterior distribution. In many cases, the resulting estimator suffers from high variability, which particularly stems from the prior regime. When comparing complex models with differences in a comparatively small number of parameters, intrinsic errors from sampling fluctuations may outweigh the differences in the log marginal likelihood estimates. In the present article, we propose a thermodynamic integration scheme that directly targets the log Bayes factor. The method is based on a modified annealing path between the posterior distributions of the two models compared, which systematically avoids the high variance prior regime. We combine this scheme with the concept of non-equilibrium TI to minimise discretisation errors from numerical integration. Results obtained on Bayesian regression models applied to standard benchmark data, and a complex hierarchical model applied to biopathway inference, demonstrate a significant reduction in estimator variance over state-of-the-art TI methods
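The baseline that the proposed scheme improves on — standard equilibrium TI along the prior-to-posterior path — can be sketched on a hypothetical conjugate Gaussian toy model where the marginal likelihood is known in closed form. The model, the temperature grid, and the exact tempered-posterior sampling are illustrative assumptions; in realistic settings each tempered distribution would be explored by MCMC:

```python
import numpy as np

# Toy model with a known answer: prior theta ~ N(0, 1) and a single
# observation y ~ N(theta, 1), so the marginal likelihood is N(y; 0, 2).
y = 1.0
true_log_Z = -0.5 * np.log(4.0 * np.pi) - y ** 2 / 4.0

# Equilibrium TI: log Z = integral over beta in [0, 1] of
# E_{p_beta}[log p(y | theta)], where p_beta is prior * likelihood^beta.
rng = np.random.default_rng(1)
betas = np.linspace(0.0, 1.0, 21)
ell = []
for b in betas:
    # The tempered posterior is Gaussian here, so it can be sampled exactly.
    prec = 1.0 + b
    theta = rng.normal(b * y / prec, np.sqrt(1.0 / prec), size=50_000)
    ell.append(np.mean(-0.5 * np.log(2.0 * np.pi) - 0.5 * (y - theta) ** 2))

ell = np.array(ell)
log_Z_ti = np.sum(np.diff(betas) * (ell[1:] + ell[:-1]) / 2.0)  # trapezoidal rule
```

The high-variance contributions come from the low-beta (near-prior) end of the path; the article's direct-path scheme targets the log Bayes factor by annealing between the two posteriors instead, so that regime is never visited.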