Techniques for automated parameter estimation in computational models of probabilistic systems
The main contribution of this dissertation is the design of two new algorithms for automatically synthesizing values of numerical parameters of computational models of complex stochastic systems such that the resultant model meets user-specified behavioral specifications. These algorithms are designed to operate on probabilistic systems – systems that, in general, behave differently under identical conditions. The algorithms combine formal verification and mathematical optimization to explore a model's parameter space. The problem of determining whether a model instantiated with a given set of parameter values satisfies the desired specification is first defined using formal verification terminology, and then reformulated in terms of statistical hypothesis testing. Parameter space exploration involves determining the outcome of the hypothesis testing query for each parameter point and is guided using simulated annealing. The first algorithm uses the sequential probability ratio test (SPRT) to solve the hypothesis testing problems, whereas the second uses an approach based on Bayesian statistical model checking (BSMC). The SPRT-based parameter synthesis algorithm was used to validate that a given model of glucose-insulin metabolism can represent diabetic behavior, by synthesizing values of three parameters that ensure that the glucose-insulin subsystem spends at least 20 minutes in a diabetic scenario. The BSMC-based algorithm was used to discover the values of parameters in a physiological model of the acute inflammatory response that guarantee a set of desired clinical outcomes. These two applications demonstrate how our algorithms use formal verification, statistical hypothesis testing and mathematical optimization to automatically synthesize parameters of complex probabilistic models in order to meet user-specified behavioral properties.
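To make the combination concrete, here is a minimal sketch in Python of Wald's SPRT used as the satisfaction oracle inside a simulated-annealing search. It is an illustration, not the dissertation's implementation: `simulate`, `perturb`, and `score` are hypothetical placeholders for a stochastic trace simulator, a parameter-proposal move, and a heuristic distance-to-satisfaction, and all thresholds are assumed.

```python
import math
import random

def sprt_satisfied(simulate, theta, p0=0.9, delta=0.05, alpha=0.01, beta=0.01):
    """Wald's SPRT: decide whether P(trace |= spec) >= p0 + delta (True)
    or <= p0 - delta (False) for a model instantiated with parameters theta.
    simulate(theta) runs one stochastic trace, returning True iff it meets
    the specification."""
    p_hi, p_lo = p0 + delta, p0 - delta
    accept = math.log((1 - beta) / alpha)   # enough evidence: satisfied
    reject = math.log(beta / (1 - alpha))   # enough evidence: violated
    llr = 0.0
    while reject < llr < accept:
        if simulate(theta):
            llr += math.log(p_hi / p_lo)
        else:
            llr += math.log((1 - p_hi) / (1 - p_lo))
    return llr >= accept

def synthesize(simulate, theta0, perturb, score, steps=200, t0=1.0):
    """Simulated-annealing exploration of the parameter space, querying the
    SPRT oracle at each visited point; score(theta) ranks non-satisfying
    points by a heuristic distance to satisfaction."""
    theta, best = theta0, score(theta0)
    for k in range(steps):
        if sprt_satisfied(simulate, theta):
            return theta                    # a satisfying parameter point
        cand = perturb(theta)
        s = score(cand)
        temp = t0 / (1 + k)                 # simple cooling schedule
        if s < best or random.random() < math.exp(-(s - best) / temp):
            theta, best = cand, s
    return None                             # no satisfying point found
```

The BSMC-based variant would replace `sprt_satisfied` with a Bayes-factor test over the same stream of Bernoulli trace outcomes.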
Inferential stability in systems biology
The modern biological sciences are fraught with statistical difficulties. Biomolecular
stochasticity, experimental noise, and the “large p, small n” problem all contribute to
the challenge of data analysis. Nevertheless, we routinely seek to draw robust, meaningful
conclusions from observations. In this thesis, we explore methods for assessing
the effects of data variability upon downstream inference, in an attempt to quantify and
promote the stability of the inferences we make.
We start with a review of existing methods for addressing this problem, focusing upon the
bootstrap and similar methods. The key requirement for all such approaches is a statistical
model that approximates the data generating process.
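As a reminder of the baseline technique, the sketch below shows a plain nonparametric bootstrap in Python, in which the empirical distribution of the observed data stands in for the unknown data-generating process; the data and statistic are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(data, statistic, n_boot=2000, level=0.95):
    """Nonparametric bootstrap: resample the data with replacement and
    report a percentile confidence interval for the given statistic."""
    n = len(data)
    stats = np.array([statistic(data[rng.integers(0, n, n)])
                      for _ in range(n_boot)])
    lo_q, hi_q = (1 - level) / 2 * 100, (1 + level) / 2 * 100
    return tuple(np.percentile(stats, [lo_q, hi_q]))

# Toy example: how stable is the median of a small, skewed sample?
sample = rng.lognormal(size=30)
print(bootstrap_ci(sample, np.median))
```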
We move on to consider biomarker discovery problems. We present a novel algorithm for
proposing putative biomarkers on the strength of both their predictive ability and the stability
with which they are selected. In a simulation study, we find our approach to perform
favourably in comparison to strategies that select on the basis of predictive performance
alone.
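The algorithm itself is developed in the thesis; as a hedged illustration of its stability ingredient, the sketch below scores features by how often an L1-penalised classifier selects them across random half-samples (plain stability selection, without the predictive-ability component our algorithm also uses). The synthetic data and the penalty `C` are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def selection_stability(X, y, n_sub=100, frac=0.5, C=0.1):
    """Per-feature selection frequency of an L1-penalised logistic
    regression across random subsamples; stable features score near 1."""
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[idx], y[idx])
        counts += np.abs(clf.coef_[0]) > 1e-8
    return counts / n_sub

# Toy demo: 30 candidate biomarkers, only the first two carry signal.
X = rng.normal(size=(100, 30))
y = (X[:, 0] + 0.8 * X[:, 1] + 0.5 * rng.normal(size=100) > 0).astype(int)
print(selection_stability(X, y)[:5])
```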
We then consider the real problem of identifying protein peak biomarkers for HAM/TSP,
an inflammatory condition of the central nervous system caused by HTLV-1 infection.
We apply our algorithm to a set of SELDI mass spectral data, and identify a number of
putative biomarkers. Additional experimental work, together with known results from the
literature, provides corroborating evidence for the validity of these putative biomarkers.
Having focused on static observations, we then make the natural progression to time
course data sets. We propose a (Bayesian) bootstrap approach for such data, and then
apply our method in the context of gene network inference and the estimation of parameters
in ordinary differential equation models. We find that the inferred gene networks
are relatively unstable, and demonstrate the importance of finding distributions of ODE
parameter estimates, rather than single point estimates.
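A minimal illustration of the Dirichlet-weight idea behind the Bayesian bootstrap, applied to fitting one rate parameter of a toy exponential-decay ODE; the thesis's scheme for time-course data is more involved, and the model, noise level, and optimizer settings here are assumptions.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize

rng = np.random.default_rng(2)

def decay(y, t, k):
    """Toy ODE: dy/dt = -k * y."""
    return -k * y

def fit_once(t, y_obs, w):
    """Weighted least-squares estimate of the decay rate k."""
    def loss(k):
        y_hat = odeint(decay, y_obs[0], t, args=(k[0],)).ravel()
        return np.sum(w * (y_obs - y_hat) ** 2)
    return minimize(loss, x0=[0.5], bounds=[(1e-6, None)]).x[0]

# Bayesian bootstrap: draw Dirichlet(1,...,1) weights over the time points
# instead of resampling them, giving a distribution over k estimates.
t = np.linspace(0, 5, 20)
y = np.exp(-0.8 * t) + rng.normal(scale=0.05, size=t.size)
ks = [fit_once(t, y, rng.dirichlet(np.ones(t.size))) for _ in range(200)]
print(np.percentile(ks, [2.5, 50, 97.5]))   # a distribution, not a point estimate
```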
The use of mixture density networks in the emulation of complex epidemiological individual-based models
Complex, computationally intensive, individual-based models are abundant in epidemiology. For epidemics such as macro-parasitic diseases, detailed modelling of human behaviour and of the pathogen life-cycle is required in order to produce accurate results. This can often lead to models that are computationally expensive to analyse and fit, and that require many simulation runs in order to build up sufficient statistics. Emulation can provide a more computationally efficient surrogate for the individual-based model by approximating it with a statistical model. Previous work has used Gaussian processes (GPs) for this purpose, but these cannot deal with multi-modal, heavy-tailed, or discrete distributions. Here, we introduce the mixture density network (MDN) and its application to the emulation of epidemiological models. MDNs combine a mixture model with a neural network to provide a flexible tool for emulating a variety of models and outputs. We develop an MDN emulation methodology and demonstrate its use on a number of simple models with normal, gamma, and beta distributed outputs. We then explore its use on the stochastic SIR model to predict the final size distribution and infection dynamics. MDNs have the potential to faithfully reproduce multiple outputs of an individual-based model and allow for rapid analysis by a range of users. As such, an open-access library of the method has been released alongside this manuscript.
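A minimal MDN sketch follows (written in PyTorch as an assumed implementation choice; it is not the released library). A shared hidden layer produces the weights, means, and scales of a K-component Gaussian mixture over a scalar simulator output, and the network is trained by minimizing the mixture negative log-likelihood.

```python
import torch
import torch.nn as nn

class MDN(nn.Module):
    """Minimal mixture density network: maps simulator inputs x to the
    parameters of a K-component Gaussian mixture over the output y."""
    def __init__(self, d_in, n_hidden=32, K=5):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, n_hidden), nn.Tanh())
        self.logits = nn.Linear(n_hidden, K)     # mixture weights (softmax)
        self.mu = nn.Linear(n_hidden, K)         # component means
        self.log_sigma = nn.Linear(n_hidden, K)  # component scales (log)

    def forward(self, x):
        h = self.body(x)
        return self.logits(h), self.mu(h), self.log_sigma(h)

def mdn_nll(logits, mu, log_sigma, y):
    """Negative log-likelihood of scalar targets y under the mixture."""
    mix = torch.distributions.Categorical(logits=logits)
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    gmm = torch.distributions.MixtureSameFamily(mix, comp)
    return -gmm.log_prob(y.squeeze(-1)).mean()

# Usage sketch: fit to (x, y) pairs collected from simulator runs.
net = MDN(d_in=3)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.randn(256, 3), torch.randn(256, 1)  # placeholder data
for _ in range(100):
    opt.zero_grad()
    loss = mdn_nll(*net(x), y)
    loss.backward()
    opt.step()
```

Gamma or beta outputs, as in the simple test models mentioned above, follow by swapping `Normal` for the corresponding `torch.distributions` family.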
Systems Biology of Cancer: A Challenging Expedition for Clinical and Quantitative Biologists
A systems-biology approach to complex disease (such as cancer) is now complementing traditional experience-based approaches, which have typically been invasive and expensive. The rapid progress in biomedical knowledge is enabling the targeting of disease with therapies that are precise, proactive, preventive, and personalized. In this paper, we summarize and classify models of systems biology and model checking tools, which have been used to great success in computational biology and related fields. We demonstrate how these models and tools have been used to study some of the twelve biochemical pathways implicated in, but not unique to, pancreatic cancer, and conclude that the resulting mechanistic models will need to be further enhanced by various abstraction techniques to interpret phenomenological models of cancer progression.
Programmable models of growth and mutation of cancer-cell populations
In this paper we propose a systematic approach to construct mathematical
models describing populations of cancer-cells at different stages of disease
development. The methodology we propose is based on stochastic Concurrent
Constraint Programming, a flexible stochastic modelling language. The
methodology is tested on (and partially motivated by) the study of prostate
cancer. In particular, we show how our method can be used to systematically
reconstruct different mathematical models of prostate cancer growth, together
with their interactions with different kinds of hormone therapy, at different
levels of refinement.
Comment: In Proceedings CompMod 2011, arXiv:1109.104
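As a language-agnostic illustration of the kind of stochastic population model involved (the paper itself works in stochastic Concurrent Constraint Programming, whose programs have continuous-time Markov chain semantics), here is a minimal Gillespie simulation of a growth-mutation process; the two cell types and all rates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def gillespie_growth_mutation(n0=100, birth=0.10, death=0.05,
                              mut=0.01, t_end=50.0):
    """Gillespie simulation of two cancer-cell subpopulations: wild-type
    cells divide, die, or mutate into a second (e.g. therapy-resistant)
    type, which divides and dies at its own rates."""
    t, n = 0.0, np.array([n0, 0], dtype=int)   # [wild-type, mutant]
    history = [(t, n[0], n[1])]
    while t < t_end and n.sum() > 0:
        rates = np.array([birth * n[0], death * n[0], mut * n[0],
                          birth * n[1], death * n[1]], dtype=float)
        total = rates.sum()
        t += rng.exponential(1.0 / total)      # time to next event
        event = rng.choice(5, p=rates / total)
        if event == 0:
            n[0] += 1                 # wild-type division
        elif event == 1:
            n[0] -= 1                 # wild-type death
        elif event == 2:
            n[0] -= 1; n[1] += 1      # mutation
        elif event == 3:
            n[1] += 1                 # mutant division
        else:
            n[1] -= 1                 # mutant death
        history.append((t, n[0], n[1]))
    return history

traj = gillespie_growth_mutation()
print(len(traj), traj[-1])   # number of events and final (t, wild-type, mutant)
```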
Bayesian Centroid Estimation for Motif Discovery
Biological sequences may contain patterns that signal important
biomolecular functions; a classical example is regulation of gene expression by
transcription factors that bind to specific patterns in genomic promoter
regions. In motif discovery we are given a set of sequences that share a common
motif and aim to identify not only the motif composition, but also the binding
sites in each sequence of the set. We present a Bayesian model that is an
extended version of the model adopted by the Gibbs motif sampler, and propose a
new centroid estimator that arises from a refined and meaningful loss function
for binding site inference. We discuss the main advantages of centroid
estimation for motif discovery, including computational convenience, and how
its principled derivation offers further insights about the posterior
distribution of binding site configurations. We also illustrate, using
simulated and real datasets, that the centroid estimator can differ from the
maximum a posteriori estimator.
Comment: 24 pages, 9 figures
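For intuition, the sketch below computes the centroid under plain site-wise (Hamming) loss from posterior samples of binding-site indicator vectors, such as Gibbs-sampler draws; the estimator in the paper is derived from a refined loss, so treat this as a simplified stand-in.

```python
import numpy as np

def centroid_estimate(samples):
    """Centroid under Hamming loss: keep each candidate site iff its
    marginal posterior probability across the samples exceeds 1/2.
    This minimizes the expected number of site-wise errors and can
    differ from the single most probable configuration (the MAP)."""
    marginals = np.asarray(samples, dtype=float).mean(axis=0)
    return (marginals > 0.5).astype(int), marginals

# Toy illustration: three posterior draws over five candidate positions.
draws = [[1, 0, 0, 1, 0],
         [1, 0, 0, 0, 0],
         [1, 0, 1, 1, 0]]
est, p = centroid_estimate(draws)
print(est, p)   # position 1 is certain; position 4 is kept at p = 2/3
```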
Inferring Latent States and Refining Force Estimates via Hierarchical Dirichlet Process Modeling in Single Particle Tracking Experiments
Optical microscopy provides rich spatio-temporal information characterizing
in vivo molecular motion. However, effective forces and other parameters used
to summarize molecular motion change over time in live cells due to latent
state changes, e.g., changes induced by dynamic micro-environments,
photobleaching, and other heterogeneity inherent in biological processes. This
study focuses on techniques for analyzing Single Particle Tracking (SPT) data
experiencing abrupt state changes. We demonstrate the approach on GFP tagged
chromatids experiencing metaphase in yeast cells and probe the effective forces
resulting from dynamic interactions that reflect the sum of a number of
physical phenomena. State changes are induced by factors such as microtubule
dynamics exerting force through the centromere, thermal polymer fluctuations,
etc. Simulations are used to demonstrate the relevance of the approach in more
general SPT data analyses. Refined force estimates are obtained by adopting and
modifying a nonparametric Bayesian modeling technique, the Hierarchical
Dirichlet Process Switching Linear Dynamical System (HDP-SLDS), for SPT
applications. The HDP-SLDS method shows promise in systematically identifying
dynamical regime changes induced by unobserved state changes when the number of
underlying states is unknown in advance (a common problem in SPT applications).
We expand on the relevance of the HDP-SLDS approach, review the relevant
background of Hierarchical Dirichlet Processes, show how to map discrete time
HDP-SLDS models to classic SPT models, and discuss limitations of the approach.
In addition, we demonstrate new computational techniques for tuning
hyperparameters and for checking the statistical consistency of model
assumptions directly against individual experimental trajectories; the
techniques circumvent the need for "ground-truth" and subjective information.
Comment: 25 pages, 6 figures. Differs only typographically from the PLoS One
publication, available freely as an open-access article at
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.013763
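The mapping to classic SPT models can be made concrete: a one-dimensional Ornstein-Uhlenbeck process (diffusion in a harmonic trap, a standard SPT model) has an exact AR(1) discretization, so the linear-dynamics parameters estimated for each HDP-SLDS regime translate directly into an effective relaxation rate, trap centre, and diffusion coefficient. The sketch below shows that algebra under the assumed 1-D OU dynamics; it is an illustration, not the authors' code.

```python
import numpy as np

def slds_to_ou(a, b, q, dt):
    """Map one regime's AR(1) model  x[t+1] = a*x[t] + b + N(0, q)
    to the OU model  dx = -theta*(x - mu)*dt + sigma*dW, using the
    exact OU discretization  a = exp(-theta*dt),  b = mu*(1 - a),
    q = sigma^2 * (1 - a^2) / (2*theta).  Requires 0 < a < 1."""
    theta = -np.log(a) / dt            # relaxation rate of the trap
    mu = b / (1.0 - a)                 # trap (equilibrium) position
    sigma2 = 2.0 * theta * q / (1.0 - a**2)
    return theta, mu, sigma2 / 2.0     # D = sigma^2 / 2

# With drag gamma = kB*T / D (Einstein relation), the effective spring
# constant of the confining force is kappa = theta * gamma.
theta, mu, D = slds_to_ou(a=0.9, b=0.05, q=1e-3, dt=0.01)
print(theta, mu, D)
```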