178 research outputs found
Inferring Latent States and Refining Force Estimates via Hierarchical Dirichlet Process Modeling in Single Particle Tracking Experiments
Optical microscopy provides rich spatio-temporal information characterizing
in vivo molecular motion. However, effective forces and other parameters used
to summarize molecular motion change over time in live cells due to latent
state changes, e.g., changes induced by dynamic micro-environments,
photobleaching, and other heterogeneity inherent in biological processes. This
study focuses on techniques for analyzing Single Particle Tracking (SPT) data
experiencing abrupt state changes. We demonstrate the approach on GFP tagged
chromatids experiencing metaphase in yeast cells and probe the effective forces
resulting from dynamic interactions that reflect the sum of a number of
physical phenomena. State changes are induced by factors such as microtubule
dynamics exerting force through the centromere, thermal polymer fluctuations,
etc. Simulations are used to demonstrate the relevance of the approach in more
general SPT data analyses. Refined force estimates are obtained by adopting and
modifying a nonparametric Bayesian modeling technique, the Hierarchical
Dirichlet Process Switching Linear Dynamical System (HDP-SLDS), for SPT
applications. The HDP-SLDS method shows promise in systematically identifying
dynamical regime changes induced by unobserved state changes when the number of
underlying states is unknown in advance (a common problem in SPT applications).
We expand on the relevance of the HDP-SLDS approach, review the relevant
background of Hierarchical Dirichlet Processes, show how to map discrete time
HDP-SLDS models to classic SPT models, and discuss limitations of the approach.
In addition, we demonstrate new computational techniques for tuning
hyperparameters and for checking the statistical consistency of model
assumptions directly against individual experimental trajectories; the
techniques circumvent the need for "ground-truth" and subjective information.
Comment: 25 pages, 6 figures. Differs only typographically from PLoS One
publication available freely as an open-access article at
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.013763
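The abstract above mentions mapping discrete-time switching linear dynamical systems to classic SPT models but does not give the equations. As a rough illustration only, the sketch below simulates one such case under assumptions that are not taken from the paper: a 1D overdamped particle in a harmonic well whose restoring rate `kappa` (stiffness divided by friction) jumps between a fixed, known set of latent states. The function name, parameters, and the two-state example are hypothetical. The sketch only generates data; it does not perform HDP-SLDS inference, which would treat the number of states as unknown.

```python
import math
import random

def simulate_switching_ou(n_steps, dt, kappas, diffusion, stay_prob, x_eq=0.0, seed=0):
    """Simulate a 1D trajectory whose restoring rate switches between latent states.

    kappas[s] is the (positive) restoring rate k/gamma in state s. Within a state,
    the exact time discretization of the Ornstein-Uhlenbeck process is the AR(1) update
        x[t+1] = x_eq + a * (x[t] - x_eq) + noise,   a = exp(-kappa * dt),
    with noise variance (diffusion / kappa) * (1 - a**2).
    """
    rng = random.Random(seed)
    state, x = 0, x_eq
    states, xs = [], []
    for _ in range(n_steps):
        # Latent Markov chain (assumed): stay with probability stay_prob,
        # otherwise jump uniformly to one of the other states.
        if rng.random() > stay_prob:
            state = rng.choice([s for s in range(len(kappas)) if s != state])
        a = math.exp(-kappas[state] * dt)
        noise_sd = math.sqrt(diffusion / kappas[state] * (1.0 - a * a))
        x = x_eq + a * (x - x_eq) + rng.gauss(0.0, noise_sd)
        states.append(state)
        xs.append(x)
    return states, xs

# Hypothetical two-state example: a weakly and a strongly confined regime.
states, xs = simulate_switching_ou(
    n_steps=200, dt=0.01, kappas=[1.0, 20.0], diffusion=0.5, stay_prob=0.98
)
print(len(xs), sorted(set(states)))
```

In this parameterization the effective force in state `s` is proportional to `-kappas[s] * (x - x_eq)`, so a change of latent state shows up as a change in the AR(1) coefficient `a`, which is the kind of regime change a switching model is meant to detect.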
Spatio-temporal inference for circadian gene transcription in the mammalian SCN
Almost all life on earth exhibits circadian rhythms of behaviours that are tied to the natural day and night cycle. In mammals, the suprachiasmatic nucleus (SCN) is responsible for generating and communicating these rhythms to peripheral tissues. The neurons of the SCN function as noisy molecular clocks, expressing circadian genes in an oscillatory fashion over the course of 24 hours through a transcriptional/translational feedback loop (TTFL). The cells synchronise to form a robust clock, capable of exact timekeeping and entrainment to external stimuli, e.g. light, via intercellular signalling. This thesis investigates spatio-temporal inference for stochastic models of the TTFL, motivated by the availability of high-resolution bioimaging data of core circadian genes Period and Cryptochrome from mouse SCN.
We begin by introducing the mammalian clock and SCN bioimaging data. We then cover various methodologies for mechanistic and stochastic modelling of gene transcription, including chemical reaction networks, the chemical Langevin equation, and Markov chain Monte Carlo methods for Bayesian inference. We derive stability criteria for a model of the single-cell TTFL that describes transcriptional inhibition through a distributed delay. The model is fitted to imaging data of the gene Cry1, which allows us to infer the dynamics of circadian gene transcription and molecular population sizes.
A Bayesian hierarchical framework is developed to model spatial dependencies observed in the parameter estimates of the single-cell model. The methodology is applied to bioimaging data of the Cry1-gene and the analysis tools are developed further by deriving a Bayesian period estimator and an inhibition profile which allow us to study the spatial distribution of key properties of the TTFL across SCN tissue.
Finally, the methodology is extended to include an additional molecular species that captures transcriptional activation. This extension confers a mechanistic spatial interpretation to the model by describing the effect of intercellular signalling. By eliciting informative prior distributions for parameters of the circadian Per2 feedback loop, we are able to fit the model to simultaneous recordings of Per2 and calcium. The model fit represents a first step in obtaining a complete model of both single-cell and organ-wide dynamics with empirically estimated parameters.
Novel descriptive and model based statistical approaches in immunology and signal transduction
Biological systems are usually complex nonlinear systems of which
we only have a limited understanding. Here we show three different
aspects of investigating such systems. We present a method to extract
detailed knowledge from typical biological trajectory data, which have
randomness as a main characteristic. The migration of immune cells,
such as leukocytes, is a key example of our study. The application of
our methodology leads to the discovery of novel random walk behaviour
of leukocyte migration.
Furthermore we use the gathered knowledge to construct the underlying
mathematical model that captures the behaviour of leukocytes, or
more precisely macrophages and neutrophils, under acute injury. Any
model of a biological system has little predictive power if it is not compared to collected data. We present a pipeline of how complex
spatio-temporal trajectory data can be used to calibrate our model of leukocyte
migration. The pipeline employs approximate methods in a Bayesian
framework. Using the same approach we are able to learn additional information about the underlying signalling network, which is not directly
apparent in the cell migration data.
While these two methods can be seen as data processing and analysis,
we show in the last part of this work how to assess the information
content of experiments. The choice of an experiment with the highest
information content out of a set of possible experiments leads us to the
problem of optimal experimental design. We develop and implement an
algorithm for simulation based Bayesian experimental design in order
to learn parameters of a given model. We validate our algorithm with
the help of toy examples and apply it to examples in immunology (Hes1
transcription regulation) and signal transduction (growth factor induced
MAPK pathway).
Measuring confidence of missing data estimation for HIV classification
Computational intelligence methods have been applied to classify pregnant women’s HIV status
using demographic data from the South African Antenatal Seroprevalence database obtained
from the South African Department of Health. Classification accuracies using a multitude of
computational intelligence techniques ranged between 60% and 70%. The purpose of this
research is to determine the certainty of predicting the HIV status of a patient. Ensemble
neural networks were used for the investigation to obtain a set of possible solutions. The
predictive certainty of each patient's predicted HIV status was computed by giving the
percentage of most dominant outputs from the set of possible solutions. Ensembles of neural
networks were obtained using boosting, bagging and the Bayesian approach. It was found that
the ensemble trained using the Bayesian approach is most suitable for the proposed predictive
certainty measure. Furthermore, a sensitivity analysis was done to investigate how each of the
demographic variables influenced the certainty of predicting the HIV status of a patient.
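The certainty measure described above, the percentage of ensemble outputs that agree with the most dominant prediction, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name and the example votes are hypothetical, and each ensemble member is assumed to emit a hard class label.

```python
from collections import Counter

def predictive_certainty(votes):
    """Return (majority_label, fraction of ensemble members that predicted it)."""
    if not votes:
        raise ValueError("votes must be non-empty")
    # most_common(1) gives the label with the highest count and that count.
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

# Hypothetical outputs of a 10-member ensemble for one patient
# (1 = predicted positive, 0 = predicted negative).
example_votes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
label, certainty = predictive_certainty(example_votes)
print(label, certainty)
```

With seven of ten members voting for class 1, the predicted status is 1 with a certainty of 0.7. A value near 0.5 in a two-class problem would indicate that the ensemble is split and the prediction should be treated with caution.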
Data-assisted modeling of complex chemical and biological systems
Complex systems are abundant in chemistry and biology; they can be multiscale, possibly high-dimensional or stochastic, with nonlinear dynamics and interacting components. It is often nontrivial (and sometimes impossible) to determine and study the macroscopic quantities of interest and the equations they obey. One can only (judiciously or randomly) probe the system, gather observations and study trends. In this thesis, Machine Learning is used as a complement to traditional modeling and numerical methods to enable data-assisted (or data-driven) dynamical systems. As case studies, three complex systems are sourced from diverse fields: The first one is a high-dimensional computational neuroscience model of the Suprachiasmatic Nucleus of the human brain, where bifurcation analysis is performed by simply probing the system. Then, manifold learning is employed to discover a latent space of neuronal heterogeneity. Second, Machine Learning surrogate models are used to optimize dynamically operated catalytic reactors. An algorithmic pipeline is presented through which it is possible to program catalysts with active learning. Third, Machine Learning is employed to extract laws of Partial Differential Equations describing bacterial Chemotaxis. It is demonstrated how Machine Learning manages to capture the rules of bacterial motility at the macroscopic level, starting from diverse data sources (including real-world experimental data). More importantly, a framework is constructed through which already existing, partial knowledge of the system can be exploited.
These applications showcase how Machine Learning can be used synergistically with traditional simulations in different scenarios: (i) Equations are available but the overall system is so high-dimensional that efficiency and explainability suffer, (ii) Equations are available but lead to highly nonlinear black-box responses, (iii) Only data are available (of varying source and quality) and equations need to be discovered. For such data-assisted dynamical systems, we can perform fundamental tasks, such as integration, steady-state location, continuation and optimization. This work aims to unify traditional scientific computing and Machine Learning, in an efficient, data-economical, generalizable way, where both the physical system and the algorithm matter.