178 research outputs found

    Inferring Latent States and Refining Force Estimates via Hierarchical Dirichlet Process Modeling in Single Particle Tracking Experiments

    Optical microscopy provides rich spatio-temporal information characterizing in vivo molecular motion. However, effective forces and other parameters used to summarize molecular motion change over time in live cells due to latent state changes, e.g., changes induced by dynamic micro-environments, photobleaching, and other heterogeneity inherent in biological processes. This study focuses on techniques for analyzing Single Particle Tracking (SPT) data experiencing abrupt state changes. We demonstrate the approach on GFP-tagged chromatids during metaphase in yeast cells and probe the effective forces resulting from dynamic interactions that reflect the sum of a number of physical phenomena. State changes are induced by factors such as microtubule dynamics exerting force through the centromere, thermal polymer fluctuations, etc. Simulations are used to demonstrate the relevance of the approach to more general SPT data analyses. Refined force estimates are obtained by adopting and modifying a nonparametric Bayesian modeling technique, the Hierarchical Dirichlet Process Switching Linear Dynamical System (HDP-SLDS), for SPT applications. The HDP-SLDS method shows promise in systematically identifying dynamical regime changes induced by unobserved state changes when the number of underlying states is unknown in advance (a common problem in SPT applications). We expand on the relevance of the HDP-SLDS approach, review the relevant background of Hierarchical Dirichlet Processes, show how to map discrete-time HDP-SLDS models to classic SPT models, and discuss limitations of the approach. In addition, we demonstrate new computational techniques for tuning hyperparameters and for checking the statistical consistency of model assumptions directly against individual experimental trajectories; the techniques circumvent the need for "ground truth" and subjective information.
    Comment: 25 pages, 6 figures. Differs only typographically from the PLoS One publication, available freely as an open-access article at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.013763
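
The switching-dynamics idea behind the abstract can be illustrated in miniature. The sketch below is not the HDP-SLDS (no Dirichlet process, and the regime labels are assumed known rather than inferred); it only shows how a trajectory generated by a two-regime switching AR(1) process, a discrete-time stand-in for a switching Ornstein-Uhlenbeck force model, yields regime-specific force parameters by least squares. All parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-regime AR(1) surrogate for a switching OU process:
# x_{t+1} = a_s * x_t + sigma * eps, where a_s encodes the effective
# confinement strength of regime s (values chosen for illustration).
a_true = {0: 0.95, 1: 0.60}
sigma = 0.1
segments = [(0, 2000), (1, 2000)]   # (regime label, segment length)

x = [0.0]
labels = []
for s, n in segments:
    for _ in range(n):
        x.append(a_true[s] * x[-1] + sigma * rng.standard_normal())
        labels.append(s)
x = np.asarray(x)
labels = np.asarray(labels)

# With regime labels known, each a_s has a closed-form least-squares
# estimate from the consecutive pairs belonging to that regime.
def estimate_a(x, labels, s):
    idx = np.where(labels == s)[0]
    xt, xt1 = x[idx], x[idx + 1]
    return float(xt @ xt1 / (xt @ xt))

a_hat = {s: estimate_a(x, labels, s) for s in (0, 1)}
print(a_hat)
```

The hard part the paper addresses, inferring the labels and the number of regimes jointly, is exactly what this toy omits.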

    Spatio-temporal inference for circadian gene transcription in the mammalian SCN

    Almost all life on Earth exhibits circadian rhythms of behaviour tied to the natural day and night cycle. In mammals, the suprachiasmatic nucleus (SCN) is responsible for generating these rhythms and communicating them to peripheral tissues. The neurons of the SCN function as noisy molecular clocks, expressing circadian genes in an oscillatory fashion over the course of 24 hours through a transcriptional/translational feedback loop (TTFL). The cells synchronise to form a robust clock, capable of exact timekeeping and of entrainment to external stimuli, e.g. light, via intercellular signalling. This thesis investigates spatio-temporal inference for stochastic models of the TTFL, motivated by the availability of high-resolution bioimaging data of the core circadian genes Period and Cryptochrome from the mouse SCN. We begin by introducing the mammalian clock and SCN bioimaging data. We then cover various methodologies for mechanistic and stochastic modelling of gene transcription, including chemical reaction networks, the chemical Langevin equation, and Markov chain Monte Carlo methods for Bayesian inference. We derive stability criteria for a model of the single-cell TTFL that describes transcriptional inhibition through a distributed delay. The model is fitted to imaging data of the gene Cry1, which allows us to infer the dynamics of circadian gene transcription and molecular population sizes. A Bayesian hierarchical framework is developed to model spatial dependencies observed in the parameter estimates of the single-cell model. The methodology is applied to bioimaging data of the Cry1 gene, and the analysis tools are developed further by deriving a Bayesian period estimator and an inhibition profile, which allow us to study the spatial distribution of key properties of the TTFL across SCN tissue. Finally, the methodology is extended to include an additional molecular species that captures transcriptional activation. This extension confers a mechanistic spatial interpretation on the model by describing the effect of intercellular signalling. By eliciting informative prior distributions for parameters of the circadian Per2 feedback loop, we are able to fit the model to simultaneous recordings of Per2 and calcium. The model fit represents a first step towards a complete model of both single-cell and organ-wide dynamics with empirically estimated parameters.
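
The core mechanism, delayed negative feedback producing sustained oscillation, can be caricatured in a few lines. The sketch below is a single-variable delay equation with Hill-type repression, a drastic reduction of the thesis's distributed-delay TTFL model; every parameter value here is invented for illustration, not estimated from data.

```python
import numpy as np

# Illustrative only: one species x repressing its own production after a
# fixed delay tau, dx/dt = alpha*K^n/(K^n + x(t-tau)^n) - delta*x,
# integrated by forward Euler. Parameter values are invented.
alpha, K, n = 10.0, 1.0, 10.0   # max production, repression threshold, Hill exponent
delta, tau = 1.0, 6.0           # degradation rate, transcriptional delay
dt, T = 0.01, 200.0

steps = int(T / dt)
lag = int(tau / dt)
x = np.empty(steps + 1)
x[0] = 0.1

for t in range(steps):
    x_del = x[max(t - lag, 0)]                 # delayed repressor level
    prod = alpha * K**n / (K**n + x_del**n)    # Hill-type repression
    x[t + 1] = x[t] + dt * (prod - delta * x[t])

tail = x[int(150 / dt):]
print(round(tail.max() - tail.min(), 2))       # sustained oscillation amplitude
```

With a steep Hill function and a delay long relative to the degradation timescale, the fixed point loses stability and a limit cycle appears, which is the qualitative behaviour the stability criteria in the thesis characterise for the full model.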

    Novel descriptive and model based statistical approaches in immunology and signal transduction

    Biological systems are usually complex nonlinear systems of which we have only a limited understanding. Here we show three different aspects of investigating such systems. We present a method to extract detailed knowledge from typical biological trajectory data, which have randomness as a main characteristic. The migration of immune cells, such as leukocytes, is a key example in our study. The application of our methodology leads to the discovery of novel random walk behaviour in leukocyte migration. Furthermore, we use the gathered knowledge to construct the underlying mathematical model that captures the behaviour of leukocytes, or more precisely macrophages and neutrophils, under acute injury. Any model of a biological system has little predictive power if it is not compared to collected data. We present a pipeline for how complex spatio-temporal trajectory data can be used to calibrate our model of leukocyte migration. The pipeline employs approximate methods in a Bayesian framework. Using the same approach we are able to learn additional information about the underlying signalling network, which is not directly apparent in the cell migration data. While these two methods can be seen as data processing and analysis, we show in the last part of this work how to assess the information content of experiments. Choosing the experiment with the highest information content out of a set of possible experiments leads us to the problem of optimal experimental design. We develop and implement an algorithm for simulation-based Bayesian experimental design in order to learn the parameters of a given model. We validate our algorithm with the help of toy examples and apply it to examples in immunology (Hes1 transcription regulation) and signal transduction (the growth-factor-induced MAPK pathway).
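
The "approximate methods in a Bayesian framework" mentioned above typically means approximate Bayesian computation (ABC). The toy below is not the thesis's calibration pipeline; it only shows the ABC-rejection pattern on an invented problem: inferring the step-size scale of a 2D Gaussian random walk from a single summary statistic, with made-up prior bounds and tolerance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Summary statistic: mean squared step length of a 2D random walk.
def summary(steps):
    return np.mean(np.sum(steps**2, axis=1))

sigma_true = 1.0
obs = summary(rng.normal(0, sigma_true, size=(500, 2)))  # "observed" data

accepted = []
for _ in range(5000):
    sigma = rng.uniform(0.1, 3.0)                        # draw from the prior
    sim = summary(rng.normal(0, sigma, size=(500, 2)))   # simulate at sigma
    if abs(sim - obs) < 0.1:                             # tolerance epsilon
        accepted.append(sigma)                           # keep matching draws

posterior = np.array(accepted)
print(len(posterior), round(posterior.mean(), 2))
```

The accepted draws approximate the posterior over sigma; the same simulate-compare-accept loop carries over to likelihood-free calibration of far richer migration models.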

    Measuring confidence of missing data estimation for HIV classification

    Computational intelligence methods have been applied to classify pregnant women's HIV status using demographic data from the South African Antenatal Seroprevalence database, obtained from the South African Department of Health. Classification accuracies using a multitude of computational intelligence techniques ranged between 60% and 70%. The purpose of this research is to determine the certainty of predicting the HIV status of a patient. Ensemble neural networks were used for the investigation to obtain a set of possible solutions. The predictive certainty of each patient's predicted HIV status was computed as the percentage of the most dominant output among the set of possible solutions. Ensembles of neural networks were obtained using boosting, bagging and the Bayesian approach. It was found that the ensemble trained using the Bayesian approach is most suitable for the proposed predictive certainty measure. Furthermore, a sensitivity analysis was performed to investigate how each of the demographic variables influences the certainty of predicting the HIV status of a patient.

    Data-assisted modeling of complex chemical and biological systems

    Complex systems are abundant in chemistry and biology; they can be multiscale, possibly high-dimensional or stochastic, with nonlinear dynamics and interacting components. It is often nontrivial (and sometimes impossible) to determine and study the macroscopic quantities of interest and the equations they obey. One can only (judiciously or randomly) probe the system, gather observations and study trends. In this thesis, Machine Learning is used as a complement to traditional modeling and numerical methods to enable data-assisted (or data-driven) dynamical systems. As case studies, three complex systems are sourced from diverse fields. The first is a high-dimensional computational neuroscience model of the Suprachiasmatic Nucleus of the human brain, where bifurcation analysis is performed by simply probing the system; manifold learning is then employed to discover a latent space of neuronal heterogeneity. Second, Machine Learning surrogate models are used to optimize dynamically operated catalytic reactors, and an algorithmic pipeline is presented through which it is possible to program catalysts with active learning. Third, Machine Learning is employed to extract laws of Partial Differential Equations describing bacterial chemotaxis. It is demonstrated how Machine Learning manages to capture the rules of bacterial motility at the macroscopic level, starting from diverse data sources (including real-world experimental data). More importantly, a framework is constructed through which already existing, partial knowledge of the system can be exploited.
    These applications showcase how Machine Learning can be used synergistically with traditional simulations in different scenarios: (i) equations are available, but the overall system is so high-dimensional that efficiency and explainability suffer; (ii) equations are available but lead to highly nonlinear black-box responses; (iii) only data are available (of varying source and quality) and the equations need to be discovered. For such data-assisted dynamical systems, we can perform fundamental tasks such as integration, steady-state location, continuation and optimization. This work aims to unify traditional scientific computing and Machine Learning in an efficient, data-economical, generalizable way, where both the physical system and the algorithm matter.
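
Scenario (iii), discovering governing equations from data alone, has a minimal canonical form: sparse regression over a library of candidate terms (the SINDy idea). The sketch below uses an invented one-dimensional example, recovering dx/dt = -0.5 x from a simulated trajectory; the library and threshold are arbitrary choices for illustration, not the thesis's PDE-extraction framework.

```python
import numpy as np

# Recover a law dx/dt = -0.5*x from trajectory data by sparse regression
# over a small library of candidate terms (toy parameters throughout).
t = np.linspace(0, 5, 501)
x = 2.0 * np.exp(-0.5 * t)                 # "observed" trajectory
dxdt = np.gradient(x, t)                   # numerical time derivative

library = np.column_stack([np.ones_like(x), x, x**2])  # candidates: 1, x, x^2
coef, *_ = np.linalg.lstsq(library, dxdt, rcond=None)  # fit dxdt ~ library @ coef
coef[np.abs(coef) < 0.05] = 0.0            # hard-threshold small terms for sparsity

print(np.round(coef, 3))                   # expect roughly [0, -0.5, 0]
```

Only the linear term survives thresholding, so the fitted model reads dx/dt ≈ -0.5 x; the same regress-then-sparsify pattern scales up to spatial derivatives when extracting PDEs such as chemotaxis models.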