12 research outputs found

    Bifurcation analysis informs Bayesian inference in the Hes1 feedback loop

    Background: Ordinary differential equations (ODEs) are an important tool for describing the dynamics of biological systems. However, for ODE models to be useful, their parameters must first be calibrated. Parameter estimation, that is, finding parameter values given experimental data, is an inference problem that can be treated systematically through a Bayesian framework. A Markov chain Monte Carlo approach can then be used to sample from the appropriate posterior probability distributions, provided that suitable prior distributions can be found for the unknown parameter values. Choosing these priors is therefore a vital first step in the inference process. We study here a negative feedback loop in gene regulation where an ODE incorporating a time delay has been proposed as a realistic model and where experimental data is available. Our aim is to show that a priori mathematical analysis can be exploited in the choice of priors.

    Results: By focussing on the onset of oscillatory behaviour through a Hopf bifurcation, we derive a range of analytical expressions and constraints that link the model parameters to the observed dynamics of the system. Computational tests on both simulated and experimental data emphasise the usefulness of this analysis.

    Conclusion: Mathematical analysis not only gives insights into the possible dynamical behaviour of gene expression models, but can also be used to inform the choice of priors when parameters are inferred from experimental data in a Bayesian setting.
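The delayed negative-feedback mechanism described above can be illustrated numerically. The following is a minimal sketch, not the paper's actual model or parameter values: a nondimensionalised delay ODE for mRNA and protein in which pushing the delay past the Hopf bifurcation produces sustained oscillations.

```python
import numpy as np

def simulate(tau=25.0, h=5.0, mu=0.03, dt=0.05, t_end=2000.0):
    """Euler integration of a nondimensionalised delayed feedback loop:
         dm/dt = 1 / (1 + p(t - tau)**h) - mu * m   (mRNA)
         dp/dt = m - mu * p                         (protein)
    Parameter values are illustrative, not those of the paper."""
    n = round(t_end / dt)
    lag = round(tau / dt)
    m, p = np.zeros(n), np.zeros(n)
    m[0], p[0] = 2.0, 5.0
    for i in range(n - 1):
        # Protein level tau time units ago (constant history before t = 0)
        p_delayed = p[i - lag] if i >= lag else p[0]
        m[i + 1] = m[i] + dt * (1.0 / (1.0 + p_delayed**h) - mu * m[i])
        p[i + 1] = p[i] + dt * (m[i] - mu * p[i])
    return m, p

m, p = simulate()
late = p[len(p) // 2:]
# Peak-to-trough amplitude in the late window: clearly nonzero past the
# Hopf point, near zero when the steady state is stable.
print(round(float(late.max() - late.min()), 2))
```

Linearising around the steady state gives the characteristic equation whose pure-imaginary roots locate the critical delay; for these illustrative parameters that threshold sits below tau = 25, which is what makes the late-time amplitude informative about the parameters.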

    An optimal approach to the design of experiments for the automatic characterisation of biosystems

    The Design-Build-Test-Learn cycle is the main approach of synthetic biology to re-design and create new biological parts and systems, targeting the solution of complex and challenging problems. The applications of the novel designs range from biosensing and bioremediation of water pollutants (e.g. heavy metals) to drug discovery and delivery (e.g. cancer treatment) or biofuel production (e.g. butanol and ethanol), amongst others. Standardisation, predictability and automation are crucial for synthetic biology to attain these objectives efficiently. Mathematical modelling is a powerful tool that allows us to understand, predict, and control these systems, as shown in many other disciplines such as particle physics, chemical engineering, epidemiology and economics. Yet, the inherent difficulties of using mathematical models have substantially slowed their adoption by the synthetic biology community. Researchers might develop different competing model alternatives in the absence of in-depth knowledge of a system, and are consequently left with the burden of having to find the best one. Models also come with unknown and difficult-to-measure parameters that need to be inferred from experimental data. Moreover, the varying informative content of different experiments hampers the solution of these model selection and parameter identification problems, a difficulty compounded by the scarcity and noisiness of laborious-to-obtain data. The difficulty of solving these non-linear optimisation problems has limited the widespread use of advantageous mathematical models in synthetic biology, widening the gap between computational and experimental scientists. In this work, I present solutions to the problems of parameter identification, model selection and experimental design, validating them with in vivo data. First, I use Bayesian inference to estimate model parameters, relaxing the traditional noise assumptions associated with this problem.
I also apply information-theoretic approaches to evaluate the amount of information extracted from experiments (entropy gain). Next, I define methodologies to quantify the informative content of tentative experiments planned for model selection (distance between the predictions of competing models) and parameter inference (model prediction uncertainty). Then, I use the two methods to define efficient platforms for optimal experimental design and use a synthetic gene circuit (the genetic toggle switch) to substantiate the results, computationally and experimentally. I also expand strategies for optimally designing experiments for parameter identification, so that parameter information and input designs are updated during the execution of the experiments themselves (on-line optimal experimental design) using microfluidics. Finally, I develop an open-source and easy-to-use Julia package, BOMBs.jl, automating all the above functionalities to facilitate their dissemination and use amongst the synthetic biology community.
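The model-selection criterion mentioned above, distance between the predictions of competing models, can be sketched in a few lines. The two dose-response models and the candidate inputs below are invented for illustration and are not the thesis's toggle-switch models:

```python
import numpy as np

# Two competing (hypothetical) dose-response models for the same readout:
def model_a(u):
    return u / (1.0 + u)            # non-cooperative induction

def model_b(u):
    return u**2 / (4.0 + u**2)      # cooperative induction, higher threshold

# Candidate experiments: inducer levels we could apply.
candidates = np.linspace(0.0, 4.0, 81)

# Model-selection utility: prefer the input where the competing models'
# predictions are furthest apart, so the observed outcome discriminates best.
utility = np.abs(model_a(candidates) - model_b(candidates))
best = candidates[np.argmax(utility)]
print(round(float(best), 2))  # → 0.8
```

The same loop structure carries over to the parameter-inference criterion by swapping the utility for a measure of model prediction uncertainty at each candidate input.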

    Differential geometric MCMC methods and applications

    This thesis presents novel Markov chain Monte Carlo methodology that exploits the natural representation of a statistical model as a Riemannian manifold. The methods developed provide generalisations of the Metropolis-adjusted Langevin algorithm and the Hybrid Monte Carlo algorithm for Bayesian statistical inference, and resolve many shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlation structure. The performance of these Riemannian manifold Markov chain Monte Carlo algorithms is rigorously assessed by performing Bayesian inference on logistic regression models, log-Gaussian Cox point process models, stochastic volatility models, and both parameter and model level inference of dynamical systems described by nonlinear differential equations.
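The core idea, preconditioning both the Langevin drift and the proposal noise with a metric derived from the model, can be sketched in a simplified form. This sketch uses a constant metric (the precision matrix of a correlated Gaussian target, i.e. the Fisher information of a Gaussian location model); the thesis's algorithms generalise this to position-dependent metrics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a strongly correlated 2-D Gaussian, standing in for a posterior
# with the kind of correlation structure that defeats isotropic proposals.
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
prec = np.linalg.inv(cov)

def log_target(x):
    return -0.5 * x @ prec @ x

def grad_log_target(x):
    return -prec @ x

# Simplified "manifold" step: constant metric G = Fisher information.
G, Ginv = prec, cov
L = np.linalg.cholesky(Ginv)
eps = 0.8

def log_q(to, frm):
    # Log density (up to a constant) of N(frm + (eps^2/2) Ginv grad, eps^2 Ginv)
    d = to - (frm + 0.5 * eps**2 * Ginv @ grad_log_target(frm))
    return -0.5 * d @ G @ d / eps**2

def mala_step(x):
    prop = (x + 0.5 * eps**2 * Ginv @ grad_log_target(x)
            + eps * L @ rng.standard_normal(2))
    # Metropolis-Hastings correction for the asymmetric proposal
    log_alpha = (log_target(prop) + log_q(x, prop)
                 - log_target(x) - log_q(prop, x))
    return prop if np.log(rng.random()) < log_alpha else x

x = np.array([3.0, -3.0])
samples = np.empty((5000, 2))
for i in range(5000):
    x = mala_step(x)
    samples[i] = x
print(np.round(samples.mean(axis=0), 1))  # sample mean near the true mean (0, 0)
```

Because the noise is drawn with covariance eps² G⁻¹, proposals are stretched along the target's correlated direction, which is exactly the shortcoming of isotropic MALA that the manifold view addresses.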

    Synthesising executable gene regulatory networks in haematopoiesis from single-cell gene expression data

    A fundamental challenge in biology is to understand the complex gene regulatory networks which control tissue development in the mammalian embryo, and maintain homoeostasis in the adult. The cell fate decisions underlying these processes are ultimately made at the level of individual cells. Recent experimental advances in biology allow researchers to obtain gene expression profiles at single-cell resolution over thousands of cells at once. These single-cell measurements provide snapshots of the states of the cells that make up a tissue, instead of the population-level averages provided by conventional high-throughput experiments. The aim of this PhD was to investigate the possibility of using this new high resolution data to reconstruct mechanistic computational models of gene regulatory networks. In this thesis I introduce the idea of viewing single-cell gene expression profiles as states of an asynchronous Boolean network, and frame model inference as the problem of reconstructing a Boolean network from its state space. I then give a scalable algorithm to solve this synthesis problem. In order to achieve scalability, this algorithm works in a modular way, treating different aspects of a graph data structure separately before encoding the search for logical rules as Boolean satisfiability problems to be dispatched to a SAT solver. Together with experimental collaborators, I applied this method to understanding the process of early blood development in the embryo, which is poorly understood due to the small number of cells present at this stage. The emergence of blood from Flk1+ mesoderm was studied by single cell expression analysis of 3934 cells at four sequential developmental time points. A mechanistic model recapitulating blood development was reconstructed from this data set, which was consistent with known biology and the bifurcation of blood and endothelium. 
Several model predictions were validated experimentally, demonstrating that HoxB4 and Sox17 directly regulate the haematopoietic factor Erg, and that Sox7 blocks primitive erythroid development. A general-purpose graphical tool was then developed based on this algorithm, which can be used by biological researchers as new single-cell data sets become available. This tool can deploy computations to the cloud in order to scale up to larger high-throughput data sets. The results in this thesis demonstrate that single-cell analysis of a developing organ coupled with computational approaches can reveal the gene regulatory networks that underpin organogenesis. Rapid technological advances in our ability to perform single-cell profiling suggest that my tool will be applicable to other organ systems and may inform the development of improved cellular programming strategies.

    Microsoft Research PhD Scholarship
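The synthesis problem described above, finding Boolean update rules consistent with observed state transitions, can be illustrated at toy scale. The thesis encodes the search as Boolean satisfiability and dispatches it to a SAT solver; the sketch below substitutes brute-force enumeration over truth tables, with three hypothetical genes invented for illustration:

```python
from itertools import product

# Single-cell states as Boolean vectors over hypothetical genes (A, B, C),
# with observed (before, after) transitions between them. We search for an
# update rule for C consistent with every transition: the same consistency
# question the SAT encoding answers at scale.
transitions = [
    ((1, 0, 0), (1, 0, 1)),
    ((0, 1, 1), (0, 1, 0)),
    ((1, 1, 0), (1, 1, 1)),
    ((0, 0, 1), (0, 0, 0)),
]

# Candidate rules: every Boolean function of (A, B), as a truth table.
def consistent(table):
    return all(after[2] == table[(before[0], before[1])]
               for before, after in transitions)

solutions = []
for bits in product([0, 1], repeat=4):
    table = {(0, 0): bits[0], (0, 1): bits[1],
             (1, 0): bits[2], (1, 1): bits[3]}
    if consistent(table):
        solutions.append(table)

print(len(solutions), solutions[0])  # unique consistent rule: C' = A
```

Brute force is exponential in the number of regulators, which is why the real algorithm works modularly on the state-space graph and hands the rule search to a SAT solver.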

    Book of abstracts


    Applications, challenges and new perspectives on the analysis of transcriptional regulation using epigenomic and transcriptomic data

    The integrative analysis of epigenomics and transcriptomics data is an active research field in Bioinformatics. New methods are required to interpret and process large omics data sets, as generated within consortia such as the International Human Epigenomics Consortium (IHEC). In this thesis, we present several approaches illustrating how combined epigenomics and transcriptomics datasets, e.g. for differential or time series analysis, can be used to derive new biological insights on transcriptional regulation. In this work we focus on regulatory proteins called transcription factors (TFs), which are essential for orchestrating cellular processes. In our novel approaches, we combine epigenomics data, such as DNaseI-seq, predicted TF binding scores and gene-expression measurements in interpretable machine learning models. In joint work with our collaborators within and outside IHEC, we have shown that our methods lead to biologically meaningful results, which could be validated with wet-lab experiments. Aside from providing the community with new tools to perform integrative analysis of epigenomics and transcriptomics data, we have studied in detail the characteristics of chromatin accessibility data and their relation to gene expression, to better understand the implications of both computational processing and of different experimental methods on data interpretation. Overall, we provide easy-to-use tools to enable researchers to benefit from the era of Biological Data Science.

    In this dissertation we present several approaches for using the most common omics data, such as differential data sets or time series, to gain new insights into gene regulation at the transcriptional level. We concentrate in particular on so-called transcription factors, proteins that are essential for controlling regulatory processes in the cell. In our new methods we combine epigenetic data, for example DNaseI-seq or ATAC-seq data, predicted transcription factor binding sites and gene-expression data in interpretable machine learning models. Together with our collaborators we have shown that our methods lead to biologically meaningful results, which could be validated exemplarily in the laboratory. Furthermore, we have studied in detail the relationship between chromatin structure and gene expression. This is of great importance in order to avoid undue influence of experimental characteristics, as well as of the modelling of the data, on the biological interpretation.
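The interpretable models described above can be sketched with synthetic stand-in data: a linear model whose learned weights indicate how strongly each TF's binding in accessible chromatin is associated with expression. The data, TF count, and weights below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: per-gene binding scores for three hypothetical
# TFs in accessible chromatin, and log expression driven mainly by TF1.
n_genes = 200
X = rng.random((n_genes, 3))            # TF binding scores in [0, 1]
true_w = np.array([2.0, 0.5, 0.0])      # TF3 is irrelevant by construction
y = X @ true_w + 0.1 * rng.standard_normal(n_genes)

# An interpretable linear model: fit weights by least squares; each weight
# is directly readable as a TF's association with expression.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 1))  # weights approximately recover true_w
```

Regularised variants of the same idea (e.g. sparsity penalties) keep the weights interpretable while selecting the few TFs that matter, which is what makes such models useful for generating wet-lab-testable hypotheses.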

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
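The map/reduce decomposition of a life generation can be sketched locally: a mapper emits, for each live cell, an "alive" marker for itself and a neighbour contribution for each adjacent cell; the shuffle groups records by cell; a reducer applies the birth/survival rules. This is an illustration of the general MR pattern, not the authors' optimized streaming algorithms or their strip partitioning:

```python
from collections import defaultdict

def mapper(live_cell):
    # Emit (key, value) records, as a streaming mapper would to stdout.
    x, y = live_cell
    yield (x, y), ("alive", 1)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:
                yield (x + dx, y + dy), ("neighbour", 1)

def reduce_cell(cell, values):
    # Conway's rules: a cell is live next generation with exactly 3
    # neighbours, or with 2 neighbours if it is currently alive.
    alive = any(kind == "alive" for kind, _ in values)
    n = sum(v for kind, v in values if kind == "neighbour")
    return cell if n == 3 or (alive and n == 2) else None

def life_step(live):
    shuffled = defaultdict(list)        # stands in for the MR shuffle phase
    for cell in live:
        for key, value in mapper(cell):
            shuffled[key].append(value)
    return {c for c, vals in shuffled.items()
            if reduce_cell(c, vals) is not None}

blinker = {(1, 0), (1, 1), (1, 2)}      # vertical blinker oscillator
print(sorted(life_step(blinker)))       # → [(0, 1), (1, 1), (2, 1)]
```

In an actual streaming job the mapper and reducer read and write tab-separated lines over stdin/stdout, and partitioning the lattice (e.g. into strips) controls how much neighbour traffic crosses worker boundaries.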