Synthesising executable gene regulatory networks in haematopoiesis from single-cell gene expression data
A fundamental challenge in biology is to understand the complex gene regulatory networks which control tissue development in the mammalian embryo, and maintain homoeostasis in the adult. The cell fate decisions underlying these processes are ultimately made at the level of individual cells. Recent experimental advances in biology allow researchers to obtain gene expression profiles at single-cell resolution over thousands of cells at once. These single-cell measurements provide snapshots of the states of the cells that make up a tissue, instead of the population-level averages provided by conventional high-throughput experiments. The aim of this PhD was to investigate the possibility of using this new high resolution data to reconstruct mechanistic computational models of gene regulatory networks.
In this thesis I introduce the idea of viewing single-cell gene expression profiles as states of an asynchronous Boolean network, and frame model inference as the problem of reconstructing a Boolean network from its state space. I then give a scalable algorithm to solve this synthesis problem. In order to achieve scalability, this algorithm works in a modular way, treating different aspects of a graph data structure separately before encoding the search for logical rules as Boolean satisfiability problems to be dispatched to a SAT solver.
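The synthesis problem above can be illustrated with a minimal sketch. It assumes cells have already been discretised to Boolean states and treats two observed states as linked when they differ in exactly one gene, which is a single asynchronous step. It also replaces the SAT encoding with brute-force enumeration of rules over at most two regulators. The consistency criterion used here is an illustrative assumption, not the thesis's exact formulation: a rule may only flip a gene where the flipped state was also observed. All function names are likewise hypothetical, and the sketch ignores any time ordering of the cells.

```python
from itertools import combinations, product
from typing import Dict, Iterator, Optional, Set, Tuple

State = Tuple[int, ...]                                     # one Boolean value per gene
Rule = Tuple[Tuple[int, ...], Dict[Tuple[int, ...], int]]  # (regulators, truth table)


def flip(state: State, gene: int) -> State:
    """Toggle a single gene: one step of asynchronous Boolean dynamics."""
    s = list(state)
    s[gene] = 1 - s[gene]
    return tuple(s)


def candidate_rules(gene: int, n_genes: int, max_regs: int) -> Iterator[Rule]:
    """Enumerate every Boolean function of up to `max_regs` other genes."""
    others = [g for g in range(n_genes) if g != gene]
    for k in range(1, max_regs + 1):
        for regs in combinations(others, k):
            inputs = list(product((0, 1), repeat=k))
            for outputs in product((0, 1), repeat=len(inputs)):
                yield regs, dict(zip(inputs, outputs))


def synthesise_rule(gene: int, states: Set[State], max_regs: int = 2) -> Tuple[Optional[Rule], int]:
    """Brute-force stand-in for the SAT step (hypothetical criterion): pick the rule
    that explains the most observed one-gene transitions without ever predicting
    a state that was not observed."""
    n_genes = len(next(iter(states)))
    best: Optional[Rule] = None
    best_score = 0
    for regs, table in candidate_rules(gene, n_genes, max_regs):
        score, consistent = 0, True
        for s in states:
            target = table[tuple(s[r] for r in regs)]
            if target != s[gene]:               # the rule would flip `gene` in state s
                if flip(s, gene) not in states:
                    consistent = False          # predicted state was never observed
                    break
                score += 1                      # explains an observed transition
        if consistent and score > best_score:
            best, best_score = (regs, table), score
    return best, best_score


# Hypothetical observed Boolean states for three genes.
observed = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)}
rule, score = synthesise_rule(0, observed)
print(rule, score)   # ((1,), {(0,): 0, (1,): 1}) 1
```

With these four hypothetical states the search returns the rule "gene 0 := gene 1" with score 1. The number of Boolean functions of k inputs is 2^(2^k), so brute force quickly becomes infeasible, which is one reason to dispatch the search to a SAT solver instead.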
Together with experimental collaborators, I applied this method to understanding the process of early blood development in the embryo, which is poorly understood due to the small number of cells present at this stage. The emergence of blood from Flk1+ mesoderm was studied by single cell expression analysis of 3934 cells at four sequential developmental time points. A mechanistic model recapitulating blood development was reconstructed from this data set, which was consistent with known biology and the bifurcation of blood and endothelium. Several model predictions were validated experimentally, demonstrating that HoxB4 and Sox17 directly regulate the haematopoietic factor Erg, and that Sox7 blocks primitive erythroid development.
I then developed a general-purpose graphical tool based on this algorithm, which biological researchers can use as new single-cell data sets become available. This tool can deploy computations to the cloud in order to scale up to larger high-throughput data sets. The results in this thesis demonstrate that single-cell analysis of a developing organ coupled with computational approaches can reveal the gene regulatory networks that underpin organogenesis. Rapid technological advances in our ability to perform single-cell profiling suggest that my tool will be applicable to other organ systems and may inform the development of improved cellular programming strategies.
Microsoft Research PhD Scholarship
Program Synthesis Meets Deep Learning for Decoding Regulatory Networks
With ever-growing data sets spanning DNA sequencing all the way to single-cell transcriptomics, we now face the question of how to turn this vast amount of information into knowledge. How do we integrate these large data sets into a coherent whole that helps us understand biological programs? The last few years have seen growing interest both in machine learning methods that analyse patterns in high-throughput data sets and in program synthesis techniques that reconstruct and analyse executable models of gene regulatory networks. In this review, we discuss the synergies between the two approaches and share our views on how they can be combined to reconstruct executable mechanistic programs directly from large-scale genomic data.
SCNS: a graphical tool for reconstructing executable regulatory networks from single-cell genomic data.
Background
Reconstruction of executable mechanistic models from single-cell gene expression data represents a powerful approach to understanding developmental and disease processes. New ambitious efforts like the Human Cell Atlas will soon lead to an explosion of data with potential for uncovering and understanding the regulatory networks which underlie the behaviour of all human cells. In order to take advantage of this data, however, there is a need for general-purpose, user-friendly and efficient computational tools that can be readily used by biologists who do not have specialist computer science knowledge.
Results
The Single Cell Network Synthesis toolkit (SCNS) is a general-purpose computational tool for the reconstruction and analysis of executable models from single-cell gene expression data. Through a graphical user interface, SCNS takes single-cell qPCR or RNA-sequencing data taken across a time course, and searches for logical rules that drive transitions from early cell states towards late cell states. Because the resulting reconstructed models are executable, they can be used to make predictions about the effect of specific gene perturbations on the generation of specific lineages.
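Because the reconstructed models are executable, a perturbation prediction reduces to simulation. The sketch below assumes asynchronous Boolean semantics. Its hypothetical three-gene network (A, B, C) uses hand-written rules, not rules produced by SCNS. It checks whether a late state is still reachable from an early state when one gene is clamped off. The function names and the clamping mechanism are illustrative assumptions.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]   # maps the current state to the gene's next value


def successors(state: State, rules: Sequence[Rule], clamped: Dict[int, int]) -> List[State]:
    """Asynchronous semantics: each successor changes exactly one unclamped gene."""
    out = []
    for gene, rule in enumerate(rules):
        if gene in clamped:
            continue                       # perturbed genes are held fixed
        value = rule(state)
        if value != state[gene]:
            nxt = list(state)
            nxt[gene] = value
            out.append(tuple(nxt))
    return out


def reachable(start: State, rules: Sequence[Rule],
              clamped: Optional[Dict[int, int]] = None) -> Set[State]:
    """Breadth-first search over every asynchronous trajectory from `start`."""
    clamped = clamped or {}
    s0 = list(start)
    for gene, value in clamped.items():
        s0[gene] = value                   # apply the knockout / forced expression
    first = tuple(s0)
    seen = {first}
    queue = deque([first])
    while queue:
        for nxt in successors(queue.popleft(), rules, clamped):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy network: A maintains itself, A activates B, B activates C.
rules = [lambda s: s[0], lambda s: s[0], lambda s: s[1]]
early, late = (1, 0, 0), (1, 1, 1)
print(late in reachable(early, rules))                  # True  (wild type)
print(late in reachable(early, rules, clamped={1: 0}))  # False (B knocked out)
```

Exhaustive reachability is used here rather than sampling single runs, so a "False" means no asynchronous trajectory at all reaches the late state under the perturbation.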
Conclusions
SCNS should be of broad interest to the growing number of researchers working in single-cell genomics and will further facilitate the generation of valuable mechanistic insights into developmental, homeostatic and disease processes.
Research in the Gottgens lab is supported by infrastructure support funding from the Wellcome Trust to the Wellcome Trust and MRC Cambridge Stem Cell Institute. Steven Woodhouse is a postdoctoral researcher supported by Microsoft Research.
Processing, visualising and reconstructing network models from single-cell data.
New single-cell technologies readily permit gene expression profiling of thousands of cells at single-cell resolution. In this review, we discuss methods for visualisation and interpretation of single-cell gene expression data, and the computational analysis needed to go from raw data to predictive executable models of gene regulatory network function. We focus primarily on single-cell real-time quantitative PCR and RNA-sequencing data, but much of what we cover is also relevant to other platforms, such as mass cytometry for high-dimensional single-cell proteomics.
S.W. is supported by a Microsoft Research PhD Scholarship. This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/icb.2015.10
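One step between raw data and a logical model is discretising each cell's expression profile into a Boolean state. The sketch below assumes expression values that are already normalised and a hypothetical per-gene threshold. Both the values and the thresholds are made up for illustration. For raw qPCR Ct values the comparison would be reversed, since a lower Ct indicates higher expression.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def binarise(expression: Sequence[Sequence[float]],
             thresholds: Sequence[float]) -> List[Tuple[int, ...]]:
    """Map each cell's expression vector to a Boolean state: 1 if above the gene's threshold."""
    return [tuple(int(x > t) for x, t in zip(cell, thresholds)) for cell in expression]


def state_counts(states: List[Tuple[int, ...]]) -> Counter:
    """Collapse cells to distinct Boolean states; many cells typically share a state."""
    return Counter(states)


# Hypothetical data: 3 cells x 2 genes, already normalised to expression levels.
cells = [[5.0, 0.1], [4.2, 0.0], [0.2, 3.0]]
states = binarise(cells, thresholds=[1.0, 1.0])
print(states)                 # [(1, 0), (1, 0), (0, 1)]
print(state_counts(states))
```

The distinct states, not the individual cells, are what a state-space-based reconstruction would work with.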
Revealing the vectors of cellular identity with single-cell genomics
Single-cell genomics has now made it possible to create a comprehensive atlas of human cells. At the same time, it has reopened questions about how a cell's identity is defined and how that identity is regulated by the cell's molecular circuitry. Emerging computational analysis methods, especially in single-cell RNA sequencing (scRNA-seq), have already begun to reveal, in a data-driven way, the diverse simultaneous facets of a cell's identity, from discrete cell types to continuous dynamic transitions and spatial locations. These developments will eventually allow a cell to be represented as a superposition of 'basis vectors', each determining a different (but possibly dependent) aspect of cellular organization and function. However, computational methods must also overcome considerable challenges, from handling technical noise and data scale to forming new abstractions of biology. As the scale of single-cell experiments continues to increase, new computational approaches will be essential for constructing and characterizing a reference map of cell identities.
National Institutes of Health (U.S.) (grant P50 HG006193); BRAIN Initiative (grant U01 MH105979); National Institutes of Health (U.S.) (BRAIN grant 1U01MH105960-01); National Cancer Institute (U.S.) (grant 1U24CA180922); National Institute of Allergy and Infectious Diseases (U.S.) (grant 1U24AI118672-01)
Model checking the evolution of gene regulatory networks
The behaviour of gene regulatory networks (GRNs) is typically analysed using simulation-based methods akin to statistical testing. In this paper, we demonstrate that this approach can be replaced by a method akin to formal verification that gives higher assurance and scalability. We focus on Wagner's weighted GRN model with varying weights, which is used in evolutionary biology. In the model, weight parameters represent gene interaction strengths that may change due to genetic mutations. For a property of interest, we synthesise the constraints over the parameter space that represent the set of GRNs satisfying the property. We experimentally show that our parameter synthesis procedure computes the mutational robustness of GRNs, an important problem in evolutionary biology, more efficiently than the classical simulation method. We specify the property in linear temporal logic and employ symbolic bounded model checking and SMT solving to compute the space of GRNs that satisfy it, which amounts to synthesising a set of linear constraints on the weights.
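For contrast with the SMT-based approach, the classical simulation method for estimating mutational robustness can be sketched as follows. The sketch assumes a common formulation of Wagner's model: gene states are -1 or +1 and are updated synchronously through the sign of the weighted input. It further assumes that a zero input leaves a gene unchanged, that mutations add Gaussian noise to existing weights, and a hypothetical property: "the network still reaches the same fixed point". It is not the paper's parameter-synthesis procedure, which replaces this sampling loop with symbolic constraints on the weights.

```python
import random
from typing import List, Optional, Tuple

State = Tuple[int, ...]          # each gene is +1 (on) or -1 (off)
Weights = List[List[float]]      # weights[i][j]: effect of gene j on gene i


def step(weights: Weights, state: State) -> State:
    """Synchronous update s_i <- sign(sum_j w_ij * s_j)."""
    nxt = []
    for i, row in enumerate(weights):
        total = sum(w * s for w, s in zip(row, state))
        if total > 0:
            nxt.append(1)
        elif total < 0:
            nxt.append(-1)
        else:
            nxt.append(state[i])   # assumption: a zero input leaves the gene unchanged
    return tuple(nxt)


def fixed_point(weights: Weights, start: State, max_steps: int = 50) -> Optional[State]:
    """Iterate until the state stops changing; None if no fixed point within max_steps."""
    state = tuple(start)
    for _ in range(max_steps):
        nxt = step(weights, state)
        if nxt == state:
            return state
        state = nxt
    return None


def robustness(weights: Weights, start: State, samples: int = 2000,
               sigma: float = 0.5, seed: int = 0) -> float:
    """Fraction of randomly mutated networks that keep the original fixed point."""
    reference = fixed_point(weights, start)
    if reference is None:
        raise ValueError("unmutated network has no fixed point from this start state")
    rng = random.Random(seed)
    kept = 0
    for _ in range(samples):
        # hypothetical mutation model: Gaussian noise on existing interactions only
        mutated = [[w + rng.gauss(0.0, sigma) if w != 0 else 0.0 for w in row]
                   for row in weights]
        if fixed_point(mutated, start) == reference:
            kept += 1
    return kept / samples


# Hypothetical 2-gene network: gene 0 activates itself and gene 1.
W = [[1.0, 0.0], [1.0, 0.0]]
print(fixed_point(W, (1, -1)))   # (1, 1)
print(robustness(W, (1, -1)))
```

The estimate converges only as the number of samples grows, which is the cost the abstract's symbolic approach aims to avoid.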
Tools and techniques for multi-valued networks using rewriting logic
PhD Thesis
Multi-valued networks (MVNs) are an important, widely used qualitative modelling technique in which time and states are discrete. MVNs extend the well-known Boolean networks into a more powerful qualitative modelling approach for biological systems by allowing an entity's state to take any value from a discrete set, rather than just 0 or 1. They provide a logical framework for qualitatively modelling and analysing control systems and have been successfully applied to biological systems and circuit design. While a range of support tools for developing and analysing MVNs exists, more work is needed to develop tools that support the practical application of these techniques.
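To make the difference from Boolean networks concrete, here is a minimal sketch of a hypothetical two-entity MVN under synchronous semantics. X takes three levels (0 to 2) and Y is Boolean. The update functions are invented for illustration and do not come from the thesis.

```python
from typing import Callable, Dict, List

State = Dict[str, int]

# Hypothetical 2-entity MVN: X takes levels 0..2, Y is Boolean (levels 0..1).
MAX_LEVEL = {"X": 2, "Y": 1}


def f_x(s: State) -> int:
    """X rises while Y is absent and decays once Y is present."""
    return min(s["X"] + 1, MAX_LEVEL["X"]) if s["Y"] == 0 else max(s["X"] - 1, 0)


def f_y(s: State) -> int:
    """Y is switched on only by the highest level of X."""
    return 1 if s["X"] == MAX_LEVEL["X"] else 0


UPDATE: Dict[str, Callable[[State], int]] = {"X": f_x, "Y": f_y}


def sync_step(s: State) -> State:
    """Synchronous semantics: every entity is updated from the same old state."""
    nxt = {v: f(s) for v, f in UPDATE.items()}
    assert all(0 <= nxt[v] <= MAX_LEVEL[v] for v in nxt)   # stay within each domain
    return nxt


def run(s: State, steps: int) -> List[State]:
    """Return the trajectory of `steps` synchronous updates, including the start."""
    trace = [s]
    for _ in range(steps):
        trace.append(sync_step(trace[-1]))
    return trace


for state in run({"X": 0, "Y": 0}, 5):
    print(state)
```

From X = 0, Y = 0 the network returns to its starting state after five steps, passing through all three levels of X. A Boolean network cannot represent the intermediate level X = 1 directly.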
One of the frameworks that has been successfully applied to biological systems is Rewriting Logic (RL), an algebraic specification framework capable of modelling and analysing the behaviour of dynamic, concurrent systems. The flexibility of RL techniques, such as the implementation of strategies, has allowed RL to be used to model a wide range of formalisms and systems, including process algebras, Petri nets and biological systems. RL specification, programming and computation are supported by a range of powerful analysis tools, which was one of our motivations for choosing RL. In this work we use Maude, a high-performance reflective language supporting both equational and RL specification; Maude is used throughout this thesis to model and analyse a range of MVNs using RL.
In this thesis we investigate the application of RL to modelling and analysing both synchronous and asynchronous MVNs, thus enabling the use of the support tools available for RL. We start by constructing an RL model for MVNs using a translation approach that converts an MVN's set of equations into rewrite rules, and we formally show that this translation is correct by proving its soundness and completeness. We illustrate the techniques and the developed RL framework for MVNs with a range of case studies that demonstrate its practical application. We then introduce an artificial, scalable MVN model so that a range of model sizes can be considered, and we investigate the performance of our RL framework. Finally, we analyse a larger regulatory network from the literature using our RL framework to give some insight into how it copes with a larger case study.
Ministry of Higher Education in Saudi Arabia
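The translation of MVN equations into rewrite rules can be mimicked outside Maude. The Python sketch below uses a hypothetical two-variable network (X in 0..2, Y in 0..1). Under asynchronous semantics each equation becomes one conditional rule that rewrites a single variable. Under synchronous semantics a single rule rewrites all variables at once. A breadth-first search plays the role of Maude's `search` command. Here a rule sets a variable directly to its target level, but some asynchronous MVN formalisms instead move one level at a time, so this choice is an assumption. The sketch is not the thesis's Maude encoding.

```python
from collections import deque
from typing import Callable, List, Optional, Set, Tuple

State = Tuple[int, ...]                        # (X, Y): X in 0..2, Y in 0..1
RewriteRule = Callable[[State], Optional[State]]


def f_x(s: State) -> int:
    return min(s[0] + 1, 2) if s[1] == 0 else max(s[0] - 1, 0)


def f_y(s: State) -> int:
    return 1 if s[0] == 2 else 0


EQUATIONS = [f_x, f_y]


def async_rules() -> List[RewriteRule]:
    """One conditional rule per equation: rewrite only variable v, if that changes it."""
    rules = []
    for v, f in enumerate(EQUATIONS):
        def rule(s: State, v: int = v, f=f) -> Optional[State]:
            target = f(s)
            if target == s[v]:
                return None                    # rule does not apply
            nxt = list(s)
            nxt[v] = target
            return tuple(nxt)
        rules.append(rule)
    return rules


def sync_rule(s: State) -> Optional[State]:
    """A single rule that rewrites every variable at once."""
    nxt = tuple(f(s) for f in EQUATIONS)
    return None if nxt == s else nxt


def explore(start: State, rules: List[RewriteRule]) -> Set[State]:
    """Breadth-first search of all states reachable by applying the rules."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for rule in rules:
            nxt = rule(s)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(explore((0, 0), [sync_rule])))   # [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]
print(sorted(explore((0, 0), async_rules()))) # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

The asynchronous rules reach (0, 1), which the synchronous rule never visits. This illustrates why the two semantics can give different reachability results for the same set of equations.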
Infobiotics : computer-aided synthetic systems biology
Until very recently, Systems Biology has, despite its stated goals, been too reductive in the models it constructs, and its methods have been, on the one hand, unsuited to large-scale adoption or to integrating knowledge across scales and, on the other hand, too fragmented. The thesis of this dissertation is that systems and synthetic biologists need better computational languages and seamlessly integrated tools to meet the significant challenges involved in understanding life as it is and, by designing, modelling and manufacturing novel organisms, in understanding life as it could be. We call this goal, where everything necessary to conduct model-driven investigations of cellular circuitry and emergent effects in populations of cells is available without significant context-switching, “one-pot” in silico synthetic systems biology, by analogy with “one-pot” chemistry and “one-pot” biology. Our strategy is to increase the understandability and reusability of models and experiments, thereby avoiding unnecessary duplication of effort, with practical gains in the efficiency of delivering usable prototype models and systems. Key to this endeavour are graphical interfaces that assist novice users by hiding the complexity of the underlying tools and limiting choices to only what is appropriate and useful, thus ensuring that the results of in silico experiments are consistent, comparable and reproducible.
This dissertation describes the conception, software engineering and use of two novel software platforms for systems and synthetic biology: the Infobiotics Workbench for modelling, in silico experimentation and analysis of multi-cellular biological systems; and DNA Library Designer with the DNALD language for the compact programmatic specification of combinatorial DNA libraries, as the first stage of a DNA synthesis pipeline, enabling methodical exploration of biological problem spaces. Infobiotics models are formalised as Lattice Population P systems, a novel framework for the specification of spatially discrete and multi-compartmental rule-based models, imbued with stochastic execution semantics. This framework was developed to meet the needs of real systems biology problems: hormone transport and signalling in the root of Arabidopsis thaliana, and quorum sensing in the pathogenic bacterium Pseudomonas aeruginosa. Our tools have also been used to prototype a novel synthetic biological system for pattern formation, which has been successfully implemented in vitro. Taken together, these novel software platforms provide a complete toolchain for synthetic biological circuits, from design to wet-lab implementation, enabling a step change in the scale of biological investigations that is orders of magnitude greater than could previously be performed in one in silico “pot”.
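The stochastic execution semantics of rule-based models can be illustrated with Gillespie's direct method on a single compartment. This is a minimal sketch with several simplifications. It ignores the lattice and any movement between compartments, and it is not the Infobiotics Workbench's own simulator. The two rules are hypothetical: a gene G producing a protein P at rate 2.0, and P degrading at rate 0.1.

```python
import math
import random
from typing import Dict, List, Tuple

Multiset = Dict[str, int]
Rule = Tuple[Multiset, Multiset, float]   # (left-hand side, right-hand side, rate constant)


def propensity(state: Multiset, lhs: Multiset, k: float) -> float:
    """Mass-action propensity: k times the number of ways to pick the reactants."""
    a = k
    for species, n in lhs.items():
        have = state.get(species, 0)
        if have < n:
            return 0.0
        a *= math.comb(have, n)
    return a


def gillespie(state: Multiset, rules: List[Rule], t_end: float,
              seed: int = 0) -> List[Tuple[float, Multiset]]:
    """Gillespie's direct method for one compartment of multiset-rewriting rules."""
    rng = random.Random(seed)
    state, t = dict(state), 0.0
    trace = [(t, dict(state))]
    while True:
        props = [propensity(state, lhs, k) for lhs, _, k in rules]
        total = sum(props)
        if total == 0.0:
            break                                   # no rule can fire
        t += rng.expovariate(total)                 # waiting time to the next event
        if t > t_end:
            break
        pick, acc = rng.random() * total, 0.0
        for (lhs, rhs, _), a in zip(rules, props):  # choose a rule with prob. a / total
            acc += a
            if pick < acc:
                break
        for species, n in lhs.items():
            state[species] -= n                     # consume the left-hand side
        for species, n in rhs.items():
            state[species] = state.get(species, 0) + n  # produce the right-hand side
        trace.append((t, dict(state)))
    return trace


# Hypothetical rules: G -> G + P (rate 2.0), P -> nothing (rate 0.1).
rules = [({"G": 1}, {"G": 1, "P": 1}, 2.0), ({"P": 1}, {}, 0.1)]
trace = gillespie({"G": 1, "P": 0}, rules, t_end=50.0)
print(len(trace), trace[-1])
```

Each run is one stochastic trajectory. Repeating it with different seeds gives the distribution of outcomes that a stochastic execution semantics describes.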
- …