25 research outputs found

    BTR: training asynchronous Boolean models using single-cell expression data

    Background: Rapid technological innovation for the generation of single-cell genomics data presents new challenges and opportunities for bioinformatics analysis. One such area lies in the development of new ways to train gene regulatory networks. Single-cell expression profiling techniques allow the expression states of hundreds of cells to be profiled, but these expression states are typically noisier because of technical artefacts such as drop-outs. While many algorithms exist to infer a gene regulatory network, very few of them are able to harness the extra expression states present in single-cell expression data without being adversely affected by the substantial technical noise. Results: Here we introduce BTR, an algorithm for training asynchronous Boolean models with single-cell expression data using a novel Boolean state space scoring function. BTR is capable of refining existing Boolean models and reconstructing new Boolean models by improving the match between model prediction and expression data. We demonstrate that the Boolean scoring function performed favourably against the BIC scoring function for Bayesian networks. In addition, we show that BTR outperforms many other network inference algorithms in both bulk and single-cell synthetic expression data. Lastly, we introduce two case studies, in which we use BTR to improve published Boolean models in order to generate potentially new biological insights. Conclusions: BTR provides a novel way to refine or reconstruct Boolean models using single-cell expression data. Boolean models are particularly useful for network reconstruction using single-cell data because they are more robust to the effect of drop-outs. In addition, because BTR does not assume any relationship in the expression states among cells, it is useful for reconstructing a gene regulatory network with as few assumptions as possible. Given the simplicity of Boolean models and the rapid adoption of single-cell genomics by biologists, BTR has the potential to make an impact across many fields of biomedical research.

    Genetic Dissection of Leaf Development in Brassica rapa


    An experimentally validated network of nine haematopoietic transcription factors reveals mechanisms of cell state stability.

    Transcription factor (TF) networks determine cell-type identity by establishing and maintaining lineage-specific expression profiles, yet reconstruction of mammalian regulatory network models has been hampered by a lack of comprehensive functional validation of regulatory interactions. Here, we report comprehensive ChIP-Seq, transgenic and reporter gene experimental data that have allowed us to construct an experimentally validated regulatory network model for haematopoietic stem/progenitor cells (HSPCs). Model simulation coupled with subsequent experimental validation using single cell expression profiling revealed potential mechanisms for cell state stabilisation, and also how a leukaemogenic TF fusion protein perturbs key HSPC regulators. The approach presented here should help to improve our understanding of both normal physiological and disease processes. Research in the authors' laboratories was supported by Bloodwise, The Wellcome Trust, Cancer Research UK, the Biotechnology and Biological Sciences Research Council, the National Institute of Health Research, the Medical Research Council, the MRC Molecular Haematology Unit (Oxford) core award, a Weizmann-UK "Making Connections" grant (Oxford) and core support grants by the Wellcome Trust to the Cambridge Institute for Medical Research (100140) and Wellcome Trust–MRC Cambridge Stem Cell Institute (097922). This is the final version of the article. It first appeared from eLife via http://dx.doi.org/10.7554/eLife.1146

    Using probabilistic graphical models to reconstruct biological networks and linkage maps

    Probabilistic graphical models (PGMs) offer a conceptual architecture where biological and mathematical objects can be expressed with a common, intuitive formalism. This facilitates the joint development of statistical and computational tools for quantitative analysis of biological data. Over the last few decades, procedures based on well-understood principles for constructing PGMs from observational and experimental data have been studied extensively, and they thus form a model-based methodology for analysis and discovery. In this thesis, we further explore the potential of this methodology in systems biology and quantitative genetics, and illustrate the capabilities of our proposed approaches by several applications to both real and simulated omics data. In quantitative genetics, we partition phenotypic variation into heritable, genetic, and non-heritable, environmental, parts. In molecular genetics, we identify chromosomal regions that drive genetic variation: quantitative trait loci (QTLs). In systems genetics, we would like to answer the question of whether relations between multiple phenotypic traits can be organized within wholly or partially directed network structures. Directed edges in those networks can be interpreted as causal relationships, causality meaning that the consequences of interventions are predictable: phenotypic interventions in upstream traits, i.e. traits occurring early in causal chains, will produce changes in downstream traits. The effect of a QTL allele can be considered to represent a genetic intervention on the phenotypic network. Various methods have been proposed for statistical reconstruction of causal phenotypic networks exploiting previously identified QTLs. In chapter 2, we present a novel heuristic search algorithm, namely the QTL+phenotype supervised orientation (QPSO) algorithm, to infer causal relationships between phenotypic traits. 
Our algorithm shows good performance in the common but so far unaddressed case where some traits come without QTLs. Therefore, our algorithm is especially attractive for applications involving expensive phenotypes, like metabolites, where relatively few genotypes can be measured and population size is limited. Standard QTL mapping typically models phenotypic variations observable in nature in relation to genetic variation in gene expression, regardless of multiple intermediate-level biological variations. In chapter 3, we present an approach integrating Gaussian graphical modeling (GGM) and causal inference for simultaneous modeling of multilevel biological responses to DNA variations. More specifically, for ripe tomato fruits, the dependencies of 24 sensory traits on 29 metabolites and the dependencies of all the sensory and metabolic traits further on 21 QTLs were investigated by three GGM approaches: (i) lasso-based neighborhood selection in combination with a stability approach to regularization selection, (ii) the PC-skeleton algorithm and (iii) the lasso in combination with stability selection, each followed by the QPSO algorithm. The inferred dependency network, though not necessarily representing biological pathways, suggests how the effects of allele substitutions propagate through multilevel phenotypes. Such simultaneous study of the underlying genetic architecture and multifactorial interactions is expected to enhance the prediction and manipulation of complex traits. It is also applicable to a range of population structures, including offspring populations from crosses between inbred parents and outbred parents, association panels and natural populations. In chapter 4, we report a novel method for linkage map construction using probabilistic graphical models. It has been shown that linkage map construction can be hampered by the presence of genotyping errors and chromosomal rearrangements such as inversions and translocations. 
Our proposed method is proven, both theoretically and practically, to be effective in filtering out markers that contain genotyping errors. In particular, it carries out marker filtering and ordering simultaneously, and is therefore superior to the standard post-hoc filtering using nearest-neighbour stress. Furthermore, we demonstrate empirically that the proposed method offers a promising solution to genetic map construction in the case of a reciprocal translocation. In the domain of PGMs, Bayesian networks (BNs) have proven, both theoretically and practically, to be a promising tool for the reconstruction of causal networks. In particular, the PC algorithm and the Metropolis-Hastings (M-H) algorithm, which are representatives of mainstream methods for BN structure learning, are reported to have been successfully applied to the field of biology. In view of the fact that most biological systems exist in the form of random or scale-free networks, in chapter 5 we compare the performance of the two algorithms in constructing both random and scale-free BNs. Our simulation study shows that for either type of BN, the PC algorithm is superior to the M-H algorithm in terms of timeliness; the M-H algorithm is preferable to the PC algorithm when the completeness of reconstruction is emphasized; but when the fidelity of reconstruction is taken into account, the better one of the two algorithms varies from case to case. Moreover, whichever algorithm is adopted, larger sample sizes generally permit more accurate reconstructions, especially in regard to the completeness of the resulting networks. Finally, chapter 6 presents a further elaboration and discussion of the key concepts and results involved in this thesis.

    An example of LODG.

    <p><i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub> are two correlated traits; <i>C</i><sub>2</sub> and <i>C</i><sub>3</sub> are two traits that have been newly determined as parent nodes of <i>Y</i><sub>1</sub>; <i>Y</i><sub>1</sub>, <i>C</i><sub>3</sub> and <i>C</i><sub>5</sub> are three traits newly determined as parent nodes of <i>Y</i><sub>2</sub>; <i>Y</i><sub>1</sub> is a newly determined parent node of traits <i>C</i><sub>1</sub> and <i>C</i><sub>4</sub>; <i>Y</i><sub>2</sub> is a newly determined parent node of traits <i>C</i><sub>4</sub> and <i>C</i><sub>6</sub>.</p>

    The general representations of resolvable LGPNs.

    <p><i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub> are two correlated traits; <b><i>P</i></b><sub>1</sub> = {<i>P</i><sub>11</sub>,…,<i>P</i><sub>1<i>k</i></sub>} and <b><i>P</i></b><sub>2</sub> = {<i>P</i><sub>21</sub>,…,<i>P</i><sub>2<i>l</i></sub>} are, respectively, the unique parent nodes of <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub>; <b><i>P</i></b><sub>12</sub> = {<i>P</i><sub>1</sub>,…,<i>P<sub>s</sub></i>} are the common parent nodes of <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub>; <b><i>C</i></b><sub>1</sub> = {<i>C</i><sub>11</sub>,…,<i>C</i><sub>1<i>u</i></sub>} and <b><i>C</i></b><sub>2</sub> = {<i>C</i><sub>21</sub>,…,<i>C</i><sub>2<i>v</i></sub>} are the unique neighboring traits of <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub>; <b><i>C</i></b><sub>12</sub> = {<i>C</i><sub>1</sub>,…,<i>C<sub>t</sub></i>} are the common neighboring traits of <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub>. Note that each of the neighboring traits of <i>Y</i><sub>1</sub> is nonadjacent to at least one of the parent nodes of <i>Y</i><sub>1</sub>, and the same is true of <i>Y</i><sub>2</sub>. Also note that <b><i>P</i></b><sub>1</sub>, <b><i>P</i></b><sub>2</sub> and <b><i>P</i></b><sub>12</sub> are allowed to have three different compositions: (1) a pure set of QTLs, if only genetic factors have been identified for <i>Y</i><sub>1</sub> and/or <i>Y</i><sub>2</sub>; (2) a mixed set of QTLs and traits, if some traits in addition to QTLs have been determined to have causal effects on <i>Y</i><sub>1</sub> and/or <i>Y</i><sub>2</sub>; (3) a pure set of traits, if only some traits have been found as causal factors of <i>Y</i><sub>1</sub> and/or <i>Y</i><sub>2</sub>; in contrast, <b><i>C</i></b><sub>1</sub>, <b><i>C</i></b><sub>2</sub> and <b><i>C</i></b><sub>12</sub> only refer to those traits that are directly connected to <i>Y</i><sub>1</sub> and/or <i>Y</i><sub>2</sub> by an undirected edge.
(A) The general representation of LGPNs where both <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub> have parent nodes, and at least one of them has unique parent nodes; (B) the general representation of LGPNs where only <i>Y</i><sub>1</sub> has parent nodes.</p>

    Candidate solutions to causal inference in two correlated traits.

    <p><i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub> are two traits correlated with each other; <b><i>Q</i></b><sub>1</sub> = {<i>Q</i><sub>11</sub>,…,<i>Q</i><sub>1<i>k</i></sub>} and <b><i>Q</i></b><sub>2</sub> = {<i>Q</i><sub>21</sub>,…,<i>Q</i><sub>2<i>l</i></sub>} denote QTLs for <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub>, respectively.</p>

    Test models in triad analysis.

    <p>(A) A QTL <i>Q</i> has pleiotropic effects on two traits <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub>, and <i>Y</i><sub>1</sub> is also a causal factor of <i>Y</i><sub>2</sub>; (B) <i>Q</i> is identified for <i>Y</i><sub>1</sub>, and <i>Y</i><sub>1</sub> has a causal effect on <i>Y</i><sub>2</sub>; (C) <i>Q</i> is identified for <i>Y</i><sub>2</sub>, and <i>Y</i><sub>2</sub> has a causal effect on <i>Y</i><sub>1</sub>; (D) <i>Q</i> is identified for both <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub>, but the causal relationship between <i>Y</i><sub>1</sub> and <i>Y</i><sub>2</sub> is unclear.</p>

    Comparison between the best two models obtained by a single run of the QPSO algorithm.

    <p>Computing time was measured on a 32-bit Intel(R) Core(TM) i5-2410M CPU 2.30 GHz machine with 4 GB RAM. "Different edges" are edges that were assigned opposite directions in the best two models. (√) means the direction of that edge was inferred correctly, whereas (×) applies to the opposite case.</p>

    Comparative evaluation of three algorithms in overall orientation of the synthetic phenotype network.

    <p>Sample size, means and standard deviations of the proportion of true positive edges that were correctly oriented across a series of 20 simulations.</p>