
25 research outputs found

BTR: training asynchronous Boolean models using single-cell expression data

Author: Fisher Jasmin
Gottgens Berthold
Lim Chee
Piterman Nir
Wang Huange
Wernisch Lorenz
Woodhouse Steven
Publication date: 06/09/2016

Abstract. Background: Rapid technological innovation for the generation of single-cell genomics data presents new challenges and opportunities for bioinformatics analysis. One such area lies in the development of new ways to train gene regulatory networks. Single-cell expression profiling techniques allow the profiling of the expression states of hundreds of cells, but these expression states are typically noisier due to the presence of technical artefacts such as drop-outs. While many algorithms exist to infer a gene regulatory network, very few of them are able to harness the extra expression states present in single-cell expression data without being adversely affected by the substantial technical noise present. Results: Here we introduce BTR, an algorithm for training asynchronous Boolean models with single-cell expression data using a novel Boolean state space scoring function. BTR is capable of refining existing Boolean models and reconstructing new Boolean models by improving the match between model prediction and expression data. We demonstrate that the Boolean scoring function performed favourably against the BIC scoring function for Bayesian networks. In addition, we show that BTR outperforms many other network inference algorithms in both bulk and single-cell synthetic expression data. Lastly, we introduce two case studies, in which we use BTR to improve published Boolean models in order to generate potentially new biological insights. Conclusions: BTR provides a novel way to refine or reconstruct Boolean models using single-cell expression data. Boolean models are particularly useful for network reconstruction using single-cell data because they are more robust to the effect of drop-outs. In addition, because BTR does not assume any relationship in the expression states among cells, it is useful for reconstructing a gene regulatory network with as few assumptions as possible. Given the simplicity of Boolean models and the rapid adoption of single-cell genomics by biologists, BTR has the potential to make an impact across many fields of biomedical research.
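
As a rough illustration of what an asynchronous Boolean model is, the sketch below enumerates the states such a model can reach when exactly one gene is updated at a time. It is not BTR itself and does not reproduce its scoring function; the three genes `A`, `B`, `C` and their rules are hypothetical, chosen only to make the example small.

```python
# Hypothetical three-gene toy model; gene names and rules are illustrative only
# and are not taken from the BTR paper.
RULES = {
    "A": lambda s: s["A"] and not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"] and not s["A"],
}
GENES = tuple(RULES)


def async_successors(state):
    """Return the states reached by updating exactly one gene of `state`."""
    current = dict(zip(GENES, state))
    successors = set()
    for i, gene in enumerate(GENES):
        new_value = bool(RULES[gene](current))
        if new_value != state[i]:
            # Only the i-th gene changes; all others keep their current value.
            successors.add(state[:i] + (new_value,) + state[i + 1:])
    return successors


def reachable_states(initial):
    """Collect every state reachable from `initial` under asynchronous updates."""
    seen = {initial}
    frontier = [initial]
    while frontier:
        state = frontier.pop()
        for nxt in async_successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

A state with no successors is a steady state of the toy model. The set returned by `reachable_states` is the kind of model state space that could, in principle, be compared against binarised expression states.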

Crossref

PubMed Central

Apollo (Cambridge)

Queen Mary Research Online

Leicester Research Archive

Genetic Dissection of Leaf Development in Brassica rapa

Author: Dong Xiao
Guusje Bonnema
Huange Wang
Jianjun Zhao
Ke Lin
Ram Kumar Basnet
Xilin Hou
Publication venue: 'American Society of Plant Biologists (ASPB)'

Crossref

An experimentally validated network of nine haematopoietic transcription factors reveals mechanisms of cell state stability.

Author: Antoniou Stella
Basilico Silvia
Bonzanni Nicola
Calero-Nieto Fernando J
Chan Mun Chiang
de Bruijn Marella Ftr
Göttgens Berthold
Hannah Rebecca L
Jarratt Andrew
Kinston Sarah J
Moignard Victoria
Nürnberg Sylvia T
Ouwehand Willem H
Riepsaame Joey
Schütte Judith
Wang Huange
Wilson Nicola K
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 25/01/2016

Transcription factor (TF) networks determine cell-type identity by establishing and maintaining lineage-specific expression profiles, yet reconstruction of mammalian regulatory network models has been hampered by a lack of comprehensive functional validation of regulatory interactions. Here, we report comprehensive ChIP-Seq, transgenic and reporter gene experimental data that have allowed us to construct an experimentally validated regulatory network model for haematopoietic stem/progenitor cells (HSPCs). Model simulation coupled with subsequent experimental validation using single cell expression profiling revealed potential mechanisms for cell state stabilisation, and also how a leukaemogenic TF fusion protein perturbs key HSPC regulators. The approach presented here should help to improve our understanding of both normal physiological and disease processes. Research in the authors’ laboratories was supported by Bloodwise, The Wellcome Trust, Cancer Research UK, the Biotechnology and Biological Sciences Research Council, the National Institute of Health Research, the Medical Research Council, the MRC Molecular Haematology Unit (Oxford) core award, a Weizmann-UK “Making Connections” grant (Oxford) and core support grants by the Wellcome Trust to the Cambridge Institute for Medical Research (100140) and Wellcome Trust–MRC Cambridge Stem Cell Institute (097922). This is the final version of the article. It first appeared from eLife via http://dx.doi.org/10.7554/eLife.1146

Crossref

PubMed Central

Apollo (Cambridge)

Using probabilistic graphical models to reconstruct biological networks and linkage maps

Author: Wang Huange
Publication venue: 'Wageningen University and Research'
Publication date: 01/01/2017

Probabilistic graphical models (PGMs) offer a conceptual architecture where biological and mathematical objects can be expressed with a common, intuitive formalism. This facilitates the joint development of statistical and computational tools for quantitative analysis of biological data. Over the last few decades, procedures based on well-understood principles for constructing PGMs from observational and experimental data have been studied extensively, and they thus form a model-based methodology for analysis and discovery. In this thesis, we further explore the potential of this methodology in systems biology and quantitative genetics, and illustrate the capabilities of our proposed approaches by several applications to both real and simulated omics data. In quantitative genetics, we partition phenotypic variation into heritable, genetic, and non-heritable, environmental, parts. In molecular genetics, we identify chromosomal regions that drive genetic variation: quantitative trait loci (QTLs). In systems genetics, we would like to answer the question of whether relations between multiple phenotypic traits can be organized within wholly or partially directed network structures. Directed edges in those networks can be interpreted as causal relationships, causality meaning that the consequences of interventions are predictable: phenotypic interventions in upstream traits, i.e. traits occurring early in causal chains, will produce changes in downstream traits. The effect of a QTL allele can be considered to represent a genetic intervention on the phenotypic network. Various methods have been proposed for statistical reconstruction of causal phenotypic networks exploiting previously identified QTLs. In chapter 2, we present a novel heuristic search algorithm, namely the QTL+phenotype supervised orientation (QPSO) algorithm, to infer causal relationships between phenotypic traits. 
Our algorithm shows good performance in the common, but so far uncovered case, where some traits come without QTLs. Therefore, our algorithm is especially attractive for applications involving expensive phenotypes, like metabolites, where relatively few genotypes can be measured and population size is limited. Standard QTL mapping typically models phenotypic variations observable in nature in relation to genetic variation in gene expression, regardless of multiple intermediate-level biological variations. In chapter 3, we present an approach integrating Gaussian graphical modeling (GGM) and causal inference for simultaneous modeling of multilevel biological responses to DNA variations. More specifically, for ripe tomato fruits, the dependencies of 24 sensory traits on 29 metabolites and the dependencies of all the sensory and metabolic traits further on 21 QTLs were investigated by three GGM approaches including: (i) lasso-based neighborhood selection in combination with a stability approach to regularization selection, (ii) the PC-skeleton algorithm and (iii) the Lasso in combination with stability selection, and then followed by the QPSO algorithm. The inferred dependency network which, though not essentially representing biological pathways, suggests how the effects of allele substitutions propagate through multilevel phenotypes. Such simultaneous study of the underlying genetic architecture and multifactorial interactions is expected to enhance the prediction and manipulation of complex traits. And it is applicable to a range of population structures, including offspring populations from crosses between inbred parents and outbred parents, association panels and natural populations. In chapter 4, we report a novel method for linkage map construction using probabilistic graphical models. It has been shown that linkage map construction can be hampered by the presence of genotyping errors and chromosomal rearrangements such as inversions and translocations. 
Our proposed method is proven, both theoretically and practically, to be effective in filtering out markers that contain genotyping errors. In particular, it carries out marker filtering and ordering simultaneously, and is therefore superior to the standard post-hoc filtering using nearest-neighbour stress. Furthermore, we demonstrate empirically that the proposed method offers a promising solution to genetic map construction in the case of a reciprocal translocation. In the domain of PGMs, Bayesian networks (BNs) have proven, both theoretically and practically, to be a promising tool for the reconstruction of causal networks. In particular, the PC algorithm and the Metropolis-Hastings algorithm, which are representatives of mainstream methods to BN structure learning, are reported to have been successfully applied to the field of biology. In view of the fact that most biological systems exist in the form of random network or scale-free network, in chapter 5 we compare the performance of the two algorithms in constructing both random and scale-free BNs. Our simulation study shows that for either type of BN, the PC algorithm is superior to the M-H algorithm in terms of timeliness; the M-H algorithm is preferable to the PC algorithm when the completeness of reconstruction is emphasized; but when the fidelity of reconstruction is taken into account, the better one of the two algorithms varies from case to case. Moreover, whichever algorithm is adopted, larger sample sizes generally permit more accurate reconstructions, especially in regard to the completeness of the resulting networks. Finally, chapter 6 presents a further elaboration and discussion of the key concepts and results involved in this thesis.

Wageningen University & Research Publications

An example of LODG.

Author: Fred A. van Eeuwijk (358696)
Huange Wang (619576)

Y1 and Y2 are two correlated traits; C2 and C3 are two traits that have been newly determined as parent nodes of Y1; Y1, C3 and C5 are three traits newly determined as parent nodes of Y2; Y1 is a newly determined parent node of traits C1 and C4; Y2 is a newly determined parent node of traits C4 and C6.

FigShare

The general representations of resolvable LGPNs.

Author: Fred A. van Eeuwijk (358696)
Huange Wang (619576)

Y1 and Y2 are two correlated traits; P1 = {P11,…,P1k} and P2 = {P21,…,P2l} are, respectively, the unique parent nodes of Y1 and Y2; P12 = {P1,…,Ps} are the common parent nodes of Y1 and Y2; C1 = {C11,…,C1u} and C2 = {C21,…,C2v} are the unique neighboring traits of Y1 and Y2; C12 = {C1,…,Ct} are the common neighboring traits of Y1 and Y2. Note that each of the neighboring traits of Y1 is nonadjacent to at least one of the parent nodes of Y1, and the same is true of Y2. Also note that P1, P2 and P12 are allowed to have three different compositions: (1) a pure set of QTLs, if only genetic factors have been identified for Y1 and/or Y2; (2) a mixed set of QTLs and traits, if some traits in addition to QTLs have been determined to have causal effects on Y1 and/or Y2; (3) a pure set of traits, if only some traits have been found as causal factors of Y1 and/or Y2; in contrast, C1, C2 and C12 only refer to those traits that are directly connected to Y1 and/or Y2 by an undirected edge. (A) The general representation of LGPNs where both Y1 and Y2 have parent nodes, and at least one of them has unique parent nodes; (B) the general representation of LGPNs where only Y1 has parent nodes.

FigShare

Candidate solutions to causal inference in two correlated traits.

Author: Fred A. van Eeuwijk (358696)
Huange Wang (619576)

Y1 and Y2 are two traits correlated with each other; Q1 = {Q11,…,Q1k} and Q2 = {Q21,…,Q2l} denote QTLs for Y1 and Y2, respectively.

FigShare

Test models in triad analysis.

Author: Fred A. van Eeuwijk (358696)
Huange Wang (619576)

(A) A QTL Q has pleiotropic effects on two traits Y1 and Y2, and Y1 is also a causal factor of Y2; (B) Q is identified for Y1, and Y1 has a causal effect on Y2; (C) Q is identified for Y2, and Y2 has a causal effect on Y1; (D) Q is identified for both Y1 and Y2, but the causal relationship between Y1 and Y2 is unclear.
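
The four test models in this caption can be written down as small graphs, which may make the differences between them easier to see. This is a minimal sketch based only on the caption text. The dictionary layout and the `parents` helper are hypothetical, and storing the unclear Y1/Y2 relationship in (D) as an unoriented pair is an assumption.

```python
# Directed edges are (parent, child) pairs; Q is the QTL, Y1 and Y2 are the traits.
TRIAD_MODELS = {
    "A": {"directed": {("Q", "Y1"), ("Q", "Y2"), ("Y1", "Y2")}, "unoriented": set()},
    "B": {"directed": {("Q", "Y1"), ("Y1", "Y2")}, "unoriented": set()},
    "C": {"directed": {("Q", "Y2"), ("Y2", "Y1")}, "unoriented": set()},
    # Assumption: the "unclear" relationship in (D) is kept as an unoriented pair.
    "D": {
        "directed": {("Q", "Y1"), ("Q", "Y2")},
        "unoriented": {frozenset({"Y1", "Y2"})},
    },
}


def parents(model, node):
    """Return the sorted parents of `node` in the named test model."""
    return sorted(p for p, c in TRIAD_MODELS[model]["directed"] if c == node)
```

For example, `parents("A", "Y2")` lists both the QTL and Y1, whereas in model (B) Y2 has Y1 as its only parent.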

FigShare

Comparison between the best two models obtained by a single run of the QPSO algorithm.

Author: Fred A. van Eeuwijk (358696)
Huange Wang (619576)

Computing time was measured on a 32-bit Intel(R) Core(TM) i5-2410M CPU 2.30GHz machine with 4GB RAM. “Different edges” are edges that were assigned opposite directions in the best two models. (√) means the direction of that edge was inferred correctly, whereas (×) means it was inferred incorrectly.

FigShare

Comparative evaluation of three algorithms in overall orientation of the synthetic phenotype network.

Author: Fred A. van Eeuwijk (358696)
Huange Wang (619576)

Sample size, means and standard deviations of the proportion of true positive edges that were correctly oriented across a series of 20 simulations.

FigShare