
    Multi-species network inference improves gene regulatory network reconstruction for early embryonic development in Drosophila

    Gene regulatory network inference uses genome-wide transcriptome measurements in response to genetic, environmental or dynamic perturbations to predict causal regulatory influences between genes. We hypothesized that evolution also acts as a suitable network perturbation and that integration of data from multiple closely related species can lead to improved reconstruction of gene regulatory networks. To test this hypothesis, we predicted networks from temporal gene expression data for 3,610 genes measured during early embryonic development in six Drosophila species and compared predicted networks to gold standard networks of ChIP-chip and ChIP-seq interactions for developmental transcription factors in five species. We found that (i) the performance of single-species networks was independent of the species where the gold standard was measured; (ii) differences between predicted networks reflected the known phylogeny and differences in biology between the species; and (iii) an integrative consensus network which minimized the total number of edge gains and losses with respect to all single-species networks performed better than any individual network. Our results show that in an evolutionarily conserved system, integration of data from comparable experiments in multiple species improves the inference of gene regulatory networks. They provide a basis for future studies on the numerous multi-species gene expression datasets for other biological processes available in the literature. Comment: 10 pages text + 3 figures + 1 table + 2 supplementary figures + 3 supplementary tables
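
The consensus idea in finding (iii) can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so this assumes the simplest reading: every species network counts equally, in which case the edge set that minimizes the total number of edge gains and losses is a per-edge majority vote. The function names and the toy gene names are hypothetical.

```python
from collections import Counter

def consensus_network(networks):
    """Return the edge set minimizing total edge gains + losses vs. all inputs.

    Each network is a set of (regulator, target) tuples. With equal weights,
    an edge belongs in the consensus when it is present in more than half of
    the networks (ties are resolved by leaving the edge out).
    """
    counts = Counter(edge for net in networks for edge in net)
    return {edge for edge, n in counts.items() if 2 * n > len(networks)}

def total_edge_changes(candidate, networks):
    """Sum of symmetric differences (gains + losses) over all networks."""
    return sum(len(candidate ^ net) for net in networks)

# Hypothetical toy networks for three species (placeholder gene names).
species_networks = [
    {("bcd", "hb"), ("hb", "Kr"), ("Kr", "kni")},
    {("bcd", "hb"), ("hb", "Kr")},
    {("bcd", "hb"), ("Kr", "kni"), ("gt", "Kr")},
]
consensus = consensus_network(species_networks)
print(sorted(consensus), total_edge_changes(consensus, species_networks))
```

A phylogeny-aware variant would weight each species differently; that is outside what this sketch shows.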

    The Molecular Mechanisms Of Sex Determination In Vertebrates

    Many reptiles display temperature-dependent sex determination (TSD), in which the primary sex is determined by incubation temperatures rather than sex chromosomes. However, temperature is not the only factor that plays a critical role in sex determination in species with TSD. Previous studies in the snapping turtle, a species with TSD, showed that dihydrotestosterone (DHT) induces ovary development at temperatures that normally produce males or mixed sex ratios. In addition, the feminizing effect of DHT was found to be associated with increased expression of the ovary-determining gene Foxl2, suggesting a potential androgen-Foxl2 regulatory mechanism. This dissertation aims to clarify the molecular mechanisms underlying TSD in three respects: first, to determine the role of androgen in TSD; second, to identify novel thermosensitive genes involved in TSD; and lastly, to reconstruct gene regulatory networks underlying sex determination. To test the hypothetical androgen-Foxl2 interaction, I cloned the proximal promoter (1.6 kb) and coding sequence for snapping turtle Foxl2 (tFoxl2) in frame with mCherry, a red fluorescent protein. The tFoxl2-mCherry fusion plasmid or the mCherry plasmid was stably transfected into mouse KK1 granulosa cells. Although expression of tFoxl2-mCherry was not affected by androgen treatment in KK1 cells, androgen inhibited expression of the endogenous mouse Foxl2 gene, suggesting that the androgen-Foxl2 interaction exists but differs between species. We also found that tFoxl2-mCherry potentiated low-dose DHT effects on aromatase expression, which has not been reported in any other studies. To identify novel sex-determining genes in TSD, I first de novo assembled and annotated the transcriptome of the snapping turtle using next-generation sequencing (NGS) and then performed RNA-seq analyses on the newly assembled reference transcriptome. Through differential gene expression analyses, I identified 293 thermosensitive genes. Among these genes, AEBP2, JARID2, and KDM6B are of particular interest because they could influence expression of many other genes via epigenetic modifications. To further investigate the molecular mechanisms underlying sex determination, I reconstructed gene regulatory networks using ARACNE, an entropy-based network reconstruction algorithm, with public microarray experiments in mouse gonads. The subsequent hub gene analyses revealed the basic molecular pathways underlying gonadal development, and the master regulator analyses identified 110 candidate sex-determining genes, including both known sex-determining genes and novel candidate genes. My findings demonstrate that androgens can influence expression of key ovarian genes, but further studies are needed to understand androgen signaling in TSD. Furthermore, my study provides a first description of the snapping turtle transcriptome and the effects of temperature on transcriptome-wide patterns of gene expression during the TSP. In addition, hub genes and master regulators identified for mammalian gonad determination will guide the direction of future studies in the field of sex determination. However, additional studies are needed to validate the computational findings.
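
As a rough illustration of the network reconstruction step: ARACNE, as commonly described, scores gene pairs by mutual information and then applies the data processing inequality (DPI), dropping the weakest edge of every fully connected gene triplet as a likely indirect interaction. The sketch below shows only that pruning step on precomputed scores. It is not the dissertation's pipeline; the function name, the gene labels, and the scores are made up for illustration, and ARACNE's tolerance parameter is omitted.

```python
from itertools import combinations

def dpi_prune(mi, threshold=0.0):
    """Prune likely indirect edges using the data processing inequality.

    mi maps frozenset({gene_a, gene_b}) to a mutual information score.
    Edges at or below `threshold` are dropped first. Then, in every triangle
    of remaining edges, the weakest edge is marked for removal. Marks are
    computed on the unpruned graph, so the result is order-independent.
    """
    edges = {e: v for e, v in mi.items() if v > threshold}
    genes = sorted({g for e in edges for g in e})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(e in edges for e in triangle):
            to_remove.add(min(triangle, key=lambda e: edges[e]))
    return {e: v for e, v in edges.items() if e not in to_remove}

# Hypothetical scores: G1-G3 is the weakest edge of the only triangle.
toy_mi = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G2", "G3")): 0.8,
    frozenset(("G1", "G3")): 0.3,
    frozenset(("G3", "G4")): 0.5,
}
pruned = dpi_prune(toy_mi, threshold=0.1)
print(sorted(tuple(sorted(e)) for e in pruned))
```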

    A Computational Algebra Approach to the Reverse Engineering of Gene Regulatory Networks

    This paper proposes a new method to reverse engineer gene regulatory networks from experimental data. The modeling framework used is time-discrete deterministic dynamical systems, with a finite set of states for each of the variables. The simplest examples of such models are Boolean networks, in which variables have only two possible states. The use of a larger number of possible states allows a finer discretization of experimental data and more than one possible mode of action for the variables, depending on threshold values. Furthermore, with a suitable choice of state set, one can employ powerful tools from computational algebra that underlie the reverse-engineering algorithm, avoiding costly enumeration strategies. To perform well, the algorithm requires wildtype together with perturbation time courses. This makes it suitable for small to meso-scale networks rather than networks on a genome-wide scale. The complexity of the algorithm is quadratic in the number of variables and cubic in the number of time points. The algorithm is validated on a recently published Boolean network model of segment polarity development in Drosophila melanogaster. Comment: 28 pages, 5 EPS figures, uses elsart.cl
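
To make the modeling framework concrete, here is a minimal sketch of a time-discrete deterministic dynamical system with two states per variable, that is, a Boolean network. It assumes the state set is the integers mod 2, so each update rule can be written as a polynomial, which is the kind of object computational algebra tools work with. The three-gene wiring is invented for illustration and is not the segment polarity model; the sketch shows forward simulation only, not the reverse-engineering algorithm.

```python
def step(state):
    """One synchronous update of a hypothetical 3-gene Boolean network.

    Rules are polynomials over the integers mod 2:
    NOT x = 1 + x, and x AND y = x * y.
    """
    x1, x2, x3 = state
    return (
        (1 + x3) % 2,   # x1 <- NOT x3
        x1 % 2,         # x2 <- x1
        (x1 * x2) % 2,  # x3 <- x1 AND x2
    )

def trajectory(start, n_steps):
    """Return the list of states visited, including the start state."""
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

# A time course of this form is the kind of data a reverse-engineering
# method would take as input when fitting update rules.
for state in trajectory((1, 0, 0), 5):
    print(state)
```

Starting from (1, 0, 0), this toy network returns to its start state after five steps, so the whole time course is one limit cycle.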

    Inferring orthologous gene regulatory networks using interspecies data fusion

    MOTIVATION: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. RESULTS: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase.

    Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

    BACKGROUND: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. RESULTS: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. CONCLUSION: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.

    A framework to identify epigenome and transcription factor crosstalk

    While changes in chromatin are integral to transcriptional reprogramming during cellular differentiation, it is currently unclear how chromatin modifications are targeted to specific loci. To systematically identify transcription factors (TFs) that can direct chromatin changes during cell fate decisions, we model the genome-wide dynamics of chromatin marks in terms of computationally predicted TF binding sites. By applying this computational approach to a time course of Polycomb-mediated H3K27me3 marks during neuronal differentiation of murine stem cells, we identify several motifs that likely regulate dynamics of this chromatin mark. Among these, the motifs bound by REST and by the SNAIL family of TFs are predicted to transiently recruit H3K27me3 in neuronal progenitors. We validate these predictions experimentally and show that absence of REST indeed causes loss of H3K27me3 at target promoters in trans, specifically at the neuronal progenitor state. Moreover, using targeted transgenic insertion, we show that promoter fragments containing REST or SNAIL binding sites are sufficient to recruit H3K27me3 in cis, while deletion of these sites results in loss of H3K27me3. These findings illustrate that the occurrence of TF binding sites can determine chromatin dynamics. Local determination of Polycomb activity by REST and SNAIL motifs exemplifies such TF-based regulation of chromatin. Furthermore, our results show that key TFs can be identified ab initio through computational modeling of epigenome datasets using a modeling approach that we make readily accessible.

    Big data analytics in computational biology and bioinformatics

    Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference. The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a statistical image against which an entire genome can be efficiently scanned for matching patterns. The covariance model approach is then further extended, in combination with a structural clustering algorithm and a random forests classifier, to perform genome-wide search for similarities in ncRNA tertiary structures. The dissertation then presents methods for gene network inference. Vast bodies of genomic data containing gene and protein expression patterns are now available for analysis. One challenge is to apply efficient methodologies to uncover more knowledge about cellular functions. Very little is known concerning how genes regulate cellular activities. A gene regulatory network (GRN) can be represented by a directed graph in which each node is a gene and each edge or link is a regulatory effect that one gene has on another gene. By evaluating gene expression patterns, researchers perform in silico data analyses in systems biology, in particular GRN inference, where “reverse engineering” involves predicting how a system works by looking at the system output alone. Many algorithmic and statistical approaches have been developed to computationally reverse engineer biological systems. However, there are no known bioinformatics tools capable of performing perfect GRN inference. Here, extensive experiments are conducted to evaluate and compare recent bioinformatics tools for inferring GRNs from time-series gene expression data. Standard performance metrics for these tools based on both simulated and real data sets are generally low, suggesting that further efforts are needed to develop more reliable GRN inference tools. It is also observed that using multiple tools together can help identify true regulatory interactions between genes, a finding consistent with those reported in the literature. Finally, the dissertation discusses and presents a framework for parallelizing GRN inference methods using Apache Hadoop in a cloud environment.
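
The directed-graph representation and the notion of a performance metric can be sketched briefly. The abstract does not say which metrics the dissertation uses, so this example assumes precision and recall over directed edges, a common choice for comparing a predicted GRN with a reference network. All gene labels and edges are hypothetical.

```python
def to_adjacency(edges):
    """Build a regulator -> sorted list of targets map from directed edges."""
    adjacency = {}
    for regulator, target in sorted(edges):
        adjacency.setdefault(regulator, []).append(target)
    return adjacency

def precision_recall(predicted, gold):
    """Score predicted directed edges against a reference edge set.

    Direction matters: ("A", "C") and ("C", "A") are different edges.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical reference and predicted networks.
gold = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")}
predicted = {("A", "B"), ("B", "C"), ("C", "A"), ("D", "C")}

print(to_adjacency(gold))
print(precision_recall(predicted, gold))
```

Two of the four predicted edges match the reference, and the two reversed edges count as errors, which is why both scores come out at one half here.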