4,030 research outputs found

    Discovering transcriptional modules by Bayesian data integration

    Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs

    Inferring a Transcriptional Regulatory Network from Gene Expression Data Using Nonlinear Manifold Embedding

    Transcriptional networks consist of multiple regulatory layers corresponding to the activity of global regulators, specialized repressors and activators of transcription as well as proteins and enzymes shaping the DNA template. Such intrinsic multi-dimensionality makes uncovering connectivity patterns difficult and unreliable and it calls for adoption of methodologies commensurate with the underlying organization of the data source. Here we present a new computational method that predicts interactions between transcription factors and target genes using a compendium of microarray gene expression data and the knowledge of known interactions between genes and transcription factors. The proposed method called Kernel Embedding of REgulatory Networks (KEREN) is based on the concept of gene-regulon association and it captures hidden geometric patterns of the network via manifold embedding. We applied KEREN to reconstruct gene regulatory interactions in the model bacterium E. coli on a genome-wide scale. Our method not only yields accurate prediction of verifiable interactions, outperforming comparable methodologies on certain metrics, but also demonstrates the utility of a geometric approach to the analysis of high-dimensional biological data. We also describe the general application of kernel embedding techniques to some other function and network discovery algorithms

    Enhanced maps of transcription factor binding sites improve regulatory networks learned from accessible chromatin data

    Determining where transcription factors (TFs) bind in genomes provides insight into which transcriptional programs are active across organs, tissue types, and environmental conditions. Recent advances in high-throughput profiling of regulatory DNA have yielded large amounts of information about chromatin accessibility. Interpreting the functional significance of these data sets requires knowledge of which regulators are likely to bind these regions. This can be achieved by using information about TF-binding preferences, or motifs, to identify TF-binding events that are likely to be functional. Although different approaches exist to map motifs to DNA sequences, a systematic evaluation of these tools in plants is missing. Here, we compare four motif-mapping tools widely used in the Arabidopsis (Arabidopsis thaliana) research community and evaluate their performance using chromatin immunoprecipitation data sets for 40 TFs. Downstream gene regulatory network (GRN) reconstruction was found to be sensitive to the motif mapper used. We further show that the low recall of Find Individual Motif Occurrences, one of the most frequently used motif-mapping tools, can be overcome by using an Ensemble approach, which combines results from different mapping tools. Several examples are provided demonstrating how the Ensemble approach extends our view on transcriptional control for TFs active in different biological processes. Finally, a protocol is presented to effectively derive more complete cell type-specific GRNs through the integrative analysis of open chromatin regions, known binding site information, and expression data sets. This approach will pave the way to increase our understanding of GRNs in different cellular conditions
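
The idea of combining several motif mappers to raise recall can be illustrated with a minimal sketch. It assumes the simplest possible ensemble, a plain union of hits, which may differ from the Ensemble approach the abstract refers to; the mapper outputs, gene names, and ChIP gold standard below are hypothetical toy data.

```python
def recall(predicted, gold):
    """Fraction of gold-standard (TF, gene) pairs recovered by the prediction."""
    if not gold:
        return 0.0
    return len(predicted & gold) / len(gold)


def precision(predicted, gold):
    """Fraction of predicted (TF, gene) pairs that are in the gold standard."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


def ensemble_union(*hit_sets):
    """Assumed ensemble rule: keep a hit if any mapper reports it."""
    merged = set()
    for hits in hit_sets:
        merged |= hits
    return merged


# Hypothetical toy data: hits from two mappers and a ChIP-derived gold standard.
mapper_a = {("TF1", "g1"), ("TF1", "g2")}
mapper_b = {("TF1", "g2"), ("TF1", "g3"), ("TF1", "g5")}
chip_gold = {("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF1", "g4")}

combined = ensemble_union(mapper_a, mapper_b)
print(recall(mapper_a, chip_gold), recall(mapper_b, chip_gold), recall(combined, chip_gold))
print(precision(combined, chip_gold))
```

A union can only keep or raise recall relative to each individual mapper, but it may lower precision, which is why both measures are reported here.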

    An ensemble learning approach to reverse-engineering transcriptional regulatory networks from time-series gene expression data

    Background: One of the most challenging tasks in the post-genomic era is to reconstruct the transcriptional regulatory networks. The goal is to reveal, for each gene that responds to a certain biological event, which transcription factors affect its expression, and how a set of transcription factors coordinate to accomplish temporal and spatial specific regulations. Results: Here we propose a supervised machine learning approach to address these questions. We focus our study on the gene transcriptional regulation of the cell cycle in the budding yeast, thanks to the large amount of data available and relatively well-understood biology, although the main ideas of our method can be applied to other data as well. Our method starts with building an ensemble of decision trees for each microarray data set to capture the association between the expression levels of yeast genes and the binding of transcription factors to gene promoter regions, as determined by chromatin immunoprecipitation microarray (ChIP-chip) experiment. Cross-validation experiments show that the method is more accurate and reliable than the naive decision tree algorithm and several other ensemble learning methods. From the decision tree ensembles, we extract logical rules that explain how a set of transcription factors act in concert to regulate the expression of their targets. We further compute a profile for each rule to show its regulation strengths at different time points. We also propose a spline interpolation method to integrate the rule profiles learned from several time series expression data sets that measure the same biological process. We then combine these rule profiles to build a transcriptional regulatory network for the yeast cell cycle. Compared to the results in the literature, our method correctly identifies all major known yeast cell cycle transcription factors, and assigns them into appropriate cell cycle phases. Our method also identifies many interesting synergetic relationships among these transcription factors, most of which are well known, while many of the rest can also be supported by other evidence. Conclusion: The high accuracy of our method indicates that our method is valid and robust. As more gene expression and transcription factor binding data become available, we believe that our method is useful for reconstructing large-scale transcriptional regulatory networks in other species as well

    Application of new probabilistic graphical models in the genetic regulatory networks studies

    This paper introduces two new probabilistic graphical models for reconstruction of genetic regulatory networks using DNA microarray data. One is an Independence Graph (IG) model with either a forward or a backward search algorithm and the other one is a Gaussian Network (GN) model with a novel greedy search method. The performances of both models were evaluated on four MAPK pathways in yeast and three simulated data sets. Generally, an IG model provides a sparse graph but a GN model produces a dense graph where more information about gene-gene interactions is preserved. Additionally, we found two key limitations in the prediction of genetic regulatory networks using DNA microarray data: the first is the sufficiency of sample size, and the second is that the complexity of network structures may not be captured without additional data at the protein level. Those limitations are present in all prediction methods that use only DNA microarray data. Comment: 38 pages, 3 figures

    Discovering Transcriptional Regulatory Rules from Gene Expression and TF-DNA Binding Data by Decision Tree Learning

    Background: One of the most promising but challenging tasks in the post-genomic era is to reconstruct the transcriptional regulatory networks. The goal is to reveal, for each gene that responds to a certain biological event, which transcription factors affect its transcription, and how several transcription factors coordinate to accomplish specific regulations. Results: Here we propose a supervised machine learning approach to address these questions. We build decision trees to associate the expression level of a gene with the transcription factor binding data of its promoter. From the decision trees, we extract regulatory rules that specify how the binding of a combination of several transcription factors affects the expression of a gene. Such rules are easy to interpret, and represent experimentally testable hypotheses. We use a decision tree ensemble approach to increase modeling accuracy and robustness. We also propose a novel method to integrate rules learned from several time series that measure the same biological processes. We apply our method to publicly available cell cycle expression data and transcription factor binding data for the budding yeast. Cross-validation experiments show that our method is highly accurate and reliable. The method correctly identifies all major known yeast cell cycle transcription factors, and assigns them into appropriate cell cycle phases. It also explicitly reveals synergetic relationships of transcription factors, most of which agree well with the existing literature, while the rest provide testable biological hypotheses. Conclusions: The high accuracy of our method indicates that our method is valid and that the learned regulatory rules can be used as the basic building elements of a transcriptional regulatory network. As more and more gene expression and TF binding data become available, we believe that our method will be useful for reconstructing large scale transcriptional regulatory networks
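
The step of turning a decision tree into readable regulatory rules can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nested-dictionary tree format, the transcription factor names `TF_A` and `TF_B`, and the outcome labels are all hypothetical, and the tree is written by hand rather than learned from data.

```python
def extract_rules(node, conditions=()):
    """Walk a binary decision tree and return one (conditions, outcome) rule per leaf.

    Internal nodes are dicts with keys "tf", "bound", "unbound";
    leaves are dicts with the single key "label" (assumed toy format).
    """
    if "label" in node:  # leaf: the path taken so far is the rule body
        return [(list(conditions), node["label"])]
    tf = node["tf"]
    rules = []
    rules += extract_rules(node["bound"], conditions + ((tf, True),))
    rules += extract_rules(node["unbound"], conditions + ((tf, False),))
    return rules


def format_rule(conditions, outcome):
    """Render a rule as an IF ... THEN ... string."""
    if not conditions:
        return "ALWAYS " + outcome
    parts = [f"{tf} bound" if bound else f"{tf} not bound" for tf, bound in conditions]
    return "IF " + " AND ".join(parts) + " THEN " + outcome


# Hypothetical hand-written tree: expression outcome as a function of TF binding.
toy_tree = {
    "tf": "TF_A",
    "bound": {
        "tf": "TF_B",
        "bound": {"label": "induced"},
        "unbound": {"label": "unchanged"},
    },
    "unbound": {"label": "unchanged"},
}

rules = extract_rules(toy_tree)
for conditions, outcome in rules:
    print(format_rule(conditions, outcome))
```

Each root-to-leaf path becomes one rule, so a rule that requires several factors to be bound at once is a direct reading of a combinatorial hypothesis from the tree.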

    Modelling signaling networks underlying plant defence

    Transcriptional reprogramming plays a significant role in governing plant responses to pathogens. The underlying regulatory networks are complex and dynamic, responding to numerous input signals. Most network modelling studies to date have used large-scale expression data sets from public repositories but defence network models with predictive ability have also been inferred from single time series data sets, and sophisticated biological insights generated from focused experiments containing multiple network perturbations. Using multiple network inference methods, or combining network inference with additional data, such as promoter motifs, can enhance the ability of the model to predict gene function or regulatory relationships. Network topology can highlight key signaling components and provides a systems level understanding of plant defence

    Comparative analysis of acute and chronic corticosteroid pharmacogenomic effects in rat liver: Transcriptional dynamics and regulatory structures

    Background: Comprehensively understanding corticosteroid pharmacogenomic effects is an essential step towards an insight into the underlying molecular mechanisms for both beneficial and detrimental clinical effects. Nevertheless, even in a single tissue different methods of corticosteroid administration can induce different patterns of expression and regulatory control structures. Therefore, rich in vivo datasets of pharmacological time-series with two dosing regimens sampled from rat liver are examined for temporal patterns of changes in gene expression and their regulatory commonalities. Results: The study addresses two issues, including (1) identifying significant transcriptional modules coupled with dynamic expression patterns and (2) predicting relevant common transcriptional controls to better understand the underlying mechanisms of corticosteroid adverse effects. Following the orientation of meta-analysis, an extended computational approach that explores the concept of agreement matrix from consensus clustering has been proposed with the aims of identifying gene clusters that share common expression patterns across multiple dosing regimens as well as handling challenges in the analysis of microarray data from heterogeneous sources, e.g. different platforms and time-grids in this study. Six significant transcriptional modules coupled with typical patterns of expression have been identified. Functional analysis reveals that virtually all enriched functions (gene ontologies, pathways) in these modules are shown to be related to metabolic processes, implying the importance of these modules in adverse effects under the administration of corticosteroids. Relevant putative transcriptional regulators (e.g. RXRF, FKHD, SP1F) are also predicted to provide another source of information towards better understanding the complexities of expression patterns and the underlying regulatory mechanisms of those modules. Conclusions: We have proposed a framework to identify significant coexpressed clusters of genes across multiple conditions experimented from different microarray platforms, time-grids, and also tissues if applicable. Analysis on rich in vivo datasets of corticosteroid time-series yielded significant insights into the pharmacogenomic effects of corticosteroids, especially the relevance to metabolic side-effects. This has been illustrated through enriched metabolic functions in those transcriptional modules and the presence of GRE binding motifs in those enriched pathways, providing significant modules for further analysis on pharmacogenomic corticosteroid effects.
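
The agreement matrix from consensus clustering mentioned in this abstract can be illustrated with a minimal sketch. It shows only the generic construction (the fraction of clusterings in which two genes share a cluster) and an assumed thresholding step to read off groups; it is not the extended approach proposed in the paper, and the label vectors and the threshold of 0.6 are hypothetical.

```python
def agreement_matrix(clusterings):
    """Return A where A[i][j] is the fraction of clusterings placing genes i and j together.

    Each clustering is a list of cluster labels, one per gene, in the same gene order.
    Label values need not match across clusterings; only co-membership counts.
    """
    n = len(clusterings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        if len(labels) != n:
            raise ValueError("all clusterings must cover the same genes")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    runs = len(clusterings)
    return [[c / runs for c in row] for row in counts]


def consensus_groups(matrix, threshold):
    """Assumed read-out: connected components of the graph linking genes
    whose agreement is at least `threshold`."""
    n = len(matrix)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        group = []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and matrix[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Hypothetical cluster labels for four genes from three separate analyses,
# for example one per dosing regimen or platform.
clusterings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
matrix = agreement_matrix(clusterings)
groups = consensus_groups(matrix, threshold=0.6)
print(groups)
```

Because only co-membership is counted, the matrix is unaffected by how each analysis happens to number its clusters, which is what makes it usable across heterogeneous data sources.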