3,029 research outputs found

    A Semi-Supervised Method for Predicting Transcription Factor–Gene Interactions in Escherichia coli

    While Escherichia coli has one of the most comprehensive datasets of experimentally verified transcriptional regulatory interactions of any organism, it is still far from complete. This presents a problem when trying to combine gene expression and regulatory interactions to model transcriptional regulatory networks. Using the available regulatory interactions to predict new interactions may lead to better coverage and more accurate models. Here, we develop SEREND (SEmi-supervised REgulatory Network Discoverer), a semi-supervised learning method that uses a curated database of verified transcription factor–gene interactions, DNA sequence binding motifs, and a compendium of gene expression data to make thousands of new predictions about transcription factor–gene interactions, including whether the transcription factor activates or represses the gene. Using genome-wide binding datasets for several transcription factors, we demonstrate that our semi-supervised classification strategy improves the prediction of targets for a given transcription factor. To further demonstrate the utility of our inferred interactions, we generated a new microarray gene expression dataset for the aerobic to anaerobic shift response in E. coli. We used our inferred interactions with the verified interactions to reconstruct a dynamic regulatory network for this response. The network reconstructed when using our inferred interactions was better able to correctly identify known regulators and suggested additional activators and repressors as having important roles during the aerobic–anaerobic shift interface.

    Inferring a Transcriptional Regulatory Network from Gene Expression Data Using Nonlinear Manifold Embedding

    Transcriptional networks consist of multiple regulatory layers corresponding to the activity of global regulators, specialized repressors and activators of transcription, as well as proteins and enzymes shaping the DNA template. Such intrinsic multi-dimensionality makes uncovering connectivity patterns difficult and unreliable, and it calls for the adoption of methodologies commensurate with the underlying organization of the data source. Here we present a new computational method that predicts interactions between transcription factors and target genes using a compendium of microarray gene expression data and known interactions between genes and transcription factors. The proposed method, called Kernel Embedding of REgulatory Networks (KEREN), is based on the concept of gene-regulon association and captures hidden geometric patterns of the network via manifold embedding. We applied KEREN to reconstruct gene regulatory interactions in the model bacterium E. coli on a genome-wide scale. Our method not only yields accurate predictions of verifiable interactions, outperforming comparable methodologies on certain metrics, but also demonstrates the utility of a geometric approach to the analysis of high-dimensional biological data. We also describe the general application of kernel embedding techniques to some other function and network discovery algorithms.

    Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli.

    A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive meta-data information. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. The integration of different layers confers an incremental increase in the prediction performance, as does the information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery.

    Genome-scale bacterial transcriptional regulatory networks: reconstruction and integrated analysis with metabolic models

    Advances in sequencing technology are resulting in the rapid emergence of large numbers of complete genome sequences. High throughput annotation and metabolic modeling of these genomes is now a reality. The high throughput reconstruction and analysis of genome-scale transcriptional regulatory networks represents the next frontier in microbial bioinformatics. The fruition of this next frontier will depend upon the integration of numerous data sources relating to mechanisms, components, and behavior of the transcriptional regulatory machinery, as well as the integration of the regulatory machinery into genome-scale cellular models. Here we review existing repositories for different types of transcriptional regulatory data, including expression data, transcription factor data, and binding site locations, and we explore how these data are being used for the reconstruction of new regulatory networks. From template network based methods to de novo reverse engineering from expression data, we discuss how regulatory networks can be reconstructed and integrated with metabolic models to improve model predictions and performance. Finally, we explore the impact these integrated models can have in simulating phenotypes, optimizing the production of compounds of interest or paving the way to a whole-cell model.

    Funding: J.P.F. acknowledges funding from [SFRH/BD/70824/2010] of the FCT (Portuguese Foundation for Science and Technology) PhD program. The work was supported in part by the ERDF (European Regional Development Fund) through the COMPETE Programme (operational programme for competitiveness), National Funds through the FCT within projects [FCOMP-01-0124-FEDER015079] (ToMEGIM: Computational Tools for Metabolic Engineering using Genome-scale Integrated Models) and FCOMP-01-0124-FEDER009707 (HeliSysBio: molecular Systems Biology in Helicobacter pylori), the U.S. Department of Energy under contract [DE-ACO2-06CH11357] and the National Science Foundation under [0850546].

    DISTILLER: a data integration framework to reveal condition dependency of complex regulons in Escherichia coli

    DISTILLER, a data integration framework for the inference of transcriptional module networks, is presented and used to investigate condition dependency and modularity in Escherichia coli networks.

    Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    Background: The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results: In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion: High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.

    BioQuali Cytoscape plugin: analysing the global consistency of regulatory networks

    Background: The method most commonly used to analyse regulatory networks is the in silico simulation of fluctuations in network components when a network is perturbed. Nevertheless, confronting experimental data with a regulatory network entails many difficulties, such as the incomplete state of the art of regulatory knowledge, the large scale of regulatory models, heterogeneity in the available data and the sometimes violated assumption that mRNA expression is correlated to protein activity. Results: We have developed a plugin for the Cytoscape environment, designed to facilitate automatic reasoning on regulatory networks. The BioQuali plugin enhances user-friendly conversions of regulatory networks (including reference databases) into signed directed graphs. BioQuali performs automatic global reasoning in order to decide which products in the network need to be up or down regulated (active or inactive) to globally explain experimental data. It highlights incomplete regions in the network, meaning that gene expression levels do not globally correlate with existing knowledge on regulation carried by the topology of the network. Conclusion: The BioQuali plugin facilitates in silico exploration of large-scale regulatory networks by combining the user-friendly tools of the Cytoscape environment with high-performance automatic reasoning algorithms. As a main feature, the plugin guides further investigation regarding a system by highlighting regions in the network that are not accurately described and merit specific study.

    A microarray data-based semi-kinetic method for predicting quantitative dynamics of genetic networks

    BACKGROUND: Elucidating the dynamic behaviour of genetic regulatory networks is one of the most significant challenges in systems biology. However, conventional quantitative predictions have been limited to small networks because publicly available transcriptome data has not been extensively applied to dynamic simulation. RESULTS: We present a microarray data-based semi-kinetic (MASK) method which facilitates the prediction of regulatory dynamics of genetic networks composed of recurrently appearing network motifs with reasonable accuracy. The MASK method allows the determination of model parameters representing the contribution of regulators to transcription rate from time-series microarray data. Using a virtual regulatory network and a Saccharomyces cerevisiae ribosomal protein gene module, we confirmed that a MASK model can predict expression profiles for various conditions as accurately as a conventional kinetic model. CONCLUSION: We have demonstrated the MASK method for the construction of dynamic simulation models of genetic networks from time-series microarray data, initial mRNA copy number and first-order degradation constants of mRNA. The quantitative accuracy of the MASK models has been confirmed, and the results indicated that this method enables the prediction of quantitative dynamics in genetic networks composed of commonly used network motifs, which cover a considerable fraction of the whole network.