10 research outputs found

    Reconstruction of Escherichia coli transcriptional regulatory networks via regulon-based associations

    Background: Network reconstruction methods that rely on covariance of expression of transcription regulators and their targets ignore the fact that transcription of regulators and their targets can be controlled differently and/or independently. Such oversight would result in many erroneous predictions. However, accurate prediction of gene regulatory interactions can be made possible through modeling and estimation of transcriptional activity of groups of co-regulated genes. Results: Incomplete regulatory connectivity and expression data are used here to construct a consensus network of transcriptional regulation in Escherichia coli (E. coli). The network is updated via a covariance model describing the activity of gene sets controlled by common regulators. The proposed model-selection algorithm was used to annotate the likeliest regulatory interactions in E. coli on the basis of two independent sets of expression data, each containing many microarray experiments under a variety of conditions. The key regulatory predictions have been verified by an experiment and literature survey. In addition, the estimated activity profiles of transcription factors were used to describe their responses to environmental and genetic perturbations as well as drug treatments. Conclusion: Information about transcriptional activity of documented co-regulated genes (a core regulon) should be sufficient for discovering new target genes, whose transcriptional activities significantly co-vary with the activity of the core regulon members. Our ability to derive a highly significant consensus network by applying the regulon-based approach to two very different data sets demonstrated the efficiency of this strategy. We believe that this approach can be used to reconstruct gene regulatory networks of other organisms for which partial sets of known interactions are available.

    Inferring a Transcriptional Regulatory Network from Gene Expression Data Using Nonlinear Manifold Embedding

    Transcriptional networks consist of multiple regulatory layers corresponding to the activity of global regulators, specialized repressors and activators of transcription, as well as proteins and enzymes shaping the DNA template. Such intrinsic multi-dimensionality makes uncovering connectivity patterns difficult and unreliable, and it calls for methodologies commensurate with the underlying organization of the data source. Here we present a new computational method that predicts interactions between transcription factors and target genes using a compendium of microarray gene expression data and known interactions between genes and transcription factors. The proposed method, called Kernel Embedding of REgulatory Networks (KEREN), is based on the concept of gene-regulon association and captures hidden geometric patterns of the network via manifold embedding. We applied KEREN to reconstruct gene regulatory interactions in the model bacterium E. coli on a genome-wide scale. Our method not only yields accurate predictions of verifiable interactions, outperforming comparable methodologies on certain metrics, but also demonstrates the utility of a geometric approach to the analysis of high-dimensional biological data. We also describe the general application of kernel embedding techniques to some other function and network discovery algorithms.

    An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of Escherichia coli.

    Given the vast behavioral repertoire and biological complexity of even the simplest organisms, accurately predicting phenotypes in novel environments and unveiling their biological organization is a challenging endeavor. Here, we present an integrative modeling methodology that unifies under a common framework the various biological processes and their interactions across multiple layers. We trained this methodology on an extensive normalized compendium for the Gram-negative bacterium Escherichia coli, which incorporates gene expression data for genetic and environmental perturbations, transcriptional regulation, signal transduction, and metabolic pathways, as well as growth measurements. Comparison with measured growth and high-throughput data demonstrates the enhanced ability of the integrative model to predict phenotypic outcomes in various environmental and genetic conditions, even in cases where their underlying functions are under-represented in the training set. This work paves the way toward integrative techniques that extract knowledge from a variety of biological data to achieve more than the sum of their parts in the context of prediction, analysis, and redesign of biological systems.

    Inferring the conservative causal core of gene regulatory networks

    Background: Inferring gene regulatory networks from large-scale expression data is an important problem that has received much attention in recent years. These networks have the potential to provide insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. Results: In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well-known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations, and also demonstrate for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. Conclusions: For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design not only permits a more intuitive and possibly biological interpretation of its working mechanism but can also yield superior results.
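The abstract above describes C3NET only as combining mutual information estimates with a maximization step. A minimal Python sketch of one way such a step could look is given here, assuming a precomputed symmetric mutual information matrix and a fixed significance threshold; the function name `max_mi_edges`, the threshold value, and the toy matrix are all hypothetical and are not taken from the paper.

```python
def max_mi_edges(mi, threshold=0.0):
    """For each gene, keep only the edge to its highest-MI partner.

    mi        -- symmetric matrix (list of lists) of mutual information values
    threshold -- assumed cutoff below which an edge is treated as non-significant
    Returns a set of undirected edges as (smaller_index, larger_index) tuples.
    """
    n = len(mi)
    edges = set()
    for i in range(n):
        # Candidate partners of gene i, ignoring the diagonal.
        candidates = [j for j in range(n) if j != i]
        if not candidates:
            continue
        # Maximization step: the single strongest partner of gene i.
        j = max(candidates, key=lambda k: mi[i][k])
        if mi[i][j] > threshold:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy, made-up MI values for four genes (illustration only).
mi = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.4, 0.1],
    [0.2, 0.4, 0.0, 0.05],
    [0.1, 0.1, 0.05, 0.0],
]
edges = max_mi_edges(mi, threshold=0.15)
```

Because each gene contributes at most one edge, the resulting network has at most as many edges as genes, which is one plausible reading of the "conservative core" in the title.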

    Robust Detection of Hierarchical Communities from Escherichia coli Gene Expression Data

    Determining the functional structure of biological networks is a central goal of systems biology. One approach is to analyze gene expression data to infer a network of gene interactions on the basis of their correlated responses to environmental and genetic perturbations. The inferred network can then be analyzed to identify functional communities. However, commonly used algorithms can yield unreliable results due to experimental noise, algorithmic stochasticity, and the influence of arbitrarily chosen parameter values. Furthermore, the results obtained typically provide only a simplistic view of the network partitioned into disjoint communities and provide no information about the relationships between communities. Here, we present methods to robustly detect coregulated and functionally enriched gene communities and demonstrate their application and validity for Escherichia coli gene expression data. Applying a recently developed community detection algorithm to the network of interactions identified with the context likelihood of relatedness (CLR) method, we show that a hierarchy of network communities can be identified. These communities are significantly enriched for gene ontology (GO) terms, consistent with them representing biologically meaningful groups. Further, analysis of the most significantly enriched communities identified several candidate new regulatory interactions. The robustness of our methods is demonstrated by showing that a core set of functional communities is reliably found when artificial noise, modeling experimental noise, is added to the data. We find that noise mainly acts conservatively, increasing the relatedness required for a network link to be reliably assigned and decreasing the size of the core communities, rather than causing association of genes into new communities.
    Comment: Due to appear in PLoS Computational Biology. Supplementary Figure S1 was not uploaded but is available by contacting the author. 27 pages, 5 figures, 15 supplementary files.

    On the Choice and Number of Microarrays for Transcriptional Regulatory Network Inference

    Background: Transcriptional regulatory network inference (TRNI) from large compendia of DNA microarrays has become a fundamental approach for discovering transcription factor (TF)-gene interactions at the genome-wide level. In correlation-based TRNI, network edges can in principle be evaluated using standard statistical tests. However, while such tests nominally assume independent microarray experiments, we expect dependency between the experiments in microarray compendia, due to both project-specific factors (e.g., microarray preparation, environmental effects) in the multi-project compendium setting and effective dependency induced by gene-gene correlations. Herein, we characterize the nature of dependency in an Escherichia coli microarray compendium and explore its consequences on the problem of determining which and how many arrays to use in correlation-based TRNI. Results: We present evidence of substantial effective dependency among microarrays in this compendium, and characterize that dependency with respect to experimental condition factors. We then introduce a measure n_eff of the effective number of experiments in a compendium, and find that corresponding to the dependency observed in this particular compendium there is a huge reduction in effective sample size, i.e., n_eff = 14.7 versus n = 376. Furthermore, we found that the n_eff of select subsets of experiments actually exceeded n_eff of the full compendium, suggesting that the adage 'less is more' applies here. Consistent with this latter result, we observed improved performance in TRNI using subsets of the data compared to results using the full compendium. We identified experimental condition factors that trend with changes in TRNI performance and n_eff, including growth phase and media type. Finally, using the set of known E. coli genetic regulatory interactions from RegulonDB, we demonstrated that false discovery rates (FDR) derived from n_eff-adjusted p-values were well-matched to FDR based on the RegulonDB truth set. Conclusions: These results support utilization of n_eff as a potent descriptor of microarray compendia. In addition, they highlight a straightforward correlation-based method for TRNI with demonstrated meaningful statistical testing for significant edges, readily applicable to compendia from any species, even when a truth set is not available. This work facilitates a more refined approach to construction and utilization of mRNA expression compendia in TRNI.
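To make the effect of the reported sample sizes concrete, the following Python sketch compares the significance of a single correlation under n = 376 and under n_eff = 14.7. The abstract does not say how n_eff enters the p-value calculation; substituting n_eff for n in a standard Fisher z-test is an assumption made here for illustration, and the function name `fisher_z_pvalue` and the correlation value r = 0.5 are hypothetical.

```python
import math


def fisher_z_pvalue(r, n):
    """Two-sided p-value for a Pearson correlation r from n samples.

    Uses the Fisher z-transform with a normal approximation. Passing an
    effective sample size in place of n is an assumption of this sketch,
    not the procedure documented in the abstract.
    """
    if n <= 3:
        raise ValueError("the Fisher z-test needs n > 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal at |z|.
    return math.erfc(abs(z) / math.sqrt(2))


r = 0.5  # made-up correlation between a TF and a candidate target gene
p_nominal = fisher_z_pvalue(r, 376)    # treats all 376 arrays as independent
p_adjusted = fisher_z_pvalue(r, 14.7)  # uses the reported effective sample size
```

Under these assumptions the same correlation is overwhelmingly significant at the nominal sample size but only borderline at the effective one, which illustrates why ignoring dependency between arrays can inflate the number of predicted edges.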

    Identification of Relevant Protein-Gene Associations by Integrating Gene Expression Data and Transcriptional Regulatory Networks.

    One challenge in systems biology is integrating different biological data types to more accurately describe how a biological system functions. If networks describing a pathway or a particular regulatory activity are merged with gene expression data, the specific regulator-gene portions of the pathway responsible for changes in gene expression could be identified. In this thesis, I hypothesize that merging gene expression data with transcriptional network information will allow me to identify possible regulatory mechanisms that govern the observed gene expression patterns. I developed a computational approach to merge these data types and demonstrated that the method can identify which regulator-gene associations better explain the gene expression patterns even when the activities of the regulators are not observed. Due to the complex interplay of different regulatory proteins during mRNA regulation, the individual activity of these proteins often cannot be measured directly. Previously described methods of identifying protein-gene associations have two main limitations: (1) failure to identify combinatoric relationships and (2) prediction of inactive regulatory associations. The methods I developed model a regulatory network as a bipartite network with a top layer of unobserved regulators (protein activities) connected to a lower level of observed variables (mRNA expression values). This bipartite approach has been used in the past to study regulatory networks, but under the assumption of a linear mixing model. In contrast, I use Bayesian networks, a multinomial model that better captures the nonlinear patterns seen in gene regulation networks. I tested the developed tools using synthetic, E. coli, and human expression data. The synthetic data results show that the method is capable of identifying relevant connections. When using E. coli and human gene expression data, the method identified a simplified regulatory network that is both mechanistically sound and maximally consistent with the expression data. By identifying regulatory relationships that are apparently active given a set of gene expression data, this thesis provides a new lens to view gene expression data in general. The methods developed here are directly applicable to large transcriptional networks of any species and provide the foundation for a new branch of bioinformatics analysis.
    Ph.D., Chemical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/78866/1/angelpr_1.pd

    Molecular analysis of virulence mechanisms associated with adherent-invasive Escherichia coli (AIEC)

    Crohn's Disease (CD) is a chronic inflammatory bowel disease of unknown etiology. Recent work has shown that a new pathotype of Escherichia coli, Adherent Invasive E. coli (AIEC), may be associated with CD. AIEC has been shown to adhere to and invade epithelial cells and to replicate within macrophages (together this is called the AIEC phenotype). In this thesis, the AIEC phenotype of 84 E. coli strains was determined in order to identify the prevalence of this phenotype within the E. coli species. This study showed that a significant proportion of E. coli strains (approx. 5%) are capable of adhering to and invading epithelial cells and undergoing intramacrophage replication. Moreover, the results presented in this study indicate a correlation between survival in macrophages and resistance to grazing by amoebae, supporting the coincidental evolution hypothesis that resistance to amoebae could be a driving force in the evolution of pathogenicity in some bacteria, such as AIEC. In addition, this study has identified an important regulatory role for the CpxA/R two-component system (TCS) in the invasive abilities of AIEC HM605, a colonic mucosa-associated CD isolate. A cpxR mutant was shown to be defective in the invasion of epithelial cells, and this defect was shown to be independent of motility or the expression of Type 1 fimbriae, factors that have been shown to be involved in the invasion of another strain of AIEC, called LF82, isolated from a patient with ileal CD. The CpxA/R TCS responds to disturbances in the cell envelope and has been implicated in the virulence of a number of Gram-negative pathogens. In this study it is shown that the CpxA/R TCS regulates the expression of a potentially novel invasin called SinH. SinH is found in a number of invasive strains of E. coli and Salmonella. Moreover, work presented here shows that a critical mechanism underpinning AIEC persistence in macrophages is the repair of DNA bases damaged by macrophage oxidants. Together these findings provide evidence to suggest that AIEC are a diverse group of E. coli and possess diverse molecular mechanisms and virulence factors that contribute to the AIEC phenotype. In addition, AIEC may have gone through different evolutionary histories, acquiring various molecular mechanisms ultimately culminating in the AIEC phenotype. The gastrointestinal (GI) tract harbors a diverse microbiota; most members are symbiotic or commensal; however, some bacteria have the potential to cause disease (pathobionts). The work presented here provides evidence to support the model that AIEC are pathobionts. AIEC strains can be carried as commensals in healthy guts; however, when the intestinal homeostasis is disrupted, such as in the compromised gut of CD patients, AIEC may behave as opportunistic pathogens and cause and/or contribute to disease by driving intestinal inflammation.