
    Statistically Significant Strings are Related to Regulatory Elements in the Promoter Regions of Saccharomyces cerevisiae

    Finding statistically significant words in DNA and protein sequences forms the basis for many genetic studies. By applying the maximum entropy principle, we give a systematic way to study the nonrandom occurrence of words in DNA or protein sequences. Comparison with experimental results shows that patterns of regulatory binding sites in the Saccharomyces cerevisiae (yeast) genome tend to occur significantly in promoter regions. We studied two correlated gene families of yeast. In each family, the method successfully extracts the binding sites verified by experiments. Many putative regulatory sites in the upstream regions are proposed. The study also suggests that some regulatory sites are active in both directions, while others show a directional preference. Comment: 13 pages, 2 figures, 3 tables. To appear in Physica
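
    The abstract does not give its formulas. As a rough illustration only, a common maximum-entropy baseline for word counts fixes the observed counts of each word's (k-1)-letter sub-words, which gives the expected count N(w1..wk-1) * N(w2..wk) / N(w2..wk-1). The sketch below ranks words by how far their observed counts exceed that expectation. The function names are hypothetical, and the z-score is a simple Poisson-style approximation, not necessarily the statistic the authors used.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(word: str, seq: str) -> float:
    """Maximum-entropy expectation of `word`, constrained by the observed
    counts of its (k-1)-letter sub-words (assumed baseline):
        E[w1..wk] = N(w1..wk-1) * N(w2..wk) / N(w2..wk-1)
    """
    k = len(word)
    if k < 3:
        raise ValueError("word length must be at least 3")
    shorter = kmer_counts(seq, k - 1)
    core = kmer_counts(seq, k - 2)
    denom = core[word[1:-1]]
    if denom == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / denom


def score_words(seq: str, k: int) -> list[tuple[str, int, float, float]]:
    """Return (word, observed, expected, z) for every observed k-mer,
    most over-represented first. z is a Poisson-style approximation."""
    if k < 3:
        raise ValueError("k must be at least 3")
    observed = kmer_counts(seq, k)
    shorter = kmer_counts(seq, k - 1)
    core = kmer_counts(seq, k - 2)
    scored = []
    for word, obs in observed.items():
        # Every observed word has nonzero sub-word counts, so exp > 0.
        exp = shorter[word[:-1]] * shorter[word[1:]] / core[word[1:-1]]
        z = (obs - exp) / math.sqrt(exp)
        scored.append((word, obs, exp, z))
    return sorted(scored, key=lambda row: row[3], reverse=True)
```

    For example, in the sequence AAAA the word AAA is observed twice against an expected 3 * 3 / 4 = 2.25, so it is not over-represented.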

    Finding regulatory modules through large-scale gene-expression data analysis

    The use of gene microchips has enabled a rapid accumulation of gene-expression data. One of the major challenges of analyzing this data is the diversity, in both size and signal strength, of the various modules in the gene regulatory networks of organisms. Based on the Iterative Signature Algorithm [Bergmann, S., Ihmels, J. and Barkai, N. (2002) Phys. Rev. E 67, 031902], we present an algorithm, the Progressive Iterative Signature Algorithm (PISA), that, by sequentially eliminating modules, allows unsupervised identification of both large and small regulatory modules. We applied PISA to a large set of yeast gene-expression data and, using the Gene Ontology annotation database as a reference, found that our algorithm is much better able to identify regulatory modules than methods based on high-throughput transcription-factor binding experiments or on comparative genomics. Comment: 7 pages, 6 figures in main text; 2 text pages, 7 figures, 1 table in supplement; rewritten version
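
    The iterative idea that PISA builds on can be illustrated with a simplified single-module search. This sketch assumes raw expression values (the published ISA normalizes genes and conditions separately), z-score thresholds `t_gene` and `t_cond`, and hypothetical function names. Starting from a seed gene set, it alternately scores conditions by the gene set's mean expression and rescores genes on the selected conditions until the module stops changing. PISA's step of eliminating each converged module before searching again is not shown.

```python
import statistics


def zscores(values):
    """Standardize a list of numbers (population SD); all zeros if constant."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def isa(expr, seed, t_gene=2.0, t_cond=2.0, max_iter=50):
    """Simplified ISA-style search for one module (illustrative assumption).

    expr: genes x conditions matrix as a list of rows.
    seed: initial set of gene indices.
    Returns (genes, conditions) at a fixed point or after max_iter rounds.
    """
    n_genes, n_conds = len(expr), len(expr[0])
    genes, conds = set(seed), set()
    for _ in range(max_iter):
        if not genes:
            break
        # Condition scores: average expression of the current gene set.
        cscore = [statistics.fmean(expr[g][c] for g in genes) for c in range(n_conds)]
        cz = zscores(cscore)
        conds = {c for c in range(n_conds) if abs(cz[c]) > t_cond}
        if not conds:
            break
        # Gene scores: expression weighted by the selected condition scores.
        gscore = [statistics.fmean(expr[g][c] * cscore[c] for c in conds)
                  for g in range(n_genes)]
        gz = zscores(gscore)
        new_genes = {g for g in range(n_genes) if gz[g] > t_gene}
        if new_genes == genes:  # fixed point reached
            break
        genes = new_genes
    return genes, conds
```

    On a toy matrix where genes 0 and 1 are both high in conditions 0 and 1 and all other entries are zero, seeding with gene 0 and thresholds of 1.0 converges to genes {0, 1} and conditions {0, 1}.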

    Multi-scale genetic network inference based on time series gene expression profiles

    This work integrates multi-scale clustering and short-time correlation to estimate genetic networks at different time resolutions and levels of detail. Gene expression data are noisy and large scale. Clustering is widely used to group genes with similar patterns, and the cluster centers can be used to infer the genetic networks among these clusters. This work introduces the Multi-scale Fuzzy K-means clustering algorithm to uncover groups of coregulated genes and capture the networks at different levels of detail.
    Time series expression profiles provide dynamic information for inferring gene regulatory relationships. Major challenges of genetic network inference include inferring large-scale networks, identifying transient interactions and feedback loops, and differentiating direct from indirect interactions. Time correlation can estimate the time delay and edge direction. Partial correlation and directed-separation theory help differentiate direct and indirect interactions and identify feedback loops. This work introduces the constraint-based time-correlation (CBTC) network inference algorithm, which combines these methods with time correlation estimation to characterize genetic networks more fully. Gene expression regulation can occur in specific time periods and conditions rather than across the whole expression profile, and short-time correlation can capture such transient interactions.
    The network discovery algorithm was validated mainly using yeast cell cycle data. The algorithm successfully identified the yeast cell cycle development stages, cell cycle and negative feedback loops, and indicated how the networks change dynamically over time. The inferred networks reflect most interactions previously identified by genome-wide location analysis and match the extant literature. At the detailed network level, the inferred networks provide more detailed information about genes (or clusters) and the interactions among them. Interesting genes, clusters and interactions were identified that match the literature and the gene ontology information and provide hypotheses for further studies.
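
    The time-correlation step (estimating time delay and edge direction) can be sketched as a search over lags for the shift that maximizes the absolute Pearson correlation between two expression profiles. This is a hypothetical illustration, not the CBTC algorithm itself: it omits the multi-scale clustering, the partial-correlation and d-separation tests, and the short-time windows described above, and the function names and `min_overlap` parameter are assumptions.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def best_lag(x, y, max_lag, min_overlap=3):
    """Lag in [-max_lag, max_lag] maximizing |corr(x[t], y[t + lag])|.

    A positive lag means changes in x precede changes in y, which a
    time-correlation method would read as a candidate edge x -> y.
    """
    best_lag_value, best_r = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < min_overlap:
            continue  # too few overlapping time points
        r = pearson(a, b)
        if abs(r) > abs(best_r):
            best_lag_value, best_r = lag, r
    return best_lag_value, best_r
```

    If y is x delayed by two time points, `best_lag(x, y, 3)` returns lag 2 with r close to 1, suggesting x -> y; swapping the arguments gives lag -2.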

    Intracompartmental and Intercompartmental Transcriptional Networks Coordinate the Expression of Genes for Organellar Functions

    Genes for mitochondrial and chloroplast proteins are distributed between the nuclear and organellar genomes. Organelle biogenesis and metabolism, therefore, require appropriate coordination of gene expression in the different compartments to ensure efficient synthesis of essential multiprotein complexes of mixed genetic origin. Whereas organelle-to-nucleus signaling influences nuclear gene expression at the transcriptional level, organellar gene expression (OGE) is thought to be primarily regulated posttranscriptionally. Here, we show that intracompartmental and intercompartmental transcriptional networks coordinate the expression of genes for organellar functions. Nearly 1,300 ATH1 microarray-based transcriptional profiles of nuclear and organellar genes for mitochondrial and chloroplast proteins in the model plant Arabidopsis (Arabidopsis thaliana) were analyzed. The activity of genes involved in organellar energy production (OEP) or OGE in each of the organelles and in the nucleus is highly coordinated. Intracompartmental networks that link the OEP and OGE gene sets serve to synchronize the expression of nucleus- and organelle-encoded proteins. At a higher regulatory level, coexpression of organellar and nuclear OEP/OGE genes typically modulates chloroplast functions but affects mitochondria only when chloroplast functions are perturbed. Under conditions that induce energy shortage, the intercompartmental coregulation of photosynthesis genes can even override intracompartmental networks. We conclude that dynamic intracompartmental and intercompartmental transcriptional networks for OEP and OGE genes adjust the activity of organelles in response to the cellular energy state and environmental stresses, and we identify candidate cis-elements involved in the transcriptional coregulation of nuclear genes. Regarding the transcriptional regulation of chloroplast genes, novel tentative target genes of σ factors are identified.

    Probabilities of spurious connections in gene networks: Application to expression time series

    Motivation: The reconstruction of gene networks from gene expression microarrays is gaining popularity as methods improve and as more data become available. The reliability of such networks could be judged by the probability that a connection between genes is spurious, resulting from chance fluctuations rather than from a true biological relationship. Results: Unlike the false discovery rate and positive false discovery rate, the decisive false discovery rate (dFDR) is exactly equal to a conditional probability without assuming independence or the randomness of hypothesis truth values. This property is useful not only in the common application to the detection of differential gene expression, but also in determining the probability of a spurious connection in a reconstructed gene network. Estimators of the dFDR can estimate each of three probabilities: 1. The probability that two genes that appear to be associated with each other lack such association. 2. The probability that a time ordering observed for two associated genes is misleading. 3. The probability that a time ordering observed for two genes is misleading, either because they are not associated or because they are associated without a lag in time. The first probability applies to both static and dynamic gene networks, and the other two apply only to dynamic gene networks. Availability: Cross-platform software for network reconstruction, probability estimation, and plotting is freely available from http://www.davidbickel.com as R functions and a Java application. Comment: Like q-bio.GN/0404032, this was rejected in March 2004 because it was submitted to the math archive. The only modification is a corrected reference to q-bio.GN/0404032, which was not modified at all.

    Creating, Modeling, and Visualizing Metabolic Networks

    Metabolic networks combine metabolism and regulation. These complex networks are difficult to understand and create due to the diverse types of information that need to be represented. This chapter describes a suite of interlinked tools for developing, displaying, and modeling metabolic networks. The metabolic network interactions database, MetNetDB, contains information on regulatory and metabolic interactions derived from a combination of web databases and input from biologists in their area of expertise. PathBinderA mines the biological “literaturome” by searching for new interactions or supporting evidence for existing interactions in metabolic networks. Sentences from abstracts are ranked in terms of the likelihood that an interaction is described and combined with evidence provided by other sentences. FCModeler, a publicly available software package, enables the biologist to visualize and model metabolic and regulatory network maps. FCModeler aids in the development and evaluation of hypotheses, and provides a modeling framework for assessing the large amounts of data captured by high-throughput gene expression experiments.

    Large-scale gene expression data analysis: a new challenge to computational biologists

    The use of high-density DNA arrays to monitor gene expression at a genome-wide scale constitutes a fundamental advance in biology. In particular, the expression of every gene in Saccharomyces cerevisiae can be interrogated by microarray analysis, in which cDNAs are hybridized to an array representing each of the approximately 6,000 genes in the yeast genome. In this survey I review three recent experiments related to transcriptional regulation and discuss the great challenge for computational biologists trying to extract functional information from such large-scale gene expression data.

    Genomics of Signaling Crosstalk of Estrogen Receptor α in Breast Cancer Cells

    BACKGROUND: The estrogen receptor alpha (ERalpha) is a ligand-regulated transcription factor. However, a wide variety of other extracellular signals can activate ERalpha in the absence of estrogen. The impact of these alternate modes of activation on gene expression profiles has not been characterized. METHODOLOGY/PRINCIPAL FINDINGS: We show that estrogen, growth factors and cAMP elicit surprisingly distinct ERalpha-dependent transcriptional responses in human MCF7 breast cancer cells. In response to growth factors and cAMP, ERalpha primarily activates and represses genes, respectively. The combined treatments with the anti-estrogen tamoxifen and cAMP or growth factors regulate yet other sets of genes. In many cases, tamoxifen is perverted to an agonist, potentially mimicking what is happening in certain tamoxifen-resistant breast tumors and emphasizing the importance of the cellular signaling environment. Using a computational analysis, we predicted that a Hox protein might be involved in mediating such combinatorial effects, and then confirmed it experimentally. Although both tamoxifen and cAMP block the proliferation of MCF7 cells, their combined application stimulates it, and this can be blocked with a dominant-negative Hox mutant. CONCLUSIONS/SIGNIFICANCE: The activating signal dictates both target gene selection and regulation by ERalpha, and this has consequences on global gene expression patterns that may be relevant to understanding the progression of ERalpha-dependent carcinomas.

    The cis-regulatory map of Shewanella genomes

    While hundreds of microbial genomes have been sequenced, the challenge remains to define their cis-regulatory maps. Here, we present a comparative genomic analysis of the cis-regulatory map of Shewanella oneidensis, an important model organism for bioremediation because of its extraordinary ability to use a wide variety of metals and organic molecules as electron acceptors in respiration. First, from the experimentally verified transcriptional regulatory networks of Escherichia coli, we inferred 24 DNA motifs that are conserved in S. oneidensis. We then applied a new comparative approach to five Shewanella genomes that allowed us to systematically identify 194 nonredundant palindromic DNA motifs and corresponding regulons in S. oneidensis. Sixty-four percent of the predicted motifs are conserved in at least three of the seven newly sequenced and distantly related Shewanella genomes. In total, we obtained 209 unique DNA motifs in S. oneidensis that cover 849 unique transcription units. Besides conservation in other genomes, 77 of these motifs are supported by at least one additional type of evidence, including matches to known transcription factor binding motifs and significant functional enrichment or expression coherence of the corresponding target genes. Applying the same approach to a more focused gene set (990 differentially expressed genes derived from published microarray data of S. oneidensis during exposure to metal ions), we identified 31 putative cis-regulatory motifs (16 with at least one type of additional supporting evidence) that are potentially involved in the process of metal reduction. The majority (18/31) of those motifs had been found in our whole-genome comparative approach, further demonstrating that such an approach is capable of uncovering a large fraction of the regulatory map of a genome even in the absence of experimental data. The integrated computational approach developed in this study provides a useful strategy to identify genome-wide cis-regulatory maps and a novel avenue to explore the regulatory pathways for particular biological processes in bacterial systems.
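
    Two of the checks mentioned in this abstract, whether a motif is palindromic and whether it recurs across several genomes, can be sketched with exact string matching. This is an illustrative assumption only: the study's comparative motif-discovery method is not described here in enough detail to reproduce, real binding motifs are usually degenerate and scored with position weight matrices rather than matched exactly, and the function names and the `min_genomes=3` default (chosen to mirror "at least three of the seven ... genomes") are hypothetical.

```python
# Complement table for unambiguous bases plus N (any base).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (case-insensitive, returns upper case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(motif: str) -> bool:
    """True if the motif reads the same on both strands (e.g. GAATTC)."""
    return motif.upper() == reverse_complement(motif)


def genomes_with_motif(motif: str, genomes: list[str]) -> int:
    """Number of sequences containing the motif on either strand (exact match)."""
    fwd = motif.upper()
    rev = reverse_complement(motif)
    return sum(1 for g in genomes if fwd in g.upper() or rev in g.upper())


def is_conserved(motif: str, genomes: list[str], min_genomes: int = 3) -> bool:
    """Keep a motif only if it occurs in at least `min_genomes` sequences."""
    return genomes_with_motif(motif, genomes) >= min_genomes
```

    Checking both strands matters because a transcription factor binding site can occur in either orientation; for a palindromic motif the two strings are identical, so the reverse-strand test adds nothing.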