
    DISPARE: DIScriminative PAttern REfinement for Position Weight Matrices

    BACKGROUND: The accurate determination of transcription factor binding affinities is an important problem in biology and key to understanding gene regulation. Position weight matrices (PWMs) are commonly used to represent the binding properties of transcription factor binding sites, but they suffer from low information content and produce a large number of false matches in the genome. We describe a novel algorithm for refining position weight matrices that represent transcription factor binding sites, based on experimental data including ChIP-chip analyses. We present an iterative weight matrix optimization method that distinguishes true transcription factor binding sites from a negative control set more accurately. The initial position weight matrix can come from JASPAR, TRANSFAC or other sources. The main new features are the discriminative nature of the method and the optimization of matrix width and length. RESULTS: The algorithm was applied to the growing collection of known transcription factor binding sites obtained from ChIP-chip experiments. The results show that our algorithm significantly improves the sensitivity and specificity of matrix models for identifying transcription factor binding sites. CONCLUSION: When the transcription factor is known, it is more appropriate to derive its DNA-binding properties with a discriminative approach such as the one presented here, starting from a matrix, than to perform de novo motif discovery. Generating more accurate position weight matrices will ultimately contribute to a better understanding of eukaryotic transcriptional regulation and could offer a better alternative to ab initio motif discovery.
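    The idea of discriminatively refining a starting matrix can be illustrated with a minimal sketch. It assumes a log-odds PWM built from aligned sites with a hypothetical pseudocount of 0.5 and a uniform background, scores each sequence by its best forward-strand window, and uses AUC (the probability that a positive sequence outscores a negative one) as the discriminative objective. The greedy hill climbing over individual weights is a stand-in chosen for illustration; it is not the DISPARE optimization procedure, whose objective and search strategy are not described here. All function names are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def log_odds_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        # log2 of observed frequency relative to a uniform background
        pwm.append({b: math.log2(counts[b] / total / background) for b in ALPHABET})
    return pwm

def best_score(pwm, seq):
    """Highest PWM score over all windows of seq (forward strand only)."""
    w = len(pwm)
    return max(
        sum(pwm[i][seq[j + i]] for i in range(w))
        for j in range(len(seq) - w + 1)
    )

def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def refine(pwm, positives, negatives, steps=200, step_size=0.2, seed=0):
    """Greedy hill climbing: keep a random single-weight change only if AUC improves."""
    rng = random.Random(seed)
    current = [dict(col) for col in pwm]  # copy so the input matrix is untouched

    def evaluate(m):
        return auc([best_score(m, s) for s in positives],
                   [best_score(m, s) for s in negatives])

    best = evaluate(current)
    for _ in range(steps):
        i = rng.randrange(len(current))
        b = rng.choice(ALPHABET)
        delta = rng.choice((-step_size, step_size))
        current[i][b] += delta
        score = evaluate(current)
        if score > best:
            best = score
        else:
            current[i][b] -= delta  # revert the rejected change
    return current, best
```

    In practice the negative set would be a large control sequence collection, and a real method would also search over matrix width, as the abstract notes; this sketch keeps the width fixed.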

    Computational analyses of eukaryotic promoters

    Computational analysis of eukaryotic promoters is one of the most difficult problems in computational genomics and is essential for understanding gene expression profiles and reverse-engineering gene regulatory network circuits. Here I give a basic introduction to the problem and a recent update on both experimental and computational approaches. More details may be found in the extended references. This review is based on a summer lecture given at the Max Planck Institute in Berlin in 2005.

    Context Differences Reveal Insulator and Activator Functions of a Su(Hw) Binding Region

    Insulators are DNA elements that divide chromosomes into independent transcriptional domains. The Drosophila genome contains hundreds of binding sites for the Suppressor of Hairy-wing [Su(Hw)] insulator protein, corresponding to locations of the retroviral gypsy insulator and non-gypsy binding regions (BRs). The first non-gypsy BR identified, 1A-2, resides in cytological region 1A. Using a quantitative transgene system, we show that 1A-2 is a composite insulator containing enhancer-blocking and facilitator elements. We discovered that 1A-2 separates the yellow (y) gene from a previously unannotated non-coding RNA gene, named yar for y-achaete (ac) intergenic RNA. The role of 1A-2 was elucidated by using homologous recombination to excise these sequences from their natural location, the first deletion of any Su(Hw) BR in the genome. Loss of 1A-2 reduced yar RNA accumulation without affecting mRNA levels from the neighboring y and ac genes. These data indicate that within the 1A region, 1A-2 acts as an activator of yar transcription. Taken together, these studies reveal that the properties of 1A-2 are context-dependent, as this element has both insulator and enhancer activities. These findings imply that the function of non-gypsy Su(Hw) BRs depends on the genomic environment, predicting that Su(Hw) BRs represent a diverse collection of genomic regulatory elements.

    Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions

    BACKGROUND: Reliable transcription factor binding site (TFBS) prediction methods are essential for the computational annotation of large amounts of genome sequence data. However, current methods for predicting TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered. RESULTS: To improve this situation, we quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation, in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), in obesity and lipid metabolism (PPAR, SREBP, HNF4), and in the regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We also gained extra specificity using a method called SiteGA, which takes into account structural interactions within the TFBS core and flanking regions by means of a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies. To increase confidence in our approach, we applied resampling (jackknife and bootstrap) tests to the comparison; optimized PWMs and SiteGA showed similar recognition performance. We then applied SiteGA and optimized PWMs, both separately and together, to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites (BSs) with the SiteGA web tool. Analysis of the dependencies between close and distant LPDs revealed by SiteGA models showed that the most significant correlations are between close LPDs and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which span both core and flanking regions. Applying SiteGA and optimized PWM models together substantially reduced false positives, at least at higher stringencies. CONCLUSION: Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and the EPD analysis has produced a list of genes that appear to be regulated by the above TFs.
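    To make "locally positioned dinucleotide frequencies" concrete, here is a minimal sketch under stated assumptions. An LPD feature is taken to be a (dinucleotide, window start, window length) triple, and its value is the fraction of dinucleotide steps in that window of an aligned site that match the dinucleotide. A toy genetic algorithm then searches for a small set of LPD features whose summed frequency best separates sites from background, using the difference of means as a crude fitness. This fitness, the feature encoding, and every name below are hypothetical illustrations, not SiteGA's actual discriminant function or GA settings.

```python
import random
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

def lpd_frequency(seq, dinuc, start, length):
    """Fraction of dinucleotide steps in seq[start:start+length] equal to dinuc."""
    window = seq[start:start + length]
    steps = len(window) - 1
    if steps <= 0:
        return 0.0
    hits = sum(1 for k in range(steps) if window[k:k + 2] == dinuc)
    return hits / steps

def fitness(features, sites, background):
    """Hypothetical fitness: mean summed LPD frequency of sites minus that of background."""
    def total(seq):
        return sum(lpd_frequency(seq, d, s, l) for d, s, l in features)
    mean_site = sum(map(total, sites)) / len(sites)
    mean_bg = sum(map(total, background)) / len(background)
    return mean_site - mean_bg

def random_feature(rng, width, max_len=6):
    """Random (dinucleotide, start, length) triple that fits inside the aligned width."""
    length = rng.randint(2, min(max_len, width))
    start = rng.randint(0, width - length)
    return (rng.choice(DINUCS), start, length)

def evolve(sites, background, n_features=3, pop_size=20, generations=30, seed=0):
    """Toy GA: keep the better half, refill with one-point crossover plus mutation."""
    rng = random.Random(seed)
    width = len(sites[0])
    pop = [[random_feature(rng, width) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda f: fitness(f, sites, background), reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best individuals survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]  # slicing builds a new list
            if rng.random() < 0.3:
                child[rng.randrange(n_features)] = random_feature(rng, width)
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda f: fitness(f, sites, background))
    return best, fitness(best, sites, background)
```

    Because each feature fixes both the dinucleotide and its position range, features placed far apart can capture the kind of distant core-to-flank dependencies the abstract describes, which a PWM's independent per-position columns cannot.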

    Comparative analyses imply that the enigmatic sigma factor 54 is a central controller of the bacterial exterior

    BACKGROUND: Sigma-54 is a central regulator in many pathogenic bacteria and has been linked to a multitude of cellular processes, such as nitrogen assimilation, and to important functional traits such as motility, virulence, and biofilm formation. Until now it has remained obscure whether these phenomena and their control by Sigma-54 share an underlying theme. RESULTS: We uncovered the commonality by performing a range of comparative genome analyses. A) The presence of Sigma-54 and its associated activators was determined for all sequenced prokaryotes. We observed a phylum-dependent distribution that suggests an evolutionary relationship between Sigma-54 and lipopolysaccharide and flagellar biosynthesis. B) All Sigma-54 activators were identified and annotated. Their relation with phosphotransfer-mediated signaling (TCS and PTS) and with the transport and assimilation of carboxylates and nitrogen-containing metabolites was substantiated. C) The function annotations represented within the genomic context of all genes encoding Sigma-54, its activators and its promoters were analyzed for intra-phylum representation and inter-phylum conservation. Promoters were localized using a straightforward scoring strategy formulated to identify similar motifs. We found clear, highly represented and conserved genetic associations with genes involved in the transport and biosynthesis of the metabolic intermediates of exopolysaccharides, flagella, lipids, lipopolysaccharides, lipoproteins and peptidoglycan. CONCLUSION: Our analyses directly implicate Sigma-54 as a central player in controlling the processes that involve the physical interaction of an organism with its environment, such as the colonization of a host (virulence) or the formation of biofilm.
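    A consensus-based promoter scan of the kind the abstract alludes to can be sketched as follows. The sketch assumes the commonly cited Sigma-54 consensus TGGCACG-N4-TTGCW (with the conserved GG at -24 and GC at -12), requires those two dinucleotides exactly, and scores the remaining positions by counting IUPAC matches. The threshold of 14 out of 16 and the function names are hypothetical choices for illustration, not the scoring strategy used in the study, and only the forward strand is scanned.

```python
# IUPAC codes needed for the consensus below (W = A or T, N = any base)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}

# Commonly cited Sigma-54 -24/-12 consensus: TGGCACG-N4-TTGCW
CONSENSUS = "TGGCACGNNNNTTGCW"

def match_score(window, consensus=CONSENSUS):
    """Number of positions where the window base is allowed by the IUPAC code."""
    return sum(1 for base, code in zip(window, consensus) if base in IUPAC[code])

def scan(seq, min_score=14, consensus=CONSENSUS):
    """Return (offset, score) for forward-strand windows passing the hypothetical threshold."""
    seq = seq.upper()
    w = len(consensus)
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        # Require the invariant GG (-24) and GC (-12) dinucleotides exactly
        if window[1:3] != "GG" or window[13:15] != "GC":
            continue
        score = match_score(window, consensus)
        if score >= min_score:
            hits.append((i, score))
    return hits
```

    For example, scanning "AAAA" + "TGGCACGATCGTTGCA" + "AAAA" reports a single full-scoring hit at offset 4. Requiring the GG and GC anchors before scoring mirrors their strong conservation in Sigma-54 promoters and prunes most windows cheaply.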