262 research outputs found

    Correlation between nucleotide composition and folding energy of coding sequences with special attention to wobble bases

    Get PDF
    Background: The secondary structure and complexity of mRNA influences its accessibility to regulatory molecules (proteins, micro-RNAs), its stability and its level of expression. The mobile elements of the RNA sequence, the wobble bases, are expected to regulate the formation of structures encompassing coding sequences. Results: The sequence/folding energy (FE) relationship was studied by statistical, bioinformatic methods in 90 CDS containing 26,370 codons. I found that the FE (dG) associated with coding sequences is significant and negative (407 kcal/1000 bases, mean +/- S.E.M.) indicating that these sequences are able to form structures. However, the FE has only a small free component, less than 10% of the total. The contribution of the 1st and 3rd codon bases to the FE is larger than the contribution of the 2nd (central) bases. It is possible to achieve a ~ 4-fold change in FE by altering the wobble bases in synonymous codons. The sequence/FE relationship can be described with a simple algorithm, and the total FE can be predicted solely from the sequence composition of the nucleic acid. The contributions of different synonymous codons to the FE are additive and one codon cannot replace another. The accumulated contributions of synonymous codons of an amino acid to the total folding energy of an mRNA is strongly correlated to the relative amount of that amino acid in the translated protein. Conclusion: Synonymous codons are not interchangable with regard to their role in determining the mRNA FE and the relative amounts of amino acids in the translated protein, even if they are indistinguishable in respect of amino acid coding.Comment: 14 pages including 6 figures and 1 tabl

    Inferring Binding Energies from Selected Binding Sites

    Get PDF
    We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor in which the initial randomized library and the selected sites are sequenced with next generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms

    Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

    Get PDF
    Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and most intergenic miRNAs transcribed from their own RNA polymerase II (Pol II) promoter. However, the length of the primary transcripts and promoter organization is currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al

    Formation of regulatory modules by local sequence duplication

    Get PDF
    Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

    Rapid dissection and model-based optimization of inducible enhancers in human cells using a massively parallel reporter assay

    Get PDF
    Learning to read and write the transcriptional regulatory code is of central importance to progress in genetic analysis and engineering. Here we describe a massively parallel reporter assay (MPRA) that facilitates the systematic dissection of transcriptional regulatory elements. In MPRA, microarray-synthesized DNA regulatory elements and unique sequence tags are cloned into plasmids to generate a library of reporter constructs. These constructs are transfected into cells and tag expression is assayed by high-throughput sequencing. We apply MPRA to compare >27,000 variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon-β enhancer. We first show that the resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution. We then use the data to train quantitative sequence-activity models (QSAMs) of the two enhancers. We show that QSAMs from two cellular states can be combined to design enhancer variants that optimize potentially conflicting objectives, such as maximizing induced activity while minimizing basal activity.National Human Genome Research Institute (U.S.) (grant R01HG004037)National Science Foundation (U.S.) ((NSF) grant PHY-0957573)National Science Foundation (U.S.) (NSF grant PHY-1022140)Broad Institut

    A systematic, large-scale comparison of transcription factor binding site models

    Get PDF
    Background The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified “real” in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBS. All models were assessed by receiver operating characteristics (ROC) analysis. Results While the area- under-curve was low for most of the tested models with only 47 % reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs. Conclusions Our results show that only few of the matrix-based models used to predict potential TFBS are able to reliably detect experimentally confirmed TFBS. We compiled our findings in a freely accessible web application called ePOSSUM (http:/mutationtaster.charite.de/ePOSSUM/) which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites

    Discriminative motif discovery in DNA and protein sequences using the DEME algorithm

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms.</p> <p>Results</p> <p>We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences; a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four, synthetic, discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins.</p> <p>Conclusion</p> <p>Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good, but not better than non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME is free for academic use and is available at <url>http://bioinformatics.org.au/deme/</url></p

    Identification of Synaptic Targets of Drosophila Pumilio

    Get PDF
    Drosophila Pumilio (Pum) protein is a translational regulator involved in embryonic patterning and germline development. Recent findings demonstrate that Pum also plays an important role in the nervous system, both at the neuromuscular junction (NMJ) and in long-term memory formation. In neurons, Pum appears to play a role in homeostatic control of excitability via down regulation of para, a voltage gated sodium channel, and may more generally modulate local protein synthesis in neurons via translational repression of eIF-4E. Aside from these, the biologically relevant targets of Pum in the nervous system remain largely unknown. We hypothesized that Pum might play a role in regulating the local translation underlying synapse-specific modifications during memory formation. To identify relevant translational targets, we used an informatics approach to predict Pum targets among mRNAs whose products have synaptic localization. We then used both in vitro binding and two in vivo assays to functionally confirm the fidelity of this informatics screening method. We find that Pum strongly and specifically binds to RNA sequences in the 3′UTR of four of the predicted target genes, demonstrating the validity of our method. We then demonstrate that one of these predicted target sequences, in the 3′UTR of discs large (dlg1), the Drosophila PSD95 ortholog, can functionally substitute for a canonical NRE (Nanos response element) in vivo in a heterologous functional assay. Finally, we show that the endogenous dlg1 mRNA can be regulated by Pumilio in a neuronal context, the adult mushroom bodies (MB), which is an anatomical site of memory storage
    corecore