Correlation between nucleotide composition and folding energy of coding sequences with special attention to wobble bases
Background: The secondary structure and complexity of mRNA influences its
accessibility to regulatory molecules (proteins, micro-RNAs), its stability and
its level of expression. The mobile elements of the RNA sequence, the wobble
bases, are expected to regulate the formation of structures encompassing coding
sequences.
Results: The sequence/folding energy (FE) relationship was studied by
statistical, bioinformatic methods in 90 CDS containing 26,370 codons. I found
that the FE (dG) associated with coding sequences is significant and negative
(407 kcal/1000 bases, mean +/- S.E.M.) indicating that these sequences are able
to form structures. However, the FE has only a small free component, less than
10% of the total. The contribution of the 1st and 3rd codon bases to the FE is
larger than the contribution of the 2nd (central) bases. It is possible to
achieve a ~ 4-fold change in FE by altering the wobble bases in synonymous
codons. The sequence/FE relationship can be described with a simple algorithm,
and the total FE can be predicted solely from the sequence composition of the
nucleic acid. The contributions of different synonymous codons to the FE are
additive and one codon cannot replace another. The accumulated contributions of
synonymous codons of an amino acid to the total folding energy of an mRNA are
strongly correlated to the relative amount of that amino acid in the translated
protein.
Conclusion: Synonymous codons are not interchangeable with regard to their
role in determining the mRNA FE and the relative amounts of amino acids in the
translated protein, even if they are indistinguishable in respect of amino acid
coding.
Comment: 14 pages including 6 figures and 1 table
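The abstract above compares the contributions of the 1st, 2nd and 3rd codon bases. A natural first step for that kind of analysis is to tabulate base composition separately at each codon position. The sketch below does only that; it is not the author's algorithm (which the abstract does not spell out), and the function name `codon_position_composition` is hypothetical.

```python
from collections import Counter


def codon_position_composition(cds: str) -> list[dict[str, float]]:
    """Return base frequencies at codon positions 1, 2 and 3 of a coding sequence.

    Illustrative helper only (hypothetical name). DNA input is converted to
    RNA letters, and a trailing partial codon is ignored.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    composition = []
    for pos in range(3):
        # Every third base starting at this codon position.
        counts = Counter(cds[pos:usable:3])
        total = sum(counts.values())
        composition.append({base: counts.get(base, 0) / total for base in "ACGU"})
    return composition


if __name__ == "__main__":
    # Third-position (wobble) composition of a toy three-codon sequence.
    print(codon_position_composition("AUGGCUUAA")[2])
```

Relating these per-position frequencies to folding energy would additionally need an RNA folding program, which is outside the scope of this sketch.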
Inferring Binding Energies from Selected Binding Sites
We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor in which the initial randomized library and the selected sites are sequenced with next generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms
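The non-linear relationship between binding energy and site selection mentioned above is often written as a two-state occupancy function. The sketch below assumes that commonly used form, with energies in units of kT and an optional sequence-independent non-specific term. The abstract does not state the exact parameterization, so the formula and the name `binding_probability` are assumptions for illustration.

```python
import math


def binding_probability(energy, mu, e_nonspecific=None):
    """Probability that a site is bound under a two-state occupancy model.

    Assumed form (not taken from the abstract): all energies are in units of
    kT, `mu` is the chemical potential of the transcription factor, and
    `e_nonspecific` is an optional sequence-independent binding energy.
    """
    # Boltzmann weight of the bound state(s).
    weight = math.exp(-energy)
    if e_nonspecific is not None:
        weight += math.exp(-e_nonspecific)
    # Equivalent to 1 / (1 + exp(energy - mu)) when there is no non-specific term.
    return weight / (weight + math.exp(-mu))


if __name__ == "__main__":
    for e in (-2.0, 0.0, 2.0, 6.0):
        print(e, round(binding_probability(e, mu=0.0, e_nonspecific=4.0), 4))
```

Because the probability saturates for strongly bound sites and is floored by non-specific binding for weak ones, the selected-site frequencies are not a simple exponential function of energy, which is the non-linearity the abstract refers to.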
Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data
Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoter. However, the length of the primary transcripts and promoter organization are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al.
Purifying Selection in Deeply Conserved Human Enhancers Is More Consistent than in Coding Sequences
(c) 2014 De Silva et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Formation of regulatory modules by local sequence duplication
Turnover of regulatory sequence and function is an important part of
molecular evolution. But what are the modes of sequence evolution leading to
rapid formation and loss of regulatory sites? Here, we show that a large
fraction of neighboring transcription factor binding sites in the fly genome
have formed from a common sequence origin by local duplications. This mode of
evolution is found to produce regulatory information: duplications can seed new
sites in the neighborhood of existing sites. Duplicate seeds evolve
subsequently by point mutations, often towards binding a different factor than
their ancestral neighbor sites. These results are based on a statistical
analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome,
and a comparison set of intergenic regulatory sequence in Saccharomyces
cerevisiae. In fly regulatory modules, pairs of binding sites show
significantly enhanced sequence similarity up to distances of about 50 bp. We
analyze these data in terms of an evolutionary model with two distinct modes of
site formation: (i) evolution from independent sequence origin and (ii)
divergent evolution following duplication of a common ancestor sequence. Our
results suggest that pervasive formation of binding sites by local sequence
duplications distinguishes the complex regulatory architecture of higher
eukaryotes from the simpler architecture of unicellular organisms
Review of the algal biology program within the National Alliance for Advanced Biofuels and Bioproducts
In 2010, when the National Alliance for Advanced Biofuels and Bioproducts (NAABB) consortium began, little was known about the molecular basis of algal biomass or oil production. Very few algal genome sequences were available and efforts to identify the best-producing wild species through bioprospecting approaches had largely stalled after the U.S. Department of Energy's Aquatic Species Program. This lack of knowledge included how reduced carbon was partitioned into storage products like triglycerides or starch and the role played by metabolite remodeling in the accumulation of energy-dense storage products. Furthermore, genetic transformation and metabolic engineering approaches to improve algal biomass and oil yields were in their infancy. Genome sequencing and transcriptional profiling were becoming less expensive, however; and the tools to annotate gene expression profiles under various growth and engineered conditions were just starting to be developed for algae. It was in this context that an integrated algal biology program was introduced in the NAABB to address the greatest constraints limiting algal biomass yield. This review describes the NAABB algal biology program, including hypotheses, research objectives, and strategies to move algal biology research into the twenty-first century and to realize the greatest potential of algae biomass systems to produce biofuels
Rapid dissection and model-based optimization of inducible enhancers in human cells using a massively parallel reporter assay
Learning to read and write the transcriptional regulatory code is of central importance to progress in genetic analysis and engineering. Here we describe a massively parallel reporter assay (MPRA) that facilitates the systematic dissection of transcriptional regulatory elements. In MPRA, microarray-synthesized DNA regulatory elements and unique sequence tags are cloned into plasmids to generate a library of reporter constructs. These constructs are transfected into cells and tag expression is assayed by high-throughput sequencing. We apply MPRA to compare >27,000 variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon-β enhancer. We first show that the resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution. We then use the data to train quantitative sequence-activity models (QSAMs) of the two enhancers. We show that QSAMs from two cellular states can be combined to design enhancer variants that optimize potentially conflicting objectives, such as maximizing induced activity while minimizing basal activity.
National Human Genome Research Institute (U.S.) (grant R01HG004037); National Science Foundation (U.S.) (NSF grant PHY-0957573); National Science Foundation (U.S.) (NSF grant PHY-1022140); Broad Institut
A systematic, large-scale comparison of transcription factor binding site models
Background: The modelling of gene regulation is a major challenge in biomedical
research. This process is dominated by transcription factors (TFs) and
mutations in their binding sites (TFBSs) may cause the misregulation of genes,
eventually leading to disease. The consequences of DNA variants on TF binding
are modelled in silico using binding matrices, but it remains unclear whether
these are capable of accurately representing in vivo binding. In this study,
we present a systematic comparison of binding models for 82 human TFs from
three freely available sources: JASPAR matrices, HT-SELEX-generated models and
matrices derived from protein binding microarrays (PBMs). We determined their
ability to detect experimentally verified “real” in vivo TFBSs derived from
ENCODE ChIP-seq data. As negative controls we chose random downstream exonic
sequences, which are unlikely to harbour TFBS. All models were assessed by
receiver operating characteristic (ROC) analysis. Results: While the area
under the curve was low for most of the tested models, with only 47% reaching a
score of 0.7 or higher, we noticed strong differences between the various
position-specific scoring matrices with JASPAR and HT-SELEX models showing
higher success rates than PBM-derived models. In addition, we found that while
TFBS sequences showed a higher degree of conservation than randomly chosen
sequences, there was a high variability between individual TFBSs. Conclusions:
Our results show that only a few of the matrix-based models used to predict
potential TFBS are able to reliably detect experimentally confirmed TFBS. We
compiled our findings in a freely accessible web application called ePOSSUM
(http://mutationtaster.charite.de/ePOSSUM/), which uses a Bayes classifier to
assess the impact of genetic alterations on TF binding in user-defined
sequences. Additionally, ePOSSUM provides information on the reliability of
the prediction using our test set of experimentally confirmed binding sites
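The evaluation described above (score sequences with a position-specific scoring matrix, then summarize the separation of true sites from negative controls by ROC analysis) can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the toy matrix, the sequences and the function names `pssm_score`, `best_window_score` and `roc_auc` are all hypothetical.

```python
def pssm_score(site, pssm):
    """Sum the per-position scores of `site` under a PSSM (a list of dicts)."""
    return sum(column[base] for base, column in zip(site, pssm))


def best_window_score(seq, pssm):
    """Best PSSM score over all windows of `seq`; `seq` must be at least motif length."""
    width = len(pssm)
    return max(pssm_score(seq[i:i + width], pssm)
               for i in range(len(seq) - width + 1))


def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve, computed as the fraction of positive/negative
    pairs ranked correctly (ties count as half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Toy matrix favouring the motif "ACG" (made-up scores, for illustration only).
    toy_pssm = [
        {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
        {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
        {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    ]
    positives = [best_window_score(s, toy_pssm) for s in ("TTACGTT", "GACGA")]
    negatives = [best_window_score(s, toy_pssm) for s in ("TTTTTT", "GGATTC")]
    print(roc_auc(positives, negatives))
```

An AUC of 0.5 corresponds to random ranking, which is why the abstract treats a score of 0.7 or higher as the threshold for a usable model.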
Discriminative motif discovery in DNA and protein sequences using the DEME algorithm
Background: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms.
Results: We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins.
Conclusion: Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at http://bioinformatics.org.au/deme/
Identification of Synaptic Targets of Drosophila Pumilio
Drosophila Pumilio (Pum) protein is a translational regulator involved in embryonic patterning and germline development. Recent findings demonstrate that Pum also plays an important role in the nervous system, both at the neuromuscular junction (NMJ) and in long-term memory formation. In neurons, Pum appears to play a role in homeostatic control of excitability via downregulation of para, a voltage-gated sodium channel, and may more generally modulate local protein synthesis in neurons via translational repression of eIF-4E. Aside from these, the biologically relevant targets of Pum in the nervous system remain largely unknown. We hypothesized that Pum might play a role in regulating the local translation underlying synapse-specific modifications during memory formation. To identify relevant translational targets, we used an informatics approach to predict Pum targets among mRNAs whose products have synaptic localization. We then used both in vitro binding and two in vivo assays to functionally confirm the fidelity of this informatics screening method. We find that Pum strongly and specifically binds to RNA sequences in the 3′UTR of four of the predicted target genes, demonstrating the validity of our method. We then demonstrate that one of these predicted target sequences, in the 3′UTR of discs large (dlg1), the Drosophila PSD95 ortholog, can functionally substitute for a canonical NRE (Nanos response element) in vivo in a heterologous functional assay. Finally, we show that the endogenous dlg1 mRNA can be regulated by Pumilio in a neuronal context, the adult mushroom bodies (MB), which are an anatomical site of memory storage.
