7,650 research outputs found

    Cryptic transcripts from a ubiquitous plasmid origin of replication confound tests for cis-regulatory function.

    A vast amount of research on the regulation of gene expression has relied on plasmid reporter assays. In this study, we show that plasmids widely used for this purpose constitutively produce substantial amounts of RNA from a TATA-containing cryptic promoter within the origin of replication. Readthrough of these RNAs into the intended transcriptional unit potently stimulated reporter activity when the inserted test sequence contained a 3' splice site (ss). We show that two human sequences, originally reported to be internal ribosome entry sites and later to instead be promoters, mimic both types of element in dicistronic reporter assays by causing these cryptic readthrough transcripts to splice in patterns that allow efficient translation of the downstream cistron. Introduction of test sequences containing 3' ss into monocistronic luciferase reporter vectors widely used in the study of transcriptional regulation also created the false appearance of promoter function via the same mechanism. Across a large number of variants of these plasmids, we found a very highly significant correlation between reporter activity and levels of such spliced readthrough transcripts. Computational estimation of the frequency of cryptic 3' ss in genomic sequences suggests that misattribution of cis-regulatory function may be a common occurrence.

    TISs-ST: a web server to evaluate polymorphic translation initiation sites and their reflections on the secretory targets

    Background: The nucleotide sequence flanking the translation initiation codon (start codon context) affects the translational efficiency of eukaryotic mRNAs, and may indicate the presence of an alternative translation initiation site (TIS) to produce proteins with different properties. Multi-targeting may reflect the translational variability of these other protein forms. In this paper we present a web server that performs computations to investigate the usage of alternative translation initiation sites for the synthesis of new protein variants that might have different functions. Results: An efficient web-based tool entitled TISs-ST (Translation Initiation Sites and Secretory Targets) evaluates putative translation initiation sites and reports whether a signal peptide is predicted for the protein encoded from each site. The TISs-ST web server is freely available to both academic and commercial users and can be accessed at http://ipe.cbmeg.unicamp.br/pub/TISs-ST. Conclusion: The program can be used to evaluate alternative translation initiation site consensus with user-specified sequences, based on their composition or on many position weight matrix models. TISs-ST provides analytical and visualization tools for evaluating the periodic frequency, the consensus pattern and the total information content of a sequence data set. A search option allows for the identification of signal peptides from predicted proteins using the PrediSi software.
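The "total information content of a sequence data set" mentioned in the abstract above can be illustrated with a minimal sketch. It assumes equal-length, gap-free DNA sequences aligned on the start codon and uses the plain Shannon formulation (2 bits minus the column entropy) with no small-sample correction. The function names are hypothetical, and this is not the TISs-ST implementation.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Information content in bits of one alignment column.

    Assumption: uniform background, so the maximum is log2(len(alphabet)).
    """
    counts = Counter(column)
    total = sum(counts[base] for base in alphabet)
    if total == 0:
        raise ValueError("column contains no symbols from the alphabet")
    entropy = 0.0
    for base in alphabet:
        if counts[base]:
            p = counts[base] / total
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy


def total_information(sequences):
    """Sum the per-column information over equal-length aligned sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*sequences) yields the alignment column by column.
    return sum(column_information(col) for col in zip(*sequences))


if __name__ == "__main__":
    # Hypothetical start-codon contexts, for illustration only.
    sites = ["ACCATGG", "GCCATGG", "ACCATGA", "AACATGG"]
    print(round(total_information(sites), 3))
```

A fully conserved column contributes 2 bits and a column with all four bases equally frequent contributes 0, so conserved positions around the ATG dominate the total.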

    Deep sequencing of pre-translational mRNPs reveals hidden flux through evolutionarily conserved AS-NMD pathways

    Deep sequencing of mRNAs (RNA-Seq) is now the preferred method for transcriptome-wide quantification of gene expression. Yet many mRNA isoforms, such as those eliminated by nonsense-mediated decay (NMD), are inherently unstable. Thus a significant drawback of steady-state RNA-Seq is that it provides marginal information on the flux through alternative splicing pathways. Measurement of such flux necessitates capture of newly made species prior to mRNA decay. One means to capture nascent mRNAs is affinity purifying either the exon junction complex (EJC) or activated spliceosomes. Late-stage spliceosomes deposit the EJC upstream of exon-exon junctions, where it remains associated until the first round of translation. As most mRNA decay pathways are translation-dependent, these EJC- or spliceosome-associated, pre-translational mRNAs should provide an accurate record of the initial population of alternate mRNA isoforms. Previous work has analyzed the protein composition and structure of pre-translational mRNPs in detail. While in the Moore lab, my project has focused on exploring the diversity of mRNA isoforms contained within these complexes. As expected, known NMD isoforms are more highly represented in pre-translational mRNPs than in RNA-Seq libraries. To investigate whether pre-translational mRNPs contain novel mRNA isoforms, we created a bioinformatics pipeline that identified thousands of previously unannotated splicing events. Though many can be attributed to “splicing noise”, others are evolutionarily conserved events that produce new AS-NMD isoforms likely involved in maintenance of protein homeostasis. Several of these occur in genes whose overexpression has been linked to poor cancer prognosis.

    Learning the Regulatory Code of Gene Expression

    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.

    Characterization of the four genes encoding cytoplasmic ribosomal protein S15a in Arabidopsis thaliana

    Eukaryotic cytosolic ribosomes are composed of two distinct subunits consisting of four individual ribosomal RNAs and, in Arabidopsis thaliana, 81 ribosomal proteins. Functional subunit assembly is dependent on the production of each ribosomal component. Arabidopsis thaliana r-protein genes exist in multi-gene families ranging in size from two to seven transcriptionally active members. The cytosolic RPS15a gene family consists of four members (RPS15aA, -C, -D and -F) that, at the amino acid level, share 87-100% identity. Using semi-quantitative RT-PCR I have shown that RPS15aC is not expressed and that transcript abundance differs both spatially and temporally among the remaining RPS15a genes in non-treated Arabidopsis tissues and in seedlings following a variety of abiotic stresses. A comprehensive analysis of the RPS15a 5' regulatory regions (RRs) using a series of deletion constructs was used to determine the minimal region required for gene expression and identify putative cis-regulatory elements. Transcription start site mapping using 5' RACE indicated multiple sites of initiation for RPS15aA and -F and only a single site for RPS15aD while all three genes contain a leader intron upstream of the start codon. Analysis of reporter gene activity in transgenic Arabidopsis containing a series of 5' RR deletion::GUS fusions showed that, similar to previous RT-PCR results, there was a trend for mitotically active tissues to stain for GUS activity. Putative cis-elements including the TELO box, PCNA Site II motif and pollen specific elements were identified. However, there was not always a clear correlation between the presence of a putative element and RPS15a transcript abundance or GUS activity. Although variation in transcriptional activity of each RPS15a gene has been observed, subcellular localization of both RPS15aA and -D in the nucleolus has been confirmed in planta by confocal microscopy. 
    The results of this thesis research suggest that, while all three active RPS15a genes are transcriptionally regulated, additional post-transcriptional and/or translational regulation may be responsible for final RPS15a levels, and differential isoform incorporation into ribosomal subunits may be the final point of r-protein regulation.

    Computational annotation of eukaryotic gene structures: algorithms development and software systems

    An important foundation for the advancement of both basic and applied biological science is correct annotation of protein-coding gene repertoires in model organisms. Accurate automated annotation of eukaryotic gene structures remains a challenging, open-ended and critical problem for modern computational biology. The use of extrinsic (homology) information has proved a quite successful strategy for this task, though it is not a perfect solution, for a variety of reasons. More recently, gene prediction methods leveraging information present in syntenic genomic sequences have gained favor, though these too have limitations. Identifying genes by inspection of genomic sequence alone thoroughly tests our theoretical understanding of the gene recognition process as it occurs in vivo, and where we encounter failure, excellent opportunities for meaningful research are revealed. Therefore, the continued development of methods not reliant on homology information (the so-called ab initio gene prediction methods) should help to more rapidly achieve a comprehensive understanding of gene content, at least in our model organisms. This thesis explores the development of novel algorithms in an attempt to advance the current state of the art in gene prediction, with particular emphasis on ab initio approaches. The work has been conducted with an eye towards contributing open source, well-documented, and extensible software systems implementing the methods, and towards generating novel biological knowledge with respect to plant taxa in particular.

    Alternative translation initiation unraveled by N-terminomics and ribosome profiling


    Conserved Secondary Structures in Aspergillus

    Background: Recent evidence suggests that the number and variety of functional RNAs (ncRNAs as well as cis-acting RNA elements within mRNAs) is much higher than previously thought; thus, the ability to computationally predict and analyze RNAs has taken on new importance. We have computationally studied the secondary structures in an alignment of six Aspergillus genomes. Little is known about the RNAs present in this set of fungi, and this diverse set of genomes has an optimal level of sequence conservation for observing the correlated evolution of base-pairs seen in RNAs. Methodology/Principal Findings: We report the results of a whole-genome search for evolutionarily conserved secondary structures, as well as the results of clustering these predicted secondary structures by structural similarity. We find a total of 7450 predicted secondary structures, including a new predicted ~60 bp long hairpin motif found primarily inside introns. We find no evidence for microRNAs. Different types of genomic regions are over-represented in different classes of predicted secondary structures. Exons contain the longest motifs (primarily long, branched hairpins), 5' UTRs primarily contain groupings of short hairpins located near the start codon, and 3' UTRs contain very little secondary structure compared to other regions. There is a large concentration of short hairpins just inside the boundaries of exons. The density of predicted intronic RNAs increases with the length of introns, and the density of predicted secondary structures within mRNA coding regions increases with the number of introns in a gene.

    Knowledge discovery and modeling in genomic databases

    This dissertation research is targeted toward developing effective and accurate methods for identifying gene structures in the genomes of higher eukaryotes, such as vertebrate organisms. Several effective hidden Markov models (HMMs) are developed to represent the consensus and degeneracy features of the functional sites, including protein-translation start sites and mRNA splicing junction donor and acceptor sites in vertebrate genes. The HMM system based on the developed models is fully trained using an expectation maximization (EM) algorithm and the system performance is evaluated using a 10-fold cross-validation method. Experimental results show that the proposed HMM system achieves high sensitivity and specificity in detecting the functional sites. This HMM system is then incorporated into a new gene detection system, called GeneScout. The main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to the analysis of paths in the graph G. A dynamic programming algorithm is employed by GeneScout to find the optimal path in G. Experimental results on the standard test dataset collected by Burset and Guigó indicate that GeneScout is comparable to existing gene discovery tools and complements the widely used GenScan system.
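The reduction described in the abstract above, from gene detection to finding an optimal path in a directed acyclic graph G, can be sketched generically as follows. The abstract does not say how GeneScout builds or scores its graph, so the integer node numbering, the edge scores and the function name `best_path` are all hypothetical. Nodes are assumed to be numbered in topological order (for example, candidate functional sites sorted by sequence position).

```python
def best_path(num_nodes, edges, source, sink):
    """Return (score, path) for the highest-scoring path from source to sink.

    edges: iterable of (u, v, score) with u < v, so that index order is a
    topological order of the DAG (an assumption of this sketch).
    """
    outgoing = {}
    for u, v, score in edges:
        if not u < v:
            raise ValueError("edges must point from a lower to a higher index")
        outgoing.setdefault(u, []).append((v, score))

    neg_inf = float("-inf")
    best = [neg_inf] * num_nodes   # best[v]: best score of any source->v path
    prev = [None] * num_nodes      # prev[v]: predecessor of v on that path
    best[source] = 0.0

    # Relax outgoing edges in topological order; each edge is visited once.
    for u in range(num_nodes):
        if best[u] == neg_inf:
            continue  # u is not reachable from the source
        for v, score in outgoing.get(u, []):
            if best[u] + score > best[v]:
                best[v] = best[u] + score
                prev[v] = u

    if best[sink] == neg_inf:
        return neg_inf, []

    # Walk the predecessor links back from the sink to recover the path.
    path = []
    node = sink
    while node is not None:
        path.append(node)
        node = prev[node]
    return best[sink], path[::-1]


if __name__ == "__main__":
    # Toy graph with made-up scores: two routes from node 0 to node 3.
    toy_edges = [(0, 1, 2.0), (0, 2, 1.0), (1, 3, 1.0), (2, 3, 5.0)]
    print(best_path(4, toy_edges, source=0, sink=3))
```

Because every edge is relaxed exactly once, the running time is linear in the number of nodes plus edges, which is what makes a path formulation practical for long genomic sequences.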