61 research outputs found

    Binding to SMN2 pre-mRNA-protein complex elicits specificity for small molecule splicing modifiers

    Small molecule splicing modifiers have been previously described that target the general splicing machinery and thus have low specificity for individual genes. Several potent molecules correcting the splicing deficit of the SMN2 (survival of motor neuron 2) gene have been identified and these molecules are moving towards a potential therapy for spinal muscular atrophy (SMA). Here, by using a combination of RNA splicing, transcription, and protein chemistry techniques, we show that these molecules directly bind to two distinct sites of the SMN2 pre-mRNA, thereby stabilizing an as yet unidentified ribonucleoprotein (RNP) complex that is critical to the specificity of these small molecules for SMN2 over other genes. In addition to the therapeutic potential of these molecules for treatment of SMA, our work has wide-ranging implications in understanding how small molecules can interact with specific quaternary RNA structures.

    Classification and function of small open reading frames

    Small open reading frames (smORFs) of 100 codons or fewer are usually, if arbitrarily, excluded from proteome annotations. Despite this, the genomes of many metazoans, including humans, contain millions of smORFs, some of which fulfil key physiological functions. Recently, the transcriptome of Drosophila melanogaster was shown to contain thousands of smORFs of different classes that actively undergo translation, which produces peptides of mostly unknown function. Here, we present a comprehensive analysis of smORFs in flies, mice and humans. We propose the existence of several functional classes of smORFs, ranging from inert DNA sequences to transcribed and translated cis-regulators of translation and peptides with a propensity to function as regulators of membrane-associated proteins, or as components of ancient protein complexes in the cytoplasm. We suggest that the different smORF classes could represent steps in gene, peptide and protein evolution. Our analysis introduces a distinction between different peptide-coding classes of smORFs in animal genomes, and highlights the role of model organisms for the study of small peptide biology in the context of development, physiology and human disease.

    Long noncoding RNAs are rarely translated in two human cell lines

    Data from the Encyclopedia of DNA Elements (ENCODE) project show over 9640 human genome loci classified as long noncoding RNAs (lncRNAs), yet only ∼100 have been deeply characterized to determine their role in the cell. To measure the protein-coding output from these RNAs, we jointly analyzed two recent data sets produced in the ENCODE project: tandem mass spectrometry (MS/MS) data mapping expressed peptides to their encoding genomic loci, and RNA-seq data generated by ENCODE in long polyA+ and polyA− fractions in the cell lines K562 and GM12878. We used the machine-learning algorithm RuleFit3 to regress the peptide data against RNA expression data. The most important covariate for predicting translation was, surprisingly, the Cytosol polyA− fraction in both cell lines. LncRNAs are ∼13-fold less likely to produce detectable peptides than similar mRNAs, indicating that ∼92% of GENCODE v7 lncRNAs are not translated in these two ENCODE cell lines. Intersecting 9640 lncRNA loci with 79,333 peptides yielded 85 unique peptides matching 69 lncRNAs. Most cases were due to a coding transcript misannotated as lncRNA. Two exceptions were an unprocessed pseudogene and a bona fide lncRNA gene, both with open reading frames (ORFs) compromised by upstream stop codons. All potentially translatable lncRNA ORFs had only a single peptide match, indicating low protein abundance and/or false-positive peptide matches. We conclude that, with very few exceptions, ribosomes are able to distinguish coding from noncoding transcripts and, hence, that ectopic translation and cryptic mRNAs are rare in the human lncRNAome.

    An integrated encyclopedia of DNA elements in the human genome

    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure, and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and an expansive resource of functional annotations for biomedical research.