80 research outputs found

    Alu Exonization Events Reveal Features Required for Precise Recognition of Exons by the Splicing Machinery

    Despite decades of research, the question of how the mRNA splicing machinery precisely identifies short exonic islands within the vast intronic oceans remains to a large extent obscure. In this study, we analyzed Alu exonization events, aiming to understand the requirements for correct selection of exons. Comparison of exonizing Alus to their non-exonizing counterparts is informative because Alus in these two groups have retained high sequence similarity but are perceived differently by the splicing machinery. We identified and characterized numerous features used by the splicing machinery to discriminate between Alu exons and their non-exonizing counterparts. Of these, the most novel is secondary structure: Alu exons in general and their 5′ splice sites (5′ss) in particular are characterized by decreased stability of local secondary structures with respect to their non-exonizing counterparts. We detected numerous further differences between Alu exons and their non-exonizing counterparts, among others in terms of exon–intron architecture and strength of splicing signals, enhancers, and silencers. Support vector machine analysis revealed that these features allow a high level of discrimination (AUC = 0.91) between exonizing and non-exonizing Alus. Moreover, the computationally derived probabilities of exonization significantly correlated with the biological inclusion level of the Alu exons, and the model could also be extended to general datasets of constitutive and alternative exons. This indicates that the features detected and explored in this study provide the basis not only for precise exon selection but also for the fine-tuned regulation thereof, manifested in cases of alternative splicing.

    Characteristics of transposable element exonization within human and mouse

    Insertion of transposed elements within mammalian genes is thought to be an important contributor to mammalian evolution and speciation. Insertion of transposed elements into introns can lead to their activation as alternatively spliced cassette exons, an event called exonization. Elucidation of the evolutionary constraints that have shaped fixation of transposed elements within human and mouse protein coding genes and subsequent exonization is important for understanding of how the exonization process has affected transcriptome and proteome complexities. Here we show that exonization of transposed elements is biased towards the beginning of the coding sequence in both human and mouse genes. Analysis of single nucleotide polymorphisms (SNPs) revealed that exonization of transposed elements can be population-specific, implying that exonizations may enhance divergence and lead to speciation. SNP density analysis revealed differences between Alu and other transposed elements. Finally, we identified cases of primate-specific Alu elements that depend on RNA editing for their exonization. These results shed light on TE fixation and the exonization process within human and mouse genes. Comment: 11 pages, 4 figures

    The contribution of Alu exons to the human proteome.

    Background: Alu elements are major contributors to lineage-specific new exons in primate and human genomes. Recent studies indicate that some Alu exons have high transcript inclusion levels or tissue-specific splicing profiles, and may play important regulatory roles in modulating mRNA degradation or translational efficiency. However, the contribution of Alu exons to the human proteome remains unclear and controversial. The prevailing view is that exons derived from young repetitive elements, such as Alu elements, are restricted to regulatory functions and have not had adequate evolutionary time to be incorporated into stable, functional proteins. Results: We adopt a proteotranscriptomics approach to systematically assess the contribution of Alu exons to the human proteome. Using RNA sequencing, ribosome profiling, and proteomics data from human tissues and cell lines, we provide evidence for the translational activities of Alu exons and the presence of Alu exon derived peptides in human proteins. These Alu exon peptides represent species-specific protein differences between primates and other mammals, and in certain instances between humans and closely related primates. In the case of the RNA editing enzyme ADARB1, which contains an Alu exon peptide in its catalytic domain, RNA sequencing analyses of A-to-I editing demonstrate that both the Alu exon skipping and inclusion isoforms encode active enzymes. The Alu exon derived peptide may fine tune the overall editing activity and, in limited cases, the site selectivity of ADARB1 protein products. Conclusions: Our data indicate that Alu elements have contributed to the acquisition of novel protein sequences during primate and human evolution.

    An Alu-derived intronic splicing enhancer facilitates intronic processing and modulates aberrant splicing in ATM

    We have previously reported a natural GTAA deletion within an intronic splicing processing element (ISPE) of the ataxia telangiectasia mutated (ATM) gene that disrupts a non-canonical U1 snRNP interaction and activates the excision of the upstream portion of the intron. The resulting pre-mRNA splicing intermediate is then processed to a cryptic exon, whose aberrant inclusion in the final mRNA is responsible for ataxia telangiectasia. We show here that the last 40 bases of a downstream intronic antisense Alu repeat are required for the activation of the cryptic exon by the ISPE deletion. Evaluation of the pre-mRNA splicing intermediate by a hybrid minigene assay indicates that the identified intronic splicing enhancer represents a novel class of enhancers that facilitates processing of splicing intermediates possibly by recruiting U1 snRNP to defective donor sites. In the absence of this element, the splicing intermediate accumulates and is not further processed to generate the cryptic exon. Our results indicate that Alu-derived sequences can provide intronic splicing regulatory elements that facilitate pre-mRNA processing and potentially affect the severity of disease-causing splicing mutations.

    Defective splicing, disease and therapy: searching for master checkpoints in exon definition

    The number of aberrant splicing processes causing human disease is growing exponentially and many recent studies have uncovered some aspects of the unexpectedly complex network of interactions involved in these dysfunctions. As a consequence, our knowledge of the various cis- and trans-acting factors playing a role in both normal and aberrant splicing pathways has been enhanced greatly. However, the resulting information explosion has also uncovered the fact that many splicing systems are not easy to model. In fact, we are still unable to predict with certainty the outcome of a given genomic variation. Nonetheless, in the midst of all this complexity some hard-won lessons have been learned, and in this survey we will focus on the importance of the wide sequence context when trying to understand why apparently similar mutations can give rise to different effects. The examples discussed in this summary will highlight the fine ‘balance of power’ that is often present between all the various regulatory elements that define exon boundaries. In the final part, we shall then discuss possible therapeutic targets and strategies to rescue genetic defects of complex splicing systems.

    The Origins, Evolution, and Functional Potential of Alternative Splicing in Vertebrates

    Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6 times excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. Our data support a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution.

    Rapidly evolving protointrons in Saccharomyces genomes revealed by a hungry spliceosome.

    Introns are a prevalent feature of eukaryotic genomes, yet their origins and contributions to genome function and evolution remain mysterious. In budding yeast, repression of the highly transcribed intron-containing ribosomal protein genes (RPGs) globally increases splicing of non-RPG transcripts through reduced competition for the spliceosome. We show that under these "hungry spliceosome" conditions, splicing occurs at more than 150 previously unannotated locations we call protointrons that do not overlap known introns. Protointrons use a less constrained set of splice sites and branchpoints than standard introns, including in one case AT-AC in place of GT-AG. Protointrons are not conserved in all closely related species, suggesting that most are not under positive selection and are fated to disappear. Some are found in non-coding RNAs (e.g., CUTs and SUTs), where they may contribute to the creation of new genes. Others are found across boundaries between noncoding and coding sequences, or within coding sequences, where they offer pathways to the creation of new protein variants, or new regulatory controls for existing genes. We define protointrons as (1) nonconserved intron-like sequences that are (2) infrequently spliced, and importantly (3) are not currently understood to contribute to gene expression or regulation in the way that standard introns function. A very few protointrons in S. cerevisiae challenge this classification by their increased splicing frequency and potential function, consistent with the proposed evolutionary process of "intronization", whereby new standard introns are created. This snapshot of intron evolution highlights the important role of the spliceosome in the expansion of transcribed genomic sequence space, providing a pathway for the rare events that may lead to the birth of new eukaryotic genes and the refinement of existing gene function.

    Transcriptome innovations in primates revealed by single-molecule long-read sequencing

    Transcriptomic diversity greatly contributes to the fundamentals of disease, lineage-specific biology, and environmental adaptation. However, much of the actual isoform repertoire contributing to shaping primate evolution remains unknown. Here, we combined deep long- and short-read sequencing complemented with mass spectrometry proteomics in a panel of lymphoblastoid cell lines (LCLs) from human, three other great apes, and rhesus macaque, producing the largest full-length isoform catalog in primates to date. Around half of the captured isoforms are not annotated in their reference genomes, significantly expanding the gene models in primates. Furthermore, our comparative analyses unveil hundreds of transcriptomic innovations and isoform usage changes related to immune function and immunological disorders. The confluence of these evolutionary innovations with signals of positive selection and their limited impact in the proteome points to changes in alternative splicing in genes involved in immune response as an important target of recent regulatory divergence in primates.

    This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31020000); National Key R&D Program of China (China's Ministry of Science and Technology [MoST]) grant 2018YFC1406901; the International Partnership Program of the Chinese Academy of Sciences (no. 152453KYSB20170002); the Carlsberg Foundation (CF16-0663); the Villum Foundation (no. 25900) to G.Z.; and the La Caixa Foundation (ID 100010434) Fellowship Code LCF/BQ/DE16/11570011 (L.F.-P.). The Center for Genomic Regulation (CRG) / Universitat Pompeu Fabra (UPF) Proteomics Unit is part of the Spanish Infrastructure for Omics Technologies (National Map of Unique Scientific and Technical Infrastructures [ICTS] OmicsTech) and a member of the ProteoRed PRB3 Consortium, which is supported by grant PT17/0019 of the PE I + D + i 2013–2016 from the Instituto de Salud Carlos III (ISCIII), European Regional Development Fund (ERDF), and “Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya” (2017SGR595). T.M.-B. is supported by funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 864203), BFU2017-86471-P (MINECO/FEDER, UE); “Unidad de Excelencia María de Maeztu,” funded by the Agencia Estatal de Investigación (AEI) (CEX2018-000792-M); Howard Hughes International Early Career; National Institutes of Health 1R01HG010898-01A1; and Secretaria d'Universitats i Recerca and Centres de Recerca de Catalunya (CERCA) Programme del Departament d'Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880).

