    The contribution of Alu exons to the human proteome.

    Background: Alu elements are major contributors to lineage-specific new exons in primate and human genomes. Recent studies indicate that some Alu exons have high transcript inclusion levels or tissue-specific splicing profiles, and may play important regulatory roles in modulating mRNA degradation or translational efficiency. However, the contribution of Alu exons to the human proteome remains unclear and controversial. The prevailing view is that exons derived from young repetitive elements, such as Alu elements, are restricted to regulatory functions and have not had adequate evolutionary time to be incorporated into stable, functional proteins.
    Results: We adopt a proteotranscriptomics approach to systematically assess the contribution of Alu exons to the human proteome. Using RNA sequencing, ribosome profiling, and proteomics data from human tissues and cell lines, we provide evidence for the translational activities of Alu exons and the presence of Alu exon-derived peptides in human proteins. These Alu exon peptides represent species-specific protein differences between primates and other mammals, and in certain instances between humans and closely related primates. In the case of the RNA editing enzyme ADARB1, which contains an Alu exon peptide in its catalytic domain, RNA sequencing analyses of A-to-I editing demonstrate that both the Alu exon skipping and inclusion isoforms encode active enzymes. The Alu exon-derived peptide may fine-tune the overall editing activity and, in limited cases, the site selectivity of ADARB1 protein products.
    Conclusions: Our data indicate that Alu elements have contributed to the acquisition of novel protein sequences during primate and human evolution.
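    The A-to-I editing analysis rests on per-site editing levels. A minimal sketch, assuming A and G base counts have already been extracted at known editing sites on the transcript strand: because inosine pairs with C during reverse transcription, edited sites are read as G, so the editing level is the fraction of G reads among A+G reads. The function names and the 10-read coverage cutoff are illustrative assumptions, not the study's pipeline.

```python
def editing_level(a_count, g_count):
    """Editing level at one site: fraction of G reads among A+G reads.

    Inosine pairs with C during reverse transcription, so edited A sites
    are read as G on the transcript strand.
    """
    total = a_count + g_count
    return g_count / total if total else float("nan")


def mean_editing_level(sites, min_coverage=10):
    """Mean editing level over (a_count, g_count) sites with enough A+G reads.

    min_coverage is an illustrative threshold, not a value from the study.
    """
    levels = [editing_level(a, g) for a, g in sites if a + g >= min_coverage]
    return sum(levels) / len(levels) if levels else float("nan")


sites = [(75, 25), (50, 50), (1, 1)]  # last site is dropped for low coverage
print(mean_editing_level(sites))  # 0.375
```

    Computing this mean separately for samples expressing each ADARB1 isoform would give a crude comparison of overall editing activity; site selectivity would instead require comparing levels site by site.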

    Statistical methods for the analysis and interpretation of RNA-Seq data

    In the post-genomic era, sequencing technologies have become a vital tool in the global analysis of biological systems. RNA-Seq, the sequencing of messenger RNA, in particular has the potential to answer many diverse questions about the inner workings of cells. Despite the decreasing cost of sequencing, the majority of RNA-Seq experiments still suffer from low replication. The statistical methodology for dealing with low-replicate RNA-Seq experiments is still in its infancy and has room for further development. Incorporating additional information from publicly accessible databases may provide a plausible way to overcome the shortcomings of low replication: such information could not only improve the ability to detect statistically significant signal but also make that signal more biologically interpretable. This thesis addresses three distinct statistical problems that arise when processing and analysing RNA-Seq data. First, the use of experimental data to customise gene annotations is proposed. When customised annotations are used to summarise read counts, the resulting measures of transcript abundance include more information than alternative summarisation approaches and show improved concordance with qRT-PCR data. Second, a moderation methodology that exploits external estimates of variation is developed to address small-sample differential expression analysis. This approach performs favourably against existing approaches when comparing gene rankings and sensitivity. Third, with the aim of identifying groups of miRNA-mRNA regulatory relationships, a framework for integrating various databases of prior knowledge with small-sample miRNA-Seq and mRNA-Seq data is outlined. This framework appears to identify more signal than simpler approaches and also provides highly interpretable models of miRNA-mRNA regulation.
    To conclude, a small-sample miRNA-Seq and mRNA-Seq experiment is presented that seeks to discover miRNA-mRNA regulatory relationships associated with loss of Notch2 function and its links to neurodegeneration. This experiment is used to illustrate the methodologies developed in this thesis.
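    The idea of moderating small-sample variances with external estimates can be illustrated with the empirical-Bayes shrinkage popularised by limma. This is a minimal sketch, not the thesis's estimator: it assumes a prior variance `s0_2` and prior degrees of freedom `d0` have already been estimated from external data, shrinks each gene's pooled sample variance towards that prior, and forms a two-group t-statistic from the shrunken variance.

```python
import math


def posterior_variance(s2, d, s0_2, d0):
    """Shrink a gene's sample variance s2 (d df) towards a prior s0_2 (d0 df)."""
    return (d0 * s0_2 + d * s2) / (d0 + d)


def moderated_t(mean_diff, s2, d, s0_2, d0, n1, n2):
    """Two-group moderated t-statistic for one gene.

    mean_diff: difference in group means (e.g. log-expression),
    s2, d: pooled within-group variance and its degrees of freedom (n1 + n2 - 2),
    s0_2, d0: prior variance and prior df, assumed here to be estimated
    from external data rather than from this experiment.
    """
    s2_post = posterior_variance(s2, d, s0_2, d0)
    se = math.sqrt(s2_post * (1.0 / n1 + 1.0 / n2))
    return mean_diff / se


# Two replicates per group: a gene whose replicates agree by chance
# (tiny s2) is pulled towards the external variance estimate.
print(posterior_variance(0.01, 2, 1.0, 2))  # about 0.505
```

    With `d0 = 0` the statistic reduces to the ordinary pooled two-sample t-statistic; a larger `d0` gives the external estimate more weight, which stabilises genes whose few replicates happen to agree unusually closely.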

    IsoformEx: isoform level gene expression estimation using weighted non-negative least squares from mRNA-Seq data

    Background: mRNA-Seq technology has revolutionized the field of transcriptomics for identification and quantification of gene transcripts not only at the gene level but also at the isoform level. Estimating the expression levels of transcript isoforms from mRNA-Seq data is a challenging problem due to the presence of constitutive exons.
    Results: We propose a novel algorithm (IsoformEx) that employs a weighted non-negative least squares estimation method to estimate the expression levels of transcript isoforms. Validations based on in silico simulation of mRNA-Seq and qRT-PCR experiments with real mRNA-Seq data showed that IsoformEx could accurately estimate transcript expression levels. In comparisons with published methods, the transcript expression levels estimated by IsoformEx showed higher correlation with known transcript expression levels from simulated mRNA-Seq data, and higher agreement with qRT-PCR measurements of specific transcripts for real mRNA-Seq data.
    Conclusions: IsoformEx is a fast and accurate algorithm to estimate transcript and gene expression levels, which takes into account short exons and alternative exons with a weighting scheme. The software is available at http://bioinformatics.wistar.upenn.edu/isoformex.
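    Weighted non-negative least squares for isoform quantification can be sketched as follows. This is not IsoformEx itself: exon-level coverage `b` is modelled as `A x`, where `A[i][j]` is 1 if isoform j contains exon i, and the isoform abundances `x >= 0` minimise the weighted squared error. The per-exon weights `w` are a hypothetical stand-in for IsoformEx's weighting scheme, whose exact form is not given here, and the solver is a plain projected coordinate descent rather than the authors' algorithm.

```python
def weighted_nnls(A, b, w, sweeps=500):
    """Minimise sum_i w[i] * (b[i] - (A x)[i])**2 subject to x >= 0.

    A: exon-by-isoform design (A[i][j] = 1 if isoform j contains exon i),
    b: observed exon coverage, w: non-negative per-exon weights
    (hypothetical stand-in for a length-based weighting scheme).
    Solved by projected coordinate descent.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    r = [float(v) for v in b]  # residual b - A x, with x starting at zero
    for _ in range(sweeps):
        for j in range(n_cols):
            # Weighted squared norm of column j; skip isoforms with no support.
            denom = sum(w[i] * A[i][j] ** 2 for i in range(n_rows))
            if denom == 0:
                continue
            # Exact minimiser along coordinate j, projected onto x_j >= 0.
            step = sum(w[i] * A[i][j] * r[i] for i in range(n_rows)) / denom
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    r[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


# Toy gene: isoform 0 uses exons 0, 1, 2; isoform 1 skips exon 1.
A = [[1, 1], [1, 0], [1, 1]]
b = [5, 2, 5]          # coverage consistent with abundances (2, 3)
w = [1.0, 1.0, 1.0]    # equal weights for illustration
print(weighted_nnls(A, b, w))  # approximately [2.0, 3.0]
```

    Because the constraint `x >= 0` is separable, clipping each one-dimensional update at zero is enough to keep coordinate descent on the constrained problem; short exons with noisy coverage could be down-weighted through `w`.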

    Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments

    Alternative splicing, polyadenylation of pre-messenger RNA molecules and differential promoter usage can produce a variety of transcript isoforms whose respective expression levels are regulated in time and space, thus contributing to specific biological functions. However, the repertoire of mammalian alternative transcripts and their regulation are still poorly understood. Second-generation sequencing is now opening unprecedented routes to the analysis of entire transcriptomes. Here, we developed methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data. These methods are based on an explicit statistical model and enable the prediction of alternative isoforms within or between conditions using any known gene annotation, as well as the relative quantification of known transcript structures. Applying these methods to a human RNA-Seq dataset, we validated a significant fraction of the predictions by RT-PCR. The data further showed that these predictions correlated well with information originating from junction reads. A direct comparison with exon arrays indicated improved performance of RNA-Seq over microarrays in the prediction of skipped exons. Altogether, the set of methods presented here comprehensively addresses multiple aspects of alternative isoform analysis. The software is available as an open-source R package called Solas at http://cmb.molgen.mpg.de/2ndGenerationSequencing/Solas/
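    Solas's explicit statistical model is not described here, but the core idea of spotting a skipped exon from exon expression levels between conditions can be illustrated with a simpler, swapped-in test: compare one exon's share of its gene's reads in two conditions with a 2x2 Pearson chi-square test. Read counts are assumed to be already summarised per exon and per gene, and the function name is hypothetical.

```python
import math


def exon_usage_test(exon_a, gene_a, exon_b, gene_b):
    """Test whether one exon's share of its gene's reads differs between A and B.

    Builds the 2x2 table [exon reads, other gene reads] for each condition and
    returns (chi2, p_value) from a Pearson chi-square test with 1 df.
    """
    table = [[exon_a, gene_a - exon_a], [exon_b, gene_b - exon_b]]
    total = gene_a + gene_b
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][k] + table[1][k] for k in range(2)]
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("a row or column of the table is empty")
    chi2 = 0.0
    for i in range(2):
        for k in range(2):
            expected = row_sums[i] * col_sums[k] / total
            chi2 += (table[i][k] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Exon carries 50% of gene reads in A but only 5% in B: a candidate skipped exon.
chi2, p = exon_usage_test(50, 100, 5, 100)
print(round(chi2, 2), p < 1e-6)
```

    A per-exon test like this ignores overdispersion between replicates; it only shows how exon-level counts alone can flag a change in isoform composition.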

    Data structures and algorithms for analysis of alternative splicing with RNA-Seq data

    Widespread intron retention in mammals functionally tunes transcriptomes

    © 2014 Braunschweig et al.; Published by Cold Spring Harbor Laboratory Press. This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
    Alternative splicing (AS) of precursor RNAs is responsible for greatly expanding the regulatory and functional capacity of eukaryotic genomes. Of the different classes of AS, intron retention (IR) is the least well understood. In plants and unicellular eukaryotes, IR is the most common form of AS, whereas in animals, it is thought to represent the least prevalent form. Using high-coverage poly(A)(+) RNA-seq data, we observe that IR is surprisingly frequent in mammals, affecting transcripts from as many as three-quarters of multiexonic genes. A highly correlated set of cis features comprising an "IR code" reliably discriminates retained from constitutively spliced introns. We show that IR acts widely to reduce the levels of transcripts that are less or not required for the physiology of the cell or tissue type in which they are detected. This "transcriptome tuning" function of IR acts through both nonsense-mediated mRNA decay and nuclear sequestration and turnover of IR transcripts. We further show that IR is linked to a cross-talk mechanism involving localized stalling of RNA polymerase II (Pol II) and reduced availability of spliceosomal components. Collectively, the results implicate a global checkpoint-type mechanism whereby reduced recruitment of splicing components coupled to Pol II pausing underlies widespread IR-mediated suppression of inappropriately expressed transcripts.
    This work was supported by grants from the Canadian Institutes of Health Research and Canadian Cancer Society (B.J.B.); EMBO long-term fellowships (U.B. and T.G.-P.); Human Frontier Science Program Organization long-term fellowships (U.B. and M.I.); an OSCI fellowship (T.G.-P.); CIHR postdoctoral and Marie Curie IOF fellowships (N.L.B.-M.); and an NSERC studentship (E.N.).
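    Intron retention is commonly quantified from junction-spanning reads. The sketch below computes a percent-intron-retention (PIR) score from reads crossing the 5' and 3' exon-intron junctions and the spliced exon-exon junction. Treat the exact formula and any coverage filters used in the study as assumptions, since the abstract does not state them.

```python
def percent_intron_retention(ei5, ei3, ee):
    """PIR score for one intron from junction-spanning read counts.

    ei5, ei3: reads crossing the 5' and 3' exon-intron junctions;
    ee: reads crossing the spliced exon-exon junction.
    """
    retained = (ei5 + ei3) / 2  # average intron-inclusion evidence
    total = retained + ee
    if total == 0:
        return float("nan")  # no informative reads for this intron
    return 100.0 * retained / total


print(percent_intron_retention(10, 30, 20))  # 50.0
```

    Averaging the two exon-intron junctions means a single junction with spurious coverage only partly inflates the score; practical pipelines usually also require minimum read support, which is omitted here.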

    Titin truncating variants affect heart function in disease cohorts and the general population

    Titin-truncating variants (TTNtv) commonly cause dilated cardiomyopathy (DCM). TTNtv are also encountered in ~1% of the general population, where they may be silent, perhaps reflecting allelic factors. To better understand TTNtv, we integrated TTN allelic series, cardiac imaging and genomic data in humans and studied rat models with disparate TTNtv. In patients with DCM, TTNtv throughout titin were significantly associated with DCM. Ribosomal profiling in rat showed the translational footprint of premature stop codons in Ttn, TTNtv-position-independent nonsense-mediated degradation of the mutant allele and a signature of perturbed cardiac metabolism. Heart physiology in rats with TTNtv was unremarkable at baseline but became impaired during cardiac stress. In healthy humans, machine-learning-based analysis of high-resolution cardiac imaging showed TTNtv to be associated with eccentric cardiac remodeling. These data show that TTNtv have molecular and physiological effects on the heart across species, with a continuum of expressivity in health and disease.

    Neuronal Insult Either By Exposure To Lead Or By Direct Neuronal Damage Cause Genome-Wide Changes In DNA Methylation And Histone 3 Lysine 36 Trimethylation

    Prenatal and postnatal exposure to pervasive neurotoxicants such as lead (Pb) has been reported to cause extensive and diverse changes in the epigenetic profile. Among epigenetic modifications, DNA methylation (5mC) is perhaps the most widely studied and has been proposed as a potential early biomarker of Pb toxicity. Several studies have demonstrated an association between Pb exposure and 5mC. However, most of these studies are restricted to a specific set of target genes or repetitive elements. Therefore, one of the main objectives of our study was to use an unbiased genome-wide approach to examine Pb-exposure-associated changes in 5mC. To this end, we used the Human methylation 450K (HM450K) high-density array to quantitatively measure Pb-associated 5mC changes. The samples for this study consisted of DNA extracted from neonatal and current blood spots from a mother-infant cohort in Detroit, USA, and umbilical cord blood (UCB) DNA from a mother-infant cohort in Mexico City, Mexico. We observed that Pb-exposure-associated 5mC changes in whole blood and UCB are sex-specific. Furthermore, some of these 5mC changes are heritable and can be transmitted from grandmothers to their grandchildren. To further our understanding of the relationship between Pb exposure and 5mC, we examined the impact of Pb exposure on DNA demethylation, specifically the dynamic changes in the 5-hydroxymethylcytosine (5hmC) profile. To study these changes, we used a novel modification of the HM450K, which we named the HMeDIP-450K array. Using the HMeDIP-450K array, we demonstrated that 5hmC showed a much larger number of sex-independent changes. Interestingly, the vast majority of Pb-dependent 5mC and 5hmC clusters mapped to genes implicated in neurodegeneration or in the regulation of mitochondrial processes, such as NINJ2, VAMP5, GSTM1 and GSTM5.
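    HM450K measurements are usually summarised per CpG probe from methylated (M) and unmethylated (U) signal intensities. A minimal sketch of the two conventional summaries: the beta value with Illumina's offset of 100, and the log2-ratio M-value with a small offset `alpha`. Whether the thesis used these exact offsets is an assumption.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Fraction methylated: M / (M + U + offset); offset stabilises low-signal probes."""
    return meth / (meth + unmeth + offset)


def m_value(meth, unmeth, alpha=1.0):
    """log2 ratio of methylated to unmethylated signal, with offset alpha."""
    return math.log2((meth + alpha) / (unmeth + alpha))


print(beta_value(900, 0))    # 0.9
print(m_value(1023, 511))    # 1.0
```

    Beta values read directly as an approximate methylation proportion, while M-values have more stable variance across the methylation range and are often preferred for differential testing.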
    5mC and 5hmC are potent regulators of gene expression; their dysregulation can cause widespread changes in the transcriptome and may contribute to a neurodegenerative phenotype. Besides 5mC and 5hmC, transcriptomic changes can also be regulated by dynamic changes in the histone methylation profile and by alternative splicing. To study these changes, especially in the context of neurodegeneration, we used a Drosophila model of traumatic brain injury (TBI). Using a modified version of this model, we subjected w1118 fruit flies to mild closed head trauma. To determine the transcriptomic changes that contribute to survival after TBI, we collected fly heads from the survivors at two time points: 4 hours and 24 hours post-trauma. Mild TBI using our modified protocol had limited impact on gene expression profiles but produced large perturbations in alternative splicing (AS) regulation 24 hours post-trauma. Classification of these AS changes showed selective retention of long introns (>81 bp). Some of the affected genes also showed a significant reduction in transcript abundance and were specifically involved in mitochondrial metabolism. The retained introns were enriched for CA-rich motifs known to bind Smooth (SM), an hnRNP L-class splicing factor. Mutating SM (sm1/DF) reversed the intron retention observed 24 hours post-trauma, suggesting that SM is a critical regulator of intron retention in fly heads. Interestingly, ChIP-sequencing for H3K36me3 revealed increased levels in retained introns post-trauma. Higher H3K36me3 was also observed around intronic SM-binding motifs post-trauma, which suggests that increased H3K36me3 might recruit SM to intronic splicing suppressor sites and thereby cause retained introns (RI) in the Drosophila model of TBI. Together, our studies in human cohorts and in Drosophila shed light on the complex, multi-layered mechanisms regulating gene expression, especially under neurotoxic and neurodegenerative conditions.
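    The CA-rich motif enrichment in long retained introns can be sketched as a simple density comparison. Both the motif definition (three or more consecutive CA dinucleotides) and the helper names are hypothetical, since the abstract does not specify the motif model or the control intron set. Introns are filtered to those longer than 81 bp to match the length class described above.

```python
import re

# Hypothetical CA-rich motif: three or more consecutive CA dinucleotides.
# The lookahead makes overlapping occurrences count separately.
CA_RICH = re.compile(r"(?=(?:CA){3})")


def ca_motif_density(seq):
    """Number of CA-rich motif start positions per kilobase of sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 1000.0 * len(CA_RICH.findall(seq)) / len(seq)


def mean_density(introns, min_length=82):
    """Mean motif density over introns of at least min_length bp (i.e. > 81 bp)."""
    kept = [ca_motif_density(s) for s in introns if len(s) >= min_length]
    return sum(kept) / len(kept) if kept else 0.0


retained = ["CA" * 50, "GT" + "CA" * 45 + "AG"]   # toy retained introns
control = ["GT" + "ATGC" * 25 + "AG"]             # toy constitutive intron
print(mean_density(retained), mean_density(control))
```

    Running `mean_density` on retained introns and on a length-matched set of constitutively spliced introns gives the two values to compare; a real analysis would add a permutation or rank-based test.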