2,557 research outputs found

    Statistical modeling of isoform splicing dynamics from RNA-seq time series data

    Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet there are no methods that explicitly leverage the experimental design to improve isoform quantification. Here we present DICEseq, the first isoform quantification method tailored to correlated RNA-seq experiments. DICEseq explicitly models the correlations between different RNA-seq experiments to aid the quantification of isoforms across experiments. Numerical experiments on simulated data sets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. On real data sets, our results show that DICEseq provides substantially more reproducible and robust quantifications, increasing the correlation of estimates from replicate data sets by up to 10% on genes with low or moderate expression levels (bottom third of all genes). Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Our results have strong implications for the design of RNA-seq experiments, and offer a novel tool for improved analysis of such data sets. Python code is freely available at http://diceseq.sf.net

    Bayesian Modeling Approaches for Temporal Dynamics in RNA-seq Data

    Differential expression analysis has played a central role over the last decades in addressing a variety of biological questions by characterizing abnormal patterns of cellular and molecular function. To date, the identification of differentially expressed genes and isoforms has focused increasingly on temporal dynamics across a series of time points. Bayesian strategies have been successfully employed, from both methodological and analytical perspectives, to uncover biological complexity in various high-throughput data platforms: for instance, methods for differential expression analysis and network modules in transcriptome data, peak callers for ChIP-seq data, target prediction for microRNA data, and meta-methods spanning different platforms. In this chapter, we discuss how our methodological work based on Bayesian models addresses important questions that arise in the architecture of temporal dynamics in RNA-seq data

    Modeling and analysis of RNA-seq data: a review from a statistical perspective

    Background: Since the invention of next-generation RNA sequencing (RNA-seq) technologies, they have become a powerful tool to study the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date. Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance. Conclusion: The development of statistical and computational methods for analyzing RNA-seq data has made significant advances in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and exhibit different performance under different scenarios. This review discusses and compares multiple commonly used statistical models regarding their assumptions, in the hope of helping users select appropriate methods as needed, as well as assisting developers in future method development

    Statistical Methods For Whole Transcriptome Sequencing: From Bulk Tissue To Single Cells

    RNA-Sequencing (RNA-Seq) has enabled detailed unbiased profiling of whole transcriptomes with incredible throughput. Recent technological breakthroughs have pushed back the frontiers of RNA expression measurement to the single-cell level (scRNA-Seq). With both bulk and single-cell RNA-Seq analyses, modeling of the noise structure embedded in the data is crucial for drawing correct inference. In this dissertation, I developed a series of statistical methods to account for the technical variations specific to RNA-Seq experiments in the context of isoform- or gene-level differential expression analyses. In the first part of my dissertation, I developed MetaDiff (https://github.com/jiach/MetaDiff), a random-effects meta-regression model that allows the incorporation of uncertainty in isoform expression estimation in isoform differential expression analysis. This framework was further extended to detect splicing quantitative trait loci with RNA-Seq data. In the second part of my dissertation, I developed TASC (Toolkit for Analysis of Single-Cell data; https://github.com/scrna-seq/TASC), a hierarchical mixture model, to explicitly adjust for cell-to-cell technical differences in scRNA-Seq analysis using an empirical Bayes approach. This framework can be adapted to perform differential gene expression analysis. In the third part of my dissertation, I developed TASC-B, a method extended from TASC to model transcriptional bursting-induced zero-inflation. This model can identify and test for differences in the level of transcriptional bursting. Compared to existing methods, these new tools have been shown to better control the false discovery rate in situations where technical noise cannot be ignored. They also display superior power in both our simulation studies and real-world applications

    Poly(A) Tail Regulation in the Nucleus

    RNA metabolism involves different steps, from transcription through translation to the decay of messenger RNAs (mRNAs). Most mRNAs have a poly(A) tail attached to their 3’-end, which protects them from degradation and stimulates translation. Removal of the poly(A) tail is the rate-limiting step in RNA decay, controlling stability and translation. RNA decay has so far mostly been studied in the context of cytoplasmic processes, and it is still unclear if and to what extent RNA deadenylation occurs in the mammalian nucleus. A novel method for genome-wide determination of poly(A) tail length, termed FLAM-Seq, was developed to overcome current challenges in sequencing mRNAs, enabling genome-wide analysis of complete RNAs, including their poly(A) tail sequence. FLAM-Seq analysis of different model systems (cell lines, organoids and C. elegans RNA) uncovered a strong correlation between poly(A) tail length and 3’-UTR length or alternative polyadenylation. Cytosine nucleotides were also significantly enriched in poly(A) tails. Analyzing poly(A) tails of unspliced RNAs from FLAM-Seq data revealed the genome-wide synthesis of poly(A) tails with a length of more than 200 nt. This was validated by splicing inhibition experiments, which uncovered potential links between the completion of splicing and poly(A) tail shortening. Measuring RNA deadenylation kinetics using metabolic labeling experiments hinted at a rapid shortening of tails within minutes of their synthesis. The analysis of subcellular fractions obtained from HeLa cells and a mouse brain showed that initial deadenylation is a nuclear process. Nuclear deadenylation is gene-specific, and poly(A) tails of lncRNAs retained in the nucleus were not shortened. To identify enzymes responsible for nuclear deadenylation, RNA-targeting Cas systems, siRNAs and shRNA cell lines were used to target different deadenylase complexes. Despite efficient mRNA knockdown, subcellular analysis of poly(A) tail length did not yield molecular phenotypes of altered nuclear poly(A) tail length

    A survey of best practices for RNA-seq data analysis.

    RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics. This is the final published version. It first appeared at http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8