    MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification

    Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challenging due to the information loss in sequencing experiments. A recent accumulation of multiple RNA-seq data sets from the same tissue or cell type provides new opportunities to improve the accuracy of isoform quantification. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore the possible heterogeneity in the quality of different samples and could result in biased and non-robust estimates. In this article, we develop a method, which we call "joint modeling of multiple RNA-seq samples for accurate isoform quantification" (MSIQ), for more accurate and robust isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples while allowing for higher weights on the consistent group. We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy and effectiveness of MSIQ compared with alternative methods through simulation studies on D. melanogaster genes. We justify MSIQ's advantages over existing approaches via application studies on real RNA-seq data from human embryonic stem cells, brain tissues, and the HepG2 immortalized cell line.

    On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments

    Background: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation and the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. Results: The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. A simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. Conclusion: For high within-group gene expression variability, small RNA sample pools are effective in reducing the variability and compensating for the loss of replicates. Unlike the typical cost-saving strategies, such as reducing sequencing depth or the number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples, alone or in conjunction with a moderate reduction of the sequencing depth, can be a good option to optimize the cost and maintain the power.
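
    The variance trade-off described in this abstract can be illustrated with a minimal sketch. It does not reproduce the authors' data generating model, which the abstract does not specify. It assumes a simplified additive model in which each individual contributes equally to its pool, biological variance is averaged within a pool, and each sequenced library adds independent technical variance. The function name and all parameter names are hypothetical.

```python
def variance_of_group_mean(bio_var, tech_var, n_pools, pool_size):
    """Variance of the estimated group mean under a simplified additive model.

    Assumptions (not taken from the abstract):
      - each pool mixes `pool_size` individuals in equal proportions, so the
        biological variance of one pool is bio_var / pool_size;
      - each pool is sequenced once and adds independent technical variance;
      - the group mean is the plain average over `n_pools` libraries.
    """
    if n_pools <= 0 or pool_size <= 0:
        raise ValueError("n_pools and pool_size must be positive")
    per_library_var = bio_var / pool_size + tech_var
    return per_library_var / n_pools


# Same number of libraries (3), with and without pooling 4 individuals each.
unpooled = variance_of_group_mean(bio_var=4.0, tech_var=1.0, n_pools=3, pool_size=1)
pooled = variance_of_group_mean(bio_var=4.0, tech_var=1.0, n_pools=3, pool_size=4)
print(unpooled, pooled)
```

    Under these assumptions, pooling lowers the variance of the group mean most when biological variance dominates technical variance, which is consistent with the abstract's remark about high within-group variability.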

    YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.

    Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and the molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has presented a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes, which could play important regulatory roles in translation and other biological processes.

    Advancing transcriptome platforms

    During the last decade, remarkable technological innovations have emerged that allow the direct or indirect determination of the transcriptome at unprecedented scale and speed. Studies using these methods have already altered our view of the extent and complexity of transcript profiling, which has advanced from one-gene-at-a-time to a holistic view of the genome. Here, we outline the major technical advances in transcriptome characterization, including the most widely used hybridization-based platform, the well-accepted tag-based sequencing platform, and the recently developed RNA-Seq (RNA sequencing) based platform. Importantly, these next-generation technologies have revolutionized the assessment of the entire transcriptome via RNA-Seq.

    Gene expression and splicing alterations analyzed by high throughput RNA sequencing of chronic lymphocytic leukemia specimens.

    Background: To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia (CLL) specimens, a high-throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Methods: Ten CLL specimens and five samples of normal peripheral blood CD19+ B cells were analyzed by HTS RNA-seq. Library preparation was performed with the Illumina TruSeq RNA kit, and libraries were sequenced on the Illumina HiSeq 2000 system. Results: An average of 48.5 million reads for B cells and 50.6 million reads for CLL specimens was obtained, with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens, respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEGs) between B cells and CLL specimens were identified based on FPKM (fragments per kilobase of transcript per million reads; false discovery rate, FDR q < 0.05, fold change > 2). Expression of selected DEGs (n = 32) that were up-regulated or down-regulated in CLL according to the RNA-seq data was also analyzed by qRT-PCR in a test cohort of CLL specimens. Even though the fold expression of DEGs varied between RNA-seq and qRT-PCR, more than 90% of the analyzed genes were validated by qRT-PCR analysis. Analysis of RNA-seq data for splicing alterations in CLL and B cells was performed by Multivariate Analysis of Transcript Splicing (MATS). Skipped exon was the most frequent splicing alteration in CLL specimens, with 128 significant events (P-value < 0.05, minimum inclusion level difference > 0.1). Conclusion: The RNA-seq analysis of CLL specimens identifies novel DEGs and alternatively spliced genes that are potential prognostic markers and therapeutic targets. The high level of validation by qRT-PCR for a number of DEGs supports the accuracy of this analysis. Global comparison of the transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL (M-CLL) specimens with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes, but the M-CLL and U-CLL transcriptomes were indistinguishable. The analysis of HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome-wide analyses.
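
    The FPKM unit and the fold-change threshold mentioned in this abstract can be illustrated with a short sketch. It uses the commonly cited definition of FPKM (fragments per kilobase of transcript per million mapped fragments) and is not the Cuffdiff implementation, which estimates abundances with its own statistical model. The function names and the pseudocount are assumptions introduced here for illustration.

```python
import math


def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(fpkm_control, fpkm_case, pseudocount=1.0):
    """log2 ratio of case over control.

    The pseudocount is an assumption of this sketch; it avoids division by
    zero for unexpressed genes and is not described in the abstract.
    """
    return math.log2((fpkm_case + pseudocount) / (fpkm_control + pseudocount))


# 500 fragments on a 2 kb transcript in a library of 50 million fragments.
value = fpkm(500, 2000, 50_000_000)
print(value)

# A fold change > 2 corresponds to an absolute log2 fold change > 1.
print(abs(log2_fold_change(1.0, 7.0)) > 1.0)
```

    Normalizing by both transcript length and library size is what makes values comparable across genes and across the B cell and CLL libraries, which differ in sequencing depth.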

    Comparison of reproducibility, accuracy, sensitivity, and specificity of miRNA quantification platforms

    Given the increasing interest in their use as disease biomarkers, the establishment of reproducible, accurate, sensitive, and specific platforms for microRNA (miRNA) quantification in biofluids is of high priority. We compare four platforms for these characteristics: small RNA sequencing (RNA-seq), FirePlex, EdgeSeq, and nCounter. For a pool of synthetic miRNAs, coefficients of variation for technical replicates are lower for EdgeSeq (6.9%) and RNA-seq (8.2%) than for FirePlex (22.4%); nCounter replicates are not performed. Receiver operating characteristic analysis for distinguishing present versus absent miRNAs shows small RNA-seq (area under curve 0.99) is superior to EdgeSeq (0.97), nCounter (0.94), and FirePlex (0.81). Expected differences in expression of placenta-associated miRNAs in plasma from pregnant and non-pregnant women are observed with RNA-seq and EdgeSeq, but not FirePlex or nCounter. These results indicate that differences in performance among miRNA profiling platforms affect the ability to detect biological differences among samples and thus their relative utility for research and clinical use.
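
    The coefficient of variation used above to compare technical replicates is the sample standard deviation divided by the mean, expressed as a percentage. The sketch below shows that calculation on made-up replicate values. The numbers are illustrative only and are not data from the study, and the abstract does not say how per-miRNA values were summarized into the platform-level figures.

```python
import statistics


def coefficient_of_variation(replicates):
    """Percent CV: sample standard deviation divided by the mean, times 100."""
    mean = statistics.mean(replicates)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(replicates) / mean


# Hypothetical normalized signal for one miRNA across three technical replicates.
cv_percent = coefficient_of_variation([90.0, 100.0, 110.0])
print(cv_percent)
```

    A lower CV means the replicate measurements agree more closely relative to their average signal, which is how the platforms are ranked for reproducibility.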

    A High-Throughput Method for Illumina RNA-Seq Library Preparation.

    With the introduction of cost-effective, rapid, and superior-quality next-generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression analysis experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed and report a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our protocol has more power to detect differentially expressed genes than the standard Illumina protocol, probably owing to less technical variation amongst replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation.

    Optimizing Splicing Junction Detection in Next Generation Sequencing Data on a Virtual-GRID Infrastructure

    The new protocol for sequencing the messenger RNA in a cell, named RNA-seq, produces millions of short sequence fragments. Next Generation Sequencing technology allows more accurate analysis but increases the need for computational resources. This paper describes the optimization of an RNA-seq analysis pipeline devoted to splicing variant detection, aimed at reducing computation time and providing a multi-user/multi-sample environment. This work makes two main contributions. First, we optimized a well-known algorithm called TopHat by parallelizing some sequential mapping steps. Second, we designed and implemented a hybrid virtual GRID infrastructure that efficiently executes multiple instances of TopHat running on different samples or on behalf of different users, thus optimizing the overall execution time and enabling a flexible multi-user environment.

    rMAPS: RNA map analysis and plotting server for alternative exon regulation.

    RNA-binding proteins (RBPs) play a critical role in the regulation of alternative splicing (AS), a prevalent mechanism for generating transcriptomic and proteomic diversity in eukaryotic cells. Studies have shown that AS can be regulated by RBPs in a binding-site-position-dependent manner. Depending on where RBPs bind, splicing of an alternative exon can be enhanced or suppressed. Therefore, spatial analyses of RBP motifs and binding sites around alternative exons will help elucidate splicing regulation by RBPs. The development of high-throughput sequencing technologies has allowed transcriptome-wide analyses of AS and RBP-RNA interactions. Given a set of differentially regulated alternative exons obtained from RNA sequencing (RNA-seq) experiments, the rMAPS web server (http://rmaps.cecsresearch.org) performs motif analyses of RBPs in the vicinity of alternatively spliced exons and creates RNA maps that depict the spatial patterns of RBP motifs. Similarly, rMAPS can also perform spatial analyses of RBP-RNA binding sites identified by cross-linking immunoprecipitation sequencing (CLIP-seq) experiments. We anticipate rMAPS will be a useful tool for elucidating RBP regulation of alternative exon splicing using high-throughput sequencing data.