    RNA-Skim: a rapid method for RNA-Seq quantification at transcript level

    Motivation: The RNA-Seq technique has been demonstrated to be a revolutionary means of exploring the transcriptome because it provides deep coverage and base-pair-level resolution. RNA-Seq quantification has proven to be an efficient alternative to the microarray technique in gene expression studies, and it is a critical component of RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require aligning fragments to either a genome or a transcriptome, which entails a time-consuming and intricate alignment step. To improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, was recently proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of an efficient alignment-free approach to transcriptome quantification. Although Sailfish is substantially faster than alignment-dependent methods such as Cufflinks, using all k-mers in the transcriptome for quantification limits the method's scalability.
    Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity and introduces the notion of sig-mers, a special type of k-mer uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable to any state-of-the-art method. This enables RNA-Skim to quantify each cluster independently, reducing one complex optimization problem to smaller optimization tasks that can be run in parallel. As a result, RNA-Skim achieves a speedup on the order of 100× over state-of-the-art alignment-based methods, while delivering comparable or higher accuracy.
    Availability and implementation: The software is available at http://www.csbio.unc.edu/rs.
    Contact: [email protected]
    Supplementary information: Supplementary data are available at Bioinformatics online.
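The cluster/sig-mer idea can be illustrated with a toy sketch. It assumes the transcript clusters are already given, so the similarity-based partitioning step is skipped. It treats a sig-mer simply as a k-mer that occurs in exactly one cluster. It then estimates within-cluster proportions with a generic mixture-model EM over sig-mer counts. This EM is a stand-in and is not necessarily RNA-Skim's actual likelihood or sig-mer selection rule. All names (`find_sig_mers`, `count_sig_mers`, `em_cluster`, `quantify`) and the nested-dict cluster layout are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def find_sig_mers(clusters, k):
    """Hypothetical helper: {cluster_id: k-mers found in that cluster only}.

    clusters is assumed to be {cluster_id: {transcript_id: sequence}}.
    """
    owners = defaultdict(set)  # k-mer -> clusters whose transcripts contain it
    for cid, transcripts in clusters.items():
        for seq in transcripts.values():
            for km in kmers(seq, k):
                owners[km].add(cid)
    sig = defaultdict(set)
    for km, cids in owners.items():
        if len(cids) == 1:  # unique to one cluster -> treat as a sig-mer
            sig[next(iter(cids))].add(km)
    return dict(sig)


def count_sig_mers(reads, sig_mers, k):
    """Count sig-mer occurrences in the reads; all other k-mers are skipped."""
    wanted = set().union(*sig_mers.values())
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in wanted:
                counts[km] += 1
    return counts


def em_cluster(transcripts, sig, counts, k, iters=200):
    """Generic EM: fraction of the cluster's sig-mer hits drawn from each transcript."""
    # occ[t][km]: occurrences of sig-mer km in transcript t
    occ = {t: Counter(km for km in kmers(seq, k) if km in sig)
           for t, seq in transcripts.items()}
    n_pos = {t: sum(c.values()) for t, c in occ.items()}
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(counts.get(km, 0) for km in sig)
    if total == 0:
        return alpha  # no evidence: keep the uniform start
    for _ in range(iters):
        assigned = dict.fromkeys(transcripts, 0.0)
        for km in sig:
            c = counts.get(km, 0)
            if c == 0:
                continue
            # E-step: P(hit from t) is proportional to alpha_t * occ(km, t) / n_pos(t)
            w = {t: alpha[t] * occ[t][km] / n_pos[t]
                 for t in transcripts if occ[t][km] > 0}
            z = sum(w.values())
            if z == 0:
                continue
            for t, wt in w.items():
                assigned[t] += c * wt / z
        # M-step: new proportions are the expected share of hits
        alpha = {t: a / total for t, a in assigned.items()}
    return alpha


def quantify(clusters, reads, k):
    """Each cluster uses only its own sig-mers, so the EM calls are independent."""
    sig = find_sig_mers(clusters, k)
    counts = count_sig_mers(reads, sig, k)
    return {cid: em_cluster(tx, sig.get(cid, set()), counts, k)
            for cid, tx in clusters.items()}
```

In the toy input `{"A": {"a1": "AAAAC", "a2": "AAAAG"}, "B": {"b1": "AACCC"}}` with k = 3, the k-mer `AAC` occurs in both clusters. It is therefore not a sig-mer, so reads containing it contribute nothing from that k-mer. Because the per-cluster calls in `quantify` share no state, they could be handed to a process pool unchanged. That independence is what allows the optimization to be split into parallel tasks.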

    Efficient Computational Genetics Methods for Multiparent Crosses

    Multiparent crosses are genetic populations bred in a controlled manner from a finite number of known founders. They are experimental resources of potentially great value for understanding the genetic basis of complex diseases. High-throughput sequencing, an important new experimental technology that can be applied to multiparent crosses, generates an immense amount of data and provides unprecedented opportunities to study genetics at ultra-high resolution. However, to take advantage of such massive data, several computational genetics problems must be resolved, including RNA-Seq assembly and quantification, QTL mapping, and haplotype effect estimation. To tackle these closely connected problems, I propose a series of methods. GeneScissors is a novel method to detect errors caused by multiple alignments in RNA-Seq data. RNA-Skim rapidly quantifies RNA-Seq data while still providing reliable results. HTreeQA is a phylogeny-based QTL mapping method for genotypes with heterozygous sites. Diploffect estimates founder effects with statistically valid interval estimates in multiparent crosses. These methods are studied extensively on both simulated and real data. The results demonstrate that the proposed methods make data analysis of multiparent crosses more effective and efficient. They also produce results that are more accurate and trustworthy than those of a number of existing alternative methods.
    Doctor of Philosophy