17 research outputs found

    Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

    The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. Developers used these data to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts, or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

    RNA-Bloom : de novo RNA-seq assembly with Bloom filters

    High-throughput RNA sequencing (RNA-seq) is primarily used for measuring gene expression, quantifying transcript abundance, and building reference transcriptomes. Because it is free of bias from a reference sequence, de novo RNA-seq assembly is particularly useful for building new reference transcriptomes, detecting fusion genes, and discovering novel spliced transcripts. This is a challenging problem, and at least eight approaches, including Trans-ABySS and Trinity, have been developed within the past decade to address it. For instance, Trinity running on 12 CPUs takes approximately one and a half days and up to 80 GB of memory to assemble a human RNA-seq sample of over 100 million read pairs. While the high memory usage typical of de novo RNA-seq assemblers may be alleviated by distributed computing, access to a high-performance computing environment is a requirement that may be limiting for smaller labs. In my thesis, I present a novel de novo RNA-seq assembler, “RNA-Bloom,” which uses compact data structures based on Bloom filters to store k-mer counts and the de Bruijn graph in memory. Compared to Trans-ABySS and Trinity, RNA-Bloom can assemble a human transcriptome with comparable accuracy using nearly half as much memory and half the wall-clock time with 12 threads.
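
    The idea of keeping k-mers in a Bloom filter rather than an exact hash table can be sketched as follows. This is a minimal illustration only and not RNA-Bloom's implementation: the class BloomFilter, the helper kmers, the bit-array size, and the choice of salted BLAKE2b hashes are all assumptions made for the example, and it tracks membership only, not k-mer counts.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions (hypothetical sketch)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))
```

    Memory use is fixed by num_bits regardless of how many k-mers are inserted, which is the property the abstract relies on. The cost is a tunable false-positive rate: a lookup can report a k-mer that was never added, but it never misses one that was.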

    Transcriptome assembly and visualization for RNA-sequencing data

    Since its introduction, RNA-sequencing has allowed us to interrogate the transcriptome of an organism, thereby advancing our understanding of cell biology and diseases. Typically, raw RNA-sequencing data is processed via computational methods, such as transcriptome assembly and visualization, to extract meaningful information. Transcriptome assembly aims to reconstruct full-length transcript sequences from RNA-sequencing reads, which are usually short fragments of the corresponding transcripts. Transcriptome visualization provides a platform for exploring and recognizing patterns in transcriptomic data. Transcriptome assembly and visualization tools have been instrumental in the identification of gene structures, annotation of draft genomes, and discovery of molecular markers in diseases. Single-cell RNA-sequencing has enabled us to investigate transcriptome heterogeneity within a tissue sample containing up to a million cells. However, single-cell transcriptome analyses have been predominantly performed at the gene level instead of at the isoform level. In my thesis, I present computational solutions for transcriptome assembly and visualization of single-cell RNA-sequencing data, thus enabling isoform-level analysis in single-cell transcriptomes. Long-read RNA-sequencing technologies have gained traction in transcriptomic research in recent years as their throughput and data quality have improved tremendously. Long-read sequencing is particularly useful in transcriptome assembly because its reads can potentially span multiple exons, which simplifies the transcriptome assembly problem. Reference-free assembly for long-read data is a computationally expensive task due to the long read lengths and high base error rates. In my thesis, I present a fast and memory-efficient reference-free assembly method for long-read RNA-sequencing data.

    RResolver: efficient short-read repeat resolution within ABySS

    Background: De novo genome assembly is essential to modern genomics studies. As it is not biased by a reference, it is also a useful method for studying genomes with high variation, such as cancer genomes. De novo short-read assemblers commonly use de Bruijn graphs, where nodes are sequences of equal length k, also known as k-mers. Edges in this graph are established between nodes that overlap by k - 1 bases, and nodes along unambiguous walks in the graph are subsequently merged. The selection of k is influenced by multiple factors, and optimizing this value results in a trade-off between graph connectivity and sequence contiguity. Ideally, multiple k sizes should be used, so that lower values can provide good connectivity in regions with lower coverage and higher values can increase contiguity in well-covered regions. However, current approaches that use multiple k values do not address the scalability issues inherent to the assembly of large genomes. Results: Here we present RResolver, a scalable algorithm that takes a short-read de Bruijn graph assembly with a starting k as input and uses a k value closer to that of the read length to resolve repeats. RResolver builds a Bloom filter of sequencing reads, which is used to evaluate the assembly graph path support at branching points, and removes paths with insufficient support. RResolver runs efficiently, taking only 26 min on average for an ABySS human assembly with 48 threads and 60 GiB memory. Across all experiments, compared to a baseline assembly, RResolver improves scaffold contiguity (NGA50) by up to 15% and reduces misassemblies by up to 12%. Conclusions: RResolver adds a missing component to scalable de Bruijn graph genome assembly. By improving the initial and fundamental graph traversal outcome, all downstream ABySS algorithms greatly benefit by working with a more accurate and less complex representation of the genome.
The RResolver code is integrated into ABySS and is available at https://github.com/bcgsc/abyss/tree/master/RResolver.
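
    The de Bruijn graph definition in the abstract above (nodes are k-mers, edges join nodes that overlap by k - 1 bases) can be sketched in a few lines. This is an illustrative toy, not code from ABySS or RResolver: the function names build_de_bruijn and branching_nodes are hypothetical, edges are inferred from the set of observed k-mers, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return (nodes, edges) for a node-centric de Bruijn graph.

    nodes: set of all k-mers seen in the reads.
    edges: maps a k-mer to the set of k-mers whose first k - 1 bases
           equal its last k - 1 bases.
    """
    nodes = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            nodes.add(read[i:i + k])

    edges = defaultdict(set)
    for node in nodes:
        # Try each possible one-base extension of the (k - 1)-suffix.
        for base in "ACGT":
            successor = node[1:] + base
            if successor in nodes:
                edges[node].add(successor)
    return nodes, edges


def branching_nodes(edges):
    """Nodes with more than one successor, i.e. ambiguous walks."""
    return sorted(n for n, succ in edges.items() if len(succ) > 1)
```

    Nodes returned by branching_nodes are the points where an unambiguous walk ends. According to the abstract, these branching points are where RResolver consults its Bloom filter of reads to decide which outgoing paths have enough support to keep.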

    SPAT: Searching for Poly(A) Tails in RNA-Seq de novo Assemblies

    A method for detecting alternative polyadenylation in RNA-Seq libraries using de novo assembly with ABySS. It will be presented at HiTSeq and ISMB 2013 by Anthony Raymond.