6 research outputs found

    Manananggal - a novel viewer for alternative splicing events

    Background: Alternative splicing is an important cellular mechanism that can be analyzed by RNA sequencing. However, identification of splicing events in an automated fashion is error-prone. Thus, further validation is required to select reliable instances of alternative splicing events (ASEs). There are only a few tools specifically designed for interactive inspection of ASEs, and available visualization approaches can be significantly improved. Results: Here, we present Manananggal, an application specifically designed for the identification of splicing events in next-generation sequencing data. Manananggal includes a web application for visual inspection and a command-line tool that allows for ASE detection. We compare the sashimi plots available in the IGV Viewer, the DEXSeq splicing plots and SpliceSeq to the Manananggal interface and discuss the advantages and drawbacks of these tools. We show that sashimi plots (such as those used by the IGV Viewer and SpliceSeq) offer a practical solution for simple ASEs, but also indicate shortcomings for highly complex genes. Conclusion: Manananggal is an interactive web application that offers functions specifically tailored to the identification of alternative splicing events that other tools lack. The ability to select a subset of isoforms allows an easier interpretation of complex alternative splicing events. In contrast to SpliceSeq and the DEXSeq splicing plot, Manananggal does not obscure the gene structure by showing full transcript models, which makes it easier to determine which isoforms are expressed and which are not.

    Visualization and analysis of RNA-Seq assembly graphs.

    RNA-Seq is a powerful transcriptome profiling technology enabling transcript discovery and quantification. Whilst most commonly used for gene-level quantification, the data can be used for the analysis of transcript isoforms. However, when the underlying transcript assemblies are complex, current visualization approaches can be limiting, with splicing events a challenge to interpret. Here, we report on the development of a graph-based visualization method as a complementary approach to understanding transcript diversity from short-read RNA-Seq data. Following the mapping of reads to a reference genome, a read-to-read comparison is performed on all reads mapping to a given gene, producing a weighted similarity matrix between reads. This is used to produce an RNA assembly graph, where nodes represent reads and edges represent the similarity scores between them. The resulting graphs are visualized in 3D space to better appreciate their sometimes large and complex topology, with other information being overlaid onto nodes, e.g. transcript models. Here we demonstrate the utility of this approach, including the unusual structure of these graphs and how they can be used to identify issues in assembly, repetitive sequences within transcripts and splice variants. We believe this approach has the potential to significantly improve our understanding of transcript complexity.

    COMPUTATIONAL STUDY OF TRANSCRIPTIONAL LANDSCAPES FROM RNA-SEQ DATA

    Since the inception of genetic science in the days of Gregor Johann Mendel, the major focus of genetics has been to identify functional units, genes, passed through generations and to determine how variation affects the development of an organism. While genome projects allowed us to collect complete and accurate DNA sequences of individual organisms, novel functional elements are being routinely discovered. When RNA sequencing technologies appeared in the late 2000s, they opened a new view over all transcriptional activity of cells at varying levels of resolution. However, the imperfections of both technical and biological processes, the growing amounts of data as well as the overall complexity of eukaryotic genomes underscore the need for novel analytical approaches to discover new genes and isoforms and to improve understanding of known ones. This work begins with an overview of the status of human gene annotation and presents a comprehensive discussion of existing and new methods for creating complete and accurate gene catalogs, via comparative genomics and RNA sequencing. Through careful analysis of large RNA-seq datasets, we annotate effects of transcriptional artifacts and inaccuracies in gene expression on downstream analysis. Moreover, we identify defining properties of validated transcription that distinguish it from the effervescent noise. To address the challenges presented by noisy transcription and improve gene expression analysis, we introduce TieBrush, a comprehensive suite of tools for efficient processing of multi-sample sequencing datasets. This suite allows for the creation of condensed representations of data, facilitating the identification of shared transcriptional motifs and enhancing downstream analysis. Furthermore, to enhance our understanding of alternative splicing and distinguish functional isoforms from noise at protein-coding loci, we develop ORFanage. This highly efficient and accurate system assigns open reading frames (ORFs) to gene transcripts, thereby improving gene annotations. Finally, we employ all presented techniques to design a complete sample-to-annotation protocol for annotating genes and transcripts. We apply this protocol to create CHESS 3 - an improved human genome annotation, identifying multiple novel tissue-specific isoforms while increasing the consistency and reliability of known transcript models. In addition to advancements in gene annotation, this thesis briefly explores the critical role of large representative datasets of viral genomes in acquiring novel insights into diseases. More specifically, we discuss how pangenomic analysis of HIV-1 facilitated novel insights into viral persistence and how the challenges of mass sequencing of SARS-CoV-2 genomes required a novel approach for identification of the first emerging recombinant lineages.

    Additional file 5: Figure S4. of Manananggal - a novel viewer for alternative splicing events

    Additional figure showing the example of CD44 alternative splicing visualized in DEXSeq. (PNG 303 kb)

    Additional file 3: Figure S2. of Manananggal - a novel viewer for alternative splicing events

    Additional figure showing the example of CD44 alternative splicing visualized in IGV (zoom in). (PNG 125 kb)