Models for transcript quantification from RNA-Seq
RNA-Seq is rapidly becoming the standard technology for transcriptome
analysis. Fundamental to many of the applications of RNA-Seq is the
quantification problem, which is the accurate measurement of relative
transcript abundances from the sequenced reads. We focus on this problem, and
review many recently published models that are used to estimate the relative
abundances. In addition to describing the models and the different approaches
to inference, we also explain how methods are related to each other. A key
result is that we show how inference with many of the models results in
identical estimates of relative abundances, even though model formulations can
be very different. In fact, we are able to show how a single general model
captures many of the elements of previously published methods. We also review
the applications of RNA-Seq models to differential analysis, and explain why
accurate relative transcript abundance estimates are crucial for downstream
analyses.
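To make the quantification problem concrete, the following is a minimal sketch of an expectation-maximization estimator of the general kind such reviews cover. It is not the specific model of any one published method. The function name, the input layout (each read given as the set of transcripts it is compatible with), and the assumption that a read is sampled uniformly along a transcript's effective length are all illustrative choices that the abstract does not state.

```python
from collections import Counter

def em_relative_abundance(read_compat, eff_lengths, n_iter=200):
    """Estimate relative transcript abundances with a simple EM loop.

    read_compat: list of tuples; each tuple holds the indices of the
                 transcripts a read is compatible with (hypothetical input).
    eff_lengths: effective length of each transcript.
    Returns a list of relative abundances that sums to 1.
    """
    n = len(eff_lengths)
    # Reads with the same compatibility set are interchangeable, so count them once.
    classes = Counter(tuple(sorted(set(r))) for r in read_compat)
    total = sum(classes.values())
    alpha = [1.0 / n] * n  # probability that a read originates from each transcript
    for _ in range(n_iter):
        counts = [0.0] * n
        for ts, c in classes.items():
            # E-step: split the class count in proportion to alpha_t / length_t.
            weights = [alpha[t] / eff_lengths[t] for t in ts]
            z = sum(weights)
            for t, w in zip(ts, weights):
                counts[t] += c * w / z
        # M-step: re-estimate the read-origin probabilities.
        alpha = [x / total for x in counts]
    # Length-normalize read fractions to obtain relative abundances.
    raw = [a / l for a, l in zip(alpha, eff_lengths)]
    s = sum(raw)
    return [r / s for r in raw]
```

Collapsing reads into compatibility classes before iterating does not change the estimate, which is one small illustration of how differently formulated procedures can arrive at identical abundance estimates.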
Exploration of alternative splicing events in ten different grapevine cultivars
Background: The complex dynamics of gene regulation in plants are still far from being fully understood. Among the many factors involved, alternative splicing (AS) is one of the least well documented. For many years, AS was considered less relevant in plants, especially when compared to animals. However, since the introduction of next-generation sequencing techniques, the number of plant genes believed to be alternatively spliced has increased exponentially.
Results: Here, we performed comprehensive high-throughput transcript sequencing of ten different grapevine cultivars, which resulted in the first high-coverage atlas of the grape berry transcriptome. We also developed findAS, a software tool for the analysis of alternatively spliced junctions. We demonstrate that at least 44 % of multi-exonic genes undergo AS and that a large number of low-abundance splice variants are present within the 131,622 splice junctions we have annotated from Pinot noir.
Conclusions: Our analysis shows that approximately 70 % of AS events have relatively low expression levels. Furthermore, alternative splice sites seem to be enriched near the constitutive ones to some extent, reflecting the noise of the splicing mechanisms. However, AS seems to be extensively conserved among the 10 cultivars.
CYCLeR: a novel tool for the full isoform assembly and quantification of circRNAs
Splicing is one key mechanism determining the state of any eukaryotic cell. Apart from linear splice variants, circular splice variants (circRNAs) can arise via non-canonical splicing involving a back-splice junction (BSJ). Most existing methods only identify circRNAs via the corresponding BSJ, but do not aim to estimate their full sequence identity or to identify different, alternatively spliced circular isoforms arising from the same BSJ. We here present CYCLeR, the first computational method for identifying the full sequence identity of new and alternatively spliced circRNAs and their abundances while simultaneously co-estimating the abundances of known linear splicing isoforms. We show that CYCLeR significantly outperforms existing methods in terms of F score and quantification of transcripts in simulated data. In a comparative study with long-read data, we also show the advantages of CYCLeR compared to existing methods. When analysing Drosophila melanogaster data, CYCLeR uncovers biological patterns of circRNA expression that other methods fail to observe.
Network-based approaches to explore complex biological systems towards network medicine
Network medicine relies on different types of networks: from the molecular level of protein-protein interactions to gene regulatory networks and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein-protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbations within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting connectivity significance instead of connection density. The past few years have witnessed increasing interest in understanding the molecular mechanism of post-transcriptional regulation, with a special emphasis on non-coding RNAs, since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs, including long non-coding RNAs (lncRNAs), competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of a co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes, called switch genes, critically associated with drastic changes in cell phenotype.
Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.
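The contrast between connectivity significance and connection density can be illustrated with a short sketch. It assumes that significance is scored as a hypergeometric tail probability: the chance that a node of a given degree has at least the observed number of links to seed (known disease) genes if seeds were scattered at random. That formula, the function names, and the adjacency layout are assumptions for illustration, not details given in the abstract, and the sketch performs only a single ranking pass rather than a full iterative module expansion.

```python
from math import comb

def connectivity_pvalue(n_nodes, n_seeds, degree, seed_links):
    """Probability of at least `seed_links` seed neighbours among `degree`
    neighbours drawn at random from a network with `n_seeds` seeds."""
    upper = min(degree, n_seeds)
    total = comb(n_nodes, degree)
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(seed_links, upper + 1)
    )
    return tail / total

def rank_candidates(adjacency, seeds):
    """Rank non-seed nodes by connectivity p-value, most significant first.

    adjacency: dict mapping each node to the set of its neighbours
               (hypothetical input layout).
    seeds: set of seed nodes.
    """
    n = len(adjacency)
    scored = []
    for node, neighbours in adjacency.items():
        if node in seeds:
            continue
        seed_links = len(neighbours & seeds)
        p = connectivity_pvalue(n, len(seeds), len(neighbours), seed_links)
        scored.append((p, node))
    return sorted(scored)
```

Under this scoring, a low-degree node whose few links all reach seeds ranks above a hub that touches the same number of seeds among many unrelated neighbours.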
Alternative splicing and single-cell RNA-sequencing: a feasibility assessment
We know little about how isoform choice is regulated in individual cells for most spliced genes. In theory, single-cell RNA-sequencing (scRNA-seq) could enable us to investigate isoform choice at cellular resolution. Therefore, scRNA-seq could give insight into the fundamental molecular biology process of how alternative splicing is regulated within cells. However, scRNA-seq is a relatively new technology, and at the start of my PhD it was not clear whether existing bioinformatics approaches would enable accurate splicing analyses. In my PhD I consider what the limitations are when attempting to study alternative splicing using scRNA-seq and what can be done to overcome them.
Alternative splicing is commonly analysed using bulk RNA sequencing (bulk RNA-seq) data with isoform quantification software. It was not clear whether isoform quantification software designed for bulk RNA-seq would perform well when run on scRNA-seq data. To address this, I performed a simulation-based benchmark of isoform quantification software developed for bulk RNA-seq when run on scRNA-seq. I made two important findings. Firstly, I found that isoform quantification software performs poorly when run on Drop-seq data, but performs better when run on scRNA-seq data generated using full-length transcript protocols (e.g., SMART-seq and SMART-seq2). Secondly, I found that for the most part, isoform quantification software performs almost as well when run on full-length scRNA-seq as it does when run on bulk RNA-seq. Based on these findings, I concluded that software tools to accurately quantify the reads from full-length scRNA-seq experiments exist, theoretically enabling alternative splicing to be analysed using scRNA-seq.
Encouraged by this result, I embarked on a series of experiments designed to answer questions such as "How many isoforms does a gene typically produce per cell?". This is a key basic biology question that could in theory be answered using scRNA-seq. Unfortunately, I found that the results of these experiments were largely impossible to interpret because I was unable to distinguish between biological signal and technical noise. I realised that without a solid understanding of the technical noise and confounding factors associated with scRNA-seq, distinguishing biological signal from technical noise would be challenging and might not be possible. To address this, I embarked on a second simulation-based study, this time investigating the impact of technical noise on our ability to study alternative splicing using scRNA-seq. I simulated four situations: a situation where every gene expressed one isoform per cell, a situation where all genes expressed two isoforms per cell, a situation where all genes expressed three isoforms per cell and a situation where all genes expressed four isoforms per cell. Importantly, I explicitly simulated isoform choice, dropouts and quantification errors. The results of the four simulated situations were not trivial to distinguish from each other, raising concerns about the feasibility of resolving the more complex splicing patterns that probably exist in reality using scRNA-seq data. I concluded that attempts to study alternative splicing using scRNA-seq are currently substantially confounded by a high rate of dropouts and a lack of understanding about the mechanism of isoform choice. Importantly, improvements to isoform quantification software accuracy alone were insufficient to correct for confounding effects caused by dropouts.
I propose that to enable accurate alternative splicing analyses using scRNA-seq, further research into accurately modelling dropouts is required, or alternatively, scRNA-seq technologies should be improved to increase their capture efficiency. Additionally, research into how isoform choice is regulated at a cellular level is necessary to enable accurate analyses. Overall, I find that it is not currently possible to accurately perform alternative splicing analyses using scRNA-seq. However, I am optimistic that with further research, it may become possible in the future.
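The confounding effect of dropouts on the four simulated situations can be sketched with a toy simulation. This is not the simulation used in the thesis. It assumes each cell picks its expressed isoforms uniformly at random and that each expressed isoform is lost independently with a fixed dropout probability, and it leaves out quantification error entirely. The function name and all parameters are hypothetical.

```python
import random

def simulate_detected_isoforms(n_cells, n_expressed, n_available=4,
                               dropout=0.5, seed=0):
    """Return the number of isoforms detected per cell for one gene.

    n_expressed: isoforms truly expressed per cell (1 to n_available).
    dropout: probability that an expressed isoform goes undetected.
    """
    rng = random.Random(seed)
    detected = []
    for _ in range(n_cells):
        # Isoform choice: each cell expresses a random subset of the gene's isoforms.
        chosen = rng.sample(range(n_available), n_expressed)
        # Dropout: each expressed isoform is independently missed with prob. `dropout`.
        seen = [iso for iso in chosen if rng.random() >= dropout]
        detected.append(len(seen))
    return detected
```

Running this for `n_expressed` from 1 to 4 at a high dropout rate and comparing the resulting per-cell counts gives overlapping distributions. For example, a cell showing a single detected isoform is consistent with any of the four situations.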