2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

Author: Barton Geoffrey J.
Knop Katarzyna
Parker Matthew T.
Simpson Gordon G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2021

University of Dundee Online Publications

BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq

Author: A Ashoub
A Busch
A Dobin
A Janiak
Abdellah Barakate
AM Bolger
AM Mastrangelo
AS Reddy
AT Pham
B Panahi
BA Veeneman
BJ Haas
C Soneson
CG Simpson
Claire Halpin
Claus-Dieter Mayer
CPG Calixto
Craig G. Simpson
D Staiger
D Szakonyi
G Capovilla
G Guo
Gordon Stephen
GP Alamancos
H Liu
IK Dawson
International Barley Sequencing Consortium
J Bazin
J Russell
Jason Kam
Jenny Morris
John Fuller
John W. S. Brown
JWS Brown
K Mrízová
K Shirasu
KE Hayer
Linda Milne
LS Dahleen
M Kalyna
M Kintlová
M Mascher
M Pertea
M. Cristina Casao
Micha Bayer
Miriam Schreiber
Monika Zwirek
NL Bray
P Ren
Paulo Rapazote-Flores
Pete E. Hedley
PG Engström
Q Zhang
R Patro
R Zhang
RF Carvalho
Robbie Waugh
RR Sokal
Runxuan Zhang
S Chamala
S Filichkin
S Schindler
S. Ouyang
Sarah M. McKim
SF Altschul
SH Kim
SR Thatcher
T Laloum
T Matsumoto
TD Wu
TW Nilsen
Wenbin Guo
X Gan
XN Zhang
Y Lee
Y Marquez
Y Shi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/12/2019

Crossref

University of Dundee Online Publications

NOVEL COMPUTATIONAL METHODS FOR SEQUENCING DATA ANALYSIS: MAPPING, QUERY, AND CLASSIFICATION

Author: Liu Xinan
Publication venue: UKnowledge
Publication date: 01/01/2018

Over the past decade, the evolution of next-generation sequencing technology has considerably advanced genomics research. As a consequence, fast and accurate computational methods are needed for analyzing the large data sets in different applications. The research presented in this dissertation focuses on three areas: RNA-seq read mapping, large-scale data query, and metagenomics sequence classification. A critical step of RNA-seq data analysis is to map the RNA-seq reads onto a reference genome. This dissertation presents a novel splice alignment tool, MapSplice3. It achieves high read alignment and base mapping yields and is able to detect splice junctions, gene fusions, and circular RNAs comprehensively at the same time. Based on MapSplice3, we further develop a novel lightweight approach called iMapSplice that enables personalized mRNA transcriptional profiling. As a huge amount of RNA-seq data has been shared through public datasets, these datasets provide invaluable resources for researchers to test hypotheses by reusing existing data. To meet the need for efficiently querying large-scale sequencing data, a novel method called SeqOthello has been developed. It efficiently queries sequence k-mers against large-scale datasets and determines whether the given sequence is present. Metagenomics studies often generate tens of millions of reads to capture the presence of microbial organisms. Thus, efficient and accurate algorithms are in high demand. In this dissertation, we introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequences. It supports efficient query of a taxon using its k-mer signatures.
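The k-mer query idea in the abstract above can be illustrated with a minimal sketch. This is not SeqOthello's method: it assumes a plain in-memory dictionary instead of Othello hashing, and the names `kmers`, `build_index`, `query`, and the `min_fraction` threshold are hypothetical, chosen only for illustration.

```python
from typing import Dict, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(samples: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the samples that contain it."""
    index: Dict[str, Set[str]] = {}
    for name, seq in samples.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def query(index: Dict[str, Set[str]], sequence: str, k: int,
          min_fraction: float = 1.0) -> Dict[str, float]:
    """Return, per sample, the fraction of query k-mers found in it.

    Only samples reaching min_fraction are reported (assumed threshold).
    """
    query_kmers = kmers(sequence, k)
    if not query_kmers:
        return {}
    hits: Dict[str, int] = {}
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            hits[name] = hits.get(name, 0) + 1
    total = len(query_kmers)
    return {name: count / total
            for name, count in hits.items()
            if count / total >= min_fraction}


if __name__ == "__main__":
    samples = {"s1": "ACGTACGTGA", "s2": "TTTTACGTAA"}
    index = build_index(samples, k=4)
    print(query(index, "ACGTAC", k=4, min_fraction=0.5))
```

A dictionary of sets grows with the number of distinct k-mers, which is presumably why a large-scale tool would rely on a more compact structure; the sketch only shows the query logic.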

University of Kentucky

Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations

Publication venue: BioMed Central
Publication date: 22/08/2014

Springer - Publisher Connector

Accurate spliced alignment of long RNA sequencing reads

Author: Mäkinen Veli
Sahlin Kristoffer
Publication date: 15/12/2021

Motivation: Long-read RNA sequencing technologies are establishing themselves as the primary techniques to detect novel isoforms, and many such analyses are dependent on read alignments. However, the error rate and sequencing length of the reads create new challenges for accurately aligning them, particularly around small exons. Results: We present uLTRA, an alignment method for long RNA sequencing reads based on a novel two-pass collinear chaining algorithm. We show that uLTRA achieves higher accuracy than state-of-the-art aligners on simulated and synthetic data, with substantially higher accuracy for small exons. On simulated data, uLTRA achieves an accuracy of about 60% for exons of length 10 nucleotides or smaller and close to 90% accuracy for exons of length between 11 and 20 nucleotides. On biological data where the true read location is unknown, we show several examples where uLTRA aligns to known and novel isoforms containing small exons that are not detected with other aligners. While uLTRA obtains its accuracy using annotations, it can also be used as a wrapper around minimap2 to align reads outside annotated regions. Peer reviewed
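To make the term "collinear chaining" concrete, here is a minimal single-pass sketch using a quadratic dynamic program. It is not uLTRA's two-pass algorithm: the anchor representation `(query_start, ref_start, length)`, the scoring by total anchor length, and the function name `best_chain` are all assumptions made for illustration.

```python
from typing import List, Tuple

# An anchor is an exact match between read and reference (assumed layout):
# (query_start, ref_start, length)
Anchor = Tuple[int, int, int]


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Return the highest-scoring chain of non-overlapping, collinear anchors.

    Two anchors are chainable when the second starts at or after the end of
    the first in both the read and the reference. The score of a chain is
    the sum of its anchor lengths.
    """
    if not anchors:
        return []
    ordered = sorted(anchors)  # sort by query start, then reference start
    n = len(ordered)
    score = [a[2] for a in ordered]  # best chain score ending at anchor i
    prev = [-1] * n                  # back-pointer to the previous anchor
    for i in range(n):
        qi, ri, li = ordered[i]
        for j in range(i):
            qj, rj, lj = ordered[j]
            collinear = qj + lj <= qi and rj + lj <= ri
            if collinear and score[j] + li > score[i]:
                score[i] = score[j] + li
                prev[i] = j
    # Trace back from the best-scoring end anchor.
    end = max(range(n), key=score.__getitem__)
    chain: List[Anchor] = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    anchors = [(0, 100, 10), (12, 500, 8), (25, 520, 15), (15, 50, 30)]
    print(best_chain(anchors))
```

The large jump in reference coordinate between the first two anchors of the example chain is the kind of gap an intron would produce in a spliced alignment.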

Helsingin yliopiston digitaalinen arkisto

Context-based RNA-seq mapping

Author: Bonfert Thomas
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 22/04/2016

In recent years, the sequencing of RNA (RNA-seq) using next generation sequencing (NGS) technology has become a powerful tool for analyzing the transcriptomic state of a cell. Modern NGS platforms allow for performing RNA-seq experiments in a few days, resulting in millions of short sequencing reads. A crucial step in analyzing RNA-seq data generally is determining the transcriptomic origin of the sequencing reads (= read mapping). In principle, read mapping is a sequence alignment problem, in which the short sequencing reads (30-500 nucleotides) are aligned to much larger reference sequences such as the human genome (3 billion nucleotides). In this thesis, we present ContextMap, an RNA-seq mapping approach that evaluates the context of the sequencing reads for determining the most likely origin of every read. The context of a sequencing read is defined by all other reads aligned to the same genomic region. The ContextMap project started with a proof of concept study, in which we showed that our approach is able to improve already existing read mapping results provided by other mapping programs. Subsequently, we developed a standalone version of ContextMap. This implementation no longer relied on mapping results of other programs, but determined initial alignments itself using a modification of the Bowtie short read alignment program. However, the original ContextMap implementation had several drawbacks. In particular, it was not able to predict reads spanning more than two exons or to detect insertions or deletions (indels). Furthermore, ContextMap depended on a modification of a specific Bowtie version. Thus, it could benefit neither from Bowtie updates nor from novel developments (e.g. improved running times) in the area of short read alignment software. To address these problems, we developed ContextMap 2, an extension of the original ContextMap algorithm.
The key features of ContextMap 2 are the context-based resolution of ambiguous read alignments and the accurate detection of reads crossing an arbitrary number of exon-exon junctions or containing indels. Furthermore, a plug-in interface is provided that allows for the easy integration of alternative short read alignment programs (e.g. Bowtie 2 or BWA) into the mapping workflow. The performance of ContextMap 2 was evaluated on real-life as well as synthetic data and compared to other state-of-the-art mapping programs. We found that ContextMap 2 had very low rates of misplaced reads and incorrectly predicted junctions or indels. Additionally, recall values were as high as for the top competing methods. Moreover, the runtime of ContextMap 2 was at least two-fold lower than for the best competitors. In addition to the mapping of sequencing reads to a single reference, the ContextMap approach allows the investigation of several potential read sources (e.g. the human host and infecting pathogens) in parallel. Thus, ContextMap can be applied to mine for infections or contaminations or to map data from meta-transcriptomic studies. Furthermore, we developed methods based on mapping-derived statistics that make it possible to assess the confidence of mappings to identified species and to detect false positive hits. ContextMap was evaluated on three real-life data sets and results were compared to metagenomics tools. Here, we showed that ContextMap can successfully identify the species contained in a sample. Moreover, in contrast to most other metagenomics approaches, ContextMap also provides read mapping results to individual species. As a consequence, read mapping results determined by ContextMap can be used to study the gene expression of all species contained in a sample at the same time. Thus, ContextMap might be applied in clinical studies, in which the influence of infecting agents on host organisms is investigated.
The methods presented in this thesis allow for an accurate and fast mapping of RNA-seq data. As the amount of available sequencing data increases constantly, these methods will likely become an important part of many RNA-seq data analyses and thus contribute valuably to research in the field of transcriptomics.
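The notion of resolving an ambiguous read by its context (the other reads aligned to the same region) can be sketched as follows. This is a deliberately simplified illustration, not ContextMap's scoring: it assumes that support is just the count of uniquely mapped reads within a fixed window, and `resolve_by_context`, `Location`, and `window` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # (chromosome, 1-based position), assumed layout


def resolve_by_context(unique_hits: List[Location],
                       candidates: List[Location],
                       window: int = 1000) -> Location:
    """Pick the candidate location with the most uniquely mapped reads nearby.

    unique_hits: positions of reads that map to exactly one place.
    candidates:  alternative positions of one ambiguously mapped read.
    window:      half-width of the region counted as context (assumed value).
    """
    # Group unique hits per chromosome and sort them for binary search.
    by_chrom: Dict[str, List[int]] = {}
    for chrom, pos in unique_hits:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    def support(location: Location) -> int:
        chrom, pos = location
        positions = by_chrom.get(chrom, [])
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        return hi - lo  # number of unique hits inside [pos - window, pos + window]

    return max(candidates, key=support)


if __name__ == "__main__":
    unique = [("chr1", 100), ("chr1", 150), ("chr1", 900), ("chr2", 5000)]
    print(resolve_by_context(unique, [("chr1", 200), ("chr2", 5200)], window=500))
```

When candidates tie, `max` keeps the first one in the list, so a real tool would need an explicit tie-breaking rule.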

Post-Transcriptional Regulation In The Drosophila Sex Determination Pathway

Author: Sturgill David
Publication date: 01/01/2012

Sexually reproducing organisms produce two very different phenotypes (males and females) by differential deployment of essentially the same gene content. This dimorphism provides an excellent model to study how transcriptomes are differentially regulated, which is one of the central problems of biology. The core sex determination pathway of Drosophila is a well-described cascade of transcriptional and post-transcriptional regulation, but knowledge of the downstream components is largely incomplete. High-throughput technologies have provided great advances in understanding transcriptome regulation, but limits of the technology have led to a focus on whole gene expression measurements, rather than post-transcriptional regulation. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms, potentially elucidating the post-transcriptional network. However, methods to analyze splicing are underdeveloped, and challenges in transcript assembly and quantification remain unresolved. This work describes the development of the Splicing Analysis Kit (Spanki) as a fast, open-source suite of tools that uses simulations based on real RNA-Seq data to characterize errors in a given dataset, and user-tunable filters that minimize those errors. Spanki quantifies splicing differences in transcripts from the same loci within a sample, as well as between samples, by using only those reads that directly assay splicing events (junction-spanning reads). Despite the reliance on a fraction of the total data, the sequencing depth typically generated in an RNA-Seq experiment is sufficient to identify differentially regulated splicing, and error profiles are superior. I demonstrate that this computational approach outperforms several commonly used approaches in an analysis of sex-differential splicing in Drosophila heads.
Next I examine the effects of disrupting post-transcriptional regulation in Drosophila heads. I apply the Spanki software to analyze RNA-Seq data for mutant lines of two post-transcriptional regulators: Darkener of apricot (Doa) and found in neurons (fne). Doa, a serine-threonine kinase, regulates splicing by phosphorylating SR proteins, vital components of the splicing machinery. Found in neurons (fne) binds to transcripts and is involved in RNA metabolism. I demonstrate sex differences in response to disruption of post-transcriptional regulation, and hypothesize that they are informative of sex-differentiation pathways. Finally, I examine the conservation of splicing regulation within the Drosophila lineage. I show that junction-based splicing analysis is effective in making interspecific comparisons without the need for complete transcript models. I use these results to demonstrate the conservation of sex-differential splicing across 40 million years of evolution in 15 species in the Drosophila genus.
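Junction-spanning reads, as mentioned in the abstract above, can be recognized in SAM-formatted alignments by the `N` (skipped region) operation in the CIGAR string. The sketch below shows one way to extract and count junctions; it is not Spanki's code, and the function names and the `(chromosome, start, cigar)` read tuples are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_cigar(start: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (intron_start, intron_end) for each N operation.

    start is the 1-based leftmost reference position of the alignment;
    returned coordinates are 1-based and inclusive.
    """
    pos = start
    junctions = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            junctions.append((pos, pos + n - 1))
        if op in "MDN=X":  # operations that consume reference bases
            pos += n
    return junctions


def count_junctions(reads: Iterable[Tuple[str, int, str]]) -> Counter:
    """Count supporting reads per junction from (chromosome, start, cigar)."""
    counts: Counter = Counter()
    for chrom, start, cigar in reads:
        for intron_start, intron_end in junctions_from_cigar(start, cigar):
            counts[(chrom, intron_start, intron_end)] += 1
    return counts


if __name__ == "__main__":
    reads = [("2L", 100, "50M200N50M"), ("2L", 120, "30M200N70M"),
             ("2L", 400, "100M")]
    print(count_junctions(reads))
```

Reads without an `N` operation, such as the third example read, contribute nothing to the counts, which reflects the idea of relying only on reads that directly assay a splicing event.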

Digital Repository at the University of Maryland

Data structures and algorithms for analysis of alternative splicing with RNA-Seq data

Author: Schulz M.
Publication venue: Freie Universität
Publication date: 26/08/2010

MPG.PuRe