
1,175 research outputs found

Transcriptome annotation using tandem SAGE tags

Author: Anthony Boureux
Bertone
Brenner
Carninci
Chen
Cheng
Claverie
Cummins
ENCODE
Eric Rivals
Fabien Pierrat
Florence Ottones
Florence Ruffle
Ge
Horspool
Huttenhofer
Jacques Marti
Johnson
Jorma Tarhio
Jurka
Margulies
Mireille Lejeune
Mockler
Ng
Nielsen
Oscar Pecharromàn Pérez
Piquemal
Quéré
Rinn
Saha
Semon
Shendure
Silva
Tarhio
Thérèse Commes
Velculescu
Virlon
Wheeler
Woelk
Publication venue: Oxford University Press
Publication date: 01/01/2007

Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, far exceeding that of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a novel strategy with tags anchored on two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the spanning sequence read on the genome between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and its ability to reveal new transcripts with two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to apply this strategy at a large scale. We then collected and processed the complete sets of human SAGE tags to predict yet unknown transcripts. A cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new and complementary approach for complex transcriptome annotation
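
The idea of a tag-delimited genomic sequence can be illustrated with a naive sketch. This is not the algorithm from the paper, which the abstract does not describe; it assumes exact tag matches on a single strand, and the names `find_all`, `tag_delimited_sequences`, and `max_span` are hypothetical.

```python
from typing import List, Tuple


def find_all(genome: str, tag: str) -> List[int]:
    """Return every 0-based start position of `tag` in `genome` (overlaps allowed)."""
    positions = []
    start = genome.find(tag)
    while start != -1:
        positions.append(start)
        start = genome.find(tag, start + 1)
    return positions


def tag_delimited_sequences(
    genome: str, tag_a: str, tag_b: str, max_span: int
) -> List[Tuple[int, int, str]]:
    """List (start, end, sequence) for each tag_a ... tag_b pair.

    A pair qualifies when tag_b starts at or after the end of tag_a and the
    whole region, both tags included, is at most `max_span` bases long.
    Assumption: exact matching on the forward strand only.
    """
    hits = []
    b_positions = find_all(genome, tag_b)
    for a in find_all(genome, tag_a):
        for b in b_positions:
            end = b + len(tag_b)
            if b >= a + len(tag_a) and end - a <= max_span:
                hits.append((a, end, genome[a:end]))
    return hits


# Toy genome: two made-up tags separated by a short spanning sequence.
toy_genome = "GG" + "CATGAAAC" + "TTTT" + "GATCTTTG" + "CC"
toy_hits = tag_delimited_sequences(toy_genome, "CATGAAAC", "GATCTTTG", max_span=50)
```

Requiring both tags within a bounded distance is what reduces ambiguity relative to a single short tag: a tag that matches many genomic sites will rarely have the partner tag nearby at more than one of them. A genome-scale implementation would need an index instead of repeated linear scans.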

An improved zebrafish transcriptome annotation for sensitive and comprehensive detection of cell type-specific genes

Author: Grosse Ann S.
Kucukural Alper
Lawson Nathan D.
Li Rui
Shin Masahiro
Stone Oliver A.
Yukselen Onur
Zhu Lihua Julie
Publication venue: eScholarship@UMassChan
Publication date: 01/01/2020

The zebrafish is ideal for studying embryogenesis and is increasingly applied to model human disease. In these contexts, RNA-sequencing (RNA-seq) provides mechanistic insights by identifying transcriptome changes between experimental conditions. Application of RNA-seq relies on accurate transcript annotation for a genome of interest. Here, we find discrepancies in analysis from RNA-seq datasets quantified using Ensembl and RefSeq zebrafish annotations. These issues were due, in part, to variably annotated 3' untranslated regions and thousands of gene models missing from each annotation. Since these discrepancies could compromise downstream analyses and biological reproducibility, we built a more comprehensive zebrafish transcriptome annotation that addresses these deficiencies. Our annotation improves detection of cell type-specific genes in both bulk and single-cell RNA-seq datasets, where it also improves resolution of cell clustering. Thus, we demonstrate that our new transcriptome annotation can outperform existing annotations, providing an important resource for zebrafish researchers

eScholarship@UMMS

Oxford University Research Archive

Methods to study splicing from high-throughput RNA Sequencing data

Author: A Ameur
A Bhasi
A Dobin
A Mortazavi
A Oshlack
A Roberts
AM Mezlini
AN Brooks
B Jackson
B Kakaradov
B Langmead
B Li
BJ Haas
C Trapnell
D Hiller
D Singh
DL Wood
DW Bryant
E Eyras
E Lee
E Turro
ET Wang
F Birzele
F Bona De
F Denoeud
F Tang
G Robertson
G Xu
GA Sacomoto
GR Grant
GS Slater
H Bao
H Jiang
H Kim
H Richard
J Behr
J Du
J Feng
J Hu
J Lovén
J Martin
J Salzman
J Seok
J Wu
JE Allen
JJ Li
JP Venables
K Schneeberger
K Wang
KD Hansen
KF Au
KL Howe
KM Borgwardt
L Chen
L Wang
LY Chen
M Aschoff
M Fiume
M Garber
M Griffith
M Guttman
M Stanke
M Sultan
MC Ryan
MF Rogers
MG Grabherr
MH Schulz
MT Dimon
N Cloonan
N Deng
N Leng
N Nicolae
N Philippe
N Vijay
NA Fonseca
O Stegle
P Drewe
P Glaus
PL Martelli
PP Labaj
Q Liu
Q Pan
QY Zhao
R Bohnert
R Guigó
R Li
S Anders
S Djebali
S Filichkin
S Heber
S Huang
S Lee
S Mangul
S Marco-Sola
S Shen
S Sonnenburg
S Srivastava
S Tang
S Zheng
SB Montgomery
SH Nagaraj
SK Lou
T Bonfert
TA Clark
TD Wu
W Li
W Wang
WJ Kent
Y Hu
Y Katz
Y Li
Y Liao
Y Surget-Groba
Y Xing
Y Zhang
Z Xia
Publication venue: Springer Science and Business Media LLC
Publication date: 30/07/2015

The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods.

Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

arXiv.org e-Print Archive

Crossref

Comparative validation of the D. melanogaster modENCODE transcriptome annotation

Author: Chen Zhen-Xia
Sternberg Paul W.
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/07/2014

Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the Drosophila phylogeny by generating draft genomes for eight new species. For comparative transcriptome analysis, we generated mRNA expression profiles on 81 samples from multiple tissues and developmental stages of 15 Drosophila species, and we performed cap analysis of gene expression in D. melanogaster and D. pseudoobscura. We also describe conservation of four distinct core promoter structures composed of combinations of elements at three positions. Overall, each type of genomic feature shows a characteristic divergence rate relative to neutral models, highlighting the value of multispecies alignment in annotating a target genome that should prove useful in the annotation of other high priority genomes, especially human and other mammalian genomes that are rich in noncoding sequences. We report that the vast majority of elements in the annotation are evolutionarily conserved, indicating that the annotation will be an important springboard for functional genetic testing by the Drosophila community

Caltech Authors

Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA.

Author: Byrne Ashley
Cole Charles
Green Richard E
Palmer Theron
Schmitz Robert J
Volden Roger
Vollmers Christopher
Publication venue: eScholarship, University of California
Publication date: 01/09/2018

High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can be used to analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of analyzing how these individual events are combined into complete RNA transcript isoforms. Because of this shortfall, long-distance information is required to complement short-read sequencing to analyze transcriptomes on the level of full-length RNA transcript isoforms. While long-read sequencing technology can provide this long-distance information, there are issues with both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies that prevent their widespread adoption. Briefly, PacBio sequencers produce low numbers of reads with high accuracy, while ONT sequencers produce higher numbers of reads with lower accuracy. Here, we introduce and validate a long-read ONT-based sequencing method. At the same cost, our Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can then be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples
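
The intuition behind a concatemeric consensus can be sketched as follows. This is a deliberately simplified illustration and not the R2C2 pipeline: it assumes the repeated copies of one cDNA have already been split out of the raw read and aligned to equal length, and it takes a plain per-column majority vote. The function name `column_consensus` is hypothetical.

```python
from collections import Counter
from typing import List


def column_consensus(copies: List[str]) -> str:
    """Majority-vote consensus over equal-length copies of the same molecule.

    Assumption: the copies are already aligned column by column, so errors
    are substitutions only. Ties go to the base seen first in that column.
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("all copies must have the same length")
    consensus = []
    for column in zip(*copies):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three noisy copies of the same toy sequence; each error sits in a different column.
noisy_copies = ["ACGTACGT", "ACGTTCGT", "ACCTACGT"]
toy_consensus = column_consensus(noisy_copies)
```

Because independent errors rarely hit the same position in most copies, the vote recovers the underlying sequence even though two of the three copies contain a mistake. Real nanopore reads also contain insertions and deletions, so a practical method needs an alignment step before any such vote.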

eScholarship - University of California

IsoDOT Detects Differential RNA-isoform Expression/Usage with respect to a Categorical or Continuous Covariate with High Sensitivity and Specificity

Author: Chen Ting-Huei
Chu Haitao
Crowley James J.
de Villena Fernando Pardo-Manuel
Huang Shunping
Kuan Pei-Fen
Li Yuan
Liu Yufeng
McMillan Leonard
Miller Darla
Shaw Ginger
Sullivan Patrick F.
Sun Wei
Wu Yichao
Zhabotynsky Vasyl
Zhou Hua
Zou Fei
Publication date: 29/10/2014

We have developed a statistical method named IsoDOT to assess differential isoform expression (DIE) and differential isoform usage (DIU) using RNA-seq data. Here isoform usage refers to relative isoform expression given the total expression of the corresponding gene. IsoDOT performs two tasks that cannot be accomplished by existing methods: to test DIE/DIU with respect to a continuous covariate, and to test DIE/DIU for one case versus one control. The latter task is not an uncommon situation in practice, e.g., comparing the paternal and maternal alleles of one individual or comparing the tumor and normal samples of one cancer patient. Simulation studies demonstrate the high sensitivity and specificity of IsoDOT. We apply IsoDOT to study the effects of haloperidol treatment on the mouse transcriptome and identify a group of genes whose isoform usages respond to haloperidol treatment
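
The distinction between isoform expression and isoform usage defined above can be made concrete with a small sketch. It only computes the usage fractions and is not the IsoDOT test itself; the function name `isoform_usage` and the expression values are made up for illustration.

```python
from typing import Dict


def isoform_usage(expression: Dict[str, float]) -> Dict[str, float]:
    """Return each isoform's share of the gene's total expression.

    If the gene is not expressed at all, every usage is reported as 0.0
    (an arbitrary convention chosen for this sketch).
    """
    total = sum(expression.values())
    if total == 0:
        return {isoform: 0.0 for isoform in expression}
    return {isoform: value / total for isoform, value in expression.items()}


# Hypothetical expression of one gene with two isoforms in two samples.
control = {"iso1": 30.0, "iso2": 10.0}
case = {"iso1": 60.0, "iso2": 20.0}  # every isoform doubled

control_usage = isoform_usage(control)
case_usage = isoform_usage(case)
```

In this toy example both isoforms are differentially expressed (each doubles), yet the usage fractions are identical in the two samples, so there is DIE without DIU.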

arXiv.org e-Print Archive

Crossref

PubMed Central

Carolina Digital Repository

eScholarship - University of California

FigShare

Evaluating Characteristics of De Novo Assembly Software on 454 Transcriptome Data: A Simulation Approach

Author: Bornberg-Bauer Erich
Feulner Philine G. D.
Mundry Marvin
Sammeth Michael
Publication venue: Public Library of Science
Publication date: 27/02/2012

Background: The quantity of transcriptome data is rapidly increasing for non-model organisms. As sequencing technology advances, focus shifts towards solving bioinformatic challenges, of which sequence read assembly is the first task. Recent studies have compared the performance of different software to establish a best practice for transcriptome assembly. Here, we adapted a simulation approach to evaluate specific features of assembly programs on 454 data. The novelty of our study is that the simulation allows us to calculate a model assembly as a reference point for comparison. Findings: The simulation approach allows us to compare basic metrics of assemblies computed by different software applications (CAP3, MIRA, Newbler, and Oases) to a known optimal solution. We found that MIRA and CAP3 are conservative in merging reads. This resulted in a comparably high number of short contigs. In contrast, Newbler more readily merged reads into longer contigs, while Oases produced the overall shortest assembly. Due to the simulation approach, reads could be traced back to their correct placement within the transcriptome. Together with mapping reads onto the assembled contigs, we were able to evaluate ambiguity in the assemblies. This analysis further supported the conservative nature of MIRA and CAP3, which resulted in low proportions of chimeric contigs, but high redundancy. Newbler produced less redundancy, but the proportion of chimeric contigs was higher. Conclusion: Our evaluation of four assemblers suggested that MIRA and Newbler slightly outperformed the othe
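
Because simulated reads have a known origin, ambiguity in an assembly can be checked by bookkeeping alone. The sketch below is one possible reading of "chimeric" and "redundant", not the definitions used in the study: a contig is flagged as chimeric when it contains reads from more than one source transcript, and a transcript is flagged as redundantly assembled when its reads are spread over more than one contig. All names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def evaluate_assembly(
    read_origin: Dict[str, str], read_contig: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (chimeric contigs, redundantly assembled transcripts).

    read_origin maps each simulated read to its true source transcript.
    read_contig maps each assembled read to the contig it ended up in.
    """
    transcripts_per_contig = defaultdict(set)
    contigs_per_transcript = defaultdict(set)
    for read, contig in read_contig.items():
        origin = read_origin[read]
        transcripts_per_contig[contig].add(origin)
        contigs_per_transcript[origin].add(contig)
    chimeric = sorted(
        contig for contig, origins in transcripts_per_contig.items() if len(origins) > 1
    )
    redundant = sorted(
        origin for origin, contigs in contigs_per_transcript.items() if len(contigs) > 1
    )
    return chimeric, redundant


# Toy simulation: four reads from two transcripts, assembled into two contigs.
toy_origin = {"r1": "T1", "r2": "T1", "r3": "T2", "r4": "T2"}
toy_contig = {"r1": "c1", "r2": "c2", "r3": "c2", "r4": "c2"}
toy_chimeric, toy_redundant = evaluate_assembly(toy_origin, toy_contig)
```

The two flags pull in opposite directions, which matches the trade-off the abstract reports: an assembler that merges reluctantly produces few chimeric contigs but splits transcripts across contigs, while an aggressive one does the reverse.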

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Münstersches Informations und Archivsystem für Multimediale Inhalte

Keep Me Around: Intron Retention Detection and Analysis

Author: Conboy John G.
Pachter Lior
Pimentel Harold
Publication date: 02/10/2015

We present a tool, keep me around (kma), a suite of Python scripts and an R package that finds retained introns in RNA-Seq experiments and incorporates biological replicates to reduce the number of false positives when detecting retention events. kma uses the results of existing quantification tools that probabilistically assign multi-mapping reads, thus interfacing easily with transcript quantification pipelines. The data is represented in a convenient, database-style format that allows for easy aggregation across introns, genes, samples, and conditions for further exploratory analysis

arXiv.org e-Print Archive

Caltech Authors

Computational methods for transcriptome annotation and quantification using RNA-seq

Author: Garber Manuel
Grabherr Manfred G.
Guttman Mitchell
Trapnell Cole
Publication venue: Nature Publishing Group
Publication date: 01/06/2011

High-throughput RNA sequencing (RNA-seq) promises a comprehensive picture of the transcriptome, allowing for the complete annotation and quantification of all genes and their isoforms across samples. Realizing this promise requires increasingly complex computational methods. These computational challenges fall into three main categories: (i) read mapping, (ii) transcriptome reconstruction and (iii) expression quantification. Here we explain the major conceptual and practical challenges, and the general classes of solutions for each category. Finally, we highlight the interdependence between these categories and discuss the benefits for different biological applications

Caltech Authors