Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

research

Authors: A Derti
A Dobin
A Sherstnev
AI Reid
Alexander Sherstnev
B Langmead
BJ Haas
BS Yoon
C Burge
C Cole
C Luo
C Trapnell
CE Joyce
CH Jan
Christian Cole
Céline Duc
D Brawand
DM Church
F Ozsolak
Geoffrey J. Barton
Gordon G. Simpson
H Stroud
H Zou
HK Saini
I Ulitsky
J Bracht
J Harrow
JE Collins
JH Yang
Junfang Song
Kate G. Storey
L Jiang
M Fujii
M Garber
M Yandell
MB Gerstein
Nicholas J. Schurch
P Lamesch
PE Boardman
Sara J. Brown
T Pelissier
Thomas Preiss
TS Becker
V Curwen
V Hamburger
W. H. Irwin McLean
X Cai
Y Kurihara
Y Lee
Z Moqtaderi
Publication date: 11 November 2013
Publisher: 'Public Library of Science (PLoS)'

Abstract

The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments, where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies between the reference annotation and the experimental system can lead to incorrect interpretation of the effect that an experimental treatment or mutation has on RNA expression in the system under study. Until recently, the genome-wide annotation of 3-prime untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for human, chicken and A. thaliana samples by the novel single-molecule, strand-specific Direct RNA Sequencing technology from Helicos Biosciences, which locates 3-prime polyadenylation sites to within ±2 nt, were combined with archival EST and RNA-seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3-prime UTR re-annotation (including extension of one 3-prime UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression; and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and to those seeking to interpret existing publicly available annotations in the context of their own experimental data.

Comment: 44 pages, 9 figures