RNA-Seq Mapping and Detection of Gene Fusions with a Suffix Array Algorithm

Author: A Ameur
A McPherson
A Mortazavi
A Sboner
Asim S. Siddiqui
B Li
Benjamin S. Kong
BJ Druker
BP Lewis
BP Rubin
C Adem
C Kumar-Sinha
C Lin
C Tognon
C Trapnell
C Trapnell
CA Maher
CA Maher
CA Westbrook
Catalin Barbacioru
Chieh-Yuan Li
D Zerbino
EL Kwak
ET Wang
F De Bona
F Denoeud
F Ozsolak
F Tang
Fiona C. Hyland
G Robertson
H Edgren
Heinz Breu
I Birol
J Wang
JD Rowley
Jeffrey K. Ichikawa
Jian Gu
Joel P. Brockman
John P. Bodeau
JP Koivunen
K Inaki
K Kannan
Kelli S. Bramlett
KF Au
KJ McKernan
KS Kosik
L Shi
Liviu Popescu
M Guttman
M Kinsella
M Krzywinski
M Nicolae
M Persson
M Yassour
Matthew W. Muller
MC Haffner
MF Berger
Milan Radovich
N Cloonan
N Cloonan
N Palanisamy
Nriti Garg
O Monni
OA Hampton
Onur Sakarya
P Shepherd
Paolo Vatta
Penn P. Whitley
RD Canales
Robert C. Nutter
S Perner
SA Tomlins
SG O'Brien
Sowmi Utiramerur
SR Knezevich
U Manber
U Nagalakshmi
Vidya Kudlingar
Weixiong Zhang
Y Hu
Y Surget-Groba
Yongzhi Chen
Yulei N. Wang
YW Asmann
Z Wang
Zheng Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2012

High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not in the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Validated MCF-7 gene fusions and TaqMan expression ratios.

Notes: Each exon name (gene name, dash, exon order) was obtained from the RefSeq database. Inverted fusions are on the same chromosome but on different strands. The last four columns show the Cycle Threshold (CT) values from the TaqMan assays. Lower CT values indicate higher expression.
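Because a lower CT value indicates higher expression, a pair of CT values can be turned into an approximate expression ratio. The sketch below is only an illustration and not the calculation used for the table: it assumes the amount of product doubles with every PCR cycle, it applies no normalization to a control gene, and the function name `expression_ratio` is hypothetical.

```python
def expression_ratio(ct_sample, ct_reference):
    """Approximate expression of the sample relative to the reference.

    Assumes perfect amplification efficiency (a doubling per cycle), so each
    cycle of difference in CT corresponds to a factor of two. A lower CT in
    the sample gives a ratio greater than 1.
    """
    return 2.0 ** (ct_reference - ct_sample)


# Hypothetical CT values: the sample crosses the threshold 5 cycles earlier.
print(expression_ratio(ct_sample=25.0, ct_reference=30.0))
```

Under these assumptions a difference of five cycles corresponds to a 32-fold difference in expression.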

FigShare

Fusion breakpoints are biased to 5′ end of the genes.

Histograms of the intron order of 5′ (yellow) and 3′ (green) breakpoints for A. MCF-7 gene fusions and B. UHR and HBR gene fusions combined. Each breakpoint is inferred to occur in the intron (X axis) following the exon that is fused. The Y axis shows the count of breakpoints inferred to occur at each numbered intron. C. Boxplot of the distribution of simulated gene fusion locations for each of the 23 genes in which a fusion was observed. A magenta star marks the location of the observed fusion, relative to the 5′ exon. The 23 fusions correspond to the gene fusions from Table 2 (except that the alternatively spliced ESR1-C6orf97 and ADAMTS19-SLC27A6 fusions were each merged into a single data point).

FigShare

RNA-Seq mapping and splice junction detection methodology.

A. Four reads that span the junction (spliced single reads) and three reads that bridge it (paired-end reads) are shown. The top chart shows a bird's eye view of the genomic alignments detected for seven pairs of reads between the two exons. Areas of the read highlighted in red correspond to colors that do not align to the genomic reference, and dots in the reference are unknown colors/bases. B. The mapping pipeline is reviewed in the Methods section. Candidate junctions correspond to a sparse graph of junction evidence. After the candidates are found, splice junction and fusion predictions are made with optional quality thresholds. C. As a first step in SASR, 10 to 35 bp from each end of the exon are stored in two lexicographical dictionaries. The start of each stored suffix is shown as a vertical stop and its end as an empty triangle. D. 10 base pairs from the left and right ends of the read (decamers) are searched in the 3′ and 5′ end dictionaries, respectively, with a binary string search. Decamers are matched without mismatches. Matching decamers are extended as far as possible (with up to two mismatches) to determine whether they cover the entire suffix. Mismatches are illustrated as vertical lines. Up to ten bases are clipped from the ends of the reads until a matching read is found. E. Decamer block size frequency in the hg18 RefSeq database.
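The dictionary lookup in panels C and D can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SASR implementation. It works in base space rather than SOLiD color space and omits read clipping. It seeds only from the left end of the read and checks the right part directly against exon starts instead of using a second dictionary. The mismatch limit is applied per side. The function names and the exon names `GENEA-3` and `GENEB-2` are hypothetical.

```python
from bisect import bisect_left

SEED = 10                  # decamer length (from the caption)
MIN_END, MAX_END = 10, 35  # stored exon-end lengths in bp (from the caption)
MAX_MISMATCH = 2           # mismatches tolerated during extension


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build_3prime_dictionary(exons):
    """Sorted (suffix, exon_name) pairs for the last 10..35 bases of each exon."""
    entries = []
    for name, seq in exons.items():
        for n in range(MIN_END, min(MAX_END, len(seq)) + 1):
            entries.append((seq[-n:], name))
    return sorted(entries)


def seed_hits(dictionary, seed):
    """Binary-search the sorted dictionary for entries starting with the seed."""
    i = bisect_left(dictionary, (seed,))
    while i < len(dictionary) and dictionary[i][0].startswith(seed):
        yield dictionary[i]
        i += 1


def find_junctions(read, exons, dictionary):
    """Return sorted (donor, acceptor, split) triples supported by the read."""
    found = set()
    for suffix, donor in seed_hits(dictionary, read[:SEED]):
        split = len(suffix)
        if split > len(read) - SEED:
            continue  # keep at least a decamer for the acceptor side
        if mismatches(read[:split], suffix) > MAX_MISMATCH:
            continue  # the read does not cover the whole stored suffix
        rest = read[split:]
        for acceptor, seq in exons.items():
            if (acceptor != donor and len(seq) >= len(rest)
                    and mismatches(rest, seq[:len(rest)]) <= MAX_MISMATCH):
                found.add((donor, acceptor, split))
    return sorted(found)


# Toy exons and a read made of the last 12 bases of one exon
# followed by the first 13 bases of the other.
exons = {"GENEA-3": "TTGACCATGCAGGTCAAGCT", "GENEB-2": "CCGTATAGGCTAACGTTAGC"}
dictionary = build_3prime_dictionary(exons)
read = exons["GENEA-3"][-12:] + exons["GENEB-2"][:13]
print(find_junctions(read, exons, dictionary))
```

Sorting the stored exon ends once lets every read be seeded by binary search, which is the property the lexicographical dictionaries in panel C provide.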

FigShare

Improvements by junction confidence value and comparison to TopHat.

A. Logarithms of the numbers of known and putative junctions are shown with yellow and blue bars, respectively. The ratio of known to putative junctions is shown with a dashed line. The dataset consisted of 64,000 sample UHR junctions called with default thresholds. B. TopHat and LifeScope candidate calls were compared to each other and also to the RefSeq database. TopHat junctions were filtered with score > 5, and LifeScope junctions were filtered with the 1-SR-1-PE threshold (requiring one span and one bridge evidence).

FigShare

Localization of gene fusions on specific chromosomal regions.

Circular graphs of gene fusions for A. the whole genome and B. chromosomes 1, 17 and 20. Red lines represent inter-chromosomal gene fusions, blue lines represent inverted intra-chromosomal fusion events and black lines represent same-strand intra-chromosomal fusion events. Graphs were drawn with the Circos software [61].

FigShare

Combined evidence improves specificity of splice and fusion detection.

Scatterplots show increasing mapped coverage (x-axis) versus the number of junctions called. Left: known RefSeq junctions; Middle: putative junctions; Right: fusion junctions. The top track shows results for UHR and the bottom track for HBR. Three different evidence thresholds were compared: 1) red line: one SPAN (SR) evidence required for a junction call, 2) magenta line: two SPAN (2-SR) evidences required for a junction call, and 3) blue line: one SPAN and one BRIDGE evidence (1-SR-1-PE) required for a junction call.
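The three evidence thresholds amount to simple filters on the span and bridge counts of each candidate junction. The sketch below is a minimal illustration, assuming each candidate carries one count per evidence type. The `Junction` record, the `call_junctions` function and the label `1-SR` for the first threshold are hypothetical; they are not LifeScope names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    name: str
    span: int    # spliced single reads (SR) that cross the junction
    bridge: int  # paired-end reads (PE) whose mates flank the junction


# One predicate per evidence threshold described in the caption.
THRESHOLDS = {
    "1-SR": lambda j: j.span >= 1,
    "2-SR": lambda j: j.span >= 2,
    "1-SR-1-PE": lambda j: j.span >= 1 and j.bridge >= 1,
}


def call_junctions(candidates, threshold):
    """Return the names of the candidates that pass the chosen threshold."""
    keep = THRESHOLDS[threshold]
    return [j.name for j in candidates if keep(j)]


# Hypothetical candidates with different mixes of evidence.
candidates = [
    Junction("j1", span=1, bridge=0),
    Junction("j2", span=2, bridge=0),
    Junction("j3", span=1, bridge=1),
]
for label in THRESHOLDS:
    print(label, call_junctions(candidates, label))
```

In this toy set the combined 1-SR-1-PE threshold keeps only the candidate supported by both evidence types, which matches the idea that combined evidence is the stricter requirement.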

FigShare

Mapping and splicing statistics for paired-end runs.

Notes: Confidently aligned pairs were defined as primary alignments with PQV > 10. 120 and 150 refer to the insert size of the RNA library. The MCF-7 and MCF-7-2 libraries were prepared separately from the same lot. Known splicing events are found in the RefSeq database, whereas putative splicing events are not.
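The definition of confidently aligned pairs can be expressed as a one-line filter. The sketch below assumes each aligned pair is available as a record with a primary-alignment flag and a PQV score. The field names `primary` and `pqv` and the function name are hypothetical and do not reflect the LifeScope output format.

```python
def confident_fraction(pairs, min_pqv=10):
    """Fraction of pairs that are primary alignments with PQV above min_pqv."""
    confident = [p for p in pairs if p["primary"] and p["pqv"] > min_pqv]
    return len(confident) / len(pairs) if pairs else 0.0


# Hypothetical records: only the first and last satisfy both conditions.
pairs = [
    {"primary": True, "pqv": 40},
    {"primary": True, "pqv": 10},   # fails: PQV must be strictly above 10
    {"primary": False, "pqv": 60},  # fails: not a primary alignment
    {"primary": True, "pqv": 11},
]
print(confident_fraction(pairs))
```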

FigShare