
    RNA-Seq Mapping and Detection of Gene Fusions with a Suffix Array Algorithm

    High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.

    Fusion breakpoints are biased to 5′ end of the genes.

    Histogram of the order of 5′ (yellow) and 3′ (green) intron breakpoints for A. MCF-7 and B. UHR and HBR combined gene fusions. Each breakpoint is inferred to occur in the intron (x-axis) following the fused exon. The y-axis shows the count of breakpoints inferred to occur at each numbered intron. C. Boxplot of the distribution of simulated gene fusion locations for each of the 23 genes in which a fusion was observed. A magenta star marks the location of the observed fusion, relative to the 5′ exon. The 23 fusions correspond to the gene fusions from Table 2 (except for the ESR1-C6orf97 and ADAMTS19-SLC27A6 alternatively spliced fusions, which are merged into single data points).

    RNA-Seq mapping and splice junction detection methodology.

    A. Four reads that span (spliced single reads) and three reads that bridge (paired-end reads) the junction are shown. The top chart shows a bird's eye view of the genomic alignments detected for seven pairs of reads between the two exons. Areas of the read highlighted in red correspond to colors that do not align to the genomic reference, and dots in the reference are unknown colors/bases. B. The mapping pipeline is reviewed in the Methods section. Candidate junctions correspond to a sparse graph of junction evidence. After the candidates are found, splice junction and fusion predictions are made with optional quality thresholds. C. As a first step in SASR, 10 to 35 bp from each end of the exon are stored in two lexicographical dictionaries. Stored suffixes start at a vertical stop and end with an empty triangle. D. Ten base pairs from the left and right ends of the read (decamers) are searched in the 3′ and 5′ end dictionaries, respectively, with a binary string search. Decamers are matched without mismatches. Matching decamers are extended as far as possible (with up to two mismatches) to determine whether they cover the entire suffix. Mismatches are illustrated as vertical lines. Up to ten bases are clipped from the ends of the reads until a matching read is found. E. Decamer block size frequency in the hg18 RefSeq database.
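The seed-and-extend lookup described in panels C and D can be illustrated with a minimal Python sketch. It is not the SASR implementation: it covers only the 3′ end dictionary (the left end of the read), works on plain bases rather than SOLiD colors, omits read clipping, and all names (`build_three_prime_dictionary`, `find_donor_candidates`, the constants) are hypothetical. The only details taken from the caption are the 10 to 35 bp stored ends, the exact decamer match by binary search, and the extension with up to two mismatches.

```python
from bisect import bisect_left

SEED = 10          # decamer length, matched without mismatches
MAX_END = 35       # longest exon end stored in the dictionary
MAX_MISMATCH = 2   # mismatches tolerated while extending the seed


def build_three_prime_dictionary(exons):
    """Return a lexicographically sorted list of (suffix, exon_name).

    `exons` maps a (hypothetical) exon name to its sequence. Every suffix
    of length 10..35 taken from the 3' end of each exon is stored.
    """
    entries = []
    for name, seq in exons.items():
        tail = seq[-MAX_END:]
        for start in range(len(tail) - SEED + 1):
            entries.append((tail[start:], name))
    entries.sort()
    return entries


def find_donor_candidates(read, entries):
    """Return (exon_name, split_position) pairs for one read.

    The left decamer of the read is located by binary search, then each
    matching suffix is extended along the read; a hit requires that the
    read covers the entire suffix with at most MAX_MISMATCH mismatches.
    """
    seed = read[:SEED]
    i = bisect_left(entries, (seed,))  # first suffix >= seed
    hits = []
    while i < len(entries) and entries[i][0].startswith(seed):
        suffix, name = entries[i]
        if len(suffix) <= len(read):
            mismatches = sum(a != b for a, b in zip(suffix[SEED:], read[SEED:]))
            if mismatches <= MAX_MISMATCH:
                hits.append((name, len(suffix)))
        i += 1
    return hits
```

In this sketch, the returned split position is the read offset at which the donor exon ends, so the remainder of the read would be searched against the 5′ end dictionary in the same way, starting from the right decamer.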

    Improvements by junction confidence value and comparison to TopHat.

    A. Logarithms of the numbers of known and putative junctions are shown with yellow and blue bars, respectively. The ratio of known to putative junctions is shown with a dashed line. The dataset consisted of 64,000 sample UHR junctions called with default thresholds. B. TopHat and LifeScope candidate calls were compared to each other and to the RefSeq database. TopHat junctions were filtered with score > 5, and LifeScope junctions were filtered with the 1-SR-1-PE threshold (requiring one span and one bridge evidence).

    Localization of gene fusions on specific chromosomal regions.

    Circular graphs of gene fusions for A. the whole genome and B. chromosomes 1, 17 and 20. Red lines represent inter-chromosomal gene fusions, blue lines represent inverted intra-chromosomal fusions and black lines represent same-strand intra-chromosomal fusion events. Graphs were drawn with Circos software [61].

    Combined evidence improves specificity of splice and fusion detection.

    Scatterplots show the increasing mapped coverage (x-axis) versus, left: known RefSeq junctions; middle: putative junctions; right: fusion junctions. The top track shows results for UHR and the bottom track for HBR. Three evidence thresholds were compared: 1) red line: one SPAN (SR) evidence required for a junction call; 2) magenta line: two SPAN (2-SR) evidences required for a junction call; and 3) blue line: one SPAN and one BRIDGE evidence (1-SR-1-PE) required for a junction call.
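The three evidence thresholds compared above amount to minimum counts of span and bridge evidence per candidate junction. The following Python sketch shows one way such a filter could look. The `JunctionCandidate` record, the `THRESHOLDS` table and the `call_junctions` function are assumptions made for illustration (as is the label "1-SR" for the single-span threshold); the source does not describe how LifeScope stores or filters candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JunctionCandidate:
    """Hypothetical record for one candidate junction."""
    name: str
    span_reads: int    # SPAN evidence: spliced single reads (SR)
    bridge_pairs: int  # BRIDGE evidence: paired-end reads (PE)


# (minimum span evidence, minimum bridge evidence) for each threshold
THRESHOLDS = {
    "1-SR": (1, 0),
    "2-SR": (2, 0),
    "1-SR-1-PE": (1, 1),
}


def call_junctions(candidates, threshold):
    """Keep the candidates that meet the named evidence threshold."""
    min_span, min_bridge = THRESHOLDS[threshold]
    return [
        c for c in candidates
        if c.span_reads >= min_span and c.bridge_pairs >= min_bridge
    ]
```

For example, a candidate with one span read and no bridging pair would pass "1-SR" but be rejected by both "2-SR" and "1-SR-1-PE", which is the sense in which combining evidence types makes the call stricter.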