9 research outputs found

    FusionFinder: A Software Tool to Identify Expressed Gene Fusion Candidates from RNA-Seq Data

    The hallmarks of many haematological malignancies and solid tumours are chromosomal translocations, which may lead to gene fusions. Recently, next-generation sequencing techniques at the transcriptome level (RNA-Seq) have been used to verify known transcribed gene fusions and to discover novel ones. We present FusionFinder, a Perl-based software tool designed to automate the discovery of candidate gene fusion partners from single-end (SE) or paired-end (PE) RNA-Seq read data. FusionFinder was applied to data from a previously published analysis of the K562 chronic myeloid leukaemia (CML) cell line. Using FusionFinder, we successfully replicated the findings of this study and detected additional, previously unreported fusion genes in the dataset, which were confirmed experimentally. These included two isoforms of a fusion involving the genes BRK1 and VHL, whose co-deletion has previously been associated with the prevalence and severity of renal-cell carcinoma. FusionFinder is freely available for non-commercial use and can be downloaded from the project website (http://bioinformatics.childhealthresearch.org.au/software/fusionfinder/).

    Summary of the overall comparative performance of FusionFinder, FusionMap and Tophat-Fusion on a simulated dataset.

    A total of 55 fusion genes were simulated. Sensitivity and positive predictive value (PPV) were calculated from predicted fusion genes supported by one or more reads (i.e. all data). True positive fusions/reads are accurate predictions of simulated fusions, whereas false positive fusions/reads are all other predictions, including synonymous fusions. For FusionFinder, all false positive genes, and consequently all false positive reads, were from synonymous fusions (see Table S2b, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0039987#pone.0039987.s002). FusionFinder detects more simulated fusions and significantly fewer false positives than FusionMap, with consistently greater sensitivity and PPV. FusionFinder showed a higher sensitivity than Tophat-Fusion and a comparable PPV.
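The two measures named in this caption can be computed from sets of gene pairs. The following is a minimal sketch, not the authors' evaluation script: the function name and the toy gene pairs are hypothetical, and it assumes every prediction that does not match a simulated fusion counts as a false positive, as the caption describes.

```python
def sensitivity_and_ppv(predicted, simulated):
    """Return (sensitivity, PPV) for predicted versus simulated fusion pairs.

    Both arguments are iterables of (gene1, gene2) tuples. Any prediction
    that is not in the simulated set is counted as a false positive.
    """
    predicted, simulated = set(predicted), set(simulated)
    true_pos = len(predicted & simulated)
    false_pos = len(predicted - simulated)
    false_neg = len(simulated - predicted)
    # Guard against empty inputs so the ratios are always defined.
    sensitivity = true_pos / (true_pos + false_neg) if simulated else 0.0
    ppv = true_pos / (true_pos + false_pos) if predicted else 0.0
    return sensitivity, ppv


# Toy data (hypothetical gene names, not taken from the study).
simulated = {("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")}
predicted = {("A", "B"), ("C", "D"), ("X", "Y")}
sens, ppv = sensitivity_and_ppv(predicted, simulated)
```

In this toy case two of four simulated fusions are recovered and one of three predictions is spurious.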

    Comparison of sensitivity and PPV for FusionFinder, FusionMap and Tophat-Fusion.

    To compare the sensitivity and PPV of FusionFinder, FusionMap and Tophat-Fusion in detecting fusion genes, each tool was used to analyse a randomly generated dataset simulating normal genes and 55 fusion genes. Sensitivity and PPV were calculated for subgroups of the results defined by the number of reads supporting the fusion genes predicted by each tool. FusionFinder shows consistently higher sensitivity than both FusionMap and Tophat-Fusion, a generally higher PPV than FusionMap, and a PPV similar to that of Tophat-Fusion.
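One way to form read-count subgroups is to keep only predictions supported by at least a given number of reads and recompute both measures for each cutoff. The sketch below assumes that "at least n reads" definition, which the caption does not state explicitly; the function name and the toy data are hypothetical.

```python
def metrics_by_min_reads(predictions, simulated, thresholds):
    """Map each read-count cutoff to (sensitivity, PPV).

    predictions: dict of (gene1, gene2) -> number of supporting reads.
    simulated:   iterable of the (gene1, gene2) pairs that were simulated.
    thresholds:  iterable of minimum read counts (assumed subgroup definition).
    """
    simulated = set(simulated)
    results = {}
    for cutoff in thresholds:
        # Keep only predictions with enough supporting reads.
        kept = {pair for pair, reads in predictions.items() if reads >= cutoff}
        true_pos = len(kept & simulated)
        sensitivity = true_pos / len(simulated) if simulated else 0.0
        ppv = true_pos / len(kept) if kept else 0.0
        results[cutoff] = (sensitivity, ppv)
    return results


# Toy data (hypothetical): two real fusions found, one spurious call.
simulated_pairs = {("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")}
calls = {("A", "B"): 5, ("C", "D"): 1, ("X", "Y"): 1}
table = metrics_by_min_reads(calls, simulated_pairs, [1, 2])
```

Raising the cutoff typically trades sensitivity for PPV, which is the pattern such a comparison is meant to expose.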

    Summary file showing the 7 candidates from the FusionFinder analysis of the MCF-7 Breast Cancer cell line paired-end dataset.

    Displayed are the Ensembl gene IDs and HGNC IDs of the G1:G2 pair, their chromosome numbers, the total number of read pairs providing evidence for the G1:G2 pair in question (totalreads), the number of aligned blocks on each gene (G1_blocks/G2_blocks), the number of potential isoforms identified, and the assigned category of each candidate. Candidates numbered 1–6 have previously been reported.

    Summary file showing the 9 candidates from the FusionFinder analysis of the Levin dataset.

    Displayed are the Ensembl gene IDs and HGNC IDs of the G1:G2 pair, their chromosome numbers, the total number of read pairs providing evidence for the G1:G2 pair in question (totalreads), the number of aligned blocks on each gene (G1_blocks/G2_blocks), the number of potential isoforms identified, and the assigned category of each candidate. The index number is used to refer to particular candidates in the main article. Candidates numbered 1–5 and 7 were previously reported by Levin <i>et al</i>.

    FusionFinder rationale.

    A) RNA-Seq produces millions of short reads, some of which will span the exon boundaries of hypothetical fusion transcripts between Gene 1 and Gene 2. Two different fusion isoforms involving different exons are shown, left and right, along with a single read that spans each breakpoint. Reads are split into smaller pseudo PE reads, which can be aligned independently to a reference transcriptome. B) Alignment of pseudo PE reads against the reference transcriptome. One of each pair aligns to an exon on Gene 1 and the other aligns to an exon on Gene 2. Repeating this process for all other RNA-Seq reads creates “alignment blocks” from overlapping groups of aligned 5′ and 3′ pseudo PE reads and their genomic coordinates. Multiple alignment blocks on either gene (as for Gene 1 in the example) provide evidence for the existence of different isoforms of the fusion.
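The two steps in this caption, splitting a read into a pseudo paired-end pair and grouping overlapping alignments into blocks, can be illustrated as follows. This is a sketch under stated assumptions rather than FusionFinder's Perl implementation: it assumes the two segments are fixed-length pieces taken from the 5′ and 3′ ends of the read, and that blocks are formed by merging overlapping coordinate intervals. All function names and values are hypothetical.

```python
def split_into_pseudo_pair(read, segment_length):
    """Return the (5' segment, 3' segment) taken from the two ends of a read.

    Assumption: segments have a fixed length and must not overlap.
    """
    if segment_length <= 0 or 2 * segment_length > len(read):
        raise ValueError("segment_length must be positive and at most half the read length")
    return read[:segment_length], read[-segment_length:]


def merge_into_blocks(intervals):
    """Merge overlapping (start, end) alignments into alignment blocks."""
    blocks = []
    for start, end in sorted(intervals):
        if blocks and start <= blocks[-1][1]:
            # Overlaps the current block: extend it if needed.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            # No overlap: start a new block.
            blocks.append((start, end))
    return blocks


# Toy examples (hypothetical sequence and coordinates).
five_prime, three_prime = split_into_pseudo_pair("ACGTACGTAC", 4)
gene1_blocks = merge_into_blocks([(120, 150), (100, 130), (400, 430)])
```

Here the three toy alignments collapse into two blocks on the same gene, the situation the caption interprets as evidence for two isoforms.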

    FusionFinder isoforms file for the six common fusion candidates reported by both FusionFinder and Levin <i>et al</i>.

    Displayed are the Ensembl gene IDs and HGNC IDs of the G1:G2 pair, their chromosome numbers, the number of read pairs providing evidence for the particular isoform (isoreads), the total number of read pairs providing evidence for the G1:G2 pair in question (totalreads), the genomic coordinates of the aligned block on G1 or G2 (G1_block, G2_block respectively), the closest Ensembl exon to the aligned block (G1_exon/G2_exon), the proximity (distance in bp) of the aligned block to the end or start of the corresponding exon (G1_expos, G2_expos respectively), and the strand on which each gene of the pair is located (G1_str, G2_str respectively). Note the novel second isoform of the <i>PRIM1:NACA</i> fusion discovered in the current analysis.
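The exon-proximity columns can be read as the distance from an aligned block to the exon boundary nearest the junction: the exon end for G1 and the exon start for G2. The sketch below is one plausible interpretation, not the tool's documented calculation. It assumes "end" and "start" are meant in transcript orientation, so the boundary used flips on the reverse strand; the function name and coordinates are hypothetical.

```python
def exon_proximity(block, exon, strand, role):
    """Distance in bp between an aligned block and the relevant exon boundary.

    block, exon: (start, end) genomic coordinates with start <= end.
    strand:      "+" or "-".
    role:        "G1" (compare with the exon's transcript end) or
                 "G2" (compare with the exon's transcript start).
    """
    if strand not in ("+", "-") or role not in ("G1", "G2"):
        raise ValueError("strand must be '+' or '-' and role must be 'G1' or 'G2'")
    block_start, block_end = block
    exon_start, exon_end = exon
    # On the forward strand the transcript end of an exon is its higher
    # genomic coordinate; on the reverse strand it is the lower one.
    use_high_coordinate = (role == "G1") == (strand == "+")
    if use_high_coordinate:
        return abs(exon_end - block_end)
    return abs(exon_start - block_start)


# Toy coordinates (hypothetical).
g1_expos = exon_proximity((100, 148), (90, 150), "+", "G1")
g2_expos = exon_proximity((200, 230), (200, 300), "+", "G2")
```

A value of zero would mean the block reaches the exon boundary exactly, as expected when the fusion joins intact exons.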

    Performance comparison of FusionFinder, FusionMap and Tophat-Fusion in an analysis of the Levin dataset.

    Data are based on the analysis of the Levin dataset, comprising 14 million 76-mer reads, using either a single 2.4 GHz core or 5 cores of a 64-bit Linux machine with multiple AMD Opteron 8431 CPUs and 32 GB of memory. The parameters used for each analysis are given in the main text, and the raw results for each analysis can be found in Table 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0039987#pone-0039987-t001) and Tables S1a and S1b (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0039987#pone.0039987.s001).

    Identification of the transcript breakpoint in each <i>PRIM1</i>:<i>NACA</i> isoform.

    Alignments of the full-length 76-mer reads providing evidence for the two isoforms of <i>PRIM1</i>:<i>NACA</i> (i.e. the isoform originally identified by Levin <i>et al</i>., top, and the novel isoform identified by FusionFinder, bottom) against the last 30 bases of the implicated <i>PRIM1</i> (G1) exon and the first 30 bases of the <i>NACA</i> (G2) exon. The transcript breakpoint can be clearly seen where the <i>PRIM1</i> exon ends and the <i>NACA</i> exon begins. Also displayed is an in-frame translation of the G1 exon from wild-type <i>PRIM1</i>, running into the fused <i>NACA</i> exon. Both isoforms retain an open reading frame despite different exon usage.
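Checking whether a fusion junction retains an open reading frame amounts to translating the end of the G1 exon into the start of the G2 exon in the G1 reading frame and looking for a stop codon. The sketch below uses the standard genetic code; the function names, the `frame` parameter, and the short sequences are illustrative assumptions, not the actual <i>PRIM1</i> or <i>NACA</i> sequences.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_junction(g1_tail, g2_head, frame=0):
    """Translate the G1 exon tail joined to the G2 exon head.

    frame is the offset (0, 1 or 2) of the first complete codon in g1_tail.
    Codons containing unexpected characters are reported as "X".
    """
    sequence = (g1_tail + g2_head).upper()[frame:]
    codons = (sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def retains_open_reading_frame(g1_tail, g2_head, frame=0):
    """True if no stop codon appears across the junction in the given frame."""
    return "*" not in translate_junction(g1_tail, g2_head, frame)


# Toy sequences (hypothetical).
open_peptide = translate_junction("ATGGCC", "AAAGGG")
stopped_peptide = translate_junction("ATGGCC", "AAATGA")
```

The second toy junction introduces a stop codon, so it would not count as retaining an open reading frame.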