8 research outputs found

    InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data

    <div><p>Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from <a href="http:/bitbucket.org/kokonech/infusion" target="_blank">http:/bitbucket.org/kokonech/infusion</a>.</p></div>

    Clustering of breakpoint candidates.

    <p>The arrows of the SPLIT alignments and the dotted lines of the BRIDGE alignments indicate the direction toward the breakpoint position. (A) Initial clusters are created from intersecting SPLIT and BRIDGE alignments. (B) Cluster 4 is separated from cluster 1 based on the directionality, which is inferred from the alignment strand and order. (C) Cluster 5 is separated from cluster 2 based on the putative breakpoint position. Alignments belonging to the same breakpoint candidate have the same color. BRIDGE reads are marked with b, SPLIT reads are marked with s. A SPLIT read assumes an exact breakpoint, while a BRIDGE read assumes an approximate breakpoint within the allowed insert-size distance.</p>
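The clustering rules in this caption can be illustrated with a minimal sketch. This is not InFusion's implementation: the `Alignment` class, the `direction` encoding (+1 when the breakpoint lies to the right of the alignment, -1 when it lies to the left), and the `max_insert` parameter are all hypothetical names chosen for illustration. The sketch folds the three panels into one pass: alignments share a cluster only if they point in the same direction and their putative breakpoint ranges intersect.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    kind: str       # "SPLIT" (exact breakpoint) or "BRIDGE" (approximate breakpoint)
    start: int
    end: int
    direction: int  # assumed encoding: +1 = breakpoint to the right, -1 = to the left


def breakpoint_range(a, max_insert):
    """Return the (lo, hi) interval in which the breakpoint may lie."""
    if a.kind == "SPLIT":
        pos = a.end if a.direction > 0 else a.start
        return pos, pos
    # BRIDGE: the breakpoint is somewhere within the unsequenced insert.
    if a.direction > 0:
        return a.end, a.end + max_insert
    return a.start - max_insert, a.start


def cluster_alignments(alignments, max_insert):
    """Greedy grouping by direction and compatible breakpoint range."""
    clusters = []
    for a in sorted(alignments, key=lambda x: (x.direction, x.start)):
        lo, hi = breakpoint_range(a, max_insert)
        for c in clusters:
            if c["direction"] == a.direction and lo <= c["hi"] and hi >= c["lo"]:
                c["members"].append(a)
                # Narrow the cluster's range to the shared interval.
                c["lo"], c["hi"] = max(c["lo"], lo), min(c["hi"], hi)
                break
        else:
            clusters.append(
                {"direction": a.direction, "lo": lo, "hi": hi, "members": [a]}
            )
    return clusters
```

In this sketch a SPLIT read collapses a cluster's range to a single position, so a second SPLIT read with a different junction coordinate starts a new cluster, mirroring panel (C).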

    TMPRSS2-ERG fusion isoforms.

    <p>(A) Genomic structure of the TMPRSS2–ERG fusion transcripts discovered from deep sequencing data by InFusion. Isoform 3 is a known transcript, while isoforms 1 and 2 are novel. Transcript names are taken from the Ensembl v.68 database. (B) RT-PCR validation of isoforms in VCaP, LNCaP, RWPE-1 and PrEC cell lines; NTC = no template control. The PCR primer design was based on the output from the InFusion pipeline. In order to detect only one product, one PCR primer specific for isoform 3 was designed to cover the fusion junction site. A 50 bp DNA ladder was co-run as a size marker; bright bands indicate 250 bp and 500 bp. (C) Relative expression levels of the fusion isoforms as measured by qRT-PCR. All measurements were performed in triplicate; mean expression values were computed relative to GAPDH. Plotted values are normalized to the computed expression of isoform 3. (D) Expression levels of isoforms estimated in RPKM under the assumption of uniform coverage.</p>
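For reference, panel (D) reports RPKM (reads per kilobase of transcript per million mapped reads). The sketch below uses the standard RPKM definition; the function and parameter names are hypothetical, and the caption does not say how reads were assigned to individual isoforms, so that step is left out.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return 1e9 * read_count / (transcript_length_bp * total_mapped_reads)
```

For example, 500 reads on a 2,000 bp transcript in a library of 10 million mapped reads gives `rpkm(500, 2000, 10_000_000)`, which evaluates to 25.0.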

    Example of fusion detection from RNA-seq data.

    <p>The fusion consists of exons 1 and 2 from gene A and exons 3 and 4 from gene B. SPLIT reads cover the junction point, while BRIDGE reads span the junction point within the insert region, which is not sequenced.</p>

    Comparison of recall and precision on simulated data.

    <p>Recall and precision are plotted based on the number of supporting reads. For each given threshold we selected fusions that by simulation design have a number of supporting reads less than or equal to the threshold. The number of true positive events was computed for every tool using only the selected fusions.</p>
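The thresholded evaluation described in this caption might be sketched as follows. The function name and data layout are hypothetical. The caption specifies only how true positives are counted, so the false-positive definition used here (predictions that match no simulated fusion at all) is an assumption and may differ from the one used in the paper.

```python
def recall_precision(simulated, predicted, threshold):
    """
    simulated: dict mapping fusion id -> number of supporting reads by design
    predicted: set of fusion ids reported by one tool
    threshold: keep simulated fusions with supporting reads <= threshold
    """
    selected = {f for f, n in simulated.items() if n <= threshold}
    true_pos = len(selected & predicted)
    # Assumption: a false positive is a prediction absent from the whole simulation.
    false_pos = len(predicted - set(simulated))
    recall = true_pos / len(selected) if selected else 0.0
    called = true_pos + false_pos
    precision = true_pos / called if called else 0.0
    return recall, precision
```

Calling this once per threshold and per tool yields the points of a recall and precision curve of the kind the figure plots.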