8 research outputs found

    deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data

    Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and have trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.


    A novel statistical method for the accurate identification of RNA-edits with application to human cancers

    RNA-editing is the post-transcriptional, enzymatic modification of RNA molecules resulting in an altered nucleotide sequence. These modifications play a critical role in mammalian tissues and are essential for proper function of liver and neuronal development, among other processes. The advent of high-throughput sequencing (HTS) technologies (e.g. Illumina HiSeq) has renewed interest in RNA-editing discovery due to unprecedented opportunities for simultaneous interrogation of whole genome and transcriptome sequences. In the past several months a number of studies have been published describing methods and results of RNA-editing discovery in HTS data. These methods have been ad hoc approaches based on repurposing SNP calling tools designed for genome-based variant detection. However, the statistical properties of RNA-editing warrant specialized analytical strategies that leverage the non-uniform substitution distributions inherent in RNA-editing processes. A novel statistical framework, called Auditor, that simultaneously analyzes the genomic and transcriptomic base-counts and infers the likelihood of an RNA-edit at each position in the transcriptome is reported. This model leverages the inherent correlation present in the RNA and DNA sequence while encoding the non-uniform substitution distributions induced by RNA-editing, conferring increased sensitivity. Further, a Random-Forest based technical artifact removal tool that accurately identifies sequencing and alignment errors has been implemented, greatly increasing the specificity of the method. The combination of these approaches leads to a robust, principled method that accurately detects RNA-edits in the presence of both biological and technical noise. 
    It is systematically shown, in both a simulation study and on real matched whole genome and transcriptome data generated from 11 lymphoma samples, that Auditor significantly outperforms similar but simpler statistical frameworks, including a Samtools/bcftools-based approach that is similar to a recently published study. Finally, by profiling 11 diffuse large B-cell lymphomas and 16 triple-negative breast cancers with Auditor, it is shown that RNA-editing is an active process in human malignancies. Surprisingly, consistent patterns of nucleotide substitutions and regional enrichment of RNA-edits in 3′ UTRs suggest that RNA-editing processes are invariant between cell lineages, between tumours of similar histological subtypes, and even between cancers from distinct tissues of origin.

    The clonal and mutational evolution spectrum of primary triple-negative breast cancers

    Primary triple-negative breast cancers (TNBCs), a tumour type defined by lack of oestrogen receptor, progesterone receptor and ERBB2 gene amplification, represent approximately 16% of all breast cancers. Here we show in 104 TNBC cases that at the time of diagnosis these cancers exhibit a wide and continuous spectrum of genomic evolution, with some having only a handful of coding somatic aberrations in a few pathways, whereas others contain hundreds of coding somatic mutations. High-throughput RNA sequencing (RNA-seq) revealed that only approximately 36% of mutations are expressed. Using deep re-sequencing measurements of allelic abundance for 2,414 somatic mutations, we determine for the first time, to our knowledge, in an epithelial tumour subtype, the relative abundance of clonal frequencies among cases representative of the population. We show that TNBCs vary widely in their clonal frequencies at the time of diagnosis, with the basal subtype of TNBC showing more variation than non-basal TNBC. Although p53 (also known as TP53), PIK3CA and PTEN somatic mutations seem to be clonally dominant compared to other genes, in some tumours their clonal frequencies are incompatible with founder status. Mutations in cytoskeletal, cell shape and motility proteins occurred at lower clonal frequencies, suggesting that they occurred later during tumour progression. Taken together, our results show that understanding the biology and therapeutic responses of patients with TNBC will require the determination of individual tumour clonal genotypes. © 2012 Macmillan Publishers Limited. All rights reserved.