19 research outputs found

    Data from: Fusion transcript discovery in formalin-fixed paraffin-embedded human breast cancer tissues reveals a link to tumor progression

    The identification of gene fusions promises to play an important role in personalized cancer treatment decisions. Many rare gene fusion events have been identified in fresh frozen solid tumors from common cancers employing next-generation sequencing technology. However, the ability to detect transcripts from gene fusions in RNA isolated from formalin-fixed paraffin-embedded (FFPE) tumor tissues, which exist in very large sample repositories for which disease outcome is known, is still limited due to the low complexity of FFPE libraries and the lack of appropriate bioinformatics methods. We sought to develop a bioinformatics method, named gFuse, to detect fusion transcripts in FFPE tumor tissues. An integrated, cohort-based strategy has been used in gFuse to examine single-end 50 base pair (bp) reads generated from FFPE RNA-Sequencing (RNA-Seq) datasets employing two breast cancer cohorts of 136 and 76 patients. In total, 118 fusion events were detected transcriptome-wide at base-pair resolution across the 212 samples. We selected 77 candidate fusions based on their biological relevance to cancer and supported 61% of these using TaqMan assays. Direct sequencing of 19 of the fusion sequences identified by TaqMan confirmed them. Three unique fused gene pairs were recurrent across the 212 patients, with 6, 3, and 2 individuals harboring these fusions, respectively. We show here that a high frequency of fusion transcripts detected at the whole transcriptome level correlates with poor outcome (P<0.0005) in human breast cancer patients. This study demonstrates the ability to detect fusion transcripts as biomarkers from archival FFPE tissues, and the potential prognostic value of the fusion transcripts detected.

    SAM files containing RNA-seq reads supporting fusion transcripts in Providence and Rush cohorts

    These SAM files are organized into 2 directories, one for the Providence cohort and another for the Rush cohort. Each sample has one individual SAM file under its cohort directory. The data were generated by gFuse, an RNA-seq fusion detection bioinformatics method, from archived FFPE tissues of human breast cancer patients. As part of the gFuse pipeline, the SAM files were generated by GSNAP by mapping supporting RNA-seq reads against the FASTA sequences of five-template sets of 100 unique fusion junctions, 100 bp in length, which are available here in the file “100fusiontemplates.fa”. The sequences and sample IDs are also described in Supplementary Table 1 of the manuscript titled “Fusion transcript discovery in formalin-fixed paraffin-embedded human breast cancer tissues reveals a link to tumor progression”.
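
    As a rough illustration of how such per-sample SAM files could be summarized, the sketch below counts mapped reads per fusion template using only the standard SAM columns (FLAG in column 2, RNAME in column 3). The `.sam` file extension, the one-directory-per-cohort layout passed to `summarize_cohort`, and the template name `fusion1_template1` are assumptions made for illustration; they are not taken from the dataset or from gFuse.

```python
from collections import Counter
from pathlib import Path


def count_reads_per_template(sam_lines):
    """Count mapped reads per reference name (RNAME) in SAM-formatted lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # an alignment record has 11 mandatory fields
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # FLAG bit 0x4 marks an unmapped read
        counts[rname] += 1
    return counts


def summarize_cohort(cohort_dir):
    """Map each sample file stem to its per-template read counts.

    Assumes one '*.sam' file per sample directly under cohort_dir
    (a hypothetical layout, not confirmed by the dataset description).
    """
    summary = {}
    for sam_path in sorted(Path(cohort_dir).glob("*.sam")):
        with open(sam_path) as handle:
            summary[sam_path.stem] = count_reads_per_template(handle)
    return summary


# Toy records with a made-up template name, for illustration only.
example = [
    "@SQ\tSN:fusion1_template1\tLN:100",
    "read1\t0\tfusion1_template1\t26\t40\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
    "read2\t16\tfusion1_template1\t30\t40\t50M\t*\t0\t0\t" + "C" * 50 + "\t" + "I" * 50,
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t" + "G" * 50 + "\t" + "I" * 50,
]
print(count_reads_per_template(example))
```

    Plain text parsing is used here to keep the sketch dependency-free; a dedicated SAM/BAM library would be the more robust choice for real files.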

    Data from: Whole transcriptome RNA-Seq analysis of breast cancer recurrence risk using formalin-fixed paraffin-embedded tumor tissue

    RNA biomarkers discovered by RT-PCR-based gene expression profiling of archival formalin-fixed paraffin-embedded (FFPE) tissue form the basis for widely used clinical diagnostic tests; however, RT-PCR is practically constrained in the number of transcripts that can be interrogated. We have developed and optimized RNA-Seq library chemistry as well as bioinformatics and biostatistical methods for whole transcriptome profiling from FFPE tissue. The chemistry accommodates low RNA inputs and sample multiplexing. These methods both enable rediscovery of RNA biomarkers for disease recurrence risk that were previously identified by RT-PCR analysis of a cohort of 136 patients, and also identify a high percentage of recurrence risk markers that were previously discovered using DNA microarrays in a separate cohort of patients, evidence that this RNA-Seq technology has sufficient precision and sensitivity for biomarker discovery. More than two thousand RNAs are strongly associated with breast cancer recurrence risk in the 136-patient cohort (FDR < 10%). Many of these are intronic RNAs for which corresponding exons are not also associated with disease recurrence. A number of the RNAs associated with recurrence risk belong to novel RNA networks. It will be important to test the validity of these novel associations in whole transcriptome RNA-Seq screens of other breast cancer cohorts.

    Fusion Transcript Discovery in Formalin-Fixed Paraffin-Embedded Human Breast Cancer Tissues Reveals a Link to Tumor Progression

    The identification of gene fusions promises to play an important role in personalized cancer treatment decisions. Many rare gene fusion events have been identified in fresh frozen solid tumors from common cancers employing next-generation sequencing technology. However, the ability to detect transcripts from gene fusions in RNA isolated from formalin-fixed paraffin-embedded (FFPE) tumor tissues, which exist in very large sample repositories for which disease outcome is known, is still limited due to the low complexity of FFPE libraries and the lack of appropriate bioinformatics methods. We sought to develop a bioinformatics method, named gFuse, to detect fusion transcripts in FFPE tumor tissues. An integrated, cohort-based strategy has been used in gFuse to examine single-end 50 base pair (bp) reads generated from FFPE RNA-Sequencing (RNA-Seq) datasets employing two breast cancer cohorts of 136 and 76 patients. In total, 118 fusion events were detected transcriptome-wide at base-pair resolution across the 212 samples. We selected 77 candidate fusions based on their biological relevance to cancer and supported 61% of these using TaqMan assays. Direct sequencing of 19 of the fusion sequences identified by TaqMan confirmed them. Three unique fused gene pairs were recurrent across the 212 patients, with 6, 3, and 2 individuals harboring these fusions, respectively. We show here that a high frequency of fusion transcripts detected at the whole transcriptome level correlates with poor outcome (P<0.0005) in human breast cancer patients. This study demonstrates the ability to detect fusion transcripts as biomarkers from archival FFPE tissues, and the potential prognostic value of the fusion transcripts detected.

    The utilization of a five template set and expression profiling for fusion transcript detection.

    A. The concept of the five-template set is illustrated with six RNA transcripts for a fusion transcript in an FFPE RNA sample. Each template is numbered under lines around the corresponding RNA sequence. The preserved and discarded sides of donor or acceptor are indicated by arrowed lines indicating transcription directions above each pre-mRNA. The red blocks are DNA breakpoints. The interrupt ratio (IR) is calculated by using the * marked preserved and discarded sides accordingly for donor or acceptor fusion genes. B. All supporting RNA-Seq split reads are aligned to five templates of fusion RABEP1->DNAH9 in the Providence sample CSG. Each template is numbered according to Figure 1A (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0094202#pone-0094202-g001). The vertical line indicates the junction site. C. Two samples are shown as outliers (solid red dots) when the gene expression levels of donor RABEP1 are plotted against acceptor DNAH9 in the Providence cohort. The expression levels are log2 base counts normalized by library size factors. Samples that tested negative by TaqMan are labeled as solid black dots. D. Exon and intron expression levels of acceptor DNAH9 in the Providence cohort show the interrupted expression pattern in samples CSG and ECI at the predicted fusion junction site (orange line). The base counts of each exon and intron are normalized by library size, then center-scaled across the Providence cohort. The vertical arrow indicates RNA samples from low to high IR values of DNAH9. The exons (black ticks) and introns of DNAH9 are ordered according to the transcription direction (horizontal arrow), with the intron harboring the DNA breakpoint omitted in Figures 2D and 2E. E. The base counts of exons and introns of acceptor DNAH9 in two samples show interrupted expression patterns at the fusion junction site. The base counts are normalized by library size, then divided by the length of each exon or intron.
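
    The normalization steps named in panels D and E can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: "center-scaled" is read here as subtracting each feature's cohort mean (whether the values were also divided by a standard deviation is not stated), the function names are hypothetical, and the toy counts are made up. The interrupt ratio itself is not computed because its formula is not given in the caption.

```python
from statistics import mean


def normalize_by_library_size(counts, size_factor):
    """Divide each feature's base count by the sample's library size factor."""
    return [c / size_factor for c in counts]


def normalize_by_length(values, lengths):
    """Divide each exon/intron value by that feature's length (panel E)."""
    return [v / n for v, n in zip(values, lengths)]


def center_across_cohort(matrix):
    """Subtract each feature's cohort mean (assumed reading of 'center-scaled').

    matrix is a list of samples, each a list of per-feature values.
    """
    n_features = len(matrix[0])
    means = [mean(row[j] for row in matrix) for j in range(n_features)]
    return [[row[j] - means[j] for j in range(n_features)] for row in matrix]


# Toy data: 2 samples x 3 features (exons/introns), made-up numbers.
raw_counts = [[100, 200, 10], [300, 100, 50]]
size_factors = [1.0, 2.0]      # hypothetical library size factors
feature_lengths = [50, 100, 5]  # hypothetical exon/intron lengths in bp

normalized = [
    normalize_by_library_size(counts, sf)
    for counts, sf in zip(raw_counts, size_factors)
]
centered = center_across_cohort(normalized)
per_base = normalize_by_length(normalized[0], feature_lengths)
print(centered)
print(per_base)
```

    Centering across the cohort makes a sample-specific drop in expression on one side of a junction stand out against the other samples, which is the pattern panel D describes for CSG and ECI.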

    Fusion junctions are confirmed by PGM sequencing of PCR amplicons.

    In these 7 PGM libraries, each containing a single PCR reaction with a unique PGM barcode, the fusion amplicons identify the most prevalent clonal population in the library. The detailed experimental results, including amplicon sequences, are in Table S3 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0094202#pone.0094202.s005).

    Breast cancers with high fusion frequency have poor prognosis.

    A. The distributions of block age, clinical recurrence and ER status are shown according to fusion number categories in the Providence and Rush cohorts. The archived block age is plotted as mean and standard deviation for each category. ER status was assessed by immunohistochemistry. The patient number for each category is labeled accordingly. B. Kaplan-Meier plots of each fusion number category show that Providence patients with multiple fusions had poor prognosis, and a similar trend exists for Rush patients. The log-rank p-values are indicated in the Kaplan-Meier plots. C. Kaplan-Meier plot with the 36 TaqMan-supported fusion transcripts in the Providence cohort. There are another 11 TaqMan-supported fusion transcripts from the Rush cohort, but they are too few to generate a meaningful Kaplan-Meier plot.
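
    For readers unfamiliar with the Kaplan-Meier curves referred to in panels B and C, the sketch below shows the standard product-limit estimator on made-up follow-up data. It is a generic textbook illustration, not the authors' analysis code; the function name and the toy times and events are hypothetical, and the log-rank test used for the reported p-values is not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times:  follow-up time per patient
    events: True if recurrence was observed, False if the patient was censored
    Returns a list of (time, survival probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_at_time = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_at_time += 1
            i += 1
        if n_events:
            survival *= (at_risk - n_events) / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= n_at_time
    return curve


# Toy follow-up data (made up): five patients, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [True, True, False, True, False]
toy_curve = kaplan_meier(toy_times, toy_events)
for time_point, probability in toy_curve:
    print(time_point, round(probability, 3))
```

    Computing one such curve per fusion number category and comparing them is what a plot like panel B displays; with very few events in a group, the curve has only a handful of steps, which is consistent with the caption's remark about the Rush cohort.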