Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons’ Data
We present a systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of 5 molecular data platforms in The Cancer Genome Atlas (TCGA)—mRNA and miRNA expression, single nucleotide variants, DNA methylation and copy number alterations—comprehensive sample, gene, and probe-level studies were performed, towards quantifying the degree of similarity between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve. Gao et al. performed a systematic analysis of the effects of synchronizing the large-scale, widely used, multi-omic dataset of The Cancer Genome Atlas to the current human reference genome. For each of the five molecular data platforms assessed, they demonstrated a very high concordance between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons
An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics
For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types
Proteogenomic landscape of breast cancer tumorigenesis and targeted therapy
The integration of mass spectrometry-based proteomics with next-generation DNA and RNA sequencing profiles tumors more comprehensively. Here this "proteogenomics" approach was applied to 122 treatment-naive primary breast cancers accrued to preserve post-translational modifications, including protein phosphorylation and acetylation. Proteogenomics challenged standard breast cancer diagnoses, provided detailed analysis of the ERBB2 amplicon, defined tumor subsets that could benefit from immune checkpoint therapy, and allowed more accurate assessment of Rb status for prediction of CDK4/6 inhibitor responsiveness. Phosphoproteomics profiles uncovered novel associations between tumor suppressor loss and targetable kinases. Acetylproteome analysis highlighted acetylation on key nuclear proteins involved in the DNA damage response and revealed cross-talk between cytoplasmic and mitochondrial acetylation and metabolism. Our results underscore the potential of proteogenomics for clinical investigation of breast cancer through more accurate annotation of targetable pathways and biological features of this remarkably heterogeneous malignancy
Integrated Molecular Characterization of Testicular Germ Cell Tumors
We studied 137 primary testicular germ cell tumors (TGCTs) using high-dimensional assays of genomic, epigenomic, transcriptomic, and proteomic features. These tumors exhibited high aneuploidy and a paucity of somatic mutations. Somatic mutation of only three genes achieved significance—KIT, KRAS, and NRAS—exclusively in samples with seminoma components. Integrated analyses identified distinct molecular patterns that characterized the major recognized histologic subtypes of TGCT: seminoma, embryonal carcinoma, yolk sac tumor, and teratoma. Striking differences in global DNA methylation and microRNA expression between histology subtypes highlight a likely role of epigenomic processes in determining histologic fates in TGCTs. We also identified a subset of pure seminomas defined by KIT mutations, increased immune infiltration, globally demethylated DNA, and decreased KRAS copy number. We report potential biomarkers for risk stratification, such as miRNA specifically expressed in teratoma, and others with molecular diagnostic potential, such as CpH (CpA/CpC/CpT) methylation identifying embryonal carcinomas. Shen et al. identify molecular characteristics that classify testicular germ cell tumor types, including a separate subset of seminomas defined by KIT mutations. This provides a set of candidate biomarkers for risk stratification and potential therapeutic targeting
Driver Fusions and Their Implications in the Development and Treatment of Human Cancers
Gene fusions represent an important class of somatic alterations in cancer. We systematically investigated fusions in 9,624 tumors across 33 cancer types using multiple fusion calling tools. We identified a total of 25,664 fusions, with a 63% validation rate. Integration of gene expression, copy number, and fusion annotation data revealed that fusions involving oncogenes tend to exhibit increased expression, whereas fusions involving tumor suppressors have the opposite effect. For fusions involving kinases, we found 1,275 with an intact kinase domain, the proportion of which varied significantly across cancer types. Our study suggests that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of them. Finally, we identified druggable fusions involving genes such as TMPRSS2, RET, FGFR3, ALK, and ESR1 in 6.0% of cases, and we predicted immunogenic peptides, suggesting that fusions may provide leads for targeted drug and immune therapy
Pathogenic Germline Variants in 10,389 Adult Cancers
We conducted the largest investigation of predisposition variants in cancer to date, discovering 853 pathogenic or likely pathogenic variants in 8% of 10,389 cases from 33 cancer types. Twenty-one genes showed single or cross-cancer associations, including novel associations of SDHA in melanoma and PALB2 in stomach adenocarcinoma. The 659 predisposition variants and 18 additional large deletions in tumor suppressors, including ATM, BRCA1, and NF1, showed low gene expression and frequent (43%) loss of heterozygosity or biallelic two-hit events. We also discovered 33 such variants in oncogenes, including missense variants in MET, RET, and PTPN11 associated with high gene expression. We nominated 47 additional predisposition variants from prioritized VUSs supported by multiple lines of evidence involving case-control frequency, loss of heterozygosity, expression effect, and co-localization with mutations and modified residues. Our integrative approach links rare predisposition variants to functional consequences, informing future guidelines of variant classification and germline genetic testing in cancer. A pan-cancer analysis identifies hundreds of predisposing germline variants
Somatic Mutational Landscape of Splicing Factor Genes and Their Functional Consequences across 33 Cancer Types
Hotspot mutations in splicing factor genes have been recently reported at high frequency in hematological malignancies, suggesting the importance of RNA splicing in cancer. We analyzed whole-exome sequencing data across 33 tumor types in The Cancer Genome Atlas (TCGA), and we identified 119 splicing factor genes with significant non-silent mutation patterns, including mutation over-representation, recurrent loss of function (tumor suppressor-like), or hotspot mutation profile (oncogene-like). Furthermore, RNA sequencing analysis revealed altered splicing events associated with selected splicing factor mutations. In addition, we were able to identify common gene pathway profiles associated with the presence of these mutations. Our analysis suggests that somatic alteration of genes involved in the RNA-splicing process is common in cancer and may represent an underappreciated hallmark of tumorigenesis
Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas
Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified. In contrast, the MYC antagonists MGA and MNT were the most frequently mutated or deleted members, pointing to a role as tumor suppressors. MYC alterations were mutually exclusive with PIK3CA, PTEN, APC, or BRAF alterations, suggesting that MYC is a distinct oncogenic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such as immune response and growth factor signaling; chromatin, translation, and DNA replication/repair were conserved pan-cancer. This analysis reveals insights into MYC biology and is a reference for biomarkers and therapeutics for cancers with alterations of MYC or the PMN. We present a computational study determining the frequency and extent of alterations of the MYC network across the 33 human cancers of TCGA. These data, together with pathways positively correlated with MYC and mutually exclusive cancer genes, will be a resource for understanding MYC-driven cancers and designing therapeutics
Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context
Long noncoding RNAs (lncRNAs) are commonly dysregulated in tumors, but only a handful are known to play pathophysiological roles in cancer. We inferred lncRNAs that dysregulate cancer pathways, oncogenes, and tumor suppressors (cancer genes) by modeling their effects on the activity of transcription factors, RNA-binding proteins, and microRNAs in 5,185 TCGA tumors and 1,019 ENCODE assays. Our predictions included hundreds of candidate onco- and tumor-suppressor lncRNAs (cancer lncRNAs) whose somatic alterations account for the dysregulation of dozens of cancer genes and pathways in each of 14 tumor contexts. To demonstrate proof of concept, we showed that perturbations targeting OIP5-AS1 (an inferred tumor suppressor) and TUG1 and WT1-AS (inferred onco-lncRNAs) dysregulated cancer genes and altered proliferation of breast and gynecologic cancer cells. Our analysis indicates that, although most lncRNAs are dysregulated in a tumor-specific manner, some, including OIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergistically dysregulate cancer pathways in multiple tumor contexts. Chiu et al. present a pan-cancer analysis of lncRNA regulatory interactions. They suggest that hundreds of dysregulated lncRNAs target and alter the expression of cancer genes and pathways in each tumor context. This implies that hundreds of lncRNAs can alter tumor phenotypes in each tumor context
lncRNA Epigenetic Landscape Analysis Identifies EPIC1 as an Oncogenic lncRNA that Interacts with MYC and Promotes Cell-Cycle Progression in Cancer
We characterized the epigenetic landscape of genes encoding long noncoding RNAs (lncRNAs) across 6,475 tumors and 455 cancer cell lines. In stark contrast to the CpG island hypermethylation phenotype in cancer, we observed a recurrent hypomethylation of 1,006 lncRNA genes in cancer, including EPIC1 (epigenetically-induced lncRNA1). Overexpression of EPIC1 is associated with poor prognosis in luminal B breast cancer patients and enhances tumor growth in vitro and in vivo. Mechanistically, EPIC1 promotes cell-cycle progression by interacting with MYC through EPIC1's 129–283 nt region. EPIC1 knockdown reduces the occupancy of MYC to its target genes (e.g., CDKN1A, CCNA2, CDC20, and CDC45). MYC depletion abolishes EPIC1's regulation of MYC targets and luminal breast cancer tumorigenesis in vitro and in vivo. Wang et al. characterize the epigenetic landscape of lncRNA genes across a large number of human tumors and cancer cell lines and observe recurrent hypomethylation of lncRNA genes, including EPIC1. EPIC1 RNA promotes cell-cycle progression by interacting with MYC and enhancing its binding to target genes