122 research outputs found

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    Get PDF
    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types

    Driver Fusions and Their Implications in the Development and Treatment of Human Cancers.

    Get PDF
    Gene fusions represent an important class of somatic alterations in cancer. We systematically investigated fusions in 9,624 tumors across 33 cancer types using multiple fusion calling tools. We identified a total of 25,664 fusions, with a 63% validation rate. Integration of gene expression, copy number, and fusion annotation data revealed that fusions involving oncogenes tend to exhibit increased expression, whereas fusions involving tumor suppressors have the opposite effect. For fusions involving kinases, we found 1,275 with an intact kinase domain, the proportion of which varied significantly across cancer types. Our study suggests that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of them. Finally, we identified druggable fusions involving genes such as TMPRSS2, RET, FGFR3, ALK, and ESR1 in 6.0% of cases, and we predicted immunogenic peptides, suggesting that fusions may provide leads for targeted drug and immune therapy

    Brain-based classification of youth with anxiety disorders: transdiagnostic examinations within the ENIGMA-Anxiety database using machine learning

    Get PDF
    Neuroanatomical findings on youth anxiety disorders are notoriously difficult to replicate, small in effect size and have limited clinical relevance. These concerns have prompted a paradigm shift toward highly powered (that is, big data) individual-level inferences, which are data driven, transdiagnostic and neurobiologically informed. Here we built and validated supervised neuroanatomical machine learning models for individual-level inferences, using a case–control design and the largest known neuroimaging database on youth anxiety disorders: the ENIGMA-Anxiety Consortium (N = 3,343; age = 10–25 years; global sites = 32). Modest, yet robust, brain-based classifications were achieved for specific anxiety disorders (panic disorder), but also transdiagnostically for all anxiety disorders when patients were subgrouped according to their sex, medication status and symptom severity (area under the receiver operating characteristic curve, 0.59–0.63). Classifications were driven by neuroanatomical features (cortical thickness, cortical surface area and subcortical volumes) in fronto-striato-limbic and temporoparietal regions. This benchmark study within a large, heterogeneous and multisite sample of youth with anxiety disorders reveals that only modest classification performances can be realistically achieved with machine learning using neuroanatomical data.NWORubicon 019.201SG.022Advanced Behavioural Research MethodsHealth and Well-bein

    Somatic Mutational Landscape of Splicing Factor Genes and Their Functional Consequences across 33 Cancer Types

    Get PDF
    Hotspot mutations in splicing factor genes have been recently reported at high frequency in hematological malignancies, suggesting the importance of RNA splicing in cancer. We analyzed whole-exome sequencing data across 33 tumor types in The Cancer Genome Atlas (TCGA), and we identified 119 splicing factor genes with significant non-silent mutation patterns, including mutation over-representation, recurrent loss of function (tumor suppressor-like), or hotspot mutation profile (oncogene-like). Furthermore, RNA sequencing analysis revealed altered splicing events associated with selected splicing factor mutations. In addition, we were able to identify common gene pathway profiles associated with the presence of these mutations. Our analysis suggests that somatic alteration of genes involved in the RNA-splicing process is common in cancer and may represent an underappreciated hallmark of tumorigenesis

    Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas

    Get PDF
    Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified. In contrast, the MYC antagonists MGA and MNT were the most frequently mutated or deleted members, proposing a role as tumor suppressors. MYC alterations were mutually exclusive with PIK3CA, PTEN, APC, or BRAF alterations, suggesting that MYC is a distinct oncogenic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such as immune response and growth factor signaling; chromatin, translation, and DNA replication/repair were conserved pan-cancer. This analysis reveals insights into MYC biology and is a reference for biomarkers and therapeutics for cancers with alterations of MYC or the PMN. We present a computational study determining the frequency and extent of alterations of the MYC network across the 33 human cancers of TCGA. These data, together with MYC, positively correlated pathways as well as mutually exclusive cancer genes, will be a resource for understanding MYC-driven cancers and designing of therapeutics

    Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines

    Get PDF
    The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    Get PDF
    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types

    Genomic and Functional Approaches to Understanding Cancer Aneuploidy

    Get PDF
    Aneuploidy, whole chromosome or chromosome arm imbalance, is a near-universal characteristic of human cancers. In 10,522 cancer genomes from The Cancer Genome Atlas, aneuploidy was correlated with TP53 mutation, somatic mutation rate, and expression of proliferation genes. Aneuploidy was anti-correlated with expression of immune signaling genes, due to decreased leukocyte infiltrates in high-aneuploidy samples. Chromosome arm-level alterations show cancer-specific patterns, including loss of chromosome arm 3p in squamous cancers. We applied genome engineering to delete 3p in lung cells, causing decreased proliferation rescued in part by chromosome 3 duplication. This study defines genomic and phenotypic correlates of cancer aneuploidy and provides an experimental approach to study chromosome arm aneuploidy. Analyzing >10,000 human cancers, Taylor et al. show that aneuploidy is correlated with somatic mutation rate, expression of proliferation genes, and decreased leukocyte infiltration. Loss of chromosome arm 3p is common in squamous cancers, but deletion of chromosome 3p reduces cell proliferation in vitro

    A Comprehensive Pan-Cancer Molecular Study of Gynecologic and Breast Cancers

    Get PDF
    We analyzed molecular data on 2,579 tumors from The Cancer Genome Atlas (TCGA) of four gynecological types plus breast. Our aims were to identify shared and unique molecular features, clinically significant subtypes, and potential therapeutic targets. We found 61 somatic copy-number alterations (SCNAs) and 46 significantly mutated genes (SMGs). Eleven SCNAs and 11 SMGs had not been identified in previous TCGA studies of the individual tumor types. We found functionally significant estrogen receptor-regulated long non-coding RNAs (lncRNAs) and gene/lncRNA interaction networks. Pathway analysis identified subtypes with high leukocyte infiltration, raising potential implications for immunotherapy. Using 16 key molecular features, we identified five prognostic subtypes and developed a decision tree that classified patients into the subtypes based on just six features that are assessable in clinical laboratories. By performing molecular analyses of 2,579 TCGA gynecological (OV, UCEC, CESC, and UCS) and breast tumors, Berger et al. identify five prognostic subtypes using 16 key molecular features and propose a decision tree based on six clinically assessable features that classifies patients into the subtypes
    • …
    corecore