53 research outputs found

    Combined burden and functional impact tests for cancer driver discovery using DriverPower

    The discovery of driver mutations is one of the key motivations for cancer genome sequencing. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumour types, we describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1373 genomic features derived from public sources, DriverPower's background mutation model explains up to 93% of the regional variance in the mutation rate across multiple tumour types. By incorporating functional impact scores, we are able to further increase the accuracy of driver discovery. Testing across a collection of 2583 cancer genomes from the PCAWG project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Comparing to six published methods used by the PCAWG Drivers and Functional Interpretation Working Group, DriverPower has the highest F1 score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.
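    The idea of a mutational burden test mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple Poisson model for the number of mutations in one genomic element, with the expected count supplied by some background model. The function name `poisson_upper_tail` and the toy numbers are hypothetical, and this is not DriverPower's actual test or API.

```python
from math import exp


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    A small value suggests the element carries more mutations than the
    background expectation. Computed as 1 minus the lower tail, so it
    loses precision for extremely small p-values.
    """
    if expected < 0:
        raise ValueError("expected count must be non-negative")
    if observed <= 0:
        return 1.0
    term = exp(-expected)  # P(X = 0)
    lower_tail = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        lower_tail += term
    return max(0.0, 1.0 - lower_tail)


if __name__ == "__main__":
    # Toy numbers (hypothetical): 12 observed mutations in an element
    # where the background model predicts 3.
    print(poisson_upper_tail(12, 3.0))
```

    A real analysis would also need to correct for testing many elements and, as the abstract notes, would combine the burden evidence with functional impact scores.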

    Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis.

    Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis.

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    Funder: NCI U24CA211006
    Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.
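    The kind of side-by-side comparison described in this abstract, with shared and private mutations between two call sets, can be sketched as a set operation. This is a minimal illustration only: the function name `overlap_summary`, the `(chrom, pos, ref, alt)` key, the toy variants, and the choice of the union as the denominator are all assumptions, not the study's actual pipeline or definition of overlap.

```python
def overlap_summary(wes_calls, wgs_calls):
    """Summarise shared and private calls between two mutation call sets.

    Each call is a hashable key such as (chrom, pos, ref, alt).
    The overlap fraction here is shared / union (an assumption).
    """
    wes, wgs = set(wes_calls), set(wgs_calls)
    shared = wes & wgs
    union = wes | wgs
    return {
        "shared": len(shared),
        "private_wes": len(wes - wgs),
        "private_wgs": len(wgs - wes),
        "overlap_fraction": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Toy variants (hypothetical), restricted to regions covered by both assays.
    wes = [("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"), ("chr2", 40, "C", "T")]
    wgs = [("chr1", 250, "G", "C"), ("chr2", 40, "C", "T"), ("chr3", 7, "T", "G")]
    print(overlap_summary(wes, wgs))
```

    Restricting both inputs to regions covered by each assay before comparing, as the abstract does for exonic regions, keeps coverage differences from being counted as private calls.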

    Pan-Cancer Analysis of Non-Coding Driver Mutations

    Cancers are caused by genomic alterations known as drivers. As drivers have broad applications in precision oncology, their discovery has become one of the central motivations for cancer genomics. At present, the majority of drivers have been found in the ~2% protein-coding regions. Despite an intensive search for non-coding cancer drivers, however, only a few have been discovered to date. Here I describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify drivers within cancer whole genomes. Using 1,373 genomic features, DriverPower's background model explains up to 93% of the regional variance in mutation rates across multiple tumour types. By incorporating functional impact scores, I further increase the accuracy of driver discovery. Comparing to six published methods, DriverPower has the highest F1-score for both coding and non-coding driver discovery. Applied to 2,583 cancer genomes from public sources, DriverPower identifies 217 coding and 95 non-coding driver candidates in well-defined genomic regions, including novel candidates like the SGK1 splice site, GPR126 enhancer and ALB promoter. To test whether the surprisingly low number of non-coding drivers is related to missing drivers in poorly-defined genomic regions, I investigate non-coding spliceosomal RNAs since protein-coding splicing factors are frequently mutated in cancer. Indeed, I found a highly recurrent A>C somatic mutation at the third base of U1 spliceosomal RNA across several tumour types. This mutation changes the preferential A-U base-pairing between U1 and 5′ splice site to C-G base-pairing, thereby creating novel splice junctions and altering the splice pattern of multiple genes, including known cancer drivers. Clinically, the A>C mutation is associated with alcohol abuse in hepatocellular carcinoma and the aggressive subtype of chronic lymphocytic leukaemia (CLL). The mutation also confers an adverse prognosis to CLL patients independently. This finding demonstrates the first non-coding driver in spliceosomal RNAs, reveals a novel mechanism of aberrant splicing in cancer and may represent a new target for treatment. Together, my research indicates that non-coding mutations play crucial roles in cancer, and future studies should focus on completing the cancer driver catalog and using it for precision oncology.
    Ph.D.

    train_feature.hdf5.part1

    Part 1/3 of the training genomic features

    test_feature.hdf5

    Genomic features for test elements (promoter, enhancer, CDS, UTRs