8 research outputs found

    Interpretable classification of tumours through multiple instance learning and somatic mutations

    Next-generation sequencing is being brought into the clinic. Screening of disease-associated genes will aid the diagnosis of disorders with a genetic component. The diagnosis of cancer is of particular interest due to its variety and prevalence. The obtained mutations provide clues about underlying biological properties that could be used for classification. Classification of a tumour as a particular type of cancer is an important step towards treatment, but currently no method exists that can directly classify cancers using mutations derived from next-generation sequencing. We have developed a classification method in which tumours are modelled as bags of annotated somatic mutations. Our method uses a machine learning approach to identify and select the relevant mutations and subsequently train a classifier for each type of cancer. The selected mutations result in an interpretable model that sheds light on which biological properties are important for separating one cancer type from the others. We compare the proposed method to two other approaches: first, a gene-based approach in which the mutations are reduced to a mutation count per gene; second, a distance approach that uses all the available mutations but returns a model that is hard to interpret. We show that the proposed method performs as well as the first approach. Our method achieves performance close to that of the second approach, while yielding a model that allows for biological interpretation.
    M.Sc. Computer Science: Bioinformatics; Pattern Recognition & Bioinformatics Group; Electrical Engineering, Mathematics and Computer Scienc
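The gene-based baseline mentioned in this abstract (reducing each tumour's mutations to a mutation count per gene) can be illustrated with a minimal sketch. The abstract does not give the thesis implementation, so everything here is an assumption: the function name `gene_count_features`, the dictionary layout of a mutation, and the example gene names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gene_count_features(bags, genes=None):
    """Reduce each tumour's bag of mutations to a per-gene count vector.

    `bags` is a list of tumours; each tumour is a list of mutation dicts
    with at least a "gene" key (a hypothetical layout, not the thesis format).
    """
    # Count how many mutations fall in each gene, per tumour.
    counts = [Counter(m["gene"] for m in bag) for bag in bags]
    # Use a fixed, sorted gene order so all tumours share the same columns.
    if genes is None:
        genes = sorted({g for c in counts for g in c})
    matrix = [[c.get(g, 0) for g in genes] for c in counts]
    return genes, matrix


# Toy input: two tumours, each a bag of annotated mutations (made-up data).
bags = [
    [{"gene": "TP53", "type": "missense"}, {"gene": "KRAS", "type": "missense"}],
    [{"gene": "TP53", "type": "nonsense"}, {"gene": "TP53", "type": "missense"}],
]
genes, X = gene_count_features(bags)
```

The resulting rows of `X` could be passed to any standard classifier. Note that the per-mutation annotations (here the `type` field) are discarded, which is the information a bag-level method would retain.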

    Principles of Reconstructing the Subclonal Architecture of Cancers

    Most cancers evolve from a single founder cell through a series of clonal expansions that are driven by somatic mutations. These clonal expansions can lead to several coexisting subclones sharing subsets of mutations. Analysis of massively parallel sequencing data can infer a tumor's subclonal composition through the identification of populations of cells with shared mutations. We describe the principles that underlie subclonal reconstruction through single nucleotide variants (SNVs) or copy number alterations (CNAs) from bulk or single-cell sequencing. These principles include estimating the fraction of tumor cells for SNVs and CNAs, performing clustering of SNVs from single- and multisample cases, and single-cell sequencing. The application of subclonal reconstruction methods is providing key insights into tumor evolution, identifying subclonal driver mutations, patterns of parallel evolution and differences in mutational signatures between cellular populations, and characterizing the mechanisms of therapy resistance, spread, and metastasis.
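One of the principles listed in this abstract, estimating the fraction of tumor cells carrying an SNV, is commonly expressed by correcting the variant allele fraction (VAF) for tumor purity and local copy number. The sketch below uses that widely used relation; it is not taken from the paper itself, and the function name, parameter names, and defaults are assumptions made for illustration.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn=2, normal_cn=2, multiplicity=1):
    """Estimate the fraction of tumor cells carrying an SNV.

    Assumes the expected VAF is
        purity * multiplicity * CCF / (purity * tumour_cn + (1 - purity) * normal_cn)
    and solves for CCF. All parameter names are illustrative.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return vaf * avg_copies / (purity * multiplicity)


# A heterozygous SNV in a diploid region of a 50%-pure sample:
clonal = cancer_cell_fraction(vaf=0.25, purity=0.5)
# The same region and purity, but a lower VAF:
subclonal = cancer_cell_fraction(vaf=0.10, purity=0.5)
```

Under these assumptions a value near 1 suggests a clonal mutation and a value well below 1 suggests a subclonal one. Sampling noise in the VAF means real estimates can exceed 1, which is one reason clustering across many SNVs is used rather than per-mutation thresholds.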

    Pervasive chromosomal instability and karyotype order in tumour evolution

    Chromosomal instability in cancer consists of dynamic changes to the number and structure of chromosomes. The resulting diversity in somatic copy number alterations (SCNAs) may provide the variation necessary for tumour evolution. Here we use multi-sample phasing and SCNA analysis of 1,421 samples from 394 tumours across 22 tumour types to show that continuous chromosomal instability results in pervasive SCNA heterogeneity. Parallel evolutionary events, which cause disruption in the same genes (such as BCL9, MCL1, ARNT (also known as HIF1B), TERT and MYC) within separate subclones, were present in 37% of tumours. Most recurrent losses probably occurred before whole-genome doubling, which was found as a clonal event in 49% of tumours. However, loss of heterozygosity at the human leukocyte antigen (HLA) locus and loss of chromosome 8p to a single haploid copy recurred at substantial subclonal frequencies, even in tumours with whole-genome doubling, indicating ongoing karyotype remodelling. Focal amplifications that affected chromosomes 1q21 (which encompasses BCL9, MCL1 and ARNT), 5p15.33 (TERT), 11q13.3 (CCND1), 19q12 (CCNE1) and 8q24.1 (MYC) were frequently subclonal yet appeared to be clonal within single samples. Analysis of an independent series of 1,024 metastatic samples revealed that 13 focal SCNAs were enriched in metastatic samples, including gains in chromosome 8q24.1 (encompassing MYC) in clear cell renal cell carcinoma and chromosome 11q13.3 (encompassing CCND1) in HER2(+) breast cancer. Chromosomal instability may enable the continuous selection of SCNAs, which are established as ordered events that often occur in parallel, throughout tumour evolution.

    Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition

    About half of all cancers have somatic integrations of retrotransposons. Here, to characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 38 histological cancer subtypes within the framework of the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We identified 19,166 somatically acquired retrotransposition events, which affected 35% of samples and spanned a range of event types. Long interspersed nuclear element (LINE-1; L1 hereafter) insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, which sometimes leads to the removal of tumor-suppressor genes, and can induce complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications for the development of human tumors.
    J.M.C.T. is supported by European Research Council (ERC) Starting Grant 716290 ‘SCUBA CANCERS’, Ramon y Cajal grant RYC-2014-14999 and Spanish Ministry of Economy, Industry and Competitiveness (MINECO) grant SAF2015-66368-P. B.R.-M., E.G.A., M.S.G. and S.Z. are supported by PhD fellowships from Xunta de Galicia (Spain) ED481A-2016/151, ED481A-2017/299, ED481A-2017/306 and ED481A-2018/199, respectively. F.S. was supported by ERC Starting Grant 757700 ‘HYPER-INSIGHT’, MINECO grant BFU2017-89833-P ‘RegioMut’, and further acknowledges institutional funding from the MINECO Severo Ochoa award and from the CERCA Programme of the Catalan Government. Y.S.J. was supported by Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number HI16C2387). A.L.B. is supported by MINECO PhD fellowship BES-2016-078166. M.T. was supported by MINECO grant SAF2015-73916-JIN. R.B. received funding through the National Institutes of Health (U24CA210978 and R01CA188228). M.G.B. received funding through MINECO, AEI, Xunta de Galicia and FEDER (BFU2013-41554-P, BFU2016-78121-P, ED431F 2016/019). N.B. is supported by a My First AIRC grant from the Associazione Italiana Ricerca sul Cancro (number 17658). J.D. is a postdoctoral fellow of the Research Foundation Flanders (FWO) and the European Union’s Horizon 2020 research and innovation program (Marie Skłodowska-Curie grant agreement number 703594-DECODE). K.C. and Z.C. are supported by NIH R01 CA172652 and U41 HG007497. Z.C. is supported by an American Heart Association Institutional Data Fellowship Award (17IF33890015). P.A.W.E. is supported by Cancer Research UK. E.A.L. is supported by K01AG051791. I.M. is supported by Cancer Research UK (C57387/A21777). S.M.W. received funding through a SNSF Early Postdoc Mobility fellowship (P2ELP3_155365) and an EMBO Long-Term Fellowship (ALTF 755-2014). J.W. received funding from the Danish Medical Research Council (DFF-4183-00233). J.O.K. is supported by an ERC Starting Grant. This work is supported by The Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001202), the UK Medical Research Council (FC001202) and the Wellcome Trust (FC001202). H.H.K. is supported by grants from the National Institute of General Medical Sciences (P50GM107632 and 1R01GM099875). K.H.B. is supported by P50GM107632, R01CA163705 and R01GM124531. This work was supported by the TransTumVar project PN013600. This work was supported by the Wellcome Trust grant 0980

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation; analyses timings and patterns of tumour evolution; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity; and evaluates a range of more-specialized features of cancer genomes.