10 research outputs found

    A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns.

    In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types produced by the PCAWG Consortium. Our classifier achieves an accuracy of 91% on held-out tumour samples and 88% and 83%, respectively, on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.

    Knowledge representation in metabolic pathway databases

    The accurate representation of all aspects of a metabolic network in a structured format, such that it can be used for a wide variety of computational analyses, is a challenge faced by a growing number of researchers. Analysis of five major metabolic pathway databases reveals that each database has made widely different choices to address this challenge, including how to deal with knowledge that is uncertain or missing. In concise overviews, we show how concepts such as compartments, enzymatic complexes and the direction of reactions are represented in each database. Importantly, we also describe the concepts that a database does not represent. Which aspects of the metabolic network need to be available in a structured format, and in what detail, differs per application. For example, for in silico phenotype prediction, a detailed representation of gene-protein-reaction relations and the compartmentalization of the network is essential. Our analysis also shows that current databases are still limited in capturing all details of the biology of the metabolic network, as further illustrated by a detailed analysis of three metabolic processes. Finally, we conclude that the conceptual differences between the databases, which make knowledge exchange and integration a challenge, have so far not been resolved by the exchange formats in which knowledge representation is standardized.

    Improving the description of metabolic networks: the TCA cycle as example

    To collect the ever-increasing yet scattered knowledge on metabolism, multiple pathway databases such as the Kyoto Encyclopedia of Genes and Genomes have been created. A complete and accurate description of the metabolic network for humans and other organisms is essential to foster new biological discoveries. Previous research has shown, however, that the level of agreement among pathway databases is surprisingly low. We investigated whether the lack of consensus among databases can be explained by an inaccurate representation of the knowledge described in the scientific literature. As an example, we focused on the well-known tricarboxylic acid (TCA) cycle and evaluated the description of this pathway as found in a comprehensive selection of 10 human metabolic pathway databases. Remarkably, none of the descriptions given by these databases is entirely correct. Moreover, consensus exists on only 3 reactions. Mistakes in pathway databases might lead to the propagation of incorrect knowledge, misinterpretation of high-throughput molecular data, and poorly designed follow-up experiments. We provide an improved description of the TCA cycle via the community-curated database WikiPathways. We review various initiatives that aim to improve the description of the human metabolic network and discuss the importance of the active involvement of biological experts in these

    Metrics for energy efficiency assessment in data centers and server rooms

    The energy consumption of centralized IT equipment and data centers has been a fast-growing topic over the past few years. It is widely recognized that a number of effective measures exist that in many cases would allow energy savings of 20-60% or even higher. Following the motto 'You can't manage what you don't measure', energy efficiency metrics are typically used for benchmarking the energy consumption of single products or systems, hence covering the equipment or facility level. The major challenge, not yet sufficiently resolved, is the definition of the 'useful work' provided by a data center. Recently developed concepts to further address service-related energy consumption will be discussed in the paper. Central criteria and options for further development will be addressed.

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation; analyses timings and patterns of tumour evolution; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity; and evaluates a range of more-specialized features of cancer genomes.