23 research outputs found

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    No full text
    Funder: NCI U24CA211006
    Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.

    Validity of the Stryd Power Meter to Model Metabolic Power of Transport

    No full text

    Are estimates of resting metabolism based on body composition results accurate?

    No full text

    ARC-MOF: A Diverse Database of Metal-Organic Frameworks with DFT-Derived Partial Atomic Charges and Descriptors for Machine Learning

    No full text
    Metal-organic frameworks (MOFs) are a class of crystalline materials composed of metal nodes or clusters connected via semi-rigid organic linkers. Owing to their high surface area, porosity, and tunability, MOFs have received significant attention for numerous applications such as gas separation and storage. Atomistic simulations and data-driven methods (e.g., machine learning) have been successfully employed to screen large databases and develop new experimentally synthesized and validated MOFs for CO2 capture. To enable data-driven materials discovery for any application, the first (and arguably most crucial) step is database curation. This work introduces the ab initio REPEAT charge MOF (ARC-MOF) database. This is a database of ~280,000 MOFs which have been either experimentally characterized or computationally generated, spanning all publicly available MOF databases. A key feature of ARC-MOF is that it contains DFT-derived electrostatic potential fitted partial atomic charges for each MOF. Additionally, ARC-MOF contains pre-computed descriptors for out-of-the-box machine learning applications. An in-depth analysis of the diversity of ARC-MOF with respect to the currently mapped design space of MOFs was performed – a critical, yet commonly overlooked aspect of previously reported MOF databases. Using this analysis, balanced subsets from ARC-MOF for various machine learning purposes have been identified. Other chemical and geometric diversity analyses are presented, with an analysis on the effect of charge assignment method on atomistic simulation of gas uptake in MOFs.