    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    Funder: NCI U24CA211006

    Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side by side in 746 TCGA samples and find that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in the WES consensus efforts. WGS captures ~50% more variation in exonic regions, as well as previously unobserved mutations in loci with variable GC content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.

    Predicting Extinction Risk for Data Deficient Bats

    Conservation biology aims to identify the species most at risk of extinction and to understand the factors that forecast species vulnerability. The International Union for Conservation of Nature (IUCN) Red List is a leading source of extinction risk data for species globally; however, many potentially at-risk species are not assessed by the IUCN owing to inadequate data. Of the approximately 1,150 bat species (Chiroptera) recognized by the IUCN, 17 percent are categorized as Data Deficient. Here, we show that large trait databases, in combination with a comprehensive phylogeny, can identify which traits are important for assessing extinction risk in bats. Using phylogenetic logistic regressions, we show that geographic range and island endemism are the strongest correlates of binary extinction risk. We also show that simulations using two models that trade off between data complexity and data coverage provide similar estimates of extinction risk for species that have received a Red List assessment. We then use our model parameters to provide quantitative predictions of extinction risk for 60 species that have not received risk assessments by the IUCN. Our model suggests that at least 20 bat species should be treated as threatened by extinction. In combination with expert knowledge, our results can be used as a quick, first-pass prioritization for conservation action.

    Enabling FAIR data in Earth and environmental science with community-centric (meta)data reporting formats

    Research can be more transparent and collaborative when Earth and environmental science data are published according to the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Reporting formats (instructions, templates, and tools for consistently formatting data within a discipline) can help make data more accessible and reusable. However, the immense diversity of data types across Earth science disciplines makes developing and adopting such formats challenging. Here, we describe 11 community reporting formats for a diverse set of Earth science (meta)data, including cross-domain metadata (dataset metadata, location metadata, sample metadata), file-formatting guidelines (file-level metadata, CSV files, terrestrial model data archiving), and domain-specific reporting formats for some biological, geochemical, and hydrological data (amplicon abundance tables, leaf-level gas exchange, soil respiration, water and sediment chemistry, sensor-based hydrologic measurements). More broadly, we provide guidelines that communities can use to create new (meta)data formats that integrate with their scientific workflows. Such reporting formats have the potential to accelerate scientific discovery and predictions by making it easier for data contributors to provide (meta)data that are more interoperable and reusable.