
    Quantifying single nucleotide variant detection sensitivity in exome sequencing

    BACKGROUND: The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage and this is expected to systematically impact our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study it is often essential to both detect polymorphisms and to understand where and with what probability real polymorphisms may have been missed. RESULTS: Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial. This calibrated model allows us to produce single nucleotide resolution SNV sensitivity estimates which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined between samples to give “power estimates” for an entire study. We estimate a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X, commonly used for rare disease exome sequencing studies, we predict 5–15% of heterozygous and 1–4% of homozygous SNVs in the targeted regions will be missed. CONCLUSIONS: Non-reference alleles in the heterozygote state have a high chance of being missed when commonly applied read coverage thresholds are used despite the widely held assumption that there is good polymorphism detection at these coverage levels. Such alleles are likely to be of functional importance in population based studies of rare diseases, somatic mutations in cancer and explaining the “missing heritability” of quantitative traits.
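
The "naive expectation of sampling from a binomial" mentioned in this abstract can be illustrated with a small sketch. It is not the authors' calibrated model: it assumes each read at a heterozygous site carries the non-reference allele with probability 0.5, and that a variant counts as detected once a hypothetical minimum number of supporting reads (`min_alt_reads`, set to 2 here purely for illustration) is seen. The function names are also illustrative.

```python
from math import comb


def binomial_detection_probability(depth, min_alt_reads=2, alt_fraction=0.5):
    """Naive probability of seeing at least `min_alt_reads` non-reference
    reads among `depth` reads, each drawn independently with probability
    `alt_fraction`. The threshold is an assumption, not taken from the paper.
    """
    if depth < min_alt_reads:
        return 0.0
    # Probability of observing fewer than `min_alt_reads` supporting reads.
    p_too_few = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def min_depth_for_sensitivity(target=0.95, min_alt_reads=2, alt_fraction=0.5,
                              max_depth=1000):
    """Smallest read depth whose naive detection probability reaches `target`."""
    for depth in range(min_alt_reads, max_depth + 1):
        if binomial_detection_probability(depth, min_alt_reads, alt_fraction) >= target:
            return depth
    raise ValueError("target sensitivity not reached within max_depth")


if __name__ == "__main__":
    print(min_depth_for_sensitivity(0.95))
```

Under these assumptions the naive model reaches 95% at a depth of 8X, which is lower than the 13X the abstract reports from the empirical model; the gap is consistent with the abstract's statement that measured sensitivity is worse than the binomial expectation.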

    CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

    Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models. The dataset is freely available at https://stanfordmlgroup.github.io/competitions/chexpert. Comment: Published in AAAI 201

    DHODH modulates transcriptional elongation in the neural crest and melanoma

    Melanoma is a tumour of transformed melanocytes, which are originally derived from the embryonic neural crest. It is unknown to what extent the programs that regulate neural crest development interact with mutations in the BRAF oncogene, which is the most commonly mutated gene in human melanoma. We have used zebrafish embryos to identify the initiating transcriptional events that occur on activation of human BRAF(V600E) (which encodes an amino acid substitution mutant of BRAF) in the neural crest lineage. Zebrafish embryos that are transgenic for mitfa:BRAF(V600E) and lack p53 (also known as tp53) have a gene signature that is enriched for markers of multipotent neural crest cells, and neural crest progenitors from these embryos fail to terminally differentiate. To determine whether these early transcriptional events are important for melanoma pathogenesis, we performed a chemical genetic screen to identify small-molecule suppressors of the neural crest lineage, which were then tested for their effects on melanoma. One class of compound, inhibitors of dihydroorotate dehydrogenase (DHODH), for example leflunomide, led to an almost complete abrogation of neural crest development in zebrafish and to a reduction in the self-renewal of mammalian neural crest stem cells. Leflunomide exerts these effects by inhibiting the transcriptional elongation of genes that are required for neural crest development and melanoma growth. When used alone or in combination with a specific inhibitor of the BRAF(V600E) oncogene, DHODH inhibition led to a marked decrease in melanoma growth both in vitro and in mouse xenograft studies. Taken together, these studies highlight developmental pathways in neural crest cells that have a direct bearing on melanoma formation.

    Disease Burden of Clostridium difficile Infections in Adults, Hong Kong, China, 2006-2014

    Cross-sectional studies suggest an increasing trend in incidence and lower recurrence rates of Clostridium difficile infections in Asia than in Europe and North America. The temporal trend of C. difficile infection in Asia is not completely understood. We conducted a territory-wide population-based observational study to investigate the burden and clinical outcomes in Hong Kong, China, over a 9-year period. A total of 15,753 cases were identified, including 14,402 (91.4%) healthcare-associated cases and 817 (5.1%) community-associated cases. After adjustment for diagnostic test, we found that incidence increased from 15.41 cases/100,000 persons in 2006 to 36.31 cases/100,000 persons in 2014, an annual increase of 26%. This increase was associated with elderly patients, for whom incidence increased 3-fold over the period. Recurrence at 60 days increased from 5.7% in 2006 to 9.1% in 2014 (p<0.001). Our data suggest the need for further surveillance, especially in Asia, which contains ≈60% of the world’s population.

    Emergent global patterns of ecosystem structure and function from a mechanistic general ecosystem model

    Anthropogenic activities are causing widespread degradation of ecosystems worldwide, threatening the ecosystem services upon which all human life depends. Improved understanding of this degradation is urgently needed to improve avoidance and mitigation measures. One tool to assist these efforts is predictive models of ecosystem structure and function that are mechanistic: based on fundamental ecological principles. Here we present the first mechanistic General Ecosystem Model (GEM) of ecosystem structure and function that is both global and applies in all terrestrial and marine environments. Functional forms and parameter values were derived from the theoretical and empirical literature where possible. Simulations of the fate of all organisms with body masses between 10 µg and 150,000 kg (a range of 14 orders of magnitude) across the globe led to emergent properties at individual (e.g., growth rate), community (e.g., biomass turnover rates), ecosystem (e.g., trophic pyramids), and macroecological scales (e.g., global patterns of trophic structure) that are in general agreement with current data and theory. These properties emerged from our encoding of the biology of, and interactions among, individual organisms without any direct constraints on the properties themselves. Our results indicate that ecologists have gathered sufficient information to begin to build realistic, global, and mechanistic models of ecosystems, capable of predicting a diverse range of ecosystem properties and their response to human pressures.

    A visual and curatorial approach to clinical variant prioritization and disease gene discovery in genome-wide diagnostics

    Background: Genome-wide data are increasingly important in the clinical evaluation of human disease. However, the large number of variants observed in individual patients challenges the efficiency and accuracy of diagnostic review. Recent work has shown that systematic integration of clinical phenotype data with genotype information can improve diagnostic workflows and prioritization of filtered rare variants. We have developed visually interactive, analytically transparent analysis software that leverages existing disease catalogs, such as the Online Mendelian Inheritance in Man database (OMIM) and the Human Phenotype Ontology (HPO), to integrate patient phenotype and variant data into ranked diagnostic alternatives. Methods: Our tool, “OMIM Explorer” (http://www.omimexplorer.com), extends the biomedical application of semantic similarity methods beyond those reported in previous studies. The tool also provides a simple interface for translating free-text clinical notes into HPO terms, enabling clinical providers and geneticists to contribute phenotypes to the diagnostic process. The visual approach uses semantic similarity with multidimensional scaling to collapse high-dimensional phenotype and genotype data from an individual into a graphical format that contextualizes the patient within a low-dimensional disease map. The map proposes a differential diagnosis and algorithmically suggests potential alternatives for phenotype queries, in essence generating a computationally assisted differential diagnosis informed by the individual’s personal genome. Visual interactivity allows the user to filter and update variant rankings by interacting with intermediate results. The tool also implements an adaptive approach for disease gene discovery based on patient phenotypes. Results: We retrospectively analyzed pilot cohort data from the Baylor Miraca Genetics Laboratory, demonstrating performance of the tool and workflow in the re-analysis of clinical exomes. Our tool assigned to clinically reported variants a median rank of 2, placing causal variants in the top 1% of filtered candidates across the 47 cohort cases with reported molecular diagnoses of exome variants in OMIM Morbidmap genes. Our tool outperformed Phen-Gen, eXtasy, PhenIX, PHIVE, and hiPHIVE in the prioritization of these clinically reported variants. Conclusions: Our integrative paradigm can improve efficiency and, potentially, the quality of genomic medicine by more effectively utilizing available phenotype information, catalog data, and genomic knowledge.

    New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk.

    Levels of circulating glucose are tightly regulated. To identify new loci influencing glycemic traits, we performed meta-analyses of 21 genome-wide association studies informative for fasting glucose, fasting insulin and indices of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 nondiabetic participants. Follow-up of 25 loci in up to 76,558 additional subjects identified 16 loci associated with fasting glucose and HOMA-B and two loci associated with fasting insulin and HOMA-IR. These include nine loci newly associated with fasting glucose (in or near ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1 and C2CD4B) and one influencing fasting insulin and HOMA-IR (near IGF1). We also demonstrated association of ADCY5, PROX1, GCK, GCKR and DGKB-TMEM195 with type 2 diabetes. Within these loci, likely biological candidate genes influence signal transduction, cell proliferation, development, glucose-sensing and circadian regulation. Our results demonstrate that genetic studies of glycemic traits can identify type 2 diabetes risk loci, as well as loci containing gene variants that are associated with a modest elevation in glucose levels but are not associated with overt diabetes.