    Advancing Benchmarks for Genome Sequencing

    Several recent benchmarking efforts provide reference datasets and samples to improve genome sequencing and the calling of germline and somatic mutations.

    Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls

    Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made and classify them into categories based on the reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
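The abstract describes genotype calls intended as a benchmark within high-confidence regions. As a rough illustration of how such a benchmark is commonly consumed (not the consortium's own tooling), the Python sketch below compares a query callset with truth calls and counts only variants that fall inside the confident regions. The `Variant` tuple, the region dictionary format and the `benchmark` and `in_regions` functions are hypothetical. The comparison demands exact matches of position, alleles and genotype, whereas real comparison tools also reconcile different representations of the same variant.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int       # position, same coordinate system as the region bounds
    ref: str
    alt: str
    genotype: str  # e.g. "0/1" or "1/1"


# Hypothetical region format: chrom -> list of (start, end), end exclusive
Regions = Dict[str, List[Tuple[int, int]]]


def in_regions(v: Variant, regions: Regions) -> bool:
    """True if the variant lies inside any confident interval on its chromosome."""
    return any(start <= v.pos < end for start, end in regions.get(v.chrom, []))


def benchmark(query: Iterable[Variant], truth: Iterable[Variant], regions: Regions) -> dict:
    """Strict comparison: a genotype mismatch counts as one FP and one FN."""
    q = {v for v in query if in_regions(v, regions)}
    t = {v for v in truth if in_regions(v, regions)}
    tp = len(q & t)
    fp = len(q - t)
    fn = len(t - q)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


if __name__ == "__main__":
    truth = [Variant("chr1", 100, "A", "G", "0/1"),
             Variant("chr1", 200, "C", "T", "1/1"),
             Variant("chr1", 900, "G", "A", "0/1")]  # outside the confident region
    query = [Variant("chr1", 100, "A", "G", "0/1"),
             Variant("chr1", 200, "C", "T", "0/1"),  # wrong genotype
             Variant("chr1", 300, "T", "C", "0/1")]  # absent from the truth set
    print(benchmark(query, truth, {"chr1": [(1, 500)]}))
```

In this toy run the truth call at position 900 is ignored because it lies outside the confident region, and the genotype error at position 200 is counted twice (once as an FP, once as an FN), giving 1 TP, 2 FPs and 1 FN.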

    PEPR: pipelines for evaluating prokaryotic references

    Scientific access into Mercer Subglacial Lake: scientific objectives, drilling operations and initial observations

    © The Author(s), 2021. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Priscu, J. C., Kalin, J., Winans, J., Campbell, T., Siegfried, M. R., Skidmore, M., Dore, J. E., Leventer, A., Harwood, D. M., Duling, D., Zook, R., Burnett, J., Gibson, D., Krula, E., Mironov, A., McManis, J., Roberts, G., Rosenheim, B. E., Christner, B. C., Kasic, K., Fricker, H. A., Lyons, W. B., Barker, J., Bowling, M., Collins, B., Davis, C., Gagnon, A., Gardner, C., Gustafson, C., Kim, O-S., Li, W., Michaud, A., Patterson, M. O., Tranter, M., Venturelli, R., Vick-Majors, T., & Elsworth, C. Scientific access into Mercer Subglacial Lake: scientific objectives, drilling operations and initial observations. Annals of Glaciology, 62(85–86), (2021): 340–352, https://doi.org/10.1017/aog.2021.10.
    The Subglacial Antarctic Lakes Scientific Access (SALSA) Project accessed Mercer Subglacial Lake using environmentally clean hot-water drilling to examine interactions among ice, water, sediment, rock, microbes and carbon reservoirs within the lake water column and underlying sediments. A ~0.4 m diameter borehole was melted through 1087 m of ice and maintained over ~10 days, allowing observation of ice properties and collection of water and sediment with various tools. Over this period, SALSA collected: 60 L of lake water and 10 L of deep borehole water; microbes >0.2 μm in diameter from in situ filtration of ~100 L of lake water; 10 multicores 0.32–0.49 m long; 1.0 and 1.76 m long gravity cores; three conductivity–temperature–depth profiles of borehole and lake water; five discrete-depth current meter measurements in the lake; and images of ice, the lake water–ice interface and lake sediments. Temperature and conductivity data showed the hydrodynamic character of water mixing between the borehole and lake after entry. Models simulating melting of the ~6 m thick basal accreted ice layer imply that debris fall-out through the ~15 m water column to the lake sediments from borehole melting had little effect on the stratigraphy of surficial sediment cores.
    This material is based upon work supported by the US National Science Foundation, Section for Antarctic Sciences, Antarctic Integrated System Science program as part of the interdisciplinary (Subglacial Antarctic Lakes Scientific Access (SALSA): Integrated study of carbon cycling in hydrologically-active subglacial environments) project (NSF-OPP 1543537, 1543396, 1543405, 1543453 and 1543441). Ok-Sun Kim was funded by the Korean Polar Research Institute. We are particularly thankful to the SALSA traverse personnel for crucial technical and logistical support. The United States Antarctic Program enabled our fieldwork; the New York Air National Guard and Kenn Borek Air provided air support; UNAVCO provided geodetic instrument support. Hot water drilling activities, including repair and upgrade modifications of the WISSARD hot water drill system, for the SALSA project were supported by a subaward from the Ice Drilling Program of Dartmouth College (NSF-PLR 1327315) to the University of Nebraska-Lincoln. J. Lawrence assisted with manuscript preparation. Finally, we are grateful to C. Dean, the SALSA Project Manager, and R. Ricards, SALSA Project Coordinator at McMurdo Station, for their organizational skills, and to B. Huber of Lamont-Doherty Earth Observatory for providing the SBE39 PT sensors and the Nortek Aquadopp current meter and assisting with interpretation of the data. B. Huber also provided helpful input on programming and calibrating the SBE19PlusV2 6112 CTD.

    A crowdsourced set of curated structural variants for the human genome

    Funder: U.S. Food and Drug Administration; funder-id: http://dx.doi.org/10.13039/100000038
    A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging to establish. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome and report their genotype and size accuracy. SVCurator displays images from short-, long-, and linked-read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and for SVs with inaccurate breakpoint or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
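The filtering step above (dropping events with low concordance among curators) can be sketched as a majority vote per event, assuming each event carries one label string per curator. The majority-vote rule, the 0.75 agreement cutoff, the label strings and all function names are illustrative assumptions, not SVCurator's actual criteria.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def consensus(labels: List[str]) -> Tuple[Optional[str], float]:
    """Majority label for one event and the fraction of curators who chose it."""
    if not labels:
        return None, 0.0
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def high_confidence(events: Dict[str, List[str]], min_agreement: float = 0.75) -> Dict[str, str]:
    """Keep only events whose majority label reaches the (hypothetical) agreement cutoff."""
    kept = {}
    for event_id, labels in events.items():
        label, agreement = consensus(labels)
        if label is not None and agreement >= min_agreement:
            kept[event_id] = label
    return kept


def pairwise_concordance(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Fraction of events labelled by both curators on which their labels agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[e] == b[e] for e in shared) / len(shared)
```

With any cutoff above 0.5, an event whose votes are evenly split can never pass, so the arbitrary tie-breaking inside `Counter.most_common` does not affect which events are kept. `pairwise_concordance` mirrors the kind of curator-versus-curator agreement figure quoted in the abstract, computed only over events both curators labelled.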

    svclassify: a method to establish benchmark structural variant calls

    The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions, as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we used our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and with an orthogonal consensus method, MetaSV (99.7% concordant), and that candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
    https://doi.org/10.1186/s12864-016-2366-
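svclassify's own model is not reproduced here. As a simpler stand-in for the idea of flagging candidates whose annotations look abnormal relative to random non-SV regions, the sketch below fits per-feature means and standard deviations on background regions and scores each candidate by a diagonal Mahalanobis distance (the square root of its summed squared z-scores). The features (for example normalized read depth and fraction of discordant read pairs), the 3.0 cutoff and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fit_background(background: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and sample standard deviation over random non-SV regions (needs >= 2 rows)."""
    columns = list(zip(*background))
    means = [statistics.fmean(col) for col in columns]
    # Guard against zero spread so a constant feature cannot cause division by zero.
    sds = [statistics.stdev(col) or 1.0 for col in columns]
    return means, sds


def abnormality(candidate: Sequence[float], means: Sequence[float], sds: Sequence[float]) -> float:
    """Diagonal Mahalanobis distance: square root of the summed squared z-scores."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(candidate, means, sds)))


def classify(candidates: Sequence[Sequence[float]], means: Sequence[float],
             sds: Sequence[float], threshold: float = 3.0) -> List[bool]:
    """True means the annotations look abnormal versus the background, i.e. a likely true SV."""
    return [abnormality(c, means, sds) > threshold for c in candidates]
```

A diagonal model ignores correlations between annotations; a full-covariance distance or a dedicated one-class classifier would account for them at the cost of more background data and a matrix inverse.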

    Assessing Reproducibility of Inherited Variants Detected With Short-Read Whole Genome Sequencing

    Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
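The abstract does not spell out the reproducibility metric. One simple, assumed definition is the number of calls shared by all replicates divided by the number of calls made in any replicate, computed separately for SNVs and indels; the sketch below implements that definition. The `Call` tuple and all function names are hypothetical, and calls are matched on exact (chrom, pos, ref, alt).

```python
from typing import Dict, List, Set, Tuple

Call = Tuple[str, int, str, str]  # hypothetical: (chrom, pos, ref, alt)


def reproducibility(replicates: List[Set[Call]]) -> float:
    """Calls found in every replicate divided by calls found in any replicate."""
    union = set().union(*replicates)
    if not union:
        return 0.0
    shared = set.intersection(*replicates)
    return len(shared) / len(union)


def is_snv(call: Call) -> bool:
    # Single-base REF and ALT; everything else (indels, and MNVs in this simplification) is lumped together.
    return len(call[2]) == 1 and len(call[3]) == 1


def reproducibility_by_type(replicates: List[Set[Call]]) -> Dict[str, float]:
    """Score SNVs and non-SNV calls separately, since the abstract reports they behave differently."""
    snvs = [{c for c in rep if is_snv(c)} for rep in replicates]
    others = [{c for c in rep if not is_snv(c)} for rep in replicates]
    return {"SNV": reproducibility(snvs), "indel": reproducibility(others)}
```

Exact-match comparison is deliberately strict: an indel reported at a slightly shifted position in one replicate counts as irreproducible here, which is one reason indel figures under such a metric tend to look worse than SNV figures.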

    Wing pathology of white-nose syndrome in bats suggests life-threatening disruption of physiology

    White-nose syndrome (WNS) is causing unprecedented declines in several species of North American bats. The characteristic lesions of WNS are caused by the fungus Geomyces destructans, which erodes and replaces the living skin of bats while they hibernate. It is unknown how this infection kills the bats. We review here the unique physiological importance of wings to hibernating bats in relation to the damage caused by G. destructans and propose that mortality is caused by catastrophic disruption of wing-dependent physiological functions. Mechanisms of disease associated with G. destructans seem specific to hibernating bats and are most analogous to disease caused by chytrid fungus in amphibians.