Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads.
Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods outperformed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
Tuberculosis: integrated studies for a complex disease 2050
Tuberculosis (TB) has posed a variety of challenges for centuries [1]. As in
other fields where challenges and opportunities meet, these challenges have
inspired the scientific community to mobilize different groups around shared
goals. For example, the emergence of drug resistance has prompted
a huge volume of research on the discovery of new medicines, drug
delivery methods, and the repurposing of old drugs [2, 3]. Moreover, to enhance the
capacity to detect TB cases, studies have sought diagnostics and biomarkers, with
much hope recently placed in point-of-care tests [4].
Despite all such efforts, as highlighted in the 50 chapters of this volume, we
are still writing about TB and thinking about how to fight this old disease, implying
that the problem of TB might be complex and calling for an integrated
science that deals with multiple dimensions simultaneously and effectively.
We are not the first to say so; others have proposed integrated platforms for TB
research, integrated prevention services, integrated models for drug screening,
integrated imaging protocols, an integrated understanding of disease pathogenesis,
integrated control models, and integrated mapping of the pathogen's genome
[5–12], to name a few.
These integrated efforts date back decades. So, a question arises: why does
TB still exist? It might be because this integration has not happened
on a global scale, and so TB remains a problem, especially in
resource-limited settings.
We hope that Tuberculosis: Integrated Studies for a Complex Disease helps to globalize
the integrated science of TB.