21 research outputs found

Recommended from our members

Accurate genome analysis with nanopore sequencing using deep neural networks.

Author: Shafin Kishwar
Publication venue: eScholarship, University of California
Publication date: 01/01/2022

Nanopore sequencing, commercialized by Oxford Nanopore Technologies (ONT), is a high-throughput genome sequencing platform. Unlike traditional sequencing-by-synthesis methods, nanopore sequencing uses measured current signals to sense the nucleotide sequence flowing through the pore. The signal-to-base conversion process introduces unique error patterns, making it challenging to design methods that rely on hand-crafted features. Deep learning uses multiple layers to progressively learn complex patterns in the input data, making it suitable for genome analysis. In this dissertation research, I present methods I developed based on deep neural networks to improve genome inference with nanopore sequencing. First, I introduce the haplotype-aware variant calling pipeline PEPPER-Margin-DeepVariant, which produces state-of-the-art results for nanopore long reads. Next, I demonstrate a pipeline to perform de novo assembly of eleven human genomes in nine days. Then I show the application of the methods to validate and correct errors in the first complete human genome assembly. Finally, I demonstrate the utility of PEPPER-Margin-DeepVariant paired with highly multiplexed nanopore sequencing for rapidly identifying disease-causing variants.

eScholarship - University of California

Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes.

Author: Shafin Kishwar,
Publication date: 22/12/2020

Ezid

Impact of heuristics in clustering large biological networks

Author: Adamcsek
Altaf-Ul-Amin
Ashburner
Brohee
Brun
Chatr-aryamontri
Colak
Fredman
Frings
Georgii
Iffatur Ridwan
Jensen
Jiang
Kazi Lutful Kabir
Kishwar Shafin
Lord
M. Sohel Rahman
Md. Kishwar Shafin
Mewes
Mitra
Mohammad Mozammel Hoque
Nepusz
Palla
Peng
Rashid Saadman Karim
Song
Tasmiah Tamzid Anannya
Publication venue: 'Elsevier BV'

Crossref

Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads.

Author: Baid Gunjan
Carnevali Paolo
Carroll Andrew
Chang Pi-Chuan
Eizenga Jordan M
Goel Sidharth
Jain Miten
Kolesnikov Alexey
Kolmogorov Mikhail
Miga Karen H
Nattestad Maria
Paten Benedict
Pesout Trevor
Shafin Kishwar
Publication venue: eScholarship, University of California
Publication date: 01/11/2021

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).

PubMed Central

eScholarship - University of California

GIAB Benchmarking of HG002 Assemblies from HPRC Year 1 Bakeoff

Author: Collins Joanna
Ebert Peter
Formenti Giulio
Garg Shilpa
Harvey William
Hastie Alex
Haukness Marina
Kirsche Melanie
Kolmogorov Mikhail
Koren Sergey
Korlach Jonas
Li Daofeng
Lucas Julian
Luo Feng
Marschall Tobias
McDaniel Jennifer
Nie Fan
Olson Nathan D.
Regier Allison
Rhie Arang
Sanders Ashley D.
Schmitt Anthony
Shafin Kishwar
Shumate Alaina
Stober Catherine
Torrance James
Wang Jianxin
Wood Jonathan
Zimin Aleksey V.
Zook Justin M.
Publication venue: Clemson University Libraries
Publication date: 08/06/2022

Clemson University: TigerPrints

A complete reference genome improves analysis of human genetic variation

Author: Aganezov Sergey
Avdeyev Pavel
Chin Chen-Shan
Dennis Megan Y
Hansen Nancy F
Kirsche Melanie
Koren Sergey
Layer Ryan
Lee Joyce
Martin Skylar
McCoy Rajiv C
McDaniel Jennifer
Meredith Melissa
Miga Karen H
Miller Danny E
Olson Nathan D
Paten Benedict
Phillippy Adam M
Rhie Arang
Rosenfeld Jeffrey A
Sauria Michael EG
Schatz Michael C
Sedlazeck Fritz J
Shafin Kishwar
Shumate Alaina
Soto Daniela C
Taylor Dylan J
Vollger Mitchell R
Wagner Justin
Xiao Chunlin
Yan Stephanie M
Zarate Samantha
Zook Justin M
Publication venue: 'American Association for the Advancement of Science (AAAS)'
Publication date: 01/04/2022

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

eScholarship - University of California

Benchmarking challenging small variants with linked and long reads.

Author: Aganezov Sergey
Bansal Vikas
Barrio Alvaro
Byrska-Bishop Marta
Carroll Andrew
Chin Chen-Shan
Clarke Wayne
Ebert Peter
Evani Uday
Farek Jesse
Fiddes Ian
Fungtammasan Arkarachai
Hanlon Vincent
Harris Lindsay
Khan Ziad
Kirsche Melanie
Kovacevic Vladimir
Lansdorp Peter
Mahmoud Medhat
Markello Charles
Marschall Tobias
Mattsson Carl-Adam
Miller Neil
Narzisi Giuseppe
Ni Bohan
Olson Nathan
Rosenfeld Jeffrey
Rowell William
Salit Marc
Schatz Michael
Sedlazeck Fritz
Shafin Kishwar
Sidow Arend
Stankovic Ana
Wagner Justin
Wenger Aaron
Xiao Chunlin
Yoo Byunggil
Zarate Samantha
Zhou Xin
Zook Justin
Publication venue: eScholarship, University of California
Publication date: 01/05/2022

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. The new benchmark identifies eight times more false negatives in a short-read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.

PubMed Central

eScholarship - University of California

A complete reference genome improves analysis of human genetic variation

Author: Aganezov Sergey
Avdeyev Pavel
Chin Chen-Shan
Dennis Megan Y
Hansen Nancy F
Kirsche Melanie
Koren Sergey
Layer Ryan
Lee Joyce
Martin Skylar
McCoy Rajiv C
McDaniel Jennifer
Meredith Melissa
Miga Karen H
Miller Danny E
Olson Nathan D
Paten Benedict
Phillippy Adam M
Rhie Arang
Rosenfeld Jeffrey A
Sauria Michael EG
Schatz Michael C
Sedlazeck Fritz J
Shafin Kishwar
Shumate Alaina
Soto Daniela C
Taylor Dylan J
Vollger Mitchell R
Wagner Justin
Xiao Chunlin
Yan Stephanie M
Zarate Samantha
Zook Justin M
Publication venue: eScholarship, University of California
Publication date: 01/04/2022

eScholarship - University of California

Complete genomic and epigenetic maps of human centromeres

Author: Aganezov Sergey
Alexandrov Ivan A
Altemose Nicolas
Asri Mobin
Borchers Matthew
Brooks Shelise
Bzikadze Andrey V
Caldas Gina V
de Lima Leonardo Gomes
Dennis Megan Y
Dernburg Abby F
Diekhans Mark
Dvorkina Tatiana
Eichler Evan E
Gershman Ariel
Gerton Jennifer L
Gusev Fedor
Hartley Gabrielle A
Haukness Marina
Hoyt Savannah J
Karpen Gary H
Kerpedjiev Peter
Koren Sergey
Kunyavskaya Olga
Langley Charles H
Langley Sasha A
Logsdon Glennis A
Lorig-Roach Ryan
Lucas Julian K
McCartney Ann M
Miga Karen H
Mikheenko Alla
Nurk Sergey
O'Neill Rachel J
Olson Daniel
Paten Benedict
Pevzner Pavel A
Phillippy Adam M
Potapova Tamara
Rhie Arang
Rogaev Evgeny I
Ryabov Fedor D
Salama Sofie R
Sauria Michael EG
Schatz Michael C
Shafin Kishwar
Shepelev Valery A
Shew Colin J
Sidhwani Pragya
Straight Aaron F
Streets Aaron
Sullivan Beth A
Tigyi Kristof
Timp Winston
Uralsky Lev
Vollger Mitchell R
Wheeler Travis J
Young Alice
Zook Justin M
Publication venue: 'American Association for the Advancement of Science (AAAS)'
Publication date: 01/04/2022

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

eScholarship - University of California