12 research outputs found

Haplotype-aware Diplotyping from Noisy Long Reads

Author: Ebler J.
Haukness M.
Marschall T.
Paten B.
Pesout T.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Semi-automated assembly of high-quality diploid human reference genomes

Author: Asri M.
Carnevali P.
Chaisson M.J.P.
Cheng H.
Chin C.S.
Cody S.
Collins J.
Ebert P.
Eichler E.E.
Escalona M.
Fedrigo O.
Formenti G.
Fulton L.L.
Fulton R.S.
Garg S.
Garrison E.
Gerton J.L.
Ghurye J.
Granat A.
Green R.E.
Guarracino A.
Hall I.
Harvey W.
Hasenfeld P.
Hastie A.
Haukness M.
Haussler D.
Howe K.
Jaeger E.B.
Jain M.
Jarvis E.D.
Kirsche M.
Kolmogorov M.
Korbel J.O.
Koren S.
Korlach J.
Lee J.
Li D.
Li H.
Lindsay T.
Logsdon G.A.
Lucas J.
Luo F.
Marschall T.
McDaniel J.
Miga K.H.
Mitchell M.W.
Nie F.
Olsen H.E.
Olson N.D.
Paten B.
Pesout T.
Phillippy A.M.
Porubsky D.
Potapova T.
Puiu D.
Regier A.
Rhie A.
Ruan J.
Salzberg S.L.
Sanders A.D.
Schatz M.C.
Schmitt A.
Schneider V.A.
Selvaraj S.
Shafin K.
Shumate A.
Stitziel N.O.
Stober C.
Thibaud-Nissen F.
Torrance J.
Tracey A.
Vollger M.R.
Wagner J.
Wang J.
Wang T.
Wenger A.
Wood J.
Xiao C.
Yang C.
Zhang G.
Zimin A.V.
Zook J.M.
Publication venue: Springer Science and Business Media LLC
Publication date: 19/10/2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements

MDC Repository

A draft human pangenome reference

Author: Abel H.J.
Abou Tayoun A.N.
Antonacci-Fulton L.L.
Asri M.
Baid G.
Baker C.A.
Belyaeva A.
Billis K.
Bourque G.
Buonaiuto S.
Carroll A.
Chaisson M.J.P.
Chang P.C.
Chang X.H.
Cheng H.
Chu J.
Cody S.
Colonna V.
Cook D.E.
Cook-Deegan R.M.
Cornejo O.E.
Diekhans M.
Doerr D.
Ebert P.
Ebler J.
Eichler E.E.
Eizenga J.M.
Fairley S.
Fedrigo O.
Felsenfeld A.L.
Feng X.
Fischer C.
Flicek P.
Formenti G.
Frankish A.
Fulton R.S.
Gao Y.
Garg S.
Garrison E.
Garrison N.A.
Giron C.G.
Green R.E.
Groza C.
Guarracino A.
Haggerty L.
Hall I.M.
Harvey W.T.
Haukness M.
Haussler D.
Heumos S.
Hickey G.
Hoekzema K.
Hourlier T.
Howe K.
Jain M.
Jarvis E.D.
Ji H.P.
Kenny E.E.
Koenig B.A.
Kolesnikov A.
Korbel J.O.
Kordosky J.
Koren S.
Lee H.J.
Lewis A.P.
Li H.
Liao W.W.
Lu S.
Lu T.Y.
Lucas J.K.
Magalhães H.
Marco-Sola S.
Marijon P.
Markello C.
Marschall T.
Martin F.J.
McCartney A.
McDaniel J.
Miga K.H.
Mitchell M.W.
Monlong J.
Mountcastle J.
Munson K.M.
Mwaniki M.N.
Nattestad M.
Novak A.M.
Nurk S.
Olsen H.E.
Olson N.D.
Paten B.
Pesout T.
Phillippy A.M.
Popejoy A.B.
Porubsky D.
Prins P.
Puiu D.
Rautiainen M.
Regier A.A.
Rhie A.
Sacco S.
Sanders A.D.
Schneider V.A.
Schultz B.I.
Shafin K.
Sibbesen J.A.
Sirén J.
Smith M.W.
Sofia H.J.
Thibaud-Nissen F.
Tomlinson C.
Tricomi F.F.
Villani F.
Vollger M.R.
Wagner J.
Walenz B.
Wang T.
Wood J.M.D.
Zimin A.V.
Zook J.M.
Publication venue: Nature Publishing Group
Publication date: 11/05/2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample

MDC Repository

Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes

Author: Akeson M.
Armstrong J.
Bosworth C.
Carnevali P.
Costa V.
Eichler E.
Garrison E.
Green R.
Haukness M.
Haussler D.
Jain M.
Kilburn D.
Koren S.
Liu K.
Lorig-Roach R.
Marschall T.
Maurer N.
Mayes S.
Miga K.
Monlong J.
Munson K.
Olsen H.
Paten B.
Pesout T.
Phillippy A.
Salama S.
Sedlazeck F.
Shafin K.
Sorensen M.
Tigyi K.
Vollger M.
Zook J.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under 6 h on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (Phred quality score QV = 30) with nanopore reads alone. Addition of proximity-ligation sequencing enabled near chromosome-level scaffolds for all 11 genomes. We compare our assembly performance to existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed
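
The abstract above equates "more than 99.9% identity" with a Phred quality score of QV = 30. As a minimal sketch of that arithmetic, assuming the standard Phred definition QV = -10 * log10(error rate) and treating the error rate as 1 minus identity, the conversion can be written as follows. The function names are illustrative only and are not taken from Shasta, MarginPolish, or HELEN.

```python
import math


def identity_to_qv(identity: float) -> float:
    """Convert a fractional identity (e.g. 0.999) to a Phred-scaled QV.

    Assumes the error rate is simply 1 - identity.
    """
    error_rate = 1.0 - identity
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("identity must be in [0, 1) for a finite QV")
    return -10.0 * math.log10(error_rate)


def qv_to_identity(qv: float) -> float:
    """Inverse conversion: Phred-scaled QV back to fractional identity."""
    return 1.0 - 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    # 99.9% identity corresponds to roughly QV 30 (one error per 1,000 bases).
    print(round(identity_to_qv(0.999), 3))
    print(round(qv_to_identity(30.0), 6))
```

Under this definition each additional 10 QV points corresponds to a tenfold drop in error rate, so QV 40 would mean 99.99% identity.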

eScholarship - University of California

MPG.PuRe

Towards complete and error-free genome assemblies of all vertebrate species

Author: Al-Ajli F. O.
Balakrishnan C. N.
Biegler M. T.
Bista I.
Burt D.
Cantin L. J.
Chaisson M.
Chow W.
Clark K.
Clawson H.
Clayton D. F.
Collins J.
Crawford A. J.
Dagnew R. E.
Damas J.
Detrich H. W.
Di Palma F.
Diekhans M.
Digby A.
Dunn C.
Durbin R.
Eason D.
Edwards T.
Fedrigo O.
Flicek P.
Formenti G.
Franchini P.
Friedrich S. R.
Fungtammasan A.
Garrison E.
Gedman G. L.
George J. M.
Ghurye J.
Gilbert M. T. P.
Graves J. M.
Green R. E.
Grutzner F.
Guan D.
Gut I.
Haase B.
Haggerty L.
Hall R.
Hannigan B. T.
Harris R. S.
Hastie A.
Haussler D.
Hiller M.
Hoffman J.
Houck M.
Howard J.
Howe K.
Howe K.
Iorns D.
Jarvis E. D.
Johnson W. E.
Kautt A. F.
Kim H.
Kim J.
Kingan S. B.
Ko B. J.
Koepfli K.-P.
Koren S.
Korlach J.
Kraus R. H. S.
Kronenberg Z.
Kwak W.
Lama T. M.
Lee C.
Lee J.
Lewin H. A.
London S. E.
Lovell P. V.
Makova K. D.
Malinsky M.
Marques-Bonet T.
Martin F.
Masterson P.
McCarthy S. A.
Medvedev P.
Mello C. V.
Meyer A.
Misuraca A.
Mooney M.
Mountcastle J.
Murphy R. W.
Myers E. W.
Nassar L.
Naylor G. J. P.
Ning Z.
O'Brien S. J.
Osipova E.
Paez S.
Paten B.
Pelan S.
Pesout T.
Phillippy A. M.
Pippel M.
Putnam N. H.
Rhie A.
Robertson B.
Ryder O. A.
Secomandi S.
Selvaraj S.
Shapiro B.
Simbirsky M.
Sims Y.
Smith M.
Sovic I.
Svardal H.
Teeling E. C.
Theofanopoulou C.
Thibaud-Nissen F.
Torrance J.
Tracey A.
Turner G.
Uliano-Silva M.
Venkatesh B.
Vernes S. C.
Wagner M.
Walenz B. P.
Warnow T.
Warren W. C.
Wilkinson M.
Winkler S.
Wood J.
Zhang G.
Zhou Y.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species(1–4). To address this issue, the international Genome 10K (G10K) consortium(5,6) has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences

Archivio della ricerca- Università di Roma La Sapienza

Towards complete and error-free genome assemblies of all vertebrate species

Author: Al-Ajli F.
Balakrishnan C.
Biegler M.
Bista I.
Burt D.
Cantin L.
Chaisson M.
Chow W.
Clark K.
Clawson H.
Clayton D.
Collins J.
Crawford A.
Dagnew R.
Damas J.
Detrich H.
Di Palma F.
Diekhans M.
Digby A.
Dunn C.
Durbin R.
Eason D.
Edwards T.
Fedrigo O.
Flicek P.
Formenti G.
Franchini P.
Friedrich S.
Fungtammasan A.
Garrison E.
Gedman G.
George J.
Ghurye J.
Gilbert M.
Graves J.
Green R.
Grutzner F.
Guan D.
Gut I.
Haase B.
Haggerty L.
Hall R.
Hannigan B.
Harris R.
Hastie A.
Haussler D.
Hiller M.
Hoffman J.
Houck M.
Howard J.
Howe K.
Howe K.
Iorns D.
Jarvis E.
Johnson W.
Kautt A.
Kim H.
Kim J.
Kingan S.
Ko B.
Koepfli K.
Koren S.
Korlach J.
Kraus R.
Kronenberg Z.
Kwak W.
Lama T.
Lee C.
Lee J.
Lewin H.
London S.
Lovell P. V.
Makova K.
Malinsky M.
Marques-Bonet T.
Martin F.
Masterson P.
McCarthy S.
Medvedev P.
Mello C. V.
Meyer A.
Misuraca A.
Mooney M.
Mountcastle J.
Murphy R.
Myers E.
Nassar L.
Naylor G.
Ning Z.
O'Brien S.
Osipova E.
Paez S.
Paten B.
Pelan S.
Pesout T.
Phillippy A.
Pippel M.
Putnam N.
Rhie A.
Robertson B.
Ryder O.
Secomandi S.
Selvaraj S.
Shapiro B.
Simbirsky M.
Sims Y.
Smith M.
Sović I.
Svardal H.
Teeling E.
Theofanopoulou C.
Thibaud-Nissen F.
Torrance J.
Tracey A.
Turner G.
Uliano-Silva M.
Venkatesh B.
Vernes S.
Wagner M.
Walenz B.
Warnow T.
Warren W.
Wilkinson M.
Winkler S.
Wood J.
Zhang G.
Zhou Y.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

The Vertebrate Genomes Project has used an optimized pipeline to generate high-quality genome assemblies for sixteen species (representing all major vertebrate classes), which have led to new biological insights. High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species(1-4). To address this issue, the international Genome 10K (G10K) consortium(5,6) has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences

MPG.PuRe

precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions

Author: Ahsan Mian Umair
Arslan Elif
Baid Gunjan
Boja Emily
Bourgey Mathieu
Bourque Guillaume
Brown Richard
Brueffer Christian
Budak Gungor
Carroll Andrew
Catreux Severine
Chang Pi-Chuan
Chen Luoqi
Demirkaya-Budak Sinem
Dolgoborodov Alexey
DU YuanPing
Eveleigh Robert
Fang Li Tai
Feng Hanying
Flores Carlos
Goel Sidharth
Hung Calvin
Jain Amit
Jain Chirag
Jain Miten
Jain Varun
Johanson Elaine
Johnson Ivan J.
Jáspez David
Kabakci-Zorlu Duygu
Kalay Özem
Kolesnikov Alexey
Kyriakidis Konstantinos
Lajoie Bryan
Li Gen
Li Zhipan
Liu Qian
Lorenzo-Salazar José M.
MA ChouXian
Maier Ezekiel J.
Malousi Andigoni
McDaniel Jennifer
Mehio Rami
Mohiyuddin Marghoob
Morata Jordi
Muñoz-Barrera Adrián
Narcı Kübra
Nattestad Maria
Olson Nathan D.
Parra Genís
Paten Benedict
Pesout Trevor
Prasanna Anish G.
Roddey Cooper
Rubio-Rodríguez Luis A.
Ruehle Mike
Sahraeian Sayed Mohammad Ebrahim
Sedlazeck Fritz J.
Semenyuk Vladimir
Serang Omar
Shafin Kishwar
Stephens Sarah H.
Tang LinQi
Tetikol H. Serhat
Tonda Raúl
Trotta Jean-Rémi
Turgut Deniz
Wagner Justin
Wang Kai
Westreich Samuel T.
Yang Howard
Zhang ShaoWei
Zook Justin M
Publication venue: Cold Spring Harbor Laboratory
Publication date: 15/11/2020

The precisionFDA Truth Challenge V2 aimed to assess the state-of-the-art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods out-performed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants

Lund University Publications

PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions

Author: Ahsan Mian Umair
Arslan Elif
Baid Gunjan
Boja Emily
Bourgey Mathieu
Bourque Guillaume
Brown Richard
Brueffer Christian
Budak Gungor
Carroll Andrew
Catreux Severine
Chang Pi-Chuan
Chen Luoqi
Demirkaya-Budak Sinem
Dolgoborodov Alexey
DU YuanPing
Eveleigh Robert
Fang Li Tai
Feng Hanying
Flores Carlos
Goel Sidharth
Hung Calvin
Jain Amit
Jain Chirag
Jain Miten
Jain Varun
Johanson Elaine
Johnson Ivan J.
Jáspez David
Kabakci-Zorlu Duygu
Kalay Özem
Kolesnikov Alexey
Kyriakidis Konstantinos
Lajoie Bryan
Li Gen
Li Zhipan
Liu Qian
Lorenzo-Salazar José M.
MA ChouXian
Maier Ezekiel J.
Malousi Andigoni
McDaniel Jennifer
Mehio Rami
Mohiyuddin Marghoob
Morata Jordi
Muñoz-Barrera Adrián
Narcı Kübra
Nattestad Maria
Olson Nathan D.
Parra Genís
Paten Benedict
Pesout Trevor
Prasanna Anish G.
Roddey Cooper
Rubio-Rodríguez Luis A.
Ruehle Mike
Sahraeian Sayed Mohammad Ebrahim
Sedlazeck Fritz J.
Semenyuk Vladimir
Serang Omar
Shafin Kishwar
Stephens Sarah H.
Tang LinQi
Tetikol H. Serhat
Tonda Raúl
Trotta Jean-Rémi
Turgut Deniz
Wagner Justin
Wang Kai
Westreich Samuel T.
Yang Howard
Zhang ShaoWei
Zook Justin M
Publication venue: Elsevier BV
Publication date: 27/04/2022

The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants

Lund University Publications

PubMed Central

Towards complete and error-free genome assemblies of all vertebrate species.

Author: Al-Ajli Farooq O
Balakrishnan Christopher N
Biegler Matthew T
Bista Iliana
Burt Dave
Cantin Lindsey J
Chaisson Mark
Chow William
Clark Karen
Clawson Hiram
Clayton David F
Collins Joanna
Dagnew Robel E
Damas Joana
Detrich H William
Digby Andrew
Dunn Christopher
Eason Daryl
Edwards Taylor
Fedrigo Olivier
Flicek Paul
Formenti Giulio
Franchini Paolo
Friedrich Samantha R
Fungtammasan Arkarachai
Garrison Erik
Gedman Gregory L
George Julia M
Ghurye Jay
Green Richard E
Grutzner Frank
Guan Dengfeng
Gut Ivo
Haase Bettina
Haggerty Leanne
Hall Richard
Hannigan Brett T
Harris Robert S
Hastie Alex
Hiller Michael
Hoffman Jinna
Houck Marlys
Howard Jason
Howe Kevin
Iorns David
Kautt Andreas F
Kim Heebal
Kim Juwan
Kingan Sarah B
Ko Byung June
Koren Sergey
Kronenberg Zev
Kwak Woori
Lama Tanya M
Lee Chul
Lee Joyce
London Sarah E
Lovell Peter V
Makova Kateryna D
Malinsky Milan
Martin Fergal
Masterson Patrick
McCarthy Shane A
Medvedev Paul
Mello Claudio V
Meyer Axel
Misuraca Ann
Mooney Mark
Mountcastle Jacquelyn
Naylor Gavin JP
Ning Zemin
Osipova Ekaterina
Paez Sadye
Pelan Sarah
Pesout Trevor
Pippel Martin
Putnam Nicholas H
Rhie Arang
Robertson Bruce
Secomandi Simona
Selvaraj Siddarth
Simbirsky Maria
Sims Ying
Smith Michelle
Sović Ivan
Svardal Hannes
Theofanopoulou Constantina
Thibaud-Nissen Francoise
Torrance James
Tracey Alan
Turner George
Uliano-Silva Marcela
Vernes Sonja C
Wagner Maximilian
Walenz Brian P
Warren Wesley C
Wilkinson Mark
Winkler Sylke
Wood Jonathan
Zhou Yang
Publication venue: eScholarship, University of California
Publication date: 01/04/2021

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species(1-4). To address this issue, the international Genome 10K (G10K) consortium(5,6) has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences

eScholarship - University of California

A draft human pangenome reference.

Author: Abel Haley J
Abou Tayoun Ahmad N
Antonacci-Fulton Lucinda L
Asri Mobin
Baid Gunjan
Baker Carl A
Belyaeva Anastasiya
Billis Konstantinos
Buonaiuto Silvia
Carroll Andrew
Chang Pi-Chuan
Chang Xian H
Cheng Haoyu
Chu Justin
Cody Sarah
Colonna Vincenza
Cook Daniel E
Cook-Deegan Robert M
Cornejo Omar E
Diekhans Mark
Doerr Daniel
Ebert Peter
Ebler Jana
Eizenga Jordan M
Fairley Susan
Fedrigo Olivier
Felsenfeld Adam L
Feng Xiaowen
Fischer Christian
Formenti Giulio
Frankish Adam
Fulton Robert S
Gao Yan
Garg Shilpa
Garrison Nanibaa' A
Giron Carlos Garcia
Green Richard E
Groza Cristian
Guarracino Andrea
Haggerty Leanne
Harvey William T
Haukness Marina
Heumos Simon
Hickey Glenn
Hoekzema Kendra
Hourlier Thibaut
Howe Kerstin
Jain Miten
Ji Hanlee P
Kenny Eimear E
Koenig Barbara A
Kolesnikov Alexey
Korbel Jan O
Kordosky Jennifer
Koren Sergey
Lee HoJoon
Lewis Alexandra P
Liao Wen-Wei
Lu Shuangjia
Lu Tsung-Yu
Lucas Julian K
Magalhães Hugo
Marco-Sola Santiago
Marijon Pierre
Markello Charles
Martin Fergal J
McCartney Ann
McDaniel Jennifer
Mitchell Matthew W
Monlong Jean
Mountcastle Jacquelyn
Munson Katherine M
Mwaniki Moses Njagi
Nattestad Maria
Novak Adam M
Nurk Sergey
Olsen Hugh E
Olson Nathan D
Pesout Trevor
Popejoy Alice B
Porubsky David
Prins Pjotr
Puiu Daniela
Rautiainen Mikko
Regier Allison A
Rhie Arang
Sacco Samuel
Sanders Ashley D
Schneider Valerie A
Schultz Baergen I
Shafin Kishwar
Sibbesen Jonas A
Sirén Jouni
Smith Michael W
Sofia Heidi J
Thibaud-Nissen Françoise
Tomlinson Chad
Tricomi Francesca Floriana
Villani Flavia
Vollger Mitchell R
Publication venue: eScholarship, University of California
Publication date: 01/05/2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals(1). These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample

eScholarship - University of California