5 research outputs found
The Human Pangenome Project: a global resource to map genomic diversity
The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.
A draft human pangenome reference
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Lessons learned from the eMERGE Network: balancing genomics in discovery and practice
The Electronic Medical Records and Genomics (eMERGE) Network, established in 2007, is a consortium of academic and integrated health systems conducting discovery and implementation research in translational genomics. Here, we outline the history of the network, highlight major impacts and lessons learned, and present the tools and resources developed for large-scale genomic analyses and translation into a clinical setting. The network developed methods to extract phenotypes from the electronic medical record to perform genome-wide and phenome-wide association studies. Recruited cohorts were clinically sequenced on a custom panel for targeted sequencing of variants and monogenic disease risks, and the results were returned to participants to investigate the impact of return of genomic results. After generating an imputed genome-wide association study (GWAS) dataset of 105,000 participants for discovery, the network enrolled and sequenced 24,998 participants. Integration of these results into the medical record and the effects of results on participants provided key lessons to the field. These lessons inform genetic research in diverse populations and provide insights into the clinical impact of return and implementation of genomic medicine using the electronic medical record. The lessons produced by the eMERGE Network can be utilized by other consortia as translational genomic medicine research evolves.