23 research outputs found

    A draft human pangenome reference

    Get PDF
    Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individual

    Chromosome Xq23 Is Associated with Lower Atherogenic Lipid Concentrations and Favorable Cardiometabolic Indices

    Get PDF
    Autosomal genetic analyses of blood lipids have yielded key insights for coronary heart disease (CHD). However, X chromosome genetic variation is understudied for blood lipids in large sample sizes. We now analyze genetic and blood lipid data in a high-coverage whole X chromosome sequencing study of 65,322 multi-ancestry participants and perform replication among 456,893 European participants. Common alleles on chromosome Xq23 are strongly associated with reduced total cholesterol, LDL cholesterol, and triglycerides (min P = 8.5 × 10−72), with similar effects for males and females. Chromosome Xq23 lipid-lowering alleles are associated with reduced odds for CHD among 42,545 cases and 591,247 controls (P = 1.7 × 10−4), and reduced odds for diabetes mellitus type 2 among 54,095 cases and 573,885 controls (P = 1.4 × 10−5). Although we observe an association with increased BMI, waist-to-hip ratio adjusted for BMI is reduced, bioimpedance analyses indicate increased gluteofemoral fat, and abdominal MRI analyses indicate reduced visceral adiposity. Co-localization analyses strongly correlate increased CHRDL1 gene expression, particularly in adipose tissue, with reduced concentrations of blood lipids

    A draft human pangenome reference

    Get PDF
    Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample

    Characterization of missing human genome sequences and copy-number polymorphic insertions

    No full text
    The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 new insertion sequences corresponding to 720 genomic loci. We found that a substantial fraction of these sequences are either missing, fragmented or misassigned when compared to recent de novo sequence assemblies from short-read next-generation sequence data. We determined that 18-37% of these new insertions are copy-number polymorphic, including loci that show extensive population stratification among Europeans, Asians and Africans. Complete sequencing of 156 of these insertions identified new exons and conserved noncoding sequences not yet represented in the reference genome. We developed a method to accurately genotype these new insertions by mapping next-generation sequencing datasets to the breakpoint, thereby providing a means to characterize copy-number status for regions previously inaccessible to single-nucleotide polymorphism microarrays

    Clinical and molecular delineation of the 17q21.31 microdeletion syndrome.

    No full text
    BACKGROUND: The chromosome 17q21.31 microdeletion syndrome is a novel genomic disorder that has originally been identified using high resolution genome analyses in patients with unexplained mental retardation. AIM: We report the molecular and/or clinical characterisation of 22 individuals with the 17q21.31 microdeletion syndrome. RESULTS: We estimate the prevalence of the syndrome to be 1 in 16,000 and show that it is highly underdiagnosed. Extensive clinical examination reveals that developmental delay, hypotonia, facial dysmorphisms including a long face, a tubular or pear-shaped nose and a bulbous nasal tip, and a friendly/amiable behaviour are the most characteristic features. Other clinically important features include epilepsy, heart defects and kidney/urologic anomalies. Using high resolution oligonucleotide arrays we narrow the 17q21.31 critical region to a 424 kb genomic segment (chr17: 41046729-41470954, hg17) encompassing at least six genes, among which is the gene encoding microtubule associated protein tau (MAPT). Mutation screening of MAPT in 122 individuals with a phenotype suggestive of 17q21.31 deletion carriers, but who do not carry the recurrent deletion, failed to identify any disease associated variants. In five deletion carriers we identify a <500 bp rearrangement hotspot at the proximal breakpoint contained within an L2 LINE motif and show that in every case examined the parent originating the deletion carries a common 900 kb 17q21.31 inversion polymorphism, indicating that this inversion is a necessary factor for deletion to occur (p<10(-5)). CONCLUSION: Our data establish the 17q21.31 microdeletion syndrome as a clinically and molecularly well recognisable genomic disorder

    The Human Pangenome Project: a global resource to map genomic diversity

    No full text
    The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine

    A draft human pangenome reference

    Get PDF
    Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals1. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample

    Human whole-exome genotype data for Alzheimer’s disease

    Get PDF
    The heterogeneity of the whole-exome sequencing (WES) data generation methods present a challenge to a joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer’s Disease Sequencing Project. The joint-genotype called variant-called format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. 96.82% of the variants are high-quality, and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic and 1.8% of the variants are with CADD &gt; 30, indicating they are of high predicted pathogenicity. Here we show our new strategy can generate high-quality data from processing these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.</p
    corecore