21 research outputs found

    Nine out of ten samples were mistakenly switched by The Orang-utan Genome Consortium

    Get PDF
    The Sumatran orang-utan (Pongo abelii) reference genome was first published in 2011, in conjunction with ten re-sequenced genomes from unrelated wild-caught individuals. Together, these published data have been utilized in almost all great ape genomic studies, plus in much broader comparative genomic research. Here, we report that the original sequencing Consortium inadvertently switched nine of the ten samples and/or resulting re-sequenced genomes, erroneously attributing eight of these to the wrong source individuals. Among them is a genome from the recently identified Tapanuli (P. tapanuliensis) species: thus, this genome was sequenced and published a full six years prior to the species\u27 description. Sex was wrongly assigned to five known individuals; the numbers in one sample identifier were swapped; and the identifier for another sample most closely resembles that of a sample from another individual entirely. These errors have been reproduced in countless subsequent manuscripts, with noted implications for studies reliant on data from known individuals

    A draft human pangenome reference

    Get PDF
    Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individual

    From karyotypes to precision genomics in 9p deletion and duplication syndromes

    Get PDF
    While 9p deletion and duplication syndromes have been studied for several years, small sample sizes and minimal high-resolution data have limited a comprehensive delineation of genotypic and phenotypic characteristics. In this study, we examined genetic data from 719 individuals in the worldwide 9p Network Cohort: a cohort seven to nine times larger than any previous study of 9p. Most breakpoints occur in bands 9p22 and 9p24, accounting for 35% and 38% of all breakpoints, respectively. Bands 9p11 and 9p12 have the fewest breakpoints, with each accounting for 0.6% of all breakpoints. The most common phenotype in 9p deletion and duplication syndromes is developmental delay, and we identified eight known neurodevelopmental disorder genes in 9p22 and 9p24. Since it has been previously reported that some individuals have a secondary structural variant related to the 9p variant, we examined our cohort for these variants and found 97 events. The top secondary variant involved 9q in 14 individuals (1.9%), including ring chromosomes and inversions. We identified a gender bias with significant enrichment for females (p = 0.0006) that may arise from a sex reversal in some individuals with 9p deletions. Genes on 9p were characterized regarding function, constraint metrics, and protein-protein interactions, resulting in a prioritized set of genes for further study. Finally, we achieved precision genomics in one child with a complex 9p structural variation using modern genomic technologies, demonstrating that long-read sequencing will be integral for some cases. Our study is the largest ever on 9p-related syndromes and provides key insights into genetic factors involved in these syndromes

    Chromosome Xq23 Is Associated with Lower Atherogenic Lipid Concentrations and Favorable Cardiometabolic Indices

    Get PDF
    Autosomal genetic analyses of blood lipids have yielded key insights for coronary heart disease (CHD). However, X chromosome genetic variation is understudied for blood lipids in large sample sizes. We now analyze genetic and blood lipid data in a high-coverage whole X chromosome sequencing study of 65,322 multi-ancestry participants and perform replication among 456,893 European participants. Common alleles on chromosome Xq23 are strongly associated with reduced total cholesterol, LDL cholesterol, and triglycerides (min P = 8.5 × 10−72), with similar effects for males and females. Chromosome Xq23 lipid-lowering alleles are associated with reduced odds for CHD among 42,545 cases and 591,247 controls (P = 1.7 × 10−4), and reduced odds for diabetes mellitus type 2 among 54,095 cases and 573,885 controls (P = 1.4 × 10−5). Although we observe an association with increased BMI, waist-to-hip ratio adjusted for BMI is reduced, bioimpedance analyses indicate increased gluteofemoral fat, and abdominal MRI analyses indicate reduced visceral adiposity. Co-localization analyses strongly correlate increased CHRDL1 gene expression, particularly in adipose tissue, with reduced concentrations of blood lipids

    The Human Pangenome Project: a global resource to map genomic diversity

    No full text
    The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine

    A draft human pangenome reference

    Get PDF
    Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals1. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample

    Human whole-exome genotype data for Alzheimer’s disease

    Get PDF
    The heterogeneity of the whole-exome sequencing (WES) data generation methods present a challenge to a joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer’s Disease Sequencing Project. The joint-genotype called variant-called format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. 96.82% of the variants are high-quality, and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic and 1.8% of the variants are with CADD &gt; 30, indicating they are of high predicted pathogenicity. Here we show our new strategy can generate high-quality data from processing these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.</p
    corecore