
    Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

    A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved
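    The validation step above compares estimated genotypes against validation genotypes. A minimal sketch of one such comparison follows, assuming genotypes are encoded as alternate-allele counts (0, 1, 2) and that missing calls are `None`. The function name and the encoding are illustrative assumptions; the paper's exact discordance definition may differ.

```python
from typing import Optional, Sequence


def genotype_discordance(
    called: Sequence[Optional[int]],
    validation: Sequence[Optional[int]],
) -> float:
    """Fraction of compared sites where the called genotype differs
    from the validation genotype.

    Genotypes are alternate-allele counts (0, 1 or 2); this encoding is an
    assumption of the sketch. Sites missing (None) in either input are skipped.
    """
    if len(called) != len(validation):
        raise ValueError("genotype vectors must have the same length")

    compared = 0
    mismatches = 0
    for c, v in zip(called, validation):
        if c is None or v is None:
            continue  # skip sites without a call on both sides
        compared += 1
        if c != v:
            mismatches += 1

    if compared == 0:
        raise ValueError("no sites with genotypes in both inputs")
    return mismatches / compared
```

    A lower value means the estimated genotypes agree more often with the validation set at the sites that could be compared.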

    MoGUL: Detecting Common Insertions and Deletions in a Population

    While the discovery of structural variants in the human population is ongoing, most methods for this task assume that the genome is sequenced to high coverage (e.g. 40x), and use the combined power of the many sequenced reads and mate pairs to identify the variants. In contrast, the 1000 Genomes Project hopes to sequence hundreds of human genotypes, but at low coverage (4-6x), and most of the current methods are unable to discover insertion/deletion and structural variants from this data. In order to identify indels from multiple low-coverage individuals we have developed the MoGUL (Mixture of Genotypes Variant Locator) framework, which identifies potential locations with indels by examining mate pairs generated from all sequenced individuals simultaneously, uses a Bayesian network with appropriate priors to explicitly model each individual as homozygous or heterozygous for each locus, and computes the expected Minor Allele Frequency (MAF) for all predicted variants. We have used MoGUL to identify variants in 1000 Genomes data, as well as in simulated genotypes, and show good accuracy at predicting indels, especially for MAF > 0.06 and indel size > 20 base pairs.
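    The expected MAF mentioned above can be illustrated with a small sketch. It assumes that each diploid individual already has posterior probabilities for the three genotype states at a locus; the function names and input layout are hypothetical, and this is not MoGUL's Bayesian network, only the final averaging step under that assumption.

```python
from typing import Sequence, Tuple

# Hypothetical per-individual posterior at one locus:
# (P(homozygous reference), P(heterozygous), P(homozygous variant)).
Posterior = Tuple[float, float, float]


def expected_allele_frequency(posteriors: Sequence[Posterior]) -> float:
    """Expected frequency of the variant allele across diploid individuals."""
    if not posteriors:
        raise ValueError("need at least one individual")

    expected_copies = 0.0
    for p_ref, p_het, p_hom in posteriors:
        if abs(p_ref + p_het + p_hom - 1.0) > 1e-6:
            raise ValueError("each posterior must sum to 1")
        # A heterozygote carries one variant copy, a variant homozygote two.
        expected_copies += p_het + 2.0 * p_hom

    # Each diploid individual contributes two chromosomes.
    return expected_copies / (2 * len(posteriors))


def expected_maf(posteriors: Sequence[Posterior]) -> float:
    """Expected minor allele frequency: the smaller of the two allele frequencies."""
    freq = expected_allele_frequency(posteriors)
    return min(freq, 1.0 - freq)
```

    With four individuals whose genotypes are certain (two reference homozygotes, one heterozygote, one variant homozygote), the expected variant copies are 3 out of 8 chromosomes, giving an expected MAF of 0.375.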

    A map of human genome variation from population-scale sequencing

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    HNRNPC haploinsufficiency affects alternative splicing of intellectual disability-associated genes and causes a neurodevelopmental disorder

    Heterogeneous nuclear ribonucleoprotein C (HNRNPC) is an essential, ubiquitously abundant protein involved in mRNA processing. Genetic variants in other members of the HNRNP family have been associated with neurodevelopmental disorders. Here, we describe 13 individuals with global developmental delay, intellectual disability, behavioral abnormalities, and subtle facial dysmorphology with heterozygous HNRNPC germline variants. Five of them bear an identical in-frame deletion of nine amino acids in the extreme C terminus. To study the effect of this recurrent variant as well as HNRNPC haploinsufficiency, we used induced pluripotent stem cells (iPSCs) and fibroblasts obtained from affected individuals. While protein localization and oligomerization were unaffected by the recurrent C-terminal deletion variant, total HNRNPC levels were decreased. Previously, reduced HNRNPC levels have been associated with changes in alternative splicing. Therefore, we performed a meta-analysis on published RNA-seq datasets of three different cell lines to identify a ubiquitous HNRNPC-dependent signature of alternatively spliced exons. The identified signature was not only confirmed in fibroblasts obtained from an affected individual but also showed a significant enrichment for genes associated with intellectual disability. Hence, we assessed the effect of decreased and increased levels of HNRNPC on neuronal arborization and neuronal migration and found that either condition affects neuronal function. Taken together, our data indicate that HNRNPC haploinsufficiency affects alternative splicing of multiple intellectual disability-associated genes and that the developing brain is sensitive to aberrant levels of HNRNPC. Hence, our data strongly support the inclusion of HNRNPC in the family of HNRNP-related neurodevelopmental disorders.