
    Perspectives on Human Genetic Variation from the HapMap Project

    The completion of the International HapMap Project marks the start of a new phase in human genetics. The aim of the project was to provide a resource that facilitates the design of efficient genome-wide association studies by characterising patterns of genetic variation and linkage disequilibrium in a sample of 270 individuals across four geographical populations. In total, over one million SNPs have been typed across these genomes, providing an unprecedented view of human genetic diversity. In this review, we focus on what the HapMap project has taught us about the structure of human genetic variation and the fundamental molecular and evolutionary processes that shape it.

    Copy Number Variants and Common Disorders: Filling the Gaps and Exploring Complexity in Genome-Wide Association Studies

    Genome-wide association scans (GWASs) using single nucleotide polymorphisms (SNPs) have been completed successfully for several common disorders and have detected over 30 new associations. Considering the large sample sizes and genome-wide SNP coverage of the scans, one might have expected many of the common variants underpinning the genetic component of various disorders to have been identified by now. However, these studies have not evaluated the contribution of other forms of genetic variation, such as structural variation, mainly in the form of copy number variants (CNVs). Known CNVs account for over 15% of the assembled human genome sequence. Since CNVs are not easily tagged by SNPs, might have a wide range of copy number variability, and often fall in genomic regions not well covered by whole-genome arrays or not genotyped by the HapMap project, current GWASs have largely missed the contribution of CNVs to complex disorders. In fact, some CNVs have already been reported to show association with several complex disorders using candidate gene/region approaches, underscoring the importance of regions not investigated in current GWASs. This reveals the need for new-generation arrays (some already on the market) and the use of tailored approaches to explore the full dimension of genome variability beyond the single nucleotide scale.

    Positive Selection of a Pre-Expansion CAG Repeat of the Human SCA2 Gene

    A region of approximately one megabase of human Chromosome 12 shows extensive linkage disequilibrium in Utah residents with ancestry from northern and western Europe. This strikingly large linkage disequilibrium block was analyzed with statistical and experimental methods to determine whether natural selection could be implicated in shaping the current genome structure. Extended Haplotype Homozygosity and Relative Extended Haplotype Homozygosity analyses on this region mapped a core region of the strongest conserved haplotype to exon 1 of the Spinocerebellar ataxia type 2 gene (SCA2). Direct DNA sequencing of this region of the SCA2 gene revealed a significant association between a pre-expanded allele [(CAG)(8)CAA(CAG)(4)CAA(CAG)(8)] of CAG repeats within exon 1 and the selected haplotype of the SCA2 gene. A significantly negative Tajima's D value (−2.20, p < 0.01) at this site likewise suggested selection on the CAG repeat. This region was also investigated in the three other populations, none of which showed signs of selection. These results suggest that recent positive selection of the pre-expansion SCA2 CAG repeat has occurred in Utah residents with European ancestry.
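
    Tajima's D compares two estimates of the population mutation parameter: the mean number of pairwise differences between sequences (π) and Watterson's estimate derived from the number of segregating sites (S). An excess of rare variants, as expected after a recent selective sweep, pulls π below S/a1 and makes D negative. The sketch below implements the standard Tajima (1989) formula for a set of aligned haplotype sequences; it is an illustration under that assumption, not the software or data used in the study, and the function name `tajimas_d` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length, aligned haplotype sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    # S: number of segregating (polymorphic) alignment columns
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences between sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Normalising constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Four singletons on one background: an excess of rare variants gives D < 0
print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]))
```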

    Latent class analysis variable selection

    We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
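
    A headlong search differs from a greedy search in that it accepts the first variable that clears a threshold rather than scanning for the best one, which keeps each step cheap. The sketch below shows only the search loop, assuming a hypothetical callback `score_gain(var, selected)` that returns evidence (for example a BIC difference between the two competing models) that `var` adds clustering information; the thresholds `upper` and `lower` are likewise assumptions, and fitting the latent class models themselves is not shown.

```python
def headlong_search(candidates, score_gain, upper=0.0, lower=0.0, max_iter=100):
    """Headlong forward/backward selection of clustering variables.

    score_gain(var, selected) is a user-supplied (hypothetical) callback
    returning evidence, e.g. a BIC difference, that `var` carries clustering
    information beyond the variables in `selected`.
    """
    selected = []
    for _ in range(max_iter):
        changed = False
        # Inclusion step: accept the FIRST candidate whose gain exceeds `upper`,
        # rather than scanning all candidates for the best one.
        for var in candidates:
            if var not in selected and score_gain(var, selected) > upper:
                selected.append(var)
                changed = True
                break
        # Exclusion step: drop the FIRST selected variable whose gain,
        # given the other selected variables, falls below `lower`.
        for var in list(selected):
            others = [v for v in selected if v != var]
            if score_gain(var, others) < lower:
                selected.remove(var)
                changed = True
                break
        if not changed:
            break
    return selected


# Toy scoring rule: only variables "a" and "c" carry cluster information.
useful = {"a", "c"}
print(headlong_search(["a", "b", "c", "d"],
                      lambda var, sel: 5.0 if var in useful else -5.0))
# ['a', 'c']
```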

    The Influence of Recombination on Human Genetic Diversity

    In humans, the rate of recombination, as measured on the megabase scale, is positively associated with the level of genetic variation, as measured at the genic scale. Despite considerable debate, it is not clear whether these factors are causally linked or, if they are, whether this is driven by the repeated action of adaptive evolution or molecular processes such as double-strand break formation and mismatch repair. We introduce three innovations to the analysis of recombination and diversity: fine-scale genetic maps estimated from genotype experiments that identify recombination hotspots at the kilobase scale, analysis of an entire human chromosome, and the use of wavelet techniques to identify correlations acting at different scales. We show that recombination influences genetic diversity only at the level of recombination hotspots. Hotspots are also associated with local increases in GC content and the relative frequency of GC-increasing mutations but have no effect on substitution rates. Broad-scale association between recombination and diversity is explained through covariance of both factors with base composition. To our knowledge, these results are the first evidence of a direct and local influence of recombination hotspots on genetic variation and the fate of individual mutations. However, the finding that hotspots have no influence on substitution rates suggests that they are too ephemeral on an evolutionary time scale to have a strong influence on broader-scale patterns of base composition and long-term molecular evolution.

    Identification of Two Independent Risk Factors for Lupus within the MHC in United Kingdom Families

    The association of the major histocompatibility complex (MHC) with SLE is well established, yet the causal variants arising from this region remain to be identified, largely due to inadequate study design and the strong linkage disequilibrium demonstrated by genes across this locus. The majority of studies thus far have identified strong association with classical class II alleles, in particular HLA-DRB1*0301 and HLA-DRB1*1501. Additional associations have been reported with class III alleles; specifically, complement C4 null alleles and a tumor necrosis factor promoter SNP (TNF-308G/A). However, the relative effects of these class II and class III variants have not been determined. We have thus used a family-based approach to map association signals across the MHC class II and class III regions in a cohort of 314 complete United Kingdom Caucasian SLE trios by typing tagging SNPs together with classical typing of the HLA-DRB1 locus. Using TDT and conditional regression analyses, we have demonstrated the presence of two distinct and independent association signals in SLE: HLA-DRB1*0301 (nominal p = 4.9 × 10−8, permuted p < 0.0001, OR = 2.3) and the T allele of SNP rs419788 (nominal p = 4.3 × 10−8, permuted p < 0.0001, OR = 2.0) in intron 6 of the class III region gene SKIV2L. Assessment of genotypic risk demonstrates a likely dominant model of inheritance for HLA-DRB1*0301, while rs419788-T confers susceptibility in an additive manner. Furthermore, by comparing transmitted and untransmitted parental chromosomes, we have delimited our class II signal to a 180 kb region encompassing the alleles HLA-DRB1*0301-HLA-DQA1*0501-HLA-DQB1*0201 alone. Our class III signal importantly excludes independent association at the TNF promoter polymorphism, TNF-308G/A, in our SLE cohort and provides a potentially novel locus for future genetic and functional studies.
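
    The transmission disequilibrium test (TDT) uses only heterozygous parents: if b of them transmit the test allele to the affected child and c transmit the other allele, the statistic (b − c)² / (b + c) is approximately chi-square with one degree of freedom under no linkage or association, and b / c estimates the transmission odds ratio. The sketch below computes these quantities; the function name `tdt` and the counts in the usage line are hypothetical, and the study's conditional regression and permutation steps are not reproduced.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one biallelic marker.

    transmitted:   heterozygous parents who transmitted the test allele
    untransmitted: heterozygous parents who transmitted the other allele
    Returns (chi-square statistic, 1-df p-value, transmission odds ratio).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = b / c if c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical counts, not from the study
print(tdt(60, 40))  # chi2 = 4.0, p ~ 0.0455, OR = 1.5
```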

    SAQC: SNP Array Quality Control

    Background: Genome-wide single-nucleotide polymorphism (SNP) arrays containing hundreds of thousands of SNPs from the human genome have proven useful for studying important human genome questions. Data quality of SNP arrays plays a key role in the accuracy and precision of downstream data analyses. However, good indices for assessing data quality of SNP arrays have not yet been developed. Results: We developed new quality indices to measure the quality of SNP arrays and/or DNA samples and investigated their statistical properties. The indices quantify the departure of estimated individual-level allele frequencies (AFs) from expected frequencies via standardized distances. The proposed quality indices followed lognormal distributions in several large genomic studies that we empirically evaluated. AF reference data and quality index reference data for different SNP array platforms were established based on samples from various reference populations. Furthermore, a confidence interval method based on the underlying empirical distributions of quality indices was developed to identify poor-quality SNP arrays and/or DNA samples. Analyses of authentic biological data and simulated data show that this new method is sensitive and specific for the detection of poor-quality SNP arrays and/or DNA samples. Conclusions: This study introduces new quality indices, establishes references for AFs and quality indices, and develops a detection method for poor-quality SNP arrays and/or DNA samples. We have developed a new computer program, called SNP Array Quality Control (SAQC), that implements these methods. SAQC software is written in R and R-GUI and was developed as a user-friendly tool for the visualization and evaluation of data quality of genome-wide SNP arrays. The program is available online (http://www.stat.sinica.edu.tw/hsinchou/genetics/quality/SAQC.htm).
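
    If a quality index is lognormally distributed, its logarithm is normal, so a one-sided limit can be set from the mean and standard deviation of the log-transformed reference indices and back-transformed. The sketch below shows that idea only; it assumes that larger index values mean worse quality, the function names are hypothetical, and it is not the SAQC R implementation or its exact confidence interval procedure.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev


def lognormal_upper_limit(reference_indices, level=0.99):
    """One-sided upper limit for a quality index assumed to be lognormal."""
    logs = [log(q) for q in reference_indices]
    mu, sigma = mean(logs), stdev(logs)
    z = NormalDist().inv_cdf(level)
    return exp(mu + z * sigma)


def flag_poor_quality(sample_indices, reference_indices, level=0.99):
    """Flag samples whose index exceeds the reference upper limit.

    Assumes larger index values mean a larger departure from expected
    allele frequencies, i.e. worse quality.
    """
    limit = lognormal_upper_limit(reference_indices, level)
    return {name: q > limit for name, q in sample_indices.items()}


# Hypothetical reference indices and samples
reference = [1.0, 1.1, 0.9, 1.2, 0.95, 1.05]
print(flag_poor_quality({"s1": 1.0, "s2": 3.0}, reference))
# {'s1': False, 's2': True}
```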

    Linkage Disequilibrium in Wild Mice

    Crosses between laboratory strains of mice provide a powerful way of detecting quantitative trait loci for complex traits related to human disease. Hundreds of these loci have been detected, but only a small number of the underlying causative genes have been identified. The main difficulty is the extensive linkage disequilibrium (LD) in intercross progeny and the slow process of fine-scale mapping by traditional methods. Recently, new approaches have been introduced, such as association studies with inbred lines and multigenerational crosses. These approaches are very useful for interval reduction, but generally do not provide single-gene resolution because of strong LD extending over one to several megabases. Here, we investigate the genetic structure of a natural population of mice in Arizona to determine its suitability for fine-scale LD mapping and association studies. There are three main findings: (1) Arizona mice have a high level of genetic variation, which includes a large fraction of the sequence variation present in classical strains of laboratory mice; (2) they show clear evidence of local inbreeding but appear to lack stable population structure across the study area; and (3) LD decays with distance at a rate similar to human populations, which is considerably more rapid than in laboratory populations of mice. Strong associations in Arizona mice are limited primarily to markers less than 100 kb apart, which provides the possibility of fine-scale association mapping at the level of one or a few genes. Although other considerations, such as sample size requirements and marker discovery, are serious issues in the implementation of association studies, the genetic variation and LD results indicate that wild mice could provide a useful tool for identifying genes that cause variation in complex traits.
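
    Statements about how quickly LD decays with distance rest on pairwise LD measures such as D, |D′| and r² between two markers. The sketch below computes them from phased two-locus haplotypes coded 0/1; the function name and input format are assumptions for illustration, and estimating haplotypes from unphased genotypes is not shown.

```python
def ld_stats(haplotypes):
    """D, |D'| and r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
    with alleles coded 0/1, one tuple per chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # freq of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n          # freq of allele 1 at locus 2
    p_ab = sum(h[0] & h[1] for h in haplotypes) / n  # freq of the 1-1 haplotype
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalisation)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    abs_d_prime = abs(d) / d_max
    r2 = d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, abs_d_prime, r2


print(ld_stats([(0, 0), (1, 1)] * 5))  # (0.25, 1.0, 1.0): complete LD
```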

    Haplotype-based quantitative trait mapping using a clustering algorithm

    BACKGROUND: With the availability of large-scale, high-density single-nucleotide polymorphism (SNP) markers, substantial effort has been made in identifying disease-causing genes using linkage disequilibrium (LD) mapping by haplotype analysis of unrelated individuals. In addition to complex diseases, many continuously distributed quantitative traits are of primary clinical and health significance. However, the development of association mapping methods using unrelated individuals for quantitative traits has received relatively less attention. RESULTS: We recently developed an association mapping method for complex diseases by mining the sharing of haplotype segments (i.e., phased genotype pairs) in affected individuals that are rarely present in normal individuals. In this paper, we extend our previous work to address the problem of quantitative trait mapping from unrelated individuals. The method is non-parametric in nature, and statistical significance can be obtained by a permutation test. It can also be incorporated into the one-way ANCOVA (analysis of covariance) framework so that other factors and covariates can be easily included. The effectiveness of the approach is demonstrated by extensive experimental studies using both simulated and real data sets. The results show that our haplotype-based approach is more robust than two statistical methods based on single markers: a single SNP association test (SSA) and the Mann-Whitney U-test (MWU). The algorithm has been incorporated into our existing software package called HapMiner, which is available from our website at . CONCLUSION: For QTL (quantitative trait loci) fine mapping, to identify QTNs (quantitative trait nucleotides) with realistic effects (the contribution of each QTN less than 10% of total variance of the trait), large sample sizes (≥ 500) are needed for all the methods. The overall performance of HapMiner is better than that of the other two methods. Its effectiveness further depends on other factors such as recombination rates and the density of typed SNPs. Haplotype-based methods might provide higher power than methods based on a single SNP when using tag SNPs selected from a small number of samples or some other sources (such as HapMap data). Rank-based statistics usually have much lower power, as shown in our study.
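
    A permutation test obtains significance without distributional assumptions: shuffle the genotype labels relative to the trait many times and count how often the shuffled statistic is at least as extreme as the observed one. The sketch below applies this to a simple difference in mean trait value between carriers and non-carriers of a haplotype; that statistic is a stand-in chosen for illustration, not HapMiner's clustering-based score, and the function name and data are hypothetical.

```python
import random
from statistics import mean


def permutation_pvalue(trait, carrier, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in mean trait value
    between carriers and non-carriers of a haplotype (or allele)."""
    if all(carrier) or not any(carrier):
        raise ValueError("need both carriers and non-carriers")

    def mean_diff(labels):
        in_group = [t for t, c in zip(trait, labels) if c]
        out_group = [t for t, c in zip(trait, labels) if not c]
        return abs(mean(in_group) - mean(out_group))

    rng = random.Random(seed)
    observed = mean_diff(carrier)
    labels = list(carrier)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any trait-genotype link, keep group sizes
        if mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical trait values: carriers clearly higher than non-carriers
trait = [10, 11, 12, 13, 14, 1, 2, 3, 4, 5]
carrier = [True] * 5 + [False] * 5
print(permutation_pvalue(trait, carrier, n_perm=2000))
```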

    The promoter polymorphism -232C/G of the PCK1 gene is associated with type 2 diabetes in a UK-resident South Asian population

    Get PDF
    Background: The PCK1 gene, encoding cytosolic phosphoenolpyruvate carboxykinase (PEPCK-C), has previously been implicated as a candidate gene for type 2 diabetes (T2D) susceptibility. Rodent models demonstrate that over-expression of Pck1 can result in T2D development and a single nucleotide polymorphism (SNP) in the promoter region of human PCK1 (-232C/G) has exhibited significant association with the disease in several cohorts. Within the UK-resident South Asian population, T2D is 4 to 6 times more common than in indigenous white Caucasians. Despite this, few studies have reported on the genetic susceptibility to T2D in this ethnic group and none of these has investigated the possible effect of PCK1 variants. We therefore aimed to investigate the association between common variants of the PCK1 gene and T2D in a UK-resident South Asian population of Punjabi ancestry, originating predominantly from the Mirpur area of Azad Kashmir, Pakistan. Methods: We used TaqMan assays to genotype five tagSNPs covering the PCK1 gene, including the -232C/G variant, in 903 subjects with T2D and 471 normoglycaemic controls. Results: Of the variants studied, only the minor allele (G) of the -232C/G SNP demonstrated a significant association with T2D, displaying an OR of 1.21 (95% CI: 1.03 - 1.42, p = 0.019). Conclusion: This study is the first to investigate the association between variants of the PCK1 gene and T2D in South Asians. Our results suggest that the -232C/G promoter polymorphism confers susceptibility to T2D in this ethnic group. Trial registration: UKADS Trial Registration: ISRCTN38297969
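
    An odds ratio with a 95% confidence interval, such as the one reported for the -232C/G minor allele, can be computed from a 2 × 2 table of allele counts using the Woolf (log-scale) method. The sketch below assumes an allelic test on raw counts; the abstract does not state whether the study used an allelic or genotypic model or adjusted for covariates, and the counts in the usage line are hypothetical, not the study's data.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, z=1.959964):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Inputs are allele counts (two per genotyped individual); z = 1.959964
    gives an approximate 95% interval.
    """
    or_ = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) from the four cell counts
    se_log_or = sqrt(1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major)
    lower = exp(log(or_) - z * se_log_or)
    upper = exp(log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical allele counts, not the study's data
print(allelic_odds_ratio(100, 100, 50, 100))  # OR = 2.0, CI roughly 1.29 to 3.10
```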