82 research outputs found

    A robust clustering algorithm for identifying problematic samples in genome-wide association studies

    Summary: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple but robust statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.
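The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea of flagging samples whose per-sample summary statistic is atypical. It swaps in a simple median and MAD (median absolute deviation) rule, which is not the authors' method. The function name `flag_outliers`, the cutoff of 3.5, and the example heterozygosity values are all hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Flag values far from the median, measured in robust (MAD-based) units.

    Assumption: a plain median/MAD rule stands in for the paper's algorithm,
    which the abstract does not describe in detail.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all: nothing can be called atypical by this rule.
        return [False] * len(values)
    # 1.4826 makes the MAD comparable to a standard deviation under normality.
    scale = 1.4826 * mad
    return [abs(v - med) / scale > threshold for v in values]


# Hypothetical per-sample heterozygosity values; the last one is atypical.
heterozygosity = [0.30, 0.31, 0.29, 0.30, 0.32, 0.31, 0.45]
flags = flag_outliers(heterozygosity)
```

Because the median and MAD are barely affected by a few extreme samples, the outliers do not distort the yardstick used to detect them, which is the property the word "robust" refers to here.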

    A second major histocompatibility complex susceptibility locus for multiple sclerosis

    Objective: Variation in the major histocompatibility complex (MHC) on chromosome 6p21 is known to influence susceptibility to multiple sclerosis with the strongest effect originating from the HLA-DRB1 gene in the class II region. The possibility that other genes in the MHC independently influence susceptibility to multiple sclerosis has been suggested but remains unconfirmed. Methods: Using a combination of microsatellite, single nucleotide polymorphism, and human leukocyte antigen (HLA) typing, we screened the MHC in trio families looking for evidence of residual association above and beyond that attributable to the established DRB1*1501 risk haplotype. We then refined this analysis by extending the genotyping of classical HLA loci into independent cases and control subjects. Results: Screening confirmed the presence of residual association and suggested that this was maximal in the region of the HLA-C gene. Extending analysis of the classical loci confirmed that this residual association is partly due to allelic heterogeneity at the HLA-DRB1 locus, but also reflects an independent effect from the HLA-C gene. Specifically, the HLA-C*05 allele, or a variant in tight linkage disequilibrium with it, appears to exert a protective effect (p = 3.3 × 10⁻⁵). Interpretation: Variation in the HLA-C gene influences susceptibility to multiple sclerosis independently of any effect attributable to the nearby HLA-DRB1 gene.

    The complex genetics of multiple sclerosis: pitfalls and prospects

    The genetics of complex disease is entering a new and exciting era. The exponentially growing knowledge and technological capabilities emerging from the human genome project have finally reached the point where relevant genes can be readily and affordably identified. As a result, the last 12 months have seen a virtual explosion in new knowledge, with reports of unequivocal association to relevant genes appearing almost weekly. The impact of these new discoveries in neuroscience is incalculable at this stage but potentially revolutionary. In this review, an attempt is made to illuminate some of the mysteries surrounding complex genetics. Although focused almost exclusively on multiple sclerosis, all the points made are essentially generic and apply equally well, with relatively minor addenda, to any other complex trait, neurological or otherwise.

    A better coefficient of determination for genetic profile analysis

    Genome-wide association studies have facilitated the construction of risk predictors for disease from multiple single nucleotide polymorphism markers. The ability of such "genetic profiles" to predict outcome is usually quantified in an independent data set. Coefficients of determination (R²) have been a useful measure to quantify the goodness-of-fit of the genetic profile. Various pseudo-R² measures for binary responses have been proposed. However, there is no standard or consensus measure because the concept of residual variance is not easily defined on the observed probability scale. Unlike other nongenetic predictors such as environmental exposure, there is prior information on genetic predictors because for most traits there are estimates of the proportion of variation in risk in the population due to all genetic factors, the heritability. It is this useful ability to benchmark that makes the choice of a measure of goodness-of-fit in genetic profiling different from that of nongenetic predictors. In this study, we use a liability threshold model to establish the relationship between the observed probability scale and underlying liability scale in measuring R² for binary responses. We show that currently used R² measures are difficult to interpret, biased by ascertainment, and not comparable to heritability. We suggest a novel and globally standard measure of R² that is interpretable on the liability scale. Furthermore, even when using ascertained case-control studies that are typical in human disease studies, we can obtain an R² measure on the liability scale that can be compared directly to heritability. Genet. Epidemiol. 36:214-224, 2012. © 2012 Wiley Periodicals, Inc.
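The abstract gives no formulas, so the sketch below only illustrates the kind of scale change a liability threshold model implies. It uses the classical transformation for a population sample without ascertainment, R²(liability) = R²(observed) × K(1 − K) / z², where K is the disease prevalence and z is the standard normal density at the threshold that cuts off the upper K of the distribution. This is not necessarily the measure the paper proposes, and it does not include the case-control ascertainment correction the abstract mentions. The function name and the example numbers are hypothetical.

```python
from statistics import NormalDist


def liability_scale_r2(r2_observed, prevalence):
    """Convert an observed-scale R² to the liability scale.

    Assumptions: a random population sample (no case-control ascertainment)
    and the classical threshold-model transformation; this is a sketch, not
    the measure proposed in the paper.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    # Liability threshold above which an individual is affected.
    threshold = standard_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    z = standard_normal.pdf(threshold)
    return r2_observed * prevalence * (1.0 - prevalence) / z ** 2


# Hypothetical example: observed-scale R² of 0.02 for a trait with 1% prevalence.
r2_liability = liability_scale_r2(0.02, 0.01)
```

Expressing R² on the liability scale is what allows it to be set against heritability, which is itself usually quoted on that scale.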

    Long-range DNA looping and gene expression analyses identify DEXI as an autoimmune disease candidate gene

    The chromosome 16p13 region has been associated with several autoimmune diseases, including type 1 diabetes (T1D) and multiple sclerosis (MS). CLEC16A has been reported as the most likely candidate gene in the region, since it contains the most disease-associated single-nucleotide polymorphisms (SNPs), as well as an immunoreceptor tyrosine-based activation motif. However, here we report that intron 19 of CLEC16A, containing the most autoimmune disease-associated SNPs, appears to behave as a regulatory sequence, affecting the expression of a neighbouring gene, DEXI. The CLEC16A alleles that are protective from T1D and MS are associated with increased expression of DEXI, and no other genes in the region, in two independent monocyte gene expression data sets. Critically, using chromosome conformation capture (3C), we identified physical proximity between the DEXI promoter region and intron 19 of CLEC16A, separated by a loop of >150 kb. In reciprocal experiments, a 20 kb fragment of intron 19 of CLEC16A, containing SNPs associated with T1D and MS, as well as with DEXI expression, interacted with the promoter region of DEXI but not with candidate DNA fragments containing other potential causal genes in the region, including CLEC16A. Intron 19 of CLEC16A is highly enriched for transcription-factor-binding events and markers associated with enhancer activity. Taken together, these data indicate that although the causal variants in the 16p13 region lie within CLEC16A, DEXI is an unappreciated autoimmune disease candidate gene, and illustrate the power of the 3C approach in progressing from genome-wide association study results to candidate causal genes.