31 research outputs found

    Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those variants typically explain only a small fraction of phenotypic variance. Other factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles for GEIs in complex traits. One of the main challenges in these association studies is controlling for the effects of population structure, which may cause spurious associations. Many studies have analyzed how population structure influences the statistics of genetic variants and have developed statistical approaches to correct for it. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for population structure in GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs, and we use both simulation and two GWAS datasets to support this finding. We propose a statistical approach based on mixed models to account for population structure in GEI statistics. We find that our approach effectively controls for population structure in the statistics for GEIs as well as for genetic variants.
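The general idea of testing a GEI term under a mixed model can be sketched as a generalized least squares fit in which a kinship matrix K models genome-wide relatedness. This is a minimal sketch, not the paper's exact procedure: it assumes the variance ratio delta = sigma_e^2 / sigma_g^2 is already known (in practice it is estimated, for example by REML), and the function name and simulated data are hypothetical.

```python
import numpy as np

def gei_mixed_model_test(y, g, e, K, delta):
    """Hypothetical helper: GLS test of the g*e term in
    y = b0 + b_g*g + b_e*e + b_ge*g*e + u + eps, with u ~ N(0, s_g^2 K),
    eps ~ N(0, s_e^2 I), and delta = s_e^2 / s_g^2 assumed known."""
    n = len(y)
    X = np.column_stack([np.ones(n), g, e, g * e])
    # Eigendecomposition K = U diag(s) U^T; rotating by U^T makes Var(y) diagonal.
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)          # guard against tiny negative eigenvalues
    w = 1.0 / np.sqrt(s + delta)       # whitening weights
    Xr = (U.T @ X) * w[:, None]
    yr = (U.T @ y) * w
    beta, _, _, _ = np.linalg.lstsq(Xr, yr, rcond=None)
    resid = yr - Xr @ beta
    sigma_g2 = resid @ resid / (n - X.shape[1])
    cov_beta = sigma_g2 * np.linalg.inv(Xr.T @ Xr)
    return beta[3], beta[3] / np.sqrt(cov_beta[3, 3])

# Simulated example (hypothetical data): polygenic background u has covariance K.
rng = np.random.default_rng(0)
n, m = 500, 1000
G = rng.binomial(2, 0.5, size=(n, m)).astype(float)
Z = (G - G.mean(axis=0)) / G.std(axis=0)
K = Z @ Z.T / m
u = Z @ rng.normal(0.0, np.sqrt(1.0 / m), m)
g = rng.binomial(2, 0.5, n).astype(float)
e = rng.binomial(1, 0.5, n).astype(float)
y = 1.0 * g * e + u + rng.normal(0.0, 1.0, n)   # true GEI effect = 1.0
b_ge, z_ge = gei_mixed_model_test(y, g, e, K, delta=1.0)
print(round(float(b_ge), 3), round(float(z_ge), 2))
```

Rotating by the eigenvectors of K is a common way to make mixed-model tests cheap once K has been decomposed, since the decomposition can be reused across many tested variants.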

    A Community Resource Benchmarking Predictions of Peptide Binding to MHC-I Molecules

    Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide-binding preference, which can be captured in prediction algorithms, allowing entire pathogen proteomes to be rapidly scanned for peptides likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use these data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively used in our groups. In general, the neural network outperforms the matrix-based predictions, mainly because of its ability to generalize even from a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparing newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are less familiar with than their own. The overall goal of this effort is to provide a transparent prediction evaluation that allows bioinformaticians to identify promising features of prediction methods and provides guidance to immunologists regarding the reliability of prediction tools.
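Benchmarks of this kind typically score how well predicted affinities separate measured binders from non-binders. The sketch below computes an area under the ROC curve (AUC) in its Mann-Whitney form. The 500 nM IC50 cutoff for calling a peptide a binder and the function name are assumptions for illustration, not a description of the benchmark's exact protocol.

```python
import numpy as np

def benchmark_auc(predicted_ic50, measured_ic50, threshold_nm=500.0):
    """AUC for separating measured binders from non-binders.

    Assumptions: a peptide counts as a binder when its measured IC50 is
    below threshold_nm, and a lower predicted IC50 means stronger binding.
    """
    pred = np.asarray(predicted_ic50, dtype=float)
    measured = np.asarray(measured_ic50, dtype=float)
    is_binder = measured < threshold_nm
    pos = -pred[is_binder]    # negate so a higher score means stronger predicted binding
    neg = -pred[~is_binder]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one binder and one non-binder")
    # Probability that a random binder outscores a random non-binder; ties count 1/2.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

measured = [10.0, 50.0, 1000.0, 5000.0]   # hypothetical measurements (nM)
print(benchmark_auc([20.0, 100.0, 800.0, 9000.0], measured))   # 1.0
print(benchmark_auc([9000.0, 800.0, 100.0, 20.0], measured))   # 0.0
```

Because the AUC depends only on the ranking of predictions, it allows methods that output scores on different scales to be compared side by side.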

    Genetic Control of Obesity and Gut Microbiota Composition in Response to High-Fat, High-Sucrose Diet in Mice

    Obesity is a highly heritable disease driven by complex interactions between genetic and environmental factors. Human genome-wide association studies (GWAS) have identified a number of loci contributing to obesity; however, a major limitation of these studies is their inability to assess environmental interactions common to obesity. Using a systems genetics approach, we measured obesity traits, global gene expression, and gut microbiota composition in response to a high-fat/high-sucrose (HF/HS) diet in more than 100 inbred strains of mice. Here we show that HF/HS feeding promotes robust, strain-specific changes in obesity that are not accounted for by food intake, and we provide evidence for a genetically determined set-point for obesity. GWAS analysis identified 11 genome-wide significant loci associated with obesity traits, several of which overlap with loci identified in human studies. We also show strong relationships between genotype and gut microbiota plasticity during HF/HS feeding and identify gut microbial phylotypes associated with obesity.

    Mouse Genome-Wide Association and Systems Genetics Identify Asxl2 As a Regulator of Bone Mineral Density and Osteoclastogenesis

    Significant advances have been made in the discovery of genes affecting bone mineral density (BMD); however, our understanding of its genetic basis remains incomplete. In the current study, genome-wide association (GWA) and co-expression network analysis were used in the recently described Hybrid Mouse Diversity Panel (HMDP) to identify and functionally characterize novel BMD genes. In the HMDP, a GWA of total body, spinal, and femoral BMD revealed four significant associations (−log10 P > 5.39) affecting at least one BMD trait on chromosomes (Chrs.) 7, 11, 12, and 17. The associations implicated a total of 163 genes, with each association harboring between 14 and 112 genes. This list was reduced to 26 functional candidates by identifying the genes that were regulated by local eQTL in bone or harbored potentially functional non-synonymous (NS) SNPs. This analysis revealed that the most significant BMD SNP on Chr. 12 was an NS SNP in the additional sex combs like-2 (Asxl2) gene that was predicted to be functional. The involvement of Asxl2 in the regulation of bone mass was confirmed by the observation that Asxl2 knockout mice had reduced BMD. To begin to unravel the mechanism through which Asxl2 influences BMD, a gene co-expression network was created using cortical bone gene expression microarray data from the HMDP strains. Asxl2 was identified as a member of a co-expression module enriched for genes involved in the differentiation of myeloid cells. In bone, osteoclasts are bone-resorbing cells of myeloid origin, suggesting that Asxl2 may play a role in osteoclast differentiation. In agreement, knockdown of Asxl2 in bone marrow macrophages impaired their ability to form osteoclasts. This study identifies a new regulator of BMD and osteoclastogenesis and highlights the power of GWA and systems genetics in the mouse for dissecting complex genetic traits.
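The significance threshold and the candidate-gene filter described above can be written out directly. In this sketch the record layout, the field names, and every gene except Asxl2 are hypothetical; only the −log10 P > 5.39 cutoff and the "local eQTL in bone or functional NS SNP" rule come from the abstract.

```python
GWA_LOG10_THRESHOLD = 5.39
P_THRESHOLD = 10 ** -GWA_LOG10_THRESHOLD   # about 4.07e-6

def is_significant(p_value):
    """True when -log10(p) exceeds the genome-wide threshold, i.e. p < P_THRESHOLD."""
    return p_value < P_THRESHOLD

def functional_candidates(genes):
    """Keep genes regulated by a local eQTL in bone or carrying a
    potentially functional non-synonymous SNP (hypothetical record fields)."""
    return [g["name"] for g in genes
            if g["local_eqtl_bone"] or g["functional_ns_snp"]]

# Hypothetical records for genes under one association peak.
genes = [
    {"name": "Asxl2", "local_eqtl_bone": False, "functional_ns_snp": True},
    {"name": "GeneX", "local_eqtl_bone": False, "functional_ns_snp": False},
    {"name": "GeneY", "local_eqtl_bone": True, "functional_ns_snp": False},
]
print(functional_candidates(genes))   # ['Asxl2', 'GeneY']
```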

    Efficient Design and Analysis of Genome-wide Association Studies

    Recent advances in genomic technologies have made it possible to collect large-scale information on genetic variation across a diverse biological landscape. This has produced an exponential influx of genetic information, and the field of genetics has become data-rich in a relatively short time. These developments have opened new avenues for elucidating the genetic basis of complex diseases, where traditional disease study approaches had little success. In recent years, the genome-wide association study (GWAS) approach has gained widespread popularity for its ease of use and effectiveness, and it is now the standard approach to studying complex diseases. In a GWAS, information on millions of single-nucleotide polymorphisms (SNPs) is collected from case and control individuals. SNP genotyping is cost-effective, and because SNPs are abundant in the genome, they are correlated with neighboring genetic variation, which makes them tags for genomic regions. Typically, each SNP is statistically tested for association with disease, and the genomic regions tagged by the significant SNPs are believed to harbor the functional variants contributing to disease. To reduce the cost of a GWAS and the redundancy in the information collected, an informative subset of the SNPs, the tag SNPs, is genotyped. The genomic regions harboring significantly associated tag SNPs may be large and contain many additional polymorphisms, and at this stage of the study it may not be clear which specific genes or polymorphisms are most strongly associated with disease. We present a novel framework for designing cost-effective follow-up association studies that further characterize such regions by genotyping additional SNPs to identify all the associated polymorphisms. Identifying all associated polymorphisms provides a catalog of all possible functional variants, and the values of the association statistics at these polymorphisms may help identify causal variants. We demonstrate the utility of our method in identifying significant associations and causal variants using simulated and real GWAS datasets.

    Although GWAS have been widely used to study associations of SNPs with disease phenotypes, there has been growing interest in applying the GWAS approach to high-throughput biological phenotypes such as gene expression. In these studies, the goal is to identify genomic regions that affect gene expression levels, known as expression quantitative trait loci (eQTL). A challenge in applying GWAS to eQTL studies is that each sample has tens of thousands of measurements, each representing the expression level of one gene, as opposed to values for one or two clinical traits. This imposes a tremendous computational burden: the analysis requires billions of tests and substantial computational resources. We present a novel two-stage approach that efficiently identifies all of the significant associations without testing all the SNPs. In the first stage, a small number of informative SNPs across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and tests additional SNPs only in those regions. We demonstrate that this method increases the computational speed of eQTL studies by a factor of ten and can be applied to reduce the computational burden of a wide range of association statistics.

    Finally, we develop a novel approach to a problem that has been of fundamental interest to geneticists for decades. The contribution of genetics to a trait, termed heritability, is often measured as the fraction of variation in the trait that is due to genetics. Heritability quantifies the role of genetics in a trait and provides insight into disease etiology. Traditionally, heritability was estimated in studies of individuals with known relatedness, such as classical twin studies. Recently, estimating the heritability of a trait from unrelated individuals using GWAS data, and further partitioning the heritability into the contributions of genomic regions, has received much attention. Existing methods partition the heritability by jointly estimating the contributions of all regions; however, these methods become computationally intractable and may be inaccurate when the number of regions is large. In this work, we present an alternative approach that partitions the total heritability into the contributions of an arbitrary number of regions while performing these computations in parallel. We demonstrate that our method is more accurate and computationally efficient than existing approaches.
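To make "estimating heritability from unrelated individuals using GWAS data" concrete, the sketch below uses Haseman–Elston regression, a classical moment-based estimator swapped in here purely for illustration; it is not the partitioning method described in the abstract. For a standardized phenotype, E[y_i y_j] is approximately h² K_ij, where K is a genetic relationship matrix built from standardized genotypes, so the slope of y_i y_j on K_ij over distinct pairs estimates h². The simulated data are hypothetical.

```python
import numpy as np

def he_regression_h2(genotypes, y):
    """Haseman-Elston estimate of SNP heritability (illustrative sketch)."""
    G = np.asarray(genotypes, dtype=float)
    G = G[:, G.std(axis=0) > 0]                 # drop monomorphic SNPs
    Z = (G - G.mean(axis=0)) / G.std(axis=0)    # standardize each SNP
    n, m = Z.shape
    K = Z @ Z.T / m                             # genetic relationship matrix
    ys = (y - y.mean()) / y.std()
    iu = np.triu_indices(n, k=1)                # distinct pairs i < j
    # Regress phenotype cross-products on relatedness; the slope estimates h2.
    slope, _ = np.polyfit(K[iu], np.outer(ys, ys)[iu], 1)
    return slope

# Hypothetical simulation: 200 causal SNPs jointly explaining h2 = 0.5.
rng = np.random.default_rng(2)
n, m, h2 = 1000, 200, 0.5
freqs = rng.uniform(0.1, 0.5, m)
G = rng.binomial(2, freqs, size=(n, m))
Zs = (G - G.mean(axis=0)) / G.std(axis=0)
y = Zs @ rng.normal(0.0, np.sqrt(h2 / m), m) + rng.normal(0.0, np.sqrt(1 - h2), n)
h2_hat = he_regression_h2(G, y)
print(round(float(h2_hat), 2))
```

A per-region estimate could in principle reuse the same regression with a relationship matrix built only from that region's SNPs, which is one reason region-by-region computations parallelize naturally; how the dissertation's method handles overlap between regions is not stated here.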

    Efficiently Identifying Significant Associations in Genome-wide Association Studies

    Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been used to identify regions of the genome that harbor variation affecting gene expression, known as expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, eQTL studies measure tens of thousands of gene expression levels and apply the GWAS approach to each one. This leads to billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the single nucleotide polymorphisms (SNPs). In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and tests additional SNPs only in those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases computational speed by a factor of 10. Additionally, an efficient implementation of our software increases computational speed relative to state-of-the-art testing approaches by a factor of 75.
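The control flow of a proxy-then-region scan can be sketched as follows. This is a simplified illustration under stated assumptions: the region definitions, the choice of the first SNP in each region as its proxy, and both thresholds (t_proxy, t_final) are hypothetical, and the sketch does not reproduce how the published procedure chooses its first-stage threshold so that significant SNPs are unlikely to be missed.

```python
import numpy as np

def t_stat(x, y):
    """t statistic of a single-SNP linear regression of y on x."""
    r = np.corrcoef(x, y)[0, 1]
    n = len(y)
    return r * np.sqrt((n - 2) / (1.0 - r * r))

def two_stage_scan(X, y, regions, proxies, t_proxy=2.0, t_final=5.0):
    """Stage 1 tests one proxy per region; stage 2 tests every SNP, but only
    in regions whose proxy reached |t| >= t_proxy. Thresholds are hypothetical."""
    hits, n_tests = {}, 0
    for rid, snps in regions.items():
        n_tests += 1
        if abs(t_stat(X[:, proxies[rid]], y)) < t_proxy:
            continue                      # no signal at the proxy: skip the region
        for j in snps:
            n_tests += 1
            t = t_stat(X[:, j], y)
            if abs(t) >= t_final:
                hits[j] = t
    return hits, n_tests

# Hypothetical data: 100 regions of 10 correlated SNPs; SNP 35 is causal.
rng = np.random.default_rng(1)
n, n_regions, size = 500, 100, 10
latent = rng.normal(size=(n, n_regions))
X = np.repeat(latent, size, axis=1) + 0.5 * rng.normal(size=(n, n_regions * size))
regions = {r: list(range(r * size, (r + 1) * size)) for r in range(n_regions)}
proxies = {r: r * size for r in range(n_regions)}
y = 0.5 * X[:, 35] + rng.normal(size=n)
hits, n_tests = two_stage_scan(X, y, regions, proxies)
print(35 in hits, n_tests, "tests instead of", X.shape[1])
```

The savings come from the correlation among SNPs in a region: when the proxy shows no association, the other SNPs it tags are unlikely to reach the final threshold, so most regions are dismissed after a single test.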

    Identifying Causal Variants at Loci with Multiple Signals of Association

    Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants responsible for association at these loci have been identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate the probability that each variant is causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, an assumption that does not hold at many risk loci. In this work, we propose a new statistical framework that allows for an arbitrary number of causal variants when estimating the posterior probability that a variant is causal. A direct benefit of our approach is that, under reasonable assumptions, it predicts for each locus a set of variants that contains all of the true causal variants with a high confidence level (e.g., 95%), even when the locus contains multiple causal variants. Using simulations, we show that at loci harboring multiple causal variants our approach improves the ability to identify the causal variants by 20–50% compared with existing methods. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/
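A small, brute-force sketch of the idea of a confidence set under multiple causal variants is shown below. It is not the CAVIAR software. It assumes the commonly used model in which the vector of association z-scores, given a causal configuration c, is multivariate normal with covariance Sigma + Sigma D_c Sigma (Sigma is the LD matrix, D_c places variance sigma2 on causal SNPs), with an independent prior gamma per causal SNP. The values sigma2 = 25, gamma = 0.01, the cap of two causal SNPs, and the greedy set construction are all simplifying assumptions.

```python
import itertools
import numpy as np

def log_mvn(z, cov):
    """Log density of N(0, cov) evaluated at z."""
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (len(z) * np.log(2 * np.pi) + logdet + quad)

def causal_set(z, ld, max_causal=2, sigma2=25.0, gamma=0.01, rho=0.95):
    """Enumerate configurations with up to max_causal causal SNPs, compute
    posteriors, then greedily grow a set whose posterior probability of
    containing every causal SNP reaches rho (assumed parameters)."""
    m = len(z)
    configs, log_post = [], []
    for k in range(max_causal + 1):
        for c in itertools.combinations(range(m), k):
            d = np.zeros(m)
            d[list(c)] = sigma2
            cov = ld + ld @ np.diag(d) @ ld
            prior = k * np.log(gamma) + (m - k) * np.log(1 - gamma)
            configs.append(set(c))
            log_post.append(log_mvn(z, cov) + prior)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Marginal posterior probability that each SNP is causal.
    pip = np.array([sum(p for c, p in zip(configs, post) if j in c) for j in range(m)])
    chosen = []
    for j in np.argsort(-pip):
        chosen.append(int(j))
        coverage = sum(p for c, p in zip(configs, post) if c <= set(chosen))
        if coverage >= rho:
            break
    return pip, chosen

# Hypothetical locus: SNP 0 is strongly associated; SNP 1 is in LD with it.
ld = np.array([[1.0, 0.3, 0.0, 0.0],
               [0.3, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.2],
               [0.0, 0.0, 0.2, 1.0]])
z = np.array([6.0, 1.8, 0.3, 0.1])
pip, chosen = causal_set(z, ld)
print(np.round(pip, 3), chosen)
```

Exhaustive enumeration grows combinatorially with the number of SNPs and allowed causal variants, which is why a cap on the number of causal SNPs per locus is a common practical restriction.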

    An association mapping framework to account for potential sex difference in genetic architectures

    Over the past few years, genome-wide association studies have identified many trait-associated loci that have different effects in females and males, drawing increased attention to differences in genetic architecture between the sexes. Between-sex differences in genetic architecture can cause a variety of phenomena, such as differences in effect sizes at trait-associated loci, differences in the magnitudes of polygenic background effects, and differences in phenotypic variances. However, current association testing approaches for dealing with sex, such as including sex as a covariate, cannot fully account for these phenomena and can have suboptimal statistical power. We present a novel association mapping framework, MetaSex, that comprehensively accounts for genetic architecture differences between the sexes. Through simulations and applications to real data, we show that our framework outperforms previous approaches in association mapping.
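One standard way to see why a single covariate for sex can miss sex-specific effects is to analyze each sex separately and then combine or contrast the estimates. The sketch below shows a fixed-effect inverse-variance meta-analysis and a z-test for a difference in effect sizes. It is a generic illustration, not the MetaSex method, and the input estimates are hypothetical.

```python
import math

def sex_stratified_summary(b_f, se_f, b_m, se_m):
    """Combine and contrast sex-stratified effect estimates (generic sketch)."""
    # Fixed-effect inverse-variance meta-analysis of female and male estimates.
    w_f, w_m = 1.0 / se_f ** 2, 1.0 / se_m ** 2
    b_meta = (w_f * b_f + w_m * b_m) / (w_f + w_m)
    se_meta = math.sqrt(1.0 / (w_f + w_m))
    z_meta = b_meta / se_meta
    # z statistic for a difference in effect size between the sexes.
    z_diff = (b_f - b_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    return b_meta, z_meta, z_diff

# Hypothetical estimates: same effect in both sexes vs. opposite-direction effects.
print(sex_stratified_summary(0.2, 0.1, 0.2, 0.1))
print(sex_stratified_summary(0.3, 0.1, -0.1, 0.1))
```

In the second call the pooled effect is small while the difference test is strong, the kind of locus where a model with one shared effect size plus a sex covariate can lose power.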