
    Genetic signal maximization using environmental regression

    Joint analyses of correlated phenotypes are common in genetic epidemiology studies. However, these analyses primarily focus on the genetic correlation between traits and do not take environmental correlation into account. We describe a method that optimizes the genetic signal by accounting for stochastic environmental noise through joint analysis of a discrete trait and a correlated quantitative marker. We conducted bivariate analyses in which heritability and the environmental correlation between the discrete and quantitative traits were estimated using Genetic Analysis Workshop 17 (GAW17) family data. The inverse of the resulting environmental correlation was then used to determine a new β coefficient for each quantitative trait, which was constrained in a univariate model. We conducted genetic association tests on 7,087 nonsynonymous SNPs in three GAW17 family replicates for Affected status, with the β coefficient fixed for three quantitative phenotypes, and compared these to an association model in which the β coefficient was allowed to vary. Bivariate environmental correlations were 0.64 (± 0.09) for Q1, 0.798 (± 0.076) for Q2, and −0.169 (± 0.18) for Q4. Heritability of Affected status improved in each univariate model in which a constrained β coefficient was used to account for stochastic environmental effects. Neither method identified genome-wide significant associations, but we demonstrated that constraining β for covariates slightly improved the genetic signal for Affected status. This environmental regression approach increases heritability when the β coefficient for a highly correlated quantitative covariate is constrained, and it increases the genetic signal for the discrete trait.
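
    The difference between a constrained and a freely estimated covariate coefficient can be illustrated with ordinary least squares. This is a generic sketch, not the authors' environmental-regression procedure: the linear model for a 0/1 outcome, the toy data, and the function names are assumptions, and the fixed β is simply passed in rather than derived from an estimated environmental correlation.

```python
from statistics import fmean


def fit_free_beta(y, q):
    """OLS of y on an intercept and covariate q; returns (intercept, beta)."""
    my, mq = fmean(y), fmean(q)
    sxy = sum((qi - mq) * (yi - my) for qi, yi in zip(q, y))
    sxx = sum((qi - mq) ** 2 for qi in q)
    beta = sxy / sxx
    return my - beta * mq, beta


def fit_fixed_beta(y, q, beta_fixed):
    """Treat beta_fixed * q as a known offset; only the intercept is estimated."""
    # beta_fixed is supplied by the caller (hypothetically, from an earlier
    # bivariate analysis); it is not re-estimated from y here.
    offset_removed = [yi - beta_fixed * qi for yi, qi in zip(y, q)]
    return fmean(offset_removed), beta_fixed


def residuals(y, q, intercept, beta):
    """Residuals of y after removing the intercept and the covariate term."""
    return [yi - intercept - beta * qi for yi, qi in zip(y, q)]


if __name__ == "__main__":
    affected = [1.0, 0.0, 1.0, 1.0, 0.0]  # toy 0/1 outcome, treated as linear
    q1 = [2.0, 0.5, 1.8, 2.2, 0.4]        # toy correlated quantitative trait
    print("free:", fit_free_beta(affected, q1))
    print("fixed:", fit_fixed_beta(affected, q1, 0.2))
```

    The free fit always has the smaller residual sum of squares on the data at hand; the point of constraining is that β comes from a separate analysis rather than from the model being tested.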

    Are quantitative trait-dependent sampling designs cost-effective for analysis of rare and common variants?

    Using trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs, which are of increasing interest in genetic association analysis of QTs, especially in studies concerned with detecting rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but individuals with extreme phenotypes have a proportionately larger probability. We account for the differential sampling of individuals with informative trait values by inverse probability weighting using standard survey methods, so that inferences generalize to the source population. In replicate 1, we applied the designs in an association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared with using the entire cohort, implying that 50% sample selection is cost-effective and worthwhile for reducing sequencing costs.
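
    The weighting idea can be sketched in a few lines: select individuals with a probability that grows with how extreme their trait is, then weight each sequenced individual by the inverse of that probability in the association regression. The linear probability rule, the `floor` baseline, the toy genotype model, and all function names below are assumptions for illustration; the abstract's actual design and survey software are not reproduced here.

```python
import random


def selection_probs(trait, target_fraction=0.5, floor=0.2):
    """Hypothetical design: probability rises linearly with |trait - mean|.

    Every individual starts from a baseline of `floor` before rescaling, so
    all probabilities are nonzero; the expected sample size is at most
    target_fraction * n (capping at 1 can only lower it).
    """
    n = len(trait)
    mean = sum(trait) / n
    dev = [abs(t - mean) for t in trait]
    max_dev = max(dev) or 1.0
    raw = [floor + (1 - floor) * d / max_dev for d in dev]
    scale = target_fraction * n / sum(raw)
    return [min(1.0, r * scale) for r in raw]


def poisson_sample(probs, rng):
    """Select each individual independently with its own probability."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(1)
    genotype = [rng.choice([0, 0, 0, 1, 1, 2]) for _ in range(2000)]
    trait = [0.5 * g + rng.gauss(0, 1) for g in genotype]  # toy QT model
    probs = selection_probs(trait)
    chosen = poisson_sample(probs, rng)
    weights = [1 / probs[i] for i in chosen]  # inverse probability weights
    x = [genotype[i] for i in chosen]
    y = [trait[i] for i in chosen]
    print(len(chosen), "sequenced; IPW slope:", round(weighted_slope(x, y, weights), 3))
```

    Weighting by 1/p undoes the over-representation of extreme phenotypes, so the weighted slope targets the population-level association rather than the association in the enriched sample.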

    Stratify or adjust? Dealing with multiple populations when evaluating rare variants

    The unrelated individuals sample from Genetic Analysis Workshop 17 consists of a small number of subjects from eight population samples, and its genetic data are composed mostly of rare variants. We compare two simple approaches to collapsing rare variants within genes in terms of their utility for identifying genes that affect a phenotype. We also compare results from stratified analyses with those from a pooled analysis that uses ethnicity as a covariate. We found that the two collapsing approaches were similarly effective in identifying genes that contain causative variants in these data. However, including population as a covariate was not an effective substitute for analyzing the subpopulations separately when only one subpopulation contained a rare variant linked to the phenotype.

    Application of collapsing methods for continuous traits to the Genetic Analysis Workshop 17 exome sequence data

    Genetic Analysis Workshop 17 used real sequence data from the 1000 Genomes Project and simulated phenotypes influenced by a large number of rare variants. Our aim is to evaluate the performance of various collapsing methods developed for the analysis of multiple rare variants. We apply collapsing methods to continuous phenotypes Q1 and Q2 for all 200 replicates of the unrelated individuals data. Within each gene, we collapse (1) all SNPs, (2) all SNPs with minor allele frequency (MAF) < 0.05, and (3) nonsynonymous SNPs with MAF < 0.05. We consider two tests when collapsing variants: one using the proportion of variants and one using the presence or absence of any variant. We also compare our results with a single-marker analysis using PLINK. For phenotype Q1, the proportion test for collapsed rare nonsynonymous SNPs often performed best. Two genes (FLT1 and KDR) had statistically significant results, and a single-marker analysis using PLINK also gave statistically significant results for some SNPs within these two genes. For phenotype Q2, collapsing rare nonsynonymous SNPs performed best, with almost no difference between the proportion and presence tests. However, neither the collapsing methods nor the single-marker analysis gave statistically significant results at the true genes for Q2. We also found that a large number of noncausal genes were highly correlated with causal genes for Q1 and Q2, which may account for the inflated number of false positives.
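
    The three collapsing schemes and the two scores can be sketched as one function. Here the "proportion" score is assumed to mean the fraction of selected SNPs at which an individual carries at least one minor allele, and "presence" is 1 if any selected SNP carries one; the function name and the `keep` mask (standing in for a nonsynonymous annotation) are hypothetical.

```python
def collapse_gene(genotypes, mafs, maf_threshold=None, keep=None):
    """Collapse the SNPs of one gene into per-individual scores.

    genotypes: one list per individual of minor-allele counts (0/1/2) per SNP.
    mafs: minor allele frequency of each SNP.
    maf_threshold: keep SNPs with MAF below this value (None keeps all SNPs).
    keep: optional per-SNP booleans, e.g. a hypothetical nonsynonymous flag.
    Returns (proportion, presence) score lists.
    """
    selected = [
        j for j, maf in enumerate(mafs)
        if (maf_threshold is None or maf < maf_threshold)
        and (keep is None or keep[j])
    ]
    proportion, presence = [], []
    for g in genotypes:
        # Number of selected SNPs at which this individual carries a minor allele.
        carried = sum(1 for j in selected if g[j] > 0)
        proportion.append(carried / len(selected) if selected else 0.0)
        presence.append(1 if carried > 0 else 0)
    return proportion, presence


if __name__ == "__main__":
    genos = [[0, 0, 1], [1, 0, 0], [0, 0, 0], [2, 1, 0]]
    mafs = [0.01, 0.20, 0.03]
    print(collapse_gene(genos, mafs))                             # scheme (1): all SNPs
    print(collapse_gene(genos, mafs, 0.05))                       # scheme (2): MAF < 0.05
    print(collapse_gene(genos, mafs, 0.05, [True, True, False]))  # scheme (3)
```

    Either score list can then be regressed on Q1 or Q2 as a single gene-level predictor.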

    Gene-based partial least-squares approaches for detecting rare variant associations with complex traits

    Genome-wide association studies are largely based on single-nucleotide polymorphisms (SNPs) and rest on the common disease/common variant hypothesis. However, it has been argued in recent years, and is now well accepted, that rare variants are valuable for studying common diseases. Although genome-wide association studies have successfully discovered many genetic variants associated with common diseases, detecting associated rare variants remains a great challenge. Here, we propose two partial least-squares approaches that aggregate the signals of many SNPs within a gene to reveal possible genetic effects related to rare variants. Data from the 1000 Genomes Project give us the opportunity to evaluate the effectiveness of these two gene-based approaches. Compared with a SNP-based analysis, the proposed methods identified some (rare) SNPs that the SNP-based analysis missed.
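
    The abstract does not spell out its two approaches, so this sketch shows only the generic first component of PLS regression (PLS1) for the SNPs of one gene: the weight vector is proportional to the covariance of each centered SNP with the trait, and each individual's gene-level score is the projection of their genotypes onto it. The function name and toy data are assumptions.

```python
import math


def first_pls_component(X, y):
    """First PLS1 component for one gene.

    X: n rows (individuals) x p columns (SNP genotype codes); y: trait values.
    Returns (weights, scores).
    """
    n, p = len(X), len(X[0])
    # Center each SNP column and the trait.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    my = sum(y) / n
    yc = [v - my for v in y]
    # Weights proportional to X^T y: each SNP's covariance with the trait.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("no SNP covaries with the trait")
    w = [v / norm for v in w]
    # Scores: projection of each individual's centered genotypes onto w.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t


if __name__ == "__main__":
    X = [[0, 0], [1, 0], [2, 1], [0, 1]]  # toy genotypes: 4 individuals x 2 SNPs
    y = [0.0, 1.0, 3.0, 1.0]
    w, t = first_pls_component(X, y)
    print([round(v, 3) for v in w], [round(v, 3) for v in t])
```

    Among all unit-length weight vectors, this choice maximizes the covariance between the gene score and the trait, which is why rare SNPs with weak individual signals can still contribute. Further components would be extracted from a deflated X.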

    Estimating heritability using family and unrelated individuals data

    For the family data from Genetic Analysis Workshop 17, we obtained heritability estimates of quantitative traits Q1 and Q4 using the ASSOC program in the S.A.G.E. software package. ASSOC is a family-based method that estimates heritability through variance component estimation. The covariate-adjusted mean heritability was 0.650 for Q1 and 0.745 for Q4. For the unrelated individuals data, we estimated the heritability of Q1 as the proportion of total variance accounted for by all single-nucleotide polymorphisms under an additive model. We examined a novel ordinary least-squares method, a naïve restricted maximum-likelihood method, and a calibrated restricted maximum-likelihood method, applying each to all 200 replicates for Q1. The ordinary least-squares method yielded many estimates outside the interval [0, 1]. The restricted maximum-likelihood estimates were more stable than the ordinary least-squares estimates. The naïve restricted maximum-likelihood method yielded an average estimate of 0.462 ± 0.1, and the calibrated restricted maximum-likelihood method yielded an average of 0.535 ± 0.121. Our results demonstrate discrepancies between heritability estimates from the family data and those from the unrelated individuals data.
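
    The SNP-based definition of heritability for unrelated individuals can be made concrete with a genomic relationship matrix (GRM). The abstract does not describe its ordinary least-squares method, so the sketch below substitutes Haseman-Elston regression, a standard least-squares estimator that regresses products of standardized phenotypes on pairwise genomic relatedness; the REML methods are not shown. The GRM scaling by 2p(1 - p) and the function names are assumptions for illustration.

```python
def grm(genotypes):
    """Genomic relationship matrix from an n x m matrix of minor-allele counts.

    Each SNP is centered at 2p and scaled by 2p(1 - p), where p is the sample
    allele frequency; monomorphic SNPs are skipped.
    """
    n, m = len(genotypes), len(genotypes[0])
    A = [[0.0] * n for _ in range(n)]
    used = 0
    for s in range(m):
        col = [genotypes[i][s] for i in range(n)]
        p = sum(col) / (2 * n)
        var = 2 * p * (1 - p)
        if var == 0:
            continue
        used += 1
        z = [x - 2 * p for x in col]
        for j in range(n):
            for k in range(n):
                A[j][k] += z[j] * z[k] / var
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[a / used for a in row] for row in A]


def he_regression(y, A):
    """Haseman-Elston slope: regress z_j * z_k on A[j][k] over pairs j < k.

    z is the standardized phenotype. The slope estimates SNP heritability but
    nothing constrains it to [0, 1].
    """
    n = len(y)
    mean = sum(y) / n
    sd = (sum((v - mean) ** 2 for v in y) / n) ** 0.5
    z = [(v - mean) / sd for v in y]
    pairs = [(A[j][k], z[j] * z[k]) for j in range(n) for k in range(j + 1, n)]
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    return sxy / sxx


if __name__ == "__main__":
    G = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1], [1, 0, 0]]
    q1 = [1.0, 2.0, 0.5, 3.0, 1.5]
    # Toy data: the printed value only demonstrates the call, not a real estimate.
    print(round(he_regression(q1, grm(G)), 3))
```

    Because the slope is an unconstrained least-squares fit, sampling noise can push it below 0 or above 1, which is consistent with the instability the abstract reports for its ordinary least-squares estimates; REML instead maximizes a likelihood over admissible variance components.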

    Genome-wide association analysis of GAW17 data using an empirical Bayes variable selection

    Next-generation sequencing technologies enable us to explore rare functional variants. However, most current statistical techniques lack the power to capture the signals of rare variants in genome-wide association studies. We propose a supervised coalescing of single-nucleotide polymorphisms into gene-based markers that can stably reveal possible genetic effects related to rare alleles. We use a newly developed empirical Bayes variable selection algorithm to identify associations between the studied traits and genetic markers. Using our novel method, we analyzed the three continuous phenotypes in the GAW17 data set across 200 replicates, with intriguing results.

    Identification of multiple rare variants associated with a disease

    Advances in sequencing technologies have fostered efforts to identify rare variants responsible for complex disease. However, statistical methods that can handle the vast amount of data generated and interpret the complicated relationship between disease and these variants have lagged behind. We apply a zero-inflated Poisson regression model to account for the excess of zeros caused by the extremely low frequency of the 24,487 exonic variants in the Genetic Analysis Workshop 17 data. We grouped the 697 subjects in the data set into Europeans, Asians, and Africans based on principal components analysis and computed the total number of rare variants per gene for each individual. We then analyzed these collapsed variants under the assumption that rare variants are enriched in a group of people affected by a disease compared with a group of unaffected people. We also tested this hypothesis with quantitative traits Q1, Q2, and Q4. Analyses performed on the combined 697 individuals and on each ethnic group yielded different results. In the combined population analysis, we found that UGT1A1, which was not part of the simulation model, was associated with disease liability and that FLT1, a causal locus in the simulation model, was associated with Q1. Of the causal loci in the simulation models, FLT1 and KDR were associated with Q1, and VNN1 was correlated with Q2. No significant genes were associated with Q4. These results show the feasibility and capability of our new statistical model for detecting multiple rare variants that influence disease risk.
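    
    The zero-inflated Poisson (ZIP) model treats each per-gene rare-variant count as either a structural zero (probability π) or a Poisson draw with mean λ. The sketch below fits an intercept-only ZIP model by expectation-maximization; the abstract's regression additionally includes covariates such as disease status, which this simplification omits, and the choice of EM, the function names, and the toy counts are assumptions.

```python
import math


def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson with zero-inflation pi, mean lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson


def zip_loglik(counts, pi, lam):
    """Log-likelihood of the observed counts."""
    return sum(math.log(zip_pmf(c, pi, lam)) for c in counts)


def fit_zip(counts, iters=200):
    """EM estimates of (pi, lam) for an intercept-only ZIP model."""
    if not any(c > 0 for c in counts):
        raise ValueError("need at least one nonzero count")
    n = len(counts)
    pi, lam = 0.5, sum(counts) / n
    for _ in range(iters):
        # E-step: posterior probability that each observed zero is structural.
        p0 = pi / (pi + (1 - pi) * math.exp(-lam))
        tau = [p0 if c == 0 else 0.0 for c in counts]
        # M-step: mixing weight, then the Poisson mean over the non-structural
        # part (sum((1 - tau) * c) equals sum(c) because tau is 0 when c > 0).
        pi = sum(tau) / n
        lam = sum(counts) / (n - sum(tau))
    return pi, lam


if __name__ == "__main__":
    # Toy rare-variant counts for one gene across 15 individuals.
    counts = [0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 3, 0, 1, 0, 0]
    pi, lam = fit_zip(counts)
    print(round(pi, 3), round(lam, 3), round(zip_loglik(counts, pi, lam), 3))
```

    After every M-step, (1 − π)λ equals the sample mean count, so the extra parameter π absorbs only the zeros in excess of what a Poisson with that mean would produce.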

    Detecting disease rare alleles using single SNPs in families and haplotyping in unrelated subjects from the Genetic Analysis Workshop 17 data

    We evaluate the discovery power of two association tests that work well with common alleles when they are applied to the Genetic Analysis Workshop 17 simulations, which include rare causative single-nucleotide polymorphisms (SNPs) (minor allele frequency [MAF] < 1%). For discovery, we applied genome-wide single-SNP association tests based on a linear mixed-effects model to the family sample; for replication, we applied sliding-window haplotype association tests within causative genes in the unrelated individuals sample. Both methods are evaluated with respect to the simulated trait Q2. The linear mixed-effects model and haplotype association tests failed to detect the rare alleles of the simulated associations. In contrast, both tests detected effects for the most important simulated SNPs with MAF > 1%. We conclude that these findings reflect inadequate statistical power (the result of small simulated samples) for the complex genetic model that underlies these data.

    A method to detect single-nucleotide polymorphisms accounting for a linkage signal using covariate-based affected relative pair linkage analysis

    We evaluate an approach to detecting single-nucleotide polymorphisms (SNPs) that account for a linkage signal, using covariate-based affected relative pair linkage analysis in a conditional-logistic model framework, in all 200 replicates of the Genetic Analysis Workshop 17 family data set. We begin by combining the multiple known covariate values into a single variable, a propensity score. We also use each SNP as a covariate, with additive coding based on the number of minor alleles. We evaluate the distribution of the difference between LOD scores with the propensity score covariate only and LOD scores with both the propensity score covariate and a SNP covariate. Including causal SNPs in causal genes increases LOD scores more than including noncausal SNPs, whether within or outside causal genes. We compare the results from this method with results from a family-based association analysis and conclude that a SNP-covariate-based affected relative pair linkage approach can identify SNPs that account for the linkage signals from genes.