332 research outputs found

    Clustering by genetic ancestry using genome-wide SNP data

    Background: Population stratification can cause spurious associations in a genome-wide association study (GWAS), and occurs when differences in allele frequencies of single nucleotide polymorphisms (SNPs) are due to ancestral differences between cases and controls rather than the trait of interest. Principal components analysis (PCA) is the established approach to detect population substructure using genome-wide data and to adjust the genetic association for stratification by including the top principal components in the analysis. An alternative solution is genetic matching of cases and controls that requires, however, well defined population strata for appropriate selection of cases and controls.
    Results: We developed a novel algorithm to cluster individuals into groups with similar ancestral backgrounds based on the principal components computed by PCA. We demonstrate the effectiveness of our algorithm in real and simulated data, and show that matching cases and controls using the clusters assigned by the algorithm substantially reduces population stratification bias. Through simulation we show that the power of our method is higher than adjustment for PCs in certain situations.
    Conclusions: In addition to reducing population stratification bias and improving power, matching creates a clean dataset free of population stratification which can then be used to build prediction models without including variables to adjust for ancestry. The cluster assignments also allow for the estimation of genetic heterogeneity by examining cluster specific effects.
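
To illustrate the general idea of clustering individuals on principal components, here is a minimal sketch on simulated genotypes. It does not reproduce the authors' algorithm, which the abstract does not specify: plain k-means is swapped in as a stand-in. The function names (`top_principal_components`, `kmeans`), the two simulated populations, and all parameter values are assumptions made for this example.

```python
import numpy as np

def top_principal_components(genotypes, n_components=2):
    """Return per-individual scores on the top principal components."""
    # Center and scale each SNP column; constant columns keep scale 1.
    centered = genotypes - genotypes.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0
    standardized = centered / scale
    # SVD of the standardized matrix gives the PCA scores as U * S.
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means (a stand-in, not the paper's method)."""
    # Deterministic start: pick points at evenly spaced ranks along PC1.
    order = np.argsort(points[:, 0])
    picks = order[((np.arange(k) + 0.5) * len(points) / k).astype(int)]
    centers = points[picks].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Simulate two hypothetical populations with diverged allele frequencies.
rng = np.random.default_rng(42)
n_per_pop, n_snps = 100, 500
freq_a = rng.uniform(0.1, 0.9, size=n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=n_snps), 0.05, 0.95)
genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)
truth = np.repeat([0, 1], n_per_pop)

scores = top_principal_components(genotypes, n_components=2)
labels = kmeans(scores, k=2)
# Cluster labels are arbitrary, so compare up to a label swap.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
```

With cluster labels in hand, cases and controls could then be matched within each cluster, which is the use the abstract describes.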

    Pleiotropy Analysis of Quantitative Traits at Gene Level by Multivariate Functional Linear Models

    In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F‐distribution tests based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F‐distribution tests provide much more significant results than those of F‐tests of univariate analysis and optimal sequence kernel association test (SKAT‐O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F‐distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F‐distribution tests provide much more significant results than those of F‐tests of univariate analysis and SKAT‐O for the three biochemical traits. The approximate F‐distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT‐O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT‐O in the univariate case.
    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/111259/1/gepi21895.pd
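
The three statistics named in the abstract can be computed from the hypothesis and error sums-of-squares matrices of a multivariate regression. The sketch below does this for an ordinary multivariate linear model, not the functional linear models the paper proposes, and it stops short of the approximate F transformations. The function names, the simulated data, and the effect sizes are all assumptions made for this example.

```python
import numpy as np

def residual_sscp(design, traits):
    """Residual sums-of-squares-and-cross-products matrix of a least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, traits, rcond=None)
    resid = traits - design @ coef
    return resid.T @ resid

def multivariate_statistics(covariates, variants, traits):
    """Pillai-Bartlett trace, Hotelling-Lawley trace and Wilks's Lambda
    for the null hypothesis that no variant affects any trait."""
    n = traits.shape[0]
    reduced = np.hstack([np.ones((n, 1)), covariates])  # intercept + covariates
    full = np.hstack([reduced, variants])                # plus genetic variants
    error = residual_sscp(full, traits)                  # E
    hypothesis = residual_sscp(reduced, traits) - error  # H
    total = hypothesis + error
    return {
        "pillai": np.trace(hypothesis @ np.linalg.inv(total)),
        "hotelling_lawley": np.trace(hypothesis @ np.linalg.inv(error)),
        "wilks": np.linalg.det(error) / np.linalg.det(total),
    }

# Hypothetical data: 300 individuals, 1 covariate, 5 variants, 3 traits.
rng = np.random.default_rng(7)
n = 300
covariates = rng.normal(size=(n, 1))
variants = rng.binomial(2, 0.3, size=(n, 5)).astype(float)
# Only the first variant has an (assumed) effect, shared across all traits.
traits = (
    0.5 * covariates
    + variants[:, [0]] * np.array([0.4, 0.3, 0.2])
    + rng.normal(size=(n, 3))
)

stats = multivariate_statistics(covariates, variants, traits)
```

Smaller values of Wilks's Lambda and larger values of the two traces indicate stronger evidence of association; turning them into p-values requires the approximate F transformations, which are omitted here.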

    RNA Editing Genes Associated with Extreme Old Age in Humans and with Lifespan in C. elegans

    The strong familiality of living to extreme ages suggests that human longevity is genetically regulated. The majority of genes found thus far to be associated with longevity primarily function in lipoprotein metabolism and insulin/IGF-1 signaling. There are likely many more genetic modifiers of human longevity that remain to be discovered. Here, we first show that 18 single nucleotide polymorphisms (SNPs) in the RNA editing genes ADARB1 and ADARB2 are associated with extreme old age in a U.S.-based study of centenarians, the New England Centenarian Study. We describe replications of these findings in three independently conducted centenarian studies with different genetic backgrounds (Italian, Ashkenazi Jewish and Japanese) that collectively support an association of ADARB1 and ADARB2 with longevity. Some SNPs in ADARB2 replicate consistently in the four populations and suggest a strong effect that is independent of the different genetic backgrounds and environments. To evaluate the functional association of these genes with lifespan, we demonstrate that inactivation of their orthologues adr-1 and adr-2 in C. elegans reduces median survival by 50%. We further demonstrate that inactivation of the argonaute gene, rde-1, a critical regulator of RNA interference, completely restores lifespan to normal levels in the context of adr-1 and adr-2 loss of function. Our results suggest that RNA editors may be important regulators of aging in humans and that, when evaluated in C. elegans, this pathway may interact with the RNA interference machinery to regulate lifespan.

    Genetic Signatures of Exceptional Longevity in Humans

    Like most complex phenotypes, exceptional longevity is thought to reflect a combined influence of environmental (e.g., lifestyle choices, where we live) and genetic factors. To explore the genetic contribution, we undertook a genome-wide association study of exceptional longevity in 801 centenarians (median age at death 104 years) and 914 genetically matched healthy controls. Using these data, we built a genetic model that includes 281 single nucleotide polymorphisms (SNPs) and discriminated between cases and controls of the discovery set with 89% sensitivity and specificity, and with 58% specificity and 60% sensitivity in an independent cohort of 341 controls and 253 genetically matched nonagenarians and centenarians (median age 100 years). Consistent with the hypothesis that the genetic contribution is largest at the oldest ages, the sensitivity of the model in the independent cohort increased with age (71% to classify subjects with an age at death > 102 and 85% to classify subjects with an age at death > 105). For further validation, we applied the model to an additional, unmatched 60 centenarians (median age 107 years) resulting in 78% sensitivity, and 2863 unmatched controls with 61% specificity. The 281 SNPs include the SNP rs2075650 in TOMM40/APOE that reached irrefutable genome-wide significance (posterior probability of association = 1) and replicated in the independent cohort. Removal of this SNP from the model reduced the accuracy by only 1%. Further in-silico analysis suggests that 90% of centenarians can be grouped into clusters characterized by different “genetic signatures” of varying predictive values for exceptional longevity. The correlation between 3 signatures and 3 different life spans was replicated in the combined replication sets. The different signatures may help dissect this complex phenotype into sub-phenotypes of exceptional longevity.