Clustering by genetic ancestry using genome-wide SNP data
Background: Population stratification can cause spurious associations in a genome-wide association study (GWAS), and occurs when differences in allele frequencies of single nucleotide polymorphisms (SNPs) are due to ancestral differences between cases and controls rather than the trait of interest. Principal components analysis (PCA) is the established approach to detect population substructure using genome-wide data and to adjust the genetic association for stratification by including the top principal components in the analysis. An alternative solution is genetic matching of cases and controls, which, however, requires well-defined population strata for appropriate selection of cases and controls. Results: We developed a novel algorithm to cluster individuals into groups with similar ancestral backgrounds based on the principal components computed by PCA. We demonstrate the effectiveness of our algorithm in real and simulated data, and show that matching cases and controls using the clusters assigned by the algorithm substantially reduces population stratification bias. Through simulation we show that the power of our method is higher than adjustment for PCs in certain situations. Conclusions: In addition to reducing population stratification bias and improving power, matching creates a clean dataset free of population stratification which can then be used to build prediction models without including variables to adjust for ancestry. The cluster assignments also allow for the estimation of genetic heterogeneity by examining cluster-specific effects.
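The matching step described in this abstract can be illustrated with a minimal sketch. It assumes that ancestry cluster labels have already been assigned (for example, by clustering on principal components) and simply pairs each case with controls drawn from the same cluster. The function name `match_within_clusters`, the tuple layout, and the random selection of controls are illustrative assumptions, not the authors' published algorithm.

```python
import random
from collections import defaultdict


def match_within_clusters(samples, controls_per_case=1, seed=0):
    """Match cases to controls that share the same ancestry cluster.

    samples: list of (sample_id, cluster_label, is_case) tuples (hypothetical layout).
    Returns a list of (case_id, [control_ids]) pairs. Cases whose cluster has
    too few unused controls are left unmatched and dropped.
    """
    rng = random.Random(seed)
    cases = defaultdict(list)
    controls = defaultdict(list)
    for sample_id, cluster, is_case in samples:
        (cases if is_case else controls)[cluster].append(sample_id)

    matched = []
    for cluster, case_ids in cases.items():
        # Copy and shuffle the control pool so selection within a cluster is random.
        pool = list(controls.get(cluster, []))
        rng.shuffle(pool)
        for case_id in case_ids:
            if len(pool) < controls_per_case:
                break  # not enough controls left in this cluster
            picked = [pool.pop() for _ in range(controls_per_case)]
            matched.append((case_id, picked))
    return matched
```

Because every matched set comes from a single cluster, a downstream association test on the matched data compares cases and controls of similar ancestry, which is the intuition behind the reduction in stratification bias reported above.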
Imputation of missing genotypes: an empirical evaluation of IMPUTE
Background: Imputation of missing genotypes is becoming a very popular solution for synchronizing genotype data collected with different microarray platforms, but the effects of ethnic background, subject ascertainment, and amount of missing data on the accuracy of imputation are not well understood. Results: We evaluated the accuracy of the program IMPUTE to generate the genotype data of partially or fully untyped single nucleotide polymorphisms (SNPs). The program uses a model-based approach to imputation that reconstructs the genotype distribution given a set of referent haplotypes and the observed data, and uses this distribution to compute the marginal probability of each missing genotype for each individual subject, which is then used to impute the missing data. We assembled genome-wide data from five different studies and three different ethnic groups comprising Caucasians, African Americans and Asians. We randomly removed genotype data and then compared the observed genotypes with those generated by IMPUTE. Our analysis shows 97% median accuracy in Caucasian subjects when less than 10% of the SNPs are untyped and missing genotypes are accepted regardless of their posterior probability. The median accuracy increases to 99% when we require 0.95 minimum posterior probability for an imputed genotype to be acceptable. The accuracy decreases to 86% or 94% when subjects are African Americans or Asians, respectively. We propose a strategy to improve the accuracy by leveraging the level of admixture in African Americans. Conclusion: Our analysis suggests that IMPUTE is very accurate in samples of Caucasian origin, slightly less accurate in samples of Asian background, and substantially less accurate in samples of admixed background such as African Americans. Sample size and ascertainment do not seem to affect the accuracy of imputation.
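The evaluation design in this abstract (mask known genotypes, impute them, then score the calls with or without a minimum posterior probability) can be sketched as follows. This is an illustrative calculation only: the function name `imputation_accuracy` and the input layout are assumptions, and the code does not call IMPUTE or reproduce its output format.

```python
def imputation_accuracy(true_genotypes, posteriors, min_posterior=0.0):
    """Score imputed genotype calls against held-out true genotypes.

    true_genotypes: list of ints in {0, 1, 2} that were masked before imputation.
    posteriors: list of 3-tuples giving the posterior probability of genotypes 0, 1, 2.
    min_posterior: calls whose best posterior falls below this value are rejected.
    Returns (accuracy among accepted calls, fraction of calls accepted).
    """
    accepted = 0
    correct = 0
    for truth, probs in zip(true_genotypes, posteriors):
        # The imputed call is the genotype with the highest posterior probability.
        best = max(range(3), key=lambda g: probs[g])
        if probs[best] < min_posterior:
            continue  # call rejected by the posterior threshold
        accepted += 1
        if best == truth:
            correct += 1
    accuracy = correct / accepted if accepted else float("nan")
    call_rate = accepted / len(true_genotypes) if true_genotypes else float("nan")
    return accuracy, call_rate
```

Raising `min_posterior` (the abstract uses 0.95) trades call rate for accuracy: fewer genotypes are accepted, but those that remain are more likely to be correct.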
NIA Long Life Family Study: Objectives, design, and heritability of cross-sectional and longitudinal phenotypes
The NIA Long Life Family Study (LLFS) is a longitudinal, multicenter, multinational, population-based multigenerational family study of the genetic and nongenetic determinants of exceptional longevity and healthy aging. The Visit 1 in-person evaluation (2006-2009) recruited 4,953 individuals from 539 two-generation families, selected from the upper 1% tail of the Family Longevity Selection Score (FLoSS, which quantifies the degree of familial clustering of longevity). Demographic, anthropometric, cognitive, activities of daily living, ankle-brachial index, blood pressure, physical performance, and pulmonary function data, along with serum, plasma, lymphocytes, red cells, and DNA, were collected. A Genome Wide Association Scan (GWAS) (Illumina Omni 2.5M chip) followed by imputation was conducted. Visit 2 (2014-2017) repeated all Visit 1 protocols and added carotid ultrasonography of atherosclerotic plaque and wall thickness, additional cognitive testing, and perceived fatigability. On average, LLFS families show healthier aging profiles than reference populations, such as the Framingham Heart Study, across all age/sex groups, for many critical healthy aging phenotypes. However, participants are not uniformly protected. There is considerable heterogeneity among the pedigrees, with some showing exceptional cognition, others exceptional grip strength, others exceptional pulmonary function, and so on, with little overlap among these families. There is strong heritability for key healthy aging phenotypes, both cross-sectionally and longitudinally, suggesting that at least some of this protection may be genetic. Little of the variance in these heritable phenotypes is explained by the common genome (GWAS + Imputation), which may indicate that rare protective variants for specific phenotypes are running in selected families.
Meta-analysis of genetic variants associated with human exceptional longevity
Despite evidence from family studies that there is a strong genetic influence upon exceptional longevity, relatively few genetic variants have been associated with this trait. One reason could be that many genes individually have such weak effects that they cannot meet standard thresholds of genome wide significance, but as a group in specific combinations of genetic variations, they can have a strong influence. Previously we reported that such genetic signatures of 281 genetic markers associated with about 130 genes can do a relatively good job of differentiating centenarians from non-centenarians particularly if the centenarians are 106 years and older. This would support our hypothesis that the genetic influence upon exceptional longevity increases with older and older (and rarer) ages. We investigated this list of markers using similar genetic data from 5 studies of centenarians from the USA, Europe and Japan. The results from the meta-analysis show that many of these variants are associated with survival to these extreme ages in other studies. Since many centenarians compress morbidity and disability towards the end of their lives, these results could point to biological pathways and therefore new therapeutics to increase years of healthy lives in the general population
Learning Bayesian Networks from Correlated Data
Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures. This is the publisher’s final PDF. The published article is copyrighted by the author(s) and published by Nature Publishing Group. The published article can be found at: http://www.nature.com/articles/srep2515
Protein signatures of centenarians and their offspring suggest centenarians age slower than other humans
Using samples from the New England Centenarian Study (NECS), we sought to characterize the serum proteome of 77 centenarians, 82 centenarians' offspring, and 65 age-matched controls of the offspring (mean ages: 105, 80, and 79 years). We identified 1312 proteins that significantly differ between centenarians and their offspring and controls (FDR < 1%), and two different protein signatures that predict longer survival in centenarians and in younger people. By comparing the centenarian signature with 2 independent proteomic studies of aging, we replicated the association of 484 proteins with aging and we identified two serum protein signatures that are specific to extreme old age. The data suggest that centenarians acquire aging signatures similar to those seen in younger cohorts that have short survival periods, suggesting that they do not escape normal aging markers, but rather acquire them much later than usual. For example, centenarian signatures are significantly enriched for senescence-associated secretory phenotypes, consistent with those seen in younger individuals, and from this finding, we provide a new list of serum proteins that can be used to measure cellular senescence. Protein co-expression network analysis suggests that a small number of biological drivers may regulate aging and extreme longevity, and that changes in gene regulation may be important to reach extreme old age. This centenarian study thus provides additional signatures that can be used to measure aging and provides specific circulating biomarkers of healthy aging and longevity, suggesting potential mechanisms that could help prolong health and support longevity.
Genome-wide association study of personality traits in the Long Life Family Study
Personality traits have been shown to be associated with longevity and healthy aging. In order to discover novel genetic modifiers associated with personality traits as related to longevity, we performed a genome-wide association study (GWAS) on personality factors assessed by NEO-FFI in individuals enrolled in the Long Life Family Study (LLFS), a study of 583 families (N up to 4595) with clustering for longevity in the United States and Denmark. Three SNPs, in almost perfect LD, associated with agreeableness reached genome-wide significance (p < 10⁻⁸) and replicated in an additional sample of 1279 LLFS subjects, although one (rs9650241) failed to replicate and the other two were not available in two independent replication cohorts, the Baltimore Longitudinal Study of Aging and the New England Centenarian Study. Based on 10,000,000 permutations, an empirical p-value of 2 × 10⁻⁷ was observed for the genome-wide significant SNPs. Seventeen SNPs that reached marginal statistical significance in the two previous GWASs (p-value < 10⁻⁴ and 10⁻⁵) were also marginally significantly associated in this study (p-value < 0.05), although none of the associations passed the Bonferroni correction. In addition, we tested age-by-SNP interactions and found some significant associations. Since scores of personality traits in LLFS subjects change in the oldest ages, and genetic factors outweigh environmental factors to achieve extreme ages, these age-by-SNP interactions could be a proxy for complex gene-gene interactions affecting personality traits and longevity.
Health and function of participants in the Long Life Family Study: A comparison with other cohorts
Individuals from families recruited for the Long Life Family Study (LLFS) (n = 4559) were examined and compared to individuals from other cohorts to determine whether the recruitment targeting longevity resulted in a cohort of individuals with better health and function. Other cohorts with similar data included the Cardiovascular Health Study, the Framingham Heart Study, and the New England Centenarian Study. Diabetes, chronic pulmonary disease and peripheral artery disease tended to be less common in LLFS probands and offspring compared to similarly aged persons in the other cohorts. Pulse pressure and triglycerides were lower, high density lipids were higher, and performance on a perceptual speed task and gait speed were better in LLFS. Age-specific comparisons showed differences that would be consistent with a higher peak, later onset of decline or slower rate of change across age in LLFS participants. These findings suggest several priority phenotypes for inclusion in future genetic analysis to identify loci contributing to exceptional survival.
Genetic Signatures of Exceptional Longevity in Humans
Like most complex phenotypes, exceptional longevity is thought to reflect a combined influence of environmental (e.g., lifestyle choices, where we live) and genetic factors. To explore the genetic contribution, we undertook a genome-wide association study of exceptional longevity in 801 centenarians (median age at death 104 years) and 914 genetically matched healthy controls. Using these data, we built a genetic model that includes 281 single nucleotide polymorphisms (SNPs) and discriminated between cases and controls of the discovery set with 89% sensitivity and specificity, and with 58% specificity and 60% sensitivity in an independent cohort of 341 controls and 253 genetically matched nonagenarians and centenarians (median age 100 years). Consistent with the hypothesis that the genetic contribution is largest with the oldest ages, the sensitivity of the model increased in the independent cohort with older and older ages (71% to classify subjects with an age at death > 102 and 85% to classify subjects with an age at death > 105). For further validation, we applied the model to an additional, unmatched 60 centenarians (median age 107 years) resulting in 78% sensitivity, and 2863 unmatched controls with 61% specificity. The 281 SNPs include the SNP rs2075650 in TOMM40/APOE that reached irrefutable genome wide significance (posterior probability of association = 1) and replicated in the independent cohort. Removal of this SNP from the model reduced the accuracy by only 1%. Further in-silico analysis suggests that 90% of centenarians can be grouped into clusters characterized by different “genetic signatures” of varying predictive values for exceptional longevity. The correlation between 3 signatures and 3 different life spans was replicated in the combined replication sets. The different signatures may help dissect this complex phenotype into sub-phenotypes of exceptional longevity.
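The sensitivity and specificity figures quoted in this abstract follow the standard definitions, which can be written out as a short sketch. The function name `sensitivity_specificity` and the boolean input layout are illustrative assumptions. The code shows only how the two rates are computed from predicted and true case status, not how the 281-SNP model makes its predictions.

```python
def sensitivity_specificity(is_case, predicted_case):
    """Compute sensitivity and specificity from true and predicted case status.

    is_case: list of booleans, True for centenarian cases and False for controls.
    predicted_case: list of booleans with the classifier's prediction per subject.
    Returns (sensitivity, specificity); a rate is NaN if its denominator is zero.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(is_case, predicted_case):
        if truth and pred:
            tp += 1  # case correctly classified as case
        elif truth and not pred:
            fn += 1  # case missed by the model
        elif not truth and not pred:
            tn += 1  # control correctly classified as control
        else:
            fp += 1  # control wrongly classified as case
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Sensitivity depends only on the cases and specificity only on the controls, which is why the abstract can report sensitivity for an unmatched set of centenarians and specificity for a separate unmatched set of controls.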