10 Years of GWAS in intraocular pressure
Intraocular pressure (IOP) is the only modifiable risk factor for glaucoma, the leading cause of irreversible blindness worldwide. In this review, we summarize the findings of genome-wide association studies (GWASs) of IOP published in the past 10 years and prior to December 2022. Over 190 genetic loci and candidate genes associated with IOP have been uncovered through GWASs, although most of these studies were conducted in subjects of European and Asian ancestries. We also discuss how these common variants have been used to derive polygenic risk scores for predicting IOP and glaucoma, and to infer causal relationships with other traits and conditions through Mendelian randomization. Additionally, we summarize the findings from a recent large-scale exome-wide association study (ExWAS) that identified rare variants associated with IOP in 40 novel genes, six of which are drug targets for clinical treatment or are being evaluated in clinical trials. Finally, we discuss the need for future genetic studies of IOP to include individuals from understudied populations, including Latinos and Africans, in order to fully characterize the genetic architecture of IOP.
Exploiting genetics and genomics to improve the understanding of eye diseases [Editorial]
Editorial on the Research Topic
Exploiting genetics and genomics to improve the understanding of eye disease
A gene based combination test using GWAS summary data
Article describes how gene-based association tests provide a useful alternative and complement to the usual single marker association tests, especially in genome-wide association studies (GWAS). The authors propose a test named OWC based on summary statistics from GWAS data
New genetic loci link adipose and insulin biology to body fat distribution.
Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10⁻⁸). In total, 20 of the 49 waist-to-hip ratio adjusted for BMI loci show significant sexual dimorphism, 19 of which display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation and insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms
Ancestral diversity improves discovery and fine-mapping of genetic loci for anthropometric traits - the Hispanic/Latino Anthropometry Consortium
Hispanic/Latinos have been underrepresented in genome-wide association studies (GWAS) for anthropometric traits despite their notable anthropometric variability, ancestry proportions, and high burden of growth stunting and overweight/obesity. To address this knowledge gap, we analyzed densely-imputed genetic data in a sample of Hispanic/Latino adults to identify and fine-map genetic variants associated with body mass index (BMI), height, and BMI-adjusted waist-to-hip ratio (WHRadjBMI). We conducted a GWAS of 18 studies/consortia as part of the Hispanic/Latino Anthropometry (HISLA) Consortium (Stage 1, n=59,771) and generalized our findings in 9 additional studies (HISLA Stage 2, n=10,538). We conducted a trans-ancestral GWAS with summary statistics from HISLA Stage 1 and existing consortia of European and African ancestries. In our HISLA Stage 1+2 analyses, we discovered one BMI locus, as well as two BMI signals and another height signal each within established anthropometric loci. In our trans-ancestral meta-analysis, we discovered three BMI loci, one height locus, and one WHRadjBMI locus. We also identified three secondary signals for BMI, 28 for height, and two for WHRadjBMI in established loci. We show that 336 known BMI, 1,177 known height, and 143 known WHRadjBMI (combined) SNPs demonstrated suggestive transferability (nominal significance and effect estimate directional consistency) in Hispanic/Latino adults. Of these, 36 BMI, 124 height, and 11 WHRadjBMI SNPs were significant after trait-specific Bonferroni correction. Trans-ancestral meta-analysis of the three ancestries showed a small-to-moderate impact of uncorrected population stratification on the resulting effect size estimates. Our findings demonstrate that future studies may also benefit from leveraging diverse ancestries and differences in linkage disequilibrium patterns to discover novel loci and additional signals with less residual population stratification
Alzheimer’s disease risk prediction using automated machine learning
Background
Alzheimer’s disease (AD) is the most common late‐onset neurodegenerative disease. About 5.4 million Americans are living with AD. Unfortunately, there is no cure for AD at present, which makes early prediction crucial. Identifying individuals at increased risk of AD provides a better chance of benefiting from treatments. Risk prediction models are typically based on a limited number of predictors possibly with sub‐optimal performance. Here, we explore a state‐of‐the‐art automated machine learning (AutoML) framework for AD risk prediction, which can handle hundreds of predictors, including non‐traditional variables, with automatic feature engineering and model selection.
Method
We developed an AutoML model that aggregates polygenic risk scores (PRSs) and baseline individual characteristics (e.g., non‐genetic factors) for predicting AD. The PRSs were derived using summary statistics of the genome‐wide association studies from the Alzheimer Disease Genetics Consortium (ADGC) dataset (n = 19,918). The model was applied to 455,233 participants in UKBB without AD at baseline to predict development of AD at the final observation (n=1,452 developed AD). Our model was based on the H2O AutoML, an intelligent algorithm that can automatically select hyperparameters, tune ensembles of ML models, and carry out model assessment.
Result
The area under the receiver operating characteristic curve (AUC) for AD risk prediction was over 0.86. Polygenic risk scores ranked only second to age in feature importance. Furthermore, our AutoML model identified predictors that are not typically considered in traditional prediction models, such as an individual’s overall health rating and usual walking pace.
Conclusion
Our AutoML model improves the accuracy of AD risk prediction by efficiently exploring numerous predictors and ensemble models while greatly reducing manual coding hours. Furthermore, AutoML uncovered novel predictors for AD
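The abstract above does not say how its polygenic risk scores were constructed. As a minimal sketch, assuming the common additive form in which a PRS is the sum of effect-allele dosages weighted by GWAS effect sizes, the calculation for one individual could look like the following. The function name, variant identifiers, and weights are hypothetical and chosen only for illustration.

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Additive PRS sketch: sum of (GWAS effect size * effect-allele dosage).

    effect_sizes: dict mapping variant ID -> per-allele effect size (e.g. log odds ratio)
    dosages:      dict mapping variant ID -> effect-allele dosage in [0, 2]
    Variants missing from `dosages` are skipped (an assumption; real pipelines
    often impute them or substitute the population mean instead).
    """
    score = 0.0
    for variant, beta in effect_sizes.items():
        dosage = dosages.get(variant)
        if dosage is None:
            continue  # variant not genotyped for this individual
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage for {variant} must lie in [0, 2]")
        score += beta * dosage
    return score


# Hypothetical weights and genotypes, for illustration only.
weights = {"rs_hypothetical_a": 0.25, "rs_hypothetical_b": -0.25, "rs_hypothetical_c": 0.125}
person = {"rs_hypothetical_a": 2, "rs_hypothetical_b": 1, "rs_hypothetical_c": 1}
print(polygenic_risk_score(weights, person))
```

In practice the weights come from published summary statistics, and variant selection (for example clumping and p-value thresholding) happens before this step; none of that is shown here.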
Test Gene-Environment Interactions for Multiple Traits in Sequencing Association Studies.
MOTIVATION: The risk of many complex diseases is determined by an interplay of genetic and environmental factors. The examination of gene-environment interactions (G×Es) for multiple traits can yield valuable insights about the etiology of the disease and increase power in detecting disease-associated genes. However, the methods for testing G×Es for multiple traits are very limited.
METHOD: We developed novel approaches to test G×Es for multiple traits in sequencing association studies. We first perform a transformation of multiple traits by using either principal component analysis or standardization analysis. Then, we detect the effects of G×Es using novel proposed tests: testing the effect of an optimally weighted combination of G×Es (TOW-GE) and/or variable weight TOW-GE (VW-TOW-GE). Finally, we employ Fisher's combination test to combine the p values.
RESULTS: Extensive simulation studies show that the type I error rates of the proposed methods are well controlled. Compared to the interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are only rare risk and protective variants; VW-TOW-GE is more powerful when there are both rare and common variants. Both TOW-GE and VW-TOW-GE are robust to directions of effects of causal G×Es. Application to the COPDGene Study demonstrates that our proposed methods are very effective.
CONCLUSIONS: Our proposed methods are useful tools in the identification of G×Es for multiple traits. The proposed methods can be used not only to identify G×Es for common variants, but also for rare variants. Therefore, they can be employed in identifying G×Es in both genome-wide association studies and next-generation sequencing data analyses
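The final step described above, Fisher's combination test, can be sketched in a few lines. This is the textbook form, which assumes the combined p values are independent; the abstract does not state how the authors obtain the null distribution, so this should be read as an illustration of the combination statistic and not as their procedure. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine k independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # For even degrees of freedom (2k) the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no special functions are needed.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two hypothetical p-values from separate tests of the same gene.
print(fisher_combined_p([0.05, 0.05]))
```

When the individual tests are computed on the same data, as they would be for two statistics on one gene, they are correlated and the chi-square reference distribution is only approximate; a permutation-based null is one common alternative.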
Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer’s disease prediction
Background
Alzheimer’s disease (AD) is the most common late‐onset neurodegenerative disease. Identifying individuals at increased risk of developing AD is important for early intervention. Risk prediction models are typically based on a limited number of predictors, possibly with sub‐optimal performance. Here, we explore an explainable machine learning (ML) framework, XGBoost and SHapley Additive exPlanations (SHAP) values, for AD risk prediction, which can handle a large number of predictors and output the impact and importance of each predictor.
Method
We developed an XGBoost model that aggregates polygenic risk scores (PRSs), which include both PRS for AD risk and PRS for age at onset of AD, baseline individual characteristics (e.g., non‐genetic factors), and information from electronic health records for predicting incident AD. The PRSs were derived using summary statistics from genome‐wide association studies in the Alzheimer’s Disease Genetics Consortium (ADGC) dataset (n = 19,918). The model was applied to 457,936 white participants in UK Biobank to predict development of AD within 10 years after the baseline visit (n = 2,177 developed AD). We further used SHAP values to explain the relative information in model predictors.
Result
For participants of age 40 and older, the area under the receiver operating characteristic curve (AUC) for AD risk prediction was over 0.880. PRSs ranked second to age (the best predictor) in feature importance. For subjects of age 65 and above, PRSs for AD were the most important features. Our ML model not only identified traditional risk factors for AD, such as age, education, income, body mass index, diabetes, and blood pressure, but also identified predictors from electronic health records that are not typically considered in traditional prediction models, including urinary tract infection, syncope and collapse, chest pain, disorientation and hypercholesterolaemia, for developing AD. Furthermore, SHAP values aided the ranking of feature importance and model explanation.
Conclusion
Our ML model improves the accuracy of AD risk prediction by efficiently exploring numerous predictors. PRSs play the most important role in predicting AD in individuals of age 65 and older. In application, the model also identified novel feature patterns for AD
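The AUC reported above has a direct probabilistic reading: it is the chance that a randomly chosen individual who developed AD receives a higher predicted risk than a randomly chosen individual who did not. A minimal sketch of that definition follows, with made-up labels and scores; the function name is hypothetical and the pairwise loop is written for clarity, not for biobank-scale data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: iterable of 1 (case) or 0 (control)
    scores: iterable of predicted risks, same length as labels
    Tied scores count as half a correctly ranked pair.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up example: 2 controls followed by 2 cases.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because incident AD is rare in this kind of cohort, a high AUC can coexist with low positive predictive value, which is worth keeping in mind when reading the headline number.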
Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer’s disease prediction
Alzheimer’s disease (AD) is the most common late-onset neurodegenerative disorder. Identifying individuals at increased risk of developing AD is important for early intervention. Using data from the Alzheimer Disease Genetics Consortium, we constructed polygenic risk scores (PRSs) for AD and age-at-onset (AAO) of AD for the UK Biobank participants. We then built machine learning (ML) models for predicting development of AD, and explored feature importance among PRSs, conventional risk factors, and ICD-10 codes from electronic health records, a total of > 11,000 features using the UK Biobank dataset. We used eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP), which provided superior ML performance as well as aided ML model explanation. For participants age 40 and older, the area under the curve for AD was 0.88. For subjects of age 65 and older (late-onset AD), PRSs were the most important predictors. This is the first observation that PRSs constructed from the AD risk and AAO play more important roles than age in predicting AD. The ML model also identified important predictors from EHR, including urinary tract infection, syncope and collapse, chest pain, disorientation and hypercholesterolemia, for developing AD. Our ML model improved the accuracy of AD risk prediction by efficiently exploring numerous predictors and identified novel feature patterns
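SHAP values build on the Shapley value from cooperative game theory: each feature is credited with its average marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values by brute force for a toy value function. It illustrates the underlying definition only; it is not the algorithm the SHAP library uses for tree models, and the feature names and payoffs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset of the other features.

    features: list of feature names
    value:    function mapping a frozenset of present features to a number
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(features)
    phi = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for size in range(n):
            # Weight of a subset of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = frozenset(subset)
                total += weight * (value(present | {feature}) - value(present))
        phi[feature] = total
    return phi


def toy_value(present):
    """Hypothetical model output: age and PRS contribute, with an interaction."""
    out = 0.0
    if "age" in present:
        out += 4.0
    if "prs" in present:
        out += 2.0
    if "age" in present and "prs" in present:
        out += 1.0  # interaction term, split evenly between the two features
    return out


print(shapley_values(["age", "prs", "bmi"], toy_value))
```

The attributions sum to the difference between the full-model output and the empty-set output, which is the property that lets per-feature SHAP values be read as an additive breakdown of a single prediction.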
Combining quantitative and survival trait analyses identifies novel general and sex‐specific genes for age‐at‐onset of Alzheimer’s disease
Background
Alzheimer’s disease (AD) has high genetic heritability for both disease risk and age‐at‐onset (AAO) of AD. However, our understanding of the genetics of AAO of AD lags behind that of AD risk. Here, we utilized two statistical approaches to identify genes modifying AAO of AD globally or in a sex‐specific manner.
Method
Pooled data from 9,219 AD cases and 10,345 controls from 20 cohorts from the Alzheimer Disease Genetics Consortium were analyzed using a linear mixed model (LMM) for AAO from cases only, and a Cox proportional hazard frailty model (CoxFM) from all subjects with AAO observed in cases and censored at age‐at‐exam in controls. Both methods modeled outcome on SNP, sex, APOE‐e4, and 10 principal components (PCs), and incorporated random intercept by cohort. Fisher’s method was used to combine results of LMM and CoxFM. Genome‐wide significant variants were determined based on p < 5×10⁻⁸ in individual or combined tests. Multiple secondary analyses were performed including sex‐specific association tests for top SNPs (p < 10⁻⁶). Two gene‐expression datasets of prefrontal cortex tissues from NCBI/GEO were analyzed by sex to evaluate the sex‐specific AAO genes.
Result
We identified 34 genome‐wide significant loci. We confirmed that 6 known AD risk genes (APOE, CR1, BIN1, TREM2, PICALM, and FERMT2) also regulate AAO of AD. Twenty‐eight potential novel loci were identified, including ARFGEF2, MAPK9, and the MLX region, which harbors functionally related genes for AD. Sex‐specific analyses showed that the effects of PICALM, MLX, and CDH2 on AAO came solely from females, and that of ATP2C1 solely from males. In particular, the MLX region harbors 10 genes in strong LD, including COASY and HSD17B1, in which variants were previously reported to be associated with earlier AAO of AD in female AD patients with Down syndrome. Gene expression analyses further showed greater differential expression between AD cases and controls for HSD17B1, COASY, PLEKHH3, and MLX in females than in males.
Conclusion
Using two statistical methods to model AAO of AD, we identified six known AD risk genes and 28 novel loci associated with AAO of AD, and further narrowed these down to four loci with sex‐specific effects. Notably, HSD17B1, which is involved in estrogen activation, has strong functional implications that warrant further investigation