Fig 1 -
(A) Ancestry assignment of UK Biobank participants based on genetic principal components. (B) Comparison of our ancestry assignments with those of Privé et al., 2022 [19].
Fig 2 -
(A) Evaluation of published prostate cancer polygenic scores sourced from the PGS Catalog in the UK Biobank testing dataset, without stratifying by ancestry. (B) The same evaluation after stratifying by ancestry. Confidence intervals were obtained from 2,000 bootstrap resamples of the AUC. P values were obtained from DeLong's test.
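The bootstrap confidence interval mentioned in this caption can be illustrated with a minimal sketch. It assumes a percentile interval and a rank-based AUC estimate. The caption does not say which bootstrap variant was used, and the function names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random

def auc(labels, scores):
    """AUROC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed variant, not confirmed by the source)."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack cases or controls
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi
```

The pairwise AUC loop is quadratic in sample size, so a cohort-scale analysis would more plausibly use a rank-based implementation from a statistics library.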
S1 Fig -
(A) Published or adjusted summary statistics were used to score the entire UK Biobank cohort. The cohort was subsequently split into training and testing datasets (2:1 ratio). Logistic models were fit on the training dataset using 10-fold cross-validation (repeated 10 times) and evaluated in the testing data. Evaluations were conducted in an ancestry-agnostic and an ancestry-aware manner. Additional validation was conducted in the All of Us cohort in a similar manner. (B) Ancestry-specific (or total) summary statistics were adjusted for all other ancestries in a pairwise manner. This resulted in 206 scores that were evaluated in all populations. All available ancestry types of GWAS summary statistics (African, Asian, European, and total) were combined with six types of adjustment methods (Clump, prsCS, prsCSx, IMPACT, XPASS, PolyFun) and four types of ancestry-specific reference panels (AFR, EAS, EUR, and total) to produce 206 sets of adjusted summary statistics. Each set of adjusted summary statistics was then combined with genotypic data for all males in the UK Biobank to generate polygenic risk scores. (TIF)
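The resampling scheme described in panel (A) can be sketched as follows. This assumes a plain random (unstratified) 2:1 split and plain random folds. The caption does not state whether splits were stratified by case status or ancestry, and the helper names `split_train_test` and `repeated_kfold` are hypothetical.

```python
import random

def split_train_test(n, train_parts=2, test_parts=1, seed=0):
    """Shuffle sample indices and split them train:test (2:1 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n * train_parts // (train_parts + test_parts)
    return idx[:cut], idx[cut:]

def repeated_kfold(indices, k=10, repeats=10, seed=0):
    """Yield (train, validation) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # new fold assignment on every repeat
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in range(k):
            val = folds[held_out]
            train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
            yield train, val

# Example: 300 hypothetical participants -> 200 training, 100 testing,
# then 10 x 10 = 100 train/validation splits inside the training set.
train_idx, test_idx = split_train_test(300)
cv_splits = list(repeated_kfold(train_idx))
```

The testing indices are never passed to `repeated_kfold`, which mirrors the caption's separation between model fitting on the training data and evaluation on the held-out testing data.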
Characteristics of published polygenic risk scores.
S5 Fig -
(A) Ancestry annotation in the All of Us dataset. (B) Age distribution in the All of Us dataset. (TIF)
Baseline characteristics of cases and controls in the UK Biobank cohort.
Baseline characteristics of training and testing data in the UK Biobank cohort.
S4 Fig -
(A) Age distribution in the UK Biobank cohort. (B) Evaluation of a model trained on all PRSs aggregated. (C) Evaluation of age as a disease risk modifier. Interactions between PRS and age buckets are shown. Points are colored by the p value from a likelihood ratio test comparing the model with an interaction term to the model without one. (TIF)
Mean AUROC and Nagelkerke R².
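For readers unfamiliar with the second metric, the following sketch shows the standard definition of Nagelkerke's R² as the Cox-Snell R² rescaled by its maximum attainable value. It takes the log-likelihoods of a null (intercept-only) model and a fitted model as inputs. The function name is hypothetical, and the source does not describe how the authors computed the statistic.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R² from null and fitted log-likelihoods over n samples."""
    # Cox-Snell R²: 1 - (L_null / L_model)^(2/n), written with log-likelihoods
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Upper bound of Cox-Snell R², reached when the fitted likelihood equals 1
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell
```

A model no better than the null gives 0, and a model that fits every observation perfectly (log-likelihood 0) gives 1.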
Prostate cancer is a heritable disease whose incidence and mortality differ by ancestry. Polygenic risk scores (PRSs) offer a promising route to predicting disease risk, including for prostate cancer. While their accuracy continues to improve, research aimed at enhancing their performance in African and Asian populations remains key for equitable use. Recent algorithmic developments in PRS derivation have improved pan-ancestral risk prediction for several diseases. In this study, we benchmark the predictive power of six widely used PRS derivation algorithms, four of which adjust for ancestry, on prostate cancer cases and controls from the UK Biobank and All of Us cohorts. We find modest improvement in discriminatory ability compared with clumping, a simple method that prioritizes variants, and with published polygenic risk scores. Our findings underscore the importance of improving risk prediction algorithms and of sampling diverse cohorts.
Fig 3 -
(A) Pairwise Pearson correlation matrix of the 206 scores generated. (B) Nagelkerke's pseudo-R² for models trained on summary statistics adjusted for specific ancestries. (C) AUROC in the UK Biobank testing cohort. (D) Polygenic scores with the highest AUROC for each evaluation-derivation ancestry pair. Text within squares indicates the associated algorithm. Top and right sample size annotations are on a log scale.
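The pairwise correlation matrix in panel (A) can be illustrated with a small sketch. It assumes each score is stored as a list of per-participant values under a name. The function names and the toy score names below are hypothetical and do not come from the source.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(scores):
    """Nested dict of pairwise Pearson correlations between named score vectors."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}

# Toy example with three made-up scores over four participants.
toy_scores = {
    "score_a": [1.0, 2.0, 3.0, 4.0],
    "score_b": [2.0, 4.0, 6.0, 8.0],
    "score_c": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(toy_scores)
```

With 206 scores the matrix has 206 × 206 entries, so a vectorized routine from a numerical library would be the more practical choice at that scale.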