A Genome-wide Association Study of Schizophrenia in the South African Xhosa and Generalizability of Polygenic Risk Score across African populations

Abstract

African populations are vastly underrepresented in genetic studies despite having the most genetic variation globally and facing wide-ranging environmental exposures. Most genetic studies have been conducted in populations of European (EUR) ancestry, using genome-wide association study (GWAS) arrays that represent the genetic variation in those populations. As a result, polygenic risk scores (PRS) derived from EUR ancestry populations are less accurate in populations of non-European ancestry, and least accurate in African (AFR) ancestry populations. The extent to which PRS prediction accuracy varies within AFR ancestry populations has not, however, been previously investigated. This study had two aims. The first was to investigate the contribution of common variants to the risk of schizophrenia in the South African Xhosa (SAX) population through GWAS analysis, and to determine whether PRS derived from EUR and East Asian (EAS) ancestry populations by the Psychiatric Genomics Consortium (PGC) Schizophrenia Working Group were generalizable to SAX. The second aim was to assess the generalizability of PRS for non-psychiatric phenotypes, derived from EUR ancestry individuals in the UK Biobank (UKB, n = ~350,000), to the Uganda General Population Cohort (GPC, n = 4,778) and the South African Drakenstein Child Health Study (DCHS, n = 638). To address the first aim, a GWAS was conducted in 2,086 Xhosa individuals from South Africa with and without schizophrenia (n_cases = 1,038; n_controls = 1,048) using a custom Affymetrix GWAS array designed to capture variation in the Xhosa population. The schizophrenia GWAS in SAX yielded one SNP in ZFP3 (rs35172303; P = 4.74e-08, OR = 0.6004, 95% CI: [0.499, 0.721]) that met genome-wide significance.
The association of variants in ZFP3 with schizophrenia is consistent with findings from an earlier exome sequencing study in SAX undertaken by colleagues, but this gene has not previously been associated with schizophrenia in large-scale GWAS of predominantly EUR ancestry. Characterization of the genetic architecture of schizophrenia in SAX showed that heritability was enriched across functional categories involved in the regulation of gene expression. The accuracy of PRS derived from the PGC Schizophrenia Working Group EUR and EAS ancestry data in predicting schizophrenia in SAX was then quantified. PRS prediction accuracy in SAX using PGC-derived summary statistics was low (PGC-EUR: max R² = 0.0057, P = 0.008; PGC-EAS: max R² = 0.0059, P = 0.007). These results are consistent with previous findings that PRS prediction accuracy is low when discovery and target cohorts come from different ancestral backgrounds. For the second aim, PRS prediction accuracy was quantified in simulations using data from the African Genome Variation Project (AGVP) to represent continental AFR diversity. Samples were categorised by geographical region into West, East and South African cohorts. Each cohort was divided into discovery and target datasets. The West and East African discovery data were used to predict the simulated phenotype in the three target cohorts. Using GWAS data from UKB EUR ancestry individuals, PRS prediction accuracy was assessed for 34 anthropometric and blood panel traits in the Uganda GPC. The UKB data were then meta-analysed with PAGE (Population Architecture using Genomics and Epidemiology, comprising about 50,000 Latino/Hispanic and African-American individuals) and BBJ (Biobank Japan, n = ~162,000) to assess how the inclusion of diverse samples affects PRS prediction accuracy.
Simulations were limited by sample size but showed that PRS prediction accuracy was highest when the discovery and target cohorts were matched by African region, and for phenotypes with the sparsest genetic architecture. In the empirical data, prediction accuracy was low across all 34 quantitative traits in GPC when using GWAS data from UKB. Prediction accuracy differed across AFR ancestry groups within UKB: it was highest for the Ethiopian and admixed populations and lowest for southern African populations. When the PRS prediction accuracy of East African individuals from the UKB was compared with that of individuals from GPC, accuracy was lower in the Ugandan GPC population, suggesting that the difference in environments between the two groups may contribute to the difference in PRS accuracy. Moreover, the cross-ancestry meta-analyses showed that the inclusion of diverse samples in large-scale studies improves PRS prediction accuracy, especially for phenotypes with population-enriched variants. This thesis demonstrates for the first time that the prediction accuracy of EUR ancestry-derived PRS varies within continental AFR ancestry groups and tracks with population history and human evolution. The higher prediction accuracy observed in Ethiopians can be explained by their genetic proximity to Europeans as a result of the back-to-Africa migration, whereas the southern African populations (including SAX) are more proximal to the ancestral populations that never left the continent. It is therefore imperative not only to include more African samples in future large-scale studies, but also to ensure that those samples adequately represent the genetic and environmental diversity of the African continent.
