Background: Orofacial clefts (OFCs), namely cleft lip with or without cleft palate (CL/P) and cleft palate (CP), are the most common craniofacial malformations among newborns. Both CL/P and CP show strong familial aggregation, resulting in high estimated heritability. Previously identified genetic risk factors account for only about a quarter of the estimated total heritability of OFC risk, indicating that additional genetic risk loci remain to be identified. The aim of this thesis is to update the imputed genotypes generated from a genome-wide marker panel and to use both observed and imputed genetic variants to identify genetic risk factors for OFCs in a case-parent trio study.
Methods: We imputed genotypes for case-parent trios from the Genes and Environment Association (GENEVA) consortium using the Michigan Imputation Server, and then conducted genome-wide association analyses to identify genetic variants associated with risk of CL/P and CP separately. For each cleft subtype, we applied the genotypic transmission disequilibrium test (gTDT), as implemented in the trio R package, to common single nucleotide polymorphism (SNP) markers (i.e., those with a minor allele frequency [MAF] ≥ 5%), first in all trios combined and then stratified by ethnicity (Asian and European sub-groups).
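The gTDT compares each affected child's genotype with the three "pseudo-control" genotypes the child could have inherited from the same parents, using conditional logistic regression. The Python sketch below fits that model for a single SNP under an additive coding and applies the MAF ≥ 5% filter. It is a minimal illustration under stated assumptions, not the trio package's code: the 0/1/2 minor-allele coding, computing MAF from parental genotypes only, the use of a Wald test for the p-value, and the function names (`gtdt_additive`, `minor_allele_frequency`, `mendelian_genotypes`) are all our own choices.

```python
"""Minimal sketch of an additive-model genotypic TDT (gTDT) for one SNP.

Assumptions (illustrative, not taken from the trio package):
- genotypes are coded as minor-allele counts 0/1/2;
- each trio is a (father, mother, affected child) tuple;
- MAF is computed from parental genotypes only;
- the p-value comes from a Wald test on log(RR).
"""
import math
from typing import List, Tuple

Trio = Tuple[int, int, int]  # (father, mother, child) minor-allele counts


def alleles(genotype: int) -> Tuple[int, int]:
    """Split a 0/1/2 genotype into its two alleles (1 = minor allele)."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[genotype]


def mendelian_genotypes(father: int, mother: int) -> List[int]:
    """The four equally likely child genotypes: the case plus three pseudo-controls."""
    return [a + b for a in alleles(father) for b in alleles(mother)]


def minor_allele_frequency(trios: List[Trio]) -> float:
    """Minor-allele frequency among parental chromosomes."""
    parental = [g for father, mother, _ in trios for g in (father, mother)]
    freq = sum(parental) / (2 * len(parental))
    return min(freq, 1.0 - freq)


def _score_and_information(trios: List[Trio], beta: float) -> Tuple[float, float]:
    """Score and Fisher information of the conditional likelihood at log(RR) = beta."""
    score, info = 0.0, 0.0
    for father, mother, child in trios:
        xs = mendelian_genotypes(father, mother)
        if child not in xs:
            raise ValueError(f"Mendelian error in trio {(father, mother, child)}")
        weights = [math.exp(beta * x) for x in xs]
        total = sum(weights)
        mean = sum(w * x for w, x in zip(weights, xs)) / total
        second = sum(w * x * x for w, x in zip(weights, xs)) / total
        score += child - mean         # observed minus expected genotype
        info += second - mean * mean  # zero when both parents are homozygous
    return score, info


def gtdt_additive(trios: List[Trio], max_iter: int = 50, tol: float = 1e-10):
    """Fit log(RR) by Newton-Raphson; return (RR, 95% CI, Wald p-value)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(trios, beta)
        if info == 0.0:
            raise ValueError("no informative trios (no heterozygous parents)")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    else:
        raise RuntimeError("Newton-Raphson did not converge")
    _, info = _score_and_information(trios, beta)  # information at the final estimate
    se = 1.0 / math.sqrt(info)
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return math.exp(beta), ci, p_value
```

For example, with 30 heterozygous-by-homozygous trios (father 1, mother 0) in which the child inherited the minor allele, 20 in which it did not, and 10 uninformative trios with two homozygous parents, `gtdt_additive` returns RR = 30/20 = 1.5 (p ≈ 0.16), the classical ratio of transmitted to untransmitted alleles; the uninformative trios contribute nothing to the score or information. `minor_allele_frequency` gives 0.375 for the same trios, so the SNP would pass the MAF ≥ 5% filter.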
Results: We identified two loci not previously reported as associated with risk of CL/P: 18q12 (TTR) and 4q22 (GRID2). The most significant SNP in the TTR region (rs1375445) reached genome-wide significance in the combined set of all trios (p = 4.33 × 10⁻⁸; RR = 1.35, 95% CI (1.21, 1.51)), despite not reaching this level of significance in either the European (p = 2.94 × 10⁻⁵) or the Asian sub-group (p = 5.52 × 10⁻⁵) alone. In contrast, the most significant SNP in GRID2 (rs1471079) reached genome-wide significance only in the Asian sub-group (p = 1.82 × 10⁻⁷; estimated RR = 0.70, 95% CI (0.60, 0.80)). Both of these imputed SNPs have high imputation accuracy (R² = 0.96 for rs1375445 and 0.97 for rs1471079). Additionally, for CL/P, we replicated significant associations at 8 regions identified in previous studies of these case-parent trios, including 8q24 (a recognized gene desert), 1q32 (IRF6), 20q12 (MAFB), 17p13 (NTN1) and 1p22 (ABCA4). In six of these regions, the most significant SNP was imputed. The most significant SNP in the 8q24 region (rs17242358) reached genome-wide significance in the combined set of all trios (p = 1.75 × 10⁻¹⁶); this imputed SNP showed over-transmission of the A allele relative to the G allele, with an estimated RR of 2.09, 95% CI (1.76, 2.49). Its significance differed markedly between the European (p = 7.11 × 10⁻¹⁴) and Asian (p = 7.3 × 10⁻⁴) sub-groups, primarily because its MAF differed between them (23% in Europeans versus 2% in Asians). We did not detect any genome-wide significant locus for the OFC subtype CP.
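The MAF explanation for the European/Asian difference at rs17242358 can be made concrete. Only heterozygous parents carry a transmission that a TDT-type test can use, and under Hardy-Weinberg equilibrium (an assumption here, not a result of the thesis) the expected proportion of heterozygous parents at a SNP with MAF p is 2p(1 - p). The short Python calculation below plugs in the two reported MAFs; `heterozygous_fraction` is a hypothetical helper name.

```python
"""Sketch: expected informativeness of one SNP in two sub-groups.

Assumes Hardy-Weinberg equilibrium among parents; only heterozygous
parents provide a transmission that the TDT can use.
"""


def heterozygous_fraction(maf: float) -> float:
    """Expected proportion of heterozygous parents at a biallelic SNP under HWE."""
    return 2.0 * maf * (1.0 - maf)


european = heterozygous_fraction(0.23)  # MAF of rs17242358 reported for Europeans
asian = heterozygous_fraction(0.02)     # MAF of rs17242358 reported for Asians
print(f"European: {european:.4f}, Asian: {asian:.4f}, ratio: {european / asian:.1f}")
```

It prints `European: 0.3542, Asian: 0.0392, ratio: 9.0`, so an Asian trio is expected to contribute roughly one-ninth as many informative transmissions at this SNP as a European trio. The calculation ignores the number of trios in each sub-group, which is not given here, and any difference in effect size between the groups.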
Conclusions: Our findings confirm the complex genetic architecture of OFCs and the heterogeneity of the genes influencing their risk. We replicated most previously reported genetic risk factors in these GENEVA case-parent trios. We also identified two new genetic risk factors for CL/P that require further investigation. Stratification by ethnicity helped detect OFC risk loci specific to particular groups. In addition, imputation helped improve the statistical power to detect genetic risk factors for OFCs.