Post-GWAS functional characterisation of colorectal cancer risk loci

Abstract

Large bowel cancer, or colorectal cancer (CRC), is the third most common cancer worldwide and the fourth biggest cause of cancer mortality. Twin studies have shown that the heritable contribution is ~35%, with ~5% of cases due to rare, high-penetrance mutations. In the last decade, genome-wide association studies on large, well-characterised CRC case-control cohorts have facilitated the identification of over 25 common genetic variants that confer an increased predisposition to CRC, invoking the common-disease, common-variant paradigm. As almost all of these variants lie within non-coding regions, the underlying causal mechanism is to date poorly understood for the majority of these loci, and they are thought to mediate risk by influencing gene expression levels. To test this hypothesis, an agnostic approach based on expression quantitative trait loci (eQTL) analysis was first carried out on 115 normal colorectal mucosa samples and 59 peripheral blood mononuclear cell (PBMC) samples. As these heritable effects on gene expression are likely to be subtle, strong emphasis was placed on technical methodology to minimise experimentally induced, non-biological variation, including the extraction of high-quality RNA from primary tissue, the selection and validation of reference genes for normalisation of gene expression quantification, and internal validation of the samples and data processing. Thereafter, the associations between the 25 CRC risk variants and the expression of their cis-genes were examined systematically, demonstrating that ten of these variants are also tissue-specific eQTLs. This intermediate phenotype strongly suggests that they confer risk, at least in part, by modifying regulatory mechanisms.
One of the best eQTL associations (Xp22.2) was investigated in further detail, revealing a novel indel polymorphism (Indel24) in the distal promoter region of the target gene SHROOM2 that influenced both transcript abundance and CRC risk more than the original tagging SNP. Functional verification with gene reporter assays indicated that Indel24 displays differential allelic control over transcriptional activity. Further in silico analysis and mutagenesis of the reporter gene constructs provided evidence that Indel24 modulates transcription by modifying the spacing between CCAAT motifs and, consequently, the binding affinity of the NF-Y transcription factor. siRNA depletion of NF-Y was associated with a reduction in transcriptional activity of the Indel24 gene construct as well as of endogenous SHROOM2, which strongly supports an interaction between Indel24 and NF-Y in the transcriptional activation of SHROOM2. Preliminary evidence suggests that SHROOM2 is expressed at the top of the intestinal epithelial crypt and plays a role in cell cycle regulation.
Hypothesis-driven approaches can also be useful in demonstrating the functionality of CRC risk variants, complementing the hypothesis-free approach of eQTL analysis. Guided by a recently discovered gene-environment interaction between the 16q22.1 risk variant and circulating vitamin D levels, the influence of the rs9929218 SNP on CDH1 gene expression was examined in relation to the expression of putative regulatory genes derived from in silico analysis and studies of other target genes. Although there was no direct association between rs9929218 and CDH1 expression, there were multiple two-way interactions that together suggested that rs9929218 influences the VDR/FOXO4 regulation of CDH1. This provides functional support for the mechanism underlying the epidemiological observation of the gene-environment interaction between 16q22.1 and vitamin D, and demonstrates a candidate-based approach to deciphering the link between a genetic locus and CRC susceptibility.
In summary, the research presented in this thesis has validated the experimental rationale of utilising expression studies of normal colorectal mucosa to home in on the molecular mechanisms and susceptibility genes underlying the association between common genetic variation and CRC risk.
