Large bowel cancer, or colorectal cancer (CRC), is the third most common cancer
worldwide and the fourth leading cause of cancer mortality. Twin studies have
shown that the heritable contribution is ~35%, with ~5% of cases due to rare, high-penetrance
mutations. In the last decade, genome-wide association studies (GWAS)
on large, well-characterised CRC case-control cohorts have facilitated the
identification of over 25 common genetic variants that confer an increased
predisposition to colorectal cancer, in keeping with the common-disease common-variant
paradigm. As almost all of these variants lie within non-coding regions, the
underlying causal mechanism remains poorly understood for the majority of these
loci, and they are thought to mediate risk by influencing gene expression levels.
To test this hypothesis, an agnostic approach using expression quantitative trait
loci (eQTL) analysis was first carried out on 115 normal colorectal mucosa samples and
59 peripheral blood mononuclear cell (PBMC) samples. As heritable variation in gene
expression is likely to be subtle, strong emphasis was placed on technical
methodology to minimise experimentally induced, non-biological variation,
including the extraction of high-quality RNA from primary tissue, the selection and
validation of reference genes for normalisation of gene expression quantification, and
internal validation of the samples and data processing. Thereafter, the
association between the 25 CRC risk variants and the expression of their cis-genes
was examined systematically, demonstrating that ten of these variants are also
tissue-specific eQTLs. This intermediate phenotype strongly suggests that they
confer risk, at least in part, by modifying regulatory mechanisms. One of the strongest
eQTL associations (Xp22.2) was investigated in further detail, revealing a novel indel
polymorphism (Indel24) in the distal promoter region of the target gene SHROOM2 that
influenced both transcript abundance and CRC risk more strongly than the original tagging
SNP. Functional verification with gene reporter assays indicated that Indel24
displays differential allelic control over transcriptional activity. Further in silico
analysis and mutations introduced into the reporter gene constructs provided evidence that Indel24
modulates transcription by modifying the spacing between CCAAT motifs and the
consequent binding affinity of the NF-Y transcription factor. siRNA depletion of NF-Y
was associated with a reduction in transcriptional activity of the Indel24 gene
construct as well as of endogenous SHROOM2, strongly supporting an
interaction between Indel24 and NF-Y in the transcriptional activation of
SHROOM2. Preliminary evidence suggests that SHROOM2 is expressed at the
top of the intestinal epithelial crypt and plays a role in cell cycle regulation.
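
As an illustration of the eQTL test described above, the following is a minimal sketch of an additive-model association between genotype and expression, assuming genotypes are coded as allele dosage (0, 1 or 2) and expression values are already normalised. The function names, the permutation test and the toy data are illustrative assumptions, not the pipeline used in the thesis.

```python
import random
from statistics import mean


def eqtl_fit(dosages, expression):
    """Least-squares fit of expression on allele dosage.

    Returns (slope, intercept, r_squared) for the additive model
    expression = intercept + slope * dosage.
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    syy = sum((y - my) ** 2 for y in expression)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


def permutation_p_value(dosages, expression, n_perm=1000, seed=0):
    """Empirical two-sided p-value for the slope, by shuffling expression."""
    rng = random.Random(seed)
    observed = abs(eqtl_fit(dosages, expression)[0])
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(eqtl_fit(dosages, shuffled)[0]) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data invented for illustration only (not thesis results).
    toy_dosage = [0, 1, 2, 0, 1, 2, 1, 0, 2, 1]
    toy_expr = [1.1, 1.4, 2.1, 0.9, 1.6, 1.9, 1.5, 1.0, 2.0, 1.4]
    print(eqtl_fit(toy_dosage, toy_expr))
    print(permutation_p_value(toy_dosage, toy_expr))
```

A permutation test is used here only because it needs no distributional assumptions or external libraries; a real analysis would typically also adjust for covariates and correct for testing many variant-gene pairs.
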
Hypothesis-driven approaches can also be useful in demonstrating the functionality of
CRC risk variants, complementing the hypothesis-free approach of eQTL analysis.
Guided by a recently discovered gene-environment interaction between the 16q22.1
risk variant and circulating vitamin D levels, the influence of the rs9929218 SNP on
CDH1 gene expression was examined, in relation to the expression of putative
regulatory genes derived from in silico analysis and studies of other target genes.
Although there was no direct association between rs9929218 and CDH1 expression,
multiple two-way interactions together suggested that rs9929218
influences the VDR/FOXO4 regulation of CDH1. This provides functional support
for the mechanism underlying the epidemiological observation of the gene-environment
interaction between 16q22.1 and vitamin D, and demonstrates a
candidate-based approach to deciphering the link between a genetic locus and CRC
susceptibility.
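
The abstract does not specify how the two-way interactions were modelled. One common way to test whether a genotype modifies the relationship between a regulator and its target is a linear model with an interaction term; the sketch below assumes that form. The names `genotype`, `regulator` and `target` (standing in for, say, rs9929218 dosage, VDR expression and CDH1 expression) and the toy data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_interaction(genotype, regulator, target):
    """Ordinary least squares for
    target = b0 + b1*genotype + b2*regulator + b3*genotype*regulator.

    Returns [b0, b1, b2, b3]; b3 is the interaction coefficient.
    """
    rows = [[1.0, g, r, g * r] for g, r in zip(genotype, regulator)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, target)) for i in range(k)]
    return solve_linear(xtx, xty)


if __name__ == "__main__":
    # Toy data invented for illustration only (not thesis results):
    # the regulator-target slope weakens as allele dosage increases.
    genotype = [0, 0, 1, 1, 2, 2, 0, 1, 2]
    regulator = [1.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0, 2.0, 1.0]
    target = [2.0 + r - 0.5 * g * r for g, r in zip(genotype, regulator)]
    print(fit_interaction(genotype, regulator, target))
```

In this form, a variant can show no main effect on the target (b1 near zero) while still altering how strongly the regulator drives it (b3 non-zero), which is the pattern an interaction without a direct association would produce.
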
In summary, the research presented in this thesis has validated the experimental
rationale of using expression studies of normal colorectal mucosa to home in on
the molecular mechanisms and susceptibility genes underlying the association
between common genetic variation and CRC risk.