Non-canonical base pairs that escape the proof-reading activity of the DNA
polymerase emerge from DNA replication as DNA mismatches. To promote
genomic integrity, these DNA mismatches are corrected by a secondary
protection system, called DNA mismatch repair (MMR). Understanding the
details of MMR is important for human health as defects in mismatch repair can
result in cancer (e.g. hereditary nonpolyposis colorectal cancer, also known as
Lynch syndrome).
Because mismatches arise stochastically, they can emerge at random
locations in a chromosome. Therefore, using a molecular tool to generate
substrates for the MMR system at a defined locus has been particularly useful in
my study of DNA mismatch repair in vivo. In this study, I have used a CTG•CAG
trinucleotide repeat array, also called the “TNR array”, to generate frequent substrates for the
MMR system in Escherichia coli. In E. coli, the MMR system searches for hemimethylated
GATC motifs around a mismatch to initiate removal of the faulty
nascent (un-methylated) strand. Analysing the usage of GATC motifs around the
TNR array, I have found that the MMR system preferentially utilizes the GATC
motifs on the origin distal side of the TNR array, demonstrating that the
bidirectionality of MMR observed in vitro is constrained in live cells. My results suggest
that in vivo MMR operates by searching for the nearest hemimethylated GATC
site located between the mismatch and the replication fork, and that excision of the
nascent strand proceeds directionally away from the fork towards the mismatch.
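As an illustration only, the site-selection rule suggested above can be sketched in Python. This is a toy model under stated assumptions, not the analysis used in this study: the function names, the `fork_direction` parameter and the example sequence are hypothetical, every GATC site is assumed to be hemimethylated, and the fork is assumed to lie on the origin distal side of the mismatch.

```python
import re


def gatc_positions(seq):
    """Return the 0-based start position of every GATC motif in seq."""
    return [m.start() for m in re.finditer("GATC", seq.upper())]


def nearest_fork_side_gatc(seq, mismatch_pos, fork_direction=1):
    """Pick the GATC site closest to the mismatch on the fork side.

    fork_direction is +1 if the fork travels towards higher coordinates
    and -1 otherwise (hypothetical parameter). All sites are assumed to
    be hemimethylated. Returns None if no site lies between the
    mismatch and the fork.
    """
    if fork_direction not in (1, -1):
        raise ValueError("fork_direction must be +1 or -1")
    # Keep only sites on the same side of the mismatch as the fork.
    candidates = [p for p in gatc_positions(seq)
                  if (p - mismatch_pos) * fork_direction > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p - mismatch_pos))


# Toy sequence: GATC sites at 0, 35 and 44; "mismatch" placed at position 14.
seq = "GATC" + "A" * 10 + "T" + "A" * 20 + "GATC" + "A" * 5 + "GATC"
print(gatc_positions(seq))                 # [0, 35, 44]
print(nearest_fork_side_gatc(seq, 14, 1))  # 35
```

In this sketch the site at position 0 is ignored even though it is closer to the mismatch than the site at 44, because it lies on the origin proximal side.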
Previous in vitro studies have established that the excision reaction during MMR
terminates at a discrete point about 100 bp beyond a mismatch. However, in
vivo recombination at a 275 bp tandem repeat, which has been proposed to be
mediated by single-stranded DNA generated during the excision reaction, has
suggested that the end point of the excision reaction in live cells may extend
much further from the mismatch than this. I have used this assay for extended
excision to determine the influence of GATC sites on excision tracts. In this
study, modification of the GATC motifs on the origin proximal side of the TNR array
has shown that the excision reaction does not stop at a GATC motif on the origin
proximal side of the mismatch. In addition, sequential modification of GATC
motifs on the origin distal side of the TNR array, which shifted the start point
of the excision reaction further from the mismatch, has suggested that the length of
an excision tract is a function of the distance covered from the start point rather
than from the mismatch.
My observation of directionality with respect to DNA replication in the
recognition of GATC sites suggested that MMR and DNA replication might be
coupled in some way and that perhaps active (or blocked) MMR might impede
the progress of the replication fork. However, no replication intermediates were
detected using two-dimensional agarose gel electrophoresis of a genomic DNA
fragment containing the TNR array, generated by restriction digestion. I was therefore
unable to support the hypothesis that active or blocked MMR led to a slowing
down of DNA replication.
Given my observation that MMR decreases when the mismatch is separated from
the closest origin distal GATC site, I set out to test whether MMR exerts any
selection pressure on the genomic distribution of GATC motifs. To do this, I
generated artificial model genomes using a Markovian algorithm based on the
nucleotide composition and codon usage in E. coli. Strikingly, comparison of
the distribution of GATC motifs in the E. coli genome with that in the artificial
sequences has shown that GATC motifs are distributed randomly in the E. coli
genome, except for a small clustering effect detected for closely
spaced (0-40 base pairs) GATC motifs. The observed distribution of slightly
over-represented GATC motifs in the E. coli genome appears to be a function of
the total number of GATC motifs, and it seems that the DNA mismatch repair
system has evolved to utilize the natural distribution of GATC motifs to maintain
genomic integrity.
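The comparison between a real genome and Markov-generated sequences can be illustrated with a minimal Python sketch. It is a simplification and not the algorithm used in this study: it uses a first-order nucleotide Markov chain and ignores codon usage, the function names are hypothetical, the "reference" is a random stand-in rather than E. coli data, and spacing is measured from motif start to motif start, which is an assumption about how the 0-40 base pair class is defined.

```python
import random
import re
from collections import Counter, defaultdict


def train_first_order(seq):
    """Count nucleotide transitions seq[i] -> seq[i + 1]."""
    counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts


def generate(counts, length, rng):
    """Sample a sequence of the given length from the transition counts."""
    if length <= 0:
        return ""
    states = sorted(counts)
    if not states:
        raise ValueError("no transitions to sample from")
    current = rng.choice(states)
    out = [current]
    while len(out) < length:
        nxt = counts.get(current)
        if not nxt:
            # Base with no observed successor: restart from a random state.
            current = rng.choice(states)
        else:
            bases = sorted(nxt)
            current = rng.choices(bases, weights=[nxt[b] for b in bases])[0]
        out.append(current)
    return "".join(out)


def gatc_spacings(seq):
    """Distances (bp) between the starts of consecutive GATC motifs."""
    pos = [m.start() for m in re.finditer("GATC", seq)]
    return [b - a for a, b in zip(pos, pos[1:])]


def fraction_short(spacings, cutoff=40):
    """Fraction of spacings at or below cutoff; 0.0 if there are none."""
    if not spacings:
        return 0.0
    return sum(s <= cutoff for s in spacings) / len(spacings)


rng = random.Random(0)
# Stand-in reference sequence; a real analysis would load the genome here.
reference = "".join(rng.choice("ACGT") for _ in range(20000))
artificial = generate(train_first_order(reference), len(reference), rng)
print(fraction_short(gatc_spacings(reference)),
      fraction_short(gatc_spacings(artificial)))
```

With a real genome as `reference`, a higher short-spacing fraction in the genome than in many artificial sequences would correspond to the kind of clustering effect described above. A single comparison such as this one carries no statistical weight.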