Epidemiology and Public Health, Imperial College London
Abstract
In this thesis I focus on the development and application of a hidden Markov model
(HMM) to solve problems in statistical genetics. Our method, based on an HMM,
models the joint haplotype structure in the samples, where the parameters in the
model are estimated by the Baum-Welch EM algorithm. Also, the model does not
require pre-defined blocks or a sliding window scheme to define haplotype boundaries. Thus our method is computationally efficient and applicable to either
whole-genome or candidate-gene sequence data.
The first application of this model is for disease association testing using inferred
ancestral haplotypes. We employed an HMM to cluster haplotypes into groups of
predicted common ancestral haplotypes from diploid genotypes. The results from
simulation studies show that our method greatly outperforms single-SNP analyses
and has greater power than a haplotype-based method, CLADHC, in most simulation scenarios. The second application is for inferring haplotypic phase and
predicting missing genotypes in polyploid organisms. Using a simulation study, we
demonstrate that the method provides accurate estimates of haplotypic phase and
missing genotypes for diploids, triploids and tetraploids. The third application is
for joint inference of CNV/SNP haplotypes and missing data. The results for this
application are very encouraging.
With the increasing availability of genotype data in both diploid and polyploid
organisms, we believe that our programs can facilitate the investigation of genetic
variation in genome-wide studies.