
Bayesian analysis of models of population divergence for SNP variation data

Abstract

Probabilistic models of genetic differentiation between populations typically fail to account for the effect of complex ancestry. A Bayesian hierarchical model proposed by Nicholson et al. (2002) (ND) provides a framework for assessing differentiation using population-wise parameters for single-nucleotide polymorphism (SNP) data, under certain assumptions regarding the evolution of allele frequencies over time. Although the ND model offers a coherent method for estimating population divergence, it relies on a rather simplistic assumption about the historical evolution of populations. Shared ancestry between populations results in correlations in allele frequencies, and the prospect of capturing such correlations motivates the development of the new model reported here. This thesis presents a review of the ND model using simulated and newly available SNP data, highlighting situations where the ND model does and does not fit the data well. The model was fitted using Markov chain Monte Carlo (MCMC) methods, and the fit was assessed using residual diagnostics. Nicholson et al. (2002) reported instability in parameter estimates when a population was removed from the data set and the model re-fitted. Analysis of simulated data confirms that this instability is not an inherent property of the ND model, so it can be used to highlight discrepancies between the model and the data. Analyses of real data show that the ND model works well for groups of Europeans with low levels of genetic differentiation between populations, but a lack of fit is found when groups of populations dispersed across continents are considered. Data are also simulated under an alternative ancestral configuration, and it is shown that lack of fit, manifest in residuals and estimator instability, is present when these data are analysed using the ND model.
An extension to the ND model is developed and fitted, on the supposition that departures from the modelling assumptions of the ND model are due to the effect of alternative ancestral relationships. The ND model and the new model are compared with respect to their fit to various data sets. In some cases the new model provides a better fit, while in others the distinction is unclear. The new model is also used to infer the most likely ancestral relationships between populations sampled from the Human Genome Diversity Panel.
