Using Dirichlet Process Priors For Bayesian Mixture Clustering

Abstract

We describe a nonparametric Bayesian model that uses genotype data to assign individuals to populations when the total number of populations is unknown. The model assumes that each population is characterized by a set of allele frequencies that parameterize multinomial distributions over the observed alleles. A Dirichlet process is used as the prior distribution, so the number of populations does not have to be fixed in advance. The method estimates the number of populations jointly with the allele frequencies and the ancestry coefficients of each individual. Distance matrices and bootstrap support values computed from MCMC runs are used to build a phylogeny of the ancestral populations.
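
To illustrate how a Dirichlet process prior leaves the number of populations open, the following minimal sketch draws population labels from the Chinese restaurant process, a standard sequential representation of that prior. It is not the sampler described in this work: the function name `crp_assignments`, the concentration parameter `alpha`, and the sample size of 50 are assumptions made for illustration, and no genotype likelihood is included.

```python
import random


def crp_assignments(n_individuals, alpha, seed=0):
    """Draw population labels from a Chinese restaurant process (hypothetical helper).

    alpha is the concentration parameter (must be > 0); larger values
    tend to produce more populations.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = individuals already assigned to population k
    for _ in range(n_individuals):
        # An existing population k is chosen with weight counts[k];
        # a new population is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new population
        else:
            counts[k] += 1
        labels.append(k)
    return labels


# Prior draw only: the number of populations is an outcome, not an input.
labels = crp_assignments(50, alpha=1.0)
n_populations = len(set(labels))
```

In a full model of the kind summarized above, each draw of this sort would presumably be reweighted by the multinomial likelihood of an individual's genotype under each population's allele frequencies, so the posterior number of populations is informed by the data rather than by the prior alone.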
