Many aspects of the historical relationships between populations in a species
are reflected in genetic data. Inferring these relationships from genetic data,
however, remains a challenging task. In this paper, we present a statistical
model for inferring the patterns of population splits and mixtures in multiple
populations. In this model, the sampled populations in a species are related to
their common ancestor through a graph of ancestral populations. Using
genome-wide allele frequency data and a Gaussian approximation to genetic
drift, we infer the structure of this graph. We applied this method to a set of
55 human populations and a set of 82 dog breeds and wild canids. In both
species, we show that a simple bifurcating tree does not fully describe the
data; instead, we infer many migration events. While some of the migration
events that we find have been detected previously, many have not. For example,
in the human data we infer that Cambodians trace approximately 16% of their
ancestry to a population ancestral to other extant East Asian populations. In
the dog data, we infer that both the boxer and basenji trace a considerable
fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to
domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese)
result from admixture between modern toy breeds and "ancient" Asian breeds.
Software implementing the model described here, called TreeMix, is available at
http://treemix.googlecode.com.

Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15
figures. This is an updated version of the preprint available at
http://precedings.nature.com/documents/6956/version/
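The abstract does not spell out the model's equations. The following is therefore only a minimal sketch under stated assumptions, not TreeMix's implementation. It illustrates why a Gaussian approximation to drift on a graph of populations makes allele-frequency covariances informative about splits and mixtures. The sketch assumes three things. First, each branch adds independent Gaussian noise with variance `c * p0 * (1 - p0)`, where `p0` is the root allele frequency and `c` is that branch's drift parameter. Second, a migration event makes the admixed population's frequency a weighted average `W * f_source + (1 - W) * f_other`. Third, the admixed population then drifts further along its own branch. The graph, the branch lengths in `LENGTHS`, and the weight `W = 0.25` are hypothetical. Under these assumptions, the expected covariance between two populations is the drift on the branches they share, weighted by admixture proportions. The simulation propagates frequencies down the graph explicitly as a check.

```python
import random
from collections import defaultdict

# Hypothetical graph: root -> I -> {A, B}, root -> C, and an admixed
# population X = W*A + (1-W)*C that then drifts along its own branch "X".
LENGTHS = {"I": 0.1, "A": 0.05, "B": 0.05, "C": 0.2, "X": 0.02}  # assumed drift parameters c
W = 0.25  # assumed admixture weight: fraction of X's ancestry from A


def combine(*terms):
    """Linear combination of paths: sum of weight * path over (weight, path) pairs."""
    out = defaultdict(float)
    for weight, path in terms:
        for branch, coef in path.items():
            out[branch] += weight * coef
    return dict(out)


# Each population's deviation from p0 is a linear combination of the
# independent per-branch drift terms; an admixture edge mixes the combinations.
PATHS = {
    "A": {"I": 1.0, "A": 1.0},
    "B": {"I": 1.0, "B": 1.0},
    "C": {"C": 1.0},
}
PATHS["X"] = combine((W, PATHS["A"]), (1 - W, PATHS["C"]), (1.0, {"X": 1.0}))


def expected_cov(p, q, h):
    """Expected covariance of frequencies in p and q: only shared branches
    contribute, each weighted by its coefficients and scaled by h = p0(1-p0)."""
    return h * sum(c * PATHS[q].get(b, 0.0) * LENGTHS[b] for b, c in PATHS[p].items())


def simulate(n_snps, p0, seed):
    """Propagate frequencies down the graph under the Gaussian drift approximation.
    Frequencies may leave [0, 1]; the approximation ignores this."""
    rng = random.Random(seed)
    sd = {b: (c * p0 * (1.0 - p0)) ** 0.5 for b, c in LENGTHS.items()}
    freqs = {pop: [] for pop in PATHS}
    for _ in range(n_snps):
        f_i = p0 + rng.gauss(0.0, sd["I"])
        f_a = f_i + rng.gauss(0.0, sd["A"])
        f_b = f_i + rng.gauss(0.0, sd["B"])
        f_c = p0 + rng.gauss(0.0, sd["C"])
        f_x = W * f_a + (1 - W) * f_c + rng.gauss(0.0, sd["X"])  # admixture, then drift
        for pop, f in zip("ABCX", (f_a, f_b, f_c, f_x)):
            freqs[pop].append(f)
    return freqs


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


if __name__ == "__main__":
    p0 = 0.5
    freqs = simulate(50_000, p0, seed=1)
    for p, q in [("A", "B"), ("A", "C"), ("X", "A"), ("X", "C"), ("X", "X")]:
        print(p, q, round(sample_cov(freqs[p], freqs[q]), 4),
              round(expected_cov(p, q, p0 * (1 - p0)), 4))
```

In this toy graph, X covaries with both A and C even though they share no branch in the underlying tree. A strictly bifurcating tree cannot reproduce that pattern, while adding one weighted migration edge can.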