Coalescent histories for lodgepole species trees
Coalescent histories are combinatorial structures that describe for a given
gene tree and species tree the possible lists of branches of the species tree
on which the gene tree coalescences take place. Properties of the number of
coalescent histories for gene trees and species trees affect a variety of
probabilistic calculations in mathematical phylogenetics. Exact and asymptotic
evaluations of the number of coalescent histories, however, are known only in a
limited number of cases. Here we introduce a particular family of species
trees, the \emph{lodgepole} species trees , in which
tree has taxa. We determine the number of coalescent
histories for the lodgepole species trees, in the case that the gene tree
matches the species tree, showing that this number grows with in the
number of taxa . This computation demonstrates the existence of tree
families in which the growth in the number of coalescent histories is faster
than exponential. Further, it provides a substantial improvement on the lower
bound for the ratio of the largest number of matching coalescent histories to
the smallest number of matching coalescent histories for trees with taxa,
increasing a previous bound of
to . We discuss the implications of our
enumerative results for phylogenetic computations.
On the number of ranked species trees producing anomalous ranked gene trees
Analysis of probability distributions conditional on species trees has
demonstrated the existence of anomalous ranked gene trees (ARGTs), ranked gene
trees that are more probable than the ranked gene tree that accords with the
ranked species tree. Here, to improve the characterization of ARGTs, we study
enumerative and probabilistic properties of two classes of ranked labeled
species trees, focusing on the presence or avoidance of certain subtree
patterns associated with the production of ARGTs. We provide exact enumerations
and asymptotic estimates for cardinalities of these sets of trees, showing that
as the number of species increases without bound, the fraction of all ranked
labeled species trees that are ARGT-producing approaches 1. This result extends
beyond earlier existence results to provide a probabilistic claim about the
frequency of ARGTs.
A Population-Genetic Perspective on the Similarities and Differences among Worldwide Human Populations
Recent studies have produced a variety of advances in the investigation of genetic similarities and differences among human populations. Here, I pose a series of questions about human population-genetic similarities and differences, and I then answer these questions by numerical computation with a single shared population-genetic dataset. The collection of answers obtained provides an introductory perspective for understanding key results on the features of worldwide human genetic variation.
Algorithms for Selecting Informative Marker Panels for Population Assignment
Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species (carp, cat, chicken, dog, fly, grayling, human, and maize), this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species, dog, producing this level of accuracy with only three markers, and the eighth species, human, requiring ∼13–16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they all are suitable for practical use.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/63393/1/cmb.2005.12.1183.pd
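The contrast drawn in the abstract above, between picking the individually best markers and building a panel greedily, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the toy markers `A`, `B`, `C`, and the toy score (how many hypothetical population pairs a panel can distinguish) are not taken from the paper, whose actual scoring criterion is assignment accuracy.

```python
from typing import Callable, Dict, List, Set

Score = Callable[[List[str]], float]


def univariate_panel(markers: List[str], score: Score, k: int) -> List[str]:
    """Rank markers by their score when used alone and keep the top k."""
    return sorted(markers, key=lambda m: score([m]), reverse=True)[:k]


def greedy_panel(markers: List[str], score: Score, k: int) -> List[str]:
    """Add, one at a time, the marker that most improves the current panel."""
    panel: List[str] = []
    remaining = list(markers)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda m: score(panel + [m]))
        panel.append(best)
        remaining.remove(best)
    return panel


# Hypothetical toy data: each marker distinguishes a set of population pairs.
# A and B are individually strong but redundant; C is weaker but complementary.
TOY_MARKERS: Dict[str, Set[int]] = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {4, 5}}


def toy_score(panel: List[str]) -> float:
    """Toy panel score: number of distinct population pairs distinguished."""
    return len(set().union(*(TOY_MARKERS[m] for m in panel)))


uni = univariate_panel(list(TOY_MARKERS), toy_score, 2)
gre = greedy_panel(list(TOY_MARKERS), toy_score, 2)
print(uni, toy_score(uni))
print(gre, toy_score(gre))
```

In this toy setting the univariate ranking selects two redundant markers, while the greedy procedure selects a complementary pair with a higher panel score. The greedy choice is still not guaranteed to be optimal in general, in line with the abstract's caveat.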
A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees
To a given gene tree topology and species tree topology with leaves
labeled bijectively from a fixed set , one can associate a set of ancestral
configurations, each of which encodes a set of gene lineages that can be found
at a given node of a species tree. We introduce a lattice structure on
ancestral configurations, studying the directed graphs that provide graphical
representations of lattices of ancestral configurations. For a matching gene
tree topology and species tree topology, we present a method for defining the
digraph of ancestral configurations from the tree topology by using iterated
cartesian products of graphs. We show that a specific set of paths on the
digraph of ancestral configurations is in bijection with the set of labeled
histories -- a well-known phylogenetic object that enumerates possible temporal
orderings of the coalescences of a tree. For each of a series of tree families,
we obtain closed-form expressions for the number of labeled histories by using
this bijection to count paths on associated digraphs. Finally, we prove that
our lattice construction extends to nonmatching tree pairs, and we use it to
characterize pairs having the maximal number of ancestral
configurations for a fixed . We discuss how the construction provides new
methods for performing enumerations of combinatorial aspects of gene and
species trees.
Comment: 20 pages, 15 figures. This version contains reference updates, first
author name update, minor changes to the tex
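As a point of comparison for the labeled-history counts mentioned in the abstract above, the following sketch uses the standard product formula for the number of linear extensions of a tree-shaped partial order, not the path-counting bijection that the abstract describes. The nested-tuple tree encoding and the function name `labeled_histories` are assumptions made for this example.

```python
from math import factorial


def labeled_histories(tree) -> int:
    """Count labeled histories of a rooted binary labeled tree.

    A tree is a leaf label (any non-tuple value) or a pair (left, right).
    The count is (number of internal nodes)! divided by the product, over
    internal nodes v, of the number of internal nodes in the subtree at v.
    """

    def walk(node):
        # Returns (internal nodes in subtree, product of subtree sizes).
        if not isinstance(node, tuple):
            return 0, 1
        left, right = node
        left_size, left_prod = walk(left)
        right_size, right_prod = walk(right)
        size = left_size + right_size + 1
        return size, left_prod * right_prod * size

    internal, product = walk(tree)
    return factorial(internal) // product


caterpillar4 = ((("a", "b"), "c"), "d")
balanced4 = (("a", "b"), ("c", "d"))
print(labeled_histories(caterpillar4))  # 1
print(labeled_histories(balanced4))     # 2
```

A caterpillar has a single labeled history because its coalescences are totally ordered, whereas the balanced four-leaf tree has two because its two cherries can coalesce in either order.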
A test of the influence of continental axes of orientation on patterns of human gene flow
The geographic distribution of genetic variation reflects trends in past population migrations and can be used to make inferences about these migrations. It has been proposed that the east–west orientation of the Eurasian landmass facilitated the rapid spread of ancient technological innovations across Eurasia, while the north–south orientation of the Americas led to a slower diffusion of technology there. If the diffusion of technology was accompanied by gene flow, then this hypothesis predicts that genetic differentiation in the Americas along lines of longitude will be greater than that in Eurasia along lines of latitude. We use 678 microsatellite loci from 68 indigenous populations in Eurasia and the Americas to investigate the spatial axes that underlie population‐genetic variation. We find that genetic differentiation increases more rapidly along lines of longitude in the Americas than along lines of latitude in Eurasia. Distance along lines of latitude explains a sizeable portion of genetic distance in Eurasia, whereas distance along lines of longitude does not explain a large proportion of Eurasian genetic variation. Genetic differentiation in the Americas occurs along both latitudinal and longitudinal axes and has a greater magnitude than corresponding differentiation in Eurasia, even when adjusting for the lower level of genetic variation in the American populations. These results support the view that continental orientation has influenced migration patterns and has played an important role in determining both the structure of human genetic variation and the distribution and spread of cultural traits.
Am J Phys Anthropol 2011. © 2011 Wiley Periodicals, Inc.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/88031/1/21533_ftp.pd
A compendium of covariances and correlation coefficients of coalescent tree properties
Gene genealogies are frequently studied by measuring properties such as their
height (), length (), sum of external branches (), sum of internal
branches (), and mean of their two basal branches (), and the coalescence
times that contribute to the other genealogical features (). These tree
properties and their relationships can provide insight into the effects of
population-genetic processes on genealogies and genetic sequences. Here, under
the coalescent model, we study the 15 correlations among pairs of features of
genealogical trees: , , , , , and for a sample
of size , with . We report high correlations among ,
, and , with all pairwise correlations of these quantities
having values greater than or equal to in the limit as . Although has an expectation of 2 for all and has
expectation 2 in the limit as , their limiting
correlation is 0. The results contribute toward understanding features of the
shapes of coalescent trees.
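For orientation, the expectations of two of the tree properties discussed above follow from a standard fact about the coalescent: while k lineages remain, the waiting time to the next coalescence has mean 2/(k(k-1)) in coalescent time units. The sketch below computes the expected height and expected total length exactly; the function names are assumptions made for this example, and the covariances and correlations studied in the paper are not reproduced here.

```python
from fractions import Fraction


def expected_coalescence_time(k: int) -> Fraction:
    """Mean waiting time while k lineages remain (coalescent time units)."""
    return Fraction(2, k * (k - 1))


def expected_height(n: int) -> Fraction:
    """Expected tree height: sum of the mean waiting times for k = 2..n."""
    return sum((expected_coalescence_time(k) for k in range(2, n + 1)), Fraction(0))


def expected_length(n: int) -> Fraction:
    """Expected total branch length: k lineages persist during each wait."""
    return sum((k * expected_coalescence_time(k) for k in range(2, n + 1)), Fraction(0))


for n in (2, 10, 100):
    print(n, expected_height(n), expected_length(n))
```

The expected height equals 2(1 - 1/n), which approaches 2 as the sample size grows, while the expected total length equals twice the (n-1)th harmonic number and grows without bound.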
Enumeration of coalescent histories for caterpillar species trees and -pseudocaterpillar gene trees
For a fixed set containing taxon labels, an ordered pair consisting
of a gene tree topology and a species tree bijectively labeled with the
labels of possesses a set of coalescent histories -- mappings from the set
of internal nodes of to the set of edges of describing possible lists
of edges in on which the coalescences in take place. Enumerations of
coalescent histories for gene trees and species trees have produced suggestive
results regarding the pairs that, for a fixed , have the largest
number of coalescent histories. We define a class of 2-cherry binary tree
topologies that we term -pseudocaterpillars, examining coalescent histories
for non-matching pairs , in the case in which has a caterpillar
shape and has a -pseudocaterpillar shape. Using a construction that
associates coalescent histories for with a class of "roadblocked"
monotonic paths, we identify the -pseudocaterpillar labeled gene tree
topology that, for a fixed caterpillar labeled species tree topology, gives
rise to the largest number of coalescent histories. The shape that maximizes
the number of coalescent histories places the "second" cherry of the
-pseudocaterpillar equidistantly from the root of the "first" cherry and
from the tree root. A symmetry in the numbers of coalescent histories for
-pseudocaterpillar gene trees and caterpillar species trees is seen to exist
around the maximizing value of the parameter . The results provide insight
into the factors that influence the number of coalescent histories possible for
a given gene tree and species tree.
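The link between coalescent histories and monotonic paths can be made concrete in the simplest setting. A known result that this abstract does not itself state is that, for a matching caterpillar gene tree and species tree with n taxa, the number of coalescent histories is the Catalan number C_{n-1}, which also counts monotonic lattice paths that stay on or below the diagonal of an (n-1) by (n-1) grid. The sketch below counts such paths by dynamic programming. The `blocked` parameter is a hypothetical device showing how forbidden lattice points reduce the count; it is not the paper's "roadblocked" construction.

```python
from math import comb
from typing import FrozenSet, Tuple


def catalan(m: int) -> int:
    """The m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


def count_monotonic_paths(
    m: int, blocked: FrozenSet[Tuple[int, int]] = frozenset()
) -> int:
    """Count paths from (0, 0) to (m, m) using unit right and up steps,
    staying on or below the diagonal (y <= x) and avoiding blocked points."""
    paths = [[0] * (m + 1) for _ in range(m + 1)]
    paths[0][0] = 1
    for x in range(m + 1):
        for y in range(x + 1):
            if (x, y) == (0, 0) or (x, y) in blocked:
                continue  # the start is preset; blocked points stay at zero
            from_left = paths[x - 1][y] if x > 0 else 0
            from_below = paths[x][y - 1] if y > 0 else 0
            paths[x][y] = from_left + from_below
    return paths[m][m]


for n_taxa in range(2, 8):
    m = n_taxa - 1
    print(n_taxa, count_monotonic_paths(m), catalan(m))
```

For four taxa, both counts equal 5. Blocking a point on the grid, for example `count_monotonic_paths(2, frozenset({(1, 1)}))`, removes every path through that point and lowers the total.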
Consistency Properties of Species Tree Inference by Minimizing Deep Coalescences
Methods for inferring species trees from sets of gene trees need to account for the possibility of discordance among the gene trees. Assuming that discordance is caused by incomplete lineage sorting, species tree estimates can be obtained by finding those species trees that minimize the number of "deep" coalescence events required for a given collection of gene trees. Efficient algorithms now exist for applying the minimizing-deep-coalescence (MDC) criterion, and simulation experiments have demonstrated its promising performance. However, it has also been noted from simulation results that the MDC criterion is not always guaranteed to infer the correct species tree estimate. In this article, we investigate the consistency of the MDC criterion. Using the multispecies coalescent model, we show that there are indeed anomaly zones for the MDC criterion for asymmetric four-taxon species tree topologies, and for all species tree topologies with five or more taxa.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/90434/1/cmb-2E2010-2E0102.pd