276 research outputs found

Coalescent histories for lodgepole species trees

Authors: Disanto Filippo; Rosenberg Noah A.
Publication date: 01/01/2015

Coalescent histories are combinatorial structures that describe, for a given gene tree and species tree, the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the lodgepole species trees $(\lambda_n)_{n\geq 0}$, in which tree $\lambda_n$ has $m=2n+1$ taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with $m!!$ in the number of taxa $m$. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with $m$ taxa, increasing a previous bound of $(\sqrt{\pi} / 32)[(5m-12)/(4m-6)] m \sqrt{m}$ to $[ \sqrt{m-1}/(4 \sqrt{e}) ]^{m}$. We discuss the implications of our enumerative results for phylogenetic computations.
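To get a feel for the magnitudes in this abstract, here is a minimal Python sketch that evaluates the double factorial $m!!$ and the two bound expressions quoted in the abstract for a few values of $m = 2n+1$. The function names are illustrative and do not come from the paper, and reading the second expression as the improved bound is an assumption based on the abstract's wording.

```python
import math


def double_factorial(m: int) -> int:
    """Return m!! = m * (m - 2) * (m - 4) * ..., stopping at 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def previous_bound(m: int) -> float:
    """First expression in the abstract: (sqrt(pi)/32) * [(5m-12)/(4m-6)] * m * sqrt(m)."""
    return (math.sqrt(math.pi) / 32) * ((5 * m - 12) / (4 * m - 6)) * m * math.sqrt(m)


def improved_bound(m: int) -> float:
    """Second expression in the abstract (assumed to be the improved bound):
    [sqrt(m-1) / (4 * sqrt(e))]^m."""
    return (math.sqrt(m - 1) / (4 * math.sqrt(math.e))) ** m


if __name__ == "__main__":
    for n in (2, 10, 25, 50):
        m = 2 * n + 1  # number of taxa in the lodgepole tree lambda_n
        print(m, double_factorial(m), previous_bound(m), improved_bound(m))
```

The base $\sqrt{m-1}/(4\sqrt{e})$ exceeds 1 only once $m - 1 > 16e \approx 43.5$, so the second expression is the smaller of the two for small $m$ and overtakes the polynomial expression only for larger $m$.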

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

On the number of ranked species trees producing anomalous ranked gene trees

Authors: Disanto Filippo; Rosenberg Noah A.
Publication date: 01/01/2014

Analysis of probability distributions conditional on species trees has demonstrated the existence of anomalous ranked gene trees (ARGTs), ranked gene trees that are more probable than the ranked gene tree that accords with the ranked species tree. Here, to improve the characterization of ARGTs, we study enumerative and probabilistic properties of two classes of ranked labeled species trees, focusing on the presence or avoidance of certain subtree patterns associated with the production of ARGTs. We provide exact enumerations and asymptotic estimates for cardinalities of these sets of trees, showing that as the number of species increases without bound, the fraction of all ranked labeled species trees that are ARGT-producing approaches 1. This result extends beyond earlier existence results to provide a probabilistic claim about the frequency of ARGTs

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

A Population-Genetic Perspective on the Similarities and Differences among Worldwide Human Populations

Author: Rosenberg Noah A.
Publication venue: DigitalCommons@WayneState
Publication date: 01/12/2011

Recent studies have produced a variety of advances in the investigation of genetic similarities and differences among human populations. Here, I pose a series of questions about human population-genetic similarities and differences, and I then answer these questions by numerical computation with a single shared population-genetic dataset. The collection of answers obtained provides an introductory perspective for understanding key results on the features of worldwide human genetic variation.

Crossref

PubMed Central

Digital Commons@Wayne State University

Algorithms for Selecting Informative Marker Panels for Population Assignment

Author: Rosenberg Noah A.
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/11/2005

Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species (carp, cat, chicken, dog, fly, grayling, human, and maize), this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species (dog) producing this level of accuracy with only three markers, and the eighth species (human) requiring ∼13–16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they all are suitable for practical use.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/63393/1/cmb.2005.12.1183.pd
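The contrast between ranking markers one at a time and choosing them jointly can be illustrated with a small Python sketch. It is a generic greedy forward selection on toy data, not the paper's procedure: the marker names, the `MARKERS` table, and the `coverage` score (how many population pairs a panel can tell apart) are all hypothetical stand-ins for the assignment-accuracy criterion, and the "maximin" algorithm is not shown.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy data: each marker distinguishes a set of population pairs
# (pairs are just numbered 1..6 here).
MARKERS: Dict[str, Set[int]] = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3},
    "C": {5, 6},
    "D": {4},
}


def coverage(panel: List[str]) -> int:
    """Toy panel score: number of population pairs distinguished by at least one marker."""
    return len(set().union(*(MARKERS[name] for name in panel)))


def univariate_panel(markers: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick the k markers that score best individually (ties broken by name)."""
    ranked = sorted(markers, key=lambda name: (-len(markers[name]), name))
    return ranked[:k]


def greedy_panel(
    markers: Dict[str, Set[int]], k: int, score: Callable[[List[str]], int]
) -> List[str]:
    """Grow the panel one marker at a time, adding whichever marker most
    improves the score of the panel built so far."""
    panel: List[str] = []
    remaining = sorted(markers)
    while remaining and len(panel) < k:
        best = max(remaining, key=lambda name: score(panel + [name]))
        panel.append(best)
        remaining.remove(best)
    return panel


if __name__ == "__main__":
    uni = univariate_panel(MARKERS, 2)
    gre = greedy_panel(MARKERS, 2, coverage)
    print(uni, coverage(uni))
    print(gre, coverage(gre))
```

In this toy case, marker B is individually strong but redundant with A, so the univariate panel covers fewer pairs than the greedy panel; this is the kind of effect the abstract refers to when it says the individually most informative loci need not form the best panel.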

Deep Blue Documents at the University of Michigan

A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees

Authors: Lappo Egor; Rosenberg Noah A
Publication date: 09/09/2022

To a given gene tree topology $G$ and species tree topology $S$ with leaves labeled bijectively from a fixed set $X$, one can associate a set of ancestral configurations, each of which encodes a set of gene lineages that can be found at a given node of a species tree. We introduce a lattice structure on ancestral configurations, studying the directed graphs that provide graphical representations of lattices of ancestral configurations. For a matching gene tree topology and species tree topology, we present a method for defining the digraph of ancestral configurations from the tree topology by using iterated cartesian products of graphs. We show that a specific set of paths on the digraph of ancestral configurations is in bijection with the set of labeled histories, a well-known phylogenetic object that enumerates possible temporal orderings of the coalescences of a tree. For each of a series of tree families, we obtain closed-form expressions for the number of labeled histories by using this bijection to count paths on associated digraphs. Finally, we prove that our lattice construction extends to nonmatching tree pairs, and we use it to characterize pairs $(G,S)$ having the maximal number of ancestral configurations for a fixed $G$. We discuss how the construction provides new methods for performing enumerations of combinatorial aspects of gene and species trees.

Comment: 20 pages, 15 figures. This version contains reference updates, first author name update, minor changes to the tex
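For readers unfamiliar with labeled histories, the following Python sketch counts them for a small binary tree. It uses the standard product formula $(n-1)! / \prod_v (r(v) - 1)$, where the product runs over internal nodes $v$ and $r(v)$ is the number of leaves below $v$. This is not the lattice-path bijection described in the abstract; the nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
from math import factorial
from typing import Tuple, Union

# A tree is either a leaf label (str) or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def _leaves_and_product(tree: Tree) -> Tuple[int, int]:
    """Return (number of leaves, product over internal nodes of (leaves below - 1))."""
    if isinstance(tree, str):
        return 1, 1
    left, right = tree
    n_left, p_left = _leaves_and_product(left)
    n_right, p_right = _leaves_and_product(right)
    n = n_left + n_right
    return n, p_left * p_right * (n - 1)


def labeled_histories(tree: Tree) -> int:
    """Number of temporal orderings of the coalescences of a labeled binary tree."""
    n, product = _leaves_and_product(tree)
    return factorial(n - 1) // product


if __name__ == "__main__":
    caterpillar = ((("a", "b"), "c"), "d")
    balanced = (("a", "b"), ("c", "d"))
    print(labeled_histories(caterpillar))  # only one possible ordering
    print(labeled_histories(balanced))     # either cherry can coalesce first
```

A caterpillar has a single labeled history because its coalescences are nested, whereas the balanced four-leaf tree has two because its two cherries can coalesce in either order.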

arXiv.org e-Print Archive

A test of the influence of continental axes of orientation on patterns of human gene flow

Authors: Ramachandran Sohini; Rosenberg Noah A.
Publication venue: 'Wiley'
Publication date: 01/12/2011

The geographic distribution of genetic variation reflects trends in past population migrations and can be used to make inferences about these migrations. It has been proposed that the east–west orientation of the Eurasian landmass facilitated the rapid spread of ancient technological innovations across Eurasia, while the north–south orientation of the Americas led to a slower diffusion of technology there. If the diffusion of technology was accompanied by gene flow, then this hypothesis predicts that genetic differentiation in the Americas along lines of longitude will be greater than that in Eurasia along lines of latitude. We use 678 microsatellite loci from 68 indigenous populations in Eurasia and the Americas to investigate the spatial axes that underlie population-genetic variation. We find that genetic differentiation increases more rapidly along lines of longitude in the Americas than along lines of latitude in Eurasia. Distance along lines of latitude explains a sizeable portion of genetic distance in Eurasia, whereas distance along lines of longitude does not explain a large proportion of Eurasian genetic variation. Genetic differentiation in the Americas occurs along both latitudinal and longitudinal axes and has a greater magnitude than corresponding differentiation in Eurasia, even when adjusting for the lower level of genetic variation in the American populations. These results support the view that continental orientation has influenced migration patterns and has played an important role in determining both the structure of human genetic variation and the distribution and spread of cultural traits. Am J Phys Anthropol 2011. © 2011 Wiley Periodicals, Inc.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/88031/1/21533_ftp.pd

PubMed Central

Deep Blue Documents at the University of Michigan

A compendium of covariances and correlation coefficients of coalescent tree properties

Authors: Alimpiev Egor; Rosenberg Noah A
Publication date: 18/11/2021

Gene genealogies are frequently studied by measuring properties such as their height ($H$), length ($L$), sum of external branches ($E$), sum of internal branches ($I$), and mean of their two basal branches ($B$), and the coalescence times that contribute to the other genealogical features ($T$). These tree properties and their relationships can provide insight into the effects of population-genetic processes on genealogies and genetic sequences. Here, under the coalescent model, we study the 15 correlations among pairs of features of genealogical trees: $H_n$, $L_n$, $E_n$, $I_n$, $B_n$, and $T_k$ for a sample of size $n$, with $2 \leq k \leq n$. We report high correlations among $H_n$, $L_n$, $I_n$, and $B_n$, with all pairwise correlations of these quantities having values greater than or equal to $\sqrt{6} [6 \zeta(3) + 6 - \pi^2] / ( \pi \sqrt{18 + 9\pi^2 - \pi^4}) \approx 0.84930$ in the limit as $n \rightarrow \infty$. Although $E_n$ has an expectation of 2 for all $n$ and $H_n$ has expectation 2 in the limit as $n \rightarrow \infty$, their limiting correlation is 0. The results contribute toward understanding features of the shapes of coalescent trees.
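The closed-form constant quoted in this abstract can be checked numerically. The Python sketch below evaluates it with a partial-sum approximation of $\zeta(3)$, and also evaluates the expected tree height under the standard coalescent, assuming time is scaled so that each pair of lineages coalesces at rate 1 (so that $E[T_k] = 2/(k(k-1))$). The function names and the choice of time scaling are assumptions made for this illustration, not details taken from the paper.

```python
import math


def zeta3(terms: int = 100_000) -> float:
    """Approximate zeta(3) by a partial sum; the truncation error is below 1/(2*terms**2)."""
    # Summing the smallest terms first reduces floating-point rounding error.
    return sum(1.0 / k**3 for k in range(terms, 0, -1))


def limiting_correlation_bound() -> float:
    """Evaluate sqrt(6) [6 zeta(3) + 6 - pi^2] / (pi sqrt(18 + 9 pi^2 - pi^4))."""
    pi = math.pi
    numerator = math.sqrt(6) * (6 * zeta3() + 6 - pi**2)
    denominator = pi * math.sqrt(18 + 9 * pi**2 - pi**4)
    return numerator / denominator


def expected_height(n: int) -> float:
    """E[H_n] = sum over k = 2..n of E[T_k], with E[T_k] = 2 / (k (k - 1))."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


if __name__ == "__main__":
    print(round(limiting_correlation_bound(), 5))
    for n in (2, 10, 100, 1000):
        print(n, expected_height(n))
```

The sum for the expected height telescopes to $2(1 - 1/n)$, which approaches 2 as $n$ grows, in line with the limiting expectation stated in the abstract.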

arXiv.org e-Print Archive

Enumeration of coalescent histories for caterpillar species trees and $p$-pseudocaterpillar gene trees

Authors: Alimpiev Egor; Rosenberg Noah A
Publication date: 24/03/2021

For a fixed set $X$ containing $n$ taxon labels, an ordered pair consisting of a gene tree topology $G$ and a species tree $S$ bijectively labeled with the labels of $X$ possesses a set of coalescent histories: mappings from the set of internal nodes of $G$ to the set of edges of $S$ describing possible lists of edges in $S$ on which the coalescences in $G$ take place. Enumerations of coalescent histories for gene trees and species trees have produced suggestive results regarding the pairs $(G,S)$ that, for a fixed $n$, have the largest number of coalescent histories. We define a class of 2-cherry binary tree topologies that we term $p$-pseudocaterpillars, examining coalescent histories for non-matching pairs $(G,S)$, in the case in which $S$ has a caterpillar shape and $G$ has a $p$-pseudocaterpillar shape. Using a construction that associates coalescent histories for $(G,S)$ with a class of "roadblocked" monotonic paths, we identify the $p$-pseudocaterpillar labeled gene tree topology that, for a fixed caterpillar labeled species tree topology, gives rise to the largest number of coalescent histories. The shape that maximizes the number of coalescent histories places the "second" cherry of the $p$-pseudocaterpillar equidistantly from the root of the "first" cherry and from the tree root. A symmetry in the numbers of coalescent histories for $p$-pseudocaterpillar gene trees and caterpillar species trees is seen to exist around the maximizing value of the parameter $p$. The results provide insight into the factors that influence the number of coalescent histories possible for a given gene tree and species tree.
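The definition of a coalescent history in this abstract can be turned into a brute-force counter for small trees. The Python sketch below identifies each edge of $S$ with the internal node beneath it and counts mappings in which every internal node of $G$ is sent to a node of $S$ that is ancestral to all of its leaves, and no node of $G$ is mapped above its parent's image. It is a direct enumeration, not the "roadblocked" monotonic path construction of the paper; the nested-tuple encoding and function names are assumptions for illustration.

```python
from typing import FrozenSet, List, Tuple, Union

# A tree is either a leaf label (str) or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def internal_clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of all internal nodes; each stands for the edge above that node."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return [leaf_set(tree)] + internal_clusters(left) + internal_clusters(right)


def count_coalescent_histories(gene: Tree, species: Tree) -> int:
    """Count coalescent histories for (gene, species) on the same leaf labels."""
    s_clusters = internal_clusters(species)

    def count(g_node: Tree, bound: FrozenSet[str]) -> int:
        # bound is the cluster of the species node chosen for g_node's parent.
        if isinstance(g_node, str):
            return 1
        needed = leaf_set(g_node)
        left, right = g_node
        total = 0
        for s in s_clusters:
            # s must contain all leaves of g_node and lie at or below the bound.
            if needed <= s <= bound:
                total += count(left, s) * count(right, s)
        return total

    return count(gene, leaf_set(species))


if __name__ == "__main__":
    caterpillar = ((("a", "b"), "c"), "d")
    balanced = (("a", "b"), ("c", "d"))
    print(count_coalescent_histories(caterpillar, caterpillar))
    print(count_coalescent_histories(balanced, balanced))
    print(count_coalescent_histories(balanced, caterpillar))
```

The enumeration recomputes leaf sets at every step, so it is only practical for small trees, but it gives a reference against which closed-form counts can be checked.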

arXiv.org e-Print Archive

Consistency Properties of Species Tree Inference by Minimizing Deep Coalescences

Authors: Rosenberg Noah A.; Than Cuong V.
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2011

Methods for inferring species trees from sets of gene trees need to account for the possibility of discordance among the gene trees. Assuming that discordance is caused by incomplete lineage sorting, species tree estimates can be obtained by finding those species trees that minimize the number of "deep" coalescence events required for a given collection of gene trees. Efficient algorithms now exist for applying the minimizing-deep-coalescence (MDC) criterion, and simulation experiments have demonstrated its promising performance. However, it has also been noted from simulation results that the MDC criterion is not always guaranteed to infer the correct species tree estimate. In this article, we investigate the consistency of the MDC criterion. Using the multispecies coalescent model, we show that there are indeed anomaly zones for the MDC criterion for asymmetric four-taxon species tree topologies, and for all species tree topologies with five or more taxa.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/90434/1/cmb-2E2010-2E0102.pd
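As a rough illustration of what the MDC criterion scores, the Python sketch below computes the number of extra lineages a gene tree requires on a candidate species tree, using the common formulation in which each species-tree edge contributes the number of maximal gene-tree clades contained in the cluster below it, minus one. It assumes one sampled gene copy per species and identical leaf labels in both trees; the nested-tuple encoding and function names are hypothetical, and this is not the efficient algorithm the abstract alludes to.

```python
from typing import FrozenSet, List, Tuple, Union

# A tree is either a leaf label (str) or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def all_clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node (leaves included); one cluster per species-tree edge,
    plus the root cluster, which never contributes extra lineages."""
    if isinstance(tree, str):
        return [leaf_set(tree)]
    left, right = tree
    return [leaf_set(tree)] + all_clusters(left) + all_clusters(right)


def maximal_clades(gene: Tree, cluster: FrozenSet[str]) -> int:
    """Number of maximal gene-tree clades whose leaves all lie inside the cluster."""
    if leaf_set(gene) <= cluster:
        return 1
    if isinstance(gene, str):
        return 0
    left, right = gene
    return maximal_clades(left, cluster) + maximal_clades(right, cluster)


def extra_lineages(gene: Tree, species: Tree) -> int:
    """Deep-coalescence cost of one gene tree on one species tree."""
    return sum(maximal_clades(gene, c) - 1 for c in all_clusters(species))


def mdc_score(gene_trees: List[Tree], species: Tree) -> int:
    """Total cost over a collection of gene trees; MDC prefers species trees with low totals."""
    return sum(extra_lineages(g, species) for g in gene_trees)


if __name__ == "__main__":
    species = (("a", "b"), "c")
    print(extra_lineages((("a", "b"), "c"), species))  # matching gene tree
    print(extra_lineages((("a", "c"), "b"), species))  # discordant gene tree
```

Under this scoring, a species tree estimate would be obtained by searching over candidate species trees for the lowest `mdc_score`; the inconsistency result in the abstract concerns cases where the minimizer differs from the true species tree.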

Deep Blue Documents at the University of Michigan