1,443 research outputs found
Counting Coalescent Histories
Given a species tree and a gene tree, a valid coalescent history is a list of the branches of the species tree on which coalescences in the gene tree take place. I develop a recursion for the number of valid coalescent histories that exist for an arbitrary gene tree/species tree pair, when one gene lineage is studied per species. The result is obtained by defining a concept of m-extended coalescent histories, enumerating and counting these histories, and taking the special case of m = 1. As a sum over valid coalescent histories appears in a formula for the probability that a random gene tree evolving along the branches of a fixed species tree has a specified labeled topology, the enumeration of valid coalescent histories can considerably reduce the effort required for evaluating this formula.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/63175/1/cmb.2006.0109.pd
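The definition in the abstract above can be made concrete with a small brute-force count. This is a minimal sketch, not the paper's recursion: it assumes trees written as nested tuples of taxon names, names each internal species-tree branch by the set of taxa below it, and uses the hypothetical helper names `clades` and `count_histories`. A gene-tree coalescence may sit on a species-tree branch only if that branch's taxa include the coalescing taxa, and a descendant coalescence may not sit above an ancestral one.

```python
from itertools import product

def clades(tree):
    """Leaf sets (frozensets) of every internal node of a nested-tuple tree."""
    out = []
    def walk(node):
        if isinstance(node, str):          # a leaf is a taxon name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves
    walk(tree)
    return out

def count_histories(gene_tree, species_tree):
    """Brute-force count of valid coalescent histories (one lineage per species)."""
    g = clades(gene_tree)
    s = clades(species_tree)               # branch above each internal species node
    # A gene coalescence can only occur on a branch whose taxa contain its clade.
    options = [[b for b in s if gc <= b] for gc in g]
    count = 0
    for assignment in product(*options):
        # A descendant gene node must be placed at or below its ancestor's branch.
        ok = all(
            assignment[i] <= assignment[j]
            for i, gi in enumerate(g)
            for j, gj in enumerate(g)
            if gi < gj
        )
        count += ok
    return count

caterpillar4 = ((("A", "B"), "C"), "D")
balanced4 = (("A", "B"), ("C", "D"))
print(count_histories(caterpillar4, caterpillar4))
print(count_histories(balanced4, balanced4))
```

The enumeration is exponential in the number of taxa, so it is only useful for checking small cases by hand; that cost is what a counting recursion avoids.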
Coalescent histories for lodgepole species trees
Coalescent histories are combinatorial structures that describe for a given
gene tree and species tree the possible lists of branches of the species tree
on which the gene tree coalescences take place. Properties of the number of
coalescent histories for gene trees and species trees affect a variety of
probabilistic calculations in mathematical phylogenetics. Exact and asymptotic
evaluations of the number of coalescent histories, however, are known only in a
limited number of cases. Here we introduce a particular family of species
trees, the \emph{lodgepole} species trees , in which
tree has taxa. We determine the number of coalescent
histories for the lodgepole species trees, in the case that the gene tree
matches the species tree, showing that this number grows with in the
number of taxa . This computation demonstrates the existence of tree
families in which the growth in the number of coalescent histories is faster
than exponential. Further, it provides a substantial improvement on the lower
bound for the ratio of the largest number of matching coalescent histories to
the smallest number of matching coalescent histories for trees with taxa,
increasing a previous bound of
to . We discuss the implications of our
enumerative results for phylogenetic computations.
Importance sampling for Lambda-coalescents in the infinitely many sites model
We present and discuss new importance sampling schemes for the approximate
computation of the sample probability of observed genetic types in the
infinitely many sites model from population genetics. More specifically, we
extend the 'classical framework', where genealogies are assumed to be governed
by Kingman's coalescent, to the more general class of Lambda-coalescents and
develop further Hobolth et al.'s (2008) idea of deriving importance sampling
schemes based on 'compressed genetrees'. The resulting schemes extend earlier
work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner
and Blath (2008) and Hobolth et al. (2008). We conclude with a performance
comparison of classical and new schemes for Beta- and Kingman coalescents.
Comment: (38 pages, 40 figures)
Inference of Ancestral Recombination Graphs through Topological Data Analysis
The recent explosion of genomic data has underscored the need for
interpretable and comprehensive analyses that can capture complex phylogenetic
relationships within and across species. Recombination, reassortment and
horizontal gene transfer constitute examples of pervasive biological phenomena
that cannot be captured by tree-like representations. Starting from hundreds of
genomes, we are interested in the reconstruction of potential evolutionary
histories leading to the observed data. Ancestral recombination graphs
represent potential histories that explicitly accommodate recombination and
mutation events across orthologous genomes. However, they are computationally
costly to reconstruct, usually being infeasible for more than a few tens of
genomes. Recently, Topological Data Analysis (TDA) methods have been proposed
as robust and scalable methods that can capture the genetic scale and frequency
of recombination. We build upon previous TDA developments for detecting and
quantifying recombination, and present a novel framework that can be applied to
hundreds of genomes and can be interpreted in terms of minimal histories of
mutation and recombination events, quantifying the scales and identifying the
genomic locations of recombinations. We implement this framework in a software
package, called TARGet, and apply it to several examples, including small
migration between different populations, human recombination, and horizontal
evolution in finches inhabiting the Galápagos Islands.
Comment: 33 pages, 12 figures. The accompanying software, instructions and
example files used in the manuscript can be obtained from
https://github.com/RabadanLab/TARGe
A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree
In this paper, we provide a polynomial time algorithm to calculate the
probability of a ranked gene tree topology for a given species tree,
where a ranked tree topology is a tree topology with the internal vertices
being ordered. The probability of a gene tree topology can thus be calculated
in polynomial time if the number of orderings of the internal vertices is a
polynomial number. However, the complexity of calculating the probability of a
gene tree topology with an exponential number of rankings for a given species
tree remains unknown.
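To illustrate how the number of rankings of a topology can vary, here is a minimal sketch that counts the orderings of internal vertices compatible with a rooted tree. It is not the paper's algorithm. It assumes nested-tuple trees and the hypothetical helper name `count_rankings`, and it uses the standard counting formula for orderings in which every vertex precedes its descendants: the factorial of the number of internal vertices divided by the product, over internal vertices, of the number of internal vertices in the subtree rooted there.

```python
from math import factorial

def count_rankings(tree):
    """Number of orderings of internal vertices in which ancestors precede descendants."""
    sizes = []                              # internal-vertex count of each subtree
    def internal_count(node):
        if isinstance(node, str):           # a leaf contributes no internal vertex
            return 0
        k = 1 + sum(internal_count(child) for child in node)
        sizes.append(k)
        return k
    total = internal_count(tree)
    denom = 1
    for k in sizes:
        denom *= k
    return factorial(total) // denom

caterpillar4 = ((("A", "B"), "C"), "D")
balanced4 = (("A", "B"), ("C", "D"))
print(count_rankings(caterpillar4), count_rankings(balanced4))
```

A caterpillar topology has a single ranking, whereas balanced topologies have many, which is the distinction the abstract draws between polynomially and exponentially many rankings.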
Enumeration of coalescent histories for caterpillar species trees and -pseudocaterpillar gene trees
For a fixed set containing taxon labels, an ordered pair consisting
of a gene tree topology and a species tree bijectively labeled with the
labels of possesses a set of coalescent histories -- mappings from the set
of internal nodes of to the set of edges of describing possible lists
of edges in on which the coalescences in take place. Enumerations of
coalescent histories for gene trees and species trees have produced suggestive
results regarding the pairs that, for a fixed , have the largest
number of coalescent histories. We define a class of 2-cherry binary tree
topologies that we term -pseudocaterpillars, examining coalescent histories
for non-matching pairs , in the case in which has a caterpillar
shape and has a -pseudocaterpillar shape. Using a construction that
associates coalescent histories for with a class of "roadblocked"
monotonic paths, we identify the -pseudocaterpillar labeled gene tree
topology that, for a fixed caterpillar labeled species tree topology, gives
rise to the largest number of coalescent histories. The shape that maximizes
the number of coalescent histories places the "second" cherry of the
-pseudocaterpillar equidistantly from the root of the "first" cherry and
from the tree root. A symmetry in the numbers of coalescent histories for
-pseudocaterpillar gene trees and caterpillar species trees is seen to exist
around the maximizing value of the parameter . The results provide insight
into the factors that influence the number of coalescent histories possible for
a given gene tree and species tree.
The Time Machine: A Simulation Approach for Stochastic Trees
In the following paper we consider a simulation technique for stochastic
trees. One of the most important areas in computational genetics is the
calculation and subsequent maximization of the likelihood function associated
to such models. This typically consists of using importance sampling (IS) and
sequential Monte Carlo (SMC) techniques. The approach proceeds by simulating
the tree, backward in time from observed data, to a most recent common ancestor
(MRCA). However, in many cases, the computational time and variance of
estimators are too high to make standard approaches useful. In this paper
we propose to stop the simulation, subsequently yielding biased estimates of
the likelihood surface. The bias is investigated from a theoretical point of
view. Results from simulation studies are also given to investigate the balance
between loss of accuracy, saving in computing time and variance reduction.
Comment: 22 Pages, 5 Figures
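The idea of simulating a tree backward in time and stopping before the MRCA can be sketched in a simple setting. The abstract does not specify the tree model, so this example assumes Kingman's coalescent with time in coalescent units, where the waiting time while k lineages remain is exponential with rate k(k-1)/2. The function names and the `stop_at` parameter are hypothetical, and the sketch shows only the time saved by truncation, not the paper's likelihood estimator or its bias analysis.

```python
import random

def simulate_height(n, stop_at=1, rng=random):
    """Simulate backward from n lineages until stop_at lineages remain.

    Assumes Kingman's coalescent: with k lineages the waiting time to the
    next coalescence is exponential with rate k * (k - 1) / 2.
    stop_at=1 runs all the way to the MRCA.
    """
    t = 0.0
    k = n
    while k > stop_at:
        rate = k * (k - 1) / 2
        t += rng.expovariate(rate)
        k -= 1
    return t

def expected_height(n, stop_at=1):
    """Exact expected time under the same assumption, for comparison."""
    return sum(2.0 / (k * (k - 1)) for k in range(stop_at + 1, n + 1))

rng = random.Random(0)
full = sum(simulate_height(10, 1, rng) for _ in range(20000)) / 20000
early = sum(simulate_height(10, 3, rng) for _ in range(20000)) / 20000
print(full, expected_height(10, 1))
print(early, expected_height(10, 3))
```

Under this assumption most of the expected tree height is spent while only two or three lineages remain, so stopping early removes a large share of the simulated time.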