1,129 research outputs found
Maximal clades in random binary search trees
We study maximal clades in random phylogenetic trees with the Yule-Harding
model or, equivalently, in binary search trees. We use probabilistic methods to
reprove and extend earlier results on moment asymptotics and asymptotic
normality. In particular, we give an explanation of the curious phenomenon
observed by Drmota, Fuchs and Lee (2014) that asymptotic normality holds, but
one should normalize using half the variance. Comment: 25 pages
Fringe trees, Crump-Mode-Jagers branching processes and m-ary search trees
This survey studies asymptotics of random fringe trees and extended fringe
trees in random trees that can be constructed as family trees of a
Crump-Mode-Jagers branching process, stopped at a suitable time. This includes
random recursive trees, preferential attachment trees, fragmentation trees,
binary search trees and (more generally) m-ary search trees, as well as some
other classes of random trees.
We begin with general results, mainly due to Aldous (1991) and Jagers and
Nerman (1984). The general results are applied to fringe trees and extended
fringe trees for several particular types of random trees, where the theory is
developed in detail. In particular, we consider fringe trees of m-ary search
trees in detail; this seems to be new.
Various applications are given, including degree distribution, protected
nodes and maximal clades for various types of random trees. Again, we emphasise
results for m-ary search trees, and give for example new results on protected
nodes in m-ary search trees.
A separate section surveys results on height, saturation level, typical depth
and total path length, due to Devroye (1986), Biggins (1995, 1997) and others.
This survey contains well-known basic results together with some additional
general results as well as many new examples and applications for various
classes of random trees.
Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective
Phylogenetic trees are the fundamental mathematical representation of
evolutionary processes in biology. As data objects, they are characterized by
the challenges associated with "big data," as well as the complication that
their discrete geometric structure results in a non-Euclidean phylogenetic tree
space, which poses computational and statistical limitations. We propose and
study a novel framework to study sets of phylogenetic trees based on tropical
geometry. In particular, we focus on characterizing our framework for
statistical analyses of evolutionary biological processes represented by
phylogenetic trees. Our setting exhibits analytic, geometric, and topological
properties that are desirable for theoretical studies in probability and
statistics, as well as increased computational efficiency over the current
state-of-the-art. We demonstrate our approach on seasonal influenza data. Comment: 28 pages, 5 figures, 1 table
Probability, Trees and Algorithms
This workshop addressed probabilistic aspects of algorithms for fundamental problems such as sorting, searching, and selection of and within data, random permutations, algorithms based on combinatorial trees or search trees, continuous limits of random trees and random graphs, as well as random geometric graphs. A deeper understanding of the complexity of such algorithms and of the shape characteristics of large discrete structures requires probabilistic models and an asymptotic analysis of random discrete structures. The talks focused on probabilistic, combinatorial and analytic techniques for studying asymptotic properties of large random combinatorial structures.
Circumstances in which parsimony but not compatibility will be provably misleading
Phylogenetic methods typically rely on an appropriate model of how data
evolved in order to infer an accurate phylogenetic tree. For molecular data,
standard statistical methods have provided an effective strategy for extracting
phylogenetic information from aligned sequence data when each site (character)
is subject to a common process. However, for other types of data (e.g.
morphological data), characters can be too ambiguous, homoplastic or saturated
to develop models that are effective at capturing the underlying process of
change. To address this, we examine the properties of a classic but neglected
method for inferring splits in an underlying tree, namely, maximum
compatibility. By adopting a simple and extreme model in which each character
either fits perfectly on some tree, or is entirely random (but it is not known
which class any character belongs to) we are able to derive exact and explicit
formulae regarding the performance of maximum compatibility. We show that this
method is able to identify a set of non-trivial homoplasy-free characters, when
the number of taxa is large, even when the number of random characters is
large. By contrast, we show that a method that makes more uniform use of all
the data, maximum parsimony, can provably estimate trees in which none
of the original homoplasy-free characters support splits. Comment: 37 pages, 2 figures
Edit distance metrics for measuring dissimilarity between labeled gene trees
Phylogenetic trees are instruments of evolutionary biology offering great insight for comparative genomics.
They provide mechanisms to model the kinship relations between species or members of gene families as a function of taxonomic diversity. They also provide evidence and insights into the evolutionary history, structure, and variation of biological processes.
However, traditional phylogenetic inference methods have a reputation for being error-prone.
Therefore, comparing and analysing phylogenetic trees is indispensable for obtaining the best interpretation of the biological information they can provide.
We start by assessing existing related work to infer, compare, and analyse phylogenetic trees, evaluating their advantageous traits and flaws, and discussing avenues for future improvements.
The second part of this thesis focuses on the development of efficient and accurate metrics to analyse and compare pairs of gene trees with labeled internal nodes. We show that our extension of the popular Robinson-Foulds metric is useful for the preliminary analysis and comparison of labeled gene trees under various evolutionary models that may involve various evolutionary events.
Efficient inference of bacterial strain trees from genome-scale multilocus data
Motivation: In bacterial evolution, inferring a strain tree, which is the evolutionary history of different strains of the same bacterium, plays a major role in analyzing and understanding the evolution of strongly isolated populations, population divergence and various evolutionary events, such as horizontal gene transfer and homologous recombination. Inferring a strain tree from multilocus data of these strains is exceptionally hard since, at this scale of evolution, processes such as homologous recombination result in a very high degree of gene tree incongruence.
Metaheuristic approaches for the quartet method of hierarchical clustering
Given a set of objects and their pairwise distances, we wish to determine a visual representation of the data. We use the quartet paradigm to compute a hierarchy of clusters of the objects. The method is based on an NP-hard graph optimization problem called the Minimum Quartet Tree Cost problem. This paper presents and compares several metaheuristic approaches to approximate the optimal hierarchy. The performance of the algorithms is tested through extensive computational experiments, which show that the Reduced Variable Neighbourhood Search metaheuristic is the most effective approach to the problem, obtaining high-quality solutions in short running times.