Neighborhoods of trees in circular orderings
In phylogenetics, a common strategy used to construct an evolutionary tree for a set of species X is to search in the space of all such trees for one that optimizes some given score function (such as the minimum evolution, parsimony or likelihood score). As this can be computationally intensive, it was recently proposed to restrict such searches to the set of all those trees that are compatible with some circular ordering of the set X. To inform the design of efficient algorithms to perform such searches, it is therefore of interest to find bounds for the number of trees compatible with a fixed ordering in the neighborhood of a tree that is determined by certain tree operations commonly used to search for trees: the nearest neighbor interchange (nni), the subtree prune and regraft (spr) and the tree bisection and reconnection (tbr) operations. We show that the size of such a neighborhood of a binary tree associated with the nni operation is independent of the tree's topology, but that this is not the case for the spr and tbr operations. We also give tight upper and lower bounds for the size of the neighborhood of a binary tree for the spr and tbr operations and characterize those trees for which these bounds are attained.
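To make the notion of a circular nni neighborhood concrete, here is a brute-force sketch (our own illustration, not the authors' method) that counts the nni neighbors of an unrooted binary tree that stay compatible with a fixed circular ordering, meaning every split of the neighbor is a contiguous arc of that ordering. It assumes the tree is an adjacency dict with string labels, leaves of degree 1 and internal vertices of degree 3; all helper names (`from_edges`, `side`, `circular_nni_count`, ...) are hypothetical. Since an nni changes exactly one split, only the split of the edge it acts on needs to be tested.

```python
# Adjacency-dict representation: vertex label -> set of neighbor labels.
Tree = dict[str, set[str]]


def from_edges(edges: list[tuple[str, str]]) -> Tree:
    """Build an adjacency dict from an undirected edge list."""
    tree: Tree = {}
    for a, b in edges:
        tree.setdefault(a, set()).add(b)
        tree.setdefault(b, set()).add(a)
    return tree


def side(tree: Tree, u: str, v: str) -> frozenset[str]:
    """Leaves reachable from v without crossing the edge {u, v}."""
    stack, seen, found = [v], {u, v}, set()
    while stack:
        x = stack.pop()
        if len(tree[x]) == 1:  # degree-1 vertices are leaves
            found.add(x)
        for y in tree[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return frozenset(found)


def is_circular_interval(subset: frozenset[str], order: list[str]) -> bool:
    """True if subset occupies one contiguous arc of the circular order."""
    inside = [x in subset for x in order]
    # Count arc starts: positions in the subset whose circular predecessor is not.
    starts = sum(1 for i in range(len(order)) if inside[i] and not inside[i - 1])
    return starts == 1


def _swap(tree: Tree, u: str, b: str, v: str, c: str) -> Tree:
    """Copy of tree with subtree b (hanging off u) and subtree c (hanging off v) exchanged."""
    t = {x: set(nbrs) for x, nbrs in tree.items()}
    t[u].discard(b)
    t[b].discard(u)
    t[v].discard(c)
    t[c].discard(v)
    t[u].add(c)
    t[c].add(u)
    t[v].add(b)
    t[b].add(v)
    return t


def nni_neighbors(tree: Tree):
    """Yield (u, v, neighbor) for the two nni moves across each internal edge {u, v}."""
    for u in sorted(tree):
        for v in sorted(tree[u]):
            if u < v and len(tree[u]) == 3 and len(tree[v]) == 3:
                b = sorted(tree[u] - {v})[1]  # fix one subtree hanging off u
                for c in sorted(tree[v] - {u}):  # swap it with each subtree off v
                    yield u, v, _swap(tree, u, b, v, c)


def circular_nni_count(tree: Tree, order: list[str]) -> int:
    """Count nni neighbors of a circular-compatible tree that remain compatible with order."""
    # Only the split of the edge {u, v} changes, so test that split alone.
    return sum(
        1 for u, v, t in nni_neighbors(tree) if is_circular_interval(side(t, v, u), order)
    )
```

On two six-leaf trees compatible with the ordering 1, ..., 6, a caterpillar and a tree built from three cherries, the counter returns the same value (3) for both, which is in line with the abstract's statement that the nni neighborhood size does not depend on topology. Brute force is only meant for checking small cases by hand.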
A Note on Encodings of Phylogenetic Networks of Bounded Level
Driven by the need for better models that allow one to shed light on the
question of how life's diversity has evolved, phylogenetic networks have now
joined phylogenetic trees at the center of phylogenetics research. Like
phylogenetic trees, such networks canonically induce collections of
phylogenetic trees, clusters, and triplets. Thus it is not
surprising that many network approaches aim to reconstruct a phylogenetic
network from such collections. Related to the well-studied perfect phylogeny
problem, the following question is of fundamental importance in this context:
When does one of the above collections encode (i.e. uniquely describe) the
network that induces it? In this note, we present a complete answer to this
question for the special case of a level-1 (phylogenetic) network by
characterizing those level-1 networks for which an encoding in terms of one (or
equivalently all) of the above collections exists. Given that this type of
network forms the first layer of the rich hierarchy of level-k networks, k a
non-negative integer, it is natural to wonder whether our arguments could be
extended to members of that hierarchy for higher values of k. By giving
examples, we show that this is not the case.
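For the simplest case, a rooted phylogenetic tree, the induced clusters and triplets can be computed directly. The sketch below is our own illustration with hypothetical function names; it assumes a rooted tree written as nested Python tuples (leaves are non-tuple labels) and uses the standard fact that a tree displays the triplet ab|c exactly when some cluster contains a and b but not c. It does not handle networks.

```python
from itertools import combinations


def clusters(tree) -> list[frozenset]:
    """Return the cluster (leaf set) below every vertex of a nested-tuple tree."""
    found: list[frozenset] = []

    def walk(node) -> frozenset:
        if isinstance(node, tuple):
            leaf_set = frozenset().union(*(walk(child) for child in node))
        else:
            leaf_set = frozenset([node])
        found.append(leaf_set)
        return leaf_set

    walk(tree)
    return found


def triplets(tree) -> set[tuple[frozenset, object]]:
    """Triplets ab|c displayed by the tree, encoded as (frozenset({a, b}), c)."""
    cl = clusters(tree)
    leaves = max(cl, key=len)  # the root cluster contains every leaf
    result = set()
    for a, b, c in combinations(sorted(leaves), 3):
        for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
            # ab|c is displayed iff some cluster separates {x, y} from z
            if any(x in s and y in s and z not in s for s in cl):
                result.add((frozenset({x, y}), z))
    return result
```

For the caterpillar `(((1, 2), 3), 4)`, this yields the four triplets 12|3, 12|4, 13|4 and 23|4, one per three-leaf subset, as expected for a binary tree.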
From trees to networks and back
The evolutionary history of a set of species is commonly represented by a phylogenetic tree. Often, however, the data contain conflicting signals, which can be better represented by a more general structure, namely a phylogenetic network. Such networks allow the display of several alternative evolutionary scenarios simultaneously, but this can come at the price of complex visual representations. Using so-called circular split networks reduces this complexity, because this type of network can always be visualized in the plane without any crossing edges. These circular split networks form the core of this thesis. We construct them, use them as a search space for minimum evolution trees and explore their properties.
More specifically, we present a new method, called SuperQ, to construct a circular split network summarising a collection of phylogenetic trees that have overlapping leaf sets. Then, we explore the set of phylogenetic trees associated with a fixed circular split network, in particular using it as a search space for optimal trees. This set represents just a tiny fraction of the space of all phylogenetic trees, but we still find trees within it that compare quite favourably with those obtained by a leading heuristic, which uses tree edit operations to search the whole tree space. In the last part, we advance our understanding of the set of phylogenetic trees associated with a circular split network. Specifically, we investigate the size of the so-called circular tree neighbourhood for the three tree edit operations tree bisection and reconnection (tbr), subtree prune and regraft (spr) and nearest neighbour interchange (nni).
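To put "a tiny fraction" in numbers, one can use a standard counting fact (not taken from the thesis): the unrooted binary trees on n leaves that are compatible with one fixed circular ordering correspond to the triangulations of a convex n-gon, so there are C(n-2) of them, where C is the Catalan number, against (2n-5)!! unrooted binary trees in total. The minimal sketch below just evaluates both formulas; the function names are our own.

```python
from math import comb


def catalan(m: int) -> int:
    """m-th Catalan number, C(m) = binom(2m, m) / (m + 1)."""
    return comb(2 * m, m) // (m + 1)


def circular_tree_count(n: int) -> int:
    """Unrooted binary trees on n >= 3 leaves compatible with one circular ordering."""
    return catalan(n - 2)


def all_tree_count(n: int) -> int:
    """All unrooted binary trees on n >= 3 leaves: (2n - 5)!! = 1 * 3 * ... * (2n - 5)."""
    result = 1
    for k in range(3, 2 * n - 4, 2):
        result *= k
    return result


for n in (5, 10, 20):
    print(n, circular_tree_count(n), all_tree_count(n))
```

For n = 10, this gives 1430 circular-compatible trees out of 2027025, and the ratio keeps shrinking as n grows, because the Catalan numbers grow only exponentially while (2n-5)!! grows superexponentially.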
Computing Maximum Agreement Forests without Cluster Partitioning is Folly
Computing a maximum (acyclic) agreement forest (M(A)AF) of a pair of phylogenetic trees is known to be fixed-parameter tractable; the two main techniques are kernelization and depth-bounded search. In theory, kernelization-based algorithms for this problem are not competitive, but they perform remarkably well in practice. We shed light on why this is the case. Our results show that, perhaps unsurprisingly, the kernel is often much smaller in practice than the theoretical worst case, but not small enough to fully explain the good performance of these algorithms. The key to performance is cluster partitioning, a technique used in almost all fast M(A)AF algorithms. In theory, cluster partitioning does not help: some instances are highly clusterable, others not at all. However, our experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. In contrast, kernelizing the individual clusters before solving them using exponential search yields only very modest performance improvements or even hurts performance; for the vast majority of inputs, kernelization leads to no reduction in the maximal cluster size at all. The choice of the algorithm applied to solve individual clusters also significantly impacts performance, even though our limited experiment to evaluate this produced no clear winner; depth-bounded search, exponential search interleaved with kernelization, and an ILP-based algorithm all achieved competitive performance.
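Cluster partitioning starts from the clusters that both input trees have in common, since each such cluster can be treated as a separate, smaller subproblem. The sketch below covers only that first step, for two rooted trees on the same leaf set written as nested tuples; it is our own illustration with hypothetical helper names, and it does not show how the solutions of the individual clusters are combined into an agreement forest.

```python
def clusters(tree) -> list[frozenset]:
    """Return the cluster (leaf set) below every vertex of a nested-tuple tree."""
    found: list[frozenset] = []

    def walk(node) -> frozenset:
        if isinstance(node, tuple):
            leaf_set = frozenset().union(*(walk(child) for child in node))
        else:
            leaf_set = frozenset([node])
        found.append(leaf_set)
        return leaf_set

    walk(tree)
    return found


def common_clusters(t1, t2) -> set[frozenset]:
    """Non-trivial clusters (2 <= size < n) shared by two rooted trees on the same leaves."""
    c1, c2 = clusters(t1), clusters(t2)
    leaves = max(c1, key=len)  # root cluster
    if leaves != max(c2, key=len):
        raise ValueError("trees must have the same leaf set")
    return {c for c in set(c1) & set(c2) if 1 < len(c) < len(leaves)}


def minimal_common_clusters(t1, t2) -> set[frozenset]:
    """Common clusters that contain no smaller common cluster: natural first subproblems."""
    common = common_clusters(t1, t2)
    return {c for c in common if not any(d < c for d in common)}
```

For `(((1, 2), 3), (4, 5))` and `(((1, 3), 2), (4, 5))`, the shared clusters are {1, 2, 3} and {4, 5}, so the instance splits into two independent parts. An instance with no shared non-trivial cluster gains nothing from this step, which matches the abstract's remark that some instances are not clusterable at all.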
Deep kernelization for the Tree Bisection and Reconnect (TBR) distance in phylogenetics
We describe a kernel of size 9k-8 for the NP-hard problem of computing the
Tree Bisection and Reconnect (TBR) distance k between two unrooted binary
phylogenetic trees. We achieve this by extending the existing portfolio of
reduction rules with three new reduction rules. Two of the rules are
based on the idea of topologically transforming the trees in a
distance-preserving way in order to guarantee execution of earlier reduction
rules. The third rule extends the local neighbourhood approach introduced in
(Kelk and Linz, Annals of Combinatorics 24(3), 2020) to more global structures,
allowing new situations to be identified when deletion of a leaf definitely
reduces the TBR distance by one. The bound on the kernel size is tight up to an
additive term. Our results also apply to the equivalent problem of computing a
Maximum Agreement Forest (MAF) between two unrooted binary phylogenetic trees.
We anticipate that our results will be more widely applicable for computing
agreement-forest-based dissimilarity measures.
Comment: 38 pages. In this version a figure has been added, some references
have been added, some small typos have been fixed and the introduction and
conclusion have been slightly extended. Submitted for journal review.
Barking up the wrong tree: some obstacles to phylogenetic reconstruction
Phylogenetics is the study of evolutionary relationships between entities, usually biological in nature. The primary aim of such study is to elucidate the structure of these evolutionary histories. Unfortunately, such study can run into a variety of obstacles, both practical and theoretical. In this thesis we explore theoretical obstacles to phylogenetic reconstruction by examining several scenarios in which distinguishing between similar structures can become quite difficult. In Chapter 2, we consider when metrics on trees and metrics on networks can become indistinguishable, and present several novel results in this area, showing that any tree metric can be represented on a non-trivial network, and providing early results on the possible structures of these networks. In Chapter 3, we consider tree-based networks, that is, networks that carry a strong tree-like signal. We present the first findings on these networks in the context of unrooted non-binary networks. We characterise the circumstances under which such networks can become 'saturated' by these signals, and provide some graph-theoretical results in this area as well. In Chapter 4, we consider the scenario in which two trees can appear similar due to their hierarchical structure. We present a new metric to quantify this similarity, and use simulations to show several promising properties of the metric and the relative accuracy of a function that gives an upper bound to the metric.
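As a reminder of what a tree metric is: a metric is a tree metric exactly when it satisfies the four-point condition, that is, for every four points w, x, y, z, the two largest of the sums d(w,x)+d(y,z), d(w,y)+d(x,z) and d(w,z)+d(x,y) are equal. The minimal checker below assumes the input is already a metric, stored as a symmetric dict of dicts, and uses a small tolerance for floating-point input; `from_pairs` and `is_tree_metric` are hypothetical helper names, not code from the thesis.

```python
from itertools import combinations


def from_pairs(pairs: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Build a symmetric distance table (with zero diagonal) from pairwise values."""
    d: dict[str, dict[str, float]] = {}
    for (p, q), value in pairs.items():
        d.setdefault(p, {p: 0})[q] = value
        d.setdefault(q, {q: 0})[p] = value
    return d


def is_tree_metric(d: dict[str, dict[str, float]], points, tol: float = 1e-9) -> bool:
    """Four-point condition on all 4-subsets; d is assumed to satisfy the metric axioms."""
    for w, x, y, z in combinations(points, 4):
        sums = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        if abs(sums[2] - sums[1]) > tol:  # the two largest sums must coincide
            return False
    return True
```

Distances read off a quartet tree ab|cd with unit edge lengths pass the check, whereas lengthening d(b,d) alone (still a metric) breaks the tie between the two largest sums and fails it.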
Mathematical Modeling of Viral Evolution and Epidemiology
Phylogenetic trees can be used to study the evolution of any sequence that evolves, including viruses. In a viral epidemic, the history of transmission events defines constraints on the evolutionary history of the viral population. The spread of many viruses is driven by social and sexual networks, and because of the relationship between their evolutionary and transmission histories, phylogenetic inference from viral sequences can be used to improve the inference of patterns of the epidemic, which in turn may be able to enhance epidemiological intervention. The simultaneous simulation of viral transmission networks, phylogenetic trees, and sequences can provide a method to observe the effects of virus model parameters on the epidemic as well as to study the accuracies and errors of transmission inference tools, but the success of such simulations relies on the existence of appropriate models. Further, the development of massively scalable tools to analyze ultra-large datasets of viral sequences can aid epidemiologists in the real-time surveillance of the spread of disease. To enable viral epidemic simulation analyses, I developed FAVITES: a novel framework to simulate viral transmission networks, phylogenetic trees, and sequences, and I used FAVITES to study the effects of model parameters on epidemic outcomes. In an effort to better capture the unbalanced topologies commonly observed in retroviral phylogenies, I developed a novel evolutionary model (dual-birth), derived probabilistic distributions and theoretical expectations of trees sampled under the model, developed an approach to estimate model parameters given real data, and used the model to analyze Alu retrotransposons in the human genome. In order to potentially aid public health officials, I developed a scalable and non-parametric phylogenetic method of viral transmission risk prioritization, which I evaluated against current best-practice methods via simulation and real data.
Lastly, I contributed to bioinformatics education by developing multiple publicly accessible adaptive online interactive texts.
New algorithms and mathematical tools for phylogenetics beyond trees
Phylogenetic trees and networks are mathematical structures for representing the evolutionary history of a set of taxa. The need for methods to build such structures from various types of data, as well as the need to understand the story these data may tell, gives rise to exciting new challenges for mathematics and computer science. This thesis presents some recent advances in both these directions. It features new mathematical methodology for reconstructing phylogenetic networks, and new computational tools for inferring complex evolutionary scenarios. These come with a thorough analysis, assessing their attractiveness in terms of their theoretical properties. It expands on previous results, which are themselves briefly reviewed, and concludes with potentially interesting further research questions.