Kernelizations for the hybridization number problem on multiple nonbinary trees
Given a finite set $X$, a collection $\mathcal{T}$ of rooted phylogenetic
trees on $X$ and an integer $k$, the Hybridization Number problem asks if there
exists a phylogenetic network on $X$ that displays all trees from $\mathcal{T}$
and has reticulation number at most $k$. We show two kernelization algorithms
for Hybridization Number, with kernel sizes $4k(5k)^t$ and $20k^2(\Delta^+-1)$
respectively, with $t$ the number of input trees and $\Delta^+$ their maximum
outdegree. Experiments on simulated data demonstrate the practical relevance of
these kernelization algorithms. In addition, we present an $n^{f(k)}t$-time
algorithm, with $n=|X|$ and $f$ some computable function of $k$.
On Computing the Maximum Parsimony Score of a Phylogenetic Network
Phylogenetic networks are used to display the relationship of different
species whose evolution is not treelike, which is the case, for instance, in
the presence of hybridization events or horizontal gene transfers. Tree
inference methods such as Maximum Parsimony need to be modified in order to be
applicable to networks. In this paper, we discuss two different definitions of
Maximum Parsimony on networks, "hardwired" and "softwired", and examine the
complexity of computing them given a network topology and a character. By
exploiting a link with the problem Multicut, we show that computing the
hardwired parsimony score for 2-state characters is polynomial-time solvable,
while for characters with more states this problem becomes NP-hard but is still
approximable and fixed parameter tractable in the parsimony score. On the other
hand we show that, for the softwired definition, obtaining even weak
approximation guarantees is already difficult for binary characters and
restricted network topologies, and fixed-parameter tractable algorithms in the
parsimony score are unlikely. On the positive side we show that computing the
softwired parsimony score is fixed-parameter tractable in the level of the
network, a natural parameter describing how tangled reticulate activity is in
the network. Finally, we show that both the hardwired and softwired parsimony
score can be computed efficiently using Integer Linear Programming. The
software has been made freely available.
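As an illustration of the hardwired definition, the following is a minimal brute-force sketch, not the paper's Multicut-based or ILP method. It assumes the usual reading of the hardwired score: the minimum, over all assignments of states to internal nodes, of the number of network edges whose endpoints carry different states. The toy network, the node names, and the function name `hardwired_score` are hypothetical.

```python
from itertools import product


def hardwired_score(edges, leaf_states, states):
    """Minimum number of edges whose endpoints differ, taken over all
    labelings of the internal nodes (exponential; tiny inputs only)."""
    nodes = {x for edge in edges for x in edge}
    internal = sorted(nodes - set(leaf_states))
    best = None
    for choice in product(states, repeat=len(internal)):
        labeling = dict(leaf_states)
        labeling.update(zip(internal, choice))
        # Every edge of the network counts, including both reticulation edges.
        changes = sum(labeling[u] != labeling[v] for u, v in edges)
        if best is None or changes < best:
            best = changes
    return best


# Hypothetical toy network: root r, tree nodes u and v,
# reticulation h (parents u and v), leaves a, b, c.
edges = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
         ("v", "h"), ("v", "c"), ("h", "b")]
leaf_states = {"a": 0, "b": 1, "c": 0}
score = hardwired_score(edges, leaf_states, states=(0, 1))
print(score)  # 1: label every internal node 0, only edge (h, b) changes
```

The softwired score would instead minimise over the trees displayed by the network, which this sketch does not attempt.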
Treewidth of display graphs: bounds, brambles and applications
Phylogenetic trees and networks are leaf-labelled graphs used to model evolution. Display graphs are created by identifying common leaf labels in two or more phylogenetic trees or networks. The treewidth of such graphs is bounded as a function of many common dissimilarity measures between phylogenetic trees and this has been leveraged in fixed parameter tractability results. Here we further elucidate the properties of display graphs and their interaction with treewidth. We show that it is NP-hard to recognize display graphs, but that display graphs of bounded treewidth can be recognized in linear time. Next we show that if a phylogenetic network displays (i.e. topologically embeds) a phylogenetic tree, the treewidth of their display graph is bounded by a function of the treewidth of the original network (and also by various other parameters). In fact, using a bramble argument we show that this treewidth bound is sharp up to an additive term of 1. We leverage this bound to give an FPT algorithm, parameterized by treewidth, for determining whether a network displays a tree, which is an intensively-studied problem in the field. We conclude with a discussion on the future use of display graphs and treewidth in phylogenetics.
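The construction of a display graph can be sketched as follows. This is a minimal illustration under assumed conventions, not code from the paper: each tree is given as an edge list, internal node names are local to their tree, and leaves are identified by their shared taxon labels. The function name `display_graph` and the example trees are hypothetical.

```python
from collections import defaultdict


def display_graph(trees, taxa):
    """Build an undirected display graph as an adjacency dict.

    Internal nodes are renamed (tree_index, name) so they stay distinct,
    while leaves keep their taxon label and are therefore merged.
    """
    adj = defaultdict(set)
    for i, edges in enumerate(trees):
        for u, v in edges:
            a = u if u in taxa else (i, u)
            b = v if v in taxa else (i, v)
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


taxa = {"a", "b", "c"}
tree1 = [("r", "x"), ("x", "a"), ("x", "b"), ("r", "c")]  # ((a,b),c)
tree2 = [("r", "y"), ("y", "b"), ("y", "c"), ("r", "a")]  # (a,(b,c))
graph = display_graph([tree1, tree2], taxa)
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
```

In the resulting graph every taxon has one neighbour in each input tree, which is what creates the cycles whose structure the treewidth measures.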
On unrooted and root-uncertain variants of several well-known phylogenetic network problems
The hybridization number problem requires us to embed a set of binary rooted
phylogenetic trees into a binary rooted phylogenetic network such that the
number of nodes with indegree two is minimized. However, from a biological
point of view accurately inferring the root location in a phylogenetic tree is
notoriously difficult and poor root placement can artificially inflate the
hybridization number. To this end we study a number of relaxed variants of this
problem. We start by showing that the fundamental problem of determining
whether an unrooted phylogenetic network displays (i.e. embeds) an
unrooted phylogenetic tree is NP-hard. On the positive side we show
that this problem is FPT in reticulation number. In the rooted case the
corresponding FPT result is trivial, but here we require more subtle
argumentation. Next we show that the hybridization number problem for unrooted
networks (when given two unrooted trees) is equivalent to the problem of
computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted
trees. In the third part of the paper we consider the "root uncertain" variant
of hybridization number. Here we are free to choose the root location in each
of a set of unrooted input trees such that the hybridization number of the
resulting rooted trees is minimized. On the negative side we show that this
problem is APX-hard. On the positive side, we show that the problem is FPT in
the hybridization number, via kernelization, for any number of input trees.
Comment: 28 pages, 8 figures
A tight kernel for computing the tree bisection and reconnection distance between two phylogenetic trees
In 2001 Allen and Steel showed that, if subtree and chain reduction rules
have been applied to two unrooted phylogenetic trees, the reduced trees will
have at most 28k taxa where k is the TBR (Tree Bisection and Reconnection)
distance between the two trees. Here we reanalyse Allen and Steel's
kernelization algorithm and prove that the reduced instances will in fact have
at most 15k-9 taxa. Moreover we show, by describing a family of instances which
have exactly 15k-9 taxa after reduction, that this new bound is tight. These
instances also have no common clusters, showing that a third
commonly-encountered reduction rule, the cluster reduction, cannot further
reduce the size of the kernel in the worst case. To achieve these results we
introduce and use "unrooted generators" which are analogues of rooted
structures that have appeared earlier in the phylogenetic networks literature.
Using similar argumentation we show that, for the minimum hybridization problem
on two rooted trees, 9k-2 is a tight bound (when subtree and chain reduction
rules have been applied) and 9k-4 is a tight bound (when, additionally, the
cluster reduction has been applied) on the number of taxa, where k is the
hybridization number of the two trees.
Comment: One figure added, two small typos fixed. This version to appear in
SIDMA (SIAM Journal on Discrete Mathematics).
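To give a feel for the difference between the bounds quoted in this abstract, the short sketch below simply evaluates them for a given k. The function name `kernel_bounds` and the dictionary keys are illustrative labels only. The abstract does not state for which values of k the bounds apply, so the example uses k = 3 purely as an arithmetic illustration.

```python
def kernel_bounds(k):
    """Upper bounds on the number of taxa after reduction, as quoted above."""
    return {
        "tbr_allen_steel": 28 * k,           # earlier bound (Allen and Steel)
        "tbr_tight": 15 * k - 9,             # subtree + chain reductions
        "hyb_subtree_chain": 9 * k - 2,      # hybridization number, two rooted trees
        "hyb_with_cluster": 9 * k - 4,       # additionally with cluster reduction
    }


bounds = kernel_bounds(3)
print(bounds)
# {'tbr_allen_steel': 84, 'tbr_tight': 36, 'hyb_subtree_chain': 25, 'hyb_with_cluster': 23}
```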
Efficiency of Algorithms in Phylogenetics
Phylogenetics is the study of evolutionary relationships between species. Phylogenetic
trees have long been the standard object used in evolutionary biology to illustrate how a
given set of species are related. There are some groups (including certain plant and fish
species) for which the ancestral history contains reticulation events, caused by processes that
include hybridization, lateral gene transfer, and recombination. For such groups of species, it
is appropriate to represent their ancestral history by phylogenetic networks: rooted acyclic
digraphs, where arcs represent lines of genetic inheritance and vertices of in-degree at least
two represent reticulation events. This thesis is concerned with the efficiency, accuracy, and
tractability of mathematical models for phylogenetic network methods.
Three important and related measures for summarizing the dissimilarity in phylogenetic
trees are the minimum number of hybridization events required to fit two phylogenetic trees
onto a single phylogenetic network (the hybridization number), the (rooted) subtree prune
and regraft distance (the rSPR distance) and the tree bisection and reconnection distance (the
TBR distance) between two phylogenetic trees. The respective problems of computing these
measures are known to be NP-hard, but also fixed-parameter tractable in their respective
natural parameters. This means that, while they are hard to compute in general, for cases
in which a parameter (here the hybridization number and rSPR/TBR distance, respectively)
is small, the problem can be solved efficiently even for large input trees. Here, we present
new analyses showing that the use of the "cluster reduction" rule (already defined for the
hybridization number and the rSPR distance and introduced here for the TBR distance) can
transform any O(f(p) · n)-time algorithm for any of these problems into an O(f(k) · n)-time
one, where n is the number of leaves of the phylogenetic trees, p is the natural parameter
and k is a much stronger (that is, smaller) parameter: the minimum level of a phylogenetic
network displaying both trees. These results appear in [9].
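The cluster reduction depends on clusters shared by both trees. The sketch below covers only the detection step, finding the non-trivial common clusters of two rooted trees, and assumes the trees are written as nested tuples with hashable, non-tuple leaf labels. It does not perform the reduction itself, and the function names are hypothetical.

```python
def clusters(tree):
    """Return (taxa, clusters) for a rooted tree given as nested tuples.

    A cluster is the set of leaves below an internal node.
    """
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    taxa = leaves_below(tree)
    return taxa, found


def common_nontrivial_clusters(t1, t2):
    """Clusters present in both trees, excluding single leaves and the full taxon set."""
    taxa1, c1 = clusters(t1)
    taxa2, c2 = clusters(t2)
    if taxa1 != taxa2:
        raise ValueError("trees must be on the same taxon set")
    return {c for c in c1 & c2 if 1 < len(c) < len(taxa1)}


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("e", "d"))
shared = common_nontrivial_clusters(t1, t2)
# shared contains {a, b, c} and {d, e}
```

Each shared cluster marks a place where the instance could be split into smaller subproblems, which is the intuition behind replacing the natural parameter p by the level k.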
Traditional "distance based methods" reconstruct a phylogenetic tree from a matrix of pairwise
distances between taxa. A phylogenetic network is a generalization of a phylogenetic
tree that can describe evolutionary events such as reticulation and hybridization that are not
tree-like. Although evolution has been known to be more accurately modelled by a network
than a tree for some time, only recently have efforts been made to directly reconstruct a
phylogenetic network from sequence data, as opposed to reconstructing several trees first and then trying to combine them into a single coherent network. In this work, we present
a generalisation of the UPGMA algorithm for ultrametric tree reconstruction which can
accurately reconstruct ultrametric tree-child networks from the set of distinct distances
between each pair of taxa. This result will also appear in [15]. Moreover, we analyse the
safety radius of the NETWORKUPGMA algorithm and show that it has safety radius 1/2.
This means that if we can obtain accurate estimates of the set of distances between each pair
of taxa in an ultrametric tree-child network, then NETWORKUPGMA correctly reconstructs
the true network.
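For orientation, here is a minimal sketch of the classical UPGMA algorithm for trees, which is the method the thesis generalises. It is not NETWORKUPGMA, whose details are not given in this abstract. The input format (a dict keyed by pairs of distinct taxa), the output format (a dict from clusters to heights), and the function name `upgma` are assumptions made for illustration.

```python
def upgma(dist):
    """Classical UPGMA. `dist` maps (x, y) pairs of distinct taxa to distances.

    Returns {cluster: height}, where each cluster is a frozenset of taxa and
    a merged cluster sits at half the distance between the two merged parts.
    """
    taxa = {t for pair in dist for t in pair}
    active = {frozenset([t]) for t in taxa}
    height = {c: 0.0 for c in active}
    d = {frozenset((frozenset([x]), frozenset([y]))): v
         for (x, y), v in dist.items()}
    while len(active) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = pair
        merged = a | b
        height[merged] = d[pair] / 2
        active -= {a, b}
        for c in active:
            # Size-weighted average of the distances to the merged parts.
            d[frozenset((merged, c))] = (
                len(a) * d[frozenset((a, c))] + len(b) * d[frozenset((b, c))]
            ) / len(merged)
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        active.add(merged)
    return height


# Hypothetical ultrametric distances on four taxa.
dist = {("a", "b"): 2, ("c", "d"): 4,
        ("a", "c"): 6, ("a", "d"): 6, ("b", "c"): 6, ("b", "d"): 6}
heights = upgma(dist)
```

On this input the clusters {a, b}, {c, d} and {a, b, c, d} are created at heights 1, 2 and 3 respectively.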
On Unrooted and Root-Uncertain Variants of Several Well-Known Phylogenetic Network Problems
The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an unrooted phylogenetic network displays (i.e. embeds) an unrooted phylogenetic tree is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the tree bisection and reconnect distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees.