268 research outputs found
Maximum agreement and compatible supertrees
Given a set of leaf-labelled trees with identical leaf sets, the MAST problem, respectively MCT problem, consists of finding a largest subset of leaves such that all input trees restricted to these leaves are isomorphic, respectively compatible. In this paper, we propose extensions of these problems to the context of supertree inference, where input trees have non-identical leaf sets. This situation is of particular interest in phylogenetics. The resulting problems are called SMAST and SMCT. A sufficient condition is given that identifies cases where these problems can be solved by resorting to MAST and MCT as subproblems. This condition is met, for instance, when only two input trees are considered. Then we give algorithms for SMAST and SMCT that benefit from the link with the subtree problems. These algorithms run in time linear to the time needed to solve MAST, respectively MCT, on an instance of the same or smaller size. It is shown that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves. SMAST is shown to be W[2]-hard when the considered parameter is the number of input leaves that have to be removed to obtain the agreement of the input trees. A similar result holds for SMCT. Moreover, the corresponding optimization problems, that is the complements of SMAST and SMCT, cannot be approximated in polynomial time within any constant factor, unless P=NP. These results also hold when the input trees have a bounded number of leaves. The presented results apply to both collections of rooted and unrooted trees.
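To make the MAST definition in this abstract concrete, here is a minimal brute-force sketch for rooted trees. It is not the paper's algorithm and takes exponential time. It assumes trees are written as nested tuples with string leaf labels (a representation chosen here for illustration), and it relies on the standard fact that two rooted leaf-labelled trees are isomorphic exactly when their cluster sets coincide. The function names are hypothetical.

```python
from itertools import combinations

def leaves(tree):
    """Return the leaf labels of a nested-tuple tree (assumed representation)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Return the clusters of a rooted tree: the leaf set below each node."""
    if not isinstance(tree, tuple):
        return {frozenset([tree])}
    result = {leaves(tree)}
    for child in tree:
        result |= clusters(child)
    return result

def restricted_clusters(tree_clusters, subset):
    """Clusters of the tree restricted to `subset`, with unary nodes suppressed."""
    return {c & subset for c in tree_clusters if c & subset}

def brute_force_mast(trees):
    """Largest leaf subset on which all rooted input trees agree (exponential time)."""
    all_clusters = [clusters(t) for t in trees]
    labels = sorted(leaves(trees[0]))  # identical leaf sets are assumed
    for size in range(len(labels), 0, -1):
        for subset in combinations(labels, size):
            s = frozenset(subset)
            restricted = [restricted_clusters(c, s) for c in all_clusters]
            if all(r == restricted[0] for r in restricted):
                return s
    return frozenset()

t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(sorted(brute_force_mast([t1, t2])))
```

The search tries subsets from largest to smallest, so the first agreeing subset it meets is a maximum one. It is only practical for a handful of leaves.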
On unrooted and root-uncertain variants of several well-known phylogenetic network problems
The hybridization number problem requires us to embed a set of binary rooted
phylogenetic trees into a binary rooted phylogenetic network such that the
number of nodes with indegree two is minimized. However, from a biological
point of view accurately inferring the root location in a phylogenetic tree is
notoriously difficult and poor root placement can artificially inflate the
hybridization number. To this end we study a number of relaxed variants of this
problem. We start by showing that the fundamental problem of determining
whether an unrooted phylogenetic network displays (i.e. embeds) an
unrooted phylogenetic tree is NP-hard. On the positive side we show
that this problem is FPT in reticulation number. In the rooted case the
corresponding FPT result is trivial, but here we require more subtle
argumentation. Next we show that the hybridization number problem for unrooted
networks (when given two unrooted trees) is equivalent to the problem of
computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted
trees. In the third part of the paper we consider the "root uncertain" variant
of hybridization number. Here we are free to choose the root location in each
of a set of unrooted input trees such that the hybridization number of the
resulting rooted trees is minimized. On the negative side we show that this
problem is APX-hard. On the positive side, we show that the problem is FPT in
the hybridization number, via kernelization, for any number of input trees. Comment: 28 pages, 8 figures
A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest
We give a 2-approximation algorithm for the Maximum Agreement Forest problem
on two rooted binary trees. This NP-hard problem has been studied extensively
in the past two decades, since it can be used to compute the rooted Subtree
Prune-and-Regraft (rSPR) distance between two phylogenetic trees. Our algorithm
is combinatorial and its running time is quadratic in the input size. To prove
the approximation guarantee, we construct a feasible dual solution for a novel
linear programming formulation. In addition, we show this linear program is
stronger than previously known formulations, and we give a compact formulation,
showing that it can be solved in polynomial time.
Relaxed Agreement Forests
There are multiple factors which can cause the phylogenetic inference process
to produce two or more conflicting hypotheses of the evolutionary history of a
set X of biological entities. That is: phylogenetic trees with the same set of
leaf labels X but with distinct topologies. This leads naturally to the goal of
quantifying the difference between two such trees T_1 and T_2. Here we
introduce the problem of computing a 'maximum relaxed agreement forest' (MRAF)
and use this as a proxy for the dissimilarity of T_1 and T_2, which in this
article we assume to be unrooted binary phylogenetic trees. MRAF asks for a
partition of the leaf labels X into a minimum number of blocks S_1, S_2, ...
S_k such that for each i, the subtrees induced in T_1 and T_2 by S_i are
isomorphic up to suppression of degree-2 nodes and taking the labels X into
account. Unlike the earlier introduced maximum agreement forest (MAF) model,
the subtrees induced by the S_i are allowed to overlap. We prove that it is
NP-hard to compute MRAF, by reducing from the problem of partitioning a
permutation into a minimum number of monotonic subsequences (PIMS).
Furthermore, we show that MRAF has a polynomial time O(log n)-approximation
algorithm where n=|X| and permits exact algorithms with single-exponential
running time. When at least one of the two input trees has a caterpillar
topology, we prove that testing whether a MRAF has size at most k can be
answered in polynomial time when k is fixed. We also note that on two
caterpillars the approximability of MRAF is related to that of PIMS. Finally,
we establish a number of bounds on MRAF, compare its behaviour to MAF both in
theory and in an experimental setting and discuss a number of open problems. Comment: 14 pages plus appendix
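The PIMS problem that this abstract reduces from can be illustrated with a small exhaustive search. The sketch below is not taken from the paper. It assumes a permutation given as a Python list of distinct numbers, tries every assignment of elements to k blocks for increasing k, and is therefore only usable on very short inputs. The function names are hypothetical.

```python
from itertools import product

def is_monotone(seq):
    """True if `seq` is strictly increasing or strictly decreasing."""
    increasing = all(a < b for a, b in zip(seq, seq[1:]))
    decreasing = all(a > b for a, b in zip(seq, seq[1:]))
    return increasing or decreasing

def min_monotone_partition(perm):
    """Partition `perm` into the fewest monotone subsequences (exhaustive search)."""
    n = len(perm)
    for k in range(1, n + 1):
        # Each assignment maps position i to one of k blocks; order is preserved.
        for assignment in product(range(k), repeat=n):
            blocks = [[perm[i] for i in range(n) if assignment[i] == b]
                      for b in range(k)]
            if all(is_monotone(block) for block in blocks):
                return [block for block in blocks if block]
    return []

print(min_monotone_partition([2, 1, 4, 3]))
```

Because k is tried in increasing order, the first partition found uses the minimum number of blocks.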
A parsimony-based metric for phylogenetic trees
In evolutionary biology various metrics have been defined and studied for comparing phylogenetic trees. Such metrics are used, for example, to compare competing evolutionary hypotheses or to help organize algorithms that search for optimal trees. Here we introduce a new metric d_p on the collection of binary phylogenetic trees each labeled by the same set of species. The metric is based on the so-called parsimony score, an important concept in phylogenetics that is commonly used to construct phylogenetic trees. Our main results include a characterization of the unit neighborhood of a tree in the d_p metric, and an explicit formula for its diameter, that is, a formula for the maximum possible value of d_p over all possible pairs of trees labeled by the same set of species. We also show that d_p is closely related to the well-known tree bisection and reconnection (tbr) and subtree prune and regraft (spr) distances, a connection which will hopefully provide a useful new approach to understanding properties of these and related metrics.
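The parsimony score on which this metric is based can be computed for a single character with Fitch's algorithm. The sketch below shows only that building block, not the metric itself, whose definition the abstract does not spell out. It assumes a rooted binary tree written as nested pairs and a dictionary mapping each leaf label to its character state. Both the representation and the function name are chosen here for illustration.

```python
def fitch(tree, states):
    """Return (candidate state set, parsimony score) for a rooted binary tree.

    `tree` is a leaf label or a pair (left, right); `states` maps leaf labels
    to character states. Both are assumed representations.
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_score = fitch(tree[0], states)
    right_set, right_score = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no change is needed at this node.
        return common, left_score + right_score
    # Otherwise one state change is unavoidable at this node.
    return left_set | right_set, left_score + right_score + 1

tree = (("a", "b"), ("c", "d"))
_, score = fitch(tree, {"a": 0, "b": 0, "c": 1, "d": 1})
print(score)  # 1
```

A character that splits the leaves along an edge of the tree needs one change, while one that conflicts with the tree needs more.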
Cycle killer... qu'est-ce que c'est? On the comparative approximability of hybridization number and directed feedback vertex set
We show that the problem of computing the hybridization number of two rooted
binary phylogenetic trees on the same set of taxa X has a constant factor
polynomial-time approximation if and only if the problem of computing a
minimum-size feedback vertex set in a directed graph (DFVS) has a constant
factor polynomial-time approximation. The latter problem, which asks for a
minimum number of vertices to be removed from a directed graph to transform it
into a directed acyclic graph, is one of the problems in Karp's seminal 1972
list of 21 NP-complete problems. However, despite considerable attention from
the combinatorial optimization community it remains to this day unknown whether
a constant factor polynomial-time approximation exists for DFVS. Our result
thus places the (in)approximability of hybridization number in a much broader
complexity context, and as a consequence we obtain that hybridization number
inherits inapproximability results from the problem Vertex Cover. On the
positive side, we use results from the DFVS literature to give an O(log r log
log r) approximation for hybridization number, where r is the value of an
optimal solution to the hybridization number problem.
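The DFVS problem described in this abstract (remove as few vertices as possible so that the directed graph becomes acyclic) can be sketched by exhaustive search. This is an illustration of the problem statement only, not an approximation algorithm from the paper. The graph encoding (a vertex list plus a list of directed edge pairs) and the function names are assumptions made here.

```python
from itertools import combinations

def is_acyclic(vertices, edges):
    """Kahn's algorithm: True if the directed graph contains no cycle."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = [v for v in vertices if indegree[v] == 0]
    visited = 0
    while queue:
        u = queue.pop()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Every vertex is processed exactly when there is no cycle.
    return visited == len(vertices)

def min_feedback_vertex_set(vertices, edges):
    """Smallest vertex set whose removal leaves a DAG (exponential time)."""
    vertices = list(vertices)
    for size in range(len(vertices) + 1):
        for removed in combinations(vertices, size):
            removed_set = set(removed)
            kept = [v for v in vertices if v not in removed_set]
            kept_edges = [(u, v) for u, v in edges
                          if u not in removed_set and v not in removed_set]
            if is_acyclic(kept, kept_edges):
                return removed_set
    return set(vertices)

edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
print(min_feedback_vertex_set([1, 2, 3, 4], edges))  # {3}
```

In the example, vertex 3 lies on both cycles, so removing it alone is enough.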
A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest
We give a 2-approximation algorithm for the Maximum Agreement Forest problem
on two rooted binary trees. This NP-hard problem has been studied extensively
in the past two decades, since it can be used to compute the Subtree
Prune-and-Regraft (SPR) distance between two phylogenetic trees. Our result
improves on the very recent 2.5-approximation algorithm due to Shi, Feng, You
and Wang (2015). Our algorithm is the first approximation algorithm for this
problem that uses LP duality in its analysis.
The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics
A multilabelled tree (or MUL-tree) is a rooted tree in which every leaf is labelled by an element from some set, but in which more than one leaf may be labelled by the same element of that set. In phylogenetics, such trees are used in biogeographical studies, to study the evolution of gene families, and also within approaches to construct phylogenetic networks. A multilabelled tree in which no leaf-labels are repeated is called a phylogenetic tree, and one in which every label is the same is also known as a tree-shape. In this paper, we consider the complexity of computing metrics on MUL-trees that are obtained by extending metrics on phylogenetic trees. In particular, by restricting our attention to tree-shapes, we show that computing the metric extension on MUL-trees is NP-complete for two well-known metrics on phylogenetic trees, namely, the path-difference and Robinson-Foulds distances. We also show that the extension of the Robinson-Foulds distance is fixed-parameter tractable with respect to the distance parameter. The path distance complexity result allows us to also answer an open problem concerning the complexity of solving the quadratic assignment problem for two matrices that are a Robinson similarity and a Robinson dissimilarity, which we show to be NP-complete. We conclude by considering the extension of the maximum agreement subtree (MAST) distance on phylogenetic trees to MUL-trees. Although this extension can be computed in polynomial time, we show that computing its natural generalization to more than two MUL-trees is NP-complete, although fixed-parameter tractable in the maximum degree when the number of given trees is bounded.
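For reference, the Robinson-Foulds distance on ordinary phylogenetic trees, which this abstract extends to MUL-trees, can be sketched as follows for rooted trees. The code does not handle repeated labels and is not the MUL-tree extension studied in the paper. It assumes nested-tuple trees on the same leaf set and counts the clusters present in exactly one tree. Some authors halve this count, so the convention used here is an assumption, as are the function names.

```python
def collect_clusters(tree, out):
    """Add the cluster (leaf set) of every node of `tree` to `out`; return the root's."""
    if not isinstance(tree, tuple):
        leaf_set = frozenset([tree])
    else:
        leaf_set = frozenset().union(*(collect_clusters(child, out) for child in tree))
    out.add(leaf_set)
    return leaf_set

def rf_distance(t1, t2):
    """Number of clusters found in exactly one of the two rooted trees."""
    c1, c2 = set(), set()
    collect_clusters(t1, c1)
    collect_clusters(t2, c2)
    return len(c1 ^ c2)

t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(rf_distance(t1, t2))  # 4
```

Singleton clusters and the full leaf set are shared by any two trees on the same leaves, so only the internal clusters contribute to the count.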