5,310 research outputs found
On unrooted and root-uncertain variants of several well-known phylogenetic network problems
The hybridization number problem requires us to embed a set of binary rooted
phylogenetic trees into a binary rooted phylogenetic network such that the
number of nodes with indegree two is minimized. However, from a biological
point of view accurately inferring the root location in a phylogenetic tree is
notoriously difficult and poor root placement can artificially inflate the
hybridization number. To this end we study a number of relaxed variants of this
problem. We start by showing that the fundamental problem of determining
whether an \emph{unrooted} phylogenetic network displays (i.e. embeds) an
\emph{unrooted} phylogenetic tree, is NP-hard. On the positive side we show
that this problem is FPT in reticulation number. In the rooted case the
corresponding FPT result is trivial, but here we require more subtle
argumentation. Next we show that the hybridization number problem for unrooted
networks (when given two unrooted trees) is equivalent to the problem of
computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted
trees. In the third part of the paper we consider the "root uncertain" variant
of hybridization number. Here we are free to choose the root location in each
of a set of unrooted input trees such that the hybridization number of the
resulting rooted trees is minimized. On the negative side we show that this
problem is APX-hard. On the positive side, we show that the problem is FPT in
the hybridization number, via kernelization, for any number of input trees.Comment: 28 pages, 8 Figure
On Computing the Maximum Parsimony Score of a Phylogenetic Network
Phylogenetic networks are used to display the relationship of different
species whose evolution is not treelike, which is the case, for instance, in
the presence of hybridization events or horizontal gene transfers. Tree
inference methods such as Maximum Parsimony need to be modified in order to be
applicable to networks. In this paper, we discuss two different definitions of
Maximum Parsimony on networks, "hardwired" and "softwired", and examine the
complexity of computing them given a network topology and a character. By
exploiting a link with the problem Multicut, we show that computing the
hardwired parsimony score for 2-state characters is polynomial-time solvable,
while for characters with more states this problem becomes NP-hard but is still
approximable and fixed parameter tractable in the parsimony score. On the other
hand we show that, for the softwired definition, obtaining even weak
approximation guarantees is already difficult for binary characters and
restricted network topologies, and fixed-parameter tractable algorithms in the
parsimony score are unlikely. On the positive side we show that computing the
softwired parsimony score is fixed-parameter tractable in the level of the
network, a natural parameter describing how tangled reticulate activity is in
the network. Finally, we show that both the hardwired and softwired parsimony
score can be computed efficiently using Integer Linear Programming. The
software has been made freely available
When two trees go to war
Rooted phylogenetic networks are often constructed by combining trees,
clusters, triplets or characters into a single network that in some
well-defined sense simultaneously represents them all. We review these four
models and investigate how they are related. In general, the model chosen
influences the minimum number of reticulation events required. However, when
one obtains the input data from two binary trees, we show that the minimum
number of reticulations is independent of the model. The number of
reticulations necessary to represent the trees, triplets, clusters (in the
softwired sense) and characters (with unrestricted multiple crossover
recombination) are all equal. Furthermore, we show that these results also hold
when not the number of reticulations but the level of the constructed network
is minimised. We use these unification results to settle several complexity
questions that have been open in the field for some time. We also give explicit
examples to show that already for data obtained from three binary trees the
models begin to diverge
Convex Graph Invariant Relaxations For Graph Edit Distance
The edit distance between two graphs is a widely used measure of similarity
that evaluates the smallest number of vertex and edge deletions/insertions
required to transform one graph to another. It is NP-hard to compute in
general, and a large number of heuristics have been proposed for approximating
this quantity. With few exceptions, these methods generally provide upper
bounds on the edit distance between two graphs. In this paper, we propose a new
family of computationally tractable convex relaxations for obtaining lower
bounds on graph edit distance. These relaxations can be tailored to the
structural properties of the particular graphs via convex graph invariants.
Specific examples that we highlight in this paper include constraints on the
graph spectrum as well as (tractable approximations of) the stability number
and the maximum-cut values of graphs. We prove under suitable conditions that
our relaxations are tight (i.e., exactly compute the graph edit distance) when
one of the graphs consists of few eigenvalues. We also validate the utility of
our framework on synthetic problems as well as real applications involving
molecular structure comparison problems in chemistry.Comment: 27 pages, 7 figure
- …