On unrooted and root-uncertain variants of several well-known phylogenetic network problems
The hybridization number problem requires us to embed a set of binary rooted
phylogenetic trees into a binary rooted phylogenetic network such that the
number of nodes with indegree two is minimized. However, from a biological
point of view accurately inferring the root location in a phylogenetic tree is
notoriously difficult and poor root placement can artificially inflate the
hybridization number. To this end we study a number of relaxed variants of this
problem. We start by showing that the fundamental problem of determining
whether an \emph{unrooted} phylogenetic network displays (i.e. embeds) an
\emph{unrooted} phylogenetic tree, is NP-hard. On the positive side we show
that this problem is FPT in reticulation number. In the rooted case the
corresponding FPT result is trivial, but here we require more subtle
argumentation. Next we show that the hybridization number problem for unrooted
networks (when given two unrooted trees) is equivalent to the problem of
computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted
trees. In the third part of the paper we consider the "root uncertain" variant
of hybridization number. Here we are free to choose the root location in each
of a set of unrooted input trees such that the hybridization number of the
resulting rooted trees is minimized. On the negative side we show that this
problem is APX-hard. On the positive side, we show that the problem is FPT in
the hybridization number, via kernelization, for any number of input trees.
Comment: 28 pages, 8 figures
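The quantity minimized above, the number of nodes with indegree two in a binary rooted network, can be illustrated with a minimal sketch. The edge-list representation, the function name `reticulation_number`, and the example network are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def reticulation_number(edges):
    """Sum of (indegree - 1) over all nodes with indegree above one.

    For a binary network this equals the number of nodes with indegree two.
    `edges` is an iterable of (parent, child) pairs (assumed representation).
    """
    indegree = defaultdict(int)
    for _parent, child in edges:
        indegree[child] += 1
    return sum(d - 1 for d in indegree.values() if d > 1)


# Hypothetical binary rooted network on taxa a, b, c.
# Node "h" has two parents (u and v), so it is the only reticulation.
network = [
    ("root", "u"), ("root", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]

# A rooted tree has no node with more than one parent.
tree = [("root", "u"), ("root", "c"), ("u", "a"), ("u", "b")]

print(reticulation_number(network))
print(reticulation_number(tree))
```

Counting `indegree - 1` rather than only nodes of indegree two keeps the function meaningful for nonbinary networks as well.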
A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees
Reticulate events play an important role in determining evolutionary
relationships. The problem of computing the minimum number of such events to
explain discordance between two phylogenetic trees is a hard computational
problem. Even for binary trees, exact solvers struggle to solve instances with
reticulation number larger than 40-50. Here we present CycleKiller and
NonbinaryCycleKiller, the first methods to produce solutions verifiably close
to optimality for instances with hundreds or even thousands of reticulations.
Using simulations, we demonstrate that these algorithms run quickly for large
and difficult instances, producing solutions that are very close to optimality.
As a spin-off from our simulations we also present TerminusEst, which is the
fastest exact method currently available that can handle nonbinary trees: this
is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All
three methods are based on extensions of previous theoretical work and are
publicly available. We also apply our methods to real data.
A quadratic kernel for computing the hybridization number of multiple trees
It has recently been shown that the NP-hard problem of calculating the
minimum number of hybridization events that is needed to explain a set of
rooted binary phylogenetic trees by means of a hybridization network is
fixed-parameter tractable if an instance of the problem consists of precisely
two such trees. In this paper, we show that this problem remains
fixed-parameter tractable for an arbitrarily large set of rooted binary
phylogenetic trees. In particular, we present a quadratic kernel.
TreeGrad: Transferring Tree Ensembles to Neural Networks
Gradient Boosting Decision Trees (GBDTs) are popular machine learning
algorithms, with implementations such as LightGBM and in popular machine
learning toolkits like Scikit-Learn. Many implementations can only produce
trees in an offline, greedy manner. We explore ways to convert
existing GBDT implementations to known neural network architectures with
minimal performance loss in order to allow decision splits to be updated in an
online manner, and we provide extensions that allow split points to be altered as a
neural architecture search problem. We provide learning bounds for our neural
network.
Comment: Technical Report on Implementation of Deep Neural Decision Forests
Algorithm. To accompany implementation here:
https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019).
"Transferring Tree Ensembles to Neural Networks". International Conference on
Neural Information Processing. Springer, 2019. arXiv admin note: text overlap
with arXiv:1909.1179
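To make the idea of an updatable decision split concrete, here is a minimal sketch that contrasts a hard threshold split with a sigmoid-based soft split, in the general style of soft routing used by neural decision forests. It is not the TreeGrad implementation; the function names and the `temperature` parameter are assumptions made for illustration.

```python
import math


def hard_split(x, threshold):
    """Classic tree split: route right (1.0) if x exceeds the threshold."""
    return 1.0 if x > threshold else 0.0


def soft_split(x, threshold, temperature=1.0):
    """Differentiable relaxation: probability of routing right.

    Because the output varies smoothly with `threshold`, the split point
    can be adjusted by gradient-based (online) updates.
    """
    return 1.0 / (1.0 + math.exp(-(x - threshold) / temperature))


# As the (hypothetical) temperature shrinks, the soft split approaches
# the hard split away from the threshold itself.
for temperature in (1.0, 0.1, 0.01):
    print(temperature, soft_split(3.0, 2.0, temperature), hard_split(3.0, 2.0))
```

A threshold learned by a greedy GBDT could serve as the initial value of `threshold` before any further gradient updates.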
Kernelizations for the hybridization number problem on multiple nonbinary trees
Given a finite set $X$, a collection $\mathcal{T}$ of rooted phylogenetic
trees on $X$ and an integer $k$, the Hybridization Number problem asks if there
exists a phylogenetic network on $X$ that displays all trees from $\mathcal{T}$
and has reticulation number at most $k$. We show two kernelization algorithms
for Hybridization Number, with kernel sizes $4k(5k)^t$ and $20k^2(\Delta^+-1)$
respectively, with $t$ the number of input trees and $\Delta^+$ their maximum
outdegree. Experiments on simulated data demonstrate the practical relevance of
these kernelization algorithms. In addition, we present an $n^{f(k)}t$-time
algorithm, with $n=|X|$ and $f$ some computable function of $k$.
Parameterized Complexity Dichotomy for Steiner Multicut
The Steiner Multicut problem asks, given an undirected graph G, terminal
sets T1,...,Tt ⊆ V(G) of size at most p, and an integer k, whether
there is a set S of at most k edges or nodes such that, for each set Ti, at least one
pair of terminals is in different connected components of G \ S. This problem
generalizes several graph cut problems, in particular the Multicut problem (the
case p = 2), which is fixed-parameter tractable for the parameter k [Marx and
Razgon, Bousquet et al., STOC 2011].
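The definition above can be made concrete with a small verifier for the edge deletion version. This is a sketch of checking a candidate solution only, not an algorithm from the paper; the graph representation and the names `component_labels` and `is_steiner_multicut` are assumptions made for illustration.

```python
from itertools import combinations


def component_labels(nodes, edges, removed):
    """Label every node with a representative of its component in G minus removed."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        if frozenset((u, v)) not in removed:
            adjacency[u].add(v)
            adjacency[v].add(u)
    label = {}
    for start in nodes:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:  # iterative depth-first search
            u = stack.pop()
            for w in adjacency[u]:
                if w not in label:
                    label[w] = start
                    stack.append(w)
    return label


def is_steiner_multicut(nodes, edges, terminal_sets, deleted, k):
    """True if at most k edges are deleted and every terminal set has
    at least one pair of terminals in different components afterwards."""
    removed = {frozenset(e) for e in deleted}
    if len(removed) > k:
        return False
    label = component_labels(nodes, edges, removed)
    return all(
        any(label[a] != label[b] for a, b in combinations(ts, 2))
        for ts in terminal_sets
    )


# Hypothetical instance: the path a - b - c - d with p = 3 and k = 1.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
terminal_sets = [("a", "b", "d"), ("c", "d")]

print(is_steiner_multicut(nodes, edges, terminal_sets, [("c", "d")], 1))
print(is_steiner_multicut(nodes, edges, terminal_sets, [("a", "b")], 1))
```

Note that a terminal set only needs one separated pair, which is what distinguishes Steiner Multicut from requiring all terminals of a set to be pairwise separated.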
We provide a dichotomy of the parameterized complexity of Steiner Multicut.
That is, for any combination of k, t, p, and the treewidth tw(G) as constant,
parameter, or unbounded, and for all versions of the problem (edge deletion and
node deletion with and without deletable terminals), we prove either that the
problem is fixed-parameter tractable or that the problem is hard (W[1]-hard or
even (para-)NP-complete). We highlight that:
- The edge deletion version of Steiner Multicut is fixed-parameter tractable
for the parameter k+t on general graphs (but has no polynomial kernel, even on
trees). We present two proofs: one using the randomized contractions technique
of Chitnis et al., and one relying on new structural lemmas that decompose the
Steiner cut into important separators and minimal s-t cuts.
- In contrast, both node deletion versions of Steiner Multicut are W[1]-hard
for the parameter k+t on general graphs.
- All versions of Steiner Multicut are W[1]-hard for the parameter k, even
when p=3 and the graph is a tree plus one node. Hence, the results of Marx and
Razgon, and Bousquet et al. do not generalize to Steiner Multicut.
Since we allow k, t, p, and tw(G) to be any constants, our characterization
includes a dichotomy for Steiner Multicut on trees (for tw(G) = 1), and a
polynomial time versus NP-hardness dichotomy (by restricting k,t,p,tw(G) to
constant or unbounded).
Comment: As submitted to journal. This version also adds a proof of
fixed-parameter tractability for parameter k+t using the technique of
randomized contraction