A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees
Reticulate events play an important role in determining evolutionary
relationships. The problem of computing the minimum number of such events to
explain discordance between two phylogenetic trees is a hard computational
problem. Even for binary trees, exact solvers struggle to solve instances with
reticulation number larger than 40-50. Here we present CycleKiller and
NonbinaryCycleKiller, the first methods to produce solutions verifiably close
to optimality for instances with hundreds or even thousands of reticulations.
Using simulations, we demonstrate that these algorithms run quickly for large
and difficult instances, producing solutions that are very close to optimality.
As a spin-off from our simulations we also present TerminusEst, which is the
fastest exact method currently available that can handle nonbinary trees: this
is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All
three methods are based on extensions of previous theoretical work and are
publicly available. We also apply our methods to real data.
On unrooted and root-uncertain variants of several well-known phylogenetic network problems
The hybridization number problem requires us to embed a set of binary rooted
phylogenetic trees into a binary rooted phylogenetic network such that the
number of nodes with indegree two is minimized. However, from a biological
point of view accurately inferring the root location in a phylogenetic tree is
notoriously difficult and poor root placement can artificially inflate the
hybridization number. To this end we study a number of relaxed variants of this
problem. We start by showing that the fundamental problem of determining
whether an unrooted phylogenetic network displays (i.e. embeds) an
unrooted phylogenetic tree is NP-hard. On the positive side we show
that this problem is FPT in reticulation number. In the rooted case the
corresponding FPT result is trivial, but here we require more subtle
argumentation. Next we show that the hybridization number problem for unrooted
networks (when given two unrooted trees) is equivalent to the problem of
computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted
trees. In the third part of the paper we consider the "root uncertain" variant
of hybridization number. Here we are free to choose the root location in each
of a set of unrooted input trees such that the hybridization number of the
resulting rooted trees is minimized. On the negative side we show that this
problem is APX-hard. On the positive side, we show that the problem is FPT in
the hybridization number, via kernelization, for any number of input trees.
Comment: 28 pages, 8 figures
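The quantity minimized in the abstract above, the number of nodes with indegree two, can be illustrated with a minimal Python sketch. The edge-list representation, the function name `reticulation_count`, and the toy network are assumptions made for illustration only; they are not taken from the paper, and the sketch only counts reticulations in a given network rather than solving the (hard) minimization problem.

```python
from collections import defaultdict


def reticulation_count(edges):
    """Count nodes with indegree two in a rooted network.

    `edges` is a list of (parent, child) pairs; this representation
    is an assumption chosen for the example.
    """
    indegree = defaultdict(int)
    for _parent, child in edges:
        indegree[child] += 1
    return sum(1 for degree in indegree.values() if degree == 2)


# Hypothetical toy network on leaves a, b, c with one reticulation node "h".
toy_network = [
    ("root", "u"), ("root", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]

print(reticulation_count(toy_network))
```

In this toy network only node `h` has two incoming edges, so the count is one; a plain tree would give zero.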
Simulating California reservoir operation using the classification and regression-tree algorithm combined with a shuffled cross-validation scheme
The controlled outflows from a reservoir or dam depend heavily on the decisions made by reservoir operators rather than on natural hydrological processes. Differences exist between the natural upstream inflows to reservoirs and the controlled outflows from reservoirs that supply downstream users. As decision makers become more aware of a changing climate, reservoir management requires adaptable means of incorporating more information into decision making, such as water delivery requirements, environmental constraints, and dry/wet conditions. In this paper, a robust reservoir outflow simulation model is presented, which uses a well-developed data-mining model, the Classification and Regression Tree (CART), to predict the complicated human-controlled reservoir outflows and extract reservoir operation patterns. A shuffled cross-validation approach is further implemented to improve CART's predictive performance. An application study of nine major reservoirs in California is carried out. Results produced by the enhanced CART, the original CART, and random forest are compared with observations. The statistical measures show that the enhanced CART and random forest generally outperform the CART control run, and that the enhanced CART algorithm gives better predictive performance than random forest in simulating peak flows. The results also show that the proposed model is able to consistently and reasonably predict the expert release decisions. Experiments indicate that the release operation at Lake Oroville is significantly dominated by the SWP allocation amount, and that reservoirs at low elevation are more sensitive to inflow amount than others.
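The abstract above does not spell out how its shuffled cross-validation scheme works. One common reading is that samples are randomly permuted before being split into folds, so that each fold mixes records from different periods. The sketch below shows only that generic idea in plain Python; the function name `shuffled_kfold`, the fold count, and the seed are assumptions for illustration and are not the authors' procedure.

```python
import random


def shuffled_kfold(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k folds after shuffling.

    Generic shuffled k-fold splitting; an assumed stand-in for the
    scheme named in the abstract, not a reproduction of it.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # permute before splitting
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample index lands in exactly one test fold.
for train, test in shuffled_kfold(n_samples=10, k=5):
    print(sorted(test), len(train))
```

A regression tree would then be fitted on each training split and scored on the matching test split, with the scores averaged across folds.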