3,763 research outputs found

    On unrooted and root-uncertain variants of several well-known phylogenetic network problems

    Get PDF
    The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an \emph{unrooted} phylogenetic network displays (i.e. embeds) an \emph{unrooted} phylogenetic tree, is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees.Comment: 28 pages, 8 Figure

    A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees

    Get PDF
    Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work and are publicly available. We also apply our methods to real data

    A quadratic kernel for computing the hybridization number of multiple trees

    Full text link
    It has recently been shown that the NP-hard problem of calculating the minimum number of hybridization events that is needed to explain a set of rooted binary phylogenetic trees by means of a hybridization network is fixed-parameter tractable if an instance of the problem consists of precisely two such trees. In this paper, we show that this problem remains fixed-parameter tractable for an arbitrarily large set of rooted binary phylogenetic trees. In particular, we present a quadratic kernel

    TreeGrad: Transferring Tree Ensembles to Neural Networks

    Full text link
    Gradient Boosting Decision Tree (GBDT) are popular machine learning algorithms with implementations such as LightGBM and in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees in an offline manner and in a greedy manner. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss in order to allow decision splits to be updated in an online manner and provide extensions to allow splits points to be altered as a neural architecture search problem. We provide learning bounds for our neural network.Comment: Technical Report on Implementation of Deep Neural Decision Forests Algorithm. To accompany implementation here: https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.1179

    Kernelizations for the hybridization number problem on multiple nonbinary trees

    Get PDF
    Given a finite set XX, a collection T\mathcal{T} of rooted phylogenetic trees on XX and an integer kk, the Hybridization Number problem asks if there exists a phylogenetic network on XX that displays all trees from T\mathcal{T} and has reticulation number at most kk. We show two kernelization algorithms for Hybridization Number, with kernel sizes 4k(5k)t4k(5k)^t and 20k2(Δ+−1)20k^2(\Delta^+-1) respectively, with tt the number of input trees and Δ+\Delta^+ their maximum outdegree. Experiments on simulated data demonstrate the practical relevance of these kernelization algorithms. In addition, we present an nf(k)tn^{f(k)}t-time algorithm, with n=∣X∣n=|X| and ff some computable function of kk

    Parameterized Complexity Dichotomy for Steiner Multicut

    Get PDF
    The Steiner Multicut problem asks, given an undirected graph G, terminals sets T1,...,Tt ⊆\subseteq V(G) of size at most p, and an integer k, whether there is a set S of at most k edges or nodes s.t. of each set Ti at least one pair of terminals is in different connected components of G \ S. This problem generalizes several graph cut problems, in particular the Multicut problem (the case p = 2), which is fixed-parameter tractable for the parameter k [Marx and Razgon, Bousquet et al., STOC 2011]. We provide a dichotomy of the parameterized complexity of Steiner Multicut. That is, for any combination of k, t, p, and the treewidth tw(G) as constant, parameter, or unbounded, and for all versions of the problem (edge deletion and node deletion with and without deletable terminals), we prove either that the problem is fixed-parameter tractable or that the problem is hard (W[1]-hard or even (para-)NP-complete). We highlight that: - The edge deletion version of Steiner Multicut is fixed-parameter tractable for the parameter k+t on general graphs (but has no polynomial kernel, even on trees). We present two proofs: one using the randomized contractions technique of Chitnis et al, and one relying on new structural lemmas that decompose the Steiner cut into important separators and minimal s-t cuts. - In contrast, both node deletion versions of Steiner Multicut are W[1]-hard for the parameter k+t on general graphs. - All versions of Steiner Multicut are W[1]-hard for the parameter k, even when p=3 and the graph is a tree plus one node. Hence, the results of Marx and Razgon, and Bousquet et al. do not generalize to Steiner Multicut. Since we allow k, t, p, and tw(G) to be any constants, our characterization includes a dichotomy for Steiner Multicut on trees (for tw(G) = 1), and a polynomial time versus NP-hardness dichotomy (by restricting k,t,p,tw(G) to constant or unbounded).Comment: As submitted to journal. This version also adds a proof of fixed-parameter tractability for parameter k+t using the technique of randomized contraction
    • …
    corecore