On unrooted and root-uncertain variants of several well-known phylogenetic network problems

Author: Boes Olivier
Kelk Steven
Stamoulis Georgios
Stougie Leen
van Iersel Leo
Publication date: 01/01/2016

The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view, accurately inferring the root location in a phylogenetic tree is notoriously difficult, and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an unrooted phylogenetic network displays (i.e. embeds) an unrooted phylogenetic tree is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees. Comment: 28 pages, 8 figures
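The quantity being minimized in the abstract above, the number of nodes with indegree two in a binary rooted network, can be illustrated with a minimal sketch. The edge-list representation, the function name `reticulation_count`, and the toy network are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter


def reticulation_count(edges):
    """Count nodes with indegree two in a rooted network.

    `edges` is a list of (parent, child) pairs; this representation
    is an assumption of the sketch, not the paper's.
    """
    indegree = Counter(child for _, child in edges)
    return sum(1 for degree in indegree.values() if degree == 2)


# Hypothetical toy network on leaves a, b, c: node h has two parents (u and v).
toy_network = [
    ("root", "u"), ("root", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]

print(reticulation_count(toy_network))  # 1
```

The sketch only evaluates a given network; finding a network that displays all input trees with the fewest such nodes is the hard problem the abstract refers to.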

arXiv.org e-Print Archive

Maastricht University Research Portal

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees

Author: Kelk Steven
Lekić Nela
Scornavacca Celine
van Iersel Leo
Publication date: 01/01/2014

Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work and are publicly available. We also apply our methods to real data

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

CWI's Institutional Repository

A quadratic kernel for computing the hybridization number of multiple trees

Author: van Iersel Leo
Linz Simone
Publication date: 01/03/2012

It has recently been shown that the NP-hard problem of calculating the minimum number of hybridization events that is needed to explain a set of rooted binary phylogenetic trees by means of a hybridization network is fixed-parameter tractable if an instance of the problem consists of precisely two such trees. In this paper, we show that this problem remains fixed-parameter tractable for an arbitrarily large set of rooted binary phylogenetic trees. In particular, we present a quadratic kernel

arXiv.org e-Print Archive

CiteSeerX

Repository TU/e

Crossref

CWI's Institutional Repository

TreeGrad: Transferring Tree Ensembles to Neural Networks

Author: C Siu
DH Wolpert
F Pedregosa
JA Blackard
JH Friedman
K Nakai
L Breiman
SK Murthy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/12/2019

Gradient Boosting Decision Trees (GBDTs) are popular machine learning algorithms, with implementations such as LightGBM and in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees offline and greedily. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated online, and provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network. Comment: Technical Report on Implementation of Deep Neural Decision Forests Algorithm. To accompany implementation here: https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.1179

arXiv.org e-Print Archive

Crossref

Kernelizations for the hybridization number problem on multiple nonbinary trees

Author: Kelk Steven
Scornavacca Celine
van Iersel Leo
Publication date: 22/03/2016

Given a finite set $X$, a collection $\mathcal{T}$ of rooted phylogenetic trees on $X$ and an integer $k$, the Hybridization Number problem asks if there exists a phylogenetic network on $X$ that displays all trees from $\mathcal{T}$ and has reticulation number at most $k$. We show two kernelization algorithms for Hybridization Number, with kernel sizes $4k(5k)^t$ and $20k^2(\Delta^+-1)$ respectively, with $t$ the number of input trees and $\Delta^+$ their maximum outdegree. Experiments on simulated data demonstrate the practical relevance of these kernelization algorithms. In addition, we present an $n^{f(k)}t$-time algorithm, with $n=|X|$ and $f$ some computable function of $k$.
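
To get a feel for how the two kernel size bounds quoted in the abstract compare, the following minimal sketch evaluates them for given parameters. The function names are hypothetical; the formulas are exactly those stated above, with k the reticulation bound, t the number of input trees, and the maximum outdegree passed as `max_outdegree`.

```python
def kernel_bound_by_trees(k: int, t: int) -> int:
    """Evaluate the first stated bound, 4k(5k)^t."""
    return 4 * k * (5 * k) ** t


def kernel_bound_by_outdegree(k: int, max_outdegree: int) -> int:
    """Evaluate the second stated bound, 20k^2(Delta^+ - 1)."""
    return 20 * k ** 2 * (max_outdegree - 1)


# Example: k = 2 with two input trees of maximum outdegree 3.
print(kernel_bound_by_trees(2, 2))      # 800
print(kernel_bound_by_outdegree(2, 3))  # 160
```

The first bound grows exponentially in the number of trees, while the second does not depend on t at all, which is why the second is the more attractive one when many low-degree trees are given.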

arXiv.org e-Print Archive

Maastricht University Research Portal

Parameterized Complexity Dichotomy for Steiner Multicut

Author: Bringmann Karl
Hermelin Danny
Mnich Matthias
van Leeuwen Erik Jan
Publication date: 01/01/2014

The Steiner Multicut problem asks, given an undirected graph G, terminal sets $T_1,\dots,T_t \subseteq V(G)$ of size at most p, and an integer k, whether there is a set S of at most k edges or nodes such that, for each set $T_i$, at least one pair of its terminals lies in different connected components of G \ S. This problem generalizes several graph cut problems, in particular the Multicut problem (the case p = 2), which is fixed-parameter tractable for the parameter k [Marx and Razgon, Bousquet et al., STOC 2011]. We provide a dichotomy of the parameterized complexity of Steiner Multicut. That is, for any combination of k, t, p, and the treewidth tw(G) as constant, parameter, or unbounded, and for all versions of the problem (edge deletion and node deletion with and without deletable terminals), we prove either that the problem is fixed-parameter tractable or that the problem is hard (W[1]-hard or even (para-)NP-complete). We highlight that:
- The edge deletion version of Steiner Multicut is fixed-parameter tractable for the parameter k+t on general graphs (but has no polynomial kernel, even on trees). We present two proofs: one using the randomized contractions technique of Chitnis et al., and one relying on new structural lemmas that decompose the Steiner cut into important separators and minimal s-t cuts.
- In contrast, both node deletion versions of Steiner Multicut are W[1]-hard for the parameter k+t on general graphs.
- All versions of Steiner Multicut are W[1]-hard for the parameter k, even when p=3 and the graph is a tree plus one node. Hence, the results of Marx and Razgon, and Bousquet et al. do not generalize to Steiner Multicut.
Since we allow k, t, p, and tw(G) to be any constants, our characterization includes a dichotomy for Steiner Multicut on trees (for tw(G) = 1), and a polynomial time versus NP-hardness dichotomy (by restricting k,t,p,tw(G) to constant or unbounded). Comment: As submitted to journal. This version also adds a proof of fixed-parameter tractability for parameter k+t using the technique of randomized contraction
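
The problem definition above can be made concrete with a small verifier for the edge deletion version: given a candidate edge set S, it checks that every terminal set has at least one pair of terminals split across components of G \ S. This is a minimal sketch of the feasibility check only, not of any algorithm from the paper; the function names and the path-graph example are hypothetical.

```python
from itertools import combinations


def component_labels(nodes, edges):
    """Label each node with a component representative using union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        parent[find(u)] = find(w)
    return {v: find(v) for v in nodes}


def is_edge_steiner_multicut(nodes, edges, terminal_sets, removed):
    """Return True if deleting `removed` separates some pair in every terminal set."""
    removed_set = {frozenset(e) for e in removed}
    kept = [e for e in edges if frozenset(e) not in removed_set]
    label = component_labels(nodes, kept)
    return all(
        any(label[a] != label[b] for a, b in combinations(terminals, 2))
        for terminals in terminal_sets
    )


# Hypothetical example: the path 1 - 2 - 3 - 4 with two terminal sets.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4)]
terminal_sets = [(1, 2, 4), (3, 4)]

print(is_edge_steiner_multicut(nodes, edges, terminal_sets, [(3, 4)]))  # True
print(is_edge_steiner_multicut(nodes, edges, terminal_sets, [(1, 2)]))  # False
```

Verifying a candidate solution is easy; the hardness results in the abstract concern finding a set S of size at most k.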

arXiv.org e-Print Archive

Maastricht University Research Portal

CiteSeerX

Dagstuhl Research Online Publication Server

MPG.PuRe