A quadratic kernel for computing the hybridization number of multiple trees
It has recently been shown that the NP-hard problem of calculating the
minimum number of hybridization events that is needed to explain a set of
rooted binary phylogenetic trees by means of a hybridization network is
fixed-parameter tractable if an instance of the problem consists of precisely
two such trees. In this paper, we show that this problem remains
fixed-parameter tractable for an arbitrarily large set of rooted binary
phylogenetic trees. In particular, we present a quadratic kernel for this problem.
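Kernelizations of this kind work by applying reduction rules that shrink the input trees without changing the answer. As a toy illustration only (not the paper's actual kernel), the sketch below implements the classic common-cherry reduction for two rooted binary trees, assuming a minimal nested-tuple representation: a cherry (a pair of sibling leaves) that occurs in both trees can be contracted to a single leaf.

```python
# Toy common-cherry reduction for two rooted binary trees (illustrative;
# not the kernelization from the paper). A tree is a leaf label (str)
# or a pair (left_subtree, right_subtree).

def cherries(tree, found=None):
    """Return the set of cherries of a tree as frozensets of two labels."""
    if found is None:
        found = set()
    if isinstance(tree, tuple):
        left, right = tree
        if isinstance(left, str) and isinstance(right, str):
            found.add(frozenset((left, right)))
        else:
            cherries(left, found)
            cherries(right, found)
    return found

def contract(tree, pair, label):
    """Replace the cherry `pair` by the single leaf `label`."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if {left, right} == set(pair):
        return label
    return (contract(left, pair, label), contract(right, pair, label))

def reduce_common_cherries(t1, t2):
    """Repeatedly contract cherries that both trees share."""
    while True:
        common = cherries(t1) & cherries(t2)
        if not common:
            return t1, t2
        pair = next(iter(common))
        label = "+".join(sorted(pair))
        t1 = contract(t1, pair, label)
        t2 = contract(t2, pair, label)

t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "b"), "d"), "c")
print(reduce_common_cherries(t1, t2))
```

Here the shared cherry {a, b} is contracted to a single leaf "a+b" in both trees, after which no common cherry remains; real kernels use more general subtree and chain reductions in the same spirit.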
On unrooted and root-uncertain variants of several well-known phylogenetic network problems
The hybridization number problem requires us to embed a set of binary rooted
phylogenetic trees into a binary rooted phylogenetic network such that the
number of nodes with indegree two is minimized. However, from a biological
point of view, accurately inferring the root location in a phylogenetic tree is
notoriously difficult, and poor root placement can artificially inflate the
hybridization number. Motivated by this, we study a number of relaxed variants
of this problem. We start by showing that the fundamental problem of determining
whether an \emph{unrooted} phylogenetic network displays (i.e., embeds) an
\emph{unrooted} phylogenetic tree is NP-hard. On the positive side, we show
that this problem is FPT in reticulation number. In the rooted case the
corresponding FPT result is trivial, but here we require more subtle
argumentation. Next we show that the hybridization number problem for unrooted
networks (when given two unrooted trees) is equivalent to the problem of
computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted
trees. In the third part of the paper we consider the "root uncertain" variant
of hybridization number. Here we are free to choose the root location in each
of a set of unrooted input trees such that the hybridization number of the
resulting rooted trees is minimized. On the negative side we show that this
problem is APX-hard. On the positive side, we show that the problem is FPT in
the hybridization number, via kernelization, for any number of input trees. Comment: 28 pages, 8 figures
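For concreteness, the reticulation (hybridization) number minimized in these problems counts the indegree-two nodes of the network. A minimal helper, assuming only that the network is given as a list of directed edges (an invented representation for illustration):

```python
# Count the reticulations of a rooted phylogenetic network, i.e. the
# nodes with indegree at least two (exactly two in a binary network).
from collections import Counter

def reticulation_number(edges):
    """`edges` is a list of (parent, child) pairs of a directed network."""
    indeg = Counter(child for _, child in edges)
    return sum(1 for d in indeg.values() if d >= 2)

# A network with a single reticulation h, whose two parents are a and b:
edges = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"), ("h", "x")]
print(reticulation_number(edges))   # 1
```

A network with reticulation number zero is simply a rooted tree, which is why the rooted display problem mentioned above becomes trivial in that case.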
Kernelizations for the hybridization number problem on multiple nonbinary trees
Given a finite set X, a collection T of rooted phylogenetic trees on X, and an
integer k, the Hybridization Number problem asks if there exists a
phylogenetic network on X that displays all trees from T and has reticulation
number at most k. We show two kernelization algorithms for Hybridization
Number, with kernel sizes bounded in terms of k, the number of input trees,
and their maximum outdegree. Experiments on simulated data demonstrate the
practical relevance of these kernelization algorithms. In addition, we present
an exact algorithm whose running time is bounded in terms of the input size
and a computable function of k.
Regression approaches for Approximate Bayesian Computation
This book chapter introduces regression approaches and regression adjustment
for Approximate Bayesian Computation (ABC). Regression adjustment corrects
parameter values after rejection sampling in order to account for the imperfect
match between simulations and observations. This mismatch can be more
pronounced when there are many summary statistics, a phenomenon known as the
curse of dimensionality. Because of this imperfect match, credibility intervals
obtained without adjustment can be inflated compared to true credibility
intervals. The chapter presents the main
concepts underlying regression adjustment. A theorem that compares theoretical
properties of posterior distributions obtained with and without regression
adjustment is presented. Last, a practical application of regression adjustment
in population genetics shows that regression adjustment shrinks posterior
distributions compared to rejection approaches, which is a solution to avoid
inflated credibility intervals. Comment: Book chapter, published in the Handbook of Approximate Bayesian Computation, 2018
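The adjustment described above can be sketched in a few lines. The toy model, prior, and tolerance below are all invented for illustration; the sketch shows a Beaumont-style local-linear adjustment with numpy, not the chapter's exact procedure:

```python
# Rejection ABC followed by linear regression adjustment (illustrative).
import numpy as np

rng = np.random.default_rng(0)

# Toy model: data are 20 draws from Normal(theta, 1); summary = sample mean.
def simulate_summary(theta, n=20):
    return rng.normal(theta, 1.0, size=n).mean()

s_obs = 0.5                                   # observed summary statistic
theta_prior = rng.uniform(-3, 3, size=20000)  # draws from a uniform prior
s_sim = np.array([simulate_summary(t) for t in theta_prior])

# Rejection step: keep the draws whose summaries are closest to s_obs.
eps = np.quantile(np.abs(s_sim - s_obs), 0.05)
keep = np.abs(s_sim - s_obs) <= eps
theta_acc, s_acc = theta_prior[keep], s_sim[keep]

# Regression adjustment: fit theta ~ s on the accepted draws, then move
# each accepted theta to its prediction at s_obs plus its residual:
#   theta* = theta - beta * (s - s_obs)
beta, intercept = np.polyfit(s_acc, theta_acc, deg=1)
theta_adj = theta_acc - beta * (s_acc - s_obs)

print(theta_acc.std(), theta_adj.std())
```

Each adjusted value keeps its regression residual but is shifted to the observed summary, so the adjusted sample is strictly less dispersed than the plain rejection sample whenever the fitted slope is nonzero, which is the shrinkage effect the chapter discusses.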
Computing Maximum Agreement Forests without Cluster Partitioning is Folly
Computing a maximum (acyclic) agreement forest (M(A)AF) of a pair of phylogenetic trees is known to be fixed-parameter tractable; the two main techniques are kernelization and depth-bounded search. In theory, kernelization-based algorithms for this problem are not competitive, but they perform remarkably well in practice. We shed light on why this is the case. Our results show that, probably unsurprisingly, the kernel is often much smaller in practice than the theoretical worst case, but not small enough to fully explain the good performance of these algorithms. The key to performance is cluster partitioning, a technique used in almost all fast M(A)AF algorithms. In theory, cluster partitioning does not help: some instances are highly clusterable, others not at all. However, our experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. In contrast, kernelizing the individual clusters before solving them using exponential search yields only very modest performance improvements or even hurts performance; for the vast majority of inputs, kernelization leads to no reduction in the maximal cluster size at all. The choice of the algorithm applied to solve individual clusters also significantly impacts performance, even though our limited experiment to evaluate this produced no clear winner; depth-bounded search, exponential search interleaved with kernelization, and an ILP-based algorithm all achieved competitive performance.
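Cluster partitioning, the technique the experiments single out, splits an instance at common clusters: taxon sets that form a clade in both input trees can be solved as independent subinstances. A toy detector for such clusters, assuming rooted binary trees in a nested-tuple representation (illustrative only, not the paper's implementation):

```python
# Find the common clusters of two rooted binary trees: leaf sets that
# form a clade in both trees. Trees are nested tuples, leaves are strings.

def clades(tree):
    """Return (leafset, set of all internal-node leafsets) of a tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    left, right = tree
    ls, lclades = clades(left)
    rs, rclades = clades(right)
    s = ls | rs
    return s, lclades | rclades | {s}

def common_clusters(t1, t2):
    """Clades shared by both trees, excluding the full taxon set."""
    s1, c1 = clades(t1)
    s2, c2 = clades(t2)
    return (c1 & c2) - {s1}

t1 = ((("a", "b"), ("c", "d")), ("e", "f"))
t2 = ((("b", "a"), ("d", "c")), ("f", "e"))
print(sorted(sorted(c) for c in common_clusters(t1, t2)))
```

Each common cluster can be replaced by a single leaf once its subinstance is solved, which is why highly clusterable inputs decompose into much smaller problems.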
A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining
Big data comes in various ways, types, shapes, forms and sizes. Indeed,
almost all areas of science, technology, medicine, public health, economics,
business, linguistics and social science are bombarded by ever increasing flows
of data begging to be analyzed efficiently and effectively. In this paper, we
propose a rough idea of a possible taxonomy of big data, along with some of the
most commonly used tools for handling each particular category of bigness. The
dimensionality p of the input space and the sample size n are usually the main
ingredients in the characterization of data bigness. The specific statistical
machine learning technique used to handle a particular big data set will depend
on which category it falls in within the bigness taxonomy. Large p small n data
sets for instance require a different set of tools from the large n small p
variety. Among other tools, we discuss Preprocessing, Standardization,
Imputation, Projection, Regularization, Penalization, Compression, Reduction,
Selection, Kernelization, Hybridization, Parallelization, Aggregation,
Randomization, Replication, Sequentialization. Indeed, it is important to
emphasize right away that the so-called no free lunch theorem applies here, in
the sense that there is no universally superior method that outperforms all
other methods on all categories of bigness. It is also important to stress
that simplicity, in the sense of Ockham's razor principle of
parsimony, tends to reign supreme when it comes to massive data. We conclude
with a comparison of the predictive performance of some of the most commonly
used methods on a few data sets. Comment: 18 pages, 2 figures, 3 tables
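As a concrete instance of the "large p, small n" category and of the regularization and penalization tools listed above, the sketch below (an invented numpy example) shows why ordinary least squares breaks down when p > n while a ridge penalty keeps the problem well-posed:

```python
# Large p, small n: X'X is singular, so OLS has no unique solution,
# but the ridge system (X'X + lam*I) is invertible for any lam > 0.
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 200                       # more features than samples
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                  # sparse true signal
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Ridge estimate: beta_hat = (X'X + lam*I)^{-1} X'y
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# X'X has rank at most n < p, confirming OLS is ill-posed here.
print(np.linalg.matrix_rank(X.T @ X) < p)   # True
```

The same pattern motivates the other penalized methods in the taxonomy: the penalty term substitutes for the information the small sample cannot provide.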