1,253 research outputs found

Kernelizations for the hybridization number problem on multiple nonbinary trees

Author: Kelk Steven
Scornavacca Celine
van Iersel Leo
Publication date: 22/03/2016

Given a finite set $X$, a collection $\mathcal{T}$ of rooted phylogenetic trees on $X$ and an integer $k$, the Hybridization Number problem asks if there exists a phylogenetic network on $X$ that displays all trees from $\mathcal{T}$ and has reticulation number at most $k$. We show two kernelization algorithms for Hybridization Number, with kernel sizes $4k(5k)^t$ and $20k^2(\Delta^+-1)$ respectively, with $t$ the number of input trees and $\Delta^+$ their maximum outdegree. Experiments on simulated data demonstrate the practical relevance of these kernelization algorithms. In addition, we present an $n^{f(k)}t$-time algorithm, with $n=|X|$ and $f$ some computable function of $k$.
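
To get a feel for how the two kernel size bounds quoted in this abstract compare, the following minimal sketch simply evaluates them for given parameters. The function names are hypothetical and do not come from the paper; only the two formulas, $4k(5k)^t$ and $20k^2(\Delta^+-1)$, are taken from the abstract.

```python
def kernel_bound_by_trees(k: int, t: int) -> int:
    """Evaluate 4k(5k)^t, with k the reticulation bound and t the number of input trees."""
    return 4 * k * (5 * k) ** t


def kernel_bound_by_outdegree(k: int, max_outdegree: int) -> int:
    """Evaluate 20k^2(Delta^+ - 1), with Delta^+ the maximum outdegree of the input trees."""
    return 20 * k ** 2 * (max_outdegree - 1)
```

The first bound grows exponentially in the number of trees, while the second is independent of it, so which one is smaller depends on the values of $t$ and $\Delta^+$ in the instance at hand.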

arXiv.org e-Print Archive

Maastricht University Research Portal

On unrooted and root-uncertain variants of several well-known phylogenetic network problems

Author: Boes Olivier
Kelk Steven
Stamoulis Georgios
Stougie Leen
van Iersel Leo
Publication date: 01/01/2016

The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view, accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an unrooted phylogenetic network displays (i.e. embeds) an unrooted phylogenetic tree is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees. Comment: 28 pages, 8 figures
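
The quantity being minimized in this abstract can be illustrated with a short sketch that counts reticulations in a rooted network given as a list of directed edges. The function name and the edge-list representation are assumptions made for illustration, not the paper's notation. For binary networks the result equals the number of nodes with indegree two; summing (indegree - 1) is the usual generalization to nodes with more than two parents.

```python
from collections import Counter


def reticulation_number(edges):
    """Sum (indegree - 1) over all nodes that have at least two incoming edges.

    `edges` is an iterable of (tail, head) pairs describing a rooted directed
    acyclic network (hypothetical input format).
    """
    indegree = Counter(head for _tail, head in edges)
    return sum(d - 1 for d in indegree.values() if d >= 2)


# Small example: node "h" has two parents ("u" and "v"), so it is one reticulation.
example_network = [
    ("root", "u"), ("root", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]
```

The sketch only measures a given network; finding a network of minimum reticulation number that displays all input trees is the hard problem the abstract addresses.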

arXiv.org e-Print Archive

Maastricht University Research Portal

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

A quadratic kernel for computing the hybridization number of multiple trees

Author: van Iersel Leo
Linz Simone
Publication date: 01/03/2012

It has recently been shown that the NP-hard problem of calculating the minimum number of hybridization events that is needed to explain a set of rooted binary phylogenetic trees by means of a hybridization network is fixed-parameter tractable if an instance of the problem consists of precisely two such trees. In this paper, we show that this problem remains fixed-parameter tractable for an arbitrarily large set of rooted binary phylogenetic trees. In particular, we present a quadratic kernel.

arXiv.org e-Print Archive

CiteSeerX

Repository TU/e

CWI's Institutional Repository

A tree-based kernel for graphs with continuous attributes

Author: Martino Giovanni Da San
Navarin Nicolò
Sperduti Alessandro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/12/2016

The availability of graph data with node attributes that can be either discrete or real-valued is constantly increasing. While existing kernel methods are effective techniques for dealing with graphs having discrete node labels, their adaptation to non-discrete or continuous node attributes has been limited, mainly for computational issues. Recently, a few kernels especially tailored for this domain, and that trade predictive performance for computational efficiency, have been proposed. In this paper, we propose a graph kernel for complex and continuous node attributes, whose features are tree structures extracted from specific graph visits. The kernel manages to keep the same complexity as state-of-the-art kernels while implicitly using a larger feature space. We further present an approximated variant of the kernel which reduces its complexity significantly. Experimental results obtained on six real-world datasets show that the kernel is the best performing one on most of them. Moreover, in most cases the approximated version reaches performance comparable to current state-of-the-art kernels in terms of classification accuracy while greatly shortening the running times. Comment: This work has been submitted to the IEEE Transactions on Neural Networks and Learning Systems for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Discourse Structure in Machine Translation Evaluation

Author: Guzmán Francisco
Joty Shafiq
Màrquez Lluís
Nakov Preslav
Publication date: 01/01/2017

In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments both at the segment- and at the system-level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality. Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201

arXiv.org e-Print Archive

Directory of Open Access Journals

DR-NTU (Digital Repository of NTU)

Pattern discovery in structural databases with applications to bioinformatics

Author: Zhang Sen
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2005

Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this thesis, two new FSM techniques are proposed for finding patterns in unordered labeled trees. Such trees can be used to model evolutionary histories of different species, among others. The first FSM technique finds cousin pairs in the trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree $T$, our algorithm finds all interesting cousin pairs of $T$ in $O(|T|^2)$ time, where $|T|$ is the number of nodes in $T$. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. This technique has been applied to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. The technique is also extended to undirected acyclic graphs (or free trees). The second FSM technique extends traditional MAST (maximum agreement subtree) algorithms by employing the Apriori data mining technique to find frequent agreement subtrees in multiple phylogenies. The correctness and completeness of the new mining algorithm are presented. The method is also extended to unrooted phylogenetic trees. Both FSM techniques studied in the thesis have been implemented into a toolkit, which is fully operational and accessible on the World Wide Web.
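
The cousin-pair notion described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's algorithm: the tree is given as a hypothetical child-to-parent dictionary (the root maps to `None`), node labels are assumed to be sortable strings, and a pair counts as cousins when both nodes sit the same number of levels below their nearest common ancestor. The abstract's criteria for an "interesting" pair are not specified, so the sketch uses a plain cutoff `max_generations` instead.

```python
from itertools import combinations


def ancestor_chain(parent, node):
    """Return [parent, grandparent, ...] of `node`, ending at the root."""
    chain = []
    current = parent[node]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def cousin_pairs(parent, max_generations=2):
    """Map each cousin pair (u, v) to the number of levels up to their shared ancestor.

    1 means same parent, 2 means same grandparent, and so on.
    """
    chains = {node: ancestor_chain(parent, node) for node in parent}
    pairs = {}
    # Examine every unordered pair of nodes once: a quadratic number of pairs.
    for u, v in combinations(sorted(parent), 2):
        for d in range(1, max_generations + 1):
            if (len(chains[u]) >= d and len(chains[v]) >= d
                    and chains[u][d - 1] == chains[v][d - 1]):
                pairs[(u, v)] = d  # smallest d, i.e. the nearest shared ancestor
                break
    return pairs


# Hypothetical example tree: r has children a and b; a has c and d; b has e.
example_tree = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}
```

On the example tree, `a`/`b` and `c`/`d` share a parent, while `c`/`e` and `d`/`e` share only the grandparent `r`.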

Digital Commons @ New Jersey Institute of Technology (NJIT)

A simple fixed parameter tractable algorithm for computing the hybridization number of two (not necessarily binary) trees

Author: Kelk Steven
Piovesan Teresa
Publication date: 01/01/2012

Here we present a new fixed parameter tractable algorithm to compute the hybridization number $r$ of two rooted, not necessarily binary phylogenetic trees on taxon set $X$ in time $(6^r \cdot r!) \cdot \mathrm{poly}(n)$, where $n=|X|$. The novelty of this approach is its use of terminals, which are maximal elements of a natural partial order on $X$, and several insights from the softwired clusters literature. This yields a surprisingly simple and practical bounded-search algorithm and offers an alternative perspective on the underlying combinatorial structure of the hybridization number problem.

arXiv.org e-Print Archive

Maastricht University Research Portal

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications