Search CORE

17 research outputs found

An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings

Author: Kao Ming-Yang
Lam Tak-Wah
Sung Wing-Kin
Ting Hing-Fung
Publication date: 01/01/2001

A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure is only concerned with the comparison of labeled trees of two special kinds, namely, uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct symbols for distinct leaves). This paper presents an algorithm for comparing trees that are labeled in an arbitrary manner. In addition to this generality, this algorithm is faster than the previous algorithms. Another contribution of this paper is on maximum weight bipartite matchings. We show how to speed up the best known matching algorithms when the input graphs are node-unbalanced or weight-unbalanced. Based on these enhancements, we obtain an efficient algorithm for a new matching problem called the hierarchical bipartite matching problem, which is at the core of our maximum agreement subtree algorithm. Comment: To appear in Journal of Algorithms.

arXiv.org e-Print Archive

HKU Scholars Hub

Analyzing the Flow of Information from Initial Publishing to Wikipedia

Author: Villanueva Nathan T
Publication date: 23/05/2018

This thesis covers my efforts at researching the factors that lead to a research paper being cited by Wikipedia. Wikipedia is one of the most popular websites on the internet for quickly learning about a specific topic. It achieved this by being able to back up its claims with cited sources, many of which are research papers. I wanted to see exactly how those papers were found by Wikipedia’s editors when they write the articles. To do this, I gathered thousands of computer science research papers from arXiv.org, as well as a selection of papers that were cited by Wikipedia, so that I could examine those papers and see what made them visible and attractive to the Wikipedia editors. After I gathered the information on how and when these papers are cited, I ran a series of tests on them to learn as much as I could about what causes a paper to be cited by Wikipedia. I discovered that papers that are cited by Wikipedia tend to be more popular than papers that are not, even before they are cited, but getting cited by Wikipedia can result in a boost in popularity. Wikipedia editors also tend to choose papers that either showcase a creation of the author(s) or give a general overview of a topic. I also discovered one paper that was likely added to Wikipedia by its author in an attempt to increase visibility.

Texas A&M Repository

The generalized Robinson-Foulds metric

Author: B.L. Allen
C. Finden
D. Bogdanowicz
D.E. Critchlow
D.F. Robinson
K. Dabrowski
L.A. Lewis
M. Deza
M.-Y. Kao
M.S. Bansal
O. Dubois
R.G. Downey
S.-J. Sul
T. Griebel
T. Munzner
T.M.W. Nye
Y. Lin
Publication date: 01/01/2013

The Robinson-Foulds (RF) metric is arguably the most widely used measure of phylogenetic tree similarity, despite its well-known shortcomings: for example, moving a single taxon in a tree can result in a tree that has maximum distance to the original one, even though the two trees are identical once that single taxon is removed. To address this, we propose a natural extension of the RF metric that does not simply count identical clades but also takes similar clades into consideration. In contrast to previous approaches, our model requires the matching between clades to respect the structure of the two trees, a property that the classical RF metric exhibits, too. We show that computing this generalized RF metric is, unfortunately, NP-hard. We then present a simple Integer Linear Program for its computation, and evaluate it by an all-against-all comparison of 100 trees from a benchmark data set. We find that matchings that respect the tree structure differ significantly from those that do not, underlining the importance of this natural condition. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).
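The shortcoming mentioned in this abstract can be reproduced with a small sketch of the classical RF distance. This is only an illustration under stated assumptions, not the generalized metric or the Integer Linear Program from the paper: trees are assumed to be rooted and written as nested Python tuples of leaf labels, the distance is taken as the number of non-trivial clades found in exactly one of the two trees, and the names `clades`, `rf_distance`, and `prune` are hypothetical helpers.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree.

    A tree is either a leaf label (str) or a tuple of subtrees.
    A clade is the set of leaf labels below an internal node.
    """
    found = set()

    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaf_set(child) for child in node))
        found.add(below)
        return below

    # The root clade contains every leaf, so it carries no information.
    found.discard(leaf_set(tree))
    return found


def rf_distance(t1, t2):
    """Classical RF distance: clades present in exactly one of the trees."""
    return len(clades(t1) ^ clades(t2))


def prune(tree, taxon):
    """Remove one leaf and suppress any internal node left with one child."""
    if isinstance(tree, str):
        return None if tree == taxon else tree
    kids = [k for k in (prune(child, taxon) for child in tree) if k is not None]
    return kids[0] if len(kids) == 1 else tuple(kids)


# Assumed example: a caterpillar tree, and the same tree with taxon "a"
# moved from the deepest position up to the root.
T1 = (((("a", "b"), "c"), "d"), "e")
T2 = ("a", ((("b", "c"), "d"), "e"))

print(rf_distance(T1, T2))                          # no clade is shared
print(rf_distance(prune(T1, "a"), prune(T2, "a")))  # identical after pruning
```

Each of the two example trees has three non-trivial clades and none is shared, so the distance reaches its maximum of 6, yet it drops to 0 once the single taxon `a` is pruned from both trees.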

arXiv.org e-Print Archive

CiteSeerX

Crossref

VU Research Portal

CWI's Institutional Repository

Fixed Parameter Polynomial Time Algorithms for Maximum Agreement and Compatible Supertrees

Author: Hoang Viet Tung
Sung Wing-Kin
Publication date: 01/01/2008

Consider a set of labels $L$ and a set of trees $\mathcal{T} = \{\mathcal{T}^{(1)}, \mathcal{T}^{(2)}, \ldots, \mathcal{T}^{(k)}\}$ where each tree $\mathcal{T}^{(i)}$ is distinctly leaf-labeled by some subset of $L$. One fundamental problem is to find the biggest tree (denoted as supertree) to represent $\mathcal{T}$ which minimizes the disagreements with the trees in $\mathcal{T}$ under certain criteria. This problem finds applications in phylogenetics, database, and data mining. In this paper, we focus on two particular supertree problems, namely, the maximum agreement supertree problem (MASP) and the maximum compatible supertree problem (MCSP). These two problems are known to be NP-hard for $k \geq 3$. This paper gives the first polynomial time algorithms for both MASP and MCSP when both $k$ and the maximum degree $D$ of the trees are constant.

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

ScholarBank@NUS

Faster Algorithms for the Maximum Common Subtree Isomorphism Problem

Author: Droschinsky Andre
Kriege Nils M.
Mutzel Petra
Publication date: 01/01/2016

The maximum common subtree isomorphism problem asks for the largest possible isomorphism between subtrees of two given input trees. This problem is a natural restriction of the maximum common subgraph problem, which is $\mathsf{NP}$-hard in general graphs. Confining to trees renders polynomial time algorithms possible and is of fundamental importance for approaches on more general graph classes. Various variants of this problem in trees have been intensively studied. We consider the general case, where trees are neither rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on the mapped vertices and edges. For trees of order $n$ and maximum degree $\Delta$ our algorithm achieves a running time of $\mathcal{O}(n^2\Delta)$ by exploiting the structure of the matching instances arising as subproblems. Thus our algorithm outperforms the best previously known approaches. No faster algorithm is possible for trees of bounded degree and for trees of unbounded degree we show that a further reduction of the running time would directly improve the best known approach to the assignment problem. Combining a polynomial-delay algorithm for the enumeration of all maximum common subtree isomorphisms with central ideas of our new algorithm leads to an improvement of its running time from $\mathcal{O}(n^6+Tn^2)$ to $\mathcal{O}(n^3+Tn\Delta)$, where $n$ is the order of the larger tree, $T$ is the number of different solutions, and $\Delta$ is the minimum of the maximum degrees of the input trees. Our theoretical results are supplemented by an experimental evaluation on synthetic and real-world instances.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Enumerating All Maximal Frequent Subtrees

Author: Deepak Akshay
Fernández-Baca David
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2012

Given a collection of leaf-labeled trees on a common leafset and a fraction f in (1/2,1], a frequent subtree (FST) is a subtree isomorphically included in at least fraction f of the input trees. The well-known maximum agreement subtree (MAST) problem identifies an FST with f = 1 and having the largest number of leaves. Apart from its intrinsic interest from the algorithmic perspective, MAST has practical applications as a metric for tree similarity, for computing tree congruence, in detecting horizontal gene transfer events and as a consensus approach. Enumerating FSTs extends the MAST problem by definition and reveals additional subtrees not displayed by MAST. This can happen in two ways - such a subtree is included in a majority but not all of the input trees, or such a subtree, though included in all the input trees, does not have the maximum number of leaves. Further, FSTs can be enumerated on collections of trees having partially overlapping leafsets. MAST may not be useful here, especially if the common overlap among leafsets is very low. Though very useful, the number of FSTs suffers from combinatorial explosion - just a single enumeration of maximal frequent subtrees (MFSTs). An MFST is an FST that is not a subtree of any other FST. The set of MFSTs is a compact non-redundant summary of all FSTs and is much smaller in size. Here we tackle the novel problem of enumerating all MFSTs in collections of phylogenetic trees. We demonstrate its utility in returning larger consensus trees in comparison to MAST. The current implementation is available on the web.
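The definitions in this abstract can be made concrete with a brute-force sketch. It is an illustration only, exponential in the number of leaves, and not the enumeration algorithm of the paper. It assumes rooted trees on a common leafset written as nested tuples of leaf labels, and the names `restrict`, `canon`, and `largest_frequent_subtree` are hypothetical. With `f = 1.0` the search returns a maximum agreement subtree; with a smaller `f` it can return a larger frequent subtree.

```python
from collections import Counter
from itertools import combinations


def leaf_labels(tree):
    """Set of leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_labels(child) for child in tree))


def restrict(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppressing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def canon(tree):
    """Canonical form that ignores the order of children."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canon(child) for child in tree), key=repr))


def largest_frequent_subtree(trees, f=1.0):
    """Largest leaf subset whose restriction is shared by >= f of the trees."""
    common = sorted(set.intersection(*(leaf_labels(t) for t in trees)))
    for size in range(len(common), 0, -1):
        for subset in combinations(common, size):
            forms = Counter(canon(restrict(t, set(subset))) for t in trees)
            form, count = forms.most_common(1)[0]
            if count >= f * len(trees):
                return subset, form
    return (), None


# Assumed example: T1 and T3 are the same tree drawn with different child
# order; T2 differs from them only in the position of leaf "a".
T1 = (((("a", "b"), "c"), "d"), "e")
T3 = ("e", ("d", ("c", ("b", "a"))))
T2 = ("a", ((("b", "c"), "d"), "e"))

print(largest_frequent_subtree([T1, T3, T2], f=1.0)[0])  # agreement of all trees
print(largest_frequent_subtree([T1, T3, T2], f=0.6)[0])  # agreement of a majority
```

In this example all three trees agree only after leaf `a` is dropped, so the MAST has four leaves, while two of the three trees agree on all five leaves, which gives a larger frequent subtree for `f = 0.6`.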

Digital Repository @ Iowa State University (ISU)

Faster Algorithms for Semi-Matching Problems

Author: Fakcharoenphol Jittat
Laekhanukit Bundit
Nanongkai Danupon
Publication date: 12/06/2012

We consider the problem of finding a semi-matching in bipartite graphs, which is also extensively studied under various names in the scheduling literature. We give faster algorithms for both the weighted and unweighted cases. For the weighted case, we give an $O(nm\log n)$-time algorithm, where $n$ is the number of vertices and $m$ is the number of edges, by exploiting the geometric structure of the problem. This improves the classical $O(n^3)$ algorithms by Horn [Operations Research 1973] and Bruno, Coffman and Sethi [Communications of the ACM 1974]. For the unweighted case, the bound could be improved even further. We give a simple divide-and-conquer algorithm which runs in $O(\sqrt{n}m\log n)$ time, improving two previous $O(nm)$-time algorithms by Abraham [MSc thesis, University of Glasgow 2003] and Harvey, Ladner, Lovász and Tamir [WADS 2003 and Journal of Algorithms 2006]. We also extend this algorithm to solve the Balance Edge Cover problem in $O(\sqrt{n}m\log n)$ time, improving the previous $O(nm)$-time algorithm by Harada, Ono, Sadakane and Yamashita [ISAAC 2008]. Comment: ICALP 201
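The abstract does not define a semi-matching, so the sketch below assumes the usual unweighted formulation from the scheduling literature: every task on one side of the bipartite graph is assigned to exactly one adjacent machine, and the cost to minimize is the sum over machines of k(k+1)/2, where k is the number of tasks on that machine. The code is a brute-force illustration for tiny inputs, not the divide-and-conquer algorithm of the paper, and the names `semi_matching_cost` and `best_semi_matching` are hypothetical.

```python
from collections import Counter
from itertools import product


def semi_matching_cost(assignment):
    """Sum of k*(k+1)/2 over machines, where k is the machine's load."""
    loads = Counter(assignment.values())
    return sum(k * (k + 1) // 2 for k in loads.values())


def best_semi_matching(neighbors):
    """Try every way of giving each task one of its adjacent machines.

    `neighbors` maps each task to the list of machines it may run on.
    Exponential in the number of tasks; meant only for small examples.
    """
    tasks = sorted(neighbors)
    best_cost, best_assignment = None, None
    for choice in product(*(neighbors[t] for t in tasks)):
        assignment = dict(zip(tasks, choice))
        cost = semi_matching_cost(assignment)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Assumed example: four tasks, three machines.
neighbors = {
    "t1": ["m1"],
    "t2": ["m1", "m2"],
    "t3": ["m1", "m2"],
    "t4": ["m2", "m3"],
}

cost, assignment = best_semi_matching(neighbors)
print(cost, assignment)
```

With four tasks on three machines, some machine must take two tasks, so the best achievable cost in this example is 3 + 1 + 1 = 5.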

arXiv.org e-Print Archive

CiteSeerX

An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings

Author: Ahuja
Chung
Cole
Cormen
Farach
Farach
Finden
Friedman
Gabow
Gupta
Gusfield
Hillis
Hing-Fung Ting
Kao
Kao
Kilpeläinen
Kimia
Le
Mannila
Materna
Ming-Yang Kao
Przytycka
Shapiro
Steel
Tak-Wah Lam
Takahashi
Tokuyama
Tokuyama
Wing-Kin Sung
Publication venue: 'Elsevier BV'

Crossref