Computing Robinson-Foulds supertree for two trees

Abstract

Supertree problems are important in phylogeny estimation. Supertree construction takes in a set of input trees on subsets of species and aims to find a supertree containing all species subjective to some combinatorial or statistical criterion. As such, it can be used to combine trees estimated by different research projects, or to construct species trees from gene trees that may not contain all species, or to serve a part in divide-and-conquer pipelines that improve the scalability of large scale phylogeny estimation. Yet the most promising supertree methods, such as the popular Robinson-Foulds Supertree (RFS) methods, not only cannot guarantee an optimal solution but also are computationally intensive by themselves, as they are heuristics for NP-hard optimization problems. We present the first polynomial time algorithm to exactly solve the RFS problem on two binary input trees, and prove that finding the Robinson-Foulds Supertree of three input trees is NP-hard. We present GreedyRFS, a greedy heuristic for the Robinson-Foulds Supertree problem that operates by using our exact algorithm for RFS on pairs of trees, until all the trees are merged into a single supertree. Our experiments show that GreedyRFS has better accuracy than FastRFS, the leading heuristic for RFS, when the number of input trees is small, which is the natural case for use within divide-and-conquer pipelines

    Similar works