23 research outputs found
New Results on Optimizing Rooted Triplets Consistency
A set of phylogenetic trees with overlapping leaf sets is consistent if it can be merged without conflicts into a supertree. In this paper, we study the polynomial-time approximability of two related optimization problems, the maximum rooted triplets consistency problem (MaxRTC) and the minimum rooted triplets inconsistency problem (MinRTI), in which the input is a set of rooted triplets. The objectives are, respectively, to find a largest-cardinality subset of the input that is consistent, and a smallest-cardinality subset of the input whose removal results in a consistent set. We first show that a simple modification to Wu’s Best-Pair-Merge-First heuristic [25] results in a bottom-up-based 3-approximation for MaxRTC. We then demonstrate how any approximation algorithm for MinRTI could be used to approximate MaxRTC, and thus obtain the first polynomial-time approximation algorithm for MaxRTC with approximation ratio smaller than 3. Next, we prove that f
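As a rough illustration of the consistency notion used in this abstract, the sketch below tests whether a set of rooted triplets is consistent using the classical BUILD procedure of Aho et al. It is not the approximation algorithm of the paper. The triplet encoding `(a, b, c)` for "a and b are closer to each other than either is to c", and the function names `build` and `is_consistent`, are assumptions made for this example.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree displaying all triplets, or None if impossible.

    A triplet (a, b, c) is read as ab|c (assumed encoding for this sketch).
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the current leaf set.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only triplets whose three leaves all lie in the current set matter.
    relevant = [t for t in triplets if set(t) <= leaves]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)  # a and b must share a child subtree

    blocks = {}
    for x in leaves:
        blocks.setdefault(find(x), set()).add(x)

    # A single block means the leaves cannot be split at the root.
    if len(blocks) == 1:
        return None

    children = []
    for block in blocks.values():
        subtree = build(block, relevant)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)


def is_consistent(triplets):
    """True if some rooted tree displays every triplet in the collection."""
    leaves = {x for t in triplets for x in t}
    return not leaves or build(leaves, triplets) is not None
```

For example, `is_consistent([("a", "b", "c"), ("a", "c", "b")])` is false, because ab|c and ac|b cannot both hold in one tree.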
Optimizing Phylogenetic Supertrees Using Answer Set Programming
The supertree construction problem is about combining several phylogenetic
trees with possibly conflicting information into a single tree that has all the
leaves of the source trees as its leaves and the relationships between the
leaves are as consistent with the source trees as possible. This leads to an
optimization problem that is computationally challenging and typically
heuristic methods, such as matrix representation with parsimony (MRP), are
used. In this paper we consider the use of answer set programming to solve the
supertree construction problem in terms of two alternative encodings. The first
is based on an existing encoding of trees using substructures known as
quartets, while the other novel encoding captures the relationships present in
trees through direct projections. We use these encodings to compute a
genus-level supertree for the family of cats (Felidae). Furthermore, we compare
our results to recent supertrees obtained by the MRP method.
Comment: To appear in Theory and Practice of Logic Programming (TPLP),
Proceedings of ICLP 201
Trinets encode tree-child and level-2 phylogenetic networks
Phylogenetic networks generalize evolutionary trees, and are commonly used to
represent evolutionary histories of species that undergo reticulate
evolutionary processes such as hybridization, recombination and lateral gene
transfer. Recently, there has been great interest in trying to develop methods
to construct rooted phylogenetic networks from triplets, that is, rooted trees
on three species. However, although triplets determine or encode rooted
phylogenetic trees, they do not in general encode rooted phylogenetic networks,
which is a potential issue for any such method. Motivated by this fact, Huber
and Moulton recently introduced trinets as a natural extension of rooted
triplets to networks. In particular, they showed that level-1 phylogenetic
networks are encoded by their trinets, and also conjectured that all
"recoverable" rooted phylogenetic networks are encoded by their trinets. Here
we prove that recoverable binary level-2 networks and binary tree-child
networks are also encoded by their trinets. To do this we prove two
decomposition theorems based on trinets which hold for all recoverable binary
rooted phylogenetic networks. Our results provide some additional evidence in
support of the conjecture that trinets encode all recoverable rooted
phylogenetic networks, and could also lead to new approaches to construct
phylogenetic networks from trinets.
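The statement in this abstract that triplets encode rooted phylogenetic trees can be illustrated with a small sketch that lists the rooted triplets displayed by a tree. The nested-tuple tree representation, the `(a, b, c)` encoding for ab|c, and the function names are assumptions made for this example. The sketch covers trees only, not networks or trinets.

```python
from itertools import combinations


def _collect(tree, out):
    """Return the leaf set of `tree`, adding its displayed triplets to `out`."""
    if not isinstance(tree, tuple):
        return {tree}
    child_leaves = [_collect(child, out) for child in tree]
    all_leaves = set().union(*child_leaves)
    for block in child_leaves:
        outside = all_leaves - block
        # Two leaves in the same child subtree meet strictly below this node,
        # so they are closer to each other than to any leaf outside the block.
        for a, b in combinations(sorted(block), 2):
            for c in outside:
                out.add((a, b, c))
    return all_leaves


def triplets(tree):
    """Set of rooted triplets (a, b, c), read as ab|c, displayed by `tree`."""
    out = set()
    _collect(tree, out)
    return out
```

Under these assumptions, the caterpillar tree `((("a", "b"), "c"), "d")` and the balanced tree `(("a", "b"), ("c", "d"))` yield different triplet sets, so the triplet set distinguishes them.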
Binets: fundamental building blocks for phylogenetic networks
Phylogenetic networks are a generalization of evolutionary trees that are used by biologists to represent the evolution of organisms which have undergone reticulate evolution. Essentially, a phylogenetic network is a directed acyclic graph having a unique root in which the leaves are labelled by a given set of species. Recently, some approaches have been developed to construct phylogenetic networks from collections of 2- and 3-leaved networks, which are known as binets and trinets, respectively. Here we study in more depth properties of collections of binets, one of the simplest possible types of networks into which a phylogenetic network can be decomposed. More specifically, we show that if a collection of level-1 binets is compatible with some binary network, then it is also compatible with a binary level-1 network. Our proofs are based on useful structural results concerning lowest stable ancestors in networks. In addition, we show that, although the binets do not determine the topology of the network, they do determine the number of reticulations in the network, which is one of its most important parameters. We also consider algorithmic questions concerning binets. We show that deciding whether an arbitrary set of binets is compatible with some network is at least as hard as the well-known Graph Isomorphism problem. However, if we restrict to level-1 binets, it is possible to decide in polynomial time whether there exists a binary network that displays all the binets. We also show that finding a network that displays a maximum number of the binets is NP-hard, but that there exists a simple polynomial-time 1/3-approximation algorithm for this problem. It is hoped that these results will eventually assist in the development of new methods for constructing phylogenetic networks from collections of smaller networks.
Phylogenetics from paralogs
Motivation: Sequence-based phylogenetic approaches heavily rely on initial data sets to be composed of orthologous sequences only. Paralogs are treated as a dangerous nuisance that has to be detected and removed. Recent advances in mathematical phylogenetics, however, have indicated that gene duplications can also convey meaningful phylogenetic information provided orthologs and paralogs can be distinguished with a degree of certainty.
Results: We demonstrate that plausible phylogenetic trees can be inferred from paralogy information only. To this end, tree-free estimates of orthology, the complement of paralogy, are first corrected to conform to cographs and then translated into equivalent event-labeled gene phylogenies. A certain subset of the triples displayed by these trees translates into constraints on the species trees. While the resolution is very poor for individual gene families, we observe that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees of several groups of eubacteria. The novel method introduced here relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as an Integer Linear Program, paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements.
Availability: The ILP formulation is implemented in the software ParaPhylo using IBM ILOG CPLEX (TM) Optimizer 12.6 and is freely available from http://pacosy.informatik.uni-leipzig.de/paraphyl
A Revenue Function for Comparison-Based Hierarchical Clustering
Comparison-based learning addresses the problem of learning when, instead of
explicit features or pairwise similarities, one only has access to comparisons
of the form: "Object A is more similar to B than to C." Recently, it
has been shown that, in Hierarchical Clustering, single and complete linkage
can be directly implemented using only such comparisons while several
algorithms have been proposed to emulate the behaviour of average linkage.
Hence, finding hierarchies (or dendrograms) using only comparisons is a well
understood problem. However, evaluating their meaningfulness when neither
ground truth nor explicit similarities are available remains an open question.
In this paper, we bridge this gap by proposing a new revenue function that
allows one to measure the goodness of dendrograms using only comparisons. We
show that this function is closely related to Dasgupta's cost for hierarchical
clustering that uses pairwise similarities. On the theoretical side, we use the
proposed revenue function to resolve the open problem of whether one can
approximately recover a latent hierarchy using few triplet comparisons. On the
practical side, we present principled algorithms for comparison-based
hierarchical clustering based on the maximisation of the revenue and we
empirically compare them with existing methods.
Comment: 26 pages, 6 figures, 5 tables. Transactions on Machine Learning
Research (2023
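For orientation, the sketch below computes Dasgupta's cost, the similarity-based objective this abstract relates the revenue function to. It does not implement the paper's comparison-based revenue function. Dasgupta's cost charges each pair of leaves its similarity times the number of leaves under their lowest common ancestor. The nested-tuple dendrogram, the `sim` dictionary keyed by `frozenset` pairs, and the function name are assumptions made for this example.

```python
def dasgupta_cost(tree, sim):
    """Sum over leaf pairs of sim(pair) * (number of leaves under their LCA).

    `tree` is a nested tuple of hashable leaves; `sim` maps frozenset({x, y})
    to a non-negative similarity (missing pairs count as 0). Both
    representations are assumptions of this sketch.
    """
    total = 0.0

    def visit(node):
        nonlocal total
        if not isinstance(node, tuple):
            return [node]
        blocks = [visit(child) for child in node]
        size = sum(len(block) for block in blocks)
        # Pairs split across two different children have this node as LCA.
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                for x in blocks[i]:
                    for y in blocks[j]:
                        total += sim.get(frozenset((x, y)), 0.0) * size
        return [leaf for block in blocks for leaf in block]

    visit(tree)
    return total
```

A lower cost is better: a dendrogram that merges the most similar pair first pays for that pair at a small subtree, while one that separates the pair until the root pays for it at the full leaf count.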