10 research outputs found

    A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures

    Get PDF
    [Background]Measuring similarities between tree structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparison of tree structured data. However, it is known that computation of the edit distance for rooted unordered trees is NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees. [Results]In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem and then efficient solvers for the maximum clique problem are applied. We applied the proposed method to similar structure search for glycan structures. The result suggests that our proposed method can efficiently compute the edit distance for moderate size unordered trees. It also suggests that the proposed method has the accuracy comparative to those by the edit distance for ordered trees and by an existing method for glycan search. [Conclusions]The proposed method is simple but useful for computation of the edit distance between unordered trees. The object code is available upon request

    Phylogenetics from paralogs

    Get PDF
    Motivation: Sequence-based phylogenetic approaches heavily rely on initial data sets to be composed of orthologous sequences only. Paralogs are treated as a dangerous nuisance that has to be detected and removed. Recent advances in mathematical phylogenetics, however, have indicated that gene duplications can also convey meaningful phylogenetic information provided orthologs and paralogs can be distinguished with a degree of certainty. Results: We demonstrate that plausible phylogenetic trees can be inferred from paralogy information only. To this end, tree-free estimates of orthology, the complement of paralogy, are first corrected to conform cographs and then translated into equivalent event-labeled gene phylogenies. A certain subset of the triples displayed by these trees translates into constraints on the species trees. While the resolution is very poor for individual gene families, we observe that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees of several groups of eubacteria. The novel method introduced here relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as Integer Linear Program, paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements. Availability:The ILP formulation is implemented in the Software ParaPhylo using IBM ILOG CPLEX (TM) Optimizer 12.6 and is freely available from http://pacosy.informatik.uni-leipzig.de/paraphyl

    Reconstructing Gene Trees From Fitch's Xenology Relation

    Full text link
    Two genes are xenologs in the sense of Fitch if they are separated by at least one horizontal gene transfer event. Horizonal gene transfer is asymmetric in the sense that the transferred copy is distinguished from the one that remains within the ancestral lineage. Hence xenology is more precisely thought of as a non-symmetric relation: yy is xenologous to xx if yy has been horizontally transferred at least once since it diverged from the least common ancestor of xx and yy. We show that xenology relations are characterized by a small set of forbidden induced subgraphs on three vertices. Furthermore, each xenology relation can be derived from a unique least-resolved edge-labeled phylogenetic tree. We provide a linear-time algorithm for the recognition of xenology relations and for the construction of its least-resolved edge-labeled phylogenetic tree. The fact that being a xenology relation is a heritable graph property, finally has far-reaching consequences on approximation problems associated with xenology relations

    Maximum agreement and compatible supertrees

    Get PDF
    AbstractGiven a set of leaf-labelled trees with identical leaf sets, the MAST problem, respectively MCT problem, consists of finding a largest subset of leaves such that all input trees restricted to these leaves are isomorphic, respectively compatible. In this paper, we propose extensions of these problems to the context of supertree inference, where input trees have non-identical leaf sets. This situation is of particular interest in phylogenetics. The resulting problems are called SMAST and SMCT.A sufficient condition is given that identifies cases where these problems can be solved by resorting to MAST and MCT as subproblems. This condition is met, for instance, when only two input trees are considered. Then we give algorithms for SMAST and SMCT that benefit from the link with the subtree problems. These algorithms run in time linear to the time needed to solve MAST, respectively MCT, on an instance of the same or smaller size.It is shown that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves.SMAST is shown to be W[2]-hard when the considered parameter is the number of input leaves that have to be removed to obtain the agreement of the input trees. A similar result holds for SMCT. Moreover, the corresponding optimization problems, that is the complements of SMAST and SMCT, cannot be approximated in polynomial time within any constant factor, unless P=NP. These results also hold when the input trees have a bounded number of leaves.The presented results apply to both collections of rooted and unrooted trees

    Fixed parameter algorithms for compatible and agreement supertree problems

    Get PDF
    Biologists represent evolutionary history of species through phylogenetic trees. Leaves of a phylogenetic tree represent the species and internal vertices represent the extinct ancestors. Given a collection of input phylogenetic trees, a common problem in computational biology is to build a supertree that captures the evolutionary history of all the species in the input trees, and is consistent with each of the input trees. In this document we study the tree compatibility and agreement supertree problems. Tree compatibility problem is NP-complete but has been shown to be fixed parameter tractable when parametrized by number of input trees. We characterize the compatible supertree problem in terms of triangulation of a structure called the display graph. We also give an alternative characterization in terms of cuts of the display graph. We show how these characterizations are related to characterization given in terms of triangulation of the edge label intersection graph. We then give a characterization of the agreement supertree problem. In real world data, consistent supertrees do not always exist. Inconsistencies can be dealt with by contraction of edges or removal of taxa. The agreement supertree edge contraction (AST-EC) problem asks if a collection of k rooted trees can be made to agree by contraction of at most p edges. Similarly, the agreement supertree taxon removal (AST-TR) problem asks if a collection of k rooted trees can be made to agree by removal of at most p taxa. We give fixed parameter algorithms for both cases when parametrized by k and p. We study the long standing conjecture on the perfect phylogeny problem; there exists a function f (r) such that a given collection C of r-state characters is compatible if and only if every f (r) subset of C is compatible. We will show that for r ≥ 2, f (r) ≥ lceil (r/2) rceil * lfloor(r/2)rfloor + 1
    corecore