78 research outputs found

    Maximum agreement and compatible supertrees

    Given a set of leaf-labelled trees with identical leaf sets, the MAST problem (respectively, the MCT problem) consists of finding a largest subset of leaves such that all input trees restricted to these leaves are isomorphic (respectively, compatible). In this paper, we propose extensions of these problems to the context of supertree inference, where input trees have non-identical leaf sets. This situation is of particular interest in phylogenetics. The resulting problems are called SMAST and SMCT. A sufficient condition is given that identifies cases where these problems can be solved by resorting to MAST and MCT as subproblems. This condition is met, for instance, when only two input trees are considered. We then give algorithms for SMAST and SMCT that benefit from the link with the subtree problems. These algorithms run in time linear in the time needed to solve MAST (respectively, MCT) on an instance of the same or smaller size. It is shown that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves. SMAST is shown to be W[2]-hard when the parameter is the number of input leaves that have to be removed to obtain agreement of the input trees. A similar result holds for SMCT. Moreover, the corresponding optimization problems, that is, the complements of SMAST and SMCT, cannot be approximated in polynomial time within any constant factor unless P = NP. These results also hold when the input trees have a bounded number of leaves. The presented results apply to collections of both rooted and unrooted trees.
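    To make the MAST definition concrete, here is a minimal brute-force sketch for rooted trees only, under stated assumptions: trees are encoded as nested Python tuples with string leaf labels (a hypothetical encoding), "restricting" a tree drops the other leaves and suppresses nodes left with a single child, and isomorphism is tested by comparing an order-independent canonical form. The helper names (`restrict`, `canonical`, `mast_bruteforce`) are illustrative only. The search is exponential and is not the algorithm of the paper; it simply enumerates leaf subsets from largest to smallest.

```python
from itertools import combinations

# Assumed encoding: a rooted tree is a leaf label (str) or a tuple of subtrees.

def restrict(tree, keep):
    """Restrict a tree to the leaves in `keep`: drop other leaves,
    discard empty subtrees, and suppress nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [c for c in (restrict(ch, keep) for ch in tree) if c is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # suppress the now-unary node
    return tuple(kids)

def canonical(tree):
    """Child-order-independent form: isomorphic labelled trees compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(c) for c in tree), key=repr))

def leaves(tree):
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(c) for c in tree))

def mast_bruteforce(trees):
    """Largest leaf subset on which all trees agree (exponential, illustration only)."""
    common = sorted(set.intersection(*(leaves(t) for t in trees)))
    for size in range(len(common), 0, -1):
        for subset in combinations(common, size):
            keep = set(subset)
            if len({canonical(restrict(t, keep)) for t in trees}) == 1:
                return keep
    return set()

t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(sorted(mast_bruteforce([t1, t2])))  # ['a', 'b', 'd']
```

    Here the two trees disagree on all four leaves, so the sketch drops one leaf; several size-3 subsets agree, and the enumeration order simply returns the first one it finds.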

    Relaxed Agreement Forests

    There are multiple factors that can cause the phylogenetic inference process to produce two or more conflicting hypotheses of the evolutionary history of a set X of biological entities, that is, phylogenetic trees with the same set of leaf labels X but with distinct topologies. This leads naturally to the goal of quantifying the difference between two such trees T_1 and T_2. Here we introduce the problem of computing a 'maximum relaxed agreement forest' (MRAF) and use this as a proxy for the dissimilarity of T_1 and T_2, which in this article we assume to be unrooted binary phylogenetic trees. MRAF asks for a partition of the leaf labels X into a minimum number of blocks S_1, S_2, ..., S_k such that, for each i, the subtrees induced in T_1 and T_2 by S_i are isomorphic up to suppression of degree-2 nodes and taking the labels X into account. Unlike the earlier introduced maximum agreement forest (MAF) model, the subtrees induced by the S_i are allowed to overlap. We prove that it is NP-hard to compute MRAF, by reducing from the problem of partitioning a permutation into a minimum number of monotonic subsequences (PIMS). Furthermore, we show that MRAF has a polynomial-time O(log n)-approximation algorithm, where n = |X|, and permits exact algorithms with single-exponential running time. When at least one of the two input trees has a caterpillar topology, we prove that testing whether an MRAF has size at most k can be answered in polynomial time when k is fixed. We also note that on two caterpillars the approximability of MRAF is related to that of PIMS. Finally, we establish a number of bounds on MRAF, compare its behaviour to MAF both in theory and in an experimental setting, and discuss a number of open problems. Comment: 14 pages plus appendix
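    The hardness result above is a reduction from PIMS. As a small illustration of that source problem only (not of the reduction or of any algorithm from the paper), the following exact backtracking sketch, with the hypothetical name `min_monotone_partition`, finds the fewest subsequences, each either increasing or decreasing, that cover a permutation. It assumes distinct values and is exponential, so it is meant for tiny inputs.

```python
def min_monotone_partition(perm):
    """Smallest k such that `perm` (distinct values) splits into k monotone
    subsequences. Exhaustive backtracking; illustrative sketch only."""
    n = len(perm)

    def can_split(k):
        # Each block is [last_value, direction]; direction is None while the
        # block holds one element, then +1 (increasing) or -1 (decreasing).
        blocks = []

        def place(i):
            if i == n:
                return True
            x = perm[i]
            for b in blocks:
                last, d = b
                step = 1 if x > last else -1
                if d is None or d == step:
                    b[0], b[1] = x, step
                    if place(i + 1):
                        return True
                    b[0], b[1] = last, d  # undo and try the next option
            if len(blocks) < k:
                blocks.append([x, None])  # open a new block with x
                if place(i + 1):
                    return True
                blocks.pop()
            return False

        return place(0)

    for k in range(1, n + 1):
        if can_split(k):
            return k
    return 0  # empty permutation

print(min_monotone_partition([2, 4, 1, 3]))  # 2
```

    For [2, 4, 1, 3] no single monotone subsequence covers everything, while {2, 4} and {1, 3} (both increasing) do, so the answer is 2.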

    A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest

    We give a 2-approximation algorithm for the Maximum Agreement Forest problem on two rooted binary trees. This NP-hard problem has been studied extensively in the past two decades, since it can be used to compute the rooted Subtree Prune-and-Regraft (rSPR) distance between two phylogenetic trees. Our algorithm is combinatorial and its running time is quadratic in the input size. To prove the approximation guarantee, we construct a feasible dual solution for a novel linear programming formulation. In addition, we show this linear program is stronger than previously known formulations, and we give a compact formulation, showing that it can be solved in polynomial time.

    About the largest subtree common to several X-trees

    Given several X-trees (unrooted phylogenetic trees) on the same set of taxa X, we look for a largest subset Y ⊂ X such that all the partial trees reduced to Y are topologically identical, that is, independently of edge lengths. This common subtree is called a MAST, for Maximum Agreement SubTree. The problem has polynomial complexity when there are only two trees, but it is NP-hard in general for more than two. We introduce a polynomial approximation algorithm for the multiple case that produces a maximal common subtree. It is easy to implement and efficient enough, on about a hundred X-trees connecting about a hundred elements, to evaluate the average size of a common subtree of independent X-trees. It begins by computing an upper bound on the size of a common subtree and designates elements of X that cannot belong to a common subtree of a given size. Simulations on random and real data have shown that this heuristic often provides an optimal solution as soon as the number of trees is larger than 5. We then develop a statistical study to evaluate the average size of a MAST corresponding to independent trees. The computed distribution makes it possible to estimate the critical size a MAST must reach to reveal some congruence between trees, and thus to measure the congruence of several evolutionary trees.

    Computational Molecular Biology

    Computational biology is a fairly new subject that arose in response to the computational problems posed by the analysis and processing of biomolecular sequence and structure data. The field was initiated in the late 1960s and early 1970s, largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 1970s and 1980s, while computer science became involved with the new biological problems in the late 1980s. Computational problems have gained further importance in molecular biology through the various genome projects, which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, such as most of the literature on the protein folding problem, databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

    A Gap-ETH-Tight Approximation Scheme for Euclidean TSP

    We revisit the classic task of finding the shortest tour of $n$ points in $d$-dimensional Euclidean space, for any fixed constant $d \geq 2$. We determine the optimal dependence on $\varepsilon$ in the running time of an algorithm that computes a $(1+\varepsilon)$-approximate tour, under a plausible assumption. Specifically, we give an algorithm that runs in $2^{\mathcal{O}(1/\varepsilon^{d-1})} n \log n$ time. This improves the previously smallest dependence on $\varepsilon$ in the running time, $(1/\varepsilon)^{\mathcal{O}(1/\varepsilon^{d-1})} n \log n$, of the algorithm by Rao and Smith (STOC 1998). We also show that a $2^{o(1/\varepsilon^{d-1})}\,\mathrm{poly}(n)$ algorithm would violate the Gap-Exponential Time Hypothesis (Gap-ETH). Our new algorithm builds upon the celebrated quadtree-based methods initially proposed by Arora (J. ACM 1998), but it adds a simple new idea that we call sparsity-sensitive patching. At a high level, this lets the granularity with which we simplify the tour depend on how sparse it is locally. Our approach is (arguably) simpler than the one by Rao and Smith, since it can work without geometric spanners. We demonstrate that the technique extends easily to other problems by showing, as an example, that it also yields a Gap-ETH-tight approximation scheme for Rectilinear Steiner Tree.