13,651 research outputs found

    Complexity of Splits Reconstruction for Low-Degree Trees

    Given a vertex-weighted tree T, the split of an edge xy in T is min{s_x(xy), s_y(xy)} where s_u(uv) is the sum of all weights of vertices that are closer to u than to v in T. Given a set of weighted vertices V and a multiset of splits S, we consider the problem of constructing a tree on V whose splits correspond to S. The problem is known to be NP-complete, even when all vertices have unit weight and the maximum vertex degree of T is required to be no more than 4. We show that the problem is strongly NP-complete when T is required to be a path, that it is NP-complete when all vertices have unit weight and the maximum degree of T is required to be no more than 3, and that it remains NP-complete when all vertices have unit weight and T is required to be a caterpillar with unbounded hair length and maximum degree at most 3. We also design polynomial-time algorithms for the variant where T is required to be a path and the number of distinct vertex weights is constant, and for the variant where all vertices have unit weight and T has a constant number of leaves. The latter algorithm is not only polynomial when the number of leaves, k, is a constant, but also fixed-parameter tractable when parameterized by k. Finally, we briefly discuss the problem when the vertex weights are not given but can be freely chosen by an algorithm. The considered problem is related to building libraries of chemical compounds used for drug design and discovery. In these inverse problems, the goal is to generate chemical compounds having desired structural properties, as there is a strong correlation between structural properties, such as the Wiener index, which is closely connected to the considered problem, and biological activity.
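The split definition in the abstract above can be illustrated with a minimal Python sketch. It covers only the forward direction (computing the multiset of splits of a given tree), not the reconstruction problem the paper studies. The function name `edge_splits` and the input conventions (a weight dictionary and an edge list) are assumptions made for this illustration and do not come from the paper.

```python
from collections import defaultdict


def edge_splits(weights, edges):
    """Return {edge: split} for a vertex-weighted tree.

    weights: dict mapping each vertex to its weight (hypothetical input format).
    edges:   list of (x, y) pairs that together form a tree on those vertices.
    """
    adj = defaultdict(list)
    for x, y in edges:
        adj[x].append(y)
        adj[y].append(x)

    total = sum(weights.values())

    # Root the tree at an arbitrary vertex and record a DFS preorder.
    root = next(iter(weights))
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    # Accumulate subtree weights bottom-up (children appear after parents in preorder).
    subtree = dict(weights)
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            subtree[p] += subtree[u]

    # Removing edge xy leaves the child's subtree on one side and the rest on the other,
    # so the split is the smaller of the two side weights.
    splits = {}
    for x, y in edges:
        child = y if parent.get(y) == x else x
        side = subtree[child]
        splits[(x, y)] = min(side, total - side)
    return splits
```

For a unit-weight path on four vertices a, b, c, d, the two end edges have split 1 and the middle edge has split 2, so the multiset S for that tree would be {1, 1, 2}.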

    Circumstances in which parsimony but not compatibility will be provably misleading

    Phylogenetic methods typically rely on an appropriate model of how data evolved in order to infer an accurate phylogenetic tree. For molecular data, standard statistical methods have provided an effective strategy for extracting phylogenetic information from aligned sequence data when each site (character) is subject to a common process. However, for other types of data (e.g. morphological data), characters can be too ambiguous, homoplastic or saturated to develop models that are effective at capturing the underlying process of change. To address this, we examine the properties of a classic but neglected method for inferring splits in an underlying tree, namely, maximum compatibility. By adopting a simple and extreme model in which each character either fits perfectly on some tree, or is entirely random (but it is not known which class any character belongs to), we are able to derive exact and explicit formulae regarding the performance of maximum compatibility. We show that this method is able to identify a set of non-trivial homoplasy-free characters, when the number n of taxa is large, even when the number of random characters is large. By contrast, we show that a method that makes more uniform use of all the data, maximum parsimony, can provably estimate trees in which none of the original homoplasy-free characters support splits. Comment: 37 pages, 2 figures

    Dynamic Ordered Sets with Exponential Search Trees

    We introduce exponential search trees as a novel technique for converting static polynomial space search structures for ordered sets into fully-dynamic linear space data structures. This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and updating a dynamic set of n integer keys in linear space. Here searching an integer y means finding the maximum key in the set which is smaller than or equal to y. This problem is equivalent to the standard textbook problem of maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms, 2nd ed., MIT Press, 2001). The best previous deterministic linear space bound was O(log n/loglog n) due to Fredman and Willard from STOC 1990. No better deterministic search bound was known using polynomial space. We also get the following worst-case linear space trade-offs between the number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however, not likely to be optimal. Our results are generalized to finger searching and string searching, providing optimal results for both in terms of n. Comment: Revision corrects some typos and states things better for applications in a subsequent paper
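The search operation defined in the abstract above (find the maximum key smaller than or equal to y) can be made concrete with a small Python sketch of the interface. This is a plain sorted-list baseline built on the standard `bisect` module, with O(log n) search and O(n) updates; it is not an exponential search tree and does not achieve the bounds stated in the abstract. The class name `OrderedIntSet` and its method names are assumptions made for this illustration.

```python
from bisect import bisect_left, bisect_right


class OrderedIntSet:
    """Dynamic ordered set of integers backed by a sorted Python list.

    Baseline only: search is O(log n), but insert/delete shift list
    elements and are therefore O(n) in the worst case.
    """

    def __init__(self):
        self._keys = []

    def insert(self, key):
        # Insert key at its sorted position unless it is already present.
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def delete(self, key):
        # Remove key if present; do nothing otherwise.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]

    def search(self, y):
        # Return the maximum key <= y, or None if every key exceeds y.
        i = bisect_right(self._keys, y)
        return self._keys[i - 1] if i > 0 else None
```

With keys 3, 7 and 10 stored, `search(8)` returns 7 and `search(2)` returns `None`; after deleting 7, `search(8)` returns 3.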