93,583 research outputs found

    On the Complexity of Searching in Trees: Average-case Minimization

    Full text link
    We focus on the average-case analysis: A function w : V -> Z+ is given which defines the likelihood for a node to be the one marked, and we want the strategy that minimizes the expected number of queries. Prior to this paper, very little was known about this natural question and the complexity of the problem had remained so far an open question. We close this question and prove that the above tree search problem is NP-complete even for the class of trees with diameter at most 4. This results in a complete characterization of the complexity of the problem with respect to the diameter size. In fact, for diameter not larger than 3 the problem can be shown to be polynomially solvable using a dynamic programming approach. In addition we prove that the problem is NP-complete even for the class of trees of maximum degree at most 16. To the best of our knowledge, the only known result in this direction is that the tree search problem is solvable in O(|V| log|V|) time for trees with degree at most 2 (paths). We match the above complexity results with a tight algorithmic analysis. We first show that a natural greedy algorithm attains a 2-approximation. Furthermore, for the bounded degree instances, we show that any optimal strategy (i.e., one that minimizes the expected number of queries) performs at most O(\Delta(T) (log |V| + log w(T))) queries in the worst case, where w(T) is the sum of the likelihoods of the nodes of T and \Delta(T) is the maximum degree of T. We combine this result with a non-trivial exponential time algorithm to provide an FPTAS for trees with bounded degree

    Principal components analysis in the space of phylogenetic trees

    Full text link
    Phylogenetic analysis of DNA or other data commonly gives rise to a collection or sample of inferred evolutionary trees. Principal Components Analysis (PCA) cannot be applied directly to collections of trees since the space of evolutionary trees on a fixed set of taxa is not a vector space. This paper describes a novel geometrical approach to PCA in tree-space that constructs the first principal path in an analogous way to standard linear Euclidean PCA. Given a data set of phylogenetic trees, a geodesic principal path is sought that maximizes the variance of the data under a form of projection onto the path. Due to the high dimensionality of tree-space and the nonlinear nature of this problem, the computational complexity is potentially very high, so approximate optimization algorithms are used to search for the optimal path. Principal paths identified in this way reveal and quantify the main sources of variation in the original collection of trees in terms of both topology and branch lengths. The approach is illustrated by application to simulated sets of trees and to a set of gene trees from metazoan (animal) species.Comment: Published in at http://dx.doi.org/10.1214/11-AOS915 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    The permutation-path coloring problem on trees

    Get PDF
    AbstractIn this paper we first show that the permutation-path coloring problem is NP-hard even for very restrictive instances like involutions, which are permutations that contain only cycles of length at most two, on both binary trees and on trees having only two vertices with degree greater than two, and for circular permutations, which are permutations that contain exactly one cycle, on trees with maximum degree greater than or equal to 4. We calculate a lower bound on the average complexity of the permutation-path coloring problem on arbitrary networks. Then we give combinatorial and asymptotic results for the permutation-path coloring problem on linear networks in order to show that the average number of colors needed to color any permutation on a linear network on n vertices is n/4+o(n). We extend these results and obtain an upper bound on the average complexity of the permutation-path coloring problem on arbitrary trees, obtaining exact results in the case of generalized star trees. Finally we explain how to extend these results for the involutions-path coloring problem on arbitrary trees

    A tight upper bound for the path length of AVL trees

    Get PDF
    AbstractWe prove that the internal path length of an AVL tree of size N is bounded from above by 1.4404N(log2 N-log2log2N)+O(N) and show that this bound is achieved by an infinite family of AVL trees, each tree of which is not of maximal height. These results carry over to the comparison cost of brother trees

    c-trie++: A Dynamic Trie Tailored for Fast Prefix Searches

    Full text link
    Given a dynamic set KK of kk strings of total length nn whose characters are drawn from an alphabet of size σ\sigma, a keyword dictionary is a data structure built on KK that provides locate, prefix search, and update operations on KK. Under the assumption that α=w/lgσ\alpha = w / \lg \sigma characters fit into a single machine word ww, we propose a keyword dictionary that represents KK in nlgσ+Θ(klgn)n \lg \sigma + \Theta(k \lg n) bits of space, supporting all operations in O(m/α+lgα)O(m / \alpha + \lg \alpha) expected time on an input string of length mm in the word RAM model. This data structure is underlined with an exhaustive practical evaluation, highlighting the practical usefulness of the proposed data structure, especially for prefix searches - one of the most elementary keyword dictionary operations

    Dynamic Trees with Almost-Optimal Access Cost

    Get PDF
    An optimal binary search tree for an access sequence on elements is a static tree that minimizes the total search cost. Constructing perfectly optimal binary search trees is expensive so the most efficient algorithms construct almost optimal search trees. There exists a long literature of constructing almost optimal search trees dynamically, i.e., when the access pattern is not known in advance. All of these trees, e.g., splay trees and treaps, provide a multiplicative approximation to the optimal search cost. In this paper we show how to maintain an almost optimal weighted binary search tree under access operations and insertions of new elements where the approximation is an additive constant. More technically, we maintain a tree in which the depth of the leaf holding an element e_i does not exceed min(log(W/w_i),log n)+O(1) where w_i is the number of times e_i was accessed and W is the total length of the access sequence. Our techniques can also be used to encode a sequence of m symbols with a dynamic alphabetic code in O(m) time so that the encoding length is bounded by m(H+O(1)), where H is the entropy of the sequence. This is the first efficient algorithm for adaptive alphabetic coding that runs in constant time per symbol
    corecore