4,365 research outputs found

    A Fast Algorithm for Computing Geodesic Distances in Tree Space

    Get PDF
    Comparing and computing distances between phylogenetic trees are important biological problems, especially for models where edge lengths play an important role. The geodesic distance measure between two phylogenetic trees with edge lengths is the length of the shortest path between them in the continuous tree space introduced by Billera, Holmes, and Vogtmann. This tree space provides a powerful tool for studying and comparing phylogenetic trees, both in exhibiting a natural distance measure and in providing a Euclidean-like structure for solving optimization problems on trees. An important open problem is to find a polynomial time algorithm for finding geodesics in tree space. This paper gives such an algorithm, which starts with a simple initial path and moves through a series of successively shorter paths until the geodesic is attained

    Fast approximation of centrality and distances in hyperbolic graphs

    Full text link
    We show that the eccentricities (and thus the centrality indices) of all vertices of a δ\delta-hyperbolic graph G=(V,E)G=(V,E) can be computed in linear time with an additive one-sided error of at most cδc\delta, i.e., after a linear time preprocessing, for every vertex vv of GG one can compute in O(1)O(1) time an estimate e^(v)\hat{e}(v) of its eccentricity eccG(v)ecc_G(v) such that eccG(v)e^(v)eccG(v)+cδecc_G(v)\leq \hat{e}(v)\leq ecc_G(v)+ c\delta for a small constant cc. We prove that every δ\delta-hyperbolic graph GG has a shortest path tree, constructible in linear time, such that for every vertex vv of GG, eccG(v)eccT(v)eccG(v)+cδecc_G(v)\leq ecc_T(v)\leq ecc_G(v)+ c\delta. These results are based on an interesting monotonicity property of the eccentricity function of hyperbolic graphs: the closer a vertex is to the center of GG, the smaller its eccentricity is. We also show that the distance matrix of GG with an additive one-sided error of at most cδc'\delta can be computed in O(V2log2V)O(|V|^2\log^2|V|) time, where c<cc'< c is a small constant. Recent empirical studies show that many real-world graphs (including Internet application networks, web networks, collaboration networks, social networks, biological networks, and others) have small hyperbolicity. So, we analyze the performance of our algorithms for approximating centrality and distance matrix on a number of real-world networks. Our experimental results show that the obtained estimates are even better than the theoretical bounds.Comment: arXiv admin note: text overlap with arXiv:1506.01799 by other author

    Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph

    Full text link
    Data-sensitive metrics adapt distances locally based the density of data points with the goal of aligning distances and some notion of similarity. In this paper, we give the first exact algorithm for computing a data-sensitive metric called the nearest neighbor metric. In fact, we prove the surprising result that a previously published 33-approximation is an exact algorithm. The nearest neighbor metric can be viewed as a special case of a density-based distance used in machine learning, or it can be seen as an example of a manifold metric. Previous computational research on such metrics despaired of computing exact distances on account of the apparent difficulty of minimizing over all continuous paths between a pair of points. We leverage the exact computation of the nearest neighbor metric to compute sparse spanners and persistent homology. We also explore the behavior of the metric built from point sets drawn from an underlying distribution and consider the more general case of inputs that are finite collections of path-connected compact sets. The main results connect several classical theories such as the conformal change of Riemannian metrics, the theory of positive definite functions of Schoenberg, and screw function theory of Schoenberg and Von Neumann. We develop novel proof techniques based on the combination of screw functions and Lipschitz extensions that may be of independent interest.Comment: 15 page

    Principal components analysis in the space of phylogenetic trees

    Full text link
    Phylogenetic analysis of DNA or other data commonly gives rise to a collection or sample of inferred evolutionary trees. Principal Components Analysis (PCA) cannot be applied directly to collections of trees since the space of evolutionary trees on a fixed set of taxa is not a vector space. This paper describes a novel geometrical approach to PCA in tree-space that constructs the first principal path in an analogous way to standard linear Euclidean PCA. Given a data set of phylogenetic trees, a geodesic principal path is sought that maximizes the variance of the data under a form of projection onto the path. Due to the high dimensionality of tree-space and the nonlinear nature of this problem, the computational complexity is potentially very high, so approximate optimization algorithms are used to search for the optimal path. Principal paths identified in this way reveal and quantify the main sources of variation in the original collection of trees in terms of both topology and branch lengths. The approach is illustrated by application to simulated sets of trees and to a set of gene trees from metazoan (animal) species.Comment: Published in at http://dx.doi.org/10.1214/11-AOS915 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
    corecore