    Shortest path distance in random k-nearest neighbor graphs

    Consider a weighted or unweighted k-nearest neighbor graph that has been built on n data points drawn randomly according to some density p on R^d. We study the convergence of the shortest path distance in such graphs as the sample size tends to infinity. We prove that for unweighted kNN graphs, this distance converges to an unpleasant distance function on the underlying space whose properties are detrimental to machine learning. We also study the behavior of the shortest path distance in weighted kNN graphs. Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
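The object this abstract studies can be sketched in a few lines. The example below is only an illustration under assumptions not stated in the abstract: the points are drawn uniformly from the unit square, the kNN graph is the symmetric variant (an edge exists if either endpoint lists the other among its k nearest neighbors), and n = 200, k = 8 are arbitrary. It computes the unweighted shortest path (hop) distance by breadth-first search and does not attempt to reproduce the paper's limit result.

```python
import math
import random
from collections import deque


def knn_graph(points, k):
    """Symmetric unweighted kNN graph: i~j if j is among i's k nearest, or vice versa."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: number of edges on a shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Assumed setup: uniform density on the unit square, n = 200, k = 8.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
adj = knn_graph(points, k=8)
hops = hop_distances(adj, source=0)
```

Comparing `hops[v]` with the Euclidean distance from point 0 to point `v` for growing n is one way to explore how the hop distance depends on the sampling density.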

    Probabilistic Analysis of Optimization Problems on Generalized Random Shortest Path Metrics

    Simple heuristics often show a remarkable performance in practice for optimization problems. Worst-case analysis often falls short of explaining this performance. Because of this, "beyond worst-case analysis" of algorithms has recently gained a lot of attention, including probabilistic analysis of algorithms. The instances of many optimization problems are essentially a discrete metric space. Probabilistic analysis for such metric optimization problems has nevertheless mostly been conducted on instances drawn from Euclidean space, which provides a structure that is usually heavily exploited in the analysis. However, most instances from practice are not Euclidean. Little work has been done on metric instances drawn from other, more realistic, distributions. Some initial results have been obtained by Bringmann et al. (Algorithmica, 2013), who have used random shortest path metrics on complete graphs to analyze heuristics. The goal of this paper is to generalize these findings to non-complete graphs, especially Erdős-Rényi random graphs. A random shortest path metric is constructed by drawing independent random edge weights for each edge in the graph and setting the distance between every pair of vertices to the length of a shortest path between them with respect to the drawn weights. For such instances, we prove that the greedy heuristic for the minimum distance maximum matching problem, the nearest neighbor and insertion heuristics for the traveling salesman problem, and a trivial heuristic for the k-median problem all achieve a constant expected approximation ratio. Additionally, we show a polynomial upper bound for the expected number of iterations of the 2-opt heuristic for the traveling salesman problem. Comment: An extended abstract appeared in the proceedings of WALCOM 201
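The construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `random_shortest_path_metric`, the choice of exponentially distributed weights with rate 1, and the parameters n = 30, p = 0.5 are all assumptions made for the example. The abstract itself only says that the edge weights are drawn independently at random.

```python
import random


def random_shortest_path_metric(n, p, seed=0):
    """Distance matrix of a random shortest path metric on an Erdos-Renyi graph G(n, p)."""
    rng = random.Random(seed)
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # edge {i, j} is present with probability p
                # Assumption: exponential(1) weights; any i.i.d. law could be used here.
                d[i][j] = d[j][i] = rng.expovariate(1.0)
    # Floyd-Warshall: replace each weight by the shortest path length.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


d = random_shortest_path_metric(n=30, p=0.5)
```

Pairs of vertices in different connected components keep distance `inf`, so for small p the result is a metric only within each component.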

    Probabilistic Analyses of Combinatorial Optimization Problems on Random Shortest Path Metrics

    Simple heuristics for combinatorial optimization problems often show a remarkable performance in practice. Worst-case analysis often falls short of explaining this performance. Because of this, ‘beyond worst-case analysis’ of algorithms has recently gained a lot of attention, including probabilistic analysis of algorithms. The instances of many combinatorial optimization problems are essentially a discrete metric space. Probabilistic analysis for such metric optimization problems has nevertheless mostly been conducted on instances drawn from Euclidean space, which provides a structure that is usually heavily exploited in the analysis. However, most instances from practice are not Euclidean. Little work has been done on metric instances drawn from other, more realistic, distributions. Some initial results have been obtained by Bringmann et al. (Algorithmica, 2015), who have used random shortest path metrics generated from complete graphs to analyse heuristics. In this thesis we look at several variations of the random shortest path metrics, and perform probabilistic analyses for some simple heuristics for several combinatorial optimization problems on these random metric spaces. A random shortest path metric is constructed by drawing independent random edge weights for each edge in a graph and setting the distance between every pair of vertices to the length of a shortest path between them, with respect to the drawn weights. We provide some basic properties of the distances between vertices in random shortest path metrics. Using these properties, we perform several probabilistic analyses. For random shortest path metrics generated from (dense) Erdős-Rényi random graphs we show that the greedy heuristic for the minimum-distance perfect matching problem, the nearest neighbor and insertion heuristics for the traveling salesman problem, and a trivial heuristic for the k-median problem all achieve a constant expected approximation ratio. Additionally, we show a polynomial upper bound for the expected number of iterations of the 2-opt heuristic for the traveling salesman problem in this model. For random shortest path metrics generated from sparse graphs we show that the greedy heuristic for the minimum-distance perfect matching problem, and the nearest neighbor and insertion heuristics for the traveling salesman problem all achieve a constant expected approximation ratio. Additionally, we show that the 2-opt heuristic for the traveling salesman problem also achieves a constant expected approximation ratio in this model. For random shortest path metrics generated from complete graphs we analyse a simple greedy heuristic for the facility location problem: opening the κ cheapest facilities (with κ only depending on the facility opening costs). If the facility opening costs are such that κ is not too large, then we show that this heuristic is asymptotically optimal. For large values of κ we provide a closed-form expression as upper bound for the expected approximation ratio and we evaluate this expression for the special case where all facility opening costs are equal. Moreover, we show in this model that a simple 2-approximation algorithm for the Steiner tree problem is asymptotically optimal as long as the number of terminals is not too large. We also present some numerical results that imply that the 2-opt heuristic for the traveling salesman problem seems to perform rather poorly in this model.

    Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph

    Data-sensitive metrics adapt distances locally based on the density of data points with the goal of aligning distances and some notion of similarity. In this paper, we give the first exact algorithm for computing a data-sensitive metric called the nearest neighbor metric. In fact, we prove the surprising result that a previously published 3-approximation is an exact algorithm. The nearest neighbor metric can be viewed as a special case of a density-based distance used in machine learning, or it can be seen as an example of a manifold metric. Previous computational research on such metrics despaired of computing exact distances on account of the apparent difficulty of minimizing over all continuous paths between a pair of points. We leverage the exact computation of the nearest neighbor metric to compute sparse spanners and persistent homology. We also explore the behavior of the metric built from point sets drawn from an underlying distribution and consider the more general case of inputs that are finite collections of path-connected compact sets. The main results connect several classical theories such as the conformal change of Riemannian metrics, the theory of positive definite functions of Schoenberg, and screw function theory of Schoenberg and Von Neumann. We develop novel proof techniques based on the combination of screw functions and Lipschitz extensions that may be of independent interest. Comment: 15 pages

    Defining Equitable Geographic Districts in Road Networks via Stable Matching

    We introduce a novel method for defining geographic districts in road networks using stable matching. In this approach, each geographic district is defined in terms of a center, which identifies a location of interest, such as a post office or polling place, and all other network vertices must be labeled with the center to which they are associated. We focus on defining geographic districts that are equitable, in that every district has the same number of vertices and the assignment is stable in terms of geographic distance. That is, there is no unassigned vertex-center pair such that both would prefer each other over their current assignments. We solve this problem using a version of the classic stable matching problem, called symmetric stable matching, in which the preferences of the elements in both sets obey a certain symmetry. In our case, we study a graph-based version of stable matching in which nodes are stably matched to a subset of nodes denoted as centers, prioritized by their shortest-path distances, so that each center is apportioned a certain number of nodes. We show that, for a planar graph or road network with $n$ nodes and $k$ centers, the problem can be solved in $O(n\sqrt{n}\log n)$ time, which improves upon the $O(nk)$ runtime of using the classic Gale-Shapley stable matching algorithm when $k$ is large. Finally, we provide experimental results on road networks for these algorithms and a heuristic algorithm that performs better than the Gale-Shapley algorithm for any range of values of $k$. Comment: 9 pages, 4 figures, to appear in 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2017) November 7-10, 2017, Redondo Beach, California, US
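The stability condition in this abstract can be illustrated with a simple baseline. The sketch below is neither the paper's $O(n\sqrt{n}\log n)$ algorithm nor Gale-Shapley: it is a plain greedy procedure that computes shortest-path distances from every center with Dijkstra's algorithm, sorts all (distance, node, center) pairs, and assigns each node to the first center that still has room. Because both sides rank each other by the same symmetric distance, this greedy order leaves no blocking pair (ties are broken arbitrarily). The names `dijkstra` and `stable_districts`, the adjacency format, and the six-node path example are assumptions for illustration.

```python
import heapq


def dijkstra(adj, source):
    """Shortest-path distances from source; adj maps node -> list of (neighbor, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            dv = du + w
            if dv < dist.get(v, float("inf")):
                dist[v] = dv
                heapq.heappush(heap, (dv, v))
    return dist


def stable_districts(adj, centers, quota):
    """Greedy assignment: process node-center pairs by increasing distance."""
    dist = {c: dijkstra(adj, c) for c in centers}
    pairs = sorted((dist[c][v], v, c) for c in centers for v in dist[c])
    assigned = {}
    load = {c: 0 for c in centers}
    for _, v, c in pairs:
        if v not in assigned and load[c] < quota:
            assigned[v] = c
            load[c] += 1
    return assigned, dist


# Hypothetical example: a path 0-1-2-3-4-5 with unit weights, centers at 0 and 1.
adj = {i: [] for i in range(6)}
for i in range(5):
    adj[i].append((i + 1, 1.0))
    adj[i + 1].append((i, 1.0))
centers = [0, 1]
assigned, dist = stable_districts(adj, centers, quota=3)
print(assigned)  # {0: 0, 1: 1, 2: 1, 3: 1, 4: 0, 5: 0}
```

In this example center 1 fills its quota with nodes 1, 2 and 3, so nodes 4 and 5 fall to center 0 even though center 1 is closer to them: equal-size stable districts need not be contiguous.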

    From random walks to distances on unweighted graphs

    Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing random walks on graphs using stochastic calculus. Using these techniques we generalize results on the degeneracy of hitting times and analyze a metric based on the Laplace transformed hitting time (LTHT). The metric serves as a natural, provably well-behaved alternative to the expected hitting time. We establish a general correspondence between hitting times of the Brownian motion and analogous hitting times on the graph. We show that the LTHT is consistent with respect to the underlying metric of a geometric graph, preserves clustering tendency, and remains robust against random addition of non-geometric edges. Tests on simulated and real-world data show that the LTHT matches theoretical predictions and outperforms alternatives. Comment: To appear in NIPS 201