Shortest path distance in random k-nearest neighbor graphs
Consider a weighted or unweighted k-nearest neighbor graph that has been
built on n data points drawn randomly according to some density p on R^d. We
study the convergence of the shortest path distance in such graphs as the
sample size tends to infinity. We prove that for unweighted kNN graphs, this
distance converges to an unpleasant distance function on the underlying space
whose properties are detrimental to machine learning. We also study the
behavior of the shortest path distance in weighted kNN graphs.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
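As a rough illustration of the two quantities compared in this abstract, the sketch below builds a symmetric kNN graph on uniformly random points in the unit square and computes both the unweighted (hop-count) and the weighted (Euclidean edge length) shortest path distance from one point. The uniform density, the symmetrization rule, and the helper names `knn_graph`, `hop_distance`, and `weighted_distance` are assumptions made for this example, not details taken from the paper.

```python
import heapq
import math
import random
from collections import deque


def knn_graph(points, k):
    """Symmetric kNN graph: i and j are joined if either is among the other's k nearest."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d
    return adj


def hop_distance(adj, source):
    """Unweighted shortest path distance (number of edges), computed by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def weighted_distance(adj, source):
    """Weighted shortest path distance (sum of edge lengths), computed by Dijkstra."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Assumed setup: 200 points drawn uniformly from the unit square, k = 8.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
graph = knn_graph(points, k=8)
hops = hop_distance(graph, 0)
lengths = weighted_distance(graph, 0)
```

Comparing `hops[v]` with `lengths[v]` over all reachable `v` gives a feel for how differently the two distances treat dense and sparse regions of the sample.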
Probabilistic Analysis of Optimization Problems on Generalized Random Shortest Path Metrics
Simple heuristics often show a remarkable performance in practice for
optimization problems. Worst-case analysis often falls short of explaining this
performance. Because of this, "beyond worst-case analysis" of algorithms has
recently gained a lot of attention, including probabilistic analysis of
algorithms.
The instances of many optimization problems are essentially a discrete metric
space. Probabilistic analysis for such metric optimization problems has
nevertheless mostly been conducted on instances drawn from Euclidean space,
which provides a structure that is usually heavily exploited in the analysis.
However, most instances from practice are not Euclidean. Little work has been
done on metric instances drawn from other, more realistic, distributions. Some
initial results have been obtained by Bringmann et al. (Algorithmica, 2013),
who have used random shortest path metrics on complete graphs to analyze
heuristics.
The goal of this paper is to generalize these findings to non-complete
graphs, especially Erd\H{o}s-R\'enyi random graphs. A random shortest path
metric is constructed by drawing independent random edge weights for each edge
in the graph and setting the distance between every pair of vertices to the
length of a shortest path between them with respect to the drawn weights. For
such instances, we prove that the greedy heuristic for the minimum distance
maximum matching problem, the nearest neighbor and insertion heuristics for the
traveling salesman problem, and a trivial heuristic for the k-median problem
all achieve a constant expected approximation ratio. Additionally, we show a
polynomial upper bound for the expected number of iterations of the 2-opt
heuristic for the traveling salesman problem.
Comment: An extended abstract appeared in the proceedings of WALCOM 201
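The construction of a random shortest path metric described in this abstract can be sketched as follows. The abstract does not state the edge weight distribution, so the exponential weights with rate 1 used here are an assumption, as are the function name `random_shortest_path_metric` and the parameter values. Vertex pairs in different components of the random graph keep distance infinity.

```python
import math
import random


def random_shortest_path_metric(n, p, rng):
    """Distance matrix of a random shortest path metric on an Erdős-Rényi graph G(n, p).

    Assumption: each present edge gets an independent Exp(1) weight.
    """
    inf = math.inf
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]

    # Draw the random graph and an independent weight for every edge in it.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                w = rng.expovariate(1.0)
                dist[i][j] = w
                dist[j][i] = w

    # Floyd-Warshall: replace each weight by the shortest path length.
    for m in range(n):
        for i in range(n):
            d_im = dist[i][m]
            if d_im == inf:
                continue
            for j in range(n):
                candidate = d_im + dist[m][j]
                if candidate < dist[i][j]:
                    dist[i][j] = candidate
    return dist


metric = random_shortest_path_metric(30, 0.5, random.Random(1))
```

The resulting matrix is symmetric and satisfies the triangle inequality by construction, which is what makes it usable as an input instance for metric heuristics such as nearest neighbor or 2-opt.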
Probabilistic Analyses of Combinatorial Optimization Problems on Random Shortest Path Metrics
Simple heuristics for combinatorial optimization problems often show a remarkable performance in practice. Worst-case analysis often falls short of explaining this performance. Because of this, ‘beyond worst-case analysis’ of algorithms has recently gained a lot of attention, including probabilistic analysis of algorithms.
The instances of many combinatorial optimization problems are essentially a discrete metric space. Probabilistic analysis for such metric optimization problems has nevertheless mostly been conducted on instances drawn from Euclidean space, which provides a structure that is usually heavily exploited in the analysis. However, most instances from practice are not Euclidean. Little work has been done on metric instances drawn from other, more realistic, distributions. Some initial results have been obtained by Bringmann et al. (Algorithmica, 2015), who have used random shortest path metrics generated from complete graphs to analyse heuristics.
In this thesis we look at several variations of the random shortest path metrics, and perform probabilistic analyses for some simple heuristics for several combinatorial optimization problems on these random metric spaces. A random shortest path metric is constructed by drawing independent random edge weights for each edge in a graph and setting the distance between every pair of vertices to the length of a shortest path between them, with respect to the drawn weights.
We provide some basic properties of the distances between vertices in random shortest path metrics. Using these properties, we perform several probabilistic analyses. For random shortest path metrics generated from (dense) Erdős-Rényi random graphs we show that the greedy heuristic for the minimum-distance perfect matching problem, the nearest neighbor and insertion heuristics for the traveling salesman problem, and a trivial heuristic for the k-median problem all achieve a constant expected approximation ratio. Additionally, we show a polynomial upper bound for the expected number of iterations of the 2-opt heuristic for the traveling salesman problem in this model.
For random shortest path metrics generated from sparse graphs we show that the greedy heuristic for the minimum-distance perfect matching problem, and the nearest neighbor and insertion heuristics for the traveling salesman problem all achieve a constant expected approximation ratio. Additionally, we show that the 2-opt heuristic for the traveling salesman problem also achieves a constant expected approximation ratio in this model.
For random shortest path metrics generated from complete graphs we analyse a simple greedy heuristic for the facility location problem: opening the κ cheapest facilities (with κ only depending on the facility opening costs). If the facility opening costs are such that κ is not too large, then we show that this heuristic is asymptotically optimal. For large values of κ we provide a closed-form expression as an upper bound for the expected approximation ratio and we evaluate this expression for the special case where all facility opening costs are equal.
Moreover, we show in this model that a simple 2-approximation algorithm for the Steiner tree problem is asymptotically optimal as long as the number of terminals is not too large. We also present some numerical results that suggest that the 2-opt heuristic for the traveling salesman problem performs rather poorly in this model.
Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph
Data-sensitive metrics adapt distances locally based on the density of data
points with the goal of aligning distances and some notion of similarity. In
this paper, we give the first exact algorithm for computing a data-sensitive
metric called the nearest neighbor metric. In fact, we prove the surprising
result that a previously published approximation algorithm is exact.
The nearest neighbor metric can be viewed as a special case of a
density-based distance used in machine learning, or it can be seen as an
example of a manifold metric. Previous computational research on such metrics
despaired of computing exact distances on account of the apparent difficulty of
minimizing over all continuous paths between a pair of points. We leverage the
exact computation of the nearest neighbor metric to compute sparse spanners and
persistent homology. We also explore the behavior of the metric built from
point sets drawn from an underlying distribution and consider the more general
case of inputs that are finite collections of path-connected compact sets.
The main results connect several classical theories such as the conformal
change of Riemannian metrics, the theory of positive definite functions of
Schoenberg, and screw function theory of Schoenberg and Von Neumann. We develop
novel proof techniques based on the combination of screw functions and
Lipschitz extensions that may be of independent interest.
Comment: 15 page
Defining Equitable Geographic Districts in Road Networks via Stable Matching
We introduce a novel method for defining geographic districts in road
networks using stable matching. In this approach, each geographic district is
defined in terms of a center, which identifies a location of interest, such as
a post office or polling place, and all other network vertices must be labeled
with the center to which they are associated. We focus on defining geographic
districts that are equitable, in that every district has the same number of
vertices and the assignment is stable in terms of geographic distance. That is,
there is no unassigned vertex-center pair such that both would prefer each
other over their current assignments. We solve this problem using a version of
the classic stable matching problem, called symmetric stable matching, in which
the preferences of the elements in both sets obey a certain symmetry. In our
case, we study a graph-based version of stable matching in which nodes are
stably matched to a subset of nodes denoted as centers, prioritized by their
shortest-path distances, so that each center is apportioned a certain number of
nodes. We show that, for a planar graph or road network, the problem can be
solved in a running time that improves upon that of the classic Gale-Shapley
stable matching algorithm when the number of centers is large. Finally, we
provide experimental results on road networks for these algorithms and a
heuristic algorithm that performs better than the Gale-Shapley algorithm for
any number of centers.
Comment: 9 pages, 4 figures, to appear in 25th ACM SIGSPATIAL International
Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL
2017) November 7-10, 2017, Redondo Beach, California, US
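The stability condition in this abstract can be illustrated with a naive baseline: because both sides rank each other by the same distance, repeatedly matching the globally closest available node-center pair yields a stable assignment. This is only a simple sketch of the problem statement, not the faster algorithm the paper proposes. The function name `greedy_stable_districts`, the precomputed distance table, and the example numbers are assumptions for illustration.

```python
def greedy_stable_districts(dist, quota):
    """Assign each node to a center so that no node-center pair prefers each other.

    dist[v][c] is the (shortest-path) distance from node v to center c;
    quota is the number of nodes each center must receive.
    Pairs are processed from closest to farthest, so whenever a pair is
    skipped, the node or the center already holds a partner that is at
    least as close.
    """
    num_centers = len(dist[0])
    pairs = sorted(
        (d, v, c) for v, row in enumerate(dist) for c, d in enumerate(row)
    )
    assignment = {}
    load = [0] * num_centers
    for d, v, c in pairs:
        if v not in assignment and load[c] < quota:
            assignment[v] = c
            load[c] += 1
    return assignment


# Hypothetical instance: 4 nodes, 2 centers, 2 nodes per center.
dist = [
    [1, 4],
    [2, 3],
    [3, 1],
    [5, 6],
]
assignment = greedy_stable_districts(dist, quota=2)
```

Sorting all node-center pairs is what makes this baseline expensive on large road networks, which is the cost the paper's approach is meant to avoid.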
From random walks to distances on unweighted graphs
Large unweighted directed graphs are commonly used to capture relations
between entities. A fundamental problem in the analysis of such networks is to
properly define the similarity or dissimilarity between any two vertices.
Despite the significance of this problem, statistical characterization of the
proposed metrics has been limited. We introduce and develop a class of
techniques for analyzing random walks on graphs using stochastic calculus.
Using these techniques we generalize results on the degeneracy of hitting times
and analyze a metric based on the Laplace transformed hitting time (LTHT). The
metric serves as a natural, provably well-behaved alternative to the expected
hitting time. We establish a general correspondence between hitting times of
the Brownian motion and analogous hitting times on the graph. We show that the
LTHT is consistent with respect to the underlying metric of a geometric graph,
preserves clustering tendency, and remains robust against random addition of
non-geometric edges. Tests on simulated and real-world data show that the LTHT
matches theoretical predictions and outperforms alternatives.
Comment: To appear in NIPS 201
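The abstract does not spell out the definition of the LTHT, so the sketch below assumes the usual reading of a Laplace transform of a hitting time: for a simple random walk with hitting time T of a target vertex, it approximates E[exp(-beta * T)] from every start vertex by fixed-point iteration. The function name, the parameter `beta`, the iteration count, and the toy graph are all assumptions for illustration, and the metric the paper derives from this quantity is not reproduced here.

```python
import math


def laplace_transformed_hitting_time(out_neighbors, target, beta, iterations=200):
    """Approximate E[exp(-beta * T)] for a simple random walk hitting `target`.

    out_neighbors maps every vertex to the list of its out-neighbors; each
    neighbor must itself be a key of the mapping. The value u satisfies
    u(target) = 1 and u(v) = exp(-beta) * mean(u(w) for w in out_neighbors[v]),
    which is iterated here until (approximate) convergence.
    """
    decay = math.exp(-beta)
    u = {v: 0.0 for v in out_neighbors}
    u[target] = 1.0
    for _ in range(iterations):
        new_u = {}
        for v, nbrs in out_neighbors.items():
            if v == target:
                new_u[v] = 1.0
            elif nbrs:
                new_u[v] = decay * sum(u[w] for w in nbrs) / len(nbrs)
            else:
                new_u[v] = 0.0  # dead end: the target is never reached
        u = new_u
    return u


# Toy directed cycle 0 -> 1 -> 2 -> 0 with target vertex 2.
cycle = {0: [1], 1: [2], 2: [0]}
ltht = laplace_transformed_hitting_time(cycle, target=2, beta=0.5)
```

On the directed cycle the walk is deterministic, so the values reduce to exp(-beta) raised to the number of steps to the target, which makes the example easy to check by hand.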