Theoretically-Efficient and Practical Parallel DBSCAN
The DBSCAN method for spatial clustering has received significant attention
due to its applicability in a variety of data analysis tasks. There are fast
sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work
for two dimensions, sub-quadratic work for three or more dimensions, and can be
computed approximately in linear work for any constant number of dimensions.
However, existing parallel DBSCAN algorithms require quadratic work in the
worst case, making them inefficient for large datasets. This paper bridges the
gap between theory and practice of parallel DBSCAN by presenting new parallel
algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the
work bounds of their sequential counterparts, and are highly parallel
(polylogarithmic depth). We present implementations of our algorithms along
with optimizations that improve their practical performance. We perform a
comprehensive experimental evaluation of our algorithms on a variety of
datasets and parameter settings. Our experiments on a 36-core machine with
hyper-threading show that we outperform existing parallel DBSCAN
implementations by up to several orders of magnitude, and achieve speedups by
up to 33x over the best sequential algorithms.
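For orientation, the clustering task itself can be sketched with a naive sequential DBSCAN baseline. This is purely illustrative: it takes quadratic work and is unrelated to the paper's parallel implementation, and the function and parameter names are our own.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id >= 0, or -1 for noise.
    Naive O(n^2)-work sequential baseline; the paper's contribution is
    matching sub-quadratic work bounds with polylogarithmic depth."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise (may later join a cluster as a border point)
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:  # core point: expand the cluster
                queue.extend(jn)
        cluster += 1
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
print(dbscan(pts, eps=0.5, min_pts=3))  # two clusters plus one noise point
```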
The Geometric Maximum Traveling Salesman Problem
We consider the traveling salesman problem when the cities are points in R^d
for some fixed d and distances are computed according to geometric distances,
determined by some norm. We show that for any polyhedral norm, the problem of
finding a tour of maximum length can be solved in polynomial time. If
arithmetic operations are assumed to take unit time, our algorithms run in time
O(n^{f-2} log n), where f is the number of facets of the polyhedron determining
the polyhedral norm. Thus for example we have O(n^2 log n) algorithms for the
cases of points in the plane under the Rectilinear and Sup norms. This is in
contrast to the fact that finding a minimum length tour in each case is
NP-hard. Our approach can be extended to the more general case of quasi-norms
with not necessarily symmetric unit ball, where we get a complexity of
O(n^{2f-2} log n).
For the special case of two-dimensional metrics with f=4 (which includes the
Rectilinear and Sup norms), we present a simple algorithm with O(n) running
time. The algorithm does not use any indirect addressing, so its running time
remains valid even in comparison based models in which sorting requires Omega(n
\log n) time. The basic mechanism of the algorithm provides some intuition on
why polyhedral norms allow fast algorithms.
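To make the objective concrete, here is a brute-force Maximum TSP under the rectilinear (L1) norm on a tiny point set. This exponential-time enumeration is only a sanity-check baseline, not the paper's polynomial algorithm; all names are illustrative.

```python
from itertools import permutations

def tour_length(points, order, norm):
    """Length of the closed tour visiting points in the given order."""
    return sum(norm(points[order[k]], points[order[(k + 1) % len(order)]])
               for k in range(len(order)))

def max_tsp_brute_force(points, norm):
    """Exponential-time baseline for the maximum TSP; the paper achieves
    O(n^{f-2} log n) for polyhedral norms with f facets."""
    rest = max(permutations(range(1, len(points))),
               key=lambda r: tour_length(points, (0,) + r, norm))
    order = (0,) + rest
    return order, tour_length(points, order, norm)

# Two polyhedral norms with f = 4 facets in the plane:
rectilinear = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])  # L1
sup = lambda p, q: max(abs(p[0] - q[0]), abs(p[1] - q[1]))      # L-infinity

pts = [(0, 0), (1, 0), (0, 1), (2, 2)]
order, length = max_tsp_brute_force(pts, rectilinear)
print(length)  # longest closed tour under the rectilinear norm
```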
Complementing the results on simplicity for polyhedral norms, we prove that
for the case of Euclidean distances in R^d for d>2, the Maximum TSP is NP-hard.
This sheds new light on the well-studied difficulties of Euclidean distances.
Comment: 24 pages, 6 figures; revised to appear in Journal of the ACM
(clarified some minor points, fixed typos).
Fast directional continuous spherical wavelet transform algorithms
We describe the construction of a spherical wavelet analysis through the
inverse stereographic projection of the Euclidean planar wavelet framework,
introduced originally by Antoine and Vandergheynst and developed further by
Wiaux et al. Fast algorithms for performing the directional continuous wavelet
analysis on the unit sphere are presented. The fast directional algorithm,
based on the fast spherical convolution algorithm developed by Wandelt and
Gorski, provides a saving of O(sqrt(Npix)) over a direct quadrature
implementation for Npix pixels on the sphere, and allows one to perform a
directional spherical wavelet analysis of a 10^6 pixel map on a personal
computer.
Comment: 10 pages, 3 figures, replaced to match version accepted by IEEE
Trans. Sig. Proc.
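The construction rests on the inverse stereographic projection from the plane to the sphere. A minimal sketch of that map, projecting from the north pole onto the unit sphere, follows; the paper's exact conventions for pole and radius may differ.

```python
def inverse_stereographic(x, y):
    """Map a planar point (x, y) to the unit sphere via inverse
    stereographic projection from the north pole (0, 0, 1). This is the
    mechanism used to lift planar wavelets onto the sphere; conventions
    (projection pole, sphere radius) are an assumption here."""
    r2 = x * x + y * y
    d = 1.0 + r2
    return (2 * x / d, 2 * y / d, (r2 - 1.0) / d)

p = inverse_stereographic(1.0, 0.0)
print(p)  # the image always lies on the unit sphere
```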
Incremental Distance Transforms (IDT)
A new generic scheme for incremental implementations of distance transforms (DT) is presented: Incremental Distance Transforms (IDT). This scheme is applied to the city-block, Chamfer, and three recent exact Euclidean DT (E2DT). A benchmark shows that for all five DT, the incremental implementation results in a significant speedup: 3.4×–10×. However, significant differences (i.e., up to 12.5×) among the DT remain present. The FEED transform, one of the recent E2DT, even proved to be faster than both the city-block and Chamfer DT. So, through a very efficient incremental processing scheme for DT, a relief is found for the computational burden of E2DT.
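For context, a standard non-incremental city-block distance transform can be computed with the classic two-pass sweep shown below. This is a generic batch baseline, not the paper's implementation; the IDT scheme instead updates such transforms incrementally as the feature set changes.

```python
def cityblock_dt(mask):
    """Two-pass city-block (L1) distance transform on a 2D boolean grid:
    distance 0 at feature cells (True), otherwise the L1 distance to the
    nearest feature cell."""
    INF = 10**9
    rows, cols = len(mask), len(mask[0])
    d = [[0 if mask[i][j] else INF for j in range(cols)] for i in range(rows)]
    # Forward pass: propagate distances from the top-left.
    for i in range(rows):
        for j in range(cols):
            if i > 0: d[i][j] = min(d[i][j], d[i - 1][j] + 1)
            if j > 0: d[i][j] = min(d[i][j], d[i][j - 1] + 1)
    # Backward pass: propagate distances from the bottom-right.
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if i < rows - 1: d[i][j] = min(d[i][j], d[i + 1][j] + 1)
            if j < cols - 1: d[i][j] = min(d[i][j], d[i][j + 1] + 1)
    return d

grid = [[False, False, False],
        [False, True,  False],
        [False, False, False]]
print(cityblock_dt(grid))  # L1 distances from the single feature cell
```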
Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph
Data-sensitive metrics adapt distances locally based on the density of data
points with the goal of aligning distances and some notion of similarity. In
this paper, we give the first exact algorithm for computing a data-sensitive
metric called the nearest neighbor metric. In fact, we prove the surprising
result that a previously published approximation algorithm computes this
metric exactly.
The nearest neighbor metric can be viewed as a special case of a
density-based distance used in machine learning, or it can be seen as an
example of a manifold metric. Previous computational research on such metrics
despaired of computing exact distances on account of the apparent difficulty of
minimizing over all continuous paths between a pair of points. We leverage the
exact computation of the nearest neighbor metric to compute sparse spanners and
persistent homology. We also explore the behavior of the metric built from
point sets drawn from an underlying distribution and consider the more general
case of inputs that are finite collections of path-connected compact sets.
The main results connect several classical theories such as the conformal
change of Riemannian metrics, the theory of positive definite functions of
Schoenberg, and screw function theory of Schoenberg and Von Neumann. We develop
novel proof techniques based on the combination of screw functions and
Lipschitz extensions that may be of independent interest.
Comment: 15 pages.
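The title's strategy, computing a metric as shortest paths on a graph, can be sketched with plain Dijkstra over a user-supplied edge set. The edge weight here is a generic placeholder (Euclidean length), not the paper's nearest neighbor metric, and all names are illustrative.

```python
import heapq
from math import dist

def graph_metric(points, edges, weight):
    """Dijkstra shortest-path distances from points[0] to every reachable
    point, over a user-supplied undirected edge set and weight function.
    Mirrors the strategy of reducing a manifold/data-sensitive metric to
    shortest paths on a graph; the weight is a placeholder, not the
    paper's nearest neighbor metric."""
    adj = {i: [] for i in range(len(points))}
    for i, j in edges:
        w = weight(points[i], points[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    dists = {0: 0.0}
    pq = [(0.0, 0)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > dists.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = du + w
            if nd < dists.get(v, float("inf")):
                dists[v] = nd
                heapq.heappush(pq, (nd, v))
    return dists

pts = [(0, 0), (1, 0), (2, 0)]
edges = [(0, 1), (1, 2), (0, 2)]
print(graph_metric(pts, edges, dist))  # graph distances from pts[0]
```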