    Theoretically-Efficient and Practical Parallel DBSCAN

    The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work for two dimensions and sub-quadratic work for three or more dimensions, and DBSCAN can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case, making them inefficient for large datasets. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with hyper-threading show that we outperform existing parallel DBSCAN implementations by up to several orders of magnitude, and achieve speedups of up to 33x over the best sequential algorithms.
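
    For orientation, the sketch below shows what exact DBSCAN computes, using the naive sequential approach with quadratic work that the abstract contrasts against. It is not the paper's parallel algorithm; the function name `dbscan` and the parameter names `eps` and `min_pts` are assumptions chosen for illustration, following the conventional DBSCAN parameters.

```python
from collections import deque
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Naive exact DBSCAN with O(n^2) distance checks (illustrative only)."""
    n = len(points)
    # eps-neighborhood of every point; a point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        # Start a new cluster only from an unlabeled core point.
        if not is_core[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()  # p is always a core point
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    # Only core points expand the cluster further;
                    # border points are labeled but not expanded.
                    if is_core[q]:
                        queue.append(q)
        cluster += 1
    return labels
```

    Calling `dbscan(points, eps=1.5, min_pts=3)` on two well-separated triples of points plus one distant outlier yields two cluster labels and one `NOISE` label. The all-pairs neighborhood computation is the quadratic step that grid-based or tree-based methods aim to avoid.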

    The Geometric Maximum Traveling Salesman Problem

    We consider the traveling salesman problem when the cities are points in R^d for some fixed d and distances are computed according to geometric distances, determined by some norm. We show that for any polyhedral norm, the problem of finding a tour of maximum length can be solved in polynomial time. If arithmetic operations are assumed to take unit time, our algorithms run in time O(n^{f-2} log n), where f is the number of facets of the polyhedron determining the polyhedral norm. Thus for example we have O(n^2 log n) algorithms for the cases of points in the plane under the Rectilinear and Sup norms. This is in contrast to the fact that finding a minimum length tour in each case is NP-hard. Our approach can be extended to the more general case of quasi-norms with not necessarily symmetric unit ball, where we get a complexity of O(n^{2f-2} log n). For the special case of two-dimensional metrics with f=4 (which includes the Rectilinear and Sup norms), we present a simple algorithm with O(n) running time. The algorithm does not use any indirect addressing, so its running time remains valid even in comparison-based models in which sorting requires Omega(n log n) time. The basic mechanism of the algorithm provides some intuition on why polyhedral norms allow fast algorithms. Complementing the results on simplicity for polyhedral norms, we prove that for the case of Euclidean distances in R^d for d>2, the Maximum TSP is NP-hard. This sheds new light on the well-studied difficulties of Euclidean distances.
    Comment: 24 pages, 6 figures; revised to appear in Journal of the ACM (clarified some minor points, fixed typos)

    Fast directional continuous spherical wavelet transform algorithms

    We describe the construction of a spherical wavelet analysis through the inverse stereographic projection of the Euclidean planar wavelet framework, introduced originally by Antoine and Vandergheynst and developed further by Wiaux et al. Fast algorithms for performing the directional continuous wavelet analysis on the unit sphere are presented. The fast directional algorithm, based on the fast spherical convolution algorithm developed by Wandelt and Gorski, provides a saving of O(sqrt(Npix)) over a direct quadrature implementation for Npix pixels on the sphere, and allows one to perform a directional spherical wavelet analysis of a 10^6 pixel map on a personal computer.
    Comment: 10 pages, 3 figures, replaced to match version accepted by IEEE Trans. Sig. Pro

    Incremental Distance Transforms (IDT)

    A new generic scheme for incremental implementations of distance transforms (DT) is presented: Incremental Distance Transforms (IDT). This scheme is applied to the city-block, Chamfer, and three recent exact Euclidean DT (E2DT). A benchmark shows that for all five DT, the incremental implementation results in a significant speedup: 3.4× to 10×. However, significant differences (i.e., up to 12.5×) among the DT remain. The FEED transform, one of the recent E2DT, even proved to be faster than both the city-block and Chamfer DT. So, through a very efficient incremental processing scheme for DT, relief is found for the computational burden of E2DT.
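
    As background for the city-block DT mentioned above, the sketch below shows the classic two-pass, non-incremental computation on a binary grid. It is not the IDT scheme from the paper; the function name `cityblock_dt` and the list-of-rows input format are assumptions made for this illustration.

```python
def cityblock_dt(grid):
    """Two-pass city-block distance transform (illustrative, non-incremental).

    grid: non-empty list of equal-length rows; truthy cells are feature pixels.
    Returns a grid of city-block distances to the nearest feature pixel.
    """
    rows, cols = len(grid), len(grid[0])
    inf = rows + cols  # exceeds any city-block distance inside the grid
    dist = [[0 if grid[r][c] else inf for c in range(cols)] for r in range(rows)]

    # Forward pass: propagate distances from the top and left neighbors.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbors.
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist
```

    Each call recomputes the whole grid from scratch. An incremental scheme, as described in the abstract, would instead update an existing result when the input changes.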

    Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph

    Data-sensitive metrics adapt distances locally based on the density of data points with the goal of aligning distances and some notion of similarity. In this paper, we give the first exact algorithm for computing a data-sensitive metric called the nearest neighbor metric. In fact, we prove the surprising result that a previously published 3-approximation is an exact algorithm. The nearest neighbor metric can be viewed as a special case of a density-based distance used in machine learning, or it can be seen as an example of a manifold metric. Previous computational research on such metrics despaired of computing exact distances on account of the apparent difficulty of minimizing over all continuous paths between a pair of points. We leverage the exact computation of the nearest neighbor metric to compute sparse spanners and persistent homology. We also explore the behavior of the metric built from point sets drawn from an underlying distribution and consider the more general case of inputs that are finite collections of path-connected compact sets. The main results connect several classical theories such as the conformal change of Riemannian metrics, the theory of positive definite functions of Schoenberg, and screw function theory of Schoenberg and Von Neumann. We develop novel proof techniques based on the combination of screw functions and Lipschitz extensions that may be of independent interest.
    Comment: 15 pages