580 research outputs found

    New Applications of Nearest-Neighbor Chains: Euclidean TSP and Motorcycle Graphs

    Get PDF
    We show new applications of the nearest-neighbor chain algorithm, a technique that originated in agglomerative hierarchical clustering. We use it to construct the greedy multi-fragment tour for Euclidean TSP in O(n log n) time in any fixed dimension and for Steiner TSP in planar graphs in O(n sqrt(n)log n) time; we compute motorcycle graphs, a central step in straight skeleton algorithms, in O(n^(4/3+epsilon)) time for any epsilon>0

    Dynamic Geometric Data Structures via Shallow Cuttings

    Get PDF
    We present new results on a number of fundamental problems about dynamic geometric data structures: 1) We describe the first fully dynamic data structures with sublinear amortized update time for maintaining (i) the number of vertices or the volume of the convex hull of a 3D point set, (ii) the largest empty circle for a 2D point set, (iii) the Hausdorff distance between two 2D point sets, (iv) the discrete 1-center of a 2D point set, (v) the number of maximal (i.e., skyline) points in a 3D point set. The update times are near n^{11/12} for (i) and (ii), n^{7/8} for (iii) and (iv), and n^{2/3} for (v). Previously, sublinear bounds were known only for restricted "semi-online" settings [Chan, SODA 2002]. 2) We slightly improve previous fully dynamic data structures for answering extreme point queries for the convex hull of a 3D point set and nearest neighbor search for a 2D point set. The query time is O(log^2n), and the amortized update time is O(log^4n) instead of O(log^5n) [Chan, SODA 2006; Kaplan et al., SODA 2017]. 3) We also improve previous fully dynamic data structures for maintaining the bichromatic closest pair between two 2D point sets and the diameter of a 2D point set. The amortized update time is O(log^4n) instead of O(log^7n) [Eppstein 1995; Chan, SODA 2006; Kaplan et al., SODA 2017]

    Dynamic Connectivity in Disk Graphs

    Get PDF
    Let S ⊆ R2 be a set of n sites in the plane, so that every site s ∈ S has an associated radius rs > 0. Let D(S) be the disk intersection graph defined by S, i.e., the graph with vertex set S and an edge between two distinct sites s, t ∈ S if and only if the disks with centers s, t and radii rs , rt intersect. Our goal is to design data structures that maintain the connectivity structure of D(S) as sites are inserted and/or deleted in S. First, we consider unit disk graphs, i.e., we fix rs = 1, for all sites s ∈ S. For this case, we describe a data structure that has O(log2 n) amortized update time and O(log n/ log log n) query time. Second, we look at disk graphs with bounded radius ratio Ψ, i.e., for all s ∈ S, we have 1 ≤ rs ≤ Ψ, for a parameter Ψ that is known in advance. Here, we not only investigate the fully dynamic case, but also the incremental and the decremental scenario, where only insertions or only deletions of sites are allowed. In the fully dynamic case, we achieve amortized expected update time O(Ψ log4 n) and query time O(log n/ log log n). This improves the currently best update time by a factor of Ψ. In the incremental case, we achieve logarithmic dependency on Ψ, with a data structure that has O(α(n)) amortized query time and O(log Ψ log4 n) amortized expected update time, where α(n) denotes the inverse Ackermann function. For the decremental setting, we first develop an efficient decremental disk revealing data structure: given two sets R and B of disks in the plane, we can delete disks from B, and upon each deletion, we receive a list of all disks in R that no longer intersect the union of B. Using this data structure, we get decremental data structures with a query time of O(log n/ log log n) that supports deletions in O(n log Ψ log4 n) overall expected time for disk graphs with bounded radius ratio Ψ and O(n log5 n) overall expected time for disk graphs with arbitrary radii, assuming that the deletion sequence is oblivious of the internal random choices of the data structures

    Dynamic Enumeration of Similarity Joins

    Get PDF

    Graph-based time-space trade-offs for approximate near neighbors

    Get PDF
    We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size n=2o(d)n = 2^{o(d)} on the dd-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approximate nearest neighbor problem with approximation factor c > 1 in query time nρq+o(1)n^{\rho_q + o(1)} and space n1+ρs+o(1)n^{1 + \rho_s + o(1)}, for arbitrary ρq,ρs0\rho_q, \rho_s \geq 0 satisfying \begin{align} (2c^2 - 1) \rho_q + 2 c^2 (c^2 - 1) \sqrt{\rho_s (1 - \rho_s)} \geq c^4. \end{align} Graph-based near neighbor searching is especially competitive with hash-based methods for small cc and near-linear memory, and in this regime the asymptotic scaling of a greedy graph-based search matches the recent optimal hash-based trade-offs of Andoni-Laarhoven-Razenshteyn-Waingarten [SODA'17]. We further study how the trade-offs scale when the data set is of size n=2Θ(d)n = 2^{\Theta(d)}, and analyze asymptotic complexities when applying these results to lattice sieving

    Online Data Structures in External Memory

    Get PDF
    The original publication is available at www.springerlink.comThe data sets for many of today's computer applications are too large to t within the computer's internal memory and must instead be stored on external storage devices such as disks. A major performance bottleneck can be the input/output communication (or I/O) between the external and internal memories. In this paper we discuss a variety of online data structures for external memory, some very old and some very new, such as hashing (for dictionaries), B-trees (for dictionaries and 1-D range search), bu er trees (for batched dynamic problems), interval trees with weight-balanced B-trees (for stabbing queries), priority search trees (for 3-sided 2-D range search), and R-trees and other spatial structures. We also discuss several open problems along the way

    Transactional Support for Visual Instance Search

    Get PDF
    International audienceThis article addresses the issue of dynamicity and durability for scalable indexing of very large and rapidly growing collections of local features for visual instance retrieval. By extending the NV-tree, a scalable disk-based high-dimensional index, we show how to implement the ACID properties of transactions which ensure both dynamicity and durability. We present a detailed performance evaluation of the transactional NV-tree, showing that the insertion throughput is excellent despite the effort to enforce the ACID properties

    PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search

    Full text link
    This paper studies density-based clustering of point sets. These methods use dense regions of points to detect clusters of arbitrary shapes. In particular, we study variants of density peaks clustering, a popular type of algorithm that has been shown to work well in practice. Our goal is to cluster large high-dimensional datasets, which are prevalent in practice. Prior solutions are either sequential, and cannot scale to large data, or are specialized for low-dimensional data. This paper unifies the different variants of density peaks clustering into a single framework, PECANN, by abstracting out several key steps common to this class of algorithms. One such key step is to find nearest neighbors that satisfy a predicate function, and one of the main contributions of this paper is an efficient way to do this predicate search using graph-based approximate nearest neighbor search (ANNS). To provide ample parallelism, we propose a doubling search technique that enables points to find an approximate nearest neighbor satisfying the predicate in a small number of rounds. Our technique can be applied to many existing graph-based ANNS algorithms, which can all be plugged into PECANN. We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading. Compared to the state-of-the-art FASTDP algorithm for high-dimensional density peaks clustering, which is sequential, our best algorithm is 45x-734x faster while achieving competitive ARI scores. Compared to the state-of-the-art parallel DPC-based algorithm, which is optimized for low dimensions, we show that PECANN is two orders of magnitude faster. As far as we know, our work is the first to evaluate DPC variants on large high-dimensional real-world image and text embedding datasets
    corecore