580 research outputs found
New Applications of Nearest-Neighbor Chains: Euclidean TSP and Motorcycle Graphs
We show new applications of the nearest-neighbor chain algorithm, a technique that originated in agglomerative hierarchical clustering. We use it to construct the greedy multi-fragment tour for Euclidean TSP in O(n log n) time in any fixed dimension and for Steiner TSP in planar graphs in O(n sqrt(n)log n) time; we compute motorcycle graphs, a central step in straight skeleton algorithms, in O(n^(4/3+epsilon)) time for any epsilon>0
Dynamic Geometric Data Structures via Shallow Cuttings
We present new results on a number of fundamental problems about dynamic geometric data structures:
1) We describe the first fully dynamic data structures with sublinear amortized update time for maintaining (i) the number of vertices or the volume of the convex hull of a 3D point set, (ii) the largest empty circle for a 2D point set, (iii) the Hausdorff distance between two 2D point sets, (iv) the discrete 1-center of a 2D point set, (v) the number of maximal (i.e., skyline) points in a 3D point set. The update times are near n^{11/12} for (i) and (ii), n^{7/8} for (iii) and (iv), and n^{2/3} for (v). Previously, sublinear bounds were known only for restricted "semi-online" settings [Chan, SODA 2002].
2) We slightly improve previous fully dynamic data structures for answering extreme point queries for the convex hull of a 3D point set and nearest neighbor search for a 2D point set. The query time is O(log^2n), and the amortized update time is O(log^4n) instead of O(log^5n) [Chan, SODA 2006; Kaplan et al., SODA 2017].
3) We also improve previous fully dynamic data structures for maintaining the bichromatic closest pair between two 2D point sets and the diameter of a 2D point set. The amortized update time is O(log^4n) instead of O(log^7n) [Eppstein 1995; Chan, SODA 2006; Kaplan et al., SODA 2017]
Dynamic Connectivity in Disk Graphs
Let S ⊆ R2 be a set of n sites in the plane, so that every site s ∈ S has an associated
radius rs > 0. Let D(S) be the disk intersection graph defined by S, i.e., the graph
with vertex set S and an edge between two distinct sites s, t ∈ S if and only if the
disks with centers s, t and radii rs , rt intersect. Our goal is to design data structures
that maintain the connectivity structure of D(S) as sites are inserted and/or deleted
in S. First, we consider unit disk graphs, i.e., we fix rs = 1, for all sites s ∈ S.
For this case, we describe a data structure that has O(log2 n) amortized update time
and O(log n/ log log n) query time. Second, we look at disk graphs with bounded
radius ratio Ψ, i.e., for all s ∈ S, we have 1 ≤ rs ≤ Ψ, for a parameter Ψ that is
known in advance. Here, we not only investigate the fully dynamic case, but also the
incremental and the decremental scenario, where only insertions or only deletions of
sites are allowed. In the fully dynamic case, we achieve amortized expected update
time O(Ψ log4 n) and query time O(log n/ log log n). This improves the currently
best update time by a factor of Ψ. In the incremental case, we achieve logarithmic
dependency on Ψ, with a data structure that has O(α(n)) amortized query time and
O(log Ψ log4 n) amortized expected update time, where α(n) denotes the inverse Ackermann
function. For the decremental setting, we first develop an efficient decremental
disk revealing data structure: given two sets R and B of disks in the plane, we can delete
disks from B, and upon each deletion, we receive a list of all disks in R that no longer
intersect the union of B. Using this data structure, we get decremental data structures
with a query time of O(log n/ log log n) that supports deletions in O(n log Ψ log4 n)
overall expected time for disk graphs with bounded radius ratio Ψ and O(n log5 n)
overall expected time for disk graphs with arbitrary radii, assuming that the deletion
sequence is oblivious of the internal random choices of the data structures
Graph-based time-space trade-offs for approximate near neighbors
We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size on the -dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approximate nearest neighbor problem with approximation factor c > 1 in query time and space , for arbitrary satisfying \begin{align} (2c^2 - 1) \rho_q + 2 c^2 (c^2 - 1) \sqrt{\rho_s (1 - \rho_s)} \geq c^4. \end{align} Graph-based near neighbor searching is especially competitive with hash-based methods for small and near-linear memory, and in this regime the asymptotic scaling of a greedy graph-based search matches the recent optimal hash-based trade-offs of Andoni-Laarhoven-Razenshteyn-Waingarten [SODA'17]. We further study how the trade-offs scale when the data set is of size , and analyze asymptotic complexities when applying these results to lattice sieving
Online Data Structures in External Memory
The original publication is available at www.springerlink.comThe data sets for many of today's computer applications are
too large to t within the computer's internal memory and must instead
be stored on external storage devices such as disks. A major performance
bottleneck can be the input/output communication (or I/O) between
the external and internal memories. In this paper we discuss a variety of
online data structures for external memory, some very old and some very
new, such as hashing (for dictionaries), B-trees (for dictionaries and 1-D
range search), bu er trees (for batched dynamic problems), interval trees
with weight-balanced B-trees (for stabbing queries), priority search trees
(for 3-sided 2-D range search), and R-trees and other spatial structures.
We also discuss several open problems along the way
Transactional Support for Visual Instance Search
International audienceThis article addresses the issue of dynamicity and durability for scalable indexing of very large and rapidly growing collections of local features for visual instance retrieval. By extending the NV-tree, a scalable disk-based high-dimensional index, we show how to implement the ACID properties of transactions which ensure both dynamicity and durability. We present a detailed performance evaluation of the transactional NV-tree, showing that the insertion throughput is excellent despite the effort to enforce the ACID properties
PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search
This paper studies density-based clustering of point sets. These methods use
dense regions of points to detect clusters of arbitrary shapes. In particular,
we study variants of density peaks clustering, a popular type of algorithm that
has been shown to work well in practice. Our goal is to cluster large
high-dimensional datasets, which are prevalent in practice. Prior solutions are
either sequential, and cannot scale to large data, or are specialized for
low-dimensional data.
This paper unifies the different variants of density peaks clustering into a
single framework, PECANN, by abstracting out several key steps common to this
class of algorithms. One such key step is to find nearest neighbors that
satisfy a predicate function, and one of the main contributions of this paper
is an efficient way to do this predicate search using graph-based approximate
nearest neighbor search (ANNS). To provide ample parallelism, we propose a
doubling search technique that enables points to find an approximate nearest
neighbor satisfying the predicate in a small number of rounds. Our technique
can be applied to many existing graph-based ANNS algorithms, which can all be
plugged into PECANN.
We implement five clustering algorithms with PECANN and evaluate them on
synthetic and real-world datasets with up to 1.28 million points and up to 1024
dimensions on a 30-core machine with two-way hyper-threading. Compared to the
state-of-the-art FASTDP algorithm for high-dimensional density peaks
clustering, which is sequential, our best algorithm is 45x-734x faster while
achieving competitive ARI scores. Compared to the state-of-the-art parallel
DPC-based algorithm, which is optimized for low dimensions, we show that PECANN
is two orders of magnitude faster. As far as we know, our work is the first to
evaluate DPC variants on large high-dimensional real-world image and text
embedding datasets
- …