Maximum Inner-Product Search using Tree Data-structures
The problem of efficiently finding the best match for a query in a
given set with respect to the Euclidean distance or the cosine similarity has
been extensively studied in the literature. However, a closely related problem of
efficiently finding the best match with respect to the inner product has never
been explored in the general setting to the best of our knowledge. In this
paper we consider this general problem and contrast it with the existing
best-match algorithms. First, we propose a general branch-and-bound algorithm
using a tree data structure. Subsequently, we present a dual-tree algorithm for
the case where there are multiple queries. Finally, we present a new data
structure for increasing the efficiency of the dual-tree algorithm. These
branch-and-bound algorithms involve novel bounds suited for the purpose of
best-matching with inner products. We evaluate our proposed algorithms on a
variety of data sets from various applications and demonstrate up to five orders
of magnitude improvement in query time over the naive search technique.
Comment: Under submission in KDD 201
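Because the inner product is not a metric, the triangle-inequality pruning used by nearest-neighbor trees does not carry over directly. A minimal sketch of single-query branch-and-bound maximum inner-product search is shown below, assuming a ball tree and the Cauchy-Schwarz bound max over the ball of <q, p> <= <q, c> + r * ||q|| for a ball with center c and radius r. This bound is an illustrative assumption and is not necessarily the paper's novel bound; the dual-tree variant is not shown. The names `BallNode`, `build_ball_tree`, `mips`, and `naive_mips` are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class BallNode:
    center: Vector
    radius: float
    points: List[Vector]                  # stored only at leaves
    left: Optional["BallNode"] = None
    right: Optional["BallNode"] = None


def build_ball_tree(points: List[Vector], leaf_size: int = 4) -> BallNode:
    dim = len(points[0])
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    radius = max(math.dist(p, center) for p in points)
    if len(points) <= leaf_size:
        return BallNode(center, radius, list(points))
    # Split at the median of the coordinate with the largest spread (simple heuristic).
    axis = max(range(dim), key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return BallNode(center, radius, [],
                    build_ball_tree(ordered[:mid], leaf_size),
                    build_ball_tree(ordered[mid:], leaf_size))


def mips(root: BallNode, q: Vector) -> Tuple[float, Optional[Vector]]:
    """Branch-and-bound search for argmax_p <q, p> over the tree."""
    q_norm = math.sqrt(dot(q, q))
    best_score, best_point = -math.inf, None

    def upper_bound(node: BallNode) -> float:
        # Cauchy-Schwarz: <q, p> = <q, c> + <q, p - c> <= <q, c> + r * ||q||
        return dot(q, node.center) + node.radius * q_norm

    def visit(node: BallNode) -> None:
        nonlocal best_score, best_point
        if upper_bound(node) <= best_score:
            return  # no point inside this ball can beat the current best
        if node.left is None:
            for p in node.points:
                s = dot(q, p)
                if s > best_score:
                    best_score, best_point = s, p
            return
        # Visit the child with the larger bound first to tighten best_score early.
        for child in sorted((node.left, node.right), key=upper_bound, reverse=True):
            visit(child)

    visit(root)
    return best_score, best_point


def naive_mips(points: List[Vector], q: Vector) -> Tuple[float, Vector]:
    """Linear-scan baseline."""
    return max(((dot(q, p), p) for p in points), key=lambda t: t[0])


# Usage: the tree search should agree with the linear scan.
random.seed(0)
data = [tuple(random.uniform(-1, 1) for _ in range(8)) for _ in range(500)]
tree = build_ball_tree(data)
query = tuple(random.uniform(-1, 1) for _ in range(8))
print(mips(tree, query)[0], naive_mips(data, query)[0])
```

Descending into the child with the larger bound first is a common heuristic: a good candidate found early lets more subtrees be pruned later.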
A Study on Geometry Contrast Enhancement for 3D Point Models
Electrical Engineering
Point primitives have come into the spotlight as a representation method for 3D models. Much research has been performed on modeling, processing, and rendering 3D point models. In particular, various methods have been developed for extracting and preserving salient features such as corners, curves, and edges in 3D point models. However, little effort has been made to extract and enhance weak features, which are relatively imperceptible due to low geometry contrast. In this thesis, we propose a novel method to improve the visibility of 3D point models by enhancing the geometry contrast of weak features. We first define a weak feature region as a group of local points whose normal directions deviate only slightly. Then we define the geometry histogram of each region as the distribution of the signed distances between its feature points and the locally approximated plane. We equalize and stretch the geometry histogram and move the corresponding feature points accordingly. We also render the enhanced model using normal mapping for better visual presentation. Experimental results demonstrate that the proposed method enhances the geometry contrast of 3D point models by refining the appearance of weak features. We expect that the geometry contrast enhancement algorithm will facilitate many applications in various fields.
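The per-region pipeline described above (weak-region test, signed distances to a local plane, histogram equalization and stretching, moving points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the local plane is assumed to pass through the region centroid with the region's mean normal, equalization is assumed to act on the magnitudes |d| via their empirical CDF while preserving each sign, and "stretching" is assumed to scale the largest deviation by a hypothetical `gain`. The functions `is_weak_region` and `enhance_region` and the threshold `max_angle_deg` are hypothetical names.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _normalize(a: Vec3) -> Vec3:
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def _mean(vs: Sequence[Vec3]) -> Vec3:
    n = len(vs)
    return (sum(v[0] for v in vs) / n, sum(v[1] for v in vs) / n, sum(v[2] for v in vs) / n)


def is_weak_region(normals: Sequence[Vec3], max_angle_deg: float = 10.0) -> bool:
    """True if every normal lies within max_angle_deg of the region's mean normal."""
    n_avg = _normalize(_mean(normals))  # assumes normals do not cancel out
    cos_limit = math.cos(math.radians(max_angle_deg))
    return all(_dot(_normalize(n), n_avg) >= cos_limit for n in normals)


def enhance_region(points: Sequence[Vec3], normals: Sequence[Vec3],
                   gain: float = 2.0) -> List[Vec3]:
    """Equalize and stretch the signed-distance histogram of one weak feature region."""
    centroid = _mean(points)
    n_avg = _normalize(_mean(normals))  # local plane: through centroid, normal n_avg
    d = [_dot(_sub(p, centroid), n_avg) for p in points]  # signed distances to the plane
    mags = sorted(abs(x) for x in d)
    if mags[-1] == 0.0:
        return list(points)  # perfectly flat region: nothing to enhance
    target = gain * mags[-1]  # stretched maximum deviation
    new_d = []
    for x in d:
        cdf = bisect_right(mags, abs(x)) / len(d)  # empirical CDF of |d| (equalization)
        new_d.append(0.0 if x == 0.0 else math.copysign(target * cdf, x))
    # Move each point along the plane normal so its signed distance becomes new_d.
    return [_add(p, _scale(n_avg, nd - od)) for p, od, nd in zip(points, d, new_d)]
```

Preserving the sign keeps bumps as bumps and dents as dents; only the spread of the deviations changes. The normal-mapping rendering step is not covered by this sketch.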
PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search
This paper studies density-based clustering of point sets. These methods use
dense regions of points to detect clusters of arbitrary shapes. In particular,
we study variants of density peaks clustering (DPC), a popular type of algorithm that
has been shown to work well in practice. Our goal is to cluster large
high-dimensional datasets, which are prevalent in practice. Prior solutions are
either sequential, and cannot scale to large data, or are specialized for
low-dimensional data.
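For reference, a brute-force sketch of density peaks clustering (DPC) is given below, assuming the classic cutoff-distance density (the number of other points closer than a hypothetical `d_cut`); the variants unified in PECANN may define density differently. Step 2 is the "nearest neighbor satisfying a predicate" search mentioned in this abstract, with the predicate "is denser than the query point". The function name `density_peaks` and the center-selection rule (the densest point plus the points with the largest rho * delta) are illustrative assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def density_peaks(points: Sequence[Point], d_cut: float, n_clusters: int) -> List[int]:
    """Brute-force density peaks clustering; returns a cluster label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # 1. Local density rho_i: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_cut) for i in range(n)]

    # Total order by decreasing density (ties broken by index) so "denser than" is strict.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r

    # 2. Dependent point: nearest neighbor satisfying the predicate rank[j] < rank[i].
    parent = [-1] * n
    delta = [0.0] * n
    for i in range(n):
        denser = [j for j in range(n) if rank[j] < rank[i]]
        if denser:
            parent[i] = min(denser, key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]
        else:
            delta[i] = max(dist[i])  # densest point: use its largest distance

    # 3. Centers: the densest point plus the points with the largest rho * delta.
    others = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [order[0]] + others[: n_clusters - 1]
    label = [-1] * n
    for c, i in enumerate(centers):
        label[i] = c

    # 4. Propagate labels in decreasing density: each point joins its dependent point's cluster.
    for i in order:
        if label[i] == -1:
            label[i] = label[parent[i]]
    return label
```

The all-pairs distance matrix makes this quadratic in time and memory, which is exactly the cost that motivates replacing step 2 with approximate nearest neighbor search on large, high-dimensional data.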
This paper unifies the different variants of density peaks clustering into a
single framework, PECANN, by abstracting out several key steps common to this
class of algorithms. One such key step is to find nearest neighbors that
satisfy a predicate function, and one of the main contributions of this paper
is an efficient way to do this predicate search using graph-based approximate
nearest neighbor search (ANNS). To provide ample parallelism, we propose a
doubling search technique that enables points to find an approximate nearest
neighbor satisfying the predicate in a small number of rounds. Our technique
can be applied to many existing graph-based ANNS algorithms, which can all be
plugged into PECANN.
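One way to read the doubling search idea is sketched below, under an assumed round structure: in round t the point asks a k-nearest-neighbor routine for k = k0 * 2^t candidates and returns the closest candidate that satisfies the predicate, stopping once a match is found or every other point has been requested. This is an assumption, not necessarily PECANN's exact procedure. `brute_force_knn` is a hypothetical exact stand-in for a graph-based ANNS index, and `doubling_predicate_search` is a hypothetical name.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Points = Sequence[Sequence[float]]
KnnFn = Callable[[Points, int, int], List[int]]


def brute_force_knn(points: Points, query_idx: int, k: int) -> List[int]:
    """Exact k-NN by linear scan; a hypothetical stand-in for a graph-based ANNS index."""
    q = points[query_idx]
    others = [j for j in range(len(points)) if j != query_idx]
    others.sort(key=lambda j: (math.dist(q, points[j]), j))
    return others[:k]


def doubling_predicate_search(points: Points, query_idx: int,
                              predicate: Callable[[int], bool],
                              knn: KnnFn = brute_force_knn,
                              k0: int = 1) -> Tuple[Optional[int], int]:
    """Return (nearest neighbor satisfying predicate or None, rounds used), doubling k each round."""
    n_others = len(points) - 1
    k, rounds = k0, 0
    while True:
        rounds += 1
        # Candidates come back sorted by distance, so the first match is the closest found.
        for j in knn(points, query_idx, min(k, n_others)):
            if predicate(j):
                return j, rounds
        if k >= n_others:
            return None, rounds  # all other points were requested and none matched
        k *= 2
```

In a DPC setting the predicate would be something like `lambda j: rho[j] > rho[i]` for query point `i`. Because k doubles, a point whose matching neighbor is the m-th closest needs about log2(m) + 1 rounds, and all points can run their rounds in parallel. With a real approximate index the returned neighbor is only approximately the nearest match.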
We implement five clustering algorithms with PECANN and evaluate them on
synthetic and real-world datasets with up to 1.28 million points and up to 1024
dimensions on a 30-core machine with two-way hyper-threading. Compared to the
state-of-the-art FASTDP algorithm for high-dimensional density peaks
clustering, which is sequential, our best algorithm is 45x-734x faster while
achieving competitive adjusted Rand index (ARI) scores. Compared to the state-of-the-art parallel
DPC-based algorithm, which is optimized for low dimensions, we show that PECANN
is two orders of magnitude faster. As far as we know, our work is the first to
evaluate DPC variants on large high-dimensional real-world image and text
embedding datasets.