
    Maximum Inner-Product Search using Tree Data-structures

    The problem of efficiently finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in the literature. However, to the best of our knowledge, the closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting. In this paper, we consider this general problem and contrast it with the existing best-match algorithms. First, we propose a general branch-and-bound algorithm using a tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Finally, we present a new data structure for increasing the efficiency of the dual-tree algorithm. These branch-and-bound algorithms involve novel bounds suited to best-matching with inner products. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique. Comment: Under submission in KDD 201
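
    As an illustration of the single-query branch-and-bound idea, here is a minimal sketch and not the paper's implementation. It assumes a ball-shaped node with center c and radius R, for which the Cauchy-Schwarz inequality gives the upper bound <q, p> <= <q, c> + ||q|| R for every point p in the node. The class name `Ball`, the functions `upper_bound` and `mips`, the `leaf_size` parameter, and the widest-coordinate split rule are all hypothetical choices made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

class Ball:
    """Tree node that covers its points with a ball (center, radius)."""
    def __init__(self, points, leaf_size=8):
        dim = len(points[0])
        self.center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        self.radius = max(
            norm([p[i] - self.center[i] for i in range(dim)]) for p in points
        )
        self.points, self.children = points, []
        if len(points) > leaf_size:
            # Assumed heuristic: split at the median of the widest coordinate.
            axis = max(
                range(dim),
                key=lambda i: max(p[i] for p in points) - min(p[i] for p in points),
            )
            ordered = sorted(points, key=lambda p: p[axis])
            mid = len(ordered) // 2
            self.children = [Ball(ordered[:mid], leaf_size),
                             Ball(ordered[mid:], leaf_size)]

def upper_bound(query, node):
    # <q, p> = <q, c> + <q, p - c> <= <q, c> + ||q|| * R
    return dot(query, node.center) + norm(query) * node.radius

def mips(query, node, best=(-math.inf, None)):
    """Return (best inner product, best point) found in the subtree."""
    if upper_bound(query, node) <= best[0]:
        return best  # nothing in this node can beat the current best
    if not node.children:
        for p in node.points:
            score = dot(query, p)
            if score > best[0]:
                best = (score, p)
        return best
    # Visit the more promising child first so pruning kicks in earlier.
    for child in sorted(node.children, key=lambda c: -upper_bound(query, c)):
        best = mips(query, child, best)
    return best
```

    Calling `mips(q, Ball(points))` returns the same best score as a linear scan over `points`; how many nodes are pruned depends on the data and on the query norm.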

    A Study on Geometry Contrast Enhancement for 3D Point Models

    Electrical Engineering. Point primitives have come into the spotlight as a representation method for 3D models. Much research has been done on modeling, processing, and rendering 3D point models. In particular, various methods have been developed for extracting and preserving salient features such as corners, curves, and edges in 3D point models. However, little effort has been made to extract and enhance the weak features that are relatively imperceptible because of their low geometry contrast. In this thesis, we propose a novel method to improve the visibility of 3D point models by enhancing the geometry contrast of weak features. We first define a weak feature region as a group of local points yielding small deviations of normal directions. Then we define the geometry histogram for each region as the distribution of the signed distance between a feature point and the locally approximated plane. We equalize and stretch the geometry histogram and move the corresponding feature points accordingly. We also render the enhanced model using normal mapping for better visual presentation. Experimental results demonstrate that the proposed method enhances the geometry contrast of 3D point models by refining the appearance of the weak features. We expect that the geometry contrast enhancement algorithm will facilitate many applications in various fields.
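
    The equalize-and-stretch step can be pictured with the rough sketch below. It is only one plausible reading of the abstract, not the thesis's algorithm: it assumes the local plane is already given by a point `origin` and a unit `normal`, uses rank-based (empirical CDF) equalization of the signed distances, and scales the result by a hypothetical `stretch` factor. The function name `enhance_region` is likewise invented for this example.

```python
def enhance_region(points, origin, normal, stretch=1.5):
    """Equalize and stretch signed distances to a plane, then move the points.

    Assumptions: `normal` has unit length, and `points` are the 3D points
    of one weak feature region.
    """
    # Signed distance of every point to the locally approximated plane.
    d = [sum((p[k] - origin[k]) * normal[k] for k in range(3)) for p in points]
    lo, hi = min(d), max(d)
    if hi == lo:
        return [list(p) for p in points]  # flat region: nothing to enhance

    n = len(d)
    order = sorted(range(n), key=lambda i: d[i])
    new_d = [0.0] * n
    for rank, i in enumerate(order):
        cdf = rank / (n - 1)  # empirical CDF value in [0, 1]
        # Equalize over [lo, hi], then stretch away from the plane.
        new_d[i] = stretch * (lo + cdf * (hi - lo))

    # Move each point along the normal by the change in signed distance.
    return [
        [p[k] + (new_d[i] - d[i]) * normal[k] for k in range(3)]
        for i, p in enumerate(points)
    ]
```

    With heights 0, 0.01, 0.02, and 1.0 above the plane z = 0, the three nearly coplanar points are spread evenly over the stretched range, which is the kind of contrast gain the abstract describes for weak features.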

    PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search

    Full text link
    This paper studies density-based clustering of point sets. These methods use dense regions of points to detect clusters of arbitrary shapes. In particular, we study variants of density peaks clustering (DPC), a popular type of algorithm that has been shown to work well in practice. Our goal is to cluster large high-dimensional datasets, which are prevalent in practice. Prior solutions are either sequential, and cannot scale to large data, or are specialized for low-dimensional data. This paper unifies the different variants of density peaks clustering into a single framework, PECANN, by abstracting out several key steps common to this class of algorithms. One such key step is to find nearest neighbors that satisfy a predicate function, and one of the main contributions of this paper is an efficient way to do this predicate search using graph-based approximate nearest neighbor search (ANNS). To provide ample parallelism, we propose a doubling search technique that enables points to find an approximate nearest neighbor satisfying the predicate in a small number of rounds. Our technique can be applied to many existing graph-based ANNS algorithms, which can all be plugged into PECANN. We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading. Compared to the state-of-the-art FASTDP algorithm for high-dimensional density peaks clustering, which is sequential, our best algorithm is 45x-734x faster while achieving competitive ARI scores. Compared to the state-of-the-art parallel DPC-based algorithm, which is optimized for low dimensions, we show that PECANN is two orders of magnitude faster. As far as we know, our work is the first to evaluate DPC variants on large high-dimensional real-world image and text embedding datasets.
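
    To make the predicate search concrete, the following is a small sequential, brute-force sketch of density peaks clustering. It is not PECANN: it uses exact all-pairs distances instead of graph-based ANNS and has no parallelism or doubling search. The density estimate (inverse distance to the k-th nearest neighbor), the center rule (largest dependent distance), and the names `density_peaks`, `k`, and `num_clusters` are assumptions chosen for this example. The predicate here is "has higher density than the query point".

```python
import math

def density_peaks(points, k=3, num_clusters=2):
    """Return a cluster label per point (brute-force DPC sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed density variant: inverse distance to the k-th nearest neighbor.
    # sorted(dist[i])[0] is the point itself, so index k is the k-th neighbor.
    density = [1.0 / (sorted(dist[i])[k] + 1e-12) for i in range(n)]

    def denser(j, i):
        # Predicate: j is denser than i; ties are broken by index so that
        # the relation is a strict total order.
        return (density[j], -j) > (density[i], -i)

    # Predicate search: nearest neighbor that satisfies `denser`.
    parent, delta = [None] * n, [math.inf] * n
    for i in range(n):
        for j in range(n):
            if denser(j, i) and dist[i][j] < delta[i]:
                parent[i], delta[i] = j, dist[i][j]

    # Assumed center rule: the points with the largest dependent distance.
    # The globally densest point has delta = inf and is always a center.
    centers = sorted(range(n), key=lambda i: -delta[i])[:num_clusters]
    label = [None] * n
    for cluster_id, c in enumerate(centers):
        label[c] = cluster_id

    # Visit points from densest to sparsest so every parent is labeled first.
    for i in sorted(range(n), key=lambda i: (density[i], -i), reverse=True):
        if label[i] is None:
            label[i] = label[parent[i]]
    return label
```

    The double loop that fills `parent` is the step a framework like the one described would replace with an approximate, index-backed search; everything else in the sketch is ordinary bookkeeping.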