    Evaluating tradeoff between recall and performance of GPU permutation index

    Query-by-content, by means of similarity search, is a fundamental operation for applications that deal with multimedia data. For this kind of query it is meaningless to look for elements exactly equal to the query object. Instead, we need to measure the dissimilarity between the query object and each database object. This search problem can be formalized with the concept of a metric space. In this scenario, search efficiency is understood as minimizing the number of distance calculations required to answer a query. Building an index can be a solution, but for very large metric databases this is not enough: it is also necessary to speed up queries with high-performance computing, such as GPUs, and in some cases it is reasonable to accept a fast answer even if it is inexact. In this work we evaluate the tradeoff between answer quality and time performance of our implementation of the Permutation Index on a pure GPU architecture, used to solve multiple approximate similarity searches on metric databases in parallel.
    WPDP - XIII Workshop procesamiento distribuido y paralelo. Red de Universidades con Carreras en Informática (RedUNCI)
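
    The recall versus time tradeoff described in this abstract can be illustrated with a small sequential sketch. It is not the authors' GPU implementation: it runs on the CPU, assumes the Spearman footrule as the distance between permutations, and all function names and parameters (`build_index`, `knn_approx`, `fraction`) are hypothetical. A larger `fraction` means more real distance computations and, typically, higher recall.

```python
import math
import random

def permutation(obj, pivots):
    """Pivot indices ordered from closest to farthest from obj."""
    return sorted(range(len(pivots)), key=lambda i: math.dist(obj, pivots[i]))

def positions(perm):
    """Inverse permutation: positions[p] is the rank of pivot p."""
    pos = [0] * len(perm)
    for rank, pivot_id in enumerate(perm):
        pos[pivot_id] = rank
    return pos

def footrule(pos_a, pos_b):
    """Spearman footrule: total rank displacement between two permutations."""
    return sum(abs(x - y) for x, y in zip(pos_a, pos_b))

def build_index(db, pivots):
    """Store, for every database object, the rank of each pivot."""
    return [positions(permutation(obj, pivots)) for obj in db]

def knn_approx(query, db, pivots, index, k, fraction):
    """Approximate k-NN: real distances are computed only for the
    fraction of the database whose permutations best match the query's."""
    q_pos = positions(permutation(query, pivots))
    order = sorted(range(len(db)), key=lambda i: footrule(q_pos, index[i]))
    candidates = order[:max(k, int(fraction * len(db)))]
    candidates.sort(key=lambda i: math.dist(query, db[i]))
    return candidates[:k]

def knn_exact(query, db, k):
    """Exact k-NN by exhaustive scan, used as ground truth."""
    return sorted(range(len(db)), key=lambda i: math.dist(query, db[i]))[:k]

def recall(approx, exact):
    """Fraction of the true neighbors that the approximate answer found."""
    return len(set(approx) & set(exact)) / len(exact)

# Toy data: 200 random points in the unit square, the first 8 used as pivots.
rng = random.Random(0)
db = [(rng.random(), rng.random()) for _ in range(200)]
pivots = db[:8]
index = build_index(db, pivots)
query = (0.5, 0.5)
exact = knn_exact(query, db, 5)
for fraction in (0.05, 0.2, 1.0):
    approx = knn_approx(query, db, pivots, index, 5, fraction)
    print(fraction, recall(approx, exact))
```

    With `fraction` set to 1.0 every object is compared with the real distance, so the sketch degenerates to an exhaustive scan and recall is 1.0.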

    Efficient similarity search on multimedia databases

    Manipulating and retrieving multimedia data has received increasing attention with the advent of cloud storage facilities. The ability to query by similarity over large data collections is mandatory to improve storage and user interfaces. However, these are expensive operations to solve on the CPU alone; thus, it is convenient to consider High Performance Computing (HPC) techniques in their solutions. The Graphics Processing Unit (GPU), as an alternative HPC device, has been increasingly used to speed up certain computing processes. This work introduces a pure GPU architecture to build the Permutation Index and to solve approximate similarity queries on multimedia databases. The empirical results of each implementation show different levels of speedup, which are related to the characteristics of the GPU and the particular database used.
    Track: Workshop Bases de datos y minería de datos (WBDDM). Red de Universidades con Carreras en Informática (RedUNCI)

    Approximate reverse k-nearest neighbor queries in general metric spaces

    Group Reverse Nearest Neighbor Search using Modified Skip Graph

    Reverse nearest neighbor search is used for spatial queries. In reverse nearest neighbor search, a query object in high-dimensional space has a certain region such that every object inside the region considers the query object its nearest neighbor. Existing methods for reverse nearest neighbor search are limited to a single query point, which is inefficient for high-dimensional spatial databases. Therefore, in this paper we propose a group reverse nearest neighbor search, which can find multiple query objects in a specific region. We propose a method for group reverse nearest neighbor queries using a modified skip graph.
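
    To make the definition concrete, here is a brute-force sketch of a single-query reverse nearest neighbor search. It is only an illustration of the problem statement, not the skip-graph method of the paper; the function name is hypothetical, Euclidean distance is assumed, and ties are resolved against the query.

```python
import math

def reverse_nearest_neighbors(query, points):
    """Return indices of the points whose nearest neighbor, among the
    other points and the query, is the query itself (strict comparison)."""
    result = []
    for i, p in enumerate(points):
        d_query = math.dist(p, query)
        # Distance from p to its closest fellow data point.
        d_other = min(
            (math.dist(p, o) for j, o in enumerate(points) if j != i),
            default=math.inf,
        )
        if d_query < d_other:
            result.append(i)
    return result

points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(reverse_nearest_neighbors((0.4, 0), points))  # [0, 1]
print(reverse_nearest_neighbors((3, 0), points))    # []
```

    The scan costs a quadratic number of distance computations per query, which is the kind of cost that index-based methods try to avoid.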

    New Variations of the Maximum Coverage Facility Location Problem

    Consider a competitive facility location scenario where, given a set U of n users and a set F of m facilities in the plane, the objective is to place a new facility in an appropriate place such that the number of users served by the new facility is maximized. Here users and facilities are considered as points in the plane, and each user takes service from its nearest facility, where the distance between a pair of points is measured in either the L1, L2, or L∞ metric. This problem is also known as the maximum coverage (MaxCov) problem. In this paper, we will consider the k-MaxCov problem, where the objective is to place k (≥ 1) new facilities such that the total number of users served by these k new facilities is maximized. We begin by proposing an O(n log n) time algorithm for the k-MaxCov problem, when the existing facilities are all located on a single straight line and the new facilities are also restricted to lie on the same line. We then study the 2-MaxCov problem in the plane, and propose an O(n^2) time and space algorithm in the L1 and L∞ metrics. In the L2 metric, we solve the 2-MaxCov problem in the plane in O(n^3 log n) time and O(n^2 log n) space. Finally, we consider the 2-Farthest-MaxCov problem, where a user is served by its farthest facility, and propose an algorithm that runs in O(n log n) time, in all three metrics.
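
    The objective being maximized can be written down directly. The sketch below only evaluates a given placement by counting the users it would capture; it does not search for the optimal placement and is not one of the paper's algorithms. Function names are hypothetical, and it assumes a user switches to a new facility only when that facility is strictly closer than every existing one.

```python
import math

def distance(a, b, metric):
    """Distance between two points in the plane under L1, L2 or Linf."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if metric == "L1":
        return dx + dy
    if metric == "L2":
        return math.hypot(dx, dy)
    if metric == "Linf":
        return max(dx, dy)
    raise ValueError(f"unknown metric: {metric}")

def users_served(new_facilities, users, facilities, metric="L2"):
    """Count users whose nearest facility becomes one of the new ones."""
    count = 0
    for u in users:
        d_old = min(distance(u, f, metric) for f in facilities)
        d_new = min(distance(u, g, metric) for g in new_facilities)
        if d_new < d_old:  # assumption: ties stay with the existing facility
            count += 1
    return count

users = [(0, 0), (2, 0), (10, 0)]
facilities = [(5, 0)]
print(users_served([(1, 0)], users, facilities))          # 2
print(users_served([(1, 0), (9, 0)], users, facilities))  # 3
```

    The choice of metric can change the answer: a user at (0, 0) with an existing facility at (2, 2) is captured by a new facility at (3, 0) under L1 but not under L2 or L∞.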

    SAH: Shifting-aware Asymmetric Hashing for Reverse k-Maximum Inner Product Search

    This paper investigates a new yet challenging problem called Reverse k-Maximum Inner Product Search (RkMIPS). Given a query (item) vector, a set of item vectors, and a set of user vectors, the problem of RkMIPS aims to find a set of user vectors whose inner products with the query vector are one of the k largest among the query and item vectors. We propose the first subquadratic-time algorithm, i.e., Shifting-aware Asymmetric Hashing (SAH), to tackle the RkMIPS problem. To speed up the Maximum Inner Product Search (MIPS) on item vectors, we design a shifting-invariant asymmetric transformation and develop a novel sublinear-time Shifting-Aware Asymmetric Locality Sensitive Hashing (SA-ALSH) scheme. Furthermore, we devise a new blocking strategy based on the Cone-Tree to effectively prune user vectors (in a batch). We prove that SAH achieves a theoretical guarantee for solving the RMIPS problem. Experimental results on five real-world datasets show that SAH runs 4~8× faster than the state-of-the-art methods for RkMIPS while achieving F1-scores of over 90%. The code is available at https://github.com/HuangQiang/SAH.
    Comment: Accepted by AAAI 202
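
    The problem definition in this abstract can be stated as a brute-force baseline. The sketch below is not SAH and uses no hashing or Cone-Tree pruning; it simply checks every user against every item, which is the quadratic cost the paper aims to beat. The function names are hypothetical, and ties are assumed to count in favor of the query.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def reverse_k_mips(query, items, users, k):
    """Return indices of users for whom the query ranks among the k
    largest inner products over the item set plus the query."""
    result = []
    for idx, u in enumerate(users):
        score_q = dot(u, query)
        # Number of items that this user scores strictly above the query.
        better = sum(1 for p in items if dot(u, p) > score_q)
        if better < k:
            result.append(idx)
    return result

items = [(1.0, 0.0), (0.0, 1.0)]
users = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
query = (0.5, 0.5)
print(reverse_k_mips(query, items, users, 1))  # [2]
print(reverse_k_mips(query, items, users, 2))  # [0, 1, 2]
```

    In the toy data, the first two users each prefer one item over the query, so they only appear in the result once k reaches 2.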

    Ranked Reverse Nearest Neighbor Search

    Reverse Nearest Neighbors Search in High Dimensions using Locality-Sensitive Hashing

    We investigate the problem of finding reverse nearest neighbors efficiently. Although provably good solutions exist for this problem in low or fixed dimensions, to this date the methods proposed in high dimensions are mostly heuristic. We introduce a method that is both provably correct and efficient in all dimensions, based on a reduction of the problem to one instance of ε-nearest neighbor search plus a controlled number of instances of exhaustive r-PLEB, a variant of Point Location among Equal Balls where all the r-balls centered at the data points that contain the query point are sought, not just one. The former problem has been extensively studied and elegantly solved in high dimensions using Locality-Sensitive Hashing (LSH) techniques. By contrast, the latter problem has a complexity that is still not fully understood. We revisit the analysis of the LSH scheme for exhaustive r-PLEB using a somewhat refined notion of locality-sensitive family of hash functions, which brings out a meaningful output-sensitive term in the complexity of the problem. Our analysis, combined with a non-isometric lifting of the data, enables us to answer exhaustive r-PLEB queries (and down the road reverse nearest neighbors queries) efficiently. Along the way, we obtain a simple algorithm for answering exact nearest neighbor queries, whose complexity is parametrized by some condition number measuring the inherent difficulty of a given instance of the problem.
    We study the problem of efficiently finding reverse nearest neighbors in high dimensions. Given a point cloud P and a parameter ε, our goal is to preprocess P so that we can quickly find the set of reverse nearest neighbors of an arbitrary query point q, plus possibly a small number of false positives that are close to being reverse nearest neighbors of q. While efficient and provably correct solutions exist for this problem in low or fixed dimensions, to date the methods proposed in high dimensions are essentially heuristic. We propose a method that is both efficient and provably correct in all dimensions, based on a reduction of the problem to a small number of instances of the classical problems of approximate nearest neighbor search and exhaustive search for neighbors within a fixed distance r. The intrinsic complexity of the latter problem remains poorly understood. We propose a new analysis of the behavior of certain locality-sensitive hashing (LSH) techniques on this problem, which brings out a bound that depends on the output size and which, combined with a non-isometric lifting of the points to a higher dimension, makes it possible to solve the reverse nearest neighbor search problem efficiently via the reduction mentioned above. Along the way, we also propose a method for exact nearest neighbor search whose complexity is parametrized by a condition number measuring the intrinsic difficulty of a particular instance of the problem.

    R-Forest for Approximate Nearest Neighbor Queries in High Dimensional Space

    Searching high dimensional space has been a challenge and an area of intense research for many years. The dimensionality curse has rendered most existing index methods all but useless, causing people to research other techniques. In my dissertation I will try to resurrect one of the best known index structures, R-Tree, which most have given up on as a viable method of answering high dimensional queries. I have pointed out the various advantages of R-Tree as a method for answering approximate nearest neighbor queries, and the advantages of locality sensitive hashing and locality sensitive B-Tree, which are the most successful methods today. I started by looking at improving the maintenance of R-Tree by the use of bulk loading and insertion. I proposed and implemented a new method that bulk loads the index, which was an improvement over the standard method. I then turned my attention to nearest neighbor queries, which is a much more challenging problem, especially in high dimensional space. Initially I developed a set of heuristics, easily implemented in R-Tree, which improved the efficiency of high dimensional approximate nearest neighbor queries. To further refine my method I took another approach, developing a new model, known as R-Forest, which takes advantage of space partitioning while still using R-Tree as its index structure. With this new approach I was able to implement new heuristics and can show that R-Forest, comprised of a set of R-Trees, is a viable solution to high dimensional approximate nearest neighbor queries when compared to established methods.