
    Re-ranking Permutation-Based Candidate Sets with the n-Simplex Projection

    In the realm of metric search, permutation-based approaches have shown very good performance in indexing and supporting approximate search on large databases. These methods embed the metric objects into a permutation space where candidate results for a given query can be efficiently identified. Typically, to achieve high effectiveness, the permutation-based result set is refined by directly comparing each candidate object to the query. Therefore, one drawback of these approaches is that the original dataset needs to be stored and then accessed during the refining step. We propose a refining approach based on a metric embedding, called n-Simplex projection, that can be used on metric spaces meeting the n-point property. The n-Simplex projection provides upper and lower bounds of the actual distance, derived using the distances between the data objects and a finite set of pivots. We propose to reuse the distances computed for building the data permutations to derive these bounds, and we show how to use them to improve the permutation-based results. Our approach is particularly advantageous in all cases where the traditional refining step is too costly, e.g. very large datasets or very expensive metric functions.

    SPLX-Perm: A Novel Permutation-Based Representation for Approximate Metric Search

    Many approaches for approximate metric search rely on a permutation-based representation of the original data objects. The main advantage of transforming metric objects into permutations is that the latter can be efficiently indexed and searched using data structures such as inverted files and prefix trees. Typically, the permutation is obtained by ordering the identifiers of a set of pivots according to their distances to the object to be represented. In this paper, we present a novel approach to transform metric objects into permutations. It uses the object-pivot distances in combination with a metric transformation, called n-Simplex projection. The resulting permutation-based representation, named SPLX-Perm, is suitable only for the large class of metric spaces satisfying the n-point property. We tested the proposed approach on two benchmarks for similarity search. Our preliminary results are encouraging and open new perspectives for further investigations on the use of the n-Simplex projection for supporting permutation-based indexing.
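The "typical" construction mentioned in this abstract (ordering pivot identifiers by their distance to the object) can be sketched in a few lines. This is only an illustration of the classic pivot permutation, not of SPLX-Perm itself. The Euclidean distance, the three hand-picked pivots, and the use of the Spearman footrule to compare permutations are all assumptions made for the example.

```python
import math

def pivot_permutation(obj, pivots, dist=math.dist):
    """Return pivot indices ordered by increasing distance to obj.

    Ties are broken by pivot index so the result is deterministic.
    """
    distances = [dist(obj, p) for p in pivots]
    return sorted(range(len(pivots)), key=lambda i: (distances[i], i))

def spearman_footrule(perm_a, perm_b):
    """Sum of absolute rank differences between two permutations."""
    rank_in_b = {pivot: rank for rank, pivot in enumerate(perm_b)}
    return sum(abs(rank - rank_in_b[pivot]) for rank, pivot in enumerate(perm_a))

# Hypothetical 2-D data: three pivots, a query, and two database objects.
pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
query_perm = pivot_permutation((1.0, 2.0), pivots)
near_perm = pivot_permutation((2.0, 3.0), pivots)
far_perm = pivot_permutation((9.0, 1.0), pivots)

near_score = spearman_footrule(query_perm, near_perm)
far_score = spearman_footrule(query_perm, far_perm)
```

In this toy setting the object close to the query gets the same permutation as the query, while the distant object gets a different one, which is the intuition behind using permutations to select candidates.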

    Projection pursuit for discrete data

    This paper develops projection pursuit for discrete data using the discrete Radon transform. Discrete projection pursuit is presented as an exploratory method for finding informative low-dimensional views of data such as binary vectors, rankings, phylogenetic trees or graphs. We show that for most data sets, most projections are close to uniform. Thus, informative summaries are ones deviating from uniformity. Syllabic data from several of Plato's great works is used to illustrate the methods. Along with some basic distribution theory, an automated procedure for computing informative projections is introduced. Comment: Published at http://dx.doi.org/10.1214/193940307000000482 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Visualising many-objective populations

    Copyright © 2012 ACM. 14th International Conference on Genetic and Evolutionary Computation (GECCO 2012), Philadelphia, USA, 7-11 July 2012. Optimisation problems often comprise a large set of objectives, and visualising the set of solutions to a problem can help with understanding them, assisting a decision maker. If the set of objectives is larger than three, visualising solutions to the problem is a difficult task. Techniques for visualising high-dimensional data are often difficult to interpret. Conversely, discarding objectives so that the solutions can be visualised in two or three spatial dimensions results in a loss of potentially important information. We demonstrate four methods for visualising many-objective populations, two of which use the complete set of objectives to present solutions in a clear and intuitive fashion and two that compress the objectives of a population into two dimensions whilst minimising the information that is lost. All of the techniques are illustrated on populations of solutions to optimisation test problems.

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of retrieving, from a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
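To make the first category concrete, the following is a minimal sketch of one well-known data-independent scheme, random-hyperplane hashing for angular similarity. It is not taken from the survey. The function names, the number of bits, and the fixed seed are assumptions chosen for illustration. Each bit records on which side of a random hyperplane a vector falls, so vectors separated by a small angle tend to agree on most bits.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian normal vectors (data-independent)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(vec, hyperplanes):
    """One bit per hyperplane: 1 if the dot product is non-negative, else 0."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(code_a, code_b):
    """Number of positions where two binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

planes = make_hyperplanes(num_bits=16, dim=3)
v = [1.0, 2.0, -0.5]
code_v = hash_code(v, planes)
code_scaled = hash_code([2.0 * x for x in v], planes)   # same direction
code_opposite = hash_code([-x for x in v], planes)      # opposite direction
```

Scaling a vector by a positive factor leaves its code unchanged, and the opposite vector flips every bit, which reflects that the code depends only on direction. A learning-to-hash method would instead fit the hyperplanes (or other hash functions) to the data.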