30 research outputs found

    Reference point hyperplane trees

    Our context of interest is tree-structured exact search in metric spaces. We make the simple observation that the deeper a data item is within the tree, the higher the probability of that item being excluded from a search. Assuming a fixed and independent probability $p$ of any subtree being excluded at query time, the probability of an individual data item being accessed is $(1-p)^d$ for a node at depth $d$. In a balanced binary tree half of the data will be at the maximum depth of the tree, so this effect should be significant and observable. We test this hypothesis with two experiments on partition trees. First, we force a balance by adjusting the partition/exclusion criteria, and compare this with unbalanced trees where the mean data depth is greater. Second, we compare a generic hyperplane tree with a monotone hyperplane tree, where the mean depth is also greater. In both cases the tree with the greater mean data depth performs better in high-dimensional spaces. We then experiment with increasing the mean depth of nodes by using a small, fixed set of reference points to make exclusion decisions over the whole tree, so that almost all of the data resides at the maximum depth. Again this can be seen to reduce the overall cost of indexing. Furthermore, we observe that, having already calculated reference point distances for all data, a final filtering can be applied if the distance table is retained. This further reduces the number of distance calculations required, whilst retaining scalability. The final structure can in fact be viewed as a hybrid between a generic hyperplane tree and a LAESA search structure.
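
    The access-probability model in this abstract can be illustrated with a short Python sketch. It is not taken from the paper: the function names are hypothetical, and it assumes a perfect binary tree holding one data item per node, with the root at depth 0 and a fixed, independent exclusion probability p.

```python
def access_probability(p: float, depth: int) -> float:
    """Probability that an item at `depth` is accessed, i.e. that none of the
    `depth` subtrees on its root-to-item path is excluded: (1 - p) ** depth."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return (1.0 - p) ** depth


def expected_accessed_fraction(p: float, height: int) -> float:
    """Expected fraction of items accessed in a perfect binary tree.

    Assumption (not from the source): one item per node, root at depth 0,
    leaves at depth `height`, so there are 2 ** d items at depth d.
    """
    total_items = 2 ** (height + 1) - 1
    expected_accessed = sum(
        (2 ** d) * access_probability(p, d) for d in range(height + 1)
    )
    return expected_accessed / total_items


if __name__ == "__main__":
    # Deeper items are less likely to be accessed for any p > 0.
    for depth in (1, 5, 10):
        print(depth, access_probability(0.3, depth))
    print(expected_accessed_fraction(0.3, 10))
```

    Under these assumptions, roughly half of the items sit at the maximum depth, so the deepest level dominates the expected fraction.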

    Supermetric search with the four-point property

    Metric indexing research is concerned with the efficient evaluation of queries in metric spaces. In general, a large space of objects is arranged in such a way that, when a further object is presented as a query, those objects most similar to the query can be efficiently found. Most such mechanisms rely upon the triangle inequality property of the metric governing the space. The triangle inequality property is equivalent to a finite embedding property, which states that any three points of the space can be isometrically embedded in two-dimensional Euclidean space. In this paper, we examine a class of semimetric space which is finitely 4-embeddable in three-dimensional Euclidean space. In mathematics this property has been extensively studied and is generally known as the four-point property. All spaces with the four-point property are metric spaces, but they also have some stronger geometric guarantees. We coin the term supermetric space as, in terms of metric search, they are significantly more tractable. We show some stronger geometric guarantees deriving from the four-point property which can be used in indexing to great effect, and show results for two of the SISAP benchmark searches that are substantially better than any previously published.

    A log square average case algorithm to make insertions in fast similarity search

    To speed up similarity-based searches, many indexing techniques have been proposed in order to address the problem of efficiency. However, most of the proposed techniques do not admit fast insertion of new elements once the index is built. The main effect is that changes in the environment are very costly to take into account. In this work, we propose a new technique to allow fast insertions of elements in a family of static tree-based indexes. Unlike other techniques, the resulting index is exactly equal to the index that would be obtained by building it from scratch. Therefore there is no performance degradation in search time. We show that the expected number of distance computations (and the average time complexity) is bounded by a function that grows with $\log^2(n)$, where $n$ is the size of the database. In order to check the correctness of our approach, some experiments with artificial and real data are carried out. This work has been supported in part by Grants TIN2009-14205-C04-01 from the Spanish CICYT (Ministerio de Ciencia e Innovación), the IST Programme of the European Community, under the Pascal Network of Excellence, IST-2002-506778, and the program CONSOLIDER INGENIO 2010 (CSD2007-00018).

    Indexing Metric Spaces for Exact Similarity Search

    With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity and variety. Many studies address volume or velocity, while far fewer studies concern variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning and validation techniques used for metric indexes, ii) providing a time and storage complexity analysis of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate index performance during search, as differences in similarity query processing are hard to see from the complexity analysis, and query performance depends on pruning and validation abilities that are related to the data distribution. This article aims at revealing the different strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and at directing future research on metric indexes.

    Supermetric search

    Metric search is concerned with the efficient evaluation of queries in metric spaces. In general, a large space of objects is arranged in such a way that, when a further object is presented as a query, those objects most similar to the query can be efficiently found. Most mechanisms rely upon the triangle inequality property of the metric governing the space. The triangle inequality property is equivalent to a finite embedding property, which states that any three points of the space can be isometrically embedded in two-dimensional Euclidean space. In this paper, we examine a class of semimetric space that is finitely four-embeddable in three-dimensional Euclidean space. In mathematics this property has been extensively studied and is generally known as the four-point property. All spaces with the four-point property are metric spaces, but they also have some stronger geometric guarantees. We coin the term supermetric space as, in terms of metric search, they are significantly more tractable. Supermetric spaces include all those governed by Euclidean, Cosine, Jensen–Shannon and Triangular distances, and are thus commonly used within many domains. In previous work we have given a generic mathematical basis for the supermetric property and shown how it can improve indexing performance for a given exact search structure. Here we present a full investigation into its use within a variety of different hyperplane partition indexing structures, and go on to show some more of its flexibility by examining a search structure whose partition and exclusion conditions are tailored, at each node, to suit the individual reference points and data set present there. Among the results given, we show a new best performance for exact search using a well-known benchmark.
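
    The abstract does not state the exclusion conditions themselves, so the following Python sketch is only an illustration under stated assumptions. It compares two lower bounds for hyperplane partitioning in a Euclidean space: one that follows from the triangle inequality alone, and a tighter one that holds exactly in Euclidean space (the distance from the query to the bisecting hyperplane), which is the kind of stronger guarantee the four-point property is generally understood to provide. All function names are hypothetical.

```python
import math
from typing import Sequence


def triangle_bound(d_q_p1: float, d_q_p2: float) -> float:
    """Lower bound on d(q, s) for any s at least as close to p1 as to p2,
    derived from the triangle inequality only."""
    return (d_q_p1 - d_q_p2) / 2.0


def hyperplane_bound(d_q_p1: float, d_q_p2: float, d_p1_p2: float) -> float:
    """Signed distance from q to the hyperplane bisecting p1 and p2.

    Exact in Euclidean space; assumed here (not stated in the abstract) to be
    the tighter bound available when the four-point property holds.
    """
    return (d_q_p1 ** 2 - d_q_p2 ** 2) / (2.0 * d_p1_p2)


def can_exclude_p1_side(bound: float, threshold: float) -> bool:
    """The partition nearer to p1 cannot contain a solution within `threshold`
    of the query if the lower bound exceeds the threshold."""
    return bound > threshold


def bounds_for(q: Sequence[float], p1: Sequence[float], p2: Sequence[float]):
    """Compute both bounds for concrete Euclidean points."""
    d1, d2, d12 = math.dist(q, p1), math.dist(q, p2), math.dist(p1, p2)
    return triangle_bound(d1, d2), hyperplane_bound(d1, d2, d12)


if __name__ == "__main__":
    weak, strong = bounds_for((3.0, 0.0), (0.0, 0.0), (2.0, 0.0))
    print(weak, strong)
    print(can_exclude_p1_side(weak, 1.5), can_exclude_p1_side(strong, 1.5))
```

    Because d(q, p1) + d(q, p2) is never smaller than d(p1, p2), the second bound is at least as large as the first whenever the query lies on the p2 side, so it can only exclude more.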

    Efficient Similarity Search on Quasi-Metric Graphs

    Consultas métrico espaciales (Metric-spatial queries)

    Metric spaces make it possible to model databases that support similarity searches, that is, searches for objects similar to a given one. Spatial databases allow data with a spatial component to be stored and retrieved efficiently. There are applications in which it is of interest to perform similarity searches while also taking the spatial component into account. Queries of this kind cannot be answered efficiently with either spatial indexes or metric indexes. In this article we study these queries in order to formalize them and propose an efficient method for answering them. To that end, we present the MeTree, a combination of the FQT metric index and the R-Tree spatial index that supports queries combining both aspects. XVI Workshop Bases de Datos y Minería de Datos. Red de Universidades con Carreras en Informátic

    Pivot-based Metric Indexing

    The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed, which reduces the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offer only narrow coverage, and they use different pivot selection strategies that affect performance substantially and thus render cross-study comparisons difficult or impossible. We offer a survey of existing pivot-based indexing techniques, and report a comprehensive empirical comparison of their construction costs, update efficiency, storage sizes, and similarity search performance. As part of the study, we provide modifications for two existing indexing techniques to make them more competitive. The findings and insights obtained from the study reveal different strengths and weaknesses of different indexing techniques, and offer guidance on selecting an appropriate indexing technique for a given setting.
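
    Pruning and validation with the triangle inequality, as mentioned in this abstract, can be sketched as a range query over a precomputed pivot distance table. This is a minimal illustration rather than any specific index from the survey; the function names, the flat table layout, and the choice of pivots are assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def build_pivot_table(
    data: Sequence[Point], pivots: Sequence[Point], dist: Distance
) -> List[List[float]]:
    """Precompute d(object, pivot) for every object and every pivot."""
    return [[dist(obj, piv) for piv in pivots] for obj in data]


def range_query(
    query: Point,
    radius: float,
    data: Sequence[Point],
    pivots: Sequence[Point],
    table: Sequence[Sequence[float]],
    dist: Distance,
) -> Tuple[List[Point], int]:
    """Return objects within `radius` of `query`, plus the number of
    object distances that actually had to be computed."""
    q_to_pivots = [dist(query, piv) for piv in pivots]
    results: List[Point] = []
    computed = 0
    for obj, row in zip(data, table):
        # Pruning: |d(q,p) - d(o,p)| <= d(q,o), so a large gap rules o out.
        lower = max(abs(qp - op) for qp, op in zip(q_to_pivots, row))
        if lower > radius:
            continue
        # Validation: d(q,o) <= d(q,p) + d(o,p), so a small sum rules o in.
        upper = min(qp + op for qp, op in zip(q_to_pivots, row))
        if upper <= radius:
            results.append(obj)
            continue
        # Neither bound decides, so compute the real distance.
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (5.0,), (10.0,)]
    pivots = [(0.0,)]
    table = build_pivot_table(data, pivots, math.dist)
    print(range_query((0.5,), 1.0, data, pivots, table, math.dist))
```

    The sketch counts only object distances, not the query-to-pivot distances, which are paid once per query regardless of the data size.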