    Solving All-k-Nearest Neighbor Problem without an Index

    Among the similarity queries in metric spaces, there is one that obtains the k-nearest neighbors of all the elements in the database (All-k-NN). One way to solve it is the naïve one: comparing each object in the database with all the others and returning the k elements nearest to it (k-NN). Another way is to preprocess the database to build an index, and then search this index for the k-NN of each element of the dataset. Answering the All-k-NN problem allows building the k-Nearest Neighbor Graph (kNNG). Given an object collection of a metric space, the Nearest Neighbor Graph (NNG) associates each node with its closest neighbor under the given metric. If we link each object to its k nearest neighbors, we obtain the k-Nearest Neighbor Graph (kNNG). The kNNG can be considered an index for a database, which is quite efficient and allows further improvements. In this work, we propose a new technique to solve the All-k-NN problem that does not use any index to obtain the k-NN of each element. This approach avoids as many comparisons as possible, comparing only some database elements and taking advantage of the properties of the distance function. Its total cost is significantly lower than that of the naïve solution. XVI Workshop Bases de Datos y Minería de Datos. Red de Universidades con Carreras en Informática
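    As a baseline for the costs this abstract compares against, the naïve All-k-NN can be sketched in a few lines (this is the brute-force method the paper improves on, not the proposed technique): every pairwise distance is computed once, the symmetry of the metric lets it count for both endpoints, and a bounded heap per object keeps the k closest neighbors, which is exactly the kNNG adjacency.

    ```python
    import heapq
    from itertools import combinations

    def all_knn_naive(objects, dist, k):
        """Brute-force All-k-NN: O(n^2) distance computations, each pair
        evaluated once thanks to the symmetry of the metric."""
        n = len(objects)
        # one max-heap per object (distances negated) with its k best so far
        heaps = [[] for _ in range(n)]
        for i, j in combinations(range(n), 2):
            d = dist(objects[i], objects[j])
            for a, b in ((i, j), (j, i)):  # d(x, y) = d(y, x): update both ends
                if len(heaps[a]) < k:
                    heapq.heappush(heaps[a], (-d, b))
                elif -heaps[a][0][0] > d:
                    heapq.heapreplace(heaps[a], (-d, b))
        # kNNG adjacency lists: object -> [(distance, neighbor)], closest first
        return [sorted((-nd, b) for nd, b in h) for h in heaps]
    ```

    For example, on the one-dimensional metric space ([0, 1, 3, 7], |x - y|) with k = 1, each object is linked to its single nearest neighbor; the proposed technique aims to reach the same graph while paying for far fewer of these distance evaluations.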

    An unbalanced approach to metric space searching

    Proximity queries (the searching problem generalized beyond exact match) are mostly modeled in metric spaces. A metric space consists of a collection of objects and a distance function defined among them. The goal is to preprocess the data set (a slow procedure) so as to quickly answer proximity queries. This problem has received a lot of attention recently, especially in the pattern recognition community. The Excluded Middle Vantage Point Forest (VP-forest) is a data structure designed to search in high dimensional vector spaces. A VP-forest is built as a collection of balanced Vantage Point Trees (VP-trees). In this work we propose a novel two-fold approach for searching. Firstly, we extend the VP-forest to search in metric spaces and, more importantly, we test a counterintuitive modification to the VP-tree, namely unbalancing it. In exact searching an unbalanced data structure performs poorly, and most of the algorithmic effort is directed toward obtaining a balanced data structure. The unbalancing approach is motivated by a recent data structure (the List of Clusters) specialized in high dimensional metric space searches, which is an extremely unbalanced data structure (a linked list) that outperforms other approaches. Eje: Algoritmos. Red de Universidades con Carreras en Informática (RedUNCI)
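    A minimal VP-tree sketch may make the balanced-versus-unbalanced contrast concrete. The `split` knob below is our own illustrative device, not the authors' construction: `split=0.5` uses the median distance as the partition radius (the classic balanced tree), while values far from 0.5 produce an unbalanced shape in the spirit of the abstract's experiment. The range search prunes subtrees with the triangle inequality.

    ```python
    import random

    def build_vp_tree(points, dist, split=0.5, rng=random.Random(0)):
        """Build a VP-tree; `split` is the fraction of points routed to the
        inner subtree (0.5 = balanced, other values = unbalanced variant)."""
        if not points:
            return None
        i = rng.randrange(len(points))
        vp, rest = points[i], points[:i] + points[i + 1:]
        if not rest:
            return {"vp": vp, "mu": 0.0, "inner": None, "outer": None}
        ds = sorted(dist(vp, p) for p in rest)
        mu = ds[min(int(split * len(ds)), len(ds) - 1)]  # partition radius
        inner = [p for p in rest if dist(vp, p) <= mu]
        outer = [p for p in rest if dist(vp, p) > mu]
        return {"vp": vp, "mu": mu,
                "inner": build_vp_tree(inner, dist, split, rng),
                "outer": build_vp_tree(outer, dist, split, rng)}

    def range_search(node, q, r, dist, out):
        """Report every indexed point within distance r of the query q."""
        if node is None:
            return
        d = dist(node["vp"], q)
        if d <= r:
            out.append(node["vp"])
        if d - r <= node["mu"]:   # query ball can intersect the inner region
            range_search(node["inner"], q, r, dist, out)
        if d + r > node["mu"]:    # query ball can intersect the outer region
            range_search(node["outer"], q, r, dist, out)
    ```

    Both tree shapes answer the same queries correctly; what changes is how many distance computations the pruning conditions save, which is the quantity the paper measures.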

    Optimizing the spatial approximation tree from the root

    Many computational applications need to look for information in a database. Nowadays, the predominance of non-conventional databases makes similarity search (i.e., searching for elements of the database that are "similar" to a given query) a preponderant concept. The Spatial Approximation Tree has been shown to compare favorably against alternative data structures for similarity searching in metric spaces of medium to high dimensionality ("difficult" spaces) or for queries with low selectivity. However, during the construction process the tree root has been selected at random, and the tree, in both its shape and its performance, is completely determined by this selection. Therefore, we are interested mainly in improving searches in this data structure by trying to select the tree root so as to reflect some of the characteristics of the metric space to be indexed. We believe that selecting the root in this way allows better adaptation of the data structure to the intrinsic dimensionality of the considered metric space, and thus also achieves more efficient similarity searches. Facultad de Informática
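    The abstract does not commit to a concrete root-selection rule, so the snippet below is only a hypothetical illustration of the idea: score every candidate root against a random sample of the database and pick an extreme one (here, the candidate with the largest mean distance). Any other score meant to reflect the space's characteristics, such as the smallest mean or the smallest variance of distances, drops into the same skeleton.

    ```python
    import random

    def pick_root(objects, dist, sample_size=30, rng=random.Random(0)):
        """Illustrative root selection: rank candidates by their mean
        distance to a random sample and return the most 'outlying' one."""
        sample = rng.sample(objects, min(sample_size, len(objects)))

        def mean_dist(candidate):
            return sum(dist(candidate, s) for s in sample) / len(sample)

        return max(objects, key=mean_dist)
    ```

    The extra cost is O(n · sample_size) distance computations, which is negligible next to building the tree itself.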

    Approximate Nearest Neighbor Graph via Index Construction

    Given a collection of objects in a metric space, the Nearest Neighbor Graph (NNG) associates each node with its closest neighbor under the given metric. It can be obtained trivially by computing the nearest neighbor of every object. To avoid computing every distance pair, an index could be constructed. Unfortunately, due to the curse of dimensionality, the indexed and the brute-force methods are almost equally inefficient. This brings the attention to algorithms computing approximate versions of the NNG. The DiSAT is a hierarchical proximity searching tree. The root computes the distances to all objects, and each child node of the root recursively computes the distance to all the objects in its subtree. Top levels will have an accurate computation of the nearest neighbor, and as we descend the tree this information becomes less accurate. If we perform a few rebuilds of the index, taking deep nodes in each iteration and keeping track of the closest known neighbor, it is possible to compute an Approximate NNG (ANNG). Accordingly, in this work we propose to obtain the ANNG by this approach, without performing any search, and we tested this proposal on both synthetic and real-world databases, with good results in both cost and response quality. XIII Workshop Bases de Datos y Minería de Datos (WBDMD). Red de Universidades con Carreras en Informática (RedUNCI)
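    The core idea can be sketched on a much simpler index than DiSAT (the two-pivot partitioning below is illustrative, not the paper's structure): every distance the construction pays updates a best-neighbor-so-far table for both endpoints, so an approximate nearest-neighbor table falls out of building the index without issuing a single search.

    ```python
    import random

    def annng_from_index(objects, dist, leaf_size=4, rng=random.Random(0)):
        """Approximate NN table as a by-product of index construction:
        recursively partition around two random pivots; each computed
        distance updates the closest known neighbor of both endpoints."""
        n = len(objects)
        best = {i: (float("inf"), None) for i in range(n)}  # i -> (d, j)

        def note(i, j, d):
            if d < best[i][0]:
                best[i] = (d, j)
            if d < best[j][0]:
                best[j] = (d, i)

        def build(idx):
            if len(idx) <= leaf_size:
                # small leaf: compare its members exhaustively
                for a in range(len(idx)):
                    for b in range(a + 1, len(idx)):
                        i, j = idx[a], idx[b]
                        note(i, j, dist(objects[i], objects[j]))
                return
            p1, p2 = rng.sample(idx, 2)
            left, right = [p1], [p2]
            for i in idx:
                if i == p1 or i == p2:
                    continue
                d1 = dist(objects[i], objects[p1])
                d2 = dist(objects[i], objects[p2])
                note(i, p1, d1)   # construction distances are never wasted
                note(i, p2, d2)
                (left if d1 <= d2 else right).append(i)
            build(left)
            build(right)

        build(list(range(n)))
        return best  # approximate nearest neighbor of each object
    ```

    With a leaf size that covers the whole collection the table is exact; with small leaves it degrades gracefully into an approximation, mirroring the accuracy loss down the tree that the abstract describes.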

    All Near Neighbor Graph Without Searching

    Given a collection of n objects equipped with a distance function d(·, ·), the Nearest Neighbor Graph (NNG) problem consists in finding the nearest neighbor of each object in the collection. Without an index, the total cost of the NNG is quadratic. Using an index, the cost would be sub-quadratic if the search for individual items is sublinear. Unfortunately, due to the so-called curse of dimensionality, the indexed and the brute-force methods are almost equally inefficient. In this paper we present an efficient algorithm to build the Near Neighbor Graph (nNG), an approximation of the NNG, using only the index construction, without actually searching for objects. Facultad de Informática

    Optimizing searches in metric databases

    With the evolution of information and communication technologies, unstructured information repositories have emerged. Not only are new data types such as free text, images, audio and video being queried; in some cases, the information can no longer be structured into keys and records. These scenarios require a more general model, such as metric databases. The need for fast and adequate answers, together with efficient use of the available space, makes specialized data structures covering these aspects necessary. In particular, we focus on how to efficiently solve not only searches but also some other topics of interest in the field of metric databases. Thus, this research aims to bring these new database models to a level of maturity similar to that of traditional databases. Eje: Ingeniería de Software y Base de Datos. Red de Universidades con Carreras en Informática (RedUNCI)

    Choosing roots for the Spatial Approximation Tree

    Many computational applications need to search for information in a database. At present, the predominance of multimedia databases makes similarity search, or proximity search (i.e., looking for elements of the database that are similar to a given query element), a preponderant concept. The Spatial Approximation Tree has been shown to be very competitive for similarity search in metric spaces of medium to high dimensionality ("difficult" spaces) or for queries with low selectivity. Nevertheless, during its construction the root was chosen at random, and this choice completely determines the tree, both in its shape and in its search performance. Thus, our interest was to optimize searches in this data structure by choosing the tree root in a way that reflects some of the characteristics of the metric space to be indexed. We believe that in this way the data structure can adapt itself better to the intrinsic dimensionality of the considered metric space, which results in more efficient similarity searches. IV Workshop de Ingeniería de Software y Base de Datos. Red de Universidades con Carreras en Informática (RedUNCI)