Solving All-k-Nearest Neighbor Problem without an Index
Among the similarity queries in metric spaces, there is one that obtains the k-nearest neighbors of all the elements in the database (All-k-NN). One way to solve it is the naïve one: comparing each object in the database with all the others and returning the k elements nearest to it (k-NN). Another way is to preprocess the database to build an index, and then search this index for the k-NN of each element of the dataset. Solving the All-k-NN problem makes it possible to build the k-Nearest Neighbor Graph (kNNG). Given a collection of objects from a metric space, the Nearest Neighbor Graph (NNG) associates each node with its closest neighbor under the given metric. If we link each object to its k nearest neighbors, we obtain the kNNG. The kNNG can itself be considered an index for a database, one that is quite efficient and leaves room for improvement.
In this work, we propose a new technique to solve the All-k-NN problem that does not use any index to obtain the k-NN of each element. This approach avoids as many comparisons as possible, comparing only some database elements and taking advantage of the properties of the distance function. Its total cost is significantly lower than that of the naïve solution. XVI Workshop Bases de Datos y Minería de Datos. Red de Universidades con Carreras en Informátic
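For reference, the naïve baseline described in the abstract above can be sketched as follows. This is a minimal illustration, not the technique the paper proposes: the function name `all_knn_naive` and the `(distance, index)` output format are assumptions made here for the example.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def all_knn_naive(
    objects: Sequence[T],
    dist: Callable[[T, T], float],
    k: int,
) -> Dict[int, List[Tuple[float, int]]]:
    """Naive All-k-NN: compare every object with every other object.

    Returns, for each object index i, its k nearest neighbors as a list of
    (distance, neighbor_index) pairs sorted by increasing distance.
    The result is an adjacency-list view of the kNNG.
    """
    graph: Dict[int, List[Tuple[float, int]]] = {}
    for i, u in enumerate(objects):
        # n - 1 distance evaluations per object, n * (n - 1) in total.
        candidates = (
            (dist(u, v), j) for j, v in enumerate(objects) if j != i
        )
        graph[i] = heapq.nsmallest(k, candidates)
    return graph


if __name__ == "__main__":
    points = [0, 1, 3, 7]
    knng = all_knn_naive(points, lambda a, b: abs(a - b), k=2)
    for i, neighbors in knng.items():
        print(points[i], "->", [(points[j], d) for d, j in neighbors])
```

The sketch evaluates the distance function n(n - 1) times for n objects, which is the cost an index-free method of the kind described above aims to undercut.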
Approximate Nearest Neighbor Searching with Non-Euclidean and Weighted Distances
We present a new approach to approximate nearest-neighbor queries in fixed
dimension under a variety of non-Euclidean distances. We are given a set $S$ of
points in $\mathbb{R}^d$, an approximation parameter $\varepsilon > 0$, and
a distance function that satisfies certain smoothness and growth-rate
assumptions. The objective is to preprocess $S$ into a data structure so that
for any query point $q$ in $\mathbb{R}^d$, it is possible to efficiently report
any point of $S$ whose distance from $q$ is within a factor of $1 + \varepsilon$
of the actual closest point.
Prior to this work, the most efficient data structures for approximate
nearest-neighbor searching in spaces of constant dimensionality applied only to
the Euclidean metric. This paper overcomes this limitation through a method
called convexification. For admissible distance functions, the proposed data
structures answer queries in logarithmic time, with time and space bounds nearly matching the best known bounds for the
Euclidean metric. These results apply to both convex scaling distance functions
(including the Mahalanobis distance and weighted Minkowski metrics) and Bregman
divergences (including the Kullback-Leibler divergence and the Itakura-Saito
distance).
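To make the approximation guarantee concrete, the sketch below checks by brute force whether a reported point is within a factor of 1 + ε of the true nearest neighbor. It is only an illustration of the definition, not the paper's data structure. All function names are hypothetical, the weighted Minkowski form shown is one common convention, and the direction in which an asymmetric divergence such as Kullback-Leibler is evaluated (query first) is an assumption made here.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def weighted_minkowski(x: Point, y: Point, w: Point, p: float = 2.0) -> float:
    """Weighted Minkowski distance (assumed form: weights inside the sum)."""
    return sum(wi * abs(a - b) ** p for wi, a, b in zip(w, x, y)) ** (1.0 / p)


def kl_divergence(p: Point, q: Point) -> float:
    """Kullback-Leibler divergence of two discrete distributions.

    Assumes q has no zero entry where p is positive. Not symmetric.
    """
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def is_eps_approximate(
    candidate: Point,
    query: Point,
    points: Sequence[Point],
    dist: Callable[[Point, Point], float],
    eps: float,
) -> bool:
    """True if dist(query, candidate) <= (1 + eps) * distance to the true NN."""
    best = min(dist(query, s) for s in points)
    return dist(query, candidate) <= (1.0 + eps) * best


if __name__ == "__main__":
    data = [(0.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
    d = lambda x, y: weighted_minkowski(x, y, w=(1.0, 1.0))
    print(is_eps_approximate((3.0, 0.0), (1.0, 0.0), data, d, eps=1.0))
    print(is_eps_approximate((3.0, 0.0), (1.0, 0.0), data, d, eps=0.5))
```

The brute-force check costs one distance evaluation per data point, so it is useful only for validating the output of a faster structure on small inputs.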
Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while far fewer
studies concern variety. Metric spaces are ideal for addressing variety
because it can accommodate any type of data as long as its associated distance
notion satisfies the triangle inequality. To accelerate search in metric space,
a collection of indexing techniques for metric data has been proposed.
However, existing surveys each offer only narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing the time and storage complexity analysis
of the index construction, and iii) reporting on a comprehensive empirical
comparison of their similarity query processing performance. Empirical
comparisons are used to evaluate index performance during search because
differences in similarity query processing are hard to see from the complexity
analysis alone, and query performance depends on pruning and validation
abilities that are tied to the data distribution. This article aims to reveal
the strengths and weaknesses of different indexing techniques in order to
offer guidance on selecting an appropriate indexing technique for a given
setting, and to direct future research on metric indexes.
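As an illustration of the pruning idea the abstract above refers to, the sketch below answers an exact range query with a single pivot and the triangle inequality. It is a generic textbook-style example, not any specific index from the survey. The function names and the single-pivot layout are assumptions made here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_pivot_table(
    objects: Sequence[T], pivot: T, dist: Callable[[T, T], float]
) -> List[float]:
    """Precompute d(o, pivot) for every object (done once, at index time)."""
    return [dist(o, pivot) for o in objects]


def range_query(
    objects: Sequence[T],
    table: Sequence[float],
    pivot: T,
    dist: Callable[[T, T], float],
    q: T,
    r: float,
) -> Tuple[List[T], int]:
    """Return all objects within distance r of q, plus the number of
    distance evaluations performed at query time.

    Pruning: by the triangle inequality, |d(q, p) - d(o, p)| <= d(q, o),
    so if that lower bound already exceeds r, o cannot be in the result.
    Validation: surviving candidates are checked with a real distance call.
    """
    d_qp = dist(q, pivot)
    computed = 1  # the query-to-pivot distance
    result: List[T] = []
    for o, d_op in zip(objects, table):
        if abs(d_qp - d_op) > r:
            continue  # pruned without computing d(q, o)
        computed += 1
        if dist(q, o) <= r:
            result.append(o)
    return result, computed


if __name__ == "__main__":
    data = [1, 4, 9, 15, 20]
    metric = lambda a, b: abs(a - b)
    table = build_pivot_table(data, 0, metric)
    print(range_query(data, table, 0, metric, q=10, r=2))
```

How many objects are pruned depends on where the pivot lies relative to the data and the query, which is one reason query performance is tied to the data distribution.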
Approximate Nearest Neighbor Search for Low Dimensional Queries
We study the Approximate Nearest Neighbor problem for metric spaces where the
query points are constrained to lie on a subspace of low doubling dimension,
while the data is high-dimensional. We show that this problem can be solved
efficiently despite the high dimensionality of the data.
Improving metric access methods with bucket files
Modern applications deal with complex data, and retrieval by similarity plays an important role in most of them. Complex data whose primary comparison mechanisms are similarity predicates are usually embedded in metric spaces. Metric Access Methods (MAMs) exploit the properties of metric spaces to divide the space into regions and gain efficiency when processing similarity queries, such as range and k-nearest neighbor queries.
Existing MAMs use homogeneous data structures to improve query execution, following the same techniques employed by traditional methods developed to retrieve scalar and multidimensional data. In this paper, we combine hashing and hierarchical ball partitioning approaches to obtain a hybrid index tuned to improve similarity queries over complex data sets, with search algorithms that reduce total execution time by aggressively reducing the number of distance calculations. We applied our technique to the Slim-tree and performed experiments over real data sets, showing that the proposed technique reduces the execution time of both range and k-nearest neighbor queries by at least half relative to the Slim-tree. Moreover, the technique is general enough to be applied to many existing MAMs. CAPES, CNPq, FAPESP. International Conference on Similarity Search and Applications - SISAP (8. 2015 Glasgow
Active Nearest-Neighbor Learning in Metric Spaces
We propose a pool-based non-parametric active learning algorithm for general
metric spaces, called MArgin Regularized Metric Active Nearest Neighbor
(MARMANN), which outputs a nearest-neighbor classifier. We give prediction
error guarantees that depend on the noisy-margin properties of the input
sample, and are competitive with those obtained by previously proposed passive
learners. We prove that the label complexity of MARMANN is significantly lower
than that of any passive learner with similar error guarantees. MARMANN is
based on a generalized sample compression scheme, and a new label-efficient
active model-selection procedure.