2,033 research outputs found
Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while far fewer
studies concern variety. Metric space is ideal for addressing variety
because it can accommodate any type of data as long as its associated distance
notion satisfies the triangle inequality. To accelerate search in metric space,
a collection of indexing techniques for metric data has been proposed.
However, existing surveys each offer only narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing time and storage complexity analyses
of index construction, and iii) reporting on a comprehensive empirical
comparison of their similarity query processing performance. Empirical
comparisons are used to evaluate index performance during search because
complexity analysis alone does not clearly separate the indexes on similarity
query processing, and because query performance depends on pruning and
validation abilities that vary with the data distribution. This article aims at revealing
the strengths and weaknesses of different indexing techniques in order to
offer guidance on selecting an appropriate indexing technique for a given
setting and to direct future research on metric indexes.
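The pruning and validation steps mentioned in the abstract above both follow from the triangle inequality. The following is a minimal sketch of a pivot-table range query, not any specific index from the survey; the function names, the word list, and the choice of pivots are all assumptions made for illustration, and edit distance is used only as an example of a non-vector metric.

```python
def edit_distance(a, b):
    """Levenshtein distance, a metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build_pivot_table(data, pivots, dist):
    """Precompute the distance from every object to every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in data]


def range_query(query, radius, data, pivots, table, dist):
    """Return (objects within radius of query, number of distances computed)."""
    q_to_pivots = [dist(query, p) for p in pivots]
    results, computed = [], 0
    for obj, row in zip(data, table):
        # Pruning: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p,
        # so one lower bound above the radius rules the object out.
        if any(abs(dq - do) > radius for dq, do in zip(q_to_pivots, row)):
            continue
        # Validation: d(q, o) <= d(q, p) + d(o, p), so one upper bound
        # within the radius accepts the object without computing d(q, o).
        if any(dq + do <= radius for dq, do in zip(q_to_pivots, row)):
            results.append(obj)
            continue
        # Neither bound decides: fall back to the real distance.
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed


# Hypothetical data set and pivots, chosen only for the example.
words = ["index", "indexes", "metric", "matrix", "pivot",
         "pilot", "search", "starch", "query", "quarry"]
pivots = ["index", "search"]
table = build_pivot_table(words, pivots, edit_distance)
hits, computed = range_query("metrix", 1, words, pivots, table, edit_distance)
```

The value of `computed` shows how many objects survived pruning and validation and therefore needed a real distance computation. How much is saved in practice depends on the pivots and on the data distribution, which is the reason the abstract gives for comparing indexes empirically.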
A Learned Index for Exact Similarity Search in Metric Spaces
Indexing is an effective way to support efficient query processing in large
databases. Recently, the concept of a learned index has been actively explored to
replace or supplement traditional index structures with machine learning models
to reduce storage and search costs. However, accurate and efficient similarity
query processing in high-dimensional metric spaces remains an open
challenge. In this paper, a novel indexing approach called LIMS is proposed to
use data clustering and pivot-based data transformation techniques to build
learned indexes for efficient similarity query processing in metric spaces. The
underlying data is partitioned into clusters such that each cluster follows a
relatively uniform data distribution. Data redistribution is achieved by
utilizing a small number of pivots for each cluster. Similar data are mapped
into compact regions and the mapped values are totally ordered. Machine
learning models are developed to approximate the position of each data record
on the disk. Efficient algorithms are designed for processing range queries and
nearest neighbor queries based on LIMS, and for index maintenance with dynamic
updates. Extensive experiments on real-world and synthetic datasets demonstrate
the superiority of LIMS compared with traditional indexes and state-of-the-art
learned indexes.
Comment: 14 pages, 14 figures, submitted to Transactions on Knowledge and Data
Engineering
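The combination of a pivot-based mapping with a learned position model can be illustrated with a deliberately simplified sketch. It is not the LIMS algorithm: it assumes a single cluster, a single pivot, an in-memory array instead of disk pages, and a plain least-squares line as the model. The class name `LearnedPivotIndex` and all parameters are hypothetical.

```python
import bisect
import math


def euclidean(a, b):
    return math.dist(a, b)  # available since Python 3.8


class LearnedPivotIndex:
    """Objects are keyed by distance to one pivot; a line predicts positions."""

    def __init__(self, points, pivot, dist):
        self.dist, self.pivot = dist, pivot
        pairs = sorted(((dist(p, pivot), p) for p in points),
                       key=lambda kp: kp[0])
        self.keys = [k for k, _ in pairs]    # totally ordered 1-D keys
        self.points = [p for _, p in pairs]
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k, mean_pos = sum(self.keys) / n, (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_pos)
                  for i, k in enumerate(self.keys))
        self.slope = max(0.0, cov / var) if var > 0 else 0.0
        self.intercept = mean_pos - self.slope * mean_k
        # Largest prediction error over the stored keys.
        self.max_err = max(abs(self._predict(k) - i)
                           for i, k in enumerate(self.keys))

    def _predict(self, key):
        return self.slope * key + self.intercept

    def _lower_bound(self, key):
        """First position whose key is >= key, found inside the error window."""
        n, pred = len(self.keys), self._predict(key)
        lo = min(n, max(0, math.floor(pred - self.max_err)))
        hi = max(lo, min(n, math.ceil(pred + self.max_err) + 1))
        return bisect.bisect_left(self.keys, key, lo, hi)

    def range_query(self, query, radius):
        dq = self.dist(query, self.pivot)
        # Triangle inequality: any answer o satisfies |d(o, p) - d(q, p)| <= r,
        # so only keys in [dq - radius, dq + radius] need to be scanned.
        i, hits = self._lower_bound(dq - radius), []
        while i < len(self.keys) and self.keys[i] <= dq + radius:
            if self.dist(query, self.points[i]) <= radius:
                hits.append(self.points[i])
            i += 1
        return hits


# Hypothetical 2-D data, generated deterministically for the example.
points = [((i * 37 % 101) / 101, (i * 53 % 103) / 103) for i in range(200)]
index = LearnedPivotIndex(points, (0.0, 0.0), euclidean)
hits = index.range_query((0.4, 0.6), 0.1)
```

Because the fitted line is non-decreasing and its error on the stored keys is bounded by `max_err`, the bounded binary search still returns the exact lower bound, so the query result stays exact while the model narrows the search window.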
Distance Matrix Approach to Content Image Retrieval
As the volume of image data and the need to use it in various applications have grown significantly in
recent years, retrieval efficiency and effectiveness have become a necessity. Unfortunately, existing indexing
methods are not applicable to a wide range of problem-oriented fields due to their operating time limitations and
strong dependency on the traditional descriptors extracted from the image. To meet higher requirements, a novel
distance-based indexing method for region-based image retrieval has been proposed and investigated. The
method lays the groundwork for considering embedded partitions of images, so that the search can be carried out
at different levels of refinement or coarsening and thus target the meaningful content of the image
- …