2 research outputs found
Querying metric spaces with bit operations
Funding: This work was supported by ESRC grant ES/L007487/1 “Administrative Data Research Centre—Scotland".Metric search techniques can be usefully characterised by the time at which distance calculations are performed during a query. Most exact search mechanisms use a “just-in-time” approach where distances are calculated as part of a navigational strategy. An alternative is to use a “one-time” approach, where distances to a fixed set of reference objects are calculated at the start of each query. These distances are typically used to re-cast data and queries into a different space where querying is more efficient, allowing an approximate solution to be obtained. In this paper we use a “one-time” approach for an exact search mechanism. A fixed set of reference objects is used to define a large set of regions within the original space, and each query is assessed with respect to the definition of these regions. Data is then accessed if, and only if, it is useful for the calculation of the query solution. As dimensionality increases, the number of defined regions must increase, but the memory required for the exclusion calculation does not. We show that the technique gives excellent performance over the SISAP benchmark data sets, and most interestingly we show how increases in dimensionality may be countered by relatively modest increases in the number of reference objects used.Postprin
Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while much fewer
studies concern the variety. Metric space is ideal for addressing variety
because it can accommodate any type of data as long as its associated distance
notion satisfies the triangle inequality. To accelerate search in metric space,
a collection of indexing techniques for metric data have been proposed.
However, existing surveys each offers only a narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing the time and storage complexity analysis
on the index construction, and iii) report on a comprehensive empirical
comparison of their similarity query processing performance. Here, empirical
comparisons are used to evaluate the index performance during search as it is
hard to see the complexity analysis differences on the similarity query
processing and the query performance depends on the pruning and validation
abilities related to the data distribution. This article aims at revealing
different strengths and weaknesses of different indexing techniques in order to
offer guidance on selecting an appropriate indexing technique for a given
setting, and directing the future research for metric indexes