Search CORE

5 research outputs found

Indexing Metric Spaces for Exact Similarity Search

Authors: Chen Lu; Gao Yunjun; Jensen Christian S.; Li Zheng; Miao Xiaoye; Song Xuan; Zhu Yifan
Publication date: 07/05/2020

With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer concern variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search by i) summarizing all the existing partitioning, pruning, and validation techniques used for metric indexes, ii) providing the time and storage complexity analysis of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate index performance during search because differences are hard to discern from a complexity analysis of similarity query processing, and because query performance depends on pruning and validation abilities that are tied to the data distribution. This article aims to reveal the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting and to direct future research on metric indexes.
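The abstract's central requirement, that a distance notion satisfy the triangle inequality, can be illustrated with a small sketch. The example below is not taken from the paper: the function names `edit_distance` and `satisfies_triangle_inequality` and the sample data are assumptions chosen for illustration. It checks the inequality by brute force on a sample, which can expose a violation but cannot prove that a distance is a metric in general.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def satisfies_triangle_inequality(objects, dist) -> bool:
    """Check d(x, z) <= d(x, y) + d(y, z) for every ordered triple in the sample."""
    return all(dist(x, z) <= dist(x, y) + dist(y, z)
               for x, y, z in permutations(objects, 3))


# Hypothetical sample data: strings under edit distance pass the check,
# while squared difference on numbers violates it (4 > 1 + 1 for 0, 1, 2).
words = ["metric", "matrix", "metro", "index"]
strings_ok = satisfies_triangle_inequality(words, edit_distance)
squares_ok = satisfies_triangle_inequality([0, 1, 2], lambda a, b: (a - b) ** 2)
```

The same check works for any object type, which is the sense in which a metric space accommodates variety: only the distance function changes.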

arXiv.org e-Print Archive

VBN

Distributed similarity queries in metric spaces

Authors: CHEN Lu; DING Xin; GAO Yunjun; YANG Keyu; ZHANG Yuanliang; ZHENG Baihua
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2019

Institutional Knowledge at Singapore Management University

Pivot-based Metric Indexing

Authors: CHEN Lu; GAO Yunjun; JENSEN Christian S.; YANG Hanyu; YANG Keyu; ZHENG Baihua
Publication venue: VLDB Endowment
Publication date: 01/06/2017

The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed, which reduces the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offer only narrow coverage, and they use different pivot selection strategies that affect performance substantially and thus render cross-study comparisons difficult or impossible. We offer a survey of existing pivot-based indexing techniques, and report a comprehensive empirical comparison of their construction costs, update efficiency, storage sizes, and similarity search performance. As part of the study, we provide modifications for two existing indexing techniques to make them more competitive. The findings and insights obtained from the study reveal different strengths and weaknesses of different indexing techniques, and offer guidance on selecting an appropriate indexing technique for a given setting.
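The pruning and validation that the abstract mentions can be sketched with a simple pivot table. This is a minimal illustration under stated assumptions, not any of the surveyed indexes: the names `build_pivot_table` and `range_query`, the 1D sample data, and the choice of pivots are all hypothetical. For a query q, radius r, object o, and pivot p, the triangle inequality gives |d(q, p) - d(o, p)| <= d(q, o) <= d(q, p) + d(o, p), so precomputed pivot distances can rule an object out or in without computing d(q, o).

```python
def build_pivot_table(objects, pivots, dist):
    """Precompute d(o, p) for every object o and every pivot p."""
    return [[dist(o, p) for p in pivots] for o in objects]


def range_query(q, r, objects, pivots, table, dist):
    """Return (objects within distance r of q, number of d(q, o) computations)."""
    q_to_pivots = [dist(q, p) for p in pivots]
    results, computed = [], 0
    for o, o_to_pivots in zip(objects, table):
        pairs = list(zip(q_to_pivots, o_to_pivots))
        # Pruning: the largest lower bound on d(q, o) already exceeds r.
        if max(abs(dq - do) for dq, do in pairs) > r:
            continue
        # Validation: the smallest upper bound on d(q, o) is within r.
        if min(dq + do for dq, do in pairs) <= r:
            results.append(o)
            continue
        # Neither bound decides, so fall back to the real distance.
        computed += 1
        if dist(q, o) <= r:
            results.append(o)
    return results, computed


# Hypothetical data: points on a line under absolute difference.
def abs_dist(a, b):
    return abs(a - b)


points = [1, 4, 5, 9, 15, 20]
pivots = [0, 20]
table = build_pivot_table(points, pivots, abs_dist)
hits, cost = range_query(5, 1, points, pivots, table, abs_dist)
```

In this sketch, the pivot distances for the query are computed once per query, so the savings come from skipping d(q, o) for pruned or validated objects. How many objects are skipped depends heavily on which pivots are chosen, which is consistent with the abstract's remark that pivot selection affects performance substantially.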

Crossref

Institutional Knowledge at Singapore Management University

VBN

Optimal Pivots to Minimize the Index Size for Metric Access Methods

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref