885 research outputs found

Indexing Metric Spaces for Exact Similarity Search

Authors: Chen Lu; Gao Yunjun; Jensen Christian S.; Li Zheng; Miao Xiaoye; Song Xuan; Zhu Yifan
Publication date: 07/05/2020

With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer concern variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning and validation techniques used for metric indexes, ii) providing the time and storage complexity analysis of the index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate index performance during search because complexity analysis does not readily distinguish the indexes on similarity query processing, and query performance depends on pruning and validation abilities that are related to the data distribution. This article aims at revealing the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and at directing future research on metric indexes.
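
The role of the triangle inequality in the abstract above can be illustrated with a minimal sketch of pivot-based pruning for a range query. This is not code from the surveyed paper: the function names (`edit_distance`, `build_pivot_table`, `range_query`), the single-pivot layout, and the choice of edit distance on strings are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, a metric on strings (illustrative choice)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def build_pivot_table(objects: Sequence[str], pivot: str,
                      dist: Callable[[str, str], int]) -> List[int]:
    """Precompute d(o, pivot) for every indexed object (hypothetical helper)."""
    return [dist(o, pivot) for o in objects]


def range_query(objects: Sequence[str], pivot_dists: Sequence[int],
                pivot: str, dist: Callable[[str, str], int],
                query: str, radius: int) -> Tuple[List[str], int]:
    """Return (objects within radius of query, number of distance computations)."""
    d_qp = dist(query, pivot)
    computed = 1
    results = []
    for obj, d_op in zip(objects, pivot_dists):
        # Triangle inequality gives d(q, o) >= |d(q, p) - d(o, p)|,
        # so the object cannot qualify if this lower bound exceeds the radius.
        if abs(d_qp - d_op) > radius:
            continue  # pruned without computing d(q, o)
        computed += 1
        if dist(query, obj) <= radius:  # validation by the real distance
            results.append(obj)
    return results, computed


words = ["metric", "metrics", "matrix", "index", "indexes", "pruning"]
pivot = "metric"
table = build_pivot_table(words, pivot, edit_distance)
hits, n_computed = range_query(words, table, pivot, edit_distance, "metrik", 1)
print(hits, n_computed)
```

Because pruning only discards objects whose lower bound already exceeds the radius, the result is the same as a linear scan; only the number of distance computations changes, and how many are saved depends on the pivot and the data distribution.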

arXiv.org e-Print Archive

VBN

Design, Implementation and Preliminary Analysis of General Multidimensional Trees

Author: Bereczky Nikolett
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/06/2012

In this thesis, a new multidimensional data structure, the q-kd tree, for storing points lying in a multidimensional space is defined, implemented and experimentally analyzed. This new data structure has k-d trees and quad-trees as particular cases. The main difference between q-kd trees and either k-d trees or quad-trees is the way in which discriminants are assigned to each node of the tree. While this is fixed for k-d trees and quad-trees, it is variable for q-kd trees. We propose two different heuristics for assigning discriminants to nodes: Split Tendency and Prob-of-1. These heuristics allow us to build what we call quasi-optimal q-kd trees and randomly-split q-kd trees, respectively. Experimentally, we show that our variants of q-kd trees lie between quad-trees and k-d trees with respect to memory space and internal path length, and that with proper parameter settings it is possible to construct q-kd trees tailored to given space and time restrictions.

UPCommons. Portal del coneixement obert de la UPC

Fast Construction of Nets in Low Dimensional Metrics, and Their Applications

Authors: Har-Peled Sariel; Mendel Manor
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2005

We present a near-linear-time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near-linear and the space used is linear.

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors

New models for efficient authenticated dictionaries

Authors: Atighehchi Kevin; Bonnecaze Alexis; Risterucci Gabriel
Publication venue: Elsevier BV
Publication date: 01/01/2015

We propose models for data authentication which take into account the behavior of the clients who perform queries. Our models reduce the size of the authenticated proof when the frequency of the query corresponding to a given data item is higher. Existing models implicitly assume the frequency distribution of queries to be uniform, but in reality, this distribution generally follows Zipf's law. Our models better reflect reality, and the communication cost between clients and the server provider is reduced, allowing the server to save bandwidth. The gain obtained on the average proof size compared to existing schemes depends on the parameter of Zipf's law: the greater the parameter, the greater the gain. When the frequency distribution follows a perfect Zipf's law, we obtain a gain that can reach 26%. Experiments show the existence of applications for which the Zipf parameter is greater than 1, leading to even higher gains.

HAL AMU

On construction, performance, and diversification for structured queries on the semantic desktop

Author: Minack Enrico
Publication venue: Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2011

[no abstract]

Institutionelles Repositorium der Leibniz Universität Hannover

LIPIcs, Volume 248, ISAAC 2022, Complete Volume

Authors: Bae Sang Won; Park Heejin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Algorithms and Computation (ISAAC 2022)
Publication date: 01/01/2022

LIPIcs, Volume 248, ISAAC 2022, Complete Volume

Dagstuhl Research Online Publication Server

Data polygamy : the many-many relationships among urban spatio-temporal data sets

Authors: Chirigati F.; Damoulas T.; Doraiswamy H.; Freire J.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/06/2016

The increasing ability to collect data from urban environments, coupled with a push towards openness by governments, has resulted in the availability of numerous spatio-temporal data sets covering diverse aspects of a city. Discovering relationships between these data sets can produce new insights by enabling domain experts to not only test but also generate hypotheses. However, discovering these relationships is difficult. First, a relationship between two data sets may occur only at certain locations and/or time periods. Second, the sheer number and size of the data sets, coupled with the diverse spatial and temporal scales at which the data is available, present computational challenges on all fronts, from indexing and querying to analyzing them. Finally, it is nontrivial to differentiate between meaningful and spurious relationships. To address these challenges, we propose Data Polygamy, a scalable topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets. We have performed an experimental evaluation using over 300 spatio-temporal urban data sets, which shows that our approach is scalable and effective at identifying interesting relationships.

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository