Search CORE

13,383 research outputs found

On the evaluation of exact-match and range queries over multidimensional data in distributed hash tables

Author: Malensek Matthew
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2012

2012 Fall. Includes bibliographical references. The quantity and precision of geospatial and time series observational data being collected has increased alongside the steady expansion of processing and storage capabilities in modern computing hardware. The storage requirements for this information are vastly greater than the capabilities of a single computer, and are primarily met in a distributed manner. However, distributed solutions often impose strict constraints on retrieval semantics. In this thesis, we investigate the factors that influence storage and retrieval operations on large datasets in a cloud setting, and propose a lightweight data partitioning and indexing scheme to facilitate these operations. Our solution provides expressive retrieval support through range-based and exact-match queries and can be applied over massive quantities of multidimensional data. We provide benchmarks to illustrate the relative advantage of using our solution over a general-purpose cloud storage engine in a distributed network of heterogeneous computing resources.

Mountain Scholar (Digital Collections of Colorado and Wyoming)
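
The abstract does not spell out the partitioning scheme itself. As a rough illustration of why partitioning matters for range queries in a DHT, the Python sketch below assumes a hypothetical fixed-size latitude/longitude grid (`cell_deg`) whose cell IDs are hashed onto `num_nodes` storage nodes. All function names are illustrative, and plain modulo hashing stands in for whatever placement the thesis actually uses. An exact-match lookup touches one node, while a range query contacts only the nodes owning the overlapping cells rather than every node.

```python
from __future__ import annotations

import hashlib
import math


def cell_of(lat: float, lon: float, cell_deg: float) -> tuple[int, int]:
    """Map a point to a hypothetical fixed-size grid cell (row, column)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def node_for_cell(cell: tuple[int, int], num_nodes: int) -> int:
    """Hash a cell ID onto one of num_nodes nodes (simple modulo placement, an assumption)."""
    digest = hashlib.sha1(f"{cell[0]}:{cell[1]}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def node_for_point(lat: float, lon: float, cell_deg: float, num_nodes: int) -> int:
    """Exact-match lookup: a single point maps to exactly one node."""
    return node_for_cell(cell_of(lat, lon, cell_deg), num_nodes)


def nodes_for_range(lat_min, lat_max, lon_min, lon_max, cell_deg, num_nodes) -> list[int]:
    """Range query: only nodes owning cells that overlap the bounding box are contacted."""
    r0, c0 = cell_of(lat_min, lon_min, cell_deg)
    r1, c1 = cell_of(lat_max, lon_max, cell_deg)
    return sorted({
        node_for_cell((r, c), num_nodes)
        for r in range(r0, r1 + 1)
        for c in range(c0, c1 + 1)
    })
```

The cell size trades off the two query types in this sketch: coarser cells keep a range query on fewer nodes but concentrate load, while finer cells spread load but make range queries fan out further. A per-node local index for records inside each cell is omitted.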

Fast Exact Search in Hamming Space with Multi-Index Hashing

Author: Fleet David J.
Norouzi Mohammad
Punjani Ali
Publication date: 24/04/2014

There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as this was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.

arXiv.org e-Print Archive

CiteSeerX
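
A compact Python sketch of the multi-index idea, assuming the code length `b` is divisible by the number of substrings `m`: each code is split into `m` disjoint substrings, and each substring position gets its own hash table. By the pigeonhole principle, any code within Hamming distance `r` of the query differs from it by at most floor(r/m) bits on at least one substring, so probing every table with all substring values within that smaller radius, then verifying full distances, returns every r-neighbor exactly. The class and method names are hypothetical, and the k-NN routine simply re-runs the range search with a growing radius, which is simpler and slower than a tuned implementation would be.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def popcount(x: int) -> int:
    return bin(x).count("1")


def split_code(code: int, b: int, m: int) -> list[int]:
    """Split a b-bit code into m contiguous substrings of b // m bits each."""
    s = b // m
    mask = (1 << s) - 1
    return [(code >> (i * s)) & mask for i in range(m)]


def neighbors_within(value: int, width: int, radius: int):
    """Yield every width-bit value within Hamming distance `radius` of `value`."""
    for d in range(radius + 1):
        for positions in combinations(range(width), d):
            v = value
            for p in positions:
                v ^= 1 << p
            yield v


class MultiIndexHash:
    def __init__(self, codes: list[int], b: int, m: int):
        assert b % m == 0, "sketch assumes b is divisible by m"
        self.codes, self.b, self.m, self.s = list(codes), b, m, b // m
        # One hash table per substring position: substring value -> code indices.
        self.tables = [defaultdict(list) for _ in range(m)]
        for idx, code in enumerate(self.codes):
            for i, sub in enumerate(split_code(code, b, m)):
                self.tables[i][sub].append(idx)

    def r_neighbors(self, query: int, r: int) -> list[int]:
        """Indices of all codes within Hamming distance r of query (exact)."""
        sub_r = r // self.m  # pigeonhole: some substring differs by <= floor(r/m) bits
        candidates = set()
        for i, sub in enumerate(split_code(query, self.b, self.m)):
            for probe in neighbors_within(sub, self.s, sub_r):
                candidates.update(self.tables[i].get(probe, ()))
        # Candidates can be false positives, so check the full code distance.
        return sorted(idx for idx in candidates
                      if popcount(self.codes[idx] ^ query) <= r)

    def knn(self, query: int, k: int) -> list[int]:
        """Exact k nearest neighbors, found by growing the search radius."""
        want = min(k, len(self.codes))
        for r in range(self.b + 1):
            hits = self.r_neighbors(query, r)
            if len(hits) >= want:
                hits.sort(key=lambda idx: (popcount(self.codes[idx] ^ query), idx))
                return hits[:want]
        return []
```

For example, with the 8-bit codes `[0b00000000, 0b00000001, 0b11110000, 0b11111111]` split into two 4-bit substrings, querying `0b00000011` with radius 2 returns indices `[0, 1]`.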

Performance comparison of point and spatial access methods

Author: D. Comer
H.P. Kriegel
J. Nievergelt
K. Hinrichs
K.-Y. Whang
M. Tamminen
W.A. Burkhard
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/1990

In the past few years a large number of multidimensional point access methods, also called multiattribute index structures, have been suggested, all of them claiming good performance. Since no performance comparison of these structures under arbitrary (strongly correlated, nonuniform, in short "ugly") data distributions and under various types of queries has been performed, database researchers and designers were hesitant to use any of these new point access methods. As shown in a recent paper, such point access methods are not only important in traditional database applications. In new applications such as CAD/CIM and geographic or environmental information systems, access methods for spatial objects are needed. As recently shown, such access methods are based on point access methods in terms of functionality and performance. Our performance comparison naturally consists of two parts. In part I we compare multidimensional point access methods, whereas in part II we compare spatial access methods for rectangles. In part I we present a survey and classification of existing point access methods. Then we carefully select the following four methods for implementation and performance comparison under seven different data files (distributions) and various types of queries: the 2-level grid file, the BANG file, the hB-tree and a new scheme, called the BUDDY hash tree. To our surprise, one method was the clear winner: the BUDDY hash tree. It exhibits at least 20% better average performance than its competitors and is robust under ugly data and queries. In part II we compare spatial access methods for rectangles. After presenting a survey and classification of existing spatial access methods, we carefully select the following four methods for implementation and performance comparison under six different data files (distributions) and various types of queries: the R-tree, the BANG file, PLOP hashing and the BUDDY hash tree. The comparison produced two winners: the BANG file and the BUDDY hash tree. This comparison is a first step towards a standardized testbed or benchmark. We offer our data and query files to each designer of a new point or spatial access method so that they can run their implementation in our testbed.

Crossref

Open Access LMU

A class of structured P2P systems supporting browsing

Author: Cohen Julien
Publication date: 01/01/2009

Browsing is a way of finding documents in a large amount of data that is complementary to querying and is particularly suitable for multimedia documents. Locating particular documents in a very large collection of multimedia documents, such as those available in peer-to-peer networks, is a difficult task. However, current peer-to-peer systems do not support finding documents by browsing. In this report, we show how to build a peer-to-peer system that supports a kind of browsing. In our proposal, an existing distributed hash table system is extended with a few features: handling partial hash-keys and providing appropriate routing mechanisms for these hash-keys. We give such an algorithm for the particular case of the Tapestry distributed hash table. This is a work in progress, as no proper validation has been done yet. Comment: 14 pages

arXiv.org e-Print Archive

CiteSeerX
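
The report's own algorithm for Tapestry is not given in the abstract. The Python sketch below only illustrates the general principle a partial hash-key builds on: in Plaxton-style digit routing, each hop forwards to a node matching the target in at least one more digit, so a partial key (a prefix of hexadecimal digits) can be routed until its specified digits run out, landing in the region of the ID space that the prefix covers. It assumes prefix (rather than suffix) matching, global knowledge of node IDs in place of per-node routing tables, and hypothetical function names. Tapestry's surrogate routing for unoccupied prefixes is not modeled.

```python
from __future__ import annotations


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def route_partial(start: str, partial_key: str, nodes: list[str]) -> list[str]:
    """Hop sequence from `start` toward a node whose ID begins with `partial_key`.

    Each hop resolves at least one more digit of the partial key. Routing stops
    early if no node shares the next digit (surrogate routing is not modeled).
    """
    path = [start]
    current = start
    while True:
        level = shared_prefix_len(current, partial_key)
        if level == len(partial_key):
            return path  # every specified digit is resolved
        next_prefix = partial_key[: level + 1]
        # Stand-in for a routing-table lookup at row `level`, column partial_key[level].
        candidates = sorted(n for n in nodes if n.startswith(next_prefix))
        if not candidates:
            return path
        current = candidates[0]
        path.append(current)


def browse_region(partial_key: str, nodes: list[str]) -> list[str]:
    """Nodes whose IDs fall under the partial key: the region a browser could explore."""
    return sorted(n for n in nodes if n.startswith(partial_key))
```

With node IDs `["0a3f", "0b12", "3c4d", "3c90", "7e11"]`, routing the partial key `"3c9"` from `"0a3f"` visits `"0a3f"`, `"3c4d"`, `"3c90"`, resolving one digit group per hop.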

Answering Spatial Multiple-Set Intersection Queries Using 2-3 Cuckoo Hash-Filters

Author: Eppstein David
Hagerup Torben
Kopelowitz Tsvi
Yang Xiao
Publication date: 29/08/2017

We show how to answer spatial multiple-set intersection queries in O(n(log w)/w + kt) expected time, where n is the total size of the t sets involved in the query, w is the number of bits in a memory word, k is the output size, and c is any fixed constant. This improves the asymptotic performance over previous solutions and is based on an interesting data structure known as 2-3 cuckoo hash-filters. Our results apply in the word-RAM model (or practical RAM model), which allows for constant-time bit-parallel operations, such as bitwise AND, OR, NOT, and MSB (most-significant 1-bit), as exist in modern CPUs and GPUs. Our solutions apply to any multiple-set intersection queries in spatial data sets that can be reduced to one-dimensional range queries, such as spatial join queries for one-dimensional points or sets of points stored along space-filling curves, which are used in GIS applications. Comment: Full version of paper from 2017 ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems

arXiv.org e-Print Archive

Crossref
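
The reduction step mentioned in the abstract, storing points along a space-filling curve so that spatial queries become one-dimensional key queries, can be sketched in Python with a Z-order (Morton) curve. This sketch does not implement 2-3 cuckoo hash-filters: it intersects the sorted key lists with a plain pairwise merge, which runs in time linear in the total input size rather than within the word-parallel bound quoted above. The function names, the 16-bit coordinate width, and the optional key range `[lo, hi]` are illustrative assumptions.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right


def morton(x: int, y: int, bits: int = 16) -> int:
    """Z-order key: bit i of x goes to position 2i, bit i of y to position 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def intersect_sorted(a: list[int], b: list[int]) -> list[int]:
    """Two-pointer intersection of two sorted lists of distinct integers."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def spatial_intersection(point_sets, lo: int = 0, hi: int | None = None) -> list[int]:
    """Morton keys present in every point set, optionally restricted to [lo, hi]."""
    keyed = []
    for points in point_sets:
        keys = sorted({morton(x, y) for x, y in points})
        end = len(keys) if hi is None else bisect_right(keys, hi)
        keyed.append(keys[bisect_left(keys, lo):end])  # one-dimensional range restriction
    if not keyed:
        return []
    result = keyed[0]
    for keys in keyed[1:]:
        result = intersect_sorted(result, keys)
    return result
```

Note that an axis-aligned rectangle generally maps to several disjoint Z-order key intervals, so a full spatial query would issue one such range restriction per interval.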