273 research outputs found

A unified approach to linear probing hashing with buckets

Authors: Janson Svante, Viola Alfredo
Publication date: 22/10/2014

We give a unified analysis of linear probing hashing with a general bucket size. We use both a combinatorial approach, giving exact formulas for generating functions, and a probabilistic approach, giving simple derivations of asymptotic results. The two approaches complement each other nicely and give good insight into the relation between linear probing and random walks. A key methodological contribution, at the core of Analytic Combinatorics, is the use of the symbolic method (based on q-calculus) to directly derive the generating functions to analyze. Comment: 49 pages
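As a rough illustration of the data structure this abstract analyzes, here is a minimal sketch of linear probing with buckets of a fixed capacity. The class name, the use of Python's built-in `hash`, and the absence of deletions are assumptions made for brevity; none of this is taken from the paper.

```python
class BucketedLinearProbing:
    """Linear probing where each table cell is a bucket holding up to b keys."""

    def __init__(self, num_buckets, bucket_size):
        self.m = num_buckets
        self.b = bucket_size
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        """Insert key and return its displacement (buckets probed past home)."""
        home = hash(key) % self.m
        for d in range(self.m):
            bucket = self.buckets[(home + d) % self.m]
            if key in bucket:
                return d
            if len(bucket) < self.b:
                bucket.append(key)
                return d
        raise OverflowError("table is full")

    def contains(self, key):
        home = hash(key) % self.m
        for d in range(self.m):
            bucket = self.buckets[(home + d) % self.m]
            if key in bucket:
                return True
            # Without deletions, a non-full bucket ends the probe sequence.
            if len(bucket) < self.b:
                return False
        return False


table = BucketedLinearProbing(num_buckets=4, bucket_size=2)
displacements = [table.insert(k) for k in (0, 4, 8)]  # all three hash to bucket 0
```

With bucket size 2, the third key overflows its home bucket and is displaced by one position, which is the quantity whose distribution such analyses study.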

arXiv.org e-Print Archive

CiteSeerX

Tradeoffs for nearest neighbors on the sphere

Author: Laarhoven Thijs
Publication date: 01/01/2015

We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{\rho_q}$ and update complexity $n^{\rho_u}$ for data sets of size $n$ is given by the following equation in terms of the approximation factor $c$ and the exponents $\rho_q$ and $\rho_u$:

$$c^2\sqrt{\rho_q}+(c^2-1)\sqrt{\rho_u}=\sqrt{2c^2-1}.$$

For small $c=1+\epsilon$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity $n^{1-4\epsilon^2}$. Balancing the query and update costs leads to optimal complexities $n^{1/(2c^2-1)}$, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4\epsilon^2)}$, matching the bound $n^{\Omega(1/\epsilon^2)}$ of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98].

For large $c$, minimizing the update complexity results in a query complexity of $n^{2/c^2+O(1/c^4)}$, improving upon the related exponent for large $c$ of [Kapralov, PODS'15] by a factor $2$, and matching the bound $n^{\Omega(1/c^2)}$ of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal complexities $n^{1/(2c^2-1)}$, while a minimum query time complexity can be achieved with update complexity $n^{2/c^2+O(1/c^4)}$, improving upon the previous best exponents of Kapralov by a factor $2$. Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580 [cs.DS] (along with arXiv:1605.02701 [cs.DS])
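The tradeoff equation in this abstract can be checked numerically. The sketch below solves it for the query exponent given the update exponent. The function names are illustrative assumptions, and the code only evaluates the stated formula; it says nothing about the underlying data structure.

```python
import math


def query_exponent(c, rho_u):
    """Solve c^2*sqrt(rho_q) + (c^2-1)*sqrt(rho_u) = sqrt(2c^2-1) for rho_q."""
    remainder = math.sqrt(2 * c * c - 1) - (c * c - 1) * math.sqrt(rho_u)
    if remainder < 0:
        raise ValueError("rho_u is too large for this approximation factor")
    return (remainder / (c * c)) ** 2


def balanced_exponent(c):
    """Exponent obtained when rho_q == rho_u, i.e. 1 / (2c^2 - 1)."""
    return 1.0 / (2 * c * c - 1)


c = 2.0
rho_balanced = balanced_exponent(c)                   # 1/7 for c = 2
rho_q_at_balance = query_exponent(c, rho_balanced)
rho_q_min_update = query_exponent(c, 0.0)             # (2c^2 - 1) / c^4

eps = 0.01
rho_q_small_c = query_exponent(1 + eps, 0.0)          # close to 1 - 4*eps^2
```

Setting `rho_u = 0` gives `(2c^2 - 1) / c^4`, which expands to `2/c^2 - 1/c^4` for large `c` and to roughly `1 - 4*eps^2` for `c = 1 + eps`, consistent with the exponents quoted in the abstract.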

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space

Author: Pham Ninh
Publication date: 01/01/2017

We study the $r$-near neighbors reporting problem ($r$-NN), i.e., reporting all points in a high-dimensional point set $S$ that lie within a radius $r$ of a given query point $q$. Our approach builds on the locality-sensitive hashing (LSH) framework due to its appealing asymptotic sublinear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving $r$-NN is that its performance is sensitive to data and query-dependent parameters. On datasets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy between LSH-based search and linear search for $r$-NN in high-dimensional space. By integrating an auxiliary data structure into LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This means that we are able to choose the appropriate search strategy between LSH-based search and linear search to achieve better performance. Moreover, the integrated data structure is time efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world datasets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space. Comment: Accepted as a short paper in EDBT 201
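The hybrid idea described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's method: it uses bit-sampling LSH on binary vectors, and it stands in for the paper's auxiliary data structure with a crude cost estimate (the sum of the sizes of the query's buckets). The class and method names are hypothetical.

```python
import random
from collections import defaultdict


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


class HybridIndex:
    def __init__(self, points, num_tables=4, bits_per_key=8, seed=0):
        rng = random.Random(seed)
        self.points = points
        dim = len(points[0])
        # Bit-sampling LSH: each table keys a point by a fixed coordinate subset.
        self.projections = [rng.sample(range(dim), bits_per_key)
                            for _ in range(num_tables)]
        self.tables = []
        for proj in self.projections:
            table = defaultdict(list)
            for idx, p in enumerate(points):
                table[tuple(p[i] for i in proj)].append(idx)
            self.tables.append(table)

    def query(self, q, r):
        """Return (strategy, sorted indices of reported points within radius r)."""
        buckets = [table.get(tuple(q[i] for i in proj), [])
                   for proj, table in zip(self.projections, self.tables)]
        # Assumed cost estimate: number of candidate slots LSH would scan.
        estimated_cost = sum(len(b) for b in buckets)
        if estimated_cost >= len(self.points):
            strategy, candidates = "linear", range(len(self.points))
        else:
            strategy, candidates = "lsh", set().union(*buckets)
        hits = sorted(i for i in candidates if hamming(self.points[i], q) <= r)
        return strategy, hits


# Dense region: every point collides with the query, so a linear scan is chosen.
dense = HybridIndex([[0] * 16 for _ in range(100)])
dense_result = dense.query([0] * 16, r=0)

# Spread-out data: 256 distinct 8-bit patterns, each repeated to 16 dimensions.
spread_points = [[(i >> j) & 1 for j in range(8)] * 2 for i in range(256)]
spread = HybridIndex(spread_points)
spread_result = spread.query(spread_points[0], r=0)
```

Note that the LSH branch can miss true neighbors that never collide with the query, which is inherent to LSH; the linear branch is exact.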

arXiv.org e-Print Archive

Copenhagen University Research Information System

Analysing the Performance of GPU Hash Tables for State Space Exploration

Authors: Cassee Nathan, Wijs Anton
Publication venue: Open Publishing Association
Publication date: 01/01/2017

In the past few years, General Purpose Graphics Processors (GPUs) have been used to significantly speed up numerous applications. One of the areas in which GPUs have recently led to a significant speed-up is model checking. In model checking, state spaces, i.e., large directed graphs, are explored to verify whether models satisfy desirable properties. GPUexplore is a GPU-based model checker that uses a hash table to efficiently keep track of already explored states. As a large number of states is discovered and stored during such an exploration, the hash table should be able to quickly handle many inserts and queries concurrently. In this paper, we experimentally compare two different hash tables optimised for the GPU, one being the GPUexplore hash table, and the other using Cuckoo hashing. We compare the performance of both hash tables using random and non-random data obtained from model checking experiments, to analyse the applicability of the two hash tables for state space exploration. We conclude that Cuckoo hashing is three times faster than GPUexplore hashing for random data, and that Cuckoo hashing is five to nine times faster for non-random data. This suggests great potential to further speed up GPUexplore in the near future. Comment: In Proceedings GaM 2017, arXiv:1712.0834
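For readers unfamiliar with Cuckoo hashing, the following is a minimal sequential sketch of the general technique. It is not the GPU implementation evaluated in the paper and not the GPUexplore hash table; the two hash functions (Python's `hash` applied to a salted tuple), the eviction limit, and the doubling rehash are all assumptions chosen for brevity.

```python
class CuckooHashSet:
    """Two tables, two hash functions; each key lives in one of two slots."""

    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Hypothetical hash family: salt the key with the table index.
        return hash((which, key)) % self.capacity

    def contains(self, key):
        # A lookup inspects at most two slots.
        return any(self.tables[t][self._slot(t, key)] == key for t in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        current, which = key, 0
        for _ in range(self.max_kicks):
            slot = self._slot(which, current)
            # Place the current key and pick up whatever occupied the slot.
            current, self.tables[which][slot] = self.tables[which][slot], current
            if current is None:
                return
            which = 1 - which  # the evicted key tries its slot in the other table
        self._rehash(current)

    def _rehash(self, pending):
        keys = [k for t in self.tables for k in t if k is not None] + [pending]
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k in keys:
            self.insert(k)


states = CuckooHashSet()
for state_id in range(50):
    states.insert(state_id)
```

The constant number of probes per lookup is the property that makes the scheme attractive for highly parallel inserts and queries, although a concurrent GPU version needs atomic operations that this sketch omits.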

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Constant Sequence Extension for Fast Search Using Weighted Hamming Distance

Authors: Li Haizhou, Lin Zhiping, Weng Zhenyu, Zhuang Huiping
Publication date: 06/06/2023

Representing visual data using compact binary codes is attracting increasing attention, as binary codes are used as direct indices into hash table(s) for fast non-exhaustive search. Recent methods show that ranking binary codes using weighted Hamming distance (WHD) rather than Hamming distance (HD), by generating query-adaptive weights for each bit, can better retrieve query-related items. However, search using WHD is slower than that using HD. One main challenge is that, for existing methods, the complexity of extending a monotone increasing sequence using WHD to probe buckets in hash table(s) is at least proportional to the square of the sequence length, while that using HD is proportional to the sequence length. To overcome this challenge, we propose a novel fast non-exhaustive search method using WHD. The key idea is to design a constant sequence extension algorithm that performs each sequence extension in constant computational complexity, so that the total complexity is proportional to the sequence length, which is justified by theoretical analysis. Experimental results show that our method is faster than other WHD-based search methods. Also, compared with the HD-based non-exhaustive search method, our method has comparable efficiency but retrieves more query-related items on datasets of up to one billion items.
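To make the difference between the two distances concrete, the sketch below computes HD and WHD for small binary codes stored as integers. The weights and codes are made-up illustrative values, and the code does not implement the paper's sequence extension algorithm; it only shows why WHD can separate codes that HD ties.

```python
def hamming_distance(a, b):
    """Number of bit positions in which the codes a and b differ."""
    return bin(a ^ b).count("1")


def weighted_hamming_distance(a, b, weights):
    """Sum of per-bit weights over the positions in which a and b differ."""
    diff = a ^ b
    return sum(w for i, w in enumerate(weights) if (diff >> i) & 1)


query = 0b0000
codes = [0b1000, 0b0001, 0b0110]
weights = [0.1, 0.2, 0.3, 0.9]  # hypothetical query-adaptive weights, bit 0 first

hd = [hamming_distance(query, c) for c in codes]
whd = [weighted_hamming_distance(query, c, weights) for c in codes]
ranked_by_whd = sorted(codes, key=lambda c: weighted_hamming_distance(query, c, weights))
```

Here `0b1000` and `0b0001` are both at Hamming distance 1 from the query, but WHD ranks `0b0001` first because the differing bit carries a smaller weight.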

arXiv.org e-Print Archive

CHAP : Enabling efficient hardware-based multiple hash schemes for IP lookup

Authors: Cho S, Demetriades S, Hanna M, Melhem R
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Building a high-performance IP lookup engine remains a challenge due to increasingly stringent throughput requirements and the growing size of IP tables. An emerging approach for IP lookup is the use of a set-associative memory architecture, which is basically a hardware implementation of an open addressing hash table with the property that each row of the hash table can be searched in one memory cycle. While open addressing hash tables, in general, provide good average-case search performance, their memory utilization and worst-case performance can degrade quickly due to bucket overflows. This paper presents a new simple hash probing scheme called CHAP (Content-based HAsh Probing) that tackles the hash overflow problem. In CHAP, the probing is based on the content of the hash table, thus avoiding the classical side effects of probing. We show through experiments with real IP tables how CHAP can effectively deal with the overflow. © IFIP International Federation for Information Processing 2009

CiteSeerX

D-Scholarship@Pitt

Fast and Simple Compact Hashing via Bucketing

Authors: Koppl Dominik, Puglisi Simon J., Raman Rajeev
Publication date: 01/09/2022

Compact hash tables store a set S of n key-value pairs, where the keys are from the universe U = {0, ..., u - 1} and the values are v-bit integers, in close to B(u, n) + nv bits of space, where B(u, n) = log2 C(u, n), with C(u, n) the binomial coefficient "u choose n", is the information-theoretic lower bound for representing the set of keys in S, and support the operations insert, delete and lookup on S. Compact hash tables have received significant attention in recent years, and approaches dating back to Cleary [IEEE T. Comput, 1984], as well as more recent ones, have been implemented and used in a number of applications. However, the wins on space usage of these approaches are outweighed by their slowness relative to conventional hash tables. In this paper, we demonstrate that compact hash tables based upon a simple idea of bucketing practically outperform existing compact hash table implementations in terms of memory usage and construction time, and existing fast hash table implementations in terms of memory usage (and sometimes also in terms of construction time), while having competitive query times. A related notion is that of a compact hash ID map, which stores a set Ŝ of n keys from U, and implicitly associates each key in Ŝ with a unique value (its ID), chosen by the data structure itself, which is an integer of magnitude O(n), and supports inserts and lookups on S, while using space close to B(u, n) bits. One of our approaches is suitable for use as a compact hash ID map. Peer reviewed
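The space bound quoted in this abstract is easy to evaluate for small parameters. The sketch below computes B(u, n), read here as log2 of the binomial coefficient "u choose n", and compares it with the naive cost of storing each key in full. The function names are illustrative assumptions, and the example values are arbitrary.

```python
import math


def lower_bound_bits(u, n):
    """B(u, n) = log2 C(u, n): bits needed to identify an n-subset of a u-universe."""
    return math.log2(math.comb(u, n))


def naive_bits(u, n):
    """Bits used when each of the n keys is stored with ceil(log2 u) bits."""
    return n * math.ceil(math.log2(u))


u, n = 16, 2
bound = lower_bound_bits(u, n)   # log2(120), a little under 7 bits
naive = naive_bits(u, n)         # 2 keys * 4 bits each
```

The gap between the two numbers is the saving a compact hash table aims to approach; `math.comb` is exact but becomes slow for very large `u` and `n`, where an approximation based on `math.lgamma` would be a reasonable substitute.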

Helsingin yliopiston digitaalinen arkisto