Search CORE

9 research outputs found

More Analysis of Double Hashing for Balanced Allocations

Author: Mitzenmacher Michael
Publication venue
Publication date: 02/03/2015
Field of study

With double hashing, for a key

x

, one generates two hash values

f(x)

and

g(x)

, and then uses combinations

(f(x) +i g(x)) \bmod n

for

i=0,1,2,...

to generate multiple hash values in the range

[0,n-1]

from the initial two. For balanced allocations, keys are hashed into a hash table where each bucket can hold multiple keys, and each key is placed in the least loaded of

d

choices. It has been shown previously that asymptotically the performance of double hashing and fully random hashing is the same in the balanced allocation paradigm using fluid limit methods. Here we extend a coupling argument used by Lueker and Molodowitch to show that double hashing and ideal uniform hashing are asymptotically equivalent in the setting of open address hash tables to the balanced allocation setting, providing further insight into this phenomenon. We also discuss the potential for and bottlenecks limiting the use this approach for other multiple choice hashing schemes.Comment: 13 pages ; current draft ; will be submitted to conference shortl

arXiv.org e-Print Archive

Crossref

Balanced Allocations and Double Hashing

Author: Alon N.
Dillinger P.C.
Heileman G.
Mitzenmacher M.
Mitzenmacher M.
Richa A.
Vadhan S.
Vvedenskaya N.D.
Publication venue
Publication date: 29/01/2014
Field of study

Double hashing has recently found more common usage in schemes that use multiple hash functions. In double hashing, for an item

x

, one generates two hash values

f(x)

and

g(x)

, and then uses combinations

(f(x) +k g(x)) \bmod n

for

k=0,1,2,...

to generate multiple hash values from the initial two. We first perform an empirical study showing that, surprisingly, the performance difference between double hashing and fully random hashing appears negligible in the standard balanced allocation paradigm, where each item is placed in the least loaded of

d

choices, as well as several related variants. We then provide theoretical results that explain the behavior of double hashing in this context.Comment: Further updated, small improvements/typos fixe

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open-addressing hashing with unequal-probability keys

Author: Mastrorocco G
Rossi D
Salvini R
Seddaiu M
Vanneschi C
Publication venue: Published by Elsevier Inc.
Publication date: 31/12/1980
Field of study

This paper describes the use of a drone in collecting data for mapping discontinuities within a marble quarry. A topographic survey was carried out in order to guarantee high spatial accuracy in the exterior orientation of images. Photos were taken close to the slopes and at different angles, depending on the orientation of the quarry walls. This approach was used to overcome the problem of shadow areas and to obtain detailed information on any feature desired. Dense three-dimensional (3D) point clouds obtained through image processing were used to rebuild the quarry geometry. Discontinuities were then mapped deterministically in detail. Joint attitude interpretation was not always possible due to the regular shape of the cut walls; for every discontinuity set we therefore also mapped the uncertainty. This, together with additional fracture characteristics, was used to build 3D discrete fracture network models. Preliminary results reveal the advantage of modern photogrammetric systems in producing detailed orthophotos; the latter allow accurate mapping in areas difficult to access (one of the main limitations of traditional techniques). The results highlight the benefits of integrating photogrammetric data with those collected through classical methods: the resulting knowledge of the site is crucially important in instability analyses involving numerical modelling.Part of the present study was undertaken within the framework of the Italian National Research Project PRIN2009, funded by the Ministry of Education, Universities and Research, which involves the collaboration between the University of Siena, ‘La Sapienza’ University of Rome, and USL1 of Massa and Carrara (Mining Engineering Operative Unit – Department of Prevention). The authors acknowledge M. Pellegri and D. Gullì (USL1, Mining Engineering Operative Unit – Department of Prevention), M. Ferrari, M. Profeti and V. Carnicelli (Cooperativa Cavatori Lorano), X. Chaoshui and P.A. Dowd (School of Civil, Environmental and Mining Engineering, University of Adelaide, South Australia) and M. Bocci (Geographike) for their support of this research

Elsevier - Publisher Connector

Crossref

Archivio della Ricerca - Università degli Studi di Siena

Directory of Open Access Journals

Open Research Exeter

Double Hashing Thresholds via Local Weak Convergence

Author: Leconte Mathieu
Publication venue: HAL CCSD
Publication date: 02/10/2013
Field of study

International audienceA lot of interest has recently arisen in the analysis of multiple-choice "cuckoo hashing" schemes. In this context, a main performance criterion is the load threshold under which the hashing scheme is able to build a valid hashtable with high probability in the limit of large systems; various techniques have successfully been used to answer this question (differential equations, combinatorics, cavity method) for increasing levels of generality of the model. However, the hashing scheme analysed so far is quite utopic in that it requires to generate a lot of independent, fully random choices. Schemes with reduced randomness exists, such as "double hashing", which is expected to provide similar asymptotic results as the ideal scheme, yet they have been more resistant to analysis so far. In this paper, we point out that the approach via the cavity method extends quite naturally to the analysis of double hashing and allows to compute the corresponding threshold. The path followed is to show that the graph induced by the double hashing scheme has the same local weak limit as the one obtained with full randomness

Crossref

INRIA a CCSD electronic archive server

Dynamic Space Efficient Hashing

Author: Maier Tobias
Sanders Peter
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th Annual European Symposium on Algorithms (ESA 2017)
Publication date: 01/01/2017
Field of study

We consider space efficient hash tables that can grow and shrink dynamically and are always highly space efficient, i.e., their space consumption is always close to the lower bound even while growing and when taking into account storage that is only needed temporarily. None of the traditionally used hash tables have this property. We show how known approaches like linear probing and bucket cuckoo hashing can be adapted to this scenario by subdividing them into many subtables or using virtual memory overcommitting. However, these rather straightforward solutions suffer from slow amortized insertion times due to frequent reallocation in small increments. Our main result is DySECT (Dynamic Space Efficient Cuckoo Table) which avoids these problems. DySECT consists of many subtables which grow by doubling their size. The resulting inhomogeneity in subtable sizes is equalized by the flexibility available in bucket cuckoo hashing where each element can go to several buckets each of which containing several cells. Experiments indicate that DySECT works well with load factors up to 98%. With up to 2.7 times better performance than the next best solution

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees

Author: De Sensi Daniele
Di Girolamo Salvatore
Hoefler Torsten
Molero Edgar Costa
Vanbever Laurent
Publication venue
Publication date: 28/09/2023
Field of study

The allreduce operation is an essential building block for many distributed applications, ranging from the training of deep learning models to scientific computing. In an allreduce operation, data from multiple hosts is aggregated together and then broadcasted to each host participating in the operation. Allreduce performance can be improved by a factor of two by aggregating the data directly in the network. Switches aggregate data coming from multiple ports before forwarding the partially aggregated result to the next hop. In all existing solutions, each switch needs to know the ports from which it will receive the data to aggregate. However, this forces packets to traverse a predefined set of switches, making these solutions prone to congestion. For this reason, we design Canary, the first congestion-aware in-network allreduce algorithm. Canary uses load balancing algorithms to forward packets on the least congested paths. Because switches do not know from which ports they will receive the data to aggregate, they use timeouts to aggregate the data in a best-effort way. We develop a P4 Canary prototype and evaluate it on a Tofino switch. We then validate Canary through simulations on large networks, showing performance improvements up to 40% compared to the state-of-the-art

arXiv.org e-Print Archive

Recommended from our members

Analysis and design of algorithms : double hashing and parallel graph searching

Author: Molodowitch Mariko
Publication venue: eScholarship, University of California
Publication date: 01/01/1990
Field of study

The following is in two parts, corresponding to the two separate topics in the dissertation.Probabilistic Analysis of Double HashingIn [GS78], a deep and elegant analysis shows that double hashing is asymptotically equivalent to the ideal uniform hashing up to a load factor of about 0.319. In this paper we show how a resampling technique can be used to develop a surprisingly simple proof of the result that this equivalence holds for load factors arbitrarily close to 1.Parallel Depth First Search of Planar Directed Acyclic GraphsIn 1988, Kao [Kao88] presented the first NC algorithm for the depth first search of a directed planar graph. Recently, Kao and Klein [KK90] reduced the number of processors required from O(n^4) to linear, but the time bound is O(log^8 n).We present an algorithm for the depth first search of a planar directed acyclic graph with k sources using O(n) processors and O(log k log n) time on a CRCW PRAM model. For planar dags with a single source and a single sink, we present a simple optimal algorithm which gives the depth first search in O(log n) time with O(n/log n) processors on an EREW PRAM. For a single-source multiple-sink planar dag, we have an O(log n) time O(n) processor EREW algorithm. The EREW algorithms assume that the embedding is given. A simplified variant of the depth first search of a multisource planar dag can be used to solve the single source reachability problem for a planar directed acyclic graph in O(log^2 n) time and O(n) processors on an CRCW PRAM. Since an O(log^4 n) algorithm for this problem is used as a subroutine by Kao and Klein in their depth first search for the general planar directed graph, this will lower their time bound by a factor of log^2 n. Our work uses the concept of a planar Euler tour depth first search, a depth first search in which the Euler tour around the tree is planar and crosses no tree edge. This concept may prove to be of use in other parallel algorithms for planar graphs

eScholarship - University of California

Random hypergraphs for hashing-based data structures

Author: Walzer Stefan
Publication venue
Publication date: 01/01/2020
Field of study

This thesis concerns dictionaries and related data structures that rely on providing several random possibilities for storing each key. Imagine information on a set S of m = |S| keys should be stored in n memory locations, indexed by [n] = {1,…,n}. Each object x [ELEMENT OF] S is assigned a small set e(x) [SUBSET OF OR EQUAL TO] [n] of locations by a random hash function, independent of other objects. Information on x must then be stored in the locations from e(x) only. It is possible that too many objects compete for the same locations, in particular if the load c = m/n is high. Successfully storing all information may then be impossible. For most distributions of e(x), however, success or failure can be predicted very reliably, since the success probability is close to 1 for loads c less than a certain load threshold c^* and close to 0 for loads greater than this load threshold. We mainly consider two types of data structures: • A cuckoo hash table is a dictionary data structure where each key x [ELEMENT OF] S is stored together with an associated value f(x) in one of the memory locations with an index from e(x). The distribution of e(x) is controlled by the hashing scheme. We analyse three known hashing schemes, and determine their exact load thresholds. The schemes are unaligned blocks, double hashing and a scheme for dynamically growing key sets. • A retrieval data structure also stores a value f(x) for each x [ELEMENT OF] S. This time, the values stored in the memory locations from e(x) must satisfy a linear equation that characterises the value f(x). The resulting data structure is extremely compact, but unusual. It cannot answer questions of the form “is y [ELEMENT OF] S?”. Given a key y it returns a value z. If y [ELEMENT OF] S, then z = f(y) is guaranteed, otherwise z may be an arbitrary value. We consider two new hashing schemes, where the elements of e(x) are contained in one or two contiguous blocks. This yields good access times on a word RAM and high cache efficiency. An important question is whether these types of data structures can be constructed in linear time. The success probability of a natural linear time greedy algorithm exhibits, once again, threshold behaviour with respect to the load c. We identify a hashing scheme that leads to a particularly high threshold value in this regard. In the mathematical model, the memory locations [n] correspond to vertices, and the sets e(x) for x [ELEMENT OF] S correspond to hyperedges. Three properties of the resulting hypergraphs turn out to be important: peelability, solvability and orientability. Therefore, large parts of this thesis examine how hyperedge distribution and load affects the probabilities with which these properties hold and derive corresponding thresholds. Translated back into the world of data structures, we achieve low access times, high memory efficiency and low construction times. We complement and support the theoretical results by experiments.Diese Arbeit behandelt Wörterbücher und verwandte Datenstrukturen, die darauf aufbauen, mehrere zufällige Möglichkeiten zur Speicherung jedes Schlüssels vorzusehen. Man stelle sich vor, Information über eine Menge S von m = |S| Schlüsseln soll in n Speicherplätzen abgelegt werden, die durch [n] = {1,…,n} indiziert sind. Jeder Schlüssel x [ELEMENT OF] S bekommt eine kleine Menge e(x) [SUBSET OF OR EQUAL TO] [n] von Speicherplätzen durch eine zufällige Hashfunktion unabhängig von anderen Schlüsseln zugewiesen. Die Information über x darf nun ausschließlich in den Plätzen aus e(x) untergebracht werden. Es kann hierbei passieren, dass zu viele Schlüssel um dieselben Speicherplätze konkurrieren, insbesondere bei hoher Auslastung c = m/n. Eine erfolgreiche Speicherung der Gesamtinformation ist dann eventuell unmöglich. Für die meisten Verteilungen von e(x) lässt sich Erfolg oder Misserfolg allerdings sehr zuverlässig vorhersagen, da für Auslastung c unterhalb eines gewissen Auslastungsschwellwertes c* die Erfolgswahrscheinlichkeit nahezu 1 ist und für c jenseits dieses Auslastungsschwellwertes nahezu 0 ist. Hauptsächlich werden wir zwei Arten von Datenstrukturen betrachten: • Eine Kuckucks-Hashtabelle ist eine Wörterbuchdatenstruktur, bei der jeder Schlüssel x [ELEMENT OF] S zusammen mit einem assoziierten Wert f(x) in einem der Speicherplätze mit Index aus e(x) gespeichert wird. Die Verteilung von e(x) wird hierbei vom Hashing-Schema festgelegt. Wir analysieren drei bekannte Hashing-Schemata und bestimmen erstmals deren exakte Auslastungsschwellwerte im obigen Sinne. Die Schemata sind unausgerichtete Blöcke, Doppel-Hashing sowie ein Schema für dynamisch wachsenden Schlüsselmengen. • Auch eine Retrieval-Datenstruktur speichert einen Wert f(x) für alle x [ELEMENT OF] S. Diesmal sollen die Werte in den Speicherplätzen aus e(x) eine lineare Gleichung erfüllen, die den Wert f(x) charakterisiert. Die entstehende Datenstruktur ist extrem platzsparend, aber ungewöhnlich: Sie ist ungeeignet um Fragen der Form „ist y [ELEMENT OF] S?“ zu beantworten. Bei Anfrage eines Schlüssels y wird ein Ergebnis z zurückgegeben. Falls y [ELEMENT OF] S ist, so ist z = f(y) garantiert, andernfalls darf z ein beliebiger Wert sein. Wir betrachten zwei neue Hashing-Schemata, bei denen die Elemente von e(x) in einem oder in zwei zusammenhängenden Blöcken liegen. So werden gute Zugriffszeiten auf Word-RAMs und eine hohe Cache-Effizienz erzielt. Eine wichtige Frage ist, ob Datenstrukturen obiger Art in Linearzeit konstruiert werden können. Die Erfolgswahrscheinlichkeit eines naheliegenden Greedy-Algorithmus weist abermals ein Schwellwertverhalten in Bezug auf die Auslastung c auf. Wir identifizieren ein Hashing-Schema, das diesbezüglich einen besonders hohen Schwellwert mit sich bringt. In der mathematischen Modellierung werden die Speicherpositionen [n] als Knoten und die Mengen e(x) für x [ELEMENT OF] S als Hyperkanten aufgefasst. Drei Eigenschaften der entstehenden Hypergraphen stellen sich dann als zentral heraus: Schälbarkeit, Lösbarkeit und Orientierbarkeit. Weite Teile dieser Arbeit beschäftigen sich daher mit den Wahrscheinlichkeiten für das Vorliegen dieser Eigenschaften abhängig von Hashing Schema und Auslastung, sowie mit entsprechenden Schwellwerten. Eine Rückübersetzung der Ergebnisse liefert dann Datenstrukturen mit geringen Anfragezeiten, hoher Speichereffizienz und geringen Konstruktionszeiten. Die theoretischen Überlegungen werden dabei durch experimentelle Ergebnisse ergänzt und gestützt

Digitale Bibliothek Thüringen