125 research outputs found
On randomness in Hash functions
In the talk, we shall discuss quality measures for hash functions used in data structures and algorithms, and survey positive and negative results. (This talk is not about cryptographic hash functions.) For the analysis of algorithms involving hash functions, it is often convenient to assume the hash functions used behave fully randomly; in some cases there is no analysis known that avoids this assumption. In practice, one needs to get by with weaker hash functions that can be generated by randomized algorithms. A well-studied range of applications concern realizations of dynamic dictionaries (linear probing, chained hashing, dynamic perfect hashing, cuckoo hashing and its generalizations) or Bloom filters and their variants. A particularly successful and useful means of classification are Carter and Wegman's universal or k-wise independent classes, introduced in 1977. A natural and widely used approach to analyzing an algorithm involving hash functions is to show that it works if a sufficiently strong universal class of hash functions is used, and to substitute one of the known constructions of such classes. This invites research into the question of just how much independence in the hash functions is necessary for an algorithm to work. Some recent analyses that gave impossibility results constructed rather artificial classes that would not work; other results pointed out natural, widely used hash classes that would not work in a particular application. Only recently it was shown that under certain assumptions on some entropy present in the set of keys even 2-wise independent hash classes will lead to strong randomness properties in the hash values. The negative results show that these results may not be taken as justification for using weak hash classes indiscriminately, in particular for key sets with structure. When stronger independence properties are needed for a theoretical analysis, one may resort to classic constructions. Only in 2003 it was found out how full randomness can be simulated using only linear space overhead (which is optimal). The "split-and-share" approach can be used to justify the full randomness assumption in some situations in which full randomness is needed for the analysis to go through, like in many applications involving multiple hash functions (e.g., generalized versions of cuckoo hashing with multiple hash functions or larger bucket sizes, load balancing, Bloom filters and variants, or minimal perfect hash function constructions). For practice, efficiency considerations beyond constant factors are important. It is not hard to construct very efficient 2-wise independent classes. Using k-wise independent classes for constant k bigger than 3 has become feasible in practice only by new constructions involving tabulation. This goes together well with the quite new result that linear probing works with 5-independent hash functions. Recent developments suggest that the classification of hash function constructions by their degree of independence alone may not be adequate in some cases. Thus, one may want to analyze the behavior of specific hash classes in specific applications, circumventing the concept of k-wise independence. Several such results were recently achieved concerning hash functions that utilize tabulation. In particular if the analysis of the application involves using randomness properties in graphs and hypergraphs (generalized cuckoo hashing, also in the version with a "stash", or load balancing), a hash class combining k-wise independence with tabulation has turned out to be very powerful
Computational Utilities for the Game of Simplicial Nim
Simplicial nim games, a class of impartial games, have very interesting mathematical properties. Winning strategies on a simplicial nim game can be determined by the set of positions in the game whose Sprague-Grundy values are zero (also zero positions). In this work, I provide two major contributions to the study of simplicial nim games. First, I provide a modern and efficient implementation of the Sprague-Grundy function for an arbitrary simplicial complex, and discuss its performance and scope of viability. Secondly, I provide a method to find a simple mathematical expression to model that function if it exists. I show the effectiveness of this method on determining mathematical expressions that classify the set of zero positions onseveral simplicial nim games
Incremental Edge Orientation in Forests
For any forest G = (V, E) it is possible to orient the edges E so that no vertex in V has out-degree greater than 1. This paper considers the incremental edge-orientation problem, in which the edges E arrive over time and the algorithm must maintain a low-out-degree edge orientation at all times. We give an algorithm that maintains a maximum out-degree of 3 while flipping at most O(log log n) edge orientations per edge insertion, with high probability in n. The algorithm requires worst-case time O(log n log log n) per insertion, and takes amortized time O(1). The previous state of the art required up to O(log n / log log n) edge flips per insertion.
We then apply our edge-orientation results to the problem of dynamic Cuckoo hashing. The problem of designing simple families ? of hash functions that are compatible with Cuckoo hashing has received extensive attention. These families ? are known to satisfy static guarantees, but do not come typically with dynamic guarantees for the running time of inserts and deletes. We show how to transform static guarantees (for 1-associativity) into near-state-of-the-art dynamic guarantees (for O(1)-associativity) in a black-box fashion. Rather than relying on the family ? to supply randomness, as in past work, we instead rely on randomness within our table-maintenance algorithm
Recommended from our members
ALGORITHMS FOR MASSIVE, EXPENSIVE, OR OTHERWISE INCONVENIENT GRAPHS
A long-standing assumption common in algorithm design is that any part of the input is accessible at any time for unit cost. However, as we work with increasingly large data sets, or as we build smaller devices, we must revisit this assumption. In this thesis, I present some of my work on graph algorithms designed for circumstances where traditional assumptions about inputs do not apply. 1. Classical graph algorithms require direct access to the input graph and this is not feasible when the graph is too large to fit in memory. For computation on massive graphs we consider the dynamic streaming graph model. Given an input graph defined by as a stream of edge insertions and deletions, our goal is to approximate properties of this graph using space that is sublinear in the size of the stream. In this thesis, I present algorithms for approximating vertex connectivity, hypergraph edge connectivity, maximum coverage, unique coverage, and temporal connectivity in graph streams. 2. In certain applications the input graph is not explicitly represented, but its edges may be discovered via queries which require costly computation or measurement. I present two open-source systems which solve real-world problems via graph algorithms which may access their inputs only through costly edge queries. M ESH is a memory manager which compacts memory efficiently by finding an approximate graph matching subject to stringent time and edge query restrictions. PathCache is an efficiently scalable network measurement platform that outperforms the current state of the art
Applications of Derandomization Theory in Coding
Randomized techniques play a fundamental role in theoretical computer science
and discrete mathematics, in particular for the design of efficient algorithms
and construction of combinatorial objects. The basic goal in derandomization
theory is to eliminate or reduce the need for randomness in such randomized
constructions. In this thesis, we explore some applications of the fundamental
notions in derandomization theory to problems outside the core of theoretical
computer science, and in particular, certain problems related to coding theory.
First, we consider the wiretap channel problem which involves a communication
system in which an intruder can eavesdrop a limited portion of the
transmissions, and construct efficient and information-theoretically optimal
communication protocols for this model. Then we consider the combinatorial
group testing problem. In this classical problem, one aims to determine a set
of defective items within a large population by asking a number of queries,
where each query reveals whether a defective item is present within a specified
group of items. We use randomness condensers to explicitly construct optimal,
or nearly optimal, group testing schemes for a setting where the query outcomes
can be highly unreliable, as well as the threshold model where a query returns
positive if the number of defectives pass a certain threshold. Finally, we
design ensembles of error-correcting codes that achieve the
information-theoretic capacity of a large class of communication channels, and
then use the obtained ensembles for construction of explicit capacity achieving
codes.
[This is a shortened version of the actual abstract in the thesis.]Comment: EPFL Phd Thesi
Randomness Extraction in AC0 and with Small Locality
Randomness extractors, which extract high quality (almost-uniform) random
bits from biased random sources, are important objects both in theory and in
practice. While there have been significant progress in obtaining near optimal
constructions of randomness extractors in various settings, the computational
complexity of randomness extractors is still much less studied. In particular,
it is not clear whether randomness extractors with good parameters can be
computed in several interesting complexity classes that are much weaker than P.
In this paper we study randomness extractors in the following two models of
computation: (1) constant-depth circuits (AC0), and (2) the local computation
model. Previous work in these models, such as [Vio05a], [GVW15] and [BG13],
only achieve constructions with weak parameters. In this work we give explicit
constructions of randomness extractors with much better parameters. As an
application, we use our AC0 extractors to study pseudorandom generators in AC0,
and show that we can construct both cryptographic pseudorandom generators
(under reasonable computational assumptions) and unconditional pseudorandom
generators for space bounded computation with very good parameters.
Our constructions combine several previous techniques in randomness
extractors, as well as introduce new techniques to reduce or preserve the
complexity of extractors, which may be of independent interest. These include
(1) a general way to reduce the error of strong seeded extractors while
preserving the AC0 property and small locality, and (2) a seeded randomness
condenser with small locality.Comment: 62 page
Communication Steps for Parallel Query Processing
We consider the problem of computing a relational query on a large input
database of size , using a large number of servers. The computation is
performed in rounds, and each server can receive only
bits of data, where is a parameter that controls
replication. We examine how many global communication steps are needed to
compute . We establish both lower and upper bounds, in two settings. For a
single round of communication, we give lower bounds in the strongest possible
model, where arbitrary bits may be exchanged; we show that any algorithm
requires , where is the fractional vertex
cover of the hypergraph of . We also give an algorithm that matches the
lower bound for a specific class of databases. For multiple rounds of
communication, we present lower bounds in a model where routing decisions for a
tuple are tuple-based. We show that for the class of tree-like queries there
exists a tradeoff between the number of rounds and the space exponent
. The lower bounds for multiple rounds are the first of their
kind. Our results also imply that transitive closure cannot be computed in O(1)
rounds of communication
- …