Lower Bounds for Oblivious Data Structures
An oblivious data structure is a data structure whose memory access
pattern reveals no information about the operations performed on it. Such data
structures were introduced by Wang et al. [ACM SIGSAC'14] and are intended for
situations where one wishes to store the data structure at an untrusted server.
One way to obtain an oblivious data structure is simply to run a classic data
structure on an oblivious RAM (ORAM). Until very recently, this resulted in an
overhead of $\omega(\lg n)$ for the most natural setting of parameters.
Moreover, a recent lower bound for ORAMs by Larsen and Nielsen [CRYPTO'18] shows
that they always incur an overhead of at least $\Omega(\lg n)$ if used in a
black box manner. To circumvent the $\omega(\lg n)$ overhead, researchers have
instead studied classic data structure problems more directly and have obtained
efficient solutions for many such problems, including stacks, queues, deques,
priority queues and search trees. However, none of these data structures
process operations faster than $\Theta(\lg n)$, leaving open the question of
whether even faster solutions exist. In this paper, we rule out this
possibility by proving $\Omega(\lg n)$ lower bounds for oblivious stacks,
queues, deques, priority queues and search trees.
Comment: To appear at SODA'1
Snapshot-Oblivious RAMs: Sub-Logarithmic Efficiency for Short Transcripts
Oblivious RAM (ORAM) is a powerful technique to prevent harmful data breaches. Despite tremendous progress in improving the concrete performance of ORAM, it remains too slow for use in many practical settings; recent breakthroughs in lower bounds indicate this inefficiency is inherent for ORAM and even some natural relaxations.
This work introduces snapshot-oblivious RAMs, a new secure memory access primitive. Snapshot-oblivious RAMs bypass lower bounds by providing security only for transcripts whose length (call it c) is fixed and known ahead of time. Intuitively, snapshot-oblivious RAMs provide strong security for attacks of short duration, such as the snapshot attacks targeted by many encrypted databases.
We give an ORAM-style definition of this new primitive, and present several constructions. The underlying design principle of our constructions is to store the history of recent operations in a data structure that can be accessed obliviously. We instantiate this paradigm with data structures that remain on the client, giving a snapshot-oblivious RAM with constant bandwidth overhead. We also show how these data structures can be stored on the server and accessed using oblivious memory primitives. Our most efficient instantiation achieves O(log c) bandwidth overhead. By extending recent ORAM lower bounds, we show this performance is asymptotically optimal. Along the way, we define a new hash queue data structure (essentially, a dictionary whose elements can be modified in a first-in-first-out fashion), which may be of independent interest.
Lower Bounds for Oblivious Near-Neighbor Search
We prove an $\Omega(d \lg n / (\lg\lg n)^2)$ lower bound on the dynamic
cell-probe complexity of statistically oblivious
approximate-near-neighbor search (ANN) over the $d$-dimensional
Hamming cube. For the natural setting of $d = \Theta(\log n)$, our result
implies an $\tilde{\Omega}(\lg^2 n)$ lower bound, which is a quadratic
improvement over the highest (non-oblivious) cell-probe lower bound for
ANN. This is the first super-logarithmic unconditional
lower bound for ANN against general (non black-box) data structures.
We also show that any oblivious static data structure for
decomposable search problems (like ANN) can be obliviously dynamized
with $O(\log n)$ overhead in update and query time, strengthening a classic
result of Bentley and Saxe (Algorithmica, 1980).
Comment: 28 page
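To make the dynamization statement concrete, here is a minimal Python sketch of the classic (non-oblivious) logarithmic method associated with Bentley and Saxe. It is not the oblivious construction from the abstract: its memory access pattern plainly depends on the inputs. The names `build_static`, `query_static` and `LogarithmicMethod` are hypothetical, and a one-dimensional nearest-point query over a sorted list stands in for a decomposable search problem.

```python
import bisect


def build_static(points):
    """Build the static structure from scratch (here: just a sorted list)."""
    return sorted(points)


def query_static(struct, q):
    """Return the point in a non-empty sorted list that is closest to q."""
    i = bisect.bisect_left(struct, q)
    candidates = struct[max(0, i - 1): i + 1]
    return min(candidates, key=lambda p: abs(p - q))


class LogarithmicMethod:
    """Insert-only dynamization: level i is empty or holds exactly 2**i points."""

    def __init__(self):
        self.levels = []

    def insert(self, x):
        # Works like incrementing a binary counter: full levels are emptied
        # into the carry, and the first empty level is rebuilt from it.
        carry = [x]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                self.levels[i] = build_static(carry)
                return
            carry.extend(self.levels[i])
            self.levels[i] = None
            i += 1

    def nearest(self, q):
        # Decomposability: the global answer is the best of the per-level answers.
        answers = [query_static(s, q) for s in self.levels if s is not None]
        if not answers:
            raise ValueError("structure is empty")
        return min(answers, key=lambda p: abs(p - q))
```

With n points stored there are at most about log2(n) + 1 non-empty levels, so a query consults logarithmically many static structures, and each point is rebuilt into a larger level at most logarithmically many times.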
Communication-optimal Parallel and Sequential Cholesky Decomposition
Numerical algorithms have two kinds of costs: arithmetic and communication,
by which we mean either moving data between levels of a memory hierarchy (in
the sequential case) or over a network connecting processors (in the parallel
case). Communication costs often dominate arithmetic costs, so it is of
interest to design algorithms minimizing communication. In this paper we first
extend known lower bounds on the communication cost (both for bandwidth and for
latency) of conventional (O(n^3)) matrix multiplication to Cholesky
factorization, which is used for solving dense symmetric positive definite
linear systems. Second, we compare the costs of various Cholesky decomposition
implementations to these lower bounds and identify the algorithms and data
structures that attain them. In the sequential case, we consider both the
two-level and hierarchical memory models. Combined with prior results in [13,
14, 15], this gives a set of communication-optimal algorithms for O(n^3)
implementations of the three basic factorizations of dense linear algebra: LU
with pivoting, QR and Cholesky. But it goes beyond this prior work on
sequential LU by optimizing communication for any number of levels of memory
hierarchy.
Comment: 29 pages, 2 tables, 6 figure
Online Sorting via Searching and Selection
In this paper, we present a framework based on a simple data structure and
parameterized algorithms for the problems of finding items in an unsorted list
of linearly ordered items based on their rank (selection) or value (search). As
a side-effect of answering these online selection and search queries, we
progressively sort the list. Our algorithms are based on Hoare's Quickselect,
and are parameterized based on the pivot selection method.
For example, if we choose the pivot as the last item in a subinterval, our
framework yields algorithms that will answer q<=n unique selection and/or
search queries in a total of O(n log q) average time. After q=\Omega(n) queries
the list is sorted. Each repeated selection query takes constant time, and each
repeated search query takes O(log n) time. The two query types can be
interleaved freely. By plugging different pivot selection methods into our
framework, these results can, for example, become randomized expected time or
deterministic worst-case time. Our methods are easy to implement, and we show
they perform well in practice.
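The idea of sorting as a side effect of selection queries can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the authors' data structure: the class name `OnlineSelector` and the sorted list of fixed positions are hypothetical choices, only selection (not search by value) is shown, and the bookkeeping here does not reproduce the running times stated in the abstract. The pivot is the last item of the current subinterval, as in the abstract's example.

```python
import bisect


class OnlineSelector:
    """Answer rank queries with Quickselect, remembering every pivot position."""

    def __init__(self, items):
        self.a = list(items)
        # Sorted positions whose item is already in its final sorted place.
        # The sentinels -1 and len(a) bound the whole list.
        self.fixed = [-1, len(self.a)]

    def _partition(self, lo, hi):
        """Lomuto partition of a[lo..hi] around a[hi]; return the pivot's final index."""
        a = self.a
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i

    def select(self, k):
        """Return the item of rank k (0-based) and leave the list a bit more sorted."""
        if not 0 <= k < len(self.a):
            raise IndexError("rank out of range")
        idx = bisect.bisect_left(self.fixed, k)
        if self.fixed[idx] == k:
            return self.a[k]  # position k was fixed by an earlier query
        # Restrict the search to the unsorted gap between neighbouring fixed positions.
        lo = self.fixed[idx - 1] + 1
        hi = self.fixed[idx] - 1
        while True:
            p = self._partition(lo, hi)
            bisect.insort(self.fixed, p)
            if p == k:
                return self.a[k]
            if k < p:
                hi = p - 1
            else:
                lo = p + 1
```

Each query only partitions inside the gap that contains the requested rank, so earlier work is never redone, and once every position has been fixed the underlying list is fully sorted.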
Structure-Aware Sampling: Flexible and Accurate Summarization
In processing large quantities of data, a fundamental problem is to obtain a
summary which supports approximate query answering. Random sampling yields
flexible summaries which naturally support subset-sum queries with unbiased
estimators and well-understood confidence bounds.
Classic sample-based summaries, however, are designed for arbitrary subset
queries and are oblivious to the structure in the set of keys. The particular
structure, such as hierarchy, order, or product space (multi-dimensional),
makes range queries much more relevant for most analysis of the data.
Dedicated summarization algorithms for range-sum queries have also been
extensively studied. They can outperform existing sampling schemes in terms of
accuracy on range queries per summary size. Their accuracy, however, rapidly
degrades when, as is often the case, the query spans multiple ranges. They are
also less flexible, being targeted at range-sum queries alone, and are often
quite costly to build and use.
In this paper we propose and evaluate variance optimal sampling schemes that
are structure-aware. These summaries improve over the accuracy of existing
structure-oblivious sampling schemes on range queries while retaining the
benefits of sample-based summaries: flexible summaries, with high accuracy on
both range queries and arbitrary subset queries.
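For context, the kind of structure-oblivious baseline the abstract refers to can be sketched in a few lines of Python. The sketch below is not the structure-aware scheme proposed in the paper; it shows plain threshold (Poisson) sampling with a Horvitz-Thompson style estimator for subset sums. The function names `threshold_sample` and `estimate_subset_sum`, and the threshold parameter `tau`, are hypothetical.

```python
import random


def threshold_sample(weights, tau, rng):
    """Keep each key independently with probability min(1, w / tau).

    Returns a dict mapping each sampled key to its adjusted weight w / p,
    which equals max(w, tau) up to floating-point rounding.
    """
    sample = {}
    for key, w in weights.items():
        p = min(1.0, w / tau)
        if rng.random() < p:
            sample[key] = w / p
    return sample


def estimate_subset_sum(sample, predicate):
    """Unbiased estimate of the total weight of the keys selected by predicate."""
    return sum(adj for key, adj in sample.items() if predicate(key))
```

Because every key is sampled independently of its position in any hierarchy or order, the keys that fall inside a given range are not spread evenly, which is the source of error on range queries that structure-aware schemes aim to reduce.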