27 research outputs found
Data-Oblivious Graph Algorithms in Outsourced External Memory
Motivated by privacy preservation for outsourced data, data-oblivious
external memory is a computational framework where a client performs
computations on data stored at a semi-trusted server in a way that does not
reveal her data to the server. This approach facilitates collaboration and
reliability over traditional frameworks, and it provides privacy protection,
even though the server has full access to the data and he can monitor how it is
accessed by the client. The challenge is that even if data is encrypted, the
server can learn information based on the client data access pattern; hence,
access patterns must also be obfuscated. We investigate privacy-preserving
algorithms for outsourced external memory that are based on the use of
data-oblivious algorithms, that is, algorithms where each possible sequence of
data accesses is independent of the data values. We give new efficient
data-oblivious algorithms in the outsourced external memory model for a number
of fundamental graph problems. Our results include new data-oblivious
external-memory methods for constructing minimum spanning trees, performing
various traversals on rooted trees, answering least common ancestor queries on
trees, computing biconnected components, and forming open ear decompositions.
None of our algorithms make use of constant-time random oracles.Comment: 20 page
Worst-Case Optimal Algorithms for Parallel Query Processing
In this paper, we study the communication complexity for the problem of
computing a conjunctive query on a large database in a parallel setting with
servers. In contrast to previous work, where upper and lower bounds on the
communication were specified for particular structures of data (either data
without skew, or data with specific types of skew), in this work we focus on
worst-case analysis of the communication cost. The goal is to find worst-case
optimal parallel algorithms, similar to the work of [18] for sequential
algorithms.
We first show that for a single round we can obtain an optimal worst-case
algorithm. The optimal load for a conjunctive query when all relations have
size equal to is , where is a new query-related
quantity called the edge quasi-packing number, which is different from both the
edge packing number and edge cover number of the query hypergraph. For multiple
rounds, we present algorithms that are optimal for several classes of queries.
Finally, we show a surprising connection to the external memory model, which
allows us to translate parallel algorithms to external memory algorithms. This
technique allows us to recover (within a polylogarithmic factor) several recent
results on the I/O complexity for computing join queries, and also obtain
optimal algorithms for other classes of queries
On the Computational Complexity of MapReduce
In this paper we study MapReduce computations from a complexity-theoretic
perspective. First, we formulate a uniform version of the MRC model of Karloff
et al. (2010). We then show that the class of regular languages, and moreover
all of sublogarithmic space, lies in constant round MRC. This result also
applies to the MPC model of Andoni et al. (2014). In addition, we prove that,
conditioned on a variant of the Exponential Time Hypothesis, there are strict
hierarchies within MRC so that increasing the number of rounds or the amount of
time per processor increases the power of MRC. To the best of our knowledge we
are the first to approach the MapReduce model with complexity-theoretic
techniques, and our work lays the foundation for further analysis relating
MapReduce to established complexity classes
Communication Steps for Parallel Query Processing
We consider the problem of computing a relational query on a large input
database of size , using a large number of servers. The computation is
performed in rounds, and each server can receive only
bits of data, where is a parameter that controls
replication. We examine how many global communication steps are needed to
compute . We establish both lower and upper bounds, in two settings. For a
single round of communication, we give lower bounds in the strongest possible
model, where arbitrary bits may be exchanged; we show that any algorithm
requires , where is the fractional vertex
cover of the hypergraph of . We also give an algorithm that matches the
lower bound for a specific class of databases. For multiple rounds of
communication, we present lower bounds in a model where routing decisions for a
tuple are tuple-based. We show that for the class of tree-like queries there
exists a tradeoff between the number of rounds and the space exponent
. The lower bounds for multiple rounds are the first of their
kind. Our results also imply that transitive closure cannot be computed in O(1)
rounds of communication