Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy
We consider a scenario involving computations over a massive dataset stored in a
distributed fashion across multiple workers, which is at the core of distributed
learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework
to simultaneously provide (1) resiliency against stragglers that may prolong
computations; (2) security against Byzantine (or malicious) workers that
deliberately modify the computation for their benefit; and (3)
(information-theoretic) privacy of the dataset amidst possible collusion of
workers. LCC, which leverages the well-known Lagrange polynomial to create
computation redundancy in a novel coded form across workers, can be applied to
any computation scenario in which the function of interest is an arbitrary
multivariate polynomial of the input dataset, hence covering many computations
of interest in machine learning. LCC significantly generalizes prior works to
go beyond linear computations. It also enables secure and private computing in
distributed settings, improving the computation and communication efficiency of
the state-of-the-art. Furthermore, we prove the optimality of LCC by showing
that it achieves the optimal tradeoff between resiliency, security, and
privacy, i.e., in terms of tolerating the maximum number of stragglers and
adversaries, and providing data privacy against the maximum number of colluding
workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the
conventional uncoded implementation of distributed least-squares linear
regression by up to , and also achieves a
- speedup over the state-of-the-art straggler
mitigation strategies.
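The Lagrange encoding that LCC is built on can be illustrated end to end on a toy instance. The sketch below is not the paper's implementation: it uses exact rational arithmetic, two scalar data chunks, the polynomial f(x) = x^2, and interpolation/evaluation points chosen only for readability. Since f(u(z)) has degree deg(f)·(k−1) = 2, any 3 of the N = 4 worker results suffice to decode, so one straggler is tolerated.

```python
from fractions import Fraction

def lagrange_eval(points, x):
    """Evaluate the unique polynomial through `points` [(xi, yi)] at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# Dataset split into k=2 chunks; f is a degree-2 polynomial.
X = [Fraction(3), Fraction(5)]
f = lambda v: v * v
alphas = [1, 2]       # interpolation points attached to the data chunks
betas = [3, 4, 5, 6]  # evaluation points handed to N=4 workers

# Encoding: worker j receives u(beta_j), where u interpolates (alpha_i, X_i).
encoded = [lagrange_eval(list(zip(alphas, X)), b) for b in betas]

# Each worker applies f to its encoded share.
results = [f(e) for e in encoded]

# Decoding: drop worker 0 (a straggler) and interpolate f(u(z)) from the
# remaining 3 results, then read off f(X_i) = f(u(alpha_i)).
survivors = list(zip(betas, results))[1:]
decoded = [lagrange_eval(survivors, a) for a in alphas]
assert decoded == [f(x) for x in X]  # recovers f(X_1), f(X_2)
```

The same recipe extends to vector-valued chunks and higher-degree polynomials; only the number of worker results needed for interpolation changes.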
How to Securely Compute the Modulo-Two Sum of Binary Sources
In secure multiparty computation, mutually distrusting users in a network
want to collaborate to compute functions of data which is distributed among the
users. The users should not learn any additional information about the data of
others than what they may infer from their own data and the functions they are
computing. Previous works have mostly considered the worst case context (i.e.,
without assuming any distribution for the data); Lee and Abbe (2014) is a
notable exception. Here, we study the average case (i.e., we work with a
distribution on the data) where correctness and privacy is only desired
asymptotically.
For concreteness and simplicity, we consider a secure version of the function
computation problem of K\"orner and Marton (1979) where two users observe a
doubly symmetric binary source with parameter p and the third user wants to
compute the XOR. We show that the amount of communication and randomness
resources required depends on the level of correctness desired. When zero-error
and perfect privacy are required, the results of Data et al. (2014) show that
it can be achieved if and only if a total rate of 1 bit is communicated between
every pair of users and private randomness at the rate of 1 is used up. In
contrast, we show here that, if we only want the probability of error to vanish
asymptotically in block length, it can be achieved by a lower rate (binary
entropy of p) for all the links and for private randomness; this also
guarantees perfect privacy. We also show that no smaller rates are possible
even if privacy is only required asymptotically.

Comment: 6 pages, 1 figure, extended version of submission to IEEE Information
Theory Workshop, 201
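The rate saving of the asymptotic scheme over the zero-error one is exactly the gap between the binary entropy h(p) and 1 bit per link. A few lines make that gap concrete (the function name `h` is our own, not from the paper):

```python
import math

def h(p):
    """Binary entropy in bits: h(p) = -p log2(p) - (1-p) log2(1-p)."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Zero-error/perfect-privacy requires rate 1 on every link; the
# vanishing-error scheme needs only h(p) per link and for randomness.
for p in (0.05, 0.1, 0.25):
    print(f"p={p}: h(p)={h(p):.3f} bits vs 1 bit")
```

For strongly correlated sources (small p), the saving is substantial, e.g. h(0.05) ≈ 0.286 bits per link instead of 1.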
Finding the Median (Obliviously) with Bounded Space
We prove that any oblivious algorithm using space to find the median of a
list of integers from requires time . This bound also applies to the problem of determining whether the median
is odd or even. It is nearly optimal since Chan, following Munro and Raman, has
shown that there is a (randomized) selection algorithm using only
registers, each of which can store an input value or -bit counter,
that makes only passes over the input. The bound also implies
a size lower bound for read-once branching programs computing the low order bit
of the median and implies the analog of for length oblivious branching
programs.
Learning to Prune: Speeding up Repeated Computations
It is common to encounter situations where one must solve a sequence of
similar computational problems. Running a standard algorithm with worst-case
runtime guarantees on each instance will fail to take advantage of valuable
structure shared across the problem instances. For example, when a commuter
drives from work to home, there are typically only a handful of routes that
will ever be the shortest path. A naive algorithm that does not exploit this
common structure may spend most of its time checking roads that will never be
in the shortest path. More generally, we can often ignore large swaths of the
search space that will likely never contain an optimal solution.
We present an algorithm that learns to maximally prune the search space on
repeated computations, thereby reducing runtime while provably outputting the
correct solution each period with high probability. Our algorithm employs a
simple explore-exploit technique resembling those used in online algorithms,
though our setting is quite different. We prove that, with respect to our model
of pruning search spaces, our approach is optimal up to constant factors.
Finally, we illustrate the applicability of our model and algorithm to three
classic problems: shortest-path routing, string search, and linear programming.
We present experiments confirming that our simple algorithm is effective at
significantly reducing the runtime of solving repeated computations.
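The explore-exploit idea can be sketched on the commuter example from the abstract. Everything below (the route set, the cost model, the explore probability) is an illustrative assumption, not the paper's algorithm: with small probability each period we run the full search; otherwise we search only the routes that have previously been optimal.

```python
import random

random.seed(0)

# Hypothetical repeated problem: each day, pick the cheapest of 100 routes.
# Costs are noisy per query, but only routes 3 and 7 can ever be optimal.
ROUTES = list(range(100))

def cost(route, day):
    base = 1.0 if route in (3, 7) else 10.0
    return base + random.random()

def solve_full(day):
    # Explore: check every route (expensive, 100 cost evaluations).
    return min(ROUTES, key=lambda r: cost(r, day))

def solve_pruned(day, candidates):
    # Exploit: check only routes that have won before (~2 evaluations).
    return min(candidates, key=lambda r: cost(r, day))

seen = set()
for day in range(50):
    if not seen or random.random() < 0.2:  # explore with probability 0.2
        best = solve_full(day)
    else:
        best = solve_pruned(day, seen)
    seen.add(best)
```

After a few exploration rounds, the pruned set stabilizes on the handful of routes that can actually be shortest, and most periods cost only a pruned search; the paper's contribution is choosing the exploration schedule so the output is provably correct each period with high probability.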