MOON: MapReduce On Opportunistic eNvironments
Abstract—MapReduce offers a flexible programming model for processing and generating large data sets on dedicated resources, where only a small fraction of such resources are ever unavailable at any given time. In contrast, when MapReduce is run on volunteer computing systems, which opportunistically harness idle desktop computers via frameworks like Condor, it delivers poor performance due to the volatility of the resources, in particular the high rate of node unavailability. Specifically, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. The adaptive task and data scheduling algorithms in MOON distinguish between (1) different types of MapReduce data and (2) different types of node outages in order to strategically place tasks and data on both volatile and dedicated nodes. Our tests demonstrate that MOON can deliver a 3-fold performance improvement over Hadoop in volatile, volunteer computing environments.
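As a hypothetical illustration of the kind of placement policy such hybrid scheduling implies, the sketch below decides replica counts for a data block depending on its type and on node availability. The function names and the simple availability model are our assumptions for illustration, not MOON's actual algorithms.

```python
import math

# Hypothetical sketch of availability-aware replica placement on a hybrid
# volatile/dedicated cluster; the threshold model below is an illustrative
# assumption, not MOON's actual scheduling algorithm.

def volatile_replicas_needed(p_unavail, target=0.99):
    """Smallest k such that at least one of k volatile replicas is
    available with probability >= target, i.e. 1 - p_unavail**k >= target."""
    return math.ceil(math.log(1 - target) / math.log(p_unavail))

def place_block(data_type, p_unavail, target=0.99):
    """Decide replica counts for one data block.
    Input/output data is anchored on a dedicated node, since losing it is
    unrecoverable; intermediate data can be regenerated by rerunning tasks,
    so it is kept on volatile nodes, replicated to meet the target."""
    if data_type in ("input", "output"):
        # One reliable anchor copy plus one volatile copy for data locality.
        return {"dedicated": 1, "volatile": 1}
    return {"dedicated": 0,
            "volatile": volatile_replicas_needed(p_unavail, target)}
```

For example, with each volatile node unavailable 40% of the time, intermediate data needs six volatile replicas to reach 99% availability, while input/output data always keeps its dedicated anchor copy.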
Communication-Computation Efficient Gradient Coding
This paper develops coding techniques to reduce the running time of
distributed learning tasks. It characterizes the fundamental tradeoff for
computing gradients (and, more generally, vector summations) in terms of three
parameters: computation load, straggler tolerance, and communication cost. It
further gives an explicit coding scheme that achieves the optimal tradeoff
based on recursive polynomial constructions, coding both across data subsets
and across vector components. As a result, the proposed scheme minimizes the
running time for gradient computations. Implementations are made on Amazon EC2
clusters using Python with the mpi4py package. Results show that the proposed
scheme maintains the same generalization error while reducing the running time
compared to both uncoded schemes and prior coded schemes that focus only on
stragglers (Tandon et al., ICML 2017).
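To make the straggler-coding idea concrete, here is a minimal sketch of gradient coding in the style of the cited Tandon et al. scheme: three workers, each computing a coded combination of two partial gradients, so that any two workers suffice to recover the full gradient. The specific encoding matrix `B` and the NumPy decoding step are illustrative assumptions, not this paper's recursive polynomial construction.

```python
import numpy as np

# Gradient-coding sketch: 3 workers, 3 data subsets, tolerates 1 straggler.
# Any two rows of B span the all-ones vector, so any two workers' coded
# messages can be combined into the sum of all partial gradients.
B = np.array([[0.5, 1.0,  0.0],   # worker 0 combines partials 0 and 1
              [0.0, 1.0, -1.0],   # worker 1 combines partials 1 and 2
              [0.5, 0.0,  1.0]])  # worker 2 combines partials 0 and 2

rng = np.random.default_rng(0)
partials = rng.normal(size=(3, 4))    # one partial gradient per data subset
full_gradient = partials.sum(axis=0)  # the quantity we want to recover

coded = B @ partials                  # row i = worker i's coded message

# Suppose worker 1 straggles; decode from the survivors by finding
# coefficients a with a @ B[survivors] equal to the all-ones vector.
survivors = [0, 2]
a, *_ = np.linalg.lstsq(B[survivors].T, np.ones(3), rcond=None)
decoded = a @ coded[survivors]

assert np.allclose(decoded, full_gradient)
```

The same decoding works for any single dropped worker, since every pair of rows of `B` spans the all-ones vector; this paper's contribution additionally codes across the components of each gradient vector to trade off communication cost, which this sketch does not show.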