Network calculus for parallel processing
In this note, we present preliminary results on the use of "network calculus"
for parallel processing systems, specifically MapReduce.
An Optimized Model for MapReduce Based on Hadoop
To address the waste of computing resources caused by the sequential control of the MapReduce execution mechanism on the Hadoop platform, the Fork/Join framework is introduced into the model to make full use of the CPU resources of each node. From the perspective of fine-grained parallel data processing, this paper combines MapReduce with the Fork/Join framework, a parallel multi-threading model, and puts forward a MapReduce+Fork/Join programming model: a distributed, parallel architecture on the Hadoop platform that combines coarse-grained and fine-grained parallelism to support two tiers of parallelism on both shared- and distributed-memory machines. A test was run on a Hadoop cluster composed of four nodes. The experimental results show that this model improves the performance and efficiency of the whole system, and that it suits not only data-intensive tasks but also computing-intensive ones; it is an effective optimization and improvement of the MapReduce model for big data processing.
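The two-tier idea (coarse-grained parallelism across map tasks, fine-grained Fork/Join-style parallelism inside each map task) can be sketched outside Hadoop. The following is a minimal Python analogue of that structure, assuming a word-count style workload; the thread pool inside each map task stands in for the Java Fork/Join worker pool, and the splits, function names, and sequential outer loop are all illustrative simplifications.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

# Hypothetical word-count workload: each "split" is a chunk of input
# handled by one coarse-grained map task.
splits = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "the dog barks",
]

def fine_grained_map(word):
    # Fine-grained unit of work: emit a (word, 1) pair.
    return (word, 1)

def map_task(split):
    # Inside one map task, fan the per-word work out to a thread pool,
    # mimicking the fine-grained Fork/Join layer of the model.
    with ThreadPoolExecutor(max_workers=4) as inner:
        return list(inner.map(fine_grained_map, split.split()))

def reduce_all(pairs):
    # Reduce phase: sum the counts per key.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Coarse-grained layer: one map task per split (run sequentially here
# for simplicity; Hadoop would distribute these across cluster nodes).
intermediate = [p for split in splits for p in map_task(split)]
result = reduce_all(intermediate)
```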
Offline Scheduling of Map and Reduce Tasks on Hadoop Systems
MapReduce is a model for managing massive quantities of data, based on the distributed, parallel execution of tasks over a cluster of machines. Hadoop, an implementation of the MapReduce model, is used to offer Big Data services in the cloud. In this paper, we address the scheduling problem on Hadoop systems. We focus on offline scheduling, formulate the problem as a mathematical model, and use the time-indexed formulation. We aim to capture as many constraints of the MapReduce environment as possible. Solutions to the presented model can serve as a reference for online schedules on small and medium instances. Our work is useful in terms of problem definition: the constraints are based on observations and take into account resource consumption, data locality, heterogeneous machines, and workflow management; the paper thus defines reference bounds for evaluating online models.
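The abstract does not spell out the model, but a standard time-indexed formulation for minimizing total weighted completion time on $m$ identical machines (an assumption for illustration, not necessarily the paper's exact constraint set) uses binary variables $x_{jt} = 1$ iff job $j$ with processing time $p_j$ completes at time $t$:

```latex
\min \sum_{j} w_j \sum_{t} t \, x_{jt}
\quad \text{s.t.} \quad
\sum_{t \ge p_j} x_{jt} = 1 \;\; \forall j, \qquad
\sum_{j} \sum_{s=t}^{t+p_j-1} x_{js} \le m \;\; \forall t, \qquad
x_{jt} \in \{0,1\}.
```

The capacity constraint counts job $j$ as running at time $t$ whenever it completes at some $s \in [t,\, t+p_j-1]$; a Hadoop-specific model would add constraints such as data locality and map-before-reduce precedences on top of this skeleton.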
Scheduling MapReduce Jobs under Multi-Round Precedences
We consider non-preemptive scheduling of MapReduce jobs with multiple tasks
in the practical scenario where each job requires several map-reduce rounds. We
seek to minimize the average weighted completion time and consider scheduling
on identical and unrelated parallel processors. For identical processors, we
present LP-based O(1)-approximation algorithms. For unrelated processors, the
approximation ratio naturally depends on the maximum number of rounds of any
job. Since the number of rounds per job in typical MapReduce algorithms is a
small constant, our scheduling algorithms achieve a small approximation ratio
in practice. For the single-round case, we substantially improve on previously
best known approximation guarantees for both identical and unrelated
processors. Moreover, we conduct an experimental analysis and compare the
performance of our algorithms against a fast heuristic and a lower bound on the
optimal solution, thus demonstrating their promising practical performance.
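The fast heuristic the authors compare against is not specified; a natural baseline for minimizing total weighted completion time on identical processors is list scheduling in weighted-shortest-processing-time (WSPT) order, sketched here (the function name and instance are illustrative, and this is a baseline, not the paper's algorithm):

```python
import heapq

def wspt_schedule(jobs, m):
    """List-schedule jobs on m identical machines in WSPT order.

    jobs: list of (weight, processing_time) pairs.
    Returns the total weighted completion time of the schedule.
    """
    # Sort by Smith's ratio p/w ascending (largest w/p scheduled first).
    order = sorted(jobs, key=lambda j: j[1] / j[0])
    machines = [0.0] * m            # current finish time of each machine
    heapq.heapify(machines)
    total = 0.0
    for w, p in order:
        start = heapq.heappop(machines)   # earliest-free machine
        finish = start + p
        total += w * finish
        heapq.heappush(machines, finish)
    return total
```

For example, `wspt_schedule([(3, 1), (1, 2), (2, 2)], 2)` schedules the ratio-1/3 job first and yields a total weighted completion time of 10.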
Energy Efficient Scheduling of MapReduce Jobs
MapReduce has emerged as a prominent programming model for data-intensive
computation. In this work, we study power-aware MapReduce scheduling in the
speed scaling setting first introduced by Yao et al. [FOCS 1995]. We focus on
the minimization of the total weighted completion time of a set of MapReduce
jobs under a given budget of energy. Using a linear programming relaxation of
our problem, we derive a polynomial time constant-factor approximation
algorithm. We also propose a convex programming formulation that we combine
with standard list scheduling policies, and we evaluate their performance using
simulations.
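In the speed-scaling model of Yao et al., a processor running at speed $s$ consumes power $s^\alpha$ for some constant $\alpha > 1$ (commonly taken around 3 for CMOS). For the simplest case of a single processor executing total work $W$ at one uniform speed under energy budget $E$, the finishing time has a closed form, since time is $W/s$ and energy is $s^\alpha \cdot W/s = W s^{\alpha-1}$. A small numeric check (the values are illustrative, and this single-speed case is far simpler than the paper's multi-job setting):

```python
def min_makespan(work, energy, alpha=3.0):
    """Fastest uniform speed that exactly exhausts the energy budget.

    Power at speed s is s**alpha, so running work W at speed s takes
    W/s time and consumes W * s**(alpha - 1) energy; solve for s.
    """
    s = (energy / work) ** (1.0 / (alpha - 1.0))
    return work / s

# Quadrupling the energy budget halves the makespan when alpha = 3.
t1 = min_makespan(8.0, 2.0)   # s = (2/8)**0.5 = 0.5, so T = 16
t2 = min_makespan(8.0, 8.0)   # s = 1.0, so T = 8
```

This energy-time tradeoff is exactly what makes the budgeted weighted-completion-time problem nontrivial: spending energy on one job's speed leaves less for the others.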
Scheduling MapReduce Jobs and Data Shuffle on Unrelated Processors
We propose constant approximation algorithms for generalizations of the
Flexible Flow Shop (FFS) problem which form a realistic model for
non-preemptive scheduling in MapReduce systems. Our results concern the
minimization of the total weighted completion time of a set of MapReduce jobs
on unrelated processors and improve substantially on the model proposed by
Moseley et al. (SPAA 2011) in two directions. First, we consider each job
consisting of multiple Map and Reduce tasks, as this is the key idea behind
MapReduce computations, and we propose a constant approximation algorithm.
Then, we introduce into our model the crucial cost of the data shuffle phase, i.e.,
the cost for the transmission of intermediate data from Map to Reduce tasks. In
fact, we model this phase by an additional set of Shuffle tasks for each job
and we manage to keep the same approximation ratio when they are scheduled on
the same processors as the corresponding Reduce tasks, and also to obtain a
constant ratio when they are scheduled on different processors. This is the
most general setting of the FFS problem (with a special third stage) for which
a constant approximation ratio is known.
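To make the three-stage structure concrete, here is a hypothetical greedy simulation of the Map → Shuffle → Reduce pipeline for a single job on identical processors. It is a deliberate simplification: the paper works with unrelated processors and proves approximation guarantees, neither of which this sketch attempts, and it assumes strict stage precedence (every Shuffle task waits for all Map tasks).

```python
import heapq

def greedy_stage(tasks, ready, m):
    """Pack one stage's tasks greedily onto m identical processors.

    tasks: list of processing times; ready: earliest start time
    (the completion time of the previous stage).
    Returns the time at which the whole stage completes.
    """
    machines = [ready] * m
    heapq.heapify(machines)
    for p in sorted(tasks, reverse=True):   # longest-task-first packing
        start = heapq.heappop(machines)     # earliest-free machine
        heapq.heappush(machines, start + p)
    return max(machines)

def job_completion(map_tasks, shuffle_tasks, reduce_tasks, m):
    # Strict precedence between stages: Shuffle starts only once all
    # Map tasks finish, and Reduce only once Shuffle finishes.
    t = greedy_stage(map_tasks, 0.0, m)
    t = greedy_stage(shuffle_tasks, t, m)
    return greedy_stage(reduce_tasks, t, m)
```

On two processors, three unit-cost-2 Map tasks finish at time 4, two unit-cost-1 Shuffle tasks at time 5, and a single cost-3 Reduce task at time 8, so `job_completion([2, 2, 2], [1, 1], [3], 2)` returns 8.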