Network calculus for parallel processing
In this note, we present preliminary results on the use of "network calculus"
for parallel processing systems, specifically MapReduce.
An Optimized Model for MapReduce Based on Hadoop
To address the waste of computing resources caused by the sequential control of the MapReduce execution mechanism on the Hadoop platform, the Fork/Join framework is introduced into the model to make full use of the CPU resources of each node. From the perspective of fine-grained parallel data processing, this paper combines MapReduce with the Fork/Join framework, a parallel multi-threading model, and puts forward a MapReduce+Fork/Join programming model: a distributed, parallel architecture on the Hadoop platform that combines coarse-grained and fine-grained parallelism to support two tiers of parallelism on both shared- and distributed-memory machines. A test was run on a Hadoop cluster composed of four nodes. The experimental results show that this model improves the performance and efficiency of the whole system, and that it suits not only data-intensive tasks but also computing-intensive ones; it is an effective optimization and improvement of the MapReduce model for big data processing.
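The two-tier idea (coarse-grained parallelism across map tasks, fine-grained Fork/Join-style parallelism inside each map task) can be sketched outside Hadoop. The following is a minimal Python analogue of that structure, assuming a word-count style workload; the thread pool inside each map task stands in for the Java Fork/Join worker pool, and the splits, function names, and sequential outer loop are all illustrative simplifications.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

# Hypothetical word-count workload: each "split" is a chunk of input
# handled by one coarse-grained map task.
splits = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "the dog barks",
]

def fine_grained_map(word):
    # Fine-grained unit of work: emit a (word, 1) pair.
    return (word, 1)

def map_task(split):
    # Inside one map task, fan the per-word work out to a thread pool,
    # mimicking the fine-grained Fork/Join layer of the model.
    with ThreadPoolExecutor(max_workers=4) as inner:
        return list(inner.map(fine_grained_map, split.split()))

def reduce_all(pairs):
    # Reduce phase: sum the counts per key.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Coarse-grained layer: one map task per split (run sequentially here
# for simplicity; Hadoop would distribute these across cluster nodes).
intermediate = [p for split in splits for p in map_task(split)]
result = reduce_all(intermediate)
```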
Offline Scheduling of Map and Reduce Tasks on Hadoop Systems
MapReduce is a model for managing massive quantities of data, based on the distributed, parallel execution of tasks over a cluster of machines. Hadoop, an implementation of the MapReduce model, is used to offer Big Data services in the cloud. In this paper, we address the scheduling problem on Hadoop systems. We focus on offline scheduling, formulate the problem as a mathematical model, and use the time-indexed formulation. We aim to capture as many constraints of the MapReduce environment as possible. Solutions to the presented model can serve as a reference for online schedules on small and medium instances. Our work is useful in terms of problem definition: the constraints are based on observations and take into account resource consumption, data locality, heterogeneous machines, and workflow management; the paper thus defines reference bounds for evaluating online models.
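The abstract does not spell out the model, but a standard time-indexed formulation for minimizing total weighted completion time on $m$ identical machines (an assumption for illustration, not necessarily the paper's exact constraint set) uses binary variables $x_{jt} = 1$ iff job $j$ with processing time $p_j$ completes at time $t$:

```latex
\min \sum_{j} w_j \sum_{t} t \, x_{jt}
\quad \text{s.t.} \quad
\sum_{t \ge p_j} x_{jt} = 1 \;\; \forall j, \qquad
\sum_{j} \sum_{s=t}^{t+p_j-1} x_{js} \le m \;\; \forall t, \qquad
x_{jt} \in \{0,1\}.
```

The capacity constraint counts job $j$ as running at time $t$ whenever it completes at some $s \in [t,\, t+p_j-1]$; a Hadoop-specific model would add constraints such as data locality and map-before-reduce precedences on top of this skeleton.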
Scheduling MapReduce Jobs under Multi-Round Precedences
We consider non-preemptive scheduling of MapReduce jobs with multiple tasks
in the practical scenario where each job requires several map-reduce rounds. We
seek to minimize the average weighted completion time and consider scheduling
on identical and unrelated parallel processors. For identical processors, we
present LP-based O(1)-approximation algorithms. For unrelated processors, the
approximation ratio naturally depends on the maximum number of rounds of any
job. Since the number of rounds per job in typical MapReduce algorithms is a
small constant, our scheduling algorithms achieve a small approximation ratio
in practice. For the single-round case, we substantially improve on previously
best known approximation guarantees for both identical and unrelated
processors. Moreover, we conduct an experimental analysis and compare the
performance of our algorithms against a fast heuristic and a lower bound on the
optimal solution, thus demonstrating their promising practical performance.
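The fast heuristic the authors compare against is not specified; a natural baseline for minimizing total weighted completion time on identical processors is list scheduling in weighted-shortest-processing-time (WSPT) order, sketched here (the function name and instance are illustrative, and this is a baseline, not the paper's algorithm):

```python
import heapq

def wspt_schedule(jobs, m):
    """List-schedule jobs on m identical machines in WSPT order.

    jobs: list of (weight, processing_time) pairs.
    Returns the total weighted completion time of the schedule.
    """
    # Sort by Smith's ratio p/w ascending (largest w/p scheduled first).
    order = sorted(jobs, key=lambda j: j[1] / j[0])
    machines = [0.0] * m            # current finish time of each machine
    heapq.heapify(machines)
    total = 0.0
    for w, p in order:
        start = heapq.heappop(machines)   # earliest-free machine
        finish = start + p
        total += w * finish
        heapq.heappush(machines, finish)
    return total
```

For example, `wspt_schedule([(3, 1), (1, 2), (2, 2)], 2)` schedules the ratio-1/3 job first and yields a total weighted completion time of 10.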
Energy Efficient Scheduling of MapReduce Jobs
MapReduce has emerged as a prominent programming model for data-intensive
computation. In this work, we study power-aware MapReduce scheduling in the
speed scaling setting first introduced by Yao et al. [FOCS 1995]. We focus on
the minimization of the total weighted completion time of a set of MapReduce
jobs under a given budget of energy. Using a linear programming relaxation of
our problem, we derive a polynomial time constant-factor approximation
algorithm. We also propose a convex programming formulation that we combine
with standard list scheduling policies, and we evaluate their performance using
simulations.
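In the speed-scaling model of Yao et al., a processor running at speed $s$ consumes power $s^\alpha$ for some constant $\alpha > 1$ (commonly taken around 3 for CMOS). For the simplest case of a single processor executing total work $W$ at one uniform speed under energy budget $E$, the finishing time has a closed form, since time is $W/s$ and energy is $s^\alpha \cdot W/s = W s^{\alpha-1}$. A small numeric check (the values are illustrative, and this single-speed case is far simpler than the paper's multi-job setting):

```python
def min_makespan(work, energy, alpha=3.0):
    """Fastest uniform speed that exactly exhausts the energy budget.

    Power at speed s is s**alpha, so running work W at speed s takes
    W/s time and consumes W * s**(alpha - 1) energy; solve for s.
    """
    s = (energy / work) ** (1.0 / (alpha - 1.0))
    return work / s

# Quadrupling the energy budget halves the makespan when alpha = 3.
t1 = min_makespan(8.0, 2.0)   # s = (2/8)**0.5 = 0.5, so T = 16
t2 = min_makespan(8.0, 8.0)   # s = 1.0, so T = 8
```

This energy-time tradeoff is exactly what makes the budgeted weighted-completion-time problem nontrivial: spending energy on one job's speed leaves less for the others.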
Scheduling MapReduce Jobs and Data Shuffle on Unrelated Processors
We propose constant approximation algorithms for generalizations of the
Flexible Flow Shop (FFS) problem which form a realistic model for
non-preemptive scheduling in MapReduce systems. Our results concern the
minimization of the total weighted completion time of a set of MapReduce jobs
on unrelated processors and improve substantially on the model proposed by
Moseley et al. (SPAA 2011) in two directions. First, we consider each job
consisting of multiple Map and Reduce tasks, as this is the key idea behind
MapReduce computations, and we propose a constant approximation algorithm.
Then, we introduce into our model the crucial cost of the data shuffle phase, i.e.,
the cost for the transmission of intermediate data from Map to Reduce tasks. In
fact, we model this phase by an additional set of Shuffle tasks for each job
and we manage to keep the same approximation ratio when they are scheduled on
the same processors as the corresponding Reduce tasks, and also to obtain a
constant ratio when they are scheduled on different processors. This is the
most general setting of the FFS problem (with a special third stage) for which
a constant approximation ratio is known.
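To make the three-stage structure concrete, here is a hypothetical greedy simulation of the Map → Shuffle → Reduce pipeline for a single job on identical processors. It is a deliberate simplification: the paper works with unrelated processors and proves approximation guarantees, neither of which this sketch attempts, and it assumes strict stage precedence (every Shuffle task waits for all Map tasks).

```python
import heapq

def greedy_stage(tasks, ready, m):
    """Pack one stage's tasks greedily onto m identical processors.

    tasks: list of processing times; ready: earliest start time
    (the completion time of the previous stage).
    Returns the time at which the whole stage completes.
    """
    machines = [ready] * m
    heapq.heapify(machines)
    for p in sorted(tasks, reverse=True):   # longest-task-first packing
        start = heapq.heappop(machines)     # earliest-free machine
        heapq.heappush(machines, start + p)
    return max(machines)

def job_completion(map_tasks, shuffle_tasks, reduce_tasks, m):
    # Strict precedence between stages: Shuffle starts only once all
    # Map tasks finish, and Reduce only once Shuffle finishes.
    t = greedy_stage(map_tasks, 0.0, m)
    t = greedy_stage(shuffle_tasks, t, m)
    return greedy_stage(reduce_tasks, t, m)
```

On two processors, three unit-cost-2 Map tasks finish at time 4, two unit-cost-1 Shuffle tasks at time 5, and a single cost-3 Reduce task at time 8, so `job_completion([2, 2, 2], [1, 1], [3], 2)` returns 8.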