Assigning Tasks for Efficiency in Hadoop

By Michael J. Fischer, Xueyuan Su and Yitong Yin

Abstract

In recent years Google’s MapReduce has emerged as a leading large-scale data processing architecture. Adopted for daily use by companies such as Amazon, Facebook, Google, IBM and Yahoo!, and more recently put to use by several universities, it allows parallel processing of huge volumes of data over clusters of machines. Hadoop is a free Java implementation of MapReduce. In Hadoop, files are split into blocks that are replicated and spread over all servers in a network. Each job is also split into many small pieces called tasks. Several tasks are processed on a single server, and a job is not completed until all of its assigned tasks are finished. A crucial factor affecting the completion time of a job is the particular assignment of tasks to servers. Given a placement of the input data over servers, one wishes to find the assignment that minimizes the total completion time. In this paper, an idealized Hadoop model is proposed to investigate the Hadoop task assignment problem. It is shown that there is no feasible algorithm to find the optimal Hadoop task assignment unless P = NP. Assignments computed by the round robin algorithm, which is inspired by the current Hadoop scheduler, are shown to deviate from the optimum by a multiplicative factor in the worst case. A flow-based algorithm is presented that computes assignments that are optimal to within an additive constant.
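
To make the assignment problem concrete, the following is a minimal Python sketch of a toy version of it. It is not the idealized model or the algorithms from the paper: the cost constants `LOCAL_COST` and `REMOTE_COST`, the function names, and the simplified turn-taking rule in `round_robin` are all assumptions introduced for illustration. The sketch assumes a task is cheaper on a server that holds a replica of its data, and it takes the completion time of a job to be the largest total load on any server. The flow-based algorithm is not shown.

```python
from itertools import product

# Assumed toy costs (hypothetical, not taken from the paper).
LOCAL_COST = 1   # task runs on a server that holds its input block
REMOTE_COST = 3  # task runs on a server that must fetch its input block


def makespan(assignment, placement, n_servers):
    """Return the largest per-server load under the toy cost model.

    assignment[t] is the server chosen for task t;
    placement[t] is the set of servers holding task t's data.
    """
    loads = [0] * n_servers
    for task, server in enumerate(assignment):
        loads[server] += LOCAL_COST if server in placement[task] else REMOTE_COST
    return max(loads)


def round_robin(placement, n_servers):
    """Simplified turn-taking rule (an assumption, not Hadoop's scheduler):
    servers take turns, each picking a local unassigned task if one exists,
    otherwise the first unassigned task."""
    unassigned = list(range(len(placement)))
    assignment = [None] * len(placement)
    server = 0
    while unassigned:
        local = [t for t in unassigned if server in placement[t]]
        task = local[0] if local else unassigned[0]
        assignment[task] = server
        unassigned.remove(task)
        server = (server + 1) % n_servers
    return assignment


def brute_force_optimum(placement, n_servers):
    """Try every assignment; exponential, so only usable on tiny instances."""
    return min(
        product(range(n_servers), repeat=len(placement)),
        key=lambda a: makespan(a, placement, n_servers),
    )


# Four tasks on two servers; task 0 is replicated on both servers.
placement = [{0, 1}, {1}, {0}, {0}]
rr = round_robin(placement, 2)
best = brute_force_optimum(placement, 2)
print(makespan(rr, placement, 2))    # prints 4
print(makespan(best, placement, 2))  # prints 2
```

On this instance the turn-taking rule spends the replicated task on server 0, which later forces server 1 to run a remote task, so its completion time is twice the optimum. The brute-force search is exponential in the number of tasks, which fits the abstract's claim that no feasible exact algorithm exists unless P = NP.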

Topics: ... and concurrency; F.1.3 [Computation by Abstract Devices]: Complexity Measures and Classes - reducibility and completeness; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - sequencing and scheduling; General Terms: Algorithms, Performance, The...
Year: 2010
OAI identifier: oai:CiteSeerX.psu:10.1.1.192.670
Provided by: CiteSeerX