A Survey on Optimal Scheduler: Improving Efficiency in Parallel Execution Tasks in Hadoop

Abstract

Hadoop’s implementation of the MapReduce programming model pipelines data processing and provides fault tolerance. Input data is partitioned and distributed as map tasks to individual cluster nodes for parallel execution. Each map task reads its split of the input data from the Hadoop Distributed File System (HDFS) and applies the map function to it. iShuffle determines the number of map output partitions and decides on which nodes each partition is placed. Its components are the shufflers and the shuffle manager: a shuffler pushes map output partitions to the selected nodes as soon as they are produced. Multiple servers are used to produce results in a short time. Data sets related to air pollution are collected and processed by the servers, which increases efficiency and reduces job completion time.
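To make the processing step concrete, the following is a minimal sketch (not the authors’ code) of a Hadoop MapReduce job that averages a pollutant reading per monitoring station. The input format, field positions, and the class and field names (AirQualityJob, station, pm25) are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AirQualityJob {

    // Map: parse one CSV record assumed to look like "station,timestamp,pm25"
    // and emit (station, pm25). Each map task processes one HDFS input split.
    public static class ReadingMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                return; // skip malformed lines
            }
            context.write(new Text(fields[0]),
                          new DoubleWritable(Double.parseDouble(fields[2])));
        }
    }

    // Reduce: average all readings shuffled to this reducer for one station.
    public static class AverageReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0.0;
            long count = 0;
            for (DoubleWritable v : values) {
                sum += v.get();
                count++;
            }
            if (count > 0) {
                context.write(key, new DoubleWritable(sum / count));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "air-quality-average");
        job.setJarByClass(AirQualityJob.class);
        job.setMapperClass(ReadingMapper.class);
        job.setReducerClass(AverageReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

In stock Hadoop, the map output produced by such a job is pulled by reducers during the shuffle phase; under iShuffle it would instead be pushed to the target nodes by the shufflers as described above.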
