Minimizing Task Initialization Overhead of Hadoop via HDFS Block Coalescing

Abstract

Department of Computer Science and Engineering

In this work, we present a novel HDFS block coalescing scheme that mitigates YARN container overhead. YARN is designed to be a generic resource manager that decouples programming models from the resource management infrastructure. We show that YARN's generic design incurs significant overhead because each container must perform several initialization steps, including authentication. To reduce the container overhead without making significant changes to the existing YARN framework, we propose to leverage the input split, which is the logical representation of physical HDFS blocks. The HDFS block coalescing scheme creates large input splits to enable a single map wave and to reduce the number of containers and their initialization overhead. Our experimental study shows that the block coalescing scheme significantly reduces the container overhead while achieving good load balancing and job scheduling fairness, without impairing the degree of overlap between the map phase and the reduce phase.
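The idea of coalescing blocks into one input split per container slot can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It is a simple greedy, locality-aware grouping written only to show how many HDFS blocks might map onto a few large splits so that the map phase fits in a single wave. The names `Block`, `Split`, `coalesce_blocks`, and `slots_per_host` are hypothetical, and the code does not call any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: str
    size_mb: int
    hosts: tuple  # nodes that hold a replica of this block (assumed known)


@dataclass
class Split:
    host: str  # node whose container slot will process this split
    blocks: list = field(default_factory=list)

    @property
    def size_mb(self):
        return sum(b.size_mb for b in self.blocks)


def coalesce_blocks(blocks, slots_per_host):
    """Group blocks into at most one split per container slot.

    Hypothetical greedy heuristic: each block goes to the least-loaded
    split on a node holding a replica, or to the least-loaded split
    overall when no local slot exists.
    """
    # One empty split per available container slot.
    splits = [Split(host) for host, n in slots_per_host.items() for _ in range(n)]
    if not splits:
        raise ValueError("no container slots available")
    # Place larger blocks first so the greedy choice balances better.
    for block in sorted(blocks, key=lambda b: b.size_mb, reverse=True):
        local = [s for s in splits if s.host in block.hosts]
        candidates = local or splits  # fall back to a remote read
        target = min(candidates, key=lambda s: s.size_mb)
        target.blocks.append(block)
    # Drop slots that received no data.
    return [s for s in splits if s.blocks]


# Eight 128 MB blocks spread over three nodes, one container slot per node.
blocks = [
    Block(f"blk_{i}", 128, ("n1", "n2") if i % 2 == 0 else ("n2", "n3"))
    for i in range(8)
]
splits = coalesce_blocks(blocks, {"n1": 1, "n2": 1, "n3": 1})
for s in splits:
    print(s.host, s.size_mb, [b.block_id for b in s.blocks])
```

With one split per block, this input would need eight containers. Under the sketch's assumptions it needs at most three, one per slot, so the per-container initialization cost is paid three times instead of eight.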
