
    Don't Get Caught in the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems

    Many widely used, latency-sensitive, data-parallel distributed systems, such as HDFS, Hive, and Spark, choose to run on the Java Virtual Machine (JVM), despite ongoing debate over the overhead of doing so. This thesis analyzes the extent and causes of the JVM performance overhead in these systems. Surprisingly, we find that warm-up overhead is frequently the bottleneck, taking 33% of the execution time of a 1GB HDFS read and an average of 21 seconds per Spark query. These findings reveal a tension between the principle of parallelization, i.e., speeding up long-running jobs by splitting them into short tasks, and the fact that JVM warm-up overhead is only amortized by long tasks. We solve this problem by designing HotTub, a modified JVM that amortizes the warm-up overhead over the lifetime of a cluster node instead of over a single job.
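
    The abstract does not spell out HotTub's mechanism, but the amortization idea it describes can be sketched. The Java program below is a minimal, hypothetical analogy, not the thesis's implementation: instead of pooling warm JVM processes across jobs on a cluster node as HotTub does, it reuses a long-lived thread pool within one JVM, so classes loaded and code JIT-compiled while running the first job's tasks are already warm for later jobs. The names (WarmJvmPoolSketch, shortTask) and the task sizes are illustrative assumptions.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class WarmJvmPoolSketch {

        // Long-lived pool of workers standing in for the warm, reusable JVMs
        // HotTub would keep on a node (analogy only: threads inside one JVM
        // rather than separate pooled JVM processes).
        static final ExecutorService warmPool = Executors.newFixedThreadPool(4);

        // Stand-in for a short data-parallel task (e.g., one HDFS read or one
        // Spark task): a hot loop the JIT only needs to compile once.
        static long shortTask(int taskId) {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += (i * 31L) ^ taskId;
            }
            return sum;
        }

        public static void main(String[] args) throws Exception {
            // Successive "jobs" submit many short tasks. Because the pool
            // outlives each job, later jobs reuse already-loaded classes and
            // already-compiled code, so their per-task warm-up cost shrinks.
            for (int job = 0; job < 3; job++) {
                long start = System.nanoTime();
                List<Future<Long>> results = new ArrayList<>();
                for (int t = 0; t < 16; t++) {
                    final int id = t;
                    Callable<Long> task = () -> shortTask(id);
                    results.add(warmPool.submit(task));
                }
                for (Future<Long> f : results) {
                    f.get();
                }
                System.out.printf("job %d finished in %.1f ms%n",
                        job, (System.nanoTime() - start) / 1e6);
            }
            warmPool.shutdown();
        }
    }

    Running the sketch typically shows the first "job" taking longer than the rest, since it pays the one-time class-loading and JIT-compilation cost; the same principle, applied to whole JVMs on a cluster node, is what the thesis's HotTub design exploits.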