75,730 research outputs found

    Application profiling and resource management for MapReduce

    Get PDF
    Scale of data generated and processed is exponential growth in the Big Data ear. It poses a challenge that is far beyond the goal of a single computing system. Processing such vast amount of data on a single machine is impracticable in term of time or cost. Hence, distributed systems, which can harness very large clusters of commodity computers and processing data within restrictive time deadlines, are imperative. In this thesis, we target two aspects of distributed systems: application profiling and resource management. We study a MapReduce system in detail, which is a programming paradigm for large scale distributed computing, and presents solutions to tackle three key problems. Firstly, this thesis analyzes the characteristics of jobs running on the MapReduce system to reveal the problem—the Application scope of MapReduce has been extended beyond the original design goal that was large-scale data processing. This problem enables us to present a Workload Characteristic Oriented Scheduler (WCO), which strives for co-locating tasks of possibly different MapReduce jobs with complementing resource usage characteristics. Secondly, this thesis studies the current job priority mechanism focusing on resource management. In the MapReduce system, job priority only exists at scheduling level. High priority jobs are placed at the front of the scheduling queue and dispatched first. Resource, however, is fairly shared among jobs running at the same worker node without any consideration for their priorities. In order to resolve this, this thesis presents a non-intrusive slot layering solution, which dynamically allocates resource between running jobs based on their priority and efficiently reduces the execution time of high priority jobs while improves overall throughput. Last, based on the fact of underutilization of resource at each individual worker node, this thesis propose a new way, Local Resource Shaper (LRS), to smooth resource consumption of each individual job by automatically tuning the execution of concurrent jobs to maximize resource utilization while minimizing resource contention

    Application profiling and resource management for MapReduce

    Get PDF
    Scale of data generated and processed is exponential growth in the Big Data ear. It poses a challenge that is far beyond the goal of a single computing system. Processing such vast amount of data on a single machine is impracticable in term of time or cost. Hence, distributed systems, which can harness very large clusters of commodity computers and processing data within restrictive time deadlines, are imperative. In this thesis, we target two aspects of distributed systems: application profiling and resource management. We study a MapReduce system in detail, which is a programming paradigm for large scale distributed computing, and presents solutions to tackle three key problems. Firstly, this thesis analyzes the characteristics of jobs running on the MapReduce system to reveal the problem—the Application scope of MapReduce has been extended beyond the original design goal that was large-scale data processing. This problem enables us to present a Workload Characteristic Oriented Scheduler (WCO), which strives for co-locating tasks of possibly different MapReduce jobs with complementing resource usage characteristics. Secondly, this thesis studies the current job priority mechanism focusing on resource management. In the MapReduce system, job priority only exists at scheduling level. High priority jobs are placed at the front of the scheduling queue and dispatched first. Resource, however, is fairly shared among jobs running at the same worker node without any consideration for their priorities. In order to resolve this, this thesis presents a non-intrusive slot layering solution, which dynamically allocates resource between running jobs based on their priority and efficiently reduces the execution time of high priority jobs while improves overall throughput. Last, based on the fact of underutilization of resource at each individual worker node, this thesis propose a new way, Local Resource Shaper (LRS), to smooth resource consumption of each individual job by automatically tuning the execution of concurrent jobs to maximize resource utilization while minimizing resource contention

    An Optimization of Energy Saving in Cloud Environment

    Get PDF
    Cloud computing is a technology in distributed computing which facilitates pay per model based on user demand and requirement. Cloud can be defined as a collection of virtual machines. This includes both computational and storage facility. The goal of cloud computing is to provide efficient access to remote and geographically distributed resources. Cloud Computing is developing day by day and faces many challenges; one of them is i) Load Balancing and ii) Task scheduling. Load balancing is defined as division of the amount of work that a system has to do between two or more systems so that more work gets done in the same amount of time and all users get served faster. Load balancing can be implemented with hardware, software, or a combination of both. Load balancing is mainly used for server clustering. Task Scheduling is a set of policies to control the work order to be performed by a system. It is also a technique which is used to improve the overall execution time of the job. Task Scheduling is responsible for selection of best suitable resources for task execution, by taking some parameters into consideration. A good task scheduler adapts its scheduling strategy according to the changing environment and the type of task. In this paper, the Energy Saving Load Balancing (ESLB) Algorithm and Energy Saving Task Scheduling (ESTS) algorithm was proposed. The various scheduling algorithms (FCFS, RR, PRIORITY, and SJF) are reviewed and compared. The ESLB algorithm and ESTS algorithm was tested in cloudsim toolkit and the result shows better performance
    • …
    corecore