Application profiling and resource management for MapReduce

Author: Nagtegaal Iris D.
Porte Robert J.
Ruys Anthony T.
Tanis Pieter J.
van Duijvendijk Peter
van Gulik Thomas M.
Verhoef Cornelis
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2014

The scale of data generated and processed is growing exponentially in the Big Data era. This poses a challenge far beyond the capability of a single computing system: processing such vast amounts of data on a single machine is impracticable in terms of time or cost. Hence, distributed systems that can harness very large clusters of commodity computers and process data within restrictive time deadlines are imperative. In this thesis, we target two aspects of distributed systems: application profiling and resource management. We study in detail a MapReduce system, which implements a programming paradigm for large-scale distributed computing, and present solutions to three key problems. First, this thesis analyzes the characteristics of jobs running on the MapReduce system and reveals a problem: the application scope of MapReduce has been extended beyond its original design goal of large-scale data processing. This finding motivates a Workload Characteristic Oriented Scheduler (WCO), which strives to co-locate tasks of possibly different MapReduce jobs with complementary resource usage characteristics. Second, this thesis studies the current job priority mechanism with a focus on resource management. In the MapReduce system, job priority exists only at the scheduling level: high-priority jobs are placed at the front of the scheduling queue and dispatched first. Resources, however, are shared fairly among jobs running on the same worker node without any consideration of their priorities. To resolve this, this thesis presents a non-intrusive slot layering solution, which dynamically allocates resources between running jobs based on their priority and efficiently reduces the execution time of high-priority jobs while improving overall throughput.
Last, motivated by the underutilization of resources at each individual worker node, this thesis proposes the Local Resource Shaper (LRS), which smooths the resource consumption of each individual job by automatically tuning the execution of concurrent jobs to maximize resource utilization while minimizing resource contention.
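
The contrast the abstract draws between fair sharing and priority-aware allocation on a worker node can be illustrated with a small sketch. This is not the thesis's slot layering mechanism, whose details the abstract does not give; the function names, the numeric priority weights, and the single "capacity" figure are all assumptions made for illustration.

```python
def fair_share(capacity, jobs):
    """Split capacity equally among jobs, ignoring their priorities."""
    return {name: capacity / len(jobs) for name in jobs}


def priority_weighted_share(capacity, jobs):
    """Split capacity in proportion to each job's priority weight.

    `jobs` maps a job name to a positive numeric weight (hypothetical;
    the abstract does not say how priorities are encoded).
    """
    total_weight = sum(jobs.values())
    return {name: capacity * weight / total_weight
            for name, weight in jobs.items()}


if __name__ == "__main__":
    # Two jobs on one worker node with 8 units of some resource (assumed).
    jobs = {"high": 3, "low": 1}
    print(fair_share(8.0, jobs))
    print(priority_weighted_share(8.0, jobs))
```

Under fair sharing both jobs receive the same amount regardless of priority; the weighted variant gives the high-priority job a larger portion, which is the kind of behaviour the abstract argues for.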

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

PubMed Central

EUR Research Repository

Sydney eScholarship

Radboud Repository

Dissertations of the University of Groningen

Application profiling and resource management for MapReduce

Author: Lu Peng
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2015

Sydney eScholarship

Priority-grouping method for parallel multi-scheduling in Grid

Author: Abraham GT
James A
Yaacob N
Publication venue: 'Elsevier BV'
Publication date: 01/09/2015

With the advent of multicore computers, the scheduling of Grid jobs can be made more effective if it is scaled to fully utilize the underlying hardware and parallelized to exploit multiple cores. The fact that sequential algorithms neither scale with multicore systems nor benefit from parallelism remains a major obstacle to scheduling in the Grid. As multicore systems become ever more pervasive, over-reliance on such systems for passive parallelism is not the best way to harness their multiprocessors for Grid scheduling. An explicit means of exploiting parallelism for Grid scheduling is required. The Group-based Parallel Multi-scheduler, introduced in this paper, aims to exploit multicore systems for Grid scheduling by splitting jobs and machines into paired groups and independently scheduling jobs in parallel within those groups. We implemented two job grouping methods, Execution Time Balanced (ETB) and Execution Time Sorted then Balanced (ETSB), and two machine grouping methods, Evenly Distributed (EvenDist) and Similar Together (SimTog). For each method, we varied the number of groups between 2, 4 and 8. We then executed the MinMin Grid scheduling algorithm independently within the groups. We demonstrated that by dividing jobs and machines into groups before scheduling, the computation time of the scheduling process improved drastically, on the order of 85%, over the ordinary MinMin algorithm when implemented on an HPC system. We also found that our balanced group-based approach achieved better results than our previous priority-based grouping approach.
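
The idea of running MinMin independently inside paired job and machine groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the round-robin split stands in for the ETB, ETSB, EvenDist and SimTog methods, whose details the abstract does not give; execution time is assumed to be job length divided by machine speed; and the groups are processed one after another here, whereas the paper schedules them in parallel.

```python
def min_min(jobs, machines):
    """MinMin: repeatedly schedule the job with the smallest completion time.

    jobs:     {job_id: length}   (assumed unit: work units)
    machines: {machine_id: speed} (assumed unit: work units per second)
    Returns (assignment, makespan).
    """
    ready = {m: 0.0 for m in machines}  # time at which each machine is free
    pending = dict(jobs)
    assignment = {}
    while pending:
        best = None  # (completion_time, job_id, machine_id)
        for job, length in pending.items():
            for machine, speed in machines.items():
                completion = ready[machine] + length / speed
                if best is None or completion < best[0]:
                    best = (completion, job, machine)
        completion, job, machine = best
        assignment[job] = machine
        ready[machine] = completion
        del pending[job]
    return assignment, max(ready.values(), default=0.0)


def split_round_robin(items, n_groups):
    """Hypothetical grouping: deal items into n_groups in turn."""
    groups = [{} for _ in range(n_groups)]
    for index, (key, value) in enumerate(items.items()):
        groups[index % n_groups][key] = value
    return groups


def grouped_min_min(jobs, machines, n_groups):
    """Pair job groups with machine groups and run MinMin in each pair."""
    if n_groups > len(machines):
        raise ValueError("need at least one machine per group")
    assignment, makespan = {}, 0.0
    pairs = zip(split_round_robin(jobs, n_groups),
                split_round_robin(machines, n_groups))
    for job_group, machine_group in pairs:
        # Each pair is independent, so it could be handed to its own worker.
        group_assignment, group_makespan = min_min(job_group, machine_group)
        assignment.update(group_assignment)
        makespan = max(makespan, group_makespan)
    return assignment, makespan


if __name__ == "__main__":
    jobs = {"j1": 4, "j2": 2, "j3": 6, "j4": 8}
    machines = {"m1": 1.0, "m2": 2.0}
    print(min_min(jobs, machines))
    print(grouped_min_min(jobs, machines, 2))
```

Each group searches a smaller set of job and machine pairs, which is where the reduction in scheduling time comes from. The quality of the resulting schedule depends on how the groups are formed, so a naive split like the one above can yield a longer makespan than ungrouped MinMin.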

Nottingham Trent Institutional Repository (IRep)

Coventry University Pure Portal

Group-based parallel multi-scheduler for grid computing

Author: Abraham GT
James A
Yaacob N
Publication venue: 'Elsevier BV'
Publication date: 01/09/2015

Nottingham Trent Institutional Repository (IRep)

Coventry University Pure Portal

An Optimization of Energy Saving in Cloud Environment

Author: N. Kalyana Sundaram, Dr. S.P. Rajagopalan
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/05/2017

Cloud computing is a distributed computing technology that supports a pay-per-use model based on user demand and requirements. A cloud can be defined as a collection of virtual machines, providing both computational and storage facilities. The goal of cloud computing is to provide efficient access to remote and geographically distributed resources. Cloud computing is developing day by day and faces many challenges, two of which are (i) load balancing and (ii) task scheduling. Load balancing is the division of the work that a system has to do between two or more systems, so that more work gets done in the same amount of time and all users are served faster. Load balancing can be implemented with hardware, software, or a combination of both, and is mainly used for server clustering. Task scheduling is a set of policies that control the order in which work is performed by a system. It is also a technique used to improve the overall execution time of a job. Task scheduling is responsible for selecting the most suitable resources for task execution, taking certain parameters into consideration. A good task scheduler adapts its scheduling strategy to the changing environment and the type of task. In this paper, the Energy Saving Load Balancing (ESLB) algorithm and the Energy Saving Task Scheduling (ESTS) algorithm are proposed. Various scheduling algorithms (FCFS, RR, PRIORITY, and SJF) are reviewed and compared. The ESLB and ESTS algorithms were tested in the CloudSim toolkit, and the results show better performance.
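
To make the comparison of scheduling policies concrete, the sketch below contrasts two of the reviewed algorithms, FCFS and SJF, by average waiting time. It does not implement ESLB or ESTS, which the abstract does not describe. The simplifying assumptions are that there is a single resource, that all tasks arrive at time zero, and that the burst times are invented for illustration.

```python
def average_waiting_time(burst_times):
    """Average time tasks wait before starting, for the given run order."""
    total_wait, clock = 0, 0
    for burst in burst_times:
        total_wait += clock  # this task waited until the clock reached it
        clock += burst
    return total_wait / len(burst_times)


def fcfs(burst_times):
    """First Come First Served: run tasks in arrival order."""
    return list(burst_times)


def sjf(burst_times):
    """Shortest Job First: run the shortest tasks first."""
    return sorted(burst_times)


if __name__ == "__main__":
    bursts = [6, 8, 7, 3]  # hypothetical task lengths
    print("FCFS:", average_waiting_time(fcfs(bursts)))
    print("SJF: ", average_waiting_time(sjf(bursts)))
```

With all tasks available at once, running shorter tasks first lowers the average wait, which is why the order chosen by a task scheduler affects overall execution time.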

International Journal on Recent and Innovation Trends in Computing and Communication