    Predicting Scheduling Failures in the Cloud

    Cloud computing has emerged as a key technology for delivering and managing computing, platform, and software services over the Internet. Task scheduling algorithms play an important role in the efficiency of cloud computing services, as they aim to reduce the turnaround time of tasks and improve resource utilization. Several task scheduling algorithms have been proposed in the literature for cloud computing systems, the majority relying on the computational complexity of tasks and the distribution of resources. However, many tasks scheduled following these algorithms still fail because of unforeseen changes in the cloud environment. In this paper, using task execution and resource utilization data extracted from the execution traces of real-world applications at Google, we explore the possibility of predicting the scheduling outcome of a task using statistical models. If we can successfully predict task failures, we may be able to reduce the execution time of jobs by rescheduling failed tasks earlier (i.e., before their actual failing time). Our results show that statistical models can predict task failures with a precision of up to 97.4% and a recall of up to 96.2%. We simulate the potential benefits of such predictions using the GloudSim toolkit and find that they can improve the number of finished tasks by up to 40%. We also perform a case study using the Hadoop framework of Amazon Elastic MapReduce (EMR) and the jobs of a gene expression correlation analysis study from breast cancer research. We find that when extending the Hadoop scheduler with our predictive models, the percentage of failed jobs can be reduced by up to 45%, with an overhead of less than 5 minutes.
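
    As a hedged illustration of the statistical modeling the abstract describes, the sketch below trains a classifier on per-task resource features to flag likely failures and reports precision and recall. The feature set, the synthetic labels, and the random-forest choice are assumptions for illustration; the abstract does not name the paper's exact features or model family.

        # Minimal sketch of a task-failure classifier, assuming trace-derived
        # per-task features; the data here is a synthetic stand-in, not the
        # Google traces used in the paper.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import precision_score, recall_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n = 5000
        # Columns stand in for features such as CPU request, memory request,
        # task priority, and resubmission count.
        X = rng.random((n, 4))
        # Synthetic failure label; real labels come from the execution traces.
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, n) > 1.0).astype(int)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        print("precision:", precision_score(y_te, pred))
        print("recall:", recall_score(y_te, pred))

    A model like this, queried at submission time, is what would let a scheduler resubmit a likely-to-fail task before its actual failing time.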

    Metascheduling of HPC Jobs in Day-Ahead Electricity Markets

    High-performance grid computing is a key enabler of large-scale collaborative computational science. With the promise of exascale computing, high-performance grid systems are expected to incur electricity bills that grow super-linearly over time. To achieve cost effectiveness in these systems, it is essential for scheduling algorithms to exploit the electricity price variations, in both space and time, that are prevalent in dynamic electricity markets. In this paper, we present a metascheduling algorithm to optimize the placement of jobs in a compute grid that consumes electricity from the day-ahead wholesale market. We formulate the scheduling problem as a Minimum Cost Maximum Flow problem and leverage queue waiting time and electricity price predictions to accurately estimate the cost of job execution at a system. Using trace-based simulation with real and synthetic workload traces and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively constitutes more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems.
    Comment: Appears in IEEE Transactions on Parallel and Distributed Systems
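
    The Minimum Cost Maximum Flow formulation named in the abstract can be sketched with an off-the-shelf solver. Below, jobs and compute systems are flow-network nodes, and each job-to-system edge carries an assumed execution cost (e.g., predicted day-ahead electricity price times estimated energy use); the site names and cost values are invented for illustration.

        # Minimal sketch of job placement as Min Cost Max Flow, using networkx.
        import networkx as nx

        G = nx.DiGraph()
        jobs = {"job1": 1, "job2": 1}        # each job needs one slot
        systems = {"siteA": 1, "siteB": 2}   # free slots per compute system
        # Assumed cost of running each job at each site, e.g. predicted
        # day-ahead price x estimated energy, plus a queue-wait penalty.
        cost = {("job1", "siteA"): 4, ("job1", "siteB"): 7,
                ("job2", "siteA"): 6, ("job2", "siteB"): 3}

        for j, need in jobs.items():
            G.add_edge("src", j, capacity=need, weight=0)
        for s, free in systems.items():
            G.add_edge(s, "sink", capacity=free, weight=0)
        for (j, s), w in cost.items():
            G.add_edge(j, s, capacity=1, weight=w)

        flow = nx.max_flow_min_cost(G, "src", "sink")
        for j in jobs:
            print(j, "->", [s for s, f in flow[j].items() if f > 0])

    The max-flow requirement places as many jobs as capacity allows, while the min-cost objective steers them toward cheap electricity, which mirrors the trade-off the paper optimizes.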

    Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach

    Many algorithms in workflow scheduling and resource provisioning rely on the performance estimation of tasks to produce a scheduling plan. A profiler that is capable of modeling the execution of tasks and predicting their runtime accurately therefore becomes an essential part of any Workflow Management System (WMS). With the emergence of multi-tenant Workflow as a Service (WaaS) platforms that use clouds for deploying scientific workflows, task runtime prediction becomes more challenging because it requires processing a significant amount of data in near real time while dealing with the performance variability of cloud resources. Hence, methods such as profiling tasks' execution data with basic statistical descriptions (e.g., mean, standard deviation) or with batch offline regression techniques may not be suitable for such environments. In this paper, we propose an online incremental learning approach to predict the runtime of tasks in scientific workflows in clouds. To improve the performance of the predictions, we harness fine-grained resource monitoring data in the form of time-series records of CPU utilization, memory usage, and I/O activity that reflect the unique characteristics of a task's execution. We compare our solution to a state-of-the-art approach that exploits the resource monitoring data using a batch regression machine learning technique. In our experiments, the proposed strategy reduces the prediction error by up to 29.89% compared to the state-of-the-art solution.
    Comment: Accepted for presentation at the main conference track of the 11th IEEE/ACM International Conference on Utility and Cloud Computing
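
    A minimal sketch of the online incremental idea, assuming features summarized from the CPU/memory/I/O time series: a stochastic-gradient regressor is updated one small batch at a time with partial_fit, so the model keeps learning as new task executions stream in. The features, scales, and coefficients below are illustrative assumptions, not the paper's.

        # Minimal sketch of online incremental runtime prediction.
        import numpy as np
        from sklearn.linear_model import SGDRegressor
        from sklearn.preprocessing import StandardScaler

        model = SGDRegressor(learning_rate="constant", eta0=0.01)
        scaler = StandardScaler()
        rng = np.random.default_rng(1)

        for step in range(100):                      # tasks arriving as a stream
            # Assumed features: mean CPU %, mean memory (MB), total I/O (MB).
            X = rng.random((8, 3)) * [100.0, 4096.0, 500.0]
            runtime = X @ np.array([0.5, 0.01, 0.2]) + rng.normal(0, 5, 8)
            Xs = scaler.partial_fit(X).transform(X)  # incremental normalization
            model.partial_fit(Xs, runtime)           # incremental model update

        X_new = scaler.transform([[60.0, 2048.0, 120.0]])
        print("predicted runtime (s):", model.predict(X_new)[0])

    Unlike batch offline regression, nothing here ever retrains from scratch, which is what makes this style of model fit a near-real-time WaaS setting.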

    Comparative study of different approaches to solve batch process scheduling and optimisation problems

    Effective approaches are important for batch process scheduling problems, especially those with complex constraints. However, most research focuses on improving individual optimisation techniques, while studies that compare the differences between techniques remain inadequate. This study develops an optimisation model of batch process scheduling problems with complex constraints and investigates the performance of different optimisation techniques, such as Genetic Algorithms (GA) and Constraint Programming (CP). It finds that CP is better able to handle batch process problems with complex constraints, but at the cost of longer solution times.
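
    To make the CP side of the comparison concrete, here is a minimal sketch using Google OR-Tools CP-SAT (an assumed solver choice; the abstract does not name the study's actual CP tool): two batch tasks share one processing vessel, a no-overlap constraint encodes the shared resource, and the objective minimizes makespan.

        # Minimal CP sketch: two batch tasks on one shared vessel.
        from ortools.sat.python import cp_model

        m = cp_model.CpModel()
        horizon = 20
        durations = {"taskA": 5, "taskB": 7}   # assumed processing times
        intervals, ends = [], []
        for name, dur in durations.items():
            start = m.NewIntVar(0, horizon, f"start_{name}")
            end = m.NewIntVar(0, horizon, f"end_{name}")
            intervals.append(m.NewIntervalVar(start, dur, end, f"iv_{name}"))
            ends.append(end)
        m.AddNoOverlap(intervals)              # one batch in the vessel at a time
        makespan = m.NewIntVar(0, horizon, "makespan")
        m.AddMaxEquality(makespan, ends)
        m.Minimize(makespan)

        solver = cp_model.CpSolver()
        status = solver.Solve(m)
        print(solver.StatusName(status), "makespan =", solver.Value(makespan))

    Additional complex constraints (e.g., sequence-dependent changeovers or storage limits) are stated declaratively in CP rather than encoded into chromosomes as in a GA, which illustrates the flexibility the study credits CP with; the trade-off is solver time.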

    Experimental Study of Remote Job Submission and Execution on LRM through Grid Computing Mechanisms

    Remote job submission and execution are fundamental requirements of distributed computing done using cluster computing. However, cluster computing limits usage to a single organization. A Grid computing environment can allow the use of resources available in other organizations for remote job execution. This paper discusses the concepts of batch-job execution using a Local Resource Manager (LRM) and using a Grid. The paper discusses two ways of preparing a test Grid computing environment, which we use for the experimental testing of these concepts. It then presents experimental testing of remote job submission and execution mechanisms both in the LRM-specific way and in the Grid computing way. Moreover, the paper also discusses various problems faced while working with the Grid computing environment and their troubleshooting. The understanding and experimental testing presented in this paper should be very useful to researchers who are new to the field of job management in Grids.
    Comment: Fourth International Conference on Advanced Computing & Communication Technologies (ACCT), 201
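
    As a hedged illustration of the LRM-specific submission path, the sketch below writes a PBS/Torque-style job script and submits it with qsub from Python. The directives, the queue name, and the choice of PBS itself are assumptions for illustration; the paper's actual LRM and commands may differ.

        # Minimal sketch: submit a batch job to a PBS/Torque-style LRM.
        import subprocess, tempfile, textwrap

        job_script = textwrap.dedent("""\
            #!/bin/bash
            #PBS -N hello_job
            #PBS -q batch
            #PBS -l nodes=1
            echo "running on $(hostname)"
            """)

        with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
            f.write(job_script)
            path = f.name

        # On success, qsub prints the identifier of the newly queued job.
        result = subprocess.run(["qsub", path], capture_output=True, text=True)
        print("job id:", result.stdout.strip() or result.stderr.strip())

    The Grid-level analogue routes the same request through middleware (e.g., a Globus gatekeeper) so that the job can land on an LRM in another organization.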