An Approach for Modeling and Ranking Node-level Stragglers in Cloud Datacenters

Author: Garraghan P
Ouyang X
Townend P
Wang C
Xu J
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2016

The ability of servers to effectively execute tasks within Cloud datacenters varies due to heterogeneous CPU and memory capacities, resource contention situations, network configurations and operational age. Unexpectedly slow server nodes (node-level stragglers) result in assigned tasks becoming task-level stragglers, which dramatically impede parallel job execution. However, it is currently unknown how slow nodes directly correlate to task straggler manifestation. To address this knowledge gap, we propose a method for node performance modeling and ranking in Cloud datacenters based on analyzing parallel job execution tracelog data. By using a production Cloud system as a case study, we demonstrate how node execution performance is driven by temporal changes in node operation as opposed to node hardware capacity. Different sample sets have been filtered in order to evaluate the generality of our framework, and the analytic results demonstrate that the ability of nodes to execute parallel tasks tends to follow a 3-parameter log-logistic distribution. Further statistical attribute values such as confidence interval, quantile value, extreme case possibility, etc. can also be used for ranking and identifying potential straggler nodes within the cluster. We exploit a graph-based algorithm for partitioning server nodes into five levels, with 0.83% of node-level stragglers identified. Our work lays the foundation towards enhancing scheduling algorithms by avoiding slow nodes, reducing task straggler occurrence, and improving parallel job performance.

Crossref

Lancaster E-Prints

White Rose Research Online
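
The abstract above reports that node performance tends to follow a 3-parameter log-logistic distribution and that quantile values can be used for ranking. A minimal sketch of that idea follows, using the standard closed-form CDF and quantile of the distribution. The function names, the parameter values, and the choice of a metric where larger means slower are all assumptions made for illustration; none of them come from the paper, and this is not the authors' ranking method.

```python
def loglogistic3_cdf(x, alpha, beta, gamma):
    """CDF of a 3-parameter log-logistic distribution.

    alpha: scale (> 0), beta: shape (> 0), gamma: location (threshold).
    """
    if x <= gamma:
        return 0.0
    return 1.0 / (1.0 + ((x - gamma) / alpha) ** (-beta))


def loglogistic3_quantile(p, alpha, beta, gamma):
    """Inverse CDF, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return gamma + alpha * (p / (1.0 - p)) ** (1.0 / beta)


def rank_nodes_by_quantile(node_params, p=0.95):
    """Order node ids from worst to best by the p-quantile of their fitted
    distribution. Assumes (hypothetically) that larger values mean slower."""
    scores = {
        node: loglogistic3_quantile(p, alpha, beta, gamma)
        for node, (alpha, beta, gamma) in node_params.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical fitted parameters (alpha, beta, gamma) per node.
fitted = {"node-a": (1.0, 4.0, 0.0), "node-b": (2.0, 4.0, 0.0)}
ranking = rank_nodes_by_quantile(fitted)
```

With these made-up parameters, `node-b` has the larger scale and therefore the larger 95th-percentile value, so it would be ranked first as the more likely straggler.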

Predicting Scheduling Failures in the Cloud

Author: Khomh Foutse
Soualhia Mbarka
Tahar Sofiene
Publication date: 01/01/2015

Cloud Computing has emerged as a key technology to deliver and manage computing, platform, and software services over the Internet. Task scheduling algorithms play an important role in the efficiency of cloud computing services as they aim to reduce the turnaround time of tasks and improve resource utilization. Several task scheduling algorithms have been proposed in the literature for cloud computing systems, the majority relying on the computational complexity of tasks and the distribution of resources. However, several tasks scheduled following these algorithms still fail because of unforeseen changes in the cloud environments. In this paper, using task execution and resource utilization data extracted from the execution traces of real-world applications at Google, we explore the possibility of predicting the scheduling outcome of a task using statistical models. If we can successfully predict task failures, we may be able to reduce the execution time of jobs by rescheduling failed tasks earlier (i.e., before their actual failing time). Our results show that statistical models can predict task failures with a precision of up to 97.4% and a recall of up to 96.2%. We simulate the potential benefits of such predictions using the toolkit GloudSim and find that they can improve the number of finished tasks by up to 40%. We also perform a case study using the Hadoop framework of Amazon Elastic MapReduce (EMR) and the jobs of a gene expression correlations analysis study from breast cancer research. We find that when extending the scheduler of Hadoop with our predictive models, the percentage of failed jobs can be reduced by up to 45%, with an overhead of less than 5 minutes.

arXiv.org e-Print Archive

PolyPublie
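
For readers unfamiliar with the two metrics quoted in the abstract above, the following sketch shows how precision and recall are computed from the counts of a failure predictor. The function name and the example counts are hypothetical and chosen only for illustration; they are not data from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a binary 'task will fail' predictor.

    precision: share of predicted failures that really failed.
    recall:    share of real failures that were predicted.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 90 failures correctly predicted, 10 false alarms,
# 30 failures that the model missed.
precision, recall = precision_recall(90, 10, 30)
```

A scheduler that reschedules on every predicted failure pays for low precision with unnecessary rescheduling, and for low recall with failures it never anticipates, which is why the abstract reports both figures.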

Managing Uncertainty: A Case for Probabilistic Grid Scheduling

Author: Lazarevic Aleksandar
Prnjat Ognjen
Sacks Lionel
Publication date: 01/07/2006

The Grid technology is evolving into a global, service-orientated architecture, a universal platform for delivering future high demand computational services. Strong adoption of the Grid and the utility computing concept is leading to an increasing number of Grid installations running a wide range of applications of different size and complexity. In this paper we address the problem of delivering deadline/economy based scheduling in a heterogeneous application environment using statistical properties of historical job executions and their associated meta-data. This approach is motivated by a study of the six-month computational load generated by Grid applications in a multi-purpose Grid cluster serving a community of twenty e-Science projects. The observed job statistics, resource utilisation and user behaviour are discussed in the context of management approaches and models most suitable for supporting a probabilistic and autonomous scheduling architecture.

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

Statistical Rate Monotonic Scheduling

Author: Atlas Alia
Bestavros Azer
Publication venue: Boston University Computer Science Department
Publication date: 02/05/1998

In this paper we present Statistical Rate Monotonic Scheduling (SRMS), a generalization of the classical RMS results of Liu and Layland that allows scheduling periodic tasks with highly variable execution times and statistical QoS requirements. Similar to RMS, SRMS has two components: a feasibility test and a scheduling algorithm. The feasibility test for SRMS ensures that using SRMS' scheduling algorithms, it is possible for a given periodic task set to share a given resource (e.g. a processor, communication medium, switching device, etc.) in such a way that such sharing does not result in the violation of any of the periodic tasks' QoS constraints. The SRMS scheduling algorithm incorporates a number of unique features. First, it allows for fixed priority scheduling that keeps the tasks' value (or importance) independent of their periods. Second, it allows for job admission control, which allows the rejection of jobs that are not guaranteed to finish by their deadlines as soon as they are released, thus enabling the system to take necessary compensating actions. Also, admission control allows the preservation of resources since no time is spent on jobs that will miss their deadlines anyway. Third, SRMS integrates reservation-based and best-effort resource scheduling seamlessly. Reservation-based scheduling ensures the delivery of the minimal requested QoS; best-effort scheduling ensures that unused, reserved bandwidth is not wasted, but rather used to improve QoS further. Fourth, SRMS allows a system to deal gracefully with overload conditions by ensuring a fair deterioration in QoS across all tasks, rather than, for example, penalizing tasks with longer periods. Finally, SRMS has the added advantage that its schedulability test is simple and its scheduling algorithm has a constant overhead in the sense that the complexity of the scheduler is not dependent on the number of the tasks in the system.
We have evaluated SRMS against a number of alternative scheduling algorithms suggested in the literature (e.g. RMS and slack stealing), as well as refinements thereof, which we describe in this paper. Consistently throughout our experiments, SRMS provided the best performance. In addition, to evaluate the optimality of SRMS, we have compared it to an inefficient, yet optimal scheduler for task sets with harmonic periods.
National Science Foundation (CCR-970668

Boston University Institutional Repository (OpenBU)
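
As background for the abstract above, the classical RMS result of Liu and Layland that SRMS generalizes gives a sufficient schedulability condition: n periodic tasks are schedulable under rate monotonic priorities if their total utilization does not exceed n(2^(1/n) - 1). The sketch below implements only that classical test, not the SRMS feasibility test, which the abstract does not specify. The function names and the task sets are hypothetical.

```python
def rms_utilization_bound(n):
    """Liu and Layland utilization bound for n periodic tasks."""
    return n * (2.0 ** (1.0 / n) - 1.0)


def rms_sufficient_test(tasks):
    """tasks: list of (worst_case_execution_time, period) pairs.

    Returns True if the classical bound guarantees schedulability.
    The test is sufficient, not necessary: a False result does not
    prove that the task set is unschedulable.
    """
    if not tasks:
        return True
    utilization = sum(cost / period for cost, period in tasks)
    return utilization <= rms_utilization_bound(len(tasks))


# Hypothetical task sets.
light = [(1, 4), (1, 5), (2, 10)]   # utilization 0.65
heavy = [(2, 4), (3, 6)]            # utilization 1.0
light_ok = rms_sufficient_test(light)
heavy_ok = rms_sufficient_test(heavy)
```

The classical test assumes fixed worst-case execution times, which is the restriction the abstract says SRMS relaxes for tasks with highly variable execution times.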

CAN CHANGE PREDICTION HELP PRIORITISE REDESIGN WORK IN FUTURE ENGINEERING SYSTEMS?

Author: Caldwell NHM
Clarkson PJ
Wynn DC
Publication venue: 11TH INTERNATIONAL DESIGN CONFERENCE (DESIGN 2010), VOL 1-3
Publication date: 01/01/2010

Future design environments will necessitate improved management of the propagation and impacts of changes. To ascertain whether change prediction can assist in making better work prioritisation decisions, this paper develops a new simulation approach and applies it to a model of a complex aerospace product, which was elicited from industry. We use an accepted technique to generate potential change propagation trees and apply Monte Carlo methods to generate a sample space within which multiple scheduling policies could be evaluated and compared. The experiments reveal that poor coordination of change activity can result in significant process inefficiencies, that the potential for inefficiency increases for larger change networks, and that a modest ability to accurately predict change propagation in the specific case at hand could have a dramatic effect in reducing unnecessary rework. The experiments also suggest that the capability of predicting multiple steps of change propagation would provide only minimal additional improvement.

Apollo (Cambridge)

Multiprocessor task scheduling in multistage hybrid flowshops: a genetic algorithm approach

Author: Davis L
F Sivrikaya Şerifoğlu
G Ulusoy
Ghashghai E
Holland J
Riane F
Syslo MM
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2004

This paper considers multiprocessor task scheduling in a multistage hybrid flow-shop environment. The objective is to minimize the make-span, that is, the completion time of all the tasks in the last stage. This problem is of practical interest in the textile and process industries. A genetic algorithm (GA) is developed to solve the problem. The GA is tested against a lower bound from the literature as well as against heuristic rules on a test bed comprising 400 problems with up to 100 jobs, 10 stages, and with up to five processors on each stage. For small problems, solutions found by the GA are compared to optimal solutions, which are obtained by total enumeration. For larger problems, optimum solutions are estimated by a statistical prediction technique. Computational results show that the GA is both effective and efficient for the current problem. Test problems are provided in a web site at www.benchmark.ibu.edu.tr/mpt-h; fsp

Crossref

Sabanci University Research Database