11 research outputs found
Discovering Job Preemptions in the Open Science Grid
The Open Science Grid (OSG) is a world-wide computing system that facilitates
distributed computing for scientific research. It can distribute a
computationally intensive job to geo-distributed clusters and process the job's
tasks in parallel. For compute clusters on the OSG, physical resources may be
shared between OSG jobs and a cluster's local user-submitted jobs, with local jobs
preempting OSG-based ones. As a result, job preemptions occur frequently on the
OSG, sometimes significantly delaying job completion.
We have collected job data from the OSG over a period of more than 80 days. We
present an analysis of the data, characterizing the preemption patterns and the
different types of jobs. Based on these observations, we have grouped OSG jobs into 5
categories and analyzed the runtime statistics for each category. We further
chose different statistical distributions to estimate the probability density
function of job runtime for each class.
Effective Scheduling of Grid Resources Using Failure Prediction
In large-scale grid environments, accurate failure prediction is critical for effective resource allocation while assuring specified QoS levels, such as reliability. Traditional methods, such as statistical estimation techniques, can be used to predict the reliability of resources. However, naive statistical methods often ignore critical characteristic behaviors of the resources. In particular, periodic behaviors of grid resources are not captured well by statistical methods. In this paper, we present an alternative mechanism for failure prediction. In our approach, the periodic patterns of resource failures are determined and actively exploited for resource allocation with better QoS guarantees. The proposed scheme is evaluated in a realistic simulation environment of computational grids. The availability of computing resources is simulated according to a real trace collected from our large-scale monitoring experiment on campus computers. Our evaluation results show that the proposed approach achieves significantly higher resource scheduling effectiveness under a variety of workloads compared to baseline approaches.
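One simple way to detect such periodic availability patterns, sketched here as an illustration rather than the paper's actual mechanism, is to look for the strongest peak in the autocorrelation of a binary availability trace:

```python
def autocorrelation(series, lag):
    """Normalized autocorrelation of a 0/1 availability trace at a lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

def dominant_period(series, max_lag):
    """Return the lag (in samples) with the strongest autocorrelation."""
    return max(range(1, max_lag + 1),
               key=lambda k: autocorrelation(series, k))

# Hypothetical trace: a machine available 16 h, then down 8 h, repeating
# over a week of hourly samples (e.g. a lab machine free at night).
trace = ([1] * 16 + [0] * 8) * 7
period = dominant_period(trace, 36)  # expected to recover the 24 h cycle
```

A scheduler could then avoid placing long jobs on a resource whose predicted unavailable window overlaps the job's expected runtime.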
Adaps – A three-phase adaptive prediction system for the run-time of jobs based on user behaviour
In heterogeneous and distributed environments it is necessary to create schedules that utilise resources efficiently. Generating such schedules often poses a problem for a scheduler, since several aspects have to be considered. One way of supporting a scheduler is to provide accurate predictions of the run-times of the submitted jobs. Many current techniques offer statistical models that are deployed on previously filtered data. As users have different jobs, and because the attributes of their jobs differ, both the filtering of data and the choice of an appropriate prediction method have to cover these aspects. This article describes Adaps, a system for run-time prediction that works in three phases, each of which independently adjusts to the jobs of a user based on historical information. This leads to a user-specific clustering of data and to a flexible utilisation of different prediction techniques in order to create a user-centred prediction model.
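The user-centred idea can be sketched minimally as follows. The grouping attribute (requested CPU count) and the fallback rule are illustrative assumptions, not the actual Adaps phases:

```python
from collections import defaultdict

class UserRuntimePredictor:
    """Sketch of a user-centred runtime predictor: each user's job history
    is grouped by a job attribute (here, requested CPU count), and the
    prediction is the mean runtime of the matching group."""

    def __init__(self):
        self.history = defaultdict(list)  # (user, cpus) -> list of runtimes

    def record(self, user, cpus, runtime):
        """Store the observed runtime of a completed job."""
        self.history[(user, cpus)].append(runtime)

    def predict(self, user, cpus):
        """Predict runtime from the matching group, falling back to the
        user's overall mean when the group has no history yet."""
        runtimes = self.history.get((user, cpus))
        if runtimes:
            return sum(runtimes) / len(runtimes)
        all_user = [t for (u, _), ts in self.history.items()
                    if u == user for t in ts]
        return sum(all_user) / len(all_user) if all_user else None
```

A real system would additionally choose among several prediction techniques per group, which is the part Adaps adapts in its three phases.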
Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments
As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing systems is motivated more than ever. As computational workloads grow in quantity, it becomes more crucial to apply efficient resource management and workload scheduling so that resources are used efficiently while computational performance remains reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard. Additionally, non-clairvoyance of job dimensions makes resource management even harder in real-world scenarios. Our research methodology investigates the scheduling problem for HPC and addresses the challenges of deploying the scheduling in real-world scenarios using state-of-the-art machine learning and data science techniques. To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we studied this scheduling problem and proposed scheduling algorithms with polynomial computation time. We also proved constant upper bounds on the performance of these algorithms. b) We studied the sensitivity of scheduling algorithms to the accuracy of runtime predictions and devised a meta-learning approach to estimate prediction accuracy for newly submitted jobs to the HPC system. c) We studied the runtime prediction problem for HPC applications. For this purpose, we studied the distribution of available public workloads and proposed two different solutions that can predict multi-modal distributions: switching state-space models and mixture density networks. d) We studied the effectiveness of recent recurrent neural network models for CPU usage trace prediction, for individual VM traces as well as aggregate CPU usage traces.
In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems. We begin by looking at the problem from a theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant-factor approximation of the optimal solution in polynomial time. We prove that the performance of the algorithm (average completion time) is a constant-factor approximation of the performance of the optimal schedule. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) environments as the closest real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the uncertainties that exist in the real world and demonstrate its effectiveness in terms of response time and resource utilization. After addressing the uncertainty problem, we focus on improving the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on mixture density networks. We showcase the effectiveness of our prediction approaches by comparing them with previous approaches in terms of prediction accuracy and impact on scheduling performance. Finally, we focus on predicting resource usage for individual applications during their execution, exploring the application of recurrent neural networks to predicting the resource usage of applications deployed on individual virtual machines. To validate our proposed models and solutions, we performed extensive trace-driven simulation and measured the effectiveness of our approaches.
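The space-sharing, non-preemptive setting can be illustrated with a simple list-scheduling heuristic: shortest (predicted) job first on the earliest-free machine. This is a generic textbook illustration of the objective, not the dissertation's algorithm or its approximation bound:

```python
import heapq

def schedule_sjf(runtimes, machines):
    """Non-preemptive shortest-job-first list scheduling: jobs sorted by
    (predicted) runtime are assigned to the earliest-free machine.
    Returns the average completion time of all jobs."""
    free = [0.0] * machines          # next-free time of each machine
    heapq.heapify(free)
    total = 0.0
    for t in sorted(runtimes):       # shortest predicted runtime first
        start = heapq.heappop(free)  # earliest-free machine
        finish = start + t
        total += finish
        heapq.heappush(free, finish)
    return total / len(runtimes)

# Three jobs on one machine: completions 1, 3, 6 -> average 10/3.
avg = schedule_sjf([3, 1, 2], machines=1)
```

This also shows why runtime prediction accuracy matters: the sort key is the predicted, not the true, runtime, so prediction errors directly reorder the schedule.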
An evolutionary approach to scheduling in HPC environments based on the incorporation of subjective criteria
[Abstract]
In the context of a supercomputing center, no matter what its computational resources
are, demand will always exceed them. Users must therefore submit requests for
the execution of their jobs, which wait in a queue until the system scheduler
decides to run them. However, out of ignorance or fear that their jobs will be
aborted, these requests are usually very imprecise, hindering the scheduler's
work. In addition, schedulers are difficult to configure and assume at all
times that a given schedule will satisfy all users equally.
This thesis proposes a scheduler for high-performance computing systems based
on evolutionary computation techniques, allowing scheduling policies to be
defined more naturally and the real resource needs of jobs to be estimated in
order to produce more accurate schedules. Additionally, the concept of
perceived quality of service is considered, enabling the incorporation of
subjective criteria into the scheduling process so as to maintain a high level
of satisfaction among both the users and the supercomputing center itself.
Finally, various aspects of the computational resources are modeled to further
improve scheduling accuracy, especially in heterogeneous
systems.
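The evolutionary-computation idea can be sketched as a toy genetic algorithm that evolves a job ordering to minimize average completion time. The representation, fitness, and swap mutation here are generic illustrative choices, not the thesis's encoding of scheduling policies or subjective criteria:

```python
import random

def avg_completion(order, runtimes):
    """Average completion time of jobs executed serially in this order."""
    t, total = 0.0, 0.0
    for j in order:
        t += runtimes[j]
        total += t
    return total / len(order)

def evolve_schedule(runtimes, generations=200, pop_size=20, seed=0):
    """Minimal evolutionary sketch: candidates are permutations of job
    indices, fitness is average completion time, offspring are produced
    by a single swap mutation, and the best half survives each round."""
    rng = random.Random(seed)
    n = len(runtimes)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: avg_completion(o, runtimes))
        survivors = pop[: pop_size // 2]      # elitist selection
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.randrange(n), rng.randrange(n)
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda o: avg_completion(o, runtimes))
```

The appeal of the evolutionary formulation is that the fitness function can mix objective terms (completion time) with subjective ones (perceived quality of service) without changing the search machinery.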