
    Including accurate user estimates in HPC schedulers: an empirical analysis

    This article focuses on the problem of dealing with the low accuracy of job runtime estimates provided by users of high performance computing systems. The main goal of the study is to evaluate the benefits to system utilization of providing accurate estimates, in order to motivate users to make an effort to provide better ones. We propose the Penalty Scheduling Policy for incorporating information about user estimates. The experimental evaluation is performed over realistic workloads and scenarios, and validated through a job scheduler simulator. We simulated different static and dynamic scenarios that emulate diverse user behavior regarding the estimation of job runtimes. Results demonstrate that the accuracy of users' runtime estimates influences the waiting time of jobs. Under our proposed policy, in a scenario where users improve their estimates, the waiting time of users with high accuracy can be up to 2.43 times lower than that of users with the lowest accuracy. XV Workshop de Procesamiento Distribuido y Paralelo (WPDP). Red de Universidades con Carreras en Informática (RedUNCI).
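    The abstract does not spell out how the Penalty Scheduling Policy weighs estimates, but the general idea of discounting a job's priority by its user's past estimation accuracy can be shown with a minimal sketch. Everything below (the accuracy metric, the neutral prior for new users, the penalty weight) is an assumption for illustration, not the paper's policy.

```python
# Hedged sketch: discount queue priority by a user's historical estimate
# accuracy. The accuracy definition (actual/estimated runtime, capped at 1)
# and the penalty weight are illustrative assumptions, not the paper's policy.
from collections import defaultdict

class EstimateAccuracyTracker:
    """Tracks per-user accuracy = mean(actual_runtime / estimated_runtime)."""
    def __init__(self):
        self.ratios = defaultdict(list)

    def record(self, user: str, estimated: float, actual: float) -> None:
        if estimated > 0:
            self.ratios[user].append(min(actual / estimated, 1.0))

    def accuracy(self, user: str) -> float:
        r = self.ratios[user]
        return sum(r) / len(r) if r else 0.5  # neutral prior for new users

def priority(wait_time: float, user_accuracy: float, penalty_weight: float = 0.5) -> float:
    # Higher is better: waiting time raises priority, low accuracy discounts it.
    return wait_time * ((1 - penalty_weight) + penalty_weight * user_accuracy)

# Example: two jobs with equal waiting time, users with different histories.
tracker = EstimateAccuracyTracker()
tracker.record("alice", estimated=3600, actual=3500)   # accurate estimator
tracker.record("bob",   estimated=3600, actual=300)    # heavy over-estimator
print(priority(600, tracker.accuracy("alice")))  # ~592
print(priority(600, tracker.accuracy("bob")))    # ~325
```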

    Improving the Performance of Batch Schedulers Using Online Job Size Classification

    Job scheduling in high-performance computing platforms is a hard problem that involves uncertainties in both the job arrival process and job execution times. Users typically provide loose upper-bound estimates of job execution times that are hardly useful. Previous studies attempted to improve these estimates using regression techniques. Although these attempts provide reasonable predictions, they require a long period of training data. Furthermore, aiming for perfect prediction may be of limited use for scheduling purposes. In this work, we propose a simpler approach: classifying jobs as small or large and prioritizing the execution of small jobs over large ones. Indeed, small jobs are the most impacted by queuing delays, but they typically represent a light load and impose a small burden on the other jobs. The classifier operates online and learns from data collected over the previous weeks, facilitating its deployment and enabling fast adaptation to changes in workload characteristics. We evaluate our approach using four scheduling policies on six HPC platform workload traces. We show that: (i) incorporating such a classification reduces the average bounded slowdown of jobs in all scenarios, and (ii) the obtained improvements are comparable, in most scenarios, to the ideal hypothetical situation where the scheduler would know the exact running time of jobs in advance.
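    A minimal sketch of the kind of online small-versus-large classification and small-first ordering described above; the feature used (the user's walltime request), the sliding window, and the median threshold are assumptions, not the paper's actual classifier.

```python
# Minimal sketch: relearn a "small vs. large" threshold from recently
# completed jobs and order the queue small-first. The feature (requested
# walltime), window size, and median threshold are illustrative assumptions.
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass
class Job:
    user: str
    requested_walltime: float               # user's (loose) estimate, in seconds
    actual_runtime: Optional[float] = None  # known only after completion

class OnlineSizeClassifier:
    def __init__(self, window: int = 2000):
        self.history: list[float] = []      # actual runtimes of recent jobs
        self.window = window
        self.threshold = 3600.0             # fallback before any history exists

    def observe(self, job: Job) -> None:
        """Update the threshold from a completed job's actual runtime."""
        if job.actual_runtime is not None:
            self.history = (self.history + [job.actual_runtime])[-self.window:]
            self.threshold = median(self.history)  # "small" = below the median

    def is_small(self, job: Job) -> bool:
        # At submission time only the user's estimate is available.
        return job.requested_walltime <= self.threshold

def order_queue(queue: list[Job], clf: OnlineSizeClassifier) -> list[Job]:
    # Small jobs first; the stable sort preserves submission order within a class.
    return sorted(queue, key=lambda j: 0 if clf.is_small(j) else 1)
```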

    User-aware performance evaluation and optimization of parallel job schedulers

    The dissertation User-Aware Performance Evaluation and Optimization of Parallel Job Schedulers deals with the realistic, dynamic simulation and optimization of load situations in parallel computing systems, taking into account feedback effects between performance and user behavior. The distinguishing feature of such systems is their shared use by multiple users, which limits the availability of resources. If not all computing requests, the so-called jobs, can be executed at the same time, they are buffered in queues. Since user behavior is not known precisely, there is great uncertainty about future load situations. The goal is to find methods that produce a resource allocation matching users' objectives as closely as possible, and to evaluate these methods realistically. It must also be taken into account that user behavior and user objectives vary depending on the load situation and the resource allocation. A three-part research approach is taken. (1) Analysis of user behavior under resource constraints: Traces of parallel computing systems show that the waiting time for computing results correlates with future user behavior, i.e., on average it takes longer for a user who had to wait a long time to use the system again. As part of the doctoral project, this analysis was continued and additional correlations between further system parameters (beyond waiting time) and user behavior were uncovered. Furthermore, functions describing users' satisfaction with, and reaction to, varying response times of computing systems were developed. These results were obtained through a survey among users of parallel computers at TU Dortmund, for which a dedicated questionnaire was designed. (2) Modeling of user behavior and feedback: Because of the dynamic relationship between system speed and user behavior, allocation strategies must be evaluated in dynamic, feedback-driven simulations. To this end, a multi-stage user model was developed which incorporates the current assumptions about user behavior and allows additional behavioral components to be added in the future. So far, its core elements comprise models of the day-and-night rhythm, the working rhythm, and properties of the submitted jobs. The dynamic feedback is designed such that only the completion of certain jobs triggers future job submissions. (3) Optimization of allocation strategies to increase user satisfaction: The users' waiting-time acceptance derived from the questionnaire has been optimized using a MILP. The MILP searches for solutions that start as many jobs as possible within an accepted waiting-time window, or that minimize the sum of tardiness. Owing to the complexity of this optimization algorithm, the evaluation has so far been restricted to fixed, static scenarios that reproduce particular system and queue states. It is therefore further planned to evaluate scheduling methods for increasing the number of submitted jobs and the waiting-time satisfaction with the help of the dynamic model.
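    A minimal sketch of a time-indexed MILP in the spirit of the third part: start jobs within their accepted waiting-time windows by minimizing total start-time tardiness under a node-capacity constraint. The toy data, the time discretization, and the PuLP formulation are illustrative assumptions, not the dissertation's model.

```python
# Hedged sketch: time-indexed MILP minimizing total tardiness of job start
# times beyond each user's accepted waiting-time window. Data and parameters
# below are toy assumptions for illustration.
import pulp

# (processing slots, nodes needed, accepted wait in slots) per job -- toy data
jobs = {"j1": (3, 4, 1), "j2": (2, 2, 0), "j3": (4, 6, 2)}
CAPACITY = 8          # nodes available in every slot (assumption)
HORIZON = 12          # number of discrete time slots (assumption)

prob = pulp.LpProblem("waiting_time_acceptance", pulp.LpMinimize)

# x[j][t] = 1 if job j starts at slot t
x = {j: [pulp.LpVariable(f"x_{j}_{t}", cat="Binary") for t in range(HORIZON)]
     for j in jobs}
# tardiness of the start time beyond the accepted window
tard = {j: pulp.LpVariable(f"T_{j}", lowBound=0) for j in jobs}

prob += pulp.lpSum(tard.values())                 # objective: total tardiness

for j, (p, r, w) in jobs.items():
    prob += pulp.lpSum(x[j]) == 1                 # every job starts exactly once
    start = pulp.lpSum(t * x[j][t] for t in range(HORIZON))
    prob += tard[j] >= start - w                  # tardiness vs. accepted window

for t in range(HORIZON):                          # node capacity per slot
    prob += pulp.lpSum(r * x[j][s]
                       for j, (p, r, w) in jobs.items()
                       for s in range(HORIZON) if s <= t < s + p) <= CAPACITY

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for j in jobs:
    start_slot = next(t for t in range(HORIZON) if x[j][t].value() > 0.5)
    print(j, "starts at slot", start_slot, "with tardiness", tard[j].value())
```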

    Energy Demand Response for High-Performance Computing Systems

    The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demands of applications, HPC systems have been going through dramatic architectural changes (e.g., the introduction of many-core and multi-core systems, and the rapid growth of complex interconnection networks for efficient communication between thousands of nodes), as well as a significant increase in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption of these systems has increased significantly. With the advent of exascale supercomputers in the next few years, the power consumption of HPC systems will increase further; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help energy service providers stabilize the power system by reducing the energy consumption of participating systems during periods of high power demand or temporary shortages in power supply. This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC systems' demand response participation. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of the Message Passing Interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models provide good accuracy for predicting network behavior, while at the same time allowing for good parallel scaling performance. In the second part, we present an energy-efficient demand-response model to reduce HPC systems' energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable HPC systems' emergency demand response participation. In the final part, we propose an economic demand-response model to allow both the HPC operator and HPC users to jointly reduce the HPC system's energy cost. Our proposed model allows the participation of HPC systems in economic demand-response programs through a contract-based rewarding scheme that can incentivize HPC users to participate in demand response.
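    As a rough illustration of the scheduling side of demand response, the sketch below defers queued jobs whose estimated power draw would exceed a temporary power cap during a demand-response window. The per-node power estimates, the cap value, and the greedy ordering are assumptions for illustration, not the dissertation's scheme.

```python
# Hedged sketch: during a demand-response window, cap total power draw and
# defer jobs whose estimated power would exceed the cap. Power figures and
# the cap are hypothetical inputs.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int
    est_power_per_node_kw: float  # hypothetical per-node power estimate

    @property
    def est_power_kw(self) -> float:
        return self.nodes * self.est_power_per_node_kw

def schedule_under_power_cap(queue: list[Job], cap_kw: float) -> tuple[list[Job], list[Job]]:
    """Greedily start jobs in queue order while the cap allows; defer the rest."""
    started, deferred = [], []
    used_kw = 0.0
    for job in queue:
        if used_kw + job.est_power_kw <= cap_kw:
            started.append(job)
            used_kw += job.est_power_kw
        else:
            deferred.append(job)
    return started, deferred

# Example: a demand-response event lowers the system's cap to 1 MW (1000 kW).
queue = [Job("A", 512, 0.4), Job("B", 2048, 0.4), Job("C", 256, 0.4)]
started, deferred = schedule_under_power_cap(queue, cap_kw=1000)
print([j.name for j in started], "started;", [j.name for j in deferred], "deferred")
```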

    Exploring Scheduling for On-demand File Systems and Data Management within HPC Environments

    Computer Science & Technology Series: XXI Argentine Congress of Computer Science. Selected papers

    CACIC’15 was the 21st Congress in the CACIC series. It was organized by the School of Technology at UNNOBA (North-West of Buenos Aires National University) in Junín, Buenos Aires. The Congress included 13 Workshops with 131 accepted papers, 4 Conferences, 2 invited tutorials, different meetings related to Computer Science Education (Professors, PhD students, Curricula), and an International School with 6 courses. CACIC 2015 was organized following the traditional Congress format, with 13 Workshops covering a diversity of dimensions of Computer Science Research. Each topic was supervised by a committee of 3-5 chairs from different Universities. The call for papers attracted a total of 202 submissions. An average of 2.5 review reports were collected for each paper, for a grand total of 495 review reports involving about 191 different reviewers. A total of 131 full papers, involving 404 authors and 75 Universities, were accepted, and 24 of them were selected for this book. Red de Universidades con Carreras en Informática (RedUNCI).