
    Predicting batch queue job wait times for informed scheduling of urgent HPC workloads

    There is increasing interest in the use of HPC machines for urgent workloads to help tackle disasters as they unfold. Whilst batch queue systems are not ideal in supporting such workloads, many disadvantages can be worked around by accurately predicting when a waiting job will start to run. However, there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depends upon many factors. In this work we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behaviour resulting from the queue policy and other interactions to generate accurate job start times. For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600) and 4-cabinet (HPE Cray EX) we explore how different machine learning approaches and techniques improve the accuracy of our predictions, comparing against the estimation generated by Slurm. We demonstrate that our techniques deliver the most accurate predictions across our machines of interest, predicting job start times within one minute of the actual start time for around 65% of jobs on ARCHER2 and 4-cabinet, and 76% of jobs on Cirrus. Compared against what Slurm can deliver, this represents around 3.8 times better accuracy on ARCHER2 and 18 times better on Cirrus. Furthermore, our approach can accurately predict the start time for three quarters of all jobs within ten minutes of the actual start time on ARCHER2 and 4-cabinet, and for 90% of jobs on Cirrus. Whilst the driver of this work has been to better facilitate the placement of urgent workloads across HPC machines, the insights gained can provide wider benefits to users, enrich existing batch queue systems, and inform policy.
    Comment: Preprint of article at the 2022 Cray User Group (CUG)
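    The abstract describes learning job start times from historical queue behaviour. A minimal sketch of that general idea, assuming hypothetical job features (`nodes`, `walltime_h`, `queue_len`) and a simple k-nearest-neighbour predictor rather than the paper's actual model:

    ```python
    import math

    # Illustrative sketch (not the paper's model): predict a job's queue
    # wait time from historical records using k-nearest neighbours over a
    # few simple job features. Feature names and values are assumptions.

    def knn_wait_prediction(history, job, k=3):
        """history: list of (feature_dict, wait_seconds); job: feature dict."""
        def dist(a, b):
            return math.sqrt(sum((a[f] - b[f]) ** 2 for f in a))
        nearest = sorted(history, key=lambda rec: dist(rec[0], job))[:k]
        return sum(wait for _, wait in nearest) / k

    history = [
        ({"nodes": 4,   "walltime_h": 1.0,  "queue_len": 10}, 600),
        ({"nodes": 4,   "walltime_h": 1.0,  "queue_len": 12}, 660),
        ({"nodes": 128, "walltime_h": 24.0, "queue_len": 40}, 7200),
        ({"nodes": 8,   "walltime_h": 2.0,  "queue_len": 11}, 900),
    ]
    job = {"nodes": 4, "walltime_h": 1.0, "queue_len": 11}
    print(knn_wait_prediction(history, job))  # averages the 3 closest records
    ```

    A production predictor would use far richer features (queue state, backfill windows, per-partition policy) and a trained model, but the shape of the problem is the same: map job and queue features to an estimated wait.
    
    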

    The Technologies Required for Fusing HPC and Real-Time Data to Support Urgent Computing

    The use of High Performance Computing (HPC) to complement urgent decision making in the event of disasters is an important future potential use of supercomputers. However, the usage modes involved are rather different from how HPC has been used traditionally. As such, there are many obstacles that need to be overcome, not least the unbounded wait times in batch system queues, to make the use of HPC in disaster response practical. In this paper, we present how the VESTEC project plans to overcome these issues and develop a working prototype of an urgent computing control system. We describe the requirements for such a system and analyse the different technologies available that can be leveraged to build it successfully. We finally explore the design of the VESTEC system and discuss ongoing challenges that need to be addressed to realise a production-level system.
    Comment: Preprint of paper in 2019 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC)

    Analyzing and Modeling the Performance of the HemeLB Lattice-Boltzmann Simulation Environment

    We investigate the performance of the HemeLB lattice-Boltzmann simulator for cerebrovascular blood flow, aimed at providing timely and clinically relevant assistance to neurosurgeons. HemeLB is optimised for sparse geometries, supports interactive use, and scales well to 32,768 cores for problems with ~81 million lattice sites. We obtain a maximum performance of 29.5 billion site updates per second, with only an 11% slowdown for highly sparse problems (5% fluid fraction). We present steering and visualisation performance measurements and provide a model which allows users to predict the performance, thereby determining how to run simulations with maximum accuracy within time constraints.
    Comment: Accepted by the Journal of Computational Science. 33 pages, 16 figures, 7 tables
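    The abstract mentions a model for predicting performance so users can fit a simulation inside a time budget. A back-of-envelope sketch of such a model, with an assumed functional form and assumed per-core figures (not HemeLB's published model):

    ```python
    # Illustrative performance model (assumed form): wall-clock time is
    # sites * steps / aggregate site-update rate, where an efficiency
    # factor captures sparsity/communication slowdown.

    def predicted_walltime_s(sites, steps, cores, rate_per_core, efficiency=1.0):
        sups = cores * rate_per_core * efficiency  # site updates per second
        return sites * steps / sups

    # Example with assumed figures loosely matching the abstract's scale
    # (81 million sites, 32,768 cores, 11% slowdown for sparse problems):
    t = predicted_walltime_s(sites=81e6, steps=100_000,
                             cores=32_768, rate_per_core=900_000,
                             efficiency=0.89)
    print(round(t, 1))  # predicted seconds of wall-clock time
    ```

    Given a deadline, the same formula can be inverted to choose a step count or core count that keeps the run within the time constraint.
    
    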

    Supercomputing with MPI meets the Common Workflow Language standards: an experience report

    Use of standards-based workflows is still somewhat unusual among high-performance computing users. In this paper we describe the experience of using the Common Workflow Language (CWL) standards to describe the execution, in parallel, of MPI-parallelised applications. In particular, we motivate and describe the simple extension to the specification which was required, as well as our implementation of this within the CWL reference runner. We discuss some of the unexpected benefits, such as simple use of HPC-oriented performance measurement tools, and interfacing CWL software requirements with HPC module systems. We close with a request for comment from the community on how these features could be adopted within versions of the CWL standards.
    Comment: Submitted to 15th Workshop on Workflows in Support of Large-Scale Science (WORKS20)

    The role of interactive super-computing in using HPC for urgent decision making

    Technological advances are creating exciting new opportunities that have the potential to move HPC well beyond traditional computational workloads. In this paper we focus on the potential for HPC to be instrumental in responding to disasters such as wildfires, hurricanes, extreme flooding, earthquakes, tsunamis, winter weather conditions, and accidents. Driven by the EU-funded H2020 VESTEC project, our research looks to prove HPC as a tool not only capable of simulating disasters once they have happened, but also one which is able to operate in a responsive mode, supporting disaster response teams making urgent decisions in real-time. Whilst this has the potential to revolutionise disaster response, it requires the ability to drive HPC interactively, both from the user's perspective and also based upon the arrival of data. As such, interactivity is a critical component in enabling HPC to be exploited in the role of supporting disaster response teams, so that urgent decision makers can make the correct decision first time, every time.

    Exploring the origins of the power-law properties of energy landscapes: An egg-box model

    Multidimensional potential energy landscapes (PELs) have a Gaussian distribution for the energies of the minima, but at the same time the distribution of the hyperareas for the basins of attraction surrounding the minima follows a power-law. To explore how both these features can simultaneously be true, we introduce an "egg-box" model. In these model landscapes, the Gaussian energy distribution is used as a starting point and we examine whether a power-law basin area distribution can arise as a natural consequence through the swallowing up of higher-energy minima by larger low-energy basins when the variance of this Gaussian is increased sufficiently. Although the basin area distribution is substantially broadened by this process, it is insufficient to generate power-laws, highlighting the role played by the inhomogeneous distribution of basins in configuration space for actual PELs.
    Comment: 7 pages, 8 figures

    PolNet: A Tool to Quantify Network-Level Cell Polarity and Blood Flow in Vascular Remodeling

    In this article, we present PolNet, an open-source software tool for the study of blood flow and cell-level biological activity during vessel morphogenesis. We provide an image acquisition, segmentation, and analysis protocol to quantify endothelial cell polarity in entire in vivo vascular networks. In combination, we use computational fluid dynamics to characterize the hemodynamics of the vascular networks under study. The tool enables, to our knowledge for the first time, a network-level analysis of polarity and flow for individual endothelial cells. To date, PolNet has proven invaluable for the study of endothelial cell polarization and migration during vascular patterning, as demonstrated by two recent publications. Additionally, the tool can easily be extended to correlate blood flow with other experimental observations at the cellular/molecular level. We release the source code of our tool under the Lesser General Public License.

    A Bespoke Workflow Management System for Data-Driven Urgent HPC

    In this paper we present a workflow management system which permits the kinds of data-driven workflows required by urgent computing, namely where new data is integrated into the workflow as a disaster progresses in order to refine the predictions as time goes on. This allows the workflow to adapt to new data at runtime, a capability that most workflow management systems do not possess. The workflow management system was developed for the EU-funded VESTEC project, which aims to fuse HPC with real-time data to support urgent decision making. We first describe an example workflow from the VESTEC project, and show why existing workflow technologies do not meet the needs of the project. We then go on to present the design of our workflow management system, describe how it is implemented into the VESTEC system, and provide an example of the workflow system in use for a test case.
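    The key capability described, workflow stages reacting to data that arrives mid-run, can be sketched as a tiny event-driven engine. This is an illustration of the general pattern only, not the VESTEC implementation; all names here are invented:

    ```python
    from collections import deque

    # Minimal sketch of a data-driven workflow engine: stages are handlers
    # triggered by named events, and newly arriving data simply enqueues
    # more events, so the workflow adapts at runtime.

    class Workflow:
        def __init__(self):
            self.handlers = {}   # event name -> list of callables
            self.queue = deque()

        def on(self, event, handler):
            self.handlers.setdefault(event, []).append(handler)

        def emit(self, event, payload=None):
            self.queue.append((event, payload))

        def run(self):
            while self.queue:
                event, payload = self.queue.popleft()
                for handler in self.handlers.get(event, []):
                    handler(self, payload)

    log = []
    wf = Workflow()
    wf.on("new_data", lambda wf, d: (log.append(f"assimilate {d}"),
                                     wf.emit("run_simulation", d)))
    wf.on("run_simulation", lambda wf, d: log.append(f"simulate with {d}"))

    wf.emit("new_data", "sensor-frame-1")
    wf.emit("new_data", "sensor-frame-2")  # data arriving mid-workflow
    wf.run()
    print(log)
    ```

    Because handlers emit further events rather than following a fixed DAG, a new sensor frame arriving at any point simply extends the queue, which is exactly the adaptivity a static workflow description cannot express.
    
    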

    Impact of blood rheology on wall shear stress in a model of the middle cerebral artery

    Perturbations to the homeostatic distribution of mechanical forces exerted by blood on the endothelial layer have been correlated with vascular pathologies including intracranial aneurysms and atherosclerosis. Recent computational work suggests that in order to correctly characterise such forces, the shear-thinning properties of blood must be taken into account. To the best of our knowledge, these findings have never been compared against experimentally observed pathological thresholds. In the current work, we apply the three-band diagram (TBD) analysis due to Gizzi et al. to assess the impact of the choice of blood rheology model on a computational model of the right middle cerebral artery. Our results show that, in the model under study, the differences between the wall shear stress predicted by a Newtonian model and the well-known Carreau-Yasuda generalized Newtonian model are only significant if the vascular pathology under study is associated with a pathological threshold in the range 0.94 Pa to 1.56 Pa, where the results of the TBD analysis of the rheology models considered differ. Otherwise, we observe no significant differences.
    Comment: 14 pages, 6 figures, published at Interface Focus
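    The Carreau-Yasuda generalized Newtonian model compared against the Newtonian model in the abstract has a standard closed form; the parameter values below are common literature fits for blood used purely for illustration, not taken from the paper:

    ```python
    # Carreau-Yasuda model: effective viscosity falls from mu0 at low shear
    # towards mu_inf at high shear (shear-thinning). Parameter values are
    # illustrative literature-style fits for blood, not the paper's values.

    def carreau_yasuda_viscosity(shear_rate, mu0=0.16, mu_inf=0.0035,
                                 lam=8.2, a=0.64, n=0.2128):
        """Dynamic viscosity in Pa.s as a function of shear rate (1/s)."""
        return mu_inf + (mu0 - mu_inf) * (1 + (lam * shear_rate) ** a) ** ((n - 1) / a)

    # Shear-thinning behaviour: viscosity drops as shear rate rises,
    # approaching the Newtonian-like plateau mu_inf.
    low = carreau_yasuda_viscosity(1.0)
    high = carreau_yasuda_viscosity(1000.0)
    print(low > high)  # True
    ```

    A Newtonian model instead uses one constant viscosity at all shear rates, which is why the two models diverge most in low-shear regions of the vessel wall.
    
    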