The Technologies Required for Fusing HPC and Real-Time Data to Support Urgent Computing
The use of High Performance Computing (HPC) to complement urgent decision
making in the event of disasters is an important future potential use of
supercomputers. However, the usage modes involved are rather different from how
HPC has been used traditionally. As such, there are many obstacles that need to
be overcome, not least the unbounded wait times in the batch system queues, to
make the use of HPC in disaster response practical. In this paper, we present
how the VESTEC project plans to overcome these issues and develop a working
prototype of an urgent computing control system. We describe the requirements
for such a system and analyse the different technologies available that can be
leveraged to successfully build such a system. We finally explore the design of
the VESTEC system and discuss ongoing challenges that need to be addressed to
realise a production-level system.
Comment: Preprint of paper in 2019 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC)
Predicting batch queue job wait times for informed scheduling of urgent HPC workloads
There is increasing interest in the use of HPC machines for urgent workloads
to help tackle disasters as they unfold. Whilst batch queue systems are not
ideal in supporting such workloads, many disadvantages can be worked around by
accurately predicting when a waiting job will start to run. However, there are
numerous challenges in achieving such a prediction with high accuracy, not
least because the queue's state can change rapidly and depend upon many
factors. In this work we explore a novel machine learning approach for
predicting queue wait times, hypothesising that such a model can capture the
complex behaviour resulting from the queue policy and other interactions to
generate accurate job start times.
For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600) and 4-cabinet (HPE Cray EX) we
explore how different machine learning approaches and techniques improve the
accuracy of our predictions, comparing against the estimation generated by
Slurm. We demonstrate that our techniques deliver the most accurate predictions
across our machines of interest, with the result of this work being the ability
to predict job start times within one minute of the actual start time for
around 65% of jobs on ARCHER2 and 4-cabinet, and 76% of jobs on Cirrus. When
compared against what Slurm can deliver, this represents around 3.8 times
better accuracy on ARCHER2 and 18 times better for Cirrus. Furthermore, our
approach can accurately predict the start time for three quarters of all jobs
within ten minutes of the actual start time on ARCHER2 and 4-cabinet, and for
90% of jobs on Cirrus. Whilst the driver of this work has been to better
facilitate placement of urgent workloads across HPC machines, the insights
gained can be used to provide wider benefits to users and also enrich existing
batch queue systems and inform policy.
Comment: Preprint of article at the 2022 Cray User Group (CUG)
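The prediction task can be illustrated with a toy sketch: estimate a job's wait time from a snapshot of queue-state features by averaging the most similar historical jobs. The feature set (jobs ahead, nodes requested, requested walltime) and the nearest-neighbour model are illustrative assumptions, not the machine learning approach used in the paper.

```python
def predict_wait(history, query, k=3):
    """Predict wait time (seconds) as the mean observed wait of the k
    historical jobs whose queue-state features are closest (squared
    Euclidean distance) to the query snapshot."""
    ranked = sorted(
        history,
        key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], query)),
    )
    nearest = ranked[:k]
    return sum(wait for _, wait in nearest) / len(nearest)

# history entries: ((jobs_ahead, nodes_requested, walltime_hours), observed_wait_s)
history = [
    ((2, 1, 1.0), 60),
    ((10, 4, 6.0), 1800),
    ((50, 32, 24.0), 14400),
    ((12, 4, 8.0), 2400),
]
print(predict_wait(history, (11, 4, 7.0), k=2))  # → 2100.0 (mean of the two similar jobs)
```

In practice a model of this kind would be retrained as the queue state changes, since (as the abstract notes) the queue can evolve rapidly.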
Analyzing and Modeling the Performance of the HemeLB Lattice-Boltzmann Simulation Environment
We investigate the performance of the HemeLB lattice-Boltzmann simulator for
cerebrovascular blood flow, aimed at providing timely and clinically relevant
assistance to neurosurgeons. HemeLB is optimised for sparse geometries,
supports interactive use, and scales well to 32,768 cores for problems with ~81
million lattice sites. We obtain a maximum performance of 29.5 billion site
updates per second, with only an 11% slowdown for highly sparse problems (5%
fluid fraction). We present steering and visualisation performance measurements
and provide a model which allows users to predict the performance, thereby
determining how to run simulations with maximum accuracy within time
constraints.
Comment: Accepted by the Journal of Computational Science. 33 pages, 16 figures, 7 tables
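A performance model of the kind the paper provides can be sketched as follows. The 29.5 billion site updates per second peak, the 32,768-core scale, and the 11% sparse-geometry slowdown come from the abstract; the linear-scaling functional form is an illustrative simplification, not HemeLB's actual model.

```python
def predicted_runtime(sites, timesteps, cores, peak_sups=29.5e9,
                      peak_cores=32768, sparse_penalty=0.11,
                      fluid_fraction=1.0):
    """Estimate wall-clock seconds for a lattice-Boltzmann run.

    Assumes the site-update rate scales linearly with core count up to
    peak_cores, and that highly sparse geometries (low fluid fraction)
    incur up to an 11% slowdown -- both illustrative simplifications."""
    rate = peak_sups * min(cores, peak_cores) / peak_cores
    rate *= 1.0 - sparse_penalty * (1.0 - fluid_fraction)
    return sites * timesteps / rate

# 81 million sites, 1000 timesteps, full machine: roughly 2.75 s at peak rate.
print(predicted_runtime(81e6, 1000, 32768))
```

Inverting such a model is what lets a user pick the resolution and timestep count that fit inside a clinical time budget.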
Supercomputing with MPI meets the Common Workflow Language standards: an experience report
Use of standards-based workflows is still somewhat unusual among
high-performance computing users. In this paper we describe the experience of
using the Common Workflow Language (CWL) standards to describe the execution,
in parallel, of MPI-parallelised applications. In particular, we motivate and
describe the simple extension to the specification which was required, as well
as our implementation of this within the CWL reference runner. We discuss some
of the unexpected benefits, such as simple use of HPC-oriented performance
measurement tools, and CWL software requirements interfacing with HPC module
systems. We close with a request for comment from the community on how these
features could be adopted within versions of the CWL standards.
Comment: Submitted to 15th Workshop on Workflows in Support of Large-Scale Science (WORKS20)
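Conceptually, such an extension lets a tool description declare a process count, which the runner uses to prepend the MPI launcher to the generated command line. The sketch below mimics that behaviour; the requirement and field names are illustrative stand-ins, not the exact extension syntax proposed in the paper.

```python
def wrap_with_mpi(command, requirements, launcher=("mpirun", "-n")):
    """Return the command line to execute: if the step carries an MPI
    requirement with a process count, prepend the platform launcher,
    otherwise run the tool's command line unchanged."""
    mpi = requirements.get("MPIRequirement")  # illustrative requirement name
    if mpi is None:
        return list(command)
    return [launcher[0], launcher[1], str(mpi["processes"])] + list(command)

print(wrap_with_mpi(["./simulate", "--mesh", "artery.vtk"],
                    {"MPIRequirement": {"processes": 128}}))
# → ['mpirun', '-n', '128', './simulate', '--mesh', 'artery.vtk']
```

Keeping the launcher out of the tool description itself is what preserves portability: the same workflow can run serially or under a site-specific launcher without modification.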
The role of interactive super-computing in using HPC for urgent decision making
Technological advances are creating exciting new opportunities that have the potential to move HPC well beyond traditional computational workloads. In this paper we focus on the potential for HPC to be instrumental in responding to disasters such as wildfires, hurricanes, extreme flooding, earthquakes, tsunamis, winter weather conditions, and accidents. Driven by the VESTEC EU-funded H2020 project, our research looks to prove HPC as a tool not only capable of simulating disasters once they have happened, but also one which is able to operate in a responsive mode, supporting disaster response teams making urgent decisions in real-time. Whilst this has the potential to revolutionise disaster response, it requires the ability to drive HPC interactively, both from the user's perspective and also based upon the arrival of data. As such, interactivity is a critical component in enabling HPC to be exploited in the role of supporting disaster response teams so that urgent decision makers can make the correct decision first time, every time.
Exploring the origins of the power-law properties of energy landscapes: An egg-box model
Multidimensional potential energy landscapes (PELs) have a Gaussian
distribution for the energies of the minima, but at the same time the
distribution of the hyperareas for the basins of attraction surrounding the
minima follows a power-law. To explore how both these features can
simultaneously be true, we introduce an "egg-box" model. In these model
landscapes, the Gaussian energy distribution is used as a starting point and we
examine whether a power-law basin area distribution can arise as a natural
consequence through the swallowing up of higher-energy minima by larger
low-energy basins when the variance of this Gaussian is increased sufficiently.
Although the basin area distribution is substantially broadened by this
process, it is insufficient to generate power-laws, highlighting the role played
by the inhomogeneous distribution of basins in configuration space for actual
PELs.
Comment: 7 pages, 8 figures
PolNet: A Tool to Quantify Network-Level Cell Polarity and Blood Flow in Vascular Remodeling
In this article, we present PolNet, an open-source software tool for the study of blood flow and cell-level biological activity during vessel morphogenesis. We provide an image acquisition, segmentation, and analysis protocol to quantify endothelial cell polarity in entire in vivo vascular networks. In combination, we use computational fluid dynamics to characterize the hemodynamics of the vascular networks under study. The tool enables, to our knowledge for the first time, a network-level analysis of polarity and flow for individual endothelial cells. To date, PolNet has proven invaluable for the study of endothelial cell polarization and migration during vascular patterning, as demonstrated by two recent publications. Additionally, the tool can be easily extended to correlate blood flow with other experimental observations at the cellular/molecular level. We release the source code of our tool under the Lesser General Public License.
A Bespoke Workflow Management System for Data-Driven Urgent HPC
In this paper we present a workflow management system which permits the kinds of data-driven workflows required by urgent computing, namely where new data is integrated into the workflow as a disaster progresses in order to refine the predictions as time goes on. This allows the workflow to adapt to new data at runtime, a capability that most workflow management systems do not possess. The workflow management system was developed for the EU-funded VESTEC project, which aims to fuse HPC with real-time data for supporting urgent decision making. We first describe an example workflow from the VESTEC project, and show why existing workflow technologies do not meet the needs of the project. We then go on to present the design of our Workflow Management System, describe how it is implemented into the VESTEC system, and provide an example of the workflow system in use for a test case.
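The data-driven behaviour described here, i.e. workflow stages triggered by the arrival of new data rather than by a static dependency graph, can be sketched minimally as follows. The topic names and handler are illustrative and not part of the VESTEC system.

```python
class Workflow:
    """Toy event-driven workflow: stages subscribe to data topics and
    fire whenever new data arrives, so results can be refined at runtime."""

    def __init__(self):
        self.handlers = {}  # topic -> list of callables
        self.log = []

    def on_data(self, topic, handler):
        """Register a stage to run when data arrives on a topic."""
        self.handlers.setdefault(topic, []).append(handler)

    def deliver(self, topic, payload):
        """Simulate new data arriving mid-run: fire all subscribed stages."""
        for handler in self.handlers.get(topic, []):
            handler(payload)

wf = Workflow()
wf.on_data("sensor/fire-front",
           lambda d: wf.log.append(f"re-run simulation with {d}"))
wf.deliver("sensor/fire-front", "frame-042")
print(wf.log)  # → ['re-run simulation with frame-042']
```

A static DAG-based workflow system cannot express this pattern, since the set of stage executions is not known before the run begins.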
Impact of blood rheology on wall shear stress in a model of the middle cerebral artery
Perturbations to the homeostatic distribution of mechanical forces exerted by
blood on the endothelial layer have been correlated with vascular pathologies
including intracranial aneurysms and atherosclerosis. Recent computational work
suggests that in order to correctly characterise such forces, the
shear-thinning properties of blood must be taken into account. To the best of
our knowledge, these findings have never been compared against experimentally
observed pathological thresholds. In the current work, we apply the three-band
diagram (TBD) analysis due to Gizzi et al. to assess the impact of the choice
of blood rheology model on a computational model of the right middle cerebral
artery. Our results show that, in the model under study, the differences
between the wall shear stress predicted by a Newtonian model and the well known
Carreau-Yasuda generalized Newtonian model are only significant if the vascular
pathology under study is associated with a pathological threshold in the range
0.94 Pa to 1.56 Pa, where the results of the TBD analysis of the rheology
models considered differ. Otherwise, we observe no significant differences.
Comment: 14 pages, 6 figures, published at Interface Focus
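The comparison can be illustrated with a small sketch: check whether two rheology models classify sampled wall shear stress values differently with respect to a pathological threshold. The 0.94 Pa to 1.56 Pa range comes from the abstract; the simple threshold classification below is a stand-in for, not an implementation of, the TBD analysis of Gizzi et al.

```python
def models_disagree(wss_newtonian, wss_carreau_yasuda, threshold):
    """Return True if the Newtonian and Carreau-Yasuda wall shear stress
    samples (Pa) fall on different sides of the pathological threshold
    at any sampled point."""
    return any((a >= threshold) != (b >= threshold)
               for a, b in zip(wss_newtonian, wss_carreau_yasuda))

# Illustrative WSS samples (Pa); shear-thinning lowers the mid-range values.
newtonian = [0.80, 1.10, 2.00]
carreau   = [0.85, 0.90, 2.10]

print(models_disagree(newtonian, carreau, 1.00))  # → True (1.10 vs 0.90 straddle it)
print(models_disagree(newtonian, carreau, 2.50))  # → False
```

This mirrors the abstract's conclusion: the choice of rheology model only changes the clinical classification when the pathological threshold lies inside the band where the models' predictions diverge.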