Optimising UCNS3D, a High-Order Finite-Volume WENO Scheme Code for Arbitrary Unstructured Meshes
UCNS3D is a computational-fluid-dynamics (CFD) code for the simulation of viscous flows on arbitrary unstructured meshes. It employs very high-order numerical schemes, which are inherently easier to scale than lower-order schemes due to their higher ratio of computation to communication. In this white paper, we report on optimisations of the UCNS3D code implemented in the course of the PRACE Preparatory Access Type C project "HOVE" in the time frame of February to August 2016. Through the optimisation of dense linear algebra operations, in particular matrix-vector products, by formula rewriting, pre-computation, and the usage of BLAS, significant speedups of the code by factors of 2 to 6 have been achieved for representative benchmark cases. Moreover, very good scalability up to the order of 10,000 CPU cores has been demonstrated.
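The BLAS substitution described in this abstract can be illustrated with a short sketch. The example below is a hypothetical Python illustration, not UCNS3D's actual Fortran code: it replaces a hand-written matrix-vector loop with a BLAS-backed dgemv call, with the per-cell matrix precomputed once and reused, which is the pattern the abstract refers to.

```python
import numpy as np
from scipy.linalg.blas import dgemv  # BLAS double-precision matrix-vector product

def matvec_naive(A, x):
    """Hand-rolled matrix-vector product: an explicit O(m*n) loop."""
    m, n = A.shape
    y = np.zeros(m)
    for i in range(m):
        for j in range(n):
            y[i] += A[i, j] * x[j]
    return y

def matvec_blas(A, x):
    """Same product via BLAS dgemv: y = 1.0 * A @ x."""
    return dgemv(1.0, A, x)

rng = np.random.default_rng(0)
A = np.asfortranarray(rng.random((15, 10)))  # e.g. a precomputed per-cell matrix
x = rng.random(10)
assert np.allclose(matvec_naive(A, x), matvec_blas(A, x))
```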
Deploying AI Frameworks on Secure HPC Systems with Containers
The increasing interest from the research community and industry in using Artificial Intelligence (AI) techniques to tackle "real world" problems requires High Performance Computing (HPC) resources to efficiently compute and scale complex algorithms across thousands of nodes. Unfortunately, typical data scientists are not familiar with the unique requirements and characteristics of HPC environments. They usually develop their applications with high-level scripting languages or frameworks such as TensorFlow, and the installation process often requires connection to external systems to download open-source software during the build. HPC environments, on the other hand, are often based on closed-source applications that incorporate parallel and distributed computing APIs such as MPI and OpenMP, while users have restricted administrator privileges and face security restrictions such as not being allowed access to external systems. In this paper we discuss the issues associated with the deployment of AI frameworks in a secure HPC environment and how we successfully deploy AI frameworks on SuperMUC-NG with Charliecloud.
Comment: 6 pages, 2 figures, 2019 IEEE High Performance Extreme Computing Conference
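As a rough illustration of the unprivileged container workflow this abstract describes, the sketch below drives Charliecloud's ch-run from Python. The image path, bind-mount target, and training script are hypothetical; the only Charliecloud behaviour assumed is the documented `ch-run IMAGE -- CMD` invocation with its `-b SRC:DST` bind-mount flag. This is a minimal sketch, not the paper's actual deployment procedure.

```python
import subprocess
from pathlib import Path

# Hypothetical paths: a TensorFlow image unpacked on node-local storage,
# and a work directory staged on the parallel filesystem.
IMAGE = Path("/var/tmp/tensorflow")      # unpacked Charliecloud image directory
WORKDIR = Path("/gpfs/work/ai-project")  # host data visible outside the container

def run_in_container(cmd):
    """Run a command inside the image with ch-run (fully unprivileged)."""
    argv = [
        "ch-run",
        "-b", f"{WORKDIR}:/mnt/0",  # bind-mount host data into the container
        str(IMAGE),
        "--",                       # everything after this runs inside the image
        *cmd,
    ]
    subprocess.run(argv, check=True)

if __name__ == "__main__":
    # No root privileges or network access are needed at run time: the image
    # was built elsewhere and copied in, matching the secure-HPC constraints.
    run_in_container(["python3", "/mnt/0/train.py"])
```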
Analyzing and Modeling the Performance of the HemeLB Lattice-Boltzmann Simulation Environment
We investigate the performance of the HemeLB lattice-Boltzmann simulator for cerebrovascular blood flow, aimed at providing timely and clinically relevant assistance to neurosurgeons. HemeLB is optimised for sparse geometries, supports interactive use, and scales well to 32,768 cores for problems with ~81 million lattice sites. We obtain a maximum performance of 29.5 billion site updates per second, with only an 11% slowdown for highly sparse problems (5% fluid fraction). We present steering and visualisation performance measurements and provide a model which allows users to predict the performance, thereby determining how to run simulations with maximum accuracy within time constraints.
Comment: Accepted by the Journal of Computational Science. 33 pages, 16 figures, 7 tables
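A performance model of the kind mentioned can be sketched in a few lines. The version below is a hypothetical back-of-the-envelope estimate built only from the figures quoted in the abstract (peak site-update rate, core count, and sparse-geometry slowdown); HemeLB's actual published model is more detailed.

```python
# Headline numbers from the abstract: ~29.5e9 site updates/s on 32,768 cores,
# with an ~11% slowdown for very sparse geometries (5% fluid fraction).
PEAK_SUPS = 29.5e9       # site updates per second at peak
PEAK_CORES = 32_768
SPARSE_PENALTY = 0.11    # fractional slowdown for highly sparse problems

def predict_walltime(n_sites, n_steps, n_cores, sparse=False):
    """Estimate wall-clock seconds, assuming linear scaling in core count."""
    per_core_rate = PEAK_SUPS / PEAK_CORES  # site updates per second per core
    rate = per_core_rate * n_cores
    if sparse:
        rate *= 1.0 - SPARSE_PENALTY
    return n_sites * n_steps / rate

# e.g. an 81-million-site geometry run for 100,000 steps on 8,192 cores:
print(f"{predict_walltime(81e6, 1e5, 8192, sparse=True):.0f} s")
```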
Performance analysis of electronic structure codes on HPC systems: A case study of SIESTA
We report on scaling and timing tests of the SIESTA electronic structure code for ab initio molecular dynamics simulations using density-functional theory. The tests are performed on six large-scale supercomputers belonging to the PRACE Tier-0 network, spanning four different architectures: Cray XE6, IBM BlueGene/Q, BullX, and IBM iDataPlex. We employ a systematic strategy for simultaneously testing weak and strong scaling, and propose a measure of strong-scaling efficiency, as a function of simulation size, that is independent of the range of core counts over which the tests are performed. We find an increase in efficiency with simulation size for all machines, with a qualitatively different curve depending on the supercomputer topology, and discuss the connection of this functional form with weak scaling behaviour. We also analyze the absolute timings obtained in our tests, showing the range of system sizes and core counts favourable for different machines. Our results can be employed as a guide both for running SIESTA on parallel architectures and for executing similar scaling tests of other electronic structure codes.
Comment: 9 pages, 9 figures
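For readers unfamiliar with the baseline quantity involved, the snippet below computes the conventional strong-scaling efficiency from measured timings at a fixed problem size; the paper's range-independent measure is a refinement of this idea, and the timing data here are made up for illustration.

```python
def strong_scaling_efficiency(timings):
    """Conventional strong-scaling efficiency relative to the smallest run.

    timings: dict mapping core count -> wall-clock time for a fixed problem.
    Returns a dict mapping core count -> efficiency (1.0 means ideal scaling).
    """
    n_ref = min(timings)        # smallest core count serves as the reference
    t_ref = timings[n_ref]
    return {n: (t_ref * n_ref) / (t * n) for n, t in timings.items()}

# Made-up timings for one fixed simulation size:
timings = {64: 1000.0, 128: 520.0, 256: 280.0, 512: 160.0}
for n, eff in strong_scaling_efficiency(timings).items():
    print(f"{n:4d} cores: efficiency {eff:.2f}")
```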
- …