Warp-X: a new exascale computing platform for beam-plasma simulations
Turning the current experimental state of the art in plasma accelerators from a
promising technology into a mainstream scientific tool depends critically on
high-performance, high-fidelity modeling of complex processes that develop over
a wide range of space and time scales. As part of the U.S. Department of
Energy's Exascale Computing Project, a team from Lawrence Berkeley National
Laboratory, in collaboration with teams from SLAC National Accelerator
Laboratory and Lawrence Livermore National Laboratory, is developing a new
plasma accelerator simulation tool that will harness the power of future
exascale supercomputers for high-performance modeling of plasma accelerators.
We present the various components of the code, such as the new Particle-In-Cell
Scalable Application Resource (PICSAR) and the redesigned adaptive mesh
refinement library AMReX, which are combined with redesigned elements of the
Warp code in the new WarpX software. The code structure, status, early
examples of applications, and plans are discussed.
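The particle-in-cell method underlying WarpX advances macro-particles in electromagnetic fields; a standard building block is the Boris velocity update. The sketch below is a generic illustration of that update, not WarpX code; the field values, time step, and electron charge/mass are illustrative inputs:

```python
import numpy as np

def boris_push(v, E, B, q, m, dt):
    """One non-relativistic Boris velocity update: half electric kick,
    magnetic rotation, half electric kick."""
    qmdt2 = q * dt / (2.0 * m)
    v_minus = v + qmdt2 * E                  # first half electric kick
    t = qmdt2 * B                            # rotation vector
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)  # magnetic rotation
    return v_plus + qmdt2 * E                # second half electric kick

# Electron in a pure magnetic field: the update is a rotation of v.
v0 = np.array([1.0e5, 0.0, 0.0])
B = np.array([0.0, 0.0, 1.0e-2])
v1 = boris_push(v0, np.zeros(3), B, q=-1.602e-19, m=9.109e-31, dt=1e-12)
```

With E = 0 the Boris rotation preserves |v| exactly, one reason it is the de facto standard particle pusher in PIC codes.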
Anomaly Detection using Autoencoders in High Performance Computing Systems
Anomaly detection in supercomputers is a very difficult problem due to the
large scale of the systems and the high number of components. The current state
of the art for automated anomaly detection employs Machine Learning methods or
statistical regression models in a supervised fashion, meaning that the
detection tool is trained to distinguish among a fixed set of behaviour classes
(healthy and unhealthy states).
We propose a novel approach for anomaly detection in High Performance
Computing systems based on a Machine (Deep) Learning technique, namely a type
of neural network called autoencoder. The key idea is to train a set of
autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes
and, after training, use them to identify abnormal conditions. This is
different from previous approaches, which were based on learning the abnormal
conditions, for which much smaller datasets exist (since they are very hard to
identify in the first place).
We test our approach on a real supercomputer equipped with a fine-grained,
scalable monitoring infrastructure that can provide large amounts of data to
characterize the system behaviour. The results are extremely promising: after
the training phase to learn the normal system behaviour, our method is capable
of detecting previously unseen anomalies with very good accuracy (values
ranging between 88% and 96%).
Comment: 9 pages, 3 figures
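As a toy illustration of the approach (train on normal data only, then flag samples with high reconstruction error), the sketch below fits a tiny linear autoencoder with plain NumPy on synthetic "healthy" data; the data, architecture, and threshold are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "healthy" telemetry: 3 metrics that move together (rank-1 + noise).
t = rng.uniform(-1.0, 1.0, size=(200, 1))
X = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(200, 3))

# Tiny linear autoencoder (3 -> 1 -> 3), trained by gradient descent on MSE.
W_enc = 0.1 * rng.normal(size=(3, 1))
W_dec = 0.1 * rng.normal(size=(1, 3))
lr = 0.05
for _ in range(5000):
    h = X @ W_enc
    d = (h @ W_dec - X) / len(X)   # scaled MSE gradient w.r.t. the output
    W_dec -= lr * (h.T @ d)
    W_enc -= lr * (X.T @ (d @ W_dec.T))

def reconstruction_error(x):
    """Mean squared reconstruction error -- used as the anomaly score."""
    return float(np.mean((x @ W_enc @ W_dec - x) ** 2))

err_normal = reconstruction_error(np.array([0.5, 1.0, 1.5]))    # fits the pattern
err_anomaly = reconstruction_error(np.array([1.5, -1.0, 0.5]))  # breaks it
```

A sample whose reconstruction error far exceeds that of training data is flagged as anomalous, without the model ever having seen an anomaly.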
Simulation of the performance and scalability of message passing interface (MPI) communications of atmospheric models running on exascale supercomputers
In this study, we identify the key message passing interface (MPI) operations
required in atmospheric modelling; then, we use a skeleton program and a
simulation framework (based on the SST/macro simulation package) to simulate
these MPI operations (transposition, halo exchange, and
allreduce), with the perspective of future exascale machines in
mind. The experimental results show that the choice of the collective
algorithm has a great impact on the performance of communications; in
particular, we find that the generalized ring-k algorithm for the alltoallv
operation and the generalized recursive-k algorithm for the allreduce
operation perform the best. In addition, we observe that the impacts of
interconnect topologies and routing algorithms on the performance and
scalability of transpositions, halo exchange, and allreduce operations are
significant. However, the routing algorithm has a negligible impact on the
performance of allreduce operations because of their small message size.
Hardware limitations make it impossible to grow bandwidth and reduce latency
indefinitely. Thus, congestion may occur and limit the continuous improvement
of the performance of communications. The experiments show that the
performance of communications can be improved when congestion is mitigated by
a proper configuration of the topology and routing algorithm, which uniformly
distributes the congestion over the interconnect network to avoid the hotspots
and bottlenecks caused by congestion. It is generally believed that the
transpositions seriously limit the scalability of the spectral models. The
experiments show that the communication time of the transposition is larger
than that of the wide halo exchange for the semi-Lagrangian method and that of
the allreduce in the generalized conjugate residual (GCR) iterative solver for
the semi-implicit method below 2×10⁵ MPI processes. The
transposition, whose communication time decreases quickly with an increasing
number of MPI processes, demonstrates strong scalability in the case of very
large grids and moderate latencies. The halo exchange, whose communication
time decreases more slowly with increasing process count than that of the
transposition, shows weaker scalability. In contrast, the allreduce, whose
communication time increases with an increasing number of MPI processes, does
not scale well. From this point of view, the scalability of spectral
models could still be acceptable. It therefore seems premature to conclude
that the scalability of grid-point models is better than that of spectral
models at the exascale, unless innovative methods are exploited to mitigate
the scalability problems present in the grid-point models.
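The qualitative trends above can be reproduced with a textbook latency-bandwidth (alpha-beta) cost model. The sketch below is a back-of-the-envelope model with made-up hardware constants, not the SST/macro simulation used in the study:

```python
import math

ALPHA = 1.0e-6   # per-message latency in seconds (assumed value)
BETA = 1.0e-9    # per-byte transfer time in seconds, ~1 GB/s (assumed value)

def t_transpose(p, grid_bytes=1.0e9):
    """Spectral transposition as an all-to-all: each of the p ranks sends
    its 1/p share of a fixed-size grid to the other p - 1 ranks."""
    return (p - 1) * ALPHA + (grid_bytes / p) * BETA

def t_allreduce(p, msg_bytes=8.0):
    """Small-message allreduce via recursive doubling: log2(p) rounds,
    each costing one latency plus the (tiny) message transfer time."""
    return math.ceil(math.log2(p)) * (ALPHA + msg_bytes * BETA)

# The transposition time shrinks with p while the bandwidth term dominates,
# then grows again once the (p - 1) latency term takes over; the
# small-message allreduce grows with p from the start.
```

This reproduces the abstract's observations that the transposition scales strongly up to moderate process counts, that the allreduce does not scale, and that latency (and hence congestion and routing) matters more as messages shrink.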
Study of Raspberry Pi 2 Quad-core Cortex A7 CPU Cluster as a Mini Supercomputer
High-performance computing (HPC) devices are no longer exclusive to academic,
R&D, or military purposes. The use of HPC devices such as supercomputers is
now growing rapidly as new areas such as big data and computer simulation
arise, making the use of supercomputers more inclusive. Today's supercomputers
have huge computing power but require an enormous amount of energy to operate.
In contrast, a single-board computer (SBC) such as the Raspberry Pi has
minimal computing power but requires only a small amount of energy to operate
and, as a bonus, is small and cheap. This paper covers the results of
clustering many Raspberry Pi 2 SBCs (quad-core Cortex-A7, 900 MHz) to
compensate for their limited computing power. The High Performance Linpack
(HPL) benchmark is used to measure computing power, and a power meter with
10 mV / 10 mA resolution is used to measure power consumption. The experiments
show that increasing the number of cores in each SBC member of the cluster
does not significantly increase computing power. Based on the observed
computing performance and power consumption, the experiment recommends four
nodes as the maximum number of nodes for an SBC cluster.
Comment: Pre-print of conference paper on International Conference on
Information Technology and Electrical Engineering
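The usual way to interpret an HPL result is to compare the measured Rmax against the cluster's theoretical peak (Rpeak) and against the measured power draw. The sketch below shows that bookkeeping with purely illustrative numbers; the FLOPs-per-cycle figure and both "measured" values are assumptions, not data from the paper:

```python
def rpeak_gflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak: nodes x cores x clock x FLOPs issued per cycle."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle

# Hypothetical 4-node Raspberry Pi 2 cluster: 4 cores at 0.9 GHz each,
# assuming 1 double-precision FLOP per cycle per core (an assumption).
rpeak = rpeak_gflops(nodes=4, cores_per_node=4, clock_ghz=0.9, flops_per_cycle=1)

rmax = 1.4          # hypothetical measured HPL result (GFLOPS)
power_w = 16.0      # hypothetical measured cluster power draw (W)

efficiency = rmax / rpeak          # fraction of theoretical peak achieved
gflops_per_watt = rmax / power_w   # the energy-efficiency figure of merit
```

Comparing GFLOPS per watt across cluster sizes is what supports conclusions like the four-node recommendation above: adding nodes keeps raising power draw even when HPL performance gains flatten out.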
Championing stochastic electronic structure methods with CHAMP
We present the recent progress in developing a high-performance and user-friendly program suite, the Cornell-Holland Ab-initio Materials Package (CHAMP), for performing accurate and efficient quantum Monte Carlo (QMC) calculations of molecular systems. A prominent capability of CHAMP is the efficient computation of analytical interatomic forces, also in combination with a compact formulation for the fast evaluation of multi-determinant expansions and their derivatives. The code utilizes the latest processor instructions to perform vectorized tasks and is optimized for upcoming exascale computing facilities. It offers various capabilities, such as variational Monte Carlo (VMC), diffusion Monte Carlo (DMC), and the optimization of many-body wave functions by energy minimization for ground and excited states.
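As a minimal illustration of the variational Monte Carlo method that CHAMP implements for molecules, the sketch below applies VMC to the 1-D harmonic oscillator (atomic units) with a Gaussian trial wave function; the system and trial function are textbook choices, unrelated to CHAMP's actual inputs:

```python
import numpy as np

def vmc_energy(alpha, n_samples=50000, step=1.0, seed=1):
    """Metropolis sampling of |psi|^2 for psi(x) = exp(-alpha x^2),
    averaging the local energy E_L = alpha + x^2 (1/2 - 2 alpha^2)
    of the 1-D oscillator H = -(1/2) d^2/dx^2 + (1/2) x^2."""
    rng = np.random.default_rng(seed)
    x = 0.0
    energies = []
    for i in range(n_samples):
        x_new = x + step * rng.normal()
        # Acceptance probability |psi(x_new) / psi(x)|^2
        if rng.random() < np.exp(-2.0 * alpha * (x_new**2 - x**2)):
            x = x_new
        if i >= 1000:  # discard burn-in samples
            energies.append(alpha + x**2 * (0.5 - 2.0 * alpha**2))
    return float(np.mean(energies))

# At alpha = 0.5 the trial function is the exact ground state (E = 1/2);
# any other alpha gives a higher variational energy (here E(0.4) = 0.5125).
e_opt = vmc_energy(0.5)
e_off = vmc_energy(0.4)
```

Minimizing this variational energy over the parameters (here just alpha) is exactly the wave-function optimization step the abstract describes, and the same Markov chain samples feed the force and derivative estimators.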