    Warp-X: a new exascale computing platform for beam-plasma simulations

    Turning the current experimental plasma accelerator state of the art from a promising technology into a mainstream scientific tool depends critically on high-performance, high-fidelity modeling of complex processes that develop over a wide range of space and time scales. As part of the U.S. Department of Energy's Exascale Computing Project, a team from Lawrence Berkeley National Laboratory, in collaboration with teams from SLAC National Accelerator Laboratory and Lawrence Livermore National Laboratory, is developing a new plasma accelerator simulation tool that will harness the power of future exascale supercomputers for high-performance modeling of plasma accelerators. We present the various components of the code, such as the new Particle-In-Cell Scalable Application Resource (PICSAR) and the redesigned adaptive mesh refinement library AMReX, which are combined with redesigned elements of the Warp code in the new WarpX software. The code structure, status, early examples of applications, and plans are discussed.
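
    To make the central technique named above concrete, here is a minimal, generic 1-D electrostatic Particle-In-Cell (PIC) loop in Python: deposit charge on a grid, solve for the field, gather the field at the particles, and push them. This is a textbook sketch in normalized units with nearest-grid-point deposition; it is not WarpX, PICSAR, or AMReX code.

```python
import numpy as np

# Minimal 1-D electrostatic PIC sketch (normalized units, periodic domain).
ng, n_part, L, dt = 64, 10_000, 2 * np.pi, 0.1
dx = L / ng
rng = np.random.default_rng(0)
x = rng.uniform(0, L, n_part)        # particle positions
v = rng.normal(0.0, 1.0, n_part)     # particle velocities
q_over_m = -1.0                      # electrons; a fixed ion background neutralizes them
weight = L / n_part                  # particle weight so the mean electron density is 1

for step in range(100):
    # 1) charge deposition (nearest grid point, for brevity)
    idx = (x / dx).astype(int) % ng
    rho = 1.0 - np.bincount(idx, minlength=ng) * weight / dx   # ion background minus electrons

    # 2) field solve in Fourier space: phi'' = -rho and E = -phi'  =>  E_k = -i * rho_k / k
    k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
    rho_k = np.fft.fft(rho)
    E_k = np.zeros_like(rho_k)
    E_k[k != 0] = -1j * rho_k[k != 0] / k[k != 0]
    E = np.fft.ifft(E_k).real

    # 3) gather the field at the particle positions and 4) push the particles (leapfrog)
    v += q_over_m * E[idx] * dt
    x = (x + v * dt) % L
```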

    Anomaly Detection using Autoencoders in High Performance Computing Systems

    Anomaly detection in supercomputers is a very difficult problem due to the large scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs Machine Learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among a fixed set of behaviour classes (healthy and unhealthy states). We propose a novel approach for anomaly detection in High Performance Computing systems based on a Machine (Deep) Learning technique, namely a type of neural network called an autoencoder. The key idea is to train a set of autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes and, after training, use them to identify abnormal conditions. This differs from previous approaches, which were based on learning the abnormal conditions, for which there are much smaller datasets (since they are very hard to identify to begin with). We test our approach on a real supercomputer equipped with a fine-grained, scalable monitoring infrastructure that can provide a large amount of data to characterize the system behaviour. The results are extremely promising: after the training phase to learn the normal system behaviour, our method is capable of detecting anomalies that have never been seen before with very good accuracy (values ranging between 88% and 96%).
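
    A minimal sketch of the idea described above, assuming a PyTorch environment: an undercomplete autoencoder is trained only on samples representing healthy node behaviour, and a new sample is flagged as anomalous when its reconstruction error exceeds a threshold derived from the healthy data. The layer sizes, feature count, and threshold rule are illustrative assumptions, not the architecture or settings used in the paper.

```python
import torch
import torch.nn as nn

n_features = 128                               # assumed number of monitored metrics per node
model = nn.Sequential(                         # small undercomplete autoencoder
    nn.Linear(n_features, 32), nn.ReLU(),
    nn.Linear(32, n_features),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

healthy = torch.randn(4096, n_features)        # placeholder for normal-state monitoring data

for epoch in range(20):                        # train only to reconstruct normal behaviour
    opt.zero_grad()
    loss = loss_fn(model(healthy), healthy)
    loss.backward()
    opt.step()

with torch.no_grad():                          # per-sample reconstruction error on healthy data
    err = ((model(healthy) - healthy) ** 2).mean(dim=1)
    threshold = err.mean() + 3 * err.std()     # e.g. mean + 3 sigma as an anomaly threshold

def is_anomalous(sample: torch.Tensor) -> bool:
    """Flag a new sample as anomalous if its reconstruction error is too large."""
    with torch.no_grad():
        e = ((model(sample) - sample) ** 2).mean()
    return bool(e > threshold)
```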

    Simulation of the performance and scalability of message passing interface (MPI) communications of atmospheric models running on exascale supercomputers

    In this study, we identify the key message passing interface (MPI) operations required in atmospheric modelling; then, we use a skeleton program and a simulation framework (based on the SST/macro simulation package) to simulate these MPI operations (transposition, halo exchange, and allreduce), with the perspective of future exascale machines in mind. The experimental results show that the choice of the collective algorithm has a great impact on the performance of communications; in particular, we find that the generalized ring-k algorithm for the alltoallv operation and the generalized recursive-k algorithm for the allreduce operation perform best. In addition, we observe that the impacts of interconnect topologies and routing algorithms on the performance and scalability of transpositions, halo exchange, and allreduce operations are significant. However, the routing algorithm has a negligible impact on the performance of allreduce operations because of their small message size. It is impossible to grow bandwidth and reduce latency indefinitely due to hardware limitations. Thus, congestion may occur and limit the continuous improvement of communication performance. The experiments show that the performance of communications can be improved when congestion is mitigated by a proper configuration of the topology and routing algorithm, which uniformly distributes the congestion over the interconnect network to avoid the hotspots and bottlenecks it would otherwise cause. It is generally believed that the transpositions seriously limit the scalability of spectral models. The experiments show that the communication time of the transposition is larger than those of the wide halo exchange for the semi-Lagrangian method and the allreduce in the generalized conjugate residual (GCR) iterative solver for the semi-implicit method below 2×10^5 MPI processes. The transposition, whose communication time decreases quickly with an increasing number of MPI processes, demonstrates strong scalability in the case of very large grids and moderate latencies. The halo exchange, whose communication time decreases more slowly than that of the transposition with an increasing number of MPI processes, reveals its weak scalability. In contrast, the allreduce, whose communication time increases with an increasing number of MPI processes, does not scale well. From this point of view, the scalability of spectral models could still be acceptable. Therefore, it seems premature to conclude that the scalability of grid-point models is better than that of spectral models at the exascale, unless innovative methods are exploited to mitigate the scalability problems present in the grid-point models.
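
    Two of the communication patterns discussed above are easy to illustrate. The sketch below, assuming an mpi4py installation, shows a periodic 1-D halo exchange and a small allreduce (as used, for example, for dot products in an iterative solver). It is a generic illustration, not the skeleton program or SST/macro configuration used in the study.

```python
# Run with, e.g.:  mpiexec -n 4 python halo_allreduce.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns a slab of the domain plus one halo cell on each side.
n_local = 8
field = np.full(n_local + 2, float(rank))    # indices 0 and -1 are halo cells

left = (rank - 1) % size                     # periodic neighbours
right = (rank + 1) % size

# Halo exchange: send my first interior cell left, receive the right
# neighbour's first interior cell into my right halo, and vice versa.
comm.Sendrecv(sendbuf=field[1:2], dest=left, recvbuf=field[-1:], source=right)
comm.Sendrecv(sendbuf=field[-2:-1], dest=right, recvbuf=field[0:1], source=left)

# Allreduce of a small message: a global dot product over the interior cells.
local_dot = np.array([np.dot(field[1:-1], field[1:-1])])
global_dot = np.empty(1)
comm.Allreduce(local_dot, global_dot, op=MPI.SUM)

if rank == 0:
    print("global dot product:", global_dot[0])
```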

    Study of Raspberry Pi 2 Quad-core Cortex A7 CPU Cluster as a Mini Supercomputer

    High-performance computing (HPC) devices are no longer exclusive to academic, R&D, or military purposes. The use of HPC devices such as supercomputers is now growing rapidly as new areas such as big data and computer simulation arise, which makes the use of supercomputers more inclusive. Today's supercomputers have huge computing power but require an enormous amount of energy to operate. In contrast, a single-board computer (SBC) such as the Raspberry Pi has minimal computing power but requires only a small amount of energy to operate, and as a bonus it is small and cheap. This paper covers the results of utilizing many Raspberry Pi 2 SBCs, each with a quad-core Cortex-A7 CPU at 900 MHz, as a cluster to compensate for their limited individual computing power. The High Performance Linpack (HPL) benchmark is used to measure the computing power, and a power meter with a resolution of 10 mV / 10 mA is used to measure the power consumption. The experiments show that increasing the number of cores used in each SBC cluster member does not give a significant increase in computing power. Based on the observed computing performance and power consumption characteristics, the experiments suggest that four nodes is a practical maximum size for an SBC cluster.
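
    A small sketch of the kind of bookkeeping such a study involves: turning an HPL result and an average power reading into performance per watt. The numbers below are placeholders for illustration only, not measurements reported in the paper.

```python
def summarise(label: str, hpl_gflops: float, avg_power_watts: float) -> str:
    """Return a one-line summary of an HPL run: achieved performance and performance per watt."""
    mflops_per_watt = 1000.0 * hpl_gflops / avg_power_watts
    return (f"{label}: {hpl_gflops:.2f} GFLOPS at {avg_power_watts:.1f} W "
            f"-> {mflops_per_watt:.0f} MFLOPS/W")

# Hypothetical example values (one node vs. a four-node cluster), for illustration only.
print(summarise("1 node, 4 cores", 1.0, 3.5))
print(summarise("4 nodes, 16 cores", 3.5, 14.0))
```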

    Championing stochastic electronic structure methods with CHAMP

    We present recent progress in developing a high-performance and user-friendly program suite, the Cornell-Holland Ab-initio Materials Package (CHAMP), for performing accurate and efficient quantum Monte Carlo (QMC) calculations of molecular systems. The code offers various capabilities such as variational Monte Carlo (VMC), diffusion Monte Carlo (DMC), and optimization of many-body wave functions by energy minimization for ground and excited states. A prominent capability of CHAMP is the efficient computation of analytical interatomic forces, combined with a compact formulation for the fast evaluation of multi-determinant expansions and their derivatives. The code utilizes the latest processor instructions to perform vectorized tasks and is optimized for upcoming exascale computing facilities.
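
    As a concrete illustration of what a variational Monte Carlo calculation does, here is a textbook sketch for a toy system (a 1-D harmonic oscillator with a Gaussian trial wave function): Metropolis sampling of |psi|^2 followed by averaging of the local energy. It shows only the structure of a VMC loop and is unrelated to CHAMP's implementation.

```python
import numpy as np

def local_energy(x, alpha):
    # H = -1/2 d^2/dx^2 + 1/2 x^2,  trial wave function psi(x) = exp(-alpha * x^2)
    return alpha + x**2 * (0.5 - 2.0 * alpha**2)

def vmc_energy(alpha, n_steps=100_000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, energies = 0.0, []
    for i in range(n_steps):
        x_new = x + step * rng.uniform(-1, 1)
        # Metropolis acceptance with probability |psi(x_new)|^2 / |psi(x)|^2
        if rng.uniform() < np.exp(-2.0 * alpha * (x_new**2 - x**2)):
            x = x_new
        if i > n_steps // 10:              # discard equilibration steps
            energies.append(local_energy(x, alpha))
    return np.mean(energies)

# The exact ground state corresponds to alpha = 0.5 with energy 0.5.
for alpha in (0.3, 0.5, 0.8):
    print(alpha, vmc_energy(alpha))
```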