3 research outputs found

    Cluster of emerging technology: evaluation of a production HPC system based on A64FX

    Get PDF
    Clusters of emerging technologies are appearing with more and more frequency in HPC. After years of skepticism, data-centers are adopting them as production systems thanks to several geopolitical and technological factors. The most honorable example is the Fugaku supercomputer, powered by the latest Fujitsu A64FX CPU. Which is the behavior of mature HPC codes on such emerging technology clusters? Which performance will obtain scientists when running their HPC applications “as is” on these clusters? This paper presents the evaluation of CTE-Arm, a Fugaku-like system, including both fine-tuned micro-benchmarks and five scientific applications run without prior fine-tuning: Alya, NEMO, Gromacs, OpenIFS, and WRF. Results show that while micro-architectural benchmarks show performance as expected, the performance obtained running HPC applications not tuned for a specific architecture are between 2× and 4× slower compared with a standard Intel-based HPC system. Therefore further effort is needed to improve tools (e.g., compilers) and system software (e.g., MPI libraries) to ease applications deployment and improve their performance.This work is partially supported by the Spanish Government (SEV-2015-0493), by the Spanish Ministry of Science and Technology (TIN2015-65316-P), by the Generalitat de Catalunya (2017-SGR-1414), by the European and Horizon 2020 POP CoE (GA n. 824080).Peer ReviewedPostprint (author's final draft

    BioFVM-X: an MPI+OpenMP 3-D simulator for biological systems

    Get PDF
    Multi-scale simulations require parallelization to address large-scale problems, such as real-sized tumor simulations. BioFVM is a software package that solves diffusive transport Partial Differential Equations for 3-D biological simulations successfully applied to tissue and cancer biology problems. Currently, BioFVM is only shared-memory parallelized using OpenMP, greatly limiting the execution of large-scale jobs in HPC clusters. We present BioFVM-X: an enhanced version of BioFVM capable of running on multiple nodes. BioFVM-X uses MPI+OpenMP to parallelize the generic core kernels of BioFVM and shows promising scalability in large 3-D problems with several hundreds diffusible substrates and ≈ 0.5 billion voxels. The BioFVM-X source code, examples and documentation, are available under the BSD 3-Clause license at https://gitlab.bsc.es/gsaxena/biofvm_x.The research leading to these results has received funding from EU H2020 Programme under the PerMedCoE project, grant agreement number 951773 and the INFORE project, grant agreement number 825070. The authors would like to thank Paul Macklin and Randy Heiland from Indiana University for their constant support and advice regarding BioFVM.Peer ReviewedPostprint (published version

    Cost-aware prediction of uncorrected DRAM errors in the field

    Get PDF
    This paper presents and evaluates a method to predict DRAM uncorrected errors, a leading cause of hardware failures in large-scale HPC clusters. The method uses a random forest classifier, which was trained and evaluated using error logs from two years of production of the MareNostrum 3 supercomputer. By enabling the system to take measures to mitigate node failures, our method reduces lost compute time by up to 57%, a net saving of 21,000 node–hours per year. We release all source code as open source. We also discuss and clarify aspects of methodology that are essential for a DRAM prediction method to be useful in practice. We explain why standard evaluation metrics, such as precision and recall, are insufficient, and base the evaluation on a cost–benefit analysis. This methodology can help ensure that any DRAM error predictor is clear from training bias and has a clear cost–benefit calculation.This work was supported by the Spanish Ministry of Science and Technology (project PID2019-107255GB), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and the European Union’s Horizon 2020 research and innovation programme and EuroEXA project (grant agreement No 754337). Paul Carpenter and Marc Casas hold the Ramon y Cajal fellowship under contracts RYC2018-025628-I and RYC2017-23269, respectively, of the Ministry of Economy and Competitiveness of Spain.Peer ReviewedPostprint (author's final draft
    corecore