736 research outputs found

    The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform

    The High-Performance Conjugate Gradient (HPCG) benchmark complements the LINPACK benchmark in the performance evaluation coverage of large High-Performance Computing (HPC) systems. Due to its lower arithmetic intensity and higher memory pressure, HPCG is recognized as a more representative benchmark for data-center and irregular memory access pattern workloads; therefore, its popularity and acceptance are rising within the HPC community. As only a small fraction of the reference version of the HPCG benchmark is parallelized with shared memory techniques (OpenMP), we introduce in this report two OpenMP parallelization methods. Due to the increasing importance of the Arm architecture in the HPC scenario, we evaluate our HPCG code at scale on a state-of-the-art HPC system based on the Cavium ThunderX2 SoC. We consider our work a contribution to the Arm ecosystem: along with this technical report, we plan to release our code to boost the tuning of the HPCG benchmark within the Arm community.
    Postprint (author's final draft)

    Main memory in HPC: do we need more, or could we live with less?

    An important aspect of High-Performance Computing (HPC) system design is the choice of main memory capacity. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with conventional Dual In-line Memory Modules (DIMMs), 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore, the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now. This study analyzes the memory capacity requirements of important HPC benchmarks and applications. We find that the High-Performance Conjugate Gradients (HPCG) benchmark could be an important success story for 3D-stacked memories in HPC, but High-Performance Linpack (HPL) is likely to be constrained by 3D memory capacity. The study also emphasizes that the analysis of memory footprints of production HPC applications is complex and that it requires an understanding of application scalability and target category, i.e., whether the users target capability or capacity computing. The results show that most of the HPC applications under study have per-core memory footprints in the range of hundreds of megabytes, but we also detect applications and use cases that require gigabytes per core. Overall, the study identifies the HPC applications and use cases with memory footprints that could be provided by 3D-stacked memory chiplets, making a first step toward adoption of this novel technology in the HPC domain.
    This work was supported by the Collaboration Agreement between Samsung Electronics Co., Ltd. and BSC, Spanish Government through Severo Ochoa programme (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). This work has also received funding from the European Union’s Horizon 2020 research and innovation programme under ExaNoDe project (grant agreement No 671578). Darko Zivanovic holds the Severo Ochoa grant (SVP-2014-068501) of the Ministry of Economy and Competitiveness of Spain. The authors thank Harald Servat from BSC and Vladimir Marjanović from High Performance Computing Center Stuttgart for their technical support.
    Postprint (published version)

    Random circuit block-encoded matrix and a proposal of quantum LINPACK benchmark

    The LINPACK benchmark reports the performance of a computer for solving a system of linear equations with dense random matrices. Although this task was not designed with a real application directly in mind, the LINPACK benchmark has been used to define the list of TOP500 supercomputers since the debut of the list in 1993. We propose that a similar benchmark, called the quantum LINPACK benchmark, could be used to measure the whole machine performance of quantum computers. The success of the quantum LINPACK benchmark should be viewed as the minimal requirement for a quantum computer to perform a useful task of solving linear algebra problems, such as linear systems of equations. We propose an input model called the RAndom Circuit Block-Encoded Matrix (RACBEM), which is a proper generalization of a dense random matrix in the quantum setting. The RACBEM model can be implemented efficiently on a quantum computer, and can be designed to optimally adapt to any given quantum architecture, relying on a black-box quantum compiler. Besides solving linear systems, the RACBEM model can be used to perform a variety of linear algebra tasks relevant to many physical applications, such as computing spectral measures, time series generated by a Hamiltonian simulation, and thermal averages of the energy. We implement these linear algebra operations on IBM Q quantum devices as well as quantum virtual machines, and demonstrate their performance in solving scientific computing problems.
    Comment: 22 pages, 18 figures

    Quantum Monte Carlo for large chemical systems: Implementing efficient strategies for petascale platforms and beyond

    Various strategies to efficiently implement QMC simulations for large chemical systems are presented. These include: (i) the introduction of an efficient algorithm to calculate the computationally expensive Slater matrices, a novel scheme based on the highly localized character of atomic Gaussian basis functions (not the molecular orbitals as usually done); (ii) the possibility of keeping the memory footprint minimal; (iii) the important enhancement of single-core performance when efficient optimization tools are employed; and (iv) the definition of a universal, dynamic, fault-tolerant, and load-balanced computational framework adapted to all kinds of computational platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC=Chem code developed at Toulouse and illustrated with numerical applications on small peptides of increasing sizes (158, 434, 1056 and 1731 electrons). Using 10k-80k computing cores of the Curie machine (GENCI-TGCC-CEA, France), QMC=Chem has been shown to be capable of running at the petascale level, thus demonstrating that for this machine a large part of the peak performance can be achieved. Implementation of large-scale QMC simulations for future exascale platforms with a comparable level of efficiency is expected to be feasible.

    Energy Efficiency of Personal Computers: A Comparative Analysis

    The demand for electricity related to Information and Communications Technologies is constantly growing and significantly contributes to the increase in global greenhouse gas emissions. To reduce this harmful growth, it is necessary to address this problem from different perspectives. Among these is changing the computing scale, such as migrating, if possible, algorithms and processes to the most energy efficient resources. In this context, this paper explores the possibility of running scientific and engineering programs on personal computers and compares the obtained power efficiency on these systems with that of mainframe computers and even supercomputers. Anecdotally, this paper also shows how the power efficiency obtained for the same workloads on personal computers is similar to that obtained on supercomputers included in the Green500 ranking.
    Spanish Government; European Commission; PGC2018-098813-B-C31; MICI