1,905 research outputs found

    QuEST and High Performance Simulation of Quantum Computers

    We introduce QuEST, the Quantum Exact Simulation Toolkit, and compare it to ProjectQ, qHipster and a recent distributed implementation of Quantum++. QuEST is the first open source, OpenMP and MPI hybridised, GPU accelerated simulator of universal quantum circuits. Embodied as a C library, it is designed so that a user's code can be deployed seamlessly to any platform from a laptop to a supercomputer. QuEST is capable of simulating generic quantum circuits of general single-qubit gates and multi-qubit controlled gates, on pure and mixed states, represented as state-vectors and density matrices, and under the presence of decoherence. Using the ARCUS Phase-B and ARCHER supercomputers, we benchmark QuEST's simulation of random circuits of up to 38 qubits, distributed over up to 2048 compute nodes, each with up to 24 cores. We directly compare QuEST's performance to ProjectQ's on single machines, and discuss the differences in distribution strategies of QuEST, qHipster and Quantum++. QuEST shows excellent scaling, both strong and weak, on multicore and distributed architectures. Comment: 8 pages, 8 figures; fixed typos; updated QuEST URL and fixed typo in Fig. 4 caption where ProjectQ and QuEST were swapped in speedup subplot explanation; added explanation of simulation algorithm, updated bibliography; stressed technical novelty of QuEST; mentioned new density matrix support
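
    The state-vector representation mentioned in the abstract above can be illustrated with a minimal sketch. This is not QuEST's API or its implementation: the function names `zero_state` and `apply_single_qubit_gate` are hypothetical, and the code only shows the generic textbook technique of applying a 2x2 gate to one qubit of a 2^n-amplitude state vector, assuming qubit 0 is the least significant bit of the amplitude index.

```python
import math


def zero_state(num_qubits):
    """Return the |00...0> state as a list of 2**num_qubits complex amplitudes."""
    state = [0j] * (1 << num_qubits)
    state[0] = 1 + 0j
    return state


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate (nested lists) to qubit `target` of `state`, in place.

    Assumption: qubit 0 is the least significant bit of the amplitude index.
    """
    stride = 1 << target
    # Amplitudes pair up as (i0, i1), differing only in the target bit.
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset   # target bit = 0
            i1 = i0 + stride     # target bit = 1
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return state


# Hadamard gate, used here only as an example of a general single-qubit gate.
INV_SQRT2 = 1 / math.sqrt(2)
HADAMARD = [[INV_SQRT2, INV_SQRT2], [INV_SQRT2, -INV_SQRT2]]

if __name__ == "__main__":
    psi = zero_state(2)
    apply_single_qubit_gate(psi, HADAMARD, target=0)
    print(psi)
```

    Each pair of amplitudes is updated independently of the others, which is what makes this kind of loop a natural candidate for multithreading or distribution. The memory cost doubles with every added qubit: at 16 bytes per double-precision complex amplitude, a 38-qubit state vector needs 2^38 × 16 bytes, about 4 TiB.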

    Computational Physics on Graphics Processing Units

    The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up a variety of algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics, and on quantum simulations for electronic structure calculations using the density functional theory, wave function techniques, and quantum field theory. Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012

    Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and error-prone to keep different code versions aligned. In this work we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached. Comment: 26 pages, 2 png figures, preprint of an article submitted for consideration in International Journal of Modern Physics

    86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy

    We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulation with ab initio accuracy. Our tests show that the GPU version is 7 times faster than the CPU version with the same power consumption. The code can scale up to the entire Summit supercomputer. For a copper system of 113,246,208 atoms, the code can perform one nanosecond MD simulation per day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such unprecedented ability to perform MD simulation with ab initio accuracy opens up the possibility of studying many important issues in materials and molecules, such as heterogeneous catalysis, electrochemical cells, irradiation damage, crack propagation, and biochemical reactions. Comment: 29 pages, 11 figures

    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms. Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming

    QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

    The multi-GPU open-source package QCDGPU is developed for lattice Monte Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at finite temperature and of the O(N) model. The code is implemented in OpenCL, tested on AMD and NVIDIA GPUs and on AMD and Intel CPUs, and may run on other OpenCL-compatible devices. The package has minimal external library dependencies and is OS platform-independent. It is optimized for heterogeneous computing: the lattice can be divided into non-equivalent parts to hide the difference in performance of the devices used. QCDGPU has a client-server part for distributed simulations. The package is designed to produce lattice gauge configurations as well as to analyze previously generated ones. QCDGPU may be executed in fault-tolerant mode. The Monte Carlo procedure core is based on the PRNGCL library for pseudo-random number generation on OpenCL-compatible devices, which contains several of the most popular pseudo-random number generators. Comment: Presented at the Third International Conference "High Performance Computing" (HPC-UA 2013), Kyiv, Ukraine; 9 pages, 2 figures

    An Improved GPU Simulator For Spiking Neural P Systems

    Spiking Neural P (SNP) systems, variants of P systems (under Membrane and Natural computing), are computing models that acquire abstraction and inspiration from the way neurons 'compute' or process information. Similar to other P system variants, SNP systems are Turing complete models that by nature compute non-deterministically and in a maximally parallel manner. P systems usually trade (often exponential) space for (polynomial to constant) time. Due to this nature, P system variants are currently limited to parallel simulations, and several variants have already been simulated in parallel devices. In this paper we present an improved SNP system simulator based on graphics processing units (GPUs). Among other reasons, current GPUs are designed for massively parallel computations, which makes them very suitable for SNP system simulation. The computing model, hardware/software considerations, and simulation algorithm are presented, as well as comparisons of the CPU-only and CPU-GPU based simulators. Funding: Ministerio de Ciencia e Innovación TIN2009–13192; Junta de Andalucía P08-TIC-0420