1,905 research outputs found
QuEST and High Performance Simulation of Quantum Computers
We introduce QuEST, the Quantum Exact Simulation Toolkit, and compare it to
ProjectQ, qHipster and a recent distributed implementation of Quantum++. QuEST
is the first open source, OpenMP and MPI hybridised, GPU accelerated simulator
of universal quantum circuits. Embodied as a C library, it is designed so that
a user's code can be deployed seamlessly to any platform from a laptop to a
supercomputer. QuEST is capable of simulating generic quantum circuits of
general single-qubit gates and multi-qubit controlled gates, on pure and mixed
states, represented as state-vectors and density matrices, and under the
presence of decoherence. Using the ARCUS Phase-B and ARCHER supercomputers, we
benchmark QuEST's simulation of random circuits of up to 38 qubits, distributed
over up to 2048 compute nodes, each with up to 24 cores. We directly compare
QuEST's performance to ProjectQ's on single machines, and discuss the
differences in distribution strategies of QuEST, qHipster and Quantum++. QuEST
shows excellent scaling, both strong and weak, on multicore and distributed
architectures.Comment: 8 pages, 8 figures; fixed typos; updated QuEST URL and fixed typo in
Fig. 4 caption where ProjectQ and QuEST were swapped in speedup subplot
explanation; added explanation of simulation algorithm, updated bibliography;
stressed technical novelty of QuEST; mentioned new density matrix suppor
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up various different algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using the density functional theory, wave
function techniques, and quantum field theory.Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 201
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
The present panorama of HPC architectures is extremely heterogeneous, ranging
from traditional multi-core CPU processors, supporting a wide class of
applications but delivering moderate computing performance, to many-core GPUs,
exploiting aggressive data-parallelism and delivering higher performances for
streaming computing applications. In this scenario, code portability (and
performance portability) become necessary for easy maintainability of
applications; this is very relevant in scientific computing where code changes
are very frequent, making it tedious and prone to error to keep different code
versions aligned. In this work we present the design and optimization of a
state-of-the-art production-level LQCD Monte Carlo application, using the
directive-based OpenACC programming model. OpenACC abstracts parallel
programming to a descriptive level, relieving programmers from specifying how
codes should be mapped onto the target architecture. We describe the
implementation of a code fully written in OpenACC, and show that we are able to
target several different architectures, including state-of-the-art traditional
CPUs and GPUs, with the same code. We also measure performance, evaluating the
computing efficiency of our OpenACC code on several architectures, comparing
with GPU-specific implementations and showing that a good level of
performance-portability can be reached.Comment: 26 pages, 2 png figures, preprint of an article submitted for
consideration in International Journal of Modern Physics
86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy
We present the GPU version of DeePMD-kit, which, upon training a deep neural
network model using ab initio data, can drive extremely large-scale molecular
dynamics (MD) simulation with ab initio accuracy. Our tests show that the GPU
version is 7 times faster than the CPU version with the same power consumption.
The code can scale up to the entire Summit supercomputer. For a copper system
of 113, 246, 208 atoms, the code can perform one nanosecond MD simulation per
day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such
unprecedented ability to perform MD simulation with ab initio accuracy opens up
the possibility of studying many important issues in materials and molecules,
such as heterogeneous catalysis, electrochemical cells, irradiation damage,
crack propagation, and biochemical reactions.Comment: 29 pages, 11 figure
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.Comment: 18 pages, 4 figures, accepted for publication in Scientific
Programmin
QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems
The multi-GPU open-source package QCDGPU for lattice Monte Carlo simulations
of pure SU(N) gluodynamics in external magnetic field at finite temperature and
O(N) model is developed. The code is implemented in OpenCL, tested on AMD and
NVIDIA GPUs, AMD and Intel CPUs and may run on other OpenCL-compatible devices.
The package contains minimal external library dependencies and is OS
platform-independent. It is optimized for heterogeneous computing due to the
possibility of dividing the lattice into non-equivalent parts to hide the
difference in performances of the devices used. QCDGPU has client-server part
for distributed simulations. The package is designed to produce lattice gauge
configurations as well as to analyze previously generated ones. QCDGPU may be
executed in fault-tolerant mode. Monte Carlo procedure core is based on PRNGCL
library for pseudo-random numbers generation on OpenCL-compatible devices,
which contains several most popular pseudo-random number generators.Comment: Presented at the Third International Conference "High Performance
Computing" (HPC-UA 2013), Kyiv, Ukraine; 9 pages, 2 figure
An Improved GPU Simulator For Spiking Neural P Systems
Spiking Neural P (SNP) systems, variants of Psystems (under Membrane and Natural computing), are computing models that acquire abstraction and inspiration from the way neurons 'compute' or process information. Similar to other P system variants, SNP systems are Turing complete models that by nature compute non-deterministically and in a maximally parallel manner. P systems usually trade (often exponential) space for (polynomial to constant) time. Due to this nature, P system variants are currently limited to parallel simulations, and several variants have already been simulated in parallel devices. In this paper we present an improved SNP system simulator based on graphics processing units (GPUs). Among other reasons, current GPUs are architectured for massively parallel computations, thus making GPUs very suitable for SNP system simulation. The computing model, hardware/software considerations, and simulation algorithm are presented, as well as the comparisons of the CPU only and CPU-GPU based simulators.Ministerio de Ciencia e Innovación TIN2009–13192Junta de Andalucía P08-TIC-0420
- …