128 research outputs found

    GPU-accelerated algorithms for many-particle continuous-time quantum walks

    Many-particle continuous-time quantum walks (CTQWs) are a resource for several tasks in quantum technology, including quantum search algorithms and universal quantum computation. To design and implement CTQWs in a realistic scenario, one needs effective simulation tools for Hamiltonians that take into account static noise and fluctuations in the lattice, i.e. Hamiltonians containing stochastic terms. To this end, we propose a parallel algorithm based on the Taylor-series expansion of the evolution operator, and compare its performance with that of algorithms based on the exact diagonalization of the Hamiltonian or on a 4th-order Runge–Kutta integration. We prove that both the Taylor-series expansion and the Runge–Kutta algorithms are reliable and have a low computational cost, with the Taylor-series expansion having the additional advantage that its memory allocation does not depend on the precision of the calculation. Both algorithms are also highly parallelizable within the SIMT paradigm, and are thus suitable for GPGPU computing. We have benchmarked 4 NVIDIA GPUs and 3 quad-core Intel CPUs for a 2-particle system over lattices of increasing dimension, showing that the speedup provided by GPU computing, relative to the OpenMP parallelization, lies in the range between 8x and (more than) 20x, depending on the frequency of post-processing. GPU-accelerated codes thus overcome concerns about execution time and make it possible to simulate many interacting particles on large lattices, limited only by the memory available on the device.
    Program summary
    Program Title: cuQuWa
    Licensing provisions: GNU General Public License, version 3
    Program Files doi: http://dx.doi.org/10.17632/vjpnjgycdj.1
    Programming language: CUDA C
    Nature of problem: Evolution of many-particle continuous-time quantum walks on a multidimensional grid in a noisy environment. The submitted code is specialized for the simulation of 2-particle quantum walks with periodic boundary conditions.
    Solution method: Taylor-series expansion of the evolution operator. The density matrix is calculated by averaging multiple independent realizations of the system.
    External routines: cuBLAS, cuRAND
    Unusual features: Simulations are run exclusively on the graphics processing unit within the CUDA environment. An undocumented misbehavior in the random-number generation routine (cuRAND package) can corrupt the simulation of large systems, though no problems are reported for small and medium-size systems. Compiling the code with the -arch=sm_30 flag for compute capability 3.5 and above fixes this issue.
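    The Taylor-series approach named in this abstract can be illustrated with a minimal CPU sketch. It is not the cuQuWa code: it assumes a single walker on a noiseless ring, and the names `ring_hamiltonian`, `taylor_step` and the parameter values are hypothetical choices made for the example. It shows why the working memory does not grow with the expansion order: only the running sum and the current term are stored.

```python
import math


def ring_hamiltonian(n, hopping=1.0):
    """Tight-binding Hamiltonian of one walker on an n-site ring (periodic boundaries)."""
    h = [[0.0] * n for _ in range(n)]
    for j in range(n):
        h[j][(j + 1) % n] = -hopping
        h[(j + 1) % n][j] = -hopping
    return h


def matvec(h, v):
    """Dense matrix-vector product; this is the part a GPU code would parallelize."""
    return [sum(h_jk * v_k for h_jk, v_k in zip(row, v)) for row in h]


def taylor_step(h, psi, dt, order=20):
    """Apply exp(-i H dt) to psi with a Taylor series truncated at the given order."""
    result = list(psi)  # running sum, starts with the k = 0 term
    term = list(psi)    # current term (-i H dt)^k / k! psi
    for k in range(1, order + 1):
        term = [(-1j * dt / k) * x for x in matvec(h, term)]
        result = [r + t for r, t in zip(result, term)]
    return result


def norm(psi):
    return math.sqrt(sum(abs(x) ** 2 for x in psi))


# Hypothetical parameters: 8 sites, walker initially localized on site 0.
n = 8
h = ring_hamiltonian(n)
psi = [0j] * n
psi[0] = 1 + 0j
for _ in range(50):
    psi = taylor_step(h, psi, dt=0.1)
probabilities = [abs(x) ** 2 for x in psi]
```

    Raising `order` improves the precision of each step without allocating further vectors. Stochastic on-site terms, as described in the abstract, would enter through the diagonal of `h` and are omitted here.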

    Tools and Selected Applications

    Development of a Chemically Reacting Flow Solver on the Graphic Processing Units

    The focus of the current research is to develop a numerical framework on graphics processing units (GPUs) capable of modeling chemically reacting flow. The framework incorporates a high-order finite volume method coupled with an implicit solver for the chemical kinetics. Both the fluid solver and the kinetics solver are designed to take advantage of the GPU architecture to achieve high performance. The structure of the numerical framework is shown, detailing the optimizations implemented in the solver. The mathematical formulation of the core algorithms is presented along with a series of standard test cases, including both nonreactive and reactive flows, to validate the numerical solver. The performance results obtained with the current framework show the parallelization efficiency of the solver and emphasize the capability of the GPU in performing scientific calculations. Distribution A: Approved for public release; distribution unlimited. PA #1117
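    The abstract does not say which implicit scheme the kinetics solver uses. As an illustration of why implicit integration is the usual choice for stiff chemistry, the sketch below compares explicit and backward Euler on a single first-order reaction dy/dt = -k y. The rate constant, time step and function names are assumptions made for the example.

```python
def explicit_euler(y, k, dt, steps):
    """Forward Euler: unstable when k * dt exceeds 2."""
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y


def backward_euler(y, k, dt, steps):
    """Backward Euler: solve y_new = y + dt * (-k * y_new) for y_new at each step."""
    for _ in range(steps):
        y = y / (1.0 + k * dt)
    return y


# Hypothetical stiff case: the time step is ten times the reaction time scale 1/k.
k = 1.0e4
dt = 1.0e-3
y_explicit = explicit_euler(1.0, k, dt, 10)
y_implicit = backward_euler(1.0, k, dt, 10)
```

    With these values the explicit update grows without bound, while the implicit one decays monotonically toward zero, as the exact solution does.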

    Massively parallel split-step Fourier techniques for simulating quantum systems on graphics processing units

    The split-step Fourier method is a powerful technique for solving partial differential equations and simulating ultracold atomic systems of various forms. In this body of work, we focus on several variations of this method to allow for simulations of one-, two-, and three-dimensional quantum systems, along with several notable methods for controlling these systems. In particular, we use quantum optimal control and shortcuts to adiabaticity to study the non-adiabatic generation of superposition states in strongly correlated one-dimensional systems, analyze chaotic vortex trajectories in two dimensions by using rotation and phase imprinting methods, and create stable, three-dimensional vortex structures in Bose–Einstein condensates through artificial magnetic fields generated by the evanescent field of an optical nanofiber. We also discuss algorithmic optimizations for implementing the split-step Fourier method on graphics processing units. All computational methods presented in this work are demonstrated on physical systems and have been incorporated into a state-of-the-art, open-source software suite known as GPUE, which is currently the fastest quantum simulator of its kind. Okinawa Institute of Science and Technology Graduate University
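    A minimal one-dimensional sketch of the split-step Fourier method is given below. It is not taken from GPUE: it assumes a linear Schrödinger equation with hbar = m = 1, a harmonic trap, and a naive O(N^2) transform in place of an FFT library, and every name and parameter value is a hypothetical choice for the example.

```python
import cmath
import math


def dft(values, sign):
    """Naive unitary discrete Fourier transform; sign=-1 is forward, sign=+1 inverse."""
    n = len(values)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(v * cmath.exp(sign * 2j * math.pi * j * m / n)
                    for j, v in enumerate(values))
        for m in range(n)
    ]


def split_step(psi, potential, wavenumbers, dt):
    """One Strang-split step of i dpsi/dt = (-1/2 d^2/dx^2 + V) psi."""
    half_kick = [cmath.exp(-0.5j * v * dt) for v in potential]  # exp(-i V dt / 2)
    drift = [cmath.exp(-0.5j * k * k * dt) for k in wavenumbers]  # exp(-i k^2 dt / 2)
    psi = [p * a for p, a in zip(psi, half_kick)]
    psi_k = [p * d for p, d in zip(dft(psi, -1), drift)]
    psi = dft(psi_k, +1)
    return [p * a for p, a in zip(psi, half_kick)]


# Hypothetical setup: 32 grid points on [-10, 10), harmonic trap V = x^2 / 2.
n, length = 32, 20.0
dx = length / n
xs = [-length / 2 + j * dx for j in range(n)]
wavenumbers = [2 * math.pi * (m if m < n // 2 else m - n) / length for m in range(n)]
potential = [0.5 * x * x for x in xs]

# Start from the trap ground state, normalized on the grid.
psi = [complex(math.exp(-0.5 * x * x)) for x in xs]
scale = math.sqrt(sum(abs(p) ** 2 for p in psi) * dx)
psi = [p / scale for p in psi]
initial_density = [abs(p) ** 2 for p in psi]

for _ in range(50):
    psi = split_step(psi, potential, wavenumbers, dt=0.02)
final_norm = sum(abs(p) ** 2 for p in psi) * dx
final_density = [abs(p) ** 2 for p in psi]
```

    Every factor applied in a step is a pure phase or a unitary transform, so the norm is conserved up to rounding. Because the initial state is the ground state of the trap, its density should also stay essentially unchanged.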

    Asynchronous and Multiprecision Linear Solvers - Scalable and Fault-Tolerant Numerics for Energy Efficient High Performance Computing

    Asynchronous methods minimize idle times by removing synchronization barriers, and therefore allow efficient use of computer systems. Their high tolerance of communication latencies also improves fault tolerance. As asynchronous methods also enable the use of the power- and energy-saving mechanisms provided by the hardware, they are suitable candidates for the highly parallel and heterogeneous hardware platforms expected in the near future.

    High-performance tsunami modelling with modern GPU technology

    PhD Thesis. Earthquake-induced tsunamis commonly propagate in the deep ocean as long waves and develop into sharp-fronted surges moving rapidly coastward, which may be effectively simulated by hydrodynamic models solving the nonlinear shallow water equations (SWEs). Tsunamis can cause substantial economic and human losses, which could be mitigated through early warning systems given efficient and accurate modelling. Most existing tsunami models require long simulation times for real-world applications. This thesis presents a graphics processing unit (GPU) accelerated finite volume hydrodynamic model using the compute unified device architecture (CUDA) for computationally efficient tsunami simulations. Compared with a standard PC, the model is able to reduce run-time by a factor of > 40. The validated model is used to reproduce the 2011 Japan tsunami. Two source models were tested, one based on tsunami waveform inversion and another using deep-ocean tsunameters. Vertical sea surface displacement is computed by the Okada model, assuming instantaneous sea-floor deformation. Both source models can reproduce the wave propagation at offshore and nearshore gauges, but the tsunameter-based model better simulates the first wave amplitude. The effects of grid resolutions between 450 and 3600 m, slope limiters, and numerical accuracy are also investigated for the simulation of the 2011 Japan tsunami. Grid resolutions of 1-2 km perform well with a proper limiter; the Sweby limiter is optimal for coarser resolutions, recovers wave peaks better than minmod, and is more numerically stable than Superbee. One hour of tsunami propagation can be predicted 50 times faster on a regular low-cost PC-hosted GPU than on a single CPU. For 450 m resolution on a larger-memory server-hosted GPU, the speedup increases to ~70 times.
    Finally, two adaptive mesh refinement (AMR) techniques, simplified dynamic adaptive grids on CPU and a static adaptive grid on GPU, are introduced to provide multi-scale simulations. Both can reduce run-time by ~3 times while maintaining acceptable accuracy. The proposed computationally efficient tsunami model is expected to provide a new practical tool for tsunami modelling for different purposes, including real-time warning, evacuation planning, risk management and city planning.
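    The three slope limiters compared in this abstract have standard closed forms, sketched below as functions of the ratio r of consecutive solution differences. The thesis does not state which Sweby parameter it uses, so `beta=1.5` is an assumed value, and `limited_slope` is a hypothetical helper that shows how a limiter enters a finite-volume reconstruction.

```python
def minmod(r):
    """Most diffusive of the three limiters."""
    return max(0.0, min(1.0, r))


def superbee(r):
    """Least diffusive (most compressive) of the three limiters."""
    return max(0.0, min(2.0 * r, 1.0), min(r, 2.0))


def sweby(r, beta=1.5):
    """Sweby limiter; beta = 1 gives minmod and beta = 2 gives Superbee."""
    return max(0.0, min(beta * r, 1.0), min(r, beta))


def limited_slope(u_left, u_center, u_right, limiter):
    """Limited difference across a cell, used to reconstruct face values."""
    forward = u_right - u_center
    backward = u_center - u_left
    if forward == 0.0:
        return 0.0
    return limiter(backward / forward) * forward


ratios = [-1.0, 0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
table = [(r, minmod(r), sweby(r), superbee(r)) for r in ratios]
```

    For every ratio the Sweby value lies between the minmod and Superbee values, which is consistent with the abstract's finding that it recovers peaks better than minmod while staying more stable than Superbee.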

    Application of numerical libraries in BEM computations (Zastosowanie bibliotek numerycznych w obliczeniach MEB)

    Using numerical libraries substantially reduces computation time and makes code easier to write. The popular BLAS and LAPACK libraries have mature versions dedicated to multi-core and distributed programming, PBLAS and ScaLAPACK respectively. A similar development is currently under way for GPU programming in the two major GPGPU implementations, NVIDIA CUDA and Khronos/ATI OpenCL. At the same time, hybrid CPU-GPU versions of these libraries are being intensively developed; MAGMA is a good example. This paper presents the effects of using several of these libraries to solve a two-dimensional planar capacitor model by the boundary element method with constant boundary elements.

    Quantile mechanics II: changes of variables in Monte Carlo methods and GPU-optimised normal quantiles

    With financial modelling requiring a better understanding of model risk, it is helpful to be able to vary assumptions about underlying probability distributions in an efficient manner, preferably without the noise induced by resampling distributions managed by Monte Carlo methods. This paper presents differential equations and solution methods for functions of the form Q(x) = F^{-1}(G(x)), where F and G are cumulative distribution functions. Such functions allow the direct recycling of Monte Carlo samples from one distribution into samples from another. The method may be developed analytically for certain special cases, and illuminates the idea that it is a more precise form of the traditional Cornish–Fisher expansion. In this manner the model risk of distributional risk may be assessed free of the Monte Carlo noise associated with resampling. The method may also be regarded as providing both analytical and numerical bases for doing more precise Cornish–Fisher transformations. Examples are given of equations for converting normal samples to Student t, and converting exponential to normal. In the case of the normal distribution, the change of variables employed allows the sampling to take place to good accuracy based on a single rational approximation over a very wide range of sample space. The avoidance of branching statements is useful in graphics processing unit (GPU) computations, as it avoids the effect of branch divergence. We give a branch-free normal quantile that offers performance improvements in a GPU environment while retaining the best precision characteristics of well-known methods. We also offer models with low-probability branch divergence. Comparisons of new and existing forms are made on Nvidia GeForce GTX Titan and Tesla C2050 GPUs. We argue that in both single and double precision, the change-of-variables approach offers the most GPU-optimal Gaussian quantile yet, working faster than the CUDA 5.5 built-in function.
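    The recycling map Q(x) = F^{-1}(G(x)) can be illustrated for the exponential-to-normal case mentioned in this abstract. The sketch below does not reproduce the paper's rational approximation or its branch-free quantile: it uses the normal quantile from Python's standard library, and the function name, seed and sample size are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

_STANDARD_NORMAL = NormalDist()


def exponential_to_normal(x):
    """Q(x) = F^{-1}(G(x)): G is the unit-rate exponential CDF, F the standard normal CDF.

    Valid for 0 < x below roughly 36; beyond that G(x) rounds to 1 in double
    precision and inv_cdf raises StatisticsError.
    """
    g = -math.expm1(-x)  # G(x) = 1 - exp(-x), computed without cancellation
    return _STANDARD_NORMAL.inv_cdf(g)


# Recycle exponential samples into normal samples without drawing new random numbers.
rng = random.Random(0)  # fixed seed, chosen arbitrarily
exponential_samples = [rng.expovariate(1.0) for _ in range(10000)]
normal_samples = [exponential_to_normal(x) for x in exponential_samples]
sample_mean = fmean(normal_samples)
sample_std = pstdev(normal_samples)
```

    Because Q is monotone, it maps the exponential median ln 2 to the normal median 0, and the transformed samples should have mean near 0 and standard deviation near 1.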