On the parallel efficiency of the Frederickson-McBryan multigrid
Taking full advantage of the parallelism in a standard multigrid algorithm requires as many processors as grid points. However, since coarse grids contain fewer points, most processors are idle during the coarse grid iterations. Frederickson and McBryan claim that retaining all points on all grid levels (using all processors) can lead to a superconvergent algorithm. The purpose of this work is to show that the parallel superconvergent multigrid (PSMG) algorithm of Frederickson and McBryan, though it achieves perfect processor utilization, is no more efficient than a parallel implementation of standard multigrid methods. PSMG is simply a new and perhaps simpler way of achieving the same results.
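The idle-processor argument above can be illustrated with a small sketch. It is not taken from the paper: it assumes one processor per fine-grid point, standard coarsening by a factor of 2 in each dimension, and one parallel time step per grid level. The function names `level_sizes` and `mean_utilization` are hypothetical.

```python
def level_sizes(n_fine: int, dim: int = 2) -> list[int]:
    """Points per grid level, assuming coarsening by 2 in each of `dim` dimensions."""
    sizes = []
    n = n_fine
    while n >= 1:
        sizes.append(n)
        if n == 1:
            break
        n //= 2 ** dim  # each coarser grid keeps 1 / 2**dim of the points
    return sizes


def mean_utilization(n_fine: int, dim: int = 2) -> float:
    """Average fraction of busy processors over one sweep through all levels.

    Assumption: n_fine processors, each level costs one parallel time step,
    and a processor is busy only if it owns a point on the current level.
    """
    sizes = level_sizes(n_fine, dim)
    return sum(s / n_fine for s in sizes) / len(sizes)


if __name__ == "__main__":
    n = 4 ** 5  # a 32 x 32 fine grid
    print(level_sizes(n))
    print(f"mean utilization: {mean_utilization(n):.3f}")
```

Under these assumptions a scheme that keeps every point on every level has utilization 1.0 by construction. The abstract's point is that this higher utilization does not by itself make PSMG more efficient than standard multigrid.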
Spherical harmonic transform with GPUs
We describe an algorithm for computing an inverse spherical harmonic
transform suitable for graphics processing units (GPUs). We use CUDA and base our
implementation on a Fortran90 routine included in a publicly available parallel
package, S2HAT. We focus our attention on the two major sequential steps
involved in the transform's computation, retaining the efficient parallel
framework of the original code. We detail optimization techniques used to
enhance the performance of the CUDA-based code and contrast them with those
implemented in the Fortran90 version. We also present performance comparisons
of a single CPU plus GPU unit with the S2HAT code running on either one or
four processors. In particular, we find that use of the latest generation of GPUs,
such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms
by as much as 18 times with respect to S2HAT executed on one core, and by as
much as 5.5 times with respect to S2HAT on four cores, with the overall performance
being limited by the fast Fourier transforms. The work presented here has been
performed in the context of the Cosmic Microwave Background simulations and
analysis. However, we expect that the developed software will be of more
general interest and applicability.
Massively parallel quantum computer simulator, eleven years later
A revised version of the massively parallel simulator of a universal quantum
computer, described in this journal eleven years ago, is used to benchmark
various gate-based quantum algorithms on some of the most powerful
supercomputers that exist today. Adaptive encoding of the wave function reduces
the memory requirement by a factor of eight, making it possible to simulate
universal quantum computers with up to 48 qubits on the Sunway TaihuLight and
on the K computer. The simulator exhibits close-to-ideal weak-scaling behavior
on the Sunway TaihuLight, on the K computer, on an IBM Blue Gene/Q, and on Intel
Xeon based clusters, implying that the combination of parallelization and
hardware can track the exponential scaling due to the increasing number of
qubits. Results of executing simple quantum circuits and Shor's factorization
algorithm on quantum computers containing up to 48 qubits are presented.
Comment: Substantially rewritten + new data. Published in Computer Physics
Communicatio
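The memory figures in the abstract above can be made concrete with a rough calculation. This is a sketch under stated assumptions, not the simulator's actual storage scheme. It assumes an unencoded amplitude is stored as two 8-byte doubles (16 bytes) and that the factor-of-eight reduction applies uniformly, giving 2 bytes per amplitude. The names `state_vector_bytes` and `to_pib` are hypothetical.

```python
FULL_BYTES = 16                 # assumption: complex amplitude as two 8-byte doubles
ENCODED_BYTES = FULL_BYTES // 8 # factor-of-eight reduction quoted in the abstract


def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int) -> int:
    """Memory needed to hold all 2**n_qubits amplitudes of the wave function."""
    return (2 ** n_qubits) * bytes_per_amplitude


def to_pib(n_bytes: int) -> float:
    """Convert bytes to pebibytes (2**50 bytes)."""
    return n_bytes / 2 ** 50


if __name__ == "__main__":
    for n in (45, 46, 47, 48):
        full = to_pib(state_vector_bytes(n, FULL_BYTES))
        encoded = to_pib(state_vector_bytes(n, ENCODED_BYTES))
        print(f"{n} qubits: {full:.4f} PiB unencoded, {encoded:.4f} PiB encoded")
```

Each added qubit doubles the memory requirement. That doubling is the exponential scaling the abstract refers to when it says the parallelization and hardware must keep up as the number of qubits grows.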