    On the parallel efficiency of the Frederickson-McBryan multigrid

    Taking full advantage of the parallelism in a standard multigrid algorithm requires as many processors as grid points. However, since coarse grids contain fewer points, most processors are idle during the coarse-grid iterations. Frederickson and McBryan claim that retaining all points on all grid levels (and thus using all processors) can lead to a superconvergent algorithm. The purpose of this work is to show that the parallel superconvergent multigrid (PSMG) algorithm of Frederickson and McBryan, though it achieves perfect processor utilization, is no more efficient than a parallel implementation of standard multigrid methods. PSMG is simply a new and perhaps simpler way of achieving the same results.
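
The idle-processor argument in this abstract can be made concrete with a rough count. The sketch below is illustrative only: it assumes one processor per fine-grid point, coarsening by a factor of 2 in each dimension, and equal wall-clock time spent on every level. The function name and these assumptions are not taken from the paper.

```python
def average_utilization(levels: int, dim: int = 2) -> float:
    """Mean fraction of busy processors over one pass through all grid levels.

    Assumptions (hypothetical, for illustration):
      - one processor per point on the finest grid,
      - each coarser level has 2**dim times fewer points,
      - every level takes the same amount of parallel time.
    """
    # On level l only 2**(-dim * l) of the processors own a grid point.
    busy_fraction = [2.0 ** (-dim * level) for level in range(levels)]
    return sum(busy_fraction) / levels


if __name__ == "__main__":
    for n_levels in (1, 2, 3, 6):
        print(n_levels, average_utilization(n_levels, dim=2))
```

Under these assumptions utilization falls quickly as levels are added, which is the inefficiency that retaining all points on all levels is meant to remove.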

    Spherical harmonic transform with GPUs

    We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphics processing units (GPUs). We use CUDA and base our implementation on a Fortran90 routine included in a publicly available parallel package, S2HAT. We focus our attention on the two major sequential steps involved in the transform's computation, retaining the efficient parallel framework of the original code. We detail optimization techniques used to enhance the performance of the CUDA-based code and contrast them with those implemented in the Fortran90 version. We also present performance comparisons of a single CPU plus GPU unit with the S2HAT code running on either a single processor or 4 processors. In particular, we find that use of the latest generation of GPUs, such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms by as much as 18 times with respect to S2HAT executed on one core, and by as much as 5.5 times with respect to S2HAT on 4 cores, with the overall performance being limited by the fast Fourier transforms. The work presented here has been performed in the context of Cosmic Microwave Background simulations and analysis. However, we expect that the developed software will be of more general interest and applicability.

    Massively parallel quantum computer simulator, eleven years later

    A revised version of the massively parallel simulator of a universal quantum computer, described in this journal eleven years ago, is used to benchmark various gate-based quantum algorithms on some of the most powerful supercomputers that exist today. Adaptive encoding of the wave function reduces the memory requirement by a factor of eight, making it possible to simulate universal quantum computers with up to 48 qubits on the Sunway TaihuLight and on the K computer. The simulator exhibits close-to-ideal weak-scaling behavior on the Sunway TaihuLight, on the K computer, on an IBM Blue Gene/Q, and on Intel Xeon based clusters, implying that the combination of parallelization and hardware can track the exponential scaling due to the increasing number of qubits. Results of executing simple quantum circuits and Shor's factorization algorithm on quantum computers containing up to 48 qubits are presented. Comment: Substantially rewritten + new data. Published in Computer Physics Communicatio
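
To illustrate why a factor-of-eight memory reduction matters at 48 qubits, the sketch below estimates the size of a full state vector. It assumes 16 bytes per amplitude (a double-precision complex number) for the unencoded case; that figure and the function name are assumptions made for illustration, since the abstract states only the factor of eight.

```python
def state_vector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to store all 2**num_qubits complex amplitudes.

    bytes_per_amplitude=16 assumes double-precision complex numbers
    (an illustrative assumption, not a figure from the abstract).
    """
    return (2 ** num_qubits) * bytes_per_amplitude


PIB = 2 ** 50  # bytes in one pebibyte

if __name__ == "__main__":
    full = state_vector_bytes(48)
    reduced = full // 8  # factor-of-eight reduction quoted in the abstract
    print(f"48 qubits, 16 B/amplitude: {full / PIB:.1f} PiB")
    print(f"after 8x reduction:        {reduced / PIB:.1f} PiB")
```

Each additional qubit doubles the requirement, so a factor of eight corresponds to three extra qubits on the same memory budget.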