    A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

    We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of the GPU memory hierarchy. The algorithm may outperform MAGMA's dgesvd while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies for the GPU's shared address space that are also applicable to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single-GPU setting needs a CPU for control purposes only, while utilizing the GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by a more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card. Comment: Accepted for publication in SIAM Journal on Scientific Computing.
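For orientation, the core of a one-sided Jacobi SVD can be sketched in a few lines of plain Python. This is a sequential, unblocked CPU illustration with a simple cyclic pivot order, not the hierarchically blocked GPU algorithm or the parallel pivot strategies of the abstract; the function name `jacobi_singular_values` and the tolerance are assumptions made for the example.

```python
import math


def jacobi_singular_values(a, tol=1e-14, max_sweeps=60):
    """Singular values of a dense matrix (list of rows), one-sided Jacobi.

    Hypothetical helper for illustration: pairs of columns are rotated
    until all columns are mutually orthogonal; the column norms are then
    the singular values.
    """
    m, n = len(a), len(a[0])
    # Store the matrix by columns, since every rotation mixes two columns.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already orthogonal to working accuracy.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                new_p = [c * x - s * y for x, y in zip(cols[p], cols[q])]
                new_q = [s * x + c * y for x, y in zip(cols[p], cols[q])]
                cols[p], cols[q] = new_p, new_q
        if not rotated:
            break
    norms = (math.sqrt(sum(x * x for x in col)) for col in cols)
    return sorted(norms, reverse=True)


sv = jacobi_singular_values([[1.0, 2.0], [3.0, 4.0]])
```

For the 2x2 example, the product of the two values should match the absolute determinant (2) and the sum of their squares should match the squared Frobenius norm (30), which gives a quick sanity check.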

    Efficient implementation of the Hardy-Ramanujan-Rademacher formula

    We describe how the Hardy-Ramanujan-Rademacher formula can be implemented to allow the partition function $p(n)$ to be computed with softly optimal complexity $O(n^{1/2+o(1)})$ and very little overhead. A new implementation based on these techniques achieves speedups in excess of a factor of 500 over previously published software and has been used by the author to calculate $p(10^{19})$, an exponent twice as large as in previously reported computations. We also investigate performance for multi-evaluation of $p(n)$, where our implementation of the Hardy-Ramanujan-Rademacher formula becomes superior to power series methods on far denser sets of indices than previous implementations. As an application, we determine over 22 billion new congruences for the partition function, extending Weaver's tabulation of 76,065 congruences. Comment: updated version containing an unconditional complexity proof; accepted for publication in LMS Journal of Computation and Mathematics.
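As a rough illustration of the formula itself, the sketch below evaluates the Hardy-Ramanujan-Rademacher series in double precision and rounds the result. It is a naive version for small n only: the exponential sums are computed directly from Dedekind sums, and the number of terms (`3 * isqrt(n) + 10`) is a heuristic assumption, so none of the complexity or precision management described in the abstract is reproduced. All function names are hypothetical.

```python
import math


def dedekind_sum(h, k):
    """s(h, k) = sum_{i=1}^{k-1} (i/k) * ((h*i/k)), with ((x)) = x - floor(x) - 1/2."""
    total = 0.0
    for i in range(1, k):
        frac = ((h * i) % k) / k
        total += (i / k) * (frac - 0.5)
    return total


def exponential_sum(n, k):
    """A_k(n): sum over 0 <= h < k coprime to k of cos(pi * (s(h, k) - 2*n*h/k))."""
    total = 0.0
    for h in range(k):
        if math.gcd(h, k) == 1:
            total += math.cos(math.pi * (dedekind_sum(h, k) - 2.0 * n * h / k))
    return total


def partition_hrr(n):
    """p(n) for small n >= 1 from a truncated Rademacher series (float arithmetic)."""
    if n < 1:
        raise ValueError("this sketch expects n >= 1")
    c = math.pi / 6.0 * math.sqrt(24 * n - 1)
    terms = 3 * math.isqrt(n) + 10  # heuristic cutoff, an assumption of this sketch
    total = 0.0
    for k in range(1, terms + 1):
        x = c / k
        u = math.cosh(x) - math.sinh(x) / x
        total += math.sqrt(3.0 / k) * 4.0 / (24 * n - 1) * exponential_sum(n, k) * u
    return round(total)


values = [partition_hrr(n) for n in (1, 5, 10, 20)]
```

Double precision limits this version to moderate n, because the leading term grows like exp(pi * sqrt(2n/3)) and eventually exceeds the 53-bit mantissa.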

    Complex and Hypercomplex Discrete Fourier Transforms Based on Matrix Exponential Form of Euler's Formula

    We show that the discrete complex, and numerous hypercomplex, Fourier transforms defined and used so far by a number of researchers can be unified into a single framework based on a matrix exponential version of Euler's formula $e^{j\theta}=\cos\theta+j\sin\theta$, and a matrix root of $-1$ isomorphic to the imaginary root $j$. The transforms thus defined can be computed using standard matrix multiplications and additions with no hypercomplex code, the complex or hypercomplex algebra being represented by the form of the matrix root of $-1$, so that the matrix multiplications are equivalent to multiplications in the appropriate algebra. We present examples from the complex, quaternion and biquaternion algebras, and from Clifford algebras $Cl_{1,1}$ and $Cl_{2,0}$. The significance of this result is both in the theoretical unification, and also in the scope it affords for insight into the structure of the various transforms, since the formulation is such a simple generalization of the classic complex case. It also shows that hypercomplex discrete Fourier transforms may be computed using standard matrix arithmetic packages without the need for a hypercomplex library, which is of importance in providing a reference implementation for verifying implementations based on hypercomplex code. Comment: The paper has been revised since the second version to make some of the reasons for the paper clearer, to include reviews of prior hypercomplex transforms, and to clarify some points in the conclusion.
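The simplest instance of this idea is the ordinary complex case, sketched below in plain Python. The 2x2 real matrix `J = [[0, -1], [1, 0]]` is one possible matrix root of -1 (an assumption of this example; the paper's own choices for the hypercomplex algebras are not reproduced), a Taylor series checks that exp(J*theta) equals I*cos(theta) + J*sin(theta), and a DFT is then computed with real matrix arithmetic only. All names are hypothetical.

```python
import math

J = [[0.0, -1.0], [1.0, 0.0]]  # real 2x2 matrix with J*J = -I


def matmul2(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def expm_series(a, terms=40):
    """Matrix exponential of a 2x2 matrix by its Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        power = matmul2(power, a)
        power = [[v / n for v in row] for row in power]  # now a^n / n!
        result = [[result[i][j] + power[i][j] for j in range(2)]
                  for i in range(2)]
    return result


def euler_matrix(theta):
    """Closed form I*cos(theta) + J*sin(theta) of exp(J*theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matrix_dft(samples):
    """DFT of (re, im) pairs; each kernel exp(-J*2*pi*u*m/N) acts on the column (re, im)."""
    n = len(samples)
    out = []
    for u in range(n):
        re = im = 0.0
        for m, (x, y) in enumerate(samples):
            k = euler_matrix(-2.0 * math.pi * u * m / n)
            re += k[0][0] * x + k[0][1] * y
            im += k[1][0] * x + k[1][1] * y
        out.append((re, im))
    return out


theta = 0.7
series = expm_series([[theta * v for v in row] for row in J])
closed = euler_matrix(theta)
spectrum = matrix_dft([(1.0, 0.0), (2.0, -1.0), (0.5, 3.0), (-1.0, 0.25)])
```

The result can be compared against a conventional complex DFT built with `cmath.exp`, which is the kind of cross-check the abstract suggests for verifying hypercomplex code.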

    The exponentially convergent trapezoidal rule

    It is well known that the trapezoidal rule converges geometrically when applied to analytic functions on periodic intervals or the real line. The mathematics and history of this phenomenon are reviewed and it is shown that, far from being a curiosity, it is linked with computational methods all across scientific computing, including algorithms related to inverse Laplace transforms, special functions, complex analysis, rational approximation, integral equations, and the computation of functions and eigenvalues of matrices and operators.
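The geometric convergence is easy to observe numerically. The sketch below applies the periodic trapezoidal rule to exp(cos x) over one period, whose exact integral is 2*pi*I_0(1), with the modified Bessel function evaluated from its power series. The test integrand and the helper names are choices made for this example, not taken from the paper.

```python
import math


def periodic_trapezoid(f, n, period=2.0 * math.pi):
    """n-point trapezoidal rule on [0, period] for a periodic integrand.

    The two endpoint values coincide, so the rule reduces to an
    equally weighted sum over n equispaced points.
    """
    h = period / n
    return h * sum(f(k * h) for k in range(n))


def bessel_i0(x, terms=30):
    """Modified Bessel function I_0(x) from its power series."""
    return sum((x * x / 4.0) ** k / math.factorial(k) ** 2 for k in range(terms))


def integrand(x):
    return math.exp(math.cos(x))


exact = 2.0 * math.pi * bessel_i0(1.0)
errors = {n: abs(periodic_trapezoid(integrand, n) - exact) for n in (2, 4, 8, 16)}

for n, err in errors.items():
    print(f"n = {n:2d}  error = {err:.3e}")
```

Each doubling of the number of points shrinks the error by many orders of magnitude, in contrast to the fixed factor of four expected from the rule's usual second-order behaviour on non-periodic integrands.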

    Quantum algorithm and circuit design solving the Poisson equation

    The Poisson equation occurs in many areas of science and engineering. Here we focus on its numerical solution for an equation in $d$ dimensions. In particular, we present a quantum algorithm and a scalable quantum circuit design which approximates the solution of the Poisson equation on a grid with error $\varepsilon$. We assume we are given a superposition of function evaluations of the right-hand side of the Poisson equation. The algorithm produces a quantum state encoding the solution. The number of quantum operations and the number of qubits used by the circuit are almost linear in $d$ and polylog in $\varepsilon^{-1}$. We present quantum circuit modules together with performance guarantees which can also be used for other problems. Comment: 30 pages, 9 figures. This is the revised version for publication in New Journal of Physics.