3,715 research outputs found

A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

Author: Novaković Vedran
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 27/09/2014

We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of the GPU's memory hierarchy. The algorithm may outperform MAGMA's dgesvd while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on the GPU's shared address space, which are also applicable to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single-GPU setting needs a CPU for control purposes only, while utilizing the GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by a more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card.

Comment: Accepted for publication in SIAM Journal on Scientific Computing
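
As a point of reference for the abstract above, the following is a minimal sketch of the textbook, unblocked one-sided Jacobi iteration on a CPU, in plain Python. It is not the paper's hierarchically blocked GPU algorithm and does not use its parallel pivot strategies; the function name, the row-cyclic pivot order, the tolerance, and the sweep limit are all assumptions made for illustration.

```python
import math

def one_sided_jacobi_singular_values(a, tol=1e-14, max_sweeps=60):
    """Singular values of a dense matrix (list of rows) by one-sided Jacobi.

    Hypothetical helper: plane rotations are applied to pairs of columns
    until all columns are mutually orthogonal; the column norms are then
    the singular values.
    """
    m, n = len(a), len(a[0])
    # Work on a column-major copy so each rotation touches two lists.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):              # row-cyclic pivot order (assumed)
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue                # columns already orthogonal enough
                rotated = True
                # Choose the rotation that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0.0 else -1.0
                t = sign / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    x, y = cols[p][i], cols[q][i]
                    cols[p][i] = c * x - s * y
                    cols[q][i] = s * x + c * y
        if not rotated:
            break
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)

singular_values = one_sided_jacobi_singular_values([[3, 0], [4, 5]])
```

For this 2-by-2 input the squared singular values are the eigenvalues 45 and 5 of the Gram matrix, which gives a quick sanity check on the sketch.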

arXiv.org e-Print Archive

CiteSeerX

Efficient implementation of the Hardy-Ramanujan-Rademacher formula

Author: Apostol
Borwein
Borwein
Brent
Cipolla
Crandall
Erdős
Knuth
Knuth
Odlyzko
Tonelli
Publication venue: 'Wiley'
Publication date: 01/01/2012

We describe how the Hardy-Ramanujan-Rademacher formula can be implemented to allow the partition function p(n) to be computed with softly optimal complexity O(n^{1/2+o(1)}) and very little overhead. A new implementation based on these techniques achieves speedups in excess of a factor 500 over previously published software and has been used by the author to calculate p(10^{19}), an exponent twice as large as in previously reported computations. We also investigate performance for multi-evaluation of p(n), where our implementation of the Hardy-Ramanujan-Rademacher formula becomes superior to power series methods on far denser sets of indices than previous implementations. As an application, we determine over 22 billion new congruences for the partition function, extending Weaver's tabulation of 76,065 congruences.

Comment: updated version containing an unconditional complexity proof; accepted for publication in LMS Journal of Computation and Mathematics
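
To make the objects in the abstract above concrete, here is a small sketch in plain Python. It does not implement the Hardy-Ramanujan-Rademacher formula; it tabulates p(n) with Euler's pentagonal number recurrence (a simple method for computing all values up to a bound) and compares one value with the leading Hardy-Ramanujan asymptotic term. The function names are hypothetical.

```python
import math

def partitions_up_to(n_max):
    """Return [p(0), ..., p(n_max)] via Euler's pentagonal number recurrence."""
    p = [0] * (n_max + 1)
    p[0] = 1
    for n in range(1, n_max + 1):
        total = 0
        k = 1
        while True:
            g1 = k * (3 * k - 1) // 2       # generalized pentagonal numbers
            if g1 > n:
                break
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - g1]
            g2 = k * (3 * k + 1) // 2
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p

def hardy_ramanujan_leading_term(n):
    """Leading asymptotic term exp(pi*sqrt(2n/3)) / (4*n*sqrt(3))."""
    return math.exp(math.pi * math.sqrt(2.0 * n / 3.0)) / (4.0 * n * math.sqrt(3.0))

table = partitions_up_to(100)
ratio = hardy_ramanujan_leading_term(100) / table[100]

# Ramanujan's classical congruence p(5k + 4) = 0 (mod 5), checked on the table.
ramanujan_mod5_holds = all(table[n] % 5 == 0 for n in range(4, 101, 5))
```

The exact table makes it easy to spot-check congruences of the kind the abstract mentions, although only for tiny indices compared with the computations reported there.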

arXiv.org e-Print Archive

CiteSeerX

Crossref

Complex and Hypercomplex Discrete Fourier Transforms Based on Matrix Exponential Form of Euler's Formula

Author: Ell Todd A.
Sangwine Stephen J.
Publication venue: 'Elsevier BV'
Publication date: 04/07/2011

We show that the discrete complex, and numerous hypercomplex, Fourier transforms defined and used so far by a number of researchers can be unified into a single framework based on a matrix exponential version of Euler's formula e^{j\theta}=\cos\theta+j\sin\theta, and a matrix root of -1 isomorphic to the imaginary root j. The transforms thus defined can be computed using standard matrix multiplications and additions with no hypercomplex code, the complex or hypercomplex algebra being represented by the form of the matrix root of -1, so that the matrix multiplications are equivalent to multiplications in the appropriate algebra. We present examples from the complex, quaternion and biquaternion algebras, and from Clifford algebras Cl_{1,1} and Cl_{2,0}. The significance of this result is both in the theoretical unification, and also in the scope it affords for insight into the structure of the various transforms, since the formulation is such a simple generalization of the classic complex case. It also shows that hypercomplex discrete Fourier transforms may be computed using standard matrix arithmetic packages without the need for a hypercomplex library, which is of importance in providing a reference implementation for verifying implementations based on hypercomplex code.

Comment: The paper has been revised since the second version to make some of the reasons for the paper clearer, to include reviews of prior hypercomplex transforms, and to clarify some points in the conclusion
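
The complex case of the idea in the abstract above can be sketched in a few lines of plain Python. The 2-by-2 real matrix J below satisfies J*J = -I and stands in for j, so exp(J*theta) = I*cos(theta) + J*sin(theta). The representation a + bj -> a*I + b*J and all function names are assumptions chosen for illustration, not necessarily the paper's own formulation.

```python
import cmath
import math

# Real 2x2 matrix root of -1: J @ J == -I, so J plays the role of j.
J = ((0.0, -1.0), (1.0, 0.0))

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def exp_j(theta):
    """Matrix form of Euler's formula: exp(J*theta) = I*cos + J*sin."""
    c, s = math.cos(theta), math.sin(theta)
    return tuple(
        tuple(c * (1.0 if i == k else 0.0) + s * J[i][k] for k in range(2))
        for i in range(2)
    )

def as_matrix(z):
    """Represent the complex number a + bj as the matrix a*I + b*J."""
    return ((z.real, -z.imag), (z.imag, z.real))

def matrix_dft(samples):
    """DFT computed with real 2x2 matrix products only."""
    n = len(samples)
    out = []
    for k in range(n):
        acc = ((0.0, 0.0), (0.0, 0.0))
        for m, z in enumerate(samples):
            term = matmul2(exp_j(-2.0 * math.pi * k * m / n), as_matrix(z))
            acc = tuple(tuple(acc[i][j] + term[i][j] for j in range(2)) for i in range(2))
        # The first column of a*I + b*J holds (a, b).
        out.append(complex(acc[0][0], acc[1][0]))
    return out

def complex_dft(samples):
    """Reference DFT using ordinary complex arithmetic."""
    n = len(samples)
    return [
        sum(z * cmath.exp(-2j * math.pi * k * m / n) for m, z in enumerate(samples))
        for k in range(n)
    ]

samples = [1 + 2j, -0.5 + 0.25j, 3 - 1j, 0.75j]
max_difference = max(abs(x - y) for x, y in zip(matrix_dft(samples), complex_dft(samples)))
```

The two transforms should agree to rounding error, which mirrors the abstract's point that a matrix-only computation can serve as a reference for checking algebra-specific code.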

arXiv.org e-Print Archive

University of Essex Research Repository

The exponentially convergent trapezoidal rule

Author: Trefethen Lloyd N.
Weideman J. A. C.
Publication venue: SIAM
Publication date: 01/01/2013

It is well known that the trapezoidal rule converges geometrically when applied to analytic functions on periodic intervals or the real line. The mathematics and history of this phenomenon are reviewed and it is shown that, far from being a curiosity, it is linked with computational methods all across scientific computing, including algorithms related to inverse Laplace transforms, special functions, complex analysis, rational approximation, integral equations, and the computation of functions and eigenvalues of matrices and operators.
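
The geometric convergence described above is easy to observe numerically. The sketch below, in plain Python, applies the periodic trapezoidal rule to exp(cos(theta)) over one period; the integrand, the node counts, and the function names are choices made here for illustration. The exact value is 2*pi*I_0(1), where I_0 is the modified Bessel function, evaluated from its power series.

```python
import math

def periodic_trapezoid(f, n):
    """Trapezoidal rule on [0, 2*pi] for a 2*pi-periodic integrand.

    For periodic f the two endpoint half-weights merge, so the rule is
    just an equally weighted sum over n equispaced nodes.
    """
    h = 2.0 * math.pi / n
    return h * sum(f(k * h) for k in range(n))

def integrand(theta):
    return math.exp(math.cos(theta))

def bessel_i0(x, terms=30):
    """Power series I_0(x) = sum_k (x^2/4)^k / (k!)^2."""
    return sum((x * x / 4.0) ** k / math.factorial(k) ** 2 for k in range(terms))

exact = 2.0 * math.pi * bessel_i0(1.0)
errors = {n: abs(periodic_trapezoid(integrand, n) - exact) for n in (2, 4, 8, 16)}

for n, err in errors.items():
    print(f"n = {n:2d}  error = {err:.3e}")
```

Each doubling of n shrinks the error far more than the factor of four expected from the usual O(h^2) bound, and by n = 16 the error is at the level of rounding.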

CiteSeerX

Oxford University Research Archive

Quantum algorithm and circuit design solving the Poisson equation

Author: Abramowitz M
Anargyros Papageorgiou
Asmussen S
Berry D W
Childs A M Wiebe N
Evans L C
Forsythe G E
Griffiths D J
Grover L Rudolph T
Iasonas Petras
Joseph Traub
Klappenecker A Roetteler M
LeVeque R J
Leyton S K Osborne T J
Nielsen M A
Papageorgiou A
Pueschel M Roetteler M Beth T
Ritter K
Sabre Kais
Shor P W Goldwasser S
Werschulz A G
Wickerhauser M V
Yudong Cao
Publication venue: 'IOP Publishing'
Publication date: 11/11/2012

The Poisson equation occurs in many areas of science and engineering. Here we focus on its numerical solution for an equation in d dimensions. In particular we present a quantum algorithm and a scalable quantum circuit design which approximates the solution of the Poisson equation on a grid with error \varepsilon. We assume we are given a superposition of function evaluations of the right-hand side of the Poisson equation. The algorithm produces a quantum state encoding the solution. The number of quantum operations and the number of qubits used by the circuit are almost linear in d and polylog in \varepsilon^{-1}. We present quantum circuit modules together with performance guarantees which can also be used for other problems.

Comment: 30 pages, 9 figures. This is the revised version for publication in New Journal of Physics
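
For orientation only, the sketch below shows what "approximating the solution of the Poisson equation on a grid" means in the simplest classical setting: d = 1, -u'' = f on (0, 1) with zero boundary values, discretized by second-order finite differences and solved with the Thomas algorithm. This is a classical illustration in plain Python, not the quantum algorithm or circuit of the paper; the test problem and function names are assumptions.

```python
import math

def solve_poisson_1d(f, n):
    """Solve -u'' = f on (0, 1), u(0) = u(1) = 0, on n interior grid points.

    The finite-difference system has 2 on the diagonal and -1 on both
    off-diagonals, with right-hand side h^2 * f(x_i); it is solved by the
    Thomas algorithm (tridiagonal Gaussian elimination).
    """
    h = 1.0 / (n + 1)
    rhs = [h * h * f((i + 1) * h) for i in range(n)]
    c_prime = [0.0] * n   # modified super-diagonal
    d_prime = [0.0] * n   # modified right-hand side
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c_prime[i - 1]          # b_i - a_i * c'_{i-1} with a_i = -1
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return u

def max_error(n):
    """Grid error for f = pi^2 sin(pi x), whose exact solution is sin(pi x)."""
    h = 1.0 / (n + 1)
    u = solve_poisson_1d(lambda x: math.pi ** 2 * math.sin(math.pi * x), n)
    return max(abs(u[i] - math.sin(math.pi * (i + 1) * h)) for i in range(n))

error_coarse = max_error(31)   # h = 1/32
error_fine = max_error(63)     # h = 1/64
```

Halving the mesh width should cut the error by roughly a factor of four in this classical scheme, so reaching error \varepsilon takes on the order of \varepsilon^{-1/2} grid points per dimension.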

arXiv.org e-Print Archive

Crossref

Purdue E-Pubs