Parallel Unsmoothed Aggregation Algebraic Multigrid Algorithms on GPUs

Author: A Krechel
D Goddeke
G Haase
G Karypis
GE Blelloch
H Grossauer
H Sterck De
J Bolz
N Bell
O Axelsson
PS Vassilevski
R Courant
TV Kolev
VE Henson
W Joubert
Publication date: 11/02/2013

We design and implement a parallel algebraic multigrid method for isotropic graph Laplacian problems on multicore Graphics Processing Units (GPUs). The proposed AMG method is based on the aggregation framework. The setup phase of the algorithm uses a parallel maximal independent set algorithm to form aggregates, and the resulting coarse-level hierarchy is then used in a K-cycle iteration solve phase with an $\ell^1$-Jacobi smoother. Numerical tests of a parallel implementation of the method for graphics processors are presented to demonstrate its effectiveness. Comment: 18 pages, 3 figures
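As a rough illustration of the smoother named in the abstract above, the following minimal sketch applies $\ell^1$-Jacobi sweeps to a small graph Laplacian. It assumes one common definition of the smoother, in which the diagonal scaling is the row-wise $\ell^1$ norm, $D_{ii} = \sum_j |a_{ij}|$; the paper may use a different variant. The function names, the 4-node path graph, and the right-hand side are hypothetical choices made for this example, and the code is serial pure Python rather than a GPU implementation.

```python
import math

def l1_jacobi_sweep(A, x, b):
    """One sweep x <- x + D^{-1} (b - A x), with D_ii = sum_j |A_ij| (assumed variant)."""
    n = len(A)
    # Residual r = b - A x, computed row by row.
    residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Scale each residual entry by the l1 norm of the corresponding row.
    return [x[i] + residual[i] / sum(abs(a) for a in A[i]) for i in range(n)]

def residual_norm(A, x, b):
    """Euclidean norm of b - A x."""
    n = len(A)
    return math.sqrt(sum((b[i] - sum(A[i][j] * x[j] for j in range(n))) ** 2
                         for i in range(n)))

# Graph Laplacian of a 4-node path graph (hypothetical test problem).
L = [[ 1, -1,  0,  0],
     [-1,  2, -1,  0],
     [ 0, -1,  2, -1],
     [ 0,  0, -1,  1]]
# The Laplacian is singular, so the right-hand side must sum to zero.
b = [1.0, 0.0, 0.0, -1.0]

x = [0.0] * 4
for _ in range(100):
    x = l1_jacobi_sweep(L, x, b)
```

Each row update is independent of the others, which is what makes Jacobi-type smoothers attractive on GPUs. In a multigrid solver only a few such sweeps would be applied per level; the many sweeps here just show that the iteration alone drives the residual down on a tiny problem.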

arXiv.org e-Print Archive

Crossref

Tsirelson's problem and Kirchberg's conjecture

Author: Choi M. D.
Folland G. B.
Paulsen V.
Pedersen G.-K.
Pisier G.
Stinespring W. F.
Takesaki M.
TOBIAS FRITZ
Tsirelson B. S.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2012

Tsirelson's problem asks whether the set of nonlocal quantum correlations with a tensor product structure for the Hilbert space coincides with the one where only commutativity between observables located at different sites is assumed. Here it is shown that Kirchberg's QWEP conjecture on tensor products of C*-algebras would imply a positive answer to this question for all bipartite scenarios. This remains true also if one considers not only spatial correlations, but also spatiotemporal correlations, where each party is allowed to apply their measurements in temporal succession; we provide an example of a state together with observables such that ordinary spatial correlations are local, while the spatiotemporal correlations reveal nonlocality. Moreover, we find an extended version of Tsirelson's problem which, for each nontrivial Bell scenario, is equivalent to the QWEP conjecture. This extended version can be conveniently formulated in terms of steering the system of a third party. Finally, a comprehensive mathematical appendix offers background material on complete positivity, tensor products of C*-algebras, group C*-algebras, and some simple reformulations of the QWEP conjecture. Comment: 57 pages, to appear in Rev. Math. Phys.

arXiv.org e-Print Archive

Crossref

MPG.PuRe

cuIBM -- A GPU-accelerated Immersed Boundary Method

Author: Barba Lorena A.
Krishnan Anush
Layton Simon K
Publication date: 08/04/2016

A projection-based immersed boundary method is dominated by sparse linear algebra routines. Using the open-source Cusp library, we observe a speedup (with respect to a single CPU core) which reflects the constraints of a bandwidth-dominated problem on the GPU. Nevertheless, GPUs offer the capacity to solve large problems on commodity hardware. This work includes validation and a convergence study of the GPU-accelerated IBM, and various optimizations. Comment: Extended paper post-conference, presented at the 23rd International Conference on Parallel Computational Fluid Dynamics (http://www.parcfd.org), ParCFD 2011, Barcelona (unpublished)

arXiv.org e-Print Archive

CiteSeerX

Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices

Author: Baruch E.
Cannon L. E.
Choi J
Choi J.
Solomonik E.
Szabo A.
van de Geijn R. A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/10/2015

A task-based formulation of the Scalable Universal Matrix Multiplication Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC). The novel features of our formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and (2) fine-grained task-based composition. These features make it tolerant of the load imbalance due to the irregular matrix structure and eliminate all artifactual sources of global synchronization. Scalability of iterative computation of square-root inverse of block-rank-sparse QC matrices is demonstrated; for full-rank (dense) matrices the performance of our SUMMA formulation usually exceeds that of the state-of-the-art dense MM implementations (ScaLAPACK and Cyclops Tensor Framework). Comment: 8 pages, 6 figures, accepted to IA3 2015. arXiv admin note: text overlap with arXiv:1504.0504
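For readers unfamiliar with SUMMA, the sketch below shows only its basic loop structure: the product is accumulated as a sequence of panel updates, C += A[:, k0:k1] · B[k0:k1, :]. It is a serial, dense, pure-Python illustration and not the task-based or block-rank-sparse formulation described in the abstract. The function name `summa_serial` and the `block` parameter are hypothetical, and the broadcast step is indicated only by a comment.

```python
def summa_serial(A, B, block):
    """Serial sketch of the SUMMA panel loop for C = A @ B (dense lists of lists)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for k0 in range(0, k, block):
        k1 = min(k0 + block, k)
        # In a distributed SUMMA, the column panel A[:, k0:k1] would be broadcast
        # along process rows and the row panel B[k0:k1, :] along process columns
        # at this point. Here everything lives in one address space.
        for i in range(n):
            for j in range(m):
                # Local update with the current panel pair.
                C[i][j] += sum(A[i][t] * B[t][j] for t in range(k0, k1))
    return C

A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]
C = summa_serial(A, B, block=2)
```

In the classical algorithm each iteration of the outer loop is a synchronization point. The abstract's contribution, as stated there, is to schedule several such iterations concurrently as fine-grained tasks, which this sketch does not attempt to model.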

arXiv.org e-Print Archive

Crossref