Parallel Unsmoothed Aggregation Algebraic Multigrid Algorithms on GPUs
We design and implement a parallel algebraic multigrid method for isotropic
graph Laplacian problems on multicore Graphics Processing Units (GPUs). The
proposed AMG method is based on the aggregation framework. The setup phase of
the algorithm uses a parallel maximal independent set algorithm in forming
aggregates and the resulting coarse level hierarchy is then used in a K-cycle
iteration solve phase with a -Jacobi smoother. Numerical tests of a
parallel implementation of the method for graphics processors are presented to
demonstrate its effectiveness.
Comment: 18 pages, 3 figures
Tsirelson's problem and Kirchberg's conjecture
Tsirelson's problem asks whether the set of nonlocal quantum correlations
with a tensor product structure for the Hilbert space coincides with the one
where only commutativity between observables located at different sites is
assumed. Here it is shown that Kirchberg's QWEP conjecture on tensor products
of C*-algebras would imply a positive answer to this question for all bipartite
scenarios. This remains true also if one considers not only spatial
correlations, but also spatiotemporal correlations, where each party is allowed
to apply their measurements in temporal succession; we provide an example of a
state together with observables such that ordinary spatial correlations are
local, while the spatiotemporal correlations reveal nonlocality. Moreover, we
find an extended version of Tsirelson's problem which, for each nontrivial Bell
scenario, is equivalent to the QWEP conjecture. This extended version can be
conveniently formulated in terms of steering the system of a third party.
Finally, a comprehensive mathematical appendix offers background material on
complete positivity, tensor products of C*-algebras, group C*-algebras, and
some simple reformulations of the QWEP conjecture.
Comment: 57 pages, to appear in Rev. Math. Phys.
cuIBM -- A GPU-accelerated Immersed Boundary Method
A projection-based immersed boundary method is dominated by sparse linear
algebra routines. Using the open-source Cusp library, we observe a speedup
(with respect to a single CPU core) which reflects the constraints of a
bandwidth-dominated problem on the GPU. Nevertheless, GPUs offer the capacity
to solve large problems on commodity hardware. This work includes validation
and a convergence study of the GPU-accelerated IBM, and various optimizations.
Comment: Extended paper post-conference, presented at the 23rd International
Conference on Parallel Computational Fluid Dynamics (http://www.parcfd.org),
ParCFD 2011, Barcelona (unpublished)
Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices
A task-based formulation of Scalable Universal Matrix Multiplication
Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is
applied to the multiplication of hierarchy-free, rank-structured matrices that
appear in the domain of quantum chemistry (QC). The novel features of our
formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and
(2) fine-grained task-based composition. These features make it tolerant of the
load imbalance due to the irregular matrix structure and eliminate all
artifactual sources of global synchronization. Scalability of iterative
computation of square-root inverse of block-rank-sparse QC matrices is
demonstrated; for full-rank (dense) matrices the performance of our SUMMA
formulation usually exceeds that of the state-of-the-art dense MM
implementations (ScaLAPACK and Cyclops Tensor Framework).
Comment: 8 pages, 6 figures, accepted to IA3 2015. arXiv admin note: text
overlap with arXiv:1504.0504