Metropolis-Hastings within Partially Collapsed Gibbs Samplers
The Partially Collapsed Gibbs (PCG) sampler offers a new strategy for
improving the convergence of a Gibbs sampler. PCG achieves faster convergence
by reducing the conditioning in some of the draws of its parent Gibbs sampler.
Although this can significantly improve convergence, care must be taken to
ensure that the stationary distribution is preserved. The conditional
distributions sampled in a PCG sampler may be incompatible and permuting their
order may upset the stationary distribution of the chain. Extra care must be
taken when Metropolis-Hastings (MH) updates are used in some or all of the
conditional draws. Reducing the conditioning in an MH within Gibbs sampler can change the
stationary distribution, even when the PCG sampler would work perfectly if MH
were not used. In fact, a number of samplers of this sort that have been
advocated in the literature do not actually have their target stationary
distributions. In this article, we illustrate the challenges that may arise
when using MH within a PCG sampler and develop a general strategy for using
such updates while maintaining the desired stationary distribution. Theoretical
arguments provide guidance when choosing between different MH within PCG
sampling schemes. Finally, we illustrate the MH within PCG sampler and its
computational advantage using several examples from our applied work.
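
As a toy illustration of the distinction (our own construction, not one of the paper's examples), the following Python sketch targets a bivariate normal: the PCG step draws x from its marginal rather than its full conditional, and the MH within PCG variant preserves the stationary distribution precisely because the MH step targets that marginal. The correlation rho, the step size, and all names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
rho = 0.9          # correlation of the bivariate normal target p(x, y)
n_iter = 10_000

def gibbs(n_iter):
    # Parent Gibbs sampler: alternate the full conditionals
    #   x | y ~ N(rho*y, 1 - rho^2),  y | x ~ N(rho*x, 1 - rho^2).
    x = y = 0.0
    out = np.empty((n_iter, 2))
    for i in range(n_iter):
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))
        out[i] = x, y
    return out

def mh_within_pcg(n_iter, step=1.0):
    # PCG step: the x-draw is "reduced" by integrating y out, so the MH
    # step must target the marginal p(x) = N(0, 1).  An MH step targeting
    # the original full conditional p(x | y) here would change the chain's
    # stationary distribution.
    x = y = 0.0
    out = np.empty((n_iter, 2))
    for i in range(n_iter):
        prop = x + rng.normal(0.0, step)        # random-walk proposal
        log_alpha = -0.5 * (prop**2 - x**2)     # MH ratio for the marginal N(0, 1)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # intact conditional draw
        out[i] = x, y
    return out

print(np.corrcoef(gibbs(n_iter).T)[0, 1])       # both estimates ~ rho
print(np.corrcoef(mh_within_pcg(n_iter).T)[0, 1])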
A GPU-based hyperbolic SVD algorithm
A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm,
using a massively parallel graphics processing unit (GPU), is developed. The
algorithm also serves as the final stage of solving a symmetric indefinite
eigenvalue problem. Numerical testing demonstrates the gains in speed and
accuracy over sequential and MPI-parallelized variants of similar Jacobi-type
HSVD algorithms. Finally, possibilities of hybrid CPU--GPU parallelism are
discussed.
Comment: Accepted for publication in BIT Numerical Mathematics
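
As a rough sketch of the underlying iteration (a serial NumPy version under our own naming, not the paper's GPU implementation), one-sided hyperbolic Jacobi applies J-orthogonal plane transformations from the right until the columns of G are mutually orthogonal: trigonometric rotations handle same-sign pairs of the signature J, hyperbolic rotations the mixed-sign pairs. For A = G diag(J) G^T this yields eigenpairs lambda_k = J_k * sigma_k^2.

import numpy as np

def hsvd_one_sided_jacobi(G, J, tol=1e-12, max_sweeps=50):
    # Sketch only: cyclic pair order, no accumulation of the right
    # transformation V, and no guards for the nearly degenerate
    # hyperbolic case.
    G = G.astype(float)
    n = G.shape[1]
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                a = G[:, p] @ G[:, p]
                b = G[:, q] @ G[:, q]
                c = G[:, p] @ G[:, q]
                if abs(c) <= tol * np.sqrt(a * b):
                    continue
                converged = False
                if J[p] == J[q]:
                    # trigonometric rotation: J-orthogonal on a same-sign pair
                    tau = (a - b) / (2.0 * c)
                    t = np.sign(tau) / (abs(tau) + np.hypot(1.0, tau)) if tau else 1.0
                    cs = 1.0 / np.sqrt(1.0 + t * t)
                    sn = t * cs
                    gp = cs * G[:, p] + sn * G[:, q]
                    gq = -sn * G[:, p] + cs * G[:, q]
                else:
                    # hyperbolic rotation: J-orthogonal on a mixed-sign pair
                    t = -2.0 * c / ((a + b) + np.sqrt((a + b) ** 2 - 4.0 * c * c))
                    ch = 1.0 / np.sqrt(1.0 - t * t)
                    sh = t * ch
                    gp = ch * G[:, p] + sh * G[:, q]
                    gq = sh * G[:, p] + ch * G[:, q]
                G[:, p], G[:, q] = gp, gq
        if converged:
            break
    sigma = np.linalg.norm(G, axis=0)   # hyperbolic singular values
    U = G / sigma                       # orthonormal left vectors
    return U, sigma

# final stage of a symmetric indefinite eigenproblem A = G diag(J) G^T
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 6))
J = np.array([1, 1, 1, -1, -1, -1])
U, sigma = hsvd_one_sided_jacobi(G, J)
print(np.allclose(G @ np.diag(J) @ G.T, U @ np.diag(J * sigma**2) @ U.T))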
Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation
Sparse matrix-vector multiplication (spMVM) is the dominant operation in many
sparse solvers. We investigate performance properties of spMVM with matrices of
various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded
jagged diagonals storage" (pJDS) format is proposed which may substantially
reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In our
test scenarios the pJDS format cuts the overall spMVM memory footprint on the
GPGPU by up to 70%, and achieves 95% to 130% of the ELLPACK-R performance.
Using a suitable performance model we identify performance bottlenecks on the
node level that make some types of matrix structures unsuitable for efficient
multi-GPGPU parallelization. For appropriate sparsity patterns we extend
previous work on distributed-memory parallel spMVM to demonstrate a scalable
hybrid MPI-GPGPU code, achieving efficient overlap of communication and
computation.
Comment: 10 pages, 5 figures. Added reference to other recent sparse matrix formats
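
As a sketch of the layout idea only (plain NumPy, with our own names and block height; not the paper's GPU data structures, which store blocks column-major with per-row lengths), a pJDS-style format sorts rows by descending nonzero count and pads each fixed-height row block to its block-local maximum, where ELLPACK-R pads every row to the global maximum:

import numpy as np
import scipy.sparse as sp

def to_pjds(A, block=32):
    # Sort rows by descending nonzero count, then pad within each block
    # of `block` rows only up to that block's longest row.
    A = sp.csr_matrix(A)
    nnz_per_row = np.diff(A.indptr)
    perm = np.argsort(-nnz_per_row, kind='stable')   # longest rows first
    blocks = []
    for start in range(0, A.shape[0], block):
        rows = perm[start:start + block]
        width = nnz_per_row[rows].max()              # block-local padding width
        cols = np.zeros((len(rows), width), dtype=np.int64)
        vals = np.zeros((len(rows), width))
        for i, r in enumerate(rows):
            lo, hi = A.indptr[r], A.indptr[r + 1]
            cols[i, :hi - lo] = A.indices[lo:hi]
            vals[i, :hi - lo] = A.data[lo:hi]
        blocks.append((rows, cols, vals))
    return blocks

def pjds_spmv(blocks, x, m):
    # Scalar reference spMVM over the blocked layout; on a GPU, the rows
    # of one block would map to the threads of one warp.
    y = np.zeros(m)
    for rows, cols, vals in blocks:
        y[rows] = (vals * x[cols]).sum(axis=1)       # padded zeros contribute 0
    return y

A = sp.random(1000, 1000, density=0.01, format='csr', random_state=0)
x = np.ones(1000)
print(np.allclose(pjds_spmv(to_pjds(A), x, 1000), A @ x))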