Parallel bidiagonalization of a dense matrix
A new stable method for the reduction of rectangular dense matrices to
bidiagonal form has been proposed recently. This is a one-sided method since
it can be entirely expressed in terms of operations with (full) columns of
the matrix under transformation. The algorithm is well suited to parallel
computing and, to make it even more attractive for distributed memory
systems, we introduce a modification that halves the number of
communication instances. A
block organization of the algorithm to use level 3 BLAS routines seems
difficult and, at least for the moment, it relies upon level 2 BLAS
routines. Nevertheless, we found that our sequential code is competitive
with the LAPACK DGEBRD routine. We also compare the time taken by our
parallel codes and the ScaLAPACK PDGEBRD routine. We investigated the best
data distribution schemes for the different codes and we can state that our
parallel codes are also competitive with the ScaLAPACK routine.
Fundação para a Ciência e a Tecnologia (FCT) - programa POCI 2010
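The paper's one-sided method is not reproduced here; as a point of reference, the following is a minimal NumPy sketch of the classical two-sided Householder (Golub-Kahan) reduction that LAPACK's DGEBRD implements. The function names are illustrative, not from the paper or LAPACK:

```python
import numpy as np

def householder(x):
    """Householder vector v and scalar beta with (I - beta*v*v^T) x = alpha*e1."""
    v = x.astype(float).copy()
    # choose the sign of alpha to avoid cancellation in v[0]
    alpha = -np.linalg.norm(x) if x[0] >= 0 else np.linalg.norm(x)
    v[0] -= alpha
    nv = np.linalg.norm(v)
    if nv == 0.0:                  # x is already zero
        return v, 0.0
    return v / nv, 2.0

def bidiagonalize(A):
    """Classical Golub-Kahan reduction of an m x n (m >= n) matrix to upper
    bidiagonal form by alternating left and right Householder reflections.
    Orthogonal transformations preserve the singular values."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    for k in range(n):
        # zero B[k+1:, k] with a reflection applied from the left
        v, beta = householder(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        # zero B[k, k+2:] with a reflection applied from the right
        if k < n - 2:
            v, beta = householder(B[k, k+1:])
            B[k:, k+1:] -= beta * np.outer(B[k:, k+1:] @ v, v)
    return B
```

Unlike this two-sided scheme, the one-sided method described in the abstract works only with full columns of the matrix, which is what makes it attractive for distributed-memory parallelism.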
A GPU-based hyperbolic SVD algorithm
A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm,
using a massively parallel graphics processing unit (GPU), is developed. The
algorithm also serves as the final stage of solving a symmetric indefinite
eigenvalue problem. Numerical testing demonstrates the gains in speed and
accuracy over sequential and MPI-parallelized variants of similar Jacobi-type
HSVD algorithms. Finally, possibilities of hybrid CPU-GPU parallelism are
discussed.
Comment: Accepted for publication in BIT Numerical Mathematics.
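The hyperbolic Jacobi algorithm itself is not reproduced here; as a minimal illustration of its building block, the 2x2 hyperbolic rotations that replace ordinary plane rotations are J-orthogonal with respect to a signature matrix J = diag(1, -1), i.e. H^T J H = J:

```python
import numpy as np

def hyperbolic_rotation(t):
    """2x2 hyperbolic rotation: the cosh/sinh analogue of a Jacobi plane
    rotation, used when the underlying scalar product has signature J."""
    ch, sh = np.cosh(t), np.sinh(t)
    return np.array([[ch, sh], [sh, ch]])

J = np.diag([1.0, -1.0])
H = hyperbolic_rotation(0.7)
# J-orthogonality follows from cosh^2(t) - sinh^2(t) = 1
```

Applying such rotations pairwise to the columns of the factor of a symmetric indefinite matrix is what makes the HSVD usable as the final stage of an indefinite eigensolver.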
Deterministic algorithms for the low rank approximation of matrices
Invited course given during the CNRS Action Nationale de Formation entitled "Dimension reduction in massive data mining: challenges, methods, and tools for computation."
A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units
We present a hierarchically blocked one-sided Jacobi algorithm for the
singular value decomposition (SVD), targeting both single and multiple graphics
processing units (GPUs). The blocking structure reflects the levels of the GPU
memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining
high relative accuracy. To this end, we developed a family of parallel pivot
strategies for a GPU's shared address space, which are also applicable to
inter-GPU communication. Unlike common hybrid approaches, in a single-GPU
setting our algorithm needs the CPU only for control purposes, while utilizing
the GPU's resources to the fullest extent permitted by the hardware. When required by the
problem size, the algorithm, in principle, scales to an arbitrary number of GPU
nodes. The scalability is demonstrated by more than twofold speedup for
sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single
Fermi card.
Comment: Accepted for publication in SIAM Journal on Scientific Computing.
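The hierarchically blocked GPU variant is beyond a short sketch, but the classical one-sided (Hestenes) Jacobi SVD it builds on fits in a few lines. This is a minimal serial sketch with a fixed row-cyclic pivot order, assuming full column rank; the names are illustrative, not from the paper:

```python
import numpy as np

def jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi SVD: orthogonalize the columns of U = A*V by plane
    rotations; returns (U, s, Vt) with A ~= U @ diag(s) @ Vt.
    Assumes A has full column rank."""
    U = np.array(A, dtype=float)
    m, n = U.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        off = 0.0                        # largest relative off-diagonal of U^T U
        for p in range(n - 1):
            for q in range(p + 1, n):
                app = U[:, p] @ U[:, p]
                aqq = U[:, q] @ U[:, q]
                apq = U[:, p] @ U[:, q]
                off = max(off, abs(apq) / np.sqrt(app * aqq))
                if apq == 0.0:
                    continue
                # rotation angle that makes columns p and q orthogonal
                tau = (aqq - app) / (2.0 * apq)
                t = np.sign(tau) / (abs(tau) + np.sqrt(1.0 + tau * tau))
                c = 1.0 / np.sqrt(1.0 + t * t)
                G = np.array([[c, c * t], [-c * t, c]])
                U[:, [p, q]] = U[:, [p, q]] @ G
                V[:, [p, q]] = V[:, [p, q]] @ G
        if off < tol:
            break
    s = np.linalg.norm(U, axis=0)        # column norms are the singular values
    U = U / s
    order = np.argsort(-s)               # sort descending
    return U[:, order], s[order], V[:, order].T
```

The blocked GPU algorithm in the abstract replaces this fixed cyclic order with parallel pivot strategies, so that many independent column pairs can be rotated concurrently.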
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In [BDHS10] lower bounds
were presented on the amount of communication required for essentially all
O(n^3)-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs.
Comment: 43 pages, 11 figures.
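As context for the bounds discussed above (this is not one of the paper's algorithms), the [BDHS10] bandwidth lower bound says any O(n^3)-like algorithm must move on the order of #flops / sqrt(M) words when the fast memory holds M words. A tiny sketch, with an illustrative function name, evaluated for dense n x n matrix multiplication:

```python
import math

def words_lower_bound(flops, fast_mem_words):
    """Order-of-magnitude communication lower bound of [BDHS10]: words moved
    between memory levels >= c * flops / sqrt(fast-memory size)."""
    return flops / math.sqrt(fast_mem_words)

# Example: n x n dense matmul performs 2*n^3 flops; cache of M words.
n, M = 4096, 2**20
bound = words_lower_bound(2 * n**3, M)   # 2^37 / 2^10 = 2^27 words
```

Conventional (Sca)LAPACK eigenvalue and SVD routines communicate asymptotically more than this; the algorithms in the paper attain it.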