128 research outputs found
Massively parallel Poisson and QR factorization solvers
AbstractThe paper brings a massively parallel Poisson solver for rectangle domain and parallel algorithms for computation of QR factorization of a dense matrix A by means of Householder reflections and Givens rotations. The computer model under consideration is a SIMD mesh-connected toroidal n × n processor array.The Dirichlet problem is replaced by its finite-difference analog on an M × N (M + 1, N are powers of two) grid. The algorithm is composed of parallel fast sine transform and cyclic odd-even reduction blocks and runs in a fully parallel fashion. Its computational complexity is O(M N log Ln2), where L = max(M + 1, N). A parallel proposal of QR factorization by the Householder method zeros all subdiagonal elements in each column and updates all elements of the given submatrix in parallel. For the second method with Givens rotations, the parallel scheme of the Sameh and Kuck was chosen where the disjoint rotations can be computed simultaneously.The algorithms were coded in MPF and MPL parallel programming languages and results of computational experiments on the MasPar MP-1 system are also presented
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
Accurate and Efficient Expression Evaluation and Linear Algebra
We survey and unify recent results on the existence of accurate algorithms
for evaluating multivariate polynomials, and more generally for accurate
numerical linear algebra with structured matrices. By "accurate" we mean that
the computed answer has relative error less than 1, i.e., has some correct
leading digits. We also address efficiency, by which we mean algorithms that
run in polynomial time in the size of the input. Our results will depend
strongly on the model of arithmetic: Most of our results will use the so-called
Traditional Model (TM). We give a set of necessary and sufficient conditions to
decide whether a high accuracy algorithm exists in the TM, and describe
progress toward a decision procedure that will take any problem and provide
either a high accuracy algorithm or a proof that none exists. When no accurate
algorithm exists in the TM, it is natural to extend the set of available
accurate operations by a library of additional operations, such as , dot
products, or indeed any enumerable set which could then be used to build
further accurate algorithms. We show how our accurate algorithms and decision
procedure for finding them extend to this case. Finally, we address other
models of arithmetic, and the relationship between (im)possibility in the TM
and (in)efficient algorithms operating on numbers represented as bit strings.Comment: 49 pages, 6 figures, 1 tabl
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs.Comment: 43 pages, 11 figure
A Parallel Structured Divide-and-Conquer Algorithm for Symmetric Tridiagonal Eigenvalue Problems
© 2021 IEEE. Personal use of this material is permitted. Permissíon from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertisíng or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.[EN] In this article, a parallel structured divide-and-conquer (PSDC) eigensolver is proposed for symmetric tridiagonal matrices based on ScaLAPACK and a parallel structured matrix multiplication algorithm, called PSMMA. Computing the eigenvectors via matrix-matrix multiplications is the most computationally expensive part of the divide-and-conquer algorithm, and one of the matrices involved in such multiplications is a rank-structured Cauchy-like matrix. By exploiting this particular property, PSMMA constructs the local matrices by using generators of Cauchy-like matrices without any communication, and further reduces the computation costs by using a structured low-rank approximation algorithm. Thus, both the communication and computation costs are reduced. Experimental results show that both PSMMA and PSDC are highly scalable and scale to 4096 processes at least. PSDC has better scalability than PHDC that was proposed in [16] and only scaled to 300 processes for the same matrices. Comparing with PDSTEDC in ScaLAPACK, PSDC is always faster and achieves 1.4x-1.6x speedup for some matrices with few deflations. PSDC is also comparable with ELPA, with PSDC being faster than ELPA when using few processes and a little slower when using many processes.The authors would like to thank the referees for their valuable comments which greatly improve the presentation of this article. This work was supported by National Natural Science Foundation of China (No. NNW2019ZT6-B20, NNW2019ZT6B21, NNW2019ZT5-A10, U1611261, 61872392, and U1811461), National Key RD Program of China (2018YFB0204303), NSF of Hunan (No. 2019JJ40339), NSF of NUDT (No. ZK18-03-01), Guangdong Natural Science Foundation (2018B030312002), and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant 2016ZT06D211. The work of Jose E. Roman was supported by the Spanish Agencia Estatal de Investigacion (AEI) under project SLEPc-DA (PID2019-107379RB-I00).Liao, X.; Li, S.; Lu, Y.; Román Moltó, JE. (2021). A Parallel Structured Divide-and-Conquer Algorithm for Symmetric Tridiagonal Eigenvalue Problems. IEEE Transactions on Parallel and Distributed Systems. 32(2):367-378. https://doi.org/10.1109/TPDS.2020.3019471S36737832
- …