
    An efficient implementation of the block Gram--Schmidt method

    The block Gram--Schmidt method computes the QR factorisation rapidly, but its speed depends on the block size m. We endeavor to determine the optimal m automatically during a single execution. Our algorithm selects m by observing the relationship between computation time and computational complexity. Numerical experiments show that the proposed algorithm runs approximately twice as fast as the block Gram--Schmidt method for some block sizes, and is a viable option for computing the QR factorisation in a more stable and rapid manner.
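    The abstract's adaptive block-size selection is not spelled out, but the underlying block Gram--Schmidt scheme it tunes can be sketched as follows. This is a minimal, illustrative NumPy version, assuming the common variant in which each width-m column block is orthogonalised against the accumulated basis and then factored with a local unblocked QR; the function name and test matrix are not from the paper.

    ```python
    import numpy as np

    def block_gram_schmidt(A, m):
        """QR factorisation of A via block classical Gram-Schmidt with block size m."""
        n, k = A.shape
        Q = np.zeros((n, k))
        R = np.zeros((k, k))
        for j in range(0, k, m):
            e = min(j + m, k)
            W = A[:, j:e].copy()
            if j > 0:
                # project the block onto the orthogonal complement of the basis so far
                R[:j, j:e] = Q[:, :j].T @ W
                W -= Q[:, :j] @ R[:j, j:e]
            # intra-block orthogonalisation via a local (unblocked) QR
            Q[:, j:e], R[j:e, j:e] = np.linalg.qr(W)
        return Q, R

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 40))
    Q, R = block_gram_schmidt(A, m=8)
    print(np.allclose(Q @ R, A))          # reconstruction holds
    print(np.allclose(Q.T @ Q, np.eye(40), atol=1e-10))  # columns orthonormal
    ```

    Larger m trades Level-2 for Level-3 BLAS work, which is why the optimal value is machine-dependent and worth choosing at run time, as the paper proposes.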

    Exploiting Data Representation for Fault Tolerance

    We explore the link between data representation and soft errors in dot products. We present an analytic model for the absolute error introduced should a soft error corrupt a bit in an IEEE-754 floating-point number. We show how this finding relates to the fundamental linear algebra concepts of normalization and matrix equilibration. We present a case study illustrating that the probability of experiencing a large error in a dot product is minimized when both vectors are normalized. Furthermore, when the data are normalized we show that the absolute error is either less than one or very large, which allows us to detect large errors. We demonstrate how this finding can be used by instrumenting the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase, and show that when scaling is used the absolute error can be bounded above by one.
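    The "less than one or very large" behaviour for normalized data can be checked empirically by flipping each bit of a binary64 value with magnitude below one. This is an illustrative sketch, not the paper's analytic model; the helper name and the choice of x = 0.3 are ours.

    ```python
    import struct

    def flip_bit(x, i):
        """Return x with bit i of its IEEE-754 binary64 representation flipped
        (bit 0 = least-significant mantissa bit, bit 63 = sign)."""
        (bits,) = struct.unpack("<Q", struct.pack("<d", x))
        (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << i)))
        return y

    x = 0.3  # a normalized datum, |x| < 1
    errors = [abs(flip_bit(x, i) - x) for i in range(64)]
    small = [e for e in errors if e < 1.0]
    huge = [e for e in errors if e >= 1.0]
    print(len(small), len(huge))  # 63 small, 1 huge for x = 0.3
    ```

    For this value, every single-bit flip in the sign, mantissa, or low exponent bits perturbs the result by less than one, while a flip in the top exponent bit produces an astronomically large value: exactly the bimodal error distribution the abstract exploits for detection.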

    Daubechies wavelets as a basis set for density functional pseudopotential calculations

    Daubechies wavelets are a powerful systematic basis set for electronic structure calculations because they are orthogonal and localized in both real and Fourier space. We describe in detail how this basis set can be used to obtain a highly efficient and accurate method for density functional electronic structure calculations. An implementation of this method is available in the ABINIT free software package. This code shows highly systematic convergence properties, very good performance, and excellent efficiency for parallel calculations.
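    The orthogonality property the abstract relies on can be verified directly on the shortest Daubechies filter. As an illustrative check (not taken from the paper), the D4 scaling coefficients have unit norm and are orthogonal to their even shifts, which is what makes the translated basis functions mutually orthogonal:

    ```python
    import math

    # D4 (Daubechies-2) scaling filter coefficients
    s3 = math.sqrt(3.0)
    h = [c / (4 * math.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]

    unit = sum(c * c for c in h)        # sum of squares: should equal 1
    shift2 = h[0] * h[2] + h[1] * h[3]  # inner product with shift-by-2: should be 0
    print(unit, shift2)
    ```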

    Compressed basis GMRES on high-performance graphics processing units

    Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is constrained by the communication bandwidth in current computer architectures, motivating the investigation of sophisticated techniques to avoid, reduce, and/or hide the message-passing costs (in distributed platforms) and the memory accesses (in all architectures). This article leverages Ginkgo’s memory accessor in order to integrate a communication-reduction strategy into the (Krylov) GMRES solver that decouples the storage format (i.e., the data representation in memory) of the orthogonal basis from the arithmetic precision that is employed during the operations with that basis. Given that the execution time of the GMRES solver is largely determined by the memory accesses, the cost of the datatype transforms can be mostly hidden, resulting in the acceleration of the iterative step via a decrease in the volume of bits being retrieved from memory. Together with the special properties of the orthonormal basis (whose elements are all bounded by 1), this paves the road toward the aggressive customization of the storage format, which includes some floating-point as well as fixed-point formats with mild impact on the convergence of the iterative process. We develop a high-performance implementation of the “compressed basis GMRES” solver in the Ginkgo sparse linear algebra library, and evaluate it using a large set of test problems from the SuiteSparse Matrix Collection. We demonstrate robustness and performance advantages on a modern NVIDIA V100 graphics processing unit (GPU) of up to 50% over the standard GMRES solver that stores all data in IEEE double precision.
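    The core idea, decoupling the basis storage format from the arithmetic precision, can be sketched in a few lines. Ginkgo's actual accessor is a C++/CUDA component; this NumPy toy (names ours) only illustrates why the compression is safe: the orthonormal basis entries are bounded by 1, so a narrower format loses only rounding accuracy, never range, while halving the bytes moved through memory.

    ```python
    import numpy as np

    def compress(v):
        """Store a basis vector in binary32; safe since |v_i| <= 1 for an
        orthonormal basis, so only a rounding perturbation is introduced."""
        return v.astype(np.float32)

    def decompress(v32):
        # arithmetic is carried out in binary64 after the up-conversion
        return v32.astype(np.float64)

    rng = np.random.default_rng(1)
    v = rng.standard_normal(10_000)
    v /= np.linalg.norm(v)          # normalized: entries bounded by 1

    v32 = compress(v)
    err = np.linalg.norm(decompress(v32) - v)
    print(v32.nbytes / v.nbytes)    # 0.5 -> half the memory traffic
    print(err)                      # tiny rounding perturbation
    ```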

    Solving large sparse eigenvalue problems on supercomputers

    An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods for solving these problems are based on projection techniques onto appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix-by-vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method, are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, ways of performing this operation efficiently are discussed. The advantages and disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one-processor CRAY-2 and a CRAY X-MP are reported. Possible parallel implementations are also discussed.
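    The point that these methods touch the matrix only through matrix-vector products can be made concrete with a minimal Lanczos iteration. This sketch (assumptions: a simple matrix-free operator standing in for a sparse matvec, no reorthogonalization, names ours) builds the tridiagonal projection whose extremal eigenvalues approximate those of the operator:

    ```python
    import numpy as np

    def lanczos(matvec, n, k, rng=np.random.default_rng(2)):
        """k steps of the symmetric Lanczos process; the matrix enters only
        through matvec, the matrix-by-vector product."""
        Q = np.zeros((n, k + 1))
        alpha = np.zeros(k)
        beta = np.zeros(k)
        q = rng.standard_normal(n)
        Q[:, 0] = q / np.linalg.norm(q)
        for j in range(k):
            w = matvec(Q[:, j])
            alpha[j] = Q[:, j] @ w
            w -= alpha[j] * Q[:, j]
            if j > 0:
                w -= beta[j - 1] * Q[:, j - 1]
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
        # eigenvalues of the tridiagonal T approximate extremal eigenvalues of A
        T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
        return np.linalg.eigvalsh(T)

    # matrix-free operator: a diagonal stand-in for a sparse matvec,
    # with one well-separated dominant eigenvalue
    n = 500
    d = np.arange(1, n + 1, dtype=float)
    d[-1] = 1000.0
    ritz = lanczos(lambda v: d * v, n, k=30)
    print(ritz.max())   # converges to the dominant eigenvalue 1000
    ```

    In practice matvec would wrap an efficient sparse product (the operation whose implementation the paper studies), and Davidson's method adds a preconditioned correction at each step instead of the fixed three-term recurrence.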