On the Optimal Solution of Large Linear Systems
The information-based study of the optimal solution of large linear systems is initiated by studying the case of Krylov information. Among the algorithms which use Krylov information are the minimal residual, conjugate gradient, Chebyshev, and successive approximation algorithms. A "sharp" lower bound on the number of matrix-vector multiplications required to compute an ε-approximation is obtained for any orthogonally invariant class of matrices. Examples of such classes include many of practical interest, such as symmetric matrices, symmetric positive definite matrices, and matrices with bounded condition number. It is shown that the minimal residual algorithm is within at most one matrix-vector multiplication of the lower bound. A similar result is obtained for the generalized minimal residual algorithm. The lower bound is computed for certain classes of orthogonally invariant matrices. We show how the lack of certain properties (symmetry, positive definiteness) increases the lower bound. A conjecture and a number of open problems are stated.
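As a concrete illustration of the complexity measure in this abstract (not the paper's own construction), the sketch below runs a textbook conjugate gradient method, one of the Krylov-information algorithms named above, on a symmetric positive definite system and counts the matrix-vector multiplications needed to reach an ε-approximation of the residual. All names, tolerances, and test parameters are illustrative.

```python
import numpy as np

def conjugate_gradient(A, b, eps=1e-8, max_matvecs=1000):
    """Solve A x = b for symmetric positive definite A, counting the
    matrix-vector multiplications used to reach an eps-approximation
    (relative residual below eps)."""
    x = np.zeros_like(b)
    r = b.copy()               # residual for x = 0, no matvec needed
    p = r.copy()
    matvecs = 0
    while np.linalg.norm(r) > eps * np.linalg.norm(b) and matvecs < max_matvecs:
        Ap = A @ p             # the single matvec per iteration
        matvecs += 1
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x, matvecs

# Example: SPD matrix with bounded condition number (kappa = 10)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))
A = Q @ np.diag(np.linspace(1.0, 10.0, 100)) @ Q.T
b = rng.standard_normal(100)
x, n = conjugate_gradient(A, b)
print(f"eps-approximation reached after {n} matrix-vector products")
```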
Lightweight Diffusion Layer from the kth Root of the MDS Matrix
The Maximum Distance Separable (MDS) mapping used in cryptography deploys complex Galois field multiplications, which consume a lot of area in hardware, making it a costly primitive for lightweight cryptography. Recently, in the lightweight hash function PHOTON, a matrix denoted ‘Serial’, which requires less area for multiplication, was multiplied 4 times to achieve a lightweight MDS mapping. But no efficient method has been proposed so far to synthesize such a serial matrix or to find the number of repetitive multiplications needed for a given MDS mapping. In this paper, we first provide a generic algorithm to find a low-cost matrix which can be multiplied k times to obtain a given MDS mapping. Further, we optimize the algorithm for use in cryptography and show an explicit case study on the MDS mapping of the hash function PHOTON to obtain the ‘Serial’ matrix. The work also presents quite a few results which may be interesting for lightweight implementations.
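To make the construction concrete: a ‘serial’ matrix is a companion-style matrix that is cheap in hardware, and raising it to the kth power yields the full MDS mapping. The Python sketch below is illustrative, not the paper's synthesis algorithm; the field (GF(2^4) with reduction polynomial x^4 + x + 1) and the coefficients Serial(1, 2, 1, 4), an example often cited in the PHOTON line of work, are assumptions that should be checked against the paper. It multiplies the serial matrix 4 times and brute-force checks the branch number of the result.

```python
from functools import reduce
from itertools import product

MOD = 0b10011  # x^4 + x + 1, the assumed GF(2^4) reduction polynomial

def gf_mul(a, b):
    """Multiply two elements of GF(2^4), reducing by MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def serial(z):
    """Companion-style 'Serial' matrix: shifted identity, last row z."""
    n = len(z)
    M = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]
    M[n - 1] = list(z)
    return M

def mat_mul(X, Y):
    n = len(X)
    return [[reduce(lambda s, k: s ^ gf_mul(X[i][k], Y[k][j]), range(n), 0)
             for j in range(n)] for i in range(n)]

def is_mds(M):
    """Brute-force check: an n x n matrix over GF(2^4) is MDS iff its
    branch number, min over nonzero x of wt(x) + wt(Mx), equals n + 1."""
    n = len(M)
    for x in product(range(16), repeat=n):
        if not any(x):
            continue
        y = [reduce(lambda s, k: s ^ gf_mul(M[i][k], x[k]), range(n), 0)
             for i in range(n)]
        if sum(v != 0 for v in x) + sum(v != 0 for v in y) < n + 1:
            return False
    return True

A = serial([1, 2, 1, 4])   # illustrative coefficients
M = A
for _ in range(3):         # k = 4: multiply the serial matrix 4 times
    M = mat_mul(M, A)
print("A^4 =", [[hex(v) for v in row] for row in M])
print("MDS?", is_mds(M))
```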
New Matrix Series Formulae for Matrix Exponentials and for the Solution of Linear Systems of Algebraic Equations
The solution of certain differential equations is expressed using a special type of matrix series and is directly related to the solution of general systems of algebraic equations. Efficient formulae for matrix exponentials are derived in terms of rapidly convergent series of the same type. They are essential for two new solution methods, especially beneficial for large linear systems, namely an iterative method and a method based on an exact matrix product formula. The computational complexity of these two methods is analysed, and for both of them, the number of matrix exponential-vector multiplications required for an imposed accuracy can be predetermined in terms of the system condition. The total number of arithmetic operations involved is roughly proportional to n^2, where n is the matrix dimension. The common feature of all the series in the results presented is that, starting with a first term that is already well-conditioned, each subsequent term is computed by multiplication with an even better-conditioned matrix, tending quickly to the identity matrix. This contributes substantially to the stability of the numerical computation. A very efficient method based on the numerical integration of a special kind of differential equation, applicable even to ill-conditioned systems, is also presented.
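The paper's specific series are not reproduced in the abstract; the sketch below only illustrates the underlying connection it exploits, namely that for a positive definite matrix A the solution of A x = b equals the integral of e^{-At} b over [0, ∞), so the system can be solved by repeated matrix exponential-vector multiplications with a single fixed factor e^{-hA}. Step size, horizon, and the test matrix are illustrative.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)        # well-conditioned SPD test matrix
b = rng.standard_normal(n)

# For positive definite A:  A^{-1} b = integral_0^inf e^{-A t} b dt.
# Integrate with the trapezoidal rule; each step costs one
# matrix exponential-vector multiplication with the fixed factor E.
h = 0.01
E = expm(-h * A)                   # computed once, reused every step
v = b.copy()
x = np.zeros(n)
for _ in range(4000):              # integrate up to t = 40
    v_next = E @ v
    x += 0.5 * h * (v + v_next)
    v = v_next

print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```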
NVIDIA Tensor Core Programmability, Performance & Precision
The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called the "Tensor Core", that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to programming NVIDIA Tensor Cores, their performance, and the precision loss due to computation in mixed precision. Currently, NVIDIA provides three different ways of programming matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply Accumulate (WMMA) API; CUTLASS, a templated library based on WMMA; and cuBLAS GEMM. After experimenting with different approaches, we found that NVIDIA Tensor Cores can deliver up to 83 Tflops/s in mixed precision on a Tesla V100 GPU, seven and three times the performance in single and half precision respectively. A WMMA implementation of batched GEMM reaches a performance of 4 Tflops/s. While precision loss due to matrix multiplication with half-precision input might be critical in many HPC applications, it can be considerably reduced at the cost of increased computation. Our results indicate that HPC applications using matrix multiplications can strongly benefit from using NVIDIA Tensor Cores.
Comment: This paper has been accepted by the Eighth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018.
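The paper's own refinement technique is not spelled out in this abstract; the NumPy sketch below (CPU-only, illustrative) models Tensor Core arithmetic as float16 inputs with float32 accumulation and applies one residual-correction pass in that spirit, trading roughly three times the multiplications for a much smaller error. The helper tc_gemm and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 256
A = rng.standard_normal((m, m))
B = rng.standard_normal((m, m))
ref = A @ B                        # float64 reference result

def tc_gemm(X, Y):
    """Model of Tensor Core arithmetic: round inputs to float16,
    then multiply and accumulate in float32 (hypothetical helper)."""
    return X.astype(np.float16).astype(np.float32) @ \
           Y.astype(np.float16).astype(np.float32)

plain = tc_gemm(A, B)

# One refinement pass: carry the float16 rounding residuals of A and B
# through two extra half-precision products (A*B ~ A16*B16 + dA*B16 + A16*dB).
A16, B16 = A.astype(np.float16), B.astype(np.float16)
dA, dB = A - A16, B - B16          # rounding errors of the float16 cast
refined = plain + tc_gemm(dA, B16) + tc_gemm(A16, dB)

rel = lambda X: np.linalg.norm(X - ref) / np.linalg.norm(ref)
print(f"fp16-input GEMM error: {rel(plain):.1e}")
print(f"after refinement:      {rel(refined):.1e}")
```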