
    Lightweight Diffusion Layer from the k-th Root of the MDS Matrix

    The Maximum Distance Separable (MDS) mapping used in cryptography relies on complex Galois field multiplications, which consume a large amount of hardware area, making it a costly primitive for lightweight cryptography. Recently, in the lightweight hash function PHOTON, a matrix denoted 'Serial', which requires less area per multiplication, was multiplied 4 times to obtain a lightweight MDS mapping. However, no efficient method has been proposed so far to synthesize such a serial matrix, or to determine the number of repeated multiplications needed for a given MDS mapping. In this paper, we first provide a generic algorithm to find a low-cost matrix that can be multiplied k times to obtain a given MDS mapping. We then optimize the algorithm for use in cryptography and present an explicit case study on the MDS mapping of the hash function PHOTON to obtain the 'Serial' matrix. The work also presents several results that may be of interest for lightweight implementations.
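
As a rough illustration of the idea described above, the sketch below builds a companion-style 'Serial' matrix and multiplies it by itself k times over GF(2^4). The coefficients (1, 2, 1, 4), the reducing polynomial x^4 + x + 1, and the helper names are assumptions made for illustration; this is not the paper's synthesis algorithm.

```python
# Hedged sketch: multiplying a lightweight "Serial" matrix k times over GF(2^4)
# to obtain a candidate MDS-style diffusion matrix, in the spirit of PHOTON.
# Coefficients (1, 2, 1, 4) and the polynomial x^4 + x + 1 are assumptions.

POLY = 0b10011  # reducing polynomial x^4 + x + 1 for GF(2^4)

def gf_mul(a, b):
    """Multiply two elements of GF(2^4)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= POLY
        b >>= 1
    return r

def mat_mul(A, B):
    """Matrix product over GF(2^4); addition is XOR."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for t in range(n):
                acc ^= gf_mul(A[i][t], B[t][j])
            C[i][j] = acc
    return C

def serial(coeffs):
    """Companion-style 'Serial' matrix: shifted identity with coeffs as last row."""
    n = len(coeffs)
    M = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    M.append(list(coeffs))
    return M

A = serial([1, 2, 1, 4])   # cheap matrix: only the last row carries field multipliers
M = A
for _ in range(3):         # A^4, i.e. k = 4 repeated multiplications
    M = mat_mul(M, A)
for row in M:
    print(row)             # candidate MDS diffusion matrix
```

The hardware saving comes from the structure of the serial matrix: only its last row requires non-trivial field multiplications, and the full diffusion layer is recovered by applying it k times.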

    New Matrix Series Formulae for Matrix Exponentials and for the Solution of Linear Systems of Algebraic Equations

    The solution of certain differential equations is expressed using a special type of matrix series and is directly related to the solution of general systems of algebraic equations. Efficient formulae for matrix exponentials are derived in terms of rapidly convergent series of the same type. They are essential for two new solution methods, especially beneficial for large linear systems, namely an iterative method and a method based on an exact matrix product formula. The computational complexity of these two methods is analysed, and for both of them the number of matrix exponential-vector multiplications required for an imposed accuracy can be predetermined in terms of the system condition. The total number of arithmetic operations involved is roughly proportional to n^2, where n is the matrix dimension. The common feature of all the series in the results presented is that, starting with a first term that is already well-conditioned, each subsequent term is computed by multiplication with an even better conditioned matrix, tending quickly to the identity matrix. This contributes substantially to the stability of the numerical computation. A very efficient method based on the numerical integration of a special kind of differential equation, applicable even to ill-conditioned systems, is also presented.
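
The cost measure in the abstract is the number of matrix exponential-vector multiplications. As a generic illustration of that building block only (not the paper's series, which are not reproduced here), a truncated Taylor series applied to a vector costs one matrix-vector product per added term, i.e. O(n^2) operations per term:

```python
# Illustrative sketch only: approximating exp(A) @ v with a truncated Taylor
# series; each added term costs one matrix-vector product (O(n^2) operations).
import numpy as np

def expm_vec(A, v, terms=20):
    """Approximate exp(A) @ v with `terms` Taylor terms."""
    result = v.astype(float).copy()
    term = v.astype(float).copy()
    for k in range(1, terms):
        term = A @ term / k       # next term: A^k v / k!
        result += term
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # small example matrix
v = np.array([1.0, 0.0])
print(expm_vec(A, v))                      # close to [cos(1), -sin(1)]
```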

    NVIDIA Tensor Core Programmability, Performance & Precision

    The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called the "Tensor Core", that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to programming NVIDIA Tensor Cores, their performance, and the precision loss due to computation in mixed precision. Currently, NVIDIA provides three different ways of programming matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply Accumulate (WMMA) API, CUTLASS (a templated library based on WMMA), and cuBLAS GEMM. After experimenting with the different approaches, we found that NVIDIA Tensor Cores can deliver up to 83 Tflops/s in mixed precision on a Tesla V100 GPU, seven and three times the performance in single and half precision respectively. A WMMA implementation of batched GEMM reaches a performance of 4 Tflops/s. While the precision loss due to matrix multiplication with half-precision input might be critical in many HPC applications, it can be considerably reduced at the cost of increased computation. Our results indicate that HPC applications using matrix multiplications can strongly benefit from the use of NVIDIA Tensor Cores.
    Comment: This paper has been accepted by the Eighth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 201
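
As a rough numerical illustration of the precision trade-off discussed in the abstract (a plain NumPy sketch on the CPU, not Tensor Core code; the sizes, data, and compensation scheme are assumptions, not the paper's method), one can compare a product with half-precision inputs and single-precision accumulation against a double-precision reference, and then reduce the loss at extra cost by also multiplying the half-precision rounding residuals:

```python
# Sketch of mixed-precision error and a generic residual-compensation idea
# (not the paper's exact scheme). Runs on the CPU with NumPy.
import numpy as np

rng = np.random.default_rng(0)
n = 256
A64 = rng.standard_normal((n, n))
B64 = rng.standard_normal((n, n))

# Tensor-Core-style mixed precision: float16 inputs, float32 accumulation.
A16, B16 = A64.astype(np.float16), B64.astype(np.float16)
C_mixed = A16.astype(np.float32) @ B16.astype(np.float32)

C_ref = A64 @ B64
err = np.max(np.abs(C_mixed - C_ref)) / np.max(np.abs(C_ref))
print(f"mixed-precision relative error: {err:.2e}")

# Reducing the loss at extra cost: also multiply the float16 rounding
# residuals of the inputs (two additional products).
A_hi, B_hi = A16.astype(np.float32), B16.astype(np.float32)
A_lo = (A64 - A_hi).astype(np.float16).astype(np.float32)
B_lo = (B64 - B_hi).astype(np.float16).astype(np.float32)
C_comp = A_hi @ B_hi + A_hi @ B_lo + A_lo @ B_hi
err2 = np.max(np.abs(C_comp - C_ref)) / np.max(np.abs(C_ref))
print(f"with residual compensation:    {err2:.2e}")
```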