
    Radix Conversion for IEEE754-2008 Mixed Radix Floating-Point Arithmetic

    Conversion between binary and decimal floating-point representations is ubiquitous. Floating-point radix conversion means converting both the exponent and the mantissa. We develop an atomic operation for floating-point radix conversion with a simple straight-line algorithm, suitable for hardware design. Exponent conversion is performed with a small multiplication and a lookup table; it yields the correct result without error. Mantissa conversion uses a few multiplications and a small lookup table that is shared amongst all types of conversions. The accuracy is controlled by adjusting the computing precision.
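    The exponent step lends itself to a compact illustration. The sketch below shows the classical fixed-point trick for computing floor(E * log10(2)) with one small integer multiplication and a shift; the 32-bit constant and the verification range are our illustrative choices, not the paper's proven algorithm or its error bounds.

    ```python
    import math

    LOG10_2_Q32 = 1292913986            # floor(log10(2) * 2**32)

    def decimal_exponent(E: int) -> int:
        """F = floor(E * log10(2)), i.e. the F with 10**F <= 2**E < 10**(F+1)."""
        return (E * LOG10_2_Q32) >> 32  # Python's >> floors, also for negative E

    # The constant is accurate enough that the floor is exact across the whole
    # binary64 exponent range:
    for E in range(-1074, 1024):
        assert decimal_exponent(E) == math.floor(E * math.log10(2))
    ```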

    Secure Numerical and Logical Multi Party Operations

    We derive algorithms for efficient secure numerical and logical operations using a recently introduced scheme for secure multi-party computation [sch15] in the semi-honest model, ensuring statistical or perfect security. To derive our algorithms for trigonometric functions, we use basic mathematical laws in combination with properties of the additive encryption scheme in a novel way. For division and logarithm we use a new approach to compute a Taylor series at a fixed point for all numbers. All our logical operations, such as comparisons and large fan-in AND gates, are perfectly secure. Our empirical evaluation yields speed-ups of more than a factor of 100 for the evaluated operations compared to the state of the art.
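    To make the additive setting concrete, here is a minimal sketch of additive secret sharing over the integers mod Q, the kind of scheme such protocols build on. The modulus, party count, and helper names are our illustrative choices; the scheme of [sch15] differs in its details.

    ```python
    import secrets

    Q = 2**61 - 1   # public modulus (hypothetical choice)

    def share(x: int, parties: int) -> list[int]:
        """Split x into shares that are individually uniform and sum to x mod Q."""
        parts = [secrets.randbelow(Q) for _ in range(parties - 1)]
        parts.append((x - sum(parts)) % Q)
        return parts

    def reconstruct(shares: list[int]) -> int:
        return sum(shares) % Q

    # Addition of shared values is local (each party adds its own shares), so no
    # messages are exchanged and the operation leaks nothing.
    a, b = 1234, 5678
    sa, sb = share(a, 3), share(b, 3)
    sc = [(x + y) % Q for x, y in zip(sa, sb)]
    assert reconstruct(sc) == (a + b) % Q
    ```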

    Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations

    Although double-precision floating-point arithmetic currently dominates high-performance computing, there is increasing interest in smaller and simpler arithmetic types. The main reasons are potential improvements in energy efficiency and in memory footprint and bandwidth. However, simply switching to lower-precision types typically results in increased numerical errors. We investigate approaches to improving the accuracy of reduced-precision fixed-point arithmetic types, using examples in an important domain for numerical computation in neuroscience: the solution of Ordinary Differential Equations (ODEs). The Izhikevich neuron model is used to demonstrate that rounding has an important role in producing accurate spike timings from explicit ODE solution algorithms. In particular, fixed-point arithmetic with stochastic rounding consistently results in smaller errors than single-precision floating-point and fixed-point arithmetic with round-to-nearest, across a range of neuron behaviours and ODE solvers. A computationally much cheaper alternative is also investigated, inspired by the concept of dither, a widely understood mechanism for providing resolution below the least significant bit (LSB) in digital signal processing. These results will have implications for the solution of ODEs in other subject areas, and should also be directly relevant to the huge range of practical problems that are represented by Partial Differential Equations (PDEs).
    Comment: Submitted to Philosophical Transactions of the Royal Society
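    The rounding rule at the heart of this comparison is simple enough to sketch. Below is a toy stochastic-rounding function for a fixed-point grid; the function name and the choice of 15 fractional bits are ours, and the snippet does not reproduce the paper's solvers or fixed-point formats.

    ```python
    import math
    import random

    def stochastic_round(x: float, frac_bits: int = 15) -> float:
        """Round x to a multiple of 2**-frac_bits; the probability of rounding
        up equals the distance to the lower grid point, so E[result] = x."""
        scale = 1 << frac_bits
        y = x * scale
        lo = math.floor(y)
        p = y - lo                          # fractional part, in [0, 1)
        return (lo + (random.random() < p)) / scale

    # Round-to-nearest always maps 1/3 to the same grid point; stochastic
    # rounding is unbiased, so the sample mean recovers the true value:
    vals = [stochastic_round(1 / 3) for _ in range(100_000)]
    print(sum(vals) / len(vals))            # ~0.33333, despite the 2**-15 grid
    ```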

    Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach

    The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR) is among the fastest methods. Although fast, the solvers based on MRRR do not deliver the same accuracy as competing methods like Divide & Conquer or the QR algorithm. In this paper, we demonstrate that the use of mixed precision leads to improved accuracy of MRRR-based eigensolvers with limited or no performance penalty. As a result, we obtain eigensolvers that are not only as accurate as or more accurate than the best available methods, but also, in most circumstances, faster and more scalable than the competition.
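    The accuracy gap that motivates the paper is easy to observe. The sketch below solves the same tridiagonal eigenproblem with LAPACK's MRRR driver (?stemr) in single and in double precision and measures the departure from orthogonality of the computed eigenvectors; it assumes SciPy dispatches to sstemr/dstemr based on the input dtype, and it is a baseline measurement, not the authors' mixed-precision solver.

    ```python
    import numpy as np
    from scipy.linalg import eigh_tridiagonal

    rng = np.random.default_rng(0)
    n = 500
    d, e = rng.standard_normal(n), rng.standard_normal(n - 1)

    for dtype in (np.float32, np.float64):
        # MRRR via LAPACK ?stemr on the diagonal d and off-diagonal e
        w, v = eigh_tridiagonal(d.astype(dtype), e.astype(dtype),
                                lapack_driver='stemr')
        ortho = np.linalg.norm(v.T @ v - np.eye(n))   # ideally ~ n * eps
        print(dtype.__name__, ortho)
    ```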

    NVIDIA Tensor Core Programmability, Performance & Precision

    The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called the "Tensor Core", that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflop/s in mixed precision. In this paper, we investigate current approaches to programming NVIDIA Tensor Cores, their performance, and the precision loss due to computation in mixed precision. Currently, NVIDIA provides three different ways of programming matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply Accumulate (WMMA) API; CUTLASS, a templated library based on WMMA; and cuBLAS GEMM. After experimenting with different approaches, we found that NVIDIA Tensor Cores can deliver up to 83 Tflop/s in mixed precision on a Tesla V100 GPU, seven and three times the performance in single and half precision respectively. A WMMA implementation of batched GEMM reaches a performance of 4 Tflop/s. While the precision loss due to matrix multiplication with half-precision inputs might be critical in many HPC applications, it can be considerably reduced at the cost of increased computation. Our results indicate that HPC applications using matrix multiplications can strongly benefit from using NVIDIA Tensor Cores.
    Comment: This paper has been accepted by the Eighth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018
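    The final remark, that the input-rounding loss can be traded for extra computation, can be illustrated without Tensor Core hardware. The numpy sketch below emulates half-precision inputs with float32 accumulation, then recovers most of the lost input precision by also multiplying the float16 residuals; this is our emulation in the spirit of the abstract's remark, not the paper's measured GPU kernels.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((256, 256)).astype(np.float32)
    B = rng.standard_normal((256, 256)).astype(np.float32)

    f32 = lambda M: M.astype(np.float32)
    A16, B16 = A.astype(np.float16), B.astype(np.float16)
    dA = (A - f32(A16)).astype(np.float16)    # residual lost by the half cast
    dB = (B - f32(B16)).astype(np.float16)

    plain = f32(A16) @ f32(B16)               # 1 product, half-precision inputs
    refined = plain + f32(A16) @ f32(dB) + f32(dA) @ f32(B16)   # 3 products

    exact = A.astype(np.float64) @ B.astype(np.float64)
    for name, P in (("half inputs", plain), ("refined    ", refined)):
        print(name, np.max(np.abs(P - exact)) / np.max(np.abs(exact)))
    ```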

    Computational Complexity of Iterated Maps on the Interval (Extended Abstract)

    The exact computation of orbits of discrete dynamical systems on the interval is considered. To this end, a multiple-precision floating-point approach based on error analysis is chosen and a general algorithm is presented. The correctness of the algorithm is shown and the computational complexity is analyzed. As a main result, the computational complexity measure considered here is related to the Lyapunov exponent of the dynamical system under consideration.
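    The precision bookkeeping behind this link can be sketched with an off-the-shelf multiple-precision library (mpmath, our choice). For the logistic map at r = 4, |f'(x)| <= 4 bounds the worst-case loss at 2 bits per iterate, while the average loss is the Lyapunov exponent log 2, which is exactly the relation stated above; the margins below are crude stand-ins for the paper's error analysis.

    ```python
    from mpmath import mp, mpf

    def orbit_point(x0: str, n: int, digits: int = 15):
        """n-th iterate of x -> 4x(1-x), with the leading `digits` reliable."""
        mp.prec = 2 * n + 4 * digits + 53   # 2 bits/step worst case, plus slack
        x = mpf(x0)
        for _ in range(n):
            x = 4 * x * (1 - x)
        return x

    # After 200 chaotic steps the leading ~15 digits are still trustworthy,
    # but only because the working precision grew linearly with n.
    print(orbit_point("0.1", 200))
    ```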