101 research outputs found
A note on structured pseudospectra
In this note, we study the notion of structured pseudospectra. We prove that for Toeplitz, circulant, Hankel and symmetric structures, the structured pseudospectrum equals the unstructured pseudospectrum. We show that this is false for Hermitian and skew-Hermitian structures. We generalize the result to pseudospectra of matrix polynomials. Indeed, we prove that the structured pseudospectrum equals the unstructured pseudospectrum for matrix polynomials with Toeplitz, circulant, Hankel and symmetric structures. We conclude by giving a formula for structured pseudospectra of real matrix polynomials. The particular type of perturbation used for these pseudospectra arises in control theory.
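For reference, the standard definitions at play (not specific to this paper; S denotes the structure class, e.g. Toeplitz, circulant, Hankel or symmetric matrices):

    \Lambda_\varepsilon(A)   = \{ z \in \mathbb{C} : z \in \Lambda(A + \Delta) \text{ for some } \Delta,\ \|\Delta\| \le \varepsilon \}
    \Lambda_\varepsilon^S(A) = \{ z \in \mathbb{C} : z \in \Lambda(A + \Delta) \text{ for some } \Delta \in S,\ \|\Delta\| \le \varepsilon \}

Since structured perturbations are a subset of all perturbations, \Lambda_\varepsilon^S(A) \subseteq \Lambda_\varepsilon(A) always holds; the paper's result is that equality holds for the four structures named above.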
Compensated Horner Scheme
Using error-free transformations, we improve the classic Horner Scheme (HS) for evaluating (univariate) polynomials in floating-point arithmetic. We prove that this Compensated Horner Scheme (CHS) is as accurate as HS performed with twice the working precision. Theoretical analysis and experiments exhibit a reasonable running-time overhead that also compares favorably with double-double implementations. We introduce a dynamic and validated error bound for the value computed by CHS. The talk presents these results together with a survey of error-free transformations and the related hypotheses.
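A minimal C sketch of the scheme the abstract describes (the function names comp_horner, two_sum and two_prod are ours, and two_prod assumes a correctly rounded fma is available):

    #include <math.h>

    /* TwoSum (Knuth): returns s with a + b == s + *e exactly. */
    static double two_sum(double a, double b, double *e) {
        double s = a + b;
        double t = s - a;
        *e = (a - (s - t)) + (b - t);
        return s;
    }

    /* TwoProd: returns p with a * b == p + *e exactly (needs an FMA). */
    static double two_prod(double a, double b, double *e) {
        double p = a * b;
        *e = fma(a, b, -p);
        return p;
    }

    /* Compensated Horner: evaluates a[n]*x^n + ... + a[1]*x + a[0]. */
    double comp_horner(const double *a, int n, double x) {
        double r = a[n];   /* running Horner value               */
        double c = 0.0;    /* running sum of the local roundings */
        for (int i = n - 1; i >= 0; --i) {
            double pi, sigma;
            double p = two_prod(r, x, &pi);   /* r*x      = p + pi    */
            r = two_sum(p, a[i], &sigma);     /* p + a[i] = r + sigma */
            c = c * x + (pi + sigma);         /* Horner on the errors */
        }
        return r + c;      /* final compensation */
    }

The rounding errors pi and sigma are themselves propagated with a Horner recursion and added back at the end, which is what yields accuracy equivalent to HS in twice the working precision.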
Algorithms for Accurate, Validated and Fast Polynomial Evaluation
We survey a class of algorithms for evaluating polynomials with floating-point coefficients, with computations performed in IEEE-754 floating-point arithmetic. The principle is to apply, once or recursively, an error-free transformation of the Horner polynomial evaluation and to accurately sum the final decomposition. These compensated algorithms are as accurate as the Horner algorithm performed in K times the working precision, for an arbitrary integer K. We prove this accuracy property with an a priori error analysis. We also provide validated dynamic bounds and apply these results to compute a faithfully rounded evaluation. These compensated algorithms are fast. We illustrate their practical efficiency with numerical experiments on significant environments. Compared with existing alternatives, these K-times compensated algorithms are competitive for K up to 4, i.e., up to 212 mantissa bits.
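The "accurately sum the final decomposition" step can be done, for instance, with the SumK algorithm of Ogita, Rump and Oishi; a sketch in C, reusing two_sum from the compensated Horner block above (illustrative, not necessarily the paper's exact choice):

    /* One VecSum pass: an error-free transform that leaves the exact sum
       of p[0..n-1] unchanged while concentrating its value in p[n-1]. */
    static void vec_sum(double *p, int n) {
        for (int i = 1; i < n; ++i) {
            double e;
            p[i] = two_sum(p[i], p[i - 1], &e);
            p[i - 1] = e;   /* keep the rounding error in place */
        }
    }

    /* SumK: result as accurate as a sum in K-fold working precision. */
    double sum_k(double *p, int n, int K) {
        for (int k = 0; k < K - 1; ++k)
            vec_sum(p, n);
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += p[i];      /* ordinary recursive summation at the end */
        return s;
    }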
On the maximum relative error when computing integer powers by iterated multiplications in floating-point arithmetic
We improve the usual relative error bound for the computation of x^n through iterated multiplications by x in binary floating-point arithmetic. The obtained error bound is only slightly better than the usual one, but it is simpler. We also discuss the more general problem of computing the product of n terms.
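For reference, the algorithm under analysis is just the naive loop below (the function name is ours). The classical bound on its relative error is gamma_{n-1} = (n-1)u / (1 - (n-1)u), with u the unit roundoff; the abstract's point is that this can be replaced by a simpler expression of about the same size.

    /* x^n by n-1 iterated multiplications (assumes n >= 1); each
       product commits one rounding error bounded by the unit roundoff. */
    double pow_iter(double x, unsigned n) {
        double r = x;
        for (unsigned i = 1; i < n; ++i)
            r *= x;
        return r;
    }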
General Framework for Deriving Reproducible Krylov Subspace Algorithms: BiCGStab Case
Parallel implementations of Krylov subspace algorithms often help to accelerate the solution of a linear system. On the other hand, such parallelization, coupled with asynchronous and out-of-order execution, amplifies the impact of the non-associativity of floating-point operations. This results in non-reproducibility on the same or on different settings. This paper proposes a general framework for deriving reproducible and accurate variants of a Krylov subspace algorithm. The proposed algorithmic strategies are reinforced by programmability suggestions to assure deterministic and accurate executions. The framework is illustrated on the preconditioned BiCGStab method for the solution of non-symmetric linear systems with message-passing. Finally, we verify the two reproducible variants of PBiCGStab on a set of matrices from the SuiteSparse Matrix Collection and a 3D Poisson's equation.
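The non-reproducibility issue boils down to the non-associativity mentioned above; a self-contained C illustration (not from the paper): a reduction whose operation order changes between runs changes its result.

    #include <stdio.h>

    int main(void) {
        double a = 0.1, b = 0.2, c = 0.3;
        /* The two summation orders round differently, so a parallel
           reduction whose order varies between runs varies too. */
        printf("%.17g\n", (a + b) + c);   /* prints 0.60000000000000009 */
        printf("%.17g\n", a + (b + c));   /* prints 0.59999999999999998 */
        return 0;
    }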
Reproducible Triangular Solvers for High-Performance Computing
On modern parallel architectures, floating-point computations may become non-deterministic, and therefore non-reproducible, mainly due to the non-associativity of floating-point operations. We propose an algorithm to solve dense triangular systems by leveraging the standard parallel triangular solver and our recently introduced multi-level exact summation approach. Finally, we present implementations of the proposed fast reproducible triangular solver and results on recent NVIDIA GPUs.
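The paper's multi-level exact summation targets GPUs; as a much simpler illustration of where summation accuracy enters a triangular solve, here is a sequential back substitution whose inner dot product is compensated (this is our sketch using two_sum and the FMA-based exact product from the Horner block above, not the paper's algorithm):

    #include <math.h>

    /* Solve U x = b for an upper-triangular, row-major n-by-n matrix U,
       compensating the dot product accumulated at each substitution step. */
    void trsv_upper(int n, const double *U, const double *b, double *x) {
        for (int i = n - 1; i >= 0; --i) {
            double hi = b[i], lo = 0.0;
            for (int j = i + 1; j < n; ++j) {
                double p  = -U[i*n + j] * x[j];
                double ep = fma(-U[i*n + j], x[j], -p); /* product error */
                double es;
                hi = two_sum(hi, p, &es);               /* sum error     */
                lo += ep + es;                          /* compensation  */
            }
            x[i] = (hi + lo) / U[i*n + i];
        }
    }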
A Reproducible Accurate Summation Algorithm for High-Performance Computing
Floating-point (FP) addition is non-associative, and parallel reduction involving this operation is a serious issue, as noted in the DARPA Exascale Report [1]. Such large summations typically appear within fundamental numerical blocks such as dot products or numerical integrations. Hence, the result may vary from one parallel machine to another, or even from one run to another. These discrepancies worsen on heterogeneous architectures, such as clusters with GPUs or Intel Xeon Phi processors, which combine programming environments that may obey various floating-point models and offer different intermediate precisions or different operators [2,3]. Such non-determinism of floating-point calculations in parallel programs causes validation and debugging issues, and may even lead to deadlocks [4].

The increasing power of current computers enables one to solve more and more complex problems. That, consequently, leads to a higher number of floating-point operations to be performed, each of them potentially causing a round-off error. Because of round-off error propagation, some problems must be solved with a wider floating-point format. Two approaches exist to perform floating-point addition without incurring round-off errors.

The first approach aims at computing the error that occurs during rounding, using FP expansions, which are based on an error-free transformation. FP expansions represent the result as an unevaluated sum of a fixed number of FP numbers, whose components are ordered in magnitude with minimal overlap to cover a wide range of exponents. FP expansions of sizes 2 and 4 are presented in [5] and [6], respectively. The main advantage of this solution is that the expansion can stay in registers during the computations. However, the accuracy is insufficient for the summation of numerous FP numbers or for sums with a huge dynamic range. Moreover, the complexity grows linearly with the size of the expansion.

An alternative approach exploits the finite range of representable floating-point numbers by storing every bit in a very long vector of bits (an accumulator). The length of the accumulator is chosen such that every bit of information of the input format can be represented; this covers the range from the minimum representable floating-point value to the maximum value, independently of the sign. For instance, Kulisch [7] proposed to use an accumulator of 4288 bits to handle the accumulation of products of 64-bit IEEE floating-point values. The Kulisch accumulator produces the exact result for very large sums of floating-point numbers of arbitrary magnitude. However, for a long period this approach was considered impractical, as it induces a very large memory overhead. Furthermore, without dedicated hardware support, its performance is limited by indirect memory accesses that make vectorization challenging.

We aim at addressing both issues of accuracy and reproducibility in the context of summation. We advocate computing the correctly rounded result of the exact sum. Besides offering strict reproducibility through an unambiguous definition of the expected result, our approach guarantees that the result ha
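A size-2 FP expansion, the first approach mentioned above [5], amounts to carrying a compensation term alongside the running sum; a minimal C sketch (again reusing two_sum from the Horner block, function name ours):

    /* Accumulate x into the unevaluated sum hi + lo. two_sum makes each
       step error-free, and lo collects the rounding errors, so (hi, lo)
       is a size-2 floating-point expansion of the running sum. */
    static void expansion2_add(double *hi, double *lo, double x) {
        double e;
        *hi = two_sum(*hi, x, &e);
        *lo += e;
    }

The Kulisch accumulator [7] avoids rounding altogether instead, at the price of the wide fixed-point accumulator discussed above.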
Numerical validation in quadruple precision using stochastic arithmetic
Discrete Stochastic Arithmetic (DSA) enables one to estimate rounding errors and to detect numerical instabilities in simulation programs. DSA is implemented in the CADNA library, which can analyze the numerical quality of single and double precision programs. In this article, we show how the CADNA library has been improved to enable the estimation of rounding errors in programs using quadruple precision floating-point variables, i.e. having a 113-bit mantissa length. Although an implementation of DSA called SAM exists for arbitrary precision programs, a significant performance improvement has been obtained with CADNA compared to SAM for the numerical validation of programs with 113-bit mantissa variables. This new version of CADNA has been successfully used for the control of accuracy in quadruple precision applications, such as a chaotic sequence and the computation of multiple roots of polynomials. We also describe a new version of the PROMISE tool, based on CADNA, that aims at reducing the number of double precision variable declarations in numerical programs in favor of single precision ones, taking into account a requested accuracy of the results. The new version of PROMISE can now provide type declarations mixing single, double and quadruple precision.
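DSA builds on the CESTAC method: each result is computed a few times with randomly perturbed rounding, and the number of exact significant digits is estimated from the spread of the samples. A deliberately crude, self-contained C sketch of the idea (CADNA's real operators and its Student-test-based estimate are more careful; the function names are ours):

    #include <math.h>
    #include <stdlib.h>

    /* Randomly perturb the last bit of x, mimicking random rounding. */
    static double random_round(double x) {
        if (rand() & 1)
            return x;
        return nextafter(x, (rand() & 1) ? HUGE_VAL : -HUGE_VAL);
    }

    /* Estimate the common significant decimal digits of three samples
       (simplified: the Student factor of the real CESTAC test is omitted). */
    static double significant_digits(double s0, double s1, double s2) {
        double mean  = (s0 + s1 + s2) / 3.0;
        double var   = ((s0 - mean) * (s0 - mean) + (s1 - mean) * (s1 - mean)
                      + (s2 - mean) * (s2 - mean)) / 2.0;
        double sigma = sqrt(var);
        if (sigma == 0.0)
            return 15.0;              /* samples agree to full precision */
        return log10(fabs(mean) / sigma);
    }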
- …