
    On the maximum relative error when computing integer powers by iterated multiplications in floating-point arithmetic

    We improve the usual relative error bound for the computation of x^n through iterated multiplications by x in binary floating-point arithmetic. The obtained error bound is only slightly better than the usual one, but it is simpler. We also discuss the more general problem of computing the product of n terms.
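
    For context, the usual bound in question is the classical gamma_{n-1} = (n-1)u/(1-(n-1)u), where u is the unit roundoff. As a hedged illustration (not code from the paper; the function name and test values are mine), here is a minimal Python sketch that computes x^n by iterated multiplications and checks the measured relative error against that classical bound:

        from fractions import Fraction

        def iterated_power(x: float, n: int) -> float:
            # n - 1 rounded multiplications: p <- fl(p * x)
            p = x
            for _ in range(n - 1):
                p = p * x
            return p

        x, n = 1.1, 50
        u = 2.0 ** -53                                 # unit roundoff for binary64
        approx = iterated_power(x, n)
        exact = Fraction(x) ** n                       # exact rational reference value
        rel_err = abs(Fraction(approx) - exact) / exact
        usual_bound = (n - 1) * u / (1 - (n - 1) * u)  # classical gamma_{n-1} bound
        print(float(rel_err) <= usual_bound, float(rel_err), usual_bound)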

    Computing Correctly Rounded Integer Powers in Floating-Point Arithmetic

    We introduce several algorithms for accurately raising a floating-point number to a positive integer power, assuming a fused multiply-add (fma) instruction is available. We aim at always obtaining correctly rounded results in round-to-nearest mode, that is, our algorithms return the floating-point number that is nearest the exact value.
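
    The paper's own algorithms are not reproduced here, but the fma building block such methods typically rest on is the error-free product: with an fma, the rounding error of a multiplication can be recovered exactly. A minimal Python sketch (math.fma requires Python 3.13+; compensated_power is a generic compensated scheme in this spirit, not necessarily the authors' algorithm):

        import math  # math.fma is available from Python 3.13 onwards

        def two_prod_fma(a: float, b: float):
            # Error-free product: p = fl(a * b) and p + e = a * b exactly.
            p = a * b
            e = math.fma(a, b, -p)
            return p, e

        def compensated_power(x: float, n: int) -> float:
            # Carry the rounding error of every multiplication in a second
            # term and fold it back in at the end (first-order compensation).
            p, err = x, 0.0
            for _ in range(n - 1):
                p, e = two_prod_fma(p, x)
                err = err * x + e
            return p + err

        print(compensated_power(1.1, 50))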

    On the use of the Infinity Computer architecture to set up a dynamic precision floating-point arithmetic

    We devise a variable-precision floating-point arithmetic by exploiting the framework provided by the Infinity Computer. This is a computational platform implementing the Infinity Arithmetic system, a positional numeral system that can handle both infinite and infinitesimal quantities, expressed using positive and negative finite or infinite powers of the radix ① (grossone). The computational features offered by the Infinity Computer allow us to dynamically change the accuracy of representation and of floating-point operations during the flow of a computation. When suitably implemented, this possibility turns out to be particularly advantageous when solving ill-conditioned problems: compared with a standard multiple-precision arithmetic, the accuracy is improved only when needed, without greatly affecting the overall computational effort. An illustrative example on the solution of a nonlinear equation is also presented.
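
    The Infinity Computer itself is a dedicated arithmetic framework and cannot be reproduced directly here. As a hedged illustration of the dynamic-precision idea on conventional hardware, the following Python sketch uses the decimal module and raises the working precision only when the current precision stops shrinking the Newton step; all names, tolerances, and the sample equation are illustrative assumptions, not the paper's example:

        from decimal import Decimal, getcontext

        def newton_adaptive(f, df, x0, target=Decimal("1e-40"), max_prec=80):
            # Newton's method that raises the working precision only when
            # the current precision can no longer shrink the step size.
            getcontext().prec = 20
            x = Decimal(x0)
            while getcontext().prec <= max_prec:
                for _ in range(50):
                    step = f(x) / df(x)
                    x -= step
                    if abs(step) <= target:
                        return x
                    if abs(step) <= Decimal(10) ** -(getcontext().prec - 2):
                        break                  # stagnating at this precision
                getcontext().prec *= 2         # refine only when needed
            return x

        # Hypothetical example: solve x^2 - 2 = 0 to roughly 40 digits.
        root = newton_adaptive(lambda x: x * x - 2, lambda x: 2 * x, "1.5")
        print(root)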

    Exploiting structure in floating-point arithmetic

    Invited paper, MACIS 2015 (Sixth International Conference on Mathematical Aspects of Computer and Information Sciences). The analysis of algorithms in IEEE floating-point arithmetic is most often carried out via repeated applications of the so-called standard model, which bounds the relative error of each basic operation by a common epsilon depending only on the format. While this approach has been eminently useful for establishing many accuracy and stability results, it fails to capture most of the low-level features that make floating-point arithmetic so highly structured. In this paper, we survey some of those properties and show how to exploit them in rounding error analysis. In particular, we review some recent improvements of several classical, Wilkinson-style error bounds from linear algebra and complex arithmetic that all rely on such structure properties.
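
    For context, the standard model referred to here asserts that fl(a op b) = (a op b)(1 + d) with |d| <= u for each basic operation. One structural property it misses is that the rounding error of an addition is itself a floating-point number that can be recovered exactly. A minimal Python sketch of Knuth's TwoSum, a standard error-free transformation shown here purely as an illustration rather than as an algorithm from the paper:

        def two_sum(a: float, b: float):
            # Knuth's TwoSum: s = fl(a + b), and s + e = a + b exactly.
            # That e is itself a float is invisible to the standard model.
            s = a + b
            a_virtual = s - b
            b_virtual = s - a_virtual
            e = (a - a_virtual) + (b - b_virtual)
            return s, e

        s, e = two_sum(1.0, 2.0 ** -60)
        print(s, e)  # 1.0 8.673617379884035e-19: the lost low-order part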

    Computing fast and accurate convolutions

    The analysis of data often models random components as a sum of independent random variables (RVs). These RVs are often assumed to be lattice-valued, either implied by the problem or for computational efficiency. Thus, such analysis typically requires computing, or, more commonly, approximating a portion of the distribution of that sum.

    Computing the underlying distribution without approximations falls under the area of exact tests. These are becoming more popular with continuing increases in both computing power and the size of data sets. For the RVs above, exactly computing the underlying distribution is done via a convolution of their probability mass functions, which reduces to convolving pairs of non-negative vectors.

    This is conceptually simple, but practical implementations must carefully consider both speed and accuracy. Such implementations fall prey to the round-off error inherent to floating-point arithmetic, risking large relative errors in computed results. There are two main existing algorithms for computing convolutions of vectors: naive convolution (NC) has small bounds on the relative error of each element of the result but has quadratic runtime, while Fast Fourier Transform-based convolution (FFT-C) has almost linear runtime but does not control the relative error of each element, due to the accumulation of round-off error.

    This essay introduces two novel algorithms for these problems: aFFT-C for computing the convolution of two non-negative vectors, and sisFFT for computing p-values of sums of independent and identically-distributed lattice-valued RVs. Through a rigorous analysis of round-off error and its accumulation, both aFFT-C and sisFFT provide control of the relative error similar to NC, but are typically closer in speed to FFT-C, by careful use of FFT-based convolutions and by aggressively discarding irrelevant values. Both accuracy and performance are demonstrated empirically with a variety of examples.
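
    As a hedged illustration of the NC versus FFT-C trade-off described above (not the essay's aFFT-C or sisFFT algorithms; function names and test data are mine), the following Python sketch convolves a vector with a large dynamic range both ways and exposes the loss of relative accuracy in the small entries of the FFT-based result:

        import numpy as np

        def naive_conv(p, q):
            # Quadratic time; each output entry is a sum of non-negative
            # products, so its relative error stays small (O(n*u)).
            out = np.zeros(len(p) + len(q) - 1)
            for i in range(len(p)):
                out[i:i + len(q)] += p[i] * q
            return out

        def fft_conv(p, q):
            # ~O(N log N), but the transforms accumulate *absolute* error,
            # so tiny output entries can lose all relative accuracy.
            N = len(p) + len(q) - 1
            return np.fft.irfft(np.fft.rfft(p, N) * np.fft.rfft(q, N), N)

        p = np.geomspace(1.0, 1e-30, 64)  # huge dynamic range, like a PMF tail
        p /= p.sum()
        nc = naive_conv(p, p)
        fc = fft_conv(p, p)
        print(np.max(np.abs(fc - nc) / nc))  # enormous relative error in the tails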