130 research outputs found
Finding the "truncated" polynomial that is closest to a function
When implementing regular enough functions (e.g., elementary or special
functions) on a computing system, we frequently use polynomial approximations.
In most cases, the polynomial that best approximates (for a given distance and
in a given interval) a function has coefficients that are not exactly
representable with a finite number of bits. And yet, the polynomial
approximations that are actually implemented do have coefficients that are
represented with a finite - and sometimes small - number of bits: this is due
to the finiteness of the floating-point representations (for software
implementations), and to the need to have small, hence fast and/or inexpensive,
multipliers (for hardware implementations). We then have to consider polynomial
approximations for which the degree-$i$ coefficient has at most $m_i$
fractional bits (in other words, it is a rational number with denominator
$2^{m_i}$). We provide a general method for finding the best polynomial
approximation under this constraint. Then, we suggest refinements that can be
used to accelerate our method. Comment: 14 pages, 1 figure
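To make the constraint concrete, the following minimal sketch rounds each coefficient of a given polynomial to the nearest rational with a power-of-two denominator. This is only the naive baseline, not the method of the paper: the rounded polynomial is in general not the best one under the constraint. The function names, the example coefficients, and the bit budgets are illustrative assumptions.

```python
from fractions import Fraction
import math


def round_coefficients(coeffs, frac_bits):
    """Round coefficient a_i to the nearest multiple of 2**-m_i.

    coeffs    -- floating-point coefficients a_0, a_1, ... (assumed given)
    frac_bits -- allowed number of fractional bits m_0, m_1, ...
    """
    return [Fraction(round(a * 2**m), 2**m) for a, m in zip(coeffs, frac_bits)]


def max_error_on_grid(f, coeffs, a, b, samples=1000):
    """Largest |f(x) - p(x)| over an evenly spaced grid (a heuristic, not a proof)."""
    worst = 0.0
    for k in range(samples + 1):
        x = a + (b - a) * k / samples
        p = 0.0
        for c in reversed(coeffs):  # Horner evaluation
            p = p * x + float(c)
        worst = max(worst, abs(f(x) - p))
    return worst


# Hypothetical example: degree-3 Taylor coefficients of exp, with
# 8, 8, 6 and 4 fractional bits allowed for degrees 0 to 3.
taylor = [1.0, 1.0, 0.5, 1.0 / 6.0]
truncated = round_coefficients(taylor, [8, 8, 6, 4])
error = max_error_on_grid(math.exp, truncated, 0.0, 0.5)
```

Comparing `error` with the error of the unrounded polynomial gives a feel for how much accuracy naive rounding loses, which is the gap a best truncated polynomial aims to narrow.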
Correctly rounded multiplication by arbitrary precision constants
We introduce an algorithm for multiplying a floating-point number x by a constant C that is not exactly representable in floating-point arithmetic. Our algorithm uses a multiplication and a fused multiply-accumulate instruction. We give methods for checking whether, for a given value of C and a given floating-point format, our algorithm returns a correctly rounded result for any x. When it does not, our methods give the values of x for which the multiplication is not correctly rounded.
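The abstract does not spell out the instruction sequence. One arrangement consistent with "a multiplication and a fused multiply-accumulate" is sketched below, assuming the constant is stored as a high part `c_hi` and a low part `c_lo`. The helper names are hypothetical, binary64 is assumed as the format, and exact rational arithmetic stands in for hardware rounding so that the sketch does not depend on an FMA being available.

```python
from fractions import Fraction


def rn(q):
    """Round an exact rational to the nearest binary64 number."""
    return float(Fraction(q))


def split_constant(C):
    """Split an exact constant C (a Fraction) into c_hi + c_lo, both doubles."""
    c_hi = rn(C)
    c_lo = rn(C - Fraction(c_hi))
    return c_hi, c_lo


def mul_by_constant(x, c_hi, c_lo):
    """Emulate u1 = RN(c_lo * x), then one FMA: RN(c_hi * x + u1)."""
    u1 = rn(Fraction(c_lo) * Fraction(x))                  # ordinary multiplication
    return rn(Fraction(c_hi) * Fraction(x) + Fraction(u1))  # single rounding, as in an FMA


def is_correctly_rounded(x, C, c_hi, c_lo):
    """Compare against the correctly rounded product RN(C * x)."""
    return mul_by_constant(x, c_hi, c_lo) == rn(C * Fraction(x))


C = Fraction(1, 3)  # example constant, not exactly representable in binary
c_hi, c_lo = split_constant(C)
```

A brute-force loop over `is_correctly_rounded` can only test sample inputs. The point of the paper is to decide the question for every x without enumeration.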
Chebyshev Interpolation Polynomial-based Tools for Rigorous Computing
17 pages. Performing numerical computations while still being able to provide rigorous mathematical statements about the obtained result is required in many domains, such as global optimization, ODE solving, or integration. Taylor models, which associate with a function a pair made of a Taylor approximation polynomial and a rigorous remainder bound, are a widely used rigorous computation tool. This approach benefits from the advantages of numerical methods, but also gives the ability to make reliable statements about the approximated function. Although approximation polynomials based on interpolation at Chebyshev nodes offer a quasi-optimal approximation to a function, together with several other useful features, an analogue of Taylor models based on such polynomials has not yet been well established in the field of validated numerics. This paper presents preliminary work on obtaining such interpolation polynomials together with validated interval bounds for approximating univariate functions. We propose two methods that make this practical: one is based on a representation in the Newton basis and the other uses the Chebyshev polynomial basis. We compare the quality of the obtained remainders and the performance of the approaches with those provided by Taylor models.
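As a rough illustration of the Newton-basis variant, the sketch below interpolates a function at Chebyshev nodes using divided differences. It runs in ordinary floating point, so it yields no validated remainder bound: the interval arithmetic that makes the paper's approach rigorous is deliberately left out. All function names are illustrative assumptions.

```python
import math


def chebyshev_nodes(n, a=-1.0, b=1.0):
    """The n Chebyshev nodes (roots of T_n) mapped to [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]


def newton_coefficients(xs, ys):
    """Divided differences: coefficients of the interpolant in the Newton basis."""
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef


def newton_eval(coef, xs, t):
    """Evaluate c0 + c1 (t - x0) + c2 (t - x0)(t - x1) + ... by Horner's rule."""
    acc = coef[-1]
    for c, x in zip(reversed(coef[:-1]), reversed(xs[:-1])):
        acc = acc * (t - x) + c
    return acc


# Example: 10-node interpolant of exp on [-1, 1].
nodes = chebyshev_nodes(10)
coef = newton_coefficients(nodes, [math.exp(x) for x in nodes])
grid = [-1.0 + 2.0 * i / 200 for i in range(201)]
observed_error = max(abs(newton_eval(coef, nodes, t) - math.exp(t)) for t in grid)
```

For n Chebyshev nodes on [-1, 1], the classical remainder bound is max|f^(n)| / (n! 2^(n-1)), about 1.5e-9 for this example. The observed error can be compared with that figure, but only a validated evaluation turns it into a guarantee.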
Integer and Floating-Point Constant Multipliers for FPGAs
Reconfigurable circuits now have a capacity that allows them to be used as floating-point accelerators. They offer massive parallelism, but also the opportunity to design optimised floating-point hardware operators not available in microprocessors. Multiplication by a constant is an important example of such an operator. This article presents an architecture generator for the correctly rounded multiplication of a floating-point number by a constant. This constant can be a floating-point value, but also an arbitrary irrational number. The multiplication of the significands is an instance of the well-studied problem of constant integer multiplication, for which improvements to existing algorithms are also proposed and evaluated.
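For readers unfamiliar with constant integer multiplication, the sketch below shows the textbook shift-and-add formulation with signed-digit (non-adjacent form) recoding. It is not the improved algorithm of the article, only an illustration of the problem being solved. The function names are assumptions.

```python
def signed_digits(c):
    """Non-adjacent form of a positive integer c, least significant digit first.

    Each digit is -1, 0 or +1, and no two consecutive digits are non-zero,
    which reduces the number of additions and subtractions needed.
    """
    digits = []
    while c > 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def multiply_by_constant(x, c):
    """Compute x * c using only shifts, additions and subtractions."""
    acc = 0
    for shift, d in enumerate(signed_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For example, 7 is recoded as 8 - 1, so multiplying by 7 costs one shift and one subtraction instead of two additions.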
(M,p,k)-friendly points: a table-based method for trigonometric function evaluation
We present a new way of approximating the sine and cosine functions with a few table look-ups and additions. It consists of first reducing the input range to a very small interval by using rotations with "(M, p, k) friendly angles", proposed in this work, and then using a bipartite table method on that small interval. An implementation of the method for the 24-bit case is described and compared with CORDIC. Roughly, the proposed scheme offers a speedup of 2 compared with an unfolded double-rotation radix-2 CORDIC.
A path-norm toolkit for modern networks: consequences, promises and challenges
This work introduces the first toolkit around path-norms that fully
encompasses general DAG ReLU networks with biases, skip connections and any
operation based on the extraction of order statistics: max pooling, GroupSort
etc. This toolkit notably allows us to establish generalization bounds for
modern neural networks that are not only the most widely applicable path-norm
based ones, but also recover or beat the sharpest known bounds of this type.
These extended path-norms further enjoy the usual benefits of path-norms: ease
of computation, invariance under the symmetries of the network, and improved
sharpness on layered fully-connected networks compared to the product of
operator norms, another complexity measure most commonly used.
The versatility of the toolkit and its ease of implementation allow us to
challenge the concrete promises of path-norm-based generalization bounds, by
numerically evaluating the sharpest known bounds for ResNets on ImageNet
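As background, the classical L1 path-norm of a layered fully-connected network without biases is the sum, over all input-to-output paths, of the product of the absolute values of the weights along the path. The sketch below computes only that classical quantity. The extended path-norm of the paper (biases, skip connections, pooling) is not reproduced here, and the function name and example weights are assumptions.

```python
def path_norm_l1(weights):
    """Classical L1 path-norm of a bias-free layered network.

    weights -- list of matrices (lists of rows); weights[k][i][j] links
               input j of layer k to output i of layer k.
    Equals the sum of the entries of |W_L| ... |W_1| applied to the all-ones vector.
    """
    v = [1.0] * len(weights[0][0])
    for W in weights:
        v = [sum(abs(w) * vj for w, vj in zip(row, v)) for row in W]
    return sum(v)


# Hypothetical 2-2-1 network.
W1 = [[1.0, -2.0], [0.5, 3.0]]
W2 = [[2.0, -1.0]]

# Rescale hidden neuron 0: incoming weights times 2, outgoing weight divided by 2.
# For a ReLU network this leaves the realized function unchanged.
W1_rescaled = [[2.0, -4.0], [0.5, 3.0]]
W2_rescaled = [[1.0, -1.0]]

original = path_norm_l1([W1, W2])
rescaled = path_norm_l1([W1_rescaled, W2_rescaled])
```

Both values coincide, which illustrates the invariance under network symmetries mentioned in the abstract.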
Comparison between binary and decimal floating-point numbers
We introduce an algorithm to compare a binary floating-point (FP) number and a decimal FP number, assuming the "binary encoding" of the decimal formats is used, and with a special emphasis on the basic interchange formats specified by the IEEE 754-2008 standard for FP arithmetic. It is a two-step algorithm: a first pass, based on the exponents only, quickly eliminates most cases, then, when the first pass does not suffice, a more accurate second pass is performed. We provide an implementation of several variants of our algorithm, and compare them.
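The two-pass idea can be sketched as follows, with Python's `float` standing in for a binary format and `decimal.Decimal` standing in for a decimal format. This is a simplified illustration under stated assumptions (finite, strictly positive inputs, decimal exponents of moderate size), not the algorithm or the implementation of the paper. The function name and the safety margin are hypothetical.

```python
import math
from decimal import Decimal
from fractions import Fraction

LOG2_10 = math.log2(10)


def compare(x, y):
    """Return -1, 0 or 1 according to x < y, x == y or x > y.

    x -- finite positive float; y -- finite positive Decimal.
    """
    # First pass: exponents only.
    _, e = math.frexp(x)                  # 2**(e-1) <= x < 2**e
    _, digits, q = y.as_tuple()           # y = M * 10**q
    M = int("".join(map(str, digits)))
    lo = (M.bit_length() - 1) + q * LOG2_10   # log2(y) >= lo, up to rounding
    hi = lo + 1.0                             # log2(y) <  hi, up to rounding
    margin = 1e-6                         # absorbs the rounding error of q * LOG2_10
    if e <= lo - margin:
        return -1                         # x < 2**e <= y
    if e - 1 >= hi + margin:
        return 1                          # x >= 2**(e-1) > y
    # Second pass: exact comparison of the two rationals.
    fx, fy = Fraction(x), Fraction(y)
    return (fx > fy) - (fx < fy)
```

For instance, `compare(0.1, Decimal("0.1"))` needs the second pass, because the binary number nearest to one tenth is not equal to the decimal 0.1, whereas `compare(1.0, Decimal("1e10"))` is settled by the exponents alone.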
Approximation speed of quantized vs. unquantized ReLU neural networks and beyond
We deal with two complementary questions about approximation properties of
ReLU networks. First, we study how the uniform quantization of ReLU networks
with real-valued weights impacts their approximation properties. We establish
an upper-bound on the minimal number of bits per coordinate needed for
uniformly quantized ReLU networks to keep the same polynomial asymptotic
approximation speeds as unquantized ones. We also characterize the error of
nearest-neighbour uniform quantization of ReLU networks. This is achieved using
a new lower-bound on the Lipschitz constant of the map that associates the
parameters of ReLU networks to their realization, and an upper-bound
generalizing classical results. Second, we investigate when ReLU networks can
be expected, or not, to have better approximation properties than other
classical approximation families. Indeed, several approximation families share
the following common limitation: their polynomial asymptotic approximation
speed of any set is bounded from above by the encoding speed of this set. We
introduce a new abstract property of approximation families, called
infinite-encodability, which implies this upper-bound. Many classical
approximation families, defined with dictionaries or ReLU networks, are shown
to be infinite-encodable. This unifies and generalizes several situations where
this upper-bound is known
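A minimal sketch of nearest-neighbour uniform quantization, assuming a grid of step 2^-b where b is the number of fractional bits per coordinate. The parameter vector and the function name are illustrative only, and the sketch says nothing about the Lipschitz bounds established in the paper.

```python
def quantize(theta, bits):
    """Round every coordinate of theta to the nearest multiple of 2**-bits."""
    step = 2.0 ** -bits
    return [step * round(t / step) for t in theta]


# Hypothetical flattened parameter vector of a small ReLU network.
theta = [0.7, -0.3, 1.9, 0.25, -1.1, 0.6]
bits = 4
theta_q = quantize(theta, bits)

# Per-coordinate error of nearest-neighbour rounding is at most half a step.
worst = max(abs(t - q) for t, q in zip(theta, theta_q))
```

How this parameter-space error propagates to the function realized by the network is what the Lipschitz bounds on the parameters-to-realization map control.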