2,045 research outputs found

    Optimizing polynomials for floating-point implementation

    Get PDF
    The floating-point implementation of a function on an interval often reduces to polynomial approximation, the polynomial being typically provided by Remez algorithm. However, the floating-point evaluation of a Remez polynomial sometimes leads to catastrophic cancellations. This happens when some of the polynomial coefficients are very small in magnitude with respects to others. In this case, it is better to force these coefficients to zero, which also reduces the operation count. This technique, classically used for odd or even functions, may be generalized to a much larger class of functions. An algorithm is presented that forces to zero the smaller coefficients of the initial polynomial thanks to a modified Remez algorithm targeting an incomplete monomial basis. One advantage of this technique is that it is purely numerical, the function being used as a numerical black box. This algorithm is implemented within a larger polynomial implementation tool that is demonstrated on a range of examples, resulting in polynomials with less coefficients than those obtained the usual way.Comment: 12 page

    Complex Library Mapping for Embedded Software Using Symbolic Algebra

    Get PDF
    Embedded software designers often use libraries that have been pre-optimized for a given processor to achieve higher code quality. However, using such libraries in legacy code optimization is nontrivial and typically requires manual intervention. This paper presents a methodology that maps algorithmic constructs of the software specification to a library of complex software elements. This library-mapping step is automated by using symbolic algebra techniques. We illustrate the advantages of our methodology by optimizing an algorithmic level description of MPEG Layer III (MP3) audio decoder for the Badge4 [2] portable embedded system. During the optimization process we use commercially available libraries with complex elements ranging from simple mathematical functions such as exp to the IDCT routine. We implemented and measured the performance and energy consumption of the MP3 decoder software on Badge4 running embedded Linux operating system. The optimized MP3 audio decoder runs 300 times faster than the original code obtained from the standards body while consuming 400 times less energy. Since our optimized MP3 decoder runs 3.5 times faster than real-time, additional energy can be saved by using processor frequency and voltage scaling

    Automatic Generation of Fast and Certified Code for Polynomial Evaluation

    Get PDF
    International audienceDesigning an efficient floating-point implementation of a function based on polynomial evaluation requires being able to find an accurate enough evaluation program, exploiting at most the target architecture features. This article introduces CGPE, a tool dealing with the generation of fast and certified codes for the evaluation of bivariate polynomials. First we discuss the issue underlying the evaluation scheme combinatorics before giving an overview of the CGPE tool. The approach we propose consists in two steps: the generation of evaluation schemes by using some heuristics so as to quickly find some of low latency; and the selection that mainly consists in automatically checking their scheduling on the given target and validating their accuracy. Then, we present on-going development and ideas for possible improvements of the whole process. Finally, we illustrate the use of CGPE on some examples, and show how it allows us to generate fast and certified codes in a few seconds and thus to reduce the development time of libms like FLIP
    • …
    corecore