4,623 research outputs found

    Computing Real Roots of Real Polynomials ... and now For Real!

    Full text link
    Very recent work introduces an asymptotically fast subdivision algorithm, denoted ANewDsc, for isolating the real roots of a univariate real polynomial. The method combines Descartes' Rule of Signs to test intervals for the existence of roots, Newton iteration to speed up convergence against clusters of roots, and approximate computation to decrease the required precision. It achieves record bounds on the worst-case complexity for the considered problem, matching the complexity of Pan's method for computing all complex roots and improving upon the complexity of other subdivision methods by several magnitudes. In the article at hand, we report on an implementation of ANewDsc on top of the RS root isolator. RS is a highly efficient realization of the classical Descartes method and currently serves as the default real root solver in Maple. We describe crucial design changes within ANewDsc and RS that led to a high-performance implementation without harming the theoretical complexity of the underlying algorithm. With an excerpt of our extensive collection of benchmarks, available online at http://anewdsc.mpi-inf.mpg.de/, we illustrate that the theoretical gain in performance of ANewDsc over other subdivision methods also transfers into practice. These experiments also show that our new implementation outperforms both RS and mature competitors by magnitudes for notoriously hard instances with clustered roots. For all other instances, we avoid almost any overhead by integrating additional optimizations and heuristics.Comment: Accepted for presentation at the 41st International Symposium on Symbolic and Algebraic Computation (ISSAC), July 19--22, 2016, Waterloo, Ontario, Canad

    Julia: A Fresh Approach to Numerical Computing

    Get PDF
    Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast. Julia questions notions generally held as "laws of nature" by practitioners of numerical computing: 1. High-level dynamic programs have to be slow. 2. One must prototype in one language and then rewrite in another language for speed or deployment, and 3. There are parts of a system for the programmer, and other parts best left untouched as they are built by the experts. We introduce the Julia programming language and its design --- a dance between specialization and abstraction. Specialization allows for custom treatment. Multiple dispatch, a technique from computer science, picks the right algorithm for the right circumstance. Abstraction, what good computation is really about, recognizes what remains the same after differences are stripped away. Abstractions in mathematics are captured as code through another technique from computer science, generic programming. Julia shows that one can have machine performance without sacrificing human convenience.Comment: 37 page

    Design of efficient reversible floating-point arithmetic unit on field programmable gate array platform and its performance analysis

    Get PDF
    The reversible logic gates are used to improve the power dissipation in modern computer applications. The floating-point numbers with reversible features are added advantage to performing complex algorithms with high-performance computations. This manuscript implements an efficient reversible floating-point arithmetic (RFPA) unit, and its performance metrics are realized in detail. The RFP adder/subtractor (A/S), RFP multiplier, and RFP divider units are designed as a part of the RFP arithmetic unit. The RFPA unit is designed by considering basic reversible gates. The mantissa part of the RFP multiplier is created using a 24x24 Wallace tree multiplier. In contrast, the reciprocal unit of the RFP divider is designed using Newton Raphson’s method. The RFPA unit and its submodules are executed in parallel by utilizing one clock cycle individually. The RFPA unit and its submodules are synthesized separately on the Vivado IDE environment and obtained the implementation results on Artix-7 field programmable gate array (FPGA). The RFPA unit utilizes only 18.44% slice look-up tables (LUTs) by consuming the 0.891 W total power on Artix-7 FPGA. The RFPA unit sub-models are compared with existing approaches with better performance metrics and chip resource utilization improvements

    The effect of coefficient quantization optimization on filtering performance and gate count

    Get PDF
    Abstract. Digital filters are an essential component of Digital Signal Processing (DSP) applications and play a crucial role in removing unwanted signal components from a desired signal. However, digital filters are known to be resource-intensive and consume a large amount of power, making it important to optimize their design in order to minimize hardware requirements such as multipliers, adders, and registers. This trade-off between filter performance and hardware consumption can be influenced by the quantization of filter coefficients. Therefore, this thesis investigates the quantization of Finite Impulse Response (FIR) filter coefficients and analyzes its impact on filter performance and hardware resource consumption. A method called dynamic quantization is introduced and an algorithm for step-by-step dynamic quantization is provided to improve upon the results obtained with the classical fixed point quantization method. To demonstrate the effectiveness of this approach, the dynamic quantization of filter coefficients for a Low-pass Equiripple FIR filter is examined and a comparative study of the magnitude response and hardware consumption of the generated filter using both the classical and dynamic quantization methods is presented. By understanding the trade-offs and benefits of each quantization method, engineers can make informed decisions about the most appropriate approach for their specific application

    Using Ginkgo’s memory accessor for improving the accuracy of memory-bound low precision BLAS

    Get PDF
    The roofline model not only provides a powerful tool to relate an application\u27s performance with the specific constraints imposed by the target hardware but also offers a graphic representation of the balance between memory access cost and compute throughput. In this work, we present a strategy to break up the tight coupling between the precision format used for arithmetic operations and the storage format employed for memory operations. (At a high level, this idea is equivalent to compressing/decompressing the data in registers before/after invoking store/load memory operations.) In practice, we demonstrate that a “memory accessor” that hides the data compression behind the memory access, can virtually push the bandwidth-induced roofline, yielding higher performance for memory-bound applications using high precision arithmetic that can handle the numerical effects associated with lossy compression. We also demonstrate that memory-bound applications operating on low precision data can increase the accuracy by relying on the memory accessor to perform all arithmetic operations in high precision. In particular, we demonstrate that memory-bound BLAS operations (including the sparse matrix-vector product) can be re-engineered with the memory accessor and that the resulting accessor-enabled BLAS routines achieve lower rounding errors while delivering the same performance as the fast low precision BLAS
    • …
    corecore