4,623 research outputs found

Computing Real Roots of Real Polynomials ... and now For Real!

Author: Kobel Alexander
Rouillier Fabrice
Sagraloff Michael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Very recent work introduces an asymptotically fast subdivision algorithm, denoted ANewDsc, for isolating the real roots of a univariate real polynomial. The method combines Descartes' Rule of Signs to test intervals for the existence of roots, Newton iteration to speed up convergence toward clusters of roots, and approximate computation to decrease the required precision. It achieves record bounds on the worst-case complexity for the considered problem, matching the complexity of Pan's method for computing all complex roots and improving upon the complexity of other subdivision methods by several orders of magnitude. In the article at hand, we report on an implementation of ANewDsc on top of the RS root isolator. RS is a highly efficient realization of the classical Descartes method and currently serves as the default real root solver in Maple. We describe crucial design changes within ANewDsc and RS that led to a high-performance implementation without harming the theoretical complexity of the underlying algorithm. With an excerpt of our extensive collection of benchmarks, available online at http://anewdsc.mpi-inf.mpg.de/, we illustrate that the theoretical gain in performance of ANewDsc over other subdivision methods also transfers into practice. These experiments also show that our new implementation outperforms both RS and mature competitors by orders of magnitude for notoriously hard instances with clustered roots. For all other instances, we avoid almost any overhead by integrating additional optimizations and heuristics. Comment: Accepted for presentation at the 41st International Symposium on Symbolic and Algebraic Computation (ISSAC), July 19-22, 2016, Waterloo, Ontario, Canada
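
The abstract above names Descartes' Rule of Signs as the test for the existence of roots. As a minimal sketch of the classical rule only (not of ANewDsc or RS, which apply it to polynomials transformed for each subdivision interval), the following counts sign variations in a coefficient sequence. The function name `sign_variations` and the example polynomial are illustrative assumptions, not taken from the paper.

```python
def sign_variations(coeffs):
    """Count sign changes in a coefficient sequence, ignoring zeros.

    By Descartes' Rule of Signs, the number of positive real roots of the
    polynomial (counted with multiplicity) equals this count or is smaller
    than it by an even number.
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Illustrative polynomial: (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6,
# coefficients listed from the highest degree down.
p = [1, -6, 11, -6]
variations = sign_variations(p)
print(f"sign variations: {variations}")
```

A count of 0 proves there is no positive root and a count of 1 proves there is exactly one, which is why subdivision methods keep splitting an interval until the count for that interval drops to 0 or 1.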

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

MPG.PuRe

Julia: A Fresh Approach to Numerical Computing

Author: Bezanson Jeff
Edelman Alan
Karpinski Stefan
Shah Viral B.
Publication date: 01/12/2014

Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast. Julia questions notions generally held as "laws of nature" by practitioners of numerical computing: 1. High-level dynamic programs have to be slow. 2. One must prototype in one language and then rewrite in another language for speed or deployment. 3. There are parts of a system for the programmer, and other parts best left untouched as they are built by the experts. We introduce the Julia programming language and its design, a dance between specialization and abstraction. Specialization allows for custom treatment. Multiple dispatch, a technique from computer science, picks the right algorithm for the right circumstance. Abstraction, what good computation is really about, recognizes what remains the same after differences are stripped away. Abstractions in mathematics are captured as code through another technique from computer science, generic programming. Julia shows that one can have machine performance without sacrificing human convenience. Comment: 37 pages
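
The abstract above says that multiple dispatch "picks the right algorithm for the right circumstance". A rough sketch of the idea in Python follows; it is a toy registry keyed on the exact types of both arguments, not Julia's dispatch mechanism, which also resolves subtypes and picks the most specific method. The names `register`, `combine`, and `_registry` are hypothetical.

```python
_registry = {}  # maps a tuple of argument types to an implementation


def register(*types):
    """Register an implementation for one exact combination of argument types."""
    def decorator(fn):
        _registry[types] = fn
        return fn
    return decorator


def combine(a, b):
    """Select the implementation from the types of both arguments."""
    fn = _registry.get((type(a), type(b)))
    if fn is None:
        raise TypeError(f"no method for ({type(a).__name__}, {type(b).__name__})")
    return fn(a, b)


@register(int, int)
def _combine_ints(a, b):
    return a + b


@register(str, str)
def _combine_strs(a, b):
    return a + " " + b


@register(int, str)
def _repeat_str(a, b):
    return b * a


print(combine(2, 3))
print(combine("fast", "easy"))
print(combine(2, "ab"))
```

The point of the sketch is that the choice depends on every argument, not only the first one, so adding a new type combination does not require touching the existing implementations.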

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Design of efficient reversible floating-point arithmetic unit on field programmable gate array platform and its performance analysis

Author: Bhandari Gajanan Sangeetha
Sanjeevaiah Girija
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/02/2023

Reversible logic gates are used to reduce power dissipation in modern computing applications. Floating-point arithmetic with reversible features is an added advantage when performing complex algorithms that require high-performance computation. This manuscript implements an efficient reversible floating-point arithmetic (RFPA) unit and reports its performance metrics in detail. The RFP adder/subtractor (A/S), RFP multiplier, and RFP divider units are designed as parts of the RFPA unit, which is built from basic reversible gates. The mantissa part of the RFP multiplier is created using a 24x24 Wallace tree multiplier, while the reciprocal unit of the RFP divider is designed using the Newton-Raphson method. The RFPA unit and its submodules execute in parallel, each using one clock cycle. The RFPA unit and its submodules are synthesized separately in the Vivado IDE, and implementation results are obtained on an Artix-7 field programmable gate array (FPGA). The RFPA unit utilizes only 18.44% of the slice look-up tables (LUTs) and consumes 0.891 W total power on the Artix-7 FPGA. Compared with existing approaches, the RFPA submodules show better performance metrics and improved chip resource utilization.
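
The abstract above mentions a reciprocal unit based on the Newton-Raphson method. The standard division-free iteration for 1/d is x_{k+1} = x_k (2 - d x_k). The sketch below shows it in ordinary floating-point software, not in reversible hardware. The initial guess 48/17 - 32/17 * m, the scaling with `math.frexp`, and the iteration count are common textbook choices assumed here, not details from the paper.

```python
import math


def reciprocal(d, iterations=5):
    """Approximate 1/d with Newton-Raphson, using only multiply and subtract
    inside the loop."""
    if d == 0:
        raise ZeroDivisionError("reciprocal of zero")
    # Scale |d| to a mantissa m in [0.5, 1) so that |d| = m * 2**e.
    m, e = math.frexp(abs(d))
    # Assumed linear initial guess for 1/m on [0.5, 1).
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Newton step for f(x) = 1/x - m; the error roughly squares each time.
        x = x * (2.0 - m * x)
    # Undo the scaling: 1/|d| = (1/m) * 2**(-e), then restore the sign.
    return math.copysign(math.ldexp(x, -e), d)


print(reciprocal(3.0))
print(reciprocal(-0.125))
```

Because the error is roughly squared at every step, a handful of iterations is enough for double precision once the input has been scaled into a fixed range.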

ZENODO

Institute of Advanced Engineering and Science

High-Performance GPU Implementation of PageRank with Reduced Precision Based on Mantissa Segmentation

Author: Anzt Hartwig
Grützmacher Thomas
Quintana-Orti E. S.
Scheidegger F.
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/01/2019

KITopen

The effect of coefficient quantization optimization on filtering performance and gate count

Author: Adewale A. (Ayomikun)
Publication venue: University of Oulu
Publication date: 17/04/2023

Digital filters are an essential component of Digital Signal Processing (DSP) applications and play a crucial role in removing unwanted signal components from a desired signal. However, digital filters are known to be resource-intensive and consume a large amount of power, making it important to optimize their design in order to minimize hardware requirements such as multipliers, adders, and registers. The trade-off between filter performance and hardware consumption can be influenced by the quantization of filter coefficients. Therefore, this thesis investigates the quantization of Finite Impulse Response (FIR) filter coefficients and analyzes its impact on filter performance and hardware resource consumption. A method called dynamic quantization is introduced, and an algorithm for step-by-step dynamic quantization is provided to improve upon the results obtained with the classical fixed-point quantization method. To demonstrate the effectiveness of this approach, the dynamic quantization of filter coefficients for a low-pass equiripple FIR filter is examined, and a comparative study of the magnitude response and hardware consumption of the generated filter using both the classical and dynamic quantization methods is presented. By understanding the trade-offs and benefits of each quantization method, engineers can make informed decisions about the most appropriate approach for their specific application.
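
The abstract above takes classical fixed-point quantization of FIR coefficients as its baseline. A minimal sketch of that baseline follows; the thesis's dynamic quantization algorithm is not described in the abstract and is not reproduced here. The function names and the five-tap coefficient set are hypothetical.

```python
def quantize(coeffs, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits.

    Note: Python's round() resolves exact ties to the even integer.
    """
    scale = 1 << frac_bits
    return [round(c * scale) / scale for c in coeffs]


def max_error(coeffs, quantized):
    """Largest absolute difference between original and quantized coefficients."""
    return max(abs(c - q) for c, q in zip(coeffs, quantized))


# Hypothetical symmetric low-pass taps, for illustration only.
taps = [0.1, 0.2, 0.4, 0.2, 0.1]
for bits in (4, 8, 12):
    q = quantize(taps, bits)
    print(bits, q, max_error(taps, q))
```

Fewer fractional bits mean narrower multipliers and registers but a larger coefficient error, bounded by 2**-(frac_bits + 1). This is the trade-off between filter performance and hardware cost that the abstract describes.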

University of Oulu Repository - Jultika

Techniques for the realization of ultrareliable spaceborne computers Interim scientific report

Author: Goldberg J.
Green M. W.
Levitt K. N.
Stone H. S.

Error-free ultrareliable spaceborne computer

NASA Technical Reports Server

Using Ginkgo’s memory accessor for improving the accuracy of memory-bound low precision BLAS

Author: Anzt H.
Grützmacher T.
Quintana-Ortí E. S.
Publication venue: John Wiley and Sons
Publication date: 09/11/2021

The roofline model not only provides a powerful tool to relate an application's performance with the specific constraints imposed by the target hardware but also offers a graphic representation of the balance between memory access cost and compute throughput. In this work, we present a strategy to break up the tight coupling between the precision format used for arithmetic operations and the storage format employed for memory operations. (At a high level, this idea is equivalent to compressing/decompressing the data in registers before/after invoking store/load memory operations.) In practice, we demonstrate that a “memory accessor” that hides the data compression behind the memory access can virtually push the bandwidth-induced roofline, yielding higher performance for memory-bound applications using high precision arithmetic that can handle the numerical effects associated with lossy compression. We also demonstrate that memory-bound applications operating on low precision data can increase the accuracy by relying on the memory accessor to perform all arithmetic operations in high precision. In particular, we demonstrate that memory-bound BLAS operations (including the sparse matrix-vector product) can be re-engineered with the memory accessor and that the resulting accessor-enabled BLAS routines achieve lower rounding errors while delivering the same performance as the fast low precision BLAS.
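
The abstract above describes decoupling the storage format from the arithmetic format. The sketch below illustrates only that idea and is not Ginkgo's accessor API (Ginkgo is a C++ library). It stores values as float32 and compares accumulating in float32 against accumulating in float64. All function names are hypothetical.

```python
import struct
from array import array


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_low_precision(stored):
    """Accumulate in float32: each partial sum is rounded back to float32."""
    total = 0.0
    for v in stored:
        total = to_float32(total + v)
    return total


def sum_with_accessor(stored):
    """Read the same float32 storage, but keep the accumulator in float64."""
    total = 0.0
    for v in stored:
        total += v
    return total


n = 10_000
stored = array("f", [0.1] * n)   # storage format: float32
reference = n * stored[0]        # sum of the stored values, computed in float64

low_err = abs(sum_low_precision(stored) - reference)
high_err = abs(sum_with_accessor(stored) - reference)
print(f"float32 accumulation error: {low_err:.3e}")
print(f"float64 accumulation error: {high_err:.3e}")
```

Both variants read the same number of bytes from storage, which is the dominant cost for a memory-bound kernel, so the higher-precision accumulation is the kind of accuracy gain at unchanged memory traffic that the abstract reports.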

KITopen