85 research outputs found

Study of the posit number system: a practical approach

Author: Murillo Montero Raúl
Publication date: 01/01/2019

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) has for decades been the standard for floating-point arithmetic and is implemented in the vast majority of modern computer systems. Recently, a new number representation format called posit (Type III unum) has been proposed as an alternative to the now omnipresent IEEE 754 arithmetic. It was introduced by John L. Gustafson, who claims that the new format can provide higher accuracy using the same number of bits or fewer, with simpler hardware than the current standard. In this Bachelor dissertation, the novel posit number format and its characteristics and properties, as presented in the literature, are analyzed and compared with the standard for floating-point numbers (floats). Based on the assertions in the literature, we focus on determining whether posits would be a good “drop-in replacement” for floats. With the help of Wolfram Mathematica and Python, different environments are created to compare the performance of the IEEE 754 floating-point standard with Type III unum (posits). To take a more practical approach, we first propose different numerical problems to compare the accuracy of both formats, including algebraic problems and numerical methods. Then, we focus on the possible use of posits in Deep Learning problems, such as training artificial Neural Networks or performing low-precision inference on Convolutional Neural Networks. To conclude this work, we propose a low-level design for a posit arithmetic multiplier using the FloPoCo tool to generate synthesizable VHDL code.

Docta Complutense

Accelerated Financial Applications through Specialized Hardware, FPGA

Author: Dang Tri Quang
Rothermel John Mark
Publication venue: Digital WPI
Publication date: 13/12/2007

This project will investigate Field Programmable Gate Array (FPGA) technology in financial applications. FPGA implementation in high performance computing is still in its infancy. Certain companies, such as XtremeData Inc., have advertised speed improvements of 50 to 1000 times for DNA sequencing using FPGAs, while using an FPGA as a coprocessor to handle specific tasks provides two to three times more processing power. FPGA technology increases performance by parallelizing calculations. This project will specifically address speed and accuracy improvements of both fundamental and transcendental functions when implemented using FPGA technology. The results of this project will lead to a series of recommendations for effective utilization of FPGA technology in financial applications.

DigitalCommons@WPI

HDL IMPLEMENTATION AND ANALYSIS OF A RESIDUAL REGISTER FOR A FLOATING-POINT ARITHMETIC UNIT

Author: Kaveti Akil
Publication venue: UKnowledge
Publication date: 01/01/2008

Processors used in lower-end scientific applications like graphics cards and video game consoles have IEEE single precision floating-point hardware [23]. Double precision offers higher precision at higher implementation cost and lower performance. The need for high precision computations in these applications is not enough to justify the use of double precision hardware and the extra hardware complexity needed [23]. Native-pair arithmetic offers an interesting and feasible solution to this problem. This technique, invented by T. J. Dekker, uses single-length floating-point numbers to represent higher precision floating-point numbers [3]. Native-pair arithmetic has been proposed by Dr. William R. Dieter and Dr. Henry G. Dietz to achieve better accuracy using standard IEEE single precision floating-point hardware [1]. Native-pair arithmetic results in better accuracy; however, it decreases the performance by 11x and 17x for addition and multiplication respectively [2]. The proposed implementation uses a residual register to store the error residual term [2]. This addition is not only cost efficient but also results in acceptable accuracy with 10 times the performance of 64-bit hardware. This thesis demonstrates the implementation of a 32-bit floating-point unit with a residual register and estimates the hardware cost and performance.

University of Kentucky
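
The error residual mentioned in the abstract above can be illustrated in software. The following is a minimal sketch, not the thesis's hardware design: it uses Knuth's branch-free TwoSum, a standard error-free transformation of the kind that pair arithmetic builds on, and it runs on Python's native double-precision floats rather than on single-precision hardware. The function name `two_sum` is a hypothetical helper chosen for this example.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, e): s is the rounded sum a + b, e is the rounding error.

    Assumption: no overflow occurs. Under that condition s + e equals
    a + b exactly, so the pair (s, e) carries more precision than s alone.
    """
    s = a + b
    bb = s - a                      # the part of b that was absorbed into s
    e = (a - (s - bb)) + (b - bb)   # what rounding discarded (the residual)
    return s, e


if __name__ == "__main__":
    hi, lo = two_sum(1.0, 1e-16)
    # 1e-16 is below half an ulp of 1.0, so it is lost in hi but kept in lo.
    print(hi, lo)
    # Exact rational arithmetic confirms that the pair represents the true sum.
    print(Fraction(hi) + Fraction(lo) == Fraction(1.0) + Fraction(1e-16))
```

The pair `(hi, lo)` plays the role that the abstract assigns to the result and the residual term; a software version needs several extra floating-point operations per addition to recover the residual.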

Floating point bit-sequential arithmetic units

Author: Blaker David Mark
Publication venue: Lehigh Preserve

Lehigh University: Lehigh Preserve

Performance degradation in the presence of subnormal floating-point values

Author: Hari Govind
Isaac Dooley
Laxmikant Kale
Michael Breitenfeld
Orion Lawlor
Publication date: 01/01/2005

CiteSeerX

Implementations of high performance architecture for IEEE 754 compliant floating-point adders

Author: Mathis Brett
Publication date: 01/12/2020

This thesis presents a direct iteration and implementation on a high performance architecture for IEEE 754 floating-point addition. It improves on the previous architecture's implementation in a variety of sub-operations required for IEEE 754 floating-point addition, focusing on directly improving critical path delay performance. A key element of this work is the introduction of a flagged-prefix adder within the main carry-propagation path of an end-around-carry adder. It also provides detailed documentation for the design of IEEE 754 compliant floating-point adders, with particular emphasis on the uncommon operations and control logic used throughout floating-point addition, including denormalized numbers and multi-precision logic. The full design for this architecture supports binary16, binary32, and binary64 operations. The full extended range provided by denormalized IEEE 754 values is supported. It also supports conversion between IEEE 754 and two's complement integer values in binary16, binary32, or binary64 precision. The performance comparisons shown are synthesis results in cmos32soi 32nm GF technology and ARM-based standard cells.

SHAREOK repository
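
For readers unfamiliar with the three formats named in the abstract above, the sketch below splits a value into its sign, biased exponent, and fraction fields and flags denormalized (subnormal) encodings. It is an illustration only and is unrelated to the thesis's adder architecture; the table `FORMATS` and the function `decode_fields` are hypothetical names introduced for this example. The field widths (5/10, 8/23, 11/52 bits) are those defined by IEEE 754.

```python
import struct

# name -> (float pack code, same-width unsigned int code, exponent bits, fraction bits)
FORMATS = {
    "binary16": ("<e", "<H", 5, 10),
    "binary32": ("<f", "<I", 8, 23),
    "binary64": ("<d", "<Q", 11, 52),
}


def decode_fields(x, fmt):
    """Return (sign, biased_exponent, fraction, is_subnormal) for x in fmt."""
    float_code, int_code, exp_bits, frac_bits = FORMATS[fmt]
    # Reinterpret the encoded float as an unsigned integer of the same width.
    bits = struct.unpack(int_code, struct.pack(float_code, x))[0]
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    # Subnormals have an all-zero exponent field and a nonzero fraction.
    is_subnormal = exponent == 0 and fraction != 0
    return sign, exponent, fraction, is_subnormal


if __name__ == "__main__":
    print(decode_fields(1.0, "binary32"))
    print(decode_fields(2.0 ** -24, "binary16"))  # smallest positive binary16 value
    print(decode_fields(-2.0, "binary64"))
```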

Architecture and Design of Generic IEEE-754 Based Floating Point Adder, Subtractor and Multiplier

Author: Sahdev D. Kanjariya, Rutarth Patel
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/05/2015

Floating point numbers are widely used in various fields because of their great dynamic range, high precision and easy operation rules. In this paper, the architecture of a generic floating point unit is proposed and discussed. This generic unit is compatible with all three IEEE-754 binary formats. Based on this architecture, floating point adder, subtractor and multiplier modules are designed and functionally verified for a Virtex-4 FPGA. The design works properly and gives accurate results up to the last point. DOI: 10.17762/ijritcc2321-8169.15054

International Journal on Recent and Innovation Trends in Computing and Communication