Search CORE

60 research outputs found

Normalizing or not normalizing? An open question for floating-point arithmetic in embedded systems

Author: Gonzalez-Navarro Sonia
Hormigo-Aguilar Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Emerging embedded applications lack of a specific standard when they require floating-point arithmetic. In this situation they use the IEEE-754 standard or ad hoc variations of it. However, this standard was not designed for this purpose. This paper aims to open a debate to define a new extension of the standard to cover embedded applications. In this work, we only focus on the impact of not performing normalization. We show how eliminating the condition of normalized numbers, implementation costs can be dramatically reduced, at the expense of a moderate loss of accuracy. Several architectures to implement addition and multiplication for non-normalized numbers are proposed and analyzed. We show that a combined architecture (adder-multiplier) can halve the area and power consumption of its counterpart IEEE-754 architecture. This saving comes at the cost of reducing an average of about 10 dBs the Signal-to-Noise Ratio for the tested algorithms. We think these results should encourage researchers to perform further investigation in this issue.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

Measuring Improvement when Using HUB Formats to Implement Floating-Point Systems under Round-to-Nearest

Author: Hormigo-Aguilar Javier
Villalba-Moreno Julio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

MEC bajo TIN2013-42253-PThis paper analyzes the benefits of using HUB formats to implement floating-point arithmetic under round-tonearest mode from a quantitative point of view. Using HUB formats to represent numbers allows the removal of the rounding logic of arithmetic units, including sticky-bit computation. This is shown for floating-point adders, multipliers, and converters. Experimental analysis demonstrates that HUB formats and the corresponding arithmetic units maintain the same accuracy as conventional ones. On the other hand, the implementation of these units, based on basic architectures, shows that HUB formats simultaneously improve area, speed, and power consumption. Specifically, based on data obtained from the synthesis, a HUB single-precision adder is about 14% faster but consumes 38% less area and 26% less power than the conventional adder. Similarly, a HUB single-precision multiplier is 17% faster, uses 22% less area, and consumes slightly less power than conventional multiplier. At the same speed, the adder and multiplier achieve area and power reductions of up to 50% and 40%, respectively

Repositorio Institucional Universidad de Málaga

Profile-directed specialisation of custom floating-point hardware

Author: Brown Ashley W.
Brown Ashley W.
Publication venue: Computing, Imperial College London
Publication date: 01/05/2010
Field of study

We present a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, much reduced in size, while still retaining performance and IEEE-754 compliance. Our system uses three key parts: a profiling tool, a set of customisable floating-point units and a selection of system integration methods. We use a profiling tool for floating-point behaviour to identify arithmetic operations where fundamental elements of IEEE-754 floating-point may be compromised, without generating erroneous results in the common case. In the uncommon case, we use simple detection logic to determine when operands lie outside the range of capabilities of the optimised hardware. Out-of-range operations are handled by a separate, fully capable, floatingpoint implementation, either on-chip or by returning calculations to a host processor. We present methods of system integration to achieve this errorcorrection. Thus the system suffers no compromise in IEEE-754 compliance, even when the synthesised hardware would generate erroneous results. In particular, we identify from input operands the shift amounts required for input operand alignment and post-operation normalisation. For operations where these are small, we synthesise hardware with reduced-size barrel-shifters. We also propose optimisations to take advantage of other profile-exposed behaviours, including removing the hardware required to swap operands in a floating-point adder or subtractor, and reducing the exponent range to fit observed values. We present profiling results for a range of applications, including a selection of computational science programs, Spec FP 95 benchmarks and the FFMPEG media processing tool, indicating which would be amenable to our method. Selected applications which demonstrate potential for optimisation are then taken through to a hardware implementation. We show up to a 45% decrease in hardware size for a floating-point datapath, with a correctable error-rate of less then 3%, even with non-profiled datasets

Spiral - Imperial College Digital Repository

Large multipliers with less DSP blocks

Author: de Dinechin Florent
Pasca Bogdan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/08/2009
Field of study

International audienceRecent computing-oriented FPGAs feature DSP blocks including small embedded multipliers. A large integer multiplier, for instance for a double-precision floating-point multiplier, consumes many of these DSP blocks. This article studies three non-standard implementation techniques of large multipliers: the Karatsuba-Ofman algorithm, non-standard multiplier tiling, and specialized squarers. They allow for large multipliers working at the peak frequency of the DSP blocks while reducing the DSP block usage. Their overhead in term of logic resources, if any, is much lower than that of emulating embedded multipliers. Their latency overhead, if any, is very small. Complete algorithmic descriptions are provided, carefully mapped on recent Xilinx and Altera devices, and validated by synthesis results

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

DEVELOPMENT AND VALIDATION OF A SPECIAL PURPOSE SENSOR AND PROCESSOR SYSTEM TO CALCULATE EQUILIBRIUM MOISTURE CONTENT OF WOOD

Author: Tangirala Phani
Publication venue: UKnowledge
Publication date: 01/01/2005
Field of study

Percent Moisture Content (MC %) of wood is defined to be the weight of the moisture in the wood divided by the weight of the dry wood times 100%. Equilibrium Moisture Content (EMC), moisture content at environmental equilibrium is a very important metric affecting the performance of wood in many applications. For best performance in many applications, the goal is to maintain this value between 6% and 8%. EMC value is a function of the temperature and the relative humidity of the surrounding air of wood. It is very important to maintain this value while processing, storing or finishing the wood. This thesis develops a special purpose sensor and processor system to be implemented as a small hand-held device used to sense, calculate and display the value of EMC of wood depending on surrounding environmental conditions. Wood processing industry personnel would use the hand-held EMC calculating and display device to prevent many potential problems that can show significant affect on the performance of wood. The design of the EMC device requires the use of sensors to obtain the required inputs of temperature and relative humidity. In this thesis various market available sensors are compared and appropriate sensor is chosen for the design. The calculation of EMC requires many arithmetic operations with stringent precision requirements. Various arithmetic algorithms and systems are compared in terms of meeting required arithmetic functionality, precision requirements, and silicon implementation area and gate count, and a suitable choice is made. The resulting processor organization and design is coded in VHDL using the Xilinx ISE 6.2.03i tool set. The design is synthesized, validated via VHDL virtual prototype simulation, and implemented to a Xilinx Spartan2E FPGA for experimental hardware prototype testing and evaluation. It is tested over various ranges of temperature and relative humidity. Comparison of experimentally calculated EMC values with the theoretical values of EMC derived for corresponding temperature and relative humidity points resulted in validation of the EMC processor architecture, functional performance and arithmetic precision requirements

University of Kentucky