60 research outputs found
Normalizing or not normalizing? An open question for floating-point arithmetic in embedded systems
Emerging embedded applications lack a specific standard for floating-point arithmetic. Instead, they use the IEEE-754 standard or ad hoc variations of it, although this standard was not designed for that purpose. This paper aims to open a debate on defining a new extension of the standard to cover embedded applications. In this work, we focus only on the impact of not performing normalization. We show that by eliminating the requirement that numbers be normalized, implementation costs can be dramatically reduced at the expense of a moderate loss of accuracy. Several architectures to implement addition and multiplication for non-normalized numbers are proposed and analyzed. We show that a combined architecture (adder-multiplier) can halve the area and power consumption of its IEEE-754 counterpart. This saving comes at the cost of reducing the signal-to-noise ratio by about 10 dB on average for the tested algorithms. We think these results should encourage researchers to investigate this issue further. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
Measuring Improvement when Using HUB Formats to Implement Floating-Point Systems under Round-to-Nearest
MEC under TIN2013-42253-P. This paper analyzes, from a quantitative point of view, the benefits of using HUB formats to implement floating-point arithmetic under round-to-nearest mode. Using HUB formats to represent numbers allows the removal of the rounding logic of arithmetic units, including sticky-bit computation. This is shown for floating-point adders, multipliers, and converters. Experimental analysis demonstrates that HUB formats and the corresponding arithmetic units maintain the same accuracy as conventional ones. Moreover, the implementation of these units, based on basic architectures, shows that HUB formats simultaneously improve area, speed, and power consumption. Specifically, based on data obtained from synthesis, a HUB single-precision adder is about 14% faster and uses 38% less area and 26% less power than the conventional adder. Similarly, a HUB single-precision multiplier is 17% faster, uses 22% less area, and consumes slightly less power than the conventional multiplier. At the same speed, the adder and multiplier achieve area and power reductions of up to 50% and 40%, respectively.
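As a rough illustration of why rounding logic can disappear, the sketch below assumes the usual description of HUB (Half-Unit-Biased) numbers as ordinary bit vectors with an implicit least significant bit equal to 1. The abstract itself does not spell out this definition, so treat it, the fixed-point simplification, and the helper names `to_hub` and `hub_value` as assumptions made for illustration only. Under that assumption, truncating the extra bits already lands on the nearest representable value.

```python
from fractions import Fraction
import math

def to_hub(value: Fraction, frac_bits: int) -> int:
    """Convert to a fixed-point HUB number by truncation only.

    Assumption: the stored bits are followed by an implicit LSB equal to 1,
    so no sticky bit or increment is needed to round to nearest.
    """
    return math.floor(value * 2**frac_bits)

def hub_value(stored: int, frac_bits: int) -> Fraction:
    """Value represented by the stored bits plus the implicit LSB of 1."""
    return Fraction(2 * stored + 1, 2 ** (frac_bits + 1))

# Example: with 3 fractional bits, truncation stays within half a unit
# in the last stored place (1/16) of the exact value.
exact = Fraction(5, 32)
stored = to_hub(exact, 3)
error = abs(hub_value(stored, 3) - exact)
```

The error bound follows because the representable values sit at the midpoints of the truncation intervals, so every input in an interval is at most half an interval away from the value assigned to it.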
Profile-directed specialisation of custom floating-point hardware
We present a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, much reduced in size while still retaining performance and IEEE-754 compliance. Our system uses three key parts: a profiling tool, a set of customisable floating-point units, and a selection of system integration methods. We use a profiling tool for floating-point behaviour to identify arithmetic operations where fundamental elements of IEEE-754 floating-point may be compromised without generating erroneous results in the common case. In the uncommon case, we use simple detection logic to determine when operands lie outside the range of capabilities of the optimised hardware. Out-of-range operations are handled by a separate, fully capable floating-point implementation, either on-chip or by returning calculations to a host processor. We present methods of system integration to achieve this error correction. Thus the system suffers no compromise in IEEE-754 compliance, even when the synthesised hardware would generate erroneous results. In particular, we identify from input operands the shift amounts required for input operand alignment and post-operation normalisation. For operations where these are small, we synthesise hardware with reduced-size barrel shifters. We also propose optimisations to take advantage of other profile-exposed behaviours, including removing the hardware required to swap operands in a floating-point adder or subtractor, and reducing the exponent range to fit observed values. We present profiling results for a range of applications, including a selection of computational science programs, SPECfp95 benchmarks and the FFMPEG media processing tool, indicating which would be amenable to our method. Selected applications which demonstrate potential for optimisation are then taken through to a hardware implementation. We show up to a 45% decrease in hardware size for a floating-point datapath, with a correctable error rate of less than 3%, even with non-profiled datasets.
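The profiling idea for alignment shifts can be sketched in software as follows. This is a minimal illustration, not the authors' tool: the function names, the trace format (a list of operand pairs), and the choice of `math.frexp` to read exponents are all assumptions. It builds a histogram of the exponent differences seen in additions and estimates how often a reduced-size shifter would have to hand an operation to the fully capable fallback path.

```python
import math
from collections import Counter

def alignment_shift(a: float, b: float) -> int:
    """Exponent difference of two nonzero finite operands, i.e. the
    right shift an adder applies to align the smaller operand."""
    _, exp_a = math.frexp(a)
    _, exp_b = math.frexp(b)
    return abs(exp_a - exp_b)

def profile_shifts(operand_pairs):
    """Histogram of alignment shifts over a trace of additions.
    Pairs containing a zero are skipped (hypothetical policy)."""
    return Counter(
        alignment_shift(a, b)
        for a, b in operand_pairs
        if a != 0.0 and b != 0.0
    )

def fallback_rate(histogram, max_shift: int) -> float:
    """Fraction of profiled operations whose shift exceeds what a
    reduced shifter supporting shifts up to max_shift could handle."""
    total = sum(histogram.values())
    out_of_range = sum(n for shift, n in histogram.items() if shift > max_shift)
    return out_of_range / total if total else 0.0

# Hypothetical trace of four additions.
trace = [(1.0, 1.5), (2.0, 3.0), (1.0, 1024.0), (8.0, 0.5)]
histogram = profile_shifts(trace)
rate = fallback_rate(histogram, max_shift=4)
```

In this toy trace only the pair (1.0, 1024.0) needs a shift larger than 4, so a shifter limited to 4 positions would send one operation in four to the fallback implementation.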
Large multipliers with less DSP blocks
Recent computing-oriented FPGAs feature DSP blocks that include small embedded multipliers. A large integer multiplier, for instance the one inside a double-precision floating-point multiplier, consumes many of these DSP blocks. This article studies three non-standard implementation techniques for large multipliers: the Karatsuba-Ofman algorithm, non-standard multiplier tiling, and specialized squarers. They allow large multipliers to work at the peak frequency of the DSP blocks while reducing DSP block usage. Their overhead in terms of logic resources, if any, is much lower than that of emulating embedded multipliers. Their latency overhead, if any, is very small. Complete algorithmic descriptions are provided, carefully mapped on recent Xilinx and Altera devices, and validated by synthesis results.
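The first of these techniques can be illustrated with a two-way Karatsuba-Ofman split, which replaces four half-width multiplications with three at the cost of a few additions. The sketch below is a plain software model, not the article's hardware mapping; the function name and the 17-bit chunk width used in the example are illustrative assumptions.

```python
def karatsuba_2way(a: int, b: int, half_bits: int) -> int:
    """Multiply two unsigned integers of up to 2*half_bits bits each
    using three smaller multiplications instead of four."""
    mask = (1 << half_bits) - 1
    a_lo, a_hi = a & mask, a >> half_bits
    b_lo, b_hi = b & mask, b >> half_bits

    low = a_lo * b_lo      # first small multiplication
    high = a_hi * b_hi     # second small multiplication
    # Third multiplication: its operands are one bit wider than the halves.
    # Subtracting high and low leaves a_hi*b_lo + a_lo*b_hi.
    middle = (a_hi + a_lo) * (b_hi + b_lo) - high - low

    return (high << (2 * half_bits)) + (middle << half_bits) + low

# Example: a 34-bit by 34-bit product built from 17-bit halves.
x = (1 << 34) - 1
y = 0x2ABCDEF01
product = karatsuba_2way(x, y, 17)
```

The saving of one multiplication per split is what reduces the number of embedded multipliers needed, while the extra additions and subtractions map to ordinary logic.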
DEVELOPMENT AND VALIDATION OF A SPECIAL PURPOSE SENSOR AND PROCESSOR SYSTEM TO CALCULATE EQUILIBRIUM MOISTURE CONTENT OF WOOD
The percent moisture content (MC%) of wood is defined as the weight of the moisture in the wood divided by the weight of the dry wood, times 100%. Equilibrium moisture content (EMC), the moisture content at environmental equilibrium, is a very important metric affecting the performance of wood in many applications. For best performance in many applications, the goal is to maintain this value between 6% and 8%. The EMC value is a function of the temperature and relative humidity of the air surrounding the wood. It is very important to maintain this value while processing, storing or finishing the wood. This thesis develops a special purpose sensor and processor system, to be implemented as a small hand-held device, that senses, calculates and displays the EMC of wood according to the surrounding environmental conditions. Wood processing industry personnel would use the hand-held EMC device to prevent many potential problems that can significantly affect the performance of wood. The design of the EMC device requires sensors to obtain the required inputs of temperature and relative humidity. In this thesis, various commercially available sensors are compared and an appropriate sensor is chosen for the design. The calculation of EMC requires many arithmetic operations with stringent precision requirements. Various arithmetic algorithms and systems are compared in terms of required arithmetic functionality, precision requirements, silicon implementation area and gate count, and a suitable choice is made. The resulting processor organization and design is coded in VHDL using the Xilinx ISE 6.2.03i tool set. The design is synthesized, validated via VHDL virtual prototype simulation, and implemented on a Xilinx Spartan2E FPGA for experimental hardware prototype testing and evaluation. It is tested over various ranges of temperature and relative humidity.
Comparison of the experimentally calculated EMC values with the theoretical EMC values derived for the corresponding temperature and relative humidity points validated the EMC processor architecture, its functional performance and its arithmetic precision requirements.