8 research outputs found
Diseño de un Microprocesador Logarítmico Orientado al Procesamiento Digital de Señales /
Diseñar un microprocesador logarítmico orientado al procesamiento digital de señales.El sistema numérico logarítmico (LNS por Logarithmic Number System), es un sistema numérico no convencional que
combina la facilidad de implementación y precisión que proporciona el sistema numérico de punto fijo y el intervalo de representación
numérico que ofrece el sistema numérico de punto flotante [1]. Debido a que el LNS basa su funcionamiento en las propiedades de las
funciones logaritmo y antilogaritmo, este sistema numérico permite realizar operaciones de multiplicación, división y raíz cuadrada
haciendo uso de sumas, restas y desplazamientos lógicos en el formato de punto fijo, reduciendo de esta forma la complejidad de la
arquitectura hardware de dichas operaciones [2], [3]. A pesar de que en la actualidad existen diversas arquitecturas basadas en
aritmética convencional que permiten la implementación en hardware de este tipo de operaciones aritméticas, esto sigue siendo un
problema abierto, dado que las soluciones existentes presentan un elevado consumo de área, potencia y excesivo número de ciclos de reloj
para llevar a cabo dichas operaciones [4]-[5].Universidad Autónoma del Carib
The Denormal Logarithmic Number System
International audienceEconomical hardware often uses a FiXed-point Number System (FXNS), whose constant absolute precision is acceptable for many signal-processing algorithms. The almost-constant relative precision of the more expensive Floating-Point (FP) number system simplifies design, for example, by eliminating worries about FXNS overflow because the range of FP is much larger than FXNS for the same wordsize; however, primitive FP introduces another problem: underflow. The conventional Signed Logarithmic Number System (SLNS) offers similar range and precision as FP with much better performance (in terms of power, speed and area) for multiplication, division, powers and roots. Moderate-precision addition in SLNS uses table lookup with properties similar to FP (including underflow). This paper proposes a new number system, called the Denormal LNS (DLNS), which is a hybrid of the properties of FXNS and SLNS. The inspiration for DLNS comes from the denormal numbers found in IEEE-754 (that provide better, gradual underflow) and the µ-law often used for speech encoding; the novel DLNS circuit here allows arithmetic to be performed directly on such encoded data. The proposed approach allows customizing the range in which gradual underflow occurs. A wide gradual underflow range acts like FXNS; a narrow one acts like SLNS. Simulation of an FFT application illustrates a moderate gradual underflow decreasing bit-switching activity 15% compared to underflow-free SLNS, at the cost of increasing application error by 30%. DLNS reduces switching activity 5% to 20% more than an abruptly-underflowing SLNS with one-half the error. Synthesis shows the novel circuit primarily consists of traditional SLNS addition and subtraction tables, with additional datapaths that allow the novel ALU to act on conventional SLNS as well as DLNS and mixed data, for a worst-case area overhead of 26%
Profile-directed specialisation of custom floating-point hardware
We present a methodology for generating
floating-point arithmetic hardware
designs which are, for suitable applications, much reduced in size, while still
retaining performance and IEEE-754 compliance. Our system uses three
key parts: a profiling tool, a set of customisable
floating-point units and a
selection of system integration methods.
We use a profiling tool for
floating-point behaviour to identify arithmetic
operations where fundamental elements of IEEE-754
floating-point may be
compromised, without generating erroneous results in the common case.
In the uncommon case, we use simple detection logic to determine when
operands lie outside the range of capabilities of the optimised hardware.
Out-of-range operations are handled by a separate, fully capable,
floatingpoint
implementation, either on-chip or by returning calculations to a host
processor. We present methods of system integration to achieve this errorcorrection.
Thus the system suffers no compromise in IEEE-754 compliance,
even when the synthesised hardware would generate erroneous results.
In particular, we identify from input operands the shift amounts required
for input operand alignment and post-operation normalisation. For operations
where these are small, we synthesise hardware with reduced-size
barrel-shifters. We also propose optimisations to take advantage of other
profile-exposed behaviours, including removing the hardware required to
swap operands in a floating-point adder or subtractor, and reducing the
exponent range to fit observed values.
We present profiling results for a range of applications, including a selection
of computational science programs, Spec FP 95 benchmarks and the
FFMPEG media processing tool, indicating which would be amenable to
our method. Selected applications which demonstrate potential for optimisation
are then taken through to a hardware implementation. We show up
to a 45% decrease in hardware size for a
floating-point datapath, with a
correctable error-rate of less then 3%, even with non-profiled datasets
Implementation of a real time Hough transform using FPGA technology
This thesis is concerned with the modelling, design and implementation of efficient architectures for performing the Hough Transform (HT) on mega-pixel resolution real-time images using Field Programmable Gate Array (FPGA) technology. Although the HT has been around for many years and a number of algorithms have been developed it still remains a significant bottleneck in many image processing applications.
Even though, the basic idea of the HT is to locate curves in an image that can be parameterized: e.g. straight lines, polynomials or circles, in a suitable parameter space, the research presented in this thesis will focus only on location of straight lines on binary images. The HT algorithm uses an accumulator array (accumulator bins) to detect the existence of a straight line on an image. As the image needs to be binarized, a novel generic synchronization circuit for windowing operations was designed to perform edge detection. An edge detection method of special interest, the canny method, is used and the design and implementation of it in hardware is achieved in this thesis.
As each image pixel can be implemented independently, parallel processing can be performed. However, the main disadvantage of the HT is the large storage and computational requirements. This thesis presents new and state-of-the-art hardware implementations for the minimization of the computational cost, using the Hybrid-Logarithmic Number System (Hybrid-LNS) for calculating the HT for fixed bit-width architectures. It is shown that using the Hybrid-LNS the computational cost is minimized, while the precision of the HT algorithm is maintained.
Advances in FPGA technology now make it possible to implement functions as the HT in reconfigurable fabrics. Methods for storing large arrays on FPGA’s are presented, where data from a 1024 x 1024 pixel camera at a rate of up to 25 frames per second are processed
Fast, area-efficient 32-bit LNS for computer arithmetic operations
PhD ThesisThe logarithmic number system has been proposed as an alternative to floating-point.
Multiplication, division and square-root operations are accomplished with fixedpoint
arithmetic, but addition and subtraction are considerably more challenging.
Recent work has demonstrated that these operations too can be done with similar
speed and accuracy to their floating-point equivalents, but the necessary circuitry is
complex. In particular, it is dominated by the need for large lookup tables for the
storage of a non-linear function.
This thesis describes the architectures required to implement a newly design
approach for producing fast and area-efficient 32-bit LNS arithmetic unit. The
designs are structured based on two different algorithms. At first, a new cotransformation
procedure is introduced in the singularity region whilst performing
subtractions in which the technique capable to generate less total storage than the cotransformation
method in the previous LNS architecture. Secondly, improvement to
an existing interpolation process is proposed, that also reduce the total tables to an
extent that allows their easy synthesis in logic. Consequently, the total delays in the
system can be significantly reduced.
According to the comparison analysis with previous best LNS design and
floating-point units, it is shown that the new LNS architecture capable to offer
significantly better in speed while sustaining its accuracy within floating-point limit.
In addition, its implementation is more economical than previous best LNS system
and almost equivalent with existing floating-point arithmetic unit.University Malaysia Perlis:
Ministry of Higher Education, Malaysia
Application-Specific Number Representation
Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-
specific number representations. Well-known number formats include fixed-point, floating-
point, logarithmic number system (LNS), and residue number system (RNS). Such different
number representations lead to different arithmetic designs and error behaviours, thus produc-
ing implementations with different performance, accuracy, and cost.
To investigate the design options in number representations, the first part of this thesis presents
a platform that enables automated exploration of the number representation design space. The
second part of the thesis shows case studies that optimise the designs for area, latency or
throughput from the perspective of number representations.
Automated design space exploration in the first part addresses the following two major issues:
² Automation requires arithmetic unit generation. This thesis provides optimised
arithmetic library generators for logarithmic and residue arithmetic units, which support
a wide range of bit widths and achieve significant improvement over previous designs.
² Generation of arithmetic units requires specifying the bit widths for each
variable. This thesis describes an automatic bit-width optimisation tool called R-Tool,
which combines dynamic and static analysis methods, and supports different number
systems (fixed-point, floating-point, and LNS numbers).
Putting it all together, the second part explores the effects of application-specific number
representation on practical benchmarks, such as radiative Monte Carlo simulation, and seismic
imaging computations. Experimental results show that customising the number representations
brings benefits to hardware implementations: by selecting a more appropriate number format,
we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by
performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%.
On the performance side, hardware implementations with customised number formats achieve
5 to potentially over 40 times speedup over software implementations
Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays
The aim of this thesis is to compare the implementation of parameterisable LNS (logarithmic number system) and floating-point high dynamic range number systems on FPGA. The Virtex/Virtex-II range of FPGAs from Xilinx, which are the most popular FPGA technology, are used to implement the designs. The study focuses on using the low level primitives of the technology in an efficient way and so initially the design issues in implementing fixed-point operators are considered. The four basic operations of addition, multiplication, division and square root are considered. Carry- free adders, ripple-carry adders, parallel multipliers and digit recurrence division and square root are discussed. The floating-point operators use the word format and exceptions as described by the IEEE std-754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next. An overview of current FPGA methods is given and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section the LNS word format, accuracy and exceptions are described and two new LNS addition/subtraction function approximations are described. The algorithms for performing multiplication, division and powering in the LNS domain are also described and are compared with other designs in the open literature. Parameterisable conversion algorithms to convert to/from the fixed-point domain from/to the LNS and floating-point domain are described and implementation results given. In the next chapter MATLAB bit-true software models are given that have the exact functionality as the hardware models. The interfaces of the models are given and a serial communication system to perform low speed system tests is described. A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared and conclusions are drawn. The results show that when the LNS is implemented with a 6-bit or less characteristic it is superior to floating-point. However, for larger characteristic lengths the floating-point system is more efficient due to the delay and exponential area increase of the LNS addition operator. The LNS is beneficial for larger characteristics than 6-bits only for specialist applications that require a high portion of division, multiplication, square root, powering operations and few additions