4 research outputs found
Design of approximate overclocked datapath
Embedded applications often have stringent latency requirements. While a high degree of parallelism within custom FPGA-based accelerators may help to some extent, it may also be necessary to limit the precision used in the datapath to boost the operating frequency of the implementation. However, reducing the precision introduces quantisation error into the design.
In this thesis, we describe an alternative circuit design methodology when considering trade-offs between accuracy, performance and silicon area. We compare two different approaches that could trade accuracy for performance. One is the traditional approach where the precision used in the datapath is limited to meet a target latency. The other is a proposed new approach which simply allows the datapath to operate without timing closure. We demonstrate analytically and experimentally that for many applications it would be preferable to simply overclock the design and accept that timing violations may arise. Since the errors introduced by timing violations occur rarely, they will cause less noise than quantisation errors.
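The claim that timing violations occur rarely can be illustrated with a small experiment. The sketch below is not the thesis's method: it assumes a simple ripple-carry adder model, in which the slowest path is the longest carry chain, and uniformly random operands. The names `longest_carry_chain` and `fraction_exceeding` are hypothetical helpers introduced only for this illustration.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest distance (in bit positions) a carry travels when adding
    a and b in a ripple-carry adder model (an assumption of this sketch)."""
    longest = current = 0
    carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y       # this position creates a carry
        propagate = x ^ y      # this position passes an incoming carry on
        if generate:
            current = 1        # a new chain starts here
        elif propagate and carry:
            current += 1       # the incoming carry ripples one position further
        else:
            current = 0        # the chain (if any) ends here
        carry = generate | (propagate & carry)
        longest = max(longest, current)
    return longest


def fraction_exceeding(width: int, max_chain: int, trials: int, seed: int = 0) -> float:
    """Fraction of random additions whose longest carry chain exceeds
    max_chain, i.e. would miss timing if the clock period only allowed
    max_chain positions of carry propagation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        if longest_carry_chain(a, b, width) > max_chain:
            hits += 1
    return hits / trials
```

For example, `fraction_exceeding(32, 16, 100_000)` estimates how often a 32-bit addition would need more than 16 positions of carry propagation under these assumptions. Real circuits also depend on routing delay, operand statistics and process variation, none of which this model captures.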
Furthermore, we show that conventional forms of computer arithmetic do not fail gracefully when pushed beyond the deterministic clocking region. In this thesis we take a fresh look at online arithmetic, originally proposed for digit-serial operation, and synthesize unrolled digit-parallel online arithmetic operators to allow for graceful degradation. We quantify the impact of timing violations on key arithmetic primitives, and show that substantial performance benefits can be obtained in comparison to binary arithmetic. Since timing errors are caused by long carry chains, they result in errors in the least significant digits with online arithmetic, causing less impact than in conventional implementations.
Open Access
Uso eficiente de aritmética redundante en FPGAs
Until a few years ago, redundant arithmetic had been ruled out for use in FPGAs, mainly for two reasons. First, carry-propagate adders performed well, thanks to the dedicated carry logic built into FPGAs and the small operand sizes in typical FPGA applications. Second, synthesis tools consumed excessive area when mapping units that operate in carry-save form.
In this work, we show that carry-save redundant arithmetic can be used efficiently in FPGAs, achieving a significant increase in operating speed at a reasonable resource cost. A new redundant format, double carry-save, is introduced, and the optimal way to implement multipliers with large word widths is shown to be a combination of embedded multipliers and carry-save adders.
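For readers unfamiliar with carry-save arithmetic, the following is a minimal software sketch of the standard 3:2 compressor, assuming non-negative integer operands. It does not reproduce the double carry-save format or the FPGA mappings of the thesis, which the abstract does not detail. The function names are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: reduce three operands to a (sum, carry) pair.
    Each bit position is computed independently, so no carry ripples
    across the word."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def sum_operands(operands: list[int]) -> int:
    """Accumulate many operands in redundant (sum, carry) form and
    resolve them with a single carry-propagate addition at the end."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    return s + c  # the only carry-propagate addition


print(sum_operands([3, 5, 7, 11]))  # prints 26
```

The invariant is that `partial_sum + carry` always equals `a + b + c`, so the slow carry-propagate step is deferred until the final result is needed.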
Optimización de recursos hardware para la operación de convolución utilizada en el procesamiento digital de señales
This thesis presents several architectures of the multiply-accumulate (MAC) unit to optimize the convolution operation, which is widely used in digital signal processing, on several low-cost electronic devices. The optimization focuses mainly on Xilinx Spartan-3 and Spartan-6 FPGAs and uses redundant arithmetic, specifically carry-save arithmetic (CSA). This type of arithmetic is not usually used on FPGAs because it increases the area consumed, but this research shows experimentally that when the number of MAC operations to perform is high, as in the convolution of two signals, CSA is efficient: it significantly reduces execution times without an excessive increase in the FPGA resources used.
In addition, other electronic devices commonly used in digital signal processing, such as DSPs and GPPs, have been studied, and the execution times of the FPGAs are compared against those of these devices.
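The idea of a MAC unit with a carry-save accumulator can be sketched in software as follows. This is an illustration under stated assumptions, not one of the thesis's architectures: inputs are non-negative integers, Python's `*` stands in for a hardware multiplier, and `csa` and `convolve_mac` are hypothetical names.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save compressor: returns a (sum, carry) pair whose
    total equals a + b + c, without propagating carries."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def convolve_mac(x: list[int], h: list[int]) -> list[int]:
    """Direct-form convolution y[n] = sum_k x[k] * h[n - k].
    The accumulator stays in redundant (sum, carry) form across all
    MAC steps of one output sample and is resolved only once."""
    y = []
    for n in range(len(x) + len(h) - 1):
        acc_sum, acc_carry = 0, 0
        for k in range(len(x)):
            j = n - k
            if 0 <= j < len(h):
                product = x[k] * h[j]                      # multiply
                acc_sum, acc_carry = csa(acc_sum, acc_carry, product)  # accumulate
        y.append(acc_sum + acc_carry)  # one carry-propagate add per output
    return y


print(convolve_mac([1, 2, 3], [1, 1]))  # prints [1, 3, 5, 3]
```

The more MAC steps each output sample needs, the more carry-propagate additions are replaced by carry-free compressor steps, which is consistent with the abstract's observation that CSA pays off when the number of MAC operations is high.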