4 research outputs found

FPGA-implementation of Time-Multiplexed Multiple Constant Multiplication based on carry-save arithmetic

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Design of approximate overclocked datapath

Author: Shi Kan
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/03/2016

Embedded applications often impose stringent latency requirements. While a high degree of parallelism within custom FPGA-based accelerators may help to some extent, it may also be necessary to limit the precision used in the datapath to boost the operating frequency of the implementation. However, reducing the precision introduces quantisation error into the design. In this thesis, we describe an alternative circuit design methodology for trading off accuracy, performance and silicon area. We compare two approaches that trade accuracy for performance. One is the traditional approach, where the precision used in the datapath is limited to meet a target latency. The other is a proposed new approach, which simply allows the datapath to operate without timing closure. We demonstrate analytically and experimentally that for many applications it would be preferable to simply overclock the design and accept that timing violations may arise. Since the errors introduced by timing violations occur rarely, they will cause less noise than quantisation errors. Furthermore, we show that conventional forms of computer arithmetic do not fail gracefully when pushed beyond the deterministic clocking region. We take a fresh look at Online Arithmetic, originally proposed for digit-serial operation, and synthesize unrolled digit-parallel online arithmetic operators to allow for graceful degradation. We quantify the impact of timing violations on key arithmetic primitives, and show that substantial performance benefits can be obtained in comparison to binary arithmetic. Since timing errors are caused by long carry chains, they result in errors in the least significant digits with online arithmetic, causing less impact than in conventional implementations.

Open Access

Spiral - Imperial College Digital Repository

Uso eficiente de aritmética redundante en FPGAs [Efficient use of redundant arithmetic in FPGAs]

Author: Ortiz Manuel A.
Publication venue: Universidad de Córdoba, Servicio de Publicaciones
Publication date: 01/01/2013

Until a few years ago, redundant arithmetic had been ruled out for use in FPGAs for two main reasons. First, carry-propagate adders performed well, thanks to the dedicated carry logic built into FPGAs and the small operand sizes in typical FPGA applications. Second, synthesis tools consumed excessive area when mapping units that operate in carry-save form. This work shows that carry-save redundant arithmetic can be used efficiently in FPGAs, achieving a higher operating speed at a reasonable resource cost. A new redundant format, double carry-save, is introduced, and the optimal way to build multipliers with large word widths is shown to be a combination of embedded multipliers and carry-save adders.
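
For readers unfamiliar with the carry-save representation this abstract refers to, the following is a minimal software sketch of the general idea, not the author's FPGA design. It models a 3:2 compressor on non-negative Python integers; the function names `carry_save_add` and `multi_operand_sum` are illustrative assumptions. It does not model the double carry-save format, whose details the abstract does not give.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: reduce three operands to a (sum, carry) pair.

    Every bit position is computed independently, so no carry ripples
    across the word. Assumes non-negative integers.
    """
    partial_sum = a ^ b ^ c                      # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved to the next weight
    return partial_sum, carry


def multi_operand_sum(operands: list[int]) -> int:
    """Accumulate many operands in carry-save form, then resolve once."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)           # value s + c grows by x, no propagation
    return s + c                                 # the single carry-propagate addition


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))               # (0, 14): 0 + 14 == 5 + 3 + 6
    print(multi_operand_sum([3, 5, 7, 11]))      # 26
```

The redundancy is that the running value is held as two words whose sum is the result; only the last step pays for a full carry chain.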

Repositorio Institucional de la Universidad de Córdoba

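Optimización de recursos hardware para la operación de convolución utilizada en el procesamiento digital de señales [Optimization of hardware resources for the convolution operation used in digital signal processing]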

Author: Moreno Moreno Carlos Diego
Publication venue: Universidad de Córdoba, Servicio de Publicaciones
Publication date: 01/01/2013

This thesis presents several architectures of the multiply-accumulate (MAC) unit for optimizing the convolution operation, which is widely used in digital signal processing, on several low-cost electronic devices. The optimization focuses mainly on Xilinx Spartan-3 and Spartan-6 FPGAs, using redundant arithmetic, specifically carry-save arithmetic (CSA). This type of arithmetic is not usually used on FPGAs because it increases area consumption, but this research shows experimentally that when the number of MAC operations is high, as in the convolution of two signals, CSA is efficient: it significantly reduces execution times without an excessive increase in the FPGA resources used. Other electronic devices commonly used in digital signal processing, such as DSPs and GPPs, have also been studied, and a comparison of execution times between the FPGAs and these devices is included.
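
To illustrate why a long run of MAC operations suits carry-save accumulation, here is a hedged software sketch of a convolution whose accumulator stays in (sum, carry) form until each output sample is complete. It is not one of the thesis architectures: the names `csa` and `convolve_carry_save` are hypothetical, the products come from ordinary integer multiplication, and non-negative integer samples are assumed.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save step: the returned pair sums to a + b + c."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def convolve_carry_save(x: list[int], h: list[int]) -> list[int]:
    """Linear convolution with a carry-save MAC accumulator.

    Each product is folded into the (sum, carry) pair without carry
    propagation; one ordinary addition per output sample resolves it.
    """
    if not x or not h:
        return []
    out = []
    for k in range(len(x) + len(h) - 1):
        s, c = 0, 0
        for i, xi in enumerate(x):
            j = k - i
            if 0 <= j < len(h):
                s, c = csa(s, c, xi * h[j])   # multiply, then accumulate redundantly
        out.append(s + c)                     # single carry-propagate add per output
    return out


if __name__ == "__main__":
    print(convolve_carry_save([1, 2, 3], [4, 5]))   # [4, 13, 22, 15]
```

The more products contribute to each output, the more carry-propagate additions are replaced by a single final one, which matches the abstract's observation that the benefit appears when the number of MAC operations is high.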

Repositorio Institucional de la Universidad de Córdoba