Efficient use of redundant arithmetic in FPGAs
Until a few years ago, redundant arithmetic had been ruled out for use in FPGAs for two
main reasons. First, carry-propagate adders performed well, thanks to the dedicated carry
logic built into FPGAs and the small operand sizes in typical FPGA applications. Second,
synthesis tools consumed an excessive amount of area when mapping units that operate in
carry-save form.
This work shows that carry-save redundant arithmetic can be used efficiently in FPGAs,
achieving a higher operating speed at a reasonable resource cost. A new redundant format,
double carry-save, is introduced, and it is shown that the optimal way to implement
multipliers with large word widths is to combine embedded multipliers with carry-save
adders.
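As a rough illustration of the carry-save idea mentioned in this abstract, the sketch below models a textbook 3:2 compressor in software. It is not the double carry-save format introduced in the thesis, whose details are not given here. The function names `carry_save_add` and `add_many` are hypothetical, and operands are assumed to be non-negative integers.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three operands to a (sum, carry) pair with no carry propagation.

    Each bit position is handled independently, like a row of full adders
    whose carry outputs are not chained together.
    """
    partial_sum = a ^ b ^ c                      # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, weighted by 2
    return partial_sum, carry


def add_many(operands: list[int]) -> int:
    """Accumulate operands in redundant (sum, carry) form.

    Only the final step performs a conventional carry-propagate addition.
    """
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    return s + c  # the single carry-propagate addition


if __name__ == "__main__":
    print(add_many([13, 7, 9, 21]))
```

The speed benefit in hardware comes from the fact that each `carry_save_add` step has a delay independent of the word width, so the slow carry-propagate addition is paid only once at the end.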
Integrated Programmable-Array accelerator to design heterogeneous ultra-low power manycore architectures
There is an ever-increasing demand for energy efficiency (EE) in rapidly evolving Internet-of-Things end nodes. This pushes researchers and engineers to develop solutions that provide both Application-Specific Integrated Circuit-like EE and Field-Programmable Gate Array-like flexibility. One such solution is the Coarse-Grain Reconfigurable Array (CGRA). Over the past decades, CGRAs have evolved and are competing to become mainstream hardware accelerators, especially for Digital Signal Processing (DSP) applications. Due to the over-specialization of computing architectures, the focus is shifting towards fitting an extensive data representation range into fewer bits; for example, a 32-bit word can represent a wider data range with a floating-point (FP) representation than with an integer representation. Computation using FP representation requires numerous encodings and leads to complex circuits for the FP operators, decreasing the EE of the entire system. This thesis presents the design of an energy-efficient, ultra-low-power CGRA with native support for FP computation by leveraging an emerging paradigm of approximate computing called transprecision computing. We also present contributions to the compilation toolchain and to the system-level integration of the CGRA in a System-on-Chip, so that the proposed CGRA can serve as an energy-efficient hardware accelerator. Finally, an extensive set of experiments using real-world algorithms employed in near-sensor processing applications is performed, and the results are compared with state-of-the-art (SoA) architectures. It is empirically shown that the proposed CGRA provides better results than SoA architectures in terms of power, performance, and area.
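The range claim about 32-bit representations can be checked with a small sketch. It assumes IEEE 754 single precision as the 32-bit FP format, which the abstract does not state explicitly, and the helper names are hypothetical. It also shows the trade-off that motivates choosing precision per task: the wider range comes at the cost of exactness for large integers.

```python
import struct

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer


def float32_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


def round_to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


# Largest finite float32: all-ones mantissa with the maximum finite exponent.
FLOAT32_MAX = float32_from_bits(0x7F7FFFFF)

if __name__ == "__main__":
    print(f"int32 max:   {INT32_MAX}")
    print(f"float32 max: {FLOAT32_MAX:.4e}")
    # float32 has a 24-bit significand, so 2**24 + 1 is not representable.
    print(round_to_float32(float(2**24 + 1)) == float(2**24 + 1))
```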