4 research outputs found
High-Throughput DTW accelerator with minimum area in AMD FPGA by HLS.
Dynamic Time Warping (DTW) is a dynamic programming
algorithm that is known to be one of the best methods
to measure the similarity between two signals, even when their
speeds vary. It is extensively used in many
machine learning algorithms, especially for pattern recognition
and classification. Unfortunately, it has a quadratic complexity,
which results in very high computational costs. Furthermore,
its data dependencies also make it very difficult to parallelize.
Special attention has been paid to computing DTW on the edge,
as a way to reduce the communication load in Internet-of-Things
applications. In this work, we propose a minimum-area
implementation of the DTW algorithm in AMD FPGAs with
optimal use of the resources. That is achieved by maximizing
the use time of the resources and taking advantage of the inner
structure of the AMD FPGAs. This architecture could be used in
small devices or as a base for a multi-core implementation with
very high throughput.
Supported by MCIN/AEI/10.13039/501100011033 and European Union Next Generation EU/PRTR under Project TED2021-131527B-I00; by the Fondo Europeo de Desarrollo Regional (UMA20-FEDERJA-059); and by the AMD™ (Xilinx™) University Program.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
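The quadratic cost and the data dependency mentioned in the abstract above can be seen in a minimal software sketch of the classic DTW recurrence. This is a plain Python illustration, not the HLS architecture of the paper; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for the example.

```python
def dtw_distance(a, b):
    """Classic DTW with an absolute-difference local cost (assumed here).

    Runs in O(len(a) * len(b)) time, which is the quadratic complexity
    the abstract refers to.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Data dependency: each cell needs its upper, left, and
            # upper-left neighbours before it can be computed.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# The second signal is a slowed-down copy of the first, so the distance is 0.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because every cell depends on three previously computed neighbours, only cells on the same anti-diagonal are independent of each other, which is what makes the algorithm hard to parallelize.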
Reproducible Summation under HUB Format
Version different from the paper presented at the conference.
Floating-point reproducibility is a property
claimed by programmers and end users. Half-Unit-Biased
(HUB) is a new representation format in which the round
to nearest is carried out by truncation, preventing any carry
propagation and saving time and area. In this paper we study
the reproducible summation of HUB numbers by using an error-free
vector transformation technique, providing both a specific
architecture and the usage of combined HUB/Standard floating-point
adders to achieve a reproducible result.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
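As background for the error-free vector transformation mentioned in the abstract above, the following sketch shows the well-known TwoSum error-free transformation and one transformation pass over a vector. It uses standard IEEE 754 binary64 arithmetic (Python floats), not the HUB format, and it does not reproduce the paper's architecture; the names `two_sum`, `vec_sum`, and `exact_sum` are chosen for this example only.

```python
from fractions import Fraction


def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and s + e == a + b exactly.

    Assumes round-to-nearest binary floating point and no overflow.
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def vec_sum(values):
    """One error-free pass: rewrites the vector without changing its exact sum."""
    p = list(values)
    for i in range(1, len(p)):
        # p[i] receives the rounded partial sum, p[i - 1] the rounding error.
        p[i], p[i - 1] = two_sum(p[i], p[i - 1])
    return p


def exact_sum(values):
    """Exact rational sum, used only to check the transformation."""
    return sum(Fraction(v) for v in values)


v = [1e16, 1.0, -1e16, 0.1]
print(exact_sum(vec_sum(v)) == exact_sum(v))
```

The transformed vector holds the same exact sum as the input, with the rounding errors kept as separate terms instead of being discarded, which is the property that reproducible summation schemes build on.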
Efficient Floating-Point Representation for Balanced Codes for FPGA Devices
Paper awarded the Best Paper Award.
We propose a floating–point representation to deal
efficiently with arithmetic operations in codes with a balanced
number of additions and multiplications for FPGA devices. The
variable shift operation is very slow in these devices. We propose
a format that reduces the variable shifter penalty. It is based on
a radix–64 representation such that the number of the possible
shifts is considerably reduced. Thus, the execution time of the
floating–point addition is highly optimized when it is performed
in an FPGA device, which compensates for the multiplication
penalty when a high radix is used, as experimental results have
shown. Consequently, the main problem of previous specific high-radix
FPGA designs (no speedup for codes with a balanced
number of multiplications and additions) is overcome with our
proposal. The inherent architecture supporting the new format
works with greater bit precision than the corresponding single
precision (SP) IEEE–754 standard.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. IEEE, IEEE Computer Society
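The claim in the abstract above that a radix-64 format reduces the number of possible shifts can be illustrated with a small count. This is only a sketch under stated assumptions: the helper `distinct_shift_amounts` is hypothetical, and the 24-bit significand width is the IEEE 754 single-precision value used for comparison, not the width of the proposed format, which the abstract says is larger.

```python
def distinct_shift_amounts(significand_bits, radix_bits):
    """List the alignment shifts (in bits) an adder may need.

    With exponents counted in units of `radix_bits`, the smaller operand
    is shifted by a multiple of `radix_bits`; shifts past the significand
    width push the operand out entirely, so they are not listed.
    """
    max_digits = -(-significand_bits // radix_bits)  # ceiling division
    return [k * radix_bits for k in range(max_digits + 1)]


# Radix 2 (1 bit per digit) versus radix 64 (6 bits per digit),
# both on an assumed 24-bit significand.
print(len(distinct_shift_amounts(24, 1)))
print(distinct_shift_amounts(24, 6))
```

Under these assumptions the radix-2 adder has to select among 25 shift amounts while the radix-64 adder selects among 5, which gives an intuition for why the variable shifter becomes cheaper.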
Guía docente común de las titulaciones de Ingeniero en Electrónica en las universidades andaluzas (Common Teaching Guide for the Electronics Engineer Degrees at Andalusian Universities)
This document is the result of work carried out under the Call for the Preparation of Teaching Guides for Andalusian Degrees in accordance with the European Credit System (academic years 2005/2006) of the Directorate-General for Universities, which reports to the General Secretariat for Universities, Research and Technology of the Regional Ministry of Innovation, Science and Enterprise of the Junta de Andalucía.