4 research outputs found

    High-Throughput DTW accelerator with minimum area in AMD FPGA by HLS.

    Get PDF
    Dynamic Time Warping (DTW) is a dynamic programming algorithm that is known to be one of the best methods to measure the similarities between two signals, even if there are variations in the speed of those. It is extensively used in many machine learning algorithms, especially for pattern recognition and classification. U nfortunately, i t h as a q uadratic complexity, which results in very high computational costs. Furthermore, its data dependency made it also very difficult t o parallelize. Special attention has been paid to computing DTW on the edge, as a way to reduce the load of communication on Internet-of- Thing applications. In this work, we propose a minimum area implementation of the DTW algorithm in AMD FPGAs with optimal use of the resources. That is achieved by maximizing the use time of the resources and taking advantage of the inner structure of the AMD FPGAs. This architecture could be used in small devices or as a base for a multi-core implementation with very high-throughput.MCIN/AEI/10.13039/501100011033and European Union Next Generation EU/PRTR under Project TED2021- 131527B-I00; by the Fondo Europeo de Desarrollo Regional (UMA20-FEDERJA-059); and by AMD™(Xilinx™) University Program Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Reproducible SUmmation under HUB Format

    Get PDF
    Version diferente del paper presentado en el congresoFloating point reproducibility is a property claimed by programmers and end users. Half-Unit-Biased (HUB) is a new representation format in which the round to nearest is carried out by truncation, preventing any carry propagation and saving time and area. In this paper we study the reproducible summation of HUB numbers by using a errorfree vector transformation technique, providing both a specific architecture and the usage of combined HUB/Standard floating point adders to achieve a reproducible resultUniversidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Efficient Floating-Point Representation for Balanced Codes for FPGA Devices

    Get PDF
    Trabajo premiado con Best paper AwardWe propose a floating–point representation to deal efficiently with arithmetic operations in codes with a balanced number of additions and multiplications for FPGA devices. The variable shift operation is very slow in these devices. We propose a format that reduces the variable shifter penalty. It is based on a radix–64 representation such that the number of the possible shifts is considerably reduced. Thus, the execution time of the floating–point addition is highly optimized when it is performed in an FPGA device, which compensates for the multiplication penalty when a high radix is used, as experimental results have shown. Consequently, the main problem of previous specific highradix FPGA designs (no speedup for codes with a balanced number of multiplications and additions) is overcome with our proposal. The inherent architecture supporting the new format works with greater bit precision than the corresponding single precision (SP) IEEE–754 standard.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. IEEE, IEEE Computer Societ
    corecore