3 research outputs found

    Mapping for Maximum Performance on FPGA DSP Blocks


    A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

    In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, specific customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective in this rapidly evolving field. In particular, this work complements the previous survey proposed by the same authors in [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms.

    Diseño e implementación de una Unidad Aritmética de Coma Flotante (FPU) genérica y flexible [Design and Implementation of a Generic and Flexible Floating-Point Unit (FPU)]

    Graduation Project (Licentiate in Electronics Engineering), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería Electrónica, 2016. A methodology that measures the DSP performance of an ASP with low-power target applications is presented. Additionally, the timing, area, and power synthesis results for an adder, a multiplier, and a CORDIC floating-point unit, in the Artix 7 FPGA family and 0.13 µm technology, for single and double precision at various system frequencies, are presented. The pipelined adder achieves a maximum frequency of 350 MHz, the multiplier (with a simple Karatsuba significand multiplication) reaches 243 MHz, and the standalone CORDIC floating-point operator reaches 537 MHz.