22 research outputs found

    A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

    Full text link
    In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators in the design of heterogeneous HPC platforms has become increasingly pressing. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, ranging from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective on this rapidly evolving field. In particular, this work complements the previous survey by the same authors [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms.

    Hardware Acceleration of Complex Machine Learning Models through Modern High-Level Synthesis

    No full text
    Machine learning algorithms continue to receive significant attention from industry and research. As the models increase in complexity and accuracy, their computational and memory demands also grow, pushing for more powerful, heterogeneous architectures; custom FPGA/ASIC accelerators are often the best solution to efficiently process large amounts of data close to the sensors in large-scale scientific experiments. Previous works exploited high-level synthesis (HLS) to help design dedicated compute units for machine learning inference, proposing frameworks that translate high-level models into annotated C/C++. Our proposal, instead, integrates HLS in a compiler-based tool flow with multiple levels of abstraction, enabling analysis, optimization, and design space exploration along the whole process. Such an approach will also make it possible to explore models beyond multi-layer perceptrons and convolutional neural networks (which are often the main target of "classic" HLS frameworks), for example to address the different challenges posed by sparse and graph-based neural networks.