25 research outputs found

    A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

    Full text link
    In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become increasingly pressing in the design of heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective on this rapidly evolving field. In particular, this work complements the previous survey by the same authors in [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms.

    TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale

    Get PDF
    To achieve high performance and high energy efficiency on near-future exascale computing systems, three key technology gaps need to be bridged: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetics; and methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA aims at tackling these gaps through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models, and tools derived from European research.

    Massively Parallel Processing Approach To Fractal Image Compression

    No full text
    In recent years, fractal image compression (IFS) techniques have gained increasing interest because of their ability to achieve high compression ratios while maintaining very good quality in the reconstructed image. The main drawback of such techniques is the very high computing time needed to determine the compressed code. In this paper, after a brief description of IFS theory, we discuss its parallel implementation by comparing the different levels of exploitable parallelism. We show that Massively Parallel Processing on SIMD machines is the best way to exploit the large-granularity parallelism present in this problem. Finally, we give some results achieved by implementing the IFS compression technique on the MPP APE100/Quadrics machine. 1. INTRODUCTION Fractal image compression techniques were introduced by Barnsley [Bar 88]. The image is represented through a piecewise linear contractive function F and is reconstructed by iteratively applying F to a randomly chosen st..
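
    As a rough illustration of the iterative decoding just described, the following is a minimal sketch of applying a contractive map F to an arbitrary starting image until it converges to the encoded attractor. It assumes a partitioned IFS in which each range block is produced from a downsampled, affinely transformed domain block; the codebook layout (ry, rx, dy, dx, s, o), the block and image sizes, and the NumPy formulation are illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch of IFS decoding: each 4x4 range block is rebuilt from
    # a downsampled 8x8 domain block via an affine map r = s * d + o.
    # Codebook format and block sizes are assumed for illustration only.
    import numpy as np

    def decode_ifs(codebook, image_size=64, block=4, iterations=10):
        """Reconstruct an image by repeatedly applying the contractive map F."""
        img = np.random.rand(image_size, image_size)       # arbitrary starting image
        for _ in range(iterations):
            new = np.zeros_like(img)
            for (ry, rx, dy, dx, s, o) in codebook:
                # fetch the 2*block x 2*block domain block and shrink it by 2
                dom = img[dy:dy + 2 * block, dx:dx + 2 * block]
                dom = dom.reshape(block, 2, block, 2).mean(axis=(1, 3))
                # contractive affine mapping onto the range block
                new[ry:ry + block, rx:rx + block] = s * dom + o
            img = new                                       # one application of F
        return img

    Because F is contractive, the result is essentially independent of the starting image, which is why a random start suffices.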

    Hyper-Systolic Implementation of BLAS-3 Routines on the APE100/Quadrics Machine

    No full text
    Basic Linear Algebra Subroutines (BLAS-3) [1] are building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is a key issue for the maximal exploitation of the system's computational power. In this work we consider a massively parallel SIMD machine (the APE100/Quadrics [2]) and the adoption of the hyper-systolic method [3, 6, 4] to efficiently implement BLAS-3 on such a machine. The results we achieved (nearly 60-70% of peak performance for large matrices) demonstrate the validity of the proposed approach. The work is structured as follows: section 1 reviews BLAS-3, section 2 recalls the hyper-systolic method, section 3 describes the target machine, section 4 presents the HS implementation, and section 5 gives some experimental results. Keywords: BLAS-3, hyper-systolic, massively p..
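
    For readers unfamiliar with level-3 BLAS, the following is a minimal blocked matrix-multiply (GEMM-like) sketch of the kind of kernel such routines provide. It only illustrates the block decomposition that BLAS-3 operations are built on; it does not reproduce the hyper-systolic data movement used on the APE100/Quadrics machine. The block size nb and the NumPy formulation are assumptions for illustration.

    # Blocked GEMM: C = alpha * A @ B + beta * C, the canonical BLAS-3 kernel.
    # Purely illustrative; no hyper-systolic communication scheme is modeled.
    import numpy as np

    def blocked_gemm(A, B, C, alpha=1.0, beta=1.0, nb=32):
        n, k = A.shape
        k2, m = B.shape
        assert k == k2 and C.shape == (n, m)
        C *= beta
        for i in range(0, n, nb):
            for j in range(0, m, nb):
                for p in range(0, k, nb):
                    # rank-nb update of the (i, j) block of C
                    C[i:i+nb, j:j+nb] += alpha * A[i:i+nb, p:p+nb] @ B[p:p+nb, j:j+nb]
        return C

    Operating on nb x nb blocks keeps data resident in fast local memory while it is reused, which is what makes level-3 routines the natural target for high sustained performance on parallel machines.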