
    SECDA-TFLite: a toolkit for efficient development of FPGA-based DNN accelerators for edge inference

    In this paper we propose SECDA-TFLite, a new open-source toolkit for developing DNN hardware accelerators, integrated within the TFLite framework. The toolkit leverages the principles of SECDA, a hardware/software co-design methodology, to reduce the design time of optimized DNN inference accelerators on edge devices with FPGAs. With SECDA-TFLite, we reduce the initial setup costs associated with integrating a new accelerator design within a target DNN framework, allowing developers to focus on the design. SECDA-TFLite also includes modules for cost-effective SystemC simulation, profiling, and AXI-based data communication. As a case study, we use SECDA-TFLite to develop and evaluate three accelerator designs across seven common CNN models and two BERT-based models against an ARM A9 CPU-only baseline, achieving an average performance speedup across models of up to 3.4× for the CNN models and up to 2.5× for the BERT-based models. Our code is available at https://github.com/gicLAB/SECDA-TFLite.
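    The toolkit's central idea is that an accelerated TFLite operator hands its tensors to an accelerator driver instead of running on the CPU. The sketch below is a minimal, self-contained illustration of that call pattern only; the `AcceleratorDriver` interface, `SimDriver` class, and `OffloadedFullyConnected` function are hypothetical stand-ins, not the actual SECDA-TFLite API.

```cpp
// Minimal sketch (not the actual SECDA-TFLite API): a hypothetical accelerator
// "driver" interface that an offloaded TFLite kernel could call into.
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical common interface for an offloaded int8 GEMM.
struct AcceleratorDriver {
  virtual ~AcceleratorDriver() = default;
  virtual void Gemm(const int8_t* a, const int8_t* b, int32_t* c,
                    int m, int n, int k) = 0;
};

// Software backend: stands in for the toolkit's SystemC-simulated accelerator.
struct SimDriver : AcceleratorDriver {
  void Gemm(const int8_t* a, const int8_t* b, int32_t* c,
            int m, int n, int k) override {
    for (int i = 0; i < m; ++i)
      for (int j = 0; j < n; ++j) {
        int32_t acc = 0;
        for (int p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
        c[i * n + j] = acc;
      }
  }
};

// A custom TFLite kernel or delegate would hand its buffers to the driver.
void OffloadedFullyConnected(AcceleratorDriver& drv,
                             const std::vector<int8_t>& input,
                             const std::vector<int8_t>& weights,
                             std::vector<int32_t>& output,
                             int m, int n, int k) {
  drv.Gemm(input.data(), weights.data(), output.data(), m, n, k);
}

int main() {
  SimDriver sim;  // simulation backend used while developing the design
  std::vector<int8_t> a(4 * 8, 1), b(8 * 2, 2);
  std::vector<int32_t> c(4 * 2, 0);
  OffloadedFullyConnected(sim, a, b, c, 4, 2, 8);
  std::cout << "c[0] = " << c[0] << "\n";  // expect 16
  return 0;
}
```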

    Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC

    Recently, there has been rapidly growing demand for faster machine learning (ML) processing in data centers and for migration of ML inference applications to edge devices. These developments have prompted both industry and academia to explore custom accelerators to optimize ML executions for performance and power. However, identifying which accelerator is best equipped for performing a particular ML task is challenging, especially given the growing range of ML tasks, the number of target environments, and the limited number of integrated modeling tools. To tackle this issue, it is of paramount importance to provide the computer architecture research community with a common framework capable of performing a comprehensive, uniform, and fair comparison across different accelerator designs targeting a particular ML task. To this aim, we propose a new framework named TFLITE-SOC (System On Chip) that integrates a lightweight system modeling library (SystemC) for fast design space exploration of custom ML accelerators into the build/execution environment of TensorFlow Lite (TFLite), a highly popular framework for ML inference. Using this approach, we are able to model and evaluate new accelerators developed in SystemC by leveraging the language's hierarchical design capabilities, resulting in faster design prototyping. Furthermore, any accelerator designed using TFLITE-SOC can be benchmarked for inference with any DNN model compatible with TFLite, which enables end-to-end DNN processing and detailed (i.e., per-DNN-layer) performance analysis. In addition to providing rapid prototyping, integrated benchmarking, and a range of platform configurations, TFLITE-SOC offers comprehensive performance analysis of accelerator occupancy and execution time breakdown, as well as a rich set of modules that new accelerators can use for scale-up studies and optimized memory transfer protocols. We present our framework and demonstrate its utility by considering the design space of a TPU-like systolic array and describing possible directions for optimization. Using a compression technique, we implement an optimization that reduces the memory traffic between DRAM and on-device buffers. Compared to the baseline accelerator, our optimized design shows up to 1.26× speedup on accelerated operations and up to 1.19× speedup on end-to-end DNN execution.
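    Because TFLITE-SOC models accelerators in SystemC, a design-space study of a TPU-like systolic array starts from small hierarchical modules such as the processing element sketched below. This is a generic, illustrative SystemC module with assumed port names, not code taken from the framework.

```cpp
#include <systemc.h>

// One processing element (PE) of a TPU-like systolic array, modeled in SystemC.
// Module and port names are illustrative, not taken from TFLITE-SOC.
SC_MODULE(PE) {
  sc_in<bool> clk;
  sc_in<sc_int<8>>   a_in, w_in;    // activation and weight streaming in
  sc_out<sc_int<8>>  a_out, w_out;  // forwarded to the neighbouring PEs
  sc_out<sc_int<32>> acc_out;       // running partial sum

  sc_int<32> acc;

  void step() {
    acc += a_in.read() * w_in.read();  // multiply-accumulate on each clock edge
    a_out.write(a_in.read());
    w_out.write(w_in.read());
    acc_out.write(acc);
  }

  SC_CTOR(PE) : acc(0) {
    SC_METHOD(step);
    sensitive << clk.pos();
  }
};

int sc_main(int, char*[]) {
  sc_clock clk("clk", 10, SC_NS);
  sc_signal<sc_int<8>>  a, w, a_fwd, w_fwd;
  sc_signal<sc_int<32>> acc;

  PE pe("pe0");
  pe.clk(clk);
  pe.a_in(a);      pe.w_in(w);
  pe.a_out(a_fwd); pe.w_out(w_fwd);
  pe.acc_out(acc);

  a.write(3); w.write(2);   // feed one constant activation/weight pair
  sc_start(50, SC_NS);      // simulate a few cycles
  std::cout << "partial sum: " << acc.read() << std::endl;
  return 0;
}
```

    A full array model would instantiate a grid of such PEs and chain `a_out`/`w_out` to the neighbours' inputs, which is where SystemC's hierarchical design capabilities pay off.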

    Spartan: a sparsity-adaptive framework to accelerate deep neural network training on GPUs

    Deep Neural Networks (DNNs) have emerged as an important class of machine learning algorithms, providing accurate solutions to a broad range of applications. Sparsity in activation maps during DNN training presents an opportunity to reduce computation. However, exploiting activation sparsity presents two major challenges: i) profiling activation sparsity during training comes with significant overhead due to computing the degree of sparsity and the associated data movement; ii) the dynamic nature of activation maps requires dynamic dense-to-sparse conversion during training, which also incurs significant overhead. In this paper, we present Spartan, a lightweight hardware/software framework to accelerate DNN training on a GPU. Spartan provides a cost-effective and programmer-transparent microarchitectural solution that exploits the activation sparsity detected during training. Spartan provides an efficient sparsity monitor, a tile-based sparse GEMM algorithm, and a novel compaction engine designed for GPU workloads. Spartan can reduce sparsity profiling overhead by 52.5× on average. For the most compute-intensive layers, i.e., convolutional layers, it can speed up AlexNet by 3.4×, VGGNet-16 by 2.14×, and ResNet-18 by 2.02× when training on the ImageNet dataset.
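    The two key mechanisms, a sparsity monitor and a tile-based sparse GEMM, can be pictured with a plain C++ sketch. The real system implements them as GPU microarchitecture extensions and kernels; the tile size, function names, and CPU-side implementation below are purely illustrative.

```cpp
// CPU-side sketch (not Spartan's GPU microarchitecture) of the core idea:
// profile activation sparsity per tile, then skip all-zero tiles in the GEMM.
#include <cstdio>
#include <vector>

constexpr int T = 4;  // tile size (illustrative)

// "Sparsity monitor": one nonzero count per TxT tile of the activation matrix A.
std::vector<int> TileNonzeros(const std::vector<float>& A, int m, int k) {
  std::vector<int> nnz((m / T) * (k / T), 0);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < k; ++j)
      if (A[i * k + j] != 0.0f) ++nnz[(i / T) * (k / T) + (j / T)];
  return nnz;
}

// Tile-based GEMM C += A*B that skips tiles of A the monitor found empty.
void SparseAwareGemm(const std::vector<float>& A, const std::vector<float>& B,
                     std::vector<float>& C, int m, int n, int k) {
  const std::vector<int> nnz = TileNonzeros(A, m, k);
  for (int ti = 0; ti < m / T; ++ti)
    for (int tk = 0; tk < k / T; ++tk) {
      if (nnz[ti * (k / T) + tk] == 0) continue;  // dynamic tile skip
      for (int i = ti * T; i < (ti + 1) * T; ++i)
        for (int p = tk * T; p < (tk + 1) * T; ++p) {
          const float a = A[i * k + p];
          if (a == 0.0f) continue;
          for (int j = 0; j < n; ++j) C[i * n + j] += a * B[p * n + j];
        }
    }
}

int main() {
  int m = 8, n = 8, k = 8;
  std::vector<float> A(m * k, 0.0f), B(k * n, 1.0f), C(m * n, 0.0f);
  A[0] = 2.0f;  // mostly-zero activations, as after a ReLU
  SparseAwareGemm(A, B, C, m, n, k);
  std::printf("C[0] = %.1f\n", C[0]);  // expect 2.0
  return 0;
}
```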

    Hardware/Software Co-Design of Edge DNN Accelerators with TFLite

    In this work we discuss SECDA-TFLite, an open-source toolkit for developing DNN hardware accelerators, integrated within the TFLite DNN inference framework. The toolkit leverages the principles of SECDA, a hardware/software co-design methodology that reduces the design time of optimized DNN inference accelerators on edge devices with FPGAs. Utilizing SECDA-TFLite, we further reduce the initial setup costs associated with integrating a new accelerator design within a target DNN framework, allowing developers to focus on the design. SECDA-TFLite also includes modules for cost-effective SystemC simulation, profiling, and AXI-based data communication. Additionally, we briefly cover our case study, in which we use SECDA-TFLite to efficiently develop three different DNN accelerator designs on a PYNQ-Z1 board. We evaluate the three accelerator designs across five common CNN models and two BERT-based models, achieving an average performance speedup across models of up to 2.9× for the CNN models and of up to 2.5× for the BERT-based models.
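    One recurring piece of boilerplate such a toolkit can absorb is the AXI-based data communication between the ARM cores and the programmable logic. On a PYNQ-class board this typically amounts to mapping the accelerator's AXI register window into user space, as in the generic sketch below; the base address and register offsets are placeholders, not SECDA-TFLite's actual register map.

```cpp
// Generic userspace pattern for talking to an AXI-mapped accelerator on a
// Zynq/PYNQ-class board: mmap the physical register window via /dev/mem and
// poke control/status registers. Addresses and offsets below are placeholders.
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

constexpr off_t  kAccelBase   = 0x43C00000;  // placeholder AXI base address
constexpr size_t kWindowBytes = 0x1000;
constexpr size_t kCtrlReg     = 0x00;        // placeholder register offsets
constexpr size_t kStatusReg   = 0x04;

int main() {
  int fd = open("/dev/mem", O_RDWR | O_SYNC);
  if (fd < 0) { perror("open /dev/mem"); return 1; }

  void* window = mmap(nullptr, kWindowBytes, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, kAccelBase);
  if (window == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

  volatile uint32_t* regs = static_cast<volatile uint32_t*>(window);
  regs[kCtrlReg / 4] = 1;                       // start the accelerator
  while ((regs[kStatusReg / 4] & 0x1) == 0) {}  // busy-wait for "done"

  munmap(window, kWindowBytes);
  close(fd);
  return 0;
}
```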

    SECDA: Efficient Hardware/Software Co-design of FPGA-based DNN Accelerators for Edge Inference

    Edge computing devices inherently face tight resource constraints, which is especially apparent when deploying Deep Neural Networks (DNNs) with high memory and compute demands. FPGAs are commonly available in edge devices. Since these reconfigurable circuits can achieve higher throughput and lower power consumption than general-purpose processors, they are especially well-suited for DNN acceleration. However, existing solutions for designing FPGA-based DNN accelerators for edge devices come with high development overheads, given the cost of repeated FPGA synthesis passes, reimplementation in a Hardware Description Language (HDL) of the simulated design, and accelerator system integration. In this paper we propose SECDA, a new hardware/software co-design methodology to reduce the design time of optimized DNN inference accelerators on edge devices with FPGAs. SECDA combines cost-effective SystemC simulation with hardware execution, streamlining design space exploration and the development process via reduced design evaluation time. As a case study, we use SECDA to efficiently develop two different DNN accelerator designs on a PYNQ-Z1 board, a platform that includes an edge FPGA. We quickly and iteratively explore the system's hardware/software stack, while identifying and mitigating performance bottlenecks. We evaluate the two accelerator designs with four common DNN models, achieving an average performance speedup across models of up to 3.5× with a 2.9× reduction in energy consumption over CPU-only inference.
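    The workflow the methodology implies can be pictured as one integration path with two interchangeable back ends: a SystemC model while the design is being explored, and the synthesized FPGA once a design point is frozen. The sketch below illustrates that switch with a hypothetical build flag and driver classes; none of the names come from SECDA itself.

```cpp
// Sketch of the co-design idea: the same integration code is built either
// against a fast SystemC model or against the synthesized FPGA design,
// selected here by an illustrative compile-time flag.
#include <iostream>
#include <memory>

struct Backend {
  virtual ~Backend() = default;
  virtual const char* name() const = 0;
  virtual void RunLayer() = 0;  // stand-in for an offloaded DNN layer
};

struct SystemCModel : Backend {   // used while exploring the design space
  const char* name() const override { return "SystemC simulation"; }
  void RunLayer() override { /* call into the simulated accelerator */ }
};

struct FpgaHardware : Backend {   // used once a design point is synthesized
  const char* name() const override { return "FPGA execution"; }
  void RunLayer() override { /* drive the real accelerator over AXI */ }
};

std::unique_ptr<Backend> MakeBackend() {
#ifdef USE_SYSC_SIM               // hypothetical build flag
  return std::make_unique<SystemCModel>();
#else
  return std::make_unique<FpgaHardware>();
#endif
}

int main() {
  auto backend = MakeBackend();
  std::cout << "Evaluating design point on: " << backend->name() << "\n";
  backend->RunLayer();
  return 0;
}
```

    Because only the back end changes, a design evaluated cheaply in simulation can later be benchmarked on hardware without touching the framework integration, which is what shortens the design-evaluation loop.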

    Automated Generation of Integrated Digital and Spiking Neuromorphic Machine Learning Accelerators

    The growing number of application areas for artificial intelligence (AI) methods has led to an explosion in the availability of domain-specific accelerators, which struggle to support every new machine learning (ML) algorithm advancement. This clearly highlights the need for a tool to quickly and automatically transition from algorithm definition to hardware implementation and to explore the design space along a variety of SWaP (size, weight, and power) metrics. The Software Defined Architectures (SODA) Synthesizer implements a modular, compiler-based infrastructure for the end-to-end generation of machine learning accelerators, from high-level frameworks down to hardware description language. Neuromorphic computing, which mimics how the brain operates, promises to perform artificial intelligence tasks at efficiencies orders of magnitude higher than current conventional tensor-processing-based accelerators, as demonstrated by a variety of specialized designs leveraging Spiking Neural Networks (SNNs). Nevertheless, mapping an artificial neural network (ANN) onto solutions supporting SNNs is still a non-trivial and very device-specific task, and there is currently no way to design hybrid systems that integrate conventional and spiking neural models. In this paper, we discuss the design of such an integrated generator, leveraging the SODA Synthesizer framework and its modular structure. In particular, we present a new MLIR dialect in the SODA frontend that allows expressing spiking neural network concepts (e.g., spiking sequences and their transformation and manipulation), and we discuss how to enable the mapping of spiking neurons to the related specialized hardware (which could be generated through the middle-end and backend layers of the SODA Synthesizer). We then discuss the opportunities for further integration offered by the hardware compilation infrastructure, providing a path towards the generation of complex hybrid artificial intelligence systems.
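    The basic mismatch the dialect has to bridge can be seen in a few lines of C++, independent of SODA's actual MLIR representation: an ANN layer works on real-valued activations, while an SNN works on spike trains. Below, a real activation is rate-coded into spikes and consumed by a leaky integrate-and-fire (LIF) neuron; the encoding, parameters, and function names are illustrative only.

```cpp
// Illustrative ANN-to-SNN bridge (unrelated to SODA's actual dialect):
// rate-code a real-valued activation into spikes, then integrate them in a
// leaky integrate-and-fire (LIF) neuron.
#include <cstdio>
#include <random>
#include <vector>

// Rate coding: activation in [0,1] becomes the firing probability per timestep.
std::vector<bool> RateEncode(float activation, int timesteps, std::mt19937& rng) {
  std::bernoulli_distribution fire(activation);
  std::vector<bool> spikes(timesteps);
  for (int t = 0; t < timesteps; ++t) spikes[t] = fire(rng);
  return spikes;
}

// A single LIF neuron integrating a weighted input spike train.
int LifNeuron(const std::vector<bool>& spikes, float weight,
              float leak = 0.9f, float threshold = 1.0f) {
  float v = 0.0f;
  int out_spikes = 0;
  for (bool s : spikes) {
    v = leak * v + (s ? weight : 0.0f);            // leak, then integrate
    if (v >= threshold) { ++out_spikes; v = 0.0f; } // fire and reset
  }
  return out_spikes;
}

int main() {
  std::mt19937 rng(42);
  auto spikes = RateEncode(0.8f, 100, rng);
  std::printf("output spikes over 100 steps: %d\n", LifNeuron(spikes, 0.5f));
  return 0;
}
```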