3 research outputs found
FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge
While embedded FPGAs are attractive platforms for DNN acceleration on
edge devices due to their low latency and high energy efficiency, the scarcity
of resources on edge-scale FPGA devices also makes DNN deployment
challenging. In this paper, we propose a simultaneous FPGA/DNN co-design
methodology with both bottom-up and top-down approaches: a bottom-up
hardware-oriented DNN model search for high accuracy, and a top-down FPGA
accelerator design considering DNN-specific characteristics. We also build an
automatic co-design flow, including an Auto-DNN engine to perform
hardware-oriented DNN model search, as well as an Auto-HLS engine to generate
synthesizable C code of the FPGA accelerator for explored DNNs. We demonstrate
our co-design approach on an object detection task using a PYNQ-Z1 FPGA. Results
show that our proposed DNN model and accelerator outperform the
state-of-the-art FPGA designs in all aspects including Intersection-over-Union
(IoU) (6.2% higher), frames per second (FPS) (2.48X higher), power consumption
(40% lower), and energy efficiency (2.5X higher). Compared to GPU-based
solutions, our designs deliver similar accuracy but consume far less energy.
Comment: Accepted by Design Automation Conference (DAC'2019)
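The Intersection-over-Union (IoU) figure quoted in this abstract is the standard object-detection metric: the area where the predicted and ground-truth boxes overlap, divided by the area of their union. A minimal sketch follows, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function name `iou` and the box layout are illustrative choices and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption
    made for illustration, not the paper's data format.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the doubly counted overlap.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1.0 means the prediction matches the ground truth exactly, and 0.0 means the boxes do not overlap at all.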
ATCN: Agile Temporal Convolutional Networks for Processing of Time Series on Edge
This paper presents a scalable deep learning model called Agile Temporal
Convolutional Network (ATCN) for highly accurate, fast classification and time
series prediction in resource-constrained embedded systems. ATCN is primarily
designed for mobile embedded systems with performance and memory constraints
such as wearable biomedical devices and real-time reliability monitoring
systems. It makes fundamental improvements over mainstream temporal
convolutional neural networks, including separable depth-wise convolutions that
reduce the computational complexity of the model and residual connections that
act as time attention machines and increase the network depth and accuracy.
This configurability makes ATCN a family of compact networks with formalized
hyper-parameters that allow the model architecture to be configured and
adjusted based on the application requirements. We
demonstrate the accuracy and performance trade-offs of the proposed ATCN
on three embedded applications, including transistor reliability
monitoring, heartbeat classification of ECG signals, and digit classification.
Our comparison results against state-of-the-art approaches demonstrate much
lower computation and memory demand for faster processing with better
prediction and classification accuracy. The source code of the ATCN model is
publicly available at https://github.com/TeCSAR-UNCC/ATCN
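To illustrate why separable depth-wise convolution reduces computational complexity, the sketch below counts multiply-accumulate operations (MACs) for a standard 1D convolution and for its depth-wise separable counterpart (a per-channel depth-wise convolution followed by a 1x1 point-wise convolution). The layer sizes are hypothetical and are not ATCN's actual configuration; biases are ignored and both variants are assumed to produce the same output length.

```python
def conv1d_macs(c_in, c_out, kernel, length):
    """MACs for a standard 1D convolution producing `length` output steps."""
    return kernel * c_in * c_out * length


def separable_conv1d_macs(c_in, c_out, kernel, length):
    """MACs for a depth-wise separable 1D convolution.

    Depth-wise stage: one `kernel`-tap filter per input channel.
    Point-wise stage: a 1x1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * c_in * length
    pointwise = c_in * c_out * length
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    c_in, c_out, kernel, length = 64, 64, 3, 128
    standard = conv1d_macs(c_in, c_out, kernel, length)
    separable = separable_conv1d_macs(c_in, c_out, kernel, length)
    # The ratio simplifies to 1/c_out + 1/kernel.
    print(standard, separable, separable / standard)
```

For these sizes the separable layer needs roughly a third of the MACs of the standard layer, and the saving grows with the kernel size.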
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator
Existing FPGA-based DNN accelerators typically fall into two design
paradigms. They either adopt a generic reusable architecture that supports
different DNN networks but leaves some performance and efficiency on the table
because it sacrifices design specificity, or they apply a layer-wise
tailor-made architecture that optimizes layer-specific demands for computation
and resources but loses the scalability to adapt to a wide range of DNN
networks. To overcome these drawbacks, this paper proposes a novel FPGA-based
DNN accelerator design paradigm and its automation tool, called DNNExplorer, to
enable fast exploration of various accelerator designs under the proposed
paradigm and deliver optimized accelerator architectures for existing and
emerging DNN networks. Three key techniques are essential for DNNExplorer's
improved performance, better specificity, and scalability: (1) a
unique accelerator design paradigm with both high-dimensional design space
support and fine-grained adjustability, (2) a dynamic design space to
accommodate different combinations of DNN workloads and targeted FPGAs, and (3)
a design space exploration (DSE) engine to generate optimized accelerator
architectures following the proposed paradigm by simultaneously considering
both FPGAs' computation and memory resources and DNN networks' layer-wise
characteristics and overall complexity. Experimental results show that, for the
same FPGAs, accelerators generated by DNNExplorer can deliver up to 4.2x higher
performance (GOP/s) than the state-of-the-art layer-wise pipelined solutions
generated by DNNBuilder for a VGG-like DNN with 38 CONV layers. Compared to
accelerators with generic reusable computation units, DNNExplorer achieves up
to 2.0x and 4.4x DSP efficiency improvement over a recently published
accelerator design from academia (HybridDNN) and a commercial DNN accelerator
IP (Xilinx DPU), respectively.
Comment: Published as a conference paper at International Conference on
Computer Aided Design 2020 (ICCAD'20)