
    FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software

    A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model, with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool synthesizes the threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated in which convolution, pooling and padding are realized in the synthesized accelerator, while the remaining tasks execute on an embedded ARM processor. The accelerator incorporates reduced-precision arithmetic and a novel approach to skipping zero weights in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
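
    For readers unfamiliar with the structure the abstract refers to, below is a minimal C/Pthreads sketch of two stages linked by a FIFO queue, the producer/consumer pattern that LegUp maps to parallel hardware. The fifo_t type, queue depth, and dummy compute stages are illustrative assumptions, not the paper's actual code.

```c
/* Minimal producer/consumer sketch: two Pthreads linked by a FIFO.
 * The fifo_t type, DEPTH, and the toy stages are assumptions for
 * illustration, not the paper's implementation. */
#include <pthread.h>
#include <stdio.h>

#define DEPTH 16

typedef struct {
    int buf[DEPTH];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} fifo_t;

void fifo_init(fifo_t *f) {
    f->head = f->tail = f->count = 0;
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->not_full, NULL);
    pthread_cond_init(&f->not_empty, NULL);
}

void fifo_push(fifo_t *f, int v) {
    pthread_mutex_lock(&f->lock);
    while (f->count == DEPTH) pthread_cond_wait(&f->not_full, &f->lock);
    f->buf[f->tail] = v;
    f->tail = (f->tail + 1) % DEPTH;
    f->count++;
    pthread_cond_signal(&f->not_empty);
    pthread_mutex_unlock(&f->lock);
}

int fifo_pop(fifo_t *f) {
    pthread_mutex_lock(&f->lock);
    while (f->count == 0) pthread_cond_wait(&f->not_empty, &f->lock);
    int v = f->buf[f->head];
    f->head = (f->head + 1) % DEPTH;
    f->count--;
    pthread_cond_signal(&f->not_full);
    pthread_mutex_unlock(&f->lock);
    return v;
}

fifo_t q;  /* queue linking the two stages */

/* Producer stage: in the accelerator this would stream feature-map
 * data; here it just emits a fixed number of values. */
void *producer(void *arg) {
    for (int i = 0; i < 32; i++) fifo_push(&q, i);
    fifo_push(&q, -1);  /* end-of-stream marker */
    return NULL;
}

/* Consumer stage: stand-in for a convolution or pooling kernel. */
void *consumer(void *arg) {
    int v;
    while ((v = fifo_pop(&q)) != -1)
        printf("processed %d\n", v * v);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    fifo_init(&q);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

    Because each thread touches shared state only through its FIFO endpoints, the stages can be placed as independent hardware units with the queues becoming on-chip buffers, which is how software parallelism becomes spatial parallelism in the flow the abstract describes.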

    An Event-Driven Multi-Kernel Convolution Processor Module for Event-Driven Vision Sensors

    Event-Driven vision sensing is a new way of sensing visual reality in a frame-free manner. That is, the vision sensor (camera) does not capture a sequence of still frames, as conventional video and computer-vision systems do. In an Event-Driven sensor, each pixel autonomously and asynchronously decides when to send its address out, so the sensor output is a continuous stream of address events that represents the scene dynamically and continuously, without being constrained to frames. In this paper we present an Event-Driven Convolution Module for computing 2D convolutions on such event streams. The Convolution Module has been designed so that many instances can be assembled into modular, hierarchical Convolutional Neural Networks for robust shape- and pose-invariant object recognition. The module has multi-kernel capability: it selects the convolution kernel depending on the origin of each event. A proof-of-concept test prototype has been fabricated in a 0.35 µm CMOS process, and extensive experimental results are provided. The Convolution Processor has also been combined with an Event-Driven Dynamic Vision Sensor (DVS) for high-speed recognition examples. The chip can discriminate propellers rotating at 2,000 revolutions per second, detect symbols on a 52-card deck while browsing all cards in 410 ms, and detect and follow the center of a phosphor oscilloscope trace rotating at 5 kHz. Funding: Unión Europea 216777 (NABAB); Ministerio de Ciencia e Innovación TEC2009-10639-C04-0.
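
    The event-driven convolution idea can be sketched in software: each incoming address event "splats" a kernel, selected by the event's origin, around the event address, and an output event fires when a per-pixel accumulator crosses a threshold. The event format, map size, kernels, and threshold below are illustrative assumptions; the chip realizes this with per-pixel hardware integrators rather than a software loop.

```c
/* Software sketch of event-driven multi-kernel 2D convolution over
 * an address-event stream. Sizes, kernels, and THRESH are assumed
 * values for illustration only. */
#include <stdio.h>

#define W 128
#define H 128
#define K 3            /* kernel is K x K, K odd */
#define NKERNELS 2
#define THRESH 100

typedef struct { int x, y, src; } event_t;

static int acc[H][W];  /* per-pixel accumulators */

/* One kernel per event source: the multi-kernel feature means the
 * kernel is chosen by the origin of the incoming event. */
static const int kernels[NKERNELS][K][K] = {
    { { 10, 20, 10 }, { 20, 40, 20 }, { 10, 20, 10 } },   /* blob */
    { { -20, 0, 20 }, { -40, 0, 40 }, { -20, 0, 20 } },   /* edge */
};

/* Process one address event: add the selected kernel around the
 * event address, emit an output event where a threshold is crossed. */
void process_event(event_t e) {
    const int r = K / 2;
    for (int dy = -r; dy <= r; dy++) {
        for (int dx = -r; dx <= r; dx++) {
            int x = e.x + dx, y = e.y + dy;
            if (x < 0 || x >= W || y < 0 || y >= H) continue;
            acc[y][x] += kernels[e.src][dy + r][dx + r];
            if (acc[y][x] >= THRESH) {
                printf("output event at (%d,%d)\n", x, y);
                acc[y][x] = 0;  /* reset after firing */
            }
        }
    }
}

int main(void) {
    /* A burst of events from source 0 at the same address drives the
     * center accumulator over threshold after a few events. */
    for (int i = 0; i < 5; i++)
        process_event((event_t){ 64, 64, 0 });
    return 0;
}
```

    Because computation happens only when events arrive, the latency and power of such a module scale with scene activity rather than with frame rate, which is what enables the high-speed recognition figures quoted above.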

    A library-based tool to translate high level DNN models into hierarchical VHDL descriptions

    This work presents a tool to convert high-level models of deep neural networks into register-transfer-level designs. To make it useful for different target technologies, the output designs are based on hierarchical VHDL descriptions, which are accepted as input files by a wide variety of FPGA, SoC and ASIC digital synthesis tools. The tool aims to speed up the design and synthesis cycle of such systems and gives the designer some capability to balance network latency against hardware resources. It also provides a clock-domain crossing to interface the input layer of the synthesized neural networks with sensors running at different clock frequencies. The tool is tested with a neural network that combines convolutional and fully connected layers, designed to perform traffic-sign recognition, synthesized under different hardware-resource-usage specifications on a Zynq UltraScale+ MPSoC development board. This work has been partially funded by the Spanish Ministerio de Ciencia e Innovación (MCI), Agencia Estatal de Investigación (AEI) and European Regional Development Fund (ERDF/FEDER) under grant RTI2018-097088-B-C33.
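
    As a rough illustration of the library-based translation flow (not the paper's actual tool), the sketch below walks a toy layer list and emits hierarchical VHDL that instantiates one pre-written library component per layer. The layer_t record, component names, generic names, and signal naming scheme are all hypothetical, and the emitted VHDL is a skeleton with signal declarations omitted.

```c
/* Hypothetical sketch of a model-to-VHDL emitter: one library
 * component instantiation per layer, chained by stream signals.
 * All names and fields are assumptions for illustration. */
#include <stdio.h>

typedef enum { CONV, FC } layer_kind;

typedef struct {
    layer_kind kind;
    int in_ch, out_ch, ksize;  /* ksize unused for FC layers */
} layer_t;

/* Emit one component instantiation with generics mapped from the
 * high-level model, chaining each layer's stream to the next. */
static void emit_layer(FILE *out, int idx, const layer_t *l) {
    const char *comp = (l->kind == CONV) ? "conv_layer" : "fc_layer";
    fprintf(out, "  u%d : entity work.%s\n", idx, comp);
    fprintf(out, "    generic map (IN_CH => %d, OUT_CH => %d",
            l->in_ch, l->out_ch);
    if (l->kind == CONV)
        fprintf(out, ", KSIZE => %d", l->ksize);
    fprintf(out, ")\n    port map (din => s%d, dout => s%d);\n\n",
            idx, idx + 1);
}

int main(void) {
    /* Toy model: two conv layers feeding a fully connected layer. */
    const layer_t model[] = {
        { CONV, 3, 16, 3 },
        { CONV, 16, 32, 3 },
        { FC, 32, 10, 0 },
    };
    int n = sizeof(model) / sizeof(model[0]);
    printf("architecture rtl of dnn_top is\nbegin\n\n");
    for (int i = 0; i < n; i++)
        emit_layer(stdout, i, &model[i]);
    printf("end architecture;\n");
    return 0;
}
```

    Keeping the generated description hierarchical, one component per layer, is what lets the same output feed FPGA, SoC and ASIC synthesis tools, and it is a natural place to expose the latency-versus-resources knobs the abstract mentions (for example, by selecting more or less parallel library components per layer).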