
    FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software

    A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model, with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool synthesizes the threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated in which convolution, pooling and padding are realized in the synthesized accelerator, while the remaining tasks execute on an embedded ARM processor. The accelerator incorporates reduced-precision arithmetic and a novel approach to skipping zero weights in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
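
    For readers unfamiliar with the structure the abstract refers to, below is a minimal C/Pthreads sketch of two stages linked by a FIFO queue, the producer/consumer pattern that LegUp maps to parallel hardware. The fifo_t type, queue depth, and dummy compute stages are illustrative assumptions, not the paper's actual code.

```c
/* Minimal producer/consumer sketch: two Pthreads linked by a FIFO.
 * The fifo_t type, DEPTH, and the toy stages are assumptions for
 * illustration, not the paper's implementation. */
#include <pthread.h>
#include <stdio.h>

#define DEPTH 16

typedef struct {
    int buf[DEPTH];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} fifo_t;

void fifo_init(fifo_t *f) {
    f->head = f->tail = f->count = 0;
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->not_full, NULL);
    pthread_cond_init(&f->not_empty, NULL);
}

void fifo_push(fifo_t *f, int v) {
    pthread_mutex_lock(&f->lock);
    while (f->count == DEPTH) pthread_cond_wait(&f->not_full, &f->lock);
    f->buf[f->tail] = v;
    f->tail = (f->tail + 1) % DEPTH;
    f->count++;
    pthread_cond_signal(&f->not_empty);
    pthread_mutex_unlock(&f->lock);
}

int fifo_pop(fifo_t *f) {
    pthread_mutex_lock(&f->lock);
    while (f->count == 0) pthread_cond_wait(&f->not_empty, &f->lock);
    int v = f->buf[f->head];
    f->head = (f->head + 1) % DEPTH;
    f->count--;
    pthread_cond_signal(&f->not_full);
    pthread_mutex_unlock(&f->lock);
    return v;
}

fifo_t q;  /* queue linking the two stages */

/* Producer stage: in the accelerator this would stream feature-map
 * data; here it just emits a fixed number of values. */
void *producer(void *arg) {
    for (int i = 0; i < 32; i++) fifo_push(&q, i);
    fifo_push(&q, -1);  /* end-of-stream marker */
    return NULL;
}

/* Consumer stage: stand-in for a convolution or pooling kernel. */
void *consumer(void *arg) {
    int v;
    while ((v = fifo_pop(&q)) != -1)
        printf("processed %d\n", v * v);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    fifo_init(&q);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

    Because each thread touches shared state only through its FIFO endpoints, the stages can be placed as independent hardware units with the queues becoming on-chip buffers, which is how software parallelism becomes spatial parallelism in the flow the abstract describes.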

    An Event-Driven Multi-Kernel Convolution Processor Module for Event-Driven Vision Sensors

    Event-Driven vision sensing is a new way of sensing visual reality in a frame-free manner. That is, the vision sensor (camera) does not capture a sequence of still frames, as conventional video and computer-vision systems do. In an Event-Driven sensor, each pixel autonomously and asynchronously decides when to send its address out, so the sensor output is a continuous stream of address events that represents the scene dynamically and continuously, without being constrained to frames. In this paper we present an Event-Driven Convolution Module for computing 2D convolutions on such event streams. The Convolution Module has been designed so that many instances can be assembled into modular, hierarchical Convolutional Neural Networks for robust shape- and pose-invariant object recognition. The module has multi-kernel capability: it selects the convolution kernel depending on the origin of each event. A proof-of-concept test prototype has been fabricated in a 0.35 µm CMOS process, and extensive experimental results are provided. The Convolution Processor has also been combined with an Event-Driven Dynamic Vision Sensor (DVS) for high-speed recognition examples. The chip can discriminate propellers rotating at 2,000 revolutions per second, detect symbols on a 52-card deck while browsing all cards in 410 ms, and detect and follow the center of a phosphor oscilloscope trace rotating at 5 kHz. Funding: Unión Europea 216777 (NABAB); Ministerio de Ciencia e Innovación TEC2009-10639-C04-0.
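
    The event-driven convolution idea can be sketched in software: each incoming address event "splats" a kernel, selected by the event's origin, around the event address, and an output event fires when a per-pixel accumulator crosses a threshold. The event format, map size, kernels, and threshold below are illustrative assumptions; the chip realizes this with per-pixel hardware integrators rather than a software loop.

```c
/* Software sketch of event-driven multi-kernel 2D convolution over
 * an address-event stream. Sizes, kernels, and THRESH are assumed
 * values for illustration only. */
#include <stdio.h>

#define W 128
#define H 128
#define K 3            /* kernel is K x K, K odd */
#define NKERNELS 2
#define THRESH 100

typedef struct { int x, y, src; } event_t;

static int acc[H][W];  /* per-pixel accumulators */

/* One kernel per event source: the multi-kernel feature means the
 * kernel is chosen by the origin of the incoming event. */
static const int kernels[NKERNELS][K][K] = {
    { { 10, 20, 10 }, { 20, 40, 20 }, { 10, 20, 10 } },   /* blob */
    { { -20, 0, 20 }, { -40, 0, 40 }, { -20, 0, 20 } },   /* edge */
};

/* Process one address event: add the selected kernel around the
 * event address, emit an output event where a threshold is crossed. */
void process_event(event_t e) {
    const int r = K / 2;
    for (int dy = -r; dy <= r; dy++) {
        for (int dx = -r; dx <= r; dx++) {
            int x = e.x + dx, y = e.y + dy;
            if (x < 0 || x >= W || y < 0 || y >= H) continue;
            acc[y][x] += kernels[e.src][dy + r][dx + r];
            if (acc[y][x] >= THRESH) {
                printf("output event at (%d,%d)\n", x, y);
                acc[y][x] = 0;  /* reset after firing */
            }
        }
    }
}

int main(void) {
    /* A burst of events from source 0 at the same address drives the
     * center accumulator over threshold after a few events. */
    for (int i = 0; i < 5; i++)
        process_event((event_t){ 64, 64, 0 });
    return 0;
}
```

    Because computation happens only when events arrive, the latency and power of such a module scale with scene activity rather than with frame rate, which is what enables the high-speed recognition figures quoted above.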

    A library-based tool to translate high level DNN models into hierarchical VHDL descriptions

    This work presents a tool to convert high-level models of deep neural networks into register-transfer-level designs. To make it useful for different target technologies, the output designs are based on hierarchical VHDL descriptions, which are accepted as input files by a wide variety of FPGA, SoC and ASIC digital synthesis tools. The tool aims to speed up the design and synthesis cycle of such systems and gives the designer some capability to balance network latency against hardware resources. It also provides a clock-domain crossing to interface the input layer of the synthesized neural networks with sensors running at different clock frequencies. The tool is tested with a neural network that combines convolutional and fully connected layers, designed to perform traffic-sign recognition, synthesized under different hardware-resource-usage specifications on a Zynq UltraScale+ MPSoC development board. This work has been partially funded by the Spanish Ministerio de Ciencia e Innovación (MCI), Agencia Estatal de Investigación (AEI) and European Regional Development Fund (ERDF/FEDER) under grant RTI2018-097088-B-C33.
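
    As a rough illustration of the library-based translation flow (not the paper's actual tool), the sketch below walks a toy layer list and emits hierarchical VHDL that instantiates one pre-written library component per layer. The layer_t record, component names, generic names, and signal naming scheme are all hypothetical, and the emitted VHDL is a skeleton with signal declarations omitted.

```c
/* Hypothetical sketch of a model-to-VHDL emitter: one library
 * component instantiation per layer, chained by stream signals.
 * All names and fields are assumptions for illustration. */
#include <stdio.h>

typedef enum { CONV, FC } layer_kind;

typedef struct {
    layer_kind kind;
    int in_ch, out_ch, ksize;  /* ksize unused for FC layers */
} layer_t;

/* Emit one component instantiation with generics mapped from the
 * high-level model, chaining each layer's stream to the next. */
static void emit_layer(FILE *out, int idx, const layer_t *l) {
    const char *comp = (l->kind == CONV) ? "conv_layer" : "fc_layer";
    fprintf(out, "  u%d : entity work.%s\n", idx, comp);
    fprintf(out, "    generic map (IN_CH => %d, OUT_CH => %d",
            l->in_ch, l->out_ch);
    if (l->kind == CONV)
        fprintf(out, ", KSIZE => %d", l->ksize);
    fprintf(out, ")\n    port map (din => s%d, dout => s%d);\n\n",
            idx, idx + 1);
}

int main(void) {
    /* Toy model: two conv layers feeding a fully connected layer. */
    const layer_t model[] = {
        { CONV, 3, 16, 3 },
        { CONV, 16, 32, 3 },
        { FC, 32, 10, 0 },
    };
    int n = sizeof(model) / sizeof(model[0]);
    printf("architecture rtl of dnn_top is\nbegin\n\n");
    for (int i = 0; i < n; i++)
        emit_layer(stdout, i, &model[i]);
    printf("end architecture;\n");
    return 0;
}
```

    Keeping the generated description hierarchical, one component per layer, is what lets the same output feed FPGA, SoC and ASIC synthesis tools, and it is a natural place to expose the latency-versus-resources knobs the abstract mentions (for example, by selecting more or less parallel library components per layer).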