899 research outputs found
An Event-Driven Multi-Kernel Convolution Processor Module for Event-Driven Vision Sensors
Event-Driven vision sensing is a new way of sensing
visual reality in a frame-free manner. This is, the vision sensor
(camera) is not capturing a sequence of still frames, as in conventional
video and computer vision systems. In Event-Driven sensors
each pixel autonomously and asynchronously decides when to
send its address out. This way, the sensor output is a continuous
stream of address events representing reality dynamically continuously
and without constraining to frames. In this paper we present
an Event-Driven Convolution Module for computing 2D convolutions
on such event streams. The Convolution Module has been
designed to assemble many of them for building modular and hierarchical
Convolutional Neural Networks for robust shape and
pose invariant object recognition. The Convolution Module has
multi-kernel capability. This is, it will select the convolution kernel
depending on the origin of the event. A proof-of-concept test prototype
has been fabricated in a 0.35 m CMOS process and extensive
experimental results are provided. The Convolution Processor has
also been combined with an Event-Driven Dynamic Vision Sensor
(DVS) for high-speed recognition examples. The chip can discriminate
propellers rotating at 2 k revolutions per second, detect symbols
on a 52 card deck when browsing all cards in 410 ms, or detect
and follow the center of a phosphor oscilloscope trace rotating at
5 KHz.Unión Europea 216777 (NABAB)Ministerio de Ciencia e Innovación TEC2009-10639-C04-0
A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics
This paper proposes a special-purpose system to achieve high-accuracy and
high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The
system consists of field programmable gate array (FPGA) and application
specific integrated circuit (ASIC) working in heterogeneous parallelization. To
be specific, a multiplication-less neural network (NN) is deployed on the
non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic
forces, which is the most computationally expensive part of MD. All other
calculations of MD are done using FPGA (Xilinx XC7Z100). It is shown that, to
achieve similar-level accuracy, the proposed NvN-based system based on low-end
fabrication technologies (180 nm) is 1.6x faster and 10^2-10^3x more energy
efficiency than state-of-the-art vN based MLMD using graphics processing units
(GPUs) based on much more advanced technologies (12 nm), indicating superiority
of the proposed NvN-based heterogeneous parallel architecture
Image Processing Using FPGAs
This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs
Optimising algorithm and hardware for deep neural networks on FPGAs
This thesis proposes novel algorithm and hardware optimisation approaches to accelerate Deep Neural Networks (DNNs), including both Convolutional Neural Networks (CNNs) and Bayesian Neural Networks (BayesNNs).
The first contribution of this thesis is to propose an adaptable and reconfigurable hardware design to accelerate CNNs. By analysing the computational patterns of different CNNs, a unified hardware architecture is proposed for both 2-Dimension and 3-Dimension CNNs. The accelerator is also designed with runtime adaptability, which adopts different parallelism strategies for different convolutional layers at runtime.
The second contribution of this thesis is to propose a novel neural network architecture and hardware design co-optimisation approach, which improves the performance of CNNs at both algorithm and hardware levels. Our proposed three-phase co-design framework decouples network training from design space exploration, which significantly reduces the time-cost of the co-optimisation process.
The third contribution of this thesis is to propose an algorithmic and hardware co-optimisation framework for accelerating BayesNNs. At the algorithmic level, three categories of structured sparsity are explored to reduce the computational complexity of BayesNNs. At the hardware level, we propose a novel hardware architecture with the aim of exploiting the structured sparsity for BayesNNs. Both algorithmic and hardware optimisations are jointly applied to push the performance limit.Open Acces
- …