NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
Convolutional neural networks (CNNs) have become the dominant neural network
architecture for solving many state-of-the-art (SOA) visual processing tasks.
Even though Graphics Processing Units (GPUs) are most often used in training
and deploying CNNs, their power efficiency is less than 10 GOp/s/W for
single-frame runtime inference. We propose a flexible and efficient CNN
accelerator architecture called NullHop that implements SOA CNNs useful for
low-power and low-latency application scenarios. NullHop exploits the sparsity
of neuron activations in CNNs to accelerate the computation and reduce memory
requirements. The flexible architecture allows high utilization of available
computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can
process up to 128 input and 128 output feature maps per layer in a single pass.
We implemented the proposed architecture on a Xilinx Zynq FPGA platform and
present results showing how our implementation reduces external memory
transfers and compute time in five different CNNs ranging from small ones up to
the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using
Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that
the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop
achieves an efficiency of 368%, maintains over 98% utilization of the MAC
units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm^2. As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.
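The abstract does not spell out the compression scheme, but NullHop-style accelerators encode each feature map as a per-pixel binary sparsity mask plus the list of non-zero values, so zero activations cost neither memory bandwidth nor MAC cycles. A minimal NumPy sketch of such an encoding (the function names are ours, not the paper's):

import numpy as np

def encode_sparse(fmap):
    # One mask bit per element in hardware; only non-zeros are stored.
    mask = fmap != 0
    return mask, fmap[mask]

def decode_sparse(mask, values):
    # Reconstruct the dense feature map from the compressed form.
    fmap = np.zeros(mask.shape, dtype=values.dtype)
    fmap[mask] = values
    return fmap

# After ReLU, CNN feature maps are often mostly zeros, so the compressed
# form needs far less external-memory traffic than the dense one.
fmap = np.maximum(np.random.randn(8, 8).astype(np.float32), 0)
mask, values = encode_sparse(fmap)
assert np.array_equal(decode_sparse(mask, values), fmap)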
FPGA Implementation of Convolutional Neural Networks with Fixed-Point Calculations
Neural network-based methods for image processing are becoming widely used in
practical applications. Modern neural networks are computationally expensive
and require specialized hardware, such as graphics processing units. Since such
hardware is not always available in real life applications, there is a
compelling need for the design of neural networks for mobile devices. Mobile neural networks typically have a reduced number of parameters and require a relatively small number of arithmetic operations. However, they are usually still executed in software and use floating-point calculations. The use
of mobile networks without further optimization may not provide sufficient
performance when high processing speed is required, for example, in real-time
video processing (30 frames per second). In this study, we suggest
optimizations to speed up computations in order to efficiently use already
trained neural networks on a mobile device. Specifically, we propose an
approach for speeding up neural networks by moving computation from software to
hardware and by using fixed-point calculations instead of floating-point. We
propose a number of methods for neural network architecture design to improve
the performance with fixed-point calculations. We also show an example of how
existing datasets can be modified and adapted for the recognition task at hand. Finally, we present the design and implementation of a field-programmable gate array-based device to solve the practical problem of real-time handwritten digit classification from a mobile camera video feed.
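The abstract does not fix the exact number formats; the sketch below only illustrates the core idea of replacing floating-point inference arithmetic with fixed-point (Q-format) integer arithmetic, with a 16-bit word and 8 fractional bits chosen arbitrarily for illustration:

import numpy as np

FRAC_BITS = 8  # fractional bits of the Q-format; a free design parameter

def to_fixed(x, frac_bits=FRAC_BITS):
    # Quantize floats to 16-bit fixed point, rounding to nearest.
    return np.clip(np.round(x * (1 << frac_bits)), -32768, 32767).astype(np.int16)

def fixed_dot(w_q, x_q, frac_bits=FRAC_BITS):
    # Integer multiply-accumulate in a wide accumulator, then rescale.
    acc = np.dot(w_q.astype(np.int32), x_q.astype(np.int32))
    return acc >> frac_bits  # result is back in the same Q-format

w = np.random.randn(10).astype(np.float32) * 0.5
x = np.random.randn(10).astype(np.float32) * 0.5
print(float(np.dot(w, x)), fixed_dot(to_fixed(w), to_fixed(x)) / (1 << FRAC_BITS))

On an FPGA the same multiply-accumulate maps onto integer DSP slices, which is where the speed and power advantage over software floating point comes from.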
PCA-RECT: An Energy-efficient Object Detection Approach for Event Cameras
We present the first purely event-based, energy-efficient approach for object
detection and categorization using an event camera. Compared to traditional frame-based cameras, event cameras offer attractive properties: high temporal resolution (on the order of microseconds), low power consumption (a few hundred mW), and wide dynamic range (120 dB). However, event-based object
recognition systems are far behind their frame-based counterparts in terms of
accuracy. To this end, this paper presents an event-based feature extraction
method devised by accumulating local activity across the image frame and then
applying principal component analysis (PCA) to the normalized neighborhood
region. Subsequently, we propose a backtracking-free k-d tree mechanism for
efficient feature matching by taking advantage of the low-dimensionality of the
feature representation. Additionally, the proposed k-d tree mechanism allows
for feature selection to obtain a lower-dimensional dictionary representation
when hardware resources are limited to implement dimensionality reduction.
Consequently, the proposed system can be realized on a field-programmable gate
array (FPGA) device leading to high performance over resource ratio. The
proposed system is tested on real-world event-based datasets for object
categorization, showing superior classification performance and relevance to
state-of-the-art algorithms. Additionally, we verified the object detection method and real-time FPGA performance in lab settings under non-controlled illumination conditions with limited training data and ground truth annotations.
Comment: Accepted in ACCV 2018 Workshops, to appear.
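As a rough, hypothetical reconstruction of the described pipeline (neighborhood size, descriptor dimensionality, and dictionary size are our own placeholders), accumulated local event activity can be normalized, projected with PCA, and matched against a dictionary using a k-d tree:

import numpy as np
from scipy.spatial import cKDTree

# 7x7 neighborhoods of accumulated event counts, one row per location.
patches = np.random.rand(1000, 49)
patches /= patches.sum(axis=1, keepdims=True) + 1e-9  # normalize activity

# PCA via SVD of the centered patch matrix; keep 10 components.
mean = patches.mean(axis=0)
_, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
features = (patches - mean) @ vt[:10].T  # low-dimensional descriptors

# Nearest-visual-word lookup. Note scipy's k-d tree backtracks; the paper's
# backtracking-free variant trades a little accuracy for hardware-friendly,
# bounded-time search.
dictionary = features[:200]
tree = cKDTree(dictionary)
_, word_ids = tree.query(features, k=1)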
FPGA-based enhanced probabilistic convergent weightless network for human iris recognition
This paper investigates how human identification and identity verification can be performed by applying an FPGA-based weightless neural network, the Enhanced Probabilistic Convergent Neural Network (EPCN), to the iris biometric modality. The human iris is processed to extract feature vectors, which are employed to form connectivity during learning and subsequent recognition. The pre-processing of the iris, prior to EPCN training, is minimal. Structural modifications were also made to the Random Access Memory (RAM)-based neural network, which enhance its robustness when applied in real time.
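The EPCN's internals are not given in this abstract; the toy sketch below shows only the basic RAM-node idea behind weightless networks of this family (a plain WiSARD-style discriminator, with all names and sizes ours, not the enhanced probabilistic variant used in the paper):

import numpy as np

class RAMDiscriminator:
    def __init__(self, n_inputs, tuple_size=4, seed=0):
        rng = np.random.default_rng(seed)
        # Each RAM node reads a fixed random tuple of input bits as its address.
        self.mapping = rng.permutation(n_inputs).reshape(-1, tuple_size)
        self.rams = [set() for _ in self.mapping]

    def _addresses(self, bits):
        return [tuple(bits[idx]) for idx in self.mapping]

    def train(self, bits):
        for ram, addr in zip(self.rams, self._addresses(bits)):
            ram.add(addr)  # "learning" is just writing the seen address

    def score(self, bits):
        # Response = number of RAM nodes that recognize their address.
        return sum(addr in ram for ram, addr in zip(self.rams, self._addresses(bits)))

iris_bits = (np.random.rand(64) > 0.5).astype(np.uint8)  # binarized feature vector
d = RAMDiscriminator(64)
d.train(iris_bits)
print(d.score(iris_bits))  # maximal response for a trained pattern

One discriminator is trained per enrolled identity; at verification time the discriminator with the highest response wins, which is why training and recognition need no arithmetic beyond memory lookups.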
FPGA-accelerated machine learning inference as a service for particle physics computing
New heterogeneous computing paradigms on dedicated hardware with increased
parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting
solutions with large potential gains. The growing applications of machine
learning algorithms in particle physics for simulation, reconstruction, and
analysis are naturally deployed on such platforms. We demonstrate that the
acceleration of machine learning inference as a web service represents a
heterogeneous computing solution for particle physics experiments that
potentially requires minimal modification to the current computing model. As
examples, we retrain the ResNet-50 convolutional neural network to demonstrate
state-of-the-art performance for top quark jet tagging at the LHC and apply a
ResNet-50 model with transfer learning for neutrino event classification. Using
Project Brainwave by Microsoft to accelerate the ResNet-50 image classification
model, we achieve average inference times of 60 (10) milliseconds with our
experimental physics software framework using Brainwave as a cloud (edge or
on-premises) service, representing an improvement by a factor of approximately
30 (175) in model inference latency over traditional CPU inference in current
experimental hardware. A single FPGA service accessed by many CPUs achieves a
throughput of 600--700 inferences per second using an image batch of one,
comparable to large batch-size GPU throughput and significantly better than
small batch-size GPU throughput. Deployed as an edge or cloud service for the
particle physics computing model, coprocessor accelerators can have a higher
duty cycle and are potentially much more cost-effective.
Comment: 16 pages, 14 figures, 2 tables.
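The service interface is not described in the abstract, and the sketch below is not the Brainwave API; it only illustrates the inference-as-a-service pattern the paper relies on, with a placeholder endpoint and payload schema:

import json
import urllib.request

# Hypothetical FPGA-backed inference endpoint; URL and schema are placeholders.
SERVICE_URL = "http://inference.example.org/v1/resnet50:predict"

def classify(image_array):
    # Send one image (batch size 1, as in the paper's throughput test)
    # and receive the classification scores computed on the coprocessor.
    payload = json.dumps({"inputs": image_array.tolist()}).encode()
    req = urllib.request.Request(
        SERVICE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["outputs"]

Because the client is just an HTTP call, many CPU-bound worker nodes can share one FPGA service, which is what drives the high duty cycle and cost-effectiveness argument above.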
Using LSTM recurrent neural networks for monitoring the LHC superconducting magnets
The superconducting LHC magnets are coupled with an electronic monitoring
system which records and analyses voltage time series reflecting their
performance. A currently used system is based on a range of preprogrammed
triggers which launches protection procedures when a misbehavior of the magnets
is detected. All the procedures used in the protection equipment were designed
and implemented according to known working scenarios of the system and are
updated and monitored by human operators.
This paper proposes a novel approach to monitoring and fault protection of
the Large Hadron Collider (LHC) superconducting magnets which employs
state-of-the-art Deep Learning algorithms. Consequently, the authors examine the performance of LSTM recurrent neural networks for modeling the voltage time series of the magnets. To address this challenging task, different network architectures and hyper-parameters were explored to achieve the best possible performance. The regression results were measured in terms of RMSE for different numbers of future steps and history lengths taken into account for the prediction. The best result, RMSE = 0.00104, was obtained for a network with 128 LSTM cells in its internal layer and a 16-step history buffer.
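A minimal TensorFlow/Keras sketch of that best-performing configuration (128 LSTM cells regressing from a 16-step history) follows; the magnet voltage data is not public, so a synthetic series stands in for it:

import numpy as np
import tensorflow as tf

WINDOW, FUTURE = 16, 1  # history length and number of predicted steps

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, 1)),  # 16 past voltage samples
    tf.keras.layers.LSTM(128),          # 128 LSTM cells in the internal layer
    tf.keras.layers.Dense(FUTURE),      # predict the next sample(s)
])
model.compile(optimizer="adam", loss="mse")  # RMSE is the square root of this loss

# Sliding windows over a synthetic stand-in for the voltage time series.
series = np.sin(np.linspace(0, 100, 5000)).astype(np.float32)
n = len(series) - WINDOW - FUTURE
X = np.stack([series[i:i + WINDOW] for i in range(n)])[..., None]
y = np.stack([series[i + WINDOW:i + WINDOW + FUTURE] for i in range(n)])
model.fit(X, y, epochs=1, batch_size=64, verbose=0)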
Visual Spike-based Convolution Processing with a Cellular Automata Architecture
This paper presents a first approach to implementations that fuse Address-Event Representation (AER) processing with Cellular Automata, using FPGAs and AER tools. This new strategy applies spike-based convolution filters inspired by Cellular Automata to AER vision processing. Spike-based systems are neuro-inspired circuit implementations traditionally used for sensory systems or sensor signal processing. AER is a neuromorphic communication protocol for transferring asynchronous events between VLSI spike-based chips. These neuro-inspired implementations allow the development of complex, multilayer, multichip neuromorphic systems and have been used to design sensor chips, such as retinas and cochleas, processing chips, e.g. filters, and learning chips. Furthermore, Cellular Automata is a bio-inspired processing model for problem solving: this approach divides the processing into synchronous cells which change their states at the same time in order to reach the solution.
Funding: Ministerio de Educación y Ciencia TEC2006-11730-C03-02; Ministerio de Ciencia e Innovación TEC2009-10639-C04-02; Junta de Andalucía P06-TIC-0141.
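As a rough illustration of spike-based (event-driven) convolution, the sketch below stamps a kernel onto an accumulator map for each incoming AER event instead of convolving whole frames; the cellular-automata state-update rules and output thresholds of the paper's approach are omitted:

import numpy as np

H, W, K = 64, 64, 3
kernel = np.ones((K, K), dtype=np.int32)  # e.g. a smoothing kernel
state = np.zeros((H + K - 1, W + K - 1), dtype=np.int32)  # padded accumulators

def on_event(x, y):
    # One address event contributes the whole kernel around its address.
    state[y:y + K, x:x + K] += kernel
    # A full spiking implementation would emit an output event and reset
    # any cell whose state crosses a threshold; omitted for brevity.

for x, y in [(10, 10), (11, 10), (10, 11)]:  # a short burst of events
    on_event(x, y)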