Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Deep learning is a cutting-edge theory that is being applied to many fields.
For vision applications, Convolutional Neural Networks (CNNs) are achieving
significant accuracy in classification tasks. Numerous hardware accelerators
have appeared in recent years to improve on CPU- or GPU-based solutions.
This technology is commonly prototyped and tested on FPGAs before being
considered for ASIC fabrication for mass production. The use of typical
commercial cameras (30 fps) limits the capabilities of these systems for
high-speed applications. Dynamic vision sensors (DVS), which emulate the
behavior of a biological retina, are gaining importance for these applications
because of their nature: the information is represented by a continuous stream
of spikes, and the frames to be processed by the CNN are constructed by
collecting a fixed number of these spikes (called events). The faster an object
moves, the more events the DVS produces, so the higher the equivalent frame
rate. Therefore, using a DVS allows a frame to be computed at the maximum speed
a CNN accelerator can offer. In this paper we present a VHDL/HLS description of
a pipelined design for FPGA able to collect events from an
Address-Event-Representation (AER) DVS retina to obtain a normalized histogram
to be used by a particular CNN accelerator, called NullHop. VHDL is used to
describe the circuit, and HLS for the computation blocks, which perform the
normalization of a frame needed for the CNN. Results outperform previous
implementations of frame collection and normalization using ARM processors
running at 800 MHz on a Zynq7100 in both latency and power consumption. A
measured 67% speedup factor is presented for a Roshambo CNN real-time
experiment running at a 160 fps peak rate. Comment: 7 page
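Because each frame is built from a fixed number of events, the equivalent frame rate is simply the event rate divided by the events per frame. The minimal Python sketch below illustrates that accumulation step; the `(x, y, polarity)` event tuple, the function names, and the divide-by-peak normalization are assumptions for illustration only, not the NullHop pipeline's actual AER format or normalization.

```python
from typing import Iterable, List, Tuple

# Hypothetical decoded AER event: (x, y, polarity)
Event = Tuple[int, int, int]
Frame = List[List[float]]


def events_to_frames(events: Iterable[Event], width: int, height: int,
                     events_per_frame: int) -> List[Frame]:
    """Accumulate a fixed number of events into normalized 2D histograms."""
    frames: List[Frame] = []
    hist = [[0] * width for _ in range(height)]
    count = 0
    for x, y, _polarity in events:  # polarity ignored in this sketch
        hist[y][x] += 1
        count += 1
        if count == events_per_frame:
            # Assumed normalization: scale by the busiest pixel (peak >= 1 here)
            peak = max(max(row) for row in hist)
            frames.append([[v / peak for v in row] for row in hist])
            hist = [[0] * width for _ in range(height)]
            count = 0
    return frames  # a trailing, incomplete frame is discarded


def equivalent_frame_rate(event_rate_hz: float, events_per_frame: int) -> float:
    """Faster motion -> more events per second -> more frames per second."""
    return event_rate_hz / events_per_frame
```

For example, a scene producing 100,000 events/s with 2,000 events per frame yields an equivalent 50 fps; doubling the scene activity doubles the frame rate, up to whatever the accelerator can sustain.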
Neural Network Methods for Radiation Detectors and Imaging
Recent advances in image data processing through machine learning and
especially deep neural networks (DNNs) allow for new optimization and
performance-enhancement schemes for radiation detectors and imaging hardware
through data-endowed artificial intelligence. We give an overview of data
generation at photon sources, deep learning-based methods for image processing
tasks, and hardware solutions for deep learning acceleration. Most existing
deep learning approaches are trained offline, typically using large amounts of
computational resources. However, once trained, DNNs can achieve fast inference
speeds and can be deployed to edge devices. A new trend is edge computing with
less energy consumption (hundreds of watts or less) and real-time analysis
potential. While popularly used for edge computing, electronic-based hardware
accelerators ranging from general purpose processors such as central processing
units (CPUs) to application-specific integrated circuits (ASICs) are constantly
reaching performance limits in latency, energy consumption, and other physical
constraints. These limits give rise to next-generation analog neuromorphic
hardware platforms, such as optical neural networks (ONNs), for highly parallel,
low-latency, and low-energy computing to boost deep learning acceleration.
A High-Performance Data Acquisition System for Smart Cameras in Science
This dissertation proposes a novel smart camera platform serving as a flexible data acquisition system for scientific applications. Current technological progress offers increasing performance in the areas we consider, namely high data throughput, data processing, and detector performance. Prevalent data acquisition solutions typically focus on one of these aspects. However, driven by science, experiments face increasing demands in terms of data throughput, speed and flexibility. In this dissertation, we introduce a system which, in addition to providing high-speed data transfer, is also capable of interpreting the incoming information at an early stage. In order to demonstrate the full potential of the smart camera platform, we focus on X-ray imaging with synchrotron light sources. X-ray imaging applications can investigate the traits of technological and biological processes over microseconds for radiography and milliseconds for tomography applications. These applications may require different sensors and involve complex experiment operations. The new smart camera platform is part of a larger project, UFO, which introduces a new concept for X-ray imaging. On-line data assessment is used to provide data-driven feedback and active management of both the process and the data acquisition procedure. This is accomplished by using a GPU platform for fast reconstruction, embedding on-camera data processing, and integrating the smart camera into a high-throughput data acquisition system. The final design of the smart camera platform consists of a custom high-performance FPGA board providing continuous data transfer, embedded image processing, and a flexible input stage. At the IMAGE beamline of ANKA, the camera is integrated into the new control system and used in real-life applications. A maximum data throughput of up to 8 GB/s is achieved. A custom image-based algorithm with stringent real-time requirements is implemented in the FPGA; it can increase the native sensor speed by up to five times while reducing the amount of transferred data. Several image sensors are used, with resolutions of up to 20 megapixels and frame rates of up to 5 kfps. Thanks to its flexible input stage, the smart camera platform was also used in non-imaging applications. The proposed camera architecture enables the user to adapt the current system to any kind of high data-throughput application, and to modify and implement custom processing algorithms.
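To see how a link throughput figure bounds the frame rate of an uncompressed sensor readout, a short calculation helps. This sketch assumes "8 GB/s" means 8 × 10⁹ bytes/s and that every pixel is transferred raw; the 1000 × 1000, 16-bit sensor used below is hypothetical and is not one of the sensors described in the dissertation.

```python
LINK_BYTES_PER_S = 8e9  # assumption: 8 GB/s read as decimal gigabytes


def bytes_per_frame(width: int, height: int, bits_per_pixel: int) -> float:
    """Uncompressed size of one frame in bytes."""
    return width * height * bits_per_pixel / 8


def max_frame_rate(link_bytes_per_s: float, width: int, height: int,
                   bits_per_pixel: int) -> float:
    """Highest sustained frame rate the link can carry without data reduction."""
    return link_bytes_per_s / bytes_per_frame(width, height, bits_per_pixel)


def required_throughput(width: int, height: int, bits_per_pixel: int,
                        fps: float) -> float:
    """Bytes per second needed to stream every frame uncompressed."""
    return bytes_per_frame(width, height, bits_per_pixel) * fps


if __name__ == "__main__":
    # Hypothetical 1-megapixel, 16-bit sensor
    print(max_frame_rate(LINK_BYTES_PER_S, 1000, 1000, 16))
```

For this hypothetical sensor the link caps the raw readout at 4000 fps; any higher sensor speed would require reducing the data on the camera before transfer.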
Neural network methods for radiation detectors and imaging
Recent advances in image data processing through deep learning allow for new optimization and performance-enhancement schemes for radiation detectors and imaging hardware. Through data-endowed artificial intelligence, this enables advances in radiation experiments, which include photon science at synchrotrons and X-ray free-electron lasers as a subclass. We give an overview of data generation at photon sources, deep learning-based methods for image processing tasks, and hardware solutions for deep learning acceleration. Most existing deep learning approaches are trained offline, typically using large amounts of computational resources. However, once trained, DNNs can achieve fast inference speeds and can be deployed to edge devices. A new trend is edge computing with less energy consumption (hundreds of watts or less) and real-time analysis potential. While popularly used for edge computing, electronic-based hardware accelerators ranging from general purpose processors such as central processing units (CPUs) to application-specific integrated circuits (ASICs) are constantly reaching performance limits in latency, energy consumption, and other physical constraints. These limits give rise to next-generation analog neuromorphic hardware platforms, such as optical neural networks (ONNs), for highly parallel, low-latency, and low-energy computing to boost deep learning acceleration (LA-UR-23-32395).
Real-time video scene analysis with heterogeneous processors
Field-Programmable Gate Arrays (FPGAs) and General Purpose Graphics Processing Units (GPUs) allow acceleration and real-time processing of computationally intensive computer vision algorithms. The decision to use either architecture in any application is determined by task-specific priorities such as processing latency, power consumption and algorithm accuracy. This choice is normally made at design time on a heuristic or fixed algorithmic basis; here we propose an alternative method for automatic runtime selection.
In this thesis, we describe our PC-based system architecture containing both platforms; this provides greater flexibility and allows dynamic selection of processing platforms to suit changing scene priorities. Using the Histograms of Oriented Gradients (HOG) algorithm for pedestrian detection, we comprehensively explore algorithm implementation on FPGA, GPU and a combination of both, and show that the effect of data transfer time on overall processing performance is significant. We also characterise performance of each implementation and quantify tradeoffs between power, time and accuracy when moving processing between architectures, then specify the optimal architecture to use when prioritising each of these.
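The HOG detector is built from per-cell histograms of gradient orientation. A minimal pure-Python sketch of one cell follows, assuming the common defaults of 8 × 8-pixel cells and nine unsigned-orientation bins over 0 to 180 degrees. It deliberately omits the vote interpolation and block normalization of the full descriptor, and it is not the FPGA or GPU implementation evaluated in the thesis.

```python
import math
from typing import List


def hog_cell_histogram(img: List[List[float]], x0: int, y0: int,
                       cell: int = 8, bins: int = 9) -> List[float]:
    """Orientation histogram for one cell, using unsigned gradients.

    Uses centred [-1, 0, 1] derivatives (clamped at image borders) and
    hard voting of each pixel's gradient magnitude into one bin.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A horizontal intensity ramp puts all of its gradient energy into the 0-degree bin, while a vertical ramp lands in the bin containing 90 degrees; a detector window concatenates many such cell histograms before classification.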
We apply this new knowledge to a real-time surveillance application representative of anomaly detection problems: detecting parked vehicles in videos. Using motion detection and car and pedestrian HOG detectors implemented across multiple architectures to generate detections, we use trajectory clustering and a Bayesian contextual motion algorithm to generate an overall scene anomaly level. This is in turn used to select the architectures on which to run the compute-intensive detectors for the next frame, with higher anomalies selecting faster, higher-power implementations. Comparing dynamic context-driven prioritisation of system performance against a fixed mapping of algorithms to architectures shows that our dynamic mapping method is 10% more accurate at detecting events than the power-optimised version, at the cost of 12 W higher power consumption.
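The per-frame mapping step can be sketched as a simple policy: above an anomaly threshold, choose the lowest-latency implementation; otherwise choose the lowest-power one. The `Implementation` record, the 0.5 threshold, and the latency and power figures in the usage below are hypothetical. The thesis derives its anomaly level from trajectory clustering and a Bayesian contextual motion model, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Implementation:
    name: str          # hypothetical label, e.g. "fpga" or "gpu"
    latency_ms: float  # per-frame processing time
    power_w: float     # average power while running


def select_implementation(impls: List[Implementation], anomaly: float,
                          threshold: float = 0.5) -> Implementation:
    """High anomaly -> fastest implementation; otherwise the lowest-power one."""
    if anomaly >= threshold:
        return min(impls, key=lambda i: i.latency_ms)
    return min(impls, key=lambda i: i.power_w)
```

With illustrative entries such as `Implementation("fpga", 40.0, 5.0)` and `Implementation("gpu", 15.0, 60.0)`, a quiet scene keeps the detector on the low-power option and a high anomaly level switches it to the faster one for the next frame.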
Runtime methods for energy-efficient, image processing using significance driven learning.
Ph.D. Thesis. Image and video processing applications are opening up a whole
range of opportunities for processing at the "edge" in IoT applications
as the demand for high-accuracy processing of high-resolution images
increases. However, this comes with an increase in the quantity of data
to be processed and stored, which significantly increases the
computational challenge. There is a growing interest in developing
hardware systems that provide energy-efficient solutions to this
challenge. The challenges in image processing are unique because an
increase in resolution not only increases the data to be processed but
also greatly increases the amount of information detail that can be
extracted from the data. This thesis addresses the concept of extracting
the significant image information to enable intelligent processing of
the data within a heterogeneous system.
We propose a unique way of defining image significance, based on
what causes us to react when something "catches our eye", whether it
be static or dynamic, whether it be in our central field of focus or our
peripheral vision. This significance technique proves to be a relatively
economical process in terms of energy and computational effort.
We investigate opportunities for further computational and energy
efficiency that are available through the selective use of heterogeneous
system elements.
We utilise significance to adaptively select regions of interest for
different levels of processing, depending on their relative significance.
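Region-of-interest selection of this kind can be sketched as two steps: score each tile, then map the score to a processing level. The significance measure below is a stand-in (mean absolute change from the previous frame, so it only captures dynamic significance), and the tile size, thresholds and level names are hypothetical; the thesis defines its own significance metric covering static and dynamic content.

```python
from typing import Dict, List, Tuple

Frame = List[List[float]]
Tile = Tuple[int, int]  # (top row, left column) of a tile


def tile_significance(prev: Frame, curr: Frame, tile: int) -> Dict[Tile, float]:
    """Stand-in significance: mean absolute temporal change per tile."""
    h, w = len(curr), len(curr[0])
    scores: Dict[Tile, float] = {}
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            total, n = 0.0, 0
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    total += abs(curr[y][x] - prev[y][x])
                    n += 1
            scores[(ty, tx)] = total / n
    return scores


def processing_levels(scores: Dict[Tile, float], low: float,
                      high: float) -> Dict[Tile, str]:
    """Map each tile's significance to a hypothetical processing level."""
    levels: Dict[Tile, str] = {}
    for key, s in scores.items():
        if s >= high:
            levels[key] = "full"
        elif s >= low:
            levels[key] = "reduced"
        else:
            levels[key] = "skip"
    return levels
```

Only tiles marked "full" would receive the expensive analysis, which is where the computational slack exploited in the next step comes from.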
We further demonstrate that, by exploiting the computational slack time
released by this process, we can throttle the processor speed to achieve
greater energy savings. This demonstrates a reduction in computational
effort and an improvement in energy efficiency, a process that we term
adaptive approximate computing.
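Slack-driven throttling can be illustrated with a deadline check. At 10 fps each frame has a 100 ms budget; if significance-based selection cuts the work per frame, a lower clock may still meet that budget. This sketch assumes runtime scales as cycles divided by frequency (ignoring memory stalls and switching overhead), and the cycle counts and frequency levels are hypothetical rather than measurements from the thesis.

```python
from typing import Sequence


def pick_frequency(work_cycles: float, deadline_s: float,
                   freqs_hz: Sequence[float]) -> float:
    """Lowest available frequency that still finishes the frame by its deadline.

    Assumes execution time = work_cycles / frequency.
    """
    for f in sorted(freqs_hz):
        if work_cycles / f <= deadline_s:
            return f
    return max(freqs_hz)  # deadline cannot be met; run at the highest level


FREQS_HZ = [0.6e9, 1.2e9, 2.0e9]  # hypothetical DVFS operating points
FRAME_DEADLINE_S = 1 / 10          # 10 fps target
```

With these assumed levels, a full frame of 1.5 × 10⁸ cycles needs 2.0 GHz to finish in 100 ms, whereas a significance-reduced frame of 5 × 10⁷ cycles fits at 0.6 GHz, freeing slack that can be converted into energy savings.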
We demonstrate that our approach reduces energy by 50 to 75%, depending on
user quality demand, for a real-time performance requirement of 10 fps on a
WQXGA image, when compared with the existing approach that is agnostic of
significance. We further hypothesise that, by using heterogeneous elements,
savings of up to 90% could be achievable in both performance and energy when
compared with running OpenCV on the CPU alone.
Design and management of image processing pipelines within CPS: Acquired experience towards the end of the FitOptiVis ECSEL Project
Cyber-Physical Systems (CPSs) are dynamic and reactive systems interacting with processes, the environment and, sometimes, humans. They are often distributed, with sensors and actuators, and are characterized as smart, adaptive and predictive, reacting in real time. Image- and video-processing pipelines are a prime source of environmental information, allowing such systems to make better decisions based on what they see. Therefore, in FitOptiVis, we are developing novel methods and tools to integrate complex image- and video-processing pipelines. FitOptiVis aims to deliver a reference architecture for describing and optimizing quality and resource management for imaging and video pipelines in CPSs at both design time and run time. The architecture is concretized in low-power, high-performance, smart components, and in methods and tools for combined design-time and run-time multi-objective optimization and adaptation within system and environment constraints.
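Combined design-time and run-time multi-objective optimization can be illustrated generically: at design time, keep only the Pareto-optimal pipeline configurations; at run time, pick the best of them that fits the current resource budget. This is a textbook Pareto-filtering technique, not the FitOptiVis reference architecture, and the `Config` fields and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    name: str          # hypothetical pipeline configuration label
    quality: float     # higher is better
    power_w: float     # lower is better
    latency_ms: float  # lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    no_worse = (a.quality >= b.quality and a.power_w <= b.power_w
                and a.latency_ms <= b.latency_ms)
    better = (a.quality > b.quality or a.power_w < b.power_w
              or a.latency_ms < b.latency_ms)
    return no_worse and better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Design time: drop every configuration dominated by another one."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def select(front: List[Config], max_power_w: float,
           max_latency_ms: float) -> Optional[Config]:
    """Run time: highest-quality configuration within the current budget."""
    feasible = [c for c in front
                if c.power_w <= max_power_w and c.latency_ms <= max_latency_ms]
    return max(feasible, key=lambda c: c.quality, default=None)
```

Precomputing the front keeps the run-time decision cheap: when the power budget shrinks, the selector only scans the few non-dominated configurations instead of re-optimizing the whole pipeline.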