FPGA-based module for SURF extraction
We present a complete hardware and software solution for an FPGA-based embedded computer vision module capable of running the SURF image feature extraction algorithm. Aside from image analysis, the module embeds a Linux distribution that allows running programs tailored to particular applications. The module is based on a Virtex-5 FXT FPGA, which features powerful configurable logic and an embedded PowerPC processor. We describe the module hardware as well as the custom FPGA image processing cores that implement the algorithm's most computationally expensive process, interest point detection. The module's overall performance is evaluated and compared to CPU- and GPU-based solutions. Results show that the embedded module achieves distinctiveness comparable to the SURF software implementation running on a standard CPU while being faster and consuming significantly less power and space. Thus, it allows the SURF algorithm to be used in applications with power and spatial constraints, such as autonomous navigation of small mobile robots.
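SURF interest point detection is generally built on integral images, which let the sum over any rectangular box be read with four lookups regardless of box size. The following is a minimal software sketch of that building block only, not the FPGA cores described in the abstract; the function names `integral_image` and `box_sum` are illustrative assumptions.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum above this row plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], x0: int, y0: int, x1: int, y1: int) -> int:
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
lower_right = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
whole = box_sum(table, 0, 0, 3, 3)        # all nine pixels
```

Because every box sum costs the same four reads, the box filters used to approximate second-order derivatives can be evaluated at any scale in constant time, which is what makes this step attractive for a hardware pipeline.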
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Special interest surrounds Convolutional Neural Networks (CNNs), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks, since the many convolutional and fully connected layers demand a large amount of communication for parallel computation. Multi-core CPU-based solutions have proven inadequate for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow the memory hierarchy to be implemented using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies, and makes FPGAs potentially powerful solutions for real-time classification with CNNs. Both Altera and Xilinx have adopted the OpenCL co-design framework from GPUs for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of the Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times.
Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Deep learning is a cutting-edge technique that is being applied to many fields.
For vision applications, Convolutional Neural Networks (CNNs) are demonstrating
significant accuracy in classification tasks. Numerous hardware accelerators
have appeared in recent years to improve on CPU- or GPU-based solutions.
This technology is commonly prototyped and tested over FPGAs before being
considered for ASIC fabrication for mass production. The use of typical
commercial cameras (30 fps) limits the capabilities of these systems for
high-speed applications. Dynamic vision sensors (DVS), which emulate the
behavior of a biological retina, are gaining importance for these applications
because of their nature: the information is represented by a
continuous stream of spikes, and the frames to be processed by the CNN are
constructed by collecting a fixed number of these spikes (called events). The
faster an object moves, the more events the DVS produces, and so the higher the
equivalent frame rate. Therefore, using a DVS allows a frame to be computed at
the maximum speed a CNN accelerator can offer. In this paper we
present a VHDL/HLS description of a pipelined design for FPGA able to collect
events from an Address-Event-Representation (AER) DVS retina to obtain a
normalized histogram to be used by a particular CNN accelerator, called
NullHop. VHDL is used to describe the circuit, and HLS for computation blocks,
which are used to perform the normalization of a frame needed for the CNN.
Results outperform previous implementations of frame collection and
normalization using ARM processors running at 800 MHz on a Zynq7100 in both
latency and power consumption. A measured 67% speedup factor is presented for a
Roshambo CNN real-time experiment running at a 160 fps peak rate.
Comment: 7 page
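The frame construction described in this abstract, collecting a fixed number of events into a histogram and normalizing it, can be sketched in software as follows. This is only an illustration under stated assumptions: events are reduced to `(x, y)` addresses with polarity ignored, normalization divides by the peak bin count (the abstract does not specify the scheme), and the name `frames_from_events` is hypothetical rather than taken from the paper.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, int]  # assumed (x, y) pixel address of one DVS event


def frames_from_events(
    events: Iterable[Event], width: int, height: int, events_per_frame: int
) -> Iterator[List[List[float]]]:
    """Yield one normalized histogram frame per `events_per_frame` events."""
    hist = [[0] * width for _ in range(height)]
    count = 0
    for x, y in events:
        hist[y][x] += 1
        count += 1
        if count == events_per_frame:
            # Assumed normalization: scale so the busiest pixel becomes 1.0.
            peak = max(max(row) for row in hist)
            yield [[v / peak for v in row] for row in hist]
            hist = [[0] * width for _ in range(height)]
            count = 0


stream = [(0, 0), (1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 1), (1, 1)]
frames = list(frames_from_events(stream, width=2, height=2, events_per_frame=4))
```

Since a frame is emitted as soon as the event budget is reached, a faster-moving object fills the budget sooner and the effective frame rate rises with scene activity, which matches the behavior the abstract describes.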
OPTIMIZED ARCHITECTURE DESIGN AND IMPLEMENTATION OF OBJECT TRACKING ALGORITHM ON FPGA
FPGA-based object tracking is one of the most recent video
surveillance applications in embedded systems. In general, an FPGA implementation is
more efficient than general-purpose computers in attaining high throughput due to its
parallelism and execution speed. The system needs to be designed for a standard frame
rate so as to achieve optimal performance in a real-time environment. Optimal
design of a system depends on minimizing the cost, area (device utilization) and
power while achieving the required speed. Past research that investigated object
tracking systems' implementation on FPGA achieved significantly high throughput
but showed high device utilization. This research aims at optimizing the
device utilization under real-time constraints. The Adaptive Hybrid Difference
algorithm (AHD), which is used to detect moving objects, was chosen to be
implemented on FPGA due to its computational capability and efficiency with regard to
hardware implementation. AHD can work under various lighting conditions automatically
by determining the adaptive threshold in each time period.
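The abstract does not give the details of AHD, so the sketch below is not that algorithm. It only illustrates the general idea of difference-based motion detection with a threshold recomputed for each frame pair, assuming the threshold is the mean absolute difference plus `k` standard deviations. The function name `moving_mask` and the parameter `k` are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def moving_mask(
    prev: List[List[int]], curr: List[List[int]], k: float = 2.0
) -> Tuple[List[List[bool]], float]:
    """Flag pixels whose frame-to-frame change exceeds an adaptive threshold."""
    diffs = [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]
    flat = [d for row in diffs for d in row]
    # Assumed rule: the threshold follows the statistics of the current
    # difference image, so it adapts when overall lighting or noise changes.
    threshold = mean(flat) + k * pstdev(flat)
    mask = [[d > threshold for d in row] for row in diffs]
    return mask, threshold


previous = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
current = [[10, 11, 10], [10, 200, 10], [9, 10, 10]]
mask, threshold = moving_mask(previous, current)
```

In this toy pair only the centre pixel changes strongly, so it is the only one flagged; the small one-level fluctuations stay below the threshold.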
Event-based Row-by-Row Multi-convolution engine for Dynamic-Vision Feature Extraction on FPGA
Neural network algorithms are commonly used to
recognize patterns from different data sources such as audio or
vision. In image recognition, Convolutional Neural Networks are
one of the most effective techniques due to the high accuracy they
achieve. These algorithms require billions of addition and
multiplication operations over all the pixels of an image. However,
it is possible to reduce the number of operations using other
computer vision techniques rather than frame-based ones, e.g.
neuromorphic frame-free techniques. There exist many neuromorphic
vision sensors that detect pixels whose luminosity has changed.
In this study, an event-based convolution engine
for FPGA is presented. This engine models an array of leaky
integrate and fire neurons. It is able to apply different kernel
sizes, from 1x1 to 7x7, which are computed row by row, with a
maximum number of 64 different convolution kernels. The design
presented is able to process 64 feature maps of 7x7 with a latency
of 8.98 s.
Ministerio de Economía y Competitividad TEC2016-77785-
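An event-based convolution over an array of leaky integrate-and-fire neurons can be sketched as follows: each incoming event adds the kernel, row by row, to the membrane potentials around its address, and any neuron reaching the threshold emits an output event and resets. This is a simplified software model under stated assumptions (linear leak per time tick, a single kernel, reset to zero), not the FPGA design from the abstract; the class name `EventConvolution` and its parameters are hypothetical.

```python
from typing import List, Tuple


class EventConvolution:
    """Toy event-driven convolution on a grid of leaky integrate-and-fire neurons."""

    def __init__(self, width: int, height: int, kernel: List[List[float]],
                 threshold: float, leak_per_tick: float) -> None:
        self.width, self.height = width, height
        self.kernel = kernel  # square, odd-sized; added as-is (no flip)
        self.threshold = threshold
        self.leak = leak_per_tick
        self.potential = [[0.0] * width for _ in range(height)]
        self.last_update = [[0] * width for _ in range(height)]

    def _leaked(self, x: int, y: int, t: int) -> float:
        """Potential at time t after decaying linearly toward zero."""
        v = self.potential[y][x]
        decay = self.leak * (t - self.last_update[y][x])
        return max(0.0, v - decay) if v > 0 else min(0.0, v + decay)

    def process(self, x: int, y: int, t: int) -> List[Tuple[int, int, int]]:
        """Apply one input event and return the output events it triggers."""
        radius = len(self.kernel) // 2
        spikes = []
        for ky, kernel_row in enumerate(self.kernel):  # one kernel row at a time
            ny = y + ky - radius
            if not 0 <= ny < self.height:
                continue
            for kx, weight in enumerate(kernel_row):
                nx = x + kx - radius
                if not 0 <= nx < self.width:
                    continue
                v = self._leaked(nx, ny, t) + weight
                if v >= self.threshold:
                    spikes.append((nx, ny, t))
                    v = 0.0  # assumed reset-to-zero after firing
                self.potential[ny][nx] = v
                self.last_update[ny][nx] = t
        return spikes


kernel = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
engine = EventConvolution(5, 5, kernel, threshold=3.0, leak_per_tick=0.1)
first = engine.process(2, 2, t=0)
second = engine.process(2, 2, t=1)
```

Work is done only for the neurons a kernel touches when an event arrives, which is how an event-driven design can avoid the full per-pixel multiply-accumulate cost of a frame-based convolution.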
Resource-constrained FPGA Design for Satellite Component Feature Extraction
The effective use of computer vision and machine learning for on-orbit
applications has been hampered by limited computing capabilities, and therefore
limited performance. While embedded systems utilizing ARM processors have been
shown to meet acceptable but low performance standards, the recent availability
of larger space-grade field programmable gate arrays (FPGAs) shows potential to
exceed the performance of microcomputer systems. This work proposes the use of
a neural network-based object detection algorithm that can be deployed on a
comparably resource-constrained FPGA to automatically detect components of
non-cooperative satellites on orbit. Hardware-in-the-loop experiments were
performed on the ORION Maneuver Kinematics Simulator at Florida Tech to compare
the performance of the new model deployed on a small, resource-constrained FPGA
to an equivalent algorithm on a microcomputer system. Results show the FPGA
implementation increases the throughput and decreases latency while maintaining
comparable accuracy. These findings suggest future missions should consider
deploying computer vision algorithms on space-grade FPGAs.
Comment: 9 pages, 7 figures, 4 tables, Accepted at IEEE Aerospace Conference
202
Software Porting of a 3D Reconstruction Algorithm to Razorcam Embedded System on Chip
A method is presented to calculate depth information for a UAV navigation system from keypoints in two consecutive image frames, using a monocular camera sensor as input and the OpenCV library. This method was first implemented in software and run on a general-purpose Intel CPU, then ported to the RazorCam Embedded Smart-Camera System and run on an ARM CPU onboard the Xilinx Zynq-7000. The results of performance and accuracy testing of the software implementation are then shown and analyzed, demonstrating a successful port of the software to the RazorCam embedded system on chip, which could potentially be used onboard a UAV with tight constraints on size, weight, and power. The potential impacts will be seen through the continuation of this research in the Smart ES lab at the University of Arkansas.