10,563 research outputs found
Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device
Currently, most designers face a daunting task to
research different design flows and learn the intricacies of
specific software from various manufacturers in
hardware/software co-design. An urgent need of creating a
scalable hardware/software co-design platform has become a key
strategic element for developing hardware/software integrated
systems. In this paper, we propose a new design flow for building
a scalable co-design platform on FPGA-based system-on-chip.
We employ an integrated approach to implement a histogram
oriented gradients (HOG) and a support vector machine (SVM)
classification on a programmable device for pedestrian tracking.
Not only was hardware resource analysis reported, but the
precision and success rates of pedestrian tracking on nine open
access image data sets are also analysed. Finally, our proposed
design flow can be used for any real-time image processingrelated
products on programmable ZYNQ-based embedded
systems, which benefits from a reduced design time and provide a
scalable solution for embedded image processing products
FPGA-accelerated machine learning inference as a service for particle physics computing
New heterogeneous computing paradigms on dedicated hardware with increased
parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting
solutions with large potential gains. The growing applications of machine
learning algorithms in particle physics for simulation, reconstruction, and
analysis are naturally deployed on such platforms. We demonstrate that the
acceleration of machine learning inference as a web service represents a
heterogeneous computing solution for particle physics experiments that
potentially requires minimal modification to the current computing model. As
examples, we retrain the ResNet-50 convolutional neural network to demonstrate
state-of-the-art performance for top quark jet tagging at the LHC and apply a
ResNet-50 model with transfer learning for neutrino event classification. Using
Project Brainwave by Microsoft to accelerate the ResNet-50 image classification
model, we achieve average inference times of 60 (10) milliseconds with our
experimental physics software framework using Brainwave as a cloud (edge or
on-premises) service, representing an improvement by a factor of approximately
30 (175) in model inference latency over traditional CPU inference in current
experimental hardware. A single FPGA service accessed by many CPUs achieves a
throughput of 600--700 inferences per second using an image batch of one,
comparable to large batch-size GPU throughput and significantly better than
small batch-size GPU throughput. Deployed as an edge or cloud service for the
particle physics computing model, coprocessor accelerators can have a higher
duty cycle and are potentially much more cost-effective.Comment: 16 pages, 14 figures, 2 table
A novel system architecture for real-time low-level vision
A novel system architecture that exploits the spatial locality in memory access that is found in most low-level vision algorithms is presented. A real-time feature selection system is used to exemplify the underlying ideas, and an implementation based on commercially available Field Programmable Gate Arrays (FPGAās) and synchronous SRAM memory devices is proposed. The peak memory access rate of a system based on this architecture is estimated at 2.88 G-Bytes/s, which represents a four to five times improvement with respect to existing reconfigurable computers
Real-time human action recognition on an embedded, reconfigurable video processing architecture
Copyright @ 2008 Springer-Verlag.In recent years, automatic human motion recognition has been widely researched within the computer vision and image processing communities. Here we propose a real-time embedded vision solution for human motion recognition implemented on a ubiquitous device. There are three main contributions in this paper. Firstly, we have developed a fast human motion recognition system with simple motion features and a linear Support Vector Machine (SVM) classifier. The method has been tested on a large, public human action dataset and achieved competitive performance for the temporal template (eg. āmotion history imageā) class of approaches. Secondly, we have developed a reconfigurable, FPGA based video processing architecture. One advantage of this architecture is that the system processing performance can be reconfiured for a particular application, with the addition of new or replicated processing cores. Finally, we have successfully implemented a human motion recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system is performing reliably, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man-machine communications and intelligent environments.DTI and Broadcom Ltd
- ā¦