518 research outputs found
SYSTEM-ON-A-CHIP (SOC)-BASED HARDWARE ACCELERATION FOR HUMAN ACTION RECOGNITION WITH CORE COMPONENTS
Today, the implementation of machine vision algorithms on embedded platforms or in portable systems is growing rapidly due to the demand for machine vision in daily human life. Among the applications of machine vision, human action and activity recognition has become an active research area, and market demand for providing integrated smart security systems is growing rapidly. Among the available approaches, embedded vision is in the top tier; however, current embedded platforms may not be able to fully exploit the potential performance of machine vision algorithms, especially in terms of low power consumption. Complex algorithms can impose immense computation and communication demands, especially action recognition algorithms, which require various stages of preprocessing, processing and machine learning blocks that need to operate concurrently. The market demands embedded platforms that operate with a power consumption of only a few watts. Attempts have been mad to improve the performance of traditional embedded approaches by adding more powerful processors; this solution may solve the computation problem but increases the power consumption. System-on-a-chip eld-programmable gate arrays (SoC-FPGAs) have emerged as a major architecture approach for improving power eciency while increasing computational performance. In a SoC-FPGA, an embedded processor and an FPGA serving as an accelerator are fabricated in the same die to simultaneously improve power consumption and performance. Still, current SoC-FPGA-based vision implementations either shy away from supporting complex and adaptive vision algorithms or operate at very limited resolutions due to the immense communication and computation demands. The aim of this research is to develop a SoC-based hardware acceleration workflow for the realization of advanced vision algorithms. Hardware acceleration can improve performance for highly complex mathematical calculations or repeated functions. The performance of a SoC system can thus be improved by using hardware acceleration method to accelerate the element that incurs the highest performance overhead. The outcome of this research could be used for the implementation of various vision algorithms, such as face recognition, object detection or object tracking, on embedded platforms. The contributions of SoC-based hardware acceleration for hardware-software codesign platforms include the following: (1) development of frameworks for complex human action recognition in both 2D and 3D; (2) realization of a framework with four main implemented IPs, namely, foreground and background subtraction (foreground probability), human detection, 2D/3D point-of-interest detection and feature extraction, and OS-ELM as a machine learning algorithm for action identication; (3) use of an FPGA-based hardware acceleration method to resolve system bottlenecks and improve system performance; and (4) measurement and analysis of system specications, such as the acceleration factor, power consumption, and resource utilization. Experimental results show that the proposed SoC-based hardware acceleration approach provides better performance in terms of the acceleration factor, resource utilization and power consumption among all recent works. In addition, a comparison of the accuracy of the framework that runs on the proposed embedded platform (SoCFPGA) with the accuracy of other PC-based frameworks shows that the proposed approach outperforms most other approaches
Gotcha! I Know What You are Doing on the FPGA Cloud: Fingerprinting Co-Located Cloud FPGA Accelerators via Measuring Communication Links
In recent decades, due to the emerging requirements of computation
acceleration, cloud FPGAs have become popular in public clouds. Major cloud
service providers, e.g. AWS and Microsoft Azure have provided FPGA computing
resources in their infrastructure and have enabled users to design and deploy
their own accelerators on these FPGAs. Multi-tenancy FPGAs, where multiple
users can share the same FPGA fabric with certain types of isolation to improve
resource efficiency, have already been proved feasible. However, this also
raises security concerns. Various types of side-channel attacks targeting
multi-tenancy FPGAs have been proposed and validated. The awareness of security
vulnerabilities in the cloud has motivated cloud providers to take action to
enhance the security of their cloud environments.
In FPGA security research papers, researchers always perform attacks under
the assumption that attackers successfully co-locate with victims and are aware
of the existence of victims on the same FPGA board. However, the way to reach
this point, i.e., how attackers secretly obtain information regarding
accelerators on the same fabric, is constantly ignored despite the fact that it
is non-trivial and important for attackers. In this paper, we present a novel
fingerprinting attack to gain the types of co-located FPGA accelerators. We
utilize a seemingly non-malicious benchmark accelerator to sniff the
communication link and collect performance traces of the FPGA-host
communication link. By analyzing these traces, we are able to achieve high
classification accuracy for fingerprinting co-located accelerators, which
proves that attackers can use our method to perform cloud FPGA accelerator
fingerprinting with a high success rate. As far as we know, this is the first
paper targeting multi-tenant FPGA accelerator fingerprinting with the
communication side-channel.Comment: To be published in ACM CCS 202
Pedestrian Detection Image Processing with FPGA
This paper focuses on real-time pedestrian detection using the Histograms of Oriented Gradients (HOG) feature descriptor algorithm on a Field Programmable Gate Array. To achieve real- time pedestrian recognition on embedded systems, hardware architecture for HOG feature extraction is proposed. In order to reduce computational complexity toward efficient hardware architecture, this paper proposes several methods to simplify the computation of the HOG feature descriptor. The architecture is proposed on a Xilinx Zynq-7000 SoC using Verilog HDL to evaluate the real-time performance. This implementation processes image data at twice the pixel rate of similar software simulations and significantly reduces resource utilization while maintaining high detection accuracy
Recent Advances in Embedded Computing, Intelligence and Applications
The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems
Recommended from our members
Do switches dream of machine learning?: Toward in-network classification
Machine learning is currently driving a technological and societal revolution. While programmable switches have been proven to be useful for in-network computing, machine learning within programmable switches had little success so far. Not using network devices for machine learning has a high toll, given the known power efficiency and performance benefits of processing within the network. In this paper, we explore the potential use of commodity programmable switches for in-network classification, by mapping trained machine learning models to match-action pipelines. We introduce IIsy, a software and hardware based prototype of our approach, and discuss the suitability of mapping to different targets. Our solution can be generalized to additional machine learning algorithms, using the methods presented in this work
- …