9,280 research outputs found
LO-FAT: Low-Overhead Control Flow ATtestation in Hardware
Attacks targeting software on embedded systems are becoming increasingly
prevalent. Remote attestation is a mechanism that allows establishing trust in
embedded devices. However, existing attestation schemes are either static and
cannot detect control-flow attacks, or require instrumentation of software
incurring high performance overheads. To overcome these limitations, we present
LO-FAT, the first practical hardware-based approach to control-flow
attestation. By leveraging existing processor hardware features and
commonly-used IP blocks, our approach enables efficient control-flow
attestation without requiring software instrumentation. We show that our
proof-of-concept implementation based on a RISC-V SoC incurs no processor
stalls and requires reasonable area overhead.Comment: Authors' pre-print version to appear in DAC 2017 proceeding
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
FPGA-based real-time moving target detection system for unmanned aerial vehicle application
Moving target detection is the most common task for Unmanned Aerial Vehicle (UAV) to find and track object of interest from a bird's eye view in mobile aerial surveillance for civilian applications such as search and rescue operation. The complex detection algorithm can be implemented in a real-time embedded system using Field Programmable Gate Array (FPGA). This paper presents the development of real-time moving target detection System-on-Chip (SoC) using FPGA for deployment on a UAV. The detection algorithm utilizes area-based image registration technique which includes motion estimation and object segmentation processes. The moving target detection system has been prototyped on a low-cost Terasic DE2-115 board mounted with TRDB-D5M camera. The system consists of Nios II processor and stream-oriented dedicated hardware accelerators running at 100 MHz clock rate, achieving 30-frame per second processing speed for 640 × 480 pixels' resolution greyscale videos
Performance analysis and optimization of automatic speech recognition
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Fast and accurate Automatic Speech Recognition (ASR) is emerging as a key application for mobile devices. Delivering ASR on such devices is challenging due to the compute-intensive nature of the problem and the power constraints of embedded systems. In this paper, we provide a performance and energy characterization of Pocketsphinx, a popular toolset for ASR that targets mobile devices. We identify the computation of the Gaussian Mixture Model (GMM) as the main bottleneck, consuming more than 80 percent of the execution time. The CPI stack analysis shows that branches and main memory accesses are the main performance limiting factors for GMM computation. We propose several software-level optimizations driven by the power/performance analysis. Unlike previous proposals that trade accuracy for performance by reducing the number of Gaussians evaluated, we maintain accuracy and improve performance by effectively using the underlying CPU microarchitecture. First, we use a refactored implementation of the innermost loop of the GMM evaluation code to ameliorate the impact of branches. Second, we exploit the vector unit available on most modern CPUs to boost GMM computation, introducing a novel memory layout for storing the means and variances of the Gaussians in order to maximize the effectiveness of vectorization. Third, we compute the Gaussians for multiple frames in parallel, so means and variances can be fetched once in the on-chip caches and reused across multiple frames, significantly reducing memory bandwidth usage. We evaluate our optimizations using both hardware counters on real CPUs and simulations. Our experimental results show that the proposed optimizations provide 2.68x speedup over the baseline Pocketsphinx decoder on a high-end Intel Skylake CPU, while achieving 61 percent energy savings. On a modern ARM Cortex-A57 mobile processor our techniques improve performance by 1.85x, while providing 59 percent energy savings without any loss in the accuracy of the ASR system.Peer ReviewedPostprint (author's final draft
A single-chip FPGA implementation of real-time adaptive background model
This paper demonstrates the use of a single-chip
FPGA for the extraction of highly accurate background
models in real-time. The models are based
on 24-bit RGB values and 8-bit grayscale intensity
values. Three background models are presented, all
using a camcorder, single FPGA chip, four blocks
of RAM and a display unit. The architectures have
been implemented and tested using a Panasonic NVDS60B
digital video camera connected to a Celoxica
RC300 Prototyping Platform with a Xilinx Virtex
II XC2v6000 FPGA and 4 banks of onboard RAM.
The novel FPGA architecture presented has the advantages
of minimizing latency and the movement of
large datasets, by conducting time critical processes
on BlockRAM. The systems operate at clock rates
ranging from 57MHz to 65MHz and are capable
of performing pre-processing functions like temporal
low-pass filtering on standard frame size of 640X480
pixels at up to 210 frames per second
- …