Search CORE

9,280 research outputs found

LO-FAT: Low-Overhead Control Flow ATtestation in Hardware

Author: Asokan N.
Davi Lucas
Dessouky Ghada
Koeberl Patrick
Nyman Thomas
Paverd Andrew
Sadeghi Ahmad-Reza
Zeitouni Shaza
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2017
Field of study

Attacks targeting software on embedded systems are becoming increasingly prevalent. Remote attestation is a mechanism that allows establishing trust in embedded devices. However, existing attestation schemes are either static and cannot detect control-flow attacks, or require instrumentation of software incurring high performance overheads. To overcome these limitations, we present LO-FAT, the first practical hardware-based approach to control-flow attestation. By leveraging existing processor hardware features and commonly-used IP blocks, our approach enables efficient control-flow attestation without requiring software instrumentation. We show that our proof-of-concept implementation based on a RISC-V SoC incurs no processor stalls and requires reasonable area overhead.Comment: Authors' pre-print version to appear in DAC 2017 proceeding

arXiv.org e-Print Archive

TUbiblio

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

Creating portable and efficient packet processing applications

Author: A Korobeynikov
AV Aho
B Wun
CW Fraser
D Bernstein
EA Lee
EJ Johnson
Fulvio Risso
G Memik
J Carlstrom
J Wagner
JA Fisher
JL Hennessy
L Ciminiera
L George
M Baldi
M Baldi
MK Chen
N Shah
Olivier Morandi
P Briggs
Paolo Veglia
Pierluigi Rolando
R Cytron
R Ennals
R Morris
Silvio Valenti
SS Muchnick
T Lindholm
Z Budimlic
Publication venue: Springer
Publication date: 01/01/2011
Field of study

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

FPGA-based real-time moving target detection system for unmanned aerial vehicle application

Author: Marsono Muhammad Nadzir
Shaikh-Husin Nasir
Sheikh Usman Ullah
Tang Jia Wei
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2016
Field of study

Moving target detection is the most common task for Unmanned Aerial Vehicle (UAV) to find and track object of interest from a bird's eye view in mobile aerial surveillance for civilian applications such as search and rescue operation. The complex detection algorithm can be implemented in a real-time embedded system using Field Programmable Gate Array (FPGA). This paper presents the development of real-time moving target detection System-on-Chip (SoC) using FPGA for deployment on a UAV. The detection algorithm utilizes area-based image registration technique which includes motion estimation and object segmentation processes. The moving target detection system has been prototyped on a low-cost Terasic DE2-115 board mounted with TRDB-D5M camera. The system consists of Nios II processor and stream-oriented dedicated hardware accelerators running at 100 MHz clock rate, achieving 30-frame per second processing speed for 640 × 480 pixels' resolution greyscale videos

Crossref

Directory of Open Access Journals

Universiti Teknologi Malaysia Institutional Repository

Performance analysis and optimization of automatic speech recognition

Author: Arnau Montañés José María
González Colás Antonio María
Tabani Hamid
Tubella Murgadas Jordi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Fast and accurate Automatic Speech Recognition (ASR) is emerging as a key application for mobile devices. Delivering ASR on such devices is challenging due to the compute-intensive nature of the problem and the power constraints of embedded systems. In this paper, we provide a performance and energy characterization of Pocketsphinx, a popular toolset for ASR that targets mobile devices. We identify the computation of the Gaussian Mixture Model (GMM) as the main bottleneck, consuming more than 80 percent of the execution time. The CPI stack analysis shows that branches and main memory accesses are the main performance limiting factors for GMM computation. We propose several software-level optimizations driven by the power/performance analysis. Unlike previous proposals that trade accuracy for performance by reducing the number of Gaussians evaluated, we maintain accuracy and improve performance by effectively using the underlying CPU microarchitecture. First, we use a refactored implementation of the innermost loop of the GMM evaluation code to ameliorate the impact of branches. Second, we exploit the vector unit available on most modern CPUs to boost GMM computation, introducing a novel memory layout for storing the means and variances of the Gaussians in order to maximize the effectiveness of vectorization. Third, we compute the Gaussians for multiple frames in parallel, so means and variances can be fetched once in the on-chip caches and reused across multiple frames, significantly reducing memory bandwidth usage. We evaluate our optimizations using both hardware counters on real CPUs and simulations. Our experimental results show that the proposed optimizations provide 2.68x speedup over the baseline Pocketsphinx decoder on a high-end Intel Skylake CPU, while achieving 61 percent energy savings. On a modern ARM Cortex-A57 mobile processor our techniques improve performance by 1.85x, while providing 59 percent energy savings without any loss in the accuracy of the ASR system.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A single-chip FPGA implementation of real-time adaptive background model

Author: Appiah Kofi
Hunter Andrew
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

This paper demonstrates the use of a single-chip FPGA for the extraction of highly accurate background models in real-time. The models are based on 24-bit RGB values and 8-bit grayscale intensity values. Three background models are presented, all using a camcorder, single FPGA chip, four blocks of RAM and a display unit. The architectures have been implemented and tested using a Panasonic NVDS60B digital video camera connected to a Celoxica RC300 Prototyping Platform with a Xilinx Virtex II XC2v6000 FPGA and 4 banks of onboard RAM. The novel FPGA architecture presented has the advantages of minimizing latency and the movement of large datasets, by conducting time critical processes on BlockRAM. The systems operate at clock rates ranging from 57MHz to 65MHz and are capable of performing pre-processing functions like temporal low-pass filtering on standard frame size of 640X480 pixels at up to 210 frames per second

University of Lincoln Institutional Repository

Nottingham Trent Institutional Repository (IRep)

Intelligent intrusion detection in low power IoTs

Author: Ahmadinia Ali
Javed Abbas
Larijani Hadi
Saeed Ahmed
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2016
Field of study

ResearchOnline@GCU