14,973 research outputs found
C-FLAT: Control-FLow ATtestation for Embedded Systems Software
Remote attestation is a crucial security service particularly relevant to
increasingly popular IoT (and other embedded) devices. It allows a trusted
party (verifier) to learn the state of a remote, and potentially
malware-infected, device (prover). Most existing approaches are static in
nature and only check whether benign software is initially loaded on the
prover. However, they are vulnerable to run-time attacks that hijack the
application's control or data flow, e.g., via return-oriented programming or
data-oriented exploits. As a concrete step towards more comprehensive run-time
remote attestation, we present the design and implementation of Control- FLow
ATtestation (C-FLAT) that enables remote attestation of an application's
control-flow path, without requiring the source code. We describe a full
prototype implementation of C-FLAT on Raspberry Pi using its ARM TrustZone
hardware security extensions. We evaluate C-FLAT's performance using a
real-world embedded (cyber-physical) application, and demonstrate its efficacy
against control-flow hijacking attacks.Comment: Extended version of article to appear in CCS '16 Proceedings of the
23rd ACM Conference on Computer and Communications Securit
Doctor of Philosophy
dissertationThe embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low-energy, high-performance, and short design times. The keys to achieving these conflicting constraints are specialization and maximally extracting available application parallelism. General purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICS) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, which range from application algorithms to architecture and ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a a Compiler to generate execution binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve energy-performance-area characteristics. The above process repeats automatically until the user-specified constraints are achieved. This removes or alleviates the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that represents a significant improvement in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs. The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by coscheduling the often conflicting constraints of data access, data movement, and computation through a flexible interconnect. This represents a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based 'interconnect-aware' scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition and wireless telephony. While CoGenE is well suited to applications that exhibit a streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches
Making an Embedded DBMS JIT-friendly
While database management systems (DBMSs) are highly optimized, interactions
across the boundary between the programming language (PL) and the DBMS are
costly, even for in-process embedded DBMSs. In this paper, we show that
programs that interact with the popular embedded DBMS SQLite can be
significantly optimized - by a factor of 3.4 in our benchmarks - by inlining
across the PL / DBMS boundary. We achieved this speed-up by replacing parts of
SQLite's C interpreter with RPython code and composing the resulting
meta-tracing virtual machine (VM) - called SQPyte - with the PyPy VM. SQPyte
does not compromise stand-alone SQL performance and is 2.2% faster than SQLite
on the widely used TPC-H benchmark suite.Comment: 24 pages, 18 figure
A Comparative Analysis of STM Approaches to Reduction Operations in Irregular Applications
As a recently consolidated paradigm for optimistic concurrency in modern multicore architectures, Transactional Memory (TM)
can help to the exploitation of parallelism in irregular applications when data dependence information is not available up to run-
time. This paper presents and discusses how to leverage TM to exploit parallelism in an important class of irregular applications, the class that exhibits irregular reduction patterns. In order to test and compare our techniques with other solutions, they were implemented in a software TM system called ReduxSTM, that acts as a proof of concept. Basically, ReduxSTM combines two major ideas: a sequential-equivalent ordering of transaction commits that assures the correct result, and an extension of the underlying TM privatization mechanism to reduce unnecessary overhead due to reduction memory updates as well as unnecesary aborts and rollbacks. A comparative study of STM solutions, including ReduxSTM, and other more classical approaches to the parallelization of reduction operations is presented in terms of time, memory and overhead.Universidad de Málaga. Campus de Excelencia Internacional AndalucĂa Tech
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
- …