Capability Memory Protection for Embedded Systems
This dissertation explores the use of capability security hardware and software in real-time and latency-sensitive embedded systems, both to address existing memory safety and task isolation problems and to provide new means of designing secure and scalable real-time systems.
In addition, this dissertation looks into how practical and high-performance temporal memory safety can be achieved under a capability architecture.
State-of-the-art memory protection schemes for embedded systems typically present limited and inflexible solutions to memory protection and isolation, and fail to scale as embedded devices become more capable and ubiquitous.
I investigate whether a capability architecture is able to provide new angles to address memory safety issues in an embedded scenario.
Previous CHERI capability research has focused on 64-bit architectures running UNIX operating systems, and that work does not translate directly to typical 32-bit embedded processors with low-latency and real-time requirements.
I propose and implement the CHERI CC-64 encoding and the CHERI-64 coprocessor to construct a feasible capability-enabled 32-bit CPU.
In addition, I implement a real-time kernel for embedded systems atop CHERI-64.
On this hardware and software platform, I focus on exploring scalable task isolation and fine-grained memory protection enabled by capabilities in a single flat physical address space, which are otherwise difficult or impossible to achieve via state-of-the-art approaches.
I then evaluate the hardware implementation, the software run-time overhead, and the real-time performance.
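To make capability-based protection in a single flat physical address space concrete, here is a minimal Python model. It assumes an uncompressed capability with an explicit base, length, permission set and validity tag; it is not the CHERI CC-64 encoding, which packs compressed bounds into 64 bits, and the `Capability` class and its methods are hypothetical. The model shows the two properties the protection relies on: every access is checked against bounds and permissions, and derived capabilities can only shrink, so a task holding a narrowed capability cannot reach memory outside it.

```python
from dataclasses import dataclass


class CapabilityFault(Exception):
    """Raised when an access violates a capability's bounds or permissions."""


@dataclass(frozen=True)
class Capability:
    # Hypothetical uncompressed model; real CHERI-64 capabilities compress these fields.
    base: int
    length: int
    perms: frozenset  # e.g. {"load", "store"}
    tag: bool = True  # validity bit; untagged capabilities cannot be dereferenced

    def check(self, addr: int, size: int, perm: str) -> None:
        """Fault unless [addr, addr + size) lies within bounds and perm is held."""
        if not self.tag:
            raise CapabilityFault("untagged capability")
        if perm not in self.perms:
            raise CapabilityFault(f"missing permission {perm!r}")
        if addr < self.base or addr + size > self.base + self.length:
            raise CapabilityFault(f"access {addr:#x}+{size} out of bounds")

    def restrict(self, base: int, length: int, perms: frozenset) -> "Capability":
        """Derive a narrower capability; widening bounds or adding permissions faults."""
        if base < self.base or base + length > self.base + self.length:
            raise CapabilityFault("cannot widen bounds")
        if not perms <= self.perms:
            raise CapabilityFault("cannot add permissions")
        return Capability(base, length, frozenset(perms), self.tag)


# A kernel-held capability covering a 64 KiB region of the flat physical address space.
root = Capability(0x2000_0000, 0x1_0000, frozenset({"load", "store"}))
# A task receives read-only access to a 256-byte buffer inside that region.
task_cap = root.restrict(0x2000_0100, 0x100, frozenset({"load"}))
task_cap.check(0x2000_0100, 4, "load")  # in bounds and permitted
```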
Even with capability support, CHERI-64 as well as other CHERI processors still expose major attack surfaces through temporal vulnerabilities like use-after-free.
A naive approach that sweeps memory to invalidate stale capabilities is inefficient and incurs significant cycle overhead and DRAM traffic.
To make sweeping revocation feasible, I introduce new architectural mechanisms and micro-architectural optimisations to substantially reduce the cost of memory sweeping and capability revocation.
Another factor in the cost is the frequency of memory sweeping.
I explore tradeoffs of memory allocator designs that use quarantine buffers and shadow space tags to prevent frequent unnecessary sweeping.
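How quarantine buffers and shadow-space tags reduce the number of sweeps can be sketched as a toy allocator. This is a minimal Python model under loud assumptions: capabilities are `(base, length)` tuples stored in a list standing in for memory, the shadow space is a set of painted granule numbers rather than a real bitmap, and `QuarantineAllocator`, `GRANULE` and `quarantine_limit` are hypothetical names. Freed chunks are painted and held back instead of being reused; only when the quarantine reaches its limit does a single sweep clear every capability whose base lies in a painted granule, after which the chunks can be reused safely. The cost of one sweep is thus amortized over many frees.

```python
GRANULE = 16  # hypothetical shadow-space granularity in bytes


class QuarantineAllocator:
    """Toy allocator with a quarantine buffer and sweeping revocation (illustrative only)."""

    def __init__(self, memory: list, quarantine_limit: int):
        self.memory = memory            # slots that may hold capabilities (base, length)
        self.quarantine_limit = quarantine_limit
        self.quarantine = []            # freed chunks waiting for the next sweep
        self.quarantined_bytes = 0
        self.shadow = set()             # granule numbers painted as "being revoked"
        self.free_list = []             # chunks that are safe to reuse
        self.next_addr = 0x1000
        self.sweeps = 0

    def malloc(self, length: int) -> tuple:
        for i, (base, size) in enumerate(self.free_list):
            if size >= length:
                del self.free_list[i]
                return (base, length)
        base = self.next_addr
        self.next_addr += -(-length // GRANULE) * GRANULE  # round up to whole granules
        return (base, length)

    def free(self, cap: tuple) -> None:
        base, length = cap
        for addr in range(base, base + length, GRANULE):
            self.shadow.add(addr // GRANULE)  # paint the chunk in the shadow space
        self.quarantine.append(cap)
        self.quarantined_bytes += length
        if self.quarantined_bytes >= self.quarantine_limit:
            self.sweep()

    def sweep(self) -> None:
        # One pass over memory: a capability whose base is in a painted granule is stale.
        for i, slot in enumerate(self.memory):
            if slot is not None and slot[0] // GRANULE in self.shadow:
                self.memory[i] = None  # models clearing the capability's tag
        self.free_list.extend(self.quarantine)
        self.quarantine.clear()
        self.quarantined_bytes = 0
        self.shadow.clear()
        self.sweeps += 1


memory = [None] * 8
alloc = QuarantineAllocator(memory, quarantine_limit=64)
a = alloc.malloc(32)
memory[0] = a                    # some object keeps a capability to `a`
alloc.free(a)                    # quarantined: memory[0] still holds a stale capability
b = alloc.malloc(32)             # quarantined memory is not reused, so b cannot alias a
stale_before_sweep = memory[0]
alloc.free(b)                    # quarantine reaches its limit: one sweep revokes the stale copy
```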
The evaluation shows that the optimisations and new allocator designs reduce the cost of capability sweeping revocation by orders of magnitude, making temporal safety under CHERI already practical for most applications.
CSC Cambridge Scholarship
TaintHLS: High-Level Synthesis For Dynamic Information Flow Tracking
Dynamic Information Flow Tracking (DIFT) is a technique for tracking potential security vulnerabilities in software and hardware systems at run time. Untrusted data are marked with tags (tainted); the tags are propagated through the system, and uses of tainted data are analyzed to prevent unsafe operations. DIFT is not supported in heterogeneous systems, especially in hardware accelerators. Currently, DIFT logic is generated manually and integrated into the accelerators, an error-prone process that can hamper the identification of security violations in heterogeneous systems. We present TaintHLS, which automatically generates a micro-architecture supporting the baseline operations together with a shadow micro-architecture for intrinsic DIFT support in hardware accelerators, while providing variable granularity of taint tags. TaintHLS offers a companion high-level synthesis (HLS) methodology to automatically generate such DIFT-enabled accelerators from a high-level specification. We extended a state-of-the-art HLS tool to generate DIFT-enhanced accelerators and demonstrated the approach on numerous benchmarks. The DIFT-enabled accelerators have negligible performance overhead and no more than 30% hardware overhead.
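The shadow logic that DIFT adds alongside the baseline datapath can be pictured with a small Python model. It assumes a simple, conservative policy: a result is tainted if an operand that can influence it is tainted, and tainted values must not reach a sensitive sink such as a memory address. The `Tainted` type and the `make`, `add`, `bitwise_and` and `check_sink` functions are hypothetical and do not describe how TaintHLS itself encodes tags. The `granularity` argument mimics variable tag granularity: one tag per 32-bit word, or one per byte.

```python
from dataclasses import dataclass


class TaintViolation(Exception):
    """Raised when tainted data reaches a sensitive sink."""


@dataclass(frozen=True)
class Tainted:
    value: int   # baseline datapath: a 32-bit value
    tags: tuple  # shadow datapath: one tag per word, or per byte with tags[0] = low byte


def make(value: int, tainted: bool, granularity: str = "word") -> Tainted:
    """Wrap a 32-bit value with one word tag or four byte tags."""
    n = 4 if granularity == "byte" else 1
    return Tainted(value & 0xFFFF_FFFF, (tainted,) * n)


def add(a: Tainted, b: Tainted) -> Tainted:
    # Carries move information between bytes, so any tainted unit taints the whole result.
    # Both operands are assumed to use the same tag granularity.
    t = any(a.tags) or any(b.tags)
    return Tainted((a.value + b.value) & 0xFFFF_FFFF, (t,) * len(a.tags))


def bitwise_and(a: Tainted, b: Tainted) -> Tainted:
    # No data moves between bytes, so each result unit inherits only its own operands' tags.
    tags = tuple(x or y for x, y in zip(a.tags, b.tags))
    return Tainted(a.value & b.value, tags)


def check_sink(x: Tainted) -> int:
    """A sensitive use (e.g. a memory address) must not depend on untrusted data."""
    if any(x.tags):
        raise TaintViolation(f"tainted value {x.value:#x} reached a sink")
    return x.value


user = make(0x41, tainted=True)              # untrusted input
base = make(0x1000, tainted=False)
addr = add(base, user)                       # tainted, so check_sink(addr) raises

u = Tainted(0x0000_00FF, (True, False, False, False))  # only the low byte is untrusted
m = make(0xFFFF_FF00, tainted=False, granularity="byte")
masked = bitwise_and(u, m)                   # byte tags confine the taint to byte 0
```

Finer tags cost more shadow state but avoid over-tainting: with a single word tag, `masked` would be tainted as a whole.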
Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis
High-Level Synthesis (HLS) for FPGAs is gaining popularity and is increasingly used to handle complex systems with multiple integrated components. To increase performance and efficiency, HLS flows now adopt several advanced optimization techniques. Aggressive optimizations and system-level integration can introduce bugs that are only observable on-chip, so debugging support for circuits generated with HLS is receiving considerable attention. Among the data that can be collected on chip for debugging, one of the most important is the state of the Finite State Machines (FSMs) controlling the components of the circuit.
However, tracing this state during execution usually requires a large amount of memory. This work proposes an approach that takes advantage of HLS information and of the structure of the FSM to compress control-flow traces and to integrate optimized components for on-chip debugging. The generated checkers analyze the FSM execution on the fly, automatically notifying when a bug is detected, localizing it, and providing data about its cause. The traces are compressed using a software profiling technique called Efficient Path Profiling (EPP), adapted for debugging hardware accelerators generated with HLS. With this technique, the memory used to store control-flow traces can be reduced by up to two orders of magnitude compared to the state of the art.
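Efficient Path Profiling is the Ball-Larus technique: each edge of an acyclic control-flow graph receives an integer increment such that summing the increments along any entry-to-exit path gives a unique ID between 0 and N-1, where N is the number of paths. Applied to an FSM, a whole run of states can be stored as one integer. The Python sketch below assumes the FSM's state graph is already acyclic (Ball-Larus handles loops by cutting back edges, which is omitted here); the state names are made up, and the on-chip encoding used in the paper may differ.

```python
def ball_larus(succ: dict, entry: str):
    """Return (num_paths, edge_val) for an acyclic graph given as {node: [successors]}."""
    num_paths, edge_val = {}, {}

    def visit(v):
        if v in num_paths:            # memoized: each node is numbered once
            return num_paths[v]
        if not succ.get(v):           # exit node: exactly one path (the empty one)
            num_paths[v] = 1
            return 1
        total = 0
        for w in succ[v]:
            edge_val[(v, w)] = total  # paths through earlier successors get lower IDs
            total += visit(w)
        num_paths[v] = total
        return total

    visit(entry)
    return num_paths, edge_val


def path_id(path, edge_val) -> int:
    """Compress a state sequence into one integer by summing edge increments."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))


def decode(pid, succ, edge_val, entry):
    """Rebuild the state sequence from its path ID, e.g. in a host-side debugging tool."""
    path, v = [entry], entry
    while succ.get(v):
        # Take the outgoing edge with the largest increment that does not exceed pid.
        w = max((s for s in succ[v] if edge_val[(v, s)] <= pid),
                key=lambda s: edge_val[(v, s)])
        pid -= edge_val[(v, w)]
        path.append(w)
        v = w
    return path


# Hypothetical acyclic FSM state graph.
succ = {"S0": ["S1", "S2"], "S1": ["S3"], "S2": ["S3", "S4"], "S3": ["S4"], "S4": []}
num_paths, edge_val = ball_larus(succ, "S0")
```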
Exploitation from Malicious PCI Express Peripherals
The thesis of this dissertation is that, contrary to widespread belief in the security community, systems are still vulnerable to attacks from malicious peripherals delivered over the PCI Express (PCIe) protocol.
Malicious peripherals can be plugged directly into internal PCIe slots, or connected via an external Thunderbolt connection.
To prove this thesis, we designed and built a new PCIe attack platform.
We discovered that a simple platform was insufficient to carry out complex attacks, so we created the first PCIe attack platform that runs a full, conventional OS.
To allow us to conduct attacks against higher-level OS functionality built on PCIe, we made the attack platform emulate in detail the behaviour of an Intel 82574L Network Interface Controller (NIC), using a device model extracted from the QEMU emulator.
We discovered a number of vulnerabilities in the PCIe protocol itself, and in the way that modern OSs use the defence mechanisms it provides.
The principal defence mechanism provided is the Input/Output Memory Management Unit (IOMMU).
The IOMMU remaps the address space used by peripherals in 4 KiB chunks, and can prevent access to areas of address space that a peripheral should not be able to access.
We found that, contrary to belief in the security community, the IOMMUs in modern systems were not designed to protect against attacks from malicious peripherals, but to allow virtual machines direct access to real hardware.
We discovered that use of the IOMMU is patchy even in modern operating systems.
Windows effectively does not use the IOMMU at all; macOS opens windows that are shared by all devices; Linux and FreeBSD map windows into host memory separately for each device, but only if poorly documented boot flags are used.
These OSs make no effort to ensure that only data that should be visible to the devices is in the mapped windows.
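Because the IOMMU maps device-visible memory in 4 KiB pages, mapping even a small buffer for DMA exposes the remainder of every page it touches. The short Python calculation below makes this concrete; the buffer address is invented and `iommu_exposure` is a hypothetical helper.

```python
PAGE = 4096  # IOMMU mapping granularity (4 KiB)


def iommu_exposure(addr: int, length: int) -> tuple:
    """Return (start, end_exclusive, extra_bytes) exposed when mapping [addr, addr + length)."""
    start = addr & ~(PAGE - 1)                      # round down to a page boundary
    end = (addr + length + PAGE - 1) & ~(PAGE - 1)  # round up to a page boundary
    return start, end, (end - start) - length


# A 1500-byte packet buffer that straddles a page boundary.
start, end, extra = iommu_exposure(0x7F3A_1F80, 1500)
```

For this buffer the device gains access to 8 KiB, of which 6,692 bytes are whatever else the OS happened to place on those two pages.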
We created novel attacks that subverted control flow and read private data against systems running macOS, Linux and FreeBSD with the highest level of relevant protection enabled.
These represent the first use of the relevant exploits in each case.
In the final part of this thesis, we evaluate the suitability of a number of proposed general-purpose and specific mitigations against DMA attacks, and make a number of recommendations about future directions in IOMMU software and hardware.
EPSRC and ARM iCASE Award
Compiling Irregular Software to Specialized Hardware
High-level synthesis (HLS) has simplified the design process for energy-efficient hardware accelerators: a designer specifies an accelerator’s behavior in a “high-level” language, and a toolchain synthesizes register-transfer level (RTL) code from this specification. Many HLS systems produce efficient hardware designs for regular algorithms (i.e., those with limited conditionals or regular memory access patterns), but most struggle with irregular algorithms that rely on dynamic, data-dependent memory access patterns (e.g., traversing pointer-based structures like lists, trees, or graphs). HLS tools typically provide imperative, side-effectful languages to the designer, which makes it difficult to correctly specify and optimize complex, memory-bound applications.
In this dissertation, I present an alternative HLS methodology that leverages properties of functional languages to synthesize hardware for irregular algorithms. The main contribution is an optimizing compiler that translates pure functional programs into modular, parallel dataflow networks in hardware. I give an overview of this compiler, explain how its source and target together enable parallelism in the face of irregularity, and present two specific optimizations that further exploit this parallelism. Taken together, this dissertation verifies my thesis that pure functional programs exhibiting irregular memory access patterns can be compiled into specialized hardware and optimized for parallelism.
This work extends the scope of modern HLS toolchains. By relying on properties of pure functional languages, our compiler can synthesize hardware from programs containing constructs that commercial HLS tools prohibit, e.g., recursive functions and dynamic memory allocation. Hardware designers may thus use our compiler in conjunction with existing HLS systems to accelerate a wider class of algorithms than before.
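For readers unfamiliar with the term, the following shows the kind of irregular program the abstract has in mind: pure, recursive functions over a pointer-based tree, whose memory accesses depend on the data. It is written in Python purely for illustration (the dissertation's compiler takes a functional language as input, not Python), and the `Node` type and the `insert` and `tree_sum` functions are hypothetical. Recursion and the allocation of fresh nodes in `insert` are the constructs the abstract says commercial HLS tools prohibit; purity means the two recursive calls in `tree_sum` share no mutable state, which is what lets a dataflow implementation evaluate them in parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(t: Optional[Node], k: int) -> Node:
    """Pure insertion: allocates new nodes along the path instead of mutating the tree."""
    if t is None:
        return Node(k)
    if k < t.key:
        return Node(t.key, insert(t.left, k), t.right)
    return Node(t.key, t.left, insert(t.right, k))


def tree_sum(t: Optional[Node]) -> int:
    """Data-dependent traversal; the two subtree sums are independent of each other."""
    if t is None:
        return 0
    return t.key + tree_sum(t.left) + tree_sum(t.right)


tree = None
for k in [5, 2, 8, 1, 9]:
    tree = insert(tree, k)
```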
Hardware Techniques for Countering Memory Corruption Attacks
Ph.D. dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2017. Advisor: Yunheung Paek.
Many programs are written in unsafe languages like C or C++, mainly for their performance advantages, and most of them are too complex to be implemented without errors. For these two reasons, such programs inevitably contain vulnerabilities that attackers can exploit to access their memory arbitrarily. Unfortunately, eliminating these vulnerabilities is generally considered impossible: although programs can be verified to be free of some vulnerabilities, only small programs can be analyzed statically, and not all vulnerabilities can be found and fixed. To address the problem of vulnerable programs, researchers have proposed a number of mechanisms that mitigate attacks exploiting these vulnerabilities.
This thesis presents novel hardware-assisted mechanisms against attacks that exploit these vulnerabilities, known as memory corruption attacks.
The first half discusses mechanisms against attacks on OS kernels. In most computer systems, the OS kernel has full control. Every program running on a system must call the kernel to access or acquire system resources such as the network, the file system, or even memory. This makes OS kernels an attractive target for attackers: by taking control of the kernel, they can affect every program running on the system.
A difficulty in devising mechanisms to mitigate attacks on OS kernels is that the kernels themselves control the system. Any mechanism that relies on the kernel can be nullified by an attacker who controls it. This has led to research on mechanisms that do not rely on the OS kernel itself. This thesis presents state-of-the-art mechanisms that use physically isolated hardware components to avoid relying on the OS kernel. We designed and implemented a novel means for such mechanisms to collect kernel events efficiently and effectively, and used it to mitigate common types of attacks.
The second half presents hardware-assisted mechanisms against memory corruption attacks in general. Although many mechanisms have been proposed to mitigate memory corruption attacks, most of them are not practical. Some have limited backward compatibility, requiring existing programs to be modified to adopt them, and most are not efficient enough to be widely deployed.
This thesis aims to design practical mechanisms to mitigate memory corruption attacks and presents two such mechanisms. The first enables programs to isolate the data flow of sensitive data from that of other data. Such isolation makes it more difficult for attackers to corrupt sensitive data, because only vulnerabilities in the code blocks that access the sensitive data can be exploited to corrupt it. The second prevents attackers from building attacks reliably by randomizing the data space. Once a program adopts the mechanism, only memory accesses that comply with the results of static analysis complete correctly. Because attacks usually cause the victim program to violate these results, attacker-induced memory accesses store or load unpredictable values.
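A monitor built from physically isolated hardware can observe memory writes on the bus without any cooperation from the kernel. The Python sketch below models the snoop-based idea for immutable regions such as kernel text or the system call table: the monitor keeps a list of protected physical ranges and flags any write transaction that overlaps one. The region addresses and the `BusWrite` and `SnoopMonitor` names are hypothetical; a real monitor is hardware that filters bus traffic at line rate.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class BusWrite:
    addr: int  # physical address of the write transaction
    size: int  # number of bytes written


class SnoopMonitor:
    """Flags bus writes that touch protected immutable regions (illustrative model)."""

    def __init__(self, regions):
        # Sorted, non-overlapping (start, end_exclusive) physical ranges.
        self.regions = sorted(regions)
        self.starts = [s for s, _ in self.regions]
        self.alerts = []

    def observe(self, w: BusWrite) -> None:
        last = w.addr + w.size - 1
        # Only the last region starting at or before the write's last byte can overlap it.
        i = bisect_right(self.starts, last) - 1
        if i >= 0 and w.addr < self.regions[i][1]:
            self.alerts.append(w)


KERNEL_TEXT = (0x8000_8000, 0x8060_0000)     # hypothetical physical layout
SYSCALL_TABLE = (0x8070_0000, 0x8070_1000)
mon = SnoopMonitor([KERNEL_TEXT, SYSCALL_TABLE])
mon.observe(BusWrite(0x8100_0000, 8))        # ordinary data write: ignored
mon.observe(BusWrite(0x8070_0010, 4))        # syscall table entry overwritten: flagged
```

Such a monitor only sees writes that reach the bus; a write that stays in the CPU cache is invisible to it.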
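The data space randomization idea can be sketched in software. The sketch assumes the common XOR-masking formulation of DSR: static analysis partitions objects into equivalence classes, each class gets its own random key, and every load or store belonging to a class XORs the value with that class's key. The thesis implements this with new hardware instructions; the Python `DsrMemory` class, the class names and the keys here are hypothetical. An overflow that writes through one class's pointer into another class's object leaves a value encrypted under the wrong key, so the victim later reads an unpredictable value instead of the attacker's chosen one.

```python
import secrets


class DsrMemory:
    """Toy memory in which each equivalence class of objects is XOR-masked with its own key."""

    def __init__(self, size: int, classes):
        self.cells = [0] * size
        self.keys = {}
        for c in classes:
            # Draw a random, nonzero 32-bit key distinct from the other classes' keys.
            k = 0
            while k == 0 or k in self.keys.values():
                k = secrets.randbits(32)
            self.keys[c] = k

    def store(self, cls: str, addr: int, value: int) -> None:
        """A store belonging to class `cls` encrypts with that class's key."""
        self.cells[addr] = (value ^ self.keys[cls]) & 0xFFFF_FFFF

    def load(self, cls: str, addr: int) -> int:
        """A load belonging to class `cls` decrypts with that class's key."""
        return self.cells[addr] ^ self.keys[cls]


mem = DsrMemory(8, ["buf", "fnptr"])
mem.store("fnptr", 4, 0x0040_1000)   # legitimate store of a code pointer in cell 4
mem.store("buf", 4, 0xDEAD_BEEF)     # buggy write through the buffer's pointer overruns into cell 4
corrupted = mem.load("fnptr", 4)     # the victim decrypts with the fnptr key
```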
In summary, this thesis presents four mechanisms to mitigate memory corruption attacks on either OS kernels or user-level programs.
1 Introduction
1.1 Hardware-based Monitors for OS Kernels
1.2 Hardware-assisted Enforcement of Data-Flow Integrity
1.3 Outline
2 Snoop-Based Kernel Integrity Monitors
2.1 Motivations
2.2 Assumptions and Threat Model
2.2.1 Assumptions
2.2.2 Threat Model
2.3 Transient Attacks
2.3.1 Definition
2.3.2 Difficulties of Detecting Transient Attacks
2.4 Vigilare System Requirements
2.4.1 Selective Bus-traffic Collection and Sufficient Computing Power
2.4.2 Handling Bursty Traffic
2.4.3 Integrity of the Vigilare System
2.5 Detection of the Attacks on Immutable Regions
2.5.1 Immutable Regions of Linux Kernel
2.5.2 Physical Addresses of Immutable Regions
2.5.3 SnoopMon
2.5.4 SnoopMon-A
2.5.5 SnoopMon-S
2.6 Detection of the Attacks on Mutable Regions
2.6.1 Attacks on Mutable Regions
2.6.2 KI-Mon
2.6.3 Detection Mechanisms
2.7 Protection of the Kernel from Permanent Damage
2.8 Evaluation
2.8.1 Comparison with Snapshot-based Monitoring
2.8.2 Effectiveness of Snoop-based Monitoring
2.8.3 Discussions
2.9 Limitations and Future Work
2.9.1 Relocation Attack
2.9.2 Code Reuse Attacks
2.9.3 Privilege Escalation
2.9.4 Cache Resident Attacks
2.10 Related Work
2.10.1 Hypervisor-based Approaches
2.10.2 Hardware-based Approaches
2.10.3 Snooping Bus Traffic
2.11 Summary
3 Protection of OS Kernels from Code-Injection and Code-Reuse Attacks
3.1 Motivations
3.2 Problem Definition
3.2.1 Threat Model
3.2.2 Assumptions
3.3 Code-Injection Attacks
3.3.1 Architectural Supports
3.3.2 Detection Mechanism
3.4 ROP Attacks
3.4.1 Branch Address Classification
3.4.2 Call Site Emission
3.4.3 Protection of Shadow Stacks
3.4.4 Context Switches
3.4.5 Shadow Stack Creation
3.5 Evaluation
3.5.1 Implementation Details
3.5.2 Performance
3.5.3 Security
3.6 Limitations and Future Work
3.6.1 Bypassing the Scheme
3.6.2 Kernel Modules
3.7 Related Work
3.7.1 Page Table Protection
3.7.2 Hypervisor-based Approaches
3.7.3 Snapshot Analyses
3.7.4 Bus Snooping
3.7.5 Control-Flow Integrity for Privileged Software Layer
3.7.6 Software Diversification
3.7.7 Formally Verified Microkernels
3.7.8 Debug Interfaces
3.7.9 Architectural Supports for Shadow Stacks
3.8 Summary
4 Data-Flow Isolation
4.1 Motivations
4.2 Threat Model and Assumptions
4.3 Background and Related Work
4.3.1 Data-flow Integrity
4.3.2 Tag-based Memory Protection
4.3.3 Tag-based Hardware
4.3.4 Memory Safety
4.4 HDFI Architecture
4.4.1 ISA Extension
4.4.2 Memory Tagger
4.4.3 Optimizations
4.4.4 Protecting the Tag Tables
4.5 Implementation
4.5.1 Hardware
4.5.2 Software Support
4.6 Evaluation
4.6.1 Verification
4.6.2 Performance Overhead
4.7 Limitations and Future Work
4.8 Summary
5 Data Space Randomization
5.1 Motivations
5.2 Background
5.2.1 Mitigation with DSR
5.2.2 Limitations of Existing DSR Schemes
5.3 Threat Model
5.4 Design
5.4.1 Hardware Overview
5.4.2 Hardware Initialization
5.4.3 New Instructions
5.4.4 DSR Overview
5.5 Prototype Implementation
5.5.1 Instruction Encoding
5.5.2 Processor Pipeline
5.5.3 DSR Prototype
5.6 Security Evaluation
5.6.1 Real-World Protection
5.7 Performance Evaluation
5.8 Limitations
5.9 Future Work
5.10 Related Work
5.11 Summary
6 Conclusion
7 Bibliography
Abstract (in Korean)
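The data-flow isolation mechanism (HDFI in Chapter 4) relies on tagged memory. The minimal Python model below assumes one tag bit per 64-bit word: a special store sets the tag of the word it writes, ordinary stores clear it, and a checked load faults if the tag does not match the expected value. The method names `store`, `store_sensitive` and `load_checked` are hypothetical stand-ins for the ISA extension, not its actual mnemonics. An ordinary write through an overflowed buffer therefore cannot forge a sensitive value: it clears the tag, and the next checked load catches it.

```python
class TagMismatch(Exception):
    """Raised when a checked load sees the wrong tag."""


class TaggedMemory:
    WORD = 8  # assumed: one tag bit per 64-bit word

    def __init__(self, words: int):
        self.data = [0] * words
        self.tags = [0] * words

    def _index(self, addr: int) -> int:
        if addr % self.WORD:
            raise ValueError("unaligned access")
        return addr // self.WORD

    def store(self, addr: int, value: int) -> None:
        """Ordinary store: writes the data and clears the word's tag."""
        i = self._index(addr)
        self.data[i], self.tags[i] = value, 0

    def store_sensitive(self, addr: int, value: int) -> None:
        """Store issued only by code allowed to write sensitive data: sets the tag."""
        i = self._index(addr)
        self.data[i], self.tags[i] = value, 1

    def load_checked(self, addr: int, expected_tag: int = 1) -> int:
        """Load that faults unless the word's tag matches the expected one."""
        i = self._index(addr)
        if self.tags[i] != expected_tag:
            raise TagMismatch(f"tag {self.tags[i]} at {addr:#x}, expected {expected_tag}")
        return self.data[i]


mem = TaggedMemory(16)
mem.store_sensitive(0x40, 0x0040_1000)   # e.g. a function pointer written by trusted code
before = mem.load_checked(0x40)          # tag is 1, so the checked load succeeds
mem.store(0x40, 0x4141_4141)             # overflow through an ordinary store clears the tag
```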
Extraction of Host Internal Information for External Hardware Security Monitors
Ph.D. dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2016. Advisor: Yunheung Paek.
Defending electronic devices against a variety of attacks is a daunting task. Many researchers have endeavored to address this issue by proposing security solutions that attain a high level of security while minimizing the performance overhead introduced to the system. Among them, hardware-based security solutions have been noted for their high performance compared to their software-based counterparts. However, these mechanisms have rarely been adopted in the market. This may be attributed to the fact that most solutions require non-negligible modifications to the host architecture internals and thus would substantially increase design time and manufacturing cost. To address this problem, hardware-based external monitoring has recently been proposed. The crux of this solution is that the external monitor, located outside the host core and connected to the host via a standard bus interface, can efficiently conduct time-consuming monitoring tasks on behalf of the host without requiring any alteration to the host internals. However, these approaches either cannot handle various security problems or incur non-trivial performance overhead: being externally placed and lacking dedicated communication channels, the hardware monitor has limited access to the information produced by the host core, so the system may be forced to use memory regions or other shared hardware resources to explicitly transfer that information from the host to the monitor. In this thesis, we propose a security solution that can carry out more complicated security tasks with low performance overhead while keeping the host's internal architecture intact. This is achieved by using an existing standard debug interface, readily available in numerous modern processors, to connect our security monitor to the host processor. To show the validity of our approach and explore the implications of using the debug interface for security monitoring, we present three security monitoring systems, each addressing one of three well-known security issues: defending against kernel rootkits, tracking information flow, and defending against code-reuse attacks (CRAs). The experimental results show that, when implemented on an FPGA prototyping board, our monitoring solutions successfully detect the attack samples (i.e., data leakage attacks and CRAs). More importantly, our systems attain significantly lower performance overhead than previously proposed security monitoring solutions. The experiments also reveal that the area overhead of the hardware is acceptably small compared to the typical size of today's mobile processors.
Chapter 1. Introduction
Chapter 2. Background and Related Work
2.1 Background
2.1.1 Core Debug Interface
2.2 Related Work
2.2.1 Software-based Monitoring Solutions
2.2.2 Hardware-based Monitoring with Invasive Modification
2.2.3 Hardware-based Monitoring with Minimal Modification
2.2.4 Hardware-based Kernel Integrity Monitors
2.2.5 Utilizing Debug Interface
Chapter 3. Monitoring the Integrity of OS Kernels with Data-Flow Information
3.1 Introduction
3.2 Motivational Example
3.3 Assumptions and Threat Models
3.4 The Baseline System
3.4.1 The Overall System Design
3.4.2 Periodic Cache Flush for Cache Resident Attacks
3.5 Extrax Design
3.5.1 Address Translation Unit
3.5.2 Early Stage Filter
3.6 Experimental Results
3.6.1 Prototype System
3.6.2 Security Evaluation
3.6.3 Performance Analysis
3.6.4 Power Consumption
3.7 Limitation and Future Work
3.8 Conclusion
Chapter 4. Monitoring Dynamic Information Flow using Control-Flow/Data-Flow Information
4.1 Introduction
4.2 DIFT Process with an External Hardware Engine
4.3 Building a DIFT Engine for CDI
4.3.1 Components of the DIFT Engine
4.3.2 Tag Propagation Unit
4.4 Experiment
4.4.1 Security Evaluation
4.4.2 Performance Evaluation
4.5 Conclusion
Chapter 5. Monitoring ROP/JOP Attacks using Control-Flow Information
5.1 Introduction
5.2 Background and Assumptions
5.2.1 Background
5.2.2 Assumptions and Threat Model
5.3 Overall System Architecture
5.3.1 SoC Prototype Overview
5.3.2 CRA Detection Process
5.4 Implementation Details
5.4.1 Binary Instrumentation
5.4.2 Hardware Architectures
5.5 Experimental Results
5.6 Conclusion
Chapter 6. Conclusion
Bibliography
Abstract (in Korean)
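The ROP/JOP monitor in Chapter 5 consumes control-flow information that the debug interface exposes without modifying the host core. One classic check such a monitor can run over that stream is a shadow stack, which catches corrupted return addresses (ROP); JOP needs additional checks on indirect jumps. The Python sketch assumes the trace has already been decoded into `(kind, source, target)` events and that call instructions are 4 bytes long; the event format and the `ShadowStackMonitor` name are hypothetical, and the thesis' actual detection logic, which also involves binary instrumentation, may differ.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, int, int]  # (kind, source address, target address)


class ShadowStackMonitor:
    """Checks decoded call/return trace events against a shadow stack (illustrative)."""

    def __init__(self, call_size: int = 4):
        self.call_size = call_size  # assumed fixed-length call instruction
        self.stack: List[int] = []
        self.violations: List[Event] = []

    def feed(self, events: Iterable[Event]) -> None:
        for kind, src, dst in events:
            if kind == "call":
                # The legitimate return address is the instruction after the call.
                self.stack.append(src + self.call_size)
            elif kind == "ret":
                expected = self.stack.pop() if self.stack else None
                if dst != expected:
                    self.violations.append((kind, src, dst))


trace = [
    ("call", 0x1000, 0x2000),  # main calls f
    ("call", 0x2010, 0x3000),  # f calls g
    ("ret", 0x3020, 0x2014),   # g returns normally into f
    ("ret", 0x2040, 0x5678),   # f's return address was overwritten: jumps to a gadget
]
mon = ShadowStackMonitor()
mon.feed(trace)
```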
Private and Public-Key Side-Channel Threats Against Hardware Accelerated Cryptosystems
Modern side-channel attacks (SCAs) can reveal sensitive data from unprotected hardware implementations of cryptographic accelerators, whether they implement private-key or public-key systems. These include, but are not limited to, symmetric private-key encryption using AES-128, AES-192 or AES-256, and public-key cryptosystems based on elliptic curve cryptography (ECC). Traditionally, scalar point (SP) operations are designed for speed at any cost to reduce point multiplication latency, and the majority of high-speed architectures for contemporary elliptic curve protocols rely on SP algorithms that are not side-channel secure. This thesis delivers the novel design, analysis, and successful results of a custom differential power analysis (DPA) attack on AES-128. The resulting SCA can break any 16-byte master key the cipher uses, and its direct applications to public-key cryptosystems will become clear. Further, the architecture of an SCA-resistant scalar point algorithm will be constructed, accompanied by an implementation of an optimized serial multiplier. The optimized hardware design of the multiplier is highly modular and can use either of the NIST-approved 233-bit and 283-bit Koblitz curves with a polynomial basis. The proposed architecture will be implemented on a Kintex-7 FPGA and later integrated with the ARM Cortex-A9 processor on the Zynq-7000 AP SoC (XC7Z045) for seamless data transfer and analysis of the vulnerabilities that SCAs can exploit.
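The serial multiplier works in the binary field GF(2^m) with a polynomial basis. A bit-serial, MSB-first (shift-and-add) multiplier consumes one bit of one operand per clock cycle, which keeps it compact in hardware. The Python model below is a sketch of that textbook algorithm, not the thesis' optimized design; field elements are integers whose bits are polynomial coefficients. It uses the NIST reduction polynomials for the 233-bit and 283-bit Koblitz curves (x^233 + x^74 + 1 and x^283 + x^12 + x^7 + x^5 + 1), and the same function reproduces the worked multiplication example from FIPS 197 in the AES field GF(2^8).

```python
K233_POLY = (1 << 233) | (1 << 74) | 1                             # x^233 + x^74 + 1
K283_POLY = (1 << 283) | (1 << 12) | (1 << 7) | (1 << 5) | 1       # x^283 + x^12 + x^7 + x^5 + 1
AES_POLY = 0x11B                                                   # x^8 + x^4 + x^3 + x + 1


def gf2m_mul_serial(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a*b in GF(2^m), one iteration per bit of b, as a bit-serial datapath would."""
    acc = 0
    for i in range(m - 1, -1, -1):  # scan b from its most significant bit
        acc <<= 1                   # acc = acc * x
        if acc >> m:                # degree reached m: reduce by the field polynomial
            acc ^= poly
        if (b >> i) & 1:            # add (XOR) a when the current bit of b is set
            acc ^= a
    return acc


# FIPS 197 example in GF(2^8): {57} * {83} = {c1}
aes_product = gf2m_mul_serial(0x57, 0x83, 8, AES_POLY)
```

Each loop iteration corresponds to one clock cycle of the serial datapath, so a K-233 multiplication takes 233 cycles; digit-serial variants trade area for fewer cycles by processing several bits of `b` at once.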