6 research outputs found
Rowhammer Induced Intermittent Fault Attack on ECC-hardened memory
Fault attack is a class of active implementation based attacks which introduces controlled perturbations in the normal operation of a system to produce faulty outcomes. In case of ciphers, these faulty outcomes can lead to leakage of secret information, such as the secret key. The effectiveness and practicality of fault attacks largely depend on the underlying fault model and the type of fault induced. In this paper, we analyse the drawbacks of persistent fault model in case of error correction code (ECC) enabled systems. We further propose a novel fault attack called Intermittent Fault Attack which is well suited for ECC-enabled DRAM modules. We demonstrate the practicality of our attack model by inducing single bit faults using pinpointed Rowhammer technique in S-Boxes of block ciphers in an ECC protected system
A practical key-recovery attack on LWE-based key-encapsulation mechanism schemes using Rowhammer
Physical attacks are serious threats to cryptosystems deployed in the real
world. In this work, we propose a microarchitectural end-to-end attack
methodology on generic lattice-based post-quantum key encapsulation mechanisms
to recover the long-term secret key. Our attack targets a critical component of
a Fujisaki-Okamoto transform that is used in the construction of almost all
lattice-based key encapsulation mechanisms. We demonstrate our attack model on
practical schemes such as Kyber and Saber by using Rowhammer. We show that our
attack is highly practical and imposes little preconditions on the attacker to
succeed. As an additional contribution, we propose an improved version of the
plaintext checking oracle, which is used by almost all physical attack
strategies on lattice-based key-encapsulation mechanisms. Our improvement
reduces the number of queries to the plaintext checking oracle by as much as
for Saber and approximately for Kyber768. This can be of
independent interest and can also be used to reduce the complexity of other
attacks
A practical key-recovery attack on LWE-based key- encapsulation mechanism schemes using Rowhammer
Physical attacks are serious threats to cryptosystems deployed in the real world. In this work, we propose a microarchitectural end-to-end attack methodology on generic lattice-based post-quantum key encapsulation mechanisms to recover the long-term secret key. Our attack targets a critical component of a Fujisaki-Okamoto transform that is used in the construction of almost all lattice-based key encapsulation mechanisms. We demonstrate our attack model on practical schemes such as Kyber and Saber by using Rowhammer. We show that our attack is highly practical and imposes little preconditions on the attacker to succeed. As an additional contribution, we propose an improved version of the plaintext checking oracle, which is used by almost all physical attack strategies on lattice-based key-encapsulation mechanisms. Our improvement reduces the number of queries to the plaintext checking oracle by as much as 39% for Saber and approximately 23% for Kyber768. This can be of independent interest and can also be used to reduce the complexity of other attacks
Microkernel mechanisms for improving the trustworthiness of commodity hardware
The thesis presents microkernel-based software-implemented mechanisms for improving the trustworthiness of computer systems based on commercial off-the-shelf (COTS) hardware that can malfunction when the hardware is impacted by transient hardware faults. The hardware anomalies, if undetected, can cause data corruptions, system crashes, and security vulnerabilities, significantly undermining system dependability. Specifically, we adopt the single event upset (SEU) fault model and address transient CPU or memory faults.
We take advantage of the functional correctness and isolation guarantee provided by the formally verified seL4 microkernel and hardware redundancy provided by multicore processors, design the redundant co-execution (RCoE) architecture that replicates a whole software system (including the microkernel) onto different CPU cores, and implement two variants, loosely-coupled redundant co-execution (LC-RCoE) and closely-coupled redundant co-execution (CC-RCoE), for the ARM and x86 architectures. RCoE treats each replica of the software system as a state machine and ensures that
the replicas start from the same initial state, observe consistent inputs, perform equivalent state transitions, and thus produce consistent outputs during error-free executions. Compared with other software-based error detection approaches, the distinguishing feature of RCoE is that the microkernel and device drivers are also included in redundant co-execution, significantly extending the sphere of replication (SoR).
Based on RCoE, we introduce two kernel mechanisms, fingerprint validation and kernel barrier timeout, detecting fault-induced execution divergences between the replicated systems, with the flexibility of tuning the error detection latency and coverage. The kernel error-masking mechanisms built on RCoE enable downgrading from triple modular redundancy (TMR) to dual modular redundancy (DMR) without service interruption. We run synthetic benchmarks and system benchmarks to evaluate the performance overhead of the approach, observe that the overhead varies based on the characteristics of workloads and the variants (LC-RCoE or CC-RCoE), and conclude that the approach is applicable for real-world applications. The effectiveness of the error detection mechanisms is assessed by conducting fault injection campaigns on real hardware, and the results demonstrate compelling improvement
Resiliency Mechanisms for In-Memory Column Stores
The key objective of database systems is to reliably manage data, while high query throughput and low query latency are core requirements. To date, database research activities mostly concentrated on the second part. However, due to the constant shrinking of transistor feature sizes, integrated circuits become more and more unreliable and transient hardware errors in the form of multi-bit flips become more and more prominent. In a more recent study (2013), in a large high-performance cluster with around 8500 nodes, a failure rate of 40 FIT per DRAM device was measured. For their system, this means that every 10 hours there occurs a single- or multi-bit flip, which is unacceptably high for enterprise and HPC scenarios. Causes can be cosmic rays, heat, or electrical crosstalk, with the latter being exploited actively through the RowHammer attack. It was shown that memory cells are more prone to bit flips than logic gates and several surveys found multi-bit flip events in main memory modules of today's data centers. Due to the shift towards in-memory data management systems, where all business related data and query intermediate results are kept solely in fast main memory, such systems are in great danger to deliver corrupt results to their users. Hardware techniques can not be scaled to compensate the exponentially increasing error rates. In other domains, there is an increasing interest in software-based solutions to this problem, but these proposed methods come along with huge runtime and/or storage overheads. These are unacceptable for in-memory data management systems.
In this thesis, we investigate how to integrate bit flip detection mechanisms into in-memory data management systems. To achieve this goal, we first build an understanding of bit flip detection techniques and select two error codes, AN codes and XOR checksums, suitable to the requirements of in-memory data management systems. The most important requirement is effectiveness of the codes to detect bit flips. We meet this goal through AN codes, which exhibit better and adaptable error detection capabilities than those found in today's hardware. The second most important goal is efficiency in terms of coding latency. We meet this by introducing a fundamental performance improvements to AN codes, and by vectorizing both chosen codes' operations. We integrate bit flip detection mechanisms into the lowest storage layer and the query processing layer in such a way that the remaining data management system and the user can stay oblivious of any error detection. This includes both base columns and pointer-heavy index structures such as the ubiquitous B-Tree. Additionally, our approach allows adaptable, on-the-fly bit flip detection during query processing, with only very little impact on query latency. AN coding allows to recode intermediate results with virtually no performance penalty. We support our claims by providing exhaustive runtime and throughput measurements throughout the whole thesis and with an end-to-end evaluation using the Star Schema Benchmark. To the best of our knowledge, we are the first to present such holistic and fast bit flip detection in a large software infrastructure such as in-memory data management systems. Finally, most of the source code fragments used to obtain the results in this thesis are open source and freely available.:1 INTRODUCTION
1.1 Contributions of this Thesis
1.2 Outline
2 PROBLEM DESCRIPTION AND RELATED WORK
2.1 Reliable Data Management on Reliable Hardware
2.2 The Shift Towards Unreliable Hardware
2.3 Hardware-Based Mitigation of Bit Flips
2.4 Data Management System Requirements
2.5 Software-Based Techniques For Handling Bit Flips
2.5.1 Operating System-Level Techniques
2.5.2 Compiler-Level Techniques
2.5.3 Application-Level Techniques
2.6 Summary and Conclusions
3 ANALYSIS OF CODING TECHNIQUES
3.1 Selection of Error Codes
3.1.1 Hamming Coding
3.1.2 XOR Checksums
3.1.3 AN Coding
3.1.4 Summary and Conclusions
3.2 Probabilities of Silent Data Corruption
3.2.1 Probabilities of Hamming Codes
3.2.2 Probabilities of XOR Checksums
3.2.3 Probabilities of AN Codes
3.2.4 Concrete Error Models
3.2.5 Summary and Conclusions
3.3 Throughput Considerations
3.3.1 Test Systems Descriptions
3.3.2 Vectorizing Hamming Coding
3.3.3 Vectorizing XOR Checksums
3.3.4 Vectorizing AN Coding
3.3.5 Summary and Conclusions
3.4 Comparison of Error Codes
3.4.1 Effectiveness
3.4.2 Efficiency
3.4.3 Runtime Adaptability
3.5 Performance Optimizations for AN Coding
3.5.1 The Modular Multiplicative Inverse
3.5.2 Faster Softening
3.5.3 Faster Error Detection
3.5.4 Comparison to Original AN Coding
3.5.5 The Multiplicative Inverse Anomaly
3.6 Summary
4 BIT FLIP DETECTING STORAGE
4.1 Column Store Architecture
4.1.1 Logical Data Types
4.1.2 Storage Model
4.1.3 Data Representation
4.1.4 Data Layout
4.1.5 Tree Index Structures
4.1.6 Summary
4.2 Hardened Data Storage
4.2.1 Hardened Physical Data Types
4.2.2 Hardened Lightweight Compression
4.2.3 Hardened Data Layout
4.2.4 UDI Operations
4.2.5 Summary and Conclusions
4.3 Hardened Tree Index Structures
4.3.1 B-Tree Verification Techniques
4.3.2 Justification For Further Techniques
4.3.3 The Error Detecting B-Tree
4.4 Summary
5 BIT FLIP DETECTING QUERY PROCESSING
5.1 Column Store Query Processing
5.2 Bit Flip Detection Opportunities
5.2.1 Early Onetime Detection
5.2.2 Late Onetime Detection
5.2.3 Continuous Detection
5.2.4 Miscellaneous Processing Aspects
5.2.5 Summary and Conclusions
5.3 Hardened Intermediate Results
5.3.1 Materialization of Hardened Intermediates
5.3.2 Hardened Bitmaps
5.4 Summary
6 END-TO-END EVALUATION
6.1 Prototype Implementation
6.1.1 AHEAD Architecture
6.1.2 Diversity of Physical Operators
6.1.3 One Concrete Operator Realization
6.1.4 Summary and Conclusions
6.2 Performance of Individual Operators
6.2.1 Selection on One Predicate
6.2.2 Selection on Two Predicates
6.2.3 Join Operators
6.2.4 Grouping and Aggregation
6.2.5 Delta Operator
6.2.6 Summary and Conclusions
6.3 Star Schema Benchmark Queries
6.3.1 Query Runtimes
6.3.2 Improvements Through Vectorization
6.3.3 Storage Overhead
6.3.4 Summary and Conclusions
6.4 Error Detecting B-Tree
6.4.1 Single Key Lookup
6.4.2 Key Value-Pair Insertion
6.5 Summary
7 SUMMARY AND CONCLUSIONS
7.1 Future Work
A APPENDIX
A.1 List of Golden As
A.2 More on Hamming Coding
A.2.1 Code examples
A.2.2 Vectorization
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
LIST OF LISTINGS
LIST OF ACRONYMS
LIST OF SYMBOLS
LIST OF DEFINITION
Recommended from our members
Bespoke Security for Resource Constrained Cyber-Physical Systems
Cyber-Physical Systems (CPSs) are critical to many aspects of our daily lives. Autonomous cars, life saving medical devices, drones for package delivery, and robots for manufacturing are all prime examples of CPSs. The dual cyber/physical operating nature and highly integrated feedback control loops of CPSs means that they inherit security problems from traditional computing systems (e.g., software vulnerabilities, hardware side-channels) and physical systems (e.g., theft, tampering), while additionally introducing challenges of their own. The challenges to achieving security for CPSs stem not only from the interaction of the cyber and physical domains, but from the additional pressures of resource constraints imposed due to cost, limited energy budgets, and real-time nature of workloads. Due to the tight resource constraints of CPSs, there is often little headroom to devote for security. Thus, there is a need for low overhead deployable solutions to harden resource constrained CPSs. This dissertation shows that security can be effectively integrated into resource constrained cyber-physical system devices by leveraging fundamental physical properties, & tailoring and extending age-old abstractions in computing.
To provide context on the state of security for CPSs, this document begins with the development of a unifying framework that can be used to identify threats and opportunities for enforcing security policies while providing a systematic survey of the field. This dissertation characterizes the properties of CPSs and typical components (e.g., sensors, actuators, computing devices) in addition to the software commonly used. We discuss available security primitives and their limitations for both hardware and software. In particular, we focus on software security threats targeting memory safety. The rest of the thesis focuses on the design and implementation of novel, deployable approaches to combat memory safety on resource constrained devices used by CPSs (e.g., 32-bit processors and microcontrollers). We first discuss how cyber-physical system properties such as inertia and feedback can be used to harden software efficiently with minimal modification to both hardware and software. We develop the framework You Only Live Once (YOLO) that proactively resets a device and restores it from a secure verified snapshot. YOLO relies on inertia, to tolerate periods of resets, and on feedback to rebuild state when recovering from a snapshot. YOLO is built upon a theoretical model that is used to determine safe operating parameters to aid a system designer in deployment. We evaluate YOLO in simulation and two real-world CPSs, an engine and drone.
Second, we explore how rethinking of core computing concepts can lead to new fundamental abstractions that can efficiently hide performance overheads usually associated with hardening software against memory safety issues. To this end, we present two techniques: (i) The Phantom Address Space (PAS) is a new architectural concept that can be used to improve N-version systems by (almost) eliminating the overheads associated with handling replicated execution. Specifically, PAS can be used to provide an efficient implementation of a diversification concept known as execution path randomization aimed at thwarting code-reuse attacks. The goal of execution path randomization is to frequently switch between two distinct program variants forcing the attacker to gamble on which code to reuse. (ii) Cache Line Formats (Califorms) introduces a novel method to efficiently store memory in caches. Califorms makes the novel insight that dead spaces in program data due to its memory layout can be used to efficiently implement the concept of memory blacklisting, which prohibits a program from accessing certain memory regions based on program semantics. Califorms not onlyconsumes less memory than prior approaches, but can provide byte-granular protection while limiting the scope of its hardware changes to caches. While both PAS and Califorms were originally designed to target resource constrained devices, it's worth noting that they are widely applicable and can efficiently scale up to mobile, desktop, and server class processors.
As CPSs continue to proliferate and become integrated in more critical infrastructure, security is an increasing concern. However, security will undoubtedly always play second fiddle to financial concerns that affect business bottom lines. Thus, it is important that there be easily deployable, low-overhead solutions that can scale from the most constrained of devices to more featureful systems for future migration. This dissertation is one step towards the goal of providing inexpensive mechanisms to ensure the security of cyber-physical system software