Modern processors are highly optimized systems where every single cycle of computation time matters. Many optimizations depend on the data that is being processed. Microarchitectural attacks leak this data (side channels) or exploit physical imperfections to take control of the entire system (fault attacks). In my thesis (D. Gruss. Software-based Microarchitectural Attacks. PhD thesis, Graz University of Technology, 2017), I improved over the state of the art in microarchitectural attacks and defenses in three dimensions, which I cover briefly in this summary. First, I show that attacks can be fully automated. Second, I present several novel, previously unknown side channels. Third, I show that attacks can be mounted in highly restricted environments such as sandboxed JavaScript code in websites, and on any computer system including smartphones, tablets, personal computers, and commercial cloud systems. These results formed one of the cornerstones for attacks like Meltdown (M. Lipp et al. Meltdown: Reading kernel memory from user space. In USENIX Security Symposium, 2018) and Spectre (P. Kocher et al. Spectre attacks: Exploiting speculative execution. In S&P, 2019), which were discovered months after the thesis was concluded.
Introduction
The idea of learning the secret code for a safe by listening to the clicking sounds of the lock is likely as old as safes are. The clicking sound is an inadvertent influence on the environment revealing secret information. In 1996, Kocher [13] described side-channel attacks, a technique that allows an attacker to derive secret values used in a computation from inadvertent influences the computation has on its environment. This seminal work was the beginning of an entire area of research on side channels. Kocher performed what we now describe as a timing attack, an attack exploiting differences in the execution time of an algorithm. In the following years, side-channel attacks were demonstrated based on virtually any measurable environmental change caused by various types of computations, such as power consumption, electromagnetic radiation, temperature, photonic emission, acoustic emissions, and many more. These attacks have in common that they require an attacker to have some form of physical access to the target device.
*Corresponding author: Daniel Gruss, Graz University of Technology, Institute of Applied Information Processing and Communications, Inffeldgasse 16a, 8010 Graz, Austria, e-mail: daniel.gruss@iaik.tugraz.at
In contrast to side-channel attacks, which do not cause any damage to the target device, there are also fault attacks. In a fault attack, an attacker tries to manipulate computations of a device to either evade security mechanisms of the device or to leak its secrets. For this purpose, the attacker manipulates the environment in a way that influences the target device. Typically, such fault-inducing environments are at the border of or beyond the specification range of the target device. As with side-channel attacks, different environment manipulations have been investigated, such as exposure to voltage glitching, clock glitching, extreme temperatures, or photons. Again, to perform a fault attack, some form of physical access to the target device is required.
Modern computer systems are highly complex and highly optimized. Consequently, information leakage, the inadvertent influence on the environment in a secret-dependent way, is not only introduced on an algorithmic level. Optimizations are performed based on the specific data values that are processed, the location of the data, the frequency of accesses to locations, and many other factors. Any adversary observing effects of these optimizations through a side channel can make deductions about the specific cause of the optimizations. Through these deductions, the adversary learns information about the secret data values that are processed.
In my thesis [1], I investigated software-based microarchitectural attacks. Software-based microarchitectural side-channel attacks exploit timing and behavior differences that are (partially) caused through microarchitectural optimizations, i.e., differences that are not architecturally documented. Software-based microarchitectural fault attacks induce faults through microarchitectural optimizations, i.e., operate elements of modern computer systems at the border of or beyond their specification range. Generally, software-based microarchitectural attacks do not require physical access, but instead only some form of code execution on the target system.
In a nutshell, we investigate attacks which can read sensitive data, in extreme cases even from websites, unnoticed by the user (side-channel attacks), or even take over control of the entire system (fault attacks). Finding these attacks is essential to identify the attack surface and to design countermeasures. Microarchitectural attacks are highly sophisticated and require a comprehensive background in the areas of operating system development and processor architectures. The relevance and the scientific contribution of my thesis also became evident through Meltdown [15] and Spectre [12], for which my thesis is a cornerstone.
Background
Cache attacks are the most prominent class of software-based microarchitectural attacks. The possibility of timing differences induced through processor caches was first described by Kocher [13]. Software-based cache timing attacks were at first mostly applied to cryptographic algorithms.
Cache attacks in more recent works are usually instances of three generic cache attack techniques. These techniques have been used in targeted attacks on cryptographic algorithms and were later generalized by Osvik et al. [18] and Yarom et al. [26]. These generic techniques are independent of the specific cache and hardware on which they are performed. Osvik et al. [18] described two generalized cache attack techniques. First, Evict+Time, where an attacker measures how the execution time of an algorithm is influenced by evicting a chosen cache set. Second, Prime+Probe, where an attacker measures whether a victim computation influences how long it takes to access every way of a chosen cache set.
In both attacks the attacker learns that the chosen cache set was used by the victim. Yarom et al. [26] introduced the third generalized attack technique, Flush+Reload. In a Flush+Reload attack, the attacker flushes a shared memory location from the cache and subsequently measures how long it takes to reaccess it. If the victim loaded the shared memory location back into the cache in the meantime, the reaccess is faster. In a Flush+Reload attack the attacker does not only learn which cache set was used by the victim, but even the specific memory location (at the granularity of cache lines).
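The three steps of Flush+Reload can be illustrated with a small simulation. The following Python sketch is a toy model only: the class `ToyCache`, the latencies of 40 and 200 cycles, the 64-byte line size, and the threshold of 100 cycles are assumptions chosen for illustration, not measurements. A real attack would use a flush instruction and a high-resolution timer on actual hardware.

```python
class ToyCache:
    """Toy cache model: a set of cached line numbers, no capacity limit."""
    HIT_CYCLES = 40     # assumed latency of a cached access
    MISS_CYCLES = 200   # assumed latency of an access served from DRAM
    LINE_SIZE = 64      # assumed cache line size in bytes

    def __init__(self):
        self.lines = set()

    def flush(self, addr):
        # Remove the line containing addr from the cache.
        self.lines.discard(addr // self.LINE_SIZE)

    def access(self, addr):
        # Load addr, cache its line, and return the simulated latency.
        line = addr // self.LINE_SIZE
        latency = self.HIT_CYCLES if line in self.lines else self.MISS_CYCLES
        self.lines.add(line)
        return latency


def flush_reload_round(cache, addr, victim, threshold=100):
    cache.flush(addr)                      # 1. flush the shared line
    victim(cache)                          # 2. let the victim run
    return cache.access(addr) < threshold  # 3. fast reload => victim used the line


SHARED = 0x1000  # hypothetical address of a shared memory location


def victim_touches(cache):
    cache.access(SHARED + 8)   # falls into the same cache line as SHARED


def victim_idle(cache):
    cache.access(SHARED + 64)  # falls into the neighbouring cache line
```

In this model, `flush_reload_round(ToyCache(), SHARED, victim_touches)` reports an access, whereas the round with `victim_idle` does not, which mirrors the cache-line granularity described above.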
Based on these three attack primitives, various computations have been attacked, for instance cryptographic algorithms [26], web server function calls [27], user input [8, 7, 17], and kernel addressing information [9, 5].
Software-based fault attacks are considerably more difficult to build in practice, as faults must be induced in hardware. Hence, software has to move the targeted system component to the border of or beyond its specification range. Only in 2014 were software-based fault attacks found to be practical, in the so-called Rowhammer attack [11, 23]. In concurrent work, Karimi et al. [10] demonstrated a second software-based fault attack. They showed that a carefully crafted instruction stream can deteriorate the processor stability and cause severe permanent damage to the processor if executed continuously for weeks. Rowhammer attacks have by now been demonstrated in JavaScript [6], on supposedly safe DDR4 [19], on co-located virtual machines [20], and on mobile devices [25].
Contributions of the thesis
To develop and evaluate potential countermeasures against software-based microarchitectural attacks, it is necessary to map and understand the attack surface in detail. In my thesis [1], I aimed to improve the general understanding of the attack surface of software-based microarchitectural attacks and to provide novel insights into these attacks and their attack vectors. Our research includes the minimization of requirements, the automation of previous attacks, and the identification of previously unknown side channels.
I started the work on software-based microarchitectural attacks by enhancing the Flush+Reload cache attack technique [26]. Previous cache attacks required manual identification of vulnerabilities, i.e., data accesses or instruction execution depending on secret information. In my thesis [1], I introduced the technique Cache Template Attacks [8]. Cache Template Attacks allow us to profile and exploit cache-based information leakage of any program automatically, without prior knowledge of specific software versions or even specific system information. They can be executed online on a remote system without any prior offline computations or measurements. Cache Template Attacks consist of two phases. In the profiling phase, we determine dependencies between the processing of secret information, e.g., specific key inputs or private keys of cryptographic primitives, and specific cache accesses. In the exploitation phase, we derive the secret values based on observed cache accesses.
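The two phases can be sketched in a few lines of Python. This is a simplified illustration and not the implementation from [8]: the function names `profile` and `exploit`, the input format, and the matching score are assumptions made for this sketch. The profiling phase builds a matrix of cache-hit ratios per event and address, and the exploitation phase picks the event whose row best matches the observed hits.

```python
from collections import defaultdict


def profile(observations):
    """Profiling phase.

    observations: iterable of (event, address, hit) tuples, where hit says
    whether a cache hit was seen on address while event was triggered.
    Returns template[event][address] = cache-hit ratio.
    """
    hits = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for event, addr, hit in observations:
        total[event][addr] += 1
        hits[event][addr] += int(hit)
    return {e: {a: hits[e][a] / total[e][a] for a in total[e]} for e in total}


def exploit(template, observed_hits):
    """Exploitation phase: return the event that best explains the hits.

    observed_hits: set of addresses on which a cache hit was observed.
    The score rewards high hit ratios on hit addresses and low ratios elsewhere.
    """
    def score(event):
        row = template[event]
        return sum(r if a in observed_hits else 1 - r for a, r in row.items())
    return max(template, key=score)


# Synthetic profiling data for two hypothetical key events and two addresses.
samples = (
    [("a", 0x40, True)] * 9 + [("a", 0x40, False)] * 1 + [("a", 0x80, False)] * 10
    + [("b", 0x40, False)] * 10 + [("b", 0x80, True)] * 8 + [("b", 0x80, False)] * 2
)
template = profile(samples)
```

With this synthetic data, a hit observed only on address `0x40` is attributed to event `"a"` and a hit only on `0x80` to event `"b"`.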
We studied various applications of Cache Template Attacks. Our automated attack on the T-table-based AES implementation of OpenSSL is as efficient as state-of-the-art manual cache attacks. Beyond cryptographic algorithms, our results also show that an attacker can infer highly accurate keystroke timings on Linux as well as Windows. For Linux distributions we even demonstrated a fully automatic keylogger that significantly reduces the entropy of passwords from log2(26) ≈ 4.7 bits per character to 1.4 bits per character. The underlying cache template matrix is shown in Figure 1. From this we can conclude that cache-based side-channel attacks are an even greater threat for today's computer architectures than assumed so far. In fact, even sensitive user input, like passwords, cannot be considered secure on machines employing CPU caches. We argue that fundamental concepts of computer architectures and operating systems enable the automatic exploitation of cache-based vulnerabilities. We observed that many of the existing countermeasures do not prevent such attacks as expected. In particular, it is not sufficient to protect only specific cryptographic algorithms like AES. More generic countermeasures will be necessary to counter the threat of automated cache attacks. The fact that cache attacks can be launched automatically marks a change of perspective, from a more academic interest towards practical attacks, which can be launched by less sophisticated attackers.
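To put the entropy figures above into perspective, the following short calculation compares the total entropy of a password before and after the attack. The password length of 8 characters and the assumption that the per-character entropy simply adds up are illustrative simplifications, not results from the thesis.

```python
import math

bits_before = math.log2(26)  # uniform choice among 26 lowercase letters
bits_after = 1.4             # per-character figure reported in the text
length = 8                   # hypothetical password length

total_before = length * bits_before
total_after = length * bits_after

print(round(bits_before, 1))                  # 4.7
print(round(total_before, 1))                 # 37.6
print(round(total_after, 1))                  # 11.2
print(round(total_before - total_after, 1))   # 26.4
```

Under these assumptions, the remaining search space shrinks by a factor of roughly 2^26.4.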
While caches buffer the comparably slow DRAM, the DRAM itself buffers the even slower hard disk. Hence, side-channel attacks are also possible on the DRAM level.
Suzaki et al. [24] demonstrated a side-channel attack on page deduplication, performed by the operating system or hypervisor, which reveals whether specific data can be found in memory. Page deduplication was therefore considered harmful in public clouds, but still considered safe to use in a private environment, i.e., private clouds, personal computers, and smartphones. We were the first to demonstrate that page deduplication attacks can even be performed from JavaScript. Unlike previous attacks, our attack does not require the victim to execute an adversary's program, but simply to open a website which contains the adversary's JavaScript code. We are not only able to determine which applications are running, but also specific user activities, for instance, whether the user has specific websites currently opened. The attack works on servers, personal computers, and smartphones, and across the borders of virtual machines. This part of my work shows that page deduplication must always be considered vulnerable [2]. Systems which have page deduplication enabled cannot be considered secure anymore. The fact that page deduplication attacks can be launched through websites marks a paradigm shift, from a targeted attack on a specific system towards large-scale practical attacks launched on a huge number of devices simultaneously.
Based on these two works, I investigated the possibility of Rowhammer attacks [11, 23] from JavaScript integrated into websites. Rowhammer violates the fundamental security assumption that a memory location can only be modified by processes that may write to this memory location. Parasitic effects in DRAM can change the content of a memory cell without accessing it, merely by accessing other memory locations at a high frequency. This so-called Rowhammer bug occurs in most of today's memory modules and has fatal consequences for the security of all affected systems, e.g., privilege escalation attacks. All previous studies and attacks related to Rowhammer relied on the availability of a cache flush instruction in order to cause accesses to DRAM modules at a sufficiently high frequency.
We overcome this limitation by defeating complex cache replacement policies. We showed that caches can be forced into fast cache eviction to trigger the Rowhammer bug with only regular memory accesses. Ours is the first work to investigate eviction strategies to defeat complex cache replacement policies. This not only enables triggering Rowhammer from JavaScript, it also benefits research on cache attacks, as it allows attacks on recent and unknown CPUs to be performed quickly and reliably. Existing countermeasures fail to protect against this new Rowhammer attack.
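The basic idea of replacing the flush instruction with regular memory accesses can be illustrated on a single cache set. The following Python sketch assumes a plain LRU replacement policy, which is a deliberate simplification: real CPUs use more complex, partly undocumented policies, which is exactly why dedicated eviction strategies are needed. The class `LRUSet` and the function `evicted_without_flush` are hypothetical names for this sketch.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with a fixed number of ways and (assumed) LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # least recently used entry first

    def access(self, tag):
        """Touch tag; return True on a hit and False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return False


def evicted_without_flush(ways, eviction_set_size):
    """Check whether accessing congruent addresses removes the target line."""
    cache_set = LRUSet(ways)
    cache_set.access("target")
    for i in range(eviction_set_size):
        cache_set.access(("evict", i))  # addresses mapping to the same set
    return not cache_set.access("target")  # a miss means the target was evicted
```

In this model, a 4-way set needs 4 congruent accesses to evict the target; with only 3, the target remains cached and the next access to it never reaches DRAM.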
Our fully automated attack runs in JavaScript through a remote website and can gain unrestricted access to systems [6]. The attack technique is independent of CPU microarchitecture, programming language, and execution environment. We showed that the attack works on off-the-shelf systems. The majority of DDR3 modules are vulnerable and DDR4 modules can be vulnerable too. Thus, it is important to discover all Rowhammer attack vectors. Automated attacks through websites pose an enormous threat as they can be performed on millions of victim machines simultaneously.
Proposed countermeasures against cache attacks assume that attacks cause more cache hits and cache misses than benign applications and use hardware performance counters for detection. To show that this assumption does not hold, I developed a new cache attack called Flush+Flush [7]. Flush+Flush is a novel cache attack that, unlike any other cache attack, performs no memory accesses. Instead, it relies only on the execution time of the flush instruction, which depends on whether data is cached or not. Thus, it causes no cache misses at all, and the number of cache hits is reduced to a minimum due to the constant cache flushes. For the same reason, Flush+Flush does not trigger prefetches and thus is applicable in more situations than other attacks. As the attack causes no cache misses, detection mechanisms that monitor cache activity through performance counters fail, as their underlying assumption is incorrect. The Flush+Flush attack runs at a higher frequency and thus is faster than any existing cache attack. With 496 KB/s in a cross-core covert channel, it is 6.7 times faster than any previously published cache covert channel. As also shown in Figure 2, Flush+Flush has a lower accuracy than Flush+Reload, but a higher accuracy than Prime+Probe. To prevent the Flush+Flush attack, we proposed small hardware modifications. Making the clflush instruction constant-time has no measurable impact on today's software and does not introduce any interface changes. Thus, it is an effective countermeasure that should be implemented. The experiments in this work broadened the understanding of the internals of modern CPU caches. Beyond the adoption of detection mechanisms, the field of cache attacks benefits from these findings, both to discover new attacks and to be able to prevent them.
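The measurement loop of Flush+Flush can be illustrated with a toy model in which the attacker only ever flushes and never loads. The sketch below is an assumption-laden simulation: the class `FlushTimingCache`, the latencies of 150 and 130 cycles (flushing a cached line is assumed to take slightly longer than flushing an uncached one), and the threshold of 140 cycles are illustrative values, not measurements.

```python
class FlushTimingCache:
    """Toy cache in which the flush latency depends on the cached state."""
    FLUSH_CACHED = 150    # assumed latency when the line was cached
    FLUSH_UNCACHED = 130  # assumed latency when the line was not cached

    def __init__(self):
        self.lines = set()

    def victim_load(self, line):
        self.lines.add(line)

    def flush(self, line):
        """Flush the line and return the simulated flush latency."""
        cached = line in self.lines
        self.lines.discard(line)
        return self.FLUSH_CACHED if cached else self.FLUSH_UNCACHED


def flush_flush_trace(cache, line, victim_schedule, threshold=140):
    """Return, per time slot, whether the victim is judged to have used the line.

    victim_schedule: iterable of booleans, True if the victim loads the line
    in that slot. The attacker performs flushes only, never a load.
    """
    trace = []
    cache.flush(line)  # initial flush to reach a known state
    for victim_active in victim_schedule:
        if victim_active:
            cache.victim_load(line)
        # One flush both measures the state and resets it for the next slot.
        trace.append(cache.flush(line) > threshold)
    return trace
```

For a schedule such as `[True, False, True, True]`, the returned trace matches the schedule in this idealized model; on real hardware the small latency difference is what makes Flush+Flush less accurate than Flush+Reload.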
Intel x86 CPUs have gained a significant amount of attention among the scientific community, and powerful techniques to exploit cache side channels have been developed. However, modern smartphones use one or more multi-core ARM CPUs that have a different cache organization and instruction set than Intel x86 CPUs. These ARM CPUs typically have no user-space flush instruction and do not share last-level caches the way Intel x86 CPUs do. Consequently, no cross-core cache attacks had been demonstrated on non-rooted Android smartphones. We solved the key challenges that obstructed these attacks so far and demonstrated Prime+Probe, Flush+Reload, Evict+Reload, and Flush+Flush on non-rooted ARM-based devices without any privileges [14]. Our attacks are the first cross-core and cross-CPU attacks on ARM CPUs. Based on our techniques, we demonstrate covert channels that outperform state-of-the-art covert channels on Android by several orders of magnitude. Moreover, our attack techniques provide a high resolution and a high accuracy, which allows monitoring individual events such as touch and swipe actions on the screen, touch actions on the soft-keyboard, and inter-keystroke timings. Consequently, we can also derive the lengths of words entered on the touchscreen. Finally, we are the first to attack cryptographic primitives implemented in Java. We show that efficient state-of-the-art key-recovery attacks can be mounted against the default AES implementation that is part of the Java Bouncy Castle crypto provider. We also show that cache activity in the ARM TrustZone can be monitored from the normal world. We are convinced that launching our proposed attack against libraries and apps will reveal numerous further exploitable information leaks. Our attacks are applicable to hundreds of millions of today's off-the-shelf Android smartphones, as they all have very similar if not identical hardware. This is especially daunting since smartphones have become the most important personal computing devices and our techniques significantly broaden the scope and impact of cache attacks.
As a novel side channel I investigated the behavior of prefetch instructions. I found that we can use these instructions to attack modern operating systems. Modern operating systems use hardware support to protect against control-flow hijacking attacks such as code-injection attacks. Typically, write access to executable pages is prevented and kernel mode execution is restricted to kernel code pages only. However, current CPUs provide no protection against code-reuse attacks like ROP. ASLR is used to prevent these attacks by making all addresses unpredictable for an attacker. Hence, the kernel security relies fundamentally on preventing access to address information.
I developed Prefetch Side-Channel Attacks [5], a new class of generic attacks exploiting major weaknesses in prefetch instructions. I found that prefetch instructions have a different execution time based on the state of the page translation caches. These timing differences originate from a second cache hierarchy in modern processors for page table entries, which exists besides the unified cache hierarchy. Even worse, the x86 prefetch instructions allow unprivileged processes to prefetch privileged memory into the cache. These new attacks allow unprivileged local attackers to completely bypass access control on address information and thus to compromise an entire physical system by defeating SMAP, SMEP, and kernel ASLR. Our attacks work in native and virtualized environments alike. In summary, prefetch can fetch inaccessible privileged memory into various caches on Intel x86, and it leaks the translation level for virtual addresses on both Intel x86 and ARMv8-A.
We introduced two primitives that build the basis of our attacks. First, the translation-level oracle, exploiting the prefetch execution time information. Second, the address-translation oracle, exploiting the lack of privilege checks.
The translation-level oracle allowed us to defeat ASLR and locate libraries and drivers in inaccessible memory regions. Using the address-translation oracle, we were able to resolve virtual addresses to physical addresses on 64-bit Linux systems and from unprivileged user programs inside an Amazon EC2 virtual machine.
We built three attacks exploiting these primitives. Our first attack retrieves an exact image of the full paging hierarchy of a process, defeating both user space and kernel space ASLR. Our second attack resolves virtual to physical addresses to bypass SMAP on 64-bit Linux systems using a ret2dir-style attack. Third, based on both oracles, we demonstrated how to defeat kernel ASLR on Windows 10, providing the basis for ROP attacks on kernel and driver binary code. We demonstrated this from unprivileged user programs on Linux and inside Amazon EC2 virtual machines. Finally, as a countermeasure I proposed a new form of strong kernel isolation to protect commodity systems. Stronger kernel isolation, now also known by its practical implementations, e.g., KAISER [4] or KPTI, isolates user space and kernel space into separate address spaces as illustrated in Figure 3. This countermeasure requires only a few modifications in operating system kernels, and its performance penalty is as low as 0.06 % to 5.09 %.
Other contributions during the thesis
During the work on my thesis, I contributed to several other works that are not included as a part of the thesis. Nonetheless, I discuss them here to draw the complete picture of all contributions. While working on the Rowhammer attack, I observed timing differences caused by so-called row hits and row conflicts in the DRAM module. To get a better understanding of these timing differences, we developed a fully automated method to reverse-engineer the mapping of physical addresses to DRAM cells in software [19]. Using these reverse-engineered mappings reduces the runtime of Rowhammer attacks significantly. Investigating the timing differences in more detail, I found significant side-channel leakage that is comparable to that of cache attacks. Our previous works on Rowhammer, cache eviction on ARM Cortex-A, and DRAM reverse-engineering also sparked the idea of performing Rowhammer attacks on Android devices [25].
Related work on software-based microarchitectural side channels typically discusses the capacity of a side channel based on the raw capacity of a covert channel built on top of it. Due to the nature of side channels, these covert channels are not error-free. Previous work claimed that straightforward application of error correcting codes is sufficient to eliminate all errors. Thus, to provide realistic estimates, the error rate is taken into account to compute a real-world capacity for the channel. Investigating how realistic these estimates are, we built an entirely error-free covert channel. We found that the application of error correcting codes is possible but has to be combined with other error detection techniques in a non-trivial way. Our channel is so reliable that we can even tunnel an SSH connection through it [16].
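The need to combine error correction with error detection can be illustrated with a deliberately simple stand-in. The sketch below is not the scheme used in [16]: it uses a 3-fold repetition code for correction, a CRC32 checksum for detection, and retransmission when the checksum fails. All function names and the scripted test channel are assumptions made for this illustration.

```python
import zlib


def to_bits(data):
    # Least significant bit of each byte first.
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def from_bits(bits):
    return bytes(sum(bit << i for i, bit in enumerate(bits[j:j + 8]))
                 for j in range(0, len(bits), 8))


def encode(payload):
    """Append a CRC32 (detection), then repeat every bit 3 times (correction)."""
    framed = payload + zlib.crc32(payload).to_bytes(4, "big")
    return [bit for bit in to_bits(framed) for _ in range(3)]


def decode(bits):
    """Majority-vote each triple, then verify the CRC32; None requests a resend."""
    voted = [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]
    framed = from_bits(voted)
    payload, crc = framed[:-4], framed[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None


def send_reliably(payload, channel, max_attempts=10):
    """Retransmit until a packet passes the checksum; return (payload, attempts)."""
    for attempt in range(1, max_attempts + 1):
        received = decode(channel(encode(payload)))
        if received is not None:
            return received, attempt
    raise RuntimeError("channel too noisy")


def scripted_channel(flips_per_attempt):
    """Hypothetical test channel flipping the listed bit positions per use."""
    script = iter(flips_per_attempt)

    def channel(bits):
        flips = next(script, ())
        return [b ^ 1 if i in flips else b for i, b in enumerate(bits)]
    return channel
```

With `scripted_channel([{0, 1}, {5}])`, the first transmission has two flips in the same triple, which the repetition code miscorrects and only the checksum catches; the second transmission has a single flip, which the repetition code corrects, so the payload arrives intact on attempt 2.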
Many microarchitectural attacks could generally run in JavaScript, but require high-precision timers. We investigated high-precision timing sources in JavaScript and found techniques that allow mounting reliable attacks. We demonstrate this by building a covert channel through DRAM between JavaScript running on a website and an unprivileged application running inside a virtual machine [22].
A new feature in modern Intel processors is Intel SGX, an environment for secure execution on untrusted hardware and operating systems. SGX enclaves are highly secure and can generally not be inspected or monitored by the operating system. However, they are also restricted environments, which cannot perform any system calls directly. We investigated whether it is possible to exploit the security features to protect malicious software running inside an SGX enclave. We built cache side-channel attacks extracting cryptographic keys from the host or from co-located SGX enclaves [21].
We investigated possible countermeasures against attacks on address-translation caches. Our solution, called KAISER, is a practical extension for the Linux kernel, which eliminates the leakage entirely while having a low performance overhead on modern processors [4] . Most operating systems today have implemented this mechanism to protect against Meltdown [15] .
Finally, we also investigated generic countermeasures against cache side-channel attacks. Modern Intel processors implement hardware-transactional memory on top of the cache hierarchy. Through creative instrumentation we can use hardware-transactional memory to abort upon conflicting memory operations and cache misses. This effectively eliminates the leakage which is exploited in cache attacks [3] .
Conclusions
We can draw conclusions along four different axes from my thesis and the corresponding publications.
First, microarchitectural attacks can be widely automated. Automation provides any unsophisticated user with the ability to perform microarchitectural attacks. It also enables larger-scale attacks. Future work will likely investigate automation of microarchitectural attacks in further detail.
Second, unknown and novel side channels are very likely to exist and to be found. We already showed that modern microarchitectures expose several previously unknown side channels, such as the clflush instruction [7], the DRAM [19], or prefetch instructions [5].
Third, it is possible to reduce and minimize requirements of known attacks to a point where they can be performed in highly restricted and sandboxed environments. We have shown this in our work on Rowhammer attacks in JavaScript [6] and in our work on page deduplication attacks in JavaScript [2].
Fourth, constructing both effective and efficient countermeasures is a difficult task. Research often overambitiously aims to find universal countermeasures against microarchitectural attacks, ignoring that the various attacks have vastly different requirements and properties [7, 19]. At the core of microarchitectural attacks is usually a temporal or behavioral difference that is intended by the processor manufacturer to optimize the performance. Hence, it might be difficult to always find a universal countermeasure that does not degrade the performance [5], since security and performance often contradict each other.
