Recent Spectre attacks exploit hardware speculative execution to read forbidden data. The attacks speculatively load forbidden data in misspeculated paths, creating a side channel via the microarchitectural state, which is not cleaned up after a misspeculation. The side channel then leaks the data. We focus on the most challenging Spectre variant (Spectre-v1), which exploits sandboxing through bounds checking. Because the forbidden data can be accessed in only three ways, only one of which remains challenging (Spectre-v1), whereas the data can be leaked through numerous side channels, all of which must be plugged, preventing the access in the first place is more practical. Recent hardware schemes plug some side channels but incur significant complexity and performance loss and remain susceptible to other side channels. Most current software mitigations are architecture-dependent, have performance or semantic uncertainty problems, or both. We propose a compiler-based mitigation, called Secure Automatic Bounds Checking (SABC), which uses a simple sequence of three instructions to prevent forbidden access. The instructions have straightforward semantics and are found in all 32- and 64-bit architectures. An alternative, architecture-independent technique that leverages process boundaries, site isolation, incurs 1.8x memory overhead and 30% performance overhead over the baseline with no isolation. SABC is architecture-independent, has assured semantics, incurs little performance overhead, and renders current and future side channels useless for Spectre-v1.
Introduction
Speculative execution-based security attacks called Meltdown [18, 31] and Spectre [18, 27] affect most modern computer systems based on high-performance microprocessors. As demonstrated, these serious hardware-based attacks can read the entire kernel or browser memory. Since the original attacks were announced, several variants have been crafted, with probably more to come. While there have been numerous software-based attacks in general (e.g., buffer overflow), these hardware-based attacks have increased the attack surface significantly. While Meltdown (CVE-2017-5754) is not an issue for some microprocessors (e.g., AMD's processor) and can be addressed in others, all high-performance processors are thought to be vulnerable to Spectre variants, which are the focus of this paper.
Fundamentally, these attacks exploit speculative execution, a key feature employed by modern microprocessors for high performance. At a high level, these attacks exploit the facts that (1) control speculation can go down paths that are never executed, which can be leveraged to access secrets that are otherwise inaccessible, and (2) modern architectures clean up the architectural state (e.g., the architectural register and memory state) after a misspeculation, but not the micro-architectural state (e.g., the branch predictors and cache state). Such micro-architectural state, which survives a misspeculation, can act as a side channel that leaks information. The attacks speculatively load forbidden data which results in a misspeculation and leave a trail of the data in the micro-architecture state to be leaked.
Three known Spectre variants are: (1) circumventing bounds check-based software sandboxing (CVE-2017-5753) by exploiting branch prediction to speculatively load forbidden data (e.g., JavaScript speculatively loading the Web browser's data), (2) indirect branch target (or return address [28]) injection (CVE-2017-5715) from one process to exploit a gadget (i.e., a code sequence) in a different process to speculatively load forbidden data (e.g., a user process fooling the kernel), and (3) exploiting speculative store bypass to read a forbidden value. While the third variant (officially known as Spectre-v4/CVE-2018-3639 because Meltdown is Spectre-v3) has not been shown to be practical, the other two have resulted in viable transmission rates of forbidden data.
Completely disabling speculative execution would result in unacceptable performance loss. Modern architectures employ speculation to avoid long latencies by predicting the outcome of slow computation. Because the prediction is highly accurate, such speculation results in high performance. The possibility of the attacks does not suddenly remove long latencies or the need for speculation. Further, while a forbidden access is known to occur in only three ways (the three Spectre variants), among which only Spectre-v1 remains a challenge, the accessed data can be leaked through numerous microarchitectural side channels [12, 14-17, 32, 35, 36, 39, 45, 50]. Approaches that wish to plug the leaks must ensure that all side channels are plugged. Thus, plugging the leak is harder than preventing forbidden speculative access in the first place. In fact, any rolling-back of the micro-architectural state to plug the leak may be susceptible to timing channels [27]. Recent proposals [29, 37, 44, 49] plug specific side channels but make complex and intrusive hardware changes, incur performance loss, and remain susceptible to other side channels (Section 6). Instead, our proposal uses a simple three-instruction sequence with little performance loss to prevent Spectre-v1's forbidden access. While the previous schemes are also applicable to Spectre-v2, simply clearing the BTB and RAS at context switches and other techniques [23] prevent forbidden data access, obviating these schemes for Spectre-v2. (Spectre-v4 has not been shown to be practical.)
Because compiler-based automatic bounds checking is a key Spectre-v1 vulnerability, we discuss existing mitigations in that context. For instance, (1) Intel proposes using lfence to essentially limit speculation [23]. Contrary to widely-held belief, we found that lfence's performance loss is negligible when applied only to bounds checks. (2) Expecting a high loss, however, Chrome's V8, the most widely-used JavaScript engine, implements a low-performance-overhead mitigation for x86 where the speculative load array-index or value is zeroed for out-of-bounds loads and kept intact for within-bounds loads [4, 5] (shown later in Section 5.1). Transforming the array index and transforming the value are equivalent for the purposes of mitigation. Unfortunately, the instruction sequence for such a mitigation includes a conditional move whose specification does not preclude prediction of the predicate [40]. An attack can exploit such prediction to circumvent the mitigation just like the bounds checks. (3) This prediction is not hypothetical: ARM recommends a similar mitigation but includes a consumption of speculative data barrier (CSDB) to ensure that any predicted conditional select or move is resolved before the speculative load can execute. Unfortunately, ARM significantly changed CSDB's specification [13] after the initial release [2] while claiming that mitigations based on the initial specification would continue to work. Yet, V8's mitigation for ARM is fundamentally flawed (Section 2.2.2). (4) V8's mitigation for MIPS uses branches whose prediction would defeat the purpose of the mitigation. Consequently, the mitigation had to be withdrawn [43]. (5) Recently, Chrome released a site isolation implementation to separate each Web domain into a different process to prevent speculative out-of-bounds access [38]. However, site isolation is not available for Android, JVMs and eBPF, is nontrivial to optimize, and incurs 1.8x memory overhead.
All but the last mitigation are architecture-dependent, have uncertain semantics or applicability problems, or both. While the last one does not have these problems, it incurs high memory and performance overheads (shown later in Section 5.2). We use V8 as a concrete example; however, our ideas are applicable to any compiler.
To address these issues, we propose compiler-based Secure Automatic Bounds Checking (SABC) via a simple three-instruction sequence that uses only common arithmetic and logical instructions found in all architectures. SABC does not use any conditional instructions or branches, so there is no confusion over prediction. As such, SABC (1) is architecture-independent, (2) has assured semantics, (3) incurs little performance overhead, and (4) renders all current and future side channels useless for Spectre-v1. SABC's instruction sequence is also applicable to manually-inserted bounds checks, as we explain in Section 3.5.
The key implication of architectural independence is that SABC is applicable across all architectures (legacy, current, and future). The simplicity and semantic assurance of our scheme are key for security, especially given that the root cause is the complex behavior of speculative execution and that the current mitigations are complex and uncertain in security guarantees.
Experiments using a modified Chrome V8 to load the Alexa Top 50 [20] Websites on an Ivy Bridge-based laptop with 8 GB memory show that both the current compiler-based mitigations and SABC incur little overhead over the baseline with no mitigation. However, while the current mitigations are architecture-dependent, have uncertain correctness behavior, or both, SABC is architecture-independent and has assured semantics. Further, site isolation incurs 1.8x memory overhead and 30% performance overhead over the baseline with no isolation.
Key Context and Current Mitigations
As explained in Section 1, we focus on Spectre-v1 which exploits out-of-bounds speculative loads in the context of bounds checked accesses. 
Spectre-v1's Bounds Check Bypass
In general, Spectre-v1 can bypass any bounds checks, whether inserted automatically (by compilers) or manually (by programmers). In this section, we focus on the challenging context where some external code is run within a "host" process using compiler-inserted bounds checks (e.g., downloaded JavaScript within the browser process or packet filter bytecode within the kernel [33] ). In Section 3.5, we discuss manually-inserted ad-hoc bounds checks (e.g., for input validation at function/kernel call boundaries [39] ).
One way to protect against external code is to run the external code as a different process, which would automatically provide isolation via virtual memory. However, due to higher performance overheads of this approach (shown later in Section 5.2), the within-process option requires some isolation mechanism other than virtual memory. Sandboxing via compiler-inserted automatic bounds checking is such an alternative where the compiler either proves that an access is within the bounds of the data structure being accessed or inserts dynamic bounds checks before the access (e.g., Chrome's V8 inserts bounds checks when a downloaded JavaScript is interpreted or JIT-compiled dynamically into native binary). Language features help in ensuring that the compiler's bounds checking is correct while incurring low performance overheads. In JavaScript, for instance, there is no pointer arithmetic and arrays are the only dynamically-indexed structures [24]. Note that sandboxing is key in this context: without such sandboxing, or with faulty sandboxing, the external code can read any location in the host process even in the absence of Spectre-v1. Similarly, even in the case of manually-inserted bounds checks, if the bounds checks are not inserted to protect all relevant accesses, Spectre-v1 is not necessary as information can be stolen directly via unprotected accesses.
In Spectre-v1, the attacker exploits the bounds checking code to speculatively load an out-of-bounds location. Figure 1 illustrates a bounds check bypass attack with the JavaScript attack code (left box, JavaScript source code) as well as the relevant compiler-inserted code for bounds checking that would ordinarily protect against out-of-bounds accesses (right box, MIPS assembly code) arising from the array access in line J4 in the source code. Figure 1 is in the specific context of JavaScript, where the array length is retrieved as a property of the array object (X.length in Figure 1). In general, the bounds checking is performed against the known length irrespective of how it is retrieved. The attacker primes the branch predictor to mispredict a branch (line A3) that performs the bounds checking for a load (line A6). By using a few within-bounds indices before the out-of-bounds index (line J1), such misprediction is easy to achieve. The misprediction causes a forbidden, out-of-bounds load to proceed speculatively (lines A4 to A7), though it will be squashed later. The attack code then uses a part of the speculatively-loaded forbidden value as an index for another load (line J5). The later load causes a cache miss and an eviction from the cache, which is pre-populated with a specific array. The attack code then probes the cache using high-precision timers to determine which of the array entries has been evicted (not shown in Figure 1). The identity of the missing entry reveals the index part of the forbidden value. Other parts of the value and other memory locations (e.g., all of a browser's memory) can be read at viable rates. To mitigate such timing attacks, browsers have recently disallowed high-resolution timers. Such an approach does nothing to prevent forbidden accesses, which is SABC's focus; it merely closes timing-based side channels while leaving other side channels intact.
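The access pattern described above can be sketched in C. This is a minimal illustration of the well-known bounds-check-bypass shape, not a reproduction of the JavaScript or MIPS code in Figure 1; the function name `bounds_checked_probe`, the `probe` array, and the 64-byte cache-line stride are assumptions introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes (hypothetical; varies by machine). */
#define LINE_STRIDE 64

/*
 * Architecturally, the bounds check makes x[i] safe. If the branch is
 * mispredicted as taken for an out-of-bounds i, however, both loads may
 * execute speculatively, and the second load's address depends on the
 * forbidden byte, leaving a trace in the cache.
 * probe must hold at least 256 * LINE_STRIDE bytes.
 */
uint8_t bounds_checked_probe(const uint8_t *x, size_t x_len,
                             const uint8_t *probe, size_t i)
{
    if (i < x_len) {                       /* bounds check (predicted branch) */
        uint8_t v = x[i];                  /* first load: may be forbidden data */
        return probe[(size_t)v * LINE_STRIDE]; /* second load: address encodes v */
    }
    return 0;
}
```

Calling the function repeatedly with within-bounds `i` and then once with an out-of-bounds `i` corresponds to the predictor priming described in the text; the timing probe of `probe` is omitted here.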
Spectre-v2 and Spectre-RSB [28] attacks from one process to another work as follows. The attacker process primes the BTB and RAS predictors by repeatedly transferring control to a gadget which can access and leak the secret via a side channel if executed in the victim process. Obviously, the attacker process cannot directly access the victim's secret. The attacker process then fools the victim process into using the primed predictions to access and leak the secret. This cross-process attack does not make sense in the generic one-process context where the attacker can directly read the secret. But in the context of a browser's process, just as JavaScript cannot read arbitrary locations, JavaScript cannot transfer control to arbitrary gadget addresses to prime the predictors. These Spectre variants can be blocked by clearing the BTB and RAS upon context switches and other techniques suggested by Intel [23], as mentioned in Section 1. The Spectre-RSB paper concedes that the within-process version is not practical.
Current Mitigations for Spectre-v1
As discussed in Section 1, while plugging the leak is hard due to numerous potential side channels, current mitigations attempt to prevent speculative access to forbidden locations.
Mitigation for x86
Intel proposes using lfence [22] which serializes execution of instructions after an lfence to occur only after the resolution of all speculation before the lfence [23] ( Figure 2 ). Such serialization ensures that misspeculated execution of the out-of-bounds load is prevented, but incurs significant performance loss if applied indiscriminately. Contrary to widely-held belief, however, we found that applying lfence only to bounds-check branches leads to little performance loss (Section 5.1).
Expecting high performance loss for lfence, Chrome V8 implements a mitigation for x86 where the out-of-bounds load array-index or value is transformed to remain intact for within-bounds access and to zero for out-of-bounds access [4, 5], as shown in Figure 3. The code sequence uses a conditional move (cmov in line X7 of Figure 3) to choose between a mask of all zeros and all ones based on the bounds checking comparison of the array index and array length. Then an and applies this mask to the load value, called value-based mitigation (line X9 in Figure 3), or to the load address, called address-based mitigation (not shown). However, correctness of the code sequence depends on the hardware behavior of the instructions in the sequence. Unfortunately, the instructions' specification [22] does not preclude prediction of conditional-instruction predicates (e.g., employed by ARM). The mitigation would be defeated if a future x86 implementation employs the prediction. Such uncertainty in the correctness guarantees of security schemes is unacceptable.
Other work has stated that the use of general value prediction [30] would also defeat the mitigation [40]. None of the commercial microprocessors seem to employ value prediction, which would not only defeat the mitigation but also create new Spectre variants. For instance, a speculative attempt to read a secret may result in an accurate value prediction of the secret even if the access itself is prevented.
Mitigation for ARM
ARM recommends a mitigation similar to that for x86 with the key difference of replacing x86's cmov by ARM's conditional select (csel) followed by a consumption of speculative data barrier (CSDB) [3, 13] . The CSDB ensures that the csel's predicate is resolved before the load uses the csel's result.
ARM significantly changed CSDB's specification after the initial release. The initial specification [2] requires (1) an address dependence between the csel and the second load, (2) a data dependence between the first load and the csel for one of its inputs and no dependence for the other input, and (3) csel's condition to choose the latter input if the first load is not executed architecturally (e.g., due to control flow). If these conditions are met, the second load will not affect the cache state due to eviction, preventing determination of the first load's data value. While being highly specific to the leaking of data through the cache, this specification does not cover (1) leaking of data through other channels, (2) prediction of csel predicates, or (3) other value prediction. The new, significantly-different, and simpler specification [13] disallows general value prediction and requires any csel or SVE predicate prediction to be resolved before a data-dependent load. The cache state is unaffected.
V8's initial mitigation for ARM did not follow ARM's updated recommendation [13] and was flawed. As such, V8-on-ARM remained vulnerable for months until updated mitigations were released to fix the flaws, illustrating that complex semantics can lead to flawed mitigations.
Mitigation for Other Architectures
As discussed in Section 1, V8's mitigation for MIPS has been withdrawn because the code uses a branch which would allow speculative execution of the out-of-bounds load.
In addition to the above limitations, these mitigations are architecture-dependent in that each uses instructions unique to its architecture and not available in other architectures.
Site Isolation
Recently Chrome has incorporated site isolation to separate each Web domain into a different process to prevent speculative load of forbidden data. While site isolation is architecture-independent and is secure (by relying on virtual memory protections), it incurs 1.8x higher memory and 30% performance overheads (as we show later in Section 5.2). The argument that the high overheads are acceptable because site isolation is comprehensively bullet-proof against all Spectre variants does not hold. Site isolation does not mitigate Spectre's second variant of branch target injection. This variant injects an indirect branch target into one process (e.g., a user process or a client) to exploit a gadget (i.e., a code sequence) in a different process (e.g., the kernel or a server) to speculatively load forbidden data. Thus, one process can leak secrets from another process despite virtual memory protections. Moreover, while Chrome has adopted site isolation, other software relying on sandboxing may not (e.g., Java or eBPF). In fact, it is not clear how eBPF can employ site isolation by running the packet filter as a different process that can access the packets in the kernel space. As such, we believe that site isolation does not obviate SABC.
Secure Automatic Bounds Checking
To address the above problems, we propose architecture-independent Secure Automatic Bounds Checking (SABC) with assured semantics. SABC is a transformation of the load array-index or the load value. For architecture-independence and assured semantics, the transformation should:
• output zero for out-of-bounds loads and keep the values intact (i.e., identity function) for within-bounds loads;
• use common arithmetic and logical operations found in all architectures;
• not use branches or conditional moves which may be predicted; and
• use simple operations to afford assuredness.
Figure 4 illustrates SABC. Figure 4(a) shows the assembly code for the array access with compiler-inserted bounds checks. SABC is implemented using three additional instructions shown in bold (Figure 4(b)). Instead of piggybacking on the bounds checking comparison of the array index to the array length, our code sequence subtracts the length from the index (i.e., index minus array length, line B4 in Figure 4(b)), where the result is negative for within-bounds accesses and non-negative otherwise. Our code then arithmetic right-shifts the result by one less than the word-length (in bits) of the result, which gives a mask of all ones for within-bounds accesses and of all zeros otherwise (line B5 in Figure 4(b)). Note that while the logical right shift operation shifts in zeros as the data is shifted right, the arithmetic right shift operation repeats the sign bit as the data is shifted right. The mask is logical-ANDed with the index before the load (address-based SABC, line B6 in Figure 4(b)) or with the load value immediately after the load (value-based SABC, line B9 in Figure 5). In address-based SABC, the result is the original index for within-bounds accesses and zero for out-of-bounds accesses, which can no longer speculatively access a forbidden location (Figure 4(b)). In value-based SABC, the result is the original load value for within-bounds accesses and zero for out-of-bounds accesses (discussed later).
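The subtract, arithmetic-right-shift, and AND steps of address-based SABC can be written out in C as a rough sketch. The function names are hypothetical, and the sketch assumes 32-bit words with index and length both below 2^31 (non-negative indices, as in the JavaScript setting). It also assumes that `>>` on a negative signed value is an arithmetic shift, which C leaves implementation-defined. A C compiler is free to rewrite this code with branches or conditional moves, so the sketch only illustrates the arithmetic; the scheme in the paper is emitted directly as machine instructions by the JIT compiler.

```c
#include <stdint.h>

/* All ones when index < length, all zeros otherwise.
 * Assumes index and length are both below 2^31. */
static inline uint32_t sabc_mask(uint32_t index, uint32_t length)
{
    int32_t diff = (int32_t)(index - length); /* subtract: negative iff in bounds */
    return (uint32_t)(diff >> 31);            /* arithmetic shift copies the sign bit */
}

/* Address-based masking: the index itself, or 0 when out of bounds. */
static inline uint32_t sabc_index(uint32_t index, uint32_t length)
{
    return index & sabc_mask(index, length);  /* bitwise AND with the mask */
}

/* Bounds-checked load with the masked index feeding the load address. */
uint32_t load_address_based(const uint32_t *array, uint32_t length,
                            uint32_t index)
{
    if (index >= length)                      /* original bounds check */
        return 0;
    return array[sabc_index(index, length)];  /* load depends on the mask chain */
}
```

Because the load address is data-dependent on the subtract, a mispredicted bounds check can at worst reach element 0 of the array in this sketch.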
SABC
Our simple instruction sequence is architecture-independent such that we need to verify only this single sequence. And, our assurance analysis is also simple, giving confidence in the security of our scheme which stands in contrast to the complexity of speculative execution and uncertain security of the current mitigations.
Architecture Independence
Our code sequence is a subtract, an arithmetic right-shift and a logical AND, all of which are architecture-independent and have simple semantics. All three instructions are supported in all 32-bit and 64-bit high performance architectures as shown in Table 1. Unlike architecture-specific approaches such as the x86-specific lfence and ARM-specific csel/csdb, our use of simple, widely-supported instructions makes our solution architecture-independent. Furthermore, the semantics of these instructions are simple and well-understood. One may think that though the high-level instruction semantics are simple, there may be subtle differences in the low-level semantics of these instructions (e.g., how condition code flags are modified). For our analysis below, the flag behavior of the instructions is irrelevant.
Assured Semantics
We examine the two basic SABC options when accessing arrays.
1. Address-based SABC ensures that speculative accesses of the form X[i] (including speculatively executed array accesses) are guaranteed to access array element X[i] (if i is within bounds) and array element X[0] (if i is not within bounds). This option effectively sandboxes the array index (which in turn sandboxes the address) to make sure that the secret cannot be accessed.
2. Value-based SABC ensures that speculative array accesses of the form X[i] return X[i] (if i is within bounds) and 0 (if i is not within bounds). This option ensures that the secret cannot be read even if the address of the secret can be accessed speculatively.

Table 1. Support for SABC's three instructions across ISAs
ISA            Subtract   Arithmetic right shift   AND
ARM [3]        sub        asr                      and
Alpha [8]      subl       sra                      and
Power [19]     sub        srawi                    and
SPARC [21]     sub        sra                      and
x86 [22]       sub        sar/sarx                 and
PA-RISC [25]   sub        ext/dep                  and
MIPS [34]      sub        sra                      and
RISC-V [48]    sub        sra                      and
In both approaches, the key requirement is to achieve such conditional behavior (depending on whether the access is within bounds or not) subject to the following conditions.
• Conditional behavior may not be achieved using branch instructions which are subject to speculation.
• Some ISAs offer predicated instructions such as conditional instructions. Nominally, the predicate may even appear to be a data value in a register. However, because predicates are generally single-bit entities, some implementations may predict predicates of such conditional instructions. As such, conditional instructions are not an appropriate construct.
In our analysis, we use the following basic assumptions:
1. The first array element is at index zero. (It may employ branch prediction, and prediction of predicates for conditional instructions.)
Analysis of address-based SABC: We analyze SABC's behavior in terms of the following two claims.
Claim 1: SABC introduces a read-after-write dependence chain from the subtract (line B4 in Figure 4(b)) to the load that loads the secret (line B9 in Figure 4(b)). This claim is true by construction.
Claim 2: The masked index computed in line B6 is either i (if the index is within bounds) or 0 (otherwise). Based on assumptions (1) and (2), the result of the subtract which computes (i − L) is negative when i < L (i.e., when the index is within bounds). If i = L or i > L, the result is zero or positive. Under the standard 2's-complement signed-number representation, which is universal to all modern processors, the sign-bit of the difference (destination register $r4 in line B4 of Figure 4(b)) indicates whether the index is within bounds (i < L) or not (i ≥ L). The arithmetic right shift copies the sign-bit to the entire width of the word. Effectively, the destination register $r4 in line B5 is either all 1s (when the index is within bounds) or all 0s (when the index is out of bounds). Finally, the bitwise AND operation (line B6) ensures that the index is unchanged (as a result of ANDing with all 1s) when the index is within bounds, or is masked to zero (as a result of ANDing with all 0s) when the index is out of bounds. For now, we show SABC code examples assuming non-negative array indices as in JavaScript, where indices are effectively treated as unsigned integers [24]. Later in Section 3.4, we show that SABC is easily extended to support two-sided bounds checking with negative indices, as allowed in other languages (e.g., Python [46]).
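As a worked instance of Claim 2, the following traces the three operations for a 32-bit word with an assumed array length L = 10; the specific values of i and L are chosen here purely for illustration, and \gg_a denotes the arithmetic right shift.

```latex
\begin{align*}
i = 3:\quad  & d = i - L = -7 = \texttt{0xFFFFFFF9}, &
               m &= d \gg_a 31 = \texttt{0xFFFFFFFF}, &
               i \mathbin{\&} m &= 3 \\
i = 10:\quad & d = i - L = 0 = \texttt{0x00000000}, &
               m &= d \gg_a 31 = \texttt{0x00000000}, &
               i \mathbin{\&} m &= 0 \\
i = 12:\quad & d = i - L = 2 = \texttt{0x00000002}, &
               m &= d \gg_a 31 = \texttt{0x00000000}, &
               i \mathbin{\&} m &= 0
\end{align*}
```

Only the within-bounds case (i = 3) yields an all-ones mask and keeps the index; the boundary case i = L and the out-of-bounds case both yield zero.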
In the absence of value prediction (basic assumption 5), Claim 1 guarantees that SABC cannot be bypassed. And Claim 2 shows that SABC prevents out-of-bounds accesses.
Our assumption of lack of general value prediction (1) seems to be true for all commercial microprocessors (we state this assumption clearly nevertheless to avoid any uncertainty) and (2) is needed not only for SABC but for all the previous mitigations except site isolation (Section 2.2). Indeed, as we explain in Section 2.2.1, general value prediction would enable new Spectre variants. The uncertainties caused by value prediction and predicate prediction are qualitatively different: value prediction has not been employed by real systems whereas predicate prediction is employed at least by ARM causing confusion over csel semantics (Section 2.2.2). Because predicate prediction exists, any tightening of conditional instruction specification to rule out predicate prediction would impact existing software. Not surprisingly, ARM and Intel do not tighten the specification of csel and cmov, respectively. In contrast, tightening the specification to rule out value prediction which has not been adopted for over two decades would not impact software.
Extending to Value-based SABC: Value-based SABC is a modest change to address-based SABC (see Figure 5). Applying the mask to the loaded value (line B9) ensures that there is a dependence chain from the unmodified index to the masked value. As discussed before, SABC cannot be bypassed. In general, SABC's code sequence must be protected from being optimized away in later compiler passes (similar to how the 'volatile' declaration preserves memory accesses from being optimized away). However, in practice, we found that no additional protection was needed.
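The value-based variant can be sketched in C in the same hedged spirit. The names are hypothetical, index and length are assumed to be below 2^31, and `>>` on a negative signed value is assumed to be an arithmetic shift (implementation-defined in C). As with any C rendering, the compiler may restructure the code, so this shows only where the mask is applied, not a hardened implementation.

```c
#include <stdint.h>

/* Mask a loaded value: unchanged when index < length, zero otherwise.
 * Assumes index and length are both below 2^31. */
static inline uint32_t sabc_mask_value(uint32_t value, uint32_t index,
                                       uint32_t length)
{
    int32_t diff = (int32_t)(index - length); /* negative iff within bounds */
    uint32_t mask = (uint32_t)(diff >> 31);   /* all ones or all zeros */
    return value & mask;                      /* applied after the load */
}

/* Bounds-checked load where the loaded value, not the address, is masked. */
uint32_t load_value_based(const uint32_t *array, uint32_t length,
                          uint32_t index)
{
    if (index >= length)                      /* original bounds check */
        return 0;
    uint32_t value = array[index];            /* load uses the unmodified index */
    return sabc_mask_value(value, index, length);
}
```

Here the load itself may still touch an out-of-bounds address under misspeculation; the masking only ensures that dependent instructions see zero rather than the loaded value.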
Two-sided Bounds Checks
While some languages and runtimes interpret array indices as unsigned integers, others allow signed indices (e.g., Python's negative array indices [46] ). We now extend SABC in a straightforward manner to two-sided bounds checking for signed indices.
For languages and managed runtimes that do allow negative indices, there must be two original, automatically-inserted bounds-checks to verify that the index lies between the upper and lower bounds. Figure 6(a) shows such two-sided bounds checking code for the high-level code snippet shown in Figure 1. In addition to the upper-bound check to verify that the index is smaller than the length of the array (line P4), Figure 6(a) also includes another check to verify that the index is higher than the lower bound (line P5).
Each of the two bounds-checks can be transformed easily using SABC, as shown in lines Q6 to Q11 in Figure 6(b) . Essentially, SABC's three-instruction protection is applied twice, but the behavior in each case is similar. In each check, SABC ensures that an attempted out-of-bounds access results in the index being masked to zero.
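Applying the three-instruction sequence twice can be sketched as follows. This is not the code of Figure 6(b); it is an assumed formulation with exclusive lower and upper bounds (`lower_excl < index < upper_excl`), a hypothetical function name, bounds small enough that the subtractions cannot overflow, index 0 lying within the bounds, and an arithmetic `>>` on negative values (implementation-defined in C).

```c
#include <stdint.h>

/* Returns index when lower_excl < index < upper_excl, and 0 otherwise.
 * Assumes |index| and both bounds are below 2^30 so that neither
 * subtraction overflows, and that 0 is itself a valid index. */
int32_t sabc_index_two_sided(int32_t index, int32_t lower_excl,
                             int32_t upper_excl)
{
    /* Upper check: index - upper is negative iff index < upper. */
    int32_t hi_mask = (index - upper_excl) >> 31;
    int32_t masked = index & hi_mask;

    /* Lower check: lower - index is negative iff index > lower. */
    int32_t lo_mask = (lower_excl - masked) >> 31;
    return masked & lo_mask;
}
```

For a Python-style array of length 10, one might call `sabc_index_two_sided(i, -11, 10)`, so that indices from -10 through 9 pass unchanged and every other index is masked to zero.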
Manually-inserted Bounds Checks
SABC's three-instruction sequence is not dependent on compiler automation; it works for manually-inserted bounds checks as well, provided the sequence is also manually inserted at each such bounds check. This requirement burdens the programmer only slightly more than the manual bounds checking in the first place. Attempts to automatically find and secure all manually-inserted bounds checks may be inadvisable without proof of comprehensive coverage. For instance, in the case of the Microsoft Visual C++ (MSVC) compiler's /Qspectre switch, lfence is inserted only at those conditional edges that resemble the attack code patterns, and not at every branch, to reduce lfence's performance overhead. However, the compiler analysis turned out to be inaccurate, causing the compiler to catch only a subset of attack patterns [5].
In contrast, while automatically differentiating between risky and non-risky patterns may be hard for ad-hoc manually-inserted checks, every memory access in our context is guaranteed to be bounds checked (via the compiler) without the possibility of missing any access.
Branch Target Buffer Aliasing
To be economical, the branch target buffer (BTB) often uses partial tags and partial payload (target). Only a few tag bits are enough to achieve highly-accurate PC-to-BTB-entry match for prediction. Similarly, the payload contains only enough target bits to index into the I-cache (and the I-TLB) and accurately retrieve the target instruction. For prediction verification, the full target PC can be constructed using the I-cache's tag bits. An attacker could train such a BTB to skip over potentially arbitrary instruction sequence, creating a general and powerful attack independent of Spectre-v1. For instance, a "gadget-based" attack could skip from an address computation in a register to an unrelated load that happens to use the register, independent of bounds-checking and SABC. Such an attack could skip over SABC's masking sequence B3-B6 in Figure 4 (b) by introducing code (1) whose PC is aliased to the PC of B3 or a previous instruction, and (2) whose control-flow target is aliased to B7.
The goal of such skipping is that the load at B9 loads with the original, unmasked address after such skipping. Because the address register holds the unmasked address which the skipped sequence (B3-B6) modifies in place, the skipping will leave the unmasked address intact for the load. One way to address this problem is by ensuring that each instruction in the sequence uses as the destination a temporary register different than the source registers so that there is no in-place modification. With such a sequence (shown in Figure 7), skipping any subset of the masking instructions, including all of them, would cause the execution to use the stale value(s) left behind in the skipped destination registers. Specifically, the load at B9 would then use such a stale value as the address. Initializing the destination registers (e.g., to zero) to force such skipping to access a predestined location (e.g., X[0] in Figure 4(b)) instead of a stale address does not work because such initialization can be skipped as well. Unfortunately, the attack code could create register pressure forcing the reuse of a register, holding a dead value corresponding to an unmasked address, as a destination in SABC's masking code. Then, a skipped execution may use the unmasked address as the load address, sidestepping SABC. A similar argument applies to the load-masking variant of SABC.
Figure 6. Extending SABC to achieve two-sided bounds checking
Figure 7. Guarding against skipping SABC code
An assured solution to such BTB aliasing is to use the full tag and payload in the BTB. While this solution increases the BTB area and power, the BTB is small enough that the net area and power impact would be small (e.g., a 2K-entry BTB using 8-bit tags and 16-bit payloads may need 40-bit tags and 48-bit payloads). We emphasize that the BTB aliasing problem exists independent of Spectre-v1 and SABC, and needs an independent solution.
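As a rough worked example of the storage cost in the 2K-entry case, the following sketch counts only tag and payload bits. The helper name `btb_bits` is hypothetical, and valid bits, replacement state, and any other per-entry metadata are deliberately ignored, so the numbers are a lower-bound estimate rather than a hardware design figure.

```python
def btb_bits(entries, tag_bits, payload_bits):
    """Storage in bits, counting only the tag and payload of each entry."""
    return entries * (tag_bits + payload_bits)


ENTRIES = 2 * 1024  # the 2K-entry BTB from the example

partial_bits = btb_bits(ENTRIES, tag_bits=8, payload_bits=16)
full_bits = btb_bits(ENTRIES, tag_bits=40, payload_bits=48)

partial_kib = partial_bits / 8 / 1024  # 6.0 KiB
full_kib = full_bits / 8 / 1024        # 22.0 KiB
growth = full_bits / partial_bits      # about 3.67x
```

Under these assumptions the BTB grows from about 6 KiB to about 22 KiB, which is small next to typical on-chip cache capacities.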
Evaluation Methodology
We implemented SABC in the V8 JavaScript engine [11] which is the engine used in the Chrome and Opera browsers; specifically in the TurboFan optimizing compiler. TurboFan performs automatic bounds checking for array and string accesses and includes range analysis and redundancy elimination to avoid unnecessary bounds checking.
TurboFan also employs a layered intermediate representation (IR) with low "machine-level" nodes that are close to hardware instructions. Interestingly, TurboFan's machine-level nodes include support for the three operations needed by SABC: subtract, arithmetic right shift, and bitwise AND. We leverage the availability of these nodes to implement SABC directly in this low-level IR. Because IR nodes are translated to any supported backend ISA via TurboFan's built-in ISA-specific instruction selection, our implementation effort is also architecture-independent. SABC's architecture independence does not depend on IR support for the three key instructions; rather, its architecture independence flows from the fact that the ISAs all support the three instructions. Even without IR support, SABC could be implemented in the architecture-specific code generation stage.
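To make the three operations concrete, here is a minimal sketch of how a subtract, an arithmetic right shift, and a bitwise AND can combine into a branch-free index mask. The function names, the 32-bit word width, and the choice of index 0 as the fallback are assumptions for illustration; the sketch is one-sided (it assumes a non-negative index) and is not TurboFan code.

```python
WORD_BITS = 32  # assumed register width for this sketch


def sabc_masked_index(index, length):
    """Branch-free mask; assumes 0 <= index, length < 2**(WORD_BITS - 1)."""
    diff = index - length            # subtract: negative iff index < length
    mask = diff >> (WORD_BITS - 1)   # arithmetic right shift: -1 or 0
    return index & mask              # AND: keeps index, or forces it to 0


def reference_index(index, length):
    """Branchy equivalent, used only to check the masked version."""
    return index if index < length else 0


length = 10
agrees = all(
    sabc_masked_index(i, length) == reference_index(i, length)
    for i in range(0, 1000)
)
```

Because the mask is computed from data rather than from a predicted branch outcome, the masked index does not depend on how the bounds-check branch is predicted.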
In addition to SABC, we also implemented the lfence mitigation in TurboFan.
Evaluation: We compared the performance of SABC with other mitigation approaches (site isolation, lfence, and V8's value-masking mitigation) in a browser context. We use Puppeteer [10] to control a headless Chrome instance to load websites. We focus on the JavaScript script duration (i.e., not including network delays), which is one of the metrics that Puppeteer isolates and reports.
When comparing the compiler-based mitigations (e.g., lfence, V8's value masking, and Secure Automatic Bounds Checking (SABC)) we load a single page in Puppeteer and measure the JavaScript runtime. This single-page measurement is adequate as the mitigations do not impact memory footprint. However, when evaluating the performance overhead of site-isolation, it is necessary to consider the impact of other tabs each of which runs as a separate process (or a group of processes if the site serves content from multiple domains), increasing the memory footprint. To that end, we model a set of 10 background tabs with active JavaScript and measure the script duration of 1 foreground tab to account for the browsing trend of many concurrent tabs.
To account for runtime variations in runs for both the compiler-based mitigations and site isolation, we load each webpage 10 times and report both the mean script duration time as well as the standard deviation (shown as error bars). The script duration times measure only the JavaScript runtimes without including network or rendering times. To reduce the variations due to JITting, we discard the first run, which typically incurs browser cache misses and unJITted interpretation, and keep the browser cache warmed up for the later runs (the browser cache caches the JITted code).
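The per-website summary described above can be sketched as follows. The function name and the sample durations are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize_runs(durations_ms):
    """Drop the first (cold) run, then return (mean, stdev) of the rest."""
    warm = durations_ms[1:]  # first run pays for cache misses and un-JITted code
    return statistics.mean(warm), statistics.stdev(warm)


# Hypothetical script durations in milliseconds for one website.
runs = [250.0, 100.0, 110.0, 90.0]
mean_ms, std_ms = summarize_runs(runs)
```

For the sample values shown, the warm-run mean is 100 ms with a standard deviation of 10 ms; the standard deviation is what would be drawn as the error bar.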
Because the compiler-based mitigations and site isolation use different evaluation techniques (single-page versus multiple concurrent tabs), we show their results separately.
Workloads: We perform our experiments using Alexa Top 50 [20] (as of 18th July 2018), which is available for free. Though our goal was to include all 50 websites from Alexa Top 50, we found that Puppeteer encountered repeated errors and crashes with 14 websites, so we limited our evaluation to the remaining 36 websites. Because SPEC benchmarks have not been shown to be susceptible to Spectre-v1, they are not appropriate for our purposes. Also, Alexa Top 50 represents a workload used by billions in the real world, which is a better target than JavaScript benchmarks.
Testbed: Our measurements were conducted on a laptop with an Intel Core i7-3635QM processor and 8 GB of DDR3 memory running Windows 10 (a representative consumer browsing configuration). We chose not to make our measurements on more aggressive systems, which are not representative of the vast majority of browser users.
Results
We first compare the performance of compiler-based mitigations using lfence, V8's load value masking, and SABC. We then show the performance impact of site isolation. Figure 8 shows the JavaScript runtimes normalized to that of no mitigation (Y-axis) for each of our 36 of the Alexa Top 50 Websites (groups of bars numbered 1 to 36 on the X-axis) and for the mitigation approaches based on lfence, V8's value masking (Figure 3), and SABC (bars within each group). The X-axis order preserves the relative ordering of Alexa Top 50 (i.e., a website with a lower X-axis label is more popular than one with a higher X-axis label), but the website numbering may not represent the website's true Alexa rank because of the 14 dropped websites. Figure 8 also shows the typical variation by including one-standard-deviation error ranges from our 10-run measurements.
Compiler-based Mitigations
The key observation from Figure 8 is that the various mitigation approaches all have negligible impact on website performance. The masking/barrier techniques are all indistinguishable, as indicated by the error ranges. Because V8's value masking and SABC both use lightweight instruction sequences, it is not surprising that they are similar to the no-mitigation case. However, lfence is a rather heavyweight mechanism that is documented to have higher performance overheads when applied indiscriminately to all branches in microbenchmarks [5]. In this experiment, we apply lfence only to the bounds-check branches and not the other branches. We believe that the performance behavior is due to website access patterns (which may not be heavy with indexed accesses of arrays/strings) and significant overhead incurred by JavaScript dynamic typing (which may reduce performance and thus reduce lfence's relative impact). For instance, we found that a simple loop with a single array access has only a few tens of instructions when compiled from C but many hundreds of instructions when compiled from JavaScript.
For some websites, most notably websites #17 and #26, the no-mitigation case appears to be slower than some or all of the mitigations. For all such websites, however, the error ranges of all the schemes overlap so considerably that the differences are statistically insignificant.
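The overlap criterion used in this comparison can be sketched as a simple interval check. The function name and the sample numbers are hypothetical; overlapping one-standard-deviation ranges is treated here as the informal criterion the text applies, not as a formal significance test.

```python
def error_ranges_overlap(mean_a, std_a, mean_b, std_b):
    """True if [mean - std, mean + std] of the two schemes intersect."""
    low_a, high_a = mean_a - std_a, mean_a + std_a
    low_b, high_b = mean_b - std_b, mean_b + std_b
    return low_a <= high_b and low_b <= high_a


# Hypothetical normalized runtimes: no mitigation vs. one mitigation.
apparent_speedup_is_noise = error_ranges_overlap(1.00, 0.05, 0.97, 0.04)
clear_slowdown = not error_ranges_overlap(1.00, 0.01, 1.30, 0.02)
```

With the first pair of sample values the ranges intersect, so an apparent 3% speedup would be attributed to run-to-run variation.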
Note that SABC is not meant to be faster; rather our key claim is that while being secure and architecture-independent SABC is not slower than the no-mitigation case. Recall from Section 2.2 that the other mitigations are architecture-dependent, have uncertain correctness behavior due to incomplete specification, or both.
Finally, we found that the code bloat for the standalone JavaScript benchmarks/microbenchmarks included in the V8 distribution was less than 1%, on average (there are no tools to inspect the JavaScript code of the Alexa Top 50 websites).
Performance Impact of Site Isolation
Site-isolation leverages OS-protection of process boundaries to prevent Spectre attacks. While we have previously discussed the qualitative reasons why site-isolation does not remove the need for fast, efficient bounds checking (Section 2.2.4), we focus on the performance impact of site isolation in this section.
Site isolation fundamentally increases the number of processes needed (typically, from one to a few tens). Indirectly, the increase in the number of processes places pressure on the memory hierarchy. If the pressure on the memory hierarchy increases beyond (cache or memory) capacity, there will be thrashing, which can hurt performance. Figure 9 shows the memory bloat due to site isolation (relative to the memory used without site isolation). Here, we use 10 concurrent background tabs and one foreground tab (Section 4). We measure the memory bloat for all the tabs together. We use the same background tabs as we vary the foreground tab chosen from our 36 Alexa Top 50 Websites (X-axis). Across the board, site isolation uses 1.8x the memory (geometric mean) of the configuration without site isolation.
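The geometric-mean bloat can be computed as sketched below. The function names and the sample footprints are hypothetical; each ratio is the total memory of all tabs with site isolation divided by the total without it, for one choice of foreground website.

```python
import math


def geometric_mean(ratios):
    """Geometric mean via the mean of logarithms; ratios must be positive."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def memory_bloat(with_isolation_mb, without_isolation_mb):
    """Geometric mean of per-website memory ratios (isolation / baseline)."""
    ratios = [w / wo for w, wo in zip(with_isolation_mb, without_isolation_mb)]
    return geometric_mean(ratios)


# Hypothetical total footprints (MB) for two foreground websites.
bloat = memory_bloat([200.0, 800.0], [100.0, 200.0])  # ratios 2.0 and 4.0
```

The geometric mean is the usual choice for averaging ratios because it is not dominated by a single website with an unusually large footprint.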
Memory bloat is not free of consequences. Especially in typical client machines, the memory increase can cause thrashing and consequent performance degradation. Figure 10 plots the normalized JavaScript runtime (Y-axis) for the foreground tab loading each of our 36 Alexa Top 50 Websites (groups of bars on the X axis). Further, we also show the geometric mean of all websites in the rightmost set of bars. We include error ranges corresponding to one standard deviation for each individual website.
On average, site isolation has a significant impact on performance, with a 30% mean slowdown and worst-case slowdowns as high as (approximately) 2.2x. As in Figure 8, here also we see some websites for which site isolation appears to be slightly faster than no isolation (e.g., websites #4, #5, #14, and #25). However, as before, the overlap in the error ranges rules out any statistical significance of such speedups. Further, site isolation also incurs high variation in JavaScript runtime (as shown by the long error ranges). Note that beyond the 30% slowdown in mean JavaScript runtime, high variation is also bad for user experience as it implies that browsing speed varies significantly from run to run.
Security Validation
To ensure that SABC is effective, we applied its masking approach to a JavaScript-based, proof-of-concept (PoC) Spectre attack implementation [42] and verified the result. The PoC implementation does not bypass true compiler-inserted bounds checks; rather, it demonstrates that user-inserted bounds checks for array bounds can be bypassed. Because SABC is developed as a mechanism for automatic bounds checking and not manually inserted bounds checking, it is not possible to use our compiler implementation to secure the PoC implementation using SABC. As such, we manually inserted SABC into the PoC code. We confirmed that the unmodified PoC code could access and leak secrets. In contrast, the PoC code did not leak any secrets when secured with SABC.
While the original Spectre-v1 [27] uses a cache-miss-based side channel, we devised, implemented, and verified a way-prediction-based and a cache-coherence-based side channel to illustrate that plugging a specific side channel, as done in previous work [29, 37, 44, 49], is insufficient. We do not show the details of the new side channels due to lack of space.
Related Work
We have discussed key related approaches to masking and isolation throughout the paper. In addition, we briefly discuss some related work that was not previously described.
Cryptographic computations avoid conditional branches, using arithmetic/logic instructions instead, to avoid information leaks via timing differences in the 'if' and 'else' paths [1, 6, 41]. SABC avoids prediction, unrelated to timing. SgxPectre [7], a Spectre variant which breaches SGX enclaves running in the same address space as the victim process, relies on branch-target injection (like Spectre-v2) to breach SGX and is not related to bounds checking, our focus.
In contrast to our approach of preventing the forbidden access, recent hardware work attempts to plug specific side channels from leaking the secret. Recall that all side-channels must be closed for an architecture to be secure.
InvisiSpec [49] prevents all cache-based side channels through loads (both hits and misses) in cache-coherent shared memory. However, InvisiSpec does not prevent attacks based on speculative stores (e.g., SpectrePrime [45], which can leak micro-architectural state using coherence) or non-cache-based side channels (e.g., AVX side channels [39] or memory side channels [36]). Further, InvisiSpec incurs two shared memory accesses for every load and requires significant changes to the coherence protocol, degrading performance, energy and complexity.
Figure 10. Performance Impact of Site Isolation
Conditional Speculation [29] prevents cache-miss based side channels by stalling speculative cache misses in a specific code pattern (loads dependent on loads). The scheme would fail to prevent SpectrePrime-like attacks [45] based on speculative stores and would incur false positives and performance loss in memory-intensive workloads. The scheme changes the hazard logic, issue queue, load-store queue, and cache, degrading complexity and energy.
Context-Sensitive Fencing [44] reduces the performance impact of fences by preventing speculative cache hits from changing any metastate and converting speculative cache misses into uncacheable accesses. In addition to being susceptible to memory-based side channels, the scheme does not specify its interaction with coherence and may incur the issues faced by InvisiSpec.
Another scheme [37] prevents conflict-based side channels across different processes by encrypting the cache index to randomize the address-to-set mapping. To be leak-proof, the scheme continually changes the encryption keys, keeping two keys active at a time. However, this scheme incurs two cache lookups for every access, degrading performance and energy and does not address Spectre-v1.
DAWG [26] proposes to isolate the cache state (and metastate) of different protection domains to eliminate the cache-based side channel. However, DAWG's protection domains, which are tied to processes, cannot handle single-process Spectre-v1. Further, the authors concede that there may be other side channels in processors (e.g., [47]) to leak data.
VirtualGhost [9] proposes a compiler-based solution to check if the upper address bits of a memory page match a statically-known bit-pattern. While this scheme blocks certain shared-page based Spectre-v1 attacks, it does not address bounds checking.
Instead of the performance and complexity overheads of the above schemes which plug some side channels but not others, Secure Automatic Bounds Checking (SABC) employs a simple three-instruction sequence to prevent Spectre-v1's forbidden access in the first place.
Conclusion
In the browser-based Spectre attack called Spectre-v1, because the data can be leaked through numerous side channels, all of which must be plugged, it is more practical to prevent the access in the first place. We proposed a compiler-based mitigation, called Secure Automatic Bounds Checking (SABC), to prevent forbidden access in Spectre-v1 using a sequence of three simple arithmetic and logical instructions which have straightforward semantics and are architecture-independent (i.e., they are found in all 32- and 64-bit architectures). The architecture independence implies that only one sequence of instructions needs to be verified, and the simple semantics implies certainty of behavior. This simplicity is key for security given that the complexity of speculative execution is the root cause of Spectre. In addition to architecture independence and assured semantics, our mitigation incurs little performance overhead over a baseline with no mitigation. In contrast, site isolation incurs high memory (1.8x) and performance (30%) overheads. By preventing Spectre-v1's forbidden access, SABC renders current and future side channels useless for Spectre-v1.
