The recent Spectre attacks have demonstrated the fundamental insecurity of current computer microarchitecture. The attacks use features like pipelining, out-of-order execution, and speculation to extract arbitrary information about the memory contents of a process. A comprehensive formal microarchitectural model capable of representing the forms of out-of-order and speculative behavior that can meaningfully be implemented in a high-performance pipelined architecture has not yet emerged. Such a model would be very useful, as it would allow the existence and non-existence of vulnerabilities, and the soundness of countermeasures, to be formally established.
I. INTRODUCTION
The wealth of vulnerabilities that have followed on from Spectre and Meltdown [1], [2] has provided ample evidence of the fundamental insecurity of current computer microarchitecture. The extensive use of instruction-level parallelism in the form of out-of-order and speculative execution has produced designs with side channels that can be exploited by attackers to learn sensitive information about the memory contents of a process. One witness of the subtlety of the issues is that more than 50 years have passed since pipelining, caching, and out-of-order execution (cf. IBM S/360) were first introduced.
Another witness is the fact that two years after the discovery of Spectre, a comprehensive understanding of the security implications of pipeline-related microarchitecture features has yet to emerge. One result is the ongoing arms race between researchers discovering new Spectre-related vulnerabilities [3] and CPU vendors providing patches followed by informal arguments [4]. The security and effectiveness of the currently proposed countermeasures are unknown, and new vulnerabilities that exploit specific microarchitecture features keep appearing. We believe that this is due to the lack of comprehensive hardware models that capture the relevant microarchitecture features underpinning these vulnerabilities.
Recent research [5]-[8] proposes the use of formal microarchitectural models and suggests using information flow analysis to identify information leaks arising from speculative execution in a principled manner. These models capture specific speculation features, e.g., branch prediction, and variants of Spectre, in particular variant 1, and the accompanying analyses detect known attacks [5], [6], [9]. While these approaches illustrate the usefulness of formal models in analyzing microarchitecture leaks, features lying at the heart of modern CPUs such as out-of-order execution and many forms of speculation remain largely unexplored, implying that new vulnerabilities may still exist.
Contributions: This work presents InSpectre, the first comprehensive model capable of capturing out-of-order execution and all forms of speculation that can be meaningfully implemented in the context of a high performance pipeline. The model is intentionally very general and provides an infrastructure to define models of real CPUs (Section III).
Our first contribution is a novel semantics supporting modern microarchitectural features such as out-of-order execution, non-atomicity of instructions, and various forms of speculation, including branch prediction, jump target prediction, return address prediction, and dependency prediction. Additionally, the semantics supports intricate features such as address aliasing, dynamic references, store forwarding, and out-of-order memory commits, which are necessary to model all known (sub-)variants of Spectre. Specifically, the semantics implements the stages of an abstract pipeline supporting out-of-order execution (Section IV) and speculative execution (Section V). We validate the semantics by proving memory consistency with respect to the standard program-order execution, and show that InSpectre can reproduce all four variants of Spectre. In line with existing work [5], [6], the security condition formalizes the intuition that speculation and optimizations should not introduce additional information leaks in a program (conditional noninterference, Section II).
As a second contribution, we use InSpectre to discover three new potential vulnerabilities related to out-of-order execution, speculative fetching, and speculation of dependencies (Section VI). The first attack shows that CPUs supporting only out-of-order execution may leak sensitive information. We discovered the second attack while attempting to validate a CPU vendor's claim that microarchitectures like Cortex A53 are immune to Spectre vulnerabilities because they support only speculative fetching [4]. Our model reveals that this may not be the case. The third vulnerability is a variant of Spectre v4 showing that speculation of a dependency, rather than speculation of a non-dependency as in Spectre v4, between a load and a store operation may also leak sensitive information.
arXiv:1911.00868v1 [cs.CR] 3 Nov 2019
Finally, as a third contribution, we leverage InSpectre to analyze the effectiveness of three existing countermeasures: constant time [10], Retpoline [11], and ARM's Speculative Store Bypass Safe [12] (Section VII). We found that constant-time analysis is unsound for CPUs supporting either out-of-order execution or speculation, and propose a provably secure fix that enables constant-time analysis to ensure security for processors that support only out-of-order execution.
The detailed proofs are reported in a technical report [13], and their sketches are reported in the Appendix.
II. SECURITY MODEL
Our security model has the following ingredients: (i) an execution model, which is given by the execution semantics of a program; (ii) an attacker model specifying the observations of an attacker; (iii) a security policy specifying the parts of the program state that contain sensitive/high information and the parts that contain public/low information; (iv) a security condition capturing a program's security with respect to an execution model, an attacker model, and a security policy. We define each of these ingredients in the remainder of this section.
First, we consider a general model of attacker that observes the interaction between the CPU and the memory subsystem. This model has been proposed to capture information leaks via cache-based side channels transparently without an explicit cache model. It captures trace-driven attackers that can interleave with the victim's execution and indirectly observe, for instance using Flush+Reload [14] , the victim's cache footprint via latency jitters.
Specifically, an attacker can observe the address of a memory load dl v (data load from memory address v), the address of a memory store ds v (data store to memory address v), as well as the value of the program counter il v (instruction load from memory address v) [15] . We model observations as part of the microarchitectural semantics in Section IV.
We assume a transition relation $\rightarrow\ \subseteq States \times Obs \times States$ to model the execution semantics of a program as a state transformer producing observations $l \in Obs$. The reflexive and transitive closure of $\rightarrow$ induces a set of executions $\pi \in \Pi$. The function $trace : \Pi \to Obs^*$ extracts the sequence of observations of an execution.
The security policy is defined by an indistinguishability relation ∼⊆ States × States on (program) states σ ∈ States. Two states σ 1 , σ 2 are indistinguishable if σ 1 ∼ σ 2 . The relation ∼ determines the security of information that is initially stored in a state, modeling the set of initial states that an attacker is not allowed to discriminate by making observations during a program execution. These states represent the initial uncertainty of an attacker about sensitive information.
The security condition defines the security of a program on a target execution model (e.g., a speculative model) $\rightarrow_t$ conditionally on the security of the same program on a reference execution model (e.g., an in-order model) $\rightarrow_r$, by requiring that the target model does not leak more information than the reference model for a security policy $\sim$.
Definition 1 (Conditional Noninterference): Let $P$ be a program and $\sim$ a security policy. Let also $\rightarrow_t$ and $\rightarrow_r$ be transition relations for the target execution model and the reference execution model, respectively. Then $P$ is conditionally noninterferent iff for all $\sigma_1, \sigma_2 \in States$ such that $\sigma_1 \sim \sigma_2$: if for every $\pi_1 = \sigma_1 \rightarrow_r \cdots$ there exists $\pi_2 = \sigma_2 \rightarrow_r \cdots$ such that $trace(\pi_1) = trace(\pi_2)$, then for every $\rho_1 = \sigma_1 \rightarrow_t \cdots$ there exists $\rho_2 = \sigma_2 \rightarrow_t \cdots$ such that $trace(\rho_1) = trace(\rho_2)$.
Observe that conditional noninterference only considers the new information leaks that may be introduced by executing the program with transition relation $\rightarrow_t$, and it ignores any leaks that might have been present when executing the program with transition relation $\rightarrow_r$. Appendix B elucidates the advantages of such a definition as compared to standard noninterference.
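To make the shape of Definition 1 concrete, the following is a minimal sketch over a finite toy system. It assumes that each execution model is given as a function from an initial state to a finite set of observation traces, so that "for every execution from the first state there is an execution from the second with the same trace" becomes a subset test. The toy states, the three models, and all function names are hypothetical and not part of InSpectre.

```python
def conditionally_noninterferent(states, related, ref_traces, tgt_traces):
    """Check the implication of Definition 1 on a finite toy system.

    ref_traces / tgt_traces map a state to its set of observation traces
    (an assumption of this sketch; real executions need not be finite).
    """
    for s1 in states:
        for s2 in states:
            if not related(s1, s2):
                continue
            ref_ok = ref_traces(s1) <= ref_traces(s2)  # premise on the reference model
            tgt_ok = tgt_traces(s1) <= tgt_traces(s2)  # conclusion on the target model
            if ref_ok and not tgt_ok:
                return False
    return True


# Hypothetical toy system: a state is a pair (secret, public).
STATES = [(s, p) for s in (0, 1) for p in (0, 1)]


def low_equal(a, b):
    """Security policy: states agreeing on the public part are indistinguishable."""
    return a[1] == b[1]


def in_order(state):
    _secret, public = state
    return {(("dl", public),)}                    # only a public address is loaded


def speculative_leaky(state):
    secret, public = state
    return {(("dl", public), ("dl", secret))}     # an extra secret-dependent load


def in_order_leaky(state):
    secret, _public = state
    return {(("dl", secret),)}                    # already leaks in the reference model
```

With these definitions, the leaky target fails the condition against the non-leaky reference, while a program that already leaks in the reference model satisfies the condition vacuously, which mirrors the remark that only new leaks are considered.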
III. FORMAL MICROARCHITECTURAL MODEL
We introduce a Machine Independent Language (MIL) which we use to define the semantics of microarchitectural features such as out-of-order and speculative execution. We use MIL as a form of abstract microcode language: A target language for translating ISA instructions, and reasoning about features that may cause vulnerabilities like Spectre. Microinstructions in MIL represent atomic actions that can be executed by the CPU, emulating the pipeline phases in an abstract manner. This model is intentionally very general and provides an infrastructure to define models of real microarchitectures.
We consider a domain of values $v \in V$, a program counter $pc \in PC$, a finite set of register/flag identifiers $r_0, \ldots, r_n, f, z \in R \subseteq V$, and a finite set of memory addresses $a_0, \ldots, a_m \in M \subseteq V$. The language can be easily extended to support other types of resources, like registers for vector operations. We assume a total order $<\ \subseteq N \times N$ over a set of names $t_0, t_1, \ldots \in N$, which we use to uniquely identify microinstructions. We write $N_1 < N_2$ if for every pair $(t_1, t_2) \in N_1 \times N_2$ it holds that $t_1 < t_2$.
Microinstructions ι ∈ I are conditional atomic single assignments. A microinstruction ι = t ← c?o is uniquely identified by its name t ∈ N and consists of a boolean guard c, which determines if the assignment should be executed, and an operation o ∈ Op. A MIL program I is a set of microinstructions $t_i \leftarrow c_i ? o_i$ ∈ I. The MIL language has three types of operations:
\[ e ::= v \mid t \mid e_1 + e_2 \mid e_1 > e_2 \mid \cdots \qquad\qquad o ::= e \mid ld\ \tau\ t_a \mid st\ \tau\ t_a\ t_v \]
An internal operation $e$ is an expression over standard finite arithmetic and can additionally refer to names in N and values in V. A resource load operation $ld\ \tau\ t_a$, where $\tau \in \{PC, R, M\}$, loads the value of resource $\tau$ addressed by $t_a$. We support three types of resources: the program counter PC, registers R, and memory locations M. A resource store operation $st\ \tau\ t_a\ t_v$ uses the value of $t_v$ to update the resource $\tau$ addressed by $t_a$.
The free names fn(ι) of an instruction ι = t ← c?o is the set of names occurring in c or o, the bound names, bn(ι), is the singleton {t}, and the names n(ι) is fn(ι) ∪ bn(ι).
To model the internal state of a CPU pipeline, we can translate an ISA instruction as multiple microinstructions. For a given ISA instruction at address $v \in M$ and a name $t \in N$, the function $translate(v, t)$ returns the MIL translation of the instruction at address $v$, ensuring that the names of the microinstructions thus generated are greater than $t$. Because we assume code to not be self-modifying, an instruction can be statically identified by its address in memory. We assume that the translation function satisfies the following properties for all $\iota, \iota_1, \iota_2 \in translate(v, t)$: (i) if $\iota_1 \neq \iota_2$ then $bn(\iota_1) \cap bn(\iota_2) = \emptyset$, (ii) $fn(\iota) < bn(\iota)$, (iii) $\{t\} < n(\iota)$.
These properties ensure that names uniquely identify microinstructions, the name parameters of a single instruction form a Directed Acyclic Graph, the translated microinstructions are assigned names greater than t, and the translation of two different ISA instructions does not have direct interinstruction dependencies (but may have indirect ones).
A. MIL Program Examples
We introduce some illustrative examples of MIL programs, using their graph representation. For clarity, we omit conditions whenever they are true and visualize only the immediate dependencies between graph elements. We report the full list of examples in Appendix A.
Consider an ISA instruction that increments the value of register r 1 by one, i.e., r 1 := r 1 +1. The instruction can be translated in MIL as follows: t 1 ← r 1 , t 2 ← ld R t 1 , t 3 ← t 2 + 1, t 4 ← st R t 1 t 3 , t 5 ← ld PC , t 6 ← t 5 + 4, t 7 ← st PC t 6
Intuitively, $t_1$ refers to the identifier of target register $r_1$, $t_2$ loads the current value of register $r_1$, $t_3$ executes the increment, and $t_4$ stores the result of $t_3$ in the register store. The translation of an ISA instruction also updates the program counter to enable the execution of the next instruction. In this case, the program counter is increased by 4, unconditionally. Notice that we omit the program counter's address, since there is only one such resource. We can graphically represent this set of microinstructions using the following graph:
[Figure: graph of the microinstructions $t_1, \ldots, t_7$ and their immediate dependencies]
In the following we adopt syntactic sugar to use expressions, in place of names, for the address and value of load and store operations. This can be eliminated by introducing the proper intermediary internal assignments. This permits rewriting the previous example as:
$t_2 \leftarrow ld\ R\ r_1$, $t_4 \leftarrow st\ R\ r_1\ (t_2 + 1)$, $t_5 \leftarrow ld\ PC$, $t_7 \leftarrow st\ PC\ (t_5 + 4)$
The translation of multiple ISA instructions results in disconnected graphs. This reflects the fact that inter-instruction dependencies may not be statically identified due to dynamic references and must be dynamically resolved by the MIL semantics. When translating multiple instructions, we use the following convention for generated names: the name $t_{ij}$ identifies the $j$-th microinstruction resulting from the translation of the $i$-th instruction. Our convention induces a total (lexicographical) order over names (i.e., $t_{ij} < t_{i'j'}$ iff $(i < i') \vee (i = i' \wedge j < j')$), which respects the properties of the translation function.
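The naming convention and the three translation properties can be illustrated with a small sketch. It assumes names $t_{ij}$ are encoded as Python tuples `(i, j)`, whose built-in comparison is lexicographic, and it records only the free names of each microinstruction. The `Micro` class and the `translate_increment` and `well_formed` helpers are hypothetical illustrations, not the paper's `translate(v, t)` function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Micro:
    name: tuple      # bound name t_ij, encoded as (i, j)
    guard: tuple     # free names occurring in the guard c (empty when c is true)
    op: str          # printable form of the operation o
    operands: tuple  # free names occurring in o

    def fn(self):
        """Free names of the microinstruction."""
        return set(self.guard) | set(self.operands)


def translate_increment(i):
    """Hypothetical translation of the i-th ISA instruction r1 := r1 + 1."""
    def t(j):
        return (i, j)
    return [
        Micro(t(1), (), "r1", ()),
        Micro(t(2), (), "ld R t1", (t(1),)),
        Micro(t(3), (), "t2 + 1", (t(2),)),
        Micro(t(4), (), "st R t1 t3", (t(1), t(3))),
        Micro(t(5), (), "ld PC", ()),
        Micro(t(6), (), "t5 + 4", (t(5),)),
        Micro(t(7), (), "st PC t6", (t(6),)),
    ]


def well_formed(micros, t):
    """Check properties (i)-(iii) of the translation for a base name t."""
    names = [m.name for m in micros]
    unique = len(names) == len(set(names))                          # (i)
    ordered = all(f < m.name for m in micros for f in m.fn())       # (ii)
    fresh = all(t < n for m in micros for n in m.fn() | {m.name})   # (iii)
    return unique and ordered and fresh
```

For example, translating the second instruction after a first instruction whose last name is `(1, 7)` satisfies all three properties, whereas reusing names that are not greater than the base name violates property (iii).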
MIL is expressive enough to support conditional instructions like conditional arithmetic and conditional move. Conditional branches can be modeled in MIL via microinstructions that are guarded by complementary conditions. For instance, the beq a instruction, which jumps to address a if the z flag is set, can be translated as follows:
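The idea of complementary guards can be sketched as follows. This is only an assumed encoding, not the paper's translation: it supposes that a name `t1` holds the loaded z flag, that `t2` holds the loaded program counter, and that the fall-through target is the current program counter plus 4, as in the increment example. All names here are hypothetical.

```python
def beq_pc_stores(a):
    """Two PC stores with complementary guards (hypothetical encoding).

    Each entry is (name, guard, value), where guard and value are
    functions of the storage mapping names to values.
    """
    return [
        ("t3", lambda s: s["t1"] == 1, lambda s: a),            # branch taken
        ("t4", lambda s: s["t1"] != 1, lambda s: s["t2"] + 4),  # fall through
    ]


def next_pc(a, z, pc):
    """Return the single PC store whose guard holds, with the value it writes."""
    storage = {"t1": z, "t2": pc}
    enabled = [(name, value(storage))
               for name, guard, value in beq_pc_stores(a)
               if guard(storage)]
    assert len(enabled) == 1  # complementary guards: exactly one store is enabled
    return enabled[0]
```

Because the guards are complementary, exactly one of the two program counter stores can be executed once the flag is known, and the other one is discarded.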
IV. OUT-OF-ORDER SEMANTICS
A. States, Transitions, Observations
We formalize the semantics via a transition relation $\sigma \overset{l}{\twoheadrightarrow} \sigma'$, which maps a state $\sigma \in States$ to a state $\sigma' \in States$ and produces a (possibly empty, written $\cdot$) observation $l \in Obs$; we elide the dot when convenient. As in Section II, we define $Obs = \{\cdot,\ dl\ v,\ ds\ v,\ il\ v\}$ to capture the attacker model.
A state $\sigma \in States$ is a tuple $(I, s, C, F)$ where: (i) $I$ is a set of MIL microinstructions, (ii) $s \in Stores = N \rightharpoonup V$ is a (partial) storage function from names to values recording microinstruction execution results, (iii) $C \subseteq N$ is a set of names of store operations that have been committed to the memory subsystem, (iv) $F \subseteq N$ is a set of names of program counter store operations that have been processed, thus fetching and decoding the corresponding ISA instruction.
In the following we write $s[t \mapsto v]$ for the substitution of value $v$ for name $t$ in store $s$. We use $f(x){\downarrow}$ to represent that the partial function $f$ is defined on $x$, and $f(x){\uparrow}$ if not $f(x){\downarrow}$. We write $dom(f)$ for the domain of a partial function $f$. We also use $f|_D$ to represent the restriction of function $f$ to domain $D$. The semantics of expressions is $[e] : Stores \rightharpoonup V$ and is defined as expected. An expression is undefined iff at least one of its names is undefined in the storage, i.e., $[e](s){\uparrow} \Leftrightarrow fn(e) \not\subseteq dom(s)$. Let $\sigma = (I, s, C, F)$; we use $[e]\sigma$ for $[e]s$, $\sigma(t){\uparrow}$ for $s(t){\uparrow}$, and $\iota \in \sigma$ for $\iota \in I$.
B. Microinstruction Lifecycle
A microinstruction is able to move to state Executed if its guard evaluates to true and all dependencies have been executed. Subsequently, an Executed store microinstruction can either be committed to the memory subsystem (Committed, $t \in C$), or update the program counter value, thus fetching and decoding a new ISA instruction (Fetched, $t \in F$). The transition of a program counter update to state Fetched spawns newly decoded microinstructions (i.e., the translation of the subsequent ISA instruction) in state Decoded. The labels of the edges in the lifecycle diagram correspond to the names of the transition rules of Section IV-D. Finally, $\iota$ is completed in state $\sigma$ (written $C(\sigma, \iota)$) if its lifecycle is complete, i.e., it is either discarded, committed (if it is a memory store), fetched (if it is a PC store), or executed otherwise.
C. Semantics of Single Microinstructions
The semantics is defined in two steps: we first define the semantics of single microinstructions, then introduce the operational semantics of MIL programs. The semantics of a microinstruction [ι] : States → V ×Obs returns a value and an observation. The semantics is undefined if the microinstruction cannot be executed in a state σ = (I , s, C , F ). (Internal operations) The semantics of internal operations is straightforward:
An internal operation can be executed as soon as its dependencies are available. In Example 1, the semantics of internal operation t 1 is defined for the empty storage ∅, since it does not refer to any names. However, the semantics of t 3 is undefined in ∅, since it depends on the value of t 2 that is not available in ∅. (Store operations) The semantics of store operations is defined as follows:
A resource update can be executed as soon as both the address of the resource and the value are available. Observe that this rule models the internal execution of a resource update and not its commit to the memory subsystem. These internal updates are not observable by a programmer, therefore there is no restriction on their execution order. For example, the ISA program r 1 :=0; r 2 :=r 1 ; r 1 :=1 can be implemented by the following microinstructions:
The semantics of t 11 , i.e., st R r 1 0, and t 31 , i.e., st R r 1 1, is defined in ∅, and yields (0, ·) and (1, ·), respectively. As we will see, the operational semantics is in charge of ordering resource updates to preserve consistency and dependencies. (Load operations) While the semantics of internal operations and store operations only depends on the execution of their operands, load operations may depend on past store operations. This requires identifying the previous resource update that determines the correct value to be loaded. We use the following definitions to compute the set of store operations that may affect a load operation.
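Definition 2: Let $t \leftarrow c?\,ld\ \tau\ t_a$ be a load and $\sigma = (I, s, C, F)$.
\[ str\text{-}may(\sigma, t) = \{\, t' \leftarrow c'?\,st\ \tau\ t'_a\ t'_v \in I \mid t' < t \wedge ([c']\sigma \vee [c']\sigma{\uparrow}) \wedge (\sigma(t'_a) = \sigma(t_a) \vee \sigma(t'_a){\uparrow} \vee \sigma(t_a){\uparrow}) \,\} \]
is the set of stores that may affect the load address of $t$.
\[ str\text{-}act(\sigma, t) = \{\, t' \leftarrow c'?\,st\ \tau\ t'_a\ t'_v \in str\text{-}may(\sigma, t) \mid \neg\exists\, t'' \leftarrow c''?\,st\ \tau\ t''_a\ t''_v \in str\text{-}may(\sigma, t).\ t'' > t' \wedge [c'']\sigma \wedge (\sigma(t''_a) = \sigma(t_a) \vee \sigma(t''_a) = \sigma(t'_a)) \,\} \]
is the set of active stores.
The stores that may affect the address of $t$ are the stores that: (i) have not been discarded, namely they can be executed ($[c']\sigma$) or may be executed ($[c']\sigma{\uparrow}$), and (ii) the store address in $t'_a$ may result in the same address as the load address in $t_a$, namely either they both evaluate to the same address ($\sigma(t'_a) = \sigma(t_a)$), or the store address is unknown ($\sigma(t'_a){\uparrow}$), or the load address is unknown ($\sigma(t_a){\uparrow}$).
The active stores of $t$ are the stores that may affect the load address computed by $t_a$ and for which there are no subsequent stores $t''$ on the same address as the load address in $t_a$, or on the same address as the store address in $t'_a$. This set determines the "minimal" set of store operations that may affect a load operation from address $t_a$.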
The definitions of str-act(σ, t) and str-may(σ, t) are naturally extended to stores $t \leftarrow c?\,st\ \tau\ t_a\ t_v$. These definitions allow us to define the semantics of loads:
A load operation can be executed if the set of active stores consists of a singleton set with bound name t s , i.e., the store causing t a to be assigned is uniquely determined, and both the address t a of the load and the address t s of the store can be evaluated in a state σ.
Moreover, the semantics allows forwarding the result of a store to another microinstruction before it is committed to memory. In fact, if the active store is yet to be committed to memory, i.e., $t_s \notin C$, it is possible for the store to forward its data to the load without causing an interaction with the memory subsystem (i.e., $l = \cdot$). Otherwise, the load yields an observation of a data load from address $\sigma(t_a)$.
Example 3 illustrates the semantics of loads. The program writes 1 into address 1, then writes 2 into address 0, overwrites address 1, and finally loads from address 1. We use active stores to dynamically compute the dependencies of load operations. Let $\sigma_0$ be a state containing the microinstructions of the example and having empty storage. For this state, the active stores for the load microinstruction $t_{42}$, i.e., $str\text{-}act(\sigma_0, t_{42})$, consist of all store microinstructions of the example, as depicted by the solid rectangle. Since none of the microinstructions that compute the addresses have been executed (i.e., the storage is empty), the address $t_{41}$ of the load is unknown; hence, we cannot exclude any store from affecting the address that will be used by $t_{42}$. Therefore, the load cannot be executed in $\sigma_0$. This set of active stores will shrink during execution as more information becomes available through the storage.
Let now the storage of σ 1 be {t 11 → 1; t 31 → 1}, i.e., the result of executing microinstructions t 11 and t 31 . The active stores str-act(σ 1 , t 42 ) consist of microinstructions depicted by the dashed rectangle. Observe that the store denoted by t 12 is in str-may(σ 1 , t 42 ), however there exists a subsequent store, namely t 32 , that overrides the effects of t 12 on the same memory address. Therefore, t 12 is no longer an active store and it can safely be discarded.
Let the storage of σ 2 be {t 11 → 1; t 31 → 1, t 41 → 1}, i.e., the result of executing t 11 , t 31 and t 41 . The active stores str-act(σ 2 , t 42 ) now consist of the singleton set {t 32 } as depicted by the dotted rectangle. This is because the address t 41 of the load can be computed in state σ 2 . Although t 22 is still in str-may(σ 2 , t 42 ), there is a subsequent store, t 32 , that will certainly affect the address of the load. Therefore, t 22 is no longer an active store.
Finally, let the storage of σ 3 be {t 11 → 1; t 31 → 1, t 41 → 1, t 32 → 1}, i.e., the result of executing t 11 , t 31 , t 41 , and t 32 .
Once str-act has been reduced to a singleton set ($\{t_{32}\}$) and the active store has been executed ($\sigma_3(t_{32}){\downarrow}$), the semantics of the load is defined. This yields the same value as the store in $t_{32}$. If the store $t_{32}$ has been committed to the memory, the execution of the load yields the observation $dl\ 1$.
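The way the set of active stores shrinks in Example 3 can be replayed with a short sketch. It assumes that all guards are true, that names $t_{ij}$ are encoded as tuples `(i, j)`, and that each store is described only by the name of the microinstruction computing its address; stored values are deliberately left out. The function names and this encoding are illustrative assumptions, not the paper's definitions.

```python
def str_may(stores, storage, load, load_addr):
    """Earlier stores whose address is unknown or equal to the load address,
    or all earlier stores while the load address itself is unknown."""
    la = storage.get(load_addr)
    may = set()
    for t, addr_name in stores.items():
        sa = storage.get(addr_name)
        if t < load and (sa is None or la is None or sa == la):
            may.add(t)
    return may


def str_act(stores, storage, load, load_addr):
    """May-stores not overwritten by a later may-store with a known address
    equal to the load address or to the store's own address."""
    may = str_may(stores, storage, load, load_addr)
    la = storage.get(load_addr)
    act = set()
    for t in may:
        sa = storage.get(stores[t])
        overwritten = False
        for t2 in may:
            a2 = storage.get(stores[t2])
            if t2 > t and a2 is not None and (a2 == la or a2 == sa):
                overwritten = True
        if not overwritten:
            act.add(t)
    return act


def T(i, j):
    return (i, j)


# Example 3: three stores t12, t22, t32 with addresses t11, t21, t31,
# followed by the load t42 whose address is computed by t41.
STORES = {T(1, 2): T(1, 1), T(2, 2): T(2, 1), T(3, 2): T(3, 1)}
LOAD, LOAD_ADDR = T(4, 2), T(4, 1)

sigma0 = {}
sigma1 = {T(1, 1): 1, T(3, 1): 1}
sigma2 = {T(1, 1): 1, T(3, 1): 1, T(4, 1): 1}
```

Under these assumptions the active stores are all three stores in `sigma0`, the stores `t22` and `t32` in `sigma1`, and only `t32` in `sigma2`, while `t22` remains a may-store in `sigma2` because its address is still unknown.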
D. Operational Semantics
We can now define the microinstructions' transition relation $\sigma \overset{l}{\twoheadrightarrow} \sigma'$, implementing the lifecycle of Section IV-B.
(Execute) A microinstruction can be executed if it has not already been executed ($s(t){\uparrow}$), the guard holds ($[c]s$), and the dependencies have been resolved ($[\iota](s){\downarrow}$):
Observe that if $\iota$ is a load from the memory subsystem, the rule produces the observation of a data load. (Commit) Once a memory store has been executed ($s(t) = v$), it can be committed to the memory subsystem, yielding an observation. Observe that a memory store can only be committed once ($t \notin C$). Moreover, the rule ensures that stores on the same address are committed in program order by checking that all past stores are in the set $C$, i.e., $bn(str\text{-}may(\sigma, t)) \subseteq C$.
In summary, stores can be executed internally in any order; however, they are committed in order. In Example 3, if $\sigma$ has storage $s = \{t_{11} \mapsto 1;\ t_{12} \mapsto 1;\ t_{31} \mapsto 1;\ t_{32} \mapsto 1\}$ and commits $C = \emptyset$, then only $t_{12}$ can be committed, since $t_{22}$ has not been executed and $bn(str\text{-}may(\sigma, t_{32})) \not\subseteq C$. Notice that $t_{22}$ is in the may stores since its address has not been resolved. Therefore, $t_{32}$ can be committed only after $t_{12}$ has been committed and $t_{21}$ has been executed. However, the commit of $t_{32}$ does not have to wait for the commit or execution of $t_{22}$. In fact, if $\sigma'$ has storage $s' = s \cup \{t_{21} \mapsto 0\}$ then $bn(str\text{-}may(\sigma', t_{32})) = \{t_{12}\}$. That is, the order of store commits is only enforced per location, as expected.
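The per-location commit ordering described above can be checked with a small sketch. It assumes true guards, names encoded as tuples `(i, j)`, a storage that maps both address names and executed store names to values, and it ignores the initial stores of the bootstrap state. The helper names are hypothetical.

```python
def may_precede(stores, storage, t):
    """Bound names of earlier stores that may write the same address as store t."""
    addr = storage.get(stores[t])
    result = set()
    for u, addr_name in stores.items():
        other = storage.get(addr_name)
        if u < t and (other is None or addr is None or other == addr):
            result.add(u)
    return result


def can_commit(stores, storage, committed, t):
    """A store can commit if it was executed, was not committed before,
    and every earlier may-store has already been committed."""
    executed = t in storage
    return (executed and t not in committed
            and may_precede(stores, storage, t) <= committed)


# Stores of Example 3: store name -> name of its address microinstruction.
STORES = {(1, 2): (1, 1), (2, 2): (2, 1), (3, 2): (3, 1)}
s = {(1, 1): 1, (1, 2): 1, (3, 1): 1, (3, 2): 1}
s_prime = {**s, (2, 1): 0}  # additionally, the address of t22 resolves to 0
```

With storage `s` and no commits, `t12` can be committed but `t32` cannot, because both `t12` and the unresolved `t22` may precede it. With `s_prime`, only `t12` remains, so `t32` can be committed as soon as `t12` has been.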
(Fetch-Decode) A program counter store enables the fetching and decoding (i.e., translating) of a new ISA instruction. The rule for fetching is similar to the rule for commit, since instructions must be fetched in order. The set F keeps track of program counter updates whose resulting instruction has been fetched and ensures that instructions are not fetched or decoded twice. Fetching the result of a program counter update yields the observation of an instruction load from address v.
We write max(I ) for the largest name t in I . As expected, translate(v, max(I )) translates the instruction at address v, ensuring that the names of the microinstructions thus generated are greater than max(I ).
(Remarks on out-of-order semantics) The three rules of the semantics reflect the atomicity of MIL microinstructions: a transition can affect a single microinstruction by either assigning a value to the storage, extending the set of commits, or extending the set of fetches. In the following, we use $step\text{-}param(\sigma, \sigma') = (\alpha, t)$ to identify the rule $\alpha \in \{EXE, CMT(a, v), FTC(I')\}$ that enables $\sigma \twoheadrightarrow \sigma'$ and the name $t$ of the affected microinstruction. In case of commits we also extract the modified address $a$ and the saved value $v$; in case of fetches we extract the newly decoded microinstructions $I'$. The semantics preserves several invariants:
and free names (i.e., address and value) of the corresponding microinstruction are defined in s; if α = FTC(I ) then t ∈ F ; all state components are monotonic.
Notice that there is no state component that explicitly keeps track of the state of memory and registers. In fact, loads are resolved by using the value of the last preceding store that modified the same resource. For this reason, in order to bootstrap the computation, we assume that the set of microinstructions of the initial state contains one store for each memory address and register, the value of these stores is the initial value of the corresponding resource, and that these stores are in the storage and commits of the initial state.
E. In-order Semantics
We define the in-order (i.e., sequential) semantics by restricting the scheduling of the out-of-order semantics and enforcing the execution of microinstructions in program order:
That is, if a microinstruction t is affected then all previous (in program order) microinstructions have been completed.
Definition 3: Let $\sigma_1 :: \cdots :: \sigma_n$ be the sequence of states of execution $\pi$. Then $commits(\pi, a)$, the list of memory commits at address $a$ in $\pi$, is defined as
\[ commits(\sigma_1 :: \cdots :: \sigma_n, a) = \begin{cases} \text{the empty list} & \text{if } n < 2 \\ v :: commits(\sigma_2 :: \cdots :: \sigma_n, a) & \text{if } step\text{-}param(\sigma_1, \sigma_2) = (CMT(a, v), t) \\ commits(\sigma_2 :: \cdots :: \sigma_n, a) & \text{otherwise} \end{cases} \]
We say that two models are memory consistent if writes to the same memory location are seen in the same order.
Definition 4: The transition systems $\rightarrow_1$ and $\rightarrow_2$ are memory consistent if for any program and initial state $\sigma_0$, for all executions $\pi = \sigma_0 \rightarrow_1^* \sigma$, there exists $\pi' = \sigma_0 \rightarrow_2^* \sigma'$ such that for all $a \in M$, $commits(\pi, a) = commits(\pi', a)$. Intuitively, two models that are memory consistent yield the same sequence of memory updates for each memory address. This ensures that the final result of a program is the same in both models.
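The per-address comparison underlying Definitions 3 and 4 can be sketched as follows. The sketch assumes that an execution is summarized by its list of step parameters, each encoded as a pair `(alpha, name)` with `alpha` one of `("EXE",)`, `("CMT", a, v)`, or `("FTC", I)`. This encoding and the two sample executions are hypothetical. The function compares one given pair of executions, whereas Definition 4 quantifies over all executions of one model and requires a matching execution of the other.

```python
def commits(steps, a):
    """Values committed to address a, in the order in which they occur."""
    return [alpha[2] for alpha, _name in steps
            if alpha[0] == "CMT" and alpha[1] == a]


def same_memory_updates(steps1, steps2, addresses):
    """True if both executions commit the same value sequence at every address."""
    return all(commits(steps1, a) == commits(steps2, a) for a in addresses)


# Two hypothetical executions that commit to different addresses in a different
# global order but agree on the sequence of commits at each address.
in_order = [(("EXE",), "t11"), (("CMT", 1, 1), "t12"),
            (("EXE",), "t21"), (("CMT", 0, 2), "t22")]
out_of_order = [(("EXE",), "t21"), (("EXE",), "t11"),
                (("CMT", 0, 2), "t22"), (("CMT", 1, 1), "t12")]

# Reordering two commits to the same address breaks the correspondence.
reordered = [(("CMT", 1, 3), "t32"), (("CMT", 1, 1), "t12")]
program_order = [(("CMT", 1, 1), "t12"), (("CMT", 1, 3), "t32")]
```

The first pair agrees at every address even though the global commit order differs, which is the sense in which ordering is enforced only per location.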
Theorem 1: The out-of-order transition relation $\twoheadrightarrow$ and the in-order transition relation $\rightarrow$ are memory consistent.
V. SPECULATIVE SEMANTICS
We now extend the out-of-order semantics to support speculation. We add two new components to the states: a set of names $P \subseteq n(I)$ whose values have been predicted as a result of speculation, and a dependency function $\delta : N \rightharpoonup S$, a partial function recording, for each name $t$, the storage dependencies at the time of execution of the microinstruction identified by $t$. Informally, $\delta(t)$ is a snapshot of the storage that affects the value of $t$ due to speculative predictions. As we will see, these dependencies are needed in order to match speculative states with non-speculative states, and to restore the state of the execution in case of misspeculation.
A. Managing Microinstruction Dependencies
The execution of a microinstruction may depend on local (intra-) instruction dependencies, the names appearing freely in a microinstruction, as well as cross (inter-) instruction dependencies, caused by memory or register loads.
where the cross-instruction dependencies are defined as
Cross-dependencies are nonempty only for loads and consist of the names of active stores affecting t in state σ, asn(σ, t) = bn(str-act(σ, t)), plus, the names of stores potentially intervening between the earliest active store and t (we call srcs(σ, t) the potential sources of t), which are defined as
Intuitively, a load depends on the execution of active stores that may affect the address of that load. Moreover, the fact that a name t * is in the set of active stores asn depends on the addresses and guards of all stores between t * and t. This is because their values will determine the actual store that affects the address of the load t. Thanks to our ordering relation < between names, we can use the minimum name min(asn) in asn to compute all stores between any name in asn and t, thus extracting the free names of their guards and addresses.
The following figure illustrates dependencies of the load from Example 3:
If $s = \{t_{11} \mapsto 1;\ t_{21} \mapsto 0;\ t_{41} \mapsto 1\}$ then the set of active store names asn for $t_{42}$ is $bn(str\text{-}act(\sigma, t_{42})) = \{t_{12}, t_{32}\}$, as depicted by the solid ellipses. In particular, $min(asn) = t_{12}$. We consider all stores between $t_{12}$ and the load $t_{42}$ (i.e., $t_{12}$, $t_{22}$, and $t_{32}$), and add to the set of cross-dependencies the names in their guards and addresses, namely $t_{11}$, $t_{21}$, and $t_{31}$, as depicted by the dashed rectangle. Observe that $t_{21}$ is in the set of cross-dependencies, although $t_{22}$ is not an active store. This is because membership of $t_{12}$ in the active stores' set depends on the address $t_{21}$ being set to 0, i.e., $s(t_{21}) = 0$. Therefore, the set of cross-dependencies is $depsX(t_{42}, \sigma) = \{t_{12}, t_{32}, t_{11}, t_{21}, t_{31}\}$. Finally, the local dependencies of the load $t_{42}$ consist of its parameter $t_{41}$ (the dotted ellipse), such that $deps(t_{42}, \sigma) = \{t_{12}, t_{32}, t_{11}, t_{21}, t_{31}, t_{41}\}$.
We verify that the potential sources of deps are computed correctly.
Definition 6 (t-equivalence): Let t be a name, σ 1 and σ 2 be two states with storage s 1 and s 2 , and ι 1 and ι 2 be the microinstructions identified by t in the two states. We say that
, and if t's microinstruction is a load with dependencies T i = deps(t, σ i ) and active stores SA i = str-act(σ i | Ti , t) for i ∈ {1, 2} then SA 1 = SA 2 and s 1 | SA1 = s 2 | SA2 .
Intuitively, t-equivalence states that, if the microinstruction named with t depends (in the sense of deps) in both states on the same active stores and these stores assign the same value to t, then the microinstruction has the same dependencies, it is enabled, and it produces the same result in both states.
Lemma 1: If $\sigma_1 \sim_t \sigma_2$ and $t$'s microinstruction in $\sigma_1$ is $\iota = t \leftarrow c?o$, then $deps(t, \sigma_1) = deps(t, \sigma_2)$, $[c]\sigma_1 = [c]\sigma_2$, and if $[\iota]\sigma_1 = (v_1, l_1)$ and $[\iota]\sigma_2 = (v_2, l_2)$ then $v_1 = v_2$.
B. Microinstruction Lifecycle
State Predicted models microinstructions that have not yet been executed, but whose result values have been predicted. A Decoded microinstruction can transition to state Predicted by predicting its result value, thus recording that the value was predicted and causing the state of the microinstruction to be defined. A microinstruction that is ready to be executed (in Decoded), possibly relying on predicted values, can be executed and transition to state Speculated, recording its dependencies. Notice that state Speculated models both speculative and non-speculative execution of a microinstruction.
From state Speculated, a microinstruction can: (a) roll back to Decoded (if the predicted values were wrong); (b) speculatively fetch the next ISA instruction to be executed, thus moving to state Speculatively Fetched and generating newly decoded microinstructions; or (c) retire in state Retired if it no longer depends on speculated values.
Microinstructions in state Speculatively Fetched can either be rolled back due to misspeculation or move to state Fetched. Finally, in state Retired, as in the out-of-order case, a PC store microinstruction can be (non-speculatively) fetched and generate newly decoded microinstructions, or, if it is a memory store, be committed to the memory subsystem.
C. Microinstruction Semantics
We now present a speculative semantics, in terms of the transition relation (σ, δ, P ) − − − − (σ′, δ′, P′), that reflects the microinstructions' lifecycle in Figure 2 .
(Predict) The semantics allows predicting the value of an internal operation by choosing a value v ∈ V . The rule updates the storage and records the predicted name, while ensuring that the microinstruction has not already been executed.
We remark that the semantics predicts values only for internal operations. As we will see, this choice does not hinder expressiveness, while it avoids the complexity of modeling speculative execution of program counter updates and loads.
(Execute) The rules for execution, as well as the rules for commit and fetch, reuse the out-of-order semantics. First, for the case when the microinstruction has not already been predicted:
Notice that if step-param(σ, σ′) = (EXE, t) then σ(t)↑, hence t ∉ P . In this case it is sufficient to record the dependencies of t in δ. If t has already been predicted it is sufficient to record that this is no longer the case:
To commit a microinstruction it is sufficient to ensure that there are no dependencies left, i.e., the microinstruction has been retired.
(Fetch) Finally, for the case of (speculative or non-speculative) fetching, the dependency map must be updated to record the dependency of the newly added microinstructions on the microinstruction entering the "fetched" state:
The following transition rule allows retiring a microinstruction in case of correct speculation:
The map δ(t) contains the snapshot of t's dependencies at the time of t's execution. A microinstruction can be retired only if all its dependencies have been retired (dom(δ(t)) ∩ dom(δ) = ∅), the microinstruction has been executed (i.e., its value has not merely been predicted: s(t)↓ ∧ t ∉ P ), and the snapshot of t's dependencies is ∼ t with the current state, hence the semantics of t has been correctly speculated (see Lemma 1). Retiring a microinstruction results in removing the state of its dependencies from δ, as captured by δ | {t} . Notice that in case of a load, (I , s, C , F ) ∼ t (I , δ(t), C , F ) may hold even if some dependencies of t differ in s and δ(t). In fact, a load may have been executed as a result of misspeculating the address of a previous store. In this case, ∼ t implies that the misspeculation has not affected the calculation of str-act of the load (i.e., it does not cause a store bypass), hence there is no reason to re-execute the load. This mechanism is demonstrated by the examples in Section V-D.
(Rollback) A microinstruction t can be rolled back when it is found to transitively reference a value that was wrongly speculated. This is determined by comparing t's references at execution time (δ(t)) with the current storage assignment s. In case of a discrepancy, if t is not a program counter store, the assignment to t can simply be undone, leaving speculated microinstructions t′ that reference t to be rolled back later, if necessary. However, if t is a program counter store, the speculative evaluation using rule FTC will have caused a new microinstruction to be speculatively fetched. This fetch needs to be undone. To that end, let t ≺ t′ (t′ refers to t) if t ∈ dom(δ(t′)), and let ≺ + be the transitive closure of ≺. As expected, ≺ + is antisymmetric and its reflexive closure is a partial order. Define then the set ∆ + as either ∅ if t is not a PC store instruction, or {t′ | t ≺ + t′} otherwise, i.e., ∆ + is the set of names that reference t, not including t itself. Finally, let
We obtain: Theorem 2: − → and − − − − are memory consistent.
D. Speculation Semantics by Example
We reuse some of the previous examples to illustrate the speculation semantics.
Mispredicted values The speculative semantics supports prediction of arbitrary values as depicted in the table above. We use Example 1 to illustrate misprediction of values. Suppose that the microinstructions (identified by) t 1 and t 2 (i.e., reading value 77 from register r 1 ) have been executed and retired by applying rules EXE and RET. This yields a state σ 1 with storage s 1 = {t 1 → 1, t 2 → 77} and dependencies δ 1 = ∅, depicted in row 1. The CPU may now decide to predict that the value of arithmetic operation t 3 is 5. This can be achieved by applying rule PRD in state σ 1 , which updates the storage as in s 2 , and records the dependency and the prediction of t 3 as in δ 2 and P 2 . The prediction allows speculative execution of the register store t 4 by applying rule EXE, which updates the state as in s 3 and records the dependency set δ 3 = δ 2 ∪ {t 4 → {t 1 → 1, t 3 → 5}}. Specifically, since t 4 is a register store, its dependencies deps(t 4 , σ 2 ) consist of its parameters t 1 and t 3 . Hence, δ 3 = δ 2 ∪ {t 4 → s 2 | {t1,t3} } as in row 3.
In state σ 3 the CPU can execute the arithmetic operation t 3 by applying rule PEXE. The rule updates the storage as in s 4 by setting t 3 to 77 + 1, extending δ 4 = δ 3 ∪ {t 3 → s 3 | {t2} }, and removing t 3 from the prediction set, i.e., P 4 = ∅.
As a result of executing t 3 , we find that the register store t 4 has been misspeculated, triggering rule RBK. This is because the dependencies of t 4 , namely {t 3 }, at the time when t 4 was speculatively executed, i.e., δ 4 (t 4 )(t 3 ) = 5, are different from the dependencies of t 4 in the current state, i.e., s 4 (t 3 ) = 78. Hence, rule RBK yields state σ 5 by removing t 4 from the storage, i.e., s 5 = s 4 \ {t 4 → 5}, as well as from the dependency set, i.e., δ 5 = {t 3 → s 3 | {t2} }.
The CPU can now apply rule EXE again to re-execute the register store t 4 , yielding s 6 = s 5 ∪ {t 4 → 78}, and δ 6 = δ 5 ∪ {t 4 → s 5 | {t1,t3} }.
Next, we can retire the arithmetic operation t 3 by applying rule RET and removing t 3 's dependencies. Observe that t 4 cannot be retired instead, since dom(δ 6 (t 4 )) ∩ dom(δ 6 ) = {t 3 }. In fact, the register store t 4 depends on t 3 , therefore the latter should be retired first, as expected.
Finally, we can retire t 4 and obtain δ 8 = ∅.
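The walk-through above can be replayed with a short Python sketch. This is only an illustration under simplifying assumptions: the encoding of Example 1 (t 3 computes t 2 + 1 and t 4 stores t 3 into the register named by t 1 ) is inferred from the text, only non-load and non-PC microinstructions are handled, t-equivalence is approximated by comparing the recorded snapshot with the current storage, and a prediction is assumed to record an empty dependency snapshot. All function names are hypothetical.

```python
# Hypothetical encoding of Example 1 (assumption): t3 = t2 + 1, t4 stores t3.
params = {"t3": ["t2"], "t4": ["t1", "t3"]}  # names each microinstruction reads
compute = {"t3": lambda st: st["t2"] + 1, "t4": lambda st: st["t3"]}

s = {"t1": 1, "t2": 77}  # storage after t1 and t2 have been retired
delta = {}               # dependency snapshots of speculated names
P = set()                # names whose value is only predicted


def predict(t, v):       # rule PRD
    assert t not in s
    s[t] = v
    delta[t] = {}        # assumption: a prediction records no dependencies
    P.add(t)


def execute(t):          # rules EXE (not predicted) and PEXE (predicted)
    assert t in P or t not in s
    delta[t] = {p: s[p] for p in params[t]}  # snapshot of the values read
    s[t] = compute[t](s)
    P.discard(t)


def misspeculated(t):
    """The snapshot taken at execution time disagrees with the storage."""
    return any(s.get(p) != v for p, v in delta[t].items())


def rollback(t):         # rule RBK, case where t is not a PC store
    assert misspeculated(t)
    del s[t], delta[t]


def can_retire(t):       # premises of rule RET
    return (t in s and t not in P
            and not (delta[t].keys() & delta.keys())
            and not misspeculated(t))


def retire(t):
    assert can_retire(t)
    del delta[t]


predict("t3", 5)         # row 2: guess t3 = 5
execute("t4")            # row 3: t4 speculatively stores 5
execute("t3")            # PEXE: t3 is actually 78
rollback("t4")           # snapshot of t4 says t3 = 5, storage says 78
execute("t4")            # re-execute with the correct value
blocked = not can_retire("t4")  # t4 still depends on the unretired t3
retire("t3")
retire("t4")
```

At the end of the run the dependency map is empty, matching δ 8 = ∅ in the text.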
Mispredicted load/store dependencies The table above illustrates the speculative execution of cross-dependencies resulting from the prediction of load/store addresses. We discuss executions of Example 3 from a state σ 1 (row 1 in the table). Suppose the CPU has executed and retired t 11 , t 12 , t 21 , t 22 , and t 41 , that is, the first two stores and the load's address have been resolved. The CPU can predict the address (i.e., the value of t 31 ) of the third store as 0 and modify the state as in row 2 by applying rule PRD. This prediction enables speculative execution of the load t 42 : the bound names of the active stores, bn(str-act(s 2 , t 42 )), form the singleton set {t 12 }, since s 2 (t 21 ) = s 2 (t 31 ) = 0, while s 2 (t 41 ) = 1. Hence, we can apply rule EXE to execute t 42 , thus updating the storage to s 3 = s 2 ∪ {t 42 → 1}, and recording the dependencies as in δ 3 . Concretely, t 42 's dependencies in state σ 3 consist of the local dependencies (i.e., the load's address t 41 ) and the cross-dependencies, which contain t 12 (i.e., the active store it loads the value from) as well as the potential sources of t 42 , that is, the addresses of all stores between the active store t 12 and the load t 42 , namely t 11 , t 21 and t 31 .
At this point, any attempt to retire the load t 42 by rule RET in state σ 3 will fail since its dependencies, e.g., t 31 , are yet to be retired. However, we can execute t 31 by applying rule PEXE. The execution updates the state as in σ 4 by removing t 31 from the prediction set and storing its correct value, as well as extending the dependency set with δ 4 = δ 3 ∪ {t 31 → ∅}.
The execution of t 31 enables the premises of rule RBK to capture that the dependency misprediction led to misspeculation of the load t 42 . Specifically, the set asn at the time of t 42 's execution, bn(str-act((I 4 , δ 4 (t 42 ), C 4 , F 4 ), t 41 )) = {t 12 }, differs from the active store set bn(str-act(σ 4 , t 41 )) = {t 32 } in the current state. Therefore, we roll back the execution, updating the storage and removing all dependencies of t 42 as in σ 5 .
Finally, we remark that the speculative execution of loads is rolled back only if a misprediction causes a violation of load/store dependencies. For instance, if the value of t 31 was 5 instead of 1, as depicted in row 4 in the table, the misprediction of t 31 's value as 0 in σ 2 does not enable a rollback of the load. This is because the actual value of t 31 does not change the set of active stores. In fact, the set of active stores at the time of t 42 's execution, bn(str-act((I 4 , δ 4 (t 42 ), C 4 , F 4 ), t 41 )) = {t 12 }, is the same as the set of active stores bn(str-act(σ 4 , t 41 )) = {t 12 } in the current state.
At this point, rule RBK will be enabled to roll back the mispredicted program counter update, since s 3 (t 2 ) ≠ δ 2 (t 4 )(t 2 ). This removes t a1 . . . t an (and any name that transitively refers to them in δ 2 ) from the list of instructions, storage, decodes, and dependencies.
VI. MODELING PREDICTION STRATEGIES AND NEW VULNERABILITIES
A major benefit of our semantics is that it can be used to model and analyze (combinations of) microarchitectural features underpinning existing Spectre attacks [1] , [3] , [16] , and, importantly, to discover new vulnerabilities. Although the real-world feasibility of the new vulnerabilities falls outside the scope of a formal model, our model can be used to (in)validate the security claims of CPU vendors, opening up new avenues for provable security and practical exploitation.
We constrain our nondeterministic prediction semantics (see rule PRD) to model specific prediction strategies. Concretely, we define a prediction function pred p : Σ → N ⇀ 2^V to capture a prediction strategy p by computing the set of predicted values for a name t ∈ N and a state σ ∈ Σ. We assume the transition relation satisfies the following property:
If (σ, δ, P ) l − − − − (σ′, δ′, P ∪ {t}) then t ∈ dom(pred p (σ)) and σ′(t) ∈ pred p (σ)(t).
This property ensures that the transition relation chooses predicted values from function pred p .
Following the security model in Section II, we check conditional noninterference by: (a) using the in-order transition relation − → as reference model and speculative (out-of-order) transition relation − − − − (− − ) as target model; (b) providing the security policy ∼ for memory and registers; and (c) defining the attacker observations of the instruction and/or data cache. To invalidate conditional noninterference it is sufficient to find two ∼-indistinguishable states that yield the same observations in the reference model, and different observations in the target model. For simplicity, our examples only report the discriminating observation trace. We use the classification by Canella et al. [3] to refer to existing attacks. We refer to Appendix C for the model of Spectre v1 and we report below the models of other existing and new vulnerabilities.
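The search for a counterexample described above can be sketched in Python as follows. This is a minimal illustration, assuming a finite set of candidate states and models given as functions from a state to its observations (a single trace here; a set of traces would be compared the same way for a nondeterministic model). The function find_violation and the toy models are hypothetical and are not part of the paper's formalization.

```python
def find_violation(states, indistinguishable, reference, target):
    """Return a pair of indistinguishable states that agree in the reference
    model but differ in the target model, or None if no such pair exists."""
    for i, a in enumerate(states):
        for b in states[i + 1:]:
            if not indistinguishable(a, b):
                continue
            if reference(a) == reference(b) and target(a) != target(b):
                return (a, b)
    return None


# Toy instance (assumption): the policy only fixes the public part of the
# state, and the target model performs one extra secret-dependent access.
states = [
    {"pub": 0, "secret": 3},
    {"pub": 0, "secret": 7},
    {"pub": 1, "secret": 3},
]


def same_public(a, b):
    return a["pub"] == b["pub"]


def reference(st):
    return ("il a1", f"dl {st['pub']}")


def target(st):
    return ("il a1", f"dl {st['pub']}", f"dl {st['secret']}")


witness = find_violation(states, same_public, reference, target)
```

In this toy instance the first two states form the witness: they agree on public data and on the reference trace, while the target trace exposes the differing secret.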
A. Target Address Prediction: Spectre-BTB and Spectre-RSB
Two variants of Spectre attacks [3] exploit a CPU's prediction mechanism for jump targets to leak sensitive data. In particular, Spectre-BTB [1] (Branch Target Buffer) poisons the prediction of indirect jump targets. To model this prediction strategy we assume a function ijmps(I ) that extracts all PC stores resulting from the translation of indirect jumps. This can be accomplished by making the translation of these instructions syntactically distinguishable from other control flow updates. As a result, prediction is possible for all indirect jumps whose address is yet to be resolved, namely:
We do not restrict the possible predicted values v, since an accurate model of jump prediction requires knowing the strategy used by the CPU to update the BTB.
Spectre-RSB [16] , [17] poisons the Return Stack Buffer (RSB), which is used to temporarily store the N most recent return addresses: call instructions push the return address on the RSB, while ret instructions pop from the RSB to predict the return target. A misprediction can happen if: (i) a return address on the stack has been explicitly overwritten, e.g., when a program handles a software exception using longjmp instructions, or (ii) when returning from a call stack deeper than N , the RSB is empty and the CPU uses the same prediction as for the other indirect jumps.
We model call and ret instructions via program counter stores. A call to address b 1 from address a 1 can be modeled as
The call instruction saves (e.g. t 15 ) the return address (e.g. a 1 + 4) into the stack, decreases the stack pointer (e.g. t 13 ), and jumps to address b 1 (e.g. t 16 ). A ret instruction from address a 2 can be modeled as
The instruction loads the return address from the stack (t 24 ), increases the stack pointer (t 23 ), and returns (t 26 ). We assume functions calls(I ) and rets(I ) to extract the PC stores that belong to a call and ret respectively. Moreover, if t ∈ bn(calls(I )), we use ret a (I , t) to retrieve the name of the microinstruction that saves the return address (e.g. t 15 ) of the corresponding call. We model return address prediction as
pred RSB (I , s, C , F , δ, P ) = {t a → v | t ← c?st PC t a ∈ rets(I ) ∧ s(t a )↑ ∧ ∃t′ ∈ bn(calls(I )). t′ < t ∧ s(ret a (I , t′)) = v ∧ RSB-depth(I , t′, t) ⊆ {1 . . . N }}
Prediction is possible only for ret microinstructions t that have a prior matching call t′, provided that the intermediate stack depth stays between 1 and N . We define the latter as the set
RSB-depth(I , t′, t) = {#(bn(calls(I )) ∩ {t′ . . . t″}) − #(bn(rets(I )) ∩ {t′ . . . t″}) | t′ ≤ t″ < t},
where {t′ . . . t″} is an arbitrary contiguous sequence of names starting from t′ and ending before t, and #(bn(calls(I )) ∩ {t′ . . . t″}) and #(bn(rets(I )) ∩ {t′ . . . t″}) count the number of calls and rets in the sequence, respectively. The prediction consists in assuming the target address (e.g. t a ) of the ret to be equal to the return address (e.g. v) that has been pushed into the stack by the matching call.
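The depth condition can be made concrete with a small Python sketch. It is a simplified reading of the definition above under stated assumptions: names are modeled as consecutive integers in program order, calls and rets are given directly as sets of names, the prediction is keyed on the ret's own name rather than on its address name t a , and ret_addr stands in for s(ret a (I , t′)). The function names rsb_depth and pred_rsb are hypothetical.

```python
def rsb_depth(calls, rets, t_call, t_ret):
    """Depths (#calls - #rets) over every window [t_call .. t2], t_call <= t2 < t_ret."""
    depths = set()
    for t2 in range(t_call, t_ret):
        window = range(t_call, t2 + 1)
        depths.add(sum(t in calls for t in window) - sum(t in rets for t in window))
    return depths


def pred_rsb(calls, rets, ret_addr, resolved, n):
    """Map each unresolved ret to the set of return addresses it may be predicted as."""
    allowed = set(range(1, n + 1))
    prediction = {}
    for t in sorted(rets):
        if t in resolved:  # target already known: nothing to predict
            continue
        values = {ret_addr[c] for c in calls
                  if c < t and rsb_depth(calls, rets, c, t) <= allowed}
        if values:
            prediction[t] = values
    return prediction


# Two nested calls (names 1 and 2) followed by two rets (names 3 and 4).
calls, rets = {1, 2}, {3, 4}
ret_addr = {1: 0x104, 2: 0x204}  # return address saved by each call (assumption)

shallow = pred_rsb(calls, rets, ret_addr, resolved=set(), n=1)
deep = pred_rsb(calls, rets, ret_addr, resolved=set(), n=2)
```

With N = 1 only the inner ret (name 3) receives a prediction, since the entry of the outer call has been displaced; this corresponds to case (ii) above. With N = 2 the outer ret (name 4) is predicted to return to the address saved by the outer call.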
The following illustrative example shows how jump target prediction may violate the security condition. Consider the program * p:=&f; ( * p)() that saves the address of a function (i.e., &f ) in a function pointer at constant address p and immediately invokes it. Assuming that these instructions are stored at addresses a 1 and a 2 , their MIL translation is
Because our semantics can predict only internal operations (see rule PRD), the translation function introduces an additional internal operation, i.e., t 22 , which allows predicting the value of the load t 21 .
Suppose that the function f simply returns and the security policy labels all data, except the program counter, as sensitive. The program is secure (at the ISA level) as it always transfers control to f , producing the sequence of observations il a 1 :: ds p :: il a 1 :: dl p :: il &f independently of the initial state.
Jump target prediction produces a different behavior. Let σ 0 be the state containing only the translation of the instruction in a 1 . Initially, pred BTB (σ 0 ) is empty since the state contains no PC updates (e.g. t 12 ) that result from translating indirect jumps. The CPU may execute and fetch t 12 , thus adding t 21 , t 22 , and t 23 to the set of microinstructions I . In the resulting state pred BTB is {t 22 → v | v ∈ V }, since t 23 models an indirect jump and t 22 has not been executed. The CPU can therefore predict the value of t 22 without waiting for the result of the load t 21 . If the predicted value is the address g of the instruction r 1 := * (r 2 ), the misprediction can use g as a gadget to leak sensitive information.
In fact, the speculative semantics can produce the sequence of observations il a 1 :: ds p :: il a 1 :: dl p :: il g :: dl v, where v is the initial value of register r 2 . The last observation of the sequence allows an attacker to learn sensitive data. Observe that this leak is readily captured by the security condition, since such observation is not possible in the sequential semantics.
B. New Vulnerability: Spectre-STL-D
Spectre-STL [18] (Store-To-Load) exploits the CPU's mechanism to predict load-to-store data dependencies. A load cannot be executed before executing all the past (in program order) stores that affect the same memory address. However, if the address of a past store has not been resolved, the CPU may execute the load in speculation without waiting for the store, predicting that the target address of the store is different from the load's source address. Mispredictions cause store bypasses and can lead to information leakage and accesses to stale data.
Our model reveals that if a microarchitecture instead mispredicts the existence of a Store-To-Load Dependency, hence Spectre-STL-D, e.g., in order to forward temporary store results, a similar vulnerability may be possible. To this end, we define pred STLD (σ, δ, P ) as:
A prediction occurs whenever a memory store (t) is waiting for an unresolved address (σ(t a )↑), while the address (s(t′ a ) = a) of a subsequent load (t′) has been resolved, and the load may depend on the store (t ∈ bn(str-act(σ, t′))). The prediction guesses that the store's address (t a ) matches the load's address. This behavior may cause Spectre-STL-D if a misspeculated dependency is used to perform subsequent memory accesses.
If the CPU executes and fetches t 14 , predicts that t 12 = b 2 (i.e., it mispredicts the alias * b 1 == * b 2 ), executes t 13 , forwards the result of t 13 to t 21 , and executes t 22 before the load t 11 is retired, then the address accessed by t 22 depends on t secret . This can produce the secret-dependent sequence of observations il a 1 :: il a 2 :: dl secret, while the sequential semantics always produces the secret-independent sequence of observations il a 1 :: dl b 1 :: ds * b 1 :: il a 2 :: dl b 2 :: dl * b 2 .
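The prose description of the store-to-load dependency prediction can be sketched in Python as follows. This is an illustration built only from the description above, not the paper's definition of pred STLD : microinstructions are given as (position, address name) pairs, guards are ignored, and the "may depend" test is approximated by checking that no store between the two certainly writes the load's address. The name pred_stld and the example names are hypothetical.

```python
def pred_stld(stores, loads, s):
    """stores/loads: lists of (position, address name); s: resolved names -> values.
    Returns a map from unresolved store-address names to predicted addresses."""
    prediction = {}
    for st_pos, st_addr in stores:
        if st_addr in s:  # the store address is already resolved
            continue
        for ld_pos, ld_addr in loads:
            if ld_pos < st_pos or ld_addr not in s:
                continue  # only later loads with a resolved address
            a = s[ld_addr]
            # Simplified "may depend" test (assumption): no store in between
            # has a resolved address equal to the load's address.
            hidden = any(st_pos < p < ld_pos and s.get(x) == a for p, x in stores)
            if not hidden:
                prediction.setdefault(st_addr, set()).add(a)
    return prediction


# A store with an unresolved address followed by a load from address 0x20.
aliasing = pred_stld(stores=[(1, "t_sa")], loads=[(2, "t_la")], s={"t_la": 0x20})

# An intermediate store that certainly writes 0x20 hides the first store.
hidden_case = pred_stld(
    stores=[(1, "t_sa"), (2, "t_sb")],
    loads=[(3, "t_la")],
    s={"t_la": 0x20, "t_sb": 0x20},
)
```

In the first case the unresolved store address is predicted to alias the load's address, which is the guess that enables store forwarding and, if wrong, the leak discussed above.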
C. New Vulnerability: Spectre-SF
When the first Spectre attack was published, some microarchitectures (e.g., Cortex A53) were claimed immune to the attack because of "allowing speculative fetching but not speculative execution" [4] . The informal argument was that mispredictions cannot cause buffer overreads or leave any footprint on the cache in absence of speculative loads.
Our model revealed Spectre-SF, a new variant which uses only Speculative Fetching to leak sensitive data. To check the claim above, we constrain the semantics to only allow speculation of the program counter values. Specifically, we require for any transition (σ, δ, P ) l − − − − (σ′, δ′, P′) that executes a microinstruction (step-param(σ, σ′) = (t, X)) which is either a load (t ← c?ld τ t a ∈ σ) or a store (t ← c?st τ t a t v ∈ σ) of a resource other than the program counter (τ ≠ PC) to have an empty set of dependencies on past microinstructions (dom(δ) ∩ {t′ | t′ < t} = ∅).
Our new model reveals that if the instruction cache, rather than the data cache, is accessed speculatively when an instruction is fetched, Spectre-SF is possible. We illustrate the problem with the following program which jumps to the address pointed to by secret if the code is executed with admin privileges, otherwise it continues to address a 3 .
a1: r1 = *secret;
a2: if (*admin == 1) (*r1)();
a3:
In the sequential model, an attacker that only observes the instruction cache can see the sequence of observations il a 1 :: il a 2 :: il a 3 if * admin ≠ 1, otherwise the sequence il a 1 :: il a 2 :: il secret. This program can be translated to MIL as follows
A CPU that supports only speculative fetching may first complete all microinstructions in a 1 , and then predict the result of t 23 to enable the execution of t 25 . As a result the program counter speculatively fetches the instruction at location secret although * admin ≠ 1. The transition yields the observation sequence il a 1 :: il a 2 :: il secret, which was not possible in the sequential model, thus violating the security condition and leaking the value of secret through the instruction cache.
D. New Vulnerability: Spectre-OOO
Spectre-OOO is a class of vulnerabilities that do not rely on speculation, but merely use Out-Of-Order execution and dependencies between memory accesses. The following program (and its MIL translation) loads register r 1 from address b 1 , copies the value of r 1 into r 2 if the flag z is set, and saves the result into b 2 .
a1: r1 = *b1;
a2: cmov z, r2, r1;
a3: *b2 = r2;
The "conditional move" instruction in a 2 executes in constant time [19] and is used to re-write branches that may leak information via the execution time or the instruction cache. Suppose that flag z contains sensitive information and the attacker observes only the data cache. In the sequential model the program is considered secure, since it satisfies the "constant time" policy. This is a popular countermeasure to prevent sensitive data from affecting the execution time and caches by ensuring that conditions of branches and addresses of memory accesses are independent of sensitive data. The example program always accesses addresses b 1 and b 2 unconditionally and executes the same ISA instructions, producing the sequence of observations dl b 1 :: ds b 2 . However, the program becomes insecure in the presence of out-of-order execution, since the data dependency between t 11 and t 32 exists only if z is set. Concretely, consider two states σ 1 and σ 2 in which z = 1 and z = 0, respectively. Then, str-act(σ 1 , t 31 ) = {t 24 } and str-act(σ 1 , t 24 ) = {t 12 }, while str-act(σ 2 , t 31 ) = {t 23 } and str-act(σ 2 , t 23 ) = {r 2 }. Therefore, state σ 2 may produce the observation sequence ds b 2 :: dl b 1 , which is possible if and only if the flag z = 0, thus leaking its value through the data cache.
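The reordering argument can be illustrated with a small Python sketch that enumerates the data-cache traces allowed by the dependencies. It is a deliberately coarse abstraction, not the MIL translation: the program is reduced to its two memory accesses, and the only assumption encoded is the one stated above, namely that the store to b 2 depends on the load from b 1 exactly when z is set. The function names are hypothetical.

```python
from itertools import permutations


def ooo_traces(z):
    """Data-cache traces of all memory-access orders that respect the dependencies."""
    ops = {"load": "dl b1", "store": "ds b2"}
    # Assumption from the text: the store needs the loaded value only if z is set.
    deps = {"load": set(), "store": {"load"} if z else set()}
    result = set()
    for order in permutations(ops):
        done = set()
        valid = True
        for op in order:
            if not deps[op] <= done:  # a dependency has not executed yet
                valid = False
                break
            done.add(op)
        if valid:
            result.add(" :: ".join(ops[op] for op in order))
    return result


def sequential_traces(z):
    """The in-order model always performs the load first, whatever z is."""
    return {"dl b1 :: ds b2"}


leak = (sequential_traces(1) == sequential_traces(0)
        and ooo_traces(1) != ooo_traces(0))
```

The two states are indistinguishable in the sequential model, while the trace ds b2 :: dl b1 is available only when z = 0, which is the discriminating observation.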
VII. FORMAL ANALYSIS OF COUNTERMEASURES
In this section, we use InSpectre to reason about the security of countermeasures for microarchitectural leaks.
A. Constant-Time Execution
Spectre-OOO, as discussed in Section VI-D, shows that constant-time execution for ISA instructions is insufficient to enforce security at the microarchitectural level, e.g., if data dependencies between registers are influenced by secrets. This motivates the need for a new definition of constant time that is aware of microarchitectural features. 
: the same values are used to address resources and to update the program counter. Notice that, in addition to the standard requirements of constant time, MIL constant time requires that starting from two ∼-indistinguishable states the program makes the same accesses to registers. We can then prove that MIL constant time is sufficient to ensure security in the out-of-order model.
Theorem 3: If a program P is MIL constant time then P is conditionally noninterferent in the out-of-order model.
The theorem enables checking conditional noninterference for the out-of-order model by verifying MIL constant time in the sequential model. This strategy has the advantage of performing the verification in the sequential model, which is deterministic, thus making it easier to reuse existing tools for (sequential) binary code analyses [20] .
Finally, we remark that MIL constant time is microarchitecture-aware. This means that the same ISA program may or may not satisfy MIL constant time when translated to a given microarchitecture. In fact, the MIL translation of the conditional move in Section VI (e.g. instruction a2) is not MIL constant time because of the dependency between the sensitive value in t 21 and the conditional store in t 23 . However, if a microarchitecture translates the same conditional move in MIL as below, the translation is clearly MIL constant time.
B. Retpoline for Spectre-BTB
A known countermeasure to Spectre-BTB is the Retpoline technique developed by Google [11] . In a nutshell, retpolines are instruction snippets that isolate indirect jumps from speculative execution via call and return instructions. Retpoline has the effect of transforming the indirect jump at address a 2 of Example 4 as:
The instruction at a 2 calls a trampoline starting at address b 1 , and the instruction at a 3 loops indefinitely. The first instruction of the trampoline overwrites the return address on the stack with the value at address p, and its second instruction, at b 2 , returns. We leverage our model to analyze the effectiveness of Retpoline for indirect jumps. Since address b 1 is known at compile time, t 26 does not trigger jump target prediction. While executing the trampoline, the value of t 55 may be mispredicted, especially if the load from p has not been executed and the store t 45 is postponed. However, b 2 is a ret, hence the value of t 55 is predicted via pred RSB . Since there is no call between a 1 and b 2 , prediction can only assign the address a 3 to t 55 (i.e., pred RSB | t55 ⊆ {t 55 → a 3 }). Therefore, the RSB entry generated by a 2 is used and mispredictions are captured by the infinite loop in a 3 . Ultimately, when the value of t 55 is resolved, the correct return address is used and the control flow is redirected to the value of * p, as expected.
C. Hardware Countermeasure Example
The specification of proposed hardware countermeasures oftentimes comes with no precise semantics and is ambiguous. InSpectre provides a ground to formalize the behavior of hardware mechanisms like memory fences. For instance, ARM introduced the Speculative Store Bypass Safe (SSBS) configuration to prevent store bypass vulnerabilities. The specification of SSBS [12] is: "Hardware is not permitted to load . . . speculatively, in a manner that could . . . give rise to a . . . side channel, using an address derived from a register value that has been loaded from memory . . . (L) that speculatively reads an entry from earlier in the coherence order from that location being loaded from than the entry generated by the latest store (S) to that location using the same virtual address as L."
We formalize SSBS as follows. Let σ = (I , s, C , F , δ, P ) and t ← c?ld τ t a ∈ σ. If σ l − − − − σ′, σ(t)↑, and σ′(t)↓, then for every t′ ∈ srcs(t, σ), if σ(t′) ≠ σ(t a ) then t′ ∉ P .
The reason why SSBS prevents Spectre-STL is simple. The rule forbids the execution of a load t if any address used to identify the last store affecting t a has been predicted to differ from t a . Observe that SSBS does not prevent Spectre-STL-D, where, in order to enable store forwarding, the CPU predicts that a store affects the same address as a subsequent load.
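The SSBS condition, as described in the paragraph above, can be phrased as a small Python predicate. This is a sketch of the prose reading only: srcs is assumed to be the set of store-address names that determine the load's active stores, s maps resolved or predicted names to values, and P is the set of predicted names. The function name ssbs_allows_load and the example values are hypothetical.

```python
def ssbs_allows_load(load_addr, srcs, s, P):
    """True if no source address is merely *predicted* to differ from the load's address."""
    a = s[load_addr]
    return all(s.get(t) == a or t not in P for t in srcs)


srcs = {"t11", "t21", "t31"}  # addresses of the stores preceding the load

# Spectre-STL: t31 is predicted to be 0, i.e. to differ from the load address 1.
stl = ssbs_allows_load("t41", srcs, {"t11": 1, "t21": 0, "t31": 0, "t41": 1}, P={"t31"})

# Same values, but t31 has actually been executed rather than predicted.
resolved = ssbs_allows_load("t41", srcs, {"t11": 1, "t21": 0, "t31": 0, "t41": 1}, P=set())

# Spectre-STL-D: t31 is predicted to be *equal* to the load address.
stld = ssbs_allows_load("t41", srcs, {"t11": 1, "t21": 0, "t31": 1, "t41": 1}, P={"t31"})
```

The first case is rejected and the second accepted, while the third is also accepted, which mirrors the observation that SSBS blocks Spectre-STL but not Spectre-STL-D.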
VIII. RELATED WORK
Speculative semantics and foundations Several works have recently addressed the formal foundations of specific forms of speculation to capture Spectre-like vulnerabilities. Cheang et al. [5] , Guarnieri et al. [6] , and Mcilroy et al. [7] propose semantics that support branch prediction, thus modeling only Spectre v1. None of these works supports speculation of target addresses, speculation of dependencies, or out-of-order execution. Disselkoen et al. [8] propose a pomset-based semantics that supports out-of-order execution and branch prediction. Their model targets a higher abstraction level, modeling memory references using logical program variables. Hence, the model cannot support dynamic dependency resolution, dependency prediction, and speculation of target addresses.
We also remark on a related unpublished work by Cauligi et al. [21] appearing in arXiv in October 2019. Like us, they propose a model that captures existing variants of Spectre and discover a vulnerability similar to our Spectre-STL-D. A key difference between the two models is that Cauligi et al. impose a sequential order on instruction retirement and memory stores. While simplifying the proof of memory consistency, this does not reflect the inner workings of modern CPUs, which reorder memory stores and implement a relaxed consistency model. These features are required to capture Spectre-OOO in Section VI-D, where the leak is caused by store reordering. Moreover, our model provides a clean separation between the general speculative semantics and the microarchitecture-specific features, where the latter are obtained by reducing the nondeterminism of the former. This enables a modular analysis of (combinations of) predictive strategies, as in Spectre-SF in Section VI-C, showing that speculative fetching is insecure even in the absence of speculative execution.
Cache side channels In line with prior works [5] , [6] , [21] , our attacker model abstracts away the mechanism used by an attacker to profile the sequence of a victim's memory accesses, providing a general account of trace-driven attacks [22] . Complementary works [14] , [23] - [25] show that cache profiling is becoming increasingly steady and precise. Performance jitters caused by cache usage have been widely exploited to leak sensitive data [26] - [32] , e.g., in cryptography software. We refer to a recent survey by Canella et al. [33] on cache-based attacks and countermeasures.
Spectre vs Meltdown Recent attacks that use microarchitectural effects of speculative execution have been generally distinguished as Spectre and Meltdown attacks [3] . We focus on the former [1] , [16] , [34] - [39] , which exploits speculation to cause a victim program to transiently access sensitive memory locations that the attacker is not authorized to read. Meltdown attacks [2] transiently bypass the hardware security mechanisms that enforce memory isolation. Importantly, Meltdown attacks can be easily countered in hardware, while Spectre attacks require hardware-software co-design, which motivates our model. We remark that the vulnerability in Section VI-B is different from the recent Microarchitectural Data Sampling attacks [40] - [42] , since it only requires the CPU to predict memory aliases with no need of violating memory protection mechanisms. Microarchitectures supporting this feature have been proposed, e.g., in Feiste et al. [43] .
Tool support Several research prototypes have been developed to reproduce and detect known Spectre-PHT attacks [5] , [6] , [9] . Checkmate [44] synthesizes proof-of-concept attacks by using models of speculative and out-of-order pipelines. Tool support for vulnerabilities beyond Spectre-PHT requires dealing with a large number of possible predictions and instruction interleavings. In fact, current tools mainly focus on Spectre-PHT ignoring out-of-order execution.
Hardware countermeasures While CPU vendors and researchers propose countermeasures, it is hard to validate their effectiveness and security without a model. InSpectre can help in modeling and reasoning about their security guarantees, as in Section VII-C. Similarly, InSpectre can model the hardware configurations and fences designed by Intel [45] to stall (part of) an instruction stream in case of speculation. Several works [46] - [49] propose clean-slate approaches for designing security-aware hardware that prevents Spectre-like attacks. InSpectre can help in formalizing these hardware features and analyzing their security.
Concluding Remarks There are a number of interesting directions left open in this work, as indicated above. First, more work, including tooling, is needed to explore the utility of the model for exploit search and countermeasure proofs, and the framework needs to be instantiated to different concrete pipeline architectures and experimentally validated. Going beyond single cores, it is also of interest to augment the model with fences and other synchronization constructs for multicore applications.
APPENDIX
A. List of examples
B. Security Condition
Alternatively, we can define the security condition directly on the target model, in the style of standard noninterference.
Definition 8 (Noninterference): Let P be a program with transition relation →, and ∼ a security policy. P satisfies noninterference iff for all σ 1 , σ 2 ∈ States such that σ 1 ∼ σ 2 and every execution π 1 = σ 1 → · · · , there exists an execution π 2 = σ 2 → · · · such that trace(π 1 ) = trace(π 2 ).
Intuitively, if the observations do not enable an attacker to refine their knowledge of sensitive information beyond what is allowed by the policy ∼, the program can be considered secure. Noninterference can accommodate partial release of sensitive information by refining the definition of the indistinguishability relation ∼. In our context, a precise definition of ∼ can be challenging to give. However, we ultimately aim at showing that the OOO/speculative model does not leak more information than the in-order (sequential) model, thus capturing the intuition that microarchitectural features like OOO and speculation do not introduce additional leaks.
Therefore, instead of defining the policy ∼ explicitly, we split it into two relations ∼ L and ∼ D , where the former models information of the initial state that is known by the attacker, i.e., the public resources, and the latter models information that the attacker is allowed to learn during the execution via observations. Hence, ∼ = ∼ L ∩ ∼ D . This characterization allows for a simpler formulation of the security condition that is transparent with respect to the definition of ∼ D , as described in Def. 1.
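As an illustration of Definition 8, the following is a minimal sketch of a bounded noninterference check for a finite transition system. It is not part of InSpectre: the function names (`traces`, `noninterferent`), the encoding of states as `(pc, public, secret)` tuples, and the two toy programs are assumptions made for this example, and the check only compares trace prefixes up to a fixed depth.

```python
from itertools import product

def traces(state, step, depth):
    """Observation traces of length <= depth starting from `state`.

    `step(state)` is assumed to return an iterable of
    (observation, next_state) pairs (a hypothetical encoding).
    """
    result = {()}
    frontier = {((), state)}
    for _ in range(depth):
        successors = set()
        for trace, s in frontier:
            for obs, s_next in step(s):
                successors.add((trace + (obs,), s_next))
        result |= {trace for trace, _ in successors}
        frontier = successors
    return result

def noninterferent(states, step, equiv, depth):
    """Bounded check: equivalent initial states yield matching traces."""
    for s1, s2 in product(states, repeat=2):
        if equiv(s1, s2) and not traces(s1, step, depth) <= traces(s2, step, depth):
            return False
    return True

# Toy programs over states (pc, public, secret); both are illustrative only.
def safe_step(state):
    pc, pub, sec = state
    # The observed address depends on public data only.
    return [(("dl", pub), (1, pub, sec))] if pc == 0 else []

def leaky_step(state):
    pc, pub, sec = state
    # The observed address depends on the secret.
    return [(("dl", sec), (1, pub, sec))] if pc == 0 else []

states = [(0, p, s) for p in range(2) for s in range(2)]
low_equiv = lambda a, b: a[0] == b[0] and a[1] == b[1]

print(noninterferent(states, safe_step, low_equiv, depth=2))
print(noninterferent(states, leaky_step, low_equiv, depth=2))
```

Here `low_equiv` plays the role of the policy ∼ by relating states that agree on the public component. The bounded prefix comparison is a simplification of the definition, which quantifies over full executions.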
C. Model of Spectre-PHT
We show how InSpectre models Spectre-PHT [1], which exploits the prediction mechanism for the outcome of conditional branches. Modern CPUs use Pattern History Tables (PHT) to record patterns of past executions of conditional branches, i.e., whether the true or the false branch was executed, and then use this history to predict the direction of a branch prior to knowing its actual outcome. By poisoning the PHT to favor one direction (say the true branch), an attacker can fool the prediction mechanism into executing the true branch, even when the actual outcome of the branch is ultimately false.
This example illustrates information leaks via Spectre-PHT. Suppose the security policy labels as public only the data within the bounds of A 1 and A 2 , and the data in r 0 . If the attacker controls the value of r 0 , this program is clearly secure (at the ISA level), as it ensures that r 0 always lies within the bounds of A 1 . However, an attacker can fool the prediction mechanism by first supplying values of r 0 that execute the true branch, and then a value that exceeds the size of A 1 . This causes the CPU to perform an out-of-bounds memory access to sensitive data. This data is used as an index for a second memory access to A 2 , thus leaving a trace of the sensitive data in the cache, which can be exploited by the attacker via well-known techniques [3].
Branch prediction predicts values for MIL instructions that block the evaluation of the guard of a PC store whose target address has been already resolved. We model branch prediction by defining pred br (I , s, C , F , δ, P ) =
In order to show that branch prediction can lead to violations of the security condition, we use the variation of Spectre v1 above for a rich ISA supporting pointer indirection and conditional branches. The victim's code consists of instructions at addresses a 1 , a 2 and a 3 , which we translate to MIL:
Clearly, if r 0 ≥ A 1 .size, the observation reveals memory content outside A 1 , allowing an attacker to learn sensitive data. Observe that this is rejected by the security condition, since such an observation is not possible in the sequential semantics.
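The effect described above can be reproduced in a toy simulation. The sketch below is not the MIL model: it assumes a single two-bit saturating counter as the predictor (a common textbook scheme), a flat memory in which A 1 starts at address 0 and is followed directly by a secret, and a hypothetical base address `A2_BASE` for A 2 . The names `PatternHistoryEntry`, `run_victim`, and `attack` are introduced only for this example.

```python
A2_BASE = 100  # hypothetical base address of array A2

class PatternHistoryEntry:
    """Two-bit saturating counter for one conditional branch (assumed scheme)."""
    def __init__(self):
        self.counter = 0  # 0-1 predict not taken, 2-3 predict taken

    def predict_taken(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def run_victim(memory, a1_size, r0, pht, speculate):
    """Victim: if r0 < A1.size then load A2[A1[r0]].

    Returns the data-load observations, including loads issued
    transiently on a mispredicted branch when `speculate` is True.
    """
    in_bounds = r0 < a1_size
    body_runs = in_bounds or (speculate and pht.predict_taken())
    observations = []
    if body_runs:
        observations.append(("dl", r0))            # load A1[r0]; A1 starts at 0
        index = memory[r0]
        observations.append(("dl", A2_BASE + index))  # load A2[A1[r0]]
    pht.update(in_bounds)  # the predictor learns the actual outcome
    return observations

def attack(speculate):
    memory = [0, 1, 2, 3, 42]  # A1 = memory[0:4]; memory[4] holds the secret
    pht = PatternHistoryEntry()
    for _ in range(3):         # train the predictor with in-bounds inputs
        run_victim(memory, 4, 0, pht, speculate)
    return run_victim(memory, 4, 4, pht, speculate)  # out-of-bounds input

print(attack(speculate=True))
print(attack(speculate=False))
```

With speculation enabled, the final call observes a load whose address depends on the secret, whereas the sequential run produces no observation for the same input. This mirrors the argument that the leaking observation is not possible in the sequential semantics.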
D. Proof of Memory Consistency for the Out Of Order Semantics: Theorem 1
To prove that − − and − → are memory consistent, we establish a reordering lemma, which allows two consecutive transitions to be commuted if the n-th transition modified name t 1 , the (n + 1)-th transition modified name t 2 , and t 2 < t 1 .
[Diagram: reordering of transitions over the states σ 1 , σ 2 , . . . , σ n , σ n+1 ]
We use the following notation. Let σ 0 . . . σ n be a sequence of states. We define (I j , s j , C j , F j ) = σ j for j ∈ {0 . . . n},
We first demonstrate that str-may and str-act of a microinstruction t do not depend on names bigger than t and that they are monotonic.
Lemma 2: Let σ 0 and σ 1 be two states. If bn(Î 1 ) ≥ t and dom(ŝ 1 ) ≥ t then str-may(σ 1 , t) = str-may(σ 0 , t) and str-act(σ 1 , t) = str-act(σ 0 , t).
PROOF. Let t be a load or store accessing address t a (the other cases are trivial, since str-act is undefined), hence t a < t.
(1) The set of instructions that precedes t is the same in σ 0 and σ 1 . In fact, since bn(
(2) For every store that precedes t, the evaluation of its condition and address is the same in σ 0 and σ 1 . In fact, let ι′ = t′ ← c′?st τ t′ a t′ v with t′ < t; then n(c′) ∪ {t′ a } < t. Therefore, [c′](s 0 ∪ ŝ 1 ) = [c′]s 0 and (s 0 ∪ ŝ 1 )(t′ a ) = s 0 (t′ a ).
(3) Similarly, since t a < t then (s 0 ∪ ŝ 1 )(t a ) = s 0 (t a ). Properties (1, 2, 3) guarantee that str-may(σ 1 , t) = str-may(σ 0 , t). Similarly, since str-act depends on the addresses and conditions of stores in str-may(σ 1 , t) and these have names smaller than t, then str-act(σ 1 , t) = str-act(σ 0 , t). □
Lemma 3: If σ 0 l −− σ 1 and t ← c?o ∈ σ then str-may(σ 1 , t) ⊆ str-may(σ 0 , t) and str-act(σ 1 , t) ⊆ str-act(σ 0 , t).
PROOF. The proof is by cases on α 1 . For commits the proof is trivial, since Î 1 = ∅ and ŝ 1 = ∅. For fetches, ŝ 1 = ∅ and the transition may decode new stores in Î 1 . However, these new stores have names greater than max(I 0 ), hence their names are greater than t. Therefore the additional stores do not affect str-may and str-act.
For executions, Î 1 = ∅, ŝ 1 = {t 1 → v}, and s 0 (t 1 )↑ for some v. This store update can make the evaluation of the condition or expression of a store defined. In this case, if a store is in str-may(σ 1 , t) it must also be in str-may(σ 0 , t). Stores that are in str-may(σ 0 , t) but are not in str-may(σ 1 , t) have undefined condition or address in σ 0 and false condition or non-matching address in σ 1 . To show that str-act does not increase we proceed as follows. Let t′ be a store in str-may(σ 0 , t) \ str-act(σ 0 , t). There must be a subsequent overwriting store t″ in str-may(σ 0 , t) whose condition holds in σ 0 and whose address is defined in σ 0 . Such a store cannot have t 1 among its free names, hence it is also in str-may(σ 1 , t). Therefore the store t″ overwrites t′ in σ 1 too. □
The proof of Theorem 1 is by induction on the length of traces. It relies on Lemma 4 to demonstrate that (α 2 , t 2 ) can be applied in σ n and on Lemma 5 to show that (α 1 , t 1 ) can be applied in the resulting state σ n+1 to obtain σ n+2 .
step-param(σ 0 , σ′) = (α 2 , t 2 ), and if α 2 = CMT(a, v) then α 1 ≠ CMT(a, v′).
PROOF. We first bound the effects of the transitions that modified t 1 .
(1) If σ 1 l2 − −− σ 2 and step-param(σ 1 , σ 2 ) = (α 2 , t 2 ), then there exists ι 2 = t 2 ← c?o ∈ I 1 . Since step-param(σ 0 , σ 1 ) = (α 1 , t 1 ) then bn(Î 1 ) > t 1 > t 2 . Therefore
The proof continues by cases on the transition rule α 2 .
(Case EXE) The hypotheses of the rule ensure that s 1 (t 2 )↑, [c]s 1 , and [ι 2 ]σ 1 = (v, l 2 ). The conclusion of the rule ensures that ŝ 2 = {t 2 → v}, Ĉ 2 = ∅, F̂ 2 = ∅, and Î 2 = ∅. The proof that t 2 can be executed in σ 0 relies on the fact that all free names of the instruction t 2 must be smaller than t 2 . Property (2), fn(ι 2 ) < t 2 , and t 2 < t 1 ensure that s 0 (t 2 )↑ and [c]s 0 . The same properties guarantee that [ι 2 ]σ 0 = (v, l 2 ). For internal operations and stores the proof is trivial, since fn(ι 2 ) < t 2 and t 2 < t 1 . The proof for loads uses Lemma 2 to guarantee that str-act(σ 0 , t 2 ) = str-act(σ 1 , t 2 ).
Hence we can apply rule (exec) to show that σ 0
The hypotheses of the rule ensure that s 1 (t 2 ) = v, t 2 ∉ C 1 , bn(str-may(σ 1 , t 2 )) ⊆ C 1 , and s 1 (t a ) = a. The conclusion of the rule ensures that ŝ 2 = ∅, Ĉ 2 = {t 2 }, F̂ 2 = ∅, I t2 = ∅, and l 2 = ds a. Property (2) and t a < t 2 < t 1 ensure that s 0 (t 2 ) = v, t 2 ∉ C 0 , and s 0 (t a ) = a. As in the case exec-load, Lemma 2 guarantees that bn(str-may(σ 1 , t 2 )) = bn(str-may(σ 0 , t 2 )). Since the names in str-may are smaller than t 2 , bn(str-may(σ 0 , t 2 )) ⊆ C 0 .
Hence we can apply rule (commit) to show that σ 0 l2 − −− (I 0 , s 0 , C 0 ∪ {t 2 }, F 0 ) = σ′. Finally, to show that α 1 ≠ CMT(a, v′) we proceed by contradiction. If α 1 = CMT(a, v′), then bn(str-may(σ 0 , t 1 )) ⊆ C 0 . However, t 2 ∈ bn(str-may(σ 0 , t 1 )), because they write the same address a and t 2 < t 1 . This contradicts t 2 ∉ C 0 .
(Case FTC) In this case o = st PC t v . The hypotheses of the rule ensure that s 1 (t 2 ) = v, t 2 ∉ F 1 , and bn(str-may(σ 1 , t 2 )) ⊆ F 1 . The conclusion of the rule ensures that ŝ 2 = ∅, Ĉ 2 = ∅, F̂ 2 = {t 2 }, Î 2 = translate(v, max(I 1 )), and l 2 = il a. Property (2) and fn(c) ∪ {t a } < t 2 < t 1 ensure that s 0 (t 2 ) = v and t 2 ∉ F 0 . As in the case commit, Lemma 2 guarantees that bn(str-may(σ 1 , t 2 )) = bn(str-may(σ 0 , t 2 )). Since the names in str-may are smaller than t 2 , bn(str-may(σ 0 , t 2 )) ⊆ F 0 . To complete the proof we must show that Î 1 = ∅. We proceed by contradiction: if Î 1 ≠ ∅ then α 1 = FTC(Î 1 ), hence this transition fetched t 1 and bn(str-may(σ 0 , t 1 )) ⊆ F 0 . However, t 2 < t 1 and both update the program counter, therefore t 2 ∈ bn(str-may(σ 0 , t 1 )). This contradicts t 2 ∉ F 0 .
Finally, we can apply rule FTC to show that σ 0
, step-param(σ , σ 2 ) = (α 1 , t 1 ), and if α 2 = CMT(a, v) then l 1 = CMT(a, v ). PROOF. The existence of σ is given by Lemma 4. For transition σ l1 − −− σ 2 we first bound its effects.
(1) If σ 0 l1 − −− σ 1 and step-param(σ 0 , σ 1 ) = (α 1 , t 1 ), then exists
We continue the proof by cases on α 2 .
(Case EXE) The hypotheses of the rule ensure that s 0 (t 1 )↑, [c]s 0 , and [ι 1 ]σ 0 = (v, l 1 ). The conclusion of the rule ensures that ŝ 1 = {t 1 → v}, Ĉ 1 = ∅, F̂ 1 = ∅, and I t1 = ∅.
(3) Property (2) and t 2 < t 1 ensure that s (t 1 )↑. Moreover,
The same argument is used to guarantee that [ι 1 ]σ′ = (v, l′ 1 ). For internal operations and stores the proof follows the same approach as (3). The proof for loads uses Lemma 3. Notice that for internal operations and stores l′ 1 = l 1 = ·. For loads, either l′ 1 = l 1 = dl a, or l′ 1 = dl a and l 1 = ·. The latter happens when step-param(σ 0 , σ′) = (CMT(a, v), t 2 ). In this case, we have reordered a memory commit before a load, making it impossible to forward the value of the store to the load and requiring a new memory interaction. This shows that the observations of the out-of-order model are a subset of the observations of the in-order model, since the out-of-order model can execute loads before the corresponding stores are committed.
Finally, we can apply rule EXE to show that σ
(Case CMT(a, v)) In this case o = st M t a t v . The hypotheses of the rule ensure that s 0 (t 1 ) = v, t 1 ∉ C 0 , bn(str-may(σ 0 , t 1 )) ⊆ C 0 , and s 0 (t a ) = a. The conclusion of the rule ensures that ŝ 1 = ∅, Ĉ 1 = {t 1 }, F̂ 1 = ∅, I t1 = ∅, and l 1 = ds a. Property (2) and t 2 < t 1 ensure that s′(t 1 ) = v and s′(t a ) = a. To show that bn(str-may(σ′, t 1 )) ⊆ C′ we use Lemma 3. Finally, t 1 ∉ C′, since t 1 > t 2 . To prove that α 2 ≠ CMT(a, v′) we proceed by contradiction. If α 2 = CMT(a, v′) then t 2 ∉ C 0 . However, t 2 ∈ bn(str-may(σ 0 , t 1 )), because they write the same address a and t 2 < t 1 . This contradicts bn(str-may(σ 0 , t 1 )) ⊆ C 0 .
Therefore we can apply rule CMT to show that σ′ l1 − −− (I′, s′, C′ ∪ {t 1 }, F′) = σ 2 . To show that α 1 = (a, b )
(Case FTC) In this case o = st PC t v . The hypotheses of the rule ensure that s 0 (t 1 ) = v, t 1 ∉ F 0 , and bn(str-may(σ 0 , t 1 )) ⊆ F 0 . The conclusion of the rule ensures that ŝ 1 = ∅, Ĉ 1 = ∅, F̂ 1 = {t 1 }, I t1 = translate(v, max(I 1 )), and l 1 = il a. Property (2) and t 2 < t 1 ensure that s′(t 1 ) = v. To show that bn(str-may(σ′, t 1 )) ⊆ F′ we use Lemma 3.
Finally, t 1 ∉ F′, since t 1 > t 2 .
To complete the proof we must show that Î 2 = ∅. We proceed by contradiction: if Î 2 ≠ ∅ then α 2 = FTC, hence this transition fetched t 2 and t 2 ∉ F 0 . However, t 2 < t 1 and both update the program counter, therefore t 2 ∈ bn(str-may(σ 0 , t 1 )). This contradicts the hypothesis bn(str-may(σ 0 , t 1 )) ⊆ F 0 . Finally, we can apply rule FTC to show that σ
E. Proof of correctness of t-equivalence: Lemma 1
If σ 1 ∼ t σ 2 and the t's microinstruction in
PROOF. For non-load operations and guards the proof is trivial, since their semantics only depends on the value of the bound names of the operation and guard. These names are statically identified from c and o and their value is the same in σ 1 and σ 2 by definition of ∼ t . For loads, the proof relies on showing that for every state σ, str-act(σ, t) = str-act(σ | T , t), where T = deps(t, σ). Let t′ ∈ str-act(σ, t). By definition, t′ and its bound names are in T , therefore their values are equal in σ and σ | T and hence t′ ∈ str-may(σ | T , t). Also, by definition of deps, all names referred to by conditions and addresses of subsequent stores of t′ are in T . Therefore if there is no subsequent store that overwrites t′ (i.e., t ← c ?st τ t a t v such that [c ]σ and (σ(t a ) = σ(t) ∨ σ(t a ) = σ(t a ))) then there is no store overwriting t′ in σ | T and hence t′ ∈ str-act(σ | T , t). □
F. Proof of Security of MIL Constant Time: Theorem 3
The proof is done by showing that the relation R is a bisimulation for the out-of-order transition relation, where σ R σ′ iff σ ≈ σ′ and there exist σ 0 ∼ L σ′ 0 and n such that σ 0 − − n σ and σ′ 0 − − n σ′.
Let (I, s, C, F) = σ, (I′, s′, C′, F′) = σ′, σ − − σ 1 = (I ∪ I t , s ∪ s t , C ∪ C t , F ∪ F t ), and step-param(σ, σ 1 ) = (α, t). The proof is done by cases on α. We must show that there exists v′ such that [ι]σ′ = (v′, l). For o = e and o = st τ t a t v the proof is trivial. In fact, since [ι]σ is defined, all free names of o are defined in σ, and ≈ ensures that the same names are defined in σ′. For o = ld τ t a , [ι]σ = (v, l) ensures that bn(str-act(σ, t)) = {t s }, σ(t a )↓, and σ(t s )↓. Relation ≈ ensures that σ′(t a ) = σ(t a ), σ′(t s )↓, I′ = I (hence there are the same store instructions), and that for every store t′ ← c′?st τ t′ a t′ v , [c′]σ = [c′]σ′ and σ(t′ a ) = σ′(t′ a ). Therefore bn(str-act(σ′, t)) = bn(str-act(σ, t)) and [ι]σ′ = (σ′(t s ), l′). Finally, since relation ≈ guarantees that (t s ∈ C) ⇔ (t s ∈ C′), we have l′ = l. These properties permit applying rule (EXE) to show that σ′
For t ∈ fn(c′) we reason as follows. States σ 1 and σ′ 1 are the (n+1)-th states of two out-of-order traces ρ = σ 0 − − n+1 σ 1 and ρ′ = σ′ 0 − − n+1 σ′ 1 such that σ 0 ∼ L σ′ 0 . There is a trace ρ 1 = σ 0 − − n+1 σ 1 − − m σ s that has prefix ρ, such that C(σ s , t′) for every t′ ≤ max(bn(I)). Notice that this state is "sequential". Since the storage is monotonic in the out-of-order semantics, σ 1 (t) = σ s (t). Theorem 1 allows us to connect this trace to a sequential trace, which enables the use of the MIL constant-time hypothesis. In fact, there exists an ordered execution π of ρ 1 that ends in σ s : π = σ 0 − → n+1+m σ s . For the same reason, ρ′ is a prefix of a trace ρ′ 1 that ends in a sequential state σ′ s , σ′ s (t) = σ′ 1 (t), and there exists a sequential trace π′ of n + 1 + m steps that ends in σ′ s . Since
(Case CMT) The hypotheses of the rule ensure that t ← c?st M t a t v ∈ I, s(t) = v, t ∉ C, bn(str-may(σ, t)) ⊆ C, and s(t a ) = a. The conclusion of the rule ensures that s t = ∅, C t = {t}, F t = ∅, I t = ∅, and l = ds a. Also, the invariant guarantees that [c]s.
To show that bn(str-may(σ′, t)) = bn(str-may(σ, t)) we use the same reasoning used to prove that bn(str-act(σ′, t)) = {t s } in case EXE when o = ld t a . Relation ≈ ensures that bn(str-may(σ′, t)) ⊆ C′. Therefore we can apply rule (CMT) to show that σ′ l −− (I′, s′, C′ ∪ {t}, F′) = σ′ 1 , hence σ 1 ≈ σ′ 1 .
(Case FTC) The hypotheses of the rule ensure that t ← c?st PC t v ∈ I, s(t) = v, t ∉ F, and bn(str-may(σ, t)) ⊆ F. The conclusion of the rule ensures that s t = ∅, C t = ∅, F t = {t}, I t = translate(v, max(I)), and l = il v. Also, the invariant guarantees that [c]s. The relation ≈ ensures that t ← c?st PC t v ∈ I′, ∃v′. s′(t) = v′, t ∉ F′, and [c]s′. To show that bn(str-may(σ′, t)) = bn(str-may(σ, t)) we use the same reasoning used to prove that bn(str-act(σ′, t)) = {t s } in case EXE when o = ld t a . Relation ≈ ensures that bn(str-may(σ′, t)) ⊆ F′. Therefore we can apply rule (FTC) to show that σ′ il v′ − −−−− (I′ ∪ translate(v′, max(I′)), s′, C′, F′ ∪ {t}) = σ′ 1 . To show that v′ = v and that ≈ is re-established for translate(v, max(I)) we use reasoning similar to case EXE. We find two sequential traces that end with the fetch of t and we use MIL constant time to show that the value used for the PC update must be the same and that the parameters and conditions of the newly decoded microinstructions are equivalent in the two states.
G. Proof of Memory Consistency of the Speculative Semantics: Theorem 2
We reduce memory consistency for the speculation model to the ooo case using Theorem 1. Since the ooo semantics already takes care of reordering, to prove Theorem 2 a bisimulation argument suffices. Intuitively, referring to Figure 2 , the states "decoded", "guessed", "speculated" and "speculatively fetched" in the speculative semantics all correspond in some sense to the state "decoded" in the out-of-order semantics, in that any progress can still be undone to return to the "decoded" state. In a similar vein, the state "retired" corresponds to "executed" in the ooo semantics, "fetched" to "fetched" and "committed" to "committed". The only exception is states that are speculatively fetched. In this case there is an option to directly retire the fetched state, without passing through "retired" first. The proof reflects this intuition.
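The state correspondence sketched in this paragraph can be written down as a simple lookup table. The following is an illustrative encoding only: the dictionary `SPEC_TO_OOO`, the function `project`, and the use of plain strings and integer names for microinstruction states are assumptions for this example and do not appear in the formal development.

```python
# Hypothetical encoding of the correspondence between per-microinstruction
# states of the speculative semantics and of the out-of-order semantics.
SPEC_TO_OOO = {
    "decoded": "decoded",
    "guessed": "decoded",
    "speculated": "decoded",
    "speculatively fetched": "decoded",
    "retired": "executed",
    "fetched": "fetched",
    "committed": "committed",
}

def project(spec_states):
    """Map each microinstruction name to its corresponding ooo state."""
    return {name: SPEC_TO_OOO[state] for name, state in spec_states.items()}

# Example: three microinstructions, named 1, 2, 3, in a speculative state.
print(project({1: "retired", 2: "speculated", 3: "speculatively fetched"}))
```

All states from which progress can still be undone collapse to "decoded", which is why the bisimulation only needs to track retired, fetched, and committed microinstructions on the ooo side.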
The main challenges in defining the bisimulation are i) to pin down the non-speculated instructions in the speculative semantics and relate them correctly to instructions in the ooo semantics, and ii) to account for speculatively fetched instructions. The latter issue arises when retiring an instruction in the speculative semantics that has earlier been speculatively fetched. In that case, the corresponding decoded microinstructions are already in flight, although the bisimilar ooo state will have no trace of this. Then the ooo microinstruction will have to be first executed and then fetched. The following definitions make this intuition precise.
First say that a name t′ is produced by the PC store microinstruction t ← c?st PC t v ∈ I if t ≺ t′, i.e. t ∈ dom(δ(t′)). We would like to conclude that t is uniquely determined, as we need this to properly relate the speculative and ooo states. However, this does not hold in general. For a counterexample consider the PC store microinstruction t ← c?st PC t v . Suppose that the fetch from t causes a new instruction I′ to be allocated with another PC store instruction t′ ← c′?st PC t′ v followed by a PC load t″ ← c″?ld PC, i.e. such that t′ < t″. At this point, δ(t′) = [t → v] and δ(t″) = [t → v].
After executing the fetched PC store t′ and then the PC load, s(t′) = v′ and s(t″) = v′. At this point, δ(t″) will map t to v and t′ to v′. But then t″ is produced by both t and t′. The same property holds if t′ is used as an argument to operations other than a PC load. This causes us to impose the following well-formedness condition on instruction translations:
Definition 9 (Wellformed instruction translation): The translation function translate is wellformed if translate(v, t) = I implies:
3) For all s there is a unique t ← c?st PC t v such that [c]s.
Conditions 9.1 and 9.2 can be imposed without loss of generality since any occurrence of t bound to the microinstruction c?st pc v can be replaced by v itself, and condition 9.3 is natural to ensure that any linear control flow gives rise to a correspondingly linear flow of instructions. We obtain:
Proposition 1: If t is produced by t 1 and t is produced by t 2 then t 1 = t 2 .
□
Consider now instructions (images of translate) I 1 and I 2 such that bn(I 1 ) ∩ bn(I 2 ) = ∅. Say that I 1 produces I 2 , I 1 < I 2 , if there is t ← c?st PC t v ∈ I 1 such that for each t′ ∈ I 2 , t′ is produced by t. Clearly, if I′ is added to the set of microinstructions due to a fetch from I then I < I′. Say then that I (and by extension states containing I) is well-formed by the partitioning I retired , I 1 , . . . , I n , if I = {I retired , I 1 , . . . , I n }, I retired is retired, and for each I i , 1 ≤ i ≤ n, there is I′ ∈ {I retired , I 1 , . . . , I n } such that I′ < I i . Moreover we require that < * on the partitions {I retired , I 1 , . . . , I n } is well-founded and that the partitions are maximal. Note that if I is well-formed by I retired , I 1 , . . . , I n then the partitioning is unique. We note also that all reachable states in the speculative semantics are well-formed and each partition corresponds to the translation of one single ISA instruction. We say that an ISA instruction I i in the partitioning I 1 , . . . , I n is unconditionally fetched if I retired < I i , and let I uf be the union of I retired and the instructions that have been unconditionally fetched.
We can now proceed to define the bisimulation R. We restrict attention to reachable states in both the ooo and speculative semantics in order to keep the definition of R manageable and to be able to implicitly make use of simple invariant properties such as dom(δ) ∩ dom(δ(t)) ⇒ t ∉ C (no instruction with a speculated dependency is committed). Let (I 1 , s 1 , C 1 , F 1 ) R (I 2 , s 2 , C 2 , F 2 , δ 2 , P 2 ) if
1) I 2 is well-formed by the partitioning I 2,retired , I 2,1 , . . . , I 2,n .
2) There is a bijection · from I 2,uf to I 1 .
3) C 2 = C 1 ,
4) s 2 | dom(δ2) = s 1 ,
5) F 2 \ dom(δ 2 ) = F 1 .
In 3.-5. the bijection · is pointwise extended to sets and expressions.
Note that, from 2. and 4. we get that a microinstruction t in I 1 has been executed iff t ∉ dom(δ 2 ). We prove that R is a weak bisimulation in two steps. We first show that all speculative transitions up until retire or nonspeculative fetch are reversible. To prove this it is sufficient to show that each of the conditions 1.-4. is invariant under PRD, EXE, PEXE, RBK, and FTC, the latter under the condition that the fetched instruction is in δ 2 . These transitions are identified by T 1 in the following picture:
[Figure: transitions between speculative states (σ 2 , δ 2 , P 2 ), grouped into the classes T 1 , T 2 , and T 3 ]
Lemma 6: If σ 1 R (σ 2 , δ 2 , P 2 ) and (σ 2 , δ 2 , P 2 ) − − − − (σ′ 2 , δ′ 2 , P′ 2 ) is an instance of PRD, EXE, PEXE, RBK, or speculative FTC then σ 1 R (σ′ 2 , δ′ 2 , P′ 2 ).
PROOF. Let σ 2 = (I 2 , s 2 , C 2 , F 2 ) and (σ′ 2 , δ′ 2 , P′ 2 ) = (I′ 2 , s′ 2 , C′ 2 , F′ 2 , δ′ 2 , P′ 2 ).
(Case PRD) We get σ′ 2 = (I 2 , s 2 [t → v], C 2 , F 2 , δ 2 , P 2 ∪ {t}) and note that conditions 1.-5. are trivially satisfied by the assumptions.
(Case EXE) We get σ 2 − − σ′ 2 with step-param(σ 2 , σ′ 2 ) = (EXE, t), t ∈ P 2 , δ′ 2 = δ 2 ∪ {t → s 2 | deps(t,σ2) } and P′ 2 = P 2 . Cond. 1 and 2 are straightforward since I 2,uf and I 1 are not affected by the transition. For cond. 3 and 5 we get C′ 2 = C 2 and
(Case PEXE) In this case σ 2 | {t} − − σ′ 2 with step-param(σ 2 , σ′ 2 ) = (EXE, t), t ∈ P 2 , δ′ 2 = δ 2 ∪ {t → s 2 | deps(t,σ2) } and P′ 2 = P 2 \ {t}. Cond. 1 and 2 again are immediate. For cond. 3, C′ 2 = C 2 and for cond. 5,
since, again, dom(δ 2 ) \ dom(δ 2 ) = {t} and t ∈ dom(F 2 ). Finally for cond. 2,
(Case RBK) We get that (I 2 , s 2 , C 2 , F 2 ) ∼ t (I 2 , δ 2 (t), C 2 , F 2 ) and t ∈ P 2 . We get I′ 2 = I 2 \ ∆ + , s′ 2 = s 2 | ∆ * , C′ 2 = C 2 , F′ 2 = F 2 \ ∆ * , δ′ 2 = δ 2 \ ∆ * , and P′ 2 = P 2 \ ∆ * . For cond. 1 and 2 first note that t ∈ dom(δ 2 ). If t is not a PC store the effect of RBK is to remove t from s 2 , F 2 , δ 2 , P 2 . This does not affect the bijection ·, so 1 and 2 remain valid also for (σ′ 2 , δ′ 2 , P′ 2 ). If t is a PC store then we need to observe the following: Since t is speculated, t is a member of some "macro"-instruction (= partition) I 2,i . The set ∆ + contains all instructions/partitions I 2,j such that I 2,i < + I 2,j . In particular, no such I 2,j is in I 2,uf , since otherwise I 2,j would have been added by a retired PC store microinstruction. It follows that the bijection · is not affected by the removal of ∆ + , and 1 and 2 are reestablished for the new speculative state.
For cond. 3, C′ 2 = C 2 . For cond. 4, we calculate:
= s 2 | dom(δ 2 )∪P 2 ∪∆ * = s 2 | dom(δ2\∆ * )∪∆ * = s 2 | dom(δ2)∪∆ * = s 2 | dom(δ2) .
Note that the final step uses prop. ??. Finally for cond. 5:
) .
(Case speculative FTC) We get that σ 2 − − σ 2 , step-param(σ 2 , σ 2 ) = (FTC(I 2 ), t), P 2 = P 2 , δ 2 = δ 2 ∪ {t → s | {t} | t ∈ I 2 }, and, since t is speculated, t ∈ dom(δ 2 ). Also, we find t ← c?st PC t v ∈ I 2 , s 2 (t) = v, t ∉ F 2 , bn(str-may(σ 2 , t)) ⊆ F 2 , I 2 = I 2 ∪ I 2 , s 2 = s, C 2 = C 2 , and F 2 = F ∪ {t}. For cond. 1 and 2 observe that no instruction I added by the fetch can belong to I 2,uf , since all such instructions are produced by a retired PC store instruction. For cond. 3, C 2 = C 2 is immediate. For cond. 4 we calculate: s 2 | dom(δ 2 ) = s 2 | dom(δ2)∪{t |t ∈I 2 } = s 2 | dom(δ2) . Finally for cond. 5:
The following lemma handles the cases for RET when t ∈ F 2 , CMT, and FTC when t ∈ dom(δ 2 ) (which are identified by T 2 in the figure), and the cases for RET when t ∈ F 2 (which are identified by T 3 in the figure).
Lemma 7: Assume that σ 1 R (σ 2 , δ 2 , P 2 ).
1) If (σ 2 , δ 2 , P 2 ) − − − − (σ′ 2 , δ′ 2 , P′ 2 ) is an instance of CMT, FTC, or RET then σ 1 − − * σ′ 1 such that σ′ 1 R (σ′ 2 , δ′ 2 , P′ 2 ).
2) If σ 1 − − σ′ 1 then (σ 2 , δ 2 , P 2 ) − − − − (σ′ 2 , δ′ 2 , P′ 2 ) such that σ′ 1 R (σ′ 2 , δ′ 2 , P′ 2 ).
PROOF. Assume first that (σ 2 , δ 2 , P 2 ) − − − − (σ′ 2 , δ′ 2 , P′ 2 ). Let σ i = (I i , s i , C i , F i ) and σ′ i = (I′ i , s′ i , C′ i , F′ i ). We prove that σ′ 1 R (σ′ 2 , δ′ 2 , P′ 2 ) and proceed by cases, first from the speculative to the ooo semantics.
