Abstract. SPARC processors have many applications in mission-critical industries such as aviation and space engineering. Hence, it is important to provide formal frameworks that facilitate the verification of hardware and software that run on or interface with these processors. This paper presents the first mechanised SPARC Total Store Ordering (TSO) memory model which operates on top of an abstract model of the SPARC Instruction Set Architecture (ISA) for multicore processors. Both models are specified in the theorem prover Isabelle/HOL. We formalise two TSO memory models: one is an adaptation of the axiomatic SPARC TSO model [31, 32] , the other is a novel operational TSO model which is suitable for verifying execution results. We prove that the operational model is sound and complete with respect to the axiomatic model. Finally, we give verification examples with two case studies drawn from the SPARCv9 manual.
Introduction
As multi-core processors prevail in computers, it is important to provide a formal specification of the instruction set architecture (ISA) and weak memory model that establishes the precise principles of concurrent low-level programs and the contract between hardware and software. The ISA provides the semantics of instructions and processor operations, and it is essential in formal verification of the correctness and security of microkernels [17, 14]. Weak memory behaviour is particularly important for low-level system code such as synchronisation libraries, concurrent data structures, concurrent program compilers, etc. [30]. The main purpose of such a specification is to [31] "Allow hardware designers and programmers to work independently, while still ensuring that any program will work as intended on any implementation."
Sindhu and Frailong also point out that a specification should be formal so that conformance to the specification can be verified at some level [31]. Interactive theorem proving allows one to specify theories in rigorous mathematics and logic, and to reason about the specification with machine-assisted tools. Deductive verification methods used in theorem provers enable the verification of complex infinite-state systems, where automatic techniques such as model checking struggle. As a result, formal verification projects, such as the renowned seL4 [17] and CertiKOS [14], rely on theorem provers and mechanised models to provide a higher level of confidence that the formalisation is correct.
In our context, "formal" means that the model not only is specified in mathematics, but also is mechanised in a theorem prover.
The state of the art in ISA models covers different architectures such as Intel, AMD, SPARC, and PowerPC (references). Some of these formalisations also include weak memory models to model multi-core architectures, but as far as we are aware, there are no formalisations of the weak memory model for the SPARC ISA. The multiprocessor SPARC architecture is adopted by the European Space Agency (ESA) to develop SPARC-based LEON multi-core processors in their spacecraft for critical missions [6]. In order to formally verify concurrent software running on top of these CPUs down to the lowest layers of the execution stack, it is necessary to formalise the SPARC ISA and its weak memory model. To assist with the verification tasks, we need a model that (1) supports the SPARC ISA for multi-core processors, and (2) is formalised in a theorem prover. We focus on the SPARC TSO memory model since the critical software in our application uses TSO to avoid complex programs that require PSO. This work solves the above problems and serves as a case study to the verification community for our specific needs.
We build upon the single-core SPARCv8 ISA model of Hóu et al. [16], which has been tested against a LEON3 simulation FPGA board for correctness, and develop a new SPARC ISA model for multi-core processors. The new ISA model abstracts the detailed operational semantics in the SPARCv8 ISA model into more general semantics while retaining the same operations in successful executions. Therefore, the previous experimental validation still holds for successful executions of the abstracted semantics. The new semantics is more suitable to be used as an interface for memory operations. The new ISA model is also an adaptation because various considerations for multi-core processors are taken into account. We drop the suffix "v8" for the abstract ISA model because we extend the SPARCv8 model with features and instructions from the SPARCv9 architecture. Specifically, we include the SPARCv9 atomic load-store instruction Compare and Swap (CASA), which is not present in the SPARCv8 manual but is implemented on certain SPARCv8 processors. CASA is crucial for symmetric multiprocessors (SMP).
On top of the abstract ISA model, we give two TSO models: the first one is a formalisation of the axiomatic SPARC TSO model [31, 32]; the second one is an operational TSO model which can be used to reason about program executions. The integration of instruction semantics and weak memory model is essential to support formal reasoning about concurrent programs, but this problem is sometimes neglected in the weak memory literature [30]. We show that the operational TSO model is sound and complete with respect to the axiomatic model. That is, every execution given by the operational model conforms with the axioms, and every sequence of memory operations that conforms with the axioms can be executed by the operational model. Finally, we give two case studies based on the "Indirection Through Processors" program and spin lock with CASA, both of which are drawn from the SPARCv9 manual, to exemplify verifications on the order of memory operations as well as on the result of execution. All the models and proofs in this paper are formalised in Isabelle/HOL.
Related Work
An essential part of our work is the formal model for SPARC instruction semantics. There has been much work on formalising various instruction set architectures, but this work focuses on instruction-level modelling instead of memory operations. A model of the SPARCv9 architecture is given by Santoro et al. [28], but their model is not formalised in a theorem prover. Hóu et al. [16] formalise the ISA for the integer unit of SPARCv8 single-core processors. Their model can be exported for execution, and they have proven an instruction-level noninterference property for the SPARCv8 architecture. Fox et al. give various models for ARM [8, 11]; they also build a framework for specifying and verifying ISAs [9, 10]. Goel et al. have a framework for building ISA models in ACL2 [12]. There are also formalisations for compilers for PowerPC, ARM, and IA32 processors [18, 19], and for JVM [20, 3]. Our ISA model differs from the above work in that we model multi-core processors.
There is an extensive literature on relaxed memory models, but most of it does not consider machine code semantics. Here we only discuss the most closely related work. Typically memory models appear in two forms: axiomatic models and operational models. The axiomatic TSO memory model for SPARC is given by Sindhu and Frailong [31]. This model is used in the SPARCv8 manual [32], and is later referred to as the "golden memory model" [21]. Petri and Boudol [26, 4] give a comprehensive study on various weak memory models, including SPARC TSO, PSO, and RMO. They show that the store buffer semantics of TSO and PSO corresponds to their semantics of "speculations". Gray and Flur et al. [13, 7] have established axiomatic and operational models for TSO, and their equivalence. Their work is also integrated with detailed instruction semantics for x86, IBM Power, ARM, MIPS, and RISC-V. They have developed a language called Sail for expressing sequential ISA descriptions with relaxed memory models that can later be translated into Isabelle/HOL. However, the current set of modelled ISAs does not include any variant of the SPARC ISA. Although it would have been possible to rewrite the semantics of [16] in Sail, this language lacks some important features necessary for our work. First, Sail does not provide some low-level system semantics such as exceptions and interrupts; second, their framework does not include an execution model for multi-core processors.
Besides Burckhardt's work, there are other tools and techniques developed for verifying memory operations. Notably, Hangal et al.'s TSOtool [15] is a program for checking the behaviour of the memory subsystem in a shared memory multiprocessor computer against the TSO specification. Although verifying TSO compliance is an NP-complete problem, the authors give a polynomial time incomplete algorithm to efficiently check memory errors. Companies such as Intel also actively work on tools for efficient memory consistency verification [27]. Roy et al.'s tool is also polynomial time and is deployed across multiple groups at Intel. A tool specialised for SPARC instructions is developed by Park and Dill [25].
There are also memory models that are formalised in theorem provers, such as Yang et al.'s axiomatic Itanium model Nemos in SAT solvers and Prolog [34] and the Java Memory Model in Isabelle/HOL [2]. Alglave et al. formalised a class of axiomatic relaxed memory models in Coq [1]. Crary and Sullivan formalise a calculus in Coq for relaxed memory models [5]. Their calculus is more relaxed than existing architectures, and their work is intended to serve as a programming language. A more closely related work is Owens, Sarkar, Sewell, et al.'s formalisation of x86 ISA and memory models [29, 24, 30]. They formalise both the ISA and relaxed memory models such as x86-CC and x86-TSO in HOL and show the correspondence between different styles of memory models. It is possible to translate Gray and Flur et al.'s work [13, 7] to Isabelle/HOL or Coq code. However, the resulting formal model would rely on the correctness of a translation tool such as Lem [22], which adds one more layer of complication in our verification tasks.
SPARC Abstract Instruction Set Architecture
This section presents our abstract SPARC ISA model, which is an abstraction and adaptation of the one of Hóu et al. [16] . The previous model is suitable for reasoning about operations at instruction level, but it is too complex and detailed to reason about memory operations. Hence we abstract their work into a more general model with big-step semantics and less SPARC specific features. Besides the non-memory-access instructions in the integer unit, we focus on the following instructions for memory access: load (LD), store (ST), swap memory and register content (SWAP), and compare and swap (CASA). The latter two are atomic load-store instructions.
Mapping from Instructions to Memory Operation Blocks
To bridge the gap between the instruction semantics level and the memory operation level, we define the concept of program block as a list of instructions where there can be at most one instruction for memory access (load, store, etc.), and the memory access instruction must be the last instruction in the list. Intuitively, a block of instructions in the ISA model corresponds to a memory operation in the memory model, with an exception discussed below. We illustrate program blocks with the example in Figure 1 .
Given a list of instructions for the processor to execute, we identify the memory access instructions (in bold font, such as LD, ST) and divide the list into several program blocks. In the example in Figure 1, the instructions after the last memory access instruction also form a block (block 6), although strictly speaking they are not memory operations. In the SPARC TSO axiomatic model [31], an atomic load-store instruction is viewed as two memory operations [L; S], where the load part L and the store part S have to be executed atomically. Correspondingly, we split an atomic load-store instruction, such as SWAP, into two parts and put them in two consecutive program blocks (blocks 3 and 4 in Figure 1). We assume that each program block can be uniquely identified. This gives rise to a mapping $M_{block} = id \Rightarrow block$ from an identifier (natural number) to a program block. The latter is a tuple $\langle i, p, id \rangle$, where $i$ is a list of instructions, $p$ (natural number) is the processor in charge of executing the code, and $id$ is the (optional) identifier of the load part of an atomic load-store instruction.
We distinguish the types of program blocks by the memory operations involved. Program blocks without memory operations are called non-mem blocks, whilst program blocks including memory operations are called memory operation blocks. A memory operation block is a load block when it has an LD, and a store block when it has an ST. An atomic load block has either SWAP LD or CASA LD, whereas an atomic store block has either SWAP ST or CASA ST.
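The division into program blocks can be illustrated with a small Python sketch. This is not the Isabelle/HOL formalisation: the function name `split_blocks`, the string encoding of instructions, and the dictionary layout are all assumptions made for illustration, and `atom_load` is an index local to one processor's list rather than a global block identifier.

```python
# Hypothetical sketch: split one processor's instruction list into program blocks.
ATOMIC = {"SWAP", "CASA"}          # atomic load-store instructions
PLAIN = {"LD": "ld", "ST": "st"}   # ordinary memory accesses


def split_blocks(instrs, proc):
    """Return program blocks as dicts with keys instrs, proc, type, atom_load."""
    blocks, pending = [], []

    def close(kind, last=None, atom_load=None):
        # A block is the pending non-memory instructions plus (optionally)
        # one memory access instruction in last position.
        body = pending + ([last] if last is not None else [])
        blocks.append({"instrs": body, "proc": proc,
                       "type": kind, "atom_load": atom_load})
        pending.clear()

    for ins in instrs:
        name, _, rest = ins.partition(" ")
        if name in ATOMIC:
            # An atomic load-store is split into two consecutive blocks.
            close("ald", f"{name}_LD {rest}".strip())
            close("ast", f"{name}_ST {rest}".strip(), atom_load=len(blocks) - 1)
        elif name in PLAIN:
            close(PLAIN[name], ins)
        else:
            pending.append(ins)
    if pending:
        close("non")  # trailing instructions form a non-mem block
    return blocks
```

For example, a list consisting of an ADD, an ST, a SWAP, an LD, and a final ADD yields five blocks of types st, ald, ast, ld, and non, mirroring the shape of the example discussed above.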
In contrast to the SPARCv8 ISA model, here we lift the processor execution to be oriented on program blocks, based on the program order. A program order is the order in which a processor executes instructions [31] . Since we can identify program blocks using their identifiers we define the program order PO for a processor p as a mapping from p to a list of identifiers:
Given a program order PO and a processor p, the program blocks in this program order are related by a before relation ";" as follows:
$id_1 \;;^{p}_{PO}\; id_2$ iff $id_1$ is before $id_2$ in the list of program block identifiers given by $(PO\ p)$.
We shall omit the $p$ and/or the $PO$ in the notation of program order before and write $id_1 ; id_2$ when the context is obvious. Only program blocks issued by the same processor can be related by program order. Thus $id_1 ; id_2$ implicitly identifies a processor.
We divide program execution into two levels: the processors execute instructions and issue memory operations in a given program order; the memory executes memory operations in its own memory order, which will be described in Section 4.
State and Instruction Semantics
The state of a multi-core processor is a tuple $\langle ctl, reg, mem, L_{var}, G_{var}, op, undef, next \rangle$, with the following definitions:
$ctl$ are the control registers (per processor). These include the Processor State Register (PSR), which records the current set of registers, whether the processor is in user mode or supervisor mode, etc.; the Program Counter (PC); and the Next Program Counter (nPC), among others. $ctl$ is formally defined as a function $ctl = p \Rightarrow C_{reg} \Rightarrow val$, where $p$ is the processor, $C_{reg}$ is the control register, and $val$ is the value held by the register (32-bit word).
$reg$ are the general registers (per processor). Formally, $reg = p \Rightarrow r \Rightarrow val$, where $p$ is the processor, $r$ is the address of the register (32-bit word), and $val$ is the value of the register. SPARC instructions often use three general registers: two source registers, referred to as $rs_1$ and $rs_2$, and a destination register, referred to as $rd$. For instance, the addition instruction takes two values from $rs_1$ and $rs_2$, and stores the sum in $rd$. We shall refer to the value $reg\ p\ rx$ of a register $rx$ in processor $p$ as $r[rx]$ when the context of the processor and the state is clear. SPARC fixes the value at register address 0 to be 0. So when $rd = 0$, we have $r[rd] = 0$.
A main memory $mem$ is shared by all processors. Similar to the machine code semantics for x86 [30], we focus on memory access of words (32 bits) only, and we assume that each memory address points to a word, and data are always well-aligned. Memory is a (partial) mapping $mem = addr \rightharpoonup val$.
Each processor has a local Boolean variable $L_{var} = p \Rightarrow bool$. This Boolean variable is used to record whether the next instruction should be skipped or not after executing branching instructions. We refer to this variable as the annul flag.
All processors share a global variable $G_{var}$, which is a pair $\langle flag_{atom}, val_{rd} \rangle$, where $flag_{atom}$ is the id of the atomic load block when the processor is executing the corresponding atomic load-store instruction, or is undefined otherwise. $val_{rd}$ stores the value of the general register for destination $rd$ which is used in atomic load-store instructions.
$op$ records a memory operation. Formally, $op = id \Rightarrow \langle op_{addr}, op_{val} \rangle$, where $id$ is the identifier of the program block for the corresponding memory operation, $op_{addr}$ is the address of the operation, and $op_{val}$ is the value of the operation. For instance, a store operation writes value $op_{val}$ at address $op_{addr}$, whereas a load operation loads value $op_{val}$ from address $op_{addr}$. For a given $id$, $op_{addr}$ and $op_{val}$ are initially undefined. These values are computed during execution of memory blocks.
Finally, $undef$ indicates whether the state is undefined or not, and $next$ gives the index (in the list typically given by $(PO\ p)$) of the next memory operation to be issued by processor $p$. Formally, $next = p \Rightarrow nat$, where $p$ is a processor and $nat$ is the index.
To provide consistency w.r.t. the memory model, we split the semantics of atomic load-store instructions into the load part and the store part. The processor executes them separately, but the memory model guarantees that their executions are "atomic".
We give an example of the formalisation of the CASA instruction below. The SPARC manual [33] specifies the semantics of CASA as follows, where we adapt the setting from 64-bit registers in SPARCv9 to 32-bit 
Processor Execution
Processor execution includes three stages: fetch, decode, and dispatch. Since this model is built for analysing memory operations, we assume that there is a given program order from which we fetch the instructions. This is similar to the concept of "run skeletons" in the x86 weak memory models [29]. Decoding facilities are provided by the SPARCv8 ISA model [16]. Dispatching and executing the instructions require more care because we execute blocks (lists) of instructions at a time. For simplicity we only discuss three interfaces in prose here. The function $exe$ is used for executing store blocks, atomic store blocks, and non-mem blocks. Load and atomic load blocks require more execution steps. We define the following functions to handle them, assuming the same parameters:
The $exe_{last}\ id\ val$ function essentially executes the load (or atomic load) instruction by loading the value $val$ from memory. Again, take Fig. 1 for example: when $id = 3$, $exe_{last}\ 3\ val$ executes the SWAP LD instruction. Note that we do not need the extra input $val$ for executing store instructions because both the address and the value for a store can be pre-computed from the instruction code. For load instructions, however, only the address can be pre-computed from the instruction code. We need to execute until the instruction before the load instruction, then invoke the memory model to determine the value $val$ to be loaded, which is why we need two steps when executing a load (or atomic load) block.
In this setup, when executing a memory load operation, all previous memory operations in the program order have been executed, and their corresponding addresses ($op_{addr}$) and values ($op_{val}$) have been updated in the state. This allows us to directly use the SPARC TSO Axiom Value (cf. Section 4.1) to obtain the value of the load operation.
SPARC TSO Memory Model
Details of the SPARC TSO model can be found in [31, 32] . This section formalises the axiomatic model in Isabelle/HOL. More importantly, we give a novel operational model, and show that the operational model corresponds to the axiomatic model.
Axiomatic TSO Model
The complete semantics of TSO are captured by six axioms [31, 32] , which specify the ordering of memory operations. The semantics of loads and stores to I/O addresses are implementation-dependent and are not covered by the TSO model. The SPARCv8 manual only specifies that loads and stores to I/O addresses must be strongly ordered among themselves. We adapt these axioms to our abstract SPARC ISA model and formalise them in Isabelle/HOL. Similar to the x86-TSO model [24] , we focus on data memory, thus our memory model does not consider instruction fetch and flush.
Besides the program order before relation (cf. Definition 1), the axiomatic model also relies on a before relation over operations in memory order, which is the order in which the memory executes load and store operations. Given a partial/final memory execution represented by a sequence $x$ of ids, the before relation over two operations $id_1$ and $id_2$ in memory order is defined below as a partial function from the pair to bool, where we write $id \in x$ when $id$ is in the sequence $x$:
Definition 7 (Memory Order Before). $id_1 <_x id_2 \equiv$ if $(id_1 \in x) \wedge (id_2 \in x)$ then (if $id_1$ is before $id_2$ in $x$ then $true$ else $false$) else if $id_1 \in x$ then $true$ else if $id_2 \in x$ then $false$ else $undefined$.
We may loosely refer to a memory order by the corresponding partial/final memory execution sequence $x$. We may write $id_1 < id_2$ when the context is clear. Note that any memory operation $id$ in the sequence of executed operations $x$ has already been executed by the processor and thus $op_{addr}$ of $id$ in the current state is defined.
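The case analysis in Definition 7 can be mirrored directly in Python. The sketch below is only illustrative: the name `mem_before` is hypothetical, the execution sequence is represented as a plain list of integer ids, and `None` stands in for the undefined case.

```python
from typing import Optional, Sequence


def mem_before(id1: int, id2: int, x: Sequence[int]) -> Optional[bool]:
    """Three-valued 'memory order before' over an execution sequence x."""
    if id1 in x and id2 in x:
        # Both executed: compare their positions in the sequence.
        return x.index(id1) < x.index(id2)
    if id1 in x:
        # Only id1 executed: id2 can only be executed later.
        return True
    if id2 in x:
        # Only id2 executed: id1 cannot come before it any more.
        return False
    # Neither executed yet: the order is undefined.
    return None
```

Using `None` rather than raising an exception keeps the function total, which is convenient when checking partial executions.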
The axiom Order states that in a final execution sequence $x$, every pair $id$, $id'$ of store operations is related by $<_x$. This axiom is formalised as below:
Definition 8 (Axiom Order). $order\ id\ id'\ x\ M_{block} \equiv$ If both $(M_{block}\ id)$ and $(M_{block}\ id')$ are either a store or an atomic store block, and both $id$ and $id'$ are in $x$, and $id \neq id'$, then either $(id <_x id')$ or $(id' <_x id)$.
The axiom Atomicity ensures that for an atomic load-store instruction, the load part $id_l$ is executed by the memory before the store part $id_s$, and there can be no other store operations executed between $id_l$ and $id_s$.
Definition 9 (Axiom Atomicity). $atomicity\ id_l\ id_s\ PO\ x\ M_{block} \equiv$ If $id_l$ and $id_s$ are from the same instruction instance, and $(id_l ; id_s)$, and $(M_{block}\ id_l)$ is an atomic load block, and $(M_{block}\ id_s)$ is an atomic store block, then $id_l <_x id_s$, and for all store or atomic store blocks $(M_{block}\ id)$, if $id \in x$ and $id \neq id_s$, then either $id <_x id_l$ or $id_s <_x id$.
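As an informal illustration of Axiom Atomicity, the following Python sketch checks the conclusion of the axiom for a given atomic pair. The names `atomicity` and `mem_before`, and the `types` dictionary mapping block ids to the abbreviations ld, ald, st, ast, are assumptions for this sketch. The caller is assumed to pass a genuine atomic load/store pair, and an undefined ordering is treated as a failure.

```python
STORE_TYPES = {"st", "ast"}


def mem_before(id1, id2, x):
    """Three-valued memory order before (None means undefined)."""
    if id1 in x and id2 in x:
        return x.index(id1) < x.index(id2)
    if id1 in x:
        return True
    if id2 in x:
        return False
    return None


def atomicity(id_l, id_s, x, types):
    """Check that no other store is executed between id_l and id_s in x."""
    if mem_before(id_l, id_s, x) is not True:
        return False
    for i in x:
        if types[i] in STORE_TYPES and i != id_s:
            outside = (mem_before(i, id_l, x) is True
                       or mem_before(id_s, i, x) is True)
            if not outside:
                return False
    return True
```

With blocks 0 (atomic load), 1 (atomic store), and 2 (an unrelated store), the sequence [0, 2, 1] is rejected because store 2 falls between the two halves of the atomic instruction.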
The axiom Termination states that all store operations eventually terminate. We capture this by ensuring that after the execution is completed, every store operation id that appears in the program list of some processor is in the sequence x of executed operations. We formalise this axiom as follows:
Definition 10 (Axiom Termination). $termination\ id\ PO\ x\ M_{block} \equiv$ If there exists a processor $p$ such that $id \in (PO\ p)$, and $(M_{block}\ id)$ is a store or atomic store block, then $id \in x$.
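A minimal Python sketch of this check is given below, quantifying over all ids at once. The function name `termination`, the dictionary `po` from processors to lists of ids, and the `types` dictionary are illustrative assumptions, not part of the formal model.

```python
STORE_TYPES = {"st", "ast"}


def termination(po, x, types):
    """Every store issued by some processor must appear in the final sequence x."""
    return all(i in x
               for ids in po.values()
               for i in ids
               if types[i] in STORE_TYPES)
```

Note that loads are not constrained by this axiom: only stores and atomic stores are required to appear in the executed sequence.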
The axiom Value states that the value of a load operation id issued by processor p at address addr is the value written by the most recent store to that address. The most recent store at addr could be: (1) the most recent store issued by processor p, or (2) the most recent store (issued by any processor) executed by the memory.
Definition 11 (Axiom Value). $value\ p\ id\ addr\ PO\ x\ M_{block}\ state \equiv$ Let $Max_<$ denote a function that outputs the last element in the order defined by $<$ (memory order before) in a set of ids.
$Max_<(\{id' \mid id' <_x id$, and $(M_{block}\ id')$ is a store or atomic store block, and $addr$ is equal to $op_{addr}$ of $id'\} \cup \{id' \mid id' ; id$ and $(M_{block}\ id')$ is a store or atomic store block, and $addr$ is equal to $op_{addr}$ of $id'\})$, the value to be loaded is $op_{val}$ of the output of $Max_<$.
Intuitively, the output of $Max_<$ is the last element in the order given by $<$ from two sets of block ids: The first set includes all the store operations that are before $id$ in the memory order $x$ and write values at address $addr$. The second set includes all the store operations that are before $id$ in the program order (given by $(PO\ p)$) and write values at address $addr$. Therefore $Max_<$ returns the most recent store operation at address $addr$ in memory order. We write $Lval\ id$ to denote the value to be loaded for operation $id$ based on Axiom Value.
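The computation of the loaded value can be sketched in Python as follows. All names (`lval`, `mem_before`, the `op` dictionary mapping an id to an address/value pair) are hypothetical. One assumption goes beyond the definition: when several program-order stores have not yet been executed by the memory, the sketch breaks the tie by program order, which is the order that Axiom StoreStore would later impose on them.

```python
STORE_TYPES = {"st", "ast"}


def mem_before(id1, id2, x):
    """Three-valued memory order before (None means undefined)."""
    if id1 in x and id2 in x:
        return x.index(id1) < x.index(id2)
    if id1 in x:
        return True
    if id2 in x:
        return False
    return None


def lval(load_id, addr, proc, po, x, types, op):
    """Value loaded by load_id at addr, or None if no store qualifies."""
    def stores_to_addr(i):
        return types[i] in STORE_TYPES and i in op and op[i][0] == addr

    prog = po[proc]
    # First set: stores to addr that precede the load in memory order.
    in_mem = [i for i in x if stores_to_addr(i) and mem_before(i, load_id, x)]
    # Second set: stores to addr that precede the load in program order.
    in_prog = [i for i in prog[:prog.index(load_id)] if stores_to_addr(i)]
    candidates = set(in_mem) | set(in_prog)
    if not candidates:
        return None

    def rank(i):
        # Executed stores are ordered by position in x; stores not yet
        # executed come later, ordered by program order (assumption).
        return (0, x.index(i)) if i in x else (1, prog.index(i))

    return op[max(candidates, key=rank)][1]
```

The second set is what lets a processor see its own store before the memory has executed it, which is the effect a store buffer would otherwise provide.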
The axiom LoadOp requires that any operation $id'$ issued after a load $id$ in the program order must be executed by the memory after $id$. This is formalised as below: If $(M_{block}\ id)$ is a load or atomic load block, and $id ; id'$, then $id <_x id'$.
The axiom StoreStore states that if a store operation $id$ is before another store operation $id'$ in the program order, then $id$ is before $id'$ in the memory order.
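The two ordering axioms can be checked together on a candidate sequence. The Python sketch below is illustrative only: the names `ordering_violations` and `mem_before` and the data layout are assumptions, and pairs for which the memory order is still undefined (neither operation executed) are skipped rather than reported.

```python
LOAD_TYPES = {"ld", "ald"}
STORE_TYPES = {"st", "ast"}


def mem_before(id1, id2, x):
    """Three-valued memory order before (None means undefined)."""
    if id1 in x and id2 in x:
        return x.index(id1) < x.index(id2)
    if id1 in x:
        return True
    if id2 in x:
        return False
    return None


def ordering_violations(po, x, types):
    """List (id, id', axiom) triples where x contradicts LoadOp or StoreStore."""
    bad = []
    for ids in po.values():
        for k, a in enumerate(ids):
            for b in ids[k + 1:]:          # a ; b in program order
                load_op = types[a] in LOAD_TYPES
                store_store = (types[a] in STORE_TYPES
                               and types[b] in STORE_TYPES)
                if (load_op or store_store) and mem_before(a, b, x) is False:
                    bad.append((a, b, "LoadOp" if load_op else "StoreStore"))
    return bad
```

A store followed by a load in program order is deliberately left unconstrained, since TSO allows the load to be executed by the memory first.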
Operational TSO Model
Compared with other operational memory models such as the x86-TSO model [30], our ISA model enables us to develop a more abstract operational memory model without using concrete modules such as a store buffer, which effectively buffers the address and value of the most recent store operations. This alleviates the burden of modelling complicated operations and interactions between the processor and the store buffer, and results in a simple and elegant operational memory model. Our operational TSO model is defined via inference rules. An operation takes the form $x, s ; x', s'$, where $x$ and $s$ are respectively the partial execution sequence and state before the operation, and $x'$ and $s'$ are respectively the partial execution sequence and state after the operation. We shall use the following notations: We write $type\ id$ to denote the type of the memory operation block $(M_{block}\ id)$. We use the following abbreviations for memory operation block types: ld (load), ald (atomic load), st (store), ast (atomic store), non (non-mem). We write $x @ x'$ for the concatenation of two sequences $x$ and $x'$. We write $W_{mem}\ id\ s$ for memory commit (write) of operation $id$ in state $s$. We define the operation $flag\_set_{atom}\ id\ s$ to set the atomic flag $flag_{atom}$ to $id$ in state $s$. This operation returns a new state. We write $flag\_set_{atom}\ undef\ s$ to set the flag to undefined. When the operation $id$ is an atomic store operation, the function $atom_{pair}\ id$ returns the operation $id'$ such that $id'$ is the corresponding atomic load operation of the same instruction. This function is otherwise undefined.
The operational TSO model consists of four rules, which are given in Figure 2. The first rule for load operations has two premises: (1) the type of the operation $id$ is load; (2) every load operation before $id$ in the program order has been executed by the memory. The operation first executes ($exe_{pre}\ id$) all instructions in the program order before the last instruction (which must be the load instruction) in the block $id$, then uses Axiom Value ($Lval\ id$) to determine the value to be loaded, and finally executes ($exe_{last}\ id$) the load instruction.
The rule for store operations requires that $flag_{atom}$ in state $s$ must be undefined. That is, the memory is not in the middle of executing an atomic load-store operation. Also, the rule requires that every load or store operation before $id$ in the program order has been executed by the memory. Combined with the last premises of the rules load, atom load, and atom store, these requirements ensure that axioms LoadOp and StoreStore are respected in execution. For instance, it is possible that a store is issued (by a processor) before a load but is executed (by memory) after the load; but it is not possible that a load is issued before a store but executed after the store. The store operation's final step is to commit the store operation $id$ in memory. This step fetches the value $op_{val}$ and address $op_{addr}$ of the operation $id$ from the state, and writes the value at the address in the memory.
The premises for the rule atom load can be read similarly. The final step of the atom load operation sets $flag_{atom}$ to $id$, where $id$ is the atomic load operation. Accordingly, the rule atom store requires that the memory has executed the atomic load part $id'$, but has not executed the store part. The rule atom store also ensures that the atomic pair of the store part $id$ is indeed $id'$. The operation eventually sets $flag_{atom}$ back to undefined and commits the operation in the memory. The premises with regard to $flag_{atom}$ and the atomic pair ensure that axiom Atomicity holds in execution.
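The premises of the four rules, as described in prose above, can be approximated by a small Python sketch that decides which operation may be executed next and updates the execution sequence and atomic flag. This is not a transcription of Figure 2: the names `enabled`, `step`, and `earlier` are hypothetical, register and memory updates are omitted, the premise that an operation is not already in the sequence is an assumption, and the premises for atom load are assumed to match those for store.

```python
LOADS = {"ld", "ald"}
MEM_OPS = {"ld", "ald", "st", "ast"}


def earlier(op_id, po):
    """Ids that precede op_id in its processor's program order."""
    for ids in po.values():
        if op_id in ids:
            return ids[:ids.index(op_id)]
    return []


def enabled(op_id, x, flag_atom, po, types, atom_pair):
    """Can op_id be the next operation executed by the memory?"""
    if op_id in x:
        return False
    prev = earlier(op_id, po)
    loads_done = all(i in x for i in prev if types[i] in LOADS)
    mem_done = all(i in x for i in prev if types[i] in MEM_OPS)
    kind = types[op_id]
    if kind == "ld":
        return loads_done
    if kind in ("st", "ald"):
        # No atomic load-store may be in progress (flag undefined).
        return flag_atom is None and mem_done
    if kind == "ast":
        # The flag must hold the matching atomic load of this store.
        return (flag_atom is not None
                and atom_pair.get(op_id) == flag_atom
                and mem_done)
    return False


def step(op_id, x, flag_atom, po, types, atom_pair):
    """Execute op_id and return the new sequence and atomic flag."""
    if not enabled(op_id, x, flag_atom, po, types, atom_pair):
        raise ValueError(f"operation {op_id} is not enabled")
    kind = types[op_id]
    if kind == "ald":
        flag_atom = op_id
    elif kind == "ast":
        flag_atom = None
    return x + [op_id], flag_atom
```

For a processor whose program order is two consecutive stores, the second store is not enabled on the empty sequence, which is the kind of fact used in the first case study.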
In addition to the rules for memory operations, to obtain the final result of processor execution, we may need the rule non mem:
This rule executes the block after the last memory operation (e.g., block 6 in Figure 1), if there is any. This rule is not related to the memory model because it does not involve memory operations. It plays no role in the proofs in the remainder of this section.
Soundness and completeness of the operational model
We are now ready to present the main results of this work: the operational TSO model is sound and complete w.r.t. the TSO axioms. The previous subsection has briefly discussed that the design of operational rules respects axioms such as LoadOp, StoreStore, and Atomicity. Axiom Value trivially holds in the operational model because the rule load directly uses axiom Value to obtain the load result. Axiom Termination is satisfied by the construction of the execution witness sequences, because the $x$ part of the final witness is guaranteed to contain all the store operations, which means that the execution of these operations has been completed by the memory. Axiom Order holds because all the executed store operations are recorded in a list, which means every pair of them is ordered. The formal proof of the correspondence of the axiomatic model and the operational model is rather complicated, and here we only discuss the results. Interested readers can check the Isabelle/HOL formalisation and proofs for more details.
Theorem 1 (Soundness). Every memory operation sequence generated by the operational model satisfies the axioms in the axiomatic model.
Theorem 2 (Completeness). Every memory operation sequence that satisfies the axioms in the axiomatic model can be generated by the operational model.
Case Studies
With the above work, we can now formally reason about concurrent machine code. The axiomatic model can be used to reason about the order of memory operations, while the operational model is better at reasoning about properties of the execution flow. We run two case studies drawn from examples in the SPARCv9 manual [33]. We may use the terms process and processor interchangeably. See Owens's work [23] for a semantic foundation for reasoning about programs in TSO-like relaxed memory models.
Indirection Through Processors
The "Indirection Through Processors" program is taken from Figure 46 of the SPARCv9 manual [33] . This example intends to reflect the TSO property that causal update relations are preserved. The original program involves three processors, each processor issues two memory operations. A memory operation is given in an "instruction-like" style, e.g., st #1, [A] means that the value 1 is stored into address A of the memory. Unfortunately in real SPARC store instructions, the value to be stored and the value of the memory address must be taken from registers, so we need to add a few instructions to initialise the registers for this example to work. Our formalised "Indirection Through Processors" example is shown in Table 1 . The global register %g0 in SPARC always contains 0. The first instruction in block 0 adds 0 and 1, and puts the result in register %r4. The ST in block 0 thus stores 1 at memory address 1. The ST in block 1 stores 1 at address 2. The LD in block 2 loads the value at address 2 to register %r1. Block 3 then stores the value in %r1 at address 3. Finally, processor 3 loads the values at addresses 3 and 1 to registers %r1 and %r2.
Reasoning about memory operation order. It is intuitive to use the axiomatic TSO model to reason about the order of memory operations. For the program in Table 1 , the SPARCv9 manual gives some example sequences of memory operations allowed under TSO, and an example sequence that is not allowed under TSO: x = [1, 2, 3, 4, 5, 0] . This is because (0 ; 1) must hold in the program order given by Table 1 , and the above sequence implies that ¬(0 < 1 = true) in the memory order, which falsifies the axiom StoreStore.
Alternatively, the completeness of the operational TSO model enables us to use the operational model to reason about the possible next step operations. The above reasoning can be confirmed by our operational model in the lemma below:
Lemma 1 states that, given a partial execution sequence that contains only an initialisation step init, in which memory addresses are set to undefined and registers are set to 0, memory operation block 1 in Table 1 cannot be the first operation to be executed.
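The ordering argument above can be sketched in a few lines of Python. This is only an illustration, not the Isabelle/HOL formalisation: the names `KIND`, `PROGRAM_ORDER`, `required_pairs` and `violations` are hypothetical, the block layout follows the textual description of Table 1, and only the two ordering axioms StoreStore and LoadOp are checked (the remaining TSO axioms are ignored).

```python
# Illustrative sketch only; not the paper's Isabelle/HOL model.
# Kind of each memory operation block, as described for Table 1.
KIND = {0: "store", 1: "store", 2: "load", 3: "store", 4: "load", 5: "load"}

# Program order of the blocks on each processor (assumed from the text).
PROGRAM_ORDER = {1: [0, 1], 2: [2, 3], 3: [4, 5]}


def required_pairs():
    """Pairs (a, b, axiom) where a must precede b in the memory order."""
    pairs = []
    for ops in PROGRAM_ORDER.values():
        for i, a in enumerate(ops):
            for b in ops[i + 1:]:
                if KIND[a] == "load":
                    # A load precedes every later operation of its processor.
                    pairs.append((a, b, "LoadOp"))
                elif KIND[b] == "store":
                    # Two stores of one processor keep their program order.
                    pairs.append((a, b, "StoreStore"))
    return pairs


def violations(sequence):
    """Ordering constraints broken by a (possibly partial) sequence."""
    pos = {op: i for i, op in enumerate(sequence)}
    return [
        (a, b, axiom)
        for a, b, axiom in required_pairs()
        if b in pos and (a not in pos or pos[a] > pos[b])
    ]


print(violations([1, 2, 3, 4, 5, 0]))  # the sequence rejected by the manual
print(violations([1]))                 # block 1 as the first operation
print(violations([0, 1, 2, 3, 4, 5]))  # a sequence listed as legal
```

Under these assumptions the first two calls report the pair (0, 1) under StoreStore, which mirrors both the axiomatic argument and the statement of Lemma 1, while the last call reports no violation.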
Reasoning about execution results. Besides eliminating illegal executions, one can also use our operational model to reason about the results of legal executions. For instance, the SPARCv9 manual lists the sequence x = [0, 1, 2, 3, 4, 5] as a legal execution under TSO. For simplicity, here we only show that after a partial execution [0, 1, 2], the register %r1 of processor 2 has value 1, which was previously stored at address 2 by processor 1. This shows that a processor can observe the memory updates made by other processors. This is formalised in the following lemma:
The right-hand side of the implication means that in state $s_3$, the general register 1 of processor 2 contains value 1. The proof for execution results usually involves a "simulation" of the execution using the abstract ISA model and the operational TSO model. For this example, we start from the initial witness, and prove a series of lemmas about the successive execution witnesses.
It is straightforward to complete this series of proofs and obtain the result of a final execution.
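The "simulation" idea can be illustrated with a toy interpreter. The sketch below is an assumption-laden simplification rather than the operational TSO model: each block is reduced to its single memory operation, blocks take effect on memory atomically in the given order, and `BLOCKS` and `run` are hypothetical names. As in the init step, unset registers read as 0 and unset memory is undefined (modelled as `None`).

```python
# Toy interpreter; a simplification, not the paper's operational model.
# Each block: (processor, kind, source, destination), per the text on Table 1.
BLOCKS = {
    0: (1, "st", 1, 1),      # processor 1 stores the value 1 at address 1
    1: (1, "st", 1, 2),      # processor 1 stores the value 1 at address 2
    2: (2, "ld", 2, "r1"),   # processor 2 loads address 2 into %r1
    3: (2, "st", "r1", 3),   # processor 2 stores %r1 at address 3
    4: (3, "ld", 3, "r1"),   # processor 3 loads address 3 into %r1
    5: (3, "ld", 1, "r2"),   # processor 3 loads address 1 into %r2
}


def run(sequence):
    """Execute blocks in memory order; return (memory, registers)."""
    memory = {}                        # missing address = undefined
    regs = {p: {} for p in (1, 2, 3)}  # missing register reads as 0
    for block in sequence:
        proc, kind, src, dst = BLOCKS[block]
        if kind == "st":
            # The source is either a register name or an immediate value.
            value = regs[proc].get(src, 0) if isinstance(src, str) else src
            memory[dst] = value
        else:
            regs[proc][dst] = memory.get(src)  # None models undefined
    return memory, regs


memory, regs = run([0, 1, 2])
print(regs[2]["r1"])  # value observed by processor 2 after [0, 1, 2]
```

Running the full sequence `[0, 1, 2, 3, 4, 5]` in the same way gives the final registers of processor 3, which corresponds to completing the series of lemmas in this simplified setting.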
Spin Lock with Compare and Swap
Section J.6 of the SPARCv9 manual [33] gives an example of a spin lock implemented using the CASA instruction; the code is shown in Figure 3a. Note that the code in Figure 3a is in synthetic instruction format. The SPARCv8/v9 manuals provide a straightforward mapping from this format to the SPARC instruction format, which is what our ISA model supports. For instance, in the retry fragment, the first instruction mov corresponds to an OR, which computes the bitwise OR of the ID proc_id of the current process and 0, and stores the result in register %l0, which corresponds to register %r16. After executing this line, %l0 (%r16) contains the ID of the current process. The second line is the CASA instruction. It checks whether the memory value at address lock is equal to the value in %g0 (which must be 0), and swaps the value at address lock and the value in register %l0 when the check succeeds. Otherwise, the value at address lock is stored in register %l0. Therefore, when no process holds the lock, the value at address lock is 0, and after executing the second line, %l0 (%r16) will contain 0 and address lock will contain the ID of the current process. On the other hand, when the lock is held by another process, after executing CASA, the memory address lock is unchanged, and %l0 contains the ID of the process that holds the lock. The code tst %l0 corresponds to an ORcc, which checks whether %l0 is equal to 0. If it is, then the program branches to out and starts to execute in the critical region. Otherwise, the program goes to loop and keeps reading the address lock until it contains 0. We give the fragment of instructions before entering the critical region in Figure 3b, and consider a concrete situation where two processes (processors) 1 and 2 are competing to get the lock, and process 3 initialises the lock to 0.
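The compare-and-swap behaviour described above can be sketched as follows. This is a minimal sequential illustration with hypothetical names (`casa`, `try_acquire`); it treats each CASA as one atomic step and does not model store buffers or the split into a load part and a store part used in the abstract ISA model.

```python
# Minimal sequential sketch of the CASA-based lock attempt.
def casa(memory, addr, compare, swap):
    """Compare-and-swap: returns the new content of the swap register."""
    old = memory[addr]
    if old == compare:
        memory[addr] = swap  # memory receives the register value
    return old               # the register receives the old memory value


def try_acquire(memory, lock, proc_id):
    """One pass through the retry fragment: mov, CASA, tst."""
    l0 = proc_id                    # mov: %l0 holds the process ID
    l0 = casa(memory, lock, 0, l0)  # CASA against %g0, which is always 0
    return l0 == 0                  # tst/be: zero means the lock was taken


memory = {"lock": 0}  # process 3 initialises the lock to 0
print(try_acquire(memory, "lock", 1), memory["lock"])
print(try_acquire(memory, "lock", 2), memory["lock"])
```

In this sketch process 1 wins because its CASA runs first; swapping the two calls gives the symmetric outcome for process 2.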
Assume that process 3 executes operation 4 first for initialisation, and assume without loss of generality that operation 0 of process 1 is executed by the memory earlier than process 2's operations. We show that process 1 will enter the critical region. The case where operation 2 of process 2 is executed earlier by the memory is symmetric. In this example, we set the address of the critical region as $28 \ll 2 = 112$ relative to the address of the branch instruction BE, where $\ll$ is sign-extended shift to the left.
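The offset arithmetic can be checked with a short sketch. It assumes the standard SPARC encoding in which a conditional branch carries a 22-bit displacement that is sign-extended and shifted left by two bits; the function name `branch_offset` is hypothetical.

```python
# Sketch of the branch offset computation, assuming a 22-bit displacement.
def branch_offset(disp22):
    """Sign-extend a 22-bit displacement and shift it left by 2 bits."""
    disp22 &= 0x3FFFFF       # keep the low 22 bits
    if disp22 & 0x200000:    # bit 21 set: the displacement is negative
        disp22 -= 0x400000
    return disp22 << 2       # word displacement to byte offset


print(branch_offset(28))  # offset of the critical region from BE
```

A displacement of 28 words thus yields the byte offset 112 that appears in the final lemma of this case study.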
The proof uses a mixture of the techniques in the previous subsection to obtain valid memory operation sequences and reason about the results. We omit the intermediate steps and show the final lemma below:
Lemma 3. $[4, 0, 1, 2, 3, 5], s_6 \;;\; [4, 0, 1, 2, 3, 5, 6], s_7 \longrightarrow (\mathit{ctl}\ s_7)\ 1\ \mathit{nPC} = (\mathit{ctl}\ s_7)\ 1\ \mathit{PC} + 112 \wedge (\mathit{ctl}\ s_7)\ 2\ \mathit{nPC} = (\mathit{ctl}\ s_7)\ 2\ \mathit{PC} + 4$
The right-hand side of the implication shows that the nPC (next program counter) of processor 1 is the entry point of the critical region, while the nPC of processor 2 points to NOP, after which processor 2 proceeds to the loop in Figure 3a.
Conclusion and Future Work
This paper gives an abstraction of the SPARCv8 ISA model in Isabelle/HOL [16]. The new model is suitable for formal modelling and verification at the memory operation level. We also extend the ISA model with semantics for the SPARCv9 instruction Compare and Swap, which is useful in concurrent programs. The more abstract ISA model splits the semantics for atomic load-store instructions into two parts: the load part and the store part, which correspond to the operations in the memory model.
On top of the abstract ISA model, we formalise the SPARC TSO axiomatic memory model in Isabelle/HOL. This model is useful for reasoning about the order of memory operations. We also give a novel operational TSO memory model as a system that consists of four rules. We show that the operational TSO model is sound and complete with respect to the axiomatic model. Finally, we demonstrate the use of our memory models with two examples in the SPARCv9 manual.
All the models and proofs in this paper are formalised in Isabelle/HOL. The abstract SPARC ISA model measures 1960 lines of code; the two memory models and the soundness and completeness proofs constitute 4753 lines of code; and the case studies take up 1750 lines of code.
One of our next steps is to generate executable code from our operational TSO model and conduct experiments against real hardware. One can view this as a "validation" step. However, our understanding of the SPARC TSO model is that the TSO axiomatic model came as a part of the SPARCv8 manual before the implementation of actual hardware. The TSO axiomatic model should thus be seen as a standard that the hardware must comply with, rather than the other way around. Therefore, a better validation would be to show that our formalisation of the TSO axiomatic model is consistent with the definitions in the SPARCv8 manual, which is easy to verify.
Our ongoing work is to develop a Hoare-style logic for SPARC machine code. The current framework, which includes the abstract ISA model and the memory models, provides the foundation for the verification of concurrent machine code. However, if a program involves complex control flow with branches and loops, it is tedious to use the current models to reason about the program. A Hoare-style logic would make the reasoning task easier. We envision that this new work will make it easier to prove properties such as reachability, safety, and non-interference.
