Abstract. This paper presents a mechanised Hoare-style programming logic framework for assembly level programs. The framework has been designed to fit on top of operational semantics of realistically modelled machine code. Many ad hoc restrictions and features present in real machine code are handled, including finite memory, data and code in the same memory space, the behaviour of status registers and hazards of corrupting special purpose registers (e.g. the program counter, procedure return register and stack pointer). Despite accurately modelling such low level details, the approach yields concise specifications for machine-code programs without using common simplifying assumptions (like an unbounded state space). The framework is based on a flexible state representation in which functional and resource usage specifications are written in a style inspired by separation logic. The presented work has been formalised in higher-order logic, mechanised in the HOL4 system and is currently being used to verify ARM machine-code implementations of arithmetic and cryptographic operations.
Introduction
Computer programs execute on machines where stacks have limits, integers are bounded and programs are stored in the same memory as data. However, verification of computer programs is almost without exception done using highly simplified models, where stacks and memory are unbounded, integers are arbitrarily large and the compilers are trusted to keep code and data apart. Proving properties of programs with respect to realistic models is generally avoided, since many of the common simplifying assumptions made by high-level programming logics tend to fit badly with realities of accurate low-level models. In this paper we present a programming logic that has been designed to fit on top of accurate models of machine languages.
We present a Hoare logic that has been carefully designed to accommodate many of the ad hoc restrictions and features of machine code: finite memory, data and code in the same memory space, the behaviour of status registers, hazards of corrupting special purpose registers and some details that arise from hardware implementations. As an example of a restriction imposed by the underlying hardware, consider the following two seemingly equivalent implementations of the factorial program in ARM assembly. The example uses the ARM instructions "MOV b, #1" (set register b to 1), "MUL c, a, b" (put the product of the contents of registers a and b into register c, but see restrictions discussed shortly), "SUBS a, a, #1" (subtract 1 from register a and update status bits so that status bit Z is assigned the boolean expression a-1=0) and "BNE L" (jump to L if status bit Z is 0). The first implementation terminates with the factorial of a (modulo 2^32) in b, while the other has an unpredictable outcome: "MUL b, b, a" is specified as 'unpredictable' for ARM in order to accommodate hardware optimisations [15]. Thus "MUL c, a, b" cannot be modelled as c := a × b without a side condition.
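The restriction can be made concrete with a small simulation. The sketch below is not the ARM model used in this work: it assumes a toy interpreter (the names `run_factorial`, `mul` and `Unpredictable` are hypothetical) in which MUL raises an error whenever its destination register coincides with its first operand register, standing in for 'unpredictable'. The loop structure is reconstructed from the instruction descriptions above.

```python
MASK = 0xFFFFFFFF  # registers hold 32-bit words


class Unpredictable(Exception):
    """Stands in for ARM's 'unpredictable' outcome in this toy model."""


def mul(regs, rd, rm, rs):
    # Assumed side condition: MUL rd, rm, rs is rejected when rd == rm.
    if rd == rm:
        raise Unpredictable(f"MUL {rd}, {rm}, {rs}")
    regs[rd] = (regs[rm] * regs[rs]) & MASK


def run_factorial(x, mul_operands):
    """Run MOV b,#1; L: MUL <operands>; SUBS a,a,#1; BNE L with a = x."""
    regs = {"a": x & MASK, "b": 0}
    regs["b"] = 1                               # MOV b, #1
    while True:
        mul(regs, *mul_operands)                # MUL <operands>
        regs["a"] = (regs["a"] - 1) & MASK      # SUBS a, a, #1
        z = regs["a"] == 0                      # status bit Z
        if z:                                   # BNE L falls through once Z is set
            return regs["b"]
```

With the operand order of the first implementation, `run_factorial(5, ("b", "a", "b"))` returns 120. With the order `("b", "b", "a")` the toy MUL raises `Unpredictable` on the first iteration, which is the side condition a faithful specification has to carry.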
The judgments of our framework are total-correctness specifications that state the functional behaviour and resource usage of machine-code programs. We use a separating conjunction, similar to that of separation logic [13], in order to write concise specifications about resource usage as well as to avoid unwanted aliasing between special purpose registers (and normal registers as motivated above). Our specifications allow multiple code segments and use positioning functions to enable reasoning about mixtures of position independent code and position dependent code. As a result, procedures and procedural recursion are readily handled (without assuming an unbounded stack).
The Hoare triples described in this paper have been defined in higher-order logic. Rules for reasoning about them have been derived from the formal definitions of the Hoare triples, using the HOL4 system [6] (thus the rules are sound). We can reason about ARM machine code by instantiation of our framework's Hoare triples to a high-fidelity model of the ARM machine language. The specialisation of our framework to ARM machine code is presented in a companion paper [10]. Here we concentrate on the core ideas of our approach.
This paper is not the first to address the problem of verifying realistically modelled machine code. Some early work was done by Maurer [9], Clutterbuck and Carré [5] and Bevier [3]. Boyer and Yu [4] did impressive pioneering work on verifying machine code written for a commercial processor: they verified programs using the bare operational semantics of a model of the Motorola MC68020. Projects on proof-carrying code (PCC) [11] and particularly foundational PCC [1] have ignited new interest in verification of low-level code. Of work on PCC, Tan and Appel's work [16] is particularly relevant to this paper: they use a Hoare logic to reason about a detailed model of the Sparc machine language. As for most work on PCC, their aim is to address safety properties that can be proved automatically (e.g. type safety). Tan and Appel's approach is hampered by the requirement of an extensive soundness proof. Hardin, Smith and Young [7] verify machine code for Rockwell Collins AAMP7G using a form of symbolic simulation. Work by Klein, Tuch and Norrish [8] has a similar goal to ours, but they reason at a higher level about realistically modelled C programs.
The remainder of this paper is organised as follows. Section 2 gives an overview of how our specifications relate to those of standard Hoare-triples and motivates our design decisions. Section 3 contains the bulk of the material: it defines a Hoare triple for machine code, presents an example and shows how rules can be derived for procedures and procedure calls. Section 4 demonstrates how the framework can be instantiated to a given operational semantics of a machine language. Section 5 concludes with a summary.
Approach
This section motivates some key design decisions informally and gives an overview of the main ideas. The detailed definitions are given in the next section.
Basic Specifications
Our framework supports code specifications with multiple entries, multiple exits and multiple code segments, but for simplicity we start by considering specifications having single entry, single exit and single code segment. The full generality is described in Section 3.
Consider the ARM implementation of the factorial function given in the introduction. In classical Hoare logic, its specification could be written as follows with a side-condition:
The registers associated with a and b are distinct.
This specification is not satisfactory because it leaves many aspects unspecified. For example, it does not say whether the code modifies the status bits or what happens to the program counter. We require specifications to mention each component of the state that might be altered during execution. That way we can easily see what is changed and what is not. Our approach is similar to that of separation logic [13, 12], which assigns a memory footprint to each assertion. We make a stricter requirement: every state component (e.g. register, memory location, status bit) must appear in the footprint of an assertion. In our framework, the factorial program has the following specification where, for now, informally read R a x as "register a has value x", S b as "the status bits have value b", underscore (_) as "some value" and P * Q, following separation logic, as "P and Q are true for disjoint parts of the state" (precise definitions of these concepts are given later).
The superscript +4 specifies that FACTORIAL increments the program counter by 4. The separating conjunction * avoids the need for the side-condition, since the side condition is implied by the occurrence of * between R a x and R b _ in the precondition.
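For concreteness, the two styles of specification discussed above might be written as follows. This is a sketch reconstructed from the informal reading given in the text, not a quotation: the precondition x ≠ 0, the bracket notation ⟨…⟩ for a pure condition and the reading of x! as factorial modulo 2^32 are assumptions.

```latex
% Classical Hoare triple; needs the side condition that a and b are distinct registers
\{\, a = x \wedge x \neq 0 \,\}\ \textsc{factorial}\ \{\, a = 0 \wedge b = x! \,\}

% Footprint-style triple; the * between R a x and R b _ makes distinctness implicit
\{\, R\,a\,x * R\,b\,\_ * S\,\_ * \langle x \neq 0 \rangle \,\}\ \textsc{factorial}\
\{\, R\,a\,0 * R\,b\,(x!) * S\,\_ \,\}^{+4}
```

The second form names every resource the code may touch (two registers and the status bits), so anything not mentioned is known to be unchanged.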
Heterogeneous Specifications
Machine-code programs depend on a variety of different resources. Even in a simple setting we encounter registers, special registers, memory locations and various status bits. For this reason we treat all types of resources uniformly. Consider for instance the specification of the instructions str (store) and dstr (decrement-and-store). Read M x y as "memory location x has value y".
These specifications have a similar form to that of the factorial program, even though they specify the behaviour of different types of resources.
Hoare-style reasoning can be applied to specifications. For example, dstr given above can implement a stack push for a descending stack. We can state this with a specification stack(sp, xs, n) defined to assert that the stack pointer (taken to be register 13) has value sp, that xs is on the stack and that there are n unused slots on top of the stack. We will use the HOL list notation [x_0; …; x_m] and the cons function defined by cons x_0 [x_1; …; x_m] = [x_0; x_1; …; x_m]. With ms(a, xs) asserting that the list xs is stored in memory from address a upwards and blank(a, n) asserting that n memory slots from address a downwards hold some value, stack(sp, xs, n) is then defined to be R 13 sp * ms(sp, xs) * blank(sp−1, n).
Using this specification of a stack segment we are able to derive a specification for stack push from the specification of dstr:
{R a x * stack(sp, xs, n+1)} dstr b a {R a x * stack(sp−1, cons x xs, n)}^{+1}
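One way to see why the push specification is plausible is to compare memory footprints before and after the push. The sketch below assumes that ms(a, xs) occupies addresses a, a+1, … and that blank(a, n) occupies the n addresses a, a−1, …; these readings and all helper names are assumptions made for illustration.

```python
def ms_footprint(addr, xs):
    """Addresses occupied by the list xs stored upwards from addr."""
    return {addr + i for i in range(len(xs))}


def blank_footprint(addr, n):
    """Addresses of n unused slots, counted downwards from addr."""
    return {addr - i for i in range(n)}


def stack_footprint(sp, xs, n):
    """Memory footprint of stack(sp, xs, n) = R 13 sp * ms(sp, xs) * blank(sp-1, n)."""
    used, free = ms_footprint(sp, xs), blank_footprint(sp - 1, n)
    assert used.isdisjoint(free)  # * demands disjoint parts of the state
    return used | free


def push(sp, xs, n, x):
    """Effect of a descending push on the abstract stack (needs a free slot)."""
    if n == 0:
        raise ValueError("no unused stack slot left")
    return sp - 1, [x] + xs, n - 1
```

Under these assumptions a push moves one address from the blank part to the used part, so `stack_footprint` is the same before and after. This is what lets the derived specification mention nothing beyond the stack assertion itself.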
Positioning Functions
We use positioning functions to make our Hoare triple general. These functions are written as superscripts in our notation: {P}^f cs^g {Q}^h. We omit superscripts that are the identity function (λx.x). The positioning functions specify entry points, exit points and code placement with respect to a variable base address. More concretely, {P}^f cs^g {Q}^h states the following: for any address p, if the program counter points at address f(p), the code sequence cs is stored at address g(p) and P holds, then some time later the program counter will reach address h(p) in a state where Q holds.
The positioning functions can be used to make position-independent specifications, position dependent specifications and mixtures of the two. A specification is position independent if the positioning functions describe offsets: we use +n to abbreviate λx.x+n, −k to abbreviate λx.x−k and write nothing to mean a null offset, i.e. λx.x. A specification is position dependent if it ignores its argument: e.g. λx.5 and λx.y. These positioning functions are useful as they can capture some of the nontrivial control structures used in machine-code. For example, the control structure of a procedure is easy to define: procedures are given a return address to which they must jump on completion. If we suppose that register 14 holds the return address, then we have the following format for procedure specifications:
The superscript λx.y specifies that the value of the program counter is y on exit from cs no matter what it was on entry to cs. Section 3.6 presents a derivation of a call rule that evaluates the effect of a call to such a procedure.
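Positioning functions are ordinary functions on addresses, so they are easy to experiment with. The following sketch uses hypothetical helper names (`offset`, `fixed`, `compose`, `placement`) to show how the entry point, code address and exit point are obtained from a base address p.

```python
def offset(n):
    """Position independent: the function written +n, i.e. lambda x: x + n."""
    return lambda x: x + n


def fixed(y):
    """Position dependent: ignores the base address, i.e. lambda x: y."""
    return lambda x: y


def compose(f, g):
    """Function composition: compose(f, g)(x) = f(g(x))."""
    return lambda x: f(g(x))


def placement(p, f, g, h):
    """Entry point, code address and exit point for base address p."""
    return {"entry": f(p), "code": g(p), "exit": h(p)}
```

For example, `placement(0x8000, offset(0), offset(0), offset(4))` describes code that is entered at its first instruction and exits four addresses later, wherever it is loaded. Replacing the last argument by `fixed(y)` gives the procedure-style exit at the return address y regardless of p.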
The call rule and stack assertion, from above, have been used in the verification of recursive procedures in ARM code. An example of such a procedure is the code called BINARY SUM shown in Figure 1 . BINARY SUM calculates the sum of values attached to the nodes of a binary tree. The trees we consider have nodes consisting of a value and addresses of two subtrees. Address 0 refers to the empty subtree. A predicate stating that tree t is stored with root at address x:
The specification of BINARY SUM states that BINARY SUM adds to register s the sum of the nodes of a tree that is addressed by register a. The specification also states that no more than 2 × depth(t) words of stack space are required during execution.
([ ] is the empty list and stack(sp, [ ], n) = R 13 sp * blank(sp − 1, n)).
The formal ARM specification of BINARY SUM requires some of the entities to be aligned addresses. Such details appear as slight variations of predicates M and R; for details, see the companion paper [10].
Excessive Separation
The separating conjunction * is set up in such a way that an occurrence of R a x * R b y in a precondition will always imply a ≠ b. This is both a weakness and a strength of our approach. It is a weakness since we will need many specifications for what seem to be special cases of a single operation. For instance, binary operators are given five different specifications.
What appears to be an excessive use of * is actually often a benefit. As mentioned earlier, not all the specifications above are true in every case. Furthermore, and particularly important, the separating conjunction makes the mechanisation significantly easier, as technicalities concerning register name aliasing diminish.
Hoare Triple for Machine Code
This section defines a Hoare triple for machine code and formalises what was informally presented in the previous section. This section ends with an example of how proof rules can be derived for procedure calls.
State Representation
We assume that a state is represented as one large set of basic state elements, where each element is an assertion specifying the state of a particular resource. State sets are required to enumerate all the resources of the observable state. In this presentation concrete states are sets consisting of register elements Reg a x (register a holds value x), 2^32 memory elements Mem a y (memory address a holds value y) and one status bit Status b (the status bit is b). No state is allowed to duplicate a basic state element, e.g. register 3 must not occur, in any state, as both Reg 3 34 and Reg 3 45. We will denote the set of all well-formed states by Σ, thus members of Σ represent states. Issues regarding restrictions on Σ are discussed further in Section 4.
The basic assertions described informally in the previous section can now be defined as predicates on states.
Separating conjunction (*) and the notion of "some value" (written as a postfixed operator _) are then defined by:
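The set-based reading of these operators can be mirrored directly in executable form. The sketch below assumes that each basic assertion holds exactly of the singleton state containing its element and that P * Q holds when the state splits into two disjoint parts satisfying P and Q; the tuple encoding of state elements and the finite candidate set passed to `some` are assumptions of this illustration, not part of the formal development.

```python
from itertools import combinations


def R(r, x):
    """Register r has value x: holds of exactly the one-element state."""
    return lambda s: s == frozenset({("Reg", r, x)})


def M(a, y):
    """Memory location a has value y."""
    return lambda s: s == frozenset({("Mem", a, y)})


def S(b):
    """The status bit has value b."""
    return lambda s: s == frozenset({("Status", b)})


def star(p, q):
    """P * Q: some split of s into disjoint u and s - u satisfies P and Q."""
    def holds(s):
        elems = list(s)
        for k in range(len(elems) + 1):
            for part in combinations(elems, k):
                u = frozenset(part)
                if p(u) and q(s - u):
                    return True
        return False
    return holds


def some(p, values):
    """P _ : P x holds for some x, searched over a finite candidate set."""
    return lambda s: any(p(x)(s) for x in values)
```

Because a state never contains two elements for the same register, `star(R(1, 5), R(1, 5))` cannot hold of any state in this model, which is how * rules out register aliasing.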
Execution Predicate
The judgments of our Hoare logic are based on assertions about processor executions. We define the execution assertion P ; Q to mean that execution starting from any state which has a part satisfying P , will reach a state where only the part initially satisfying P has been changed and satisfies Q. Note that this incorporates a 'frame assumption'. The formal definition assumes a next-state function next : Σ → Σ and then uses run(s, n) to denote the state reached after n steps starting from s (i.e. run is defined recursively by run(s, 0) = s and run(s, n+1) = run(next(s), n)).
The following frame-rule, similar to that of separation logic, easily follows.
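The recursive definition of run and the framed reading of P ; Q can be explored on a toy machine. The sketch below is a bounded, finite-state approximation and not the formal definition: assertions are restricted to exact sets of state elements, the step bound is arbitrary, and the one-instruction machine `dec` is hypothetical.

```python
def run(next_fn, s, n):
    """run(s, 0) = s and run(s, n+1) = run(next(s), n), written as a loop."""
    for _ in range(n):
        s = next_fn(s)
    return s


def executes(next_fn, states, pre, post, bound=100):
    """Bounded check of 'pre ; post' for assertions given as element sets.

    Every state containing `pre` must, within `bound` steps, reach the state
    in which exactly that part is replaced by `post` and the rest (the frame)
    is untouched.
    """
    for s in states:
        if pre <= s:
            frame = s - pre
            if not any(run(next_fn, s, n) == post | frame
                       for n in range(bound + 1)):
                return False
    return True


def dec(s):
    """Toy next-state function: decrement register 0 until it reaches 0."""
    regs = {r: v for (_, r, v) in s}
    if regs[0] > 0:
        regs[0] -= 1
    return frozenset(("Reg", r, v) for r, v in regs.items())
```

Checking `executes` over states that also contain a second register illustrates the frame property: register 1 is never mentioned in `pre` or `post`, and the check only succeeds because `dec` leaves it alone.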
Code Assertion
The basic execution predicate determines how the underlying processor executes on a bare state. In order to specify how code executes we need first to specify how code is located in memory and what value the program counter has.
Asserting the value of the program counter is generally simple, say R 15 p if register 15 is the program counter. Let pc(p) be such an assertion. Making a general assertion about the code in memory is more difficult. The idea is to use a kind of assertion we call a code pool, which asserts that a union of possibly overlapping code sequences is part of the memory. Our approach is similar to that of Saabas and Uustalu [14] and Tan and Appel [16].
The definition of code-pool assertions uses a set-based separating conjunction operator ⊛ expressing the *-combination of the elements of an arbitrary set. Informally:
⊛{P_1, …, P_n} = P_1 * … * P_n (when P_1, …, P_n are distinct). The formal definition is based on a partial bijection between predicates P_i and partitions of the state set. The definition is straightforward, but has a few subtle details which are not particularly interesting. It is omitted due to lack of space.
A code pool is an assertion obtained by applying ⊛ to the union of sets of basic instruction assertions M p c, where M p c specifies that instruction c is executed if the program counter has value p (this is a special case of the notion of basic instruction assertion that we actually use). If cs is a sequence of instructions, then mset(p, cs) denotes the set of assertions stating that the sequence starts at position p and runs consecutively from there.
A pair (cs, f ) is a code sequence cs together with a specification f of where to position it relative to a base address (see Section 2.3 for a discussion of positioning functions). We use C to range over sets of such pairs, and then define:
The intuition is that mpool(p, {(cs 1 , f 1 ), · · · , (cs n , f n )}) is the same as the expansion of ms(f 1 (p), cs 1 ) * · · · * ms(f n (p), cs n ) with the duplicated M -assertions removed by the set union. The benefit of using such a code pool is that it allows code sequences to overlap and builds into the representation the removal of duplicate sequences. This benefit is particularly apparent in the rule for procedural recursion, Section 3.6.
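The effect of the set union can be seen in a few lines. In this sketch instruction assertions are plain tuples and instructions are strings; the function `consistent` is a hypothetical helper that detects the situation where two different instructions claim the same address.

```python
def mset(p, cs):
    """Instruction assertions for the sequence cs stored from address p."""
    return {("M", p + i, c) for i, c in enumerate(cs)}


def mpool(p, segments):
    """Union of mset(f(p), cs) over all pairs (cs, f); duplicates collapse."""
    pool = set()
    for cs, f in segments:
        pool |= mset(f(p), cs)
    return pool


def consistent(pool):
    """True when no address is claimed by two different instructions."""
    addrs = [a for (_, a, _) in pool]
    return len(addrs) == len(set(addrs))
```

Two copies of the same segment, or two segments that agree where they overlap, produce the same pool as a single merged sequence. Segments that disagree on an address give a pool for which `consistent` is false, matching the case where the *-combination of the pool cannot hold of any state.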
At the end of a verification of concrete code one can of course not have distinct sequences of code that overlap. Such an arrangement makes the precondition(s) of the machine-code Hoare triple (defined in the next section) false and hence the specification trivially true. The following two equivalences simplify a code pool into a simple sequence assertion.
1 Note that in the equation below and later, +length(cs) denotes the function that adds the length of cs, thus +length(cs)•f is the function λn. length(cs) + f (n).
Hoare Triple
In Section 2 we discussed a Hoare triple {P}^f cs^g {Q}^h. We will shortly generalise this to have sets of preconditions, sets of code sequences and sets of postconditions, but first we give a formal semantics of the simple case.
We can read {P}^f cs^g {Q}^h as asserting that if the processor is started from a state satisfying P and (for any p) if f(p) is in the program counter and the code cs is stored as a sequence from address g(p) onwards, then it will reach a state satisfying Q. The specification also guarantees termination with the code unchanged and the program counter updated to h(p). The functions f and g are frequently the identity function, in which case the program counter points at the first instruction in the sequence of instructions cs. Notice that the meaning of * ensures that the precondition P * ms(g(p), cs) * pc(f(p)) only holds when P does not mention the program counter or any memory location where cs is stored.
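Reading the prose above symbolically, and assuming the execution predicate P ; Q of the Execution Predicate section, the simple triple plausibly unfolds as follows. This is a sketch consistent with the description, not a quotation of the formal definition.

```latex
\{P\}^{f}\; cs^{g}\; \{Q\}^{h}
  \;\equiv\;
  \forall p.\;
  \bigl(P * \mathit{ms}(g(p), cs) * \mathit{pc}(f(p))\bigr)
  \mathbin{;}
  \bigl(Q * \mathit{ms}(g(p), cs) * \mathit{pc}(h(p))\bigr)
```

The code assertion ms(g(p), cs) appears unchanged on both sides, which expresses that the code is preserved, while the program-counter assertion moves from f(p) to h(p).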
We generalise the simple case to multiple preconditions, code segments and postconditions, each with positioning functions f i , g i and h i , respectively:
The intuition is the following: if all the code segments are present in memory, then whenever one of the preconditions {P_i}^{f_i} is true, some time later (at least) one of the postconditions {Q_j}^{h_j} will be true. For the definition of the general Hoare triple, collect the preconditions, code segments and postconditions into respective sets of pairs, each pairing an assertion or code sequence with its positioning function: 𝒫 = {(P_i, f_i)}, C = {(cs_k, g_k)} and 𝒬 = {(Q_j, h_j)}.
The machine-code Hoare triple, which is written here as 𝒫 | C | 𝒬, is defined using disjunction over a set of predicates (formally: ⋁X = λs. ∃P ∈ X. P s).
A variety of rules have been derived from this definition of Hoare triple. Some of the rules are presented in Figure 2. The rules for frame, shift and compose are used when joining specifications (as illustrated in the next section). Strengthen, weaken and merge are used when specifications are simplified. Contraction, extension and loop elimination add/remove entry points, exit points and code segments. The rule for loop elimination removes any number of interconnected exit points that match some set of entry points for a decreasing variant. The equivalences are mainly used in derivations of new rules.
Example: Composition
The rule for composition given in Figure 2 is quite abstract. We demonstrate its use by composing a specification of a decrement instruction and a branch instruction (cf. the instructions of the factorial program). The branch instruction has two exit points, thus two postconditions. We illustrate the three possible compositions below.
Figure 2. Rules derived for the machine-code Hoare triple: frame, shift and compose; contract, extend, strengthen and weaken. Let ":" denote insertion into a set and "≺" denote any well-founded relation.
Composition is commonly done in three stages: first the scope of the specifications is extended so that the footprints match, then the positioning functions are made to match by a shift and finally the composition rule is applied followed by an application of a code merge if applicable.
We start by constructing a specification for "subs a a 1; bne k". The frame rule is used to extend the specification of bne and b is instantiated:
A shift by +1 makes the precondition of bne match the postcondition of subs:
An application of the composition rule followed by a code merge yields:
Alternatively, the specification for subs can be tacked onto either branch of bne. The compositions are done with shifts +1 and +k, respectively. The composition with shift +k results in a specification with two code segments.
Example: Procedures and Procedural Recursion
This section illustrates how specifications for procedures and procedure calls fit into our framework. We define the control-flow contract of a procedure and a procedure call, derive a rule stating the effect of a procedure call and finally present a rule that we have found useful when proving recursive procedures.
The standard contract of a procedure can be captured easily within our framework. Commonly a procedure is given a return address to which it must jump upon completion. Given a resource, say, lr that holds the return address we can specify a reasonably general contract as follows:
Specifying a general procedure call is slightly more involved in our framework. We define a call to be a jump that starts with the program counter set to h(p), for any p, stores the address g(p) in lr and jumps to address f(p).
The ARM instruction for branch-and-link BL satisfies a specification that is essentially the same as call(+k, {(BL k, +0)}, +0, +1).
The effect of executing a call call(f, C, h, g) to a procedure proc(f, P, C , Q) is described by the call rule, derived in Figure 3 .
The call rule is quite general. It does not restrict the procedure body or the call statement to be position dependent or independent. This was achieved by the inclusion of positioning functions h, g and f . Of these functions f has an artificial role when the procedure is position independent. Why should the procedure specification have a positioning function in common with the call specification, if the procedure specification is position independent?
In order to remove this oddity a special rule can be proved for calls to procedures that have the positioning function set to λx.x. proc(λx.x, P, C , Q) ∀p. {(P * lr(p), λx.x)} | C | {(Q * lr , λx.p)} ∀p. {(P * lr(p), (λx.
Informally this rule can be understood as follows: a call with jump function f executes a position-independent procedure with code C, if code C is placed using function f.
Procedural recursion of one or more procedures is proved by induction over a bounded variant function that decreases strictly on each recursive call. The observation that each recursive call pushes at least one value (the return address) onto the stack suggests that induction over the natural numbers is sufficient. The remaining stack space is a natural number that decreases for each recursive call. We have found the following induction rule useful in proofs of recursive procedures. Let v be some variant function, < be less-than over the natural numbers and ψ be any boolean-valued function.
The parameter C is intended to hold a set of code segments. Notice that C does not occur in the assumption of the premise. The absence of C makes the rule easier to use, as one does not need to assume the code one is constructing.
The definitions and theorems of this section were used in the verification of BINARY SUM, Section 3.6. The verification of BINARY SUM was done as a case analysis over the structure of the tree. The case of a leaf was trivial as it exits on the second instruction. The case of a branch required more work. For it we assumed that there is some code C that performs the desired function for the subtrees. We used the second version of the call rule to extract specifications for the BL instructions that perform the recursive calls. The specifications for all twelve instructions were then composed and the cases (leaf and branch) were merged. The induction rule, from above, was specialised to trees by setting v to depth (depth of a binary tree) and then used to eliminate the assumed specifications and imaginary code C. The same induction was also used in proving a variant of BINARY SUM that has the last call replaced by a tail-recursive call. The details of both proofs are given in [10].
Formalisation and Specialisation
Section 3.1 made restrictions on the format of the sets that are members of the set of valid states Σ. Restrictions are needed in order to ensure the intended meaning of separation for separating conjunction * . This section describes how we avoid such issues in our formalisation of the general case and also how we address them when the general theory is specialised and used.
The general theory, which consists of the definition of the machine-code Hoare triple and its rules, can be proved without any restrictions on the structure of the state sets. The machine-code Hoare triple can be defined and all its rules proved for any set of state sets Σ, given a next-state function next : Σ → Σ, a program-counter assertion pc : α → Σ → B and a basic instruction assertion inst : α × β → Σ → B, for some set α of instruction addresses and some set β of instructions. These abstractions ease the proof effort. All the definitions and rules are parametrised by a 6-tuple (Σ, α, β, next, pc, inst).
When the general theory is instantiated and one wants to prove basic specifications for the elementary operations of a specific language (examples of basic specifications: Sections 2.2, 2.4 and 3.5), then one has to restrict the shape of Σ so that * has its intended meaning. We have found that a practical method for restricting the shape of the state sets is to have them produced by a function. We define Σ to be the range of a function tr, i.e. Σ = { tr(x) | any x }, for some function tr that produces state sets of a specific form.
The function tr can be a translation function from a different state representation. If this is the case and the translation is accurate enough to also have an inverse tr⁻¹ (i.e. ∀x. tr⁻¹(tr(x)) = x), then one can define the next-state function for the set-based representation (next) using a next-state function over the other state representation (say next_sem): next(s) = tr(next_sem(tr⁻¹(s))). The benefit of defining next according to a next-state function over a different state representation is a practical one. The detailed semantics of a machine-code language might be more readily defined using a state representation different from the set-based representation that our approach requires. This is the case in the application of our framework to the ARM processor: we generate members of Σ formally from the representations of states used by the ARM model.
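The construction of next from a translation and its inverse can be sketched in a few lines. The record layout below (dictionaries for registers and memory plus a single status bit) is a hypothetical stand-in for a processor model's state, and the names `tr`, `tr_inv` and `lift` are chosen for this illustration only.

```python
def tr(state):
    """Translate a record-style state into the set-based representation."""
    elems = {("Reg", r, v) for r, v in state["regs"].items()}
    elems |= {("Mem", a, v) for a, v in state["mem"].items()}
    elems.add(("Status", state["status"]))
    return frozenset(elems)


def tr_inv(s):
    """Inverse of tr on its range, so that tr_inv(tr(x)) == x."""
    state = {"regs": {}, "mem": {}, "status": None}
    for elem in s:
        if elem[0] == "Reg":
            state["regs"][elem[1]] = elem[2]
        elif elem[0] == "Mem":
            state["mem"][elem[1]] = elem[2]
        else:
            state["status"] = elem[1]
    return state


def lift(next_sem):
    """Define next(s) = tr(next_sem(tr_inv(s))) on the set-based states."""
    return lambda s: tr(next_sem(tr_inv(s)))
```

Since every set-based state is produced by `tr`, no state can contain two elements for the same register or address, which is the shape restriction that gives * its intended meaning.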
Summary
This paper has presented a Hoare logic that has been carefully designed to fit on top of accurately modelled operational semantics of machine languages. Specifications are built on a separating conjunction that allows concise resource usage specifications and also helps avoid unwanted aliasing. Multiple code segments and positioning functions make our specifications support control flow that allows specifications of procedures and procedure calls, as well as general control flow between position independent and position dependent code. We build on previous work on separation logic [13] and unstructured control-flow [2, 16].
Our framework has been fully formalised in higher-order logic, mechanised using the HOL4 system and has been applied to ARM machine code using an existing high-fidelity model of the ARM processor [10]. We have not yet applied our framework to other architectures or to large case studies, but we think we have a methodology and implemented tools that will scale. Demonstrating this is the next phase of our research.
