Abstract. Sometimes machine code is a better target for verification than source code. RISC machine code is especially advantaged with respect to source code because it has just two instructions that interact with memory. This characteristic is the basis for an inference system that can prove code safe against hardware memory aliasing, an effect that occurs in embedded systems. There are programming memes that make programming safe in that context, but we want to show that a given machine code is provably safe. Our system tracks accesses at given offsets from the pointer to a data structure held in memory.
Introduction
In a computer system, 'software' memory aliasing occurs when different logical addresses simultaneously or sporadically reference the same physical location in memory. We are all familiar with it and think nothing of it, because the same physical memory is nowadays reused millisecond by millisecond for different user-space processes with different addressing, and we expect the operating system kernel to weave the necessary illusion of separation. The kernel programmer has to be aware that different logical addresses from different or even the same user-space process may alias the same physical location, but the application programmer may proceed unawares.
However, we are interested in a converse situation, called 'hardware' memory aliasing, where different physical locations in memory are sporadically bound to the same logical address. To the unwary kernel or applications programmer it looks as though adding 1 to the memory address 1 has not yielded the memory address 2 where he/she stored the value that is now to be retrieved. If software memory aliasing is likened to one slave at the beck of two masters, hardware memory aliasing is like identical twins slaved to one master who cannot tell which is which. In this paper we investigate the safety of machine code in the light of hardware memory aliasing issues.
Memory aliasing has been studied before [12] and is the subject of some patents [8, 13] . There appears to be no theoretical treatment of aliasing published, although the subject is broadly treated in most texts on computer architecture (see, for example, Chapter 6 of [1] ) and is common lore in operating systems kernel programming. The 'hardware' kind of memory aliasing arises particularly in embedded systems where the arithmetic components of the processor are insufficient to fill all the address lines. Suppose, for example, that the memory has 64-bit addressing but the processor only has 40-bit arithmetic. The extra lines might be grounded, or sent high; this varies from platform to platform. They may be connected to 64-bit address registers in the processor, so their values change from moment to moment as the register is filled. In that case, it is up to the software to set the 'extra' bits reliably to zero, or one, or some consistent value, in order that computing an address may yield a consistent result.
We first encountered this phenomenon in the context of the KPU [7] , a general purpose 'crypto-processor', i.e., a processor that performs its computations in encrypted form in order to provide security against observation and protection from malware. Because real encryptions are one-to-many, the result of the encrypted calculation of the address 1 + 1 will always mean 2 when decrypted, but may be different from another encryption of 2. If the two different aliases are used as addresses, then two different memory cell contents are accessed and the result is chaotic.
The same effect occurs in the embedded system that has processor arithmetic with fewer bits than there are address lines; add 1 + 1 in the processor and instead of 2, 0xff01000000000002 might be returned. If those two values both denoting '2' are used as addresses, they may access different memory cells.
There are programming memes that are successful in a memory aliasing environment: if a pointer is needed again in a routine, it must be copied and saved for the next use; if an array or string element is accessed, the address must always be calculated exactly the same way. But whatever the programmer says, the compiler may implement as it prefers and ultimately it is the machine code that has to be checked in order to be sure that memory aliasing is not a risk at run-time. Indeed, in an embedded environment it is usual to find the programmer writing in assembler precisely in order to control the machine code emitted. The Linux kernel consists of about 5% hand-written assembly code, for example (but rarely in more than 10-15 line segments at a time).
An inference system will be set out here that can guarantee a (RISC [2, 11]) machine code program safe against aliasing as described above. The idea is to map a stack machine onto the machine code. Each inference rule in our system represents what an assembly language instruction [3] for the stack machine does computationally. Choosing an inference rule to apply is equivalent to deciding the stack machine instruction to which the machine code should disassemble [4, 5]. The choice must be such that the resulting proof tree is well-formed, and that requirement can guide disassembly. The stack machine is aliasing-proof when operated within parameters, so verifying alias-safety is a question of verifying that the code obtained by disassembly does not overstep certain bounds.
The machine code we can check in this way is ipso facto restricted to that which we can disassemble (and verify). At the moment, that means code that uses string and array data structures without any further internal structure, and which uses machine code 'jump and link' and 'jump register' instructions only for subroutine call and return respectively, and in which subroutines make their own local frame and do not access the caller's frame (arguments are passed to subroutines in registers).
Mistakes in disassembly are possible: if a 'jump register' instruction, for example, were used to implement a computed goto instead of return, it could still be treated as a subroutine return by the verification, which would end prematurely, possibly missing an error further along and returning a false negative. A mistaken return as described would in fact always cause verification to fail in our system, but other such situations are conceivable in principle. A false positive may also arise through mistaken disassembly. So a human needs to check that the proposed disassembly is not wrongheaded. In practice that is not difficult because, as noted above, hand-written machine code at a professional standard consists of short, concise, commented segments. The difficulty is instead that there is a great deal of it to be checked and humans tire easily of checking it. Verification can reduce the burden to just checking disassembly against comments.
The rest of the paper is structured as follows: after an illustration of programming against aliasing in Section 2 and a discussion of disassembly in Section 3, an abstract semantics for machine code is set out in Section 4 and worked into an annotation-based verification method for code in Section 5.
Programming memes
We model aliasing as introduced when memory addresses are calculated in different ways. A memory address may be copied and used again without hazard. But if even 0 is added to it then a different hardware alias of the address may result, such that accesses via the new alias do not coincide with accesses via the old one.
That is particularly a problem for the way in which a compiler (or an assembly language programmer) renders machine code for the stack pointer movement around a function call. Classically, the subroutine starts by decrementing the stack pointer to make room on the stack for its local frame. Just before return, it increments the pointer again. The pseudo-code is shown on the left in Table 1. The difficulty is that restoring the stack pointer by arithmetic puts a hardware alias of the intended address in the sp register, and the caller may receive back a stack pointer that no longer points to the data. The code on the right in Table 1 works correctly; it takes an extra register (gp) and instruction, but the register can be saved on the stack and restored before return, avoiding the loss of the slot. Strings and arrays are also problematic because different ways of addressing the same element in these structures also cause aliasing. To avoid it, the elements of string-like structures should be accessed by incrementing the base address in constant steps, then by an offset no greater than the designated step size to access the elements between steps; array elements must be accessed via offsets from the array base address alone. These techniques ensure that only one calculation for the address of any given element exists, avoiding code that may give rise to aliasing.
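A toy model makes the stack pointer hazard concrete. The Python sketch below is an illustration under stated assumptions, not the pseudo-code of Table 1: it assumes 40-bit processor arithmetic under 64-bit addressing (the example figures from the Introduction), and the names alias_add, classic_frame and safe_frame, the junk values 0xAB and 0xCD, and the sample address are all hypothetical.

```python
# Toy model of hardware aliasing (illustrative assumptions throughout).
ARITH_BITS = 40                       # assumed width of the processor arithmetic
LOW_MASK = (1 << ARITH_BITS) - 1      # the address bits the arithmetic computes


def alias_add(addr, k, junk):
    """Add k to addr with 40-bit arithmetic. The top 24 address bits are
    filled with `junk`, standing for whatever the hardware leaves there."""
    low = (addr + k) & LOW_MASK
    return (junk << ARITH_BITS) | low


def classic_frame(sp):
    """Classical pattern: decrement sp on entry, increment it before return."""
    sp = alias_add(sp, -32, junk=0xAB)    # make room for the local frame
    # ... subroutine body would go here ...
    sp = alias_add(sp, +32, junk=0xCD)    # 'restore' the pointer by arithmetic
    return sp


def safe_frame(sp):
    """Safe pattern: keep a copy in gp and restore sp by copying it back."""
    gp = sp                               # a copy involves no arithmetic
    sp = alias_add(sp, -32, junk=0xAB)    # make room for the local frame
    # ... subroutine body would go here ...
    sp = gp                               # the identical bit pattern returns
    return sp


memory = {}                               # keyed by the full 64-bit address
caller_sp = 0x0000_0000_7FFF_FF00         # hypothetical caller stack pointer
memory[caller_sp] = "caller data"

bad_sp = classic_frame(caller_sp)         # arithmetically equal, but an alias
good_sp = safe_frame(caller_sp)           # identical to the caller's pointer
```

In this model the classical pattern returns a pointer whose low 40 bits agree with the caller's but whose full bit pattern differs, so a lookup in memory through it misses the caller's data, while the copy-and-restore pattern returns the caller's pointer unchanged.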
Disassembly
Nothing in the machine code indicates which register holds a subroutine return address, and that affects which machine code instructions may be interpreted as a return from a subroutine call. To deal with this and similar issues in an organised manner, we describe rules of reasoning about code both in terms of the machine code instruction to which they apply and an assembly language instruction for a more abstract stack machine that the machine code instruction may be disassembled to. We do not claim to be able to disassemble all existing RISC machine code programs, but only those which have been constructed in accord with the memes sketched out in the previous section. Then we may check formally that the programmer's intention has been carried through correctly.

Table 2: Underlying semantics of RISC machine code as actions on a state triple consisting of a vector of 32 registers R, a vector of (usually $2^{32}$ or more) memory locations M, and the program counter p. The latter gives the address of the current instruction. The named ra register is standardly used to hold a subroutine call return address, and $M \oplus \{a \mapsto v\}$ is the vector M overwritten at location a with the value v. The processor arithmetic (bold font '+') is distinguished from the instruction addressing arithmetic (light font '+'); r1, r2 are register names or indices; k is a signed 16-bit integer.
instruction | mnemonic | semantics

Table 3: Stack machine instructions: the n are small integers, the r are register names or indices, and the a are relative or absolute addresses.
s ::= cspt r | cspf r | rspf r | push n                          // stack pointer movement
    | get r n | put r n | ...                                    // stack access
    | newx r a n | popx r n | getx r n(r) | putx r n(r) | ...    // string operations
    | newh r a n | lwfh r n(r) | swth r n(r) | ...               // array operations
    | gosub a | return | goto a | ifnz r a | ...                 // control operations
    | mov r r | addaiu r r n | ...                               // arithmetic operations
The machine code processor state is a triple (R, M, p), where the register state R is a vector of 32 registers each containing a 32-bit value, M is the memory state, a vector of $2^{32}$ values, and p is the address of the current instruction. The semantics of the RISC machine code instructions are described as state-to-state transformations in Table 2.
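The state triple and its transformations can be sketched in a few lines of Python. This is a minimal illustration that assumes the conventional meanings of the li, move, addiu, sw and lw mnemonics and ideal 32-bit arithmetic (so no aliasing is modelled); the State class, the step function, the tuple encoding of instructions and the register numbers are hypothetical names chosen for the example rather than definitions taken from Table 2.

```python
from dataclasses import dataclass, field

WORD = 0xFFFFFFFF                     # registers hold 32-bit values


@dataclass
class State:
    R: list = field(default_factory=lambda: [0] * 32)   # register vector R
    M: dict = field(default_factory=dict)               # sparse memory M
    p: int = 0                                          # program counter p


def step(state, instr):
    """Apply one instruction, given as a tuple, then advance p by 4."""
    op = instr[0]
    R, M = state.R, state.M
    if op == "li":                    # li r a: load the immediate a into r
        _, r, a = instr
        R[r] = a & WORD
    elif op == "move":                # move r1 r2: copy r2 into r1
        _, r1, r2 = instr
        R[r1] = R[r2]
    elif op == "addiu":               # addiu r1 r2 k: r1 becomes r2 + k
        _, r1, r2, k = instr
        R[r1] = (R[r2] + k) & WORD
    elif op == "sw":                  # sw r1 k(r2): overwrite M at r2 + k with r1
        _, r1, k, r2 = instr
        M[(R[r2] + k) & WORD] = R[r1]
    elif op == "lw":                  # lw r1 k(r2): r1 becomes M at r2 + k
        _, r1, k, r2 = instr
        R[r1] = M[(R[r2] + k) & WORD]
    else:
        raise ValueError(f"unmodelled instruction: {op}")
    state.p += 4
    return state


SP, GP, T0 = 29, 28, 8                # register indices assumed for the example
s = State()
s.R[SP] = 0x7FFFFF00
program = [("li", T0, 42), ("addiu", SP, SP, -32),
           ("sw", T0, 4, SP), ("lw", GP, 4, SP)]
for instruction in program:
    step(s, instruction)
```

The example stores a word into a freshly made 32-byte frame and reads it back through the same base and offset, which is the only access pattern that the later sections admit.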
The stack machine state is a 4-tuple (R, K, H, p), consisting of the register state R of 31 registers (the stack pointer register is omitted), a stack K comprising the top part of memory above the stack pointer value, the 'heap' H comprising the part of memory below the stack pointer, and the address p of the current instruction. That state is an abstraction of the machine code processor state; the missing value s of the stack pointer is needed to recreate the processor state.

Table 4: Machine code instructions and the stack machine (assembly language) instructions to which they may be disassembled.
machine code    | assembly language
addiu r r n     | push -n, popx r n, addaiu r r n
lw r1 n(r2)     | get r1 n, lwfh r1 n(r2), getx r1 n(r2)
lb r1 n(r2)     | getb r1 n, lbfh r1 n(r2), getbx r1 n(r2)
sb r1 n(r2)     | putb r1 n, sbth r1 n(r2), putbx r1 n(r2)
jal a           | gosub a
jr r            | return
j a             | goto a
li r a          | newx r a n, newh r a n
bnez r a        | ifnz r a
nand r1 r2 r3   | nand r1 r2 r3
The stack machine instructions (listed in full in Table 3) manipulate the components of the stack machine state. They are assembled to machine code as shown in Table 4. The stack access instructions put r1 n and get r1 n are assembled to machine code sw r1 n(sp) and lw r1 n(sp) instructions in which the stack pointer register sp is explicit. The stack machine instructions cspt, cspf, rspf, push manipulate the stack pointer: the cspt r1 ('copy stack pointer to') instruction saves a copy of the stack pointer in register sp to register r1 and is assembled to move r1 sp; the cspf r1 ('copy stack pointer from') instruction refreshes the stack pointer in register sp from a copy in r1 that has the same value that was saved earlier (we will not explore here the reasons why a compiler might issue such a 'refresh' instruction) and is assembled to move sp r1; the rspf r1 ('restore stack pointer from') instruction returns the stack pointer in sp to a value that it held previously by copying an old saved value from r1 and is assembled to move sp r1; and push n decrements the stack pointer in sp by n, extending the stack downwards, and is assembled to addiu sp sp −n.
The same machine code instructions may be interpreted as stack machine instructions that manipulate a 'string pointer' in register r2 ≠ sp. The pointer value a is introduced via a machine code li r2 a instruction assembled from the stack machine newx r2 a n instruction. The stack machine popx r2 n instruction steps through it in increments of n bytes at a time, and then its elements are accessed with stack machine putx r1 k(r2) and getx r1 k(r2) instructions, for 0 ≤ k ≤ n − 4. The same machine code instructions may also be interpreted as setting up an 'array' area of size n bytes in memory (newh r2 a n), and accessing its elements (swth, lwfh).
There are also 'b' ('byte-sized') versions of the get, lwfh, getx instructions named getb, lbfh, getbx. Whereas get assembles to the RISC lw instruction, getb assembles to the RISC lb instruction. It has the same format and works just like lw but transfers only one byte to the register, zero-filling the top bits. Similarly for putb, sbth, putbx, as listed in Tables 3 and 4.

Table 5: Annotations a assert a binding of registers r or stack slots (n) to an annotated type t. One of the register names may be starred to indicate the stack pointer position. A type is either 'uncalculated', u, or 'calculated', c. Either may be decorated with '!n' annotations indicating historical writes at that offset from the typed value when used as an address. A c base type may also be superscripted by a 'tower' of natural numbers n denoting 'frame sizes' (see text), while a u base type may have a single superscript (also denoting size). We also use $\ddot{1}$ for a tower $1^{1^{\cdot^{\cdot^{\cdot}}}}$ of undetermined extent and a single repeated size. Also, formal type variables x, y, etc. are valid stand-ins for annotated types, and formal 'set of offsets variables' X, Y, etc. are valid stand-ins for sets of offsets.
a ::= r[*], ..., (n), ... = t; ...
Disassembly is where human intuition can help in the formal method. But the formal method also helps the human by providing reasoning that can drive disassembly along.
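The choice that disassembly has to make can be pictured as a lookup from each machine code instruction to its candidate stack machine readings. The Python sketch below is illustrative only: the tables follow the pairings described in this section, while the function name readings, the tuple encoding of instructions, and the filtering on whether the pointer register is sp are simplifying assumptions made for the example.

```python
# Candidate readings of the memory access mnemonics: (stack, string, array).
MEMORY_READINGS = {
    "lw": ("get", "getx", "lwfh"),
    "sw": ("put", "putx", "swth"),
    "lb": ("getb", "getbx", "lbfh"),
    "sb": ("putb", "putbx", "sbth"),
}

# Mnemonics whose candidate readings do not depend on the registers named.
OTHER_READINGS = {
    "li": ["newx", "newh"],
    "jal": ["gosub"],
    "jr": ["return"],
    "j": ["goto"],
    "bnez": ["ifnz"],
}


def readings(instr):
    """List the stack machine instructions a machine code instruction
    (encoded as a tuple, e.g. ("lw", "t0", 4, "sp")) might disassemble to."""
    op = instr[0]
    if op in MEMORY_READINGS:
        base = instr[3]                       # base address register
        on_stack, string, array = MEMORY_READINGS[op]
        return [on_stack] if base == "sp" else [string, array]
    if op == "addiu":
        dest, src = instr[1], instr[2]
        if dest == src == "sp":
            return ["push"]                   # stack pointer shift
        if dest == src:
            return ["popx", "addaiu"]         # string step, or plain arithmetic
        return ["addaiu"]
    if op == "move":
        dest, src = instr[1], instr[2]
        if src == "sp":
            return ["cspt"]                   # copy stack pointer to a register
        if dest == "sp":
            return ["cspf", "rspf"]           # refresh or restore stack pointer
        return ["mov"]
    return list(OTHER_READINGS[op])
```

Where more than one reading survives, as for li or for a load through a register other than sp, the remaining choice is the one that the inference rules of the later sections (or a human) must settle.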
Motivating semantics
We will restrict the commentary here to the ten instructions from the 32-bit RISC instruction set architecture shown in Table 2 . These are also the elements of a tiny RISC-16 machine code/assembly language [9] . Because of their role in RISC-16, we know that they form a complete set that can perform arbitrary computations.
We suppose in this paper that programs are such that the stack pointer always remains in the sp register. Copies may be made of it elsewhere using the move (copy) instruction, and it may be altered in situ using the addiu instruction. Adding a negative amount increases the stack size, and the stack conventionally grows top-down in the address space. We also suppose that the return address pointer is always in the source register r at the point where a ('jump register') jr r instruction is executed, so that the latter may be interpreted as a stack machine return instruction.
A program induces a set of dataflow traces through registers. A dataflow trace is a unique path through registers and stack memory cells that traces movement of data. The segments of the trace may be labelled with events as detailed below, signifying data transformation, or they may be unlabelled, signifying transfer without transformation. Each trace starts with the introduction of a value into a register, either from the instruction itself in the case of the li instruction, in which case the source is shown as a blank triangle, or by hypothesis at the start of a subroutine, in which case the source is shown as a vertical bar.
[Diagram: li r a, disassembled as newh r a n]
The left hand diagram above shows the introduction of the address a of an array of size n into register r, the li machine code instruction having been disassembled to newh. The indices of those elements already written to the array are recorded in the set X.
Usually that is the full set of indices up to n and the address is that of an array written earlier. The label on the arrow is an annotated type (Table 5), indicating an introduction event. The annotated type brought in with the array pointer introduction is $u^{n}!X$, standing for an address that may not subsequently be altered ('u', or 'uncalculatable') of n bytes of memory that has been written to at each of the offsets in the set X.
If the li instruction is instead interpreted as introducing the address a of a 'string-like' object, then the annotated type brought in is $c^{\ddot{n}}!X$, standing for an address that may be altered ('c', or 'calculatable') and stepped in increments of n bytes. The X again stands for a set of offsets from the base (up to n bytes) at which the structure has been written. The same pattern X applies at every increment n along the string. The form '$\ddot{n}$' is meant to be understood as the tower '$n^{n^{\cdot^{\cdot^{\cdot}}}}$'. The annotated type that the stack pointer is associated with has the similar form $c^{n_1^{\cdot^{\cdot^{n_k}}}}!X$ (for some finite sequence $n_1, \ldots, n_k$ as superscripts); it records a historical sequence of local stack frames created one within the scope of the other, culminating in a current stack frame of size $n_1$ bytes.
Each trace that we consider ends with the return from a subroutine call. Only traces that have reached some register r at that moment are 'properly terminated'. Any other trace (i.e., one that has reached a stack cell) is not considered further. In the call protocol that we allow here, the subroutine's local frame is created at entry and destroyed at return and the data in it is not shared with the caller.
We aim to constrain the possible sequences of events along traces. The events are:
1. !k for a write at stack offset k with put r k (or putx, swth for strings, arrays);
2. ?k for a read at stack offset k with get r k (or getx, lwfh for strings, arrays);
3. $u^{n}$ for the introduction of an array data address a via newh r a n;
4. $c^{\ddot{n}}$ for the introduction of a 'string' data address a via newx r a n;
5. τ for the introduction of data of any kind τ 'by hypothesis';
6. $c^{0}$ for the production of new data via the addaiu or other arithmetic instruction;
7. n↑ for the creation of a new stack frame of size n bytes via push n;
8. n↓ for restoring the previous stack frame, terminating a frame of size n bytes via rspf r (or popx when moving along a string);
9. nothing, for maintaining the data as-is or copying it.
An event does not always occur on the link one might expect: for example, reading data to r1 with lw r1 4k(sp) evokes an event on a 'sp to sp' link in Fig. 2, not on the '(k) to r1' ('stack slot k to register r1') link that the data flows along. We wish to enforce the following restrictions. First, on the stack pointer:
(a) every !k and ?k event is preceded by a last n↑ event that has n − w ≥ k ≥ 0 (where w is the number of bytes written), so stack reads and writes do not step outside the local frame of the subroutine;
(b) every ?k event is preceded by a !k event that takes place after the last preceding n↑ event, so every read is of something that has been written;
(c) every n↓ event is preceded by a last m↑ event with m = n, and so on recursively, so stack pushes and pops match up like parentheses;
(d) no trace containing a c or u event other than an originating $c^{0}$ may eventually pass through the stack pointer register, so the only operations allowed on the stack pointer are shifts up and down;
(e) every n↑ event is with n > 0.
Secondly, on the traces through registers containing a string pointer:
(a) every !k and ?k event is within the bound n established by the introduction $c^{\ddot{n}}$ on the trace, in that n − w ≥ k ≥ 0, where w is the width of the transferred data;
(b) there is no (b) constraint;
(c) every n↓ event is with n equal to the string increment established by the introduction $c^{\ddot{n}}$ on the trace;
(d) no trace containing any other event than the $c^{\ddot{n}}$ introduction and subsequent n↓ shifts may later pass through the string pointer register, so the only modifications allowed to the string pointer are shifts down;
(e) there is no (e) constraint.
The constraints applied to traces through array pointers are stricter:
(a) every !k and ?k event is within the bound n established by the preceding introduction $u^{n}$ on the trace, in that n − w ≥ k ≥ 0;
(b) there is no (b) constraint;
(c) there are no n↓ or n↑ events allowed;
(d) no trace containing any other event than the $u^{n}$ introduction may later pass through the array pointer register, so no modifications to the array pointer are allowed;
(e) there is no (e) constraint.
We express these constraints formally below. Starting with the event that introduces an annotated type τ we accumulate a running 'total' annotated type along each trace. The first two equations and their guards express the constraints on an array pointer. Shifts of the base address are not allowed and reads and writes are restricted to the array bound:
The next three equations express the constraints on a string pointer. Additionally, over the array pointer equations, shifts-down on (increasing) the pointer are allowed:
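The running total for array and string pointers can be sketched as follows. This Python fragment is an illustration written from the prose constraints on array and string pointers listed earlier, not a transcription of the equations; the names Ptr, Violation, apply_event, run and admissible, and the encoding of events as (tag, number) pairs, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    kind: str                            # "u" for an array, "c" for a string
    size: int                            # array size or string step, in bytes
    written: frozenset = frozenset()     # the set X of offsets written so far


class Violation(Exception):
    """Raised when an event is not allowed on the running type."""


def apply_event(t, event, w=4):
    """Return the running type after one event, assuming w-byte accesses."""
    tag, k = event
    if tag in ("!", "?"):                # write or read at offset k
        if not 0 <= k <= t.size - w:
            raise Violation(f"offset {k} outside bound {t.size}")
        if tag == "!":
            return Ptr(t.kind, t.size, t.written | {k})
        return t
    if tag == "down":                    # shift of the pointer by k bytes
        if t.kind != "c":
            raise Violation("an array pointer may not be shifted")
        if k != t.size:
            raise Violation("a string shift must equal the string step")
        return t                         # the same pattern X applies at each step
    raise Violation(f"event {tag!r} not allowed on a data pointer")


def run(t, events):
    """Accumulate the running type along a trace."""
    for event in events:
        t = apply_event(t, event)
    return t


def admissible(t, events):
    """True if every event on the trace is allowed."""
    try:
        run(t, events)
        return True
    except Violation:
        return False


string_type = run(Ptr("c", 8), [("!", 0), ("!", 4), ("down", 8), ("?", 4)])
array_type = run(Ptr("u", 16), [("!", 12), ("?", 12)])
```

A string pointer of step 8 accepts word writes at offsets 0 and 4 and a shift of exactly 8 bytes, whereas an array pointer rejects any shift at all.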
The next four equations express the constraints on the stack pointer. Additionally, over the string pointer equations, shifts-up on (decreasing) the pointer are allowed. The first two equations make shifts nest like parentheses:
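The stack pointer constraints lend themselves to a similar sketch. The Python fragment below is illustrative and is written from the prose restrictions (a)-(e) on the stack pointer rather than from the equations themselves; the names check_stack_trace, Violation and violates are hypothetical, and events are encoded as (tag, number) pairs with "up" for n↑ and "down" for n↓.

```python
class Violation(Exception):
    """Raised when a stack pointer trace breaks one of the restrictions."""


def check_stack_trace(events, w=4):
    """Check a stack pointer trace and return the tower of open frame
    sizes, current frame first. Accesses are assumed to be w bytes wide."""
    frames = []                               # innermost frame is last
    for tag, k in events:
        if tag == "up":                       # new frame of k bytes
            if k <= 0:
                raise Violation("frame size must be positive")            # (e)
            frames.append((k, set()))
        elif tag == "down":                   # end of a frame of k bytes
            if not frames or frames[-1][0] != k:
                raise Violation("restore does not match the last frame")  # (c)
            frames.pop()
        elif tag in ("!", "?"):
            if not frames:
                raise Violation("access outside any local frame")         # (a)
            size, written = frames[-1]
            if not 0 <= k <= size - w:
                raise Violation("access outside the local frame")         # (a)
            if tag == "!":
                written.add(k)
            elif k not in written:
                raise Violation("read of a slot not yet written")         # (b)
        else:
            raise Violation(f"event {tag!r} not allowed on sp")           # (d)
    return [size for size, _ in reversed(frames)]


def violates(events):
    """True if the trace breaks a restriction."""
    try:
        check_stack_trace(events)
        return False
    except Violation:
        return True


closed = check_stack_trace([("up", 32), ("!", 4), ("?", 4), ("down", 32)])
nested = check_stack_trace([("up", 32), ("up", 8), ("!", 4)])
```

Keeping one set of written offsets per open frame is a design choice of the sketch: it makes the matching of pushes and restores behave like parentheses, with each restore returning to the write history of the enclosing frame.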
These calculations bind an annotated type to each register and stack cell at each point in the program. Does the same register get the same type in every trace calculation? Traces converge only after a nand (when the type computed is $c^{0}$, so 'yes it does' in this case) and after a jump or branch. In these latter two cases we specify:
The calculated type at the same registers or stack slots must be the same across different traces starting from the same entry point for the programs considered. (*)
The programs in which (*) is true are the only programs we consider. They are programs that re-establish the same pattern of annotated types at each point at every pass through a loop and no matter which path through to a given point is taken.
The annotated types that get bound to registers and stack slots are the values in the states of an abstract stack machine whose instruction semantics is described by Figs. 1 and 2 and (5-13). That may be shown to be an abstract interpretation of the instruction trace semantics in a stack machine. That in turn abstracts a machine code processor via disassembly and (1-4).
Call an attempt in the stack machine to read or write beyond the current local frame out-of-bounds. That the abstract stack machine that calculates with annotated types is an abstract interpretation of the stack machine that calculates with integer words means that an out-of-bounds access in the stack machine must evoke a !k or ?k event on a trace through the abstract stack machine where k is not bounded by the size n of the last n↑ event on the trace. But that is forbidden by (5-13) in the abstract stack machine. So if we can verify that (5-13) hold of a program in the abstract stack machine, out-of-bounds accesses cannot happen in the stack machine.
If out-of-bounds accesses in the stack machine cannot happen, then we argue that aliasing cannot happen in the machine code processor. The argument goes as follows: the base address used for access via the RISC lw or sw instructions must be either
1. the stack pointer (disassembly is to put, get, putb, getb and the base address register gets the annotated type $c^{f}!X$ for some finite tower of frame sizes f);
2. the base address of a string, incremented several times by the string increment (disassembly is to putx, getx, putbx, getbx and the base address register gets the annotated type $c^{\ddot{n}}!X$ for some string step n);
3. the base address of an array (disassembly is to swth, sbth, lwfh, lbfh and the base address register gets the annotated type $u^{n}!X$ for some array size n).
Those are the only annotated types allowed by (5-13) on the abstract stack machine to be bound to the pointer's register at the moment the event !k or ?k happens. In the first case, the offset in the accessing instruction is less than the stack frame size, in the second case less than the string increment, and in the third case less than the array size. Those calculations are the only ones that can be made for the address of the accessed element, and they are each unique. For example, in case 1, the address used is s + k, where s is the stack pointer and 0 ≤ k ≤ n − w, where n is the local frame size and w is the size of the data accessed. Suppose two such accesses from the same frame are at arithmetically equal address aliases s + k1 ≡ s + k2 but s + k1 ≠ s + k2 identically. Then k1 ≡ k2 arithmetically but k1 ≠ k2 identically. But k1 and k2 are small numbers in the range 0 to n, where n is the frame size. If they cannot be distinguished by the processor arithmetic, then something is deeply wrong with the processor design. Accessing an element of a parent frame with s1 + k1 ≡ s2 + k2 where s1 = s2 − n is simply out of the question because k1 is restricted to the range 0 to n.
We conclude that accessing different aliases of the same address is impossible if the abstract interpretation of the program as set out by Figs. 1 and 2 can be verified to satisfy (5-13).
Reasoning about annotations and annotated types
According to the previous section, instructions can be interpreted abstractly as computing annotated types bound to registers and stack slots. In this section we develop a system of annotation on program code and a logic for reasoning about annotations with which we can verify that (5-13) are satisfied in the abstract interpretation, and thus conclude that the underlying program is aliasing-safe. Consider the 'good' pseudo-code of Table 1 implemented as machine code in Table 6.
Table 6: Non-aliasing subroutine machine code pattern.
The incoming annotation on the sp register is '$c^{0}$', which indicates a pointer to a zero-size frame. The 'return address' register ra is supposed to contain the program address to return to after the subroutine finishes, which is indicated by a $u^{0}$ annotation. The first instruction in subroutine foo copies the stack pointer to register gp and we infer that register gp also gets the '$c^{0}$' annotation, using a Hoare-triple-like notation:
The stack pointer location (in the sp register) is indicated by an asterisk. The arithmetic done by the next instruction is applied to the stack pointer register, and creates a new local frame with 32 bytes of space on the stack:
Suppose the annotation on the gp register is still valid at the end of subroutine foo, so the stack pointer register is finally refreshed by the move instruction with the same annotation as at the start:
The return (jr ra) instruction does not change these annotations, but it requires that register ra be marked as 'u', which is the case here, and sets the same annotation again:
Provided the logic used to set up these annotations is sound with respect to the abstract semantics discussed in Section 4 (and it is), and the intervening code can also be annotated successfully in this way, and verifies (5-13), then aliasing cannot happen in the underlying machine code, as discussed in Section 4. We now write down formal rules for the logic introduced informally above. We start with a list of so-called 'small-step' program annotations introduced by individual stack machine instructions (Table 7) . These instructions are the result of disassembly.
In the list, 'offsets variables' X, Y, etc., stand for sets of offset annotations '!k'. For example, the put gp 4 instruction is expected to start with an annotation $sp^{*} = c^{f}!X$ for the stack pointer register and the annotator will assert that the stack pointer is in register sp (the asterisk) and that the stack frame size tower f starts with some particular number at least 8 in size, in order to accommodate the 4-byte word written at offset 4 bytes within the local stack frame. The next annotation for the stack pointer register is $sp^{*} = c^{f}!4!X$, indicating that 4 is one of the offsets at which a write has been made. It may be that 4 is also a member of the set denoted by X (which may contain other offsets too), or it may be not in X. That is not decided by this formula, which merely says that whatever other offsets there are in the annotation, '4' is put there by this instruction. The small-step signature for the instruction has the following form:
and considering the effect on the gp register (which may be supposed to have the type denoted by the formal type variable x initially) and the stack slot denoted by '(4)' gives $\{gp = x;\ sp^{*} = c^{8}!X\}$ put gp 4 $\{sp^{*} = c^{8}!4!X;\ gp, (4) = x\}$ because whatever the description x of the data in register gp before the instruction runs, since the data is transferred to stack slot '(4)', the latter gains the same description. Generalising the stack frame size tower to f and the stack offset to n, and generalising registers gp and sp to r1 and r2 respectively, one obtains the small-step signature listed.
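The small-step signature for put can be read as a function from one annotation to the next. The Python sketch below is a minimal illustration of that reading: annotations are modelled as dictionaries, and the names StackType and put, the ("slot", n) key for stack slot (n), and the use of a plain string for the type variable x are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackType:
    tower: tuple                          # frame sizes f, current frame first
    written: frozenset = frozenset()      # offsets X written in the current frame


def put(ann, r1, n, r2="sp", w=4):
    """Apply the signature of `put r1 n` to the annotation `ann`:
    from {r1 = x; r2* = c^f!X} to {r2* = c^f!n!X; r1, (n) = x}."""
    sp_type = ann[r2]
    if not isinstance(sp_type, StackType) or not sp_type.tower:
        raise ValueError(f"{r2} does not hold a stack pointer with a frame")
    if not 0 <= n <= sp_type.tower[0] - w:
        raise ValueError(f"offset {n} does not fit in a frame of {sp_type.tower[0]} bytes")
    out = dict(ann)                       # registers not mentioned are unaffected
    out[r2] = StackType(sp_type.tower, sp_type.written | {n})
    out[("slot", n)] = ann[r1]            # the slot gains the description of r1
    return out


before = {"gp": "x", "sp": StackType((8,))}
after = put(before, "gp", 4)
```

With a frame of 8 bytes, put gp 4 is accepted and records the write at offset 4, while an attempt at offset 8 raises an error because the 4-byte word would fall outside the frame.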
Registers not mentioned in the signature are unaffected. Small-step annotations {Θ} κ {Ψ} for an instruction ι at address a with a disassembly κ generate a so-called 'big step' rule
in which Φ is the final annotation at program end and T denotes a list of big-step annotations {Ψ} a {Φ}, one for each instruction address a in the program (note that, in consequence, branches within the program must get the same annotation at convergence as there is only one annotation there). Thus the big-step rule is an inference about what theory T contains. The rule above says that if {Ψ} a + 4 {Φ} is in theory T, then so is {Θ} a {Φ}. The label justifies the inference by the fact that instruction ι is at address a, and disassembly κ has been chosen for it. The big-step rules aim to generate a 'covering' theory T for each program. That is, an annotation before every (reachable) instruction, and thus an annotation between every instruction. The rule above tells one how to extend by one further instruction a theory that is growing from the back of the program towards the front.
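Growing a theory from the back of the program can be sketched for straight-line code as follows. The Python fragment is an illustration under stated assumptions, not the paper's implementation: the function build_theory, the encoding of annotations as dictionaries, the placeholder type labels C0, C32 and U0, and the three-instruction example program with its addresses are all hypothetical.

```python
def build_theory(code, ret_addr, ret_annotation):
    """Grow a theory T from the back of a straight-line subroutine.

    code: list of (address, disassembly, pre, post) small-step annotations.
    Returns T as a dict from each address a to the pair (pre, final),
    read as the big-step annotation {pre} a {final}."""
    final = ret_annotation
    T = {ret_addr: (ret_annotation, final)}       # the final return starts the theory
    for a, kappa, theta, psi in sorted(code, key=lambda entry: entry[0], reverse=True):
        if a + 4 not in T:
            raise ValueError(f"no annotation yet at address {a + 4}")
        if T[a + 4][0] != psi:
            raise ValueError(f"{kappa} at {a}: postcondition does not match the next precondition")
        T[a] = (theta, final)                     # {psi} a+4 {final} gives {theta} a {final}
    return T


# Placeholder labels for annotated types (hypothetical encodings).
C0, C32, U0 = ("c", (0,)), ("c", (32, 0)), ("u", (0,))

entry = {"sp": C0, "ra": U0}
saved = {"sp": C0, "gp": C0, "ra": U0}
framed = {"sp": C32, "gp": C0, "ra": U0}

code = [
    (0, "cspt gp", entry, saved),     # copy the stack pointer to gp
    (4, "push 32", saved, framed),    # make a 32-byte local frame
    (8, "rspf gp", framed, saved),    # restore the stack pointer from gp
]
T = build_theory(code, ret_addr=12, ret_annotation=saved)
```

Each step checks that the postcondition chosen for an instruction agrees with the annotation already in the theory at the next address, which is the one-instruction extension that the rule describes; jumps, branches and calls are not handled by this sketch.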
Where does theory construction start? It is with the big-step rule for the final jr ra instruction that classically ends a subroutine. The action of this instruction is to jump back to the 'return address' stored in the ra register (or another designated register). The annotation for it says that there was a program address (an 'uncalculatable value', $u^{0}$) in the ra register before it ran (and it is still there after), and requires no hypotheses:
The '0' superscript indicates that the address may not be used as a base for offset memory accesses; that would access program instructions if it were allowed. Calling code conventionally places the return address in the ra register prior to each subroutine call. There are just three more big-step rules, corresponding to each of the instructions that cause changes in the flow of control in a program. Jumps (unconditional branches) are handled by a rule that refers back to the target of the jump:
This rule propagates the annotation at the target b of the jump back to the source a. When the jump is backward (b < a), the annotation at b is not yet available to a theory that is growing from the back of the program, so a guess at the fixpoint is needed. The logic of branch instructions (conditional jumps) at a says that the outcome of going down a branch to b or continuing at a + 4 must be the same. But the instruction bnez r b ('branch to address b if register r is nonzero, else continue') and its variants first require the value in the register r to be tested, so it is pre-marked with c ('calculatable'):
The case b < a (backward branch) requires a guess at a fixpoint, as it does for jump. The annotated incremental history f, likely none, of the value in the tested register is irrelevant here, but it is maintained through the rule. The set of offsets X already written to is also irrelevant here, but it too is maintained through the rule.
The RISC jal b machine code instruction implements the subroutine calls of standard imperative programming languages. It puts the address of the next instruction in the ra register (the 'return address') and jumps to the subroutine at address b. The calling code will have saved its current return address on the stack before the call. The callee code will return to the caller by jumping to the address in the ra register with jr ra, and the calling code will then restore its own return address from the stack.
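The call and return protocol just described can be illustrated with a toy model. This is a minimal sketch, not part of the verification logic: the `Machine` class, its method names and the addresses used in `demo` are all hypothetical, and instructions are assumed to be 4 bytes wide.

```python
class Machine:
    """Toy model of the jal / jr ra calling convention (illustrative only)."""

    def __init__(self, pc, ra):
        self.pc = pc        # program counter
        self.ra = ra        # return-address register
        self.stack = []     # stands in for the stack slots used to save ra

    def jal(self, target):
        """jal b: put the address of the next instruction in ra, jump to b."""
        self.ra = self.pc + 4
        self.pc = target

    def jr_ra(self):
        """jr ra: jump to the address held in ra."""
        self.pc = self.ra

    def save_ra(self):
        """Caller saves its own return address before making a call."""
        self.stack.append(self.ra)

    def restore_ra(self):
        """Caller restores its own return address after the callee returns."""
        self.ra = self.stack.pop()


def demo():
    # Hypothetical addresses: the caller runs at 0x100 and must itself
    # eventually return to 0x40.
    m = Machine(pc=0x100, ra=0x40)
    m.save_ra()                 # save own return address on the stack
    m.jal(0x200)                # call the subroutine at 0x200
    in_callee = (m.pc, m.ra)    # state seen on entry to the callee
    m.jr_ra()                   # callee returns to the caller
    m.restore_ra()              # caller recovers its own return address
    return in_callee, (m.pc, m.ra)
```

On entry to the callee, ra holds a program address that the callee must treat as opaque, which is the situation the u⁰ annotation describes.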
Because of jal's action in filling register ra with a program address, ra on entry to the subroutine at b must already have a u⁰ annotation, indicating an unmodifiable value that cannot even be used for memory access. And because the same subroutine can be called from many different contexts, we need to distinguish the annotations per call site, and so we use a throwaway lettering T′ to denote those annotations that derive from the call of b from site a. The general rule is:
The '0' superscript means that memory accesses via the return address as base address for lw/sw are not allowed; that would access the program instructions. The stack pointer register has not been named, but it must be distinct from the ra register.
We have found it useful to apply extra constraints at subroutine calls. We require (i) that each subroutine return the stack in the same state in which it acquired it (this is not a universal convention), and (ii) that a subroutine make and unmake all of its own local stack frame (again, not a universal convention). That helps a Prolog implementation of the verification logic start from a definitely known state at the end of each subroutine, independent of the call context, namely that the local stack frame at subroutine end (and beginning) is of size zero. These constraints may be built into the jal rule as follows:
Requirement (i) is implemented by returning the stack pointer in the same register (r* with the same r on entry and return) and with no stack cells visible in the local stack frame handed to the subroutine and handed back by the subroutine (the two 0s). Requirement (ii) is implemented by setting the local stack frame on entry to contain no stack, just the general purpose registers, which forces the subroutine to make its own stack frame to work in. Other calling conventions require other rule refinements.
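Constraints (i) and (ii) amount to a balance condition on the local stack frame. The following is a minimal sketch of that condition only, under several assumptions that are not in the text: a hypothetical tuple encoding of instructions, straight-line code, 4-byte words, and a stack pointer that is changed only by addiu on itself. The function name is likewise made up for illustration.

```python
def frame_is_balanced(instructions, sp="sp", word=4):
    """Check that a straight-line subroutine makes and unmakes its own frame.

    Hypothetical encoding of instructions as tuples:
        ("addiu", dest, src, imm)    add immediate
        ("lw" | "sw", reg, offset, base)
        ("jr", "ra")                 subroutine return
    """
    frame = 0  # bytes of local stack frame currently allocated (zero on entry)
    for ins in instructions:
        op = ins[0]
        if op == "addiu" and ins[1] == sp and ins[2] == sp:
            frame -= ins[3]          # stack grows downward: negative imm allocates
            if frame < 0:
                return False         # would release stack that belongs to the caller
        elif op in ("lw", "sw") and ins[3] == sp:
            if not 0 <= ins[2] <= frame - word:
                return False         # access falls outside the local frame
        elif op == "jr":
            return frame == 0        # local frame must be of size zero at return
    return False                     # no return instruction was reached


# A routine that allocates 8 bytes, saves and restores ra, then frees the frame.
good = [
    ("addiu", "sp", "sp", -8),
    ("sw", "ra", 4, "sp"),
    ("lw", "ra", 4, "sp"),
    ("addiu", "sp", "sp", 8),
    ("jr", "ra"),
]
```

With this encoding, `frame_is_balanced(good)` holds, while a routine that returns without releasing its frame, or that touches the stack before allocating a frame, is rejected.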
As noted, the small-step and big-step rules can be read as a Prolog program whose variables are the bold-faced offset variables X, Y, etc., and the type variables x, y, etc.
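The back-to-front construction of a covering theory can be sketched outside Prolog as well. The Python fragment below is only an illustration under strong simplifying assumptions: annotations are plain dictionaries from register names to type names, only three instruction kinds are handled, the tuple encoding and the name `build_theory` are hypothetical, and backward jumps are reported rather than solved by a fixpoint guess.

```python
def build_theory(program, final):
    """Grow a theory T (address -> annotation required before that address)
    from the back of the program towards the front.

    program: dict mapping address -> tuple, in a hypothetical encoding:
        ("li", reg, type)   load immediate, giving reg the stated type
        ("j", target)       unconditional jump
        ("jr", "ra")        subroutine return
    final: the annotation required at program end (Phi in the text).
    """
    theory = {}
    for a in sorted(program, reverse=True):
        op, *args = program[a]
        if op == "jr":
            # Starting point: no hypotheses, but ra must hold a program address.
            if final.get("ra") != "u0":
                raise ValueError("ra must be annotated u0 at return")
            theory[a] = dict(final)
        elif op == "j":
            # Jump: propagate the annotation at the target back to the source.
            (b,) = args
            if b not in theory:
                raise ValueError(f"jump at {a} to {b}: fixpoint guess needed")
            theory[a] = dict(theory[b])
        elif op == "li":
            # One-instruction extension: the annotation at a + 4 yields one at a.
            reg, ty = args
            post = theory[a + 4]
            if post.get(reg, ty) != ty:
                raise ValueError(f"li at {a} gives {reg} type {ty}, not {post[reg]}")
            # reg is overwritten, so nothing is demanded of it beforehand.
            theory[a] = {r: t for r, t in post.items() if r != reg}
        else:
            raise ValueError(f"no rule for {op!r} in this sketch")
    return theory


program = {0: ("li", "v0", "c"), 4: ("j", 12), 8: ("li", "a0", "c"), 12: ("jr", "ra")}
final = {"ra": "u0", "v0": "c"}
theory = build_theory(program, final)
```

Here the demand that v0 be calculatable at the end is discharged by the li at address 0, so the annotation before address 0 constrains only ra. A register that is demanded at the end but never set would instead surface as a requirement on entry.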
Example annotation
Below is the annotation of the simple main routine of a Hello World program that calls 'printstr' with the Hello World string address as argument, then calls 'halt'. The code was emitted by a standard compiler (gcc) and modified by hand to be safe against aliasing, so some compiler 'quirks' are still visible. The compiler likes to preserve the fp register content across subroutine calls, for example, even though it is not used here.
The functionality is not at issue here, but, certainly, knowing what each instruction does allows the annotation to be inferred by an annotator without reference to rules and axioms. The li a0 instruction sets the a0 ('0th argument') register, for example, so the only change in the annotation after the instruction is to the a0 column. The annotator introduces the string type, c1, into the annotation there, since the instruction sets a0 to the address of the Hello World string. The annotator assumes that the stack pointer starts in the sp register and that 'main' is called (likely from a set-up routine) with a return address in the ra register. Changes are marked in grey:
[Annotated listing of the 'main' routine; not recovered.]
[...] up with the same annotation down both branches and it is not set in one of them, v0 must have its final type on entry to the subroutine just in order to satisfy the rule. The same artifact is responsible for the requirement on the v1 register on entry. The 'halt' subroutine does not use the stack pointer; its function is to write a single byte to the hard-coded I/O-mapped address of a system peripheral. The annotation for register v1 on output is the taint left by that write.
Conclusion
We have set out a method of annotation that can show that a machine-code program is safe against hardware memory aliasing. Aliasing is assumed to be introduced through arithmetic processing, which may access different hardware aliases of the same logical address.
