Formal Verification of MSSP by Salverda, Pierre M. & Zilles, Craig
Formal Verification of MSSP
Pierre Salverda
salverda@uiuc.edu
Craig Zilles
zilles@cs.uiuc.edu
December 16, 2003
Abstract
MSSP is a new execution paradigm that achieves high performance by removing correctness constraints from
the critical path. A collection of concurrently executing slave processors, which are not on the critical path, check
the operation of a single master processor, whose execution is on the critical path, but is fast because it need
not be correct. This report formally verifies that such an execution model works, in the sense that it correctly
achieves a sequential execution of the application code.
We describe abstract models of both the sequential and MSSP execution paradigms, and distill from these
the fundamental aspects of functionality that are needed to establish their equivalence. The verification itself is
an iterative process. We begin with a number of high-level assumptions, which we use to prove some very basic
results, and then successively refine our formalisms to argue that our initial assumptions are indeed reasonable.
In formally reasoning about MSSP, we achieve a number of goals. First, and most importantly, we demon-
strate that MSSP is indeed equivalent to sequential execution. Second, we derive an abstract model of MSSP
execution, which permits us to distill the fundamental properties of an MSSP machine that are necessary for
correct operation. Having thus enumerated those properties, we facilitate reasoning about the impacts on correctness
(and hence the feasibility) of new design ideas. Finally, we show that operation of the master, which is not
guaranteed to be correct, does not compromise overall correctness. We thus demonstrate that an architecture in
which performance and correctness are pursued as distinct design goals is indeed viable, at least from the point
of view of maintaining correctness irrespective of the activity of the performance sub-component.
1 Introduction
Master/Slave Speculative Parallelization (MSSP) [4] is a recent proposal for speculative parallelization of sequential
programs. The paradigm uses a single processor, called the master, to spawn parallel tasks on multiple slave
processors. Dataflow dependences between the tasks are resolved by the master, which predicts live-in values for
each task by executing an approximate version of the original program. Slaves use the original program code when
executing their appointed task, but consume the speculative live-ins supplied by the master. The results thus
computed are permitted to affect the machine’s architected (visible) state only when the live-ins used to compute
them are consistent with the current architected state. If inconsistencies are detected, the results are discarded
and the machine resumes its operation using the current, pristine architected state as a starting point.
Two factors facilitate high performance in MSSP. First, slave execution is truly concurrent because live-in
values supplied by the master circumvent the inter-task dependences that would otherwise force serial execution.
Second, the master’s own execution can be made fast because it need only execute an approximate version of
the original program. With sufficiently many slave processors, it is generally the master that determines overall
MSSP performance. In turn, the master’s contribution to performance is subject to two influences: the time spent
executing the approximate code and the accuracy with which the approximate code models the original program.
With aggressive optimization and appropriate selection of task boundaries, a highly efficient and very accurate
approximate program can be obtained; overall, MSSP is able to achieve significant speed-ups over speculative,
out-of-order superscalar machines [4].
A distinguishing feature of MSSP is its decoupling of performance and correctness concerns. That is, MSSP
physically separates correctness and performance by devoting distinct hardware components to each. Figure 1
depicts this idea. Correctness is maintained by the slaves, which essentially check the results produced by the
master. High performance is achieved by the master, which executes an aggressively optimized, but not necessarily
correct, version of the program.
[Figure 1 diagram not reproduced. Recoverable labels: source program, compiler, program distiller, runtime hints, correct binary, distilled binary, master processor, slave processor, simple processors, fast path, correct path, input data, output data.]
Figure 1: Conceptual organization of an MSSP machine. The master executes an approximate program (the
distilled binary) to run ahead of the slaves, providing hints (the live-ins) of where execution is likely to be headed.
Slaves use those hints to concurrently compute the output results. To ensure correctness, live-ins are checked before
the slaves’ results are permanently committed to machine state.
By separating correctness and performance, MSSP aims to address a problem faced by current architectures: per-
formance and correctness tend to be opposing design goals; ensuring correctness demands simplicity, yet achieving
performance demands complexity in the system. As noted above, concurrency in the slaves moves their performance
off the critical path. Slaves can thus be engineered to be slow but simple, which makes their verification easier. In
contrast, the master resides on the critical path, but, being unconstrained by correctness requirements, is amenable
to complex optimizations (in both hardware and in the distilled binary) that aggressively target performance.
The primary goal of this work is to establish correctness of the extant MSSP proposal (as it stands in [3]). To
this end, we demonstrate MSSP’s equivalence to the existing sequential model, showing that any state reachable
by an MSSP machine is also reachable by a sequential machine. We tackle the problem iteratively, developing first
a high-level abstract model in which we make a number of simplifying assumptions. Section 3 describes this work.
We then successively refine the formal models in Sections 4 and 5 to prove that our assumptions in the first step
are indeed reasonable.
In developing an abstract model for MSSP operation, we achieve a secondary goal of distilling from the existing
model, which is replete with technology-driven design trade-offs, a more basic model for MSSP that is devoid of
implementation details. We intend to use this abstract model in reasoning about the correctness of subsequent
iterations of MSSP design.
This work also serves an important role in terms of demonstrating the viability of an architecture that aims
to decouple performance and correctness. More specifically, a machine successfully decouples performance and
correctness if it meets two criteria. First, correct-path execution must not in any way be affected by the fast-path;
that is, correctness must not be compromised by the pursuit of performance. Second, and of equal importance,
performance on the fast path must be (largely) immune to the speed of the correct-path; that is, performance
should not be constrained by correctness. It is the first of these two criteria that we demonstrate in this report: we
prove that correctness in MSSP cannot be influenced by how the master operates, nor by the instructions contained
in the distilled binary it executes.
Before we introduce the formal models for MSSP and sequential execution (Section 3), we describe the operation
of MSSP in more detail in Section 2, which follows.
2 An overview of MSSP
In this section, we present an overview of Master/Slave Speculative Parallelization (MSSP). This high-level de-
scription is meant to provide the contextual knowledge necessary to understand the formal work that follows in the
remainder of the paper. A more extensive treatment of MSSP can be found in [3] and [4].
Consider again Figure 1. An MSSP machine has two execution paths: the fast and the correct path. The fast
path is composed of a single, complex processor, the master, that executes an aggressively optimized executable
called the distilled program. The master processor runs ahead of the correct path execution to produce hints of
where the execution is headed. The correct path is implemented by multiple slave processors, which lag behind the
master. Because the individual slave processors are slower than the master, we need a means for the correct path
to keep up. MSSP uses speculative parallelization [2] for this purpose. Execution of the correct path program is
split into segments, called tasks, that are executed concurrently on the slaves. From a correctness standpoint, the
selection of task boundaries is relatively arbitrary, but it can affect performance.
Figure 2: Master processor distributes checkpoints to slaves. The master, executing the distilled program
on processor P0, assigns tasks to slave processors, providing them with predicted live-in values in the form of
checkpoints. The live-in values are verified when the previous task retires. Misspeculation, due to an incorrect
checkpoint, causes the master to be restarted with the correct architected state.
To enable these tasks to execute independently and in parallel, the master execution is used to predict the
sequence of tasks (i.e. the starting program counter (PC) of each task) and the live-in values to each task. These
are the hints that the fast-path execution provides. The predictions are generated by logically taking a checkpoint
of the master’s state at the point corresponding to the beginning of the task. Low overhead checkpointing can be
implemented by timestamping and buffering recent register writes and stores performed by the master, which serve
as a “diff” from architected state; the master never updates architected state directly.
Because the master’s predictions are not guaranteed to be correct, the slave processors do not update architected
state immediately. Instead, each task’s inputs (live-ins) and outputs (live-outs) are recorded and sent to the
verification/commit unit. When a completed task becomes the oldest (that is, the next to commit), a memoization-like
operation is performed that commits the outputs (live-outs) if the inputs match architected state.
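The memoization-like check can be sketched in a few lines. This is only an illustration under stated assumptions: machine state is modelled as a Python dict from cell names to values, and the function name verify_and_commit is hypothetical rather than taken from the MSSP implementation.

```python
def verify_and_commit(architected, live_ins, live_outs):
    """Commit live_outs only if every recorded live-in matches architected state.

    All three arguments are dicts mapping storage-cell names to values
    (an assumed representation, chosen for illustration only).
    Returns the resulting architected state and a success flag.
    """
    if all(architected.get(cell) == value for cell, value in live_ins.items()):
        # Inputs match: the task's outputs override the corresponding cells.
        return {**architected, **live_outs}, True
    # Mismatch: the task's results are discarded.
    return dict(architected), False


arch = {"r1": 5, "r2": 7}
ok_state, ok = verify_and_commit(arch, {"r1": 5}, {"r3": 12})
bad_state, bad = verify_and_commit(arch, {"r1": 6}, {"r3": 13})
```

In the first call the recorded live-in agrees with architected state, so the live-out is committed; in the second the predicted value of r1 is wrong, so architected state is left untouched.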
Furthermore, because the master’s predictions are not required, the slave processors can execute (in sequential
mode) without the master. This allows the distilled program to be constructed for only a (performance critical)
subset of the program. In effect, MSSP can be thought of as a mode that the system can shift into when a
distilled program fragment exists for the current region of execution. Because MSSP mode is only a performance
enhancement, neither the master nor the distiller needs to handle the full functionality of the ISA.
2.1 MSSP example
To facilitate a conceptual understanding of MSSP, we provide an example that outlines its basic behavior. Figure 2
illustrates an MSSP execution with four processors: one master (P0) and three slaves (P1, P2, and P3) that begin
the example idle. The master executes the distilled program and, at task boundaries, spawns a new task in the
original program on an idle slave processor, providing it access to the buffered checkpoint values. At annotation
(1) in the figure, the master processor spawns Task B onto processor P2. P2 begins executing the task after some
latency due to the inter-processor communication (2). P0 continues executing (3) the distilled program segment
that corresponds to Task B, which we refer to as Task B’.
As the slave Task B executes, it will read values that it did not write (live-in values) and perform writes of its
own (live-out values). If a corresponding checkpoint value is available, it is used for the live-in value; otherwise the
value is read from current visible (architected) state. As the slave tasks are speculative (they are using predicted
live-in values), their live-out values cannot be immediately committed: we have to ensure the live-in values are
correct before committing the live-outs. To this end, we record the task’s live-in and live-out values. When all
previous tasks have completed and updated the architected state, the live-ins can be compared with the architected
state. To avoid inter-processor communication in the verification critical path, our implementation performs this
comparison at the (banked) shared level of the cache hierarchy. Thus, when the task is complete (4), P2 sends its
live-in and live-out values to the shared cache. If the recorded live-in values exactly correspond to the architected
state, then the task has been verified and can be committed, and the architected state can be updated (5) with the
task’s live-out values.
If one of the recorded live-in values differs from the corresponding value in the architected state (e.g. because
the master wrote an incorrect value (3)), this mismatch will be detected during verification. On detection of the
misspeculation (6), the master is squashed, as are all other in-flight tasks. At this time, the master is restarted at
C’ (7) using the current architected state. In parallel, non-speculative execution of the corresponding task in the
original program (Task C) begins (8).
2.2 Program distillation
In both the master processor and the distilled program, we can exploit the fast path’s lack of correctness constraints
to implement optimizations that are not guaranteed to be correct. MSSP’s fast-path compiler (the program distiller)
exploits this property to eliminate predictable computations from the fast-path. The predictable program behaviors
are removed via approximation transformations. Unlike traditional compiler transformations, which can only be
applied in situations where they preserve all potential program behaviors, approximation transformations are
allowed to arbitrarily violate correctness. By using profile information, we can apply these transformations such
that the common case performance is improved and correctness violations are minimized.
To make the fast-path’s hints useful to the correct path, the two binaries have to correspond at task boundaries.
For example, if register r11 is live-in to a task (i.e. the slave reads r11 without defining it), the distilled program
must include the necessary computation before the task boundary and store its result in r11. This correspondence
requirement can potentially constrain program distillation, but the correspondence does not have to be exact;
transition code (much like the “fix-up code” used by speculative compiler transformations, except that it need not
be verified) can be used to transform the fast-path’s state into a form expected by the correct path.
3 Proof of equivalence
In this section, we introduce and then prove the equivalence of abstract models for sequential and MSSP execution.
We begin by formally defining a simplified model of sequential program execution, in which the operating system
and I/O are absent. In Section 3.2, we then introduce a hypothetical machine that implements the MSSP execution
model. We use this as the basis for our formal model of MSSP execution, which is presented in Section 3.3. By
making a number of “reasonable assumptions”, we then prove in Section 3.4 that the two models are equivalent.
Sections 4 and 5 revisit a number of the simplifying assumptions we make here.
3.1 The SEQ execution model
The sequential execution model, SEQ, serves as a reference against which correctness of MSSP is measured. Since
we do not wish to couple ourselves to a particular sequential ISA, we avoid specifying one for SEQ. That is, we
define the operation of SEQ at a high level only. We can nonetheless reason about equivalence to MSSP because
we can (reasonably) assume that the slaves implement the same ISA as our sequential reference machine; in other
words, we show equivalence by assuming the same ISA in both models, but without interpreting that ISA.
Our formal models for both SEQ and MSSP are centred on the notion of machine state. The following subsection
introduces this concept.
3.1.1 Machine state
Intuitively, a machine’s state is captured by the collection of storage cells from which it is built. The state we are
interested in is that which is visible via the machine’s ISA. More specifically, this is the set of all storage
cells that are:
• readable, directly or indirectly, by an instruction being executed, and hence can affect the outcome of that
execution; or
• modifiable, directly or indirectly, through the execution of an instruction.
This characterization is of course not a precise one. We enumerate the above properties only to make explicit the
intuition behind our formal notion of machine state: an uninterpreted domain over which sequential execution is
defined. The precise definition follows.
Definition 3.1 ($\mathcal{S}_{seq}$) We denote by $\mathcal{S}_{seq}$ the set of all states for sequential machines. At this point, the content
of $\mathcal{S}_{seq}$ is uninterpreted. □
Any configuration of a complete sequential machine is captured by some member of the domain $\mathcal{S}_{seq}$. However,
the converse does not apply: a given $S \in \mathcal{S}_{seq}$ does not necessarily model the state of a complete machine; there
exist machine states in $\mathcal{S}_{seq}$ that capture only a subset of a machine’s complete state.
3.1.2 SEQ execution
Execution of an instruction results in updates to one or more storage cells. Accordingly, instruction execution
constitutes a transformation of machine state. Sequential execution of more than one instruction is then naturally
defined in terms of function composition.
Definition 3.2 (Sequential execution) State transition function $\mathit{seq} : \mathcal{S}_{seq} \times \mathbb{Z}^{+} \mapsto \mathcal{S}_{seq}$ models the sequential
execution of one or more instructions. We define $\mathit{seq}$ inductively for all $n \ge 0$.
\[
\mathit{seq}(S, n+1) =
\begin{cases}
\mathit{seq\_step}(\mathit{seq}(S, n)) & \text{if } n \ge 1 \\
\mathit{seq\_step}(S) & \text{otherwise}
\end{cases}
\qquad
\mathit{seq}(S, 0) = S
\]
The function $\mathit{seq\_step} : \mathcal{S}_{seq} \mapsto \mathcal{S}_{seq}$, which models the execution of a single instruction, is uninterpreted. □
For $S_0$ and $S_1 \in \mathcal{S}_{seq}$, we will write $S_0 \overset{seq}{\leadsto} S_1$ if $S_0$ can be transformed into $S_1$ through the execution of some
number of instructions. That is, $S_0 \overset{seq}{\leadsto} S_1$ if there exists $n$ such that $\mathit{seq}(S_0, n) = S_1$. When $n$ is known, we
write $S_0 \overset{seq}{\leadsto}_{n} S_1$. Note that this definition states only that there exists some number of instructions that effect
the transition, but it does not qualify which instructions will be involved. The initial machine state (in this case,
$S_0$) determines those instructions implicitly. This stands to reason, since a machine’s storage cells hold both
instructions and data. Specifically, the program counter, itself a member of $S_0$, identifies the storage cell in which
the encoding of the next instruction is held.
We now present a simple lemma that follows directly from the above definition of sequential execution.
Lemma 3.1 Let $n \ge 0$ and $S \in \mathcal{S}_{seq}$ be given. Then, for all $k \ge 0$, $\mathit{seq}(S, n+k) = \mathit{seq}(\mathit{seq}(S, n), k)$. □
Proof By induction on $k$.
When $k = 0$, the result follows immediately from Definition 3.2: for any $n \ge 0$ and $S \in \mathcal{S}_{seq}$, $\mathit{seq}(S, n+0) =
\mathit{seq}(S, n) = \mathit{seq}(\mathit{seq}(S, n), 0)$.
For the induction step, we assume that $\mathit{seq}(S, n+k) = \mathit{seq}(\mathit{seq}(S, n), k)$ for some $k \ge 0$. Consider then
$\mathit{seq}(S, n+(k+1)) = \mathit{seq}(S, (n+k)+1)$. Applying Definition 3.2, this can be rewritten as $\mathit{seq\_step}(\mathit{seq}(S, n+k))$,
which, by our assumption, is equal to $\mathit{seq\_step}(\mathit{seq}(\mathit{seq}(S, n), k))$. Applying again Definition 3.2, this time in reverse,
we get $\mathit{seq\_step}(\mathit{seq}(\mathit{seq}(S, n), k)) = \mathit{seq}(\mathit{seq}(S, n), k+1)$, as required. ■
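Definition 3.2 and Lemma 3.1 can be exercised on a toy machine. The sketch below is only an illustration: the report leaves seq_step uninterpreted, so the two-cell state (pc, acc) and the single "add pc to acc" instruction are assumptions made here purely so that the code can run.

```python
def seq_step(state):
    # Hypothetical toy instruction (an assumption, not part of the model):
    # add the program counter to the accumulator, then advance the counter.
    return {"pc": state["pc"] + 1, "acc": state["acc"] + state["pc"]}


def seq(state, n):
    # seq(S, 0) = S and seq(S, n + 1) = seq_step(seq(S, n)).
    for _ in range(n):
        state = seq_step(state)
    return state


S = {"pc": 0, "acc": 0}

# Lemma 3.1: seq(S, n + k) = seq(seq(S, n), k), checked for small n and k.
lemma_holds = all(
    seq(S, n + k) == seq(seq(S, n), k)
    for n in range(6)
    for k in range(6)
)
```

A finite check of this kind does not replace the induction above; it only shows how the definition unfolds on one concrete choice of seq_step.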
[Figure 3 diagram not reproduced. Recoverable labels: master (generates random live-in sets for slaves); Slave 1, Slave 2, ..., Slave n, each with a local live-in store and a local rd/wr store; verifier; architected state; live-ins and live-outs passed to the verifier; circled annotations 1 to 5.]
Figure 3: Conceptual MSSP machine used for formal veriflcation.
3.2 An abstract MSSP machine
We now introduce an abstract version of an MSSP machine that will serve as the foundation upon which we build
our formal model of MSSP execution. We stress that, while this abstract machine captures the salient features of
MSSP operation, it is not intended to serve as a blueprint for a real implementation. Specifically, we pay no heed
to performance concerns in our model; we are interested here in correctness, and accordingly organize our design
so as to facilitate abstract reasoning about MSSP. Figure 3 depicts the machine.
Clearly, this differs in a number of respects from the MSSP infrastructure described in Section 2. Most notably,
the master processor is viewed now as a “black box” that generates live-in sets for the slaves. As such, the formal
model makes no assumptions about the live-in data used by slaves; the master is modelled essentially as a generator
of random data.
Another divergence from the existing MSSP infrastructure is the manner in which slaves are associated with
two local stores: one local live-in buffer and one local read/write store. The former, which cannot be read (directly)
by a slave, holds the live-ins produced by the master for the task currently being executed. The latter constitutes
the slave’s local view of machine state. At the commencement of task execution, the content of the local live-in
store is copied into the local read/write store (thus initializing it); all storage cells that can be read by the slave,
or that have been written to by it, are contained in that store. We choose to model slaves in this manner because
it permits us to make explicit two key properties of MSSP. First, a read-only record of all live-in data used by
a slave is maintained at the slave. Second, writes generated during execution of a task are buffered locally (and
coalesced, in cases of more than one write to the same storage cell) without being visible to any other slave and
without affecting architected state.
At the completion of a task, a slave sends the content of its two stores to the verify unit. The verifier checks the
contents of the read-only store (the live-ins) against architected state. If verification succeeds, the contents of the
writable store (the results of slave execution) are committed to architected state; if not, they are simply discarded.
The preceding description of MSSP operation uncovers a set of requirements that must hold if correct operation
of our abstract MSSP machine is to be ensured. Those requirements, which are annotated in Figure 3, are as
follows.
① Local stores: A local live-in store may hold only what the master supplied at the commencement of a task,
and this content may not change for the duration of that task. Similarly, the local read/write store is initialized
with the content of the live-in store, and, thereafter, may be updated only with data that is produced by the
slave during execution of its task.
② Slave processors: All slaves must be correct implementations of a given sequential ISA. Slaves may read from
and write to their local read/write store only.
③ Interconnect: Live-ins and live-outs must be transferred, without modification, to the verify/commit unit at
the end of a task. That is, the data received by the verifier must be exactly the data held in the originating
store.
④ Verify/Commit: The verifier must be a correct implementation of the verification functionality (we formalize
this behavior shortly). Intuitively, this involves permitting updates to architected state only when all live-ins
are consistent with the current architected state.
⑤ Architected state: Naturally, architected state must correctly hold all data written to it by the commit unit.
Further, only the commit unit is able to write to architected state.
We will henceforth assume all of the above hold.
3.3 The MSSP execution model
Analogous to the sequential model, we define MSSP execution in terms of a state transition function, but we now
make it explicit that it is the machine’s architected state being manipulated. Further, manipulation of state occurs
now at the granularity of tasks rather than instructions. We therefore begin by formalizing the notion of MSSP
tasks.
3.3.1 Tasks
A task represents a unit of work in the MSSP model. Accordingly, we define below a formal construct to capture
the idea that a task comprises a set of inputs (the live-ins produced by the master) and a set of outputs (which
accrue as the slave executes).
Definition 3.3 (Task) A task is a 5-tuple contained in $\mathcal{T} = \mathcal{S}_{seq} \times \mathbb{Z}^{+} \times \mathcal{S}_{seq} \times \mathbb{Z}^{+} \times Q$. $\langle S_{in}, n, S_{out}, k, q \rangle \in \mathcal{T}$
denotes a task with live-in set $S_{in}$ and live-out set $S_{out}$. The value $n$, which is fixed by the master, is the number
of sequential instructions that constitute complete execution of this task; $k$ is the number of instructions that have
been executed by a slave so far (thus, $0 \le k \le n$). The fifth component, $q$, represents the current execution state of
the task; we define $Q = \{\mathrm{RUN}, \mathrm{COMMIT}, \mathrm{ABORT}\}$. □
Recall, at the creation of a new task, the master places live-in data in a slave’s local live-in buffer, and this is
copied to the local read/write store when the slave commences execution. Thus, a newly created task has form
$\langle S_{in}, n, S_{in}, 0, \mathrm{RUN} \rangle$, and, at its completion, has form $\langle S_{in}, n, S_{out}, n, \mathrm{COMMIT} \rangle$; we will relate $S_{in}$ and $S_{out}$ later in
this section. The use of state ABORT will be introduced in Section 4.
We define a number of functions on $\mathcal{T}$ for the sake of notational convenience. Let $T = \langle S_{in}, n, S_{out}, k, q \rangle$.
Functions $\mathit{live\_in} : \mathcal{T} \mapsto \mathcal{S}_{seq}$ and $\mathit{live\_out} : \mathcal{T} \mapsto \mathcal{S}_{seq}$ produce the live-in and live-out sets for a given task. Thus,
$\mathit{live\_in}(T) = S_{in}$ and $\mathit{live\_out}(T) = S_{out}$. Similarly, functions $\mathit{length} : \mathcal{T} \mapsto \mathbb{Z}^{+}$ and $\mathit{progress} : \mathcal{T} \mapsto \mathbb{Z}^{+}$ yield
the second and fourth components of a task, respectively: $\mathit{length}(T) = n$ and $\mathit{progress}(T) = k$. The function
$\mathit{state} : \mathcal{T} \mapsto Q$ produces the execution state of a task: $\mathit{state}(T) = q$. We also define predicate $\mathit{done}(T)$ to be true if
and only if $\mathit{state}(T) \ne \mathrm{RUN}$.
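The task tuple and its accessor functions map naturally onto a small record type. The following is a minimal sketch, assuming machine states are represented as Python dicts; the helper new_task is a hypothetical convenience for building the initial form of a task and does not appear in the report.

```python
from enum import Enum
from typing import NamedTuple


class Q(Enum):
    RUN = "RUN"
    COMMIT = "COMMIT"
    ABORT = "ABORT"


class Task(NamedTuple):
    live_in: dict    # S_in, the live-ins supplied by the master
    length: int      # n, instructions in the complete task
    live_out: dict   # S_out, the slave's local read/write store
    progress: int    # k, instructions executed so far (0 <= k <= n)
    state: Q         # q, the execution state


def new_task(live_in, n):
    # Hypothetical helper: a new task has form <S_in, n, S_in, 0, RUN>.
    return Task(live_in, n, dict(live_in), 0, Q.RUN)


def done(task):
    # done(T) is true if and only if state(T) != RUN.
    return task.state is not Q.RUN


t = new_task({"pc": 0, "acc": 0}, 4)
```

The field names play the role of the accessor functions, so t.live_in, t.length, t.progress and t.state correspond to live_in(T), length(T), progress(T) and state(T).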
Naturally, an MSSP machine operates on more than one task during the execution of a program. We thus
introduce the notion of a sequence of tasks.
Definition 3.4 (Task sequence) The set $\mathcal{T}^{*}$ denotes the set of all finite-length sequences of MSSP tasks. For
$T \in \mathcal{T}$, we write $[T]$ to denote the singleton task sequence containing $T$ only. The empty task sequence is denoted $\epsilon$.
Operator $|$ concatenates task sequences. Thus, for $\tau \in \mathcal{T}^{*}$ and $T \in \mathcal{T}$, $[T]|\tau$ is a new sequence formed by prepending
$T$ to $\tau$. Also, $\tau|\epsilon = \epsilon|\tau = \tau$. □
3.3.2 MSSP execution
Having defined tasks and task sequences, we are now ready to define formally the abstract machine’s overall
operation. We do so through a series of definitions.
Definition 3.5 (MSSP execution) State transition function $\mathit{mssp} : \mathcal{S}_{seq} \times \mathcal{T}^{*} \mapsto \mathcal{S}_{seq}$ captures MSSP execution:
\[
\mathit{mssp}(A, [T]|\tau) =
\begin{cases}
\mathit{mssp}(\mathit{commit}(A, T), \tau) & \text{if } \mathit{done}(T) \\
\mathit{mssp}(A, \mathit{mssp\_step}([T]|\tau)) & \text{otherwise}
\end{cases}
\qquad
\mathit{mssp}(A, \epsilon) = A
\]
Function $\mathit{mssp\_step} : \mathcal{T}^{*} \mapsto \mathcal{T}^{*}$ captures the activity of the slaves and is defined as follows.
\[
\mathit{mssp\_step}([T_1, T_2, \ldots, T_m]) = [T_1, T_2, \ldots, \mathit{slave\_step}(T_i), \ldots, T_m]
\]
for all $1 \le i \le m$ for which $\mathit{done}(T_i)$ is false. □
The transition function defines MSSP operation at a coarse granularity: the machine either commits completed
tasks to architected state, or it simply advances the state of any tasks that are not yet completed. We will write
$S_0 \overset{mssp}{\leadsto} S_1$ if there exists a $\tau \in \mathcal{T}^{*}$ such that $\mathit{mssp}(S_0, \tau) = S_1$; if $\tau$ is known, we write $S_0 \overset{mssp}{\leadsto}_{\tau} S_1$.
Recall, we stipulated in Section 3.2 that slaves execute according to the SEQ model. This requirement is
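captured formally in the following definition.
Definition 3.6 (Slave execution) Execution of a slave processor is modelled by function $\mathit{slave\_step} : \mathcal{T} \mapsto \mathcal{T}$.
For $T \in \mathcal{T}$ such that $\mathit{state}(T) = \mathrm{RUN}$, we define:
\[
\mathit{slave\_step}(\langle S_{in}, n, S_{out}, k, \mathrm{RUN} \rangle) =
\begin{cases}
\langle S_{in}, n, \mathit{seq\_step}(S_{out}), k+1, \mathrm{RUN} \rangle & \text{if } k < n \\
\langle S_{in}, n, S_{out}, k, \mathrm{COMMIT} \rangle & \text{otherwise}
\end{cases}
\]
The function is undefined for all tasks $T$ with $\mathit{state}(T) \ne \mathrm{RUN}$. □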
Note that the above definition exploits the fact that our abstract machine initializes the read/write store with the
live-in set at the commencement of a task. More precisely, because the slave’s local read/write store begins with all
state supplied by the master, the first step in slave execution simply advances that set using the sequential model:
$\mathit{live\_in}(T)$ is transformed to $\mathit{seq\_step}(\mathit{live\_in}(T))$. Subsequent steps will likewise advance the set. Extrapolating,
we arrive at the following lemma.
Lemma 3.2 For any $T \in \mathcal{T}$, $\mathit{live\_out}(T) = \mathit{seq}(\mathit{live\_in}(T), \mathit{progress}(T))$. □
Proof By induction on $k$, the details of which we omit. ■
Note that the above lemma applies throughout the lifetime of any given task. In particular, it applies at task
completion, in which case $\mathit{live\_out}(T) = \mathit{seq}(\mathit{live\_in}(T), \mathit{length}(T))$.
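Slave execution and the invariant of Lemma 3.2 can be traced on a concrete task. This is a sketch under assumptions: seq_step is the same hypothetical toy instruction used for illustration only (the report leaves it uninterpreted), states are dicts, and execution states are plain strings.

```python
from typing import NamedTuple


def seq_step(s):
    # Hypothetical toy instruction (assumption): acc += pc, then pc += 1.
    return {"pc": s["pc"] + 1, "acc": s["acc"] + s["pc"]}


def seq(s, n):
    for _ in range(n):
        s = seq_step(s)
    return s


class Task(NamedTuple):
    live_in: dict
    length: int
    live_out: dict
    progress: int
    state: str  # "RUN", "COMMIT" or "ABORT"


def slave_step(t):
    # Definition 3.6: only defined for tasks in state RUN.
    if t.state != "RUN":
        raise ValueError("slave_step is undefined unless state is RUN")
    if t.progress < t.length:
        return t._replace(live_out=seq_step(t.live_out), progress=t.progress + 1)
    return t._replace(state="COMMIT")


# A new task: the read/write store starts as a copy of the live-ins.
task = Task({"pc": 4, "acc": 1}, 3, {"pc": 4, "acc": 1}, 0, "RUN")
history = [task]
while task.state == "RUN":
    task = slave_step(task)
    history.append(task)

# Lemma 3.2: live_out(T) = seq(live_in(T), progress(T)) at every point.
invariant_holds = all(t.live_out == seq(t.live_in, t.progress) for t in history)
```

The history contains the initial task, one entry per executed instruction, and a final entry for the transition to COMMIT, so the invariant is checked at every point in the task's lifetime, including completion.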
The verify/commit unit updates architected state with the live-out set of a task when it is complete (subject
to certain conditions being met, of course). We call this process superimposition of live-outs on architected state.
Definition 3.7 (Superimposition) Operator $\leftarrow : \mathcal{S}_{seq} \times \mathcal{S}_{seq} \mapsto \mathcal{S}_{seq}$ denotes the superimposition of machine state. It
remains uninterpreted at this point. □
That is, $S_0 \leftarrow S_1$ denotes the superimposition of machine state $S_1$ on $S_0$. We do not interpret this operation in
this section simply because the domain $\mathcal{S}_{seq}$ itself remains uninterpreted. Nonetheless, it is useful to understand the
intuition behind its working: $S_0 \leftarrow S_1$ is a new machine state in which values held by storage cells in $S_1$ “override”
values currently held in those cells in $S_0$. It is permitted that $S_0$ refer to cells not currently held by $S_1$, in which
case $S_0 \leftarrow S_1$ will contain that same state, unaffected by the superimposition. Likewise, if $S_1$ contains state not
held by $S_0$, the superimposition operator effectively introduces that state into the resulting set $S_0 \leftarrow S_1$.
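The intuition behind superimposition matches a right-biased dictionary merge. The sketch below is one possible interpretation, assuming states are dicts from cell names to values; the operator itself remains uninterpreted in this section, and the name superimpose is chosen here for illustration.

```python
def superimpose(s0, s1):
    # Cells held by s1 override those in s0; cells held by only one of the
    # two states are carried over unchanged.
    return {**s0, **s1}


s0 = {"r1": 1, "r2": 2}
s1 = {"r2": 20, "r3": 30}
result = superimpose(s0, s1)
```

Here r1 is untouched because s1 does not hold it, r2 takes the value from s1, and r3 is introduced into the result.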
Having thus introduced and informally characterized superimposition, we can now define the conditions under
which the verify unit applies it. Intuitively, we will permit a task to be committed to (superimposed on) architected
state only if the resulting state is the same as that which would have been produced by sequential execution.
Definition 3.8 (Task safety) We say $T \in \mathcal{T}$ is safe for $A \in \mathcal{S}_{seq}$, and write $\mathit{safe}(A, T)$, if $\mathit{seq}(A, \mathit{length}(T)) =
A \leftarrow \mathit{live\_out}(T)$. Extending this idea to task sequences, we say $[T]|\tau \in \mathcal{T}^{*}$ is safe for $A \in \mathcal{S}_{seq}$ if $\mathit{safe}(A, T)$ and $\tau$
is safe for $\mathit{seq}(A, \mathit{length}(T))$. □
Since $\mathit{live\_out}(T) = \mathit{seq}(\mathit{live\_in}(T), \mathit{length}(T))$ at the completion of $T$ (Lemma 3.2), task safety is equivalently
characterized by the requirement $\mathit{seq}(A, \mathit{length}(T)) = A \leftarrow \mathit{seq}(\mathit{live\_in}(T), \mathit{length}(T))$.
Task safety underpins MSSP correctness, and we accordingly devote considerable attention to it in Section 4.
For now we treat it as a basic property that can be checked by the verify/commit unit. Whence the following
definition, which completes our formalization of MSSP execution.
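Definition 3.9 (Commit) The function $\mathit{commit} : \mathcal{S}_{seq} \times \mathcal{T} \mapsto \mathcal{S}_{seq}$, which denotes the action of the verify/commit
unit, is defined:
\[
\mathit{commit}(A, T) =
\begin{cases}
A \leftarrow \mathit{live\_out}(T) & \text{if } \mathit{safe}(A, T) \\
A & \text{otherwise}
\end{cases}
\]
□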
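Definitions 3.8 and 3.9 can be tried out on concrete states. The sketch below rests on assumptions: seq_step is a hypothetical toy instruction, superimposition is read as a right-biased dict merge, and a task is reduced to its live-out set and length, which are the only components that safe and commit consult.

```python
def seq_step(s):
    # Hypothetical toy instruction (assumption): acc += pc, then pc += 1.
    return {"pc": s["pc"] + 1, "acc": s["acc"] + s["pc"]}


def seq(s, n):
    for _ in range(n):
        s = seq_step(s)
    return s


def superimpose(s0, s1):
    # One possible reading of the superimposition operator.
    return {**s0, **s1}


def safe(a, live_out, length):
    # Definition 3.8: seq(A, length(T)) = A <- live_out(T).
    return seq(a, length) == superimpose(a, live_out)


def commit(a, live_out, length):
    # Definition 3.9: superimpose when safe, otherwise leave A unchanged.
    return superimpose(a, live_out) if safe(a, live_out, length) else a


A = {"pc": 0, "acc": 0}
good_out = seq({"pc": 0, "acc": 0}, 2)  # live-ins agreed with A
bad_out = seq({"pc": 0, "acc": 7}, 2)   # acc was mispredicted
after_good = commit(A, good_out, 2)
after_bad = commit(A, bad_out, 2)
```

The task computed from correct live-ins advances architected state by exactly two sequential steps, while the task computed from a mispredicted accumulator leaves architected state as it was.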
3.4 Equivalence
To prove that MSSP is equivalent to SEQ, we show that any transformation of machine state that can be effected
by MSSP has a corresponding transformation in SEQ. That is, we show $S_0 \overset{mssp}{\leadsto} S_1 \Rightarrow S_0 \overset{seq}{\leadsto} S_1$.
3.4.1 Equivalence with safe sequences
We begin by proving a slightly weaker claim: we show that equivalence holds for a safe sequence of tasks. Extending
to any sequence of tasks, which is an easy step, is covered in the next sub-section.
Theorem 3.1 Let $\tau = [T_1, T_2, \ldots, T_m]$ be a non-empty sequence of tasks, and let $A_0$ be a machine state for which
$\tau$ is safe. Then:
\[
\mathit{mssp}(A_0, \tau) = \mathit{seq}\Big(A_0, \sum_{i=1}^{m} \mathit{length}(T_i)\Big)
\]
□
Proof We will write $A_k$ as shorthand for $\mathit{seq}(A_0, \sum_{i=1}^{k} \mathit{length}(T_i))$. Our proof obligation is thus to show that
$\mathit{mssp}(A_0, [T_1, T_2, \ldots, T_k]) = A_k$. We use induction on $k$ for this purpose.
The base case follows directly from our definition of task safety. Let $A_0$ be some machine state for which $\tau = [T_1]$
is safe. If the sequence $[T_1]$ is safe, it follows (from Definition 3.8) that the task $T_1$ is itself safe for $A_0$, and hence
that $\mathit{seq}(A_0, \mathit{length}(T_1)) = A_0 \leftarrow \mathit{seq}(\mathit{live\_in}(T_1), \mathit{length}(T_1))$. That is, $A_1 = A_0 \leftarrow \mathit{seq}(\mathit{live\_in}(T_1), \mathit{length}(T_1))$.
The right-hand side of this expression is easily rewritten (using Lemma 3.2) as $A_0 \leftarrow \mathit{live\_out}(T_1)$, which, by
Definition 3.9, is equal to $\mathit{commit}(A_0, T_1)$. Definition 3.5 ensures $\mathit{commit}(A_0, T_1) = \mathit{mssp}(A_0, [T_1])$.
The inductive step proceeds similarly. Assume the result holds for some $k \ge 1$. That is, assume $A_k =
\mathit{mssp}(A_0, \tau)$, where $\tau = [T_1, T_2, \ldots, T_k]$ is safe for $A_0$. Then consider task sequence $\tau' = \tau|[T_{k+1}]$ that is also
safe for $A_0$. We want to show $\mathit{mssp}(A_0, \tau') = A_{k+1}$. Applying Definition 3.8 multiple times, safety of $\tau'$ implies
$T_{k+1}$ is safe for $\mathit{seq}(\ldots \mathit{seq}(\mathit{seq}(A_0, \mathit{length}(T_1)), \mathit{length}(T_2)), \ldots, \mathit{length}(T_k))$. Using Lemma 3.1, we can rewrite this
expression as $\mathit{seq}(A_0, \sum_{i=1}^{k} \mathit{length}(T_i))$. That is, $T_{k+1}$ is safe for $A_k$. This, in turn, implies $\mathit{seq}(A_k, \mathit{length}(T_{k+1})) =
A_k \leftarrow \mathit{live\_out}(T_{k+1})$. From our hypothesis, it follows that $\mathit{seq}(A_k, \mathit{length}(T_{k+1})) = \mathit{mssp}(A_0, \tau) \leftarrow \mathit{live\_out}(T_{k+1})$.
The right-hand side of this expression is simply $\mathit{mssp}(A_0, \tau|[T_{k+1}])$; the left-hand side is, by definition, $A_{k+1}$. ■
3.4.2 Equivalence for all sequences
We can easily extend the above proof to cater for any sequence of tasks, thus concluding our proof that MSSP and
SEQ are equivalent.
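Theorem 3.2 Let $A_0, A_1 \in S_{seq}$ and $\tau \in \mathcal{T}^*$. Then
$$A_0 \overset{mssp}{\leadsto}_{\tau} A_1 \Rightarrow \exists n \geq 0 : A_0 \overset{seq}{\leadsto}_{n} A_1$$
$\Box$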
Proof Assume $A_0 \overset{mssp}{\leadsto}_{\tau} A_1$. It follows from our definition of MSSP execution (specifically, from Definition 3.9) that architected state is updated if and only if a task is safe for that state. There must therefore exist some subsequence of $\tau$ that effects the transition from $A_0$ to $A_1$, and that sequence contains only safe tasks. Let $\tau'$ be that subsequence. Since those members of $\tau$ that are not safe do not effect any state changes, it follows that $\mathit{mssp}(A_0, \tau) = \mathit{mssp}(A_0, \tau') = A_1$. If $\tau'$ is empty, then $A_1 = A_0$ and $n = 0$ suffices. Otherwise, we can apply Theorem 3.1 to show that $\mathit{mssp}(A_0, \tau') = \mathit{seq}(A_0, \sum_{i=1}^{k} \mathit{length}(T_i))$, where $\tau' = [T_1, T_2, \ldots, T_k]$. That is, the value of $n$, whose existence is asserted by the theorem, is $\sum_{i=1}^{k} \mathit{length}(T_i)$. $\blacksquare$
4 Task safety
In the previous section, we proved that MSSP and SEQ are equivalent by making the (reasonable) assumption that task safety is a property that can be checked by the verify unit. In this section, we refine our formal models to prove that a lower-level set of checks, which are realistically assumed to be performed by the hardware, can be used to ensure task safety holds. To do so, we must enhance our formalisms to interpret the domain $S_{seq}$ and hence define more rigorously the superimposition operator. This will, in turn, allow us to infer some key properties of sequential execution, which together can be used to reason about the collection of writes that accrue in a slave’s read/write store. We begin with machine state.
4.1 Machine state
We have so far viewed machine state merely as a set that captures the values held in a machine’s various storage cells. To formalize this intuition, we define the content of a machine state to be a collection of name-value pairs. The name component of each pair represents a unique identifier for one of the machine’s storage cells; the value component represents the contents of that storage cell.
Definition 4.1 (Machine state) The set of all machine states $S_{seq}$ is defined
$$S_{seq} = \{ S \subseteq N_{seq} \times V_{seq} : \mathit{well\_formed}(S) \}$$
The sets $N_{seq}$ and $V_{seq}$, which are uninterpreted, hold, respectively, the names of and values that can be held in a machine’s storage cells. For $S \subseteq N_{seq} \times V_{seq}$, we define predicate $\mathit{well\_formed}(S)$ to be true if and only if $(n, v_1) \in S$ and $(n, v_2) \in S \Rightarrow v_1 = v_2$. $\Box$
Thus, a machine state is a set of uniquely named pairs, each of which defines the value held in one of the machine’s storage cells. The exact sets of names and allowable values are not specified, simply because we do not want to define a particular ISA. We need understand only that the set $N_{seq}$ constitutes the set of storage cells identified by the ISA, and, as such, comprises general-purpose registers, the program counter, and main memory addresses.
We are now in a position to define the superimposition operator formally. Before we do so, however, we introduce some terminology that will prove useful shortly.
Definition 4.2 (Names) Function $\mathit{names} : S_{seq} \mapsto N_{seq}$ produces the names of all pairs contained in a given machine state. That is,
$$\mathit{names}(S) = \{ n \in N_{seq} : \exists (n, v) \in S \text{ for some } v \in V_{seq} \}$$
$\Box$
Definition 4.3 (Cover) We say that $S \in S_{seq}$ covers a set of names $N \subseteq N_{seq}$ if $N \subseteq \mathit{names}(S)$. $\Box$
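As an informal illustration of Definitions 4.1 to 4.3, a machine state can be sketched in Python as a set of (name, value) tuples. The cell names used here (`pc`, `r1`, `r2`) and the function spellings are assumptions made for the example; the report itself leaves the sets of names and values uninterpreted.

```python
def well_formed(pairs):
    """True if no name is paired with two different values (Definition 4.1)."""
    seen = {}
    for name, value in pairs:
        if name in seen and seen[name] != value:
            return False
        seen[name] = value
    return True


def names(state):
    """The names of all pairs in a state (Definition 4.2)."""
    return {name for name, _ in state}


def covers(state, name_set):
    """True if the state covers the given set of names (Definition 4.3)."""
    return set(name_set) <= names(state)


# Hypothetical cell names, chosen only for illustration.
S = {("pc", 0), ("r1", 7)}
NOT_A_STATE = {("r1", 7), ("r1", 8)}   # two values for one name

print(well_formed(S))              # True
print(well_formed(NOT_A_STATE))    # False
print(covers(S, {"r1"}))           # True
print(covers(S, {"r1", "r2"}))     # False
```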
4.2 Superimposition
Recall, operator $\leftarrow : S_{seq} \times S_{seq} \mapsto S_{seq}$ captures the action of the commit unit. We previously described its operation informally, stating that it replaces appropriate members in its left operand with the content of its right operand. Having formalized the notion of machine state, we can now interpret this operation more precisely.
Definition 4.4 (Superimposition) For $S_1, S_2 \in S_{seq}$, we define
$$S_1 \leftarrow S_2 = (S_1 \ominus S_2) \cup S_2$$
where $S_1 \ominus S_2 = S_1 - \{ (n, v) \in S_1 : n \in \mathit{names}(S_1) \cap \mathit{names}(S_2) \}$. $\Box$
Informally, the superimposition of S2 onto S1 is similar to the union of the two sets, but is biased in the sense that
members of S2 are always chosen over members of S1, whenever there is a choice. This intuitive characterization
of superimposition is expressed more formally in the following lemma, the proof of which follows directly from
Definition 4.4.
Lemma 4.1 If $(n, v) \in S_1 \leftarrow S_2$, then exactly one of the following is true.
1. $(n, v) \in S_1$ and $S_2$ does not cover $\{n\}$; or
2. $(n, v) \in S_2$.
$\Box$
That is, $(n, v)$ is in $S_1 \leftarrow S_2$ if $S_1$ already contains it and $S_2$ has no pair with name $n$, or if $(n, v)$ is in $S_2$. Note that in the latter case, $S_1$ may or may not contain a pair with name $n$; it does not make any difference because the like-named pair in $S_2$ supersedes it.
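Definition 4.4 and Lemma 4.1 can be sketched in Python under the assumption that a state is encoded as a dict from names to values, so that well-formedness holds by construction. The function names `superimpose` and `lemma_4_1_holds` and the register names are illustrative, not taken from the report.

```python
def superimpose(s1, s2):
    """S1 <- S2 as in Definition 4.4, with states encoded as dicts."""
    # First remove from S1 every pair whose name also appears in S2.
    kept = {n: v for n, v in s1.items() if n not in s2}
    # Then take the union with S2; the key sets are now disjoint.
    return {**kept, **s2}


def lemma_4_1_holds(s1, s2):
    """Check that each pair of S1 <- S2 satisfies exactly one case of Lemma 4.1."""
    for n, v in superimpose(s1, s2).items():
        case_1 = n in s1 and s1[n] == v and n not in s2   # from S1, S2 silent on n
        case_2 = n in s2 and s2[n] == v                    # from S2
        if case_1 == case_2:        # both or neither would violate the lemma
            return False
    return True


S1 = {"r1": 1, "r2": 2}
S2 = {"r2": 20, "r3": 30}
print(superimpose(S1, S2) == {"r1": 1, "r2": 20, "r3": 30})   # True
print(lemma_4_1_holds(S1, S2))                                # True
```

In this encoding the result is equal to Python's `{**s1, **s2}`; the two-step form is kept only to mirror the definition.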
Lemma 4.1 allows us to infer two important properties of operator $\leftarrow$, namely its associativity and its preservation of set containment. The following two lemmas present these results.
Lemma 4.2 The superimposition operator is associative. That is, if $S_1, S_2, S_3 \in S_{seq}$, then
$$(S_1 \leftarrow S_2) \leftarrow S_3 = S_1 \leftarrow (S_2 \leftarrow S_3)$$
$\Box$
Proof We show mutual containment.
$(S_1 \leftarrow S_2) \leftarrow S_3 \subseteq S_1 \leftarrow (S_2 \leftarrow S_3)$. Let $(n, v) \in (S_1 \leftarrow S_2) \leftarrow S_3$. Then, by Lemma 4.1, either $(n, v) \in (S_1 \leftarrow S_2)$ and $S_3$ does not cover $\{n\}$, or $(n, v) \in S_3$. We consider each case in turn.
• If $(n, v) \in (S_1 \leftarrow S_2)$ and $S_3$ does not cover $\{n\}$ then, again by Lemma 4.1, one of the following must hold:
  – $(n, v) \in S_1$ and $S_2$ does not cover $\{n\}$; or
  – $(n, v) \in S_2$.
  In the first case, we know $(n, v) \in S_1$ and neither $S_2$ nor $S_3$ covers $\{n\}$. Hence, $S_2 \leftarrow S_3$ cannot cover $\{n\}$ either, and so $(n, v) \in S_1 \leftarrow (S_2 \leftarrow S_3)$. In the second case, $S_3$ does not cover $\{n\}$, but $(n, v) \in S_2$, and so $(n, v) \in (S_2 \leftarrow S_3)$; it follows that $(n, v) \in S_1 \leftarrow (S_2 \leftarrow S_3)$.
• If $(n, v) \in S_3$, it follows directly from Definition 4.4 that $(n, v) \in (S_2 \leftarrow S_3)$. The latter implies, in turn, that $(n, v) \in S_1 \leftarrow (S_2 \leftarrow S_3)$.
In all cases, $(n, v) \in (S_1 \leftarrow S_2) \leftarrow S_3 \Rightarrow (n, v) \in S_1 \leftarrow (S_2 \leftarrow S_3)$. That is, $(S_1 \leftarrow S_2) \leftarrow S_3 \subseteq S_1 \leftarrow (S_2 \leftarrow S_3)$.
$S_1 \leftarrow (S_2 \leftarrow S_3) \subseteq (S_1 \leftarrow S_2) \leftarrow S_3$. The argument for containment in the other direction proceeds along the same lines, and so we will not repeat the details here.
Having established mutual containment, it follows that $S_1 \leftarrow (S_2 \leftarrow S_3) = (S_1 \leftarrow S_2) \leftarrow S_3$. $\blacksquare$
Lemma 4.3 The superimposition operator preserves set containment. That is, for $S_1, S_2, S_3 \in S_{seq}$,
$$S_1 \subseteq S_2 \Rightarrow (S_1 \leftarrow S_3) \subseteq (S_2 \leftarrow S_3)$$
$\Box$
Proof Assume $(n, v) \in S_1 \leftarrow S_3$. Then, by Lemma 4.1, either $(n, v) \in S_1$ and $S_3$ does not cover $\{n\}$, or $(n, v) \in S_3$. In the first case, containment of $S_1$ in $S_2$ ensures that $(n, v)$ is in $S_2$ also; since $S_3$ does not cover $\{n\}$, it follows that $(n, v) \in S_2 \leftarrow S_3$ too. In the second case, $(n, v) \in S_3$ guarantees $(n, v) \in S_2 \leftarrow S_3$. $\blacksquare$
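Lemmas 4.2 and 4.3 can also be checked by brute force on a small universe. The sketch below assumes states are encoded as Python dicts and enumerates every state over two hypothetical cells `a` and `b` with values 0 and 1; it is a sanity check, not a substitute for the proofs.

```python
from itertools import product


def superimpose(s1, s2):
    """S1 <- S2 (Definition 4.4) with states encoded as dicts."""
    kept = {n: v for n, v in s1.items() if n not in s2}
    return {**kept, **s2}


def contained(s1, s2):
    """S1 is a subset of S2, viewing both as sets of (name, value) pairs."""
    return s1.items() <= s2.items()


def all_states(cell_names, values):
    """Every well-formed state over a small, hypothetical universe."""
    options = [None] + list(values)          # None means "cell absent"
    states = []
    for choice in product(options, repeat=len(cell_names)):
        states.append({n: v for n, v in zip(cell_names, choice) if v is not None})
    return states


STATES = all_states(["a", "b"], [0, 1])      # 3 * 3 = 9 states

associative = all(
    superimpose(superimpose(x, y), z) == superimpose(x, superimpose(y, z))
    for x, y, z in product(STATES, repeat=3)
)
preserves_containment = all(
    contained(superimpose(x, z), superimpose(y, z))
    for x, y, z in product(STATES, repeat=3)
    if contained(x, y)
)
print(len(STATES), associative, preserves_containment)   # 9 True True
```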
4.3 Sequential execution
Conceptually, a sequential machine operates by fetching an instruction from machine state, decoding and executing it, and then writing the results back to machine state. This view leads naturally to a definition of instruction execution in terms of superimposition.
Definition 4.5 (Instruction execution) The execution of a single instruction in the sequential execution model is defined:
$$\mathit{seq\_step}(S) = S \leftarrow \delta(S)$$
$\Box$
The set $\delta(S) \in S_{seq}$ is a collection of name-value pairs that constitute the changes to state that will result from executing the next instruction. Effectively, the function $\delta : S_{seq} \mapsto S_{seq}$ performs the fetch-decode-execute steps alluded to above; the superimposition achieves the final write of the results back to machine state.
Although we do not interpret function $\delta$ at this point, we do stipulate that there are preconditions for its meaningful application. This is necessary because, in our MSSP model, slave processors operate on machine state that is potentially unsuitable for the purposes of executing the next instruction.¹ This leads us to the following definition.
Definition 4.6 (Completeness) We say $S \in S_{seq}$ is complete if $\delta(S)$ is defined. We inductively extend this definition to say $S$ is $n$-complete if $S$ is complete and $\mathit{seq}(S, 1)$ is $(n-1)$-complete. $\Box$
Informally, $S$ is complete if it contains all name-value pairs needed for computation of $\delta(S)$, and hence for the execution of the next instruction. If $S$ is $n$-complete then execution of the next $n$ instructions is well-defined.
Having defined execution of a single instruction in terms of superimposition, we extend the idea naturally to sequential execution of more than one instruction.
Definition 4.7 (Cumulative writes) The function $\Delta : S_{seq} \times \mathbb{Z}^+ \mapsto S_{seq}$ produces the cumulative writes that accrue from the sequential execution of multiple instructions. It is defined inductively for all $n \geq 0$, as follows.
$$\Delta(S, n+1) = \begin{cases} \Delta(S, n) \leftarrow \delta(\mathit{seq}(S, n)) & \text{if } n \geq 1 \\ \delta(S) & \text{otherwise} \end{cases}$$
$$\Delta(S, 0) = \emptyset$$
$\Box$
The set of cumulative writes is simply the collection of updates made to the slave’s local read/write store. In cases where a given storage cell has been updated more than once, the cumulative writes define only its latest value (because writes are coalesced by the superimposition operator).
Definition 4.7 permits us to characterize sequential execution in terms of superimposition. The following lemma presents this important result.
Lemma 4.4 (Sequential execution through cumulative writes) Let $S \in S_{seq}$ be $n$-complete ($n \geq 0$). Then:
$$\mathit{seq}(S, n) = S \leftarrow \Delta(S, n)$$
$\Box$
Proof By induction on $n$.
The base case is immediate: $\mathit{seq}(S, 0) = S = S \leftarrow \emptyset = S \leftarrow \Delta(S, 0)$.
For the inductive step, we assume $\mathit{seq}(S, k) = S \leftarrow \Delta(S, k)$ for some $k \geq 0$. Now, consider $\mathit{seq}(S, k+1)$. By definition, $\mathit{seq}(S, k+1) = \mathit{seq\_step}(\mathit{seq}(S, k))$. Applying Definition 4.5 to the right hand side of this expression, we get $\mathit{seq}(S, k) \leftarrow \delta(\mathit{seq}(S, k))$. From our hypothesis, this is the same as $(S \leftarrow \Delta(S, k)) \leftarrow \delta(\mathit{seq}(S, k))$. Applying Lemma 4.2, we can rewrite the latter as $S \leftarrow (\Delta(S, k) \leftarrow \delta(\mathit{seq}(S, k)))$. But $\Delta(S, k) \leftarrow \delta(\mathit{seq}(S, k))$ is, by Definition 4.7, equal to $\Delta(S, k+1)$. Thus, $\mathit{seq}(S, k+1) = S \leftarrow \Delta(S, k+1)$, and we are done. $\blacksquare$
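To make Definition 4.7 and Lemma 4.4 concrete, the following Python sketch runs them on a toy machine. Everything about the machine is an assumption made for illustration: states are dicts, the only instruction is a hypothetical `("addi", dst, src, imm)` stored under the name `("code", pc)`, and a missing cell surfaces as a `KeyError`, which plays the role of a state that is not complete.

```python
def superimpose(s1, s2):
    """S1 <- S2 (Definition 4.4) with states encoded as dicts."""
    kept = {n: v for n, v in s1.items() if n not in s2}
    return {**kept, **s2}


def delta(state):
    """Toy stand-in for delta: the writes made by the next instruction."""
    pc = state["pc"]
    _op, dst, src, imm = state[("code", pc)]      # fetch and decode
    return {dst: state[src] + imm, "pc": pc + 1}  # execute


def seq(state, n):
    """n applications of seq_step(S) = S <- delta(S)."""
    for _ in range(n):
        state = superimpose(state, delta(state))
    return state


def cumulative_writes(state, n):
    """Delta(S, n) of Definition 4.7, built up one instruction at a time."""
    writes = {}                                  # Delta(S, 0) is empty
    for _ in range(n):
        step = delta(state)
        writes = superimpose(writes, step)       # Delta(S, k) <- delta(seq(S, k))
        state = superimpose(state, step)         # advance to seq(S, k + 1)
    return writes


S = {
    "pc": 0,
    "r1": 10,
    ("code", 0): ("addi", "r1", "r1", 5),   # r1 = r1 + 5
    ("code", 1): ("addi", "r2", "r1", 1),   # r2 = r1 + 1
}
print(cumulative_writes(S, 2) == {"r1": 15, "r2": 16, "pc": 2})   # True
print(seq(S, 2) == superimpose(S, cumulative_writes(S, 2)))       # True
```

Note that the two writes to `pc` are coalesced into the single pair `("pc", 2)`, as the discussion of cumulative writes describes.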
¹ Recall, we make no assumptions about the live-ins produced by the master, and so it cannot be relied upon that all machine state needed for the execution of the next instruction is available to the slave in its local store.
An important result that follows from the above characterization of sequential execution is that two machine
states that are both complete and consistent with one another will remain consistent with one another after the
sequential execution of one or more instructions. By "consistent" we mean that the two sets agree on the values of
storage cells whose names are common to both. The following lemma formalizes this.
Lemma 4.5 Let $S_1, S_2 \in S_{seq}$ be $n$-complete machine states (for some $n \geq 0$). Then:
$$S_1 \subseteq S_2 \Rightarrow \mathit{seq}(S_1, n) \subseteq \mathit{seq}(S_2, n)$$
$\Box$
Proof The result follows directly from a simple, inductive extension of Lemma 4.3.
The base case is trivial: $S_1 = \mathit{seq}(S_1, 0)$ and similarly for $S_2$. Thus, $S_1 \subseteq S_2 \Rightarrow \mathit{seq}(S_1, 0) \subseteq \mathit{seq}(S_2, 0)$.
For the inductive step, assume $\mathit{seq}(S_1, k) \subseteq \mathit{seq}(S_2, k)$ for some $0 \leq k < n$. Consider then $\mathit{seq}(S_1, k+1) = \mathit{seq\_step}(\mathit{seq}(S_1, k)) = \mathit{seq}(S_1, k) \leftarrow \delta(\mathit{seq}(S_1, k))$ (Definitions 3.2 and 4.5, respectively). Likewise, $\mathit{seq}(S_2, k+1) = \mathit{seq}(S_2, k) \leftarrow \delta(\mathit{seq}(S_2, k))$. It follows from Definition 4.6 that, because both $S_1$ and $S_2$ are $n$-complete, $\mathit{seq}(S_1, k)$ and $\mathit{seq}(S_2, k)$ are also 1-complete. We have assumed $\mathit{seq}(S_1, k) \subseteq \mathit{seq}(S_2, k)$, so both sets must agree on the next instruction to be executed. Since instruction execution is deterministic, this, in turn, implies $\delta(\mathit{seq}(S_1, k)) = \delta(\mathit{seq}(S_2, k))$. We can then apply Lemma 4.3 to get $\mathit{seq}(S_1, k) \leftarrow \delta(\mathit{seq}(S_1, k)) \subseteq \mathit{seq}(S_2, k) \leftarrow \delta(\mathit{seq}(S_2, k))$, and hence that $\mathit{seq}(S_1, k+1) \subseteq \mathit{seq}(S_2, k+1)$. $\blacksquare$
We established in the above proof that $S_1 \subseteq S_2 \Rightarrow \delta(S_1) = \delta(S_2)$. This result can, in fact, be generalized to cumulative writes.
Lemma 4.6 Let $S_1, S_2 \in S_{seq}$ be $n$-complete machine states ($n \geq 0$). Then
$$S_1 \subseteq S_2 \Rightarrow \Delta(S_1, n) = \Delta(S_2, n)$$
$\Box$
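Proof A simple inductive extension of our arguments in the proof of Lemma 4.5.
The base case is immediate, since $\Delta(S_1, 0) = \emptyset = \Delta(S_2, 0)$. Assume $\Delta(S_1, k) = \Delta(S_2, k)$, where $S_1 \subseteq S_2$ and $k \geq 0$. Lemma 4.5 guarantees that $\mathit{seq}(S_1, k) \subseteq \mathit{seq}(S_2, k)$. Then, using an argument that is by now familiar, this implies that $\delta(\mathit{seq}(S_1, k)) = \delta(\mathit{seq}(S_2, k))$. Combining this with our inductive hypothesis, we get $\Delta(S_1, k) \leftarrow \delta(\mathit{seq}(S_1, k)) = \Delta(S_2, k) \leftarrow \delta(\mathit{seq}(S_2, k))$. In other words, $\Delta(S_1, k+1) = \Delta(S_2, k+1)$. $\blacksquare$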
4.4 Establishing task safety
In this sub-section, we show that task safety, the property we relied upon in Section 3, follows from two requirements:
1. Completeness: all machine state needed for well-defined execution of a task must be contained in that task’s live-in set; and
2. Containment: live-in sets must be consistent with architected state.
This result is expressed formally in the following theorem.
Theorem 4.1 Let $A, S \in S_{seq}$. If $S$ is $n$-complete and $S \subseteq A$, then $\mathit{seq}(A, n) = A \leftarrow \mathit{seq}(S, n)$. $\Box$
Proof We know from Lemma 4.4 that, since $S$ is $n$-complete, $\mathit{seq}(S, n) = S \leftarrow \Delta(S, n)$. Hence, $A \leftarrow \mathit{seq}(S, n) = A \leftarrow (S \leftarrow \Delta(S, n))$. By Lemma 4.2, the right hand side is the same as $(A \leftarrow S) \leftarrow \Delta(S, n)$. But $S \subseteq A$, so $A \leftarrow S = A$ (this can easily be inferred from Definition 4.4). Thus, $A \leftarrow \mathit{seq}(S, n) = A \leftarrow \Delta(S, n)$. Using Lemma 4.6 and the fact that $S \subseteq A$, we also know that $\Delta(A, n) = \Delta(S, n)$, and hence that $A \leftarrow \mathit{seq}(S, n) = A \leftarrow \Delta(A, n)$. Using Lemma 4.4 again, the latter expression is equivalent to $\mathit{seq}(A, n)$. $\blacksquare$
Corollary 4.7 Let $A \in S_{seq}$ denote an MSSP machine’s architected state. If $T \in \mathcal{T}$ is a task such that $\mathit{live\_in}(T)$ is $\mathit{length}(T)$-complete and $\mathit{live\_in}(T) \subseteq A$, then $\mathit{safe}(A, T)$ is true. $\Box$
Proof The preconditions for the corollary match those of Theorem 4.1, so it follows that $\mathit{seq}(A, \mathit{length}(T)) = A \leftarrow \mathit{seq}(\mathit{live\_in}(T), \mathit{length}(T))$. This is precisely the requirement for $\mathit{safe}(A, T)$ to hold. $\blacksquare$
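Theorem 4.1 and Corollary 4.7 can be exercised on a toy machine. The Python sketch below is illustrative only: the dict encoding of states, the single hypothetical `("addi", dst, src, imm)` instruction, and the helper names are all assumptions. It shows a live-in set that is complete and contained in architected state passing the safety equation, and a stale live-in set (complete but not contained) failing it.

```python
def superimpose(s1, s2):
    """S1 <- S2 (Definition 4.4) with states encoded as dicts."""
    kept = {n: v for n, v in s1.items() if n not in s2}
    return {**kept, **s2}


def delta(state):
    """Toy delta for an assumed one-instruction ISA: ("addi", dst, src, imm)."""
    pc = state["pc"]
    _op, dst, src, imm = state[("code", pc)]
    return {dst: state[src] + imm, "pc": pc + 1}


def seq(state, n):
    for _ in range(n):
        state = superimpose(state, delta(state))
    return state


def is_n_complete(state, n):
    """True if n instructions can execute without touching a missing cell."""
    try:
        seq(state, n)
        return True
    except KeyError:
        return False


def safe(arch, live_in, n):
    """Task safety: seq(A, n) = A <- seq(live_in, n)."""
    return seq(arch, n) == superimpose(arch, seq(live_in, n))


A = {
    "pc": 0,
    "r1": 10,
    "r9": 99,                               # cell the task never touches
    ("code", 0): ("addi", "r1", "r1", 5),
    ("code", 1): ("addi", "r2", "r1", 1),
}
LIVE_IN = {name: A[name] for name in ("pc", "r1", ("code", 0), ("code", 1))}
STALE = {**LIVE_IN, "r1": 11}               # disagrees with A on r1

print(is_n_complete(LIVE_IN, 2), safe(A, LIVE_IN, 2))   # True True
print(is_n_complete(STALE, 2), safe(A, STALE, 2))       # True False
```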
4.5 Putting it all together
We established in the previous sub-section that for $T \in \mathcal{T}$ and $A \in S_{seq}$, $\mathit{safe}(A, T)$ is true if $\mathit{live\_in}(T)$ is complete for $\mathit{length}(T)$ instructions and $\mathit{live\_in}(T) \subseteq A$. That is, a task is safe if its live-ins are complete and consistent with architected state. In this sub-section, we refine our formal model of MSSP execution to replace predicate $\mathit{safe}(A, T)$ with checks for completeness and containment.
The check for completeness is easily performed by the slave processor: if it cannot find (in its local read/write store) the machine state required for the execution of an instruction, completeness does not hold and the slave aborts its task. Thus, if a slave successfully reaches the end of its task, we can be sure its live-in set is complete. Containment is established by the verify unit when it receives the live-in sets from slaves; it can check each member of the live-in set against the corresponding member of architected state. We therefore need to make changes to Definitions 3.6 and 3.9; Definition 3.5 remains as is.
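Definition 4.8 (Slave execution, refined) For $T \in \mathcal{T}$ such that $\mathit{state}(T) = \text{RUN}$, we define:
$$\mathit{slave\_step}(\langle S_{in}, n, S_{out}, k, \text{RUN} \rangle) = \begin{cases} \langle S_{in}, n, \emptyset, 0, \text{ABORT} \rangle & \text{if } S_{out} \text{ is not complete} \\ \langle S_{in}, n, \mathit{seq\_step}(S_{out}), k+1, \text{RUN} \rangle & \text{if } k < n \\ \langle S_{in}, n, S_{out}, k, \text{COMMIT} \rangle & \text{otherwise} \end{cases}$$
The function is undefined for all tasks $T$ with $\mathit{state}(T) \neq \text{RUN}$. $\Box$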
Recall, the predicate $\mathit{done}(T)$ was defined to be true if and only if $\mathit{state}(T) \neq \text{RUN}$, so it becomes true when a task’s state becomes either COMMIT or ABORT. The $\mathit{slave\_step}$ function therefore guarantees progress toward completion of the task, either by moving it to state COMMIT (if completeness checks all pass), or state ABORT (if completeness checks fail). At that point, the machine will attempt to commit the task.
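The refined slave behaviour can be sketched in Python as follows. This is a rough illustration under several assumptions: the `Task` tuple mirrors the five-field form used in Definition 4.8, states are dicts, the ISA is a single hypothetical `("addi", dst, src, imm)` instruction, and a missing cell is detected as a `KeyError`. One further assumption deserves emphasis: the sketch applies the completeness test only when another instruction remains to be executed, so a task that has already run its `n` instructions moves to COMMIT without inspecting the instruction that follows it.

```python
from typing import NamedTuple


class Task(NamedTuple):
    s_in: dict     # live-in set supplied by the master
    n: int         # number of instructions in the task
    s_out: dict    # slave's local read/write store
    k: int         # instructions executed so far
    state: str     # "RUN", "COMMIT" or "ABORT"


def superimpose(s1, s2):
    kept = {name: v for name, v in s1.items() if name not in s2}
    return {**kept, **s2}


def delta(state):
    """Toy delta for an assumed one-instruction ISA: ("addi", dst, src, imm)."""
    pc = state["pc"]
    _op, dst, src, imm = state[("code", pc)]
    return {dst: state[src] + imm, "pc": pc + 1}


def slave_step(task):
    if task.state != "RUN":
        raise ValueError("slave_step is undefined unless the task is in state RUN")
    if task.k >= task.n:
        return task._replace(state="COMMIT")
    try:
        writes = delta(task.s_out)
    except KeyError:
        # A needed cell is missing from the local store: not complete.
        return task._replace(s_out={}, k=0, state="ABORT")
    return task._replace(s_out=superimpose(task.s_out, writes), k=task.k + 1)


def run(task):
    """Step a task until done(T) holds."""
    while task.state == "RUN":
        task = slave_step(task)
    return task


GOOD = {"pc": 0, "r1": 10,
        ("code", 0): ("addi", "r1", "r1", 5),
        ("code", 1): ("addi", "r2", "r1", 1)}
BAD = {"pc": 0, ("code", 0): ("addi", "r1", "r1", 5)}   # r1 is missing

committed = run(Task(GOOD, 2, dict(GOOD), 0, "RUN"))
aborted = run(Task(BAD, 2, dict(BAD), 0, "RUN"))
print(committed.state, committed.s_out["r2"])   # COMMIT 16
print(aborted.state, aborted.s_out)             # ABORT {}
```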
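Definition 4.9 (Commit, refined) For $A \in S_{seq}$ and $T \in \mathcal{T}$, where $\mathit{done}(T)$ is true, we define:
$$\mathit{commit}(A, T) = \begin{cases} A \leftarrow \mathit{live\_out}(T) & \text{if } \mathit{live\_in}(T) \subseteq A \text{ and } \mathit{state}(T) = \text{COMMIT} \\ A & \text{otherwise} \end{cases}$$
$\Box$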
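A minimal Python sketch of the refined commit follows, assuming states are dicts and a finished task is summarized by a hypothetical `Task` record holding its live-in set, live-out set, and final state. Containment is tested pair by pair, as the verify unit is described as doing.

```python
from typing import NamedTuple


class Task(NamedTuple):
    live_in: dict
    live_out: dict
    state: str      # "COMMIT" or "ABORT" once the task is done


def superimpose(s1, s2):
    kept = {name: v for name, v in s1.items() if name not in s2}
    return {**kept, **s2}


def commit(arch, task):
    """Definition 4.9: apply the live-outs only if the containment check passes
    and the slave finished in state COMMIT; otherwise leave A unchanged."""
    contained = task.live_in.items() <= arch.items()
    if contained and task.state == "COMMIT":
        return superimpose(arch, task.live_out)
    return arch


A = {"r1": 10, "r2": 0, "r9": 99}
ok = Task({"r1": 10}, {"r1": 15, "r2": 16}, "COMMIT")
stale = Task({"r1": 11}, {"r1": 16, "r2": 17}, "COMMIT")   # live-in not in A
aborted = Task({"r1": 10}, {}, "ABORT")

print(commit(A, ok) == {"r1": 15, "r2": 16, "r9": 99})   # True
print(commit(A, stale) == A, commit(A, aborted) == A)    # True True
```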
5 Memory-like state
Our formalizations have, up to this point, implicitly relied upon a number of properties holding for the machine state that is manipulated during instruction execution. More specifically, we have been assuming all along that the state manipulated by execution comprises general purpose registers and main memory only. In real machines, this is, of course, only part of the picture; our results in the preceding sections therefore apply only to a subset of machine state that might be encountered by a real application.
In this section, we enumerate the various requirements that have so far been implicit in our work. That is, we make explicit the preconditions for our results in Sections 3 and 4. These preconditions relate primarily to machine state, and it is the extant definition of MSSP execution that depends on their holding true. Accordingly, we extend our definition of MSSP execution to deal with situations in which those preconditions do not hold. This, in turn, requires a slight refinement of our SEQ model.
5.1 Memory-like state
To understand the potential for the problems alluded to above, consider the verify/commit phase of MSSP execution. We make at least the following assumptions in our formal model:
• At verification, we assume checking $\mathit{live\_in}(T) \subseteq A$ can be performed without changing $A$; and
• Definition 4.4 does not specify the order in which the superimposition operator replaces members in its left operand with content from its right operand, and thus assumes all orderings are equivalent; and
• Lemma 4.4 presumes that writes to the same storage cell can be coalesced into a single write to produce the same net effect as their serialized application.
Although these assumptions are reasonable for the most common forms of machine state, we must acknowledge the existence of state that does not have these properties. For example, reads from and writes to memory-mapped I/O addresses usually have side-effects, either within the machine state itself, or external to the machine. We require, therefore, a more complete model of machine state that distinguishes two types of storage cells.
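Definition 5.1 (Machine state) We define $S_{seq} = S_{mem} \cup S_{io}$, where
$$S_{mem} = \{ S \subseteq N_{seq} \times V_{seq} : \mathit{well\_formed}(S) \wedge \mathit{memory\_like}(S) \}$$
$$S_{io} = S_{seq} - S_{mem}$$
To distinguish elements from these two sets, we define disjoint namespaces for each:
$$N_{mem} = \bigcup_{S \in S_{mem}} \mathit{names}(S)$$
$$N_{io} = \bigcup_{S \in S_{io}} \mathit{names}(S)$$
$\Box$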
Thus, we have partitioned $S_{seq}$ into two disjoint components: $S_{mem}$, which is the set of all machine states that are memory-like [1], and $S_{io}$, which contains all remaining members of $S_{seq}$ (that is, machine states that are not memory-like). Note that because $S_{mem} \cup S_{io} = S_{seq}$, it follows that $N_{mem} \cup N_{io} = N_{seq}$.
Definition 5.2 (Memory-like state) For $S \in S_{seq}$, we define the predicate $\mathit{memory\_like}(S)$ to be true if and only if all of the following hold.
1. All members of $S$ can be both read and written;
2. Neither reads nor writes have any side-effects;
3. A write followed by a read yields the written value.
$\Box$
Our work in Sections 3 and 4 implicitly assumed these properties for all members of $S_{seq}$. With the above definitions, we have merely made it explicit that only a subset of $S_{seq}$ (namely, $S_{mem}$) behaves as required. With $S_{seq}$ thus partitioned, we are now ready to revisit our SEQ and MSSP models to identify precisely the domains in which execution in each is defined.
5.2 SEQ execution
Recall, SEQ serves as the reference against which we measure correctness of MSSP. Accordingly, if our results are to be meaningful, we require that SEQ be representative of a realistic ISA. It must, therefore, support execution of instructions that might read or write state that is not memory-like. Since execution in SEQ modifies machine state one instruction at a time, and in the order specified by the program, none of our formalisms are predicated on memory-like behaviour being adhered to. That is, any non-memory-like behaviour resulting from reads from, or writes to, members of $S_{io}$ is, by definition, intended to occur in SEQ; side-effects and ordering constraints form part of the machine’s sequential ISA specification.
We therefore maintain our formalization of sequential execution, as per Definition 3.2. However, our subsequent changes to the MSSP model do require that we modify the definition of single-instruction execution (Definition 4.5). We need this change to allow us to reason about the subset of machine state that participates in the execution of an instruction.
Recall, $\mathit{seq\_step}(S) = S \leftarrow \delta(S)$. We now interpret $\delta$ according to the fetch-decode-execute model described earlier.
Definition 5.3 (Instruction execution, refined) We define $\delta : S_{seq} \mapsto S_{seq}$ as follows.
$$\delta(S) = \mathit{execute}(\mathit{fetch}(S))$$
The functions $\mathit{fetch} : S_{seq} \mapsto S_{seq}$ and $\mathit{execute} : S_{seq} \mapsto S_{seq}$ remain uninterpreted. $\Box$
Informally, the $\mathit{fetch}$ function "reads" from machine state all those name-value pairs that will participate in execution of the next instruction. The resulting set is operated on by function $\mathit{execute}$ to produce the results of an instruction’s execution. (This is the decode-execute part of the execution model.) These are written back to machine state by the superimposition of $\delta(S)$ on $S$.
5.3 MSSP execution
MSSP execution can occur only with state that is memory-like. We have already alluded to the reasons for this constraint; to be clear, they are as follows.
• Slaves cannot speculatively operate on members of $S_{io}$, simply because their speculation must be checked by the verifier, which, in turn, must read architected state; it may not be possible to do so without affecting that state (reads might have side-effects).
• Again, because side-effects are possible, it is not in general safe to coalesce multiple writes into a single update if each of those writes would have had a (necessary) side-effect.
We address these issues by circumventing the problem: we will refine our MSSP model to operate only on members of $S_{mem}$.
We admit two "modes" of execution into the MSSP model. State is modified using the SEQ model when it is detected that the next instruction will "touch" $S_{io}$; otherwise, MSSP execution occurs as defined previously. Before presenting this refinement of the MSSP model, we formalize the notion of state that is touched by the next instruction.
Definition 5.4 (Instruction footprint) The set of name-value pairs that participate in the execution of the next instruction is called the footprint of that instruction. Formally, we define $\mathit{footprint} : S_{seq} \mapsto N_{seq}$, as follows.
$$\mathit{footprint}(S) = \mathit{names}(\mathit{fetch}(S)) \cup \mathit{names}(\mathit{execute}(\mathit{fetch}(S)))$$
$\Box$
MSSP-mode operation can safely occur when the next instruction’s footprint is wholly within the memory-like namespace; that is, when $\mathit{footprint}(S) \subseteq N_{mem}$, or, equivalently, when $\mathit{footprint}(S) \cap N_{io} = \emptyset$. It remains to be specified how this requirement is enforced.
Since slaves execute entirely within their local read/write store, which is isolated from architected state, whether or not they touch $S_{io}$ is not important; as noted earlier, it is during the verify/commit process that problems with $S_{io}$ arise. We therefore constrain MSSP execution to remain within $S_{mem}$ by checking the members of the live-in and live-out sets before permitting verification and commit to proceed. If this check finds members of $S_{io}$, we simply discard the offending task. The proposed check is formalized as follows.
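Definition 5.5 (Clean tasks) We say $T \in \mathcal{T}$ is clean, and write $\mathit{clean}(T)$, if its live-in and live-out sets are both memory-like. Formally:
$$\mathit{clean}(T) \Leftrightarrow (\mathit{live\_in}(T) \in S_{mem} \wedge \mathit{live\_out}(T) \in S_{mem})$$
Equivalently:
$$\mathit{clean}(T) \Leftrightarrow (\mathit{names}(\mathit{live\_in}(T)) \subseteq N_{mem} \wedge \mathit{names}(\mathit{live\_out}(T)) \subseteq N_{mem})$$
$\Box$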
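The name-based form of the clean check lends itself to a short Python sketch. The split of the namespace used here is purely hypothetical: names beginning with `io:` stand in for the non-memory-like namespace and every other name is treated as memory-like. The check inspects names only, so it never reads the value of a cell that might have side-effects.

```python
def is_io_name(name):
    """Hypothetical membership test for the non-memory-like namespace."""
    return isinstance(name, str) and name.startswith("io:")


def names(state):
    """Names of all pairs in a state encoded as a dict."""
    return set(state)


def clean(live_in, live_out):
    """Definition 5.5, second form: both name sets lie in the memory-like namespace."""
    touched = names(live_in) | names(live_out)
    return not any(is_io_name(name) for name in touched)


LIVE_IN = {"pc": 0, "r1": 10}
LIVE_OUT_MEM = {"pc": 2, "r1": 15, "r2": 16}
LIVE_OUT_IO = {"pc": 2, "io:uart_tx": 65}     # task wrote a device register

print(clean(LIVE_IN, LIVE_OUT_MEM))   # True
print(clean(LIVE_IN, LIVE_OUT_IO))    # False
```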
This leads us to a new specification for MSSP operation. Our machine continues to advance architected state, as before, until it reaches a configuration in which the next instruction touches $S_{io}$. At that point, the machine advances its architected state according to the SEQ model, one instruction at a time. Further, while in MSSP-mode, we filter tasks that have touched state outside of $S_{mem}$ before passing them on to the verify/commit unit. The following definition captures this idea.
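Definition 5.6 (MSSP execution, refined) The function $\mathit{mssp} : S_{seq} \mapsto S_{seq}$ is defined as follows.
$$\mathit{mssp}(A, [T] \mathbin{\|} \tau) = \begin{cases} \mathit{seq\_step}(A) & \text{if } \mathit{footprint}(A) \cap N_{io} \neq \emptyset \\ \mathit{mssp}(A, \tau) & \text{if } \mathit{clean}(T) \text{ is false} \\ \mathit{mssp}(\mathit{commit}(A, T), \tau) & \text{if } \mathit{done}(T) \\ \mathit{mssp}(A, \mathit{mssp\_step}([T] \mathbin{\|} \tau)) & \text{otherwise} \end{cases}$$
Functions $\mathit{commit} : S_{seq} \times \mathcal{T} \mapsto S_{seq}$ and $\mathit{mssp\_step} : \mathcal{T}^* \mapsto \mathcal{T}^*$ are defined as before. $\Box$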
Implicit in the above definition is the assumption that the MSSP machine is able to check if $\mathit{footprint}(S) \cap N_{io}$ is empty and that $\mathit{clean}(T)$ holds without actually reading state that is not memory-like. We take it to be a reasonable assumption that a machine is able to do so.
5.4 Equivalence
Equivalence of the modified MSSP and SEQ models still holds because Theorem 3.2 can be applied to any $\tau \in \mathcal{T}^*$ that contains only clean tasks. Execution in all other cases either does not affect architected state, or advances it according to the sequential model, in which case equivalence follows trivially. $\blacksquare$
6 Conclusion
This report presents the formal verification of MSSP, a recent proposal for speculative parallelization of sequential programs. Our primary objective has been to formally establish that MSSP is capable of achieving the equivalent of a sequential execution. We demonstrated this in Section 3, where we showed that any state reachable by an MSSP machine is also attainable by a sequential machine implementing the same ISA as the MSSP slaves. That result was predicated on the notion of task safety, a property we initially assumed could be checked directly by the MSSP machine. We then proved in Section 4 that task safety follows from a lower-level set of checks: completeness and containment of live-ins within architected state, which we argued can be performed by the slave processors and the verify/commit unit, respectively.
In establishing these results, we also introduced a model for a hypothetical MSSP machine (Section 3.2). We believe this model will serve a useful purpose in subsequent developments in the evolving MSSP paradigm. Specifically, a number of performance-enhancing modifications to the existing MSSP definition are imminent; our hypothetical machine, being sufficiently agnostic to performance concerns, will not likely incur similar changes. As a result, it can serve as a reference against which design changes can be checked in terms of correctness.
In addition to analyzing the correctness of MSSP, we identified conditions in which it cannot work. The latter issue was addressed in Section 5, where we enumerated the requirements we impose on machine state to ensure that it is amenable to MSSP-style execution. In particular, we identified memory-like behaviour as the key property upon which we depend. While this does imply MSSP is not universally applicable, it also admits the possibility for MSSP operation crossing the user-kernel boundary. Simply put, execution of operating system code is no different from user code in our formal models: in both cases, instructions transform machine state as per the machine’s ISA. That MSSP can execute kernel code, so long as that code touches only memory-like state, is a boon to the paradigm, since it potentially allows slaves to take interrupts, handle exceptions and make system calls. This is an area we plan to investigate thoroughly in the near future.
Additional challenges are posed by certain machine components that we would prefer not to incorporate into our formal models. The TLB is one such example. By including the TLB in a machine’s state, we would be architecting its content. Doing so would force MSSP tasks to perform the same series of TLB replacements as a sequential execution, which, in turn, would require that the master also predict TLB replacements. Clearly, such an architecture would suffer unnecessary performance degradation through spurious misspeculations. The problem arises because the exact mix of entries in the TLB is not important from a correctness point of view; what counts
is that the mappings that are used are consistent with the page table held in architected state. One of our current research objectives, therefore, is to develop an abstract definition of the TLB that captures, instead of its exact state, the semantics of its role in supporting virtual memory.
We have also drawn attention in this report to the fact that MSSP is an example of an architecture that decouples performance and correctness by dedicating distinct hardware to each concern. In this context, our results make the important contribution that such a decoupled machine is indeed feasible, at least in terms of isolating correctness from performance concerns. Specifically, by modelling the master processor as a random generator of live-in data, we successfully show that MSSP’s correctness is independent of fast-path components.
References
[1] R.L. Sites, editor. Alpha architecture reference manual. Digital Press, 3rd edition, 1998.
[2] G. Sohi, S. Breach, and T. Vijaykumar. Multiscalar processors. In Proc. 22nd Annual International Symposium
on Computer Architecture, June 1995.
[3] C. Zilles. Master/slave speculative parallelization and approximate code. PhD thesis, University of Wisconsin-Madison, 2002.
[4] C. Zilles and G. Sohi. Master/slave speculative parallelization. In Proc. 35th Annual ACM/IEEE International
Symposium on Microarchitecture, November 2002.
