A Virtual Machine for Interpreting Programs in Static Single Assignment Form by Jeffery von Ronne et al.
ICS
TECHNICAL REPORT
A Virtual Machine for Interpreting Programs
in Static Single Assignment Form
Jeffery von Ronne, Ning Wang, Alexander Apel, Michael Franz
Technical Report 03-19
School of Information and Computer Science
University of California Irvine, CA 92697-3425
October 23, 2003
Revised June 4, 2004
Abstract
Optimizing compilers, including those in virtual machines, commonly utilize
Static Single Assignment Form as their intermediate representation, but
interpreters typically implement stack-oriented virtual machines. This paper
introduces an easily interpreted variant of Static Single Assignment Form.
Each instruction of this Interpretable Static Single Assignment Form,
including the Phi Instruction, has self-contained operational semantics
facilitating efficient interpretation. Even the array manipulation
instructions possess directly-executable single-assignment semantics. In
addition, this paper describes the construction of a prototype virtual
machine realizing Interpretable Static Single Assignment Form and reports on
its performance.
This work is based on Interpreting Programs in Static Single Assignment Form, in the Proceedings of
the ACM SIGPLAN 2004 Workshop on Interpreters, Virtual Machines and Emulators, © ACM, 2004.
Information and Computer Science
University of California, Irvine
Contents
1 Introduction
2 Interpretable SSA
2.1 Unique Naming
2.2 Choosing φ-function Operands
2.3 Simultaneous Execution of φ-functions
2.4 Arrays
2.4.1 The Cytron et al. Array Model
2.4.2 ISSA’s Implementation of Arrays
2.4.3 Optimizations
3 Prototype Performance
4 Future Work
4.1 A Faster ISSA Virtual Machine
4.2 A SafeTSA Interpreter
5 Related Work
6 Conclusion
7 Acknowledgements
References
A Implementation of the Interpreter Core
A.1 ssa_vm.c
A.2 ssa_vm.h
A.3 ssa_array.h
A.4 inst.h
B Benchmarks
B.1 Factorials
B.1.1 factorial.ssa
B.1.2 factorial.c
B.1.3 Factorial.java
B.1.4 factorial.pl
B.2 Fibonacci Sequence (in scalars)
B.2.1 fibonacci.ssa
B.2.2 fibonacci.c
B.2.3 Fibonacci.java
B.2.4 fibonacci.pl
B.3 Fibonacci Sequence (in an array)
B.3.1 fibonacci_array.ssa
B.3.2 fib_array.c
B.3.3 FibArray.java
B.3.4 fib_array.pl
List of Figures
1 simple program
2 ISSA VM
3 program with if-then-else structure
4 executing if-then-else constructs
5 program finding the Fibonacci sequence
6 computing the Fibonacci sequence, first two iterations
7 computing the Fibonacci sequence, third iteration
8 array model tree
9 program with arrays
10 execution of array mutation
11 execution of array accesses
12 performance slowdown
List of Tables
1 execution times
2 execution slowdowns relative to C
1 Introduction
Intermediate representations based on Static Single Assignment (SSA) Form
[Alpern et al., 1988, Rosen et al., 1988] have been used in many research
and industrial optimizing compilers. Leveraging this body of work, we have
previously developed SafeTSA [Amme et al., 2001], a veriﬁable external pro-
gram representation, which decreases the eﬀort needed for just-in-time com-
pilation without sacriﬁcing the safety or quality of the resulting machine
code [Amme et al., 2003]. Just-in-time compilation, however, still imposes a
startup delay, which may not be justiﬁed for infrequently executed methods.
Java Virtual Machine implementations, such as Sun’s HotSpot Performance
Engine [Agesen and Detlefs, 2000], typically use mixed-mode interpretation
and compilation to combine interpretation’s shorter startup times with com-
piled code’s better throughput. Perhaps because of several non-imperative
features of SSA, conventional high-performance interpreters have not been
written for SSA representations. Consequently, Krintz has proposed storing
and transporting programs in both Java class ﬁles (which use a stack-oriented
virtual machine) for interpretation and also in SafeTSA classes for compi-
lation [2002], allowing both compilation and interpretation at the cost of
supporting two program representations.
If an SSA interpreter were available, however, it would be possible to build
a virtual machine supporting both compilation and interpretation using only
SSA representations, providing the same beneﬁts as the hybrid virtual ma-
chine Krintz proposed [2002] without the overhead of supporting two input
program representations. Incidentally, the same interpreter technology could
also be used as a debugging and testing tool for executing the SSA interme-
diate representations of optimizing compilers.
While an interpreter for SSA is desirable, several features of SSA have
non-obvious imperative semantics: the provision of a separate variable name
for each deﬁnition, the selection of φ-function operands, the simultaneous
execution of mutually dependent φ-functions, and the handling of non-scalar
variables (such as arrays) with single-assignment semantics.
The next section presents Interpretable SSA (ISSA), an SSA variant
in which each instruction has directly-interpretable operational semantics,
demonstrating how ISSA handles each of these problematic features. In ad-
dition, this paper reports on the performance of a prototype ISSA virtual
machine, and concludes after discussing future improvements and related
work.
int x = 3;
int y = 2;
x = x + y;
x = x * y;
print(x);
exit();
(a) source code
iconst 3;
iconst 2;
iadd;
iconst 2;
imul;
print;
exit;
(b) stack code
x ← const 3
y ← const 2
x ← iadd x y
x ← imul x y
print x
exit
(c) SSA code
0 const 3
1 const 2
2 iadd RR0 RR1
3 imul RR2 RR1
4 print RR3
5 exit
(d) ISSA Form
Figure 1: simple program
PC=1
0 const 3 3
1 const 2
2 iadd RR0 RR1
3 imul RR2 RR1
4 print RR3 X
5 exit X
(a) 1st instruction
PC=2
0 const 3 3
1 const 2 2
2 iadd RR0 RR1
3 imul RR2 RR1
4 print RR3 X
5 exit X
(b) 2nd instr.
PC=3
0 const 3 3
1 const 2 2
2 iadd RR0 RR1 5
3 imul RR2 RR1
4 print RR3 X
5 exit X
(c) 3rd instr.
PC=4
0 const 3 3
1 const 2 2
2 iadd RR0 RR1 5
3 imul RR2 RR1 10
4 print RR3 X
5 exit X
(d) 4th instr.
Figure 2: ISSA VM
2 Interpretable SSA
2.1 Unique Naming
The principal property of Static Single Assignment Form is that the left
hand side of each and every variable assignment must have a unique name.
As a result, each original program variable has several corresponding SSA
variables (often distinguished with subscripts).
Since each SSA variable is deﬁned by exactly one program instruction (the
right hand side of the assignment), our Interpretable SSA (ISSA) instantiates
an abstract machine for each program containing one result register per in-
struction. Each instruction in ISSA is labeled with an instruction number. A
few instruction types, such as the const instructions, take immediate integer
values, but most have indirect operands which specify a value by referring to
int x = 5;
int y = 7;
int z;
if (x < 0)
z = x + y;
else
z = y - x;
return z;
(a) source code
x ← const 5
y ← const 7
if x < 0
goto L0
z ← isub x y
goto L1
L0:
z ← iadd x y
L1:
z ← φ(z,z)
return z
(b) ssa code
0 const 5
1 const 7
2 const 0
3 blt RR0 RR2 [6] 0
4 isub RR1 RR0
5 goto [7] 1
6 iadd RR0 RR1
7 phi RR6 RR4
8 pfe
9 return RR7
(c) ISSA code
Figure 3: program with if-then-else structure
the result register of the value’s deﬁning instruction.
Figure 2 shows the execution of a simple program’s abstract machine.
The instructions’ result registers appear as the boxes to the left of the in-
structions; an auxiliary program counter (PC) register is used to indicate the
instruction to be executed next. As each instruction executes, it retrieves its
inputs from the indicated registers, performs its computation, and writes to
the appropriate result register (RR) on its left. For example, as instruction
3 executes, it reads the values of RR2 (i.e., 5) and RR1 (i.e., 2), multiplies
them together, and writes the result (i.e., 10) to RR3.
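The execution model just described can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `Inst` layout, the opcode names, and the functions `issa_run` and `figure1_result` are hypothetical and are not taken from the prototype; the print instruction of Figure 1(d) is omitted so that the sketch needs no I/O.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode set and instruction layout, for illustration only. */
typedef enum { OP_CONST, OP_IADD, OP_IMUL, OP_EXIT } Opcode;

typedef struct {
    Opcode  op;
    int32_t a;  /* immediate for OP_CONST, otherwise number of the defining instruction */
    int32_t b;  /* number of the second operand's defining instruction */
} Inst;

/* Executes prog until OP_EXIT or the end of the program.
 * rr[k] is the result register of instruction k, so rr needs n entries. */
void issa_run(const Inst *prog, size_t n, int32_t *rr)
{
    size_t pc = 0;
    while (pc < n) {
        const Inst *in = &prog[pc];
        switch (in->op) {
        case OP_CONST: rr[pc] = in->a;                 break;
        case OP_IADD:  rr[pc] = rr[in->a] + rr[in->b]; break;
        case OP_IMUL:  rr[pc] = rr[in->a] * rr[in->b]; break;
        case OP_EXIT:  return;
        }
        pc++;
    }
}

/* Runs the program of Figure 1(d), minus the print, and returns RR3. */
int32_t figure1_result(void)
{
    static const Inst prog[] = {
        { OP_CONST, 3, 0 },  /* 0 const 3      */
        { OP_CONST, 2, 0 },  /* 1 const 2      */
        { OP_IADD,  0, 1 },  /* 2 iadd RR0 RR1 */
        { OP_IMUL,  2, 1 },  /* 3 imul RR2 RR1 */
        { OP_EXIT,  0, 0 },  /* 4 exit         */
    };
    int32_t rr[5] = { 0 };
    issa_run(prog, 5, rr);
    return rr[3];
}
```

Because every instruction writes only its own register, no register allocation or operand stack is needed; an operand is simply the number of the instruction that produced it.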
2.2 Choosing φ-function Operands
SSA’s φ-functions pose the greatest obstacles to direct imperative interpre-
tation. In standard SSA Form, each φ-function resides in a basic block (at
which more than one control ﬂow edge converges) and selects an output from
among its input operands based on which control ﬂow edge the dynamic ex-
ecution entered the basic block through.
Figure 3 shows a simple program with converging control flow translated
into Interpretable SSA. The φ-functions (which would exist in standard SSA
Form) have been replaced by phi instructions. It is not clear how an
interpreter should decide whether the phi instruction is to copy from RR6 or
RR4, especially since the basic-block control flow graph (CFG) (which is
shown as the dashed boxes and arrows in Figure 3(c)) is not explicitly
represented in ISSA.
(a) before branch: PC=5 CEN=0
(b) after branch:  PC=7 CEN=1
(c) after phi:     PC=9 CEN=0
Result register contents in each state (X as in the original figure; a dash
marks a register for which no value is shown):
 #  instruction          (a)  (b)  (c)
 0  const 5               5    5    5
 1  const 7               7    7    7
 2  const 0               0    0    0
 3  blt RR0 RR2 [6] 0     X    X    X
 4  isub RR1 RR0          2    2    2
 5  goto [7] 1            X    X    X
 6  iadd RR0 RR1          -    -    -
 7  phi RR6 RR4           -    -    2
 8  pfe                   X    X    X
 9  return RR7            X    X    X
Figure 4: executing if-then-else constructs
For this reason, ISSA provides an auxiliary CFG-Edge Number (CEN)
register, which is set on branching instructions and is used by each phi in-
struction to select among its operands. Figure 4 shows some snapshots of this
program’s execution. Consider the execution of instruction 5 (transforming
the state of Figure 4(a) into that of Figure 4(b)); this corresponds to the
traversal of the CFG edge labeled “1” in Figure 3(c). ISSA’s goto instruction
takes two immediate operands: the target instruction number (in this case, 7)
and the edge number (in this case, 1). When instruction 5 is executed, the CEN
register is set to 1 and control is transferred to instruction 7 (the phi
instruction). Because the CEN register is 1, the second operand of the phi
instruction is selected, and 2 is read from RR4 and placed in RR7. After this,
the CEN register is reset to 0; the resulting state can be seen in Figure 4(c).
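The role of the CEN register can be illustrated with a small C sketch of the Figure 3(c) program. The struct layout and the function name `run_if_then_else` are assumptions made for this example, and the two constants are passed as parameters so that both paths can be exercised. Since the block contains a single phi instruction, the sketch writes its result immediately; the buffering needed for several phi instructions is the subject of Section 2.3.

```c
#include <stdint.h>

typedef enum { OP_CONST, OP_IADD, OP_ISUB, OP_BLT, OP_GOTO,
               OP_PHI, OP_PFE, OP_RETURN } Opcode;

typedef struct {
    Opcode  op;
    int32_t a, b;    /* immediate (const) or result-register numbers */
    int32_t target;  /* branch target instruction number */
    int32_t edge;    /* CFG edge number written to CEN when the branch is taken */
} Inst;

/* Interprets the program of Figure 3(c) with x and y as the first two constants. */
int32_t run_if_then_else(int32_t x, int32_t y)
{
    const Inst prog[] = {
        { OP_CONST,  x, 0, 0, 0 },  /* 0 const x             */
        { OP_CONST,  y, 0, 0, 0 },  /* 1 const y             */
        { OP_CONST,  0, 0, 0, 0 },  /* 2 const 0             */
        { OP_BLT,    0, 2, 6, 0 },  /* 3 blt RR0 RR2 [6] 0   */
        { OP_ISUB,   1, 0, 0, 0 },  /* 4 isub RR1 RR0        */
        { OP_GOTO,   0, 0, 7, 1 },  /* 5 goto [7] 1          */
        { OP_IADD,   0, 1, 0, 0 },  /* 6 iadd RR0 RR1        */
        { OP_PHI,    6, 4, 0, 0 },  /* 7 phi RR6 RR4         */
        { OP_PFE,    0, 0, 0, 0 },  /* 8 pfe                 */
        { OP_RETURN, 7, 0, 0, 0 },  /* 9 return RR7          */
    };
    int32_t rr[10] = { 0 };
    int32_t pc = 0, cen = 0;

    for (;;) {
        const Inst *in = &prog[pc];
        int32_t next = pc + 1;
        switch (in->op) {
        case OP_CONST: rr[pc] = in->a;                 break;
        case OP_IADD:  rr[pc] = rr[in->a] + rr[in->b]; break;
        case OP_ISUB:  rr[pc] = rr[in->a] - rr[in->b]; break;
        case OP_BLT:   /* a taken branch records which CFG edge was traversed */
            if (rr[in->a] < rr[in->b]) { cen = in->edge; next = in->target; }
            break;
        case OP_GOTO:  cen = in->edge; next = in->target; break;
        case OP_PHI:   /* CEN selects the operand; only that operand is read */
            rr[pc] = (cen == 0) ? rr[in->a] : rr[in->b];
            break;
        case OP_PFE:   cen = 0; break;
        case OP_RETURN: return rr[in->a];
        }
        pc = next;
    }
}
```

With the constants of Figure 3 (x = 5, y = 7) the branch is not taken, the goto sets CEN to 1, and the phi instruction copies 2 from RR4, matching Figure 4(c).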
int n = 10, i = 2;
int f_{i-2} = 0, f_{i-1} = 1;
int f_i;
do {
    f_i = f_{i-1} + f_{i-2};
    i = i + 1;
    f_{i-2} = f_{i-1};
    f_{i-1} = f_i;
} while (i ≤ n);
return f_i;
(a) source code
L0:
f_{i-1} ← φ(1, f_i)
f_{i-2} ← φ(0, f_{i-1})
i0 ← φ(2, i1)
f_i ← iadd f_{i-1} f_{i-2}
i1 ← iadd i0 1
if i1 ≤ 10
goto L0
return f_i
(b) ssa code
0 const 0
1 const 1
2 const 10
3 const 2
4 phi RR1 RR8
5 phi RR0 RR4
6 phi RR3 RR9
7 pfe
8 iadd RR4 RR5
9 iadd RR6 RR1
10 ble RR9 RR2 [4] 1
11 return RR8
(c) ISSA code
Figure 5: program finding the Fibonacci sequence
(a) end of 1st iteration: PhiSet={} PC=10 CEN=0
(b) end of 2nd iteration: PhiSet={} PC=10 CEN=0
Result register contents in each state:
 #  instruction           (a)  (b)
 0  const 0                0    0
 1  const 1                1    1
 2  const 10              10   10
 3  const 2                2    2
 4  phi RR1 RR8            1    1
 5  phi RR0 RR4            0    1
 6  phi RR3 RR9            2    3
 7  pfe                    X    X
 8  iadd RR4 RR5           1    2
 9  iadd RR6 RR1           3    4
10  ble RR9 RR2 [4] 1      X    X
11  return RR8             X    X
Figure 6: computing the Fibonacci sequence, first two iterations
(a) after first phi:  PhiSet={(RR4,2)} PC=5 CEN=1
(b) after second phi: PhiSet={(RR4,2), (RR5,1)} PC=6 CEN=1
(c) after third phi:  PhiSet={(RR4,2), (RR5,1), (RR6,4)} PC=7 CEN=1
(d) after pfe:        PhiSet={} PC=8 CEN=0
Result register contents in each state:
 #  instruction           (a)  (b)  (c)  (d)
 0  const 0                0    0    0    0
 1  const 1                1    1    1    1
 2  const 10              10   10   10   10
 3  const 2                2    2    2    2
 4  phi RR1 RR8            1    1    1    2
 5  phi RR0 RR4            1    1    1    1
 6  phi RR3 RR9            3    3    3    4
 7  pfe                    X    X    X    X
 8  iadd RR4 RR5           2    2    2    2
 9  iadd RR6 RR1           4    4    4    4
10  ble RR9 RR2 [4] 1      X    X    X    X
11  return RR8             X    X    X    X
Figure 7: computing the Fibonacci sequence, third iteration
2.3 Simultaneous Execution of φ-functions
The observant reader will have noticed that the previous section glossed over
the pfe (Phi-Function End) instruction, which marks the end of the phi
instructions within a basic block. The pfe instruction is needed¹ because
standard SSA Form φ-function semantics require that φ-functions be “ex-
ecuted” at the beginning of the basic block in which they reside [Cytron
et al., 1991]. An often overlooked consequence of this rule manifests itself
when a φ-function (in a loop) references the result value of another φ-function
within the same basic block. In this case, they must be implemented so that
they behave as if they were all executed simultaneously using the previous
iteration’s result values [Morgan, 1998].
A concrete example of this problem occurs during the execution of the
program shown in Figure 5, which calculates the ﬁrst 10 numbers of the
Fibonacci sequence. This program has a φ-function (instruction 5) that ref-
erences the result (RR4) of another φ-function from the previous iteration.
(This happens because the previous iteration’s f_{i-1} becomes the new
iteration’s f_{i-2}; in more complicated programs, there could be multiple
mutually dependent φ-functions.) If these φ-functions were to be executed
sequentially, simply copying the results from the correct input operand into
the result register, instruction 5 would erroneously copy the value placed in
RR4 during the current iteration instead of the previous iteration. For
example, at the end
of the second iteration (Figure 6(b)), RR4 and RR5 will both be 1; for the
third iteration (Figure 7(d)), the new value of RR4 is 2, but the new value
of RR5 should still be 1.
For this reason, an ISSA virtual machine will buﬀer phi instruction trans-
fers until it executes the pfe instruction, which commits the transfers stored
in the PhiSet buﬀer and resets the CEN register. This solves the problem be-
cause all of the reads associated with the SSA φ-functions occur at the ISSA
phi instructions before performing any of the writes (at the pfe instruction).
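The effect of buffering can be checked with a short C sketch of the Figure 5(c) program. To keep it small, the program is hand-unrolled rather than decoded from an instruction stream, and the names `PhiTransfer` and `run_fibonacci` are assumptions of this example. Passing `buffered = 0` executes the phi instructions sequentially, which shows the erroneous behavior described above.

```c
#include <stdint.h>

enum { N_INST = 12, MAX_PHIS = 3 };

/* One pending phi transfer: the value to write and its destination register. */
typedef struct { int32_t dest; int32_t value; } PhiTransfer;

/* Runs the Figure 5(c) program. With buffered != 0, phi transfers are held in
 * a PhiSet and committed at the pfe instruction; with buffered == 0, each phi
 * writes its result register immediately. */
int32_t run_fibonacci(int buffered)
{
    /* Operand registers of instructions 4-6, indexed by CEN (edge 0, edge 1). */
    static const int32_t phi_ops[3][2] = { { 1, 8 }, { 0, 4 }, { 3, 9 } };
    int32_t rr[N_INST] = { 0 };
    PhiTransfer phiset[MAX_PHIS];
    int n_phis = 0;
    int32_t cen = 0;

    rr[0] = 0; rr[1] = 1; rr[2] = 10; rr[3] = 2;  /* 0-3: const */

    for (;;) {
        for (int k = 0; k < 3; k++) {              /* 4-6: phi */
            int32_t dest  = 4 + k;
            int32_t value = rr[phi_ops[k][cen]];   /* read happens at the phi */
            if (buffered) {
                phiset[n_phis].dest  = dest;
                phiset[n_phis].value = value;
                n_phis++;
            } else {
                rr[dest] = value;                  /* sequential variant */
            }
        }
        for (int k = 0; k < n_phis; k++)           /* 7: pfe commits all writes */
            rr[phiset[k].dest] = phiset[k].value;
        n_phis = 0;
        cen = 0;

        rr[8] = rr[4] + rr[5];                     /* 8: iadd RR4 RR5 */
        rr[9] = rr[6] + rr[1];                     /* 9: iadd RR6 RR1 */
        if (rr[9] <= rr[2]) { cen = 1; continue; } /* 10: ble RR9 RR2 [4] 1 */
        return rr[8];                              /* 11: return RR8 */
    }
}
```

In this sketch `run_fibonacci(1)` yields 55, while `run_fibonacci(0)` yields 256: from the third iteration on, instruction 5 picks up the value that instruction 4 has just written, so each step doubles the previous result instead of adding the two preceding elements.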
A = [0,0,0]
Update(A,1,1)
A = [0,1,0]
Update(A,0,2)
A = [2,1,0]
Update(A,2,3)
A = [0,1,3]
Update(A,1,4)
A = [2,4,0]
Figure 8: array model tree
2.4 Arrays
2.4.1 The Cytron et al. Array Model
Support for non-scalars has long been problematic in SSA, and many exten-
sions have been proposed for supporting arrays and other non-scalars (e.g.,
Array SSA Form [Knobe and Sarkar, 1998]). The simplest array semantics
consistent with the single-assignment property are found in the seminal de-
scription of SSA by Cytron et al. [1991]. This model treats each array as a
single scalar variable with multiple instances and describes two primitives for
accessing and manipulating these arrays: Access(Ax, i) and Update(Ay, j, V).
The Access primitive merely fetches the value at index i in array instance Ax.
The Update primitive creates a new array instance Az, which is equivalent
to Ay, except that element j has been changed to the value V. This model
can be viewed as creating a tree (Figure 8) where each array instance is a
node, and each Update creates a new child instance derived from a parent
instance. All instances remain accessible to future Updates and Accesses.
¹ Some method of marking basic blocks or super-instructions containing
multiple φ-functions could be used instead.
a = new int [3];
a[0] = 13;
a[1] = 14;
x = a[1];
a[1] = 15;
y = a[1];
(a) source code
a0 ← newarray 3
a1 ← update (a0, 0, 13)
a2 ← update (a1, 1, 14)
x ← access (a2, 1)
a3 ← update (a2, 1, 15)
y ← access (a3, 1)
(b) SSA form
a0 ← newarray 3
a1 ← update (a0, 0, 13)
a2 ← update (a1, 1, 14)
a3 ← update (a2, 1, 15)
x ← access (a2, 1)
y ← access (a3, 1)
(c) after code motion
Figure 9: program with arrays
Maintaining multiple instances of each array may seem expensive, but
they are needed to maintain proper SSA semantics and avoid output
dependencies. The output dependencies would not be a problem if the SSA code
was produced by a straightforward translation from a source program. If,
however, code motion was performed as part of the program’s optimization
in SSA Form (e.g., when debugging compiler output after partial redundancy
elimination), an SSA interpreter supporting non-destructive array semantics
must be prepared to deal with the possibility of an array access being moved
below an update.²
An example program requiring non-destructive Update semantics is shown
in Figure 9. Figure 9(b) shows the direct SSA translation of the source pro-
gram in Figure 9(a). Figure 9(c) shows an SSA program, which is seman-
tically equivalent to that shown in Figure 9(b) but which has been altered
by legal code motion and, as a result, accesses an old version of an array
(i.e.,a2) even after it has been updated (becoming a3). Thus, a2 and a3 have
overlapping live ranges and, for this reason, cannot share the same storage
space.
2.4.2 ISSA’s Implementation of Arrays
ISSA supports array manipulation with newarray, access, and update in-
structions modeled after the Update and Access functions of the Cytron et
al. model [1991], which treats each entire array as a single SSA variable.
Each newarray instruction takes as its operand the number of elements, cre-
ates a new array of that size, and places a reference to the new array in the
newarray instruction’s result register. Every access takes, as operands, a
² Alternatively, the optimizer could be made aware of output dependencies for
non-scalars, or output dependencies could be fixed up by another code motion
phase prior to interpretation, but these solutions go beyond single-assignment
semantics.
(a) after newarray:      PC=7
(b) after first update:  PC=8
(c) after second update: PC=9
(d) after third update:  PC=10
For instructions 0-5 the table gives the result register value. For
instructions 6-9 it gives the contents of the array that the instruction adds
to the array vector (AV) and the states in which that array is present.
 #  instruction            value or array   present in
 0  const 13               13               (a)-(d)
 1  const 14               14               (a)-(d)
 2  const 15               15               (a)-(d)
 3  const 0                0                (a)-(d)
 4  const 1                1                (a)-(d)
 5  const 3                3                (a)-(d)
 6  newarray RR5           [?, ?, ?]        (a)-(d)
 7  update RR6 RR3 RR0     [13, ?, ?]       (b)-(d)
 8  update RR7 RR4 RR1     [13, 14, ?]      (c)-(d)
 9  update RR8 RR4 RR2     [13, 15, ?]      (d)
10  access RR8 RR4
11  access RR9 RR4
. . .
Figure 10: execution of array mutation
(a) after first access: PC=11
(b) final state:        PC=12
For instructions 6-9 the table gives the contents of the array that the
instruction added to the array vector (AV); a dash marks a register for which
no value is shown.
 #  instruction            (a)            (b)
 0  const 13               13             13
 1  const 14               14             14
 2  const 15               15             15
 3  const 0                0              0
 4  const 1                1              1
 5  const 3                3              3
 6  newarray RR5           [?, ?, ?]      [?, ?, ?]
 7  update RR6 RR3 RR0     [13, ?, ?]     [13, ?, ?]
 8  update RR7 RR4 RR1     [13, 14, ?]    [13, 14, ?]
 9  update RR8 RR4 RR2     [13, 15, ?]    [13, 15, ?]
10  access RR8 RR4         14             14
11  access RR9 RR4         -              15
. . .
Figure 11: execution of array accesses
reference to an array and the index into that array. It then fetches the
appropriate element from that array and places a copy of that element’s value
in the access’s result register. Each update instruction takes, as operands,
a reference to an array, an index into that array, and a new value. After
that, it copies the array and writes the new value to the element of the new
array identiﬁed by the index. Finally, the update places a reference to the
new array in its result register.
As a concrete example, we will now describe the dynamic execution of
the program shown in Figure 9; several steps in this program’s execution are
illustrated in Figures 10 and 11. Array references are implemented as indexes
into the array vector (AV), a dynamic data structure, which contains pointers
to all of the arrays instantiated during program execution. The execution
of each newarray or update adds a new array to the array vector (Figure
10(a), Figure 10(b), and Figure 10(c)). The access instructions select array
instances by referencing the result register of the array instance’s deﬁning
instruction; this result register contains an index to the array vector, which
in turn contains a pointer to the actual array instance. For example, the
access of instruction 10 uses the array produced by instruction 8, which was
unaﬀected by the update at instruction 9, so it retrieves the “old” value of
element 1 (i.e.,14). Instruction 11, however, uses the “current” version of the
array produced by the update at instruction 9 and retrieves the “current”
value of element 1 (i.e.,15) (Figure 11(b)).
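A minimal C sketch of the array vector follows. The type and function names (`ArrayVector`, `av_newarray`, `av_update`, `av_access`, `av_free`) are hypothetical and are not those of the prototype; zero-initializing new arrays and returning -1 on a bounds or allocation failure are likewise assumptions of this example. Array references are plain indexes into the vector, and every update copies its input array.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int32_t len; int32_t *elems; } Array;
typedef struct { Array *items; int32_t count, cap; } ArrayVector;

/* Appends a new array of len elements (copied from init if non-NULL, else
 * zeroed) and returns its index in the vector, or -1 on failure. */
static int32_t av_push(ArrayVector *av, int32_t len, const int32_t *init)
{
    if (len < 0) return -1;
    if (av->count == av->cap) {
        int32_t cap = av->cap ? av->cap * 2 : 8;
        Array *items = realloc(av->items, (size_t)cap * sizeof *items);
        if (!items) return -1;
        av->items = items;
        av->cap = cap;
    }
    int32_t *elems = calloc(len ? (size_t)len : 1, sizeof *elems);
    if (!elems) return -1;
    if (init) memcpy(elems, init, (size_t)len * sizeof *elems);
    av->items[av->count].len = len;
    av->items[av->count].elems = elems;
    return av->count++;
}

/* newarray: the returned index is what the result register would hold. */
int32_t av_newarray(ArrayVector *av, int32_t len)
{
    return av_push(av, len, NULL);
}

/* update: copies the referenced array, changes one element of the copy, and
 * returns the index of the copy; the input array is left untouched. */
int32_t av_update(ArrayVector *av, int32_t ref, int32_t idx, int32_t value)
{
    if (ref < 0 || ref >= av->count) return -1;
    if (idx < 0 || idx >= av->items[ref].len) return -1;
    int32_t fresh = av_push(av, av->items[ref].len, av->items[ref].elems);
    if (fresh >= 0) av->items[fresh].elems[idx] = value;
    return fresh;
}

/* access: bounds-checked read; returns 0 on success, -1 otherwise. */
int av_access(const ArrayVector *av, int32_t ref, int32_t idx, int32_t *out)
{
    if (ref < 0 || ref >= av->count) return -1;
    if (idx < 0 || idx >= av->items[ref].len) return -1;
    *out = av->items[ref].elems[idx];
    return 0;
}

void av_free(ArrayVector *av)
{
    for (int32_t i = 0; i < av->count; i++) free(av->items[i].elems);
    free(av->items);
    av->items = NULL;
    av->count = av->cap = 0;
}

/* The program of Figure 9(c): both accesses follow the last update. */
int figure9_after_code_motion(int32_t *x, int32_t *y)
{
    ArrayVector av = { NULL, 0, 0 };
    int32_t a0 = av_newarray(&av, 3);
    int32_t a1 = av_update(&av, a0, 0, 13);
    int32_t a2 = av_update(&av, a1, 1, 14);
    int32_t a3 = av_update(&av, a2, 1, 15);
    int ok = av_access(&av, a2, 1, x) == 0 && av_access(&av, a3, 1, y) == 0;
    av_free(&av);
    return ok ? 0 : -1;
}
```

Because the second and third array instances occupy separate storage, the access through the older reference still sees 14 even though it executes after the update that wrote 15.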
2.4.3 Optimizations
As noted above, each array update results in a copy. Most of the time, these
copies are unnecessary. A live range analysis could be used to identify cases
where the update’s input array is never used again. (For most programs, this
would be all of them.) In those cases, the update can safely avoid the copy and
instead destructively modify the array in place and output a reference to
that same array.
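The optimization might look as follows in C. This is a sketch under assumptions: the function `update_array` and its `input_dead` flag are hypothetical, and the flag stands in for the outcome of a live range analysis that the text only proposes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns the array holding the result of updating src[idx] to value.
 * input_dead is a hypothetical flag, assumed to come from a live range
 * analysis, stating that src is never used again after this update.
 * Returns NULL on a bounds or allocation failure. */
int32_t *update_array(int32_t *src, size_t len, size_t idx, int32_t value,
                      int input_dead)
{
    if (idx >= len) return NULL;
    int32_t *dst = src;              /* destructive case: reuse the storage */
    if (!input_dead) {               /* non-destructive case: copy first */
        dst = malloc(len * sizeof *dst);
        if (!dst) return NULL;
        memcpy(dst, src, len * sizeof *dst);
    }
    dst[idx] = value;
    return dst;
}
```

When the flag is set, the result aliases the input, so the caller must treat the old reference as consumed; when it is clear, the caller owns a fresh array and the input is unchanged.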
3 Prototype Performance
We have implemented a simple prototype ISSA virtual machine in about
1,000 lines of C code. During execution, it reads and parses an ASCII
representation of ISSA code and then executes it using a simple interpretive
Program                               Elapsed Time (in sec)
                                      ISSA         ISSA       C      Perl     Java      Java
                                      destructive  non-dest.                  with JIT  no JIT
Factorial (12! × 10^7)                -            42.57      0.35   115.57   1.04      13.01
Fibonacci (first 47 × 10^7)           -            166.12     1.02   719.67   3.20      76.64
Fibonacci (Array) (first 47 × 10^5)   2.95         8.98       0.04   11.64    0.37      1.29
Table 1: execution times
Program                               Slowdown (Relative to Optimized C Code)
                                      ISSA         ISSA       Perl    Java      Java
                                      destructive  non-dest.          with JIT  no JIT
Factorial (12! × 10^7)                -            122×       330×    3.0×      37×
Fibonacci (first 47 × 10^7)           -            163×       706×    3.1×      75×
Fibonacci (Array) (first 47 × 10^5)   74×          225×       291×    9.3×      32×
Table 2: execution slowdowns relative to C
[Bar chart. Vertical axis: slowdown relative to optimized C (smaller is
better), from 0 to 800. Groups: Factorial, Fibonacci (Scalar), Fibonacci
(Array). Bars: ISSA, dest. ISSA, Perl, Java. Bar labels: Factorial 122×, 330×,
37×; Fibonacci (Scalar) 163×, 706×, 75×; Fibonacci (Array) 225×, 74×, 291×,
32×.]
Figure 12: performance slowdown
engine consisting of a switch statement (with 30 case statements, one for
each instruction opcode) embedded in a loop. The virtual machine is un-
typed; all immediate and register values are 32-bit words but may be used as
integers, single-precision ﬂoats, or indexes into the array vector. Similarly,
result registers are referenced using 32-bit words encoding each definition’s
instruction number. The result register file and the PhiSet buffer are
implemented as arrays in main memory. In addition, the virtual machine
performs dynamic bounds checking to ensure that neither invalid instruction
numbers nor illegal array manipulation can violate its integrity.
Although dynamic bounds checking guarantees the virtual machine’s in-
tegrity, the virtual machine does not verify other properties whose violation
can only affect program correctness. In particular, CEN values on branches,
phi instructions, and pfe instructions must be used correctly in order to
implement standard SSA semantics; misuse may produce programs that are
not in SSA form. Similarly, the virtual machine does not distinguish float,
integer, and array reference types; instead, all data exists as 32-bit words,
and individual instructions use those words as if they were of the types
appropriate for those instructions.
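The untyped word model and the dynamic bounds check can be sketched in C as follows. The names `Word`, `exec_iadd`, `exec_fadd`, and `read_register` are illustrative assumptions, not the prototype's identifiers, and the sketch assumes a platform on which `float` is 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t Word;  /* every immediate and register value is one 32-bit word */

/* Compile-time check of the assumption that float is 32 bits wide. */
typedef char float_is_32_bits[sizeof(float) == sizeof(Word) ? 1 : -1];

/* Reinterpret the bits of a word as a float and back, without conversion. */
Word word_from_float(float f) { Word w; memcpy(&w, &f, sizeof w); return w; }
float float_from_word(Word w) { float f; memcpy(&f, &w, sizeof f); return f; }

/* The instruction, not the data, decides how the words are interpreted. */
Word exec_iadd(Word a, Word b) { return a + b; }  /* wraps modulo 2^32 */
Word exec_fadd(Word a, Word b)
{
    return word_from_float(float_from_word(a) + float_from_word(b));
}

/* Dynamic bounds check on a register number: returns 0 and stores the value
 * in *out if reg names one of the n_regs result registers, -1 otherwise. */
int read_register(const Word *rr, uint32_t n_regs, uint32_t reg, Word *out)
{
    if (reg >= n_regs) return -1;
    *out = rr[reg];
    return 0;
}
```

Nothing in this scheme prevents an integer instruction from being applied to a word produced by a float instruction; the result is merely meaningless, which is the kind of correctness violation the virtual machine does not attempt to detect.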
Even though our interpreter was written using switch dispatch and prioritizing
simplicity over performance, we measured its performance on a few simple
benchmarks.³ Because there is no compiler targeting ISSA, we manually
transliterated each of our benchmarks from C into ISSA, Perl, and Java, and
timed their execution in their respective environments.⁴ The resulting
execution times (in seconds) are shown in Table 1. There were three
benchmarks:
1. Factorial computes the first twelve factorials ten million times.
2. In each of ten million iterations, Fibonacci (Scalar) uses scalar
variables to find the forty-seventh element of the Fibonacci sequence.⁵
3. Fibonacci (Array) builds a dynamically allocated array containing
the first forty-seven elements of the Fibonacci sequence, repeating this
one hundred thousand times.
³ The complete source code for these benchmarks can be found in Appendix B.
⁴ The prototype interpreter and the C benchmarks were produced using gcc 2.96
with the -O3 switch, and the Java benchmarks were compiled to Java Bytecode
using jikes 1.15. The Perl benchmarks were executed using Perl 5.6.1 compiled
by RedHat, and the Java benchmarks were executed using the Blackdown Java 2
SDK 1.3.1 02b FCS.
⁵ 12! and the 47th element of the Fibonacci sequence are the largest numbers
of their respective sequences that do not overflow 32-bit words.
These benchmarks were executed on a dual-processor 1GHz Pentium III Xeon
with 256KB cache and 1GB of 133MHz SRAM, running RedHat Linux 7.2
with a Linux 2.4.18 kernel; all I/O was suppressed, and the preprocessing
times of the ISSA and Perl interpreters were excluded.⁶ The ISSA Fibonacci
array benchmark was run using both destructive and non-destructive array
manipulations. On this benchmark, single-assignment semantics for arrays
resulted in a 3× slowdown relative to destructive array manipulation, which
is, perhaps, less than would be expected considering that there were forty-
seven array updates (which one would expect to be the most expensive op-
eration) in every iteration of the outer loop.
Figure 12 shows performance slowdowns relative to optimized C code.
The ISSA interpreter’s performance (with full single-assignment semantics
for arrays) varied from 122× to 225× slower than the optimized C code.
This places its performance between that of Sun’s JVM and Perl on all three
benchmarks. Thus, while the prototype is slower than the best optimized
interpreters (which can have slowdowns of less than 10×), it is faster than at
least one widely used interpreter and performs reasonably well for a simple
non-threading interpreter.
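The slowdown figures are ratios of the elapsed times in Table 1 to the corresponding C time. As a small arithmetic check, the helper below (a hypothetical name, written only for this illustration) recomputes a ratio and rounds it to the nearest integer.

```c
/* Slowdown of a measured time relative to the optimized C time, rounded to
 * the nearest integer (valid for the positive times used here). */
long rounded_slowdown(double seconds, double c_seconds)
{
    return (long)(seconds / c_seconds + 0.5);
}
```

For example, the non-destructive ISSA Factorial time gives 42.57 / 0.35 ≈ 122, and comparing the two ISSA array variants gives 8.98 / 2.95 ≈ 3, the 3× cost of single-assignment array semantics mentioned above.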
4 Future Work
4.1 A Faster ISSA Virtual Machine
The prototype implementation, described above, was written prioritizing
code simplicity and the directness of ISSA model implementation over ex-
ecution speed. We plan to rewrite the interpreter, possibly using vmGen
[Ertl et al., 2002], prioritizing performance. We expect this rewrite, applying
some of the state-of-the-art interpreter optimizations and hand-tuning the
interpreter code, to result in as much as an order of magnitude speedup.
Much of the execution time spent by an optimized interpreter on a mod-
ern processor can be attributed to dispatching cost [Ertl and Gregg, 2003].
⁶ We were unable to obtain the current userspace time consumption from within
Java, so we examined the wall-clock time consumed by the computation itself,
the time reported by Java’s profiling feature, and the user time reported by
Linux for the process’s complete run, and reported the lowest of these three.
Our prototype virtual machine uses switch dispatch, which is platform in-
dependent and easy to implement but is also relatively expensive. Each
dispatch typically requires the execution of 3 control transfer instructions
[Gagnon, 2002], one of which is an indirect branch that is particularly dif-
ﬁcult for hardware to predict [Ertl and Gregg, 2003]. Threaded execution
dispatch techniques [Bell, 1973, Dewar, 1975] can reduce this overhead. In
addition, superinstructions [Proebsting, 1995, Piumarta and Riccardi, 1998]
can reduce the number of dispatches required, and instruction replication
can increase the eﬀectiveness of hardware branch predictors [Ertl and Gregg,
2003]. Our rewritten interpreter will utilize some type of threaded dispatch
and may also make use of superinstructions and replication.
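The difference between the two dispatch styles can be sketched on a toy accumulator machine. The three opcodes and both function names are hypothetical and unrelated to ISSA's instruction set. The threaded variant uses token threading through the GNU C labels-as-values extension, which is an assumption about the compiler; on other compilers the sketch falls back to the switch version. Opcodes in the code array are assumed to be valid.

```c
#include <stdint.h>

enum { OP_INC, OP_DOUBLE, OP_HALT };  /* toy opcodes, for illustration only */

/* Switch dispatch: every instruction returns to the top of the loop, and all
 * instructions share the single indirect branch of the switch. */
int32_t run_switch(const uint8_t *code, int32_t acc)
{
    for (;;) {
        switch (*code++) {
        case OP_INC:    acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return acc;  /* unknown opcode: stop */
        }
    }
}

#if defined(__GNUC__)
/* Threaded dispatch: each handler ends with its own indirect jump to the
 * next handler, so there is no shared dispatch point. */
int32_t run_threaded(const uint8_t *code, int32_t acc)
{
    static void *const handlers[] = { &&do_inc, &&do_double, &&do_halt };
#define DISPATCH() goto *handlers[*code++]
    DISPATCH();
do_inc:    acc += 1; DISPATCH();
do_double: acc *= 2; DISPATCH();
do_halt:   return acc;
#undef DISPATCH
}
#else
/* Without labels-as-values, fall back to switch dispatch. */
int32_t run_threaded(const uint8_t *code, int32_t acc)
{
    return run_switch(code, acc);
}
#endif
```

Giving each handler its own indirect branch is what lets hardware branch predictors learn per-instruction successor patterns, which is the effect the cited work attributes to threaded code and replication.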
Portable interpreter implementations often implement their operand stacks
and virtual registers as elements of arrays in memory. In interpreters of stack-
oriented languages, it is possible to use one (or more) machine registers to
hold the top element(s) of the stack [Ertl, 1995], reducing the number of
memory loads. This optimization is not possible in ISSA, but the results of
SSA instructions are often used soon after their creation. Thus, caching the
most recently generated result registers as local variables and accessing these
explicitly in the subsequent instructions may result in a signiﬁcant reduction
in operand loads.
In addition to any design-level optimizations, the interpreter code itself
could be improved significantly. For example, the interpreter’s state variables
(e.g., PC, CEN) are not currently local variables; thus it is impossible for
the compiler to place these into registers. We expect that the code can be
tightened signiﬁcantly.
4.2 A SafeTSA Interpreter
In parallel with the construction of an improved ISSA interpreter described
above, Amme and Apel are creating an interpreter for the SafeTSA represen-
tation [Amme et al., 2001], which utilizes some of the techniques described in
this paper. This interpreter is designed as what Klint classiﬁes as a Type III
interpreter [Klint, 1981], consisting of a relatively extensive preprocessor and
an interpretive engine. In the initial static preprocessor pass, the interpreter
converts SafeTSA’s tree structured control primitives into a ﬂat sequence
of instructions with explicit branches. In addition, this pass translates φ-
functions into ISSA-like phi instructions, adding the correct CEN operands
to branches and adding pfe instructions after the phi instructions in each basic
block. The dynamic interpretive engine currently supports a subset of the
features required to implement the SafeTSA language. Speciﬁcally, primitive
data types can be manipulated and static method calls can be dispatched,
but reference types and dynamic dispatch are not yet implemented.
Although reference types have not yet been implemented, two properties
of SafeTSA will simplify the treatment of non-scalars compared to the han-
dling of arrays described in this paper. First, SafeTSA’s enforced type safety
replaces the array vector, since array and object references can be statically
veriﬁed and implemented with direct pointers. Second, SafeTSA’s memory
operations are destructive, making the more expensive non-destructive array
handling described here unnecessary.
An additional challenge to interpreting SafeTSA will be the eﬃcient han-
dling of recursive function calls. The most obvious implementation is to copy
all result registers into the stack on each function call. This naive mecha-
nism may prove too expensive in space or time, and it may be necessary to
explicitly represent the storing of live result registers on a stack.
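The naive mechanism can be sketched as follows. This is only an illustration under assumptions: the save_stack type and the functions save_frame and restore_frame are hypothetical names, result registers are simplified to int, and the SafeTSA interpreter may handle calls differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical save stack: whole frames of result registers, back to back. */
typedef struct {
    int *slots;     /* saved register contents */
    size_t nregs;   /* result registers per frame (one per instruction) */
    size_t depth;   /* number of frames currently saved */
    size_t cap;     /* number of frames allocated */
} save_stack;

/* On a call: copy every result register of the caller onto the stack.
 * Returns 0 on success and -1 if memory could not be allocated. */
int save_frame(save_stack *st, const int *oa)
{
    if (st->depth == st->cap) {
        size_t new_cap = st->cap ? st->cap * 2 : 4;
        int *p = realloc(st->slots, new_cap * st->nregs * sizeof *p);
        if (p == NULL) return -1;
        st->slots = p;
        st->cap = new_cap;
    }
    memcpy(st->slots + st->depth * st->nregs, oa, st->nregs * sizeof *oa);
    st->depth++;
    return 0;
}

/* On a return: copy the most recently saved frame back into the registers. */
void restore_frame(save_stack *st, int *oa)
{
    assert(st->depth > 0);
    st->depth--;
    memcpy(oa, st->slots + st->depth * st->nregs, st->nregs * sizeof *oa);
}
```

Each call copies nregs values regardless of how many are live, which is where the cost in space and time comes from; saving only live registers would shrink both copies.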
5 Related Work
This work was motivated by the existence of SafeTSA [Amme et al., 2001] as a
mobile code format, but SafeTSA diﬀers from ISSA in several ways, including
the lack of annotated CFG edge numbers (CEN) and explicit phi-function
end (pfe) instructions, the use of tree structured control primitives instead of
unrestricted gotos, and the use of destructive heap-memory primitives. The
published work on SafeTSA has concentrated on the program representation
itself [Amme et al., 2001], processing it with an optimizing compiler in a
Java Virtual Machine [Amme et al., 2003], and reducing the online cost of
optimizations [von Ronne et al., 2001, 2002, Hartmann et al., 2003]. None of
this work, however, addresses the interpretation of SafeTSA.
Both the Program Dependence Web (PDW) [Ballance et al., 1990] and
the Static Single Information (SSI) [Ananian, 1999] augment SSA Form with
additional information that allows for more explicit execution semantics. To
represent a program as a PDW, each of an SSA program’s φ-functions is
replaced with either a γ- or µ-function (depending on whether the operands
come from forward or backwards control ﬂow); in addition η-functions (which
mark values after the termination of loops) and switches are inserted. This
conversion is only possible for programs with reducible control flow graphs,
but provides “all the information needed for control-driven, data-driven, or
demand-driven interpretation”. The interpretation envisioned, however, is
not that of an eﬃcient byte-code interpreter but rather that of a dataﬂow ar-
chitecture simulator. Similarly, the SSI+ variant of Static Single Information
form adds ξ-functions to loops in order to enable abstract interpretation and
provide event driven semantics. The conversion of programs in SSA Form
to each of these representations is more involved than annotating branches
with CENs and grouping phi-functions with pfe instructions as required for
conversion to ISSA.
Interpreting programs in SSA form represents a departure from the tradi-
tional stack-based virtual machine; another alternative is the virtual register
machine. Davis et al. argue that, by having fewer instructions (and thus reducing
indirect jumps), machines with a virtual register architecture can outperform
those with a stack-based architecture despite requiring extra memory
loads for the explicit operands [2003]. The performance characteristics of an
ISSA interpreter should be closer to that of a virtual register machine than to
those with stack architectures. Both virtual register machines and ISSA re-
duce the number of instructions at the cost of adding explicit input-operands.
The difference is that the ISSA interpreter has fewer operands, because the
instruction result is implicit; this benefit is achieved at the cost of having one
result register per instruction, which is less dense than a typical virtual regis-
ter machine and may increase the size of each operand and have detrimental
cache eﬀects.
6 Conclusion
One can indeed construct an interpretable Static Single Assignment Form.
Programs in standard SSA Form can be translated into this Interpretable
SSA (ISSA) by simply renaming operands to implicit registers, annotating
edge numbers at branches, and marking the last φ-function in each converging
basic block.
We have demonstrated how to build an ISSA interpreter for scalars using
a result register for each instruction, a control-ﬂow edge number register
to select phi instruction operands, and a PhiSet buﬀer to simultaneously
commit phi instruction result values. In addition, we have provided an actual
implementation of the Access and Update functions of Cytron et al.’s single-
assignment array model. Our prototype handles all of these constructs with
the performance expected of a simple non-threaded interpreter.
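The role of the PhiSet buffer can be seen in a small sketch. The names phi_vm, exec_phi, and exec_pfe are hypothetical, and values are simplified to int; the logic mirrors the PHI and PFE cases of the listing in Appendix A.1, where each phi queues its selected operand and pfe commits the whole queue.

```c
#include <assert.h>

#define MAX_PHIS 16

typedef struct { int value; int target; } pending_phi;

typedef struct {
    int oa[8];                  /* one result register per instruction */
    pending_phi pq[MAX_PHIS];   /* PhiSet buffer of pending assignments */
    int pqt;                    /* number of pending assignments */
    int cen;                    /* CFG edge number set by the last branch */
} phi_vm;

/* phi at instruction index ip: read the operand selected by cen now,
 * but only queue the write to the phi's own result register. */
void exec_phi(phi_vm *vm, int ip, const int *operands, int nops)
{
    assert(vm->cen >= 0 && vm->cen < nops);
    assert(vm->pqt < MAX_PHIS);
    vm->pq[vm->pqt].target = ip;
    vm->pq[vm->pqt].value = vm->oa[operands[vm->cen]];
    vm->pqt++;
}

/* pfe: commit every queued assignment, then reset the queue and cen. */
void exec_pfe(phi_vm *vm)
{
    for (int i = 0; i < vm->pqt; i++)
        vm->oa[vm->pq[i].target] = vm->pq[i].value;
    vm->pqt = 0;
    vm->cen = 0;
}
```

If two phi instructions in the same block read each other's result registers along a back edge (a swap), writing the first result immediately would corrupt the operand of the second; deferring all writes to pfe gives the simultaneous semantics that φ-functions require.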
This demonstrates the practicality of interpreting programs represented
in Static Single Assignment Form. Such SSA interpreters may be useful
in debugging SSA compilers and are a prerequisite for mixed-mode virtual
machines using only SafeTSA.
7 Acknowledgements
We would like to thank Alex Apel for his assistance, especially in prepar-
ing the Perl and Java versions of the benchmark programs. We would also
like to thank Fermín Reig, Vivek Haldar, Andreas Hartmann, Roxana Dianconescu,
Niall Dalton, Peter Fröhlich, and the anonymous reviewers for their
suggestions, feedback, and corrections; these have been invaluable.
This eﬀort is partially funded by the Defense Advanced Research Projects
Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel
Command, USAF, under agreement number F30602-99-1-0536, by the Na-
tional Science Foundation under grants CCR-0205712 and CCR-0105710, and
by the Oﬃce of Naval Research under grant N00014-01-1-0854.
Any opinions, ﬁndings, and conclusions or recommendations expressed
in this material are those of the authors and should not be interpreted
as necessarily representing the oﬃcial views, policies or endorsements, ei-
ther expressed or implied, of Defense Advanced Research Projects Agency
(DARPA), the National Science Foundation (NSF), the Office of Naval Research
(ONR), or any other agency of the U.S. Government.
Wolfram Amme and Alex Apel are developing the SafeTSA interpreter
under grant AM-150/1-3 of the Deutsche Forschungsgemeinschaft.
References
Ole Agesen and David Detlefs. Mixed-mode bytecode execution. Technical
Report SMLI TR-2000-87, Sun Microsystems, Palo Alto, CA, June 2000.
Bowen Alpern, Mark N. Wegman, and F. Kenneth Zadeck. Detecting equality
of variables in programs. In Proceedings of the 15th SIGPLAN-SIGACT
symposium on Principles of programming languages, pages 1–11, 1988.
Wolfram Amme, Niall Dalton, Jeffery von Ronne, and Michael Franz.
SafeTSA: a type safe and referentially secure mobile-code representation
based on static single assignment form. In Proceedings of the SIGPLAN’01
conference on Programming language design and implementation, pages
137–147, 2001.
Wolfram Amme, Jeﬀery von Ronne, and Michael Franz. Using the SafeTSA
representation to boost the performance of an existing java virtual ma-
chine. In 10th International Workshop on Compilers for Parallel Comput-
ers, January 2003.
C. Scott Ananian. The static single information form. Master’s thesis,
Massachusetts Institute of Technology, Cambridge, Massachusetts, September
1999. URL http://www.cag.lcs.mit.edu/~cananian/Publications/.
Robert A. Ballance, Arthur B. Maccabe, and Karl J. Ottenstein. The pro-
gram dependence web: A representation supporting control-, data-, and
demand-driven interpretation of imperative languages. In Proceedings of
the conference on Programming language design and implementation, pages
257–271, 1990.
James R. Bell. Threaded code. Communications of the ACM, 16:370–372,
1973.
Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and
F. Kenneth Zadeck. Eﬃciently computing static single assignment form
and the control dependence graph. ACM Transactions on Programming
Languages and Systems (TOPLAS), 13(4):451–490, 1991.
Brian Davis, Andrew Beatty, Kevin Casey, David Gregg, and John Waldron.
The case for virtual register machines. In Interpreters, Virtual Machines
and Emulators (IVME ’03), pages 41–49, 2003.
Robert B. K. Dewar. Indirect threaded code. Communications of the ACM,
18:330–331, June 1975.
M. Anton Ertl. Stack caching for interpreters. In Proceedings of the SIG-
PLAN 1995 conference on Programming language design and implementa-
tion, pages 315–327, 1995.
M. Anton Ertl and David Gregg. Optimizing indirect branch prediction accuracy
in virtual machine interpreters. In Proceedings of the SIGPLAN 2003
conference on Programming language design and implementation, pages
278–288, 2003.
M. Anton Ertl, David Gregg, Andreas Krall, and Bernd Paysan. vmgen — a
generator of eﬃcient virtual machine interpreters. Software—Practice and
Experience, 32(3):265–294, 2002.
Etienne M. Gagnon. A Portable research framework for the execution of java
bytecode. PhD thesis, McGill University, 2002. URL http://www.info.
uqam.ca/~egagnon/gagnon-phd.pdf.
Andreas Hartmann, Wolfram Amme, Jeﬀery von Ronne, and Michael Franz.
Code annotation for safe and eﬃcient dynamic object resolution. In 2nd
International Workshop on Compiler Optimization Meets Compiler Veri-
ﬁcation, April 2003.
Paul Klint. Interpretation techniques. Software–Practice and Experience,
11:963–973, 1981.
Kathleen Knobe and Vivek Sarkar. Array SSA form and its use in paral-
lelization. In Proceedings of the 25th SIGPLAN-SIGACT symposium on
Principles of programming languages, pages 107–120, 1998.
Chandra Krintz. Improving mobile program performance through the use of
a hybrid intermediate representation. In 2nd Workshop on Intermediate
Representation Engineering for Virtual Machines, June 2002.
Robert Morgan. Building an Optimizing Compiler. Butterworth-Heinemann,
Woburn, Massachusetts, 1998.
Ian Piumarta and Fabio Riccardi. Optimizing direct threaded code by se-
lective inlining. In Proceedings of the SIGPLAN 1998 conference on Pro-
gramming language design and implementation, pages 291–300, 1998.
Todd A. Proebsting. Optimizing an ANSI C interpreter with superoperators.
In Proceedings of the 22nd SIGPLAN-SIGACT symposium on Principles
of programming languages, pages 322–332, 1995.
Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Global
value numbers and redundant computations. In Proceedings of the 15th
SIGPLAN-SIGACT symposium on Principles of programming languages,
pages 12–27, 1988.
Jeﬀery von Ronne, Michael Franz, Niall Dalton, and Wolfram Amme. Com-
pile time elimination of null- and bounds-checks. In 9th Workshop on
Compilers for Parallel Computers, June 2001.
Jeﬀery von Ronne, Andreas Hartmann, Wolfram Amme, and Michael Franz.
Eﬃcient online optimization by utilizing oﬄine analysis and the SafeTSA
representation. In James F. Power and John T. Waldron, editors, Recent
Advances in Java Technology: Theory, Application, Implementation, chap-
ter 27, pages 233–241. Computer Science Press, Trinity College Dublin,
2002. ISBN 0-9544145-0-0.
A Implementation of the Interpreter Core
A.1 ssa_vm.c
#include "inst.h"
#include "ssa_vm.h"
#include "ssa_array.h"
#include "ssa_parser.h"
#include <stdlib.h>
#include <stdio.h>

typedef struct phi_assignment phi_assignment;
struct phi_assignment
{
    ssa_variable v;   // v = value to be transferred
    int ovi;          // ovi = output variable index
};

typedef struct
{
    inst **ia;              // ia = instruction array
    int ial;                // ial = instruction array length
    ssa_array_vector *av;   // av = array vector
    ssa_variable *oa;       // oa = output array (size = inst_length)
    phi_assignment *pq;     // pq = phi-assignment queue
    int pqt;                // pqt = phi-assignment queue top
    int ip;                 // ip = instruction pointer (index into inst array)
    int cen;                // cen = CFG edge number
} vm_state;

void init(vm_state *s, inst *inst_array[], int inst_length);
void commit_phis(vm_state *s);
int execute(vm_state *s, inst *instruction);

// the queue is empty when no phi assignment is pending
static inline int pq_empty(const vm_state *s) { return s->pqt == 0; }

// fetch the i-th immediate stored in the instruction itself
static inline ssa_variable decode_immediate(inst *instruction, int i) {
    ssa_variable v = ((ssa_variable *) instruction->data)[i];
    return v;
}

// fetch the result register named by the i-th operand
static inline ssa_variable decode_operand(vm_state *s, inst *instruction, int i) {
    int index = ((int *) instruction->data)[i];
    return s->oa[index];
}

static inline ssa_variable first_operand(vm_state *s, inst *instruction) {
    return decode_operand(s, instruction, 0);
}

static inline ssa_variable second_operand(vm_state *s, inst *instruction) {
    return decode_operand(s, instruction, 1);
}
void ssa_vm(inst *inst_array[], int inst_length)
{
    vm_state s;
    inst *ci;
    init(&s, inst_array, inst_length);
    while (1)
    {
        if (s.ip < 0 || s.ip >= s.ial) abort();   // we jumped out of the method
        ci = s.ia[s.ip];
        if (execute(&s, ci) == 0) break;
    }
}

void init(vm_state *s, inst *inst_array[], int inst_length)
{
    // instructions -- should we copy these?
    s->ia = inst_array;
    s->ial = inst_length;
    // simple registers
    s->ip = 0;
    s->cen = 0;
    // complex data
    s->av = av_init();
    s->oa = (ssa_variable *) calloc(inst_length, sizeof(ssa_variable));
    s->pq = (phi_assignment *) calloc(inst_length, sizeof(phi_assignment));
    s->pqt = 0;
}

void commit_phis(vm_state *s)
{
    int i;
    for (i = 0; i < s->pqt; i++) {
        s->oa[s->pq[i].ovi] = s->pq[i].v;
    }
    s->pqt = 0;
}
int execute(vm_state *s, inst *ci)
{
    // temporaries
    int t, n, a, i;
    ssa_variable x, y;
    switch (ci->opcode)
    {
    case CONST:
        x = decode_immediate(ci, 0);
        s->oa[s->ip] = x;
        s->ip++;
        break;
    case PRINT:
        //puts("printing int");
        x = first_operand(s, ci);
        printf("==> %d\n", x.i);
        s->ip++;
        break;
    case ADD:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].i = x.i + y.i;
        s->ip++;
        break;
    case SUB:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].i = x.i - y.i;
        s->ip++;
        break;
    case DIV:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].i = x.i / y.i;
        s->ip++;
        break;
    case MUL:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].i = x.i * y.i;
        s->ip++;
        break;
    case AND:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].i = x.i && y.i;
        s->ip++;
        break;
    case OR:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].i = x.i || y.i;
        s->ip++;
        break;
    case NEG:
        x = first_operand(s, ci);
        s->oa[s->ip].i = -x.i;
        s->ip++;
        break;
    case BGE:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        t = decode_immediate(ci, 2).t;   // branch target
        n = decode_immediate(ci, 3).n;   // CFG Edge Number
        if (x.i >= y.i)
        {
            s->cen = n;
            s->ip = t;
        } else {
            s->ip++;
        }
        break;
    case BGT:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        t = decode_immediate(ci, 2).t;   // branch target
        n = decode_immediate(ci, 3).n;   // CFG Edge Number
        if (x.i > y.i)
        {
            s->cen = n;
            s->ip = t;
        } else {
            s->ip++;
        }
        break;
    case BLE:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        t = decode_immediate(ci, 2).t;   // branch target
        n = decode_immediate(ci, 3).n;   // CFG Edge Number
        if (x.i <= y.i)
        {
            s->cen = n;
            s->ip = t;
        } else {
            s->ip++;
        }
        break;
    case BLT:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        t = decode_immediate(ci, 2).t;   // branch target
        n = decode_immediate(ci, 3).n;   // CFG Edge Number
        if (x.i < y.i)
        {
            s->cen = n;
            s->ip = t;
        } else {
            s->ip++;
        }
        break;
    case BNE:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        t = decode_immediate(ci, 2).t;   // branch target
        n = decode_immediate(ci, 3).n;   // CFG Edge Number
        if (x.i != y.i)
        {
            s->cen = n;
            s->ip = t;
        } else {
            s->ip++;
        }
        break;
    case BEQ:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        t = decode_immediate(ci, 2).t;   // branch target
        n = decode_immediate(ci, 3).n;   // CFG Edge Number
        if (x.i == y.i)
        {
            s->cen = n;
            s->ip = t;
        } else {
            s->ip++;
        }
        break;
    case GOTO:
        t = decode_immediate(ci, 0).t;   // branch target
        n = decode_immediate(ci, 1).n;   // CFG Edge Number
        s->cen = n;
        s->ip = t;
        break;
    case EXIT:
        //puts("exiting");
        exit(0);
    case RETURN:
        x = decode_operand(s, ci, 0);   // value to return as the exit status
        exit(x.i);
    case PHI:
        // check that the PHI is big enough for the CFG edge number
        n = ci->opdnum;
        if (s->cen >= n) abort();        // phi must have enough operands
        if (s->pqt >= s->ial) abort();   // can't overflow phi queue buffer
        // record the data in the phi-assignment queue
        s->pq[s->pqt].ovi = s->ip;
        s->pq[s->pqt].v = decode_operand(s, ci, s->cen);
        s->pqt++;
        // next instruction
        s->ip++;
        break;
    case PFE:
        commit_phis(s);
        s->cen = 0;
        s->ip++;
        break;
    case NOOP:
        s->ip++;
        break;
    case NEWARRAY:
        x = decode_operand(s, ci, 0);   // length of the new array
        s->oa[s->ip].a = av_newarray(s->av, x.i);
        s->ip++;
        break;
    case UPDATE:
        a = decode_operand(s, ci, 0).a;   // array (index of array in array vector)
        i = decode_operand(s, ci, 1).i;   // element (index into array)
        x = decode_operand(s, ci, 2);     // thing to set the element to
        s->oa[s->ip].a = av_update(s->av, a, i, x);   // result is new array
        s->ip++;
        break;
    case ACCESS:
        a = decode_operand(s, ci, 0).a;   // array (index of array in array vector)
        i = decode_operand(s, ci, 1).i;   // element (index into array)
        s->oa[s->ip] = av_access(s->av, a, i);   // result is element
        s->ip++;
        break;
    case FADD:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].f = x.f + y.f;
        s->ip++;
        break;
    case FSUB:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].f = x.f - y.f;
        s->ip++;
        break;
    case FDIV:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].f = x.f / y.f;
        s->ip++;
        break;
    case FMUL:
        x = first_operand(s, ci);
        y = second_operand(s, ci);
        s->oa[s->ip].f = x.f * y.f;
        s->ip++;
        break;
    case FCONST:
        s->oa[s->ip] = decode_immediate(ci, 0);
        s->ip++;
        break;
    case FPRINT:
        x = first_operand(s, ci);
        printf("==> %f\n", x.f);   // print the float operand, mirroring PRINT
        s->ip++;
        break;
    default:
        abort();
    }
    return 1;
}
A.2 ssa_vm.h
#ifndef SSA_VM_H
#define SSA_VM_H
#include "inst.h"

typedef signed s32;
typedef unsigned u32;
typedef float f32;

typedef union ssa_variable {
    s32 i;   // 32-bit signed integer
    f32 f;   // 32-bit floating point value
    u32 a;   // 32-bit unsigned array vector index
    u32 n;   // 32-bit unsigned CFG Edge Number
    u32 t;   // 32-bit unsigned branch target
} ssa_variable;

void ssa_vm(inst *inst_array[], int inst_length);
#endif
A.3 ssa_array.h
#ifndef SSA_ARRAY_H
#define SSA_ARRAY_H
#include "ssa_vm.h"
#include <stdlib.h>

typedef struct
{
    unsigned l;        // array length
    ssa_variable *a;   // array of ssa variables
} ssa_array;

typedef struct
{
    unsigned na;    // next array
    unsigned l;     // allocated length
    ssa_array *v;   // array (vector) of arrays
} ssa_array_vector;

ssa_array_vector *av_init();
u32 av_newarray(ssa_array_vector *av, int size);
void av_cleanup(ssa_array_vector *av);
u32 av_update(ssa_array_vector *av, u32 av_index, u32 a_index, ssa_variable v);
ssa_variable av_access(ssa_array_vector *av, u32 av_index, u32 array_index);
u32 av_fastupdate(ssa_array_vector *av, u32 av_index, u32 array_index,
                  ssa_variable v);
#endif
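The implementation file behind this header is not reproduced in this appendix. As an illustration only, a non-destructive Update in the spirit of the single-assignment array model could be written as below. The sk_ types and functions are hypothetical stand-ins (elements are simplified to int), and copying the whole array on every update is an assumption of this sketch rather than a description of the prototype.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_FAIL ((unsigned)-1)   /* returned when allocation fails */

typedef struct { unsigned len; int *elems; } sk_array;
typedef struct { unsigned next; unsigned cap; sk_array *v; } sk_vector;

/* Appends a zero-filled array of `len` elements and returns its index. */
unsigned sk_newarray(sk_vector *av, unsigned len)
{
    if (av->next == av->cap) {
        unsigned cap = av->cap ? av->cap * 2 : 8;
        sk_array *v = realloc(av->v, cap * sizeof *v);
        if (v == NULL) return SK_FAIL;
        av->v = v;
        av->cap = cap;
    }
    int *elems = calloc(len ? len : 1, sizeof *elems);
    if (elems == NULL) return SK_FAIL;
    av->v[av->next].len = len;
    av->v[av->next].elems = elems;
    return av->next++;
}

/* Non-destructive Update: the source array is left untouched and the index
 * of a fresh copy with element `idx` replaced is returned. */
unsigned sk_update(sk_vector *av, unsigned src, unsigned idx, int value)
{
    assert(src < av->next && idx < av->v[src].len);
    unsigned dst = sk_newarray(av, av->v[src].len);
    if (dst == SK_FAIL) return SK_FAIL;
    /* av->v may have been moved by sk_newarray, so index it afresh. */
    memcpy(av->v[dst].elems, av->v[src].elems, av->v[src].len * sizeof(int));
    av->v[dst].elems[idx] = value;
    return dst;
}

/* Access: read element `idx` of the array stored at index `src`. */
int sk_access(const sk_vector *av, unsigned src, unsigned idx)
{
    assert(src < av->next && idx < av->v[src].len);
    return av->v[src].elems[idx];
}
```

Because instructions refer to arrays by their index in the vector, the old array value stays readable after an update, which is what the single-assignment semantics require; the price is a full copy per update.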
A.4 inst.h
#ifndef INST_H
#define INST_H
#include <stdlib.h>

#define MAX_INSTS 1024
#define BASE 257 /* the first 256 is assigned to ascii char */

typedef struct
{
    int opcode;
    int opdnum;      /* length in words */
    char *img;
    void *data;
    int data_type;   /* int, bool, float */
} inst;

typedef struct
{
    char *img;
    int opdnum;
} inst_attribute;

static inst_attribute inst_att[] = {
    {"const", 1},
    {"fconst", 1},
    {"add", 2},
    {"sub", 2},
    {"div", 2},
    {"mul", 2},
    {"and", 2},
    {"or", 2},
    {"neg", 1},
    {"fadd", 2},
    {"fsub", 2},
    {"fdiv", 2},
    {"fmul", 2},
    {"bge", 4},
    {"bgt", 4},
    {"ble", 4},
    {"blt", 4},
    {"bne", 4},
    {"beq", 4},
    {"goto", 2},
    {"phi", -1},
    {"pfe", 0},
    {"update", 3},
    {"access", 2},
    {"newarray", 1},
    {"exit", 0},
    {"return", 1},
    {"print", 1},
    {"fprint", 1},
    {"null", 0},
};

extern inst *insts_array[];
extern int insts_size;

/* inst.c */
inst *new_inst(int opcode);
inst *new_unary_inst(int opcode, int opd);
inst *new_unary_finst(int opcode, float opd);
inst *new_binary_inst(int opcode, int opd1, int opd2);
inst *new_tenary_inst(int opcode, int opd1, int opd2, int opd3);
inst *new_quandary_inst(int opcode, int opd1, int opd2, int opd3, int opd4);
inst *new_phi_inst(int opcode, int opdnum, int opd[]);
void print_all_insts(inst *insts[], int size);
void print_inst(inst *ist);
void delete_all_insts(inst *insts[], int size);
#endif /* no INST_H */
B Benchmarks
B.1 Factorials
B.1.1 factorial.ssa
// Find 12!, 10,000,000 times
0 const 0          // zero
1 const 1          // one
2 const 12         // x
3 const 10000000   // iterations
// Outer Loop
4 phi 2 (0 12)
5 pfe
// Inner Loop
6 phi 2 (1 9)      // f
7 phi 2 (1 10)     // j
8 pfe
9 mul (6 7)
10 add (7 1)
11 ble (10 2) [6] 1
// Outer Loop Continued
12 add (4 1)
13 blt (12 3) [4] 1
// Exit
14 exit
B.1.2 factorial.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv) {
    clock_t start, end, used = 0;
    int f, x, i, j;
    start = clock();
    f = 1;
    x = 12;
    i = 0;
    do {
        f = 1;
        j = 1;
        do {
            f = f * j;
            j++;
        } while (j <= x);
        i++;
    } while (i < 10000000);
    end = clock();
    /* clock_t is not guaranteed to be int, so convert before printing */
    fprintf(stderr, "Time used %ld\n", (long) (end - start));
}
B.1.3 Factorial.java
public class Factorial {
    public static void main(String[] args) {
        long start;
        long end;
        int f, x, i, j;
        start = System.currentTimeMillis();
        f = 1;
        x = 12;
        i = 0;
        do {
            f = 1;
            j = 1;
            do {
                f = f * j;
                j++;
            } while (j <= x);
            i++;
        } while (i < 10000000);
        end = System.currentTimeMillis();
        System.out.println("Time used: " + Long.toString(end - start));
    }
}
B.1.4 factorial.pl
#!/usr/bin/perl -w
require 'sys/syscall.ph';
$TIMEVAL_T = "LLLL";
$done = $start = pack($TIMEVAL_T, ());
syscall(&SYS_times, $start, -1);
$f = 1;
$x = 12;
$i = 0;
do {
    $f = 1;
    $j = 1;
    do {
        $f = $f * $j;
        $j++;
    } while ($j <= $x);
    $i++;
} while ($i < 10000000);
syscall(&SYS_times, $done, 0);
@start = unpack($TIMEVAL_T, $start);
@done = unpack($TIMEVAL_T, $done);
print "Time used: " . ($done[0] - $start[0]) . "\n";
B.2 Fibonacci Sequence (in scalars)
B.2.1 fibonacci.ssa
// F(0) = 0
// F(1) = 1
// F(n) = F(n-2) + F(n-1) for all n >= 2
//
// Calculate F(46), 10,000,000 times
// Block 0
0 const 0          //
1 const 1          //
2 const 46         // max = 46
3 const 2          // n = 2
4 const 10000000   // iterations
// Outer Loop
5 phi 2 (0 14)
6 pfe
// Inner Loop
7 phi 2 (0 8)      // phi(f_0, f_{n-2})
8 phi 2 (1 11)     // phi(f_1, f_{n-1})
9 phi 2 (3 12)     // phi(n=2, n+1)
10 pfe
11 add (7 8)       // f_{n} = f_{n-2} + f_{n-1}
12 add (1 9)       // n <- n+1
13 ble (12 2) [7] 1   // n <= max repeat loop
// Outer Loop Continued
14 add (5 1)
15 blt (14 4) [5] 1
// End
16 exit
B.2.2 fibonacci.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv) {
    clock_t start, end, used = 0;
    int i, n, max = 46, f_n, f_n_1, f_n_2;
    start = clock();
    f_n_1 = 0;
    f_n_2 = 1;
    i = 0;
    do {
        n = 2;
        do {
            f_n = f_n_2 + f_n_1;
            f_n_2 = f_n_1;
            f_n_1 = f_n;
            n = n + 1;
        } while (n <= 46);
        i = i + 1;
    } while (i < 10000000);
    end = clock();
    /* clock_t is not guaranteed to be int, so convert before printing */
    fprintf(stderr, "Time used %ld\n", (long) (end - start));
}
B.2.3 Fibonacci.java
public class Fibonacci {
    public static void main(String[] args) {
        long start, end;
        int f_n_2, f_n_1, f_n, n;
        int max = 46;
        int i;
        start = System.currentTimeMillis();
        f_n_1 = 0;
        f_n_2 = 1;
        i = 0;
        do {
            n = 2;
            do {
                f_n = f_n_2 + f_n_1;
                f_n_2 = f_n_1;
                f_n_1 = f_n;
                n = n + 1;
            } while (n <= 46);
            i = i + 1;
        } while (i < 10000000);
        end = System.currentTimeMillis();
        System.out.println("Time used: " + Long.toString(end - start));
    }
}
B.2.4 fibonacci.pl
#!/usr/bin/perl -w
require 'sys/syscall.ph';
$TIMEVAL_T = "LLLL";
$done = $start = pack($TIMEVAL_T, ());
syscall(&SYS_times, $start, 0);
$f_n_1 = 0;
$f_n_2 = 1;
$i = 0;
do {
    $n = 2;
    do {
        $f_n = $f_n_2 + $f_n_1;
        $f_n_2 = $f_n_1;
        $f_n_1 = $f_n;
        $n = $n + 1;
    } while ($n <= 46);
    $i = $i + 1;
} while ($i < 10000000);
syscall(&SYS_times, $done, 0);
@start = unpack($TIMEVAL_T, $start);
@done = unpack($TIMEVAL_T, $done);
print "Time used: " . ($done[0] - $start[0]) . "\n";
B.3 Fibonacci Sequence (in an array)
B.3.1 fibonacci_array.ssa
// F[0] = 0
// F[1] = 1
// F[n] = F[n-1] + F[n-2]
// Find F[46], 100,000 times, using arrays
0 const 0        // 0
1 const 1        // 1
2 const 2        // j0 = 2
3 const 46       // x = 46
4 const 100000   // iterations
// Outer Loop
5 phi 2 (0 22)   // i
6 pfe
7 add (3 1)
8 newarray 7     // F = newarray x+1
9 update (8 0) 0    // F[0] = 0
10 update (9 1) 1   // F[1] = 1
// Inner Loop
11 phi 2 (2 20)    // j
12 phi 2 (10 19)   // F
13 pfe
14 sub (11 1)      // j - 1
15 sub (11 2)      // j - 2
16 access (12 14)  // F[j-1]
17 access (12 15)  // F[j-2]
18 add (16 17)
19 update (12 11) 18   // F[j] = F[j-1] + F[j-2]
20 add (11 1)      // j = j + 1
21 ble (20 3) [11] 1
// Outer Loop Continued
22 add (5 1)
23 blt (22 4) [5] 1
// Exit
24 exit
B.3.2 fib_array.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv) {
    clock_t start, end;
    int i, j, *f, x;
    start = clock();
    x = 46;
    i = 0;
    do {
        f = (int *) malloc(sizeof(int) * (x + 1));
        f[0] = 0;
        f[1] = 1;
        j = 2;
        do {
            f[j] = f[j - 1] + f[j - 2];
            j++;
        } while (j <= x);
        free(f);
        i++;
    } while (i < 100000);
    end = clock();
    /* clock_t is not guaranteed to be int, so convert before printing */
    fprintf(stderr, "Time used %ld\n", (long) (end - start));
}
B.3.3 FibArray.java
public class FibArray {
    public static void main(String[] args) {
        long start;
        long end;
        int[] f;
        int i, j;
        int x;
        start = System.currentTimeMillis();
        x = 46;
        i = 0;
        do {
            f = new int[x + 1];
            f[0] = 0;
            f[1] = 1;
            j = 2;
            do {
                f[j] = f[j - 1] + f[j - 2];
                j++;
            } while (j <= x);
            f = null;
            i++;
        } while (i < 100000);
        end = System.currentTimeMillis();
        System.out.println("Time used: " + Long.toString(end - start));
    }
}
B.3.4 fib_array.pl
#!/usr/bin/perl -w
require 'sys/syscall.ph';
$TIMEVAL_T = "LLLL";
$done = $start = pack($TIMEVAL_T, ());
syscall(&SYS_times, $start, 0);
$x = 46;
$i = 0;
do {
    @f = (0..$x);
    $j = 2;
    do {
        $f[$j] = $f[$j - 1] + $f[$j - 2];
        $j++;
    } while ($j <= $x);
    $i++;
} while ($i < 100000);
syscall(&SYS_times, $done, 0);
@start = unpack($TIMEVAL_T, $start);
@done = unpack($TIMEVAL_T, $done);
print "Time used: " . ($done[0] - $start[0]) . "\n";