Purdue University

Purdue e-Pubs
Department of Computer Science Technical
Reports

Department of Computer Science

1975

One Address Computers are Faster and Use Less Memory Space
to Execute Arithmetic Assignment Statements
Victor Schneider
Bradford Wade

Report Number:
75-149

Schneider, Victor and Wade, Bradford, "One Address Computers are Faster and Use Less Memory Space
to Execute Arithmetic Assignment Statements" (1975). Department of Computer Science Technical
Reports. Paper 95.
https://docs.lib.purdue.edu/cstech/95

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries.
Please contact epubs@purdue.edu for additional information.

ONE ADDRESS COMPUTERS ARE FASTER AND USE
LESS MEMORY SPACE TO EXECUTE ARITHMETIC
ASSIGNMENT STATEMENTS
Victor Schneider
Bradford Wade
Computer Sciences Department
Purdue University
West Lafayette, Indiana 47907
CSD TR 149


Work on this paper was supported by NIP Grant ~J-3l572
to Purdue University and by IBM Corporation graduate
fellowships.


Index Terms:

computer instruction sets, machine architecture, code
size minimization, minimization of execution time

C. R. Categories:

4.12, 4.22, 4.6, 6.21

Abstract
A notation is developed which permits space and time efficiency
comparisons of four basic computer architectures in use today for
executing Fortran-style assignment statements. From the comparisons,
we discover that a suitably designed 1-address architecture (one
accumulator machine) outperforms the other architectures in speed
of execution and in encoded size of compiled Fortran statements.
The comparisons are valid for CPU's ranging from very inexpensive
designs with few registers to the most expensive designs having
many registers and employing "pipelining" techniques or lookahead
fetches of operands or instructions into fast cache memories.


Introduction
This paper provides some answers to questions of what
sort of a computer instruction set will give best execution-time
performance for simple Fortran assignment statements (or
Algol or PL/I, for that matter) such as those considered in
Knuth's statistical study of Fortran program characteristics (3).
We compare four different computer architectures with respect to
the number of bits needed to represent the object code of
selected Fortran assignment statements and the execution time
in microseconds for the object codes. The four architectures
are
(a) The stack machines, notably the ICL KDF9 and the Burroughs
5000 and 6000 series computers,
(b) The 1-address machines, as implemented in numerous 16-bit
minicomputers, such as the IBM 1130 and 1800, the
Varian 70 series, and many others,
(c) The 2-address machines, most well known of which is the
IBM 360/370 series, but also represented in the Data
General minicomputers, and in computers made by
companies like XDS and Interdata Corporation,
(d) The 3-address machines, currently exemplified by the
CDC 6000 and Cyber 70 series of computers. These
machines are not strictly 3-address, in the sense that
only the register-to-register operations actually involve
three independent addresses (namely, two source registers
and a destination register). But, since these machines
are so widely used for scientific computing by universities
and government research laboratories, we include them for
sake of comparison.
Our method of comparison involves comparing the effects of
implementing the four architectures on a selected CPU. Thus,
we are not comparing Burroughs computers with CDC equipment,
but rather Burroughs machines wired into CDC equipment or CDC
machines wired into Burroughs equipment. Another way of looking
at this is to talk about implementations of the four architectures
on a CPU using emulation techniques, possibly assisted by hard-wired
decoding circuits and specially designed fast cache memories.
In this way, we compare apples with apples, rather than apples
with coconuts or bananas.
Another point we should make is that the 1-address architecture
is assumed to include "reverse subtract" and "reverse divide"
operations, so that the noncommutative nature of these operations
can be disregarded in our comparisons (4). Finally, the "paper
and pencil" results obtained in this study have been confirmed
empirically through emulations on a microprogrammed minicomputer
and through simulations and timings run on IBM 370 equipment (6).
For standard definitions of the four architectures, the reader
is referred to a textbook, such as Gear's (1).
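To illustrate why reverse operations make operand order irrelevant
on an accumulator machine, here is a minimal sketch of a
one-accumulator interpreter in Python. The opcode names (LOAD, RSUB,
RDIV, and so on) and the function run are hypothetical, introduced
only for illustration; they are not taken from any of the machines
named above.

```python
# Minimal sketch of a one-accumulator (1-address) machine.
# Opcode names are hypothetical; RSUB and RDIV are the "reverse" operations.

def run(program, memory):
    """Execute (opcode, variable_name) pairs against a dict of variables."""
    acc = 0
    for op, name in program:
        if op == "LOAD":
            acc = memory[name]
        elif op == "STORE":
            memory[name] = acc
        elif op == "ADD":
            acc = acc + memory[name]
        elif op == "SUB":
            acc = acc - memory[name]
        elif op == "RSUB":    # reverse subtract: operand minus accumulator
            acc = memory[name] - acc
        elif op == "MPY":
            acc = acc * memory[name]
        elif op == "DIV":
            acc = acc / memory[name]
        elif op == "RDIV":    # reverse divide: operand divided by accumulator
            acc = memory[name] / acc
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# A = B - C*D: the product is already in the accumulator, so RSUB
# completes the statement without storing a temporary.
mem = run([("LOAD", "C"), ("MPY", "D"), ("RSUB", "B"), ("STORE", "A")],
          {"B": 10, "C": 2, "D": 3})
print(mem["A"])  # prints 4
```

Without RSUB, the same statement would need the product saved in a
temporary before B could be loaded, which costs two extra
instructions.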

Notation
Let x be the number of bits used to represent the "opcode",
or machine operation, portion of an instruction in any of the
architectures to be compared. Then, y is the number of bits used
to address source and destination registers in both the 2-address
and 3-address architectures, w is the number of bits used for
short addresses of operands stored in some region of main memory,
and v is the number of bits used for a "long address" of operands
stored anywhere at all in main memory.
If anyone reproaches us for not considering variable-length
encoding of opcodes, we will respond by letting x be the average
number of bits used to represent an opcode, or, for a persistent
critic, we would let x be the minimum number of bits needed to
represent an opcode in the encoding. The point is to select some
value for x that quiets the opposition, and then show that the
results still hold.
In most contemporary architectures, x varies from 5 to 8
bits, y is 3 or 4 bits, and we will endow our 2-address
architecture with the same number of software registers as the
3-address version. In machines having a short addressing scheme,
w is typically 8 bits, or 7 bits signed, and is often treated
(unwisely, we believe) as an offset from the program counter,
thus mixing alterable operands with executable instructions in
the object code.
The approach that we advocate is to use an
implicit base register containing the (stack pointer to the)
beginning address of the currently used table of program variables.
Typical values for v range from 15 bits to 22 or even 24 bits.
For our purposes,
    v > x ≥ w > y,
a fact that we will use later in comparisons of space needed to
store algorithms for different architectures.
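The way an instruction's encoded size follows from the four field
widths can be sketched in Python as below. The class FieldWidths,
the function instruction_bits, and the sample widths are
assumptions chosen from within the typical ranges quoted above;
they are not figures for any particular machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldWidths:
    x: int  # opcode bits
    y: int  # register-address bits
    w: int  # short-address bits
    v: int  # long-address bits


def instruction_bits(fields, widths):
    """Total bits of an instruction whose fields are listed as a string,
    e.g. "xv" for opcode plus long address, "xyyy" for opcode plus
    three register addresses."""
    return sum(getattr(widths, field) for field in fields)


widths = FieldWidths(x=8, y=4, w=8, v=16)  # assumed sample values

print(instruction_bits("xv", widths))    # 1-address memory reference: 24
print(instruction_bits("xvy", widths))   # 2-address memory reference: 28
print(instruction_bits("xyyy", widths))  # 3-address register operation: 20
print(instruction_bits("xw", widths))    # short-address reference: 16
```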
An interesting question that arises next is "what can we say
about the contribution of each portion of a software instruction
to its total execution time?"
In expensively designed CPU's
having an abundant supply of internal registers or fast scratchpad
memory, it is apparent that the operations on register data must
be included in the time estimate for executing an operation on
the expensive e-CPU; namely,
    t_e(x) = t_e(x) + 2g_e(y) = t_e(x) + 3g_e(y),
so that references to registers add nothing to the opcode time
(g_e(y) = 0).
At the other extreme, in a very inexpensive c-CPU there may be
so few registers available that the software registers are
implemented as special locations in main memory. For this cheap
c-CPU,
    t_c(x) + 3g_c(y) > t_c(x) + 2g_c(y) > t_c(x).
A fetch from or store to a long address should take the
same time in an e-CPU as the corresponding operations on a
short address; i.e.,
    f_e(v) = f_e(w).
For the c-CPU, however, there are usually problems caused by the
inexpensive main memory used, in which the memory word width in
bits is less than the word width implemented in the software
instruction set. In this situation,
    f_c(v) > f_c(w).

Because the c-CPU, e-CPU, and any intermediate price CPU's
all emulate the same software instruction set in each architecture,
all sequences of instructions fit the same number of bits in
main memory for all CPU's emulating the same architecture.
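The two extremes can be written down as a small timing model. In
this sketch t, g, and f are read as the time charged to the opcode
portion, to each register reference, and to each memory fetch or
store; that reading, the class CpuTimes, and the microsecond values
are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTimes:
    t: float        # time charged to the opcode portion, t(x)
    g: float        # time per register reference, g(y)
    f_long: float   # fetch/store time via a long address, f(v)
    f_short: float  # fetch/store time via a short address, f(w)


# Assumed values in microseconds, for illustration only.
E_CPU = CpuTimes(t=1.0, g=0.0, f_long=1.0, f_short=1.0)  # expensive CPU
C_CPU = CpuTimes(t=2.0, g=1.0, f_long=4.0, f_short=2.0)  # cheap CPU


def register_op_time(cpu, n_registers):
    """Time of an operation naming n_registers registers: t(x) + n*g(y)."""
    return cpu.t + n_registers * cpu.g


for name, cpu in (("e-CPU", E_CPU), ("c-CPU", C_CPU)):
    print(name,
          register_op_time(cpu, 0),
          register_op_time(cpu, 2),
          register_op_time(cpu, 3))
```

On the e-CPU the three printed times coincide, while on the c-CPU
each additional register reference lengthens the operation.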

Timing and Space Comparisons

Knuth's study (3) found that the simple Fortran assignment
statement of the form
A = B

constituted 68% of the statically measured corpus of programs that
was analyzed. Table 1 below presents the compiled code and
corresponding space requirements of this statement for all four
architectures.

Table 1: Statement A = B

Stack
    Addr(A)          x+v
    Value(B)         x+v
    Store            x
    Pop              x
    Space: 4x+2v
    Time:  4t(x)+2f(v)

1-Address
    Load B           x+v
    Store A          x+v
    Space: 2x+2v
    Time:  2t(x)+2f(v)

2-Address
    Load B in R1     x+v+y
    Store R1 in A    x+v+y
    Space: 2x+2v+2y
    Time:  2t(x)+2f(v)+2g(y)

3-Address
    Load B in R1     x+v+y
    Store R1 in A    x+v+y
    Space: 2x+2v+2y
    Time:  2t(x)+2f(v)+2g(y)

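The space totals of Table 1 can be re-derived mechanically by
listing, for every instruction, the fields it occupies. The
following Python sketch does this; the data layout and the
functions space and bits are hypothetical helpers, and the widths
passed to bits are assumed sample values.

```python
from collections import Counter

# Object code for A = B. Each instruction is (mnemonic, fields), where the
# field string names the bit-width symbols that the instruction occupies.
TABLE_1 = {
    "stack": [("Addr(A)", "xv"), ("Value(B)", "xv"),
              ("Store", "x"), ("Pop", "x")],
    "1-address": [("Load B", "xv"), ("Store A", "xv")],
    "2-address": [("Load B in R1", "xvy"), ("Store R1 in A", "xvy")],
    "3-address": [("Load B in R1", "xvy"), ("Store R1 in A", "xvy")],
}


def space(code):
    """Symbolic space requirement: how many x, v, and y fields the code uses."""
    total = Counter()
    for _, fields in code:
        total.update(fields)  # counts each field letter once per occurrence
    return dict(total)


def bits(code, x, y, v):
    """Numeric size in bits for concrete field widths."""
    widths = {"x": x, "y": y, "v": v}
    return sum(widths[field] for _, fields in code for field in fields)


for name, code in TABLE_1.items():
    print(name, space(code), bits(code, x=8, y=4, v=16))
```

With the assumed widths x=8, y=4, v=16, the 1-address code occupies
48 bits, the 2-address and 3-address codes 56 bits, and the stack
code 64 bits.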
In the Knuth study, approximately 70% of the remaining
assignment statements (those in which arithmetic was
performed) had the general form
A = B operator C.
The comparison of the four architectures for the compiled code
of such a statement, with "+" taken as the operator, is given in
Table 2.
Table 2: Statement A = B + C

Stack
    Addr(A)          x+v
    Value(B)         x+v
    Value(C)         x+v
    Add              x
    Store            x
    Pop              x
    Space: 6x+3v
    Time:  6t(x)+3f(v)

1-Address
    Load B           x+v
    Add C            x+v
    Store A          x+v
    Space: 3x+3v
    Time:  3t(x)+3f(v)

2-Address
    Load B in R1     x+v+y
    Add C to R1      x+v+y
    Store R1 in A    x+v+y
    Space: 3x+3v+3y
    Time:  3t(x)+3f(v)+3g(y)

3-Address
    Load B in R1     x+v+y
    Load C in R2     x+v+y
    R3 = R1 + R2     x+3y
    Store R3 in A    x+v+y
    Space: 4x+3v+6y
    Time:  4t(x)+3f(v)+6g(y)

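The stack column of Table 2 can be traced with a small interpreter.
This Python sketch assumes one plausible reading of the stack
operations: Store writes the top of the stack to the address lying
beneath it and leaves the value on the stack, and Pop then discards
that value. These semantics and the function run_stack are
assumptions, not a description of the KDF9 or Burroughs hardware.

```python
def run_stack(program, memory):
    """Execute (opcode, argument) pairs on an evaluation stack."""
    stack = []
    for op, arg in program:
        if op == "ADDR":
            stack.append(arg)              # push the address (name) of a variable
        elif op == "VALUE":
            stack.append(memory[arg])      # push the value of a variable
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            value = stack.pop()
            address = stack.pop()
            memory[address] = value
            stack.append(value)            # assumed: the value stays on the stack
        elif op == "POP":
            stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory, stack


# A = B + C, in the order of the stack column: six instructions,
# three of which carry a long address.
program = [("ADDR", "A"), ("VALUE", "B"), ("VALUE", "C"),
           ("ADD", None), ("STORE", None), ("POP", None)]
memory, stack = run_stack(program, {"B": 2, "C": 3})
print(memory["A"], stack)  # prints 5 []
```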
In Tables 1 and 2, it is apparent that the 1-address machine
is superior to the other architectures in terms of space and
time requirements for the two statements considered. These two
statements represent 90% of all statically measured assignment
statements in the Knuth study. It could be argued that the
remaining 10% of assignment statements are probably executed
with much greater frequency than their static fraction of all
code would indicate. As an example of one such more complex
statement that requires no use of temporaries to store intermediate
values during the computation, consider the statement of Table 3:
Table 3: Statement A = B1 + B2 + B3 + B4

Stack
    Addr(A)          x+v
    Value(B1)        x+v
    Value(B2)        x+v
    Add              x
    Value(B3)        x+v
    Add              x
    Value(B4)        x+v
    Add              x
    Store            x
    Pop              x
    Space: 10x+5v
    Time:  10t(x)+5f(v)

1-Address
    Load B1          x+v
    Add B2           x+v
    Add B3           x+v
    Add B4           x+v
    Store A          x+v
    Space: 5x+5v
    Time:  5t(x)+5f(v)

2-Address
    Load B1 in R1    x+v+y
    Add B2 to R1     x+v+y
    Add B3 to R1     x+v+y
    Add B4 to R1     x+v+y
    Store R1 in A    x+v+y
    Space: 5x+5v+5y
    Time:  5t(x)+5f(v)+5g(y)

3-Address
    Load B1 in R1    x+v+y
    Load B2 in R2    x+v+y
    R3 = R1 + R2     x+3y
    Load B3 in R1    x+v+y
    R3 = R3 + R1     x+3y
    Load B4 in R2    x+v+y
    R3 = R3 + R2     x+3y
    Store R3 in A    x+v+y
    Space: 8x+5v+14y
    Time:  8t(x)+5f(v)+14g(y)

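Tables 1 to 3 follow a regular pattern that can be stated for a sum
of n operands, A = B1 + ... + Bn. The closed forms in the Python
sketch below were obtained here by counting instructions in the
tables (n = 1, 2, and 4); they are a derived illustration rather
than formulas stated in the original text, and the function name
is hypothetical.

```python
def sum_statement_space(n):
    """Field counts for A = B1 + ... + Bn (n >= 1), per architecture.

    Stack:     1 Addr + n Values + (n-1) Adds + Store + Pop.
    1-address: 1 Load + (n-1) Adds + 1 Store.
    2-address: as 1-address, each instruction also naming one register.
    3-address: n Loads + (n-1) register Adds + 1 Store.
    """
    return {
        "stack": {"x": 2 * n + 2, "v": n + 1},
        "1-address": {"x": n + 1, "v": n + 1},
        "2-address": {"x": n + 1, "v": n + 1, "y": n + 1},
        "3-address": {"x": 2 * n, "v": n + 1, "y": 4 * n - 2},
    }


for n in (1, 2, 4):
    print(n, sum_statement_space(n))
```

For every n the 1-address code is the smallest, and the gap to the
stack and 3-address codes widens as n grows.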


From inspection of Tables 2 and 3, it is clear that increasing
the length of the assignment statement of Table 3 still leaves the
1-address machine in the position of consuming less memory space
and running as fast or faster than the other architectures. The
assignment statements in Tables 2 and 3 generate no intermediate
values that must be stored temporarily during a computation (5).
What happens when intermediate results are generated by a computation?
Consider the assignment statement of Table 4, that generates one
intermediate result. (Since the 3-address machine obviously will
not compete at this and further levels of complexity, we omit it
from subsequent tables and discussion.)
Table 4: Statement A = B*C + D*E

Stack
    Addr(A)          x+v
    Value(B)         x+v
    Value(C)         x+v
    Multiply         x
    Value(D)         x+v
    Value(E)         x+v
    Multiply         x
    Add              x
    Store            x
    Pop              x
    Space: 10x+5v
    Time:  10t(x)+5f(v)

1-Address
    Load B           x+v
    Mpy C            x+v
    Store T1         x+w
    Load D           x+v
    Mpy E            x+v
    Add T1           x+w
    Store A          x+v
    Space: 7x+5v+2w
    Time:  7t(x)+5f(v)+2g(w)

2-Address
    Load B in R1     x+v+y
    Mpy C by R1      x+v+y
    Load D in R2     x+v+y
    Mpy E by R2      x+v+y
    Add R2 to R1     x+2y
    Store R1 in A    x+v+y
    Space: 6x+5v+7y
    Time:  6t(x)+5f(v)+7g(y)

At this point, the 1-address architecture begins to lose ground.
In particular, we have to provide a short-address format for
storage of intermediate values in order to retain the 1-address
space advantage. We also have to note that a modified stack
architecture that provides a Store(A) instruction to store the top
of the stack in A and pop the stack definitely takes less space
and executes the statement in Table 4 in time comparable to the
1-address machine. Another point to note is that the 2-address
machine will execute this statement more rapidly than the others
on an expensive CPU in which g(y)=0 and only requires 7y-2x more
bits to encode than the modified stack machine.
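How temporaries enter the 1-address code can be sketched with a
small code generator. The Python below walks an expression tree and
stores the accumulator in a short-address temporary only when both
operands of an operator are themselves subexpressions. It is a
simple illustrative scheme, not necessarily the one used by any
compiler discussed here; the function names are hypothetical, only
the commutative operators + and * are handled, and variable names
are assumed not to begin with "T".

```python
def compile_1addr(target, expr):
    """expr is a variable name or a tuple (op, left, right), op in '+', '*'."""
    code = []
    depth = 0  # number of temporaries currently holding live values

    def gen(node):
        nonlocal depth
        if isinstance(node, str):
            code.append(("Load", node))
            return
        op, left, right = node
        mnemonic = {"+": "Add", "*": "Mpy"}[op]
        if isinstance(right, str):
            gen(left)
            code.append((mnemonic, right))
        elif isinstance(left, str):
            gen(right)                      # + and * commute
            code.append((mnemonic, left))
        else:
            gen(left)
            depth += 1
            temp = f"T{depth}"
            code.append(("Store", temp))    # save the left result
            gen(right)
            code.append((mnemonic, temp))
            depth -= 1                      # the temporary is free again

    gen(expr)
    code.append(("Store", target))
    return code


def space(code):
    """Field counts: temporaries use short (w) addresses, variables long (v)."""
    temps = sum(1 for _, operand in code if operand.startswith("T"))
    return {"x": len(code), "v": len(code) - temps, "w": temps}


table_4 = compile_1addr("A", ("+", ("*", "B", "C"), ("*", "D", "E")))
for instruction in table_4:
    print(*instruction)
print(space(table_4))  # prints {'x': 7, 'v': 5, 'w': 2}
```

The generated sequence is the 1-address column of Table 4: seven
instructions, two of which address the temporary T1.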
It becomes interesting then to see what happens when more
than one intermediate result is generated in the course of executing
an assignment statement. The example that we use is of an
assignment statement written so as to be impervious to the
complexity-reducing manipulations of an optimizing compiler (5).
Because of its special form, it tends to penalize the 1-address
architecture (there are no sequences such as A*B*C*D that would
favor the machine) in favor of other architectures.
Table 5: Statement A = ((B*C + D*E) * (F*G + H*J))

Stack
    Addr(A)          x+v
    Value(B)         x+v
    Value(C)         x+v
    Multiply         x
    Value(D)         x+v
    Value(E)         x+v
    Multiply         x
    Add              x
    Value(F)         x+v
    Value(G)         x+v
    Multiply         x
    Value(H)         x+v
    Value(J)         x+v
    Multiply         x
    Add              x
    Multiply         x
    Store            x
    Pop              x
    Space: 18x+9v
    Time:  18t(x)+9f(v)

1-Address
    Load B           x+v
    Mpy C            x+v
    Store T1         x+w
    Load D           x+v
    Mpy E            x+v
    Add T1           x+w
    Store T1         x+w
    Load F           x+v
    Mpy G            x+v
    Store T2         x+w
    Load H           x+v
    Mpy J            x+v
    Add T2           x+w
    Mpy T1           x+w
    Store A          x+v
    Space: 15x+9v+6w
    Time:  15t(x)+9f(v)+6g(w)

2-Address
    Load B in R1     x+v+y
    Mpy C by R1      x+v+y
    Load D in R2     x+v+y
    Mpy E by R2      x+v+y
    Add R2 to R1     x+2y
    Load F in R2     x+v+y
    Mpy G by R2      x+v+y
    Load H in R3     x+v+y
    Mpy J by R3      x+v+y
    Add R3 to R2     x+2y
    Mpy R2 by R1     x+2y
    Store R1 in A    x+v+y
    Space: 12x+9v+15y
    Time:  12t(x)+9f(v)+15g(y)

On the assumption that w = y, the 2-address machine requires 9y-3x
more bits to encode the Table 5 algorithm, and takes 9g(y)-3t(x)
microseconds more time to execute than does the 1-address version.
For a very inexpensive CPU, in which g(y) > t(x)/3, the 1-address
machine is slightly faster; but, for all other situations, the
2-address architecture is uniformly superior in speed of execution.
The space estimate only slightly favors the stack architecture for
this algorithm.
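The break-even condition g(y) = t(x)/3 can be checked numerically.
The Python sketch below takes the Table 5 totals with w set equal
to y (and g(w) equal to g(y)) and returns the extra bits and extra
time of the 2-address code relative to the 1-address code. The
function name and all numeric inputs are assumed for illustration.

```python
def table5_two_vs_one_address(x, y, t, g):
    """Return (extra_bits, extra_time) of the 2-address code over the
    1-address code for the Table 5 statement, assuming w = y."""
    one_address = {"x": 15, "v": 9, "y": 6}    # 15x + 9v + 6w with w = y
    two_address = {"x": 12, "v": 9, "y": 15}   # 12x + 9v + 15y
    dx = two_address["x"] - one_address["x"]   # -3
    dy = two_address["y"] - one_address["y"]   # +9
    extra_bits = dx * x + dy * y               # 9y - 3x
    extra_time = dx * t + dy * g               # 9g(y) - 3t(x); f(v) terms cancel
    return extra_bits, extra_time


# Expensive CPU: register references are free, 2-address is faster.
print(table5_two_vs_one_address(x=8, y=4, t=1.0, g=0.0))  # (12, -3.0)
# Break-even: g(y) equals t(x)/3.
print(table5_two_vs_one_address(x=8, y=4, t=3.0, g=1.0))  # (12, 0.0)
# Cheap CPU: g(y) exceeds t(x)/3, 1-address is faster.
print(table5_two_vs_one_address(x=8, y=4, t=3.0, g=2.0))  # (12, 9.0)
```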
As a final example, we can consider the statement in Table 6
that calls for only one intermediate value, but "favors" the
1-address machine by providing more than the minimum of operations
to force an intermediate value in the object code:

Table 6: Statement A = B*C*D + E*F

Stack
    Addr(A)          x+v
    Value(B)         x+v
    Value(C)         x+v
    Multiply         x
    Value(D)         x+v
    Multiply         x
    Value(E)         x+v
    Value(F)         x+v
    Multiply         x
    Add              x
    Store            x
    Pop              x
    Space: 12x+6v
    Time:  12t(x)+6f(v)

1-Address
    Load B           x+v
    Mpy C            x+v
    Mpy D            x+v
    Store T1         x+w
    Load E           x+v
    Mpy F            x+v
    Add T1           x+w
    Store A          x+v
    Space: 8x+6v+2w
    Time:  8t(x)+6f(v)+2g(w)

2-Address
    Load B in R1     x+v+y
    Mpy C by R1      x+v+y
    Mpy D by R1      x+v+y
    Load E in R2     x+v+y
    Mpy F by R2      x+v+y
    Add R2 to R1     x+2y
    Store R1 in A    x+v+y
    Space: 7x+6v+8y
    Time:  7t(x)+6f(v)+8g(y)

Here again, if we allow w = y (a limiting case), then the
1-address machine requires 6y-x fewer bits of space than the
2-address machine, an improvement in space usage over the Table 4
example, and the execution time comparisons improve slightly for
the case of the inexpensive CPU.
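The stack columns of the larger tables are simply the postfix form
of the statement, framed by an address push and a Store/Pop pair.
The following Python sketch generates that code from an expression
tree and counts its fields; the function names and the tuple
representation of expressions are assumptions made for
illustration.

```python
def compile_stack(target, expr):
    """expr is a variable name or a tuple (op, left, right), op in '+', '*'."""
    code = [("Addr", target)]

    def gen(node):
        if isinstance(node, str):
            code.append(("Value", node))
        else:
            op, left, right = node
            gen(left)
            gen(right)
            code.append(({"+": "Add", "*": "Multiply"}[op], None))

    gen(expr)                                   # postorder walk = postfix code
    code += [("Store", None), ("Pop", None)]
    return code


def space(code):
    """Field counts: every instruction has an opcode, operand pushes
    also carry a long address."""
    operands = sum(1 for _, arg in code if arg is not None)
    return {"x": len(code), "v": operands}


table_6 = compile_stack("A", ("+", ("*", ("*", "B", "C"), "D"), ("*", "E", "F")))
print(space(table_6))  # prints {'x': 12, 'v': 6}

# Comparing with the 1-address total 8x+6v+2w of Table 6 for assumed
# widths x=8, w=8, v=16: 12x+6v = 192 bits against 8x+6v+2w = 176 bits.
print(12 * 8 + 6 * 16, 8 * 8 + 6 * 16 + 2 * 8)  # prints 192 176
```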


Conclusions

This paper has demonstrated that, for all but the most
expensive CPU's, a CDC-style 3-address architecture is least
efficient for executing simple assignment statements, both in
terms of bits needed to encode the algorithms and times of
execution. A 2-address machine compares most advantageously in
a CPU having an operand cache, i.e., high-speed hardware for
prefetching operands. With such hardware, the 2-address machine
is uniformly as fast or faster than the competition. For a
simpler CPU, or one with an instruction cache, however, the
1-address architecture will outperform its rivals in speed of
execution, and will on the average require fewer bits to encode
its assignment statements. The stack machine, even in a modified
version that performs 1-address stores from the stack into main
memory, offers no advantages, either in execution time or in bits
required to encode algorithms.

References

1. Gear, C. William, Computer Organization and Programming,
   McGraw-Hill Book Co., New York, 1969.
2. Hager, K., "Die Bewertung einiger Rechnerkerntypen fuer
   das Verarbeiten von arithmetischen Ausdruecken,"
   Elektronische Rechenanlagen 13, 6 (Dec., 1971), pp. 241-249.
3. Knuth, D. E., "An Empirical Study of FORTRAN Programs,"
   Stanford University, Computer Science Department Report
   No. CS-186.
4. Lindsey, C. H., "Making the Hardware Suit the Language,"
   in ALGOL 68 Implementation (J. E. L. Peck, ed.), Amsterdam:
   North-Holland Publishing Co., 1971.
5. Schneider, Victor B., "On the Number of Registers Needed to
   Evaluate Arithmetic Expressions," BIT 11 (1971), 84-93.
6. Wade, Bradford W., A General-Purpose High-Level Language
   Machine for Minicomputers, Doctoral Dissertation, Purdue
   University, August, 1975.

