k Versus k+1 index registers and modifiable versus non-modifiable programs  by Mehlhorn, K. et al.
INFORMATION AND COMPUTATION 101, 123-129 (1992) 
k versus k + 1 Index Registers and Modifiable
versus Non-modifiable Programs*
K. MEHLHORN, W. J. PAUL, AND C. UHRIG 
Fachbereich 14 Informatik, Universität des Saarlandes,
6600 Saarbrücken, Germany
We compare Random Access Machines with k or k + 1 index registers and 
modifiable or non-modifiable programs and show for a simple problem of data 
transfer that the more powerful versions are more efficient. © 1992 Academic
Press, Inc.
I. INTRODUCTION 
Complexity theory has addressed a number of architectural questions in 
the past: the power of an additional tape or head in Turing machines, the 
power of multidimensional versus one-dimensional tapes, the power of two- 
way versus one-way input, the power of queues and stacks versus tapes 
(Aanderaa, 1974; Li and Vitányi, 1988; Maass et al., 1987; Paul, 1982, 1984). In
this note we consider more realistic machine models than Turing machines. 
More specifically, we show that an additional index register or the ability 
to modify its program strictly increases the power of Random Access 
Machines. 
We now give the precise statements of the results. The proofs are 
contained in the next section. 
A RAM is characterized by three parameters $r$, $k$, and $n$ and is denoted
$M_r(k, n)$. It consists of a CPU with $r$ registers $R_1, \ldots, R_r$ and a memory
$D[0, \ldots, 2^n - 1]$ of $2^n$ words. Each register and memory cell can hold a
value in $[0, \ldots, 2^n - 1]$. Out of the $r$ registers the first $k$ can be used as index
registers.
Table 1 gives the instructions that are available to access the memory, 
that is, to load data from and store data into the memory. Besides these 
load and store instructions we allow an arbitrary number of instructions 
affecting only the processing unit, i.e., data and index registers and 
program counter. (Note that we do not allow instructions which access the
memory and simultaneously change the CPU in a complex way, e.g.,
$R_i \leftarrow R_i + D[R_j]$. That is a limitation of our proof technique which we
address in the final section.)
* This research was supported by the DFG, SFB 124, Projects B2 and D4. 
0890-5401/92 $5.00
Copyright © 1992 by Academic Press, Inc.
All rights of reproduction in any form reserved.
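TABLE 1
Load and Store Instructions

Instruction               Effect
LOAD & INC i, j, c        $R_i \leftarrow D[R_j + c]$; $R_j \leftarrow R_j + 1$; $PC \leftarrow PC + 1$
LOAD i, j, c              $R_i \leftarrow D[R_j + c]$; $PC \leftarrow PC + 1$
STORE & INC i, j, c       $D[R_j + c] \leftarrow R_i$; $R_j \leftarrow R_j + 1$; $PC \leftarrow PC + 1$
STORE i, j, c             $D[R_j + c] \leftarrow R_i$; $PC \leftarrow PC + 1$

Note. $0 \le j \le k$, $1 \le i \le r$. $R_i$ is the $i$th register, $D$ is the memory, and $PC$ is the program
counter. $c$ is an integer. $R_0$ is a shorthand for 0.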
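The effect column of Table 1 can be read as a small interpreter. The following Python sketch is one possible rendering; the function name `step`, the list-based registers, and the string opcodes are assumptions made for illustration, and the program counter is left to the caller.

```python
# Hypothetical model of the four memory instructions of Table 1.
# R[0] stands for the constant 0, R[1..r] are the registers, D is the memory.

def step(op, i, j, c, R, D):
    """Execute one load/store instruction in place."""
    addr = R[j] + c                      # effective address R_j + c
    if op.startswith("LOAD"):
        R[i] = D[addr]                   # R_i <- D[R_j + c]
    elif op.startswith("STORE"):
        D[addr] = R[i]                   # D[R_j + c] <- R_i
    else:
        raise ValueError(f"unknown instruction {op!r}")
    if op.endswith("& INC"):
        R[j] += 1                        # R_j <- R_j + 1 (not meaningful for j = 0)

R = [0, 5, 0]                            # R_0 = 0, R_1 = 5, R_2 = 0
D = list(range(100, 116))                # D[a] = 100 + a
step("LOAD & INC", 2, 1, 3, R, D)        # R_2 <- D[5 + 3]; R_1 <- 6
step("STORE", 2, 1, 0, R, D)             # D[6] <- R_2
```

After the two calls, `R_2` holds the word that was at address 8, and that word has been written to address 6.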
We use the unit cost measure to compute the cost of the computation. 
In an Aiken machine, the program is fixed throughout execution; i.e., 
program and data are stored in different memories and LOADS and 
STORES only affect the data memory. In von Neumann machines there is 
no such restriction; more precisely, we assume that the instructions have 
numerical codes and that the additional instruction 
Instruction               Effect
LOADINSTR i, j            $P[i] \leftarrow R_j$
loads the instruction whose numerical code is contained in Rj into the ith 
cell of the program memory P. 
Programs for either machine type are written for fixed values of r and k, 
but must work for arbitrary values of n. 
We consider the following simple $k$-block transfer problem.
Given integers $N, b_0, \ldots, b_k$ in $D[0], \ldots, D[k + 1]$ such that $b_{i-1} + N \le b_i$
and $b_k + N \le 2^n$, move the contents of $D[b_0 + j]$ into $D[b_i + j]$ for
$0 \le j < N$ and $1 \le i \le k$. $N$ is called the size of the problem.
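As a reference point, the required input/output behaviour can be written down directly. The sketch below is a plain Python specification of the problem, not a RAM program; the name `block_transfer` is hypothetical, and the non-strict reading of the inequalities is an assumption.

```python
def block_transfer(D, k):
    """Copy block 0 into blocks 1..k, with D[0] = N and D[1 + i] = b_i."""
    N = D[0]
    b = D[1:k + 2]                       # b[0], ..., b[k]
    # Preconditions of the problem: blocks are disjoint and fit in memory.
    assert all(b[i - 1] + N <= b[i] for i in range(1, k + 1))
    assert b[k] + N <= len(D)
    for j in range(N):
        for i in range(1, k + 1):
            D[b[i] + j] = D[b[0] + j]
    return D

# Example with k = 2, N = 3 and a 16-word memory.
D = [3, 4, 8, 12] + [70, 71, 72, 0] + [0] * 8
block_transfer(D, 2)
```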
THEOREM 1. (a) Let $c$, $k$, and $r$ be positive integers with $k + 3 \le r$. Then
there is an Aiken program $A_c$ for $M_r(k + 1, *)$ which for all $n$ solves any
$k$-block transfer problem of size $N = c \cdot n$ in time $k + 3 + (k + 1 + 2/c) \cdot N$ on
the Aiken machine $M_r(k + 1, n)$.
(b) Let $c$ and $k$ be positive integers. Then there are a constant $d$ and
a von Neumann program $V_c$ for $M_2(1, *)$ which for all $n$ solves any $k$-block
transfer problem of size $N = c \cdot n$ in time $d \cdot c \cdot k + (k + 1 + 3/c) \cdot N$ on the von
Neumann machine $M_2(1, n)$.
THEOREM 2. For all $r$ there is a $\rho(r) > 0$ such that for all $c$ and $k \le r$ and
any Aiken program $A$ for $M_r(k, *)$ the following holds: for all sufficiently
large $n$ there is a $k$-block transfer problem of size $N = c \cdot n$ for which $A$ on
the Aiken machine $M_r(k, n)$ requires time at least $(k + 1 + \rho(r)) \cdot N$.
Theorems 1 and 2 together separate Aiken and von Neumann machines
and also Aiken machines with different numbers of index registers. Let $k$
and $r \ge k + 3$ be fixed. Choose $c$ such that $3/c < \rho(r)$. In this situation
Theorem 2 implies for all large $n$ the existence of a $k$-block transfer
problem of size $c \cdot n$ which takes at least $(k + 1 + \rho(r)) \cdot N$ on the Aiken
machine $M_r(k, n)$. However, by Theorem 1, on the Aiken machine
$M_r(k + 1, n)$ this problem can be solved in time $O(k) + (k + 1 + 2/c) \cdot N$ and
on the von Neumann machine $M_2(1, n)$ this problem can be solved in time
$O(c \cdot k) + (k + 1 + 3/c) \cdot N$, which in both cases is strictly less for $n$
sufficiently large.
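The separation can be checked numerically. The sketch below evaluates the three bounds with exact rational arithmetic, using $\rho(r) = 1/(8r^2)$ from the proof of Theorem 2. The function names are hypothetical, and the constant `d` of Theorem 1(b) is not specified in the text, so the value 10 is purely an assumed placeholder.

```python
from fractions import Fraction

def rho(r):
    return Fraction(1, 8 * r * r)        # constant obtained in the proof of Theorem 2

def aiken_upper(k, c, n):
    # Theorem 1(a): k + 3 + (k + 1 + 2/c) * N with N = c * n.
    return k + 3 + (k + 1 + Fraction(2, c)) * c * n

def von_neumann_upper(k, c, n, d):
    # Theorem 1(b): d*c*k + (k + 1 + 3/c) * N; d is an unspecified constant.
    return d * c * k + (k + 1 + Fraction(3, c)) * c * n

def aiken_lower(k, r, c, n):
    # Theorem 2: (k + 1 + rho(r)) * N on the machine with only k index registers.
    return (k + 1 + rho(r)) * c * n

k, r = 2, 5                              # r >= k + 3
c = 24 * r * r + 1                       # smallest integer c with 3/c < rho(r)
gap_aiken = aiken_lower(k, r, c, 10) - aiken_upper(k, c, 10)
gap_vn = aiken_lower(k, r, c, 3_000_000) - von_neumann_upper(k, c, 3_000_000, d=10)
```

With these parameters the Aiken gap is already positive at $n = 10$, while the von Neumann gap only becomes positive for much larger $n$ because of the additive term $d \cdot c \cdot k$; this is the role of "for $n$ sufficiently large".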
II. PROOFS 
Proof of Theorem 1. (a) The program is quite straightforward. We use
$k + 1$ index registers $R_1, \ldots, R_{k+1}$ and two additional registers $R_{r-1}$ and $R_r$.
We step through the $k + 1$ blocks in lock-step fashion and use the $i$th index
register to index the $(i - 1)$th block, $1 \le i \le k + 1$. In each iteration we first
load register $R_{r-1}$ from block $B_0$ and then store the register in all other
blocks; i.e., each iteration takes $k + 1$ steps. Register $R_r$ is loaded with $N$
initially and is decremented by $c$ after $c$ iterations. It is also tested for zero
after $c$ iterations. Thus $c$ words are transferred in $c(k + 1) + 2$ time units.
The details are as follows:

    (1)                   LOAD         r, 0, 0            $R_r \leftarrow D[0]$ ($D[0] = N$)
    (2)                   LOAD         1, 0, 1
    ...                                                   $R_i \leftarrow D[i]$ for $1 \le i \le k + 1$
    (k + 2)               LOAD         k + 1, 0, k + 1
    (k + 3)               LOAD & INC   r - 1, 1, 0
    (k + 4)               STORE & INC  r - 1, 2, 0
    ...                                                   move one word
    (2k + 3)              STORE & INC  r - 1, k + 1, 0
    (2k + 4)              LOAD & INC   r - 1, 1, 0
    ...                                                   move one word
    ...
    (k + 2 + (k + 1)c)    STORE & INC  r - 1, k + 1, 0    move one word
    (k + 3 + (k + 1)c)    DEC          r, c               $R_r \leftarrow R_r - c$
                          JNZ          r, k + 3           goto $k + 3$ if $R_r > 0$
                          STOP

The program above clearly runs in time $k + 3 + (k + 1 + 2/c) \cdot N$.
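The running time of the program of part (a) can be confirmed by simulation under the unit cost measure. The following Python sketch builds the instruction list for given $k$, $r$, $c$ and executes it. The names `build_A` and `run`, the tuple encoding of instructions, and the unbounded Python integers are assumptions of this sketch, not part of the machine model.

```python
def build_A(k, r, c):
    """Instruction list of A_c; list entry p holds instruction number p + 1."""
    prog = [("LOAD", r, 0, 0)]                                # R_r <- D[0] = N
    prog += [("LOAD", i, 0, i) for i in range(1, k + 2)]      # R_i <- D[i] = b_{i-1}
    for _ in range(c):                                        # c unrolled word moves
        prog.append(("LOAD & INC", r - 1, 1, 0))
        prog += [("STORE & INC", r - 1, i, 0) for i in range(2, k + 2)]
    prog += [("DEC", r, c), ("JNZ", r, k + 3), ("STOP",)]
    return prog

def run(prog, D, r):
    """Execute prog on memory D and return the number of executed instructions."""
    R = [0] * (r + 1)                                         # R[0] is the constant 0
    pc, steps = 1, 0
    while True:
        ins = prog[pc - 1]
        steps += 1
        op = ins[0]
        if op == "STOP":
            return steps
        if op == "DEC":
            R[ins[1]] -= ins[2]
            pc += 1
        elif op == "JNZ":
            pc = ins[2] if R[ins[1]] > 0 else pc + 1
        else:
            _, i, j, const = ins
            if op.startswith("LOAD"):
                R[i] = D[R[j] + const]
            else:
                D[R[j] + const] = R[i]
            if op.endswith("& INC"):
                R[j] += 1
            pc += 1

k, r, c, n = 2, 5, 3, 4
N = c * n
D = [0] * 64
D[0], D[1:4] = N, [4, 16, 28]                                 # N, b_0, b_1, b_2
D[4:16] = range(100, 112)                                     # contents of block 0
steps = run(build_A(k, r, c), D, r)
```

For this instance the count is $k + 3 + (k + 1) N + 2N/c = 5 + 36 + 8 = 49$ instructions, in agreement with the stated bound.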
(b) $V_c$ uses a single index register $R_1$ and one additional register $R_2$.
The main loop is very similar to the main loop of program $A_c$ of part (a).
If $R_1$ has a value between 0 and $N - 1$, say $x$, then one word can be moved
by

    LOAD         2, 1, $b_0$        $R_2 \leftarrow D[b_0 + x]$
    STORE        2, 1, $b_1$        $D[b_1 + x] \leftarrow R_2$
    ...
    STORE        2, 1, $b_{k-1}$    $D[b_{k-1} + x] \leftarrow R_2$
    STORE & INC  2, 1, $b_k$        $D[b_k + x] \leftarrow R_2$; $x \leftarrow x + 1$

and $c$ copies of this piece of code will move $c$ consecutive words and also
increment $R_1$ by $c$. There is a difficulty, however. The quantities $b_0, \ldots, b_k$
are not known when the program is written, but are only known when the
program is started. The solution is, of course, that a von Neumann
machine can use its ability to modify the program in order to generate the
code listed above. Time $d \cdot c \cdot k$ for some constant $d$ certainly suffices to do
so. Having moved $c$ words, we subtract $N$ ($= D[0]$) from $R_1$, test for zero,
restore the old value of $R_1$, and repeat. Thus $c$ words can be transferred in
$c \cdot (k + 1) + 3$ time units. ∎
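The idea of part (b) is that the block addresses become instruction constants once the input is known. The Python sketch below mimics this by generating the loop body at run time and then executing it with one index register and one data register. The names `generate_body` and `run_V` are hypothetical, the generation cost $d \cdot c \cdot k$ is not modelled, and the three loop-control steps per round are simply charged as a lump sum following the accounting in the proof.

```python
def generate_body(D, k, c):
    """Run-time code generation: c copies of the word move with b_0..b_k as constants."""
    b = D[1:k + 2]
    one_word = [("LOAD", 2, 1, b[0])]                         # R_2 <- D[b_0 + x]
    one_word += [("STORE", 2, 1, b[i]) for i in range(1, k)]  # D[b_i + x] <- R_2
    one_word.append(("STORE & INC", 2, 1, b[k]))              # last store also does x <- x + 1
    return one_word * c

def run_V(D, k, c):
    """Execute the generated loop; returns the step count of the main loop only."""
    N = D[0]
    body = generate_body(D, k, c)
    R = [0, 0, 0]                                             # R_0 = 0, R_1 = x, R_2 = data
    steps = 0
    while R[1] < N:                                           # assumes N is a multiple of c
        for op, i, j, const in body:
            if op == "LOAD":
                R[i] = D[R[j] + const]
            else:
                D[R[j] + const] = R[i]
            if op.endswith("& INC"):
                R[j] += 1
            steps += 1
        steps += 3                                            # subtract N, test, restore R_1
    return steps

k, c, n = 2, 3, 4
N = c * n
D = [0] * 64
D[0], D[1:4] = N, [4, 16, 28]
D[4:16] = range(100, 112)
steps = run_V(D, k, c)
```

Here the main loop takes $(k + 1) N + 3N/c = 36 + 12 = 48$ steps, matching the term $(k + 1 + 3/c) \cdot N$ of Theorem 1(b).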
Proof of Theorem 2. We first give an informal outline of the proof. 
Clearly, each word $D[b_0 + j]$, $0 \le j < N$, has to be loaded once and stored
$k$ times for a total of $(k + 1) \cdot N$ time steps. We call these steps standard. In
order to count non-standard steps we divide the computation into intervals
of length $D := 1 + r(2k + 1)$ (a number, not to be confused with the memory
$D[\cdot]$) and assume that the majority of them do not
contain a non-standard step; in the other case, the lower bound follows
immediately. We then identify for each such interval a cell $D[b_0 + j]$ such
that this cell is loaded in the interval, its contents are not stored in the
CPU at the end of the interval, and yet they were not stored in all the $k$
other blocks by the end of the interval. Thus the value $D[b_0 + j]$ must be
reestablished in the CPU at some later time and this is necessarily done by 
a non-standard step. In this way, we associate a non-standard step with 
every interval and obtain a lower bound. The details follow. 
Let $A$ be any Aiken program for $M_r(k, *)$ which solves the $k$-block
transfer problem. Let $M$ be the maximal constant that appears in $A$; let
$b_0 > k + 1$, $b_i > b_{i-1} + N + M$ for $i = 1, \ldots, k$ and $b_k + N \le 2^n$. We choose the
contents of $D[b_0], \ldots, D[b_0 + N - 1]$ so that
$D[b_0 + j] \notin [b_i - M, \ldots, b_i + N]$ for $0 \le j \le N - 1$, $0 \le i \le k$,
and
$D[b_0 + j] \ne D[b_0 + j']$ for $0 \le j < j' \le N - 1$.
Let $B_0^j$ be the initial contents of $D[b_0 + j]$, let $B_0 = \{B_0^0, \ldots, B_0^{N-1}\}$, and
let $T$ be the computation time.
A step $t$, $1 \le t \le T$, is called standard if the instruction executed at time $t$
is a load from $D[b_0 + j]$ for the first time in the computation for some
$j$, $0 \le j < N$, or if it is the first store of the value $B_0^j$ into $D[b_i + j]$ for some
$i$, $1 \le i \le k$, and $j$, $0 \le j < N$. There are clearly exactly $(k + 1) \cdot N$ standard
steps in the computation.
Our goal is to prove a lower bound on the number of non-standard
steps. To this end we divide the computation into intervals of length
$D := 1 + r \cdot (2k + 1)$. Let interval $I_\ell$ comprise steps $(\ell - 1) D + 1$ to
$\min(\ell \cdot D, T)$, where $1 \le \ell \le p := \lceil T/D \rceil$. Let $d_\ell$ be the number of non-
standard steps in $I_\ell$. Then
$T \ge (k + 1) \cdot N + \sum_\ell d_\ell$.
An interval has type I if it contains at least one non-standard instruction 
and it has type II otherwise. We now distinguish cases according to the 
type of the majority of the intervals. 
Case I. At least $\lceil p/2 \rceil$ intervals have type I. Then $\sum_\ell d_\ell \ge p/2$ and hence
$T \ge (k + 1) \cdot N + T/(2D) \ge (k + 1 + (k + 1)/(2D)) \cdot N$
$\ge (k + 1 + 1/(4r)) \cdot N$.
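The chain of inequalities in Case I can be spelled out as follows; the only facts used are $p = \lceil T/D \rceil \ge T/D$, $T \ge (k + 1) N$, and $D = 1 + r(2k + 1)$.

```latex
\begin{align*}
T &\ge (k+1)N + \sum_{\ell} d_\ell
   \ge (k+1)N + \frac{p}{2}
   \ge (k+1)N + \frac{T}{2D}
   \ge (k+1)N + \frac{(k+1)N}{2D}, \\
\frac{k+1}{2D} \ge \frac{1}{4r}
  &\iff 4r(k+1) \ge 2 + 2r(2k+1)
   \iff 4rk + 4r \ge 2 + 4rk + 2r
   \iff r \ge 1 .
\end{align*}
```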
Case II. At least $\lceil p/2 \rceil$ intervals have type II. Then there are at least
$\lceil p/2 \rceil - 1$ intervals of type II which have length $D$ (since only the last
interval may have length $< D$). Let $I$ be any such interval.
LEMMA 1. There is an index $j$, $0 \le j \le N - 1$, such that:
(1) the standard load of $B_0^j$ is contained in $I$;
(2) the standard store into $D[b_i + j]$ is not contained in $I$ for some $i$;
(3) no register of the CPU contains $B_0^j$ at the end of the interval $I$.
Proof. Let $i$, $0 \le i \le k$, be such that no register $R_1, \ldots, R_k$ contains a
value $v$ with $v + m \in [b_i, \ldots, b_i + N - 1]$ for some $m$, $0 \le m \le M$, at the
beginning of interval $I$. Such an $i$ exists: since $b_i > b_{i-1} + N + M$, the ranges
$[b_i - M, \ldots, b_i + N - 1]$ are pairwise disjoint, so each of the $k$ index registers
can be within reach of at most one of the $k + 1$ blocks.
We show first that $I$ contains no standard step involving $B_i$. Assume
otherwise. Then there must be a minimal $t \in I$ such that after step $t$ some
$R_l$, $1 \le l \le k$, contains a value $v$ such that
$v + m \in [b_i, \ldots, b_i + N - 1]$ for some $m$, $0 \le m \le M$.
The $t$th step either loaded some $B_0^p$ into $R_l$, i.e., $v = B_0^p$, or increased $R_l$ by
one, or did not change $R_l$. The first alternative contradicts our choice of
the $B_0^p$'s, and the third alternative contradicts the definition of $t$. In the
second alternative we must have
$v + m - 1 \notin [b_i, \ldots, b_i + N - 1]$
for all $m$, $0 \le m \le M$, by the definition of $t$, and hence
$v + M = b_i$.
Also, $R_l$ was used as an index register in the $t$th instruction, and hence
$v + m' - 1 \in [b_h, \ldots, b_h + N - 1]$
for some $m'$, $0 \le m' \le M$, and some $h \ne i$; this contradicts our
choice of the $b$'s. In either case we have derived a contradiction and hence
there cannot be any standard step involving $B_i$.
We now turn to the existence of index $j$. To this end we distinguish
the cases $i = 0$ and $i \ne 0$.
Assume first that $i = 0$. Since all instructions in $I$ are standard and none
of them involves $B_0$, all instructions in $I$ are (standard) store instructions.
We say that a register $R_j$, $1 \le j \le k$, is used as an index register at step $t \in I$
if an instruction Inst $*, j, *$ is executed at step $t$. In this case the value of
$R_j$ lies in
$[b_l - M, \ldots, b_l + N]$
for some $l$, $1 \le l \le k$, before and after the $t$th instruction, and hence
not in $B_0$. Thus if $I$ contains an instruction STORE $j, *, *$ or
STORE & INC $j, *, *$, then $R_j$ cannot be used as an index register in $I$ and
hence the value of $R_j$ does not change during $I$. Thus $I$ can contain at most
$r \cdot k < D$ instructions, a contradiction.
Assume next that $i > 0$. Let $m$ be the number of LOAD instructions in
$I$. As in case $i = 0$ we can argue that there are at most $(r + m) \cdot k$ STORE
instructions in $I$. Since $m + (r + m) \cdot k \ge D$ we conclude $m \ge r + 1$. Then one
of the $m$ words loaded from $B_0$ cannot be contained in the CPU at the end
of $I$. The index of this word is the desired $j$. ∎
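The step from $m + (r + m) \cdot k \ge D$ to $m \ge r + 1$ is a short calculation with $D = 1 + r(2k + 1)$, using that $m$ is an integer:

```latex
\begin{align*}
m + (r + m)k \ge 1 + r(2k+1)
  &\iff m(k+1) \ge 1 + r(2k+1) - rk = 1 + r(k+1) \\
  &\iff m \ge r + \frac{1}{k+1}
   \;\Longrightarrow\; m \ge r + 1 .
\end{align*}
```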
For each but the last interval $I$ of type II let $j(I)$ be an index which
satisfies the properties of Lemma 1, and let $t(I) > \max I$ be a step such that
$B_0^{j(I)}$ is contained in a register of the CPU after step $t(I)$, but not before
step $t(I)$. The step $t(I)$ clearly exists, since not all standard stores of $B_0^{j(I)}$
have taken place by the end of interval $I$ and since no register of the CPU
contains the value $B_0^{j(I)}$ at the end of interval $I$. Also the step $t(I)$ must be
a non-standard step (by an argument similar to the proof of Lemma 1) and
there can be at most $r$ distinct intervals of type II which have the same
value $t(I)$, since the CPU has only $r$ registers. (It is tempting to claim that
there can be at most one interval. This is fallacious, however, since we
allowed instructions which change all registers of the CPU and hence it is
possible that $r$ values appear at the same step.) Thus there are at least
$(\lceil p/2 \rceil - 1)/r$ non-standard steps. For $p > 2$, this is $\ge \lceil p/4 \rceil / r$ and hence
$T \ge (k + 1 + (k + 1)/(4Dr)) \cdot N \ge (k + 1 + 1/(8r^2)) \cdot N$.
With $\rho(r) = 1/(8r^2)$ this completes the proof of Theorem 2.
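For completeness, the final estimate of Case II follows the same pattern as Case I, again with $p \ge T/D$, $T \ge (k + 1) N$, and $D = 1 + r(2k + 1)$:

```latex
\begin{align*}
T &\ge (k+1)N + \frac{\lceil p/4 \rceil}{r}
   \ge (k+1)N + \frac{T}{4Dr}
   \ge (k+1)N + \frac{(k+1)N}{4Dr}, \\
\frac{k+1}{4Dr} \ge \frac{1}{8r^2}
  &\iff 2r(k+1) \ge 1 + r(2k+1)
   \iff 2rk + 2r \ge 1 + 2rk + r
   \iff r \ge 1 .
\end{align*}
```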
III. DISCUSSION 
A preliminary version of this paper was presented at ICALP 89 
(Mehlhorn and Paul, 1989). In that paper only the cases of one and two 
registers were dealt with. Even for that case the proof given was 
considerably more complex than the present proof, and hence the present 
paper may be considered progress. However, we have to admit that our 
result is still quite weak since our proof makes essential use of the fact that 
a LOAD or STORE affects the CPU in a very limited way. In particular, 
our proof technique cannot handle the situation where a LOAD may affect 
the entire CPU, i.e., the content of the CPU after a LOAD is a function 
of the previous content and the word loaded. In this general situation the 
problem remains open. 
RECEIVED April 17, 1989; FINAL MANUSCRIPT RECEIVED December 18, 1990 
REFERENCES 
AANDERAA, S. O. (1974), On k-tape versus (k - 1)-tape real-time computation, in “Complexity
of Computation,” pp. 75-96.
LI, M., AND VITÁNYI, P. M. B. (1988), Tape versus queue and stacks: The lower bounds,
Inform. and Comput. 78, 56-85.
MAASS, W., SCHNITGER, G., AND SZEMERÉDI, E. (1987), Two tapes are better than one for
off-line Turing machines, in “Proc. 19th ACM Symposium on Theory of Computing,”
pp. 94-100.
MEHLHORN, K., AND PAUL, W. J. (1989), Two versus one index register and modifiable versus
non-modifiable programs, in “Proc. 16th International Colloquium on Automata,
Languages and Programming, LNCS 372,” pp. 603-609.
PAUL, W. J. (1982), On-line simulation of k + 1 tapes by k tapes requires non-linear time,
Inform. Contr. 53, 1-8.
PAUL, W. J. (1984), On heads versus tapes, Theoret. Comput. Sci. 28, 1-12.
