I. INTRODUCTION
Complexity theory has addressed a number of architectural questions in the past: the power of an additional tape or head in Turing machines, the power of multidimensional versus one-dimensional tapes, the power of two-way versus one-way input, and the power of queues and stacks versus tapes (Aanderaa, 1979; Li and Vitanyi, 1988; Maass, 1987; Paul, 1982, 1984). In this note we consider more realistic machine models than Turing machines. More specifically, we show that an additional index register, or the ability of a machine to modify its own program, strictly increases the power of Random Access Machines.
We now give the precise statements of the results. The proofs are contained in the next section.
A RAM is characterized by three parameters $r$, $k$, and $n$ and is denoted $M_r(k, n)$. It consists of a CPU with $r$ registers $R_1, \ldots, R_r$ and a memory $D[0, \ldots, 2^n - 1]$ of $2^n$ words. Each register and memory cell can hold a value in $[0, \ldots, 2^n - 1]$. Of the $r$ registers, the first $k$ can be used as index registers. Table 1 gives the instructions that are available to access the memory, that is, to load data from and store data into the memory. Besides these load and store instructions we allow an arbitrary number of instructions affecting only the processing unit, i.e., data and index registers and program counter. (Note that we do not allow instructions which access the memory and simultaneously change the CPU in a complex way, e.g., $R_i \leftarrow R_i + D[R_j]$. That is a limitation of our proof technique which we address in the final section.)
Note. $0 \le j \le k$, $1 \le i \le r$. $R_i$ is the $i$th register, $D$ is the memory, and PC is the program counter. $c$ is an integer. $R_0$ is a shorthand for 0.
We use the unit cost measure to compute the cost of the computation. In an Aiken machine, the program is fixed throughout execution; i.e., program and data are stored in different memories and LOADs and STOREs only affect the data memory. In von Neumann machines there is no such restriction; more precisely, we assume that the instructions have numerical codes and that the additional instruction
THEOREM 2. For all $r$ there is a $p(r) > 0$ such that for all $c$ and $k \le r$ and any Aiken program $A$ for $M_r(k, *)$ the following holds: for all sufficiently large $n$ there is a $k$-block transfer problem of size $N = c \cdot n$ for which $A$ on the Aiken machine $M_r(k, n)$ requires time at least $(k + 1 + p(r)) \cdot N$.
Theorems 1 and 2 together separate Aiken and von Neumann machines and also Aiken machines with different numbers of index registers. Let $k$ and $r \ge k + 3$ be fixed. Choose $c$ such that $3/c < p(r)$. In this situation Theorem 2 implies, for all large $n$, the existence of a $k$-block transfer problem of size $N = c \cdot n$ which takes time at least $(k + 1 + p(r)) \cdot N$ on the Aiken machine $M_r(k, n)$. However, by Theorem 1, on the Aiken machine $M_r(k + 1, n)$ this problem can be solved in time $O(k) + (k + 1 + 2/c) \cdot N$, and on the von Neumann machine $M_2(1, n)$ it can be solved in time $O(c \cdot k) + (k + 1 + 3/c) \cdot N$, which in both cases is strictly less for $n$ sufficiently large.
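The gap between the two bounds can be illustrated with a small simulation. The following is only a sketch under stated assumptions: it counts unit-cost steps for a lock-step copy loop of the kind described in the proof of Theorem 1(a) in the next section, and compares the count with the lower bound of Theorem 2 using $p(r) = 1/(8r^2)$ from Section II. The function names, the setup cost of $k + 2$ steps (standing in for the $O(k)$ term), and the requirement that $c$ divides $N$ are assumptions made for illustration, not part of the paper.

```python
from fractions import Fraction


def block_transfer_steps(memory, bases, N, c):
    """Copy block B_0 into B_1..B_k; return the number of unit-cost steps.

    bases[i] is the start address b_i of block B_i (hypothetical layout).
    The loop body is counted as if it were unrolled c times, with one
    decrement and one zero test of the counter per c iterations.
    """
    if N % c != 0:
        raise ValueError("this sketch assumes that c divides N")
    k = len(bases) - 1
    index = list(bases)      # one index register per block
    counter = N              # loop counter (register R_r in the proof)
    steps = (k + 1) + 1      # assumed setup cost: k+1 index registers + counter
    while counter != 0:
        for _ in range(c):
            word = memory[index[0]]       # LOAD from B_0 into a data register
            index[0] += 1
            steps += 1
            for i in range(1, k + 1):     # STORE into each of B_1..B_k
                memory[index[i]] = word
                index[i] += 1
                steps += 1
        counter -= c         # decrement by c ...
        steps += 1
        steps += 1           # ... and test for zero, once per c iterations
    return steps


def theorem2_lower_bound(k, r, N):
    """Lower bound (k + 1 + p(r)) * N with p(r) = 1/(8 r^2), as an exact fraction."""
    return (k + 1 + Fraction(1, 8 * r * r)) * N
```

For example, with $k = 2$, $r = 5$, $c = 601$ (so that $3/c < 1/200 = p(r)$) and $n = 15$, the block size is $N = 9015$ and all three blocks fit into $2^{15}$ words. The loop then uses $4 + 15 \cdot (601 \cdot 3 + 2) = 27079$ counted steps, which is below the value $27090.075$ of the lower-bound formula. Note that Theorem 2 only asserts the existence of a hard choice of block addresses, so the comparison illustrates the arithmetic of the separation rather than a specific hard instance.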
II. PROOFS
Proof of Theorem 1. (a) The program is quite straightforward. We use $k + 1$ index registers $R_1, \ldots, R_{k+1}$ and two additional registers $R_{r-1}$ and $R_r$. We step through the $k + 1$ blocks in lock-step fashion and use the $i$th index register to index the $(i-1)$th block, $1 \le i \le k + 1$. In each iteration we first load register $R_{r-1}$ from block $B_0$ and then store the register in all other blocks; i.e., each iteration takes $k + 1$ steps. Register $R_r$ is loaded with $N$ initially and is decremented by $c$ after $c$ iterations. It is also tested for zero after $c$ iterations. Thus $c$ words are transferred in $c(k + 1) + 2$ time units. The details are as follows:
Proof of Theorem 2. We first give an informal outline of the proof. Clearly, each word $D[b_0 + j]$, $0 \le j < N$, has to be loaded once and stored $k$ times for a total of $(k + 1) \cdot N$ time steps. We call these steps standard. In order to count non-standard steps we divide the computation into intervals of length $D := 1 + r(2k + 1)$ and assume that the majority of them do not contain a non-standard step; in the other case, the lower bound follows immediately. We then identify for each such interval a cell $D[b_0 + j]$ such that this cell is loaded in the interval, its contents are not stored in the CPU at the end of the interval, and yet they were not stored in all the $k$ other blocks by the end of the interval. Thus the value $D[b_0 + j]$ must be reestablished in the CPU at some later time, and this is necessarily done by a non-standard step. In this way, we associate a non-standard step with every interval and obtain a lower bound. The details follow.
An interval has type I if it contains at least one non-standard instruction and it has type II otherwise. We now distinguish cases according to the type of the majority of the intervals.
Let $i$, $0 \le i \le k$, be such that no register $R_1, \ldots, R_k$ contains a value $v$ with $v + m \in [b_i, \ldots, b_i + N - 1]$ for some $m$, $0 \le m \le M$, at the beginning of interval $I$.
We show first that $I$ contains no standard step involving $B_i$. Assume otherwise. Then there must be a minimal $t \in I$ such that after step $t$ some $R_l$, $1 \le l \le k$, contains a value $v$ such that Also, since $R_l$ was used as an index register in the $t$th instruction and hence $v + m' - 1 \in [b_h, \ldots, b_h + N - 1]$ for some $m'$, $0 \le m' \le M$, and some $h \ne i$, this is a contradiction to our choice of the $b$'s. In either case we have derived a contradiction and hence there cannot be any standard step involving $B_i$. We now turn to the existence of index $j$. For this matter we distinguish the cases $i = 0$ and $i \ne 0$.
Assume first that $i = 0$. Since all instructions in $I$ are standard and none of them involves $B_0$, all instructions in $I$ are (standard) store instructions. We say that a register $R_j$, $1 \le j \le k$, is used as an index register at step $t \in I$ if an instruction Inst *, j, * is executed at step $t$. In this case the value of $R_j$ lies in $[b_l - M, \ldots, b_l + N]$ for some $l$, $1 \le l \le k$, before and after the $t$th instruction, and hence not in $B_0$. Thus if $I$ contains an instruction STORE j, *, * or STORE & INC j, *, *, then $R_j$ cannot be used as an index register in $I$ and hence the value of $R_j$ does not change during $I$. Thus $I$ can contain at most $r \cdot k < D$ instructions, a contradiction.
Assume next that $i > 0$. Let $m$ be the number of LOAD instructions in $I$. As in the case $i = 0$ we can argue that there are at most $(r + m) \cdot k$ STORE instructions in $I$. Since $m + (r + m) \cdot k \ge D$ we conclude $m \ge r + 1$. Then one of the $m$ words loaded from $B_0$ cannot be contained in the CPU at the end of $I$. The index of this word is the desired $j$.
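The counting step in the case $i > 0$ can be spelled out as follows; this is a worked version of the inequality used there, relying only on the interval length $D = 1 + r(2k + 1)$ and the fact that $m$ is an integer.

```latex
\begin{align*}
m + (r + m)\,k \;\ge\; D = 1 + r(2k + 1)
  &\;\Longrightarrow\; m(k + 1) + rk \;\ge\; 1 + 2rk + r \\
  &\;\Longrightarrow\; m(k + 1) \;\ge\; 1 + r(k + 1) \\
  &\;\Longrightarrow\; m \;\ge\; r + \frac{1}{k + 1} \\
  &\;\Longrightarrow\; m \;\ge\; r + 1 .
\end{align*}
```

Since the CPU has only $r$ registers, at least one of the $m \ge r + 1$ loaded words is no longer held in the CPU at the end of the interval.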
For each but the last interval $I$ of type II let $j(I)$ be an index which satisfies the properties of Lemma 1, and let $t(I) > \max I$ be a step such that the value $D[b_0 + j(I)]$ is contained in a register of the CPU after step $t(I)$, but not before step $t(I)$. The step $t(I)$ clearly exists, since not all standard stores of $D[b_0 + j(I)]$ have taken place by the end of interval $I$ and since no register of the CPU contains the value $D[b_0 + j(I)]$ at the end of interval $I$. Also, the step $t(I)$ must be a non-standard step (by an argument similar to the proof of Lemma 1), and there can be at most $r$ distinct intervals of type II which have the same value $t(I)$, since the CPU has only $r$ registers. (It is tempting to claim that there can be at most one interval. This is fallacious, however, since we allowed instructions which change all registers of the CPU and hence it is possible that $r$ values appear at the same step.) Thus there are at least $(\lceil p/2 \rceil - 1)/r$ non-standard steps. For $p > 2$, this is $\ge \lceil p/4 \rceil / r$ and hence $T \ge (k + 1 + (k + 1)/(4Dr)) \cdot N \ge (k + 1 + 1/(8r^2)) \cdot N$.
With $p(r) = 1/(8r^2)$ this completes the proof of Theorem 2.
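The last inequality in the chain for $T$ can be checked directly from the definition $D = 1 + r(2k + 1)$; the following short verification shows that it holds for every $r \ge 1$.

```latex
\begin{align*}
\frac{k + 1}{4Dr} \;\ge\; \frac{1}{8r^2}
  &\;\Longleftrightarrow\; 2r(k + 1) \;\ge\; D = 1 + r(2k + 1) \\
  &\;\Longleftrightarrow\; 2rk + 2r \;\ge\; 1 + 2rk + r \\
  &\;\Longleftrightarrow\; r \;\ge\; 1 .
\end{align*}
```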
III. DISCUSSION
A preliminary version of this paper was presented at ICALP 89 (Mehlhorn and Paul, 1989). In that paper only the cases of one and two registers were dealt with. Even for that case the proof given was considerably more complex than the present proof, and hence the present paper may be considered progress. However, we have to admit that our result is still quite weak since our proof makes essential use of the fact that a LOAD or STORE affects the CPU in a very limited way. In particular, our proof technique cannot handle the situation where a LOAD may affect the entire CPU, i.e., the content of the CPU after a LOAD is a function of the previous content and the word loaded. In this general situation the problem remains open.
RECEIVED April 17, 1989; FINAL MANUSCRIPT RECEIVED December 18, 1990
