Indirect addressing and the time relationships of some models of sequential computation by Dymond, Patrick W.
Comp. & Maths. with Appls., Vol. 5, pp. 193-209
Pergamon Press Ltd., 1979. Printed in Great Britain
INDIRECT ADDRESSING AND THE TIME RELATIONSHIPS 
OF SOME MODELS OF SEQUENTIAL COMPUTATION 
PATRICK W. DYMOND 
Department of Computer Science, University of Toronto, Toronto, Canada 
Communicated by Allan Borodin
(Received January 1979) 
Abstract. We study the time relationships between several models of computation (variants of counter
machines, Turing machines, and random access machines). It is shown that counter machines augmented by
a “copy” instruction can be simulated in linear time by counter machines without such an instruction, and
that these counter machines can be simulated by RAM’s with speedup by a fixed polynomial. Since the
difference between augmented counter machines and RAM’s lies partly in the latter’s indirect addressing
capabilities, we obtain bounds on the extent to which these capabilities speed up computations. We also
show that unit-cost RAM’s can simulate multi-dimensional Turing machines with speedup, using their
addressing capabilities to efficiently implement multidimensional arrays. Evidence is presented to show that
on a restricted class of RAM’s, “successor” RAM’s, efficient implementation of multi-dimensional arrays is
not possible.
1. INTRODUCTION
Turing [1] introduced the idea of a mathematical model of computation, a precise description of
an abstract computing machine. Many other models, all capable of performing precisely the 
same tasks, have since been proposed, and we find them interesting because they sometimes 
seem to require different amounts of some resource (such as time or storage) to accomplish the 
same objective. If the difficulty of a task varies from model to model we can ask what features 
of the models make the task easier or more difficult; and we can also ask about the complexity 
of such a task on a real computer. 
The fact that many different models all are capable of computing exactly the same class of 
functions is taken as evidence in support of Church’s thesis [2], that this class of functions
consists exactly of those functions which are “effectively computable”. But from the viewpoint 
of computational complexity, different models may vary in resources required to accomplish 
identical tasks; and we have not yet found a general analogue to Church’s thesis in this area. 
Such an analogue might be a claim that the class of functions computed by a particular model A 
in any time bound T consists precisely of the functions which are “really computable in 
sequential (resp. parallel) time T”, and that therefore A is the right model with which to study
sequential (resp. parallel) time complexity. A more limited form of this analogue to Church’s 
thesis is the widely-held belief that the Turing machine (described in Section 2) computes in time 
bounded by a polynomial in the length of the input precisely those functions “really
sequentially computable in polynomial time”. The fact that many other sequential models compute the
same class of functions as the Turing machine in polynomial time is taken as evidence for this 
belief. 
We would like to further develop complexity analogues to Church’s thesis, analogues such
as: 
For any $c \ge 0$, the class of functions computed by a unit-cost successor RAM (described in
Section 3) in time $O(n^c)$, where n is the length of the input, is precisely that class of
functions “really computable in sequential time $O(n^c)$”, and thus the unit-cost successor
RAM is a good model for studying sequential polynomial time complexity.
Evidence for this statement may perhaps be found by studying the relationship between this 
and other models and by understanding more fully the model’s exact power. (On the other 
hand, the intuitive evidence for Church’s thesis is very persuasive because widely differing and 
extremely general approaches to the problem have all produced the same results; at this time 
we do not know what form that equally persuasive evidence for analogues dealing with specific
polynomial (sequential) time bounds should take. In fact, it could be conjectured that sequential 
time may be too unstable a notion for such specific analogues to exist.) 
In this paper we discuss the relative time complexities of several sequential models (Turing
machines, counter machines and random access machines) and examine the power of indirect
addressing in the RAM model. 
A machine model is a class of formal machines all of whose members have the same 
“permissible operations”, “storage predicates”, input-output conventions and storage structure. 
“Permissible operations” describe the ways in which storage is modified; “storage predicates” 
extract information from the contents of storage. The input-output conventions for the models 
discussed in this paper are as follows: 
-The input is a word w of unbounded but finite length n composed of symbols from a finite
alphabet I, and is viewed as being written from left to right on n successive squares of an
input tape, with a special symbol ¢ ∉ I as a delimiter at each end. Initially the read-only
input tape head is located at the first symbol of the input, and at each time instant during
the course of the computation it may be moved one square right or left, or left unmoved. (If
the model has only one-way input, the head may only move right.)
-The output is written onto successive squares of the output tape, which is infinitely long 
to the right; at any time during the course of a computation the machine may write an 
output symbol (some member of the finite output alphabet O) on the current tape square
and move the output tape head one square to the right. 
Machine models are described in terms of either programs or transition functions. When a 
transition function is used the model has a finite state control; and by a configuration of such a 
machine we mean the contents of its storage, the state and the input tape contents and head 
position. A reduced configuration consists of the state and symbol scanned by the input tape 
head and the results of applying the storage predicates to the contents of storage. (The 
predicates describe the aspects of the current configuration which affect the transition
function: for Turing machines they identify the symbols currently scanned by the work tape heads;
for counter machines they identify the counters containing zeroes.) The transition function δ
maps a reduced configuration to a (possibly) new state, a subset of the model-specific
permissible operations which may modify the contents of storage, and a set of I/O instructions.
By a step of such a machine we mean an application of δ to the reduced configuration
corresponding to the current total configuration. A computation is a sequence of configurations 
starting from the initial configuration, which may terminate when a halting state is reached or 
when the current reduced configuration is not in the domain of the transition function. 
When, instead of a transition function, a program is used in the description of a model, no 
finite state control is needed; instead explicit branching instructions to locations in the program 
serve this function. A program consists of a sequence of (possibly labelled) statements, each of 
which can be: 
-The instruction HALT. 
-One of the model-specific permissible operations on storage. 
-An input or output instruction. 
-The word IF, followed by a storage predicate, followed by a statement. 
-An instruction to branch to a specific instruction. 
A computation then consists of a sequence of statement executions. 
The measure of the time used by a machine depends on the model being considered: for
some models it can just be the number of steps in the computation-but since other measures 
are sometimes useful, we associate with each model timing instructions from which the number 
of time units required for a given step may be calculated. 
We can view machines in several ways: as acceptors of subsets of I*, as generators of
sequences, or as computers of partial functions from I* to O*. We describe the notion of time 
used in this paper in terms of the computation of functions; similar definitions can be framed 
for the other viewpoints. 
Let T be a function $T: \mathbb{N} \to \mathbb{N}$ and let M be a machine computing a partial function
$f: I^* \to O^*$. If for all words $w \in \operatorname{dom} f$, M computes f(w) using at most T(n) time
units, where $n = \max(|w|, |f(w)|)$, we say M works in time T, or M computes f in time T. If
there exists a constant c such that for all n, $T(n) \le c \cdot n$, we say M computes f in linear time. If
there exists a constant c such that M takes at most time c between each movement of its 
(one-way) input head, printing an output symbol between each movement of the input head, and 
halting within time c of the last movement of the input head, we say M works in real time. 
(Note: This definition differs from that of some writers in which real time is taken to mean 
T(n) = n. For Turing machines the two definitions are equivalent because of the well-known 
“speedup” theorem of [3]. For other models, where corresponding “speedup” theorems are not 
known, the more general definition of real time used in this paper may not be equivalent to the 
restricted definition; our definition is the more widely accepted one, particularly when referring 
to simulations.) 
Models can be studied both by examining their individual properties and by comparing them 
to other models. For these comparisons we use the concept of simulation. We say that machine 
M simulates machine M’ if they both compute the same partial function. We say a simulation is 
constructive if the computation of the simulating machine M can be viewed as a sequence of 
stages, corresponding to the steps of M’, the machine being simulated. Additionally, in a 
constructive simulation at any stage we can effectively determine the configuration of M’ given 
that of M. All of the simulations presented in this paper are constructive in this sense. 
Let $T_1, T_2$ be functions $T_1: \mathbb{N} \to \mathbb{N}$, $T_2: \mathbb{N} \to \mathbb{N}$. M simulates M’ in time $T_1(n)$ when, if M’
computes the partial function f in time $T_2(n)$, M computes f in time $T_1(T_2(n))$. The simulation
is called linear if there exists a $c \in \mathbb{N}$ such that $T_1(n) \le c \cdot n$ for all n. M simulates M’ in real
time if M takes time $\le c$, for some constant c, simulating each step of M’. We say a model $\mathcal{M}_1$ simulates a model $\mathcal{M}_2$
in time T if, for every machine $M_2 \in \mathcal{M}_2$, we can effectively find a machine $M_1 \in \mathcal{M}_1$ which
simulates $M_2$ in time T.
In the final analysis, additional criteria will be used to evaluate a particular model besides its 
time relationships with other models. It is desirable that our models be chosen so that their 
operations and predicates correspond to our notions of what can be done by a real computing
agent (human or machine), and that the defined complexities of the operations relate con- 
sistently to the complexity of corresponding operations in the real world. But because our 
models possess an unbounded amount of storage, it is not readily apparent what the precise 
relationships between their architectures and those of real computers should be.
In this paper we restrict our attention to time relationships of several models. In Section 2, 
we discuss Turing machines and counter machines. Section 3 examines the random access 
machine model.
2. TURING MACHINES AND COUNTER MACHINES 
The Turing machine, first described in Turing [1], has been extensively studied. Many of the
relevant theorems as well as programming examples and formal definitions for various Tm 
models may be found in Hopcroft and Ullman[4]. Informally, a k-Tm (k-tape Turing machine) 
consists of a finite state control, input and output tapes and k storage tapes, each infinitely long 
in both directions, divided into (initially blank) squares each of which may hold a single symbol 
from the finite worktape alphabet. At any time each of the k storage tape heads sees the symbol 
written on the current square and in one step may write a new symbol and be moved at most 
one square left or right. 
Because of the limited sequential storage access methods of the Turing machine, some 
theorems exist for them which have no analogues for more powerful models (such as the
RAM). For example, Hopcroft, Valiant and Paul [5] have shown that for any function $T(n) \ge n$,
any set which can be accepted by a Tm in time $T \log T$ can be accepted using space only $O(T)$,
and thus that space is a more powerful resource than time for Tm’s. But many aspects of the
model remain to be studied; it is not even known if, for $k \ge 2$, a $(k+1)$-tape Tm working in time
T is more powerful than a k-tape Tm working in time T, except for real-time computations.
Even for k = 1, nothing is known for $T(n) \ge n^2$.
Various “improvements” to the Turing machine have been suggested, among them: 
-The fast-rewind Tm, which, on each tape, has a fixed reset square to which the head may 
jump in a single step. 
-The multi-head Tm, on which multiple tape heads co-exist on each work tape. 
-The jump Tm, a multi-head Tm on which any head may jump across the tape to the 
position of any other head in one step. 
-The two-dimensional Tm, which has storage planes instead of storage tapes; this model 
can be further generalized to k-dimensional storage spaces. 
Fischer and Rosenberg[6] have shown that any fast-rewind Tm can be simulated in real time 
by an ordinary multitape Tm (with more tapes); Fischer et al. [7] have shown the same for the
multi-head Tm. Recently Savitch and Vitanyi [8] have shown that the jump-Tm can be simulated
in linear time by an ordinary Tm (see Theorem 2.2). A multidimensional Tm of dimension d
working in time T can be simulated by an ordinary Tm in time $O(T^{2-1/d})$ [9].
The counter machine [10] is a simply described model which has been studied extensively by
Fischer et al. [11]. For real-time language recognition problems, Fischer and Rosenberg [12]
have shown one-tape Turing machines to be more powerful than any counter machine. 
Informally, a k-counter machine (k-CM) can be viewed as having a read-only input tape, a
write-only output tape, a finite state control and a set of k counters. The only operations 
permitted on the counters are addition and subtraction of one; the only tests which can be made 
are tests to see which counters contain zero. A precise description of a k-CM consists of: 
-A finite set of states, Q. 
-A finite input alphabet, I.
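-A set $A \subseteq Q$ of accepting states.
-An initial state $q_0 \in Q$.
-A finite output alphabet O.
-A transition function $\delta: Q \times I \times \{0,1\}^k \to Q \times \{+1,-1,0\}^k \times (O \cup \{\lambda\}) \times H$,
where λ means that no output symbol is written and H is a set of input head movement instructions. H = {R, L, N} for a 2-way CM;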
H = {R, N} for a one-way CM. 
Initially all the counters contain zero and the input head is at the left of the input. The 
transition function maps the current state, input symbol being scanned and k boolean values 
representing the emptiness of the counters to a new state, possibly updating any of the counters 
by adding or subtracting one, possibly writing an output symbol and possibly moving the input 
head. Counter machines can simulate Turing machines, but the best known algorithms for doing 
so require an exponential amount of time in general. 
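A k-CM is easy to execute directly. The Python sketch below is our own illustration, not a construction from the paper: it stores δ as a dictionary keyed by (state, scanned symbol, zero-test flags), surrounds the input with the end-marker ¢, and uses None for “no output”. These encodings, the function name `run_cm` and the sample machine for $\{a^n b^n\}$ are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding of a k-CM transition function delta:
#   (state, scanned symbol, zero flags of the k counters)
#     -> (new state, increments in {+1, -1, 0}^k, output symbol or None, move in {"R", "L", "N"})
Key = Tuple[str, str, Tuple[bool, ...]]
Action = Tuple[str, Tuple[int, ...], Optional[str], str]


def run_cm(delta: Dict[Key, Action], k: int, start: str, halting: Set[str],
           word: str, max_steps: int = 10_000) -> Tuple[str, List[int], str]:
    """Run a k-counter machine; return (final state, counter contents, output)."""
    tape = "¢" + word + "¢"   # delimiter at each end of the input
    head, state = 1, start    # head starts on the first input symbol
    counters = [0] * k
    out: List[str] = []
    for _ in range(max_steps):
        if state in halting:
            break
        key = (state, tape[head], tuple(c == 0 for c in counters))
        if key not in delta:  # reduced configuration outside dom(delta): stop
            break
        state, incs, sym, move = delta[key]
        counters = [c + d for c, d in zip(counters, incs)]
        if sym is not None:
            out.append(sym)
        head += {"R": 1, "L": -1, "N": 0}[move]
    return state, counters, "".join(out)


# Sample one-way 1-CM for {a^n b^n : n >= 0}; it writes "1" when it accepts.
ANBN: Dict[Key, Action] = {
    ("A", "a", (True,)): ("A", (1,), None, "R"),    # count the a's
    ("A", "a", (False,)): ("A", (1,), None, "R"),
    ("A", "b", (False,)): ("B", (-1,), None, "R"),  # first b
    ("B", "b", (False,)): ("B", (-1,), None, "R"),  # remaining b's
    ("A", "¢", (True,)): ("acc", (0,), "1", "N"),   # empty input
    ("B", "¢", (True,)): ("acc", (0,), "1", "N"),   # as many b's as a's
}
```

For example, `run_cm(ANBN, 1, "A", {"acc"}, "aabb")` returns `("acc", [0], "1")`. Each loop iteration is one application of δ, so under the step-count measure the number of iterations is the running time.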
We now define an augmented counter machine, which extends the operations of the basic 
model. 
Like the counter machine, an augmented counter machine (ACM) consists of a finite 
collection of counters and a finite state control, but the permissible operations have been 
expanded to include: 
-Tests for equality between counters. 
-Copies of contents of one counter to another. 
Formally a k-ACM is defined like a k-CM except for the transition function δ. In a k-ACM δ is:
$$\delta: Q \times I \times \{0,1\}^{(k^2+k)/2} \to Q \times \bigl(\{+1,-1,0\}^k \cup [k]^k\bigr) \times (O \cup \{\lambda\}) \times H$$
where Q is a set of states, I is the input alphabet, [k] denotes the set $\{1, 2, \ldots, k\}$, O is the
output alphabet (λ denoting that no output symbol is written) and H is a set of possible input head movements. The $(k^2 + k)/2$ boolean values
represent the truth values of the k predicates “counter i contains zero” for $i = 1, 2, \ldots, k$ and
the $(k^2 - k)/2$ predicates “counter i is equal to counter j” for $j = i+1, \ldots, k$ for each
$i = 1, 2, \ldots, k-1$. Each step can be either a “successor” step, in which counters are
incremented or decremented by one, or a “copying” step, in which each counter
simultaneously receives a value from any of the k counters.
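To make the two kinds of ACM step concrete, here is a small Python sketch of our own. Counters are indexed from 0 rather than from 1 as in [k], and the function names are hypothetical. It computes the $(k^2+k)/2$ predicate values that are fed to δ and applies either a successor step or a simultaneous copying step.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def acm_predicates(counters: Sequence[int]) -> Tuple[bool, ...]:
    """The (k^2 + k)/2 boolean inputs of a k-ACM's transition function:
    k zero tests, then (k^2 - k)/2 equality tests for pairs i < j."""
    zero = tuple(c == 0 for c in counters)
    equal = tuple(counters[i] == counters[j]
                  for i, j in combinations(range(len(counters)), 2))
    return zero + equal


def successor_step(counters: Sequence[int], incs: Sequence[int]) -> List[int]:
    """Successor step: each counter changes by +1, -1 or 0."""
    assert all(d in (-1, 0, 1) for d in incs)
    return [c + d for c, d in zip(counters, incs)]


def copy_step(counters: Sequence[int], sources: Sequence[int]) -> List[int]:
    """Copying step: counter i receives the *old* value of counter sources[i],
    all counters at once (sources is an element of [k]^k, 0-based here)."""
    return [counters[s] for s in sources]
```

Because `copy_step` reads only the old counter values, a step such as exchanging two counters takes a single move, matching the word “simultaneously” in the definition.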
It is clear that a k-ACM can simulate a k-CM in real time. For the other direction we have: 
THEOREM 2.1. Given any k-ACM one can effectively find a k-CM which linearly simulates it. 
Proof. Given a k-ACM working in time $T_1(n)$, using the methods of [11] it should be clear
how to construct an equivalent k-ACM N which (a) stores only non-negative integers in its
counters, remembering signs in its finite state control; (b) alters at most one counter per step;
and (c) works in time $T(n) \le k \cdot T_1(n)$. We now describe a k-CM M which linearly simulates
N. Instead of storing the exact contents of N’s counters M will maintain a “linked list” of 
increments between successive counters of N rearranged in ascending order of contents. At 
every stage in the simulation the value $V_1$ of the first counter in M’s linked list will be equal to
the smallest value in any of N’s counters; the value of the second counter in the list will be
equal to the difference between $V_1$ and the second smallest value in N’s counters; and in
general the value $V_i$ of the ith counter in M’s list will be the difference between the $(i-1)$th and
ith smallest values in N’s counters. The “linked list” is actually an ordering of M’s counters 
maintained in the finite state control. As well, associated with each position i in the list is a 
pointer $C_i$ to the corresponding counter (i.e. the one containing the ith smallest value) of N, and
these k pointers are also maintained in M’s finite state control. 
The reader may verify that given any storage configuration in the simulating CM, and the 
pointers remembered in the finite state control, the storage configuration of the simulated ACM 
can be uniquely reconstructed. 
Information about which counters of the ACM are equal is now easily obtained by M: two
counters in N are equal if the second of the two counters in M’s list corresponding to them and
all counters between these two in the list contain 0. Similarly, M “knows” which counters of N
contain zero: those corresponding to the counters in its list which (a) contain zero and (b)
are such that all their predecessors in the list also contain zero. Simulating addition of one to a 
single counter in N can be done as follows: if the corresponding counter in M is last in the 
linked list it is simply incremented by one; if not, it is incremented and its successor is 
decremented. (If the successor originally contains zero, M must first rearrange the
correspondences so that this is not the case; this can always be done by a shift of M’s finite state
control.) A decrementing operation can be simulated similarly. To simulate a copy instruction 
of N, suppose the counter in M corresponding to the counter to be copied into is in the ith 
position of the linked list and that the counter in M corresponding to the counter to be copied 
from is in the jth position. To maintain its representation of N, M must achieve three
things: remove a counter from the linked list at position i, insert a counter containing a zero
immediately following counter j, and adjust its finite state control. Delinking the counter at
position i is easy if the counter in that position or its immediate successor contains a zero: a
simple adjustment of the finite state control is all that is required. But if both of these counters
are nonzero, one must be zeroed and the other must be set to the total of the previous contents. 
(If position i is the last position in the list, there is no successor and the counter is simply 
unloaded.) To choose which to zero and which to build up, M uses the following rule, 
suggested by Pippenger: the counter with the lower serial number is built up; the counter with 
the higher serial number is zeroed. (By serial number we mean the actual physical register 
number, not the position in the linked list.) M thus unloads the contents V of the counter with
the higher serial number of the pair into the other counter, taking V steps to do so. 
The general simulation of N by M thus consists of a sequence of stages corresponding to 
the steps of N. At each stage M handles input and output tapes as in the corresponding step of 
N. If N’s step is a “successor” step, M’s stage will consist of a single step to update the linked 
list of differences. If N’s step is a copy step, M’s stage will consist of a sequence of V steps, 
where V is the number in the counter to be zeroed. The reader may verify that a transition 
function for M can actually be effectively found given N’s transition function to achieve a 
correct simulation as above. 
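The representation and update rules just described can be exercised in code. The Python class below is a minimal sketch under our own assumptions: physical counters are numbered 0 to k-1 (so serial number i here corresponds to i+1 in the potential argument), the “finite state control” is modelled by two ordinary lists, each copy stage is charged max(1, V) steps, and all names (`DiffListCM`, `increment`, `decrement`, `copy`, `decode`) are hypothetical. It follows the rules in the text, including Pippenger’s rule for choosing which counter to unload, and counts M’s steps so that M’s running time can be compared with N’s.

```python
from typing import List


class DiffListCM:
    """Sketch of the k-CM M of Theorem 2.1 (hypothetical names; 0-based serial numbers).

    M's physical counters reg[0..k-1] hold the differences between successive
    values of N's counters sorted in ascending order. The "finite state control"
    is modelled by two lists indexed by list position p:
      phys[p] -- serial number of the physical counter at position p
      who[p]  -- the counter of N that position p stands for (the pointer C_p)
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.reg = [0] * k
        self.phys = list(range(k))
        self.who = list(range(k))
        self.steps = 0  # time charged to M

    def decode(self) -> List[int]:
        """Reconstruct N's counters: the value at position p is a prefix sum of differences."""
        vals, total = [0] * self.k, 0
        for p, s in enumerate(self.phys):
            total += self.reg[s]
            vals[self.who[p]] = total
        return vals

    def _diff(self, p: int) -> int:
        return self.reg[self.phys[p]]

    def increment(self, x: int) -> None:
        """Successor step x <- x + 1 (one step of M)."""
        p = q = self.who.index(x)
        while q + 1 < self.k and self._diff(q + 1) == 0:
            q += 1  # x ties with the counters up to q: move x's pointer there for free
        self.who[p], self.who[q] = self.who[q], self.who[p]
        self.reg[self.phys[q]] += 1
        if q + 1 < self.k:
            self.reg[self.phys[q + 1]] -= 1  # larger counters keep their values
        self.steps += 1

    def decrement(self, x: int) -> None:
        """Successor step x <- x - 1 (one step of M); N's counters stay non-negative."""
        p = q = self.who.index(x)
        while q > 0 and self._diff(q) == 0:
            q -= 1  # move x's pointer to the start of its run of equal values
        self.who[p], self.who[q] = self.who[q], self.who[p]
        assert self._diff(q) > 0, "N never decrements a zero counter"
        self.reg[self.phys[q]] -= 1
        if q + 1 < self.k:
            self.reg[self.phys[q + 1]] += 1
        self.steps += 1

    def copy(self, dst: int, src: int) -> None:
        """Copy step dst <- src, charged max(1, V) steps for V units unloaded."""
        unloaded = 0
        if dst != src:
            i = self.who.index(dst)
            if i + 1 == self.k:  # last in the list: simply unload it
                freed = self.phys[i]
                unloaded = self.reg[freed]
                self.reg[freed] = 0
            else:
                a, b = self.phys[i], self.phys[i + 1]
                if self.reg[a] == 0 or self.reg[b] == 0:
                    keep, freed = (b, a) if self.reg[a] == 0 else (a, b)
                else:  # Pippenger's rule: build up the lower serial number
                    keep, freed = min(a, b), max(a, b)
                unloaded = self.reg[freed]
                self.reg[keep] += unloaded
                self.reg[freed] = 0
                self.phys[i + 1] = keep
            del self.phys[i]
            del self.who[i]
            j = self.who.index(src)
            self.phys.insert(j + 1, freed)  # zero difference: dst now equals src
            self.who.insert(j + 1, dst)
        self.steps += max(1, unloaded)
```

Run against a plain list of N’s counters on random operation sequences, the decoded values stay equal to N’s and the step count stays within (k+1) times the number of simulated steps, as the potential argument predicts.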
To complete the proof we show that M simulates N in linear time. 
Claim. M works in time $\le (k + 1) \cdot T(n)$.
Proof of claim. A computation of T steps of N consists of some number s of successor 
steps and some number c of copy steps. The corresponding computation of M consists of s 
steps simulating N’s successor steps plus some number C of steps used in the c stages simulating
N’s copy steps. We will show $C \le ks + c$. By an id of M we mean a list of the contents of each
of M’s counters. We denote by $id_j$ the id of M immediately after stage j in the simulation.
Define the potential P of an id as follows:
$$P(id) = \sum_{i=1}^{k} i \cdot V_i$$
where $V_i$ denotes the value in the ith counter (i.e. the counter with serial number i), for that id.
Intuitively, the potential of an id may be thought of as the maximum cost which could be 
incurred in simulating any sequence of nontrivial copies starting from that id. The potential has 
the following properties: 
-$P(id_0) = 0$.
-For all j, $0 \le j \le T$, $P(id_j) \ge 0$.
-If $id_{j+1}$ follows from $id_j$ by simulation of a successor operation then $P(id_{j+1}) \le P(id_j) + k$.
Proof. Simulation of a successor operation adds one to at most one counter. If this is 
counter k the new potential will be k greater than the old; otherwise, the increase will be 
less. 
-If $id_{j+1}$ follows from $id_j$ by simulation of a copy instruction of cost d then
$P(id_{j+1}) \le P(id_j) - d + 1$.
Proof. If simulating the copy requires no unloading of a counter then $id_{j+1} = id_j$ and d = 1.
Otherwise, d units are unloaded from a counter and into a counter with a smaller serial number 
(except in the case of the last counter in M’s list, which may be merely unloaded) decreasing 
the potential by at least d. 
After T = s + c stages, consisting of time s simulating successor steps and time C simulating
copy steps, $0 \le P(id_T) \le k \cdot s - (C - c)$. So $C \le k \cdot s + c$. The time taken by M is therefore
$s + C \le s + k \cdot s + c = k \cdot s + T \le (k+1)T$. This completes the proof.
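Written out, the last step telescopes the potential over all T stages. Here $d_m$ denotes the cost of the mth copy stage ($m = 1, \ldots, c$), so that $C = \sum_{m=1}^{c} d_m$, and only the three properties of P listed above are used:

```latex
\begin{aligned}
0 \le P(id_T) &= P(id_0) + \sum_{j=0}^{T-1} \bigl(P(id_{j+1}) - P(id_j)\bigr) \\
              &\le 0 + k \cdot s + \sum_{m=1}^{c} (1 - d_m) \\
              &= k \cdot s + c - C,
\end{aligned}
\qquad\text{hence}\qquad
s + C \le (k+1)\,s + c \le (k+1)\,T .
```

The successor stages contribute at most k each, and every copy stage pays for its own cost out of the potential except for one unit.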
COROLLARY. A k-CM extended by the ability to determine inequality relationships {<, >} between
its counters can be simulated in linear time by a k-CM.
Proof. In the simulation given above this information is always present in the simulating 
machine’s finite state control. □
As Pippenger (personal communication) has pointed out, the remark on page 281 of [11]
shows that a real-time simulation is impossible. 
A k-tape Turing machine can simulate a k-counter machine in real time by marking each tape
head’s original position and then representing the contents of the CM’s counters by the
distances of the heads from their starting positions. Alternatively, we could use a multi-head Tm 
(having a single tape with k heads) to do the same thing. An augmented counter machine could
be real-time simulated by a multi-head Tm in which heads could jump in one step to the
location of any other head. Borodin (personal communication) observed that the recent result
(proved independently of our Theorem 2.1) of Savitch and Vitanyi [8], which showed that
multi-head Turing machines with head-to-head jumps can be simulated in linear time by 
multi-head Tm’s without such jumps, can be improved using the methods of our proof of 
Theorem 2.1. 
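The head-distance representation mentioned above can be sketched in a few lines of Python. This is our own illustration under simplifying assumptions: the tape is abstracted to integer head positions relative to a marked origin square, equality tests between heads are not modelled, and the class and method names are hypothetical. A counter is zero exactly when its head scans the marked square, each increment or decrement is one head move, and an ACM copy corresponds to one head jumping onto another.

```python
from typing import List


class HeadDistanceCounters:
    """Hypothetical sketch: k counters stored as displacements of k tape heads
    from a marked origin square (position 0) on a single tape."""

    def __init__(self, k: int) -> None:
        self.pos: List[int] = [0] * k      # every head starts on the marked square

    def is_zero(self, i: int) -> bool:
        return self.pos[i] == 0            # head i sees the origin mark

    def move(self, i: int, d: int) -> None:
        assert d in (-1, 0, 1)             # an ordinary move shifts a head by at most one square
        self.pos[i] += d

    def jump(self, dst: int, src: int) -> None:
        self.pos[dst] = self.pos[src]      # head-to-head jump = ACM copy in one step

    def value(self, i: int) -> int:
        return self.pos[i]                 # signed distance from the origin
```

Every counter operation costs one machine move here, which is why the simulation runs in real time; the cost of removing the jumps is what Theorem 2.2 addresses.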
Definition. A k-J-Tm (k-Jump Turing machine) is a one-tape multi-head Tm which at any 
time instant can perform either a regular multi-head Tm move or a jump move, in which all of 
the k tape heads are redistributed to the locations of a non-empty subset of the currently 
scanned tape squares. 
Definition. A one-way infinite J-Tm is a J-Tm in which no head ever moves to the left of the 
starting position where they all begin. 
Using standard Turing machine techniques one can show that any k-J-Tm can be simulated 
in real time by a one-way-infinite k-J-Tm.
THEOREM 2.2. Let J be a one-way-infinite k-J-Tm. J can be simulated in linear time by an
ordinary multi-tape Tm with 4k + 1 single-head tape units. 
Proof. We will construct a machine M which simulates J in linear time using k tape-units 
each with 2 heads (which do not jump) and one single-head tape unit. The theorem then follows 
by the result of Leong and Seiferas [13] which shows that the two-head tape units can be
real-time simulated by four single-head units each. M uses its k 2-head tape units 
($TU_1, \ldots, TU_k$) to keep up to k tape intervals: those between adjacent heads of J, and the one
between the leftmost head and the left end of J’s tape. On its single-head tape unit, M keeps
the interval to the right of J’s rightmost head. When J’s heads coincide one or more of M’s 
tape units may be free, but no more than k 2-head units are ever required. M keeps the heads of 
its 2-head units at the ends of the interval being maintained and remembers in its finite state 
control (1) the order in which the intervals on its units would have to be arranged to get an
exact copy of J’s tape, and (2) the correspondence between J’s heads and M’s units.
This structure can be updated in one step to simulate an ordinary move of J, in which each 
of J’s heads may write a symbol and step left or right one square. If two of J’s heads which 
were coincident step apart, M allocates a 2-head tape unit to maintain the new interval and 
adjusts its finite state control to reflect the new correspondence between J’s heads and M’s 
tape units. If a head of J moves right towards its neighbour, M must extend the interval stored 
on one tape unit by one square at the right, and contract the interval stored on another by one
square at the left; this can be done in one step because M has heads at both ends of each 
interval being stored. When J’s rightmost head moves right, M extends the interval stored on 
the corresponding tape unit by copying the symbol being scanned by its one-head tape unit and 
moves the head of the one-head tape unit one square to the right. Left moves by the rightmost 
head are handled symmetrically. 
The reader may satisfy himself that such details as J’s heads crossing over one another (by 
ordinary moves) can be handled correctly, and that for any tape configuration and ordinary 
move by J, M can maintain the representation described by making one move.
To describe the simulation of a jump move we first make a simplifying assumption about J: 
that no head is simultaneously a “jump” head and a “target” head. (In fact, this assumption 
doesn’t decrease the generality of the result, since by enlarging M’s finite state control we can 
always “renumber” heads in such a way that this property is maintained.) Using this assump- 
tion we can decompose a jump move by J into a sequence of k single jumps, in which exactly 
one head jumps to the location of another. 
To simulate a jump by one of J’s heads which is coincident with another head, M need only 
adjust its finite state control to reflect the new correspondence between J’s heads and the 
intervals on M’s tape units, since these intervals do not change. To simulate a jump from a 
position at which no head will remain, M must “coalesce” the intervals on two of its tape units, 
say $TU_i$ and $TU_j$, into one interval. Say $i > j$ (i.e. the physical tape unit with the lower serial
number is $TU_j$). M coalesces the intervals by copying, one square at a time, the contents of the
squares of the interval being stored on $TU_i$ to $TU_j$, extending the interval on $TU_j$ and
contracting that of $TU_i$. If $TU_i$ is storing an interval of length d, in d steps M “empties” $TU_i$
into $TU_j$ and deallocates $TU_i$, holding it in reserve for the next time a new interval is created in
the simulation. 
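The coalescing step, and why Claim 4 holds for it, can be sketched in Python. This is a hypothetical model rather than the paper’s construction in detail: each 2-head tape unit is represented by the list of symbols in the interval it stores, keyed by its serial number (1 to k), and `coalesce` and `potential` are names we introduce. The list concatenation stands for the d single-square copying steps that M actually performs.

```python
from typing import Dict, List, Tuple


def potential(units: Dict[int, List[str]]) -> int:
    """P(id) = sum of i * V_i, where V_i is the interval length on tape unit TU_i."""
    return sum(i * len(cells) for i, cells in units.items())


def coalesce(units: Dict[int, List[str]], left: int, right: int) -> Tuple[int, int]:
    """Merge the intervals stored on units `left` and `right`, which become adjacent
    when the head between them jumps away. As in the proof, the interval on the
    unit with the higher serial number is emptied into the unit with the lower one
    (d copying steps on the real machine). Returns (surviving unit, d)."""
    keep, drop = min(left, right), max(left, right)
    moved = units[drop]
    if keep == left:
        units[keep] = units[keep] + moved   # extend at the right end
    else:
        units[keep] = moved + units[keep]   # extend at the left end
    units[drop] = []                        # deallocated, held in reserve
    return keep, len(moved)
```

Moving d squares from unit i to a unit j < i lowers the potential by $d(i - j) \ge d$, which is exactly the slack that pays for the d copying steps.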
The simulation of t steps of J by M thus consists of t stages, each following from the one 
before by simulation of either an ordinary step or a jump step, maintaining the storage structure 
described above, handling input-output just as J does. By an id of M we mean a list of the 
serial numbers of M’s 2-head tape units with the length $V_i$ of the interval being maintained on
each. (If a tape unit $TU_j$ is not in use its interval length $V_j$ is 0.) $id_0$ represents the initial id, and
$id_j$ represents M’s id after j stages of simulation ($1 \le j \le t$). The potential of an id, P(id), is
defined to be
$$P(id) = \sum_{i=1}^{k} i \cdot V_i.$$
Claim 1. $P(id_0) = 0$.
Claim 2. For all $1 \le j \le t$, $P(id_j) \ge 0$.
Claim 3. Say $id_{j+1}$ follows $id_j$ by simulation of an ordinary move. Then $P(id_{j+1}) \le
P(id_j) + k^2$.
Proof. This follows directly from the definition of potential and the facts that (1) no
interval can be increased in length by more than 2, and (2) the total of the lengths of all intervals
increases by at most one in the simulation of an ordinary move. 
Claim 3 shows that the potential rises by at most $k^2$ for each ordinary move simulated.
Claim 4 shows that the potential falls by at least the time taken to simulate a jump step, minus 1, for
each jump step simulated.
Claim 4. Say $id_{j+1}$ follows $id_j$ by simulation of a jump step of cost d. Then $P(id_{j+1}) \le
P(id_j) - d + 1$.
Proof. If simulating the jump does not require the “coalescing” of intervals then the 
potentials are the same and d = 1, as required. Otherwise, d squares are unloaded from tape 
units into tape units with lower serial numbers, decreasing the potential by at least d. 
Say in its computation of length t, J makes s ordinary moves and c jump moves. Then M
will have t stages and take s + C steps where C is the time spent simulating c jump moves. 
Consider the potential at stage t: 
$0 \le P(id_t) \le s \cdot k^2 - C + c$, so $C \le s \cdot k^2 + c$ and
$s + C \le s + s \cdot k^2 + c \le (k^2+1)t$. □
3. RANDOM ACCESS MACHINES 
The random access machine (RAM), formalized by Cook[14], has been proposed as an 
alternative to Turing machines for the study of time complexity. Many algorithms have been 
described in terms of RAM programs or RAM-Algol (a high level language for RAMs developed
in Reckhow [15]), partially because RAMs have many similarities to real computers, including
registers, indirect addressing, assignments, arithmetic operations and branching instructions. 
Unlike real computers, however, a RAM has an unbounded address space and each of its
registers may hold an integer of any size. This has led to the definition of several types of RAM 
model, each differing from the others in instruction set or costing function. 
The three models studied in this paper are the U-Ram (unit-cost RAM), the L-Ram 
(logarithmic cost RAM) and the US-Ram (unit-cost successor RAM). Each model is defined as
consisting of an infinite sequence of registers $x_0, x_1, x_2, \ldots$, a read-only input tape and finite
input alphabet I, a one-way output tape and finite output alphabet O, and a program, which is a
finite sequence of instructions. The allowable instructions, their timings and effects for each of 
the three models are in Table 1. The last instruction must be a HALT instruction. 
Initially all registers contain zero and the input tape head is at the left end of the input.
Instructions are executed in sequence starting from the first, except as noted in Table 1. It can
be seen that the U-Ram and L-Ram are distinguished by their timing functions. On the U-Ram
unit time is charged for each instruction execution, whereas on the L-Ram the time taken to
execute an instruction depends on the size of the contents of the registers accessed (as given by
the function $\ell(n) = \max(1, \lceil \log |n| \rceil)$).
Like the U-Ram, the US-Ram works under the unit cost criterion, but its instruction set has
been reduced by eliminating general addition and subtraction of register contents (instructions
5 and 6). The only register arithmetic possible on the US-Ram is the use of successor functions
(addition or subtraction of 1).
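To make the instruction semantics and the two timing functions concrete, here is a minimal Python sketch of an interpreter for the register instructions of Table 1 (input/output instructions are omitted). The tuple encoding of instructions and the names `run`, `ell` and `l_cost` are our own illustrative assumptions; logarithms are taken to base 2, and each instruction is charged according to the register contents before it executes.

```python
from collections import defaultdict


def ell(m: int) -> int:
    """Logarithmic size l(m) = max(1, ceil(log2 |m|)), with l(0) = l(1) = 1."""
    a = abs(m)
    return 1 if a <= 1 else (a - 1).bit_length()  # (a-1).bit_length() == ceil(log2 a)


def l_cost(op, a, x):
    """L-Ram charge for one instruction, read off the L-Ram column of Table 1."""
    if op == "copy":                       # x_i <- x_j
        return ell(x[a[1]])
    if op == "load":                       # x_i <- x_{x_j}
        return ell(x[a[1]]) + ell(x[x[a[1]]])
    if op == "store":                      # x_{x_i} <- x_j
        return ell(x[a[0]]) + ell(x[a[1]])
    if op in ("add", "sub"):               # x_i <- x_j +/- x_k
        return ell(x[a[1]]) + ell(x[a[2]])
    if op in ("inc", "dec", "ifzero", "ifpos"):
        return ell(x[a[0]])
    return 1                               # set, goto, halt


def run(program, model="U", max_steps=100_000):
    """Execute a program (list of tuples); return (registers, total time)."""
    x = defaultdict(int)                   # all registers start at zero
    pc, time, steps = 0, 0, 0              # pc is 0-based; GO TO n is 1-based
    while 0 <= pc < len(program) and steps < max_steps:
        op, *a = program[pc]
        if model == "US" and op in ("add", "sub"):
            raise ValueError("the US-Ram has no general addition or subtraction")
        time += l_cost(op, a, x) if model == "L" else 1
        steps += 1
        nxt = pc + 1
        if op == "set":
            x[a[0]] = a[1]
        elif op == "copy":
            x[a[0]] = x[a[1]]
        elif op == "load":
            if x[a[1]] < 0:
                break                       # negative address: the machine halts
            x[a[0]] = x[x[a[1]]]
        elif op == "store":
            if x[a[0]] < 0:
                break
            x[x[a[0]]] = x[a[1]]
        elif op == "add":
            x[a[0]] = x[a[1]] + x[a[2]]
        elif op == "sub":
            x[a[0]] = x[a[1]] - x[a[2]]
        elif op == "inc":
            x[a[0]] += 1
        elif op == "dec":
            x[a[0]] -= 1
        elif op == "goto":
            nxt = a[0] - 1                  # out of range means HALT
        elif op == "ifzero" and x[a[0]] != 0:
            nxt = pc + 2                    # skip the next instruction
        elif op == "ifpos" and x[a[0]] <= 0:
            nxt = pc + 2
        elif op == "halt":
            break
        pc = nxt
    return x, time


# x1 <- 1; x2 <- 5; repeat { x1 <- x1 + x1; x2 <- x2 - 1 } while x2 > 0; HALT
DOUBLE = [("set", 1, 1), ("set", 2, 5), ("add", 1, 1, 1), ("dec", 2),
          ("ifpos", 2), ("goto", 3), ("halt",)]
```

Running `DOUBLE` leaves 32 in $x_1$ with a U-Ram time of 22 and an L-Ram time of 45; asking for the US-Ram raises an error because the program uses instruction 5.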
Other “macro” instructions can be implemented using the basic ones given in Table 1. For 
example, tests for equality between registers can be performed on a US-Ram using a fixed 
number of “working registers” not otherwise involved in the computation, in the following way: 
An instruction such as

IF $x_j = x_k$ THEN WRITE $\sigma$

can be simulated using indirect addressing (Fischer [16]) by the following sequence of register
transfers, after first checking that neither $x_j$ nor $x_k$ contains any of the constants 0, 1, $j$ or $k$
(so that the indirect references below never address $x_0$, $x_1$, $x_j$ or $x_k$):

$x_0 \leftarrow x_{x_j}$
$x_1 \leftarrow x_{x_k}$
$x_{x_j} \leftarrow 1$
$x_{x_k} \leftarrow 0$
IF $x_{x_j} = 0$ THEN WRITE $\sigma$
$x_{x_j} \leftarrow x_0$
$x_{x_k} \leftarrow x_1$

where $x_0$ and $x_1$ are used as working registers. The test succeeds exactly when $x_j = x_k$, since only then does the assignment $x_{x_k} \leftarrow 0$ overwrite the 1 just stored in $x_{x_j}$; the last two transfers restore the registers that were overwritten.

Table 1. RAM instructions, effects and execution timing

| | Instruction ($i, j, k \in \mathbb{N}$) | U-Ram | US-Ram | L-Ram |
|---|---|---|---|---|
| 1. | $x_i \leftarrow c$ ($c$ an integer) | 1 | 1 | 1 |
| 2. | $x_i \leftarrow x_j$ | 1 | 1 | $\ell(x_j)$ |
| 3. | $x_i \leftarrow x_{x_j}$ | 1 | 1 | $\ell(x_j) + \ell(x_{x_j})$ |
| 4. | $x_{x_i} \leftarrow x_j$ | 1 | 1 | $\ell(x_i) + \ell(x_j)$ |
| 5. | $x_i \leftarrow x_j + x_k$ (U-Ram and L-Ram only) | 1 | N.A. | $\ell(x_j) + \ell(x_k)$ |
| 6. | $x_i \leftarrow x_j - x_k$ (U-Ram and L-Ram only) | 1 | N.A. | $\ell(x_j) + \ell(x_k)$ |
| 7. | $x_i \leftarrow x_i + 1$ | 1 | 1 | $\ell(x_i)$ |
| 8. | $x_i \leftarrow x_i - 1$ | 1 | 1 | $\ell(x_i)$ |
| 9. | HALT | 1 | 1 | 1 |
| 10. | GO TO $n$ ($n \in \mathbb{N}^+$) | 1 | 1 | 1 |
| 11. | INPUTL | 1 | 1 | 1 |
| 12. | INPUTR | 1 | 1 | 1 |
| 13. | WRITE $\sigma$ ($\sigma \in O$) | 1 | 1 | 1 |
| 14. | IF $x_j = 0$ THEN | 1 | 1 | $\ell(x_j)$ |
| 15. | IF $x_j > 0$ THEN | 1 | 1 | $\ell(x_j)$ |
| 16. | IF INPUT $= \sigma$ THEN ($\sigma \in I$) | 1 | 1 | 1 |

Note: $\ell(m) = \max(1, \lceil \log |m| \rceil)$, where $|m|$ denotes the absolute value of $m$.

Effects of Instructions
1. $x_i$ is assigned the integer $c$.
2. $x_i$ is assigned the contents of $x_j$.
3. If the contents of $x_j$ are nonnegative, $x_i$ is assigned the contents of the register whose address is found in $x_j$; otherwise, the machine halts.
4. If the contents of $x_i$ are nonnegative, the register whose address is found in $x_i$ is assigned the contents of $x_j$; otherwise, the machine halts.
5. $x_i$ is assigned the sum of the contents of $x_j$ and $x_k$.
6. $x_i$ is assigned the (signed) difference between the contents of $x_j$ and $x_k$.
7. The contents of $x_i$ are incremented by one.
8. The contents of $x_i$ are decremented by one.
9. Execution is stopped.
10. The next instruction executed will be the $n$-th; if no such instruction exists, HALT.
11. The input tape head is moved left.
12. The input tape head is moved right.
13. The symbol $\sigma$ is written on the output tape and its head moved right.
14. The next instruction, which may not be an "IF" instruction, is executed iff the contents of $x_j$ are 0; otherwise it is skipped.
15. The next instruction, which may not be an "IF" instruction, is executed iff the contents of $x_j$ are positive; otherwise it is skipped.
16. The next instruction, which may not be an "IF" instruction, is executed iff the input tape head is scanning a square with $\sigma$ on it; otherwise it is skipped.
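The register-transfer sequence for IF $x_j = x_k$ THEN WRITE $\sigma$ can be checked with a short Python sketch in which a `defaultdict` stands in for the US-Ram's registers. The function name and the list used as an output tape are illustrative assumptions; the indirect test IF $x_{x_j} = 0$ is treated as a single macro step, as in the sequence itself.

```python
from collections import defaultdict


def if_equal_then_write(x, j, k, tape, sigma):
    """Simulate 'IF x_j = x_k THEN WRITE sigma' using transfers and a zero test.

    Precondition (the check made before the sequence): neither x_j nor x_k
    holds 0, 1, j or k, so the indirect writes cannot clobber the working
    registers x_0, x_1 or the address registers x_j, x_k themselves.
    """
    assert x[j] not in (0, 1, j, k) and x[k] not in (0, 1, j, k)
    x[0] = x[x[j]]          # x_0 <- x_{x_j}   save both target registers
    x[1] = x[x[k]]          # x_1 <- x_{x_k}
    x[x[j]] = 1             # x_{x_j} <- 1
    x[x[k]] = 0             # x_{x_k} <- 0    overwrites the 1 iff x_j = x_k
    if x[x[j]] == 0:        # IF x_{x_j} = 0 THEN WRITE sigma
        tape.append(sigma)
    x[x[j]] = x[0]          # x_{x_j} <- x_0   restore the saved contents
    x[x[k]] = x[1]          # x_{x_k} <- x_1


regs = defaultdict(int, {2: 10, 3: 10, 10: 7})   # x_2 = x_3 = 10
out = []
if_equal_then_write(regs, 2, 3, out, "a")        # out == ["a"], regs[10] is 7 again
```

The precondition matters: if, say, $x_j$ held $j$, the assignment $x_{x_j} \leftarrow 1$ would overwrite the address register itself.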
One way in which our RAMs differ from those of Reckhow [15] is in their input conventions.
Under Reckhow's definitions a U-Ram can read in an arbitrary integer $n$ in cost 1, and an
L-Ram can do the same in cost $\log n$. On our models input always takes the form of strings of
symbols over a finite input alphabet $I$, which can be read at most one symbol at a time. Thus for
our U-Ram to input an integer $n$ to a single register takes time about $\log n$; the same task can be
done on an L-Ram in time $O(\log^2 n)$. (Input to L-Rams is discussed further in Section 4.)
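A rough Python sketch of that input cost, assuming (our choice) a binary input alphabet read most significant bit first, and using the U-Ram/L-Ram addition instruction for doubling. It illustrates $O(\log n)$ unit-cost steps against $O(\log^2 n)$ logarithmic cost; `read_binary` is an illustrative name.

```python
def ell(m: int) -> int:
    """l(m) = max(1, ceil(log2 |m|))."""
    a = abs(m)
    return 1 if a <= 1 else (a - 1).bit_length()


def read_binary(bits: str):
    """Accumulate a binary numeral (most significant bit first) into one register.

    Each symbol costs one doubling x <- x + x and, for a 1, one increment
    x <- x + 1.  Returns (value, unit_time, log_time): unit_time counts these
    instructions (U-Ram); log_time charges l(x) + l(x) per doubling and l(x)
    per increment (L-Ram).  Tape moves and symbol tests are ignored here.
    """
    x, unit_time, log_time = 0, 0, 0
    for b in bits:
        unit_time += 1
        log_time += 2 * ell(x)
        x = x + x
        if b == "1":
            unit_time += 1
            log_time += ell(x)
            x += 1
    return x, unit_time, log_time
```

For a numeral of $m = \lceil \log n \rceil$ bits, each instruction costs at most about $m$ on the L-Ram, and there are at most $2m$ of them, which is where the $O(\log^2 n)$ bound comes from.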
Reckhow [15] and Cook proved that a small increase in the running time of a U-Ram or an
L-Ram allows additional sets to be accepted; their result is tighter than the equivalent hierarchy
theorem for Turing machines.
A more recent theorem by Sudborough and Zalcberg [17] shows that there is no U-Ram
version of the Hartmanis-Stearns [3] speedup theorem for Turing machines.
In Fischer [16] it is shown that the US-Ram can simulate storage modification machines in
real time, and conversely that these machines can simulate US-Rams in real time. Storage
modification machines were developed by Schönhage [18] to model list processing
computations. Schönhage showed that these machines could simulate multi-dimensional
Turing machines in real time; thus US-Rams can also simulate multi-dimensional Turing
machines in real time.
The U-Ram can trivially simulate the US-Ram in real time, by simply performing the same 
instructions. 
After $t$ steps of a computation by a U-Ram $M$ the largest number stored in any register is at most
$c \cdot 2^{t-1}$, where $c$ is the largest constant appearing in $M$'s program, since the contents of the
largest register may at most be doubled at each step once the value $c$ is attained. An L-Ram
simulating $M$ by executing exactly the same sequence of instructions will thus incur costs
$O(\log(c \cdot 2^{t-1})) = O(t)$ to execute the $t$-th instruction. Summing over the $T$ instructions
executed, the L-Ram can simulate the U-Ram in time $O(T^2)$.
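The arithmetic behind the $O(T^2)$ bound can be checked with a small Python sketch, assuming a program whose only work is repeated doubling of one register (the fastest possible growth); a successor-only run is included for contrast. The function names are illustrative.

```python
def ell(m: int) -> int:
    """l(m) = max(1, ceil(log2 |m|))."""
    a = abs(m)
    return 1 if a <= 1 else (a - 1).bit_length()


def doubling_run(c: int, t: int):
    """t instructions: x <- c, then t - 1 doublings x <- x + x.

    Returns (final value, total L-Ram cost).  The value is c * 2**(t-1)
    and the cost grows like t**2.
    """
    x, cost = c, 1                  # x <- c costs 1 (Table 1, instruction 1)
    for _ in range(t - 1):
        cost += 2 * ell(x)          # x <- x + x costs l(x) + l(x)
        x = x + x
    return x, cost


def successor_run(c: int, t: int):
    """t instructions: x <- c, then t - 1 increments; value c + t - 1, cost O(t log t)."""
    x, cost = c, 1
    for _ in range(t - 1):
        cost += ell(x)              # x <- x + 1 costs l(x)
        x += 1
    return x, cost
```

With $c = 3$ the $i$-th doubling costs $2(i + 2)$, so the total is exactly $(t-1)(t+2) + 1$, quadratic in $t$, while the successor run stays near $t \log t$.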
In a US-Ram $M$ after $t$ steps the largest number in any register is at most $c + t - 1$, and so we can
conclude that the L-Ram can simulate the US-Ram in time $O(T \log T)$. The following theorem
appears in [16].
THEOREM 3.1. Let L be an L-Ram which works in time T(n). There exists a US-Ram M which 
linearly simulates L. 
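Thus we have a ranking of RAM models:
$$\text{U-Ram} \rightarrow \text{US-Ram} \rightarrow \text{L-Ram}$$
where "$\rightarrow$" denotes a linear simulation by the model on the left of the model on the right. In the
other direction, the best simulations known take more than linear time:
$$\text{US-Ram} \xleftarrow{\;T \log T\;} \text{L-Ram}, \qquad \text{U-Ram} \xleftarrow{\;T^2\;} \text{L-Ram},$$
where $\mathcal{M}_1 \xleftarrow{f} \mathcal{M}_2$ means that model $\mathcal{M}_2$ can simulate model $\mathcal{M}_1$ in time $O(f)$.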
It remains an open problem to specify more closely (i.e. to within a linear factor) the 
inter-relationships of these three models, either by improving the nonlinear simulations or 
establishing that this cannot be done. 
Hopcroft et al. [5], using the fact that in a small number of steps a Turing machine can make
only limited changes to its tapes, proved the following theorem which relates U-Rams to Tm's:
Let $M$ be a $k$-Tm working in time $t(n) \ge n \log n$, and let $t$ be such that $\log(t(n))$ can be
computed on a U-Ram in time $t(n)/\log t(n)$. Then $M$ can be simulated by some U-Ram in time
$O(t/\log t)$.
Thus, under very general conditions, Turing machine computations can be “sped up” by 
U-Rams. Two corollaries follow directly from the proof of the theorem. 
COROLLARY 1. Under the hypotheses of the theorem, M can be linearly simulated by some 
L-Ram. 
The proof of this corollary depends on the fact that in the course of the U-Ram simulation,
no number larger than a fixed polynomial in $t$ is ever computed. Thus under the logarithmic cost
criterion each instruction costs $O(\log t)$ to execute, and the $O(t/\log t)$ instructions cost $O(t)$ in total.
COROLLARY 2. The Tm cannot linearly simulate the U-Ram. 
Proof. Assume the contrary. By the U-Ram time hierarchy theorem [15] there exists a
U-Ram $M$ working in some constructible monotonic time bound $T(n) \ge n \log n$ accepting a set
$S$; and furthermore no U-Ram working in any time $T_1$ such that $\lim_{n \to \infty} T_1(n)/T(n) = 0$ accepts $S$.
By assumption there exists a Tm $M'$ which accepts $S$ and works in time $cT$. Applying the
theorem we obtain a U-Ram $M_0$ which accepts $S$ in time $O(cT/\log T)$; but this is a
contradiction, since
$$\lim_{n \to \infty} \frac{cT(n)/\log T(n)}{T(n)} = 0. \qquad \square$$

The techniques of the theorem suggest a method for speeding up counter machine
computations by a U-Ram. As the next theorem shows, in this case the acceleration is by a
polynomial factor depending on the number of counters.
THEOREM 3.2. Let $T$ be such that $T(n) \ge n$ and $(T(n)/n)^{1/(k+2)}$ is constructible on a U-Ram. Then
for any $k$-CM $C$ which works in time $T$ we can effectively find a U-Ram $M$ which simulates $C$
and works in time
$$O\big(n^{1/(k+2)} \cdot T^{(k+1)/(k+2)}\big) = O\big(T/(T/n)^{1/(k+2)}\big).$$
Proof. Assume $C$ has $m$ states and that $C$ stores only non-negative integers in its $k$
counters. $M$'s simulation of $C$ will take the form of a sequence of stages, each stage simulating
many consecutive steps of $C$. $M$ begins by determining $n$ and calculating $q = (T(n)/n)^{1/(k+2)}$.
In $q$ steps of $C$'s computation starting from any configuration, no counter can have its value
altered by more than $q$ (since at each step the value can change by at most 1). Moreover, any
counter $R$ whose initial value $v$ is $q$ or greater has a nonzero value throughout the $q$ steps. Had
the contents of $R$ instead been some greater value $v + i$ ($i > 0$) at the start, the same sequence
of $q$ transitions would be taken by $C$, since in either case $R$ would have nonzero contents
throughout. In addition, if after $q$ steps the final value in $R$ is $v + d$ for some offset $d$ from its
original contents, then had $R$ begun instead with contents $v + i$ its final contents would have
been $v + i + d$ (in either case the same sequence of increment-decrement operations would be
applied).
The fact that all values $\ge q$ can be treated alike in simulating $q$ steps of $C$ motivates the
following procedure: $M$ first precomputes the outcomes of $q$ steps of $C$ starting from each of
many possible configurations, storing the results in a table; it can then simulate blocks of $q$
steps of $C$ by simple table lookup operations, taking a small fixed number of steps for each
lookup.
More precisely, M’s simulation will consist of an initialization phase followed by a 
simulation phase in which T steps of C will be simulated in T/q stages. 
In the simulation phase $M$ keeps in $k$ registers $x_1, x_2, \ldots, x_k$ the contents $v_1, v_2, \ldots, v_k$ of the
counters of $C$, and in two additional registers $x_{k+1}, x_{k+2}$ two integers which represent the
counter machine's state $s$ and the position $h$ of the input tape head. To simulate $q$ steps of $C$, $M$
forms a vector $(s, h, \hat v_1, \hat v_2, \ldots, \hat v_k)$ where $\hat v_i = \min(v_i, q)$. Using this vector as an index into a
$(k+2)$-dimensional table, $M$ finds a resultant vector $(s', h', d_1, d_2, \ldots, d_k)$ in which $s'$ and $h'$ are
the resultant state and head position and $d_i$ is an integer giving the change in counter $i$'s value
after the $q$ steps. Using this resultant vector $M$ updates $x_1, \ldots, x_{k+2}$. Thus in time $O(k)$ $M$
simulates $q$ steps of $C$, and the total time spent in the simulation phase is therefore $O(k \cdot (T/q))$.
In the precomputation phase $M$ reads the input, storing it in $n$ consecutive registers. For
each possible $(k+2)$-tuple $M$ first computes the resultant vector by simulating $q$ steps of $C$
directly, then stores it in the $(k+2)$-dimensional table. (The details of the "linear time"
implementation of such a table are discussed later.) There are $m \times n \times (q+1)^k$ such tuples, each
of which must be simulated for $q$ steps and then added to the table, and so the time needed for the
precomputation is $n + O(m \times n \times (q+1)^k \times (q + k)) = O(n q^{k+1})$.
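The total time taken in the entire simulation is therefore
$$O\!\left(k \cdot \frac{T}{q}\right) + O\big(n \cdot q^{k+1}\big) = O\!\left(k \cdot \frac{T}{(T/n)^{1/(k+2)}}\right) + O\!\left(n \cdot \left(\frac{T}{n}\right)^{(k+1)/(k+2)}\right) = O\big(n^{1/(k+2)}\, T^{(k+1)/(k+2)}\big). \qquad \square$$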
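The two phases can be sketched in Python as follows. This is an illustration under stated assumptions, not the U-Ram program itself: the counter machine is given as a hypothetical function `delta(state, symbol, zero_flags)` returning the new state, a head move in $\{-1, 0, +1\}$ and a change in $\{-1, 0, +1\}$ for each counter; it never decrements a zero counter; $T$ is taken to be a multiple of $q$; and a Python dict stands in for the $(k+2)$-dimensional table. `demo_delta` is a made-up 2-counter machine used only to exercise the code.

```python
from itertools import product


def step(delta, cfg, inp):
    """One move of a k-counter machine; cfg = (state, head, counters)."""
    s, h, v = cfg
    zero = tuple(c == 0 for c in v)
    s2, move, d = delta(s, inp[h], zero)
    h2 = min(max(h + move, 0), len(inp) - 1)       # head stays on the input
    return s2, h2, tuple(c + dc for c, dc in zip(v, d))


def build_table(delta, inp, states, k, q):
    """Precomputation phase: outcome of q steps from every (s, h, min(v_i, q))."""
    table = {}
    for s, h, *caps in product(states, range(len(inp)), *([range(q + 1)] * k)):
        cfg = (s, h, tuple(caps))
        for _ in range(q):                         # direct simulation of q steps
            cfg = step(delta, cfg, inp)
        s2, h2, v2 = cfg
        table[(s, h, tuple(caps))] = (s2, h2, tuple(b - a for a, b in zip(caps, v2)))
    return table


def simulate_blocks(table, start, T, q):
    """Simulation phase: T // q table lookups, each standing for q moves."""
    s, h, v = start
    for _ in range(T // q):
        s, h, d = table[(s, h, tuple(min(c, q) for c in v))]
        v = tuple(c + dc for c, dc in zip(v, d))
    return s, h, v


def simulate_direct(delta, inp, start, T):
    """Reference: T moves one at a time."""
    cfg = start
    for _ in range(T):
        cfg = step(delta, cfg, inp)
    return cfg


def demo_delta(s, sym, zero):
    """Hypothetical 2-counter machine used only to exercise the sketch."""
    if s == 0:      # count the a's into counter 0; at the end marker go to state 1
        return (0, +1, (1, 0)) if sym == "a" else (1, 0, (0, 0))
    if s == 1:      # move counter 0 into counter 1, then go to state 2
        return (2, 0, (0, 0)) if zero[0] else (1, 0, (-1, 1))
    # state 2: move counter 1 back into counter 0, then return to state 1
    return (1, 0, (0, 0)) if zero[1] else (2, 0, (1, -1))
```

The block-by-block result agrees with a direct simulation precisely because of the observation in the proof: a counter holding $q$ or more behaves like one holding exactly $q$ for the next $q$ moves, so capping the index at $q$ loses nothing.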
COROLLARY 1. Under the hypotheses of Theorem 3.2, there exists an L-Ram simulating $C$ in
time
$$O\big(n^{1/(k+2)} \cdot T^{(k+1)/(k+2)} \log T\big).$$
Proof. The largest number appearing in the U-Ram simulation is $O(T^2)$. Thus an L-Ram
executing the same program incurs costs $O(\log T)$ at each step. $\square$
COROLLARY 2. For any time bound $T$, $T(n) \ge n$, there exist sets accepted by an L-Ram in time $T$
which cannot be accepted by any CM working in time $O(T)$.
COROLLARY 3. For any time bound $T$, $T(n) \ge n$, there exist sets accepted by a US-Ram in time $T$
which cannot be accepted by any CM working in time $O(T)$.
Proof. This follows from the previous corollary and the linear simulation of the L-Ram by
the US-Ram. $\square$
In fact, a stronger statement can be made. The speedup by a fixed polynomial shows that for
any time bound $T$, $T(n) \ge n$, there exist sets which can be accepted in time $T$ by an L-Ram, but
which cannot be accepted by any CM working in time $T \log^a T$ for any $a \in \mathbb{N}$.
We can also apply the speedup technique to multi-dimensional Turing machines. For
$d, k \ge 1$, a $d$-D $k$-Tm $D$ is a $d$-dimensional Turing machine in which $k$ $d$-dimensional storage
spaces replace the $k$ (1-dimensional) storage tapes; at any step in its computation, depending on
its current state, the current input symbol and the symbols being scanned by its workspace
heads, $D$ may change state, possibly write a symbol on its output tape, move its input head
one square, write new symbols on each of the $k$ squares currently scanned by its $k$ tape heads,
and move these tape heads, each no more than one square positively or negatively along any of
the $d$ orthogonal dimensions. Initially all squares in the workspaces are blank. The transition
function of such a machine is a map
$$\delta: Q \times I \times \Sigma^k \to Q \times (O \cup \{\lambda\}) \times \{L, R, 0\} \times \Sigma^k \times (\{\lambda\} \cup ([d] \times \{+, -\}))^k$$
where $Q$ is a finite set of states, $I$ is the input tape alphabet, $\Sigma$ is the storage space alphabet, $O$
is the output alphabet, $\lambda$ indicates that no output symbol is written or that a workspace head does not move,
$\{L, R, 0\}$ represent input tape head movement instructions, and $[d] \times \{+, -\}$ represents
workspace head movement instructions along any of the $d$ dimensions.
In the proof to be given it will clarify the exposition to make some simplifications to the $d$-D
$k$-Tm which involve no significant loss of generality. By a simple $d$-D $k$-Tm we mean one in
which the workspace heads never leave the subspaces defined by the origin and the positive
directions in each of the $d$ dimensions, and which, in its first $n$ moves, copies its input onto
$n$ successive squares of the first dimension of the first workspace, never again moving its input
tape head. It should be clear that given any $d$-D $k$-Tm $D$ we can find a simple $d$-D $(k+1)$-Tm which
linearly simulates $D$.
THEOREM 3.3. Let $D$ be a simple $d$-D $k$-Tm which works in time $t$, $t(n) \ge n \log n$, and let $t/\log t$
be constructible on a U-Ram. Then there exists a U-Ram $M$ which simulates $D$ in time
$O(t/(\log t)^{1/d})$.
Proof. We will illustrate the construction for the case of a two symbol tape alphabet and $q$
states. Let $c = 2 \cdot k \cdot 3^d$.
A computation by $D$ of $t$ steps uses at most $t$ tape squares in each workspace and is
contained entirely in a $d$-dimensional hypercube of $t^d$ tape squares for each workspace.
Consider this hypercube divided into blocks, each a $d$-dimensional hypercube with side of
length $s = ((\log t)/c)^{1/d}$. Since each square in the block has one of the two workspace symbols
written on it, the total number of possible different blocks is $2^{s^d} = t^{1/c}$. By interpreting the
contents of the tape squares as a binary representation of an integer $i$ in some standard way, we
can associate with each block its corresponding serial number $i$. (A serial number of 0 will
correspond to a block all of whose squares are blank.)
Now consider that in $s$ steps of $D$ only the $k$ blocks in which the heads were initially
present and possibly their $k \cdot (3^d - 1)$ immediately adjacent blocks can be altered. We call this
set of $k \cdot 3^d$ blocks the block configuration. In fact, these $k \cdot 3^d$ blocks, their head positions and
the initial state completely determine the next $s$ steps of $D$'s computation. This, and the limited
number of possible distinct blocks, suggest that instead of simulating each step of $D$ we could
precompute the result of taking $s$ steps from a given block configuration and then simply
"look up" this result whenever this configuration is encountered. If the lookup can be done
quickly, i.e. in some constant number of steps, this procedure will lead to a computation of
$O(t/s)$ steps to simulate $t$ steps of $D$.
M’s simulation of D will therefore consist of three phases: an initialization phase, a 
precomputation phase and a simulation phase. 
The simulation phase 
After any multiple of $s$ steps the machine is in some state $\phi$ and, for each $1 \le i \le k$, $D$'s
workspace $i$ has a head in a particular block $b_i = b_{i,j_1,j_2,\ldots,j_d}$ where, for $1 \le l \le d$, $j_l$ describes the
number of blocks between $b_i$ and the $l$th axis ($0 \le j_l \le t/s$). The neighbours of this block are
the $3^d - 1$ other blocks $b_{i,j'_1,j'_2,\ldots,j'_d}$ obtained as each $j'_l$ ranges from $j_l - 1$ to $j_l + 1$. The total
configuration of $D$ is represented in the U-Ram $M$ in this way:
-$x_0$ contains an integer between 1 and $q$ representing $\phi$, the current state.
-For each workspace $i$, $M$ uses $(t/s)^d$ consecutive registers, each containing the serial
number of the corresponding block of workspace $i$, arranged in increasing order of subscripts
interpreted as a base $(t/s)$ number. We leave it to the reader to satisfy himself that given
this representation and the address of the register representing a block, the addresses of
the registers representing the $3^d - 1$ neighbouring blocks may be easily obtained by adding
and subtracting (precomputed) powers of $t/s$.
-Also for each of the $k$ workspaces, $M$ uses 2 additional registers, one giving the address
of the register representing the block which contains the head, the other specifying the
position of the head within this block by an integer $p_i$, $1 \le p_i \le (\log t)/c$.
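A Python sketch of the register layout in the second item above, assuming (our notation) that $B$ blocks lie along each side of a workspace ($B$ plays the role of $t/s$), that blocks are indexed $0 \le j_l < B$, and that the registers for one workspace start at a hypothetical address `base`. It also shows one possible "standard way" of forming a serial number, reading the squares of a block in a fixed order as binary digits.

```python
from itertools import product


def block_address(base, js, B):
    """Register holding block (j_1, ..., j_d): base + sum_l j_l * B**(l-1)."""
    return base + sum(j * B**l for l, j in enumerate(js))


def neighbour_addresses(addr, d, B):
    """Addresses of the 3**d - 1 neighbours of the block stored at addr,
    using only additions/subtractions of the precomputed powers B**0..B**(d-1).
    Blocks on the boundary of the hypercube are not treated specially here."""
    powers = [B**l for l in range(d)]          # computed once, in initialization
    return [addr + sum(o * p for o, p in zip(offs, powers))
            for offs in product((-1, 0, 1), repeat=d) if any(offs)]


def serial_number(squares):
    """Serial number of a block: its 0/1 squares (fixed order) read in binary."""
    n = 0
    for b in squares:
        n = 2 * n + b
    return n
```

For example, with $B = 10$, $d = 2$ and `base = 100`, block $(3, 4)$ lives in register 143 and its eight neighbours are $143 \pm 1$, $143 \pm 10$ and $143 \pm 10 \pm 1$.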
To update this representation to reflect $s$ more steps of the Turing machine, $M$ uses a
precomputed $(1 + k \cdot (3^d + 1))$-dimensional table in which the tuple $u$ (consisting of the integer
representing the current state and, for each $1 \le i \le k$, the serial numbers of the block with the
head in it and its $3^d - 1$ neighbours, and $p_i$, the position of the head in its block) is mapped to a
resultant tuple which describes the changes which must be made to $M$'s representation to
reflect $s$ more steps of $D$ (i.e. the new state and, for each of the $k$ workspaces, the new serial
numbers for the blocks and new head positions $p_i$, as well as instructions for updating the
register pointing to the block containing the head if the head has moved to a neighbouring
block).
The table can be implemented so that access time is $O(1 + k \cdot (3^d + 1))$, and thus the entire
time required to form the tuple, use it as an index into the table and perform the update is
$O(k \cdot 3^d)$. In this way the time required for the simulation phase will be $O((t/s) \cdot k \cdot 3^d) = O(t/s)$
since $k$ and $d$ are fixed. Since $s = ((\log t)/c)^{1/d}$, the simulation phase requires $O(t/(\log t)^{1/d})$ steps.
The precomputation and initialization phases 
In the precomputation phase $M$ must initialize the $(1 + k(3^d + 1))$-dimensional table described
above. The number of entries to be made is the number of distinct $(1 + k(3^d + 1))$-tuples, which is
$q \times (s^d \times (t^{1/c})^{3^d})^k$, and computing the resultant updating information for each entry requires a
direct simulation of $s$ steps of $D$. The time required to initialize such a table is linear in the number
of entries, so that the total time spent in precomputation is
$$p = O\big(q \times (s^d \times t^{3^d/c})^k \times (s + k \cdot 3^d)\big).$$
Since $k$, $d$ and $q$ are fixed, having chosen $c = 2 \cdot k \cdot 3^d$ and $s = ((\log t)/c)^{1/d}$ we have $t^{k \cdot 3^d/c} = \sqrt{t}$ and
$$p = O\!\left(\left(\frac{\log t}{c}\right)^{k} \times \sqrt{t} \times s\right) = O\big(\log^{k+1} t \cdot \sqrt{t}\big).$$
In the initialization phase, the input must be read in and converted to block serial numbers so
that the initial representation of $D$ will be correct; $s$, $t/s$ and the numbers $(t/s)^2, (t/s)^3, \ldots, (t/s)^d$
must be calculated. This can be done in time $O(t/\log t)$.
The total time for all 3 phases of the simulation is thus dominated by the time for the
simulation phase, and the theorem follows. $\square$
As an immediate corollary of this theorem and the U-Ram hierarchy theorem we observe
that for every time bound $T$, $T(n) \ge n \log n$, there are sets which can be accepted by a U-Ram
in time $T$ but which cannot be accepted by any multi-dimensional Turing machine in time $cT$
for any constant $c$.
The proofs of Theorems 3.2 and 3.3 rely on the ability of the U-Ram to efficiently implement
multi-dimensional arrays so that successive configurations of the computation can be obtained
quickly. Using indirect addressing and then adding to compute offsets allows any entry of a
(pre-existing) $k$-dimensional array to be accessed in $2 \cdot k$ steps of the U-Ram. The array can be
viewed as a tree of $k + 1$ levels in which leaf nodes represent array entries, the first $k$ levels
correspond to the $k$ dimensions of the array, and the branching factor of all nodes at level $i$ is
equal to the extent $b_i$ of the corresponding dimension. In the U-Ram implementation of the
tree, registers represent nodes and all sons of a node are stored in consecutive registers. Each
father node holds the address of his first son; thus the address of any of his sons can be quickly
calculated. The size of the tree is $1 + \sum_{i=1}^{k} \prod_{j=1}^{i} b_j$ and the number of leaves is $\prod_{j=1}^{k} b_j$. It can be shown
by induction that the number of internal (non-leaf) nodes is less than the number of leaf nodes
if each dimension has extent $\ge 2$. The time to set up this tree structure is then $O\big(\prod_{i=1}^{k} b_i\big)$, which is
linear in the number of array entries.
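A Python sketch of this tree layout, assuming 0-based subscripts and placing the root in register 0 (both our choices); `build_array_tree` and `access` are illustrative names, and a Python list stands in for the U-Ram's registers. Each level of an access costs one indirect load of a first-son address and one addition of the subscript, matching the roughly two U-Ram steps per dimension described above.

```python
def build_array_tree(extents, leaf_value):
    """Lay out a k-dimensional array as a (k+1)-level tree in a flat register file.

    Register 0 is the root.  Every internal node holds the address of its
    first son, and all sons of a node occupy consecutive registers, so the
    son for subscript u (0-based here) lives at first_son + u.
    """
    regs = [0]
    level = [(0, ())]                    # (address, subscripts so far)
    for b in extents:
        next_level = []
        for addr, subs in level:
            regs[addr] = len(regs)       # address of this node's first son
            for u in range(b):
                next_level.append((len(regs), subs + (u,)))
                regs.append(0)
        level = next_level
    for addr, subs in level:             # leaves hold the array entries
        regs[addr] = leaf_value(subs)
    return regs


def access(regs, subs):
    """One indirect load (first-son address) and one addition per dimension."""
    x = 0                                # address of the root
    for u in subs:
        x = regs[x] + u                  # x <- x_x, then x <- x + u
    return regs[x]                       # final load of the entry
```

The set-up loop touches each register once, so its running time is linear in the size of the tree, and hence (when every extent is at least 2) linear in the number of array entries.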
Although we have thus far been unable to prove that the additional instructions of the
U-Ram allow it to accept more sets than a US-Ram in a given time bound, Rackoff (personal
communication) conjectured that the specific programming technique of directly implementing
$d$-dimensional arrays (such that any entry could be accessed in $d$ steps given $d$ registers
containing "subscripts") is not possible on US-Rams. To make this idea precise we consider the
case of an $n \times n$ 2-dimensional square array. By a solution to the 2-D array access problem of
size $n$ we mean a triple $(R, f, C)$ where $R$ is a US-Ram, $f$ is an injective function $f: [n]^2 \to \mathbb{N}$ and
$C$ specifies the initial contents of $R$'s registers $x_2, x_3, x_4, x_5, \ldots$, such that when started with two
integers $v_0$ and $v_1$ ($1 \le v_0 \le n$, $1 \le v_1 \le n$) placed in registers $x_0$ and $x_1$, $R$ computes $f(v_0, v_1)$ and
stores its value in $x_3$. We interpret this statement of the problem as follows:
-For a given $n$, $f$ is a one-one map from each possible pair of array subscripts to the
address of the register containing the corresponding array element.
-The US-Ram $R$ can initially contain in its registers $x_2, x_3, \ldots$ any information depending
on $n$ but not depending on the specific input subscripts $v_0, v_1$.
-$R$ computes $f(v_0, v_1)$ and stores it in $x_3$.
THEOREM 3.4. For every constant $d$, for all sufficiently large $n$, any solution $(R, f, C)$ to the 2-D
array access problem of size $n$ is such that $R$ requires more than $d$ steps on some input pair
$(v_0, v_1)$.
Note: It should be pointed out that while this theorem gives a lower bound on a specific 
programming technique in a very strong way (by allowing a different technique for every n) it 
says nothing about differences in the time-bounded acceptance powers of the U-Ram and 
US-Ram. 
Proof. Suppose the contrary. Then there exists a constant $d$ such that for arbitrarily large $n$
there is a solution $(R, f, C)$ which requires $\le d$ steps on all input pairs $(v_0, v_1)$. The computations
of $R$ on all $n^2$ possible input pairs can be described by means of a rooted directed tree in which
nodes represent the instructions of $R$'s program. Test instruction nodes have outdegree 2; all
other nodes have outdegree $\le 1$. By assumption each branch through the tree from the root has
length $\le d$, and the total number of branches in the tree is therefore at most $2^d$. We conclude
that some particular branch $b$ must be taken by at least $n^2/2^d$ of the possible input pairs. We
now show that if $n$ is sufficiently large no such branch $b$ can exist.
The sequence of instructions labelling the branch $b$ has certain effects on the storage of $R$
which may depend on the specific input pairs. We wish to analyze the total number of different
possible values which could be placed in $x_3$ by all the possible input pairs for which this branch
is followed. To do this we will use the concept of a (potentially) altered register, that is, a
register which could have been altered by execution of the instructions on $b$; and we will define
the set $V_A(i)$ ($0 \le i \le d$) to be the set of all the values which could occur in registers which are
potentially altered after $i$ instructions have been executed. By convention, $x_0$ and $x_1$ are viewed
as altered registers, and since they each may initially contain $n$ different values, $|V_A(0)| = n$.
Claim. For all $i$, $|V_A(i)| \le n \cdot 2^i$. That is, the total number of possible values in the registers
which could have been altered at most doubles for each instruction executed.
Proof (by induction on $i$). The base step $i = 0$ follows from the definitions. For the
induction step we show that execution of the $(i+1)$th instruction can at most double the size of
$V_A$ (i.e. $|V_A(i+1)| \le 2 \cdot |V_A(i)|$) by considering each possible type of instruction.
Case 1. Input-output, testing and branching instructions do not alter registers and thus do
not affect $V_A$.
Case 2. (The $(i+1)$th instruction is of the form $x_j \leftarrow x_k$.) If $x_k$ has been altered, $V_A(i+1) = V_A(i)$.
If not, $|V_A(i+1)| \le |V_A(i)| + 1$, since only the value in $x_k$ is added to the set $V_A(i)$. (An assignment
$x_j \leftarrow c$ of a constant likewise adds at most one value.)
Case 3. ($x_i \leftarrow x_i + 1$.) If $x_i$ has been altered, it could contain any of the values in $V_A(i)$;
adding 1 to each of these values gives us at most $|V_A(i)|$ possible new values. If $x_i$ has not yet
been altered, then at most one element is added to $V_A(i)$.
Case 4. ($x_i \leftarrow x_i - 1$.) As in case 3.
Case 5. ($x_{x_j} \leftarrow x_i$.) If $x_i$ has been altered, no new elements are added to $V_A(i)$; otherwise,
one new value is added.
Case 6. ($x_k \leftarrow x_{x_j}$.) If $x_j$ is unaltered this is equivalent to case 2; one new value may be
added. If $x_j$ has been altered it can take on at most $|V_A(i)|$ different values, and the registers
indexed by these values may all contain new values. Thus $|V_A(i+1)| \le 2 \cdot |V_A(i)|$.
This completes the proof of the claim. Thus after $d$ steps, over all the input pairs which
cause this branch to be selected (there are at least $n^2/2^d$), no more than $n \cdot 2^d$ different values
could be placed in any of $R$'s registers, and in particular in register $x_3$. Since $f$ is injective,
these input pairs require at least $n^2/2^d$ different values in $x_3$. Observing that
$n \cdot 2^d < n^2/2^d$ whenever $n > 2^{2d}$ shows that this branch cannot work correctly for all $n^2/2^d$
of these input pairs, contradicting the assumption and completing the proof. $\square$
The details of the proof of the theorem establish worst and average case lower bounds of
$\Omega(\log n)$ steps for an access; this bound ignores the kind of tests used to select the branches. If
the instruction set of the US-Ram were extended by permitting comparisons of register
contents (such as "IF $x_i > x_j$ THEN") the above theorem would still hold. Rackoff has since
observed that on the US-Ram, for all $n$, $O(\log n)$ steps (with no tests of any kind) are sufficient
to solve the 2-D array access problem. Thus the above lower bound is optimal to within a
constant factor.
A 2D-US-Ram could be defined to be a RAM on which an unbounded plane of registers
was available and program register references were doubly subscripted. On such a model, not
only the 2-D array access problem but the analogous problem for higher dimensions could be
solved quickly (i.e. in a number of steps depending only on the number of dimensions).
4. CONCLUSIONS AND OPEN PROBLEMS 
Other types of RAM model have been proposed, varying from the ones presented here in
instruction set or cost function. One significant change is the introduction of parallelism. For
example, a model proposed by Goldschlager [19] has an instruction set similar to our L-Ram,
but the timings of some instructions have been changed to reflect more closely the costs which
would be incurred in a physical realization of the model using highly parallel circuitry. (For
example, the cost of an indirect assignment instruction $x_j \leftarrow x_{x_i}$ would be $\ell(x_i) + \log \ell(x_{x_i})$, in
contrast to the L-Ram's cost of $\ell(x_i) + \ell(x_{x_i})$.) Stockmeyer [20] has studied the relation between
the U-Ram and the more powerful vector machine, in which the arithmetic operations of the
U-Ram have been replaced by bit-wise boolean operations between registers and shift
instructions for individual registers. We have not here considered such parallel models, restricting
our attention to sequential machines.
In Aho, Hopcroft and Ullman[21] many algorithms are described in terms of RAM programs 
in which instructions can include the computation of products, the “mod” function, quotients 
and many other functions. Whether these algorithms could all be implemented on a U-Ram to 
work in the same time bound (to within a linear factor) is an interesting open problem. 
Kasai [22] has studied the relationship between the L-Ram and a similar model in which no
indirect addressing is permitted, as well as showing that for on-line computations the $T^2$
simulation of the L-Ram by the Tm (Reckhow [15]) cannot be improved.
It is interesting to note that while we have a linear simulation of Turing machines by
L-Rams, this result applies only for Tm's working in time $t(n) \ge n \log n$. This suggests the
possibility that L-Rams may not be able to linearly simulate linear time-bounded Turing
machines. While an L-Ram can accept the set $L_1 = \{a^n b^n \mid n \ge 1\}$ in linear time (by using separate
registers to hold each bit of a "counter", with lower order bits stored in lower numbered
registers), we conjecture that there are sets (such as $L_2 = \{x x^R \mid x \in \{0, 1\}^*$ and $x^R$ denotes the
reversal of the string $x\}$) which cannot be accepted by any L-Ram in linear time. What appears
to make $L_2$ more difficult to accept on the L-Ram is the fact that, from an information theoretic
viewpoint, to accept a word of length $n$ belonging to $L_1$ in linear time only $O(\log n)$ bits of
information need be stored, while $L_2$ may require $\Omega(n)$ bits to be stored for linear time
acceptance. Pippenger has observed that the L-Ram can load $k$ input bits into a register in time
$O(k^2)$, and thus could store away a binary input of length $n$ in cost $O(n \cdot (\log n)^{1/2})$ by storing
$(\log n)^{1/2}$ bits to a register. It remains an open problem to establish a nonlinear lower bound for
this problem.
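A Python sketch of the counting half of the parenthetical remark about $L_1$, as we read it (the accounting is our assumption, not code from the text): a binary counter is kept one bit per register, lowest bit in $x_1$, and each bit register touched is charged $\ell(\text{address}) + \ell(\text{content})$. Bit $i$ is touched only about $n/2^{i-1}$ times, so cheap low addresses absorb almost all of the work and the total stays within a constant times $n$.

```python
def ell(m: int) -> int:
    """l(m) = max(1, ceil(log2 |m|))."""
    a = abs(m)
    return 1 if a <= 1 else (a - 1).bit_length()


def count_with_bit_registers(n: int):
    """Increment a binary counter n times, bit i held in register x_{i+1}.

    Lower-order bits sit in lower-numbered registers, so the frequently
    touched bits have short addresses.  Each bit register touched is charged
    l(address) + l(content), as for an indirect L-Ram access; updating the
    address register would add at most a further l(address) per touch.
    """
    bits = {}                            # register address -> 0 or 1
    cost = 0
    for _ in range(n):
        addr = 1
        while bits.get(addr, 0) == 1:    # carry: clear trailing 1-bits
            cost += ell(addr) + 1
            bits[addr] = 0
            addr += 1
        cost += ell(addr) + 1            # set the lowest 0-bit
        bits[addr] = 1
    value = sum(1 << (a - 1) for a, b in bits.items() if b)
    return value, cost
```

The sketch covers only counting the $a$'s; counting back down over the $b$'s has the symmetric borrow pattern.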
It would also be of interest to develop tighter bounds for the simulation of the US-Ram by
the L-Ram. Borodin has conjectured that given any initial L-Ram configuration, the number of
different configurations which could be reached via any computation of cost $q$ is $O(2^{q/\sqrt{\log q}})$. In
contrast to this, a US-Ram with appropriate initial contents can reach $\Omega(2^t)$ distinct
configurations via computations of $t$ steps. Thus a proof of Borodin's conjecture might lead to a proof
that the L-Ram cannot linearly simulate the US-Ram.
Observing the various speedups of other models possible on the U-Ram, another open
problem suggests itself: under what conditions can any uniform Bounded Activity Machine
with polynomially limited accessibility (in the sense of Cook and Aanderaa [23]) be simulated by
a U-Ram (1) in linear time, or (2) with speedup?
A US-Ram whose program contains no indirect addressing instructions is essentially an 
augmented counter machine, since both models perform the same kinds of tests and operations 
on storage. The sets accepted by polynomial time bounded counter machines are all members 
of DLOG = {A | A is a language accepted by a log space bounded Turing machine}. The sets
accepted by polynomial time bounded US-Rams comprise exactly P = {A | A is a language
accepted by a polynomial time bounded Turing machine}. It is a long-standing problem of
computer science to determine the exact relationship between the sets P and DLOG. (It is 
generally believed that P properly contains DLOG.) The speedup of ACM’s by US-Rams gives 
us a setting in which we can at least conclude that “indirect addressing helps” in terms of time 
bounded computations. 
Acknowledgements-This research was done while I was a graduate student under the supervision of Allan Borodin. I’d 
like to thank Allan Borodin, Charles Rackoff, Nicholas Pippenger, Martin Tompa and Les Goldschlager for their help, and 
the National Research Council of Canada for financial support.
REFERENCES 
1. A. M. Turing, On computable numbers with an application to the Entscheidungsproblem. Proc. London Math. Soc.,
Ser. 2, 42, 230-265 (1936). Corrections ibid. 43, 544-546 (1937).
2. A. Church, An unsolvable problem of elementary number theory. Am. J. Math. 58, 345-363 (1936).
3. J. Hartmanis and R. E. Stearns, On the computational complexity of algorithms. Trans. Am. Math. Soc. 117, 285-306
(1965).
4. J. E. Hopcroft and J. D. Ullman, Formal Languages and Their Relation to Automata. Addison-Wesley, Don Mills,
Ontario (1969).
5. J. E. Hopcroft, L. Valiant and W. Paul, On time versus space and related problems. Proc. 16th IEEE Conference on
Switching and Automata Theory, pp. 57-64 (1975).
6. M. J. Fischer and A. L. Rosenberg, Limited random access Turing machines. Proc. 9th IEEE Conference on Switching
and Automata Theory, pp. 356-367 (1968).
7. P. C. Fischer, A. R. Meyer and A. L. Rosenberg, Real-time simulation of multi-head tape units. J. Assoc. Comput. Mach.
19, 590-603 (1972).
8. W. J. Savitch and P. M. B. Vitanyi, Linear time simulation of multi-head Turing machines with head-to-head jumps.
Tech. Report IW79/77, Stichting Mathematisch Centrum, Amsterdam (1977).
9. N. Pippenger and M. J. Fischer, Relations among complexity measures. IBM Research Report RC 6569 (1977).
10. M. Minsky, Recursive unsolvability of Post’s problem of “tag” and other topics in the theory of Turing machines.
Annals Math. 74(3), 437-455 (1961).
11. P. C. Fischer, A. R. Meyer and A. L. Rosenberg, Counter machines and counter languages. Math. Systems Theory 2,
265-283 (1968).
12. M. J. Fischer and A. L. Rosenberg, Real-time solutions of the origin crossing problem. Math. Systems Theory 2,
284-296 (1968).
13. B. Leong and J. Seiferas, New real-time simulations of multi-head tape units. Proceedings 9th ACM Symposium on
Theory of Computing, Boulder, Colorado, pp. 239-248 (1977).
14. S. A. Cook, Linear time simulation of deterministic two-way pushdown automata. Tech. Report 22, Dept. of Computer
Science, University of Toronto (1970).
15. R. A. Reckhow, Diagonal theorems for random access machines. M.Sc. Thesis, Department of Computer Science,
University of Toronto (1971).
16. M. J. Fischer, Lecture notes for course 6853, No. 20. Department of Electrical Engineering and Computer Science,
Massachusetts Institute of Technology (1975).
17. I. H. Sudborough and A. Zalcberg, On families of languages defined by time-bounded random access machines. Proc.
Conf. Mathematical Foundations of Computer Science, High Tatras, Czechoslovakia (1973).
18. A. Schönhage, Real-time simulation of multidimensional Turing machines by storage modification machines. Technical
Memorandum 37, Project MAC, Massachusetts Institute of Technology (1973).
19. L. Goldschlager, Synchronous parallel computation. Ph.D. Thesis, Department of Computer Science, University of
Toronto (1977).
20. L. J. Stockmeyer, Arithmetic versus boolean operations in idealized register machines. IBM Research Report RC 5954,
T. J. Watson Research Centre, Yorktown Heights, New York (1976).
21. A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of Computer Algorithms. Addison-Wesley, Don
Mills, Ontario (1974).
22. T. Kasai, Computational complexity of multi-tape Turing machines and random access machines. Proc. 1st IBM
Symposium on Mathematical Foundations of Computer Science, IBM, Japan (1976).
23. S. A. Cook and S. O. Aanderaa, On the minimum computation time of functions. Trans. Am. Math. Soc. 142, 291-314
(1969).