On similarity and duality of computation (I)  by Hong, Jia-Wei
INFORMATION AND CONTROL 62, 109--128 (1984) 
On Similarity and Duality 
of Computation (I) 
JIA-WEI HONG
Peking Computing Institute, China 100044 
This paper describes what parallel time means for sequential models and what
sequential time means for parallel models, and proves that the well-known
computational models RAM, vector machines, Turing machines, uniform circuits,
uniform aggregates, storage modification machines, hardware modification
machines, ..., are all similar in the sense that their parallel time, their
sequential time, and their space complexities are polynomially related
simultaneously. © 1984 Academic Press, Inc.
0. THE DEVELOPMENT OF TURING'S THESIS 
Because of the stimulus of parallel computation and complexity theory, 
people have proposed many computational models. For example, Mul- 
titape TM, RAM (Aho, Hopcroft, and Ullman, 1974), vector machines 
(Pratt and Stockmeyer, 1978), multitape multihead multidimensional TM 
(Aho et al., 1974), storage modification machines (Schonhage, 1979), 
hardware modification machines (Dymond and Cook, 1980), uniform cir- 
cuits (Borodin, 1977), uniform aggregates (Dymond and Cook, 1980), 
VLSI (Brent and Kung, 1980; Thompson, 1979), and so on. Each model 
has several different resources. Even for one model, there are many different
computational modes (e.g., deterministic, nondeterministic, alternating, ...).
Therefore one can ask whether the complexity of a given class of
problems is a model-independent objective reality, how to unify the models,
and so on.
In recent years, people have begun to notice relations between different 
models. Cook and Reckhow (1973) and many others noticed that not only
can different computational models simulate each other in principle, but
their sequential time complexities are also polynomially related, which
strengthens Turing's thesis. Pratt and Stockmeyer (1978) gave the first
example of the polynomial equivalence of parallel time and sequential 
space. Chandra and Stockmeyer (1976) and Goldschlager (1978) proposed 
independently a parallel computation thesis which says that for parallel 
computational models, parallel time corresponds to space. Borodin (1977) 
pointed out that deterministic Turing machine time corresponds to uniform 
circuit size, while deterministic Turing machine space corresponds to 
uniform circuit depth. He also pointed out that in fact the equivalence of 
Turing machine time and uniform circuit size goes back to works of 
Schnorr, Pippenger, and Fischer. Pippenger (1979) proved that Turing
machine time and space correspond to uniform circuit size and width 
simultaneously, while Turing machine time and reversal correspond to 
uniform circuit size and depth simultaneously. Dymond (1980) proved that 
Turing machine reversal and space correspond to uniform circuit depth 
and width simultaneously. He also proposed an extended parallel com- 
putation thesis that Turing machine reversal and space correspond to 
parallel time and hardware. Later, Dymond and Cook (1980) proposed the
uniform aggregates model and proved this thesis for it.
In this paper, the author proposes the similarity and duality principle. 
The similarity principle says that not only all idealized computational 
models are equivalent, but also simultaneously they use essentially the 
same parallel time, essentially the same space and essentially the same 
sequential time. By "essentially the same" we mean polynomially related. 
The duality principle says that parallel time is dual to space in some sense
(this will be discussed further in the author's paper "On Similarity and
Duality of Computation II"). These two principles are therefore further
developments along this line.
In particular, a generalized concept of reversal, or parallel time, is defined
so that some very important sequential models, such as the RAM and SMM,
can be compared with parallel models. Now we can say that a class of
problems has a fast parallel algorithm if and only if it has a RAM program 
with low reversal. Hence the problem of transforming a sequential program 
to a parallel program is solved in principle. 
1. LOG-SPACE TRANSFORM MACHINE LSTM 
As a tool, we propose the following computational model: 
DEFINITION 1.1. A transform machine is a multitape Turing machine
with a special state q. It has a read-only input tape, k work tapes, and a
write-only output tape. The input alphabet and the output alphabet are the
same. It works like an ordinary multitape Turing machine, except that when
the machine enters the special state q, it removes the original contents of
the input tape, transfers the whole contents of the output tape to the input
tape, automatically changes the contents of all work tapes and the output
tape to blanks, moves all its tape heads to their original places, and goes
into the starting state $q_0$. The machine halts when it enters a terminating
state. The contents of the output tape are then considered to be
the result of the computation.
The "phases" are the total number of times that the machine goes into q 
or a terminating state during the computation. The width is the maximum 
total length of the input and output tape contents during the computation. 
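The bookkeeping in Definition 1.1 can be illustrated with a minimal sketch. It assumes that one phase is abstracted as a Python function from the input tape to a pair (output tape, halted flag); the names `run_transform_machine` and `halve_ones` are hypothetical, and the sketch does not enforce any bound on the work tapes.

```python
from typing import Callable, Tuple

# A phase maps the input tape to (output tape, halted). halted=False models
# entering the special state q; halted=True models a terminating state.
Phase = Callable[[str], Tuple[str, bool]]


def run_transform_machine(phase: Phase, tape: str, max_phases: int = 10_000):
    """Return (result, phases, width) for the given initial input tape."""
    phases = 0
    width = 0
    while phases < max_phases:
        out, halted = phase(tape)
        phases += 1  # one more entry into q or into a terminating state
        # Width: maximum total length of input and output tape contents.
        width = max(width, len(tape) + len(out))
        if halted:
            return out, phases, width
        tape = out  # the output tape becomes the next input tape
    raise RuntimeError("phase limit exceeded")


def halve_ones(tape: str) -> Tuple[str, bool]:
    """Toy phase (an assumption for illustration): keep every second symbol."""
    out = tape[::2]
    return out, len(out) <= 1


result, phases, width = run_transform_machine(halve_ones, "1" * 8)
```

On the unary input of length 8 the toy machine needs three phases, and the width is reached in the first phase, when the input tape holds 8 symbols and the output tape holds 4.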
DEFINITION 1.2. Let n be the original input length, and r, s be two non- 
decreasing functions. If the phases and width of the machine are bounded 
by r(n) and s(n), we say that it is an (r, s) transform machine. Furthermore, 
if the maximum length of the k work tapes is bounded by c log s(n), where 
c is a constant, then we say that the machine is a log-space (r, s) transform 
machine, or simply an LSTM. 
Notice that we may choose s large enough so that a transform machine
becomes an (r, s)-LSTM.
LEMMA 1.1. Suppose that r, s are nondecreasing functions. Let
$s_1(x) = x(s(x) + 1)$, let $\Sigma$ be the input-output alphabet of an (r, s)
LSTM L, let $w_i \in \Sigma^*$ be a word, let \$ be a separator, and let $L(w_i)$ be the
result of L when the input is $w_i$. Then there is an $(r, s_1)$ LSTM which transforms the input
$w_1 \$ w_2 \$ \cdots \$ w_j$ to the output $L(w_1) \$ L(w_2) \$ \cdots \$ L(w_j)$, where j is an
arbitrary integer.
Proof. Suppose that L transforms $w_i$ into $w_{iv}$ at the vth time L enters its
special state. The number of phases in which L transforms $w_i$ into $L(w_i)$ is $\le r(|w_i|)$. Then
we can design an LSTM $L_1$ which transforms $w_1 \$ w_2 \$ \cdots \$ w_j$ into
$w_{11} \$ w_{21} \$ \cdots \$ w_{j1}$ the first time $L_1$ enters its special state, into
$w_{12} \$ w_{22} \$ \cdots \$ w_{j2}$ the second time $L_1$ enters its special state, and so on.
Therefore, the phases of $L_1$ are not more than
$$\max\{ r(|w_i|) : i = 1, 2, \ldots, j \} \le r(m), \quad \text{where } m = |w_1 \$ w_2 \$ \cdots \$ w_j|$$
is the input length of $L_1$. The width of $L_1$ is not more than
$$\sum_{i=1}^{j} s(|w_i|) + j \le \sum_{i=1}^{j} s(m) + j \le j(s(m) + 1) \le m(s(m) + 1) \le s_1(m).$$
The total work tape length is not more than
$$\max\{ c \log s(|w_i|) : i = 1, 2, \ldots, j \} \le c \log s(m).$$
Hence the lemma is proved. ∎
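The lockstep construction in this proof can be sketched as follows, assuming that one phase of L is abstracted as a Python function from a word to a pair (new word, halted flag). The names `run_batch` and `halve_ones` are hypothetical, and the per-word "done" flag is an assumption about how $L_1$ remembers which words are finished.

```python
from typing import Callable, List, Tuple

Phase = Callable[[str], Tuple[str, bool]]


def run_batch(phase: Phase, batch: str, sep: str = "$") -> Tuple[str, int]:
    """Apply `phase` to every word of a separated batch in lockstep.

    Returns the separated results and the number of phases of the batch
    machine, which equals the maximum number of phases over the words.
    """
    words: List[Tuple[str, bool]] = [(w, False) for w in batch.split(sep)]
    phases = 0
    while not all(done for _, done in words):
        # One phase of the batch machine advances every unfinished word once.
        words = [(w, True) if done else phase(w) for w, done in words]
        phases += 1
    return sep.join(w for w, _ in words), phases


def halve_ones(word: str) -> Tuple[str, bool]:
    """Toy phase (an assumption for illustration): keep every second symbol."""
    out = word[::2]
    return out, len(out) <= 1


output, phases = run_batch(halve_ones, "11111111$11")
```

The first word needs three phases and the second needs one, so the batch finishes in three phases, matching the bound by the maximum over the words.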
DEFINITION 1.3. A function f(n) is log-space constructable if there is an
LSTM of phases 1 and width n + f(n) which, given $1^n$ as input, obtains
$1^{f(n)}$, where $1^n$ and $1^{f(n)}$ are the unary representations of n and f(n),
respectively.
DEFINITION 1.4. A family of codings $\{C_n \mid C_n \in \Sigma^*, n = 1, 2, \ldots\}$ is
log-space constructable if there is an LSTM of phases 1 and width $n + |C_n|$
which, given $1^n$ as input, obtains $C_n$.
For convenience, we use $f^*$ to represent a polynomial of f. For example,
$f^* \ge g$ means that there exists a polynomial p(x) such that $p(f(n)) \ge g(n)$
holds for all n.
DEFINITION 1.5. Let n be the total length of the input. A function
$F: \Sigma^* \times \Sigma^* \to \Sigma^*$ belongs to NC (see footnote 1) if there is a
$((\log n)^*, n^*)$ LSTM which can compute this function.
DEFINITION 1.6. A pair of functions (r, s) is nice if the following conditions
are all satisfied:
(1) r and s are nondecreasing, log-space constructable functions.
(2) $r(n) = \Omega(\log n)$, $s(n) = \Omega(\log n)$.
(3) $r \le 2^{s^*}$, $s \le 2^{r^*}$.
(4) $(rs)^* = \Omega(n)$.
(5) $r^* \ge s$ or $s^* \ge r$.
LEMMA 1.2. If (r, s) is a nice pair of functions, then for any two
polynomials $p_1, p_2 \not\equiv 0$ with nonnegative integer coefficients, $(p_1(r), p_2(s))$ is
a nice pair.
Proof. The above five conditions are all obviously satisfied by
$(p_1(r), p_2(s))$. ∎
2. THE MAIN RESULT 
Suppose that there are two computational models, Model 1 and Model
2, for which we have defined the parallel time and space complexities. We
say a machine is an (r, s) machine if the parallel time and the space of the
machine are not more than r(n) and s(n), respectively, where n is the input
length, and r, s are two functions.
1. Remark. NC is already defined in another way, i.e., as the class of functions which can be
computed by a multitape TM within $(\log n)^*$ reversal and $n^*$ space. We shall see that these two
definitions are equivalent.
Without loss of generality, we can assume that different models use the 
same input output alphabet {0, 1 }. 
DEFINITION 2.1. If for any nice pair of functions (r, s) and for any (r, s)
machine $M_1 \in$ Model 1, there is an $(r^*, s^*)$ machine $M_2 \in$ Model 2 such
that for all inputs w of $M_1$, $M_2(w) = M_1(w)$ holds, then we say Model 2
can simulate Model 1 simultaneously within polynomials for parallel time
and space, or simply simulate simultaneously. If they can simulate each
other simultaneously, we say they are similar.
Of course, we have to give precise definitions of parallel time, space, and 
sequential time for each individual model. But the following principles 
should be kept in mind: 
(1) The space (width) is the maximum length of intermediate infor- 
mation in the computation. 
(2) The parallel time (reversal) is the total number of phases. A 
phase is a period of the computation during which no information is writ- 
ten on the work space and is read later (during the same period). 
(3) The sequential time is the total number of primitive operations 
during the computation. 
These are not mathematical definitions, but basic concepts in
computer science. The main result of this paper is the following theorem: 
THEOREM 2.1. The following computational models are similar: 
(1) LSTM (phases, width) (Sect. 1) 
(2) Multi-index RAM (reversal, space) (Sect. 3) 
(3) Vector machines (time, space) (Sect. 5 and Pratt and 
Stockmeyer, 1978) 
(4) Multi-tape TM (reversal, space) (Sect. 5 and Aho et al., 1974)
(5) Multi-tape multi-head multi-dimensional TM (reversal, space) 
(Sect. 7 and Aho et al., 1974) 
(6) Storage modification machines (reversal, space) (Sect. 7 and 
Hong, 1984; Schonhage, 1979) 
(7) Hardware modification machines (time, space) (Sect. 7 and 
Dymond and Cook, 1980) 
(8) Uniform circuits (depth, width) (Sect. 6 and Borodin, 1977) 
(9) Uniform aggregates (time, space) (Sect. 6 and Dymond and 
Cook, 1980) 
(10) Uniform arithmetic circuits (depth, width) (Sect. 6)
(11) Uniform VLSI (time, area) (Sect. 6 and Brent and Kung, 1980; 
Thompson, 1979) 
For each model, under a suitable definition, the sequential time is
polynomially related to the product of parallel time and space. Therefore,
if the parallel time and space of a machine $M_1 \in$ Model 1 are bounded
by r(n) and s(n), respectively, where (r, s) is a nice pair, then there is an
$(r^*, s^*)$ machine $M_2 \in$ Model 2 to simulate $M_1$. Hence the sequential time
of $M_2$ is not more than $(r^* s^*)^* \le r^* s^* \le t^* t^* \le t^*$, where t is the sequential
time of $M_1$. In view of this fact, we propose the following principle:
SIMILARITY PRINCIPLE. All idealized computational models are similar in 
the sense that their parallel time, their space, and their sequential time com- 
plexities are polynomially related simultaneously. 
The similarity principle is not a mathematical proposition, because no 
one can define what is an idealized model, neither can one define what is 
parallel time, space, and sequential time in general. Therefore, we cannot 
prove the similarity principle. However we do show that it characterizes the 
most important and interesting class of computational models. 
The main purpose of this paper is to prove Theorem 2.1. According to
Definition 1.6, for a nice pair of functions (r, s), there are two basic cases:
$r^* \ge s$ or $s^* \ge r$. In the case $r^* \ge s$, we can use traditional sequential
simulation methods. Suppose the bounding functions for parallel time,
space, and sequential time of the machine being simulated are r, s, t,
respectively, and the parallel time, space, and sequential time of the simulator
are R, S, T, respectively. The traditional simulation methods give $S \le s^*$
and $T \le t^*$ simultaneously. (For example, consult Aho et al., 1974,
Chap. 1.) Since $t \le r^* s^*$ and $R \le T$, we have
$$R \le T \le t^* \le (r^* s^*)^* \le r^* s^* \le r^* r^{**} \le r^*.$$
Therefore, Theorem 2.1 holds when $r^* \ge s$. In order to avoid lengthy
discussions, we assume $s^* \ge r$ from now on.
By the condition $(rs)^* \ge n$ in Definition 1.6, we have
$$n \le (rs)^* \le r^* s^* \le s^{**} s^* \le s^*.$$
Since we are only interested in polynomially related simulations, we simply
assume $s \ge n$.
3. MULTI-INDEX RAM 
A multi-index RAM has infinitely many registers $R_0, R_1, R_2, \ldots$. Each register can
store a natural number. Among them, $R_0, R_1, \ldots, R_k$ are index registers, and the
others are ordinary registers; k is a constant.
An admissible instruction of the RAM has the form $A \leftarrow F(B, C)$, where
A is the write-address, B and C are read-addresses, and the function F
belongs to NC. An address has two forms:
(1) Direct address $R_i$, i = 1, 2, ....
(2) Indirect address $\#R_i$, i = 0, 1, 2, ..., k. If the content of index
register $R_i$ is j, then the address is $R_j$.
According to this definition, transfer instructions and arithmetic
operations (addition, subtraction, multiplication, and integer division, all of
which have been proved to be in NC) are all admissible. For example, $R_7 \leftarrow \#R_0$
means that if the content of index register $R_0$ is i, then the contents
of $R_i$ are transferred to $R_7$; $\#R_1 \leftarrow R_1 \times R_1$ means that if the content of index
register $R_1$ is i, then the integer $i^2$ should be put into $R_i$. It is important that
only the values of index registers can be used as indirect addresses.
A RAM program is a finite directed graph. The vertices in the graph are
called states. The fanout of each vertex is not more than 2. If the fanout is
1, then the corresponding edge is labelled with an admissible instruction; if
the fanout is 2, then the corresponding two edges are labelled with two
opposite predicates $R_i = 0$ and $R_i \ne 0$, where $R_i$ is an index register; if the
fanout is 0, then the state is a terminating state. The machine starts from
the state $q_0$. Whenever the machine goes into a state of fanout 1, it performs
the corresponding admissible instruction. Whenever the machine
goes into a state of fanout 2, it takes the branch whose predicate is
satisfied. When the machine enters a terminating state, the contents of some
designated registers are considered to be the results.
We call the length of the binary expression of an integer the length of
this integer. For example, the length of 0 and 1 is 1, the length of 2 and 3 is
2, and so on. Assume that the inputs are l integers, which are initially in
$R_{k+1}, \ldots, R_{k+l}$. The sum of the lengths of these integers is the input length n.
The space cost of a register is the length of the maximum integer stored in
this register during the computation. For a given input, the space is the
sum S of the space costs of the ordinary registers used during the
computation. Since the index registers are used mainly for indirect
addressing, it is natural to assume that the total space cost of
the index registers is not more than c log S, where c is a constant.
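The length and space conventions above can be checked with a short sketch. The function names and the `history` dictionary (register number mapped to the list of values stored in it over time) are hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def int_length(x: int) -> int:
    """Length of the binary expression of a natural number (0 has length 1)."""
    return max(1, x.bit_length())


def space(history: Dict[int, List[int]]) -> int:
    """Sum over the ordinary registers used of the length of the maximum
    integer stored in each register during the computation."""
    return sum(int_length(max(values)) for values in history.values())


# Hypothetical run: register 4 held 0, 5, 2 over time; register 5 held 1.
example_space = space({4: [0, 5, 2], 5: [1]})
```

Here register 4 costs 3 (the length of 5) and register 5 costs 1, so the space of the run is 4.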
DEFINITION 3.1. For a given input, the reversal is the total number of
phases. A phase is a period of the computation during which no ordinary
register is written into and then read later (in the same period).
Notice that in our definition, index registers are not considered as work 
space. The change and reuse of their contents will not interrupt a phase. 
Why do we need them? Because a RAM program contains only
finitely many register names. Therefore, if we did not have some "free
space," we would have to reuse a name and interrupt a phase within a constant
number of steps, and we could not compare the RAM with other models.
Furthermore, this is not fair, since for other sequential models, it is very 
natural to have some device which is like the index registers and which is 
not counted as work space. For example, the positions of tape heads of 
multitape Turing machines are not counted as work space. The change and 
reuse of their values will not interrupt a phase. 
LEMMA 3.1. For any RAM program $M_1$, there is a RAM program $M_2$ to
simulate $M_1$ such that for any input, if the reversal and space of $M_1$ are R
and S, respectively, then for the same input, the reversal and space of $M_2$ are
O(R) and O(S log R), and $M_2$ has the following property: whenever $M_2$
begins a new phase, $M_2$ enters a special state; otherwise $M_2$ does not enter
this state.
Proof. $M_2$ uses an index register I to record the phase number of $M_1$.
$M_2$ uses the 2i-th and (2i+1)-th registers to simulate the i-th register.
Whenever $M_1$ writes an integer into its i-th register, $M_2$ writes the same
integer into its 2i-th register, and transfers the contents of I (the phase
number) to its (2i+1)-th register; whenever $M_1$ reads an integer from its i-th
register, $M_2$ checks whether the contents of its (2i+1)-th register are the same as the
contents of the index register I. If they are (so the phase must be interrupted), then
$M_2$ enters a special state, adds 1 to I, and begins to simulate the next
phase. Since the phase number is not more than R, the space cost to record
the phase number is not more than log R. Hence the space of $M_2$ is
O(S log R). The reversal is the same. The length of the contents of I is not
more than log R = O(log S). ∎
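The tagging idea in this proof can be sketched as follows. The sketch assumes that the computation of $M_1$ is given as a trace of read and write accesses to ordinary registers; the name `count_phases` and the trace encoding are hypothetical, and the dictionary `written_in` plays the role of the (2i+1)-th shadow registers.

```python
from typing import Dict, Iterable, List, Tuple


def count_phases(trace: Iterable[Tuple[str, int]]) -> Tuple[int, List[int]]:
    """Count phases of a trace of ('w', i) / ('r', i) register accesses.

    Returns (number of phases, steps at which a new phase begins).
    """
    phase = 1                        # contents of the index register I
    written_in: Dict[int, int] = {}  # phase number of the last write to register i
    boundaries: List[int] = []
    for step, (op, i) in enumerate(trace):
        if op == "r" and written_in.get(i) == phase:
            # A value written in the current phase is read: the phase must end.
            phase += 1
            boundaries.append(step)
        elif op == "w":
            written_in[i] = phase    # tag the write with the current phase
    return phase, boundaries


trace = [("w", 5), ("w", 6), ("r", 7), ("r", 5), ("w", 5), ("r", 6), ("r", 5)]
phases, boundaries = count_phases(trace)
```

In this trace the read of register 5 at step 3 and the read of register 5 at step 6 each hit a value written in the current phase, so the run has three phases.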
4. SIMULATING THE RAM BY THE LSTM 
In this section, we use the LSTM to simulate the RAM. Without loss of 
generality, we can assume that the RAM has the property described in 
Lemma 3.1. Suppose that for some input $w_1, w_2, \ldots, w_l$ ($w_i$ is the binary
representation of an integer), the reversal and space of the RAM are R and
S, respectively. We use a pair of integers (i, c(i)) to represent that the
contents of register $R_i$ are c(i). Hence, the instantaneous description (ID) of the RAM
is I = PQ, where the prefix $P = q\,(0, c(0))(1, c(1)) \cdots (k, c(k))$ contains
the state q and the contents of all index registers, and the suffix
$Q = (k+1, c(k+1)) \cdots (u, c(u))$ contains the contents of all ordinary
registers used. It is easy to see that the total length |P| of P is
O(log S), and the total length |I| of I is $O(S^*)$.
In the whole simulation, we try different values of S. We first consider the case
when S is fixed. For a given S, and a given ID I = PQ at the time when the
RAM begins a new phase, we use an LSTM to compute the ID when the
RAM completes this phase. This simulation can be divided into 4 parts.
(1) Find all prefixes: The input of this LSTM is
$$PQ,\ 1^S. \tag{1}$$
The output of this LSTM should be
$$PQ,\ P_1Q,\ P_2Q,\ \ldots,\ P_tQ,\ 1^S, \tag{2}$$
where $P_1, P_2, \ldots, P_t$ are all possible prefixes such that the total length of the
contents of all index registers is not more than c log S, where c is the constant
mentioned in the last section. Therefore $t = O(S^*)$, and the total length of the
output (2) is $O(S^*)$.
This computation can be done by using several counters of length
O(log S), if we output $P_1Q, P_2Q, \ldots, P_tQ$ in lexical order. Therefore, there is
an LSTM of phases 1 and width $O(S^*)$ to complete the task.
(2) Calculate next prefixes: If the ID of the RAM is $P_iQ$, then the
next ID of the RAM is completely determined. In particular, the next prefix
$P_i'$ is completely determined. In fact, if the fanout of the state q in
$P_i$ is 2, then we can determine the next state by looking at the contents of
the corresponding index register to see if it is 0 (cf. Sect. 3); if the fanout
is 1, then we should perform an admissible instruction, which
belongs to NC. Let $P_i'$ be the next prefix of $P_i$, in which the state and the
contents of an index register may be changed. The contents of an ordinary
register may be changed too. We use $(N_i : C_i)$ to represent that the contents
of the $N_i$-th ordinary register should be changed to the integer $C_i$. Since the
operation of an admissible instruction belongs to NC, there is a
$((\log n)^*, n^*)$ LSTM which, given input $P_iQ$, obtains an output of the form
$P_i \to P_i'(N_i : C_i)$, where $n = |P_iQ|$.
According to Lemma 1.1, there is an LSTM, "Calculate next prefixes,"
which transforms the input (2) to an output of the form
$$P_1 \to P_1'(N_1 : C_1),\ P_2 \to P_2'(N_2 : C_2),\ \ldots,\ P_t \to P_t'(N_t : C_t),\ PQ,\ 1^S. \tag{3}$$
The phases and width of this LSTM are $(\log S)^*$ and $S^*$, respectively.
(3) Find phase path: In (3), P' (the next prefix of P) may be some $P_i$
($1 \le i \le t$) if S is large enough. Suppose that i = 5; then $P_5'$ may be $P_3$, and so on.
We can find the "phase path" $P \to P_5 \to P_3 \to \cdots$ step by step until the
phase ends, if S is large enough. Since in one phase, the results calculated
will not be used in the same phase, the information $P_i \to P_i'(N_i : C_i)$ computed
on the phase path is correct until the phase ends. Since the RAM
has the property mentioned in Lemma 3.1, it is easy to find where the
phase ends. It is easy to find an LSTM "find phase path," which, given (3) as input, obtains
(a) an error message "Err" if S is not large enough, or
(b) an output of the form
$$P_1''(N_1'' : C_1''),\ P_2''(N_2'' : C_2''),\ \ldots,\ P_e''(N_e'' : C_e''),\ 1^S. \tag{4}$$
Here $P = P_1'', P_2'', \ldots, P_e''$ is the phase path, $P_e''$ is the prefix
when the phase ends, and $(N_i'' : C_i'')$ is the "change information," which means
that at the ith step, the contents of the $N_i''$-th register should be changed to the
integer $C_i''$. The phases and width of the LSTM are 1 and $S^*$, respectively.
(4) Substitute: Finally, we should substitute all "change information"
into the suffix Q. The contents of an ordinary register may be changed
several times. When the phase ends, the contents of this register should be
equal to the contents after the last change.
Given input (4), for each ordinary register $R_j$ used, we look for the name
j from right to left in (4). If this name j appears, then we substitute the new
contents into Q, and output the result immediately. Therefore, there is an
LSTM, "substitute," of phases 1 and width $S^*$. Given input (4), the LSTM
obtains the output
$$P_e''Q_1,\ 1^S, \tag{5}$$
where $P_e''Q_1$ is the ID when the phase ends. If the input is Err, then the
output of the LSTM "substitute" is Err.
Summarizing the above four steps, the following LSTM of $(\log S)^*$
phases and $S^*$ width transforms (1) to (5) or Err:
Simulate one phase: {Find all prefixes; Calculate next prefixes;
Find-phase-path; Substitute;}
To simulate the RAM, we take S = 2 first. If an Err appears in a stage of
the simulation (which means that S is not large enough), we double S and
simulate again. This increases the phases of the simulator by a factor of at
most log S. To simulate one phase of the RAM, we need $O((\log S)^*)$ phases of
the LSTM. Therefore the whole simulation needs $O(R(\log S)^*)$ phases and
$S^*$ width.
Suppose that the LSTM "Initiate" transforms the input $w_1 \wedge \cdots \wedge w_l$ to the
output $I_0, 1^S$ (where $I_0$ is the initial ID), that the LSTMs "Halt" and
"Error" check whether the state is terminating and whether there is an error message
Err, respectively, and that the LSTM "Take result" transforms $I_t, 1^S$ to the
final result and halts, where $I_t$ is the terminating ID. Then the whole
simulation can be expressed by the following algorithm:
    S ← 2;
L1: Initiate;
L2: Simulate-one-phase;
    if Halt then Take-result else
    if Error then (S ← 2S; goto L1) else goto L2;
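The control flow of this algorithm can be sketched as a doubling loop. The sketch assumes that the four LSTMs are supplied as callbacks; `simulate`, the callback names, and the toy countdown machine are hypothetical stand-ins, not the LSTMs constructed above.

```python
ERR = "Err"


def simulate(inputs, initiate, simulate_one_phase, is_halted, take_result):
    """Restart with a doubled space bound S whenever a phase reports Err."""
    S = 2
    while True:
        ident = initiate(inputs, S)                  # L1: Initiate
        while True:
            ident = simulate_one_phase(ident, S)     # L2: Simulate-one-phase
            if ident != ERR and is_halted(ident):    # Halt
                return take_result(ident), S         # Take-result
            if ident == ERR:                         # Error: S is too small
                S *= 2
                break                                # goto L1


# Toy stand-ins (assumptions for illustration): the ID is a counter that is
# decremented once per phase and needs S >= its bit length to be processed.
def toy_phase(ident, S):
    return ERR if ident.bit_length() > S else ident - 1


result, final_S = simulate(
    13,
    initiate=lambda x, S: x,
    simulate_one_phase=toy_phase,
    is_halted=lambda ident: ident == 0,
    take_result=lambda ident: "done",
)
```

For the toy input 13 the first attempt with S = 2 fails at once, and the second attempt with S = 4 runs to completion.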
Combining the above discussion with Lemma 3.1, we have the following 
theorem. 
THEOREM 4.1. For any multi-index RAM of reversal R and space S,
there is an LSTM of phases $O(R(\log S)^*)$ and width $O(S^* \log R)$ to
simulate it.
5. VECTOR MACHINES 
A vector machine has k vectors. Each vector can store an ultimately con- 
stant sequence of bits (elements of {0, 1}). The length of a vector is the 
number of significant bits in it, that is, the length of the shortest initial 
segment of the sequence whose removal would make the remaining 
sequence constant. For these vectors, the machine can perform the follow- 
ing instructions or check the following predicates: 
(1) $A \leftarrow$ constant, an instruction to load a constant bit-vector into
vector A;
(2) $A \leftarrow \neg B$ and $A \leftarrow B \wedge C$, "bitwise parallel" Boolean operations;
(3) $A \leftarrow B \uparrow C$, which shifts B left a distance given by C; a negative
distance means a right shift; when shifting left the vacated positions are filled
with 0's, and when shifting right the bits shifted out are discarded; the
significant part of vector C is considered to be the binary expression of the
distance; the constant part of vector C is considered to be the sign: $\cdots 00$
means positive (shifting left), $\cdots 111$ means negative (shifting right);
(4) $A = 0$ and $A \ne 0$, predicates for testing whether A is 0 everywhere.
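These instructions can be sketched with Python integers, under the assumption that an ultimately constant bit sequence is modeled by the two's-complement form of an unbounded integer (nonnegative values end in $\cdots 000$, negative values in $\cdots 111$). The function names are hypothetical.

```python
def length(v: int) -> int:
    """Number of significant bits, i.e., bits before the constant part."""
    return v.bit_length() if v >= 0 else (~v).bit_length()


def bit_not(b: int) -> int:
    return ~b                      # A <- not B, bitwise on the whole sequence


def bit_and(b: int, c: int) -> int:
    return b & c                   # A <- B and C


def shift(b: int, c: int) -> int:
    """A <- B shifted by C: the significant part of C is the distance,
    and the constant part of C is the sign."""
    distance = c & ((1 << length(c)) - 1)   # significant part read as binary
    if c >= 0:
        return b << distance       # left shift, vacated positions get 0's
    return b >> distance           # right shift, shifted-out bits discarded


def is_zero(a: int) -> bool:
    return a == 0                  # predicate A = 0
```

For example, `-11` is $\cdots 11110101$ in two's complement: its significant part is 0101, so `shift(b, -11)` shifts `b` right by 5.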
A program (machine) of the VM is a finite directed graph with one start
vertex. The vertices in the graph are called states. The fanout of
each vertex is not more than 2. If the fanout is 2, then the two
corresponding edges are labelled with two opposite predicates; if the fanout
is 1, then the corresponding edge is labelled with an instruction; if the
fanout is 0, then the vertex is called a terminating state.
The input is a word $w \in \{0, 1\}^*$. Initially, the machine is in the start
state, and if w begins with a 1, then $\cdots 000w$, else $\cdots 111w$, is in a vector.
The contents of the other vectors are all 0s. Whenever the machine goes into a
state of fanout 1, it performs the corresponding instruction;
whenever the machine goes into a state of fanout 2, it goes to the next
state along the edge whose predicate is true; when the machine
goes into a state of fanout 0, it halts and the contents of some vectors are
considered to be the results. The total number of steps of the machine is called the
time. The sum of the maximum lengths of the k vectors during the computation
is called the space. Here we state a theorem by Pratt and
Stockmeyer (1978):
PROPOSITION 5.1. For any Turing machine T of space $S \ge \log n$, there is
a vector machine V to simulate T such that the time of V is $O(S^3)$ and the space
of V is $O(c^S)$, where c is a constant.
THEOREM 5.1. For any LSTM of phases P and width W, there is a vector
machine V to simulate it such that the time and space of the vector
machine V are $O(P \log^3 W)$ and $O(W^*)$, respectively.
Proof. In each phase, the LSTM performs a transformation in space $O(\log W)$,
where $\log W \ge \log n$. By Proposition 5.1, there is a vector machine
to simulate it. The time and space of this vector machine are $O(\log^3 W)$ and
$O(c^{\log W}) = O(W^*)$, respectively. Hence to simulate the whole computation
of the LSTM, the vector machine needs $O(P \log^3 W)$ time and $O(W^*)$
space. ∎
In the following, we use multitape Turing machines to simulate vector
machines. Naturally, we use k tapes to simulate the k vectors of the VM. We
will not use the right half of each tape. From right to left, the ith
square corresponds to the ith bit of the vector. To represent the contents of
a vector, we put the binary expression of the contents of the vector on the
corresponding tape. For example, if the contents of a vector
are $\cdots 11110101$, then $\cdots \sqcup \sqcup \sqcup \sqcup {-}0101$ is put on the tape; if the contents
of a vector are $\cdots 00000101$, then $\cdots \sqcup \sqcup \sqcup \sqcup {+}101$ is put on the tape.
In addition to these k tapes, we use three additional tapes A, B, C to
generate the distances $2^d$, d = 0, 1, 2, ..., in order to simulate the shifting
instruction. Tape A (and B) has a mark at its 0th square. Initially, the
heads of these two tapes scan their 0th squares. Moving the head of tape A
left one square, we obtain the distances 0 and 1. In general, if the distances
of the heads of tapes A and B are $2^{d-1}$ and 0, respectively, we can move the
head of tape A right and move the head of tape B left to obtain distances 0
and $2^d$.
THEOREM 5.2. For any vector machine V of time R and space S, there is
a multitape Turing machine T to simulate it such that the reversal and space
of the Turing machine T are O(R log S) and O(S), respectively.
Proof. With the representation described above, simulating
the bitwise parallel operations of the vector machine is straightforward. We
need only consider how to simulate the shifting instruction $V_1 \leftarrow V_2 \uparrow V_3$.
The Turing machine copies the contents of tape 2 (corresponding to $V_2$) to
tape C, and finds the sign of the shifting distance on tape 3. The Turing
machine uses tapes A and B to generate the distances $2^d$ for d = 0, 1, 2, ..., and
meanwhile it looks at the dth square of tape 3. If the symbol in the dth
square is 1, then it shifts the contents of tape C a distance $2^d$ to the left or
right (to do so, we need another work tape). This continues until the contents
of C are all blanks or the shifting instruction is accomplished. If the shift is to the
left (or right, respectively), the total number of reversals needed is not more
than a constant times the logarithm of the length of the new contents of C
(or old contents of C, respectively). Hence the number of reversals needed to
simulate a shifting instruction of the vector machine is O(log S). The total reversal
and space of the Turing machine are O(R log S) and O(S), respectively. ∎
The Turing machine model is much simpler than the multi-index RAM
model. It is very easy to simulate a TM by a RAM. Therefore we have the
following theorem.
THEOREM 5.3. For any multitape Turing machine of reversal R and space 
S, there is a multi-index RAM to simulate it, such that the reversal and space 
of the RAM are O(R) and O(S), respectively. 
Proof. Suppose that T has k tapes. Then it can be simulated by a 2k-tape
Turing machine $T_1$ such that the tapes of $T_1$ are all infinite only to the
right. Hence we can assume that T has this property itself. We use the
registers $R_0, R_1, \ldots, R_k$ of the RAM as index registers: we use $R_i$, i = 1, 2, ..., k, to
record the position of the ith tape head, and use $R_0$ to store some intermediate
results. We use the (i + k(j + 1))-th register to store the symbol in the jth
square of the ith tape, i = 1, 2, ..., k, j = 0, 1, 2, 3, .... The RAM can read the
symbols scanned by the k tape heads of the TM through indirect read
instructions, can simulate the finite control of the TM by testing some
predicates, can modify the symbols scanned by the k tape heads of the TM
through indirect write instructions, and can simulate the movements of the k
tape heads by adding k to or subtracting k from the corresponding index
registers. Since the change and reuse of the index registers will not interrupt
a phase, the reversal of the RAM is O(R). The space of the RAM is
O(S + n) = O(S), since we have assumed $S \ge n$. ∎
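The register layout used in this proof can be sketched as follows, assuming the RAM registers are modeled as a Python dictionary and tape symbols as small integers. The names `cell_register` and `write_tape` are hypothetical.

```python
from typing import Dict, List


def cell_register(i: int, j: int, k: int) -> int:
    """Register holding square j of tape i (1 <= i <= k, j >= 0)."""
    return i + k * (j + 1)


def write_tape(registers: Dict[int, int], k: int, i: int, symbols: List[int]) -> None:
    """Write symbols on tape i from square 0 rightwards, touching the tape
    only through the index register R_i (indirect writes)."""
    registers[i] = cell_register(i, 0, k)   # head of tape i at square 0
    for sym in symbols:
        registers[registers[i]] = sym       # indirect write: #R_i <- sym
        registers[i] += k                   # move the head right: add k


k = 2
regs: Dict[int, int] = {}
write_tape(regs, k, 1, [1, 0])
write_tape(regs, k, 2, [1, 1, 0])
tape2 = [regs[cell_register(2, j, k)] for j in range(3)]
```

With k = 2, the squares of tape 1 occupy registers 3, 5, 7, ... and those of tape 2 occupy registers 4, 6, 8, ..., so the tapes interleave without colliding with the index registers $R_0, \ldots, R_k$.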
Summarizing Theorems 4.1, 5.1, 5.2, 5.3, we have proved the following 
theorem. 
THEOREM 5.4. The following computational models are similar:
(1) LSTM (phases, width);
(2) Multitape TM (reversal, space);
(3) Vector machines (time, space);
(4) Multi-index RAM (reversal, space).
6. UNIFORM MODELS 
Suppose that n > 0 is an integer. A circuit for input length n is a labeled
acyclic directed graph (V, E) satisfying:
(1) The vertex set V is divided into d + 1 levels: level 0, level 1, ...,
level d. There are exactly n vertices in level 0. They are called the input
gates of the circuit. The number d is called the depth. The maximum number
of vertices in level 1 through level d is called the width of the circuit.
(2) Each vertex in level 1 through level d is labeled with $\neg$, $\vee$, or
$\wedge$ (the Boolean operations). For any vertex v labeled with $\neg$ in level l,
$1 \le l \le d$, there is a unique vertex u in level $l - 1$ or level 0 such that
$(u, v) \in E$ is an edge (the input line). For any vertex v labeled with $\vee$ or
$\wedge$ in level l, $1 \le l \le d$, there are exactly two vertices $u_1$ and $u_2$ in level $l - 1$
or level 0 such that $(u_1, v) \in E$ and $(u_2, v) \in E$ (two input lines).
(3) Suppose that there are N = |V| vertices in V. There is a one-to-one
mapping $\varphi$ from V to the set {1, 2, ..., N} such that if $v_1$ is in level $l_1$, $v_2$
is in level $l_2$, and $l_1 < l_2$, then $\varphi(v_1) < \varphi(v_2)$. The natural number $\varphi(v)$ is
called the name of the vertex v. The number N is called the size of the circuit.
The input of the circuit is a binary string of length n. Each vertex in V
has a value: If the vertex named i is in level 0, then $1 \le i \le n$, and its value
is the ith bit of the input string. If the vertex is in level l, $1 \le l \le d$, then its
value is defined in the obvious way. The values at level d, in the
order of the names of the vertices, are considered to be the results of the
computation.
In order to describe the circuit, we use the following coding system. For 
each noninput gate, we give an information segment (type, name, input 1,
input 2), where "name" is the name of the gate, "type" is the Boolean
operation of the gate, and input 1 and input 2 are the names of the gates its
input lines are connected to. If the type is "¬", then there is only one
input, and we assume that input 1 and input 2 are the same. A coding of the
circuit is a collection of information segments of all noninput gates.
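To make the coding concrete, here is a minimal Python sketch that evaluates a circuit directly from its information segments. The function name `evaluate_circuit`, the string tags "not", "or", "and", and the example circuit are assumptions made for illustration. The sketch relies on condition (3): since names increase with the level, visiting gates in increasing name order guarantees that both inputs of a gate already have values.

```python
def evaluate_circuit(n, segments, input_bits):
    """Evaluate a circuit given as (type, name, input1, input2) segments.

    Input gates are named 1..n; the returned dict maps every name to its value.
    """
    if len(input_bits) != n:
        raise ValueError("expected exactly n input bits")
    value = {i + 1: bit for i, bit in enumerate(input_bits)}
    # Names respect the level order, so sorting by name is a valid evaluation order.
    for gate_type, name, in1, in2 in sorted(segments, key=lambda seg: seg[1]):
        a, b = value[in1], value[in2]
        if gate_type == "not":
            value[name] = 1 - a  # input1 == input2 by the coding convention
        elif gate_type == "or":
            value[name] = a | b
        elif gate_type == "and":
            value[name] = a & b
        else:
            raise ValueError(f"unknown gate type: {gate_type}")
    return value


# Hypothetical example: exclusive-or of two bits, depth 3, width 2, size 7.
XOR_CODING = [
    ("not", 3, 1, 1),  # level 1
    ("not", 4, 2, 2),  # level 1
    ("and", 5, 1, 4),  # level 2: x1 and not x2
    ("and", 6, 3, 2),  # level 2: not x1 and x2
    ("or", 7, 5, 6),   # level 3: the single output gate
]
```

Calling `evaluate_circuit(2, XOR_CODING, [1, 0])[7]` reads off the value of the only gate in level 3, which is the result of the computation for this example.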
DEFINITION 6.1. A uniform circuit is a family of circuits {Cn | n = 1, 2, ...}
satisfying
(1) The circuit Cn has exactly n input gates and
(2) The coding of Cn is log-space constructable.
Similarly, we can define uniform arithmetic circuits. To do so, we need
only make the following changes: The value of each gate (including the
input gates) is a natural number, and the types of the gates are +, ∸, *, /,
where / is integer division and ∸ is defined by
a ∸ b = if a ≥ b then a − b else 0.
The depth is defined as before; the width of each level is defined to be the 
sum of the lengths of the values (integers) of all gates in the level; the width 
of the circuit is the maximum width among level 1 through level d. 
For uniform circuits, we have Borodin's theorem.²
PROPOSITION 6.1. Suppose that s(n) is a log-space constructable function
and that a multitape Turing machine T works in space s(n). Then the Turing
machine T can be simulated by a uniform circuit whose depth and width are
O((s(n) + log n)^2) and O(n^3 c^(s(n))), respectively, where c is a constant.
THEOREM 6.1. Suppose that (r, s) is a nice pair, that s(n) ≥ n, and that L
is an (r, s) LSTM. Then there exists a uniform circuit to simulate L. The
depth and width of the uniform circuit are O(r*) and O(s*), respectively.
Proof. In each phase, the LSTM works in space O(log s(n)). By
Proposition 6.1, there is a uniform circuit to simulate it. The depth and
width of the circuit are bounded by O((log s(n) + log n)^2) = O(log^2 s(n))
and O(n^3 c^(log s(n))) = O(s*), respectively. Therefore, to simulate the whole
computation of the LSTM, the depth and width of the uniform circuits are
bounded by O(r(n) log^2 s(n)) = O(r*) and O(s*), respectively. ∎
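The arithmetic behind these bounds can be spelled out as follows. This is a sketch assuming logarithms to base 2 and c ≥ 1, and it uses only the hypothesis s(n) ≥ n together with the bounds of Proposition 6.1 applied to a machine working in space log s(n).

```latex
\begin{align*}
(\log s(n) + \log n)^2 &\le (2\log s(n))^2 = 4\log^2 s(n)
  && \text{since } n \le s(n),\\
n^3\, c^{\log s(n)} &= n^3\, s(n)^{\log c} \le s(n)^{3+\log c}
  && \text{since } n \le s(n),\\
r(n) \cdot O(\log^2 s(n)) &= O\bigl(r(n)\log^2 s(n)\bigr)
  && \text{($r(n)$ phases, one subcircuit each).}
\end{align*}
```

The width is therefore bounded by a fixed power of s(n), and the depth by r(n) times a polylogarithmic factor in s(n), which is presumably the sense in which the proof writes O(s*) and O(r*).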
² In (Borodin, 1977), the theorem is stated and proved only for language acceptance, not
transduction. The proof for transduction is implicit in the proof of Theorem 3 in op. cit.
THEOREM 6.2. Suppose that (r, s) is a nice pair, and that {Cn} is an (r, s)
uniform circuit. Then there is a uniform arithmetic circuit of depth O(r) and
width O(s) to simulate it.
Proof. We think of the two logical values 0 and 1 as integers. Using
1 − x instead of ¬ gates, using integer multiplication instead of ∧ gates, and
using ∧ and ¬ gates instead of ∨ gates (by De Morgan's law,
x ∨ y = ¬(¬x ∧ ¬y)), we obtain an equivalent uniform arithmetic circuit. ∎
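The gate replacement in this proof is easy to check exhaustively on the values 0 and 1. The following Python sketch does so; the function names are hypothetical, and expressing ∨ through De Morgan's law is an assumed reading of "using ∧ and ¬ gates instead of ∨ gates".

```python
from itertools import product


def monus(a, b):
    """Truncated subtraction on natural numbers: a - b if a >= b, else 0."""
    return a - b if a >= b else 0


def arith_not(x):
    return monus(1, x)  # 1 - x replaces a NOT gate


def arith_and(x, y):
    return x * y  # integer multiplication replaces an AND gate


def arith_or(x, y):
    # OR built from AND and NOT via De Morgan's law: x or y = not(not x and not y).
    return arith_not(arith_and(arith_not(x), arith_not(y)))


def replacement_is_correct():
    """Compare the arithmetic gates with the Boolean ones on all 0/1 inputs."""
    return all(
        arith_not(x) == 1 - x
        and arith_and(x, y) == (x & y)
        and arith_or(x, y) == (x | y)
        for x, y in product((0, 1), repeat=2)
    )
```

Each Boolean gate is replaced by a constant number of arithmetic gates whose values stay in {0, 1}, which is consistent with the O(r) depth and O(s) width claimed in the theorem.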
Another uniform model is the uniform aggregate. An aggregate for input
length n is a labeled directed graph (V, E) of fanin ≤ 2 satisfying
(1) There are n vertices whose fanin number is 0 (input gates).
(2) Any other vertex (non-input gate) is labeled with ¬, ∨, or ∧
(the Boolean operations). If it is labeled with ¬, then its fanin number is 1.
If it is labeled with ∨ or ∧, then its fanin number is 2.
(3) Suppose that there are N = |V| vertices in the graph. There is a
one-to-one mapping φ from V to the set {1, 2, ..., N} such that the input
gates are mapped onto {1, 2, ..., n}. We call φ(v) the name of v. The number
N is called the space of the aggregate.
(4) There is a distinguished non-input gate v0 ∈ V.
At any time t = 0, 1, 2, ..., each gate has a value. At time t = 0, the value of
the input gate named i is the ith input bit (0 or 1). The value of any non-input
gate at time t = 0 is 0. The values of all gates at time t are determined by
the values of all gates at time t − 1 and the Boolean operations in the
obvious way. The time t0 at which the value of the distinguished gate v0
first becomes 1 is called the time of the computation. The values
of the gates named N − m, N − m + 1, ..., N at time t0 are considered to be
the results of the computation.
In the same way as for circuits, we use (type, name, input 1, input 2) as 
information segment for a non-input gate. The coding of an aggregate con- 
sists of the information segments for all non-input gates and a natural 
number m. 
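The synchronous semantics of an aggregate can be sketched in Python as follows. Several points are assumptions rather than part of the definition: the name `run_aggregate`, the string tags for gate types, the `max_steps` cutoff (added only so that the sketch always terminates), the rule that input gates keep their input bits at every time step, and the example aggregate.

```python
def run_aggregate(n, segments, m, distinguished, input_bits, max_steps=1000):
    """Run an aggregate until the distinguished gate first has value 1.

    Returns (t0, results), where results lists the values of the gates named
    N-m, ..., N at time t0, or None if the gate stays 0 for max_steps steps.
    """
    if len(input_bits) != n:
        raise ValueError("expected exactly n input bits")
    total = n + len(segments)  # N = |V|
    value = {i + 1: bit for i, bit in enumerate(input_bits)}
    value.update({name: 0 for _, name, _, _ in segments})  # non-input gates start at 0
    for t in range(1, max_steps + 1):
        previous = value
        value = dict(previous)  # assumption: input gates keep their bits
        for gate_type, name, in1, in2 in segments:
            a, b = previous[in1], previous[in2]  # all gates read the time t-1 values
            if gate_type == "not":
                value[name] = 1 - a
            elif gate_type == "or":
                value[name] = a | b
            elif gate_type == "and":
                value[name] = a & b
            else:
                raise ValueError(f"unknown gate type: {gate_type}")
        if value[distinguished] == 1:
            return t, [value[g] for g in range(total - m, total + 1)]
    return None


# Hypothetical example with n = 2 and N = 6: gate 3 is a clock, gate 4 latches the
# clock and serves as the distinguished gate, gates 5 and 6 compute x1 AND x2.
AND_AGGREGATE = [
    ("not", 3, 3, 3),
    ("or", 4, 3, 4),
    ("and", 5, 1, 2),
    ("and", 6, 5, 3),
]
```

Unlike a circuit, the graph may contain cycles (gates 3 and 4 above feed back into themselves), which is why the value of a gate depends on the time step.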
DEFINITION 6.2. A uniform aggregate is a family of aggregates
{An | n = 1, 2, 3, ...} satisfying
(1) The aggregate An has n input gates.
(2) The coding of An is log-space constructable.
THEOREM 6.3. If (r, s) is a nice pair and s* ≥ r, then an (r, s) uniform circuit
can be simulated by an (r*, s*) uniform aggregate.
Proof. In fact, an (r, s) uniform circuit is an (r, r · (s + n)) uniform
aggregate by definition. Since s* ≥ r, we have
r · (s + n) ≤ r · (s + r*s*) ≤ s*. ∎
THEOREM 6.4. If (r, s) is a nice pair, then an (r, s) uniform aggregate
can be simulated by an LSTM whose phases and width are O(r) and
O(s log s), respectively.
Proof. Because the codings of the uniform aggregate are log-space
constructable, the LSTM can obtain the codings in one phase, given input w.
In another phase, the LSTM can obtain the initial instantaneous description
(ID, which is a list of values for all gates) of the aggregate (of length
O(s log s)). The LSTM can obtain the ID at time t + 1 from the ID at time
t in one phase. Therefore an LSTM of phases O(r) and width O(s log s)
can simulate this aggregate. ∎
THEOREM 6.5. If (r, s) is a nice pair, then an (r, s) uniform arithmetic circuit
can be simulated by a multi-index RAM of reversal r* and space s*.
Proof. In fact, there is an LSTM to generate the codings of the uniform
arithmetic circuit in one phase. Therefore, by Theorem 5.4, an (r*, s*)
RAM can generate these codings of length O(rs · log(rs)) = O(s*). The
RAM can obtain the ID of the arithmetic circuit at time t + 1 from the ID
at time t in one reversal. ∎
Summarizing Theorems 6.1-6.5, we have the following theorem.
THEOREM 6.6. The following computation models are similar: 
(1) LSTM (phases, width); 
(2) Uniform circuits (depth, width); 
(3) Uniform arithmetic circuits (depth, width);
(4) Uniform aggregates (time, space). 
7. OTHER COMPUTATIONAL MODELS 
In this section, we discuss some other computational models briefly. 
(1) Multitape multihead multidimensional TM (MMMTM) (Aho et 
al., 1974). The reversal of an MMMTM is the total number of phases, and 
a phase is a period during which no work tape square is entered twice or 
more.  
(2) Storage modification machines (SMM). For a detailed definition of
the model and its reversal complexity, the reader may consult (Hong, 1984;
Schonhage, 1979).
(3) Hardware modification machines (HMM) (Barzdin and Kalnin, 
1974; Dymond and Cook, 1980). In (Dymond and Cook, 1980), the 
maximum number of active cells during the computation is called the 
"hardware," which corresponds to the space. 
(4) Uniform VLSI (UVLSI). The VLSI model is defined in (Brent
and Kung, 1980; Thompson, 1979). Roughly, a VLSI is a layout of an
aggregate on a plane grid. A uniform VLSI is a family of VLSI
{Vn | n = 1, 2, ...} whose coding is log-space constructable. The "area" of the
VLSI corresponds to the space.
Based on the results of the last two sections, it is not hard to prove the
similarity between the models mentioned in this paper. We can draw a tree
to represent their relations:
                  RAM
                   |
                  LSTM
                /      \
              VM        UC
             /  \      /  \
           TM   HMM  UA   UAC
          /  \        |
     MMMTM   SMM    UVLSI
Each model can be simulated simultaneously by the models under it. In fact,
the MMMTM model and the SMM model are generalizations of the multitape
TM model; therefore they can simulate the multitape TM model
simultaneously. It is not difficult to prove that the VM model and the UA
model can be simulated simultaneously by the HMM and the UVLSI
model, respectively. But the RAM and the LSTM are very strong models.
It is not hard to see that every phase of each model on the leaves
(MMMTM, SMM, HMM, UVLSI, UAC) can be simulated by a phase of
a RAM or by several phases of an LSTM. Therefore all the models on the
tree can simulate each other simultaneously. We leave the proofs of the
relations mentioned above as exercises to the readers.
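The structure of this argument can be made explicit with a small Python sketch of the tree. The edge list is a reading of the diagram that is consistent with the relations stated in the text (TM above MMMTM and SMM, VM above HMM, UA above UVLSI); the names `SIMULATION_TREE`, `leaves`, and `models_below` are hypothetical.

```python
# Parent -> children; a child can simulate its parent simultaneously.
SIMULATION_TREE = {
    "RAM": ["LSTM"],
    "LSTM": ["VM", "UC"],
    "VM": ["TM", "HMM"],
    "UC": ["UA", "UAC"],
    "TM": ["MMMTM", "SMM"],
    "UA": ["UVLSI"],
}


def leaves(tree, root):
    """Models at the bottom of the tree, listed from left to right."""
    children = tree.get(root, [])
    if not children:
        return [root]
    result = []
    for child in children:
        result.extend(leaves(tree, child))
    return result


def models_below(tree, root):
    """All models under root, each of which can simulate root."""
    below = []
    for child in tree.get(root, []):
        below.append(child)
        below.extend(models_below(tree, child))
    return below
```

With this encoding, `leaves(SIMULATION_TREE, "RAM")` lists exactly the five leaf models named in the text. Downward simulation along the edges, combined with the claim that the RAM or the LSTM simulates every leaf, closes the cycle that makes all models on the tree similar.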
8. MACHINES WITH REFERENCE INPUT 
In a computational model, we consider a fixed machine. Suppose that 
the input is a binary sequence of length n. The parallel time (reversal) and 
space of the machine are bounded by r(n) and s(n), where (r, s) is a nice 
pair of functions of n. In order to obtain more general results, we allow the
machine to have an additional reference input of length ≤ m(n), where
m(n) is a log-space constructable function ≤ r*s*. But the complexities of
the machine are still measured as functions of n.
DEFINITION 8.1. Suppose that (r, s) is a nice pair and m is a log-space
constructable function ≤ r*s*. An (r, s) reference machine (with respect to m)
is a machine M satisfying:
(1) its input is of the form w # w', where w, w' ∈ {0, 1}*, |w| = n,
|w'| ≤ m(n);
(2) the parallel time and work space are bounded by r(n) and s(n),
respectively;
(3) the output is a binary string of length |M(w # w')| ≤ s(n).
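Condition (1) on the input format can be illustrated with a short Python sketch. The function name `split_reference_input` is hypothetical, and passing the bound m as a Python callable is an assumption made for illustration; conditions (2) and (3) concern the resources of the machine and are not modeled here.

```python
def split_reference_input(text, n, m):
    """Split an input of the form w # w' and check condition (1) of Definition 8.1.

    n is the required length of w; m is a callable giving the bound m(n) on |w'|.
    Returns the pair (w, w') or raises ValueError.
    """
    w, separator, w_ref = text.partition("#")
    if separator != "#":
        raise ValueError("input must contain the separator #")
    if set(w) - set("01") or set(w_ref) - set("01"):
        raise ValueError("w and w' must be binary strings")
    if len(w) != n:
        raise ValueError("w must have length n")
    if len(w_ref) > m(n):
        raise ValueError("reference input longer than m(n)")
    return w, w_ref
```

For example, with m(n) = n^2 the input `101#0011` splits into w = 101 and w' = 0011, while with m(n) = 0 only an empty reference input is accepted.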
Obviously, an ordinary machine is a reference machine with respect to the
zero function m(n) = 0.
DEFINITION 8.2. Suppose there are two computational models, Model 1
and Model 2. If for any nice pair (r, s) and any log-space constructable
function m ≤ r*s*, for any (r, s) reference machine M1 with respect to m in
either of these two models, there is an (r*, s*) reference machine M2 with
respect to m in the other model such that M1(w # w') = M2(w # w') holds
for all w # w' with |w| = n, |w'| ≤ m(n), then we say that these two reference
models are similar.
Since (r, s) is a nice pair, we have r*s* ≥ n. In the case s* ≥ r, we have
s* ≥ n + m.
In this paper, all the proofs still go through when the machine has some
reference input. We have the following theorem:
THEOREM 8.1. All corresponding reference models in Theorem 2.1 are 
similar. 
RECEIVED December 5, 1983; ACCEPTED September 25, 1984 
REFERENCES 
AHO, A., HOPCROFT, J., AND ULLMAN, J. (1974), "The Design and Analysis of Computer 
Algorithms," Addison-Wesley, Reading, Mass.
BARZDIN, J. M., AND KALNIN, J. J. (1974), A universal automaton with variable structure,
Automatic Control and Computing Sciences 6.
BORODIN, A. (1977), On relating time and space to size and depth, SIAM J. Comput. 6, No. 4. 
BRENT, R. P., AND KUNG, H. T. (1980), The chip complexity of binary arithmetic, in "12th 
ACM Symposium on Theory of Computing," pp. 190-200. 
CHANDRA, A. K., AND STOCKMEYER, L. J. (1976), Alternation, in "17th Annu. IEEE Sympos. 
Found. Comput. Sci." pp. 98-108. 
COOK, S. A. (1980), "Towards a Complexity Theory of Synchronous Parallel Computation,"
presented at Internationales Symposium über Logik und Algorithmik zu Ehren von
Professor Ernst Specker, Zurich, Switzerland, February.
COOK, S. A., AND RECKHOW, R. A. (1973), Time bounded random access machines,
J. Comput. System Sci. 7, No. 4, 354-375.
COOK, S. A. (1973), A hierarchy for nondeterministic time complexity, J. Comput. System Sci. 
7, No. 4, 343-353. 
DYMOND, P., AND COOK, S. A. (1980), Hardware complexity and parallel computation, in
"21st Annu. IEEE Sympos. Found. Comput. Sci.," pp. 360-372.
DYMOND, P. (1984), Ph.D. thesis, Department of Computer Science, University of Toronto. 
FORTUNE, S., AND WYLLIE, J. (1978), Parallelism in random access machines, in "Proceedings,
10th ACM Sympos. Theory of Comput.," pp. 114-118.
GOLDSCHLAGER, L. (1978), A unified approach to models of synchronous parallel machines, in
"Proceedings, 10th Ann. IEEE Sympos. Theory of Comput.," San Diego, California,
pp. 89-94.
GILL, J. T. (1974), Computational complexity of probabilistic Turing machines, in
"Proceedings, 6th Annu. Sympos. Theory of Comput."
HONG, J. W. (1980), On similarity and duality of computation, in "Proceedings, 21st Ann. 
IEEE Sympos. Found. Comput. Sci.," 348-359. 
HONG, J. W. (1984), A trade off theorem for space and reversal, Theoret. Comput. Sci., 32, 
221-224. 
PIPPENGER, N. (1979), On simultaneous resource bounds, in "Proceedings, 20th Ann. IEEE
Sympos. Found. Comput. Sci.," October, pp. 307-311.
PRATT, V., AND STOCKMEYER, L. (1978), A characterization of vector machines, J. Comput.
System Sci. 12, 198-211.
RUZZO, M. L. (1979), On uniform circuit complexity, in "Proceedings, 20th Ann. IEEE
Sympos. Found. Comput. Sci.," October, pp. 312-318.
SAVITCH, W., AND STIMSON, M. (1978), Time bounded random access machines with parallel
processing, J. Assoc. Comput. Mach. 26, 103-118.
SAVITCH, W. J. (1970), Relationship between nondeterministic and deterministic tape
complexities, J. Comput. System Sci. 4, No. 2, 177-192.
SCHONHAGE, A. (1979), "Storage Modification Machines," Technical report, Mathematisches
Institut, Universität Tübingen, Germany.
THOMPSON, C. D. (1979), Area-time complexity for VLSI, in "11th ACM Sympos. Theory of
Computing," pp. 81-88.
