Efficient simulations of multicounter machines by Vitányi, P.M.B. (Paul)
EFFICIENT SIMULATIONS OF MULTICOUNTER MACHINES *) 
(Preliminary version) 
Paul M.B. Vitányi
Mathematisch Centrum 
Kruislaan 413 
1098 SJ Amsterdam
The Netherlands 
ABSTRACT 
An oblivious 1-tape Turing machine can on-line simulate a multicounter machine
in linear time and logarithmic space. This leads to a linear cost combinational logic
network implementing the first n steps of a multicounter machine and also to a linear
time/logarithmic space on-line simulation by an oblivious logarithmic cost RAM. An
oblivious log*n-head tape unit can simulate the first n steps of a multicounter
machine in real-time, which leads to a linear cost combinational logic network with a
constant data rate.
I. INTRODUCTION 
In many computations it is necessary to maintain several counts such that, at 
all times, an instant signal indicates which counts are zero. Keeping k counts in 
tally notation, where a count is incremented/decremented by at most 1 in each step,
governed by the input and the set of currently zero counts, is formalized in the 
notion of a k-counter machine [2]. Multicounter machines have been studied extensive- 
ly, because of their numerous connections with both theoretical issues and more or 
less practical applications. The purpose of this paper is to investigate the depend- 
ence of the required time and storage, to maintain counts, on storage structure and 
organization and the cost required by a combinational logic network. To do this, we 
use a notion of auxiliary interest: that of an oblivious Turing machine. An oblivious 
Turing machine is one whose head movements are fixed functions of time, independent 
of the inputs to the machine. The main result obtained here shows that an oblivious 
Turing machine with only one storage tape can simulate a k-counter machine on-line in 
linear time and in storage logarithmic in the maximal possible count. These bounds 
are optimal, up to order of magnitude, also for on-line simulation by nonoblivious
machines. 
It is obvious that, for any time function T(n), given a k-counter machine, or a
k-pushdown store machine, which operates in time T(n), we can find a time equivalent
k-tape Turing machine. However, such a Turing machine will, apart from using k tapes,
also use O(T(n)) storage. In [7] it was shown that for the pushdown store, of which
*) Registered at the Mathematical Centre as Report, 
the contents cannot be appreciably compacted, the best we can do for on-line simulation
by an oblivious Turing machine is 2 storage tapes, O(T(n) log T(n)) time and
O(T(n)) storage. For the multicounter machine, [2] demonstrated a linear time/logarithmic
space simulation by a 1-tape Turing machine. [9, Corollary 2] showed how to
simulate on-line a T(n) time-, S(n) storage-bounded multitape Turing machine by an
oblivious 2-tape Turing machine in time O(T(n) log S(n)) and storage O(S(n)). Combining
the compacting of counts in [2] and the method of [9] we achieve the best previously
known on-line simulation of a k-counter machine by an oblivious Turing machine:
2 tapes, O(T(n) log log T(n)) running time and O(log T(n)) storage. It is somewhat
surprising to see that we can restrict a Turing machine for on-line simulation of a
k-counter machine to 1 storage tape, logarithmic storage, oblivious head movements and
still retain a linear running time.
In Section 2 this result is derived and connected with a linear cost combination- 
al network for doing the same job. This network processes the inputs in sequence and 
may incur a time delay of O(log n) between processing an input and producing the
corresponding output followed by the processing of the next input. Since we would
like to obtain a constant data rate, i.e., a constant time delay between processing
the i-th input at the i-th input port and producing the i-th output at the i-th output
port, 1 ≤ i ≤ n, we show in Section 3 how to real-time simulate n steps of a multicounter
machine by an oblivious log*n-head tape unit and use this to obtain a linear
cost combinational network with such a fast response time. It is not our purpose here 
to introduce an odd machine model with a variable number of access pointers. One 
should rather think of it as an expedient intermediate step to derive the desired 
result for fixed n. Subsequently we note that cyclic networks (or VLSI where the 
length of the wires adds to the cost) can real-time simulate a multicounter machine 
in logarithmic (area) cost. 
In Section 5 we analyse the cost of on-line simulation of a multicounter machine 
by a logarithmic cost RAM. This turns out to be O(n) time and O(log n) space on the
oblivious version, which is optimal, also for nonoblivious RAMs. For the relevant
definitions of multicounter machines [1,2], multitape Turing machines [8], combinational
logic networks [7], real-time and linear time on-line simulation [7] and oblivious
computations [7,9,10] we direct the reader to these references. The present paper is
a preliminary draft; the results in Sections 2 and 4 appeared in Techn. Report IW167,
Mathematical Centre, Amsterdam, May 1981.
2. LINEAR-TIME ON-LINE SIMULATION BY AN OBLIVIOUS ONE-HEAD TAPE UNIT WITH AN 
APPLICATION TO COMBINATIONAL LOGIC NETWORKS
We first point out one of the salient features of the problem of simulating 
k-CM's on-line by efficient oblivious Turing machines. Suppose we can simulate some 
abstract storage device S on-line by an efficient oblivious Turing machine M. Then 
we can also simulate a collection of k such devices $S_1, S_2, \ldots, S_k$, interacting through
a common finite control, by dividing all tapes of M into k tracks, each of which is
a duplicate of the corresponding former tape. Now the same head movements do the same
job on k collections of tracks as formerly on the tapes of M, so the time and storage
complexity of the extended M are the same as those of the original. While the problem
of, say, simulating a k-counter machine in linear time by a k'-tape Turing machine,
k' < k, stems precisely from the fact that k' is less than k, the problem of simulating
a k-counter machine by a k'-tape oblivious Turing machine in linear time is the
same problem as that of simulating a 1-counter machine in linear time by a k'-tape
oblivious Turing machine. Hence, for a proof of feasibility it suffices to look for
the simulation of 1 counter only. (For a proof of infeasibility we would have the
advantage of knowing that the head movements are fixed, and are the same for all input
streams. Besides, we could assume that we needed to simulate an arbitrary, albeit
fixed, number of counters.)
In [2] it was shown that a 1-TM can simulate a k-CM on-line in linear time. This
simulation uses O(log n) storage, for n steps by the k-CM, which is clearly optimal.
It is a priori by no means obvious that an oblivious multitape TM can simulate one
counter in linear time. We shall show that the result of [2] can be extended to hold
for oblivious Turing machines.
In our investigation we noted that head-reversals are not necessary to maintain
counters. We did not succeed in getting the idea below to work in an oblivious
environment, and include it here as a curiosity, possibly folklore, item.
Suppose we want to simulate a k-CM C with counts $x_1, x_2, \ldots, x_k$ represented by the
variables $n_1$ through $n_k$. The number of simulated steps of C is contained in the
variable n. For i = 1,2,...,k, if count $x_i$ is incremented by $\delta \in \{-1, 0, +1\}$ then
$$
n_i \leftarrow
\begin{cases}
n_i + 2 & \text{for } \delta = +1, \\
n_i + 1 & \text{for } \delta = 0, \\
n_i & \text{for } \delta = -1.
\end{cases}
$$
Let, for i = 1,2,...,k, $x_i$ denote the current count on the i-th counter of C.
PROPOSITION 1. For i = 1,2,...,k, $x_i = 0$ iff $n_i = n$.
PROOF. Let n be the number of steps performed by C, $p_i$ be the number of +1's, $r_i$ be
the number of 0's, and $q_i$ be the number of -1's, added to the i-th counter, 1 ≤ i ≤ k,
during these n steps. Hence $p_i + q_i + r_i = n$ for all i, 1 ≤ i ≤ k. By definition we have
$n_i = 2p_i + r_i$. Suppose $n_i = n$. Then it follows that $p_i = q_i$ and therefore $p_i - q_i = x_i = 0$.
Conversely, let $x_i = p_i - q_i = 0$. Then $p_i = q_i$ and $n_i = p_i + q_i + r_i = n$. □
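Proposition 1 is easy to check empirically. The following is a minimal Python sketch, assuming randomly chosen increments that never take a count below zero; the function name `check_proposition` and its parameters are illustrative and do not come from the paper.

```python
import random

def check_proposition(k, steps, seed=0):
    """Run k counters for `steps` steps and check x_i == 0 iff n_i == n."""
    rng = random.Random(seed)
    x = [0] * k      # true counts x_i
    n_i = [0] * k    # monotone variables n_i (never decremented)
    n = 0            # number of simulated steps
    for _ in range(steps):
        for i in range(k):
            # Assumption: a zero count is never decremented.
            delta = rng.choice([0, 1] if x[i] == 0 else [-1, 0, 1])
            x[i] += delta
            n_i[i] += delta + 1   # +2, +1, +0 for delta = +1, 0, -1
        n += 1
        for i in range(k):
            assert (x[i] == 0) == (n_i[i] == n)
    return True

check_proposition(k=3, steps=1000)
```

Since every update adds δ + 1 to $n_i$ and 1 to n, the difference $n_i - n$ always equals $x_i$, which is what the assertion exercises.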
Hence we obtain: 
COROLLARY. A one-way k-CM C can be simulated in real-time by a (k+2)-head one-way 
non-writing finite automaton P of which the heads can detect coincidence. Hence, four 
heads without head reversals suffice to accept all recursively enumerable sets. 
(Hint: 1 head reads the input from left to right, 1 head keeps the count of n by its
distance to the origin, and the remaining k heads so keep the counts $n_1$ through $n_k$.
It was shown in [4] that 2-CM's can accept all recursively enumerable sets. We assume
that the tape is unbounded, whatever the input may be.)
After this digression we show: 
THEOREM 2. If C is a k-counter machine, then we can find an oblivious 1-tape Turing
machine M that simulates C on-line in time O(n) and storage O(log n) for n steps by C. 
Following [7], we note that in the above theorem "machine" can be replaced by 
"transducer" and the proof below will still hold. 
PROOF. It shall follow from the method used, and is also more generally the case for
simulation by oblivious Turing machines (cf. above), that if the theorem holds for
1-CM's then it also holds for k-CM's, k ≥ 1. Let C be a 1-CM. The simulating oblivious
1-TM M will have one storage tape divided into 3 channels, called the n-channel, the
y-channel, and the z-channel. If, in the current step of C its count c is modified to
$c + \delta$, $\delta \in \{-1, 0, +1\}$, then:
$$
\begin{aligned}
\delta = +1 &\Rightarrow n \leftarrow n+1;\; y \leftarrow y+1;\; z \leftarrow z, \\
\delta = 0 &\Rightarrow n \leftarrow n+1;\; y \leftarrow y;\; z \leftarrow z, \\
\delta = -1 &\Rightarrow n \leftarrow n+1;\; y \leftarrow y;\; z \leftarrow z+1,
\end{aligned}
$$
where n is the count contained on the n-channel, y is the count contained on the
y-channel and z is the count contained on the z-channel. Hence, always (1) c = y - z,
and (2) y + z ≤ n. The count n on the n-channel is recorded in the usual binary notation,
with the low order digit on the start square and the high order digit on the
right, see Figure 1. At the start of the cycle simulating the i-th step of C, $i = p \cdot 2^j$
and p is odd, squares 0 through j-1 on the n-channel contain 1's and square j contains
a 0. So in this cycle, M's head, starting from square 0, travels right to square j
and deposits a 1 there. It turns all 1's on squares 0 through j-1 into 0's during this
pass. The head then returns to square 0. This maintenance of the count n completely
fixes M's head movement, so M is oblivious. The representation of y and z is in a
redundant binary notation. If y is denoted by $y_0 y_1 \ldots y_i$, with $y_j$ in square j of the
y-channel, then $y_j \in \{0, 1, 2\}$, $0 \le j \le i$, and $y = \sum_{j=0}^{i} y_j 2^j$. Similarly for the count z.
So the representation of y[z] over {0,1,2} is not unique. Finally, the head covers 2
squares on the tape, and shifts 1 square in 1 step of M, like a mask covering 2 tape
squares. So it has a look-ahead of 1. See Figure 1.
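The two representations just described can be illustrated with a short Python sketch. It models the n-channel as a list of bits with the low order digit first and records how far right the head travels in each cycle; the names `n_channel_cycle`, `redundant_value`, `tape` and `turning_points` are illustrative only.

```python
def n_channel_cycle(tape):
    """One cycle on the n-channel: clear the leading 1's, deposit a 1.

    Returns the square j where the head turns around; j depends only on
    the number of cycles executed so far, never on the counter inputs.
    """
    j = 0
    while j < len(tape) and tape[j] == 1:
        tape[j] = 0
        j += 1
    if j == len(tape):
        tape.append(0)   # extend the used part of the tape
    tape[j] = 1
    return j

def redundant_value(digits):
    """Value of a digit string over {0,1,2}, low order digit first."""
    return sum(d * 2**pos for pos, d in enumerate(digits))

tape = []
turning_points = [n_channel_cycle(tape) for _ in range(16)]
print(turning_points)   # the turning square of cycle i, i = 1..16

# Non-uniqueness of the redundant notation: both strings denote 5.
print(redundant_value([1, 2]), redundant_value([1, 0, 1]))
```

The turning square in cycle i is the exponent j in $i = p \cdot 2^j$, which is why the head movement is a fixed function of time.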
We now explain the operation of M. The intuitive idea behind a 2 in square j of
the y[z]-channel is an, as yet unprocessed, carry from the j-th to (j+1)-th position
of the binary representation of y[z]. During the left-to-right sweeps of its head,
governed by the moves indicated for the updating of n, M maintains invariants (1) and
(2). During the corresponding right-to-left sweeps back to the start square, M
maintains also invariant (3): if $y_j$ [$z_j$] > 0 is the contents of square j on the y[z]
channel then $z_{j-1}, z_j, z_{j+1}$ [$y_{j-1}, y_j, y_{j+1}$] are 0 or blank. Moreover, every square
right of a blank square, on that channel, contains blanks and no square containing
a 0 has a blank right neighbour in that channel. This latter condition gets rid of
leading 0's.
The validity of the simulation is now ensured if we can show the following 
assertions to hold at the end of M's cycle to simulate the i-th step of C, i > 0. 
(a) For all i, i ≥ 0, M can always add 1 to either channel y or z in the cycle simulating
step i+1 of C.
(b) M can maintain invariants (1), (2) and (3) to hold at the end of each simulation
cycle.
(c) The fact that (1), (2) and (3) hold at the end of the i-th simulation cycle of M
ensures that the count of C is 0 subsequent to C's i-th step iff both the y-channel
and z-channel contain blanks on all squares subsequent to the completion
by M of simulating C's i-th step.
CLAIM 1. Assertion (a) holds at the start of each simulation cycle.
PROOF SKETCH. In the process of simulating the i-th step of C, M takes care of (a)
during its left-to-right sweeps by propagating all unprocessed carries on squares
0,1,...,j on both the y-channel and z-channel to the right, leaving 0's or 1's on
squares 0,1,...,j and depositing a digit d, 0 ≤ d ≤ 2, on square j+1 of the channel
concerned, for $i = p \cdot 2^j$ and p is odd. Assuming that M has adopted this strategy, we
prove the claim by induction on the number of steps of C, equivalently, number of
simulation cycles of M. □
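The carry strategy of this proof sketch can be tried out in a few lines of Python. This is a simplified sketch under stated assumptions: it models only the left-to-right sweeps (adding the input at square 0 and pushing carries over squares 0,...,j), it omits the right-to-left subtraction pass that maintains invariant (3), and it checks the values of y and z arithmetically. All function names are illustrative.

```python
import random

def trailing_zeros(i):
    """Return j such that i = p * 2**j with p odd (i >= 1)."""
    return (i & -i).bit_length() - 1

def add_lazy(digits, delta, j):
    """Add delta (0 or 1) at square 0, then push carries over squares 0..j."""
    while len(digits) < j + 2:
        digits.append(0)
    digits[0] += delta
    for s in range(j + 1):
        if digits[s] >= 2:        # unprocessed carry on square s
            digits[s] -= 2
            digits[s + 1] += 1    # deposited on the next square
    return digits

def value(digits):
    return sum(d << s for s, d in enumerate(digits))

def run(deltas):
    """Simulate one counter; check invariants (1), (2) and the digit range."""
    y, z, c = [0], [0], 0
    for i, delta in enumerate(deltas, start=1):
        j = trailing_zeros(i)
        add_lazy(y, 1 if delta == +1 else 0, j)
        add_lazy(z, 1 if delta == -1 else 0, j)
        c += delta
        assert max(y) <= 2 and max(z) <= 2      # digits stay in {0,1,2}
        assert value(y) - value(z) == c         # invariant (1)
        assert value(y) + value(z) <= i         # invariant (2)
    return y, z, c

rng = random.Random(1)
run([rng.choice([-1, 0, 1]) for _ in range(1000)])
```

For example, the input sequence +1, +1, +1, 0, +1 leaves the digits 0, 2 on the first two squares of the y-channel: the 2 is a carry that is processed only in the next cycle that reaches square 1.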
CLAIM 2. Assertion (b) holds at the start of each simulation cycle.
PROOF SKETCH. As we saw in the proof of claim 1, assertion (a) is implemented during
the left-to-right sweeps. During the right-to-left sweeps assertion (b) is implemented.
[Figure 1. Tape of M, squares 0 through 4 from the left, read-write head on the start square:
n-channel: 1 1 1 1 1
y-channel: 0 0 0 0 1
z-channel: 1 2 - - -
The configuration on M's tape after it has simulated 31 steps of C, consisting of,
consecutively, 16 "add 1"'s, 10 "add 0"'s, and 5 "add -1"'s. The head has returned
to the start position.]
Clearly, assertion (b) holds at the start of the first cycle. During its right-to-left
sweeps, at each step M subtracts the 2-digit numbers covered on the y- and z-channel
from each other, leaving the covered positions on at least one channel containing
only 0's. M also changes (by marking the most significant digits) leading 0's
on either channel into blanks during its right-to-left sweeps. Suppose the claim holds
at the start of simulation cycles 1,2,...,i. We show that it then also holds at the
start of simulation cycle i+1. It is obvious that M's strategy outlined above maintains
invariants (1) and (2). It is left to show that it also maintains invariant (3).
Again this is done by induction on the number of simulation cycles of M. □
CLAIM 3. Assertion (c) holds at the start of each simulation cycle.
PROOF OF CLAIM. That a square on a channel can only contain a blank if all squares
right of it, on that channel, contain blanks, and that the representations of y and
z have no leading 0's, at the start of each simulation cycle, is a consequence of
the proof of claim 2. That y - z = c at the conclusion of the i-th simulation cycle
of M, where c is the count of C after i steps, follows because in the left-to-right
sweep we add the correct amount to a channel according to claim 1, and in the right-to-left
sweep we subtract equal amounts from either channel. It remains to show that
as a consequence of the maintenance of condition (3) assertion (c) holds under these
conditions.
Suppose that, at the end of the i-th simulation cycle of M, not both the y- and
z-channel contain but blanks and that, by way of contradiction, y - z = 0. Then there
is one channel, say y, which has a leading digit in position j, j > 0, while the
digits on the positions j and j-1 on the z-channel are blank. So the count represented
by y is greater than or equal to $2^j$ while the count on z is smaller than or equal to
$2 \sum_{i=0}^{j-2} 2^i = 2^j - 2$. So $y - z \ge 2$, which contradicts the assumption. (For j = 0, $y - z \ge 1$.)
It remains to show that if c ≠ 0 then not both channels y and z contain only
blanks. Since always, at the start of a cycle, c = y - z holds, if c ≠ 0 then y ≠ z; so
in that case at least one of the y-channel and z-channel must contain a count ≠ 0.
Hence there must be a square which contains a digit d > 0 on one of these channels. □
By claims 1, 2 and 3 the on-line simulation of C by M is correct as outlined.
It is easy to see that the simulation uses O(log n) storage for simulating n steps by
C. We now estimate the time required for simulating n steps by C. In the i-th simulation
cycle M needs to travel to square j, for $i = p \cdot 2^j$ and p is odd. Therefore, M
needs 2j steps for this cycle. For j = 0, i.e., i is odd, M needs
1 step. Hence, for simulating $2^{h+1}$ steps by C, M needs all in all:
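$$
T(2^{h+1}) = \sum_{j=1}^{h} 2^{h-j} \cdot 2j + 2^h
= 2^{h+1} \sum_{j=1}^{h} j \cdot 2^{-j} + 2^h
< 2^{h+1} \sum_{j=1}^{\infty} j \cdot 2^{-j} + 2^h
= 2 \cdot 2^{h+1} + 2^h = 5 \cdot 2^h.
$$
Now, given n, choose $h = \lfloor \log n \rfloor$ so that $2^h \le n < 2^{h+1}$. Then
$T(n) \le T(2^{h+1}) < 5 \cdot 2^h \le 5n$.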
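The bound can be checked numerically. The sketch below uses the cost model stated in the text (2j steps for a cycle with $i = p \cdot 2^j$, p odd and j > 0, and 1 step for odd i); the function names `cycle_cost` and `total_time` are illustrative.

```python
def cycle_cost(i):
    """Steps M spends in cycle i: 2j if i = p * 2**j with j > 0, else 1."""
    j = (i & -i).bit_length() - 1   # number of trailing zero bits of i
    return 2 * j if j > 0 else 1

def total_time(n):
    """Total number of steps of M for simulating n steps of C."""
    return sum(cycle_cost(i) for i in range(1, n + 1))

for n in (1, 2, 4, 8, 1000):
    print(n, total_time(n), 5 * n)
```

Under this cost model the running total stays below 5n for every n tried, in line with the estimate above.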
Since the movement of M's head has nothing to do with the actual counts y and z,
but only with the number of steps passed since the start of C, we observe that a k-CM
can be simulated on-line by an oblivious 1-tape TM $M_k$, which is just like M, but
equipped with $y_i$- and $z_i$-channels, 1 ≤ i ≤ k, and therefore with a total of 2k+1
channels. Just like M, $M_k$ uses O(log n) storage and T(n) ≤ 5n steps to simulate n
steps of $C_k$, the simulated k-CM, which proves the Theorem.
The covering of 2 or 3 tape squares by the head of M can be simulated easily by
cutting out 1 or 2 squares of the storage tape and buffering them in the finite control.
The swapping to and fro, from tape to buffer, according to the storage head movement,
is easily handled in the finite control, of which the size is blown up a bit. This is
similar to the way to achieve the speed-up in [3]. □
It is well-known that oblivious Turing machine computations correspond to those 
of combinational logic networks [7,9]. The networks we consider are acyclic intercon- 
nections of gates by means of wires that carry signals. It will be assumed that there 
are finitely many different types of gates available and that these form a "universal" 
basis, so that any input-output function can be implemented by a suitable network. 
Each type of a gate has a cost, which is a positive real number, say 1 for each. The
cost of a network is the sum of the costs of its gates. The method used above can be 
used to construct a combinational logic network that implements the first n steps of 
the computation by a k-CM. Such a network will have n inputs carrying suitable encod- 
ings of the symbols read from the input terminal and n outputs carrying encodings of 
the symbols written on the output terminal, where we assume, for technical reasons, 
that the k-CM is a transducer. If the input- and output-alphabets have more than two 
symbols, the inputs and outputs of the network will be "cables" of wires carrying 
binary signals. Using standard techniques, [7,9], it is easy to show, by imitation 
of the oblivious Turing machine constructed in the proof of Theorem 2, that: 
COROLLARY. If C is a k-CM transducer, then we can construct a combinational logic
network implementing n steps of C with cost O(kn).
3. REAL-TIME SIMULATION BY AN OBLIVIOUS log*n-HEAD TAPE UNIT AND A CORRESPONDING
COMBINATIONAL LOGIC NETWORK 
In the simulations of the previous section we may incur a time delay of O(log n) 
between the processing of an input and the production of the corresponding output. 
For the combinational logic network with n input ports and n output ports this is
interpreted as follows. The (i+1)-th input port is enabled by a signal of the i-th output
port. Between this enabling and the production of the (i+1)-th output O(log n)
time may pass. Note that we can only process the (i+1)-th input after the i-th output
is produced, since the set of zero counts at step i influences the translation of the
j-th input to incrementing/decrementing the various counters for j > i. To eliminate
the unbounded time delay we construct as an intermediate step, for each n, a real-time
simulation by an oblivious log*n-head tape unit. While this doesn't solve the
problem of simulating an arbitrary multicounter machine in real-time by a Turing machine
with a fixed number of tapes [1,2], it turns out that with respect to the resulting
combinational logic network this gives as good a result as could be expected
from simulating an arbitrary multicounter machine in real-time by an oblivious Turing
machine with a fixed number of tapes. In the sequel we call a combinational network
with O(1) time delay, between enabling the i-th input port and the production of the
i-th output, a constant data rate network.
For the log*n-head simulation we use basically that of the previous section with
the tape divided into log*n blocks of increasing sizes, each with a resident head.
The size of the 0-th block is s(0) = x for some constant x, of block 1, $s(1) = 2^{x-1}$,
and of block i, i > 1, $s(i) = 2^{s(i-1)}$. Since we need O(log n) length tape to simulate
n steps, we need less than log*n blocks, where log*n is the number of consecutive
iterations of taking the logarithm to get a number less than or equal to 1 when we start
from n. The 0-th block is maintained in the finite control and, assuming the blocks
are marked, all heads can travel around on local information alone. Only the head on
block 1 needs to be connected with the finite control to exchange information regarding
the counts. See Figure 2.
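To get a feeling for how quickly the block sizes grow, the following Python sketch lists the blocks needed to cover the tape for n steps. It assumes, for illustration only, that the required tape length is the bit length of n and that x = 2; the function name `blocks_for` is hypothetical.

```python
def blocks_for(n, x=2):
    """Sizes s(0), s(1), ... of the blocks needed to cover n.bit_length() squares.

    Assumption: the tape length needed for n steps is taken to be the
    number of bits of n; the paper only states that it is O(log n).
    """
    needed = n.bit_length()
    sizes, covered = [], 0
    while covered < needed:
        if not sizes:
            size = x                    # s(0) = x
        elif len(sizes) == 1:
            size = 1 << (x - 1)         # s(1) = 2**(x-1)
        else:
            size = 1 << sizes[-1]       # s(i) = 2**s(i-1)
        sizes.append(size)
        covered += size
    return sizes

print(blocks_for(10**6))    # block sizes for a million steps
print(len(blocks_for(2**64)))
```

With x = 2 a million steps fit in four blocks, and the fifth block alone already has 65536 squares.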
Each head covers four squares, like a window, and is said to be scanning the 
leftmost square it covers. Each head, on information which is put in the first square 
of its block by the head on the previous block, makes a sweep from left-to-right over 
its block until it scans the end cell and then back from right-to-left until it scans 
the first cell. There it waits until the next sweep is due. Hence such a complete 
sweep over block i by the resident head takes 2s(i) steps. We maintain three invariants.
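[Figure 2. The finite control, with input and output, connected to the tape, which is
divided into blocks 1 through log*n - 1 of sizes s(1), s(2), ..., s(log*n - 1), with
total length log n.]
At all times t > 0 holds:
(1) y + z ≤ t
(2) y - z = current count
(3) for all positions j on blocks 0 through log*n:
$$
\begin{aligned}
& y_j > 0 \Rightarrow z_{j-1}, z_j, z_{j+1} \in \{0, -\} \;\;\&\; \\
& z_j > 0 \Rightarrow y_{j-1}, y_j, y_{j+1} \in \{0, -\} \;\;\&\; \\
& (y_j = - \Leftrightarrow z_j = -) \;\;\&\;\; \neg\,(y_j = z_j = 0 \;\&\; y_{j+1} = z_{j+1} = -).
\end{aligned}
$$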
(For j = 0 the obvious allowances are made.) The movements of the heads are governed
by the count on the n-channel. Here this count may contain 2's representing unprocessed
carries. This does not occur on the segment of n maintained on block 0, which
is incremented by 1 in each step. When that count reaches 0 again (modulo $2^x$ steps)
a carry is sent to the head on block 1 which then resides on the first square. Upon
receiving a carry from block 0, the head on block 1 makes a full sweep over block 1
processing the carry and returning to the first square. Since this takes $2 \cdot s(1) = 2^x$
steps, it is in position to receive the next carry. When the segment of the n count
on block 1 reaches 0 again (modulo $2^{s(1)}$ sweeps), at the right extreme of this last
sweep a carry is propagated to the first square of block 2, starting a sweep of the
resident head. In general, each cycle of $2^{s(i)}$ sweeps over block i produces a carry
to the first square of block i+1 starting a sweep by the resident head. Since this
sweep takes $2 \cdot s(i+1)$ steps, and a carry is produced each cycle of $T(i) \ge 2 \cdot s(i) \cdot 2^{s(i)}$
steps, the head on block i+1 is in position to start its sweep upon receiving the
carry if
$$
(*) \quad 2 \cdot s(i+1) \le 2 \cdot s(i) \cdot 2^{s(i)} \quad \text{for } i \ge 1.
$$
Block 0 is instantly updated, and therefore we need $2 \cdot s(1) \le 2^{s(0)}$. Since the
inequalities are satisfied by the chosen block sizes, each propagated carry to a block
is processed immediately. Having fixed the oblivious head movements, by starting a
sweep over block i+1 each time a carry arrives from block i on the n channel, it remains
to prove that invariants (1)-(3) can be maintained at all times during the real-time
simulation. (Before proceeding, we remark that it is not necessary to assume
that the blocks are delimited on the tape initially. Using four extra counters we can,
as soon as we have the size of block i on one of them, determine s(i+1) before the
first sweep over block i+1 is due. Determining the size of block 1 by the finite control,
we can bootstrap the simulation of these four counters in the main simulation
itself, which will be able to simulate an arbitrary number of counters, and so successively
determine the blocks as they are needed. However, for the present objective
of eventually producing a combinational logic network, there is no advantage in amplifying
on this construction.)
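That the chosen block sizes satisfy inequality (*) and the condition for block 0 can be verified directly. A small Python sketch, with illustrative function names and x = 2 as an assumed example value:

```python
def block_sizes(x, count):
    """Return [s(0), ..., s(count-1)]: s(0)=x, s(1)=2**(x-1), s(i)=2**s(i-1)."""
    sizes = [x, 1 << (x - 1)]
    while len(sizes) < count:
        sizes.append(1 << sizes[-1])
    return sizes[:count]

def sweep_fits(sizes):
    """Check 2*s(1) <= 2**s(0) and (*) 2*s(i+1) <= 2*s(i)*2**s(i) for i >= 1."""
    checks = [2 * sizes[1] <= (1 << sizes[0])]
    for i in range(1, len(sizes) - 1):
        checks.append(2 * sizes[i + 1] <= 2 * sizes[i] * (1 << sizes[i]))
    return checks

sizes = block_sizes(2, 5)
print(sizes)
print(sweep_fits(sizes))
```

Because $s(i+1) = 2^{s(i)}$, inequality (*) reduces to $1 \le s(i)$, so it holds with room to spare for every block beyond the first.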
We have to show: 
(a) Each block can always receive incoming carries on the first square of its
y-[z-] channel, and, in particular, block 0 receiving the inputs never overflows.
I.e., (1) and (2) are maintained at all times.
(b) Invariant (3) holds at all times.
From (a) and (b) it follows, by the same reasoning as in the last section, that the
current count y - z = 0 iff both y = z = 0 iff both y- and z-channel currently contain
blanks only. The finite control, containing block 0, therefore knows instantly when
the count is zero.
CLAIM 1. (a) can be maintained.
PROOF SKETCH. By induction on the consecutive blocks i.
Base case. A sweep over block 1 takes $2 \cdot s(1) = 2^{s(0)}$ steps. Since a channel y, z on
block 0 can accommodate a count of $2 \cdot (2^{s(0)} - 1)$, subsequent to propagation of a carry
to block 1 (signifying a count of $2^{s(0)}$) block 0 contains at most $2^{s(0)} - 1$ on either
channel. In the next $2^{s(0)} - 1$ steps the count may rise to $2 \cdot (2^{s(0)} - 1)$, but at the
$2^{s(0)}$-th step a new carry is propagated to block 1, resulting from the current count
on the channel plus the current input to that channel, restoring a count of at most
$2^{s(0)} - 1$.
Induction. During its left-to-right sweeps, the head on block i, i > 0, processes a
2 deposited in the first square of the y,z-channels by propagating it as far as possible
on the left two squares covered. So a 2 in the first square of a channel of
block i may increment the contents of the first square of that channel on block i+1
by 1. Assume that the first square of a channel on block j, 1 ≤ j ≤ i, is not incremented
by more than 1 in between the starts of two consecutive sweeps over that block.
Identifying 0's and blanks, and considering only one channel, let block i contain
00...0 or 10...0 at the start of the $t_1$-th sweep. By assumption, if block i contains
211...1 at the start of the $t_2$-th sweep, then $t_2 - t_1 \ge 2^{s(i)} - 1$. So sweep $t_2$ causes
an increment of 1 on the first square of block i+1, by propagating the 2 right leaving
0's. Also by assumption, at the start of the $(t_2 - t_1 + 1)$-th sweep block i contains
00...0 or 10...0 again. Since block i contains only blanks initially, and
$t_2 - t_1 + 1 \ge 2^{s(i)}$, while a sweep over block i+1 takes less time than $2^{s(i)}$ sweeps
over block i, the assumption holds for block i+1. The assumption holds for block 1 by
the base case.
So no channel on a block i, i > 0, ever contains more than $2^{s(i)} + 1$ which, together
with the base case, proves the claim. □
CLAIM 2. (b) can be maintained. 
PROOF SKETCH. Contrary to the simulation in the previous section, we preserve invari- 
ant (3) while going from left-to-right on a block in propagating a carry. Going from 
right-to-left nothing is changed, so invariant (3) will hold at all times. We do so 
by subtracting the 3 bit pieces of the y- and z-count, covered by the left three posi- 
tions of the head while going from left to right. If a nonzero digit replaces a 0 or 
a blank on a channel this is in the middle position of the three positions covered 
and the three positions covered on the other channel are replaced by O's (or blanks). 
This still allows us to propagate a 2 as far as the central position of the 3 covered, 
so to the first square on the next block at the right extreme of the sweep. From the 
proof of the previous claim we have seen that a carry to the first square of the next 
block was sufficient. The rightmost (fourth) square covered by the head serves to 
detect adjacent blanks so as to return created leading O's to blanks immediately. Due 
to the fact that invariant (3) holds and 2's occur only on the first square of a 
block and underneath a head, only one new leading 0 can be created per channel in a 
sweep on the rightmost nonblank block. □
Hence we have: 
THEOREM 3. We can simulate the first n steps of a multicounter machine by an oblivious 
log*n-head tape unit in real-time and logarithmic space. (Similarly we can directly
construct an oblivious log*n-tape Turing machine for the same job.) 
Just as argued in the previous section, we can construct a corresponding combinational
logic network. Since only squares which are being rewritten need to be represented
by logic components, and the time to make a sweep on block i+1 is $2 \cdot s(i+1)$
while there is only one such sweep in each cycle T(i),
$T(i) \ge 2 \cdot s(i) \cdot 2^{s(i)} = 2 \cdot s(i) \cdot s(i+1)$ steps, the cost of this network is reduced from the expected O(n log*n)
by not representing squares covered by a head which does no rewriting.
THEOREM 4. We can implement the first n steps of a k-counter machine on an O(kn) cost
combinational logic network with constant data rate. 
PROOF. The network has a constant data rate, i.e. a time interval O(1) between enabling
the i-th input port by the (i-1)-th output and producing the i-th output, 1 ≤ i ≤ n,
since it is derived from a real-time simulation. Each piece of logic circuitry, representing
four squares covered by a head which is moving, has cost c(k), depending only
on the number k of counters simulated but not on the number of steps n. The state of
the finite control (containing block 0) is represented by cost d(k) pieces of logic
connected to the input ports. In each cycle $T(i) \ge 2 \cdot s(i) \cdot 2^{s(i)}$ steps, the head on
block i+1 is active for only $2 \cdot 2^{s(i)}$ steps. Hence such a head is active for only
O(n/s(i)) steps out of n, 1 ≤ i < log*n. Summing this for all blocks i, 1 ≤ i ≤ log*n,
and adding the cost for the blocks 0 connected to the input ports we obtain a total
cost C(k,n):
$$
C(k,n) = \sum_{i=1}^{\log^* n - 1} \frac{n \cdot c(k)}{s(i)} + n\,(c(k) + d(k)) = O(n \cdot k). \qquad \Box
$$
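The sum in C(k,n) is dominated by its first terms, because the block sizes grow so fast. The Python sketch below evaluates it exactly with rational arithmetic. It assumes x = 2 and, purely for illustration, takes c(k) = d(k) = k; the paper only says that these costs depend on k alone. The function names are hypothetical.

```python
from fractions import Fraction

def block_sizes(x, count):
    """Return [s(0), ..., s(count-1)]: s(0)=x, s(1)=2**(x-1), s(i)=2**s(i-1)."""
    sizes = [x, 1 << (x - 1)]
    while len(sizes) < count:
        sizes.append(1 << sizes[-1])
    return sizes[:count]

def activity_sum(x, blocks):
    """Sum of 1/s(i) over blocks i = 1, ..., blocks-1."""
    return sum(Fraction(1, s) for s in block_sizes(x, blocks)[1:])

def network_cost(n, k, x=2, blocks=5, c=lambda k: k, d=lambda k: k):
    """C(k,n) with hypothetical per-piece costs c(k) and d(k)."""
    return n * c(k) * activity_sum(x, blocks) + n * (c(k) + d(k))

print(activity_sum(2, 5))           # stays below 1
print(float(network_cost(1000, 3)))
```

With these assumed costs the total stays below 3kn, since the sum of the 1/s(i) is less than 1.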
4. SIMULATION BY CYCLIC NETWORKS (AND VLSI)
When we are not restricted to acyclic logic networks, but are allowed cyclic 
logic networks, or work in the framework of the VLSI model of computation recently 
advanced in [5], it is not difficult to see that: 
THEOREM 5. If C is a k-CM transducer, then we can construct 
(i) a cyclic logic network simulating n steps of C with cost O(k log n) in real-time;
(ii) a VLSI simulating n steps of C in real-time with area O(k log n).
PROOF. We prove (ii), and (ii) clearly implies (i). The VLSI circuit realizing the 
claimed behaviour could look as follows: 
[Figure 3. VLSI circuit simulating k-CM: a control block with on-line input and output,
connected to k rows of ⌈log n⌉ columns of blocks.]
Each row stores a count in ordinary binary notation, with the low digit contained in 
the left block. Each block Stores two bits: one for the binary digit of the count, 
and one to indicate whether the count digit contained is the most significant bit of 
that count. Carries are propagated along the top wire of each row, borrows along the
bottom wire. The middle wires of each row transport information concerning the most 
significant bit in that row. Each block contains the necessary logic to process and 
transmit correctly carries, borrows and information concerning the most significant 
bit. The finite-control-logic rectangle processes the input signals and the informa- 
tion from the first blocks of each row, whether they contain a most significant bit 0 
of the corresponding count, to issue carries or borrows to the first block of each 
row and to compute the output signal. We leave it to the reader to confirm that, sub- 
sequent to receiving the input signal, the corresponding output signal can be computed 
in time 0(log k), which corresponds to the bit length of an input signal for driving 
k counters. Hence the VLSI circuit simulates the k-CM in real-time. Since the area 
occupied by the wires emanating from each block can be kept to the same size as the 
area occupied by the block itself, the blocks take 0(k logn)  area. The finite control 
logic structure contains some trees of depth log k, so its area can be kept to 
0(k logk) .  Under the assumption that k (0 (n)  this yields the required result. 
To fit a long thin rectangle in a square, as often is necessary to implement the 
structure on chip, we can fold it without increasing the surface area significantly. 
Note that the structure contains no long wires, and that it does not have to be over- 
all synchronized: local synchronization is all we need. Hence it is a practicable 
design. 
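The contents of one row of blocks can be illustrated in software. The following is a minimal sketch, not the circuit itself: the class name `CounterRow` and its methods are hypothetical, and each carry or borrow is assumed to ripple to completion within a single call, so the sketch models only what the blocks store (digit and most-significant-bit marker), not the timing or the local synchronization of the design.

```python
class CounterRow:
    """One row of blocks; block 0 is the leftmost (low-order) block."""

    def __init__(self, width):
        self.bit = [0] * width        # binary digit stored in each block
        self.msb = [False] * width    # marks the block holding the MSB
        self.msb[0] = True            # count 0: block 0 holds an MSB equal to 0

    def is_zero(self):
        # The control logic inspects only the first block of the row.
        return self.msb[0] and self.bit[0] == 0

    def value(self):
        return sum(b << i for i, b in enumerate(self.bit))

    def increment(self):
        if all(self.bit):
            raise OverflowError("row too narrow for this count")
        i = 0
        while self.bit[i] == 1:       # carry travels right over 1-digits
            self.bit[i] = 0
            if self.msb[i]:           # carry leaves the MSB block: marker moves right
                self.msb[i], self.msb[i + 1] = False, True
            i += 1
        self.bit[i] = 1               # carry is absorbed here

    def decrement(self):
        if self.is_zero():
            raise ValueError("cannot decrement a zero count")
        i = 0
        while self.bit[i] == 0:       # borrow travels right over 0-digits
            self.bit[i] = 1
            i += 1
        self.bit[i] = 0               # borrow is absorbed here
        if self.msb[i] and i > 0:     # MSB block emptied: marker moves left
            self.msb[i], self.msb[i - 1] = False, True
```

A k-counter machine would use k such rows, with the control reading `is_zero()` of every row before issuing the next carries and borrows. Note that the zero test never looks beyond block 0, which is the point of storing the marker bit.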
5. SIMULATION BY RAMs 
For simulation with a uniform cost RAM it is clear that we can simulate a multicounter
on-line with constant delay and constant storage. Constant delay is the RAM
analogue of real-time: if T(n) is the time for simulating n steps of the
multicounter, then the RAM simulates on-line with constant delay if T(n+1) - T(n) < c for
some constant c and all n. It is easy to see that a logarithmic cost RAM cannot
simulate a counter machine on-line with constant delay, since in bounded time it can
only address registers of bounded index and bounded contents.
At first glance it seems that we can do no better than O(n log n) time for simulation
of a counter machine by a logarithmic cost RAM. If we simulate with a tally
mark in each register, we have to use indirect addressing to maintain the top of the
counter, requiring O(n log n) time and O(n) storage to simulate n steps. Using a binary
count we need only k registers for a k-counter machine, but again need O(n log n) time
and O(log n) storage. Define an oblivious RAM as one in which the sequence of executed
instructions, as well as the sequence of accessed storage locations, is a function of
time alone. Due to the usual restriction of the arithmetic operations of RAMs to +
and -, as well as to the needed translation of input commands, with respect to the set
of currently zero counters, into counter instructions, we need to augment the RAM with
some constant bit length boolean/arithmetic instructions in order not to be artificially
precluded from obtaining the following result by imitation of the simulation
in Section 2. (If we do not add these extra operations the Theorem below might
hold only for nonoblivious RAMs, for purely irrelevant definitional reasons.) Since we view
the RAM as an abstract storage device performing a transduction we also assume it is
connected to an input and an output terminal and dispense with the usual 'accept'
instruction. Using the simulation in Section 2 we obtain:
THEOREM 6. We can simulate a k-counter machine on-line by an oblivious logarithmic
cost RAM in O(k·n) time and O(k log n) storage.
PROOF. Do the simulation of Section 2 with the RAM, storing the head position of the
1-tape Turing machine in register 1 and the j-th square contents in register j+1. Then
the sequence of executed instructions in the RAM program, and the sequence of accessed
registers, can be made a function of time alone. So the RAM is oblivious. The time
for simulating sweeps of length j on the RAM is $O(k \sum_{i=2}^{j+1} \log i) = O(k\, j \log j)$. So if
$T(2^{h+1})$ is the time needed to execute the first $2^{h+1}$ steps of the multicounter we
obtain:
$$T(2^{h+1}) \in O\Big(\sum_{j=1}^{h} k \cdot 2^{h-j} \cdot j \log j + k \cdot 2^{h}\Big) = O(k \cdot 2^{h+1}).$$
So $T(n) \in O(kn)$ and the storage used is $O(k \log n)$. □
This simulation is optimal in both space and time, even for nonoblivious RAMs. 
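The final equality in the bound on T(2^{h+1}) in the proof of Theorem 6 rests on the convergence of the series Σ j log j / 2^j, so that the whole sum is at most a constant times k·2^h. The following is a minimal numerical sketch of that step, assuming base-2 logarithms and unit hidden constants; the helper name `sweep_cost_ratio` is hypothetical and not taken from the paper.

```python
import math

def sweep_cost_ratio(h):
    """Return (sum_{j=1}^{h} 2^(h-j) * j * log2(j) + 2^h) / 2^(h+1).

    The common factor k is omitted because it cancels in the ratio.
    """
    total = sum(2 ** (h - j) * j * math.log2(j) for j in range(1, h + 1))
    return (total + 2 ** h) / 2 ** (h + 1)

# Print the ratio for a few values of h to see that it levels off.
for h in (1, 5, 10, 20, 40):
    print(h, round(sweep_cost_ratio(h), 4))
```

The printed ratios stay below a fixed constant as h grows, which is what the O(k·2^{h+1}) bound asserts.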
6. FINAL REMARKS 
Comparing our solution of the linear time simulation of a k-CM with the nonoblivious
one in [2], the reader will notice that our average time complexity is the
same as the worst case time complexity in [2]. So in actual fact, the solution in [2]
runs faster in most cases than the one presented here. In [1] it was shown that the
Origin Crossing Problem: "report when all k counts simultaneously reach 0" admits a
real-time one-tape Turing machine solution. Contrary to the linear time simulation of
[2], the method in [1] seems to contain inherently nonoblivious features, preventing
us from turning it into an oblivious version. It has been a classic question [1,2]
whether or not the Axis Crossing Problem: "report when one out of k counters reaches
0" or, more generally, "on-line simulate a k-counter machine" can be done in real-time
by a (nonoblivious) k'-tape Turing machine for k' < k. A reasonable approach may seem
to be to show that, anyway, a real-time simulation of multicounter machines by oblivious
one-head tape units is impossible. In the event, intuition is wrong. We have noticed,
cf. Section 2, that if we restrict the simulating device to its oblivious counterpart
we have the advantage that if 1 counter can be simulated then k counters can be
simulated in just the same way. This key observation has led us in the meantime, by
augmenting the ideas presented here with an involved tape manipulation technique, to a
real-time simulation of multicounter machines by oblivious one-head tape units, thus
solving the above problem with a considerable margin [11]. Although superficially it would
seem that this farther reaching result obviates the present ones, we would like to point out
that:
- The present results are far simpler to derive and will suffice for many applications,
as will some of the distinctive techniques.
- To derive the linear cost constant data rate combinational logic network the present
route by way of a log* n-head tape unit suffices.
- The RAM simulation result seems difficult to derive, if at all, from the simulation
in [11] without regressing to the simulation given here.
REFERENCES 
[1] FISCHER, M.J. & A.L. ROSENBERG, Real-time solutions of the origin-crossing problem,
Math. Systems Theory 2 (1968), 257-264.
[2] FISCHER, P.C., A.R. MEYER & A.L. ROSENBERG, Counter machines and counter languages,
Math. Systems Theory 2 (1968), 265-283.
[3] HARTMANIS, J. & R.E. STEARNS, On the computational complexity of algorithms,
Trans. Amer. Math. Soc. 117 (1965), 285-306.
[4] MINSKY, M., Recursive unsolvability of Post's problem of tag and other topics in
the theory of Turing machines, Ann. of Math. 74 (1961), 437-455.
[5] MEAD, C.A. & L.A. CONWAY, Introduction to VLSI Systems, Addison-Wesley, New York,
1980.
[6] PATERSON, M.S., M.J. FISCHER & A.R. MEYER, An improved overlap argument for on-line
multiplication, SIAM-AMS Proceedings, Vol. 7 (Complexity of Computation),
1974, 97-112.
[7] PIPPENGER, N. & M.J. FISCHER, Relations among complexity measures, Journal ACM
26 (1979), 361-384.
[8] ROSENBERG, A.L., Real-time definable languages, Journal ACM 14 (1967), 645-662.
[9] SCHNORR, C.P., The network complexity and Turing machine complexity of finite
functions, Acta Informatica 7 (1976), 95-107.
[10] VITÁNYI, P.M.B., Relativized Obliviousness, in Lecture Notes in Computer Science
88 (1980), 665-672, Springer Verlag, New York. (Proc. MFCS '80).
[11] VITÁNYI, P.M.B., Real-time simulation of multicounters by oblivious one-tape
Turing machines, Proceedings 14th ACM Symp. on Theory of Computing, 1982.
