Efficient On-Line Simulations of Tree Machines and Multidimensional Turing Machines by Random Access Machines by Loui, Michael C. & Luginbuhl, David R.
August 1989 UILU-ENG-89-2222
ACT-108
COORDINATED SCIENCE LABORATORY
College of Engineering
Applied Computation Theory
EFFICIENT ON-LINE SIMULATIONS OF TREE MACHINES AND MULTIDIMENSIONAL TURING MACHINES BY RANDOM ACCESS MACHINES
Michael C. Loui David R. Luginbuhl
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Approved for Public Release. Distribution Unlimited.
Subject terms: random access machine, multidimensional Turing machine,
tree machine, on-line simulation, time complexity
Efficient On-Line Simulations of Tree 
Machines and Multidimensional Turing 
Machines by Random Access Machines *
Michael C. Loui†
Department of Electrical and Computer Engineering and
Beckman Institute
University of Illinois at Urbana-Champaign 
Urbana, IL 61801
David R. Luginbuhl‡
Department of Computer Science and 
Beckman Institute
University of Illinois at Urbana-Champaign 
Urbana, IL 61801
August 24, 1989
*The views, opinions, and conclusions in this paper are those of the authors and should 
not be construed as an official position of the Department of Defense, U.S. Air Force, or 
other U.S. government agency.
†Supported by the Office of Naval Research under Contract N00014-85-K-0570.
‡Supported by the Air Force Institute of Technology, AFIT/CIRD, Wright-Patterson
AFB, OH 45433.
Abstract
We establish an optimal on-line relationship between tree machines
and random access machines (RAMs). We present an on-line simulation
of a tree machine of time complexity t by a log-cost RAM
of time complexity $O((t \log t)/\log\log t)$. Using information-theoretic
techniques, we show that this simulation is optimal.
We adapt the simulation of a tree machine to devise an on-line
simulation of a d-dimensional Turing machine of time complexity t by
a log-cost RAM running in time $O(t (\log t)^{1-1/d} (\log\log t)^{1/d})$.
1 Introduction
The random access machine (RAM) and the Turing machine (TM) are the 
standard models for sequential computation. Research into the use of time 
and space by these and other models gives us insight into their computational 
power. This research includes analyzing how two different models use time 
and space, and comparing time and space utilization within a single model. 
Another avenue of investigation is determining how altering the definitions of 
time and space (for example, log-cost versus unit-cost) for a model affects its 
computational power. Slot and van Emde Boas (1988), for example, showed 
how space equivalence of RAMs and Turing machines is affected by varying 
the definition of space complexity for RAMs.
Paul and Reischuk (1981) used tree machines to investigate the relationships
between time and space for random access machines and multidimensional
Turing machines. They presented a simulation of a log-cost RAM of
time complexity t by a tree machine of time complexity $O(t)$. They also
showed that a tree machine of time complexity t can be simulated off-line
by a unit-cost RAM of time complexity $O(t/\log\log t)$. Loui (1984b) showed
that a multihead tree machine of time complexity t can be simulated by a
tree machine with only two worktape heads in time $O((t \log t)/\log\log t)$.
We present an on-line simulation of a tree machine of time complexity t
by a log-cost RAM of time complexity $O((t \log t)/\log\log t)$. Using the notion
of incompressibility from Kolmogorov complexity (Li and Vitanyi, 1988), we 
show that this simulation is optimal. This appears to be the first application 
of Kolmogorov complexity to sequential RAMs. It is significant because few
algorithms have been shown to be optimal.
Using similar techniques, we design an efficient on-line simulation of a
d-dimensional Turing machine of time complexity t by a log-cost RAM running
in time $O(t (\log t)^{1-1/d} (\log\log t)^{1/d})$. For $d = 1$, the running time is
$O(t \log\log t)$, which is the same as the result of Katajainen et al. (1988).
This work is a complement to Loui’s (1983) simulation of tree machines 
by multidimensional Turing machines and Reischuk’s (1982) simulation of 
multidimensional Turing machines by tree machines.
All logarithms in this paper are taken to base 2.
2 Machine Definitions
All machines that we consider have a two-way read-only input tape and a 
one-way write-only output tape. The principal differences in the machines 
are in their storage structures.
A tree machine, a generalization of a Turing machine, has a storage structure
that consists of a finite collection of complete infinite rooted binary trees,
called tree worktapes. Each cell of a worktape can store a 0 or 1. Each worktape
has one head. A worktape head can shift to a cell’s parent or to its left
or right child. Initially, every worktape head is on the root of its worktape,
and all cells contain 0.
Let W be a tree worktape. We fix a natural bijection between the positive
integers and cells of W. We refer to the integer corresponding to a particular
cell as that cell’s location. Write cell(b) for the cell at location b. Define cell(1)
as the root of W. Then cell(2b) is the left child of cell(b) and cell(2b + 1) is
the right child of cell(b).
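The location arithmetic above can be sketched in a few lines. This is only an illustration; the function names and the "L"/"R" path encoding are assumptions introduced here, not notation from the paper.

```python
def left_child(b: int) -> int:
    """Location of the left child of cell(b)."""
    return 2 * b


def right_child(b: int) -> int:
    """Location of the right child of cell(b)."""
    return 2 * b + 1


def parent(b: int) -> int:
    """Location of the parent of cell(b); the root, cell(1), has none."""
    if b == 1:
        raise ValueError("cell(1) is the root and has no parent")
    return b // 2


def depth(b: int) -> int:
    """Depth of cell(b), with the root at depth 0."""
    return b.bit_length() - 1


def location_of_path(path: str) -> int:
    """Location reached from the root by a string of 'L'/'R' moves."""
    b = 1
    for move in path:
        b = left_child(b) if move == "L" else right_child(b)
    return b
```

For instance, the path "LR" from the root leads to cell(5), whose parent is cell(2).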
Each step of a tree machine consists of reading the contents of the worktape
cells and input cell currently scanned, writing back on the same worktape
cells and (possibly) to the currently accessed output cell, and (possibly)
shifting each worktape head and the input head. If the tree machine writes 
on the output tape, it also shifts the output head.
The time complexity t(n) of a tree machine is defined in the natural way.
A multihead d-dimensional Turing machine consists of a finite control 
and a finite number of d-dimensional worktapes, each with one worktape 
head. A d-dimensional worktape comprises an infinite number of cells, each 
of which is assigned a d-tuple of integers called the coordinates of the cell. 
The coordinates of adjacent cells differ in just one component of the d-tuple 
by ±1. At each step of the computation, the machine reads the symbols in 
the currently accessed input and worktape cells, (possibly) writes symbols on 
the currently accessed output and worktape cells, (possibly) shifts the input 
head, and shifts each worktape head in one of $2d + 1$ directions: either to
one of $2d$ adjacent cells or to the same cell.
The random access machine (RAM) (Aho et al., 1974; Cook and Reckhow, 
1973; Katajainen et al., 1988) consists of the following: a finite sequence of 
labeled instructions; a memory consisting of an infinite sequence of registers, 
indexed by nonnegative integer addresses (register r(j) has address j); and
a special register AC, called the accumulator, used for operating on data.
Each register, including AC, holds a nonnegative integer; initially all registers
contain 0. Each cell on the input tape contains a 0 or 1. The following RAM
instructions are allowed (⟨x⟩ denotes the contents of register r(x); ⟨AC⟩
denotes the contents of AC):
input. Read the current input symbol into AC and move the input head
one cell to the right.
output. Write ⟨AC⟩ to the output tape and move the output head one cell
to the right.
jump θ. Unconditional transfer of control to the instruction labeled θ.
jgtz θ. Transfer control to the instruction labeled θ if ⟨AC⟩ > 0.
load =C. Load integer C into AC.
load j. Load ⟨j⟩ into AC.
load *j. (Load indirect) Load ⟨⟨j⟩⟩ into AC.
store j. Store ⟨AC⟩ into r(j).
store *j. (Store indirect) Store ⟨AC⟩ into register r(⟨j⟩).
add j. Add ⟨j⟩ to ⟨AC⟩ and place the result in AC.
sub j. If ⟨j⟩ > ⟨AC⟩, then load 0 into AC; otherwise, subtract ⟨j⟩ from
⟨AC⟩ and place the result in AC.
Define the length of a nonnegative integer i as the minimum positive
integer w such that $i \le 2^w - 1$ (approximately the logarithm of i). The
length of a register is the length of the integer contained in the register (note
that the length of a register is a time-dependent quantity).
We consider two time complexity measures for RAMs, based on the cost 
of each RAM instruction. For the unit-cost RAM, we charge each instruction 
one unit of time. For the log-cost RAM, we charge each instruction according 
to the logarithmic cost criterion (Katajainen et al., 1988): the time for each 
instruction is the sum of the lengths of the integers (addresses and register 
contents) involved in its execution. The time complexity t(n) of a RAM is 
the maximum total time used in computations on inputs of length n. It is 
possible, of course, to define time complexity in other ways; e.g., we could
charge some other function f(j) for access to register j (Aggarwal et al.,
1987).
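A minimal sketch of logarithmic-cost accounting may make the measure concrete. It reads the length of i as the smallest positive w with i ≤ 2^w − 1, and it assumes one particular choice of which integers are charged per instruction (the address, the register contents read, and the accumulator where it takes part); the class and method names are hypothetical.

```python
def length(i: int) -> int:
    """Smallest positive w with i <= 2**w - 1 (so length(0) == 1)."""
    return max(1, i.bit_length())


class LogCostRAM:
    """Toy RAM that accumulates logarithmic cost for a few instructions.

    Assumption: each instruction is charged the lengths of the address,
    of the register contents it reads, and of AC when AC is an operand.
    """

    def __init__(self):
        self.reg = {}   # address -> contents; absent registers hold 0
        self.ac = 0     # accumulator
        self.time = 0   # total logarithmic cost so far

    def _read(self, j: int) -> int:
        return self.reg.get(j, 0)

    def load_const(self, c: int) -> None:      # load =C
        self.time += length(c)
        self.ac = c

    def load(self, j: int) -> None:            # load j
        v = self._read(j)
        self.time += length(j) + length(v)
        self.ac = v

    def load_indirect(self, j: int) -> None:   # load *j
        a = self._read(j)
        v = self._read(a)
        self.time += length(j) + length(a) + length(v)
        self.ac = v

    def store(self, j: int) -> None:           # store j
        self.time += length(j) + length(self.ac)
        self.reg[j] = self.ac

    def add(self, j: int) -> None:             # add j
        v = self._read(j)
        self.time += length(j) + length(v) + length(self.ac)
        self.ac += v

    def sub(self, j: int) -> None:             # sub j (truncated at 0)
        v = self._read(j)
        self.time += length(j) + length(v) + length(self.ac)
        self.ac = max(0, self.ac - v)
```

Under the unit-cost measure every method would simply add 1 to `time`; the difference between the two measures is what separates Theorem 3.1 from Theorem 3.2 below.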
In our simulations, we group the registers into a finite number of memories,
each memory containing an infinite number of registers. This does
not increase the cost in time by more than a constant factor, since we could 
simply interleave these memories into one memory (Katajainen et al., 1988).
We use a technique of Katajainen et al. (1988) to pack and unpack 
registers in order to find the bit representation of a number and vice-versa. 
This divide-and-conquer strategy involves precomputed shift tables:
Lemma 2.1 (Katajainen et al., 1988) If the proper tables are available, then
it is possible to compute the u-bit representation of an integer $n < 2^u$, and the
numeric value of a u-bit string, both in $O(u \log u)$ time on a log-cost RAM.
Lemma 2.2 (Katajainen et al., 1988) The tables necessary for Lemma 2.1
can be built in $O(u 2^u)$ time on a log-cost RAM.
A machine M of time complexity t is simulated by a machine M' on-line
in time f(t) if for every time step $s_i$ where M reads/writes a symbol, there
is a corresponding time step $s_i'$ where M' reads/writes the same symbol, and
$s_i' \le f(s_i)$.
3 Simulation of a Tree Machine
3.1 Upper Bound
It is straightforward to simulate a tree machine with a log-cost RAM in time
$O(t \log t)$. In fact, such a simulation is used in Theorem 3.2 to show that a
tree machine can be simulated by a unit-cost RAM in real time. However, 
we can do better than the straightforward simulation for log-cost RAMs.
For simplicity, we consider tree machines with only one worktape, but our 
results generalize to multiple worktapes. Let T be a tree machine of time 
complexity t with one worktape W. We show that there is a RAM R that 
simulates T on-line in time $O((t \log t)/\log\log t)$.
We first provide a brief description of the simulation. We choose parameters
h and u such that $u = 2^{2h+2} - 1$. We specify the values of h and u later.
As noted earlier, R has several memories. R maintains in the main memory
the entire contents of W. The main memory represents W as overlapping
subtrees, called blocks. R represents the contents of each block $W_x$ in one
register $r_x$ of the main memory. When the worktape head is in a particular
block $W_x$, R represents $W_x$ in the cache memory. Step-by-step simulation
is carried out in the cache, which represents the block $W_x$ in breadth-first
order, one cell of $W_x$ per register of the cache.
Because blocks overlap, when the worktape head exits $W_x$, it is positioned
in the middle of some other block $W_y$. At this time R packs the contents of
the cache back into $r_x$ in the main memory and unpacks the contents of $r_y$
into the cache.
The details of the simulation follow.
Let $W[x, s]$ be the complete subtree of W of height s rooted at cell(x).
A block is any subtree $W_x = W[x, 2h + 1]$ such that the depth of cell(x) is a
multiple of $h + 1$. Since a block has height $2h + 1$, it contains $2^{2h+2} - 1 = u$
cells. Let the relative location of a cell within a block be defined in a manner
similar to the location of a cell, where the relative location of the root of the
block is 1, the relative locations of its children are 2 and 3, and so on.
Call a block $W_p$ the parent block of $W_x$ if cell(p) is the ancestor of cell(x)
at distance $h + 1$ from cell(x). If $W_x$ is the parent block of $W_c$, then $W_c$ is
a child block of $W_x$. Each block has $2^{h+1}$ child blocks. The topmost block of
W, which contains the root of W, is called the root block.
Define the top half of a block $W_x$ as $W[x, h]$, and define the bottom half
of $W_x$ as the remaining cells of the block. Note that the top half of the block
$W_x$ is part of the bottom half of $W_p$, its parent block, so that the blocks
overlap. Call the portion of $W_x$ shared by $W_p$ (i.e., the subtree $W[x, h]$) the
common subtree of $W_x$ and $W_p$.
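The block geometry can be checked on a small instance. The sketch below is an illustration only: the helper names are assumptions introduced here, and locations follow the numbering in which cell(2b) and cell(2b + 1) are the children of cell(b).

```python
def depth(b: int) -> int:
    """Depth of cell(b), with the root cell(1) at depth 0."""
    return b.bit_length() - 1


def is_block_root(x: int, h: int) -> bool:
    """cell(x) roots a block when its depth is a multiple of h + 1."""
    return depth(x) % (h + 1) == 0


def child_block_roots(x: int, h: int) -> list:
    """Roots of the 2^(h+1) child blocks: descendants of cell(x) at distance h + 1."""
    first = x << (h + 1)
    return list(range(first, first + 2 ** (h + 1)))


def parent_block_root(x: int, h: int) -> int:
    """Root of the parent block: the ancestor of cell(x) at distance h + 1."""
    return x >> (h + 1)


def absolute_location(x: int, z: int) -> int:
    """Location in W of the cell with relative location z in a subtree rooted at cell(x)."""
    d = depth(z)
    return (x << d) + (z - (1 << d))


def subtree_cells(x: int, s: int) -> set:
    """Locations of all cells of W[x, s], the complete subtree of height s at cell(x)."""
    return {absolute_location(x, z) for z in range(1, 2 ** (s + 1))}
```

With h = 1, the root block W[1, 3] has 15 cells, its child blocks are rooted at locations 4 through 7, and the top half of the block rooted at cell(4) lies inside the bottom half of the root block, which is the overlap described above.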
R precomputes in separate memories two tables, half and translate. We
explain later how R uses these tables. Here we describe their contents and
how they are computed. Let half(z) (respectively, translate(z)) be the register
in half (respectively, translate) at address z.
Half(z) contains $\lfloor z/2 \rfloor$. For $z = 1, \ldots, u/2$, R stores z in half(2z) and
half(2z + 1).
For $z = 2^{2h+1}, \ldots, u$, translate(z) contains $(z \bmod 2^{h+1}) + 2^{h+1}$. R never
refers to any register in translate with address less than $2^{2h+1}$. Translate is
computed as follows:
    i := 2^{h+1}
    for z = 2^{2h+1} to u do
        translate(z) := i
        i := i + 1
        if i = 2^{h+2} then i := 2^{h+1}
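As a sanity check, both tables can be built with additions and comparisons only and then compared against their closed forms. This is a sketch under the assumption that the counter in the loop wraps around at 2^(h+2), which is what the closed form (z mod 2^(h+1)) + 2^(h+1) requires; `build_tables` is a name introduced here for illustration.

```python
def build_tables(h: int):
    """Build the half and translate tables for blocks with parameter h.

    Returns (u, half, translate); index 0 of each list is unused.
    """
    u = 2 ** (2 * h + 2) - 1

    # half(2z) = half(2z + 1) = z, so half(z) = floor(z / 2); half(1) stays 0.
    half = [0] * (u + 1)
    for z in range(1, u // 2 + 1):
        half[z + z] = z
        half[z + z + 1] = z

    # translate(z) cycles through 2^(h+1), ..., 2^(h+2) - 1 as z runs over
    # the leaf locations 2^(2h+1), ..., u of a block.
    translate = [0] * (u + 1)
    low = 2 ** (h + 1)
    i = low
    for z in range(2 ** (2 * h + 1), u + 1):
        translate[z] = i
        i = i + 1
        if i == low + low:      # i reached 2^(h+2): wrap around
            i = low
    return u, half, translate
```

Neither loop divides or reduces modulo anything, which matters because the RAM instruction set offers only addition and truncated subtraction.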
We now show how R simulates the tree machine using the cache. Assume
the head of T is currently scanning a cell in block $W_x$. Let cache(z) be the
register in the cache with address z and let cell(x, z) be the cell in $W_x$ with
relative location z. For each $z = 1, \ldots, u$, register cache(z) contains the bit in
cell(x, z); for example, cache(1) contains the contents of cell(x, 1) = cell(x),
the root of $W_x$. Thus R uses u registers of the cache, each register containing
one bit.
While the head of T remains in $W_x$, R keeps track of the head’s location
with the cache address register in the working memory, a memory maintained
by R for storing information necessary for miscellaneous tasks. If the cache
address register contains z, then cell(x, z) is currently being accessed in T.
To simulate a tree machine operation at cell(x, z), R loads the contents
(one bit) of cache(z) into AC. Once the contents are in AC, R simulates one
step of T by storing either 0 or 1 in cache(z).
If the head of T moves to a child of cell(x, z), then the new address for
the cache address register, as well as the relative location of the new block
cell being read, is either 2z or 2z + 1. With one or two additions, R computes
this new address and places it in the cache address register. When the head
of T moves to the parent of cell(x, z), the address of the corresponding cache
register is $\lfloor z/2 \rfloor$. Because R has no division operation, it accesses the proper
register of table half to retrieve the new address in cache.
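The head movement inside a block can be sketched as follows. The function names, the move labels, and the use of `None` to signal that the head has left the block are assumptions made for this illustration.

```python
def build_half(u: int) -> list:
    """half[z] = floor(z / 2) for 1 <= z <= u, filled in by doubling."""
    half = [0] * (u + 1)
    for z in range(1, u // 2 + 1):
        half[z + z] = z
        half[z + z + 1] = z
    return half


def move_in_cache(z: int, move: str, half: list, u: int):
    """New cache address after one head move, or None if the head exits the block.

    move is one of "stay", "left", "right", "parent" (hypothetical labels).
    """
    if move == "stay":
        return z
    if move == "parent":
        # Table lookup replaces the division that the RAM lacks.
        return half[z] if z > 1 else None
    child = z + z                 # one addition: left child
    if move == "right":
        child = child + 1         # a second addition: right child
    return child if child <= u else None
```

With u = 15, moving left from address 8 (a leaf of the block) returns `None`, which is the point at which the block switch described next takes over.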
To describe what happens when the worktape head moves out of the
current block, we first show how the blocks are stored in main memory. Main
memory is divided into pages consisting of $2^{h+1} + 3$ registers each. A page
corresponds to a visited block of W. Let page(x) be the page representing
$W_x$. Define the address of a page to be the address of the first register in
the page. The first register in page(x) is the contents register. For the page
representing the root block, the contents register contains the entire contents
of that block. For every other block $W_y$, the contents register contains the
contents of the bottom half of $W_y$. The contents of cells in a block are kept
in breadth-first order; i.e., reading the binary string in the contents register
from left to right is equivalent to reading the bottom half of the block it
represents in breadth-first order. Initially, all cells of a block contain 0, so
all contents registers initially contain 0.
Following the contents register is the rank register, containing a number
$\ell$ between 1 and $2^{h+1}$ indicating that $W_x$ is the $\ell$th child of its parent block.
The next register is the parent register, containing the address of the page
representing the parent block of $W_x$. The next $2^{h+1}$ registers are the child
registers of $W_x$. The mth child register of page(x) contains the address of the
page representing the mth child block of $W_x$ or 0 if that child block has not
been visited (see Figure 1).
The first page in main memory corresponds to the root block. Blocks 
are then stored in the order in which they are visited. The page address 
register, a register in working memory, contains the address of the page in 
main memory corresponding to the currently accessed block.
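The page layout can be illustrated with a flat register file. This is a sketch with several assumptions that the text does not fix: page addresses start at 1 so that 0 can mean "not visited", the root page stores rank 0 and parent 0, and the class and method names are hypothetical.

```python
class MainMemory:
    """Pages of 2^(h+1) + 3 registers: contents, rank, parent, then child registers."""

    CONTENTS, RANK, PARENT, FIRST_CHILD = 0, 1, 2, 3   # offsets within a page

    def __init__(self, h: int):
        self.page_size = 2 ** (h + 1) + 3
        self.reg = {}          # address -> contents; absent registers hold 0
        self.next_free = 1     # assumption: address 0 is never a page address
        self.root = self.allocate(rank=0, parent=0)

    def allocate(self, rank: int, parent: int) -> int:
        """Allocate the next page in visiting order and return its address."""
        page = self.next_free
        self.next_free += self.page_size
        self.reg[page + self.RANK] = rank
        self.reg[page + self.PARENT] = parent
        return page

    def child_page(self, page: int, m: int) -> int:
        """Address of the page for the m-th child block, allocated on first visit."""
        slot = page + self.FIRST_CHILD + (m - 1)
        if self.reg.get(slot, 0) == 0:
            self.reg[slot] = self.allocate(rank=m, parent=page)
        return self.reg[slot]
```

Because pages are allocated only when a block is first visited, the largest address in use grows with the number of block switches rather than with the size of the worktape.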
Let $W_x$ be the currently accessed block and let $W_p$ be the parent block of
$W_x$. When the tree worktape head moves out of $W_x$ so that it is positioned in
the middle of a child block $W_c$, R makes the proper changes to main memory
and loads the cache from the contents register of page(c).
In main memory, R updates the contents registers of page(x) and page(p).
To update page(x), R packs the contents of the registers of the cache that
correspond to the bottom half of $W_x$ into a single register in working memory
(call it the transfer register, denoted by tr). Packing information in the cache
consists of creating from the registers in the cache one binary string that
represents the bottom half of a block (in the same format as a main memory
register). The pack operation is that used by Katajainen et al. (1988). R
then copies tr into the contents register of page(x) via AC (see Figure 2).
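To make the format of the contents register concrete, the sketch below packs and unpacks a range of cache registers bit by bit. It deliberately does not reproduce the table-based divide-and-conquer method of Katajainen et al. (1988) that the simulation relies on for its time bound; it only shows the input/output behaviour, and the function names are assumptions.

```python
def pack(cache: list, first: int, last: int) -> int:
    """Pack cache[first..last] (one bit each) into one integer.

    The bit of cache[first] becomes the leftmost (most significant) bit,
    matching the breadth-first, left-to-right reading order.
    """
    value = 0
    for z in range(first, last + 1):
        value = value + value + cache[z]   # shift left by doubling, then append
    return value


def unpack(value: int, cache: list, first: int, last: int) -> None:
    """Inverse of pack: write the bits of value back into cache[first..last]."""
    for z in range(last, first - 1, -1):
        cache[z] = value % 2
        value //= 2
```

For h = 1 the bottom half of a block occupies relative locations 4 through 15, so a page's contents register holds a 12-bit string.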
Updating page(p) consists of changing the bits of its contents register corresponding
to the common subtree of $W_x$ and $W_p$. R first saves the contents
of the cache that encode the common subtree of $W_x$ and $W_c$ in a portion
of working memory, since this information is needed in the cache as the top
half of $W_c$. R also saves the contents of the cache that encode the common
subtree of $W_x$ and $W_p$. R then loads the contents register of page(p) into
tr and unpacks the contents into the cache. The bits in working memory
corresponding to the common subtree of $W_x$ and $W_p$ are then written into
their proper locations in the portion of the cache representing the bottom
half of $W_p$. R then packs the contents of the cache into tr and copies tr into
the contents register of page(p).
Figure 1: Worktape W (head moves from $W_x$ to $W_c$)
Figure 2: Updating page(p) in main memory
R then determines whether $W_c$ has been visited before by checking the
contents of the child register of page(x) corresponding to $W_c$. If the child
register contains a valid (i.e., nonzero) address, then R uses that address to
access page(c). R then loads the contents register of page(c) into the cache.
This action is similar to the manipulation of page(p) discussed above. R
loads the contents of the common subtree of $W_x$ and $W_c$ saved in working
memory into the registers of the cache representing the top half of the block.
If the child register of page(x) contains 0, then R allocates a new page to
maintain the information on $W_c$.
R modifies the page address register to reflect the fact that the worktape
head is now scanning block $W_c$. The address currently in this register is that
of page(x). R writes the address of page(c) in main memory to the page
address register. R determines from the cache address register the quantity
$\ell$ such that $W_c$ is the $\ell$th child of $W_x$. Then by accessing the $\ell$th child register
of page(x) in the main memory, R can determine the address of page(c).
To modify the cache address register to reflect the relative location of
the head within block $W_c$, R first translates the relative location of the leaf
cell(x, z) in $W_x$ to its relative location in $W_c$. Since leaf cell(x, z) in $W_x$ is
the same as cell(c, $(z \bmod 2^{h+1}) + 2^{h+1}$) in $W_c$, R uses the table translate
described above. Using one or two additions, R then calculates the relative
location in $W_c$ of this cell’s left or right child, depending on which branch
the worktape head used to exit $W_x$. R then writes this new relative location
into the cache address register.
A similar sequence of operations occurs if the worktape head moves out of 
a block (and further) into its parent block instead of into a child block. Then 
R uses the parent register to determine the address of the page representing 
the parent block, and R uses the rank register to determine the relative 
location of the worktape head within the parent block.
If R does not know the input size n ahead of time, then we let R adopt an
incremental technique of Galil (1976). R begins by assuming that $n = 2$. If
the input head reads a third symbol, then R begins again with $n = 4$, but it
does not output symbols already printed. In general, R assumes $n = 2^k$ until
it reads the $(2^k + 1)$th symbol, at which time R starts over with $n = 2^{k+1}$.
The values of u and h depend on the value of n; therefore u and h are 
recomputed each time the value of n is doubled.
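The restart discipline can be sketched independently of the tree-machine details. In the illustration below, `EchoSim` is a toy stand-in for the simulator configured for a guess n (it merely echoes its input), and `run_incremental` is a hypothetical driver; neither name comes from the paper.

```python
class EchoSim:
    """Toy on-line machine configured for an input-size guess n."""

    def __init__(self, n: int):
        self.n = n            # parameters such as h and u would be derived from n

    def step(self, symbol):
        """Consume one input symbol and return the output symbols it produces."""
        return [symbol]


def run_incremental(make_sim, symbols):
    """Run an on-line simulation while doubling the guess for n as input arrives."""
    n = 2
    sim = make_sim(n)
    output = []
    restarts = 0
    for k, symbol in enumerate(symbols, start=1):
        if k > n:
            # The (n + 1)th symbol was read: double n and start over.
            n *= 2
            restarts += 1
            sim = make_sim(n)
            for earlier in symbols[:k - 1]:
                sim.step(earlier)      # replay; outputs already printed are dropped
        output.extend(sim.step(symbol))
    return output, n, restarts
```

On a five-symbol input the driver restarts twice (at the third and fifth symbols) and ends with n = 8, while each output symbol is emitted exactly once.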
Let the actual simulation (without the incremental method) run in time
t'(n), where $t'(n) \ge n$. It can be shown by induction that the simulation
with the incremental method runs in time at most $k' t'(n)$, for some constant
$k' > 0$.
By evaluating the cost of the simulation on a log-cost RAM, we derive 
the following result.
Theorem 3.1 A tree machine running in time t(n) can be simulated on-line
by a log-cost RAM running in time $O((t(n) \log t(n))/\log\log t(n))$.
Proof. Because the blocks have height $2h + 1$ and overlap by height $h + 1$,
each time the worktape head moves out of a block, it is exactly in the middle
of another block; i.e., it will take at least $h' = h + 1$ steps before it exits
this new block. Since the tree machine computation has at most t steps, the
work of updating main memory from cache (packing), loading a new block
to cache (unpacking), and directly simulating h' steps is performed at most
$t/h'$ times.
Updating main memory and loading a new block in cache involve the pack
and unpack operations and a constant number of accesses to main memory.
Registers in main memory have addresses no larger than $(t/h')(2^{h+1} + 3)$.
Thus accesses to main memory take time $O(\log t + h)$.
By Lemma 2.1, the time for the pack and unpack operations is $O(u \log u)$.
By Lemma 2.2, the time to create the tables necessary for these operations
is $O(u 2^u)$. The time to compute tables half and translate is $O(u)$.
Simulating one step of the tree machine consists of a constant number of
accesses to cache, taking time $O(\log u)$. Thus simulating h' steps takes time
$O(h' \log u)$.
The total time required for R, then, is
$(t/h')(O(\log t + h) + O(u \log u) + O(h' \log u)) + O(u 2^u)$.
Since $h = \Theta(\log u)$, the total time is
$O(((t \log t)/\log u) + tu + t \log u + u 2^u)$.
Choose h so that $u = (\log t)/\log\log t$. Then the total time for the simulation
is $O((t \log t)/\log\log t)$. □
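The last step can be spelled out term by term. The following is a worked check rather than part of the original proof; it assumes $h' = h + 1 = \Theta(\log u)$, which follows from $u = 2^{2h+2} - 1$.

```latex
% Each of the four terms is O((t log t) / log log t) for the chosen u.
\begin{align*}
u = \frac{\log t}{\log\log t}
  &\;\Longrightarrow\; \log u = \log\log t - \log\log\log t = \Theta(\log\log t), \\
\frac{t \log t}{\log u} &= \Theta\!\left(\frac{t \log t}{\log\log t}\right), \\
t u &= \frac{t \log t}{\log\log t}, \\
t \log u &= O(t \log\log t), \\
u\,2^{u} &= \frac{\log t}{\log\log t}\; t^{1/\log\log t} = o(t).
\end{align*}
```

The first two terms dominate and balance each other, which is why u is chosen near $(\log t)/\log\log t$: a larger u makes the packing cost tu grow, while a smaller u makes block switches, each costing $O(\log t)$ for main-memory addresses, too frequent.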
For unit-cost RAMs, we have a much stronger result:
Theorem 3.2 A tree machine can be simulated by a unit-cost RAM in real
time.
Proof sketch. We design a unit-cost RAM R to simulate tree machine T
with worktape W. R has a contents memory, a parent memory, and several
working registers. Let contents(x) (respectively, parent(x)) be the register
with address x in the contents (respectively, parent) memory. Contents(x)
contains the contents of cell(x) at location x in the worktape of
T. If cell(x) is visited by T, then parent(x) contains the worktape location
of the parent of cell(x). The working registers are used as temporary storage
and to keep track of which cell is currently accessed by T.
R simulates one step of T with a constant number of accesses to the two
memories and the working registers. For example, if the head moves from
cell(x) to a child of cell(x), then R computes location 2x of the left child or
2x + 1 of the right child with one or two additions and stores x in parent(2x)
or parent(2x + 1). Thus simulating t steps of T takes $O(t)$ time on R. □
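The proof sketch translates almost directly into code. In the illustration below, dictionaries stand in for the contents and parent memories, `steps` counts simulated steps (each of which performs a constant number of memory accesses), and the class name and move labels are assumptions.

```python
class UnitCostTreeSim:
    """Sketch of the real-time simulation of a one-worktape tree machine."""

    def __init__(self):
        self.contents = {}   # location -> bit; unvisited cells hold 0
        self.parent = {}     # location -> location of its parent, set on first visit
        self.head = 1        # the head starts on the root, cell(1)
        self.steps = 0

    def step(self, write_bit: int, move: str) -> int:
        """Read the scanned cell, write write_bit, move the head; return the bit read.

        move is one of "stay", "left", "right", "parent" (hypothetical labels).
        """
        x = self.head
        read = self.contents.get(x, 0)
        self.contents[x] = write_bit
        if move in ("left", "right"):
            child = x + x                    # location 2x by one addition
            if move == "right":
                child = child + 1            # location 2x + 1 by a second addition
            self.parent[child] = x           # remember how to climb back
            self.head = child
        elif move == "parent" and x != 1:
            self.head = self.parent[x]       # no division needed
        self.steps += 1
        return read
```

The parent memory is what removes the need for the half table here: a cell can only be reached from its parent, so parent(x) has always been written before it is read.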
An immediate consequence of Loui’s upper bound on the simulation of a 
tree machine by a multidimensional TM is the following:
Theorem 3.3 (Loui, 1983) A log-cost RAM running in time t(n) can be
simulated on-line by a multihead d-dimensional Turing machine running in
time $O(t(n)^{1+1/d}/\log t(n))$.
Using our simulation of a tree machine by a log-cost RAM, we obtain a 
nonlinear lower bound for simulating a RAM by a multidimensional Turing 
machine:
Corollary 3.4 There is a log-cost RAM R running in time t(n) such that
for any multihead d-dimensional Turing machine S, S requires time
$\Omega((t(n)^{1+1/d} (\log\log t(n))^{1+1/d})/(\log t(n))^{2+1/d})$ to simulate R on-line.
Proof. Let T be the tree machine described in the lower bound proof
of Loui (1983). Let R be the RAM that uses the method in the proof
of Theorem 3.1 to simulate tree machine T. T runs in real time, so by
Theorem 3.1, R runs in time $t(n) = O((n \log n)/\log\log n)$. Now assume
there is a d-dimensional Turing machine that simulates R on-line in time
$o((t^{1+1/d} (\log\log t)^{1+1/d})/(\log t)^{2+1/d})$. We thus have an on-line simulation of
tree machine T running in time n by a d-dimensional Turing machine running
in time $o(n^{1+1/d}/\log n)$. But we know from Loui (1983) that the lower bound
on such a simulation is $\Omega(n^{1+1/d}/\log n)$; hence we have a contradiction. □
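The substitution behind the step from the bound in t to the bound in n can be written out as follows. This is a worked check, not part of the original proof; it evaluates the expression at the upper bound $t = \Theta((n \log n)/\log\log n)$.

```latex
\begin{align*}
t = \Theta\!\left(\frac{n \log n}{\log\log n}\right)
  &\;\Longrightarrow\; \log t = \Theta(\log n), \quad \log\log t = \Theta(\log\log n), \\
\frac{t^{1+1/d}\,(\log\log t)^{1+1/d}}{(\log t)^{2+1/d}}
  &= \Theta\!\left( \frac{n^{1+1/d} (\log n)^{1+1/d}}{(\log\log n)^{1+1/d}}
      \cdot \frac{(\log\log n)^{1+1/d}}{(\log n)^{2+1/d}} \right)
   = \Theta\!\left( \frac{n^{1+1/d}}{\log n} \right).
\end{align*}
```

The $\log\log$ factors cancel exactly and the powers of $\log n$ leave a single $\log n$ in the denominator, which matches the lower bound of Loui (1983) used for the contradiction.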
3.2 Lower Bound
We now show that the time bound of Theorem 3.1 is optimal within a constant
factor. We begin with an overview of Kolmogorov complexity, which
we use to prove the lower bound.
For strings $\sigma, \tau$ in $\{0, 1\}^*$, let $K(\sigma)$ be the Kolmogorov complexity of $\sigma$
with respect to a universal Turing machine U. Define $K(\sigma)$ to be the length of
$\beta$, where $\beta$ is the shortest binary string such that $U(\beta)$ equals $\sigma$. Informally,
$K(\sigma)$ is the length of the shortest binary description of $\sigma$.
We say a string $\sigma$ is incompressible if $K(\sigma) \ge |\sigma|$. Note that for all n
there are $2^n$ binary strings of length n, but there are only $2^n - 1$ strings of
length less than n. Thus for all n, there is at least one incompressible string
of length n.
A useful concept in Kolmogorov complexity is the self-delimiting string.
For natural number n, let bin(n) be the binary representation of n without
leading 0’s. For binary string w, let $\bar{w}$ be the string resulting from placing a
0 between each pair of adjacent bits in w and adding a 1 to the end. Thus
$\overline{110} = 101001$. We call the string $\overline{\mathrm{bin}(|w|)}\,w$ the self-delimiting version of
w. The self-delimiting version of w has length $|w| + 2\lceil \log(|w| + 1) \rceil$. When
we concatenate several binary string segments of differing lengths, we can
use self-delimiting versions of the strings so that we can determine where
one string ends and the next string begins with little additional cost in the
length of the concatenated string. Note that in such a concatenation it is
not necessary to use a self-delimiting version of the last string segment.
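The encoding is easy to exercise on small strings. The sketch below represents binary strings as Python `str` values and assumes nonempty w; the function names are introduced here for illustration.

```python
def bar(w: str) -> str:
    """Place a 0 between adjacent bits of w and append a 1."""
    return "0".join(w) + "1"


def self_delimiting(w: str) -> str:
    """bar(bin(|w|)) followed by w itself (w is assumed nonempty)."""
    return bar(format(len(w), "b")) + w


def split_first(s: str):
    """Split s = self_delimiting(w) + rest into the pair (w, rest)."""
    length_bits = []
    i = 0
    while True:
        length_bits.append(s[i])     # a bit of bin(|w|)
        marker = s[i + 1]            # 0: more length bits follow, 1: last one
        i += 2
        if marker == "1":
            break
    n = int("".join(length_bits), 2)
    return s[i:i + n], s[i + n:]
```

A decoder reading a concatenation can therefore peel off one segment at a time, and only the last segment may be left in plain form.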
Kolmogorov complexity has recently gained popularity as a method for
proving lower bounds. Li and Vitanyi (1988) provide a thorough summary
of lower bound (and other complexity-related) results obtained using Kolmogorov
complexity.
Theorem 3.5 There is a tree machine T running in time n such that for any
log-cost RAM R, R requires time $t(n) = \Omega((n \log n)/\log\log n)$ to simulate T
on-line.
Proof. For simplicity, we omit floors and ceilings in this proof.
Tree machine T has one tree worktape and operates in real time. T’s
input alphabet is a set of commands of the form $(e, \varphi)$, where $e \in \{0, 1, ?\}$
and $\varphi$ indicates whether the worktape head moves to a child or parent of the
current cell or remains at the current cell. Suppose T is in a configuration
in which the cell x at which the worktape head is located contains e'. On
input $(e, \varphi)$, machine T writes e' on its output tape, and the worktape head
writes e on cell x if $e \in \{0, 1\}$, but it writes e' (the current contents of x) on
x if $e = {?}$. At the end of the step the worktape head moves according to $\varphi$.
For every n that is a sufficiently large power of 2, we construct a series of n
tree commands for which R requires time $\Omega((n \log n)/\log\log n)$. As in (Loui,
1983), the string of tree commands is divided into a filling part of length $n/2$
and a query part of length $n/2$.
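The behaviour of T on a command string can be sketched directly. The encoding of commands as pairs of strings and the move labels are assumptions made for this illustration.

```python
def run_commands(commands):
    """Run tree machine T on a list of commands (e, move) and return its output.

    e is "0", "1", or "?"; move is "left", "right", "parent", or "stay"
    (hypothetical labels for the head movement).
    """
    tape = {}      # location -> bit; all cells initially contain 0
    head = 1       # the head starts at the root
    output = []
    for e, move in commands:
        old = tape.get(head, 0)
        output.append(old)            # T always outputs the current contents
        if e != "?":
            tape[head] = int(e)       # "?" leaves the cell unchanged
        if move == "left":
            head = 2 * head
        elif move == "right":
            head = 2 * head + 1
        elif move == "parent" and head > 1:
            head //= 2
    return output
```

Commands with e in {0, 1} are what the filling part uses to load the tree, and commands with e = ? are what the query part uses to read it back without disturbing it.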
Let W be the worktape of T, and let $x_0$ be the root of W. Let
$d = \log(n/8)$. Denote the complete subtree of W of height d whose root is $x_0$ by
$W_d$. Let $N = n/8$. We consider the complexity of the simulation in terms of
N.
We fill $W_d$ with an incompressible string $\tau$ of length $2N - 1$ such that $\tau$
can be retrieved by a depth-first traversal of $W_d$. This is the filling part, for
which T takes time $4N$ ($= n/2$).
The query part consists of a series of questions. A question is a string
of $2 \log N$ tree commands that causes the worktape head to move from the
root $x_0$ of the tree worktape to a cell at depth d and back to $x_0$ without
changing the contents of the worktape. As the head visits each cell during
a question, T outputs the contents of that cell. T processes $N/(4 \log N)$
questions $Q_1, Q_2, \ldots$ during the query part. We show that after each question
$Q_j$, there is a question $Q_{j+1}$ such that R takes time $\Omega((\log^2 N)/\log\log N)$
to process $Q_{j+1}$, and Theorem 3.5 follows.
Assume that R has just processed question $Q_j$. Let $P(N)$ be the maximum
time necessary to process any possible next question. We show that
some next question takes time $\Omega((\log^2 N)/\log P)$. Consequently, by definition,
$P = \Omega((\log^2 N)/\log P)$; thus $P = \Omega((\log^2 N)/\log\log N)$.
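The passage from the implicit bound on P to the explicit one can be justified by a short case analysis. This is a worked check, not part of the original text; it assumes $P \ge 2$ and $N \ge 4$, and c denotes the constant hidden in the $\Omega$.

```latex
\begin{align*}
&\text{Suppose } P \ge c\,\frac{\log^2 N}{\log P} \text{ for some constant } c > 0. \\
&\text{Case 1: } P \ge \log^2 N. \text{ Then } P \ge \frac{\log^2 N}{\log\log N}
  \text{ because } \log\log N \ge 1. \\
&\text{Case 2: } P < \log^2 N. \text{ Then } \log P < 2\log\log N, \text{ so }
  P \ge c\,\frac{\log^2 N}{\log P} > \frac{c}{2}\cdot\frac{\log^2 N}{\log\log N}.
\end{align*}
```

In both cases $P = \Omega((\log^2 N)/\log\log N)$.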
We first determine the total time t required for R to process all possible 
next questions.
Divide worktape W into $S = (\log N)/(2 \log P)$ sections, each of height
$2 \log P$. For $s = 0, 1, \ldots, S - 1$, there are $P^{2s+2}$ exit points
(bottom cells) in section $s$. We refer to any initial segment of a question
as a partial question and the portion of the question that is processed
while the worktape head is in one section as a subquestion (see Figure 3).
To compute $t$, we compute for $s = 0, 1, \ldots, S - 1$ the total time $t_s$
required for R to process all possible subquestions in section $s$. Since the
depth of $W_d$ is $\log N$, there are $N$ possible next questions. Each of the
$P^{2s+2}$ bottom cells of section $s$ is visited during $N/P^{2s+2}$ of these
questions.

Figure 3: Processing section s of worktape W
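The counts in this decomposition are easy to tabulate. The following Python sketch assumes, for simplicity, that N and P are powers of two and that 2 log P divides log N; the function name and the sample values log N = 24, log P = 3 are illustrative only.

```python
def section_profile(log_N, log_P):
    """Return (S, rows); rows[s] = (exit points of section s, questions per exit point).

    Assumes N and P are powers of two and that 2*log_P divides log_N.
    """
    assert log_N % (2 * log_P) == 0
    S = log_N // (2 * log_P)
    N, P = 2 ** log_N, 2 ** log_P
    rows = []
    for s in range(S):
        exits = P ** (2 * s + 2)        # cells at depth (s + 1) * 2 * log_P
        rows.append((exits, N // exits))
    return S, rows


S, rows = section_profile(log_N=24, log_P=3)
```

In every section the product of the two counts is N, since each of the N questions leaves the section through exactly one bottom cell.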
Let $\sigma_s$ be the string defined by the contents of the bottom cells of
section $s$, from left to right; clearly, $|\sigma_s| = P^{2s+2}$.
Lemma 3.6 The string $\sigma_s$ is incompressible up to a term of
$O(s \log P)$; i.e., $K(\sigma_s) \ge |\sigma_s| - O(s \log P)$.
Proof. The incompressible string $r$, which gives the contents of W, can
be specified by a string composed of the following segments:
1. a self-delimiting string encoding this discussion ($O(1)$ bits)
2. a self-delimiting version of a binary string of length $K(\sigma_s)$ that
specifies $\sigma_s$ ($K(\sigma_s) + O(s \log P)$ bits)
3. self-delimiting versions of the values of $s$ and $P$
($O(\log s) + O(\log P)$ bits)
4. a string specifying the bits in $r$ but not in $\sigma_s$
($2N - 1 - P^{2s+2}$ bits).
Thus $K(r) \le K(\sigma_s) + (2N - 1 - P^{2s+2}) + O(s \log P)$. But
$K(r) \ge 2N - 1$; therefore, $K(\sigma_s) \ge P^{2s+2} - O(s \log P)$.
□ Lemma 3.6
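Lemma 3.7 If $\ell \ge 1$ then $\sum_{i=1}^{\ell} \log i \ge (1/2)\,\ell \log \ell$.
Proof. For all $i$ such that $1 \le i \le \ell$, evidently
$(i - 1)(\ell - i) \ge 0$; hence $i(\ell - i + 1) \ge \ell$. Consequently
\[
\begin{aligned}
\sum_{i=1}^{\ell} \log i
  &= (1/2) \sum_{i=1}^{\ell} \bigl(\log i + \log(\ell - i + 1)\bigr) \\
  &= (1/2) \sum_{i=1}^{\ell} \log\bigl(i(\ell - i + 1)\bigr) \\
  &\ge (1/2) \sum_{i=1}^{\ell} \log \ell \\
  &= (1/2)\,\ell \log \ell.
\end{aligned}
\]
□ Lemma 3.7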
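Lemma 3.7 and the pairing step in its proof can be checked numerically for small values. The Python sketch below is only a sanity check, not a proof; the function names and the range of values tested are arbitrary choices.

```python
import math

def lemma_3_7_holds(l):
    """Check sum_{i=1}^{l} log i >= (1/2) l log l (logs base 2)."""
    lhs = sum(math.log2(i) for i in range(1, l + 1))
    rhs = 0.5 * l * math.log2(l)
    return lhs >= rhs - 1e-9            # small tolerance for rounding


def pairing_holds(l):
    """Check the key step i * (l - i + 1) >= l for every 1 <= i <= l."""
    return all(i * (l - i + 1) >= l for i in range(1, l + 1))


checked = all(lemma_3_7_holds(l) and pairing_holds(l) for l in range(1, 501))
```

The tolerance matters because the bound is tight for small values: at l = 1 and l = 2 both sides are equal.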
Lemma 3.8 For $s = 1, 2, \ldots, S - 1$, the maximum number of registers
accessed during the processing of all partial questions through section
$s - 1$ is $4P^{2s+1}/\log P$.
Proof. Let $C = 4P/\log P$. By Lemma 3.7, for $P$ sufficiently large,
$\sum_{i=1}^{C} \log i > P$. The processing of each partial question through
section $s - 1$ could involve no more than $C$ registers; otherwise, because
of the total cost of addresses of registers, R would exceed time $P$ for
some next question. There are $P^{2s}$ different partial questions possible
through section $s - 1$, so there are no more than
$C \cdot P^{2s} = 4P^{2s+1}/\log P$ registers accessed for all possible
partial questions. □ Lemma 3.8
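The inequality drawn from Lemma 3.7 in the proof above can be verified directly. The following worked step assumes base-2 logarithms and treats $C = 4P/\log P$ as an integer, ignoring rounding.

```latex
\begin{align*}
\sum_{i=1}^{C} \log i
  &\ge \tfrac12\, C \log C && \text{(Lemma 3.7)}\\
  &= \frac{2P}{\log P}\,\bigl(\log P + 2 - \log\log P\bigr)
     && \text{(since } \log C = 2 + \log P - \log\log P\text{)}\\
  &= 2P\left(1 - \frac{\log\log P - 2}{\log P}\right) \;>\; P,
\end{align*}
```

The final inequality holds as soon as $(\log\log P - 2)/\log P < 1/2$, which is true for all sufficiently large $P$.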
Let us consider a particular section $s$. Let $r_1, r_2, \ldots, r_m$ be the
registers, in order of increasing address, used to process tree commands in
section $s$. The address of $r_i$ is at least $i$. For $1 \le i \le m$, let
$X_i$ be the set of bottom cells $x$ of section $s$ such that $r_i$ is
accessed while the worktape head is visiting some cell $y$ in section $s$,
and either $y$ is an ancestor of $x$ or $y = x$ (see Figure 3).
We say that $r_i$ operates on the bottom cells in $X_i$.
To compute a lower bound on $t_s$, we assess the contribution to $t_s$ of
accessing register $r_i$. For $1 \le i \le m$, the total access time for
register $r_i$ in section $s$ is at least the product of $\log i$ (since the
address of $r_i$ is at least $i$), $|X_i|$ (the number of bottom cells that
$r_i$ operates on), and $N/P^{2s+2}$ (the number of questions during which
one of these bottom cells is visited). Totalling the time incurred by access
to each register yields:
\[
t_s \ge \sum_{i=1}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}). \tag{1}
\]
Using Lemma 3.10 below, we can determine a lower bound for $t_s$, but we
first introduce the following technical lemma.
Lemma 3.9 (Loui, 1984a [Section 4]) Let $J$ and $M$ be integers such that
$M > J$. A sorted $J$-member subset of $\{0, \ldots, M\}$ can be represented
with no more than $2J \log(M/J) + 4J + 2$ bits.
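To give a feel for why a bound of this shape is attainable, the following Python sketch encodes a sorted subset by its successive gaps, each written in the Elias gamma code. This is a standard technique substituted here for illustration; it is not claimed to be the encoding of Loui (1984a), and unlike that encoding it relies on the decoder knowing where the bit string ends. All function names are hypothetical.

```python
import math
import random

def gamma(g):
    """Elias gamma code of an integer g >= 1: 2*floor(log2 g) + 1 bits."""
    b = bin(g)[2:]
    return "0" * (len(b) - 1) + b


def encode_subset(sorted_members):
    """Encode a sorted subset of {0, ..., M} as gamma-coded gaps."""
    bits, prev = [], -1
    for a in sorted_members:
        bits.append(gamma(a - prev))    # every gap is at least 1
        prev = a
    return "".join(bits)


def decode_subset(bits):
    """Invert encode_subset; each gap code is self-delimiting."""
    members, prev, pos = [], -1, 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":         # count the unary length prefix
            zeros += 1
            pos += 1
        g = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        prev += g
        members.append(prev)
    return members


def lemma_3_9_bound(J, M):
    """The bit bound stated in Lemma 3.9."""
    return 2 * J * math.log2(M / J) + 4 * J + 2


random.seed(0)
M, J = 10_000, 400
subset = sorted(random.sample(range(M + 1), J))
code = encode_subset(subset)
```

The gaps sum to at most M + 1, so by concavity of the logarithm this code uses at most 2J log((M + 1)/J) + J bits, which is within the bound of Lemma 3.9 whenever 1 <= J <= M.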
Let $h = (1/7)P^{2s+1}$.
Lemma 3.10 $\sum_{i=h}^{m} |X_i| \ge (1/23)P^{2s+2}$.
Proof. Assume that the conclusion is false. Then $r_1, \ldots, r_{h-1}$
operate on at least $(22/23)P^{2s+2}$ bottom cells in section $s$. We can
specify the string $\sigma_s$ as follows: we obtain the bits of
$X_h, \ldots, X_m$ explicitly. We obtain the other bits of $\sigma_s$ by
simulating R on each partial question to a bottom cell of section $s$ not in
$\bigcup_{k=h}^{m} X_k$. On each such partial question, R uses only registers
$r_1, \ldots, r_{h-1}$ and registers accessed in sections $1, \ldots, s - 1$.
Thus $\sigma_s$ can be specified with a string composed of the following
segments:
1. a self-delimiting string encoding the program of R and this discussion
($O(1)$ bits)
2. self-delimiting versions of the addresses and initial contents of registers
accessed in sections $1, \ldots, s - 1$ (at most
$8P^{2s+2}/\log P + O(s \log P)$ bits; by Lemma 3.8, at most
$4P^{2s+1}/\log P$ registers are required, and for each register, the
contents and the address could each require $P$ bits)
3. self-delimiting versions of the addresses and initial contents of
$r_1, \ldots, r_{h-1}$ ($(2/7)P^{2s+2} + O(s \log P)$ bits)
4. a string specifying positions of cells in $X_k$ for $k \ge h$ (we use
Lemma 3.9 with $J = (1/23)P^{2s+2}$ and $M = P^{2s+2}$; this requires at most
$(14/23)P^{2s+2}$ bits. The encoding used to achieve Lemma 3.9 is such that
the beginning and end of this string can easily be determined.)
5. a string specifying the contents of cells in $X_k$ for $k \ge h$ (at most
$(1/23)P^{2s+2}$ bits).
This means that the number of bits needed to specify $\sigma_s$ is at most
$(151/161)P^{2s+2} + O(P^{2s+2}/\log P) < P^{2s+2} - O(s \log P)$ for
sufficiently large $P$. Thus we have a contradiction of Lemma 3.6.
□ Lemma 3.10
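The constants in this count can be checked with exact arithmetic. The Python sketch below adds the coefficients of $P^{2s+2}$ from segments 3, 4 and 5 and confirms that Lemma 3.9 with J = M/23 stays under the (14/23) coefficient; the variable names are illustrative only.

```python
from fractions import Fraction
import math

# Coefficients of P^(2s+2) contributed by segments 3, 4 and 5.
seg3 = Fraction(2, 7)     # addresses and contents of r_1, ..., r_{h-1}
seg4 = Fraction(14, 23)   # positions of cells, via Lemma 3.9
seg5 = Fraction(1, 23)    # contents of those cells
total = seg3 + seg4 + seg5

# Lemma 3.9 with J = M/23: 2J log(M/J) + 4J = ((2 log2(23) + 4) / 23) * M,
# ignoring the additive 2 bits.
lemma_3_9_coeff = (2 * math.log2(23) + 4) / 23
```

Because the total coefficient 151/161 is strictly less than 1, the lower-order terms from segments 1 and 2 cannot close the gap for large P, which is what produces the contradiction.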
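Thus we have:
\[
\begin{aligned}
t_s &\ge \sum_{i=1}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}) && \text{(Inequality 1)} \\
    &\ge \sum_{i=h}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}) \\
    &\ge (N/P^{2s+2})(\log h) \sum_{i=h}^{m} |X_i| \\
    &\ge (N/P^{2s+2})(\log h)(1/23)P^{2s+2} && \text{(Lemma 3.10)} \\
    &\ge (1/23)\,N\,\bigl((2s+1) \log P - \log 7\bigr) && \text{(definition of $h$)} \\
    &\ge (1/23)\,N s \log P.
\end{aligned}
\]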
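The last two steps of this chain rest on the value of $\log h$. A short worked version, assuming only $h = (1/7)P^{2s+1}$ and $P \ge 7$:

```latex
\begin{align*}
\log h &= \log\bigl(\tfrac17 P^{2s+1}\bigr) = (2s+1)\log P - \log 7,\\
(2s+1)\log P - \log 7 - s\log P &= (s+1)\log P - \log 7
  \;\ge\; \log P - \log 7 \;\ge\; 0 .
\end{align*}
```

Hence $(2s+1)\log P - \log 7 \ge s \log P$ for every $s \ge 0$ once $P \ge 7$.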
Now sum $t_s$ over all $s$ to compute a lower bound for $t$, the total time
required for R to process all possible next questions:
\[
\begin{aligned}
t &= \sum_{s=0}^{S-1} t_s \\
  &\ge \sum_{s=0}^{S-1} (1/23)\,N s \log P \\
  &\ge (1/23)\,N (\log P)\bigl((\log^2 N)/(4 \log^2 P) - O((\log N)/\log P)\bigr) \\
  &\ge (1/92)\bigl((N \log^2 N)/\log P - O(\log N)\bigr).
\end{aligned}
\]
Since there are $N$ questions, we divide $t$ by $N$ to derive the average
time needed by R to process the next question,
$\Omega((\log^2 N)/\log P)$. Some next question must require time greater
than or equal to this average time. Since $P$ is the maximum time for some
next question, $P \ge \Omega((\log^2 N)/\log P)$; hence,
$P = \Omega((\log^2 N)/\log\log N)$.
Thus for each question $Q_j$, we can choose a next question $Q_{j+1}$ that
takes time $\Omega((\log^2 N)/\log\log N)$. Since the query part has
$N/(2 \log N)$ questions, our choice of questions means that the query part
takes time
$t = (N/(2 \log N))\,\Omega((\log^2 N)/\log\log N) = \Omega((N \log N)/\log\log N)$.
The entire simulation takes at least time $t$. Since $N = n/8$, the lower
bound holds for $n$ as well. □ Theorem 3.5
Because the lower bound proof considers only the time involved in accessing
registers, the lower bound holds for RAMs with more powerful instructions,
such as boolean operations or multiplication.
4 Simulation of a Multidimensional Turing 
Machine
By composing our simulation in subsection 3.1 of a tree machine by a
log-cost RAM with Reischuk's (1982) simulation of a d-dimensional Turing
machine by a tree machine, we obtain an on-line simulation of a d-dimensional
Turing machine of time complexity $t$ by a log-cost RAM running in time
$O((5^{d \log^* t}\, t \log t)/\log\log t)$. But we can improve this upper
bound with a direct simulation.
Theorem 4.1 A d-dimensional Turing machine running in time $t(n)$ can be
simulated on-line by a log-cost RAM running in time
$O(t(n)(\log t(n))^{1-1/d}(\log\log t(n))^{1/d})$.
Proof sketch. We design a log-cost RAM R that simulates d-dimensional 
Turing machine M . For simplicity, assume M  has one worktape; our results 
generalize to d-dimensional Turing machines with more than one worktape. 
Let $s = ((\log t)/\log\log t)^{1/d}$. Partition the worktape of M into
d-dimensional cubes (call them boxes) with side length $s$. Let corner(i) be
the cell in box i with the coordinates whose components are the smallest.
For box i, if corner(i) $= (i_1, i_2, \ldots, i_d)$, let
index(i) $= i_d t^{d-1} + i_{d-1} t^{d-2} + \cdots + i_1$. R stores the
contents of box i in the register in main memory with address index(i).
Step-by-step simulation is carried out in the cache. R conducts the
simulation in $t/s$ phases, each of $s$ steps of M. For each phase: R
unpacks the contents of $3^d$ boxes that are within distance $s$ of the
worktape head (the head remains within these boxes during the phase); R
simulates M for $s$ steps; and R packs the contents of the cache back to main
memory. Using precomputed values of $t, t^2, \ldots, t^{d-1}$, R quickly
computes index(i') from index(i) when box i' is adjacent to box i. For each
phase, R takes time $O(\log t)$ to access main memory, $O(\log t)$ to compute
the addresses of registers in main memory representing the new blocks needed
in cache, $O(s \log s)$ to simulate $s$ steps in the cache, and
$O(s^d \log s)$ to pack and unpack the appropriate registers (Lemma 2.1).
Thus the total time for the simulation is:
\[
\begin{aligned}
(t/s)\bigl(O(\log t) + O(s \log s) + O(s^d \log s)\bigr)
  &= O\bigl((t \log t)/s + t s^{d-1} \log s\bigr) \\
  &= O\bigl(t (\log t)^{1-1/d} (\log\log t)^{1/d}\bigr).
\end{aligned}
\]
□
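The addressing scheme in the proof sketch can be illustrated in Python. The sketch below assumes nonnegative corner coordinates and small sample values for t, d and s; the function names are hypothetical, and the packing of box contents into a register is not modeled.

```python
def index(corner, t):
    """index(i) = i_d * t**(d-1) + ... + i_2 * t + i_1 for corner (i_1, ..., i_d)."""
    return sum(c * t ** k for k, c in enumerate(corner))


def neighbor_indices(idx, s, powers):
    """Indices of the 3**d boxes whose corners differ by -s, 0 or +s per axis.

    powers = [1, t, ..., t**(d-1)] is precomputed once, so each neighbor
    index is derived from idx without any further exponentiation.
    """
    result = [idx]
    for p in powers:
        result = [r + delta * s * p for r in result for delta in (-1, 0, 1)]
    return result


t, d, s = 1000, 2, 4                    # illustrative values only
powers = [t ** k for k in range(d)]
home = (8, 12)                          # corner of the box containing the head
around = neighbor_indices(index(home, t), s, powers)
```

Moving one box along axis k changes the index by exactly s times the k-th precomputed power, which is why the addresses of adjacent boxes are cheap to derive from index(i).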
Once again, the result for unit-cost RAMs is much stronger:
Theorem 4.2 A multidimensional Turing machine can be simulated by a
unit-cost RAM in real-time.
Proof. Schönhage (1980) showed that a unit-cost successor RAM can
simulate a multidimensional Turing machine in real-time. It follows that a
unit-cost RAM with addition and subtraction can simulate a multidimensional
Turing machine in real-time as well. □
5 Conclusions
Because the log-cost RAM is considered a “standard” among models of
computation, it is important to determine its relationships to other models.
Here we have shown an optimal on-line relationship between log-cost RAMs and
tree machines. We have constructed an analogous efficient simulation of
multidimensional Turing machines by log-cost RAMs. We hope that this work
will lead to further study of relationships between other models of
computation.
Some further areas of research include:
1. finding an off-line simulation that is faster than our on-line simulation 
of a tree machine by a log-cost RAM.
2. finding an optimal simulation of a pointer machine (Schönhage, 1980) 
by a log-cost RAM.
3. finding an optimal simulation of a unit-cost RAM by a log-cost RAM.
References
[Aggarwal et al., 1987] Alok Aggarwal, Bowen Alpern, Ashok K. Chandra, 
and Marc Snir. A model for hierarchical memory. In Proc. 19th Ann. 
ACM Symp. on Theory of Computing, pages 305-314, 1987.
[Aho et al., 1974] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. 
The Design and Analysis of Computer Algorithms. Addison-Wesley 
Publishing Company, 1974.
[Cook and Reckhow, 1973] Stephen A. Cook and Robert A. Reckhow. Time 
bounded random access machines. J. Comput. System Sci., 7:354-375, 
1973.
[Galil, 1976] Zvi Galil. Two fast simulations which imply some fast string 
matching and palindrome-recognition algorithms. Information Processing 
Letters, 4(4):85-87, 1976.
[Katajainen et al., 1988] Jyrki Katajainen, Jan van Leeuwen, and Martti 
Penttonen. Fast simulation of Turing machines by random access machines. 
SIAM J. Comput., 17:77-88, February 1988.
[Li and Vitanyi, 1988] Ming Li and Paul M. B. Vitanyi. Two decades of 
applied Kolmogorov complexity. 1988. To appear in Handbook of Theoretical 
Computer Science (J. van Leeuwen, Managing Editor), North-Holland. 
Preliminary version in Proc. 3rd IEEE Structure in Complexity 
Theory Conf., pages 80-101, 1988.
[Loui, 1983] Michael C. Loui. Optimal dynamic embedding of trees into 
arrays. SIAM J. Comput., 12:463-472, August 1983.
[Loui, 1984a] Michael C. Loui. The complexity of sorting on distributed 
systems. Information and Control, 60:70-85, 1984.
[Loui, 1984b] Michael C. Loui. Minimizing access pointers into trees and 
arrays. J. Comput. System Sci., 28(3):359-378, 1984.
[Paul and Reischuk, 1981] Wolfgang Paul and Rüdiger Reischuk. On time 
versus space II. J. Comput. System Sci., 22:312-327, 1981.
[Reischuk, 1982] K. Rüdiger Reischuk. A fast implementation of a 
multidimensional storage into a tree storage. Theoret. Comput. Sci., 
19:253-266, 1982.
[Schönhage, 1980] Arnold Schönhage. Storage modification machines. SIAM 
J. Comput., 9(3):490-508, August 1980.
[Slot and van Emde Boas, 1988] Cees Slot and Peter van Emde Boas. The 
problem of space invariance for sequential machines. Inform. and Comput., 
77:93-122, 1988.
