Data Independence of Read, Write, and Control Structures in PRAM Computations  by Lange, Klaus-Jörn & Niedermeier, Rolf
Journal of Computer and System Sciences 60, 109–144 (2000)
Data Independence of Read, Write, and Control
Structures in PRAM Computations¹
Klaus-Jörn Lange and Rolf Niedermeier²
Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Sand 13,
D-72076 Tübingen, Federal Republic of Germany
E-mail: lange@informatik.uni-tuebingen.de, niedermr@informatik.uni-tuebingen.de
Received May 13, 1994; revised May 12, 1998
We introduce the notions of control and communication structures in
PRAM computations and relate them to the concept of data independence.
Our main result is to characterize differences between unbounded fan-in
parallelism AC^k, bounded fan-in parallelism NC^k, and the sequential classes
DSPACE(log n) and LOGDCFL in terms of a PRAM’s communication
structure and instruction set. Our findings give a concrete indication that in
parallel computations writing is more powerful than reading. Further
characterizations are given for parallel pointer machines and the semiunbounded
fan-in circuit classes SAC^k. In particular, we obtain the first characterizations
of NC^k and DSPACE(log n) in terms of PRAMs. Finally, we introduce
Index-PRAMs, which in some sense have built-in data independence. We
propose Index-PRAMs as a tool for the development of data-independent
parallel algorithms. Index-PRAMs serve for studying the essential differences
between the above-mentioned complexity classes with respect to the underlying
instruction set used. © 2000 Academic Press
doi:10.1006/jcss.1999.1665, available online at http://www.idealibrary.com
Copyright © 2000 by Academic Press
All rights of reproduction in any form reserved.
¹ A preliminary version of parts of this work appeared in the ‘‘Proceedings of the Thirteenth
Conference on Foundations of Software Technology and Theoretical Computer Science’’ (Lecture Notes in
Computer Science, Vol. 761, pp. 104–113, Springer-Verlag, Berlin) held in Bombay, India, December
15–17, 1993 under the title ‘‘Data-independences of parallel random access machines.’’
² Part of this research was done while both authors were at the Fakultät für Informatik, Technische
Universität München. This research was supported by the Deutsche Forschungsgemeinschaft, SFB 342,
Teilprojekt A4 ‘‘KLARA’’ and by the DFG Project La 618/3-1 ‘‘KOMET.’’ The work of the first author
was also supported by the International Computer Science Institute. The work of the second author was
also supported by a Feodor Lynen fellowship of the Alexander von Humboldt-Stiftung, Bonn, and the
Center for Discrete Mathematics, Theoretical Computer Science, and Applications (DIMATIA), Prague.
1. INTRODUCTION
Parallel random access machines (PRAMs) are the favorite model for design and
analysis of parallel algorithms. This favoritism has led to the desire for efficient
general-purpose simulations of PRAMs on real machines, as opposed to special
implementations of certain algorithms on certain machine architectures. The direct,
general approach, using techniques such as hashing and slackness (see [1, 51, 61,
62]), is limited by problems such as hardware cost, (non)scalability, and intercon-
nect length (see [9, 19, 31, 45, 61, 62, 67] for discussion). The main alternative has
been to base algorithm design on restricted forms of PRAMs (see [12, 32]) or on
models that try to build in more ‘‘computational realism’’ [2, 3, 19, 21, 33, 62, 61].
Our approach is to stay with the generality and simplicity of the basic PRAM
models, but without using the classifications offered by parallel complexity theory
in terms of P-completeness and NC-membership. Since this dichotomy apparently
is not appropriate when trying to speed up running times by using rather weak
parallel machines or networks of workstations, we tried to abstract out general
features of algorithms that lead to efficient implementations. The features we
emphasize are data independence of the read, write, and control structures of the
algorithm.
Many of the computation-intensive tasks targeted by the ‘‘grand challenges’’
[44, 57] seem to be simpler than what is needed for a general PRAM simulation.
Their communication and control structures are simple enough that an efficient
implementation on existing architectures, working with distributed memories and
message passing mechanisms, is possible. For example, tasks like FFT, parallel
prefix sums (‘‘scans’’), and matrix operations are data independent on all counts.
For operations on graphs such as pointer jumping (list ranking) this may not be the
case. In general, parallel graph algorithms depend on how graphs are represented.
There are two simple representations of graphs: adjacency matrices and edge lists.
Algorithms working on adjacency matrices usually show data-independent behavior
(e.g., Warshall’s algorithm), whereas algorithms using edge lists (e.g., for the pointer
jumping or list ranking problem) show an inherent dependence of the communication
structure on the underlying data: the addresses of global memory cells used strongly
depend on the list structure. The importance of data-independent reads/writes/control
in distributed memory computing has been pointed out in several papers, e.g.,
[29, 39, 58, 66].
The main contributions of this paper are as follows:
1. We formally introduce the notions of data-independent reads, writes, and
control: Data independence of control means that the statement executed by a pro-
cessor of a PRAM depends only on time, processor identification number (PIN for
short), and length of the input, but not on the input itself. Data independence of
communication structure means that in global read accesses (resp., the receipt of
messages) or write accesses (resp., the sending of messages) the addresses of shared
memory cells depend only on time, PIN, and input length.
2. From the formal notion we obtain surprising results: these restrictions
lead to complexity classes whose original definitions had seemingly nothing to do
with PRAMs: Whereas unbounded fan-in parallelism, represented by the classes
AC^k, is characterized by a data-dependent control or write structure in combination
with a data-independent read structure, bounded fan-in parallelism, represented by
the classes NC^k, is characterized by computations where all three structures have
to be data independent. The remaining case, where we have a data-dependent read
structure but data-independent control and write structures, leads to
characterizations of the sequential classes DSPACE(log n), LOGDCFL, and, as an intermediate
class defined by a parallel device, of Cook’s parallel pointer machines [15, 18]
operating in logarithmic time. Finally, we discuss the power of Akl’s concurrent
write OR-feature for PRAMs [4] and obtain a characterization of LOGCFL and,
more generally, of Venkateswaran’s semiunbounded fan-in circuit classes SAC^k
[64] by monotonic, fully data-independent OR-PRAMs.
3. Formal analysis of different instruction sets for PRAMs and their data-
(in)dependence finally leads to a computation model (so-called Index-PRAM)
intended to fulfill the desire for a more realistic and nevertheless simple PRAM
model: The basic idea is to consider PRAMs where the indexing of global memory
cells is only possible through special local index registers. The value of index
registers only depends on time, PIN, and input length, but not the input data.
Within the framework of Index-PRAMs, it is possible to study the differences
between various complexity classes with respect to the instruction set used by the
underlying Index-PRAM. For example, it will be shown that the fundamental
difference between DSPACE(log n) and parallel pointer machines [18] operating in
logarithmic time is that for the former only a restricted form of conditional
assignments may be used: the condition may depend only on the value of a bit of
the input word and must not depend on a result of a previous computation.
The paper is organized as follows. In the next section, we provide basic defini-
tions and concepts relevant for our work. In the third section, we present simple
PRAMs and the notions of data independence of communication and control. In
the fourth section, we present the characterizations of various complexity classes
within the unified framework of data independence as discussed above. We define
Index-PRAMs in the fifth section and we give characterizations by Index-PRAMs
that parallel those of the fourth section. Finally, we conclude this paper with a sum-
marizing table, a discussion of the main benefits of our work with respect to a more
realistic parallel complexity theory, and some directions for future research.
2. PRELIMINARIES
We assume familiarity with the basic concepts and notations of computational
complexity theory [7, 8, 10, 35, 37, 49, 69]. By DSPACE(log n) we denote the class
of languages accepted by deterministic Turing machines whose working space is
bounded by log n; DTIME(log n) and ATIME(log n) denote the classes of languages
accepted by deterministic and alternating Turing machines, respectively, whose
running time is bounded by log n. Augmenting a DSPACE(log n) Turing machine with an unbounded
auxiliary push-down store yields a so-called auxiliary push-down automaton [14]. We
refer to the class of languages logspace many-one reducible to context-free languages
(deterministic context-free languages) as LOGCFL (LOGDCFL). In the following,
we briefly review some concepts and facts of parallel complexity theory. For more
details, we refer to the literature [16, 27, 36, 38, 50, 53].
2.1. Uniform Circuits
A Boolean circuit C is a finite, acyclic, directed graph. Nodes of in-degree
(out-degree) zero are inputs (outputs). Inner nodes with nonzero in-degree are labeled by
Boolean functions; throughout this paper these are negations, disjunctions, and
conjunctions. We call the inner nodes gates and the edges wires. Given an assignment of
Boolean values to all inputs, each gate evaluates to either true (or 1) or false (or
0), according to the interconnection structure of C. If C has just one output, we use
C to recognize binary languages, defining L(C) to be the set of assignments to the
inputs which let the output evaluate to true. The size of C is the number of its gates,
not counting the inputs. The depth of C is the length of the longest path connecting
an input node with an output node.
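The definitions of evaluation, size, and depth can be made concrete with a small sketch. The dictionary encoding of a circuit, the gate names, and the helper names `evaluate`, `size`, and `depth` are assumptions chosen for illustration; they do not appear in the paper.

```python
from functools import lru_cache

# Hypothetical encoding: gate name -> (type, list of predecessor names).
# Types: "IN" (input), "NOT", "OR", "AND". All names are illustrative.

def evaluate(circuit, output, assignment):
    """Return the Boolean value of gate `output` under `assignment`."""
    @lru_cache(maxsize=None)
    def value(g):
        kind, preds = circuit[g]
        if kind == "IN":
            return assignment[g]
        if kind == "NOT":
            return not value(preds[0])
        vals = [value(p) for p in preds]
        return any(vals) if kind == "OR" else all(vals)
    return value(output)

def size(circuit):
    """Number of gates, not counting the inputs."""
    return sum(1 for kind, _ in circuit.values() if kind != "IN")

def depth(circuit, output):
    """Length of the longest path from an input node to `output`."""
    @lru_cache(maxsize=None)
    def d(g):
        kind, preds = circuit[g]
        return 0 if kind == "IN" else 1 + max(d(p) for p in preds)
    return d(output)

# XOR of two inputs: (x1 OR x2) AND NOT (x1 AND x2)
xor = {
    "x1": ("IN", []),
    "x2": ("IN", []),
    "or": ("OR", ["x1", "x2"]),
    "and": ("AND", ["x1", "x2"]),
    "nand": ("NOT", ["and"]),
    "out": ("AND", ["or", "nand"]),
}
```

With this encoding, L(C) for the toy circuit is the set of assignments on which `evaluate(xor, "out", ...)` returns true, i.e., the two assignments with exactly one input set.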
A circuit family C is a sequence C = (C_n)_{n≥1} of circuits, where C_n has exactly n
inputs. Family C has polynomial size if for some polynomial p, the size of each C_n
is bounded by p(n). Similarly, the depth of C is bounded by O(log^k n) if for some
constant c > 0 the depth of each C_n is less than c log^k n. If for some constant m
(usually m = 2) the in-degree of each gate in each C_n is bounded by m, then C is
of bounded fan-in. If there is no bound on the in-degrees, then C is of unbounded
fan-in.
In order to relate classes of languages defined by circuits with standard com-
plexity classes, it is necessary to consider uniform circuit families by requiring that
the members of a circuit family are sufficiently similar to each other. There are
several uniformity conditions which have fortunately turned out to be equivalent in
most cases [55].
Throughout the paper we use the notion of DTIME(log n)-uniformity for circuits
[11, 16, 55]. A circuit family of bounded fan-in of size z(n) and depth t(n) is called
DTIME(log n)-uniform if there is a deterministic Turing machine recognizing the
extended connection language L_EC in time O(log n). Here, L_EC = {(1^n, g, τ) | gate
g has type τ} ∪ {(1^n, g, p, g′) | g′ is predecessor of g via path p} where 2n < g,
g′ < z(n), τ ∈ {∨, ∧}, and p ∈ {0, 1}*, |p| ≤ log(z(n)). By convention, gates 1 to n
contain the input bits and gates n+1 to 2n their negations. A circuit family of
unbounded fan-in of size z(n) and depth t(n) is called DTIME(log n)-uniform if
there is a deterministic Turing machine recognizing the direct connection language
L_DC in time O(log n). Here, L_DC = {(1^n, g, τ) | gate g has type τ} ∪ {(1^n, g, g′) |
g′ is direct predecessor of g} where 2n < g ≤ z(n), 1 ≤ g′ ≤ z(n), and τ ∈ {∨, ∧}. If
we speak about the direct connection language for circuits of bounded fan-in, this
is defined to be L_DC = {(1^n, g, τ) | gate g has type τ} ∪ {(1^n, g, p, g′) ∈ L_EC |
|p| = 1}. Of main importance here is that the uniformity machine is provided only
with the length of the input word and not the input word itself. Thus, for fixed
input length n, one particular circuit is always constructed.
Clearly, each DTIME(log n)-uniform circuit is also DSPACE(log n)-uniform,
because DTIME(log n) ⊆ DSPACE(log n). The classes NC^k (AC^k, respectively)
denote the families of languages recognizable by DTIME(log n)-uniform,
polynomial size, O(log^k n)-depth bounded circuit families of bounded (unbounded,
respectively) fan-in. Recently, Venkateswaran [64] introduced the classes SAC^k of
languages recognized by ATIME(log n)-uniform, polynomial sized, O(log^k n) depth
bounded circuits of semiunbounded fan-in. That is, only OR-gates may have
[...] are well known [37, 38, 64].
At the end of this section, we rephrase some normal forms concerning uniformity
by Ruzzo and by Damm et al.
Lemma 1. (a) [55] For k ≥ 2 the following uniformity conditions for NC^k-circuit
families C = (C_n) are equivalent:
1. L_DC ∈ DTIME(log n) (U_D-uniformity).
2. L_DC ∈ ATIME(log n).
3. L_EC ∈ DTIME(log n) (U_E-uniformity).
4. L_EC ∈ ATIME(log n) (U_E*-uniformity).
5. The description of C_n is deterministically computable from 1^n in logarithmic
space (U_BC-uniformity).
(b) [20] For k ≥ 1, NC^k is equal to the class of languages recognized by
bounded fan-in circuits of polynomial size and depth O(log^k n) such that L_EC ∈ NC^k.
(c) For k ≥ 2, NC^k is equal to the class of languages recognized by bounded
fan-in circuits of polynomial size and depth O(log^k n) such that L_DC ∈ NC^k.
Proof. (b) Damm et al. [20, Lemma 6] prove this result only for the case
k = 1. An essential part of their construction consists in transforming an NC^1-circuit
into a balanced complete binary tree of alternating layers of ∨ and ∧ gates. In an
NC^k-circuit, this must be done by decomposing it into layers of depth log n and
transforming each layer into a disjoint family of balanced complete binary trees.
After this step, the rest of their construction goes through.
(c) Follows from part (b), since Ruzzo showed for k ≥ 2 that L_DC ∈ NC^k
implies L_EC ∈ NC^k [55, Lemma 2]. ∎
A corresponding result holds for circuits of unbounded and of semiunbounded
fan-in. To save space we only sketch the constructions without filling in all the
details.
Lemma 2. (a) For k1, ACk is equal to the class of languages recognized by
unbounded fan-in circuits of polynomial size and depth O(logk n) such that
LDC # ACk.
(b) For k1, SACk is equal to the class of languages recognized by semiun-
bounded fan-in circuits of polynomial size and depth O(logk n) such that LDC # SACk.
Proof. (a) Let L be recognized by an unbounded fan-in circuit family
C = (C_n)_{n≥1} of polynomial size and depth O(log^k n) such that the direct connection
language L_DC of C is in AC^k. We indicate how to construct a circuit family C′
recognizing L with a direct connection language in DTIME(log n). Since L_DC of C
is in AC^k, we know that there are DTIME(log n)-uniform circuits answering the
questions ‘‘(1^n, i, ∨) ∈ L_DC’’ (is gate i an OR-gate?), ‘‘(1^n, i, ∧) ∈ L_DC’’ (is gate i an
AND-gate?), and ‘‘(1^n, i, j) ∈ L_DC’’ (is gate j a predecessor of gate i?). The circuit
family C′ is now constructed by replacing in each C_n each gate i for i > 2n by the
following circuitry:
(i) := [ (1^n, i, ∨) ∈ L_DC ∧ ⋁_{j=1}^{n} ((j) ∧ (1^n, i, j) ∈ L_DC) ]
     ∨ [ (1^n, i, ∧) ∈ L_DC ∧ ⋀_{j=1}^{n} ((j) ∨ (1^n, i, j) ∉ L_DC) ].
In this way, we end up with a DTIME(log n)-uniform AC^k circuit recognizing L.
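The replacement circuitry of part (a) can be read as a recipe for evaluating a gate using only membership queries to the direct connection language. The sketch below is an illustration under stated assumptions: the function name `replaced_gate_value`, the callables `is_or`, `is_and`, `is_pred` (standing in for the three membership questions), and the explicit `candidates` range of possible predecessors are all hypothetical and not taken from the paper.

```python
def replaced_gate_value(i, value, is_or, is_and, is_pred, candidates):
    """Value of gate i, computed only from membership queries.

    value[j]      -- already computed value of gate j
    is_or(i)      -- stands for the query "(1^n, i, OR) in L_DC"
    is_and(i)     -- stands for the query "(1^n, i, AND) in L_DC"
    is_pred(i, j) -- stands for the query "(1^n, i, j) in L_DC"
    candidates    -- assumed range of possible predecessors j
    """
    # OR branch: some predecessor j of i evaluates to true.
    or_part = is_or(i) and any(value[j] and is_pred(i, j) for j in candidates)
    # AND branch: every j is either true or not a predecessor of i.
    and_part = is_and(i) and all(value[j] or not is_pred(i, j) for j in candidates)
    return or_part or and_part

# Toy instance: three evaluated gates and three gates on top of them.
value = {1: True, 2: False, 3: True}
kinds = {4: "or", 5: "and", 6: "and"}
preds = {4: {1, 2}, 5: {1, 2}, 6: {1, 3}}

def is_or(i):
    return kinds[i] == "or"

def is_and(i):
    return kinds[i] == "and"

def is_pred(i, j):
    return j in preds[i]

results = {i: replaced_gate_value(i, value, is_or, is_and, is_pred, [1, 2, 3])
           for i in kinds}
```

The point of the construction is that the wiring of the replacement does not depend on which gates are actually connected: every gate looks at every candidate, and the membership answers select the relevant ones.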
(b) Let L be recognized by a semiunbounded fan-in circuit family C = (C_n)_{n≥1}
of polynomial size and depth O(log^k n) such that the direct connection language
L_DC of C is in SAC^k. We indicate how to construct a circuit family C′ recognizing
L with a direct connection language in DTIME(log n). The direct connection
language L_DC of C consists of elements of the form (1^n, i, j), indicating that gate
i is an ∨-gate and that j is a predecessor of i, and of those of the form (1^n, i, 0 or
1, j), indicating that gate i is an ∧-gate and that j is the left or right predecessor
of i. Since L_DC of C is in SAC^k, we know that there are DTIME(log n)-uniform
circuits answering the questions ‘‘(1^n, i, j) ∈ L_DC’’ (is gate i an OR-gate with
predecessor j?), and ‘‘(1^n, i, 0 or 1, j) ∈ L_DC’’ (is gate i an AND-gate and is j a
predecessor of i?). The circuit family C′ is now constructed by replacing in each C_n
[...] ((j) ∧ (1^n, i, 1, j) ∈ L_DC))].
In this way, we end up with a DTIME(log n)-uniform SAC^k circuit recognizing L. ∎
Throughout the paper we will denote the class of languages recognized by
bounded fan-in circuits of polynomial size and logarithmic depth that fulfill
L_DC ∈ NC^1 by weakly uniform NC^1.
2.2. Parallel Random Access Machines
A PRAM is a set of random access machines, called processors, that work syn-
chronously and communicate via a global shared memory. Each PRAM computation
step takes one time unit regardless of whether it performs a local or a global (i.e.,
remote) operation. We assume the standard definition of PRAMs [36, 38]. All
processors execute in parallel the same sequence of statements S_1, S_2, ..., S_K, which
is independent of the input. In fact, allowing conditional jumps for PRAMs only
guarantees a single program, multiple data mode instead of the single instruction,
multiple data mode [5, pp. 142–143] which we are assuming here. However, due to
the constant program size of the PRAM it is easy to always achieve the single
instruction, multiple data mode. For ease of presentation, we assume throughout
the paper that each processor has a constant number of local memory cells. This
is no restriction, since we can use global memory instead. Hence, our model of a
PRAM has no indirect addressing of local memory. Let each processor have a
constant number of local memory cells L_1, L_2, ..., L_D, and let G_1, G_2, ..., G_{q(n)} be the
cells of global memory, where n is the length of the input and q is some polynomial.
The input is given bitwise in G_1, ..., G_n. A usual instruction set is shown below. We
do not fix the instructions yet, but stress that the set is always of finite size. In what
follows, a, b, and c denote some constants, Length denotes the length of the input
n, PIN yields the uniquely determined processor identification number, and NoOp
means no operation. Since the usual comparison of numbers is not an NC^0
operation, we use the relation ·> that asks whether the last bit is set to 1 or to 0.
Constants: L_a := ⟨constant⟩, L_a := Length or L_a := PIN,
Global Write: G_{L_a} := L_b,
Global Read: L_a := G_{L_b},
Local Assignment: L_a := L_b,
Conditional Assignment: if L_a ·> 0 then ⟨assignment⟩,
Binary Operation: L_a := L_b ∘ L_c,
Jumps: goto S_a or if L_a > 0 then goto S_b,
Others: if L_a ·> 0 then halt or if L_a ·> 0 then NoOp.
A PRAM A is determined or described in the form of its program which is a fixed
sequence S_1, ..., S_K of instructions. We will use this instruction set freely. For
instance, we will use abbreviations like ‘‘perform the following loop O(log^k n)
times’’ and leave it to the reader to realize this on a PRAM.
All PRAMs in this paper do not use more than a polynomial number of pro-
cessors. In order to get a reasonable hardware cost measure for PRAMs, we require
that they have (as usual) logarithmically bounded word length. This means that a
PRAM working on inputs of length n generates and uses only numbers of size poly-
nomial in n. For the sake of simplicity of presentation, we use PRAMs only to
accept languages and not to compute functions. The contents of global memory cell
G_1 determine acceptance or rejection at the end of the computation.
We consider mainly two types of write access to global memory. A machine with
Concurrent Write access allows simultaneous writing of several processors into the
same memory cell. We assume that the value of a writer with highest priority is
actually stored (Priority-CRCW-PRAM). A machine with owner write access is
more restricted by assigning to each cell of global memory a processor, called
write-owner, that is the only one allowed to write into this memory cell [24]. More
common than the owner concept in formulating algorithms is exclusive access,
where we only require that for each point of time there is at most one processor
writing into a cell. Exclusive write PRAMs are intermediate in computational
power between owner write and concurrent write PRAMs and the same holds for
read access. While the owner and the concurrent concept are closely related to
determinism and nondeterminism [24, 42, 59], the concept of exclusiveness
corresponds to unambiguity [41, 42, 48], which explains the nonconstructive features
of this concept. Correspondingly, we get two ways to manage read access:
Concurrent Read and Owner Read. In this way, we get four versions of PRAMs,
denoted as XRYW-PRAMs with X, Y ∈ {O, C}, XR specifying the type of read
access, and YW that of the write access.
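The difference between concurrent write with priority resolution and owner write can be sketched as two ways of applying one synchronous write step. This is a minimal illustration, not the paper's formal model: the function names, the request format `(pin, address, value)`, and the convention that the smallest PIN has the highest priority are assumptions made here.

```python
def priority_write(memory, requests):
    """Apply one synchronous write step under Priority-CRCW semantics.

    `requests` is a list of (pin, address, value) triples.
    Assumption: the processor with the smallest PIN has highest priority.
    """
    winners = {}
    for pin, addr, val in requests:
        if addr not in winners or pin < winners[addr][0]:
            winners[addr] = (pin, val)
    for addr, (_, val) in winners.items():
        memory[addr] = val
    return memory

def owner_write(memory, requests, owner):
    """Apply one write step under owner-write semantics.

    `owner[addr]` is the only PIN allowed to write into cell `addr`;
    a write by any other processor is rejected with ValueError.
    """
    for pin, addr, val in requests:
        if owner[addr] != pin:
            raise ValueError(f"processor {pin} does not own cell {addr}")
        memory[addr] = val
    return memory

# Three processors write into cell 1 in the same step.
crcw_memory = priority_write({1: None}, [(3, 1, "c"), (2, 1, "b"), (5, 1, "e")])

# Under owner write, only processor 2 may write into cell 1.
ow_memory = owner_write({1: None}, [(2, 1, "b")], owner={1: 2})
```

Under owner write the assignment of cells to writers is fixed in advance, which is why the owner concept behaves like a set of dedicated channels rather than a shared memory.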
Denote the class of languages recognizable in time O(f(n)) by XRYW-PRAMs
with a polynomial number of processors by XRYW-TIME(f(n)). For XRYW-[...]
In CRCW-PRAMs, global memory behaves like a shared memory, since each
processor can access each cell of global memory. In the most restricted model, the
OROW-PRAM, however, the global memory degenerates into a set of one-directional
channels between pairs of processors. Thus, an OROW-PRAM is something
like a completely connected network. Although this model seems to be much more
restricted than CRCW-PRAMs, the relation
NC^k ⊆ OROW^k ⊆ CROW^k ⊆ SAC^k ⊆ CRCW^k = AC^k ⊆ NC^{k+1}
indicates that it is a model ‘‘as parallel as’’ a CRCW-PRAM. With respect to the
implementation of algorithms on existing parallel machines, results of this work
demonstrate that even OROW-PRAMs are a parallel model that in some sense is
still too powerful. That means algorithms efficiently realizable on OROW-PRAMs
still lack some of the features (that is, restrictions) that are necessary for an efficient
implementation on distributed memory machines.
2.3. Parallel Pointer Machines
Parallel pointer machines (PPMs) were introduced by Cook [15] and are studied
in several papers [18, 22, 23, 34, 40]. In earlier papers [15, 22, 34, 40] the PPM
is called the hardware modification machine (HMM). A PPM consists of a finite
collection of finite state transducers, which are called units. Each unit is connected
to a constant number of other units (points to other units). The units operate syn-
chronously. Each unit receives a constant number of input symbols from other units
via the pointer connection, produces according to the inputs read and the current
state a constant number of outputs, and changes its state according to the tran-
sition function, which is the same for all units. In each step, a unit may modify its
pointers to other units. That is, it may change its pointers to point to units that are
reachable via pointers by paths of length at most two. Initially, a single unit U_0 is
the only one active, starting in some state q_0. At each time step an active unit may
activate another one. An input word w is accepted by a PPM if U_0 enters an
accepting state. A PPM accesses an input word w in the following way. The starting
unit U_0 points to the root of a fixed, complete binary tree of special units that do
not count for the hardware costs of the PPM. The input word w is stored from left
to right in the leaf nodes of the tree. Leaf nodes point to their neighboring leaf
nodes and to a parent node. Each inner node of the tree has pointers to its parent
and its two children nodes.
An essential property of PPMs according to Lam and Ruzzo [40] is that they
formally capture the notion of parallel computation by pointer manipulation. In
addition, PPMs, in contrast to, e.g., PRAMs, have the advantage that the unit of
hardware is a finite state transducer of constant size [18, 22]. So, PPMs are a
parallel model less powerful than PRAMs and have the potential to be a more
realistic model for existing parallel computers than PRAMs are. By PPM-
TIME(t(n)), we denote the class of languages recognizable in time O(t(n)) by
PPMs using a polynomial amount of hardware, i.e., a polynomial number of units.
We write PPM^k for PPM-TIME(log^k n).
Lam and Ruzzo [40] showed that PPMs and an arithmetically restricted form
of CROW-PRAMs (rCROW-PRAMs for short) are equivalent. More precisely,
arithmetic restriction means that the arithmetic capabilities of the CROW-PRAM
are limited to incrementation (‘‘+1’’) and doubling (‘‘* 2’’). Recently, Dymond et
al. [23] demonstrated that any step-by-step simulation of a full n-processor
CROW-PRAM by a PPM using an arbitrary number of processors requires time
Ω(log log n) per step. This strongly suggests a separation between CROW-PRAMs
and PPMs and thus of LOGDCFL and DSPACE(log n), because
CROW-TIME(log n) = LOGDCFL [24] and DSPACE(log n) ⊆ PPM^1 [18, 22].
3. DATA INDEPENDENCE AND SIMPLE PRAMS
Motivated by the problem of characterizing the class of problems that are
efficiently implementable on existing, asynchronous parallel machines with dis-
tributed memory, the criterion of data independence has been considered in an
informal way [29, 39, 58]. The underlying idea is the fact that an algorithm with
a simple, data-independent communication pattern can be more easily partitioned and
desynchronized at compile time than one with a more dynamic behavior. Vishkin
and Wigderson [66] studied the prospects of data independence in the context of
reducing the size of global memory used during a PRAM algorithm. Cook et al.
[17] considered oblivious (i.e., data-independent) and semioblivious CREW-
PRAMs in order to prove lower bounds for various simple functions, including
sorting n keys and finding the logical OR of n bits.
In order to formally introduce data independence, it is first of all necessary to for-
malize notions like communication pattern or dynamic behavior. We distinguish
between three aspects of dynamic, input-dependent behavior:
(i) flow of control,
(ii) read access (or the receipt of messages), and
(iii) write access (or the sending of messages).
Data independence of control means that the statement executed by a processor
of a PRAM depends on the time, the processor identification number, and the
length of the input only, but not on the input itself. If we knew the control flow of
each processor in advance, we could determine every direct read and every direct
write. In order to determine indirect reads and writes, we need to know the content
of the participating indexing register. That is why we are mainly interested in
indirect reads and indirect writes.
Before we come to the formal definitions of these three aspects, we have to
separate the control aspect from the communication aspect. Consider the following
conditional assignment.
(*) S_μ : if L_a ·> 0 then G_{L_b} := L_c ;
S_{μ+1} : · · ·
It is possible to simulate the conditional assignment S_μ with the help of conditional
jumps. The following sequence of instructions has the same effect as (*).
(**) S_μ : if L_a ·> 0 then goto S_{μ+2/3} ;
S_{μ+1/3} : goto S_{μ+1}
S_{μ+2/3} : G_{L_b} := L_c ;
S_{μ+1} : · · ·
In (**) the problem of whether the indirect write takes place is a question of whether
the control structure, that is, the index of a statement executed at a certain point
of time, is data-dependent. It depends on the value of L_a whether the PRAM
executes S_{μ+1/3} or S_{μ+2/3}. On the other hand, (*) has a data-independent control
structure. Thus, (*) transfers this question into the communication structure. In
order to clearly separate communication and control aspects, we will handle those
cases always in the manner of (*) and not in that of (**). So, we always get
data-independent control structures in the following.
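The contrast between the conditional assignment and its jump-based simulation can be illustrated by recording which statements a single processor executes. The sketch below is only an illustration: the function names and the string labels for statements are hypothetical, and the Boolean `cond` stands for the outcome of the test on the local register.

```python
def trace_conditional_assignment(cond):
    """Statements executed when the write is a conditional assignment.

    The same single statement is executed whatever `cond` is; only the
    effect (whether the write happens) depends on the data.
    """
    executed = ["S_mu"]
    wrote = bool(cond)
    return executed, wrote

def trace_conditional_jump(cond):
    """Statements executed when the write is guarded by a conditional jump.

    Here the sequence of executed statements itself depends on `cond`.
    """
    executed = ["S_mu"]                  # if cond then goto S_{mu+2/3}
    if cond:
        executed.append("S_mu+2/3")      # the indirect write
        wrote = True
    else:
        executed.append("S_mu+1/3")      # goto S_{mu+1}
        wrote = False
    return executed, wrote

assign_true = trace_conditional_assignment(True)
assign_false = trace_conditional_assignment(False)
jump_true = trace_conditional_jump(True)
jump_false = trace_conditional_jump(False)
```

Both variants agree on whether the write takes place, but only in the first one is the list of executed statements the same for every outcome of the condition.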
Definition 3. Let A be a T(n) time bounded PRAM with p(n) processors and
a program of length K. For any input w of length n, we consider the following sets,
where 1 ≤ i, j ≤ p(n), 1 ≤ t ≤ T(n), and 1 ≤ μ ≤ K:
(a) By the control structure CS_A and the execution structure ES_A, we refer to
the flow of control of A:
CS_A(w) := {(1^n, t, i, μ) | in step t processor i executes statement μ},
ES_A(w) := {(w, t, i, μ, v) | in step t processor i executes statement μ
and if μ is a conditional assignment, then v contains the truth value
of the condition, and contains true, otherwise}.
(b) By the read structure RS_A, the write structure WS_A, and the semiwrite
structure SWS_A, we refer to the communication structure of A:
RS_A(w) := {(1^n, t, i, j) | in step t processor i executes a (conditional)
indirect read assignment of the form ‘‘(if L_c ·> 0 then) L_a := G_{L_b}’’
(L_c ·> 0 is true) and L_b contains value j},
WS_A(w) := {(1^n, t, i, j) | in step t processor i executes a (conditional)
indirect write assignment of the form ‘‘(if L_c ·> 0 then) G_{L_a} := L_b’’
(L_c ·> 0 is true) and L_a contains value j},
SWS_A(w) := {(1^n, t, i, j) | in step t processor i executes a (conditional)
indirect write assignment of the form ‘‘(if L_c ·> 0 then) G_{L_a} := L_b’’
and L_a contains value j}.
(c) A structure XS_A, X ∈ {C, R, W, SW}, is called data independent if for all
[...]
Observe that the elements of ES_A contain the complete input while the other
structures only have information on the length of this input.
When we speak of communication structure, we address both the read and the
write structure. Note that the only difference between semiwrite and write structure
is that in the latter we know whether the if-part of a conditional assignment
evaluates to true. Formally, we have the set inclusion WS_A(w) ⊆ SWS_A(w). There
is a close connection between semiwrite structures and what Cook et al. [17] call
semioblivious PRAMs. For semioblivious PRAMs, whether or not a processor
writes into a cell may also only depend on the input. We close this subsection with
a fundamental problem of parallel algorithmics and exemplify herein the notions of
Definition 3.
Example 1 (Pointer Jumping, List Ranking). Let us have two arrays S[1 ··· n]
of successor and P[1 ··· n] of predecessor nodes describing a set of acyclic chains for
nodes in {1, ..., n}. Assume that each node is a member of a chain beginning in
some starting node that is marked by P[i] = i and ending in some final node that
is marked by S[j] = j. That is, we have that if k ≠ l, then S[k] = l ⇔ P[l] = k. The
task is to determine for each node both the final node in the chain of its successors
and the first node in the chain of its predecessors. There are intricate algorithms
that solve this problem in optimal O(log n) steps on a PRAM with O(n/log n)
processors [6, 13]. To illustrate the notion of data independence, we sketch two simple
algorithms that use O(n) processors:
(a) Assign to each index 1 ≤ i ≤ n two processors Q_i^S and Q_i^P that execute
log n times S[i] := S[S[i]] resp. P[i] := P[P[i]]. Both the control structure and
the write structure of this algorithm are data independent. On the other hand, we
use the inputs S[i] and P[i] as index values, i.e., addresses, and thus the read
structure is data dependent.
(b) Another possibility is to use a variation of Rossmanith’s OROW
algorithm [54]. Its underlying idea is that now Q_i^S and Q_i^P execute log n times the
statements S[P[i]] := S[i] resp. P[S[i]] := P[i]. Here, both the control
structure and the read structure are data independent, whereas the write structure is
data dependent.
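Algorithm (a) can be simulated sequentially while recording its indirect read and write addresses, which makes the claimed (in)dependence visible on concrete inputs. The following is a sketch for the successor array only; the function name, the 1-based list layout with an unused slot 0, and the recorded triples (step, processor, address) are assumptions made for this illustration, and the step count is taken as the ceiling of log2 n.

```python
import math

def pointer_jumping_a(S):
    """Simulate S[i] := S[S[i]] for ceil(log2 n) synchronous steps.

    S is a successor array in a list with S[0] unused (1-based nodes).
    Returns the final array and the recorded read and write addresses
    as sets of (step, processor, address).
    """
    n = len(S) - 1
    S = list(S)
    reads, writes = set(), set()
    steps = max(1, math.ceil(math.log2(n)))
    for t in range(steps):
        new = list(S)
        for i in range(1, n + 1):        # processor handling index i
            reads.add((t, i, S[i]))      # read address S[i] comes from the data
            new[i] = S[S[i]]
            writes.add((t, i, i))        # write address is always cell i
        S = new
    return S, reads, writes

# Two inputs of the same length: the chain 1 -> 2 -> 3 -> 4 and its reverse.
forward = [0, 2, 3, 4, 4]
backward = [0, 1, 1, 2, 3]

final_f, reads_f, writes_f = pointer_jumping_a(forward)
final_b, reads_b, writes_b = pointer_jumping_a(backward)
```

On these two inputs the recorded write addresses coincide while the read addresses differ, matching the statement that algorithm (a) has a data-independent write structure and a data-dependent read structure. Comparing two inputs only illustrates the definition; it does not prove independence for all inputs.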
Above we solved the pointer jumping problem either with a data-independent
read or with a data-independent write structure. To give a logarithmic time algo-
rithm with data-independent read and write structure would mean a major
breakthrough in complexity theory, because (as will be proved in the next section)
as a consequence we would have $NC^1 =$ DSPACE(log n), and thus ATIME(n) = DSPACE(n).
If we assume, however, that the input for the list ranking problem is given in a
different way, namely in the form of an adjacency matrix instead of a pointer list,
we can obtain a logarithmic time algorithm that has data-independent control,
read, and semiwrite structure.
Example 1 (continued). Now assume that global memory cell G(i, j) initially
contains the value true if there is a connection from node i to node j within the list
and contains the value false otherwise. We use the repeated squaring technique. The
processor whose PIN is $(i, k, j)$ mainly repeats a logarithmic number of times the
instruction
if $G(i, k) \wedge G(k, j)$ then $G(i, j) := \mathrm{true}$.
After that, for each node we may easily determine whether there are connections to
the initial or the final node of the list.
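The repeated squaring step can be sketched as follows. This Python sketch is an illustration under stated assumptions: the names `close_by_squaring` and `final_node`, the 0-based node numbering, and the `attempts` trace are hypothetical. Each round reads the old matrix before any write takes place, mimicking synchronous PRAM steps, and the trace records which cells are targeted by a conditional write regardless of whether the condition holds.

```python
from math import ceil, log2


def close_by_squaring(adj):
    """Repeat 'if G(i,k) and G(k,j) then G(i,j) := true' for about log n rounds."""
    n = len(adj)
    g = [row[:] for row in adj]
    attempts = []                                # cells targeted in each round
    for _ in range(max(1, ceil(log2(n)))):
        old = [row[:] for row in g]              # all processors read before anyone writes
        targeted = set()
        for i in range(n):
            for k in range(n):
                for j in range(n):               # processor with PIN (i, k, j)
                    targeted.add((i, j))         # the conditional write is always attempted
                    if old[i][k] and old[k][j]:
                        g[i][j] = True           # monotone: a true is never reset
        attempts.append(targeted)
    return g, attempts


def final_node(g, i):
    """Final node of the list containing i: the reachable node without outgoing edges."""
    if not any(g[i]):
        return i
    return next(j for j in range(len(g)) if g[i][j] and not any(g[j]))
```

The set of attempted writes is the same for every input of a given size, while the writes that actually take place depend on the matrix entries.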
We get, however, the additional data independence of the semiwrite structure at
the expense of a cubic instead of a linear number of processors. On the other hand,
the above algorithm is monotonic in the sense that a value true of a global cell is
never overwritten by a value false. If we allow that a value true is overwritten by
a false, then, together with the feature of conditional writes where only the value
of the condition is data dependent, we can already simulate ACk-circuits. This will
be important when we consider PRAMs with OR write conflict resolution [4]
instead of CRCW-PRAMs with priority write conflict resolution later on.
Finally, for technical reasons we have to introduce two PRAM properties that
restrict the power of PRAMs, leading to so-called simple PRAMs. Investigating
Stockmeyer and Vishkin’s [59] proof of equivalence between CRCW-PRAMs and
circuits of unbounded fan-in, we show that each language in $AC^k$ can be accepted
by such simple CRCW-PRAMs in time $O(\log^k n)$. This result is important for
considerations in the next section.
Definition 4. We call a PRAM simple if its instruction set fulfills two restric-
tions:
M: All operations f that modify data are monadic, i.e., of the form
$L_a := f(L_b)$.
S: All operations are computable by a DTIME(log n)-uniform, bounded
fan-in circuit of constant depth. We call these operations $NC^0$-computable or
simple.
4. CHARACTERIZING COMPLEXITY CLASSES BY
DATA-INDEPENDENCE
In this section we will develop characterizations of the complexity classes $AC^k$,
$NC^k$, DSPACE(log n), LOGDCFL, $PPM^k$, and $SAC^k$ in terms of the structural
sets introduced in Definition 3.
Theorem 5. For $k \ge 1$, the following statements are equivalent:
(a) $L \in$ CRCW-TIME$(\log^k n)$,
(b) $L \in AC^k$,
(c) L can be recognized by a simple CRCW-PRAM A in time $O(\log^k n)$, A has
data-independent control, read, and semiwrite structures, and $CS_A$, $RS_A$, and $SWS_A$
are in DTIME(log n).
(d) L can be recognized by a simple CRCW-PRAM A in time $O(\log^k n)$, the
control, read, and semiwrite structures of which are all data-independent.
Proof. The implication from (a) to (b) is given by Stockmeyer and Vishkin
[59]. Observe that the constructed $AC^k$-circuits are DTIME(log n)-uniform. The
implications from (c) to (d) and from (d) to (a) are trivial. Thus, it remains to
prove that (b) implies (c). Assume that L is recognized by an $AC^k$-circuit. The basic
idea of the proof is to look at Stockmeyer and Vishkin’s simulation of an $AC^k$-circuit by a CRCW-PRAM [59]. We first refer to the simulation and then show that
the simulation can be performed by a simple CRCW-PRAM A having the desired
properties.
For the actual simulation of an ACk-circuit, we deal with two parts. First,
assume that we are given a pointer structure in global memory representing the
interconnection structure of the circuit. The pointer structure permits the usual
simulation of a circuit of unbounded fan-in [59]. With each wire of C, there is
associated a processor P. Processor P gets the addresses of two global memory cells
representing the source and the sink of a wire corresponding to P. The gist of the
simulation of C is that each P reads $O(\log^k n)$ times the value of its source and
correspondingly updates the value of its sink. This can be done using only global
reads and writes and a conditional assignment.
How can a simple CRCW-PRAM construct a description of the circuit in global
memory? We simulate the uniformity machine locally in each processor of the
PRAM such that after the simulation the local memory registers of the processors
contain the addresses of the source and the sink gate of the interconnection wire
represented by the processor. Since the uniformity machine only works on the input
length n as its input and since a processor of a simple PRAM can simulate in
logarithmic time a DTIME(log n)-TM making use of its constantly many local
registers of logarithmic word length, this first point follows. Observe that the
successors of TM-configurations (represented by the PINs of processors) can be
computed with $NC^0$-operations [8, Vol. I, pp. 110-115].
The above shows that the simulation can be done by a simple CRCW-PRAM. It
remains to be shown that the simulating simple CRCW-PRAM fulfills the claimed
structural restrictions with respect to data independence.
The simulation of the uniformity machine of the circuit only depends on the length
n of the input word w. The actual simulation of an $AC^k$-circuit by a simple CRCW-PRAM essentially consists of repeating $O(\log^k n)$ times an instruction of the form
“if $G_i \gtrdot 0$ then $G_j := 1$,” which can be simulated by “$L_a := G_i$; if $L_a \gtrdot 0$ then
$G_j := 1$.” So, the control flow of the simulating PRAM does not depend on the
input word w except for its length n. The indirect reading of $G_i$ in the above instruction is also done independently of w: the address i does not depend on w. The only thing
that depends on w is whether the write on $G_j$ takes place.
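The wire-per-processor loop can be illustrated by a short Python sketch. It is not the construction of [59]: the function name `simulate_wires`, the per-round reset of inner gates, and the dual rule for AND gates (write 0 when the source is 0) are assumptions of this sketch, chosen so that a gate of depth d holds its correct value after d rounds. Input gates are assumed to occupy the first cells.

```python
def simulate_wires(gate_type, wires, input_bits, rounds):
    """gate_type[g] is 'in', 'or' or 'and'; wires is a list of (source, sink) pairs."""
    identity = {"or": 0, "and": 1}

    def fresh(old):
        # Inputs keep their value; inner gates restart from the neutral element.
        return [old[k] if kind == "in" else identity[kind]
                for k, kind in enumerate(gate_type)]

    g = fresh(list(input_bits) + [0] * (len(gate_type) - len(input_bits)))
    attempts = []
    for _ in range(rounds):
        old = g
        g = fresh(old)
        for src, dst in wires:                   # one processor per wire
            la = old[src]                        # global read at a fixed address
            if gate_type[dst] == "or" and la > 0:
                g[dst] = 1                       # conditional write, fixed target cell
            elif gate_type[dst] == "and" and la == 0:
                g[dst] = 0                       # dual rule (assumption of this sketch)
        attempts.append([dst for _, dst in wires])
    return g, attempts
```

The list of attempted write targets is identical for all inputs of the same length; only the outcome of each condition varies with the input, which is the semiwrite behavior discussed above.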
The above observations show that the simulation of an $AC^k$-circuit can be done
with data-independent control and read structures. The only thing depending on
the concrete input word is whether the if-part of a conditional global write instruction evaluates to true or false. Thus, we also get a data-independent semiwrite
structure. Due to the simplicity of the actual circuit simulation, we immediately
have $CS_A, RS_A, SWS_A \in$ DTIME(log n) for this part of the simulation. The global
read and write addresses i and j above are determined by the uniformity machine,
which is a DTIME(log n)-machine. The flow of control can be computed by a
DTIME(log n)-machine because it consists mainly of a loop repeated $O(\log^k n)$
times. Thus, given $(1^n, t, i, l)$, looking at a constant number of the last bits of the
binary representation of t completely determines the statement number currently
executed.
So, it remains to consider the construction of the circuit in global memory of
PRAM A, that is, the simulation of the circuit’s uniformity machine by A. Because
the simulated uniformity machine is a DTIME(log n)-machine and because of
similar considerations as before, we also have $CS_A, RS_A, SWS_A \in$ DTIME(log n)
for the construction phase. The data independence of the control, read, and semiwrite structures for this phase follows from the data-independent definition of a
uniformity machine. $\blacksquare$
The next theorem yields the first characterization of NCk in terms of PRAMs.
Recently, Regan [52] gave another characterization of NCk by a parallel vector
model, using a quite different approach.
Theorem 6. For $k \ge 2$, the following statements are equivalent:
(a) $L \in NC^k$;
(b) L can be recognized by a simple CRCW-PRAM A in time $O(\log^k n)$, A has
data-independent control, read, and write structures, and $CS_A$, $RS_A$, and $WS_A$ are in
DTIME(log n).
(c) L can be recognized by a simple CRCW-PRAM A in time $O(\log^k n)$, A has
data-independent control, read, and write structures, and $CS_A$, $RS_A$, and $WS_A$ are in
$NC^k$.
Proof. (a) implies (b): As in the proof of Theorem 5 we simulate a circuit C
on an input w of length n. Since C is DTIME(log n)-uniform, we know that its
extended connection language $L_{EC} \in$ DTIME(log n). Assume C to have p(n) gates,
where gates numbered 1, ..., 2n contain the input w and its bitwise complement.
In a first phase, for each $\tau \in \{\vee, \wedge\}$ and all $2n < g, g', g'' \le p(n)$, it is checked
whether $\tau$ is the type of gate g, $g'$ is the left predecessor of g, and $g''$ is the right
predecessor of g, i.e., whether $(1^n, g, \tau)$, $(1^n, g, 0, g')$, and $(1^n, g, 1, g'')$ are
elements of $L_{EC}$. This can be done locally without any communication deterministically in logarithmic time. In a second phase, the results of the first phase are collected and combined such that for $2n < g \le p(n)$ processor $P_g$ locally knows the type
of gate g and the indices $g'$ and $g''$ of the two predecessors of gate g. These two
phases use n, the length of the input w, but do not read the input and hence are
data-independent. Furthermore, they can be done in a very regular way, for
instance, using a complete binary tree of depth $O(\log p(n)) = O(\log n)$ to collect
and compare the data, in such a way that $CS_A$, $RS_A$, and $WS_A$ are
elements of DTIME(log n).
In a third phase, for each $2n < g \le p(n)$, let processor $P_g$ for some constant c
execute $c \log^k n$ times the following three instructions.
1. $L_a := G_{g'}$,
2. $L_b := G_{g''}$,
3. $G_g := L_a \;\mathrm{op}\; L_b$,
where op depends on the type of gate g. Then $CS_A$ of this phase is obviously in
DTIME(log n). Also $WS_A \in$ DTIME(log n) since $(1^n, t, i, j) \in WS_A$ if and only if
$2n < i = j \le p(n)$ and t is divisible by 3. Finally, $(1^n, t, i, j) \in RS_A$ if either $t \equiv 1 \pmod 3$
and $(1^n, i, 0, j) \in L_{EC}$, i.e., j is the left predecessor of i, or $t \equiv 2 \pmod 3$ and
$(1^n, i, 1, j) \in L_{EC}$, i.e., j is the right predecessor of i. Since $L_{EC} \in$ DTIME(log n), we
get $RS_A \in$ DTIME(log n).
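The third phase and the membership tests for the write and read structures can be sketched in Python. The names `run_phase_three`, `in_ws`, and `in_rs`, the dictionary encoding of gates, and the 0-based cell numbering are assumptions of this sketch. Time steps are counted from 1, so that writes happen exactly when t is divisible by 3.

```python
def run_phase_three(gates, cells, rounds):
    """gates maps g -> (op, left, right); cells holds the input and its complement."""
    ops = {"and": lambda a, b: a & b, "or": lambda a, b: a | b}
    G = list(cells) + [0] * len(gates)
    la, lb = {}, {}
    reads, writes = set(), set()
    t = 0
    for _ in range(rounds):
        t += 1                                   # t = 1 (mod 3): La := G[g']
        for g, (_, left, _) in gates.items():
            la[g] = G[left]
            reads.add((t, g, left))
        t += 1                                   # t = 2 (mod 3): Lb := G[g'']
        for g, (_, _, right) in gates.items():
            lb[g] = G[right]
            reads.add((t, g, right))
        t += 1                                   # t = 0 (mod 3): G[g] := La op Lb
        for g, (op, _, _) in gates.items():
            G[g] = ops[op](la[g], lb[g])
            writes.add((t, g, g))
    return G, reads, writes


def in_ws(t, i, j, gates):
    """Processor i writes cell j at time t."""
    return i in gates and i == j and t % 3 == 0


def in_rs(t, i, j, gates):
    """Processor i reads cell j at time t."""
    if i not in gates:
        return False
    _, left, right = gates[i]
    return (t % 3 == 1 and j == left) or (t % 3 == 2 and j == right)
```

For a small XOR circuit over cells holding w and its complement, the recorded read and write traces are the same for every input and agree with the two predicates, which never inspect the input bits.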
(b) implies (c): trivial.
(c) implies (a): We will now construct a circuit family $C = (C_n)_{n \ge 1}$ recognizing L(A). The uniformity of this construction will be done by a circuit as well.
Hence, in the following we will speak about the uniformity circuit U and the (main)
circuit C. To simulate a PRAM by a circuit, we work with recursive constructions
similar to those in several other papers [24-26, 28, 41, 48]. We consider functions
Global and $\mathrm{Local}_a$, stating
(i) $\mathrm{Global}(t, i) = j \Leftrightarrow$ global memory cell i contains after step t value j, and
(ii) $\mathrm{Local}_a(t, p) = j \Leftrightarrow$ local memory cell a of processor p after step t contains
value j.
To each such function we assign in C a bunch of logarithmically many gates that
represent its value. The main work is hidden in the interconnection structure of the
circuit and is done by the uniformity circuit.
We go through the instructions of the PRAM (see Subsection 2.2 for the underly-
ing instruction set) and show how to compute Locala and Global for all possible
cases. Because the central ideas apply to several similar contexts, we only present
the typical and most difficult cases.
Computation of Global(t, i). To compute the interconnection structure for
Global(t, i)-gates, the uniformity circuit U determines in depth $O(\log^k n)$ for each
processor p whether $(1^n, t, p, i) \in WS_A$ in parallel. If no such p is found by U, then
each bit of Global(t, i) is connected with $\mathrm{Global}(t-1, i)$. Otherwise, U finds with
additional depth O(log n) and thus in total depth $O(\log^k n + \log n) = O(\log^k n)$ the
p which is minimal among all p with $(1^n, t, p, i) \in WS_A$. In this way, we simulate
the priority way of solving write conflicts. After that, U determines the unique $\mu$
such that $(1^n, t, p, \mu) \in CS_A$. Now U knows that statement $S_\mu$, executed by processor p at time t, is either “$G_{L_a} := L_b$” or “if $L_c \gtrdot 0$ then $G_{L_a} := L_b$,” where $L_c \gtrdot 0$
is true. In either case we connect Global(t, i) with the gates representing
$\mathrm{Local}_b(t-1, p)$.
Computation of $\mathrm{Local}_a(t, p)$. To compute $\mathrm{Local}_a(t, p)$, we first let U determine
the uniquely existing $\mu$ such that $(1^n, t, p, \mu) \in CS_A$. Let $S_\mu$ be the statement
executed at time t by processor p. We must consider several cases.
1. $S_\mu \equiv$ “$L_a := G_{L_b}$”: Here U searches for the uniquely existing index j such
that $(1^n, t, p, j) \in RS_A$. Then U connects $\mathrm{Local}_a(t, p)$ with $\mathrm{Global}(t-1, j)$.$^3$
2. $S_\mu \equiv$ “if $L_c \gtrdot 0$ then $L_a := L_b$”: The uniformity circuit builds two intermediate bunches of gates. Each gate of $\mathrm{Local}_b(t-1, p)$ is conjoined with the last bit
of $\mathrm{Local}_c(t-1, p)$, giving the bunch $\mathrm{Local}^{+c}_b(t-1, p)$. This coincides with
$\mathrm{Local}_b(t-1, p)$ if $\mathrm{Local}_c(t-1, p) \gtrdot 0$, and is $(0, \ldots, 0)$ otherwise. Correspondingly, U conjoins each gate of $\mathrm{Local}_a(t-1, p)$ with the negation of the last bit
of $\mathrm{Local}_c(t-1, p)$, yielding the bunch $\mathrm{Local}^{-c}_a(t-1, p)$. This coincides with
$\mathrm{Local}_a(t-1, p)$ if not $\mathrm{Local}_c(t-1, p) \gtrdot 0$, and is $(0, \ldots, 0)$ otherwise. Eventually,
U connects $\mathrm{Local}_a(t, p)$ with the bitwise disjunction of $\mathrm{Local}^{-c}_a(t-1, p)$ and
$\mathrm{Local}^{+c}_b(t-1, p)$.
$^3$ We simulate the conditional global read statement “if $L_c \gtrdot 0$ then $L_a := G_{L_b}$” by the two instructions
“$L_d := G_{L_b}$; if $L_c \gtrdot 0$ then $L_a := L_d$.” This trick does not work for conditional global write statements.
3. $S_\mu \equiv$ “$L_a := f(L_b)$”: We know that function f is computable in $NC^0$. Thus
U connects $\mathrm{Local}_b(t-1, p)$ with the input gates of the $NC^0$-circuit for f and lets the
outputs of this circuit be $\mathrm{Local}_a(t, p)$.
4. Other cases: If, for example, $S_\mu \equiv$ “$L_a :=$ PIN,” then U connects
$\mathrm{Local}_a(t, p)$ to the binary coding of p. The other cases, “$L_a :=$ Length,”
“$L_a :=$ (constant),” and “$L_a := L_b$,” are also simple. If $S_\mu$ is a statement where we
have no assignment to $L_a$, then U connects $\mathrm{Local}_a(t, p)$ to $\mathrm{Local}_a(t-1, p)$.
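The gate bunches built for a conditional local assignment (case 2) amount to a bitwise multiplexer controlled by the last bit of the condition register. A minimal Python sketch, assuming registers are given as lists of bits with the last bit at the end of the list; the function name `conditional_assign` is hypothetical:

```python
def conditional_assign(local_a, local_b, local_c):
    """New contents of register a after 'if Lc ... then La := Lb', as bit lists."""
    c = local_c[-1]                                  # last bit of Local_c(t-1, p)
    plus_cb = [bit & c for bit in local_b]           # Local_b if c = 1, else all zeros
    minus_ca = [bit & (1 - c) for bit in local_a]    # Local_a if c = 0, else all zeros
    return [x | y for x, y in zip(minus_ca, plus_cb)]  # bitwise disjunction
```

Each output bit depends on only three input bits, so the bunch has constant depth and bounded fan-in.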
In total, we end up with a circuit family $C = (C_n)_{n \ge 1}$ of depth $O(\log^k n)$.
Obviously, the direct connection language $L_{DC}(C)$ is an element of $NC^k$. By a
result of Ruzzo [55, Lemma 2], this implies $L_{EC}(C) \in NC^k$. Applying Lemma 1, we
finally get that L(A) is in $NC^k$. $\blacksquare$
Remark 7. Observe that in the construction of the PRAM simulating the circuit
C we only made queries concerning the direct predecessors of gates. Thus, for k = 1
the constructions show (a) how to simulate circuits of logarithmic depth with
$L_{DC} \in$ DTIME(log n) by fully data-independent PRAMs and (b) how to simulate
fully data-independent PRAMs by circuits of logarithmic depth with $L_{DC} \in NC^1$,
i.e., by weakly uniform $NC^1$. We note that it is possible to come down to
DTIME(log n) if we work with functional uniformities, where we demand on
the one hand that for a gate g its left and right predecessors can be computed deterministically in logarithmic time and, on the other hand, for the communication
structures, that for index i and time t the processor p writing into i (if existent) can
be computed in DTIME(log n) and for time t and processor p the cell i from which
p reads (if existent) can be computed in DTIME(log n).
Theorem 6 naturally leads to the question of whether it is at all necessary that,
besides being data independent, the complexity of the sets $CS_A$, $RS_A$, $WS_A$ has to
be restricted in the given way. Clearly, without any such assumptions, we have that
$CS_A$, $RS_A$, $WS_A$ are contained in $AC^k$, because the running time of the considered
PRAM A is $O(\log^k n)$. Thus, by the construction of “(c) implies (a)” given in the
above proof, we would only get that L belongs to $AC^k$-uniform $NC^k$, where $AC^k$-uniform means that the extended connection language of the circuit family is in $AC^k$.
However, the construction of “(a) implies (b),” together with the constructions of
Damm et al. [20], gives the converse. Hence, we have the following.
Corollary 8. $L \in AC^k$-uniform $NC^k$ if and only if L can be recognized by a
simple CRCW-PRAM A in time $O(\log^k n)$, such that A has data-independent control,
read, and write structures.
The last result shows how essential the complexity assumptions of the control
and communication structures in Theorem 6 are. For example, by padding
arguments the equality of $AC^1$-uniform $NC^1$ and $NC^1$ would imply ATIME(n) =
DSPACE(n), which would be a major surprise. In summary, the above consideration
shows that the assumptions in Theorem 6, i.e., that $CS_A$, $RS_A$, and $WS_A$ belong to $NC^k$
(or, equivalently (cf. Lemma 1), to DTIME(log n)), appear to be necessary and
cannot be omitted.
Let us briefly recapitulate the fundamental difference between the characterizations of $AC^k$ and $NC^k$. For $NC^k$, we had to “fix everything.” Neither the read nor
the write structure is allowed to be data-dependent. By way of contrast, for $AC^k$
“everything is free.” Both read and write structure may be data-dependent, but it
is not necessary to allow so much in order to get $AC^k$. As we saw in Theorem 5,
we can even demand data-independent read and semiwrite structures.
Now it is only natural to ask what happens if we do not allow data-dependent
writes, but data-dependent reads instead. Does that also suffice to get ACk? No. Data-
dependent reads are only enough for a characterization of DSPACE(log n). This is a
concrete indication that in parallel computations, writing is more powerful than reading.
Theorem 9. $L \in$ DSPACE(log n) if and only if L is recognized by a simple
CRCW-PRAM A in O(log n) time, A has data-independent control and write structures, a data-dependent execution structure, and $CS_A$, $WS_A$, and $ES_A$ are in
DTIME(log n).
Proof. if: Again the simulation of PRAM A works with recursive constructions.
We use functions Global(t, i) and Locala(t, p), where Global(t, i)= j if global
memory cell i contains after step t value j, and Locala(t, p)= j if local memory cell
a of processor p contains after step t value j. The main idea is to compute the values
of Global and Locala by a recursion of logarithmic depth that stacks only items of
constant length. Thus, we can keep the stack on the logarithmically bounded work-
ing tape. The working tape of the simulating, logarithmically space bounded Turing
machine M is organized as follows: First, M has a stack of logarithmic depth that
stores statement numbers and certain markings concerning the progress of the
simulation. The number of statements is bounded by a constant, and thus the stack
fits onto the working tape. Then, M has space to store the parameters (step num-
ber, cell number, processor number) and the intermediate result of the last recursive
call. We proceed in the same way as in the proof of Theorem 6. We begin with the
computation of Global(t, i). Remember that a cell of global memory can only be
affected by indirect writes.
Computation of Global(t, i). To find out the value of Global(t, i) in logarithmic
space, M determines whether there exists a p such that $(1^n, t, p, i) \in WS_A$. This is
possible due to the obvious inclusion DTIME(log n) $\subseteq$ DSPACE(log n). If there is
no such p, then M knows that no processor tried to write into $G_i$ and M recursively
computes $\mathrm{Global}(t-1, i)$ by stacking a symbol “no write.” Otherwise, M computes
the unique statement number $\mu$ such that $(1^n, t, p, \mu) \in CS_A$. Statement $S_\mu$ must be
either of the form “$G_{L_a} := L_b$” or “if $L_c \gtrdot 0$ then $G_{L_a} := L_b$,” and in the latter case
we have $L_c \gtrdot 0$. So, M recursively computes $\mathrm{Local}_b(t-1, p)$, stacking the statement
number $\mu$.
Computation of $\mathrm{Local}_a(t, p)$. To compute $\mathrm{Local}_a(t, p)$, M first determines an
index $\mu$ such that $(w, t, p, \mu, v) \in ES_A$. The recursion is now guided by the type of
statement $S_\mu$.
1. $S_\mu \equiv$ “$L_a := G_{L_b}$”: The simulating machine M goes into the recursion by
computing $\mathrm{Local}_b(t-1, p)$ and stacking index $\mu$, which is marked as “undone.”
When M returns from the recursion with a result $j = \mathrm{Local}_b(t-1, p)$, it recognizes
the stack entry $\mu$ to denote an indirect read. Thus, M transfers j to the parameter
place. Then M continues with the computation of $\mathrm{Global}(t-1, j)$ and $\mu$ is
unmarked. Should M later on return from a higher level of recursion, it will pass
this level and simply hand through the result, popping entry $\mu$.
2. $S_\mu \equiv$ “if $L_c \gtrdot 0$ then $L_a := L_b$”: Since $ES_A$ also provides the value v (true or
false) of the if-part, M simply does the following: If v is true, then go into
the recursion $\mathrm{Local}_b(t-1, p)$; otherwise, go into the recursion $\mathrm{Local}_a(t-1, p)$.
3. All the other cases can be reduced to the above two or are handled
similarly as in the proof of Theorem 6.
Observe that it was for statements of the form “if $L_c \gtrdot 0$ then $L_a := L_b$” that we
made decisive use of $ES_A \in$ DTIME(log n). By this we could overcome the problem
that this kind of instruction is not monadic and would otherwise cause a branching of the
recursion.
Only if: This inclusion follows along the lines of the proof of Theorem 5. The
outline of the simulation of a DSPACE(log n)-machine M is as follows: First, the
PRAM A generates all possible configurations of M, second, it computes the suc-
cessors of all configurations (thus computing the configuration graph), and third it
does pointer jumping to find out whether M accepts or rejects the input. Note that
the configuration graph of M has outdegree one and by the standard ‘‘clocking
trick’’ we may assume that it is acyclic.
As in the proof of Theorem 5, the first step of the above outline is done by inter-
preting the PIN of a processor as a configuration of M. We assume A to be
equipped with the following three NC 0-computable operations: Headpos(i) which
computes out of a configuration i the position of the input head, Next0(i) which
computes the successor configuration of i if the input bit under the input head is
a 0, and Next1(i) which computes the successor configuration of i if the input bit
under the input head is a 1. The program of A looks as follows:
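1. $L_a := \mathrm{Headpos}(\mathrm{PIN})$;
2. $L_b := \mathrm{Next}_0(\mathrm{PIN})$;
3. if $G_{L_a} \gtrdot 0$ then $L_b := \mathrm{Next}_1(\mathrm{PIN})$;
4. $G_{\mathrm{PIN}} := L_b$;
After that A performs $c \log n$ times the loop:
5. $L_b := G_{L_b}$;
6. $G_{\mathrm{PIN}} := L_b$;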
The control structure of this program is $CS_A = \{(1^n, t, i, t) \mid 1 \le t \le 5\} \cup
\{(1^n, t, i, \mu) \mid 5 < t \le 4 + c \log n,\ \mu = 5 + ((t-5) \bmod 2)\}$. The write structure is
even simpler: $WS_A = \{(1^n, 4+2j, i, i) \mid 1 \le j \le (c \log n)/2\}$. Finally, the execution
structure of A is $ES_A = \{(w, t, i, \mu, \mathrm{true}) \mid (1^{|w|}, t, i, \mu) \in CS_A,\ t \neq 3\} \cup
\{(w, 3, i, 3, v) \mid v = \mathrm{true} \Leftrightarrow \text{the } j\text{th bit of } w \text{ is } 1\}$, where in the last expression
$j := \mathrm{Headpos}(i)$. Obviously, these three sets are in DTIME(log n). $\blacksquare$
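The six-statement program from the proof can be tried out on a toy machine. The following Python sketch rests on several assumptions that are not in the text: the machine (a left-to-right scan tracking the parity of ones), the encoding of configurations as pairs (state, head position), the separate `tape` array with one padding cell for halted configurations, and the use of an ordinary `> 0` test on single bits. One processor is created per configuration, and the loop performs pointer jumping on the configuration graph.

```python
from math import ceil, log2


def run_config_graph(w):
    """Return the halting configuration reached from (0, 0) on input bits w."""
    n = len(w)
    pins = [(q, h) for q in (0, 1) for h in range(n + 1)]  # one processor per configuration

    def headpos(c):
        return c[1]

    def next0(c):                                # successor if the scanned bit is 0
        return c if c[1] == n else (c[0], c[1] + 1)

    def next1(c):                                # successor if the scanned bit is 1
        return c if c[1] == n else (1 - c[0], c[1] + 1)

    tape = list(w) + [0]                         # padding cell for halted configurations
    G, Lb = {}, {}
    for pin in pins:
        la = headpos(pin)                        # 1. La := Headpos(PIN)
        Lb[pin] = next0(pin)                     # 2. Lb := Next0(PIN)
        if tape[la] > 0:                         # 3. conditional local assignment
            Lb[pin] = next1(pin)
        G[pin] = Lb[pin]                         # 4. G[PIN] := Lb
    for _ in range(max(1, ceil(log2(n + 1)))):
        Lb = {pin: G[Lb[pin]] for pin in pins}   # 5. Lb := G[Lb] (all read first)
        for pin in pins:
            G[pin] = Lb[pin]                     # 6. G[PIN] := Lb
    return G[(0, 0)]
```

Only statement 3 consults the input; every global write goes to the cell addressed by the writing processor's own PIN, in line with the write structure given in the proof.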
Remark 10. (a) Theorem 9 is only stated for a logarithmic running time and
not for the general cases of the form $O(\log^k n)$. The reason is that we know of no
analogue of DSPACE(log n) in the higher levels of the NC-hierarchy. We can interpret the theorem as offering us this missing link in the form of the class of all
languages recognized by PRAMs in time $O(\log^k n)$ which have $CS_A$, $WS_A$, and $ES_A$
in DTIME(log n).
(b) Corresponding to Theorem 6, we could afford to let the sets $CS_A$, $WS_A$,
and $ES_A$ be members of DSPACE(log n) without strengthening the computational power.
(c) Theorem 9 implies that if there were a completely data-independent algorithm for the DSPACE(log n)-complete list ranking problem, then we would have the
equality of $NC^1$ and DSPACE(log n) and hence of ATIME(n) and DSPACE(n).
Reviewing the fundamental properties of the characterizations of $AC^k$, $NC^k$, and
DSPACE(log n), it appears that the essential differences lie in the distinct communication possibilities with respect to data (in)dependence. Unbounded fan-in
parallelism allows for data-dependent writes, bounded fan-in parallelism restricts
reads and writes to be data-independent, and DSPACE(log n) allows for data-dependent reads but demands data-independent writes.
In the above proof it is possible to drop the requirement that the operations of
the CRCW-PRAM be $NC^0$-computable. We can allow operations computable in
logarithmic space. On the other hand, it is essential that all operations are monadic,
because this leads to the linear recursion structure. If we drop the requirement for
monadic $NC^0$-computable operators and furthermore do not require $ES_A \in$
DTIME(log n), then, among others, we get LOGDCFL, the class of languages
logspace many-one reducible to deterministic context-free languages [60].
Theorem 11. For each $k \ge 1$ the following statements are equivalent:
(a) L is recognized in time $O(2^{\log^k n})$ by a deterministic auxiliary push-down
automaton equipped with a work tape of size O(log n).
(b) L is recognized by a CROW-PRAM in time $O(\log^k n)$.
(c) L is recognized by a CRCW-PRAM A with standard PRAM operation set$^4$
in $O(\log^k n)$ time, A has data-independent control and write structures, and $CS_A$ and
$WS_A$ are in DTIME(log n).
Proof. (a) implies (b): This was shown by Dymond and Ruzzo for the case
k=1 in [24]. Their algorithm was extended by Fernau et al. to arbitrary k [25].
(b) implies (c): The constructions that simulate deterministic push-down
automata by CROW-PRAMs lead to algorithms with an extremely simple control
and write structure [24, 25]. A simple loop has to be executed $O(\log^k n)$ times in
which each processor p executes conditional assignments to local registers or
unconditional assignments to global cells which are write-owned by p, i.e., only p
is allowed to write into these cells. Given an index i in global memory, the write
owner of i is easily computable. $CS_A, WS_A \in$ DTIME(log n) is easily checked for
these algorithms.
$^4$ More precisely, an operation set as used for CROW-PRAMs by Dymond and Ruzzo [24] suffices.
(c) implies (a): We use the same recursion as in Theorem 9. But now we augment the DSPACE(log n) Turing machine with an auxiliary push-down store (thus
yielding a so-called auxiliary push-down automaton [14]), since the recursion is no
longer linear. More precisely, the data flow of the recursion is no longer linear.
Since each recursive predicate Global and $\mathrm{Local}_a$ is of constant “branching-width,”
the total number of recursive calls is bounded exponentially in the running time of the
PRAM. Observe that now the use of conditional assignments of the form “if $L_c \gtrdot 0$
then $L_a := L_b$” no longer requires $ES_A \in$ DTIME(log n), because a branching of the recursion no longer needs to be avoided.
Remark 12. (a) For k=1 we get a characterization of the complexity of deter-
ministic context-free languages since LOGDCFL is precisely the class of languages
accepted in polynomial time by deterministic auxiliary push-down automata
equipped with logarithmic work tapes [60].
(b) In the above proof it is crucial that global writes of the PRAM are data
independent. Data-dependent write instructions in the case k = 1 would lead to a
recursion requiring running time $n^{O(\log n)}$ for the simulating AuxPDA. However,
such an AuxPDA can already simulate $AC^1$-circuits [55].
Next, we come to the characterization of parallel pointer machines or, equivalently, rCROW-PRAMs [40]. Cook and Dymond [18, 22] showed the inclusion
DSPACE(log n) $\subseteq$ PPM-DTIME(log n). Dymond et al. [23] recently proved that
any step-by-step simulation of CROW-PRAMs by PPMs needs time $\Omega(\log \log n)$
per step. In our setting the difference appears in the requirement for simple PRAMs
(similar to Lam and Ruzzo [40]) in the case of PPMs, whereas for the CROW-PRAMs the operation set is unrestricted (compare Theorem 11 with Theorem 13).
Theorem 13. For $k \ge 1$, we have $L \in$ PPM-TIME$(\log^k n)$ if and only if L is
recognized by a simple CRCW-PRAM A in time $O(\log^k n)$, A has data-independent
control and write structures, and $CS_A$ and $WS_A$ are in DTIME(log n).
Proof. if: For this direction we make decisive use of Lam and Ruzzo’s [40]
equality PPM-TIME$(\log^k n)$ = rCROW-TIME$(\log^k n)$. We perform a simulation of
the simple CRCW-PRAM A, which is data independent in the above required way,
by an rCROW-PRAM. The only subtle point herein is the question of how to convert
the concurrent write of A into the owner write of the rCROW-PRAM. Here we
make use of the restricted write structure of A.
We proceed in two main steps. First, we demonstrate how a concurrent write can
be simulated by an rCROW-PRAM with time-dependent owner function. Second,
we explain how to convert the latter into a time-independent one.
Let us turn to the first step. With the help of $WS_A \in$ DTIME(log n), for each point
of time t, for each processor p, and for each global memory cell i of the CRCW-PRAM, the simulating rCROW-PRAM finds out whether p writes into i at time t.
Afterward, the rCROW-PRAM determines for each t and each i some p writing
into i at t. This p then is the write-owner of i at time t. This information is stored
in a look-up table. An rCROW-PRAM does all these computations in time
O(log n), using no more than a polynomial number of processors.
Now it remains to explain how to convert an rCROW-PRAM with time-depend-
ent write owners into one with time-independent ones. The basic idea is to replace
the various time-dependent write-owners by only one fixed write-owner that com-
municates with the respective (time-dependent) write-owners and then writes by
itself the value the previous write-owner wanted to write. To do that, this particular
write-owner has to know for each point of time the original write-owner. Here the
look-up table generated before comes into play. Thus, the new, fixed write-owner
may look up the current write-owner, communicate the value to be written, and
eventually write the value itself.
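The two steps above can be sketched in Python. This is only an illustration: the names `owner_table` and `owner_write_step`, the triple encoding (t, p, i) of the write structure, and the choice of the smallest PIN as the time-dependent owner (to mirror priority resolution, where the text only asks for some writing processor) are assumptions of this sketch.

```python
def owner_table(write_structure):
    """Map (t, i) to the time-dependent write-owner of cell i at time t.

    write_structure is an iterable of triples (t, p, i): processor p writes cell i at time t.
    """
    table = {}
    for t, p, i in write_structure:
        key = (t, i)
        if key not in table or p < table[key]:
            table[key] = p                       # keep the processor with the smallest PIN
    return table


def owner_write_step(t, table, pending, memory):
    """Let the fixed owner of each cell perform the write on behalf of the current owner.

    pending[p] is the (cell, value) pair processor p wants to write at time t.
    """
    for (time, cell), p in table.items():
        if time == t:
            memory[cell] = pending[p][1]         # value fetched from the looked-up owner
    return memory
```

Because the table is computed from the write structure alone, it can be built before any input bit is read.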
only if: For the reverse direction we simulate a parallel pointer machine by a
PRAM in the usual way [40]. Each processor simulates one PPM unit, using a
block of constantly many cells of global memory to hold the state, output, and taps
of the simulated unit plus some additional housekeeping information. The simula-
tion works as follows. Each PRAM processor reads the outputs of the neighbors of
the unit it is simulating and updates the state, output, and pointers stored in its
block according to the PPM’s transition function. The case in which a PPM unit
spawns new units requires some care (in particular, a “cleanup” phase every log n steps
is necessary to rebalance the tree of processors simulating the active PPM units;
it uses a prefix sums computation, which can be done in $NC^1$ and thus (by
Theorem 6) in a data-independent way by a simple PRAM), but is basically
straightforward. Further details can be found in Lam and Ruzzo’s work [40].
From the above simulation of a PPM by a PRAM, it is easy to conclude that
the control structure of the simulating PRAM, which essentially consists of one
main loop, is data-independent and contained in DTIME(log n). To see that the
simulating PRAM also has a data-independent write structure contained in
DTIME(log n), observe that for the above described simulation of a PPM we may
w.l.o.g. lay down that each PRAM processor always (unconditionally) writes in a
fixed order into the block of constantly many global memory cells it is responsible
for. Furthermore, the updating of the pointer, state, and output information can be
done in each processor’s local memory, thus avoiding any conditional global writes.
The computation of the transition function of the PPM is done with the help of the
conditional assignment “if $L_a \gtrdot 0$ then $L_b := L_c$” in a basically straightforward
manner. Lam and Ruzzo [40] use incrementation to address cells within the global
memory blocks. We can avoid the use of the incrementation operation and stick to
$NC^0$-computable ones. If we stipulate that the addressing within memory blocks
works with a base address where only the least significant bits have to be modified
to address a cell within a block, then this can be done by $NC^0$-operations without
the need for incrementation. Obviously this addressing scheme can be used without
loss of generality. Thus, $NC^0$-operations suffice. $\blacksquare$
Theorems 9 and 13 show that the essential difference between DSPACE(log n)
and PPM-TIME(log n) is that for the latter we need not require a data-dependent
execution structure contained in DTIME(log n). To the authors’ best knowledge,
only DSPACE(log n) $\subseteq$ PPM-TIME(log n) $\subseteq$ DSPACE$(\log^2 n)$ is known [18, 22].
Until now, all our characterizations have worked with CRCW-PRAMs using the
priority resolution protocol for write conflicts. Let us briefly consider an enhanced
CRCW-PRAM model, Akl’s OR-PRAM [4]. The OR-PRAM resolves write conflicts by writing the bitwise OR of all data to be written. This seemingly slight revision of the underlying CRCW-PRAM model has drastic consequences for our
“data-(in)dependent world.” A fully data-independent OR-PRAM suffices to get
$AC^k$: In the characterization of $AC^k$ (Theorem 5), the decisive, data-dependent
write instruction was “if $L_a \gtrdot 0$ then $G_j := 1$,” where the value of the if-part
depended heavily on the input, but the indexing values i and j were data independent. In an OR-PRAM this instruction can be replaced by an instruction “$G_j := G_i$”
using only the last bit of $G_i$. So we get data-independent control, read, and write
structures for the simulation of $AC^k$-circuits. Remember that in our standard model
of simple CRCW-PRAMs, this is a very strong restriction decreasing the computational power from $AC^k$ to $NC^k$.
An essential property of the above, fully data-independent simulation of
ACk-circuits by OR-PRAMs is the need for nonmonotonic operations. The OR-feature
for concurrent writes directly applies only to unbounded OR-gates. For AND-gates,
we make use of de Morgan’s law x ∧ y = ¬(¬x ∨ ¬y). As a consequence, ones may be
overwritten by zeroes.
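The OR write rule and the de Morgan step can be illustrated by a minimal Python sketch. It models a single global cell and one concurrent write; the names `or_write`, `or_gate`, and `and_gate_via_de_morgan` are hypothetical and only serve this illustration.

```python
from functools import reduce


def or_write(old, written):
    """Assumed OR-PRAM rule: a cell receives the bitwise OR of all values
    written to it in one step; if nobody writes, it keeps its old content."""
    if not written:
        return old
    return reduce(lambda a, b: a | b, written)


def or_gate(inputs):
    # One processor per input wire writes its bit unconditionally.
    return or_write(0, list(inputs))


def and_gate_via_de_morgan(inputs):
    # x AND y = NOT(NOT x OR NOT y): negate, OR-write, negate again.
    negated = [1 - bit for bit in inputs]
    return 1 - or_write(0, negated)
```

The final negation in `and_gate_via_de_morgan` is the nonmonotonic step: a cell that held 1 after the OR write may end up holding 0.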
By way of contrast, in a monotonic PRAM, global memory cells shall only con-
tain values 0 or 1 and a 1 is never overwritten by a 0. Observe that assuming the
input bits are given in nonnegated and negated form, by applying de Morgan’s laws
to NC-circuits these can be made monotonic. Thus, in fact, Theorem 6 can be
tightened to simple, monotonic PRAMs. Monotonicity of a PRAM algorithm can
be an important criterion concerning implementation on asynchronous machines.
A monotonic algorithm may tolerate processors that make different progress in the
course of the computation. If the slower processor needs data from the faster one,
with monotonic algorithms we avoid storing (old) data of the faster processor that
have the time stamp the slower processor has currently reached. The slower pro-
cessor can simply use the newest data delivered from the faster one; it can work
with ‘‘data from the future.’’ This avoids synchronization overhead. For example,
consider the (parallel version of the) Warshall algorithm computing the transitive
closure of graphs given as adjacency matrices. Here an existing 1, signaling the
existence of a path between two nodes, is never overwritten by a 0; the algorithm
is monotonic. It does no harm to the Warshall algorithm that one processor works
with matrix entries that are produced by another one that is ahead.
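A sequential Python sketch of the Warshall algorithm makes the monotonicity visible. It updates the matrix in place, so within one pass some entries that are read may already have been updated in that pass; since entries only change from 0 to 1, this does not affect the result. The helper `is_monotone_update` is a hypothetical name introduced for this illustration.

```python
def warshall(adj):
    """Transitive closure of a 0/1 adjacency matrix (in-place updates)."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1  # entries only ever change from 0 to 1
    return reach


def is_monotone_update(before, after):
    """True if no 1 in `before` has been replaced by a 0 in `after`."""
    return all(a >= b
               for row_b, row_a in zip(before, after)
               for b, a in zip(row_b, row_a))


# Directed path 0 -> 1 -> 2 -> 3.
path = [[0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0]]
closure = warshall(path)
```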
We get ACk if we require monotonic OR-PRAMs, but allow data-dependent
writes: It is clear how to simulate OR-gates, but what about the AND-gates? The
trick is to simulate AND-gates just like OR-gates by interpreting an input 1 as a
0 and vice versa. Then an output 1 of an AND-function computed in this way in fact
means 0, and 0 means 1. Data-dependent conditional write instructions are necessary to
realize such opposite interpretations of values for AND-gates.
What happens if we require a fully data-independent OR-PRAM to be
monotonic? We get SACk, the classes of languages recognized by semiunbounded
fan-in circuits of polynomial size and O(logk n) depth [64]. Venkateswaran [64]
proved that SAC1 is equal to LOGCFL, the class of languages logspace many-one
reducible to context-free languages [60]. Observe that for the subsequent theorem
we make use of the fact that, given that input word bits etc. are stored in negated
and unnegated form, we can assume w.l.o.g. the NC0-operations to be restricted to
ANDs and ORs (no negations).
Theorem 14. For k ≥ 1, we have L ∈ SACk if and only if L is recognized by a
simple, monotonic OR-PRAM A in time O(logk n), A has data-independent control,
read, and write structures, and CSA, RSA, and WSA are in DTIME(log n).
Proof. if: We again make use of the recursive functions Global and Locala in
order to construct a semiunbounded fan-in circuit for the simulation of PRAM A.
The construction is analogous to the proof of Theorem 6, so we will only
describe the differences.
The computation of Locala(t, p) is the same as in Theorem 6. It is decisive here
that A uses only monotonic operations, because in SACk-circuits negating gates are
not admissible. The computation of Global(t, i) is the same as in Theorem 6 except
for the following. For each bit of cell i, we have an unbounded fan-in OR-gate. The
inputs of these OR-gates are the respective bits of Localb(t−1, p), where p ranges
over all processors writing into i by an instruction ‘‘GLa := Lb’’ or ‘‘if La > 0 then
GLa := Lb.’’ This reflects the OR concurrent write feature of A. If no processor
writes, we connect Global(t, i) with Global(t−1, i).
only if: For the reverse direction, we refer to Theorems 5 and 6. Again we just
state the main changes we have to observe here. For the evaluation of bounded
fan-in OR-gates proceed as in Theorem 6: sequentially read all inputs of the gate. For
the evaluation of unbounded fan-in gates, A makes use of its OR feature in order
to avoid the necessity for data-dependent conditional writes. The simulation of the
circuit’s uniformity machine works as in Theorem 6. Note that Venkateswaran’s
SACk-circuits [64] are even DTIME(log n)-uniform. Altogether, in each case
monotonic instructions are sufficient. Data independence follows in the same way
as in Theorem 6. ∎
5. INDEX-PRAMS
In the previous section we obtained our results by requiring several structural
restrictions for CRCW-PRAMs. By way of contrast, Index-PRAMs in some sense
possess built-in data independence. There are several additional features to the basis
model of an Index-PRAM, which are to be chosen by the programmer. The rough
idea behind them is that the fewer the deviations from the basis model, the easier
the implementation on real distributed memory machines will be.
In the first subsection we introduce our basis model and a basic lemma. In the
second subsection we provide results analogous to the structural characterizations
of the preceding section.
5.1. The Model and Its Features
The central point in the definition of Index-PRAMs is the introduction of index
registers. Index registers are exclusively used for addressing global memory cells.
Consequently, we distinguish between three kinds of registers for PRAMs. By G we
refer to global registers, by L to local data registers, and by I to local index
registers. In general, local data registers are not used any longer to index global
memory cells, but for this purpose are replaced by index registers. We still have,
however, a constant number of local data and index registers per processor. We
allow index registers to depend only on the length of the input and the processor
identification number, but not on the input word. A typical instruction set for
Index-PRAMs is shown below. Compare it to that of (general) PRAMs given in
Subsection 2.2.
Constants: La := (constant), La := Length, or La := PIN; Ia := (constant),
Ia := Length, or Ia := PIN,
Global Write: GIa := Lb,
Global Read: La := GIb,
Local Assignment: La := Lb, La := Ib, or Ia := Ib,
Conditional (Local) Assignments: if (Input-Bit(Ic)) then La := Lb, if Ic > 0 then
Ia := Ib,
Monadic Operations: La := f(Lb),
Monadic and Binary Operations: Ia := f(Ib), or Ia := Ib ∘ Ic,
Jumps: goto Sa or if Ia > 0 then goto Sb,
Others: if Ic > 0 then Halt or if Ic > 0 then NoOp.
The condition ‘‘(Input-Bit(Ic))’’ in the above input conditional local assignment
means that here by Ic we give the position i of a bit in the input word w, where
1 ≤ i ≤ |w|, and the condition is true iff the bit is 1 and is false otherwise.
Our basis model of an Index-PRAM is as follows.
1. As can be seen in the given instruction set, binary operations are only
allowed for index registers, otherwise monadic operations are obligatory.
2. Only NC0-computable operations are admissible for monadic as well as
binary operations.
Obviously, the control and communication structure of an Index-PRAM are
data-independent. The following result quantifies the complexity of the control and
communication structures and is used several times in proving the results of the
following subsection.
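The built-in data independence can be observed on a toy interpreter. The following Python sketch is only an illustration under simplifying assumptions: it supports a small subset of the instruction set, only straight-line programs, 0-based input positions, and priority write resolution where the lowest PIN wins. The function name `run_index_pram`, the tuple encoding of instructions, and the program `COPY_INPUT` are hypothetical. Running the same program on two inputs of equal length yields identical traces of global accesses, while the memory contents differ.

```python
def run_index_pram(program, word, num_procs, mem_size):
    """Run a straight-line toy program synchronously on all processors.

    Returns the final global memory and the trace of global accesses as
    tuples (step, PIN, "R" or "W", address).
    """
    G = [0] * mem_size
    procs = [{"I": [0, 0], "L": [0, 0]} for _ in range(num_procs)]
    trace = []
    for step, ins in enumerate(program):
        op = ins[0]
        writes = {}
        for pin, regs in enumerate(procs):
            I, L = regs["I"], regs["L"]
            if op == "I_pin":        # Ia := PIN
                I[ins[1]] = pin
            elif op == "L_const":    # La := constant
                L[ins[1]] = ins[2]
            elif op == "if_input":   # if (Input-Bit(Ic)) then La := Lb
                _, c, a, b = ins
                if word[I[c]] == "1":
                    L[a] = L[b]
            elif op == "read":       # La := G[Ib]
                _, a, b = ins
                L[a] = G[I[b]]
                trace.append((step, pin, "R", I[b]))
            elif op == "write":      # G[Ia] := Lb
                _, a, b = ins
                # PINs are visited in ascending order, so the first
                # writer recorded for an address is the lowest PIN.
                writes.setdefault(I[a], L[b])
                trace.append((step, pin, "W", I[a]))
        for addr, val in writes.items():
            G[addr] = val
    return G, trace


# Processor p copies input bit p into global cell p.
COPY_INPUT = [
    ("I_pin", 0),
    ("L_const", 0, 0),
    ("L_const", 1, 1),
    ("if_input", 0, 0, 1),
    ("write", 0, 0),
]

mem_a, trace_a = run_index_pram(COPY_INPUT, "1011", 4, 4)
mem_b, trace_b = run_index_pram(COPY_INPUT, "0100", 4, 4)
```

Addresses are taken from index registers only, and index registers are never assigned from data registers or from the input, which is why the two traces coincide.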
Lemma 15. Let A be an Index-PRAM operating in time O(logk n). Then for
k ≥ 2, the control and communication structures CSA, RSA, and WSA are
recognizable in NCk. For k = 1, these sets are in weakly uniform NC1.
Proof. Let S1, S2, ..., SK be the program of A and c = O(log n) be the word
length of A; that is, each register of A can hold c bits. First we describe a circuit
U that will compute the statements executed by the processors and the contents of
their index registers. For each time step t, each processor p, each local index register
a, and each statement μ there are gates named State(t, p, μ) and Ibita(t, p, j) with
the following meanings:
1. Gate State(t, p, μ) evaluates to true if and only if in time step t processor
p executes statement Sμ, and
2. gate Ibita(t, p, j) evaluates to true if and only if the jth bit of index register
Ia of processor p after the tth time step is set to one.
We now give the interconnection structure between these gates which is influenced
by a construction in [41, Theorem 4.9].
Computation of State(t, p, μ). Gate State(t, p, μ) is the disjunction of the
following expressions:
State(t−1, p, μ′) for all μ′ such that Sμ′ is of the form goto Sμ,
State(t−1, p, μ′) ∧ Ibita(t−1, p, 0) for all μ′ and a such that Sμ′ is of the form
if Ia > 0 then goto Sμ,
State(t−1, p, μ−1) ∧ ¬Ibita(t−1, p, 0) for all μ′ and a such that Sμ−1 is of
the form if Ia > 0 then goto Sμ′,
State(t−1, p, μ) if Sμ is the Halt statement, and
State(t−1, p, μ−1) for all μ such that Sμ−1 is not a goto statement, a
conditional goto statement, or a Halt statement.
Computation of Ibita(t, p, j). Gate Ibita(t, p, j) is the disjunction of the
following expressions:
(constant)j ∧ State(t, p, μ) for all μ such that Sμ is of the form Ia := (constant),
where (constant)j denotes the jth bit of the constant,
pj ∧ State(t, p, μ) for all μ such that Sμ is of the form Ia := PIN, where pj
denotes the jth bit of p,
nj ∧ State(t, p, μ) for all μ such that Sμ is of the form Ia := Length, where nj
denotes the jth bit of n,
Ibitb(t−1, p, j) ∧ State(t, p, μ) for all μ such that Sμ is of the form Ia := Ib,
((Ibitb(t−1, p, j) ∧ Ibitc(t−1, p, 0)) ∨ (Ibita(t−1, p, j) ∧ ¬Ibitc(t−1, p, 0))) ∧
State(t, p, μ) for all μ such that Sμ is of the form if Ic > 0 then Ia := Ib,
outputj ∧ State(t, p, μ) for all μ such that Sμ is of the form Ia := Ib ∘ Ic, where
outputj is the jth output bit of an NC0 circuit computing the operation ∘ with inputs
Ibitb(t−1, p, ·) and Ibitc(t−1, p, ·), and
Ibita(t−1, p, j) ∧ State(t, p, μ) for all μ such that Sμ is neither an assignment
nor a conditional assignment to index register Ia.
The width of the disjunctions for State(t, p, +) and Ibita(t, p, j) is bounded by K,
the length of A’s program. Since K is constant, the circuit U consists of O(logk n)
layers of constant depth and hence has depth O(logk n) in total. Furthermore, it
should be clear that its direct connection language is in DTIME(log n). By
Lemma 1, U fulfills the stated uniformity conditions.
Using U the statement now follows from the following relations:
(a) (1^n, t, i, μ) ∈ CSA if and only if gate State(t, i, μ) evaluates to true.
(b) (1^n, t, i, j) ∈ RSA if and only if
\[
\bigvee_{S_\mu \text{ is } L_a := G_{I_b}} \Bigl( \mathrm{State}(t, i, \mu) \wedge \bigwedge_{1 \le m \le c} \bigl( j_m \equiv \mathrm{Ibit}_b(t-1, i, m) \bigr) \Bigr),
\]
where jm is the mth bit of j.
(c) (1^n, t, i, j) ∈ WSA if and only if
\[
\bigvee_{S_\mu \text{ is } G_{I_a} := L_b} \Bigl( \mathrm{State}(t, i, \mu) \wedge \bigwedge_{1 \le m \le c} \bigl( j_m \equiv \mathrm{Ibit}_a(t-1, i, m) \bigr) \Bigr).
\]
The last two expressions can be built in additional depth O(log c) = O(log log n).
The total depth stays O(logk n). ∎
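The address comparison in relations (b) and (c) is a conjunction of c bitwise equivalences, which a balanced tree of binary AND-gates evaluates in depth logarithmic in c. A small Python sketch of this check follows; the names `address_matches` and `and_tree_depth` are hypothetical, and bits are indexed from 0 here rather than from 1.

```python
def address_matches(j, ibits):
    """Conjunction over m of (j_m == Ibit(m)), where ibits[m] is the m-th
    bit of the index register and j_m is the m-th bit of the address j."""
    return all(((j >> m) & 1) == bit for m, bit in enumerate(ibits))


def and_tree_depth(width):
    """Depth of a balanced tree of binary AND-gates over `width` inputs."""
    depth = 0
    while width > 1:
        width = (width + 1) // 2
        depth += 1
    return depth
```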
We see that the following questions can be answered with an additional depth of
O(log n) not increasing the total depth of O(logk n):
• Is Ia = i in processor p after step t?
• Does p write into global cell i in step t?
• Is p the minimal processor writing into cell i at time t?
• Does p read from global cell i in step t?
• Does some processor write into global cell i in step t, and if so, which one?
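The questions about writers can be phrased as simple predicates over the write structure of one time step. The following Python sketch assumes the priority rule in which the processor with the smallest number wins; the name `winning_writer` and the list encoding are hypothetical.

```python
def winning_writer(write_addr, cell):
    """write_addr[p] is the global cell processor p writes in this step,
    or None if p does not write. Returns the minimal processor writing
    into `cell`, or None if no processor writes there."""
    writers = [p for p, addr in enumerate(write_addr) if addr == cell]
    return min(writers) if writers else None
```

Because the write addresses depend only on index registers, the result of such a predicate is the same for all inputs of a given length.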
The interested reader might wonder why we do not get any ACk-hard problems
in determining the control structure, as we did in the characterization results with
data independence. The reason for this is that our model of an Index-PRAM does
not have global index registers. Hence, any communication has to be treated as
data and cannot affect the control flow.
If in subsequent characterizations the Index-PRAM has to be enhanced by
removing one or another restriction or by allowing some additional feature, we
shall always explicitly indicate the deviations from the basis model. These
deviations will affect data independence of the communication structure, while the
control structure CSA will always stay data-independent, with a word problem in NCk.
5.2. Characterization Results
As in the previous section, we start with a characterization of ACk.
Theorem 16. For k ≥ 1, we have L ∈ ACk if and only if L is recognized by an
Index-CRCW-PRAM in time O(logk n) that is additionally equipped with the
instruction ‘‘if Lc > 0 then GIa := Lb.’’
Proof. if: This direction is clear, because a CRCW-PRAM can trivially simulate
an Index-CRCW-PRAM. The characterization of ACk by CRCW-PRAMs [59]
now yields the desired result.
only if: Due to Stockmeyer and Vishkin [59] we can assume that the ACk-circuit
to be simulated is DTIME(log n)-uniform.
Two things have to be done. First, by simulating the uniformity machine we set
up a pointer structure in global memory representing the circuit to be simulated. To
do so, each participating processor interprets its PIN as a pair (g, g$) of gates and
simulates in logarithmic time the uniformity machine to check whether g$ is a direct
predecessor of gate g and to determine the type of gate g. Observe that the
successors of TM-configurations (represented by the PINs of processors) can be
computed with NC0-operations [8, Vol. I, pp. 110–115].
Second, we simulate the circuit making use of the pointer structure. The
simulation of the ACk-circuit represented by a pointer structure now works in the
well-known way [59]. Each wire between two gates gets a processor. The processor
reads the value from the source gate logk n times and each time executes a
conditional write depending on the value read and the type of the sink gate. ∎
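The wire-per-processor simulation can be sketched sequentially in Python. This is a simplified illustration, not the construction of [59] itself: gate values are kept in two buffers so that every round reads only values of the previous round, gates are reset to the neutral element of their type at the start of each round, and negations are assumed to occur only at the inputs. The function name `simulate_unbounded_fanin` and the encoding of the circuit are hypothetical.

```python
def simulate_unbounded_fanin(gate_type, wires, inputs, rounds):
    """gate_type[g] is "IN", "OR", or "AND"; wires is a list of
    (source, sink) pairs; inputs maps input gates to bits."""
    neutral = {"OR": 0, "AND": 1}

    def reset(old):
        # Inputs keep their value, inner gates restart at the neutral element.
        return [old[g] if t == "IN" else neutral[t]
                for g, t in enumerate(gate_type)]

    value = [inputs.get(g, 0) if t == "IN" else neutral[t]
             for g, t in enumerate(gate_type)]
    for _ in range(rounds):
        new = reset(value)
        for src, dst in wires:        # one processor per wire
            bit = value[src]          # global read of the source gate
            if gate_type[dst] == "OR" and bit == 1:
                new[dst] = 1          # conditional global write
            elif gate_type[dst] == "AND" and bit == 0:
                new[dst] = 0          # conditional global write
        value = new
    return value


# Gates 0..2 are inputs, gate 3 = OR(0, 1, 2), gate 4 = AND(3, 1).
TYPES = ["IN", "IN", "IN", "OR", "AND"]
WIRES = [(0, 3), (1, 3), (2, 3), (3, 4), (1, 4)]
```

After r rounds all gates of depth at most r hold their correct values, so a number of rounds equal to the circuit depth suffices.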
Analogously, a characterization of NCk by Index-PRAMs can be given.
Theorem 17. For k ≥ 2, we have L ∈ NCk if and only if L is recognized by an
Index-CRCW-PRAM in time O(logk n) that additionally is equipped with the
data-conditional assignment ‘‘if Lc > 0 then La := Lb.’’
Proof. if: As in the proof of Theorem 6, we will construct a circuit family
C = (Cn)n≥1 recognizing L(A). Again the uniformity of the construction will be
done by a uniformity circuit family U of depth O(logk n) and again the simulation
of the PRAM by an NC-circuit works with the recursive functions Global and
Locala . We assign to each of these functions a bunch of logarithmically many gates
that represent the values of Global(t, i) and Locala(t, p). The main work is done by
the uniformity circuit U in computing the interconnection structure between gates.
Observe that the possibility to use the data-conditional assignment does not have
consequences for the control or communication structure since goto statements and
memory access are controlled by index registers. Hence, Lemma 15 still applies.
With the help of the information contained in CSA , RSA , and WSA , the structure
of C becomes trivial.
Computation of Global(t, i). First uniformity circuit U checks whether some
processor writes into cell i in step t. If not, Global(t, i) is connected to Global(t−1, i).
Otherwise, U determines the processor p which writes into i and determines the
statement Sμ executed by p in step t. Statement Sμ must be of the form ‘‘GIa := Lb’’
(cf. instruction set of Index-PRAMs). In this case, U connects Global(t, i) with
Localb(t−1, p).
Computation of Locala(t, p). First, U determines the statement Sμ executed by
p in step t. We have to distinguish the following cases:
1. Sμ ≡ ‘‘La := GIb’’: Here U determines the value of Ib = j and connects
Locala(t, p) with Global(t−1, j).
2. Sμ is neither a conditional nor an unconditional assignment to local cell
La. In this case, U connects Locala(t, p) with Locala(t−1, p).
3. All the other cases, i.e., (conditional) assignments to La with the exception
of a global read, are completely analogous to their corresponding counterparts in
Theorem 6.
In total, we get an NCk-circuit family C = (Cn)n≥1 with a direct connection
language in NCk. Applying Lemma 1 we get L(A) ∈ NCk.
only if: We will now describe a PRAM A simulating a circuit family
C = (Cn)n≥1 of size p(n). The idea of A is of course to repeat O(logk n) times a loop
in which for each gate g a processor takes the values of the left and the right
predecessor of g, combines these values according to the type of g, and writes the
result in global cell Gg. The problem is that these values are data-dependent and that
we have only monadic operations for the manipulation of data. Here, and only here,
we use the conditional assignments. If Lb and Lc hold boolean values, then the
assignment La := Lb ∨ Lc can be expressed by the two statements La := Lb; if
Lc > 0 then La := true and, dually, the statement La := Lb ∧ Lc is equivalent to
La := false; if Lc > 0 then La := Lb.
The structure of A is now as follows: in parallel, each processor Pp interprets and
decodes its PIN as a triple of gate names 1 ≤ g, g′, g″ ≤ p(n) and puts these
in its registers Ia, Ib, and Ic. Then, Pp simulates the uniformity machine of C and
computes in logarithmic time the type of gate g. If g is an unnegated input, Pp
stops. If it is a negated input, it writes 1 minus the content of Gg into G2g. If g is
an inner gate, Pp puts the type information in index register Id, where Id = 0 stands
for an ∨-gate and Id = 1 for an ∧-gate. Then Pp checks in time O(log n) whether
(1^n, g, 1, g′) and (1^n, g, 1, g″) are members of LEC(C). In case at least one of
these questions is answered negatively, Pp stops. Otherwise, we know that g′ and
g″ are the two predecessors of gate g. After these local steps without
communication, the ‘‘surviving’’ processors perform O(logk n) times the following loop:
1. Lb := GIb;
2. Lc := GIc;
3. La := Lb ∨ Lc;
4. if Id > 0 then La := Lb ∧ Lc;
5. GIa := La.
After the execution of this program, cell Gp(n) contains the bit output by Cn. ∎
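The loop and the replacement of binary boolean operations by conditional assignments can be traced in a short Python sketch. It is a sequential illustration under simplifying assumptions: the uniformity machine is replaced by an explicit table of gates, and a fresh copy of the memory is used per round to model synchronous steps. The names `or_by_conditional`, `and_by_conditional`, and `simulate_bounded_fanin` are hypothetical.

```python
def or_by_conditional(lb, lc):
    la = lb          # La := Lb
    if lc > 0:       # if Lc > 0 then La := true
        la = 1
    return la


def and_by_conditional(lb, lc):
    la = 0           # La := false
    if lc > 0:       # if Lc > 0 then La := Lb
        la = lb
    return la


def simulate_bounded_fanin(gates, inputs, rounds):
    """gates[g] = (kind, left, right) with kind 0 for OR and 1 for AND;
    inputs maps the cells of input gates to bits."""
    G = dict(inputs)
    for g in gates:
        G.setdefault(g, 0)
    for _ in range(rounds):
        new = dict(G)
        for g, (kind, left, right) in gates.items():  # one processor per gate
            lb = G[left]                        # 1. Lb := G[Ib]
            lc = G[right]                       # 2. Lc := G[Ic]
            la = or_by_conditional(lb, lc)      # 3. La := Lb OR Lc
            if kind > 0:                        # 4. test on index register Id
                la = and_by_conditional(lb, lc)
            new[g] = la                         # 5. G[Ia] := La
        G = new
    return G


# Cells 0..2 hold inputs, gate 3 = OR(0, 1), gate 4 = AND(3, 2).
GATES = {3: (0, 0, 1), 4: (1, 3, 2)}
```

Only step 4 branches on the gate type, which is held in an index register, so the read and write addresses of every round are the same for all inputs.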
Remark 18. For the case k = 1, these constructions give us that Index-PRAMs
augmented with a data-conditional assignment in logarithmic time recognize all
languages in NC1. On the other hand, Index-PRAMs recognize only languages in
weakly uniform NC1; i.e., they can be simulated by NC1-circuits with direct
connection language in NC1.
Compare Theorem 16 to Theorem 17. The only difference between ACk and NCk
within the framework of Index-PRAMs is that for ACk we are allowed to use
conditional global write instructions of the form ‘‘if Lc > 0 then GIa := Lb,’’ whereas
for NCk we only have ‘‘if Lc > 0 then La := Lb.’’
We proceed with a characterization of DSPACE(log n). In order to do this, it is
necessary to relax the fundamental concept of Index-CRCW-PRAMs. Up to now,
no data registers were allowed as indexing registers. Now we loosen this by admit-
ting that global reads may be data-dependent; that is, we additionally have an
instruction of the form ‘‘La :=GLb ’’ instead of only ‘‘La :=GIb .’’
Theorem 19. L # DSPACE(log n) if and only if L is recognized by an
Index-CRCW-PRAM in time O(log n) that additionally is equipped with the instruc-
tion ‘‘La :=GLb .’’
Proof. Let A denote the Index-PRAM and M the DSPACE(log n)-TM.
if: The possibility to use local registers to control global reads affects neither
the control nor the write structure. That is, Lemma 15 still gives us that CSA and
WSA are in weakly uniform NC1 and hence in DSPACE(log n). Also the values of
the index registers are computable in DSPACE(log n). Since we have no
data-conditional assignment of the form ‘‘if Lc > 0 then La := Lb,’’ all conditional
statements are controlled by index registers. That is also why the execution structure
ESA is in DSPACE(log n). Using Remark 10b, we can apply Theorem 9 which gives
L(A) ∈ DSPACE(log n).
only if: This direction is nearly identical to the corresponding part of
Theorem 9. The point is to be careful to distinguish between local registers and
index registers. The only subtlety is to simulate the conditional assignment ‘‘if
GLa > 0 then Lb := Next1(PIN)’’ which was used to set up in dependence of the
input the correct successor of a configuration. To do this, we compute the two
possible successors into local registers La and Lb and put the index of the input bit
in index register Ic. Then the statement ‘‘if (Input-Bit(Ic)) then La := Lb’’ does
the job. ∎
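The selection of the successor configuration by an input-conditional assignment can be illustrated on a toy machine. The following Python sketch is an assumption-laden illustration and not the construction of Theorem 9: the machine scans the input once and tracks the parity of the ones, a configuration is encoded as 2·position + parity, halting configurations point to themselves, and the subsequent pointer jumping with the data-dependent read is the standard doubling technique. The names `select_successor` and `final_configuration` are hypothetical, and input positions are 0-based.

```python
def select_successor(word, pos, next0, next1):
    la, lb = next0, next1      # both candidates are computed without the input
    if word[pos] == "1":       # if (Input-Bit(Ic)) then La := Lb
        la = lb
    return la


def final_configuration(word):
    n = len(word)
    num_configs = 2 * (n + 1)
    G = [0] * num_configs
    for p in range(num_configs):           # one processor per configuration
        pos, parity = divmod(p, 2)
        if pos == n:
            G[p] = p                       # halting configuration
        else:
            stay = 2 * (pos + 1) + parity
            flip = 2 * (pos + 1) + (1 - parity)
            G[p] = select_successor(word, pos, stay, flip)
    rounds = max(1, num_configs.bit_length())
    for _ in range(rounds):                # pointer jumping
        G = [G[G[p]] for p in range(num_configs)]  # La := G[Lb]
    return G[0]                            # reached from position 0, parity 0
```

The only place where the input word is inspected is the conditional assignment in `select_successor`; the later reads depend on data, but the writes of every round go to the fixed cell of each processor.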
Relaxing some of the restrictions in the characterization of DSPACE(log n), we
get a result corresponding to Theorem 11. For k=1, it yields a characterization of
LOGDCFL in terms of Index-PRAMs. We state this result without proof, which is
just an ‘‘index version’’ of the proof of Theorem 11.
Theorem 20. L is accepted by a CROW-PRAM in time O(logk n) if and only if
L is recognized by an Index-CRCW-PRAM in time O(logk n) that additionally is
equipped with the instructions ‘‘La := GLb’’ and ‘‘if Lc > 0 then La := Lb’’ and a
standard PRAM operation set with binary operations.
Again the difference between CROW-PRAMs and PPMs lies in the operation set
used, as Lam and Ruzzo [40] already demonstrated. Again we state this result
without proof which works basically in the same way as the proof of Theorem 13.
Theorem 21. For k ≥ 1, we have L ∈ PPM-TIME(logk n) if and only if L is
recognized by an Index-CRCW-PRAM in time O(logk n) that additionally is
equipped with the instructions ‘‘La := GLb’’ and ‘‘if Lc > 0 then La := Lb.’’
Remark 22. We see that the only difference between PPM-TIME(log n) and
DSPACE(log n) with respect to the Index-PRAM characterization is that for the
first we may use conditional instructions of the form ‘‘if Lc > 0 then La := Lb,’’
whereas for the latter we may only use ‘‘if (Input-Bit) then La := Lb.’’
We conclude this section with an Index-PRAM characterization of the semiun-
bounded fan-in circuit class SACk. This parallels the structural characterization
given in Theorem 14. Again we use monotonicity and Akl’s OR-PRAMs [4],
resulting in a natural way in monotonic Index-OR-PRAMs.
Theorem 23. For k ≥ 1, we have L ∈ SACk if and only if L is recognized by a
monotonic Index-OR-PRAM in time O(logk n) that additionally is equipped with the
instruction ‘‘if Lc > 0 then La := Lb.’’
The proof of Theorem 23 is a straightforward combination of the arguments in
the proofs of Theorems 14 and 17 and is therefore omitted.
6. CONCLUSION
We investigated the concept of data independence from the complexity theoretic
viewpoint. This notion provides a unifying framework relating many sequential and
parallel models. It allows the grouping of various complexity classes into three
levels:
Full data independence characterizes parallel models of bounded fan-in.
Data-dependent read access joins several sequential and parallel classes.
Data-dependent write access characterizes parallel models of unbounded fan-in.
Apparently, the class OROW-TIME(logk n) is missing at the second level.
Both OROW-PRAMs and PPMs are restrictions of CROW-PRAMs. The first
is obtained by forbidding concurrent read access, the second by forbidding
binary operations. Aside from the search for an Index-PRAM characterizing
OROW-PRAMs, this also poses the question whether OROW-PRAMs without
binary operations characterize DSPACE(log n).
One might ask for the place of the exclusive write concept in this picture. Results
in [41, 47] showed closest relations to unambiguous computations. In particular,
classes defined by exclusive write PRAMs can be regarded as promise classes; i.e.,
the exclusive write access can only be guaranteed if the input fulfills some property
which cannot be checked by the PRAM itself without breaking the exclusive write.
A consequence is the apparent lack of complete problems for exclusive write classes.
Thus, it is no surprise that the concept of exclusive write has not been captured in
the framework of data independence.
Data independence of parallel algorithms appears to be a fundamental prerequi-
site for an efficient implementation on existing distributed memory machines which
may be closer to parallelism of bounded fan-in. Data independence of communica-
tion and control gives the opportunity to optimize parallel algorithms with respect
to their communication pattern at compile time. Let us close with a discussion of
the complexity-theoretical relevance of data (in)dependence. As a parallel analogue
of the fruitful notion of NP-completeness and its opposition to P-membership,
parallel complexity theory offers the opposition of P-completeness to NC-mem-
bership, the former as a demonstration of a problem being inherently sequential
and the latter as proof of a problem being efficiently parallelizable. But in reality
not all NC-algorithms are efficient [39] and there are P-complete problems that are
in a very intuitive sense efficiently parallelizable [68].
The main reason for this problem lies in the fact that all notions of reducibility
used so far allow for a polynomial growth of the output [39]. Hence, the resulting
complexity classes are closed under ‘‘polynomially bounded padding.’’ But in order
to be able to distinguish, for example, between a quadratic and a cubic resource
bound or to work with an appropriate notion of speedup and work load, we would
need reducibilities that are based on a linear or quasilinear growth of the output
length [30, 46, 56]. We note here that nearly all results obtained in this paper rely
heavily on this freedom for polynomial growth.
A very different question is that of choosing an appropriate machine model. Our
results give further evidence that in current parallel complexity theory both the
machine model to define classes (e.g., PRAMs and circuits of unbounded fan-in)
and the machine model to define reducibilities (e.g., space-bounded Turing
machines) are not appropriate. The comparison of Theorem 6 with Theorem 9
specifically shows that DSPACE(log n) reductions spoil the communication struc-
ture. The current notions of reducibility are based on sequential machines and thus
by Theorem 9 are burdened with a data-dependent read structure. Hence, they can-
not distinguish between data dependence and data independence of communication
structures. In particular, it is possible to reduce a data-dependent computation to
a data-independent one. This defect does not matter when working with PRAMs or
circuits of unbounded fan-in, but should matter when working with more realistic
models like distributed memory machines. That underpins for the field of efficient
parallel computation the importance of the development of reducibility notions that
are finer than DSPACE(log n) reductions. As a consequence of our work, these
reducibilities should not only have (quasi)linear output length, but should, in addi-
tion, be based on Index-PRAMs or circuits of bounded fan-in.
Our results also shed some light on the classification of PRAMs according to
their read and write access to global memory. In Subsection 2.2 we gave the current
classification scheme for PRAMs and presented the OROW-PRAM as the weakest
model. Following the same argumentation as before, Rossmanith’s inclusion
DSPACE(log n) ⊆ OROW-TIME(log n) [54] expresses the inadequacy of the
XRYW classification scheme of PRAMs as a criterion with respect to implemen-
tability on existing parallel machines. This is due to the observation that even
DSPACE(log n) seems to be too powerful for a simulation by fully data-indepen-
dent PRAMs in logarithmic time. With the presence of a concrete machine model
like the Index-PRAM, the possibility arises to develop algorithms that are
efficiently implementable on existing and foreseeable parallel machines.
In Table 1 we summarize the main results of our work. The purpose of this table
is to highlight the main differences between various complexity classes within our
framework of data independence and Index-PRAMs.
TABLE 1

Class          PRAM type                  RS  WS  Deviations from the basis model
NCk            Simple                     I   I   ‘‘if Lc > 0 then La := Lb’’
DSPACE(log n)  Simple (a)                 D   I   ‘‘La := GLb’’
PPMk           Simple                     D   I   ‘‘if Lc > 0 then La := Lb’’ and ‘‘La := GLb’’
SACk           Simple, monotonic OR-PRAM  I   I   ‘‘if Lc > 0 then La := Lb,’’ monotonic, and OR-write
ACk            Simple                     I   D   ‘‘if Lc > 0 then GIa := Lb’’

Note: The PRAM type refers to the definition of simple PRAMs in Section 2. We always assume a
data-independent control structure contained in DTIME(log n). I, the corresponding structure is
data-independent and contained in DTIME(log n); D, data dependence.
(a) Here, in contrast to all other cases, we also have to require that the execution structure ES is
contained in DTIME(log n).

One direction for further research emerging from our work is to investigate far
more combinations of the restrictions applicable to PRAMs. It would be interesting
to find further classification criteria other than data independence and
monotonicity that play an important role for the transfer of PRAM algorithms to
realistic parallel machines. Among the many ideas in this direction we refer the
reader to the papers [2, 3, 12, 19, 32, 43, 58, 65] and many others. A matter of
special interest could be to analyze and classify from a complexity theoretic point of
view various degrees of synchronization that are necessary to implement parallel
algorithms in a distributed environment, i.e., in a concurrent system without a
global clock as it is still present in the Index-PRAM.
ACKNOWLEDGMENTS
Thanks to Carsten Damm and Markus Holzer for stimulating discussions and helpful comments.
In particular, we are grateful to Peter Rossmanith for useful suggestions improving presentation and
results and to a referee of Journal of Computer and System Sciences for insightful remarks and proposals
concerning the organization of the introduction and the treatment of uniformity issues in the paper.
REFERENCES
1. F. Abolhassan, R. Drefenstedt, J. Keller, W. J. Paul, and D. Scheerer, On the physical design of
PRAMs, Comput. J. 36 (1993), 756–762.
2. A. Aggarwal, A. K. Chandra, and M. Snir, On communication latency in PRAM computations, in
‘‘Proceedings of the 1st ACM Symposium on Parallel Algorithms and Architectures,’’ pp. 11–21,
1989.
3. A. Aggarwal, A. K. Chandra, and M. Snir, Communication complexity of PRAMs, Theoret. Comput.
Sci. 71 (1990), 3–28.
4. S. G. Akl, On the power of concurrent memory access, in ‘‘Comput. and Inform.,’’ pp. 49–55, 1989.
5. G. S. Almasi and A. Gottlieb, ‘‘Highly Parallel Computing,’’ 2nd ed., Benjamin/Cummings,
Redwood City, CA, 1994.
6. R. J. Anderson and G. L. Miller, Deterministic parallel list ranking, Algorithmica 6 (1991), 859–868.
7. J. Balcázar, J. Díaz, and J. Gabarró, ‘‘Structural Complexity Theory II,’’ Springer-Verlag,
Berlin/New York, 1990.
8. J. Balcázar, J. Díaz, and J. Gabarró, ‘‘Structural Complexity Theory I,’’ 2nd ed., Springer-Verlag,
Berlin/New York, 1995.
9. G. Bilardi and F. P. Preparata, Horizons of parallel computation, J. Parallel Distrib. Comput. 27
(1995), 172–182.
10. D. P. Bovet and P. Crescenzi, ‘‘Introduction to the Theory of Complexity,’’ Prentice-Hall, New
York, 1994.
11. S. R. Buss, The Boolean formula problem is in ALOGTIME, in ‘‘Proceedings of the 19th ACM
Symposium on Theory of Computing,’’ pp. 123–131, 1987.
12. A. Chin, Complexity models for all-purpose parallel computation, in ‘‘Lectures on Parallel
Computation,’’ Cambridge International Series on Parallel Computation, Chap. 14, pp. 393–404,
Cambridge Univ. Press, Cambridge, MA, 1993.
13. R. Cole and U. Vishkin, Approximate parallel scheduling, Part I: The basic technique with
applications to optimal parallel list ranking in logarithmic time, SIAM J. Comput. 17 (1988), 128–142.
14. S. A. Cook, Characterizations of pushdown machines in terms of time-bounded computers, J. Assoc.
Comput. Mach. 18 (1971), 4–18.
15. S. A. Cook, Towards a complexity theory of synchronous parallel computation, Enseign. Math. 27
(1981), 99–124.
16. S. A. Cook, A taxonomy of problems with fast parallel algorithms, Inform. and Control 64 (1985),
2–22.
17. S. A. Cook, C. Dwork, and R. Reischuk, Upper and lower time bounds for parallel random access
machines without simultaneous writes, SIAM J. Comput. 15 (1986), 87–97.
18. S. A. Cook and P. W. Dymond, Parallel pointer machines, Comput. Complexity 3 (1993), 19–30.
19. D. E. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and
T. von Eicken, LogP: A practical model of parallel computation, Commun. Assoc. Comput. Mach.
39(11) (1996), 78–85.
20. C. Damm, M. Holzer, and P. Rossmanith, Expressing uniformity via oracles, Theory Comput.
Systems 30 (1997), 355–366.
21. P. de la Torre and C. P. Kruskal, Towards a single model of efficient computation in real parallel
machines, Future Generation Comput. Systems 8 (1992), 395–408.
22. P. W. Dymond and S. A. Cook, Hardware complexity and parallel computation, in ‘‘Proceedings of
the 21st IEEE Conference on Foundations of Computer Science,’’ pp. 360–372, 1980.
23. P. W. Dymond, F. E. Fich, N. Nishimura, P. Ragde, and W. L. Ruzzo, Pointers versus arithmetic
in PRAMs, J. Comput. System Sci. 53 (1996), 218–232.
24. P. W. Dymond and W. L. Ruzzo, Parallel RAMs with owned global memory and deterministic
language recognition, in ‘‘Proceedings of the 13th International Conference on Automata,
Languages, and Programming,’’ Lecture Notes in Computer Science, Vol. 226, pp. 95104, Springer-
Verlag, 1986.
25. H. Fernau, K.-J. Lange, and K. Reinhardt, Advocating ownership, in ‘‘Proceedings of the 16th Con-
ference on Foundations of Software Technology and Theoretical Computer Science, India, Dec.
1996,’’ Vol. 118, pp. 286297, Springer-Verlag, Berlin, 1996.
26. S. Fortune and J. Wyllie, Parallelism in random access machines, in ‘‘Proceedings of the 10th ACM
Symposium on Theory of Computing,’’ pp. 114118, Assoc. Comput. Mach., New York, 1978.
27. A. Gibbons and P. Spirakis, Eds., ‘‘Lectures on Parallel Computation,’’ Cambridge International
Series on Parallel Computation, Cambridge University Press, Cambridge, UK, 1993.
28. L. M. Goldschlager, A unified approach to models of synchronous parallel machines, in
‘‘Proceedings of the 10th ACM Symposium on Theory of Computing,’’ pp. 8994, Assoc. Comput.
Mach., New York, 1978.
142 LANGE AND NIEDERMEIER
29. D. Gomm, M. Heckner, K.-J. Lange, and G. Riedle, On the design of parallel programs for
machines with distributed memory, in ‘‘Proceedings of the 2nd European Conference on Distributed
Memory Computing Munich, Federal Republic of Germany, April 1991’’ (A. Bode, Ed.), Lecture
Notes in Computer Science, Vol. 48, pp. 381391, Springer-verlag, Berlin, 1991.
30. E. Grandjean, Linear time algorithms and NP-complete problems, in ‘‘6th Workshop on Computer
Science Logic,’’ Lecture Notes in Computer Science, Vol. 742, pp. 248273, Springer-Verlag, Berlin,
1992.
31. T. J. Harris, A survey of PRAM simulation techniques, Assoc. Comput. Mach. Comput. Surveys 26
(1994), 187206.
32. T. Heywood and C. Leopold, Models of parallelism, in ‘‘Abstract Machine Models for Highly
Parallel Computers’’ (J. R. Davy and P. M. Dew, Eds.), Chap. 1, pp. 116, Oxford University Press,
Oxford, 1995.
33. T. Heywood and S. Ranka, A practical hierarchical model of parallel computation: I. The model,
J. Parallel Distrib. Comput. 16 (1992), 212232.
34. J.-W. Hong, On similarity and duality of computation, Inform. and Control 62 (1984), 109128.
35. J. E. Hopfcroft and J. D. Ullman, ‘‘Introduction to Automata Theory, Languages and Computa-
tion,’’ AddisonWesley, Reading, MA, 1979.
36. J. Ja Ja , ‘‘An Introduction to Parallel Algorithms,’’ AddisonWesley, Reading, MA, 1992.
37. D. S. Johnson, A catalog of complexity classes, in ‘‘Algorithms and complexity’’ (J. van Leeuwen,
Ed.), Handbook of Theoretical Computer Science, Vol. A, Chap. 2, pp. 67161, Elsevier, Amsterdam,
1990.
38. R. M. Karp and V. Ramachandran, Parallel algorithms for shared-memory machines, in ‘‘Algorithms
and Complexity’’ (J. van Leeuwen, Ed.), Handbook of Theoretical Computer Science, Vol. A,
pp. 869941, Elsevier, Amsterdam, 1990.
39. C. P. Kruskal, L. Rudolph, and M. Snir, A complexity theory of efficient parallel algorithms,
Theoret. Comput. Sci. 71 (1990), 95132.
40. T. W. Lam and W. L. Ruzzo, The power of parallel pointer manipulation, in ‘‘Proceedings of the
1st ACM Symposium on Parallel Algorithms and Architectures,’’ pp. 92102, 1989.
41. K.-J. Lange, Unambiguity of circuits, Theoret. Comput. Sci. 107 (1993), 7794.
42. K.-J. Lange and P. Rossmanith, Unambiguous polynomial hierarchies and exponential size, in
‘‘Proceedings of the 9th IEEE Symposium on Structure in Complexity,’’ pp. 106115, 1994.
43. C. E. Leiserson and B. M. Maggs, Communication-efficient parallel algorithms for distributed
random-access machines, Algorithmica 3 (1988), 5377.
44. T. G. Lewis and H. El-Rewini, ‘‘Introduction to Parallel Computing,’’ PrenticeHall, New York,
1992.
45. W. F. McColl, General purpose parallel computing, in ‘‘Lectures on Parallel Computations’’
(A. Gibbons, and P. Spirakis, Eds.), Cambridge International Series on Parallel Computation,
Chap. 13, pp. 337391, Cambridge University Press, Cambridge, 1993.
46. A. V. Naik, K. W. Regan, and D. Sivakumar, Quasilinear time complexity theory, in ‘‘Proceedings
of the 11th Symposium on Theoretical Aspects of Computer Science’’ (P. Enjalbert, E. W. Mayr, and
K. W. Wagner, Eds.), Lecture Notes in Computer Science, Vol. 775, pp. 97108, Springer-Verlag,
Berlin, 1994.
47. R. Niedermeier and P. Rossmanith, Unambiguous auxiliary pushdown, automata and semi-unbounded
fan-in circuits, Inform. and Comput. 118 (1995), 227245.
48. I. Niepel and P. Rossmanith, Uniform circuits and exclusive read PRAMs, in ‘‘Proceedings of the
11th Conference on Foundations of Software Technology and Theoretical Computer Science, New
Delhi, India, Dec. 1991’’ (S. Biswas and K. V. Nori, Eds.), Lecture Notes in Computer Science,
Vol. 560, pp. 307318, Springer-Verlag, Berlin, 1991.
49. C. H. Papadimitriou, ‘‘Computational Complexity,’’ AddisonWesley, Reading, MA, 1994.
50. I. Parberry, ‘‘Parallel Complexity Theory,’’ Pitman, London, 1987.
143DATA INDEPENDENCE IN PRAM COMPUTATIONS
51. A. G. Ranade, How to emulate shared memory, J. Comput. Systems Sci. 42 (1991), 307326.
52. K. W. Regan, A new parallel vector model, with exact Characterization of NC k, in ‘‘Proceedings of
the 11th Symposium on Theoretical Aspects of Computer Science’’ (P. Enjalbert, E. W. Mayr, and
K. W. Wagner, Eds.), Lecture Notes in Computer Science, Vol. 775, pp. 289300, Springer-Verlag,
Berlin, 1994.
53. J. H. Reif, ‘‘Synthesis of Parallel Algorithms,’’ Morgan Kaufman, San Mateo, CA, 1993.
54. P. Rossmanith, The owner concept for PRAMs, in ‘‘Proceedings of the 8th Symposium on Theoreti-
cal Aspects of Computer Science Hamburg, Federal Republic of Germany, Feb. 1991’’ (C. Choffrut
and M. Jantzen, Eds.), Lecture Notes in Computer Science, Vol. 480, pp. 172183, Springer-Verlag,
Berlin, 1991.
55. W. L. Ruzzo, On uniform circuit complexity, J. Comput. Systems Sci. 22 (1981), 365383.
56. C. P. Schnorr, Satisfiability is quasilinear complete in NQL, J. Assoc. Comput. Mach. 25 (1978),
136145.
57. H. J. Siegel, S. Abraham, W. L. Batcher, T. L. Casavant, D. DeGroot, J. B. Dennis, D. C. Douglas,
T.-Y. Feng, J. R. Goodman, A. Huang, H. F. Jordan, J. R. Jump, Y. N. Patt, A. J. Smith, J. E. Smith,
L. Snyder, H. S. Stone, R. Tuck, and B. W. Wah, Report on the Purdue Workshop on Grand
Challenges in computer architecture for the support of high performance computing,
J. Parallel Distrib. Comput. 16 (1992), 199211.
58. M. Snir, Scalable parallel computers and scalable parallel codes: From theory to practice, in
‘‘Proceedings of the 1st Heinz Nixdorf Symposium on Parallel Architectures and Their Efficient Use,
Paderborn, Federal Republic of Germany, 1993’’ (F. Meyer auf der Heide, B. Monien, and A. L.
Rosenberg, Eds.), Lecture Notes in Computer Science, Vol. 678, pp. 176184, Springer-Verlag,
Berlin, 1993.
59. L. Stockmeyer and U. Vishkin, Simulation of parallel random access machines by circuits, SIAM J.
Comput. 13 (1984), 409422.
60. I. H. Sudborough, On the tape complexity of deterministic context-free languages, J. Assoc. Comput.
Mach. 25 (1978), 405414.
61. L. G. Valiant, A bridging model for parallel computation, Commun. Assoc. Comput. Mach. 33
(1990), 103111.
62. L. G. Valiant, General purpose parallel architectures, in ‘‘Algorithms and Complexity’’ (J. van
Leeuwen, Eds.), Handbook of Theoretical Computer Science, Vol. A, Chap. 18, pp. 943971,
Elsevier, Amsterdam, 1990.
63. J. van Leeuwen, Ed., ‘‘Algorithms and Complexity,’’ Handbook of Theoretical Computer Science,
Vol. A, Elsevier, Amsterdam, 1990.
64. H. Venkateswaran, Properties that characterize LOGCFL, J. Comput. Systems Sci. 43 (1991),
380404.
65. U. Vishkin, Preliminary announcement of workshop on ‘‘Suggesting Computer Science Agenda(s)
for High-Performance Computing,’’ Communicated by e-mail on TheoryNet, January 1994.
66. U. Vishkin and A. Wigderson, Dynamic parallel memories, Inform. and Control 56 (1983), 174182.
67. P. Vita nyi, Locality, communication, and interconnect length in multicomputers, SIAM J. Comput.
17 (1988), 659672.
68. J. S. Vitter and R. A. Simons, New classes for parallel complexity: A study of unification and other
complete problems for P, IEEE Trans. Comput. 35 (1986), 403418.
69. K. W. Wagner and G. Wechsung, ‘‘Computational Complexity,’’ Reidel, Reidel, Dordrecht and
VEB, Berlin, 1986.
144 LANGE AND NIEDERMEIER
