Size-Time Complexity of Boolean Networks for Prefix Computations by Bilardi, G. & Preparata, F.P.
January 1987 UILU-ENG-87-2202 
ACT-74
COORDINATED SCIENCE LABORATORY
College of Engineering
Applied Computation Theory
SIZE-TIME 
COMPLEXITY OF 
BOOLEAN NETWORKS 
FOR PREFIX 
COMPUTATIONS
G. Bilardi 
F. P. Preparata
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Approved for Public Release. Distribution Unlimited.
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE
REPORT DOCUMENTATION PAGE
1a. REPORT SECURITY CLASSIFICATION: Unclassified
1b. RESTRICTIVE MARKINGS: None
2a. SECURITY CLASSIFICATION AUTHORITY: N/A
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE: N/A
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S): UILU-ENG-87-2202 (ACT-74)
5. MONITORING ORGANIZATION REPORT NUMBER(S): N/A
6a. NAME OF PERFORMING ORGANIZATION: Coordinated Science Lab, University of Illinois
6b. OFFICE SYMBOL (If applicable): N/A
6c. ADDRESS (City, State and ZIP Code): 1101 W. Springfield Avenue, Urbana, Illinois 61801
7a. NAME OF MONITORING ORGANIZATION: National Science Foundation
7b. ADDRESS (City, State and ZIP Code): 1800 G. Street, NW, Washington, DC 20550
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: National Science Foundation
8b. OFFICE SYMBOL (If applicable): N/A
8c. ADDRESS (City, State and ZIP Code): 1800 G. Street, NW, Washington, DC 20550
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER: DCI-8602256 and ECS-84-10902
10. SOURCE OF FUNDING NOS.: PROGRAM ELEMENT NO. N/A; PROJECT NO. N/A; TASK NO. N/A; WORK UNIT NO. N/A
11. TITLE (Include Security Classification): "Size-Time Complexity of Boolean Networks for Prefix Computations"
12. PERSONAL AUTHOR(S): G. Bilardi and F.P. Preparata
13a. TYPE OF REPORT: Technical
13b. TIME COVERED: FROM ____ TO ____
14. DATE OF REPORT (Yr., Mo., Day): January 1987
15. PAGE COUNT: 16
16. SUPPLEMENTARY NOTATION: N/A
17. COSATI CODES: FIELD / GROUP / SUB. GR.
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number): prefix problem, boolean networks, cycle-freedom
20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED X  SAME AS RPT.  DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL:
22b. TELEPHONE NUMBER (Include Area Code):
22c. OFFICE SYMBOL: NONE
DD FORM 1473, 83 APR. EDITION OF 1 JAN 73 IS OBSOLETE. UNCLASSIFIED
SIZE-TIME COMPLEXITY OF BOOLEAN NETWORKS FOR PREFIX COMPUTATIONS
G. Bilardi¹ and F.P. Preparata²
ABSTRACT
The prefix problem consists of computing all the products $x_0 x_1 \cdots x_j$ ($j = 0, \ldots, N-1$), given a sequence $x = (x_0, x_1, \ldots, x_{N-1})$ of elements in a semigroup. In this paper we completely characterize the size-time complexity of computing prefixes with boolean networks, which are synchronized interconnections of boolean gates and one-bit storage devices. This complexity crucially depends upon a property of the underlying semigroup, which we call cycle-freedom (no cycle of length greater than one in the Cayley graph of the semigroup). Denoting by $S$ and $T$ size and computation time, respectively, we have $S = \Theta((N/T)\log(N/T))$ for non-cycle-free semigroups, and $S = \Theta(N/T)$ for cycle-free semigroups. In both cases, $T \in [\Omega(\log N), O(N)]$.
1. Introduction
The prefix problem consists of computing all the products $x_0 x_1 \cdots x_j$ ($j = 0, \ldots, N-1$), given a sequence $x = (x_0, x_1, \ldots, x_{N-1})$ of elements in a semigroup. Prefix computations occur in the solution of several significant problems such as carry-look-ahead addition [BK82], the evolution of finite-state machines [LF80], linear recurrences [K78], digital filtering [BP86b], various graph problems [KRS85], sorting in bit-models of computation [CS85, BP85], and others.
The prefix problem has been extensively investigated in the boolean-circuit model, where the computation is carried out by an acyclic network of combinational gates. Various complexity measures such as size, depth, width, and their trade-offs have been studied in this context [LF80, F83, CFL83, S86]. Algorithms for the EREW-PRAM model have been proposed in [KRS85].
In this paper we study the complexity of computing prefixes with boolean networks, which are 
synchronized interconnections of boolean gates and one-bit storage devices. Relevant measures are 
computation time T and size S, defined as the total number of components (combinational and 
sequential) in the network. Our model of computation is essentially the same as the aggregate of 
[DC80], from which it differs only in the input/output conventions. Both models afford the study of 
the role of sequential logic in circuits, and allow the consideration of circuits of size sublinear in the input size. Results on boolean networks also have interesting implications for other models of parallel computation such as fixed interconnections of processors and VLSI circuits [T80].
¹Department of Computer Science, Cornell University, Ithaca, NY 14853. Research supported in part by NSF Grant DCI-8602256.
²Coordinated Science Laboratory, Departments of Electrical and Computer Engineering and of Computer Science, University of Illinois at Urbana-Champaign, IL 61801, USA. Research supported in part by NSF Grant ECS-84-10902.
We have found that the size-time complexity of the prefix problem is determined by a property of the underlying semigroup, which we call cycle-freedom. We call a semigroup cycle-free if its Cayley graph has no cycle of length greater than one and non-cycle-free otherwise. Our results, which completely characterize the size-time complexity of the prefix problem, are encompassed by the following theorem, which summarizes Theorems 3, 4, and 5.
Theorem 1. The size-time complexity of the prefix problem on a boolean network is $S = \Theta((N/T)\log(N/T))$ for non-cycle-free semigroups, and $S = \Theta(N/T)$ for cycle-free semigroups. In both cases, $T \in [\Omega(\log N), O(N)]$.
For non-cycle-free semigroups, the upper bound can be achieved by known constructions based 
on binary-tree networks, or twisted-reflected-tree networks [LF80, BK82, BP86b], whereas the 
lower bound (Section 3, Theorem 3) is less obvious, and is based on arguments of computational 
friction [BP86a]. For cycle-free semigroups, the lower bound is based on a trivial input/output 
argument, while the upper bound is achieved by a rather sophisticated algorithm (Section 4, 
Theorem 5) executed by a tree-connected network.
It may be interesting to contrast Theorem 1 with the result of [CFL83] that there are constant-depth, polynomial-size (unbounded fan-in) boolean circuits to compute prefixes for a semigroup if and only if the semigroup is group-free, an attribute weaker than cycle-freedom (every cycle-free semigroup is group-free).
A result completely analogous to Theorem 1 could be stated for the area-time complexity of
prefix computation in the VLSI model [T80]. Indeed, since a VLSI circuit is the layout of a boolean 
network, size-time lower bounds for the latter immediately translate into area-time lower bounds 
for the former. In general, area is larger than size due to space occupied by wires. However, the 
circuits considered in this paper can be laid out so that the total wire area is of the same order as 
the size, hence they are area-time optimal as well as size-time optimal.
2. Definitions and Problem Statement
A finite semigroup is a pair $\langle A, * \rangle$ where $A$ is a set of size $s$ and $*$ is an associative binary operation on $A$, which we call product. We denote by $xy$ the product of elements $x, y \in A$. A finite monoid is a finite semigroup with a distinguished element $e$, called the identity, such that $xe = ex = x$ for all $x \in A$. Any semigroup can be easily transformed into a monoid by the addition of an element with the properties of the identity.
For a sequence $x = (x_0, x_1, \ldots, x_{N-1}) \in A^N$, the sequence of prefixes of $x$ is defined as $y = (y_0, y_1, \ldots, y_{N-1})$, with $y_j = x_0 x_1 \cdots x_j$. The prefix problem consists of computing $y$ from $x$. In the study of the complexity of the prefix problem, an important role is played by the Cayley graph $G(A) = (A, E)$ of $A$, containing for each ordered pair $(x, y)$ an arc of the form $(x, xy)$, labelled by $y$. It is easy to see that each node of $G(A)$ has out-degree $s$, that the labels of the self-loops of a given node always form a subsemigroup of $A$, and that $G(A)$ is transitively closed.
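These three properties can be checked mechanically on a small example. The sketch below is an illustration only, not part of the report: it assumes a finite semigroup is given as a Python list `elements` plus a two-argument function `op` for the product, and the helper names (`cayley_graph`, `self_loop_labels`, `is_closed_under`, `is_transitively_closed`) are hypothetical.

```python
from itertools import product

def cayley_graph(elements, op):
    """Arcs of G(A) as (tail, head, label): one arc (x, xy) labelled y
    for each ordered pair (x, y), so every node has out-degree s = |A|."""
    return [(x, op(x, y), y) for x, y in product(elements, repeat=2)]

def self_loop_labels(elements, op, x):
    """Labels y of the self-loops at node x, i.e. the solutions of xy = x."""
    return {y for y in elements if op(x, y) == x}

def is_closed_under(op, subset):
    """True iff `subset` is closed under the product (a subsemigroup)."""
    return all(op(a, b) in subset for a in subset for b in subset)

def is_transitively_closed(elements, op):
    """x -> z and z -> u imply x -> u: arcs x -> xy and xy -> xyw give x -> x(yw)."""
    succ = {x: {op(x, y) for y in elements} for x in elements}
    return all(succ[z] <= succ[x] for x in elements for z in succ[x])

# Example: {0, 1, 2, 3} with xy = min(x + y, 3) (threshold addition).
elems = [0, 1, 2, 3]
sat = lambda a, b: min(a + b, 3)
arcs = cayley_graph(elems, sat)
print(len(arcs), self_loop_labels(elems, sat, 2), is_transitively_closed(elems, sat))
# prints: 16 {0} True
```

Enumerating all $s^2$ arcs is only practical for small semigroups, which is all that is needed to experiment with the definitions.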
We call a semigroup cycle-free (CF) if the only cycles in its Cayley graph are self-loops, and non-cycle-free (NCF) otherwise (we avoid the term "cyclic" here, since it has a different established meaning in group theory). We shall see that cycle-freedom is the crucial property of a semigroup in determining the complexity of the prefix problem. Among CF semigroups, of particular interest are insertion semigroups, characterized by the following property: for all $x, y, z, w \in A$,
$xyz = xy \;\Rightarrow\; xwyz = xwy. \qquad (1)$
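Both definitions can be tested by brute force on small semigroups. The following is a minimal sketch under the same assumptions as before (elements as a Python list, product as a function `op`; the function names are hypothetical). The cycle-freedom test relies on $G(A)$ being transitively closed: any cycle through two distinct nodes yields distinct $x, z$ with arcs in both directions, so checking pairs of one-step successors suffices.

```python
from itertools import product

def is_cycle_free(elements, op):
    """CF test: the only cycles of G(A) are self-loops.
    G(A) is transitively closed, so a longer cycle exists iff some
    x != z have arcs x -> z and z -> x."""
    succ = {x: {op(x, y) for y in elements} for x in elements}
    return not any(z != x and x in succ[z] for x in elements for z in succ[x])

def has_insertion_property(elements, op):
    """Property (1): xyz = xy implies xwyz = xwy, for all x, y, z, w.
    Brute force over s^4 quadruples; fine only for small semigroups."""
    for x, y, z, w in product(elements, repeat=4):
        xy = op(x, y)
        if op(xy, z) == xy:
            xwy = op(op(x, w), y)
            if op(xwy, z) != xwy:
                return False
    return True

E = [0, 1, 2, 3]
print(is_cycle_free(E, max), has_insertion_property(E, max))  # MAXIMUM semilattice
print(is_cycle_free([0, 1, 2], lambda a, b: (a + b) % 3))    # the group Z_3
# prints: True True
#         False
```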
We now give examples of semigroups that belong to the various classes introduced above. If some element $x \in A$ different from the identity has an inverse $x^{-1}$ such that $xx^{-1} = e$, then $(e, x, e)$ forms a nontrivial cycle in $G(A)$ (the arc from $e$ to $x$ is labelled $x$, the arc from $x$ back to $e$ is labelled $x^{-1}$), and $A$ is NCF. As a corollary, all nontrivial groups are NCF.
All abelian CF semigroups are also insertion semigroups. An instance of an abelian CF semigroup is given by the set $A = \{0, 1, \ldots, s-1\}$ with respect to the operation threshold-$(s-1)$ addition, defined as $xy = \min(x + y, s - 1)$. The prefix operation on this semigroup represents the cumulative sum of the sequence $x$ with the value $s - 1$ replacing each larger value.
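A quick numerical check of the last statement, as a minimal sketch with hypothetical helper names: the prefixes are computed sequentially with `itertools.accumulate` and compared with the ordinary cumulative sum clipped at $s - 1$. The equality relies on all terms being nonnegative.

```python
from itertools import accumulate

def threshold_add(s):
    """Threshold-(s-1) addition on {0, ..., s-1}: xy = min(x + y, s - 1)."""
    return lambda x, y: min(x + y, s - 1)

def prefixes(xs, op):
    """Sequential prefix computation: y_j = x_0 x_1 ... x_j."""
    return list(accumulate(xs, op))

s = 5
xs = [1, 0, 2, 3, 1, 0]
ys = prefixes(xs, threshold_add(s))
clipped = [min(c, s - 1) for c in accumulate(xs)]  # plain cumulative sum, clipped
print(ys)  # [1, 1, 3, 4, 4, 4]
```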
Further examples of insertion semigroups are all semilattices, where the semigroup operation
is commutative and idempotent. Examples of semilattices are the set of the 0-1 vectors of length n 
with respect to component-wise OR (AND), and the set of the first s nonnegative integers with the 
MINIMUM (MAXIMUM) operation.
An interesting insertion semigroup that is not abelian is the set of the rankings of $n$ items with respect to the operation of rank concatenation, which plays an important role in VLSI sorting [CS85, BP85, BP86a]. Identifying the $n$ items with the integers from 1 to $n$, a ranking is an ordered partition of the set $\{1, 2, \ldots, n\}$, that is, a sequence of disjoint sets whose union equals $\{1, 2, \ldots, n\}$. Intuitively, all the elements in a given set have the same rank, and have rank higher than those in the next set. The concatenation of two rankings $u = (u_1, u_2, \ldots, u_p)$ and $v = (v_1, v_2, \ldots, v_q)$ is $uv = (w_1, w_2, \ldots, w_r)$, where $(w_1, \ldots, w_r)$ is the subsequence of the nonempty terms of $(u_1 \cap v_1, u_1 \cap v_2, \ldots, u_1 \cap v_q, u_2 \cap v_1, \ldots, u_p \cap v_q)$.
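Rank concatenation is easy to experiment with. The sketch below follows the definition above and is an illustration only: rankings are represented (our choice) as tuples of `frozenset` blocks, highest rank first, and `concat` and `ranking` are hypothetical names. In the product, $u$ decides the order and $v$ only breaks ties inside the blocks of $u$.

```python
def concat(u, v):
    """Rank concatenation uv: the nonempty terms of
    (u_1 & v_1, ..., u_1 & v_q, u_2 & v_1, ..., u_p & v_q), in this order."""
    return tuple(b & c for b in u for c in v if b & c)

def ranking(*blocks):
    """Build a ranking from blocks listed highest rank first."""
    return tuple(frozenset(b) for b in blocks)

n = 4
e = ranking(range(1, n + 1))   # a single block: the identity of the monoid
u = ranking({1, 2}, {3, 4})
v = ranking({2, 3}, {1, 4})
print(concat(u, v))            # blocks {2}, {1}, {3}, {4}
print(concat(v, u))            # blocks {2}, {3}, {1}, {4}: not abelian
```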
Yet another class of semigroups is that of strongly cycle-free (SCF) semigroups, defined by the property that, for all $x \in A$, the set of solutions $y$ of the equation $xy = x$ is either empty or is $A$ itself. In the latter case, $x$ is a left zero of $A$. The prefix problem for SCF semigroups degenerates, in the sense that arbitrarily long input sequences are not of interest. Indeed, for $j$ greater than the length of the longest simple path of $G(A)$, output $y_j$ is guaranteed to be constant with $j$ (and equal to some left zero of $A$).
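To exclude this uninteresting case, and without any substantial loss of generality, all semigroups in this paper are assumed to be monoids. (In a monoid the equation $xy = x$ always has the solution $y = e$, so a nontrivial monoid is never SCF.)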
3. Lower bounds
A boolean network is a directed graph with the following types of nodes: (1) input nodes, with in-degree zero and out-degree one; (2) output nodes, with in-degree one and out-degree zero; (3) combinational nodes, each labelled by a boolean function of one or two input variables, with in-degree equal to the number of input variables, and out-degree one or two (to allow fan-out); (4) one-bit storage nodes, with in-degree one and out-degree one or two.
The notions of computation of, and of function computed by, a boolean network can be formalized as done in [DC80]. Here we appeal to the intuitive meaning of these notions, and just discuss the input/output protocol, since it slightly differs from that of [DC80]. We assume that each input [output] variable of the problem is assigned one input [output] node and one input [output] time. Two variables can be assigned the same node, but only at different times. Only one node and one time are assigned to a given variable (unilocal, semelective protocol), and this node and time are independent of the input value (place-determinate, time-determinate protocol).
Clearly, when solving the prefix problem by a boolean network, a specific binary encoding of the semigroup elements must be chosen. Since our present aim is to study the dependence of the complexity of the prefix problem upon the length $N$ of the input sequence, and not its dependence on semigroup size or representation, we assume that the bits that encode a given semigroup variable are input (or output) all at the same time. We call an input/output protocol with this property word-instantaneous, in analogy with the term word-local introduced in [T80].
The following result is a simple consequence of the bounded fan-in assumption and of the fact that, in a semigroup which is not SCF, $y_j = x_0 x_1 \cdots x_j$ is a true function of $x_0, x_1, \ldots, x_j$.
Proposition 1. For any boolean network that solves the prefix problem of size $N$ for a non-SCF semigroup, the computation time satisfies the bound $T = \Omega(\log N)$.
Our lower bound for the prefix problem is based on the mechanism of computational friction developed in [BP86a] as a generalization of arguments previously applied to binary addition in [J80] and [B81]. Computational friction, so named in the context of a fluid-dynamic analogy for VLSI computations, is a phenomenon that slows down the flow of information from input to output nodes below the rate allowed by the number of I/O nodes, and therefore, when present, yields lower bounds stronger than the trivial $ST = \Omega(N)$ bound. Two phenomena contribute to the appearance of friction: (i) a fixed fraction of the information carried by each wavefront of input variables is transferred to the output variables, and (ii) this information must be stored within the network for a time logarithmic in the wavefront size since, for bounded fan-in, functional dependence imposes a delay between reading the inputs and computing the outputs. These phenomena can be precisely analyzed and lead to the quantitative bounds embodied by the following theorem, a more general version of which is proved in [BP86a].
Theorem 2. Given a computational problem $P$ with a set $X$ of input variables and a set $Y$ of output variables, let $U$ be a subset of $X$ such that for any partition $U_1, \ldots, U_T$ of $U$ there exists a collection $W_1, \ldots, W_T$ of disjoint subsets of $Y$ (not necessarily a partition) satisfying the following properties:
(1) Each variable in $W_t$ is functionally dependent upon $\Omega(|U_t|)$ variables of $U_t$.
(2) The values of the variables in $X - U$ can be selected so that, for each $t = 1, 2, \ldots, T$, the variables of $W_t$ carry $\Omega(|U_t|)$ bits of information about $U_t$.
Then for any word-instantaneous boolean network that solves $P$, size and time satisfy the bound
$S = \Omega((|U|/T)\log(|U|/T)). \qquad (2)$
We now have the tools to prove the following result.
Theorem 3. For any word-instantaneous boolean network that solves the prefix problem of size $N$ for a NCF semigroup $A$, size and time satisfy the bound
$S = \Omega((N/T)\log(N/T)). \qquad (3)$
Proof. We show that Theorem 2 can be applied, with $X = \{x_0, x_1, \ldots, x_{N-1}\}$, $Y = \{y_0, y_1, \ldots, y_{N-1}\}$, and $U = \{x_1, x_3, \ldots, x_{2k+1}, \ldots\}$ containing all odd-indexed input variables. Given a partition $U_1, \ldots, U_T$ of $U$, let us define the disjoint subsets of $Y$
$W_t = \{y_j : j$ is among the $\lceil |U_t|/2 \rceil$ largest $k$'s such that $x_k \in U_t\}$.
(1) Clearly each $y_j$ in $W_t$ is functionally dependent upon all the $x_i$'s with $i \le j$, and there are at least $\lceil |U_t|/2 \rceil = \Omega(|U_t|)$ of them in $U_t$.
(2) By assumption, the Cayley graph of $A$ has a cycle of length $\ge 2$, and hence, being transitively closed, a cycle of length exactly two; that is, there are four elements $a, b, c, d \in A$ (not necessarily all distinct, but with $a \ne b$) such that $ac = b$ and $bd = a$. Consider the input sequences for which $x_0 = a$, $x_i \in \{c, d, cd\}$ for $i \ge 1$, and satisfying the following two properties: (i) if $y_{i-1} = b$, then $x_i = d$; (ii) if $y_{i-1} = a$ and $i$ is even, then $x_i = cd$. These properties guarantee that the even-indexed outputs are all equal to $a$, and the odd-indexed outputs can be either $a$ (if the corresponding input is $cd$, since $a(cd) = bd = a$) or $b$ (if the corresponding input is $c$). In conclusion, each of the $\lceil |U_t|/2 \rceil$ variables of $W_t$ carries one bit of information about $U_t$. Then Equation (3) follows from Equation (2), considering that $|U| = \Theta(N)$. □
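The input family used in part (2) of the proof can be generated explicitly. The sketch below is an illustration under our own choices: the monoid is $\mathbb{Z}_3$ under addition (a nontrivial group, hence NCF), with $a = 0$, $b = 1$, $c = 1$, $d = 2$, which satisfy $ac = b$ and $bd = a$; the function name `friction_inputs` is hypothetical. It checks that the even-indexed prefixes stay at $a$ while each odd-indexed prefix carries one freely chosen bit.

```python
from itertools import accumulate

def friction_inputs(bits, a, b, c, d, op):
    """Input sequence from the proof of Theorem 3.
    Assumes ac == b and bd == a with a != b (a 2-cycle in G(A)).
    x_0 = a; each odd position carries one free bit (c gives y_i = b,
    cd gives y_i = a); every even position brings the prefix back to a."""
    cd = op(c, d)
    xs, y = [a], a                  # y is the current prefix y_{i-1}
    free = iter(bits)
    for i in range(1, 2 * len(bits) + 1):
        if y == b:                  # rule (i): y_{i-1} = b forces x_i = d
            x = d
        elif i % 2 == 0:            # rule (ii): y_{i-1} = a and i even
            x = cd
        else:                       # odd i with y_{i-1} = a: the free choice
            x = c if next(free) else cd
        xs.append(x)
        y = op(y, x)
    return xs

add3 = lambda p, q: (p + q) % 3     # Z_3: a=0, b=1, c=1, d=2
bits = [1, 0, 1, 1, 0]
ys = list(accumulate(friction_inputs(bits, 0, 1, 1, 2, add3), add3))
print(ys[0::2], ys[1::2])           # [0, 0, 0, 0, 0, 0] [1, 0, 1, 1, 0]
```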
The previous argument cannot be extended to cover CF semigroups; in fact, in the next section we shall describe networks for prefix computation in CF semigroups that violate bound (3). Indeed, the low information content of the output suggests the absence of friction. More specifically, let $A$ be CF. A sequence $y = (y_0, \ldots, y_{N-1})$ of prefixes corresponds to a generally non-simple path of length $N$ in $G(A)$. Let $f_A$ be the minimum number of bits required to describe a simple path in $G(A)$, and let $l_A$ be the number of arcs of the longest such path (the height of $G(A)$). Since in a CF semigroup every path is a simple path interleaved with runs of self-loops, an arbitrary path of length $N$ can be described with $O(f_A + l_A \log N)$ bits: $O(f_A)$ for the underlying simple path and $O(\log N)$ to specify the length of each of the $O(l_A)$ runs of self-loops. Therefore, for a fixed semigroup, $O(\log N)$ bits are sufficient to describe the output prefixes on a CF semigroup, whereas $\Omega(N)$ are necessary for NCF semigroups.
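The run-length description above can be made concrete. The following minimal sketch (hypothetical helper names, illustrative only) computes the height $l_A$ by a longest-path search, which terminates only because, in a CF semigroup, removing the self-loops leaves an acyclic graph, and encodes a prefix sequence as (value, run length) pairs. Each run length is at most $N$, so it costs $O(\log N)$ bits.

```python
from functools import lru_cache
from itertools import accumulate, groupby

def height(elements, op):
    """l_A: number of arcs on a longest simple path of G(A).
    Assumes A is CF, so the non-loop arcs form a DAG and the memoized
    recursion terminates (it would recurse forever on an NCF semigroup)."""
    succ = {x: {op(x, y) for y in elements} - {x} for x in elements}

    @lru_cache(maxsize=None)
    def longest_from(x):
        return max((1 + longest_from(z) for z in succ[x]), default=0)

    return max(longest_from(x) for x in elements)

def run_length_prefixes(xs, op):
    """Prefixes of xs as (value, run length) pairs: the values trace a
    simple path of G(A), the run lengths count the self-loops taken."""
    return [(v, len(list(g))) for v, g in groupby(accumulate(xs, op))]

sat = lambda a, b: min(a + b, 4)    # threshold-4 addition on {0, ..., 4}
xs = [0, 0, 1, 0, 0, 0, 2, 0, 3, 0]
runs = run_length_prefixes(xs, sat)
print(height(range(5), sat), runs)  # 4 [(0, 2), (1, 4), (3, 2), (4, 2)]
```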
4. Upper Bounds
In this section, we present three algorithms for the prefix problem, which are respectively 
designed for general semigroups, for CF semigroups, and for insertion semigroups. We first 
describe some general features of the three algorithms, and then present their specific details.
The algorithms are executed by a network having the structure of a binary tree $K$ with $w$ leaves. Leaf nodes perform input/output operations, while internal nodes perform data processing. Each node is bidirectionally connected to its parent and its children.
The input sequence $x = (x_0, x_1, \ldots, x_{N-1})$ is segmented into $N/w$ wavefronts of width $w$, where $1 \le w \le N/\log N$ (for ease of discussion, we assume that $N$ is a power of two). The $l$-th wavefront is denoted $\mathbf{x}_l = (x_{lw}, x_{lw+1}, \ldots, x_{(l+1)w-1})$, where $l = 0, 1, \ldots, N/w - 1$. The wavefronts are sequentially fed to the network, with $x_{lw+j}$ input at the $j$-th leaf (see Figure 1). A fixed wavefront is processed by the network in two phases: an ascending phase (consisting of one input step and $\log w$ processing steps), when information flows from leaves to root, and a descending phase (consisting of $\log w$ processing steps and one output step), when the direction of the flow is reversed.
Figure 1. Input protocol for prefix computation on a binary-tree network (the last wavefront is $(x_{N-w}, x_{N-w+1}, \ldots, x_{N-1})$).
Let the level of a node $V$, denoted $\mathrm{level}(V)$, be the number of edges on the path between $V$ and the root of $K$. For each of the algorithms described below, a step takes time $O(1)$ (independent of $N$). Moreover, a given step on a given wavefront is carried out by a single level of nodes, so that the network can be pipelined at a constant rate. Clearly, processing of the $N$-term sequence is completed in $N/w + 2\log w + 2$ steps, and hence in time $T = O(N/w)$.
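Spelled out (our own elaboration, measuring $T$ in steps, with logarithms to base 2 and $N \ge 2$), the step count yields the time bound as follows; because $N/w \ge \log N \ge 1$ in the admissible range of $w$, both additive terms are dominated by $N/w$:

```latex
T = \frac{N}{w} + 2\log w + 2, \qquad 1 \le w \le \frac{N}{\log N}
\;\Longrightarrow\; \log w \le \log N \le \frac{N}{w}
\;\Longrightarrow\; \frac{N}{w} \le T \le \frac{N}{w} + \frac{2N}{w} + \frac{2N}{w} = \frac{5N}{w}.
```

Hence $T = \Theta(N/w)$, i.e. $w = \Theta(N/T)$; the extreme choices $w = N/\log N$ and $w = 1$ give $T = \Theta(\log N)$ and $T = \Theta(N)$, the two ends of the range $[\Omega(\log N), O(N)]$.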
More subtle is the use of storage at each node, which determines the global size of the network. A fixed wavefront is processed by a given level twice: once during the ascending phase, and then again, $2\,\mathrm{level}(V) + 1$ steps later, during the descending phase. (For uniformity of presentation, we assume that the root too processes a wavefront in two (contiguous) steps, although these actions are obviously combinable into a single step.) In the interval of time between the two steps (one in the ascending phase, the other in the descending phase) performed by nodes of a given level on the same wavefront, some information relative to that wavefront must be stored at the nodes. As we shall see, the algorithms for CF semigroups are more size-efficient than the ones for general semigroups exactly because less information is explicitly stored at the nodes. Correspondingly, the correctness of these algorithms is less immediate to establish.
4.1 General Semigroups
Binary tree $K$ emulates in a straightforward manner the behavior of the well-known prefix network described in [LF80, BK82, BP86b] and called "twisted-reflected-tree" in [BP86b]. The algorithm is best explained as follows.
A given internal node $V$ of $K$ determines a segmentation of the input sequence $x$ as $x = \alpha_0\beta_0\alpha_1\beta_1\cdots\alpha_{N/w-1}\beta_{N/w-1}$, where $\beta_0\beta_1\cdots$ is the subsequence of $x$ which is input by the successive wavefronts to the leaves of the subtree rooted at $V$ ($\alpha_0$ may be empty). For a given sequence, we use the same symbol to denote the product of its terms. Each $\beta_j$ is further segmented as $\beta_j = \beta'_j\beta''_j$, where $\beta'_j$ and $\beta''_j$ are input at the left and right subtrees of $V$, respectively.
Referring to the $j$-th wavefront, during the ascending phase internal node $V$ computes $\beta_j = \beta'_j\beta''_j$ from the values $\beta'_j$ and $\beta''_j$ received from its children. In addition, the root (for which all the $\alpha$'s are empty) maintains a state $\sigma$ initialized to $e$ (the monoid identity) and updated as $\sigma := \sigma\beta_j$. During the descending phase, nonroot node $V$ must receive from its parent the prefix $\gamma = \alpha_0\beta_0\cdots\alpha_{j-1}\beta_{j-1}\alpha_j$; if $V$ has stored $\beta'_j$, then it can provide the correct prefixes $\gamma$ and $\gamma\beta'_j$ to its children.
Below we describe in detail the actions of each node. We use a comma to separate concurrently executable actions, and a semicolon to separate actions to be sequentially executed. The ASCENDING PHASE substep below is thought of as preceding the DESCENDING PHASE substep, although various degrees of concurrency are realizable. Note that, for correct synchronization, each internal node $V$ uses a queue (called $\beta'$-queue) capable of storing $2\,\mathrm{level}(V) + 1$ semigroup elements (the $\beta'_j$). In addition, each nonroot node has three cells to store the elements to be forwarded in the next step; note that, for the root, one of these elements, $\beta_j$, is "forwarded" to the root itself. The contents of all cells are initialized to $e$, the monoid identity. In summary, the generic step runs as follows:
Generic Step
ASCENDING PHASE
begin forward β to parent, [root: σ := σβ;]
      β' := term received from left child,
      β'' := term received from right child;
      insert β' into β'-queue;
      β := β'β''
end
DESCENDING PHASE
begin forward γ to left child,
      forward γβ' to right child;
      γ := term received from parent; [root: γ := σ;]
      extract β' from β'-queue;
      compute γβ'
end
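The data flow of this scheme can be checked with a sequential simulation. The sketch below is our own simplification, not the network itself: it processes one wavefront at a time instead of pipelining, so the $\beta'$-queue of depth $2\,\mathrm{level}(V) + 1$ collapses to a single stored value per node (in the pipelined network later wavefronts would overwrite it, which is why the queue is needed). It also assumes, as the text leaves implicit, that each leaf outputs $\gamma x$, the prefix received from above times its own input. Nodes use heap indexing (root 1, children $2v$ and $2v+1$, leaves $w, \ldots, 2w-1$); the name `tree_prefix` is hypothetical.

```python
def tree_prefix(xs, w, op, e):
    """Non-pipelined simulation of the binary-tree prefix scheme of Section 4.1.

    xs: input sequence whose length is a multiple of w; w: number of leaves
    (a power of two); op: semigroup product; e: monoid identity.
    """
    sigma = e                             # root state: product of earlier wavefronts
    out = []
    for start in range(0, len(xs), w):    # one wavefront at a time
        beta = [e] * (2 * w)              # beta[v]: product of this wavefront under v
        stored = [e] * w                  # stored[v]: beta' kept by internal node v
        for k in range(w):                # input step: leaf w+k reads x_{start+k}
            beta[w + k] = xs[start + k]
        for v in range(w - 1, 0, -1):     # ascending phase: beta := beta' beta''
            stored[v] = beta[2 * v]
            beta[v] = op(beta[2 * v], beta[2 * v + 1])
        gamma = [e] * (2 * w)             # gamma[v]: product of all inputs before v's leaves
        gamma[1] = sigma
        for v in range(1, w):             # descending phase: pass gamma and gamma beta'
            gamma[2 * v] = gamma[v]
            gamma[2 * v + 1] = op(gamma[v], stored[v])
        for k in range(w):                # output step: each leaf emits gamma x
            out.append(op(gamma[w + k], beta[w + k]))
        sigma = op(sigma, beta[1])        # root update: sigma := sigma beta_j
    return out

concat = lambda a, b: a + b               # free monoid on strings, identity ""
print(tree_prefix(list("abcdefgh"), 4, concat, ""))
```

Using string concatenation, a non-commutative monoid, makes any misordered product visible in the output.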
The algorithm is readily implemented by endowing the module of a node with a semigroup multiplier and a queue capable of storing $2\,\mathrm{level}(V) + 1 = O(\log w)$ semigroup elements. Thus, the total size of the network is $S = O(w \log w)$ (ignoring the dependence upon the semigroup size and operation). Therefore we have:
Theorem 4. The size-time complexity of the prefix problem on a boolean network is $S = O((N/T)\log(N/T))$, for $T \in [\Omega(\log N), O(N)]$.
For NCF semigroups the bound of Theorem 4 is optimal, as shown by Theorem 3. For $T = \Theta(\log N)$, the time lower bound of Proposition 1 is achieved. For $T = \Theta(N)$, the obvious $S = \Omega(1)$ lower bound is achieved.
4.2 Cycle-Free Semigroups
As we have already noted in the concluding remarks of Section 3, the information content of a 
sequence of prefixes in a CF semigroup is only logarithmic in the length of the sequence. This fact 
indicates the possibility of reducing the amount of information relative to a given wavefront that 
the network has to store in the ascending phase for completing processing in the descending phase.
The memory requirement in the algorithm of Section 4.1 comes from the necessity to store at each node $V$ the product $\beta'_j$ forwarded by the left child for $2\,\mathrm{level}(V) + 1$ steps. In the steady state, this implies the simultaneous storage of data relative to $2\,\mathrm{level}(V) + 1$ wavefronts. On the other hand, if $\gamma\beta_j = \gamma$, the prefix is constant for all leaves of the subtree rooted at $V$ for the $j$-th output wavefront (by cycle-freedom, no intermediate prefix can leave $\gamma$ and later return to it): in this case just $\gamma$ must be passed to both children. Therefore a record of $(\beta'_j, \beta''_j)$ must be kept from the ascending phase only for those values of $j$ for which $\gamma\beta_j \ne \gamma$, a situation that in a CF semigroup can occur at most $l_A$ times.
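The bound on the number of records can be illustrated for a single node. In the sketch below (illustrative only; `records_needed` is a hypothetical name), the segments $\alpha_j$ input elsewhere and the subtree products $\beta_j$ are given explicitly as semigroup elements, and the function lists the wavefronts $j$ with $\gamma\beta_j \ne \gamma$. Each such $j$ moves the prefix one step forward along a simple path of $G(A)$, so there are at most $l_A$ of them; the MAXIMUM semilattice on $\{0, \ldots, 7\}$, whose Cayley graph has height 7, serves as the example.

```python
def records_needed(alphas, betas, op, e):
    """Wavefront indices j for which node V must keep (beta'_j, beta''_j):
    those with gamma * beta_j != gamma, where
    gamma = alpha_0 beta_0 ... alpha_{j-1} beta_{j-1} alpha_j
    is the prefix that V receives from its parent."""
    prefix, needed = e, []
    for j, (a, b) in enumerate(zip(alphas, betas)):
        gamma = op(prefix, a)
        if op(gamma, b) != gamma:
            needed.append(j)
        prefix = op(gamma, b)
    return needed

alphas = [0, 2, 0, 1, 5, 0, 0]
betas = [1, 1, 3, 0, 4, 6, 7]
print(records_needed(alphas, betas, max, 0))  # [0, 2, 5, 6]
```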
How can this condition be tested in the input phase if $\gamma$ is yet unknown? The following scheme is proposed to answer this question. The root of $K$ maintains a state $\sigma$ initially set to $e$ and updated as $\sigma := \sigma\beta_j$, as in the previous algorithm. Each remaining internal node $V$ of $K$ constructs during the ascending phase a history tree $H(V)$ of depth at most $l_A$, as follows. Each vertex of $H(V)$ is labelled with an element of $A$ and is either active or inactive; each arc is labelled either by $\alpha_j$ or by $\beta_j$, where $j = 0, 1, \ldots, N/w - 1$. Initially, $H(V)$ consists of its root, labelled by the identity $e$. The general $j$-th step consists of two substeps: the first for processing (i.e., guessing) $\alpha_j$, the second for processing (i.e., observing) $\beta_j$. Let a vertex $u$ of $H(V)$ labelled $a$ be active at the beginning of the first substep: the children of $u$ are a set of vertices labelled by $\{a\alpha_j : \alpha_j \in A, a\alpha_j \ne a\}$, where the arc from $a$ to $a\alpha_j$ is labelled $\alpha_j$; $u$ remains active if, for some $\alpha_j$, $a\alpha_j = a$. The second substep is analogous, except that an active $u$ labelled by $a$ has a child only if $a\beta_j \ne a$. In this case, a record of $(\beta'_j, \beta''_j)$ is kept at $u$. In other words, a vertex $v$ of $H(V)$ keeps record only of a transition in $G(A)$ caused by an input $\beta_j$, as the pair $(\beta'_j, \beta''_j)$. For each $V$ in $K$, $H(V)$ has $O(s^{l_A})$ vertices.
When processing the $j$-th wavefront in the ascending phase, the history tree $H(V)$ can be updated at $V$ in time dependent only upon the semigroup size and operation.
When constructing the $j$-th output wavefront in the descending phase, each nonroot internal node $V$ traces its history tree under the control of labels provided by its parent and of labels stored in $H(V)$. Specifically, each instance $\alpha_0\beta_0\alpha_1\beta_1\cdots$ corresponds to a unique path in the history tree: $\alpha_j$ is implicitly provided by the parent in the form of the product $\alpha_0\beta_0\cdots\alpha_j$, and the pair $(\beta'_j, \beta''_j)$ is picked up, if nonempty, at the node reached by $\alpha_0\beta_0\cdots\alpha_j$. $V$ is now in a position to pass the appropriate terms to its children.
In summary, the generic node V of K  performs the following actions:
Generic Step
ASCENDING PHASE
begin forward β to parent, [root: σ := σβ;]
      β' := term received from left child,
      β'' := term received from right child;
      update H(V);
      β := β'β''
end
DESCENDING PHASE
begin forward γ to left child and γβ' to right child;
      γ := term received from parent; [root: γ := σ;]
      in H(V) move pointer to vertex u labelled γ among children
            of current vertex (and obtain (β', β''));
      if (β', β'') is nonempty then in H(V) move pointer to (unique) child of u
            labelled γβ'β'';
      store γ and γβ'
end
Finally we note that $K$ consists of $O(w)$ modules, each of size independent of $N$. Computation is completed after $O(N/w)$ steps both for the input phase and the output phase, and, again, the time used by each step is independent of $N$. The above construction yields:
Theorem 5. For a CF semigroup $A$ the prefix computation for an $N$-term sequence can be done in time $T$ and size $S$, with $S = O(N/T)$, for $T \in [\Omega(\log N), O(N)]$.
4.3 Insertion Semigroups
The above result holds for any CF semigroup and is clearly optimal. However, for the very important case of insertion semigroups the tree module need not be as complicated as outlined above. Each internal node $V$ of $K$ still contains a semigroup multiplier, and a queue with $2l_A$ cells, each capable of storing a semigroup element; an additional cell stores the node state $\mathrm{state}(V)$.
Initially, for each $V$ in $K$, $\mathrm{state}(V) := e$. Nonroot internal node $V$ performs the following actions:
Generic Step
ASCENDING PHASE
begin forward β to parent,
      β' := term received from left child,
      β'' := term received from right child;
      if state(V)β'β'' ≠ state(V) then
            begin state(V) := state(V)β'β'';
                  insert (β', β'') into queue
            end;
      β := β'β''
end
DESCENDING PHASE
begin forward (γ_L, β_L) to left child and (γ_R, β_R) to right child,
      (γ, β) := terms received from parent;
      (β', β'') := next pair in queue;
      if β = β'β'' then
            begin (γ_L, β_L) := (γ, β'), (γ_R, β_R) := (γβ', β'');
                  extract (β', β'') from queue
            end
      else (γ_L, β_L) := (γ_R, β_R) := (γ, e)
end
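Lemma. The above scheme correctly computes the prefix sequence for insertion semigroups.
Proof. The decision whether to retain or not the pair $(\beta', \beta'')$ in the queue at node $V$ rests on the condition $\mathrm{state}(V)\beta'\beta'' \ne \mathrm{state}(V)$ or, equivalently (for the $j$-th step), $\beta_0\beta_1\cdots\beta_j \ne \beta_0\beta_1\cdots\beta_{j-1}$. We must show that this condition is implied by
$\alpha_0\beta_0\alpha_1\beta_1\cdots\beta_{j-1}\alpha_j\beta_j \ne \alpha_0\beta_0\alpha_1\beta_1\cdots\beta_{j-1}\alpha_j.$
Indeed, $\beta_0\cdots\beta_{j-1}\beta_j = \beta_0\cdots\beta_{j-1}$ implies $\beta_0\alpha_1\beta_1\cdots\alpha_j\beta_j = \beta_0\alpha_1\beta_1\cdots\alpha_j$ by repeated application of the insertion property (1), inserting $\alpha_1, \ldots, \alpha_j$ one at a time (with $y = e$ for the last insertion), and therefore $\alpha_0\beta_0\cdots\alpha_j\beta_j = \alpha_0\beta_0\cdots\alpha_j$. □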
We can therefore conclude:
Theorem 6. For an insertion semigroup $A$, the prefix computation for an $N$-term sequence can be done in time $T$ and size $S$ with $S = O(N/T)$ for $T \in [\Omega(\log N), O(N)]$. The memory used by each nonroot module is $O(l_A \log s)$ bits, where $s = |A|$ and $l_A$ is the height of the Cayley graph of $A$.
Proof. Indeed, the result $S = O(N/T)$ follows from the Lemma and the argument of Theorem 5. Each nonroot module has a queue of $2l_A$ cells, each with $\lceil \log_2 s \rceil$ bits. □
5. Open Problems
In this paper, we have considered the computation of the prefixes of an $N$-term sequence of semigroup elements on boolean networks. We have completely characterized the size-time complexity of the networks as a function of $N$.
The major outstanding problem is the investigation of the dependence of network complexity upon semigroup size and operation. For example, in Theorem 6 we have shown that, for the important case of insertion semigroups, the upper bounds of Theorem 5 can be considerably improved. However, the construction of prefix boolean networks which are optimal also with reference to semigroup size remains an open problem.
6. Acknowledgement
We are indebted to D.E. Muller for his valuable suggestions.
References
[B81] G.M. Baudet, "On the area required by VLSI circuits," in H.T. Kung, R. Sproull, and G. Steele (eds.), VLSI Systems and Computations, pp. 100-107, Computer Science Press, Rockville, MD, 1981.
[BK82] R.P. Brent and H.T. Kung, "A regular layout for parallel adders," IEEE Transactions on Computers, vol. C-31, no. 3, pp. 260-264, March 1982.
[BP85] G. Bilardi and F.P. Preparata, "The influence of key length on the area-time complexity of sorting," Proceedings ICALP (Ed. W. Brauer), Nafplion, Greece, Springer-Verlag, pp. 53-62, July 1985.
[BP86a] G. Bilardi and F.P. Preparata, "Area-time lower-bound techniques with applications to sorting," Algorithmica, vol. 1, no. 1, pp. 65-91, 1986.
[BP86b] G. Bilardi and F.P. Preparata, "Digital filtering in VLSI," Proceedings of the Aegean Workshop on Computing (Eds. F. Makedon et al.), Loutraki, Greece, Springer-Verlag, pp. 1-11, July 1986.
[CFL83] A.K. Chandra, S. Fortune, and R. Lipton, "Unbounded fan-in circuits and associative functions," Proceedings of the 15th Annual Symposium on Theory of Computing, Boston, MA, pp. 52-60, April 1983.
[CS85] R. Cole and A. Siegel, "On information flow and sorting: New upper and lower bounds for VLSI circuits," Proceedings 26th Annual Symposium on the Foundations of Computer Science, Portland, OR, pp. 208-221, October 1985.
[DC80] P.W. Dymond and S.A. Cook, "Hardware complexity and parallel computation," Proceedings 21st Annual Symposium on the Foundations of Computer Science, Syracuse, NY, pp. 360-372, October 1980.
[F83] F.E. Fich, "New bounds for parallel prefix circuits," Proceedings 15th Annual ACM Symposium on Theory of Computing, Boston, MA, pp. 100-109, April 1983.
[J80] R.B. Johnson, "The complexity of a VLSI adder," Information Processing Letters, vol. 11, no. 2, pp. 92-93, October 1980.
[K78] D.J. Kuck, The Structure of Computers and Computations, New York: Wiley, 1978.
[KRS85] C.P. Kruskal, L. Rudolph, and M. Snir, "The power of parallel prefix," IEEE Transactions on Computers, vol. C-34, no. 10, pp. 965-968, October 1985.
[LF80] R.E. Ladner and M.J. Fischer, "Parallel prefix computation," Journal of the ACM, vol. 27, no. 4, pp. 831-838, October 1980.
[S86] M. Snir, "Depth-size trade-offs for parallel prefix computation," Journal of Algorithms, vol. 7, no. 2, pp. 185-201, June 1986.
[T80] C.D. Thompson, "A complexity theory for VLSI," Ph.D. Thesis, Department of Computer Science, Carnegie-Mellon University, August 1980.
