Purdue University

Purdue e-Pubs
Department of Electrical and Computer
Engineering Technical Reports

Department of Electrical and Computer
Engineering

6-1-1987

A Decomposition Approach for Balancing Large-Scale Acyclic Data Flow Graphs
P. R. Chang
Purdue University

C. S. G. Lee
Purdue University

Follow this and additional works at: https://docs.lib.purdue.edu/ecetr
Chang, P. R. and Lee, C. S. G., "A Decomposition Approach for Balancing Large-Scale Acyclic Data Flow Graphs" (1987). Department
of Electrical and Computer Engineering Technical Reports. Paper 567.
https://docs.lib.purdue.edu/ecetr/567


A Decomposition Approach
for Balancing Large-Scale
Acyclic Data Flow Graphs
P. R. Chang
C. S. G. Lee

TR-EE 87-23
June 1987

School of Electrical Engineering
Purdue University
West Lafayette, Indiana 47907


Abstract

In designing VLSI architectures for a complex computational task, the functional
decomposition of the task into a set of computational modules can be represented as
a directed task graph, and the inclusion of input data modifies the task graph to an
acyclic data flow graph (ADFG). Due to different paths of traveling and computation
time of each computational module, operands may arrive at multi-input modules at
different arrival times, causing a longer pipelined time. Delay buffers may be inserted
along various paths to balance the ADFG to achieve maximum pipelining. This paper
presents an efficient decomposition technique which provides a more systematic
approach in solving the optimal buffer assignment problem of an ADFG with a large
number of computational nodes. The buffer assignment problem is formulated as an
integer linear optimization problem which can be solved in pseudo-polynomial time.
However, as the size of an ADFG increases, the number of integer linear constraint
equations may grow exponentially, making the optimization problem intractable. The
decomposition approach utilizes the critical path concept to decompose a directed
ADFG into a set of connected subgraphs, and the integer linear optimization technique
can be used to solve the buffer assignment problem in each subgraph. In other
words, a large-scale integer linear optimization problem is divided into a number of
smaller-scale subproblems, each of which can be easily solved in pseudo-polynomial
time. Examples are given to illustrate the proposed decomposition technique.

This work was supported in part by the National Science Foundation Engineering Research Center
Grant CDR-8500022.

1. Introduction

With the advent of VLSI technology, the rapid decrease in computational costs,
power consumption, and physical size, together with the increase in computational
power, suggests that an interconnection of VLSI processors, configured and
arranged based on a functional decomposition of the computational task to exploit
the great potential of pipelining and multiprocessing, provides a novel and
cost-effective solution for many computational problems in pattern recognition [6],
signal processing [12], and robotics [14]-[15]. This type of computational structure
has been referred to as a systolic array or system [10]. One of the main advantages
of using a systolic array is that each input data item can be used a number of times
once it is accessed, and thus a high computation throughput can be achieved with
only a modest bandwidth. Other advantages include modular expandability and simple
and regular data and control flow.
In general, a computational task of interest is partitioned or decomposed into a
set of smaller computational modules, and the interconnection of these computational
modules can be represented as a directed task graph. The inclusion of input data
modifies the task graph to an acyclic data flow graph (ADFG). The nodes of an
ADFG correspond to the computational modules, each of which can be realized by a
linear pipelined functional unit for increasing the system throughput [11]. The
operands or data move along the edges, each of which connects a pair of nodes.
Because the modules have different computation times, data flow (both inputs and
results passed from one module to another) in an ADFG may proceed at different
speeds along different paths. Thus, operands may arrive at multi-input modules at
different times, causing an unnecessarily long pipeline latency in the ADFG. A
conventional approach is to insert delay buffers (FIFO queues) along various paths to
buffer the inputs or the output results passed from one module to another, so as to
achieve a balanced (or synchronous) ADFG. This is exemplified in Figure 1(a), a graph
consisting of nodes A, B, and C whose numbers of computing stages are assumed to be
3, 5, and 6, respectively. From the figure, there are two paths from node A to node C.
Along path 2, an operand takes 5 computing stages before it arrives at node C, while
path 1 requires no computing stages. Node C cannot start computation until all of its
operands are available. As a result, the second set of data values cannot be fed into
the pipeline within 5 computing stages because data will be present only on path 1.
The minimum latency of the pipeline is therefore greater than 5 computing stages, and
the maximum throughput is less than 1/5. To eliminate this undesirable behavior so
that successive data of an array may pipeline through the ADFG with maximum
throughput, a delay buffer D equivalent to 5 computing stages can be inserted in
path 1 so that the "lengths" (or costs) of path 1 and path 2 are balanced
(Figure 1(b)). Thus, the latency of the ADFG is reduced to one computing stage, and
maximum pipelining can be achieved. Once the balanced ADFG has been established, a
systolization procedure can be used to transform the balanced ADFG into a systolic
array [16].
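The two-path situation of Figure 1 reduces to simple arithmetic: every path is padded
up to the cost of the longest one. The following minimal Python sketch illustrates
this. The function name `balancing_delays` and the path labels are illustrative
choices, not part of the report; the path costs (0 and 5 computing stages) are the
ones quoted in the text.

```python
def balancing_delays(path_costs):
    """Return the delay (in computing stages) each path needs to match the longest path."""
    longest = max(path_costs.values())
    return {name: longest - cost for name, cost in path_costs.items()}


# Computing stages an operand passes through between node A and node C (from the text).
paths_into_c = {"path 1": 0, "path 2": 5}

delays = balancing_delays(paths_into_c)
print(delays)  # {'path 1': 5, 'path 2': 0}
```

The non-zero entry corresponds to the 5-stage delay buffer inserted in path 1.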
The problem of balancing a directed ADFG by inserting appropriate buffers along
appropriate paths to achieve maximum pipelining has been solved previously by the
cut-set theorem [11]-[12], the local correctness criterion [12], and the graph-theoretic
approach [4]-[5]. Furthermore, Hwang and Xu [9] showed that the balanced ADFG
can be realized in a two-level pipeline network which is reconfigurable and provides
flexibility in various vector processing applications. The delay matching may be
handled by programmable buffers, so that proper non-compute delays can be inserted
in each data flow path. An example is the design of the LINC chip [8], which is an
8-by-8 crossbar with up to 32 units of programmable delay in each data flow path.
This paper presents an efficient decomposition technique which provides a more
systematic approach in solving the optimal buffer assignment problem of an ADFG
with a large number of computational nodes. Since it is of vital importance to
minimize the number of buffers used in a systolic system in order to minimize the
design cost, the optimal buffer assignment problem is formulated as an integer linear
optimization problem, which can be solved in pseudo-polynomial time [18].
However, if the number of computational nodes in an ADFG is quite large, then the
number of integer linear constraint equations may grow exponentially, making the
optimization problem more difficult than it should be. The construction of integer
linear constraint equations in a large-scale ADFG reveals the existence of many
redundant integer linear constraint equations, which may make the optimization
problem intractable. The redundant integer linear constraint equations come from the
overlap between paths of two different multi-input nodes. They can be removed easily
by recognizing the overlapping path (or common path) traversed by different paths. In
an effort to reduce the difficulty of optimizing a large number of integer linear
constraint equations, an efficient and systematic decomposition technique is proposed
to recognize all the decomposable subgraphs in an ADFG and generate their associated
integer linear constraint equations. The decomposition approach utilizes the critical
path concept to decompose a directed ADFG into a set of connected subgraphs, and
the integer linear optimization technique can be used to solve the buffer assignment
problem in each subgraph. In other words, a large-scale integer linear optimization
problem is divided into a number of smaller-scale subproblems, each of which can be
easily solved in pseudo-polynomial time. Examples are given to illustrate the
decomposition approach; finally, the proposed decomposition technique is used to
balance an interconnection of CORDIC (Coordinate Rotation Digital Computer [1], [20])
processors for computing the robot inverse kinematic position solution [15].


2. Formulation For Balancing Acyclic Data Flow Graphs
In formulating the optimal buffer assignment problem, we shall assume that the
number of computing stages of any computational module of an ADFG is finite and
that the execution time of any stage is a constant, called a basic time unit or stage
latency. An ADFG is maximum pipelined if the minimum number of time units
needed for obtaining two successive outputs from the pipeline is equal to one basic
time unit. Before giving a formal formulation of the balancing problem, we concentrate
on single-input single-output (SISO) ADFGs and introduce some definitions needed for
the formulation:
Definition 1: A weighted ADFG $GW = (V, E, W)$ corresponding to an ADFG
$G = (V, E)$ is a weighted directed graph, where $W$ is a weight function from $E$ to
the set of non-negative real numbers, $V = \{v_1, v_2, \ldots, v_n\}$ is a finite set of
computational nodes (or modules), and $E = \{e_1, e_2, \ldots, e_n\}$ is a finite set of
edges. An edge connecting node $v_i$ to node $v_j$ is denoted by $e(i, j)$.
A logical way to convert an ADFG to a corresponding weighted ADFG is to assign
weights to each output edge of a computational node such that the weight assigned to
each edge equals the number of computing stages of the computational node.
For example, the weight $w(e(i, j))$ assigned to the edge $e(i, j)$ equals the
number of computing stages of node $v_i$.
Definition 2: The cost (or weight) of any $k$th path $\phi_k(v_p, v_q)$ from node $v_p$ to
node $v_q$ is defined as the sum of the weights of all edges along the path. That is,

$$ w(\phi_k) = \sum_{e(i,j) \in \phi_k(v_p, v_q)} w(e(i, j)). $$

Thus, the cost of a path from node $v_p$ to node $v_q$ equals the number of computing
stages needed for an operand to travel along the corresponding path from node $v_p$ to
node $v_q$.
Definition 3: A weighted ADFG $GW$ with an input node $u_0$ is said to be balanced
if the costs of any two different paths from the input node $u_0$ to an arbitrary
multi-input node $u_k$ are equal.
This definition indicates that a balanced ADFG achieves maximum pipelining.
Unfortunately, most ADFGs derived from given tasks are unbalanced. To balance
an ADFG, appropriate delay buffers must be inserted along appropriate paths from
the input node $u_0$ to any particular multi-input node of interest, so that all
paths from the input node $u_0$ to a multi-input node have equal costs. The
appropriate buffering graph in which delay buffers are inserted to balance an
unbalanced ADFG can be defined as:

Definition 4: Let $GW = (V, E, W)$ be a weighted ADFG and
$GB = (V, E, WB)$ be a corresponding weighted graph, where the weight $WB$
corresponds to the buffering introduced on $E$. Then, $GB$ is called a buffering graph of
$GW$. Furthermore, an ADFG $GW' = (V, E, W')$ can be composed from $GW$ and
$GB$ such that $w'(e(i, j)) = w(e(i, j)) + wb(e(i, j))$ for all $e(i, j) \in E$, where
$wb(e(i, j))$ is the weight of the buffers from node $v_i$ to node $v_j$. If $GW'$ is a
balanced ADFG, then $GB$ is a balanced buffering graph for $GW$.
It can be shown that a buffering graph $GB$ for a corresponding $GW$ always exists,
though it may not be unique. In order to minimize the cost of implementing an
ADFG in a VLSI device, it is desirable to obtain a balanced buffering graph with a
minimum number of delay buffers.
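Definitions 1 through 4 can be exercised on a small graph. The sketch below is only an
illustration under stated assumptions: the helper names (`edge_weights`, `all_paths`,
`is_balanced`) and the three-node graph are hypothetical, paths are enumerated by
brute force (adequate only for small graphs), and the buffering graph is represented
as a dictionary from edges to buffer stages.

```python
def edge_weights(stages, edges):
    """Weight every edge e(i, j) with the number of computing stages of node v_i."""
    return {(i, j): stages[i] for (i, j) in edges}


def all_paths(edges, src, dst):
    """List every directed path from src to dst as a list of edges (acyclic graph)."""
    if src == dst:
        return [[]]
    paths = []
    for (i, j) in edges:
        if i == src:
            paths.extend([(i, j)] + rest for rest in all_paths(edges, j, dst))
    return paths


def is_balanced(weights, buffers, source):
    """Check Definition 3 on the graph whose edge weights are w + wb."""
    edges = list(weights)
    indegree = {}
    for (_, j) in edges:
        indegree[j] = indegree.get(j, 0) + 1
    for node, degree in indegree.items():
        if degree < 2:
            continue  # only multi-input nodes need equal path costs
        costs = {
            sum(weights[e] + buffers.get(e, 0) for e in path)
            for path in all_paths(edges, source, node)
        }
        if len(costs) > 1:
            return False
    return True


# Hypothetical three-node graph: A feeds C directly and also through B.
stages = {"A": 3, "B": 5, "C": 6}
edges = [("A", "B"), ("A", "C"), ("B", "C")]
weights = edge_weights(stages, edges)

print(is_balanced(weights, {}, "A"))               # False: path costs are 3 and 8
print(is_balanced(weights, {("A", "C"): 5}, "A"))  # True: both paths cost 8
```

A buffer of 5 stages on the direct edge is one balanced buffering graph for this
small case; other placements with the same path totals would also qualify.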
Since the costs of any two different paths from the input node $u_0$ to an arbitrary
multi-input node $u_k$ must be equal for a balanced ADFG, buffer delays can be applied
to balance the cost of all paths from the input node $u_0$ to a multi-input node $u_k$.
Assume $U = \{u_0, u_1, u_2, \ldots, u_n\}$ is the finite set of all multi-input nodes
together with the input and end nodes in $GW$, and that there are $m_k$ paths from the
input node $u_0$ to a multi-input node $u_k$, that is, $\phi_l(u_0, u_k)$, $1 \le l \le m_k$
and $1 \le k \le n$. Here $\phi_l(u_j, u_k)$ denotes an $l$th path from node $u_j$ to node
$u_k$; if node $u_j$ is the input node $u_0$, then $\phi_l(u_0, u_k) = \phi_l(u_k)$. The
critical path $\phi_{l^*}(u_k)$ of a multi-input node $u_k$ in $GW$ is the path from the
input node $u_0$ to the node $u_k$, $1 \le k \le n$, having the "heaviest" path weight,
defined as

$$ w_c(u_k) \triangleq w_c(\phi_{l^*}(u_k)) = \max_{1 \le l \le m_k} \sum_{e(i,j) \in \phi_l(u_k)} w(e(i, j)). \tag{1} $$

No other path from the input node $u_0$ to the node $u_k$ can have a path weight greater
than the critical path weight $w_c(u_k)$. Thus, the cost of the critical path from the
input node $u_0$ to the end node $u_n$ constitutes the initial delay time of the pipeline.
In order to balance an ADFG, buffers $B(e(i, j))$ are inserted into appropriate paths
$\phi_l(u_k)$ from the input node $u_0$ to a multi-input node $u_k$, $1 \le k \le n$, so that
all paths entering the node $u_k$ have the same cost. That is,

$$ \sum_{e(i,j) \in \phi_l(u_k)} w(e(i, j)) + \sum_{B(e(i,j)) \in \phi_l(u_k)} |B(e(i, j))| = \underbrace{w_c(u_k)}_{\text{critical path cost of } u_k \text{ in } GW} + \underbrace{\sum_{B(e(i,j)) \in \phi_{l^*}(u_k)} |B(e(i, j))|}_{\text{buffer stages added to the critical path of } u_k \text{ in } GB} \tag{2} $$

where $|B(e(i, j))|$ is the weight, or the number of computing stages, of the buffer
$B(e(i, j))$, $1 \le l \le m_k$ and $1 \le k \le n$. The first term on the right-hand side
of Eq. (2) is a constant and can be easily computed. The problem of finding all
critical paths of $u_k$, $1 \le k \le n$, is known to be solvable by applying Bellman's
equation with time complexity $O(|N|^2)$ [13], where $N$ is the number of computational
nodes in $GW$.
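As an illustration of the critical path computation, the sketch below evaluates the
longest-path recursion $w_c(v) = \max_p \, [w_c(p) + w(e(p, v))]$ over the predecessors
$p$ of each node, visiting nodes in topological order. This is one standard way to
evaluate such a recursion on an acyclic graph and is not claimed to be the exact
procedure of [13]. The graph data are an assumption: the edge list and stage counts
were inferred from the path names and constants quoted in the worked example for
Procedure ILEG later in this section, since the figure itself is not reproduced here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_costs(stages, edges, source):
    """Return (cost, back): critical path cost per node and the predecessor achieving it."""
    preds = {}
    for i, j in edges:
        preds.setdefault(i, set())
        preds.setdefault(j, set()).add(i)
    cost, back = {source: 0}, {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        reachable = [p for p in preds[v] if p in cost]
        if v == source or not reachable:
            continue
        # An edge leaving p is weighted by the number of computing stages of p.
        best = max(reachable, key=lambda p: cost[p] + stages[p])
        cost[v] = cost[best] + stages[best]
        back[v] = best
    return cost, back


def critical_path(back, node):
    """Follow the stored predecessors from node back to the input node."""
    path = [node]
    while path[-1] in back:
        path.append(back[path[-1]])
    return path[::-1]


# Inferred example data (assumption, see the lead-in).
stages = {"A": 5, "B": 6, "C": 20, "D": 7, "E": 2, "F": 12,
          "G": 6, "H": 6, "I": 10, "J": 10, "K": 8}
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("B", "F"), ("B", "G"),
         ("C", "G"), ("D", "H"), ("D", "I"), ("E", "G"), ("F", "J"), ("G", "J"),
         ("G", "K"), ("H", "K"), ("I", "K"), ("J", "M"), ("K", "M")]

cost, back = critical_costs(stages, edges, "A")
print({v: cost[v] for v in "GJKM"})  # {'G': 25, 'J': 31, 'K': 31, 'M': 41}
print(critical_path(back, "M"))      # ['A', 'C', 'G', 'J', 'M']
```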
Since it is desirable to minimize the initial delay time of the pipeline so that it
equals $w_c(u_n)$, no buffers $B(e(i, j))$ should be assigned to the critical path
$\phi_{l^*}(u_n)$ of the end node $u_n$. We can state this fact in a lemma.
Lemma 1. The critical path $\phi_{l^*}(u_n)$ of the end node $u_n$ is independent of the
buffer stage variables.

Taking this into consideration and rewriting Eq. (2), we have

$$ \sum_{B(e(i,j)) \in \phi_l(u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| - \sum_{B(e(i,j)) \in \phi_{l^*}(u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| = \Big[ w_c(u_k) - \sum_{e(i,j) \in \phi_l(u_k)} w(e(i, j)) \Big] = b(l, k) \tag{3} $$

where $b(l, k)$ is a computed integer constant, $|B(e(i, j))|$ are undetermined buffer
stages, $1 \le l \le m_k$, $1 \le k \le n$, and the notation "/" denotes set subtraction,
defined as $\phi_l(u_k)/\phi_{l^*}(u_n) = \phi_l(u_k) - (\phi_l(u_k) \cap \phi_{l^*}(u_n))$.

Equation (3) is a set of simultaneous linear equations and can be expressed in
matrix-vector form as $Ax = b$, where $A$ is a matrix determined by the paths, and $x$
and $b$ are the unknown buffer stage vector and the constant vector, respectively. The
solution $x$ is usually not unique; however, we can impose additional restrictions so
that the problem becomes an integer linear optimization problem. That is, we would
like to minimize the total number of buffer stages in a balanced buffering graph $GB$:

$$ \text{Minimize the total number of buffer stages in } GB = \min \sum_{B(e(i,j)) \in GB} |B(e(i, j))| \tag{4} $$

subject to the equality constraints

$$ \sum_{B(e(i,j)) \in \phi_l(u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| - \sum_{B(e(i,j)) \in \phi_{l^*}(u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| = b(l, k) \tag{5} $$

and

$$ |B(e(i, j))| \ge 0, \quad \text{integer} \tag{6} $$

where $1 \le l \le m_k$ and $1 \le k \le n$. The above integer linear programming problem
can be solved in pseudo-polynomial time [18].
In the above buffer assignment problem, the number of buffer stages is obtained
from the solution of the integer linear programming problem, and the buffers are
placed on the edges of the buffering graph $GB$ corresponding to $GW$, except on the
critical path $\phi_{l^*}(u_n)$ of the end node $u_n$. In order to reduce the total number
of buffer stage variables in the optimal buffer assignment problem, a useful
equivalent transformation on a balanced buffering graph is introduced. A
transformation of a balanced buffering graph $GB$ with respect to a weighted ADFG $GW$,
obtained by adjusting the position and amount of its buffering, is said to be an
equivalent transformation if the new transformed buffering graph is also a balanced
buffering graph with respect to the weighted ADFG $GW$ (recall that a balanced
buffering graph is not unique). In general, the equivalent transformation has the
following three properties:
(a) A buffer stage can be moved along a chain, which is defined as a directed path in
a buffering graph $GB$ such that every node along the path, except the starting
and ending nodes of the chain, has exactly one incoming and one outgoing edge.
(b) Two or more buffers on the same chain can be combined to form a new
buffer which has the same number of computing stages as the sum of these
buffers.
(c) Combination of properties (a) and (b).
Based on the equivalent transformation of a balanced buffering graph $GB$, we
can move the buffers along the chains of $GB$ to multi-output nodes (or multi-input
nodes). The new balanced buffering graph has the same properties as the original $GB$,
with the buffers attached to the multi-output nodes (or multi-input nodes). We say
that the new balanced buffering graph is normalized. As an example, in Figure
2(a), paths A-B-C-D, E-F-D, and E-G-D are chains. By combining the
buffers along the chains and moving the resultant buffers to the output edges of the
multi-output nodes A and E, we arrive at a new balanced buffering graph as
shown in Figure 2(b).
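Properties (a) and (b) can be sketched as a small operation on a dictionary of buffer
stages. The function name `normalize_chain` and the buffer values below are
hypothetical; the buffer sizes in Figure 2 are not reproduced here, so the numbers only
illustrate that the total delay along a chain is preserved when its buffers are merged
onto the first edge of the chain.

```python
def normalize_chain(chain_edges, buffers):
    """Merge all buffers on a chain into one buffer on the chain's first edge."""
    total = sum(buffers.get(edge, 0) for edge in chain_edges)
    # Keep buffers that lie outside the chain untouched.
    merged = {edge: size for edge, size in buffers.items() if edge not in chain_edges}
    if total:
        merged[chain_edges[0]] = total
    return merged


# Chain A-B-C-D with hypothetical buffers of 1 and 2 stages on its first two edges.
chain = [("A", "B"), ("B", "C"), ("C", "D")]
buffers = {("A", "B"): 1, ("B", "C"): 2, ("E", "F"): 4}

normalized = normalize_chain(chain, buffers)
print(normalized)  # {('E', 'F'): 4, ('A', 'B'): 3}
```

Because every interior node of a chain has one incoming and one outgoing edge, every
path that uses any edge of the chain uses all of them, so path costs are unchanged.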
With the equivalent transformation on a balanced buffering graph, the optimal
buffer assignment problem can be reformulated for the normalized balanced buffering
graph instead of the balanced buffering graph. This, in effect, greatly reduces the
total number of buffer stage variables because these variables are attached only to
multi-output (or multi-input) nodes. While constructing the integer linear programming
formulation for the normalized balanced buffering graph of a weighted ADFG $GW$, it
can be shown that many redundant integer linear constraint equations (in Eq. (5))
exist, making the optimization problem more difficult than it should be. The redundant
integer linear constraint equations come from the overlap between paths of two
different multi-input nodes. They can be removed easily by recognizing the overlapping
path (or common path) traversed by the different paths. A path decomposition technique
is utilized to remove redundant integer linear constraint equations. Let $\phi_l(u_k)$
denote an $l$th path from the input node $u_0$ to a multi-input node $u_k$ which passes
through some other multi-input nodes. Among these multi-input nodes, the multi-input
node $u^*$ nearest to the node $u_k$ is selected to decompose the path $\phi_l(u_k)$ into
two sub-paths, that is, $\phi_l(u_k) = \phi_l(u^*) + \phi_l(u^*, u_k)$.
Thus, the integer linear constraint equations of the path $\phi_l(u_k)$ with respect to
the node $u_k$ can be written as:
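
$$ \sum_{e(i,j) \in \phi_l(u_k)} w(e(i, j)) + \sum_{B(e(i,j)) \in \phi_l(u_k)} |B(e(i, j))| = \sum_{e(i,j) \in \phi_l(u^*)} w(e(i, j)) + \sum_{e(i,j) \in \phi_l(u^*, u_k)} w(e(i, j)) + \sum_{B(e(i,j)) \in \phi_l(u^*)} |B(e(i, j))| + \sum_{B(e(i,j)) \in \phi_l(u^*, u_k)} |B(e(i, j))| \tag{7} $$

where $1 \le l \le m_k$. Using Eq. (2) for the path $\phi_l(u^*)$ to the node $u^*$, Eq. (7)
becomes

$$ \sum_{e(i,j) \in \phi_l(u_k)} w(e(i, j)) + \sum_{B(e(i,j)) \in \phi_l(u_k)} |B(e(i, j))| = w_c(u^*) + \sum_{B(e(i,j)) \in \phi_{l^*}(u^*)} |B(e(i, j))| + \sum_{B(e(i,j)) \in \phi_l(u^*, u_k)} |B(e(i, j))| + \sum_{e(i,j) \in \phi_l(u^*, u_k)} w(e(i, j)). \tag{8} $$

Using the result of Lemma 1 and Eq. (3), Eq. (8) becomes

$$ \sum_{B(e(i,j)) \in \phi_l(u^*, u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| + \sum_{B(e(i,j)) \in \phi_{l^*}(u^*)/\phi_{l^*}(u_n)} |B(e(i, j))| - \sum_{B(e(i,j)) \in \phi_{l^*}(u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| = \Big[ w_c(u_k) - w_c(u^*) - \sum_{e(i,j) \in \phi_l(u^*, u_k)} w(e(i, j)) \Big]. \tag{9} $$
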
With the above procedure for reducing redundant equations, the integer linear
constraint equations for the normalized balanced buffering graph with respect to a
weighted ADFG GW can be constructed according to the Procedure ILEG (Integer
Linear Equation Generator) listed below.
Procedure ILEG (GW, ILCE(GB)). This procedure generates integer linear
constraint equations ILCE(GB) for a normalized balanced buffering graph GB with
respect to a given weighted ADFG GW with labeled nodes.
Input: A weighted ADFG GW with labeled nodes.
Output: A set of integer linear constraint equations, ILCE(GB), for a normalized
balanced buffering graph GB with respect to the given weighted ADFG
GW with labeled nodes.
Step 1. [Determine all critical paths] Find all the critical paths $\phi_{l^*}(u_k)$ and
the cost of each critical path $w_c(u_k)$ with respect to a multi-input node $u_k$,
$1 \le k \le n$, by applying Bellman's equation [13].
Step 2. [Assign buffer stage variables] Assign buffer stage variables to the output (or
input) edges which are attached to multi-output (or multi-input) nodes,
except for the output (or input) edges belonging to the critical path of the
end node $u_n$. Output edges are preferred if output and input edges are
on the same chain. It is worth pointing out that a node may be both a
multi-input and a multi-output node.
Step 3. [Generate integer linear constraint equations] For any path $\phi_l(u_k)$ with
respect to a multi-input node $u_k$, $1 \le l \le m_k$, $1 \le k \le n$, if $\phi_l(u_k)$ does
not pass through any other multi-input nodes, then we have

$$ \sum_{B(e(i,j)) \in \phi_l(u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| - \sum_{B(e(i,j)) \in \phi_{l^*}(u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| = \Big[ w_c(u_k) - \sum_{e(i,j) \in \phi_l(u_k)} w(e(i, j)) \Big]. \tag{10a} $$

Otherwise, a multi-input node $u^* \in \phi_l(u_k)$ nearest to the node $u_k$ is selected
for path decomposition, and we have

$$ \sum_{B(e(i,j)) \in \phi_l(u^*, u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| + \sum_{B(e(i,j)) \in \phi_{l^*}(u^*)/\phi_{l^*}(u_n)} |B(e(i, j))| - \sum_{B(e(i,j)) \in \phi_{l^*}(u_k)/\phi_{l^*}(u_n)} |B(e(i, j))| = \Big[ w_c(u_k) - w_c(u^*) - \sum_{e(i,j) \in \phi_l(u^*, u_k)} w(e(i, j)) \Big]. \tag{10b} $$

Note that the paths and their costs between two multi-input nodes may be
found with time complexity $O(n^3)$ by using the path-finding algorithm [2].
Step 4. [Output integer linear constraint equations] Output the integer linear
constraint equations from Eq. (10a) or Eq. (10b) and return.
END ILEG
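
To make Step 3 concrete, the sketch below builds one constraint of the form of
Eq. (10a) from a path, the critical path of its multi-input node, and the critical
path of the end node. It is a sketch under assumptions, not the report's
implementation: the function names are hypothetical, and the stage counts and the
buffer placement (variables $B_1, \ldots, B_9$ on specific edges) were inferred from
the constants quoted in the worked example that follows, since Figure 3 is not
reproduced here.

```python
def path_edges(nodes):
    """Edges of a path given as a node sequence."""
    return set(zip(nodes, nodes[1:]))


def constraint_10a(path, crit_k, crit_n, stages, buffer_on_edge):
    """Return (coefficients, rhs) of Eq. (10a) for one path into a multi-input node."""
    skip = path_edges(crit_n)  # no buffers on the critical path of the end node
    coeffs = {}
    for edge in path_edges(path) - skip:
        if edge in buffer_on_edge:
            name = buffer_on_edge[edge]
            coeffs[name] = coeffs.get(name, 0) + 1
    for edge in path_edges(crit_k) - skip:
        if edge in buffer_on_edge:
            name = buffer_on_edge[edge]
            coeffs[name] = coeffs.get(name, 0) - 1
    coeffs = {name: c for name, c in coeffs.items() if c != 0}

    def cost(nodes):
        # Each edge is weighted by the computing stages of its source node.
        return sum(stages[i] for i, _ in path_edges(nodes))

    return coeffs, cost(crit_k) - cost(path)


# Inferred example data (assumption, see the lead-in).
stages = {"A": 5, "B": 6, "C": 20, "D": 7, "E": 2, "F": 12,
          "G": 6, "H": 6, "I": 10, "J": 10, "K": 8}
buffer_on_edge = {("A", "B"): "B1", ("A", "D"): "B2", ("B", "F"): "B3",
                  ("B", "E"): "B4", ("B", "G"): "B5", ("D", "H"): "B6",
                  ("D", "I"): "B7", ("G", "K"): "B8", ("K", "M"): "B9"}
crit_end = ["A", "C", "G", "J", "M"]

eq_g = constraint_10a(["A", "B", "E", "G"], ["A", "C", "G"], crit_end,
                      stages, buffer_on_edge)
eq_k = constraint_10a(["A", "D", "H", "K"], ["A", "C", "G", "K"], crit_end,
                      stages, buffer_on_edge)
print(eq_g)  # coefficients B1: +1, B4: +1 with right-hand side 12
print(eq_k)  # coefficients B2: +1, B6: +1, B8: -1 with right-hand side 13
```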

Let us illustrate Procedure ILEG with an example. Figure 3(a) shows a
weighted ADFG GW. We would like to obtain an optimal normalized balanced
buffering graph GB corresponding to GW.
Step 1. Nodes G, J, K, and M are multi-input nodes. The critical paths are:
(a) Node G: $\phi_{l^*}(G)$ = Path A-C-G, $w_c(G) = 25$.
(b) Node J: $\phi_{l^*}(J)$ = Path A-C-G-J, $w_c(J) = 31$.
(c) Node K: $\phi_{l^*}(K)$ = Path A-C-G-K, $w_c(K) = 31$.
(d) Node M: $\phi_{l^*}(M)$ = Path A-C-G-J-M, $w_c(M) = 41$.
Step 2. Applying the buffer assignment rules, we obtain the normalized buffering
graph shown in Figure 3(b).
Step 3. The integer linear constraint equations are generated according to Eq. (10a)
or Eq. (10b):
(a) The integer linear constraint equations generated with respect to the
paths associated with node G:
(i) Path A-B-E-G: $|B_1| + |B_4| = (25 - 5 - 6 - 2) = 12$.
(ii) Path A-B-G: $|B_1| + |B_5| = (25 - 5 - 6) = 14$.
(b) The integer linear constraint equations generated with respect to the
paths associated with node J:
(i) Path A-B-F-J: $|B_1| + |B_3| = (31 - 5 - 6 - 12) = 8$.
(c) The integer linear constraint equations generated with respect to the
paths associated with node K:
(i) Path A-D-H-K: $|B_2| + |B_6| - |B_8| = (31 - 5 - 7 - 6) = 13$.
(ii) Path A-D-I-K: $|B_2| + |B_7| - |B_8| = (31 - 5 - 7 - 10) = 9$.
The above integer linear constraint equations have been generated according
to Eq. (10a). The following case shows an integer linear constraint equation
generated by using Eq. (10b).
(d) In generating the integer linear constraint equations with respect to the
paths associated with node M, we select $u^* = K$ as the multi-input
node nearest to the node M on the path $\phi_l(M)$ that runs from the input node
$u_0$ through the buffer $B_7$ to the node M. The path $\phi_l(M)$ can
be decomposed into two sub-paths, that is, $\phi_l(M) = \phi_l(u^*) + \phi_l(u^*, M)$.
According to Eq. (10b), we have $|B_8| + |B_9| = (41 - 31 - 8) = 2$.
Step 4. Applying integer linear programming to minimize the total number of
buffer stages, we have:

$$ \text{Minimize} \quad \sum_{i=1}^{9} |B_i| $$

subject to the constraints of the integer linear equations generated in Step 3.
The optimization gives $|B_1| = 8$, $|B_2| = 9$, $|B_3| = 0$, $|B_4| = 4$,
$|B_5| = 6$, $|B_6| = 4$, $|B_7| = 0$, $|B_8| = 0$, $|B_9| = 2$, and the total number
of buffer stages is 33.
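
The result of this small example can be checked by exhaustive search. The sketch below
is only a verification aid, not the pseudo-polynomial method of [18]: it uses the six
equality constraints of Step 3 to express six of the buffer variables in terms of
$|B_1|$, $|B_2|$, and $|B_8|$ (an elimination done by hand here), enumerates those
three, and keeps the non-negative solution with the smallest total.

```python
from itertools import product

best_total, best_solution = None, None
# Free variables: B1, B2, B8. The ranges comfortably cover all non-negative solutions.
for b1, b2, b8 in product(range(15), range(16), range(3)):
    b = {
        1: b1, 2: b2, 8: b8,
        3: 8 - b1,         # |B1| + |B3| = 8
        4: 12 - b1,        # |B1| + |B4| = 12
        5: 14 - b1,        # |B1| + |B5| = 14
        6: 13 - b2 + b8,   # |B2| + |B6| - |B8| = 13
        7: 9 - b2 + b8,    # |B2| + |B7| - |B8| = 9
        9: 2 - b8,         # |B8| + |B9| = 2
    }
    if min(b.values()) < 0:
        continue  # violates |B| >= 0
    total = sum(b.values())
    if best_total is None or total < best_total:
        best_total, best_solution = total, b

print(best_total)                                  # 33
print([best_solution[i] for i in range(1, 10)])    # [8, 9, 0, 4, 6, 4, 0, 0, 2]
```

The search reproduces the buffer assignment and the total of 33 stages stated above.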
3. Formulation for Decomposition Approach

The previous section indicates that the optimal buffer assignment problem can be
solved by formulating it as an integer linear optimization problem. If the task graph
is simple, then the buffer assignment problem can be easily solved, as illustrated in
the above example. However, if the number of computational nodes in an ADFG is quite
large, then the number of integer linear constraint equations may grow tremendously,
making the optimization problem intractable. Thus, a systematic approach for reducing
the computational difficulty of large-scale integer linear optimization for the buffer
assignment problem must be devised. This section addresses a decomposition approach
which utilizes the critical path concept to decompose the task graph into a set of
connected subgraphs, so that the integer linear optimization technique can be used to
solve the buffer assignment problem in each subgraph.
Lemma 2. If a multi-input node $u_k \in \phi_{l_n^*}(u_n)$ and its critical path is
$\phi_{l_k^*}(u_k)$, then $\phi_{l_k^*}(u_k) \subseteq \phi_{l_n^*}(u_n)$; that is,
$\phi_{l_k^*}(u_k)$ is the path from the input node $u_0$ to the node $u_k$ along the
critical path $\phi_{l_n^*}(u_n)$, where $u_n$ is the end node.
Proof: We shall prove Lemma 2 by contradiction. Assume that Lemma 2 is not
true. Then there exists a critical path $\phi_{l_k^*}(u_k)$ for node $u_k$ which is not a
path segment of the critical path $\phi_{l_n^*}(u_n)$ for node $u_n$, and its cost
$w(\phi_{l_k^*}(u_k))$ is greater than the cost of any other path from the input node
$u_0$ to the node $u_k$; that is, we have
$w(\phi_{l_k^*}(u_k)) > w(\phi_{l_n^*}(u_0, u_k))$, where $\phi_{l_n^*}(u_0, u_k)$ is the
path from the input node $u_0$ to the node $u_k$ along the critical path
$\phi_{l_n^*}(u_n)$. Let $\phi_{l_n^*}(u_k, u_n)$ be the remainder of the critical path of
$u_n$, that is, $\phi_{l_n^*}(u_n) = \phi_{l_n^*}(u_0, u_k) + \phi_{l_n^*}(u_k, u_n)$. A new
path $\phi_l(u_n)$ can be constructed by connecting the two subpaths, that is,
$\phi_l(u_n) = \phi_{l_k^*}(u_k) + \phi_{l_n^*}(u_k, u_n)$.
However, the cost of $\phi_l(u_n)$ is greater than the cost of $\phi_{l_n^*}(u_n)$, that is,

$$ w(\phi_l(u_n)) = w(\phi_{l_k^*}(u_k)) + w(\phi_{l_n^*}(u_k, u_n)) > w(\phi_{l_n^*}(u_0, u_k)) + w(\phi_{l_n^*}(u_k, u_n)) = w(\phi_{l_n^*}(u_n)) = w_c(u_n). $$

This conclusion contradicts the definition of the critical path; thus,
$\phi_{l_k^*}(u_k) = \phi_{l_n^*}(u_0, u_k) \subseteq \phi_{l_n^*}(u_n)$. □
Definition 5: Let $\overline{GW} = (V, E, W)$ be an undirected graph with $N = |V|$ and
$M = |E|$. A connected component $\pi_m$ of $\overline{GW}$ is a maximal connected
subgraph, that is, a connected subgraph that is not contained in any larger connected
subgraph.
Definition 6: A directed block $\vec{\pi}_m$ of a directed graph $GW^*$ is a directed
subgraph whose corresponding undirected subgraph $\pi_m$ (i.e.,
$\pi_m = \mathrm{Undirect}(\vec{\pi}_m)$, where $\mathrm{Undirect}(\cdot)$ means taking the
directed arrows out) is a connected component of the corresponding undirected graph
$\overline{GW}$ ($\overline{GW} = \mathrm{Undirect}(GW^*)$).
The problem of finding all the connected components of an undirected graph
$\overline{GW}$ can be solved with time complexity $O(N + M)$ by using the depth-first
search algorithm SEARCH($\overline{GW}$, $\pi_m$) in [3], where $\overline{GW}$ is the input
undirected graph and $\pi_m$, $1 \le m \le m_{cc}$, are the output connected components;
$m_{cc}$ is also the number of directed blocks in the corresponding directed graph
$GW^*$ of $\overline{GW}$. The problem of finding the directed blocks $\vec{\pi}_m$ of a
given directed graph $GW^*$ can be solved by a modified depth-first search algorithm,
which is described in Procedure DBS1 (Directed Blocks Searcher 1) listed below:
Procedure DBS1 ($GW^*$, $\vec{\pi}_m$). This procedure finds all the directed blocks of a
given directed graph $GW^*$.
Input: A directed graph $GW^*$.
Output: The directed blocks of $GW^*$, $\vec{\pi}_m$, $1 \le m \le m_{cc}$, where $m_{cc}$ is
the number of the directed blocks in $GW^*$.
Step 1. [Obtain the undirected graph of $GW^*$] Let $\overline{GW} = \mathrm{Undirect}(GW^*)$.
That is, remove the directed arrows of $GW^*$.
Step 2. [Determine undirected connected components of $\overline{GW}$] Find all the
undirected connected components, $\pi_m$, $1 \le m \le m_{cc}$, of $\overline{GW}$ by the
depth-first search algorithm SEARCH($\overline{GW}$, $\pi_m$).
Step 3. [Determine directed blocks] Obtain all the directed blocks $\vec{\pi}_m$,
$1 \le m \le m_{cc}$, by assigning the directed arrows back to $\pi_m$,
$1 \le m \le m_{cc}$, according to the input directed graph $GW^*$.
Step 4. [Output the directed blocks] Output all the directed blocks $\vec{\pi}_m$,
$1 \le m \le m_{cc}$.
END DBS1
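
The four steps of Procedure DBS1 can be sketched in a few lines of Python. This is a
minimal sketch under assumptions: the function name `directed_blocks` is hypothetical,
the graph is given only as a list of directed edges (isolated nodes are ignored), and
an iterative depth-first search stands in for the algorithm SEARCH of [3].

```python
def directed_blocks(edges):
    """Group directed edges by the connected component of the underlying undirected graph."""
    # Step 1: drop the edge directions.
    adjacency = {}
    for i, j in edges:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)

    seen, blocks = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Step 2: depth-first search for one undirected connected component.
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        # Step 3: restore directions by collecting the original directed edges.
        blocks.append([(i, j) for (i, j) in edges if i in component])
    # Step 4: output the directed blocks.
    return blocks


example_edges = [("a", "b"), ("c", "b"), ("x", "y")]
blocks = directed_blocks(example_edges)
print(blocks)  # [[('a', 'b'), ('c', 'b')], [('x', 'y')]]
```

The search itself visits each node and edge a constant number of times; the final list
comprehension rescans all edges per block and is kept only for brevity.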

The connected components $\pi_m$ from the algorithm SEARCH($\overline{GW}$, $\pi_m$) and
the directed blocks $\vec{\pi}_m$, $1 \le m \le m_{cc}$, from Procedure DBS1 will be used in
our decomposition approach to obtain a set of connected subgraphs, so that the integer
linear optimization technique can be applied to each subgraph to solve the buffer
assignment problem. Our decomposition approach utilizes the critical path of the end
node $u_n$, i.e., $\phi_{l^*}(u_n)$, as a cut set to partition an ADFG $GW$ into several
subgraphs. The procedure of graph partition and the determination of decomposed
subgraphs (or directed blocks) is called graph decomposition [19]. The idea of the
graph decomposition approach is first to take the critical path out of the given
directed graph. This creates several edge-disjoint subgraphs in which some of the edges
no longer connect a pair of nodes, because the nodes in the critical path have been
removed. To remedy this, nodes that are in the critical path $\phi_{l^*}(u_n)$ and are
attached to two or more edges (incoming or outgoing) are called the decomposed nodes
and denoted by $u_k$ (the $k$th decomposed node); each of these decomposed nodes $u_k$
is split into several independent pseudo-nodes $u_k^i$, $1 \le i \le d_k$, which are
labeled according to the attached edges from left to right, and the last pseudo-node
$u_k^{d_k}$ is always assigned to the $k$th decomposed node in the critical path
$\phi_{l^*}(u_n)$, where $d_k$ is the number of independent pseudo-nodes for the $k$th
decomposed node. Thus, a new directed graph $GW^*$ containing the split directed
subgraphs of the ADFG $GW$ can be obtained by removing the critical path
$\phi_{l^*}(u_n)$ and splitting the decomposed nodes. That is,
$GW^* = (GW/\phi_{l^*}(u_n)) \cup \{\text{labeled pseudo-nodes } u_k^i,\ 1 \le i \le (d_k - 1),\ 1 \le k \le k_{DN}\}$,
where $k_{DN}$ is the number of the decomposed nodes in $GW$. The determination of the
directed blocks $\vec{\pi}_m$ of an ADFG $GW$ when the critical path $\phi_{l^*}(u_n)$ is
taken out is very similar to Procedure DBS1 for finding the directed blocks
$\vec{\pi}'_m$ of $GW^*$. The directed blocks $\vec{\pi}_m$ and $\vec{\pi}'_m$ are always
equivalent except for the existence of the pseudo-nodes $u_k^i$. The procedure for
determining the directed blocks $\vec{\pi}_m$ of an ADFG $GW$ when the critical path
$\phi_{l^*}(u_n)$ of the end node $u_n$ is taken out can be described
in the following Procedure DBS2 (Directed Blocks Searcher 2):
Procedure DBS2 (GW,7fm). This procedure finds all the directed blocks of
GW when its critical path <j)^( un ) is taken out.
Input: A weighted graph GW and its critical path <^* (un ) of the end node un.
Output: The directed blocks, 7fm, 1 < m < mcc, of the ADFG GW when the critical
path <f>i ♦( un ) of the end node un is taken out.
Step 1. [Remove the critical path in GW and label the decomposed nodes]
(i)
Obtain all the subgraphs from the ADFG GW by removing the critical
path
un ) of the end node un and splitting the decomposed nodes
(ii)

^Jfc> 1 ^ — ^DN‘
Label the independent pseudo-nodes of the decomposed node uk, that is
Uk, 1 < * < dk, and uk =
where ® is the direct sum
of the pseudo-nodes coming from the same decomposed node.

Step 2. [Construct $GW^*$] Construct a new directed graph $GW^*$ which is the split
directed subgraphs with labeled pseudo-nodes in Step 1:
$$GW^* = (GW / \phi_{l^*}(u_n)) \cup \Big( \bigcup_{k=1}^{k_{DN}} \bigcup_{i=1}^{d_k - 1} \{u_k^i\} \Big).$$

Step 3. [Find the directed blocks of $GW^*$] Apply the Procedure DBS1 to find the
directed blocks $\pi'_m$ of $GW^*$, that is, DBS1 ($GW^*$, $\pi'_m$).
Step 4. [Identify and merge pseudo-nodes in each directed block] Determine the
labeled pseudo-nodes which come from the same decomposed node and are in
the same directed block $\pi'_m$. These labeled pseudo-nodes will be merged into
a big labeled pseudo-node by the direct sum operator $\oplus$.
Step 5. [Determine and output the directed blocks $\pi_m$] Obtain $\pi_m$ from $\pi'_m$ by
applying the pseudo-nodes merging procedure in Step 4 and output $\pi_m$,
$1 \le m \le m_{cc}$.
END DBS2
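
The steps of Procedure DBS2 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation. It assumes that the directed blocks found by DBS1 can be approximated by the connected components of the underlying undirected graph (DBS1 itself is defined earlier in the report), that node names are strings, and that pseudo-nodes are numbered in edge-list order rather than "from left to right". The function name `dbs2_sketch` and the small test graph are hypothetical. The last pseudo-node of each decomposed node, which stays with the critical path, is not materialized.

```python
from collections import defaultdict


def dbs2_sketch(edges, critical_path):
    """Return the directed blocks left after removing the critical path.

    edges: list of (tail, head) node-name pairs (names assumed to be strings).
    critical_path: list of node names from the input node to the end node.
    """
    cp_nodes = set(critical_path)
    cp_edges = set(zip(critical_path, critical_path[1:]))
    counter = defaultdict(int)

    def split(node):
        # Steps 1-2: every off-path edge attached to a critical-path node
        # gets its own pseudo-node (node, i); other nodes are kept as-is.
        if node not in cp_nodes:
            return node
        counter[node] += 1
        return (node, counter[node])

    gw_star = [(split(u), split(v)) for u, v in edges if (u, v) not in cp_edges]

    # Step 3 (assumed stand-in for DBS1): connected components by union-find.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in gw_star:
        parent[find(a)] = find(b)

    grouped = defaultdict(list)
    for a, b in gw_star:
        grouped[find(a)].append((a, b))

    # Steps 4-5: inside each block, merge the pseudo-nodes that come from
    # the same decomposed node (the direct-sum step).
    blocks = []
    for block_edges in grouped.values():
        plain, merged = set(), defaultdict(list)
        for endpoint in (x for edge in block_edges for x in edge):
            if isinstance(endpoint, tuple):
                merged[endpoint[0]].append(endpoint[1])
            else:
                plain.add(endpoint)
        blocks.append({"nodes": sorted(plain),
                       "merged": {k: sorted(v) for k, v in merged.items()}})
    return blocks


# Hypothetical graph: critical path s-a-b-t plus two off-path regions.
edges = [("s", "a"), ("a", "b"), ("b", "t"),
         ("a", "x"), ("a", "z"), ("x", "w"), ("z", "w"), ("w", "t"),
         ("s", "y"), ("y", "b")]
blocks = dbs2_sketch(edges, ["s", "a", "b", "t"])
```

For this hypothetical graph the sketch returns two blocks. In the first, the two pseudo-nodes split from `a` land in the same block and are merged into one entry, which mirrors Step 4.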

Using the Procedure DBS2 ($GW$, $\pi_m$), we can obtain all the directed blocks of
$GW$, $\pi_m$, $1 \le m \le m_{cc}$. Furthermore, new subgraphs can be constructed from $\pi_m$
and defined as $\pi_m^+ = \pi_m \cup_{DN} \phi_{l^*}(u_n)$, for $1 \le m \le m_{cc}$, where the operator $\cup_{DN}$
means performing the set union of $\pi_m$ and $\phi_{l^*}(u_n)$ (except the pseudo-nodes) and the
direct sum on the pseudo-nodes coming from the same decomposed nodes in $\pi_m$ and
$\phi_{l^*}(u_n)$ simultaneously. These new subgraphs are called pseudo-connected
components of the ADFG $GW$ and will be used to decompose the buffer assignment
problem into several small subproblems.
It has been shown in section 2 that the buffer stage variables in $GB$ are
determined from solving the associated integer linear constraint equations which are
obtained from the Procedure ILEG. Let $\pi B_m^+$ be a normalized balanced buffering
graph for $\pi_m^+$ and ILCE($\pi B_m^+$) be the associated integer linear constraint equations
which are obtained from the Procedure ILEG. Since an ADFG $GW$ may have a large
number of nodes, determining the buffer stage variables in $GB$ from its large number
of integer linear constraint equations may not be desirable. Since
$GB = \mathop{\cup_{DN}}\limits_{m=1}^{m_{cc}} \pi B_m^+$,
we would like to use this fact to see whether solving the buffer stage variables in each
$\pi B_m^+$, $1 \le m \le m_{cc}$, separately and independently is equivalent to solving the buffer
stage variables in $GB$. If this is true, then we have divided a large-scale integer
linear optimization problem into $m_{cc}$ smaller-scale subproblems, each of which can be
easily solved. This is stated in Theorem 1.
Theorem 1. Let $GB$ and $\pi B_m^+$, $1 \le m \le m_{cc}$, be, respectively, the normalized
balanced buffering graphs of $GW$ and its pseudo-connected components $\pi_m^+$,
$1 \le m \le m_{cc}$. The buffer stage variables in $GB$ can be determined from their
associated integer linear constraint equations, ILCE($\pi B_m^+$), $1 \le m \le m_{cc}$, separately and
independently. Furthermore, the buffer stage variables determined from the integer
linear constraint equations ILCE($\pi B_{m_1}^+$) have no relations to the buffer stage
variables determined from the equations ILCE($\pi B_{m_2}^+$), where $m_1 \ne m_2$.
Proof: In order to prove the above theorem, we follow the procedure for
constructing the associated integer linear constraint equations for $GB$ and show how they
can be replaced by ILCE($\pi B_m^+$), $1 \le m \le m_{cc}$. For convenience, we assume there is a
multi-input node $u_k$ in both $GB$ (or the corresponding $GW$) and $\pi B_m^+$ (or the
corresponding $\pi_m^+$). $\pi B_m^+$ is the $m$th pseudo-connected component of $GB$. Assume
that the associated paths from the input node $u_0$ to the node $u_k$ in $GB$ (or $GW$) are
$\phi_l(u_k)$, $1 \le l \le m_k$. Two cases are possible: (1) some of these paths pass through
$\pi B_m^+$ only, and (2) some of them pass through some other pseudo-connected
components of $GB$. In case (1), because the paths in $GB$ are also the paths in $\pi B_m^+$, we
will obtain the same resulting associated integer linear equations for the paths in $GB$
and the paths in $\pi B_m^+$. In case (2), the paths from the input node $u_0$ to the node $u_k$
may pass through some other pseudo-connected components, but they must intersect
the critical path $\phi_{l^*}(u_n)$ of the end node $u_n$ at some nodes, and finally end at the node
$u_k$ in $\pi B_m^+$. It has been shown previously that a multi-input node $u^*$, which is on the
critical path $\phi_{l^*}(u_n)$ and nearest to the node $u_k$, can be selected to decompose the
path into two subpaths, that is, $\phi_l(u_k) = \phi_l(u^*) + \phi_l(u^*, u_k)$, where $\phi_l(u^*)$ is the
path from the input node $u_0$ to the node $u^*$ and passes through some other
pseudo-connected components, and the entire traversal of the path $\phi_l(u^*, u_k)$ is in $\pi B_m^+$.
Thus, the associated integer linear equation for the path $\phi_l(u_k)$ in $GB$ can be
rewritten as follows:
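$$
\begin{aligned}
\sum_{e(i,j) \in \phi_l(u_k)} w(e(i,j)) + \sum_{B(e(i,j)) \in \phi_l(u_k)} |B(e(i,j))|
 ={}& \sum_{e(i,j) \in \phi_l(u^*)} w(e(i,j)) + \sum_{B(e(i,j)) \in \phi_l(u^*)} |B(e(i,j))| \\
 &+ \sum_{e(i,j) \in \phi_l(u^*, u_k)} w(e(i,j)) + \sum_{B(e(i,j)) \in \phi_l(u^*, u_k)} |B(e(i,j))|
\end{aligned}
\tag{11}
$$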

Using Lemma 1 and Eq. (2), the first two terms on the right hand side of Eq. (11) can
be written as
$$
\sum_{e(i,j) \in \phi_l(u^*)} w(e(i,j)) + \sum_{B(e(i,j)) \in \phi_l(u^*)} |B(e(i,j))|
 = w_c(u^*) + \sum_{B(e(i,j)) \in \phi_{l^*}(u^*)} |B(e(i,j))|
\tag{12}
$$

Using the result of Lemma 2, the critical path to the node $u^*$, $\phi_{l^*}(u^*)$, is the path
from the input node $u_0$ to the node $u^*$ along the critical path $\phi_{l^*}(u_n)$, that is,
$\phi_{l^*}(u^*) = \phi_{l^*}(u_0, u^*)$, which is independent of the buffer stage variables. Then Eq.
(12) becomes
$$
\sum_{e(i,j) \in \phi_l(u^*)} w(e(i,j)) + \sum_{B(e(i,j)) \in \phi_l(u^*)} |B(e(i,j))|
 = w_c(u^*) = \text{a constant}
\tag{13}
$$

Substituting Eq. (13) into Eq. (11), we have
$$
\sum_{e(i,j) \in \phi_l(u_k)} w(e(i,j)) + \sum_{B(e(i,j)) \in \phi_l(u_k)} |B(e(i,j))|
 = w_c(u^*) + \sum_{e(i,j) \in \phi_l(u^*, u_k)} w(e(i,j)) + \sum_{B(e(i,j)) \in \phi_l(u^*, u_k)} |B(e(i,j))|
\tag{14}
$$

Equation (14) indicates two things: First, the associated integer linear equations
with respect to the node $u_k \in \pi B_m^+$ depend on the buffer stage variables in $\pi B_m^+$ and
are independent of the buffer stage variables in the other pseudo-connected
components because $\phi_l(u^*, u_k) \in \pi B_m^+$. Second, Eq. (14) can be generated and replaced
by a path in $\pi B_m^+$, that is, the path travels from the input node $u_0$ to the node $u^*$
along the critical path $\phi_{l^*}(u_n)$, then from node $u^*$ to node $u_k$ along the path $\phi_l(u^*, u_k)$
in $\pi B_m^+$.
So, for any multi-input node $u_k$ belonging to $GB$ and $\pi B_m^+$, it has been shown
that the associated integer linear equation system for node $u_k$ in $GB$ can be replaced
by the associated integer linear equation system for node $u_k$ in $\pi B_m^+$. In other words,
the associated integer linear equation system for $GB$, i.e., ILCE($GB$), can be replaced
by the associated integer linear equation systems for $\pi B_m^+$, i.e., ILCE($\pi B_m^+$),
$1 \le m \le m_{cc}$. □

Using the results from Theorem 1 and based on the fact that
$GB = \mathop{\cup_{DN}}\limits_{m=1}^{m_{cc}} \pi B_m^+$,
$\sum_{B(e(i,j)) \in GB} |B(e(i,j))|$ becomes
$\sum_{m=1}^{m_{cc}} \sum_{B(e(i,j)) \in \pi B_m^+} |B(e(i,j))|$, and the integer linear
optimization problem in Eqs. (4) and (5) can be rewritten as follows:
$$
\text{Min} \sum_{m=1}^{m_{cc}} \ \sum_{B(e(i,j)) \in \pi B_m^+} |B(e(i,j))|
\tag{15}
$$
subject to the associated integer linear equation systems ILCE($\pi B_m^+$), $1 \le m \le m_{cc}$.
Because the buffer stage variables in different pseudo-connected components of $GB$
are independent, Eq. (15) can be decomposed into the following subproblems:
For each $m = 1, 2, \ldots, m_{cc}$:
$$
\text{Min} \sum_{B(e(i,j)) \in \pi B_m^+} |B(e(i,j))|
\tag{16}
$$
subject to the associated integer linear equation system ILCE($\pi B_m^+$).
This graph decomposition approach provides us with a technique to divide a
large-scale integer linear optimization problem into a number of smaller-scale
subproblems ($m_{cc}$ subproblems), each of which can be easily solved in pseudo-polynomial
time.
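
The independence used to pass from Eq. (15) to Eq. (16) can be illustrated with a small Python sketch. It does not perform the graph decomposition itself. It only shows the consequence: constraint equations that share no buffer stage variable fall into separate groups, and each group can be optimized on its own. The function name `independent_groups` and the dictionary encoding of a constraint are assumptions made for this illustration. The six constraints are the ones read from Step 5 of the worked example that follows, listed in interleaved order.

```python
from collections import defaultdict


def independent_groups(constraints):
    """Partition constraints into groups that share no variables.

    constraints: list of (coeffs, rhs), where coeffs is {variable: int}.
    Returns a list of lists of constraint indices.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Variables that occur in the same equation belong to the same group.
    for coeffs, _ in constraints:
        names = list(coeffs)
        find(names[0])  # registers single-variable equations as well
        for other in names[1:]:
            parent[find(other)] = find(names[0])

    groups = defaultdict(list)
    for index, (coeffs, _) in enumerate(constraints):
        groups[find(next(iter(coeffs)))].append(index)
    return list(groups.values())


constraints = [
    ({"B1": 1, "B3": 1}, 8),
    ({"B2": 1, "B6": 1, "B8": -1}, 13),
    ({"B1": 1, "B4": 1}, 12),
    ({"B8": 1, "B9": 1}, 2),
    ({"B1": 1, "B5": 1}, 14),
    ({"B2": 1, "B7": 1, "B8": -1}, 9),
]
groups = independent_groups(constraints)
```

Here `groups` separates the equations over $B_1, B_3, B_4, B_5$ from those over $B_2, B_6, B_7, B_8, B_9$, which corresponds to the two subproblems of the form of Eq. (16).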
Let us apply the above decomposition approach to solve the same buffer
assignment problem in section 2.
Step 1. (a) Decompose the ADFG $GW$ in Figure 3(a) into subgraphs by removing
the critical path $\phi_{l^*}(M)$ of the end node M.
(b) Label the pseudo-nodes of the decomposed nodes A, G, J, M, that is,
{A1, A2, A3}, {G1, G2, G3, G4}, {J1, J2}, and {M1, M2}.
(c) Construct $GW^* = (GW / \phi_{l^*}(M)) \cup$ {A1, A2, G1, G2, G3, J1, M1}. Note that
pseudo-nodes A3, G4, J2, and M2 are attached to the critical path
$\phi_{l^*}(M)$. $GW^*$, $\phi_{l^*}(M)$, and the labeled pseudo-nodes are shown in
Figure 4 (4(a), 4(b), and 4(c)).


Step 2. This step is the same as the Procedure DBS2 ($GW$, $\pi_m$).
(a) Apply the directed block search Procedure DBS1 ($GW^*$, $\pi'_m$) to find the
$\pi'_m$, $1 \le m \le 2$, in $GW^*$. These directed blocks are shown in Figure 4
(4(a) and 4(b)).
(b) Merge the labeled pseudo-nodes that come from the same decomposed
node and are in the same directed block $\pi'_m$ into a big labeled
pseudo-node by the direct sum operator. For example, G1 and G2 are the
labeled pseudo-nodes coming from the decomposed node G in $\pi'_1$ and
will be merged into G12 = G1 $\oplus$ G2.
(c) Obtain $\pi_m$ from $\pi'_m$, $m = 1, 2$, by applying the pseudo-nodes merging
procedure. The directed block $\pi_1$ is shown in Figure 4(d).

Step 3. Let $\pi_m^+ = \pi_m \cup_{DN} \phi_{l^*}(M)$, $1 \le m \le 2$, which are the pseudo-connected
components of $GW$. $\pi_1^+$ and $\pi_2^+$ are shown in Figure 4 (4(e) and 4(f),
respectively).
Step 4. The corresponding normalized balanced buffering graphs $GB$ and $\pi B_m^+$ can be
easily obtained by the buffer assignment rules and have the same graph
structure as $GW$ and $\pi_m^+$, respectively. The buffer stage variables
$B_1^1, B_2^1, B_3^1, B_4^1$ in $\pi B_1^+$ as shown in Figure 4(g) and $B_1^2, B_2^2, B_3^2, B_4^2, B_5^2$
in $\pi B_2^+$ as shown in Figure 4(h) correspond to the buffer stage variables
$B_1, B_3, B_4, B_5$ and $B_2, B_6, B_7, B_8, B_9$ in $GB$ as shown in Figure 3(b),
respectively.
Step 5. Generate the associated integer linear equations system for $\pi B_1^+$ and $\pi B_2^+$ by
the Procedure ILEG, that is, ILCE($\pi B_1^+$) and ILCE($\pi B_2^+$) as follows:
ILCE($\pi B_1^+$): $|B_1^1| + |B_2^1| = 8$, $|B_1^1| + |B_3^1| = 12$, $|B_1^1| + |B_4^1| = 14$
ILCE($\pi B_2^+$): $|B_1^2| + |B_2^2| - |B_4^2| = 13$, $|B_1^2| + |B_3^2| - |B_4^2| = 9$, $|B_4^2| + |B_5^2| = 2$
It has been stated in Step 4 that $B_1^1 = B_1$, $B_2^1 = B_3$, $B_3^1 = B_4$, $B_4^1 = B_5$ and
$B_1^2 = B_2$, $B_2^2 = B_6$, $B_3^2 = B_7$, $B_4^2 = B_8$, $B_5^2 = B_9$. Thus, ILCE($\pi B_1^+$) and
ILCE($\pi B_2^+$) become:
ILCE($\pi B_1^+$): $|B_1| + |B_3| = 8$, $|B_1| + |B_4| = 12$, $|B_1| + |B_5| = 14$
ILCE($\pi B_2^+$): $|B_2| + |B_6| - |B_8| = 13$, $|B_2| + |B_7| - |B_8| = 9$, $|B_8| + |B_9| = 2$

Step 6. The integer linear programming problem for $GB$ can be solved as two
separate subproblems:
(1) Min $\sum_{B_i \in \pi B_1^+} |B_i|$ ⇒ Min $[\,|B_1| + |B_3| + |B_4| + |B_5|\,]$
subject to ILCE($\pi B_1^+$) (found in Step 5).
(2) Min $\sum_{B_i \in \pi B_2^+} |B_i|$ ⇒ Min $[\,|B_2| + |B_6| + |B_7| + |B_8| + |B_9|\,]$
subject to ILCE($\pi B_2^+$) (found in Step 5).
The optimization of subproblem (1) yields $|B_1| = 8$, $|B_3| = 0$, $|B_4| = 4$,
$|B_5| = 6$, and the optimization of subproblem (2) gives $|B_2| = 9$,
$|B_6| = 4$, $|B_7| = 0$, $|B_8| = 0$, $|B_9| = 2$. The results and solution are the
same as given in the example in section 2, but the optimization is much
faster and simpler.
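
The two subproblems of Step 6 are small enough to check by exhaustive search. The Python sketch below is not the integer linear optimization technique used in the paper. It simply enumerates all non-negative integer assignments inside a box and keeps the one with the fewest buffer stages. The function name `solve_subproblem`, the constraint encoding, and the choice of the largest right-hand side as the search bound are assumptions made for this illustration. The constraint coefficients are those read from Step 5.

```python
from itertools import product


def solve_subproblem(variables, constraints):
    """Minimise the total number of buffer stages by exhaustive search.

    variables: list of variable names.
    constraints: list of (coeffs, rhs), where coeffs is {variable: int}.
    Returns (best_total, best_assignment), or (None, None) if nothing fits.
    """
    # Assumed search box: every |B_i| lies in 0..max(rhs).
    bound = max(rhs for _, rhs in constraints)
    best_total, best = None, None
    for values in product(range(bound + 1), repeat=len(variables)):
        total = sum(values)
        if best_total is not None and total >= best_total:
            continue  # cannot improve on the best assignment found so far
        x = dict(zip(variables, values))
        if all(sum(c * x[v] for v, c in coeffs.items()) == rhs
               for coeffs, rhs in constraints):
            best_total, best = total, x
    return best_total, best


ilce_1 = [({"B1": 1, "B3": 1}, 8),
          ({"B1": 1, "B4": 1}, 12),
          ({"B1": 1, "B5": 1}, 14)]
ilce_2 = [({"B2": 1, "B6": 1, "B8": -1}, 13),
          ({"B2": 1, "B7": 1, "B8": -1}, 9),
          ({"B8": 1, "B9": 1}, 2)]

total_1, sol_1 = solve_subproblem(["B1", "B3", "B4", "B5"], ilce_1)
total_2, sol_2 = solve_subproblem(["B2", "B6", "B7", "B8", "B9"], ilce_2)
```

With these constraints the search returns 18 buffer stages for subproblem (1) and 15 for subproblem (2), with the same assignments as stated in Step 6. The benefit of the decomposition is visible in the size of the search: the two boxes contain $15^4 + 14^5$ candidates in total, whereas a joint search over all nine variables with a bound of 14 would have to consider $15^9$, about $3.8 \times 10^{10}$.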
4. Application to CORDIC Pipeline for Robot Inverse Kinematic Solution

Let us apply the above decomposition approach to solving the buffer assignment
problem of a larger problem: balancing a CORDIC-based pipeline architecture for
computing the robot inverse kinematic position solution [15]. The task of computing
the joint solution of a PUMA-like robot manipulator is shown in Figure 5. The nodes
in Figure 5 represent CORDIC processors. The objective is to balance this task graph
to achieve maximum pipelining [15]. By using the Procedure DBS2 ($GW$, $\pi_m$), where
$GW$ is the directed task graph shown in Figure 5, 16 directed blocks, $\pi_m$,
$1 \le m \le 16$, in $GW$ are obtained. From these directed blocks, we can obtain the 16
pseudo-connected components, $\pi_m^+$, $1 \le m \le 16$. The corresponding normalized
balanced buffering graph $GB$ for $GW$ and the 16 pseudo-connected components $\pi B_m^+$ in $GB$,
$1 \le m \le 16$, can be created. The associated integer linear equation systems for
$\pi B_m^+$, $1 \le m \le 16$, are obtained as follows:
(a) ILCE($\pi B_1^+$): $|B_4| = 3$.
(b) ILCE($\pi B_2^+$): $|B_{16}| = 1$.
(c) ILCE($\pi B_3^+$): $|B_5| = 3$.
(d) ILCE($\pi B_4^+$): $|B_1| - |B_{17}| = 3$, $|B_2| - |B_{17}| = 3$,
$|B_{17}| + |B_{19}| = 10$, $|B_{17}| + |B_{20}| = 9$, $|B_{17}| + |B_{21}| = 8$.
(e) ILCE($\pi B_5^+$): $|B_3| = 2$.
(f) ILCE($\pi B_6^+$): $|B_{22}| = 15$.
(g) ILCE($\pi B_7^+$): $|B_6| + |B_{10}| = 5$,
$|B_6| + |B_{11}| + |B_{12}| = 8$,
$|B_6| + |B_{11}| + |B_{13}| + |B_{15}| = 10$,
$|B_6| + |B_{11}| + |B_{13}| + |B_{14}| - |B_{30}| - |B_{33}| = 10$,
$|B_{23}| - |B_{30}| = 5$, $|B_{30}| + |B_{33}| + |B_{34}| = 0$.
(h) ILCE($\pi B_8^+$): $|B_{24}| = 4$.
(i) ILCE($\pi B_9^+$): $|B_{25}| = 1$.
(j) ILCE($\pi B_{10}^+$): $|B_{26}| = 5$.
(k) ILCE($\pi B_{11}^+$): $|B_7| - |B_{36}| = 12$, $|B_8| - |B_{18}| = 3$, $|B_9| - |B_{18}| = 3$,
$|B_{18}| + |B_{27}| - |B_{36}| = 8$, $|B_{18}| + |B_{28}| - |B_{41}| = 10$,
$|B_{36}| + |B_{39}| = 3$, $|B_{36}| + |B_{40}| - |B_{41}| = 1$,
$|B_{41}| + |B_{44}| = 1$, $|B_{41}| + |B_{45}| = 2$.
(l) ILCE($\pi B_{12}^+$): $|B_{29}| + |B_{31}| - |B_{35}| = 4$, $|B_{29}| + |B_{32}| = 10$,
$|B_{35}| + |B_{46}| = 3$.
(m) ILCE($\pi B_{13}^+$): $|B_{37}| = 1$.
(n) ILCE($\pi B_{14}^+$): $|B_{38}| = 2$.
(o) ILCE($\pi B_{15}^+$): $|B_{42}| = 4$.
(p) ILCE($\pi B_{16}^+$): $|B_{43}| = 2$.

Each of the above integer linear equation systems, ILCE($\pi B_m^+$), $1 \le m \le 16$,
corresponds to an integer linear optimization subproblem. For example, the integer
linear optimization subproblem for ILCE($\pi B_7^+$) is to:
$$\text{Min} \sum_{B_i \in \pi B_7^+} |B_i|$$
subject to the associated integer linear equation system ILCE($\pi B_7^+$). That is,
Min $(|B_6| + |B_{10}| + |B_{11}| + |B_{12}| + |B_{13}| + |B_{14}| + |B_{15}| + |B_{23}|$
$+ |B_{30}| + |B_{33}| + |B_{34}|)$
subject to the associated integer linear equation system ILCE($\pi B_7^+$). The
optimization of this subproblem gives $|B_6| = 5$, $|B_{10}| = 0$, $|B_{11}| = 3$, $|B_{12}| = 0$,
$|B_{13}| = 2$, $|B_{14}| = |B_{15}| = 0$, $|B_{23}| = 5$, $|B_{30}| = 0$, $|B_{33}| = |B_{34}| = 0$, and
$\sum_{B_i \in \pi B_7^+} |B_i| = 15$ delay buffers.
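
The solution quoted for ILCE($\pi B_7^+$) can be checked mechanically. The following Python sketch substitutes the stated values into the six equations of system (g) and totals the buffer stages. It only verifies feasibility and the count. It does not prove optimality. The function name `check_solution` and the dictionary encoding are assumptions made for this illustration, and $|B_{12}| = 0$ is the value forced by the second equation once $|B_6| = 5$ and $|B_{11}| = 3$ are fixed.

```python
def check_solution(constraints, assignment):
    """Return (feasible, total) for a candidate buffer assignment."""
    feasible = all(
        sum(c * assignment[v] for v, c in coeffs.items()) == rhs
        for coeffs, rhs in constraints
    )
    return feasible, sum(assignment.values())


# System (g), written as (coefficients, right-hand side).
ilce_g = [
    ({"B6": 1, "B10": 1}, 5),
    ({"B6": 1, "B11": 1, "B12": 1}, 8),
    ({"B6": 1, "B11": 1, "B13": 1, "B15": 1}, 10),
    ({"B6": 1, "B11": 1, "B13": 1, "B14": 1, "B30": -1, "B33": -1}, 10),
    ({"B23": 1, "B30": -1}, 5),
    ({"B30": 1, "B33": 1, "B34": 1}, 0),
]
solution = {"B6": 5, "B10": 0, "B11": 3, "B12": 0, "B13": 2, "B14": 0,
            "B15": 0, "B23": 5, "B30": 0, "B33": 0, "B34": 0}

feasible, total = check_solution(ilce_g, solution)

# A perturbed assignment (one buffer moved from B6 to B10) breaks
# the second equation, so it is rejected.
perturbed = dict(solution, B6=4, B10=1)
feasible_perturbed, _ = check_solution(ilce_g, perturbed)
```

The stated assignment satisfies all six equations and uses 15 buffer stages, in agreement with the text.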

Similarly, the integer linear optimization subproblems for ILCE($\pi B_4^+$),
ILCE($\pi B_{11}^+$), and ILCE($\pi B_{12}^+$) give the optimization solutions
$\{|B_1| = |B_2| = 3,\ |B_{17}| = 0,\ |B_{19}| = 10,\ |B_{20}| = 9,\ |B_{21}| = 8\}$,
$\{|B_7| = 12,\ |B_8| = |B_9| = 3,\ |B_{18}| = 0,\ |B_{27}| = 8,\ |B_{28}| = 10,\ |B_{36}| = 0,\ |B_{39}| = 3,$
$|B_{40}| = 1,\ |B_{41}| = 0,\ |B_{44}| = 1,\ |B_{45}| = 2\}$, and
$\{|B_{29}| = 4,\ |B_{31}| = 0,\ |B_{32}| = 6,\ |B_{35}| = 0,\ |B_{46}| = 5\}$, respectively.
The numbers of delay buffers in these subproblems are, respectively,
$\sum_{B_i \in \pi B_4^+} |B_i| = 33$, $\sum_{B_i \in \pi B_{11}^+} |B_i| = 43$, and
$\sum_{B_i \in \pi B_{12}^+} |B_i| = 15$.


The optimization solution for all the integer linear optimization subproblems
yields a total of 159 buffer stages, which agrees with the solution given in [15]. This
example shows that the graph decomposition approach can solve a large-scale buffer
assignment problem by decomposing it into a set of smaller subproblems, each of
which can be solved separately as an integer linear optimization problem. The initial
delay time of the CORDIC pipeline is the cost of the critical path from the input node
to the end node and is equal to 18 basic time units. For a commercial CMOS
CORDIC processor [7], a reasonable stage latency is 40 μs. Thus, the initial delay
time of the pipeline is equal to 720 μs. The balanced ADFG can be realized with 25
CORDIC processors and 159 buffer stages. The resultant maximum pipelined
CORDIC architecture has a pipelined time of one basic time unit or 40 μs. The
number of buffer stages can be further reduced if a special buffering device, a
tapped-delay-line buffer, is used [15].
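
The initial delay quoted above is the critical path cost multiplied by the stage latency. A minimal Python sketch of that calculation is given below. The critical path cost is computed as the longest weighted path in an acyclic graph. The function names and the five-edge graph are hypothetical, since the edge weights of Figure 5 are not reproduced here. Only the final line uses the figures stated in the text (18 basic time units and 40 μs per stage).

```python
def critical_path_cost(edges, source, target):
    """Longest-path cost from source to target in a weighted acyclic graph.

    edges: {(tail, head): weight in basic time units}.
    """
    preds = {}
    for (u, v), w in edges.items():
        preds.setdefault(v, []).append((u, w))

    memo = {source: 0}

    def longest(node):
        # Nodes that cannot be reached from the source get -infinity.
        if node not in memo:
            memo[node] = max(
                (longest(u) + w for u, w in preds.get(node, [])),
                default=float("-inf"),
            )
        return memo[node]

    return longest(target)


def initial_delay_us(cost_in_time_units, stage_latency_us):
    """Initial pipeline delay in microseconds."""
    return cost_in_time_units * stage_latency_us


# Hypothetical graph with two paths from "in" to "out" (costs 4 and 3).
toy_edges = {("in", "a"): 1, ("in", "b"): 1,
             ("a", "c"): 2, ("b", "c"): 1, ("c", "out"): 1}
toy_cost = critical_path_cost(toy_edges, "in", "out")
toy_delay = initial_delay_us(toy_cost, 40)

# Figures stated in the text: 18 basic time units at 40 microseconds each.
cordic_delay = initial_delay_us(18, 40)
```

For the CORDIC pipeline this gives 18 × 40 μs = 720 μs, the initial delay stated in the text.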
5. Conclusion

An efficient graph decomposition technique which provides a systematic approach
to solving the optimal buffer assignment problem of a large-scale ADFG has been
presented and discussed. The optimal buffer assignment problem is formulated as an
integer linear programming problem. The construction of integer linear constraint
equations in a large-scale ADFG reveals the existence of many redundant integer
linear constraint equations, making the optimization less tractable. The redundant
integer linear constraint equations come from the path overlapping between two
paths of two different multi-input nodes. The proposed graph decomposition
approach utilizes the critical path concept to decompose an ADFG into a set of
connected subgraphs from which the integer linear optimization technique can be used to
solve the buffer assignment problem in each subgraph. Thus, a large-scale integer
linear optimization problem is divided into a number of smaller-scale subproblems
which can be easily solved by computer in pseudo-polynomial time. The proposed
graph decomposition technique is illustrated by two examples, and its efficiency and
advantages can be seen in the example for balancing a CORDIC pipeline for
computing the robot inverse kinematic position solution.

6. References

[1] H. M. Ahmed, J. M. Delosme, and M. Morf, “Highly Concurrent Computing
Structures for Matrix Arithmetic and Signal Processing,” IEEE Computer, vol.
15, no. 1, pp. 65-82, Jan. 1982.

[2] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of
Computer Algorithms, Addison-Wesley, 1974, pp. 195-199.


[3] S. Baase, Computer Algorithms: Introduction to Design and Analysis,
Addison-Wesley, 1978, pp. 145-148.

[4] J. D. Brock and L. B. Montz, “Translation and Optimization of Data Flow
Programs,” Proc. of 1979 Int’l. Conf. on Parallel Processing, pp. 46-54, Aug. 1979.

[5] J. B. Dennis and G. R. Gao, “Maximum Pipelining of Array Operations on
Static Data Flow Machine,” Proc. of 1983 Int’l. Conf. on Parallel Processing,
pp. 331-334, Aug. 1983.

[6] K. S. Fu (editor), VLSI for Pattern Recognition and Image Processing,
Springer-Verlag, 1984.

[7] G. L. Haviland and A. A. Tuszynski, “A CORDIC Arithmetic Processor Chip,”
IEEE Trans. Comput., vol. C-29, no. 2, pp. 8- , Feb. 1980.

[8] F. H. Hsu, H. T. Kung, T. Nishizawa, and A. Sussman, “LINC: The Link and
Interconnection Chip,” Department of Computer Science, Carnegie-Mellon
University, 1984.

[9] K. Hwang and Z. Xu, “Multipipeline Networking for Fast Evaluation of Vector
Compound Functions,” Proc. of 1986 Int’l. Conf. on Parallel Processing, pp.
495-502, Aug. 1986.

[10] H. T. Kung, “Why Systolic Architectures?” IEEE Computer, pp. 37-46, Jan.
1982.

[11] H. T. Kung and M. Lam, “Wafer-Scale Integration and Two-level Pipelined
Implementation of Systolic Arrays,” J. of Parallel and Distributed Computing,
vol. 1, no. 1, Sept. 1984, pp. 32-63.

[12] S. Y. Kung, H. J. Whitehouse, and T. Kailath (editors), VLSI and Modern
Signal Processing, Prentice-Hall, 1985.

[13] E. L. Lawler, Combinatorial Optimization: Networks and Matroids, Holt,
Rinehart and Winston, New York, 1976.

[14] C. S. G. Lee and P. R. Chang, “Efficient Parallel Algorithm for Inverse
Dynamics Computation,” IEEE Trans. Syst. Man, Cybern., vol. SMC-16, no. 4, pp.
532-542, July/August 1986.

[15] C. S. G. Lee and P. R. Chang, “A Maximum Pipelined CORDIC Architecture
for Robot Inverse Kinematic Position Computation,” Technical Report TR-EE
86-5, School of Electrical Engineering, Purdue University, January 1986.
Also to appear in IEEE J. of Robotics and Automation.

[16] C. E. Leiserson and J. B. Saxe, “Optimizing Synchronous Systems,” J. VLSI
and Computer Systems, vol. 1, 1983, pp. 41-68.

[17] C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980.

[18] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms
and Complexity, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1982.

[19] J. A. Starzyk and A. Konczykowska, “Flowgraph Analysis of Large Electronic
Networks,” IEEE Trans. on Circuits and Systems, vol. CAS-33, pp. 302-315,
March 1986.

[20] J. E. Volder, “The CORDIC Trigonometric Computing Technique,” IRE Trans.
Electronic Computers, vol. EC-8, no. 3, pp. 330-334, Sept. 1959.


Figure 1. An Example for Inserting Delay Buffer Stages

Figure 2. Normalization of a Balanced Buffering Graph (panels (a) and (b))

Figure 3. (a) GW; (b) GB

Figure 4. Graph Decomposition of the Example in Figure 3 (panels (a) through (h))

Figure 5. An ADFG for a

