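A Model for Scheduling Protocol-Constrained Components and Environments
Steve Haynal, Forrest Brewer
Department of Electrical and Computer Engineering
University of California, Santa Barbara, U.S.A.
haynal@umbra.ece.ucsb.edu, forrest@ece.ucsb.edu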
ABSTRACT
This paper presents a technique for highly constrained event
sequence scheduling. System resource protocols as well as an
external interface protocol are described by non-deterministic
finite automata (NFA). All valid schedules which adhere to
interfacing constraints and resource bounds for flow graph described
behavior are determined exactly. A model and scheduling results are
presented for an extensive design example.
Keywords
Interface protocols, protocol-constrained scheduling, automata.
1. INTRODUCTION
Scheduling is an important problem occurring in diverse areas from
manufacturing to networking to high-level synthesis of digital sys-
tems (HLS). Although there has been extensive work done in HLS
scheduling, much of this work has disregarded how the ﬁnal sched-
uled system must communicate to other systems. In particular,
scheduling systems containing components with complex interface
protocols is neglected. This situation is the norm in modern digital
systems, and use of sequential protocols is likely to increase in
future designs. This paper presents a technique which addresses
data-ﬂow scheduling subject to arbitrary sequential protocols. Sys-
tem resource usage protocols as well as an external interface proto-
col are described by non-deterministic ﬁnite automata (NFA). Next,
constraints derived from a behavioral ﬂow graph are applied to an
implicit product NFA. Finally, reduced ordered binary decision dia-
gram (ROBDD) symbolic reachability techniques are used to ﬁnd
all valid schedules exactly. A model and scheduling results are pre-
sented for an extensive design example.
We classify previous high level scheduling work into three
categories: i) heuristic, ii) integer linear programming (ILP) and
iii) symbolic methods. Heuristic schedulers (e.g., [1][10]) find good
solutions for large problems quickly but suffer with tightly
constrained problems where early pruning decisions exclude
candidates leading to superior solutions. ILP schedulers (e.g., [3][6])
exactly solve scheduling but have difficulties with time complexity
and complex control constraint formulation. Symbolic methods
(e.g., [2][4][7][8][11]) are often effective in finding exact solutions
in highly constrained problem formulations but may suffer from
representation explosion. The technique described in this paper
falls in the symbolic methods category. The most closely related
previous work is found in [2][11] where system timing and syn-
chronization requirements are encapsulated in ﬁnite-state machine
(FSM) descriptions. Our work differs in two ways. First, we intro-
duce non-determinism as a preferred representation for protocols.
The work described in [9] supports this decision. Second, and more
importantly, our formulation is hierarchical and amenable to
abstraction. We believe hierarchy and abstraction are key compo-
nents in making symbolic techniques manageable.
2. PROBLEM DESCRIPTION
Input to this problem consists of three types of information. First,
protocol interface NFAs are provided for all internal resource units
and the external interface. For internal resource units, these autom-
ata models describe when local communication events may occur.
Local communication is operand passing between local resource
modules. External communication events are modeled by the exter-
nal interface NFA. Second, a data ﬂow graph (DFG) is provided.
The behavior or algorithm to be implemented is given in terms of
this graph. DFG nodes represent operations and arcs represent
operands. In this case, nodes are executed by resources with poten-
tially complex interface protocols. Finally, instance resource and
operand register bounds are given.
The problem is to ﬁnd a valid event sequence or schedule imple-
menting the DFG described behavior, meeting all protocol con-
straints, and using only available resources. The technique
presented here ﬁnds all valid schedules exactly.
3. PROBLEM FORMULATION
This section describes how the problem input information is used to
construct a scheduling NFA which represents all valid execution
sequences or schedules. The process involves building a product
NFA from small local NFAs and applying constraints to this prod-
uct NFA. ROBDDs provide efﬁcient representation for these NFAs.
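The construction outlined above can be sketched with explicit sets standing in for ROBDDs. The snippet below is only an illustration under stated assumptions, not the implementation used in this paper: the two toy automata, their state names and the `constraint` rule are hypothetical, and transitions are enumerated explicitly rather than encoded symbolically.

```python
from itertools import product

# Each local NFA is a set of (present_state, next_state) transitions.
# Both automata and their state names are hypothetical toy examples.
OPERAND = {
    ("unknown", "unknown"), ("unknown", "known"),
    ("known", "known"), ("known", "unknown"),
}
RESOURCE = {("idle", "idle"), ("idle", "out"), ("out", "idle")}


def product_nfa(local_nfas, constraint):
    """Synchronous product: every local NFA takes one transition per step,
    and only joint transitions satisfying the constraint are kept."""
    joint = set()
    for combo in product(*local_nfas):
        ps = tuple(t[0] for t in combo)  # joint present state
        ns = tuple(t[1] for t in combo)  # joint next state
        if constraint(ps, ns):
            joint.add((ps, ns))
    return joint


def constraint(ps, ns):
    # Assumed rule: the operand may become known only in a step where
    # the resource moves to its "out" state.
    created = ps[0] == "unknown" and ns[0] == "known"
    return (not created) or ns[1] == "out"


unconstrained = product_nfa([OPERAND, RESOURCE], lambda ps, ns: True)
scheduling = product_nfa([OPERAND, RESOURCE], constraint)
```

Here the constraint removes the joint transitions in which the operand is created while the resource does not move to `out`; a symbolic implementation would obtain the same effect by conjoining a constraint ROBDD with the product transition relation.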
3.1 Operand NFAs
Each arc in the DFG is represented by an operand NFA (Fig. 1) in
our formulation. The meanings of each state and transition may be
inferred from the ﬁgure. The start state is in bold.
Figure 1. Operand NFA. States: unknown (start state), known. Transitions: create, remember.
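As a rough sketch, an operand NFA can be written down as a set of labeled transitions. Only the create transition (unknown to known) is stated explicitly in this paper; the placement of `remember` as a self-loop on known, and the `wait` and `forget` arcs, are assumptions made for illustration.

```python
# Operand NFA as (present_state, label, next_state) triples.
# "create" follows the text; the other three arcs are assumptions.
OPERAND_NFA = {
    ("unknown", "wait", "unknown"),
    ("unknown", "create", "known"),
    ("known", "remember", "known"),
    ("known", "forget", "unknown"),
}
START_STATE = "unknown"


def successors(state):
    """Return the (label, next_state) choices available in a state."""
    return {(label, ns) for ps, label, ns in OPERAND_NFA if ps == state}


def label_sequences(length, state=START_STATE):
    """Enumerate every label sequence of the given length from a state."""
    if length == 0:
        return [()]
    return [
        (label,) + rest
        for label, ns in sorted(successors(state))
        for rest in label_sequences(length - 1, ns)
    ]
```

Under these assumptions, every sequence that contains `remember` has an earlier `create`, which is the property the memory constraints of Section 3.7 rely on.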
Permission to make digital/hardcopy of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage, the copyright notice, the title of the publication
and its date appear, and notice is given that copying is by permission of ACM, Inc.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee.
DAC 99, New Orleans, Louisiana
(c) 1999 ACM 1-58113-109-7/99/06..$5.00
3.2 Resource NFAs
In our formulation, each node in the DFG is modeled by a
resource NFA. Several instances of a given resource may be spec-
iﬁed. The example resource NFA of Fig. 2 represents the protocol
for a unit with a restricted bus. Two input operands are required
but must be presented sequentially. Furthermore, input operands
may not be accepted while an output operand is present.
3.3 Binding NFAs
Several DFG nodes may be bound to the same resource instance.
Consequently, it is necessary to distinguish which DFG node is
bound to a resource instance at any given time. To make this dis-
tinction, a binding NFA is paired with each resource instance
NFA. Fig. 3 shows a binding NFA capable of binding to two pos-
sible DFG nodes plus a null node. When this binding NFA is in a
DFG node state, then the local operands of the mated resource
instance NFA map directly to operands accepted or produced by
that DFG node. Which local operand maps to which DFG operand
is speciﬁed by the designer. Finally, constraints are added later to
restrict a binding NFA’s rebind transitions (a change in state) to
occur only in sync with its mated resource NFA’s rebind transi-
tions.
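A binding NFA and the synchronization restriction mentioned above might be sketched as follows. This is a hedged illustration: the state names `node1` and `node2` are hypothetical, and the assumption that the binding may move between any two states is ours, not read off Fig. 3.

```python
from itertools import product

# Binding NFA with two DFG node states plus a null state.
# Assumption: every (present, next) pair is a legal transition.
BINDING_STATES = ("null", "node1", "node2")
BINDING_NFA = set(product(BINDING_STATES, repeat=2))


def is_rebind(transition):
    """A rebind transition is any change of binding state."""
    ps, ns = transition
    return ps != ns


def allowed(binding_transition, resource_rebinds):
    """The binding may change state only in a step where the mated
    resource NFA takes a rebind transition."""
    return (not is_rebind(binding_transition)) or resource_rebinds


# Binding transitions that remain legal when the resource does not rebind.
held = {t for t in BINDING_NFA if allowed(t, resource_rebinds=False)}
```

When the resource does not rebind, only the three self-loops survive, so the binding keeps pointing at the same DFG node.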
3.4 Interface Protocol NFA
The interface protocol NFA describes when DFG operand input or
output transactions may occur given external timing constraints. It
is similar in construction to a resource NFA (Fig. 2) with states
associated with input and output events. Unlike a resource NFA,
there is no need for a mated binding NFA.
3.5 Implication Constraints
To create a scheduling NFA, implication constraints are applied
between operand, resource, binding and interface protocol NFAs.
For example, let Q be the proposition, “The interface protocol
NFA is transiting to a state where input of operand 1 is allowed”
and let P be the proposition, “The operand NFA for operand 1 has
a create transition.” The desired implication would then be P → Q,
or if P is true, then Q must also be true. In the ROBDD structure,
an implication is constructed as,
$$P \rightarrow Q = \overline{P} + Q \qquad (1)$$
where P and Q are ROBDDs.
Figure 2. Example of Bus Restricted Resource NFA. State labels: in 1, in 2, out. Transition labels: rebind, bound.
Figure 3. Binding NFA. State labels: node 1, node 2, null. Arcs between states are rebind transitions.
3.5.1 Operand Create Implications
DFG operands are only allowed to be created (their NFA transits
from unknown to known) when they are available from the interface
protocol or a bound resource instance NFA. For each operand
i expected from the protocol interface, the implication is,
$$i_{create} \rightarrow \mathit{interface}_i^{ns} \qquad (2)$$
where $i_{create}$ is the create transition of the expected operand NFA
and $\mathit{interface}_i^{ns}$ is any transition to a next state, ns, where
operand i is available in the interface protocol.
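The effect of an implication constraint can be illustrated with ordinary Boolean predicates in place of ROBDDs. In this sketch a joint transition is a dictionary; the keys `operand1` and `interface_ns` and the interface state names `idle` and `in1` are hypothetical.

```python
from itertools import product


def implies(p, q):
    """Build the predicate P -> Q as (not P) or Q."""
    return lambda t: (not p(t)) or q(t)


def operand1_created(t):
    """Proposition P: the operand NFA for operand 1 takes a create transition."""
    return t["operand1"] == "create"


def interface_offers_operand1(t):
    """Proposition Q: the interface moves to a state where operand 1 is available."""
    return t["interface_ns"] == "in1"


constraint = implies(operand1_created, interface_offers_operand1)

# All combinations of an operand label and an interface next state.
candidates = [
    {"operand1": label, "interface_ns": ns}
    for label, ns in product(("wait", "create"), ("idle", "in1"))
]
valid = [t for t in candidates if constraint(t)]
```

Of the four candidate transitions, only the one that creates operand 1 while the interface goes to `idle` is excluded.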
A DFG operand i is available from a resource instance, r, when
two conditions are true. First, r's paired binding NFA, b, must be
transiting to a next state bound to the DFG node, nd, producing i.
Second, suppose l is a local operand of r which maps to DFG
operand i when the first condition is true. Then r must also be
transiting to a next state where local operand l is available.
Furthermore, since any resource instance capable of producing DFG
operand i may actually produce i, all capable resource instances
must be examined. Formally, for each expected DFG operand i,
this is described as,
$$i_{create} \rightarrow \sum \left( r_l^{ns} \cdot b_{nd}^{ns} \right) \qquad (3)$$
where the summation is over all capable resource instance and
mated binding NFAs.
3.5.2 Operand Accept Implications
Resource and interface protocol NFAs are only allowed to transit
to states requiring operands if the required operands will exist. For
the interface protocol, this is enforced for each required DFG
operand i with,
$$\mathit{interface}_i^{ns} \rightarrow i_{known}^{ns} \qquad (4)$$
where $\mathit{interface}_i^{ns}$ is any transition to a next state requiring
operand i and $i_{known}^{ns}$ is any transition to the known state of operand i.
The implication describing this for a local resource and mated
binding NFA for each required DFG operand i is,
$$r_l^{ns} \cdot b_{nd}^{ps} \rightarrow i_{known}^{ns} \qquad (5)$$
where $r_l$, $b_{nd}$ and $i_{known}$ are as described earlier. By writing this
implication in terms of the present state, ps, of the binding NFA, it
is possible to create a resource NFA which produces an output
operand and rebinds to a new DFG node during the same cycle.
3.6 Rebind Synchronization Constraints
A binding NFA may only rebind to a new DFG node when its
mated resource NFA transits through a rebind transition. This
synchronization is enforced for each resource r and binding b NFA
pair with the constraint,
$$b_{rebind} \rightarrow r_{rebind} \qquad (6)$$
where $r_{rebind}$ and $b_{rebind}$ are the rebind transitions of the resource
and mated binding NFAs respectively. Furthermore, it is wasteful
for the resource NFA to transit through a bound transition with a
null binding. Consequently, for each resource r and binding b
NFA pair, the constraint,
$$r_{bound} \rightarrow \overline{b_{null}^{ns}} \qquad (7)$$
where $r_{bound}$ is the bound transition of the resource NFA (Fig. 2)
and $b_{null}^{ns}$ is any transition to a next state of null in the mated
binding NFA, is applied to the scheduling NFA.
3.7 Memory Constraints
Only a finite number, n, of storage elements may be available to
store DFG operands. Let A be the set of all combinations of at
most n operands from the set of all DFG operands. Then
$$\sum_{a \in A} \left( \prod_{j \in a} j_{remember} \right) \qquad (8)$$
is allowed.
This constraint essentially limits the number of operand remember
transitions which may coexist in any transition of the scheduling
NFA.
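The memory constraint can be illustrated with explicit sets. The sketch below reads the constraint as requiring that the operands taking a remember transition in a given step form one of the combinations in A; this reading, the operand names and the register bound are assumptions made for illustration.

```python
from itertools import combinations


def allowed_sets(operands, n):
    """The set A: every combination of at most n operands."""
    return {
        frozenset(c)
        for k in range(n + 1)
        for c in combinations(sorted(operands), k)
    }


def memory_ok(remembered, operands, n):
    """True when the remembered operands form a member of A."""
    return frozenset(remembered) in allowed_sets(operands, n)


# Hypothetical operand names and register bound.
OPERANDS = {"A", "B", "coeff", "newA"}
N_REGISTERS = 2
```

Enumerating A grows combinatorially, so the function is equivalent to, and much slower than, the check `len(remembered) <= n`; it is written this way only to mirror the definition of A.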
4. SCHEDULING SOLUTIONS
The product of all local NFAs and constraints described in
Section 3 form a scheduling NFA. Every possible valid schedule
is a path in this scheduling NFA from a starting state set Si(V),
where no operands are known and all resources are null bound, to
a termination state set Sf(V¢), where all desired operands have
existed. Each shortest path from Si(V) to Sf(V¢) represents a mini-
mum latency schedule.
We leverage symbolic reachable state analysis techniques to deter-
mine the existence of valid schedules. Let the scheduling NFA be
deﬁned by the four-tuple ( V, d, Si(V), Sf(V¢) ) where V is the
ﬁnite, non-empty set of states, d: V®V¢ is the next-state function
and Si(V) and Sf(V¢) are sets of initial and ﬁnal states respectively.
Starting with Si(V), reachable state analysis is performed. Once
completed, if Sf(V¢) is not present in the reachable state set, then
no schedules are possible with the current constraints and schedul-
ing terminates. On the other hand, if Sf(V¢) is present, then valid
schedules do exist and we can use the technique described in [4] to
ﬁnd a shortest path and hence a minimum latency schedule.
Finally, we are not bound to perform complete reachable state
analysis but may use reﬁnements, optimizations and other tech-
niques to ﬁnd any desired subset of paths or schedules in the
scheduling NFA.
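The reachability step can be sketched with explicit state sets in place of ROBDDs. This is a minimal breadth-first illustration, not the symbolic procedure or the shortest-path technique of [4]; the four-state automaton `DELTA` is hypothetical.

```python
def min_latency(delta, initial, final):
    """Breadth-first reachability over state sets.

    delta maps each state to the set of its possible next states.
    Returns the length of a shortest path from any initial state to any
    final state, or None when no final state is reachable.
    """
    final = set(final)
    reached = set(initial)
    frontier = set(initial)
    steps = 0
    while frontier:
        if frontier & final:
            return steps
        # Image computation: all successors of the current frontier.
        image = set()
        for state in frontier:
            image |= delta.get(state, set())
        frontier = image - reached  # keep only newly reached states
        reached |= frontier
        steps += 1
    return None


# Hypothetical four-state scheduling NFA.
DELTA = {
    "s0": {"s0", "s1", "s2"},
    "s1": {"s3"},
    "s2": {"s2"},
    "s3": {"s3"},
}
```

For example, `min_latency(DELTA, {"s0"}, {"s3"})` returns 2, while starting from `{"s2"}` it returns `None` because `s3` is never reached.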
5. 2-POINT DFT EXAMPLE
We develop a 2-point DFT example in detail to demonstrate the
versatility of our protocol-based scheduler. Fig. 4 shows the DFG
used in this example. Although this DFG appears simple enough
to be handled by traditional scheduling techniques, the advantage
of our method is the ability to tightly deﬁne data transfer protocols
and resource constraints.
Fig. 5 shows the interface protocol constraint NFA for this exam-
ple. In reality, this constraint describes an external controller
which computes correct indices and memory addresses. Due to
controller and communication bandwidth limitations, the index,
A’s memory address and B’s memory address must be passed in
three consecutive cycles. After this, the controller
non-deterministically does not proceed to the next iteration until it
knows that both computed terms have been successfully written to memory.
Figure 4. 2-Point DFT Example DFG. Nodes: Memory Read (two), Table Lookup, Function Subtract, Function Add, Function Multiply, Memory Write (two). Operands: Coefficient Index, A Address, B Address, A, B, New A, New B, A Written, B Written.
Figure 5. DFT Interface Protocol. State labels: A Address, B Address, Index, A Written, B Written, Idle.
The table lookup protocol is shown in Fig. 6. Two cycles after an
index is presented, a stored coefﬁcient is produced. A unique
behavior is that the coefﬁcient remains available for two cycles.
Furthermore, a new index may be provided during the second
cycle of coefﬁcient availability.
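The timing of the table lookup protocol can be sketched as simple cycle arithmetic. The sketch assumes integer cycle numbers, reads "two cycles after" an index at cycle t as cycle t + 2, and further assumes that a new index is not accepted before the second cycle of coefficient availability; the function names are hypothetical.

```python
def coefficient_cycles(index_cycle):
    """Cycles in which the coefficient for an index is available."""
    first = index_cycle + 2      # produced two cycles after the index
    return (first, first + 1)    # remains available for two cycles


def valid_index_schedule(index_cycles):
    """Check that each new index arrives no earlier than the second
    availability cycle of the previous coefficient (an assumption)."""
    ordered = sorted(index_cycles)
    return all(
        later >= coefficient_cycles(earlier)[1]
        for earlier, later in zip(ordered, ordered[1:])
    )
```

Under these assumptions, indices presented at cycles 0 and 3 are accepted, while indices at cycles 0 and 2 are rejected.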
The memory resource uses the protocol detailed in Fig. 7. The
data and address busses are time multiplexed with addresses
accepted on odd cycles and data passed on even cycles or vice
versa. The read protocol requires an address and provides the
requested data after three cycles. The write protocol requires an
address and the write data in two consecutive cycles. Three cycles
later a write acknowledge is produced. A new address may be
accepted during the same cycle that a write acknowledge is pro-
duced.
Fig. 8 shows the arithmetic processor protocol. This unit performs
three ﬂoating-point operations: addition, subtraction and multipli-
cation. Due to limited communication bandwidth, input operands
1 and 2 and the result operand must be passed during separate
cycles. An add or subtract result is produced two cycles after the
last input. Given the higher complexity of multiplication, its result
is produced three cycles after the last input. For addition and mul-
tiplication, ordering of the input operands is irrelevant but for sub-
traction operand ordering is important. In this example, operand 1
must be accepted ﬁrst.
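The latency rules of the arithmetic processor can be summarized in a few lines. This is a sketch under stated assumptions: cycles are integers, the function and table names are hypothetical, and only the rules given in the paragraph above are checked.

```python
# Cycles between the last input operand and the result.
RESULT_DELAY = {"add": 2, "subtract": 2, "multiply": 3}


def result_cycle(op, operand1_cycle, operand2_cycle):
    """Cycle in which the result is produced.

    Raises ValueError when the inputs violate the protocol.
    """
    if operand1_cycle == operand2_cycle:
        raise ValueError("operands must be passed in separate cycles")
    if op == "subtract" and operand1_cycle > operand2_cycle:
        raise ValueError("subtraction requires operand 1 first")
    last_input = max(operand1_cycle, operand2_cycle)
    return last_input + RESULT_DELAY[op]
```

For example, an addition with operands at cycles 0 and 1 yields its result at cycle 3, and a multiplication with operand 2 at cycle 0 and operand 1 at cycle 1 yields its result at cycle 4.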
The protocol in Fig. 8 is an example of alternative behaviors.
There are two valid start states and three variations of correct
behavior. A ﬂexibility of our formulation is this ability to handle
numerous alternatives. As long as one valid path exists containing
all required input and output operands for a DFG node, symbolic
exploration will not fail. Additional resource operands not
required by the DFG are ignored.
6. RESULTS
A tool was developed to demonstrate the feasibility of our
scheduling technique. It was written in Python and utilized a standard
BDD library. The reported results were produced on a 400 MHz
Pentium II system running Linux with 512MB of memory. These
results were duplicated with runtimes 3 to 4 times longer on a 166
MHz Pentium laptop Linux system with 32MB of memory.
Table 1 presents results for various resource configurations of the
example presented in Section 5. The first three columns list
instances of these available resources. Since, by observation, A
Address and B Address from the DFG in Fig. 4 are needed at both
the beginning and end, these operands are given dedicated storage
which is not included in the Memory Registers column.
Figure 6. DFT Table Lookup Resource. State labels: Index, Coefficient. Transition labels: rebind, bound.
Figure 7. DFT Memory Resource. State labels: Address, Write Data, Write OK, Read Data. Transition labels: rebind, bound.
Figure 8. Arithmetic Processor Resource. State labels: Operand 1, Operand 2, Add, Subtract, Multiply. Transition labels: rebind, bound.
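As one might expect, scheduling performance is generally best for
configurations with the least amount of freedom. Increasing instances
of a resource usually increases CPU time, although Table 1 contains
exceptions: with 1 memory port and 2 arithmetic processors, CPU time
falls from 54.5 to 50.4 seconds when memory registers increase from
2 to 3. All solutions are exact, with CPU times between 2.2 and 164.0
seconds. Although a minimum latency schedule is reported, all valid
schedules of all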
lengths were actually computed. As far as we know, we are the
ﬁrst to report exact solutions for protocol-constrained scheduling
problems of this type.
Arithmetic processors marked with an asterisk use a slightly dif-
ferent protocol than described in Fig. 8. First, the multiply extra
cycle penalty is removed. Second, a new penalty for reconﬁgura-
tion is added. Any multiplication following an add or subtract or
any add or subtract following a multiplication pays an extra cycle
penalty for reconﬁguration. In these cases we see the same results
for a minimum latency schedule but with a possibly simpler arith-
metic processor and control structure.
The minimum latency schedule marked with an asterisk uses a
modiﬁed interface protocol which is intended to explore alterna-
tive controllers. A address, B address and the index may now be
produced by the controller in any consecutive order. Although this
added freedom increases the runtime by 12 seconds, there is no
gain in the minimum latency schedule.
TABLE 1: 2-Point DFT Results
Memory     Memory  Arithmetic  Minimum Latency  CPU Time
Registers  Ports   Processors  Schedule         in Seconds
1          1       1           26               2.2
2          1       1           20               6.2
3          1       1           20               8.2
2          2       1           15               11.9
2          1       2           18               54.5
3          2       1           15               20.0
3          1       2           18               50.4
no limit   2       2           13               15.7
2          1       2*          18               164.0
2          2       1*          15               42.4
2          2       1*          15*              54.1
7. FUTURE WORK
The models used in this paper were chosen with great care to be
amenable to future work with scheduling hierarchy and abstraction.
A scheduled external interface NFA and a resource NFA
have interchangeable meanings. This allows for a general protocol
NFA to be a vehicle for refinement and abstraction in a hierarchy.
This protocol NFA can be a resource instance NFA at one level of
hierarchy or an external interface NFA at another level. With a
bottom up design flow through the hierarchy, internal complexity
of lower levels is hidden from higher levels since only external
communication events are propagated up. With a top down design
flow, local freedom of lower levels is restricted by the protocol
NFA of the higher level. The entire synthesis and scheduling
process involves refining all protocol NFAs through repeated
constraint propagation when coexecuting protocols at adjacent
hierarchy levels. We believe that such a hierarchical model is
necessary when synthesizing and scheduling systems of meaningful
scale.
Although simple looping structures were present in the example,
our future work will address loops in a more general way. The
method described in [8] provides a starting point. Furthermore,
control structures were not directly addressed in this short paper.
Our previous work in [4] demonstrates a possible way of adding
control. Finally, our use of symbolic reachability to determine
valid schedules will be reﬁned by related work in symbolic tra-
versal techniques for veriﬁcation.
8. CONCLUSIONS
This paper presented a model and technique for representing all
valid schedules of a data ﬂow graph mapped to a protocol-inten-
sive environment. Both an external interface protocol as well as
internal resource protocol constraints were adhered to. All valid
schedules were modeled exactly using a ROBDD NFA composed
of local smaller protocol NFAs and additional constraints applied
between local NFAs. An extensive design example with results
showed the versatility of this technique.
9. REFERENCES
[1] R. Camposano, “Path-Based Scheduling for Synthesis”, IEEE
Trans. CAD/ICAS, vol. 10, no. 1, pp. 85-93, Jan. 1991.
[2] C. N. Coelho Jr, G. De Micheli, “Dynamic Scheduling and
Synchronization Synthesis of Concurrent Digital Systems
under System-Level Constraints”, Proc. IEEE Int. Conf.
Computer-Aided Design, pp. 175-181, 1994.
[3] C. H. Gebotys and M. I. Elmasry, “Global Optimization
Approach for Architectural Synthesis”, IEEE Trans. CAD/ICAS,
vol. 12, no. 9, pp. 1266-1278, Sep. 1993.
[4] S. Haynal and F. Brewer, “Efficient Encoding for Exact
Symbolic Automata-Based Scheduling”, Proc. IEEE Int.
Conf. Computer-Aided Design, to appear, 1998.
[5] H. Hulgaard, S. M. Burns, T. Amon, G. Borriello, “An
Algorithm for Exact Bounds on the Time Separation of Events in
Concurrent Systems”, IEEE Transactions on Computers,
vol. 44, no. 11, pp. 1306-1317, Nov. 1995.
[6] C.-T. Hwang and Y.-C. Hsu, “A Formal Approach to the
Scheduling Problem in High Level Synthesis”, IEEE Trans.
CAD/ICAS, vol. 10, no. 4, pp. 464-475, Apr. 1991.
[7] C. Monahan and F. Brewer, “Scheduling and Binding
Bounds for RT-Level Symbolic Execution”, Proc. IEEE Int.
Conf. Computer-Aided Design, pp. 230-235, 1997.
[8] I. Radivojevic and F. Brewer, “A New Symbolic Technique
for Control-Dependent Scheduling”, IEEE Trans. CAD/ICAS,
vol. 15, no. 1, pp. 45-57, Jan. 1996.
[9] A. Seawright and F. Brewer, “Clairvoyant: A Synthesis
System for Production-Based Specification”, IEEE Trans.
on VLSI Systems, vol. 2, no. 2, pp. 172-185, June 1994.
[10] K. Wakabayashi and H. Tanaka, “Global Scheduling
Independent of Control Dependencies Based on Condition
Vectors”, Proc. 29th ACM/IEEE Design Automation Conf., pp.
112-115, 1992.
[11] J. C.-Y. Yang, G. De Micheli, and M. Damiani, “Scheduling
and Control Generation with Environmental Constraints
based on Automata Representations”, IEEE Trans. CAD/ICAS,
vol. 15, no. 2, pp. 166-183, Feb. 1996.