A BMC-Formulation for the Scheduling Problem in Highly Constrained Hardware Systems  by Cabodi, Gianpiero et al.
Electronic Notes in Theoretical Computer Science 89 No. 4 (2003)
URL: http://www.elsevier.nl/locate/entcs/volume89.html 16 pages
A BMC{Formulation for the Scheduling
Problem in Highly Constrained Hardware
Systems
Gianpiero Cabodi Sergio Nocco Stefano Quer
Politecnico di Torino
Dip. di Automatica e Informatica
Turin, ITALY
Alex Kondratiev Luciano Lavagno Yosinori Watanabe
Cadence Design Systems, Inc.
Berkeley, CA
Abstract
This paper describes a novel application for SAT{based Bounded Model Checking
(BMC) within hardware scheduling problems.
First of all, it introduces a new model for control-dependent systems. In this
model, alternative executions (producing \tree-like" scheduling traces) are managed
as concurrent systems, where alternative behaviors are followed in parallel. This
enables standard BMC techniques, producing solutions made up of single paths
connecting initial and terminal states.
Secondly, it discusses the main problem arising from the above choice, i.e., re-
writing resource bounds, so that they take into account the articial concurrencies
introduced for controlled behaviors.
Thirdly, we exploit SAT-based Bounded Model Checking as a verication tech-
nique mostly oriented to bug hunting and counter-example extraction. In order to
consider resource constraints, the solutions of modifying the SAT solver or adding
extra clauses are both taken into consideration.
Preliminary experimental results, comparing our SAT based approach to state-
of-the art BDD-based techniques are eventually presented.
1 Introduction
Synthesis of eÆcient and high performance control units and data paths from
high-level behavioral specications has long been considered a very promising
technique for tackling the ever growing complexity of digital design. At the
same time, it is a very elusive goal, because after more than twenty years
of intensive research, and even the appearance on the market of some indus-
c
2003 Published by Elsevier Science B. V. CC BY-NC-ND license.  Open access under 
Cabodi et al.
trial CAD tools, high-level synthesis is still far from being widely used as its
predecessors, register-transfer level and logic synthesis.
Within this framework, BDD-based manipulations have recently attained
interesting results, as an alternative to ILP and heuristic techniques. In this
approach a non-deterministic nite automata describes design alternatives
for highly-constrained control-dominated models. After that, the automata's
state space is symbolically visited, adopting model checking's state-of-the-art
techniques. These techniques are mix of forward and backward traversals,
aimed at nding a scheduling solution as a trace connecting initial and termi-
nal states.
In the simplest case of systems without control choices (if-then-else con-
struct), a schedule is a path, and symbolic scheduling works just like invariant
checking with counter-example extraction. However, control-dependent be-
havior produces scheduling instances as DAGs (or trees), where fork and join
nodes are introduced to represent scheduling choices, depending on values
of control operands. This has required a specic backward traversal proce-
dure (called validation in [3]), which, albeit not far from standard BDD-based
traversals, is not directly mapped to standard Model Checking (e.g., CTL)
procedures.
In this work we propose to change the original automaton model introduced
in [2,3,4,11] for control-dependent systems, so that standard model checking
procedures are supported. More specically we transform alternative sub-
traces to concurrent behaviors which are followed in parallel. In this way the
resulting scheduling is always a path (instead of a DAG) connecting initial
and nal states. As a byproduct, we can exploit SAT-based Bounded Model
Checking. Indeed, as the designer's aim is to nd a schedule, not to prove
its absence, we believe BMC can work at its best, as a verication technique
mostly oriented to bug hunting and counter-example extraction, rather than
proof of correctness. Nevertheless, in order to enable this method, we also
must re-write the resource bounds, so that they take into account the articial
concurrencies introduced for controlled behaviors.
As a nal remark, notice that many High Level Synthesis tools use Con-
trol Data Flow Graphs (CDFGs) as their internal model and do not model
well constraints coming from input/output operations with the external world
(e.g., synchronization, min/max rate, jitter, etc.) and often mostly data de-
pendencies are handled, while control is either ignored or handled by complete
case splitting
1
. Although we use CDFGs as the input specication for our
tool, we adopt the model introduced by [3], which is at the same time formal
(based on concurrent automata), eÆcient (it is possible to use symbolic repre-
sentation techniques with enhancements derived from concurrent specication
models), control-oriented (condition evaluation and speculative execution are
1
Approaches that specically address control-intensive CDFGs (such as [8]) have been
introduced only recently.
2
Cabodi et al.
specic features of [3]), and exible (I/O constraints can be represented by
restrictions on the automata state space). As [3] we represent implicitly the
full solution space by means of the state space of a product of automata.
2 Background
We assume the reader is familiar with BDDs, SAT and Bounded Model Check-
ing. As a consequence we briey review only the basic concepts within our
application framework.
2.1 High-Level Synthesis Methodologies
Historically two basic approaches have been used for scheduling: Heuristics
algorithms and Integer Linear Programming. On the one hand, priority-based
heuristic methods (e.g., [10]) can accommodate a variety of data-dominated
and control-dominated behaviors, quickly nding good solutions for large
problems. On the other hand, they may fail to nd an optimal solution in
tightly constrained problems, where early pruning decisions may exclude can-
didates eventually leading to superior solutions. Integer Linear Programming
methods (e.g., [7]) can solve scheduling exactly. However, the ILP complex-
ity signicantly increases by considering control constraints (if-then-else and
loops), and thus may lead to unacceptable execution times. Moreover, they
consider only one solution at a time, and hence are not particularly suitable
for interactive synthesis.
2.2 Symbolic Scheduling
More recently [2,3,4,11] symbolic methods have been proved eective in nding
exact solutions in highly constrained problem formulations.
In [11], the authors propose a symbolic formulation that allows speculative
operation execution and exact resource-constrained scheduling. In [2,3], the
authors improved the previous method by proposing a new eÆcient encoding
to reduce execution time. This encoding only indicates \whether or not" and
not \when" an operation has been scheduled. Finally, [4] handles loops in
Data Flow Graphs (DFGs).
Their scheduling technique (as well as ours) assumes an input in the form of
a CDFG. A CDFG is a directed acyclic graph describing both data-ow and
control dependencies between the operations. Operation nodes are atomic
actions potentially requiring the use of hardware resources for one or more
clock cycles. Directed arcs establish a link between each operation and the
predecessors that produce data required by it. A source and a sink are added
before every operation without predecessors and after every operation without
successors. Conditional behavior is specied by means of fork and join nodes,
and directed arcs also establish a link between the operation evaluating the
condition and the related fork/join pair. Operations that are neither connected
3
Cabodi et al.
by a directed path, nor mutually exclusive due to a preceding fork node, are
concurrent
2
.
Example 2.1 Figure 1 shows an example of CDFG. In particular Figure 1(a)
shows the pseudo-code for a conditional statement and Figure 1(b) the corre-
sponding CDFG.
else
if (x>0)
  y = x + 1
  y = x − 1
T F
y
x
>
CDFG source
op2 op3
CDFG sink
op1
Fork
Join
Data Dependency
Control Dependency
−+
(a)
(b)
Fig. 1. An example of CDFG.
2.3 Scheduling Automata
A scheduling problem, originally described as a CDFG, can be translated into
an automaton, dened by the four-tuple (V;TR; S
i
; S
f
), where V is the nite,
non-empty set of states, TR : V ! V
0
is the transition relation, and S
i
and
S
f
are respectively the sets of initial and nal states.
The generic i-th operation in the CDFG (excluding fork and join opera-
tions) is modeled by a two-state automaton. Its transition relation is encoded
with exactly two Boolean variables (p
i
for the present state and n
i
for the next
state), with the following meaning:

p
i
= 0; n
i
= 0: operation i has not been scheduled previously and will not
be scheduled in the next cycle.

p
i
= 0; n
i
= 1: operation i has not been scheduled previously and will be
scheduled in the next cycle.

p
i
= 1; n
i
= 0: operation i has been scheduled previously but the result
will no longer be available in the next cycle; this is forbidden in [3], as well
as in our solution, in order to reduce the amount of equivalent schedules
generated.

p
i
= 1; n
i
= 1: operation i has been scheduled previously and the result
remains available.
The complete scheduling is the Cartesian product of the above automata re-
stricted by several constraints, each one representing a particular allowed be-
2
The same model, if the sink is connected back to the source, can also be viewed as a safe
Petri Net. In this paper we use the automata-based notation for consistency with [11].
4
Cabodi et al.
havior.
TR(p; n) =
Y
i
(p
i
+ n
i
)  TR
dep
(p; n)  TR
res
(p; n)
The modeling automaton described by TR encapsulates all legal execution
sequences of a system. Fundamentally, it represents multiple legal execution
sequences via nondeterministic choices, yet a real implementation must make
deterministic choices. If nondeterministic choices are pruned to leave only one
deterministic choice, or if multiple choices are made deterministic by condi-
tions, then a nite state machine controller may be directly synthesized. The
criterion used to eliminate nondeterminism is usually minimum execution la-
tency. Variations of this exist for control-dependent behavior, where some
control cases might be more favored than others.
Let us briey summarize here dependency and resource constraints, since
they will be used in the sequel:

TR
dep
represents data dependencies, i.e., it is illegal to schedule an operation
with a predecessor that has not yet been scheduled:
p
i
n
j
is illegal for all i ! j data dependencies (dd)
TR
dep
(p; n) =
Y
i!j2dd
(p
i
+ n
j
)

TR
res
represents resource constraints. Let us have a resource set with b
resources of a given kind (e.g., multipliers) available, and a set  of opera-
tions competing for such a resource set. It is illegal to schedule more than
b concurrent operations from .
(p
i
n
i
 : : :  p
k
n
k
) with fi::kg 2  is illegal if jfi::kgj > b
Let S
0
(p) be the initial state of the scheduling product automaton, in which
no operation has been scheduled. The set of states reachable at the i-th clock
cycle may be computed by a standard iterative image computation:
S
i
(n) = Img(TR; S
i 1
) = 9
p
[TR(p; n)  S
i 1
(p)]
(1)
Valid schedules are represented by state paths that reach a nal set of states
in which terminal operations have been scheduled.
The exploration techniques presented here are directed by a minimum
latency objective. They determine whether, given all constraints imposed and
a target latency l, a valid execution sequence of length  l exists. With
control-dependent models, some additional validity criteria are imposed, and
speculative execution may allow some operations after a fork and before a join
to be scheduled before the condition evaluation has been scheduled.
5
Cabodi et al.
3 Handling control dependence through concurrency
Unlike the simpler case of data ow graphs, a witness schedule for control-
dependent models is not a single path in a path set but rather a set of paths
from start to nal states. Such a set of paths is called an ensemble schedule
in [3] and must include a path for each distinct control-dependent execution
sequence. For instance, a RISC processor must be able to execute all in-
structions and therefore an ensemble schedule for a RISC processor contains
sequences for every instruction. As a more specic example, consider some
control-dependent behavior that branches into two sets of behaviors depend-
ing on a true/false control resolution. An ensemble schedule for this example
must contain a path from the start state to the nal one that represents exe-
cution of true control resolution behavior, and another path that covers false
control resolution behavior.
In BDD-based formulation, this requires the introduction of control guard
variables, representing non-deterministic choices of each controlling operation.
A guard is a binary abstraction of the data value controlling a branching
condition. A \completeness" check (i.e., all guard values have reached the
terminal state) is added to termination conditions.
Furthermore, a validation procedure operates a backward pruning over
the state sets computed by forward BFS. Validation is the most expensive
symbolic operation and the main cause for BDD blow-up. It consists of a
preimage routine with universal quantication of control guards at control
resolution points. This is necessary to enforce causality (identical initial sub-
path) for outgoing paths at fork points.
Apart from complexity issues, branching schedules and the related vali-
dation steps are a major problem for a SAT-based formulation. In order to
avoid them, we interpret choice vertices as concurrent forks, and we transform
alternative branches into concurrent paths.
So we remove fork and join nodes from the CDFG, and we replace them
with unconditioned data dependencies. As a result, a CDFG becomes a DFG,
and SAT can explore simultaneously all conditional branches of the original
CDFG.
Figure 2 shows the above transformation applied to the example of Fig-
ure 1.
Fork and join have been removed, control dependency maintained (as data
dependency) for the operations following the join recombination. Therefore,
in our solution joins work as synchronization points, as no operation following
a join is allowed to be executed if both the branches of the control resolution
have not been completed yet. This means that our model does not allow
control prioritization (as we always have the worst delay), but we have no
loss if the objective is minimizing the worst case execution latency. Moreover,
since we remove the dependency at forks, speculation is still allowed.
6
Cabodi et al.
else
if (x>0)
  y = x + 1
  y = x − 1
y
x
>
CDFG source
op2 op3
CDFG sink
op1
Data Dependency
−+
Control Dependency
(a)
(b)
Fig. 2. A CDFG after fork/join removal.
4 Concurrent forks and resource constraints
The articial fork concurrencies we have introduced have no side-eect in
the case of scheduling with unbounded resources. In fact, given any set of
concurrent operations, if a \large enough" set of resources can be allocated,
all operations may always be executed.
The case of bounded resources is less trivial. In this case not all concurrent
executions are \real" concurrencies, since some of them are just articial. As a
consequence, not all concurrent operations are competing for resources. As a
direct outcome, we need to modify resource constraints, to take into account
that some operations could be allocated to the same resource at the same
execution time.
More specically, let us work on a CDFG with a set of N operations
Op = fop
1
; :::; op
N
g, each one mapped to (i.e., executable by a resource of)
a resource class within the set R = fr
1
; :::; r
M
g. The generic resource class
r
i
is characterized by a bound b
r
i
, representing the amount of operation unit
available for that class, whereas n
r
i
is the total number of operations in Op
mapped to the r
i
resource class. The resource bound problem is obviously
trivial for class r
i
if n
r
i
 b
r
i
, since there can never be a request of resources
greater than the available ones (as for the case of innite resources).
A much more challenging problem is the case of resource bounds actually
reducing the amount of possible concurrencies. Let op
i
and op
j
be two opera-
tions mapped to the same resource class, scheduled for concurrent execution
(there is a state transition where p
i
n
i
p
j
n
j
holds). Then, resource allocation
may fall in one of the following three cases:

Unconditioned concurrency. The two operations do not belong to dierent
conditional branches in the original CDFG, so their concurrency is a \real"
one, requiring the allocation of two resources.

Mutual exclusion. The two operations are controlled by mutually exclusive
conditions, i.e., they are on dierent branches of some fork in the original
CDFG. Their concurrence is articial, so just one resource is required.
7
Cabodi et al.

Speculative execution. Speculation occurs whenever an operation is exe-
cuted before its controlling condition is resolved. If op
i
and op
j
are both
executed before their distinguishing condition is known, concurrence is real,
and two resources are required.
In other words, we may have couple of operations for which concurrency
might be unconditioned (rst item in the above list), and other ones charac-
terized by conditioned concurrency (second and third item).
4.1 Resource bounds within the SAT solver
Resource bounds can be accounted for directly by a SAT solver. In this solu-
tion the SAT solver has to be properly modied in order to count the allocated
resources while recursively building a scheduling solution. This is a special pur-
pose solution to follow only in the case the generated overhead is negligible.
It basically relies on identifying active operations through variable decisions
and implications, and keeping resource allocation counters. A resource conict
occurs whenever an allocation counter is greater than the allowed bound.
4.2 Resource bounds as a Boolean constraint
Although the above solution is feasible, we prefer exploring an alternative
one, that is compatible with a generic SAT solver, since no modication to
the SAT algorithm is necessary. We simply follow BDD-based approaches,
by generating a resource constraint for the transition relation (TR
res
), which
lters out invalid sets of concurrent executions.
There are various strategies for building such a constraint as a Boolean
function returning true on allowed sets of operation executions.
4.2.1 Cliques of concurrency graph
A straightforward approach works on the graph of possible concurrencies,
where operations are nodes and edges connect pairwise concurrent operations.
Such a graph can be generated as the transitive closure of a graph where pairs
of operations are concurrent if no data dependency connects them and no
resolved control makes them mutually exclusive. The graph can be viewed
as an upper bound of concurrencies within a schedule. Given the projection
of the concurrency graph to resource class r
i
, cliques of size larger than the
allowed bound (b
r
i
) are forbidden.
This is an attractive solution, especially for explicit enumeration, but it is
practically limited to small cases, due to its binomial complexity. In particular,
it blows up in problems with high degree of concurrency, such as, for example,
models of pipelined behaviors.
8
Cabodi et al.
4.2.2 R-combination ltering function
A more eÆcient formulation, for the case of unconditioned concurrency, is
proposed in [2] for BDD representation. If we omit considering data depen-
dencies, and we simply work on operations of the same resource class, then
the resource bound constraint is an r-combination expression, selecting com-
binations of up to b
r
i
operations out of n
r
i
. We call this ltering function
Rlter(Op; bound). Its size complexity, when expressed as a BDD instead of a
two level form, is O(n
r
i
 b
r
i
), i.e., number of operations mapped to the class
times the bound for the class. The function is easily translated to CNF format
(with intermediate additional variables), with similar complexity.
Unfortunately, as previously shown, we have conditioned (i.e., articial)
concurrencies, that complicate our model compared to [2], and make the above
solution exponential in the number of control choices (forks): we should ex-
pand one instance of concurrency graph for each case of resolved/unresolved
control operation.
4.2.3 Hybrid two-level approach
Since none of the two previous approaches alone is able to eÆciently solve
our problem, we developed a hybrid technique, which follows the concurrency
graph strategy locally, within control components of the CDFG, and the r-
combination approach on a global perspective.
More in detail, we express the resource constraint function (for a given
resource class r
i
) as a composition of two sub-functions
TR
r
i
res
(p; n) = Rlter(Alloc(p; n); b
r
i
)
The outermost function is (a slight modication of) the previously de-
scribed r-combination lter, whereas Alloc is a function that remaps operation
transitions to a set of allocation variables, with the following rules:

Each uncontrolled operation op
i
is remapped to an allocation variable a
i
=
p
i
n
i
, which evaluates true when the operation is executing.

Controlled subsets of the CDFG (subgraphs included between fork and join
nodes) are globally remapped to a proper set of allocation variables, over
whom the Alloc functions returns a number of ones exactly corresponding
with the amount of resources required. So all articial concurrencies and/or
speculations are taken into account by this function.
The composition is never computed explicitly, but intermediate allocation
variables are kept and transferred to the CNF formulation of TR
res
, which
allows us to face the main size bottlenecks: (1) The complexity of conditional
concurrency (function Alloc) is kept within small regions of the CDFG. Espe-
cially for the important case of looping and/or pipelined behaviors, modeled
by serial and parallel instances of the same reference CDFG, this makes the
size of Alloc linear in the number of serial/parallel instances. (2) Rlter, the
9
Cabodi et al.
function taking care of the overall problem, has size O(n
r
i
 b
r
i
), i.e., it is lin-
ear in the number of operations, for a given resource bound. As an overall
result, our result constraint function is scalable, and well suited for looping
and pipelined behaviors, which are the most diÆcult problems in BDD-based
approaches.
4.2.4 Implementation details
Figure 3 and 4 show the pseudo-code of the Alloc and Rlter functions respec-
tively. For sake of simplicity it is assumed that all operations in the CDFG are
mapped onto the same resource class, for which maxAlloc units are available.
In our implementation, every operation is labeled with two attributes: (1)
the set of all possibly concurrent nodes and (2) a BDD representing the control
case for which the operation is enabled. Actually, in order to cover speculation,
the meaning of such a BDD is that the operation is disabled if the evaluation
of the BDD for the already resolved controls returns 0.
As regards the Alloc function, all possible cliques (over the set of opera-
tions belonging to the received sub-graph) are recursively built by means of
the auxiliary generateCliques function. At each level of recursion, a new node
is added to the previously generated clique, checking for speculative execution.
In fact, the AND between the node's enable and the clique's enable returns a
0 result only if the current node and at least one node already belonging to the
clique are in two dierent control branches. Therefore, the concurrency of the
node w.r.t. the clique is real only if the controlling operations discriminating
the branch are not resolved yet (i.e., the nodes in the new clique are exe-
cuted speculatively). Such controlling operations are therefore added to the
unresolved set. The transition corresponding to the new clique is then stored
as a BDD, and the cliques of bigger sizes are built (the set of possibly con-
current nodes being restricted as the clique has to be completely connected).
Eventually, the last loop in the Alloc function denes the allocation variables:
variable a
i
takes a value of 1 i there is a transition in the current sub-graph
involving the usage of at least i resources of the current resource class.
Once all the resource cliques have been generated, the Rlter function
symbolically builds all valid transitions in terms of the allocation variables.
To do this, it combines the allocation variables coming from the dierent calls
to the Alloc function to form an expression representing all possible illegal
allocations (i.e., those requiring at least maxAlloc+1 resources). Then the
complementation of such expression, which indeed represents all allocations of
at mostmaxAlloc resources, is returned (and then directly used as a component
of TR).
4.2.5 A small example
Let us consider again the CDFG shown in Figure 1 and let us assume that
all the operations are mapped on a single ALU. The CDFG is divided by the
algorithm into two sub-graphs: the rst is composed by the comparison only,
10
Cabodi et al.
Alloc (graph)
for (i 1 TO maxAlloc+ 1)
tList[i] BDD ZERO
for (node 2 graph:nodesSet)
cliqueSet ;
unresolved ;
enable BDD ONE
generateCliques(tList, node, cliqueSet, unresolved, enable, node:concur)
for (i 1 TO maxAlloc+ 1)
graph:a
i
 new var
TR  BDD AND(TR, BDD XNOR(graph:a
i
, tList[i]))
generateCliques (tList, node, cliqueSet, unresolved, enable, concurSet)
if jcliqueSetj > maxAlloc
return
newEnable  BDD AND EXIST(enable, node:enable, unresolved)
newUnresolved unresolved
if BDD IS ZERO(newEnable)
newUnresolved unresolved [ conictingControls(enable, node:enable)
newEnable BDD AND(BDD EXIST(enable, newUnresolved),
BDD EXIST(node:enable, newUnresolved))
newClique cliqueSet [ node
tList[jnewCliquej] BDD OR(tList[jnewCliquej],
transition(newClique, newUnresolved))
newConcur  concurSet \ node:concur
for (op 2 newConcur)
generateCliques(tList, op, newClique, newUnresolved, newEnable, newConcur)
Fig. 3. The Alloc function.
Rfilter ()
allocations[0] BDD ONE
for (k  1 TO maxAlloc+ 1)
allocations[k] BDD ZERO
for (i 1 TO N
graphs
)
newAllocations allocations
for (j  1 TO maxAlloc+ 1)
for (k  0 TO maxAlloc+ 1)
if j + k > maxAlloc+ 1
break
alloc BDD AND(allocations[k], graph
i
:a
j
)
newAllocations[j + k] BDD OR(newAllocations[j + k], alloc)
allocations newAllocations
return BDD NOT(allocations[maxAlloc+ 1])
Fig. 4. The Rlter function.
whereas the second includes both the ADD and SUBTRACT operations. Then
the relations dened by the two calls of the Alloc function are respectively:
a
C
1
= p
C
n
C
; a
C
2
= 0
11
Cabodi et al.
and
a
AS
1
= p
A
n
A
+ p
S
n
S
; a
AS
2
= p
A
n
A
p
S
n
S
p
C
Indeed the rst sub-graph may present only a transition requiring 1 ALU
unit, whereas the second sub-graph might require 2 ALU units, but only in
the case that the ADD and SUBTRACT operations are both executed before
the control resolution has been solved. Eventually, the nal constraint built
by the Rlter function is:
constraint = a
C
2
+ a
C
1
a
AS
1
+ a
AS
2
5 BMC Formulation
Once we have generated the transition relation of the CDFG, as previously
described, we have to produce the verication problem which will give us the
scheduling solution. This is done by unrolling the transition relation a certain
number of times and then trying to prove the mutual reachability between
initial and nal states.
The BDD representing the transition relation (in monolithic or conjunctive
form) is stored as a CNF formula as described in [13].
The verication strategy usually starts with a path of length equal to 1 and
increases it till the problem is solved or computation resources are exceeded.
For the above reasons the technique works well in falsication and partial
verication, whereas full verication is usually achieved by BMC with longer
and longer bounds.
Our problem is somehow simpler as, with a proper number of registers,
there is always a solution to the scheduling problem. Moreover, our experience
shows that unsatisable problems are much harder to solve than satisable
instances. To this respect SAT solvers often present an exponential behavior
as Figure 5 shows.
For these reasons the standard previously described technique proved to
be quite ineÆcient. On the contrary we do have an estimate of the maximum
latency, which is equal to the number of operations in the CDFG. This suggests
a second strategy, namely starting from the highest bound and decreasing it in
order to nd the rst unsatisable instance. The drawback of this method is
that the estimate of the maximum latency can be extremely inaccurate. As a
direct consequence, we propose a solution adopting a binary search. Starting
with an estimate of the optimal latency, we create the corresponding CNF
problem and call the SAT solver giving it a (small) time limit. Accordingly
to the result produced by the solver, the estimate of the latency is corrected,
and a new bound is tried. Notice that if the SAT solver is unable to solve the
CNF problem within the time limit, we consider the instance as unsatisable.
In general, this might lead to incorrect (i.e., sub-optimal) results, in the sense
that a satisable instance may be considered as unsatisable, but the problem
can be solved simply increasing the \unsat" threshold, with an at most linear
12
Cabodi et al.
 0
 200
 400
 600
 800
 1000
 1200
 0  2  4  6  8  10  12  14  16  18
CP
U 
Ti
me
Bound
rotor benchmark
Fig. 5. SAT CPU Time Versus SAT Bound. The data are obtained with a run of
the scheduler on the rotor benchmark with two iterates and a resource availability
corresponding to the rst row of table 2.
loss in performances.
6 Experimental Results
We show experimental results on well known benchmarks [2]. Table 1 shows
the complexity of the benchmark set in terms of number of operations, and
number of conditions checked.
Circuit # Operations # Conditions
rotor 28 3
s2r 48 6
fdct 42 0
Table 1
Circuit Complexity in terms of Number of Operations and Conditions Checked.
The data are referred to the acyclic version of the model, i.e., with just one iterate.
We ran our experiments on a 1700 MHz Pentium IV with 1 GByte of main
memory. For all the experiments we used BerkMin [12] as SAT engine.
Table 2 summarizes our results. We compare the results obtained with the
strategy presented in this paper with the software presented in [2] and locally
re-run. More in detail, our data are obtained adopting the binary search (as
described in the previous section) with a threshold of two minutes. Notice that
all the satisable instances were well recognized by the SAT solver; indeed the
numbers of scheduled cycles represent the true optimal latencies.
The meaning of columns is the following: # Iterates indicates the num-
ber of parallel instances considered (when 1, we refer to the acyclic problem,
13
Cabodi et al.
Circuit # Iterates # Resources # Cycles BDD [2] SAT { This Paper
# BDD Time # Vars # Clauses Time
[nodes] [s] [s]
rotor 1 1T,1C,1A 12 74606 0.7 994 4116 0.3
1 1T,1C,2A 8 74606 0.7 892 4232 0.3
1 1T,1C,2A,1 10 84826 0.8 1142 4174 0.4
1 1T,1C,2A,2 8 84826 0.7 920 4152 0.3
2 1T,1C,2A,1 10 871766 4.6 4749 36047 10.0
2 1T,1C,2A,2 9 1447152 6.8 4392 32873 10.8
2 1T,1C,2A,3 9 1864128 8.2 4428 33026 8.6
2 1T,1C,3A,2 8 415954 2.7 4299 31503 6.7
3 1T,1C,2A,1 12 18635148 1524.0 9573 72436 118.9
3 1T,1C,2A,2 12 OVF   9861 73612 308.9
3 1T,1C,2A,3 12 OVF   9957 74008 288.3
3 1T,1C,3A,2 9 OVF   8229 59803 37.6
s2r 1 1T,1C,2A,1 10 1006670 5.9 2532 12484 1.8
1 1T,1C,3A,2 9 532462 4.1 2788 14774 2.2
1 1T,{C,2A,2 8 411866 3.0 4749 12832 1.7
2 1T,1C,2A,1 13 OVF   12158 109623 328.6
2 1T,1C,3A,2 10 OVF   10839 87817 66.8
2 1T,{C,2A,2 10 OVF   10539 87137 62.9
fdct 1 1+,1 ,1 19 306600 1.5 2775 8362 133.7
1 1+,1 ,2 13 200312 1.2 2138 6772 1.1
2 1+,1 ,1 32   OVF 19121 223433 522.0
2 1+,1 ,2 26   OVF 17103 188719 454.6
Table 2
Schedule Results. Terminology for columns # Resources: ADD=+, ALU=A,
COMPARATOR=C, SUB= , MULT=, LookUpTable=T. MULT is a two-time
steps pipelined multiplier (when not present, multiplications are performed by the
ALU). All other resources are single time step. OVF indicates overow (in terms
of memory or CPU time). We use a time limit equal to 1 hour and a memory
limit equal to 500 MBytes.
otherwise we are handling a looping behavior); column # Resources indicates
the number and type of resources allowed; # Cycles is the nal solution in
term of scheduled cycles. For each experiment we report the data obtained
with [2], i.e., the number of BDD nodes and the CPU time required, and with
our method (number of variables and clauses generated for the CNF problem
corresponding to the solution, and the total CPU time).
Overall, we can make the following observations. For acyclic problems, the
times required by the two compared methods are quite similar (with only one
exception, the rst experiment for fdct). However, when we move to looping
behaviors, while the method used in [2] becomes unfeasible, our strategy still
produces the optimal result in a limited amount of time. These experiments
14
Cabodi et al.
demonstrate that our solution can be very eective.
7 Conclusions and Future Work
We present a new approach for symbolic scheduling based on a new problem
formulation and the use of SAT solvers and BMC verication methodology.
Experimental results on DFGs and CDFGs show that our solution can be
very eective and competitive with symbolic BDD-based techniques.
Future work will include investigation of better strategies for the CNF
problem generation and solution searching.
References
[1] F. Balarin, E. Sentovich, M. Chiodo, P. Giusto, H. Hsieh, B. Tabbara,
A. Jurecska, L. Lavagno, C. Passerone, K. Suzuki, and A. Sangiovanni-
Vincentelli, \Hardware-Software Co-design of Embedded Systems { The POLIS
Approach," Kluwer Academic Publishers, 1997.
[2] S. Haynal, \Automata-Based Symbolic Scheduling," PhD thesis, University of
California Santa Barbara, Dec. 2000.
[3] S. Haynal and F. Brewer, \EÆcient Encoding for Exact Symbolic Automata{
Based Scheduling," Proc. IEEE ICCAD'98, pages 477{481, San Jose, California,
Nov. 1998.
[4] S. Haynal and F. Brewer, \Automata-Based Scheduling for Looping DFGs",
Internal Report EC99 14, Oct. 1999.
[5] http://ftp.ics.uci.edu/pub/hlsynth/fHLSynth92,HLSynth95g.
[6] http://www.synopsys.com/products/logic/design compiler.html.
[7] C. T. Hwang, J. H. Lee, and Y. C. Hsu, \A Formal Approach to the Scheduling
Problem in High-Level Synthesis," IEEE Trans. on Computer-Aided Design,
10:464{475, Apr. 1991.
[8] K. Khouri, G. Lakkshminarayana, and N. Jha, \High-level synthesis of low-
power control-ow intensive circuits," IEEE Trans. on Computer-Aided Design,
18(12):1715{1729, Dec. 1999.
[9] A. C. Parker, J. T. Pizzarro, and M. Mlinar, \MAHA: A Program for Datapath
Synthesis," Proc. IEEE/ACM ICCAD'91, pages 461{466, Las Vegas, June 1986.
[10] P. G. Paulin and J. P. Knight, \Force{Directed Scheduling for the Behavioral
Synthesis of ASICs," IEEE Trans. on Computer-Aided Design, 8:661{679, June
1989.
[11] I. Radivojevic and F. Brewer, \A New Symbolic Technique for Control-
Dependent Scheduling," IEEE Trans. on Computer-Aided Design, C{15(1):45{
57, Jan. 1996.
15
Cabodi et al.
[12] E. Goldberg and Y. Novikov, \BerkMin: a Fast and Robust SAT-Solver," Proc.
IEEE/ACM DATE'02, pages 142{149 Paris, Feb. 2002.
[13] G. Cabodi and S. Nocco and S. Quer, \Improving SAT-based Bounded Model
Checking by Means of BDD-based Approximate Traversals," Proc. IEEE/ACM
DATE 2003, pages 898{903, Munich, Germany, March 2003.
16
