Constrained scheduling of VLSI algorithms by Aarts, E.H.L. (Emile) et al.
Constrained Scheduling 
of VLSI Algorithms 
Emile Aarts 1,2
Jan Korst 1
Jan Karel Lenstra 3,4
Abstract 
We study the problem of scheduling VLSI algorithms on signal-processor
chips. The objective is to find schedules that minimize, in a hierarchical way,
(1) the makespan and (2) the number of required storage units. The paper
concentrates on a formulation of the corresponding constrained scheduling
problem. In addition, some related complexity issues are addressed, and
some preliminary results obtained with a simulated annealing based
approximation algorithm are discussed.
Keywords: Combinatorial Optimization, Constrained Scheduling, Simulated
Annealing, VLSI Design.
1. Introduction 
The complexity of Very Large Scale Integrated (VLSI) systems necessitates the
use of Computer Aided Design (CAD) methods. These methods generally start off
with a high level specification of a VLSI algorithm which is transformed into a low
level implementation (a chip layout) through a sequence of synthesis steps, where
in each successive step more detail is added. This process is often called silicon
compilation, by analogy with software compilation; for an introductory review see
the special issue of Computer [2].
The various synthesis steps can be viewed as complex decision processes, i.e.
the "best implementation" must be chosen among a potentially very large number
of alternatives. A number of problems within these decision processes can be
mathematically formulated as combinatorial optimization problems. Well-known
examples are placement and routing problems [2].
In this paper, we consider the class of combinatorial optimization problems in
VLSI design that is related to the problem of finding an efficient and effective
mapping of a VLSI algorithm onto a special class of VLSI circuits, i.e. signal processors.
1 Philips Research Laboratories, P.O. Box 80.000, 5600 JA Eindhoven, the Netherlands
2 Eindhoven University of Technology, Department of Mathematics and Computer
Science, P.O. Box 513, 5600 MB Eindhoven, the Netherlands
3 Centre for Mathematics and Computer Science, P.O. Box 4079, 1009 AB Amsterdam,
the Netherlands
4 Econometric Institute, Erasmus University, P.O. Box 1738, 3000 DR Rotterdam, the
Netherlands
Such a signal processor can be viewed as a general data processing structure
containing an arithmetic section and a storage section [10]. The arithmetic section is
composed of a limited number of concurrently operating arithmetic units which
carry out elementary operations such as multiplications, additions and
subtractions. The storage section accommodates a number of storage units that are used
for intermediate storage of the variables used in the VLSI algorithm.
Mapping a VLSI algorithm onto a signal processor poses two minimization
objectives:
1. As a result of the typical field of application (audio, video, and
telecommunication), data processing takes place at high frequencies. Consequently, the
mapping should take as much advantage as possible of the inherent
parallelism of the VLSI algorithm, resulting in a minimal execution time.
2. The economic yield of a VLSI circuit strongly depends on the silicon area
occupied by it. In practice it turns out that the area of a signal processor is
mainly determined by the storage section. Hence the mapping should yield
a minimal number of storage units.
The approach pursued in this paper assumes a hierarchy between these two
objectives. The execution time is of prime importance: the number of storage units is
minimized subject to minimum execution time.
The remainder of this paper focusses on a mathematical formulation of the
constrained minimization problem introduced above and a discussion of some related
complexity issues (§2). Furthermore, we present some results of a first approach
to solve the problem (approximately) by simulated annealing (§3).
2. Problem Formulation 
Consider a signal processor consisting of a collection $A = \{1, 2, \ldots, n\}$ of $n$ types
of arithmetic units, each type $j$ with multiplicity $n_j$, and a VLSI algorithm
represented by a computation scheme $(O, t, R, \to)$, where
- $O$ is the set of $N$ (indivisible) operations in the VLSI algorithm,
- $t$ specifies for each operation $i \in O$ the execution time $t_i \in \mathbb{Z}^+$,
- $R$ specifies for each operation $i \in O$ the type of arithmetic unit $R_i \in A$
necessary to execute operation $i$, and
- $\to$ is a precedence relation which induces a partial ordering on $O$. More
specifically, $i \to j$ implies that $i$ must be completed before $j$ can be started.
For a given signal processor, a given computation scheme $(O, t, R, \to)$ and a given
deadline $D$, a feasible schedule is defined as a set $S = \{S_1, \ldots, S_N\}$ of starting
times $S_i \in \mathbb{Z}^+$ for all $i \in O$ satisfying the following constraints:
(i) the resource constraints, i.e. the maximum number of available arithmetic
units of each type $j$ may not be exceeded at any time instance $k$, i.e.
$|\{ i \in O : R_i = j,\ S_i \le k < S_i + t_i \}| \le n_j$ for all $j \in A$ and all $k$,
(ii) the precedence constraints, i.e. $S_j \ge S_i + t_i$ whenever $i \to j$, for all $i, j \in O$,
and
(iii) the deadline $D$, i.e. $S_i + t_i \le D$ for all $i \in O$.
The makespan $M$ of a feasible schedule $S$ is defined as the maximum completion
time of the operations, i.e. $M = \max_{i \in O} \{S_i + t_i\}$.
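As an illustration, constraints (i)-(iii) and the makespan can be checked mechanically. The following Python sketch is not part of the original formulation: the function names and the dictionary-based data layout are assumptions made for readability, and time is taken to be discrete, with operation $i$ occupying its arithmetic unit at the instants $S_i, \ldots, S_i + t_i - 1$.

```python
from collections import Counter


def makespan(start, dur):
    """Maximum completion time S_i + t_i over all operations."""
    return max(start[i] + dur[i] for i in start)


def is_feasible(start, dur, unit_type, multiplicity, precedes, deadline):
    """Check constraints (i)-(iii) for a candidate schedule.

    start[i], dur[i] : starting time S_i and execution time t_i
    unit_type[i]     : type R_i of the unit needed by operation i
    multiplicity[j]  : number n_j of available units of type j
    precedes         : iterable of pairs (i, j) meaning i -> j
    """
    # (iii) deadline: every operation completes no later than D
    if any(start[i] + dur[i] > deadline for i in start):
        return False
    # (ii) precedence: j may not start before i has completed
    if any(start[j] < start[i] + dur[i] for i, j in precedes):
        return False
    # (i) resources: count the busy units per (type, time instant)
    busy = Counter()
    for i in start:
        for k in range(start[i], start[i] + dur[i]):
            busy[(unit_type[i], k)] += 1
    return all(n <= multiplicity[j] for (j, _), n in busy.items())
```

For example, with one multiplier and one adder, a multiplication of length 2 followed by a dependent addition of length 1 fits a deadline of 3, whereas two additions started at the same instant violate constraint (i).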
The problem imposed by objective 1 can now be formulated as follows:
PRECEDENCE AND RESOURCE CONSTRAINED SCHEDULING
Given a computation scheme $(O, t, R, \to)$, a deadline $D$, and bounds $n_j$ on the
number of arithmetic units of type $j$, find a feasible schedule.
With respect to the complexity of PRECEDENCE AND RESOURCE CON-
STRAINED SCHEDULING, we can state the following theorem: 
Theorem 1 
PRECEDENCE AND RESOURCE CONSTRAINED SCHEDULING is NP-hard.
Proof
We consider two special cases, by relaxing firstly the resource constraints, i.e. for
$n = 1$ and $n_1 = m$, and secondly the precedence constraints, i.e. for $\to$ empty.
These special cases are called PRECEDENCE CONSTRAINED SCHEDULING
and RESOURCE CONSTRAINED SCHEDULING, respectively, and both are
known to be NP-complete [3]. □
Clearly, we aim at solving this problem with deadline $D$ equal to $M_{\min}$, the
minimum makespan.
At this point we mention that for most instances of the constrained scheduling
problem described above, even if $D = M_{\min}$, the number of feasible schedules is
large. This is intuitively clear from the following arguments. Each feasible
schedule with a minimum makespan contains one or more critical paths. Any change in
the start times of one of the operations on a critical path leads to an increment of
the makespan. For the operations not on a critical path, there usually exist time
intervals such that the starting times of the operations may vary within the limits
of these intervals without changing the makespan or violating the resource and
precedence constraints. This so-called slack can be used to minimize the number
of storage units, our secondary objective.
Before discussing the problem imposed by both objectives 1 and 2, we consider the
storage requirements of a given feasible schedule. To model the storage
requirements of a VLSI algorithm, we introduce the concept of lifetime intervals. Let $X$
denote the set of variables used in the algorithm. An operation $i \in O$ uses the
values of one or more variables and assigns new values to one or more variables.
Here, we distinguish between single assignment, i.e. each variable $i \in X$ receives
at most one value during the execution of the algorithm, and multiple assignment,
i.e. variables may receive more than one value. For a given feasible schedule, a
variable $i$ is said to be alive between the completion time $S_j + t_j$ of the operation $j$ that
assigns a (new) value to $i$ and the starting time $S_k$ of the operation $k$ that uses this
value for the last time. In the case of single assignment, the lifetime of variable
$i$ is denoted by the lifetime interval $(S_j + t_j, S_k]$. In the case of multiple
assignment, with each variable $i$ a set of lifetime intervals is associated, each interval
corresponding to a different value of $i$. If a variable is alive, then a storage unit is
required to store its value. To simplify the control of the signal processor as much
as possible, it is required that the subsequent values of one variable are all stored
in the same storage unit.
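The lifetime analysis described above can be sketched in a few lines of Python. The representation is an assumption made for this illustration only: each value is given as a triple (variable, producer, consumers), where `producer` is the operation that assigns the value and `consumers` are the operations that use it. Under single assignment every variable appears in exactly one triple; under multiple assignment it may appear in several.

```python
def lifetime_intervals(start, dur, values):
    """Return {variable: [(birth, death), ...]} for a given schedule.

    start[i], dur[i] : starting time S_i and execution time t_i
    values           : list of triples (variable, producer, consumers)
    """
    lifetimes = {}
    for variable, producer, consumers in values:
        # the value is born when the assigning operation completes ...
        birth = start[producer] + dur[producer]
        # ... and dies when the last operation using it starts
        death = max(start[k] for k in consumers)
        lifetimes.setdefault(variable, []).append((birth, death))
    for intervals in lifetimes.values():
        intervals.sort()
    return lifetimes
```

A variable that receives two values therefore carries two intervals, and both must later be mapped to the same storage unit.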
[Figure 1 is not reproduced here; the recoverable panel labels read $\chi = 3$ for (a) and $\chi = 4$ for (b).]
Figure 1: Lifetime analysis and corresponding interval graph in the case of single
assignment (a), and multiple assignment (b).
Evidently, two variables that are simultaneously alive cannot be stored in the
same storage unit. Hence, the problem of finding a minimal number of storage
units for a given schedule $S$ can be formulated as follows:
VARIABLE STORAGE
For a given feasible schedule $S$, find a mapping $f: X \to \{1, 2, \ldots, p\}$ such that
$f(i) \ne f(j)$ whenever the variables $i$ and $j$ ($i \ne j$) are simultaneously alive, and $p$,
the number of required storage units, is minimal.
To discuss the complexity of VARIABLE STORAGE, we distinguish between sin-
gle and multiple assignment. 
Theorem 2
VARIABLE STORAGE can be solved in polynomial time in the case of single
assignment. In the case of multiple assignment, the problem is NP-hard.
Proof
For single assignment, VARIABLE STORAGE can be solved in $O(|X| \log |X|)$ time by
the left edge algorithm [5]. In the case of multiple assignment, we consider a special
case of VARIABLE STORAGE where for each variable the number of different
values is at most two. The problem REGISTER SUFFICIENCY FOR LOOPS,
which is known to be NP-complete [3], reduces to this special case. □
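For the single assignment case, the idea behind the left edge algorithm can be illustrated as follows. This is a sketch in the spirit of [5], not a transcription of it: lifetimes are processed in order of their left end point and each variable is placed in a storage unit that has already been released, a new unit being opened only when none is free. The heap-based bookkeeping and the convention that a unit released at time $d$ may be reused by a variable born at time $d$ are assumptions of this example.

```python
import heapq


def left_edge(intervals):
    """Assign each variable to a storage unit; return (assignment, p).

    intervals maps a variable to its single lifetime (birth, death).
    """
    assignment = {}
    in_use = []   # min-heap of (death, unit) for occupied units
    free = []     # min-heap of released unit numbers
    units = 0
    # sweep over the lifetimes in order of their left edge
    for var, (birth, death) in sorted(intervals.items(), key=lambda kv: kv[1]):
        # release every unit whose variable has died by time `birth`
        while in_use and in_use[0][0] <= birth:
            heapq.heappush(free, heapq.heappop(in_use)[1])
        if free:
            unit = heapq.heappop(free)
        else:
            units += 1          # no released unit available: open a new one
            unit = units
        assignment[var] = unit
        heapq.heappush(in_use, (death, unit))
    return assignment, units
```

Sorting dominates the running time, which matches the $O(|X| \log |X|)$ bound quoted in the proof.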
Now, the problem imposed by objective 2 can be formulated as follows: 
SCHEDULING TO MINIMIZE VARIABLE STORAGE 
Given the set of feasible schedules, specified by a computation scheme $(O, t, R, \to)$,
a deadline $D$, and bounds $n_j$ on the number of arithmetic units of type $j$, find a
feasible schedule that minimizes the number of required storage units.
We assume that the set of feasible schedules is not listed explicitly but that it can 
be specified by some compact representation. With respect to the complexity of 
SCHEDULING TO MINIMIZE VARIABLE STORAGE, we can state the following
theorem:
Theorem 3 
SCHEDULING TO MINIMIZE VARIABLE STORAGE is NP-hard. 
Proof 
We consider the special case where $n = 1$ and $n_1 = |O|$ (no resource constraints)
and $X = \{1, \ldots, m + 1\}$. For each variable $i \in X$ three operations are defined,
say $i_1$, $i_2$ and $i_3$, where operation $i_1$ assigns a value to $i$ and operations $i_2$ and $i_3$
use this value (single assignment). Evidently, we have $|O| = 3(m + 1)$.
Furthermore, $i_1 \to i_2$, $i_2 \to i_3$, $t_{i_1} = t_{i_3} = 0$, and $t_{i_2} = a_i$. By choosing $a_{m+1} = D$, and
$\sum_{i=1}^{m} a_i = 2D$, it is straightforward to show that PARTITION, which is known to
be NP-complete [3], reduces to this special case of SCHEDULING TO MINIMIZE
VARIABLE STORAGE. □
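To make the construction in this proof concrete, the following Python sketch builds the scheduling instance from a list of PARTITION numbers. It rests on one reading of the proof, namely that the middle operation of each chain carries the duration $a_i$ while the first and last operations take no time; the function name and the tuple encoding (i, s) of operation $i_s$ are hypothetical. Under this reading the intended argument appears to be that variable $m + 1$ occupies one storage unit during the whole interval up to $D$, so that three units suffice exactly when the remaining lifetimes can be packed into two units of length $D$ each.

```python
def partition_to_schedule_instance(a):
    """Build (durations, precedes, deadline) from PARTITION numbers a_1..a_m.

    Operation (i, s) is the s-th operation (s = 1, 2, 3) of variable i.
    """
    total = sum(a)
    if total % 2:
        raise ValueError("the numbers must sum to 2D for an integer D")
    deadline = total // 2                   # D, with a_1 + ... + a_m = 2D
    sizes = list(a) + [deadline]            # a_{m+1} = D
    durations, precedes = {}, []
    for i, a_i in enumerate(sizes, start=1):
        durations[(i, 1)] = 0               # i_1 assigns the value of variable i
        durations[(i, 2)] = a_i             # i_2 uses it and runs for a_i time units
        durations[(i, 3)] = 0               # i_3 uses it for the last time
        precedes += [((i, 1), (i, 2)), ((i, 2), (i, 3))]
    return durations, precedes, deadline
```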
Now, the problem imposed by both objectives 1 and 2 can be formulated as fol-
lows: 
COMPUTATION SCHEME SCHEDULING 
Given a computation scheme $(O, t, R, \to)$ and a deadline $D$, and bounds $n_j$ on the
number of arithmetic units of type $j$, find a feasible schedule that minimizes the
number of required storage units.
Theorem 4
COMPUTATION SCHEME SCHEDULING is NP-hard.
Proof
Evidently, COMPUTATION SCHEME SCHEDULING is NP-hard since finding
a schedule that is feasible is already NP-hard (Theorem 1) and even if the set of
feasible schedules is given the problem remains NP-hard (Theorem 3). □
We end this section with the following remarks. VARIABLE STORAGE reduces
to GRAPH COLOURING [3] by constructing the interval graph $G = (V, E)$ where
$V = X$ and $\{i, j\} \in E$ if variables $i$ and $j$ are simultaneously alive. The number
of required storage units is then given by the chromatic number $\chi$ of $G$ (see
Figure 1). Furthermore, it is easy to show that GRAPH COLOURING reduces to
VARIABLE STORAGE with multiple assignment.
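The graph construction in these remarks can be sketched as follows. The code builds the conflict graph from the lifetime intervals (one interval per variable under single assignment, several under multiple assignment) and colours it with a first-fit rule that visits the vertices in order of non-increasing degree, in the manner of Welsh and Powell [9]. The data layout is an assumption of this example, intervals that merely touch at an end point are treated as non-overlapping, and the colouring is only an upper bound on $\chi$ in general.

```python
from itertools import combinations, count


def interval_graph(lifetimes):
    """Return {variable: set of variables that are alive at the same time}.

    lifetimes maps each variable to a list of (birth, death) intervals.
    """
    def overlap(p, q):
        return max(p[0], q[0]) < min(p[1], q[1])

    graph = {v: set() for v in lifetimes}
    for u, v in combinations(lifetimes, 2):
        if any(overlap(p, q) for p in lifetimes[u] for q in lifetimes[v]):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def greedy_colouring(graph):
    """First-fit colouring, vertices taken in order of non-increasing degree."""
    colour = {}
    for v in sorted(graph, key=lambda v: -len(graph[v])):
        used = {colour[w] for w in graph[v] if w in colour}
        colour[v] = next(c for c in count(1) if c not in used)
    return colour
```

The number of storage units implied by a colouring is `max(colour.values())`; each colour class corresponds to one storage unit.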
3. Solution Approaches 
So far the literature does not present algorithms (neither optimization nor
approximation algorithms) for solving the combined VLSI scheduling problem described
above. However, a number of approaches to the separate problems have been
reported. For problems related to PRECEDENCE AND RESOURCE
CONSTRAINED SCHEDULING see [4,7,8,10]; for VARIABLE STORAGE see [8,10].
Here we report on a first approach to the combined problem, i.e.
COMPUTATION SCHEME SCHEDULING with $D$ equal to the minimal makespan. Our
approach is based on a combination of heuristic algorithms, approximating an
optimal solution of the problem by following the two-step decomposition imposed by
the two minimization objectives given in the introduction.
Step 1: A near-optimal feasible schedule is determined for the PRECEDENCE
AND RESOURCE CONSTRAINED SCHEDULING problem by using a simple
dispatch rule [7], which schedules operations according to the following priority
criterion: Levelling [1] of the precedence relation yields for each operation the
earliest possible starting time (neglecting the constraints induced by the limited
number of arithmetic units of each type). These operations are placed in a waiting
list, which gives for each instance of time the operations that can be started at
that time. Next, the time instances of the waiting list are examined consecutively,
starting off at time zero. If the operations in the waiting list at a given time
instance do not lead to conflicts, i.e. if the number of available arithmetic units
is not exceeded, then the operations are scheduled at the time indicated by the
waiting list. Otherwise, a conflict free subset of operations from the waiting list
is selected, whilst the operations that are not selected are moved to the next time
instance in the waiting list. The rule for selecting a conflict free subset is based on
the length of the critical path of the operations in the corresponding precedence
graph [4,8,10], such that operations with a long critical path are scheduled first.
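A compact variant of this dispatch rule is sketched below in Python. It is an illustration under stated assumptions rather than the implementation used for the reported results: instead of maintaining an explicit waiting list indexed by the levelled starting times, it recomputes at every time instant which operations can be started, and it resolves conflicts by giving preference to the operation with the longest critical path towards the end of the algorithm. Execution times are assumed to be positive, the precedence relation acyclic, and every required unit type available at least once.

```python
def list_schedule(dur, unit_type, multiplicity, precedes):
    """Return starting times found by critical-path list scheduling."""
    succ = {i: [] for i in dur}
    pred = {i: [] for i in dur}
    for i, j in precedes:
        succ[i].append(j)
        pred[j].append(i)

    cp = {}  # length of the longest path from the start of i to the end

    def critical(i):
        if i not in cp:
            cp[i] = dur[i] + max((critical(j) for j in succ[i]), default=0)
        return cp[i]

    start, finish = {}, {}
    # busy_until[j][u]: time at which unit u of type j becomes idle again
    busy_until = {j: [0] * n for j, n in multiplicity.items()}
    time = 0
    while len(start) < len(dur):
        # operations whose predecessors have all completed by now
        ready = [i for i in dur if i not in start
                 and all(p in finish and finish[p] <= time for p in pred[i])]
        # longest critical path first; operations that find no idle unit
        # stay unscheduled and are reconsidered at the next time instant
        for i in sorted(ready, key=lambda i: -critical(i)):
            units = busy_until[unit_type[i]]
            for u, t in enumerate(units):
                if t <= time:
                    start[i] = time
                    finish[i] = time + dur[i]
                    units[u] = finish[i]
                    break
        time += 1
    return start
```

In a small example with one multiplier and one adder, two additions competing for the adder at time zero are ordered by critical path length, and the one without successors is deferred by one time instant.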
Step 2: Given a near-optimal feasible schedule with makespan $M$, the
COMPUTATION SCHEME SCHEDULING problem is solved approximately by using
the principle of simulated annealing. This is a randomized version of
neighbourhood search, applying a stochastic acceptance rule, which allows the algorithm to
eventually escape from local optima; for an overview see [6]. For this, the
solution space is given by all feasible schedules with makespan $M$. The cost of each
solution is given by the chromatic number of the interval graph that goes with a
given solution, and is approximated by the well-known graph colouring algorithm
of Welsh and Powell [9]. A neighbour of a given solution is determined by taking
a noncritical operation and selecting a new starting time within its slack interval
such that the schedule remains feasible.
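The stochastic acceptance rule can be illustrated with a generic annealing loop. The sketch below is not the algorithm used for the results reported here: the geometric cooling schedule and all parameter values are assumptions, and the cost and neighbour functions are passed in as arguments. For the scheduling problem, `cost` would return the colouring estimate of the number of storage units and `neighbour` would shift one noncritical operation within its slack interval.

```python
import math
import random


def anneal(initial, cost, neighbour, t_start=2.0, t_end=0.01, alpha=0.95,
           steps_per_temperature=50, rng=None):
    """Generic simulated annealing; returns (best_solution, best_cost)."""
    rng = rng or random.Random()
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_temperature):
            candidate = neighbour(current, rng)
            delta = cost(candidate) - current_cost
            # improvements are always accepted; deteriorations are accepted
            # with probability exp(-delta / temperature)
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= alpha   # geometric cooling (an assumed schedule)
    return best, best_cost
```

As a toy usage example, `anneal(0, lambda x: (x - 7) ** 2, lambda x, rng: x + rng.choice((-1, 1)), rng=random.Random(0))` searches the integers for the minimizer of a quadratic. Because the best solution seen so far is tracked separately, the returned cost never exceeds that of the initial solution.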
Preliminary results obtained with our approximation algorithm show that
reductions of the number of storage units of up to 50% can be obtained (relative to
the number of units required by the schedule obtained in step 1). Computation
times, however, are large, i.e. up to several hours on a VAX 11/750. This is mainly
due to the use of the simulated annealing algorithm.
Further research will concentrate on investigating the potential for speeding up
the algorithm, predominantly by using more sophisticated neighbourhood
structures. Other research efforts will concentrate on a combined approach in which
the makespan and the number of storage units are minimized simultaneously, i.e.
instead of following the hierarchical decomposition imposed by objectives 1 and 2,
a strategy is pursued that uses a weighted sum of the makespan and the number
of storage units as a cost function.
References 
[1] Aho, A.V., and J.D. Ullman, Principles of Compiler Design, Addison-Wesley,
Reading, Mass.
[2] Computer, 19 (1986).
[3] Garey, M.R., and D.S. Johnson, Computers and Intractability: A Guide to the Theory
of NP-Completeness, Freeman, San Francisco, 1979.
[4] Goossens, G., J. Rabaey, J. Vanhoof, J. Vandewalle and H. De Man, An Efficient
Microcode-Compiler for Custom Multiprocessor DSP-Systems, Proc. Int. Conf. on
Computer Aided Design, Santa Clara, California, 1987, pp. 24-57.
[5] Hashimoto, A., and J. Stevens, Wire Routing by Optimizing Channel Assignment
within Large Apertures, Proc. 8th Design Automation Workshop, 1971, pp. 155-169.
[6] Laarhoven, P.J.M. van, and E.H.L. Aarts, Simulated Annealing: Theory and
Applications, D. Reidel Publishing Company, Dordrecht, The Netherlands, 1987.
[7] Lawler, E.L., J.K. Lenstra and A.H.G. Rinnooy Kan, Recent Developments in
Deterministic Sequencing and Scheduling: A Survey, in: M.A.H. Dempster, J.K. Lenstra
and A.H.G. Rinnooy Kan (Eds.), Deterministic and Stochastic Scheduling, D. Reidel
Publishing Company, Dordrecht, The Netherlands, 1982.
[8] Tseng, C.J., and D.P. Siewiorek, Automated Synthesis of Data Paths in Digital
Systems, IEEE Trans. on Computer-Aided Design 5, 379 (1986).
[9] Welsh, D.J.A., and M.B. Powell, An Upper Bound for the Chromatic Number of a
Graph and its Application to Timetabling Problems, The Computer Journal 10, 85
(1967).
[10] Zeman, J., and G.S. Moschytz, Systematic Design and Programming of Signal
Processors, Using Project Management Techniques, IEEE Trans. on Acoustics, Speech
and Signal Processing 31, 1536 (1983).
