I. INTRODUCTION
Recently, automatic data-path synthesis of a digital system from a behavioral description has gained much attention in the CAD research community [1]-[25]. The synthesis task starts with a behavioral description of a digital system and a set of time and/or resource constraints. The goal is to produce a structure of the digital system that satisfies the constraints. It includes four subtasks. The first subtask is to describe the behavior of the digital system using a hardware description language (HDL). This step is usually followed by a translation of the description into a graph-based representation called the control data flow graph (CDFG). The next subtask is operation scheduling, where each operation in the CDFG is assigned to a control step. The third subtask allocates the resources for the digital system. Here, function units are assigned to execute the operations, storage units are assigned to store the values, and wires are allocated to interconnect them using the data transfer information derived from the CDFG. At this point, a data path is completed. Finally, based on the schedule graph and the data path, a control unit is synthesized to synchronize the executions of the operations.
Manuscript received April 7, 1989; revised January 19, 1990. This work was supported in part by the National Science Council, Republic of China, under Grants NSC78-0404-E007-13 and NSC79-0404-E007-24. This paper was recommended by Associate Editor R. K. Brayton.
The authors are with the Department of Computer Science, Tsing Hua University, Hsin-Chu, Taiwan 30043, Republic of China.
IEEE Log Number 9042077.
Among the above steps, operation scheduling and hardware allocation are the two major subtasks. These two subtasks are interdependent. In order to have an optimal design, a system should perform both subtasks simultaneously [1], [2]. However, due to the time complexity, many systems perform them separately.
Roughly speaking, operation scheduling determines the cost-speed tradeoffs of the design. If the design is subject to a speed constraint, the scheduling algorithm will attempt to parallelize the operations to meet the timing constraint. Conversely, if there is a limit on the cost (area or resources), the scheduler will serialize operations to meet the resource constraint. Once the operations are scheduled, the number and types of function units, the lifetimes of variables, and the timing constraints are fixed. Thus a good scheduler is very important to an automated data-path synthesis system. According to Gajski [17], it is "perhaps the most important step during the structure synthesis." We address in this paper three scheduling problems, each with different requirements.
(P1) Time-Constrained Scheduling: Given constraints on the maximum number of time steps, find the cheapest schedule which satisfies the constraints.
(P2) Resource-Constrained Scheduling: Given constraints on the resources, find the fastest schedule which satisfies the constraints.
(P3) Feasible-Constrained Scheduling: Given constraints on the resources and the time steps, decide if there exists a schedule which satisfies the constraints. Output the solution if it exists.
Instead of giving heuristic algorithms to schedule the operations of the CDFG as most systems do, we begin with a mathematical description of the scheduling objectives and constraints which can be translated easily into integer linear programming (ILP) formulations.
We also extend the formulations to various requirements which are encountered in the real world. The number of variables in our formulation grows as $O(s \cdot n)$, where $s$ is the number of control steps and $n$ is the number of operations. While formulating the equations, we try to reduce the solution space as much as possible. Experiments show that optimal solutions for a practical example such as the fifth-order filter can be obtained in a very short time.
This paper is organized as follows: Section II reviews previous work and related research. Section III gives the approaches and formulations of the three scheduling problems. Various extensions are introduced in Section IV. Section V shows the experimental results. Finally, concluding remarks are made in Section VI.
... give a good tutorial on the high level synthesis problem, where they show how the synthesis task can be decomposed into a number of distinct but not independent subtasks and give a survey on the techniques for solving these subtasks. A recent survey of the synthesis task is by Paulin [3], where the concentration is on scheduling techniques.
In this section, some of the basic scheduling techniques are discussed. We also address some considerations which are peculiar to scheduling as a part of data-path synthesis. The integer programming technique, which is the technique used in this paper, is discussed in the last subsection.
Basic Scheduling Techniques
The simplest scheduling technique is as soon as possible (ASAP) scheduling [4], [5], where the operations in the CDFG are scheduled step by step from the first control step to the last. An operation is called a ready operation if all of its predecessors are scheduled. This procedure repeatedly schedules ready operations to the next control step until all the operations are scheduled.
As late as possible (ALAP) scheduling [8] performs a procedure very similar to ASAP. In contrast to ASAP, ALAP scheduling schedules the operations from the last control step toward the first. An operation is scheduled to the next control step once all its successors are scheduled. Fig. 1 gives an example of ASAP and ALAP scheduling.
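The two procedures can be sketched in a few lines of Python. This is a minimal illustration, assuming one-cycle operations and a hypothetical five-operation graph (the names `asap`, `alap`, `ops`, and `edges` are ours, not from the paper or from Fig. 1). ALAP is obtained here by running ASAP on the reversed graph and counting back from the last step.

```python
def asap(ops, edges):
    """ASAP: put every ready operation into the next control step."""
    preds = {o: [] for o in ops}
    for u, v in edges:
        preds[v].append(u)
    step, current = {}, 1
    while len(step) < len(ops):
        # Ready = not yet scheduled, and all predecessors scheduled earlier.
        ready = [o for o in ops
                 if o not in step and all(p in step for p in preds[o])]
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        for o in ready:
            step[o] = current
        current += 1
    return step


def alap(ops, edges, num_steps):
    """ALAP: ASAP on the reversed graph, counted back from num_steps."""
    from_end = asap(ops, [(v, u) for u, v in edges])
    return {o: num_steps - from_end[o] + 1 for o in ops}


# Hypothetical example graph: a and b feed c, c feeds d, e is independent.
ops = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "d")]
S = asap(ops, edges)               # earliest step of each operation
L = alap(ops, edges, num_steps=3)  # latest step of each operation
ranges = {o: (S[o], L[o]) for o in ops}
```

In this example only `e` has any freedom (steps 1 to 3); every other operation lies on the critical path and has a single feasible step.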
Since it is not practical to assign too many operations of the same type into a control step due to the constraint on the number of function units, a variation of ASAP is to delay the ready operations when their number exceeds the number of function units. Selection of the operations to be delayed is arbitrary. This technique is called ASAP with conditional postponement [6]-[8].
The list scheduling technique [9]-[13], which was originally used in microcode compaction [9], has been adopted by many high level synthesis systems. Similar to ASAP, the operations in the CDFG are assigned to control steps from the first control step to the last. The ready operations are given a priority according to heuristic rules and are scheduled into the next control step according to this predefined priority. When the number of scheduled operations exceeds the number of resources, the remaining operations are delayed.
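A minimal Python sketch of list scheduling follows, assuming one-cycle operations. The function name `list_schedule`, the example graph, and the priority values are hypothetical; the priority rule itself is left to the caller (a smaller number means more urgent), since the text only says that heuristic rules are used.

```python
def list_schedule(ops, edges, op_type, units, priority):
    """Resource-constrained list scheduling for one-cycle operations."""
    preds = {o: [] for o in ops}
    for u, v in edges:
        preds[v].append(u)
    step, current = {}, 1
    while len(step) < len(ops):
        ready = [o for o in ops
                 if o not in step and all(p in step for p in preds[o])]
        ready.sort(key=lambda o: priority[o])  # most urgent first
        used, chosen = {}, []
        for o in ready:
            t = op_type[o]
            # Take the operation only if a unit of its type is still free.
            if used.get(t, 0) < units.get(t, 0):
                used[t] = used.get(t, 0) + 1
                chosen.append(o)
        if not chosen:
            raise ValueError("no progress: cycle or missing function unit")
        for o in chosen:
            step[o] = current  # the remaining ready operations are delayed
        current += 1
    return step


# Hypothetical example: three multiplications and one addition.
ops = ["m1", "m2", "m3", "a1"]
edges = [("m1", "a1"), ("m2", "a1")]
op_type = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add"}
priority = {"m1": 1, "m2": 1, "m3": 2, "a1": 2}
sched = list_schedule(ops, edges, op_type, {"mul": 1, "add": 1}, priority)
```

The length of the resulting schedule, `max(sched.values())`, is the kind of upper limit on the number of control steps that a resource-constrained formulation can start from.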
The third type of scheduling is "global" in the way it selects the next operation to be scheduled and in the way it decides the control step in which to put it. There are two variations: freedom-based scheduling and force-directed scheduling. In freedom-based scheduling [12], [14], [15], the operations on the critical path are scheduled first. The operations not on the critical path are assigned one at a time according to their degree of freedom. In force-directed scheduling [3], [21], "force" values are calculated for all operations at all feasible control steps. The pairing of operation and control step that has the most attractive force is selected and assigned. After the assignment, the forces of the unscheduled operations are re-evaluated. Assignment and evaluation are iterated until all the operations are assigned.
Among the above scheduling techniques, list scheduling requires that the number of function units be specified, while force-directed scheduling requires that the maximum number of control steps be specified. They correspond to resource-constrained and time-constrained scheduling, respectively.
Considerations in Data-Path Synthesis
In the real world, many variations on how the operations are implemented and the different structures of the data flow graph have to be considered during the scheduling phase. We can join several operations with data dependencies in one cycle (chaining) [3], [4], [12], [15] or execute an operation which crosses more than one cycle (multicycle operation). A multicycle operation can be performed by either a pipelined [12] or nonpipelined function unit.
The data flow graph may contain mutually exclusive operations. Mutually exclusive operations occur when there are multiple branches in the CDFG and only one branch occurs at a time. In this case, we want to find a schedule in which the resources are efficiently utilized.
With very little additional hardware, the throughput of a data path can be improved by pipelining it [14], [22]. A pipelined data path (also called functional pipelining) is a data path in which operations of different instances can be performed concurrently. Sehwa [22] was the first system for synthesizing a pipelined data path. It uses a modified list scheduling technique to schedule the operations. A graph partitioning technique for scheduling a pipelined data path was presented in [23]. The force-directed scheduling algorithm has also been adapted to solve the same problem in [3], [21].
Spaid [25] incorporates structure transformations and retiming to modify the CDFG in order to reduce the critical path. A better result can be obtained if retiming and scheduling are performed at the same time.
The idea of loop folding was proposed in [24] . The goal of loop folding is to reduce the running time of a loop by overlapping the execution of different loop iterations. In that paper, operations in one loop iteration are folded to the next loop iteration iteratively until the loop length is unlikely to be reduced. The list scheduling technique is used to schedule the overlapped loop iterations.
ILP Approaches
Since the above scheduling methods assign operations to control steps one at a time, their results depend strongly on the order of the assignments. We state the scheduling problem by a mathematical description and then solve it using an ILP method. Although, strictly speaking, the use of ILP is not new, ours is the first such formulation with a realistic approach. In the following paragraphs, we survey some similar approaches which were used to describe or solve the synthesis of digital logic systems.
An integer programming model for synthesizing digital logic at the register-transfer level (RTL) was formulated in [1]. The model gives detailed specifications for a data-path synthesis. All of the characteristics such as variable storage, operation precedence, resource sharing, and control structures are included in this model. Due to the complexity of the formulation, only a small problem can be solved.
An integer programming approach was proposed for microcode scheduling in CATHEDRAL-II [13], which is a synthesis system for multiprocessor DSP systems. After a customized data path has been synthesized and the high level operations are mapped onto a set of RTL operations, the microcode scheduling is performed. The model contains data precedence, resource conflict, and controller pipelining constraints. Since excessive CPU time is required to solve large problems, the model is replaced by a graph-based heuristic scheduling algorithm.
After extensively studying these two papers, we have found there are places where we can reduce the solution space of the scheduling problem. First, a formulation with linear cost function can be solved much more easily than that of a nonlinear cost function. By carefully arranging the data dependency relationships in the formulation, we found it is possible to formulate the cost function as linear. Second, the search space for each operation can be reduced by restricting the range of control steps for each operation. This can be done by using both ASAP and ALAP scheduling. Third, by introducing a lower limit and an upper limit on the number of function units of each type, we can prevent many unnecessary searches.
By taking these considerations into our formulation and using an ILP package, optimal solutions for a practical sized problem like the fifth-order filter can be achieved within a few seconds. Moreover, we believe that the detailed formulations of the scheduling problem will lead to a deeper understanding of the problem. Better algorithms or heuristics for near optimal solutions can be expected.
III. ILP FORMULATIONS FOR THE SCHEDULING PROBLEMS
In this section, we will present ILP formulations for time-constrained scheduling, resource-constrained scheduling, and feasible scheduling problems. For simplicity of explanation, two assumptions are made.
1) Each operation is assumed to have a one-cycle propagation delay.
2) Only nonpipelined data paths are considered.
Other general considerations will be discussed in the next section.
ASAP, ALAP, and list scheduling are used to trim the solution space in our formulation. ASAP and ALAP determine, respectively, the earliest possible time and latest possible time of an operation. List scheduling sets an upper limit on the number of control steps for resource-constrained scheduling.
The notations used in our formulations are defined as follows: suppose the data flow graph, $G(V, E)$, contains $n$ ($= |V|$) operations, $e$ ($= |E|$) data dependencies, and is going to be scheduled into $s$ control steps.
Time-Constrained Scheduling
A time-constrained scheduling problem can be defined as follows. Given the maximum number of control steps, find a minimal cost schedule that satisfies the given set of constraints.
Here the cost of a data path may be the costs of function units, interconnections, and registers. For simplicity of the formulation, only the cost of function units is considered. The others are considered in the next section.
It is obvious that the cost of function units is minimized if all the function units are fully utilized in a system. In other words, operations of the same type should be evenly distributed among all control steps. This is achieved in our model by minimizing the maximal number of operations of the same type in each control step.
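Our approach to time-constrained scheduling includes three substeps:
1) ASAP: determine the earliest possible time for each operation;
2) ALAP: determine the latest possible time for each operation;
3) ILP: minimize the total cost of function units.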
The objective function in (1) states that we are going to minimize the total cost of function units. Constraint (2) states that no schedule should have a control step containing more than $M_{t_k}$ function units of type $t_k$. It is clear that $o_i$ can only be scheduled into a step between $S_i$ and $L_i$, which is reflected in (3). Constraint (4) ensures that the precedence relations of the data flow graph (DFG) will be preserved. Let us illustrate the above formulation using the example below.
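For reference, a formulation consistent with the description of (1)-(4) can be written as follows. This is a reconstruction from the prose, not the typeset original: the cost coefficient $c_{t_k}$ (cost of one function unit of type $t_k$), the number of function unit types $m$, and the notation $o_i \to o_{i'}$ for a data dependency are assumed symbols. $S_i$ and $L_i$ are the ASAP and ALAP steps of $o_i$, and $x_{i,j} = 1$ when $o_i$ is scheduled into step $j$.

```latex
\begin{align}
\text{minimize}\quad
  & \sum_{k=1}^{m} c_{t_k}\, M_{t_k} \tag{1}\\
\text{subject to}\quad
  & \sum_{o_i \in FU_{t_k}} x_{i,j} - M_{t_k} \le 0,
  && 1 \le j \le s,\; 1 \le k \le m \tag{2}\\
  & \sum_{j=S_i}^{L_i} x_{i,j} = 1,
  && 1 \le i \le n \tag{3}\\
  & \sum_{j=S_i}^{L_i} j\, x_{i,j} - \sum_{j=S_{i'}}^{L_{i'}} j\, x_{i',j} \le -1,
  && \text{for every } o_i \to o_{i'} \tag{4}
\end{align}
```

In (2), the sum only involves the variables $x_{i,j}$ with $S_i \le j \le L_i$, since no other variables exist for $o_i$.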
Consider the data flow graph in Fig. 1, which is going to be scheduled into four control steps. The ASAP and ALAP schedules are shown in Fig. 1(a) and (b), respectively. The distribution of variables is shown in Table I. Here, the horizontal rows represent the control steps, and each column represents an operation. A variable in the table means that the operation could be assigned to the step. The available function units are multipliers ($FU_{t_1}$) and ALU's ($FU_{t_2}$) which are capable of performing addition, subtraction, and comparison.
... variables ..., and $x_{11,2}$ are set to 1. In this case, two multipliers and two ALU's are used. The scheduling result is shown in Fig. 1(c).
Resource-Constrained Scheduling
A resource-constrained scheduling problem can be formally stated as follows: given the maximum number of resources, find the fastest schedule that satisfies the given set of constraints. In general, the resources given are the number of function units, such as adders, multipliers, ALU's, and buses. Although registers and interconnections also contribute to the total area, they are difficult to specify as resource constraints.
Resource-constrained scheduling includes four substeps:
1) list scheduling: determine the upper limit on the number of control steps;
2) modified ASAP: determine the earliest possible time for each operation;
3) modified ALAP: determine the latest possible time for each operation;
4) ILP: minimize the number of control steps needed for the data path.
ASAP and ALAP scheduling can be modified so that a tighter range for each operation is obtained by taking the resource constraints into account. Assume there are $p_i$ operations which are executed by the same type of function unit as $o_i$ and precede $o_i$, and the available number of function units for $o_i$ is $n_i$. Then, $o_i$ cannot be scheduled before step $\lceil p_i / n_i \rceil$. It also follows that any successor of $o_i$ cannot be scheduled before $\lceil p_i / n_i \rceil + 1$. Modified ASAP and ALAP are particularly useful when the number of function units or buses is small.
The variables used in the formulation are the following.
1) $C_{step}$ is an integer variable which is the total number of control steps required.
2) $x_{i,j}$ are 0-1 integer variables associated with $o_i$; $x_{i,j} = 1$ if $o_i$ is scheduled into step $j$; otherwise, $x_{i,j} = 0$.
A resource-constrained scheduling problem is formulated as follows.
The objective function in (1.1) states that we are going to minimize the total number of control steps. Constraint (2.1) states that no schedule should have a control step containing more than $M_{t_k}$ function units of type $t_k$. Note that the $M_{t_k}$ in (2.1) is a constant. Constraints (3) and (4) are the same as those in time-constrained scheduling. No operations should be scheduled after $C_{step}$, as described in constraint (5).
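A formulation consistent with this description can be written as below. It is a reconstruction from the prose under the one-cycle assumption, not the typeset original; the symbol $m$ for the number of function unit types, and the choice to state (5) only for operations without successors, are assumptions (stating it for every operation would be equivalent because of the precedence constraints).

```latex
\begin{align}
\text{minimize}\quad
  & C_{step} \tag{1.1}\\
\text{subject to}\quad
  & \sum_{o_i \in FU_{t_k}} x_{i,j} \le M_{t_k},
  && 1 \le j \le s,\; 1 \le k \le m \tag{2.1}\\
  & \sum_{j=S_i}^{L_i} j\, x_{i,j} - C_{step} \le 0,
  && \text{for every } o_i \text{ without successors} \tag{5}
\end{align}
```

Constraints (3) and (4) are added unchanged. The only structural difference from the time-constrained case is that $M_{t_k}$ moves from the variable side to the constant side, while $C_{step}$ becomes the quantity being minimized.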
Once again, we use the data flow graph in Fig. 1 to demonstrate this formulation. Assume the maximum numbers of multipliers and ALU's are both set at 2. Under these constraints, list scheduling is first used to decide the upper limit on the number of time steps (which is 4). Applying this limit, we then use ASAP and ALAP scheduling to obtain the range for each operation.
Our formulation is to minimize $C_{step}$ under certain constraints. Since the constraints for (3) and (4) are the same as in the previous example, we list only the constraints (2.1) and (5). This formulation obtains the same result as the previous example. At the same time, the minimum number of control steps ($C_{step} = 4$) is obtained.
Feasible Scheduling
In this subsection, we combine the previous two formulations into a third scheduling problem. This problem does not ask for an optimum but asks whether a feasible solution exists. The formal definition of a feasible scheduling problem is as follows: given a fixed amount of resources and a specified number of time steps, decide if there is a schedule which satisfies all the constraints. Output the solution if it exists.
The formulation for the problem includes no objective function but does have a set of constraints.
By solving feasible scheduling problems, the solutions for the previous scheduling problems can be constructed. The advantages of using this approach are the following.
1) The formulation is a 0-1 ILP problem, and good heuristics exist to solve this type of problem. Also, the time required to find a solution by feasible scheduling is a lot less than by optimizing scheduling since we only need to search part of the solution space.
2) The number of function units and the number of time steps can be estimated by other fast heuristics.
3) Since we are, in fact, deciding a set of values for all variables in solving an ILP formulation, the time complexity is increased with the number of variables in the formulation. From this point of view, the range for each operation is smaller, due to the constraints on the number of function units and the time. This corresponds to a smaller solution space.
4) This approach allows a user or an expert system to control the speed-time tradeoff. Thus we can generate a set of optimal solutions and leave the selection of the best time/area implementation to the user.
Based on the above arguments, feasible scheduling seems to provide a general paradigm for solving a scheduling problem. Therefore, in the following section, we will not specify the kind of scheduling problem unless it is necessary.
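The way an optimizing problem can be built from feasibility questions may be sketched as follows. This is an illustration only: the feasibility test below is a plain exhaustive backtracking search standing in for the 0-1 ILP solver, it assumes one-cycle operations, and the names `feasible_schedule` and `fastest_schedule` and the example graph are hypothetical.

```python
def feasible_schedule(ops, edges, op_type, units, num_steps):
    """Return {op: step} meeting both limits, or None if none exists."""
    preds = {o: [] for o in ops}
    for u, v in edges:
        preds[v].append(u)
    # Topological order, so predecessors are always placed first.
    order, placed = [], set()
    while len(order) < len(ops):
        ready = [o for o in ops
                 if o not in placed and all(p in placed for p in preds[o])]
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        order.extend(ready)
        placed.update(ready)

    step, usage = {}, {}  # usage[(step, type)] = units busy in that step

    def place(k):
        if k == len(order):
            return True
        o = order[k]
        earliest = max((step[p] + 1 for p in preds[o]), default=1)
        for j in range(earliest, num_steps + 1):
            key = (j, op_type[o])
            if usage.get(key, 0) < units.get(op_type[o], 0):
                usage[key] = usage.get(key, 0) + 1
                step[o] = j
                if place(k + 1):
                    return True
                usage[key] -= 1  # undo and try a later step
                del step[o]
        return False

    return dict(step) if place(0) else None


def fastest_schedule(ops, edges, op_type, units):
    """Resource-constrained optimum by increasing the step limit."""
    for s in range(1, len(ops) + 1):
        sched = feasible_schedule(ops, edges, op_type, units, s)
        if sched is not None:
            return s, sched
    return None


# Hypothetical example: three multiplications and one addition.
ops = ["m1", "m2", "m3", "a1"]
edges = [("m1", "a1"), ("m2", "a1")]
op_type = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add"}
units = {"mul": 1, "add": 1}
best = fastest_schedule(ops, edges, op_type, units)
```

The same pattern applies to the time-constrained problem by fixing the number of steps and sweeping over candidate function unit counts instead.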
Complexity of the Scheduling Problem
The complexity of feasible scheduling is analyzed in terms of the number of variables and equations. In feasible scheduling, the number of resources and the number of control steps have been fixed. Thus the only unknowns are the 0-1 variables $x_{i,j}$.
The exact number of $x_{i,j}$ is $\sum_{i=1}^{n} (L_i - S_i + 1)$, which is bounded by $s \cdot n$. Note that, due to constraint (3), only $n$ variables will have a value of 1. Thus once an $x_{i,j}$ is decided to be 1, the remaining $L_i - S_i$ variables are implicitly set at 0. Therefore, the problem is easier to solve than it might appear.
The number of equations required for constraints (2.1), (3), and (4) is $s \cdot m$, $n$, and $e$, respectively, where $m$ is the number of function unit types and $e$ is the number of edges in the DFG.
In all, the number of variables in our formulation grows as $O(s \cdot n)$, and the number of equations grows as $O(s \cdot m + n + e)$.
For the sake of notational convenience, we define a new integer variable, the time variable $T_i$, which is the control step into which operation $o_i$ is scheduled. It is easy to verify that $T_i$ is equal to $\sum_{j=S_i}^{L_i} (j \cdot x_{i,j})$. By using this notation, constraint (4) is simplified to $T_i - T_j \le -1$.
Chaining and Multicycle Operations
Fig. 2 shows a DFG where a multiplication (*) requires 100 ns and an addition (+) requires 40 ns. Let the cycle time be 120 ns. For a regular design, the graph must be scheduled into two control steps as shown in Fig. 2(a). By chaining two additions in one cycle, the graph can be scheduled in one 120-ns cycle (Fig. 2(b)).
If we reduce the clock cycle time to 60 ns, then a multiplication has to cross two cycles. In this case, the multiplication is a multicycle operation. In Fig. 2(c), we need only one adder, one multiplier, and two 60-ns cycles to implement the graph. The multiplication can be executed by either a pipelined function unit or a nonpipelined function unit. The first and third examples need two control words, and the second needs only one. The time, resources, and control words required by the three examples are summarized in Fig. 2(d).
4.1.1) Chaining:
In some cases, we can chain several operations in one cycle if their total running time is less than the cycle time. We call it scheduling with chaining. To formulate a chaining problem, we define a new precedence relation, $\Rightarrow$,
and additional constraints are included in the formulation. Constraint (4.1) states that if $o_i$ immediately precedes $o_j$ in the DFG, then $o_i$ should be scheduled before or at the same step as $o_j$. Constraint (6) states that if $o_i \Rightarrow o_j$, then $o_i$ should be scheduled before $o_j$.
a) Complexity analysis: Suppose that, at most, $c$ operations are chained into a control step and the number of fan-ins for an operation is $k$. Given an operation $o_j$, there are, at most, $k^c$ operations which satisfy the relation $o_i \Rightarrow o_j$. Thus the total number of equations introduced by (6) is bounded by $k^c \cdot n$.
In practical examples, $k$ is smaller than 2 and $c$ is a small number, so the constraint has $O(n)$ complexity.
4.1.2) Multicycle Operations with Nonpipelined Implementation:
A multicycle operation can be performed either by a nonpipelined or a pipelined function unit. The difference between them is the existence of latches between the cycles. For nonpipelined implementation, once the operation is assigned, the function unit cannot be shared by other operations until the operation is completed.
... and constraints (4) and (5) ...
Before the derivation of the formulation, we begin with an example in which 12 operations are scheduled into 8 steps (Fig. 4(a)). Suppose that the 12 operations are to be performed by the same type of function unit with $d = 8$ and $l = 3$. Since a pipelined function unit can be shared by the operations of any two steps $s_i$, $s_j$, where $|s_i - s_j|$ is an integer multiple of $l$, we can group the 8 control steps into 3 clusters, $c_1 = \{s_8, s_5, s_2\}$, $c_2 = \{s_7, s_4, s_1\}$, and $c_3 = \{s_6, s_3\}$. Here, a function unit can be shared by the operations in different steps within a cluster but cannot be shared by those in different clusters. Therefore, the number of function units needed is the total of the function units required by the three clusters; the number of function units required by a cluster is the maximum number of function units of the steps within this cluster. For the example in Fig. 4(a), the number of function units used equals $\max(3, 0, 1) + \max(0, 3, 1) + \max(3, 1) = 9$. A better schedule (Fig. 4(b)) needs
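The cluster rule can be checked with a short Python sketch. The function name `pipelined_units_needed` is ours, and the per-step operation counts below are an assumption: they are read off the three `max` terms quoted for Fig. 4(a), taking the arguments in the order in which the steps of each cluster are listed.

```python
def pipelined_units_needed(ops_per_step, latency):
    """Sum, over the clusters of steps congruent modulo the latency,
    of the largest number of operations found in any step of the cluster."""
    cluster_max = {}
    for step, count in ops_per_step.items():
        residue = step % latency  # steps l apart fall into the same cluster
        cluster_max[residue] = max(cluster_max.get(residue, 0), count)
    return sum(cluster_max.values())


# Assumed operation counts per control step for the Fig. 4(a) example:
# cluster {s8, s5, s2} -> (3, 0, 1), {s7, s4, s1} -> (0, 3, 1), {s6, s3} -> (3, 1)
fig4a = {8: 3, 5: 0, 2: 1, 7: 0, 4: 3, 1: 1, 6: 3, 3: 1}
units = pipelined_units_needed(fig4a, latency=3)
```

With these counts the 12 operations need 9 function units, which matches the sum of maxima given in the text.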
Functional Pipelining (Pipelined Data Path)
A pipelined data path allows the execution of multiple tasks concurrently. Two consecutive tasks can be initiated with a certain interval, which is called the latency of the pipelined data path. In [22], a theorem for pipelined data paths is given, stated as follows.
Theorem 1: Given a DFG, the necessary and sufficient number of function units of each type ($M_{t_k}$) to realize a pipelined data path with a fixed latency $l$ is $\lceil N_{t_k} / l \rceil$, where $N_{t_k}$ is the maximum number of operations which must be performed by $FU_{t_k}$.
We can state the theorem in other words, omitting the proof.
Theorem 2: Given a DFG and the number of function units available for each type ($M_{t_k}$), we can realize an optimal pipelined ...
The above theorems are very valuable for exploring the solution space. Suppose we are going to generate a table of optimal implementations. We can start with a minimum latency $l = 1$ and work towards a larger one. For each latency $l$, we calculate the minimum number of resources for each type ($M_{t_k}$) according to Theorem 1. If the number of resources for latency $l$ is equal to that for latency $l - 1$, the solution for latency $l - 1$ is also the optimal solution for latency $l$. If the delay time is important in the design, the latency $l$ and the number of resources $M_{t_k}$ can be used to generate an ILP formulation which aims at minimizing the delay time. Thus we can obtain a set of implementations that are all optimal in terms of latency, number of resources, and delay time.
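The exploration loop described above can be sketched as follows. The helper names are hypothetical, and the operation counts (26 additions, 8 multiplications) are used purely as an illustration in which every operation is treated as a one-cycle operation, so that the bound of Theorem 1 applies directly.

```python
from math import ceil


def min_units(op_counts, latency):
    """Theorem 1: ceil(N / l) function units of each type for latency l."""
    return {t: ceil(n / latency) for t, n in op_counts.items()}


def resource_table(op_counts, max_latency):
    """List (latency, units) pairs, skipping latencies whose resource
    requirement equals that of the previous latency."""
    table, previous = [], None
    for latency in range(1, max_latency + 1):
        units = min_units(op_counts, latency)
        if units != previous:  # otherwise the l - 1 solution already covers l
            table.append((latency, units))
        previous = units
    return table


counts = {"add": 26, "mul": 8}  # illustrative one-cycle operation counts
table = resource_table(counts, max_latency=14)
```

Each surviving row of `table` is a candidate (latency, resources) pair for which an ILP minimizing the delay time could then be generated.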
Loop Folding
The concept of loop folding is very similar to that of functional pipelining. The difference is that in loop folding, there are data dependencies between loop iterations, while in functional pipelining, there is no data dependency between different instances. Thus the latency of a pipelined data path can be arbitrarily small, provided that the resources are unlimited. In the case of loop folding, the latency (or loop length [24]) depends on the number of resources given and the structure of the DFG.
In the example of Fig. 5(a), suppose there exists a 1-deg [24] data dependency between $o_i$ and $o_j$. In other words, the value generated by $o_i$ will be used by $o_j$ in the next iteration. If there is a path of length $L$ from $o_j$ to $o_i$ in the DFG (Fig. 5(b)), the lower bound of the loop length would be $L + 1$, i.e., it is impossible to fold the loop into a schedule with a loop length less than $L + 1$.
Let $o_i \to o_j$ denote a $d$-deg [24] data dependency between $o_i$ ($o_i \in FU_{t_k}$) and $o_j$, and let $T_j'$ be the time at which $o_j$ is executed $d$ iterations later. Suppose the loop length after folding is known to be $l$; we have $T_j' = T_j + d \cdot l$. Therefore, a new constraint
is introduced into the previous formulations to enforce the data dependency between different loop iterations. The remaining constraints are the same as those for functional pipelining.
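Under the one-cycle assumption, a constraint of the following form would enforce such an inter-iteration dependency. This is our reading of the surrounding definitions, not the typeset original: it simply applies the precedence constraint $T_i - T_j \le -1$ to the later instance $T_j' = T_j + d \cdot l$.

```latex
\begin{align*}
T_i - T_j' \le -1
\quad\Longleftrightarrow\quad
T_i - T_j - d \cdot l \le -1 .
\end{align*}
% Consistency check with the 1-deg example:
% a path of length L from o_j to o_i gives T_i - T_j \ge L, so with d = 1
%   l \ge T_i - T_j + 1 \ge L + 1 ,
% which is the lower bound on the loop length stated for Fig. 5.
```

Because $l$ is fixed in each formulation, the constraint stays linear in the time variables.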
A loop folding problem is closely related to a retiming problem in synchronous circuits [26]. Retiming relocates the positions of the separating registers in the CDFG to obtain a shorter critical path, and hence, higher throughput. Algorithms [26] for retiming have been proposed to optimize synchronous circuits. In [25], after retiming on the CDFG, scheduling is performed on the modified DFG. The separation of retiming and scheduling will produce a suboptimal design. Our formulation for loop folding performs both retiming and scheduling at the same time.
Complexity analysis: For a single ILP formulation, the number of equations added is equal to the number of $d$-deg data dependencies, which has worst-case complexity $O(e)$. There are no additional variables.
Mutually Exclusive Operations
As in the case of structured programming, the relationships among a set of operations, $O$, can be represented as a tree where the internal nodes are of two types, XOR and AND, and the leaves are the operations. Let a node have $n$ subtrees and the number of function units needed for each subtree be $N_{FU}^{1}, N_{FU}^{2}, \ldots, N_{FU}^{n}$, respectively. $N_{FU}$ can be defined as follows:
... if the node is a leaf.
Let $N_{FU_{t_k}, j}(O)$ be the number of function units of type $t_k$ required at control step $j$, which is defined in the above function. Constraint (2) is changed to constraint (2.5).
As an example, the DFG in Fig. 6(a) can be represented as a tree in Fig. 6(b). Suppose that two function units are given and the upper limit of the number of control steps is 2. The distribution graph is shown in Fig. 6(c). Fig. 6(d) shows part of the constraints. Here, $y_1$ and $y_2$ are introduced to satisfy $y_1 \ge \max(c_1 + d_1, e_1 + f_1)$ and $y_2 \ge \max(c_2 + d_2, e_2 + f_2)$. By solving the ILP formulation, an optimal schedule is obtained as shown in Fig. 6(e).
a) Complexity analysis: Suppose that there are $N_{node}$ nodes in the execution tree and each node has $N_b$ branches. In considering any control step of the DFG, we need to introduce $N_{node}$
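The recursive count over the XOR/AND tree can be sketched in Python. The rules used here are an assumption inferred from the constraint $y_1 \ge \max(c_1 + d_1, e_1 + f_1)$: an XOR node takes the maximum over its branches (only one branch executes), an AND node takes the sum, and a leaf contributes its own 0-1 scheduling variable for the step under consideration. The tuple encoding of the tree and the name `nfu` are hypothetical.

```python
def nfu(node, x):
    """Function units needed at one control step by the operations
    under `node`, given the 0-1 values x[op] for that step."""
    kind = node[0]
    if kind == "leaf":
        return x[node[1]]  # 1 if the operation is scheduled in this step
    counts = [nfu(child, x) for child in node[1]]
    if kind == "xor":
        return max(counts)  # mutually exclusive branches share units
    if kind == "and":
        return sum(counts)  # all branches execute, units add up
    raise ValueError("unknown node type: " + kind)


# Hypothetical tree matching the shape of y1 >= max(c1 + d1, e1 + f1).
tree = ("xor", [
    ("and", [("leaf", "c"), ("leaf", "d")]),
    ("and", [("leaf", "e"), ("leaf", "f")]),
])
step1 = {"c": 1, "d": 1, "e": 1, "f": 0}  # assumed assignment for step 1
needed = nfu(tree, step1)
```

In an ILP the `max` is not evaluated directly; it is linearized by an auxiliary variable such as $y_1$ that is constrained to be at least as large as each branch sum.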
Scheduling Under Bus Constraint
In the previous sections, we have focused on the minimization of the cost of function units. However, as the complexity of VLSI grows, the area for routing becomes important. Interconnection cost becomes a dominant factor of the cost function. In this subsection, we extend the formulation to minimize the cost of connections. Two models are considered for minimizing the number of buses. In the first model, the number of buses is calculated as twice the maximum number of operations among all the control steps. The second model makes a more sophisticated calculation by considering the broadcasting capability of a bus.
4.5.1) A Simple Model for Bus Minimization:
Consider the example in Fig. 7. Although the number of function units required in Fig. 7(a) and (b) is the same, the latter needs 4 buses while the former needs 6 buses to connect the data path. Therefore, we prefer the schedule in Fig. 7(b). Since each function unit has two input buses, the number of buses can be estimated by multiplying the maximum number of operations among all the control steps by 2.
a) Complexity analysis: Constraint (9) generates $s$ equations, and no additional variables are introduced.
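The simple estimate is easy to state in code. The two schedules below are hypothetical four-operation examples chosen to reproduce the bus counts quoted for Fig. 7 (6 versus 4); they are not the actual schedules of the figure, and the name `simple_bus_estimate` is ours.

```python
from collections import Counter


def simple_bus_estimate(schedule):
    """Twice the largest number of operations sharing a control step,
    assuming every operation reads two inputs over two buses."""
    per_step = Counter(schedule.values())
    return 2 * max(per_step.values(), default=0)


# Hypothetical schedules {operation: control step}.
unbalanced = {"o1": 1, "o2": 1, "o3": 1, "o4": 2}  # three operations in step 1
balanced = {"o1": 1, "o2": 1, "o3": 2, "o4": 2}    # two operations per step
buses_unbalanced = simple_bus_estimate(unbalanced)
buses_balanced = simple_bus_estimate(balanced)
```

The estimate depends only on the busiest step, which is why one bound per control step is enough to express it in the formulation.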
4.5.2) Bus with Broadcasting:
In a bused architecture, when more than one operation which share a common input variable are scheduled into the same control step, the number of buses needed for that variable is only one (via broadcasting). Thus the number of buses required at a control step equals the number of distinct input variables of all the operations assigned to this step. Suppose the input variables at control step $j$ are $v_1, v_2, \ldots, v_{|v|}$. We introduce a 0-1 integer variable $y_{r,j}$ for $v_r$ ($1 \le r \le |v|$) at step $j$, where $y_{r,j}$ is 1 if $v_r$ is accessed at this step; otherwise, 0. We have the constraint that the number of $y_{r,j}$ which are assigned to be 1 is no more than the number of buses, i.e.,
Since the transfer of variables during a control step is directly related to the assignment of operations to a control step
Minimizing Lifetimes of Variables
The lifetime of a variable is defined as the duration from the control step where it is defined to the step where it is last used. A variable must be assigned to a register during its lifetime; several variables can share the same register, provided that their lifetimes do not overlap. Thus a reduction in the lifetimes of variables has the potential of reducing the number of registers required. For example, both scheduled DFG's in Fig. 7 need the same number of operators and control steps; however, the schedule in Fig. 7(a) needs one more register than that in Fig. 7(b).
In addition to minimizing the function unit cost and the total number of steps, we can also take the lifetimes of variables into consideration. Let $SLK_i$ be the longest duration that the output of $o_i$ must be kept, i.e., $SLK_i = \max_{o_i \to o_j} (T_j - T_i - d_{o_i})$. In other words, $SLK_i$ is the lifetime interval of the value computed by $o_i$. The cost function is modified so as to minimize ...
a) Complexity analysis: For a given DFG $G(V, E)$, where the nodes are operations and the edges are the data precedence between operations, the above formulations require $n$ integer variables, one for each $SLK_i$, $1 \le i \le n$, and $e$ equations, one for each data dependency.
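For a fixed schedule, the quantities $SLK_i$ can be computed directly from their definition, as in the sketch below. The function name `lifetimes`, the example schedule, and the convention that an operation without successors gets 0 are assumptions; `delay[i]` plays the role of $d_{o_i}$.

```python
def lifetimes(T, edges, delay):
    """SLK_i = max over successors j of (T_j - T_i - d_i) for a schedule T."""
    slk = {o: 0 for o in T}  # assumed 0 for operations without successors
    for i, j in edges:
        slk[i] = max(slk[i], T[j] - T[i] - delay[i])
    return slk


# Hypothetical schedule: a feeds b and c, b feeds c; one-cycle operations.
T = {"a": 1, "b": 2, "c": 4}
edges = [("a", "b"), ("a", "c"), ("b", "c")]
delay = {"a": 1, "b": 1, "c": 1}
slk = lifetimes(T, edges, delay)
total = sum(slk.values())
```

Each edge contributes one inequality of the form $SLK_i \ge T_j - T_i - d_{o_i}$ when the maximum is linearized, which agrees with the count of $e$ equations in the complexity analysis.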
V. EXPERIMENTAL RESULTS
The system called ALPS has been implemented and tested. The programs for list scheduling, ASAP, ALAP, and ILP formulations are written in C on a VAX 11/8550 running ULTRIX, and the ILP formulation is solved using the LINDO [27] package on a VAX 11/8800 running VMS. LINDO starts with an optimal linear programming solution and produces an optimal integer solution using the branch-and-bound method. The fifth-order wave filter which was borrowed from [8] is given to illustrate various requirements. It contains 26 additions and 8 multiplications. As most systems do, we suppose a multiplication takes 2 cycles while an addition takes 1 cycle to complete. The critical path length is 17 cycles. The runtime for the various experiments of the example depends on the number of 0-1 variables and is within tens of seconds.
Nonpipelined Data Path
Tables II and III show the results with a nonpipelined data path. The multiplier can be nonpipelined (Table II) or pipelined (Table III). We also take into account the cost of buses. All the results are optimal.
Functional Pipelining (Pipelined Data Path)
The DFG of the fifth-order filter is used to test functional pipelining. (We assume that there are no data dependencies between iterations; i.e., the outputs of the DFG will not feed back into the inputs.) We have achieved the minimal number of resources for each latency and we have also minimized the delay time. The results are shown in the first and second parts of Table IV for nonpipelined and pipelined multipliers, respectively. The third part of Table IV ... important performance criteria. Second, a longer delay increases the lifetimes of the variables. Thus minimizing the delay time will potentially reduce the register cost. Note that Spaid first retimed the DFG and then performed scheduling to find the loop length (called clock cycle in Spaid). Our loop folding technique performs retiming and scheduling simultaneously, which makes a better solution possible. Our scheduler is also able to produce a schedule under the self-timed [25] requirement. Fig. 8 shows the scheduled DFG of 1 multiplier and 2 adders under the self-timed requirement.
VI. CONCLUSION
We have proposed an approach to the scheduling problems in high level synthesis. Our approach includes list scheduling, ASAP, ALAP, and ILP. The ILP formulation is very efficient, and its complexity in the number of variables is $O(s \cdot n)$, where $s$ and $n$ are the number of control steps and operations, respectively. With the feasible scheduling formulation, we can explore the solution space more efficiently. For the problem of the fifth-order filter, an optimal solution is obtained in a few seconds. We have also generalized the formulations to include practical requirements such as chaining, multicycle operations, structural and functional pipelining, loop folding, mutually exclusive operations, and the minimization of the cost of buses and registers.
