Abstract. We present an approach to aid in the debugging and development of scheduling algorithm implementations. Our technique applies a sequence of correctness-preserving RTL transformations, each an instance of Register Transfer Split (RTS), that collectively perform the same task as a scheduler. A violation of the transformation's precondition signals an error, and the sequence of RTS transformations applied so far forms a trace that can be used for debugging.
Introduction
Researchers have addressed the problem of creating bug-free synthesis systems by separately verifying each synthesis stage. In this paper, we deal specifically with the verification of the scheduling stage.
Lock et al. [1] captured the scheduling result in the form of a table and checked its correctness inside the HOL theorem prover. Ashar et al. [2] used model checking and symbolic simulation to check the signal correspondences between RTL and behavior. Importantly, all arithmetic operations are uninterpreted and loops are handled by identifying loop invariants. In Eveking et al. [3], failure to transform the behavioral description towards the scheduled result signals an error. Naren et al. [4] proved the correctness of the force-directed list scheduler (FDLS) algorithm in PVS and embedded the correctness conditions developed during the exercise as program assertions in the implementation.
We do not handle constraint violations in the scheduler and hence consider only legal schedules, not optimal ones [5]. Also, our approach applies only to schedules of straight-line code sequences (basic blocks).
In Sect. 2 we introduce models for a register transfer and the RTS transformation. Section 3 discusses our methodology. The conclusion is presented in Sect. 4.
Models
We use the models of a register transfer and the RTS transformation from [6] , where a completeness proof for a set of RTL transformations (including RTS) is presented. Completeness of a set of behavior-preserving RTL transformations means: for any two behaviorally-equivalent RTL designs, by applying a finite sequence of these correctness-preserving transformations we can move from one design to the other.
Register Transfer
A register transfer maps a set of source registers to a set of destination registers. It denotes the activity performed at a certain part in the data path at a unique control step. We use the term expressions to collectively refer to operators, registers and their interconnections. Let E refer to the set of expressions, OP to the set of operators in E and REG to the set of registers in E.
The data path consists of a set of operators, a set of registers and the interconnect between them. Using the notation above, we can now define the data path (DP) as the following tuple:
$$DP = (E,\ OP,\ REG) \qquad (1)$$
The activity inside a data path during a register transfer RT is represented by a subset of expressions from E. The interconnect between these expressions is determined by the computations scheduled to be performed in the data path at the control step defined by RT.
Definition 1 A register transfer RT associated with a data path is a tuple of the form:
where $E_{RT} \subseteq E$, and $REG^{out}_{RT} \subseteq REG$ is the set of output registers in RT. The functions $f_{op}$ and $f_{reg}$ define the interconnect between expressions of the data path at the control step corresponding to RT: $f_{op}$ maps an operator to a pair of expressions (the two source expressions of the operator), and $f_{reg}$ maps an output register to an expression (its input).
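As an illustration of Definition 1, the data path and a register transfer can be written down as plain records. This is only a sketch under our own assumptions: expressions are identified by strings, the field order of `RegisterTransfer` is one possible reading of the tuple, and the names `DataPath`, `RegisterTransfer`, `is_within`, `add1`, `mul1` and the register names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class DataPath:
    """DP = (E, OP, REG); E is taken here as the union of OP and REG."""
    operators: FrozenSet[str]   # OP
    registers: FrozenSet[str]   # REG

    @property
    def expressions(self) -> FrozenSet[str]:
        return self.operators | self.registers


@dataclass
class RegisterTransfer:
    """Activity in the data path at one control step (assumed field layout)."""
    expressions: FrozenSet[str]         # E_RT, a subset of E
    out_registers: FrozenSet[str]       # REG_out_RT, a subset of REG
    f_op: Dict[str, Tuple[str, str]]    # operator -> (first source, second source)
    f_reg: Dict[str, str]               # output register -> expression driving it

    def is_within(self, dp: DataPath) -> bool:
        """Check the two subset conditions stated in Definition 1."""
        return (self.expressions <= dp.expressions
                and self.out_registers <= dp.registers)


# Hypothetical example: out := (a + b) * c in a single control step.
dp = DataPath(operators=frozenset({"add1", "mul1"}),
              registers=frozenset({"a", "b", "c", "out"}))
rt = RegisterTransfer(
    expressions=frozenset({"add1", "mul1", "a", "b", "c", "out"}),
    out_registers=frozenset({"out"}),
    f_op={"add1": ("a", "b"), "mul1": ("add1", "c")},
    f_reg={"out": "mul1"},
)
```

Keeping `f_op` and `f_reg` as dictionaries makes the interconnect explicit: the first and second source of an operator are simply the two components of its entry.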
Well-Formed Register Transfer
In our model, we do not allow register transfers to contain combinational cycles, floating inputs for operators and registers, or concurrent operations performed on the same hardware resource.
To define these requirements formally, we first introduce the ancestors set Anc of an expression e (operator or register): the set of all expressions that are connected to e via a directed path.
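The ancestors set just described can be computed by walking the source connections backwards from e. The sketch below assumes that expressions are strings, that `f_op` is a dictionary from an operator to its two sources, and that anything without an `f_op` entry is a register with no ancestors. The helper `has_combinational_cycle`, which flags an operator that is its own ancestor, is our own illustrative reading of the no-cycles requirement and not a definition taken from [6].

```python
from typing import Dict, Set, Tuple

FOp = Dict[str, Tuple[str, str]]  # operator -> (first source, second source)


def ancestors(e: str, f_op: FOp) -> Set[str]:
    """Return every expression from which e can be reached through f_op."""
    seen: Set[str] = set()
    stack = list(f_op.get(e, ()))      # registers contribute no sources
    while stack:
        src = stack.pop()
        if src not in seen:            # the visited set guarantees termination
            seen.add(src)
            stack.extend(f_op.get(src, ()))
    return seen


def has_combinational_cycle(f_op: FOp) -> bool:
    """Assumed check: some operator appears among its own ancestors."""
    return any(op in ancestors(op, f_op) for op in f_op)


# Hypothetical register transfer: mul1 = (a + b) * c.
f_op_example = {"add1": ("a", "b"), "mul1": ("add1", "c")}
anc_mul = ancestors("mul1", f_op_example)
anc_reg = ancestors("a", f_op_example)

# Hypothetical malformed interconnect in which x and y feed each other.
f_op_cyclic = {"x": ("y", "a"), "y": ("x", "b")}
```

The iterative walk is used instead of direct recursion so that the function also terminates on malformed inputs that contain a cycle.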
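Definition 2 The ancestors set Anc of an expression e of a data path DP = (E, OP, REG), with respect to a mapping function $f_{op} : OP \to (E \times E)$, is defined recursively as:
$$Anc(e) = \begin{cases} \emptyset & e : \text{register} \\ \{f_{op}(e)'1,\ f_{op}(e)'2\} \cup Anc(f_{op}(e)'1) \cup Anc(f_{op}(e)'2) & e : \text{operator} \end{cases} \qquad (3)$$
where $f_{op}(e)'1$ and $f_{op}(e)'2$ represent the first and second projections of $f_{op}(e)$, respectively.
RTS Transformation
The RTS transformation splits a register transfer RT into two register transfers, RT 1 and RT 2, according to a set of operators called the split set.
Step 1. Copy the sub-image in RT induced by the operators in the split set and place it in a new register transfer RT 2. The remaining sub-image (induced by any remaining operators) in RT goes into another new register transfer RT 1.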
Step 2. The output of RT 2 is connected to the inputs of the temporary registers. The outputs of the temporary registers are connected to the inputs of components of RT 1. The temp set represents the set of temporary registers.
No change is made to RT if split set has only one operator. An application of RTS is deemed correct if its precondition, called the Well-Formedness Precondition (WFP), is satisfied.
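The two steps can be sketched on a dictionary-based encoding of a register transfer. Everything here is an assumption made for illustration: the function name `rts_split`, the `tmp_` naming of temporary registers, the choice to keep all output registers in RT 1, and the choice to return RT unchanged when the split set is empty or already covers every operator. The sketch does not check the precondition.

```python
from typing import Dict, Set, Tuple

FOp = Dict[str, Tuple[str, str]]   # operator -> (first source, second source)
FReg = Dict[str, str]              # output register -> driving expression


def rts_split(f_op: FOp, f_reg: FReg, split_set: Set[str]):
    """Split one register transfer into (RT2, RT1, temp_set)."""
    remaining = set(f_op) - split_set
    if not split_set or not remaining:
        # Assumption: nothing to separate, so RT is returned unchanged as RT1.
        return ({}, {}), (dict(f_op), dict(f_reg)), set()

    # Step 1: operators in the split set go to RT2, the rest stay in RT1.
    f_op2 = {op: f_op[op] for op in sorted(split_set)}

    # Step 2: a split operator gets a temporary register if RT1 uses its value.
    consumed = {s for op in remaining for s in f_op[op]} | set(f_reg.values())
    temp_of = {op: f"tmp_{op}" for op in sorted(split_set & consumed)}

    def reroute(e: str) -> str:
        """Replace a reference to a split operator by its temporary register."""
        return temp_of.get(e, e)

    f_op1 = {op: (reroute(f_op[op][0]), reroute(f_op[op][1]))
             for op in sorted(remaining)}
    f_reg1 = {r: reroute(e) for r, e in f_reg.items()}
    f_reg2 = {t: op for op, t in temp_of.items()}   # RT2 writes the temporaries
    return (f_op2, f_reg2), (f_op1, f_reg1), set(temp_of.values())


# Hypothetical example: out := (a + b) * c, with add1 split off first.
rt2, rt1, temps = rts_split(
    {"add1": ("a", "b"), "mul1": ("add1", "c")},
    {"out": "mul1"},
    {"add1"},
)
```

In this example RT 2 computes the addition into `tmp_add1`, and RT 1 performs the multiplication reading that temporary register in place of the adder output.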
Well-Formedness Precondition (WFP):
Equation 4 states that, for every operator e in the split set, the ancestor operator(s) in Anc(e) must also be in the split set. This statement corresponds to the first condition for a well-formed register transfer. Equation 5 states that the computational behavior (comp_behavior) of the design is preserved if there is no WFP violation.
Thus, an application of RTS is sound if and only if the precondition is satisfied; further, RTS is complete with respect to the scheduling task [6]. Hence the task performed by most scheduling algorithms (excluding those that move code across basic blocks) can be viewed as a sequence of RTS transformations.
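The condition described for Equation 4 can be checked mechanically. The following sketch is our own reading of that prose, not the paper's formula: it assumes string-named expressions, an `f_op` dictionary from operators to their two sources, and hypothetical names `wfp_violations`, `add1` and `mul1`. Only ancestor operators are considered, since register ancestors are inputs of the register transfer.

```python
from typing import Dict, Set, Tuple

FOp = Dict[str, Tuple[str, str]]  # operator -> (first source, second source)


def ancestors(e: str, f_op: FOp) -> Set[str]:
    """Return every expression from which e can be reached through f_op."""
    seen: Set[str] = set()
    stack = list(f_op.get(e, ()))
    while stack:
        src = stack.pop()
        if src not in seen:
            seen.add(src)
            stack.extend(f_op.get(src, ()))
    return seen


def wfp_violations(f_op: FOp, split_set: Set[str]) -> Dict[str, Set[str]]:
    """Map each offending operator to its ancestor operators outside split_set.

    An empty result means the precondition, as read here, is satisfied.
    """
    violations: Dict[str, Set[str]] = {}
    for op in sorted(split_set):
        missing = {a for a in ancestors(op, f_op) if a in f_op} - split_set
        if missing:
            violations[op] = missing
    return violations


# Hypothetical register transfer: mul1 = (a + b) * c.
f_op = {"add1": ("a", "b"), "mul1": ("add1", "c")}
ok = wfp_violations(f_op, {"add1"})    # add1 has no ancestor operators
bad = wfp_violations(f_op, {"mul1"})   # mul1 needs add1, which is left behind
```

Returning the offending operators together with their missing ancestors, rather than a bare boolean, keeps enough information to report which dependency was broken.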
Methodology
Our methodology is based on the precondition-based correctness of RTS and on the completeness of RTS transformations with respect to the scheduling task. Figure 1 outlines our system, which comprises the Control Step Witness Generator (CSWG) and the RTS transformation Engine (shown in the dotted box in the figure).
The CSWG takes an initial RTL representation of the behavioral design and the schedule table. The schedule table is generated by the scheduling algorithm under test and contains a mapping of operations to control steps. For each control step in the schedule table, the CSWG generates an RTS transformation with the operators scheduled at that control step as one of its arguments. All applications of RTS (with their arguments) are written to a log file called the witness trace. If a WFP violation occurs, it too is written to the witness trace, along with the offending operators. In that case, the trace file shows the exact steps taken by the scheduler and can therefore be used to understand why it scheduled operators that violated the WFP.
An initial RTL design of a behavioral specification is obtained by creating a register transfer for every basic block in the behavior. We assign a structural operator to each operation and a register to each carrier in the behavior. The control flow remains the same. This initial RTL is successively modified by the RTS transformations based on the schedule table. Figure 2 shows an initial RTL being split based on the schedule table using RTS transformations.
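The witness-trace generation described above can be sketched as a loop over the control steps of a schedule table. This is a simplified illustration under stated assumptions: the schedule table is a dictionary from control step to a set of operator names, the trace is a list of strings with a line format of our own, checking continues after a violation, and operators split off in earlier steps are tracked in a `done` set instead of rewriting the RTL (after a split they sit behind temporary registers and no longer count as ancestor operators). All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

FOp = Dict[str, Tuple[str, str]]  # operator -> (first source, second source)


def ancestors(e: str, f_op: FOp) -> Set[str]:
    """Return every expression from which e can be reached through f_op."""
    seen: Set[str] = set()
    stack = list(f_op.get(e, ()))
    while stack:
        src = stack.pop()
        if src not in seen:
            seen.add(src)
            stack.extend(f_op.get(src, ()))
    return seen


def witness_trace(f_op: FOp, schedule: Dict[int, Set[str]]) -> List[str]:
    """Log one RTS application per control step, plus any WFP violations."""
    trace: List[str] = []
    done: Set[str] = set()             # operators split off in earlier steps
    for step in sorted(schedule):
        split_set = set(schedule[step])
        trace.append(f"RTS step={step} split_set={sorted(split_set)}")
        for op in sorted(split_set):
            anc_ops = {a for a in ancestors(op, f_op) if a in f_op}
            missing = anc_ops - split_set - done
            if missing:
                trace.append(
                    f"WFP VIOLATION step={step} op={op} missing={sorted(missing)}")
        done |= split_set
    return trace


# Hypothetical basic block: sub1 = ((a + b) * c) - d.
f_op = {"add1": ("a", "b"), "mul1": ("add1", "c"), "sub1": ("mul1", "d")}
legal_trace = witness_trace(f_op, {1: {"add1"}, 2: {"mul1"}, 3: {"sub1"}})
buggy_trace = witness_trace(f_op, {1: {"mul1"}, 2: {"add1"}, 3: {"sub1"}})
```

For the legal schedule the trace holds only the three RTS applications. For the second schedule, which places `mul1` before the `add1` it depends on, the trace additionally records a violation at step 1 naming both operators.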
Results
Our system can be used for debugging an existing (or new) implementation of a scheduling algorithm. If the operators are referenced via a special naming scheme in the schedule table, the scheme can be provided as an additional input to our system (not shown in Fig. 1).
We used an existing implementation of the Force-Directed List Scheduler (FDLS) [5] from DSS [7]. A bug was seeded in the dependency graph creation routine that FDLS uses to obtain successor and predecessor node information. From Fig. 2, operation 10, which is the successor of operation 12, was omitted. Also, operation 15 was scheduled before operation 14, which in turn required operation 13. There were no WFP violations in the other designs, as they were legal schedules; that is, no dependencies were violated.
Though we considered only FDLS, any other scheduling algorithm whose output can be captured in the form of a schedule table could have been used. Our approach cannot be directly applied to algorithms such as percolation scheduling [10], which perform scheduling across basic block boundaries. However, a schedule table can be extracted after the code motions are performed.
Conclusion
We presented an approach to verifying legal basic block schedules produced by a scheduler in a high-level synthesis system. Based on the schedule table, a sequence of RTS transformations is used to perform the same task as the scheduler. The Well-Formedness Precondition of RTS checks the correctness of the input to RTS. If the precondition is violated, the sequence of transformations applied so far forms a trace that can be used for debugging.
