We address the problem of considering debugging requirements during high level synthesis by providing low-cost hardware support and scheduling and assignment methods that ensure controllability and observability of user-specified variables. Two conceptually new design ideas that enable efficient debugging are developed: pipelining of debugging variables to improve their scheduling and assignment freedom, and use of I/O buffers to improve resource utilization of I/O pins.
Introduction
It is well known that functional debugging usually dominates the cost of design development. Debugging is particularly difficult when real-time full-custom ASIC designs are targeted, due to strict timing constraints and a lack of flexibility during execution.
The research presented in this paper has four main objectives:
1. to formalize an intuitive notion of ASIC debugging so that it can be treated as a design and CAD activity;
2. to identify key design and high level synthesis principles which support debugging;
3. to develop efficient high level synthesis algorithms for optimization problems related to ASIC debugging; and
4. to give an impetus for the creation of design-for-debugging and synthesis-for-debugging methodologies.
Debugging is the process of detecting, diagnosing, and correcting errors in the specification of an ASIC implementation. An error is any discrepancy between the desired and the realized behavior of the specification of the design. The debugging process can be divided into three phases [Ren89]. The first phase is error detection, in which the designer discovers that a program (design) does not function correctly for a particular input. The second phase is error diagnosis, in which the programmer/designer identifies the statement or the section of the code which is causing the incorrect behavior. The third phase is error correction, in which the faulty section or the statement responsible for the observed fault is replaced by the corrected section.
(Miodrag Potkonjak is now with UCLA, CS Department.)
In the research presented in this paper, we concentrate on the error detection phase. Even when only this phase is considered, numerous strategic approaches are possible. However, it is widely accepted that providing simultaneous controllability and observability of as many variables as possible of the program under execution immensely facilitates the debugging process.
Therefore, we informally define the design-for-debugging problem in the following way. Given is an ASIC design. The design is fully specified: the control-data flow graph (CDFG) of the computation, the timing constraints in terms of the available number of control steps, and the schedule and assignment of each operation, variable, constant, and data transfer are given. Furthermore, a set of desired controllable debug variables (write variables) and observable debug variables (read variables) is specified by the user. The goal of a design-for-debugging (DfD) technique is to modify the design such that the desired debug variables are made controllable/observable while satisfying the given timing constraints and adding minimal additional hardware.
The key constraint of DfD is that the functionality of the design should not be altered in any way, except when the user requests that debugging variables be overwritten with user-provided values. The key idea is to use the available I/O pins for reading and writing debug variables in control steps when they are not used by the design variables.
We conclude this section by pointing out key differences between testing and debugging. The main difference is that while testing targets controllability and observability in the test mode, debugging targets enhanced controllability and observability during the functionally correct mode of operation. Furthermore, while testing aims to make all hardware elements of the design (e.g. all execution units, all registers, and the complete control logic) controllable and observable, debugging concentrates only on a selected set of registers at the selected control steps in which user-specified debugging variables are stored. Finally, debugging an ASIC design usually requires that all controllable variables are set simultaneously and that all observable variables are obtained simultaneously.
The rest of the paper is organized as follows. In the next section we review related work. Sections 3 and 4 introduce all preliminaries and the design-for-debugging process. We present the algorithm for minimization of debugging hardware using life-time splitting of debugging variables in Section 5. After presenting experimental results in Section 6, we summarize the DfD method in Section 7.
Related Work
Debugging is as old as the building of digital electronic computing systems. Debugging has been recognized as a crucial design and compilation activity. However, initially it was relatively rarely addressed due to its high conceptual complexity [Hen82, Zel83]. Recently, the situation changed and the importance of debugging has been documented by a great deal of research in several research and development communities, including compilers and computer architecture, 
Preliminaries and Problem Formulation
In this section we present all the essential assumptions for introducing and developing our approach for design-for-debugging. We conclude the section by explicitly stating the considered design-for-debugging problem. We assume the synchronous dataflow model of computation [Lee87], which is widely used in many computationally intensive applications. The selected computational model has two important implications for the design-for-debugging approach. First, it states that computation is conducted on an infinite stream of data, implying a need for periodic controllability and observability of variables in each iteration. Second, it implies static compile-time scheduling and assignment and full predictability of the earliest and the latest time when a particular debugging variable can be observed or controlled.
We do not put any restriction on the interconnect scheme of the assumed hardware model at the register-transfer level. We considered two types of I/O mechanisms. In the first type each pin can be used to both input and output data. In the second case, a pin can be used exclusively as either an input or an output unit. While the first type of I/O pins provides higher flexibility, its hardware realization is more expensive.
The four key debugging assumptions are the following.
1. The design is fully specified (scheduled and assigned), and its functionality and realization should not be disturbed by the debugging process, except for bringing the user-specified values to the controllable variables.
2. All controllable/observable variables are known at compile/synthesis time. Usually, debugging variables are states in the functionality of the computation which denote boundaries between successive program or internal loop iterations.
3. For proper support of debugging, all controllable and observable variables should be simultaneously controllable and observable.
4. During design-for-debugging, we allocate additional debugging hardware to satisfy all (or as many as possible) debugging requirements. The goal is, of course, to add as little hardware as possible. In particular, we do not allow an increase in the number of I/O pins, since this is the hardware constraint which usually dominates other hardware constraints in modern designs.
The DfD problem can be summarized as follows. Given a design and a list of debugging variables, add as few additional hardware resources as possible and schedule and assign the desired debugging variables and associated data transfers so as to satisfy all the debugging requirements.
Associated with every input (output) variable of the design is an input (output) operation. Similar to scheduling/assigning other operations, an input (output) operation has to be scheduled in a clock cycle in which an available input (output) pin resource can be used to write in (read out) the variable from (to) the environment. Consequently, the specified design has one input pin and one output pin. In the rest of the paper, an input (output) operation of data to (from) an input (output) variable will be referred to by the name of the variable itself.
The Design-for-Debug Process
Without any design-for-debugging, to debug the design, the designer can only write to the primary input In and read from the primary output variable Out. To make the design easier to debug, suppose the designer wishes to be able to write and read the state variables S1, S2, S3, and S4 during debugging. The ability to write and read the state variables would enable the designer to control and observe the state of the computation after every iteration. Consequently, for the DfD technique to be described in this paper, the debug write and read variables are {S1, S2, S3, S4}, and the debug requirements are {WR(S1), WR(S2), WR(S3), WR(S4), RD(S1), RD(S2), RD(S3), RD(S4)}. Note that in this case the debug write variables are the same as the debug read variables; in general, this may not be the case.
Incorporating the Debug Requirements
To incorporate the desired debug requirements, the original CDFG in Figure 1 is modified, for instance with a conditional assignment of the form: If (Debug) then S1 <- Out(+2); else S1 <- DI1;
In general, a separate input variable DIi and its input operation are needed for each debug write variable DWi, so that each DWi can be written independently, DWi <- DIi, during the debug mode. Similarly, for each debug read variable DRk, an output variable DOk and its output operation are needed to accomplish the debug read DOk <- DRk. Note that the debug requirement of independent write (read) of each debug variable is very different from a typical testing requirement, where each variable whose controllability (observability) needs to be assigned can be written from (read to) the same, even existing, input (output) variable.
After the original CDFG has been modified to incorporate the debug requirements, the next step is to schedule and assign the input/output operations so as to satisfy the specified clock cycle and input/output pin constraints. Scheduling a given debug input/output operation is constrained by the following two factors:
1. The As Soon As Possible (ASAP) and the As Late As Possible (ALAP) control steps in which the write/read debug variable can be written/read, and
2. The availability of an input/output pin in that clock cycle.
Let prod(X) be the set of control steps in which a variable X in the CDFG is possibly produced. (X can be produced in ...) Consider a debug input operation for the debug write X <- DI, where X is the debug variable to be written, and DI is the input variable that will be used to write X. The debug variable X can be written at any time from control step 1 onward, but before the earliest control step in which X is required. Hence, the (ASAP, ALAP) control steps for the input operation associated with input DI are (1, r - 1), where r is the earliest control step in which X is required.
Satisfying the
For instance, the (ASAP, ALAP) control steps for DI1 are (1, 0), as S1 is needed in control step 1. Hence, input variables DI1 and DI2 cannot be scheduled. Since input In has been scheduled in control step 1, the only available control step to schedule both DI3 and DI4 is control step 2. However, this can be done only if an extra input pin is made available. Similarly, it can be seen that to satisfy the debug output requirements, an extra output pin is required for either DO3 or DO4. Consequently, when the modified CDFG in Figure 1(b) is given to the high level synthesis system Hyper, it can schedule the input/output operations DI3, DI4, DO1, DO2, DO3, and DO4 using an extra input pin and an extra output pin, but cannot satisfy DI1 and DI2. However, if no extra I/O pins are available, only 1 input operation and 3 output operations can be satisfied.
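The window rule used in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the required step used for S3 is an invented value (only the fact that S1 is needed in control step 1 comes from the text).

```python
def debug_write_window(required_steps):
    """Return (ASAP, ALAP) for the input operation of a debug write.

    Without pipelining, the write may happen from control step 1 onward
    and must happen before the earliest step in which the variable is
    required.
    """
    asap = 1
    alap = min(required_steps) - 1
    return asap, alap


def is_schedulable(window):
    """A window is usable only if it contains at least one control step."""
    asap, alap = window
    return asap <= alap


# S1 is needed in control step 1 (from the text).
window_s1 = debug_write_window({1})
# Hypothetical: assume S3 is first required in control step 3.
window_s3 = debug_write_window({3})

print("S1:", window_s1, is_schedulable(window_s1))
print("S3:", window_s3, is_schedulable(window_s3))
```

An empty window, such as the one obtained for S1, means that no pin assignment can help; the constraint comes from the CDFG itself, which is what motivates pipelining of the debug variables.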
Pipelining Debug Input/Output Variables
We now show how the debug I/O variables can be functionally pipelined to satisfy the desired debug requirements. Pipelining is a widely used transformation which moves a selected set of variables from one iteration of the computation to another. Positioning of a variable in the previous iteration is usually denoted by adding @1 to the name of the variable. Pipelining the debug I/O variables gives more freedom to schedule the debug I/O operations. Consider the CDFG with the debug I/O variables pipelined, shown in Figure 2(a). For instance, variable S1 is last written in the previous iteration in control step 3, and is required in the current iteration in control step 0. Consequently, if the write operation (S1 <- DI1) is performed after step 3 in the previous iteration, the value will not be overwritten in the previous iteration. Similarly, the write operation needs to be completed before control step 1 in the current iteration for the value to be used. Hence, the (ASAP, ALAP) times for scheduling the input operation for DI1 are (4@1, 0). When the left-edge algorithm is applied to the interval graph shown in Figure 2(b), all the I/O operations can be scheduled and assigned as shown in Figure 2(b). The four input operations are scheduled in control steps 4, 5, 6, and 7 of the previous iteration, respectively, and assigned to input pin p1. The four output operations are scheduled in control steps 4, 5, and 6 of the current iteration and control step 1 of the next iteration, and assigned to output pin p1. The net effect is that at the beginning of every iteration, all the debug write variables S1, S2, S3, and S4 can be written for use in the current iteration. Also, by the second control step of every iteration, the values of the debug read variables S1, S2, S3, and S4 in the previous iteration can be read out.
Debug I/O Buffering
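To make the scheduling of pipelined I/O operations concrete, the following Python sketch places unit-time input operations on a single pin. It is not the left-edge algorithm used in the paper; it is a simple earliest-deadline-first greedy swapped in for illustration. The function name, the assumption of 8 control steps numbered 0 to 7, and the windows for DI2, DI3, and DI4 are hypothetical; only the window (4@1, 0) for DI1 and the use of control step 1 by In come from the text. A step s of the previous iteration is encoded as s - cc.

```python
def schedule_on_pin(windows, busy, cc):
    """Greedily place unit-time I/O operations on one pin.

    windows: {name: (asap, alap)} in absolute control steps; step s of
             the previous iteration is written as s - cc.
    busy:    control steps (0..cc-1) in which the design itself uses
             the pin in every iteration.
    cc:      number of control steps per iteration.
    Returns {name: step}, or None if some operation cannot be placed.
    """
    taken = set()  # pin slots already used, modulo the iteration length
    placed = {}
    # Handle the operation with the earliest ALAP (deadline) first.
    for name, (asap, alap) in sorted(windows.items(), key=lambda kv: kv[1][1]):
        for step in range(asap, alap + 1):
            slot = step % cc  # the schedule repeats every iteration
            if slot not in busy and slot not in taken:
                taken.add(slot)
                placed[name] = step
                break
        else:
            return None  # no free slot inside this window
    return placed


CC = 8  # hypothetical iteration length
# (4@1, 0) becomes (4 - CC, 0); windows other than DI1's are assumed.
windows = {"DI1": (4 - CC, 0), "DI2": (4 - CC, 0),
           "DI3": (4 - CC, 0), "DI4": (4 - CC, 0)}
schedule = schedule_on_pin(windows, busy={1}, cc=CC)
print(schedule)
```

Under these assumed windows the four inputs land in steps 4, 5, 6, and 7 of the previous iteration (encoded as -4 to -1), which mirrors the assignment described for Figure 2(b).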
Until now, we have attempted to write and read the desired debug variables at every iteration of the computation. However, when no solution exists with a single-iteration write/read, we can increase the periodicity at which the debug variables are written and read. We define debug periodicity as the number of iterations needed to write and read the desired debug variables. In this section, we introduce I/O buffering as a way to always achieve n debug requirements using $\lceil n / (cc \cdot iop - (I + O)) \rceil$ iterations, where cc and iop are the available control steps and input/output pins, respectively, and I and O are the numbers of primary inputs and primary outputs.
Let us consider the same CDFG of the 4th order IIR parallel filter as before, but with a more constrained available time of 5 control steps. Assume there is one input pin and one output pin available. The debug writes/reads required are the same as before: {WR(S1), WR(S2), WR(S3), WR(S4), RD(S1), RD(S2), RD(S3), RD(S4)}. Figure 3(a) shows the given schedule and assignment of the original CDFG nodes, as well as the modification done to add the desired debug writes/reads, with the debug input/output variables pipelined.
The maximum number of debug variables that can be satisfied is (5*2*1 - 2*1) = 8. Since the number of debug variables that we have to satisfy is exactly 8, a periodicity of 1 may be sufficient to satisfy them. In other words, the minimum debug periodicity required to satisfy the given 8 debug reads/writes is 1. Consequently, let us try to schedule/assign the added debug input/output operations with a debug periodicity of 1. Application of the left-edge algorithm [Kur87] on the corresponding interval graph is shown in Figure 3(b). As can be seen from Figure 3(b), only two of the four desired debug writes can be satisfied.
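The periodicity bound used above can be checked with a small Python helper. This is a sketch; the function and parameter names are hypothetical, and the formula is the one stated in this section, under the assumption that each primary input and output occupies one pin slot per iteration.

```python
import math


def min_debug_periodicity(n, cc, iop, primary_inputs, primary_outputs):
    """Smallest number of iterations needed for n debug reads/writes.

    cc:  available control steps per iteration
    iop: available input/output pins
    Pin slots left for debugging per iteration are
    cc * iop - (primary_inputs + primary_outputs).
    """
    free_slots = cc * iop - (primary_inputs + primary_outputs)
    if free_slots <= 0:
        raise ValueError("no pin slots are left for debug operations")
    return math.ceil(n / free_slots)


# The example in the text: 5 control steps, 2 pins, 1 input, 1 output,
# and 8 debug requirements.
print(min_debug_periodicity(8, 5, 2, 1, 1))
```

The bound only counts pin slots; as the example shows, the ASAP/ALAP windows can still prevent a schedule with this periodicity unless I/O buffering is used.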
Analysis of the interval graph shows that even though the input pin is available in two control steps, 2 and 3 in the previous iteration, it cannot be utilized for the debug inputs because the debug variables cannot yet be written, as shown by their ASAP times in Figure 3(b). This is because if S1, S2, S3, or S4 are written in control steps 2 or 3, they will be rewritten by the functional operations +2, T1, +6, or T2, respectively. This shows that although control steps and pins may be available, they may not be utilizable because of the ASAP and ALAP constraints imposed by the original CDFG on the debug variables.
I/O buffering is a possible way to eliminate the restriction imposed on the input/output operations by the ASAP and ALAP times of the write/read variables. With input buffering, any input operation for a write X <- DI can be performed with any available input pin at any control step, and then stored in an input buffer IB, to be later transferred to the register storing variable X anytime during [ASAP,ALAP](X). Similarly, output buffering can be used to transfer Y to an output buffer, BO, during [ASAP,ALAP](Y), and then use an output pin when it becomes available later. The I/O buffering strategy, while taking up some extra hardware resources of registers and interconnects, allows all the available pins and control steps to be utilized for the desired input/output operations. Hence, if the number of I/O buffers is not limited, the debug write/read requirements of dv variables can always be satisfied with a periodicity of dp, where
$dp = \lceil dv / (cc \cdot iop - (I + O)) \rceil$ (EQ 4)
Using input/output buffering, all the debug requirements for Figure 3(a) can be satisfied as shown by the interval graph in Figure 3(c). The input operations DI3 and DI4 are performed in control steps 2 and 3, and stored in input buffers IB1 and IB2, respectively. Subsequently, in control step 4, the data from IB1 and IB2 are transferred to the registers storing variables S3 and S4, respectively. Consequently, with two input buffers and two interconnects, (IB1 -> Reg(S3)) and (IB2 -> Reg(S4)), all 8 debug reads/writes can be accomplished with a periodicity of 1, under the available time and pin constraints of 5 and 2, respectively.
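The effect of input buffering can be illustrated with a two-pass Python sketch. All names and numbers here are assumptions made for illustration: the function is hypothetical, every write window is taken to be step 4 of the previous iteration through step 0 of the current one (encoded as -1 to 0, with 5 control steps per iteration), and the idle input pin slots are taken to be steps 2, 3, 4 of the previous iteration and step 0 of the current one.

```python
def assign_with_input_buffers(windows, free_slots):
    """Split debug writes into direct and buffered ones.

    windows:    {var: (asap, alap)} in absolute control steps
    free_slots: absolute control steps in which the input pin is idle
    Returns (direct, buffered). A direct write uses a pin slot inside
    its window. A buffered write uses an idle slot before its window
    and waits in an input buffer until the window opens.
    """
    remaining = sorted(free_slots)
    direct, buffered = {}, {}
    # Pass 1: place as many writes as possible inside their windows.
    for var, (asap, alap) in sorted(windows.items(), key=lambda kv: kv[1][1]):
        slot = next((s for s in remaining if asap <= s <= alap), None)
        if slot is not None:
            direct[var] = slot
            remaining.remove(slot)
    # Pass 2: the rest arrive early and are held in input buffers.
    for var, (asap, _alap) in windows.items():
        if var in direct:
            continue
        slot = next((s for s in remaining if s < asap), None)
        if slot is None:
            raise ValueError(f"no idle pin slot left for {var}")
        buffered[var] = slot
        remaining.remove(slot)
    return direct, buffered


# Hypothetical data: steps 2, 3, 4 of the previous iteration are -3, -2, -1.
windows = {"DI1": (-1, 0), "DI2": (-1, 0), "DI3": (-1, 0), "DI4": (-1, 0)}
direct, buffered = assign_with_input_buffers(windows, free_slots=[-3, -2, -1, 0])
print("direct:", direct)
print("buffered:", buffered, "-> input buffers needed:", len(buffered))
```

With these assumed values two writes go directly to their registers and two pass through input buffers, matching the count of two buffers in the example. The sketch allocates one buffer per early write and does not attempt to share buffers.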
Minimizing Debugging Hardware Overhead
As we already indicated, one of the most important features of the behavioral synthesis debugging process is that it usually incurs a relatively small hardware overhead. In this Section we will show a technique which can even further reduce debugging hardware overhead.
We introduce the procedure for debugging hardware reduction using the small example shown in Figure 4. The key idea is to use registers already available in the design when they are not used for storing design variables, and to use already available interconnects, when they are free, for transferring debugging variables among the available registers in the design. The straightforward way to accomplish this part of the debugging task is to introduce a new interconnect from the I/O buffer to register C. However, one can avoid the introduction of a new interconnect by first transferring variable dv from I/O to register A, then to register B, and eventually to register C. During this process two requirements must always be satisfied. First, during the period in which the debugging variable is stored in a particular register, the register must not already be allocated to either a design variable or another debug variable. Second, each transfer from one register to another must be accomplished in a control step in which the interconnect is not used for the transfer of any other data.
Assuming that interconnects I/O -> reg A, reg A -> reg B, and reg B -> reg C are not allocated in control steps 4, 6, 7, and 10 respectively, one can transfer variable dv from I/O to reg C as shown in the last column of Figure 4(b).
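The two requirements on a multi-hop transfer can be checked mechanically. The Python sketch below is a greedy illustration, not the algorithm of the paper: the function name and all control-step sets are hypothetical, and because it always takes the earliest usable step for each hop it may miss routes that a full search would find. It also does not check the final destination register after the last hop.

```python
def route_through_registers(hops, free_steps, reg_free):
    """Schedule a chain of transfers for one debug variable.

    hops:       list of (src, dst) interconnects in path order
    free_steps: {(src, dst): set of control steps in which that
                 interconnect carries no other data}
    reg_free:   {register: set of control steps in which the register
                 holds no design or other debug variable}
    Returns one control step per hop, or None if the greedy fails.
    """
    schedule = []
    prev = None  # control step of the previous hop
    for src, dst in hops:
        chosen = None
        for step in sorted(free_steps[(src, dst)]):
            if prev is not None and step <= prev:
                continue  # hops must happen in increasing step order
            # The value sits in src from just after the previous hop
            # until this hop, so src must be free over that period.
            if prev is not None and not all(
                t in reg_free[src] for t in range(prev + 1, step + 1)
            ):
                continue
            chosen = step
            break
        if chosen is None:
            return None
        schedule.append(chosen)
        prev = chosen
    return schedule


# Hypothetical availability data for the path I/O -> A -> B -> C.
hops = [("IO", "A"), ("A", "B"), ("B", "C")]
free_steps = {("IO", "A"): {4}, ("A", "B"): {6, 7}, ("B", "C"): {10}}
reg_free = {"A": {5, 6}, "B": {7, 8, 9, 10}}
print(route_through_registers(hops, free_steps, reg_free))
```

If register B were occupied by a design variable in any step between the second and third hop, the function would return None, which corresponds to the case where a new interconnect has to be allocated.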
The problem of debugging hardware minimization using life-time splitting can now be stated in the following way. Given is a design and all debugging variables and their destinations. Reduce the I/O buffer and additional interconnect requirements by appropriately scheduling and assigning data transfers of the debugging variables, without impeding proper functionality of the design.
To solve the optimization problem, we developed the heuristic algorithm described by the following pseudo-code: 
Update-list-of-debugging-variables();

The key idea of the algorithm is to select at each stage a debugging variable which will least reduce the number of choices in which debugging variables can be transferred to their destinations, by allocating registers for the shortest amount of time, and by allocating interconnects which are in smallest demand for future possible use by other debug variables. If in a particular stage of the algorithm there is no debug variable which can be transferred to its destination using existing resources, a new interconnect is allocated for directly transferring the debugging variable with the shortest life-time. The run time of the heuristic is $O(n^2 m)$, where n is the number of debugging variables and m is the number of resources in the pool.
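[Table 1: size characteristics of the considered designs and the number of registers in I/O buffers needed by DfD. Recoverable column labels: Hyper, DfD, (pins), (registers). Surviving data fragment: 108/10 3 1.]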
Experimental Results
We applied our design-for-debugging approach and optimization algorithms to 6 industrial examples. Table 1 gives the size characteristics of the considered designs. We next applied the proposed DfD approach to the initial designs (with no debugging variables) produced by Hyper. The DfD approach could satisfy all the debugging requirements without the addition of any new I/O pins. The number of registers in I/O buffers needed is shown in the last column of Table 1. The area overhead was minimal, in all cases less than 5% of the initial area.
Conclusion
We addressed a new and important problem of considering hardware and synthesis support for debugging during behavioral synthesis. The ASIC debugging process has been defined. Pipelining of debug variables, addition of I/O buffers for intermediate storage of debug variables, and an approach for minimizing hardware overhead using the life-time splitting technique form the conceptual and implementation basis for efficient, yet inexpensive, design for debugging. The practical effectiveness of the DfD approach is demonstrated on several examples by providing observability and controllability of debug variables with minimal hardware overhead.
