Abstract
Introduction
Project managers in ASIC design houses grapple with the problem of estimating the manpower requirement of design projects. Currently, manpower estimation is more of an art than a science. A manager bases the estimate on past experience and on the size and complexity of the new project. However, these estimates can be grossly incorrect, throwing the project plan off balance and creating practical difficulties in resource scheduling. Adding to this complication is the complex nature of modern-day ASIC design: the design flow is iterative, hierarchical, and concurrent. The flow has well-defined steps such as RTL design, verification, synthesis, physical design, and physical verification. Design steps may have to be iterated more than once to achieve timing closure or to satisfy constraints on area, test cost, and reliability. Concurrency in the design flow is introduced by the project manager to speed up project execution. For example, the manager may subdivide the design into smaller subdesigns and exploit spatial parallelism. Accurate prediction of the project execution time requires careful calibration of the design process to estimate the parameters associated with the design flow, such as the probability of iteration. In addition to predicting the manpower requirement for a project, the manager must also consider altering the design flow to speed up project execution. A "what if" analysis tool would be invaluable during the project planning phase.
In this paper, we consider the problem of project time prediction and project flow improvement in a theoretical framework. We describe the use of a model known as the hierarchical concurrent flow graph (HCFG) to capture the iterative, hierarchical, and concurrent nature of ASIC design flows. The model can be constructed from a textual description of the process flow and can efficiently predict the project execution time. Using two examples from the VLSI system design domain, we illustrate the power of the HCFG model in the improvement of process flows.
The paper is organized as follows. In the next section, we briefly discuss the HCFG model and the associated analytical techniques. In Section 3, we consider a physical design flow and estimate the improvement in the execution time of the flow when the design is partitioned. Section 4 considers a software design flow and illustrates the use of HCFG in predicting the improvement in the flow execution time when OR concurrency is introduced in the flow. Conclusions are presented in Section 5.
HCFG approach
A node in an HCFG corresponds to a task in the ASIC design flow. Two special nodes, $I$ and $F$, denote the initial task and the final task in the design flow. A weight $T_j$ is associated with node $j$ and corresponds to the completion time of the task. Since task completion time can rarely be predicted with point accuracy, we treat $T_j$ as a discrete random variable with a specified distribution. The notation $E[T_j]$ or $\overline{T_j}$ is used to denote the expected value of $T_j$. A directed edge
Control theory, is used to compute the transmittance $T_{I \to F}$ of the HCFG [3]. Before computing the transmittance, the weight of edge $(i, j)$ is calculated as $z^{T_j}$, where $z$ is the variable of the z-transform. The graph transmittance can be used to obtain various attributes of
Process improvement approaches
A project manager can consider several options to reduce the expected project completion time. These can be summarized as follows. (1) Tuning process model parameters to reduce $E[T_P]$; improving the design parameters results in an improvement in $E[T_P]$. (2) Reordering tasks, which may reduce $E[T_P]$; this is feasible only when the interdependency of the tasks allows such reshuffling [4]. (3) Decomposing an activity into smaller tasks and executing them concurrently through a reassignment of manpower; the AND concurrency construct of HCFG expresses this type of concurrency. (4) Identifying alternate ways to solve a problem and executing all of these subflows concurrently, with the best of the resulting solutions actually being used; the OR concurrency construct of HCFG expresses this option.
In this paper, we consider the third and fourth options for design time improvement. We develop a framework for decision making that uses quantitative analysis of $E[T_P]$. We consider two example design flows to illustrate the process improvement paradigm: the timing-driven layout design flow of Figure 1 and the software design flow of Figure 4. The software design flow may be considered a subflow of a larger hardware-software codesign flow. We also describe a method for task time parameterization in terms of design and designer characteristics.
Improving physical design flow
As the first example, we consider a timing-driven physical design flow and examine how the estimated execution time of the flow can be improved through the use of AND concurrency. Since interconnect delays play a dominant role in determining the performance of deep submicron integrated circuits, it is common to use a timing-driven physical design flow, with the objective of improving the chances of timing closure. Figure 1 shows a design flow that may be useful in the physical design of high-performance chips. The design tasks in the flow are timing-driven placement, routing, and layout verification. Timing-driven placement consists of three subtasks, namely, initial placement, timing analysis and optimization, and clock-tree generation. Since the complexity of these design tasks increases superlinearly with the size of the netlist, there is a definite benefit in partitioning the circuit into several blocks and carrying out the physical design flow separately on these blocks. The layouts of the different blocks can then be merged to produce the complete chip layout. If some paths are found to violate the timing constraints, techniques such as buffer insertion, gate resizing, wire resizing, and fanout decomposition can be used to improve the timing of paths that are local to blocks as well as paths that cut across different blocks.
The parameters that affect the execution time of the different tasks are the number of partitions and the average size of each partition. Figure 1(b) shows the HCFG for the flow of Figure 1(a). We use the notation $T_{TP}$, $T_R$, and $T_{LV}$ to denote the completion times of the timing-driven placement task, the routing task, and the layout verification task, respectively. Let $p$ be the probability of repeating the entire flow due to a timing failure detected during layout verification. Figure 2 shows the HCFG for a flow that uses design partitioning to speed up the original design flow. We assume that the partitioning task subdivides the original circuit into $n > 1$ blocks. In the HCFG of Figure 2, two new tasks, namely, Partitioning and Module Integration & Verification, have been added. In addition, two special nodes called AND concurrency nodes have been added (marked with a distinct symbol in the figure). We make $n$ copies of the flow of Figure 1 and insert these as subflows between the two AND concurrency nodes. Note that a layout verification step is carried out on the complete chip layout after the merging step. Let $p_I$ denote the probability of detecting timing violations after this final verification step. Let $T_{TP_j}$, $T_{R_j}$, and $T_{LV_j}$ denote the times taken for timing-driven placement, routing, and layout verification of block $j$. Let $p_j$ denote the probability of detecting timing violations at the block level in block $j$. Let $T_{part}$ and $T_{IV}$ denote the completion times of the partitioning and module integration & verification tasks, respectively.
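As a minimal sketch of how the single-loop flow of Figure 1(b) behaves, the Python fragment below builds the discrete density function (DDF) of the total completion time by repeated convolution. The task distributions, the value of `p_repeat`, and the helper names `convolve`, `loop_ddf`, and `expected` are illustrative assumptions, as is the premise that every pass through the flow is independent of earlier passes; it is not the analysis procedure of [3].

```python
from collections import defaultdict


def convolve(a, b):
    """DDF of the sum of two independent discrete random variables."""
    out = defaultdict(float)
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[ta + tb] += pa * pb
    return dict(out)


def expected(ddf):
    """Expected value of a DDF given as {time: probability}."""
    return sum(t * pr for t, pr in ddf.items())


def loop_ddf(pass_ddf, p_repeat, max_passes=60):
    """DDF of the total time when one pass is repeated with probability p_repeat.

    The geometric series over the number of passes is truncated at
    max_passes (assumption: p_repeat**max_passes is negligible).
    """
    total = defaultdict(float)
    running = {0: 1.0}          # time accumulated over k passes
    weight = 1.0 - p_repeat     # probability of exiting right after pass k
    for _ in range(max_passes):
        running = convolve(running, pass_ddf)
        for t, pr in running.items():
            total[t] += weight * pr
        weight *= p_repeat
    return dict(total)


# Hypothetical task-time distributions (arbitrary time units).
t_tp = {4: 0.5, 6: 0.5}   # timing-driven placement
t_r = {3: 1.0}            # routing
t_lv = {1: 0.5, 3: 0.5}   # layout verification

pass_ddf = convolve(convolve(t_tp, t_r), t_lv)
ddf = loop_ddf(pass_ddf, p_repeat=0.2)
print(round(expected(ddf), 6))
```

Under these assumptions the mean of the resulting DDF agrees with the closed form (sum of the mean task times) / (1 - p), because the number of passes is geometrically distributed.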
Process completion time
Using the techniques described in [3], the graph transmittance $T_{I \to F}$ and the expected run time $\overline{T_P}$ for the flow of Figure 1 can be found as
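As a hedged sketch of the form such expressions take, assume that the three task times are deterministic, that the only feedback edge runs from layout verification back to placement with probability $p$, and, for the partitioned flow of Figure 2, that the feedback edge with probability $p_I$ returns to the partitioning task. The shorthand $T_{\Sigma}$ is introduced here for illustration only. A standard single-loop flow-graph reduction then gives:

```latex
\begin{align}
T_{\Sigma} &= T_{TP} + T_R + T_{LV} \\
T_{I \to F}(z) &= \frac{(1-p)\, z^{T_{\Sigma}}}{1 - p\, z^{T_{\Sigma}}} \\
\overline{T_P} &= \left.\frac{d\, T_{I \to F}(z)}{dz}\right|_{z=1} = \frac{T_{\Sigma}}{1-p} \\
\overline{T'_P} &= \frac{T_{part} + \overline{T_{AND}} + T_{IV}}{1 - p_I}
\end{align}
```

When the task times are independent random variables, the same expressions for the expected values hold with each task time replaced by its mean, since the number of passes through the loop is geometrically distributed.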
In the above equations, we use the notation $T_{AND}$ to indicate the time for the entire subflow that appears between the two AND nodes of Figure 3. The discrete density function (DDF) of $T_{AND}$ can be found using the technique of [3], and we primarily focus on the expected value $\overline{T_{AND}}$. This quantity is the expected value of the maximum of the random variables $T_{LD_j}$, where $T_{LD_j}$ denotes the run time of subflow $j$. We can approximate $\overline{T_{AND}}$ by the maximum of the expected values of $T_{LD_j}$ (see equation below).
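The gap between the exact expectation of the maximum and the approximation by the maximum of the expectations can be seen on a small example. The sketch below is illustrative only: the two block distributions and the helper names `max_ddf` and `expected` are assumptions, and the subflow run times are taken to be independent.

```python
def expected(ddf):
    """Expected value of a DDF given as {time: probability}."""
    return sum(t * pr for t, pr in ddf.items())


def max_ddf(ddfs):
    """DDF of the maximum of independent discrete random variables.

    Uses P(max <= t) = product over j of P(T_j <= t).
    """
    times = sorted({t for d in ddfs for t in d})
    out = {}
    prev_cdf = 0.0
    for t in times:
        cdf = 1.0
        for d in ddfs:
            cdf *= sum(pr for u, pr in d.items() if u <= t)
        out[t] = cdf - prev_cdf
        prev_cdf = cdf
    return out


# Hypothetical run-time DDFs for two block-level subflows.
block_1 = {8: 0.5, 12: 0.5}
block_2 = {9: 0.5, 11: 0.5}

exact = expected(max_ddf([block_1, block_2]))          # E[max_j T_LD_j]
approx = max(expected(block_1), expected(block_2))     # max_j E[T_LD_j]
print(exact, approx)
```

Because the expectation of a maximum is never smaller than the maximum of the expectations, the approximation acts as a lower bound; it becomes tight when one subflow clearly dominates the others.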
Now, using the earlier result for the expected completion time of the flow of Figure 1, we can write $\overline{T_{LD_j}}$ as
Here, $x_j$ denotes the size of block $j$ normalized with respect to the size of the complete circuit. Clearly, the largest block will dictate the execution time of the AND subflow. This is expressed by the following equation.
$$\overline{T_{AND}} \approx \max_{j} \overline{T_{LD_j}}$$
Characterizing model parameters
We now consider the tasks Partitioning and Module Integration & Verification and analyze their execution times. Our experience with a partitioning tool indicates that the time for partitioning depends mainly on the size of the original netlist. Since this size is a constant for a given problem, we can treat $T_{part}$ as a constant. The time for module integration depends on the number of partitions $n$ and the number of modules $N$ in the original netlist. It also depends on Rent's coefficient $q$, which relates the number of I/O pins of a block to the number of gates in the block [1]. We observed that the time for the module integration & verification task can be described by the following equations, which give $T_{IV}$ for different ranges of $n$.
The probability $p_I$ is characterized in terms of $p$ and a constant $r$ that is bounded below by 1. As this characterization indicates, $p_I$ is no greater than the probability $p$ of the original flow shown in Figure 1(b). When $n$ is large, the module integration task essentially boils down to the chip-level layout verification task. Thus, in the limit as $n$ approaches the number of modules $N$, $p_I$ approaches $p$. The value of the constant $r$ depends on the circuit and the quality of the partitioning tool; we tuned the value of $r$ to 1.9 in our experimentation. We also assumed that $p_j$ is proportional to the size of block $j$; this is based on the intuition that the layout of larger blocks is more likely to involve several timing iterations.
Results
We conducted experiments to compare the execution times of the flows of Figures 1(b) and 2. In our experimentation, we assumed the following. Figure 1. The HCFG analysis provides the discrete density function of the completion time, from which various other attributes of the completion time can be computed. HCFG analysis indicates that opting for the second flow results in a reduced completion time. There is a value of $n$, denoted by $n_{opt}$, at which $T'_P$ is minimum. $T'_P$ increases as $n$ moves away from $n_{opt}$ in either direction. For large $p$, the second flow is the better alternative over a large range of $n$ (from 2 up to $n_{opt}$), whereas for small $p$ the second flow is still the better choice but the range of $n$ is restricted, i.e., $n_{opt}$ is small. We performed an analysis based partly on the HCFG approach to compute $n_{opt}$. We made use of equations 1-3 to obtain an analytical expression for $T'_P$ in the following manner. We substituted $T_{IV}$ from equation 3 and $p_I$ into equation 2 to obtain the expressions for $\overline{T_{AND}}$ and $T'_P$ given in equations 4 and 5 as functions of $n$.
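The existence of an interior optimum can be illustrated with a toy cost model. Everything in the sketch below is a hypothetical stand-in rather than equations 1-5: the block cost grows as `x ** alpha`, the block-level repeat probability is `p * x`, the integration time grows linearly as `k_iv * n`, blocks are of equal size, the feedback loop of the partitioned flow is assumed to restart from partitioning, and $p_I$ is set to its upper bound $p$. The function and parameter names are likewise illustrative.

```python
def t_block(x, t_flat, alpha, p):
    """Expected time for one block of normalized size x (hypothetical model)."""
    # Superlinear single-pass cost x**alpha; repeat probability p*x for the block.
    return t_flat * x ** alpha / (1.0 - p * x)


def t_partitioned(n, t_flat, alpha, p, t_part, k_iv):
    """Expected time of the partitioned flow with n equal blocks."""
    t_and = t_block(1.0 / n, t_flat, alpha, p)  # equal blocks: any block is the max
    t_iv = k_iv * n                             # hypothetical integration cost
    p_i = p                                     # conservative choice, since p_I <= p
    return (t_part + t_and + t_iv) / (1.0 - p_i)


def find_n_opt(n_max, **params):
    """Scan n = 2..n_max and return the n with the smallest expected time."""
    return min(range(2, n_max + 1), key=lambda n: t_partitioned(n, **params))


# Hypothetical parameter values (arbitrary time units).
params = dict(t_flat=100.0, alpha=1.5, p=0.3, t_part=2.0, k_iv=1.5)

n_opt = find_n_opt(32, **params)
t_unpartitioned = t_block(1.0, params["t_flat"], params["alpha"], params["p"])
print(n_opt, t_partitioned(n_opt, **params), t_unpartitioned)
```

In this toy model the block-level time falls as $n$ grows while the integration time rises, so the total has a minimum at a moderate $n$; the location of that minimum depends entirely on the assumed parameters.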
The optimum value $n_{opt}$ can now be obtained by solving the
Improving a software design flow
For our next illustration, we consider the flow of Figure 4(a) for a software design process for an Application Specific Instruction-set Processor (ASIP) or a DSP [2, 5]. The chief concern here is retargetable code generation. The specifications of the software part of the system, obtained after hardware-software partitioning, are translated into a high-level language. The requirements of the application guide the selection of the processor architecture and the number of processors required. The code is first partitioned, and the processes are scheduled on multiple processors. Once the appropriate architectures are chosen, the code assigned to each architecture is translated into architecture-specific assembly code. This is followed by a verification step, which is done either through software emulation of the target architecture or by running the code on an actual processor.
Introducing OR concurrency
For the design flow in Figure 4(a), we consider two alternate execution scenarios. In Scenario 1, the flow is executed as a purely sequential and iterative design flow. The HCFG equivalent is shown in Figure 4(b). In Scenario 2, a team of $n$ designers is given the same specifications to execute the design flow of Figure 4(b). In this situation, we assign the design task to $n$ designers, each working concurrently on a separate but identical subflow. In the HCFG equivalent of the flow, we introduce a pair of OR nodes that encloses all the subflows executed by the individual designers. Such a graph is shown in Figure 5 for a specific value of $n$. In Scenario 2, the execution of all the subflows within an OR-node pair is stopped as soon as any one of the $n$ designers provides a design that satisfies the quality criterion. Let $T_{TC}$, $T_{CG}$, $T_{CC}$, and $T_{ST}$ be the completion times of the activities code translation, code generation, code compilation, and simulation & test, respectively, for Scenario 1. Let $p$ be the probability of transition from the simulation and testing task to the code translation task. Let $T_{TC_i}$, $T_{CG_i}$, $T_{CC_i}$, and $T_{ST_i}$ be the corresponding quantities for the $i$th designer in Figure 5. Let $T_{SW_i}$ denote the completion time of the subflow executed by the $i$th designer. Similarly, let $p_{OR}$ be the probability of a transition that repeats the OR subflow; this iteration occurs if the quality of the product is unacceptable.
Process completion time
For Scenario 1, $T_{I \to F}$ and $\overline{T_P}$ can be written as
If the graph transmittance of the subflow confined in the OR-node pair is given by $T_{OR}$, the graph transmittance $T$
The process completion time $T'_P$ is given by
The graph transmittance $T_{OR}$ and the expected completion time $\overline{T_{OR}}$ of the OR-node-pair subflow are obtained using HCFG analysis. $\overline{T_{OR}}$ can be approximated by the minimum of the expected values of $T_{SW_i}$, as shown in equation 8.
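To see how this approximation relates to the exact expectation, the sketch below computes the DDF of the minimum of independent designer completion times from their survival functions. The designer distribution and the helper names `min_ddf` and `expected` are assumptions made for illustration; independence between designers is also assumed.

```python
def expected(ddf):
    """Expected value of a DDF given as {time: probability}."""
    return sum(t * pr for t, pr in ddf.items())


def min_ddf(ddfs):
    """DDF of the minimum of independent discrete random variables.

    Uses P(min > t) = product over i of P(T_i > t).
    """
    times = sorted({t for d in ddfs for t in d})
    out = {}
    prev_surv = 1.0
    for t in times:
        surv = 1.0
        for d in ddfs:
            surv *= sum(pr for u, pr in d.items() if u > t)
        out[t] = prev_surv - surv
        prev_surv = surv
    return out


# Hypothetical completion-time DDF shared by identical designers.
designer = {10: 0.25, 14: 0.5, 18: 0.25}

for n in (1, 2, 3):
    team = [designer] * n
    exact = expected(min_ddf(team))            # E[min_i T_SW_i]
    approx = min(expected(d) for d in team)    # min_i E[T_SW_i]
    print(n, exact, approx)
```

The expectation of a minimum never exceeds the minimum of the expectations, so the approximation is an upper bound on the expected OR-subflow time. With identical designers the bound stays fixed while the exact value decreases as designers are added.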
The average completion time for the $i$th designer, $\overline{T_{SW_i}}$, is the expected value of the completion time of the $i$th subgraph between the OR nodes in Figure 5. It can be rewritten as
Characterizing model parameters
To represent the flow of Figure 5 as an HCFG, two dummy tasks with zero completion time have to be introduced. We now discuss the probability $p_{OR}$.
Probability p OR
The probability $p_{OR}$ is associated with the quality of the design artifacts provided by the $n$ designers relative to the specified quality $Q_D$. Let $Q_{i_{max}}$ denote the highest quality among the design artifacts provided by the $n$ designers. Intuitively, we expect $p_{OR}$ to be larger when $Q_{i_{max}}$ is low and vice versa. Consistent with this intuitive expectation, we assume
The actual values of $c$ and $d$ can be calibrated using the design history meta-data. We choose $c$
Results
In this section, we discuss the results obtained by comparing the process completion times of Scenario 1 and Scenario 2. We assume $T_{TC}$ HCFG analysis provides the DDF of $T'_P$. Also shown are various attributes of $T'_P$ computed from its DDF. From the results, we infer that (a) for a given team of designers, Flow 2 is advisable only when $p$ is sufficiently large and (b) for small $p$, Flow 2 provides an improved $E[T_P]$ when all designers produce high-quality designs. For software design projects where a choice of different architectures is available, this paradigm for $E[T_P]$ improvement can be combined with the usual design space exploration, adding one more dimension to the design space. Given constraints on available manpower, a suitable value of $n$ can be found that results in the minimum process completion time.
Conclusions
We illustrated a methodology for process improvement using the HCFG approach. For DSM chips, we explored the effect on design completion time of partitioning the circuit or considering elaborate interconnect models. For a software design flow of a given size and complexity, we explored the effect of designer expertise on design completion time. Using concurrency constructs to reduce the design time carries a penalty. With the OR construct, the penalty takes the form of increased design effort, without excessive degradation in the utilization factor. The AND construct decreases utilization for a marginal increase in design effort.
