Data flow graph dominant designs, such as communication, video, and audio applications, are common in today's IC industry. In these designs, the datapath resources (e.g., adders, multipliers) account for more than 90% of the area. Different datapath resources have very different properties in terms of area, delay, power, and yield. Considering yield during system-level design can result in significant benefits. A Mixed Integer Linear Programming (MILP) formulation for yield-aware architectural synthesis is presented in this paper. The proposed approach attempts to maximize the yield of the design while satisfying other constraints such as area and delay. Through experiments on several benchmarks, we show that incorporating the yield as an objective during architectural synthesis can significantly improve the yield compared to conventional methods. Transistor sizing at the circuit level can also be incorporated in our method to further improve the yield.
I. Introduction
A common goal in the IC industry is the reduction of manufacturing cost. Two factors affect the die cost: die area and die yield [1]. Considerable effort has been devoted to decreasing manufacturing cost by minimizing the die area. Reducing the die cost through yield maximization, however, is more complex since the relation between yield and die area is not monotonic. The sensitivity of a layout to defects is measured through its critical area [2], and by minimizing the critical area the yield can be maximized.
Because the critical area is directly related to the layout of the design, many schemes have been developed for reducing the critical area at the physical design level, such as during placement [3] , routing [4] , compaction [5] , and cell library preparation [6] . Usually, the yield is dealt with as a secondary optimization goal. Current techniques can increase the yield by up to 10% [7] .
In [8], the authors claim that somewhere below the 0.18 µm technology node, the old rules of thumb cease to apply. The yield drops dramatically and is severely limited by design content. As a result, the yield should no longer be regarded as a secondary goal. To achieve additional improvements in yield, it is necessary to modify the circuit topology itself. Synthesis-based yield optimization gives designers more flexibility than layout-based changes alone. One design stage that can prove particularly beneficial is high-level synthesis, the process of determining the block-level (functional unit) structure of the circuit. Conventional architectural synthesis focuses mainly on area, delay, or power constraints, but not on yield.
Currently, data flow graph (DFG) dominant designs such as communication, video, and audio devices are very common. In these designs, the datapath resources (e.g., adders, multipliers, comparators) account for more than 90% of the die area, while the controllers are relatively small and simple. To the best of our knowledge, very little research on improving yield at the architectural synthesis level has been published, since yield is traditionally considered a secondary objective. In this paper, we show that considering yield during the system-level design stage can result in significant benefits.
We present here an approach in which the yield optimization problem is represented as a mixed integer linear programming (MILP) problem. First, we use MILP to formulate the conventional scheduling and binding steps of architectural synthesis under delay and area constraints. Then, we incorporate the yield objective using the Poisson yield model. Since the yield objective is non-linear, we adopt an approximate yield cost function which is linear. Finally, we use a commercial linear programming solver to find the globally optimal solution for the synthesis process. Our major contribution in this paper is that the yield is regarded as an objective at the system level. As a result, the yield is maximized under the area, delay, and power constraints. We also analyze the problem of minimizing the area with a yield constraint. Compared to conventional solutions, which do not take yield into account, our results show substantial yield improvements. We also show that the well-known method of transistor sizing can be incorporated into our approach to further improve the yield, even without an extra area penalty.
This paper is organized as follows. In Section II, a yield model at the system level is presented and a yield extraction tool for functional units is introduced. In Section III, the yield as a constraint is discussed and a mathematical formulation is proposed. Both area and delay optimization problems with a yield constraint are formulated. Our experimental results are shown in Section IV. Conclusions and future work are presented in Section V.
II. Manufacturing yield

A. Yield model at the RT level
Various types of defects are introduced during the manufacturing process and may result in open or short circuits. However, not all defects will necessarily cause a failure. It is common to quantify layout sensitivity to defects through the critical area [9]. The critical area, A_c^i(x), is defined for a defect of type i and diameter x as the size of the area in which the center of the defect must fall in order to cause a circuit failure. A_c^i is defined as the critical area for a defect of type i averaged over all possible defect diameters:
$$A_c^i = \int_0^\infty A_c^i(x)\, f_d(x)\, dx \tag{1}$$
where f_d(x) is the defect size probability density function. Then, d_i is defined as the average number of defects of type i per unit area. Finally, the average number of faults in the circuit, denoted by λ, is:
$$\lambda = \sum_i A_c^i\, d_i \tag{2}$$
For a given circuit block b with an average number of faults λ_b, the yield is expressed using the Poisson model as [9]:
$$Y_b = e^{-\lambda_b} \tag{3}$$
We assume that, at the architectural level, all the blocks are statistically independent and the defects are uniformly distributed. Thus, for a system which has N functional blocks, the yield can be expressed as:
$$Y = \prod_{k=1}^{N} Y_k \tag{4}$$
where Y_k is the yield for block k. From (3) and (4), we have:
$$Y = \prod_{k=1}^{N} e^{-\lambda_k} = e^{-\sum_{k=1}^{N} \lambda_k} \tag{5}$$
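The Poisson model and the independence assumption described above can be checked numerically. The following is a minimal sketch, not part of the original flow: the function names and the three fault counts are hypothetical and chosen only for illustration.

```python
import math


def block_yield(lam):
    """Poisson yield of a single block with average fault count lam."""
    return math.exp(-lam)


def system_yield(lams):
    """Yield of a system of statistically independent blocks.

    The product of the block yields equals exp(-sum of fault counts).
    """
    return math.exp(-sum(lams))


def faults_from_yield(y):
    """Invert the Poisson model: average fault count for a given yield."""
    return -math.log(y)


# Hypothetical fault counts for three blocks (illustrative values only).
lams = [0.02, 0.035, 0.05]
product_of_yields = math.prod(block_yield(lam) for lam in lams)
print(product_of_yields, system_yield(lams))
```

The inverse function is useful when a library reports a yield per functional unit rather than a fault count, since the fault counts are the quantities that add up across blocks.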
B. Yield simulation package
Our goal is to maximize the yield by selecting functional units from the library while satisfying given constraints. Conventional libraries characterize the area, delay, and power properties of their cells/functional units. The yield value is not included and had to be calculated for our purposes. To this end, we use the EYES™ (Edinburgh Yield Estimator Sampling) simulator [10] for calculating the yield of the various functional units. EYES implements a sampling-based methodology for critical area estimation and then calculates the yield, based on the layout of the design. The procedure is shown in Figure 1.
C. Yield of library units
Adders and multipliers are the main units optimized in architectural synthesis of signal processing designs. There are various types of adders and multipliers, each with different area, delay, and power characteristics. In order to compute the yield for functional units such as adders and multipliers, we generated the layout for each functional unit we needed. In our examples, we prepared an RTL Verilog file [19] for each implementation of the functional units, synthesized it to the gate level using Synopsys Design Compiler™, and then used a standard logic cell design methodology to generate the layout. Cadence Silicon Ensemble (Qplace+Wroute) was used for this task. After the layout generation, EYES was used to estimate the yield as well as the area. The delay is estimated by Design Compiler from the gate-level netlist. The detailed flow is shown in Figure 2. All the data are obtained from the TSMC 0.25 µm standard cell technology.
In our functional unit library, we implemented two different adders and two different multipliers. The two adders are the Ripple Carry Adder (RCA) and the Carry Look Ahead adder (CLA). The two multipliers are the Overturned-Stairs (OS) multiplier and the Balanced-Tree (BT) multiplier [18]. The measured results are shown in Table I. Column 2 shows the results of the Synopsys Design Compiler after logic synthesis to the gate-level netlist. Columns 3 and 4 show the results reported by EYES. Note that all inputs and outputs are 16 bits long.
It is not always the case that the smaller the area, the higher the yield. The yield value is highly related to the circuit structure as well as to the physical routing style. For example, the OS multiplier has a smaller delay (12.43 ns) compared to the BT multiplier (15.94 ns). The OS multiplier also has a smaller area (986 µm² vs. 1025 µm²). However, it has an irregular structure and needs more routing tracks. Since the number of routing tracks between cell rows is limited, more tracks means that more routing layers are needed. As a result, 3 layers are needed to complete the routing for the OS multiplier, while only 2 layers are needed for the BT multiplier. The yield value we obtained for the BT multiplier (0.966) is larger than that for the OS multiplier (0.946).
A. Problem definition
A data-flow graph (DFG) is a polar directed acyclic graph G(V, E), where V = {v_i : i = 0, 1, ..., n} is the vertex set and E = {(v_i, v_j) : i, j ∈ {0, 1, ..., n}} is the edge set. The vertex set represents the operators, and the edge set represents the data dependencies between operators. Given an unscheduled DFG and the available resources shown in Table I, we have to find a yield-enhanced high-level synthesis solution such that the delay and area constraints are met and the overall system yield is maximized.
• application: DFG dominant designs (i.e., the datapath consumes most of the circuit area compared to the control logic), such as communication, video, and audio algorithms.
• inputs: Unscheduled DFG, with available resource library (functional units) to provide delay, area and yield properties.
• outputs: Scheduled DFG with binding information, such that the delay and area meet the constraints and the overall yield is maximized, or the delay and yield are constrained while the area is minimized.
B. MILP formulation for concurrent scheduling and binding
The parameters used in our MILP model are defined as follows. Let x_il denote a binary variable indicating that operator i is scheduled at time frame l, where 1 ≤ i ≤ n_ops and 1 ≤ l ≤ L, n_ops is the number of operators, and L is the system latency. There are n_res resources (some may belong to the same functional unit) available. For each resource k, there are a_k implementations, each with delay D_k and area A_k. The binary variable b_ir indicates the binding of operator i to resource r, where 1 ≤ i ≤ n_ops and 1 ≤ r ≤ n_res.
The mixed integer programming formulation for concurrent scheduling and binding is shown in Figure 3, where A is the total area and d_i is the actual resource delay for the implementation of operator i. Expression (6) is the unique slot scheduling constraint for each operator [11]. (7) is the unique resource binding constraint for each operator. (8) is the actual implemented delay for operator i. (9) is the data dependency relationship of the DFG [11]. (10) means that at most a_r copies of resource r are available. (11) is the total area constraint. (12) states that x_il and b_ir are binary variables. Finally, (13) indicates that the intermediate variable a_i should be an integer (N denotes the integer set) between 0 and n_ops.
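To make the constraint types listed above concrete, the following is a minimal sketch of a checker for a candidate schedule and binding. It is not the MILP itself and does not reproduce expressions (6) to (13); the function name, the data layout, and the assumption that a unit stays busy for its whole delay (no pipelining) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def check_schedule(start, binding, delay, copies, edges, latency):
    """Return True if a candidate schedule and binding is feasible.

    start[i]   : time step at which operator i starts (1-based)
    binding[i] : resource type bound to operator i
    delay[r]   : delay of resource type r, in time steps
    copies[r]  : number of available copies of resource type r
    edges      : (i, j) pairs meaning operator j consumes the result of i
    latency    : system latency L, in time steps
    """
    # Each operator needs one start slot, one resource, and must fit in L.
    for i, s in start.items():
        if i not in binding:
            return False
        finish = s + delay[binding[i]] - 1
        if s < 1 or finish > latency:
            return False

    # Data dependencies: a consumer starts only after its producer finishes.
    for i, j in edges:
        if start[j] < start[i] + delay[binding[i]]:
            return False

    # Resource usage: never more busy units of a type than available copies.
    busy = defaultdict(int)
    for i, s in start.items():
        r = binding[i]
        for t in range(s, s + delay[r]):
            busy[(r, t)] += 1
            if busy[(r, t)] > copies[r]:
                return False
    return True


# Toy DFG: two multiplications feeding one addition (hypothetical example).
delay = {"mul": 2, "add": 1}
binding = {"m1": "mul", "m2": "mul", "a1": "add"}
edges = [("m1", "a1"), ("m2", "a1")]
print(check_schedule({"m1": 1, "m2": 3, "a1": 5}, binding, delay,
                     {"mul": 1, "add": 1}, edges, latency=5))
```

A checker of this kind is a convenient way to validate a solver's output independently of the modeling language.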
Note that (10) is nonlinear. Since both variables x_il and b_ir are binary, the product of x_il and b_ir is also binary. There is a common technique [12] for linearizing the product by introducing another intermediate binary variable k_ilr, defined such that k_ilr = 1 if operator i is scheduled at time step l and is bound to the resource r, otherwise k_ilr = 0. Then, (10) can be rewritten as:
Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'05)
$$\sum_{r=1}^{n_{res}} a_r A_r \le A \tag{11}$$
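The linearization of a product of two binary variables mentioned for constraint (10) can be illustrated with the standard three-inequality construction. This sketch assumes that construction (k ≤ x, k ≤ b, k ≥ x + b − 1); the paper's own rewritten constraint may be stated differently, and the function names are hypothetical.

```python
from itertools import product


def linearized_ok(x, b, k):
    """Standard linear constraints forcing k to equal x * b for binaries."""
    return k <= x and k <= b and k >= x + b - 1


def feasible_k(x, b):
    """All binary values of k that satisfy the linear constraints."""
    return [k for k in (0, 1) if linearized_ok(x, b, k)]


# Exhaustive check over the four binary combinations of (x, b).
for x, b in product((0, 1), repeat=2):
    assert feasible_k(x, b) == [x * b]
```

Because the only feasible k is the product itself in every case, the nonlinear term can be replaced by k without changing the feasible set.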
B.1 Mixed Integer Linear Programming for Area constrained yield optimization
From (5), the yield maximization problem can be converted to the following:
The Mixed Integer Linear Programming model consists of (15), subject to (6)-(9) and (11)-(14).
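The idea of maximizing yield under an area bound can be illustrated on a toy instance without a solver. The sketch below is not the MILP of this section: it ignores scheduling and delay entirely, enumerates only the mix of OS and BT multipliers, and uses the area and yield values quoted in Section II-C. The function name and the area bounds are hypothetical. Under the Poisson model, maximizing the product of yields is the same as minimizing the sum of −ln(Y), which is linear in the unit counts.

```python
import math

# (area, yield) per 16-bit multiplier, as quoted in Section II-C.
UNITS = {"OS": (986.0, 0.946), "BT": (1025.0, 0.966)}


def best_mix(n_units, area_bound):
    """Pick the OS/BT mix of n_units with maximal yield within area_bound.

    Returns (n_os, n_bt, total_yield), or None if no mix fits.
    """
    best = None
    for n_bt in range(n_units + 1):
        n_os = n_units - n_bt
        area = n_os * UNITS["OS"][0] + n_bt * UNITS["BT"][0]
        if area > area_bound:
            continue
        # Linear cost: total fault count, i.e. the sum of -ln(Y) per unit.
        cost = -(n_os * math.log(UNITS["OS"][1])
                 + n_bt * math.log(UNITS["BT"][1]))
        if best is None or cost < best[0]:
            best = (cost, n_os, n_bt)
    if best is None:
        return None
    cost, n_os, n_bt = best
    return n_os, n_bt, math.exp(-cost)


# Six multipliers with a hypothetical area bound that admits two BT units.
print(best_mix(6, 5994.0))
```

In the full formulation the choice also interacts with the schedule, since the slower BT unit may require more copies to meet the latency.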
B.2 MILP for yield constrained area minimization
In this problem, the yield constraint is obtained from (5):
where Y_bound, with 0 < Y_bound < 1, is a parameter preset by the user. Again, the optimization objective is:
subject to (6)-(9), (12)-(14), and (16).
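One way to see why a yield bound fits a linear model is sketched here, assuming the Poisson model and the block independence of Section II-A. The symbol λ_k for the fault count of block k follows that section; the exact form of the constraint used in the formulation may differ.

```latex
Y = \exp\Bigl(-\sum_{k=1}^{N} \lambda_k\Bigr) \ge Y_{bound}
\quad\Longleftrightarrow\quad
\sum_{k=1}^{N} \lambda_k \le -\ln Y_{bound}
```

Since −ln Y_bound is a constant once Y_bound is fixed, the bound is linear in the per-block fault counts. For example, Y_bound = 0.75 allows a total fault count of about 0.288.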
IV. Experimental results
GAUT [17] was used to generate the unscheduled DFGs. We solved the MILP problems using the commercial MILP solver CPLEX [13]. The AMPL [13] language was used to model the formulations listed above.
A. Verification of area-constrained maximal yield synthesis
In this section, we illustrate the benefits of the proposed approach by comparing the results of the area-constrained maximal yield optimization with the conventional minimal area synthesis. Figure 4 shows the scheduled DFG for the FIR16 [17] benchmark, with a latency requirement of 19 cycles, using the resource library specified in Table I. The left part is the minimal area solution, which uses a fast adder (CLA) and five fast multipliers (OS). We assume that each mapped functional unit (adder, multiplier) needs a 16-bit register. The total area of this design is 6226 units and the total yield is 0.716. The right part of Figure 4 shows the result of the maximum yield synthesis with an area constraint of 120% of the minimal area solution. A fast adder (CLA), two fast multipliers (OS), and three slow multipliers (BT) are used. The total yield of this design is 0.760, improving the yield by 6.0% at the expense of 4.5% extra area. In order to verify our yield-driven architectural synthesis, we generated the final layout for the above example and then used the EYES tool for final yield evaluation. The left part of Figure 5 is the layout we manually generated according to the scheduled DFG result in Figure 4. The right part is the critical area map reported by EYES. In this case, the reported yield value for the designs of Figure 4 is 0.717 for the minimal area solution and 0.773 for the yield-driven solution. These are quite close to the predicted values, and indicate a possible increase of 7.8% in the yield. Note that the layout we generated ignores the controller.
Another example is shown in Figure 6 . FFT is widely used in compression and decompression algorithms [17] . Our latency requirement is 13 clock cycles. The yield for the minimal area solution is 0.656, using two CLA adders and six OS multipliers. The yield for our yield-driven solution is 0.738, using two CLA adders and six BT multipliers. The relative yield improvement is 12.4% at an expense of 7.9% extra area.
B. Results for area-constrained maximal yield synthesis
We have selected several architectural synthesis examples [16] to evaluate our approach. We first performed a minimal area synthesis disregarding the yield (Algorithm B.2 without Equation (16)). Then, we generated an area-constrained maximal yield synthesis (Algorithm B.1). For each example, 3 to 5 points were selected with different latency requirements. We set the area constraint to be 1.2 times the minimal area. Figures 7 and 8 filter. Figure 7 shows the area for both the minimal area solution and the area-constrained maximal yield solution. As expected, the number of required resources decreases as the latency increases. Figure 8 shows a steady improvement of the yield. Figure 9 shows the maximal yield improvement relative to the yield of the minimal area solution. The area is also normalized to its value for the minimal area solution. From the figure, we see that the yield can be improved by as much as 12.4% with an area penalty of 7.9% (FFT). Note that for the Elliptic benchmark circuits, the minimal area solution and the area-constrained yield-driven solution are the same, since only adders are used in this circuit. In our library (RCA, CLA), the RCA adder is better in terms of both area and yield. A solution which reduces the area therefore also improves the yield, and as a result, the synthesis solutions are the same.
C. Transistor sizing effect in library design
Transistor sizing can not only change the delay and area, but also affect the yield. For example, by carefully sizing the RCA adder we can pay little in terms of the area and yield and greatly improve the delay, as shown in Table II .
From the table above, the sized RCA 2 adder makes a trade-off between the original RCA 1 adder and CLA adder. By slightly sizing the transistors, the delay can be reduced by half with the yield only slightly affected. We incorporate the sized RCA adder into our resource library and re-synthesize the Elliptic circuit. Table III shows the results. In both latency cases, the yield is further improved (2.5% to 3.2%) because of the newly introduced resources while the area can be reduced by as much as 38.6%. We may conclude that by carefully sizing the resources, the yield can be further improved even without extra area penalty.
V. Conclusions and Future work
In this paper, we have formulated the yield optimization problem during architectural synthesis as an MILP problem. In contrast to previous publications, the yield is optimized prior to the conventional circuit or layout design stages. In our experimental results, the yield is improved by as much as 12.4% with an area penalty of 7.9%. The yield may be further improved by incorporating the conventional method of transistor sizing. An important concern that may arise when using MILP is that the number of variables grows exponentially as the problem size increases. However, some recent papers [14, 15] present relaxations of the MILP problem to a linear programming (LP) problem, resulting in polynomial rather than exponential growth.
VI. Acknowledgment
