Abstract: Nanometer IC designs are increasingly challenged by manufacturing closure, i.e., being fabricated with high product yield, mainly due to aggressive technology scaling and increasing process/environmental variations. Recognizing how critical it is to address manufacturability during design for higher yield and tolerance to variations, both academia and industry have recently produced a surge of research activity. In this paper, we survey the key activities in synergistic physical synthesis and shed light on some future research directions.
I. INTRODUCTION
After four decades of Moore's Law [1] empowered by CMOS scaling [2], the semiconductor industry is facing unprecedented design and manufacturing challenges [3]. Design closure is more challenging to achieve than ever due to many entangled deep sub-micron (DSM) physical effects including interconnect, leakage, and noise. Furthermore, design closure based on conventional methodology can no longer guarantee manufacturing closure with high yield due to various manufacturing challenges. A fundamental reason is sub-wavelength optical lithography. The industry is currently stuck with 193nm optical lithography as the dominant integrated circuit manufacturing process, and this is likely to remain so for at least another 5 years. Therefore, 193nm optical lithography is expected to be used for the 32nm or even 22nm technology nodes, with the adoption of immersion lithography [3] and advanced resolution enhancement techniques (RET) [4]-[6]. There are other important manufacturing/process challenges, such as topography variations due to chemical-mechanical polishing (CMP), random defects due to missing/extra material, via void/failure, etc., all resulting in additional yield loss (including functional and parametric losses). Fig. 1 shows a simplified view of the manufacturing process from design to silicon and the various sources of process variations during manufacturing.
Conventionally, circuit design and the manufacturing process are two separate domains that usually communicate through design rules [7]. By following these rules, designs are guaranteed to be manufactured with sufficient yield. However, as the feature size gets smaller, design rules are becoming much more complicated and are no longer sufficient (either too conservative or not accurate) to convey manufacturing constraints. It is projected that as the feature size gets smaller, the product yield loss during the ramp-up stage grows dramatically. Hence, manufacturing effects will impact design success much more heavily than ever. A prominent feature of deep sub-wavelength (DSW) lithography is its proximity effects and layout-dependent variations, which can account for the majority of yield loss. Since such yield losses are due to systematic effects, they should be modeled and compensated during design.
Realizing the critical importance of addressing manufacturability during design (loosely termed "design for manufacturability", or DFM), there has been a surge of research activities recently from both academia and industry under the "DFM" umbrella. The majority of existing efforts can be roughly grouped into the following main categories: 1) Mask synthesis, also known as mask data preparation, through resolution enhancement techniques (RET) or fill insertion [5], [8], [9] to "massage" the mask. One should note that mask synthesis is actually post-design, so the name DFM is not very accurate. 2) Manufacturability-aware physical design and synthesis, through either yield-enhanced rules or manufacturing/yield models built into the physical synthesis, to compensate for the layout-dependent variations. 3) Variational characterization and analysis (e.g., statistical static timing analysis), and variation aware/tolerant designs.
In this paper, we will not cover mask synthesis, but mainly focus on several key aspects of synergistic physical synthesis for manufacturability/variability, including manufacturability aware physical design and variation tolerant designs. There are multiple physical synthesis stages for manufacturability optimization such as routing, placement, and clock synthesis. Intuitively, a natural way to extend existing physical design tools is to add more and more manufacturability-friendly design rules. However, as technology moves to 45nm and below, the number and complexity of rules quickly explode [10], [11]. They may be either too conservative/restrictive or not accurate at all. Meanwhile, the model-based approach holds promise [12], but fast yet high-fidelity and layout-dependent models are yet to be defined and derived. Once such a manufacturing model is developed in a compact form, it can be plugged into routing, placement, and even system/high level optimization to enable a synergistic DFM flow as shown in Fig. 2. Therefore, this is a wide-open area where much fundamental research is needed. The rest of the paper is organized as follows. Section II covers DFM aware routing for chemical-mechanical polishing (CMP), random defects, lithography, and redundant vias. In Sections III and IV, we discuss efforts in placement/synthesis and clock synthesis for manufacturability, respectively. We also briefly cover statistical optimization in Section V for completeness. Finally, we draw conclusions and point out some research directions in Section VI.
II. MANUFACTURABILITY AWARE ROUTING
Routing is naturally the first step during physical design in which to embed manufacturability awareness. Therefore, manufacturability aware routing has drawn considerable attention from industry and academia, and many different approaches at different routing stages have been proposed. Fig. 3 shows one example of manufacturability aware routing where multiple key DFM issues are addressed at different routing steps according to the characteristics of each issue. In this section, we discuss several key model-based manufacturability aware routing efforts on topography variation due to CMP, yield loss due to random defects, lithography-related printability, and via failure, respectively.
1) CMP aware Routing: Topography (thickness) variation after CMP is shown to be systematically determined by wire density distribution [13]-[17]. Even after CMP, intra-chip topography variation can still be on the order of 20-40% [13], [18]. Such topography variation leads not only to significant performance degradation due to increased wire resistance and capacitance, but also to acute manufacturing issues in etching and printability [13], [16]-[18]. The main cause of the copper CMP problems is the wire density distribution: higher wire density usually leads to copper thickness reduction due to erosion after CMP [14], [15]. Also, the reduced copper thickness after CMP can worsen the scattering effect, further increasing resistance [19].
In [20], a predictive copper (Cu) CMP model to evaluate the topography variation is proposed for the first time to guide CMP-aware global routing. Topography variation after CMP is estimated from the underlying metal density, which includes both wires and dummies. As dummy fill in turn depends on wire density, the required dummy density and Cu thickness can be predicted from a given wire density. The dummy density and copper thickness are estimated by a fast lookup table indexed by the current wire density from the global router. The lookup table is built from multiple VLSI designs by extracting wire density, dummy density, and the corresponding Cu thickness for each global routing grid. The CMP-aware global routing is illustrated in Fig. 4, where the predicted Cu thickness guides the global router toward less topography variation. According to [20], 7-10% improvement in topography variation and timing can be achieved.
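The lookup step described above can be sketched as follows. This is a minimal illustration, not the model of [20]: the table values, the linear interpolation, and the names `TABLE`, `predict`, and `thickness_range` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical calibration table, one row per sampled wire density:
# (wire density, required dummy density, Cu thickness in nm).
# The numbers are placeholders for illustration, not data from [20].
TABLE = [
    (0.10, 0.40, 310.0),
    (0.30, 0.25, 300.0),
    (0.50, 0.10, 285.0),
    (0.70, 0.00, 265.0),
]

def predict(wire_density):
    """Interpolate (dummy density, Cu thickness) for one global routing cell."""
    xs = [row[0] for row in TABLE]
    d = min(max(wire_density, xs[0]), xs[-1])   # clamp to the table range
    hi = min(bisect_right(xs, d), len(xs) - 1)  # upper bracketing row
    lo = max(hi - 1, 0)                         # lower bracketing row
    x0, dummy0, thick0 = TABLE[lo]
    x1, dummy1, thick1 = TABLE[hi]
    t = 0.0 if x1 == x0 else (d - x0) / (x1 - x0)
    return dummy0 + t * (dummy1 - dummy0), thick0 + t * (thick1 - thick0)

def thickness_range(wire_densities):
    """Spread of predicted Cu thickness over a set of cells (a crude cost)."""
    thicknesses = [predict(d)[1] for d in wire_densities]
    return max(thicknesses) - min(thicknesses)

if __name__ == "__main__":
    print(predict(0.40))
    print(thickness_range([0.10, 0.45, 0.70]))
```

A global router could add a term such as `thickness_range` to its congestion cost so that candidate routes that even out the predicted thickness are preferred; how [20] actually folds the prediction into the routing cost is not specified here.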
Another recent CMP aware routing approach is proposed in [21], where the wire density distribution is further optimized during track routing and layer assignment, in addition to global routing, using Voronoi diagrams and graph coloring techniques.
2) Critical Area-aware Routing: Smaller feature size makes nanometer VLSI designs more vulnerable to random defects, which can be further divided into open or short defects [22], [23]. While it is generally believed that the yield loss due to systematic sources is greater than that due to random defects during the technology and process ramp-up stage, the systematic yield loss might be largely eliminated when the process becomes mature and systematic variations are extracted/compensated [24]. However, the random defects, which are inherent due to manufacturing limitations, will still be there even for mature processes. Thus, their relative importance will indeed be much bigger for a mature process with systematic variations designed in [25].
2D-4
Random defect related yield can be well modeled by critical area analysis. In [26], [27], critical area aware yield optimization in channel routing is studied. A weight interval graph is proposed in [26] to facilitate the channel routing algorithm such that net merging in the vertical constraint graph minimizes the number of channels as well as the critical area. In [27], a wire segment is shifted either from the top layer to the bottom layer (net burying) or vice versa (net floating), like wrong-way routing, to reduce critical area in a greedy manner. Critical area minimization during global routing is proposed in [28], where a linearized critical area is one of the cost factors in a multicommodity flow optimization. Redundant link insertion to minimize open defects is another technique, proposed in [29].
However, these works have a few drawbacks: (a) one single defect size is considered, rather than a defect size distribution [26] , [27] , (b) the trade-off between open and short defects due to fixed routing area is ignored [26] , [27] , [29] - [31] , (c) localized/greedy optimization is performed, which may be suboptimal [29] , [30] , [32] - [34] , (d) wire adjacency information is not available for accurate critical area estimation [28] , [35] .
In [25], a track router, TROY, based on mathematical programming and graph theory, is proposed to find the best trade-off between open and short defects within a fixed routing area with respect to a defect size distribution, through wire planning (wire ordering, sizing, and spacing) in the context of track routing. The wire sizing and spacing problem can be optimally solved by second order cone programming (SOCP), which provides a globally optimal solution in an efficient way [36].
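To make the open/short trade-off concrete, the sketch below works through a deliberately tiny instance: one wire and one gap sharing a fixed pitch. It is not the SOCP formulation of [25]. It assumes the commonly used defect size density f(x) = 2*x0^2/x^3 for x >= x0, and the simplified critical areas L*(x - s) for shorts across a gap s and L*(x - w) for opens on a wire of width w, with s, w >= x0. Under those assumptions the averaged critical areas reduce to L*x0^2/s and L*x0^2/w. The function names are hypothetical.

```python
import math

def short_critical_area(length, spacing, x0):
    """Averaged short critical area of two parallel wires.

    Integral of length*(x - spacing) * 2*x0**2/x**3 for x from spacing to
    infinity, which evaluates to length * x0**2 / spacing (spacing >= x0).
    """
    return length * x0 ** 2 / spacing

def open_critical_area(length, width, x0):
    """Averaged open critical area of one wire (same integral, width >= x0)."""
    return length * x0 ** 2 / width

def best_split(pitch, open_weight, short_weight):
    """Split a fixed pitch into (width, spacing).

    Minimizes open_weight/width + short_weight/spacing subject to
    width + spacing == pitch. Setting the derivative to zero gives
    width = pitch * sqrt(a) / (sqrt(a) + sqrt(b)).
    """
    ra, rb = math.sqrt(open_weight), math.sqrt(short_weight)
    width = pitch * ra / (ra + rb)
    return width, pitch - width

if __name__ == "__main__":
    x0, length, pitch = 1.0, 100.0, 6.0
    # Assumed relative weights of open versus short faults.
    width, spacing = best_split(pitch, open_weight=1.0, short_weight=4.0)
    total = (1.0 * open_critical_area(length, width, x0)
             + 4.0 * short_critical_area(length, spacing, x0))
    print(width, spacing, total)
```

With several wires per panel the widths and spacings become coupled and the closed form no longer applies, which is where a convex formulation such as the SOCP mentioned above is useful.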
3) Lithography Aware Routing: The first attempt to address the lithography problem in the routing step is the litho-aware maze routing in [37]. Based on optical simulation, it stores the optical interference in a lookup table. When routing a new pattern, the interferences from the existing patterns in the neighborhood window are looked up, then summed to evaluate the OPC cost for the existing pattern. Then, a vector-weighted graph method is applied to map the grid routing model to a graph, where the edge cost is a vector consisting of the interference from the existing patterns as well as the impact of the new pattern on the existing patterns. With such a vector-weighted graph, maze routing can be cast as a multi-constrained shortest path problem, which can be solved by Lagrangian relaxation. The Lagrangian relaxation based approach, however, is very time-consuming. Furthermore, the interference metric in [37] is not a direct measurement, as some interference is "good" (such as biases used in OPC).
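The idea of relaxing a constrained shortest path with a Lagrange multiplier can be sketched on a toy graph. The code below is an illustration under stated assumptions, not the algorithm of [37]: each edge carries a length and a single scalar "interference" value (the vector cost is collapsed to one constraint), the multiplier is found by plain bisection, and the graph, budget, and function names are hypothetical. Because of the duality gap, this heuristic may miss the true constrained optimum on some graphs.

```python
import heapq

def shortest_path(graph, src, dst, lam):
    """Dijkstra on the scalarized edge cost length + lam * interference."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == dst:
            break
        for v, length, interf in graph.get(u, []):
            nd = d + length + lam * interf
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, length, interf)
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    # Walk back from dst to recover the path and its two totals.
    path, total_len, total_int, node = [dst], 0.0, 0.0, dst
    while node != src:
        node, length, interf = prev[node]
        total_len += length
        total_int += interf
        path.append(node)
    path.reverse()
    return path, total_len, total_int

def route(graph, src, dst, budget, lam_max=64.0, iters=30):
    """Shortest path with total interference <= budget, via bisection on lam."""
    result = shortest_path(graph, src, dst, 0.0)
    if result is None or result[2] <= budget:
        return result  # unreachable, or the unconstrained path is feasible
    best, lo, hi = None, 0.0, lam_max
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        cand = shortest_path(graph, src, dst, lam)
        if cand[2] <= budget:          # feasible: try a smaller penalty
            if best is None or cand[1] < best[1]:
                best = cand
            hi = lam
        else:                          # infeasible: penalize interference more
            lo = lam
    return best

if __name__ == "__main__":
    # node -> list of (neighbor, length, interference); values are made up.
    graph = {
        "A": [("B", 1.0, 5.0), ("C", 2.0, 1.0), ("D", 3.0, 4.0)],
        "B": [("D", 1.0, 5.0)],
        "C": [("D", 2.0, 1.0)],
    }
    print(route(graph, "A", "D", budget=5.0))
```

Each bisection step costs one full shortest-path search, which gives a sense of why the text above calls the Lagrangian relaxation approach time-consuming.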
In [12], a direct metric, edge placement error (EPE), is used to measure the difference between the printed and the desired silicon images. A litho hotspot map is generated based on efficient EPE computation, using fast lithography simulations through kernel decomposition and efficient table lookup. Guided by the litho hotspot map, RET-aware detailed routing (RADAR) is performed through wire spreading and ripup/rerouting to remove the litho hotspots [12]. A similar ripup and rerouting approach is also proposed in [38]; the difference from [12] is that a simple yet effective pattern search is adopted. Instead of using lithography simulations, the layout is compared against a set of known undesirable patterns to identify litho hotspots. Then, the identified undesirable routing patterns are either removed or modified by performing ripup and rerouting.
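The pattern-search style of hotspot detection can be illustrated with an exact sliding-window match on a gridded layout. This is only a sketch of the general idea: the grid encoding ('#' for metal, '.' for empty), the two example patterns, and the function name `find_hotspots` are assumptions, and the actual pattern representation and matching in [38] may differ.

```python
def find_hotspots(layout, patterns):
    """Return (pattern name, row, col) for every exact match in the layout.

    layout   -- list of equal-length strings, '#' = metal, '.' = empty
    patterns -- dict mapping a name to a small grid in the same encoding
    """
    hits = []
    rows, cols = len(layout), len(layout[0])
    for name, pat in patterns.items():
        ph, pw = len(pat), len(pat[0])
        for r in range(rows - ph + 1):
            for c in range(cols - pw + 1):
                # Compare the window with the pattern row by row.
                if all(layout[r + i][c:c + pw] == pat[i] for i in range(ph)):
                    hits.append((name, r, c))
    return hits

if __name__ == "__main__":
    # Hypothetical undesirable patterns, for illustration only.
    patterns = {
        "tip_to_tip": [".....", "##.##", "....."],
        "narrow_gap": ["#.#", "#.#"],
    }
    layout = [
        "#.#..",
        "#.#..",
        ".....",
        "##.##",
        ".....",
    ]
    for hit in find_hotspots(layout, patterns):
        print(hit)
```

Each reported location would then become a candidate for ripup and rerouting. A practical matcher would also need to handle rotations, mirroring, and tolerances, which this sketch omits.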
4) Redundant-Via Aware Routing:
A via may fail for various reasons such as random defects, electromigration, cut misalignment, and/or thermal stress induced voiding effects. However, if a redundant via (or double via) is inserted, it can work as a fault-tolerant replacement for the failing one. Redundant vias are known to be highly effective, leading to a 10-100x lower failure rate [39].
The first redundant-via aware routing is presented in [40]. The problem is formulated as a multi-objective maze routing problem by assigning a double-via cost to the routing graph, and is solved by applying a Lagrangian relaxation technique. In [39], the redundant via is reflected as a factor in the maze routing cost. Each original via has a different number of possible redundant via locations, namely its degree of freedom. Wherever a wire occupies a possible redundant via location during maze routing, it is penalized in inverse proportion to the degree of freedom of the corresponding original via.
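The degree-of-freedom penalty can be sketched on a single-layer grid. This is an illustrative reading of the idea, not the cost function of [39]: it assumes a redundant via may only be placed in one of the four grid cells adjacent to the original via, and the names `candidates`, `via_penalty`, and the `weight` parameter are hypothetical.

```python
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def candidates(via, occupied):
    """Free neighboring cells where a redundant via could still be placed."""
    x, y = via
    return [(x + dx, y + dy) for dx, dy in NEIGHBORS
            if (x + dx, y + dy) not in occupied]

def via_penalty(cell, vias, occupied, weight=1.0):
    """Extra maze-routing cost for letting a new wire occupy `cell`.

    For every original via that could use `cell` for its redundant via,
    add weight / degree_of_freedom, so that taking the last remaining
    candidate of a via costs the most.
    """
    cost = 0.0
    for via in vias:
        cands = candidates(via, occupied)
        if cell in cands:
            cost += weight / len(cands)
    return cost

if __name__ == "__main__":
    vias = [(0, 0), (5, 5)]
    # Cells already taken by wires around the first via.
    occupied = {(1, 0), (-1, 0), (0, -1)}
    print(via_penalty((0, 1), vias, occupied))  # last free spot of via (0, 0)
    print(via_penalty((5, 6), vias, occupied))  # one of four free spots
    print(via_penalty((9, 9), vias, occupied))  # not a candidate location
```

A maze router would add this penalty to the usual cell cost during wave expansion, steering wires away from cells that are the only remaining redundant-via location for some via.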
Redundant via insertion is also applied during the post-layout optimization stage. In [41], redundant via insertion is formulated as a maximum independent set (MIS) problem and solved by a heuristic approach. A different redundant via insertion method based on geotopography information is proposed in [42], where a redundant via is tried for each original via in a greedy manner. However, as an excessive number of vias can even worsen yield, redundant via insertion under a via density constraint is required; this is addressed in [43] using integer linear programming.
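The MIS view of post-layout redundant via insertion can be sketched with a simple minimum-degree greedy heuristic. This is a generic heuristic chosen for illustration, not necessarily the one used in [41]. It assumes a conflict graph in which each node is a candidate redundant via and an edge joins two candidates that cannot both be inserted (for example, they overlap or serve the same original via); the candidate names are made up.

```python
def greedy_mis(conflicts):
    """Greedy independent set on a conflict graph.

    conflicts -- dict: candidate -> set of conflicting candidates (symmetric)
    Repeatedly picks the remaining candidate with the fewest remaining
    conflicts (ties broken by name) and discards its neighbors.
    """
    remaining = {v: set(nbrs) for v, nbrs in conflicts.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed  # keep neighbor sets restricted to live nodes
    return chosen

if __name__ == "__main__":
    # a1 and a2 are two candidate locations for the same original via "a";
    # a2 overlaps b1, and b1 overlaps c1.
    conflicts = {
        "a1": {"a2"},
        "a2": {"a1", "b1"},
        "b1": {"a2", "c1"},
        "c1": {"b1"},
    }
    print(greedy_mis(conflicts))
```

The size of the returned set is the number of redundant vias inserted. A via density constraint such as the one considered in [43] is not modeled here.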
III. MANUFACTURABILITY AWARE PLACEMENT & SYNTHESIS
Compared to routing, earlier design stages such as placement and logic synthesis have so far seen fewer research results addressing manufacturability, due to the bottom-up nature of building a DFM framework. In [44], a detailed placement algorithm aware of sub-resolution assist features (SRAFs, or scattering bars) is proposed to enhance the depth of focus and critical dimension (CD) control. The key idea is that a certain minimum spacing between assist and poly, or between assist and assist, is required to prevent SRAFs from printing [4]. Besides, smaller SRAFs that fit into narrower spaces require higher-resolution mask inspection tools. Therefore, it is better to take SRAF insertion into account during placement to enable improved printability and easier mask inspection.
Lens aberration aware timing placement is presented in [45] . Since the lens aberration itself has significant impact on CD, timing analysis should keep this effect in mind for accurate post-silicon performance. Therefore, aberration aware timing analysis is proposed and applied for timing driven placement.
Another detailed placement algorithm based on local standard cell rearrangement/flipping is presented in [46]. Since each cell can show different printability depending on its adjacent cells, manufacturability can be improved by altering the neighborhood structures through cell rearrangement/flipping, which can be solved by finding a minimum-cost Hamiltonian path. However, this work does not take the crucial OPC step into account while computing the cell adjacency cost (printability cost).
There has been some effort to improve manufacturability at stages even earlier than placement. In general, logic synthesis tools can benefit from incorporating design for manufacturability models into their objective functions. In [47], manufacturability-aware logic synthesis is proposed, where cell manufacturability instead of cell area is considered in the cost function along with performance. As most logic synthesis algorithms are based on dynamic programming, the yield function, which is originally not additive, is approximated by a simpler additive form. The result shows that yield can be improved by up to 5% on the International Workshop on Logic Synthesis 93 (IWLS93) benchmark suite.
IV. VARIATION TOLERANT CLOCK SYNTHESIS
Process or environmental variations cause significant timing uncertainty and yield degradation in deep sub-micron technologies. Since clock distribution in synchronous VLSI design governs the chip performance, any small variation can incur additional clock skew resulting in performance or yield loss. Therefore, there are research works on robust or tunable clock synthesis.
In variation aware clock tree synthesis [48]-[51], the original zero-skew clock tree synthesis [52] is modified such that the resulting clock tree is more tolerant to variation effects such as temperature and wire-width variation. For example, in [49], the merging points of the sub-trees are modified under systematic temperature variations. A clock tree so obtained will have lower skew in the presence of temperature variation than a clock tree built using the traditional Deferred-Merge-Embedding (DME) method [52].
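As background for what "modifying the merging points" means, the sketch below computes the nominal zero-skew merging point of two sub-trees under the Elmore delay model. It is the textbook baseline that variation-aware methods perturb, not the method of [49]. The uniform unit resistance `r` and unit capacitance `c`, the lumped sub-tree loads, and the function names are assumptions of the example, and wire snaking for the case where the split falls outside [0, 1] is not handled.

```python
def elmore(r, c, length, load_cap, sub_delay):
    """Elmore delay through a wire of given length into a sub-tree."""
    return r * length * (c * length / 2.0 + load_cap) + sub_delay

def zero_skew_split(r, c, length, cap1, delay1, cap2, delay2):
    """Fraction x of the wire assigned to sub-tree 1 that equalizes delay.

    Solves elmore(x*L, cap1, delay1) == elmore((1-x)*L, cap2, delay2);
    the quadratic terms cancel, leaving a linear equation in x.
    """
    num = delay2 - delay1 + r * length * (cap2 + c * length / 2.0)
    den = r * length * (c * length + cap1 + cap2)
    x = num / den
    if not 0.0 <= x <= 1.0:
        raise ValueError("merge point off the wire; snaking would be needed")
    return x

if __name__ == "__main__":
    r, c, length = 0.1, 0.2, 100.0           # made-up unit R, unit C, length
    cap1, delay1, cap2, delay2 = 5.0, 10.0, 8.0, 4.0
    x = zero_skew_split(r, c, length, cap1, delay1, cap2, delay2)
    d1 = elmore(r, c, x * length, cap1, delay1)
    d2 = elmore(r, c, (1.0 - x) * length, cap2, delay2)
    print(x, d1, d2)                          # d1 and d2 coincide
    merged_cap = cap1 + cap2 + c * length     # load seen by the next level
    print(merged_cap)
```

If `r` or `c` differs between the two branches, for instance because of a temperature gradient, the balanced split moves away from this nominal value, which is the effect that variation-aware merging accounts for.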
The approach of constructing a non-tree clock network, which is inherently more tolerant to variation than clock trees, is taken in [53]-[61]. The non-tree clock network can be obtained either by starting from a tree and gradually adding cross links between the sinks [53], [55], or by starting off with a complete mesh and removing parts of the mesh to reduce wirelength [61]. These techniques essentially take advantage of the robustness of a clock mesh while lowering the wirelength/power overhead. It should be noted that most mesh designs are heavily manual and tuned for microprocessors. In a more recent work [62], an efficient framework (MeshWorks) is proposed for mesh planning, synthesis, and optimization to achieve similar robustness to variations with significantly lower buffer and wiring costs.
Another approach for DFM aware clock synthesis is to use post-silicon-tunable (PST) clock buffers so that the clock distribution can be tuned according to process variation after manufacturing. However, PST clock buffers not only incur area overhead but also require long design and tuning times. In [63], the clock shield nets are used to convey the skew information during the tuning phase. This is used to tune the buffer delays to almost eliminate the skew, either dynamically or statically. In [64], a statistical, bottom-up algorithm is proposed to reduce the number and area of PST buffers while maximizing tunability. Experimental results on ISCAS89 benchmark circuits show that this approach achieves up to a 90% reduction in the area or in the number of tunable clock buffers compared to existing design methods.
V. STATISTICAL DESIGN OPTIMIZATION
Statistical algorithms have been proposed for gate sizing to improve timing yield [65]-[69], and they have been applied to some microprocessor design flows [70]. Statistical techniques have also been proposed to minimize power under timing constraints [71], [72] and for buffer insertion to improve timing [73], [74]. On the other hand, recent studies claim that intelligent deterministic methods can lead to results similar to those of statistical methods in buffer insertion [75] and sizing [76].
Post-silicon optimization techniques have also been investigated extensively. In post-silicon techniques, after chips are manufactured, the frequency and leakage can be tested and an external voltage can be adjusted to help each chip meet the design target. Forward Body Biasing (FBB) is used to improve timing by lowering the threshold voltage (Vth), while Reverse Body Biasing (RBB) is used to reduce leakage with a higher threshold voltage. A bidirectional adaptive body bias (ABB) technique [77], [78] is used to compensate for die-to-die parameter variations by applying an optimum body bias voltage to each die. A hybrid approach of statistical gate sizing along with post-silicon tuning has also been proposed [79], [80].
VI. CONCLUSION AND FUTURE DIRECTION
In this paper, we have surveyed several key aspects and advancements of recent physical synthesis for manufacturability/variability in a synergistic manner, namely manufacturability (or DFM) aware routing, placement/logic synthesis, variation tolerant clock synthesis, and statistical optimization. It should be noted that DFM is under heavy research by both industry and academia. Thus, these individual stages by themselves, as well as integrated cross-stage, multi-objective optimizations, still have a lot of room for improvement as we gain a better understanding of the manufacturing/variation models at different levels of abstraction.
While the main focus of this paper is on physical synthesis, it should be noted that as the variations grow, we need more vertical integration through system-circuit-synthesis co-optimization. By understanding the pros and cons of each approach at each design level, we can find the most effective solution at the lowest overhead across the whole system-circuit-synthesis flow for a given challenge by harmonizing multiple approaches. For example, to make a system more fault tolerant, we can implement a system level error correction scheme (e.g., SEC, SECDED), introduce fault-tolerant circuitry, or synthesize the design with strict "error budgeting". Clearly, each approach involves overhead in terms of area and power, but depending on the target application, we should be able to find a combination of two or more approaches that provides a more cost-effective solution.
As CMOS scales into 45nm and beyond, the balance between regularity and flexibility in layout needs more comprehensive study, at not only the poly but also the metal layers. Regularity based approaches such as structured ASIC, restricted design rules (RDR), or regular fabrics [81] provide better printability due to highly tunable OPC and illumination, and are thus widely used in cell library design (at the poly layer), at a certain cost of compromised performance and area. Meanwhile, unlike the poly layer, where a limited number of cells account for most patterns, metal layers tend to have highly irregular patterns (especially at the lower metal layers used for pin access) depending on cell and blockage distributions. Therefore, how to achieve maximum regularity with minimal performance impact, in particular on the lower metal layers, should be studied further.
Looking further, extreme ultra-violet lithography (EUVL), which is under heavy research to replace 193nm lithography (e.g., for sub-22nm lithography if it can be deployed successfully on time) [3], still needs strong support from manufacturability aware design. It is known that EUVL suffers from a strong flare effect, which is also strongly dependent on layout density and pattern. Therefore, design for manufacturability/variability still plays a very important role, yet we need to take different DFM approaches to deal with EUVL or other next generation lithography by developing different manufacturing models.
To summarize, technology scaling at 45nm and beyond is reshaping the semiconductor industry. It is clear that synergistic design/process integration and joint optimization will be needed more than ever to further extend Moore's Law through multi-community efforts spanning process development, circuit design, and CAD tools.
