Abstract: Sleep transistors in industrial power-gating designs are custom designed with an optimal size. Consequently, sleep transistor P/G network optimization becomes a problem of finding the optimal number of sleep transistors and their placement as well as optimal P/G network grids, wire widths and layers. This paper presents a fake-via-based sleep transistor P/G network synthesis method, which addresses the requirements of industrial power-gating designs. The method produces optimal sleep transistor P/G networks by simultaneously optimizing sleep transistor insertion and placement as well as the power network grids and wires for minimum area and maximum routability within a given IR-drop target.
I. INTRODUCTION
Leakage power has been increasing exponentially with technology scaling [1] [2] and has reached over 30% of chip power in 90nm node and close to 45% in 65nm node. Power-gating is one of the most effective methods recently developed [3] - [11] to reduce power in the standby mode. In the power gating designs, low-leakage PMOS transistors are implemented as header switches to shut off power supplies (VDD) to parts of a design in standby or sleep mode. Alternatively, low-leakage NMOS transistors can be used as footer switches to control VSS supplies. Power gating of both VDD and VSS is not practical due to the low voltage headroom in sub-90nm node. The header or footer switches are often called "sleep transistors" which control power supply from the permanent power supply to circuit power supply or "virtual power supply".
The sleep transistor insertion results in a complex power network consisting of three components: the permanent power network, the sleep transistors and the virtual power network. These three components have combined effects on the power integrity and the area cost of the design. Having wide wires and dense metal grids helps reduce the area penalty of the sleep transistors. However, it costs more routing resources and has a negative impact on design routability. On the other hand, over-design of the sleep transistors improves the design routability at the cost of the large silicon area introduced by the sleep transistors. Consequently, the power network design becomes challenging in power-gating designs.
In industrial power-gating designs, the power network design is done in two modes: the normal operation mode and the wakeup mode. In the normal operation mode, all sleep transistors are conducting, which connects the permanent and virtual power networks to form a complete power supply network. In this mode, the objective of the power network synthesis is to minimize the area penalty introduced by the sleep transistors while meeting the IR-drop target. The number of sleep transistors and their placement are determined in this mode. In the wakeup mode design, the power-on control signals of the sleep transistors are connected in a daisy chain style or a more advanced structure for an optimal power-on sequence that minimizes the rush current during the wakeup period while meeting the wakeup latency target. This paper addresses the issues and methods in power-gating network synthesis for a power-gating design in the normal operation mode.
II. RELATED WORK
In early stage power-gating network synthesis [4] - [6] , a single sleep transistor is used for a circuit or module. The size of the sleep transistor is decided based on the worst case current of the module. These methods were developed for small size power-gating designs. For current size SoC designs, the single sleep transistor based methods are not suitable.
Later, cluster-based sleep transistor sizing methods [7] were developed where a sleep transistor is assigned to a cell cluster which was formed by grouping cells based on cell locality and mutually exclusive switching. The size of the sleep transistor was determined based on the cluster current. One of the issues in the cluster-based method is that the mutually exclusive switching based cell clustering often conflicts with the delay-based cell clustering. This results in a trade-off of design speed for sleep transistor size reduction. Large dynamic IR-drop variation between high and low activity clusters during operation is another problem in the cluster-based methods. The variation will cause performance degradation and possible hold time violations and hence malfunction.
In [8] [9], cell-based sleep transistor implementations were proposed, where each cell had a built-in sleep transistor. In this case the sizing became a small scale local problem and was based on the cell's worst case current [8] and timing criticality and temporal currents [9] . One of the major issues of the cell-based implementations is the large area penalty introduced by adding a sleep transistor in every cell. Another issue in cell-based sleep transistor implementations is the increased design sensitivity to PVT variations due to the power supply variations in individual cells introduced by sleep transistors.
Currently, most, if not all, industrial power-gating designs adopt a distributed sleep transistor network implementation [10] - [14], where sleep transistors are connected between the permanent power supply and the virtual power supply networks as shown in Fig. 1. The main advantage of the distributed sleep transistor implementation is the ability to share current charge or discharge among the sleep transistors. Consequently, it is less sensitive to PVT variation and introduces smaller IR-drop variations than the cell-based and cluster-based implementations. The sleep transistor sharing also reduces the area overhead significantly.
Power/Ground (P/G) network synthesis becomes a challenge in the distributed sleep transistor implementations because the P/G network consists of three components: a permanent power network, a virtual power network and an array of the sleep transistors that connect the permanent and virtual power networks. All of these components contribute to the quality of the sleep transistor P/G network in terms of IR drops, routing resources and sleep transistor introduced silicon area.
A number of P/G network synthesis methods for the distributed sleep transistor P/G network have been reported [10] [11] . In these methods, the permanent and virtual power networks are generated by conventional power network synthesis methods. Then, the sleep transistors are inserted and sized based on current drawn and IR-drop requirement. The sleep transistor insertion is defined by the user, based on either circuit clusters or design heuristic. The sleep transistor is sized based on the current through the sleep transistor branches to satisfy the IR-drop target. The IR-drops and the current of the sleep transistor branches are calculated by conventional P/G resistive network methods.
The work reported in [12] moved the sleep transistor P/G network synthesis a step forward by simultaneously sizing sleep transistors and P/G network wires using the sequential linear programming method. The sleep transistor insertion points are determined based on cut-sets of P/G branches that disconnect all cells from the power supply. Such a cut-set can be found by getting all P/G branches connected to a cell. The size of the sleep transistor of each cut-set is determined based on the current of the cells in the cut-set and the IR-drop target using a constant sheet channel resistance to model the sleep transistor's drive. The optimization variables include the resistances of not only the P/G wire branches but also the sleep transistor branches. However, the method still relies on pre-defined or pre-synthesized power network grids. The number of sleep transistors and their insertion positions are defined by the user.
Recently, a delay degradation effect based power-gating network synthesis method was proposed [13]. In this method, a simple closed-form analytic equation was proposed to model the delay degradation effect on a design due to the IR-drop on the sleep transistors. Differing from other methods, the method sizes the sleep transistors based on delay degradation constraints, namely trying to reduce sleep transistor size and hence leakage while meeting the design speed target considering the delay degradation introduced by the sleep transistors.
So far, the reported sleep transistor P/G network sizing methods are research oriented and tested on small academic benchmark designs. Most methods are based on an assumption that sleep transistors can be sized without constraints in a design during optimization. Looking at sleep transistors and their implementation in leading edge power-gating designs in industry, it is noted that the sleep transistors in industrial low-power designs are all custom designed and optimized with consideration of key effects, such as switch efficiency (Ion/Ioff) and area efficiency [14]. Such custom designed optimal sleep transistors are essential in industrial power-gating design to ensure that the benefit of leakage power reduction from sleep transistors outweighs the area and power penalties introduced by the sleep transistors. Although certain library vendors provide switch cells with a few different sizes, industrial power-gating designs commonly implement single-size sleep transistors in the power network for better variation control. Large size sleep transistors are only used manually to replace those sleep transistors where large IR-drops are identified in post-layout IR-drop analysis. Consequently, it becomes impractical to freely size sleep transistors in industrial power-gating designs. The sleep transistor P/G network optimization becomes a problem of finding the optimal number of sleep transistors and their placement as well as optimal P/G network grids, wire widths and layers. Moreover, the optimization of the sleep transistor placement and P/G network is usually constrained by custom layout design rules.
Based on the understanding of the requirements from industrial sleep transistor P/G network synthesis, a practical power-gating network synthesis method has been developed for industrial power-gating design. The method simultaneously optimizes sleep transistor insertion and placement as well as power network grids and wires for minimum area, maximum routability with a given IR-drop target. In this method, the permanent power network, the virtual power network and the sleep transistor arrays are modeled as a single network. A fake via concept is introduced to model sleep transistor's channel resistance, layout positions and physical connections. The introduction of the fake vias enables us to leverage existing industrial P/G network synthesis methods and tools in the sleep transistor P/G network synthesis.
In the remainder of the paper, the fake via concept and the methods of sleep transistor P/G network modeling and synthesis will be described in detail.
III. SLEEP TRANSISTOR P/G NETWORK MODEL
The sleep transistor power network consists of a permanent power (VDD) network, a virtual power (VVDD) network and distributed sleep transistors that connect the two networks. The VDD and VVDD networks can be represented by two resistive networks as shown in Fig. 2. The current source at a VVDD network node is the worst-case current of the cells connected to the node. Although the current signature of a cell is dynamic in nature, namely, it varies with time as signals transition, the worst-case average current signatures are commonly used in industrial P/G network synthesis, because accurate dynamic current is not available at the power planning stage. For the same reason, the sleep transistor power network is synthesized based on the worst-case average cell currents to meet user defined IR-drop and EM targets. The dynamic IR drops are controlled by on-chip decoupling capacitor insertion and placement.
In a power-gating design, active cells receive their power supply from the virtual power network through the sleep transistors and the permanent power network. When the sleep transistors are conducting in operating mode, the virtual and permanent power networks effectively become a single network. To achieve an optimal sleep transistor power network, the sleep transistor distribution and the permanent and virtual power networks should be optimized simultaneously and as a whole in the sleep transistor power network synthesis.
To address this need, a fake via concept is introduced in the sleep transistor power network synthesis. In this concept, the electrical conductance and physical connection of a sleep transistor are modeled by a fake via. The fake via connects the permanent and virtual power networks in a similar way as a physical via. However, via resistance is defined by the channel resistance of a conducting sleep transistor biased at Vds equaling a sleep transistor IR drop target. SPICE analysis of the sleep transistor channel resistance with different Vds bias shows (Fig. 3) that the channel resistance is not sensitive to the Vds variation at small Vds bias voltage. Therefore, a constant value of the channel resistance can be used to fairly accurately model a conducting sleep transistor.
The fake via also models the layout position of a sleep transistor. The position of a fake via between the permanent and virtual networks defines the placement position of a sleep transistor. The size of the fake via is the size of the sleep transistor. With the introduction of the fake via, it becomes applicable to model the sleep 
$$G x = b \tag{1}$$
where $G$ is the conductance matrix based on the sleep transistor power network synthesis model (Fig. 4), $x$ is the vector of voltages at the nodes in Fig. 4, and $b$ consists of the currents of the cells at the nodes.
Adding the power supplies (Vdd) as voltage sources to the power network at the positions of the power pads, equation (1) becomes a modified nodal equation in which x also contains branch currents and b includes the voltage sources. G and b are formed by the methods in [15]. Equation (1) can be solved by efficient linear equation solvers. Once the voltages at the nodes are obtained, the IR drop at a node is calculated as the difference between the supply voltage and the node voltage. The branch current density for EM analysis is calculated by dividing the branch current, which is obtained from the branch voltage and wire resistance, by the wire cross-section area. The model in Fig. 4 and equation (1) are the foundation of the sleep transistor P/G network synthesis method that will be described in the remainder of the paper.
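The nodal formulation above can be illustrated on a toy network. The following is a minimal sketch, not the solver used in this work: it assumes a four-node network (nodes 0 and 1 on VDD, nodes 2 and 3 on VVDD), two fake vias with a hypothetical constant resistance `R_FAKE_VIA`, and hypothetical wire resistances and cell currents. For brevity the pad voltage source is folded into the right-hand side as a Norton equivalent instead of being kept as an extra unknown of the modified nodal equation, and the function names `build_system` and `solve_linear` are illustrative only.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def build_system(n_nodes, resistors, pad_links, sinks, vdd):
    """Stamp G and b. resistors: (node_i, node_j, ohms); pad_links: (node, ohms);
    sinks: {node: amps drawn by the cells at that node}."""
    g = [[0.0] * n_nodes for _ in range(n_nodes)]
    b = [0.0] * n_nodes
    for i, j, ohms in resistors:
        cond = 1.0 / ohms
        g[i][i] += cond
        g[j][j] += cond
        g[i][j] -= cond
        g[j][i] -= cond
    for node, ohms in pad_links:       # ideal pad at vdd behind a resistance
        g[node][node] += 1.0 / ohms
        b[node] += vdd / ohms
    for node, amps in sinks.items():   # cell currents leave the node
        b[node] -= amps
    return g, b


VDD = 1.0         # volts (assumed)
R_FAKE_VIA = 0.5  # ohms, hypothetical constant channel resistance
resistors = [
    (0, 1, 0.05),        # VDD strap segment
    (2, 3, 0.2),         # VVDD rail segment
    (0, 2, R_FAKE_VIA),  # fake via = conducting sleep transistor
    (1, 3, R_FAKE_VIA),  # fake via = conducting sleep transistor
]
g, b = build_system(4, resistors, pad_links=[(0, 0.02)],
                    sinks={2: 0.05, 3: 0.05}, vdd=VDD)
voltages = solve_linear(g, b)
ir_drops = [VDD - v for v in voltages]  # IR drop = supply voltage - node voltage
```

With these assumed values the largest drop, about 28.4 mV, appears at node 3, the VVDD node farthest from the pad. Most of that drop is across the fake via, which is why the number and position of the fake vias matter as much as the strap resistances.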
IV. SLEEP TRANSISTOR LAYOUT CONSTRAINTS
In industrial power-gating designs, the sleep transistors are often designed to occupy two rows of standard cell placement in order to achieve optimal Ion/Ioff ratio and area efficiency. Consequently, these sleep transistors can only be placed in those rows that result in correct alignment of P/G rails of the sleep transistors and the standard cells in the rows.
For example, in Fig. 5, the sleep transistor in black shows the correct placement, where the VVDD and VSS rails of the sleep transistor are aligned with the VVDD and VSS rails of the mirrored standard cell rows. On the other hand, the placement of the sleep transistor in red is not correct because the VVDD rail of the sleep transistor is misaligned with the VSS rails of the standard cell rows.
Fig. 5. Sleep transistor row placement constraints
Besides the row placement constraint, the sleep transistors need to be placed such that the permanent power straps can drop vias to connect to the corresponding power pin on the sleep transistor so as to minimize IR drops and routing resources. For instance, in Fig. 6, the sleep transistor in black can successfully drop vias to the VDD power pin (in pink) on the sleep transistor, while the sleep transistor in red is improperly placed away from the VDD power strap and additional wires are required to connect the VDD strap to the VDD pin on the sleep transistor. In industrial power-gating designs, these constraints need to be properly addressed and met for a production worthy 
Fig. 6. Sleep transistor pin alignment constraints
In the proposed method, these layout constraints are implemented as special rules to guide the sleep transistor placement in the sleep transistor power network synthesis.
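The two layout constraints can be expressed as a simple legality check on a candidate fake via position. The following is a hedged sketch only: the paper does not give the rule encoding, so the row-parity test (legal double-row sites start on rows of one parity because mirrored rows alternate their rails), the minimum pin-to-strap overlap, and all names (`Strap`, `row_is_legal`, `pin_reaches_strap`, `placement_is_legal`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strap:
    """Horizontal extent of a vertical permanent VDD strap (assumed units: um)."""
    x_left: float
    x_right: float


def row_is_legal(row_index, legal_parity=0):
    # Assumption: rows are mirrored alternately, so a double-row switch cell
    # only sees correctly aligned rails when it starts on rows of one parity.
    return row_index % 2 == legal_parity


def pin_reaches_strap(pin_left, pin_right, straps, min_overlap):
    # The VDD pin must overlap some strap by at least min_overlap so that
    # vias can be dropped directly onto the pin.
    for strap in straps:
        overlap = min(pin_right, strap.x_right) - max(pin_left, strap.x_left)
        if overlap >= min_overlap:
            return True
    return False


def placement_is_legal(row_index, cell_x, pin_offset, pin_width, straps,
                       min_overlap=0.5, legal_parity=0):
    pin_left = cell_x + pin_offset
    pin_right = pin_left + pin_width
    return (row_is_legal(row_index, legal_parity)
            and pin_reaches_strap(pin_left, pin_right, straps, min_overlap))
```

For example, with a single strap `Strap(10.0, 12.0)`, `placement_is_legal(2, 9.5, 0.8, 1.0, [Strap(10.0, 12.0)])` accepts the site, while the same x position on row 3, or a site at x = 20.0, is rejected.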
V. PRINCIPLE OF THE SLEEP TRANSISTOR P/G NETWORK SYNTHESIS
The sleep transistor power network synthesis is formulated as a constrained optimization problem, as shown in (2) where the sleep transistor P/G network is optimized on the topology of P/G network, the wire width and layer of P/G straps, and the number and placement of the sleep transistors.
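$$\min \; w \cdot A_{sleep} + A_{straps} \quad \text{subject to} \quad IR_n \le IR_{target} \;\; \forall n, \qquad j_m \le j_{EM} \;\; \forall m \tag{2}$$
where $A_{sleep}$ is the total silicon area of the sleep transistors, $A_{straps}$ is the total metal area of the VDD and VVDD wires, $w$ is the weight on the cost of the sleep transistor area, $IR_n$ is the IR drop at node $n$, $IR_{target}$ is the defined IR-drop target, $j_m$ is the current density of VDD network branch $m$, and $j_{EM}$ is the maximum current density defined to prevent EM violations.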
$A_{sleep}$, $A_{straps}$, $IR_n$ and $j_m$ are all functions of the VDD network grids, wire width, layer, and sleep transistor placement. The IR-drop and EM current density calculations are based on the solution of equation (1). $A_{sleep}$ and $A_{straps}$ are calculated based on the number of sleep transistors and the metal area of the VDD network generated in an iteration of the optimization. The weight $w$ is applied to $A_{sleep}$ in the objective function to direct the optimization to focus more on the sleep transistor area than on the VDD network metal area, because the sleep transistor area is more expensive than metal area. The value of $w$ can be defined by users. The default value is 1.
Fig. 7. Sleep transistor P/G network synthesis flow
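As a small illustration of how a candidate network could be scored, the sketch below assumes that the objective is the weighted sum of the sleep transistor area and the strap metal area, which is how the definitions above read, and that the constraints are simple upper bounds on every node IR drop and every branch current density. The function names and the numeric values are hypothetical.

```python
def objective(a_sleep, a_straps, w=1.0):
    """Weighted area cost: w * A_sleep + A_straps (w defaults to 1)."""
    return w * a_sleep + a_straps


def constraints_met(ir_drops, ir_target, current_densities, j_em):
    """True when every node meets the IR-drop target and every branch
    stays at or below the EM current density limit."""
    return max(ir_drops) <= ir_target and max(current_densities) <= j_em


# Hypothetical candidate: areas in arbitrary units, IR drops in volts,
# current densities normalised to the EM limit.
cost = objective(a_sleep=2.0, a_straps=10.0, w=5.0)
feasible = constraints_met([0.030, 0.045], 0.050, [0.4, 0.9], 1.0)
```

Raising `w` makes each added sleep transistor more expensive relative to metal, so an optimizer using this cost would tend to trade wider or denser straps for fewer switch cells.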
In this synthesis, a sleep transistor power network is represented by the model in Fig. 4 and is optimized by minimizing the objective function (2). The synthesis is an iterative process in which the sleep transistor insertion and placement as well as the VDD and VVDD networks are optimized and adjusted. Once the sleep transistor power network is updated in an iteration, the objective function (2) is evaluated and checked for optimization convergence. The IR-drop and EM constraints defined in (2) are calculated by the methods described in Section III based on the network node voltages obtained by solving (1) using an efficient matrix solver. The sleep transistor distribution, represented by the fake via distribution, is optimized together with the power networks. In each iteration, the optimization engine generates updated values of the optimization parameters, such as P/G network branch resistances and fake via resistances. If the changes of the parameters are smaller than the defined threshold, the power straps are sized based on the updated parameter values while the fake vias are fixed, because sleep transistors are not sizable. Otherwise, the P/G network topology (grids) is re-generated based on the updated parameter values. Heuristic rules are used to adjust the number and pitch of the fake vias based on the conducting resistance of the sleep transistor and the layout constraints described in Section IV. Such heuristic rules include scaling up the power grid pitch when the strap wire width is smaller than the defined minimum wire width and scaling it down if the strap wire width becomes larger than the maximum wire width. After that, the newly generated sleep transistor P/G network is evaluated again in the next optimization iteration. This process continues until the optimization converges. At that point, the fake vias are removed and the sleep transistors are inserted at the positions of the fake vias. The layout constraints of the sleep transistors are verified and their power and control signal pins are connected accordingly. Finally, the power straps and via arrays are created and verified to comply with process and design specific layout rules, and a sign-off quality sleep transistor P/G network is generated. The synthesis flow is outlined in Fig. 7.
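The interplay between fixed-size switch cells and sizable straps can be illustrated with a toy loop. This is not the optimization engine described above: it is a greedy stand-in under strong assumptions, namely a lumped series model (parallel straps in series with parallel identical switches), unit area costs, and a rule that always adds whichever element removes the most IR drop per unit of weighted area. All names and values are hypothetical.

```python
def ir_drop(total_current, n_straps, n_switches, r_strap, r_on):
    """Lumped model: n_straps parallel straps in series with n_switches
    parallel fixed-size switches, all carrying total_current."""
    return total_current * (r_strap / n_straps + r_on / n_switches)


def greedy_synthesis(total_current, r_strap, r_on, ir_target,
                     strap_area, switch_area, w=1.0, max_iter=10_000):
    """Add straps or fixed-size switches until the IR-drop target is met."""
    n_straps, n_switches = 1, 1
    for _ in range(max_iter):
        now = ir_drop(total_current, n_straps, n_switches, r_strap, r_on)
        if now <= ir_target:
            return n_straps, n_switches, now
        # IR-drop reduction per unit of (weighted) area for each move.
        gain_strap = (now - ir_drop(total_current, n_straps + 1, n_switches,
                                    r_strap, r_on)) / strap_area
        gain_switch = (now - ir_drop(total_current, n_straps, n_switches + 1,
                                     r_strap, r_on)) / (w * switch_area)
        if gain_strap >= gain_switch:
            n_straps += 1
        else:
            n_switches += 1  # switches are never resized, only counted
    raise RuntimeError("IR-drop target not reached within max_iter")


n_straps, n_switches, final_ir = greedy_synthesis(
    total_current=1.0, r_strap=0.2, r_on=5.0, ir_target=0.05,
    strap_area=1.0, switch_area=1.0)
```

Because the switch size is fixed, the only switch-related degree of freedom is the count, which mirrors the statement that fake vias are adjusted in number and pitch rather than in resistance.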
VI. IMPLEMENTATION OF THE SLEEP TRANSISTOR P/G NETWORK SYNTHESIS
The size of equation (1) grows quadratically with the number of instances in a design. This results in long run times. The number of optimization iterations also significantly impacts the run time. To improve the efficiency, a number of techniques have been implemented in the synthesis method.
Firstly, a coarse grain current source concept is adopted to model cell current sources of those standard cells in the P/G network grid as one or a few of cell cluster current sources. This greatly reduces the equation size. The cluster current sources are generated based on the user-defined worst-case power and supply voltage with an assumption of uniformly distributed cell placement. It is worth noting that final cell placement is not available at power planning stage. Therefore, the assumption of uniform cell placement is acceptable and commonly used in the power planning in SoC design industry.
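The coarse grain current source idea can be sketched in a few lines. The example assumes a rectangular grid of clusters and a strictly uniform split of the total current, consistent with the uniform placement assumption above; the function name `cluster_currents` and the grid dimensions are illustrative choices, not part of the method's published interface.

```python
def cluster_currents(worst_case_power_w, vdd_v, n_cols, n_rows):
    """Split the total worst-case current (P / Vdd) evenly over a
    n_rows x n_cols grid of cluster current sources."""
    total_current = worst_case_power_w / vdd_v
    per_cluster = total_current / (n_cols * n_rows)
    return [[per_cluster] * n_cols for _ in range(n_rows)]


# Example: 1 W at 1.0 V spread over a 4 x 5 grid of clusters.
grid = cluster_currents(1.0, 1.0, n_cols=4, n_rows=5)
```

Here 20 cluster sources of 0.05 A each replace the individual cell current sources, so the number of current-source nodes in equation (1) depends on the grid resolution rather than on the instance count.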
Secondly, efficient industrial P/G network synthesis methods and heuristic sleep transistor insertion equations are leveraged to generate the initial sleep transistor P/G network, based on design size, macro placement, worst-case power dissipation, IR-drop target, maximum current density, sleep transistor channel resistance and layout constraints. As a result, the quality of the initial sleep transistor P/G network is significantly improved, requiring fewer optimization iterations in the synthesis.
Thirdly, virtual power planning and virtual sleep transistor placement techniques are implemented in the method to efficiently create, update, and extract layout-based data of the sleep transistor power network based on the synthesis model (Fig. 4). This avoids the overhead of generating power networks and sleep transistors in layout during the optimization iterations. After the optimization converges in the virtual power plan space, the solution is committed to the layout implementation, where power straps and via arrays are created and sleep transistors are inserted to satisfy the layout rules and constraints.
VII. APPLICATION RESULTS
The method has been applied to a 126mm² 90nm industrial design. The design has 1.1M instances including 56 hard macros and runs at a 185MHz main clock frequency. 80 power pads are placed around the chip. The permanent Vdd network is composed of the top two metal layers, M5 and M6. The virtual Vdd network is built in the M1 and M2 layers. The sleep transistor is a double-row custom designed switch cell from an industrial power-gating design. The switch cells must be placed under M6 straps to get direct via connections to Vdd. To verify the quality of the method, the sleep transistor power network of the design was synthesized twice: one run with the total IR-drop target set at 100mV, which is a generally accepted IR-drop target in industry, and another run with a tight IR-drop target of 60mV. The synthesized sleep transistor power networks are analyzed using an industrial power plan analysis tool [16]. The results are shown in Table I. In the first case, the switch cells take only 0.26% of the silicon area (core area). The permanent and virtual Vdd networks also have very low metal utilization due to the simultaneous optimization of the sleep transistors and the permanent/virtual power networks. The M1 rail is not quoted in the table, because it is the standard cell rail defined by the process specification and is fixed. In the second case, both switch cell and power net utilization increase due to the deliberately tightened IR-drop target, which is challenging for conventional power planning tools to meet even without the sleep transistors. Yet, the utilizations are still very small and excellent for an industrial design. There is no EM violation in either case. The total maximum IR-drops on the sleep transistors, permanent Vdd and virtual Vdd networks are slightly over the IR-drop target and acceptable.
Ideally, the method would be compared with other sleep transistor P/G network synthesis methods. However, no published examples of such methods applied to industrial power-gating designs with custom designed sleep transistors and layout constraints have been found in the literature. Therefore, the proposed method is compared with an industrial script-based custom sleep transistor power network generation method used in production power-gating designs. Three testcase designs, associated with the script-based method, are used for the comparison. The designs are implemented in a 90nm process with 6 metal layers and have the same chip area of 10.5mm². However, the power dissipations of the designs are specified differently, at 1W, 1.2W and 1.4W respectively, for different clock speeds, design complexity and cell utilizations. In the three designs, the max IR-drop target is defined at 50mV, corresponding to 5% of Vdd at 1.0V. 54 power pads are evenly placed around the design, connecting to power rings. The results of the method and the script-based custom method are compared in terms of max IR-drop, total silicon area of the sleep transistors and total metal area of the permanent and virtual power networks. The results are shown in Table II. No EM violation was reported in any of the cases. The results show that the method produces significantly better sleep transistor power networks than the script-based custom method. The max IR-drop is reduced by 13% from 52.5mV to 46.6mV in Design1, and by 13% and 15% in Design2 and Design3 respectively. The silicon area of the sleep transistors is significantly reduced by the method, resulting in only 26% of the sleep transistor area required by the custom method in Design1 and 36% and 42% of that of the custom method in Design2 and Design3 respectively. The metal areas of the power straps in the M2, M5 and M6 layers are increased in the method by 29%, 24% and 25% in Design1, Design2 and Design3 respectively.
It is worth mentioning that the metal area has a much smaller impact on the silicon usage than the sleep transistor silicon area, because 6 layers of metal are available for routing and silicon area is directly linked to chip cost. The average saving of 65% of the silicon area used by the sleep transistors in the method compared with the custom method is highly valuable to production low-power designs. It is also worth noting that the script-based custom method is a layout-based heuristic method which generates the sleep transistor P/G network based on the cell count and the average power dissipation estimated from chip size and clock frequency, regardless of the fact that power dissipation varies significantly with switching activity and cell density, which are determined by design complexity and type. Consequently, the sleep transistor power networks produced by the custom method are the same in the three designs due to the same chip size and clock frequency. The IR-drops from the custom script-based method are larger than those from the proposed method, because the method optimizes the sleep transistor distribution together with the power networks, i.e. grid pitch, metal widths and layers.
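The averages quoted for the three testcase designs follow directly from the per-design percentages reported above. The short calculation below only restates that arithmetic; the variable names are illustrative and the inputs are the figures given in the text, not additional data.

```python
# Per-design figures as reported in the text (Design1, Design2, Design3).
sleep_area_vs_custom = [26, 36, 42]  # % of the custom method's switch area
ir_drop_reduction = [13, 13, 15]     # % reduction in max IR-drop
metal_area_increase = [29, 24, 25]   # % increase in M2, M5, M6 strap area


def mean(values):
    return sum(values) / len(values)


avg_area_saving = 100 - mean(sleep_area_vs_custom)  # about 65.3 %
avg_ir_reduction = mean(ir_drop_reduction)          # about 13.7 %
avg_metal_increase = mean(metal_area_increase)      # 26.0 %

# IR-drop target used for the three designs: 5% of a 1.0 V supply, in mV.
ir_target_mv = 0.05 * 1.0 * 1000
```

Rounded to whole percentages these give a 65% average switch area saving, a 14% average IR-drop reduction and a 26% average metal area increase.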
VIII. CONCLUSION
Sleep transistors in industrial power-gating designs are custom designed as switch cells with fixed sizes for optimal switching and area efficiencies. Consequently, the sleep transistor P/G network optimization becomes a problem of finding the optimal number of the fixed size switch cells and their placement as well as optimal P/G network grids, wire widths and layers. Moreover, layout constraints on the placement of the custom designed sleep transistors must be considered in the sleep transistor P/G network synthesis.
A practical sleep transistor P/G network synthesis method has been developed to address these requirements from industrial power-gating designs. The method simultaneously optimizes sleep transistor insertion and placement as well as the power network grids and wires for minimum area and maximum routability within a given IR-drop target. The sleep transistor imposed layout constraints are considered by the method in the power network synthesis.
In this method, a fake via concept is introduced to model conducting channel resistance, physical connection and placement of the sleep transistors in the sleep transistor P/G network synthesis. With the fake vias, the three components of the sleep transistor P/G network, namely the permanent P/G network, the sleep transistor distribution and the virtual P/G network, are modeled as a single network and optimized together to minimize area penalty while meeting the defined IR-drop and EM targets. The fake vias concept also makes it possible to leverage industrial power network synthesis methods in the sleep transistor P/G network synthesis to generate the optimal numbers and positions of the sleep transistors needed for a design.
This method has been applied successfully to industrial low-power designs and produced significantly better sleep transistor P/G networks compared with a script-based sleep transistor implementation method used in industrial power-gating designs. The sleep transistor area was reduced by 65% and the IR-drop was lowered by 14% on average, at the small price of a 26% increase in metal area in the M2, M5 and M6 layers.
