"Post place and route design-technology cooptimization for scaling at single-digit nodes with constant ground rules," J. Micro/Nanolith. MEMS MOEMS 17(1), 013503 (2018)
Abstract. Standard-cell design, technology choices, and place and route (P&R) efficiency are deeply interrelated in CMOS technology nodes below 10 nm, where cells with fewer tracks and higher pin densities pose increasingly challenging problems to the router in terms of congestion and pin accessibility. To evaluate and downselect the best solutions, a holistic design-technology co-optimization approach leveraging state-of-the-art P&R tools is thus necessary. We adopt such an approach using the imec N7 technology platform, with contacted poly pitch of 42 nm and tightest metal pitch of 32 nm, by comparing the post P&R area of an IP block for different standard cell configurations, technology options, and cell heights. Keeping the technology node and the set of ground rules unchanged, we demonstrate that a careful combination of these solutions can enable area gains of up to 50%, comparable with the area benefits of migrating to another node. We further demonstrate that these area benefits can be achieved at isoperformance with >20% reduced power. At the end of the CMOS roadmap, conventional scaling enacted through pitch reduction is made increasingly challenging by constraints imposed by lithography limits, material resistivity, manufacturability, and ultimately wafer cost; the approach shown herein therefore offers a valid, attractive, and low-cost alternative.
Introduction
This paper is based on the imec N7 technology platform, with contacted poly pitch (CPP) of 42 nm and tightest metal pitch (M_X) of 32 nm. This technology is predicted to be similar to industry 5 nm. 1 The plot in Fig. 1 shows the scaling landscape for advanced nodes. The horizontal axis specifies the CPP in nm; as illustrated in the standard cell template, reducing the CPP shrinks the width of the cell. The vertical axis reports the M_X pitch, which is instead correlated with the cell-height reduction. It is, therefore, reasonable to adopt the product of the CPP and M_X as a metric of area scaling at the standard-cell level. The constant-product curves can then be considered isoarea and potentially equivalent ways to implement the same node. It is important to observe that, especially for nodes below 28 nm, multiple and mismatching naming conventions appeared for designating similar technologies, further validating the need for a quantitative definition.
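As a minimal sketch of the area-scaling metric described above, the product CPP × M_X can be evaluated for the imec N7 pitches quoted in the text. The function name and the 0.7× shrink applied to both pitches are illustrative assumptions, not values taken from Fig. 1.

```python
# Area-scaling metric from the text: product of contacted poly pitch (CPP)
# and tightest metal pitch (M_X), both in nm.

def area_metric(cpp_nm: float, mx_nm: float) -> float:
    """Return CPP * M_X in nm^2 (hypothetical helper name)."""
    return cpp_nm * mx_nm

# imec N7 pitches quoted in the text.
n7 = area_metric(42.0, 32.0)

# Assumption for illustration only: both pitches shrunk by a factor of 0.7.
shrink = 0.7
scaled = area_metric(42.0 * shrink, 32.0 * shrink)

print(f"imec N7 metric: {n7:.0f} nm^2")
print(f"area ratio after a 0.7x shrink of both pitches: {scaled / n7:.2f}")
```

Any two (CPP, M_X) pairs returning the same product lie on the same isoarea curve under this metric.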
Motivation of the Work
In earlier nodes, the 0.5× area scaling demanded by Moore's law was seamlessly achievable through a 0.7× reduction of the CPP and M_X dimensions. Scaling beyond imec N7 dimensions exclusively through pitch scaling is made increasingly challenging by several factors. The first limitation is patterning resolution, whose cliffs are shown in Fig. 1. Dashed lines indicate the patterning techniques available. Although a detailed analysis of the trade-offs between different patterning options is beyond the scope of this paper, we observe that even self-aligned quadruple patterning (SAQP), which offers the highest resolution, encounters a printability limit at 20-nm pitch. The second limitation is determined by device performance. As explained in Ref. 3, reducing the CPP below 40 nm severely impacts the transistor I_on, with up to 2× current degradation at 32-nm CPP. This prevents isoperformance scaling. In addition, the M_X parasitics exhibit a superlinear increase as the pitch is tightened, as shown in Ref. 4, adversely affecting both power and performance. Finally, very advanced lithography techniques are required, which significantly increase the wafer cost at every technology node transition. 5
Post P&R DTCO Approach
Exploring design-technology solutions alternative to pitch scaling, which make standard cells smaller for the same set of ground rules, is an attractive way to reduce area for digital IP blocks. 6 Smaller-footprint cells, however, also imply a higher pin density and a reduced number of routing tracks, which pose pin accessibility and routing congestion challenges to the place and route (P&R) tool. Moreover, as a result of the reduced routing resources within the cell, a subset of standard cells might need to be enlarged horizontally, mitigating the benefits of track-height reduction. Taking both of these potential side effects into account requires multiple sets of standard cells to be designed and their routability to be evaluated on a realistic cell distribution. In other words, post P&R experiments are necessary to quantify how efficiently the design and technology innovations propagate to the system level. An electrical analysis also needs to be conducted for a complete power, performance, and area (PPA) comparison, which requires the generation of a complete process design kit (PDK).
Content
The rest of the paper is organized as follows: in Sec. 2, the main technology features are shown for imec N7 technology, and the PDK generation flow is documented. In Sec. 3, DTCO solutions alternative to pitch scaling are introduced and described. Section 4 describes the general setup and methodology adopted in the P&R experiments and the role of electronic design automation (EDA) tools in the DTCO loop. Section 5 shows the physical results deriving from the adoption of the solutions described in Sec. 3. Sections 6 and 7 complete the IP-Block level analysis with power-performance, and IR-drop results, respectively. In Sec. 8, conclusions are drawn.
2 Technology Platform and PDK Development
From Device to the Interconnect
Imec N7 technology uses a FinFET (FF) device, for which an electrical model has been implemented and calibrated using ballistic and TCAD simulations as in Refs. 7 and 8. As explained in Ref. 2, a local interconnect or middle of line (MOL) layer (MINT) was introduced in imec N7 technology to facilitate the internal routing of the standard cells and provide better connectivity between the front end of line (FEOL) and the actual back end of line (BEOL) routing layers. In Fig. 2(a), the cross section from the gate level to metal 2 clarifies the scheme adopted. The gate runs orthogonally to the fins and is separated by a spacer from the M0A active contact. M0G is used to offset the gate laterally to guarantee a gridded alignment of the VINT via, which connects both M0A and M0G to the MINT layer. MINT carries most of the cell-internal routing, which is completed on M1 and, for particularly complex cells, through connections up to M2. In modern process nodes, metal pitches, widths, and thicknesses gradually increase from the lower to the upper layers of the BEOL 9 as schematically shown in Fig. 2(b). Table 1 shows the main process features of the routing and via layers. M_X layers are the tightest and most expensive, and are used to connect optimally to the small geometries of the cells and to resolve congestion issues. M_Y and M_Z layers offer lower parasitics due to increased width, thickness, and pitch, and should be used as early as possible, especially for longer interconnects. The interconnect parasitics were taken into account with the methodology shown in Ref. 10. Table 1 also shows the patterning techniques used for each of the layers. It is clear that the 
Enabling the Post P&R DTCO
The process/litho assumptions were used to drive the dimensions of the devices and standard cells. The detailed analysis of the standard cell design is performed in Ref. 12, where a 1-D template-based technique was used to build multiple libraries in 7.5-tracks and 6-tracks. Once the standard cell design is completed and device models are available, the .lib files containing the timing and power views of the given library are generated through the flow shown in Fig. 3. The .lef file containing the abstract physical view of the cells must also be generated. Only the layers from MINT upward (MINT included) are made visible to the P&R tool. The RC and dielectric properties of the MOL/BEOL are coded into a process description file (.ict format) that allows the generation of a technology file (qrcTechfile) 13 used to accurately extract RCs during P&R. Finally, the set of rules modeling the given litho/process constraints needs to be coded in the technology lef for all layers exposed to the router. 14
Figure 4 documents and graphically illustrates the main rules used for each class of layers.
Alternative Solutions to Pitch Scaling
To enable area scaling at the IP-block level without modifying the set of ground rules (Fig. 4), alternative solutions were explored. These solutions were named "scaling boosters" and defined as design, process, or EDA options that, when used in conjunction, allow area to be reduced at the IP-block level. The list of the scaling boosters explored, with a description of their intended impact, is provided in Table 2. Each of these solutions is described in detail in the following paragraphs. Figure 5 summarizes the area impact for the scaling boosters that directly decrease cell 
Number of Tracks Reduction
In this work, the number of tracks is defined as the cell height divided by the MINT pitch. The single-height standard cell is, therefore, 240 and 192 nm tall for 7.5- and 6-tracks cells, respectively. Moving from a higher (T1) to a lower (T2) number of tracks, the maximum achievable area shrink is given by T2/T1, which corresponds to a factor of 0.8 for the transition from 7.5- to 6-tracks. The potential side effects 16 of such a transition are: (i) horizontal cell enlargement, due to increased difficulty of intercell connections, (ii) decreased routability and placement density, and (iii) reduced performance at the IP-block level due to fin depopulation. As shown in Fig. 5, it was possible to avoid cell enlargement for both NAND2 and DFF and achieve the full 0.8 area shrink. Side effects (ii) and (iii) will be discussed in Secs. 5 and 6, respectively.
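The track arithmetic above can be reproduced with a short sketch. The 32-nm MINT pitch is the value implied by the quoted cell heights (240 nm / 7.5 tracks); the function names are hypothetical.

```python
# Cell height and ideal area shrink from the track-count definition in the text.

MINT_PITCH_NM = 32.0  # implied by the text: 240 nm / 7.5 tracks = 192 nm / 6 tracks

def cell_height_nm(tracks: float, pitch_nm: float = MINT_PITCH_NM) -> float:
    """Single-height cell height for a given number of tracks."""
    return tracks * pitch_nm

def max_area_shrink(t1: float, t2: float) -> float:
    """Best-case area factor when moving from t1 tracks to t2 tracks (T2/T1)."""
    return t2 / t1

for tracks in (7.5, 6.0):
    print(f"{tracks}-tracks cell height: {cell_height_nm(tracks):.0f} nm")

print(f"ideal shrink 7.5 -> 6 tracks: {max_area_shrink(7.5, 6.0):.2f}")
```

The T2/T1 factor is only an upper bound on the gain: it holds when no cell has to be widened, which the text reports for NAND2 and DFF.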
Single Diffusion Break
The single diffusion break (SDB) is a scaling booster enabled by the process flow explained in Ref. 17. In essence, from a standard cell design point of view, a more selective fin etching allows different devices to be separated with a single dummy gate rather than two (Fig. 6), yielding an area shrink in the horizontal dimension. From Fig. 5, we see that this feature has more impact on simple cells: it allows the 6-tracks NAND2 to shrink further from 0.8 to 0.6 with respect to the original 7.5-tracks dimensions, whereas it has a more limited benefit for the DFF, which reduces from 0.8 to 0.74.
Table 2 Scaling boosters explored, with a description of their intended impact.
More flexibility in complex cell design, cell width reduction.
M1 and MINT open to routing: Pins on M0 and M1, allowing for routing in M1 and M0 extensions. More routing resources available. Helps router for pin access.
"Vertical" power mesh with outbound rail: Smaller rail footprint reduces VDD/VSS impact on cell area. Fewer routing resources consumed by the power delivery network improve routability.
Deep trench on MINT: Increasing height of MINT layer. Making the "vertical" power mesh electrically viable.
Porous cells: Two dummy tracks are inserted into the center of the largest cells. Enlarges some problematic cells but improves routability.
Self-Aligned Gate Contact
An additional technology process flow 18 meant to shrink the width of the cells is the self-aligned gate contact (SAGC). The SAGC makes it feasible to place the gate contact over the active area rather than constraining its placement to the p-n separation track. As exemplified in Fig. 6, this additional degree of freedom allows the contacts to be staggered better and effectively reduces the usage of dummy gates. Figure 5 shows that this feature is particularly leveraged in complex cells: the 6-tracks cells with SAGC further reduce the normalized DFF area to 0.57, while providing no additional benefit for the NAND2.
M1 and MINT Open to Routing
The cell architecture has been engineered with most of the pins on MINT, and the remaining connections completed on M1. The version of the router used in this work 19 has been enhanced to better resolve pin accessibility issues by extending the MINT pins to skip to free tracks and by using the depopulated M1 for intracell routing and short connections. The clips from the tool in Fig. 7 illustrate these concepts. The turnaround time, from the submission of the enhancement requests to the availability of a beta (nonproduction) build incorporating the additional features, was generally less than two months. The development effort is correlated with how disruptive the modification is to the preexisting flow, resulting in a significant delta between simple fixes, which could take less than a man-week, and complex methodology changes, which might require multiple man-months to propagate the enhancement across several engines. This consideration further highlights the necessity to involve EDA early in the DTCO loop and to align on the expected efforts to guarantee the timely readiness of the capabilities.
"Vertical" Power Mesh With Outbound Rail
To improve routability in the 7.5-tracks scenario, and especially to enable 6-tracks cells, it was also necessary to co-optimize the cell architecture with the local rails (from MINT to M3) of the power delivery network (PDN). When cell height is reduced, the traditional solution 2 in Fig. 8(a), which uses a multicritical dimension (CD) power rail on M2, is no longer applicable, as it would consume too many of the routing resources on M2, critically reducing the number of tracks available for signal routing. This topology also degrades placement quality due to the interaction between the M1 pins in the standard cells and the M1 power staples connecting the M2 to the MINT power rail. The solution adopted here, shown in Fig. 8(b), was to remove the power rail on M2, introduce vertical power rails on M1, and use M2 only to strap together the stripes on M1, with parallel stripes added on M3 to decrease the resistance. The solution in the next paragraph was also used to further compensate for the M2 power rail removal. The electrical validation will be shown in Sec. 7, where the IR-drop robustness of this topology will be proved. The topology consumes the whole vertical track on M1 for the VDD/VSS stripes, whose utilization was nevertheless severely constrained in the original topology as well, due to the presence of M1 staples. The choice of the distance between the vertical stripes (S_i in Fig. 8) is thus not determined by cell height, and this spacing plays a fundamental role in the trade-off between routability and IR-drop. Tightening this dimension inserts more stripes and reduces IR-drop, but routability is challenged by placement quality degradation and by the reduced number of signal tracks available on M1. 20 
Deep Trench on MINT
Removing the M2 power rail poses IR-drop challenges that could be mitigated by using a multi-CD power rail on MINT. In our technology, we decided to keep the MINT power rail single CD, which enables the more compact standard cell architecture adopted, and to use the deep trench technology in Ref. 21 to mitigate IR-drop.
Porous Cells
To facilitate the insertion of the largest cells (e.g., DFF, full adder) below the VDD/VSS stripes, two extra tracks can be inserted in the middle of these cells. 20 This makes a reduced subset of cells more "porous" to the power stripes at the cost of ∼15% area enlargement of the individual cells, which is intended to be recovered through increased placement density. This solution is expected to be particularly efficient in a scenario requiring a tight spacing of the vertical stripes (e.g., 1 μm), which, as shown in Fig. 9, becomes comparable with the width of the largest cells, making placement legalization extremely challenging without this solution.
P&R Experiments Methodology
To address the complexity of advanced nodes, we need a holistic integration and co-optimization of technology innovations, design solutions, and EDA enhancements, for instance, in the following domains: pin accessibility, intracell routing, interaction between placement and PDN, and layer promotion. It is, therefore, essential to codevelop the EDA tools as shown in Fig. 10, to guarantee support for the latest design-technology constructs. 22 Post P&R feedback determines how successfully the DTCO solutions propagate into PPA gain at the chip level, which represents the ultimate goal of the whole DTCO loop. The results from the experiments reported in Secs. 5-7 are benchmarked on the LDPC design from OpenCores, 23 which counts ∼50 K-gates. This core was chosen to facilitate the disclosure of data, avoiding industry policies that restricted the publication of the benchmarking figures for other larger and more industry-relevant cores available to the authors. Although the selected reference design is relatively small compared with an industry standard, it is larger than the designs normally used by academia 9, 16 to conduct routability studies. The validity of the routability and area benchmarking was further guaranteed by verifying that the selected metal stack (Fig. 2) is fully utilized up to the top layers to implement the chosen design. This testifies that the design is large enough to challenge the router with this setup, which enables us to appreciate the impact of the different solutions. Finally, the comparative analysis showed similar PPA trends for this core and the larger designs tested, leading to results and conclusions similar to the ones derived from the LDPC. Multiple libraries including ∼60 cells were used, each based on the technology described in Sec. 2 and combining the different solutions shown in Sec. 3. The benchmarking was performed with the flow in Fig. 3 using the 16.2 build of Cadence Innovus Implementation System. The methodology followed in Sec. 5 to identify the maximum routable density (PD_max) was to sweep placement density with a resolution of 2.5% and check the number of design rule violations (DRCs) as shown in Fig. 11. The typical behavior observed is that the number of violations changes by an order of magnitude beyond a certain value of placement density, which facilitates the identification of the routability limit. The maximum density, combined with the cell distribution from P&R, determines chip area according to the relationship

$$\mathrm{ChipArea} = \frac{\mathrm{TotalCellArea}}{PD_{\max}} \qquad (1)$$
The total cell area in Eq. (1) is given by the linear combination between the standard cell area of each cell type and its instance count in the actual cell distribution resulting from mapping the design in the given library. Synthesis will try to restructure the netlist, decreasing the count of larger cells where possible. The nominal supply voltage (VDD) was set to 0.65 V across all the experiments and a single typical corner was used. The physical experiments (Sec. 5) were conducted with low target frequency of 500 MHz, to decouple the physical analysis from issues related to timing closure, as density increases due to buffer insertion. Starting from this frequency point, frequency was swept in steps of 500 MHz to perform the PPA (Sec. 6) and IR (Sec. 7) evaluation. It is important to specify that the libraries used do not contain high drive cells, reducing the maximum achievable frequency.
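The density sweep and the chip-area relationship of Eq. (1) can be sketched as follows, assuming that Eq. (1) divides the total cell area by PD_max, as the surrounding text describes. All DRC counts, cell areas, and instance counts below are hypothetical placeholders, not data from the experiments, and the factor-of-10 jump criterion is one possible way to encode "order of magnitude".

```python
from typing import Dict

def max_routable_density(drc_by_density: Dict[float, int], jump: float = 10.0) -> float:
    """Return the highest swept density before the DRC count jumps by `jump`x."""
    densities = sorted(drc_by_density)
    pd_max = densities[0]
    for prev, cur in zip(densities, densities[1:]):
        before = max(drc_by_density[prev], 1)  # guard against division by zero
        if drc_by_density[cur] / before >= jump:
            break  # routability limit reached
        pd_max = cur
    return pd_max

def total_cell_area(area_um2: Dict[str, float], count: Dict[str, int]) -> float:
    """Linear combination of per-cell area and instance count."""
    return sum(area_um2[cell] * n for cell, n in count.items())

def chip_area(total_area_um2: float, pd_max: float) -> float:
    """Assumed form of Eq. (1): total cell area divided by maximum density."""
    return total_area_um2 / pd_max

# Hypothetical sweep in 2.5% steps: density -> number of DRC violations.
sweep = {0.700: 3, 0.725: 5, 0.750: 8, 0.775: 12, 0.800: 20, 0.825: 450, 0.850: 3900}
# Hypothetical library areas (um^2) and instance counts after synthesis.
areas = {"NAND2": 0.03, "DFF": 0.20}
counts = {"NAND2": 30000, "DFF": 5000}

pd_max = max_routable_density(sweep)
area = chip_area(total_cell_area(areas, counts), pd_max)
print(f"PD_max = {pd_max:.3f}, chip area = {area:.1f} um^2")
```

Because the instance counts come from mapping the design onto a given library, both terms of the ratio change when the library changes, which is why the comparison has to be made after P&R.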
Physical Results

Experimental Setup
The scaling boosters described in Sec. 3 were combined into seven different standard cells libraries. The libraries differ in the usage of the scaling boosters and in the P&R setup as shown in Table 3 . The proposed sequence of experiments was set up to progressively extract learnings on the area impact of the several scenarios. Absolute cell area by cell type is shown in Fig. 12 for library #1, used as reference.
The scaling factor of each cell with respect to the reference library is plotted for all the other libraries as shown in Fig. 13 .
Results and Analysis
Results from the comparative analysis are reported in Fig. 8 ) of 2 μm, which is suitable for a low IR-drop scenario. Engineering the standard cells with an outbound power rail made it necessary to enlarge some of the cells with respect to the reference library, as shown in Fig. 13 and confirmed by the increased cell area of run#2 with respect to run#1. Nevertheless, this penalty was overcompensated by a placement density increase of 10% (from 70% to 80%) that resulted in an area gain of 7%. The proposed PDN is, therefore, already beneficial for 7.5-tracks, while being an essential enabler for the 6-tracks topology. Comparing run#2 and run#3, we quantify the impact of using porous cells in the PDN architecture, still in the 2-μm spacing PDN scenario. This solution further enlarges the complex cells, causing cell area to increase by >10% with respect to the reference run. In this case, the +2.5% density increase (from 80% to 82.5%) is not sufficient to compensate for the cell area enlargement, and we conclude that porous cells do not significantly help to reduce area in a relatively large (e.g., 2 μm) PDN spacing scenario, where routability is already good. However, if we tighten the PDN spacing from 2 to 1 μm as in run#4, without the porous cells, as would be required in a high IR-drop scenario, we observe a 10% (from 80% to 70%) placement density degradation with a corresponding chip area increase. In this scenario, the porous cells help to recover +7.5% placement density, as in run#5. This yields a 5% chip area reduction with respect to run#4, demonstrating the usefulness of the porous cells in a tight (e.g., 1 μm) PDN spacing scenario. However, compared with run#2, we still witness an area increase of ∼10%, confirming the expected trade-off between routability and IR-drop. In run#6, we switch to the 6-tracks cell architecture without additional scaling boosters in a 2-μm PDN spacing scenario. 
Comparing cell area for the libraries in run#6 and run#2, we notice that it was possible to achieve the full 0.8 area gain on most of the cells. In P&R, run#6 maintained the same density as run#2 (80%), transforming the cell-level area gain into actual chip area gain. In run#7, we further scale the 6-tracks cells through the combined usage of the SDB and SAGC, which shrink cell area by >35% with respect to run#6 while losing only 2.5% placement density in P&R, in spite of the significant increase in pin density. Comparing run#7 with the initial reference run (run#1), we verify that it was possible to reduce chip area below a factor of 0.5 or, in other words, to achieve the area benefits equivalent to a full node without pitch scaling. In run#8 and run#9, the PDN spacing is tightened to 1 μm in a 6-tracks scenario without and with SAGC and SDB, respectively. Although porous cells can help in maintaining higher placement densities (75%), we confirm that in both 6-tracks scenarios the reduction of PDN spacing from 2 to 1 μm impacts final chip area by >10%. To substantiate more quantitatively the pin density increase that needs to be handled by the tool, we compare in Fig. 15 the pin density histograms and heatmaps of a 7.5-tracks scenario and a 6-tracks scenario with SDB and SAGC. We witness a dramatic increase in pin density that the tool has to resolve, with a significant population of bins with pin densities between 40% and 50% in the most scaled 6-tracks library.
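As a rough consistency check of the "below a factor 0.5" claim, the percentages quoted in this section can be chained together. This is only a sketch under stated assumptions: chip area is taken as total cell area divided by placement density, the bounds ">35%" and "7%" are treated as exact values, the 0.8 track shrink is applied to all cells, and the run#7 density is taken as 77.5% (80% minus 2.5 points). The variable names are hypothetical.

```python
def chip_area_ratio(cell_area_ratio: float, pd_ref: float, pd_new: float) -> float:
    """Chip-area ratio versus a reference run, assuming area = cell area / density."""
    return cell_area_ratio * pd_ref / pd_new

# run#1 -> run#2: 7% chip-area gain while density goes from 70% to 80%,
# which implies the total cell area grew by this factor.
cell_ratio_run2 = 0.93 * 0.80 / 0.70

# run#2 -> run#6: full 0.8 shrink from 7.5 to 6 tracks (assumed for all cells).
cell_ratio_run6 = cell_ratio_run2 * 0.8

# run#6 -> run#7: SDB + SAGC shrink cell area by >35% (35% assumed here).
cell_ratio_run7 = cell_ratio_run6 * (1.0 - 0.35)

# run#7 density assumed to be 77.5%; reference run#1 density is 70%.
ratio_run7 = chip_area_ratio(cell_ratio_run7, pd_ref=0.70, pd_new=0.775)
print(f"implied cell-area growth in run#2: {cell_ratio_run2:.3f}")
print(f"estimated chip-area ratio run#7 / run#1: {ratio_run7:.3f}")
```

Under these assumptions the estimate lands just under 0.5, in line with the statement in the text; the exact figure depends on the cell-level data in Figs. 12 and 13.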
Power and Performance Results
Moving from 7.5- to 6-tracks implies the transition from a 3-fins per device to a 2-fins per device scenario. To electrically evaluate the impact of this transition, run#2 and run#7 were chosen and, starting from the initial frequency (500 MHz), a frequency sweep in steps of 500 MHz was performed. The summary of the PPA results is presented in Table 4. Maximum density was targeted in the first three frequency steps: 500 MHz, 1 GHz, and 1.5 GHz. For the highest frequency run (2 GHz), target density was decreased by 5%. The motivation is that at 1.5 GHz, an increase in the final density in the range of 5% was already witnessed due to buffer insertion, testifying to challenging timing closure. Lowering the target density at maximum frequency allocates area for the buffers, keeping the design routable and DRC clean. The area gain of the "boosted" 6-tracks versus the 7.5-tracks is consistent with Sec. 5 across the whole frequency range. Table 4 shows the following common static timing analysis metrics: worst negative slack (WNS), total negative slack (TNS), and number of failing paths. The WNS quantifies the worst violation in the design of the setup and hold margins, indicating how far the worst path is from timing closure. The TNS is the sum of all the negative slacks, i.e., the delta between the required and actual arrival time at the register inputs. From the analysis of these metrics, we observe that timing can be closed without violations up to 1.5 GHz for both the 7.5-tracks and the "boosted" 6-tracks, whereas the 2 GHz runs fail to reach the frequency target, with more than half of the paths failing. A more detailed way to analyze timing is to compare the slack distributions for the two scenarios across the frequency sweep, as in Fig. 16. For the initial frequency (500 MHz), we see that timing is not challenged, and most of the paths exhibit slacks >500 ps. 
Increasing frequency, the distributions shift left and their right tail (larger slack values) reduces, graphically showing the reduced margin from the timing targets. Finally, for the 2 GHz runs the curves are approximately centered on zero with lowest values (WNS) up to −30 ps and half of the area with slack lower than zero (TNS). From a technology point of view, the key learning is that, considering a full library at IP-block level, the transition to 2-fins per device demanded by the 6-tracks cell architecture can be enabled at isoperformance, as the slack distributions do not substantially differ in the two scenarios in each of the frequency steps examined.
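The three timing metrics discussed above can be computed from a list of path slacks as in the sketch below. The slack values are hypothetical and do not come from Table 4; the convention of reporting WNS as zero when no path fails is an assumption of this example.

```python
from typing import Iterable, Tuple

def timing_summary(slacks_ps: Iterable[float]) -> Tuple[float, float, int]:
    """Return (WNS, TNS, number of failing paths) for a set of path slacks in ps."""
    negative = [s for s in slacks_ps if s < 0.0]
    wns = min(negative, default=0.0)  # worst (most negative) slack, 0 if none fail
    tns = sum(negative)               # sum of all negative slacks
    return wns, tns, len(negative)

# Hypothetical endpoint slacks in ps (required time minus arrival time).
slacks = [-30.0, -12.5, -4.0, 0.0, 8.0, 25.0]

wns, tns, failing = timing_summary(slacks)
print(f"WNS = {wns} ps, TNS = {tns} ps, failing paths = {failing}")
```

A design closes timing when all three values are zero, which is the situation reported up to 1.5 GHz.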
Performance
Power
Power calculation was performed by propagating default switching activities, thus using statistical methods rather than specific input vectors. Table 4 shows that across all the frequencies the internal and switching powers are responsible for most of the total power, with their relative contributions roughly evenly split. The internal power takes into account the power dissipated by charging and discharging the parasitic capacitances inside the cells, plus the short-circuit power during the transition. The switching power derives instead from the charging and discharging of the load capacitances seen by the driving cells. 24 Leakage power does not exceed 3% of total power in either of the runs, which is partly related to the fact that the analysis was done at a single typical corner. Figure 17 shows the linear increase in power versus frequency for the two scenarios. Both internal and switching powers are lower in the 6-tracks, 2-fins scenario, yielding power savings of >20% across the whole frequency range. Investigating the origin of the unchanged performance and reduced power at the IP-block level for the 6-tracks, 2-fins scenario is an ambitious goal, as it depends on a plethora of factors: cell timing properties, cell power properties, cell distribution, wire distribution, wire resistance, wire capacitance, pin capacitance, frequency, buffer insertion and timing optimization strategy from the tool, congestion, etc. For this reason, building an analytical model that takes this complexity into account would be an extremely challenging task, confirming the necessity of a post P&R approach. Nevertheless, we highlight in Table 4 the reduced pin capacitance of the 6-tracks runs (up to >20%), which is certainly one of the key contributors to the power benefits and isoperformance.
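The linear dependence of switching power on frequency and on switched capacitance can be illustrated with the first-order textbook model P = α·C·VDD²·f (some conventions include a factor of 1/2). This is not the power engine used in the experiments; the activity factor and capacitance values are hypothetical, and only the 0.65 V supply comes from the text.

```python
def switching_power_w(activity: float, cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order switching power: activity * C * VDD^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz

VDD_V = 0.65               # nominal supply voltage from the text
ACTIVITY = 0.1             # hypothetical default switching activity
CAP_REF_F = 1.0e-9         # hypothetical total switched capacitance, reference library
CAP_LOW_F = 0.8 * CAP_REF_F  # assumption: 20% lower switched capacitance

for freq_hz in (0.5e9, 1.0e9, 1.5e9):
    p_ref = switching_power_w(ACTIVITY, CAP_REF_F, VDD_V, freq_hz)
    p_low = switching_power_w(ACTIVITY, CAP_LOW_F, VDD_V, freq_hz)
    print(f"{freq_hz / 1e9:.1f} GHz: {p_ref * 1e3:.2f} mW -> {p_low * 1e3:.2f} mW")

saving = 1.0 - (switching_power_w(ACTIVITY, CAP_LOW_F, VDD_V, 1.0e9)
                / switching_power_w(ACTIVITY, CAP_REF_F, VDD_V, 1.0e9))
print(f"relative switching-power saving: {saving:.0%}")
```

In this simplified model a given relative reduction in switched capacitance translates into the same relative reduction in switching power at every frequency; the measured savings also include internal power and the other factors listed above.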
Runtime
Finally, we report in Table 5 the typical runtimes, using eight CPUs, of each implementation step 19 of our single-corner flow, from synthesis to postroute optimization, based on the IP block used. Of course, actual runtimes vary as a function of the number of DRCs and the target frequency. These data demonstrate the possibility to perform a post P&R experiment on an IP block of similar size in slightly >1 h, allowing multiple DTCO options to be explored with a reasonable turnaround time (TAT). For reference and completeness, the TAT for a larger core counting 1 M instances was also included. Once the trends have been quickly identified through the runs with the smaller core, the conclusions can be confirmed and validated on the larger design.
IR-Drop Results
For the runs that closed timing (up to 1.5 GHz), we also verified power integrity, which has been described in Ref. 25 as one of the major impediments in single-digit node implementations. As we want to focus the analysis on the lowest layers of the PDN, namely from MINT to M3, we assumed ideal supply voltage (VDD) and ground (VSS) above M4. Because of this assumption, we target an aggressive IR-drop limit of 2.5% VDD, corresponding to 16 mV. A typical 5% VDD 26 target (on all layers) leaves an additional 2.5% VDD for the upper layers, where an efficient optimization is possible through nondefault rules (NDR), metal width enlargement, via arrays, and dedicated layers for the power mesh on the thickest layers. The approach used to calculate IR-drop is a vectorless dynamic, domain-based analysis. 24
Figure 18 shows the cumulative distribution functions (CDF) for the dynamic IR-drop values at the nodes of the power mesh. For IR-drop too, we observe similar behavior for the 7.5- and 6-tracks runs. This is qualitatively understandable if we consider that the area reduction for the 6-tracks, which increases power density, is counterbalanced by the power reduction seen in Sec. 6 and by an increased "density" of power rails on MINT (every 6 tracks rather than 7.5). The curves in Fig. 18 shift to higher values of IR-drop as frequency is increased. We extracted the IR-drop values at the 99th percentile rather than the maximum value, filtering out the extreme hotspots that should be fixed manually and that could be misleading for a comparative analysis. We conclude that the 16 mV IR-drop limit is met for the 500 MHz and 1 GHz runs, while we come close to the limit at 1.5 GHz. For this range of frequencies (and beyond), it is, therefore, reasonable to switch to a tighter value of the PDN spacing (S_i) as shown in Sec. 5, moving from 2 to 1 μm. Using the results from Sec. 5, we can switch from the configurations in run#2 and run#7 to the ones in run#5 and run#9, which use porous cells to maintain high placement densities: 77.5% and 75%, respectively, corresponding to an area loss in the range of 10% with respect to the corresponding runs at 2-μm spacing. Comparing the IR-drop values between the runs with 2- and 1-μm spacing quantifies the trade-off between routability and power integrity. This comparison is done in Fig. 19 both graphically through the heatmaps and quantitatively in the table reporting the IR-drop values at the 99th percentile. Both analyses demonstrate a significant reduction of dynamic IR-drop, up to >40%, when the PDN spacing is tightened to 1 μm, allowing the design to meet the IR-drop target and hypothetically allowing a further increase in frequency through the introduction of high drive cells.
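The IR-drop budget and the percentile-based extraction described above can be sketched as follows. The 0.65 V supply and the 2.5% budget come from the text; the node IR-drop values are hypothetical, and the nearest-rank percentile is one of several possible percentile definitions, chosen here as an assumption.

```python
import math
from typing import Sequence

def ir_drop_limit_mv(vdd_v: float, fraction: float) -> float:
    """IR-drop budget in mV for a given supply and allowed fraction of VDD."""
    return vdd_v * fraction * 1000.0

def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

limit_mv = ir_drop_limit_mv(0.65, 0.025)  # about 16 mV, as quoted in the text

# Hypothetical dynamic IR-drop values (mV) at 100 power-mesh nodes.
drops_mv = [0.15 * i for i in range(1, 101)]

p99 = percentile_nearest_rank(drops_mv, 99)
print(f"limit = {limit_mv:.2f} mV, 99th percentile = {p99:.2f} mV, max = {max(drops_mv):.2f} mV")
print("meets limit" if p99 <= limit_mv else "exceeds limit")
```

Using the 99th percentile instead of the maximum discards isolated hotspots, so two PDN configurations are compared on the bulk of the distribution rather than on a single node.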
Conclusion
In this paper, the impact on post P&R PPA of different combinations of standard cell architectures, design solutions, and technology options was investigated based on the imec N7 node, using the same rules across the whole design-technology space and leveraging state-of-the-art EDA tools from Cadence. The quantitative area comparison demonstrated the possibility to achieve up to 50% area reduction within the same technology platform. The electrical analysis further proved with post P&R data that these area benefits can be obtained without performance penalty and with a power reduction of up to >20%. Finally, the criticality of the trade-off between routability and IR-drop for these geometries was quantified, showing that power mesh dimensions for tight IR-drop requirements can determine up to >10% area penalty. All these results combined demonstrate the feasibility of mimicking the PPA gains of a node through cost-effective DTCO solutions rather than pitch scaling.
Luca Mattii is an electrical engineer working as a corporate consultant for Cadence Design Systems, currently supporting imec on physical implementation at advanced nodes. He received his BS and MS degrees in electronic engineering from the University of Pisa in 2011 and 2014, respectively, and he is now also enrolled in an industrial PhD at Braunschweig University. His current research interests include advanced nodes, design-technology co-optimization, and EDA.
Biographies for the other authors are not available. 
