Abstract-Power integrity is gaining importance for integrated circuits in 45nm technology and beyond. This paper provides a tutorial on modeling and design for beyond-the-die power integrity. We explain the background of simultaneous switching noise (SSN) and its impact on circuit designs. We discuss various models of different accuracy and complexity for the board, package and chip, and suggest how to select proper ones for board-package-chip co-simulation and co-design of SSN. We then review different design techniques to suppress SSN, including I/O planning and placement, decoupling capacitor allocation, package layer stacking and power/ground plane stapling.
I. INTRODUCTION
With technology scaling down to 45nm and beyond, power integrity has become a major bottleneck for the reliability of high-performance system-in-package (SiP) or system-on-chip (SoC) integration. The reduced supply voltage and the increased clock frequency and chip density have made circuits more vulnerable to power supply noise than ever before.
Simultaneous switching noise (SSN), also referred to in the literature as ΔI noise, is considered a major threat to power integrity. Accordingly, we focus on SSN in this paper. SSN arises primarily from the very large instantaneous power/ground current drawn by simultaneously switching gates, which is common in clock-synchronized circuits. SSN is mainly an inductive noise and can generally be characterized by the equation $V_n = L \, \frac{dI}{dt}$, where $V_n$ is the magnitude of SSN, $L$ is the parasitic inductance of the chip and the package, and $I$ is the total switching current. In other words, the magnitude of SSN is proportional to the total parasitic inductance and to the rate of change of the switching current. SSN causes supply voltage fluctuation; it reduces the noise margin of digital circuits; it shifts the operating point of analog circuits; it decreases the effective driving strength of the gates; and it causes output signal distortion (e.g., jitter), impairing signal integrity.
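As a rough numerical illustration of the relation between noise, inductance and current slope, the following sketch estimates the noise produced by a group of buffers whose total current ramps linearly. The function name and all numerical values are hypothetical and chosen only for illustration, and the linear current ramp is a simplifying assumption.

```python
def ssn_peak(l_total_h, n_buffers, i_peak_a, t_rise_s):
    """Estimate V_n = L * dI/dt, assuming the total current ramps linearly."""
    di_dt = n_buffers * i_peak_a / t_rise_s  # total current slope in A/s
    return l_total_h * di_dt


# Hypothetical example: 0.5 nH, 8 buffers, 10 mA each, 100 ps ramp.
v_n = ssn_peak(0.5e-9, 8, 10e-3, 100e-12)
print(f"Estimated SSN magnitude: {v_n:.2f} V")
```

Under these assumptions the estimate grows linearly with both the inductance and the number of switching buffers.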
SSN is most significantly observed around the output pads of the chip. The reasons are three-fold. First, in order to drive large off-chip loads, the I/O buffers are usually very large in size, drawing a significant amount of instantaneous current when they switch, as shown in Fig. 1. Second, in clock-synchronized chips multiple I/O buffers tend to switch simultaneously, creating a large surge current with a sharp slope. Third, the parasitic inductance of the power distribution network of the package, including the interconnections to both the chip and the board, is usually in the range of a few hundred picohenries. Such a large inductance makes the package a major contributor to SSN.
As the on-chip switching current and the chip and package inductance jointly cause SSN, it is clearly a global effect that requires consideration of both the chip and the beyond-die components. Accordingly, accurate simulation of SSN requires modeling the entire power delivery system, including the board, the package and the chip. In this paper we discuss how the different models proposed in the literature offer a wide spectrum of complexity and accuracy tradeoffs, and how to select the proper models for the best accuracy and efficiency in SSN simulation for each contributing factor. We also point out some problems that remain unsolved. To reduce SSN, many design techniques have been proposed in the literature. They target various design freedoms at different design stages, trying either to reduce the parasitic inductance or to reduce the impedance between the power and ground planes. In this paper, we briefly discuss a few of these techniques and point out the critical issues in each of them.
The remainder of the paper is organized as follows. We review different modeling techniques in Section II. Section III discusses various design techniques to suppress SSN, including I/O planning and placement, decoupling capacitor allocation, package layer stacking, and power/ground plane stapling. Concluding remarks are given in Section IV.
II. MODELING
A. Overview
Such a power delivery system (PDS) structure leads to three distinct impedance peaks when looking from the chip into the PDS, as shown in Fig. 3 [2]. The first and smallest peak is in the kHz range, mainly caused by the coupling between the power regulator and the board. The second is in the MHz range, mainly caused by the coupling between the package and the board. The third is in the 100MHz range, mainly caused by the coupling between the chip and the package. As discussed in the previous section, the SSN from I/O buffers is first injected into the PDS via such coupling.
Fig. 3. PDS impedance seen from the chip [2].
As can be seen in Fig. 2, in order to model and simulate SSN, it is necessary to model the entire PDS. Due to the high complexity, the power regulator, board, package and chip are modeled separately and connected together for simulation, as shown in Fig. 4. At the frequencies of interest (100MHz and above), the pins, pads and traces connecting those blocks are mainly inductive. A short interconnect, bus or trace can simply be modeled as a lumped inductance. Long interconnects, or interconnects for which higher accuracy is needed, can be modeled as transmission lines. The biggest advantage of such block-based modeling is that it allows the models to be built with different complexity, depending on their importance to the metric of interest and the desired accuracy of the simulation. The power regulator block is typically modeled by an ideal voltage source, and the connector to the board is modeled by a lumped inductance and resistance [2]. As the power regulator does not directly contribute to SSN, such a simple model should suffice.
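To make the block-based view concrete, the following sketch chains simple lumped blocks (regulator, board, package, chip) and evaluates the impedance seen from the chip over frequency. It is a minimal illustration only: the ladder topology, the helper names and every element value are assumptions made for this example and are not taken from [2] or from Fig. 4.

```python
import math


def parallel(z1, z2):
    """Impedance of two branches connected in parallel."""
    return z1 * z2 / (z1 + z2)


def series_rlc(w, r, l, c):
    """Impedance of a capacitor branch with series resistance and inductance."""
    return complex(r, w * l - 1.0 / (w * c))


def pds_impedance(f_hz):
    """Impedance seen from the chip into a lumped regulator/board/package/chip ladder."""
    w = 2.0 * math.pi * f_hz
    z = complex(1e-3, w * 10e-9)                          # regulator: ideal source + R, L
    z = parallel(z, series_rlc(w, 5e-3, 1e-9, 1e-3))      # board capacitance
    z = z + complex(0.0, w * 0.5e-9)                      # board-to-package connection
    z = parallel(z, series_rlc(w, 10e-3, 0.2e-9, 10e-6))  # package capacitance
    z = z + complex(0.0, w * 50e-12)                      # package-to-chip connection
    z = parallel(z, series_rlc(w, 1e-3, 0.0, 100e-9))     # on-chip capacitance
    return z


# Sweep one point per decade from 1 kHz to 1 GHz.
for exponent in range(3, 10):
    f = 10.0 ** exponent
    print(f"{f:10.0e} Hz  |Z| = {abs(pds_impedance(f)) * 1e3:.3f} mOhm")
```

Each block can be replaced by a more detailed model without changing how the blocks are connected, which is the main benefit of the block-based approach.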
B. Board and Package Modeling
The board and the package models can be classified into three categories: lumped models, distributed models and S-parameter models.
The lumped models use a simple geometry with a few RLC elements (e.g., a π equivalent circuit). However, these models lack accuracy and should only be used when the component to be modeled has little impact on the overall system performance.
To improve accuracy, we can build distributed models by performing parasitic extraction using the partial element equivalent circuit (PEEC) method. The resulting circuits have a huge number of RLC elements, and model reduction [3], [4] or other simplification techniques [5], [6] are needed to reduce the model complexity. However, the computational cost still remains high, limiting the application of the method.
As an alternative, we can directly extract the S-parameters of the board or the package over a wide range of frequencies, and build models based on the extracted S-parameters. A main advantage of such a method is that the board or the package can be treated as a black box when extracting the S-parameters. As a simple example, Fig. 5 shows how the S-parameters of a two-port (four-terminal) black box can be obtained in the form of a 2 × 2 matrix. We apply an incident wave a1 at port 1, and measure the reflected wave b1 at port 1 and the transmitted wave b2 at port 2. Note that when we measure S-parameters, the output port is typically loaded with some reference impedance (e.g., 50Ω). Accordingly, the ratio b1/a1 defines S11, and the ratio b2/a1 defines S21. Similarly, by applying an incident wave at port 2 and loading port 1 with the reference impedance, we can obtain S12 and S22. As such, the input/output behavior of this black box can be predicted without any regard for its content. S-parameters are frequency-dependent, and are measured by sending a single-frequency signal into the network and detecting the waves that exit from each port with a reference load impedance. In practice, they can be obtained using a 3D full-wave EM simulator such as HFSS [7] or vector network analyzer (VNA) measurements. By sweeping over a wide frequency range, they can reveal frequency-dependent characteristics including the skin effect and the dielectric conductance effect.
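The definitions of S11 and S21 above can be checked on a network whose content is known. The sketch below computes the S-parameters of a two-port consisting of a single series impedance between the two ports, with both ports referenced to 50Ω. The choice of network, the function names and the element values are assumptions made for illustration only.

```python
import math


def series_element_s_params(z_series, z0=50.0):
    """S-matrix of a two-port made of one series impedance between the ports."""
    # With port 2 terminated in z0, the impedance seen at port 1 is z_series + z0.
    z_in = z_series + z0
    s11 = (z_in - z0) / (z_in + z0)         # reflected wave b1 over incident wave a1
    s21 = 2.0 * z0 / (z_series + 2.0 * z0)  # transmitted wave b2 over incident wave a1
    # The network is symmetric and reciprocal, so S22 = S11 and S12 = S21.
    return [[s11, s21], [s21, s11]]


def rl_s_params(f_hz, r_ohm, l_h, z0=50.0):
    """S-matrix of a series R-L interconnect at a single frequency."""
    z = complex(r_ohm, 2.0 * math.pi * f_hz * l_h)
    return series_element_s_params(z, z0)


# Hypothetical interconnect: 0.2 ohm in series with 1 nH, swept over frequency.
for f in (1e8, 1e9, 1e10):
    s = rl_s_params(f, 0.2, 1e-9)
    print(f"{f:8.0e} Hz  |S11| = {abs(s[0][0]):.4f}  |S21| = {abs(s[1][0]):.4f}")
```

Repeating the calculation at each frequency point mimics the frequency sweep performed by a simulator or a VNA.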
S-parameter models can be simulated directly using convolution-based methods. It is also possible to synthesize an RLC circuit from S-parameters. Note that for a given set of S-parameters, there are an infinite number of circuits that can correspond to it. The general approach is to create a circuit template with a certain topology and convert the measured S-parameters to Y or Z parameters following the standard procedure [8]. Then, by matching the Y/Z parameters of the template to the measured Y/Z parameters (e.g., matching their poles and zeros), we can determine the element values in the template [9]-[12].
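A minimal sketch of this template-matching idea is given below, assuming the simplest possible template: a single series R-L branch between the two ports. The S-parameters are converted to Y-parameters with the standard two-port conversion formulas, and R and L are then read from Y12, which equals -1/Z for a series branch of impedance Z. The synthetic data, the function names and the template are assumptions for illustration; the methods in [9]-[12] handle far more general templates.

```python
import math


def s_to_y(s, z0=50.0):
    """Convert a 2x2 S-matrix to Y-parameters (equal real reference impedance z0)."""
    (s11, s12), (s21, s22) = s
    det = (1 + s11) * (1 + s22) - s12 * s21
    y0 = 1.0 / z0
    y11 = y0 * ((1 - s11) * (1 + s22) + s12 * s21) / det
    y12 = y0 * (-2 * s12) / det
    y21 = y0 * (-2 * s21) / det
    y22 = y0 * ((1 + s11) * (1 - s22) + s12 * s21) / det
    return [[y11, y12], [y21, y22]]


def fit_series_rl(s, f_hz, z0=50.0):
    """Match a series R-L template: Y12 = -1/(R + jwL) for a series branch."""
    y = s_to_y(s, z0)
    z_series = -1.0 / y[0][1]
    return z_series.real, z_series.imag / (2.0 * math.pi * f_hz)


def synthetic_s(f_hz, r_ohm, l_h, z0=50.0):
    """Stand-in for measured data: S-matrix of a known series R-L branch."""
    z = complex(r_ohm, 2.0 * math.pi * f_hz * l_h)
    s11 = z / (z + 2.0 * z0)
    s21 = 2.0 * z0 / (z + 2.0 * z0)
    return [[s11, s21], [s21, s11]]


r_fit, l_fit = fit_series_rl(synthetic_s(1e9, 0.2, 1e-9), 1e9)
print(f"R = {r_fit:.3f} ohm, L = {l_fit * 1e9:.3f} nH")
```

With real measured data the template rarely matches exactly, so the element values are usually fitted over many frequency points rather than read from a single one.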
Note that by truncating the system poles in the right half of the s-plane, we can guarantee the system to be stable. However, if the synthesized circuit is to be simulated together with other components, passivity should also be guaranteed, which ensures that the circuit can only consume energy instead of generating it. This is because the interconnection of stable systems is not necessarily stable, while the interconnection of passive systems is always passive. A good review of the passivity enforcement techniques for models created from S-parameters is given in [13], which groups them into three categories. The first category is based on the direct enforcement of certain passivity constraints by means of convex optimization. The second category enforces the passivity constraints at a few carefully selected discrete frequency samples by means of second-order cone programming. The third category is global passivity enforcement based on Hamiltonian eigenvalue perturbation. It is important to note that the passivity of a model guarantees that the model is also stable and causal, but not vice versa [14]. Accordingly, it is sufficient to check the passivity of the model to establish stability and causality.
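As a small illustration of what a passivity constraint looks like for S-parameter data, the sketch below checks the usual condition that the largest singular value of the scattering matrix does not exceed one, at each available frequency sample of a two-port. This is only a sampled check written for this example: it says nothing about violations between samples, and it is not one of the enforcement algorithms reviewed in [13]. The function names and sample matrices are hypothetical.

```python
import math


def max_singular_value_2x2(s):
    """Largest singular value of a 2x2 complex matrix via eigenvalues of S^H S."""
    (s11, s12), (s21, s22) = s
    g11 = abs(s11) ** 2 + abs(s21) ** 2
    g22 = abs(s12) ** 2 + abs(s22) ** 2
    g12 = s11.conjugate() * s12 + s21.conjugate() * s22
    mean = 0.5 * (g11 + g22)
    radius = math.sqrt((0.5 * (g11 - g22)) ** 2 + abs(g12) ** 2)
    return math.sqrt(mean + radius)


def is_passive_at_samples(s_samples, tol=1e-9):
    """True if every sampled S-matrix has largest singular value <= 1 (within tol)."""
    return all(max_singular_value_2x2(s) <= 1.0 + tol for s in s_samples)


# Hypothetical samples: a resistive series element (passive) and a network with gain.
passive_sample = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]
active_sample = [[0.0 + 0j, 1.2 + 0j], [1.2 + 0j, 0.0 + 0j]]
print(is_passive_at_samples([passive_sample]))
print(is_passive_at_samples([passive_sample, active_sample]))
```

A sample that fails this test indicates that the model would generate energy at that frequency and therefore needs passivity enforcement before co-simulation.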
While the techniques are quite different, the general idea is the same: they try to force the system to satisfy certain passivity constraints by applying a minimum perturbation to the system. However, these methods cannot be applied to large-scale systems, as they are computationally expensive. How to enforce passivity efficiently for large-scale systems remains an open problem in the literature. Compared with lumped models, S-parameter models have much improved accuracy, especially when high-speed I/Os are considered. These models also have reduced complexity compared with the models extracted through PEEC.
Note that for SSN simulation a lumped RLC model is often used for the board while the PEEC models and S-parameter models are often used for the package, as the package plays a much more critical role in SSN than the board.
C. Chip Modeling
For the chip models used in SSN simulation, we are only interested in the I/O behavior instead of its detailed internal structures. Accordingly, the primary target of chip modeling for SSN is the characterization of the I/O buffers.
Generally, the I/O buffer models can be classified into two categories: transistor models and behavioral models. Nonlinear transistor models have been used in [15]-[18]. They are very accurate for high-speed I/Os. However, the direct use of detailed transistor models in SSN simulation becomes less practical, as the models become increasingly complicated with technology scaling.
The simplest behavioral model is a time-varying current source [2], approximated as triangular spikes at the time of switching. Although simple, this model ignores the non-linearity of the I/O buffer and cannot capture the negative feedback effect of the driver circuitry [5], [19]. The negative feedback effect is important, for example, when we consider the relationship between SSN and the number of switching I/O buffers: it results in a sub-linear function that saturates, as shown in Fig. 6 [20], [21], mainly because the drop in supply voltage in turn reduces the switching current.
Based on a simplified equivalent circuit derived from the internal structure of the I/O buffers, I/O Buffer Information Specification (IBIS) models have been proposed, as shown in Fig. 7 [22]. An IBIS equivalent circuit includes five basic elements: the pull-down driver, the pull-up driver, the power supply and ground clamping diodes, the slew rate of the waveform, and the parasitic elements associated with each pin. The pull-down driver and clamping diode model the I/O buffer characteristics when driven low, or towards the ground voltage, while the pull-up driver and clamping diode model the characteristics when driven high, or towards the power supply voltage. Ccomp is the intrinsic capacitance. The package resistance, inductance and capacitance are represented by Rp, Lp, and Cp, respectively. IBIS models can be used to characterize the I-V curves, the rising/falling transition waveforms, and the package parasitic information of the device. Besides the reduced complexity, IBIS models also have the advantages of protecting circuit and process intellectual property and of being easily portable.
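The saturation of SSN with the number of switching I/O buffers, attributed above to negative feedback, can be reproduced qualitatively with a toy model. The sketch below assumes, purely for illustration, that the peak current of each buffer shrinks linearly with the noise voltage, i.e. I = I0 (1 - Vn/Vdd), so that Vn = N L (I0/tr)(1 - Vn/Vdd) can be solved in closed form. This linear feedback law, the function names and all values are assumptions of this example and are not the models of [20], [21].

```python
def ssn_linear(n_buffers, l_h, i0_a, t_rise_s):
    """SSN from ideal triangular current sources: no feedback, linear in N."""
    return n_buffers * l_h * i0_a / t_rise_s


def ssn_with_feedback(n_buffers, l_h, i0_a, t_rise_s, vdd):
    """SSN when the switching current drops linearly with the supply noise."""
    v_lin = ssn_linear(n_buffers, l_h, i0_a, t_rise_s)
    # Solve V = v_lin * (1 - V / vdd) for V.
    return v_lin / (1.0 + v_lin / vdd)


# Hypothetical values: 0.5 nH, 10 mA per buffer, 100 ps ramp, 1.0 V supply.
for n in (1, 5, 10, 20, 40, 80):
    lin = ssn_linear(n, 0.5e-9, 10e-3, 100e-12)
    fb = ssn_with_feedback(n, 0.5e-9, 10e-3, 100e-12, 1.0)
    print(f"N = {n:3d}  no feedback: {lin:5.2f} V  with feedback: {fb:5.2f} V")
```

In this toy model the current-source estimate grows without bound, whereas the estimate with feedback stays below the supply voltage and flattens as N increases.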
The major limitations of IBIS models are three-fold. First, they can only consider a limited number of physical effects, and many effects inherent to the devices are ignored. Second, the model is inaccurate if its load is outside the range it was produced for. If the package parasitics are changed, then the I-V curve should be re-generated instead of being taken from a previous model. Finally, the IBIS model cannot capture the dynamic characteristics of the driver accurately, as the modeling technique relies primarily on static characteristics [23]. Accordingly, IBIS models are only adequate when the I/O speed is not high. To accommodate situations where the dynamic characteristics are important (especially in high-speed I/O), several works have proposed using radial basis functions (RBF) to represent the I/O buffer's dynamic behavior [24]-[26]. While such models can be quite accurate, their complexity soon becomes intractable for complex driver circuits with multiple ports [27]. To address this, a modeling technique using spline functions with a finite time difference approximation has been proposed to model moderately nonlinear I/O buffers [28]. The spline function with finite time difference approximation includes previous time instances of the buffer output voltage/current to capture the output dynamic characteristics accurately. As such, it takes into account both the static and the dynamic memory characteristics of the driver during modeling. The method, however, cannot be extended to highly nonlinear buffers.
To conclude the section, we illustrate the importance of chip, package and board co-simulation. We first connect all the models together as shown in Fig. 2, and simulate the S11 parameter of the system, as well as the power supply voltage waveform at one output pad. Then we simulate each model separately, with its output loaded with the input impedance of the model it is connected to. The results are shown in Fig. 8, from which we can easily see that separate simulation can result in significant error.
III. DESIGN
For the circuit to function correctly, the maximum amplitude of SSN must not exceed the noise margin of the gates. To achieve this goal, many design techniques have been proposed to suppress SSN. In this section, we will briefly review these techniques.
A. I/O Planning and Placement
I/O planning and placement plays a key role as the interface between chip and package designs in a co-design flow. Most of today's high-performance ICs are designed with flip-chip technology, which eliminates the wires for chip-package bonding. The bonding is achieved through bumps via surface mount technology (SMT). As shown in Fig. 9, I/O cells are first connected to bumps on the die via redistribution layer (RDL) routing, then the die is "flipped" and mounted on the surface of the substrate, where bumps are connected to bump pads on the substrate. Finally, package trace routing is performed to complete the connection between bump pads and package pins. For flip-chip designs, there are two types of I/O cell placement schemes: peripheral I/O and area I/O. The peripheral I/O scheme restricts the placement of I/O cells to the chip boundaries, and it is a cost-effective way to transform traditional wire-bonding chip designs to flip-chip designs. The area I/O scheme allows I/O cells to be placed anywhere within the die area, and it is inherently suitable for flip-chip packaging. In [29], the peripheral and area I/O schemes are also called extrinsic and intrinsic flip-chip designs, respectively. Interested readers are referred to [30] for more details.
For I/O planning and placement, we need to assign the pins and pads to different signals and power/ground supply. Different assignments can significantly impact the system performance, including signal and power integrity.
While many factors can affect I/O planning and placement, in this paper we will focus on the issues that need to be considered for SSN suppression. There are mainly two criteria.
First, as illustrated in Fig. 10, the power and ground pins and pads for analog and digital circuits should be separated whenever possible. As such, the switching noise generated by the digital portion of the system will not affect the operation of the analog portion of the system, which is generally more sensitive to power supply noise. Second, the pads and pins for power and ground should be made as numerous as possible. With an increased number of power/ground pins and pads, the inductance becomes smaller (parallel connection), which leads to lower SSN. In addition, the slope of the curve characterizing SSN vs. the number of switching I/O buffers becomes smaller as the number of power and ground pins and pads increases, as shown in Fig. 11.
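The effect of the parallel connection can be illustrated with a first-order calculation. The sketch below assumes identical power/ground pins with no mutual inductance between them, so that n pins in parallel have an effective inductance of L/n, and it uses the linear L dI/dt estimate for the noise added per switching buffer. The function names and values are hypothetical; in a real package, mutual coupling makes the reduction smaller than 1/n.

```python
def effective_inductance(l_pin_h, n_pins):
    """Inductance of n identical pins in parallel, ignoring mutual coupling."""
    return l_pin_h / n_pins


def ssn_slope(l_pin_h, n_pins, i_peak_a, t_rise_s):
    """Noise added per switching buffer (V/buffer) in the linear L*dI/dt regime."""
    return effective_inductance(l_pin_h, n_pins) * i_peak_a / t_rise_s


# Hypothetical values: 1 nH per pin, 10 mA per buffer, 100 ps ramp.
for n_pins in (1, 2, 4, 8):
    slope = ssn_slope(1e-9, n_pins, 10e-3, 100e-12)
    print(f"{n_pins} P/G pins: {slope * 1e3:.1f} mV per switching buffer")
```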
B. Decap Allocation
Decoupling capacitance (decap) allocation is another important technique to reduce SSN, and it is done after the I/O placement. Decaps can be used to short the power and ground planes at high frequencies to control voltage fluctuations. Decaps can be inserted on chip or in the package. On-chip decaps are typically implemented using the gate capacitance of MOS transistors, which is small in value. They can be inserted in the white space left after the placement of the circuit blocks. Unlike on-chip decaps, off-chip decaps are discrete passive components with a given capacitance, equivalent-series resistance (ESR) and equivalent-series inductance (ESL). ESL and ESR are among the decisive factors for the cost (dollar amount) of a decap. A set of typical values for discrete off-chip decaps is listed in Table I. Both on-chip and off-chip decaps can be used to suppress SSN. However, the latter are usually more effective, as they are typically much larger than on-chip decaps and can be placed close to the I/O pads where the SSN is significant [31], [32]. Accordingly, in this section, we focus our discussion on off-chip decaps. Considering the congestion from signal and power routing, off-chip decaps can be inserted only at selected slots called legal positions. Legal positions are used to connect the terminals of decaps inside or outside the package. Off-chip decap optimization often minimizes the total decap cost subject to power integrity constraints and congestion from package routing.
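The role of ESR and ESL can be seen from the impedance of a single discrete decap modeled as a series R-L-C branch. The sketch below evaluates this impedance and the self-resonant frequency 1/(2π√(ESL·C)), at which the impedance reduces to the ESR; above that frequency the decap behaves inductively and becomes less effective. The component values and function names are hypothetical and are not taken from Table I.

```python
import math


def decap_impedance(f_hz, c_f, esr_ohm, esl_h):
    """Impedance of a discrete decap modeled as a series R-L-C branch."""
    w = 2.0 * math.pi * f_hz
    return complex(esr_ohm, w * esl_h - 1.0 / (w * c_f))


def self_resonant_frequency(c_f, esl_h):
    """Frequency at which the capacitive and inductive reactances cancel."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * c_f))


# Hypothetical decap: 100 nF, 20 mOhm ESR, 0.5 nH ESL (not taken from Table I).
C, ESR, ESL = 100e-9, 20e-3, 0.5e-9
f0 = self_resonant_frequency(C, ESL)
for f in (f0 / 10.0, f0, f0 * 10.0):
    z = decap_impedance(f, C, ESR, ESL)
    print(f"{f / 1e6:8.2f} MHz  |Z| = {abs(z) * 1e3:7.1f} mOhm")
```

A lower ESL pushes the self-resonant frequency higher, which is one reason why ESL influences both the effectiveness and the cost of a decap.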
To illustrate the effectiveness of off-chip decap allocation, Fig. 12 shows the power supply voltage map across the top plane. Without off-chip decaps, the SSN amplitude is around 1.0V, and the supply voltage profile is shown in Fig. 12(a). With the allocated decaps, the SSN amplitude drops to around 0.25V, as shown in Fig. 12(b).
C. Package Layer Stacking and Power/Ground Plane Stapling
In high-performance flip-chip packages, multiple layers are typically used for power/ground planes and signal routing. The number of layers depends on the number of signals that need to be routed, the cross-talk constraints on these signals, and the number of voltage domains. In particular, the number of voltage domains constrains the number of power plane layers that should be assigned and how a layer should be partitioned and shared by multiple voltage domains.
Usually multiple power/ground planes are used in the package to keep the power supply noise low [2] and to shield the signal routing planes. If affordable, we should shield every routing plane by placing alternating power/ground planes in between. These power/ground planes in different layers are stapled through vias, as shown in Fig. 13. The number of vias used and the locations of these vias can significantly impact the power integrity of the package. [34] illustrates the impact of the number of vias and their locations for the package design shown in Fig. 13, assuming that the power supply of the board is ideal and that the vias. Measurement results indicate that the resonance frequency shifts towards higher frequencies as the number of vias increases. On the other hand, the locations of the vias do not have a significant impact on the resonance frequency. Instead, they change the inductance of the package. A centered via distribution, where all the power/ground vias go downwards in the center (Fig. 14(a)), always has a lower inductance than a uniform via distribution, where all the power/ground vias connected to the chip first go down to the upper power/ground planes and then go downwards uniformly. A simple explanation is that the centered via distribution has a smaller current loop than the uniform distribution. Recall that SSN is proportional to the total inductance of the system, which is dominated by the package inductance. We can therefore conclude that the centered via pattern can help suppress SSN effectively.
IV. CONCLUSIONS
Power integrity has become an increasingly important design consideration for circuit designs in 45nm technology and beyond. In this paper, we provided a tutorial overview of power-integrity driven modeling and design issues. We explained the background of simultaneous switching noise (SSN) and its significance to the circuit designers. We discussed various models of different accuracy and complexity for the board, package and chip. We explained how to select proper models for the simulation of SSN. We then reviewed different design techniques to suppress SSN, including I/O planning and placement, decoupling capacitor allocation, package layer stacking and power/ground plane stapling.
