The power distribution network (PDN) is an increasingly significant consumer of on-chip interconnect resources, so PDN estimation is increasingly central to system-level interconnect prediction for modern ICs. PDN design and verification require accurate power estimation and a realistic distribution of current sources across the die. However, detailed placement and switching information is rarely available at early design stages. Designers therefore either rely on pessimistic overdesign, which can lead to severe routing congestion, or encounter unexpected voltage noise problems late in the design, which can lead to costly design iterations. In this work, we seek to identify a general trend for power density. From both empirical and analytical studies of random activity distributions, we propose a power law of activity density, which can potentially enable early-stage estimates of power density and voltage noise, as well as of the required PDN resources.
INTRODUCTION
Due to increases in transistor density and operating frequency, power consumption has also increased. Reducing the operating voltage reduces dynamic power quadratically, but it also reduces the voltage margin and increases the impact of voltage noise (e.g., IR drop and L·di/dt). To suppress voltage noise, a large portion of on-chip interconnect resources is dedicated to the power distribution network (PDN). Two major issues in PDN design are (1) how accurately and (2) how early power can be estimated. Overestimation wastes valuable routing resources, while underestimation results in parametric yield loss in the form of low-Vcc faults. Since PDN design is a very early step in the IC implementation flow, excessive voltage noise observed at late design stages due to an insufficient PDN will cause costly design iterations.
Power is estimated at all levels of design abstraction. Electronic system level (ESL) estimation typically exploits a history of measured power of IPs. Register-transfer level (RTL) tools can apply a quick-synthesis step before power estimation. For gate-level netlists, more accurate power estimation techniques are available, given accurate parasitic estimates and SPICE-characterized power models.
At all design levels, the most important input is switching activity information. Switching activity can be dumped from stimulus-based functional simulations and then fed to power estimation tools in formats such as .vcd, .saif and .fsdb. However, functional simulation requires long runtimes and substantial data storage. Finding a representative stimulus is time-consuming, and its quality or relevance is difficult to guarantee. Because of these difficulties, vectorless power estimation is used at early design stages (and even at signoff). This approach relies on statistical assumptions for the switching probabilities of primary input ports; these probabilities are then propagated into the internal logic with consideration of its functionality. Existing statistical techniques for power estimation include transition density [7], probabilistic waveform [9] and binary decision diagram (BDD) [4] based techniques [8]. Probabilistic waveforms [9] capture more time-domain detail of switching activity than signal probabilities alone. BDDs [4] better capture signal correlations arising from reconvergence in a netlist.
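As a minimal illustration of vectorless propagation, the sketch below pushes static signal probabilities through a tiny netlist. It assumes all signals are spatially and temporally independent, so it ignores the reconvergence correlations that BDD-based methods [4] capture. Under temporal independence, a net that is 1 with probability q toggles with probability 2q(1 − q) per cycle. The netlist format, gate set and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical netlist format: output net -> (gate type, list of input nets).
Netlist = Dict[str, Tuple[str, List[str]]]


def gate_prob(kind: str, ins: List[float]) -> float:
    """Probability that a gate output is 1, assuming independent inputs."""
    if kind == "NOT":
        return 1.0 - ins[0]
    if kind == "AND":
        prob = 1.0
        for q in ins:
            prob *= q
        return prob
    if kind == "OR":
        none_high = 1.0
        for q in ins:
            none_high *= 1.0 - q
        return 1.0 - none_high
    raise ValueError(f"unsupported gate: {kind}")


def propagate(netlist: Netlist, inputs: Dict[str, float]) -> Dict[str, float]:
    """Static probabilities for all nets; netlist entries must be in topological order."""
    prob = dict(inputs)
    for net, (kind, fanin) in netlist.items():
        prob[net] = gate_prob(kind, [prob[f] for f in fanin])
    return prob


def toggle_rate(q: float) -> float:
    """Per-cycle switching probability of a temporally independent signal."""
    return 2.0 * q * (1.0 - q)


if __name__ == "__main__":
    netlist: Netlist = {
        "n1": ("AND", ["a", "b"]),
        "n2": ("NOT", ["c"]),
        "y": ("OR", ["n1", "n2"]),
    }
    probs = propagate(netlist, {"a": 0.5, "b": 0.5, "c": 0.5})
    for net in ("n1", "n2", "y"):
        print(net, probs[net], toggle_rate(probs[net]))
```

Here n1 = 0.25 and y = 1 − (1 − 0.25)(1 − 0.5) = 0.625, so y toggles with probability 0.46875 per cycle under these assumptions.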
For early-stage PDN design, another important issue is where across the die the estimated power should be distributed. The estimated lumped power of a functional block can be assumed to be evenly distributed over all nodes of an R(L)C PDN model, or a single current source can be assumed at the center of the target region of the PDN. However, the former does not account for the worst-case power distribution, and the latter can be too pessimistic and result in overdesign. To achieve a realistic current source distribution, trends in current density must be understood at multiple length scales. Coarse-grain current densities determine the C4 bump pitch and the amount of on-chip routing resources required for the PDN. Local power densities create voltage noise hotspots, and the resulting voltage gradients cause local on-chip performance variations.
Figure 1: Power density with respect to sample area [11].
Intuitively, local power density should increase sharply as the sampled area shrinks. For example, an LVT clock buffer occupying only a couple of square microns may dissipate up to tens of microwatts, while an entire SoC occupying a square centimeter dissipates just one watt. One motivation for our work is Figure 1, which shows local power density with respect to normalized area, as reported by an industry source [11]. The curve is described as
where I/A denotes the current consumption I within area A, and α represents the activity factor of the design. Coefficients $c_1$, $c_2$ and $c_3$ capture design characteristics [11]. Although the curve does not explain any physical mechanism, the measured data (blue dots) closely follow the fitted power function (blue curve). A second motivation for our work is the well-known Rent's rule, a simple, empirical power-law relationship between the number of I/O terminals T of a logic block and the number of gates N contained in that block [6]:
$$T = kN^{p}$$
where k and p are empirical parameters. In the field of interconnect prediction, Rent's rule has provided a powerful foundation for various subfields such as wirelength distribution estimation [2][3], fanout distribution estimation [13], and via count estimation [12][5].
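Because Rent's rule is a power law, its parameters are commonly extracted with a straight-line fit in log-log space, which is also the natural way to test a power law of density versus area. The sketch below is a minimal illustration on synthetic, noise-free data; the function name `fit_rent` and the generating values k = 3.0 and p = 0.6 are hypothetical.

```python
import math
from typing import List, Tuple


def fit_rent(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit T = k * N**p by ordinary least squares on (log N, log T).

    Each point is (N gates, T terminals), both > 0. Returns (k, p).
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    count = len(points)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope in log-log space = Rent exponent
    k = math.exp(y_mean - p * x_mean)  # intercept gives the Rent coefficient
    return k, p


if __name__ == "__main__":
    # Synthetic, noise-free partitions generated with k = 3.0 and p = 0.6.
    data = [(n, 3.0 * n ** 0.6) for n in (10, 100, 1000, 10000)]
    k, p = fit_rent(data)
    print(f"k = {k:.3f}, p = {p:.3f}")  # recovers the generating values
```

With measured (noisy) partition data, the same regression gives the usual empirical estimates of k and p.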
Motivated by the power-law relationship between power density and area shown in Figure 1, and by the many applications of the power-law-based Rent's rule, we seek a power law for power density. In this work, we consider (1) the maximum possible activity density for a given area, and (2) how activity density changes with time. We also (3) seek to identify a general trend for activity density, so as to enable early-stage estimates of power density and voltage noise, as well as of the required PDN resources.
DENSITY OF RANDOM ACTIVITY
Intuitively, given the average activity factor of a design, the activity density can change with the size and location of the sampling window. Figure 2 illustrates the sampling of area and the resulting changes in activity density. In addition, activity density can change with time. Figure 3 illustrates the temporal dependency of activity density: at each time t (e.g., clock cycle), the activity can change due to the design's functional behavior. We evaluate the maximum activity density of a design using an uncorrelated activity model, as follows. The average activity factor p (0 ≤ p ≤ 1) of a design is the average fraction of gates (or nets) in the entire design that toggle per cycle. The maximum activity density d(m) is defined as the maximum activity observed within an m × m window (1 ≤ m ≤ N), over all possible m × m windows in the design. To remove the dependency on design size, we define a normalized maximum activity density
Activity Density Calculation
Given an N × N array that represents a design with average activity factor p, we find the maximum activity density over all m × m regions using the procedure in Figure 4. Figure 5 shows an example for a design with p = 0.2 and N = 5.
Figure 4: Procedure MaximumActivityDensity. Inputs: activity information ∈ {0,1} for each element in the N × N array.
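The procedure of Figure 4 can be sketched as an exhaustive scan over all (N − m + 1)² window positions. This is a minimal, hypothetical rendering with our own function names. It assumes the normalization d_norm(m) = (maximum m × m window count)/(p·m²), a reading consistent with the limits reported in the empirical-model section (d_norm(1) ≈ 1/p and d_norm(m) → 1 as m grows). The 5 × 5 grid with p = 0.2 is illustrative, not the one in Figure 5.

```python
from typing import List

Grid = List[List[int]]  # 1 = active (switching) element, 0 = quiescent


def max_window_count(grid: Grid, m: int) -> int:
    """Maximum number of active elements over all m x m windows (brute force)."""
    n = len(grid)
    best = 0
    for x in range(n - m + 1):
        for y in range(n - m + 1):
            count = sum(grid[i][j] for i in range(x, x + m) for j in range(y, y + m))
            best = max(best, count)
    return best


def normalized_max_density(grid: Grid, m: int) -> float:
    """Assumed normalization: max window count divided by its expected value p*m*m."""
    n = len(grid)
    p = sum(map(sum, grid)) / (n * n)  # average activity factor of the design
    return max_window_count(grid, m) / (p * m * m)


if __name__ == "__main__":
    # Hypothetical 5 x 5 design with 5 active elements, i.e., p = 0.2.
    grid = [
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
    ]
    for m in range(1, 6):
        print(m, max_window_count(grid, m), round(normalized_max_density(grid, m), 3))
```

For this grid the maximum counts are 1, 2, 3, 4, 5 for m = 1..5, so d_norm falls from 1/p = 5 at m = 1 to exactly 1 at m = N.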
Maximum Activity Calculation for Artificial Data
To study activity density in large grid arrays and over many time steps, an efficient counting method is required. Figure 6 shows an efficient algorithm to calculate the switching activity in an m × m window with bottom-left corner (x, y) from the given activity information countInBox1. The value of countInBox1 at location (x, y) is 1 if there is switching activity at that location, and 0 otherwise. The algorithm iterates over m from 1 to N for all possible locations of m × m windows within the N × N grid array and, following a dynamic programming approach, computes the activity counts of larger windows from the previously computed counts of smaller windows.
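The dynamic-programming idea can be sketched as follows. The exact recurrence of Figure 6 may differ; this sketch assumes one valid recurrence. An m × m window at (x, y) is the union of the (m − 1) × (m − 1) windows at (x, y) and (x + 1, y + 1), whose overlap is the (m − 2) × (m − 2) window at (x + 1, y + 1), plus the two corner cells (x + m − 1, y) and (x, y + m − 1) that neither covers. Only the counts for sizes m − 1 and m − 2 are kept, giving O(N³) time and O(N²) memory for the whole sweep. `count_in_box1` plays the role of countInBox1; the function name is hypothetical.

```python
import random
from typing import List

Grid = List[List[int]]


def max_counts_all_sizes(count_in_box1: Grid) -> List[int]:
    """Return the max count over all m x m windows for m = 1..N, via dynamic programming.

    count_in_box1[x][y] is 1 if element (x, y) switches, else 0.
    Assumed recurrence: C_m(x, y) = C_{m-1}(x, y) + C_{m-1}(x+1, y+1) - C_{m-2}(x+1, y+1)
                                    + box1(x+m-1, y) + box1(x, y+m-1)
    """
    n = len(count_in_box1)
    prev2: Grid = []                           # counts for window size m-2
    prev1 = [row[:] for row in count_in_box1]  # counts for window size m-1 (starts at m = 1)
    maxima = [max(map(max, prev1))]
    for m in range(2, n + 1):
        size = n - m + 1                       # window positions per axis
        cur = [[0] * size for _ in range(size)]
        for x in range(size):
            for y in range(size):
                overlap = prev2[x + 1][y + 1] if m > 2 else 0  # overlap is empty when m == 2
                cur[x][y] = (prev1[x][y] + prev1[x + 1][y + 1] - overlap
                             + count_in_box1[x + m - 1][y] + count_in_box1[x][y + m - 1])
        maxima.append(max(map(max, cur)))
        prev2, prev1 = prev1, cur
    return maxima


if __name__ == "__main__":
    random.seed(0)
    n, p = 12, 0.2
    grid = [[1 if random.random() < p else 0 for _ in range(n)] for _ in range(n)]
    print(max_counts_all_sizes(grid))
```

A summed-area (2-D prefix-sum) table is a common alternative that yields each window count in O(1) after O(N²) preprocessing.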
EMPIRICAL MODEL FROM POWER DENSITY CALCULATIONS
Because activity is distributed randomly, we compute an average maximum activity density over multiple random trials. Figures 7 and 8 show our design of experiments for a single timeframe and for multiple timeframes, respectively, for various activity factors (p) and various numbers of timeframes (w). Figure 9 shows the normalized maximum activity density for different average activity factors. We observe that when m = 1, the normalized maximum activity density $d_{\text{norm}}(1)$ is roughly 1/p, and as m increases, $d_{\text{norm}}(m)$ decreases and converges to 1. Figure 10 shows how the normalized activity density changes with the number of timeframes. We observe that as the number of timeframes considered increases, the normalized maximum activity density decreases slowly.
[Figure caption fragment: … Figure 7, for a single timeframe.]
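The design of experiments can be sketched as a Monte Carlo loop. Since Figures 7 and 8 are not shown in detail, the sketch makes its own assumptions: each timeframe activates exactly round(pN²) randomly chosen elements, activity is summed per element over w timeframes before windowing, and d_norm(m) is the maximum window count divided by its expected value p·m²·w, averaged over trials. Window sums use a summed-area (2-D prefix-sum) table. All names and parameter values are hypothetical.

```python
import random
from typing import List


def prefix_sums(a: List[List[int]]) -> List[List[int]]:
    """Summed-area table s with s[i][j] = sum of a[0:i][0:j]."""
    n = len(a)
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            s[i + 1][j + 1] = a[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s


def max_window(s: List[List[int]], n: int, m: int) -> int:
    """Largest m x m window sum, read from the summed-area table in O(1) per window."""
    return max(
        s[x + m][y + m] - s[x][y + m] - s[x + m][y] + s[x][y]
        for x in range(n - m + 1)
        for y in range(n - m + 1)
    )


def avg_dnorm(n: int, p: float, w: int, trials: int, seed: int = 0) -> List[float]:
    """Average normalized maximum activity density for m = 1..n (assumptions above)."""
    rng = random.Random(seed)
    k = round(p * n * n)                   # active elements per timeframe
    totals = [0.0] * n
    for _ in range(trials):
        acc = [[0] * n for _ in range(n)]  # activity summed over w timeframes
        for _ in range(w):
            for cell in rng.sample(range(n * n), k):
                acc[cell // n][cell % n] += 1
        s = prefix_sums(acc)
        for m in range(1, n + 1):
            totals[m - 1] += max_window(s, n, m) / (p * m * m * w)
    return [t / trials for t in totals]


if __name__ == "__main__":
    for w in (1, 4, 16):
        d = avg_dnorm(n=20, p=0.25, w=w, trials=5)
        print(w, [round(d[m - 1], 2) for m in (1, 2, 5, 10, 20)])
```

With w = 1, d_norm(1) is exactly 1/p because at least one element is active, and d_norm(N) is exactly 1 because the whole array holds pN² active elements.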
Normalized Activity Density
Important characteristics of the maximum activity density are summarized as follows.
(5) The decay rate of $d_{\text{norm}}(m)$ depends on p: for large p, the rate is small, but for small p, the rate is large.
(6) $d_{\text{norm}}(m)$ decreases slowly as w increases.
(7) As p decreases toward zero, $d_{\text{norm}}(m)$ diverges to infinity.
Normalized Activity Density Model
Based on the observations in the previous subsection, we propose an empirical activity density model.
In Equation (2), the term $(1-p)/p$ and the final constant 1 model the observed characteristics (2) and (3), the exponent $w^{c_1(p-1)}$ models characteristics (4) and (5), and together these terms represent characteristic (6). The term $m^{-c_2 p^{c_3}}$ models the power-law decay of the density with respect to m, i.e., characteristic (1).
From curve fitting (using nlinfit in MATLAB [15]), we find the model coefficients $c_1$, $c_2$ and $c_3$, and finally obtain the following model equation for the maximum activity density.
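The coefficients are fitted with MATLAB's nlinfit. As a dependency-free alternative, and a different technique (log-linearized least squares), the sketch below handles only the single-timeframe case w = 1. There the w-dependent factor equals one, and we assume the remaining terms combine as d_norm(m) = ((1 − p)/p)·m^(−c2·p^c3) + 1, which is consistent with d_norm(1) = 1/p and d_norm(m) → 1. Then ln[(d_norm − 1)·p/(1 − p)] = −b(p)·ln m with b(p) = c2·p^c3. A slope fit per p, followed by a log-log fit of b(p) against p, recovers c2 and c3. The data are synthetic, generated from hypothetical coefficients, and the function names are ours.

```python
import math
from typing import Dict, List, Tuple


def model_w1(m: float, p: float, c2: float, c3: float) -> float:
    """Assumed single-timeframe (w = 1) form of the density model."""
    return (1.0 - p) / p * m ** (-c2 * p ** c3) + 1.0


def slope_through_origin(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope s of y = s * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares y = a + b * x; returns (a, b)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    b = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    return ym - b * xm, b


def fit_c2_c3(data: Dict[float, List[Tuple[int, float]]]) -> Tuple[float, float]:
    """data maps p -> [(m, measured d_norm(m))] for w = 1; returns (c2, c3)."""
    log_p, log_b = [], []
    for p, samples in data.items():
        xs, ys = [], []
        for m, d in samples:
            if m > 1 and d > 1.0:  # ln m = 0 carries no slope information
                xs.append(math.log(m))
                ys.append(math.log((d - 1.0) * p / (1.0 - p)))
        b = -slope_through_origin(xs, ys)  # b(p) = c2 * p**c3
        log_p.append(math.log(p))
        log_b.append(math.log(b))
    ln_c2, c3 = linear_fit(log_p, log_b)  # ln b = ln c2 + c3 * ln p
    return math.exp(ln_c2), c3


if __name__ == "__main__":
    true_c2, true_c3 = 0.9, -0.3  # hypothetical generating coefficients
    ps = [0.05, 0.1, 0.25, 0.5]
    synthetic = {p: [(m, model_w1(m, p, true_c2, true_c3)) for m in range(1, 51)] for p in ps}
    print(fit_c2_c3(synthetic))  # close to (0.9, -0.3)
```

On measured data, the linearization weights small values of d_norm − 1 heavily, which is one reason a direct nonlinear fit such as nlinfit can be preferable.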
Model Validation
We compare our model with the collected data. Figure 11 shows the model accuracy for different w values at a given p (= 0.05) and various m values; our model correctly captures the impact of multiple timeframes w. Figure 12 shows the model accuracy with respect to p for a single timeframe (w = 1) and various m values; again, our model accounts for the impact of p with sufficient accuracy.
We also validate our model on a real design, using power values obtained from a typical power estimation flow. Figure 13 summarizes this flow:
1. Run a cycle-accurate architectural simulator with benchmarks, and periodically take a snapshot of the entire circuit state, i.e., the values of all registers and the input/output traces.
2. Generate a Verilog testbench that initializes the circuit state, drives inputs and checks outputs.
3. Synthesize, place and route the RTL designs.
4. Run gate-level simulation with the generated Verilog testbench and back-annotated standard delay format (SDF) timing, and generate a value change dump (VCD) file.
5. Feed the VCD file into a power estimator (Synopsys …)

To analyze power density trends, we split the design into 100 × 100 grids and assign the power of each cell instance to the grid in which it is placed. We then calculate how power density changes with the window size m. Figure 14 shows the normalized maximum power density of the sparc_exu_alu design for a single (maximum-power) cycle and for multiple clock cycles. The power density trends of the real design are similar to the activity density trends of the artificially generated data: (1) power density increases sharply as the area decreases, and (2) it decreases slowly as the number of cycles considered increases. However, the power density of the real design deviates from our model even more than the artificial activity density data do. This may be because the switching activity of the real design is significantly lower than the average activity assumed for the artificial data.
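The grid-binning step can be sketched as follows. The cell list, die dimensions and function names are hypothetical. Normalizing the maximum m × m window power by m² times the average per-grid power is our assumption, chosen to mirror the normalization of activity density. A real flow would read cell locations from the placed design and per-instance power from the power report.

```python
from typing import List, Tuple

Cell = Tuple[float, float, float]  # (x, y, power); x, y in the same units as the die size


def bin_power(cells: List[Cell], die_w: float, die_h: float, g: int) -> List[List[float]]:
    """Accumulate instance power into a g x g grid by cell location."""
    grid = [[0.0] * g for _ in range(g)]
    for x, y, pw in cells:
        i = min(int(x / die_w * g), g - 1)  # clamp cells lying exactly on the die edge
        j = min(int(y / die_h * g), g - 1)
        grid[i][j] += pw
    return grid


def normalized_max_power_density(grid: List[List[float]], m: int) -> float:
    """Max m x m window power divided by m*m times the average power per grid."""
    g = len(grid)
    avg = sum(map(sum, grid)) / (g * g)
    best = max(
        sum(grid[i][j] for i in range(x, x + m) for j in range(y, y + m))
        for x in range(g - m + 1)
        for y in range(g - m + 1)
    )
    return best / (m * m * avg)


if __name__ == "__main__":
    # Hypothetical placed cells on a 10 x 10 (arbitrary units) die, binned into 4 x 4 grids.
    cells = [(1.0, 1.0, 2.0), (1.5, 2.0, 1.0), (9.9, 9.9, 0.5), (5.0, 5.0, 0.5)]
    grid = bin_power(cells, 10.0, 10.0, 4)
    for m in range(1, 5):
        print(m, round(normalized_max_power_density(grid, m), 3))
```

Here the two cells in the lower-left grid hold 3 of the 4 power units, so the normalized density is 12 at m = 1 and falls to 1 when the window covers the whole die.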

ANALYTIC MODEL BASED ON DISCREPANCY THEORY
Separately, we also attempt to estimate the maximum activity density analytically, based on Chernoff bounds [1].
Model Construction
Let $G_n$ denote an $n \times n$ grid of devices. We assume that each device is either quiescent (i.e., not switching) or active (i.e., switching) at each time step. For time step t and device $d_{ij}$, let $s_{ij}(t)$ denote the random variable that indicates whether $d_{ij}$ is active at that time step. We count the active devices in geometric rectangles $R^{uv}_m = R^{uv}_{m\times m}$ of size $m \times m$ whose top-right corner is $(u, v)$. The number of active devices in a rectangle at time t is $S^{uv}_m(t) = \sum_{(i,j)\in R^{uv}_m} s_{ij}(t)$. We define the maximum activity count over the rectangles $R^{uv}_m$ as $S_m(t) = \max_{(u,v)} S^{uv}_m(t)$. We seek to determine how $S_m(t)$ behaves as a function of m and t.
Figure 12: Estimated maximum activity density from our model versus measured maximum activity density from experiments, for a single timeframe (i.e., w = 1) with various p.
We first start with a simplified time-invariant model in which we assume that $s_{ij}(t) = 1$ with probability p, independently of other devices and of t. Since this model is time-invariant, we drop the dependence on t from our notation.
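Fact 1 (Chernoff Bound). Let $x_1, \ldots, x_n$ be mutually independent 0/1 random variables, each equal to 1 with probability p. If $X = \sum_{i=1}^{n} x_i$ and $\mu = E[X] = np$, then for any $0 < \delta < 1$, we have the following inequalities:
$$\Pr[X > (1+\delta)\mu] < e^{-\mu\delta^{2}/4} \qquad (3)$$
$$\Pr[X < (1-\delta)\mu] < e^{-\mu\delta^{2}/2}$$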
Equation (3) bounds the probability that X exceeds its mean µ by more than δµ. Since $S^{uv}_m$ is a sum of $m^2$ independent 0/1 variables, each equal to 1 with probability p, Fact 1 applies with X replaced by $S^{uv}_m$, whose mean count µ is $pm^2$. By probability theory, the following inequality must be satisfied.
[Figure 13 labels: benchmark binary (bzip, equak, ..., SPEC-2000); power report]
The probability $\Pr[S^{uv}_m > (1+\delta)\mu]$ is then bounded by $\varepsilon/(n-m+1)^2$, where $(n-m+1)^2$ is the number of locations of an $m \times m$ window in an $n \times n$ design. Equating the right-hand side of Equation (3) to $\varepsilon/(n-m+1)^2$ results in
$$\delta = \frac{\sqrt{4p\ln\frac{(n-m+1)^{2}}{\varepsilon}}}{pm}.$$
Substituting δ into Equation (3) gives the following property.
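Lemma. For any (u, v),
$$\Pr\Big[S^{uv}_m > pm^{2} + m\sqrt{4p\ln\tfrac{(n-m+1)^{2}}{\varepsilon}}\Big] \le \frac{\varepsilon}{(n-m+1)^{2}}.$$
From the Lemma, a union bound over the $(n-m+1)^2$ window locations yields Fact 2 for the maximum activity count of an $m \times m$ window at a single time step: $S_m \le pm^2 + m\sqrt{4p\ln\frac{(n-m+1)^2}{\varepsilon}}$ with probability at least $1-\varepsilon$. The corresponding bound over T time steps is
$$S^{T}_m \le pm^{2}T + m\sqrt{4pT\ln\frac{(n-m+1)^{2}}{\varepsilon}}$$
with probability at least $1-\varepsilon$.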
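The algebra behind the Lemma can be written out as follows. This assumes the upper-tail bound of Equation (3) has the form $e^{-\mu\delta^2/4}$, a standard Chernoff form and the one consistent with the factor 4 in the expression for δ. With $\mu = pm^2$, setting it equal to $\varepsilon/(n-m+1)^2$ gives:

$$
\begin{aligned}
e^{-pm^{2}\delta^{2}/4} = \frac{\varepsilon}{(n-m+1)^{2}}
\;&\Longrightarrow\;
\delta = \frac{1}{m}\sqrt{\frac{4}{p}\ln\frac{(n-m+1)^{2}}{\varepsilon}}
       = \frac{\sqrt{4p\ln\frac{(n-m+1)^{2}}{\varepsilon}}}{pm},\\
(1+\delta)\mu &= pm^{2} + pm^{2}\delta
             = pm^{2} + m\sqrt{4p\ln\frac{(n-m+1)^{2}}{\varepsilon}}
             = \mu + \sqrt{4\mu\ln\frac{(n-m+1)^{2}}{\varepsilon}}.
\end{aligned}
$$

Written in the last form, the threshold also gives the T-step expression $pm^2T + m\sqrt{4pT\ln\frac{(n-m+1)^2}{\varepsilon}}$ when µ is replaced by $pm^2T$. This step assumes that the window activity over T steps is a sum of $m^2T$ independent Bernoulli(p) variables.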
Model Validation
We verify Fact 3 using the generated data set described in Section 2. We calculate $S^T_m$ for all 1,843,200 combinations of the following parameters, using the algorithm given in Figure 6.
• p = {0.05, 0.10, 0.15, 0.25, 0.5, 0.9}
• n = {100, 200}
• m = {1,...,100} for n = 100, and m = {1,...,200} for n = 200
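A small-scale version of this check can be sketched as below. It assumes $S^T_m$ is the maximum, over window positions, of the activity summed over T independent time steps. It draws each $s_{ij}(t)$ as an independent Bernoulli(p) variable and compares the empirical maximum with the bound pm²T + m·sqrt(4pT·ln((n − m + 1)²/ε)). Grid size, T, seed and function names are hypothetical and far smaller than in the experiment above.

```python
import math
import random
from typing import List


def fact3_bound(p: float, n: int, m: int, T: int, eps: float) -> float:
    """Right-hand side of the Chernoff-based bound on S_m^T."""
    return p * m * m * T + m * math.sqrt(4.0 * p * T * math.log((n - m + 1) ** 2 / eps))


def empirical_max_counts(p: float, n: int, T: int, seed: int = 0) -> List[int]:
    """S_m^T for m = 1..n: max m x m window count of activity summed over T steps."""
    rng = random.Random(seed)
    acc = [[sum(rng.random() < p for _ in range(T)) for _ in range(n)] for _ in range(n)]
    s = [[0] * (n + 1) for _ in range(n + 1)]  # summed-area table of acc
    for i in range(n):
        for j in range(n):
            s[i + 1][j + 1] = acc[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return [
        max(s[x + m][y + m] - s[x][y + m] - s[x + m][y] + s[x][y]
            for x in range(n - m + 1) for y in range(n - m + 1))
        for m in range(1, n + 1)
    ]


if __name__ == "__main__":
    p, n, T, eps = 0.2, 30, 4, 0.01
    counts = empirical_max_counts(p, n, T)
    for m in (1, 5, 10, 20, 30):
        print(m, counts[m - 1], round(fact3_bound(p, n, m, T, eps), 1))
```

Comparing the two columns for each m shows how much slack the bound leaves at each window size.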
The inequality in Fact 3 is verified using data fitting. First, we transform the inequality into an equation with a model coefficient α as
Second, we find the coefficient α by curve fitting, using $S^T_m$ values obtained from the generated data set and values of Equation (5) evaluated with the given parameters p, m, n and T. Finally, we check whether the value of α is reasonable: if Fact 3 is correct, α should approach one. Table 1 shows the fitted α values for various ε values, together with the errors of the model in Equation (5) for each α. For each combination of p, n, m and T, we calculate the error as the difference between the modeled value ($S^T_m$ from Equation (5)) and the measured value (from our data set), divided by the measured value. We then take the average and maximum errors over all p, n, m and T combinations. From the table, we observe that all fitted α values are near one and none exceeds one, which suggests that Fact 3 gives tight estimates. We also observe that the model is highly accurate: the average model error is less than 2% for all ε values. However, as shown in the fourth column, the maximum error is very high for a small portion of the data. Since $S^T_m$ values are small when m is small, a small discrepancy between the data and the estimate becomes significant as a percentage. (If we discard the cases m < 10, the average error drops to 0.67% to 1.01%, and the maximum error drops to 51.94% to 74%.)
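The fitting and error statistics described above can be sketched as follows. The sketch assumes one possible placement of α, namely that α scales the whole right-hand side of the Fact 3 inequality; this is a hypothetical reading of Equation (5). Under that reading, the least-squares α has the closed form Σ yᵢbᵢ / Σ bᵢ². The relative error follows the definition in the text, (modeled − measured)/measured, taken here as an absolute value. The data values and function names are hypothetical.

```python
from typing import List, Tuple


def fit_alpha(measured: List[float], bound: List[float]) -> float:
    """Least-squares alpha for measured ~ alpha * bound (assumed form of Equation (5))."""
    return sum(y * b for y, b in zip(measured, bound)) / sum(b * b for b in bound)


def relative_errors(measured: List[float], modeled: List[float]) -> Tuple[float, float]:
    """Average and maximum of |modeled - measured| / measured."""
    errs = [abs(mod - meas) / meas for meas, mod in zip(measured, modeled)]
    return sum(errs) / len(errs), max(errs)


if __name__ == "__main__":
    measured = [2.0, 10.0, 40.0]  # hypothetical S_m^T values (small m first)
    bound = [4.0, 11.0, 42.0]     # hypothetical bound values for the same cases
    alpha = fit_alpha(measured, bound)
    modeled = [alpha * b for b in bound]
    avg_err, max_err = relative_errors(measured, modeled)
    print(f"alpha = {alpha:.3f}, average error = {avg_err:.1%}, maximum error = {max_err:.1%}")
```

In this toy data the smallest measured value dominates the maximum error, which mirrors the small-m effect noted above.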
CONCLUSION
We have presented a general law for power density which can potentially enable new estimates of power density and voltage noise, as well as of required power distribution network (PDN) resources in early design stages. From computational experiments as well as discrepancy analyses using random activity distributions, and from experiments using a real design and a production design flow, we empirically observe a power-law relationship between maximum switching activity density and area. We provide closed-form activity density models from empirical data analysis and probability theory; these can be used to improve the accuracy and efficiency of early-stage PDN resource prediction. Our ongoing work includes further simplification of models and validation of the proposed models against large industry designs. Our ultimate goal is to develop fast and accurate PDN design and optimization methodologies for early stages of IC design.
