Abstract-This paper proposes accurate architectural-level interconnect wirelength and fan-out models. Existing models are based on Rent's rule and fail to capture the impact of microarchitectural and implementation parameters. Hence, significant deviation is observed when validated against implementation data, i.e., up to 79% (22%) in total wirelength (average fanout). Our proposed models both enable architectural-level prediction of interconnect wirelength and fanout, and show significant accuracy improvement vs. existing models with respect to layout data.
I. INTRODUCTION Clock frequency, power consumption, and chip size are largely determined by the wiring requirements of a VLSI system. Hence, wirelength estimation has always been instrumental in determining physically achievable design implementations [4] . Likewise, interconnect fanout can be used in interconnect modeling to enable more accurate delay and power calculations.
Based on different design stages and available information at each stage, we categorize wirelength estimation approaches into: (1) analytical, (2) netlist-based, and (3) placement-based [1] , [6] . Analytical methods are based on Rent's rule, a famous wellestablished empirical relationship, and assume that the netlist is not known. However, the input may consist of the major factors, the total number of gates, and average number of pins per gate. In netlistbased methods, the total number of gates, total number of nets, total number of pins, and fanout distributions are easily obtained. Since we know exact values of major parameters, wirelength and fanout values can be predicted more accurately. Such estimates are typified by the "wireload models" used in RTL synthesis optimizations. After the placement stage, all the pins in the netlist are placed and the available routing resources, including number of routing layers, etc. are known and better estimation accuracy is achieved.
In this work, we propose interconnect wirelength and fanout models that have the flexibility of the analytical models with the accuracy of placement-based models. Our contributions are as follows.
• We propose accurate closed-form models to estimate interconnect wirelength and fanout. Our models are derived from implementation data.
• We consider microarchitectural and implementation parameters in our models to allow more accurate estimation of interconnect wirelength and fanout.
• Our models show significant accuracy improvement vs. the existing models with respect to implementation data. The remainder of this paper is organized as follows. In Section II we review two of the recent works in interconnect wirelength and fanout estimations [3] , [9] . Section III describes our implementation flow and testcases, and presents the details of our design of experiments. Section IV describes our modeling methodology and presents our proposed average wirelength and fanout models for one of our testcases. In Section V we validate our proposed models and compare them with the existing models. Finally, Section VI concludes the paper.
II. EXISTING MODELS
The underlying assumption for the existing models is that Rent's rule holds throughout an entire system. Rent's rule is a simple power-law relationship between the number of I/O terminals for a logic block, T , and the number of gates contained in that block, N as shown in [8] :
where k and p are empirical parameters. Hence, the number of nets of length l, Net(l) may be estimated by [3] 
where D(l) is the number of valid two-pin net placement sites and q(l) is the probability that a placement site is occupied. The occupancy probability can be derived as
and the number of 2-terminal net placement sites within the boundary of a finite, 2-D array of N gates is given in Equation (4) .
In addition, fanout distribution can also be derived by using Rent's rule recursively in a collection of gates. The following set of equations are due to [9] . In a system with N gates, the overall number of terminals connected through m-point nets is N ·Tnet(m), where Tnet(m) is the number of terminals shared through an mpoint net of each gate in the boundary of the m gates. Therefore total number of m-point nets is simply N · Tnet(m)/m. Hence, total number of m-point nets in the entire system can be described by Equation (5) . Subsequently, replacing m with F o+1, where F o is fanout, gives Equation (6) .
Equation (6) describes the fanout distribution in a logic network of N gates, in which k and p are Rent parameters. Despite their flexibility, existing analytical wirelength and fanout models are not accurate due to (1) not having accurate information about the netlist, and (2) not taking into account the combining impact of microarchitectural and implementation parameters. Final design outcomes are affected by certain optimization steps that are employed during design implementation, i.e., pre-placement, post-placement, and pre-clock tree synthesis optimization steps, etc. Hence, the choice of implementation parameters can significantly change the quality of results. In this work, we consider the impact of relevant implementation parameters in our models (cf. Section III). Figure 1 shows our implementation flow which includes traditional SP&R flow plus wiring characteristic extraction and model generation steps that we have scripted for "push-button" use in our experiments. The steps in Figure 1 represent the major physical design steps. At each step we require that the design must meet the timing requirements before it can pass on to the next step. In our flow, we first synthesize corresponding RTL codes of each of our testcases with worst-case timing libraries. During synthesis process we can set certain optimization objectives among which are: (1) target frequency, (2) target area, and (3) target power. Choosing different combinations of these objectives yields widely different quality of results. Hence, to mimic a typical industrial timing-driven flow, where certain performance constraints must be satisfied, we impose the target frequency as the primary objective while area and power minimization are considered as secondary objectives. In addition to optimization parameters, there are some 'inherent noise' in IC implementation tools that may be aggravated with respect to tightness of the timing constraint (i.e., target frequency) [7] . Therefore, we implement the designs at slower frequencies than the maximum achievable.
III. IMPLEMENTATION FLOW AND DESIGN OF EXPERIMENTS A. Implementation Flow and Tools
Using synthesized netlist we implement the designs through place & route (P&R) steps using different row utilization and aspect ratio values, at the floorplan stage. At the end of routing stage we generate the placed and routed DEF, and obtain wiring characteristics, i.e., total wirelength, total number of nets, average fanout, etc. and associate them with the corresponding microarchitectural and implementation parameters. In order to compare our models against the existing models we need to provide Rent parameters. We use an internal Rent parameter evaluation program which calculates Rent parameters of a given circuit design using circuit partitioning method, MLPart [2] .
In our experiments we use two different IP cores: (1) on-chip router, and (2) Discrete Fourier Transform (DFT). We use Netmaker v0.82 and SPIRAL v1.0 to generate a library of synthesizable router and DFT core RTL netlists, respectively. We perform our experiments using libraries in TSMC 65GP technology. We use Synopsys Design Compiler [14] to synthesize the RTL netlists and Cadence SOC Encounter [13] to execute the P&R flow. We use different row utilization and aspect ratio values (explained more in Section III-B) and RentCon, an internal Rent parameter evaluation program, to extract Rent parameter values. Finally, MARS3.0 tool suite [12] is used for model generation.
B. Design of Experiments
As with any other model characterization task, it is not trivial to obtain meaningful and sensible results, as large number of variables determines the ultimate outcomes. Hence, we only choose the relevant microarchitectural and implementation parameters that are of interest at the system-level which also significantly affect the quality of results. In our experiments we have two main axes: (1) microarchitectural parameters, and (2) implementation parameters. Below we explain the corresponding microarchitectural parameters for each of our testcases and show the values that they take on in our experiments.
On-Chip Router. For router we pick a baseline virtual channel (VC) router in which VC allocation and switch allocation are performed sequentially in one clock cycle. In a VC router the microarchitectural parameters are:
• fw: flit-width is the width of link or channel between routers and is specified in bits In addition to the above list, there are three more parameters which control the resource usage of the DFT core: (1) dir which is a boolean parameter that selects whether the core accepts naturalordered input vectors and outputs bit-reversed-ordered output vectors, or vice versa, (2) scale which is a boolean parameter that determines whether the fixed-point data representation of intermediate values is scaled by a factor 2 to avoid overflow, and (3) dp which specifies the number of kernel modules instantiated in the DFT implementation (i.e., degree of parallelism). These parameters are to allow the user to customize the tradeoff between performance and resource usage and do not change the functionality of the core. Hence, we do not consider them in our models. However, if their impact on wiring characteristics (i.e., wirelength and fanout) is of interest, then they can be added to the model at the expense of increased setup cost due to larger number of configurations. Table  I shows the router and DFT core microarchitectural parameters and the values they take on in our design of experiments. Our implementation parameters include (1) clock frequency fclk, (2) aspect ratio ar, and (3) row utilization util. Target clock frequencies for the router design are 100MHz, 200MHz, and 400MHz, and we use 100MHz, 250MHz, and 450MHz for the DFT core. We also use three different aspect ratio values, 0.5, 1.5, and 2.5, while for row utilization, 50%, 70%, and 92% are used. Using automation scripts we vary the above microarchitectural parameters and generate corresponding RTL netlists for router and DFT cores, respectively. We then implement the RTL netlists using TSMC 65GP library.
IV. MODELING PROBLEM Approximating a function of several to many variables using only the dependent variable space is a well-known problem with applications in many disciplines. The goal is to model the dependence of a target variable y on several predictor variables x1, · · · , xn given realizations {yi, x1i, · · · , xni} N 1 . The system that generates the data is presumed to be described by [5] y = f (x1, · · · , xn) +
over some domain (x1, · · · , xn) ∈ D ⊂ R n containing the data. There are two main regression analysis methods: (1) global parametric, and (2) nonparametric. The former approach has limited flexibility, and can produce accurate approximations only if the assumed underlying function,f , is close to f . In the latter approach,f does not take a predetermined form, but is constructed according to information derived from the data. Multivariate adaptive regression splines (MARS) is a machine learning-based regression technique which is used in our methodology.
Given a design microarchitecture X μarch , circuit implementation C circ , we apply MARS to construct interconnect wirelength and fanout models, W L =f (x 
whereŷ is the target variable (i.e., average wirelength and fanout), c0 is a constant, ci are fitting coefficients, and bij(xij) is the truncated basis function with xij being the microarchitectural / implementation parameter used in the i th term of the j th product. I is the number of basis functions and J limits the order of interactions.
The optimal MARS model is built in two passes: (1) Forward pass: MARS starts with just an intercept, and then repeatedly adds basis function in pairs to the model. Total number of basis functions is an input to the modeling, and (2) Backward pass: during the forward pass MARS usually builds an overfit model; to build a model with better generalization ability, the backward pass prunes the model using a generalized cross-validation (GCV) scheme
where n is the number of observations in the data set, I is the number of non-constant terms, and C(M ) is a complexity penalty function to avoid overfitting. Using implementation data and the nonparametric regression technique explained above, we propose accurate architectural-level interconnect wirelength and fanout models. Figures 2 and 3 show our proposed average wirelength and fanout models for DFT core, respectively. V. EXPERIMENTAL SETUP AND SIGNIFICANCE ASSESSMENT We now validate our wirelength and fanout models described in earlier sections. For interconnect wirelength we compare four different models.
Basis
1) Our proposed model (Prop.) 2) Christie et al. [3] model with N , p, and k modeled as a function of microarchitectural and implementation parameters (Model 2) 3) Modified Christie model with a correction factor (Model 3) 4) Christie model with N , p, and k derived from layout data for each configuration (Model 4) For interconnect fanout we compare the same four models except that our reference model is the one proposed by Zarkesh-Ha et al. [9] .
Our proposed model (Prop.) uses machine learning-based regression method, i.e., MARS, to estimate interconnect wirelength and fanout as a function of microarchitectural and implementation parameters. For Model 2 we use the same regression approach to model N , p, and k as a function of microarchitectural and implementation parameters. Then, we use the estimated Rent parameters in the Christie (Zarkesh-Ha) model to obtain wirelength (fanout) values. In the Christie model, unit distance between adjacent placement sites, i.e., l = 1, is modeled as XdieYdie/N , where Xdie and Ydie are the width and height of the floorplan size of each design, respectively. In Model 3, we first model N , p, and k as a function of microarchitectural parameters only (i.e., with similar modeling approach as in Prop.). Next we apply the estimated Rent parameters in the Christie (Zarkesh-Ha) model and introduce a correction factor α such that Actual Wirelength = α × ModelChristie (Actual Fanout = α × Model Zarkesh−Ha). Then we model α as a function of implementation parameters only, using the same modeling approach as in our proposed models. In the modified Christie's model (Model 3), we do not include the unit distance model as used in Model 2 because unit distance depends implementation parameters. Finally, Model 4 uses Christie (Zarkesh-Ha) model to estimate wirelength (fanout) using extracted N , p, and k values from implemented designs.
For Models 2, 3, and 4, to extract Rent parameters, we use layout reports to obtain N , and an internal Rent parameter evaluation program RentCon, which extracts p and k values from a placed and routed DEF. To compute p and k we use circuit partitioning-based method in which we recursively partition the netlist into smaller partitions using min-cut bisection until the minimum number of cell instances over all partitions reaches 2, and we count each sourcesink connection crossing the boundary as one pin. For each level of the recursive bipartitioning, we compute the geometric mean values of cell instances and pins, which represent one data point in the fitted curve of the Rent parameter. To generate our models, we randomly select 10% of our entire data set (i.e., total of 2187 cases for each testcase), and test the models on the other 90% of the data. To show that the selection of the training set insignificantly changes our model accuracy we randomly select 10% of the entire data set five times and show the corresponding maximum and average error values (Table IV) . Tables II and III show comparisons of our proposed wirelength and fanout models with the above models, respectively. We observe significant accuracy improvement vs. existing models (Model 4) with respect to layout data. Our estimated average wirelength values show an accuracy improvement of up to 14.7% (58.2%), and 24.9% (42%) in average (maximum) errors for DFT and router testcases, respectively. For average fanout, we observe up to 9.3% (17%), and 5.4% (16.8%) in average (maximum) errors for DFT and router testcases, respectively. Finally, Figures 4, 5, 6 , and 7 show scatter plots of our average wirelength and fanout estimations against corresponding Christie and Zarkesh-Ha models (i.e., Model 4) with respect to layout data. 2 These plots confirm the accuracy improvement of our proposed models vs. existing models.
From Models 2 and 3 we understand that both microarchitectural and implementation parameters should be considered during model development. In addition, we can confirm that existing Rent's rulebased wirelength and fanout estimation models fail to correctly capture the impact of microarchitectural and implementation parameters which results in unrealistic wiring characteristics estimations. 2 These plots are for the DFT models, however, router models show similar accuracy. 
VI. CONCLUSIONS
In this work, we present accurate architectural-level interconnect wirelength and fanout prediction models. Our models show significant accuracy improvement vs. existing models, i.e., up to 58% and 17% for wirelength and fanout estimations, respectively. Accurate wiring estimations can drive effective early-stage design space exploration. To enable system-level design space exploration, we consider all the relevant microarchitectural parameters of our testcases. In addition, our models consider the combining effect of the architectural and implementation parameters on wiring characteristics.
