A fast area-delay estimation technique for RTL component generators by Jha, Pradip K. & Dutt, Nikil D.
UC Irvine
ICS Technical Reports
Title
A fast area-delay estimation technique for RTL component generators
Permalink
https://escholarship.org/uc/item/5b92c1n4
Authors
Jha, Pradip K.
Dutt, Nikil D.
Publication Date
1992-04-10
 
Peer reviewed
A Fast Area-Delay Estimation Technique
for RTL Component Generators
Pradip K. Jha and Nikil D. Dutt 
Technical Report #92-33 
April 10, 1992 
Dept. of Information and Computer Science 
University of California, Irvine 
Irvine, CA 92717 
(714) 856-8059 
Abstract 
An important benefit of high-level synthesis is rapid design space exploration through examina-
tion of different design alternatives. However, such design space exploration is not feasible with-
out fast and accurate area and delay estimates of the synthesized designs. These estimates must 
factor in physical design effects and technology-specific information in order to achieve accuracy.
High-level synthesis tools often use abstract, parameterized component generators for describing the
synthesized RT design, and thus need to be supported by fast and accurate estimators for these pa-
rameterized RT-components. Ideally, we would like to obtain the actual area and delay attributes of 
each component by constructing (or generating) the designs. However, such constructive methods 
require excessive run times, prohibiting on-line integration with the tasks of scheduling and allo-
cation. In this paper, we describe a fast (on-line) method for estimating the area and delay of 
regular-structured generic RT components that are tuned to a particular technology library. The 
estimation models are generated using a least-square approximation on a set of sample data points 
from selected component implementations. We performed an extensive set of experiments to val-
idate our estimation technique on combinational as well as sequential RT component generators. 
The results show a prediction of the area and delay to within 10% of the actual values. These 
models have also been integrated with a high-level synthesis system to permit on-line estimation of 
a component's area and delay. 

Contents
1 Introduction
2 Related Work
3 Problem Definition
4 Our Approach
4.1 Area-Delay Models
4.2 Formulation of Estimation Models
4.3 Estimation Model Generation
4.4 Selection and management of components in the database
4.5 An Example
5 Models for other components
5.1 With respect to DTAS
5.1.1 Gates
5.1.2 Decoder
5.1.3 Multiplexer
5.1.4 Comparator
5.1.5 Arithmetic and Logic Unit (ALU)
5.2 With respect to LAST/TELE
5.2.1 Generic AND gate
5.2.2 Multiplexer
5.2.3 Logic unit (LU)
5.2.4 Encoder
5.2.5 Shift Register
5.2.6 Adder
6 Experiments and Results
6.1 Results
6.1.1 With respect to DTAS
6.1.2 With respect to LAST/TELE
6.2 Analysis
6.2.1 With respect to DTAS
6.2.2 With respect to LAST/TELE
7 Summary
8 Acknowledgements
List of Figures
1 Top level structure of the Logic Unit
2 Structure per bit of Logic Unit
3 Design of decoder
4 Top level structure of Multiplexer
5 Structure per bit of Multiplexer
6 Flowchart to estimate area-delay for a comparator
7 Comparison of actual and calculated area for Multiplexer wrt DTAS
8 Comparison of actual and calculated delay for Multiplexer wrt DTAS
9 Comparison of actual and calculated area for Comparator wrt DTAS
10 Comparison of actual and calculated delay for Comparator wrt DTAS
11 Comparison of actual and calculated area for LU wrt DTAS
12 Comparison of actual and calculated delay for LU wrt DTAS
13 Comparison of actual and calculated area for ALU wrt DTAS
14 Comparison of actual and calculated delay for ALU wrt DTAS
15 Aggregate Error profile wrt DTAS
16 Comparison of actual and calculated area for REGISTER wrt LAST/TELE
17 Comparison of actual and calculated delay for REGISTER wrt LAST/TELE
18 Comparison of actual and calculated area for ADDER wrt LAST/TELE
19 Comparison of actual and calculated area for ADDER wrt LAST/TELE
20 Comparison of actual and calculated delay for ADDER wrt LAST/TELE
21 Comparison of actual and calculated delay for ADDER wrt LAST/TELE
22 Aggregate Error profile wrt LAST/TELE
List of Tables
1 Function table for a bit-slice of a Logic Unit (LU)
2 Results for generic AND gate as compared to DTAS
3 Results for generic OR gate as compared to DTAS
4 Results for generic NAND gate as compared to DTAS
5 Results for generic NOR gate as compared to DTAS
6 Results for generic XOR gate as compared to DTAS
7 Results for generic XNOR gate as compared to DTAS
8 Results for Logic gates as compared to DTAS
9 Results for Multiplexer as compared to DTAS
10 Results for Comparator as compared to DTAS
11 Results for Logic unit as compared to DTAS
12 Results for ALU as compared to DTAS
13 Results for generic AND gate as compared to LAST/TELE
14 Results for Multiplexer as compared to LAST/TELE
15 Results for Logic unit compared to LAST/TELE
16 Results for REGISTER as compared to LAST/TELE
17 Results for adder compared to LAST/TELE
1 Introduction 
Behavioral or High-Level Synthesis (HLS) maps the behavior of a design to an RT structure and
a controller that execute the input behavior under user specified design constraints. A major 
task in HLS is design space exploration in terms of selecting and allocating a proper set of RT 
components. This task of design space exploration is guided by metrics such as the area and delay 
of the components. Extensive design space exploration and trade-off analysis between different 
design alternatives can only be achieved if fast and accurate area and delay estimators exist for 
evaluating selected components; such estimators are crucial to the success and acceptance of high-
level synthesis as a design methodology [GDWL92] [MiLD92]. 
High-Level Synthesis typically relies on a library of well-defined, parameterized RT component 
generators to simplify the mapping of behavioral variables and operators to physical components. 
These parameterized components are used as the building blocks for the tasks of allocation and 
binding. Each component is customized by parameterized attributes such as the required bit-width 
and functionality. Such a library of RT component generators provides a complete component 
set for HLS [Dutt88], and provides a path to physical design through logic and layout synthesis 
[Dutt91]. 
Accurate estimates are needed to support component selection and binding effectively. Ideally, we
would store the area-delay attributes for all component instantiations. However, storing all
components is infeasible, since there is a virtually infinite number of possible components
that can be generated by varying the parameter attributes. For example, consider the class of
ALU component generators that can perform any subset of 26 arithmetic, logic and comparison
functions. There are $2^{26}$ possible combinations of the functions alone. The bit-widths of the two
inputs to the ALU can also be parameterized. Furthermore, each ALU component (with a fixed
set of parameters) can have multiple implementations (e.g., ripple-carry, carry-lookahead). If we
assume that the ALU's width varies from 1 to 128, and that each ALU component has two possible
implementations, we get $2^{26} \times 128 \times 2 \approx 1.72 \times 10^{10}$ area-delay values for the class of ALU
generators alone! Although it may be unrealistic to assume that all possible values of the parameters
can be attained, this number is indicative of the vast design space populated by different component
instantiations.
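The count quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the constant names are ours and do not come from the report.

```python
# Size of the ALU design space under the assumptions stated in the text:
# any subset of 26 functions, bit-widths 1..128, two implementations each.
NUM_FUNCTIONS = 26
MAX_WIDTH = 128
NUM_IMPLEMENTATIONS = 2

function_subsets = 2 ** NUM_FUNCTIONS  # every subset of the 26 functions
total = function_subsets * MAX_WIDTH * NUM_IMPLEMENTATIONS

print(f"{total:,} area-delay values (about {total:.2e})")
```

Even at a few bytes per entry, a table of this size would be impractical to precompute and store, which is the point the paragraph makes.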
An obvious alternative to storing all the designs and their metrics is a constructive approach: we
can generate the component implementations on demand, and return the actual area-delay metrics
of the constructed design. For example, DTAS [Kipp91] is a system that constructs component
designs by decomposing a generic RT component into primitive cells and performing a mapping to
a given RTL technology library [DuKi91]. However, such constructive tools have to search a large
space of candidate designs and exhibit relatively long run-times (on the order of a few to several
minutes), and hence cannot be directly integrated on-line with HLS tools for the task of rapid
design space exploration.
We propose a practical solution to this problem by developing a set of estimation functions that
generate these area-delay metrics on-line. These estimation functions accept generator parameters 
such as input-width and list-of-functions and return area-delay metrics for the component specified 
by the parameters. 
The rest of the paper is organized as follows. Section 2 describes related work. Section 3 defines
the problem of estimating the area and delay of generic component generators, given a component's
parameters and a few technology-specific design data points. Sections 4 and 5 describe the estimation
models we have developed. Section 6 describes the experiments performed to validate our models.
We show that our models are simple, fast and provide fairly accurate results (within 10% of actual
values). Furthermore, the estimators are integrated with an existing HLS system [LiGa88] and run
on-line during the tasks of component selection and allocation. Section 7 concludes with a summary.
2 Related Work 
The problem of area and delay estimation has been studied at several design levels and in several 
contexts. At the level of a complete datapath design, work has been done to predict the area-
time tradeoffs for a datapath given the area and delay values for the primitive modules used to 
construct the structural design[JaM188]. Component (module) selection techniques[Jain90] have 
been proposed to perform area-time analysis for a datapath, given a library of RT components 
with area and delay characteristics. 
At the logic design level, work has been done to predict the area and delay of an RT component, 
given its structural implementation as a netlist of logic cells or blocks [JOLS92]. Early approaches 
such as PLEST[KuPa89] analytically estimated the area of a component represented as a netlist 
of cells, and returned the area as a sum of the component's functional and estimated wiring areas.
LAST [ChKu91] and TELE [ChKu92] use a combination of analytical and constructive techniques to
predict the area and delay for a netlist of cells, while accounting for wiring. Recent work in
layout-based estimators [WCGa91] [CWGa91] also provides area and delay values, given the structure of
the modules (components); these approaches take into account technology factors such as layout
architecture and wiring.
Purely constructive approaches can be used to estimate the area-delay characteristics of RT
components. One constructive method generates Boolean equations for the RT component and 
represents these in a canonical form to provide area-delay estimates of the component[BRSW87]. 
Another constructive method builds a tree of possible component implementations by decomposing 
the component into an interconnection of primitive logic blocks. This large design space can 
then be pruned and searched for candidate designs. Tyagi[Tyag90] uses an algebraic model for 
representing the design space and a set of area-delay-power functions to prune the design search space.
DTAS[Kipp91] also provides functional area and delay values for a component by pruning the 
design space with a performance filtering function. Although these constructive techniques yield 
good estimates, they are too slow for direct integration with high-level synthesis tools that require 
real-time (on-line) estimators. 
[WaCh90] proposes a simple technology-independent model for predicting the delay of combinational
control (random) logic, given the Boolean equations for the logic (it does not deal with area
estimation). Their technique assumes a certain structuring of logic. Consequently, it cannot handle 
multiple design implementations - a common occurrence for RT datapath components. Although 
our approach is similar to [WaCh90], we provide both area and delay estimates at a level higher 
than logic equations. 
[Brew88] presents a few simple estimation functions for datapath components, derived from a 
general structure of the component's implementation and technology-specific values from a data-
book. However, these estimation models have not been tested against actual designs, and do not 
account for parameterized multi-function components, or for a range of component implementations.
In summary, previous approaches have dealt with estimation either at a higher level (i.e., a 
complete RT datapath design) or at a lower level (i.e., logic equations or structural implementation of
a particular component), but have not addressed the problem of rapid estimation of parameterized 
RT components. However, HLS tools can be effective only if they are supported by fast (on-line) 
estimates of area-delay metrics for RT-components to support component selection, scheduling 
and allocation decisions, as well as design tradeoffs between different component implementations. 
Furthermore, at the behavioral level, it is convenient to specify an RT-component using a set of 
parameters for the component's functionality, bit width and other attributes. This convenience
comes at a cost: estimation becomes difficult, since the possible space of design implementations is
infinitely large. To alleviate this problem, we propose a technique to rapidly estimate the area-delay 
metrics for such parameterized RT-datapath components used in HLS. 
3 Problem Definition 
We define the area-delay estimation problem in terms of a set of generic component generators,
their possible parameters and the estimation functions.
Let $G$ be the set of component generators, $P$ be the set of parameter names and $D$ be the set
of domains for the parameters.
• $G = \{ G_i \mid G_i \text{ is an RT component generator} \}$
• $P = \{ P_i \mid P_i \text{ is a parameter} \}$
• $D = \{ D_i \mid D_i \text{ is a domain for parameter } P_i \}$
An ordered set of parameters $PG_i$ is associated with each generator $G_i$.
• $PG_i = \{ P_{i1}, P_{i2}, \ldots, P_{in} \}$ such that $P_{ik}$ is the kth parameter of generator $G_i$ and there are n
parameters associated with $G_i$.
Each component generator $G_i$ is a function that maps its parameter values $PG_i$ to the instantiation
of a specific component. That is, the set of components covered by a generator $G_i$ is obtained by
applying the function $G_i$ to the cross product of all the domains associated with the parameters
belonging to $G_i$.
• $C_i = G_i(D_{i1} \times D_{i2} \times \cdots \times D_{in})$, where $D_{ik}$ is the domain associated with the kth member of $PG_i$.
Consider the ALU generator, i.e., let $G_i$ = ALU. This generator has three parameters: input-width,
num-functions and function-list, i.e.,
• $PG_i$ = (input-width, num-functions, function-list)
The set of components $C_i$ covered by the ALU generator is given by the cross product of the domains
of these parameters. For example, a 4-bit ALU component that can perform two functions (ADD,
SUB) is a member of $C_i$. This component is specified as ALU(4, 2, (ADD, SUB)).
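To make the cross-product definition concrete, the sketch below enumerates ALU component specifications over small toy domains. The helper names and the domains are hypothetical, and the sketch assumes that num-functions is simply the length of function-list.

```python
from itertools import combinations, product

def function_lists(functions, max_size):
    """Yield every non-empty function list with at most max_size members."""
    for size in range(1, max_size + 1):
        yield from combinations(functions, size)

def alu_components(widths, functions, max_functions):
    """Enumerate ALU(input-width, num-functions, function-list) specifications."""
    for width, flist in product(widths, function_lists(functions, max_functions)):
        # Assumption: num-functions is determined by the function list.
        yield ("ALU", width, len(flist), flist)

# Toy domains (hypothetical): two bit-widths and three functions.
specs = list(alu_components(widths=(4, 8),
                            functions=("ADD", "SUB", "AND"),
                            max_functions=3))

print(len(specs))
print(("ALU", 4, 2, ("ADD", "SUB")) in specs)
```

With the full domains described in the introduction, the same enumeration would be far too large to materialize, which is why the report models the metrics instead of tabulating them.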
Each component $C_{ij}$ may have several alternative hardware implementations. Let $S_{ij}$ be the
set of implementations for the component $C_{ij}$.
• $S_{ij} = \{ S_{ijk} \mid S_{ijk} \text{ is the kth implementation of component } C_{ij} \}$
Each implementation $S_{ijk}$ has two metrics, area $A_{ijk}$ and delay $D_{ijk}$, associated with it. Thus
we have a set of areas and delays corresponding to different implementations of a component.
We develop area-delay models for each component generator $G_i$ based on the area and delay for
a subset of components $C_{ij}$ covered by $G_i$. Let $A_{ij}$ be the set of areas and $D_{ij}$ be the set of delays
for the subset of components under consideration. Based on the members of $A_{ij}$, we propose the
area model $FA_{ij}$. Similarly, based on the members of $D_{ij}$, we propose the delay model $FD_{ij}$. These
area-delay models $FA_{ij}$ and $FD_{ij}$ predict metrics for the components $C_{ik}$ that are not members of
$C_{ij}$ but that are derived from the generator $G_i$.
Two distinct components $C_{ij}$ and $C_{ik}$ that are derived from the same generator $G_i$ vary in
terms of their values of parameters $PG_{ij}$ and $PG_{ik}$, respectively. The proposed models should be
able to predict the area and delay across a range of values for these parameters. Furthermore, each
component $C_{ij}$ has a set of possible implementations $S_{ij}$, with multiple area ($A_{ij}$) and delay ($D_{ij}$)
values. For example, an ADDER component has multiple implementations and multiple area-delay
values. Hence, we need a set of area-delay models $FA_i$ and $FD_i$ to handle multiple $S_{ij}$ and multiple
metrics ($A_{ij}$ and $D_{ij}$ for a component $C_{ij}$). Clearly, the models $FA_{ij}$ and $FD_{ij}$ should
be some function of the parameters $PG_i$ specifying the component $C_{ij}$:
• $FA_{ij} = a_{ij0} + \sum_k \left( a_{ijk} \cdot fA_{ijk}(PG_{ijk}) \right)$
• $FD_{ij} = d_{ij0} + \sum_k \left( d_{ijk} \cdot fD_{ijk}(PG_{ijk}) \right)$
The problem thus translates into two steps: formulation of the functions $FA_{ij}$ and $FD_{ij}$, and
determination of the constants and coefficients (a's and d's). In this paper, we briefly describe
these steps and provide experimental results to support their validity.
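The general form of these models, a constant plus a weighted sum of functions of the parameters, can be sketched as follows. The function name, the basis functions and the coefficient values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def evaluate_model(constant, coefficients, basis_functions, params):
    """Return constant + sum_k coefficient_k * basis_k(params)."""
    if len(coefficients) != len(basis_functions):
        raise ValueError("need exactly one coefficient per basis function")
    return constant + sum(c * f(params)
                          for c, f in zip(coefficients, basis_functions))

# Hypothetical basis functions of a single parameter, the bit-width "bw":
# one linear term and one logarithmic term.
basis = [lambda p: p["bw"], lambda p: math.log2(p["bw"])]

# Hypothetical constant and coefficients (not values from the report).
area = evaluate_model(10.0, [2.0, 3.0], basis, {"bw": 8})
print(area)  # 10 + 2*8 + 3*log2(8)
```

Because the model is linear in its constant and coefficients, those values can be determined by a least-square fit once the basis functions are chosen.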
For the rest of the paper, we use the terms RT component and RT module interchangeably. We 
also use the term metrics to refer to the area and delay values for a component, and estimates to
denote the estimated values for the area and delay metrics.
4 Our Approach 
There are two ways by which area-delay metrics for RT components can be provided to a
high-level synthesis (HLS) system. The first method precomputes these metrics for a set of component
implementations and stores them in a component database. The second method uses estimation
models to calculate the metrics on-line. While the first method provides very accurate metrics at
the cost of long run-times and a huge component database (we have virtually an infinite number
of possible components), the second method trades some accuracy for a short run-time.
We propose a combination of these two methods. For some components, we store area-delay
metrics of actual design implementations in the component database. For other components, we
use models to generate the metrics. In this section, we describe a method to model the area-delay
metrics and provide some insights on how to select components whose metrics are to be stored in the
component database.
4.1 Area-Delay Models 
Our estimation technique uses a sample space of design points on which we perform a least-square
regression fit of the formulated functions $FA_{ij}$ and $FD_{ij}$. We believe that this is a useful approach
since designers often store the attributes of commonly occurring designs (e.g., 4-, 8- and 16-bit
adders). Since a regression analysis using least-square approximation may not capture the intricacies
of certain component implementations, we have to pay attention to the appropriate selection
of the sample data points. The following steps summarize our approach:
Step 1 Generate some real design structures for a component and obtain sample data points for
area and delay.
Step 2 Study the structure of the design generated, and the variations of the area-delay metrics 
with respect to the parameters that define the component. 
Step 3 Formulate functions for estimating the area and delay of the generator. 
Step 4 Run least square approximation to calculate the constants ($a_{ij0}$ and $d_{ij0}$) and the coefficients
of various terms ($a_{ijk}$ and $d_{ijk}$) used for the functions modeling the metrics.
Step 5 Test the estimation model. 
We test the accuracy of the results against a user-specified error bound. If our models do not
satisfy the user-specified error bound, we go through an iterative experimentation phase, where
we repeat some of Steps 1-5 above. We note that the error bound is often satisfied by the
simple addition of linear and log factors to the estimation functions, as suggested by the design's
structure. In an extreme case when the error bound is not satisfied, we may have to generate more
implementation data points and repeat all of the steps to obtain new coefficients. However, in our
experiments, we have observed convergence within one or two iterations for an error bound of 10%.
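Steps 4 and 5 can be sketched in plain Python as below. This is a minimal illustration and not the routine used in the report: it solves the normal equations by Gaussian elimination, and the sample points are synthetic values generated from a known linear function rather than data from DTAS or LAST/TELE.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def least_squares(rows, targets):
    """Fit coefficients that minimize squared error, via the normal equations."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)

def max_relative_error(rows, targets, coeffs):
    """Largest relative prediction error over the given points."""
    return max(abs(sum(c * v for c, v in zip(coeffs, r)) - t) / abs(t)
               for r, t in zip(rows, targets))

# Synthetic sample points (NOT data from the report): area = 5 + 3 * width.
widths = [2, 3, 4, 8, 12, 16]
rows = [[1.0, float(w)] for w in widths]   # basis terms: constant, linear
areas = [5.0 + 3.0 * w for w in widths]

coeffs = least_squares(rows, areas)        # Step 4: fit the coefficients

# Step 5: test on a point that was not used for the fit, against a 10% bound.
held_out_rows, held_out_areas = [[1.0, 32.0]], [5.0 + 3.0 * 32]
within_bound = max_relative_error(held_out_rows, held_out_areas, coeffs) <= 0.10
print(coeffs, within_bound)
```

If the bound were violated, the iteration described above would add further basis terms (extra columns in `rows`) or more sample points and refit.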
4.2 Formulation of Estimation Models 
Recall that the area-delay estimation models FA and FD for a generator G are functions of its
parameters PG. One significant parameter for a generator is the component's bit-width, which clearly
has a significant impact on the estimation models. Tyagi [Tyag90] has developed an
information-theoretic model to understand the relationship between a module's parameters (e.g., bit-width)
and its performance metrics. Based on the communication between u-bit slices of a component,
he classifies some combinational components into categories and provides a formulation of models
for each category. These models describe the asymptotic behavior of area and delay with respect to
major parameters of the generators. For example, both the area and delay of a ripple-carry adder
vary linearly with respect to the bit-width of the inputs.
We use a similar formulation for the primary factors of the area-delay models that account
for the major contribution towards the performance metrics of a component. In addition to these
primary factors, we add some secondary factors that are decided by some rules of thumb and by
the structure of the components.
For example, consider the class of Multiplexer generators that has two parameters: num-inputs (ni)
and input-width (iw). The area per bit of a multiplexer grows linearly with ni. This
constitutes the primary factor in the area model of the multiplexer. A constant and a logarithmic
term in ni form the secondary factors. The delay of a multiplexer has logarithmic behavior with
respect to ni (primary factor), and we add a constant and a linear term as secondary factors. Since
we know that the area of a multiplexer is directly proportional to iw and that its delay is independent
of iw, we get the following area-delay models for a parameterized multiplexer:
$$Area = iw \cdot (a_1 + a_2 \cdot \log_2 \lceil ni \rceil + a_3 \cdot ni)$$
$$Delay = d_1 + d_2 \cdot \log_2 \lceil ni \rceil + d_3 \cdot ni$$
It is interesting to note that, at first glance, certain components may not seem to have a
variation with respect to the component's bit-width (e.g., bit-wise AND). However, when the 
components are laid out, we find that the routing effects begin to appear for larger bit-widths. We 
therefore need to add a function of the bit-width for these components also. 
4.3 Estimation Model Generation 
We use a least square approximation method on a set of sample design implementations to
determine the coefficients of the formulated estimation equations. We therefore need to address the
following questions: (1) Which of the infinitely many possible components for a generator should
be considered for calculating the coefficients? (2) For each selected component, how do we
obtain the actual area and delay values?
The set of components that are to be used for calculating the coefficients should be fairly
representative of all the possible components for a generator. We attempt to select the set of
components so as to avoid a bias towards any particular parameter or any particular subset of
values for a parameter. For example, several components that are implemented as tree-based logic
structures show distinct behaviors for num-inputs that are powers of two. Consequently, we often
use the following values for the parameter num-inputs: (2, 3, 4, 5, 6, 7, 8, 12, 15, 16, 32, 48, 64).
The real area-delay values can be obtained by physically laying out each component and
measuring the actual area and delay. Although this provides very accurate measures, it is a very
time-consuming process. Since our approach needs a substantial number of a component's area-delay
values (on the order of 15), we use existing tools to provide these metrics.
DTAS [Kipp91] is one such tool that provides performance metrics. DTAS maps RT-level design 
components from the GENUS generic component library [Dutt88] to technology-specific library 
cells and macros. Given the specification of a component, DTAS generates a set of alternative 
designs corresponding to different design decompositions using the primitive building blocks. The 
area provided by DTAS is the sum of the functional areas of the building blocks used in the
design implementation. DTAS computes the delay values for all critical paths through a design
implementation (pin-to-pin, rising and falling). This calculation takes into account the fan-out of 
the designs. Thus, DTAS provides a good source of sample design points for our approximations, 
using functional area and delay values. 
To incorporate wiring effects, we use LAST[ChKu91] and TELE[ChKu92], which are estimators 
that provide more accurate metrics by considering not only functional blocks but also the wiring 
contributions. Since LAST and TELE have been benchmarked against actual layouts produced by 
commercial tools as well as against custom-designed layouts, they provide a good source of sample
design points for area-delay values that include routing information. 
4.4 Selection and management of components in the database 
As mentioned before, it is infeasible to store attributes for all possible components, since there
is a virtually infinite number of possible component implementations. We need to select a subset
of components that fits in our space-limited component database and yet enhances the overall
performance by providing accurate metrics for some components. This database should include
data for components that are frequently used and components for which estimation models do not
perform well.
We propose the following guidelines to choose the subset of components to be stored in the
database. 
• Store metrics for components that are characterized by parameters with obvious values. For
example, very often designs use components that have a bit-width equal to some power of two
(e.g., 8, 16, 64).
• Store metrics for components that are frequently used. The component database can record
a tally of component references in the past. Components with a high tally should
have their attributes stored.
• Prior design knowledge of a component's circuitry could provide hints regarding the
components that have a high probability of being referenced. For example, if the design under
consideration is a 16-bit microprocessor, chances are high that at least one 16-bit register will be
required for the design. Such component attributes should be stored in the database.
• Previously referenced components can provide some insights on the nature of components
that will be referenced in the future. If a synthesis tool has queried for a 16-bit ALU, chances
are high that it will require a 16-bit register very soon.
• While generating and testing the estimation models, we may discover some "problematic"
components whose area-delay values do not fit well with the models. These are another set of
components whose area-delay values should be stored in the database.
• Also, if the database is running out of space and the attributes of some components need to
be deleted, then the least-recently-used components should be evicted. This is based
on the memory management principle that the least recently referenced components have the
least chance of being referenced in the future.
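Two of these guidelines, the reference tally and least-recently-used eviction, map directly onto a small bounded cache. The sketch below is one possible arrangement, not the report's database; the class name, the key format and the stored metric values are hypothetical placeholders.

```python
from collections import Counter, OrderedDict

class ComponentDatabase:
    """Bounded store of (area, delay) metrics with LRU eviction and a tally."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.metrics = OrderedDict()  # component spec -> (area, delay)
        self.tally = Counter()        # how often each spec has been requested

    def lookup(self, spec):
        """Return stored metrics, or None so the caller can fall back to a model."""
        self.tally[spec] += 1
        if spec not in self.metrics:
            return None
        self.metrics.move_to_end(spec)  # mark as most recently used
        return self.metrics[spec]

    def store(self, spec, area, delay):
        """Store metrics, evicting least-recently-used entries when full."""
        self.metrics[spec] = (area, delay)
        self.metrics.move_to_end(spec)
        while len(self.metrics) > self.capacity:
            self.metrics.popitem(last=False)

db = ComponentDatabase(capacity=2)
db.store(("REGISTER", 16), 120.0, 1.5)  # placeholder metrics, not measured values
db.store(("ALU", 16), 900.0, 12.0)
db.lookup(("REGISTER", 16))             # the register becomes most recently used
db.store(("ADDER", 8), 300.0, 6.0)      # evicts ("ALU", 16)
```

A lookup that returns `None` is the point at which an on-line estimation model would be evaluated instead, and the tally can later be used to decide which estimated components deserve a stored entry.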
4.5 An Example 
We illustrate the derivation of metric models for the Logic unit (LU) generator. A generic LU
component can perform any subset of the 16 primitive logic functions. The parameters associated
with this generator are input-width and set-of-functions. Table 1 lists the 16 possible functions for
a generic LU.

Function  AB  AB'  A'B  A'B'
ZERO      0   0    0    0
ONE       1   1    1    1
AND       1   0    0    0
NAND      0   1    1    1
OR        1   1    1    0
NOR       0   0    0    1
XOR       0   1    1    0
XNOR      1   0    0    1
LID       1   1    0    0
RID       1   0    1    0
LNOT      0   0    1    1
RNOT      0   1    0    1
LINHI     0   0    1    0
RINHI     0   1    0    0
LIMPL     1   1    0    1
RIMPL     1   0    1    1
Table 1: Function table for a bit-slice of a Logic Unit (LU)
Let us assume that A and B are the two inputs to the LU. Also, let bw be the bit-width of the
two inputs A and B, nf be the number of functions, fl be the set of functions and num1 represent
the sum of the number of 1's in Table 1 corresponding to the functions for the LU instance under
consideration.
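The quantity num1 can be read directly off Table 1. The sketch below encodes the table rows as tuples over the columns (AB, AB', A'B, A'B') and sums the 1's for a chosen function set; the dictionary and function names are ours.

```python
# Rows of Table 1, as (AB, AB', A'B, A'B') entries for each logic function.
TRUTH_TABLE = {
    "ZERO":  (0, 0, 0, 0), "ONE":   (1, 1, 1, 1),
    "AND":   (1, 0, 0, 0), "NAND":  (0, 1, 1, 1),
    "OR":    (1, 1, 1, 0), "NOR":   (0, 0, 0, 1),
    "XOR":   (0, 1, 1, 0), "XNOR":  (1, 0, 0, 1),
    "LID":   (1, 1, 0, 0), "RID":   (1, 0, 1, 0),
    "LNOT":  (0, 0, 1, 1), "RNOT":  (0, 1, 0, 1),
    "LINHI": (0, 0, 1, 0), "RINHI": (0, 1, 0, 0),
    "LIMPL": (1, 1, 0, 1), "RIMPL": (1, 0, 1, 1),
}

def num1(function_list):
    """Total number of 1's in Table 1 over the selected functions."""
    return sum(sum(TRUTH_TABLE[name]) for name in function_list)

count = num1(("ONE", "NAND", "XOR"))
print(count)  # 4 for ONE + 3 for NAND + 2 for XOR
```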
We now walk through the method outlined in the previous section using this example. 
Step 1 Using DTAS (or any other design generator) we compute the area and the delay for logic
units with varying bw (bit-width) and fl (set of functions). As mentioned before, we choose
designs with bw that are both powers of two and values that lie between these powers of two.
Also, these data points include various combinations of functions, both in terms of nf and fl.
Step 2 We study the structure of some of the designs generated. Consider the structure of a 2-bit
LU component that performs the following functions: ONE, NAND and XOR (i.e., bw = 2,
nf = 3 and fl = (ONE, NAND, XOR)). The top level structure of this LU is shown in
Figure 1. This figure shows that the design is composed of two independent modules, FG_12
and FG_13 (one for each bit), with each module sharing the three control lines S0, S1 and
S2. Each of these modules has the structure shown in Figure 2.
We analyze the structure shown in Figure 2 from the input to the output. First, we have five
inverters, two for the inputs and three for the control lines. Next, we have nine 3-input AND
gates. Each of these AND gates represents a '1' in Table 1 and has a number of inputs equal
to nf. Since num1 = 9 (4 for ONE, 3 for NAND and 2 for XOR), we need nine AND gates.
At the next stage, we have four 3-input AND gates, one for each column. The inputs to these
AND gates arrive from A and B and some ORed function of the result of the nine 3-input AND
gates from the previous level. Finally, the outputs of these four AND gates are ORed to give
the result.
[Figure 1: Top level structure of the Logic Unit]
[Figure 2: Structure per bit of Logic Unit]
Also, we study the variation of area and delay as we change the parameters: bit-width and 
set-of-functions. We observe that the area of an n-bit LU is n times the area of a one-bit LU. 
We also note that the delay of an LU is independent of its bit-width.
Step 3 Combining the study of previous steps, we observe the following: 
• Since the area is proportional to bw, we can make area predictions for an LU by
multiplying the area for (bw = 1) with bw.
• The AND gates at level 2 account for most of the LU area. This is proportional to num1.
As observed before, each of these AND gates has a number of inputs equal to nf. Thus,
the area is proportional to the product of nf and num1.
• The number of inverters is nf.
• There are a few other gates that are independent of the LU parameters.
• The critical path goes through an inverter, an AND gate with number of inputs equal
to the number of functions, an OR gate, a 3-input AND gate and finally a 4-input OR
gate.
Based on these observations, we propose the following area and delay functions for the LU
component generator:
$$Area(bw, nf, num1) = bw \cdot (a_1 + a_2 \cdot nf + a_3 \cdot nf \cdot num1)$$
$$Delay(bw, nf, num1) = d_1 + d_2 \cdot \ln(nf) + d_3 \cdot \ln(num1)$$
Step 4 We run least square approximation routines on the area-delay values of sample design 
points to obtain the coefficients a¡ 8 and d¡8 • 
Step 5 vVe test our model against sorne real design points that are not stored in the component 
database. The initial model we derived from the functions mentioned above yields satisfactory 
results, and thus we use this model. 
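The fitting in Step 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' routine: the sample points and the coefficients used to synthesize the areas are hypothetical, and the names `lu_area_features`, `solve_linear` and `least_squares` are introduced here only for the example. The normal equations are solved with plain Gaussian elimination so that the sketch needs no external library.

```python
def lu_area_features(bw, nf, num1):
    """Basis terms of Area = bw * (a1 + a2*nf + a3*nf*num1)."""
    return [bw, bw * nf, bw * nf * num1]


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares(rows, targets):
    """Fit coefficients that minimize the squared error (normal equations)."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear(ata, atb)


# Hypothetical sample design points (bw, nf, num1); not data from the report.
samples = [(1, 2, 3), (4, 3, 9), (8, 4, 12), (16, 2, 5), (2, 6, 20), (32, 3, 7)]
true_coeffs = [5.0, 1.5, 0.8]  # made-up a1, a2, a3 used to synthesize areas
rows = [lu_area_features(*s) for s in samples]
areas = [sum(c * f for c, f in zip(true_coeffs, r)) for r in rows]

fitted = least_squares(rows, areas)
print(fitted)
```

Because the area model is linear in its coefficients, the same fit applies to any of the models in this report once the basis terms are changed; the delay model would use the terms 1, ln(nf) and ln(num1).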
5 Models for other components 
In this section, we discuss the estimation models developed for some other generic RT component 
generators using the method described in the previous section. 
5.1 With respect to DTAS 
5.1.1 Gates 
Generic gate components include AND, OR, NAND, NOR, XOR and XNOR logic gates. Each 
gate generator has two parameters: num-inputs (ni) and input-width (iw). The area of an n-bit gate 
is n times the area of a 1-bit gate. The gate delay is independent of iw. We develop the following 
area-delay models: 
$$\mathrm{Area} = \mathit{iw} \cdot (a_1 + a_2 \cdot \log_2 \lceil \mathit{ni} \rceil + a_3 \cdot \mathit{ni})$$
$$\mathrm{Delay} = d_1 + d_2 \cdot \log_2 \lceil \mathit{ni} \rceil + d_3 \cdot \mathit{ni}$$
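As a hedged illustration of how cheaply these gate models evaluate, the following Python sketch computes both functions. The coefficient values passed in the example call are hypothetical placeholders, not fitted values from this report, and the function names are introduced only for the example.

```python
import math


def gate_area(ni, iw, a1, a2, a3):
    """Area = iw * (a1 + a2 * log2(ceil(ni)) + a3 * ni)."""
    return iw * (a1 + a2 * math.log2(math.ceil(ni)) + a3 * ni)


def gate_delay(ni, d1, d2, d3):
    """Delay = d1 + d2 * log2(ceil(ni)) + d3 * ni; it does not depend on iw."""
    return d1 + d2 * math.log2(math.ceil(ni)) + d3 * ni


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
one_bit = gate_area(ni=4, iw=1, a1=1.0, a2=2.0, a3=0.5)
eight_bit = gate_area(ni=4, iw=8, a1=1.0, a2=2.0, a3=0.5)
print(one_bit, eight_bit, gate_delay(ni=4, d1=1.0, d2=2.0, d3=0.5))
```

The 8-bit area is exactly eight times the 1-bit area, which mirrors the observation that the area of an n-bit gate is n times the area of a 1-bit gate.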
5.1.2 Decoder 
Figure 3: Design of decoder 
A decoder is characterized by a single parameter, input-width (iw). The structure of a decoder 
with iw = 3 is shown in Figure 3. I0 through I2 are the inputs and O0 through O7 are the outputs. 
Each output is fed by an AND gate, and each of these AND gates has a number of inputs equal to 
iw. The number of outputs is given by $2^{\mathit{iw}}$. Also, the critical path consists of an inverter and one of 
the AND gates discussed above. Based on these observations, we develop the following models: 
$$\mathrm{Area} = a_1 + a_2 \cdot \mathit{iw} + a_3 \cdot 2^{\mathit{iw}} \cdot \log_2 \lceil \mathit{iw} \rceil$$
$$\mathrm{Delay} = d_1 + d_2 \cdot \mathit{iw} + d_3 \cdot 2^{\mathit{iw}}$$
5.1.3 Multiplexer 
A multiplexer has two parameters, num-inputs (ni) and input-width (iw). The structure of a 
multiplexer with four inputs (ni = 4), each input being two bits wide (iw = 2), is shown in Figure 4. 
I0 and I1 constitute the first input, I2 and I3 the second input, I4 and I5 the third input and I6 and 
I7 the last input. S0 through S3 are the four control lines. Similar to the design of the LU discussed in 
Section 4.5, the structure of a multiplexer is composed of two independent modules: MUX_l2 and 
MUX_l3 (one for each bit), with each module sharing the four control lines (S0 through S3). Each 
of these modules has the structure shown in Figure 5. 
Figure 4: Top level structure of Multiplexer 
Figure 5: Structure per bit of Multiplexer 
In Figure 5, we have four AND gates, one for each input. The number of inputs to each of these 
AND gates is one more than the number of control lines. Thus, the area of each of these AND gates is 
proportional to ni, which is equal to the number of control lines. Since there are ni such gates, the 
total area contributed by the AND gates is proportional to $\mathit{ni}^2$. Similarly, the area contributed by the 
inverters and the OR gate before the input is proportional to ni. The total area of the multiplexer is 
given by the product of iw and the area per bit-width of the multiplexer. 
$$\mathrm{Area} = \mathit{iw} \cdot (a_1 + a_2 \cdot \mathit{ni} + a_3 \cdot \mathit{ni}^2)$$
The critical path goes through an inverter, an AND gate and an OR gate, with number of 
inputs proportional to ni. The following delay model is developed: 
5.1.4 Comparator 
A comparator performs a subset of 6 functions: EQ, GT, LT, NEQ, LEQ and GEQ. Of 
these 6 functions, the last three (NEQ, LEQ and GEQ) are the inverses of the first three 
(EQ, GT and LT), respectively. Accordingly, the functions NEQ, LEQ and GEQ are 
implemented by adding an inverter to the output of the corresponding inverse functions. 
[Kipp91] uses the design described in [Mano88] to implement the comparator. Based on the 
design of [Mano88], the comparator has been divided into three cases. For each of these three cases, 
separate area-delay models based on bit-width (iw) have been developed. 
Case I Comparators that do not have GT or LT functions. 
$$\mathrm{Area} = a_0 + a_1 \cdot \mathit{iw}$$
$$\mathrm{Delay} = d_0 + d_1 \cdot \log_2 \lceil \mathit{iw} \rceil + d_2 \cdot \mathit{iw}$$
Case II Comparators with only one of these two functions: GT and LT. 
Case III Comparators with both of the functions: GT and LT. 
$$\mathrm{Delay} = d_0 + d_1 \cdot \log_2 \lceil \mathit{iw} \rceil + d_2 \cdot \mathit{iw}$$
The flowchart in Figure 6 illustrates the steps to calculate area and delay for a component. In the 
flowchart, area(Case x) and delay(Case x) refer to the area and delay, respectively, generated using 
the above model for Case x. 
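The case selection described above can be sketched in Python as follows. This is only an illustration: it assumes that LEQ and GEQ count toward the GT/LT cases because they are built by inverting GT and LT, which is an interpretation of the text rather than something the report states explicitly. The names `INVERSE_OF`, `base_functions` and `comparator_case` are hypothetical.

```python
# Each inverse function is built by adding an inverter to its base function.
INVERSE_OF = {"NEQ": "EQ", "LEQ": "GT", "GEQ": "LT"}


def base_functions(functions):
    """Map every requested function to the base function that implements it."""
    return {INVERSE_OF.get(f, f) for f in functions}


def comparator_case(functions):
    """Return 'I', 'II' or 'III' by counting how many of GT and LT are needed.

    Assumption: LEQ and GEQ require the GT and LT blocks respectively.
    """
    needed = base_functions(functions) & {"GT", "LT"}
    return ("I", "II", "III")[len(needed)]


print(comparator_case({"EQ", "NEQ"}))  # neither GT nor LT is required
print(comparator_case({"GT", "EQ"}))   # only GT is required
print(comparator_case({"LEQ", "LT"}))  # both GT and LT are required
```

Once the case is known, the estimator only has to pick the coefficient set fitted for that case and evaluate the corresponding area and delay functions.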
Figure 6: Flowchart to estimate area-delay for comparator 
[Flowchart node labels: area = 0, delay = 0; area = area + area(Case I), delay = delay + delay(Case I); 
area = area + area(Case II), delay = delay + delay(Case II); area = area + area(Case III), 
delay = delay + delay(Case III); area = area + 2n, delay = delay + 0.26; area = area + 1, 
delay = delay + 0.46; area = area + 2, delay = delay + 0.46; return.] 
5.1.5 Arithmetic and Logic Unit(ALU) 
The generic ALU component can perform any combination of 4 arithmetic (+, -, INC, DEC), 6 
comparison (GT, LT, EQ, GE, LE, NEQ) and 16 logic functions. Any specific ALU component 
performs a subset of these functions for some given bit-width. An ALU is typically built using one 
of two styles [Mano88]: 
Integrated style A basic adder with additional logic at its input and output. 
Segregated style Separate arithmetic and logic blocks muxed at the output for the result. 
Also, extra logic is added if the subset of functions required includes any comparison function. 
Based on this implementation, the ALU design has been categorized into four cases: 
Case I: ALU with arithmetic functions only. 
Case II: ALU with arithmetic and comparison functions only. 
Case III: ALU with integrated style. 
Case IV: ALU with segregated style. 
We have developed area-delay models for Case I. We are working on models for the other three 
cases. These models are functions of input-width (iw), number of functions (nf) and some of the 
terms (CI, num1) used to describe the functionality of the ALU in [Kipp91]. 
$$\mathrm{Area} = \mathrm{area}(\mathrm{Adder}) + a_0 + a_1 \cdot \mathit{nf} \cdot (\mathit{CI} + \mathit{iw} \cdot \mathit{num1})$$
5.2 With respect to LAST/TELE 
In this section we present the models developed for some of the component generators with respect 
to LAST/TELE. We followed the same methodology described in the previous section, though 
some extra terms are added to the models to capture the effects of wiring. Recall that LAST/TELE 
provides very accurate area/delay values by considering the effects of shape functions and wiring. 
Instead of going through the steps discussed in the previous section, we present the models directly. 
5.2.1 Generic AND gate 
An AND gate is characterized by two parameters: num-inputs (ni) and input-width (iw). 
$$\mathrm{Area} = a_0 + a_1 \cdot \log_2(\lceil \mathit{ni} \rceil) + a_2 \cdot \mathit{ni} + a_3 \cdot \log_3(\lceil \mathit{iw} \rceil) + a_4 \cdot \mathit{iw}$$
$$\mathrm{Delay} = d_0 + d_1 \cdot \log_2(\lceil \mathit{ni} \rceil) + d_2 \cdot \mathit{ni} + d_3 \cdot \log_3(\lceil \mathit{iw} \rceil) + d_4 \cdot \mathit{iw}$$
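5.2.2 Multiplexer 
A multiplexer is characterized by two parameters: num-inputs (ni) and input-width (iw). 
5.2.3 Logic unit(LU) 
The area-delay models for the Logic unit generator are functions of input-width (iw), num-functions (nf) 
and the number of 1's (num1) in the table described in the example section. 
$$\mathrm{Area} = a_0 + a_1 \cdot \mathit{nf} + a_2 \cdot \mathit{nf} \cdot \mathit{num1} + a_3 \cdot \log_2(\lceil \mathit{iw} \rceil) + a_4 \cdot \mathit{iw}$$
$$\mathrm{Delay} = d_0 + d_1 \cdot \mathit{iw} \cdot \log_2(\lceil \mathit{iw} \rceil) \cdot \mathit{num1} + d_2 \cdot \mathit{iw}^2$$
5.2.4 Encoder 
The encoder generator is characterized by num-inputs (ni). 
$$\mathrm{Area} = a_0 + a_1 \cdot \mathit{ni} + a_2 \cdot \mathit{ni} \cdot \log_2(\lceil \mathit{ni} \rceil)$$
$$\mathrm{Delay} = d_0 + d_1 \cdot \mathit{ni}$$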
5.2.5 Shift Register 
A shift-register is characterized by input-width (iw) and a set of functions from (LOAD, SHIFT-LEFT 
and SHIFT-RIGHT). A one-bit register is typically designed with a flip-flop associated with 
some logic. The one-bit structure is then replicated iw times. The following area-delay model 
is used for our experiments: 
$$\mathrm{Area} = a_0 + a_1 \cdot \mathit{iw} + a_2 \cdot \mathit{iw} \cdot \log_2(\lceil \mathit{iw} \rceil) + a_3 \cdot \mathit{iw}^2$$
$$\mathrm{Delay} = d_0 + d_1 \cdot \mathit{iw} + d_2 \cdot \mathit{iw} \cdot \log_2(\lceil \mathit{iw} \rceil) + d_3 \cdot \mathit{iw}^2$$
Different coefficients have been calculated for the three cases: registers with one function, 
registers with two functions and registers with all three functions. 
5.2.6 Adder 
An adder is characterized by specifying the input-width (iw). We considered four design styles for 
the adder: full ripple-carry adder, full carry-lookahead adder, carry-save adder and medium adder 
with 4-bit CLA blocks rippled through. For each of these we use the following models: 
$$\mathrm{Area} = a_0 + a_1 \cdot \mathit{iw} + a_2 \cdot \mathit{iw} \cdot \log_2(\lceil \mathit{iw} \rceil) + a_3 \cdot \mathit{iw}^2$$
Although we have the same models for the four styles, the values of the coefficients vary based on the style. 
Note that the area and delay models for a generator with respect to LAST/TELE are very 
similar. The area-delay metrics provided by DTAS are functional only and do not consider the effects 
of wiring. When wiring is taken into consideration, delay becomes dependent on the wire-length, 
which in turn depends on the area of the component. For example, the delay model for the multiplexer 
with respect to LAST/TELE includes terms with input-width, even though the functional delay is 
independent of input-width. This is because the area of the multiplexer is proportional to input-width, and 
with increasing area the wire-length increases, leading to a higher delay value for the component. 
6 Experiments and Results 
In this section, we describe the experiments performed to test our models. We compared the 
estimates generated by our models against the metrics derived from the design structures generated 
by DTAS [Kipp91], LAST [ChKu91] and TELE [ChKu92]. The area value provided by DTAS 
counts the number of equivalent two-input NAND gates used to implement the component. For 
a component's delay (measured in nanoseconds), DTAS returns the worst-case delay over all paths 
through the design. LAST and TELE provide area and delay values, respectively, based on GDT 
3µ CMOS standard cell technology. 
Our experiments attempted to cover a wide range of possible component implementations, 
including combinational and sequential components. We did this by generating parameter values 
randomly for each component generator. The number and set of functions (for multi-function 
components) were also chosen randomly. For each such randomly chosen component, we ran our 
models, and compared the results with an actual design generated by the above-mentioned tools. 
We considered the following generators: AND, OR, NAND, NOR, XOR, XNOR, MUX, 
Comparator, LU, ADDER, ALU and SHIFT-REGISTER. Two parameters are needed for the Gates and 
MUX components: num-inputs and input-width, whereas ADDER requires only one parameter: 
input-width. The Comparator, Logic unit, ALU and Shift-register require not only the input-width, 
but also a set of functions. We first present the data in Section 6.1, and then provide an analysis 
in Section 6.2. In all the results presented, the percentage error is defined as: 
$$\mathrm{Percentage\_error} = \left| \frac{\mathit{estimated} - \mathit{actual}}{\mathit{actual}} \right| \times 100$$
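As a small illustration of this definition, the percentage error can be computed as in the Python sketch below. The function name and the sample values are hypothetical and are not taken from the result tables.

```python
def percentage_error(estimated, actual):
    """Return |(estimated - actual) / actual| * 100 for a nonzero actual value."""
    if actual == 0:
        raise ValueError("actual value must be nonzero")
    return abs((estimated - actual) / actual) * 100.0


# Hypothetical values: an estimate of 90 gates against an actual 100 gates.
print(percentage_error(90.0, 100.0))
print(percentage_error(110.0, 100.0))
```

Because of the absolute value, overestimates and underestimates of the same relative size give the same percentage error.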
Besides the percentage error per test component, we present the Coefficient of Correlation (CC) 
between the metrics generated from our models and the actual metrics from DTAS and LAST/TELE. 
The coefficient of correlation between two variables X and Y is defined as follows: 
$$\mathrm{Correlation\_Coefficient} = \frac{(E(XY) - E(X) \cdot E(Y))^2}{(E(X^2) - E^2(X)) \cdot (E(Y^2) - E^2(Y))}$$
A correlation_coefficient value of 0.0 signifies that the two variables under study are not correlated 
at all, whereas a value of 1.0 says that they are fully correlated. Thus, CC values close to 1.0 are 
desired for our experiments, as they signify that the metrics from our models are very close to the 
actual metrics. 
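The definition above can be evaluated directly from sample means, as in the following Python sketch. The function names and the sample series are hypothetical; the sketch simply follows the formula as written, which is the squared form of the usual correlation coefficient and therefore always lies between 0 and 1.

```python
def mean(values):
    """Arithmetic mean, used here as the expectation E(.)."""
    return sum(values) / len(values)


def correlation_coefficient(xs, ys):
    """(E(XY) - E(X)E(Y))^2 / ((E(X^2) - E(X)^2) * (E(Y^2) - E(Y)^2))."""
    ex, ey = mean(xs), mean(ys)
    exy = mean([x * y for x, y in zip(xs, ys)])
    var_x = mean([x * x for x in xs]) - ex ** 2
    var_y = mean([y * y for y in ys]) - ey ** 2
    return (exy - ex * ey) ** 2 / (var_x * var_y)


# Hypothetical series: a perfectly linear pair and an uncorrelated pair.
actual = [1.0, 2.0, 3.0, 4.0]
print(correlation_coefficient(actual, [2.0, 4.0, 6.0, 8.0]))
print(correlation_coefficient(actual, [1.0, -1.0, -1.0, 1.0]))
```

Note that, being squared, this measure cannot distinguish a rising from a falling linear relationship; for estimates that track the actual metrics this makes no difference.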
6.1 Results 
6.1.1 With respect to DTAS 
Tables 2 through 12 summarize the results of our approach as compared to DTAS. For each data 
point, we report the percentage error for area and delay, and for each generator we report the 
correlation coefficient. 
Tables 2 through 7 show the results for the six logic gates. Table 8 lists the percentage errors 
for the six gate generators. Both the average and the maximum percentage errors are shown. For the other 
generators, the error per test point and the average and maximum errors are reported. Figure 15 depicts 
the aggregate results for all the components. 
Table 12 enumerates the percentage error for each of the designs for an ALU component. 
For each ALU component with fixed parameters, we generated two to three alternative design 
implementations. The average percentage error over all the designs for each ALU component is 
also listed. 
Figures 7 through 14 graphically show the actual and calculated area and delay for each of 
the generators under discussion. In each of these plots, the X-axis represents the different test points and 
the Y-axis represents the area/delay values. Each point on the X-axis specifies a particular value of the 
parameters associated with the generator. For example, for the LU generator, a test point is specified by 
a 2-tuple (ni, nf), where ni represents the parameter num-inputs and nf represents the parameter 
num-functions for an LU component. 
As mentioned before, for each ALU test point we have multiple implementations, 
each having its own area-delay values. Figures 13 and 14 plot values for only two of the implementations 
per test point. 
6.1.2 With respect to LAST /TELE 
Tables 13 through 17 describe the results as compared to LAST and TELE. Table 13 lists the 
percentage error per test point for generic AND gates, Table 14 for the Multiplexer generator, Table 
15 for the Logic unit, Table 16 for the Shift-register and Table 17 for the ADDER generator. 
Similar to the ALU, an ADDER component has four designs, each based on a style. These styles 
are: ripple-carry (RC), full carry-lookahead (CLA), carry-save (CSA) and medium (MED) with 4-bit 
CLA blocks rippled. We report percentage errors for each implementation style. 
Figures 16 through 21 illustrate the actual and calculated area and delay for the Shift-register and 
ADDER generators. For the ADDER generator, we plot a graph for each implementation. 
6.2 Analysis 
6.2.1 With respect to DTAS 
Tables 2 through 12 show that the maximum average error for all the generators in our study is 
8.20% for area and 6.58% for delay. Thus, on average, the data generated by our model is within 
±10% of the values generated by DTAS using the LSI library [LsiC87]. The maximum percentage 
error over all generators is 19.87% for area and 22.71% for delay. However, very few data points 
reach these extreme values: our low averages for the errors reflect this fact. 
The above tables also list the correlation coefficients for the area and delay models corresponding 
to each generator. We observe that except for one case (the delay model for the NOR gate), CC values 
are very close to 1.0, thus validating our approach. 
Figure 15 shows the aggregate error profiles for area and delay. For area roughly one-third, and 
for delay roughly half, of the data points exhibit an error of less than two percent. After this large 
concentration in the 0-2 percent range, the frequency of error tapers off rapidly as the error increases. 
For area, 77 percent of the test points have an error of less than 10 percent and 95 percent have 
an error of less than 16 percent. For delay, the figures are 87 percent and 94 percent, respectively. These results validate 
our hypothesis about the goodness of our estimators. 
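An aggregate error profile of this kind can be tabulated as in the Python sketch below. The function names and the 2-percent bin width are choices made for this illustration; the input list holds the area errors reported for the Multiplexer in Table 9, used here only as sample data for a single generator rather than the full aggregate.

```python
def error_profile(errors, bin_width=2.0):
    """Count errors per bin, where bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for e in errors:
        k = int(e // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts


def fraction_below(errors, threshold):
    """Fraction of test points whose error is strictly below the threshold."""
    return sum(1 for e in errors if e < threshold) / len(errors)


# Area errors (percent) for the Multiplexer test points of Table 9.
mux_area_errors = [9.65, 10.11, 17.97, 13.70, 2.12, 1.22, 14.11, 5.13,
                   8.46, 1.51, 15.83, 4.22, 5.18, 10.11, 3.74]

profile = error_profile(mux_area_errors)
print(sorted(profile.items()))
print(fraction_below(mux_area_errors, 10.0))
```

Pooling the error lists of all generators before calling these functions would give the kind of aggregate profile that the text describes.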
Table 12 reports the errors associated with each implementation for each ALU test point. 
This table verifies our claim that our approach can handle multiple implementations of a given 
component. 
Figures 7 through 14 illustrate the fidelity of our estimates for different generators with respect 
to DTAS. We see that the estimated area and delay follow the actual values very closely. We 
observe that the absolute error values are larger for the data points with bigger metric values. This 
phenomenon results in a relatively uniform percentage error over different test points. 
6.2.2 With respect to LAST/TELE 
Tables 13 through 17 show the estimation results with respect to LAST/TELE, which include the 
functional and routing components. We observe that the average error for area is between 6 and 15 percent. 
However, the corresponding figure for delay is between 9 and 19 percent. This can be attributed to the effects 
of wiring delay, which are sensitive to physical design characteristics such as placement and routing 
and are difficult to capture in general. The maximum percentage error over all generators is 
24.09% for area and 31.42% for delay. 
The correlation coefficients between the metrics from our models and the actual metrics from 
LAST/TELE are, in most cases, very close to 1.0, the lowest being 0.7803, corresponding to the 
delay model for the generic AND gate. 
Figure 22 shows the aggregate error profiles for area and delay. For area roughly two-fifths, and 
for delay roughly one-third, of the data points have an error of less than two percent. For area, 81 
percent of the test points have an error of less than 10 percent and 92 percent have an error of less than 
16 percent. For delay, the figures are 67 percent and 83 percent, respectively. Once again, these results validate our 
hypothesis about the goodness of our estimators. 
Figures 16 through 21 depict the fidelity of our estimates for the Shift-register and ADDER, with 
respect to LAST/TELE. We see that the estimated area and delay follow the actual values very 
closely for different classes of generators (combinational and sequential), as well as with respect to 
two different backend tools. 
We conclude our analysis with two important observations. First, our test points were generated 
randomly (i.e., we randomly selected the component generator parameter values). Note that in a 
real design situation, components with certain design properties (i.e., parameters) will be invoked 
more often, and can be stored with precomputed metrics in the component database. This will 
lower our average error. Second, our estimation models are integrated on-line with HLS tools. 
This is possible because of the simple estimation functions chosen, which use only a few additions, 
multiplications and logarithmic operations. We thus trade off accuracy of the metrics (i.e., ±10%) 
for real-time evaluation of the estimates. 
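The combination of stored metrics and on-line models mentioned in the first observation can be sketched as follows. This Python example is an assumption-laden illustration, not the integration used in the HLS system: the class `AreaEstimator`, its methods and all numeric values are hypothetical, and the fallback uses the DTAS gate area model of Section 5.1.1.

```python
import math


class AreaEstimator:
    """Look up precomputed areas first; fall back to the fitted model otherwise."""

    def __init__(self, coeffs):
        self.coeffs = coeffs  # hypothetical (a1, a2, a3) for the gate area model
        self.database = {}    # (ni, iw) -> precomputed area of a stored component

    def store(self, ni, iw, area):
        """Record the known area of a frequently used component."""
        self.database[(ni, iw)] = area

    def area(self, ni, iw):
        """Return the stored area if present, else the model estimate."""
        key = (ni, iw)
        if key in self.database:
            return self.database[key]
        a1, a2, a3 = self.coeffs
        return iw * (a1 + a2 * math.log2(math.ceil(ni)) + a3 * ni)


estimator = AreaEstimator(coeffs=(1.0, 2.0, 0.5))
estimator.store(ni=4, iw=8, area=60.0)   # made-up precomputed metric
print(estimator.area(4, 8))              # served from the database
print(estimator.area(4, 1))              # estimated by the model
```

Both paths cost a dictionary lookup plus, at most, a handful of arithmetic operations, which is what makes on-line use during scheduling and allocation practical.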
7 Summary 
In this paper, we presented an accurate on-line method for estimating the area and delay of RT 
component generators used in high-level synthesis. Our approach has reduced the estimation problem 
from complex circuit analysis to a virtual table-lookup and thus has brought down the estimation 
time from minutes to the order of microseconds. Our approach can handle the area/delay 
contributed by functional blocks as well as the total area/delay including the wiring. Furthermore, 
we have demonstrated the estimation technique on both combinational and sequential RT 
components. The experiments show very good results, with aggregate errors in the range of ±10%. Our 
area-delay models are simple, fast and fairly accurate, and have been integrated with an existing 
high-level synthesis system [LiGa88] [RaGa91]. Although our experiments were based on area/delay 
values generated by previously benchmarked tools and not by actual layouts, we believe that the 
estimation approach we presented is general and that it can produce even better results if provided 
with more accurate sample design data points. 
8 Acknowledgements 
This research was supported in part by NSF grant #MIP9009239 and in part by SRC contract 
#91-DJ-146. We thank Prof. Fadi Kurdahi and Champaka Ramachandran for providing us with 
access to the LAST and TELE estimators and Dr. James Kipps for providing us with DTAS. 
We thank Michael P. Donohoe for help in developing logic equation generators for each GENUS 
component generator. We also thank Prof. Daniel Gajski for his helpful comments throughout the 
development of this work. 
References 
[Brew88] Forest D. Brewer, "Constraint Driven Behavioral Synthesis," PhD Dissertation, 
University of Illinois, Urbana-Champaign, May 1988. 
[BRSW87] R. K. Brayton, R. Rudell, A. Sangiovanni-Vincentelli and A. R. Wang, "MIS: A 
multiple-level logic optimization system," IEEE Transactions on Computer-aided Design, 
pp. 1062-1081, November 1987. 
[ChKu91] F. J. Kurdahi and C. Ramachandran, "LAST: A Layout Area and Shape function 
esTimator for High Level Applications," Proc. of The European Conf. on Design 
Automation'91, pp. 351-355, February 1991. 
[ChKu92] C. Ramachandran and F. J. Kurdahi, "TELE: A Timing Evaluator using Layout 
Estimation for High Level Applications," Proc. of The European Conf. on Design Automation'92, 
March 1992. 
[CWGa91] V. Chaiyakul, A. C-H. Wu and D. D. Gajski, "Timing Models for High-level Synthesis," 
Technical Report 91-70, University of California at Irvine, October 1991. 
[DuKi91] N. D. Dutt and J. R. Kipps, "Bridging High-Level Synthesis to RTL Technology 
Libraries," Proc. 28th Design Automation Conference, June 1991. 
[Dutt88] N. D. Dutt, "GENUS: A Generic Component Library for High Level Synthesis," Technical 
Report 88-92, University of California at Irvine, 1988. 
[Dutt91] N. D. Dutt, "Generic Component Library Characterization for High Level Synthesis," 
VLSI Design '91: The Fourth CSI/IEEE International Symposium on VLSI Design, 
New Delhi, 1991. 
[GDWL92] D. Gajski, N. Dutt, A. Wu and S. Lin, "High-Level Synthesis: Introduction to Chip 
and System Design," Kluwer Academic Publishers, 1992. 
[Jain90] R. Jain, "MOSP: Module Selection for Pipelined Designs with Multi-cycled Operations," 
Proc. IEEE Int. Conf. on Computer-aided Design'90, pp. 212-215, November 1990. 
[JaM188] R. Jain, M. J. Mlinar and A. C. Parker, "Area-Time Model for Synthesis of Non-Pipelined 
Designs," Proc. IEEE Int. Conf. on Computer-aided Design'88, pp. 48-51, November 1988. 
[JOLS92] Q. Ji, Y. S. Oh, M. R. Lighter and F. Somenzi, "Technology Independent Estimation of 
Area in Logic Synthesis," Proc. Synthesis and Simulation Meeting and Inter. Interchange 
(SASIMI92), pp. 171-180, April 1992. 
[Kipp91] J. R. Kipps, "An Approach to Component Generation and Technology Adaption," PhD 
Dissertation, University of California at Irvine, December 1991. 
[KuPa89] Fadi J. Kurdahi and A. C. Parker, "Techniques for Area Estimation of VLSI Layouts," 
IEEE Transactions on Computer-aided Design, pp. 81-92, January 1989. 
[LiGa88] J. S. Lis and D. D. Gajski, "Synthesis from VHDL," Proc. IEEE Int. Conf. on Computer 
Design'88, pp. 378-381, 1988. 
[LsiC87] "LSI, CMOS Macrocell Manual," LSI Logic, Inc., Milpitas, CA, 1987. 
[Mano88] M. M. Mano, "Computer Engineering Hardware Design," Prentice-Hall, Inc., New Jersey, 
1988. 
[MiLD92] P. Michel, U. Lauther and P. Duzy (Editors), "The Synthesis Approach to Digital System 
Design," Kluwer Academic Publishers, 1992. 
[RaGa91] L. Ramachandran and D. D. Gajski, "An Algorithm for Component Selection in 
Performance Optimized Scheduling," IEEE International Conference on Computer-Aided 
Design, pp. 92-95, November 1991. 
[Tyag90] Akhilesh Tyagi, "An Algebraic Model for Design Space with Applications to Function 
Module Generation," Proc. of The European Conf. on Design Automation'90, pp. 114-118, 
March 1990. 
[WaCh90] D. E. Wallace and M. S. Chandrasekhar, "High-level Delay Estimation for 
Technology-Independent Logic Equations," Proc. of IEEE Int. Conf. on Computer-aided Design'90, 
pp. 118-191, November 1990. 
[WCGa91] A. C-H. Wu, V. Chaiyakul and D. D. Gajski, "Layout-Area Models for High-Level 
Synthesis," Proc. of IEEE Int. Conf. on Computer-aided Design'91, pp. 34-37, November 
1991. 
[Wolf89] Wayne H. Wolf, "How to Build a Hardware Description and Measurement System on 
an Object-Oriented Programming Language," IEEE Transactions on Computer-Aided 
Design'89, pp. 288-301, March 1989. 
Table 2: Results for generic AND gate as compared to DTAS 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
7           24             5.06     0.27 
9           44             9.45     3.12 
5           40             19.35    4.25 
9           16             9.45     3.12 
7           12             5.06     0.27 
10          11             19.87    2.65 
6           47             1.61     4.96 
14          21             1.56     4.56 
Average error              8.92     2.32 
Maximum error              19.87    4.96 
Correlation Coefficient    0.9186   0.9943 

Table 3: Results for generic OR gate as compared to DTAS 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
14          5              5.94     3.22 
10          17             3.22     0.50 
10          1              3.22     0.50 
11          38             5.54     0.24 
9           28             13.92    0.77 
7           2              6.99     1.34 
5           24             3.05     3.38 
Average error              5.98     1.42 
Maximum error              13.92    3.38 
Correlation Coefficient    0.9535   0.9899 

Table 4: Results for generic NAND gate as compared to DTAS 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
6           29             3.11     21.88 
12          43             2.99     1.33 
14          53             9.93     9.43 
10          1              2.77     0.66 
6           13             3.11     21.88 
12          27             2.99     1.33 
Average error              4.15     9.42 
Maximum error              9.93     21.88 
Correlation Coefficient    0.9720   0.9735 

Table 5: Results for generic NOR gate as compared to DTAS 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
7           8              6.32     14.86 
14          37             4.86     10.48 
10          49             13.13    13.32 
6           61             10.75    10.36 
12          11             8.31     8.14 
Average error              8.68     11.43 
Maximum error              13.13    14.86 
Correlation Coefficient    0.9922   0.5214 
Table 6: Results for generic XOR gate as compared to DTAS 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
3           30             0.0      0.0 
20          1              0.0      0.0 
15          58             0.0      0.0 
13          16             0.0      0.0 
11          22             0.0      0.0 
7           2              0.0      0.0 
5           24             0.0      0.0 
3           30             0.0      0.0 
2           57             0.0      0.0 
16          63             0.0      0.0 
Average error              0.0      0.0 
Maximum error              0.0      0.0 
Correlation Coefficient    1.0000   1.0000 

Table 7: Results for generic XNOR gate as compared to DTAS 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
14          21             0.0      0.0 
4           16             0.0      0.0 
10          33             0.0      0.0 
16          15             0.0      0.0 
6           45             0.0      0.0 
12          59             0.0      0.0 
5           40             0.0      0.0 
3           46             0.0      0.0 
36          1              0.0      0.0 
13          32             0.0      0.0 
Average error              0.0      0.0 
Maximum error              0.0      0.0 
Correlation Coefficient    1.0000   1.0000 

Table 8: Results for Logic gates as compared to DTAS 
          Percentage error 
Gates     Area              Delay 
          Avg      Max      Avg      Max 
AND       7.14     19.35    2.32     4.96 
OR        4.19     13.92    0.99     3.38 
NAND      2.49     9.93     5.65     21.88 
NOR       4.34     13.13    5.72     14.86 
XOR       0.00     0.00     0.00     0.00 
XNOR      0.00     0.00     0.00     0.00 

Table 9: Results for Multiplexer as compared to DTAS 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
7           2              9.65     2.96 
21          8              10.11    1.70 
19          30             17.97    0.03 
18          25             13.70    0.82 
23          25             2.12     3.32 
24          25             1.22     2.97 
20          5              14.11    0.87 
14          5              5.13     0.51 
3           5              8.46     22.71 
12          27             1.51     0.12 
9           27             15.83    1.77 
2           27             4.22     4.80 
8           7              5.18     9.21 
21          7              10.11    1.70 
30          7              3.74     6.17 
Average error              8.20     3.98 
Maximum error              17.97    22.71 
Correlation Coefficient    0.9844   0.9878 
Table 10: Results for Comparator as compared to DTAS 
Parameters                 Percentage error 
Bit-width   #Functions     Area     Delay 
7           6              7.3      0.6 
11          4              3.5      0.9 
10          1              2.3      5.1 
11          2              4.8      2.4 
15          3              6.2      0.1 
14          3              4.6      8.5 
10          5              3.6      0.5 
5           4              10.2     4.9 
6           3              7.9      3.6 
Average error              5.6      2.96 
Maximum error              10.2     8.5 
Correlation Coefficient    0.9974   0.9890 

Table 11: Results for Logic unit as compared to DTAS 
Parameters                 Percentage error 
Bit-width   #Functions     Area     Delay 
6           4              0.01     2.96 
2           8              0.65     1.99 
20          2              17.31    13.99 
6           12             7.48     5.14 
30          13             6.93     4.39 
26          7              1.48     4.90 
19          12             7.73     3.64 
12          14             4.69     3.70 
16          16             1.64     3.03 
Average error              5.33     4.86 
Maximum error              17.31    13.99 
Correlation Coefficient    0.9935   0.9870 

Figure 7: Comparison of actual and calculated area for Multiplexer wrt DTAS [plot: AREA vs. test points; curves: Actual, Estimated] 
Figure 8: Comparison of actual and calculated delay for Multiplexer wrt DTAS [plot: DELAY vs. test points (ni, iw); curves: Actual, Estimated] 
Figure 9: Comparison of actual and calculated area for Comparator wrt DTAS [plot: AREA vs. test points (iw, nf); curves: Actual, Estimated] 
Figure 10: Comparison of actual and calculated delay for Comparator wrt DTAS [plot: DELAY vs. test points (iw, nf); curves: Actual, Estimated] 
Figure 11: Comparison of actual and calculated area for LU wrt DTAS [plot: AREA vs. test points (iw, nf); curves: Actual, Estimated] 
Figure 12: Comparison of actual and calculated delay for LU wrt DTAS [plot: DELAY vs. test points (iw, nf); curves: Actual, Estimated] 
Figure 13: Comparison of actual and calculated area for ALU wrt DTAS [plot: AREA vs. test points (iw, nf); curves: Actual(design1), Actual(design2), Estimated(design1), Estimated(design2)] 
Figure 14: Comparison of actual and calculated delay for ALU wrt DTAS [plot: DELAY vs. test points (iw, nf); curves: Actual(design1), Actual(design2), Estimated(design1), Estimated(design2)] 
Table 12: Results for ALU as compared to DTAS 
Parameters               Percentage error 
Bit-width  #Functions    Design1          Design2          Design3          Average 
                         Area    Delay    Area    Delay    Area    Delay    Area    Delay 
32         9             14.65   4.34     14.18   0.54     14.16   1.17     14.48   2.02 
7          7             10.93   8.34     6.38    13.73    6.30    16.00    7.48    12.65 
2          12            5.45    1.52     5.43    1.74                      5.44    1.63 
32         7             13.64   6.14     12.79   8.70     12.75   8.27     13.06   7.70 
31         10            5.40    1.90     5.25    0.46     5.24    1.53     5.30    1.29 
24         5             7.71    3.45     6.87    7.41                      7.30    5.44 
28         6             4.69    5.55     4.33    9.66                      4.46    7.00 
30         5             13.22   4.91     11.96   6.41     13.10   8.06     12.32   6.13 
17         8             1.24    7.12     1.18    16.39                     1.21    11.76 
4          10            4.79    9.66     4.71    10.44    4.69    10.49    4.73    10.20 
Average error                                                               7.58    6.58 
Maximum error                                                               14.65   16.39 
Correlation Coefficient                                                     0.9824  0.9938 

Figure 15: Aggregate Error profile wrt DTAS [bar chart: Frequency vs. Percentage error; series: Area, Delay] 
Table 13: Results for generic AND gate as compared to LAST/TELE 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
10          11             5.88     31.42 
7           12             4.86     26.12 
9           16             5.37     26.14 
4           1              11.19    10.17 
14          21             23.17    6.47 
7           24             1.70     20.12 
4           35             0.07     11.23 
9           44             0.11     13.38 
6           47             8.77     15.59 
Average error              8.31     18.68 
Maximum error              23.17    31.42 
Correlation Coefficient    0.9620   0.7803 

Table 14: Results for Multiplexer as compared to LAST/TELE 
Parameters                 Percentage error 
Num-inputs  Bit-width      Area     Delay 
25          2              1.03     7.36 
2           7              7.20     2.18 
3           5              15.98    17.43 
5           14             2.47     17.33 
5           20             0.84     8.15 
7           21             1.69     6.49 
7           30             16.88    2.91 
7           8              2.16     9.65 
Average error              6.03     8.94 
Maximum error              16.88    17.43 
Correlation Coefficient    0.9835   0.8085 

Table 15: Results for Logic unit as compared to LAST/TELE 
Parameters                 Percentage error 
Bit-width   #Functions     Area     Delay 
2           8              12.13    25.50 
20          2              19.65    17.23 
6           12             2.67     11.57 
30          13             24.09    2.06 
31          5              22.59    12.83 
26          7              20.71    20.20 
19          12             17.25    12.53 
16          5              5.17     1.13 
12          14             4.34     4.44 
Average error              14.29    11.94 
Maximum error              24.09    25.50 
Correlation Coefficient    0.9879   0.9555 

Table 16: Results for REGISTER as compared to LAST/TELE 
Parameters                 Percentage error 
Bit-width   #Functions     Area     Delay 
39          1              1.92     7.98 
60          2              0.00     2.64 
30          1              1.31     4.80 
48          2              0.21     8.62 
50          1              3.72     3.42 
47          1              0.07     4.36 
36          3              3.60     14.43 
24          2              0.21     12.70 
20          2              1.33     13.82 
3           2              13.70    16.30 
Average error              2.61     8.91 
Maximum error              13.70    16.30 
Correlation Coefficient    0.9987   0.9941 
29 
[Plot: AREA versus TEST POINTS (iw, nf); curves: Actual, Estimated]
Figure 16: Comparison of actual and calculated area for REGISTER wrt LAST/TELE
[Plot: DELAY versus TEST POINTS (iw, nf); curves: Actual, Estimated]
Figure 17: Comparison of actual and calculated delay for REGISTER wrt LAST/TELE
[Plot: area versus TEST POINTS (iw); curves: Actual (RC), Calculated (RC), Actual (CLA), Estimated (CLA)]
Figure 18: Comparison of actual and calculated area for ADDER wrt LAST/TELE
[Plot: area versus TEST POINTS (iw)]
Figure 19: Comparison of actual and calculated area for ADDER wrt LAST/TELE
[Plot: DELAY versus TEST POINTS (iw); curves: Actual (RC), Calculated (RC), Actual (CLA), Estimated (CLA)]
Figure 20: Comparison of actual and calculated delay for ADDER wrt LAST/TELE
[Plot: DELAY versus TEST POINTS (iw)]
Figure 21: Comparison of actual and calculated delay for ADDER wrt LAST/TELE
Parameters                                                Percentage error
Bit-width                    Ripple Carry Carry Lookahead          Medium      Carry Save         Average
                             Area   Delay    Area   Delay    Area   Delay    Area   Delay    Area   Delay
21                           2.11    1.66   15.67    4.62    1.16    7.16    3.62    4.57    5.64    4.50
51                           2.48    1.47   13.47    2.82    0.78    7.52    1.21    4.09    4.48    3.97
45                           2.05    0.87    2.80    0.23    2.43    6.67    2.13    1.79    2.35    2.39
11                           2.00    0.65    2.03    6.87    2.32   15.61    6.26    5.85    3.15    7.24
25                           0.75    0.47    3.69    2.93    1.52    4.87    1.29    1.27    1.81    2.38
49                           2.37    0.54    1.58    3.62    2.70    5.54    0.72    3.24    1.84    3.23
47                           1.99    0.86    1.30    2.55    0.12    9.67    0.57    0.68    0.99    3.44
29                           0.36    0.50    5.27    1.07    1.87    1.07    2.37    1.63    2.47    1.07
9                            1.12    0.94    7.50    4.03    4.05   13.15   11.17   14.23    5.96    8.09
7                            3.00    0.97   10.19    4.16    7.49   16.12    2.48    7.17    5.79    7.10
Average error                                                                                3.45    4.34
Maximum error                                                                               15.67   16.12
Correlation Coefficient                                                                     0.998  0.9937
Table 17: Results for adder compared to LAST/TELE

[Plot: Frequency versus Percentage error; series: Area, Delay]
Figure 22: Aggregate Error profile wrt LAST/TELE

