University of South Florida

Digital Commons @ University of South Florida
Computer Science and Engineering Faculty
Publications

Computer Science and Engineering

1998

Architectural Power Estimation Based on Behavior Level Profiling
Srinivas Katkoori
University of South Florida, katkoori@usf.edu

Ranga Vemuri
University of Cincinnati

Scholar Commons Citation
Katkoori, Srinivas and Vemuri, Ranga, "Architectural Power Estimation Based on Behavior Level Profiling"
(1998). Computer Science and Engineering Faculty Publications. 121.
https://digitalcommons.usf.edu/esb_facpub/121

(C) 1998 OPA (Overseas Publishers Association)
Amsterdam B.V. Published under license
under the Gordon and Breach Science
Publishers imprint.
Printed in India.

VLSI DESIGN
1998, Vol. 7, No. 3, pp. 255-270

Architectural Power Estimation Based
on Behavior Level Profiling
SRINIVAS KATKOORI a,† and RANGA VEMURI b,*

a University of South Florida, Department of Computer Science & Engineering, 4202 E. Fowler Avenue,
ENB 118, Tampa FL 33620-5399;
b Laboratory for Digital Design Environments, Department of Electrical and Computer Engineering,
813 Rhodes Hall, Mail Location 30, University of Cincinnati, Cincinnati, Ohio 45221-0030

High level synthesis is the process of generating register transfer (RT) level designs from
behavioral specifications. High level synthesis systems have traditionally taken into
account such constraints as area, clock period and throughput time. Many high level
synthesis systems [1] permit generation of many alternative RT level designs meeting
these constraints in a relatively short time. If it is possible to accurately estimate the
power consumption of RT level designs, then a low power design from among these
alternatives can be selected.
In this paper, we present an accurate power estimation technique for register transfer
level designs generated by high level synthesis systems. The technique has four main
aspects: (1) Each RT level component used in high level synthesis is characterized for
average switched capacitance per input vector. This data is stored in the RT level
component library. (2) Using user-specified stimuli, the given behavioral description is
simulated and event activities of various operators and carriers are measured. Then, the
behavioral specification is submitted to the synthesis system and a number of alternative
RTL designs meeting speed, space and throughput rate constraints are generated. (3)
Event activity of each component in an RT level design is estimated using the event
activities measured at the time of behavior level profiling and the structure of the RTL
design itself. (4) The event activities so obtained are then used to modulate the average
switched capacitances of the respective RT level components to obtain an estimate of the
total switched capacitance of each component.
Detailed power estimation procedures for the three different parts of RTL designs,
namely, data path, controller and interconnect are presented. Experimental results
obtained from a variety of designs show that the power estimates are within 3%-10%
of the actual power measured by simulating the transistor level designs extracted from
mask layouts.
Keywords: High level synthesis, power estimation, behavioral profiling, register transfer level
designs, low power

*Corresponding author.

†This work was performed as part of the doctoral dissertation, when the author was at the University of Cincinnati.

S. KATKOORI AND R. VEMURI

1. INTRODUCTION

Due to the increasing demand for portable
applications and the rapidly growing complexity,
power consumption has become one of the main
issues in the realization of VLSI chips. There have
been major efforts [2] to reduce the power
consumption at all levels of abstraction in the
design flow. Accurate power estimation techniques
are the key to the success of these efforts. Although
accurate power estimation is possible at the lower
levels of abstraction, it is very time consuming.
Hence, recently focus has shifted to the higher
levels of abstraction including register transfer
(RT) level and above [3]. In this paper, we present
a power estimation technique for automatically
synthesized RT level designs. This technique is
based on behavior level profiling.
A high level synthesis system accepts a behavioral specification written in a hardware description language such as VHDL, a module library,
and design constraints such as the area and delay
constraints. The module library consists of RT
level modules such as adders, multipliers, registers
and multiplexors. The output of the synthesis
system is an RT level design satisfying the
user-specified constraints. The synthesis time is usually
quite small compared to logic synthesis or layout
synthesis. Hence, it is possible to synthesize many
constraint-satisfying RT level designs in a relatively short time.
RT level designs are composed of two interacting parts: datapath and controller. The datapath
consists of execution units such as adders and
multipliers, storage units such as registers and
RAMs, and interconnect units such as multiplexors and buses. Since the structure of the
design is completely known, accurate power
estimation is feasible. In addition, since the
modules are at a sufficiently high level of abstraction, such estimation should be time efficient. At
the higher levels of abstraction such as the
behavioral level, accurate power estimation is
difficult due to the lack of sufficient implementation detail. On the other hand, at lower levels of
abstraction, such as logic and layout levels, even
though sufficient implementation detail is available, the estimation time is discouragingly long. Hence, we
are motivated to estimate power at the RT level of
abstraction. For a given RT level design and for a
given set of input vectors, we estimate the total
capacitance switched in the design. We use
"power" and "switched capacitance" synonymously. Our estimation technique is set in the
context of a high level synthesis system known as
the Profile-Driven Synthesis System (PDSS).
Our power estimation procedure requires the
following inputs: (1) A module library characterized for the average intrinsic switched capacitance
(ISC) per input vector. (2) Profile data for various
carriers and operators in the data flow graph of the
behavioral specification. This data is obtained by
simulating the behavioral specification using user-specified stimuli. (3) An RT level design generated
by the synthesis system. (4) Binding information of
the operators and carriers in the data flow graph to
the module instances in the RT level design.
The high level synthesis process introduces certain
RT level module instances such as temporary
registers and multiplexors for which the profile
data is not known since these modules have no
direct correspondence with the operators and
carriers at the behavior level. Profile data for such
data path units is derived using the profile data at
their inputs which in turn is obtained from the
profile data measured at the behavior level. The
switched capacitance for each module instance is
estimated as the product of its profile data and its
intrinsic switched capacitance obtained from the
module library. The total switched capacitance in
the datapath is the sum of estimated switched
capacitances over all instances.
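The datapath estimate described above can be sketched as follows. This is a minimal illustration, not the PDSS implementation: the `Instance` class, the instance names, and the event activities are hypothetical, and the ISC figures are only chosen to resemble 16-bit entries of Table I.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One RT level module instance (hypothetical record layout)."""
    name: str
    event_activity: int  # profile data: number of input events seen
    isc_pf: float        # intrinsic switched capacitance per event, in pF


def datapath_switched_capacitance(instances):
    """Sum of (profile data x ISC) over all module instances, in pF."""
    return sum(inst.event_activity * inst.isc_pf for inst in instances)


# Hypothetical datapath: activities are made up for illustration.
instances = [
    Instance("adder_16", 120, 7.74),
    Instance("reg_a_16", 80, 41.62),
    Instance("mux2_16", 200, 3.39),
]
total_pf = datapath_switched_capacitance(instances)
print(f"estimated datapath switched capacitance: {total_pf:.1f} pF")
```

Because each instance contributes independently, the estimate for an alternative RT level design only requires re-binding the same profile data to a different instance list.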
The switched capacitance estimation for the
controller, assumed to be implemented as a PLA,
is as follows: A parameterized PLA characterization table for average switched capacitance per
clock cycle is obtained as explained in Section 5.
Given the controller size, the switched capacitance
for the controller is estimated by determining the
closest point in the PLA table.
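A closest-point lookup of this kind might look like the sketch below. The paper does not state which distance measure is used, so Euclidean distance over (inputs, outputs, states) is an assumption made here; the function name is hypothetical, and the rows are a small excerpt in the style of the PLA characterization table (Table II).

```python
import math

# (inputs, outputs, states, ISC in pF): a few rows in the style of Table II.
PLA_TABLE = [
    (5, 15, 5, 9.50),
    (5, 15, 10, 16.45),
    (5, 20, 15, 21.42),
    (10, 25, 20, 35.97),
    (10, 30, 30, 48.61),
]


def estimate_controller_isc(inputs, outputs, states, table=PLA_TABLE):
    """Return the ISC of the characterized PLA closest to the given size.

    Assumption: "closest" means smallest Euclidean distance in the
    (inputs, outputs, states) parameter space.
    """
    target = (inputs, outputs, states)
    nearest = min(table, key=lambda row: math.dist(target, row[:3]))
    return nearest[3]


print(estimate_controller_isc(6, 16, 9))    # nearest row is (5, 15, 10)
print(estimate_controller_isc(10, 28, 29))  # nearest row is (10, 30, 30)
```

A weighted distance or interpolation between neighbouring table points would be natural refinements, but neither is claimed by the text.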

ARCHITECTURAL POWER ESTIMATION

The power estimated for the entire design is the
sum of the estimated switched capacitances of the
datapath and the controller. Experimental results
show that the power estimated in the RT level
datapaths and controllers is within 15% of the
actual power measured at the layout level.
Section 2 presents a brief survey of the power
estimation techniques at architectural and other
levels of abstraction. Section 3 discusses various
issues involved in power estimation. Section 4
discusses the concept of behavioral profiling.
Section 5 discusses the module library characterization and the PLA characterization for the
average switched capacitance per unit vector.
Section 6 discusses the power estimation technique. Section 7 presents the results obtained for
several examples. Section 8 discusses the results
and presents concluding remarks.

2. PREVIOUS WORK

Powell et al. [4] suggested the Power Factor Approximation (PFA) method for characterizing each
module in a module library consisting of functional blocks. The method provides different gate
equivalent models for blocks such as multipliers,
adders, etc. Each functional block is associated
with a PFA proportionality constant and a
hardware complexity constant. The PFA constant
captures the intrinsic internal activity of the
module. Purely random inputs are applied when
deriving the PFA constant. The power dissipation
in a chip is the sum of the power dissipation in all
blocks of the chip. The power contributed by a
block in the chip is simply the product of the above
two constants and the block’s activity frequency.
The activity frequency of a functional block is the
frequency at which the function is performed. In [5],
Powell et al. present an algorithm level power
dissipation model for a class of DSP algorithms
known as MA-based (Multiply-Add) DSP Algorithms. The major sources of power dissipation in
MA-DSP systems were identified to be memory
operations, computations and I/O operations.
Impact of the number of available processing
elements, complexity of processing elements,
memory organization and type of arithmetic on
power dissipation was discussed. This model
relates power dissipation to high level algorithmic
and architectural parameters.
Chandrakasan et al. [6, 7] described a high-level
synthesis system, HYPER-LP, which uses a
variety of architectural and computational transformations to minimize power consumption in
application-specific datapath-intensive CMOS circuits.
Landman et al. [8] presented a methodology for
low-power design-space exploration at the architectural level. Black-box power models for the
architectural-level components were generated [9]
and used to estimate power while preserving the
accuracy of the gate or circuit level estimation. The
power analysis tool was set in the context of
HYPER [10], a high level synthesis system. The
key differences between our approach and Landman’s approach are (1) our synthesis system,
known as PDSS (Profile Driven Synthesis System)
[11], is targeted towards control-dominated ASIC
applications. The behavioral specifications can
contain complex control constructs such as nested
loops, conditionals and subprograms. On the other
hand, HYPER primarily targets straight-line DSP-style specifications. (2) Our approach is
based solely on the behavioral profiling. Landman’s estimation is based on behavioral profiling
or RT level profiling. For large designs with a large
set of inputs, the latter approach is time consuming and hence design space exploration is difficult.
(3) Our characterization of the module library is
based on purely random inputs, that is, Uniform
White Noise (UWN) model. Landman, on the
other hand, proposed DBT (Dual Bit Type) model
to take into account the input activity. Our power
dissipation model based on UWN model is simpler
compared to Landman’s and yet yields reasonably
accurate estimates.
Renu et al. [12] proposed a behavior level
power estimation technique based on a combination of analytical and stochastic methods. Based
on this, a design space exploration tool is
presented which is used to examine the effect of
different design steps such as transformations and
algorithms. These techniques have also been
implemented in HYPER synthesis environment
[10].
Anand et al. [14] present a behavioral synthesis
system known as Genesis, for synthesizing low
power datapath intensive CMOS circuits. During
the allocation phase, (1) the physical capacitance is
reduced by minimizing the number of functional
modules, registers and multiplexors; and (2) the
transition activity for a given module is reduced by
selecting a proper sequence of operations for that
module. The controller is optimized so as to
generate control signals which will reduce the
transition activity in the datapath. This is achieved
by introducing don’t-cares in the state table of the
controller. If a datapath module is idle for a
particular cycle, then the control signal driving
that module is assigned a don’t-care, thus avoiding
unnecessary clocking of the module. In [15], Anand et
al. present a simulation-based method to
measure intra- and inter-iteration effects of hardware sharing on switched capacitance. During the
simulation, information is gathered which is used
to formulate allocation as an ILP problem with the
total switched capacitance in the datapath as the
objective function. The solution to the problem
yields optimal allocation for the given model.
A detailed discussion about power consumption
in CMOS digital designs can be found in [16].
Techniques for low power operation are presented
which use the lowest possible supply voltage
coupled with architectural, logic, circuit, and
technology optimizations. An excellent literature
survey on the power estimation techniques at the
logic and lower levels of abstraction can be found
in [17].
In [11, 18], we have proposed a behavior level
profiling based technique to estimate switching
activity and switching capacitance in a design. The
estimation is carried out in the scheduling and
performance analysis phase of the synthesis. For a
given input specification, various schedules can be
generated satisfying the user given constraints. The
schedule with the least estimated switching capacitance is further synthesized. The estimation
technique adopts an analytical approach at the design
level and a statistical approach for the module
library characterization. One of the drawbacks of
the approach is that the interconnect estimation is
somewhat inaccurate at the scheduling level,
resulting in inaccurate power estimation. In the
present work, the estimation is at the RT Level
and is based on the behavior level profiling of the
input specification. Since the interconnect structure is known completely, power estimation in the
interconnect is more accurate compared to that
obtained at the end of the scheduling step. In the
present approach, the error of the power estimator is
in the range of 3%-10%.

3. ISSUES IN POWER ESTIMATION

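In a CMOS digital circuit, the power consumed is
given by the following equations [19, 16]:

\begin{align*}
P_{consumed} &= P_{switching} + P_{sc} + P_{leakage} \\
P_{switching} &= \sum_i f_i \cdot C_i \cdot V_{supply}^2 \\
P_{sc} &= I_{sc} \cdot V_{supply} \\
P_{leakage} &= I_{leakage} \cdot V_{supply}
\end{align*}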

Pswitching is known as the switching component of
power consumption which arises due to charging a
node with a load capacitance of Ci and which is
clocked at a frequency, fi. Psc, the short-circuit
component arises when the PMOS and NMOS
transistors are switching simultaneously resulting
in a short-circuit path from the voltage supply to
ground. For a very short period of time, current is
drawn from the voltage supply to the ground
which results in power dissipation. Pleakage is due
to the leakage current, Ileakage, which arises due to
substrate injection and subthreshold effects.
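As a small worked illustration of the switching term, the sketch below assumes it takes the usual form of a sum over circuit nodes of f_i * C_i * V_supply^2. The node frequencies, capacitances, and the 5 V supply are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def switching_power(nodes, v_supply):
    """Switching power in watts.

    nodes    -- iterable of (f_i in Hz, C_i in farads) pairs, one per node
    v_supply -- supply voltage in volts
    Assumes P_switching = sum_i f_i * C_i * V_supply**2.
    """
    return sum(f * c for f, c in nodes) * v_supply ** 2


# Three hypothetical nodes: (switching frequency, load capacitance).
nodes = [
    (10e6, 50e-15),   # 10 MHz, 50 fF
    (5e6, 120e-15),   # 5 MHz, 120 fF
    (1e6, 300e-15),   # 1 MHz, 300 fF
]
p_switching = switching_power(nodes, v_supply=5.0)
print(f"P_switching = {p_switching * 1e6:.1f} uW")
```

For a fixed supply voltage the result is proportional to the frequency-weighted capacitance sum, which is why switched capacitance can stand in for power in the rest of the paper.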
The dominant term is Pswitching. This term is
dependent on the architectural parameters and is
relatively amenable to estimation at higher levels
of abstraction. It is well-known that the static
power consumption in digital CMOS circuits is
negligible compared to the dynamic power consumption. Hence Pleakage, which is static in nature,
is negligible. Psc can be kept within 15-20% of
Pswitching [20] by proper design methodology. Thus,
it is sufficient to estimate Pswitching to estimate the
average power consumed by a design.
Dynamic power consumption is strongly dependent on the stream of inputs applied to the circuit
[17]. Without any information about the input
stream, it is impossible to accurately estimate the
power consumption of a design. Thus, for a power
estimation technique it is necessary to provide
actual or statistical information about the input
behavior.
Different power estimation techniques make
different assumptions about the input vectors.
These techniques are based on statistical, stochastic, probabilistic, or analytical approaches. For
any technique two broad steps can be identified:
(1) Characterization of the circuit components for
power and storing relevant information about the
components in the form of statistical models,
parameterized tables, equations, etc. This is
usually done only once for all the components
used in the circuit. (2) Estimation of average power
for a given design by combining the input behavior
information specific to a design with the module
library information using a statistical, stochastic,
probabilistic or analytical approach or a mix of
these approaches.
In our approach for power estimation at RT
level, the input vector behavior is indirectly
specified by the user by providing a sequence of
typical input vectors, known as the profiling
stimuli. These vectors denote typical usage of the
digital system being synthesized. These vectors are
used to simulate the behavior level specification
during which event activities of various behavior
level operators and carriers are monitored and
recorded. Collectively this information is known as
the profile data.
For a given set of inputs to a digital circuit, the
capacitance switched in the circuit is a measure of
the power consumed by the circuit. We adopt this
indirect approach for power estimation. Thus, in
this paper, we use "power" and "switched
capacitance" synonymously.
The module library is precharacterized for
average switched capacitance per input vector as
explained in detail in Section 5. RT level designs
contain three subunits: datapath, controller, and
interconnect. Detailed procedures to estimate the
switched capacitances in each of these units are
presented in Section 6.

4. BEHAVIORAL PROFILING
The concept of profiling a given program to gather
various statistics is not new. A well-known technique for measuring program performance is to
insert monitoring code into the program and
execute the modified program. Program profiling
counts the number of times each basic block is
executed and the number of times each control-flow path is traversed. Profiling is widely used to
measure instruction set utilization, identify program bottlenecks and estimate program execution
times for code optimization [25, 26, 27, 28, 29].
Techniques to insert monitoring code to optimally
and efficiently profile programs exist in the
literature [30, 31, 32].
Behavioral level profiling is similar to program
profiling. For profile data to make sense in case of
high level synthesis, one needs to understand the
correspondence between the constructs (variables,
operations, loops etc.) in the behavior representation and the elements in the resulting hardware. Understanding this correspondence helps in determining
the data to be gathered during profiling. The
profiling strategy is mainly dependent on how
different synthesis tasks go about synthesizing the
target design.
Consider the behavior description written in
VHDL as shown in Figure 1. One possible RTL
data path synthesized from the specification is as
shown in Figure 2. The correspondences between
elements of the specification and the elements of
the RTL design are also shown by the line number
annotations. Each register is associated with a
carrier in the description, for example, register a
corresponds to the port a in the specification.

 (1)  ENTITY toy IS
 (2)    PORT(a, b : IN INTEGER;
 (3)         c    : OUT INTEGER);
 (4)  END toy;
 (5)
 (6)  ARCHITECTURE foo OF toy IS
 (7)  BEGIN
 (8)    p: PROCESS(a, b)
 (9)      VARIABLE u, v : INTEGER;
(10)    BEGIN
(11)      u := a+b;
(12)      v := a-b;
(13)
(14)      IF (a > b)
(15)      THEN
(16)        c <= u;
(17)      ELSE
(18)        c <= v;
(19)      END IF;
(20)    END PROCESS;
(21)  END foo;

FIGURE 1 A Behavioral Specification in VHDL.

FIGURE 2 An RTL Data path Synthesized from Specification in Figure 1.
The profile data obtained by behavioral profiling should indicate the usage of different hardware
elements. For example, the profile data of an
assignment statement in the behavioral description
gives an estimate of the excitation frequency of the
corresponding path in the hardware. In our
example, if line number (18) has a profile data of
10, it means that the corresponding path from the
output of subtracter through the multiplexor to
the input of the register c, is excited ten times.
RTL designs generated by high level synthesis
systems contain temporary registers and interconnect units which have no direct correspondence
with constructs in behavior level specification.
Profile data for such RTL components which do
not explicitly appear in the specification has to be
calculated by some indirect means.
In order to profile a behavioral specification the
profiler inserts monitoring code in the specification. This code typically declares, initializes and
increments various counters to measure various
types of event activity. The modified program is
then simulated to determine the profile data.
The behavior profiler takes the CDFG representation of the specification and generates an equivalent
VHDL program with probes (counters and similar
monitoring variables) to gather various event
activities. We need to profile the CDFG rather
than the original specification since the CDFG
representation exposes all the operations and
carriers (edges in the CDFG) that will be bound to
hardware resources.
The generated VHDL program is simulated
using input vectors called the profiling stimuli
supplied by the user. Profiling stimuli should
represent typical usage of the design being
synthesized. Since profiling stimuli will decide the
event activity in the design, the user should take
extreme care in preparing this data. Some suggestions as to how to prepare profiling stimuli for
different classes of designs are given in [11].
For the given profiling stimuli, the profile data of
the specification constitutes the following information associated with the CDFG nodes and edges.
The event activity of a CDFG node op is the
number of times that node is executed and is
denoted by Eop. The transaction activity of an edge
e is the number of times the edge is traversed
during the execution and is denoted by Te. The
event activity of an edge e is the number of times
the value on the edge changes and is denoted by Ee.
Note that Ee ≤ Te. Probes are inserted by the
profiler to measure the profile data.
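The three kinds of profile data can be illustrated with a hand-instrumented version of the toy specification of Figure 1. This is only a sketch: the real profiler emits instrumented VHDL from the CDFG, whereas the Python function, counter names, and stimuli below are hypothetical, and treating the first assignment to c as an event is an assumption.

```python
def profile_toy(stimuli):
    """Run the toy behavior on (a, b) pairs and collect probe counters."""
    counts = {
        "E_add": 0,       # event activity of the adder node (u := a+b)
        "E_sub": 0,       # event activity of the subtracter node (v := a-b)
        "T_c_from_u": 0,  # transactions on the edge u -> c (line 16)
        "T_c_from_v": 0,  # transactions on the edge v -> c (line 18)
        "E_c": 0,         # events on c: times its value actually changes
    }
    prev_c = None  # no value driven yet; first assignment counts as an event
    for a, b in stimuli:
        u = a + b
        counts["E_add"] += 1
        v = a - b
        counts["E_sub"] += 1
        if a > b:
            c = u
            counts["T_c_from_u"] += 1
        else:
            c = v
            counts["T_c_from_v"] += 1
        if c != prev_c:
            counts["E_c"] += 1
        prev_c = c
    return counts


# Hypothetical profiling stimuli; c takes the values 4, 0, 0, 7, 7.
profile = profile_toy([(3, 1), (2, 2), (1, 1), (5, 2), (4, 3)])
print(profile)
```

With these stimuli the edge into c carries five transactions but only three events, which is an instance of the relation between event activity and transaction activity noted above.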

5. POWER CHARACTERIZATION OF RTL MODULES AND PLAs

5.1. Module Library Characterization

The RTL module library contains parameterized
modules such as n-bit registers, n-bit adders and n-bit m-to-1 multiplexors. Modules are parameterized with respect to bit-width of each input and,
where applicable, the number of inputs. For each
module in the library, its interface description,
parameters such as area, delay and average
intrinsic switched capacitance (ISC) characteristics
are stored in the library. The area, delay and ISC
characteristics are expressed as a function of
parameter variables such as bit-width, word length
etc. and are in the form of either equations or
tables. If the data cannot be fit into an equation,
then it is stored as a table. For tables, linear
interpolation or extrapolation is assumed whenever the parameter value is not available for a
given value of parameter variable.
For a given library module, area, delay and ISC
values are determined by generating layouts for
different parameter values. Linear regression by
the method of least squares is used to find an
equation which determines the area, delay or ISC
characteristic given the bit-width parameter value.
If the standard error is too high, then the data is
entered as a table assuming the use of linear
interpolation in between the data points. Determination of area and delay parameters for layout
instances is straightforward. Area can be directly
measured from the layout and delay can be
determined through simulation or a timing analysis program such as Crystal [34]. Determination
of ISC which depends on input patterns is more
involved and is described below.
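The choice between an equation and a table can be sketched as below. The sample points are bit-width/ISC pairs in the style of the adder and multiplier rows of Table I; the function names and the standard-error threshold `max_std_err` are assumptions, since the paper does not say how high is "too high".

```python
def fit_line(points):
    """Least-squares line through (x, y) points; returns slope, intercept, std error."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    std_err = (sse / (n - 2)) ** 0.5  # standard error of the regression
    return slope, intercept, std_err


def interpolate(points, x):
    """Piecewise-linear lookup; the end segments are reused for extrapolation."""
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x <= x1:
            break  # x falls in (or before) this segment
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def characterize(points, max_std_err=0.1):
    """Store an equation if the linear fit is good enough, else a table."""
    slope, intercept, std_err = fit_line(points)
    if std_err <= max_std_err:
        return "equation", (slope, intercept)
    return "table", sorted(points)


ADDER = [(1, 0.45), (2, 0.98), (4, 1.93), (5, 2.43), (8, 3.84), (16, 7.74)]
MULTIPLIER = [(2, 2.27), (3, 3.53), (4, 7.99), (5, 15.30), (8, 60.48), (16, 455.39)]

print(characterize(ADDER))          # near-linear data: stored as an equation
print(characterize(MULTIPLIER)[0])  # strongly non-linear data: stored as a table
print(interpolate(MULTIPLIER, 6))   # looked up between the 5-bit and 8-bit points
```

Adder-like data grows almost linearly with bit-width and fits an equation well, whereas multiplier-like data grows much faster and falls back to the table with interpolation.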
We define the average intrinsic switched capacitance (ISC) of a module instance as the average
capacitance that is expected to switch when an
input event (change of logic values on the input
lines) takes place. ISC of a module instance is
determined by extracting a switch level model from
its layout, simulating the switch level module using
a very long stream of randomly generated input
patterns and monitoring the capacitance switched
per pattern, until convergence occurs as discussed
below. The capacitance measurements are carried
out by IRSIM-CAP [37], which is a modified
version of IRSIM [38] switch level simulator for
better capacitance measurements.
Let $C_k$ be the total capacitance charged after
applying k random input patterns without reinitialization between successive patterns. $Z_k = C_k/k$
denotes the average capacitance per input pattern
after applying k patterns. $\delta_k = |Z_k - Z_{k-1}|/Z_{k-1}$
denotes the variation in the average capacitance
between the (k-1)th and kth patterns. We continue
to apply random input patterns until $\delta_k$ remains
less than 0.001 over 1000 consecutive input pattern
applications. At this point we say that the average
switching capacitance estimation has converged and
accept the value of $Z_k$ after the last input pattern is
applied. This value is the ISC of that instance of
the module. A similar procedure is used to determine
the ISC of various instances of the module and the
results are expressed as an equation or table.
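The convergence loop can be sketched as follows. The switch level simulator is replaced here by a hypothetical stand-in callback that returns a random capacitance per pattern, so the numbers are purely illustrative; the function name and the `max_patterns` safety limit are assumptions.

```python
import random


def characterize_isc(switched_cap, tol=1e-3, window=1000, max_patterns=1_000_000):
    """Apply patterns until the running average settles; return (ISC, patterns used).

    switched_cap -- callable returning the capacitance switched by one more
                    random input pattern (stand-in for the simulator)
    tol, window  -- stop once the relative change of the average stays below
                    tol for `window` consecutive patterns
    """
    total = 0.0      # C_k: capacitance accumulated over k patterns
    prev_avg = None  # Z_{k-1}
    stable = 0       # consecutive patterns with delta_k < tol
    for k in range(1, max_patterns + 1):
        total += switched_cap()
        avg = total / k  # Z_k
        if prev_avg is not None and prev_avg > 0:
            delta = abs(avg - prev_avg) / prev_avg
            stable = stable + 1 if delta < tol else 0
            if stable >= window:
                return avg, k
        prev_avg = avg
    raise RuntimeError("ISC estimate did not converge")


# Hypothetical module: switches between 0 and 80 pF per pattern, uniformly.
rng = random.Random(0)
isc, patterns = characterize_isc(lambda: rng.uniform(0.0, 80.0))
print(f"ISC ~ {isc:.2f} pF after {patterns} patterns")
```

Since the criterion needs 1000 consecutive small changes, at least 1001 patterns are always applied before a value is accepted.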
Figure 3 shows the ISC characteristics of a
library module. Figure 4 shows ISC plots with
respect to the bit-width for three modules, namely,
adder, register and two-input multiplexor. Table I
shows the ISC characteristics of some PDSS
library modules. For the RAM component, there are
two parameters, namely, the select size and the word
size. The ISC value shown for RAM is the average
capacitance switched for either a Read or a Write
operation.

FIGURE 3 ISC Characteristic of a 16-bit Register (horizontal axis: Input Patterns (k)).

FIGURE 4 ISC vs. Bit-Width for three Parameterized Modules (horizontal axis: Bit-width).

TABLE I ISC Data for Some Parameterized Library Modules (Bit Width ≥ 1)

Module             ISC Table (Bit Width-ISC(pF))
Adder              1-0.45, 2-0.98, 4-1.93, 5-2.43, 8-3.84, 16-7.74
Subtracter         2-0.97, 4-2.50, 6-3.26, 8-5.64, 10-7.05, 16-12.16
Comparator >       1-0.44, 2-0.88, 4-1.82, 5-2.00, 6-2.78, 8-3.99, 16-12.57
Multiplier         2-2.27, 3-3.53, 4-7.99, 5-15.30, 8-60.48, 16-455.39
Multiplexor        2-inputs: 2-0.45, 4-0.86, 8-1.70, 12-2.53, 16-3.39
                   4-inputs: 2-1.41, 4-2.68, 8-5.20, 12-7.95, 16-10.79
                   6-inputs: 2-2.46, 4-4.69, 8-9.46, 12-14.53, 16-19.53
                   8-inputs: 2-3.29, 4-6.23, 8-13.10, 12-19.89, 16-26.73
Register           1-3.77, 2-6.53, 4-12.09, 5-13.68, 6-18.19, 7-18.67, 16-41.62
Signal Register    2-10.90, 3-12.41, 4-15.45, 5-15.99, 8-23.78, 16-39.35
(Register + Glue Logic)
AND                2-0.17, 3-0.29, 4-0.36, 5-0.45, 6-0.55, 8-0.76, 10-0.97, 16-1.55
OR                 2-0.18, 3-0.27, 4-0.38, 5-0.48, 6-0.51, 8-0.71, 10-0.98, 16-1.48
NOT                1-0.04, 2-0.08, 3-0.12, 4-0.16, 5-0.20, 8-0.33, 16-0.66
NAND               1-0.06, 2-0.13, 3-0.19, 4-0.26, 5-0.32, 6-0.38, 7-0.44, 8-0.53, 16-1.06
NOR                3-0.17, 4-0.22, 5-0.28, 6-0.35, 8-0.47
XOR                2-0.31, 3-0.50, 4-0.68, 5-0.86, 6-0.98, 8-1.35, 10-1.69
XNOR               2-0.31, 3-0.50, 4-0.64, 5-0.80, 6-0.97, 8-1.26, 10-1.61, 16-2.56
RAM                sel_size=2: 1-159.84, 2-187.12, 4-244.310, 8-392.75, 16-592.48
                   sel_size=3: 1-217.57, 2-250.65, 4-318.67, 8-463.22, 16-736.95

5.2. PLA Characterization

The controller is a finite-state machine which we
assume is implemented as a PLA structure. The
PLA structure consists of an input plane, an
output plane, and I/O buffers. We assume that the
PLA is implemented using dynamic CMOS with
pre-charged product and output lines [19]. The
product and output lines are selectively discharged
based on the input conditions and are controlled
by two non-overlapping clocks.
A PLA is characterized by three parameters:
(1) input size, I; (2) output size, O; and (3) the
number of states, S. The ISC for any controller is a
function of these parameters. By varying I, O, and S,
random PLAs are generated and characterized as
follows: The switch level model of the controller,
extracted from the layout, is simulated using
random input vectors. Simulation is carried out
until the capacitance switched per clock step (as
opposed to per input pattern in the case of the
modules in the library) converges in a fashion
similar to the one described in module library
characterization.

A PLA characterization table is obtained, which
is used later for the estimation of switched
capacitance in the controller. Table II shows a
portion of the PLA characterization table. Figure
5 shows a three dimensional plot of ISC values for
controllers with varying O and S and with input size
I = 5.

FIGURE 5 PLA Characterization with size of Inputs I = 5 (axes: Outputs (O), States (S)).

TABLE II A portion of the PLA Characterization table

Sl.No   I    O    S    ISC(pF)
1.      5    15   5    9.50
2.      5    15   10   16.45
3.      5    15   15   18.85
4.      5    15   20   20.00
5.      5    15   25   24.78
6.      5    15   30   26.84
7.      5    20   5    11.39
8.      5    20   10   15.14
9.      5    20   15   21.42
10.     5    20   20   28.66
11.     5    20   25   31.34
12.     5    20   30   33.82
13.     10   25   10   23.37
14.     10   25   15   25.36
15.     10   25   20   35.97
16.     10   25   25   43.93
17.     10   25   30   43.99
18.     10   30   10   27.40
19.     10   30   15   29.35
20.     10   30   20   41.40
21.     10   30   25   47.96
22.     10   30   30   48.61

6. ARCHITECTURAL POWER ESTIMATION

PDSS (Fig. 6) accepts specifications in a behavioral subset of VHDL and user-specified constraints
in terms of clock-period and area. It generates an
RT level design satisfying the given constraints.
PDSS consists of four main modules: scheduling
and performance estimation, register optimization,
interconnect optimization and controller generation. More detailed discussion on PDSS appeared
in [11, 33].

FIGURE 6 PDSS Environment.

The RT level design produced by PDSS consists
of four major subunits from the power estimation
view point: Datapath, Controller, Interconnect
and System Clock. Power consumed in the design
is given by,

\[ P_{design} = P_{dp} + P_{con} + P_{inter} + P_{clock} \]

where Pdp, Pcon, Pinter, and Pclock are the power
consumed in the datapath, controller, interconnect and system clock, respectively.
Our RT level power estimator needs the following inputs as shown in Figure 6:
1. Profile Data: This is obtained from the behavioral profiling of the high level specification
given as input to the PDSS. For each operator
and each edge in the CDFG, a count of the total
event activity that occurred on the operator/edge is
recorded.
2. Binding Information: One of the synthesis tasks
is to bind each operator and edge in the CDFG
to an instance of one of the modules in the
module library. It also binds the temporary
variables introduced to hardware registers in
the module library.
3. Module Library: The module library is precharacterized for ISC as explained in Section 5.
4. RT Level Output: This is the structural implementation containing instantiations of modules from the module library. The controller is a
finite state machine description. The datapath
and controller interact with each other to form
the entire design.

To estimate the average power of a given RT level design, the power
estimator goes through the following phases: (1) pre-processing stage;
(2) profile data computation of hardware resources introduced during
synthesis; and (3) power estimation of the design.

6.1. Pre-processing Stage

The power estimator is initialized with the ISC values of all the modules
obtained from the library characterization. The binding information
provided by the synthesis tool is used to build a list of instances
(inst_list) of modules. Each instance is initialized with the sum of the
profile data of all the operators (or edges) in the CDFG which are bound
to that particular instance. Note that the profile data of some instances
is not known, as they are introduced during synthesis. The profile data
of such instances is computed in the next phase.
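
The pre-processing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function name `build_inst_list` and the example instance names are hypothetical and are not part of PDSS; `None` is used here to mark profile data that is not yet known.

```python
from collections import defaultdict


def build_inst_list(binding, cdfg_profile, synthesized):
    """Return {instance: profile data}, with None for unknown profile data.

    binding      -- CDFG operator/edge name -> module instance name
    cdfg_profile -- CDFG operator/edge name -> event count from profiling
    synthesized  -- instances introduced during synthesis (not in the CDFG)
    """
    totals = defaultdict(int)
    for node, inst in binding.items():
        # Operators/edges sharing an instance add up their event counts.
        totals[inst] += cdfg_profile[node]
    inst_list = dict(totals)
    for inst in synthesized:
        # Profile data stays unknown until the next phase computes it.
        inst_list.setdefault(inst, None)
    return inst_list


# Hypothetical example: two additions share one adder instance.
binding = {"add1": "adder0", "add2": "adder0", "mul1": "mult0"}
cdfg_profile = {"add1": 12, "add2": 8, "mul1": 5}
inst_list = build_inst_list(binding, cdfg_profile, ["mux0", "tmp_reg0"])
print(inst_list)
```

Here the adder instance accumulates the activity of both additions bound to it, while the multiplexor and temporary register remain unknown.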
6.2. Profile Data Computation

Algorithm Compute_profile() in Figure 9 is used
to compute the profile data of the temporary
registers and the interconnect units introduced
during the synthesis.
Procedure Build_dependency_list() builds a dependency list of the
instances. It goes through each instance inst_i in the instance list
inst_list and, if the profile data of the instance is unknown, adds the
instances at its inputs to inst_i's dependency list.
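
A minimal Python sketch of this dependency-list construction follows. The data layout (`inst_list` as a dictionary with `None` for unknown profile data, `inputs_of` as a connectivity map) is an assumption made for illustration, not the PDSS data structure.

```python
def build_dependency_list(inst_list, inputs_of):
    """Map each instance with unknown profile data to the instances at its inputs.

    inst_list -- instance name -> profile data, or None when unknown
    inputs_of -- instance name -> names of the instances driving its inputs
    """
    return {
        inst: list(inputs_of.get(inst, []))
        for inst, profile in inst_list.items()
        if profile is None
    }


# Hypothetical netlist fragment.
inst_list = {"adder0": 20, "mult0": 5, "mux0": None, "tmp_reg0": None}
inputs_of = {
    "mux0": ["adder0", "mult0"],
    "tmp_reg0": ["mux0"],
    "adder0": ["tmp_reg0"],
}
deps = build_dependency_list(inst_list, inputs_of)
print(deps)
```

Instances whose profile data is already known (here the adder and multiplier) get no dependency list, as in the description above.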
Consider Figure 7, in which there is a feedback from the output of the
multiplexor inst(j) to the input of the register inst(i). Such a
configuration gives rise to dependency cycles. Procedure Remove_cycles()
removes these dependency cycles in the following way. Let two instances
i and j be in a dependency cycle. If the profile data on the remaining
inputs of an instance (all inputs besides the one which gives rise to the
dependency cycle) is known, then we say that the profile data of that
instance is known. Otherwise, the profile data of the instance is said to
be unknown. The following three possibilities can occur:

Case 1: The profile data of both instances is known. The profile data of
each of the instances is equal to the sum of the profile data of both
instances.
Case 2: The profile data of only one of the instances (say i) is known.
We remove the edge from j to i. Assuming that instance j is not in a
dependency cycle with any other instance, the profile data of j can be
computed as the sum of the profile data of all the instances (including i)
at its inputs. Since there was an edge from j to i, instance i has event
activity from the output of instance j. Thus the new profile data of i is
the old profile data plus the computed profile data of instance j.
Case 3: The profile data of both instances is not known. Both the edges
in the cycle are removed, and the profile data of i and j are computed
based on the profile data of the instances at their other inputs. The new
profile data of each instance is the sum of the profile data of both
instances.

FIGURE 7 An example of dependency cycle. [Diagram: register inst(i) and multiplexor inst(j) connected in a feedback loop that forms the dependency cycle.]


To illustrate Case 1, consider Figure 8. Two instances inst(i) and inst(j)
are in a dependency cycle. The profile data on inputs A and B of inst(i)
are 10 and 20 respectively. Similarly, the profile data on inputs C and D
of inst(j) are 15 and 12 respectively. We make a conservative assumption
that the inputs of a multiplexor are not switching simultaneously. Thus,
the profile data on the output of a multiplexor is the sum of the profile
data on all the inputs, and the equations to compute profile data on the
outputs of both instances are:

$$P(X) = P(Y) + P(A) + P(B)$$
$$P(Y) = P(X) + P(C) + P(D)$$

where P(X) is the profile data on the output of inst(i) and P(Y) is the
profile data on the output of inst(j). P(X) appears on the right hand side
of the P(Y) equation and vice versa. The above set of equations cannot be
solved unless we remove the dependency cycle. Since P(A), P(B), P(C) and
P(D) are known, the example belongs to Case 1 as discussed above. With the
dependency cycle removed, the profile data of X and Y are
P'(X) = P(A) + P(B) = 30 and P'(Y) = P(C) + P(D) = 27.
With the dependency cycle included, the new profile data for both X and Y
are P(X) = P(Y) = P'(X) + P'(Y) = 57. If P(A) or P(B) is unknown to start
with, then the example belongs to Case 2. If P(A) or P(B) and P(C) or P(D)
is not known, then the example belongs to Case 3.
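
The case analysis and the Case 1 arithmetic of this example can be checked with a short Python sketch. The function names and the convention of using `None` for an unknown input are illustrative assumptions; only Case 1 is resolved here, since Cases 2 and 3 first need the unknown inputs to be computed.

```python
def classify_cycle(inputs_i, inputs_j):
    """Return 1, 2 or 3 for a two-instance dependency cycle.

    Each argument lists the profile data on the non-cycle inputs of one
    instance; None marks an input whose profile data is unknown.
    """
    known_i = None not in inputs_i
    known_j = None not in inputs_j
    if known_i and known_j:
        return 1
    if known_i or known_j:
        return 2
    return 3


def resolve_case1(inputs_i, inputs_j):
    """Return (P'(X), P'(Y), P(X) = P(Y)) when all non-cycle inputs are known."""
    p_x_open = sum(inputs_i)  # cycle removed: P'(X) = P(A) + P(B)
    p_y_open = sum(inputs_j)  # cycle removed: P'(Y) = P(C) + P(D)
    # With the cycle included, both outputs carry the combined activity.
    return p_x_open, p_y_open, p_x_open + p_y_open


# Inputs A, B of inst(i) and C, D of inst(j) from the example in the text.
case = classify_cycle([10, 20], [15, 12])
result = resolve_case1([10, 20], [15, 12])
print(case, result)
```

For the values in the text this reproduces P'(X) = 30, P'(Y) = 27 and P(X) = P(Y) = 57.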
After removing the cycles, the profile data of each instance i whose
profile data is unknown is calculated as the sum of the profile data of
all the instances at i's inputs. After computing profile data for all the
instances, the data path power can be computed as follows.
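
The propagation step performed by Compute_profile() once cycles are removed can be sketched as below. This is a hedged illustration, not the PDSS implementation: it assumes the dependency lists are acyclic, uses `None` for unknown profile data, and resolves instances recursively so that the result does not depend on the order of the instance list (an implementation choice the paper does not specify).

```python
def compute_profile(inst_list, dependency_list):
    """Fill in unknown profile data as the sum over the instances at the inputs.

    inst_list       -- instance name -> profile data, or None when unknown
    dependency_list -- instance name -> input instances (assumed acyclic)
    """
    resolved = dict(inst_list)

    def value(inst):
        if resolved[inst] is None:
            # Unknown profile data: sum the activity of all driving instances.
            resolved[inst] = sum(value(src) for src in dependency_list[inst])
        return resolved[inst]

    for inst in list(resolved):
        value(inst)
    return resolved


# Hypothetical example: a register fed by a multiplexor fed by two operators.
inst_list = {"tmp_reg0": None, "mux0": None, "adder0": 20, "mult0": 5}
dependency_list = {"mux0": ["adder0", "mult0"], "tmp_reg0": ["mux0"]}
profile = compute_profile(inst_list, dependency_list)
print(profile)
```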

6.3. Power Estimation in Data Path

The data path consists of the execution units such as adders and
multipliers and storage units such as latches and shift registers. The
power consumed by the datapath, $P_{dp}$, is computed by lines 2-4 of
the procedure Estimate_Power(). $P_{dp}$ is given by:

$$P_{dp} = \sum_{op} E_{op} \cdot ISC_{op}$$

where $E_{op}$ is the event activity (or profile data) of the operator
(or register) op and $ISC_{op}$ is the average switched capacitance value
of the hardware module instance to which the operator node op is bound.
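
As a worked illustration of this sum, the following Python sketch multiplies each instance's event activity by the ISC of its module and adds the products. The ISC values and instance records are made-up placeholders, not characterization data from the module library.

```python
def datapath_switched_capacitance(instances, isc_pf):
    """Sum E_op * ISC_op over all datapath instances (result in pF)."""
    return sum(
        inst["profile_data"] * isc_pf[(inst["op_type"], inst["size"])]
        for inst in instances
    )


# Hypothetical ISC values in pF, keyed by (module type, bit-width).
isc_pf = {("adder", 8): 2.0, ("register", 8): 0.5}
instances = [
    {"op_type": "adder", "size": 8, "profile_data": 20},
    {"op_type": "register", "size": 8, "profile_data": 25},
]
p_dp = datapath_switched_capacitance(instances, isc_pf)
print(p_dp)
```

With these placeholder numbers the adder contributes 20 * 2.0 and the register 25 * 0.5.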

FIGURE 8 An example to illustrate profile data computation in presence of a dependency cycle. [Diagram: inst(i) with inputs A and B and output X; inst(j) with inputs C and D and output Y.]

Algorithm Compute_profile()
begin
  T = Build_dependency_list();
  Remove_cycles(T);
  for each I in inst_list do
    for each J in I.dependency_list do
      I.profile_data += J.profile_data
    end for
  end for
end
FIGURE 9 Algorithm for the computation of profile data.

6.4. Power Consumed by System Clock

As the system clock controls all the clocked components in the data path,
it is loaded by a large amount of capacitance. The power consumed by the
system clock is estimated in Algorithm Estimate_Power() shown in Figure 10.
Lines 8-10 estimate the load capacitance $C_{clock}$ on the system clock.
In clocked components such as registers and latches, the load capacitance
on the clock line is approximately 50 fF per bit of width. Thus, the total
capacitive load on the system clock is the sum of the clock capacitances
of all instances. The total clock capacitance switched in a design is
given by the product of the number of input vectors ($N_v$), the total
number of clock cycles required to process an input vector ($T_{total}$)
and the clock capacitance ($C_{clock}$).
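
The clock estimate reduces to two small formulas, sketched here in Python. The 50 fF per bit figure is the one quoted in the text; the function names, the register widths and the values of N_v and T_total are illustrative assumptions.

```python
CLOCK_CAP_PER_BIT_FF = 50  # fF per bit of a clocked component, as quoted in the text


def clock_load_ff(clocked_widths):
    """Total capacitive load on the system clock, in fF."""
    return sum(CLOCK_CAP_PER_BIT_FF * width for width in clocked_widths)


def clock_switched_capacitance_ff(n_v, t_total, c_clock_ff):
    """Clock capacitance switched over N_v input vectors of T_total cycles each."""
    return n_v * t_total * c_clock_ff


# Hypothetical design with two 8-bit registers and one 16-bit register.
c_clock = clock_load_ff([8, 8, 16])
p_clock = clock_switched_capacitance_ff(n_v=25, t_total=10, c_clock_ff=c_clock)
print(c_clock, p_clock)
```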
6.5. Power Estimation in Controller

The controller is a finite state machine implemented as a PLA. Any PLA is
characterized by three parameters: the number of inputs I, the number of
outputs O and the number of states S. The module library already contains
a PLA characterization table, which was explained in detail in Section 5.
From the table, we can obtain the average intrinsic switching capacitance
ISC of a PLA of a given size. Interpolation or extrapolation is used
wherever values are not available for a given set of parameter values.
The ISC value so obtained is the average capacitance that switches per
clock step in a PLA of size (I, O, S).
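
A table lookup with interpolation might look like the following Python sketch. The paper does not state which interpolation scheme is used, so piecewise-linear interpolation over the number of states S (with linear extrapolation beyond the table) is an assumption here, as are the function and variable names. The sample points are the I = 5, O = 15 entries as read from Table II.

```python
from bisect import bisect_left

# (S, ISC in pF) points for a PLA with I = 5 inputs and O = 15 outputs,
# as read from Table II.
PLA_ISC_PF = {
    (5, 15): [(5, 9.50), (10, 16.45), (15, 18.85),
              (20, 20.00), (25, 24.78), (30, 26.84)],
}


def pla_isc(inputs, outputs, states):
    """Estimate ISC(I, O, S) in pF by linear interpolation over S (assumed scheme)."""
    points = PLA_ISC_PF[(inputs, outputs)]
    xs = [s for s, _ in points]
    k = bisect_left(xs, states)
    # Clamp to the first/last segment so out-of-range sizes extrapolate linearly.
    k = min(max(k, 1), len(points) - 1)
    (s0, c0), (s1, c1) = points[k - 1], points[k]
    return c0 + (c1 - c0) * (states - s0) / (s1 - s0)


print(pla_isc(5, 15, 10))  # a tabulated point
print(pla_isc(5, 15, 12))  # between S = 10 and S = 15
print(pla_isc(5, 15, 35))  # beyond the table, extrapolated
```

Interpolating across I and O as well would need a multi-dimensional scheme, which is left out of this sketch.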
Let $N_v$ be the total number of profiling stimuli applied. Let the CDFG
be scheduled in $N_c$ control steps. In the module library, for each
module, the number of clock steps needed to process an input vector is
stored as a function of its parameters such as bit-width, wordsize etc.
The total number of clock steps required to process an input vector is the
sum of the maximum number of clock steps needed in each control step.
Algorithm Estimate_clock_steps() estimates the number of clock steps
needed by the design to process an input vector (say $T_{total}$). The
power consumed in the controller is given by the product of $N_v$,
$T_{total}$ and ISC(I, O, S), as given in the algorithm shown in Figure 10.

6.6. Power Estimation in Interconnect

The profile data for the interconnect units has already been calculated,
as discussed in the profile data computation phase. The interconnect units
are not present in the CDFG and arise due to operator sharing, register
sharing and interconnect sharing. In this work, we consider only
multiplexor-based designs. The profile data of a multiplexor is computed
as the sum of the event activities on all the inputs. This is a
conservative estimate of the total number of events that the multiplexor
is subjected to. The power consumed in the interconnect is calculated in
the same way as is done for the datapath.

Algorithm Estimate_Power()
begin
1.  for each I in inst_list do
2.    if (I.module_type = OPERATOR or REGISTER) then
3.      P_dp += I.profile_data * ISC(I.op_type, I.size)
4.    else
5.      if (I.module_type = INTERCONNECT) then
6.        P_inter += I.profile_data * ISC(MULTIPLEXOR, I.size)
7.      endif
8.    endif
9.    if (I.module_type = CLOCKED_COMP) then
10.     C_clock += 50fF * (I.size)
11.   endif
12. end for
13.
14. Let (I, O, S) be the controller size.
15. T_total = Estimate_clock_steps()
16. C_con = ISC(I, O, S)
17. P_con = N_v * T_total * C_con
18. P_clock = N_v * T_total * C_clock
19.
20. P_total = P_dp + P_con + P_inter + P_clock
end
FIGURE 10 Algorithm for the estimation of power.
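
The algorithm of Figure 10 can be rendered as a runnable Python sketch. Several details are assumptions made for illustration: the `Instance` record and its `clocked` flag (standing in for the CLOCKED_COMP test), the `isc` callback, and all numeric values in the example. Capacitances are kept in pF, so 50 fF per bit becomes 0.05 pF.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    module_type: str   # "OPERATOR", "REGISTER" or "INTERCONNECT"
    op_type: str       # library module name, e.g. "adder"
    size: int          # bit-width
    profile_data: int  # event activity
    clocked: bool = False


def estimate_power(inst_list, isc, c_con_pf, n_v, t_total):
    """Return switched capacitance per subunit and in total (pF).

    isc      -- function (module name, size) -> ISC in pF
    c_con_pf -- ISC of the controller PLA of size (I, O, S), in pF
    """
    p_dp = p_inter = c_clock = 0.0
    for inst in inst_list:
        if inst.module_type in ("OPERATOR", "REGISTER"):
            p_dp += inst.profile_data * isc(inst.op_type, inst.size)
        elif inst.module_type == "INTERCONNECT":
            p_inter += inst.profile_data * isc("MULTIPLEXOR", inst.size)
        if inst.clocked:
            c_clock += 0.05 * inst.size  # 50 fF per bit, expressed in pF
    p_con = n_v * t_total * c_con_pf
    p_clock = n_v * t_total * c_clock
    return {"dp": p_dp, "inter": p_inter, "con": p_con, "clock": p_clock,
            "total": p_dp + p_con + p_inter + p_clock}


# Hypothetical library values and design.
ISC_PF = {"adder": 2.0, "register": 0.5, "MULTIPLEXOR": 1.0}
design = [
    Instance("OPERATOR", "adder", 8, 20),
    Instance("REGISTER", "register", 8, 25, clocked=True),
    Instance("INTERCONNECT", "mux", 8, 25),
]
power = estimate_power(design, lambda name, size: ISC_PF[name],
                       c_con_pf=9.5, n_v=10, t_total=4)
print(power)
```

Passing the ISC lookup as a function keeps the sketch independent of how the module library is stored.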

Algorithm Estimate_clock_steps()
begin
  for i in 0 to N_c do
    max_clock_cnt = 0
    for each op scheduled in control step i do
      if (module_op.clock_steps > max_clock_cnt)
         where op is bound to module_op then
        max_clock_cnt = module_op.clock_steps
      endif
    end for
    T_total += max_clock_cnt
  end for
end
FIGURE 11 Algorithm to estimate the number of clock steps.
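
A compact Python rendering of the clock-step estimate is given below as a sketch. The schedule representation (one list of operations per control step), the binding map and the clock-step figures are hypothetical; an empty control step is assumed to contribute zero clock steps.

```python
def estimate_clock_steps(schedule, binding, module_clock_steps):
    """Sum, over all control steps, the slowest module used in that step.

    schedule           -- list of control steps, each a list of operations
    binding            -- operation name -> module instance name
    module_clock_steps -- module instance name -> clock steps per input vector
    """
    t_total = 0
    for ops in schedule:
        # The control step lasts as long as its slowest bound module.
        t_total += max((module_clock_steps[binding[op]] for op in ops), default=0)
    return t_total


# Hypothetical schedule: the multiplier dominates the first control step.
schedule = [["add1", "mul1"], ["add2"], []]
binding = {"add1": "adder0", "add2": "adder0", "mul1": "mult0"}
module_clock_steps = {"adder0": 1, "mult0": 4}
t_total = estimate_clock_steps(schedule, binding, module_clock_steps)
print(t_total)
```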

The power consumed in the interconnect is given by:

$$P_{inter} = \sum_{i} E_i \cdot ISC(MUX, i.size)$$

where i is an instance of a multiplexor of size i.size, $E_i$ is its event
activity (profile data), and ISC(MUX, i.size) is the average intrinsic
switching capacitance of a multiplexor of size i.size.

7. RESULTS

In this section we present experimental results for six designs:

1. Compression chip
2. Decompression chip
3. FIFO, a first-in first-out queue
4. Find, sort and search chip
5. Shuffle Exchange Network [35]
6. Traffic light controller

Table III shows the behavioral specification data. The PDSS system is
implemented in C++ on Sun Sparcstation platforms.
Each register level design produced by PDSS is processed by the Lager IV
silicon compiler [36] to generate mask layouts. The designs generated use
a two phase non-overlapping clocking scheme. Although the designs are
generated in a scalable CMOS technology, all results for this paper are
obtained using 2 micron feature size. Switch level models are extracted
from the layouts and simulated using the IRSIM-CAP [37] switch level
simulator. Table IV shows the synthesized design data at the layout level.
Table V shows the estimated and actual powers in the data path and
interconnect of the six designs. The estimated power is computed at the
RT level and the actual power is determined by the switch level simulation
of the synthesized designs. As shown in the table, the percentage error in
estimation for the data path is in the range of 2.51%-12.58%, with the
average deviation from the actual value being 6.25%.
Table VI shows the comparison of powers for the controller. The estimation
error is in the range 3.53%-15.22%, with the average deviation being
10.51%. Table VII shows the comparison of the power dissipated due to the
system clock. The estimation error is in the range 18.59%-30.69%, with the
average deviation of 22.32%. Table VIII shows the power values for the
entire design, which is the sum of the power in datapath ($P_{dp}$),
interconnect ($P_{inter}$), system clock ($P_{clock}$), and

TABLE III Behavioral Specification Data for Six Designs

Sl.  Design            LOC   DFG Nodes   DFG Edges   Profiling Stimuli   Profiling Time (s)
1.   Compress          42    22          107         25                  9.48
2.   Decompress        40    22          106         25                  6.00
3.   FIFO              70    38          176         25                  16.33
4.   Find              63    33          121         16                  9.81
5.   Shuffle Xchg NW   450   31          2040        14                  30.90
6.   TLC               72    27          123         10                  1.37


TABLE IV Synthesized Design Data at the Layout Level

Sl.  Design         Clock Period   Nodes    Transistors   Area (sq. mm.)   Cycles   Simulation Time (min)
1.   Compress       200ns          2,946    6,315         10.9             1450     5.34
2.   Decompress     200ns          2,803    6,059         10.3             825      3.23
3.   FIFO           900ns          4,438    10,688        24.6             2,364    20.44
4.   Find           550ns          5,602    11,458        20.3             5,360    35.00
5.   Shuffle Xchg   160ns          49,655   95,004        418.7            1,975    240.00
6.   TLC            200ns          1,938    4,769         6.9              420      1.28



TABLE V Comparison of the power in the Data path and Interconnect, Total ($P_{dp} + P_{inter}$)

Sl.No   Design       Estimated (pF)   Actual (pF)   % Devn.
1.      Compress     24219            21171         12.58
2.      Decompress   15034            14473         3.73
3.      FIFO         56863            51614         9.23
4.      Find         266602           281415        5.55
5.      Shuffle      525207           545976        3.95
6.      TLC          4415             4526          2.51
        Average Error                               6.25


TABLE VI Comparison of the power in the Controller, Total ($P_{con}$)

Sl.No   Design       Estimated (pF)   Actual (pF)   % Devn.
1.      Compress     45974            39209         14.71
2.      Decompress   26309            22303         15.22
3.      FIFO         338266           304340        10.03
4.      Find         351066           330718        5.79
5.      Shuffle      297400           256303        13.81
6.      TLC          13906            13414         3.53
        Average Error                               10.51


TABLE VII Comparison of Clock power, Total ($P_{clock}$)

Sl.No   Design       Estimated (pF)   Actual (pF)   % Devn.
1.      Compress     10793            13036         20.78
2.      Decompress   6249             8167          30.69
3.      FIFO         27580            33958         23.12
4.      Find         59787            47241         20.98
5.      Shuffle      554768           444923        19.80
6.      TLC          2786             2268          18.59
        Average Error                               22.32


TABLE VIII Comparison of the total power for the Entire Design, Total ($P_{design}$)

Sl.No   Design       Estimated (pF)   Actual (pF)   % Devn.
1.      Compress     80986            73416         9.35
2.      Decompress   47592            44943         5.57
3.      FIFO         422709           389912        7.75
4.      Find         677455           659374        2.66
5.      Shuffle      1377375          1247202       9.45
6.      TLC          21107            20208         4.26
        Average Error                               6.51


controller ($P_{con}$). The percentage error is in the range
2.66%-9.45% and the average deviation is 6.51%. This shows reasonable
correlation between the estimated and actual values not only in the
entire design but also in the datapath and controller separately.
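
The paper does not spell out how the deviation percentages are computed. The tabulated values appear consistent, to within rounding, with the absolute difference taken relative to the estimated value; the Python sketch below checks that reading (an inference, not a statement from the text) against three rows of Table VIII.

```python
def percent_deviation(estimated, actual):
    """Absolute deviation relative to the estimated value, in percent (assumed formula)."""
    return abs(estimated - actual) / estimated * 100.0


# (estimated, actual) totals in pF, taken from Table VIII.
rows = {
    "Compress": (80986, 73416),
    "Decompress": (47592, 44943),
    "TLC": (21107, 20208),
}
deviations = {name: round(percent_deviation(est, act), 2)
              for name, (est, act) in rows.items()}
print(deviations)
```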
8. DISCUSSION AND CONCLUSIONS

The following are some of the factors which have not been taken into
account during the power estimation:

1. Effect of placement and routing.
2. The random characterization of the RTL module library and PLAs gives
rise to an inherent estimation error. This can be remedied by taking the
activity on the inputs into the estimation procedure.
3. Glitch power consumption.
4. PLA characterization based on only inputs, outputs and states is not
sufficient. The state table information has to be taken into account.
5. In the estimation of power in multiplexors, we assumed that the
activity on the inputs is added up to get the activity of the multiplexor.
This rests on the very conservative assumption that the inputs never
switch simultaneously, which is another source of error.

In this work, we presented an accurate power estimation technique based on
the profile data obtained at the behavior level. The estimation technique
is implemented in the framework of a high level synthesis system. Compared
to estimation techniques at the lower levels of abstraction, the technique
executes faster. For the six examples considered, the average estimation
error at the design level is within 10% (6.51% in Table VIII), which
demonstrates that the estimation technique is reliable.

Acknowledgements

This work is done at the University of Cincinnati and is supported in part
by the Solid State Electronics Directorate of the Wright Laboratory of the
US Air Force under contract number F33615-91-C-1811 and by the Advanced
Research Projects Agency under order no. 7056 monitored by the Federal
Bureau of Investigation under contract no. J-FBI-89-094.

References

[1] Camposano, R. and Wayne Wolf. (1991). "High Level VLSI Synthesis", Kluwer Academic Publishers.
[2] Lemnios, Z. J. and Gabriel, K. J. (1994). "Low-Power Electronics", IEEE Design and Test of Computers, pp. 8-13, Winter.
[3] Najm, F., "Towards a high-level power estimation capability", In Proceedings of the 1995 International Symposium on Low Power Design, April 1995.
[4] Powell, Scott R. and Chau, Paul M. (1990). "Estimating Power Dissipation of VLSI Signal Processing Chips: The PFA Technique," VLSI Signal Processing IV, pp. 250-259.
[5] Powell, Scott R. and Chau, Paul M., "A Model for Estimating Power Dissipation in a Class of DSP VLSI Chips", IEEE Transactions on Circuits and Systems, 38(6), June 1991.
[6] Chandrakasan, Anantha P. et al., "HYPER-LP: A System for Power Minimization Using Architectural Transformations", Proceedings of ICCAD, pp. 300-303, November 1992.
[7] Chandrakasan, Anantha P. et al., "Optimizing Power Using Transformations", IEEE Transactions on Computer Aided Design, pp. 12-31, January 1995.
[8] Landman, P., "Low-Power Architectural Design Methodologies", Ph.D. Thesis, Memorandum No. UCB/ERL M94/62, 30th August 1994.
[9] Landman, P. and Rabaey, J., "Black-Box Capacitance Models for Architectural Power Analysis", Proceedings of the 1994 International Workshop on Low Power Design, Napa Valley, CA, pp. 165-170, April 1994.
[10] Rabaey, J. M., Chu, C., Hoang, P. and Potkonjak, M., "Fast Prototyping of Datapath-Intensive Architectures," IEEE Design and Test of Computers, pp. 40-51, June 1991.
[11] Nand Kumar, Srinivas Katkoori, Leo Rader and Ranga Vemuri (1995). "Profile-Driven Behavioral Synthesis for Low Power VLSI Systems", IEEE Design and Test of Computers, pp. 70-84, Fall.
[12] Renu Mehra and Rabaey, Jan, M., "Behavioral Level Power Estimation and Exploration", Proceedings of the International Workshop on Low Power Design, pp. 165-170, April 1994.
[13] Paul Landman and Jan Rabaey, "Power Estimation for High Level Synthesis", Proceedings of EDAC-EUROASIC, pp. 361-366, February 1993.
[14] Anand Raghunathan and Jha, Niraj K. (1994), "Behavioral Synthesis for Low Power", Proceedings of ICCD.
[15] Anand Raghunathan and Jha, Niraj K. (1995). "An ILP formulation for low power based on minimizing switched capacitance during data path allocation", in the Proceedings of IEEE Symposium on Circuits and Systems.

[16] Chandrakasan, Anantha P., Sheng, S. and Brodersen, R., "Low Power CMOS digital design", IEEE Transactions of Solid State Circuits, April 1992.
[17] Najm, F. N., "A survey of power Estimation Techniques in VLSI circuits (Invited Paper)", IEEE Transactions on VLSI Systems, 2(4), 446-455, January 1995.
[18] Srinivas Katkoori, Nand Kumar and Ranga Vemuri, "High Level Profiling Based low Power Synthesis Technique", In the Proceedings of ICCD, October 1995.
[19] Weste, N. and Eshraghian, K. (1985). "Principles of CMOS VLSI Design: A Systems Perspective", Addison-Wesley.
[20] Veendrick, H. J. M., "Short-circuit dissipation of static
CMOS circuitry and its impact on the design of buffer
circuits," IEEE Journal on Solid State Circuits, SC-19, pp.
468-473, August 1984.
[21] Ravi Kalyanaraman, "Behavioral Test Generation for
VHDL Programs", MS Thesis, Department of Electrical
and Computer Engineering, University of Cincinnati,
September 1993.
[22] Darrel Ince (1991). "Software Testing", in John McDermid (ed.) Software Engineer’s Reference Book, Butterworth-Heinemann Ltd.
[23] John Hennessy and David Patterson (1990). "Computer
Architecture: A Quantitative Approach", Morgan Kaufmann Publishers.

[24] Stallings, W. (ed.) (1990). "Reduced Instruction Set
Computers (RISC)", IEEE Press.
[25] Cmelik, R. F., Kong, S. I., Ditzel, D. R. and Kelly, E. J.,
"An Analysis of MIPS and SPARC instruction set
utilization on the SPEC benchmarks", In ASPLOS-IV
Proceedings, SIGARCH Computer Architecture News 19,
pp. 290-302, 2 April 1991.

[26] Graham, S. L., Kessler, P. B. and McKusick, M. K., "An
execution profiler for modular programs", Software
Practice Exper. 13, pp. 671-685.
[27] Morris, W. G., "CCG: A prototype coagulating code generator", Proceedings of the SIGPLAN 91 Conference on Programming Language Design and Implementation, SIGPLAN Not. (ACM) pp. 45-58, 26, June 1991.
[28] Pettis, K. and Hanson, R. C., "Profile guided code positioning", Proceedings of the SIGPLAN 91 Conference on Programming Language Design and Implementation, SIGPLAN Not. (ACM) pp. 16-27, June 1990.
[29] Sarkar, V., "Determining average program execution times and their variance", In Proceedings of the ACM SIGPLAN 89 Conference on Programming Language Design and Implementation, SIGPLAN Not. (ACM), 289-312, 24 June 1989.
[30] Ball, T. and Larus, J. R., "Optimally Profiling and Tracing Programs", ACM Transactions on Programming Languages and Systems, 16(4), 1319-1350, July 1994.
[31] Goldberg, A., "Reducing overhead in counter-based execution profiling", Tech Rep. CSL-TR-91-495, Computer Systems Lab., Stanford Univ., Stanford, Calif., Oct. 1991.
[32] Samples, A. D., "Profile Driven Compilation", Ph.D
thesis (Rep. UCB/CSD 91/627), Computer Science Dept.,
Univ. of California, Berkeley, Apr. 1991.
[33] Jayanta Roy and Ranga Vemuri (1992). "DSS: A Distributed Synthesis System", IEEE Design and Test of Computers.

[34] John Ousterhout (1987). "Using Crystal for Timing
Analysis", Electrical Engineering and Computer Sciences,
University of California at Berkeley.
[35] "Novel IC Shuffles Parallel Processing Data", Electronic
Products, pp. 42--50, August 1, 1986.
[36] Rajeev Jain et al., "An Integrated CAD System for
Algorithm-Specific IC Design", IEEE Transactions on
Computer Aided design, 10(4), April 1991.
[37] Landman, P., "IRSIM-CAP: Modified version of IRSIM for better Capacitance Measurements", Univ. of Calif. Berkeley.
[38] Salz, A. and Horowitz, M., "IRSIM: an incremental
MOS switch-level simulator," in Proc Design Automation
Conf., pp. 173--178, June 1989.

Authors’ Biographies
Srinivas Katkoori is an assistant professor in computer science and
engineering at the University of South Florida, Tampa. In 1997, he
received his doctoral degree in computer engineering from the University
of Cincinnati. In 1992, he received his bachelor's degree in electronics
and communication engineering from Osmania University, India. His research
interests are in high-level synthesis and low-power synthesis of VLSI
systems. Katkoori is a member of IEEE and ACM SIGDA.
Ranga Vemuri, an associate professor of electrical and computer engineering at the University
of Cincinnati, also directs its Laboratory for
Digital Design Environments. His interests include
the computer-aided design of digital systems,
formal verification, system synthesis, performance
modeling, hardware description languages, and
parallel algorithms. Vemuri received the M.Tech.
degree from the Indian Institute of Technology,
Kharagpur, and the Ph.D. from Case Western
Reserve University, both in computer engineering.
He received the Siddhartha gold medal, a distinguished research award, and an outstanding
teacher award. He is a member of the IEEE
Computer Society, IEEE Circuits and Systems
Society, ACM SIGDA, American Society of
Electronic Engineers, and Eta Kappa Nu.


