Techniques for Energy-Efficient Communication Pipeline Design by Qu, Gang & Potkonjak, Miodrag
542 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 10, NO. 5, OCTOBER 2002
Techniques for Energy-Efficient
Communication Pipeline Design
Gang Qu and Miodrag Potkonjak
Abstract—The performance of many modern computer and
communication systems is dictated by the latency of communi-
cation pipelines. At the same time, power/energy consumption
is often another limiting factor in many portable systems. We
address the problem of how to minimize the power consumption
in system-level pipelines under latency constraints. In particular,
we apply a fragmentation technique to achieve parallelism and
exploit advantages provided by variable voltage design method-
ology to optimally select voltage and, therefore, speed of each
pipeline stage. We focus our study on the practical case when each
pipeline stage operates at a fixed speed. Unlike the conventional
pipeline system, where all stages run at the same speed, our
system may have different stages running at different speeds to
conserve energy while providing guaranteed latency. For a given
latency requirement, we find explicit solutions for the most energy
efficient fragmentation and voltage setting. We further study a
less practical case when each stage can dynamically change its
speed to get further energy saving. We define the problem and
transform it to a nonlinear system whose solution provides a lower
bound for energy consumption. We apply the obtained theoretical
results to develop algorithms for power/energy minimization of
computer and communication systems. The experimental results
suggest that significant power/energy reduction is possible
without additional latency. In fact, we achieve almost 40% total
energy saving over the combined minimal supply voltage selection
and system shut-down technique and 85% if none of these two
energy minimization methods is used.
Index Terms—Energy minimization, latency, low-power design,
pipeline.
I. INTRODUCTION
SYSTEM level pipelines are widely acknowledged as the most likely bottleneck of many computer systems. For
example, a read miss in the system data or instruction cache
will block the application program until the entire block
with requested data arrives [1], [23]. The tradeoff is clear:
longer blocks imply fewer misses, but also longer interrupt
latency. Similarly, in high-speed local and wide-area networks,
selecting proper block size to exploit intrinsic concurrency in
communication pipelines is a key issue [7], [27]. As a final
example where communication pipelines dictate performance,
we mention path-oriented operating systems [16]. Therefore, it
is not surprising that recently the question of how to improve
the performance of a system pipeline received a great deal of
attention in computer architecture, operating systems and
compilers communities. The essence of the problem is abstracted in
a recent work [24] where the discussion is on how to minimize
the transmission latency by careful packet fragmentation.
Manuscript received February 1, 2001; revised January 7, 2002. This work
was supported in part by the National Science Foundation under Grant 9734166.
G. Qu is with the Electrical and Computer Engineering Department and Institute
of Advanced Computer Study, University of Maryland, College Park, MD
20742 USA (e-mail: gangqu@eng.umd.edu).
M. Potkonjak is with the Computer Science Department, University of California,
Los Angeles, CA 90095 USA (e-mail: miodrag@cs.ucla.edu).
Digital Object Identifier 10.1109/TVLSI.2002.800522
On the other hand, the increasing use of portable systems
(such as personal computing devices, wireless communications
and imaging systems) makes the power consumption one of
the primary circuit and system design goals. The most effec-
tive method to reduce power consumption is to lower the supply
voltage level, which exploits the quadratic dependence of power
on voltage [5]. However, reducing the supply voltage increases
circuit delay and decreases the clock speed. The resulting pro-
cessor core consumes lower average power at the cost of in-
creased latency. Therefore, it becomes less effective when tight
deadlines are present.
Recent progress in power supply technology along with
custom and commercial CMOS chips that are capable of
operating reliably over a range of supply voltages makes it
possible to build processor cores with supply voltages that
can be varied at run time according to the application latency
constraints [3], [17]. The variable voltage processor core is
capable of operating at different optimal points along the power
and speed curve in order to achieve high energy efficiency.
In particular, with multiple supply voltages on the chip, the
processor core can use high voltage for applications with tight
deadlines and keep the voltage low otherwise to reduce total
energy consumption [3], [22].
In this paper, we address the energy minimization problem
in system-level pipelines under latency constraints. We use the
recent advances in power supply technologies and the variable
voltage design methodology to choose a voltage profile for each
pipeline stage, which optimally minimizes the energy consump-
tion of the entire pipeline system.
The rest of the paper is organized as follows. Section II de-
scribes the related work in communication pipeline and low
power design techniques. In Section III, we discuss the pipeline
model, processor model and formulate the problem. We solve
the problem optimally in two cases: 1) each pipeline stage has a
fixed voltage, which may vary from stage to stage and 2) every
stage can have variable supply voltages (detailed proof, example
and discussion can be found in the technical report [19]). We
present the experimental results in Section VI, and Section VII
concludes.
II. RELATED WORK
The most relevant related work comprises efforts in communication
pipeline design and evaluation and low power design techniques.
In particular, within the former domain fragmentation
techniques for managing congestion control, packet buffering,
packet losses, and the optimization techniques for improvement
of distributed file systems and high-speed local networks
are directly relevant. Within the latter, we focus our survey
on system-level power minimization techniques and variable
voltage techniques.
In the introduction, we already surveyed a number of com-
munication pipeline systems and research efforts for latency
optimization of these systems. It is important to note that many
application specific systems operate at the highest-level of ab-
straction as processing pipelines on blocks of input (e.g., digital
TV and audio and segmentation subsystems of communication
devices). Apparently, fragmentation has been used in design
of the Internet for quite a long time. More recently, studies
on how to exploit flexible block fragmentation to improve
the performance of DEC workstations have also been conducted
[12]. A more detailed survey of fragmentation techniques is given
in [24].
Dynamically adapting voltage and therefore the clock fre-
quency, to operate at the point of lowest power consumption for
given temperature and process parameters was first proposed
by Macken et al. [13], [15]. Later, [11], [26] described the implementation
of several digital power supply controllers based on
this idea. Several researchers have recently developed efficient
dc-dc converters that allow the output voltage to be rapidly
changed under external control [17], [21]. We mention that a
dynamic voltage-scaled microprocessor system has been reported
recently [3] and leave further discussion of variable
voltage processors for the next section.
In the software world, there has also been recent research
on scheduling strategies for adjusting CPU speed so as to
reduce power consumption. For example, Weiser et al. [25]
proposed an approach where time is divided into 10–50 ms
intervals and the CPU clock speed (and voltage) is adjusted
by the task-level scheduler based on the processor utilization
over the preceding interval. Govil et al. [9] concluded that
smoothing helps more than prediction in voltage changing. Yao
et al. [29] described an off-line minimum-energy schedule and
an average rate heuristic for job scheduling for independent
processes with deadlines, though under the assumptions that
1) the processor can change its speed arbitrarily, i.e., the
changes are instantaneous with no physical bounds and 2)
the jobs are preemptive with no preemption penalty. Qu [18]
extended this by discussing both nonpreemptive jobs on
such an ideal variable speed processor and general jobs on real
variable speed processors, where both the maximal/minimal
voltage constraints and the limitation on speed changes are
considered. A survey of system-level low power techniques can
be found in [28], and energy efficient microprocessor design
has been discussed in [2] and [8].
III. BACKGROUND AND PROBLEM FORMULATION
In this section, we first describe the variable voltage processor
and the store-and-forward pipelining network, then characterize
the user packet and formulate the problem.
A. Variable Voltage Processor
The variable voltage is generated by the dc-dc switching regulators.
The time to reach steady state at a new voltage is normally
not negligible. However, recent work on dc-dc converters allows
the output voltage to be changed rapidly. For example, Burd
et al. [3] implemented a microprocessor system that consists of a
dc-dc switching regulator, an ARM V4 microprocessor, a bank
of SRAM ICs, and an interface IC. The supply voltage and clock
frequency can be dynamically varied from 1.2 V to 3.8 V within
70 µs with an energy efficiency of 0.54–5.6 mW/MIP.
To compensate for the complexity of real variable voltage
systems, we have seen plenty of efforts in the following two
directions. On one hand, there have been many proposals and
implementations of multiple supply voltage systems [6], [14],
[20], [22]. These research groups have addressed the use of
two or three discrete supply voltages. The idea is to switch
among these simultaneously available voltages according to the
processing load, computation requirement, latency constraint,
etc. On the other hand, the ideal variable voltage system has also
been studied theoretically [18], [29]. An ideal variable voltage
processor can change its speed from zero to $\infty$ instantaneously
without any overhead. Apparently, such an ideal processor is not
feasible, but the study of this model gives us an insightful view of
the problem and, more importantly, it provides the lower bound
of energy consumption by using variable voltage processors.
Although no reported study takes these overheads
into consideration, there is evidence showing that this bound
could be tight. First, Hong et al. [10] reported a task scheduling
heuristic which, when applied to multimedia benchmarks,
results in a total energy consumption only 1.5% higher on
average than the lower bound obtained from the ideal case.
Furthermore, Burd and Brodersen [4] discussed various design
issues for dynamic voltage scaling systems. In their prototype
design, it takes 26 µs and 6.5 µJ for a full-scale transition from
1.2 V to 3.8 V. They estimated a practical limit of voltage
change rate on the order of 5 V/µs, with the potential of going
as high as 20 V/µs, for a 0.6 µm process. This will further reduce
the transition time and energy.
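With different supply voltages, the processor will be able to
operate at different speeds, and the time and power consumed to
execute the same task (or the same amount of computation) will
also be different. We adopt the following relationships among
the voltage, delay, power, and energy [5]: Suppose that with a 5-V
constant supply voltage, the processor finishes a task in time
$t_{ref}$ and the power dissipation is $P_{ref}$. Then, with a supply voltage
$v_{dd}$, to finish the same task, the processing time $t$, the power
dissipation $P$, and the energy consumption $E$ are given by

$$t = t_{ref} \cdot \frac{v_{dd}}{5} \cdot \left(\frac{5 - v_t}{v_{dd} - v_t}\right)^2 \qquad (1)$$

$$P = P_{ref} \cdot \frac{v_{dd}}{5} \cdot \left(\frac{v_{dd} - v_t}{5 - v_t}\right)^2 \qquad (2)$$

$$E = P \cdot t = P_{ref} \, t_{ref} \cdot \left(\frac{v_{dd}}{5}\right)^2 \qquad (3)$$

where $v_t$ is the threshold voltage.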
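The scaling relationships among voltage, delay, power, and energy can be sketched in Python. This is a minimal sketch under stated assumptions: delay is taken to be proportional to $v_{dd}/(v_{dd} - v_t)^2$, dynamic power to $v_{dd}^2$ times the clock frequency, and everything is normalized to the 5-V reference. The function names and the default 0.8-V threshold (the value used later in the simulations) are illustrative choices, not part of the original formulation.

```python
V_REF = 5.0  # reference supply voltage in volts


def delay_scale(v_dd, v_t=0.8):
    """Processing time at v_dd relative to the time at V_REF.

    Assumes delay is proportional to v_dd / (v_dd - v_t)**2.
    """
    if v_dd <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return (v_dd / V_REF) * ((V_REF - v_t) / (v_dd - v_t)) ** 2


def power_scale(v_dd, v_t=0.8):
    """Power at v_dd relative to the power at V_REF.

    Assumes power is proportional to v_dd**2 times the clock frequency,
    with the frequency inversely proportional to the delay.
    """
    return (v_dd / V_REF) ** 2 / delay_scale(v_dd, v_t)


def energy_scale(v_dd, v_t=0.8):
    """Energy for a fixed task at v_dd relative to the energy at V_REF."""
    return power_scale(v_dd, v_t) * delay_scale(v_dd, v_t)
```

Under these assumptions, halving the supply voltage to 2.5 V makes the task roughly three times slower, while the energy for the same task drops to one quarter of the reference value.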
B. Pipeline Model
As proposed in [24], we represent the network as a sequence
of store-and-forward pipeline stages characterized by the following
parameters:
• $n$ is the number of pipeline stages;
• $g_i$ is the fixed per-fragment overhead for stage $i$;
• $G_i$ is the per-byte transmission time for stage $i$ with the
5-V reference supply voltage.
The fixed per-fragment overhead, $g_i$, can be considered as
the context switch time and may vary from stage to stage. If
none of the stages has overhead, the best strategy, as we will
show soon, is to fragment the packet as small as possible to
utilize parallelism. $G_i$ is proportional to the inverse of the
bandwidth for stage $i$ with a 5-V supply voltage. In the extreme
case, if there is no bandwidth limitation for any stage, then to achieve
the minimum latency the entire packet should be sent as a single
fragment to avoid the per-fragment overhead.
At the sender’s end, the packet is fragmented and sent to
the first stage of the pipeline. A pipeline stage will start the
transmission of a fragment as soon as it receives both the entire
fragment from the previous stage and an acknowledgment from
the next stage, which is sent when the next stage is ready for
the reception of the current fragment. We refer to these as the
rules for transmission. The transmission is completed when
the receiver’s end receives the last fragment of the packet.
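The latency of such a store-and-forward pipeline can be estimated with a short simulation. The sketch below is a simplification, not the exact protocol: it assumes all fragments are available at the sender at time zero, each stage handles one fragment at a time, and a stage is ready for the next fragment as soon as it has finished the previous one (blocking caused by a busy downstream stage is not modeled). The function and parameter names are hypothetical.

```python
def pipeline_latency(fragment_sizes, overheads, per_byte_times):
    """Time at which the last fragment leaves the last stage.

    fragment_sizes: sizes of the fragments, in sending order
    overheads:      fixed per-fragment overhead of each stage
    per_byte_times: per-byte transmission time of each stage
    """
    n = len(overheads)
    stage_free_at = [0.0] * n  # when each stage finished its previous fragment
    for size in fragment_sizes:
        arrival = 0.0  # every fragment is ready at the sender at time zero
        for i in range(n):
            # Start only when the fragment has fully arrived and the stage is free.
            start = max(arrival, stage_free_at[i])
            finish = start + overheads[i] + size * per_byte_times[i]
            stage_free_at[i] = finish
            arrival = finish  # the next stage sees the fragment at this time
    return stage_free_at[-1]
```

For three identical stages with unit overhead and unit per-byte time, sending 4 bytes as one fragment takes 15 time units, while two fragments of 2 bytes take 12: fragmentation trades extra per-fragment overhead for overlap between stages.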
C. Problem Formulation
Our objective is to minimize the power consumption for
transmitting a packet through the network under the user-spec-
ified latency constraint. The two tools that we use to achieve
this are packet fragmentation and supply voltage selection.
The following variables are associated with the packet for the
convenience of analysis:
• $B$ is the size of the entire packet;
• $T$ is the deadline to transmit the entire packet;
• $k$ is the number of fragments;
• $x_j$ is the size of the $j$th fragment ($j = 1, \ldots, k$);
• $t_{i,j}$ is the (life)time that the $j$th fragment stays on the $i$th stage.
The packet's size $B$ and the deadline $T$ are given by the user;
the network is characterized by the number of pipeline stages
$n$, the overhead $g_i$, and the unit transmission time $G_i$ for
each pipeline stage. We further assume that the processors at all
stages are identical (i.e, with the same , and ).
A fragment's lifetime on a stage is the sum of the per-fragment
overhead and the actual transmission time on this stage.
Let $v_i(t)$ be the voltage at which the $i$th processor operates at
time $t$, then the processor's energy consumption is
$$E_i = \int_0^T P(v_i(t))\,dt \qquad (4)$$
where $P(v)$ is the power dissipation at supply voltage $v$.
We want to minimize $\sum_i E_i$ by finding the best
voltage and fragment schemes. Formally, we seek solutions
for the energy minimization with deadline on variable voltage
(EMDVVP) problem.
Instance: A pipeline with parameters $n$ (number of pipeline
stages), $g_i$ (per-fragment overhead on stage $i$) and $G_i$
(per-byte transmission time on stage $i$ at reference voltage), a
packet with size $B$ and transmission deadline $T$.
Question: Find the voltage scheme $v_i(t)$ for each processor
and a fragmentation $(x_1, \ldots, x_k)$ of the packet, such that the entire
packet is transmitted within $T$ and the total energy consumption
$\sum_i E_i$ is minimized.
We explain our approach and give the main results with sketch
of proof in the following sections, while interested readers can
find detailed proof, example, and discussion in the technical
report [19].
IV. FIXED VOLTAGE WITHIN THE SAME STAGE
We first consider a simple case when the processor on each
stage operates at a fixed voltage, but the voltages can be different
from stage to stage. It is important to study this case because
of the extreme simplicity of implementation. Since each pro-
cessor will operate at a constant supply voltage, no additional
hardware is required. Once the voltage level for each processor
is determined, the pipeline can be easily set up by applying
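the required voltages to the corresponding processors. The voltage
scheme problem is reduced to finding the constant voltage $v_i$ for
the processor at stage $i$. The energy consumption on this stage,
from (4), is simplified to $E_i = P(v_i) \cdot T$. Moreover, the lifetime
that the $j$th fragment (with size $x_j$) stays on the $i$th stage can be
expressed as
$$t_{i,j} = g_i + x_j \, G_i(v_i) \qquad (5)$$
where $G_i(v_i) = G_i \cdot \frac{v_i}{5} \left(\frac{5 - v_t}{v_i - v_t}\right)^2$ is the per-byte
transmission time of stage $i$ at voltage $v_i$, from (1).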
Lemma 4.1: A necessary condition for the energy consumption
to be minimized is to finish the transmission exactly at the
deadline $T$.
Sketch of the proof: Suppose that we have a packet fragmentation
and a voltage scheme $(v_1, \ldots, v_n)$, where $v_i$ is the
constant voltage at which the processor on the $i$th pipeline stage
operates, and the last fragment leaves the last stage before the deadline
$T$. We show that this cannot be optimal by constructing another
voltage scheme on the same packet fragmentation that consumes
less energy.
We consider the voltage scheme $(v_1, \ldots, v_{n-1}, v_n')$, where
$v_n' < v_n$ is the reduced voltage on the last stage such that
the transmission will complete on the deadline. This clearly
consumes less energy. However, we still need to verify that the new
voltage scheme does not violate the rules for transmission. 1) With
low voltage and, hence, slow transmission speed, each fragment
will spend more time on the last stage. Therefore, the starting
transmission time of each fragment will be no earlier than its
original starting transmission time at $v_n$. This implies that we
will not start transmitting a fragment that has not yet arrived.
2) The transmission cannot start until the next stage is ready for
reception of a new fragment. The slow-down of the last stage
will delay the transmission of previous stages. But this delay
will not be longer than the delay on the last stage caused by
the lower voltage and therefore the deadline $T$ will not be missed.
Finally, the new voltage $v_n'$ on the last stage can be easily
determined. Let $\tau$ be the time that the first fragment arrives at the
last stage, then the total transmission time on this stage will be
$T - \tau - k g_n$ (if there is no starving), where $k$ is the number of
fragments and $g_n$ is the per-fragment overhead. For a packet
of size $B$, we select $v_n'$ such that the per-byte transmission
time becomes exactly $(T - \tau - k g_n)/B$.
Intuitively, Lemma 4.1 says that the pipeline will use as much
time as possible for transmission such that the processors can be
scheduled at low voltages and thus minimize energy consump-
tion. On the other hand, on each single stage, the best strategy
is to transmit a fragment immediately upon its reception and the
accomplishment of sending the previous fragment. This implies
that the voltage should be adjusted such that all stages are syn-
chronized and leads to the following lemma.
Lemma 4.2: If the packet can only be fragmented into fixed
size, then a voltage scheme $(v_1, \ldots, v_n)$ minimizes the
energy consumption if and only if
$$g_i + \frac{B}{k}\, G_i(v_i) = \text{constant}, \quad i = 1, \ldots, n \qquad (6)$$
where $G_i(v_i)$ is the per-byte transmission time of stage $i$ at voltage $v_i$.
Sketch of the proof: When we restrict fragmentation to be
equal-sized, $x_j = B/k$ for all $j$ and $t_{i,j} = t_i$ when $i$
is fixed, i.e., equal lifetime for all fragments on the same stage.
We will show that these constants are the same for all stages by
contradiction.
Suppose the $t_i$'s are not the same, then there exists $i$,
such that either $t_i < t_{i-1}$, or $t_i < t_{i+1}$, or both. We can
reduce the supply voltage on the $i$th stage and construct a better
solution with less energy consumption. In fact, such a solution can
be found in four steps.
1) Find the smallest $i$ such that $t_i < t_{i-1}$ or
$t_i < t_{i+1}$. (Assuming $t_i < t_{i+1}$ for simplicity).
2) Reduce $v_i$ to $v_i'$ such that $t_i = t_{i+1}$.
3) Make appropriate changes on the stages after the $i$th stage
because of the delay of fragments by stage $i$.
4) Modify the voltage schemes to fit the deadline.
The new solution will consume less energy. Therefore, any
strategy with different $t_i$'s cannot be optimal.
From (6), the processor at the stage that has the largest
per-fragment overhead must operate at a high voltage to achieve a
short per-byte transmission time. Therefore, this stage
will consume more energy than other stages and we call such
a stage the dominant stage because it dominates the total energy
consumption.
Theorem 4.3: Let stage $d$ be the dominant stage, then there
is a unique solution for the EMDVVP problem. The number of
fragments is given by
$$k = \sqrt{\frac{(n-1)\,T}{g_d}} - (n - 1) \qquad (7)$$
and each stage will operate at a fixed supply voltage that can
be determined by (6) with the constant on the right-hand side
equal to $T/(n + k - 1)$.
Sketch of the proof: Let $k$ be the number of fragments,
$x = B/k$ be the size of each fragment for a packet of size $B$,
and $(v_1, \ldots, v_n)$ be an optimal voltage scheme. The time
to transmit the entire packet is
$$\sum_{i=1}^{n} t_i + (k - 1)\, t_n$$
The first term is the time for the first fragment to travel through
the entire pipeline. From Lemma 4.2, it equals $n\, t_i$
for any $i$. The second term is the time to send the rest of the
packet from the last stage. Lemma 4.1 requires their sum to be
$T$ for the energy to be minimized, therefore we have
$$(n + k - 1)\left(g_i + \frac{B}{k}\, G_i(v_i)\right) = T \qquad (8)$$
with $G_i(v_i)$ the per-byte transmission time of stage $i$ at voltage $v_i$.
$G_i(v_i)$ can be easily solved in terms of $k$ from (8) and, considering
(1), we get
$$\frac{v_i}{(v_i - v_t)^2} = \frac{5\,k}{(5 - v_t)^2\, B\, G_i}\left(\frac{T}{n + k - 1} - g_i\right) \qquad (9)$$
$$v_i = v_t + \frac{1 + \sqrt{1 + 4 A_i v_t}}{2 A_i} \qquad (10)$$
where $A_i$ denotes the right-hand side of (9).
Next, we plug the values of the $v_i$'s into (3) and get the total
energy consumption, which is expressed in terms of $k$. The
energy is dominated by stage $d$ and we know that low voltage
results in low energy. To find the optimal scheme, we take the
first derivative of $v_d$ with respect to $k$, set it to zero and get the
unique solution (7).
It follows from (8) immediately that the constant on the
right-hand side of (6) is $T/(n + k - 1)$.
The voltage level on each stage can be easily determined from
(6).
Remarks: How do the network’s parameters and the latency
affect the optimal scheme?
• $T$: When the latency constraint is loose (i.e., $T$ is large),
(7) predicts more fragments. Energy consumption is reduced
because each processor gets a long transmission
time and thus can use low voltage.
• $n$: From (7), we see $k$ is an increasing function with respect
to $n$, the number of pipeline stages. This means that
the more stages in the network, the more fragments we
should have. This takes advantage of the parallelism.
• $g$: If the per-fragment overhead at the energy dominating
stage is high, fewer fragments should be used to avoid a large
total overhead. If there is no overhead, then we should
fragment the packet as small as possible so that more parts
of the packet can be transmitted in parallel.
• $B$: The number of fragments in the optimal scheme is
independent of the packet size. However, $B$ does play a very
important role in the voltage scheme (10). This is not surprising,
since we use the ideal variable voltage processor,
which can adjust its speed (by changing supply voltage)
according to the size of the packet.
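The optimal scheme discussed above can be turned into a small calculation. The sketch below rests on assumptions that should be checked against the full derivation: every stage shares the per-fragment lifetime $T/(n+k-1)$, the fragment count takes the continuous form $k = \sqrt{(n-1)T/g} - (n-1)$ with $g$ the overhead of the dominant stage, and delay scales as $v/(v - v_t)^2$ relative to a 5-V reference. All function and parameter names are hypothetical, and a real design would round $k$ to an integer.

```python
import math


def optimal_fragment_count(n, g_dom, deadline):
    """Continuous optimum k = sqrt((n - 1) * T / g) - (n - 1), for n >= 2."""
    return math.sqrt((n - 1) * deadline / g_dom) - (n - 1)


def stage_lifetime(n, k, deadline):
    """Common per-fragment lifetime on every stage: T / (n + k - 1)."""
    return deadline / (n + k - 1)


def voltage_for_delay_scale(scale, v_t=0.8, v_ref=5.0):
    """Solve (v / v_ref) * ((v_ref - v_t) / (v - v_t))**2 == scale for v > v_t."""
    a = scale * v_ref / (v_ref - v_t) ** 2  # so that v / (v - v_t)**2 == a
    # Larger root of a*v**2 - (2*a*v_t + 1)*v + a*v_t**2 = 0.
    return v_t + (1.0 + math.sqrt(1.0 + 4.0 * a * v_t)) / (2.0 * a)


def stage_voltage(g_i, G_i, packet_size, n, k, deadline, v_t=0.8, v_ref=5.0):
    """Supply voltage that lets stage i meet the common fragment lifetime."""
    per_byte_time = (stage_lifetime(n, k, deadline) - g_i) * k / packet_size
    if per_byte_time <= 0:
        raise ValueError("deadline too tight for this stage's overhead")
    return voltage_for_delay_scale(per_byte_time / G_i, v_t, v_ref)
```

For example, with five stages, a dominant overhead of 1, and a deadline of 100 (arbitrary time units), the sketch yields 16 fragments and a common lifetime of 5; a stage with a smaller overhead than the dominant one then receives a lower voltage.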
To end this section, we show the following corollary.
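Corollary 4.4: Theorem 4.3 holds for deep submicrometer technology.
Sketch of the proof: For deep submicrometer technology,
the voltage-delay relation is given by
$$t \propto \frac{v_{dd}}{(v_{dd} - v_t)^{\alpha}}$$
($1 < \alpha \le 2$; the current technology has $\alpha$ as 1.5 or 1.6). Recall
that (8) is independent of the voltage-delay model. When
$\alpha \ne 2$, (9) will be replaced by
$$\frac{v_i}{(v_i - v_t)^{\alpha}} = \frac{5\,k}{(5 - v_t)^{\alpha}\, B\, G_i}\left(\frac{T}{n + k - 1} - g_i\right)$$
We rewrite this as $f(v_i) = h(k)$. Differentiating both
sides with respect to $k$, we get
$$f'(v_i)\, \frac{dv_i}{dk} = h'(k)$$
and
$$f'(v_i) = \frac{(1 - \alpha)\, v_i - v_t}{(v_i - v_t)^{\alpha + 1}} \ne 0$$
Therefore, $dv_i/dk = 0$ if and
only if $h'(k) = 0$ and the latter gives (7).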
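The claim that the optimal fragment count does not depend on the exponent of the voltage-delay model can be checked numerically. The sketch below assumes a delay proportional to $v/(v - v_t)^{\alpha}$ normalized to a 5-V reference and a per-byte time budget of $(T/(n+k-1) - g)\,k/B$ for the dominant stage; it searches integer fragment counts exhaustively and solves for the voltage by bisection. The names, the search bound `k_max`, and the unbounded voltage range are illustrative assumptions.

```python
def required_voltage(scale, alpha, v_t=0.8, v_ref=5.0):
    """Find v > v_t with (v/v_ref) * ((v_ref-v_t)/(v-v_t))**alpha == scale."""
    def delay_scale(v):
        return (v / v_ref) * ((v_ref - v_t) / (v - v_t)) ** alpha

    lo, hi = v_t + 1e-9, v_ref
    while delay_scale(hi) > scale:  # delay_scale decreases as v grows
        hi *= 2.0
    for _ in range(200):  # plain bisection on the monotone function
        mid = 0.5 * (lo + hi)
        if delay_scale(mid) > scale:
            lo = mid
        else:
            hi = mid
    return hi


def best_fragment_count(n, g, G, packet_size, deadline, alpha, k_max=200):
    """Integer k that minimizes the dominant stage's voltage."""
    best_k, best_v = None, float("inf")
    for k in range(1, k_max + 1):
        per_byte_time = (deadline / (n + k - 1) - g) * k / packet_size
        if per_byte_time <= 0:
            continue  # the overhead alone already exceeds the time budget
        v = required_voltage(per_byte_time / G, alpha)
        if v < best_v:
            best_k, best_v = k, v
    return best_k
```

Because the required voltage decreases monotonically as the per-byte time budget grows, the minimizing fragment count coincides with the count that maximizes the budget, whichever exponent is used.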
V. VARIABLE VOLTAGES WITHIN THE SAME STAGE
We first explain how to transform the energy minimization
problem to a nonlinear system and then discuss implementation
challenges for variable voltage on the same stage.
A solution to the EMDVVP problem requires a supply
voltage profile for each processor and a packet fragmentation.
Suppose that there are $n$ pipeline stages and the packet
is cut into $k$ fragments, an optimal solution to the general
EMDVVP problem consists of $n$ voltage time functions $v_i(t)$
for $i = 1, \ldots, n$ and the size $x_j$ of fragment $j$ for
$j = 1, \ldots, k$.
Power/energy is a convex function on supply voltage, so any
best voltage scheme will not change voltage during the trans-
mission of a fragment on a single stage. That is, we have the
following lemma.
Lemma 5.1: In an optimal solution, the supply voltage
changes either on the arrival of a new fragment or on the
completion of transmission of the current fragment.
This outlines the shape of the voltage functions $v_i(t)$, which
are step functions with all possible discontinuous points at the
time when a new fragment arrives or the current fragment leaves.
Therefore, we only need to determine ($nk$) constants
$v_{i,j}$, the voltage for processor $i$
to transmit fragment $j$.
Lemma 4.2 synchronizes all processors on a fixed length
fragmentation such that no stage will congest or starve. We
can generalize this for variable fragment size:
Lemma 5.2: The optimal voltage scheme, for a given fragmentation,
provides the lifetime $t_{i,j}$ of the $j$th fragment on the
$i$th stage such that for all $1 \le i < n$ and $1 \le j < k$,
the following holds:
$$t_{i,j+1} = t_{i+1,j} \qquad (11)$$
This gives a recursive relationship among adjacent fragment’s
transmission time on adjacent stages. From the
such recursive formulas in (11), we can easily solve
’s.
Finally, there are two global constraints: the transmis-
sion deadline and the packet size . Therefore, there are
’s and ’s,
a total of ( ) constants, need to be determined.
We express the total energy consumption in terms of these
variables from (4) and the EMDVVP problem
becomes equivalent to finding the minimum of this function.
Applying the first-order condition, we will have a nonlinear
system with ( ) variables where the nonlinearity
comes from the nature of the power model.
TABLE I
MYRINET GAM PIPELINE PARAMETERS
Theorem 5.3:Given the number of fragments , the
EMDVVP problem with pipeline stages is reduced to solving
a nonlinear system with free variables.
Unlike the easy-to-implement pipeline systems with fixed
voltage on the same stage, a system with variable voltage on
the same stage introduces many implementation challenges:
What is the most energy efficient way to change voltage?
With a dynamically changed supply voltage (and, therefore,
clock frequency), what is the system's performance? The extra
hardware (e.g., the dc-dc switching regulator) that enables
the variable voltage also consumes power; how should the
solution change if we take this into consideration? Based on
a simplified model, Qu [18] describes how to dynamically
vary voltage to minimize the energy for a given task. The more
practical multiple supply voltage systems have been reported.
Examples include a dual supply voltage media processor, a graphic
controller LSI, and an MPEG4 codec core. Many implementation
issues (placement, routing, synchronization, etc.) and empirical
power reduction of the system have also been addressed [19],
[22].
VI. SIMULATION RESULTS
In this section, we report the results of applying our new
energy minimization approach to several pipeline models,
in particular the Myrinet GAM pipeline that researchers at
Berkeley adopted to study the transmission latency minimization
by variable sized fragmentation [24].
The Myrinet GAM pipeline consists of four stages: stage 0 copies
data on the sender host; stage 1 is the sender host DMA; the next
stage is an abstract pipeline stage of the network DMAs at both
end hosts and a receiver host DMA; stage 3 is the copy on the
receiver host. The parameters of this pipeline are given in Table I
[24]. The second column is the per-fragment overhead, the third
column is the per-kilobyte transmission time at the reference
supply voltage, the last column is the (normalized) reference
power for each stage at the reference supply voltage. We further
suppose there is a packet of fixed size being transmitted via
this network with various user-specified latency constraints and
let the threshold and reference supply voltages be 0.8 and 5 V,
respectively.
We first determine the energy dominant stage. As we
discussed in Section IV, energy consumption on each stage
is determined by the supply voltage, which grows with
$G_i / (c - g_i)$, where $c = T/(n + k - 1)$ is a stage-independent constant.
(This is clear from (10) and the expression of $v_i/(v_i - v_t)^2$ in the
proof of Theorem 4.3.) Therefore, the larger the per-byte
transmission time is, the more energy is consumed. The same holds
for the per-fragment overhead. In the Myrinet GAM pipeline, it
is clear that stage 2 is the dominant stage because it has both
the largest per-fragment overhead and the longest per-byte
transmission time.
TABLE II
OPTIMAL FIXED SIZE FRAGMENTATION, VOLTAGE SCHEME AND THE NORMALIZED POWER CONSUMPTION
FOR MYRINET GAM PIPELINE WITH CONSTANT VOLTAGE ON EACH STAGE
TABLE III
ENERGY REDUCTION ON MYRINET GAM PIPELINE OVER TRADITIONAL ENERGY MINIMIZATION TECHNIQUES
After identifying the energy dominant stage, we can apply
Theorem 4.3 to decide the optimal packet fragmentation directly
from (7) which is reported in the second column of Table II.
Then we can compute the constant on the right-hand side of (6)
and calculate the supply voltage level for each stage from (1) and
(6). Finally, the power consumption can be obtained from (2).
We normalize it to the power consumption at the 5-V reference
supply voltage and details are shown in Table II.
To demonstrate the energy efficiency of the new approach,
we compare the above result with the traditional energy mini-
mization techniques, namely minimal supply voltage selection
and system shut-down. We report our power/energy saving over
these techniques in Table III.
The minimal supply voltage selection method computes
the minimal voltage that can meet the transmission deadline
and applies it to the (fixed-voltage) processors on all stages.
In this case, such optimal voltage is the one that we use for
stage 2. Columns 2–5 in the top half of Table III give the
energy saving of the new approach over the best (voltage-)
configured fixed-voltage system on each individual stage. An
average of 92.3%, 22.6%, and 90.7% power/energy reduction
on the three pipeline stages, excluding the dominant stage 2,
respectively, is achieved. At both end hosts (stages 0 and 3),
significant amounts of power/energy are saved because of the
high transmission speed at these two stages (see Table I). Stage
1 has the same per-byte transmission time as stage 2; however,
it has a smaller per-fragment overhead, so we can lower the
supply voltage (as shown in Table II) and this little difference
in the per-fragment overhead results in a more than 22%
power/energy saving. There is no saving from stage 2 because
this approach uses the same voltage on stage 2 as our approach.
If we use systems with a fixed 5-V voltage, the energy
dominant stage 2 becomes the bottleneck as it has the largest
per-fragment overhead and the slowest transmission speed.
For a tight 200 µs latency, it fails to meet the transmission
deadline. The use of variable voltage processor solves this
problem since we can speed up the bottleneck stage by applying
a higher voltage as indicated in Table II. Columns 2–5 in the
bottom half of Table III show the power/energy saving for
loose latency constraints. The average saving is almost 85%
and we save nearly 70% from the energy dominant stage.
The system shut-down technique shuts the system (or some
components of the system) down when the system is idle to
save energy. We compare our approach with an ideal system
shut-down technique that shuts the system down whenever
there is no processing load and turns the system back on when-
ever necessary and there is no overhead associated with system
shut-down and wake-up. Because energy consumption is the
product of power and execution time, it becomes necessary to
distinguish power and energy consumption when the system
shut-down technique is applied. Basically, reducing supply
voltage saves power and energy consumption, but not at the
same rate since low voltage results in long execution time to
complete the same amount of workload. In our simulation, we
assume that the system shuts down to save energy when idle,
either waiting for packet from the previous stage or waiting for
he acknowledge from the next stage. In this case, our approach
saves more than 85% energy on both end hosts and 56% and
548 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 10, NO. 5, OCTOBER 2002
TABLE IV
PIPELINE PARAMETERS AND ENERGY REDUCTION ONADPCM, MC UNIT AND RLS PIPELINES(THE ENERGY DOMINANT STAGES ARE INBOLD)
50%, respectively, on stages 1 and 2. This gives a total energy
saving of almost 70% over the fixed 5-V system combined
with the system shut-down technique. When both minimal
supply voltage selection and system shut-down are applied, our
approach achieves an average 39.3% energy saving. Detailed
energy saving on each stage is reported in the right half of
Table III.
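The distinction between power saving and energy saving can be illustrated with a small numerical sketch. It assumes the common first-order CMOS model (dynamic power proportional to V^2 f, clock rate proportional to (V - V_T)^2 / V), which may differ in detail from the model used earlier in this paper. The threshold voltage V_T = 0.8 V and the 3.3-V operating point are hypothetical values chosen only for illustration.

```python
# Illustrative first-order CMOS model (an assumption, not necessarily the
# exact model of this paper):
#   clock rate     f(V) ~ (V - V_T)^2 / V
#   dynamic power  P(V) ~ V^2 * f(V)
# For a fixed workload of W cycles, E = P * t with t = W / f(V).

V_T = 0.8  # hypothetical threshold voltage, in volts


def clock_rate(v):
    """Relative clock rate at supply voltage v (arbitrary units)."""
    return (v - V_T) ** 2 / v


def power(v):
    """Relative dynamic power while the stage is active."""
    return v ** 2 * clock_rate(v)


def energy(v, cycles=1.0):
    """Relative energy to finish `cycles` of work: power times execution time."""
    exec_time = cycles / clock_rate(v)
    return power(v) * exec_time


def saving(new, ref):
    """Fractional saving of `new` relative to `ref`."""
    return 1.0 - new / ref


v_ref, v_low = 5.0, 3.3  # fixed 5-V reference and a hypothetical lower voltage

power_saving = saving(power(v_low), power(v_ref))
energy_saving = saving(energy(v_low), energy(v_ref))
slowdown = clock_rate(v_ref) / clock_rate(v_low)

print(f"power saving  : {power_saving:.1%}")
print(f"energy saving : {energy_saving:.1%}")
print(f"slowdown      : {slowdown:.2f}x")
```

Under these assumptions the power saving (about 77%) exceeds the energy saving (about 56%), because the same workload runs roughly 1.9 times longer at the lower voltage. With ideal shut-down, only the active time is charged, so the energy figure is the relevant one.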
Comparing the four blocks in Table III, one can see that
both minimal supply voltage selection and system shut-down
techniques can reduce the system’s power/energy consumption. Their
combination, the best fixed-voltage system with shut-down in
the top right block, cuts by more than half the
energy consumed by the fixed 5-V system without shut-down.
Our approach saves a further 39.3% on top of this. Furthermore,
energy saving is mainly determined byand ; loose latency
results in less energy saving over the minimal supply voltage
selection method and more energy saving over the fixed 5-V
system.
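As a rough consistency check, savings applied one on top of another compose multiplicatively on the remaining energy. The sketch below uses the approximate averages quoted in this paper (almost 85% over the fixed 5-V system without shut-down, and 39.3% over the combined technique). The helper names are hypothetical, and the two averages may have been taken over slightly different cases, so the result is indicative only.

```python
def combined_saving(first, second):
    """Overall saving when `second` is applied to the energy left after `first`."""
    return 1.0 - (1.0 - first) * (1.0 - second)


def implied_first_saving(total, second):
    """Saving of the first technique implied by a total and an on-top saving."""
    return 1.0 - (1.0 - total) / (1.0 - second)


on_top = 0.393  # our approach over best fixed voltage with shut-down
total = 0.85    # our approach over fixed 5 V without shut-down (approximate)

baseline = implied_first_saving(total, on_top)
print(f"implied saving of the combined baseline technique: {baseline:.1%}")
```

The implied baseline saving is about 75%, in line with the observation that the best fixed-voltage system with shut-down removes more than half of the energy of the fixed 5-V system.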
We have constructed several other communication pipelines
using the Hyper tool and Table IV shows the pipeline parame-
ters and our simulation results. The three systems, ADPCM, MC
Unit, and RLS, have three, five and three pipeline stages, respec-
tively. The energy dominant stages are marked in bold. The
and columns are the same as before.column is the rel-
ative power consumption. We simulate the transmission under
different latency constraints and the optimal voltages for each
pipeline stage are reported. The last two columns show the en-
ergy saving on each stage over constant supply voltage without
and with the system shut-down technique. On most stages, we
see significant (close to or more than 90%) energy saving. The
last row of each pipeline system gives the total energy saving
over all the pipeline stages when the relative power consump-
tion is considered. For ADPCM, we are able to save 28%
even when the system shut-down technique is applied. However, for
the other two systems, the energy savings are less than 5%. The
reason is that in these systems the energy dominant stages consume
a large portion of the system’s total energy, e.g., almost 97% on
stage 2 in RLS. Therefore, our technique is especially effective
for pipeline systems where the nondominant stages also
contribute significantly to the total energy consumption.
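The effect of a dominant stage on the total saving can be sketched as a weighted sum: each stage's saving is weighted by its share of the total energy. The shares and per-stage savings below are hypothetical and are not taken from Table IV. In particular, the dominant stage is assumed to save nothing, which is a simplification.

```python
def total_saving(shares, savings):
    """Total saving as the energy-share-weighted sum of per-stage savings.

    shares  -- fraction of baseline energy consumed by each stage (sums to 1)
    savings -- fractional energy saving achieved on each stage
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("energy shares must sum to 1")
    return sum(w * s for w, s in zip(shares, savings))


# Hypothetical three-stage pipeline where stage 2 consumes 97% of the energy
# and cannot be improved, while the other stages save 90% each.
dominated = total_saving([0.015, 0.97, 0.015], [0.90, 0.0, 0.90])

# Same per-stage savings, but the nondominant stages hold 60% of the energy.
balanced = total_saving([0.30, 0.40, 0.30], [0.90, 0.0, 0.90])

print(f"dominated pipeline: {dominated:.1%}")
print(f"balanced pipeline : {balanced:.1%}")
```

Even with 90% savings on the nondominant stages, the dominated pipeline saves under 3% in total, whereas the balanced one saves more than half.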
VII. CONCLUSION
In this paper, we address the problem of how to minimize
the power consumption in system-level pipelines under latency
constraints. In particular, we exploit advantages provided by
variable voltage design methodology to optimally select speed
and therefore voltage of each pipeline stage. We define the
problem and solve it optimally under realistic and widely
accepted assumptions. We apply the obtained theoretical results
to develop algorithms for power minimization of computer and
communication systems. We discuss two specific cases in detail:
1) the packet has to be equally fragmented
and the supply voltage on a stage cannot be changed; and 2) both
the size of the fragment and the voltage are variables. We derive
an explicit formula for the first case and transform the second into
the problem of finding the minimum of a nonlinear function.
The simulation with real-life pipeline parameters shows that
even with the former approach, significant power reduction
is possible without additional latency.
ACKNOWLEDGMENT
The authors thank the editor-in-chief and anonymous re-
viewers for their valuable comments and suggestions.
REFERENCES
[1] T. E. Anderson, M. D. Dahlin, J. M. Neefe, and D. A. Patterson et al.,
“Serverless network file systems,” ACM Trans. Comput. Syst., vol. 14,
no. 1, pp. 41–79, Feb. 1996.
[2] T. D. Burd and R. W. Brodersen, “Processor design for portable
systems,” J. VLSI Signal Processing, vol. 13, no. 2–3, pp. 203–221, Aug.
1996.
[3] T. D. Burd, T. Pering, A. Stratakos, and R. Brodersen, “A dynamic
voltage-scaled microprocessor system,” in IEEE Int. Solid-State Circuits
Conf., vol. 466, Feb. 2000, pp. 294–295.
[4] T. D. Burd and R. W. Brodersen, “Design issues for dynamic voltage
scaling,” in Int. Symp. Low Power Electron. Design, July 2000, pp. 9–14.
[5] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-power
CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, no. 4, pp.
473–484, 1992.
[6] J. M. Chang and M. Pedram, “Energy minimization using multiple
supply voltages,” in Int. Symp. Low Power Electron. Design, 1996, pp.
157–162.
[7] B. N. Chun, A. M. Mainwaring, and D. E. Culler, “Virtual network
transport protocols for Myrinet,” IEEE Micro, vol. 18, no. 1, pp. 53–63, Jan.
1998.
[8] R. Gonzalez and M. Horowitz, “Energy dissipation in general purpose
microprocessors,” IEEE J. Solid-State Circuits, vol. 31, pp. 1277–1284,
Sept. 1996.
[9] K. Govil, E. Chan, and H. Wasserman, “Comparing algorithms for
dynamic speed-setting of a low-power CPU,” in ACM Int. Conf. Mobile
Comput. Networking, Nov. 1995, pp. 13–25.
[10] I. Hong, D. Kirovski, G. Qu, M. Potkonjak, and M. B. Srivastava,
“Power optimization of variable-voltage core-based systems,” IEEE
Trans. Comput.-Aided Design, vol. 18, pp. 1702–1714, Dec. 1999.
[11] M. Horowitz, “Low power processor design using self-clocking,” in
Workshop on Low-Power Electronics, Aug. 1993.
[12] H. A. Jamrozik, M. J. Feeley, G. M. Voelker, and J. Evans et al.,
“Reducing network latency using subpages in a global memory
environment,” in Proc. Int. Conf. Architectural Support Programming
Languages Operating Systems, vol. 31, Sept. 1996, pp. 258–267.
[13] V. Von Kaenel, P. Macken, and M. G. R. Degrauwe, “A voltage reduction
technique for battery-operated systems,” IEEE J. Solid-State Circuits,
vol. 25, pp. 1136–1140, Oct. 1990.
[14] Y. R. Lin, C. T. Hwang, and A. C. Wu, “Scheduling techniques for
variable voltage low power designs,” ACM Trans. Design Automat. Electron.
Syst., vol. 2, no. 2, pp. 81–97, 1997.
[15] P. Macken, M. Degrauwe, M. Van Paemel, and H. Oguey, “A voltage
reduction technique for digital systems,” in Proc. IEEE Int. Solid-State
Circuits Conf., Feb. 1990, pp. 238–239.
[16] D. Mosberger and L. L. Peterson, “Making paths explicit in the Scout
operating system,” in Proc. USENIX Symp. Operating Systems Design
Implementation, Oct. 1996, pp. 28–31.
[17] W. Namgoong, M. Yu, and T. Meng, “A high-efficiency variable-voltage
CMOS dynamic dc-dc switching regulator,” in Proc. IEEE Int.
Solid-State Circuits Conf., vol. 489, Feb. 1997, pp. 380–381.
[18] G. Qu, “Scheduling Problems for Reduced Energy on Variable Voltage
Systems,” Master’s thesis, Comput. Sci. Dept., Univ. California,
Los Angeles, 1998.
[19] G. Qu and M. Potkonjak, “Techniques for Energy-Efficient Communication
Pipeline Design,” Univ. Maryland Inst. Advanced Computer Studies
(UMIACS), Tech. Rep. UMIACS-TR-2002-16, 2002.
[20] S. Raje and M. Sarrafzadeh, “Variable voltage scheduling,” in Int. Symp.
Low Power Design, 1995, pp. 9–14.
[21] A. J. Stratakos, S. R. Sanders, and R. W. Brodersen, “A low-voltage
CMOS dc-dc converter for a portable battery-operated system,” in Proc.
Power Electronics Specialist Conf., vol. 1, June 1994, pp. 619–626.
[22] K. Usami and M. Igarashi, “Low-power design methodology and
applications utilizing dual supply voltages,” in Proc. Asia South Pacific
Design Automation Conf., Jan. 2000, pp. 123–128.
[23] G. M. Voelker, H. A. Jamrozik, M. K. Vernon, and H. M. Levy et al.,
“Managing server load in global memory systems,” in Proc. ACM Int.
Conf. Measurement Modeling Computer Systems (SIGMETRICS 97),
vol. 25, June 1997, pp. 127–138.
[24] R. Y. Wang, A. Krishnamurthy, R. P. Martin, T. E. Anderson, and D.
E. Culler, “Modeling communication pipeline latency,” in Proc. Joint
Int. Conf. Measurement Modeling Computer Systems (SIGMETRICS
’98/PERFORMANCE ’98), 1998, pp. 22–32.
[25] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for
reduced CPU energy,” in Proc. USENIX Symp. Operating Systems Design
Implementation, Nov. 1994, pp. 13–23.
[26] G.-Y. Wei and M. Horowitz, “A low power switching power supply for
self-clocked systems,” in Proc. Int. Symp. Low Power Electronics
Design, Aug. 1996, pp. 313–317.
[27] M. Welsh, A. Basu, and T. von Eicken, “ATM and fast Ethernet network
interfaces for user-level communication,” in Proc. 3rd Int. Symp.
High-Performance Computer Architecture, 1997, pp. 332–342.
[28] A. Wolfe, “Issues for low-power CAD tools: A system-level design
study,” ACM Trans. Design Automat. Embedded Syst., vol. 1, no. 4, pp.
315–332, Oct. 1996.
[29] F. Yao, A. Demers, and S. Shenker, “A scheduling model for reduced
CPU energy,” Proc. IEEE Annual Foundations Computer Science, pp.
374–382, Oct. 1995.
Gang Qu received the B.S. and M.S. degrees in mathematics from the
University of Science and Technology of China in 1992 and 1994 and the
M.S. and Ph.D. degrees in computer science from the University of California,
Los Angeles (UCLA), in 1998 and 2000.
Since 2000, he has been with the University of Maryland, College Park, where
he is currently an Assistant Professor in the Department of Electrical and Com-
puter Engineering and Institute of Advanced Computer Studies. His research
interests include intellectual property reuse and protection, low-power system
design, applied cryptography, and sensor networks.
Dr. Qu won the Outstanding Master of Science Award in 1998 from the Henry
Samueli Engineering School, UCLA, the 36th Design Automation Conference
Graduate Scholarship Award in 1999, and the ACM SIGMOBILE MobiCom
Best Student Paper Award in 2001.
Miodrag Potkonjak received the Ph.D. degree in electrical engineering and
computer science from University of California, Berkeley, in 1991.
In 1991, he joined C&C Research Laboratories, NEC USA, Princeton, NJ.
Since 1995, he has been with the University of California, Los Angeles (UCLA),
where he is a Professor in the Computer Science Department. His research inter-
ests include communication systems design, embedded systems, computational
security, and practical optimization techniques.
Dr. Potkonjak received the NSF CAREER Award, the Okawa Foundation
Award, the UCLA TRW SEAS Excellence in Teaching Award, and a number
of best paper awards.
