An Accelerator-Based Wireless Sensor Network Processor in 130 nm CMOS by Hempstead, Mark et al.
Citation Hempstead, Mark, David Brooks, and Gu-Yeon Wei. 2011. “An
Accelerator-Based Wireless Sensor Network Processor in 130 nm
CMOS.” IEEE J. Emerg. Sel. Topics Circuits Syst. 1 (2) (June): 193–202.
doi:10.1109/jetcas.2011.2160751.
Published Version doi:10.1109/JETCAS.2011.2160751
Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:27770104
Terms of Use This article was downloaded from Harvard University’s DASH
repository, and is made available under the terms and conditions
applicable to Open Access Policy Articles, as set forth at
http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#OAP
An accelerator-based wireless sensor network processor in 130nm CMOS
Mark Hempstead, Gu-Yeon Wei, David Brooks
School of Engineering and Applied Sciences, Harvard University
{mhempste, guyeon, dbrooks}@eecs.harvard.edu
We have implemented a system architecture for wireless sen-
sor network nodes in 130nm CMOS. It operates at 550 mV and
12.5 MHz. Our system uses 100x less power when idle than a tra-
ditional microcontroller, and 10-600x less energy when active.
It achieves energy efficiency by using a small event processor,
heterogeneous hardware accelerators, and application-controlled
VDD gating.
Networks of ultra-low-power nodes that include sens-
ing, computation, and wireless communication have applica-
tions in medicine, science, industrial automation, and security.
System-on-chip (SoC) implementations of such nodes can pro-
vide both energy efficiency and adequate performance to meet the
long deployment lifetimes and bursts of computation that char-
acterize wireless sensor network (WSN) applications. Proposed
SoCs for WSNs typically rely on general-purpose microcon-
trollers as the main compute engine and often run in subthreshold
to minimize energy [4]. Unfortunately, subthreshold operation in-
creases susceptibility to on-die parameter variations, limits the
performance needed for real-time applications, and requires cus-
tom SRAM design [2]. In order to accommodate the wide va-
riety of computing needs in WSNs while minimizing energy
consumption, we propose an accelerator-based system architec-
ture. Our design fully embraces the accelerator-based computing
paradigm, including acceleration for the network layer (rout-
ing) and application layer (data filtering). Moreover, our archi-
tecture can disable the accelerators via VDD-gating to minimize
leakage current during the long idle times common in WSN ap-
plications.
We target a class of habitat monitoring WSN applications that
aim for long deployment lifetimes and incorporate data filtering
and multihop routing on the nodes. Specifically, this architec-
ture was informed by the volcano monitoring system deployed by
Werner-Allen et al. [1]. In that system, nodes sampled both seis-
mic and infrasound signals and used an exponentially weighted
moving average (EWMA) filter to detect interesting events and
transmit data back to a team of vulcanologists.
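As a rough illustration of the filtering step described above, the sketch below implements an exponentially weighted moving average detector in Python. The gain `alpha`, the trigger `ratio`, the single-average trigger rule, and the function name `ewma_detect` are all assumptions chosen for illustration; the deployed system of [1] and the on-chip filter accelerator may use different parameters and a different trigger rule.

```python
def ewma_detect(samples, alpha=0.125, ratio=2.0):
    """Return indices of samples whose magnitude exceeds `ratio` times the running EWMA.

    alpha and ratio are illustrative assumptions, not values from the paper.
    """
    events = []
    avg = None
    for i, x in enumerate(samples):
        mag = abs(x)
        if avg is None:
            avg = float(mag)  # seed the average with the first sample
            continue
        if avg > 0 and mag > ratio * avg:
            events.append(i)  # sample stands out from the recent background
        # EWMA update: avg <- alpha * mag + (1 - alpha) * avg
        avg = alpha * mag + (1.0 - alpha) * avg
    return events


quiet_then_burst = [10, 11, 9, 10, 10, 40, 12, 10]
print(ewma_detect(quiet_then_burst))  # prints [5]
```

Only the sample at index 5 is reported, so a node using a rule like this would stay silent during the quiet stretch and transmit only around the burst.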
Fig. 1 (a) presents a block diagram of the prototype chip.
The Event Processor (EP) is a small programmable state machine
that runs interrupt service routines (ISRs) to control the flow of
data between the on-chip memory and multiple accelerators, such
as the message processor, programmable data filter, and timer,
which are memory mapped and connected via the system bus [3].
The EP also acts as a power manager, turning accelerators on and
off as needed by the running application. While the system also
includes an 8-bit general-purpose microcontroller to handle in-
frequent and irregular tasks, it can usually be disabled. During
long idle times, only the EP—and perhaps select blocks such as
[Figure 1 (a): System Block Diagram. Labels: Event Processor (interrupt processing and power management, regular events); uController (general processing, irregular events); bus signals (power control, data, address, interrupt); power enable lines; off-chip bus signals.]
A 680pJ/task Processor for Sensor Network Applications in 130nm CMOS 
 
Mark Hempstead, David Brooks, Gu-Yeon Wei 
Harvard University School of Engineering and Applied Science 
33 Oxford St. Cambridge, MA 02138 USA 
{mhempste, dbrooks, guyeon}@eecs.harvard.edu 
 
Abstract: This paper presents a low power processor designed 
specifically to address event driven computation and long idle 
times that characterize wireless sensor network workloads. 
The system employs application specific hardware accelerators 
and fine-grained VDD-gating. We present active, idle, and 
gated power measurements of our 130 nm prototype system 
across voltages and frequencies. The system consumes 680pJ 
for a typical WSN task at 550mV and 12.5 MHz; for 
low-throughput workloads, VDD-gating reduces energy 
consumption by 9x. 
Keywords: Wireless Sensor Networks, Low Power 
Introduction 
Networks of ultra low power nodes which include sensing, 
computation, and wireless communication have applications in 
medicine, science, industrial automation, and security. Ultra 
low power computation will extend battery life of these nodes 
and potentially create completely self sustainable networks by 
enabling energy scavenging. We propose a system architecture 
that departs from traditional general-purpose computing 
architectures and is specifically tailored for wireless sensor 
network (WSN) applications. Active power consumption is 
reduced through hardware based event handling and hardware 
acceleration of typical operations. Architecture simulations 
show that this system can complete certain WSN tasks in 
1/10th the number of cycles of traditional systems, providing 
energy efficiency during active mode [1]. With active power 
reduced, idle power dominates for low duty cycle WSN 
applications. The system architecture addresses leakage 
current by providing application control of block-level VDD- 
gating. The generalized accelerator architecture and 
fine-grained VDD-gating provide additional low-power 
opportunities compared to other systems which focus on 
acceleration of the radio stack [2] or rely on subthreshold 
operation [3].   
Architecture 
A key design goal of the system is to provide energy efficient 
processing for WSN applications while retaining flexibility 
and programmability.  Events are handled by the Event 
Processor (EP), a small programmable state machine. Memory-
mapped hardware accelerators are connected to the system bus.  
Accelerators are chosen to speed up typical computation found 
in WSN applications. The hardware accelerators provide the 
energy efficiency of application-specific circuits and trigger 
interrupts when a computation is complete or an event, such as 
a timer or radio message, has arrived. The EP runs interrupt 
service routines (ISRs), which control the flow of data between 
the hardware accelerators and control the status of the 
VDD-gating transistors for each accelerator block. The EP and 
hardware accelerators are not intended to execute infrequent 
WSN operations, so a general-purpose microcontroller is 
included on the chip, but it is supply-gated most of the time. 
Our modular system architecture supports the inclusion of 
many different hardware accelerators depending on the 
requirements set by the SoC integrator. Figure 1 presents the 
block diagram of our prototype chip. The event processor is 
connected via the system bus to a set of example hardware 
accelerators – message processor, programmable data filter 
and timer subsystem. 
Fig. 1: System Block Diagram 
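The control flow described in this section can be sketched as a small software model. This is only a behavioral illustration, not the chip's implementation: the class names `Accelerator` and `EventProcessor`, the ISR `on_timer`, the threshold of 100, and the synchronous call style are all hypothetical. In the real system the accelerators signal completion with interrupts rather than returning values.

```python
class Accelerator:
    """Hypothetical model of a VDD-gated, memory-mapped accelerator."""

    def __init__(self, name, work):
        self.name = name
        self.work = work        # Python function standing in for the hardware
        self.powered = False    # state of the block's VDD-gating transistor

    def run(self, data):
        if not self.powered:
            raise RuntimeError(f"{self.name} is VDD-gated")
        return self.work(data)


class EventProcessor:
    """Always-on dispatcher: maps interrupts to ISRs and drives power enables."""

    def __init__(self):
        self.isrs = {}
        self.log = []

    def register(self, interrupt, isr):
        self.isrs[interrupt] = isr

    def power(self, accel, on):
        accel.powered = on
        self.log.append((accel.name, "on" if on else "off"))

    def handle(self, interrupt, data=None):
        return self.isrs[interrupt](self, data)


# Stand-ins for the data filter and message processor (behavior is assumed).
data_filter = Accelerator("filter", lambda xs: [x for x in xs if x > 100])
msg_proc = Accelerator("message", lambda xs: {"payload": xs})


def on_timer(ep, samples):
    """ISR for a timer event: filter samples, build a message only if needed."""
    ep.power(data_filter, True)
    hits = data_filter.run(samples)
    ep.power(data_filter, False)
    if not hits:
        return None             # nothing interesting: message processor stays gated
    ep.power(msg_proc, True)
    packet = msg_proc.run(hits)
    ep.power(msg_proc, False)
    return packet


ep = EventProcessor()
ep.register("timer", on_timer)
print(ep.handle("timer", [3, 250, 7]))  # prints {'payload': [250]}
print(ep.log)
```

The point of the model is the ordering: each block is powered only for the duration of its own step, and the message processor is never powered when the filter finds nothing to report.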
Implementation 
We implemented our test chip in 130 nm CMOS with 8 layers of 
metal using a semi-custom design flow. A die photo is shown 
in Figure 2. The system contains 444,982 transistors including  
4KB of foundry supplied SRAM.  All of the major blocks and 
system bus were synthesized from RTL using a standard cell 
library and placed and routed. We implemented a custom 
VDD-gate circuit which was attached to the synthesized blocks. 
Figure 3 displays the schematic of the VDD-gating circuit and 
the layout location of the circuit in relation to the filter block. 
Fig 2: 130 nm test chip die photo (2mm x 2mm). Labeled blocks: SRAM1, SRAM2, Microcontroller, Message Processor, Filter, Event Processor, Timer, Tester.
Fig 3: VDD-gating circuit and layout. Labels: Global VDD, Virtual VDD, Ctrl Lines, Power Enable.
[Block diagram labels: Microcontroller, Event Processor, and SRAM on the System Bus (Interrupt, Power Ctrl, Addr/Data); Accelerators (Message Processor, Data Filter, Timer) on the System Bus (Addr/Data).]
Figure 1. An Accelerator-based System for WSNs. (a) System Block Diagram. (b) Die Photo.
the timer—must be powered. The tester I/O block facilitates test-
ing to verify functionality.
The chip was manufactured in a 130nm bulk CMOS process
with eight layers of metal. A die photograph is shown in Fig-
ure 1 (b). The system contains 444,982 transistors including 4KB
of foundry-supplied SRAM.
[Figure 2 plot: power (µW, log scale from 0.001 to 100) per component (Event Processor, SRAM, Microcontroller, Accelerators) in Active, Idle, and Gated modes.]
Figure 2. Per Block Power Consumption at 550 mV and 12.5 MHz.
Our first experimental measurements have verified reliable op-
eration across a range of lower clock frequencies—25 kHz to 12.5
MHz—that are suited to the low power needs of WSN applica-
tions. SRAM reliability limits the minimum operating voltage to
450mV. Fig. 2 plots the per-block power consumption of the sys-
tem, running custom microbenchmarks written to exercise each
block in three operating modes: active (12.5 MHz at 550 mV), idle
(0 MHz at 550 mV), and powered off (VDD-gated). VDD-gating
reduces the power consumption of individual blocks by 50-100x,
which helps to minimize power consumption during long peri-
ods of inactivity. The event processor block cannot be VDD-gated
since it must always be available to handle interrupts.
[Figure 3 (a), Cycle Count Comparison: cycle count (log scale, 10^0 to 10^4) per routine (EWMA Filter, Threshold Filter, CAM, MP Outgoing, MP Irregular, MP Regular) for the microcontroller and the accelerators.]
[Figure 3 (b), Power vs. Workload: power with and without VDD-gating as workload intensity varies, with the high-intensity workload region marked.]
Figure 3. Performance and Power Benefits of Specialization.
In this talk, we compare our prototype to nine processors for
WSNs in the literature. Because the commonly used metric of
energy-per-instruction cannot be easily applied to accelerator-
based systems, we introduce the concept of energy-per-task. We
define a task as a collection of dependent computations that are
executed periodically. We present measurements of a task that
is similar to the volcano monitoring application. This task takes
131 cycles to execute and consumes 678.9 pJ at 550 mV and 12.5
MHz. An equivalent routine written for the Mica2 mote requires
1532 instructions. Using this information we compute the energy
per equivalent instruction as 0.44 pJ, which is significantly lower
than systems in the literature – the lowest energy systems, gen-
eral purpose cores operating in subthreshold, consume 2-3 pJ per
instruction.
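The energy-per-equivalent-instruction figure follows directly from the numbers quoted above, as the short Python check below shows. The task time and the average power while the task runs are derived here from the same numbers; they are implied by the text rather than stated in it, and the function names are chosen only for illustration.

```python
ENERGY_PER_TASK_PJ = 678.9   # measured at 550 mV and 12.5 MHz
CYCLES_PER_TASK = 131
CLOCK_HZ = 12.5e6
MICA2_INSTRUCTIONS = 1532    # equivalent routine on the Mica2 mote


def energy_per_equivalent_instruction_pj(task_energy_pj, equivalent_instructions):
    """Task energy spread over the instructions a mote would need for the same work."""
    return task_energy_pj / equivalent_instructions


def task_time_us(cycles, clock_hz):
    """Execution time of one task in microseconds."""
    return cycles / clock_hz * 1e6


def average_active_power_uw(task_energy_pj, cycles, clock_hz):
    """Average power while the task runs; pJ divided by us gives uW."""
    return task_energy_pj / task_time_us(cycles, clock_hz)


epi = energy_per_equivalent_instruction_pj(ENERGY_PER_TASK_PJ, MICA2_INSTRUCTIONS)
t_us = task_time_us(CYCLES_PER_TASK, CLOCK_HZ)
p_uw = average_active_power_uw(ENERGY_PER_TASK_PJ, CYCLES_PER_TASK, CLOCK_HZ)

print(f"{epi:.2f} pJ per equivalent instruction")  # 0.44
print(f"{t_us:.2f} us per task")                   # 10.48
print(f"{p_uw:.1f} uW while the task runs")        # 64.8
```

Note that the 0.44 pJ figure divides accelerator energy by a software instruction count on a different platform, which is why the text calls it an equivalent instruction.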
This analysis does not isolate the benefits of an accelerator-
based architecture from the process technology, circuit imple-
mentation, and amount of SRAM. Thus, we compare the cy-
cle count and energy of full applications running on accelerators
to running on the on-die general-purpose microcontroller. These
applications combine data filtering, outgoing message prepara-
tion, and flood-based message routing, which are prototypical
WSN routines. We analyze routines for data filtering (EWMA and
threshold), network routing using a CAM structure, recording an
outgoing message, detection of an incoming irregular message,
and automatic relay of a regular message. The on-die Z80 micro-
controller closely resembles 8-bit architectures employed in other
WSN SoCs. For fairness, all routines were written in assembly
and hand-tuned for accelerator- and microcontroller-based opera-
tion, respectively. Fig. 3 (a) presents the cycle count of each rou-
tine for both scenarios. Multiple points for a particular routine
reflect different inputs that yield different performance. Accel-
erator implementations see cycle speedups from 15 to 635x, which
translate directly into energy savings. In this talk, we show, through
measurements of energy consumption, that hardware accelerators
consume 1/10th to 1/600th the energy consumed by software-
based routines running on the microcontroller.
Building on the individual characterizations above, we compare
compute-block power consumption for different workload re-
quirements and include idle power in our analysis. These results
exclude additional system power overheads (e.g., EP and SRAM)
common to both types of systems in order to clarify the compar-
ison. WSN workload intensity varies significantly depending on
the observed phenomena, from 1 task/minute for weather obser-
vations to > 10^5 tasks/second for high-frequency data collection.
Figure 3 (b) plots the average power consumption of rou-
tines run on either the accelerators or the microcontroller while
varying workload intensity. For each datapoint, the lowest-power
voltage/frequency operating point was chosen. For light work-
loads (< 10 tasks/sec), the system can operate at the lowest volt-
age and frequency (450 mV, 25 kHz), and power consumption is
dominated by leakage current. For medium-intensity workloads
(10^4 tasks/sec), using accelerators provides 1000x power sav-
ings due to a 635x speedup in cycle counts and a 50% lower sup-
ply voltage. As workload increases, active power dominates un-
til the clock frequency required by the microcontroller reaches
the performance limit of the system at the maximum supply volt-
age of 1.2 V. Routines run on the accelerators can operate at up to
10^7 tasks per second with a supply voltage below 1.1 V. As the
plot also shows, VDD-gating lowers the power consumption for both
scenarios under light loads, but the accelerators' higher inherent
performance enables VDD-gating for longer periods of time, which
translates to additional power savings.
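The interplay between workload intensity, leakage, and VDD-gating can be sketched with a simple duty-cycle model. This is not the method used to produce Figure 3 (b): it holds the operating point fixed at 550 mV and 12.5 MHz instead of picking the lowest-power point per workload, and the idle power of 1 µW is a placeholder assumption, not a measured value. Only the task energy, the cycle count, and the 50x gating factor (the low end of the reported 50-100x range) come from the text.

```python
def average_power_uw(tasks_per_sec, task_energy_pj, task_time_us,
                     idle_power_uw, gating_factor=1.0):
    """Average power of a duty-cycled block, in microwatts."""
    busy_fraction = tasks_per_sec * task_time_us * 1e-6
    if busy_fraction > 1.0:
        raise ValueError("task rate exceeds what the block can sustain")
    active_uw = tasks_per_sec * task_energy_pj * 1e-6   # pJ per second -> uW
    # Leakage is paid only while idle; gating divides it by gating_factor.
    idle_uw = (1.0 - busy_fraction) * idle_power_uw / gating_factor
    return active_uw + idle_uw


TASK_ENERGY_PJ = 678.9       # from the text, at 550 mV and 12.5 MHz
TASK_TIME_US = 131 / 12.5    # 131 cycles at 12.5 MHz
IDLE_POWER_UW = 1.0          # ASSUMED placeholder, not a measured value
GATING_FACTOR = 50.0         # low end of the reported 50-100x reduction

for rate in (1 / 60, 10.0, 1e4):
    ungated = average_power_uw(rate, TASK_ENERGY_PJ, TASK_TIME_US, IDLE_POWER_UW)
    gated = average_power_uw(rate, TASK_ENERGY_PJ, TASK_TIME_US,
                             IDLE_POWER_UW, GATING_FACTOR)
    print(f"{rate:10.3f} tasks/s: {ungated:.4f} uW ungated, {gated:.4f} uW gated")
```

Under these assumptions the model shows the same qualitative trend as the text: at light loads nearly all power is leakage, so gating helps most there, while at higher task rates the active term takes over and gating matters less.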
In conclusion, this accelerator-based system is well suited for
both high performance and low performance sensor network ap-
plications. The system provides efficient computation through
hardware acceleration for habitat monitoring applications. The
modular architecture and event processor enable the management
of idle power through VDD-gating.
References
[1] G. Werner-Allen et al. Fidelity and yield in a volcano monitoring sensor
network. In Symposium on Operating Systems Design and Implementation
(OSDI), November 2006.
[2] J. Kwong et al. A 65nm Sub-Vt Microcontroller with Integrated SRAM and
Switched-Capacitor DC-DC Converter. In IEEE International Solid-State
Circuits Conference (ISSCC), February 2008.
[3] M. Hempstead et al. An ultra low power system architecture for sensor net-
work applications. In International Symposium on Computer Architecture
(ISCA), June 2005.
[4] S. Hanson et al. Exploring variability and performance in a sub-200-mV pro-
cessor. IEEE Journal of Solid-State Circuits, 43(4):881–891, April 2008.
