Signal-Processing-Driven Integrated Circuits for Energy Constrained Microsystems. by Khan, Osama U.
  
Signal-Processing-Driven Integrated 
Circuits for Energy Constrained 
Microsystems 
by 
Osama U. Khan 
 
 
 
 
A dissertation submitted in partial fulfillment 
of the requirements for the degree of 
Doctor of Philosophy 
(Electrical Engineering) 
in The University of Michigan 
2014 
 
 
 
 
 
 
Doctoral Committee: 
 
Associate Professor David D. Wentzloff, Chair 
Professor Michael P. Flynn 
Associate Professor Anna C. Gilbert 
Professor Wayne E. Stark 
  
 
 
 
 
 
 
 
 
 
 
 
 
©   Osama U. Khan 
2014 
 
 
 
Acknowledgements 
In the name of Allah, the Most Gracious, the Ever Merciful: I would like to extend my 
appreciation to my parents Qamar Sultana & Ikram Ullah Khan and my siblings Wajahat Ullah 
Khan and Alvina Khan, who were always there for me through thick and thin. I would also like to 
mention a few names here, because without their help and support it would not have been possible 
to achieve this milestone today: Professor Mohammad Nauman (Late), Lecturer Amer 
Hashmi, and the Koshish foundation and their team. 
This research work is supported by the Signal Processing, Information Technology Program 
of the National Science Foundation under award number CCF-0910765. 
 
Table of Contents 
 
Acknowledgements 
List of Figures 
List of Tables 
Abstract 
Chapter 1 Introduction 
1.1. WSN for Structural Health Monitoring (SHM) 
1.2. Background Study 
1.3. Thesis contributions 
Chapter 2 Compressed Sensing 
2.1. Matlab demonstration 
2.2. Sparse in Frequency 
2.3. Sparse in Time 
Chapter 3 Compressed Sensing Analog Front-end for GHz ADCs 
3.1. Proposed Hardware 
3.2. Design Decisions 
3.3. Simulation Results 
3.4. Prototype Chip 
Chapter 4 2.4GHz Short-Range Zigbee Compatible RX with Near-Threshold Digital Baseband 
4.1. IEEE 802.15.4 Standard Specification 
4.2. Modulation and Spreading 
4.3. Adaptive Sampling 
4.4. Prototype Chip 
4.5. Analog Front-End Building Blocks 
4.6. Measured Results 
Chapter 5 Near-Threshold Hardware Accelerator for Arbitrary Bayesian Networks 
5.1. Bayesian Networks 
5.2. Clique Tree Example 
5.3. Distributed Intelligent Sensing 
5.4. Proposed Hardware Accelerator 
5.5. ALARM Network 
5.6. Measured Results 
Chapter 6 Conclusion 
References 
 
 
List of Figures
 
Figure 1 Moore’s law visualization [3] 
Figure 2 WSN conceptual cartoon 
Figure 3 Application areas for WSNs [6] [7] [8] [9] [10] [11] 
Figure 4 Lithium-Ion battery energy density trend [13] 
Figure 5 Wired sensor network conceptual cartoon 
Figure 6 Sensor on Gi-Lu bridge 
Figure 7 Commercial sensor nodes 
Figure 8 Generic Wireless Sensor Node 
Figure 9 Mica2 Sensor node power profile [22] 
Figure 10 Relative power consumption of a sensor node 
Figure 11 Comparison of energy consumed per bit in a wireless sensing node [36] 
Figure 12 Transceiver block diagram [35] 
Figure 13 RF front-end block diagram [37] 
Figure 14 Proposed current-reused folded-cascode front-end [38] 
Figure 15 Receiver front-end. Only a single gain path is enabled at any time [39] 
Figure 16 Block diagram for the BFSK super-regenerative receiver [40] 
Figure 17 Simplified CS block diagram 
Figure 18 CS illustrative example 
Figure 19 Sparse in Frequency 
Figure 20 Recovered sparse signal in Frequency 
Figure 21 Recovered signal in the Time domain 
Figure 22 Sparse in the Time domain 
Figure 23 Recovered sparse signal in the Time domain 
Figure 24 Ultra Wide Band (UWB) signaling in frequency and time domain [52] 
Figure 25 DSSS UWB transceiver block diagram [53] 
Figure 26 802.15.4a Compliant Fully Integrated UWB Transceiver [58] 
Figure 27 CS acquisition system 
Figure 28 Hadamard transform 
Figure 29 Block diagram of analog WHT front-end [62] 
Figure 30 MSE as a function of K. 
Figure 31 Improvement factor over Nyquist ADC 
Figure 32 Improvement factor for Res = 5 bit. 
Figure 33 Hadamard transform 
Figure 34 Recovered pulse 
Figure 35 Receiver architecture for waterfall curves 
Figure 36 BER curves for infinite resolution 
Figure 37 BER curves for Res = 5 bit. 
Figure 38 Excess Eb/No vs. compression ratio for BER = 10^-3 
Figure 39 Block diagram [65] 
Figure 40 Input S/H 
Figure 41 Timing diagram [65] 
Figure 42 ZCBC based integrator [65] 
Figure 43 Measured Hadamard coefficients [65] 
Figure 44 Recovered pulse [65] 
Figure 45 Integrator overshoot calibration 
Figure 46 Die photo 
Figure 47 Schematic view of the PPDU [68] 
Figure 48 Format of the PPDU [68] 
Figure 49 Modulation and spreading functions for the O-QPSK PHYs [68] 
Figure 50 O-QPSK chip offsets [68] 
Figure 51 Sample baseband chip sequences (the zero sequence) with half-sine pulse shaping [68] 
Figure 52 a) Nyquist Sampling grid b) 50% samples on the Nyquist grid 
Figure 53 Probability of chip error rate 
Figure 54 Prototype chip block diagram 
Figure 55 Schematic of the analog front-end 
Figure 56 LO buffer 
Figure 57 Single-balanced active mixer 
Figure 58 gm-C based active filter, first order section 
Figure 59 gm-C based active filter, second order section 
Figure 60 Gain stage 
Figure 61 Buffer for driving Flash ADC 
Figure 62 Flash ADC comparator 
Figure 63 SR latch 
Figure 64 Digital baseband state diagram 
Figure 65 Digital baseband block diagram and adaptive sampling illustration 
Figure 66 Prototype chip die photo in 65nm CMOS 
Figure 67 Measured Gain, NF, IIP3, IIP2 of the RX front-end and Flash ADC spectrum 
Figure 68 Measured Energy per bit profile along with simulated energy efficiency breakdown of the radio, Bit error rate and the radar plot of the system 
Figure 69 Measured baseband signal for -40dBm RF signal. 
Figure 70 Measurement setup and received RF packets 
Figure 71 Machine Learning 
Figure 72 Reducing wireless traffic through intelligent sensing 
Figure 73 A simple Bayesian network [45] 
Figure 74 Sum-product Algorithm [44] 
Figure 75 Factor product 
Figure 76 Factor marginalization 
Figure 77 Factor reduction 
Figure 78 Message propagation in the Student clique tree [44] 
Figure 79 Proposed distributed processing 
Figure 80 Block Diagram 
Figure 81 State Diagram 
Figure 82 ALARM Bayesian network [74] 
Figure 83 Clique tree ALARM network 
Figure 84 Resolution, accuracy and area tradeoff 
Figure 85 Energy distribution for ALARM network 
Figure 86 Die photo in 65nm CMOS 
 
List of Tables
Table 1. Energy harvesting potential sources [14] 
Table 2 Performance comparison of recently published low-power radios in JSSC 
Table 3 Frequency band and data rate for 2.45GHz O-QPSK PHY [68] 
Table 4 Symbol-to-chip mapping for the 2450 MHz band [68] 
Table 5 Rx Target Specifications 
Table 6 Summary and comparison table 
Table 7 ALARM network energy profile 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Abstract 
The exponential growth in IC technology has enabled low-cost and increasingly capable 
wireless sensor nodes which provide a promising way forward to realize the vision of a trillion 
connected sensors in the next decade. However, there are still many design challenges ahead in 
making these sensor nodes small, low-cost, secure, reliable and energy-efficient, to name a few. 
Since the wireless nodes are expected to operate on a limited energy source or in some cases on 
harvested energy, the energy consumption of each building block is of prime importance to 
prolong the life of a sensor node. Radio communication, when active, has been found to be 
one of the highest power consuming modules on a sensor node. Low-energy protocols, e.g. 
processing the raw sensor data on-node, are more energy efficient for some applications than 
transmitting the raw data over a wireless channel to a cloud server. 
In this thesis we explore signal processing techniques to realize a low power radio solution for 
wireless communication. Two prototype chips have been designed and their performance has 
been evaluated. The first prototype chip exploits compressed sensing for Ultra-Wide-Band 
(UWB) communication. UWB signals typically require a high ADC sampling rate in the receiver 
which results in high power consumption. Compressed sensing is demonstrated to relax the ADC 
sampling rate to save power. The second prototype chip exploits the sensitivity vs. power trade-off 
in a radio receiver to achieve iso-performance at lower power consumption. The time-varying 
wireless channel characteristics are used to adapt the sampling frequency of the receiver 
based on the SNR/link quality of the communication channel, saving power while maintaining 
the desired system performance. 
It is envisioned that embedded machine learning will play a key role in the integration of 
sensory data with prior knowledge for distributed intelligent sensing, which might enable reduced 
wireless network traffic to a cloud server. A Near-Threshold hardware accelerator for arbitrary 
Bayesian networks was designed for the clique-tree message passing algorithm used for probabilistic 
inference. The hardware accelerator was benchmarked on the mid-size ALARM Bayesian 
network, with a total energy consumption of 76 nJ for a 250 µs execution time. 
 
  
 
Chapter 1 
Introduction 
Integrated Circuits (ICs) have come a long way from their days of inception. The tremendous 
growth in IC technology has played a huge role in shaping the way we live our lives today. For 
the past several decades this growth has been accurately described by Moore’s Law, Gordon 
Moore’s prediction that the number of transistors on a chip will double approximately every 
two years [1]. The number of transistors on a chip (also known as integration density) is not 
the only metric of ICs that has been steadily improving at this pace. 
ICs are becoming cheaper, faster, lower power, and smaller in size with even greater 
functionality. All of these improvements have been made possible by the semiconductor 
industry’s ability to shrink the minimum feature size used to make integrated circuits [2]. Figure 1 
shows the trend for Intel’s microprocessor transistor count over several decades and, to give a 
better perspective on transistor scaling over time, draws an analogy in which transistors are 
people [3]. 
 
 
 
This improvement trend in IC technology has opened doors for many application areas which 
were previously considered unsuitable, either for economic reasons or because an IC solution was 
simply inadequate. One such example is the emergence of low-power, low-cost, multi-functional 
miniature sensor devices which, when combined with a radio transceiver and a microcontroller, 
result in a small form-factor sensor node. These sensor nodes, when networked wirelessly, form 
a Wireless Sensor Network (WSN) [4]. The basic concept of a WSN is shown in Figure 2.  
 
 
Figure 1 Moore’s law visualization [3] 
Figure 2 WSN conceptual cartoon (labels: Sensor Node, Wireless Link) 
 
 
The WSNs are envisioned to bring about a new wireless revolution and to turn the concept of the 
Internet of Things (IoT) into reality [5]. Figure 3 shows some of the application areas of the WSNs. The 
WSNs have diverse applications ranging from medicine [6], agriculture [7], military [8], 
structural health monitoring [9], automotive [10], inventory tracking, smart home and security, to 
name a few [9] [11]. In all these application areas the basic idea is the same: to use 
appropriate sensors together with the sensing and computational capabilities of the WSN to monitor and 
analyze various markers. The data collected can be processed online or offline depending on the 
application. Ideally, a WSN not only monitors but also has application-driven 
actuation capabilities. For example, in the field of medicine, sensors can be placed on a subject 
for either invasive or noninvasive monitoring to collect and analyze various bio-markers [12]. In 
case an abnormality is observed, the WSN can take appropriate action by either alerting the 
physician in time or, in an ideal case, even administering treatment in a completely autonomous 
and unsupervised environment. 
One of the hurdles in making the WSNs successful is the slow improvement in battery 
technology. While IC technology has been improving exponentially, battery technology has not been able to 
keep up with it.  
 
 
 
Figure 4 shows the improvement trend in energy density for a Lithium-Ion battery along with 
a data point for the current state of the art [13]. As can be seen, the energy density improves by 
less than 7% per year. The energy capacity of a battery is also limited 
by safety and cost. This, along with the fact that the nodes in a WSN are of small form factor, 
exacerbates the problem, as it imposes limitations on the battery size and thereby reduces the total 
amount of energy available for a sensor node to operate. If a sensor node were to operate solely 
on a battery, this would constrain the life of the sensor node before it runs out of battery 
energy.  
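To put the two growth rates in perspective, the short Python sketch below compares a 7% per year improvement (the upper bound quoted above for Lithium-Ion energy density) with a doubling every two years (Moore’s Law as stated earlier). The function names are illustrative, and the calculation assumes simple compound growth, which is an idealization and not a model taken from [13].

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_growth_from_doubling(doubling_years):
    """Equivalent compound annual growth rate for a given doubling period."""
    return 2 ** (1 / doubling_years) - 1


def growth_factor(annual_growth, years):
    """Total improvement factor after a number of years of compound growth."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    battery_growth = 0.07  # upper bound quoted in the text for Li-Ion energy density
    ic_growth = annual_growth_from_doubling(2.0)  # doubling every two years

    print(f"Battery doubling time: {doubling_time_years(battery_growth):.1f} years")
    print(f"IC equivalent annual growth: {ic_growth * 100:.0f}% per year")
    print(f"Improvement over 10 years, battery: {growth_factor(battery_growth, 10):.1f}x")
    print(f"Improvement over 10 years, IC: {growth_factor(ic_growth, 10):.0f}x")
```

Under these assumptions the battery energy density needs a little over ten years to double, while the transistor count doubles five times in the same period.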
Figure 3 Application areas for the WSNs [6] [7] [8] [9] [10] [11] (panels: Precision Agriculture, Civil Infrastructure, Medical, Micro robotics, Automotive, Assets Tracking) 
 
It is highly desirable, and oftentimes critical for an application, that the nodes in a WSN have 
a long life span. To tackle this challenge, it is essential that a holistic approach is taken to reduce 
the power of each building block on a sensor node. This calls for innovative circuit design 
techniques and architectures from the circuit design community. Another approach to 
improving a sensor node’s life span is to tap into ambient power sources to harvest energy and 
reduce battery dependency. Table 1 shows the power density of potential power sources for 
energy harvesting [14]. Energy harvesting at high conversion efficiencies is still an active area of 
research. Improved circuit designs, when combined with energy harvesting techniques, are a 
promising way forward to achieve the goal of extending the life span of the sensor nodes in a 
WSN.  
 
 
 
 
 
 
 
Figure 4 Lithium-Ion battery energy density trend [13] 
1.1. WSN for Structural Health Monitoring (SHM) 
As mentioned briefly before, Structural Health Monitoring (SHM) is one of the emerging 
application areas for the WSNs. SHM encompasses, but is not limited to, bridges [15], buildings, 
dams [16], pipelines [17], aircraft [18], ships [19], automotive and robotics [16] [20]. A WSN is 
intended to predict the points of failure of these complex engineering structures as they are 
exposed to harsh loading scenarios and severe environmental conditions not anticipated during 
the design phase. It also evaluates the long-term effect of these conditions on structural 
deterioration [16].  
Table 1. Energy harvesting potential sources [14] 
Power source                     Power density 
Solar (outside/inside)           15000/10 µW/cm² 
Air flow                         380 µW/cm³ 
Vibrations                       375 µW/cm³ 
Human power                      330 µW/cm³ 
Temperature (5 °C gradient)      40 µW/cm² 
Pressure variation               17 µW/cm³ 
 
Some of these structures already use some form of structural health monitoring, 
with a mesh of sensors wired to a central data repository unit as shown in the conceptual 
cartoon in Figure 5 [16]. This approach works but has several drawbacks and shortcomings 
which can be addressed by a WSN. One such drawback is that the cost per node of a wired 
sensor network is much greater than for a wireless sensor network, mainly because of the 
installation and maintenance cost of extensive lengths of coaxial wires [16]. This not only 
increases the total cost of the sensor infrastructure but also limits the density of sensor nodes 
that can be placed on the structure. In a WSN, wireless connectivity between nodes greatly 
reduces the cost per sensor while allowing more flexibility in sensor placement. As a 
consequence, sensor density on a structure can also be increased significantly. This is important 
for structural health monitoring because the more local measurements are available to the damage 
detection algorithms, the more accurately they can evaluate the structural integrity [16]. 
 
 
 
 
 
 
 
 
 
 
Figure 6 shows an example from an academic study in which a wireless sensor is placed on the 
Gi-Lu bridge in Nan-Tou County, Taiwan, to collect and analyze ambient vibration data [21]. 
Figure 7 shows two additional examples of commercially available sensor nodes. Although a 
WSN is a step forward for the current state of the art in SHM, it still faces some key challenges 
(e.g. a sensor node’s battery life) that need to be addressed before the technology can be 
brought into the mainstream. 
 
 
 
Figure 5 Wired sensor network conceptual cartoon (labels: Central Data Repository, Sensor Node, Wired Link) 
Figure 6 Sensor on Gi-Lu bridge 
Figure 7 Commercial sensor nodes (Intel Imote2, Crossbow MICA2 Mote) 
 
To reduce the total power consumption of a sensor node, each sub-component’s impact on total 
power must be evaluated. Figure 8 shows a generic wireless sensor node along with its 
constituent components. It is observed that when a sensor node’s radio communication is active 
it is the dominant power consuming block [22]. There are several factors that make radio 
communication expensive in terms of power. First, the radio front-end is typically a linear analog 
continuous time system. Linear analog continuous time systems require constant bias currents. 
Second, as with any analog system, there exist complex trade-offs and power is traded against 
different performance parameters [23]. Third, the received radio signal amplitude at the antenna 
can be very small. In the worst case, the received radio signal amplitude can range anywhere 
from a few µV down to pV. This usually calls for significant signal gain at RF frequencies to overcome 
the noise added by the signal processing and demodulation circuits. Because gain at 
RF frequencies is expensive in terms of power [24], this limits how far the total power 
consumption of the radio front-end can be lowered. Experimental studies have corroborated this fact. Figure 9 
shows the measured current consumption of a Mica2 sensor node for transmitting a single radio 
message at maximum transmit power [22]. 
Commercial low power radios for wireless communication consume power in the range of tens 
of mW. The Texas Instruments (TI) low-power low-cost RF transceiver for the ISM band (part 
number CC2500) consumes 24 mW in receive (Rx) mode at a 250 kBaud rate, while it consumes 
39 mW in transmit (Tx) mode at +1 dBm output power [25]. Another TI radio transceiver 
example is the 2.4 GHz Bluetooth low energy system on chip CC2540. It consumes 59 mW at 
1 Mbps in Rx mode and 81 mW in Tx mode at 0 dBm output power [26]. The Analog Devices 
ADF7242 is a low power transceiver for the 2.4 GHz ISM band. It consumes 34.2 mW in Rx 
mode at 250 kbps and 35.3 mW in Tx mode at 0 dBm output power [27]. 
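The power figures quoted above can be turned into an approximate energy per bit by dividing the active power by the data rate. The Python sketch below does this for the three radios. It assumes one bit per symbol for the CC2500 (so that 250 kBaud is treated as 250 kbps) and assumes the Tx data rate equals the quoted Rx data rate; neither assumption is stated in the text, and the dictionary layout and function name are illustrative only.

```python
# Power (mW) and data rate (bit/s) as quoted in the text.
# Assumptions: CC2500 carries 1 bit per symbol, and each radio's Tx rate
# equals its quoted Rx rate.
RADIOS = {
    "CC2500": {"rx_mw": 24.0, "tx_mw": 39.0, "rate_bps": 250e3},
    "CC2540": {"rx_mw": 59.0, "tx_mw": 81.0, "rate_bps": 1e6},
    "ADF7242": {"rx_mw": 34.2, "tx_mw": 35.3, "rate_bps": 250e3},
}


def energy_per_bit_nj(power_mw, rate_bps):
    """Energy per bit in nanojoules for a radio that is continuously active."""
    power_w = power_mw * 1e-3
    return power_w / rate_bps * 1e9


if __name__ == "__main__":
    for name, radio in RADIOS.items():
        rx = energy_per_bit_nj(radio["rx_mw"], radio["rate_bps"])
        tx = energy_per_bit_nj(radio["tx_mw"], radio["rate_bps"])
        print(f"{name:8s} Rx {rx:6.1f} nJ/bit   Tx {tx:6.1f} nJ/bit")
```

Under these assumptions the Rx energy comes out at roughly 96 nJ/bit for the CC2500, 59 nJ/bit for the CC2540 and 137 nJ/bit for the ADF7242. These estimates ignore start-up, synchronization and protocol overhead, so the effective energy per useful bit would be higher.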
 
 
The power of a commercial communication radio is 1 to 3 orders of magnitude higher than that 
of an on-board microcontroller, depending on the amount of processing. For example, one of 
the industry’s lowest power microcontrollers is the PIC12LF1822 from Microchip, whose active 
mode power consumption is 30 µW/MHz [28]. 
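The gap between radio and microcontroller power can be sketched numerically. The Python snippet below compares the 24 mW Rx power quoted for the CC2500 with the 30 µW/MHz figure for the microcontroller. The clock frequencies are hypothetical operating points, and the assumption that active power scales linearly with clock frequency is a simplification not taken from [28].

```python
import math

MCU_UW_PER_MHZ = 30.0  # active-mode figure quoted in the text


def mcu_power_uw(clock_mhz):
    """Microcontroller active power, assuming linear scaling with clock."""
    return MCU_UW_PER_MHZ * clock_mhz


def orders_of_magnitude(radio_mw, clock_mhz):
    """log10 of the ratio between radio power and microcontroller power."""
    ratio = radio_mw * 1e3 / mcu_power_uw(clock_mhz)
    return math.log10(ratio)


if __name__ == "__main__":
    radio_rx_mw = 24.0  # CC2500 Rx power from the text
    for clock_mhz in (1.0, 8.0, 32.0):  # hypothetical clock settings
        gap = orders_of_magnitude(radio_rx_mw, clock_mhz)
        print(f"{clock_mhz:5.1f} MHz: radio is about 10^{gap:.1f} times the MCU power")
```

For these example clock settings the gap falls between roughly one and three orders of magnitude, in line with the range stated above.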
 
 
Figure 8 Generic Wireless Sensor Node (block labels: Power Sources, Power Management, Microcontroller, Wakeup/Controller/Timers, Sensors/Signal Conditioning, Memory, Radio Transceiver, Antenna; groupings: Energy Constraint SoC, Wireless Comm., Computation) 
 
 
Figure 9 Mica2 Sensor node power profile [22] 
  
Another important block on a sensor node is the timer. To extend the battery life of a sensor 
node, the radio communication is typically duty cycled and a wireless sensor node is expected to 
stay in sleep mode most of the time. Many applications can tolerate relatively long sleep 
times [29]. However, while a wireless node is in sleep mode it is imperative that it does not 
lose time synchronization with its neighboring nodes if it is to communicate with them 
when it wakes up. If a wireless node comes out of sleep mode and no other nearby node is 
also awake, wireless communication cannot happen, which 
degrades the overall performance of the network. It is therefore important that the wireless 
nodes in a network maintain some level of synchronization 
[29]. This means that the timer on the node runs continuously while the sensor node is asleep, 
so the power consumption of the timer should be as low as possible 
while maintaining the desired accuracy of synchronization. An additional benefit of an accurate 
low power timer is that, for a fixed timing error tolerance, the wireless nodes can stay asleep 
longer, which keeps the average power low [29]. The power consumption of high precision timers is 
easily in the mW range [29] [30]; however, real-time clocks are now commercially available 
in the 50 nW range [31]. 
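The link between timer accuracy and sleep time can be illustrated with a simple worst-case drift model. In the Python sketch below, two nodes are assumed to have clocks that each deviate by at most a given number of parts per million (ppm) in opposite directions, so their relative timing error grows at twice that rate. The model, the tolerance value and the ppm figures are illustrative assumptions and are not taken from [29].

```python
def worst_case_drift_s(sleep_s, accuracy_ppm):
    """Worst-case relative timing error between two nodes after sleeping.

    Assumes each clock is off by at most accuracy_ppm and that the two
    clocks drift in opposite directions.
    """
    return 2.0 * accuracy_ppm * 1e-6 * sleep_s


def max_sleep_s(tolerance_s, accuracy_ppm):
    """Longest sleep interval that keeps the relative error within tolerance."""
    return tolerance_s / (2.0 * accuracy_ppm * 1e-6)


if __name__ == "__main__":
    tolerance_s = 1e-3  # hypothetical 1 ms timing error tolerance
    for ppm in (100.0, 20.0, 2.0):  # hypothetical timer accuracies
        sleep = max_sleep_s(tolerance_s, ppm)
        print(f"{ppm:6.1f} ppm timer: sleep up to {sleep:7.1f} s")
```

In this model a tenfold improvement in timer accuracy allows a tenfold longer sleep interval for the same timing error tolerance, which is the benefit described above.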
An alternative to duty cycling the communication radio with complex 
synchronization protocols is to use a wakeup radio [32]. The idea is to use an ultra-low power 
always-on radio which continuously listens for a wakeup signal over the communication 
channel. When a wakeup signal is detected, the main communication radio is interrupted 
to bring it out of its sleep mode. It is clear that the wakeup radio power consumption should be 
made as small as possible while maintaining good sensitivity. Wakeup radios are still an active area 
of research, and current state of the art wakeup radios offer sensitivities around -70 dBm with 
total power consumption in the few tens of µW [33] [32] [34]. 
To put the power numbers into perspective Figure 10 shows the relative power consumption 
of various sub-components on a sensor node. It is clear that efforts should be made to reduce the 
power of the communication radio and a sensor node’s radio should be turned off most of the 
time and be active only for short intervals of time; i.e. the communication radio should be 
heavily duty cycled and its leakage power should be minimized.  
 
 
Figure 10 Relative power consumption of a sensor node 
 
To transmit and receive a single bit, even the lowest power transceivers designed for short 
range communication typically require tens to hundreds of nJ, whereas energy consumption per 
ADC conversion or microprocessor instruction of just a few tens of pJ has been demonstrated 
[35]. Figure 11 shows a comparison of energy consumed per bit broken down by task in a 
wireless sensing node using current state-of-the-art hardware [36]. It is clear that it requires less 
energy to process 1 bit of data on a sensor node than to wirelessly transmit it. Therefore efforts 
should be made to take advantage of a sensor node’s local computational capabilities to process 
the raw time-history data instead of sending it over a wireless link. Priority should be given to 
process the data locally and transmit only the information content contained in the raw data 
which is needed for global structural health monitoring [16]. For example, an FFT can be 
computed locally on a sensor node on the raw time-history sensor data and only the most 
significant spectral components’ information can be transmitted over the wireless channel. 
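
The local-processing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 64-sample frame, the two tone frequencies and the helper names `dft` and `strongest_bins` are hypothetical, and a plain O(N^2) DFT stands in for whatever FFT routine a real node would use.

```python
import cmath
import math


def dft(samples):
    """Plain O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def strongest_bins(samples, count):
    """Return (bin, complex value) pairs for the largest-magnitude bins.

    Only bins 0..N/2 are searched: the spectrum of a real signal is
    conjugate-symmetric, so the upper half adds no information.
    """
    spectrum = dft(samples)
    candidates = range(len(samples) // 2 + 1)
    ranked = sorted(candidates, key=lambda k: abs(spectrum[k]), reverse=True)
    return [(k, spectrum[k]) for k in sorted(ranked[:count])]


# Hypothetical raw time-history frame: two tones placed at bins 5 and 12.
N = 64
frame = [
    math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.sin(2 * math.pi * 12 * t / N)
    for t in range(N)
]
payload = strongest_bins(frame, 2)
print([k for k, _ in payload])
print(len(frame), "raw samples reduced to", len(payload), "spectral components")
```

In this toy case the node would transmit two (bin, value) pairs instead of 64 raw samples; how many components are worth keeping depends on the application.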
 
(Figure 10 labels: Comm Radio, Sensor, Processor, Timer, WU Radio; power scale marks: 1 nW, 1 µW, 1 mW)
 
 
Figure 11 Comparison of energy consumed per bit in a wireless sensing node [36] 
 
To get some perspective, assume that a sensor node’s radio is to survive on a single AA 
Alkaline battery of capacity 2500mAh for 5 years. If the radio is duty cycled at 1% (average 
~15mins per day) this corresponds to (5 × 365 × 24) × 0.01 = 438 hours of operation. This 
results in 5.7mA of active current for the radio to operate on, assuming no other components on 
the sensor node consume any power. If an average voltage of 1.2V is assumed for the AA 
battery, this corresponds to 6.8mW of available power for the radio while it is active. It is clear from the 
above discussion that the current state of the art for commercial radios falls short of meeting these 
requirements. If, say, the radio front-end energy efficiency is 5nJ/bit and the data rate is 
250Kbps, then the radio would consume about 1.3mW while active, which for a 1.2V supply 
would require about 1.1mA of current from the battery. Such a radio receiver could 
easily last for 5 years on a single AA Alkaline battery when duty cycled at 1% (average ~15mins 
per day). 
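
The battery budget above is simple arithmetic and can be reproduced with the short sketch below. The function names are hypothetical; the input values (2500mAh, 5 years, 1% duty cycle, 1.2V, 5nJ/bit, 250Kbps) are the ones quoted in the text, and the script keeps the unrounded results, which come out slightly below the rounded 1.3mW and 1.1mA figures.

```python
HOURS_PER_YEAR = 365 * 24


def active_hours(years, duty_cycle):
    """Total time the radio is switched on over the node lifetime."""
    return years * HOURS_PER_YEAR * duty_cycle


def active_current_ma(capacity_mah, years, duty_cycle):
    """Current budget while the radio is on, ignoring all other loads."""
    return capacity_mah / active_hours(years, duty_cycle)


hours_on = active_hours(5, 0.01)                 # about 438 hours
budget_ma = active_current_ma(2500, 5, 0.01)     # about 5.7 mA
budget_mw = budget_ma * 1.2                      # about 6.8 mW at 1.2 V

# Radio with 5 nJ/bit front-end efficiency running at 250 kbps.
radio_mw = 5e-9 * 250e3 * 1e3                    # about 1.25 mW
radio_ma = radio_mw / 1.2                        # about 1.04 mA

print(round(hours_on), round(budget_ma, 1), round(budget_mw, 1))
print(round(radio_mw, 2), round(radio_ma, 2), radio_ma < budget_ma)
```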
 
 
1.2. Background Study 
 In this section a background study of some of the recently published low-power radio 
receiver architectures is presented. The intention is to present the main ideas and techniques used 
to implement low-power radio receivers rather than going into the details of the implementation. 
A comparison table at the end summarizes the performance of the radios discussed. 
An ultra-low-power 2.4GHz transceiver for wireless sensor networks is presented in [35]. The 
receiver front-end is fully passive and utilizes an on-chip resonant matching network to achieve 
voltage gain and interface directly to a passive mixer. Figure 12 shows the proposed transceiver 
block diagram. The transceiver is demonstrated to have 1nJ per received bit and 3nJ per 
transmitted bit with 300µW transmit power and 7dB receiver noise figure. The transceiver uses 
binary FSK with a higher modulation index, trading spectral efficiency for a low-power 
architecture. The receiver consumes only 330µW of power. The achieved receiver sensitivity is 
not reported. 
 
Figure 12 Transceiver block diagram [35] 
 
In [37] a fully integrated 2.4GHz IEEE 802.15.4 compliant transceiver is presented. The 
receiver is a low-IF architecture for achieving high sensitivity at low power consumption while 
avoiding the problems associated with direct conversion receivers. The achieved sensitivity is 
-101dBm at 1% packet error rate. The receiver RF front-end consumes 10mW of power with 
5.7dB noise figure. Low power is achieved by avoiding IQ down-conversion, which requires 
power hungry LO buffers and IQ generation circuitry. In the proposed architecture a passive 
poly-phase filter (PPF) is used to down convert the signal by using only the real LO signal. A passive 
mixer is used to further reduce power. The signal loss due to the PPF and the passive mixer is 
compensated by using two cascaded LNAs which use current reuse to save power. 
 
 
 
Figure 13 RF front-end block diagram [37] 
 
A low-power 2.4GHz current-reused receiver front-end is presented in [38]. The receiver is a 
heterodyne architecture with Intermediate Frequency (IF) of 10MHz. The receiver consumes 
500µW of power with 10.2dB of noise figure. The achieved RX sensitivity is -90dBm. Low 
power operation is achieved by the current-reused folded cascode front-end shown in 
Figure 14. The achieved data rate is 100Kbps at 10^-3 BER. The receiver front-end achieves 
5nJ/bit energy efficiency. 
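
As a quick check of how the energy-per-bit figure relates to the other reported numbers, the sketch below divides the receiver power by the data rate for [38]. The helper name is hypothetical, and the calculation assumes the reported power is drawn continuously at the reported data rate.

```python
def energy_per_bit_nj(power_w, data_rate_bps):
    """Energy per bit in nJ for a given power draw and data rate."""
    return power_w / data_rate_bps * 1e9


# Values quoted for [38]: 500 µW receiver power at 100 kbps.
front_end_nj = energy_per_bit_nj(500e-6, 100e3)
print(round(front_end_nj, 3), "nJ/bit")
```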
An energy-efficient OOK transceiver for short-range wireless sensor networks is presented in 
[39]. The receiver architecture is based on non-coherent envelope detection (and hence needs no 
local oscillator) and a highly scalable RF front-end. Energy efficiency is achieved by 
leveraging un-tuned RF circuits instead of tuned RF circuits. The receiver consumes 2.6mW 
while achieving -65dBm sensitivity at a BER of 10^-3. Spectral efficiency is traded for 
energy efficiency. Figure 15 shows the receiver block diagram; only one horizontal signal gain path is 
active at any given time. The transceiver achieves 0.5nJ/bit for the receiver and 3.8nJ/bit for the 
transmitter. 
 
 
Figure 14 Proposed current-reused folded-cascode front-end [38] 
 
In [40] an ultra-low power receiver for wireless sensor networks is presented based on a 
binary frequency-shift keying (BFSK) super-regenerative principle (Figure 16). The receiver 
consumes only 215µW while achieving -75dBm sensitivity at 2Mbps data rate. The energy 
efficiency of the receiver is 0.18nJ/bit. 
 
 
Figure 15 Receiver front-end. Only a single gain path is enabled at any time [39] 
 
 
Figure 16 Block diagram for the BFSK super-regenerative receiver [40] 
 
Table 2 summarizes the performance parameters of the different radio architectures discussed 
above. It is clear that commercial radios for wireless sensor networks are far behind in terms of 
achieving sub-nJ/bit energy efficiencies which have been demonstrated by academia. In the next 
section we present how some of these design challenges are addressed using signal processing 
and hardware co-design as an alternative solution to reduce power for energy constrained 
microsystems. 
 
Table 2 Performance comparison of recently published low-power radios in JSSC 

                    [35]       [37]       [38]       [39]       [40]
Rx Power            330µW      10mW       500µW      2.6mW      215µW
Sensitivity         ---        -101dBm    -90dBm     -65dBm     -75dBm
Noise Figure        7dB        5.7dB      10.2dB     15dB       ---
Energy Efficiency   1nJ/bit    ---        5nJ/bit    0.5nJ/bit  0.18nJ/bit
 
 
1.3. Thesis contributions 
In this research we have addressed energy efficient radio communication and the sensor data 
fusion of a wireless sensor node to reduce wireless network traffic by utilizing distributed 
intelligent sensing. 
 
1.3.1. Radio Communication 
We have investigated the application of signal processing for reducing the power of radio 
communication for wireless sensor networks. One way to reduce the required transmit energy for 
each bit is to employ wide-bandwidth signals, such as ultra-wide band signals (UWB). UWB 
transmitters have been reported with the lowest transmitted energy per bit [41] [42] [43]. 
However, the required Nyquist rate and the associated signal processing algorithms to process 
the received signals generally require significant energy at the receiver. But, in certain cases 
where the signals are sparse (e.g. UWB pulses are sparse in the time domain) it may be possible 
to reduce the sampling rate at the receiver using an approach called Compressed Sensing (CS). 
 
 
─ CS is leveraged to reduce power for GHz ADCs required for sampling UWB signals 
and its impact on radio communication is studied. The design decisions for the 
proposed architecture (based on heterodyne architecture) are discussed and it is also 
benchmarked with respect to power and performance against the traditional Nyquist 
based radio architecture. 
 
Another approach to reduce radio power is to exploit the direct tradeoff between sensitivity 
and power. This can be useful for situations where a good communication channel exists 
between two wireless nodes. In that situation the extra link margin can be traded to save power 
for reduced radio sensitivity and hence communication distance. 
 
─ An IEEE 802.15.4 standard compliant low power, moderate sensitivity receiver 
(Homodyne architecture) is designed which adapts its sampling rate based on the 
quality of the communication channel. The receiver comprises an analog RF front-end, 
a flash ADC and a Near-Threshold digital baseband processor. The receiver achieves 
the lowest reported RX power for a coherent Zigbee radio and achieves 2X better energy 
efficiency for the RF front-end than the current state of the art. 
 
1.3.2. Distributed Intelligent Sensing 
As discussed previously, in a WSN with current state-of-the-art radio energy efficiencies it can be 
more efficient for some applications to perform local computations than to send the raw 
sensor data wirelessly to a cloud server, which improves the energy efficiency of the overall 
network for distributed intelligent sensing. For this purpose we designed a Near-Threshold 
hardware accelerator for arbitrary Bayesian networks which is capable of running the Clique-tree 
message passing algorithm for probabilistic inference. Bayesian networks are a class of graphical 
models which are used for making probabilistic inference. This probabilistic inference can be 
based on the prior knowledge of the system, fused with the updated knowledge from the 
sensor data. Bayesian networks have numerous applications, including diagnostics in medical expert 
systems, automobiles and infrastructure health monitoring, to name a few [44] [45]. The 
hardware accelerator can be programmed to realize a complete Bayesian network on a sensor 
node or process just part of a large distributed network. Processing the data locally 
to extract information, and sending that information rather than the raw sensor data, can result in 
a significant reduction in wireless data communication. 
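
To make the idea of fusing prior knowledge with sensor data concrete, the following is a minimal sketch of Bayes' rule on a two-node network (Fault -> Alarm). It is not the Clique-tree message passing algorithm run by the accelerator, and all probabilities and names are hypothetical values chosen only for illustration.

```python
def posterior_fault(prior_fault, p_alarm_given_fault, p_alarm_given_ok, alarm):
    """Posterior P(fault | sensor reading) for a two-node network."""
    like_fault = p_alarm_given_fault if alarm else 1.0 - p_alarm_given_fault
    like_ok = p_alarm_given_ok if alarm else 1.0 - p_alarm_given_ok
    numerator = like_fault * prior_fault
    return numerator / (numerator + like_ok * (1.0 - prior_fault))


# Hypothetical numbers: 1% prior fault rate, sensor fires on 90% of faults
# and gives a false alarm 5% of the time.
belief = posterior_fault(0.01, 0.9, 0.05, alarm=True)
print(round(belief, 4))
```

A node that computes such a posterior locally only needs to transmit the resulting belief (or a decision derived from it) rather than the raw sensor stream.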
 
 
 
 
 
 
  
 
Chapter 2 
Compressed Sensing 
This chapter presents a detailed overview of Compressed Sensing (CS). Before turning to 
CS, it is worth recalling the original problem of digital data acquisition, i.e. sampling. Given an 
analog signal of finite bandwidth W, at what rate should the analog signal be 
sampled so as not to lose any information content? This question is answered by the famous 
Nyquist-Shannon sampling theorem. The theorem states that given a band-limited signal with 
bandwidth W, the analog signal can be exactly recovered from its sample values if the sampling 
frequency (Fs) is greater than twice the maximum frequency present in the analog signal (2*W) 
[46]. 
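
A small numerical illustration of why the sampling condition matters: the sketch below samples a 3 Hz and a 1 Hz cosine at Fs = 4 Hz (below 2 x 3 Hz) and at Fs = 8 Hz (above it). The frequencies and the helper name are hypothetical choices for the example.

```python
import math


def sample_cosine(freq_hz, fs_hz, count):
    """Return `count` samples of cos(2*pi*f*t) taken at rate fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]


# Fs = 4 Hz violates Fs > 2 * 3 Hz: the 3 Hz tone aliases onto 1 Hz.
aliased = all(
    abs(a - b) < 1e-9
    for a, b in zip(sample_cosine(3.0, 4.0, 8), sample_cosine(1.0, 4.0, 8))
)

# Fs = 8 Hz satisfies the condition: the two tones give different samples.
distinct = any(
    abs(a - b) > 1e-3
    for a, b in zip(sample_cosine(3.0, 8.0, 8), sample_cosine(1.0, 8.0, 8))
)
print(aliased, distinct)
```

When the condition is violated the two tones produce identical sample sequences, so no processing applied afterwards can tell them apart.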
The sampling theorem is the underlying principle in nearly all digital data acquisition systems. 
This, together with the fact that Digital Signal Processing (DSP) was leveraging the exponential 
growth of ICs, resulted in a digital revolution. 
 
CS explores an alternative sampling scheme that doesn’t rely on the Nyquist-Shannon 
sampling theorem [47]. CS theory asserts that one can recover certain signals from fewer 
samples than that required by the traditional Nyquist-Shannon sampling theorem. This is 
achieved by making an assumption that the input signal is sparse (i.e. has few non-zero 
coefficients). 
Compressive Sensing (CS) theory states that if a signal is sparse in one domain (e.g. an 
impulse in the time domain or a Dirac delta in the frequency domain), it can be sampled 
randomly in an orthogonal domain (e.g. the frequency domain or time domain, respectively) at a 
rate less than that suggested by the Nyquist sampling theorem. The sparse signal can then be 
recovered with high probability from these fewer, compressed samples, but with an error 
proportional to the compression rate, by using a recovery algorithm (e.g. L1 minimization) [48] 
[49]. 
We next illustrate this with an example. Suppose an N-dimensional signal ‘x’ (a vector of 
length N) is S-sparse (has S non-zero values) in a certain basis ψ (e.g. the time domain). 
The signal ‘x’ is transformed into another domain in which it is not sparse. K 
samples are then taken in the non-sparse domain, where S < K << N. The K compressed 
measurements are then processed by a non-linear recovery algorithm to recover the original 
signal ‘x’. This works because, if sufficient measurements are taken, the compressed 
measurements preserve the structure and information contained in the signal ‘x’.  
Figure 17 shows a simplified block diagram of such a compressed sensing data acquisition 
system. 
 
 
 
 
Figure 17 Simplified CS block diagram 
 
Figure 18 shows an illustrative example where the signal ‘x’ is sparse with sparsity S=1 in a 
given basis ψ and with dimension N=3. The compressed measurements ‘y’ with dimension K=2, 
are taken in an orthogonal domain by projecting the sparse signal ‘x’ into a basis where the 
signal is not sparse. This transformation is achieved by a measurement matrix φ whose 
dimension is given by K x N which in this example would be 2x3. 
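
The N=3, K=2, S=1 setting can be tried numerically with the sketch below. Two points are assumptions made for illustration: the entries of the 2x3 measurement matrix `phi` are arbitrary (chosen so that no two columns are parallel), and recovery is done by a brute-force search over the three possible 1-sparse supports rather than by L1 minimization.

```python
def recover_one_sparse(phi, y):
    """Brute-force search for the best 1-sparse x explaining y = phi x."""
    rows, cols = len(phi), len(phi[0])
    best = None
    for j in range(cols):
        column = [phi[i][j] for i in range(rows)]
        energy = sum(c * c for c in column)
        # Least-squares amplitude if x is non-zero only at position j.
        amplitude = sum(c * v for c, v in zip(column, y)) / energy
        residual = sum((v - amplitude * c) ** 2 for c, v in zip(column, y))
        if best is None or residual < best[0]:
            best = (residual, j, amplitude)
    x_hat = [0.0] * cols
    x_hat[best[1]] = best[2]
    return x_hat


phi = [[1.0, 1.0, 1.0], [1.0, -1.0, 2.0]]   # hypothetical 2x3 matrix
x = [0.0, 2.5, 0.0]                          # S = 1 non-zero entry, N = 3
y = [sum(p * v for p, v in zip(row, x)) for row in phi]  # K = 2 measurements
x_hat = recover_one_sparse(phi, y)
print(y, x_hat)
```

Two measurements suffice here only because the signal is known to have a single non-zero entry; an arbitrary length-3 vector could not be recovered from them.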
 
 
 
Figure 18 CS illustrative example 
 
2.1. Matlab demonstration 
In this section compressive sensing is demonstrated with the help of two Matlab examples.  
 
 
 
– The first example deals with a signal sparse in the Frequency domain while the 
random orthogonal measurements are taken in the Time domain.  
 
– The second example deals with a signal sparse in the Time domain and the random 
measurements are taken in the Frequency domain.  
 
In both examples, noise is not considered and therefore SNR of the signals being considered is 
infinite. The l1-magic Matlab code is used which basically runs a basis pursuit algorithm to 
recover the original signal from compressed measurements [50]. The Matlab code for both 
examples is freely available to download from [51]. 
 
2.2. Sparse in Frequency 
This example demonstrates compressive sensing using a signal that is sparse in the Frequency 
domain. The signal consists of the sum of two sinusoids of different frequencies in the 
Time domain. Since the signal is sparse in the Frequency domain, K=256 random 
measurements are taken in the Time domain out of a total of N=1024 samples. This corresponds to a 
compression rate of 75%. 
 
Figure 19 shows the time domain signal and its corresponding DFT. In Figure 20 the recovered 
signal spectrum is shown along with the original signal spectrum. Finally in Figure 21 the 
recovered signal in the time domain is compared with the original time domain signal. 
 
 
 
 
Figure 19 Sparse in Frequency 
 
 
 
Figure 20 Recovered sparse signal in Frequency 
(Figure 19 panels: "Original Signal, 1024 samples with two different frequency sinusoids" and "Frequency domain, 1024 coefficients with 4 non-zero coefficients"; axes: Samples vs. Amplitude)
(Figure 20 panels: "Original Signal, Discrete Fourier Transform" and "Recovered Signal, Discrete Fourier Transform sampled with 256 samples"; axes: Samples vs. Amplitude)
 
 
 
Figure 21 Recovered signal in the Time domain 
 
2.3. Sparse in Time 
This example demonstrates compressive sensing using a signal that is sparse in the Time domain. 
The signal consists of a UWB (Ultra Wide Band) pulse in the Time domain. Since the signal is 
sparse in the Time domain and not sparse in the Frequency domain, K=90 random measurements 
are taken in the Frequency domain out of a total of N=481 samples. This corresponds to a 
compression rate of 81.3%. 
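
The compression rates quoted for the two examples follow from (N-K)/N, as the short sketch below confirms. The function name is hypothetical.

```python
def compression_rate(n, k):
    """Fraction of Nyquist-grid samples that are not acquired."""
    return (n - k) / n


freq_example = compression_rate(1024, 256)   # sparse-in-frequency example
time_example = compression_rate(481, 90)     # sparse-in-time example
print(round(100 * freq_example, 1), round(100 * time_example, 1))
```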
 
Figure 22 shows the Time domain signal and its corresponding DFT. In Figure 23 the 
recovered signal is compared with the original signal in the Time domain. 
 
(Figure 21 panels: "Original Signal, 1024 samples with two different frequency sinusoids" and "Recovered Signal in Time Domain"; axes: Samples vs. Amplitude)
 
 
Figure 22 Sparse in the Time domain 
 
 
Figure 23 Recovered sparse signal in the Time domain 
 
(Figure 22 panels: "Original Signal, UWB Pulse RF freq=4 GHz", axes: Time (x 10^-8) vs. Amplitude; "Discrete Fourier Transform of UWB pulse", axes: Samples vs. Amplitude)
(Figure 23 panels: "Original Signal, UWB Pulse RF freq=4 GHz" and "Recovered UWB Pulse Signal with 90 random samples"; axes: Time (x 10^-8) vs. Amplitude)

 
From the Matlab simulations it is clear that if the signal of interest satisfies the sparsity 
assumption then the sparse signal can be recovered with high probability from the given 
compressed measurements. This potentially allows us to sample a given sparse signal below the 
required Shannon-Nyquist sampling rate (for arbitrary signals) and hence reduce the sampling 
frequency of the Analog to Digital converter for wide-band signals to save power.
  
 
Chapter 3 
Compressed Sensing Analog Front-End for GHz ADCs 
Ultra Wide Band (UWB) signals, also known as pulse radio, are defined as radio signals with 
a bandwidth in excess of 500MHz or 20% of the center frequency. Thus by time-frequency duality 
UWB signals are very short duration pulses in time. This is conceptually illustrated in Figure 24 
along with narrowband signals [52]. Due to their high bandwidth, UWB signals can potentially 
support high data rates and can achieve very low transmit energy per bit. On the other hand, 
UWB receivers have to process wide bandwidth signals, which usually results in high power 
consumption and high energy per bit for the receiver. This is corroborated by the UWB 
transceivers presented in [53] [54]. Figure 25 shows a fully coherent DSSS UWB transceiver 
which consumes 105mW in the transmit mode and 280mW in the receive mode [53].
  
 
 
 
Figure 24 Ultra Wide Band (UWB) signaling in the frequency and time domains [52] 
 
A non-coherent UWB transceiver achieving -78dBm sensitivity is shown in Figure 26 [54].  
The transceiver consumes 0.7mW in the transmit mode and 7mW (analog only) in the receive 
mode [54].  
 
 
Figure 25 DSSS UWB transceiver block diagram [53] 
 
It is clear from Figure 24 that UWB signals are sparse in time. The Nyquist sampling rate for 
a UWB pulse requires ADC sampling at a frequency greater than a GHz, which usually results in 
prohibitively large power consumption in the ADC and following baseband processing. This 
power can be a significant fraction of the total power consumption of the entire system. In [55] 
[56] [57] the power consumption for a 6-bit ADC with a sampling rate in the range of 1.2–1.6 
GHz is greater than 150mW, which might be excessive for certain applications. Sampling the 
UWB signal below the Nyquist rate (sub-Nyquist) may lead to a low power alternative solution. 
Since our goal is not recovering the signal directly but recovering the data being transmitted by 
the UWB signal, a lower sampling rate could be possible.  
 
 
 
Figure 26 802.15.4a Compliant Fully Integrated UWB Transceiver [58] 
 
We apply CS theory [47] [49] to low power circuit design and investigate its application to 
wireless communication. We propose using the Hadamard transform as the measurement matrix, 
as opposed to a Gaussian matrix (widely used in the CS framework), taking into account the 
difficulty of implementation. We further investigate the effect of quantization of the received 
signal and the application of matched filtering in the compressed domain (smashed filtering [59]) 
instead of applying a matched filter in the time domain. We also propose a practical hardware 
implementation for computing the Hadamard Projections (HP) and provide the design criteria for 
selecting the number of compressed measurements and the resolution of a sub-Nyquist ADC. 
Further, the power saving of the sub-Nyquist ADC with the proposed Walsh Hadamard 
Transform (WHT) front-end is evaluated relative to a Nyquist ADC.  
 
3.1. Proposed Hardware 
In this section we propose a hardware architecture with practical constraints for computing 
HPs (Hadamard Projections). The goal is to propose an architecture that will not only allow us to 
take HPs [by computing the Walsh–Hadamard Transform (WHT)] but which is also amenable 
to circuit implementation in an ASIC. Figure 27 shows the block diagram of a hardware system 
exploiting the WHT as a measurement matrix in a CS system. Our signal of interest is a 
baseband UWB pulse which is sparse in the time domain. An analog WHT is computed on the 
incoming sparse signal, the output of which is sub-Nyquist sampled by randomly choosing K 
WHT coefficients out of N possible samples assuming a Nyquist grid. The compressed samples 
may then be post-processed if needed (depending on the particular receiver architecture) using a 
recovery algorithm (e.g., L1 minimization) which reconstructs the original sparse signal in the 
time domain [60] [61]. 
 
 
 
Figure 27 CS acquisition system (blocks: analog WHT, sub-Nyquist ADC taking K<<N compressed samples, recovery algorithm)

As a practical architecture exploiting CS for computing the analog WHT, we propose a 64-point 
WHT as the measurement matrix. The Hadamard coefficients are the inner products of the 
input signal with the Walsh codes as shown in matrix form in Figure 28. To compute the WHT 
in the analog domain, a discrete-time WHT front-end is proposed, as shown in Figure 29 [62]. 
The incoming UWB signal is correlated with the Walsh codes using Nyquist rate sampling 
(typically in the gigahertz range for UWB signal pulses) and discrete-time integration. 
 
 
 
 
Figure 28 Hadamard transform 
 
 
 
Figure 29 Block diagram of analog WHT front-end [62] 
 
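(Figure 28 content: the Walsh matrix, with rows of +1/-1 entries, multiplies the sparse signal vector to give the Hadamard coefficients h1, h2, ..., h64)
(Figure 29 labels: 6-bit LFSR and Walsh Generator (WG) in digital APR; custom analog path with ±1 S/H driven by the Nyquist clock, integrator with reset (R), ÷N, sub-Nyquist ADC and output buffer; Walsh code length N=64; input: time domain sparse signal; output: K Hadamard projections (HP))
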
 
The detailed operation of the proposed discrete-time WHT to compute HPs is as follows. A 6-
bit linear feedback shift register (LFSR) generates a pseudo-random number which is used by the 
Walsh code generator (WG) from [63], to generate the Walsh code of that sequency. This is 
equivalent to randomly selecting a row (i.e. a Walsh sequence) in the WHT matrix. The input is 
sampled at the Nyquist rate by the S/H circuits with either positive or negative polarity 
depending on the current value of the Walsh sequence. The circuit operates in two phases: 
sampling and integration. After the input is sampled with the appropriate polarity, the circuit 
enters into the integration phase to integrate the sampled values. At the end of the integration 
phase, the input signal has been correlated with the generated 64-point Walsh sequence. The 
integrator is reset and the computed Hadamard coefficient is then digitized by the sub-Nyquist 
ADC. The S/H circuit is sampling the input signal at the required Nyquist rate while the 
Hadamard coefficients are being output at every 64th sample of the input signal (since the input is 
correlated with a 64-point Walsh sequence).  
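
The sequence of operations described above (LFSR selects a row, the input is sampled with ±1 polarity, the samples are integrated) can be mimicked behaviorally as below. This is a sketch under several assumptions and not the circuit of [62] or the generator of [63]: the LFSR feedback taps are one arbitrary maximal-length choice, the Walsh rows are generated in natural (Hadamard) order rather than sequency order, and dividing the result by N is just a normalization chosen for the example.

```python
def lfsr6_step(state):
    """One step of a 6-bit Fibonacci LFSR (assumed taps: bits 5 and 4)."""
    feedback = ((state >> 5) ^ (state >> 4)) & 1
    return ((state << 1) | feedback) & 0x3F


def lfsr_period(seed=1):
    """Count the steps until the LFSR state returns to the seed."""
    state, steps = lfsr6_step(seed), 1
    while state != seed:
        state, steps = lfsr6_step(state), steps + 1
    return steps


def walsh_row(index, length=64):
    """Row `index` of the Sylvester-ordered Hadamard matrix (+1/-1 entries)."""
    return [1 if bin(index & j).count("1") % 2 == 0 else -1 for j in range(length)]


def hadamard_projection(samples, row):
    """Sample with +/-1 polarity, integrate over N samples, scale by 1/N."""
    return sum(s * w for s, w in zip(samples, row)) / len(row)


# Hypothetical sparse input: a single unit sample at position 10.
pulse = [0.0] * 64
pulse[10] = 1.0

state = lfsr6_step(1)          # pseudo-random row index
row = walsh_row(state)
coefficient = hadamard_projection(pulse, row)
print(state, coefficient, lfsr_period())
```

Note that an LFSR never reaches the all-zero state, so with this scheme the all-ones row (index 0) is never selected.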
 
The above process produces one Walsh coefficient. There are two possibilities for a CS 
system which requires computing K compressed coefficients. The first is the parallel architecture, in 
which the proposed hardware is repeated K times and the K Hadamard coefficients are computed 
in parallel by simultaneously correlating the input with K different Walsh sequences. This will 
require replicating the sub-Nyquist ADC K times or increasing the sampling rate of the 
sub-Nyquist ADC to (K/N) x the required Nyquist rate (here N=64). The second is the series 
architecture, in which the Hadamard coefficients are computed in series. This requires 
the input to be repeated K times so that it can be correlated with K different Walsh 
sequences in series. In this case the sub-Nyquist ADC sampling 
rate will be (1/N) x the required Nyquist rate for the input (N=64). The proposed architecture can 
be adapted for either series or parallel implementation, or a combination of both. 
 
3.2. Design Decisions 
Three important questions need to be addressed regarding architectural design decisions [62]. 
First, how to choose the optimum number of compressed measurements (K), or equivalently the 
compression ratio (N-K)/N. Second, how to choose the resolution of the sub-Nyquist ADC. Third, how the 
proposed CS hardware with a sub-Nyquist ADC compares with using a Nyquist ADC without CS. 
For this purpose a Matlab simulation was set up for the architecture shown in Figure 27 for 
different numbers of random measurements (K) and resolutions of the sub-Nyquist ADC. In order to 
investigate the effect of CS only, it is assumed in this simulation that the transmitted pulse passes 
through an ideal channel with no additive noise. Figure 30 shows the mean square error (MSE) 
between the input and the recovered pulse (averaged over 100 iterations) as a function of the 
number of random measurements for different resolutions of the sub-Nyquist ADC. The right 
y-axis marks the mean square quantization error (MSQE) for the input pulse quantized by a 
Nyquist ADC without CS for comparison. 
 
It is observed that for about K ≥ N/3 the MSE does not depend on K, but is limited by the 
resolution of the sub-Nyquist ADC quantizing the Hadamard coefficients. Furthermore, by 
comparing the recovered pulse’s MSE with the MSQE of a Nyquist ADC, we observe that the 
sub-Nyquist ADC quantizing a signal in the Hadamard domain saves roughly one bit of 
resolution for the same quantization error as for the Nyquist ADC quantizing in the time domain. 
 
 
An ADC figure-of-merit (FoM) is defined as, 

\[ \mathrm{FoM} = \frac{\mathrm{Power}}{2^{\mathrm{ENOB}} \cdot f_s} \qquad (1) \]

where ENOB is the effective number of bits, $f_s$ is the sampling frequency and Power is the 
total power consumption of an ADC. Since an ideal ADC is assumed in this work, 
ENOB is equal to the resolution of the ADC. By taking K compressed samples with the 
architecture in Figure 29, the sampling rate for the sub-Nyquist ADC reduces to (K/N) x $f_s$. 
Furthermore, as mentioned above a sub-Nyquist ADC quantizing Hadamard coefficients can 
save one bit in resolution over a Nyquist ADC quantizing time samples. 
 
 
 
Figure 30 MSE as a function of K. (x-axis: Random Hadamard Coefficients K; y-axis: mean square error averaged over 100 iterations; legend: 1 to 7 bits and Res=Infinite; right-axis marks: Nyquist MSQE at 3, 5 and 7 bit)

Combining these two factors, we define an Improvement Factor (IF) for the sub-Nyquist ADC 
as $\mathrm{IF} = 2^{\Delta} \times (N/K)$, where $\Delta$ is the savings in ENOB and (N/K) is the reduction in the sampling 
rate. For the same FoM, IF is the factor by which the power of the sub-Nyquist ADC is reduced 
relative to the Nyquist ADC. Figure 31 shows the simulated IF as a function of K for different 
resolutions of the sub-Nyquist ADC. The dashed line in the plot represents IF=1, below which 
the sub-Nyquist ADC consumes more power than the Nyquist ADC. There clearly exists an 
optimum value of K=24 for which the IF shows a peak. At the peak value of IF, the power of a 
sub-Nyquist ADC can be reduced by a factor of 6. Figure 32 shows the IF for 5 bit resolution of 
the sub-Nyquist ADC along with its constituent components. As expected the (N/K) factor 
decreases as we take more compressed measurements (K) and the resolution factor ($2^{\Delta}$) saturates 
at about K ≥ N/3, which explains the peaking effect of the IF.  
 
Now to answer the three questions relating to the architecture implementation, the resolution 
of the sub-Nyquist ADC can be chosen based on the intended application requirements for the 
MSQE. Once the resolution of the sub-Nyquist ADC is chosen, there exists an optimum value of 
K for which IF is maximum. At around the peak value of IF, the power of the sub-Nyquist ADC can 
be reduced by a factor of about 6 when compared with the Nyquist ADC.  
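
The two constituent factors can be combined numerically as in the sketch below. It assumes the improvement factor is the product of the resolution factor 2^Δ (Δ being the ENOB saving) and the sampling-rate reduction N/K, and it uses a hypothetical 1.6 GHz Nyquist rate taken from the ADC range quoted earlier. With a saving of exactly one bit this gives about 5.3 at K=24, whereas the text reports a simulated peak of about 6, so the simulated resolution factor is evidently not exactly 2.

```python
def improvement_factor(delta_enob, n, k):
    """Power reduction for equal FoM: resolution factor times N/K."""
    return (2 ** delta_enob) * (n / k)


def sub_nyquist_rate_hz(nyquist_rate_hz, n, k):
    """ADC sampling rate when K of N Hadamard coefficients are digitized."""
    return nyquist_rate_hz * k / n


N, K = 64, 24
if_one_bit = improvement_factor(1, N, K)      # assumes exactly 1 bit saved
adc_rate = sub_nyquist_rate_hz(1.6e9, N, K)   # hypothetical 1.6 GHz Nyquist rate
print(round(if_one_bit, 2), adc_rate / 1e6, "MHz")
```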
 
 
 
Figure 31 Improvement factor over Nyquist ADC 
(Figure 31 axes: Random Hadamard Coefficients K vs. Improvement Factor on a log scale; curves: Res=1 bit to Res=7 bit; annotated peak ≈6.4)

 
 
 
Figure 32 Improvement factor for Res = 5 bit. 
 
Figure 33 shows the 64-point Hadamard transform for an input pulse along with K=26 randomly 
selected coefficients chosen out of N=64. Figure 34 shows the recovered pulse in the time 
domain using SPGL1 from [61] with a 5-bit resolution for the sub-Nyquist ADC.  
 
The MSE shown in Figure 30, to first order, is a performance metric for the proposed system. 
But for a communication system we are more interested in finding how the choice of K and the 
resolution of the sub-Nyquist ADC affect the BER in an additive white Gaussian noise 
(AWGN) channel, which is investigated in the next section. 
 
 
(Figure 32 axes: Random Hadamard Coefficients K vs. Improvement Factor on a log scale; curves: Res=5 bit, N/K factor, Resolution factor; annotated peak ≈6.4)

 
 
 
Figure 33 Hadamard transform (panel title: "64-point Hadamard Transform"; axes: Random Hadamard Coefficients K vs. Amplitude (mV))

Figure 34 Recovered pulse (panel title: "Recovered Baseband Pulse, Res=5bit, K=26"; axes: Time (ns) vs. Amplitude (mV); legend: Original Pulse, Recovered Pulse)

3.3. Simulation Results 
To generate the waterfall curves assuming an ideal channel with AWGN, two different 
receiver architectures were considered, both based on matched filtering as shown in 
Figure 35 [62]. In the first architecture, compressed samples are taken in the Hadamard domain 
and the time-domain sparse signal is recovered using SPGL1 [60], which is then correlated with 
an ideal template to make bit decisions. In the second architecture, the difference is that matched 
filtering is done directly in the Hadamard domain using sub-Nyquist samples (also known as 
smashed filtering in the CS literature) rather than in the time domain after reconstruction. 
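
The second architecture can be sketched as follows: the bit decision is taken by correlating the compressed measurements with the compressed template, with no reconstruction step. Everything specific here is an assumption for illustration: the template shape, the seed, K=32, the natural-order Walsh rows, the noiseless channel and the antipodal (BPSK-like) signaling. The last lines check that when all N rows are used, the Hadamard-domain correlation equals N times the time-domain correlation, which is why the smashed filter behaves like a matched filter.

```python
import random


def walsh_row(index, length=64):
    """Row `index` of the Sylvester-ordered Hadamard matrix (+1/-1 entries)."""
    return [1 if bin(index & j).count("1") % 2 == 0 else -1 for j in range(length)]


def project(signal, rows):
    """Hadamard projections of `signal` onto the selected rows."""
    return [
        sum(s * w for s, w in zip(signal, walsh_row(r, len(signal))))
        for r in rows
    ]


def smashed_filter_bit(measurements, template_measurements):
    """Correlate in the compressed domain and slice the result to a bit."""
    statistic = sum(m * t for m, t in zip(measurements, template_measurements))
    return 1 if statistic >= 0 else 0


N = 64
template = [0.0] * N
template[20:24] = [0.5, 1.0, 1.0, 0.5]             # hypothetical baseband pulse
rows = random.Random(2014).sample(range(N), 32)    # K = 32 random rows
template_meas = project(template, rows)

bit_one = smashed_filter_bit(project(template, rows), template_meas)
bit_zero = smashed_filter_bit(project([-s for s in template], rows), template_meas)

# With all N rows, the compressed-domain correlation is N times the
# time-domain correlation (orthogonality of the Walsh rows).
all_rows = list(range(N))
full_meas = project(template, all_rows)
full_stat = sum(m * m for m in full_meas)
time_corr = sum(s * s for s in template)
print(bit_one, bit_zero, full_stat, N * time_corr)
```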
 
 
 
Figure 35 Receiver architecture for waterfall curves 
 
Figure 36 shows the bit error rate (BER) curves for both receiver architectures for different 
values of K with infinite resolution of the sub-Nyquist ADC, and compares them with an ideal 
BPSK curve. It is found that the smashed filter has better performance compared to the matched 
filter in the time domain. One explanation for this is that the recovery algorithm attempts to find 
a sparse solution in the time domain to a given set of compressed measurements K. However, a 
signal with low SNR cannot be considered sparse, because noise produces many nonzero values. 
 
[Figure 35 block diagram: Receiver 1: CS (K<<N) -> CS recovery algorithm -> matched filter -> bit slicer. Receiver 2: CS (K<<N) -> smashed filter -> bit slicer.]
 
 
 
Figure 36 BER curves for infinite resolution 
 
The recovery algorithm in the CS framework assumes a sparse solution to the given set of 
compressed measurements. As a result, the algorithm attempts to reconstruct the noise with the 
sparse solution. This affects the performance of the matched filter and results in an increased 
probability of error (Pe) at a given signal-to-noise ratio (Eb/No) for K< N. We believe that this is a 
strong function of the recovery algorithm being used and should be investigated further in future 
work. 
 
[Figure 36 plot: "Smashed Filtering Comparison, Res = Infinite"; Pe vs. Eb/N0 (dB). Legend: SPGL1 K=16, 32, 48, 64; Smashed Filter K=16, 32, 48, 64; Ideal BPSK.]

Figure 37 shows the BER curves for 5-bit resolution of the sub-Nyquist ADC quantizing the
Hadamard coefficients. In this case the BER curve for K = N = 64 doesn't overlap the ideal
BPSK curve because of the quantization noise. Next we define excess Eb/No as the extra energy
per bit required in the CS receiver to achieve the same BER as an ideal BPSK receiver. The
excess Eb/No required for different values of K at a BER of 10^-3 is shown in Figure 38. Both
receiver architectures are compared, with infinite and 5-bit resolution for the sub-Nyquist
ADC. Using a smashed filter with 5-bit resolution requires an excess Eb/No of about 1dB. The
excess Eb/No for the smashed filter with infinite resolution overlaps the theoretically
predicted SNR loss curve when K random Hadamard coefficients out of N samples are chosen [64].
The theoretical loss in SNR in dB is given by
 
$$\mathrm{SNR}_{\mathrm{loss}}\,(\mathrm{dB}) = 10\log_{10}\left(\frac{N}{K}\right) \qquad (2)$$
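
As a quick numerical illustration of this loss, the sketch below evaluates 10·log10(N/K) for the values of K used in the BER plots, with N = 64. The function name is illustrative only.

```python
import math

def snr_loss_db(n_total, k_selected):
    """Theoretical SNR loss in dB when K of N coefficients are kept."""
    return 10.0 * math.log10(n_total / k_selected)

for k in (16, 32, 48, 64):
    print(f"K={k:2d}  K/N={k / 64:.0%}  loss={snr_loss_db(64, k):.2f} dB")
```

Halving the number of coefficients costs about 3 dB, and keeping one quarter of them costs about 6 dB.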
 
 
 
Figure 37 BER curves for Res = 5 bit. 
 
This result is important for a designer to consider when designing a compressed sensing
system. There is an inherent loss in SNR by the same factor by which the samples are compressed
(choosing K random coefficients out of a total of N samples scales the SNR by K/N).
[Figure 37 plot: "Smashed Filtering Comparison, With Res=5 bit"; Pe vs. Eb/N0 (dB). Legend: SPGL1 K=16, 32, 48, 64; Smashed Filter K=16, 32, 48, 64; Ideal BPSK.]
 
 
 
Figure 38 Excess Eb/No vs. compression ratio for BER = 10^-3
 
 
3.4. Prototype Chip 
There exist many potential orthogonal domains suitable for compressive sensing. In this
research, we implement the Walsh Hadamard Transform (WHT) as the measurement matrix to be
used in CS. The WHT is chosen because Walsh codes are sequences of ±1’s that multiply the
time signal, and the inversion can be easily implemented in hardware.
Figure 39 shows the block diagram of a system exploiting CS and the discrete-time WHT 
front-end [65]. In this work an Analog WHT is computed on the incoming sparse signal, the 
output of which is sub-Nyquist sampled by randomly choosing K samples out of N Nyquist 
samples. These compressed samples are then post processed by a recovery algorithm [60] which 
reconstructs the original sparse signal.  
The circuit computes a 64-point WHT. The system comprises custom analog and synthesized 
digital logic. The function of the latter is to provide the necessary control (e.g. Walsh codes) to 
the analog blocks.  
[Figure 38 plot: excess Eb/No (dB) vs. ratio K/N (20% to 100%). Legend: SPGL1 Res=Infinite, Smashed Filter Res=Infinite, Theory, SPGL1 Res=5bit, Smashed Filter Res=5bit.]
 
 
 
 
Figure 39 Block diagram [65] 
 
[Figure 39 block diagram: analog WHT -> sub-Nyquist ADC -> recovery algorithm (CS, K<<N). This work: a digital APR section (controller, LFSR, Walsh generator (WG)) and a custom analog section (Channels 1 to 3, each with differential S/H and integrator, followed by a summing amplifier).]

The Hadamard coefficients are the inner products of the input signal with the Walsh codes, as
shown in matrix form in Figure 28. To compute the WHT, the incoming signal is correlated with
the Walsh codes using a GHz sampling rate and discrete-time integration. The speed requirements
of the sampling and integration are met by splitting the correlation operation into three
identical time-interleaved channels, as shown in Figure 39. A 6-bit LFSR generates a
pseudo-random number, which the on-chip Walsh code generator (WG) [63] uses to generate the
Walsh code of that sequency. The input is sampled with either positive or negative polarity
depending on whether it is multiplied by +1 or -1 in the Walsh sequence. The inversion is
facilitated by the cross connections at the input sampling network, as shown in Figure 40. The
sampling network is fully differential to mitigate charge injection and clock feed-through.
Each channel accommodates four differential S/H circuits, and hence the input is continuously
sampled at the Nyquist rate by time-interleaving the 12 S/H circuits. All switches in the
sampling network are implemented as NMOS switches whose gates are boosted to a supply voltage
of 1.4V, which is provided by a level converter circuit.
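
The coefficient selection described above can be sketched in software. This is a behavioral illustration only: the LFSR tap positions, the direct use of the LFSR state as the sequency, and the helper names are assumptions, not details of the fabricated chip. A Walsh code of sequency k is obtained here as the Hadamard row whose index is the bit-reversed Gray code of k.

```python
def lfsr6(state):
    """One step of a 6-bit Fibonacci LFSR (feedback from the two oldest bits; assumed taps)."""
    fb = ((state >> 5) ^ (state >> 4)) & 1
    return ((state << 1) | fb) & 0x3F

def walsh_code(sequency, n_bits=6):
    """Return the +/-1 Walsh code of length 2**n_bits with `sequency` sign changes."""
    gray = sequency ^ (sequency >> 1)
    row = int(format(gray, f"0{n_bits}b")[::-1], 2)  # bit-reversed Gray code = Hadamard row
    return [1 - 2 * (bin(row & j).count("1") & 1) for j in range(1 << n_bits)]

def hadamard_coefficient(samples, code):
    """Inner product of the sampled input with a Walsh code (polarity-switched sampling)."""
    return sum(s * c for s, c in zip(samples, code))

# Example: pick three pseudo-random sequencies and correlate a test pulse with them.
pulse = [1.0 if 20 <= n < 24 else 0.0 for n in range(64)]  # hypothetical input
state = 1
for _ in range(3):
    state = lfsr6(state)
    print(state, hadamard_coefficient(pulse, walsh_code(state)))
```

A nonzero LFSR state never reaches zero, so in this sketch the sequency-0 (DC) code is never selected.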
 
 
 
Figure 40 Input S/H 
 
Each channel has two phases, a sampling phase and an integration phase, controlled by state 
machines [65]. Figure 41 shows the timing diagram and a complete cycle for channel 1. During 
the first four clock cycles, Channel 1 is sampling the input. In the next four clock cycles, 
Channel 1 begins integrating its sampled values while Channel 2 enters the sampling mode. 
Similarly in the next four clock cycles, Channel 2 starts integrating while Channel 3 samples and 
Channel 1 completes the integration phase. The cycle repeats until the input signal is correlated 
with the 64-point generated Walsh sequence. The state machine then connects the output of the 
three channels to the summing amplifier followed by an output buffer. The output settling time is 
conservatively set to 32 clock cycles (26.7ns), after which the next random Hadamard coefficient 
is computed.  
 
[Figure 40 schematic labels: Vin+, Vin-, VCM, boosted gate.]
 
The integration time for each integrator is 6 clock cycles, or 5ns. In order to achieve this speed 
an integrator based on ZCBC (Zero Crossing Based Comparator) [66] is used as shown in Figure 
42. Since there is no closed-loop feedback, the ZCBC response time is fast but at the cost of 
overshoot at the output which needs to be compensated. For this purpose a binary weighted 7-bit 
current DAC is used which is enabled only during the charge transfer phase of the ZCBC. 
 
 
 
 
Figure 41 Timing diagram [65] 
 
[Figure 41 timing diagram: clock; Walsh generator (WG) outputs W1 to W12; S/H bank and integrator activity for Channels 1 to 3, alternating between sampling and integrating.]

The chip is fabricated in a 65nm CMOS process. The active area is 0.1425mm^2. The total
power consumption is 11.2mW: the dynamic digital power at 1.2GHz is 5.4mW, while the
analog section consumes 5.8mW. The measured Hadamard coefficients are shown in Figure 43,
along with an ideal Matlab computation of the coefficients (using the measured input pulse
shown in Figure 44) for comparison. To exploit CS, K=26 random Hadamard coefficients
are selected from the total N=64 measurements, as shown in the figure. This results in a
compression rate of (N-K)/N=59.4%. The compressed samples are then post-processed in Matlab
using an L1-minimization recovery algorithm, and the input pulse is recovered with a mean
square error of 0.3%, as shown in Figure 44.
 
 
Figure 42 ZCBC based integrator [65] 
[Figure 42 schematic labels: pre-amplifier, dynamic threshold detecting latch, pull-up current source, pull-down current source, 7-bit current DAC (D0 to D6), integration caps; clock phases Φ1, Φ2, Φ2b; nodes v1, Vb1, Vb2, vxp, vxn, vccp, vccn, voutp, voutn; reset.]
 
 
 
Figure 43 Measured Hadamard coefficients [65] 
 
 
 
Figure 44 Recovered pulse [65] 
 
[Figure 43 plot: amplitude vs. Hadamard coefficients. Legend: Measured, Matlab Computation, CS Random Samples K=26.]
[Figure 44 plot: amplitude vs. time samples. Legend: Recovered Pulse K=N (Measured), Differential Input Pulse, CS K=26 (Measured).]
 
The recovered pulses with and without CS are scaled in Matlab to match the input pulse
amplitude for better comparison. The measurements clearly show that the accuracy requirements
on the computed Hadamard coefficients are relaxed. Figure 45 shows the measured overshoot
calibration by the 7-bit current DAC of the ZCBC-based integrator. The overshoot can be
calibrated to within ±15mV (differential), which is close to the value expected from
simulation. Figure 46 shows the die photo.
We have explored the application of compressed sensing to reduce the power of the GHz ADCs
required for sampling ultra-wide-band (UWB) signals. It is found that the sub-Nyquist ADC with
the proposed WHT front-end is lower power by about a factor of 6 compared to a Nyquist ADC
for the same FoM. The WHT front-end circuit consumes 11.2mW of power while sampling at
1.2GHz. The achieved compression rate assuming an ideal ADC is 59.4% with a mean square
error of 0.3%.
 
 
 
Figure 45 Integrator overshoot calibration 
 
[Figure 45 plot: "Overshoot Calibration"; amplitude vs. time samples. Traces: differential output uncalibrated, differential output calibrated. Annotation: output enabled.]
 
 
Figure 46 Die photo
 
  
[Figure 46 die photo labels: digital APR, analog; dimensions 380 µm, 125 µm, 250 µm.]
 
Chapter 4 
2.4GHz Short-Range Zigbee Compatible RX with Near-Threshold Digital Baseband
The traditional approach of designing a circuit for the worst-case situation does not appear
to be optimal for energy constrained applications. Rather, circuits that can adapt to
the dynamic environment and adjust their energy needs to deliver the required performance
seem to be a promising approach for such applications. This has been successfully
demonstrated for microprocessor systems using dynamic voltage and frequency
scaling [67]. In this chapter we describe a radio receiver that adjusts its sampling rate for
the data payload on a per-packet basis while providing the required performance. By exploiting
the time-varying characteristic of a communication channel, it reduces the total average energy
required by the radio receiver.
 
An IEEE 802.15.4 standard [68] compatible radio receiver architecture is presented in this 
section which leverages high input SNR to save power while maintaining the desired system 
performance. The main idea is to exploit the sensitivity vs. power tradeoff in a radio receiver to 
save power. The digital baseband estimates the SNR/Link Quality of the communication channel 
and varies the average sampling rate of the processor to 25%, 50%, 75% or 100% of 2x the 
Nyquist rate. By reducing the average sampling rate in the digital baseband, power is further 
reduced while maintaining the desired link performance. 
 
4.1. IEEE 802.15.4 Standard Specification 
In this section we briefly review the IEEE 802.15.4 Low-Rate Wireless Personal Area 
Networks (LR-WPANs) standard specification. The purpose of the standard is to provide for 
ultra low complexity, ultra low cost, ultra low power consumption and low data rate wireless 
connectivity among inexpensive devices [68]. The standard defines multiple PHYs but we will 
focus only on the O-QPSK PHY in the 2.45 GHz ISM band. The frequency band and data rate 
parameters are defined in Table 3. 
 
Table 3 Frequency band and data rate for 2.45GHz O-QPSK PHY [68] 
PHY     Frequency     Spreading parameters       Data parameters
(MHz)   band (MHz)    Chip rate    Modulation    Bit rate   Symbol rate    Symbols
                      (kchip/s)                  (kb/s)     (ksymbol/s)
2450    2400-2483.5   2000         O-QPSK        250        62.5           16-ary
DSSS                                                                       orthogonal

 
Figure 47 shows the schematic view of the PHY Protocol Data Unit (PPDU) and its format is
illustrated in Figure 48.
 
 
Figure 47 Schematic view of the PPDU [68] 
 
Figure 48 Format of the PPDU [68] 
 
The length of the preamble for the O-QPSK PHYs is 8 symbols (i.e. 4 octets) and the bits in
the Preamble field are all binary zeros [68]. The binary data contained in the PPDU is encoded
using the modulation and spreading functions shown in the reference modulator block diagram of
Figure 49. Each data symbol is mapped into a 32-chip PN sequence as specified in Table 4.
 
 
 
Figure 49 Modulation and spreading functions for the O-QPSK PHYs [68] 
 
4.2. Modulation and Spreading 
The chip sequences representing each data symbol are modulated onto the carrier using O-QPSK
with half-sine pulse shaping. Even-indexed chips are modulated onto the in-phase (I)
carrier, and odd-indexed chips are modulated onto the quadrature-phase (Q) carrier. In the 2450
MHz band, each data symbol is represented by a 32-chip sequence, and so the chip rate is 32
times the symbol rate. To form the offset between I-phase and Q-phase chip modulation, the
Q-phase chips are delayed by Tc with respect to the I-phase chips, as illustrated in
Figure 50, where Tc is the inverse of the chip rate [68].
 
Table 4 Symbol-to-chip mapping for the 2450 MHz band [68] 
Data symbol Chip values (c0 c1 … c30 c31) 
0 11011001110000110101001000101110 
1 11101101100111000011010100100010 
2 00101110110110011100001101010010 
3 00100010111011011001110000110101 
4 01010010001011101101100111000011 
5 00110101001000101110110110011100 
6 11000011010100100010111011011001 
7 10011100001101010010001011101101 
8 10001100100101100000011101111011 
9 10111000110010010110000001110111 
10 01111011100011001001011000000111 
11 01110111101110001100100101100000 
12 00000111011110111000110010010110 
13 01100000011101111011100011001001 
14 10010110000001110111101110001100 
15 11001001011000000111011110111000 
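
The entries of Table 4 follow a regular pattern that can be checked in a few lines. In the sketch below (function and constant names are illustrative), symbols 1 to 7 are obtained by cyclically shifting the symbol-0 sequence right by 4 chips per symbol, and symbols 8 to 15 repeat symbols 0 to 7 with the odd-indexed chips inverted.

```python
BASE = "11011001110000110101001000101110"  # chip sequence of data symbol 0 (Table 4)

def chips_for_symbol(symbol):
    """Return the 32-chip sequence (c0 ... c31) for a 4-bit data symbol."""
    shift = 4 * (symbol % 8)
    seq = BASE[-shift:] + BASE[:-shift] if shift else BASE  # cyclic right shift
    if symbol >= 8:
        # Symbols 8..15: invert the odd-indexed chips (the ones sent on the Q carrier).
        seq = "".join(c if i % 2 == 0 else str(1 - int(c)) for i, c in enumerate(seq))
    return seq

for s in range(16):
    print(f"{s:2d} {chips_for_symbol(s)}")
```

Because only the odd-indexed chips change between symbol s and symbol s + 8, the two sequences share the same I-channel waveform and differ only on the Q channel.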
 
 
 
Figure 50 O-QPSK chip offsets [68] 
The half-sine pulse shape is used to represent each baseband chip and is given by (3), 
 
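$$p(t) = \begin{cases} \sin\left(\pi \dfrac{t}{2T_c}\right), & 0 \le t \le 2T_c \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$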
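
The pulse shape and the Tc offset between the I and Q branches can be sketched as follows. This is an illustrative baseband model, with the oversampling factor and function names chosen for the example; chips are mapped to ±1 and each chip k launches a half-sine pulse of duration 2Tc starting at k·Tc, on I for even k and on Q for odd k.

```python
import math

def half_sine(t, tc):
    """Half-sine chip pulse of (3): sin(pi*t/(2*Tc)) on [0, 2*Tc], zero elsewhere."""
    return math.sin(math.pi * t / (2.0 * tc)) if 0.0 <= t <= 2.0 * tc else 0.0

def oqpsk_baseband(chips, tc, samples_per_chip=4):
    """Return (I, Q) sample lists for a chip string such as '1101...'."""
    dt = tc / samples_per_chip
    n_samples = (len(chips) + 1) * samples_per_chip  # last pulse ends at (len + 1) * Tc
    i_out = [0.0] * n_samples
    q_out = [0.0] * n_samples
    for k, c in enumerate(chips):
        level = 1.0 if c == "1" else -1.0
        out = i_out if k % 2 == 0 else q_out  # odd chips go to Q, which starts Tc later
        for n in range(n_samples):
            out[n] += level * half_sine(n * dt - k * tc, tc)
    return i_out, q_out

i_wave, q_wave = oqpsk_baseband("11011001", tc=0.5e-6)  # Tc = 0.5 us at 2 Mchip/s
```

Once both branches are active, I^2 + Q^2 stays constant, which is the constant-envelope property that makes this modulation attractive for low-power transmitters.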
 
 
 
Figure 51 Sample baseband chip sequences (the zero sequence) with half-sine pulse shaping [68] 
 
4.3. Adaptive Sampling 
Adaptive sampling is implemented by choosing a subset of the samples on the Nyquist grid.
During synchronization, the chip initially samples at the full Nyquist rate. After timing
synchronization is achieved, the digital baseband learns the channel template from the
preamble. The acquired samples in the template are then ranked by their energy. Depending on
the link quality of the communication channel, the digital baseband determines the number of
samples to acquire per symbol for the subsequent data payload in the packet. For example,
Figure 52(a) shows the 2x Nyquist sampling of a data symbol. For 50% sampling on the 2x Nyquist
grid, the two samples carrying the most energy (samples two and three) are selected on the
2x Nyquist grid for subsequent sampling, as shown in Figure 52(b).
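
The ranking step can be sketched in a few lines. The template values below are hypothetical and the function names are illustrative; the point is only that the samples are ranked by energy and the strongest fraction is kept for the payload.

```python
def select_samples(template, fraction):
    """Rank template samples by energy and keep the strongest ones (indices in time order)."""
    keep = max(1, round(fraction * len(template)))
    ranked = sorted(range(len(template)), key=lambda n: template[n] ** 2, reverse=True)
    return sorted(ranked[:keep])

def correlate(rx, template, indices):
    """Correlate only the selected samples of the received signal with the template."""
    return sum(rx[n] * template[n] for n in indices)

template = [0.3, 0.9, 1.0, 0.4]          # hypothetical learned template, 4 samples on the grid
print(select_samples(template, 0.50))     # indices of the samples kept at 50% sampling
print(select_samples(template, 0.25))     # indices of the samples kept at 25% sampling
```

With this template, 50% sampling keeps the second and third samples (zero-based indices 1 and 2), matching the example in the text.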
 
 
  
  
  
  
Figure 52 a) Nyquist Sampling grid b) 50% samples on the Nyquist grid

A Matlab simulation was performed to evaluate the system performance of the proposed
receiver architecture. Figure 53 shows the probability of chip error (Pce) as a function of
Eb/No for different sampling rates. Note that the curve corresponding to the 2x Nyquist
sampling rate doesn’t overlap the ideal BPSK waterfall curve because, for simplicity, channel
estimation has not been performed in the digital baseband. For a target link performance of,
e.g., Pce=10^-3, the required Eb/No for the Nyquist-based receiver is greater
than 8.5dB, while the same performance can be achieved at a 50% sampling rate for Eb/No greater
than 9.2dB (0.7dB excess SNR) and at a 25% sampling rate for Eb/No greater than 11.2dB
(2.7dB excess SNR).
  
 
 
Figure 53 Probability of chip error rate
[Figure 53 plot: "Probability of Chips Error I/Q, Channel Model = Residential LOS"; Pce vs. EbNo (dB). Legend: 2x Nyquist (Res=inf), CS 50% (Res=inf), CS 25% (Res=inf), Ideal BPSK.]

4.4. Prototype Chip
Figure 54 shows the block diagram of the prototype receiver chip. The direct conversion
architecture is chosen for its integration, cost and low power benefits. The receiver
front-end doesn’t employ an LNA. The RF signal is applied directly to the active mixer
for quadrature down-conversion and then low-pass filtered. The quadrature LO signal is assumed
to be available off-chip. The receiver front-end specifications such as gain, noise figure,
linearity, sensitivity, selectivity and channel selection are described in [69] and summarized
in Table 5 for sampling at the 2x Nyquist rate and at 50% of the 2x Nyquist rate.
The channel selection anti-aliasing filter specifications can be derived from the jamming
resistance requirements. The IEEE 802.15.4 PHY requires 0-dB rejection at the adjacent channel
(±5MHz) and 30-dB rejection at the alternate channel (±10MHz). Assuming 10-dB
margins, 40-dB rejection at the alternate channel can be achieved with a third-order
Butterworth filter with a corner frequency of 1.5MHz. The filter provides about 50-dB rejection
at 10MHz from the wanted signal and thus can be used as the channel selection anti-aliasing
filter [69].
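
The quoted rejection can be checked against the ideal Butterworth magnitude response, |H(f)|^2 = 1 / (1 + (f/fc)^(2n)). The sketch below assumes an ideal third-order low-pass response and ignores component spread; the function name is illustrative.

```python
import math

def butterworth_rejection_db(f_hz, fc_hz, order):
    """Attenuation in dB of an ideal Butterworth low-pass filter at frequency f."""
    return 10.0 * math.log10(1.0 + (f_hz / fc_hz) ** (2 * order))

fc = 1.5e6
for offset in (1.5e6, 5e6, 10e6):
    print(f"{offset / 1e6:4.1f} MHz: {butterworth_rejection_db(offset, fc, 3):.1f} dB")
```

Under these assumptions the filter gives roughly 31 dB at the adjacent channel and roughly 49 dB at the alternate channel, consistent with the approximately 50-dB figure in the text.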
 
 
 
Figure 54 Prototype chip block diagram 
[Figure 54 block diagram: RF IN (2.45 GHz) -> mixer with quadrature LO (LO I+/LO I-, LO Q+/LO Q-) -> 3rd-order LPF -> PGA -> buffer -> 5-bit flash ADC (Fclk @ 4MHz) on each of the BB I and BB Q paths, with offset calibration DACs; the ADC outputs feed the on-chip digital baseband (DBB, 0.75V), which produces the output bits.]

Table 5 Rx Target Specifications
Parameters            Specification @ 1% PER for   Specification @ 1% PER for
                      2x Nyquist rate              50% of 2x Nyquist rate
SNRmin                -0.36 dB                     2 dB
Gain                  9-39 dB                      11-43 dB
NF                    < 41 dB                      < 39 dB
IIP3                  > -10 dBm                    > -10 dBm
IIP2                  > -34 dBm                    > -32 dBm
SFDR                  > 18 dB                      > 18 dB
Input power           -60 to -20 dBm               -60 to -20 dBm
LPF                   Third order, fc=1.5 MHz      Third order, fc=1.5 MHz
ADC resolution        5 bits                       5 bits
Communication range   9.7 m                        9.7 m
 
After quadrature down-conversion and low-pass filtering the baseband signal is amplified 
using programmable gain amplifiers (PGAs). DC-offset cancellation is achieved by using binary 
weighted current DACs throughout the PGAs and at the output of the LPF. A gated buffer 
interfaces the amplified baseband signal to the 5-bit Flash ADC digitizing the baseband signal at 
the sampling rate of 4MHz. 
 
The Friis equation (4) can be used to calculate the maximum line-of-sight communication 
range. The target receiver sensitivity is -60dBm. The IEEE 802.15.4 standard specifies the 
receiver sensitivity to be -85dBm and the nominal transmit power of 0dBm. Using the ISM band 
center frequency of 2.45GHz the communication range corresponding to -60dBm Rx sensitivity 
is found to be 9.7m. 
  
  
$$P_r = P_t\left(\frac{\lambda}{4\pi d}\right)^2 \qquad (4)$$
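
The range figure can be reproduced with a short calculation. The sketch below assumes free-space propagation and 0-dBi antennas at both ends (the antenna gain parameters are illustrative defaults, not values from the text).

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def friis_range_m(pt_dbm, pr_dbm, freq_hz, gt_db=0.0, gr_db=0.0):
    """Free-space distance at which the received power drops to pr_dbm."""
    path_loss_db = pt_dbm + gt_db + gr_db - pr_dbm
    wavelength = SPEED_OF_LIGHT / freq_hz
    return wavelength / (4.0 * math.pi) * 10.0 ** (path_loss_db / 20.0)

print(f"{friis_range_m(0.0, -60.0, 2.45e9):.1f} m")  # target sensitivity of this receiver
print(f"{friis_range_m(0.0, -85.0, 2.45e9):.1f} m")  # sensitivity specified by the standard
```

Relaxing the sensitivity from -85dBm to -60dBm gives up 25 dB of path loss, which shortens the free-space range by a factor of about 18.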
The Packet Error Rate (PER) is related to the Bit Error Rate (BER) by the following relation if 
acquisition effects are ignored. 
$$\mathrm{PER} = 1 - (1 - \mathrm{BER})^{N} \qquad (5)$$
Where N is the number of bits in a packet. For IEEE 802.15.4, N=160 bits and therefore 1% 
PER corresponds to 0.00625% BER. The BER of O-QPSK modulation with half-sine pulse 
shaping is given by (6) [70] 
$$\mathrm{BER} = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \qquad (6)$$
For 1% PER the required SNRmin would be 8.5dB. Employing Direct Sequence Spread 
Spectrum (DSSS) technique the radio receiver has a Processing Gain (PG) which is calculated by 
the ratio of the chip rate to the data rate. 
$$\mathrm{PG\,(dB)} = 10\log_{10}\left(\frac{\text{Chip rate}}{\text{Data rate}}\right) \qquad (7)$$
The chip rate is 2Mcps and the data rate is 250Kbps, which corresponds to a PG of about 9dB.
Hence the minimum SNR (SNRmin) required to achieve 1% PER including PG can be read off
Figure 53, which is -0.36dB for Nyquist rate sampling and 2dB for 50% Nyquist sampling rate.
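
The chain from packet error rate to required SNR can be worked through numerically. The sketch below assumes independent bit errors and the standard Q-function expression for coherent detection; the helper names are illustrative. For small error rates the BER is approximately PER/N, which gives the 0.00625% quoted above.

```python
import math

def ber_from_per(per, n_bits):
    """BER that yields a given PER for n_bits independent bits."""
    return 1.0 - (1.0 - per) ** (1.0 / n_bits)

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def oqpsk_ber(ebno_db):
    """BER = Q(sqrt(2 Eb/N0)) for a given Eb/N0 in dB."""
    return q_function(math.sqrt(2.0 * 10.0 ** (ebno_db / 10.0)))

def processing_gain_db(chip_rate, bit_rate):
    """DSSS processing gain in dB."""
    return 10.0 * math.log10(chip_rate / bit_rate)

print(ber_from_per(0.01, 160))           # exact target BER for 1% PER
print(0.01 / 160)                        # linear approximation, PER / N
print(oqpsk_ber(8.5))                    # BER at Eb/N0 = 8.5 dB
print(processing_gain_db(2e6, 250e3))    # about 9 dB
```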
 
For other parameters it is assumed that the ADC reference voltage (REFADC) is 200mV, 
reference impedance is 50 Ohms, insertion loss for the RF band select filter is 2dB and Margin is 
of 10dB. The NF of the receiver front-end is calculated by (8) where BW is assumed to be 
1.5MHz. 
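$$\mathrm{NF} = P_{\mathrm{sens}} + 174\,\mathrm{dBm/Hz} - 10\log_{10}(\mathrm{BW}) - \mathrm{SNR}_{\min} - \mathrm{IL} - \mathrm{Margin} \qquad (8)$$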
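
As a sanity check, the sketch below evaluates a standard sensitivity-based noise figure budget using the parameter values stated above (sensitivity of -60dBm, BW of 1.5MHz, 2dB insertion loss, 10dB margin). The exact form of the budget is an assumption; it is used here because it reproduces the NF entries of Table 5 when the two SNRmin values are inserted.

```python
import math

def noise_figure_budget_db(sens_dbm, snr_min_db, bw_hz, il_db=2.0, margin_db=10.0):
    """Maximum tolerable NF: sensitivity minus thermal floor, SNRmin, insertion loss and margin."""
    thermal_floor_dbm = -174.0 + 10.0 * math.log10(bw_hz)  # kTB at room temperature
    return sens_dbm - thermal_floor_dbm - snr_min_db - il_db - margin_db

print(f"{noise_figure_budget_db(-60.0, -0.36, 1.5e6):.1f} dB")  # 2x Nyquist rate column
print(f"{noise_figure_budget_db(-60.0, 2.0, 1.5e6):.1f} dB")    # 50% of 2x Nyquist rate column
```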
 
The IEEE 802.15.4 standard doesn’t specify the linearity requirements of the receiver
front-end. Hence the linearity requirements are derived from the interferer profile [69].
 
IIP3, IIP2 and SFDR are calculated as follows [24] [69],  
     
(                        )
 
      (9) 
Where Pint is the power of the interferer and Psig is the power of the desired signal. 
                                 (10) 
     
 
 
(      )             (11) 
F is the receiver noise factor. The maximum and minimum gain required from the front-end 
assuming 5-bit ADC (NADC) and a Back Off (BO) margin of 10dB is calculated as follows, 
 
                                      (12) 
                           (13) 
Where Rmax is the maximum received power which is -20dBm 
 
4.5. Analog Front-End Building Blocks 
The schematic of the analog front-end is shown in Figure 55, with external LO.  Single-to-
differential conversion of the LO signal is achieved using an on-chip LO buffer, the output of 
which is then AC coupled to a single-balanced gilbert-cell active mixer. The baseband gain is 
distributed between the active filter and the PGAs. The LPF which also acts as a channel select 
anti-aliasing filter is designed to satisfy IEEE 802.15.4 PHY requirements. 
 
 
Figure 55 Schematic of the analog front-end 
 
Single-to-differential conversion of the LO is achieved on-chip using the LO buffer shown in
Figure 56. A resistively loaded differential amplifier is cascaded with a source follower for
a DC level shift.
To save power, the receiver doesn’t use an LNA and instead relies on an active mixer to
provide RF gain. The mixer is implemented using a single-balanced Gilbert cell, shown in
Figure 57. The devices in the active mixer are sized to reduce the flicker-noise corner
frequency to <100 kHz.
 
[Figure 55 schematic labels: LO, LO buffer, RF, mixer, transconductance filter (1st order section) with gm1, gm2 and [2:0] control, transconductance filter (2nd order section) with gm2, gm3, gm4 and [2:0] controls, gm cell, offset cal. current DACs ([9:0] and [14:0] controls), PGA with gain control and gain control_b, I/Q channel, BB output; bias nodes vb, vb1, vcm.]
 
 
 
Figure 56 LO buffer
[Figure 56 schematic labels: LO, vb1, vb2, vb3, vb4, voutp, voutn.]


Figure 57 Single-balanced active mixer
[Figure 57 schematic labels: LO_P, LO_N, RF, voutp, voutn.]

The baseband gain is distributed between the active filter and the Programmable Gain (PG)
stages. The active filter is a 3rd-order Butterworth gm-C filter. The first-order and
2nd-order sections of the filter are shown in Figure 58 and Figure 59, respectively. Offset
cancellation is achieved using a binary-weighted current DAC at the input stage of the
2nd-order section. To adjust the corner frequency of the filter over process corners, the
capacitors are made tunable by a 3-bit binary control word that varies the capacitance by
±20%. The differential output of the mixer is converted to single-ended by the input stage of
the gm-C filter. The entire baseband is implemented single-ended to save power.
 
 
 
Figure 58 gm-C based active filter, first order section 
 
Programmable gain is implemented by switchable fixed-gain stages, shown in
Figure 60. The gain stage is basically a first-order gm-C stage without the capacitor. When
enabled, a transmission gate allows the input signal to bypass the gain stage, which is
disabled by a footer. Each PG stage provides a gain of about 8dB. Three
cascaded PG stages are used to achieve a total gain of roughly 24dB. For distributed offset
calibration, the current DACs are designed to reduce the DC offset to within LSB/2 of the flash
ADC.
The output of the PG stages is fed into a buffer that drives the input capacitance of the flash
ADC. The buffer is implemented by two cascaded source followers (Figure 61). The source
followers provide a DC level shift.
[Figure 58 schematic labels: Vin_P, Vin_N, vb1, vb2, vout1.]
 
 
 
Figure 59 gm-C based active filter, second order section 
 
 
 
Figure 60  Gain stage 
 
[Figure 59 schematic labels: vout1, vout2, vout, vb1, vb2.]
[Figure 60 schematic labels: Vin, vout1, vb1, vb2, Gain_cont, Gain_cont_b.]
 
 
 
Figure 61 Buffer for driving Flash ADC 
 
A 5-bit 4 MHz flash ADC is designed to digitize the baseband signal. An S/H circuit is omitted
at the input of the flash ADC because the 1MHz baseband signal isn’t fast enough relative to
the comparator speed in 65nm CMOS to cause aperture errors. The LSB size is 9.4mV for a
reference voltage of 300mV, generated off-chip. The DBB converts the thermometer code into
binary using a simple adding-encoder technique to reduce bubble and sparkle errors of the
flash ADC [71]. To reduce power, no pre-amplifier is used in the comparator, which makes the
flash converter susceptible to comparator kick-back. To reduce comparator kick-back and the
power consumption of the reference ladder, decoupling capacitors of 2pF are added to the
reference ladder.
The schematic of the comparator is shown in Figure 62; it is sized to reduce the
comparator offset to less than LSB/4. The output of the comparator is stored in an SR latch,
shown in Figure 63, to be read by the digital baseband.
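
The benefit of an adding encoder can be illustrated with a behavioral sketch. It assumes the encoder simply counts the comparators that output a one (one common reading of the technique; the details of [71] are not reproduced here) and models an ideal 31-comparator ladder with a 300mV reference.

```python
def ideal_thermometer(vin, vref=0.3, bits=5):
    """Outputs of the 2**bits - 1 comparators of an ideal flash ADC, lowest threshold first."""
    levels = 2 ** bits
    lsb = vref / levels  # 9.375 mV for 300 mV and 5 bits
    return [1 if vin > (k + 1) * lsb else 0 for k in range(levels - 1)]

def ones_count_encode(thermo):
    """Adding encoder: the binary code is the number of comparators that fired."""
    return sum(thermo)

code = ideal_thermometer(0.1)
print(ones_count_encode(code))       # clean thermometer code

bubbled = list(code)
bubbled[7] = 0                       # a single bubble below the transition
print(ones_count_encode(bubbled))    # off by only one LSB
```

A single bubble changes the count by one, so the output error is bounded to one LSB, whereas an encoder that looks for the 1-to-0 transition could see two transitions and produce a much larger error.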
 
 
[Figure 61 schematic labels: vin, vout, vb1, vb2, EN.]
 
 
 
Figure 62 Flash ADC comparator
[Figure 62 schematic labels: Vin_P, Vin_N, Reset_b, voutp, voutn.]


Figure 63 SR latch
[Figure 63 schematic labels: voutp, voutn, voutp_b, voutn_b.]

The digital IQ chips are processed by the digital baseband, whose state diagram is shown in
Figure 64. The digital baseband stays in the idle mode until the start signal is asserted, at
which point it starts processing the received chips to detect the RF signal using simple energy
detection. Once the RF signal is detected, timing synchronization is achieved by correlating
the received signal with the header template.
 
 
 
Figure 64 Digital baseband state diagram 
 
After timing synchronization, carrier phase correction is performed with the assumption that
there is no carrier frequency error between the transmitter and the receiver front-end. This
assumption is made to simplify the digital baseband. The digital baseband then estimates the
channel response and link quality. Depending on the link quality, the average sampling rate of
the digital baseband is adjusted. To increase performance under sub-sampling, the digital
baseband chooses the points with maximum energy on the Nyquist grid. Hard decision decoding is
chosen over soft decision decoding for simplicity, at a cost of roughly 3dB in performance.
Finally, despreading is performed, which outputs the IQ bits.
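
The processing sequence described above can be summarized as a small state machine. The sketch follows the order given in the text and in Figure 64; the return to idle after despreading and the function signature are assumptions made for illustration.

```python
from enum import Enum

class State(Enum):
    IDLE = "Idle"
    ED = "Energy Detection"
    TS = "Timing Synchronization"
    CPC = "Carrier Phase Correction"
    CE_LQ = "Channel Estimation & Link Quality"
    HDD = "Hard Decision Decoding"
    D = "Despreading"

def next_state(state, start=False, rf_acquired=False, synchronized=False, reset=False):
    """Return the next digital baseband state for the given control flags."""
    if reset:
        return State.IDLE
    if state is State.IDLE:
        return State.ED if start else State.IDLE
    if state is State.ED:
        return State.TS if rf_acquired else State.ED        # wait until RF energy is detected
    if state is State.TS:
        return State.CPC if synchronized else State.TS      # wait for the correlation peak
    if state is State.CPC:
        return State.CE_LQ
    if state is State.CE_LQ:
        return State.HDD                                    # sampling rate is chosen here
    if state is State.HDD:
        return State.D
    return State.IDLE                                       # assumption: idle after despreading

s = State.IDLE
for flags in ({"start": True}, {"rf_acquired": True}, {"synchronized": True}, {}, {}, {}):
    s = next_state(s, **flags)
print(s.value)
```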
 
Figure 65 shows the block diagram of the digital baseband along with conceptual illustration of 
adaptive sampling. The chip is implemented in 65nm CMOS technology with a total chip area of 
2x1 sq.mm.  
[Figure 64 state diagram: Reset=1 -> Idle; Start=1 -> ED; RF acq.=1 -> TS (RF acq.=0: remain in ED); Synch.=1 -> CPC (Synch.=0: remain in TS); then CE & LQ, HDD, D. Key: ED = Energy Detection, TS = Timing Synchronization, CPC = Carrier Phase Correction, CE = Channel Estimation, LQ = Link Quality, HDD = Hard Decision Decoding, D = Despreading.]
 
 
 
Figure 65 Digital baseband block diagram and adaptive sampling illustration 
 
 
 
Figure 66 Prototype chip die photo in 65nm CMOS 
 
[Figure 65 block diagram: BB I and BB Q inputs -> energy detection with threshold detector, correlation based timing synchronization, carrier phase correction, channel estimation & link quality, FIFO, hard decision decoding, despreading -> bits out; clock gated modules are marked. Illustration: the estimated pulse template and the ranking of its four samples by energy (1st to 4th); the SNR is estimated and the subsequent sampling rate is determined, with all samples on the Nyquist grid used at low SNR and 50% of the samples on the Nyquist grid used at high SNR.]
 
4.6. Measured Results 
Figure 67 shows the measured performance of the analog front-end along with the flash ADC
spectrum. The flash ADC achieves an ENOB of 4.3 at an input frequency of 1MHz. The total
average gain over the IF bandwidth of 1MHz is 37dB, while the average NF is 28dB. A two-tone
test at LO ± 50 kHz shows a measured IIP3 of -35dBm at the high-gain setting and -14.5dBm at
the low-gain setting. Figure 68 shows the measured energy efficiency profile of the entire
system along with the simulated energy efficiency breakdown of the radio, the BER curve, and a
radar plot of the most desirable RX metrics for comparison. In the radar plot, a bigger star
represents a superior design. This plot highlights how communication distance has been traded
off for improved energy efficiency and battery life. The measured energy efficiency is
3.5nJ/bit for the radio and 2.3nJ/bit each for the ADC and the DBB. For the BER test, the DBB
enters a state in which it receives data indefinitely. The measured sensitivity of the RX is
-52.5dBm at 10^-3 BER. From the measured BER performance, it is observed that if the input SNR
is about 3dB higher than that required at Nyquist sampling, the same link performance of 10^-3
can be maintained with the DBB operating on 25% of the Nyquist samples, with an energy
efficiency of 2.1nJ/bit. Note that the DBB power doesn’t scale in proportion to the reduction
in the sampling frequency, i.e. 25% of the Nyquist rate (a 4x reduction). This is because the
power saved by computing only 1 out of 4 samples isn’t a significant portion of the overall
digital baseband power. This can be addressed in future work by carefully designing and clock
gating the digital modules in the synthesized baseband logic. Table 6 summarizes the
performance of the system and benchmarks it against other 2.4GHz short-range Zigbee radios. The
entire system is implemented in 65nm CMOS with an active area of 0.86mm^2 (Figure 66). This
Zigbee RX has 2x better energy efficiency for the radio front-end (3.5nJ/bit) than the current
state of the art, while reporting the lowest energy per bit (8.1nJ/bit)
for a Zigbee coherent receiver with near-threshold DBB.
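
The energy-per-bit figures follow directly from the measured power of each block divided by the 250 kb/s data rate. The sketch below uses the power values reported in Table 6; the function name is illustrative.

```python
def energy_per_bit_nj(power_mw, bit_rate_bps):
    """Energy per bit in nJ for a block consuming power_mw at the given bit rate."""
    return power_mw * 1e-3 / bit_rate_bps * 1e9

BIT_RATE = 250e3  # b/s
blocks = {"radio": 0.87, "ADC": 0.57, "DBB": 0.58}  # mW, from Table 6
total = 0.0
for name, power in blocks.items():
    e = energy_per_bit_nj(power, BIT_RATE)
    total += e
    print(f"{name}: {e:.2f} nJ/bit")
print(f"total: {total:.2f} nJ/bit")
```

The three contributions sum to about 8.1 nJ/bit, with the radio front-end accounting for a little under half of the total.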
 
 
Figure 67 Measured Gain, NF, IIP3, IIP2 of the RX front-end and Flash ADC spectrum 
 
Figure 69 shows the measured baseband I-channel waveforms. The bottom plot shows the
transmitted OQPSK data on the I channel, the middle plot shows the down-converted baseband
signal at the output of the PGA, and the top plot shows the flash ADC output. The RX is tested
with an off-chip LO power of -6dBm, as shown in Figure 70. An IEEE 802.15.4 compliant packet
format is used for testing the RX. The bottom plot is obtained when the digital baseband is
receiving data indefinitely. Four RF packets are shown in the plot, each of duration 2ms.
[Figure 67 panels: (1) gain and NF (dB) vs. IF (MHz); (2) RX front-end linearity (high gain mode), baseband output (dBm) vs. RF power input (dBm), IIP3 = -35dBm @ LO ± 50KHz; (3) RX front-end linearity (low gain mode), baseband output (dBm) vs. RF power input (dBm), IIP3 = -14.5dBm @ LO ± 50KHz; (4) flash ADC spectrum at Fin=1MHz, amplitude (dB) vs. frequency (MHz), ENOB = 4.3, SFDR = 35dB.]
 
 
Figure 68 Measured Energy per bit profile along with simulated energy efficiency breakdown of 
the radio, Bit error rate and the radar plot of the system 
 
 
 
 
 
 
 
 
[Figure 68 panels: measured energy efficiency (8.1nJ/bit): Radio 3.5nJ, ADC 2.3nJ, Digital 2.3nJ; 
simulated radio energy efficiency (3.4nJ/bit): buffer 1.8nJ, Mixer 0.7nJ, bias 0.4nJ, PGA 0.3nJ, 
Filter 0.2nJ; bit error rate versus RF power (dBm) for 2x Nyquist, 75% samples, 50% samples and 
25% samples, annotated "~3dB @ 10^-3 BER for 25% samples"; radar plot with axes battery life, 
communication distance @ 7dBm EIRP and 1/(energy efficiency), comparing this work with [1] and [3].]
 
Table 6 Summary and comparison table 

                              This work          [72]               [73]               [54]
Architecture                  Zero-IF            Sliding-IF         Low IF (2MHz)      Zero-IF
Technology                    65nm               90nm               65nm               180nm
Integrated Digital Baseband   YES                NO                 NO                 NO
Data rate & modulation        250 Kbps OQPSK     250 Kbps OQPSK     250 Kbps OQPSK     250 Kbps OQPSK
RX energy efficiency          3.5 nJ/bit         7.2 nJ/bit         6.8 nJ/bit         50.4 nJ/bit
                              (excluding ADC)    (excluding ADC)    (excluding ADC)    (including ADC)
RX sensitivity                -52.5 dBm*1        -100 dBm*2         ---                -96 dBm*2
RX IIP3 (High gain)           -35 dBm            -19 dBm            -6 dBm             -18 dBm
RX IIP3 (Low gain)            -14.5 dBm          ---                ---
RX gain (dB)                  37                 76                 57                 83.5
RX NF (dB)                    28                 6                  8.5                9.5
Power Radio (mW)              0.87 @ 1V          1.8**              1.7**              12.6** (including ADC)
Power ADC (mW)                0.57 @ 1V (Flash)  0.3 (SAR)          ---                (Pipeline)
Power Digital Baseband (mW)   0.58 @ 0.75V       ---                ---                ---
Active Area (mm²)             0.86               ---                0.22               ---

*1 Sensitivity measured @ BER 10^-3 
*2 Sensitivity measured @ PER 1% 
** Excluding frequency synthesizer 
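
The energy-per-bit figures quoted for this work follow from dividing each power in Table 6 by the 250 Kbps data rate. The short check below is only a sketch of that arithmetic; the variable and function names are illustrative and not part of the original design flow.

```python
# Energy per bit = power / data rate, using the "This work" column of Table 6.
DATA_RATE_BPS = 250e3  # 250 Kbps OQPSK

power_mw = {"radio": 0.87, "adc": 0.57, "digital_baseband": 0.58}

def energy_per_bit_nj(power_milliwatts, data_rate_bps=DATA_RATE_BPS):
    """Convert a power in mW into an energy per bit in nJ/bit."""
    return power_milliwatts * 1e-3 / data_rate_bps * 1e9

per_block = {name: energy_per_bit_nj(p) for name, p in power_mw.items()}
total = sum(per_block.values())

for name, energy in per_block.items():
    print(f"{name:>17}: {energy:.2f} nJ/bit")
print(f"{'total':>17}: {total:.2f} nJ/bit")
```

The radio term evaluates to about 3.5 nJ/bit and the total to about 8.1 nJ/bit, in line with the values reported in the text.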
 
 
 
Figure 69 Measured baseband signal for -40dBm RF signal. 
[Figure 69 panels, RF = -40dBm, OQPSK modulated signal, time axis scaled by 10^-5: modulated 
transmitted data (I channel); down-converted data (I channel); digitized output (ADC output).]
 
 
Figure 70 Measurement setup and received RF packets
 
[Figure 70 content: measurement setup with a 2.45GHz LO @ -6dBm, LO IQ splitter, test chip and 
FPGA; IEEE 802.15.4 compliant packet format: synchronization header (SHR: preamble, SFD), PHY 
header (PHR), PHY payload (PSDU); plot of received packet data (base 10) versus time (ms) at 
RF = -40dBm for data payload 1001 (base 2), with preamble, SFD and PHY payload marked.]
 
Chapter 5 
Near-Threshold Hardware Accelerator for Arbitrary 
Bayesian Networks 
In Chapter 1, we discussed the challenges that WSNs must overcome to realize the vision of the 
Internet of Things (IoT). The nodes in a WSN are expected to have sensing, signal 
processing/conditioning, communication and computational capabilities. For some applications it 
is desirable that the nodes be able to function in a completely unsupervised environment while 
minimizing the offloading of raw sensor data to a cloud server, in order to improve overall 
network energy efficiency. For this reason we envision another layer of capabilities for these 
nodes: the ability to learn from the data acquired through their sensing capabilities. 
 
The idea is borrowed from machine learning, a branch of artificial intelligence. Machine learning 
is a field of study that gives computers the ability to learn without being explicitly 
programmed. It is a multi-disciplinary field that borrows techniques from statistics, probability 
and mathematical optimization (Figure 71).
 
 
Figure 71 Machine Learning 
[Figure 71 labels: Machine Learning; Statistics; Probability; Mathematical Optimization.]
 
 
Figure 72 Reducing wireless traffic through intelligent sensing 
 
Figure 72 shows a generic WSN in which embedded machine learning in a sensor node reduces the 
overall wireless network traffic. The idea is illustrated with FFT-transformed time-series data 
of eBay stock, which is reconstructed from 20 harmonics. The sensor node can learn this data 
pattern over time and transmit this information instead of sending the raw time-series data 
wirelessly. 
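
As a rough illustration of sending "knowledge" instead of raw samples, the sketch below transforms a synthetic series, keeps only the DC term and the 20 lowest harmonics, and reconstructs the series from them. The synthetic signal, the choice of the lowest 20 harmonics and all names are assumptions made for this example; a practical node would use an FFT routine rather than the naive DFT shown here.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def reconstruct(coeffs, n, harmonics):
    """Rebuild a real series of length n from the DC term and the first `harmonics` bins."""
    out = []
    for t in range(n):
        value = coeffs[0].real
        for k in range(1, harmonics + 1):
            # For real data, bin n - k is the conjugate of bin k, hence the factor 2.
            value += 2 * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(value / n)
    return out

N = 256
# Synthetic "raw sensor data": an offset, two slow components and one fast component.
raw = [100
       + 5 * math.sin(2 * math.pi * 3 * t / N)
       + 2 * math.cos(2 * math.pi * 7 * t / N)
       + 0.5 * math.sin(2 * math.pi * 40 * t / N)
       for t in range(N)]

HARMONICS = 20
knowledge = dft(raw)[:HARMONICS + 1]      # what the node would transmit
approx = reconstruct(knowledge, N, HARMONICS)
rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(raw, approx)) / N)
print(f"sent {len(knowledge)} coefficients instead of {N} samples, RMSE = {rmse:.3f}")
```

In this toy case only the 40th-harmonic component is lost, so the reconstruction error equals the RMS value of that component.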
 
[Figure 72 labels: Sensor, raw data, Embedded Machine Learning, information/knowledge. Knowledge 
instead of raw sensor data is wirelessly transmitted. Inset: FFT-transformed time-series data 
(raw data) of eBay stock reconstructed with 20 harmonics (knowledge)*.]
* Ref. http://intelligenttradingtech.blogspot.com/2010/02/fft-of-time-series-promises-and.html
 
5.1. Bayesian Networks 
Bayesian networks are one of the many frameworks in machine learning. A Bayesian network is a 
probabilistic graphical model that represents a set of random variables and their conditional 
dependencies via a directed (the edges have a source and a target) acyclic graph (a graph with no 
directed cycles) [44]. Bayesian networks are a subclass of probabilistic graphical models, which 
are graphical representations of a probability distribution that are also used for knowledge 
representation. Bayesian networks have widespread applications in medicine, finance, robotics, 
fault diagnosis, structural health monitoring and machine learning. 
In this research we are mostly interested in the application of Bayesian networks to structural 
health monitoring (SHM) and fault diagnosis. The research seeks a scalable circuit architecture 
that can be programmed for arbitrary Bayesian networks. It is expected that the ASIC can be used 
in a wireless sensor network infrastructure for autonomous operation by borrowing techniques from 
the field of machine learning. 
We next introduce Bayesian networks by way of a simple illustration. Figure 73 shows a simple 
Bayesian network example for a student [44]. The random variables in the network are the 
“Difficulty” of the course that a student takes, the “Intelligence” of the student, the “Grade” 
the student receives in the course, the “SAT” score of the student and the “Letter” of 
recommendation for the student. The graphical model represents the relationships among the 
different random variables. Probabilities are assigned to the random variables in the network: 
priors for variables without parents and conditional distributions for the others. These can be 
learned either from data or with the help of a domain expert. The network can be queried for 
inferences based on observations of different random variables. For example, given that a student 
is intelligent and the course is easy, what is the probability that the student gets a letter of 
recommendation? Or what is the probability that the student aces the SAT exam given that the 
student is not intelligent? 
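
The two example queries can be answered by brute-force enumeration of the joint distribution, as sketched below. The probability values are illustrative placeholders in the style of the textbook example and are not read from Figure 73; the encoding (0 = low/easy/weak, 1 = high/hard/strong, grades 1 to 3) and the function names are likewise assumptions of this sketch.

```python
from itertools import product

# Illustrative probabilities (assumed values, not taken from Figure 73).
P_D = {0: 0.6, 1: 0.4}                                   # Difficulty
P_I = {0: 0.7, 1: 0.3}                                   # Intelligence
P_G = {                                                  # Grade given (Intelligence, Difficulty)
    (0, 0): {1: 0.30, 2: 0.40, 3: 0.30},
    (0, 1): {1: 0.05, 2: 0.25, 3: 0.70},
    (1, 0): {1: 0.90, 2: 0.08, 3: 0.02},
    (1, 1): {1: 0.50, 2: 0.30, 3: 0.20},
}
P_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.2, 1: 0.8}}       # SAT given Intelligence
P_L = {1: {0: 0.1, 1: 0.9}, 2: {0: 0.4, 1: 0.6}, 3: {0: 0.99, 1: 0.01}}  # Letter given Grade

NAMES = ("D", "I", "G", "S", "L")
DOMAINS = {"D": (0, 1), "I": (0, 1), "G": (1, 2, 3), "S": (0, 1), "L": (0, 1)}

def joint(d, i, g, s, l):
    """Joint probability as the product of the local distributions of the network."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

def query(target, evidence):
    """P(target | evidence) by summing the joint over all consistent assignments."""
    numerator = denominator = 0.0
    for values in product(*(DOMAINS[n] for n in NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        denominator += p
        if all(world[k] == v for k, v in target.items()):
            numerator += p
    return numerator / denominator

p_letter = query({"L": 1}, {"I": 1, "D": 0})   # intelligent student, easy course
p_sat = query({"S": 1}, {"I": 0})              # high SAT score, student not intelligent
print(round(p_letter, 4), round(p_sat, 4))
```

Enumeration touches every entry of the joint distribution, which is why the exact inference algorithms discussed next organize the same sums and products more economically.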
 
Figure 73 A simple Bayesian network [45]  
 
Inference algorithms are divided into two main categories: exact inference and approximate 
inference. For example, the variable elimination and clique tree algorithms belong to exact 
inference, while inference as optimization and particle-based methods belong to approximate 
inference. The Sum-product variable elimination algorithm is a very simple brute-force algorithm, 
which is summarized in Figure 74. The details of the algorithm can be found in [44]. Here we 
briefly discuss the basic operations needed to run the algorithm. 
 
 
 
Figure 74 Sum-product Algorithm [44] 
 
The probability distributions represented in a tabular form in Figure 73 are called factors. A 
factor is a function of one or more random variables. One basic operation in a Sum-product 
algorithm is a factor product. Figure 75 shows an example of a factor product in which two 
factors are multiplied to produce a third factor. 
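
A minimal sketch of the factor product is given below, using the values of Figure 75. The representation of a factor as a tuple of variable names, cardinalities and a dictionary keyed by assignment is an assumption of this example and is not the memory layout used by the accelerator.

```python
from itertools import product

def factor_product(f1, f2):
    """Multiply two factors; each factor is (variables, cardinalities, table)."""
    vars1, card1, table1 = f1
    vars2, card2, table2 = f2
    card = dict(zip(vars1, card1))
    card.update(zip(vars2, card2))
    out_vars = list(vars1) + [v for v in vars2 if v not in vars1]
    out_table = {}
    for values in product(*(range(card[v]) for v in out_vars)):
        assignment = dict(zip(out_vars, values))
        key1 = tuple(assignment[v] for v in vars1)
        key2 = tuple(assignment[v] for v in vars2)
        # Each output entry is the product of the two matching input entries.
        out_table[values] = table1[key1] * table2[key2]
    return out_vars, [card[v] for v in out_vars], out_table

# Values from Figure 75; index 0 stands for a1, b1 or c1.
phi1 = (["A", "B"], [3, 2], {(0, 0): 0.5, (0, 1): 0.8, (1, 0): 0.1,
                             (1, 1): 0.0, (2, 0): 0.3, (2, 1): 0.9})
phi2 = (["B", "C"], [2, 2], {(0, 0): 0.5, (0, 1): 0.7, (1, 0): 0.1, (1, 1): 0.2})

phi3_vars, phi3_card, phi3_table = factor_product(phi1, phi2)
for assignment in sorted(phi3_table):
    print(assignment, round(phi3_table[assignment], 2))
```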
 
 
Figure 75 Factor product 
φ1(A,B)
a1 b1 0.5
a1 b2 0.8
a2 b1 0.1
a2 b2 0
a3 b1 0.3
a3 b2 0.9

φ2(B,C)
b1 c1 0.5
b1 c2 0.7
b2 c1 0.1
b2 c2 0.2

φ3(A,B,C) = φ1(A,B) · φ2(B,C)
a1 b1 c1 0.5·0.5 = 0.25
a1 b1 c2 0.5·0.7 = 0.35
a1 b2 c1 0.8·0.1 = 0.08
a1 b2 c2 0.8·0.2 = 0.16
a2 b1 c1 0.1·0.5 = 0.05
a2 b1 c2 0.1·0.7 = 0.07
a2 b2 c1 0·0.1 = 0
a2 b2 c2 0·0.2 = 0
a3 b1 c1 0.3·0.5 = 0.15
a3 b1 c2 0.3·0.7 = 0.21
a3 b2 c1 0.9·0.1 = 0.09
a3 b2 c2 0.9·0.2 = 0.18
 
Factor marginalization produces a new factor over a subset of the variables of a larger factor 
by summing out (eliminating) the remaining random variables. Figure 76 shows an example in which 
the random variable B has been marginalized out of a factor, resulting in a new factor that 
depends only on the random variables A and C. 
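
The marginalization in Figure 76 can be sketched as follows. The dictionary-based factor representation and the function name are assumptions of this example; the numerical values are those of the factor over A, B and C in the figure.

```python
def marginalize(variables, table, var):
    """Sum out `var` from a factor stored as {assignment tuple: value}."""
    axis = variables.index(var)
    out_vars = [v for v in variables if v != var]
    out_table = {}
    for assignment, value in table.items():
        key = assignment[:axis] + assignment[axis + 1:]   # drop the eliminated variable
        out_table[key] = out_table.get(key, 0.0) + value
    return out_vars, out_table

# Factor over (A, B, C) from Figure 76; index 0 stands for a1, b1 or c1.
phi_abc = {
    (0, 0, 0): 0.25, (0, 0, 1): 0.35, (0, 1, 0): 0.08, (0, 1, 1): 0.16,
    (1, 0, 0): 0.05, (1, 0, 1): 0.07, (1, 1, 0): 0.00, (1, 1, 1): 0.00,
    (2, 0, 0): 0.15, (2, 0, 1): 0.21, (2, 1, 0): 0.09, (2, 1, 1): 0.18,
}

out_vars, phi_ac = marginalize(["A", "B", "C"], phi_abc, "B")
for assignment in sorted(phi_ac):
    print(assignment, round(phi_ac[assignment], 2))
```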
 
 
Figure 76 Factor marginalization 
 
Factor reduction makes the probability distribution consistent with the observations. Figure 77 
shows an example in which the value c1 has been observed for the random variable C. The factor 
reduction operation takes this into account and deletes all entries of the distribution that are 
not consistent with the observation c1. 
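
Factor reduction amounts to a filter over the table entries, as the sketch below shows for the observation C = c1 of Figure 77. The representation and the function name are assumptions of this example.

```python
def reduce_factor(variables, table, var, observed):
    """Keep only the entries that agree with the observation `var` = `observed`."""
    axis = variables.index(var)
    return {a: v for a, v in table.items() if a[axis] == observed}

# Factor over (A, B, C) from Figure 77; index 0 stands for a1, b1 or c1.
phi_abc = {
    (0, 0, 0): 0.25, (0, 0, 1): 0.35, (0, 1, 0): 0.08, (0, 1, 1): 0.16,
    (1, 0, 0): 0.05, (1, 0, 1): 0.07, (1, 1, 0): 0.00, (1, 1, 1): 0.00,
    (2, 0, 0): 0.15, (2, 0, 1): 0.21, (2, 1, 0): 0.09, (2, 1, 1): 0.18,
}

reduced = reduce_factor(["A", "B", "C"], phi_abc, "C", 0)   # observe C = c1
for assignment in sorted(reduced):
    print(assignment, reduced[assignment])
```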
 
 
Figure 77 Factor reduction 
Input factor φ(A,B,C) used in Figure 76 and Figure 77:
a1 b1 c1 0.5·0.5 = 0.25
a1 b1 c2 0.5·0.7 = 0.35
a1 b2 c1 0.8·0.1 = 0.08
a1 b2 c2 0.8·0.2 = 0.16
a2 b1 c1 0.1·0.5 = 0.05
a2 b1 c2 0.1·0.7 = 0.07
a2 b2 c1 0·0.1 = 0
a2 b2 c2 0·0.2 = 0
a3 b1 c1 0.3·0.5 = 0.15
a3 b1 c2 0.3·0.7 = 0.21
a3 b2 c1 0.9·0.1 = 0.09
a3 b2 c2 0.9·0.2 = 0.18

Figure 76 result, factor over (A,C) after marginalizing B:
a1 c1 0.33
a1 c2 0.51
a2 c1 0.05
a2 c2 0.07
a3 c1 0.24
a3 c2 0.39

Figure 77 result, factor reduced by the observation C = c1:
a1 b1 c1 0.25
a1 b2 c1 0.08
a2 b1 c1 0.05
a2 b2 c1 0
a3 b1 c1 0.15
a3 b2 c1 0.09
 
5.2. Clique Tree Example 
One possible clique tree T for the Student network is shown in Figure 78 [44]. The first step 
is to generate a set of initial potentials associated with the different cliques. The initial 
potential ψi(Ci) is computed by multiplying the initial factors assigned to the clique Ci. For 
example, ψ5(J,L,G,S) = φL(L,G) · φJ(J,L,S). Now, to compute the probability P(J), we run the 
variable elimination process in such a way that J is not eliminated. We therefore select as our 
root clique some clique that contains J, for example C5 [44]. We then execute the following 
steps [44]: 
 
Figure 78 Message propagation in the Student clique tree [44] 
 
1. In C1: We eliminate C by performing Σ_C ψ1(C,D). The resulting factor has scope D. We 
send it as a message δ1→2(D) to C2. 
2. In C2: We define β2(G,I,D) = δ1→2(D) · ψ2(G,I,D). We then eliminate D to get a factor 
over G, I. The resulting factor is δ2→3(G,I), which is sent to C3. 
3. In C3: We define β3(G,S,I) = δ2→3(G,I) · ψ3(G,S,I) and eliminate I to get a factor over 
G, S, which is δ3→5(G,S). 
4. In C4: We eliminate H by performing Σ_H ψ4(H,G,J) and send out the resulting factor as 
δ4→5(G,J) to C5. 
5. In C5: We define β5(G,J,S,L) = δ3→5(G,S) · δ4→5(G,J) · ψ5(G,J,S,L). 
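
The five steps above can be sketched in a few lines of Python. To keep the example self-contained, all variables are taken to be binary and the initial potentials are filled with arbitrary positive numbers, so the result is an unnormalized marginal over J rather than the P(J) of the Student network; these choices and all function names are assumptions of this sketch. A brute-force computation over the full product is included as a cross-check.

```python
import math
import random
from itertools import product

def make_factor(variables, rng):
    """Arbitrary positive table over binary variables: (variables, {assignment: value})."""
    return variables, {a: rng.uniform(0.1, 1.0)
                       for a in product((0, 1), repeat=len(variables))}

def multiply(f, g):
    """Factor product over the union of the two scopes."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for a in product((0, 1), repeat=len(out_vars)):
        env = dict(zip(out_vars, a))
        table[a] = ft[tuple(env[v] for v in fv)] * gt[tuple(env[v] for v in gv)]
    return out_vars, table

def sum_out(f, var):
    """Factor marginalization: eliminate `var` by summation."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for a, value in ft.items():
        key = a[:i] + a[i + 1:]
        table[key] = table.get(key, 0.0) + value
    return fv[:i] + fv[i + 1:], table

rng = random.Random(0)
psi1 = make_factor(("C", "D"), rng)
psi2 = make_factor(("G", "I", "D"), rng)
psi3 = make_factor(("G", "S", "I"), rng)
psi4 = make_factor(("H", "G", "J"), rng)
psi5 = make_factor(("G", "J", "S", "L"), rng)

delta_12 = sum_out(psi1, "C")                          # step 1: message over D
delta_23 = sum_out(multiply(delta_12, psi2), "D")      # step 2: message over G, I
delta_35 = sum_out(multiply(delta_23, psi3), "I")      # step 3: message over G, S
delta_45 = sum_out(psi4, "H")                          # step 4: message over G, J
beta5 = multiply(multiply(delta_35, delta_45), psi5)   # step 5: belief at the root clique

def marginal_of_j(factor):
    """Sum out every variable except J."""
    for var in [v for v in factor[0] if v != "J"]:
        factor = sum_out(factor, var)
    return factor[1]

# Brute-force reference: multiply all potentials, then sum out everything but J.
full = psi1
for psi in (psi2, psi3, psi4, psi5):
    full = multiply(full, psi)

p_tree = marginal_of_j(beta5)
p_brute = marginal_of_j(full)
print(p_tree, p_brute)
```

The clique tree computes the same marginal while never building a table larger than the 16 entries of the root clique, whereas the brute-force product has 256 entries.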
[Figure 78 content: clique tree 1: C,D - 2: G,I,D - 3: G,S,I - 5: G,J,S,L - 4: H,G,J, with 
messages δ1→2(D) = Σ_C ψ1(C1); δ2→3(G,I) = Σ_D ψ2(C2) · δ1→2; δ3→5(G,S) = Σ_I ψ3(C3) · δ2→3; 
δ4→5(G,J) = Σ_H ψ4(C4).]
 
The factor β5 is a factor over G,J,S,L that encodes the joint distribution P(G,J,L,S): all the 
CPDs (Conditional Probability Distributions) have been multiplied in, and all the other variables 
have been eliminated. If we now want to obtain P(J), we simply sum out G, L, and S. In the same 
way, we can apply exactly the same process to compute the distribution over any other variable 
[44]. 
So far, we have assumed that a clique tree is given to us. How do we construct a clique tree 
for a given Bayesian network or for a set of factors? There are two basic approaches, the first 
based on variable elimination and the second on direct graph manipulation. The details can be 
found in [44]. 
 
5.3. Distributed Intelligent Sensing 
We propose a hardware accelerator that can be used to run the Sum-product inference 
algorithm or a Clique-tree message passing algorithm (a variant of Sum-product inference 
algorithm) [44] for arbitrary Bayesian networks. The Clique-tree message passing algorithm has 
been chosen as it can be very easily adapted for distributed intelligent sensing. A Clique or a 
group of cliques can be processed on a sensor node while message passing to cliques processed 
on other sensor nodes can be achieved over a wireless channel. 
 
Figure 79 shows a hypothetical scenario with a complex Bayesian network. The proposed 
hardware accelerator can be configured to process the entire Bayesian network or a sub-part of it. 
For example, as shown in the figure, the inertial sensor node on the ankle of a person can 
correspond to the highlighted portion of the Bayesian network, and only the relevant information 
is wirelessly transmitted to other parts of the network. The entire Bayesian network might be 
monitoring the health of a person and can make intelligent decisions by fusing prior knowledge of 
the subject with updated knowledge from the sensors. This allows intelligent as well as need-based 
sensing. The network can request an update from a sensor based on the computed probabilistic 
inference about an imminent health concern and can thereby possibly improve the confidence level 
of a decision. This can then be used to operate these sensor nodes in completely unsupervised, 
autonomous environments and even to allow them to actuate if necessary. 
 
 
 
Figure 79 Proposed distributed processing 
 
 
5.4. Proposed Hardware Accelerator 
Figure 80 shows the simplified block diagram of the proposed architecture, and its 
corresponding state diagram is shown in Figure 81. A scan chain is used to move data in and 
out of the memory of the hardware accelerator. The accelerator is first configured into 
Factor Product, Factor Marginalization or Factor Normalization mode using a configuration table. 
The accelerator can store a maximum of 1K entries of a factor at a time and supports up to 
20 variables, each taking up to 256 possible values. The theoretical maximum size of a factor 
that the proposed hardware accelerator can handle is therefore 256^20 = 2^(8x20) = 2^160 entries.  
 
 
 
Figure 80 Block Diagram 
[Figure 80 blocks: Scan Chain IO; FSM (clk, reset, start); Node Config. Table holding Control bits 
(8 bits), Mask_A, Mask_B and Mask_marg (20 bits each) and Card[0] to Card[19] (8 bits each); 
Assignment Generation (Card[0] to Card[19] in, Assignment[0] to Assignment[19] out, 8 bits each); 
Assignment to Index A and Assignment to Index B (using Card[0] to Card[19], Mask_marg and Mask_A or 
Mask_B; outputs Index_A and Index_B, 10 bits each); Reg. File A and Reg. File B (signals A1A, A2A, 
Q1A, Q2A, A1B, A2B, DB); Factor Product (C = A + B); Factor Marginalization (Σ_Y Φ(X,Y), Data_marg) 
with a LUT for log-log addition on 6-bit operands; Factor Normalization (B = B - ΣB, Cum_sum).]
 
 
Figure 81 State Diagram 
 
A factor, i.e. a multidimensional table with an entry for each possible assignment to its 
variables, is stored in memory as a single flattened array. For each variable its cardinality is 
stored, and its stride (step size) in the factor is computed on the fly. For example, the factor 
φ3 shown in Figure 75 has variables A, B and C with cardinalities 3, 2 and 2 respectively, and we 
can represent the factor in memory by a linear array. Here the stride for variable C is 1, for B 
it is 2 and for A it is 4. If we add a fourth variable, D, its stride would be 12. Using the 
strides we can convert a variable assignment to the corresponding index into the linear factor 
array [44]: 
 
$$\mathrm{index} = \sum_{i} \mathrm{assignment}[i] \cdot \mathrm{stride}[i] \qquad (14)$$
[Figure 81 states: Idle, Config. Node, Reg. File Read, Reg. File Write, Factor Product, 
Factor Marginalization, Factor Normalization; transition label: Reset=1.]
 
One of the key tasks is indexing the appropriate entries in each factor for the operation to be 
performed. To compute the corresponding index for each factor operation, variable assignments 
are generated on the fly. This is accomplished by 20 cascaded counters, each counting up to the 
cardinality of the corresponding variable. 
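
The stride-based indexing and the cascaded counters can be sketched together as follows. The convention that the last variable varies fastest is an assumption chosen to match the row order of Figure 75, and the function names are illustrative; the hardware uses 20 counters, whereas the sketch uses one counter per variable of the example factor.

```python
def strides(cardinalities):
    """Stride of each variable when the last variable varies fastest."""
    result = [1] * len(cardinalities)
    for i in range(len(cardinalities) - 2, -1, -1):
        result[i] = result[i + 1] * cardinalities[i + 1]
    return result

def assignment_to_index(assignment, stride):
    """Index into the flattened array: sum over i of assignment[i] * stride[i]."""
    return sum(a * s for a, s in zip(assignment, stride))

def assignments(cardinalities):
    """Cascaded counters: the last counter ticks fastest and carries into the previous one."""
    counters = [0] * len(cardinalities)
    while True:
        yield tuple(counters)
        for i in range(len(counters) - 1, -1, -1):
            counters[i] += 1
            if counters[i] < cardinalities[i]:
                break
            counters[i] = 0          # wrap around and carry into the next counter
        else:
            return                   # every counter wrapped: enumeration is complete

card = [3, 2, 2]                     # cardinalities of A, B, C in Figure 75
stride = strides(card)               # [4, 2, 1]
for assignment in assignments(card):
    print(assignment, "->", assignment_to_index(assignment, stride))
```

With this convention the 12 assignments are visited in the same order as the rows of φ3 in Figure 75 and map to the indices 0 to 11.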
Operations such as the factor product involve multiplying many small numbers together, which 
can lead to underflow due to finite-precision arithmetic [44]. This is addressed by 
renormalizing the factor after each operation. However, if each entry in the factor is computed 
as the product of many terms, underflow can still occur [44]. This is avoided by performing the 
computation in log-space, which replaces multiplications with additions. The factor 
marginalization operation, however, requires summing entries. To avoid converting from log-space 
to linear-space and then converting the result back into log-space, a look-up table for addition 
is used instead, which speeds up the operation.  
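
One common way to add two numbers directly in log-space uses the identity log(a + b) = max(la, lb) + log(1 + exp(-|la - lb|)), where the correction term depends only on the difference of the two logarithms and can therefore be tabulated. The sketch below illustrates this idea in floating point; the table size, the step size and the use of natural logarithms are assumptions of this example and do not describe the number format of the fabricated chip.

```python
import math

STEP = 0.125     # assumed resolution of the difference, in natural-log units
ENTRIES = 64     # assumed table size (6-bit address)
LUT = [math.log1p(math.exp(-k * STEP)) for k in range(ENTRIES)]

def log_mul(la, lb):
    """The product of two values becomes an addition of their logarithms."""
    return la + lb

def log_add(la, lb):
    """log(exp(la) + exp(lb)) using a look-up table for the correction term."""
    hi, lo = max(la, lb), min(la, lb)
    if lo == -math.inf:              # adding a zero probability changes nothing
        return hi
    k = int(round((hi - lo) / STEP))
    # Beyond the end of the table the smaller term is treated as negligible.
    return hi + (LUT[k] if k < ENTRIES else 0.0)

product_value = math.exp(log_mul(math.log(0.5), math.log(0.5)))   # 0.5 * 0.5
sum_value = math.exp(log_add(math.log(0.25), math.log(0.08)))     # 0.25 + 0.08
print(product_value, sum_value)
```

The sum is approximate because the difference is rounded to the nearest table entry; a finer step or a larger table trades area for accuracy.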
The factor product unit operates on two factors at a given time while the factor 
reduction/marginalization and renormalization unit operates on a single factor. 
 
5.5. ALARM Network 
To test the proposed hardware accelerator, a medium-size Bayesian network is chosen for 
benchmarking the performance. Figure 82 shows the ALARM (A Logical Alarm Reduction 
Mechanism) network [74], an expert system for patient monitoring designed to reduce the false 
alarm rate in a hospital setting. There are 9 observation variables in the network, e.g. 
History, HR (Heart Rate), EKG and BP (Blood Pressure), and 8 diagnostic variables. The 
network consists of 27 cliques; the maximum number of variables in a clique is 5 and the 
maximum clique size is 144. 
 
 
Figure 82 ALARM Bayesian network [74] 
 
For the clique tree message passing algorithm, the ALARM network is converted offline into its 
corresponding clique tree, shown in Figure 83, using the HUGIN Expert software [75]. A Matlab 
simulation model is developed to evaluate the design tradeoff between accuracy and bit-resolution 
for log-space computation. The simulated bit-resolutions are 6, 8, 16, 32 and 64 bits. As 
expected, the RMSE (Root Mean Square Error), computed by comparing the quantized computations 
with infinite-precision results, decreases with increasing bit-resolution, as shown in Figure 84. 
A 6-bit resolution is chosen to limit the silicon area of the prototype chip to approximately 
1 square mm. As seen in the graph, the RMSE for 6-bit resolution is almost the same as for 8-bit 
resolution. At such low resolutions, the error is limited by overflow due to the small dynamic 
range. The silicon area is dominated by the look-up table logic, which is a strong function of 
the chosen bit-resolution. To estimate the area overhead for higher bit-resolutions, to a 
first-order approximation the look-up table logic area is assumed to double each time the 
bit-precision doubles, as shown in Figure 84. It should be noted that the RMSE vs. resolution 
tradeoff shown in the figure is not only network dependent but also varies from variable to 
variable within a network. 
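
The general shape of such an accuracy study can be illustrated with a toy experiment: quantize log-domain values to a given number of bits, compute a product, and compare with double precision. The sketch below is not the Matlab model used in this work; the fixed dynamic range, the uniform quantizer, the random test data and all names are assumptions made only to show how the RMSE can be measured as a function of resolution.

```python
import math
import random

def quantize(log2_value, bits, log_range=32.0):
    """Round a non-positive log2 value to a grid of 2**bits levels over [-log_range, 0]."""
    step = log_range / (2 ** bits - 1)
    clipped = max(-log_range, min(0.0, log2_value))   # saturate outside the dynamic range
    return round(clipped / step) * step

def product_rmse(bits, trials=2000, seed=1):
    """RMSE of p*q computed from quantized log2 values, against double precision."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        p, q = rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0)
        approx = 2.0 ** (quantize(math.log2(p), bits) + quantize(math.log2(q), bits))
        total += (approx - p * q) ** 2
    return math.sqrt(total / trials)

rmse = {bits: product_rmse(bits) for bits in (6, 8, 16, 32)}
for bits, error in rmse.items():
    print(f"{bits:2d} bits: RMSE = {error:.3e}")
```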
 
Figure 83 Clique tree ALARM network 
 
 
Figure 84 Resolution, accuracy and area tradeoff 
 
5.6. Measured Results 
The prototype chip is fabricated in a 65nm CMOS process with an active area of 0.52mm², as 
shown in Figure 86. The chip is operated at a near-threshold voltage of 0.5V to save power and 
improve energy efficiency, with a maximum clock frequency of 33MHz. The total active power is 
218uW. The total energy computed for the ALARM network is 76.2nJ and is distributed among its 
constituent components as shown in Table 7; the corresponding pie chart is shown in Figure 85. 
This energy computation doesn’t take into account the energy required for IO operations, e.g. 
the energy spent moving data in and out of the memory. 
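
A few derived quantities follow from the reported numbers if one assumes that the 218uW active power and the 33MHz clock apply throughout the ALARM computation. That assumption, and the derived values themselves, are not stated in the measurements above; the snippet is only a back-of-the-envelope sketch.

```python
POWER_W = 218e-6      # total active power at 0.5V
CLOCK_HZ = 33e6       # maximum clock frequency at 0.5V
ENERGY_J = 76.2e-9    # total energy reported for the ALARM network

energy_per_cycle_pj = POWER_W / CLOCK_HZ * 1e12   # energy drawn per clock cycle
runtime_us = ENERGY_J / POWER_W * 1e6             # implied computation time
cycles = runtime_us * 1e-6 * CLOCK_HZ             # implied number of clock cycles

print(f"{energy_per_cycle_pj:.2f} pJ/cycle, {runtime_us:.1f} us, {cycles:.0f} cycles")
```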
 
[Figure 84 plot: scaled RMSE (logarithmic axis, 10^-18 to 10^0) and area (mm², logarithmic axis, 
10^0 to 10^1) versus resolution (axis from 10 to 70); one RMSE curve per diagnostic variable: 
LVFAILURE, HYPOVOLEMIA, ANAPHYLAXIS, INSUFFANESTH, PLUMEMBOLUS, INTUBATION, DISCONNECT, 
KINKEDTUBE.]
 
Table 7 ALARM network energy profile 
Network Initialization 12.2nJ 
Upward message passing 21.8nJ 
Downward message passing 20.1nJ 
Belief computation 21.3nJ 
8-Diagnostic variables query 0.7nJ 
 
 
 
Figure 85 Energy distribution for ALARM network 
 
 
 
Figure 86 Die photo in 65nm CMOS 
 
There are 8 diagnostic variables in the ALARM network; after the clique-tree algorithm has 
converged, the network can be queried for probabilistic inference. The energy required to 
compute the probabilistic inference is computed for each of the queried variables: it is 86pJ 
each for LV FAILURE, HYPOVOLEMIA, ANAPHYLAXIS, INSUFFANESTH, DISCONNECT, KINKEDTUBE and 
PLUMEMBOLUS, and 99pJ for INTUBATION. These values are obtained by multiplying the number of 
factor operations required for each inference by the energy required to perform each operation. 
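
The per-variable query energies and the entries of Table 7 can be cross-checked with simple sums, as sketched below. The dictionary names are illustrative; the numbers are those reported in the text and in Table 7.

```python
# Energy per diagnostic-variable query, in pJ, as reported in the text.
query_pj = {
    "LVFAILURE": 86, "HYPOVOLEMIA": 86, "ANAPHYLAXIS": 86, "INSUFFANESTH": 86,
    "DISCONNECT": 86, "KINKEDTUBE": 86, "PLUMEMBOLUS": 86, "INTUBATION": 99,
}
query_total_pj = sum(query_pj.values())

# Table 7 entries, in nJ.
table7_nj = {
    "Network initialization": 12.2,
    "Upward message passing": 21.8,
    "Downward message passing": 20.1,
    "Belief computation": 21.3,
    "8 diagnostic variables query": 0.7,
}
total_nj = sum(table7_nj.values())

print(f"queries: {query_total_pj} pJ = {query_total_pj / 1000:.1f} nJ")
print(f"Table 7 total: {total_nj:.1f} nJ")
```

The eight queries add up to 701pJ, matching the 0.7nJ entry of Table 7, and the table entries sum to 76.1nJ, which agrees with the reported 76.2nJ total to within the rounding of the individual entries.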
 
 
  
[Figure 86 die photo labels: Scan Chain, RF, RF.]
 
Chapter 6 
Conclusion 
Exponential growth in miniaturized sensors is expected in the near future and will fuel 
pervasive and virtually invisible intelligent sensing. The sensor density around a person is 
expected to increase from a few hundred to thousands, which corresponds to roughly a trillion 
networked sensors on the planet. The microsystems encompassing these sensors will need high 
energy efficiency for computation, communication and sensing while keeping form factor and cost 
to a minimum. This is mainly because many of these microsystems are expected to operate at the 
edge of the cloud, with a battery lifetime of 10+ years or batteryless operation from harvested 
energy, while being deployed in large numbers. This poses new design challenges and 
opportunities for circuit designers, especially for wireless communication ICs (Integrated 
Circuits), as these consume a significant amount of power when active in a miniaturized 
microsystem. Different design tradeoffs can be made in view of the energy-constrained design 
space. 
 
In order to help realize the vision of the Internet of Things (IoT) and a trillion networked 
sensors, we have investigated techniques to improve the energy efficiency of wireless 
communication and to fuse the influx of sensor data for distributed intelligent sensing. 
Depending on the application, this can reduce the overall wireless communication of the entire 
network by avoiding the transmission of raw sensor data over the wireless channel to a cloud 
server, and hence make the network more energy efficient. 
The first technique studied to reduce radio power is compressed sensing for Ultra-Wide-Band 
(UWB) communication. Compressed sensing allows sparse signals to be sampled below their 
Shannon-Nyquist sampling rate, and is proposed here to relax the ADC sampling rate and thereby 
save power. A 1.2GS/s Hadamard transform front-end is designed that allows an ADC to take 
compressed measurements. It is found that for radio communication it is better to perform 
matched filtering in the compressed domain (smashed filtering) than to perform matched filtering 
after recovering the sparse signal from the compressed measurements. This leads to a design 
tradeoff between the bit error rate performance and the compression ratio, which is investigated 
in detail. 
The second technique explored for a low-power radio architecture is to exploit the direct 
trade-off between sensitivity and power and to adapt the sampling rate of the digital baseband 
to the quality of the communication channel. A short-range, low-power Zigbee compatible radio is 
presented. The short range is suitable for communication in the vicinity of a handheld device, 
which is sufficient for many IoT applications that place the highest priority on power 
consumption. 
 
The third technique relates to fusing the sensor data for distributed intelligent sensing while 
reducing the transmission of raw sensor data over a wireless communication channel in a network. 
For this purpose a near-threshold hardware accelerator is presented for arbitrary Bayesian 
networks. The hardware accelerator can be used to run the Clique-tree message passing 
algorithm in a distributed sensor network.  
It is expected that embedded machine learning will play a key role in realizing distributed 
intelligent sensing in the vision of a trillion networked sensors, in which intelligent sensors 
will not only learn from the raw sensor data but also adapt to their surroundings and be able to 
actuate in a completely autonomous and unsupervised environment. This will open tremendous 
design opportunities for circuit designers as well as software engineers to fully exploit the 
potential of a trillion networked devices capable of generating massive amounts of data. 
 
  
 
References
 
[1]  G. E. Moore, "Cramming more components onto integrated circuits," Electronics, vol. 38, no. 8, April 
1965.  
[2]  International Technology Roadmap For Semiconductors, "Executive Summary," 2011. 
[3]  Intel Inc., [Online]. Available: 
http://www.intel.com/content/www/us/en/silicon-innovations/moores-law-technology.html. [Accessed 05 11 2012]. 
[4]  P. Baronti, P. Pillai, V. W. C. Chook, S. Chessa, A. Gotta and Y. F. Hu, "Wireless sensor networks: 
A survey on the state of the art and the 802.15.4 and ZigBee standards," Computer Communications, 
vol. 30, no. 7, pp. 1655-1695, 2007.  
[5]  L. Atzori, A. Iera and G. Morabito, "The Internet of Things: A survey," Computer Networks, vol. 54, 
no. 15, pp. 2787-2805, Oct 2010.  
[6]  M. A. Hanson, H. C. Powell, A. T. Barth, K. Ringgenberg, B. H. Calhoun, J. H. Aylor and J. Lach, 
"Body area sensor networks: Challenges and opportunities," Computer, vol. 42, no. 1, pp. 58-65, 2009.  
[7]  Probe srl, "Precision Agriculture: Probe srl," [Online]. Available: 
http://www.probesrl.net/eng/solutions/wireless-solutions/precision-agriculture. [Accessed 7 11 2012]. 
[8]  R. Bogue, "Swarm intelligence and robotics," Industrial Robot: An International Journal, vol. 35, no. 
6, pp. 488-495, 2008.  
[9]  I. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci, "Wireless sensor networks: a survey," 
Computer Networks, vol. 38, no. 4, pp. 393-422, March 2002.  
[10]  B. Shaw, "Low-PowerDesign," [Online]. Available: 
http://low-powerdesign.com/sleibson/2011/05/01/future-cars-the-word-from-gm-at-idc%E2%80%99s-smart-technology-world-conference/. 
[Accessed 7 11 2012]. 
[11]  R. Yu, "Phys.org," [Online]. Available: 
http://phys.org/news/2012-05-wearable-devices-track-people-wireless.html. [Accessed 7 11 2012]. 
[12]  H. Alemdar and C. Ersoy, "Wireless sensor networks for healthcare: A survey," Computer 
Networks, vol. 54, no. 15, pp. 2688-2710, 2010.  
[13]  M. Broussely and G. Archdale, "Li-ion batteries and portable power source prospects for the next 5–10 
years," Journal of Power Sources, vol. 136, no. 2, pp. 386-394, 2004.  
[14]  S. Roundy, E. Leland, J. Baker, E. Carleton, E. Reilly, E. Lai, B. Otis, J. Rabaey, P. Wright and V. 
Sundararajan, "Improving power output for vibration-based energy scavengers," Pervasive Computing, 
IEEE, vol. 4, no. 1, pp. 28-36, 2005.  
[15]  T. Harms, S. Sedigh and F. Bastianini, "Structural health monitoring of bridges using wireless 
sensor networks," Instrumentation & Measurement Magazine, vol. 13, no. 6, pp. 14-18, 2010.  
[16]  J. P. Lynch and K. J. Loh, "A summary review of wireless sensors and sensor networks for structural 
health monitoring," Shock and Vibration Digest, vol. 38, no. 2, pp. 91-130, 2006.  
[17]  I. Stoianov, L. Nachman, S. Madden and T. Tokmouline, "PIPENET: A wireless sensor 
network for pipeline monitoring," Information Processing in Sensor Networks, vol. 6, pp. 264-273, 
2007.  
[18]  T. Becker, M. Kluge, J. Schalk, K. Tiplady, C. Paget, U. Hilleringmann and T. Otterpohl, 
"Autonomous sensor nodes for aircraft structural health monitoring," Sensors, vol. 9, no. 11, pp. 
1589-1595, 2009.  
[19]  H. Kdouh, G. Zaharia, C. Brousseau, G. E. Zein and G. Grunfelder, "ZigBee-based sensor network 
for shipboard environments," Signals, Circuits and Systems (ISSCS), no. 10, pp. 1-4, 2011.  
[20]  S. Kim, S. Pakzad, D. Culler, J. Demmel, G. Fenves, S. Glaser and M. Turon, "Health monitoring of 
civil infrastructures using wireless sensor networks," Information Processing in Sensor Networks, pp. 
254-263, 2007.  
[21]  K.-C. Lu, Y. Wang, J. P. Lynch, C. H. Loh, Y.-J. Chen, P. Y. Lin and Z. K. Lee, "Ambient vibration 
study of the Gi-Lu cable-stay bridge: Application of wireless sensing units," Proceeding of SPIE 13th 
Annual Symposium on Smart Structures and Materials, 2006.  
[22]  V. Shnayder, M. Hempstead, B.-r. Chen, G. W. Allen and M. Welsh, "Simulating the power 
consumption of large-scale sensor network applications," Proceedings of the 2nd international 
conference on Embedded networked sensor systems, pp. 188-200, 2004.  
[23]  B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, 2000.  
[24]  B. Razavi, in RF Microelectronics, New Jersey, Prentice Hall, 1998, pp. 48-49. 
[25]  Texas Instruments, "CC2500," Texas Instruments, [Online]. Available: 
http://www.ti.com/product/cc2500. [Accessed 1 12 2012]. 
[26]  Texas Instruments, "CC2540," Texas Instruments, [Online]. Available: 
http://www.ti.com/product/cc2540. [Accessed 1 12 2012]. 
[27]  Analog Devices, "ADF7242," Analog Devices, [Online]. Available: 
http://www.analog.com/en/rfif-components/rfif-transceivers/adf7242/products/product.html. [Accessed 1 12 2012]. 
[28]  Microchip, "PIC12F1822," [Online]. Available: 
http://www.microchip.com/wwwproducts/Devices.aspx?dDocName=en544839. [Accessed 28 1 2013]. 
[29]  T. Schmid, Z. Charbiwala, J. Friedman, M. B. Srivastava and Y. H. Cho, "The True Cost of Accurate 
Time," UCLA, Technical Report TRUCLA-NESL, 2008. 
[30]  T. Schmid, J. Friedman, Z. Charbiwala, Y. H. Cho and M. B. Srivastava, "Low-power high-accuracy 
timing systems for efficient duty cycling," Proceedings of the 13th international symposium on Low 
power electronics and design, pp. 75-80, 2008.  
[31]  ambiqmicro.com/products/, "ULTRA LOW POWER SEMICONDUCTORS. REDEFINED," [Online]. 
Available: http://ambiqmicro.com/products/. 
[32]  D.-Y. Yoon, C.-J. Jeong, J. Cartwright, H.-Y. Kang, S.-K. Han, N.-S. Kim, D.-S. Ha and S.-G. Lee, 
"A New Approach to Low-Power and Low-Latency Wake-Up Receiver System for Wireless Sensor 
Nodes," Journal of Solid-State Circuits, vol. 47, no. 10, pp. 2405-2419, 2012.  
[33]  S. Moazzeni, G. E. Cowan and M. Sawan, "A 28µW sub-sampling based wake-up receiver with 
−70dBm sensitivity for 915MHz ISM band applications," Circuits and Systems (ISCAS), pp. 2797-2800, 
2012.  
[34]  P. Kolinko and L. E. Larson, "Passive RF receiver design for wireless sensor networks," Microwave 
Symposium, pp. 567-570, 2007.  
[35]  B. W. Cook, A. Berny, A. Molnar, S. Lanzisera and a. K. S. Pister., "Low-power 2.4-GHz transceiver 
with passive RX front-end and 400-mV supply.," Journal of Solid State Circuits, vol. 41, no. 12, pp. 
2757-2766, 2006.  
[36]  B. H. Calhoun, J. Lach, J. Stankovic, D. D. Wentzloff, K. Whitehouse, A. T. Barth and J. K. B. e. al., 
"Body sensor networks: A holistic approach from silicon to users.," Proceedings of the IEEE, vol. 100, 
no. 1, pp. 91-106, 2012.  
[37]  W. Kluge, F. Poegel, H. Roller, M. Lange, T. Ferchland, L. Dathe and a. D. Eggert., "A fully integrated 
2.4-GHz IEEE 802.15. 4-compliant transceiver for ZigBee™ applications.," Journal of Solid State 
Circuits, vol. 41, no. 12, pp. 2767-2775, 2006.  
[38]  T. Song, H.-S. Oh, E. Yoon and a. S. Hong., "A low-power 2.4-GHz current-reused receiver front-end 
and frequency source for wireless sensor network.," Journal of Solid State Circuits, vol. 42, no. 5, pp. 
1012-1022, 2007.  
[39]  D. C. Daly and a. A. P. Chandrakasan., "An energy-efficient OOK transceiver for wireless sensor 
networks.," Journal of Solid-State Circuits, vol. 42, no. 5, pp. 1003-1011, 2007.  
[40]  J. Ayers, K. Mayaram and a. T. S. Fiez., "An ultralow-power receiver for wireless sensor networks.," 
Journal of Solid State Circuits, vol. 45, no. 9, pp. 1759-1769, 2010.  
[41]  D. D. Wentzloff and A. P. Chandrakasan, "A 47pJ/pulse 3.1-to-5GHz All-Digital UWB Transmitter in 
90nm CMOS," in Solid-State Circuits Conference, San Francisco, 2007.  
[42]  T. Norimatsu, R. Fujiwara, M. Kokubo, M. Miyazaki, A. Maeki, Y. Ogata, S. Kobayashi, N. 
Koshizuka and K. Sakamura, "A UWB-IR transmitter with digitally controlled pulse generator," 
Journal of Solid-State Circuits, vol. 42, no. 6, pp. 1300-1309, 2007.  
[43]  A. P. Chandrakasan, F. S. Lee, D. D. Wentzloff, V. Sze, B. P. Ginsburg, P. P. Mercier, D. C. Daly and 
R. Blazquez, "Low-power impulse UWB architectures and circuits," Proceedings of the IEEE, vol. 
97, no. 2, pp. 332-352, 2009.  
[44]  D. Koller and N. Friedman, Probabilistic Graphical Models, The MIT Press, 2009.  
[45]  Stanford Online, "Probabilistic Graphical Models," [Online]. Available: 
https://class.coursera.org/pgm-2012-002/class/index. [Accessed 3 1 2013]. 
[46]  J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 
Prentice Hall, 2006.  
[47]  E. J. Candes and M. B. Wakin, "An Introduction To Compressive Sampling," IEEE Signal Processing 
Magazine, March 2008.  
[48]  D. Donoho, "Compressed Sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 
1289-1306, April 2006.  
[49]  E. Candes and T. Tao, "Near-Optimal Signal Recovery From Random Projections: Universal Encoding 
Strategies?," IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406-5425, Dec. 2006.  
[50]  "l1-magic," [Online]. Available: http://users.ece.gatech.edu/~justin/l1magic/. [Accessed 26 11 2012]. 
[51]  University of Michigan, "Matlab Compressive Sensing Tutorial," [Online]. Available: 
http://users.ece.gatech.edu/~justin/l1magic/. [Accessed 27 11 2012]. 
[52]  R. Blazquez, F. Lee, D. Wentzloff, B. Ginsburg, J. Powell and A. P. Chandrakasan, "Direct Conversion 
Pulsed UWB Transceiver Architecture," 2005. 
[53]  S. Iida, K. Tanaka, H. Suzuki, N. Yoshikawa, N. Shoji, B. Griffiths, D. Mellor, F. Hayden, I. Butler 
and J. Chatwin, "A 3.1 to 5 GHz CMOS DSSS UWB transceiver for WPANs," in Solid-State Circuits 
Conference, Feb 2005.  
[54]  G. Retz, H. Shanan, K. Mulvaney, S. O'Mahony, M. Chanca, P. Crowley, C. Billon, K. Khan and P. 
Quinlan, "A highly integrated low-power 2.4GHz transceiver using a direct-conversion diversity 
receiver in 0.18µm CMOS for IEEE802.15.4 WPAN," in ISSCC, 2009.  
[55]  C. Sandner, M. Clara, A. Santner, T. Hartig and F. Kuttner, "A 6-bit 1.2-GS/s low-power flash-ADC in 
0.13-μm digital CMOS," Journal of Solid-State Circuits, vol. 40, no. 7, pp. 1499-1505, 2005.  
[56]  P. C. Scholtens and M. Vertregt, "A 6-b 1.6-Gsample/s flash ADC in 0.18-μm CMOS using 
averaging termination," Journal of Solid-State Circuits, vol. 37, no. 12, pp. 1599-1609, 2002.  
[57]  K. Uyttenhove and M. S. Steyaert, "A 1.8-V 6-bit 1.3-GHz flash ADC in 0.25-μm CMOS," Journal 
of Solid-State Circuits, vol. 38, no. 7, pp. 1115-1122, 2003.  
[58]  D. Lachartre, B. Denis, D. Morche, L. Ouvry, M. Pezzin, B. Piaget, J. Prouvee and P. Vincent, "A 
1.1nJ/b 802.15.4a-compliant fully integrated UWB transceiver in 0.13µm CMOS," in Solid-State 
Circuits Conference, Feb 2009.  
[59]  M. Braun, J. P. Elsner and F. K. Jondral, "Signal Detection for Cognitive Radios with Smashed 
Filtering," VTC, vol. 69, pp. 1-5, 2009.  
[60]  E. van den Berg and M. P. Friedlander, "SPGL1: A solver for large-scale sparse reconstruction," June 
2007. [Online]. Available: http://www.cs.ubc.ca/labs/scl/spgl1. 
[61]  E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," 
SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890-912, 2008.  
[62]  O. Khan, S.-Y. Chen, D. Wentzloff and W. Stark, "Impact of Compressed Sensing With Quantization 
on UWB Receivers With Multipath Channel Estimation," IEEE Journal on Emerging and Selected 
Topics in Circuits and Systems, vol. 2, no. 3, pp. 460-469, Sept. 2012.  
[63]  R. Kitai and K.-H. Siemens, "A Hazard-Free Walsh-Function Generator," IEEE Transactions on 
Instrumentation and Measurement, vol. 21, no. 1, pp. 80-83, 1972.  
[64]  M. A. Davenport, M. F. Duarte, M. Wakin, J. Laska, D. Takhar, K. Kelly and R. Baraniuk, "The 
smashed filter for compressive classification and target recognition," Proceedings of SPIE, vol. 6498, 
2007.  
[65]  O. Khan and D. Wentzloff, "1.2 GS/s Hadamard Transform front-end for compressive sensing in 65nm 
CMOS," in Radio and Wireless Symposium (RWS), Jan. 2013.  
[66]  L. Brooks and H.-S. Lee, "A 12b 50MS/s fully differential zero-crossing-based ADC without 
CMFB," International Solid-State Circuits Conference, pp. 166-167, 2009.  
[67]  T. Burd, T. Pering, A. Stratakos and R. Brodersen, "A dynamic voltage scaled microprocessor system," 
IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1571-1580, Nov. 2000.  
[68]  IEEE, "IEEE Standards Association," [Online]. Available: 
http://standards.ieee.org/getieee802/download/802.15.4-2011.pdf. [Accessed 3 1 2013]. 
[69]  N.-J. Oh and S.-G. Lee, "Building a 2.4-GHz radio transceiver using IEEE 802.15.4," Circuits and 
Devices Magazine, vol. 21, no. 6, pp. 43-51, 2006.  
[70]  J. Proakis and M. Salehi, Digital Communications, McGraw-Hill, Nov 6 2007.  
[71]  M. Flynn, C. Donovan and L. Sattler, "Digital calibration incorporating redundancy of flash ADCs," 
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, no. 5, 
pp. 205-213, May 2003.  
[72]  Y.-H. Liu, X. Huang, M. Vidojkovic, A. Ba, P. Harpe, G. Dolmans and H. de Groot, "A 1.9nJ/b 
2.4GHz MultiStandard (Bluetooth Low Energy/Zigbee/IEEE 802.15.6) Transceiver for Personal/Body-
Area Networks," in ISSCC, 2013.  
[73]  Z. Lin, P.-I. Mak and R. Martins, "A 1.7mW 0.22mm² 2.4GHz ZigBee RX exploiting a current-reuse 
blixer + hybrid filter topology in 65nm CMOS," in ISSCC, 2013.  
[74]  I. A. Beinlich, H. J. Suermondt, R. M. Chavez and G. F. Cooper, "The ALARM monitoring 
system: A case study with two probabilistic inference techniques for belief networks," in Proc. 2nd 
European Conference on Artificial Intelligence in Medicine, pp. 247-256, 1989.  
[75]  "HUGIN EXPERT," [Online]. Available: www.hugin.com. 
[76]  Wikipedia, "Selectivity (electronic)," [Online]. Available: 
http://en.wikipedia.org/wiki/Selectivity_(electronic). [Accessed 12 11 2012]. 
[77]  C. P. Yue, "Receiver Architecture," [Online]. Available: 
http://www.ece.ucsb.edu/yuegroup/Teaching/ECE594BB/Lectures/Wireless%20Reciever%20Arch.pdf. 
[Accessed 12 11 2012]. 
[78]  Texas Instruments, "MSP430C091," Texas Instruments, [Online]. Available: 
http://www.ti.com/product/msp430c091. [Accessed 1 12 2012]. 
[79]  Zigbee Alliance, "Zigbee," [Online]. Available: http://www.zigbee.org/. [Accessed 11 12 2012]. 
[80]  A. A. Hafez, M. A. Dessouky and H. F. Ragai, "Design of a low-power ZigBee receiver front-end 
for wireless sensors," Microelectronics Journal, vol. 40, no. 11, pp. 1561-1568, 2009.  
[81]  T.-K. Nguyen, N.-J. Oh, V.-H. Le and S.-G. Lee, "A low-power CMOS direct conversion receiver 
with 3-dB NF and 30-kHz flicker-noise corner for 915-MHz band IEEE 802.15.4 ZigBee standard," 
Microwave Theory and Techniques, vol. 54, no. 2, pp. 735-741, 2006.  
[82]  T.-K. Nguyen, V. Krizhanovskii, J. Lee, S.-K. Han, S.-G. Lee, N.-S. Kim and C.-S. Pyo, "A 
Low-Power RF Direct-Conversion Receiver/Transmitter for 2.4-GHz-Band IEEE 802.15.4 Standard in 
0.18µm CMOS Technology," Microwave Theory and Techniques, vol. 54, no. 12, pp. 4062-4071, 
2006.  
[83]  J. Bryzek, "Roadmap for the Trillion Sensor Universe," Berkeley, April 2, 2013. 
 
