Designing an ultra-low power sensor node requires careful consideration of the system-level energy budget. Depending on applications, various components can dominate total energy. In this paper, we review three different system energy budget scenarios where any of the microprocessor, memory, and timer of a sensor node can dominate the energy budget. The design space and corresponding trade-offs for these three components are explored to suggest guidelines for the design of ultra-low power sensor nodes.
Radio TX: ULP UWB transmitter, energy 1.65 nJ/bit [3]
Radio RX: ULP UWB receiver, power 1.64 mW [3]
INTRODUCTION
Sensor nodes collect useful data from their environment in a distributed fashion and process the collected raw data to extract the essentials to be transferred to a user or a central data collection point. Recently, with advances in circuit design, packaging, and battery technologies, tiny sensor nodes with volumes as small as a few cubic millimeters have been demonstrated [1]. Figure 1 shows the key building blocks of ultra-low power sensor nodes. Sensor nodes typically consist of sensors for measurement, a microprocessor for data processing, a memory for data and execution code storage, a radio for data transfer, a timer for wake-up and communication control, and a power management unit. Although sensor nodes commonly include these function units, how often and for how long each unit is active varies widely from application to application, so the system energy budget varies significantly as well.
Other function units, such as a wake-up receiver or an energy harvesting unit, can be added for a more complete system, but for simplicity the discussion in this paper is limited to the key function units illustrated in Figure 1. Table 1 shows example energy/power consumption of the function blocks, taken from state-of-the-art literature [1]. Table 2 shows usage models of these function blocks in three different scenarios. Scenario 1 represents a surveillance application. A sensor node wakes up every 5 seconds and takes an image of a monitored object. The image is then processed by the microprocessor (MP) to determine whether it has changed significantly from past images. The result is then transferred to a base station. For image processing, this scenario requires a larger memory (16kB) than the other scenarios. Scenario 2 represents a temperature monitoring application in which a sensor node wakes up every 10 minutes and measures temperature. The measured temperature is stored in memory and transferred to a base station hourly. Because the data size is small, a 1kB memory suffices for this scenario. Scenario 3 is a temperature monitoring sensor identical to Scenario 2, except that the measured data is collected by symmetric communication among sensor nodes instead of by a centralized base station.
With the power/energy numbers given in Table 1 and the usage models in Table 2, the energy usage in each scenario can be estimated as shown in Figure 2. In Scenario 1, which is relatively computationally intensive, the microprocessor dominates system energy. Therefore, optimizing the microprocessor for minimum energy operation is the key approach for minimizing overall energy consumption and maximizing the lifetime of the sensor node.
With sensor nodes that wake up infrequently and spend most of their time in standby mode, memory leakage power dominates total energy, as shown in Scenario 2. This is because, unlike other function blocks, memory cannot be power-gated in standby mode to minimize leakage power, since the measurement data and execution code stored in the memory would be lost. Therefore, designing a sensor node with infrequent activity requires minimizing memory leakage through careful selection of the memory topology.
Some applications require sensor nodes to form a wireless sensor network, which in turn requires symmetric communication among the sensor nodes. Scenario 3 shows how the energy budget could change with such sensor nodes. In a ULP sensor node, symmetric radio communication has to be activated periodically, since the energy budget does not allow continuous radio activation. The radio activation timing has to be precisely determined by a low power timer. If the activation time determined by one sensor node differs from that of the other sensor node, the receiving sensor node has to keep its radio receiver active for the duration of the mismatch, until the transmitter is activated. Due to the high receiver power (on the order of mW), the energy wasted on receiver activation during this mismatch can dominate the energy budget of this type of sensor node, as shown in Figure 2. Therefore, the trade-off between timer accuracy and timer power should be carefully explored when designing sensor nodes with symmetric communication.
These three sensor node usage scenarios confirm that the function block that dominates the energy budget can vary. Once the dominating function block is determined at the system level, circuit-level design optimization has to be carried out on that block. Therefore, in this paper, design guidelines and trade-offs for three different function blocks (microprocessor, memory, and timer) are discussed at the circuit level.
MICROPROCESSOR DESIGN CONSIDERATIONS IN ULP SENSOR NODES
For sensor nodes where the microprocessor dominates the system energy budget, optimizing the operating conditions of the microprocessor is critical to reducing energy consumption. Two key knobs can be adjusted at design time: supply voltage scaling [7] and fabrication technology selection [8].
Supply Voltage
Voltage scaling is a popular approach for microprocessor energy reduction [5] [6]. Although voltage scaling incurs a performance penalty, typical sensor node performance requirements are much lower than those of the commercial microprocessors shown in [5] [6], allowing voltage scaling to the near-threshold or sub-threshold regions. However, excessive voltage scaling can actually increase energy consumption. Therefore, for sensor nodes whose microprocessor takes a significant portion of the system energy budget, the minimum energy supply voltage (V min ) [7] should be carefully determined.
Microprocessor energy consumption can be represented as the sum of switching (E switch ) and leakage energy (E leak ) [EQ1]. Since switching energy is proportional to the square of supply voltage [EQ2], voltage scaling enables a quadratic E switch reduction.
[EQ1]
[EQ2]
E leak can be represented as product of execution time (T) and leakage current (I leak ) of the microprocessor [EQ3]. In the superthreshold regime, E leak has only a minor impact on overall energy consumption. However, as voltage scaling is extended towards the sub-threshold regime, the exponential reduction of on-current incurs an exponential increase in execution time (T), making E leak non-negligible compared to the quadratically reduced E switch . Therefore, with excess voltage scaling, the quadratic E switch reduction can be outweighed by the exponential increase in E leak .
[EQ3]
Figure 3 shows the simulated energy-supply voltage relationship for a single transition in a 51 stage inverter chain. The minimum energy operation point is approximately 220mV where the increase in E leak overcomes the reduction of E switch .
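For reference, the relations described in the text above are commonly written as follows. This is a sketch consistent with the stated definitions rather than the exact form of [EQ1] to [EQ3]: the symbols \alpha (activity factor), C_{eff} (effective switched capacitance), and V_{DD} (supply voltage) are notation assumed here, and the V_{DD} factor in the leakage term is included so that the product has units of energy.

```latex
\begin{align}
E_{total}  &= E_{switch} + E_{leak} \\
E_{switch} &\propto \alpha \, C_{eff} \, V_{DD}^{2} \\
E_{leak}   &= I_{leak} \, V_{DD} \, T
\end{align}
```

In the sub-threshold regime T grows roughly exponentially as V_{DD} is lowered, which is why the leakage term eventually outweighs the quadratic saving in the switching term.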
Meanwhile, sensor node microprocessors can have various architectures, and even with an identical architecture, execution code varies across applications, making the activity factor vary. Some instructions, such as multiplication/division, demand high circuit activity whereas simple instructions, such as register reads, do not. Therefore, computationally intensive applications can increase the activity factor (or vice versa), and this activity factor should be considered when determining the minimum energy voltage. Figure 4 shows the impact of activity factor on V min . With a higher activity factor, microprocessor energy consumption is more dominated by E switch , making quadratic switching energy reduction more effective down to lower voltages. Therefore, V min is lower with a high activity factor.
In summary, sensor node microprocessors should leverage voltage scaling as far as performance requirements allow while comprehending the limitations posed by the minimum energy point.
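The minimum energy point and its dependence on activity factor can be illustrated with a small numerical model. The sketch below is not the simulation behind Figures 3 and 4: the threshold voltage, slope factor, logic depth, and the EKV-style current interpolation are all assumptions chosen for illustration, energies are normalized, and the resulting voltages are not expected to match the 220mV reported for the inverter chain. It only shows the qualitative behavior: an interior energy minimum that moves to lower voltage as the activity factor rises.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
N_VT = 1.5 * 0.026   # slope factor times thermal voltage (V)
V_TH = 0.40          # threshold voltage (V)
LOGIC_DEPTH = 50     # gate delays per operation


def on_current(vdd):
    """Normalized on-current, smooth from sub- to super-threshold.

    EKV-style interpolation: exponential in vdd well below V_TH and roughly
    quadratic above it. Units are arbitrary because only ratios are used.
    """
    return math.log1p(math.exp((vdd - V_TH) / (2.0 * N_VT))) ** 2


def energy(vdd, activity):
    """Normalized energy per operation (E switch + E leak) at supply vdd."""
    e_switch = activity * vdd ** 2
    # Execution time: each gate delay scales as C*V/I_on, times the depth.
    exec_time = LOGIC_DEPTH * vdd / on_current(vdd)
    # Off-current modeled as the current at 0 V gate drive (DIBL ignored).
    i_leak = on_current(0.0)
    e_leak = i_leak * vdd * exec_time
    return e_switch + e_leak


def find_vmin(activity, lo_mv=100, hi_mv=1000):
    """Grid search in 1 mV steps for the supply voltage minimizing energy."""
    candidates = [mv / 1000.0 for mv in range(lo_mv, hi_mv + 1)]
    return min(candidates, key=lambda v: energy(v, activity))


if __name__ == "__main__":
    for activity in (0.01, 0.1, 0.5):
        vmin = find_vmin(activity)
        print(f"activity={activity:<5} Vmin={vmin * 1000:.0f} mV "
              f"E(Vmin)={energy(vmin, activity):.4f}")
```

Running the script prints one line per activity factor; under these assumptions the voltage that minimizes energy decreases as the activity factor increases, in line with the trend described for Figure 4.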
Technology Selection
The previous section showed that the minimum energy operating voltage can be used for microprocessor designs. However, this discussion was limited to the time when the microprocessor is active. It also assumed that an arbitrary microprocessor frequency can be chosen for minimum energy operation. In practical sensor node design, standby time energy consumption should be considered. Moreover, there may be specific performance requirements. By carefully selecting fabrication technology, these two factors can be considered in microprocessor design.
Microprocessor energy consumption is largely affected by two key metrics: duty cycle and performance. Duty cycle is defined as the ratio of active time to total time. To first order, performance is simply the clock frequency that a microprocessor runs at; we model this using 40 fan-out-of-4 (FO4) inverter delays in this discussion, which represents a typical low-voltage microprocessor cycle time [8]. Using these two metrics, the design space for a microprocessor in a sensor node is depicted in Figure 5. Microprocessors for continuous monitoring, such as biosignal monitoring [9], have high duty cycles (bottom of Figure 5), whereas microprocessors that wake up infrequently (every 10 minutes or longer) have low duty cycles (top of Figure 5) [10]. Sensor nodes with temperature or pressure sensors would not require high microprocessor performance (left of Figure 5), while sensor nodes with imaging capabilities require higher performance (right of Figure 5).
Microprocessor energy consumption in a sensor node can be represented by [EQ4].
[EQ4]
E active denotes the energy consumed when the microprocessor is active. Therefore, E active consists of the energy consumed by switching activity (E switch ) and the energy consumed by leakage current during active mode (E leak ) [...] Figure 6) and high duty cycle/high performance (bottom-right) microprocessors.
Relatively old technologies are optimal for microprocessors with a high duty cycle and low performance (bottom left of Figure 6). Newer technologies would be expected to be preferred for high duty cycles to achieve energy optimality. However, the faster speed of an advanced technology can lead to task completion far ahead of the required microprocessor cycle time, which results in unwanted additional idle time and wasted leakage energy. This energy penalty offsets the benefit of the low E switch of an advanced technology and makes older technologies a better choice.
On the opposite corner of the design space in Figure 6 (low duty cycle with high performance requirements), older technologies offer reduced leakage. However, the larger supply voltage needed to meet the performance requirements leads to higher active energy, and overall, newer technologies are preferable in such scenarios.
MEMORY DESIGN CONSIDERATIONS IN ULP SENSOR NODES
As wireless sensor nodes require collected data to be stored until it is transmitted or processed, memories are essential elements of the system. Memories, by their nature, have a low activity factor; only a few percent of the circuit is active at a time. In addition, most wireless sensor nodes are duty-cycled, and some of them spend ≥90% of their lifetime in sleep mode. In this context, standby energy consumption dominates the memory's total energy consumption and becomes the critical factor. Non-volatile memories (e.g., Flash) have virtually zero retention (standby) power, but their active power is typically in the ~mW range, which far exceeds the energy budget of smaller battery-powered sensor nodes. Thus, embedded volatile memories (e.g., SRAM, eDRAM) are preferred in such systems. However, their volatility leads to significant leakage power during standby mode. Generally, smaller devices contribute to a smaller bitcell size and more leakage power. Because the bitcell array leakage power dominates the memory standby power, there is a trade-off between standby power and bitcell size. This trade-off can be clearly observed in recently reported low-power memories, which we group into three classes as shown in Figure 7: Regular SRAMs, Sub-V TH SRAMs and eDRAMs, and ULP SRAMs.
Regular SRAMs typically consume >100pW/bit of standby power, easily exceeding 10µW with just 1kB of capacity. However, their bitcell size is usually smaller than that of other types of SRAM, and they operate much faster at full supply voltage, reaching >300MHz [12]. This type of SRAM is suitable for sensor nodes that require fast operation as well as a large amount of storage, with a relatively large battery that can afford the large power consumption; both active and static power can be more than 100µW.
Sub-V TH SRAMs reduce standby power down to ~1pW/bit using low supply voltages, hence allowing smaller batteries. However, their robustness requirements, such as write margins and read margins, usually cannot be met with the conventional 6T bitcell structure due to the reduced supply voltage. A typical approach to overcome this issue involves the separation of read and write paths, which requires additional transistors. Because of these added devices, such bitcells incur a 50 to 100% area overhead compared to regular SRAMs. 10T bitcells [14] [15] from recent literature are shown in Figure 8 (a) and (b); the bitcell in (a) exploits the stack effect for added leakage reduction, while the bitcell in (b) supports fully differential read and write. The area overhead could be avoided at the cost of multiple supply voltages with high power active operations [17]. For applications requiring a large memory capacity, embedded DRAMs (eDRAM) [18] can be an alternative choice. They have a much smaller bitcell size than SRAMs, as shown in Figure 8 (c), resulting in much denser memory. However, the refresh power depends strongly on temperature, so the standby power may easily become orders of magnitude higher than that of SRAMs at high temperature. These sub-V TH SRAMs and eDRAMs are in the middle of the spectrum (Figure 7) and can be useful for sensor nodes with a relatively small battery size (~cm 3 ) that do not require high-speed memory access.
At the other end of the spectrum is ULP SRAM. These often use thick-oxide HVT I/O devices, achieving less than 10fW/bit of standby power. This can be attractive for extremely energy-constrained systems; for example, the entire 3kB ULP SRAM in [2], whose bitcell is shown in Figure 8(d), consumes only <100pW of standby power, and this enables the use of a mm 3 -scale battery while still providing months of lifetime. Although their operating frequency is usually limited to <1MHz due to the combination of HVT I/O devices and the low supply voltage, they are preferred in mm 3 -scale sensor nodes because low power consumption has the highest priority in such systems. However, the extremely low standby power comes with a large area penalty; the 10T HVT bitcell [2] in Figure 8(d) is ~3× larger than regular SRAMs due to the large size requirements of the HVT devices. This limits the maximum memory capacity in small form-factor systems.
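The per-bit standby figures quoted for the three memory classes can be turned into rough array-level numbers. The sketch below is a back-of-the-envelope calculation only: it uses the class figures from the text as single representative values (a lower bound for regular SRAM, an approximate value for sub-V TH SRAM, an upper bound for ULP SRAM), ignores peripheral circuits, and reuses the 1kB and 16kB capacities of the earlier scenarios. The function and dictionary names are made up for this example.

```python
# Representative standby power per bit, in watts, taken from the text.
STANDBY_W_PER_BIT = {
    "regular SRAM": 100e-12,   # ">100pW/bit": lower bound for this class
    "sub-VTH SRAM": 1e-12,     # "~1pW/bit": approximate
    "ULP SRAM": 10e-15,        # "less than 10fW/bit": upper bound
}


def standby_power_w(capacity_bytes, watts_per_bit):
    """Bitcell-array standby power for a memory of the given capacity."""
    return capacity_bytes * 8 * watts_per_bit


def format_power(watts):
    """Render a power value with a readable unit prefix."""
    for scale, unit in ((1e-6, "uW"), (1e-9, "nW"), (1e-12, "pW")):
        if watts >= scale:
            return f"{watts / scale:.2f} {unit}"
    return f"{watts / 1e-15:.2f} fW"


if __name__ == "__main__":
    for capacity_bytes in (1024, 16 * 1024):
        print(f"{capacity_bytes // 1024} kB array:")
        for name, per_bit in STANDBY_W_PER_BIT.items():
            power = standby_power_w(capacity_bytes, per_bit)
            print(f"  {name:<13} {format_power(power)}")
```

Because the regular SRAM entry is only a lower bound, actual standby power for that class can be considerably higher than the value this prints. The four orders of magnitude between the regular and ULP entries are the main point of the comparison.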
TIMER DESIGN CONSIDERATIONS IN ULP SENSOR NODES
Role of Accurate Timers
Figure 7. Low-power memories in recent publications can be grouped into three classes: Regular SRAMs, Sub-VTH SRAMs and eDRAMs, and ULP SRAMs. Depending on application, designers should choose among these memory circuits.
The circuit blocks introduced in previous sections can benefit from various circuit techniques to reduce power consumption. Unlike those blocks, however, a radio transceiver block cannot operate at such low power, mainly due to physical limitations of antenna efficiency and size. Since allocating substantial area for the antenna is usually not possible in ULP wireless sensor applications, the average active power of the radio of a sensor node is still in the mW regime, which is 5 to 6 orders of magnitude higher than the average power of other sensor node components [20]. Therefore, the operating duration of the radio transceiver block has to be limited to minimize the overall system's energy consumption. To make system design even harder, in some applications the power source cannot directly supply the instantaneous power drawn by the radio transceiver. To cope with this problem, the system needs a large capacitor to act as an energy reservoir. The radio transceiver's operating duration therefore not only has a significant impact on total energy consumption but may also increase system volume. That is why the system has to be designed to minimize the operating duration of the transceiver. As will be shown in the next paragraph, the accuracy of a timer has a substantial impact on the average operating duration of the radio transceiver.
Figure 9 shows an example scenario of a ULP wireless sensor node system. A sensor node is activated every 20 minutes to take a measurement, and a microprocessor processes the data for 100ms at 3μW. Once every hour, the collected data is transmitted by the radio module, consuming 1mW for 1ms. Throughout, the timer runs in the background at 1nW to track time and initiate scheduled tasks at the right moment. In total, the sensor node nominally consumes 5.5μJ per hour.
However, some applications require sensor nodes to communicate with each other. In such a scenario, each sensor is equipped with a timer to initiate communication synchronously. Unfortunately, the timers may suffer from mismatch, leading to one sensor node turning on its radio earlier than the others. With a 200ms hourly mismatch, the total energy consumption increases to 206μJ per hour, with 200μJ arising from the mismatch penalty (200ms of additional radio activity at 1mW).
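The hourly energy figures in this example can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the one assumption is that the radio draws the same 1mW during the mismatch window as during transmission, which is what the quoted 200μJ penalty implies. The function name and dictionary keys are chosen for this example.

```python
def hourly_energy_j(mismatch_s=0.0):
    """Energy per hour, in joules, broken down by component.

    mismatch_s: time per hour the radio stays on waiting for its peer.
    """
    wakeups_per_hour = 3600 / (20 * 60)      # one measurement every 20 minutes
    breakdown = {
        "processor": wakeups_per_hour * 3e-6 * 100e-3,  # 3 uW for 100 ms each
        "radio_tx": 1e-3 * 1e-3,                        # 1 mW for 1 ms, hourly
        "timer": 1e-9 * 3600,                           # 1 nW, always on
        # Assumed: radio draws 1 mW while waiting out the timer mismatch.
        "mismatch": 1e-3 * mismatch_s,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    for mismatch in (0.0, 0.2):
        result = hourly_energy_j(mismatch)
        print(f"mismatch = {mismatch * 1e3:.0f} ms")
        for name, joules in result.items():
            print(f"  {name:<10} {joules * 1e6:8.2f} uJ")
```

The nominal case sums to 0.9μJ + 1μJ + 3.6μJ = 5.5μJ, and the 200ms mismatch adds 200μJ, for 205.5μJ in total, which rounds to the 206μJ quoted above. Note that even in the nominal case the 1nW timer is the largest single contributor because it is never duty-cycled.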
Design Considerations
From this example, it is clear that sensor nodes require accurate timers. However, several considerations should be made in the design of such timers.
The first is the power consumption of the timer module itself. Unlike other system components, the timer is always on and cannot benefit from duty cycling. For this reason, even if the timer consumes several orders of magnitude less power than duty-cycled high power components such as the microprocessor, it may easily dominate overall energy consumption. The second is random error in timing. The timer suffers from various sources of noise, such as thermal noise, which lead to random mismatch of the period between different timers. Finally, process, voltage, and environmental (PVT) fluctuations impact the timer, as they do conventional circuits. PVT sensitivity must be distinguished from random error. Unless the sensors are used in harsh environments, variation sources such as process or temperature are likely to act as fixed offsets between different timers, rather than manifesting as random error. With the help of simple (negligible power) digital logic, fixed offsets can be calibrated to minimize their effect. Therefore, more emphasis can be placed on the power consumption and random error characteristics of a given timer design.
Before introducing real design examples, we introduce the concept of Allan deviation [21], the standard metric used to compare the frequency stability of timers. Period deviation due to noise is often Gaussian, and its magnitude can be characterized by RMS jitter, defined as the standard deviation of the timer period. To compare the jitter of two different types of timers, normalized RMS jitter (expressed in ppm) is often used. However, these metrics do not capture the accuracy of a timer when measuring a multi-cycle synchronization period, since they ignore the averaging of jitter over multiple timer cycles. For example, two timers with identical ppm jitter can have different hourly accuracies, since the timer with the shorter period has more cycles to count, allowing for more averaging. In addition, long-term stability estimates based on RMS jitter may give the false impression that longer averaging periods are always beneficial, which is refuted by measurements. Instead, timer uncertainty is well characterized by Allan deviation, which is defined by [EQ5], where τ is the observation period and y n is the n-th fractional frequency (that is, Δf/f for Figure 10, where f is frequency) over observation time τ. The bracket <> denotes the time average over samples.
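As an illustration of the metric, the sketch below computes the standard two-sample (non-overlapping) Allan deviation, sigma_y(τ) = sqrt(0.5 * <(y n+1 - y n)^2>), from a list of fractional frequency samples. It is a textbook form written for this example, not necessarily the estimator used for Figure 10; the function name, the base interval tau0, and the block-averaging parameter m are assumptions.

```python
import math


def allan_deviation(y, m=1):
    """Non-overlapping Allan deviation at observation time m * tau0.

    y: fractional frequency samples (delta_f / f), each averaged over tau0.
    m: number of consecutive samples averaged into one observation period.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two observation periods")
    # Average the samples over each observation period of length m * tau0.
    means = [sum(y[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    # Squared differences between adjacent observation periods.
    diffs_sq = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return math.sqrt(0.5 * sum(diffs_sq) / len(diffs_sq))


if __name__ == "__main__":
    # Synthetic example: frequency alternating by +/- 1 ppm every sample.
    samples = [1e-6, -1e-6] * 8
    print(allan_deviation(samples, m=1))  # fast noise is fully visible
    print(allan_deviation(samples, m=2))  # averages out over two samples
```

In the synthetic example the alternating error is fully visible at the shortest observation time and cancels once two samples are averaged. Real timers also exhibit slow drift, which is why lengthening the observation time does not improve the Allan deviation indefinitely.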
Design Examples
Here we briefly describe three recent ultra-low power timers [22] [23] [24]. A trade-off exists among the three key parameters discussed in the previous section, namely power consumption, random error, and PVT variation. Figure 10 plots the Allan deviation at an averaging time of 1000 seconds against the power consumption of the three designs. Note that conventional XO means a 32.768kHz Pierce oscillator built with a discrete inverter running at 1V. The performance summary of the three timers is shown in Table 3. CMOS-based circuits have lower power consumption and smaller area. However, these circuits need to oscillate at a low frequency since their oscillation is not based on a resonant component. Inevitably, CMOS-based timers perform worse in terms of random error and PVT variation than a quartz crystal based timer. Quartz crystal based timers, commonly denoted as crystal oscillators (XO), on the other hand, show much better frequency stability. These merits come at the expense of higher power and extra system volume due to the quartz crystal. The smallest 32.768kHz quartz crystal package available at the date of publication is 1.44mm 3 [27]. There are also MEMS-based resonators that can be integrated on-chip while providing better accuracy than CMOS circuit-based timers [25, 26]. The power consumption of these approaches is still higher than that of other timers, but the numbers are expected to decrease with advances in MEMS and circuit techniques.
If the application does not require communication between sensor nodes, or synchronization period is short, CMOS based timers can save power and system volume. On the other hand, if timing has to be tightly controlled, a quartz crystal can be a good option.
CONCLUSIONS
In this paper, we have outlined guidelines for designing a ULP sensor node system. The energy budget of a ULP sensor node can vary significantly depending on the application. Therefore, the functional blocks that dominate system energy should be identified early on, and the design space for the dominating blocks should be explored to balance and minimize the total system energy.
For microprocessors, we discussed voltage scaling down to the minimum energy operating voltage (V min ) and technology selection that accounts for performance requirements and duty cycle. Various memory topologies were presented to show the trade-off between standby power and bitcell area overhead. State-of-the-art low power timers suitable for ULP sensor nodes were also discussed to illustrate the accuracy versus power trade-off.
