Abstract-Small, low-cost and energy efficient wireless sensor nodes (WSNs) form a vital part of the Internet of Things (IoT). WSNs spend the majority of their time in low-power sleep mode and wake up for short intervals. To ensure minimum energy (MinE) operation of such sensor nodes, fast wide-range voltage scaling is required. As voltage is aggressively scaled between ultra-low retention levels and sub, near or super-threshold active levels, monitoring circuits become essential to guarantee safe operation of the system. This work demonstrates a 50nW voltage monitor fabricated as part of the power management unit (PMU) of a 65nm MinE WSN. Ultra low power operation is achieved by duty-cycling the comparators. Further, dynamic power-bandwidth balancing results in lower quiescent power without loss of response speed. Measured results show 6μs response time giving a superior power-delay balance compared to prior works. This paper describes the design, implementation and measured results along with system implications of the design choices.
I. INTRODUCTION
Recent works have demonstrated CPU-system designs that can operate at supply voltages below transistor threshold voltages (sub-550mV) [1] [2] [3] [4]. This allows minimum energy (MinE) operation, which is ideal for many emerging sensor applications [5] [6] [7]. Such sensors are severely energy constrained and have low activity rates, with performance requirements well below what current technology nodes offer at nominal voltages. Many of these applications can also harvest energy from their environment, giving theoretically unlimited lifetimes [8] [9].
A common objective of MinE CPU-systems is to minimize leakage because WSNs spend the majority of their time in sleep modes. Leakage energy also increases exponentially at low voltages, further degrading total energy. To minimize leakage, fine-grained power gating is used, with example systems having as many as 14 power domains [4]. Integrated voltage regulators (IVRs) are another common feature of MinE CPU-systems. They are used to: 1) obtain the low voltages required for subthreshold operation, because battery voltages are 1.2V or higher, and 2) reduce latency during sleep and active mode transitions. An 80μs sleep-active-sleep transition is demonstrated in [4].
Fast wide-range dynamic voltage scaling (DVS) is desirable in WSNs to enable frequent entry into sleep modes and to maximize sleep time. Enabling the clock to the CPU-system upon wake-up requires careful consideration. An early enable can cause timing violations and is catastrophic, while a delayed enable defeats ultra low power (ULP) operation. Voltage monitors are thus required to guarantee safe regulator voltage (V REG) levels before the clock is enabled. The proposed design demonstrates a ULP voltage monitor scheme using: 1) duty-cycled comparators with dynamic hysteresis tuning, 2) a runtime power-bandwidth trade-off to reduce active power, and 3) state-aware tuning to provide 6μs response speed.
The next section describes the system with the voltage monitor, highlighting key design metrics. The design and implementation of the proposed voltage monitor scheme are explained in section III. Measured results are presented and compared with prior art in section IV, and conclusions are drawn in section V.
II. SYSTEM DESCRIPTION
The PMU and CPU-system interface can be reliably timed by the PMU clock if the interaction is limited to clock and/or power gating. With dynamic voltage and frequency scaling however, voltage settling and clock locking introduce additional delays during mode changes. Since WSNs using MinE CPU-systems operate at frequencies not exceeding tens of MHz [10], clock settling is relatively deterministic. Voltage settling can have a greater degree of uncertainty because integrated switched capacitor converters in MinE systems can have higher output impedance compared to linear regulators. Fig. 1a shows the PMU interface with a CPU-system and voltage monitor. Under ideal conditions (Fig. 1b), the CPU-system asserts the voltage change request (CHV) when a mode change is desired. This is captured by the PMU state-machine and the CPU-system clock is disabled (CKEN). The IVR setting is then changed to the requested value while de-asserting the ACK signal. Assuming the system rail voltage settles immediately, CKEN is asserted followed by ACK. The CPU-system resumes in the requested mode and CHV is de-asserted. No monitoring scheme is necessary.
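The ideal-case handshake described above can be sketched as a short event sequence. This is only an illustrative model, not the PMU state-machine itself: the class name `PmuModel`, the starting rail setting and the event labels are assumptions made for the example, and voltage settling is treated as instantaneous exactly as in the ideal case.

```python
from dataclasses import dataclass, field


@dataclass
class PmuModel:
    """Toy model of the ideal-case CHV/CKEN/ACK handshake (no voltage monitor)."""
    ivr_setting: float = 0.3  # hypothetical starting rail setting, in volts
    cken: bool = True         # CPU-system clock enable
    ack: bool = True          # acknowledge back to the CPU-system
    trace: list = field(default_factory=list)

    def _log(self, event):
        # Record the event together with the signal values after it.
        self.trace.append((event, self.cken, self.ack, self.ivr_setting))

    def change_voltage(self, requested):
        """Run one mode change, assuming the rail settles immediately."""
        self._log("CHV asserted")             # CPU-system requests a change
        self.cken = False                     # PMU gates the CPU-system clock
        self._log("CKEN de-asserted")
        self.ivr_setting = requested          # IVR moves to the requested value
        self.ack = False                      # ACK drops while the change is pending
        self._log("IVR updated, ACK de-asserted")
        self.cken = True                      # ideal case: no settling delay
        self._log("CKEN asserted")
        self.ack = True
        self._log("ACK asserted")
        self._log("CHV de-asserted")          # CPU-system resumes in the new mode
        return self.trace
```

In a non-ideal model, the step between the IVR update and the CKEN assertion is where the wait for voltage settling and monitor response would be inserted.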
In practice however, the transition time (T CH) is much longer, being dominated by the voltage settling time (T VS) and the time it takes for the monitor to detect an in-range condition (T VMON). This is illustrated in Fig. 2. It is desirable to minimize both T VS and T VMON. T VS is affected by load current, which is sensitive to temperature, process and system workload, and by IVR design characteristics such as output impedance and on/off-chip decoupling capacitance. The objective of this work is to minimize T VMON.
Voltage monitors typically use comparators with factory-trimmed threshold voltages for detecting an unsafe rail voltage condition. Sensing slow-rising or non-monotonic rail voltages can cause oscillations as the rail voltage approaches the threshold voltage. This is overcome by using two comparators with slightly offset threshold voltages [11] [12]. This two-level monitoring adds hysteresis to the comparator but allows only a low-voltage unsafe condition to be monitored. In MinE systems however, it is necessary to independently monitor for over-voltage conditions, as excess leakage can adversely affect MinE operation. In the conventional scheme this would require four comparators, making monitoring an energy-expensive task.
Monitoring may be implemented with either continuous-time comparators [12] or clocked comparators [13]. Continuous-time comparators exhibit fast response speed but at the expense of higher quiescent power, and having four comparators (for upper and lower thresholds) in the always-ON PMU domain can be particularly detrimental to MinE WSNs. Clocked comparators have relatively lower quiescent power but suffer from the overhead of having to generate a dedicated clock. Leakage-based oscillators with thyristor-like gain stages have been used to generate the comparator clock to reduce this overhead, but at the expense of speed [3] [13]. The slow clock increases T VMON, which is undesirable for fast DVS.
III. PROPOSED DESIGN AND IMPLEMENTATION
The proposed scheme uses reference tuning to add hysteresis, allowing both upper and lower limits to be monitored using two comparators. Figure 3 shows the schematic of the proposed voltage monitor. The comparators and threshold voltage generators can be power gated using the PGEN signal. This minimizes static power when the voltage monitor is power gated (system deep-sleep mode). In this mode the IVR is OFF, so the monitoring circuit can be powered down as well. The upper and lower comparison thresholds (V TU and V TL) can be programmed using TUSEL and TLSEL. The tunable range between V TU and V TL covers the entire DVS range of the MinE CPU-system. The key feature of this work is the bias current selection bits (BUSEL and BLSEL) for the upper and lower comparators (CMPU and CMPL). The bias selection bits are exercised so as to minimize the quiescent power of the voltage monitor without compromising monitoring speed.
A (3)-(2) transition can be fatal to the system, while a (3)-(4) transition is less critical. The CPU-system remains functional in state (4) but potentially at a much higher energy cost. Therefore, in state (3) the CMPU quiescent current is further reduced. The proposed scheme allows three bias current settings to be dialled into the comparators. A BxSEL setting of '3' provides the fastest response at the highest quiescent power and a setting of '1' provides the lowest power operation. Table I summarizes the bias configuration for each state as highlighted in Fig. 4.
Hysteresis may be added depending on the corresponding comparator output, as described in [13]. The proposed design relies on the TxSEL bits to achieve this. Thus a (2)-(3) transition occurs at V TL +ΔV, where ΔV is a small voltage, while a (3)-(2) transition occurs at V TL -ΔV. Similarly, (3)-(4) occurs at V TU +ΔV and (4)-(3) at V TU -ΔV, thus preventing any oscillations. This is indicated in Table I as ΔV TU and ΔV TL.
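The state-dependent trip points above can be illustrated with a small behavioural sketch. It is not the circuit implementation: the function names, the separate `dv_tl` and `dv_tu` arguments and the sample voltages are assumptions made for the example, and states are only allowed to move one step at a time.

```python
# State labels follow the numbering used in the text.
BELOW, IN_RANGE, ABOVE = 2, 3, 4


def next_state(state, v_reg, v_tl, v_tu, dv_tl, dv_tu):
    """Return the next monitor state for one sample of V_REG.

    Trip points depend on the current state, which is what provides hysteresis:
      (2)->(3) at V_TL + dV_TL,  (3)->(2) at V_TL - dV_TL,
      (3)->(4) at V_TU + dV_TU,  (4)->(3) at V_TU - dV_TU.
    """
    if state == BELOW:
        return IN_RANGE if v_reg > v_tl + dv_tl else BELOW
    if state == IN_RANGE:
        if v_reg < v_tl - dv_tl:
            return BELOW
        if v_reg > v_tu + dv_tu:
            return ABOVE
        return IN_RANGE
    if state == ABOVE:
        return IN_RANGE if v_reg < v_tu - dv_tu else ABOVE
    raise ValueError(f"unknown state: {state}")


def run(samples, v_tl, v_tu, dv_tl, dv_tu, start=BELOW):
    """Apply next_state to a sequence of V_REG samples and return the visited states."""
    states, state = [], start
    for v_reg in samples:
        state = next_state(state, v_reg, v_tl, v_tu, dv_tl, dv_tu)
        states.append(state)
    return states


def q_inrange(state):
    """In-range flag: asserted only in state (3)."""
    return state == IN_RANGE
```

With hypothetical thresholds of 0.5V and 0.7V and a 20mV ΔV, a rail that ripples between 0.49V and 0.51V around V TL does not toggle the state, which is the oscillation-free behaviour the hysteresis is meant to provide.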
A. Comparator
Fig. 5 shows the schematic of the comparator. The tail-current transistor is a thick gate oxide (TGO) device to allow for better V BIAS control of the tail current. The bias selection bits effectively change the mirror ratio between M17 and M6, controlling the response speed of the comparator and its quiescent current. A stack of 6 diode-connected regular-V T transistors (M11-M17) is used for bias generation. Comparators for sensing low voltages use PMOS input transistors to improve the gain of the input stage [13], which affects quiescent power. The input differential pair (M4, M5) in the proposed design uses NMOS transistors, and the lack of gain is compensated by using large low-V T devices. This allows input voltages as low as 0.2V to be sensed reliably. The output of the differential stage drives an inverter with stacked high-V T devices (M8-M10), which limits short-circuit current and helps reduce power [12]. M1 and M7 allow the comparator to be power gated with Q forced high. Fig. 6 shows simulation results for a supply voltage of 1.0-1.4V and a temperature range of 0-100°C. Response speed is measured as the average delay (Fig. 6a) for a correct transition on Q for a change in V REGINT from V T -100mV to V T +100mV [14]. Fig. 6b shows the comparator response speed against Fig. 6c shows the response speed with varying temperature. At sufficiently large tail currents the comparator speed is less affected by temperature. Both speed and quiescent power increase exponentially with bias setting, so speed can be traded for power. Another consequence of reducing power is the increased sensitivity of comparator speed to voltage and temperature. Simulation results (Fig. 7) show a 20,000x increase in sensitivity to temperature and a 2,000x increase in sensitivity to voltage. However, since the design uses low bias current modes only when comparator response is less critical or is not needed, this increased sensitivity does not affect system active-sleep-active transitions.
B. Threshold Voltage Generator and Divider
Threshold voltage for the comparators is generated using stacked diode-connected transistors [15] as shown in Fig. 8. Both V TU and V TL are obtained from the lower half of the stack to give identical behaviour as temperature varies. For a nominal supply voltage of 1.2V, all transistors in the stack operate in the sub-threshold regime. PMOS devices are used in source-connected isolated N-wells to avoid body effects and ease layout. Each node in the divider stack is decoupled using 20fF MOS capacitors to provide rejection of high frequency supply ripple. Further, the ON resistance of the multiplexers and a 120fF capacitance on the output node reduce noise on the reference node. The speed and accuracy of comparison depend on both the comparator and the threshold voltage generator. The comparators use large devices, common-centroid matched layout, guard rings and dummy devices, with sufficient distance between active devices and the well edges to minimise well-proximity effects. Thus the contribution of comparator variation to variation in trip points is relatively small. The threshold voltage generator, on the other hand, uses devices in isolated wells which are not matched in layout and are more prone to on-chip variation. Thus the accuracy of comparison is largely determined by variations in the threshold voltage generator. Figure 9 shows the variation of the threshold voltages (V TU and V TL) for different tap settings over 1000 Monte Carlo runs. The worst case spread is about 60mV for V TL and 64mV for V TU. For both V TU and V TL, the box height shows the spread, with the center bar indicating the corresponding mean. For the same threshold voltage setting, V TU and V TL do not overlap, meaning the circuit will always provide a reliable comparison window (V COMP). This however is a pessimistic result for V COMP because the minimum of V TU and the maximum of V TL do not occur simultaneously.
The mean values for V COMP and the corresponding hysteresis (ΔV) obtained from simulations are tabulated in Table II. Note that for V REG greater than approximately V BAT /2, the comparator sense voltage is divided by 2 using FB2 (Fig. 3). Since the divided version of V REG is obtained at the midpoint of the diode stack (Fig. 8), the ratio remains independent of temperature and V REG.
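The divide-by-2 selection can be written out as a one-line rule. The sketch below is only a behavioural illustration: the function name, the default V BAT of 1.2V and the hard switch at exactly V BAT /2 are assumptions, since the text gives the switch point only approximately.

```python
def sense_voltage(v_reg, v_bat=1.2):
    """Return (voltage presented to the comparators, whether the /2 divider is used).

    Assumption for illustration: the divider is enabled as soon as V_REG exceeds
    V_BAT/2; the text only states this switch point approximately.
    """
    use_divider = v_reg > v_bat / 2
    v_sense = v_reg / 2 if use_divider else v_reg
    return v_sense, use_divider
```

For example, `sense_voltage(0.8)` selects the divider and presents half of V REG to the comparators, while `sense_voltage(0.3)` passes V REG through unchanged. When the divider is active, a threshold tap corresponds to a trip point of twice its voltage when referred back to V REG.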
IV. RESULTS
This section presents measured DC and transient results of the proposed scheme. Fig. 10 shows the DC results for two cases: (a) with V REG increasing up to the desired range before decreasing and (b) with V REG increasing beyond the desired range (over-voltage). Fig. 10a shows a ΔV TL of 220mV. However, when V REG exceeds V TU (Fig. 10b), ΔV TL is redundant and is reduced to 5mV. A 120mV ΔV TU prevents QU from oscillating. Note that Q INRANGE is asserted only for V TL < V REG < V TU. Fig. 11 shows the transient results with V REG transitioning from V TL -30mV to V TL +30mV. Since this does not exceed V TU, QL determines Q INRANGE. Note that the delay in detecting an in-range condition is 6μs (1.2V, room temperature). clock frequencies achievable at super-threshold voltages.
The voltage monitor has its highest energy consumption (power times duration) in state (3), when CMPU and CMPL have bias settings of 1 and 2 (Table I). The voltage monitor consumes 50nW in this setting at 1.2V, as shown in Fig. 13. Variation of quiescent power with supply voltage and temperature is also shown. The proposed design is compared with the state of the art in Table III. The energy expended while waiting for a response from the monitor (E wait) is the lowest for the proposed design. The chip plot is shown in Fig. 14. The voltage monitor occupies an area of 58μm x 33μm, which is dominated by the two comparators.
V. CONCLUSION
Scaling supply voltage to sub/near threshold level is necessary to achieve minimum energy operation in processors for WSNs. To best exploit potential energy savings, such WSNs need assist circuits, many of which perform analog functions. This paper described the implementation of an ultra low power voltage monitor circuit to assist MinE CPU-systems with fast wide-range voltage scaling. The proposed scheme achieves better balance between response speed and quiescent power as shown in Fig. 15 .
The benefits from MinE CPU-systems can easily be overwhelmed by slow or high power voltage monitors. For the CPU-system described in [4], leakage power increases from 100nW by 16x when V REG changes from 0.3V to 0.8V in preparation for active mode. The proposed voltage monitor saves 1.11nJ/wake by reducing the delay before the CPU-system clock is enabled by 100x. The voltage monitor thus becomes energy neutral for sensor workloads with more than 50 wakes/second. Duty-cycled comparators and state-aware dynamic power-bandwidth tuning limit the overheads of the proposed monitoring scheme to 1% of the CPU-system active power at MEP voltage.
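The energy-neutrality claim can be checked with a few lines of arithmetic using only figures quoted in this paper. The sketch rests on two assumptions that the text does not state explicitly: that "increases from 100nW by 16x" means the leakage at 0.8V is 16 times 100nW, and that the monitor draws its worst-case 50nW continuously. The constant and function names are illustrative.

```python
# Figures quoted in the text.
P_LEAK_RETENTION_W = 100e-9   # CPU-system leakage at 0.3V [4]
LEAK_SCALE = 16               # leakage increase when V_REG moves to 0.8V
E_SAVED_PER_WAKE_J = 1.11e-9  # energy saved per wake by the faster monitor
P_MONITOR_W = 50e-9           # monitor power in its highest-energy state


def active_leakage_w():
    """Leakage at 0.8V, assuming '16x' multiplies the 100nW retention figure."""
    return P_LEAK_RETENTION_W * LEAK_SCALE


def implied_delay_saved_s():
    """Clock-enable delay reduction implied by the per-wake saving (leakage only)."""
    return E_SAVED_PER_WAKE_J / active_leakage_w()


def break_even_wakes_per_s():
    """Wake rate at which per-wake savings equal the monitor's continuous power."""
    return P_MONITOR_W / E_SAVED_PER_WAKE_J
```

Under these assumptions the leakage at 0.8V is 1.6μW, the implied reduction in clock-enable delay is roughly 694μs, and the break-even rate is about 45 wakes/second, which is consistent with the monitor being energy neutral above 50 wakes/second.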
