I. INTRODUCTION
With increasing power budgets and variations in transistors, there is a need for robust, adaptable, and energy-optimal systems. Typically, these needs have been addressed by dynamic voltage scaling (DVS) systems [1], [2], [3]. A DVS system provides optimal speed and power performance for circuits by scaling the supply voltage (V dd). This provides significant energy savings due to the quadratic dependence of switching energy on V dd. Reducing V dd into the sub-threshold region, that is, below the threshold voltage (V T), provides minimum energy consumption for digital CMOS logic [4]. However, in sub-threshold there are increased variations due to process, voltage, and temperature (PVT) conditions.
To adapt to increased variations at low V dd, timing error detection (TED) can be applied within DVS systems. Using TED, all safety margins due to PVT are eliminated by lowering V dd down to and even past the point of first failure (PoFF). TED registers are inserted onto all critical paths of a pipeline (Fig. 1). A TED latch recognizes a timing error when data transitions within the combinational logic become too slow.
The TED technique was first introduced in [5], where it is called transient fault detection (TFD). Using two latches and a comparator, TFD is able to detect timing errors that result from local variations and soft errors. Another approach, called Razor [6], performs TED by use of a modified flip-flop, while error correction is performed through architectural replay. The energy saved in Razor is optimally traded against the overhead due to higher error correction activity. An earlier version of Razor [7] notes that scaling V dd into sub-threshold would be an ideal application for TED. The time-borrowing transition detector (TDTB) in [8] is similar in functionality to Razor except for the circuit layout. Compared to the TEDs above, TDTB was reported to show less energy overhead [8]. The concept of TDTB is extended in the current paper by exploring its use in sub-threshold; our design is referred to as TDTBsub throughout this paper.
II. DESIGN OF TDTB-SUB IN 65 NM CMOS

A. TDTB-sub Circuit
TDTBsub, as shown in Fig. 2, consists of a library positive edge-triggered latch (LATCH), a CLK delay chain, and a transition detector (TD) for determining the position of a transition of data D with respect to CLK. The CLK delay chain uses inverters 3 and 4 to provide a delayed version of CLK, called CLKd. To prevent legitimate D transitions from triggering an ERROR, the delay of CLKd should be greater than the maximum CLK-Q delay of LATCH. The TD consists of three main components: a pulse generator (PG), a pull-down network (PDN), and a keeper. The PG uses inverters 0, 1, and 2 and an XOR to generate a short voltage pulse called PULSE when D transitions. The PDN is synchronized with CLKd in order to pull node K low if D transitions while CLK is high. If node K is driven low, an ERROR results. The transmission gate (TG) is a key component of TDTBsub, as explained in Section II-B. When D transitions during CLKd high, it is considered a late-arriving signal and an ERROR signal is generated. The second transition of D, starting at about 4 ms, indicates that D arrived on time and an ERROR signal is not generated.
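The detection rule described above can be illustrated with a small behavioral sketch. It is not the circuit itself: it assumes ideal clock edges, a 50% duty cycle, CLK rising at t = 0, and a zero-width PULSE, and the function names and all numbers are hypothetical.

```python
def clkd_is_high(t, period, clkd_delay, duty=0.5):
    """Return True if the delayed clock CLKd is high at time t.

    Assumes CLK rises at t = 0 and CLKd is CLK shifted by clkd_delay.
    """
    phase = (t - clkd_delay) % period
    return phase < duty * period


def flag_errors(d_transitions, period, clkd_delay, duty=0.5):
    """Map each D transition time to True (ERROR) or False (on time)."""
    return [clkd_is_high(t, period, clkd_delay, duty) for t in d_transitions]


# Hypothetical values in arbitrary time units, for illustration only.
period = 10.0
clkd_delay = 1.0
transitions = [3.0, 8.0, 10.5, 11.5]
errors = flag_errors(transitions, period, clkd_delay)
print(errors)
```

In this toy setup the transition at 10.5 falls just after a rising CLK edge but before CLKd rises, so it is not flagged. This mirrors the reason the CLKd delay must exceed the CLK-Q delay of LATCH.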
B. Sizing and Logic for Sub-threshold
Sizing and logic styles were examined in order to operate TDTBsub in sub-threshold. Sizing is important in sub-threshold since the sigma for V T variation due to random doping fluctuations is proportional to (WL)^(-1/2). This spread in V T, or local variation, affects the delay, energy consumption, and output swings since current is exponentially dependent on V T in sub-threshold. Logic styles are also affected by variations. Some circuit topologies (e.g. the keeper of Fig. 2 used without a TG) are less robust to variations [4].
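A minimal numerical sketch of these two dependences follows. It assumes a Pelgrom-style mismatch coefficient `a_vt`, a sub-threshold slope factor `n`, and a room-temperature thermal voltage; none of these values come from the paper and all are hypothetical.

```python
import math


def sigma_vt(width, length, a_vt):
    """Mismatch model: sigma(V_T) = a_vt / sqrt(W * L)."""
    return a_vt / math.sqrt(width * length)


def current_ratio(delta_vt, n=1.5, thermal_voltage=0.0259):
    """Relative sub-threshold current after a V_T shift of delta_vt volts.

    Uses I ~ exp(-V_T / (n * kT/q)); n and kT/q are assumed values.
    """
    return math.exp(-delta_vt / (n * thermal_voltage))


# Hypothetical coefficient (V*um) and device sizes (um), for illustration.
a_vt = 0.003
small = sigma_vt(1.0, 1.0, a_vt)
large = sigma_vt(2.0, 2.0, a_vt)   # 4x the gate area
print(small, large)                # quadrupling the area halves sigma
print(current_ratio(small), current_ratio(large))
```

Under these assumptions, upsizing a device both narrows the V T spread and, through the exponential, shrinks the resulting spread in drive current.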
Inverter 0 was required to be large (i.e. W N = 2.7 μm) to ensure that variations in sub-threshold did not significantly alter the drive strength. The size needs to be large to keep the PULSE size consistent for changing rise/fall times of D. Transistors N1 and N2 were sized larger than W=5.33*W N min, since local V T variation causes stacks of devices to exhibit higher variability [4].
Inverters 0-6 were sized by examining the drive strength and the switching threshold, V M. For ultra-low voltage operation (i.e. V dd to 50 mV), the minimum V dd operation occurs when the NMOS and PMOS devices have the same current or drive strength [4]. By choosing k=1.5 (where k is the ratio of PMOS size to NMOS size) and W N =2.67*W MIN, as shown in the sizing metric of [4], the difference in drive strength is minimal in sub-threshold and V M is near V dd /2.
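The link between matched drive strength and a switching threshold near V dd /2 can be seen from a first-order sub-threshold inverter model. The sketch below is not taken from [4]. It assumes both devices share the same slope factor n, that drain-source voltages are several thermal voltages v_th = kT/q so the drain dependence can be neglected, and that V T and sizing are absorbed into the prefactors I_{n0} and I_{p0}. All of these symbols are introduced here for illustration.

```latex
\begin{align}
I_n &= I_{n0}\exp\!\left(\frac{V_{in}}{n\,v_{th}}\right) \\
I_p &= I_{p0}\exp\!\left(\frac{V_{dd}-V_{in}}{n\,v_{th}}\right) \\
V_M &= \frac{V_{dd}}{2} + \frac{n\,v_{th}}{2}\ln\!\left(\frac{I_{p0}}{I_{n0}}\right)
\quad\text{(from } I_n = I_p \text{ at } V_{in} = V_M\text{)}
\end{align}
```

Under these assumptions, equal prefactors give V_M = V_{dd}/2 exactly, and any imbalance shifts V_M by only a logarithmic amount.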
Device currents have an exponential dependence on V T in sub-threshold. As a result, variations become significant relative to device sizes [4]. In a 300-point Monte-Carlo simulation without the TG, P1 is unable to drive K high at the process corner of strong NMOS and weak PMOS, as shown in Fig. 4. The stronger NMOS of inverter 5 works against the weaker PMOS P1 and wins, keeping K (incorrectly) low at 15 μs. Since K can only be reset high while CLK is low, ERROR is locked high indefinitely because K remains low at 15 μs. Increasing the size of P1 is an option, but this comes at the cost of increased leakage and unwanted delay on CLKd. The best solution is to use a TG to cut off the feedback path to inverter 5 when P1 needs to drive K high. The keeper's use of the TG provides functionality in sub-threshold (see Section III).
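The contention between P1 and the NMOS of inverter 5 can be illustrated with a toy Monte-Carlo sketch. It is not the simulation reported in the paper: the current model, the V T spread, the corner shift, the strength ratio, and the treatment of the TG as an ideal switch are all assumptions made for illustration.

```python
import math
import random

N_VT = 1.5 * 0.0259  # assumed slope factor times thermal voltage, in volts


def p1_wins(dvt_p1, dvt_n5, strength_ratio):
    """True if P1's current exceeds the opposing NMOS current of inverter 5."""
    i_p1 = strength_ratio * math.exp(-dvt_p1 / N_VT)
    i_n5 = math.exp(-dvt_n5 / N_VT)
    return i_p1 > i_n5


def failure_fraction(points, sigma, corner_shift, strength_ratio, with_tg, seed=0):
    """Fraction of samples in which K cannot be reset high."""
    if with_tg:
        return 0.0  # assumed ideal TG: feedback path is cut, so no contention
    rng = random.Random(seed)
    fails = 0
    for _ in range(points):
        dvt_p1 = corner_shift + rng.gauss(0.0, sigma)    # weak PMOS: higher |V_T|
        dvt_n5 = -corner_shift + rng.gauss(0.0, sigma)   # strong NMOS: lower V_T
        if not p1_wins(dvt_p1, dvt_n5, strength_ratio):
            fails += 1
    return fails / points


# Hypothetical settings: 30 mV sigma, 30 mV corner shift, P1 sized 4x stronger.
without_tg = failure_fraction(300, 0.03, 0.03, 4.0, with_tg=False)
with_tg = failure_fraction(300, 0.03, 0.03, 4.0, with_tg=True)
print(without_tg, with_tg)
```

The sketch shows why upsizing P1 alone is a weak remedy in this model: a linear gain in strength competes against an exponential sensitivity to V T, whereas removing the contention removes the failure mode.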
Ensuring operation from 0.2 V to 1.2 V and adding TED functionality to LATCH made the area of TDTBsub approximately 10 times larger than that of LATCH. As a result, the average energy was 12.5 times larger than that of LATCH for V dd from 0.2 V to 1.2 V. Although adding TED functionality increased the average energy relative to a standard LATCH, system-level simulations have shown that significant energy savings can be achieved by placing TED latches on all critical paths in a pipeline [9].
C. Leakage
The majority of transistors in Fig. 2 are high-V T (HVT) devices to reduce leakage, but a closer examination of the keeper is needed in order to choose between HVT and low-V T (LVT) devices. When CLKd is high and D transitions, a PULSE results at N1. Node K should be driven low, thus initiating an ERROR signal. The main goal is to generate ERROR as fast as possible, which is especially important for the edge case in which D transitions at the same time CLK falls.
For ERROR to transition high quickly while CLKd is high and, at the same time, to prevent leakage problems while CLKd is low, three steps were taken. First, N1 was set as HVT and N2 as LVT. This resulted in a larger current than if both N1 and N2 were HVT. When CLKd goes low, N1, being HVT, prevents leakage from node K. If N1 is an LVT device, TDTBsub will not operate below 0.4 V (Fig. 5): every time CLKd goes high, node K is (incorrectly) driven low due to the leakage through N1. During CLKd high, N2 can be ON, and thus N1 should be HVT to prevent leakage. Second, P1 was made an HVT device since, when it is off, node K may be low and leakage is not desired. Finally, inverter 6 should have LVT NMOS and PMOS devices. This lowers the delay as K goes low, and thus ERROR is generated more quickly.
III. SIMULATION RESULTS
Simulations were performed to understand the operating frequency, energy per operation, and process variation effects. To simulate the operating frequency, the delay of CLKd was required to be less than 10% of the period of CLK. Fig. 6 shows the results of the operating frequency simulation for typical process corners. A frequency of 166 MHz was found at V dd =1.2 V, while a minimum frequency of 222 Hz resulted at V dd =0.2 V. Below 0.2 V, TDTBsub operates incorrectly due to increased leakage.
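The 10% rule relates the clock frequency to an upper bound on the CLKd delay. The short sketch below evaluates that bound for the two reported frequencies. The resulting delays are derived limits under the stated rule, not values reported in the paper, and the function name is hypothetical.

```python
def max_clkd_delay(frequency_hz, fraction=0.10):
    """Largest CLKd delay (seconds) allowed when it must stay below
    the given fraction of the CLK period."""
    return fraction / frequency_hz


bound_at_1v2 = max_clkd_delay(166e6)   # reported frequency at V dd = 1.2 V
bound_at_0v2 = max_clkd_delay(222.0)   # reported frequency at V dd = 0.2 V
print(f"{bound_at_1v2:.3e} s, {bound_at_0v2:.3e} s")
```

The two bounds differ by the same factor as the frequencies, roughly six orders of magnitude, which reflects how strongly delay grows as V dd approaches 0.2 V.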
The energy per operation (EPO) was simulated as V dd was swept from 0.2 V to 1.2 V and is displayed in Fig. 7. At the minimum energy point (MEP) of 0.4 V, the average power consumption over one period of CLK with an activity factor of α=0.5 is 0.37 nW. In sub-threshold, the delay grows exponentially as V dd decreases. As a result, the leakage power is integrated over a longer period. As V dd is decreased below the MEP of 0.4 V, the leakage energy begins to dominate the switching energy.
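The trade-off behind the MEP can be sketched with a first-order energy model: a switching term proportional to V dd squared plus a leakage term whose integration time grows exponentially as V dd drops. The model form is a common textbook approximation for sub-threshold operation, not the simulation behind Fig. 7. The parameters `c_eff`, `leak_factor`, and `n_vt` are hypothetical, and `leak_factor` was picked only so that the toy minimum lands in the same region as the reported MEP.

```python
import math


def energy_per_operation(vdd, alpha=0.5, c_eff=1.0, leak_factor=3000.0, n_vt=0.03885):
    """Toy EPO in arbitrary units: switching energy plus leakage energy.

    Leakage energy = leakage power * cycle time, with cycle time growing
    as exp(-vdd / n_vt) when vdd is lowered (assumed sub-threshold model).
    """
    switching = alpha * c_eff * vdd ** 2
    leakage = leak_factor * c_eff * vdd ** 2 * math.exp(-vdd / n_vt)
    return switching + leakage


def minimum_energy_point(voltages):
    """Return the supply voltage in the sweep with the lowest toy EPO."""
    return min(voltages, key=energy_per_operation)


sweep = [v / 100 for v in range(20, 121)]   # 0.20 V to 1.20 V in 10 mV steps
mep = minimum_energy_point(sweep)
print(mep, energy_per_operation(mep))
```

Below the minimum the exponential leakage term dominates, and above it the quadratic switching term does, which is the qualitative shape described in the text.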
To verify TDTBsub functionality at each V dd (0.2 V to 1.2 V), a 1000-point Monte-Carlo simulation was performed (Fig. 8). The slow (SS), typical (TT), and fast (FF) process corners of TDTBsub were also tested for V dd from 0.2 V to 1.2 V. TDTBsub operated correctly, as shown in Fig. 3, for the SS, TT, and FF corners and for all 1000 Monte-Carlo points. The 1000-point Monte-Carlo simulations also confirmed that TDTBsub operates in sub-threshold only with the TG in place.
As shown in Fig. 8, variations at the edges of CLKd and PULSE are especially important at the edges of CLK (low-to-high or high-to-low). Variations at PULSE and CLK decrease the accuracy of TDTBsub.
IV. TEST CIRCUIT
A test circuit shown in Fig. 9 is currently in fabrication using a 65 nm CMOS process. It is an adder circuit consisting of input and output shift registers (shiftIN and shiftO) operating at Vddh=1.2 V, TDTBsub latches and an adder all operating at V dd (0.2 V to 1.2 V), and level shifters (levelS). The TDTBsub latch within the test circuit is an older and less energy-efficient version of TDTBsub but has similar functionality to the version in this paper. The input data D is fed into the 60 input shift registers and the output is passed to 60 TDTBsub latches. The outputs of the TDTBsub latches are then added together. The output of the adder is again passed through 60 TDTBsub latches. From the TDTBsub latches, the output is level-shifted to Vddh=1.2 V and given to the output shift registers.
V. CONCLUSION
Future ubiquitous embedded systems require extremely low-power digital logic. Although conventional low-power circuit topologies and design methods are well established, they do not fully apply to emerging low-power applications. This is due primarily to increased process variance from deep submicron effects and energy-minimum operation. Scaling the supply voltage until failure with timing error detection (TED) can be used to achieve energy-minimum operation without the need for oversized design margins. A TED latch capable of energy-minimum operation in the sub-threshold region has been presented here. To achieve this, transistor sizes were increased and the keeper was redesigned. The latch was designed in 65 nm CMOS and has an operating voltage range of 0.2 V through 1.2 V and a minimum energy point of 0.4 V.
