This paper proposes a 16 bit subthreshold adder design using bootstrapped sense amplifier-based pass transistor logic (bootstrapped SAPTL) to overcome the severe performance degradation of the subthreshold region and to improve immunity to process variations. By employing a bootstrapped sense amplifier that includes a voltage boosting part and adopting an adder architecture based on bootstrapped SAPTL, significant improvements in performance and energy efficiency can be achieved. A case study of 16 bit adders in SMIC 130 nm technology demonstrated that the proposed adder outperformed other works in terms of performance, energy consumption, and energy efficiency. Furthermore, the statistical results of Monte Carlo analysis showed that the proposed adder is significantly more robust against process and temperature variations. At 0.3 V (TT corner, 25 °C), the proposed 16 bit adder achieved a 72% improvement in performance, 8% energy savings, and a 74% reduction in energy-delay product compared with a modified carry look-ahead adder (MCLA).
Introduction
The subthreshold technique is becoming increasingly popular with the fast growth of the market for ultra-low-power applications, such as the Internet of Things (IoT), wearable devices, and biomedical implants [1] [2] [3]. Its working principle is to scale the supply voltage below the threshold voltage, and the operating current in this region [4] is given as:
$$I_{sub} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{n V_T}\right)\left[1 - \exp\left(-\frac{V_{ds}}{V_T}\right)\right]$$
where $I_0$ is a process- and geometry-dependent current, $n$ is the subthreshold slope factor, and $V_T = kT/q$ is the thermal voltage. This equation indicates that $I_{sub}$ depends exponentially on $V_{gs}$ and $V_{th}$. Therefore, the operating current decreases sharply as the supply voltage is lowered [5, 6] and is sensitive to process variations [7] [8] [9], which limits the practical use of subthreshold circuits.
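The exponential dependence can be made concrete with a short numerical sketch of the textbook subthreshold model. The prefactor `i0`, the slope factor `n = 1.5`, and the 0.40 V threshold are assumed illustrative values, not extracted SMIC 130 nm parameters, and the function names are hypothetical. The sketch shows how quickly the current falls as the supply is scaled and how strongly a modest threshold-voltage shift (as caused by process variation) changes it.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_c: float = 25.0) -> float:
    """Thermal voltage V_T = kT/q in volts."""
    return K_B * (temp_c + 273.15) / Q_E


def i_sub(vgs: float, vth: float, vds: float,
          i0: float = 1e-7, n: float = 1.5, temp_c: float = 25.0) -> float:
    """Subthreshold drain current (A) from the exponential model.

    i0 and n are illustrative placeholders, not extracted SMIC 130 nm values.
    """
    vt = thermal_voltage(temp_c)
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))


if __name__ == "__main__":
    vth = 0.40  # assumed; inside the 365-450 mV range quoted for SMIC 130 nm
    for vdd in (0.30, 0.25, 0.20):
        # ON transistor of a static gate: gate and drain both at Vdd
        print(f"Vdd = {vdd:.2f} V -> I_sub = {i_sub(vdd, vth, vdd):.3e} A")
    # Sensitivity to a +30 mV threshold shift (e.g. from process variation)
    ratio = i_sub(0.30, vth, 0.30) / i_sub(0.30, vth + 0.03, 0.30)
    print(f"+30 mV Vth shift divides the current by {ratio:.2f}")
```

With these assumptions, a 30 mV increase in $V_{th}$ roughly halves the current, which is why subthreshold circuits are so sensitive to process variations.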
The adder, a frequently used arithmetic unit in many ultra-low-power applications [10, 11], also suffers from these problems in the subthreshold region. Numerous studies have addressed subthreshold adder design. One approach is to adopt new devices such as FinFET [12] and FD-SOI [13], which are still immature. Another is to modify the basic structure of the full adder cell [14] [15] [16] [17]; this approach usually relies on pass transistors and transmission gates to reduce area and power, but it does little to improve performance. In addition, fast adder architectures optimized at the logic level, such as the carry look-ahead adder (CLA) [15, 18, 20], are a viable way to increase circuit speed [18, 19]. However, these works mainly focus on the structure of gate-level circuits, such as an XOR gate, and still struggle to achieve acceptable performance in the subthreshold region, while the additional circuits of a CLA bring a large area and energy overhead. Furthermore, subthreshold adder designs often lack effective solutions for reducing the effect of process variations.
Sense amplifier-based pass transistor logic (SAPTL) [21] can be introduced here to optimize performance and energy dissipation simultaneously [22]. This logic style performs logical operations through a pass transistor network (PTN) without DC leakage paths and restores the signals with a sense amplifier (SA). The differential amplification mechanism and the internal cross-coupled latch provide stronger robustness against variations. However, two main drawbacks remain when applying SAPTL to subthreshold adder design: 1) the performance of the SA degrades severely because its differential inputs are generated by a PTN and are not full swing; and 2) there is no area-efficient way to use SAPTL in an adder circuit. In Reference [23], a SAPTL circuit directly replaced the full adders, which required many SAs and resulted in a large overhead. In Reference [24], SAPTL was adopted to generate all data outputs of an 8 bit adder, with each output requiring its own PTN stack and SA, leading to a large area. In Reference [25], SAPTL was used to compute each carry signal in a 4 bit CLA block, but the design was still complex and needed eight SAs. Moreover, none of these works attempted to improve the performance of the SAs for subthreshold operation.
To overcome the above problems of SAPTL-based subthreshold adders, this work proposes a subthreshold adder design based on a bootstrapped SAPTL circuit. The contributions of this paper are listed as follows: 1) we present a bootstrapped sense amplifier to improve the performance of a subthreshold SAPTL circuit, with six additional transistors and a PMOS capacitor; and 2) we introduce an adder architecture that adopts bootstrapped SAPTL in each 4 bit adder block to accelerate the carry chain with low overhead.
The remainder of this paper is organized as follows. Section 2 introduces the proposed bootstrapped sense amplifier circuit and the adder architecture design using SAPTL. Section 3 describes the experiment and results. Section 4 concludes this paper.
Proposed Subthreshold Adder Design

Bootstrapped Sense Amplifier
As shown in Figure 1, the bootstrapped sense amplifier (BSA) is partitioned into two parts: the basic SAPTL sense amplifier and the voltage boosting part. The BSA dynamically boosts the $V_{gs}$ of the input transistors, which increases the operating current and improves performance. Taking the case $V_S > \overline{V_S}$ as an example, the delay of the basic sense amplifier in the evaluation mode [26] can be expressed as:
where $k_1$ and $k_2$ are fitting parameters, $C_A$ is the capacitance at node A, which is generally fixed, and $I_{SA}$ is the operating current of MN1. In the subthreshold region, the second term is negligible compared with the first [27]. Therefore, the speed of the SA depends principally on $I_{SA}$. However, since $V_S$ is the output of the PTN and is normally much lower than $V_{dd}$, $I_{SA}$ is very weak and severely limits the speed of the SA. The voltage boosting part is therefore applied only to boost the $V_{gs}$ of MN1, enlarging $I_{SA}$ to increase the speed. Figure 2 presents the simulated waveforms of the BSA circuit at 0.3 V (TT corner, 25 °C). As can be seen, the BSA has two operation phases:
1. EN = 0: the BSA is in the pre-charge mode. $V_A = V_B = 1$ and, thus, both outputs are 0. FB = 1, so MN3 is on and MN4 is off, making $V_{NN} = 1$ and $V_{NB} = 0$. A voltage equal to $V_{dd}$ is thus applied across $C_{BN}$.
2. EN = 1: the BSA is in the evaluation mode. Before the differential inputs become valid, $V_{NN} = 0$ and MN3 turns off; the capacitive coupling through $C_{BN}$ then forces $V_{NB}$ below 0. Once the input signals become valid, the $V_{gs}$ of MN1 is thus effectively enlarged, making MN1 faster. At the same time, the $V_{bs}$ of MN1 also increases, so the $V_{th}$ of MN1 is slightly reduced by forward body biasing, which accelerates MN1 further. Either $V_A$ or $V_B$ is pulled down below 0, and the cross-coupled latch generates two complementary outputs. FB then turns to 0 and turns on MN4, which resets $V_{NB}$ to 0. This reset prevents leakage through MN2 from pulling $V_B$ down: if $V_{NB}$ were not reset, the $V_{gs}$ of MN2 would remain above 0 after the evaluation is finished.
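The speed argument above can be made roughly quantitative. The sketch below assumes, as a simplification, that the dominant term of the SA delay is inversely proportional to $I_{SA}$ and that MN1 follows the exponential subthreshold law; pulling the source node NB to $-|V_{NB}|$ then raises the $V_{gs}$ of MN1 by $|V_{NB}|$. The slope factor n = 1.5 is an assumed value, the function names are hypothetical, and the forward-body-bias benefit is deliberately left out, so the numbers are only indicative.

```python
import math


def thermal_voltage(temp_c: float = 25.0) -> float:
    """kT/q in volts."""
    return 1.380649e-23 * (temp_c + 273.15) / 1.602176634e-19


def current_gain(delta_vgs: float, n: float = 1.5, temp_c: float = 25.0) -> float:
    """Factor by which a subthreshold current grows when V_gs rises by delta_vgs.

    Only the exp(V_gs / (n V_T)) dependence is modelled; the additional
    speed-up from forward body bias (lower V_th of MN1) is ignored.
    """
    return math.exp(delta_vgs / (n * thermal_voltage(temp_c)))


def relative_sa_delay(delta_vgs: float, **kwargs) -> float:
    """SA delay relative to the unboosted case, assuming delay ~ C_A / I_SA."""
    return 1.0 / current_gain(delta_vgs, **kwargs)


if __name__ == "__main__":
    # delta_vgs = |V_NB|: how far the source of MN1 is pushed below 0 V
    for dv_mv in (0, 25, 50, 100, 150):
        dv = dv_mv / 1000.0
        print(f"|V_NB| = {dv_mv:3d} mV: I_SA x{current_gain(dv):6.2f}, "
              f"delay x{relative_sa_delay(dv):.3f}")
```

Because the gain is exponential in the boost, even a few tens of millivolts below ground yields a multiple of the unboosted current, which is the mechanism the BSA relies on.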
The boosting part introduces extra energy overhead, but this can be offset by lower leakage energy owing to the smaller delay. $C_{BN}$ is realized as a PMOS capacitor with its drain, source, and body terminals connected together. The minimum value of $V_{NB}$ depends on $C_{BN}$: a larger capacitor drives $V_{NB}$ lower, giving a stronger speed enhancement at the cost of higher energy consumption, whereas a capacitor that is too small cannot offer enough performance improvement. The value of $C_{BN}$ should therefore be set through simulation to minimize the energy-delay product. In this work, the PMOS capacitor was sized W/L = 1.1 µm/1.1 µm, providing a $C_{BN}$ of approximately 10 fF.
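The dependence of $V_{NB}$ on $C_{BN}$ can be sketched with a simple charge-conservation model: when $V_{NN}$ falls from $V_{dd}$ to 0 while node NB floats, NB is pulled to about $-V_{dd} \cdot C_{BN}/(C_{BN} + C_{par})$, where $C_{par}$ lumps the other capacitance on NB. $C_{par}$, the PTN output voltage, and the function name below are hypothetical values chosen for illustration, and leakage is ignored.

```python
def boosted_vnb(vdd: float, c_bn: float, c_par: float) -> float:
    """Node NB voltage after V_NN falls from vdd to 0 (charge conservation).

    Before: V_NN = vdd, V_NB = 0, so the NB side of C_BN holds -c_bn * vdd.
    After:  NB floats, so c_bn * (V_NB - 0) + c_par * V_NB = -c_bn * vdd.
    Leakage and conduction through MN1/MN2 are ignored.
    """
    return -vdd * c_bn / (c_bn + c_par)


if __name__ == "__main__":
    VDD = 0.3
    C_PAR = 5e-15  # hypothetical parasitic capacitance on node NB (F)
    V_S = 0.2      # hypothetical degraded PTN output on the gate of MN1 (V)
    for c_bn_ff in (2, 5, 10, 20, 40):
        v_nb = boosted_vnb(VDD, c_bn_ff * 1e-15, C_PAR)
        print(f"C_BN = {c_bn_ff:2d} fF: V_NB = {v_nb * 1e3:7.1f} mV, "
              f"V_gs(MN1) = {(V_S - v_nb) * 1e3:6.1f} mV")
```

In this model $V_{NB}$ approaches $-V_{dd}$ only asymptotically while the switched capacitance grows linearly with $C_{BN}$, which is consistent with choosing $C_{BN}$ by an energy-delay trade-off rather than simply making it large.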
The Adder Architecture Based on Bootstrapped-SAPTL
Figure 3 illustrates the adder architecture based on bootstrapped SAPTL. Each 4 bit adder block is divided into a carry chain part, realized by bootstrapped SAPTL, and a summation part using serial full adders. This architecture takes full advantage of the bootstrapped SAPTL technique to improve performance while minimizing the extra overhead. The timing diagram of the bootstrapped SAPTL circuit is illustrated in Figure 5. The delay of the carry chain part is as follows:
$$D_{carry} = D_{Driver} + D_{PTN} + D_{BSA}$$
where $D_{Driver}$ is the rising delay of the stack driver and $D_{PTN}$ is the delay for the PTN stack to develop a valid differential voltage. $D_{BSA}$ can be expressed by Equation (2); because the BSA greatly reduces it, $D_{BSA}$ is barely visible in Figure 5. In addition to using the BSA to improve performance, the stack driver adopts low-threshold PMOS transistors (plvt) to accelerate the pull-up operation and reduce $D_{Driver}$, and the PTN stack adopts minimum-size low-threshold NMOS transistors (nlvt) [28] to increase the on/off current ratio and reduce $D_{PTN}$. Using plvt leads to a certain increase in energy consumption but decreases the delay effectively, whereas using nlvt in the PTN reduces the delay with almost no extra energy.
As for the summation part, its speed is independent of the carry chain part, and the delay of the summation part ($D_{sum}$) is always larger than $D_{carry}$. Figure 6 illustrates $D_{carry}$ and $D_{sum}$ in the worst case with supply voltage scaling in SMIC 130 nm technology. It can be observed that the ratio of $D_{carry}$ to $D_{sum}$ ranges from 41% to 50%. The overall worst-case delay of a multi-bit adder composed of n 4 bit bootstrapped SAPTL adder blocks can then be expressed as follows:
$$D_{worst\text{-}case} = (n-1)\,D_{carry} + D_{sum}$$
Since $D_{carry}$ is the main component of $D_{worst\text{-}case}$, the summation part can adopt a modified full adder structure with relatively slower speed to reduce power dissipation. The full adder and XOR gate structures of Reference [15] were adopted in this work.
For a ripple-carry adder composed of n serial 4 bit adder blocks, the delay is approximately $n \cdot D_{sum}$; thus, the ratio of delay reduction obtained with the proposed 4 bit bootstrapped SAPTL adder blocks is as follows:
$$r = 1 - \frac{(n-1)\,D_{carry} + D_{sum}}{n\,D_{sum}} = \frac{n-1}{n}\left(1 - \frac{D_{carry}}{D_{sum}}\right)$$
Evidently, r increases with n. Therefore, the proposed adder architecture is best applied to wider adders to enhance speed. A 16 bit bootstrapped SAPTL adder (n = 4) offers an r of 37.50-44.25%, while a 32 bit bootstrapped SAPTL adder (n = 8) increases r to 43.75-51.63%.
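The delay model can be checked numerically. Assuming the worst-case delay is $(n-1)D_{carry} + D_{sum}$ and the ripple-carry reference takes $n \cdot D_{sum}$, this sketch recovers the quoted ranges: 37.50-44.25% for n = 4 and 43.75-51.63% for n = 8 when $D_{carry}/D_{sum}$ spans 50% down to 41%. Only that ratio matters, so $D_{sum}$ is normalized to 1; the function names are illustrative.

```python
def worst_case_delay(n_blocks: int, d_carry: float, d_sum: float) -> float:
    """Carry ripples through n-1 carry chains, then the last block sums."""
    return (n_blocks - 1) * d_carry + d_sum


def ripple_carry_delay(n_blocks: int, d_sum: float) -> float:
    """Reference ripple-carry adder of n serial 4 bit blocks."""
    return n_blocks * d_sum


def delay_reduction(n_blocks: int, carry_to_sum: float) -> float:
    """r = 1 - D_worst / (n * D_sum); only the ratio D_carry / D_sum matters."""
    d_sum = 1.0  # normalised summation-part delay
    d_carry = carry_to_sum * d_sum
    return 1.0 - (worst_case_delay(n_blocks, d_carry, d_sum)
                  / ripple_carry_delay(n_blocks, d_sum))


if __name__ == "__main__":
    for bits in (16, 32):
        n = bits // 4
        r_min = delay_reduction(n, 0.50)  # slowest carry chain (50% of D_sum)
        r_max = delay_reduction(n, 0.41)  # fastest carry chain (41% of D_sum)
        print(f"{bits} bit (n = {n}): r = {r_min:.2%} ... {r_max:.2%}")
```

As n grows, r approaches $1 - D_{carry}/D_{sum}$, i.e. 50-59% for the reported ratio range, which bounds the benefit for very wide adders.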
Experiment Results
The case of 16 bit adders in SMIC 130 nm technology was studied to evaluate the proposed adder design. This adder consisted of four serial 4 bit bootstrapped SAPTL adder blocks, as depicted in Figure 7a, where the worst-case critical path is marked with a red line. The input signals have a 1 ns rise/fall time. Figure 7b shows the simulated transient waveform at 0.3 V (TT corner, 25 °C).
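To make the block-level partitioning concrete, the sketch below is a purely behavioral (logic-level) model of a 16 bit adder built from four serial 4 bit blocks: each block's summation part uses serial full adders, while the carry passed to the next block comes from a carry-chain part that produces the block carry-out in one evaluation. Writing that carry-out in generate/propagate form is an assumption about the Boolean function the carry-chain part must compute; the model says nothing about the transistor-level PTN, timing, or energy, and all function names are hypothetical.

```python
import random
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One full adder cell of the summation part: (sum, carry)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def block_carry_out(a_bits, b_bits, cin: int) -> int:
    """Carry-out of a 4 bit block as a single two-level Boolean function.

    Generate/propagate form of the function the carry-chain part evaluates
    (an assumption; it does not describe the PTN topology itself).
    """
    g0, g1, g2, g3 = (a & b for a, b in zip(a_bits, b_bits))
    p0, p1, p2, p3 = (a ^ b for a, b in zip(a_bits, b_bits))
    return (g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
            | (p3 & p2 & p1 & p0 & cin))


def adder_block(a_bits, b_bits, cin: int):
    """4 bit block (LSB first): serial full adders for sums, carry chain for C_out."""
    sums, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s, c = full_adder(a, b, c)
        sums.append(s)
    return sums, block_carry_out(a_bits, b_bits, cin)


def add16(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """16 bit adder made of four serial 4 bit blocks; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(16)]
    b_bits = [(b >> i) & 1 for i in range(16)]
    sums, carry = [], cin
    for k in range(0, 16, 4):
        s, carry = adder_block(a_bits[k:k + 4], b_bits[k:k + 4], carry)
        sums.extend(s)
    return sum(bit << i for i, bit in enumerate(sums)), carry


if __name__ == "__main__":
    # Exhaustive check of one block (2^9 input combinations)
    for bits in product((0, 1), repeat=9):
        a4, b4, cin = list(bits[:4]), list(bits[4:8]), bits[8]
        sums, cout = adder_block(a4, b4, cin)
        a_val = sum(x << i for i, x in enumerate(a4))
        b_val = sum(x << i for i, x in enumerate(b4))
        s_val = sum(x << i for i, x in enumerate(sums))
        assert s_val + (cout << 4) == a_val + b_val + cin
    # Random check of the 16 bit composition
    rng = random.Random(0)
    for _ in range(10_000):
        a, b, c = rng.randrange(1 << 16), rng.randrange(1 << 16), rng.randrange(2)
        s, cout = add16(a, b, c)
        assert s + (cout << 16) == a + b + c
    print("behavioral model matches integer addition")
```

In this model the critical path crosses three carry-chain evaluations and then one summation part, which is the structure captured by the worst-case delay expression $(n-1)D_{carry} + D_{sum}$ with n = 4.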
Based on the sizing method in Reference [29], the proposed adder was sized through simulations in SMIC 130 nm technology. The transistor parameter settings of the proposed adder are shown in Table 1. All NMOS transistors in this work were sized with the minimum channel width and length (W/L = 150 nm/130 nm) to exploit the reverse narrow channel effect, which yields a relatively lower threshold voltage.
Comparison with Other Works
For comparison, a 16 bit modified CLA (MCLA) [15], a 16 bit adder based on asynchronous SAPTL (ADSA-SAPTL ADD) [24], a 16 bit SAPTL CLA [25], and a conventional 16 bit CLA [29] were also simulated using HSPICE in SMIC 130 nm technology. All these adders were sized with the sizing scheme in Reference [29]. In SMIC 130 nm technology, $V_{th}$ ranges from 365 mV to 450 mV depending on the channel width when L = 130 nm. Hence, we selected $V_{dd}$ = 0.3 V, which lies below this range, to compare the performance parameters of these 16 bit adders, as presented in Table 2. The bold-faced numbers in round brackets represent normalized ratios.
It can be observed that:
1. The proposed adder used 38% fewer transistors than the ADSA-SAPTL ADD [24], 48% fewer than the SAPTL CLA [25], and 35% fewer than the conventional CLA [30], demonstrating the area advantage of the proposed architecture. However, it used more transistors than the MCLA [15], which means extra area overhead.
2. The bootstrapped SAPTL adder provided the best performance, 72% faster than the MCLA [15]. Its maximum operating frequency at 0.3 V (TT corner, 25 °C) reached 5.5 MHz.
3. In terms of energy consumption, the bootstrapped SAPTL adder consumed the least energy, 8.1% less than the MCLA despite its larger area overhead.
4. In terms of energy efficiency, the proposed adder had a dominant advantage. The EDP of the bootstrapped SAPTL adder was 3.62 fJ/MHz, which is 74.1% lower than that of the MCLA [15], 88.2% lower than the ADSA-SAPTL ADD [24], 92.6% lower than the SAPTL CLA [25], and 87.4% lower than the conventional CLA [30].
Worst-Case Delay versus Supply Voltage
As can be seen, the proposed adder had the fastest speed, and its delay was one order of magnitude lower than those of the other works in the subthreshold region. The large advantage over the other two SAPTL adders demonstrates the speed enhancement provided by the bootstrapped SA.
Energy Consumption versus Supply Voltage
The worst-case energy consumption with supply voltage scaling is presented in Figure 9. Because the proposed architecture uses fewer SAs, the energy overhead of the bootstrapped SAPTL adder was far below that of the other two SAPTL adders. Furthermore, the proposed adder always consumed less energy than the MCLA when $V_{dd}$ ≥ 0.21 V. The minimum energy point (MEP) of the 16 bit bootstrapped SAPTL adder was 0.2 V, with a minimum energy consumption of 13.71 fJ. This is 75.5% lower than the 59.99 fJ of ADSA-SAPTL (MEP = 0.3 V), 79.1% lower than the 65.51 fJ of SAPTL CLA (MEP = 0.25 V), and 66.5% lower than the 40.87 fJ of the conventional CLA (MEP = 0.2 V), and very close to the 13.29 fJ of the MCLA (MEP = 0.18 V).
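The percentage savings at the MEP follow from $1 - E_{proposed}/E_{other}$. Recomputing them from the listed energies reproduces the 79.1% (SAPTL CLA) and 66.5% (conventional CLA) figures; for ADSA-SAPTL, however, 13.71 fJ against 59.99 fJ works out to about 77.1% rather than 75.5%, so one of those two values appears inconsistent. The same arithmetic shows the proposed adder's MEP energy is about 3% above the MCLA's. The dictionary and function names are illustrative.

```python
def saving_percent(e_proposed: float, e_other: float) -> float:
    """Percentage by which e_proposed is lower than e_other (negative if higher)."""
    return 100.0 * (1.0 - e_proposed / e_other)


E_PROPOSED_FJ = 13.71  # proposed adder at its MEP (0.2 V), from the text
MEP_ENERGY_FJ = {      # other adders at their own MEPs, as listed in the text
    "ADSA-SAPTL ADD": 59.99,
    "SAPTL CLA": 65.51,
    "conventional CLA": 40.87,
    "MCLA": 13.29,
}

if __name__ == "__main__":
    for name, energy in MEP_ENERGY_FJ.items():
        print(f"{name:>16}: {saving_percent(E_PROPOSED_FJ, energy):6.1f} %")
```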
Energy-Delay Product versus Supply Voltage
The energy-delay product (EDP) is an important metric for evaluating the energy efficiency of a circuit [31]. Figure 10 shows the EDP of these 16 bit adders with supply voltage scaling. The proposed adder had the lowest EDP in the subthreshold region owing to its better performance and lower energy overhead, which implies that it was the most energy-efficient of these adders.
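Because EDP is the product of energy and delay, fractional savings combine multiplicatively: $(1 - r_{EDP}) = (1 - r_{delay})(1 - r_{energy})$. Applying this to the 0.3 V comparison with the MCLA (72% shorter worst-case delay, 8.1% lower energy) gives an EDP reduction of about 74.3%; inverting it from the reported 74.1% gives a delay reduction of about 71.8%, i.e. 72% after rounding, so the three reported figures are mutually consistent. The function names are illustrative.

```python
def edp_reduction(delay_reduction: float, energy_reduction: float) -> float:
    """Fractional EDP reduction from fractional delay and energy reductions.

    EDP = E * D (fJ/MHz is the same as fJ*us), so the remaining fractions
    multiply: (1 - r_edp) = (1 - r_delay) * (1 - r_energy).
    """
    return 1.0 - (1.0 - delay_reduction) * (1.0 - energy_reduction)


def implied_delay_reduction(edp_red: float, energy_red: float) -> float:
    """Invert edp_reduction() for the delay term."""
    return 1.0 - (1.0 - edp_red) / (1.0 - energy_red)


if __name__ == "__main__":
    # 0.3 V comparison with the MCLA: 72% shorter delay, 8.1% lower energy
    print(f"EDP reduction: {edp_reduction(0.72, 0.081):.1%}")
    # Reported EDP saving (74.1%) and energy saving (8.1%) -> delay saving
    print(f"implied delay reduction: {implied_delay_reduction(0.741, 0.081):.1%}")
```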
Simulation Results of Process and Temperature Variations
Through 1000 Monte Carlo simulations using HSPICE, the statistical distributions of the adders' delays at 0.3 V were obtained, as shown in Figure 11. The delay distribution of the proposed 16 bit adder was clearly the most concentrated, with the lowest ratio of three standard deviations to the mean (3σ/µ). Moreover, Figure 12 presents the worst-case delay of the different adders at all process corners.
The results indicate that the proposed adder exhibited smaller performance fluctuations across process corners. Figure 13 depicts how the delay changes with temperature; the proposed adder also showed a clear advantage in temperature sensitivity. These results demonstrate that the proposed 16 bit adder had a lower sensitivity to process and temperature variations than the other adders.
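The spread metric used for the Monte Carlo comparison, 3σ/µ, normalizes the delay spread by the mean so that adders with very different absolute delays can be compared. The sketch below computes it with the sample standard deviation (for 1000 runs the difference from the population value is negligible); the two delay populations are hypothetical Gaussian samples, not the paper's Monte Carlo data, and the function name is illustrative.

```python
import random
import statistics


def three_sigma_over_mu(samples) -> float:
    """Spread metric for Monte Carlo delay distributions: 3 * sigma / mean."""
    return 3.0 * statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical delay populations (ns): same mean, different spread
    tight = [rng.gauss(180.0, 10.0) for _ in range(1000)]
    wide = [rng.gauss(180.0, 30.0) for _ in range(1000)]
    print(f"tight: 3sigma/mu = {three_sigma_over_mu(tight):.3f}")
    print(f"wide:  3sigma/mu = {three_sigma_over_mu(wide):.3f}")
```

A lower value means a more concentrated distribution, which is the sense in which the proposed adder's delay distribution is described as the most centralized.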
Overall, the simulation results indicate that the proposed subthreshold adder based on a bootstrapped SAPTL circuit effectively improves performance. Meanwhile, the energy consumption was reduced and, thus, the energy efficiency was improved significantly. The bootstrapped SAPTL adder also provided stronger immunity to process and temperature variations than the other works.
Post-Layout Simulation Result
Figure 14 shows the layout of the 16 bit bootstrapped SAPTL adder in SMIC 130 nm technology. The red panes outline the carry chain part and the summation part of a 4 bit bootstrapped SAPTL adder block. The area was 95 µm × 21.5 µm. The post-layout simulation at 0.3 V (TT corner, 25 °C) gave a worst-case delay of 230.17 ns and an energy consumption of 27.36 fJ. Parasitic parameters noticeably degrade the circuit's performance in the subthreshold region, so the layout still needs to be carefully optimized in future work.
Conclusions
In this paper, a subthreshold adder circuit was proposed that employs a bootstrapped sense amplifier in SAPTL and adopts a bootstrapped SAPTL-based architecture in the carry chain. The BSA effectively enhances the speed of SAPTL by enlarging the gate-source voltage. The proposed adder architecture exploits the performance advantage of bootstrapped SAPTL in multi-bit adder circuits without using too many sense amplifiers. A 16 bit adder was studied in SMIC 130 nm technology. The comparison results demonstrated that the proposed 16 bit adder outperformed other works in terms of performance, energy consumption, energy efficiency, and sensitivity to process and temperature variations in the subthreshold region. At 0.3 V, the proposed adder achieved a 72% shorter worst-case delay, 8% lower energy, and 74% lower EDP than the MCLA. However, there is still room for further optimization of the area and of the post-layout performance. Furthermore, this work was performed only in SMIC 130 nm technology; the design still needs to be applied to more advanced technology nodes in future work to verify its advantages.
