Abstract-This paper presents an improved StrongARM latch comparator, designed and simulated in 90nm and 32nm CMOS technologies. Compared with the conventional design, the proposed design improves energy efficiency by 7% and speed by 14%, and reduces clock feedthrough by 41% on average (90nm, 1mV differential input). The new architecture also reduces area, since it needs fewer transistors (9 instead of 11) to achieve the enhanced performance.
I. INTRODUCTION
The StrongARM latch comparator is a well-known topology. Several features make it attractive: 1) it does not consume static power, 2) it produces a rail-to-rail output, 3) it has a small input-referred offset, and 4) it has a high input impedance [1, 2]. These features have led to its wide use as a sense amplifier, a comparator, or a robust latch [1]. As a consequence, the StrongARM latch is commonly found in analog-to-digital converters (ADCs) [2] and flip-flop circuits [3]. With the growing interest in wearable electronics, the Internet of Things (IoT), and other low-power applications, there is a persistent need for small, fast, and power-efficient circuits.
In this paper, we propose an improved design for the StrongARM latch comparator that has a smaller area, higher speed, and better energy efficiency.
II. CONVENTIONAL TOPOLOGY
The original StrongARM latch was first introduced by Kobayashi et al. in 1993 [4] and is shown in Fig. 1a. Since then, several modifications have been made to improve the robustness of the circuit [5, 6], at the cost of the size, speed, and efficiency of the latch. Fig. 1b shows a schematic of the conventional StrongARM latch proposed in [5]. The latch consists of 11 transistors: four charging transistors (CT1, CT2, CT3, and CT4), four cross-coupled transistors (T1, T2, T3, and T4), two input transistors (T5 and T6), and one tail current transistor (T7).
Fig. 1. (a) Original StrongARM latch [4], and (b) conventional design proposed by [5].
978-1-5090-6508-0/17/$31.00 ©2017 IEEE PRIME 2017, Giardini Naxos-Taormina, Italy Digital Circuits and Sub-Systems
The operation of the latch consists of three phases, Reset, Amplification, and Regeneration, as illustrated in Fig. 2. The Reset phase starts when CLK goes Low; the internal capacitances at nodes A, A', B, and B' are then charged to VDD through the CTs. The Amplification phase starts when CLK goes High, turning all CTs OFF and allowing these capacitances to discharge through T7. T5 and T6 are biased by a constant common-mode voltage (VCM); hence, these transistors are always ON. A small differential voltage (Vdiff) between the gates of T5 and T6 causes a slight difference between the currents flowing through these two transistors. Consequently, the capacitances at nodes B and B' discharge at slightly different speeds, so the voltages at these nodes drop at different rates. T3 and T4 turn ON when the voltages at nodes B and B' drop to VDD - Vthn. After that, the voltages at nodes A and A' also start to drop at different rates. The Regeneration phase starts when the voltage at either A or A' drops to VDD - |Vthp|, turning ON either T1 or T2, while the other transistor remains OFF due to the cross-coupled configuration. As a result, one of the nodes A and A' settles at VDD and the other at 0 V, depending on the polarity of Vdiff. The output is taken from nodes A and A' and fed into inverters [1].
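The speed of the Regeneration phase described above is often estimated with a first-order textbook model (an assumption here, not a model derived in this paper): the positive feedback of the cross-coupled pairs makes the differential voltage between A and A' grow as ΔV(t) = ΔV0·exp(t/τ), with τ ≈ C/gm, so reaching a full swing takes about τ·ln(VDD/ΔV0). In this sketch, τ, VDD, and the initial imbalance ΔV0 (the amplified Vdiff at the start of Regeneration) are hypothetical values chosen only for illustration.

```python
import math


def regeneration_time(v_init: float, v_final: float, tau: float) -> float:
    """Time for the latch's differential voltage to grow from v_init to v_final.

    Assumed first-order model: dv(t) = v_init * exp(t / tau), with tau ~ C / gm.
    Solving dv(t) = v_final gives t = tau * ln(v_final / v_init).
    """
    if v_init <= 0 or v_final <= v_init:
        raise ValueError("require 0 < v_init < v_final")
    return tau * math.log(v_final / v_init)


# Hypothetical values for illustration only (not taken from the paper).
TAU = 10e-12  # 10 ps regeneration time constant
VDD = 1.0     # 1 V supply, i.e. the rail-to-rail target swing

for v0 in (1e-3, 10e-3, 100e-3):
    t = regeneration_time(v0, VDD, TAU)
    print(f"initial imbalance {v0 * 1e3:6.1f} mV -> {t * 1e12:5.1f} ps")
```

Under these assumed values, each tenfold increase in the initial imbalance shortens regeneration by τ·ln10, about 23 ps, which is one way to see why a larger Vdiff yields a faster decision.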
One limitation of this topology is clock feedthrough. The voltages at A and A' briefly follow the clock, producing a spike above VDD when the clock goes High and another spike below 0 V when the clock goes Low. These spikes slow down the Amplification and Reset phases and thereby increase the overall delay of the circuit. Clock feedthrough is caused by gate-source (or gate-drain) coupling through the internal capacitances. Several well-known techniques reduce it, such as connecting capacitors or transistors at the gates of the charging transistors [7, 8], or replacing the charging transistors with transmission gates [9]. However, any added transistor (or capacitor) increases the total capacitance in the circuit and hence decreases the speed of the latch.
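The capacitance trade-off above can be illustrated with a simple capacitive-divider estimate of the feedthrough step: a clock edge ΔVclk couples through a coupling capacitance Cc onto a node with remaining capacitance Cnode, giving a step of roughly ΔVclk·Cc/(Cc + Cnode). This is an assumed first-order model with hypothetical capacitance values; it covers only the simplest option of padding the node with extra capacitance and does not model the specific techniques of [7]-[9].

```python
def feedthrough_step(v_clk_step: float, c_couple: float, c_node: float) -> float:
    """Approximate voltage step coupled onto a node by a clock edge (capacitive divider).

    v_clk_step: clock transition amplitude [V]
    c_couple:   coupling capacitance from the clock to the node [F]
    c_node:     all other capacitance at the node [F]
    """
    return v_clk_step * c_couple / (c_couple + c_node)


# Hypothetical values for illustration only (not taken from the paper).
V_CLK = 1.0         # 1 V clock edge
C_COUPLE = 0.1e-15  # 0.1 fF gate-drain overlap of a charging transistor
C_NODE = 1.0e-15    # 1 fF intrinsic node capacitance
C_EXTRA = 1.0e-15   # 1 fF extra capacitance added to suppress feedthrough

spike_plain = feedthrough_step(V_CLK, C_COUPLE, C_NODE)
spike_padded = feedthrough_step(V_CLK, C_COUPLE, C_NODE + C_EXTRA)
# Ratio of total node capacitance with and without the extra capacitor.
load_ratio = (C_COUPLE + C_NODE + C_EXTRA) / (C_COUPLE + C_NODE)

print(f"spike without extra C: {spike_plain * 1e3:.1f} mV")
print(f"spike with extra C:    {spike_padded * 1e3:.1f} mV")
print(f"node capacitance grows by {load_ratio:.2f}x (slower discharge)")
```

The extra capacitor roughly halves the coupled step in this example, but the node capacitance nearly doubles, so the node charges and discharges more slowly, which is the speed penalty the text describes.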
III. PROPOSED DESIGN
In general, the performance of the circuit improves when the current increases, which can be achieved by increasing the transistor widths. However, wider transistors also increase the total capacitance in the circuit, so the net performance remains roughly the same. In other words, to improve the performance of the latch, the total capacitance should be reduced without decreasing the current or, equivalently, the current should be increased without increasing the capacitance.
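The width argument above can be checked with the first-order relation t = C·ΔV/I for a node discharged by a roughly constant current. This sketch assumes, as the text does, that the node capacitance is dominated by transistor capacitance and therefore scales with width; the capacitance, current, and voltage swing are hypothetical values.

```python
def discharge_time(cap_f: float, current_a: float, delta_v: float) -> float:
    """First-order delay of a node discharged by a constant current: t = C * dV / I."""
    return cap_f * delta_v / current_a


# Hypothetical baseline (illustrative values, not from the paper).
C0 = 10e-15  # 10 fF node capacitance
I0 = 50e-6   # 50 uA discharge current
DV = 0.5     # 0.5 V voltage swing

t_base = discharge_time(C0, I0, DV)
# Doubling every transistor width doubles both the current and the
# (transistor-dominated) capacitance, so the delay does not change.
t_wide = discharge_time(2 * C0, 2 * I0, DV)
# Removing 20% of the capacitance while keeping the current unchanged.
t_lowc = discharge_time(0.8 * C0, I0, DV)

print(f"baseline {t_base * 1e12:.0f} ps, 2x width {t_wide * 1e12:.0f} ps, "
      f"-20% C {t_lowc * 1e12:.0f} ps")
```

Widening every device leaves the delay unchanged, while trimming capacitance at constant current shortens it proportionally, which is the approach taken in the proposed design.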
The proposed design, which consists of 9 transistors, is shown in Fig. 3. Its key advantage is that it reduces the total internal capacitance of the circuit without compromising the current. This is achieved by placing the input transistors in the middle, between the cross-coupled transistors. Since the input transistors are always ON, nodes B and B' are recharged through T5 and T6, which eliminates the need for CT3 and CT4. Therefore, the speed and energy efficiency of the latch are improved, while its area and clock feedthrough are reduced.
IV. RESULTS AND DISCUSSION
The proposed and conventional latches are simulated in the 90nm technology. Fig. 4 shows the Reset, Amplification, and Regeneration phases of both latches when Vdiff equals 1mV. The figure shows that the proposed design produces a rail-to-rail output in a shorter time, and that clock feedthrough is reduced in both the Amplification and Reset phases. The Reset phase of the proposed design, however, takes longer to recharge the circuit. This is expected, since the proposed design uses only two transistors (CT1 and CT2) to charge nodes A, A', B, and B', compared with four in the conventional design. Nevertheless, the slower Reset phase is not critical, since it is still much faster than the Amplification and Regeneration phases.
Paper P63 PRIME 2017, Giardini Naxos-Taormina, Italy
In general, VCM has a significant impact on the performance and power consumption of the latches. Fig. 5 shows the Energy Delay Product (EDP) versus VCM with Vdiff fixed at 1mV. Overall, the proposed latch achieves a better EDP than the conventional latch. The optimum EDP is reached at VCM = 0.7V for the conventional latch and VCM = 0.71V for the proposed latch. The figure also shows that the EDP of the new design degrades as VCM is lowered. At low VCM, the performance of both latches is governed by the transistors that are active in the Amplification phase: T5, T6, and T7 in the conventional design, and T3, T4, T5, T6, and T7 in the proposed design. Since the charging/discharging speed depends on the current, the circuit performs best when all active transistors are in saturation, and degrades when any transistor operates in the linear region (VGS - Vth > VDS). Lowering VCM reduces the voltages at B and B', which forces T3 and T4 into the linear region. As a result, the total current through the latch is reduced and the performance degrades. The conventional design is less susceptible to this problem because its input transistors (T5 and T6) are connected directly to T7. By the same reasoning, when VCM rises above the optimum, T5 and T6 enter the linear region and the EDP again worsens. Fig. 6 compares the speed and energy of both designs at their optimum VCM, with Vdiff varied from 1mV to 100mV. The results show that the proposed design is faster while consuming less energy. The level of improvement is inversely related to the differential voltage.
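Selecting the optimum VCM by EDP, as done for Fig. 5, amounts to a small search over a sweep. In this sketch the function names and the sweep values are hypothetical placeholders, not the simulated data behind Fig. 5; in practice the energy and delay columns would come from the circuit-simulation sweep.

```python
from typing import Sequence


def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product E * t (lower is better)."""
    return energy_j * delay_s


def optimum_vcm(vcm: Sequence[float], energy_j: Sequence[float],
                delay_s: Sequence[float]) -> float:
    """Return the common-mode voltage with the lowest EDP in a sweep."""
    if not (len(vcm) == len(energy_j) == len(delay_s)) or not vcm:
        raise ValueError("sweep arrays must be non-empty and of equal length")
    products = [edp(e, t) for e, t in zip(energy_j, delay_s)]
    best = min(range(len(vcm)), key=products.__getitem__)
    return vcm[best]


# Hypothetical sweep (placeholder numbers, not the data behind Fig. 5).
vcm_sweep = [0.5, 0.6, 0.7, 0.8]                     # V
energy_sweep = [20e-15, 22e-15, 25e-15, 30e-15]      # J per comparison
delay_sweep = [300e-12, 200e-12, 150e-12, 140e-12]  # s

print(f"optimum VCM: {optimum_vcm(vcm_sweep, energy_sweep, delay_sweep)} V")
```

The placeholder sweep mimics the trend described in the text: delay rises sharply at low VCM while energy rises at high VCM, so the EDP minimum falls in between.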
For example, at Vdiff = 1mV and 100mV, the proposed design improves speed by 14% and 8% and reduces energy consumption by 7% and 3%, respectively.
Both designs are also simulated in the 32nm technology to validate the behavior of the proposed design. The simulations show that the optimum common-mode voltage is around 0.84V for both designs. Again, the level of improvement is inversely related to the differential voltage: in 32nm, the proposed design improves speed by 9% and 7% and reduces energy consumption by 15% and 10% at Vdiff = 1mV and 100mV, respectively. The improvements in both technologies are summarized in Table 1.
V. OFFSET AND CIRCUIT MISMATCH
In practice, fabrication-process variations affect the threshold voltage Vth and β, where β = μCox(W/L). These variations introduce an input-referred offset in the circuit and limit the minimum detectable Vdiff. Nevertheless, standard offset cancellation techniques can reduce the minimum Vdiff to about 1mV. For example, additional capacitors can be placed at the critical nodes to counter the impact of process variation [1], or dynamic offset cancellation techniques that control the output load, clock skew, or threshold voltage can be used [10] [11] [12].
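To illustrate how Vth and β variations translate into an input-referred offset, this sketch uses the standard square-law approximation for a differential pair, Vos ≈ ΔVth + (VGS - Vth)/2 · Δβ/β, and draws Gaussian mismatch samples. The model, the sigma values, and the overdrive voltage are assumptions for illustration, not parameters of the 90nm or 32nm processes used in this paper. Sign conventions for the β term vary, which does not affect the spread when the mismatches are zero-mean and independent.

```python
import random
import statistics


def input_offset(d_vth: float, d_beta_rel: float, v_ov: float) -> float:
    """Square-law estimate of a differential pair's input-referred offset [V].

    d_vth:      threshold-voltage mismatch between the input devices [V]
    d_beta_rel: relative mismatch d_beta / beta, with beta = mu * Cox * W / L
    v_ov:       overdrive voltage VGS - Vth of the input pair [V]
    """
    return d_vth + 0.5 * v_ov * d_beta_rel


def sample_offsets(n: int, sigma_vth: float, sigma_beta: float,
                   v_ov: float, seed: int = 0) -> list[float]:
    """Monte Carlo: draw independent Gaussian mismatches and return the offsets."""
    rng = random.Random(seed)
    return [input_offset(rng.gauss(0.0, sigma_vth), rng.gauss(0.0, sigma_beta), v_ov)
            for _ in range(n)]


# Hypothetical mismatch figures, chosen only for illustration.
offsets = sample_offsets(n=20000, sigma_vth=5e-3, sigma_beta=0.01, v_ov=0.2)
sigma_os = statistics.pstdev(offsets)
print(f"sigma(Vos) = {sigma_os * 1e3:.2f} mV, 3-sigma = {3 * sigma_os * 1e3:.2f} mV")
```

With these assumed values the spread is dominated by ΔVth: analytically, σ(Vos) ≈ sqrt((5 mV)² + (0.1 V × 1%)²) ≈ 5.1 mV, and any Vdiff well below a few σ(Vos) could be resolved incorrectly without offset cancellation.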
The critical transistors in both the conventional and the proposed designs are T5 and T6, since their mismatch significantly affects the minimum detectable Vdiff. To assess the sensitivity of the proposed latch to process variation, a Monte Carlo mismatch analysis is performed. Fig. 8 shows a histogram of the output delay in the 90nm technology when the threshold voltage and oxide thickness of the critical transistors T5 and T6 are randomly varied by ±5% at 3σ. The common-mode voltage is set to its optimum (VCM = 0.71V). Since no offset cancellation technique is used, the differential voltage is set to 60mV. The Monte Carlo results for the proposed design have a mean of 133.6ps and a standard deviation of 5.12ps, compared with a mean of 141.3ps and a standard deviation of 5.43ps for the conventional design. Overall, the proposed design is faster and shows a slightly tighter delay spread.
TABLE 1. Improvement of the proposed design over the conventional design
Metric | 32nm | 90nm
Speed improvement at Vdiff = 1mV | 9% | 14%
Speed improvement at Vdiff = 100mV | 7% | 8%
Energy improvement at Vdiff = 1mV | 15% | 7%
Energy improvement at Vdiff = 100mV | 10% | 3%
Average reduction in clock feedthrough | 56% | 41%
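The relative changes in the Monte Carlo delay statistics reported above can be computed directly from the means and standard deviations. This sketch assumes improvement is measured relative to the conventional design, a convention the paper does not state explicitly; the helper name is hypothetical.

```python
def improvement_pct(conventional: float, proposed: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent of the conventional value."""
    return 100.0 * (conventional - proposed) / conventional


# Monte Carlo delay statistics reported for the 90nm designs (ps).
conv_mean, conv_std = 141.3, 5.43
prop_mean, prop_std = 133.6, 5.12

print(f"mean delay reduction: {improvement_pct(conv_mean, prop_mean):.1f}%")
print(f"std-dev reduction:    {improvement_pct(conv_std, prop_std):.1f}%")
```

Under this convention the mean delay drops by about 5.4% and the standard deviation by about 5.7%.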
VI. CONCLUSION
In this paper, an improved StrongARM latch comparator topology is presented. By placing the input transistors between the cross-coupled transistors, the proposed design reduces the total internal capacitance without compromising the current. HSPICE simulations in the 90nm and 32nm CMOS technologies provided by Synopsys show that the proposed topology delivers higher speed, better energy efficiency, and lower clock feedthrough than the conventional design.
