A novel static single-phase clocked (SSPC) dual-edge triggered flip-flop (DET-FF) is proposed to allow energy-efficient operation with aggressive voltage scaling. By employing two static latches with a single-phase clock, contention and clock phase mismatch are avoided, which significantly improves tolerance to process, voltage and temperature (PVT) variations. Post-layout simulation in 28 nm CMOS technology shows that the proposed SSPC DET-FF consumes less power and has a significantly better power-performance trade-off (power-delay product, PDP) than prior-art DET-FFs. Monte Carlo analysis also shows that its supply voltage can be aggressively scaled down to 0.3 V even with PVT variations.
Introduction
The flip-flop is a basic circuit element used for storing data and synchronizing data flow in modern digital circuits. The single-edge triggered flip-flop (SET-FF), the most widely used type, latches data at either the rising or the falling edge of the clock. A microprocessor uses a large number of flip-flops [1], which can account for 20 to 45% of the total system power consumption [2]. Reducing flip-flop power consumption therefore has a significant impact on the energy efficiency of the whole system. Many flip-flop topologies have been studied to improve the efficiency of flip-flops and processors [3, 4, 5, 6, 7].
In contrast, a dual-edge triggered flip-flop (DET-FF) latches data at both the rising and falling clock edges, making it potentially more energy efficient than an SET-FF [8]. Many DET-FF topologies have been studied recently. The dual-edge latching operation is implemented either by generating a latch trigger pulse at both clock edges or by utilizing two-phase clocked flip-flops. For example, in [9, 10, 11, 12, 13, 14], triggering pulses are explicitly generated with a pulse-generating circuit. Since a separate pulse generator and latch are required, the area overhead of these DET-FFs is significant. In [15, 16, 17, 18, 19, 20, 21], the pulses are implicit, and the latching time window is generated by a combination of the input clock and other internally generated clocks. Such pulse-based implementations require careful timing of the pulse length and internal clock phase differences, which makes them vulnerable to PVT variations and limits their use in ultra-low-power applications with aggressive dynamic voltage scaling.
Meanwhile, two-phase clocked DET-FFs utilize complementary clock signals (CLK, CLKB) for reliable operation [22, 23, 24, 25, 26, 27]. Since they do not rely on pulses, they are more robust against PVT variations than pulse-based FFs, and, thanks to their symmetric structure, the layout can be made compact despite the use of many transistors. However, the two-phase clocked DET-FF implementation in has contention, which requires careful sizing of the transistors for reliable operation. In addition, two-phase clocking schemes can suffer from clock phase misalignment or overlap, which makes them susceptible to PVT variations, especially under low supply voltage. A single-phase clocking scheme can potentially address this issue. However, prior-art single-phase clocked DET-FFs [28, 29] have a few nodes that float under specific conditions, which makes them susceptible to PVT variations in those conditions.
Proposed design
To address these issues, a static single-phase clocked (SSPC) DET-FF is proposed, as shown in Fig. 1. The SSPC DET-FF consists of two complementary and symmetric latches. In these latches, two data-dependent trigger signals (CTP, CTN) are generated by the NOR and NAND gates, respectively. These signals control the two complementary operations of the latches while guaranteeing that all nodes in both latches are always in a fully static state.
The proposed SSPC DET-FF operates as follows. When CLK = '1', the NOR gate's output CTP is '0', and hence the stacked input inverter of the upper latch is activated (DCP = D). CTP cannot be '1' while CLK = '1' because of the NOR gate (and CTN cannot be '0' while CLK = '0' because of the NAND gate). Since CLK = '1', the NAND gate makes CTN the complement of DCN. Therefore, when DCN = '0', CTN becomes '1', and the tri-state inverter in the upper latch is activated to make DCP = SCP. If instead DCN = '1', CTN becomes '0' and the NMOS in the tri-state inverter is turned off, which could leave the SCP node floating. A PMOS controlled by CTN pulls the SCP node up to VDD in this case, so the SCP node remains static.
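The gating constraints described above can be checked with a small truth-table sketch. It assumes ideal Boolean gates. The NAND inputs (CLK, DCN) follow the text, whereas the second NOR input is not named in this section, so it appears below as a hypothetical signal `x`.

```python
from itertools import product


def nor(a: int, b: int) -> int:
    """Ideal 2-input NOR gate."""
    return int(not (a or b))


def nand(a: int, b: int) -> int:
    """Ideal 2-input NAND gate."""
    return int(not (a and b))


def trigger_signals(clk: int, x: int, dcn: int) -> tuple[int, int]:
    """Return (CTP, CTN) for one input combination.

    `x` is a hypothetical stand-in for the second NOR input,
    which the text does not name.
    """
    ctp = nor(clk, x)
    ctn = nand(clk, dcn)
    return ctp, ctn


def check_constraints() -> bool:
    """Enumerate all inputs and check the properties stated in the text."""
    for clk, x, dcn in product((0, 1), repeat=3):
        ctp, ctn = trigger_signals(clk, x, dcn)
        if clk == 1:
            assert ctp == 0          # CTP cannot be '1' while CLK = '1'
            assert ctn == 1 - dcn    # CTN is the complement of DCN
        else:
            assert ctn == 1          # CTN cannot be '0' while CLK = '0'
    return True


if __name__ == "__main__":
    print(check_constraints())
```

Because the check covers all eight input combinations, it holds whatever the second NOR input turns out to be in the actual schematic.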
At the falling transition of CLK (1 → 0), the value of the SCP node is forwarded to the output node Q by the stacked output inverter. When CLK is '0', the lower latch performs the complementary operation, guaranteeing that only one of the output driving inverters of the positive and negative latches is activated to drive QB at any time. Fig. 2 shows the simulation waveforms of the SSPC DET-FF operation. In this figure, all nodes are always in a static state, and correct data synchronization at the output can be observed at both the rising and falling clock edges.
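The dual-edge behaviour seen at the output can be illustrated with a purely behavioural sketch. This is not a model of the SSPC circuit (there are no latches, trigger signals or delays); it assumes an idealized sampled clock and captures the value of D present in the sample where CLK changes. The class and function names are hypothetical.

```python
class DualEdgeFlipFlop:
    """Idealized behavioural DET-FF: Q follows D on every clock transition."""

    def __init__(self, q: int = 0) -> None:
        self.q = q
        self._prev_clk = None  # no edge can be detected on the first sample

    def step(self, clk: int, d: int) -> int:
        # Capture D on a rising (0 -> 1) or falling (1 -> 0) edge.
        if self._prev_clk is not None and clk != self._prev_clk:
            self.q = d
        self._prev_clk = clk
        return self.q


def run(clk_wave: list[int], d_wave: list[int]) -> list[int]:
    """Apply sampled CLK and D waveforms and return the sampled Q waveform."""
    ff = DualEdgeFlipFlop()
    return [ff.step(c, d) for c, d in zip(clk_wave, d_wave)]


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1]
    d = [1, 1, 1, 0, 0, 1, 1, 0]
    print(run(clk, d))  # [0, 0, 1, 1, 0, 0, 1, 1]
```

In this example Q is updated three times in one and a half clock periods, whereas a single-edge triggered flip-flop driven by the same clock would update only on the two rising edges.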
Simulation results
The proposed DET-FF is implemented in 28 nm bulk CMOS technology and evaluated with post-layout simulations. For comparison, three other DET-FF designs (SA DET-FF [13], DECPFF [20], and FN_C(sym.) FF [27]) were selected from the literature for their relatively good efficiency and robustness. Among the six DET-FFs proposed in [27], FN_C(sym.) FF was selected because it was the most robust against PVT variations. These DET-FFs are simulated under the following conditions: standard VDD of 0.95 V, 1 GHz input clock frequency, 27°C, and nominal process corner.
Fig. 3 shows the layout of the implemented DET-FFs. All of the DET-FF designs are sized with reasonable margin for reliable operation at room temperature at the TT process corner. The pulse-based FFs (SA DET-FF in Fig. 3(a), DECPFF in Fig. 3(b)) require proper transistor sizing to generate appropriate latch trigger pulses, while FN_C(sym.) FF in Fig. 3(c) also requires transistor sizing for stable operation in the presence of contention. Although the proposed SSPC DET-FF in Fig. 3(d) requires 40 transistors, a compact layout only slightly bigger than that of SA DET-FF, which has 22 transistors, could be achieved. This is because the SSPC DET-FF does not require careful transistor sizing, thanks to its fully static operation, which avoids the need for any non-minimum-length transistors and allows the use of many minimum-width transistors.
For a fair comparison, the post-layout simulations are performed with clock and data signal drivers, in a test setup similar to the one used in [19, 30, 31]. The total power consumption of each DET-FF is modelled as the sum of the flip-flop power and the input driver power, minus the input driver power due to the driver's own intrinsic capacitance. This can be expressed as follows:
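Written out with symbol names chosen here for illustration (the names are assumptions; the definition above fixes only the three terms and their signs), the power model reads:

```latex
P_{\mathrm{total}} = P_{\mathrm{FF}} + P_{\mathrm{driver}} - P_{\mathrm{driver,int}}
```

Here P_FF is the flip-flop power, P_driver is the power of the clock and data input drivers, and P_driver,int is the part of the driver power that is due to the drivers' own intrinsic capacitance, so that only the driver power spent on the flip-flop's input load is counted.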
The power consumption is compared under various switching activity conditions (α = 10%, 25%, 50%, 100%) to observe its data dependence. The delay and setup/hold times are also measured to compare dynamic performance. Fig. 4(a) presents the power consumption of the DET-FFs at various switching activities. The proposed SSPC DET-FF consumes significantly less power than DECPFF and SA DET-FF, and power comparable to that of FN_C(sym.) FF. Note that the power increase at higher switching activity (α) is the lowest for the SSPC DET-FF. Fig. 4(b) shows the CLK-to-Q delay (Tcq) under voltage scaling, where VDD is swept in 0.05 V steps down to 0.75 V. Thanks to its simple and fully static logic gate structure, the proposed SSPC DET-FF shows the lowest Tcq. Although its power consumption is not the lowest at low activity ratios (α), its power-performance trade-off, as measured by PDP, is excellent because of its fast operation.
Table I summarizes the simulated performance of the implemented DET-FFs. The proposed SSPC DET-FF has the shortest CLK-to-Q delay and the second smallest area despite the largest transistor count. It also exhibits the second lowest power consumption when operated at the standard logic voltage (0.95 V). The FN_C(sym.) FF [27] shows the lowest power consumption. However, because of its long CLK-to-Q delay, the proposed SSPC DET-FF achieves a similar or better trade-off between energy efficiency and performance, represented by the power-delay product (PDP), especially at a high activity ratio (α). Although FN_C(sym.) FF shows lower power consumption at a 0.95 V supply voltage and a realistic activity ratio (10-25%), its voltage scalability is limited to 0.60 V, whereas the proposed SSPC DET-FF can operate at 0.30 V over a temperature range of −40°C to 120°C. This implies that higher energy efficiency can potentially be achieved with the SSPC DET-FF when the supply voltage is aggressively scaled. More detailed simulation results on this voltage scalability are provided in Fig. 5.
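The power-versus-delay argument above can be made concrete with a minimal sketch of the PDP metric, taken here as total power multiplied by Tcq. The design labels and numbers below are placeholders chosen only to show the effect; they are not values from Table I, and the function names are hypothetical.

```python
def power_delay_product(power_w: float, tcq_s: float) -> float:
    """PDP in joules: power (W) multiplied by CLK-to-Q delay (s)."""
    return power_w * tcq_s


def rank_by_pdp(designs: dict[str, tuple[float, float]]) -> list[str]:
    """Return design names ordered from lowest (best) to highest PDP."""
    return sorted(designs, key=lambda name: power_delay_product(*designs[name]))


if __name__ == "__main__":
    # Placeholder (power, Tcq) pairs, NOT measured data: one design with
    # lower power but a long delay, one with higher power but a short delay.
    designs = {
        "low_power_slow": (1.0e-6, 100e-12),
        "mid_power_fast": (1.5e-6, 40e-12),
    }
    for name in rank_by_pdp(designs):
        print(name, power_delay_product(*designs[name]))
```

With these placeholder inputs the faster design ranks first even though it draws 50% more power, which is the same mechanism by which a short Tcq can outweigh a somewhat higher power figure in a PDP comparison.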
To confirm the design's stable operation under PVT variations, a Monte Carlo analysis is performed for VDD from 0.2 V to 0.95 V with 50 mV resolution and temperatures from −40°C to 120°C with 10°C resolution. For each condition, 10k Monte Carlo runs are performed, and any functional failure at the 1 kHz input clock frequency is recorded as 'Fail' (orange) in the shmoo plots in Fig. 5. Among the four DET-FFs compared, SA DET-FF [13] is the most sensitive to process variation because of contention during explicit pulse generation. As a result, with process variation it was not functional even at the standard VDD of 0.95 V. Making it robust against process variation would require larger transistors in the pulse generator of SA DET-FF, increasing both power consumption and area. For similar reasons, the other pulse-triggered DET-FF, DECPFF [20], was susceptible to process variation, resulting in a high minimum operating voltage (Vmin). The two-phase clocked DET-FF, FN_C(sym.) FF [27], reduces Vmin by avoiding variation-sensitive pulse-generating circuits. However, because of contention during the circuit's internal data toggling and potential misalignment of the two-phase clocks, its Vmin was still around 0.6 V. With the proposed SSPC DET-FF, Vmin could be brought down to 0.3 V thanks to the single-phase clocking scheme. Such a low Vmin allows aggressive voltage scaling for energy-efficient operation in energy-constrained applications.
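The sweep described above and the way Vmin can be read off a shmoo plot can be sketched as follows. The grid values come from the text; the pass/fail predicate is a hypothetical stand-in for the Monte Carlo circuit simulations, and the definition of Vmin used here (the lowest VDD such that every temperature passes at that voltage and at all higher voltages) is an assumption about how the plots are interpreted.

```python
from typing import Callable, Optional


def sweep_grid() -> tuple[list[int], list[int]]:
    """VDD points in mV (0.2 V to 0.95 V, 50 mV steps) and temperatures in
    degrees C (-40 to 120, 10 degree steps), as stated in the text."""
    vdds_mv = list(range(200, 951, 50))
    temps_c = list(range(-40, 121, 10))
    return vdds_mv, temps_c


def find_vmin(passes: Callable[[int, int], bool],
              vdds_mv: list[int], temps_c: list[int]) -> Optional[int]:
    """Walk down from the highest VDD and stop at the first voltage where
    any temperature fails. Returns None if even the highest VDD fails."""
    vmin = None
    for v in sorted(vdds_mv, reverse=True):
        if all(passes(v, t) for t in temps_c):
            vmin = v
        else:
            break
    return vmin


if __name__ == "__main__":
    vdds_mv, temps_c = sweep_grid()
    conditions = len(vdds_mv) * len(temps_c)
    print(conditions, conditions * 10_000)  # conditions, total 10k-run batches

    # Hypothetical predicate: all 10k runs pass at or above 300 mV.
    print(find_vmin(lambda v, t: v >= 300, vdds_mv, temps_c))
```

The grid has 16 voltage points and 17 temperature points, so 272 conditions and 2.72 million Monte Carlo runs per design. A design that already fails at the highest voltage yields `None`, which corresponds to a shmoo plot with no passing row.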
Conclusion
A fully static, single-phase clocked (SSPC) DET-FF is proposed to enable stable operation across PVT variations and energy-efficient operation with aggressive voltage scaling. Compared to the prior-art DET-FFs, the SSPC DET-FF has lower power consumption and an excellent power-performance trade-off (PDP). Our Monte Carlo analysis shows that it operates reliably under PVT variations with a minimum operating voltage as low as 0.3 V. We believe the proposed SSPC DET-FF is an excellent sequential logic element for systems with stringent energy budgets where aggressive voltage scaling is required.
