Abstract: This paper compares four previously published static dual-edge-triggered flip-flops (DETFFs) with a proposed design in terms of performance, power dissipation, and suitability for low-voltage, low-power applications. For each DETFF, the optimal delay, power consumption, and power-delay product (PDP) are determined as the primary figures of merit. The proposed design is shown to consume the least energy at low supply voltages.
I. INTRODUCTION
CMOS has been the dominant technology for VLSI implementations. As VLSI circuits continue to grow and technologies evolve, levels of integration increase and higher clock speeds are achieved. Higher clock speeds, increased integration, and technology scaling are causing unabated increases in power consumption. As a result, low power consumption is becoming a critical issue for modern VLSI circuits. Furthermore, power dissipation, both dynamic and static, has become a limiting factor for transistor performance, long-term device reliability, and further integration [1]. Moreover, as devices are aggressively scaled toward deep-submicron technologies, the scaling paths for high-performance and low-power applications diverge [2]. For battery-operated systems, low power dissipation requirements are well understood and followed. For high-performance ICs, by contrast, reducing the delay has been the main objective, and power containment has been secondary. However, recent research shows that power containment in high-performance applications is becoming critical for reliability, transistor performance, and cooling [1].
One of the significant components of dynamic power consumption is clock-related power. The total clock-related power dissipation in synchronous VLSI circuits can be divided into three major components [3]: i) power dissipation in the clock network; ii) power dissipation in the clock buffers; and iii) power dissipation in the flip-flops. The total power dissipation of the clock network depends on both the clock frequency and the data rate, and can be computed as given in (1), where C_ff,clk is the capacitance of the clock path seen by the flip-flop and C_ff,data is the capacitance of the data path seen by the flip-flop.
From (1), it is clear that the clock power can be reduced by reducing any of the parameters on the right-hand side. Lowering the supply voltage V_DD is already the trend in contemporary design, and it has the strongest impact on P_clk. Reducing the overall capacitance of the clock network, C_clk, also reduces power dissipation; for instance, this capacitance can be reduced by proper design of the clock drivers and buffers. Similarly, power can be reduced by reducing the capacitances inside a flip-flop, C_ff,clk and C_ff,data. Furthermore, the clock power dissipation depends linearly on the clock frequency. Although the clock frequency is determined by the system specifications, the use of dual-edge-triggered flip-flops (DETFFs) can reduce the clock frequency to half of its original value for the same data throughput. As a result, power consumption is reduced, which makes DETFFs desirable for low-power applications. Even in high-performance applications, DETFFs offer certain benefits: since the clock frequency is halved, a relatively high-speed clock signal does not need to be distributed.
Although many DETFFs have been proposed, their use is still uncommon, for several reasons. In DETFFs, latches are connected in parallel, which increases the input capacitance. Therefore, the setup and hold times of DETFFs are typically larger than those of conventional flip-flops [4], which makes DETFFs less attractive for high-performance applications. DETFFs also pay a penalty in design area [4], [5]: the larger number of transistors and the additional interconnect make the footprint of a DETFF much larger than that of a conventional flip-flop. This increases the parasitic capacitances, which degrades DETFF performance. In addition, because a DETFF captures data on both clock edges, a 50% duty cycle is required.
Deviation from a 50% duty cycle may lead to timing failures in critical paths. As a result, the jitter tolerance specification is more stringent, which increases the design complexity of the system phase-locked loop (PLL).
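The frequency argument above can be checked with a few lines of arithmetic. The sketch below is a minimal illustration that assumes the generic CMOS switching-power model P = C·V_DD²·f for a clock net that charges and discharges once per cycle; it is not equation (1), and the capacitance, voltage, and frequency values are hypothetical. It also quantifies the caveat raised above: halving the clock frequency only pays off while the DETFF clock load stays below twice that of the single-edge design.

```python
def switching_power(c_farads: float, vdd_volts: float, f_hz: float) -> float:
    """Generic dynamic power C * Vdd^2 * f of a net that charges and discharges once per cycle.

    Assumed textbook model for illustration only; not the paper's equation (1).
    """
    return c_farads * vdd_volts ** 2 * f_hz


def det_to_set_power_ratio(c_det: float, c_set: float) -> float:
    """Clock-power ratio DET/SET at equal data throughput (the DET clock runs at f/2)."""
    return (c_det / c_set) * 0.5


# Hypothetical numbers: 10 pF of clock load, 1.8 V supply, 500 Mb/s per data line.
C_CLK = 10e-12
VDD = 1.8
F_SET = 500e6        # single-edge flip-flops need one rising clock edge per bit
F_DET = F_SET / 2    # dual-edge flip-flops use both edges, so half the frequency

p_set = switching_power(C_CLK, VDD, F_SET)
p_det = switching_power(C_CLK, VDD, F_DET)
print(f"SET clock power: {p_set * 1e3:.2f} mW, DET clock power: {p_det * 1e3:.2f} mW")

# Lowering Vdd from 1.8 V to 1.2 V scales power by (1.2 / 1.8)^2, roughly 0.44.
print(f"power ratio at 1.2 V vs 1.8 V: {switching_power(C_CLK, 1.2, F_SET) / p_set:.2f}")

# Break-even: a DETFF with 1.5x the clock capacitance still saves 25%; at 2x it saves nothing.
print(det_to_set_power_ratio(1.5 * C_CLK, C_CLK), det_to_set_power_ratio(2 * C_CLK, C_CLK))
```

Keeping the capacitance ratio explicit makes the trade-off visible: the frequency saving is fixed at one half, so any growth in clock-path capacitance from parallel latches directly erodes it.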
To date, a systematic comparison of DETFFs targeting both performance and power dissipation has not been reported. This article focuses on the applicability of DETFFs in low-power and low-voltage applications. Section II describes the analysis methodology used in this paper. Section III describes the DETFFs investigated, including a newly proposed DETFF. Section IV outlines the simulation testbench and parameters and explains the DETFF optimization procedure. Simulation results are reported in Section V. Finally, Section VI presents the discussion and conclusions.
II. ANALYSIS
Several metrics are available for the comparative analysis of digital circuits, for example, power consumption, delay and latency, power-delay product (PDP), energy-delay product (EDP), and energy-delay-squared product (ED²P), as reported by several researchers [6], [7].
In general, a PDP-based metric is appropriate for low-power portable systems in which battery life is the primary index of energy efficiency. This contrasts with EDP and ED²P, which weight delay more heavily and therefore suit high-performance systems [6]. In this paper, we are primarily interested in DETFF usage for low-power, low-voltage applications; therefore, we selected PDP as the figure of merit. In particular, our analysis is similar to the comparative technique described by Stojanovic et al. [8], whose study establishes a set of guidelines for objective comparisons of single-edge-triggered (SET) latches and flip-flops. The power and delay parameters employed in this study are defined in Sections II-A and II-B.
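To make the distinction between the metrics concrete, the sketch below computes PDP, EDP, and ED²P for two hypothetical flip-flops; the power and delay values are invented for illustration and are not results from this study. Design A is slower but draws less power, design B is faster but draws more, so PDP and EDP rank them in opposite order.

```python
from typing import Dict


def figures_of_merit(power_w: float, delay_s: float) -> Dict[str, float]:
    """Return PDP, EDP and ED^2P for an average power (W) and a delay (s)."""
    pdp = power_w * delay_s  # energy per operation, J
    return {
        "PDP": pdp,                  # J
        "EDP": pdp * delay_s,        # J*s: delay weighted once more
        "ED2P": pdp * delay_s ** 2,  # J*s^2: delay weighted twice more
    }


# Hypothetical designs: A is low power and slow, B is fast and power hungry.
a = figures_of_merit(power_w=100e-6, delay_s=400e-12)
b = figures_of_merit(power_w=200e-6, delay_s=250e-12)

for name in ("PDP", "EDP", "ED2P"):
    better = "A" if a[name] < b[name] else "B"
    print(f"{name}: A={a[name]:.3g}, B={b[name]:.3g} -> {better} is better")
```

Under PDP, which suits battery-limited designs, A is preferred; as soon as delay is weighted more heavily (EDP, ED²P), B is preferred. This is why the choice of metric has to follow the target application.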
B. Delay
There are two delay parameters of interest in this study. The first is the time between the clock edge and the output edge, t_CQ. The second is the time between the input data edge and the output edge, t_DQ; this parameter is often referred to as the latency of a flip-flop. For a DETFF, the latency is taken as the maximum t_DQ over rising and falling data transitions for both rising and falling clock edges, that is, the maximum over all four combinations of clock and data transitions: rising clock-rising data, rising clock-falling data, falling clock-rising data, and falling clock-falling data. Latency can also be computed as the sum of the setup time and t_CQ. In this study, both t_CQ and t_DQ are used as delay parameters. Latency is significant because in a synchronous system the cycle time depends on the longest delay in the network [9]. However, t_CQ is equally important for this comparison, because the setup time often depends on the independent variable of the simulations. This is true both in the optimization process, where changes in transistor width affect the setup time, and in the case where the independent variable is the supply voltage.
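The worst-case rule above can be written down directly. This is a minimal sketch that assumes the four t_DQ values have already been measured in simulation (the numbers and function names are hypothetical); it returns their maximum and also shows the alternative setup-time-plus-t_CQ estimate.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (clock edge, data edge), each "rise" or "fall"

ALL_COMBINATIONS = {(c, d) for c in ("rise", "fall") for d in ("rise", "fall")}


def det_latency(t_dq: Dict[Edge, float]) -> float:
    """Worst-case D-to-Q delay of a DETFF over all four clock/data edge combinations."""
    missing = ALL_COMBINATIONS.difference(t_dq)
    if missing:
        raise ValueError(f"missing t_DQ measurements for {sorted(missing)}")
    return max(t_dq[combo] for combo in ALL_COMBINATIONS)


def latency_from_setup(t_setup: float, t_cq: float) -> float:
    """Alternative estimate: latency = setup time + clock-to-Q delay."""
    return t_setup + t_cq


# Hypothetical t_DQ measurements in picoseconds.
measured = {
    ("rise", "rise"): 310.0,
    ("rise", "fall"): 295.0,
    ("fall", "rise"): 320.0,
    ("fall", "fall"): 300.0,
}
print(det_latency(measured))            # -> 320.0 (falling clock, rising data)
print(latency_from_setup(90.0, 230.0))  # -> 320.0 (hypothetical setup time and t_CQ)
```

Refusing to return a value when a combination is missing guards against silently reporting an optimistic latency from an incomplete set of simulations.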
For completeness, the setup and hold times, the maximum data rate, and the total transistor width are included as additional flip-flop performance metrics. Total transistor width is used as a measure of flip-flop area, since the physical layout is not yet available. However, these parameters are not the focus of this paper.
III. DETFF IMPLEMENTATIONS
We have analyzed four previously reported static DETFFs. Their total power P_TOT, delay, PDP with respect to t_CQ (PDP_CQ), and PDP with respect to t_DQ (PDP_DQ) are compared with those of a newly proposed DETFF.
A. DETFF Implementations
The flip-flop DET_gago, proposed in [10], is illustrated in Fig. 1.
Nodes N2, N3, N4, and N5 represent parallel connections between the input buffers and the latches. The appropriate phase of the clock and its complement connects and disconnects the input buffers and storage elements from the power supply and ground. As a result, this design has potential for low-power applications. Although the complete isolation of the active and inactive parts of the circuit saves power, it leads to a larger delay.
Fig. 2 shows the circuit implementation of DET_llopis, proposed in [3], which is a modified version of the DETFF proposed earlier in [5]. Complementary logic gates are employed to balance the output rise and fall times of the original DETFF. This modification also improves the PDP, at the expense of increased total transistor width.
Pedram et al. proposed the DETFF shown in Fig. 3 [11]. In DET_pedram, the roles of the clock enable signal and the input data signal are reversed in the feedback transmission-gate loops. Another DETFF, DET_strollo, illustrated in Fig. 4, was proposed by Strollo et al. in [12]. It is a single-latch DETFF whose operation is based on pulse triggering, with the pulses created by its internal clock buffers. The pulsewidth is crucial in this design; hence, the proper operation of this DETFF depends strongly on the sizing and propagation delay of the internal clock buffers.
The proposed DETFF, DET_proposed, is illustrated in Fig. 5. It consists of two storage elements. A combination of the true and complementary input data and clock signals controls the latching of the data value in the storage elements. The main advantage of this configuration is that it avoids stacking PMOS transistors. As a consequence, low-voltage and low-power operation becomes feasible.
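As a purely behavioral illustration of dual-edge capture, the sketch below models the generic parallel-latch DETFF organization mentioned in Section I: one latch follows D while the clock is low and holds from the rising edge, the other follows D while the clock is high and holds from the falling edge, and the clock selects which held value drives Q. It is not a transistor-level model of DET_proposed or of any other design in Figs. 1-5; the sampled waveforms and function names are hypothetical. A rising-edge flip-flop is included for comparison.

```python
from typing import List


def det_ff(clk: List[int], d: List[int], init: int = 0) -> List[int]:
    """Sampled model of a two-latch DETFF: Q takes the D value seen just before every clock edge."""
    held_at_rise = init  # latch A: follows D while clk == 0, frozen while clk == 1
    held_at_fall = init  # latch B: follows D while clk == 1, frozen while clk == 0
    q = []
    for c, x in zip(clk, d):
        if c == 0:
            held_at_rise = x
        else:
            held_at_fall = x
        # Output multiplexer: forward whichever latch is currently frozen.
        q.append(held_at_rise if c == 1 else held_at_fall)
    return q


def set_ff(clk: List[int], d: List[int], init: int = 0) -> List[int]:
    """Sampled model of a rising-edge flip-flop, for comparison."""
    q = init
    out = []
    prev_c, prev_d = clk[0], d[0]
    for c, x in zip(clk, d):
        if prev_c == 0 and c == 1:
            q = prev_d  # capture D as it was just before the rising edge
        out.append(q)
        prev_c, prev_d = c, x
    return out


# One clock period spans four samples (50% duty cycle); D carries a new bit every half period.
clk = [0, 0, 1, 1] * 3
d = [1, 1, 0, 0] * 3
clk_fast = [0, 1] * 6  # twice the clock frequency

print(det_ff(clk, d))       # captures a bit on every rising and falling edge
print(set_ff(clk, d))       # misses the bits that the falling edges would have captured
print(set_ff(clk_fast, d))  # a single-edge flip-flop needs the doubled clock to keep up
```

The third call shows the frequency halving discussed in Section I: the single-edge flip-flop recovers the alternating bit sequence only when its clock runs twice as fast. Because each captured bit is tied to one half-period, an unequal duty cycle would shorten one of the two capture windows, which is why DETFFs require a 50% duty cycle.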
IV. SIMULATION
A tradeoff between speed and power consumption is often possible, and it is normally determined by the application. Hence, a given flip-flop can be optimized either for high performance or for low power. When both power dissipation and performance are critical, however, the goal is to find the design that operates at the optimum, where the power-delay product is minimum, i.e., where energy utilization is optimal for a given clock frequency. Since the optimal delay and power parameters cannot be obtained in a single step, the PDP optimization procedure is often iterative [8].
A. Testbench
Table I
To compute the local data power and the local clock power, the flip-flop under test is first disconnected, and the power dissipated by the grey inverter (which drives the data input) and by the black inverter (which drives the clock input) is recorded. The flip-flop is then connected to the testbench for performance analysis, and the power consumed by the grey and black inverters is recorded again. The local data power is calculated as the difference between the two power values of the grey inverter. Likewise, the local clock power is computed as the difference between the two power values of the black inverter.
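The subtraction described above is simple bookkeeping; the sketch below records the two readings per driver and takes their difference. The class name and the power values are hypothetical, and the grey inverter is taken to drive the data input and the black inverter the clock input, as implied by which difference yields which local power.

```python
from dataclasses import dataclass


@dataclass
class DriverPower:
    """Average power (W) drawn by one testbench driver inverter in two simulation runs."""
    without_dut: float  # flip-flop under test disconnected
    with_dut: float     # flip-flop under test connected

    def local_power(self) -> float:
        """Power attributable to the flip-flop input driven by this inverter."""
        return self.with_dut - self.without_dut


# Hypothetical simulation readings in watts.
grey = DriverPower(without_dut=12.0e-6, with_dut=19.5e-6)   # data driver
black = DriverPower(without_dut=30.0e-6, with_dut=52.0e-6)  # clock driver

print(f"local data power:  {grey.local_power() * 1e6:.1f} uW")
print(f"local clock power: {black.local_power() * 1e6:.1f} uW")
```

Measuring the driver with and without the load cancels the driver's own switching and leakage power, so only the energy spent charging the flip-flop's input capacitance is attributed to it.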
B. Optimization
Since the transistor sizes are interrelated, the preliminary stage of the optimization is simplified as follows. For each circuit, the critical path is first identified. The width of the NMOS transistors, w_n, is then selected as the parameter of interest. The PMOS transistors located on the critical path are sized at a ratio to w_n that is determined by balancing the rising and falling edges of the output waveform of a test inverter; note that this ratio changes with the NMOS sizing. Moreover, transmission gates and transistors that are not located on the critical path are implemented with relatively small sizes.
Delay and power are measured as functions of w_n. The measured power is the sum of all three components discussed earlier, whereas the delay is expressed by t_CQ. Once the power and delay measurements are obtained, PDP_CQ is calculated as the product of power and delay and plotted as a function of t_CQ. The initial PDP_CQ point is taken as the minimum of the PDP_CQ versus t_CQ curve. If no such minimum exists, the operating point with the minimum t_CQ for a given energy is selected as the initial PDP_CQ point. Once the initial PDP_CQ point is determined for each flip-flop, the flip-flops are further optimized iteratively until the best PDP_CQ and PDP_DQ are found.
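The selection rule for the starting point can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool flow: the sweep data are hypothetical, a minimum at either end of the sweep is treated as "no minimum", and "the minimum t_CQ for a given energy" is interpreted as the fastest sweep point whose PDP_CQ stays within a caller-supplied energy budget (energy_budget_fj is a hypothetical parameter).

```python
from typing import Optional, Sequence


def initial_operating_point(
    widths: Sequence[float],
    power_mw: Sequence[float],
    t_cq_ps: Sequence[float],
    energy_budget_fj: Optional[float] = None,
) -> float:
    """Pick the starting width w_n for the iterative PDP_CQ optimization.

    power_mw and t_cq_ps are measured at each width; mW * ps = fJ.
    """
    pdp_fj = [p * t for p, t in zip(power_mw, t_cq_ps)]
    best = min(range(len(pdp_fj)), key=lambda i: pdp_fj[i])
    if 0 < best < len(pdp_fj) - 1:
        # The PDP_CQ curve has an interior minimum: start there.
        return widths[best]
    if energy_budget_fj is None:
        raise ValueError("no interior PDP_CQ minimum; an energy budget is required")
    # Fallback (assumed reading of the text): fastest point within the energy budget.
    feasible = [i for i, e in enumerate(pdp_fj) if e <= energy_budget_fj]
    if not feasible:
        raise ValueError("no sweep point meets the energy budget")
    return widths[min(feasible, key=lambda i: t_cq_ps[i])]


# Hypothetical sweep over w_n (um): power rises and t_CQ falls with width.
w = [0.5, 1.0, 2.0, 4.0, 8.0]
p = [0.10, 0.15, 0.25, 0.45, 0.85]  # mW
t = [500, 260, 180, 150, 140]       # ps
print(initial_operating_point(w, p, t))  # -> 1.0 (interior minimum of PDP_CQ)

# A monotonic sweep has no interior minimum, so the energy budget decides.
print(initial_operating_point([1.0, 2.0, 4.0], [0.10, 0.20, 0.40], [300, 250, 220],
                              energy_budget_fj=60.0))
```

Working in mW and ps keeps the products directly in femtojoules, the unit used for PDP_CQ in Section V.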
C. Data Activity
Once the DETFFs are optimized, they are simulated at data activity rates of 0 (all zeros and all ones), 0.5, and 1, to determine the efficiency and performance of each DETFF over a wide range of data activities. As mentioned earlier, the total power consumption of a DETFF consists of three separate components. Owing to the diverse design styles, these components vary from flip-flop to flip-flop, so the total power consumption of a flip-flop may change with the data activity. Therefore, it is desirable to simulate the DETFFs with different data activities. The results can then determine which DETFF is appropriate for an application weighted toward a particular data activity.
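One way to produce stimulus with a prescribed data activity is sketched below, assuming activity is defined as the fraction of consecutive data bits that differ (for a DETFF, each bit would occupy one half clock period). The function names are hypothetical, and the deterministic accumulator, which spreads toggles evenly, is one choice among many; random sequences with the same toggle probability would serve equally well.

```python
from typing import List


def data_pattern(activity: float, n_bits: int, start: int = 0) -> List[int]:
    """Deterministic bit stream whose fraction of toggling transitions equals `activity`."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must lie in [0, 1]")
    bits = [start]
    acc = 0.0
    for _ in range(n_bits - 1):
        acc += activity
        if acc >= 1.0 - 1e-12:  # time to toggle (small tolerance for rounding)
            acc -= 1.0
            bits.append(1 - bits[-1])
        else:
            bits.append(bits[-1])
    return bits


def measured_activity(bits: List[int]) -> float:
    """Fraction of consecutive bit pairs that differ."""
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)


# Activity 0 appears twice: all zeros and all ones, as in the simulations described above.
for alpha, start in ((0.0, 0), (0.0, 1), (0.5, 0), (1.0, 0)):
    stream = data_pattern(alpha, 9, start)
    print(alpha, stream, measured_activity(stream))
```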
D. Supply Voltage
The nominal power supply voltage for 0.18 µm technology is 1.8 V.
However, in battery-operated systems, the power supply voltage is reduced drastically to lower the power consumption. In addition, an efficient low-voltage flip-flop should show a slower increase in delay as the supply voltage is reduced. Therefore, the delay, power, and PDP of all the DETFFs are computed as functions of the supply voltage. Since the setup time increases as the supply voltage is reduced, the simulations require relaxed setup-time conditions to provide results over a wide range. Hence, t_CQ and PDP_CQ, which exclude the setup time, are used in this analysis to obtain precise results.
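The voltage sweep reduces to simple post-processing once P_TOT and t_CQ have been measured at each supply. The sketch below computes PDP_CQ at every point and the delay increase per volt of supply reduction, the quantity an efficient low-voltage flip-flop should keep small. The function name and all numbers are hypothetical placeholders, not results from this study.

```python
from typing import List, Tuple


def sweep_metrics(
    vdd: List[float], power_uw: List[float], t_cq_ps: List[float]
) -> Tuple[List[float], List[float]]:
    """Return PDP_CQ (fJ) per supply point and delay slopes (ps per volt of reduction).

    vdd must be sorted from the highest to the lowest supply voltage.
    """
    pdp_fj = [p * t * 1e-3 for p, t in zip(power_uw, t_cq_ps)]  # uW * ps = 1e-3 fJ
    slopes = [
        (t_cq_ps[i + 1] - t_cq_ps[i]) / (vdd[i] - vdd[i + 1])
        for i in range(len(vdd) - 1)
    ]
    return pdp_fj, slopes


# Hypothetical sweep for one flip-flop, from the nominal 1.8 V downward.
vdd = [1.8, 1.5, 1.2, 1.0]
power = [250.0, 160.0, 90.0, 55.0]   # uW
t_cq = [200.0, 240.0, 320.0, 450.0]  # ps

pdp, slopes = sweep_metrics(vdd, power, t_cq)
for v, e in zip(vdd, pdp):
    print(f"{v:.1f} V: PDP_CQ = {e:.1f} fJ")
print("delay increase (ps/V):", [round(s) for s in slopes])
```

Comparing the slope lists of different flip-flops, rather than a single delay value, shows which design degrades gracefully as the supply is lowered.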
V. RESULTS
All five DETFFs studied have been optimized as described in Section IV. It is found that the delay decreases as the width increases until a minimum is reached, if such a point exists. Beyond this point, further increases in width yield no appreciable decrease in delay; on the contrary, owing to the increased parasitics associated with larger widths, the delay may increase. On the other hand, for all the DETFFs, P_TOT increases monotonically with width. PDP_CQ is then determined by multiplying P_TOT by t_CQ at the corresponding width. Combining the t_CQ and PDP_CQ curves yields PDP_CQ versus t_CQ, which is illustrated in Fig. 7. These curves represent the first step of the optimization process.
The slopes of the PDP_CQ curves in Fig. 7 indicate the sensitivity of the flip-flops to delay as the width varies. When t_CQ is small, PDP_CQ is large, since the total power dominates the product at large widths. As the width decreases, the power consumption decreases, while the delay increases because it is inversely related to the width. This remains true until the local minimum is reached. Beyond this point, both the power and the delay increase because of the weakened driver strength. Fig. 7 also depicts the spread of DETFF performance in terms of PDP_CQ and delay. As shown, the performance of the DETFFs studied is comparable: PDP_CQ ranges from 30 to 75 fJ, and delay ranges from 200 to 300 ps. The initial optimization points are then extracted from Fig. 7, and an iterative process is used to complete the optimization. The goal of the optimization is to minimize the energy consumption, PDP_DQ.
The DETFFs are compared in terms of power, delay, and energy. The final optimal parameters are summarized in Table II. The first column of Table II lists the DETFFs, and the second column displays the three components of power dissipation and the total power consumption. The third and fourth columns report the delay (t_CQ and t_DQ) and the energy consumption (PDP_CQ and PDP_DQ), respectively. Table III lists the other performance characteristics, such as setup and hold times, maximum data rate, and total transistor width. As shown in the tables, DET_pedram consumes the most power, owing to its very large internal and data power dissipation, which also leads to the highest energy consumption. However, it has the smallest total transistor width. DET_llopis has the largest delay, yet the lowest clock and data power consumption. DET_gago consumes the least internal and total power, and hence the least energy. DET_strollo consumes the most clock power, yet this does not affect its overall performance relative to the other DETFFs studied. DET_proposed has the smallest delay, but it requires the largest total width.
After the DETFFs are optimized, they are simulated at different data activity rates; the results are shown in Fig. 8. In general, a data activity of 1 results in the largest total power consumption. Clock power dissipation is nearly constant over all data activity rates, whereas data and internal power consumption increase with data activity.
One exception is DET_pedram. When the data sequence consists of all zeros, its internal power is remarkably large. For all ones, on the other hand, its internal power is especially small, whereas its data power is notably larger. However, its data power at data activities of 0.5 and 1 is almost the same. Furthermore, DET_pedram shows the worst power consumption at all data activities except 1. DET_gago is the best in terms of power dissipation at all data activities. The total power consumption of DET_llopis is very close to that of DET_gago at all data activities. DET_proposed has power consumption similar to that of DET_gago, except at a data activity of 1, where it exhibits substantially larger internal power dissipation.
The performance of the DETFFs under reduced supply voltage is depicted in Figs. 9-11.
switches to 1. Node B then switches M2 on. As a result, M1 and M2 simultaneously attempt to write 0 and (V_DD - V_tn) onto Node A. This voltage conflict persists until the clock changes state, and it degrades the noise margin. This has two implications. First, this structure allows a large current to flow through the transmission gates at the input. Second, the degraded voltage level at Node A also causes a direct-path current in the subsequent inverters. Hence, large data and internal power dissipation results. In addition, both data power and internal power depend on the data level rather than the data activity.
DET_llopis has the best clock and data power dissipation. Its clock power consumption is low because of its small clock capacitance, whereas its data power dissipation is low owing to the use of an inverting input buffer. Although it has one of the lowest power consumptions at all data activities, it has the longest delay at nominal voltage, since the data must propagate through more logic stages than in the other DETFF configurations. This leads to comparatively large energy consumption under nominal conditions. As the supply voltage is reduced, its total power consumption drops at a much lower rate and its delay rises at a slightly higher rate than those of the other DETFFs studied, which results in higher energy consumption at low voltage. Therefore, its application under low-voltage conditions is limited, and its lowest energy consumption occurs at around 1.5 V.
DET_gago is found to be the most energy-efficient DETFF in this study under all nominal-voltage conditions. Its superior low-power performance is mainly due to the complete isolation of its elements when they are not in use, which demonstrates its suitability for low-power applications. Under low supply voltage, although it has the lowest power consumption, its delay is higher than that of the proposed DETFF. As a result, its energy consumption at low supply voltage is slightly higher than that of DET_proposed.
DET_strollo consumes the largest clock power because of its chain of internal clock buffers. The delay through these buffers defines the activation pulse of the flip-flop, and the activation pulsewidth is crucial to its operation. As the supply voltage decreases, the activation pulsewidth changes, causing the delay to increase at a much higher rate. The delay rapidly approaches the clock pulsewidth, at which point the flip-flop can no longer latch the input data. Therefore, it is not suitable for low-voltage environments.
DET_proposed has superior delay because of its use of NMOS transistors and its avoidance of PMOS transistor stacking. However, its inferior slew rate leads to especially high power consumption at high data activity. As a result, its overall energy consumption under nominal conditions is close to that of DET_gago, which has the lowest energy dissipation. Under reduced supply voltage, DET_proposed has the second-best power consumption and the best delay, and therefore the lowest energy consumption at low supply voltage. Hence, it is promising for low-energy, low-voltage applications, which is the purpose for which it was designed.
Although DET_proposed achieves good performance, the complete isolation of deactivated elements, as in DET_gago, is found to be key to low power dissipation. However, DET_proposed has been shown to operate most efficiently at low supply voltage. Hence, the proposed DETFF is recommended for further research in low-power, low-voltage subsystems.
