Hindawi Publishing Corporation The Scientific World Journal Volume 2014, Article ID 453675, 14 pages http://dx.doi.org/10.1155/2014/453675



### Research Article

## A Modified Implementation of Tristate Inverter Based Static Master-Slave Flip-Flop with Improved Power-Delay-Area Product

### Kunwar Singh, Satish Chandra Tiwari, and Maneesha Gupta

<sup>1</sup> Department of Electrical Engineering, Delhi Technological University, Room No. FW1-SF1, EED, DTU, New Delhi 110042, India

<sup>2</sup> Division of ECE, Netaji Subhas Institute of Technology (NSIT), University of Delhi, Sector 3, Dwarka, New Delhi 110078, India

Correspondence should be addressed to Kunwar Singh; kunwarsingh@dce.ac.in

Received 28 August 2013; Accepted 13 October 2013; Published 27 February 2014

Academic Editors: L. Donetti, E. Tlelo-Cuautle, and F. Yuan

Copyright © 2014 Kunwar Singh et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The paper introduces novel architectures for implementation of fully static master-slave flip-flops for low power, high performance, and high density. Based on the proposed structure, traditional C²MOS latch (tristate inverter/clocked inverter) based flip-flop is implemented with fewer transistors. The modified C²MOS based flip-flop designs mC²MOSff1 and mC²MOSff2 are realized using only sixteen transistors each while the number of clocked transistors is also reduced in case of mC²MOSff1. Postlayout simulations indicate that mC²MOSff1 flip-flop shows 12.4% improvement in PDAP (power-delay-area product) when compared with transmission gate flip-flop (TGFF) at 16X capacitive load which is considered to be the best design alternative among the conventional master-slave flip-flops. To validate the correct behaviour of the proposed design, an eight bit asynchronous counter is designed to layout level. LVS and parasitic extraction were carried out on Calibre, whereas layouts were implemented using IC station (Mentor Graphics). HSPICE simulations were used to characterize the transient response of the flip-flop designs in a 180 nm/1.8 V CMOS technology. Simulations were also performed at 130 nm, 90 nm, and 65 nm to reveal the scalability of both the designs at modern process nodes.

#### 1. Introduction

Flip-flops are the key elements used in sequential digital systems. The appropriate selection of flip-flop topologies is instrumental in the design of VLSI integrated circuits such as microprocessors, microcontrollers, and other high complexity chips. However, factors such as high performance, low power, transistor count, clock load, design robustness, power-delay, and power-area tradeoffs are generally considered before choosing a particular flip-flop design. The highest operating frequency of clocked digital systems is determined by the flip-flops. Flip-flops and clock distribution network generally account for 30–70% of the total chip power consumption [1, 2]. Clock load is another major concern for digital system designers and several contributions have been reported in the past to reduce clock load and the associated power dissipation in the clocking network [3–5]. A design with elevated transistor count occupies a larger area on chip and leads to an increase in the overall manufacturing cost. Hence, design and implementation of low power high performance flip-flops with the least possible chip area is the main target of the modern chip manufacturing industry.

Flip-flops are broadly classified into three main categories, namely, master-slave [6–11], pulse-triggered [12–17], and differential flip-flops [18–21]. Among them, master-slave and pulse-triggered flip-flops are the most efficient in terms of power-delay product. Master-slave flip-flops exhibit positive set-up time and negative hold time requirements and hence are not suitable for high speed systems due to extended data-to-output delays. But they are power efficient and can be used in low power applications. However, their main limitation is less robustness to clock skew. Pulse-triggered flip-flops have negative set-up time and thus lead to smaller data-to-output delay. They exhibit an inherent soft clock edge property which minimizes clock skew related cycle time loss.



FIGURE 1: Classification of master-slave flip-flops.



FIGURE 2: Conventional architecture with clocked switches in the critical path.



FIGURE 3: Conventional architecture with clocked inverters in the critical path.

A classification of master-slave flip-flops is further elaborated in Figure 1. Clock-gated topologies exhibit internal clock gating to suppress the power consumption at lower data switching activities based on a clock gating logic and a comparator circuit. However, clock-gated flip-flops have extended latency due to increased clock-to-output delays along with increased chip area overhead. Clock-gated structures generally consume less power at low switching activities [22]. TGFF represents the best choice in the nonclock gated flip-flop category in terms of power-delay product [6], whereas existence of NMOS transistors in the critical path along with partially nongated keepers leads to less significant power-delay tradeoff characteristics in case of write port master-slave flip-flop (WPMS) [7, 8] and pass transistor logic based flip-flop (PTLFF) [9].

In this paper, we introduce an alternative design approach for designing  $C^2MOS$  based master-slave flip-flop, based on a new architecture with reduced transistor count and improved power-delay-area product. The proposed configurations  $mC^2MOSff1$  and  $mC^2MOSff2$  fall under the nonclock gated flip-flop category as shown in Figure 1.

The rest of the paper is organized as follows. Section 2 compares the conventional master-slave flip-flop configurations with proposed designs. Section 3 highlights the simulation parameters and test bench along with techniques used for transistor sizing and methodology adopted for optimization of timing and power-delay product. Section 4 describes the simulation results. Section 5 concludes the paper. An appendix is added to show calibration of parameters for delay calculations using LE theory and to outline the strategy followed for designing the eight-bit ripple counter.

#### 2. Overview of Previous Work and Proposed Designs

Figure 2 shows the conventional master-slave flip-flop architecture, whereby two regenerative loops (L1 and L2) are present in the master and slave sections to account for a static functionality. Both loops operate independently of each other on complementary clock signals. Regenerative loops are composed of cross coupled inverters. It can be observed from Figure 2 that for each loop, regenerative action is achieved through one inversion in the forward (critical) path while the other (clocked) inversion takes place in the feedback path. Moreover, there is no common component between both loops.
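The master-slave principle described above can be illustrated with a purely behavioural sketch. It models the master and slave sections as level-sensitive latches driven by complementary clock phases and ignores every circuit-level detail (regenerative loops, transistor sizing, timing). The class name and the choice of which phase makes the master transparent are assumptions made for illustration; the phase assignment is picked so that the output updates on the falling clock edge, in line with the negative edge triggered operation reported for the flip-flops in this paper.

```python
class MasterSlaveFF:
    """Behavioural model of a negative edge triggered master-slave flip-flop."""

    def __init__(self):
        self.master = 0  # value held by the master latch
        self.slave = 0   # value held by the slave latch (flip-flop output)

    def step(self, clk, d):
        if clk:
            # Assumed phase: master is transparent, slave holds its value.
            self.master = d
        else:
            # Complementary phase: master holds, slave copies the master.
            self.slave = self.master
        return self.slave


ff = MasterSlaveFF()
stimulus = [(1, 1), (0, 0), (0, 1), (1, 0), (0, 1)]  # (clk, d) pairs
outputs = [ff.step(clk, d) for clk, d in stimulus]
print(outputs)
```

The output only changes when the clock goes from high to low, and it takes the value that the data input had while the clock was still high.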

Since an inverter followed by transmission gate is equivalent to a clocked inverter, the combination is replaced by a clocked inverter to form a C<sup>2</sup>MOS based flip-flop architecture as shown in Figure 3 [23]. Two regenerative loops L3 and L4 are used in a similar manner as in the previous case to maintain the static nature of the flip-flop.



FIGURE 4: Proposed architectures.

However, in the proposed architecture as reported in Figure 4(a), both inversions take place in the forward (critical) path and the loop is completed by a clocked switch for loop L6 while loop L5 is completed by using an inverter in the feedback path. It is clearly noticed from Figure 4(a) that the output node is always driven and never floating thus ensuring a static flip-flop operation. The size of transistors in the feedback path marked by asterisks (\*) is kept at 360 nm (minimum technology width) to eliminate race conditions at nodes U and V. Yet another implementation is shown in Figure 4(b) which uses inverter INVX in the critical path and a clocked switch to form a regenerative loop L7. It is to be noted that INVX is common to both the regenerative loops L7 and L8 which is contrary to the realization of previous architectures.

Figure 5 represents the actual circuit design based on the proposed architectures in Figure 4, while TGFF is implemented using transmission gates as switches in the conventional architecture as demonstrated in Figure 6.

It can be clearly observed that mC<sup>2</sup>MOSff1 and mC<sup>2</sup>MOSff2 are both realized using sixteen transistors each. As a result, the area occupied by the proposed designs is significantly smaller than that of the conventional designs. Moreover, the number of clocked transistors in mC<sup>2</sup>MOSff1 is six as compared to eight in case of TGFF or conventional clocked inverter based flip-flop C<sup>2</sup>MOSff [23].

To illustrate the superior performance of the proposed flip-flop configurations, other flip-flop topologies, namely, TGFF, WPMS, PTLFF, gated master-slave latch (GMSL) [10], and data transition look ahead flip-flop (DTLA) [11] belonging to the master-slave class have been used for comparisons. Out of the above mentioned topologies, GMSL and DTLA represent flip-flops with internal clock gating. Schematic diagrams of WPMS, PTLFF, GMSL, and DTLA are shown in Figures 7, 8, 9, and 10, respectively.

#### 3. Simulation Parameters, Test Bench, and Optimization Methodology

Table 1 lists the CMOS parameters used for creating the simulation environment. The flip-flops were designed to layout level in 180 nm/1.8 V CMOS process at 250 MHz clock frequency. The width of transistors in the feedback structures was invariably fixed at the minimum value 360 nm while the slope of the data and clock signals was kept at 100 ps. Performances of the various flip-flop configurations are evaluated through SPICE simulation of the circuits extracted from the layout with the inclusion of parasitics.



FIGURE 5: Schematic diagrams of the proposed designs.

TABLE 1: Simulation parameters.

| Parameter     | Value   |
|---------------|---------|
| $W_{\rm min}$ | 360 nm  |
| $L_{\rm min}$ | 140 nm  |
| $C_{\min}$    | 1.24 fF |
| $V_{\rm DD}$  | 1.8 V   |
| Frequency     | 250 MHz |
| Signal slope  | 100 ps  |

Figure 11 shows the simulation test bench for characterization and comparison of the FF designs [3]. The clock and data signals are fed to the flip-flop through a two stage buffer. Data-to-output delay ($T_{\rm DQ,min}$) is used for performance comparisons. Logical effort theory is extensively used for designing fast CMOS circuits based on pencil and paper calculations and is widely adopted in the literature [24]. Hence, the delay sensitivity factor introduced by Alioto et al. [25] based on logical effort theory has been used for performance optimization.

A 16-cycle long pseudorandom sequence with a switching factor  $\alpha=0.5$  is supplied at the data input for measurement of average power [26]. Since the delay and power characterization are strongly dependent on the capacitive load offered to FFs [27], varying capacitive loads {4, 16, 64}  $C_{\min}$ , where  $C_{\min}$  is the input capacitance of a symmetrical minimum inverter ( $W_p=2W_n=2W_{\min}$ ), have been used to test the FF behaviour. Transistor sizing methodology adopted is the same as that in [28, 29], whereas power-delay product (PDP) and power-delay-area product (PDAP) are the chosen figures of merit (FOM).

The expression relating the absolute gate capacitance  $(C_{\text{GATE}})$  in terms of fF (femtofarads) and absolute transistor width (W) in terms of nanometers (nm) obtained at 180 nm process node by fitting simulation data [30] is given as

$$C_{\text{GATE}} = \left(1.15 \cdot 10^{-3}\right) \cdot W. \tag{1}$$
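As a quick sanity check on (1), the fit can be evaluated for the symmetrical minimum inverter ($W_n = W_{\min}$, $W_p = 2W_{\min}$) that defines $C_{\min}$. The sketch below is illustrative only; the function and constant names are not from the paper. It gives roughly 1.24 fF, in line with Table 1, and a 16X load of about 19.9 fF, close to the 19.92 fF quoted in the text.

```python
# Fit from Eq. (1): gate capacitance in fF for a transistor width in nm
# (valid for the 180 nm process node considered in the paper).
FF_PER_NM = 1.15e-3
W_MIN_NM = 360.0  # minimum transistor width from Table 1


def gate_cap_ff(width_nm):
    """Gate capacitance in fF of a transistor with the given width in nm."""
    return FF_PER_NM * width_nm


def min_inverter_cap_ff():
    """Input capacitance of the minimum inverter: NMOS W_min plus PMOS 2*W_min."""
    return gate_cap_ff(W_MIN_NM) + gate_cap_ff(2 * W_MIN_NM)


c_min = min_inverter_cap_ff()
loads = {k: k * c_min for k in (4, 16, 64)}  # the three load conditions

print(f"C_min = {c_min:.3f} fF")
for k, c in loads.items():
    print(f"{k}X load = {c:.2f} fF")
```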



FIGURE 6: TGFF based on conventional architecture.



FIGURE 7: Schematic diagram for Write port master slave flip-flop (WPMS).

LE method states that the optimized delay D of a path of N cascaded stages is

$$D = N \sqrt[N]{GBH} + P, \tag{2}$$

$$D = N\sqrt[N]{F} + P, \tag{3}$$

where G, B, H (=  $C_L/C_{\rm in}$ ) are the logical effort, branching effort, and electrical effort while P, F (= GBH) and  $C_L$  are parasitic delay, path effort, and final load capacitance, respectively. One has the following:

$$D = P(1+t). \tag{4}$$

From (2) and (4),

$$t = \frac{N \sqrt[N]{GB} \sqrt[N]{C_L}}{P \sqrt[N]{C_{\rm in}}}, \tag{5}$$

where t represents the relative delay increment with respect to parasitic delay. Equations (4) and (5) indicate that larger values of  $C_{\rm in}$  lead to a saturation in the optimized delay and based on the above analysis, the delay sensitivity factor introduced by Alioto et al. [25] is utilized to obtain the upper bound on the transistor widths for exploration of the power-delay design space with least computational effort. Consider the following:

$$S_D^{C_{\rm in}} = \frac{\partial D}{\partial C_{\rm in}} \frac{C_{\rm in}}{D} = -\frac{1}{N} \frac{t}{t+1},\tag{6}$$



FIGURE 8: Schematic diagram for pass transistor logic style flip-flop (PTLFF).



FIGURE 9: Schematic diagram for Gated master slave latch (GMSL).



FIGURE 10: Schematic diagram for data transition look ahead flip-flop (DTLA).



FIGURE 11: Simulation setup.

where  $S_D^{C_{\rm in}}$  is the delay sensitivity factor and is obtained from (3) to (5). The upper bounds on the normalized transistor widths  $w_i$  (normalized with respect to  $W_{\rm min}$ ) have been obtained such that the delay sensitivity remains under a minimum value  $S_{\rm min}$  which is chosen as -5% for our analysis. The input capacitance  $C_{\rm in}$  of the flip-flop is expressed in terms of normalized width w1 as follows:

$$C_{\rm in} = (w1 \cdot 360 + 2 \cdot w1 \cdot 360) (1.15 \cdot 10^{-3}). \tag{7}$$
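Equations (2), (5), (6), and (7) can be exercised numerically with a short sketch. The function names are illustrative, and the parameter values ($G = 4$, $B = 1.23$, $N = 4$, $P = 6$, $C_L = 19.92$ fF) are borrowed from the TGFF example in Section 4 purely to have concrete numbers. The sketch checks that the closed form (6) agrees with a finite-difference estimate of the normalized derivative; it does not attempt to reproduce the sizing bounds obtained in the paper.

```python
def le_delay(g, b, c_load, c_in, n, p):
    """Optimized path delay from Eq. (2), in units of tau, with H = C_L / C_in."""
    return n * (g * b * c_load / c_in) ** (1.0 / n) + p


def relative_increment(g, b, c_load, c_in, n, p):
    """Relative delay increment t from Eq. (5)."""
    return n * (g * b * c_load / c_in) ** (1.0 / n) / p


def delay_sensitivity(g, b, c_load, c_in, n, p):
    """Delay sensitivity factor from Eq. (6)."""
    t = relative_increment(g, b, c_load, c_in, n, p)
    return -(1.0 / n) * t / (t + 1.0)


def c_in_ff(w1, w_min_nm=360.0, ff_per_nm=1.15e-3):
    """Flip-flop input capacitance in fF from Eq. (7)."""
    return (w1 * w_min_nm + 2 * w1 * w_min_nm) * ff_per_nm


def numeric_sensitivity(g, b, c_load, c_in, n, p, rel_step=1e-6):
    """Central-difference estimate of (dD/dC_in) * (C_in / D)."""
    h = c_in * rel_step
    d_plus = le_delay(g, b, c_load, c_in + h, n, p)
    d_minus = le_delay(g, b, c_load, c_in - h, n, p)
    d_mid = le_delay(g, b, c_load, c_in, n, p)
    return (d_plus - d_minus) / (2.0 * h) * c_in / d_mid


# Illustrative parameter set (assumption: values taken from the Section 4 example).
PARAMS = dict(g=4.0, b=1.23, c_load=19.92, n=4, p=6.0)

for w1 in (2, 8, 20):
    c_in = c_in_ff(w1)
    s_closed = delay_sensitivity(c_in=c_in, **PARAMS)
    s_numeric = numeric_sensitivity(c_in=c_in, **PARAMS)
    print(f"w1={w1:2d}  C_in={c_in:6.2f} fF  S(closed)={s_closed:.4f}  S(numeric)={s_numeric:.4f}")
```

The magnitude of the sensitivity shrinks as $C_{\rm in}$ grows, which is the saturation behaviour that (4) and (5) describe.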

Figure 12 shows the conventional TGFF design. The sizing is done by assuming the transistors in the critical path to be independent design variables (IDVs) and optimizing for maximum performance using LE theory. The inverter before transmission gate in the first stage protects the input terminal from noise variations [31]. Table 2 exhibits delay variation for increasing  $C_{\rm in}$  values. It is noteworthy that the delay saturates at 153 ps for  $C_{\rm in}=24.8$  fF. As a result, the upper bounds on transistor widths are exposed and the limits of power (energy)-delay design space are defined early in the design cycle [32]. The table also includes the corresponding power dissipation along with the power-delay product and it is observed that minimum power-delay product is obtained at  $C_{\rm in}=9.92$  fF. The technology parameters used for capacitance calculations throughout this paper are listed in Table 3.
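The power-delay products in Table 2 follow directly from the delay and power columns, and the location of the minimum can be checked with a few lines. The sketch below copies the $C_{\rm in}$, $T_{\rm DQ,min}$, and power values from Table 2; the variable and function names are illustrative.

```python
# (C_in in fF, T_DQ,min in ps, power in uW), copied from Table 2.
TABLE2 = [
    (2.48, 226, 554),
    (4.96, 191, 585),
    (7.44, 173, 599),
    (9.92, 166, 615),
    (12.4, 162, 632),
    (14.8, 159, 648),
    (17.3, 157, 665),
    (19.8, 155, 675),
    (22.3, 154, 682),
    (24.8, 153, 689),
]


def pdp_fj(delay_ps, power_uw):
    """Power-delay product in fJ (1 ps * 1 uW = 1e-18 J = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3


best = min(TABLE2, key=lambda row: pdp_fj(row[1], row[2]))

for c_in, delay, power in TABLE2:
    print(f"C_in={c_in:5.2f} fF  PDP={pdp_fj(delay, power):6.1f} fJ")
print(f"Minimum PDP at C_in = {best[0]} fF")
```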

#### 4. Results and Discussion

It is a well-established fact that the conventional $C^2MOS$, although slower, is skew tolerant and occupies less area than TGFF [23, 33]. Moreover, $mC^2MOSff1$ and $mC^2MOSff2$ show nearly identical characteristics in terms of power, delay, and area and hence only $mC^2MOSff1$ is considered for comparisons.

The waveforms in Figure 13 represent the transient analysis of mC<sup>2</sup>MOSff1 carried out over a period of 8 clock cycles. The SPICE simulation results verify the correct flip-flop operation at 1 GHz clock frequency (all the flip-flops reported in the paper are designed for negative edge triggered operation). The variation of absolute data-to-output delays $T_{\rm DQ,min}$ with FF input capacitance ($C_{\rm in}$) for 16X (19.92 fF) capacitive load is illustrated in Figure 14.

TGFF utilizes transmission gates in the critical path and hence it is faster than the rival designs. There is exactly the same number of stages in the critical path of TGFF and mC<sup>2</sup>MOSff1, the only difference being that the latching circuit in case of TGFF is an inverter followed by a clocked transmission gate (inverting latch), whereas a clocked/tristate inverter is present in mC<sup>2</sup>MOSff1. Logical effort of both the latches is considered to be two; however, it is apparent that an inverter followed by a transmission gate is faster because the output node is driven by both the transistors of the transmission gate in parallel and this behaviour is reflected in Figure 14. From the above discussion, it is obvious that the value of logical effort for an inverting latch can be assumed to be two for most theoretical purposes, but for comparison with a C<sup>2</sup>MOS latch, it must be slightly less than two if delays are to be modelled precisely.

Equation (2) clearly indicates that lesser branching effort leads to a faster circuit operation. The branching effort for a path with internal fan-out is expressed as [24]

$$b = \frac{C_{\text{on-path}} + C_{\text{off-path}}}{C_{\text{on-path}}}, \tag{8}$$

where $C_{\text{on-path}}$ represents the load capacitance along the path under analysis and $C_{\text{off-path}}$ represents the capacitance of the connections that lead off the path.

The branching effort along the critical path is given as

$$B = \prod b_i. \tag{9}$$

There are two branches each in TGFF and mC<sup>2</sup>MOSff1 represented as b1, b2 and b3, b4 in Figures 6 and 5(a), respectively. The branching effort corresponding to branches b1, b2, b3, and b4 is calculated as follows.

4.1. Branching Effort in Case of TGFF. One has the following.

b1 Calculation:

$$\begin{split} C_{\text{on-path}} &= C_{gd}\,(\text{TN5}) + C_{db}\,(\text{TN5}) + C_{gd}\,(\text{TP5}) + C_{db}\,(\text{TP5}) = 8.43\,\text{fF}, \\ C_{\text{off-path}} &= C_{g}\,(\text{TN2}) + C_{g}\,(\text{TP2}) = 1.12\,\text{fF}, \\ b1 &= 1.13. \end{split} \tag{10}$$

b2 Calculation:

$$\begin{split} C_{\text{on-path}} &= C_g\,(\text{TN9}) + C_g\,(\text{TP9}) = 12.33\,\text{fF}, \\ C_{\text{off-path}} &= C_g\,(\text{TN6}) + C_g\,(\text{TP6}) = 1.12\,\text{fF}, \\ b2 &= 1.09, \\ B &= b1 * b2 = 1.23. \end{split} \tag{11}$$

4.2. Branching Effort in Case of mC<sup>2</sup>MOSff1. One has the following.

b3 Calculation:

$$\begin{split} C_{\text{on-path}} &= C_g \, (\text{TN14}) + C_g \, (\text{TP14}) = 7.76 \, \text{fF}, \\ C_{\text{off-path}} &= C_{gd} \, (\text{TN16}) + C_{db} \, (\text{TN16}) + C_{gd} \, (\text{TP16}) \\ &+ C_{db} \, (\text{TP16}) = 1.47 \, \text{fF}, \end{split} \tag{12}$$

$$b3 = 1.18.$$



FIGURE 12: LE theory based transistor sizing methodology for transmission gate flip-flop.

TABLE 2: Traditional transmission gate flip-flop at 19.92 fF load (16X).

| C <sub>in</sub> (fF) | w1 | w2   | w3   | w4   | $T_{\rm DQ,min}$ (ps) | Power (uW) | PDP (fJ) |
|----------------------|----|------|------|------|-----------------------|------------|----------|
| 2.48                 | 2  | 2.35 | 2.79 | 6.65 | 226                   | 554        | 125.2    |
| 4.96                 | 4  | 3.95 | 3.95 | 7.91 | 191                   | 585        | 111.7    |
| 7.44                 | 6  | 5.35 | 4.84 | 8.76 | 173                   | 599        | 103.6    |
| 9.92                 | 8  | 6.65 | 5.59 | 9.41 | 166                   | 615        | 102      |
| 12.4                 | 10 | 7.86 | 6.25 | 9.95 | 162                   | 632        | 102.3    |
| 14.8                 | 12 | 9.01 | 6.85 | 10.4 | 159                   | 648        | 103      |
| 17.3                 | 14 | 10.1 | 7.40 | 10.8 | 157                   | 665        | 104.4    |
| 19.8                 | 16 | 11.1 | 7.91 | 11.2 | 155                   | 675        | 104.6    |
| 22.3                 | 18 | 12.2 | 8.39 | 11.5 | 154                   | 682        | 105      |
| 24.8                 | 20 | 13.2 | 8.84 | 11.8 | 153                   | 689        | 105.4    |

TABLE 3: Technology parameters used for estimation of capacitances.

| Parameter | $C_{\rm gdo}$ (F/m) | $C_{\rm gso}$ (F/m) | $C_{\rm jsw}$ (F/m) | $C_j$ (F/m<sup>2</sup>) | $L_D$ (m) | $L_S$ (m) |
|-----------|---------------------|---------------------|---------------------|-------------------------|-----------|-----------|
| NMOS      | 2.78E-10            | 2.78E-10            | 7.9E-10             | 0.00365                 | 31.6E-09  | 31.6E-09  |
| PMOS      | 2.78E-10            | 2.78E-10            | 1.44E-9             | 0.00138                 | 31.6E-09  | 31.6E-09  |



FIGURE 13: HSPICE simulation waveforms at 1 GHz clock frequency for mC<sup>2</sup>MOSff1.

b4 Calculation:

$$\begin{split} &C_{\text{on-path}} = C_g \, (\text{TN14}) + C_g \, (\text{TP14}) = 7.76 \, \text{fF}, \\ &C_{\text{off-path}} = C_g \, (\text{TN13}) + C_g \, (\text{TP13}) = 0.828 \, \text{fF}, \\ &b4 = 1.10, \\ &B = b3 * b4 = 1.30, \end{split} \tag{13}$$

where  $C_{gd}$  is gate to drain capacitance,  $C_{db}$  is drain to body capacitance, and  $C_g$  is the gate capacitance of respective transistors.
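The branching efforts in (10) to (13) can be recomputed from the listed on-path and off-path capacitances using (8) and (9). The sketch below is illustrative and the names are not from the paper. Because it works with unrounded intermediate values, the results can differ from the printed figures in the second decimal place (for example, through rounding or truncation of intermediate values).

```python
def branching_effort(c_on_path, c_off_path):
    """Branching effort of a single node, Eq. (8). Capacitances in fF."""
    return (c_on_path + c_off_path) / c_on_path


# Capacitance values (fF) as listed in Eqs. (10)-(13).
b1 = branching_effort(8.43, 1.12)    # TGFF, first branch
b2 = branching_effort(12.33, 1.12)   # TGFF, second branch
b3 = branching_effort(7.76, 1.47)    # mC2MOSff1, first branch
b4 = branching_effort(7.76, 0.828)   # mC2MOSff1, second branch

# Path branching effort is the product of the node efforts, Eq. (9).
B_tgff = b1 * b2
B_mc2mos = b3 * b4

print(f"TGFF:      b1={b1:.3f}  b2={b2:.3f}  B={B_tgff:.3f}")
print(f"mC2MOSff1: b3={b3:.3f}  b4={b4:.3f}  B={B_mc2mos:.3f}")
```

Either way, the path branching effort of mC<sup>2</sup>MOSff1 comes out slightly larger than that of TGFF.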

Accordingly, using (2) and putting G=4, B=1.23, H=19.92/12.4=1.60, N=4, and P=6, we have D=12.7 (absolute delay 165.1 ps) for TGFF, whereas putting G=4, B=1.30, H=19.92/12.4=1.60, N=4, and P=6, we have D=12.79 (absolute delay 166.27 ps) for mC<sup>2</sup>MOSff1. Absolute delays $D_{\rm abs}$ are obtained by multiplying parameter D with parameter $\tau$ as follows:

$$D_{\rm abs} = D\tau. \tag{14}$$



FIGURE 14: Variation in data-to-output delay with respect to FF input capacitance.

It is clearly observed that the delay of mC<sup>2</sup>MOSff1 is marginally higher than the delay of TGFF. Now, keeping the other parameters the same and assuming the logical effort of the inverting latch to be 1.8, the updated delay of TGFF is evaluated as D = 12.35 (absolute delay 160.55 ps).

The value of the process dependent parameter $\tau$ is determined as approximately 13 ps using the calibration technique mentioned by Sutherland et al. [24]. The detailed procedure is discussed in the Appendix. The absolute delay measurements obtained through simulation are 162 ps for TGFF and 196 ps for mC<sup>2</sup>MOSff1, which are in close agreement with the theoretical values 160.55 ps and 166.27 ps, respectively (typically within 15% error).
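The delay figures quoted above can be reproduced from (2) and (14) with a short sketch. The function names are illustrative. Two assumptions are made: the electrical effort is rounded to H = 1.60 as in the text, and the critical path is taken to contain two latches, so that a latch logical effort of 1.8 gives a path logical effort of 1.8 × 1.8 (the same reading under which a latch logical effort of 2 gives G = 4).

```python
TAU_PS = 13.0  # process dependent parameter tau, approximately 13 ps


def path_delay(g, b, h, n, p):
    """Optimized path delay D from Eq. (2), in units of tau."""
    return n * (g * b * h) ** (1.0 / n) + p


def absolute_delay_ps(d, tau_ps=TAU_PS):
    """Absolute delay from Eq. (14)."""
    return d * tau_ps


H = 1.60  # 19.92 fF / 12.4 fF, rounded as in the text

d_tgff = path_delay(4.0, 1.23, H, 4, 6.0)
d_mc2mos = path_delay(4.0, 1.30, H, 4, 6.0)
# Assumption: two latches in the path, each with logical effort 1.8.
d_tgff_updated = path_delay(1.8 * 1.8, 1.23, H, 4, 6.0)

for name, d in [("TGFF", d_tgff), ("mC2MOSff1", d_mc2mos), ("TGFF (g=1.8)", d_tgff_updated)]:
    print(f"{name:13s} D={d:.2f}  D_abs={absolute_delay_ps(d):.1f} ps")
```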

WPMS and PTLFF topologies show degraded performance due to the presence of pass transistors in the critical path while the speed of clock-gated structures is worst mainly because gating circuit is inserted between the clock and the flip-flop terminals which deteriorates the timing characteristics. The characterizations are done assuming that  $C_{\rm in}=12.4\,{\rm fF}$  and  $C_L=19.92\,{\rm fF}$  (16X) where  $C_L$  represents the flip-flop load capacitance.

The variation of average power with $C_{\rm in}$ for the 16X loading condition is depicted in Figure 15. Due to the threshold voltage drop at internal nodes, WPMS and PTLFF display the worst power dissipation characteristics because of short circuit power dissipation. GMSL and DTLA exhibit greater power dissipation than their nongated counterparts because the pseudorandom sequence has an activity factor of 0.5: the additional comparator and clock gating circuit are beneficial only at sufficiently low switching activities and otherwise lead to both increased area and power overhead.



FIGURE 15: Variation in power dissipation as a function of FF input capacitance.

4.3. Clock Load Calculations. One has the following.

TGFF:

$$\begin{split} \left\{ C_g \left( \text{TN1} \right) + C_g \left( \text{TP1} \right) + C_g \left( \text{TN5} \right) + C_g \left( \text{TP5} \right) \right\} \\ + \left\{ C_g \left( \text{TN3} \right) + C_g \left( \text{TP3} \right) + C_g \left( \text{TN7} \right) + C_g \left( \text{TP7} \right) \right\} \end{split} \tag{15}$$

{Transistors contributing towards clock load in the critical path} + {Transistors contributing towards clock load in the feedback structure}

$$= 14.78\,\text{fF} + 1.66\,\text{fF} = 16.44\,\text{fF}.$$

mC<sup>2</sup>MOSff1:

$$\begin{split} \left\{ C_g \left( \text{TN10} \right) + C_g \left( \text{TP10} \right) + C_g \left( \text{TN11} \right) + C_g \left( \text{TP11} \right) \right\} \\ + \left\{ C_g \left( \text{TN16} \right) + C_g \left( \text{TP16} \right) \right\} \end{split} \tag{16}$$

{Transistors contributing towards clock load in the critical path} + {Transistors contributing towards clock load in the feedback structure}

$$= 22.18\,\text{fF} + 0.84\,\text{fF} = 23.02\,\text{fF}.$$

Apart from the clock load, the capacitance value at internal nodes of mC<sup>2</sup>MOSff1 is reduced as compared to TGFF by eliminating transistors TN6 and TP6 from the feedback structure.


TABLE 4: Comparison of flip-flop parameters at $C_{\rm in} = 12.4\,\text{fF}$ and 16X capacitive loading.

| Design                     | TGFF  | mC<sup>2</sup>MOSff1 | WPMS  | PTLFF | GMSL  | DTLA  |
|----------------------------|-------|----------------------|-------|-------|-------|-------|
| Transistor count           | 20    | 16                   | 24    | 16    | 31    | 46    |
| No. of clocked transistors | 8     | 6                    | 6     | 4     | 2     | 3     |
| Clock-to-output delay (ps) | 92    | 116                  | 206   | 204   | 419   | 683   |
| Optimum setup time (ps)    | 70    | 80                   | 40    | 50    | 80    | -140  |
| Hold time (ps)             | -19   | -21                  | -33   | -32   | -23   | 25    |
| $T_{\rm DQ,min}$ (ps)      | 162   | 196                  | 246   | 254   | 499   | 543   |
| Clock load (fF)            | 16.44 | 23.02                | 9.05  | 8.22  | 7.76  | 7.31  |
| Power dissipation (uW)\*   | 632   | 640                  | 786   | 679   | 676   | 643   |
| Leakage power (uW)         | 59.38 | 57.51                | 72.64 | 69.83 | 74.91 | 76.73 |
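The timing rows of Table 4 are internally consistent: for every design the data-to-output delay equals the clock-to-output delay plus the optimum setup time. A minimal check is sketched below; the dictionary layout is illustrative, and the assignment of the third and fourth columns to WPMS and PTLFF is an assumption based on the transistor counts mentioned in the text.

```python
# design: (clock-to-output delay, optimum setup time, T_DQ,min), all in ps.
TABLE4_TIMING = {
    "TGFF": (92, 70, 162),
    "mC2MOSff1": (116, 80, 196),
    "WPMS": (206, 40, 246),
    "PTLFF": (204, 50, 254),
    "GMSL": (419, 80, 499),
    "DTLA": (683, -140, 543),
}


def data_to_output_delay(clk_to_q_ps, setup_ps):
    """Data-to-output delay as setup time plus clock-to-output delay."""
    return clk_to_q_ps + setup_ps


mismatches = [
    name
    for name, (t_cq, t_su, t_dq) in TABLE4_TIMING.items()
    if data_to_output_delay(t_cq, t_su) != t_dq
]
print("Inconsistent rows:", mismatches)
```

The negative setup time of DTLA is what keeps its data-to-output delay well below its clock-to-output delay.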

#### 4.4. Capacitance Calculations at Internal Nodes of TGFF

Internal Capacitance at Nodes P and K

Node P:

$$C_g(\text{TN2}) + C_g(\text{TP2}) + C_{gd}(\text{TN5}) + C_{db}(\text{TN5}) + C_{gd}(\text{TP5}) + C_{db}(\text{TP5}) = 9.28\,\text{fF}.$$

Node K:

$$C_g(\text{TN6}) + C_g(\text{TP6}) + C_g(\text{TN9}) + C_g(\text{TP9}) = 9.02\,\text{fF}.$$

Internal Capacitance at Nodes M and N

Node M:

$$\begin{split} &C_{gd}(\text{TN1}) + C_{db}(\text{TN1}) + C_{gd}(\text{TP1}) + C_{db}(\text{TP1}) + C_{gd}(\text{TN3}) + C_{db}(\text{TN3}) \\ &+ C_{gd}(\text{TP3}) + C_{db}(\text{TP3}) + C_{g}(\text{TN4}) + C_{g}(\text{TP4}) = 18.41\,\text{fF}. \end{split}$$

Node N:

$$\begin{split} &C_{gd}(\text{TN5}) + C_{db}(\text{TN5}) + C_{gd}(\text{TP5}) + C_{db}(\text{TP5}) + C_{gd}(\text{TN7}) + C_{db}(\text{TN7}) \\ &+ C_{gd}(\text{TP7}) + C_{db}(\text{TP7}) + C_{g}(\text{TN8}) + C_{g}(\text{TP8}) = 14.80\,\text{fF}. \end{split}$$

#### 4.5. Capacitance Calculations at Internal Nodes of mC<sup>2</sup>MOSff1

Internal Capacitance at Nodes P' and K'

Node P':

$$C_g(\text{TN12}) + C_g(\text{TP12}) = 9.76\,\text{fF}.$$

Node K':

$$\begin{split} &C_g(\text{TN13}) + C_g(\text{TP13}) + C_g(\text{TN14}) + C_g(\text{TP14}) \\ &+ C_{gd}(\text{TN16}) + C_{db}(\text{TN16}) + C_{gd}(\text{TP16}) + C_{db}(\text{TP16}) = 10.06\,\text{fF}. \end{split}$$

Internal Capacitance at Node M'

Node M':

$$C_g(\text{TN15}) + C_g(\text{TP15}) = 12.35\,\text{fF}.$$

FIGURE 16: Power-delay product characteristics with varying FF input capacitance at 16X load.

It can be easily concluded from the calculations above that a total of 19.34 fF of capacitance has been removed from the internal nodes in the critical path of mC<sup>2</sup>MOSff1 in comparison to TGFF. This leads to reduced internal power dissipation at these nodes as less capacitance has to be charged or discharged per clock cycle. However, the reduction in the clock load of mC<sup>2</sup>MOSff1 due to transistors eliminated from the feedback structure is nullified by PMOS transistors TP10 and TP11, whose size is twice that of transistors TP1 and TP5 in case of TGFF, and as a result the total power dissipation of both the flip-flops is nearly the same, as can be clearly observed from Figure 16. Following a similar procedure, the clock load of various flip-flops is obtained and listed in Table 4 along with the number of clocked transistors and power consumption values. It is seen that TGFF and mC<sup>2</sup>MOSff1 represent the most efficient designs in terms of reduced power consumption, having power dissipation comparable to DTLA at $C_{\rm in} = 12.4\,{\rm fF}$ and $C_L = 19.92\,{\rm fF}$.
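The 19.34 fF figure and the clock load totals follow from simple sums over the values listed in Sections 4.3 to 4.5. The bookkeeping is sketched below with illustrative variable names; all capacitances are in fF and copied from the text.

```python
# Internal node capacitances (fF) from Sections 4.4 and 4.5.
tgff_nodes = {"P": 9.28, "K": 9.02, "M": 18.41, "N": 14.80}
mc2mos_nodes = {"P'": 9.76, "K'": 10.06, "M'": 12.35}

tgff_total = sum(tgff_nodes.values())
mc2mos_total = sum(mc2mos_nodes.values())
reduction = tgff_total - mc2mos_total

# Clock load (fF): critical path contribution plus feedback contribution.
clock_load_tgff = 14.78 + 1.66
clock_load_mc2mos = 22.18 + 0.84

print(f"Internal capacitance: TGFF {tgff_total:.2f} fF, mC2MOSff1 {mc2mos_total:.2f} fF")
print(f"Reduction: {reduction:.2f} fF")
print(f"Clock load: TGFF {clock_load_tgff:.2f} fF, mC2MOSff1 {clock_load_mc2mos:.2f} fF")
```

The lower internal capacitance of mC<sup>2</sup>MOSff1 is offset by its higher clock load, which matches the observation that the two designs dissipate nearly the same power.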

It can be observed that mC<sup>2</sup>MOSff1 has the least transistor count along with PTLFF while GMSL and DTLA consist of the maximum number of transistors. Since only sixteen transistors are used for circuit realization of mC<sup>2</sup>MOSff1, power dissipation is comparable to TGFF. It is worth noting that GMSL and DTLA offer minimum clock load; as a result, these topologies exhibit the least power dissipation at lower switching activities. The reason for the extended clock-to-output delays of GMSL and DTLA is the insertion of clock gating circuitry, while DTLA has a pulsed operation and hence shows negative set-up time requirements. Based on the power and delay measurements, power-delay product characteristics are derived for all the flip-flops as shown in Figure 16. The optimum power-delay product of the gated structures GMSL and DTLA is, respectively, 3.30 and 3.34 times greater than the optimum PDP of TGFF. Among the nonclock gated structures, the pass transistor based designs WPMS and PTLFF exhibit a 1.77x and 1.57x increase in the power-delay product with respect to the benchmark flip-flop TGFF. TGFF also shows 20% improvement over mC<sup>2</sup>MOSff1 in terms of minimum power-delay product. However, despite the fact that TGFF represents a better alternative in terms of performance and optimum power-delay product, the area requirements also remain a major concern. It has been observed in the literature that the conventional C<sup>2</sup>MOS based flip-flop is up to 20–25% more efficient in terms of occupied chip area. This stems mainly from the fact that at layout level (i) in comparison to TGFF, diffusion areas of most of the transistors can be shared in the C<sup>2</sup>MOS flip-flop [33], (ii) the number of contact holes can be reduced in the layout pattern [23], and (iii) a less complicated feedback structure leads to fewer interconnections.

<sup>\*</sup>Pseudorandom sequence with $\alpha = 0.5$ is used for power calculations.

TABLE 5: PDAP comparison of TGFF and mC<sup>2</sup>MOSff1.

| Design               | Transistor count | Transistor widths (um) | Delay (ps) | Power (uW) | Layout area (um²) | PDP (fJ) | PDAP (fJ·um²) |
|----------------------|------------------|------------------------|------------|------------|-------------------|----------|---------------|
| TGFF                 | 20               | 52.52                  | 162        | 632        | 175               | 102.3    | 17902         |
| mC<sup>2</sup>MOSff1 | 16               | 58.95                  | 196        | 640        | 125               | 125.4    | 15675         |

TABLE 6: Flip-flop simulation parameters at 65 nm CMOS technology.

| Process corner | Temperature (°C) | $V_{\rm DD}$ (V) |
|----------------|------------------|------------------|
| TT             | 70               | 1                |
| FF             | 0                | 1.1              |
| SS             | 125              | 0.9              |
| FS             | 70               | 1                |
| SF             | 70               | 1                |

| Simulation/technology parameter | Value        |
|---------------------------------|--------------|
| $L_{\rm min}$                   | 60 nm        |
| $W_{\rm min}$                   | 120 nm       |
| $C_{\min}$                      | 507 aF       |
| Frequency                       | 2 GHz        |
| Signal slope                    | 20 ps        |
| $C_{\text{Poly}}$               | $0.268\,C_G$ |
| $C_{\text{metal1}}$             | $0.215\,C_G$ |
| $C_{\text{metal2}}$             | $0.175\,C_G$ |

FIGURE 17: Layout implementation of TGFF.

FIGURE 18: Layout implementation of mC<sup>2</sup>MOSff1.

The layouts were implemented using $C_{\rm in}=12.4\,{\rm fF}$, which gives nearly identical transistor sizes throughout the critical path, with the exception of TP10 and TP11 in mC<sup>2</sup>MOSff1, which are twice the size of TP1 and TP5 in accordance with LE theory. The layouts for TGFF and mC<sup>2</sup>MOSff1 are shown in Figures 17 and 18, respectively. Table 5 shows that while TGFF is better in terms of PDP by 18.4%, mC<sup>2</sup>MOSff1 shows a 12.4% improvement in PDAP, making it suitable for high-density applications where performance can be compromised.
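The trade-off between PDP and PDAP can be illustrated with a short calculation. The sketch below is only an illustration: it assumes that "better in terms of PDP by 18.4%" means the PDP of mC<sup>2</sup>MOSff1 is 1.184 times that of TGFF, and that the 12.4% PDAP improvement means a ratio of 0.876. The function names are hypothetical, and the implied area ratio is a derived estimate, not a figure reported in Table 5.

```python
# Sketch: how a flip-flop with a worse PDP can still win on PDAP.
# The reading of the two percentages below is an assumption, not a
# statement of how the paper normalises its results.

def pdap(power, delay, area):
    """Power-delay-area product in any consistent set of units."""
    return power * delay * area

def implied_area_ratio(pdp_ratio, pdap_ratio):
    """Area ratio (design / reference) implied by the PDP and PDAP ratios,
    since PDAP = PDP * area."""
    return pdap_ratio / pdp_ratio

# Assumed reading: mC2MOSff1 has an 18.4% higher PDP than TGFF ...
pdp_ratio = 1.184
# ... and a 12.4% lower PDAP than TGFF.
pdap_ratio = 1.0 - 0.124

area_ratio = implied_area_ratio(pdp_ratio, pdap_ratio)
print(f"implied area ratio (mC2MOSff1 / TGFF): {area_ratio:.3f}")
```

Under these assumptions the area of mC<sup>2</sup>MOSff1 would have to be roughly a quarter smaller than that of TGFF for the PDAP advantage to appear, which is of the same order as the area savings attributed to C<sup>2</sup>MOS layouts in the literature.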



FIGURE 19: Comparison chart of power dissipation at different switching frequencies.

The power dissipation results illustrated in Figure 19 are obtained using $C_{\rm in} = 12.4\,\rm fF$, which ensures that all the transistors in the critical path have similar widths. At zero switching activity, clock-gated topologies are the most power efficient. GMSL and DTLA show 32.5% and 46.3% reduction in power in the case of logic high at the input, whereas for logic low, the power consumption is reduced by 19.2% and 35.4%, respectively. Again, it can be clearly observed that there is only a slight difference in the power dissipation of TGFF and mC<sup>2</sup>MOSff1 at different switching activities.

FIGURE 20: Delay variations for mC<sup>2</sup>MOSff1 at 16X loading for different process corners.

FIGURE 21: Power variations for mC<sup>2</sup>MOSff1 at 16X loading for different process corners.

The correct functionality of the proposed flip-flop mC<sup>2</sup>MOSff1 is validated by designing an 8-bit ripple counter at 16X capacitive load, and the average power measurements were carried out over 256 clock cycles. It was noticed that the power consumption of the mC<sup>2</sup>MOSff1 based counter is comparable to that of the TGFF based counter at varying frequencies. Again, LE theory has been adopted for sizing the individual flip-flops in each counter for optimum performance, as described in detail in the Appendix.

FIGURE 22: Delay versus fanout curve for an inverter at 180 nm/1.8 V CMOS process.

FIGURE 23: Conversion of D flip-flop to T flip-flop.

The flip-flops were also designed and simulated to layout level, with inclusion of parasitics, in $130\,\mathrm{nm}$, $90\,\mathrm{nm}$, and $65\,\mathrm{nm}$ CMOS processes to address scalability issues at more advanced process nodes. The simulation test bench and optimization methodology are the same as those described in Section 3. PVT variations are included to evaluate the performance of the flip-flops at all process corners, namely, FF, SS, FS, and SF, with voltages scaled from $0.9$ to $1.1\,\mathrm{V}$ and temperatures varied from $0$ to $125\,^{\circ}\mathrm{C}$, as shown in Table 6. The simulation and technology parameters are also listed in Table 6, where $C_G$ represents the gate capacitance per unit width and was evaluated to be $1.3\,\mathrm{fF/\mu m}$ by fitting simulation data. In addition, the capacitances per unit length of the poly, metal 1, and metal 2 interconnects are given as fractions of $C_G$.
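As a worked illustration of how the interconnect ratios in Table 6 translate into absolute values, the following sketch multiplies each ratio by the quoted $C_G$. The layer names, the function names, and the 10 µm example length are hypothetical and serve only to show the arithmetic.

```python
# Sketch: absolute interconnect capacitances from the Table 6 ratios.
# C_G and the three ratios are the values quoted in the paper; the
# function names and the example wire length are illustrative only.

C_G_FF_PER_UM = 1.3  # gate capacitance per unit width, fF/um

# Capacitance per unit length of each layer, expressed as a fraction of C_G
RATIOS = {"poly": 0.268, "metal1": 0.215, "metal2": 0.175}

def wire_cap_per_um(layer, c_g=C_G_FF_PER_UM):
    """Capacitance per micrometre of wire on the given layer, in fF/um."""
    return RATIOS[layer] * c_g

def wire_cap(layer, length_um, c_g=C_G_FF_PER_UM):
    """Total capacitance in fF of a wire of the given length in um."""
    return wire_cap_per_um(layer, c_g) * length_um

for layer in RATIOS:
    print(f"{layer}: {wire_cap_per_um(layer):.4f} fF/um")

# Hypothetical example: a 10 um metal 1 wire
print(f"10 um of metal1: {wire_cap('metal1', 10.0):.3f} fF")
```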

For illustration purposes, the delay and power variations with the flip-flop input capacitance at the different process corners in 65 nm CMOS technology for mC<sup>2</sup>MOSff1 are shown in Figures 20 and 21, respectively, at 16X capacitive loading. Both mC<sup>2</sup>MOSff1 and mC<sup>2</sup>MOSff2 showed correct circuit behaviour at the aforementioned process nodes, which indicates that no internal noise violations exist, especially since logic levels are retained even at the FF process corner. However, it should be pointed out that mC<sup>2</sup>MOSff1, in a manner similar to TGFF, starts to fail at the SS corner for lower values of $C_{\rm in}$ [34].

FIGURE 24: TGFF based T flip-flop.

FIGURE 25: Schematic diagram of a modulo 256 ripple counter with intermediate buffers.

#### 5. Conclusion

In this paper, an alternative architecture for designing C<sup>2</sup>MOS based flip-flops is presented with a modified feedback strategy that preserves fully static operation. Using the new feedback approach, a modified topology, mC<sup>2</sup>MOSff1, is proposed with decreased parasitic capacitances at internal nodes in comparison to the TGFF, which is the best design in terms of PDP. However, postlayout simulations and analyses indicate that the modified configuration mC<sup>2</sup>MOSff1 presents the best alternative in terms of PDAP among all the conventional designs. Therefore, for high-performance applications TGFF remains the best choice, but it can be replaced by mC<sup>2</sup>MOSff1 for high-density applications. Comparisons were carried out with state-of-the-art flip-flops in the master-slave class. The simulation results are well supported by mathematical analysis based on logical effort theory, within acceptable error (typically less than 15%).

#### Appendices

#### A. Delay Calibration Using LE Theory

To model delays using LE theory, all delays are first expressed in terms of a basic, process-dependent delay unit $\tau$, such that the absolute delay is the product of the unitless delay of the gate, as shown in (2), and the delay unit $\tau$. Accordingly,

$$D_{\rm abs} = D\tau. \tag{A.1}$$

While $D$ represents the delay of a multistage path, $d$ corresponds to the delay of a single-stage logic gate. The parameter $\tau$ must be estimated in order to obtain absolute delays; accordingly, a delay versus fanout curve is determined for an inverter by fitting simulation data, as shown in Figure 22. The curve is approximated as a straight line whose slope represents $\tau$, since $d = (gh + p)\tau$ and the logical effort $g$ of an inverter is 1. In our case, $\tau$ is estimated as 13 ps.
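The calibration step can be sketched as a least-squares line fit. The example below is a minimal illustration under stated assumptions: the delay points are synthetic, generated from $d = (gh + p)\tau$ with the reported $\tau = 13$ ps and an assumed parasitic delay $p = 1$, rather than taken from the simulations behind Figure 22. All function and variable names are hypothetical.

```python
# Sketch: estimate tau as the slope of an inverter delay-versus-fanout line.
# The data points are synthetic (hypothetical), not the paper's simulation data.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

TAU_PS = 13.0    # delay unit reported in the text
P_ASSUMED = 1.0  # parasitic delay of the inverter: an assumption
G_INV = 1.0      # logical effort of an inverter

fanouts = [1, 2, 3, 4, 5, 6]
delays_ps = [(G_INV * h + P_ASSUMED) * TAU_PS for h in fanouts]

# With g = 1 the slope of delay versus fanout h is tau,
# and the intercept is p * tau.
tau_est, intercept = fit_line(fanouts, delays_ps)
p_est = intercept / tau_est

def absolute_delay_ps(d_normalized, tau_ps=tau_est):
    """Equation (A.1): D_abs = D * tau."""
    return d_normalized * tau_ps

print(f"tau = {tau_est:.2f} ps, p = {p_est:.2f}")
```

With measured data the points would not lie exactly on a line, so the fitted slope would only approximate $\tau$.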

#### B. Implementation of 8-Bit Ripple Counter

An 8-bit asynchronous counter was implemented by converting the D flip-flop configuration to a T flip-flop configuration using an EXOR gate as illustrated in Figure 23.

The T flip-flop designed using TGFF is shown in Figure 24. It is treated as a five-stage design and optimized for highest speed using LE theory. The EXOR gate was realized using transmission gates, as shown in Stage 1 of Figure 24. A similar procedure was followed for designing the mC<sup>2</sup>MOSff1 based T flip-flop.
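The D-to-T conversion can be captured in a small behavioural model. The sketch below is an assumption-laden illustration, not a description of the transistor-level circuit in Figure 24: it assumes a positive-edge-triggered D flip-flop and feeds it with T XOR Q, and the class names are hypothetical.

```python
# Sketch: behavioural model of a T flip-flop built from a D flip-flop
# and an XOR gate. Edge polarity and class names are assumptions.

class DFlipFlop:
    """Positive-edge-triggered D flip-flop (behavioural)."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk, d):
        # Capture D only on a 0 -> 1 transition of the clock
        if clk == 1 and self._prev_clk == 0:
            self.q = d
        self._prev_clk = clk
        return self.q


class TFlipFlop:
    """T flip-flop: the D input is driven by T XOR Q."""

    def __init__(self):
        self._dff = DFlipFlop()

    @property
    def q(self):
        return self._dff.q

    def tick(self, clk, t):
        d = t ^ self._dff.q  # XOR gate in front of the D input
        return self._dff.tick(clk, d)


tff = TFlipFlop()
trace = []
for clk, t in [(0, 1), (1, 1), (0, 1), (1, 1), (0, 0), (1, 0)]:
    trace.append(tff.tick(clk, t))
print(trace)
```

With T held high the output toggles on every rising clock edge, and with T low the stored value is held.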

To design the modulo 256 counter, the output Q of each stage is connected to the clock terminal of the next stage through two intermediate inverters (acting as a buffer) sized ($W_p = 11.52\,\mu\mathrm{m}$, $W_n = 5.76\,\mu\mathrm{m}$) such that the input capacitance of the first inverter acts as the load capacitance for the flip-flop configuration of the previous stage, as depicted in Figure 25. As a result, the load at the output terminal of each flip-flop is uniformly fixed at 19.92 fF.
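The counting behaviour of such a chain can be checked with a simple functional model. The sketch below ignores buffers, delays, and loading, and makes two assumptions that the text does not state: every stage has its T input tied high, and a stage toggles when the output of the previous stage falls from 1 to 0, which is chosen so that the count increments. The function names are hypothetical.

```python
# Sketch: functional model of an 8-bit asynchronous (ripple) counter.
# Assumptions: T tied high in every stage; a stage toggles on the
# falling edge of the previous stage's output. Timing is not modelled.

def ripple_counter_step(bits):
    """Apply one input clock pulse; bits[0] is the least significant bit."""
    bits = list(bits)
    for i in range(len(bits)):
        bits[i] ^= 1          # this stage toggles
        if bits[i] == 1:      # output rose, so no falling edge is passed on
            break
    return bits

def to_int(bits):
    """Interpret the stage outputs as an unsigned integer."""
    return sum(b << i for i, b in enumerate(bits))

state = [0] * 8
seen = []
for _ in range(256):
    seen.append(to_int(state))
    state = ripple_counter_step(state)

# After 256 pulses the counter has visited every value and wrapped around
print(len(set(seen)), to_int(state))
```

One full period of the counter spans 256 input clock pulses, which matches the window over which the average power was measured.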

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### References

- [1] H. Kawaguchi and T. Sakurai, "A reduced clock-swing flip-flop (RCSFF) for 63% power reduction," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 5, pp. 807–811, 1998.
- [2] G. Yeap, *Practical Low Power Digital VLSI Design*, Kluwer Academic, 1998.
- [3] V. Oklobdzija, V. Stojanovic, D. Markovic, and N. Nedovic, *Digital System Clocking: High-Performance and Low-Power Aspects*, Wiley-IEEE Press, 2003.
- [4] B. Mesgarzadeh, M. Hansson, and A. Alvandpour, "Jitter characteristic in charge recovery resonant clock distribution," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 7, pp. 1618–1625, 2007.
- [5] C. Giacomotto, N. Nedovic, and V. G. Oklobdzija, "The effect of the system specification on the optimal selection of clocked storage elements," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 6, pp. 1392–1404, 2007.
- [6] G. Gerosa, S. Gary, C. Dietz et al., "2.2 W, 80 MHz superscalar RISC microprocessor," *IEEE Journal of Solid-State Circuits*, vol. 29, no. 12, pp. 1440–1454, 1994.
- [7] D. Markovic, J. Tschanz, and V. De, "Transmission-gate based flip-flop," US Patent 6642765, 2003.
- [8] S. K. Hsu, S. K. Mathew, M. A. Anders et al., "A 110 GOPS/W 16-bit multiplier and reconfigurable PLA loop in 90-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 1, pp. 256–264, 2006.
- [9] R. Hossain, L. D. Wronski, and A. Albicki, "Low power design using double edge triggered flip-flops," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 2, no. 2, pp. 261–264, 1994.
- [10] A. G. M. Strollo, E. Napoli, and D. de Caro, "Low-power flip-flops with reliable clock gating," *Microelectronics Journal*, vol. 32, no. 1, pp. 21–28, 2001.
- [11] M. Nogawa and Y. Ohtomo, "A data-transition look-ahead DFF circuit for statistical reduction in power consumption," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 5, pp. 702–706, 1998.
- [12] F. Klass, C. Amir, A. Das et al., "A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 5, pp. 712–716, 1999.
- [13] P. Zhao, T. Darwish, and M. Bayoumi, "Low power and high speed explicit-pulsed flip-flops," in *Proceedings of the 45th Midwest Symposium on Circuits and Systems*, pp. II477–II480, August 2002.
- [14] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, "Flow-through latch and edge-triggered flip-flop hybrid elements," in *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 138–139, February 1996.
- [15] R. Heald, K. Aingaran, C. Amir et al., "Third-generation SPARC V9 64-b microprocessor," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 11, pp. 1526–1538, 2000.

- [16] N. Nedovic, M. Aleksic, and V. G. Oklobdzija, "Conditional techniques for low power consumption flip-flops," in *Proceedings of the 8th IEEE International Conference on Electronics, Circuits and Systems (ICECS '01)*, pp. 803–806, September 2001.
- [17] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Sullivan, and T. Grutkowski, "The implementation of the Itanium 2 microprocessor," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 11, pp. 1448–1460, 2002.
- [18] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical power reduction," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 8, pp. 1263–1271, 2001.
- [19] S. Shin and B. Kong, "Variable sampling window flip-flops for low power high-speed VLSI," *IEE Proceedings of Circuits, Devices and Systems*, vol. 152, no. 3, pp. 266–271, 2005.
- [20] B. Nikolić, V. G. Oklobdžija, V. Stojanovič, W. Jia, J. K.-S. Chiu, and M. M.-T. Leung, "Improved sense-amplifier-based flip-flop: design and measurements," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 6, pp. 876–884, 2000.
- [21] N. Nedovic, V. G. Oklobdzija, and W. W. Walker, "A clock skew absorbing flip-flop," in *Proceedings of the IEEE International Solid-State Circuits Conference*, vol. 1, pp. 342–344, February 2003.
- [22] A. G. M. Strollo and D. de Caro, "Low power flip-flop with clock gating on master and slave latches," *Electronics Letters*, vol. 36, no. 4, pp. 294–295, 2000.
- [23] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," *IEEE Journal of Solid-State Circuits*, vol. SC-8, no. 6, pp. 462–469, 1973.
- [24] I. Sutherland, B. Sproull, and D. Harris, *Logical Effort: Designing Fast CMOS Circuits*, Morgan Kaufmann, Los Altos, Calif, USA, 1998.
- [25] M. Alioto, E. Consoli, and G. Palumbo, "General strategies to design nanometer flip-flops in the energy-delay space," *IEEE Transactions on Circuits and Systems I*, vol. 57, no. 7, pp. 1583–1596, 2010.
- [26] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 4, pp. 536–548, 1999.
- [27] S. Heo and K. Asanovic, "Load-sensitive flip-flop characterization," in *Proceedings of the IEEE Computer Society Workshop on VLSI*, pp. 87–92, 2001.
- [28] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS Flip-Flops. Part I: methodology and design strategies," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 5, pp. 725–736, 2011.
- [29] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS Flip-Flops. Part II: results and figures of merit," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 5, pp. 737–750, 2011.
- [30] G. Palumbo and M. Pennisi, "Design guidelines for high-speed transmission-gate latches: analysis and comparison," in *Proceedings of the 15th IEEE International Conference on Electronics, Circuits and Systems (ICECS '08)*, pp. 145–148, September 2008.
- [31] E. Consoli, G. Palumbo, and M. Pennisi, "Reconsidering high-speed design criteria for transmission-gate-based master-slave flip-flops," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 20, no. 2, pp. 284–295, 2012.

- [32] M. Alioto, E. Consoli, and G. Palumbo, "From energy-delay metrics to constraints on the design of digital circuits," *International Journal of Circuit Theory and Applications*, vol. 40, pp. 815–834, 2012.
- [33] H. J. Chao and C. A. Johnston, "Behavior analysis of CMOS D flip-flops," *IEEE Journal of Solid-State Circuits*, vol. 24, no. 5, pp. 1454–1458, 1989.
- [34] H. Q. Dao, K. Nowka, and V. G. Oklobdzija, "Analysis of clocked timing elements for dynamic voltage scaling effects over process parameter variation," in *Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '01)*, pp. 56–59, August 2001.
















