# A Comparative Analysis for Low-voltage, Low-power, and Low-energy Flip-flops

by Yugal Kishore Maheshwari

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Sciences in Electrical & Computer Engineering

Waterloo, Ontario, Canada, 2020
(c) Yugal Kishore Maheshwari 2020

## Author's Declaration

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.


## Abstract

Recently, several flip-flop designs have been proposed to increase speed while reducing power and energy consumption. Flip-flop power depends on data activity, and in many applications data activity is between $5 \%$ and $15 \%$. In such cases, a significant amount of clock power and energy is wasted. This thesis explores the performance of seven advanced D flip-flops in terms of power, delay, and energy using a 65 nm CMOS process technology. The main objective of this research is to compare and contrast recent flip-flops under different supply voltage and data activity conditions, and to draw conclusions.

The transmission gate flip-flop (TGFF) is used as the reference flip-flop, and the comparison results show that TGFF exhibits power-performance trade-offs. The 18-T single-phase clocked static flip-flops (18TSPC, TSPC18) and the low-power at low data activity flip-flop (LLFF) are the fastest alternatives and are suitable for higher performance. However, LLFF consumes more power at higher data activities, whereas TSPC18 and 18TSPC consume more power at lower data activities. For lower data activities, the topologically compressed flip-flop (TCFF) is the most power efficient of all, but its performance is poor. Furthermore, post-layout simulation results show that LLFF is the most energy efficient of all for data activities up to $20 \%$, and 18TSPC is the most energy efficient for higher data activities.


## Acknowledgements

I would like to thank all the little people who made this thesis possible.

## Dedication

This is dedicated to the one I love.

## Table of Contents

List of Figures
List of Tables
1 Introduction
1.1 Flip-flops and Latches
1.1.1 Flip-flop Power Consumption Sources
1.1.2 Timing and Delay Definitions for Flip-flop
1.1.3 PDP and EDP for Flip-flop
1.2 Motivation
1.3 Contribution
1.4 Thesis Organization
2 Background
2.1 Classifications of Flip-flops
2.1.1 Master-Slave Versus Pulse-Triggered Latch
2.1.2 Static Versus Dynamic Flip-flop
2.1.3 Single Clock Phase Versus Multi Clock Phase Flip-flop
2.1.4 Single Edge Triggered Versus Double Edge Triggered Flip-flop
2.2 Recent Advanced Flip-flops
2.2.1 Adaptive Coupling Flip-flop (ACFF)
2.2.2 Static Single Phase Contention-Free Flip-flop (S2CFF)
2.2.3 Topologically Compressed Flip-flop (TCFF)
2.2.4 True-Single-Phase-Clock 18T Flip-flop (TSPC18)
2.2.5 18-Transistor Fully Static Contention-Free Single-Phase Clocked Flip-flop (18TSPC)
2.2.6 A Contention-Free, Static, Single-Phase Flip-flop for Low Data Activity Applications (LLFF)
3 Simulation Results
3.1 Schematic Simulation
3.2 Post-layout Simulation
3.3 Pre-layout Versus Post-layout Simulation
4 Scan chain of Flip-flops
4.1 Schematic Simulation Result
4.2 Post-layout Simulation Result
5 Conclusions and Future Work
References

## List of Figures

1.1 Working principle of (a) Positive Edge-Triggered Flip-flop, and (b) Active High Latch [2].
1.2 Flip-flop Timing Diagram.
1.3 Normalized delay, energy, and energy-delay plots for CMOS inverter [1].
2.1 Block Diagram and Timing Waveform of Master-Slave Flip-flop.
2.2 The Conventional Static Transmission Gate Flip-flop (TGFF).
2.3 Pulse Triggered Flip-flops (a) Hybrid-Latch Flip-flop (HLFF) proposed in [2], and (b) Semi-Dynamic Flip-flop (SDFF) proposed in [3].
2.4 (a) Static Flip-flop, and (b) Dynamic Flip-flop [4].
2.5 Block Diagram of Double Edge-Triggered D Flip-flop [5].
2.6 Schematic Diagram of Dual-edge Triggered D Flip-flop [6].
2.7 Adaptive Coupling Flip-flop (ACFF) proposed in [7].
2.8 Static Single-Phase Clocked Flip-flop (S2CFF) proposed in [8].
2.9 Topologically Compressed Flip-flop (TCFF) proposed in [9].
2.10 18T Single-Phase Clocked Flip-flop (TSPC18) proposed in [10].
2.11 18T Single-Phase Clocked Flip-flop (18TSPC) proposed in [11].
2.12 A Contention-Free, Static, Single-Phase Flip-flop (LLFF) proposed in [12].
3.1 Test Bench of Post-layout Simulation.
3.2 Transient Simulation (Schematic) at VDD = 1 V, CK = 1 MHz.
3.3 Transient Simulation (Schematic) at VDD = 0.5 V, CK = 100 MHz.
3.4 Total Avg. Power Versus Input Data Activity (DA) at VDD = 1 V.
3.5 Total Avg. Power Versus Input Data Activity (DA) at VDD = 0.5 V.
3.6 Clock Buffer Power Versus Input Data Activity (DA) at VDD = 1 V and 0.5 V.
3.7 Data Buffer Power Versus Input Data Activity (DA) at VDD = 1 V and 0.5 V.
3.8 D-to-Q Delay Versus D-to-C Delay at VDD = 1 V.
3.9 D-to-Q Delay Versus D-to-C Delay at VDD = 0.5 V.
3.10 Layout of (a) TSPC18, (b) 18TSPC, (c) LLFF, (d) TGFF, (e) TCFF, (f) ACFF, and (g) S2CFF.
3.11 Transient Simulation (Layout) at VDD = 1 V, CK = 1 MHz.
3.12 Transient Simulation (Layout) at VDD = 0.5 V, CK = 100 MHz.
3.13 Total Avg. Power Versus Input Data Activity (DA) at VDD = 1 V.
3.14 Total Avg. Power Versus Input Data Activity (DA) at VDD = 0.5 V.
3.15 Clock Buffer Power Versus Input Data Activity (DA) at VDD = 1 V and 0.5 V.
3.16 Data Buffer Power Versus Input Data Activity (DA) at VDD = 1 V.
3.17 D-to-Q Delay Versus D-to-C Delay at VDD = 1 V.
3.18 D-to-Q Delay Versus D-to-C Delay at VDD = 0.5 V.
4.1 Block Diagram of Scan chain of 256 Flip-flops.
4.2 Total Avg. Power (Schematic) Versus DA at (a) VDD = 1 V and CK = 50 MHz, and (b) VDD = 0.5 V and CK = 50 MHz.
4.3 Total Avg. Power (Layout) Versus DA at (a) VDD = 1 V and CK = 50 MHz, and (b) VDD = 0.5 V and CK = 50 MHz.

## List of Tables

3.1 Timing Comparison of Flip-flop (Schematic).
3.2 PDP Comparison of Flip-flop (Schematic).
3.3 Timing Comparison of Flip-flop (Layout).
3.4 PDP Comparison of Flip-flop (Layout).
3.5 Flip-flop Characteristic Comparison.
4.1 Scan Chain: Total Avg. Power Versus Voltage (Schematic).
4.2 Scan Chain: Total Avg. Power Versus Voltage (Layout).
5.1 Recommended Flip-flop.

## Chapter 1

## Introduction

Low power and low energy are important aspects of today's circuit design because of mobile computing and communications. With VLSI technology scaling, the number of on-chip transistors keeps increasing in line with Moore's Law; however, the lack of a similar improvement in battery technology (in terms of long service life, broad range of operating temperatures, high power and energy densities, reliability, etc.) necessitates low-power, low-energy methods and strategies. Moreover, approximately 75 billion IoT (Internet of Things) devices are expected to be connected to the internet by 2025 [13]. As demand for these IoT devices increases continuously, there is a constant push to make battery-operated remote devices last longer. Additionally, cooling systems struggle to keep up with increasing power dissipation, and in the near future VLSI chips are anticipated to pose more challenging cooling problems (temperature non-uniformity, localized hot spots, complex fluidic connections, mechanical design, etc.); solving these issues will be exorbitantly expensive and unproductive.

Design automation has dramatically improved designer productivity and resulted in shorter design time and lower design cost. Designing every single gate from scratch is certainly not the best approach. An attractive alternative is to use a library of predefined standard cells as building blocks for most functional blocks. Semiconductor companies provide CAD tools with standard cell libraries, but the selection of standard cells and their performance are often limited. Despite this performance limitation, standard cell libraries are valuable even in the design of high-performance VLSI chips. Most of the time, only a small portion of the chip contains performance-critical units, and the rest of the design can be automated as far as possible to take advantage of standard cells without degrading the predicted performance. Standard cell libraries are also useful in full-custom design. Custom cell libraries can be created and shared by the designers of performance-critical units; as a result, fewer cells need to be created and verified, which substantially reduces the overall chip layout time. Therefore, the development of a well-designed cell library for high-performance chips is crucial.

A cell library consists of numerous cells with different sizes, functionalities, and load-driving capabilities; flip-flops and latches are among them. In synchronous VLSI circuits, flip-flops and latches play a critical role in the timing, functionality, and performance of the overall chip. However, these are clock-driven circuits and consume a significant share of total power, even when the data activity is low. Therefore, substantial research has been carried out to design flip-flops that are power and energy efficient at low, medium, and high data activity levels.

Thus, continued research into new architectures and methodologies for low-power, low-energy, and high-performance flip-flops is desirable. An ideal universal flip-flop would have the lowest power and energy consumption, the fastest speed, and the highest robustness against noise. In practical flip-flops, increasing performance forces a trade-off against power and robustness. Therefore, there is a need for a standard cell library with a diverse set of flip-flops and latches targeting different performance points, in order to obtain the benefits of low power consumption and robustness, which in turn saves design time, cost, and effort for the overall VLSI chip. The idea of this thesis is to compare the performance of recent advanced flip-flops based on their power delay product versus input data activity. This provides a sound understanding of different flip-flop architectures and helps designers pick the best flip-flop for their application needs.

### 1.1 Flip-flops and Latches

In clocked sequential circuits, flip-flops and latches serve as memory elements and store state variables. They differ from combinational logic, whose output changes in response to its inputs after an inertial delay. A generic memory element consists of internal memory and its control circuitry. The clock input is used by the control circuitry to control access to the memory. The clock signal instructs the memory element to read its data input and store that value in its memory. After some delay, the output reflects the stored value. Memory elements are categorized into two main classes, flip-flops and latches.

- Flip-flops: Flip-flops are often edge triggered. Edge triggered flip-flops are realized through a master-slave arrangement ensuring that data does not flow-through the flip-flop.
- Latches: A latch can be in transparent state or holding state depending on the clock levels. In the transparent state, the data flows-through, while in holding state the output is disconnected from the input.

The waveforms in Figure 1.1 illustrate the working principle of a flip-flop and a latch. As can be seen in the figure, the output of a positive edge-triggered flip-flop is updated only on the positive edge of the clock, whereas in an active high latch the input is passed to the output for as long as the clock is high.


Figure 1.1: Working principle of (a) Positive Edge-Triggered Flip-flop, and (b) Active High Latch [2].
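The difference between the two behaviours can be illustrated with a small behavioural model. The following Python sketch is only an illustration: it assumes ideal, zero-delay elements and sampled waveforms, and the function name `simulate` and the example waveforms are hypothetical, not taken from the thesis.

```python
def simulate(clk, d):
    """Return (q_ff, q_latch) for sampled clock and data waveforms.

    q_ff    : output of an ideal positive edge-triggered flip-flop
    q_latch : output of an ideal active-high (transparent-high) latch
    """
    q_ff, q_latch = [], []
    ff_state = latch_state = 0   # assumed initial state of both elements
    prev_clk = 0
    for c, x in zip(clk, d):
        if prev_clk == 0 and c == 1:
            ff_state = x         # rising edge: the flip-flop samples D once
        if c == 1:
            latch_state = x      # clock high: the latch is transparent
        q_ff.append(ff_state)
        q_latch.append(latch_state)
        prev_clk = c
    return q_ff, q_latch


# Hypothetical waveforms: D toggles while the clock is high (samples 2 and 3).
clk = [0, 1, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 1, 0, 1, 0, 0]
q_ff, q_latch = simulate(clk, d)
print(q_ff)     # [0, 1, 1, 1, 1, 1, 0, 0]
print(q_latch)  # [0, 1, 0, 1, 1, 1, 0, 0]
```

In this run the latch output follows the change of D at sample 2, while the flip-flop output only changes at the rising clock edges.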

In synchronous digital designs, flip-flops and latches are typically found in data and control paths. Latches are preferred for non-timing-critical configurations because they need fewer logic gates, consume less power, and are more tolerant of clock skew. For example, they are used in binary encoders to keep track of bits, and in asynchronous systems, power gating and clock storage devices, computing, data storage, etc. Flip-flops are preferred over latches for timing-critical configurations such as registers, counters, frequency dividers, storage devices, etc. Additionally, latches are sensitive to noise on their enable signal, which easily disrupts the output, while flip-flops are robust.

### 1.1.1 Flip-flop Power Consumption Sources

There are three main sources of power consumption in digital Complementary Metal Oxide Semiconductor (CMOS) circuits, summarized in Equation 1.1.

$$
\begin{gather*}
P_{\text {total }}=\text { DynamicPower }+ \text { DirectPathPower }(\text { ShortCircuits })+\text { Leakage } \\
\Longrightarrow P_{\text {total }}=D A\left(C_{L} V_{D D}^{2} f_{C K}\right)+I_{S C} V_{D D}+I_{\text {Leakage }} V_{D D} \tag{1.1}
\end{gather*}
$$

The first term characterizes the switching component of power, where $C_{L}$ is the load capacitance, $f_{C K}$ is the clock frequency, $V_{D D}$ is the supply voltage, and DA is the input data activity. The second term accounts for the direct-path short-circuit current $I_{S C}$, which arises when both PMOS and NMOS transistors are on and current flows directly from the supply to ground. The last term represents leakage power, which is receiving more and more attention as technology progresses towards deep sub-micron nodes [14]. The primary contributors to $I_{\text {Leakage }}$ are substrate injection, gate leakage, and subthreshold effects.

Equation 1.1 shows that flip-flop power consumption depends heavily on its circuit structure and input data activity. Every node in the circuit contributes to the total power dissipation, so Equation 1.1 applies to each individual node. In well-designed sequential circuits, the switching component is the dominant term; thus, the goal of a low-power circuit designer should be to minimize it while maintaining the required functionality, and to determine the cost of such minimization in terms of area and performance.
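As a numerical illustration of Equation 1.1, the sketch below evaluates the three terms for a single node. All component values (load capacitance, currents, clock frequency, data activity) are hypothetical and chosen only to show the arithmetic; they are not simulation results from this thesis.

```python
def dynamic_power(da, c_load, vdd, f_clk):
    """Switching term of Equation 1.1: DA * C_L * VDD^2 * f_CK (watts)."""
    return da * c_load * vdd ** 2 * f_clk


def total_power(da, c_load, vdd, f_clk, i_sc, i_leak):
    """Total power of Equation 1.1: switching + short-circuit + leakage."""
    short_circuit = i_sc * vdd   # I_SC * VDD
    leakage = i_leak * vdd       # I_Leakage * VDD
    return dynamic_power(da, c_load, vdd, f_clk) + short_circuit + leakage


# Hypothetical node: 10 % data activity, 2 fF load, 100 MHz clock,
# 1 nA average short-circuit current and 1 nA leakage current.
p_1v = total_power(da=0.10, c_load=2e-15, vdd=1.0, f_clk=100e6,
                   i_sc=1e-9, i_leak=1e-9)
print(f"Total power at 1 V: {p_1v * 1e9:.1f} nW")

# The switching term scales with VDD^2, so halving VDD quarters it.
ratio = (dynamic_power(0.10, 2e-15, 0.5, 100e6)
         / dynamic_power(0.10, 2e-15, 1.0, 100e6))
print(f"Dynamic power ratio (0.5 V / 1 V): {ratio:.2f}")
```

With these assumed values the switching term (20 nW) dominates the short-circuit and leakage terms (1 nW each), which mirrors the statement that the switching component is the main target for power reduction.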

### 1.1.2 Timing and Delay Definitions for Flip-flop

The performance of a flip-flop is characterized by four important timing terms: D-C (Data-to-Clock) delay, C-Q (Clock-to-Output) delay, D-Q (Data-to-Output) delay, and hold time. The D-C and D-Q delays are often referred to as the setup time and the propagation delay, respectively. Setup and hold times define the required relationship between the clock and the input data.

#### Setup Time (D-C Delay)

To function correctly, an edge-triggered flip-flop needs its input data to be stable for some time before the arrival of the active clock edge. The data value must stay stable during this time to ensure that the flip-flop captures the correct value at its output. The setup time for a low-to-high data transition can differ from that for a high-to-low transition. Thus, the maximum of the two values is taken as the setup time.

$$
\begin{equation*}
t_{\text{setup}}=\max \left(t_{\text{setup}_{LH}}, t_{\text{setup}_{HL}}\right) \tag{1.2}
\end{equation*}
$$

Figure 1.2: Flip-flop Timing Diagram.

#### Hold Time

Correct flip-flop operation requires the data signal to remain stable for some time after the active clock edge; this interval is known as the hold time. Hold time can also be negative, which means the data can change even before the clock edge and the previous value will still be stored. Again, the hold time for a low-to-high data transition can differ from that for a high-to-low transition; thus, the maximum of the two values is chosen as the hold time.

$$
\begin{equation*}
t_{\text{hold}}=\max \left(t_{\text{hold}_{LH}}, t_{\text{hold}_{HL}}\right) \tag{1.3}
\end{equation*}
$$

#### C-Q Delay

The C-Q delay is the time interval between the active clock edge and the corresponding output signal edge. It indicates when the new output becomes stable after the arrival of the clock edge.

$$
\begin{equation*}
t_{C-Q}=\max \left(t_{C Q_{L H}}, t_{C Q_{H L}}\right) \tag{1.4}
\end{equation*}
$$

#### Propagation Delay (D-Q)

The propagation delay is the time taken for a change at the data input to propagate to the output. It is the sum of the D-C and C-Q delays. Mathematically,

$$
\begin{equation*}
t_{D-Q}=t_{D-C}+t_{C-Q} \tag{1.5}
\end{equation*}
$$

Generally, the propagation delay for low-to-high transitions differs from that for high-to-low transitions. Therefore, the maximum of the two values is chosen as the propagation delay.

$$
\begin{equation*}
t_{D-Q}=\max \left(t_{D Q_{L H}}, t_{D Q_{H L}}\right) \tag{1.6}
\end{equation*}
$$

The definitions of D-C, C-Q, D-Q, and hold time are illustrated in timing diagram of Figure 1.2.
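The worst-case selections in Equations 1.2 to 1.6 can be sketched in a few lines of Python. The per-transition values below are hypothetical (in picoseconds), and pairing the low-to-high setup time with the low-to-high C-Q delay when applying Equation 1.5 is an assumption made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Transitions:
    """A timing value measured for both data transitions, in picoseconds."""
    lh: int  # low-to-high transition
    hl: int  # high-to-low transition

    def worst(self) -> int:
        """Worst case over both transitions, as in Equations 1.2 to 1.4."""
        return max(self.lh, self.hl)


# Hypothetical measurements, not results from this thesis.
setup = Transitions(lh=30, hl=42)
hold = Transitions(lh=-5, hl=3)      # a negative hold time is allowed
c_q = Transitions(lh=55, hl=60)

t_setup = setup.worst()              # Equation 1.2
t_hold = hold.worst()                # Equation 1.3
t_cq = c_q.worst()                   # Equation 1.4

# Equation 1.5 applied per transition, then Equation 1.6 picks the worst case.
d_q = Transitions(lh=setup.lh + c_q.lh, hl=setup.hl + c_q.hl)
t_dq = d_q.worst()

print(t_setup, t_hold, t_cq, t_dq)   # 42 3 60 102
```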

### 1.1.3 PDP and EDP for Flip-flop

Power delay product (PDP) is used as a performance-oriented metric and is defined as the product of total power consumption and D-Q delay. It can be viewed as the amount of energy consumed in each switching event. Equation 1.7 expresses PDP in terms of DA, $C_{L}$, and $V_{D D}$.

$$
\begin{equation*}
P D P=D A\left(C_{L} V_{D D}^{2} f_{C K}\right) / f_{C K}=D A\left(C_{L} V_{D D}^{2}\right) \tag{1.7}
\end{equation*}
$$

An ideal flip-flop is fast and consumes little energy. The energy delay product (EDP) is a combined metric that brings those two elements together, and is generally used as the ultimate quality metric [4]. EDP is equivalent to $\text{power} \times \text{delay}^{2}$.


Figure 1.3: Normalized delay, energy, and energy-delay plots for CMOS inverter [1].

When both low power and high speed are required, PDP and EDP play a vital role. PDP measures the energy needed for a switching event, and minimizing it favors the lowest supply voltage that still meets the minimum required performance. Energy-efficient circuits with a low PDP may therefore be slow, and in such cases EDP is preferred as a figure of merit because it accounts for both energy and performance. A higher supply voltage decreases delay but increases energy, whereas a lower supply voltage decreases energy but increases delay; thus, an optimum supply voltage exists. Figure 1.3 illustrates the trade-off between delay and energy.
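The following Python sketch illustrates why the two metrics can rank designs differently. It follows the textual definitions above (PDP as total power times D-Q delay, EDP as PDP times delay); the two candidate flip-flops and their power and delay values are hypothetical and are not taken from the simulation results of this thesis.

```python
def power_delay_product(p_total_w, t_dq_s):
    """PDP in joules: total power (W) times D-Q delay (s)."""
    return p_total_w * t_dq_s


def energy_delay_product(p_total_w, t_dq_s):
    """EDP in joule-seconds: PDP times D-Q delay, i.e. power * delay**2."""
    return power_delay_product(p_total_w, t_dq_s) * t_dq_s


# Hypothetical operating points: (total power in W, D-Q delay in s).
candidates = {
    "ff_a": (10e-9, 150e-12),   # lower power, slower
    "ff_b": (30e-9, 60e-12),    # higher power, faster
}

best_by_pdp = min(candidates, key=lambda n: power_delay_product(*candidates[n]))
best_by_edp = min(candidates, key=lambda n: energy_delay_product(*candidates[n]))
print(best_by_pdp, best_by_edp)   # ff_a ff_b
```

Here the slower design wins on PDP while the faster design wins on EDP, which is the situation in which EDP is the more informative figure of merit.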

### 1.2 Motivation

Numerous researchers have worked on low-power flip-flop circuits, but they generally focus on comparing only two or a few flip-flops. The motivation of this research is to independently investigate recently published flip-flops over different supply voltages and data activity ranges. Flip-flops are the basic building blocks of digital circuits, and their power, speed, size, and robustness play a critical role in the overall performance of a digital system. The clock system that drives flip-flops and latches is composed of clock buffers in a clock tree and is one of the most power-consuming elements in contemporary System on Chip (SoC) VLSI designs. It accounts for nearly $30 \%-60 \%$ of total power dissipation in a system [15], [16], [17], [18]. In addition, to maintain higher performance and throughput, more timing elements are needed for extensive pipelining of data path sections and global bus interconnections. As a result, power reduction in flip-flops would have a deep impact on total power consumption.

Moreover, from a timing perspective, flip-flop latency consumes a growing percentage of the clock cycle time as the operating frequency increases. Thus, in high-performance systems, providing more slack to ease the timing budget is desirable. These reasons motivate research on flip-flop design and analysis.

### 1.3 Contribution

Many factors need to be considered in flip-flop design, depending on the target application: for example, high speed, low power dissipation, small area, low transistor count, robustness and noise stability, low leakage power dissipation, low glitch probability, supply voltage scalability, insensitivity to clock edge, and insensitivity to process variations. Some of these factors depend on each other, and a trade-off between them is required for high-performance systems.

The aim of this thesis is to identify a small set of recent advanced flip-flop topologies and ascertain their respective strengths. The strategy is to first explore the traditional transmission gate flip-flop (TGFF) and use it as a reference for the others. In this research, seven different flip-flops are included in the initial benchmark, and all of them are recreated using the 65 nm TSMC tool kit with minimum-size NMOS and PMOS transistors. The layouts of all investigated flip-flops are designed from scratch using Cadence tools. Additionally, a scan chain of 256 flip-flops is designed to compare the overall power consumption of the scan chain.

### 1.4 Thesis Organization

This thesis is organized as follows. Chapter 1 presents the introduction. Chapter 2 discusses related work on flip-flops in terms of their classification and highlights recent advanced flip-flops such as the Static Single Phase Clocked Flip-flop (S2CFF), Adaptive Coupling Flip-flop (ACFF), Topologically Compressed Flip-flop (TCFF), 18-T Single Phase Static Flip-flops (TSPC18 and 18TSPC), and Low-Power at Low Data Activity Flip-flop (LLFF). Chapter 3 compares schematic simulation and extracted post-layout simulation results of the different advanced flip-flops. Chapter 4 explains the scan chain of flip-flops and its power simulation results. Finally, Chapter 5 draws conclusions and outlines future work.

## Chapter 2

## Background

### 2.1 Classifications of Flip-flops

Flip-flops can be categorized in many different ways based on the behavior of their clock, input, and output signals: for example, edge triggered versus pulse latch, static versus dynamic, single clock phase versus multi clock phase, and single edge triggered versus dual edge triggered. In this section, some of these classifications are discussed.

### 2.1.1 Master-Slave Versus Pulse-Triggered Latch

Latches are transparent, and can be active low or active high. While the clock is at its active level, any change at the input is reflected at the output after the propagation delay. Data is accepted continuously as long as the clock level is active; once the clock level switches, the latch becomes opaque and the output no longer follows the input. Flip-flops are preferred over latches for their simpler timing design, robustness, and lower likelihood of race conditions. An edge-triggered flip-flop can be built by connecting two latches in series that operate on opposite phases of the clock. In this structure, one latch is transparent high and the other is transparent low; this structure is known as a master-slave flip-flop. Generally, the setup time of the flip-flop is determined by the master latch, and the propagation delay is determined by the slave latch. Figure 2.1 depicts the block diagram and timing waveform of a master-slave flip-flop.

Figure 2.1: Block Diagram and Timing Waveform of Master-Slave Flip-flop.

The transmission gate flip-flop is an example of the master-slave configuration. It is extensively used in digital sequential circuits because of its small number of transistors and its low power consumption compared to other CMOS flip-flops. TGFF has a clk signal and its complement clkn, which are connected to the eight transistors of the transmission gates; overall it needs a total of 12 transistors for clocking the flip-flop, as shown in Figure 2.2. When clk = "1", the master latch samples the data input by switching on the transmission gate in its feed-forward path, while the slave latch holds the previous value by switching on its feedback transmission gate. When clk = "0", the feedback transmission gate of the master is turned on to restore the logic levels of the previously stored data, and the forward-path transmission gate of the slave latch is turned on to pass the data sampled during clkn to the output. The advantage of TGFF is that it needs a total of 24 transistors for flip-flop operation; the drawbacks are that it requires two clock phases and that its clock circuitry needs 12 transistors, which causes high capacitive loading, and as a result the dynamic power consumption of TGFF is high.

Figure 2.2: The Conventional Static Transmission Gate Flip-flop (TGFF).

In a pulse-triggered latch, very narrow pulses are used as the clock signal. During the pulse, the latch transfers data to the output, and the rest of the time the latch is in blocking mode. Consequently, a pulse-triggered latch behaves like an edge-triggered flip-flop. The data should be stable during the pulse width. The pulse generator relies on delaying the clock: a clock signal is applied at its input and a very short pulse is produced at its output on each clock cycle. The hybrid-latch flip-flop (HLFF) and semi-dynamic flip-flop (SDFF) belong to the category of pulse-triggered flip-flops, in which performance improves because of a negative setup time, since the data input sees a brief transparency window created by the pulse generator. These flip-flops also exhibit a soft-edge property, which improves robustness against clock skew. Schematic diagrams of HLFF and SDFF are shown in Figure 2.3.
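The delay-based pulse generation described above can be sketched with sampled waveforms. The model below assumes one common arrangement, in which the clock is ANDed with a delayed and inverted copy of itself; the function name `pulse_generator`, the delay of two samples, and the example clock are hypothetical, and the actual HLFF and SDFF circuits embed this function at transistor level.

```python
def pulse_generator(clk, delay):
    """Return a pulse train that is high for `delay` samples after each rising edge.

    The pulse is modelled as clk AND NOT(delayed clk), with the delayed
    clock assumed to be low before the first sample.
    """
    delayed = [0] * delay + clk[:len(clk) - delay]
    return [c & (1 - dc) for c, dc in zip(clk, delayed)]


# Hypothetical clock with a period of 6 samples and a delay of 2 samples.
clk = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
pulse = pulse_generator(clk, delay=2)
print(pulse)  # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
```

The pulse width equals the modelled delay, which corresponds to the brief transparency window during which the latch samples its input.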

In HLFF, the pulse generator is part of the flip-flop circuit. When CK = '0', NMOS transistors M2 and M8 are turned off whereas PMOS transistor M1 is turned on, which pre-charges node X to logic '1'; output node Q is decoupled from X and holds its previous state. When CK = '1', M1 is turned off and M2 and M8 are turned on, while M4 and M10 remain on for a short period determined by the pulse generator delay. During this period, the flip-flop is in transparent mode and samples the input data D. Once the pulsed clock goes low, node X is decoupled from input D and either remains in its previous state or is pre-charged to logic '1' through M1.

On the other hand, SDFF combines a dynamic input stage with static operation. When CK = '0', node X is pre-charged to logic '1' and output node Q holds its previous state. When CK = '1', node N3 remains at logic '1' for a time window equal to the delay of two inverters and the NAND gate. During this window, if D = '0', X remains at logic '1'; if D = '1', X starts to discharge, causing an output transition. SDFF has a large pre-charge capacitance and a heavy clock load, which cause it to consume more power.

Figure 2.3: Pulse Triggered Flip-flops (a) Hybrid-Latch Flip-flop (HLFF) proposed in [2], and (b) Semi-Dynamic Flip-flop (SDFF) proposed in [3].

### 2.1.2 Static Versus Dynamic Flip-flop

In static flip-flops, stored values are preserved even when the clock signal is stopped. These flip-flops are reliable and robust, which makes them attractive for wide use in industrial applications.

On the other hand, in dynamic flip-flops stored values are lost if they are not refreshed for a while. These flip-flops are fast, present less clock load, and consume less power and area, but they are susceptible to noise and unable to work at low clock frequencies. Because of their dynamic nature, they are only suitable for particular applications such as frequency dividers. These flip-flops also suffer from potential failures: reverse leakage current in pn junctions and subthreshold leakage in MOSFETs cause the dynamic nodes to discharge. With the help of keepers at the dynamic nodes, dynamic flip-flops can be turned into static flip-flops. Figure 2.4 shows the schematic diagrams of static and dynamic flip-flops.


Figure 2.4: (a) Static Flip-flop, and (b) Dynamic Flip-flop [4].

### 2.1.3 Single Clock Phase Versus Multi Clock Phase Flip-flop

As discussed previously, in master-slave flip-flops two latches that work on different clock phases are connected in cascade. If the master and slave have the same structure, two clock phases are generally needed. However, by modifying the two latches, the number of required clock phases can be reduced to one, for example by using the complementary form of the master latch as the slave.

In this regard, the true single phase clock flip-flop is a single clock phase flip-flop that normally operates at higher speeds than flip-flops with two clock phases.

### 2.1.4 Single Edge Triggered Versus Double Edge Triggered Flip-flop

Single edge triggered flip-flops are ordinary flip-flops that capture data on one active edge per clock cycle, whereas double edge triggered flip-flops capture data on both edges of the clock cycle. Figure 2.5 shows the block diagram of a double edge triggered D flip-flop, in which a positive edge triggered flip-flop samples the data on the positive edge of the clock and a negative edge triggered flip-flop samples the data on the negative edge of the clock.


Figure 2.5: Block Diagram of Double Edge-Triggered D Flip-flop [5].

Generally, double edge triggered flip-flops are attractive for low-power applications because they use both edges of the clock signal to capture the input data; however, each flip-flop needs a multiplexer at its output and consumes more area than an ordinary D flip-flop. Figure 2.6 is a transistor-level implementation of Figure 2.5. The positive and negative latches are composed of transistors M1 to M4 and M5 to M8, respectively. These two master latches conduct on opposite phases of the clock. The multiplexer, used in place of a slave latch, selects the output of the opaque master. Thus, the output data can change on both edges of the clock.


Figure 2.6: Schematic Diagram of Dual-edge Triggered D Flip-flop [6]
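The latch-plus-multiplexer arrangement can be illustrated with a behavioural model. The sketch below assumes ideal, zero-delay latches and a sampled clock; the function name `detff` and the example waveforms are hypothetical and do not model the transistor-level circuit of Figure 2.6.

```python
def detff(clk, d):
    """Behavioural double edge triggered flip-flop built from two latches.

    latch_hi is transparent while clk = 1, latch_lo while clk = 0.
    The output multiplexer always selects the latch that is currently
    opaque, so Q shows the value captured at the most recent clock edge.
    """
    latch_hi = latch_lo = 0  # assumed initial state
    q = []
    for c, x in zip(clk, d):
        if c == 1:
            latch_hi = x     # transparent-high latch follows D
        else:
            latch_lo = x     # transparent-low latch follows D
        q.append(latch_lo if c == 1 else latch_hi)
    return q


# Hypothetical waveforms: D changes shortly after every clock edge.
clk = [0, 0, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 0, 1, 1, 0, 0]
q = detff(clk, d)
print(q)  # [0, 0, 1, 1, 0, 0, 1, 1]
```

In this run Q takes a new value at every clock edge, rising and falling alike, which is what allows a double edge triggered flip-flop to reach the same data throughput at half the clock frequency.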

### 2.2 Recent Advanced Flip-flops

In this section, some recent advanced static CMOS flip-flops such as ACFF, S2CFF, TCFF, TSPC18, 18TSPC, and LLFF will be discussed.

### 2.2.1 Adaptive Coupling Flip-flop (ACFF)

ACFF is based on a differential master-slave topology with an adaptive coupling scheme that weakens the state-retention coupling when the input state differs from the internal state. The adaptive coupling element (ACE) consists of one PMOS and one NMOS transistor connected in parallel, with the same data signal controlling both gates. Figure 2.7 shows the schematic diagram of ACFF, where M7 and M8 form the ACE.

Figure 2.7: Adaptive Coupling Flip-flop (ACFF) proposed in [7].

The advantage of ACFF is that it operates on a single clock phase without any local clock buffer or pre-charging stage. Its adaptive coupling weakens the state-retention coupling during a transition and makes it tolerant of process variations. The circuit needs a total of 26 transistors. By adding a few more transistors or increasing the width ratio of the transistors in the slave latch, the contention issue can be mitigated at the cost of increased area and power dissipation.

### 2.2.2 Static Single Phase Contention-Free Flip-flop (S2CFF)

The static single phase contention-free flip-flop is based on the dynamic true single phase flip-flop and is made static by adding a slave latch and a few additional transistors. It was designed in 2014 for near-threshold voltage operation using 45 nm technology. It uses a total of 24 transistors, the same as TGFF, and improves power efficiency over the whole range of data activities. The schematic diagram of S2CFF is shown in Figure 2.8. When CK = "0", node n1 holds the complement of D, node X pre-charges through M8, and the slave latch stores the previous value of D. When CK = "1", M7/M10 pull node n1 low, while M6 pulls node X high. If the previous value of Q is the same as the current value of D, there is no transition at node n2; otherwise, node n2 discharges through M12-M14.

The advantages of S2CFF are that it is a contention-free, fully static CMOS circuit and needs only one pre-charged node, X. Its drawback is that it requires 5 transistors for the clock circuitry, which leads to higher clock tree capacitance and the related power overhead. The S2CFF topology improves total and clock-related power efficiency at high data activities, but at low data activities it still suffers from power and energy efficiency issues compared to other advanced CMOS flip-flops.

Figure 2.8: Static Single-Phase Clocked Flip-flop (S2CFF) proposed in [8].

### 2.2.3 Topologically Compressed Flip-flop (TCFF)

The topologically compressed flip-flop is well known for its power efficiency. TCFF consists of different types of master and slave latches: its slave latch is a Reset-Set (RS) type, and its master latch is an asymmetrical single-data-input type [9]. The schematic diagram of TCFF is shown in Figure 2.9. When CK goes from "1" to "0", the PMOS transistors connected to it turn on and the master latch samples the data input. Both nodes VD1 and VD2 are pulled up to VDD, and the input data is stored in the master latch. When CK goes from "0" to "1", the PMOS transistors connected to it turn off, the NMOS transistor turns on, and the slave latch enters data output mode. In this condition, the data stored in the master latch is transferred to the slave latch and then output at Q.

The advantages of TCFF are that it operates on a single clock phase, with a clock load of 3 transistors and a total of 21 transistors. Moreover, it is the most power efficient of all the flip-flops at low data activities. However, it has low-speed and low-voltage drawbacks because of a temporary short-circuit path in the circuit (shown by the red line in Figure 2.9) when CK = "0" and the data changes from "0" to "1". As a result, the source voltage of PMOS transistor M5 degrades (shown by the blue line in Figure 2.9), which slows down the charging of node N2.

Figure 2.9: Topologically Compressed Flip-flop (TCFF) proposed in [9].

### 2.2.4 True-Single-Phase-Clock 18T Flip-flop (TSPC18)

Figure 2.10 shows the schematic diagram of the true-single-phase-clock 18T flip-flop. This flip-flop has a clock load of 4 transistors and was implemented in 28 nm FDSOI technology [10]. Two conditional feedback paths are implemented to allow data retention, using seven additional transistors without increasing the clock load of the flip-flop. The first feedback path exists between nodes N3 and N2, and the second between the output node and N3.

The advantages of this flip-flop are that it has a total of only 18 transistors and that it improves energy consumption compared to a conventional flip-flop. Its drawbacks are that it has contention issues (through M7 with M6, and the VGS of M15), needs interconnection in poly (in the feedback transistors), and requires gate biasing to manipulate the threshold voltage of the transistors in contention.

Figure 2.10: 18T Single-Phase Clocked Flip-flop (TSPC18) proposed in [10].

### 2.2.5 18-Transistor Fully Static Contention-Free Single-Phase Clocked Flip-flop (18TSPC)

Figure 2.11 shows the schematic diagram of the 18-transistor fully static contention-free single-phase clocked flip-flop. When CK = "0", the transistors connected to D only change the state of node X in the master latch. Switching of X does not corrupt the data in the slave latch, as the slave is detached from D when CK = "0". When CK = "1", D is isolated, and the data previously stored at X in the master latch is output.

The advantages of 18TSPC are that it is an efficient pre-charging flip-flop and has a clock load of 4 transistors, which reduces its total power at high data activities. However, node X of the flip-flop is pre-charged, so its power at low data activities is not as good as that of TCFF. Additionally, when D = "0" and CK makes a low-to-high transition, the parasitic capacitances of seven transistors are charged through three PMOS transistors (shown by the red path in Figure 2.11); as a result, its speed is reduced compared to TGFF.


Figure 2.11: 18T Single-Phase Clocked Flip-flop (18TSPC) proposed in [11].

### 2.2.6 A Contention-Free, Static, Single-Phase Flip-flop for Low Data Activity Applications (LLFF)

Figure 2.12 shows the schematic diagram of the contention-free, static, single-phase flip-flop. It has a clock load of four transistors. The master and slave latches are complementary to each other with respect to the clock; therefore, LLFF operates on a single clock phase. In the master circuit, when CK = "1", M9 is turned off and M19 is turned on; thus, the back-to-back inverters (M3-M5 and M4-M6) keep the sampled value. When CK = "0", M9 is turned on and M19 is turned off; therefore, D is sampled at nodes X and Xn without any contention. M1 and M2 are driven by D and its complement, so one of them is always turned off, which cuts the pull-down path and hence the contention. The master circuit therefore behaves as a latch that is transparent when CK = "0" and holds the data when CK = "1". As the slave circuit is complementary to the master circuit, it holds the sampled data when CK = "0", and when CK = "1" it samples the output of the master at nodes Y and Yn. As a result, the circuit of Figure 2.12 operates as a contention-free flip-flop.

The advantages of LLFF are that it has a clock load of 4 transistors and is energy efficient at low data activities. Its drawbacks are that it is not power/energy efficient at high data activities and needs a total of 24 transistors; thus, it is not as area efficient as 18TSPC or TCFF. Moreover, it has a negative hold time, which causes a large D-to-Q delay.

Figure 2.12: A Contention-Free, Static, Single-Phase Flip-flop (LLFF) proposed in [12].

## Chapter 3

## Simulation Results

A test bench is designed as shown in Figure 3.1, in which data and clock buffers are placed at the input side and an output buffer with a 10 fF capacitor is placed at the output side, so that all loading effects are included in our performance metrics: power, delay, and energy. All the input buffers are designed with minimum-size NMOS (W = 120 nm) and PMOS (W = 180 nm) transistors. Inverter-1 (white) of the output buffer has widths of NMOS = 180 nm and PMOS = 300 nm, whereas inverter-2 (gray) has NMOS = 270 nm and PMOS = 450 nm. The clock and data signals fed to the FF are the outputs of two-stage buffers, sized to attain a typical FO3 slope at the clock/data input node of the FF [19]; thus, simulation values are taken at the second-stage inverter.

This test bench is designed to provide realistic data and clock signals, so that delay and overall power are measured while the clock and data input signals toggle. Total power consumption of the design is the sum of three individual power components.

- Data power consumption is characterized by the power dissipation of the gray inverter, as shown in the Data-Buffer segment of Figure 3.1.
- Clock power consumption is characterized by the power dissipation of the gray inverter, as shown in the Clock-Buffer segment of Figure 3.1.
- Flip-flop power consumption is characterized by the power dissipation of the flip-flop, as shown in the DUT segment of Figure 3.1. This power dissipation denotes the intrinsic switching power of the internal nodes of the flip-flop.
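The three-component sum described above can be sketched in a few lines of Python. This is only an illustration: the class name `PowerBreakdown` and the numeric values are hypothetical placeholders, not simulation results from this work.

```python
from dataclasses import dataclass


@dataclass
class PowerBreakdown:
    """Average power of each test bench segment, all in the same unit (e.g. uW)."""
    data_buffer: float
    clock_buffer: float
    flip_flop: float

    def total(self) -> float:
        # Total power is the plain sum of the three components.
        return self.data_buffer + self.clock_buffer + self.flip_flop

    def share(self) -> dict:
        # Fraction of the total contributed by each component.
        t = self.total()
        return {
            "data_buffer": self.data_buffer / t,
            "clock_buffer": self.clock_buffer / t,
            "flip_flop": self.flip_flop / t,
        }


# Hypothetical numbers for illustration only; not measured values.
example = PowerBreakdown(data_buffer=0.3, clock_buffer=0.9, flip_flop=1.6)
print(f"total = {example.total():.2f} uW")
for name, frac in example.share().items():
    print(f"{name}: {frac:.1%}")
```

Keeping the components separate makes it easy to see how much of a flip-flop's total is clock-related, which is the quantity that dominates at low data activity.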

Since the same test bench is used for all the flip-flops, the output buffer power consumption is the same for all flip-flops at corresponding data activities. For power measurement, a timing window of 101 clock cycles in total is used, and the first clock cycle is dedicated to resetting the flip-flop.

Figure 3.1: Test Bench of Post-layout Simulation

Similarly, D-to-Q delay is the sum of the D-to-C delay and the C-to-Q delay.

- D-to-C delay is the setup time between the D and CK signals, as shown in the Data and Clock-Buffer sections of Figure 3.1.
- C-to-Q delay is the time between the CK and Q signals, as shown in the Clock-Buffer and DUT sections of Figure 3.1.
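As a quick check of the relation above, the following Python sketch adds the D-to-C and C-to-Q delays for each transition and takes the larger sum as the worst case. The timing values are copied from the schematic results at VDD = 1 V in Table 3.1 for three of the flip-flops; the function and dictionary names are illustrative assumptions.

```python
# Timing values in picoseconds, copied from Table 3.1 (schematic, VDD = 1 V).
# Each tuple is (L-to-H, H-to-L).
TIMING_PS = {
    "S2CFF": {"d_to_c": (51.93, 50.74), "c_to_q": (38.92, 57.11)},
    "TGFF": {"d_to_c": (21.14, 19.47), "c_to_q": (48.95, 53.21)},
    "LLFF": {"d_to_c": (37.18, 35.92), "c_to_q": (43.33, 45.14)},
}


def d_to_q(d_to_c: float, c_to_q: float) -> float:
    """D-to-Q delay is the setup time (D-to-C) plus the C-to-Q delay."""
    return d_to_c + c_to_q


def worst_case_d_to_q(entry: dict) -> float:
    """Largest D-to-Q delay over the L-to-H and H-to-L transitions."""
    return max(d_to_q(dc, cq) for dc, cq in zip(entry["d_to_c"], entry["c_to_q"]))


for name, entry in TIMING_PS.items():
    print(f"{name}: worst-case D-to-Q = {worst_case_d_to_q(entry):.2f} ps")
```

The sums agree with the D-to-Q columns of Table 3.1 to within the rounding of the tabulated values.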

To make a fair and reasonable comparison of power, delay, and energy, all flip-flops are designed with the minimum transistor length of 60 nm and widths of NMOS = 120 nm and PMOS = 180 nm. The exception is TCFF, which fails to work at VDD = 0.5 V with these sizes; thus, the width of PMOS transistors M8 and M13 of TCFF is set to 360 nm. There are two primary reasons for the TCFF failure at low voltage: first, three minimum-size PMOS transistors are in series, and second, a short-circuit path to ground forms when the data changes from 0 to 1, which causes the voltage to drop at nodes VS1 and VS2, as shown in Figure 2.9. These low node voltages add extra delay to the charging of node N2. Thus, the performance of TCFF at low voltage deteriorates. Cadence Virtuoso with the HSPICE simulator is used to perform all the simulations in TSMC 65 nm CMOS process technology.

### 3.1 Schematic Simulation

Figure 3.2 and Figure 3.3 illustrate transient simulation waveforms of all considered flip-flops at VDD = 1 V, CK = 1 MHz, and VDD = 0.5 V, CK = 100 MHz, respectively. Here, the first clock pulse is used to reset the flip-flops. In both cases, TGFF is the most timing efficient of all flip-flops in terms of setup time and propagation delay.


Figure 3.2: Transient Simulation (Schematic) at VDD $=1 \mathrm{~V}, \mathrm{CK}=1 \mathrm{MHz}$


Figure 3.3: Transient Simulation (Schematic) at VDD $=0.5 \mathrm{~V}, \mathrm{CK}=100 \mathrm{MHz}$

Figure 3.4 presents the total avg. power consumption of all investigated flip-flops at different DAs at VDD = 1 V. As seen, TCFF is the most power efficient of all flip-flops for all DAs, followed by LLFF for up to 50% of DAs and 18TSPC for the rest of the DAs. ACFF and TGFF consume more power at higher DAs. However, a DA in the range of 50% to 100% is not realistic and rarely occurs.

Figure 3.4: Total Avg. Power Versus Input Data Activity (DA) at VDD $=1 \mathrm{~V}$

Figure 3.5: Total Avg. Power Versus Input Data Activity (DA) at VDD $=0.5 \mathrm{~V}$.

Figure 3.5 presents the total avg. power consumption of all investigated flip-flops as a function of DA at VDD = 0.5 V. Again, TCFF is the most power efficient of all considered flip-flops for all DAs, followed by LLFF up to 55% of DAs and 18TSPC for the rest of the DAs. TGFF consumes the most power at higher DAs, followed by ACFF.

Figure 3.6: Clock Buffer Power Versus Input Data Activity (DA) at VDD $=1 \mathrm{~V}$ and 0.5 V.

Figure 3.6 presents a comparison of the clock buffer power versus DA at VDD = 1 V and 0.5 V. The clock buffer power consumption is a function of the number of clock transistors in a flip-flop. Among the single-phase flip-flops, TCFF and S2CFF, with three and five clock transistors, show the lowest and highest clock buffer power consumption, respectively. TGFF shows the lowest clock buffer power overall, as all of its internal clock transistors are driven by internal clock buffers, as shown in Figure 2.2. Consequently, part of its clock-related power consumption is included in the flip-flop power consumption.

Figure 3.7: Data Buffer Power Versus Input Data Activity (DA) at VDD $=1 \mathrm{~V}$ and 0.5 V.

Figure 3.7 illustrates data buffer power for different DAs at VDD = 1 V and 0.5 V. LLFF consumes more data buffer power than the other flip-flops over the whole DA range.

Figure 3.8 presents D-to-Q delay versus D-to-C delay for each investigated flip-flop at VDD = 1 V. The schematic simulation results show that TGFF has the lowest D-to-Q delay, followed by 18TSPC, LLFF and TSPC18. ACFF exhibits the highest D-to-Q delay of all considered flip-flops. The performance of TCFF deteriorates despite the increased sizes of M8 and M13, and its delay is comparable to that of S2CFF and ACFF. At VDD = 0.5 V, TGFF still exhibits the lowest D-to-Q delay, followed by TSPC18, 18TSPC and LLFF. TCFF, owing to the aforementioned short-circuit path, shows the largest D-to-Q delay, as shown in Figure 3.9.


Figure 3.8: D-to-Q Delay Versus D-to-C Delay at VDD $=1 \mathrm{~V}$.


Figure 3.9: D-to-Q Delay Versus D-to-C Delay at VDD $=0.5 \mathrm{~V}$.

Table 3.1: Timing Comparison of Flip-flop (Schematic).

| Timing Comparison (ps) |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VDD = 1 V, CK = 1 GHz |  |  |  |  |  |  |  |
| Flip-flop | D-to-C <br> (L-to-H) | D-to-C <br> (H-to-L) | C-to-Q <br> (L-to-H) | C-to-Q <br> (H-to-L) | D-to-Q <br> (L-to-H) | D-to-Q <br> (H-to-L) | D-to-Q <br> (Worst-Case) |
| S2CFF | 51.93 | 50.74 | 38.92 | 57.11 | 90.85 | 107.9 | 107.9 |
| ACFF | 67.43 | 65.6 | 45.26 | 41.22 | 112.7 | 106.8 | 112.7 |
| TCFF | 54.59 | 53.26 | 43.12 | 44.08 | 97.71 | 97.34 | 97.71 |
| TSPC18 | 23.72 | 21.62 | 32.92 | 69.8 | 56.64 | 91.42 | 91.42 |
| 18TSPC | 37.45 | 35.57 | 42.84 | 44.85 | 80.29 | 80.41 | 80.41 |
| TGFF | 21.14 | 19.47 | 48.95 | 53.21 | 70.09 | 72.68 | 72.68 |
| LLFF | 37.18 | 35.92 | 43.33 | 45.14 | 80.51 | 81.07 | 81.07 |
| VDD = 0.5 V, CK = 0.1 GHz |  |  |  |  |  |  |  |
| S2CFF | 362.3 | 353.5 | 286.2 | 472.9 | 648.6 | 826.4 | 826.4 |
| ACFF | 567 | 557.8 | 309.9 | 288.6 | 876.9 | 846.3 | 876.3 |
| TCFF | 811.3 | 804.9 | 772.7 | 323.6 | 1584 | 1128 | 1584 |
| TSPC18 | 166.6 | 156.3 | 254.8 | 421.8 | 421.4 | 578.2 | 578.2 |
| 18TSPC | 247.6 | 238.3 | 311.5 | 382.1 | 559.1 | 620.4 | 620.4 |
| TGFF | 211.7 | 202.2 | 317.3 | 357.3 | 529.1 | 559.5 | 559.5 |
| LLFF | 280.2 | 271.5 | 343 | 362.7 | 623.7 | 634.2 | 634.2 |

Table 3.1 presents the worst-case timing comparison of all the investigated flip-flops at VDD = 1 V and 0.5 V. At VDD = 1 V, TGFF offers the smallest D-to-C delay (i.e. setup time) and the third-largest C-to-Q delay amongst all examined flip-flops. D-to-Q delay is the sum of the setup time and the C-to-Q delay. The D-to-Q delays of LLFF and 18TSPC are comparable with that of TGFF, whereas TSPC18, TCFF, ACFF and S2CFF have larger D-to-Q delays. Similarly, at VDD = 0.5 V, TGFF offers the lowest D-to-Q delay, TSPC18 offers the lowest D-to-C delay, and ACFF offers the lowest C-to-Q delay. In the case of TCFF, the sizes of M8 and M13 were increased to improve its low-voltage switching speed. However, it still exhibits the worst D-to-Q delay of all considered flip-flops.

Table 3.2 depicts the PDP of the flip-flops at VDD = 1 V and 0.5 V for DA = 1%, 10%, 20%, 50% and 100%. At VDD = 1 V, TCFF has the lowest PDP for all DAs, followed by LLFF for DA = 1%, 10%, 20%, and 50%, and by 18TSPC for DA = 100%. TSPC18 shows the highest PDP at low DAs, and ACFF and S2CFF show the highest PDPs at higher DAs. At VDD = 0.5 V, LLFF shows the lowest PDP for DA = 1%, 10%, and 20%, whereas TCFF shows the lowest PDP for DA = 50% and 100%, followed by 18TSPC at 100%. ACFF and S2CFF still show the highest PDPs at higher DAs.

Table 3.2: PDP Comparison of Flip-flop (Schematic).

| PDP (aJ) Versus Data Activity |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Flip-flop | Data Activity |  |  |  |  |  |  |  |  |  |
|  | VDD = 1 V, CK = 1 GHz |  |  |  |  | VDD = 0.5 V, CK = 0.1 GHz |  |  |  |  |
|  | 1% | 10% | 20% | 50% | 100% | 1% | 10% | 20% | 50% | 100% |
| S2CFF | 276 | 302 | 330 | 414 | 555 | 50 | 55 | 60 | 74 | 99 |
| ACFF | 103 | 163 | 230 | 429 | 762 | 20 | 30 | 41 | 73 | 126 |
| TCFF | 75 | 82 | 90.1 | 114 | 154 | 31 | 33 | 34 | 39 | 46 |
| TSPC18 | 333 | 346 | 357 | 395 | 457 | 45 | 47 | 49 | 55 | 65 |
| 18TSPC | 171 | 188 | 207 | 262 | 354 | 32 | 35 | 38 | 47 | 63 |
| TGFF | 193 | 216 | 242 | 322 | 454 | 36 | 40 | 45 | 59 | 84 |
| LLFF | 83 | 114 | 151 | 259 | 438 | 16 | 21 | 28 | 46 | 78 |
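The PDP entries can be cross-checked as the product of total average power and worst-case D-to-Q delay. The Python sketch below does this for the schematic results at VDD = 1 V and DA = 10%, using the power and delay values listed in Table 3.5; the helper name `pdp_aj` is an illustrative assumption.

```python
def pdp_aj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in attojoules.

    1 uW * 1 ps = 1e-6 W * 1e-12 s = 1e-18 J = 1 aJ, so no scaling is needed.
    """
    return power_uw * delay_ps


# (total avg. power in uW at DA = 10%, worst-case D-to-Q in ps),
# schematic simulation at VDD = 1 V, values from Table 3.5.
SCHEMATIC_1V = {
    "S2CFF": (2.80, 107.9),
    "ACFF": (1.45, 112.7),
    "TCFF": (0.84, 97.71),
    "TSPC18": (3.78, 91.42),
    "18TSPC": (2.34, 80.41),
    "TGFF": (2.97, 72.68),
    "LLFF": (1.41, 81.07),
}

for name, (power, delay) in SCHEMATIC_1V.items():
    print(f"{name}: PDP = {pdp_aj(power, delay):.0f} aJ")
```

Rounded to the nearest attojoule, the products reproduce the 10% column of Table 3.2 for VDD = 1 V.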

### 3.2 Post-layout Simulation

Layouts of all considered flip-flops are carefully designed, and DRC, LVS, and PECS are verified using TSMC 65 nm CMOS process technology, as shown in Figure 3.10.

Figure 3.11 and Figure 3.12 illustrate transient simulation waveforms of all considered flip-flops at VDD = 1 V, CK = 1 MHz, and VDD = 0.5 V, CK = 100 MHz, respectively. Here, the first clock pulse is used to reset the flip-flops. Again, in both cases TGFF is the most timing efficient of all flip-flops in terms of setup time and propagation delay.

Figure 3.13 presents the total avg. power consumption of all investigated flip-flops at different DAs at VDD = 1 V. As seen, the power of TCFF, LLFF and ACFF is smaller than that of the others when DA is less than 30%. At higher DAs, 18TSPC becomes the most power efficient of all flip-flops. However, a DA in the range of 50% to 100% is not realistic and rarely occurs.

Figure 3.14 presents the total avg. power consumption of all investigated flip-flops as a function of DA at VDD = 0.5 V. Again, at DAs below 30%, TCFF, ACFF and LLFF are power efficient, while at higher DAs 18TSPC consumes less power.

Figure 3.15 presents a comparison of the clock buffer power versus DA at VDD = 1 V and 0.5 V. The clock buffer power consumption is a function of the number of clock transistors in a flip-flop. Among the single-phase flip-flops, TCFF and S2CFF, with three and five clock transistors, show the lowest and highest clock buffer power consumption, respectively. TGFF shows the lowest clock buffer power overall, as all of its internal clock transistors are driven by internal clock buffers, as shown in Figure 2.2. Consequently, part of its clock-related power consumption is included in the flip-flop power consumption.

Figure 3.16 illustrates data buffer power for different DAs at $\mathrm{VDD}=1 \mathrm{~V}$ and 0.5 V .


Figure 3.10: Layout of (a) TSPC18, (b) 18TSPC, (c) LLFF, (d) TGFF, (e) TCFF, (f) ACFF, and (g) S2CFF.


Figure 3.11: Transient Simulation (Layout) at VDD $=1 \mathrm{~V}, \mathrm{CK}=1 \mathrm{MHz}$


Figure 3.12: Transient Simulation (Layout) at $\mathrm{VDD}=0.5 \mathrm{~V}, \mathrm{CK}=100 \mathrm{MHz}$


Figure 3.13: Total Avg. Power Versus Input Data Activity (DA) at VDD $=1 \mathrm{~V}$


Figure 3.14: Total Avg. Power Versus Input Data Activity (DA) at VDD $=0.5 \mathrm{~V}$.

Figure 3.15: Clock Buffer Power Versus Input Data Activity (DA) at VDD $=1 \mathrm{~V}$ and 0.5 V.

At low DAs, all flip-flops consume comparable data buffer power. However, at higher DAs S2CFF and TSPC18 exhibit lowest data buffer power.

The post-layout power simulation results confirm that TCFF, ACFF and LLFF have lower power consumption at DAs below 30%. Next, we examine the timing behavior of the considered flip-flops.

Figure 3.17 presents D-to-Q delay versus D-to-C delay for each investigated flip-flop at VDD = 1 V. The post-layout simulation results show that TGFF has the lowest D-to-Q delay, followed by 18TSPC and LLFF. The performance of TCFF deteriorates despite the increased sizes of M8 and M13, and its delay is comparable to that of ACFF. S2CFF exhibits the highest D-to-Q delay of all considered flip-flops. At VDD = 0.5 V, TGFF still exhibits the lowest D-to-Q delay, followed by 18TSPC and LLFF. TCFF, owing to the aforementioned short-circuit path, shows the largest D-to-Q delay, as shown in Figure 3.18.

Figure 3.16: Data Buffer Power Versus Input Data Activity (DA) at VDD=1V.

Table 3.3 presents the worst-case timing comparison of all the investigated flip-flops at VDD = 1 V and 0.5 V. At VDD = 1 V, TGFF offers the smallest D-to-C delay (i.e. setup time) and a larger C-to-Q delay amongst all examined flip-flops. D-to-Q delay is the sum of the setup time and the C-to-Q delay. The D-to-Q delay of LLFF is comparable with that of TGFF and 18TSPC, whereas TSPC18, TCFF, ACFF and S2CFF have larger D-to-Q delays. Similarly, at VDD = 0.5 V, TGFF offers the lowest D-to-Q delay, whereas the D-to-Q delay of LLFF is comparable with that of TGFF, 18TSPC and TSPC18, while ACFF, S2CFF and TCFF show the largest delays. In the case of TCFF, the sizes of M8 and M13 were increased to improve its low-voltage switching speed. However, it still exhibits the worst D-to-Q delay of all considered flip-flops.

Table 3.4 depicts the PDP of the flip-flops at VDD = 1 V and 0.5 V for DA = 1%, 10%, 20%, 50% and 100%. At VDD = 1 V, TCFF has the lowest PDP for DA = 1%, 10% and 20%, followed by LLFF, whereas 18TSPC provides the lowest PDP for DA = 50% and 100%. At VDD = 0.5 V and 1% DA, ACFF and LLFF show comparable energy consumption, while at 10% DA LLFF shows the lowest PDP. At 20% DA, LLFF and 18TSPC show comparable energy consumption, and at higher DAs 18TSPC clearly shows the lowest PDP.

Figure 3.17: D-to-Q Delay Versus D-to-C Delay at VDD $=1 \mathrm{~V}$.

Figure 3.18: D-to-Q Delay Versus D-to-C Delay at VDD $=0.5 \mathrm{~V}$.

Table 3.3: Timing Comparison of Flip-flop (Layout).

| Timing Comparison (ps) |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VDD = 1 V, CK = 1 GHz |  |  |  |  |  |  |  |
| Flip-flop | D-to-C <br> (L-to-H) | D-to-C <br> (H-to-L) | C-to-Q <br> (L-to-H) | C-to-Q <br> (H-to-L) | D-to-Q <br> (L-to-H) | D-to-Q <br> (H-to-L) | D-to-Q <br> (Worst-Case) |
| S2CFF | 85.07 | 85.36 | 85.92 | 174.5 | 171 | 259.8 | 259.8 |
| ACFF | 131 | 128.9 | 100.5 | 82.97 | 231.4 | 211.8 | 231.4 |
| TCFF | 108.8 | 106.9 | 97.31 | 88.81 | 206.1 | 195.7 | 206.1 |
| TSPC18 | 41.58 | 39.51 | 71.43 | 149.4 | 113 | 189.1 | 189.1 |
| 18TSPC | 66.54 | 65.06 | 77.51 | 94.43 | 144 | 159.5 | 159.5 |
| TGFF | 19.95 | 18.49 | 125.6 | 130.4 | 145.5 | 148.9 | 148.9 |
| LLFF | 88.17 | 88.71 | 82.87 | 81.19 | 171 | 169.9 | 171 |
| VDD = 0.5 V, CK = 0.1 GHz |  |  |  |  |  |  |  |
| S2CFF | 864.9 | 862.3 | 661.5 | 1358 | 1526 | 2221 | 2221 |
| ACFF | 1182 | 1174 | 734.5 | 733.3 | 1916 | 1907 | 1916 |
| TCFF | 2075 | 2067 | 1882 | 642.2 | 3958 | 2709 | 3958 |
| TSPC18 | 361.3 | 353 | 643.9 | 1132 | 1005 | 1485 | 1485 |
| 18TSPC | 576.5 | 564.7 | 585.8 | 859.1 | 1162 | 1424 | 1424 |
| TGFF | 322.7 | 316.4 | 677 | 902.1 | 999.7 | 1219 | 1219 |
| LLFF | 825.8 | 826.6 | 755.3 | 741 | 1581 | 1568 | 1581 |

Table 3.4: PDP Comparison of Flip-flop (Layout).

| PDP (aJ) Versus Data Activity |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Flip-flop | Data Activity |  |  |  |  |  |  |  |  |  |
|  | VDD = 1 V, CK = 1 GHz |  |  |  |  | VDD = 0.5 V, CK = 0.1 GHz |  |  |  |  |
|  | 1% | 10% | 20% | 50% | 100% | 1% | 10% | 20% | 50% | 100% |
| S2CFF | 1700 | 1810 | 1930 | 2280 | 2860 | 358 | 380 | 404 | 475 | 591 |
| ACFF | 573 | 807 | 1070 | 1850 | 3170 | 117 | 157 | 203 | 337 | 563 |
| TCFF | 437 | 589 | 763 | 1280 | 2140 | 209 | 273 | 346 | 566 | 954 |
| TSPC18 | 1330 | 1380 | 1420 | 1560 | 1790 | 242 | 251 | 260 | 287 | 331 |
| 18TSPC | 713 | 767 | 825 | 995 | 1280 | 158 | 168 | 179 | 215 | 272 |
| TGFF | 1010 | 1090 | 1180 | 1460 | 1920 | 207 | 224 | 241 | 296 | 388 |
| LLFF | 516 | 650 | 799 | 1250 | 1980 | 119 | 148 | 182 | 280 | 446 |

Table 3.5: Flip-flop Characteristic Comparison.

| Comparison Characteristic | Flip-flop |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | S2CFF | ACFF | TCFF | TSPC18 | 18TSPC | TGFF | LLFF |
| Proposed Year | ISSC'14 <br> [8] | ISSC'11 <br> [7] | JSSC'14 <br> [9] | TCAS1'18 <br> [10] | JSSC'19 <br> [11] | Std-Cell | XYZ <br> [12] |
| Type | Static | Static | Static | Semi-Dynamic | Static | Static | Static |
| Contention-Free | Yes | Weak | Yes | No | Yes | Yes | Yes |
| Single-Phase | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Clock Load | 5T | 4T | 3T | 4T | 4T | 12T | 4T |
| Total \# of Transistors | 24 | 26 | 21 | 18 | 18 | 24 | 24 |
| Area ($\mu\mathrm{m}^{2}$) | 15.48 | 11.52 | 11.16 | 9.36 | 10.44 | 11.16 | 11.16 |
| Schematic Simulation |  |  |  |  |  |  |  |
| VDD = 1 V, CK = 1 MHz |  |  |  |  |  |  |  |
| Leakage Power (nW) | 130.2 | 325.7 | 97.62 | 190.1 | 125.6 | 129.2 | 129 |
| Total Avg. Power (uW) <br> DA = 0% | 2.52 | 0.819 | 0.761 | 3.63 | 2.10 | 2.60 | 0.941 |
| Total Avg. Power (uW) <br> DA = 10%, 90% | 2.80, 4.88 | 1.45, 6.17 | 0.84, 1.50 | 3.78, 4.87 | 2.34, 4.17 | 2.97, 5.88 | 1.41, 4.96 |
| D-to-Q (ps) | 107.9 | 112.7 | 97.71 | 91.42 | 80.41 | 72.68 | 81.07 |
| PDP (aJ) <br> DA = 10%, 90% | 302, 527 | 163, 695 | 82, 147 | 346, 445 | 188, 335 | 216, 427 | 114, 402 |
| VDD = 0.5 V, CK = 100 MHz |  |  |  |  |  |  |  |
| Leakage Power (nW) | 3.441 | 3.754 | 7.867 | 4.929 | 2.981 | 4.145 | 3.598 |
| Total Avg. Power (nW) <br> DA = 0% | 59.9 | 21.3 | 19.8 | 77.6 | 51 | 63.2 | 23.4 |
| Total Avg. Power (nW) <br> DA = 10%, 90% | 66.2, 114 | 34.3, 132 | 20.6, 28.1 | 81.2, 109 | 56.2, 96.6 | 71.9, 141 | 33.9, 113 |
| D-to-Q (ns) | 0.8264 | 0.8763 | 1.584 | 0.5782 | 0.6204 | 0.5595 | 0.6432 |
| PDP (aJ) <br> DA = 10%, 90% | 55, 94 | 30, 116 | 33, 44 | 47, 63 | 35, 60 | 40, 79 | 22, 73 |
| Post-Layout Simulation |  |  |  |  |  |  |  |
| VDD = 1 V, CK = 1 MHz |  |  |  |  |  |  |  |
| Leakage Power (nW) | 114 | 23.53 | 23.74 | 124.9 | 111.4 | 22.74 | 42.65 |
| Total Avg. Power (uW) <br> DA = 0% | 4.62 | 2.27 | 2 | 4.46 | 3.22 | 6.7 | 2.88 |
| Total Avg. Power (uW) <br> DA = 10%, 90% | 6.98, 10.6 | 3.49, 12.6 | 2.86, 9.53 | 7.28, 9.22 | 4.81, 7.67 | 7.33, 12.3 | 3.80, 10.8 |
| D-to-Q (ps) | 260 | 231 | 206 | 189 | 160 | 149 | 171 |
| PDP (aJ) <br> DA = 10%, 90% | 1810, 2860 | 807, 3170 | 589, 2140 | 1380, 1790 | 767, 1280 | 1090, 1920 | 650, 1980 |
| VDD = 0.5 V, CK = 100 MHz |  |  |  |  |  |  |  |
| Leakage Power (nW) | 6.298 | 2.684 | 2.651 | 3.678 | 5.952 | 2.853 | 6.33 |
| Total Avg. Power (nW) <br> DA = 0% | 114 | 57 | 49.5 | 105 | 79.6 | 168 | 71.6 |
| Total Avg. Power (nW) <br> DA = 10%, 90% | 170, 266 | 82, 294 | 69, 241 | 169, 223 | 118, 191 | 184, 318 | 94, 282 |
| D-to-Q (ns) | 2.221 | 1.916 | 3.958 | 1.485 | 1.424 | 1.219 | 1.581 |
| PDP (aJ) <br> DA = 10%, 90% | 380, 591 | 157, 563 | 273, 954 | 251, 331 | 168, 272 | 224, 388 | 148, 446 |

Table 3.5 provides an overall comparison of all investigated flip-flops. Based on post-layout simulation, at DA = 0% TCFF has the lowest power consumption of all. ACFF and LLFF also offer low power comparable to TCFF, while 18TSPC, TSPC18, S2CFF and TGFF consume more power. At low DAs, LLFF provides the best or near-best PDP. The D-to-Q delay of LLFF is commensurate with that of the fastest flip-flops, TGFF and 18TSPC, at both VDD = 1 V and VDD = 0.5 V. ACFF has a large D-to-Q delay, which increases its PDP at low DAs, whereas TCFF has a performance issue at low voltage, although both TCFF and ACFF offer low power consumption at low DAs.

### 3.3 Pre-layout Versus Post-layout Simulation

This section compares pre-layout and post-layout simulation results in terms of power, delay, and energy.

## Power Comparison:

In pre-layout, TCFF is the most power efficient for all DAs. LLFF follows TCFF up to 50% of DAs, and 18TSPC for the rest of the DAs. In post-layout, TCFF is the most power efficient up to 50% of DAs, and 18TSPC for the higher DAs. ACFF and LLFF follow TCFF up to 30% of DAs.

## Delay Comparison:

In both pre-layout and post-layout simulations, TGFF is the fastest of all considered flip-flops, followed by 18TSPC, TSPC18 and LLFF. In pre-layout, ACFF has the worst D-to-Q delay, followed by S2CFF, whereas in post-layout S2CFF has the worst D-to-Q delay, followed by ACFF, at VDD = 1 V. As the voltage decreases, the performance of TCFF deteriorates, and it becomes the slowest in both pre-layout and post-layout simulations at VDD = 0.5 V.

## Energy Comparison:

In pre-layout simulation, TCFF is the most energy efficient for all DAs, and in post-layout it is the most energy efficient up to 30% of DAs at VDD = 1 V. This is because the sizes of M8 and M13 in the TCFF schematic are increased to make it work at 0.5 V. Moreover, in pre-layout, LLFF follows TCFF up to 50% of DAs, and 18TSPC for the rest of the DAs, whereas in post-layout LLFF follows TCFF up to 20% of DAs, and 18TSPC for the higher DAs at VDD = 1 V. At VDD = 0.5 V, LLFF is the most energy efficient up to 25% and 19% of DAs in pre-layout and post-layout simulation, respectively. For the higher DAs, TCFF followed by 18TSPC is the most energy efficient in pre-layout, whereas 18TSPC is the most energy efficient in post-layout. If we consider TCFF without tuning the M8 and M13 transistors, then LLFF would replace TCFF in terms of energy efficiency.
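One way to estimate such a crossover point from tabulated data is linear interpolation between neighbouring DA points. The Python sketch below applies this to the post-layout PDP values of LLFF and 18TSPC at VDD = 0.5 V from Table 3.4. It is an illustrative assumption about how a crossover could be read off, not a description of how the figures in this section were obtained, and the function name `crossover_da` is hypothetical.

```python
def crossover_da(da_points, pdp_a, pdp_b):
    """Return the DA (in %) at which curve A stops being below curve B.

    Uses linear interpolation between neighbouring tabulated points.
    Returns None if A never crosses above B.
    """
    for i in range(len(da_points) - 1):
        d0 = pdp_a[i] - pdp_b[i]
        d1 = pdp_a[i + 1] - pdp_b[i + 1]
        if d0 < 0 <= d1:
            frac = -d0 / (d1 - d0)  # position of the zero crossing in [0, 1]
            return da_points[i] + frac * (da_points[i + 1] - da_points[i])
    return None


# Post-layout PDP (aJ) at VDD = 0.5 V, from Table 3.4.
DA = [1, 10, 20, 50, 100]
LLFF = [119, 148, 182, 280, 446]
TSPC_18T = [158, 168, 179, 215, 272]  # the 18TSPC row

da_cross = crossover_da(DA, LLFF, TSPC_18T)
print(f"LLFF has the lower PDP up to about {da_cross:.1f}% DA")
```

With these table values the interpolated crossover falls between 18% and 19% DA, in line with the post-layout figure quoted above.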

## Chapter 4

## Scan chain of Flip-flops

In a scan chain, flip-flops are connected in a cascaded manner, where the output of one flip-flop is connected to the input of the next. Input data is fed to the first flip-flop, called scan-in, and the output is taken from the last flip-flop, called scan-out. Scan chains of flip-flops are often deployed in modern SoC designs for a variety of purposes, such as shifting test data into and out of the chip or measuring the clock power of a processor. Selection of the chain length is one of the important considerations in scan chain design. It should not be too large; otherwise it would cost more cycles to shift data in and out. It should not be too small either; otherwise it would require more input/output ports as scan-in and scan-out ports. Thus, scan chains of 256 flip-flops with synthesized clock trees and no hold buffers between flip-flop stages are designed for all investigated flip-flops to examine their relative power consumption, as shown in Figure 4.1. TSPC18 failed to operate in a scan chain because it is limited to FDSOI technology.
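The shifting behaviour and the cost of chain length can be illustrated with a small behavioural model. The Python sketch below treats the scan chain as an ideal shift register; it ignores all timing and electrical effects, and the function names are illustrative assumptions.

```python
def shift(chain, scan_in):
    """One clock edge: every flip-flop captures its predecessor's output.

    Returns the new chain state and the bit that appeared at scan-out.
    """
    scan_out = chain[-1]
    return [scan_in] + chain[:-1], scan_out


def load_chain(length, pattern):
    """Shift `pattern` into an all-zero chain; returns (state, cycles used)."""
    chain = [0] * length
    cycles = 0
    for bit in pattern:
        chain, _ = shift(chain, bit)
        cycles += 1
    return chain, cycles


# A 256-stage chain needs 256 clock cycles before the first bit reaches scan-out.
pattern = [i % 2 for i in range(256)]
state, cycles = load_chain(256, pattern)
print(cycles)                    # 256
print(state[-1] == pattern[0])   # True: the first bit shifted in is now at the last stage
```

The cycle count grows linearly with the chain length, which is the cost of long chains noted above; splitting the same flip-flops into several shorter chains would reduce it at the price of additional scan ports.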


Figure 4.1: Block Diagram of Scan chain of 256 Flip-flops.

HSPICE simulation is performed on the scan chains of flip-flops with CK = 50 MHz and T = $25^{\circ}\mathrm{C}$ at different supply voltages and data activities. For power simulation, a timing window of 356 clock cycles in total is used, as 256 clock cycles are needed to load the scan chain with the appropriate DA pattern. Total avg. power consumption is the sum of the power consumption of the clock buffers and the scan chain of flip-flops, and is calculated over the window from clock cycle 256 to clock cycle 356. All data patterns applied to the scan chain are periodic to ensure uniform filling of the data pipeline. With 100% DA, each flip-flop input has a new value at every clock cycle. With 50% DA, the flip-flop input has a new value every two clock cycles, and so on.
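A periodic pattern with a given DA can be generated as in the Python sketch below. It assumes that DA is defined as the percentage of clock cycles in which the input takes a new value and that 100/DA is an integer number of cycles; both the function names and this exact definition are assumptions made for illustration.

```python
def periodic_pattern(da_percent, n_cycles):
    """Bit stream whose value changes once every 100/DA clock cycles."""
    period = round(100 / da_percent)  # cycles between two new values
    bits = []
    value = 0
    for cycle in range(n_cycles):
        if cycle > 0 and cycle % period == 0:
            value ^= 1  # a new value arrives at the flip-flop input
        bits.append(value)
    return bits


def measured_da(bits):
    """Percentage of cycle boundaries at which the bit stream toggles."""
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return 100 * toggles / (len(bits) - 1)


for da in (100, 50, 10):
    bits = periodic_pattern(da, 101)  # 101 samples give 100 cycle boundaries
    print(da, bits[:12], measured_da(bits))
```

For 100% DA the stream alternates every cycle, and for 50% DA it repeats each value for two cycles, matching the description above.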

### 4.1 Schematic Simulation Result



Figure 4.2: Total Avg. Power (Schematic) Versus DA at (a) $\mathrm{VDD}=1 \mathrm{~V}$ and $\mathrm{CK}=50 \mathrm{MHz}$, and (b) $\mathrm{VDD}=0.5 \mathrm{~V}$ and $\mathrm{CK}=50 \mathrm{MHz}$.

Figure 4.2(a) shows total average power consumption versus DA for the scan chains of all investigated flip-flops at $\mathrm{VDD}=1 \mathrm{~V}$. In terms of leakage, 18TSPC is the most power efficient, while the remaining flip-flops are nearly comparable to each other, and as DA increases their power consumption increases rapidly. TCFF is the most power efficient up to $20 \%$ DA, and then 18TSPC becomes the most power efficient for the remaining DAs. For low DAs, the power dissipation of ACFF is comparable with TCFF and 18TSPC, but at higher DAs it is comparable with TGFF and LLFF. Moreover, S2CFF and TGFF both consume large power, particularly at low DAs. Figure 4.2(b) depicts the total average power of the scan chain of 256 flip-flops versus DA at $\mathrm{VDD}=0.5 \mathrm{~V}$ and $\mathrm{CK}=50 \mathrm{MHz}$. Here also, TCFF is the most power efficient up to $20 \%$ DA, while ACFF, 18TSPC, and LLFF are comparable with it. Again, 18TSPC is the most power efficient for higher DAs.

Table 4.1: Scan Chain: Total Avg. Power Versus Voltage (Schematic).

| Power (uW) Versus VDD |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Scan chain | Voltage (V) |  |  |  |  |  |  |  |  |  |  |  |
|  | $\mathrm{DA}=0 \%, \mathrm{CK}=50 \mathrm{MHz}$ |  |  |  |  |  | $\mathrm{DA}=10 \%, \mathrm{CK}=50 \mathrm{MHz}$ |  |  |  |  |  |
|  | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
| S2CFF | 9.251 | 13.87 | 19.93 | 27.48 | 37.63 | 52.34 | 7.856 | 11.84 | 16.98 | 23.80 | 33.34 | 46.10 |
| ACFF | 3.620 | 5.518 | 8.054 | 11.73 | 16.14 | 23.32 | 5.168 | 7.816 | 11.25 | 15.96 | 22.25 | 30.96 |
| TCFF | 3.442 | 5.295 | 7.736 | 11.53 | 15.56 | 23.19 | 4.832 | 7.33 | 10.58 | 15.03 | 21.03 | 29.65 |
| 18TSPC | 7.471 | 11.15 | 15.71 | 21.61 | 29.36 | 39.67 | 6.517 | 9.756 | 13.90 | 19.29 | 26.41 | 35.85 |
| TGFF | 8.968 | 13.41 | 19.04 | 25.79 | 34.83 | 46.92 | 10.03 | 14.90 | 21.18 | 28.60 | 38.45 | 51.14 |
| LLFF | 4.342 | 6.683 | 9.938 | 14.54 | 21.68 | 31.03 | 5.704 | 8.699 | 12.69 | 18.26 | 26.27 | 37.17 |

Table 4.1 lists the total average power with $\mathrm{DA}=0 \%$ and $10 \%$, and $\mathrm{CK}=50 \mathrm{MHz}$, at different VDDs. For $\mathrm{DA}=0 \%$ and $\mathrm{VDD}=1 \mathrm{~V}$, TCFF is the most power efficient, while S2CFF consumes the most power. As the voltage decreases, power consumption decreases for all flip-flops, but TCFF remains the most power efficient at all voltages, while ACFF is closely comparable with it. For $\mathrm{DA}=10 \%$ and $\mathrm{VDD}=1 \mathrm{~V}$, TCFF is the most power efficient, followed by ACFF. TGFF consumes the most power, followed by S2CFF. Again, as VDD decreases, their power consumption decreases, and at $\mathrm{VDD}=0.5 \mathrm{~V}$, ACFF and LLFF are comparable with TCFF, followed by 18TSPC, S2CFF, and TGFF.
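The orderings discussed for Table 4.1 can be reproduced with a short Python sketch. The power values are copied from Table 4.1; the helper names `ranking` and `saving_percent` are hypothetical and only serve this illustration.

```python
VDD = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Total avg. power in uW, copied from Table 4.1 (schematic, CK = 50 MHz).
# Outer key: data activity in percent. Inner lists follow the VDD order above.
POWER_UW = {
    0: {
        "S2CFF": [9.251, 13.87, 19.93, 27.48, 37.63, 52.34],
        "ACFF": [3.620, 5.518, 8.054, 11.73, 16.14, 23.32],
        "TCFF": [3.442, 5.295, 7.736, 11.53, 15.56, 23.19],
        "18TSPC": [7.471, 11.15, 15.71, 21.61, 29.36, 39.67],
        "TGFF": [8.968, 13.41, 19.04, 25.79, 34.83, 46.92],
        "LLFF": [4.342, 6.683, 9.938, 14.54, 21.68, 31.03],
    },
    10: {
        "S2CFF": [7.856, 11.84, 16.98, 23.80, 33.34, 46.10],
        "ACFF": [5.168, 7.816, 11.25, 15.96, 22.25, 30.96],
        "TCFF": [4.832, 7.33, 10.58, 15.03, 21.03, 29.65],
        "18TSPC": [6.517, 9.756, 13.90, 19.29, 26.41, 35.85],
        "TGFF": [10.03, 14.90, 21.18, 28.60, 38.45, 51.14],
        "LLFF": [5.704, 8.699, 12.69, 18.26, 26.27, 37.17],
    },
}


def ranking(da, vdd):
    """Return flip-flop names ordered from lowest to highest power."""
    col = VDD.index(vdd)
    return sorted(POWER_UW[da], key=lambda ff: POWER_UW[da][ff][col])


def saving_percent(da, ff, v_low=0.5, v_high=1.0):
    """Return the power reduction (%) when VDD is lowered from v_high to v_low."""
    power = POWER_UW[da][ff]
    return 100.0 * (1.0 - power[VDD.index(v_low)] / power[VDD.index(v_high)])


if __name__ == "__main__":
    print(ranking(10, 0.5))
    print(round(saving_percent(0, "TCFF"), 1))
```

The same two helpers can be applied to any other column of the table by changing the `da` and `vdd` arguments.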

### 4.2 Post-layout Simulation Result

A post-layout simulation is performed on the scan chain of 256 flip-flops. Figure 4.3 shows total average power consumption versus DA for the scan chains of all investigated flip-flops at $\mathrm{VDD}=1 \mathrm{~V}$ and 0.5 V. 18TSPC consumes less leakage power compared to the rest of the investigated flip-flops. TCFF is the most power efficient up to $20 \%$ DA, followed by ACFF, 18TSPC, and LLFF, while 18TSPC is the most power efficient for higher DAs. TGFF and S2CFF are power hungry at lower DAs, whereas ACFF power consumption increases dramatically as DA increases. At $100 \%$ DA, the power consumption of ACFF, TGFF, and LLFF is comparable.

Figure 4.3: Total Avg. Power (Layout) Versus DA at (a) VDD $=1 \mathrm{~V}$ and $\mathrm{CK}=50 \mathrm{MHz}$, and (b) VDD $=0.5 \mathrm{~V}$ and $\mathrm{CK}=50 \mathrm{MHz}$.

Table 4.2: Scan Chain: Total Avg. Power Versus Voltage (Layout).

| Power (uW) Versus VDD |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Scan chain | Voltage (V) |  |  |  |  |  |  |  |  |  |  |  |
|  | $\mathrm{DA}=0 \%, \mathrm{CK}=50 \mathrm{MHz}$ |  |  |  |  |  | $\mathrm{DA}=10 \%, \mathrm{CK}=50 \mathrm{MHz}$ |  |  |  |  |  |
|  | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
| S2CFF | 22.13 | 32.46 | 45.01 | 60.25 | 79.11 | 101.8 | 18.13 | 26.62 | 37.03 | 49.89 | 65.93 | 85.90 |
| ACFF | 8.222 | 12.10 | 16.90 | 23.17 | 30.93 | 41.59 | 11.39 | 16.83 | 23.66 | 32.22 | 43.07 | 56.67 |
| TCFF | 7.396 | 11.02 | 15.51 | 21.45 | 29.01 | 39.93 | 10.44 | 15.45 | 21.71 | 29.51 | 39.39 | 52.21 |
| 18TSPC | 15.25 | 22.33 | 30.84 | 41.20 | 54.02 | 69.87 | 12.92 | 19.01 | 26.48 | 35.58 | 46.86 | 60.94 |
| TGFF | 22.21 | 32.39 | 44.76 | 59.29 | 77.10 | 97.86 | 24.27 | 35.31 | 48.69 | 64.53 | 83.32 | 105.9 |
| LLFF | 10.62 | 15.77 | 22.27 | 30.37 | 41.15 | 54.98 | 13.52 | 19.97 | 28.04 | 38.16 | 51.11 | 67.56 |

Table 4.2 lists the total average power with $\mathrm{DA}=0 \%$ and $10 \%$, and $\mathrm{CK}=50 \mathrm{MHz}$, at different VDDs. TCFF is the most power efficient for both DAs at all VDDs, followed by ACFF. For $0 \%$ DA, S2CFF and TGFF consume the most power and are close to each other, with S2CFF marginally higher at all VDDs except 0.5 V, whereas for $10 \%$ DA, TGFF consumes the most power, followed by S2CFF. LLFF is more power efficient than 18TSPC for $0 \%$ DA, whereas at $10 \%$ DA the power consumption of the two flip-flops is comparable.
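The statements about Table 4.2 can be checked column by column with a small Python sketch. The values are copied from Table 4.2; the helpers `extremes` and `power_ratio` are hypothetical names used only for this illustration.

```python
VDD = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Total avg. power in uW, copied from Table 4.2 (post-layout, CK = 50 MHz).
LAYOUT_UW = {
    0: {
        "S2CFF": [22.13, 32.46, 45.01, 60.25, 79.11, 101.8],
        "ACFF": [8.222, 12.10, 16.90, 23.17, 30.93, 41.59],
        "TCFF": [7.396, 11.02, 15.51, 21.45, 29.01, 39.93],
        "18TSPC": [15.25, 22.33, 30.84, 41.20, 54.02, 69.87],
        "TGFF": [22.21, 32.39, 44.76, 59.29, 77.10, 97.86],
        "LLFF": [10.62, 15.77, 22.27, 30.37, 41.15, 54.98],
    },
    10: {
        "S2CFF": [18.13, 26.62, 37.03, 49.89, 65.93, 85.90],
        "ACFF": [11.39, 16.83, 23.66, 32.22, 43.07, 56.67],
        "TCFF": [10.44, 15.45, 21.71, 29.51, 39.39, 52.21],
        "18TSPC": [12.92, 19.01, 26.48, 35.58, 46.86, 60.94],
        "TGFF": [24.27, 35.31, 48.69, 64.53, 83.32, 105.9],
        "LLFF": [13.52, 19.97, 28.04, 38.16, 51.11, 67.56],
    },
}


def extremes(da):
    """Return [(vdd, lowest-power flip-flop, highest-power flip-flop), ...]."""
    table = LAYOUT_UW[da]
    rows = []
    for col, vdd in enumerate(VDD):
        by_power = sorted(table, key=lambda ff: table[ff][col])
        rows.append((vdd, by_power[0], by_power[-1]))
    return rows


def power_ratio(da, ff_a, ff_b):
    """Return the power of ff_a divided by the power of ff_b at each VDD."""
    table = LAYOUT_UW[da]
    return [a / b for a, b in zip(table[ff_a], table[ff_b])]


if __name__ == "__main__":
    for da in (0, 10):
        print(da, extremes(da))
        print(da, [round(r, 2) for r in power_ratio(da, "LLFF", "18TSPC")])
```

A ratio below 1 means LLFF consumes less than 18TSPC at that supply voltage, which makes the difference between the $0 \%$ and $10 \%$ DA cases easy to see.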

## Chapter 5

## Conclusions and Future Work

D flip-flops are the most commonly used flip-flops in the design industry due to their smallest number of logic gates and reduced cost compared to other types of flip-flops. This thesis presents a comparison of seven advanced D flip-flops to be added to a high-performance D flip-flop cell library addressing low-voltage operation and a broad range of power, energy, and performance objectives.

Post-layout simulation shows that TGFF is the fastest of all considered flip-flops. The only problem with TGFF is that its clock circuitry needs a total of 12 transistors; thus, it consumes the most power of all flip-flops for all DAs and is therefore not suitable for low-power applications.

LLFF has the lowest PDP for low DAs compared to the other flip-flops. For a DA of $20 \%$ or less, LLFF offers the best energy efficiency of all the rivals. The D-to-Q delay of LLFF is commensurate with that of 18TSPC and is considerably better than that of TCFF. Thus, LLFF is suited for low-voltage, low-energy applications with a DA of $20 \%$ or lower. The drawback of LLFF is that it needs the data signal and its complement and has a total of 24 transistors with an area of $11.16 \mu\mathrm{m}^{2}$. Additionally, it consumes more power and energy in higher-DA applications.

18TSPC is power and energy efficient for high-DA applications, and its D-to-Q delay is also comparable to that of the fastest flip-flop, TGFF. Additionally, it needs only 18 transistors to operate, which is the minimum number of transistors used in any static contention-free D flip-flop; thus, it occupies an area of $10.44 \mu\mathrm{m}^{2}$. The drawback of 18TSPC is that it is not suited for low-DA applications, and most SoC designs have DAs of $5 \%$ to $15 \%$.

TSPC18 is only suitable for 28 nm FDSOI technology, as it needs interconnect poly and gate biasing for threshold voltage manipulation. Additionally, it failed to work in scan chain fashion.

TCFF is the most power efficient of all investigated flip-flops up to $50 \%$ DA. It uses only 3 clock load transistors, which is the lowest number of transistors used for any static D flip-flop clock circuitry. As nearly $40 \%$ of chip power is due to clock power, TCFF is attractive for low-power applications. The drawback of TCFF is that it only works at high voltage; making it work at low voltage requires modification of the circuit design or the transistor sizes.

S2CFF was designed to improve power efficiency, but its clock circuitry needs 5 transistors, which makes it a power-hungry flip-flop like TGFF. Moreover, it has the largest D-to-Q delay of all considered flip-flops; as a result, it consumes more energy over the entire DA range.

ACFF is power efficient for low DAs, and its power consumption is comparable with TCFF and LLFF. However, the D-to-Q delay of ACFF is comparable with that of S2CFF; thus, it consumes a significant amount of energy as DA increases. Furthermore, it has a total of 26 transistors, which increases not only its power consumption but also its area ($11.52 \mu\mathrm{m}^{2}$).

In this work, we have focused on the comparison of advanced D flip-flops in terms of power, energy, and speed in order to have a wide range of options in our high-performance standard cell library, so that users can pick a suitable flip-flop based on their particular application needs. Table 5.1 lists the recommended flip-flops for each performance metric. For higher speed, TGFF is suggested for the entire DA range. For power efficiency, TCFF is recommended at low DAs and 18TSPC at high DAs. For energy efficiency, LLFF and TCFF are suggested at low DAs and 18TSPC at high DAs. Future research can investigate other flip-flop techniques and design new power- and energy-efficient flip-flops to make our standard cell library more robust.

Table 5.1: Recommended Flip-flop.

| Measuring Metric | Recommended Flip-flop |  |
| :---: | :---: | :---: |
|  | Low DA | High DA |
| Delay | TGFF, 18TSPC, LLFF | TGFF, 18TSPC, LLFF |
| Power | TCFF | 18TSPC |
| PDP | LLFF, TCFF | 18TSPC |
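For a cell-library user, Table 5.1 can be read as a simple lookup. The Python sketch below encodes the table entries as given; the boundary between "low" and "high" DA is not defined by the table, so the $20 \%$ default used here is an assumption, and the function name `recommend` is hypothetical.

```python
# Recommendations copied from Table 5.1.
RECOMMENDED = {
    "delay": {"low": ["TGFF", "18TSPC", "LLFF"], "high": ["TGFF", "18TSPC", "LLFF"]},
    "power": {"low": ["TCFF"], "high": ["18TSPC"]},
    "pdp": {"low": ["LLFF", "TCFF"], "high": ["18TSPC"]},
}

# Assumed boundary between low and high data activity, in percent.
LOW_DA_LIMIT = 20.0


def recommend(metric, da_percent, low_da_limit=LOW_DA_LIMIT):
    """Return the flip-flops recommended by Table 5.1 for a metric and a DA."""
    if not 0 <= da_percent <= 100:
        raise ValueError("da_percent must be between 0 and 100")
    key = metric.lower()
    if key not in RECOMMENDED:
        raise KeyError(f"unknown metric: {metric!r}")
    band = "low" if da_percent <= low_da_limit else "high"
    return RECOMMENDED[key][band]


if __name__ == "__main__":
    print(recommend("power", 10))
    print(recommend("PDP", 60))
```

The threshold is exposed as a parameter so that a different low/high boundary can be tried without changing the table contents.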

## References

[1] J. Samandari-Rad and R. Hughey, "Power/energy minimization techniques for variability-aware high-performance $16-\mathrm{nm} 6 \mathrm{t}$-sram," IEEE Access, vol. 4, pp. 594613, 2016.
[2] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, "Flowthrough latch and edge-triggered flip-flop hybrid elements," in 1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC, pp. 138139, IEEE, 1996.
[3] F. Klass, "Semi-dynamic and dynamic flip-flops with embedded logic," in 1998 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No. 98CH36215), pp. 108109, IEEE, 1998.
[4] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolić, Digital integrated circuits: a design perspective, vol. 7. Pearson Education Upper Saddle River, NJ, 2003.
[5] N. Nedovic, M. Aleksic, and V. G. Oklobdzija, "Comparative analysis of doubleedge versus single-edge triggered clocked storage elements," in 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No. 02CH37353), vol. 5, pp. V-V, IEEE, 2002.
[6] N. H. Weste and D. Harris, CMOS VLSI design: a circuits and systems perspective. Pearson Education India, 2015.
[7] C. K. Teh, T. Fujita, H. Hara, and M. Hamada, "A 77\% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40 nm cmos ," pp. 338-340, IEEE, 2011.
[8] Y. Kim, W. Jung, I. Lee, Q. Dong, M. Henry, D. Sylvester, and D. Blaauw, "A static contention-free single-phase-clocked 24t flip-flop in 45 nm for low-power applications,"
in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 466-467, IEEE, 2014.
[9] N. Kawai, S. Takayama, J. Masumi, N. Kikuchi, Y. Itoh, K. Ogawa, A. Ugawa, H. Suzuki, and Y. Tanaka, "A fully static topologically-compressed 21-transistor flipflop with $75 \%$ power saving," IEEE Journal of Solid-State Circuits, vol. 49, no. 11, pp. 2526-2533, 2014.
[10] F. Stas and D. Bol, "A 0.4-v $0.66-\mathrm{fj} /$ cycle retentive true-single-phase-clock 18 t flipflop in $28-\mathrm{nm}$ fully-depleted soi cmos," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 3, pp. 935-945, 2017.
[11] Y. Cai, A. Savanth, P. Prabhat, J. Myers, A. S. Weddell, and T. J. Kazmierski, "Ultralow power 18 -transistor fully static contention-free single-phase clocked flip-flop in $65-\mathrm{nm}$ cmos," IEEE Journal of Solid-State Circuits, vol. 54, no. 2, pp. 550-559, 2018.
[12] A. Khorami, M. Sachdev, and M. Sharifkhani, "A contention-free, static, single-phase flip-flop for low data activity applications," in 2019 32nd IEEE International System-on-Chip Conference (SOCC), pp. 11-16, IEEE, 2019.
[13] "https://www.statista.com/statistics/471264/iot-number-of-connected-devicesworldwide/,"
[14] S. Kumar, S. Sharma, and B. Kaur, "Leakage power estimation for iscas c17 benchmark circuit," in 2019 6th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 108-112, 2019.
[15] S. Hsu, A. Agarwal, S. Realov, M. Anders, G. Chen, M. Kar, R. Kumar, H. Sumbul, P. Knag, H. Kaul, V. Suresh, S. Mathew, I. Rajwani, S. Damaraju, R. Krishnamurthy, and V. De, "Low-clock-power digital standard cell ips for high-performance graphics/ai processors in 10 nm cmos," in 2020 IEEE Symposium on VLSI Circuits, pp. 1-2, 2020.
[16] M. Asyaei and A. Peiravi, "Low power wide gates for modern power efficient processors," Integration, vol. 47, no. 2, pp. 272-283, 2014.
[17] S. Purohit, M. Lanuzza, S. Perri, P. Corsonello, and M. Margala, "Design and evaluation of an energy-delay-area efficient datapath for coarse-grain reconfigurable computing systems," Journal of Low Power Electronics, vol. 5, no. 3, pp. 326-338, 2009.
[18] H. Kawaguchi and T. Sakurai, "A reduced clock-swing flip-flop (rcsff) for $63 \%$ power reduction," IEEE Journal of Solid-State Circuits, vol. 33, no. 5, pp. 807-811, 1998.
[19] V. G. Oklobdzija, "Clocking and clocked storage elements in a multi-gigahertz environment," IBM Journal of Research and Development, vol. 47, no. 5.6, pp. 567-583, 2003.
[20] A. Hirata, K. Nakanishi, M. Nozoe, and A. Miyoshi, "The cross charge-control flipflop: A low-power and high-speed flip-flop suitable for mobile application socs," in Digest of Technical Papers. 2005 Symposium on VLSI Circuits, 2005., pp. 306-307, IEEE, 2005.
[21] M. Hamada, H. Hara, T. Fujita, C. K. Teh, T. Shimazawa, N. Kawabe, T. Kitahara, Y. Kikuchi, T. Nishikawa, M. Takahashi, et al., "A conditional clocking flip-flop for low power h. 264/mpeg-4 audio/visual codec lsi," in Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 2005., pp. 527-530, IEEE, 2005.
[22] K. Absel, L. Manuel, and R. Kavitha, "Low-power dual dynamic node pulsed hybrid flip-flop featuring efficient embedded logic," IEEE transactions on very large scale integration (vlsi) systems, vol. 21, no. 9, pp. 1693-1704, 2012.
[23] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical power reduction," IEEE Journal of Solid-State Circuits, vol. 36, no. 8, pp. 1263-1271, 2001.
[24] Y. Ueda, H. Yamauchi, M. Mukuno, S. Furuichi, M. Fujisawa, F. Qiao, and H. Yang, " 6.33 mw mpeg audio decoding on a multimedia processor," in 2006 IEEE International Solid State Circuits Conference-Digest of Technical Papers, pp. 1636-1645, IEEE, 2006.

