# Area Efficient Design of Shift Register through Comparative Analysis of Latches and Flip-Flops 

M. Karthick<br>Department of EEE, SVS College of Engineering<br>Coimbatore, India<br>E-mail: saranspectra@gmail.com


#### Abstract

This work presents an energy- and area-efficient shift register using pulsed latches. Energy consumption plays an important role in digital systems, because the dissipated energy must be removed from high-density circuits and because battery life needs to be extended in portable systems such as devices with wireless communication capabilities. Flip-flops account for a large share of the energy consumed in digital circuits. In flip-flops, a timing problem arises from redundancy when the input and the output are in the same state. Several low-power techniques are available, but all of them incur transistor-count penalties, leading to an increase in size. In this work, several CMOS master-slave flip-flops and latches are designed and their power and energy efficiency is investigated. Among the flip-flops and latches compared, the SSASPL (Static Sense Amplifier with Shared Pulse Generator) circuit is found to be the most energy- and area-efficient, and this circuit is used to design a shift register. The method solves the timing problem by using multiple non-overlapping delayed pulsed clock signals instead of a single pulsed clock signal. The shift register is designed by grouping the latches into several sub shift registers and adding temporary storage latches, so that only a small number of pulsed clock signals is needed. A 16-bit shift register using pulsed latches was fabricated using a 0.18 $\mu m$ CMOS process with $V_{D D}=1.8$ V. The proposed shift register saves $52 \%$ area and $44 \%$ power compared to the conventional shift register with flip-flops.


Keywords: Area-efficient, shift register, pulsed latches, flip-flops

## INTRODUCTION

Flip-flops are the fundamental building blocks of all sequential circuits and are used in digital systems to store information. A flip-flop's contents change only at the rising or falling edge of the enable signal; after that edge, the contents remain constant whether or not the input changes. Flip-flops consume a large amount of power because they are clocked at the system's operating frequency. To reduce this redundancy, several techniques and corresponding flip-flop designs have been proposed recently. Careful design of the flip-flop is important for a low-power VLSI system. The major flip-flop configurations are master-slave and pulse-triggered flip-flops [1, 2].

A shift register is a basic building block in VLSI circuits. It consists of N flip-flops connected in series. Shift registers are commonly used in many applications, such as digital filters, communication receivers, and image processing ICs. In a shift register, the speed of the flip-flop is less important than its area and power consumption, so the smallest flip-flop is the most suitable choice for reducing area and power consumption [3]. In recent times, pulsed latches have been used instead of flip-flops in many applications, because a pulsed latch is much smaller than a flip-flop. However, pulsed latches cannot be used directly in a shift register because of the timing problem between pulsed latches.

This paper proposes an energy- and area-efficient shift register using pulsed latches. The shift register solves the timing problem by using multiple non-overlapping delayed pulsed clock signals instead of a single pulsed clock signal. The shift register is designed by grouping the latches into several sub shift registers and adding temporary storage latches, so that only a small number of pulsed clock signals is needed.

## LATCHES AND FLIP-FLOPS DESIGN



Fig. 1: Conventional DFF (CDFF).


Fig. 2: Transmission Gate DFF (TGDFF).


Fig. 3: Adaptive Coupling DFF (ACDFF).


Fig. 4: Power PC DFF (PPCDFF).


Fig. 5: Push-Pull DFF (PPDFF).


Fig. 6: Static Sense Amplifier DFF (SSADFF).


Fig. 7: Transmission Gate Latch (TGLA).


Fig. 8: Power PC Latch (PPCLA).


Fig. 9: Hybrid Latch Flip-Flop (HLFF).


Fig. 10: Static Sense Amplifier Latch with Pulse Generator (SSAPLA).


Fig. 11: Static Sense Amplifier Latch with Shared Pulse Generator (SSASPL).

Figures 1 to 11 present schematics for the flip-flop and latch designs evaluated. Figure 1 shows the Conventional D Flip-Flop. The D flip-flop captures the value of the D input at a definite portion of the clock cycle. That captured value becomes the Q output. At other times, the output Q does not change. Figure 2 shows the Transmission Gate D Flip-Flop. Flip-flops consist of two latches in series controlled by inverted clock signals. The first latch monitors the input during the true level of the control signal and freezes at the falling edge, while the second copies the output of the first to the output during the false level of the control signal and freezes at the rising edge [4]. The second latch never sees its input change during this half period, because its input, the output of the first latch, is frozen. In this way, changes occur only at the falling edge of the control signal, when the first latch freezes and the second one copies its output to the flip-flop's output. Figure 3 shows the Adaptive Coupling DFF. It has a reduced transistor count compared to other low-power flip-flops, and 2 fewer transistors than the mainstream transmission-gate flip-flop (TGFF). Figure 4 shows the Power PC DFF. The PPCDFF is a flip-flop design using master-slave Power PC-style latch stages, which is known to have low energy and delay [5, 6].
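The master-slave behavior described above can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: the class names `Latch` and `MasterSlaveFlipFlop` are hypothetical, signals are ideal 0/1 values sampled at discrete steps, and no gate delays are modeled.

```python
class Latch:
    """Level-sensitive D latch: transparent while enable is true, frozen otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d  # transparent: output follows input
        return self.q   # otherwise the stored value is held


class MasterSlaveFlipFlop:
    """Two latches in series driven by inverted control signals.

    The master is transparent while the control signal is high and the
    slave is transparent while it is low, matching the description in
    the text (output changes at the falling edge of the control signal).
    """

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def step(self, d, clk):
        m = self.master.update(d, bool(clk))
        return self.slave.update(m, not clk)


def run(flip_flop, samples):
    """Apply a list of (d, clk) samples and collect the Q output."""
    return [flip_flop.step(d, clk) for d, clk in samples]


# Input toggles while the control signal is high; Q only picks up the
# value held by the master when the control signal falls.
samples = [(1, 1), (0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]
outputs = run(MasterSlaveFlipFlop(), samples)
```

In this model the output stays unchanged while the control signal is high, even though the input toggles, and takes the master's value only once the control signal goes low.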

Figure 5 shows the Push-Pull DFF. To improve the performance of a conventional DFF, an inverter and a transmission gate are inserted between the outputs of the master and slave latches to achieve a push-pull effect at the slave latch, i.e., the input and output of the output inverter are driven to opposite logic values during switching. Figure 6 shows the Static Sense Amplifier DFF. It is a master-slave flip-flop using static sense-amp latch stages, which have a low clock load. Figure 7 shows the Transmission Gate Latch. A D latch has two inputs: a data input (D) and an enable or clock input (EN). When the enable input is true, the latch copies its input to its output. When the enable input becomes false, the output freezes and stays at the logic level the input had at the time of the enable input's falling edge. Figure 8 shows the Power PC Latch. It is a transparent latch based on the PowerPC 603 design, which is known to be reasonably fast and energy-efficient.

Figure 9 shows the Hybrid Latch Flip-Flop. It operates as a pulsed transparent latch and is generally regarded as one of the fastest known flip-flop designs [7]. Figure 10 shows the Static Sense Amplifier Latch with Pulse Generator. Figure 11 shows the Static Sense Amplifier Latch with Shared Pulse Generator. The differential data inputs ( D and $\mathrm{D}_{\mathrm{b}}$ ) of the latch come from the differential data outputs ( Q and $\mathrm{Q}_{\mathrm{b}}$ ) of the previous latch. The SSASPL uses the smallest number of transistors (7 transistors) and consumes the lowest clock power because it has a single transistor driven by the pulsed clock signal [8, 9].

Table 1: Transistor and Power Comparison of Latches and Flip-Flops.

| Latch / Flip-Flop | Total Number of Transistors | Power Consumption (mW) |
| :--- | :---: | :---: |
| CDFF | 16 | 2.123 |
| TGDFF | 18 | 2.234 |
| ACDFF | 22 | 3.35 |
| PPCDFF | 16 | 2.123 |
| PPDFF | 20 | 3.01 |
| SSADFF | 18 | 2.234 |
| TGLA | 8 | 1.23 |
| PPCLA | 8 | 1.23 |
| HLFF | 18 | 2.234 |
| SSAPLA | 16 | 2.123 |
| SSASPL | 7 | 1.19 |

Table 1 shows the transistor and power comparison of the pulsed latches and flip-flops. When counting the total number of transistors in pulsed latches and flip-flops, the transistors for generating the differential clock signals and pulsed clock signals are not included because they are shared by all latches and flip-flops. The SSASPL uses 7 transistors, which is the smallest number of transistors among the pulsed latches. The PPCDFF, like the CDFF, uses 16 transistors, which is the smallest number of transistors among the flip-flops.
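As a quick check of the comparison, the per-cell figures in Table 1 can be tabulated and compared programmatically. The sketch below only reuses the numbers from Table 1; the helper name `reduction_percent` is hypothetical, and the result is a per-cell ratio, not the whole-register savings reported for the fabricated design.

```python
# (transistor count, power in mW) copied from Table 1
TABLE_1 = {
    "CDFF": (16, 2.123),
    "TGDFF": (18, 2.234),
    "ACDFF": (22, 3.35),
    "PPCDFF": (16, 2.123),
    "PPDFF": (20, 3.01),
    "SSADFF": (18, 2.234),
    "TGLA": (8, 1.23),
    "PPCLA": (8, 1.23),
    "HLFF": (18, 2.234),
    "SSAPLA": (16, 2.123),
    "SSASPL": (7, 1.19),
}


def reduction_percent(candidate, baseline):
    """Percentage reduction in (transistors, power) of candidate vs. baseline."""
    c_transistors, c_power = TABLE_1[candidate]
    b_transistors, b_power = TABLE_1[baseline]
    return (
        100.0 * (1.0 - c_transistors / b_transistors),
        100.0 * (1.0 - c_power / b_power),
    )


# Cell with the fewest transistors and cell with the lowest power
smallest = min(TABLE_1, key=lambda name: TABLE_1[name][0])
lowest_power = min(TABLE_1, key=lambda name: TABLE_1[name][1])

# Per-cell reduction of the SSASPL latch relative to the PPCDFF flip-flop
transistor_cut, power_cut = reduction_percent("SSASPL", "PPCDFF")
print(f"{smallest}: {transistor_cut:.1f}% fewer transistors, "
      f"{power_cut:.1f}% less power than PPCDFF")
```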

## REVIEW OF ARCHITECTURE USING PULSED LATCHES



Fig. 12: Master-Slave Flip-Flop.


Fig. 13: Pulsed Latch.

The architecture of a shift register is quite simple. An N-bit shift register is composed of N data flip-flops connected in series. The speed of the flip-flop is less important than the area and power consumption because there is no circuit between flip-flops in the shift register. The smallest flip-flop is therefore suitable for the shift register to reduce the area and power consumption. A master-slave flip-flop using two latches (Figure 12) can be replaced by a pulsed latch consisting of a latch and a pulsed clock signal (Figure 13) [10]. A pulsed latch is much smaller than a flip-flop. All pulsed latches share the pulse generation circuit for the pulsed clock signal. As a result, the area and power consumption of the pulsed latch become almost half of those of the master-slave flip-flop. The pulsed latch is an attractive solution for small area and low power consumption. However, the pulsed latch cannot be used directly in a shift register because of the timing problem between pulsed latches [11].

Fig. 14: Shift Register with Latches and a Pulsed Clock Signal (a) Schematic (b) Waveforms.

The pulsed latch cannot be used directly in shift registers because of the timing problem shown in Figure 14. The shift register in Figure 14 (a) consists of several latches and a pulsed clock signal (CLK_pulse). The operation waveforms in Figure 14 (b) show the timing problem in the shift register. The output signal of the first latch (Q1) changes correctly because the input signal of the first latch (IN) is constant during the clock pulse width ( $\mathrm{T}_{\text {PULSE }}$ ). But the second latch has an uncertain output signal (Q2) because its input signal (Q1) changes during the clock pulse width.
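The timing problem can be made concrete with a simplified behavioral model. The sketch below assumes a worst case in which the shared pulse is wide enough for data to propagate through every transparent latch; the function names are hypothetical, and real circuits would show an uncertain rather than a fully determined output.

```python
def ideal_shift(q, data_in):
    """Reference behavior: every stage takes the previous stage's old value."""
    return [data_in] + q[:-1]


def shift_with_shared_pulse(q, data_in):
    """All latches are transparent during the same pulse.

    Worst-case assumption: the pulse is long enough for each latch output
    to settle before the next latch closes, so new data races through.
    """
    q = list(q)
    previous_output = data_in
    for i in range(len(q)):
        q[i] = previous_output  # latch i is transparent and follows its input
        previous_output = q[i]  # which has already changed for latch i + 1
    return q


state = [0, 0, 0, 0]
expected = ideal_shift(state, 1)            # one-position shift
raced = shift_with_shared_pulse(state, 1)   # input reaches every stage
```

In this worst case a single pulse pushes the new input through all four latches instead of shifting by one position, which is the failure that the following solutions are meant to prevent.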


Fig. 15: Shift Register with Latches, Delay Circuits and a Pulsed Clock Signal (a) Schematic (b) Waveforms.

One solution to the timing problem is to add delay circuits between latches, as shown in Figure 15 (a). The output signal of each latch is delayed by $\mathrm{T}_{\text {DELAY }}$ and reaches the next latch after the clock pulse. As shown in Figure 15 (b), the output signals of the first and second latches (Q1 and Q2) change during the clock pulse width ( $\mathrm{T}_{\text {PULSE }}$ ), but the input signals of the second and third latches (D2 and D3) become the same as the output signals of the first and second latches (Q1 and Q2) only after the clock pulse. As a result, all latches have constant input signals during the clock pulse and no timing problem occurs between the latches. However, the delay circuits cause large area and power overheads.


Fig. 16: Shift Register with Latches and Delayed Pulsed Clock Signal (a) Schematic (b) Waveforms.

Another solution is to use multiple non-overlapping delayed pulsed clock signals, as shown in Figure 16 (a). The delayed pulsed clock signals are generated when a pulsed clock signal goes through delay circuits. Each latch uses a pulsed clock signal that is delayed relative to the pulsed clock signal used in its next latch. Therefore, each latch updates its data after its next latch has updated its data. As a result, each latch has a constant input during its clock pulse and no timing problem occurs between latches.

However, this solution also requires many delay circuits.
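The ordering argument can be sketched in the same behavioral style: if the last latch is pulsed first and the first latch last, each latch samples an input that has not yet changed. The function names are hypothetical, and the model assumes ideal, strictly non-overlapping pulses with one distinct pulse per latch.

```python
def ideal_shift(q, data_in):
    """Reference behavior of a flip-flop based shift register."""
    return [data_in] + q[:-1]


def shift_with_delayed_pulses(q, data_in):
    """One shift using one non-overlapping pulse per latch.

    The pulses arrive in reverse order of the latches, so latch i is
    updated before latch i - 1 and always sees the old value at its input.
    """
    q = list(q)
    for i in reversed(range(len(q))):
        q[i] = q[i - 1] if i > 0 else data_in
    return q


def pulses_required(n_latches):
    """Each latch needs its own delayed pulse in this scheme."""
    return n_latches


state = [0, 0, 0, 0]
for bit in [1, 0, 1, 1]:
    state = shift_with_delayed_pulses(state, bit)
```

The model shifts correctly, but `pulses_required` grows linearly with the register length, which is the overhead noted above.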

Fig. 17: Proposed Shift Register (a) Schematic (b) Waveforms.


Fig. 18: Delayed Pulsed Clock Generator (a) Schematic (b) Waveforms.

Figure 17 (a) shows an example of the proposed shift register. The proposed shift register is divided into sub shift registers to reduce the number of delayed pulsed clock signals. A 4-bit sub shift register consists of five latches and performs shift operations with five non-overlapping delayed pulsed clock signals. In the 4-bit sub shift register \#1, four latches store 4-bit data (Q1-Q4) and the last latch stores 1-bit temporary data (T1), which will be stored in the first latch (Q5) of the 4-bit sub shift register \#2. Figure 17 (b) shows the operation waveforms of the proposed shift register. Five non-overlapping delayed pulsed clock signals are generated by the delayed pulsed clock generator in Figure 18 (a). The sequence of the pulsed clock signals is in the opposite order of the five latches. First, the pulsed clock signal CLK_pulse<T> updates the latch data T1 from Q4. Then, the pulsed clock signals CLK_pulse<1:4> update the four latch data from Q4 to Q1 sequentially. The latches Q2-Q4 receive data from their previous latches Q1-Q3, but the first latch Q1 receives data from the input of the shift register (IN). The operations of the other sub shift registers are the same as that of the sub shift register \#1, except that the first latch receives data from the temporary storage latch in the previous sub shift register.
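The operation described above can be sketched behaviorally as follows. The function names and the list-based state are hypothetical, the pulses are assumed to be ideal and non-overlapping, and the count formula generalizes the 4-bit example (five latches and five pulses per sub shift register) to other word lengths as an assumption.

```python
def proposed_shift(q, t, data_in, k=4):
    """One shift of a register split into k-bit sub shift registers.

    q holds the data latches, t holds one temporary latch per sub shift
    register. All sub shift registers share the same k + 1 pulses, applied
    in the order CLK_pulse<T>, CLK_pulse<k>, ..., CLK_pulse<1>.
    """
    assert len(q) % k == 0 and len(t) == len(q) // k
    q, t = list(q), list(t)
    n_sub = len(q) // k

    # CLK_pulse<T>: each temporary latch copies the last data latch
    for s in range(n_sub):
        t[s] = q[s * k + k - 1]

    # CLK_pulse<k> ... CLK_pulse<2>: update from the last latch backwards
    for pos in range(k - 1, 0, -1):
        for s in range(n_sub):
            q[s * k + pos] = q[s * k + pos - 1]

    # CLK_pulse<1>: first latch takes IN or the previous temporary latch
    for s in range(n_sub):
        q[s * k] = data_in if s == 0 else t[s - 1]
    return q, t


def resource_counts(n_bits, k):
    """Latch and pulsed-clock counts, assuming one temporary latch per sub register."""
    n_sub = n_bits // k
    return {"latches": n_bits + n_sub, "pulsed_clocks": k + 1}


# 16-bit register built from four 4-bit sub shift registers
q, t = [0] * 16, [0] * 4
for bit in [1, 0, 1, 1, 0]:
    q, t = proposed_shift(q, t, bit)
counts = resource_counts(16, 4)
```

Under these assumptions the 16-bit example needs 20 latches but only five pulsed clock signals, compared with one pulse per latch when every latch has its own delayed clock.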

In conventional delayed pulsed clock circuits, the clock pulse width must be larger than the sum of the rise and fall times of all inverters in the delay circuits to preserve the shape of the pulsed clock. However, in the delayed pulsed clock generator in Figure 18 (a), the clock pulse width can be shorter than the sum of the rise and fall times because each sharp pulsed clock signal is generated from an AND gate and two delayed signals. Therefore, the delayed pulsed clock generator is suitable for short pulsed clock signals. The numbers of latches and clock-pulse circuits change according to the word length of the sub shift register $(\mathrm{K})$. K is selected by considering the area, power consumption, and speed.
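The idea of deriving each pulse from an AND gate and two delayed signals can be sketched in discrete time. The details below are assumptions rather than the circuit of Figure 18 (a): the later of the two delayed signals is taken to be inverted before the AND gate, delays are whole sample counts, and the function names are hypothetical.

```python
def delayed(signal, delay):
    """Shift a sampled 0/1 waveform later in time by `delay` samples."""
    return [0] * delay + signal[: len(signal) - delay]


def delayed_pulses(clk, n_pulses, step):
    """Generate n_pulses non-overlapping pulses of width `step` samples.

    Pulse i is the AND of the clock delayed by i * step and the inverse
    of the clock delayed by (i + 1) * step. The clock high time must be
    at least `step` samples for the pulses to have full width.
    """
    pulses = []
    for i in range(n_pulses):
        early = delayed(clk, i * step)
        late = delayed(clk, (i + 1) * step)
        pulses.append([a & (1 - b) for a, b in zip(early, late)])
    return pulses


# Two clock periods: 10 samples high, 10 samples low
clk = ([1] * 10 + [0] * 10) * 2

# Index 0 would play the role of CLK_pulse<T>, followed by <4>, <3>, <2>, <1>
pulses = delayed_pulses(clk, n_pulses=5, step=2)

# At most one pulse is active at any sample time
overlap_free = all(sum(column) <= 1 for column in zip(*pulses))
```

Because each pulse width is set by the difference between two delays rather than by a pulse travelling through the whole delay chain, the width in this model does not depend on how many delay stages precede it.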

## CONCLUSION

This paper proposed a low-power and area-efficient shift register using pulsed latches. The shift register reduces area and power consumption by replacing flip-flops with pulsed latches. The timing problem between pulsed latches is solved by using multiple non-overlapping delayed pulsed clock signals instead of a single pulsed clock signal. Only a small number of pulsed clock signals is needed because the latches are grouped into several sub shift registers with additional temporary storage latches. A 16-bit shift register was fabricated using a 0.18 $\mu m$ CMOS process with $V_{D D}=1.8$ V. The proposed shift register saves $52 \%$ area and $44 \%$ power compared to the conventional shift register with flip-flops.

## REFERENCES

1. P. Reyes, P. Reviriego, J. A. Maestro, O. Ruano. New protection techniques against SEUs for moving average filters in a radiation environment. IEEE Trans. Nucl. Sci. 2007; 54(4): 957-964p.
2. M. Hatamian et al. Design considerations for gigabit Ethernet 1000Base-T twisted pair transceivers. Proc. IEEE Custom Integr. Circuits Conf. 1998; 335-342p.
3. H. Yamasaki, T. Shibata. A real-time image-feature-extraction and vector-generation VLSI employing arrayed-shift-register architecture. IEEE J. Solid-State Circuits. 2007; 42(9): 2046-2053p.
4. H.-S. Kim, J.-H. Yang, S.-H. Park, S.-T. Ryu, G.-H. Cho. A 10-bit column-driver IC with parasitic-insensitive iterative charge-sharing based capacitor-string interpolation for mobile active-matrix LCDs. IEEE J. Solid-State Circuits. 2014; 49(3): 766-782p.
5. S.-H. W. Chiang, S. Kleinfelder. Scaling and design of a 16-megapixel CMOS image sensor for electron microscopy. In Proc. IEEE Nucl. Sci. Symp. Conf. Record (NSS/MIC). 2009; 1249-1256p.
6. S. Heo, R. Krashinsky, K. Asanovic. Activity-sensitive flip-flop and latch selection for reduced energy. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2007; 15(9): 1060-1064p.
7. V. Rukkumani, N. Devarajan. Power efficient design of amplifier using submicron technology. In International Journal of a Mechanics of Robotics Systems of Inderscience Publishers. 2014; 2(1): 116p.
8. S. Naffziger, G. Hammond. The implementation of the next-generation 64 b Itanium microprocessor. In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers. 2002; 276-504p.
9. H. Partovi et al. Flow-through latch and edge-triggered flip-flop hybrid elements. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers. 1996; 138-139p.
10. E. Consoli, M. Alioto, G. Palumbo, J. Rabaey. Conditional push-pull pulsed latch with 726 fJ·ps energy delay product in 65 nm CMOS. In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers. 2012; 482-483p.
11. V. Rukkumani, N. Devarajan. Design and analysis of SRAM cell for ultra low voltage variation. In Australian Journal of Basic and Applied Sciences. 2014; 8(3): 41-50p.
