# An MTCMOS Design Methodology and Its Application to Mobile Computing

Hyo-Sig Won

Ki-Tae Park<sup>¶</sup>

Kyo-Sun Kim Kyu-Myung Choi Kwang-Ok Jeong Jeong-Taek Kong

CAE, Samsung Electronics, San #24 Nongseo-Ri, Giheung-Eup, Yongin-City, Gyeonggi-Do, 449-711, Korea

<sup>¶</sup> Dept. of Machine Intelligence and Systems Engineering, Tohoku University, Sendai, 980-8578, Japan

hs.won@samsung.com

# ABSTRACT

The Multi-Threshold CMOS (MTCMOS) technology provides a solution to the high-performance and low-power requirements of modern designs. While the low V<sub>th</sub> transistors implement the desired function, the high V<sub>th</sub> transistors cut off the leakage current. In this paper, we (i) examine the effectiveness of the MTCMOS technology for Samsung's 0.18 µm process, (ii) propose a new special flip-flop which retains valid data during the sleep mode, and (iii) develop a methodology which takes into account the new design issues related to the MTCMOS technology. To validate the proposed technique, a Personal Digital Assistant (PDA) processor has been implemented using the MTCMOS design methodology and the 0.18 µm process. The fabricated PDA processor operates at 333 MHz and consumes about 2 µW of leakage power in the sleep mode. Whereas the performance of the MTCMOS implementation is the same as that of the generic CMOS implementation, a reduction of three orders of magnitude in the leakage power has been achieved.

# **Categories and Subject Descriptors**

B.7.1 Types and Design Styles

**General Terms**: Measurement, Performance, Design, Experimentation

**Keywords:** MTCMOS, leakage current, CPFF, CCS, low power

# **1. INTRODUCTION**

The market that supplies integrated circuits (ICs) to application-specific consumer electronics, industrial electronics, and communication and computer products has historically been sharply divided into two groups, as shown in Figure 1. The first group consists of low-performance, low stand-by power applications such as smart phones, Personal Digital Assistants (PDAs), and hand-held Personal Computers (PCs), while high-performance, high stand-by power applications such as high-end digital computers, network devices, and high-end peripherals form the second group. Recently, with the convergence of communication and computation, the advent of data/voice-centric mobile computing terminals and mobile media terminals demands high performance together with low stand-by power.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ISLPED'03, August 25-27, 2003, Seoul, Korea.

Copyright 2003 ACM 1-58113-682-X/03/0008...\$5.00.



In this digital convergence era, Multi-Threshold CMOS (MTCMOS) is an important enabling technology that provides high-performance and low-power operation by utilizing both high and low threshold voltage (V<sub>th</sub>) transistors. By using low V<sub>th</sub> transistors in the signal path, the supply voltage (VDD) can be lowered to reduce switching power dissipation while still maintaining the performance. Although the switching power decreases quadratically as VDD is reduced, the V<sub>th</sub> that has been lowered to compensate for the performance loss incurs an exponential increase in the sub-threshold leakage current. In fact, the increased leakage power can dominate the switching power if the voltages are scaled down aggressively. In many digital convergence applications, such as a mobile media terminal, circuits spend most of their time in an idle state where no computation is being performed. During these stand-by intervals, this large sub-threshold leakage current is the major source of power dissipation. This static power dissipation in the stand-by mode can be reduced dramatically by using high V<sub>th</sub> transistors with very low leakage currents to gate the power supply. This is the basic rationale behind the MTCMOS technology.

The MTCMOS circuit scheme is a very efficient low-power and high-performance circuit technique that employs high $V_{th}$ transistors to switch on and off the power supplies to the low $V_{th}$ logic blocks [1][3]. However, MTCMOS has a serious drawback: the data stored in the latches and flip-flops of the logic blocks cannot be preserved when the power supply is turned off (sleep mode). Therefore, extra circuits and complex timing design must be provided for holding the stored data [3]. These impose great penalties on the performance, power, and area of the system. In other MTCMOS schemes using diode clamps, such as the auto-back gate control-MTCMOS [4] and the variable power/ground rail clamp [5], the extra circuits in the latches and flip-flops are not required and the control timing design is not complicated, but the leakage current of the logic circuits in the sleep mode cannot be sufficiently suppressed. As an alternative way to avoid such undesirable leakage, Variable Threshold CMOS (VTCMOS) has been reported. While MTCMOS requires supplementary circuit techniques to hold the latched data, VTCMOS does not need any extra circuitry for this purpose because it raises the threshold voltage of the transistors by controlling the back gate bias instead of using a high V<sub>th</sub> power switch to suppress the leakage during sleeping periods [2]. However, VTCMOS entails additional complexity and overhead due to its triple-well structure and substrate bias voltage generator. Also, the junction leakage and the gate tunnel current become even worse under the substrate bias [6].

This paper is organized as follows. Section 2 introduces MTCMOS design issues such as the power switch optimization, the data-preserving flip-flop, the short circuit current due to floating inputs, and an MTCMOS design methodology in an ASIC design environment. Section 3 shows the test results of the fabricated PDA processor. Finally, the concluding remarks are given in Section 4.

# 2. Preliminaries

# 2.1 The Principles of the MTCMOS

The MTCMOS circuit technology can achieve a lower threshold voltage, and therefore, higher performance as well as smaller standby leakage current. Figure 2 illustrates the basic circuit scheme of MTCMOS.



The functional logic gates are implemented by using low $V_{th}$ MOS transistors that are powered by the supply line (VDD) and a virtual ground line (VGND). A VGND is connected to the real ground line (GND) through a high $V_{th}$ MOS transistor switch, Q1. MTCMOS designs have two operating modes, active and sleep. In the active mode, the Sleep Control (SC) signal is high, and the Q1 switch is turned on, directly connecting VGND to GND. Consequently, the low $V_{th}$ logic gates operate normally and at a high speed. On the other hand, in the sleep mode, SC goes low, and Q1 is turned off. In this state, the leakage current flows to GND through only the Q1 transistor. Due to the high $V_{th}$ of Q1 and its low leakage characteristic, the leakage current from the low $V_{th}$ logic gates is almost completely suppressed.

Since the high $V_{th}$ transistor Q1 acts as a switch that cuts off the leakage current from the logic gates in the sleep mode, we call it the Current Cut-off Switch (CCS). However, due to the turn-on resistance of the CCS, the MTCMOS logic gates suffer from ground bounce that may lead to performance degradation or malfunction. To alleviate this ground bounce, the channel width of the CCS can be increased. Unfortunately, a wider channel increases not only the area overhead but also the leakage current.



Figure 3. Sleep Mode Current Comparison (L18: Low V<sub>th</sub>; L18L: High V<sub>th</sub>; MTCMOS: Low V<sub>th</sub> + High V<sub>th</sub>)

To examine the feasibility of the MTCMOS technology, we ran SPICE simulations on an MTCMOS inverter, a low $V_{th}$ inverter (L18), and a high $V_{th}$ inverter (L18L), all implemented in a 0.18 µm technology. Using the leakage current of the high $V_{th}$ inverter and the delay of the low $V_{th}$ inverter as the lower bounds, we measured the delay and the leakage current of the MTCMOS implementation while varying the width of the CCS. The results are depicted in Figure 3. The MTCMOS inverter with a properly selected CCS achieves lower leakage current than the high $V_{th}$ inverter while keeping its speed from degrading relative to that of the low $V_{th}$ inverter, even when the low $V_{th}$ is aggressively lowered for further performance enhancement.

# 2.2 MTCMOS Design Issues

## 2.2.1 Current Cut-off Switch (CCS)

The area, and therefore the leakage current, of a CCS should be traded off against the performance of the logic gates. However, it is not trivial to determine the best size of a CCS, since the dynamic current that flows through the CCS in the active mode and the consequent ground bounce are not known a priori.

Several approaches to sizing the CCS so as to guarantee a performance constraint have been reported [7][8][9]. One of the most conservative approaches is to dedicate a CCS to each logic cell and to optimize the individual CCSs. However, such an extreme approach incurs an enormous area overhead. To reduce the area overhead effectively, Kao, *et al.* [7][9] proposed hierarchical sizing, in which the CCSs are shared by logic gates that do not operate simultaneously. Whereas this method provides an upper bound on the CCS size that guarantees the performance constraint, it is not easy to identify the operation characteristics of all gates and group them properly.

Mutoh, *et al.* presented the Average Current Method (ACM) as a static approach to sizing the CCS [8]. ACM can be applied to MTCMOS designs under the premise that different designs have the same average ground bounce when they consume the same average current. This makes it easy for designers to determine the minimum CCS sizes that do not degrade the required performance. We have validated the effectiveness of the ACM in selecting the proper CCS sizes for the targeted design using the 0.18 µm technology and the 1.8 V supply voltage.

In the physical design of the CCS cell as an entry to the standard cell library, the electro-migration on vias and wires should be taken into account as well as the channel width because large current flows through the CCS cells in the active mode. The number of vias is determined based on the average current and the maximum allowable current per via.
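The sizing and via-count reasoning of the two paragraphs above can be sketched numerically. The sketch below is only an illustration under stated assumptions: it treats the CCS as a linear resistor whose on-resistance scales inversely with the channel width, so that the average ground bounce is the average current times that resistance. This is one simple reading of the ACM premise, not the formulation of [8]. The function names, the resistance-per-width parameter `r_on_ohm_um`, and all numeric values are hypothetical.

```python
import math


def average_current_a(avg_power_w: float, vdd_v: float) -> float:
    """Average supply current implied by the estimated average power."""
    return avg_power_w / vdd_v


def min_ccs_width_um(avg_current_a: float, max_bounce_v: float,
                     r_on_ohm_um: float) -> float:
    """Smallest aggregate CCS width that keeps the average ground bounce
    (average current times on-resistance) at or below max_bounce_v.

    Assumption: R_on = r_on_ohm_um / width (linear switch model).
    """
    return avg_current_a * r_on_ohm_um / max_bounce_v


def ccs_cell_count(total_width_um: float, cell_width_um: float) -> int:
    """Number of identical CCS cells needed to reach the aggregate width."""
    return math.ceil(total_width_um / cell_width_um)


def via_count(avg_current_a: float, max_current_per_via_a: float) -> int:
    """Vias needed so that no via carries more than its allowed current."""
    return math.ceil(avg_current_a / max_current_per_via_a)


if __name__ == "__main__":
    # All values below are hypothetical, chosen only to exercise the functions.
    i_avg = average_current_a(avg_power_w=0.09, vdd_v=1.8)
    width = min_ccs_width_um(i_avg, max_bounce_v=0.05, r_on_ohm_um=2000.0)
    print(f"average current:     {i_avg:.3f} A")
    print(f"aggregate CCS width: {width:.0f} um")
    print(f"6 um CCS cells:      {ccs_cell_count(width, 6.0)}")
    print(f"vias:                {via_count(i_avg, 0.003)}")
```

In this model, halving the allowed ground bounce doubles the required aggregate width, which reflects the area and leakage penalty discussed in Section 2.1.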

## 2.2.2 Special Cells for MTCMOS Design

### 2.2.2.1 Complementary Pass-transistor Flip-Flop (CPFF)

As a low-power and high-speed circuit technique for low supply voltage Very Large Scale IC (VLSI), MTCMOS is a very effective scheme that uses high $V_{th}$, low-leakage transistors to switch on and off the power supplies to low $V_{th}$, high-speed logic blocks. However, because standard latches and flip-flops cannot preserve their data when the power is gated, extra circuits and complex timing design must be provided. These degrade the performance, power, and area characteristics of the design, and increase its complexity. We have been working on a circuit technique that embeds a power-off-proof data-preserving latch and obviates the complicated timing control design.



Figure 4. Complementary Pass-transistor Flip-Flop (CPFF)

The circuit diagram of the MTCMOS data-preserving Complementary Pass-transistor Flip-Flop (CPFF) is shown in Figure 4 [10][11]. Basically, it consists of low $V_{th}$ MOSFETs, except for the data-preserving latch C1-2, which is composed of high $V_{th}$ MOSFETs and directly connected to the real power supply lines, VDD and GND. The proposed CPFF is a positive edge-triggered flip-flop. The CPFF operates as follows.

- ① Clock 'Low': Since N1 and N2 are turned off and N3 and N4 are turned on, the static latch C1-2 holds the previous state. The new state on the complementary inputs to the latch, **data** and **data\_b**, should be ready for sampling.
- ② Clock to 'High': At the rising edge of the clock, N1 and N2 are turned on while N3 and N4 stay on for a short interval that is determined by the delay of the inverter chain (I1-3). During this interval, data and data\_b are passed through N1 and N3, and N2 and N4, respectively, and sampled into the latch. After the short sampling interval, N3 and N4 are turned off, and Q and Q\_b are decoupled from the data input. Therefore, the CPFF behaves as a positive<sup>1</sup> edge-triggered flip-flop.
- ③ Sleep Mode: As all output nodes of the logic gates become floating, the gate states of N1-4 will be unknown. In order to cut off the leakage current path, the last stage of the inverter chain, I3, has been replaced with a NOR gate as shown in Figure 4. The high V<sub>th</sub> PMOS, Q1, effectively cuts off the leakage current when it is turned off. In the sleep mode, SCB goes high, ck\_b becomes low, and the high V<sub>th</sub> NMOS pass-transistors, N3-4, are turned off so that the data stored in C1-2 can be retained. At the transition from the sleep mode to the active mode, SCB is set low a little later than the power-up. This delay prevents the destruction of the data in the latch C1-2 by delaying the turn-on of N3-4 until the input data becomes valid.

Since no additional storage or control circuitry to save and restore the state, as in [3], is required, the area overhead is much smaller, and the timing design is considerably simplified.
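The three operating phases described above can be summarized in a cycle-level behavioral model. This is a minimal sketch, not a circuit-accurate description: it abstracts the short sampling window into a single rising-edge event, represents floating inputs as `None`, and ignores the complementary signals. The class name `CPFFModel` and its method names are hypothetical.

```python
class CPFFModel:
    """Cycle-level abstraction of a data-preserving, positive
    edge-triggered flip-flop with a sleep control input (SCB)."""

    def __init__(self, initial_q=0):
        self.q = initial_q      # value held by the always-powered latch
        self._prev_clk = 0      # last valid clock level seen

    def step(self, clk, data, scb=0):
        """Apply one set of input levels and return the stored value.

        clk and data are 0, 1, or None (floating, as in the sleep mode).
        scb=1 models the sleep mode, in which sampling is blocked.
        """
        rising_edge = self._prev_clk == 0 and clk == 1
        if scb == 0 and rising_edge:
            self.q = data       # sample only on a rising edge while awake
        if clk in (0, 1):
            self._prev_clk = clk
        return self.q


if __name__ == "__main__":
    ff = CPFFModel()
    ff.step(0, 1)                       # clock low: previous state held
    print(ff.step(1, 1))                # rising edge: samples 1
    print(ff.step(None, None, scb=1))   # sleep: floating inputs, still 1
```

The model makes the key property explicit: while `scb` is 1, no input activity can change the stored value, which is why SCB must be released only after the inputs have become valid again.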





Figure 5. Power Lines of Logic Cells

The layout implementation is one of the hard challenges in the MTCMOS design. In particular, the power architecture of the primitive cells may raise additional MTCMOS-driven issues for Placement and Routing (P&R) tools. As shown in Figure 5, a horizontal VGND line is added to the conventional power architecture so that any cell can access the GND as well as a VGND. Although this additional virtual ground line incurs an area overhead of about 12%, the conventional P&R methodology can be used with minor modification using commercially available tools. Besides, most MTCMOS logic cells can be developed by a straightforward transformation from the existing cells in the generic library.

## 2.2.3 Floating Input Induced Short-Circuit Current

A System-On-Chip (SOC) design includes a variety of IPs such as processors, memories, and analog components. Some of these IPs may not be implemented by using the MTCMOS technology.

<sup>1</sup> A negative edge-triggered flip-flop can be easily designed by inverting the clock input.

These non-MTCMOS IPs are directly powered by VDD and GND, and are therefore *vigilant* (always awake) even in the sleep mode. However, since the output nodes of all MTCMOS gates float as the VGND floats in the sleep mode, the floating inputs to the *vigilant* IPs can cause a very large short-circuit current that flows from VDD to GND directly, as shown in Figure 6. For an inverter, for example, the leakage current caused by floating inputs can range from several microamperes to milliamperes. This also happens at the interface to I/O cells.



Figure 6. Floating Input Induced Short-Circuit Current

To eliminate this leakage current, we insert a data-holding circuit, composed of a tri-state buffer and a level holder, at each output port of an MTCMOS logic gate that drives an input of a *vigilant* IP, as shown in Figure 7. This data-holding circuit is a *vigilant* cell as well, and is called the Floating Prevention Circuit (FPC). In the sleep mode, the **SCB** signal goes high, the output of the tri-state buffer goes to the high-impedance state, and the state on the latch is preserved.



**Figure 7. Floating Prevention Circuit** 
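Finding where FPCs are needed amounts to scanning the netlist for nets that are driven by an MTCMOS cell and feed at least one vigilant load. The following sketch illustrates that scan on a toy net representation. The `Net` record and the function `nets_needing_fpc` are hypothetical and do not correspond to any particular netlist format or tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Net:
    name: str
    driver_is_mtcmos: bool              # driver output floats in sleep mode
    loads_are_vigilant: Tuple[bool, ...]  # one flag per load on the net


def nets_needing_fpc(nets: List[Net]) -> List[str]:
    """Return the names of nets where a floating MTCMOS output would
    reach an always-powered (vigilant) input."""
    return [
        net.name
        for net in nets
        if net.driver_is_mtcmos and any(net.loads_are_vigilant)
    ]


if __name__ == "__main__":
    netlist = [
        Net("alu_out", True, (False, False)),   # MTCMOS to MTCMOS only
        Net("mem_addr", True, (True,)),         # MTCMOS to vigilant memory
        Net("pad_out", True, (False, True)),    # one vigilant I/O load
        Net("pll_clk", False, (True,)),         # vigilant driver, never floats
    ]
    print(nets_needing_fpc(netlist))
```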

## 2.2.4 MTCMOS Power Management

In a leakage-power-conscious application like mobile computing, a Power Management Block (PMB) is embedded to disconnect dormant sub-blocks from the power supply in the sleep mode. Our MTCMOS technology requires only two global control signals, Sleep Control (SC) and Sleep Control Bar (SCB), to switch from the active mode to the sleep mode (cool-down process), or *vice versa* (warm-up process). The timing diagram of SC and SCB, which are activated by the SLEEP command signal, is shown in Figure 8. The cool-down time, T1, is necessary to isolate the data-preserving latches in the CPFFs before the floating VGNDs corrupt their inputs. During the warm-up time, T2, the charges on the VGND lines are drained through the CCS cells. The RC time constant of VGND determines T2.



(b) MTCMOS Control Signals: timing diagram of the SLEEP, SCB, and SC signals across the ACTIVE, SLEEP, and ACTIVE phases, with the cool-down interval T1 at the entry to the sleep mode and the warm-up interval T2 at the return to the active mode.

Figure 8. MTCMOS Power Management Control
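The ordering of the two control signals described above can be written down as an event list. The sketch below is an assumption-laden illustration: SCB is raised first to isolate the latches, SC is lowered T1 later, and on wake-up SC is raised first and SCB is released T2 later. The function names are hypothetical, and the choice of five RC time constants for T2 is an illustrative default rather than a value given in the paper.

```python
def warm_up_time(r_ccs_ohm: float, c_vgnd_f: float, n_tau: float = 5.0) -> float:
    """Estimate T2 as a multiple of the VGND RC time constant.

    n_tau = 5 is an assumed margin (about 99% discharge of an RC node).
    """
    return n_tau * r_ccs_ohm * c_vgnd_f


def sleep_sequence(t_sleep_cmd, t_wake_cmd, t1, t2):
    """Return (time, signal, level) events for one sleep/wake cycle."""
    if t_wake_cmd < t_sleep_cmd + t1:
        raise ValueError("wake-up requested before cool-down has finished")
    return [
        (t_sleep_cmd, "SCB", 1),        # isolate data-preserving latches
        (t_sleep_cmd + t1, "SC", 0),    # then turn the CCS cells off
        (t_wake_cmd, "SC", 1),          # turn the CCS cells back on
        (t_wake_cmd + t2, "SCB", 0),    # reconnect latches after VGND settles
    ]


if __name__ == "__main__":
    # Hypothetical time units and values.
    for event in sleep_sequence(t_sleep_cmd=100, t_wake_cmd=1000, t1=10, t2=50):
        print(event)
```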



Figure 9. MTCMOS Design Flow

## 2.3 MTCMOS Design Flow

The conventional design methodology requires additional steps to take into account the MTCMOS-driven issues, as shown in Figure 9. Initially, the RTL code should include a PMB so that SC and SCB are generated. After the RTL code is synthesized in step 1, all the flip-flops in the netlist are replaced by CPFFs. Also, FPCs are inserted at the interface to vigilant IPs and I/O cells. The CCS cells are not yet included in this netlist, but are inserted in the additional steps 2, 4, and 6. After the power consumption is estimated in step 2, the corresponding average current and, in turn, the aggregate channel width of the CCSs are calculated by using the ACM. Based on this CCS size and the floor plan from step 3, a set of rules that keep the ground bounce below the maximum allowable value is decided: at least one CCS must be placed in every non-empty placement row, and the distance between CCSs must not exceed the maximum allowable value. Complying with these rules, a script automatically finds optimal locations on the placement rows to insert the CCS cells, and connects them to the Sleep Control (SC) signal in step 4. Step 5 is almost the same as the conventional P&R process. The SC and SCB signals should be routed by using the Buffer Tree Synthesis (BTS) technique. During the BTS, many buffers are inserted along the signals. Vigilant cells must be used for these buffers. Figure 10 shows a simplified example of the layout implementation. Finally, the netlist is checked for floating nodes in the sleep mode, and the layout is double-checked against the CCS placement rules.
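The two CCS placement rules lend themselves to a simple automated check. The sketch below is a hypothetical illustration, not the script used in the flow: it assumes each placement row is described by its list of logic cells and the x-coordinates of its CCS cells, and it interprets the distance rule as applying to neighboring CCSs within the same row.

```python
def check_ccs_rules(rows, max_spacing_um):
    """Return a list of (row_name, message) rule violations.

    rows maps a row name to {"cells": [...], "ccs_x_um": [...]}.
    """
    violations = []
    for name, row in rows.items():
        if not row["cells"]:
            continue  # empty rows need no CCS
        ccs_positions = sorted(row["ccs_x_um"])
        if not ccs_positions:
            violations.append((name, "no CCS in non-empty row"))
            continue
        # Compare each CCS with its right-hand neighbor in the same row.
        for left, right in zip(ccs_positions, ccs_positions[1:]):
            gap = right - left
            if gap > max_spacing_um:
                violations.append((name, f"CCS gap {gap} um exceeds limit"))
    return violations


if __name__ == "__main__":
    layout = {
        "row0": {"cells": ["inv1", "nand2"], "ccs_x_um": [0, 40, 80]},
        "row1": {"cells": ["dff1"], "ccs_x_um": []},
        "row2": {"cells": [], "ccs_x_um": []},
        "row3": {"cells": ["buf1"], "ccs_x_um": [0, 120]},
    }
    for violation in check_ccs_rules(layout, max_spacing_um=50):
        print(violation)
```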



Figure 10. A Layout Example with Placed CCSs

## **2.4 Performance Degradation Due to CCSs**

To evaluate the performance degradation due to the CCSs, a 16-bit DSP has been implemented by using the 0.18 µm, 1.8 V technology and the proposed MTCMOS methodology. The module size is 958 µm × 957 µm, and 342 CCSs with a size of 5 µm each are inserted. The transistor-level simulation shows that the ground bounce is 9 mV on average, reaches up to 49 mV, and results in a performance degradation of 2%.

# 3. PDA Application and Results

The proposed MTCMOS design techniques have been validated on a 32-bit RISC microprocessor for hand-held devices like PDAs and general applications with low power and high performance requirements. The design has been fabricated and fully tested. Figure 11 shows a microphotograph of the chip. The chip characteristics are summarized in Table 1.

| Feature           | Value                |
|-------------------|----------------------|
| Chip size         | 5.7 mm × 5.7 mm      |
| CCS width (total) | 18 mm                |
| Process           | 0.18 µm 5-metal CMOS |
| Gate count        | 1,914,895            |
| Power dissipation | 270 mW               |
**Table 1. Chip Features** 

Figure 12 shows the shmoo plot of the operating frequency versus the supply voltage. The frequency varies from 172 MHz at 1.0 V to 333 MHz at 1.8 V. The leakage power in the sleep mode is about 2 µW, which is 6000 times smaller than that of the non-MTCMOS implementation.
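As a quick arithmetic check of the reported figures, the short sketch below derives the leakage power that the stated 6000-fold ratio implies for the non-MTCMOS implementation and expresses the ratio in orders of magnitude. The function names are hypothetical, and the derived baseline is an inference from the two numbers above, not a separately reported measurement.

```python
import math


def implied_baseline_leakage_w(sleep_leakage_w: float, reduction: float) -> float:
    """Leakage power of the reference design implied by the reduction ratio."""
    return sleep_leakage_w * reduction


def orders_of_magnitude(reduction: float) -> float:
    """Reduction ratio expressed as a base-10 exponent."""
    return math.log10(reduction)


if __name__ == "__main__":
    baseline = implied_baseline_leakage_w(2e-6, 6000)
    print(f"implied non-MTCMOS leakage: {baseline * 1e3:.0f} mW")
    print(f"reduction: {orders_of_magnitude(6000):.2f} orders of magnitude")
```

The ratio lies between three and four orders of magnitude, consistent with the "three orders" figure quoted in the abstract and conclusions.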

# 4. Conclusions

An MTCMOS design methodology has been developed and validated on a PDA processor. MTCMOS-driven techniques such as the sizing and insertion of Current Cut-off Switches (CCSs) to suppress the ground bounce, the Complementary Pass-transistor Flip-Flop (CPFF) to preserve the data in the sleep mode, the Floating Prevention Circuit (FPC) to prevent short-circuit current, and the simplified MTCMOS power management are integrated into the conventional design flow using commercially available tools. The test results of the fabricated PDA processor show a reduction of three orders of magnitude in the leakage current in the sleep mode without performance degradation.



Figure 11. Photograph of the Chip



Figure 12. Shmoo plot

# 5. REFERENCES

- [1] S. Mutoh, et al., "A 1V Multi-Threshold Voltage CMOS DSP with an Efficient Power Management Technique for Mobile Phone Application", ISSCC, 1996.
- [2] T. Kuroda, et al., "A 0.9V 150MHz 10mW 4mm<sup>2</sup> 2-D Discrete Cosine Transform Core Processor with Variable-Threshold Voltage Scheme", ISSCC, 1996.
- [3] S. Shigematsu, S. Mutoh, et al., "A 1-V High-Speed MTCMOS Circuit Scheme for Power-Down Application Circuits", IEEE Journal of Solid-State Circuits, 1997.
- [4] H. Makino, et al., T. Shimizu and T. Arakawa, "An Auto-Backgate-Controlled MT-CMOS Circuit", Symp. on VLSI Circuits Digest of Technical Papers, 1998.
- [5] K. Kumagai, et al., "A Novel Powering-down Scheme for Low Vt CMOS Circuits", Symp. on VLSI Circuits Digest of Technical Papers, 1998.
- [6] A. Keshavarzi, et al., "Technology scaling behavior of optimum reverse bias for standby leakage power reduction in CMOS IC's", ISLPED, 1999.
- [7] James Kao, Siva Narendra and Anantha Chandrakasan, "MTCMOS Hierarchical Sizing Based on Mutual Exclusive Discharge Patterns", DAC, 1998.
- [8] S. Mutoh, et al., "Design Method of MTCMOS Power Switch for Low-Voltage High-Speed LSIs", ASP-DAC, 1999.
- [9] J.T. Kao and A.P. Chandrakasan, "Dual-Threshold Voltage Techniques for Low-Power Digital Circuits", IEEE Journal of Solid-State Circuits, 2000.
- [10] K.T. Park, H.S. Won, et al., "A New Low-Power Edge-Triggered and Logic-Embedded FF Using Complementary Pass-Transistors Circuit", ITC-CSCC, 2001.
- [11] K.T. Park, H.S. Won, et al., "Low-Power Data-Preserving Complementary Pass-Transistor-Based Circuit for Power-Down Circuit Scheme", SSDM, 2001.