# Implementation of a 5x5 trits multiplier in a Quasi-Adiabatic Ternary CMOS Logic

Diego Mateo and Antonio Rubio

Electronic Engineering Department, Univ. Politècnica de Catalunya Campus Nord, Mòdul C-4, C. Gran Capità s/n, 08034 Barcelona, Spain

#### Abstract

Adiabatic switching is one technique for designing low-power digital ICs. To reduce its costly silicon area requirements, an adiabatic ternary logic is proposed. A 5x5 trits (ternary signals) multiplier has been designed and implemented using this logic in a $0.7\mu m$ CMOS technology. Results show a satisfactory power saving and a reduction of the area needed with respect to an adiabatic binary multiplier.

#### 1 Introduction

The design of very low-power digital integrated circuits has become a strategic research topic. Different techniques can be applied at the different levels of the design to achieve low consumption. Adiabatic switching is one of these techniques [1, 2, 3, 4]. It is based on two basic principles: slowing down the transport of charge, and recovering the charge stored in the parasitic capacitors. A drawback of adiabatic techniques is the increase in the silicon area required, due basically to the implementation of computational reversibility (needed to recover the charge [1]) and to the routing of the multiple non-constant power supplies used. An alternative that addresses this problem is presented in this paper: a *Quasi-Adiabatic Ternary* (QAT) *CMOS Logic* is proposed in order to obtain the area reduction offered by ternary circuits [7]. Its basic gates are presented in section 2. Their interconnection in a two-phase pipeline is shown in section 3, where a 5x5 trits multiplier is implemented and its performance is measured. Conclusions are summarised in section 4.

#### 2 Basis of the QAT CMOS Logic

Adiabatic techniques usually consider four basic phases in one computational cycle for each logic stage (fig. 1): (1) input validation; (2) output information actualisation, i.e. update (the power supplies, or clocks, are activated by slow ramp signals, computing the input information); (3) output evaluation (the value of the output is used by the next gate); and (4) output information recovery (the clocks are de-activated in the reverse manner to (2), returning the output to its previous value). The algebra used to implement the ternary logic is the Yoeli-Rosenfeld algebra, which lends itself to easily integrated CMOS implementations [5]. The three logic levels ('-1', '0', '1') are represented by the voltage levels $V_{-1}$, $V_0$, and $V_1$ respectively, with $V_1 > V_0 > V_{-1}$. Ternary gates presented here are based on the dynamic ternary gates shown in [5].



Figure 1: (a) Implementation of the 3 ternary inverters. (b) Clocks and output of the STI.



Figure 2: (a) Structure of the implementation of a general function in the QAT logic. (b) Sequence of computing of the power supplies for one phase of the pipeline.

These gates are made from conventional binary CMOS structures, with the maximum positive supply voltage ($V_1$) chosen so that, when the intermediate voltage ($V_0$) is applied to the gate input, both the P and N transistors are off. This condition implies that $V_0 = V_{tn} - \Delta_n$ and $(V_1 - V_0) = |V_{tp}| - \Delta_p$, where $\Delta_n, \Delta_p \ge 0$. An undesired charge drift may appear due to subthreshold conduction when the output must hold its precharged value (both nets off). To avoid this effect, $\Delta_n$ and $\Delta_p$ must be strictly positive (usual values are about 0.3 V).
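The level-selection condition above can be illustrated with a minimal sketch. It assumes $V_{-1} = 0$ V, and the threshold values of 1.0 V are hypothetical, picked only so that margins of 0.3 V reproduce supply levels close to those listed for the QAT multiplier in table 1. The function names are illustrative and not part of the original design flow.

```python
def ternary_levels(v_tn, v_tp_abs, delta_n, delta_p, v_minus1=0.0):
    """Return (V_-1, V_0, V_1) from the thresholds and the safety margins.

    Implements V_0 = V_tn - delta_n and V_1 - V_0 = |V_tp| - delta_p,
    measured from V_-1 (assumed to be 0 V by default).
    """
    if delta_n <= 0 or delta_p <= 0:
        # Strictly positive margins are required to limit subthreshold drift.
        raise ValueError("delta_n and delta_p must be strictly positive")
    v0 = v_minus1 + v_tn - delta_n
    v1 = v0 + v_tp_abs - delta_p
    return v_minus1, v0, v1


def both_off_at_v0(levels, v_tn, v_tp_abs):
    """Check that an input at V_0 keeps the N and P transistors off.

    Worst case: the negative rail is fully ramped down to V_-1 and the
    positive rail is fully ramped up to V_1.
    """
    v_m1, v0, v1 = levels
    nmos_vgs = v0 - v_m1  # gate-source voltage seen by the NMOS
    pmos_vsg = v1 - v0    # source-gate voltage seen by the PMOS
    return nmos_vgs < v_tn and pmos_vsg < v_tp_abs


# Hypothetical thresholds (1.0 V) with the 0.3 V margins quoted in the text.
levels = ternary_levels(v_tn=1.0, v_tp_abs=1.0, delta_n=0.3, delta_p=0.3)
print(levels, both_off_at_v0(levels, v_tn=1.0, v_tp_abs=1.0))
```

Larger margins keep the transistors more safely off but, as discussed below, increase the non-adiabatic part of the dissipation.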

Special clocks are used as power supplies to make all these gates adiabatic, delivering and recovering the energy used to compute. They use slow ramp signals to achieve an adiabatic transfer of charge. The adiabatic Simple Ternary Inverter (*STI*) is implemented from a conventional CMOS inverter by using the clock signals $\phi_{pn}$ and $\phi_{np}$ as positive and negative power rails (fig. 1). When $V_{in} = V_0$ (2nd cycle in fig. 1b) both transistors are off, and the output remains at its precharged voltage ($V_0$) after the output actualisation phase. When $V_{in} = V_{-1}$ (1st cycle), the PMOS is on, and in the output actualisation phase $V_{out}$ follows $\phi_{pn}$ from $V_0$ to $V_1$. When $V_{in} = V_1$ (3rd cycle) the NMOS is on and $V_{out}$ follows $\phi_{np}$ from $V_0$ to $V_{-1}$. The output is returned to its precharge value $V_0$ in the recovery phase. Because of $\Delta_n$ and $\Delta_p$, switching is not fully adiabatic: $V_{ds}$ (in the N and P transistors) may be non-zero for a short time in the actualisation phase ($t_1$ and $t_2$ in fig. 1b). Two parts may be distinguished in this quasi-adiabatic switching. The first is non-adiabatic and its energy waste, taking $V_{tn} = |V_{tp}|$ and $\Delta_n = \Delta_p = \Delta$, is $\frac{1}{2}C_L\Delta^2$. The second is fully adiabatic and its energy waste has the $T^{-1}$ dependence typical of adiabatic circuits [1], where $T$ is the charging/discharging period.
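As a rough illustration of these two contributions, the sketch below adds the non-adiabatic term $\frac{1}{2}C_L\Delta^2$ to the usual first-order estimate for ramp charging, $(RC_L/T)\,C_L V^2$, which holds for $T \gg RC_L$. The on-resistance, load capacitance, swing and ramp times are hypothetical values chosen for the example, not measurements from the chip, and treating the two terms as simply additive is an assumption.

```python
def non_adiabatic_energy(c_load, delta):
    """Energy lost in the abrupt part of the transition: 0.5 * C_L * delta^2."""
    return 0.5 * c_load * delta ** 2


def adiabatic_energy(r_on, c_load, v_swing, t_ramp):
    """First-order loss of ramp charging, valid when t_ramp >> r_on * c_load."""
    return (r_on * c_load / t_ramp) * c_load * v_swing ** 2


def quasi_adiabatic_energy(r_on, c_load, v_swing, delta, t_ramp):
    """Sum of both contributions (additivity is an assumption of this sketch)."""
    return (non_adiabatic_energy(c_load, delta)
            + adiabatic_energy(r_on, c_load, v_swing, t_ramp))


# Hypothetical values: 10 kOhm on-resistance, 100 fF load, 0.7 V swing, 0.3 V margin.
R_ON, C_LOAD, V_SWING, DELTA = 10e3, 100e-15, 0.7, 0.3

for t_ramp in (1e-7, 1e-6, 1e-5):
    energy = quasi_adiabatic_energy(R_ON, C_LOAD, V_SWING, DELTA, t_ramp)
    print(f"T = {t_ramp:.0e} s -> E = {energy:.3e} J")
```

Slowing the ramp reduces only the second term, so under these assumptions the dissipation per transition tends towards the floor set by $\frac{1}{2}C_L\Delta^2$ rather than towards zero.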

| Multiplier | # devices | # clocks + power lines | Power $[\mu W]$ | Delay $[\mu s]$ | PDP [pJ] | Supply [V] |
|-----------------------------------|-------|--------|-----------------|--------------------|----------|--------------------------|
| QAT 5x5 trits mult.               | 3850  | 18 + 3 | 8               | 10                 | 80       | $V_0 = 0.7$, $V_1 = 1.4$ |
| Fully Ad. SCRL 8x8 bits mult. [2] | 6300  | 48 + 3 | 0.1             | 1.6                | 0.16     | 5                        |
| Fully Ad. CRL 8x8 bits mult. [6]  | 32000 | 4 + 3  | 0.5             | 0.4                | 0.2      | 5                        |
| Static CMOS 8x8 bits mult.        | 2400  | 0 + 2  | $52 \cdot 10^3$ | $20 \cdot 10^{-3}$ | $10^{3}$ | 5                        |

Table 1: Comparison between different multipliers.

Figure 3: (a) Photograph of the IC. (b) Measurements: p0 and p1 are the trits 0 and 1 of a certain product, and $\phi_{32}^{pn}$ and $\phi_{32}^{np}$ are the clocks $\phi_3^{pn}$ and $\phi_3^{np}$ of the second phase of the pipeline.

The values of $\Delta_n$ and $\Delta_p$ should be selected to minimise this non-adiabatic consumption while keeping the transistors in a safe off state. Moreover, because of $\Delta_n$ and $\Delta_p$, the output voltage does not return to the desired precharge value in the recovery phase. A refresh technique is proposed to solve this problem: a CMOS transmission gate, activated by $\phi_{ref}^p$ and $\phi_{ref}^n$, is used to refresh the output. The energy waste in this non-adiabatic transport of charge is $\frac{1}{2}C_L\Delta^2$.

The Positive and Negative Ternary Inverters have a structure and behaviour similar to those of the STI. In fig. 2 the structure of a generic function f is shown. Variables $x_i^{-1}$, $x_i^0$, $x_i^1$ and $x_i^{-10}$ are four unary functions of each input variable $x_i$ [5], and the block f is implemented by a one-level structure, where the N and P nets are in general not complementary: the N net implements '-1', and the P net implements '1'; when both nets are off the output remains at '0'.
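The one-level structure described above can be mimicked at the logic level with a small behavioural sketch: the output is '-1' when the N net conducts, '1' when the P net conducts, and stays at the precharged '0' otherwise. The helper `qat_gate` and the net conditions chosen for the single-trit product are illustrative assumptions, not the transistor networks used on the chip.

```python
from itertools import product

TRITS = (-1, 0, 1)


def qat_gate(n_net_on, p_net_on):
    """Build a ternary function from the conduction conditions of its two nets."""
    def gate(*inputs):
        n_on = n_net_on(*inputs)
        p_on = p_net_on(*inputs)
        if n_on and p_on:
            # Both nets conducting would short the two rails.
            raise ValueError("N and P nets must never conduct together")
        if n_on:
            return -1  # output follows the negative rail
        if p_on:
            return 1   # output follows the positive rail
        return 0       # both nets off: output keeps its precharged value
    return gate


# STI as described in the text: input '1' turns the NMOS on, input '-1' the PMOS.
sti = qat_gate(n_net_on=lambda x: x == 1, p_net_on=lambda x: x == -1)

# Single-trit product (illustrative choice of nets): no carry is ever produced.
trit_product = qat_gate(
    n_net_on=lambda a, b: (a, b) in {(1, -1), (-1, 1)},
    p_net_on=lambda a, b: (a, b) in {(1, 1), (-1, -1)},
)

for a, b in product(TRITS, repeat=2):
    print(a, b, trit_product(a, b))
```

The nets are not complementary here either: for any input pair containing a '0', neither net conducts and the output remains at '0'.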

#### 3 Interconnection of QAT gates

Pipeline techniques are used when adiabatic basic gates are interconnected to build complex functions, in order to obtain a good throughput [1]. Different adiabatic pipelines have been implemented previously, using between 4 clocks (in [3], but dissipation is $\propto C_L V_{dd} V_t$) and 48 clocks [6]. Computational reversibility is usually applied in order to recover the stored charge. Breaking the reversibility at some points saves area (it is not necessary to implement a fully reversible computer) but produces an extra waste of energy at these points, from which charge cannot be recovered. In the presented logic the computation is done quasi-adiabatically in each block by a local retractile cascade that uses 10 clocks (fig. 2), and reversibility is broken at the end of the block. A total of 18 clocks are used to implement a pipeline two phases deep, i.e. 2 x 10 clocks minus the two shared ones ($\phi_2^p$ and $\phi_2^n$ of phase 1 are common to $\phi_2^n$ and $\phi_2^p$ of phase 2).

To evaluate the QAT logic, an experimental ASIC (a 5x5 trits multiplier) has been implemented using this two-phase pipeline in the ecpd07ES2 CMOS technology. The IC has a total area of $6\,mm^2$ and its photograph can be seen in fig. 3. Table 1 compares 4 different multipliers in terms of area, consumption and delay. The delay is defined as the time needed to carry out one operation. The area is evaluated in terms of the number of devices and the number of power supplies. The parameter used to compare the global performance of the different multipliers is the Power-Delay Product (PDP). The adiabatic binary multipliers are implemented, and their performance evaluated, from the logic shown in [2] and [6]. The performance of the QAT multiplier is measured using a Tek-DSA602A Digitizing Signal Analyzer (fig. 3b). The power supplies generated for the measurements use exponential waves instead of ideal ramps. The power dissipation shown in table 1 is only the chip consumption, without the dissipation due to clock generation, since the clock generators have not yet been implemented with the ability to recover energy; taking the clock generation efficiency into account is the subject of our current research.
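For reference, the arithmetic function of a 5x5 trits multiplier can be written as a short behavioural model. The sketch assumes a balanced ternary number representation, with digits '-1', '0' and '1' matching the three logic levels and the least significant trit first; this encoding and the function names are assumptions made for illustration, and the model says nothing about the internal architecture or pipeline of the implemented ASIC.

```python
def to_balanced_ternary(value, n_trits):
    """Encode an integer as n_trits balanced-ternary digits, least significant first."""
    limit = (3 ** n_trits - 1) // 2
    if not -limit <= value <= limit:
        raise ValueError(f"{value} does not fit in {n_trits} trits")
    trits = []
    for _ in range(n_trits):
        digit = value % 3  # Python's modulo is non-negative for a positive divisor
        if digit == 2:
            digit = -1     # a digit of 2 is written as -1 with a carry into the next trit
        trits.append(digit)
        value = (value - digit) // 3
    return trits


def from_balanced_ternary(trits):
    """Decode balanced-ternary digits (least significant first) into an integer."""
    return sum(digit * 3 ** i for i, digit in enumerate(trits))


def multiply_5x5(a_trits, b_trits):
    """Behavioural 5x5 trits product, returned as 10 trits."""
    if len(a_trits) != 5 or len(b_trits) != 5:
        raise ValueError("both operands must have exactly 5 trits")
    result = from_balanced_ternary(a_trits) * from_balanced_ternary(b_trits)
    return to_balanced_ternary(result, 10)


a = to_balanced_ternary(121, 5)   # largest 5-trit operand in this encoding
b = to_balanced_ternary(-77, 5)
print(from_balanced_ternary(multiply_5x5(a, b)))
```

Under this encoding five trits cover $3^5 = 243$ values (from -121 to 121), close to the 256 values of an 8-bit operand, and the product always fits in 10 trits.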

From table 1 the following results can be summarised: the PDP of the QAT 5x5 multiplier is worse than the PDP of the adiabatic binary 8x8 multipliers, due to the non-fully-adiabatic switching and the breaking of reversibility, but it is still more than one order of magnitude better than the PDP of the conventional binary CMOS 8x8 multiplier. The area saving of the QAT multiplier compared with the fully adiabatic binary one is 60% in number of devices, in addition to the intrinsic routing benefit of having 5 trits instead of 8 bits.
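The PDP column of table 1 can be cross-checked from the power and delay columns, since microwatts multiplied by microseconds give picojoules. The short script below only re-uses the tabulated values; the dictionary layout and names are illustrative.

```python
# (power in microwatts, delay in microseconds), copied from table 1
MULTIPLIERS = {
    "QAT 5x5 trits": (8.0, 10.0),
    "Fully Ad. SCRL 8x8 bits [2]": (0.1, 1.6),
    "Fully Ad. CRL 8x8 bits [6]": (0.5, 0.4),
    "Static CMOS 8x8 bits": (52e3, 20e-3),
}


def pdp_pj(power_uw, delay_us):
    """Power-delay product in picojoules (1 uW * 1 us = 1 pJ)."""
    return power_uw * delay_us


pdp = {name: pdp_pj(*values) for name, values in MULTIPLIERS.items()}
for name, value in pdp.items():
    print(f"{name}: {value:.3g} pJ")

# Ratio between the static CMOS and QAT power-delay products
print(pdp["Static CMOS 8x8 bits"] / pdp["QAT 5x5 trits"])
```

The static CMOS product evaluates to about 1040 pJ, which table 1 rounds to $10^3$ pJ; with the rounded value the ratio to the QAT multiplier is 12.5, in line with the factor of about 12 quoted in the conclusions.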

#### 4 Conclusions

A new low-power logic has been presented, which uses quasi-adiabatic switching and partial energy recovery. A distinctive feature of this logic is that it is ternary, in order to reduce the area needed with respect to other adiabatic binary logics while keeping a satisfactory power saving. A 5x5 trits multiplier has been implemented using this logic. Measured results show a power-delay product 12 times better than that of a conventional 8x8 bits CMOS multiplier and an area saving of 60% with respect to a fully adiabatic 8x8 bits multiplier. As future work, power supplies capable of recovering the energy must be included in the design.

#### Acknowledgments

This work has been supported by the Spanish Research Commission (CICYT) under project contract number TIC95-0469.

#### References

- [1] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and E. Y. Chou. "Low-power Digital Systems Based on Adiabatic-Switching Principles". *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 2(4):398–407, dec 1994.
- [2] S. G. Younis and T. F. Knight. "Asymptotically zero energy split-level charge recovery logic". In *International Workshop on Low Power Design*, pages 177–182, 1994.
- [3] A. G. Dickinson and J. S. Denker. "Adiabatic dynamic logic". *IEEE Journal of Solid-State Circuits*, 30(3):311–315, mar 1995.
- [4] D. Mateo and A. Rubio. "Quasi-adiabatic ternary CMOS logic". *Electronics Letters*, 32:99–101, 1996.
- [5] J. S. Wang, C. Y. Wu, and M. K. Tsai. "Low power dynamic ternary logic". *IEE Proceedings*, 135(6):221–230, dec 1988.
- [6] S. G. Younis. "Asymptotically Zero Energy Computing with Split-Level Charge Recovery Logic". PhD thesis, MIT, jun 1994.