# VLSI IMPLEMENTATION OF A MICROPROCESSOR COMPATIBLE 128-BIT PROGRAMMABLE CORRELATOR

#### A THESIS

SUBMITTED TO THE DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING AND THE INSTITUTE OF ENGINEERING AND SCIENCES OF BILKENT UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE



By İsmail Enis Ungan, May 1989



to my lovely family and friends

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.

Assoc. Prof. Dr. Abdullah Atalar (Principal Advisor)

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.


Dr. Levent Onural

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.

Asst. Prof. Dr. Mehmet Ali Tan

Approved for the Institute of Engineering and Sciences:

Prof. Dr. Mehmet Baray, Director of Institute of Engineering and Sciences

# ABSTRACT

## VLSI IMPLEMENTATION OF A MICROPROCESSOR COMPATIBLE 128-BIT PROGRAMMABLE CORRELATOR

İsmail Enis Ungan M.S. in Electrical and Electronics Engineering Supervisor: Assoc. Prof. Dr. Abdullah Atalar May 1989

A single chip, microprocessor compatible, digital 128-bit correlator design is implemented in a 3 $\mu$m M<sup>2</sup>CMOS process. Full-custom design techniques are applied to achieve the best trade-off among chip size, speed and power consumption. The chip is to be placed in a microprocessor based radio communication system. It marks the beginning of a synchronous data stream received from a very noisy channel by detecting the synchronization (sync) word. Two chips can be cascaded to make a 256-bit correlator. The chip is fully programmable by a microprocessor, which sets the number of tolerable errors in detection and selects the bits of the 128-bit (or 256-bit) data stream to be used in the correlation. The latter feature makes the correlator suitable for detecting distributed sync words and for PRBS generation.

The silicon area of the chip, and hence the chip cost, is minimized by reducing the gate count in the logic design, by keeping the transistor sizes at the minimum that does not violate the timing specifications of the design, and by a proper placement (floor plan) of the transistors on the silicon. The layouts are laid out in a hierarchical manner. Unused areas are minimized and the layouts are designed in compact forms. During the layout design, charge sharing, body effect, latch-up, metal migration, noise and clock skew problems are considered.

The software tools Magic, Spice, Esim and Rnl are mainly used for layout editing and for timing and functional simulations. These programs are run on SUN workstations under the 4.3 BSD UNIX<sup>1</sup> operating system.

Keywords: Correlator, chip, VLSI.

<sup>1</sup>UNIX is a trademark of Bell Laboratories.

# ÖZET (ABSTRACT)

# VLSI IMPLEMENTATION OF A MICROPROCESSOR COMPATIBLE 128-BIT PROGRAMMABLE CORRELATOR

İsmail Enis Ungan M.S. in Electrical and Electronics Engineering Supervisor: Assoc. Prof. Dr. Abdullah Atalar May 1989

A microprocessor compatible digital 128-bit correlator design, to be fabricated in 3 micron M<sup>2</sup>CMOS technology, was realized on a single chip. Full-custom design methods were applied to obtain the best solution for chip size, operating speed and power consumption. The chip will be placed in a microprocessor controlled radio communication system and will determine the beginning of a synchronous data stream received from a very noisy channel by detecting the sync word. A 256-bit correlator can be built by cascading two chips. The chip is fully programmed by a microprocessor, which sets the number of errors tolerated during detection and selects the bits to be used in the correlation from the 128-bit or 256-bit data. This last feature allows the correlator to be used for detecting distributed sync words and for pseudo random binary sequence (PRBS) generation. The silicon area of the chip, and hence its cost, was minimized by reducing the gate counts in the logic design, by choosing the smallest transistor sizes that satisfy the timing specifications, and by a suitable placement of the transistors on the silicon area.

The layouts of the chip were made in a hierarchical manner. Unused silicon areas were minimized and the layouts were designed in compact forms. Charge sharing, body effect, latch-up, metal migration, noise and clock skew problems were addressed in the layout design.

Mainly, the **Magic**, **Spice** and **Esim** programs were used for layout drawing and for timing and functional simulations. These programs were run on SUN computer systems under the 4.3 BSD UNIX<sup>2</sup> operating system.

Keywords: Correlator, chip, VLSI.

<sup>2</sup>UNIX is a trademark of Bell Laboratories.

## ACKNOWLEDGEMENT

I wish to thank the research assistants in the VLSI group, Satılmış Topcu and Mustafa Karaman, for their valuable ideas and help. Many thanks also to Şenol Toygar, who is the original designer of the correlator, and to Nesip Aral, Tuncay Ergün and Oğuz Şener (all from ASELSAN), who spent much time on the improvement of the correlator system design.

A special acknowledgement is also due to Assoc. Prof. Dr. Abdullah Atalar for his constructive suggestions.

Finally, hearty thanks to the staff research assistants who have been helpful with the computer systems.

# TABLE OF CONTENTS

| Chapter | Section | Title |
|---|---|---|
| 1 | | INTRODUCTION |
| 2 | | THE LOGIC AND THE CIRCUIT DESIGN |
| | 2.1 | Introduction |
| | 2.2 | Gate and transistor reduction |
| 3 | | THE LAYOUT DESIGN |
| | 3.1 | Introduction |
| | 3.2 | Determination of the transistor sizes |
| | 3.3 | Floor planning |
| | 3.4 | Latch-up, metal migration, and noise |
| | 3.5 | The layouts and the simulations of the lowest level cells |
| | 3.6 | Higher level cell layouts and routing |
| | 3.7 | Clock Distribution |
| | 3.8 | Top level cell simulations |
| | 3.9 | Power rails |
| 4 | | CONCLUSION |
| | | REFERENCES |
| | | APPENDIX A |
| | | APPENDIX B |
| | | APPENDIX C |
| | | APPENDIX D |
| | | APPENDIX E |

# LIST OF FIGURES

| Figure | Title |
|---|---|
| 2.1 | Simplified architecture design of the correlator |
| 2.2 | A typical full adder circuit diagram |
| 2.3 | Architecture of the 1's counter |
| 3.1 | Illustrative layout of an inverter |
| 3.2 | The block diagram used in the delay analysis |
| 3.3 | Minimum size MOSFETs layouts |
| 3.4 | Blocks interconnection model |
| 3.5 | Initial floor plans |
| 3.6 | U-block and array structure of U-blocks |
| 3.7 | 1-bit and 2-bit adders embedded into the SRMC block |
| 3.8 | Placement of 3, 4, 5, 6, 7 and 8-bit adders and their sizes |
| 3.9 | Up-date floor plan |
| 3.10 | (a) Capacitive coupling (b) Resistive coupling |
| 3.11 | The location of the top level cells in BAC128 |
| 3.12 | Routing of the higher level cells |
| 3.13 | Clock skew analysis. The states of the msdff cells are shown for all intervals |
| 3.14 | The logic diagram of the clock driver |
| 3.15 | Spice plots for clock skew effects. (a) $\Delta T=5$ nsec, (b) $\Delta T=7$ nsec, (c) $\Delta T=10$ nsec |
| 3.16 | Power rail distribution of the BAC128 |
| 4.1 | BAC128 pin identification |

# 1. INTRODUCTION

Digital communication systems are replacing analog systems at an increasing rate. Synchronous digital systems are usually preferred for their speed and for the efficiency of their error checking and correction. The data streams are typically sent in packets, and the beginning of each packet is marked with a so-called synchronizing word (sync-word). The bit length of the sync-word is a critical parameter for the sync-word detection probability, the data integrity and eventually the reliability of the whole communication system. A sync-word that is too short may cause too many false detections, whereas one that is too long degrades the speed. In general, the bit length of the sync-word depends on the error rate and on the desired reliability. If the communication channel is very noisy (as is the case in an HF radio communication system), detecting the sync-word is not a trivial task. A simple shift register and a digital comparator may never find the sync-word because of the high error rate. In this case, a more tolerant detection scheme is necessary. Reliability and data integrity are especially important in military applications. A digital correlator is found to be a good solution [1] for this problem.

A correlator for a 32-bit sync-word has been designed and implemented by ASELSAN from SSI and MSI circuit components on a printed circuit board [1]. Obviously, a 128-bit correlator would do much better than a 32-bit one in terms of sensitivity and security. Therefore, a 128-bit correlator was designed to be used as a programmable peripheral device of a microprocessor in the communication system [2]. Unfortunately, the design involved about 5000 gates, which would require too large an area on the printed circuit board for use in portable and light communication equipment. A single chip 128-bit correlator containing these 5000 gates would instead reduce the size of the communication device drastically.

In this work, the 128-bit correlator design is modified for chip implementation and additional features are added. The system modifications were carried out jointly by S. Topçu and the author and are explained in detail in [3]. The chip is fully programmable by a microprocessor, which sets the number of tolerable errors in the detection of the inverted or non-inverted sync-word and selects the bits of the 128-bit long serial data stream to be used in the correlation. The latter feature makes the chip capable of detecting distributed sync-words and of generating pseudo random binary sequences (PRBS). The resultant chip is called BAC128. If required, two such chips can be cascaded to perform a 256-bit correlation.

The 128-bit microprocessor programmable correlator design is implemented in very large scale integration (VLSI) in a 3 micron double metal complementary metal-oxide semiconductor (CMOS) technology. A full-custom structured design style is used to provide the best performance (speed, power) and the smallest die size (silicon area). One alternative design style was the gate array. Its design time is typically much less than that of a full-custom design because the layout is generated completely by software. On the other hand, a large number of transistors are usually left unused, so the wasted silicon area can be very large in gate arrays. Another alternative was the standard cell (semi-custom) design style, in which predesigned cells from a standard cell library are used to construct the overall design of the chip. The design time with standard cells is also much less than that of a full-custom design. However, speed and power are sacrificed because the fixed size standard cell blocks fix the transistor sizes in the cells, which limits the speed and puts a lower limit on the power consumption. The area consumption is also larger if the design contains special function blocks that do not exist in the standard cell library. Finally, full-custom is the design style that offers the most valuable experience in designing VLSI chips.

During the correlator chip design, an interactive layout editor (Magic) with circuit extraction, CIF and GDS-II file generation and on-line design rule checking capabilities, together with functional and timing simulators (Esim, Rnl, Spice), is used as the set of computer aided design (CAD) tools [4], [5]. These tools run on SUN workstations (SUN 3/50, SUN 3/160, SUN 3/110) under the 4.3 BSD UNIX operating system. In addition to these tools, the Spiceview, Spiceplot and CIFplot programs have been developed for viewing the Spice outputs on the SUN workstations and for plotting the Spice outputs and the design layout (from the CIF file) on a CALCOMP plotter.

# 2. THE LOGIC AND THE CIRCUIT DESIGN

#### 2.1 Introduction

The architecture of the BAC128 chip is composed of blocks and modules arranged in a hierarchical structure, which reduces the complexity of the system design [6]. The architecture is shown as a simplified block diagram in figure 2.1. It is basically composed of shift, reference, mask, status and threshold registers, a comparator, an integrator, a decision maker and a controller. For 128-bit correlation or PRBS generation, it becomes a slave chip and operates in the slave mode, whereas for 256-bit correlation, it is a master chip and operates in the master mode. A microprocessor ($\mu P$) can access all the registers in the BAC128 chip through the 8-bit data bus and the 3-bit address bus. Each of the shift, reference and mask registers is loaded serially, 8 bits at a time, in 16 clock cycles. Similarly, the threshold register is loaded in two clock cycles and the status register in a single cycle. The most significant 8 bits of the shift register, the integrator output and three bits of the status register, which hold the states of the clock signal and of the decision maker outputs, are readable through the data bus. The detailed description of the BAC128 architecture design is found in [3].
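As a behavioral illustration of the data path described above, the following Python sketch models a masked correlation with an error threshold. It is only a sketch under stated assumptions: the names `correlate` and `load_cycles`, the convention that a mask bit of 1 selects a bit position, and the rule used for the inverted sync-word are illustrative choices made here, not details taken from the BAC128 design, which is defined in [3].

```python
def correlate(shift_bits, reference_bits, mask_bits, threshold):
    """Behavioral sketch of a masked correlator (assumed conventions).

    shift_bits, reference_bits, mask_bits: equal-length sequences of 0/1.
    mask_bits[i] == 1 is ASSUMED to mean "bit i takes part in the correlation".
    threshold: number of tolerable bit errors.
    Returns (agreements, sync_detected, inverted_sync_detected).
    """
    if not (len(shift_bits) == len(reference_bits) == len(mask_bits)):
        raise ValueError("all registers must have the same length")

    selected = sum(mask_bits)
    # Comparator + integrator: count selected positions where data equals reference.
    agreements = sum(
        1
        for s, r, m in zip(shift_bits, reference_bits, mask_bits)
        if m and s == r
    )
    errors = selected - agreements
    # Decision maker (assumed rule): accept when the error count is within the threshold.
    sync_detected = errors <= threshold
    # An inverted sync-word disagrees with the reference in nearly every selected position.
    inverted_sync_detected = agreements <= threshold
    return agreements, sync_detected, inverted_sync_detected


def load_cycles(register_bits=128, bus_bits=8):
    """Number of bus-wide writes needed to fill a register."""
    return register_bits // bus_bits
```

With a 128-bit register and the 8-bit data bus, `load_cycles()` gives the 16 clock cycles quoted in the text.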

In CMOS technology there are various logic structures: complementary static, dynamic, pseudo-nMOS, domino and clocked CMOS. Among these, the complementary static logic structure is selected for BAC128 because it takes less design time and is more reliable and easier to design with than the other logic structures. Therefore, this logic structure provides a greater probability of a first-time successful chip than the other logic structures.

Figure 2.1: Simplified architecture design of the correlator.

In the logic and circuit design of BAC128, extensive effort is spent on realizing the architectural design with the minimum number of gates and transistors, in order to minimize the silicon area and to increase the performance. In the logic design, the logic blocks are functionally merged to reduce the gate count. That is, instead of designing the logic blocks separately, a number of logic blocks that are functionally related to each other are designed as a single logic block. In this way, the gate counts and the propagation delays of the logic blocks are reduced. In the circuit design, some of the logic blocks are designed at the transistor level rather than at the gate level to reduce the number of transistors in them. Also, the connection structure of the transistors is designed to minimize the gate delay, the body effect and charge sharing, all of which affect the performance of the circuit [7]. The body effect is the term given to the change in the threshold voltage of a transistor caused by a change in the bias between its source and the substrate (bulk). The body effect is reduced by placing the transistors whose gates receive the latest arriving signals nearest to the output of the gate. The switching time of a gate is decreased by reducing the source and drain capacitances at the output of the gate; therefore, parallel connected transistors are placed nearest to the ground node. Where these transistor placement strategies coincide, the layout of the circuit is considered for area minimization. The following section describes the gate and transistor count reduction in the blocks of the architecture design.

#### 2.2 Gate and transistor reduction

The 1's counter found in the integrator block is composed of half and full adders connected in an inverse binary tree structure. The integration delay is mainly due to the sum and carry propagation delays in the half and full adders. The aim is to remove the inverters at the sum and carry outputs of the adders and still obtain a logic that performs correct addition when the adders are connected in the inverse binary tree structure. For this purpose, a functional analysis is made on the adders. A typical full adder circuit diagram is shown in figure 2.2. It has a carry stage and a sum stage. The output function, $F_C$, of the carry stage and the output function, $F_S$, of the sum stage are given by,

$$F_C = C (A + B) + AB$$
  

$$F_S = ABC + \overline{F_C} (A + B + C)$$

Figure 2.2: A typical full adder circuit diagram.

If all the inputs are inverted, the output function, $G_C$, of the carry stage and the output function, $G_S$, of the sum stage are found as follows. For the carry stage,

$$G_C = \overline{C} (\overline{A} + \overline{B}) + \overline{A}\,\overline{B}$$

$$\begin{aligned}
\overline{G_C} &= \overline{\overline{C} (\overline{A} + \overline{B})} \bullet \overline{\overline{A}\,\overline{B}} \\
&= (C + \overline{\overline{A} + \overline{B}}) \bullet (A + B) \\
&= (C + AB) \bullet (A + B) \\
&= C (A + B) + AB \\
&= F_C.
\end{aligned}$$

For the sum stage, using $G_C = \overline{F_C}$ from the carry stage result,

$$G_S = \overline{A}\,\overline{B}\,\overline{C} + \overline{G_C} (\overline{A} + \overline{B} + \overline{C})$$

$$\begin{aligned}
\overline{G_S} &= \overline{\overline{A}\,\overline{B}\,\overline{C}} \bullet (G_C + \overline{\overline{A} + \overline{B} + \overline{C}}) \\
&= (A + B + C) \bullet (ABC + G_C) \\
&= ABC + (A + B + C)\, G_C \\
&= ABC + (A + B + C)\, \overline{F_C} \\
&= F_S.
\end{aligned}$$

The result is that if all the inputs of a full adder (A, B and C) are inverted, the sum and carry outputs are also inverted. So, if the inverters at the outputs of the sum and carry stages are removed, then the outputs will be non-inverted for inverted inputs and inverted for non-inverted inputs. The full adder whose output inverters are removed is called FA. Its circuit diagram is given in Appendix D.
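The inversion property derived above can also be confirmed by exhaustive enumeration. The following Python sketch evaluates $F_C$ and $F_S$ as written in the text for all eight input combinations; the function names are chosen here for illustration only.

```python
from itertools import product


def carry(a, b, c):
    """Carry stage function F_C = C(A + B) + AB."""
    return (c & (a | b)) | (a & b)


def sum_(a, b, c):
    """Sum stage function F_S = ABC + NOT(F_C)(A + B + C)."""
    return (a & b & c) | ((1 - carry(a, b, c)) & (a | b | c))


def check_inversion_property():
    """Return True if inverting all inputs inverts both outputs,
    and if (carry, sum) really is the binary sum of the inputs."""
    for a, b, c in product((0, 1), repeat=3):
        na, nb, nc = 1 - a, 1 - b, 1 - c
        if carry(na, nb, nc) != 1 - carry(a, b, c):
            return False
        if sum_(na, nb, nc) != 1 - sum_(a, b, c):
            return False
        if a + b + c != 2 * carry(a, b, c) + sum_(a, b, c):
            return False
    return True
```

Calling `check_inversion_property()` returns `True`, in agreement with $\overline{G_C} = F_C$ and $\overline{G_S} = F_S$.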

The half adder is designed by simplifying the full adder (FA). When the inputs A and B are inverted, the half adder's carry and sum outputs will not be non-inverted as in the FA, because the carry input, which is fixed at logic 0, cannot be inverted. Therefore, two types of half adders are designed: half adder-even (HAE) and half adder-odd (HAO). HAE outputs inverted sum and carry with non-inverted inputs and is designed by simplifying FA for C = 0. HAO outputs non-inverted sum and carry with inverted inputs and is designed by simplifying FA for C = 1. The HAE and HAO half adder circuits are given in Appendix D.

Figure 2.3: Architecture of the 1's counter.
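The behavior of the two half adder types can be sketched in a few lines of Python. This is a purely behavioral model written for illustration (the function names are hypothetical and nothing here reflects the transistor level circuits of Appendix D): the FA cell is modeled as a full adder that delivers inverted sum and carry, and HAE and HAO are obtained by tying its carry input to 0 and 1 respectively.

```python
def fa_inverting(a, b, c):
    """Behavioral model of the FA cell (full adder without output inverters).

    Returns (NOT sum, NOT carry) of a + b + c.
    """
    total = a + b + c
    return 1 - (total & 1), 1 - (total >> 1)


def hae(a, b):
    """HAE: FA simplified for C = 0. True inputs in, inverted outputs out."""
    return fa_inverting(a, b, 0)


def hao(not_a, not_b):
    """HAO: FA simplified for C = 1. Inverted inputs in, true outputs out."""
    return fa_inverting(not_a, not_b, 1)
```

For example, `hae(1, 1)` returns `(1, 0)`, the inverted form of sum 0 and carry 1, while `hao(0, 0)`, which receives the inverted inputs of the same addition, returns `(0, 1)` directly.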

So, the adders in one stage of the 1's counter tree generate inverted sum outputs for the next stage, and the outputs of the next stage adders become non-inverted. The architecture of the 1's counter with these adders is given in figure 2.3. The 8-bit adder for 256-bit correlation is not shown in the figure.

In the 1's counter logic, there are 107 HAEs, 21 HAOs, 127 FAs and 128 inverters. From the circuit diagrams of the HAE, HAO, FA and inverter, the total number of transistors is 4404. If the 1's counter logic were designed by using half and full adders with non-inverted inputs and outputs, there would be 128 half adders and 127 full adders in the logic design and a total of 5092 transistors. With the current design, 688 transistors are saved. Although 688 transistors constitute a small percentage of the overall transistor count, it will be shown in Section 3.5 that the reduction in the delay and the area of the 1's counter is considerable.
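The adder totals can be cross-checked by counting the cells of an inverse binary tree. The sketch below is an illustration under two assumptions that are not stated explicitly in the text: every k-bit adder in the tree is a ripple-carry adder made of one half adder and k-1 full adders, and the 8-bit adder used for 256-bit correlation is included in the totals. Under these assumptions the count agrees with the 128 half adders and 127 full adders quoted above.

```python
def ones_counter_adders(n_bits=128, cascade_adder_bits=8):
    """Count half and full adders in an inverse binary tree 1's counter.

    ASSUMPTION: each k-bit adder is a ripple-carry adder built from
    one half adder (least significant bit) and k - 1 full adders.
    cascade_adder_bits: width of the extra adder used for cascading
    two chips (0 to leave it out).
    Returns (half_adders, full_adders).
    """
    half, full = 0, 0
    adders, width = n_bits // 2, 1  # first stage: n/2 one-bit adders
    while adders >= 1:
        half += adders
        full += adders * (width - 1)
        adders //= 2   # each stage halves the number of adders ...
        width += 1     # ... and widens them by one bit
    if cascade_adder_bits:
        half += 1
        full += cascade_adder_bits - 1
    return half, full
```

`ones_counter_adders()` returns `(128, 127)`; without the cascade adder, `ones_counter_adders(128, 0)` returns `(127, 120)`.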

The gate count is reduced also in the decision maker block. It is seen that only the 8-th and the 9-th bits of the 9-bit adder B and only the 9-th bit of the 9-bit adder A are used in the design. Therefore, full adders are used in 9-bit adders only for the sum and carry outputs. For unused bits, the carry stages of the full adders (FA) are used. The number of transistors both in 9-bit adder A and adder B is 210. If the decision maker logic had been designed by using the carry stages of the full adders with non-inverted outputs, 252 transistors would have been required. Only 42 transistors are saved, but again, the delay is considerably reduced (see Section 3.5).

In the comparator block, each comparator logic has an EX-NOR and an OR gate. These two gates can be implemented with 4 NAND, 2 INV and a NOR gate, namely with 24 transistors. The comparator is designed at the transistor level by only 10 transistors and called COMPARE. Since the overall comparator block uses 128 comparator logics, it can be seen that 1792 transistors are saved in total. The circuit diagram of the comparator (COMPARE) is given in Appendix D.
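The transistor counts quoted above can be checked with a short calculation. It assumes two-input static CMOS NAND and NOR gates of 4 transistors each and inverters of 2 transistors each, which is the usual complementary static implementation rather than something stated in the text:

$$4 \times 4 + 2 \times 2 + 1 \times 4 = 24, \qquad (24 - 10) \times 128 = 1792.$$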

The circuit diagrams of the elementary logic blocks which are used in the construction of complicated logic blocks are found in Appendix D. The logic diagrams of the blocks used in the correlator can be found in [3].

# 3. THE LAYOUT DESIGN

#### 3.1 Introduction

In the layout design, the blocks and the modules of the architecture design are replaced by cells. The layouts are designed in a structured hierarchy. The layout editing starts from the basic cells that are found at the lowest level of the hierarchy. It proceeds with the construction of the higher level cells, which use the lower level cells as their instances. Finally, it is completed at the top level with the routing of the pads and the highest level cells.

The stick diagram representation is used during the design of the lowest level cells. The diagram is composed of colored sticks drawn by hand and each color represents a layer. The stick diagram gives information about the placement of the transistors, the layers connecting the transistors to each other and the cell area. This information is used during the floor planning and the layout editing.

The layouts are drawn on a SUN 3/110 color workstation with the aid of the layout editor Magic. All the information about the design rules [8], the CIF and GDSII (Calma) codes and the process parameters of the chip fabricator is given to Magic for design rule checking, CIF (and GDSII) file generation and circuit extraction from the layout. The design rules are the restrictions on the sizes of the layers and on their interactions with each other. These rules are given to Magic in a file together with the process parameters, which give the electrical characteristics of the layers for the circuit extraction. The circuit extractor generates a circuit description file that lists the transistors with their sizes, the node connections and the node capacitances.

This file is used by the simulators Esim, Rnl and Spice. Esim is a switch level simulator that models the transistors as switches that are either on or off. Rnl is an event driven logic simulator that models the transistors as resistors and performs timing simulation as well as functional simulation. Spice is a general purpose circuit simulator that produces accurate simulations. Esim and Rnl are very fast compared to Spice. In the simulations of the cells, Spice is therefore used only for cells having fewer than 300 transistors, and Esim is used for the simulation of the whole chip. After the layout of BAC128 is completed, CIF and GDSII files are generated. The CIF file is used for plotting the layout of BAC128 and the GDSII file is sent to the fabricator for production. The fabricator of the BAC128 chip is IMEC, in Belgium.

An illustrative layout of an inverter is drawn using Magic. After the layers are drawn, the CIF file is generated and plotted on the CALCOMP plotter using the CIFplot program; the result is shown in figure 3.1. In this layout, the masks of the 3-micron double-metal CMOS technology are shown as colored rectangles. The masks and their corresponding colors, CIF codes and GDS-II levels are tabulated in Appendix B. Different kinds of abstract layers can be formed from these masks. The abstract layers as used in Magic and their mask compositions are shown below.

| Layer\Mask      | N-Well       | P+           | Active       | Poly         | Contact      | Via          | Metal-1      | Metal-2      |
|-----------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|
| Poly            |              |              |              | $\checkmark$ |              |              |              |              |
| Metal-1         |              |              |              |              |              |              | $\checkmark$ |              |
| Metal-2         |              |              |              |              |              |              |              | $\checkmark$ |
| N-diff.         |              |              | $\checkmark$ |              |              |              |              |              |
| P-diff.         | $\checkmark$ | $\checkmark$ | $\checkmark$ |              |              |              |              |              |
| N-subs. diff.   | $\checkmark$ |              | $\checkmark$ |              |              |              |              |              |
| P-subs. diff.   |              | $\checkmark$ | $\checkmark$ |              |              |              |              |              |
| Metal-2 contact |              |              |              |              |              | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| Poly contact    |              |              |              | $\checkmark$ | $\checkmark$ |              | $\checkmark$ |              |
| N-diff. contact |              |              | $\checkmark$ |              | $\checkmark$ |              | $\checkmark$ |              |
| P-diff. contact | $\checkmark$ | $\checkmark$ | $\checkmark$ |              | $\checkmark$ |              | $\checkmark$ |              |
| N-subs. contact | $\checkmark$ |              | $\checkmark$ |              | $\checkmark$ |              | $\checkmark$ |              |
| P-subs. contact |              | $\checkmark$ | $\checkmark$ |              | $\checkmark$ |              | $\checkmark$ |              |

Poly, subs. and diff. stand for polysilicon, substrate and diffusion, respectively.



Figure 3.1: Illustrative layout of an inverter.

#### 3.2 Determination of the transistor sizes

The transistor sizes are calculated from the speed constraint, which requires the correlation process to be completed in $5\,\mu$sec. The correlation process starts with the falling edge of the clock and ends when the signal at the input of the DLATCHR (resettable D-latch) cell is valid. The most time consuming step of the correlation is the integration, in which the logic 1's at the outputs of the comparator are added. Considering the shift register, comparator, integrator and decision maker blocks, a simplified block diagram is drawn in figure 3.2 for delay estimation. The diagram contains the cells that are used in the correlation. The path with the maximum propagation delay can be found from this block diagram as follows. First, the maximum delay of each cell is determined from the cell circuit diagrams and expressed in terms of a newly defined unit. Then, a signal at the master flip-flop output is assumed to start propagating through the cells as soon as the clock falls, following the path along which it is delayed the most. The delays of the cells on this path are added up. Finally, the sum of the delays on the path, taken up to the input of the DLATCHR cell, has to be less than $5\,\mu$sec.
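The procedure described above amounts to a longest path search in an acyclic graph of cells. The following Python sketch shows one way to express it. The function name, the cell names and the small example graph are hypothetical and serve only as an illustration; they do not reproduce the block diagram of figure 3.2.

```python
from functools import lru_cache


def max_path_delay(cell_delay, fanout, start, end):
    """Largest accumulated delay from `start` to `end` in an acyclic cell graph.

    cell_delay: dict mapping a cell name to its delay.
    fanout: dict mapping a cell name to the cells driven by its output.
    Returns None if `end` cannot be reached from `start`.
    """
    @lru_cache(maxsize=None)
    def longest_from(cell):
        if cell == end:
            return cell_delay[cell]
        downstream = [longest_from(nxt) for nxt in fanout.get(cell, ())]
        reachable = [d for d in downstream if d is not None]
        if not reachable:
            return None
        return cell_delay[cell] + max(reachable)

    return longest_from(start)


# Hypothetical toy graph, delays in arbitrary units (NOT the BAC128 netlist).
EXAMPLE_DELAY = {"ff": 1.5, "cmp": 1.25, "add_sum": 1.25, "add_carry": 0.75, "latch": 0.0}
EXAMPLE_FANOUT = {
    "ff": ["cmp"],
    "cmp": ["add_sum", "add_carry"],
    "add_sum": ["latch"],
    "add_carry": ["latch"],
}
```

For this toy graph, `max_path_delay(EXAMPLE_DELAY, EXAMPLE_FANOUT, "ff", "latch")` returns `4.0`, the delay of the path through `add_sum`.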

A unit, $\delta$, is defined as the switching time (rising or falling output) of a transistor, with the assumption that the rise and fall times of n-type and p-type transistors are equal. The switching time $\delta$ depends on the drain-source resistance of the transistor and on the capacitance between the drain and ground. Approximately, the worst case rise time (fall time) of a gate is linearly proportional to the maximum number of p-type (n-type) transistors in series connecting the output of the gate to the supply rail. The average gate delay is defined as

$$\tau_{ave} = \frac{t_{rise} + t_{fall}}{4}$$

For example, a NAND gate has two n-type serial and two p-type parallel connected MOSFETs. Its worst case rise time is  $\delta$ , the fall time is  $2\delta$  and  $\tau_{ave}$  is 0.75 $\delta$ . In a similar way, the worst case switching times and average gate delays of MSDFF, COMPARE, HAE, HAO, FA, CRRY, INV and MUX cells can be calculated. The results are tabulated below.



Figure 3.2: The block diagram used in the delay analysis.

Worst case switching times and average gate delays of adder cells:

|              | HAE Sum      | HAE Carry    | HAO Sum      | HAO Carry    | FA Sum    | FA Carry  | CRRY Carry | INV         |
|--------------|--------------|--------------|--------------|--------------|-----------|-----------|------------|-------------|
| $t_{rise}$   | $2\delta$    | $\delta$     | $2\delta$    | $2\delta$    | $4\delta$ | $2\delta$ | $2\delta$  | $\delta$    |
| $t_{fall}$   | $3\delta$    | $2\delta$    | $3\delta$    | $\delta$     | $4\delta$ | $2\delta$ | $2\delta$  | $\delta$    |
| $\tau_{ave}$ | $1.25\delta$ | $0.75\delta$ | $1.25\delta$ | $0.75\delta$ | $2\delta$ | $\delta$  | $\delta$   | $0.5\delta$ |

Worst case switching times and average gate delays of MSDFF, COMPARE and MUX cells:

|              | MSDFF       | COMPARE      | MUX         |
|--------------|-------------|--------------|-------------|
| $t_{rise}$   | $3\delta$   | $2\delta$    | $\delta$    |
| $t_{fall}$   | $3\delta$   | $3\delta$    | $\delta$    |
| $\tau_{ave}$ | $1.5\delta$ | $1.25\delta$ | $0.5\delta$ |
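The tabulated average delays follow directly from the definition $\tau_{ave} = (t_{rise} + t_{fall})/4$. The short Python sketch below recomputes them for a subset of the cells, with all times expressed in units of $\delta$; the dictionary and function names are chosen here for illustration.

```python
# Worst case (t_rise, t_fall) in units of delta, copied from the tables above.
SWITCHING_TIMES = {
    "HAE sum": (2, 3), "HAE carry": (1, 2),
    "HAO sum": (2, 3), "HAO carry": (2, 1),
    "FA sum": (4, 4), "FA carry": (2, 2),
    "INV": (1, 1), "MSDFF": (3, 3), "COMPARE": (2, 3), "MUX": (1, 1),
}


def average_gate_delay(t_rise, t_fall):
    """Average gate delay tau_ave = (t_rise + t_fall) / 4, in units of delta."""
    return (t_rise + t_fall) / 4


AVERAGE_DELAYS = {
    cell: average_gate_delay(*times) for cell, times in SWITCHING_TIMES.items()
}
```

For the NAND gate example of the text, `average_gate_delay(1, 2)` gives `0.75`, and `AVERAGE_DELAYS["FA sum"]` gives `2.0`, matching the table.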

In figure 3.2, a sufficient number of cells to estimate the delay is shown. The cells that are not shown in the figure have delays identical to those of the cells shown. Using the cell delays tabulated above, the average gate delays are summed and written at the output of each cell. The path with the maximum delay is drawn with a heavy line. The delays of the I/O operations are excluded, since they make up a very small percentage of the overall delay. For the 128-bit and 256-bit correlation processes, the total average gate delays are calculated to be 27.5$\delta$ and 31.5$\delta$, respectively.
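As a rough worked example, if the whole $5\,\mu$sec budget stated at the beginning of this section were available to the critical path with no margin (an assumption made here only for illustration, and applied to both the 128-bit and the 256-bit case), these totals would bound the unit delay by

$$\delta \le \frac{5\,\mu\text{sec}}{27.5} \approx 182\ \text{nsec} \quad (128\text{-bit}), \qquad \delta \le \frac{5\,\mu\text{sec}}{31.5} \approx 159\ \text{nsec} \quad (256\text{-bit}).$$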

To calculate the transistor sizes, the cells and gates are modeled as two complementary MOSFETs with different input functions. The rise and fall times of a complementary CMOS inverter with step input can be found as [7],

$$t_{rise[fall]} = \frac{2C_{OUT}}{\beta_{p[n]}(Vdd - |V_{Tp[Tn]}|)} \left[ \frac{|V_{Tp[Tn]}| - 0.1Vdd}{Vdd - |V_{Tp[Tn]}|} + \frac{1}{2} \ln \left( 19 - \frac{20|V_{Tp[Tn]}|}{Vdd} \right) \right]$$

Substituting the maximum threshold magnitude  $|V_{Tp[Tn]}| = 1.2V$  and  $Vdd = 5V$  yields,

$$t_{rise[fall]} \cong 0.8 \frac{C_{OUT}}{\beta_{p[n]}}$$
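The numerical coefficient can be checked by evaluating the step-input formula directly. The sketch below is a minimal check, not part of the original analysis; the function name `switching_coefficient` is introduced here for illustration only.

```python
import math

def switching_coefficient(vdd, vt):
    """Return k (in 1/V) such that t_rise[fall] = k * C_OUT / beta."""
    vt = abs(vt)
    bracket = (vt - 0.1 * vdd) / (vdd - vt) + 0.5 * math.log(19 - 20 * vt / vdd)
    return 2 / (vdd - vt) * bracket

k = switching_coefficient(vdd=5.0, vt=1.2)
print(f"k = {k:.3f}")  # about 0.795, which rounds to the 0.8 used in the text
```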

Switching times (rise and fall times) of the inverter can be made equal by setting  $\beta_p$  equal to  $\beta_n$ . From the process parameters [8],  $K_n \approx 3K_p$ . This implies that the p-MOSFET should have a gate width three times larger than that of the n-MOSFET. The full adder cell, which has the most capacitive input and output, is selected to calculate the maximum value of  $\delta$ . The sum output of the full adder is connected to 4 p-type and 4 n-type MOSFET gates of the next stage full adder. Two p-type and two n-type MOSFET drains are connected to the sum output node of the full adder. The total output capacitance can be calculated as,

$$C_{OUT} = 4(C_{g_n} + C_{g_p}) + 2(C_{d_p} + C_{d_n})$$

The design rules permit a gate area of  $L_n = 3\mu m$  by  $W_n = 3\mu m$  for a minimum size MOSFET. For a  $3\mu m$  channel width of the n-type MOSFET, a  $9\mu m$  channel width of the p-type MOSFET is required to equalize the  $\beta$  values of the two complementary MOSFETs ( $\beta_n = \beta_p$ ). However, because of the **bird's beak** problem, which strongly affects the transistor  $\beta$  value, the n-channel width is increased to  $7\mu m$ . As a result, the fall time is approximately halved. Figure 3.3 shows the layouts of the n-type and p-type minimum size MOSFETs. The gate and diffusion capacitances are calculated using the process parameters [8].

Gate and diffusion capacitances are calculated as,

$$\begin{aligned}
C_{g_n} &= C_{OX}(3 \times 7) \cong 17fF \\
C_{g_p} &= C_{OX}(3 \times 9) \cong 22fF \\
C_{d_n} &= CA_n(7 \times 9) + CP_n(14 + 18) \cong 25fF \\
C_{d_p} &= CA_p(9 \times 9) + CP_p(18 + 18) \cong 28fF \\
C_{OUT} &= 4(C_{g_n} + C_{g_p}) + 2(C_{d_n} + C_{d_p}) \cong 262fF
\end{aligned}$$

Since the rise and fall times are no longer the same, the switching time  $\delta$  is calculated from the rise time to obtain the worst case delay, and  $\beta_n$  is assumed to be equal to  $\beta_p$ . The worst case  $\beta_p$  value is,

$$\beta_p = K_p \frac{W_p}{L_p} = 11 \times 10^{-6} \frac{9 \times 10^{-6}}{3 \times 10^{-6}} = 33 \times 10^{-6} A/V^2$$
$$\delta \cong 0.8 \frac{C_{OUT}}{\beta_p} \cong 0.8 \frac{262 \times 10^{-15}}{33 \times 10^{-6}} \cong 6.4 nsec$$

The average total delay becomes 176nsec and 202nsec for 128-bit and 256-bit correlation, respectively, and these results are far below the specified  $5\mu sec$ .
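The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses only values quoted in the text; rounding  $\delta$  to one decimal before multiplying is an assumption made here because it reproduces the quoted totals.

```python
# Capacitances in fF, as quoted in the text.
cg_n, cg_p, cd_n, cd_p = 17, 22, 25, 28
c_out_fF = 4 * (cg_n + cg_p) + 2 * (cd_n + cd_p)       # 262 fF

beta_p = 11e-6 * (9 / 3)                                # K_p * W_p / L_p, in A/V^2

# delta = 0.8 * C_OUT / beta_p, converted to ns and rounded as in the text (6.4 ns).
delta_ns = round(0.8 * c_out_fF * 1e-15 / beta_p * 1e9, 1)

for bits, n_delta in ((128, 27.5), (256, 31.5)):
    print(f"{bits}-bit correlation: {n_delta * delta_ns:.0f} ns")
```

Both totals come out roughly a factor of 25 below the  $5\mu sec$  specification.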

The analysis shows that minimum size transistors can be used in the cells of the BAC128 chip. In the analysis, the delays due to I/O and wiring RCs are excluded, on the assumption that they are much smaller than the delay of the cascaded full adders in the integrator block (INT). The operations of the other blocks are also assumed to be non-critical with respect to the timing specifications of the BAC128. More accurate delay calculations will be made after the cell layouts are drawn and simulated. The transistor sizes are subject to change if the simulation results do not satisfy the timing specifications.

#### 3.3 Floor planning

In floor planning, the problem is to position the logic blocks so that the chip area is minimum and square shaped. The difficulty, besides the complexity of the block placement, is where to start. The sizes of the sub-modules, and hence the sizes and shapes of the blocks, are unknown, so a floor plan cannot be made directly. The block sizes and shapes cannot even be estimated without knowing the sizes of the sub-modules. One starting point might be to design several cells for the sub-modules with different sizes and shapes, then to construct several blocks of different sizes and, finally, to choose the best floor plan among the combinations of the blocks created. Although this procedure might give good results, the design time would be excessive because of the large number of cell designs and block location combinations.

The chip area depends not only on the orientation and sizes of the blocks, but also on the wiring channels between the cells and the blocks. Therefore, it is also necessary to minimize the routing area. Routing area minimization is taken as the starting point for the floor planning, since the number of blocks and the number of wires used in the inter-block connections are known.

An interconnection model, shown in figure 3.4, is made by dividing the architecture design into blocks. This model shows the number of wires used to connect the blocks to each other. It has 9 nodes representing the shift (SH), reference (REF), mask (MSK), threshold (TH) and status (ST) register blocks, the comparator (CMP), the integrator (INT), the decision maker (DM) and the controller (CNTL) blocks. The CNTL node has connections with all the other nodes except the CMP node; however, these connections are not shown in the model.

A square shaped chip area is divided into 9 rectangular regions and the regions are assigned to the blocks. Each region shares an edge with the regions of the blocks to which its own block is connected. The idea is to place the blocks which have the maximum number of connections among them as close as possible to each other. Figure 3.5 (a) shows the initial placement. For a slightly more realistic view of the floor plan, the block sizes are simply guessed using the number of gates in each block. As a result, the block size inequalities are found to be INT>SH=REF=MSK>CMP>CNTL>DM>TH>ST. The redrawn floor plan is shown in figure 3.5 (b).

Figure 3.4: Blocks interconnection model.

Figure 3.5: Initial floor plans. (a) Initial placement. (b) Placement after block size approximation.

Although long runs of 128-bit lines are avoided in the floor plan in figure 3.5 (b) by squeezing them among the SH, REF, MSK and CMP blocks, the routing of  $3 \times 128$  wires among these blocks becomes so complex that the wiring channels occupy a very large area.  $3 \times 128$  wires in the metal-1 layer occupy a width of at least  $2700 \mu m$ . A solution to this problem is found by merging the SH, REF, MSK and CMP blocks into a block called SRMC. The SRMC block consists of 128 U-blocks in an  $8 \times 16$  array structure, and each U-block holds a single bit from each of the SH, REF, MSK and CMP blocks. The routing of the single bits from the SH, REF, MSK and CMP blocks is made inside the U-block, again using the blocks interconnection model above. Figure 3.6 shows the U-block structure and the array structure of the U-blocks.

Each bit of the SH, REF and MSK blocks is an MSDFF cell, and each bit of the CMP block is a COMPARE cell. The input signals DS (for the shift register), DR (for the reference register) and DM (for the mask register) are connected separately to the D inputs of the MSDFF cells. A two-phase clock is applied to each MSDFF in the U-block. The input and output signals are located around the U-block so that the U-blocks can be abutted horizontally and vertically (array structure) without any need for a wiring channel between them. The COMPARE cell inputs are the Q and QB outputs of the MSDFF cells.



Figure 3.6: U-block and array structure of U-blocks.

In the array structure of the U-blocks (SRMC block), the serial data input to the shift register is the node SI, which is connected to node DS0 of block U0. The serial input data propagate through the array via U0, U1, ... U127 and leave the array at node SO. When the 128-bit data held in the shift register at a fixed time is viewed, the most significant bit is found in block U0 and the least significant bit in block U127. The reference and mask registers are loaded from the data bus in 16 cycles. The least significant 8-bit word is loaded in the first cycle and the most significant 8-bit word in the 16th cycle. The 8-bit word is shifted from left to right along the registers. The most significant bit of the data bus (D7) is connected to the inputs of the registers in block U0, where the most significant bit of the data in the shift register is compared. In this way, the correct bits of the reference, mask and shift registers are compared in the COMPARE cells. The shift register can also be loaded through the data bus in 16 cycles. D7 is connected to the shift register input in block U0, D6 to U16, ... D0 to U112.
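The 16-cycle parallel load can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not a description of the actual circuit: it assumes that the eight data-bus bits feed the first block of eight 16-block rows (a stride of 16, which is what the D7 to U0 and D0 to U112 endpoints suggest) and that every row shifts by one block per load cycle. The function name `load_register` is hypothetical.

```python
ROWS, COLS = 8, 16   # 8 data-bus bits, 16 U-blocks per row (ASSUMED row layout)

def load_register(words):
    """Shift sixteen 8-bit words into a 128-cell register, first word first.

    ASSUMPTION: D7 feeds block U0, D6 feeds U16, ..., D0 feeds U112,
    and each 16-block row shifts one position per load cycle.
    """
    cells = [0] * (ROWS * COLS)               # cells[i] models block Ui
    for word in words:
        for row in range(ROWS):
            base = row * COLS
            bit = (word >> (7 - row)) & 1     # row 0 takes D7, row 7 takes D0
            # Shift the row one block to the right, then insert the new bit.
            cells[base + 1:base + COLS] = cells[base:base + COLS - 1]
            cells[base] = bit
    return cells

# First word carries only D0 = 1, last word carries only D7 = 1.
cells = load_register([0x01] + [0x00] * 14 + [0x80])
print("U0 =", cells[0], " U127 =", cells[127])
```

Under these assumptions, D0 of the first (least significant) word ends in U127 and D7 of the last (most significant) word ends in U0, which agrees with the bit ordering described above.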

After drawing the stick diagrams of the MSDFF cells and the COMPARE cells, the SRMC block size is estimated from the cell stick diagram sizes. The following table gives the sizes.

| CELL    | Height       | Width        |
|---------|--------------|--------------|
| MSDFF   | $90 \mu m$   | $150 \mu m$  |
| COMPARE | $70 \mu m$   | $100 \mu m$  |
| U       | $350 \mu m$  | $150 \mu m$  |
| SRMC    | $2400 \mu m$ | $2800 \mu m$ |

ESTIMATED SIZES OF CELLS IN SRMC BLOCK

SRMC block size does not include 128-bit comparator output width which is at least  $900\mu m$ . Including this width, SRMC block size becomes  $2400\mu m \times 3700\mu m$ .

The integrator (INT block) is now to be placed next to the SRMC block. The stick diagrams of the cells used in the adders are drawn and their sizes are estimated.

| CELL | Height     | Width       |
|------|------------|-------------|
| INV  | $60 \mu m$ | $30 \mu m$  |
| HAE  | $60 \mu m$ | $100 \mu m$ |
| HAO  | $60 \mu m$ | $100 \mu m$ |
| FA   | $70 \mu m$ | $200 \mu m$ |

ESTIMATED CELL SIZES FOR ADDERS

It can be seen that two 1-bit adders (HAE cell with inverted sum output) and a 2-bit adder (HAE, FA and INV cells), placed side by side, require a width of about 590 $\mu$ m, while four U-blocks in the array have a width of 600 $\mu$ m. It was therefore decided to embed these adders (the 1-bit and 2-bit adders, which are the first two stages of the 1's counter) into the SRMC block; see figure 3.7. The 128 lines from the 128 COMPARE cells are thus reduced to 96 lines, the outputs of 32 2-bit adders. Consequently, the 900 $\mu$ m width previously reserved for the wiring channel now holds both the wiring channel of the 96 lines, which is about 600 $\mu$ m wide, and the first two stages of the 1's counter. The SRMC block now has a low fraction of its area devoted to wiring channels among the blocks. The exact size of the SRMC block is required for the placement of the other cells and blocks, whose shapes and sizes depend on the size of the SRMC block. The layouts of the MSDFF, comparator, inverter, half adder and full adder cells are drawn, and then the U-block and the SRMC block are constructed. The exact size of the SRMC block layout is found to be  $2130\mu m \times 4170\mu m$ . The SRMC layout can be found in the BAC128 layout plot in Appendix A and Appendix E.

Figure 3.7: 1-bit and 2-bit adders embedded into the SRMC block

All the adders of the 1's counter in the integrator block (INT) are constructed by cascading half and full adder cells. The stick diagrams are drawn and the sizes of the long, thin adder cells are estimated. The 3-bit and 4-bit adders of the INT block are placed perpendicular to the SRMC block and separated in the middle of the INT block to minimize the wiring channel width between the SRMC block and these adders. The 5, 6, 7 and 8-bit adders are placed parallel to the SRMC block, below the 3-bit and 4-bit adders. The adder cell orientations and their estimated sizes (in microns) are shown in figure 3.8. The routing is to be done using two metal layers. Both metal-1 and metal-2 are used in the wiring channels to reduce the channel width. Metal-1 lines are assumed to be  $5\mu m$  wide with  $4\mu m$  separation, and metal-2 lines  $7\mu m$  wide with  $5\mu m$  separation [8]. The wiring channel sizes between the adders are estimated by calculating the width of the maximum number of lines that may run parallel to the channel. The results are rounded up to integers divisible by 5. Below, a line in a rectangular wiring channel is called orthogonal to the channel if it runs parallel to the short side of the channel, and parallel to the channel if it runs parallel to the long side. The channel between the SRMC block and the 3-bit adders may contain at most 24 metal-1 lines parallel to the SRMC block, and the wiring channel width is calculated as  $220 \mu m$ . There are 6 metal-1 lines parallel to the channel next to the 3-bit adder, and this channel width is  $55\mu m$ . The 3-bit adders are connected to the 4-bit adders by 4 metal-2 lines orthogonal to the channel and 4 metal-1 lines parallel to the 4-bit adder, occupying  $10\mu m$  width. The 4-bit adders are connected to the 5-bit adders in metal-2. The metal-2 lines, which are laid over the metal-1 lines in the wiring channel between the 3-bit and 4-bit adders, create a wiring channel between the 4-bit and 5-bit adders. In this channel, at most 5 metal-2 lines may run parallel to the channel and 2 metal-1 lines run parallel to each 5-bit adder. This wiring channel width is  $80\mu m$ . The wiring channel below the 5-bit adders has two sets of 6 metal-1 lines connecting the outputs of the four 5-bit adders to the inputs of the two 6-bit adders. 12 lines may run parallel to the channel, and the channel width is  $110\mu m$ . The two 6-bit adders are connected to the 7-bit adder by 14 metal-1 lines, which may run parallel to the channel and occupy  $130\mu m$ . Finally, the channel below the 8-bit adder carries 8 metal-2 lines from the 7-bit adder outputs, running parallel to the channel, which is  $100\mu m$  wide.

Figure 3.8: Placement of 3, 4, 5, 6, 7 and 8-bit adders and their sizes.
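Most of the channel widths quoted above follow from the line count, the assumed line width and separation, and the rounding rule. The sketch below reproduces them under the assumption that each line occupies one pitch (width plus separation); the helper name `channel_width` is introduced here for illustration.

```python
import math

# Pitch = line width + separation, in microns, from the assumed metal rules.
PITCH_UM = {"metal1": 5 + 4, "metal2": 7 + 5}

def channel_width(metal1_lines=0, metal2_lines=0):
    """Estimated channel width: sum of line pitches, rounded up to a multiple of 5 um."""
    raw = metal1_lines * PITCH_UM["metal1"] + metal2_lines * PITCH_UM["metal2"]
    return 5 * math.ceil(raw / 5)

print(channel_width(metal1_lines=24))                   # SRMC to 3-bit adders: 220
print(channel_width(metal1_lines=6))                    # next to a 3-bit adder: 55
print(channel_width(metal1_lines=2, metal2_lines=5))    # 4-bit to 5-bit adders: 80
print(channel_width(metal1_lines=12))                   # below the 5-bit adders: 110
print(channel_width(metal1_lines=14))                   # 6-bit to 7-bit adder: 130
print(channel_width(metal2_lines=8))                    # below the 8-bit adder: 100
```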

After drawing the stick diagram of the controller block (CNTL) and estimating its size as  $500\mu m \times 300\mu m$ , it is seen that the CNTL block can be placed inside the INT block, as indicated in figure 3.8. At this point of the floor planning, the adders are well arranged on the floor plan with feasible wiring channel sizes. The threshold (TH) and decision maker (DM) blocks are merged (THDM block) in order to minimize the length of the 16-bit lines running from the TH block to the DM block. The internal structures of the TH, DM and STATUS blocks are designed after completing the layout drawings of the SRMC, CNTL and INT blocks. The latest floor plan is shown in figure 3.9. This floor plan will continue to change as the layouts of the blocks are drawn and as other components, such as muxes and buffers, are added to the chip during the layout drawings. With the present floor plan, the chip area without the muxes, buffers, pads, power rails and their routing is estimated as  $4mm \times 4.2mm$ .

Figure 3.9: Updated floor plan

#### 3.4 Latch-up, metal migration, and noise

Latch-up in bulk CMOS integrated circuits occurs because of the existence of parasitic **pnpn** paths in this structure [9]. The probability of latch-up is high at places where large currents flow through the devices. The I/O pads have electrostatic discharge protection devices and guard rings around the wide transistors, and their layout designs require special techniques and design rules. The I/O pads in the BAC128 chip are taken as standard cells from the IMEC standard cell library. While designing the cell layouts, the following rules that reduce the possibility of latch-up are applied:

- Each n-well is tied to VDD by n-well contacts.
- One substrate contact is placed for every supply connection and for at least every five transistors.
- The substrate contacts are placed as close as possible to the supply rails (VDD and GND).
- N-type and p-type transistors are laid out close to GND and VDD rails respectively.

If the current density in a conductor exceeds a threshold value, then metal migration (electromigration) occurs [10]. Electromigration is the transport of the metal ions through a conductor by the transfer of momentum from electrons to the positive metal ions. This causes a void or a break in the conductor. The design rules provide limitations on the current density for conductors in order to avoid electromigration. For example, according to the design rules in [8], current density in metal-1 conductor, whose thickness is between 1.05  $\mu m$  and 1.4 $\mu m$ , must not exceed  $800\mu A/\mu m$ . Therefore,  $10\mu m$  wide metal-1 can carry at most 8mA current. Consequently, in the layout designs, the layers in which large currents may exist (power rails) should be made wide enough to avoid metal migration. The currents in these layers are determined from the simulation results.

Noise margin is a measure of allowable noise voltage on the input of a gate, which will not affect the output state. It is specified in terms of low noise margin,  $NM_L$ , and high noise margin,  $NM_H$ , given by,

$$NM_L = |V_{ILmax} - V_{OLmax}|$$
$$NM_H = |V_{OHmin} - V_{IHmin}|$$

where  $V_{IL}$ ,  $V_{OL}$ ,  $V_{OH}$  and  $V_{IH}$  are the low input, low output, high output and high input voltages of a gate, respectively. These voltages are found from the input  $(V_I)$ -output  $(V_O)$  transfer characteristic of the gate [7].  $V_{IL}$  is the solution of the equation  $(\partial V_O / \partial V_I) = -1$  at  $V_I = V_{IL}$ , while the pmosfet(s) and nmosfet(s) are operating in the linear and saturated regions, respectively.  $V_{IH}$  is the solution of the equation  $(\partial V_O / \partial V_I) = -1$  at  $V_I = V_{IH}$ , while the pmosfet(s) and nmosfet(s) are operating in the saturated and linear regions, respectively.

If either  $NM_L$  or  $NM_H$  of a gate is found to be less than about 0.1VDD, then the gate may easily be affected by the noise that may exist on its inputs. The noise margins of an inverter, a 3-input NAND gate, and 2-input NAND and NOR gates are calculated.  $7\mu m$  and  $9\mu m$  channel widths are used for the n-type and p-type transistors, respectively, with  $3\mu m$  channel length for both. The NAND and NOR gate noise margins are calculated by using the transfer characteristic of the inverter derived in [7]. The inputs of each NAND gate and NOR gate are tied together, so that each gate forms an inverter. The transistors in series are considered as one transistor with its  $\beta$  value scaled by (1/no. of mosfets in series), and the transistors in parallel are considered as one transistor with its  $\beta$  value scaled by (no. of mosfets in parallel). The scaling results are summarized in the table below.

| $\beta$ of | inverter  | n-input NAND | n-input NOR |
|------------|-----------|--------------|-------------|
| pmos       | $\beta_p$ | $n\beta_p$   | $\beta_p/n$ |
| nmos       | $\beta_n$ | $\beta_n/n$  | $n\beta_n$  |
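The scaling rules can be combined with the standard expression for the switching threshold of a CMOS inverter (both devices saturated) to estimate the gate threshold of each equivalent inverter. The sketch below is illustrative only: the 1.0 V threshold magnitudes and the  $K_n = 3K_p$  ratio are placeholder assumptions, not the Appendix C Spice parameters, and the function names are introduced here.

```python
import math

def gate_threshold(beta_n, beta_p, vdd=5.0, vtn=1.0, vtp=1.0):
    """Switching threshold of a CMOS inverter with both devices saturated.

    vtn and vtp are threshold magnitudes; the 1.0 V defaults are placeholders,
    NOT the Appendix C Spice parameters.
    """
    r = math.sqrt(beta_n / beta_p)
    return (vdd - vtp + r * vtn) / (1 + r)

def equivalent_betas(kind, n, beta_n, beta_p):
    """Collapse an n-input gate with tied inputs into one equivalent inverter."""
    if kind == "inv":
        return beta_n, beta_p
    if kind == "nand":      # n-MOSFETs in series, p-MOSFETs in parallel
        return beta_n / n, n * beta_p
    if kind == "nor":       # n-MOSFETs in parallel, p-MOSFETs in series
        return n * beta_n, beta_p / n
    raise ValueError(kind)

# ASSUMED relative betas: K_n = 3 K_p, W_n = 7 um, W_p = 9 um, L = 3 um.
BETA_N, BETA_P = 3 * 7 / 3, 1 * 9 / 3

for kind, n in (("inv", 1), ("nand", 2), ("nor", 2), ("nand", 3)):
    bn, bp = equivalent_betas(kind, n, BETA_N, BETA_P)
    print(f"{n}-input {kind}: V_TG ~ {gate_threshold(bn, bp):.2f} V")
```

Only the ratio  $\beta_n/\beta_p$  enters the threshold, so the absolute value of  $K_p$  does not matter for this estimate. With these placeholder values the thresholds rise in the order NOR, inverter, 2-input NAND, 3-input NAND.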

The noise margins are found for minimum, nominal and maximum threshold and  $K_P$  values of **Spice** parameters (see Appendix C). The results are verified by simulations and tabulated below. Gate threshold values are also included in the table.

|          |      | INV.  | 2 - NAND | 2 - NOR | 3 - NAND |
|----------|------|-------|----------|---------|----------|
|          | min. | 2.05V | 2.70V    | 1.50V   | 3.07V    |
| $V_{TG}$ | nom. | 2.12V | 2.67V    | 1.66V   | 2.98V    |
|          | max. | 2.19V | 2.64V    | 1.82V   | 2.89V    |
|          | min. | 1.42V | 2.25V    | 0.89V   | 2.74V    |
| $NM_L$   | nom. | 1.63V | 2.33V    | 1.16V   | 2.73V    |
|          | max. | 1.84V | 2.41V    | 1.43V   | 2.74V    |
|          | min. | 2.66V | 1.81V    | 3.35V   | 1.35V    |
| $NM_H$   | nom. | 2.60V | 1.88V    | 3.20V   | 1.50V    |
|          | max. | 2.53V | 1.94V    | 3.03V   | 1.64V    |



Figure 3.10: (a) Capacitive coupling (b) Resistive coupling

None of the cells in BAC128 has more than three mosfets in series or more than two mosfets in parallel between the supply and the output of the cell. Therefore, the nominal noise margins of each cell will be more than 1.16V for the low noise margin and 1.5V for the high noise margin. So, it can be concluded that the noise immunity of the cells is high enough if the 0.1VDD criterion is considered.

There are two main sources of noise in digital MOS circuits: capacitive coupling and resistive coupling [11]. Figure 3.10 (a) shows a part of a circuit where a coupling capacitance  $C_{AB}$  exists between the nodes A and B. The logic transitions at node A cause noise on node B by means of  $C_{AB}$ . The noise coupling can be reduced by decreasing  $C_{AB}$  and the resistance  $R_B$ , and by increasing  $C_B$ . Many coupling capacitances that exist in the layouts are decreased by reducing the number of overlaps, decreasing the overlapping areas, and avoiding long overlaps between the layers. The metal-2 layer, which has very low overlap and fringe capacitances with the other layers, is laid out without considering the lower level layers beneath it. For long runs of metal-2 and metal-1 wires in parallel, overlapping of these two layers is avoided wherever possible.

Resistive coupling, as a source of noise, is the result of resistive feedback in the GND and VDD power rails. Figure 3.10 (b) shows a typical resistive coupling that causes noise through the effect of one gate on the other. While node A is high, node C may change its state to low if  $C_L$  discharges and causes a voltage drop,  $V_P$ , on  $R_P$  greater than the nmosfet threshold voltage. In the layouts, the sources of the transistors (the diffusion layers, which have large resistances compared to the metal layer resistances) are kept as short as possible, and the power rail connections are made in metal instead of diffusion in order to reduce the resistive coupling. Also, contact cut resistances are reduced by using multiple cuts at the places where large currents flow (clock drivers, buffers).

#### 3.5 The layouts and the simulations of the lowest level cells

There are ten lowest level cells used in the chip; inv, hae, hao, carry, fa, msdff, compare, dlatchr, mux21, dec37. These cells are used in the layout design of the higher level cells and they are the layout designs of the corresponding elementary modules in the logic and circuit design. The layouts of the cells in BAC128 are not designed for general purpose usage. Each cell has its own layout characteristics, such as the location of cuts, vias, extension of p+ implant and n-well masks to the cell boundary, and even unused areas between the diffusion regions. The layouts of these cells are shown in Appendix D.

For the simulation of each cell, Magic's circuit extractor is used. The extracted layout file is converted to the Spice format and transistor model parameters that are supported by IMEC are appended. The Spice transistor parameters are given in Appendix C. In Appendix D, worst case propagation delays and drive capability for loaded and unloaded (intrinsic) outputs, node capacitances, dynamic power dissipation and Spice output waveforms are given. Worst case speed simulations are done for rise/fall times and propagation delays. 500fF load capacitance ( $C_L$ ), is used to simulate the capacitance of the interconnection node between the cells.  $C_L$  is assumed to be higher than the value of the wiring capacitance plus the input capacitance of a cell. The most capacitive cell input has 230fF capacitance (CI input of fa cell), which is less than 500fF. dec37 cell is not considered here, because its inputs which have capacitances higher than 400fF are driven by the buffers in the pad cells. Therefore, the assumption is valid as long as the wiring capacitance does not exceed 270fF and wired OR's do not exist at the output of the cells.

For example, it can be calculated that  $4\mu m$  wide  $1500\mu m$  long metal-1,  $3\mu m$  wide  $1200\mu m$  long polysilicon and  $7\mu m$  wide  $1300\mu m$  long metal-2 layers have approximately 270fF wiring capacitances.

The gate capacitances of the transistors are added to the input wiring capacitance at the input nodes of a cell and listed in Appendix D under Node Capacitances. In the calculation of the maximum gate capacitances, the channel length for both p-type and n-type mosfets is taken as  $3.2\mu m$ . Nominal Spice parameters are used in the simulations for dynamic power dissipation. During these simulations, the output nodes of the cells are loaded with 500fF capacitance, and output signals at 1MHz with 50% duty cycle are considered.

In Section 3.2 (determination of the transistor sizes), the delay in the 128-bit and 256-bit correlation has been calculated. More accurate delay is calculated in this section using the simulation results of the cells. The calculations are carried out in the same way as described in Section 3.2. The result is that, as soon as PHI falls, the correlation result ready to be latched is found in 450nsec for 128-bit correlation and in 526nsec for 256-bit correlation. Therefore, assuming a 50% duty cycle PHI clock signal, minimum period should be about  $1\mu$ sec, which implies a maximum clock frequency of about 1MHz for 128-bit correlation.

In the logic and circuit design section, the transistor count of the 1's counter and decision maker logic designs was reduced. If the 1's counter logic and decision maker logic were designed by using half adders, full adders and carry stages of the full adders with non-inverted outputs, it can be calculated as in Section 3.2 that the overall delay from the falling edge of the clock to the D-latch input would be 579nsec for 128-bit correlation and 664nsec for 256-bit correlation. These results are 28.7% and 26.2% slower than the present design results for 128-bit and 256-bit correlation, respectively. Also, it has been calculated in the logic and circuit design section that such a design would require 730 more transistors, which might occupy an area of about  $700 \times 700 \ \mu m^2$ .
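The quoted percentages and the clock limit can be reproduced from the delays above. The sketch assumes, as in the text, a 50% duty cycle PHI clock, so that the correlation result must settle within half a clock period; the function names are introduced here for illustration.

```python
present_ns = {128: 450, 256: 526}        # present design, from cell simulations
alternative_ns = {128: 579, 256: 664}    # design with non-inverted adder outputs

def slowdown_percent(present, alternative):
    """How much slower the alternative design is, in percent."""
    return 100 * (alternative / present - 1)

def max_clock_mhz(delay_ns):
    """Upper bound on PHI frequency, ASSUMING the result settles in half a period."""
    return 1e3 / (2 * delay_ns)

for bits in (128, 256):
    print(f"{bits}-bit: {slowdown_percent(present_ns[bits], alternative_ns[bits]):.1f}% slower, "
          f"f_max ~ {max_clock_mhz(present_ns[bits]):.2f} MHz")
```

The half-period bound gives about 1.1 MHz for 128-bit correlation, in line with the 1MHz figure quoted for a  $1\mu$sec period, and slightly under 1 MHz for 256-bit correlation.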

#### 3.6 Higher level cell layouts and routing

In the layout hierarchy, higher level cells are designed by using the lowest level cells, as explained in the previous section. The higher level cells and their routing are laid out with the aid of the updated floor plan drawn in the floor planning section. The higher level cell layouts given in this section are the final layout drawings, reached after a number of repeated cell displacements and layout modifications for area minimization. As a result, the layout hierarchy becomes completely different from the hierarchy of the chip logic design [3]. In the layout design, therefore, the layouts of the gates that belong to a certain block in the logic design may not belong to the cell representing that block. The final locations of the higher level cells and the routing of the higher level cells are given in figure 3.11 and figure 3.12, respectively, at the end of the section. The top level cell layouts are plotted in Appendix E. The layout hierarchy of these cells and the complete layout of BAC128 can be found in Appendix A.

The higher level cell design is started with the most area consuming top level cell, **srmc**, and its instances. The **srmc** cell is the implementation of the SRMC block in the floor plan and it has an array structure of dimension  $8 \times 16$ . The array elements are the **u** cells, which consist of master slave D-type flipflop (**msdff**) cells and comparator (**compare**) cells. The first two stages of the 1's counter are embedded into the array. The arrangement of the cells has been determined during the floor planning.

Each u cell is composed of three msdff cells and a compare cell. The 1-bit and 2-bit adders are laid out in the add12 cell, which includes two add1b (1-bit adder) cells and a single add2b (2-bit adder) cell. The add1b cell is actually a hae (half adder-even) cell with inverted sum output. This cell is created to match the length of four u cells to the add12 cell length. The hae and inv (inverter) cells are used in constructing the add1b cell with some modifications to reduce the cell area; therefore, these cells are not found as subcells of the add1b cell. The srmc cell has 96 metal-1 lines, which are grouped as 12-24-24-24-12 and run vertically in the cell. These lines are the outputs of the 2-bit adders and the inputs to the 3-bit adders, which are to be located in the add34 cells below the srmc cell. The horizontal metal-2 lines are the interconnections among the shift registers, compare and add12 cells.

Before completing the srmc cell layout, the power rail widths are checked for current carrying capability, assuming a chip operating frequency of 1MHz. In the u cell, there are four pairs of power rails: three pairs from the three msdff cells and one pair from the compare cell. When the u cells are arranged side by side in the array structure, the power rails extend and supply 16 u cells in a column. Each cell in the **u** cell has  $7\mu m$  wide metal-1 lines for the VDD and GND power lines, and each of these lines has to supply the total current drawn by 16 msdff cells or 16 compare cells. From the simulation results, the average dynamic currents of the msdff cell and the compare cell are  $7.2\mu A$  and  $6.4\mu A$ , respectively. Consequently, 16 msdff cells draw  $115.2\mu A$  and 16 compare cells draw  $102.4\mu A$ . The design rules state that the maximum current density must not exceed  $800\mu A/\mu m$ , or  $5600\mu A$  for a  $7\mu m$  wide metal-1 line. Therefore, there is no need to increase the power rail widths of the u cell. In the add12 cell, there are three inv cells, three hae cells and a fa (full adder) cell. The average dynamic current ratings of these cells are  $3\mu A$  for the inv cell,  $11.5\mu A$  for the hae cell and  $20\mu A$  for the fa cell. A single add12 cell draws  $63.5\mu A$ . The 4 add12 cells which are located adjacent to 16 u cells draw  $254\mu A$  in total; therefore, there is no need to increase the power rail widths of the add12 cell either. It should be noted that a load capacitance of 500fF has been used in the simulations of the lowest level cells. The cells used in the srmc cell may have load capacitances higher than 500fF, especially due to the wiring capacitances. Even if the load capacitances, and hence the average dynamic currents, were doubled, the new current ratings would still be much smaller than  $5600\mu A$ . In Section 3.9, the power dissipation of the cells that have load capacitances larger than 500fF is calculated more accurately.
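The rail check above amounts to comparing summed average currents against the electromigration limit of the rail. A minimal sketch using only the figures quoted in the text follows; the variable and function names are introduced here for illustration.

```python
LIMIT_UA_PER_UM = 800      # metal-1 current limit per micron of width (design rules)
RAIL_WIDTH_UM = 7          # power rail width used in the cells

# Average dynamic currents at 1 MHz with 500 fF loads, in microamps.
I_AVG_UA = {"msdff": 7.2, "compare": 6.4, "inv": 3.0, "hae": 11.5, "fa": 20.0}

def rail_limit_ua(width_um):
    """Maximum current a metal-1 rail of the given width may carry."""
    return LIMIT_UA_PER_UM * width_um

# One add12 cell: three inverters, three half adders and one full adder.
add12_ua = 3 * I_AVG_UA["inv"] + 3 * I_AVG_UA["hae"] + I_AVG_UA["fa"]

loads_ua = {
    "16 msdff": 16 * I_AVG_UA["msdff"],
    "16 compare": 16 * I_AVG_UA["compare"],
    "4 add12": 4 * add12_ua,
}

limit = rail_limit_ua(RAIL_WIDTH_UM)
for name, current in loads_ua.items():
    # Doubling the current models the pessimistic case of doubled load capacitance.
    print(f"{name}: {current:.1f} uA, doubled {2 * current:.1f} uA, limit {limit} uA")
```

Even the doubled currents use less than a tenth of the  $5600\mu A$  limit.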

The exact size of the srmc cell is used in the floor plan as a fixed sized block and the adders of the INTEG block are placed relative to it. The INTEG block in the floor plan is implemented by two top level cells; add34 and add5678. add34 cell has two 3-bit adders (add3b) and a 4-bit adder (add4b). add5678 cell has four 5-bit adders (add5b), two 6-bit adders (add6b), a 7-bit adder (add7b) and an 8-bit adder (add8b). Each n-bit adder cell consists of inverter, half adder and full adder cells which are the lowest level cells. An n-bit adder is constructed by placing one half adder followed by (n-1) full adders and an inverter for the carry output. As n increases, the length of the n-bit adder increases but the height remains the same as the height of a full adder (fa) cell.

add7b cell outputs are buffered to drive the input/output pad cells, multiplexers which are placed in the thdm cell, add8b cell inputs and the tristate buffers through which add7b cell outputs are put on the data bus. The buffers are laid out in the add5678 cell and the remaining area is used as the wiring channel. The spacing between add34 cells in the middle of the layout drawing is left to be occupied by the cntl cell. Therefore, the wiring channel between add34 cells and the add5678 cell is expanded in order to achieve the routing of cntl cell through this wiring channel.

The power rail to which the most cells are connected in the add5678 cell is the one that feeds the four add5b cells. There are four inv, 16 fa and four hae cells in the four add5b cells. The total average dynamic current at 1MHz is calculated as $383\mu A$, which is much smaller than $5600\mu A$. Therefore, the power rail widths ($7\mu m$ metal-1) in the add34 and add5678 cells are not changed. Power dissipation and the current ratings of the buffers in the add5678 cell are simulated in Section 3.9 and the power rail widths of the buffers are also checked in that section.

The CNTL block is laid out in the **cntl** cell. The logic that prevents the read and write (R, W) signals from both being active, and generates the RD and WR signals, is located near the R and W signal input pad cells. The **cntl** cell is located approximately at the center of the chip with its output lines running over the entire chip to control the cells. Therefore, the outputs of the **cntl** cell are buffered to drive the large capacitive loads. These buffers are also used in the generation of the two-phase clock for writing data to the threshold registers, the signals to the tristate buffers (EN and ENB) for the shift register read operation and finally, the signals to the select inputs of the 2-to-1 multiplexers for the TEST operation.

The THDM block layout is drawn in the thdm cell. In the thdm cell, the crry9a cell is constructed by arranging 6 carry stages of the full adders (crry cells) side by side and attaching two fa cells at the end. The cell crry9b is constructed by placing 7 crry cells side by side and attaching a fa cell at the end. The msdff cells which are used in the threshold registers could not simply be placed side by side, because the area devoted to this cell in the floor plan is not wide enough. Therefore, each register is placed as two rows of four msdff cells. The cells crry9a and crry9b are located above the two threshold registers. The two-phase clock of the msdff cells for the write operation is generated in the cntl cell and carried into the thdm cell. The power rail widths are not checked separately, because the number of cells sharing a pair of power rails is less than that in the srmc and add5678 cells. Therefore, the rail widths are sufficient to supply the cells.

The STATUS block is implemented in the status cell. The status cell contains three non-inverting tristate buffers, five msdff and five mux21 cells. The tristate buffers drive three bits of the data bus whenever the read status register operation is invoked. An inverter is used to generate the ENB signal for the tristate buffers and the two-phase clock is generated by another inverter in the status cell. The non-inverting tristate buffer is created as a cell, nitbuf, whose layout and simulation results are given in Appendix D.

Prior to the routing of the top level cells, the multiplexers of the shift register are placed at the right side of the srmc cell (see figure 3.11 at the end of the section). The multiplexers of the reference and mask registers and the non-inverting tristate buffers of the shift register are placed in pairs below the srmc cell. The multiplexer at the output of the shift register is placed at the upper left corner of the srmc cell.

In the wiring channel between the srmc and add34 cells, there are 96 metal-1 wires connecting the srmc cell outputs to the add34 cells. Input/output and control signals of the mux21 and nitbuf cells below srmc run in metal-2 layers and cross the 96 metal-1 wires. The 8-bit data bus, the 3-bit address lines for the cntl cell, three pairs of two-phase clock signals, the clock driver inputs and a wide VDD power rail are also found in the channel. The channel is used efficiently without leaving any free area in it. The data bus is extended up and down at the lower right corner of the **srmc** cell to reach the shift register multiplexers, the data bus pads on the right and the top, the status register inputs/outputs and the threshold register inputs. At the lower left corner of the **srmc** cell, the data bus turns down and reaches the outputs of the **add5678** cell.

One of the wide wiring channels in the chip is found between the add34 cells and the add5678 cell. The channel is wide because the cntl cell is located in between the add34 cells and uses the channel to transfer the control signals to the thdm and status cells via metal-1 lines. Also, the clock (CLK), read (R) and write (W) input signals from the pads, located at the left side, run in this channel.

The third wide wiring channel is in the add5678 cell below the add8b cell and below the buffers of the add7b cell. This channel is used for the connection of the adder outputs to the data bus, to the input/output pads of the 1's counter, to the add8b cell and to the inverters and multiplexers in the thdm cell.

The remaining routings are completed with modifications on the wide wiring channels for area reduction. Three clock drivers are placed above the status cell, partially occupying the wide wiring channel between the srmc cell and add34 cells. Clock driver transistor sizes are determined considering the area in Section 3.7. The routings are shown in the chip routing plot at the end of the section.

There are 28 input/output cells (pad cells) that are distributed around the chip active area in equal numbers (7 pads) on each side. The pad cells are all TTL compatible and selected from the IMEC standard cell library. For some of the pad cells, additional logic is required for invert and enable/disable purposes. The additional logic is laid out below the pad cells so that when the pad cell is placed at one side, the logic layout stands between the pad cell and the chip active area. The distribution of the pad cells together with the higher level cells is shown in the chip routing plot.

For bidirectional data input/output to/from the data bus in the chip, BITT2 cells (Bidirectional inverting input, non-inverting tristate output with 2 TTL drive) from IMEC cell library are used. The BITT2 cells are also used for cascading the two chips to have 256-bit correlation process. For serial data input, clock, read, write, chip select and 3-bit address input signals, inverting input IMEC pad cells IIT are used. The power supply pad cells are IOVDD and IOVSS. The IOVSS cell is for GND connection and placed approximately in the middle of either side. Its location is considerably important from the point of view of latch-up and n-type transistor threshold voltages. The best place to keep the bulk potential in all regions at GND level would be therefore in the middle of either side of the chip. The pad cells OI2 and OT2 are inverting output and non-inverting tristate output cells respectively. Both cells have 2 TTL drive capability. The OI2 cell is for SYNCH signal output and OT2 is for SOUT (serial data output) signal. The electrical characteristics of the pad cells are given in Appendix D.



Figure 3.11: The location of the top level cells in BAC128.



Figure 3.12: Routing of the higher level cells.

#### 3.7 Clock Distribution

A two-phase clocking scheme is used in the BAC128 chip. Three two-phase clocks are generated inside the chip for the shift, reference and mask registers as the signals DPHI-DPHIB, DWREF-DWREFB, and DWMSK-DWMSKB. A two-phase clock is generated from a single clock by simply inverting it. To minimize the clock skew, local clock drivers are designed and the clock signal lines are distributed in metal-1 and metal-2 layers with equal lengths for both phases [12]. In complementary CMOS, clock skew occurs in two different ways. One is the overlapping of the two-phase clocks at high or low levels and the other is the slow transition times of both clock phases [7].

The clock skew effect is analysed for the registers, each having more than 20pF clock capacitance per phase. In the analysis, two cascade connected msdff cells are used. Transistors in the cells are modeled as switches that are either ON or OFF. A skewed two-phase clock is applied to the input. One period of the clocks is divided into the intervals in which the transistors are found at different states. The intervals of the input clocks and the mixed level logic representation (mosfets and gates) of the two msdffs for each interval are shown in figure 3.13.

For correct shift operation,  $Q_i$  node must not change the state of the node  $QB_{i+1}$ . And  $O_i$  node must not change the state of the node  $OB_{i+1}$ . The overlapping time,  $T_{OV}$ , during which a path from node  $Q_i$  to node  $QB_{i+1}$  and node  $O_i$  to node  $OB_{i+1}$  exists, covers the intervals B,C,D and F,G,H. Dominant intervals for the clock skew are C and G. In these intervals, the clocked transistors are found at their minimum channel resistances. In the time interval C the propagation delay between nodes  $Q_i$  and  $QB_{i+1}$  can be approximated as,

$$\begin{aligned}
\tau_C &= \frac{1}{2} \left[ t_{fall}(OB_{i+1}) + t_{rise}(O_{i+1}) + t_{fall}(QB_{i+1}) \right] \\
&= 0.4 \left[ 2 \frac{C_{OB_{i+1}}}{\beta_n} + \frac{C_{O_{i+1}}}{\beta_p} + 2 \frac{C_{QB_{i+1}}}{\beta_n} \right] \\
&= 3.3 \, nsec.
\end{aligned}$$



Figure 3.13: Clock skew analysis. The states of the msdff cells are shown for all intervals. (Panels: the two-phase skewed clock input to the MSDFFs, with the half period divided into intervals; the logic diagram of an MSDFF.)



Figure 3.14: The logic diagram of the clock driver.

For the time interval G, the delay is,

$$\begin{aligned}
\tau_G &= \frac{1}{2} \left[ t_{rise}(QB_{i}) + t_{fall}(Q_{i}) + t_{rise}(OB_{i+1}) \right] \\
&= 0.4 \left[ 2 \frac{C_{QB_{i}}}{\beta_{p}} + \frac{C_{Q_{i}}}{\beta_{n}} + 2 \frac{C_{OB_{i+1}}}{\beta_{p}} \right] \\
&= 5.1 \, nsec.
\end{aligned}$$

Therefore, the overlapping of the clocks at high level should be less than 3.3 nsec and overlapping at low level should be less than 5.1 nsec. In the other intervals of  $T_{OV}$ , the channel resistances are relatively high and they increase the average gate delays of the inverters. The propagation delay between the nodes  $Q_i$  and  $QB_{i+1}$  ( $O_i$  and  $OB_{i+1}$ ) is higher in the B and D (F and H) intervals than the delay in the C (G) interval. Therefore, the influence of the intervals B,D,F,H to the clock skew is small.

The BAC128 chip can operate at most at 1MHz clock frequency. Therefore the clock rise and fall times are not critical from the point of speed considerations. Rather, they are considered for the clock skew problem. The area of the chip can also affect the size and hence the rise/fall times of the clock drivers. After the blocks are placed and the inter-block routings are completed, the clock drivers are fit into a restricted area which is not used either for the wiring channels or for the cells. (See locations of the top level cells Section 3.6). Three identical clock drivers are placed in this area; one for shift registers, one for reference registers and one for the mask registers. In figure 3.14, the logic diagram and the transistor sizes of a clock driver are shown. The layout can be found in Appendix D.

The clock driver is simulated after calculating its load capacitance. The clock driver load capacitance is the sum of the gate capacitances of 128 msdff cells plus the wiring capacitances. A single msdff cell has 177fF clock capacitance per phase (see the msdff cell electrical characteristics in Appendix D), and for 128 msdff cells the clock load is about 23pF per phase. The clock distribution wires run about $4500\mu m$ in metal-2 with $7\mu m$ thickness and about $1500\mu m$ in metal-1 with $7\mu m$ thickness. Using the process parameters [8], the total wiring capacitance is calculated as 1.25pF. In the simulation, the clock driver is loaded by a 25pF capacitance per phase. The simulation results are given in Appendix D.
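The load estimate for one clock driver follows directly from the figures above. The lines below are a small sketch of that arithmetic, assuming only the quoted per-cell clock capacitance and wiring capacitance; the variable names are illustrative.

```python
MSDFF_CLOCK_CAP_FF = 177.0   # clock capacitance per phase of one msdff cell (fF)
N_MSDFF = 128                # msdff cells driven by one clock driver
WIRING_CAP_PF = 1.25         # wiring capacitance quoted in the text (pF)
SIMULATED_LOAD_PF = 25.0     # load used in the clock driver simulation (pF)

# Gate load of all msdff clock inputs, converted from fF to pF.
gate_cap_pf = N_MSDFF * MSDFF_CLOCK_CAP_FF / 1000.0
total_cap_pf = gate_cap_pf + WIRING_CAP_PF
margin_pf = SIMULATED_LOAD_PF - total_cap_pf

print(f"gate load   : {gate_cap_pf:.3f} pF")
print(f"total load  : {total_cap_pf:.3f} pF")
print(f"margin to the simulated 25 pF load: {margin_pf:.3f} pF")
```

The simulated 25pF load is therefore slightly pessimistic with respect to the estimated load per phase.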

The msdff cell is simulated for several intentionally skewed input clocks. The simulations are done for 5nsec, 7nsec and 10nsec of clock overlapping durations ($\Delta T$) to observe the skew effect in the intervals B, C and D only, because $\tau_C < \tau_G$. The clock is the output of the clock drivers with 25nsec rise and fall times (the rise/fall times are the simulation results of the clock drivers). Spice outputs are plotted by the Spiceplot program on a CALCOMP plotter and are given in figure 3.15.



Figure 3.15: Spice plots for clock skew effects. (a) $\Delta T=5$ nsec, (b) $\Delta T=7$ nsec, (c) $\Delta T=10$ nsec.

#### 3.8 Top level cell simulations

At the end of the layout design part, the cells at the top level of the layout hierarchy are simulated by Spice for accurate cell delay determination and by Rnl for a functional check. For the Rnl simulation, test waveforms are applied to the entire chip through the I/O pads and the chip is tested functionally [3]. For the Spice simulation, the top level cells are gathered in groups and each group is simulated separately. The output nodes of the groups are loaded by capacitors whose values are extracted from the layout by the circuit extractor of Magic. Since Spice can handle only a limited number of transistors in a feasible simulation time, the cell count of the groups is reduced so that the simulation result still gives the worst case delay in the group. In Appendix E, the block schematics of the groups with the reduced number of cells are drawn and some of the **Spice** output waveforms of each group are plotted by Spiceplot. The groups are independent of the layout and the logic design hierarchies. The cells used in the groups are taken from the logic design schematics [3]. The simulation results which are obtained from the Spice outputs are summarized in tabular form below.

#### Reference (Mask) register group:

The layout structures of the reference and mask registers are exactly the same and their ROUT and MOUT nodes have almost the same node capacitances (2pF). Therefore, a single simulation run is made for the reference register (the results are the same for the mask register). The simulation results of this group are tabulated below.

| FROM  | DI  | DWREF | DWREF |
|-------|-----|-------|-------|
| TO    | Z   | Q1    | ROUT  |
| DELAY | 2ns | 9ns   | 30ns  |

Shift register group:

| FROM  | SI  | Z   | DPHI | SQ0 | DPHI | SOUT  | RSH   |
|-------|-----|-----|------|-----|------|-------|-------|
| TO    | Z   | D1  | SQ0  | D2  | SOUT | BSOUT | D-BUS |
| DELAY | 3ns | 3ns | 18ns | 3ns | 16ns | 8ns   | 26ns  |

The delay up to the pad output can be calculated by using the I/O cell delay characteristics given in Appendix D. For example, the cell **OT2** has an intrinsic delay of 9.6nsec and an additional delay of 0.12nsec per pF of load. If the load capacitance of the pad is 50pF, then as DPHI falls, the output of the last msdff cell will be delayed by 25nsec up to the cell **OT2** and by 9.6nsec + 50 × 0.12nsec (15.6nsec) at the output. The total delay will be 40.6nsec.
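The pad delay used in this example is a linear function of the load capacitance. A minimal sketch of the calculation, using only the OT2 figures quoted above, is given here; the function name is hypothetical.

```python
def pad_delay_ns(intrinsic_ns, slope_ns_per_pf, load_pf):
    """Linear pad delay model: intrinsic delay plus a load-dependent term."""
    return intrinsic_ns + slope_ns_per_pf * load_pf


# OT2 characteristics and the example load quoted in the text.
OT2_INTRINSIC_NS = 9.6
OT2_SLOPE_NS_PER_PF = 0.12
LOAD_PF = 50.0

# Internal delay from the falling edge of DPHI up to the OT2 input.
INTERNAL_DELAY_NS = 25.0

ot2_delay_ns = pad_delay_ns(OT2_INTRINSIC_NS, OT2_SLOPE_NS_PER_PF, LOAD_PF)
total_delay_ns = INTERNAL_DELAY_NS + ot2_delay_ns

print(f"OT2 delay at {LOAD_PF:.0f} pF: {ot2_delay_ns:.1f} ns")
print(f"total delay to the pad output: {total_delay_ns:.1f} ns")
```

The same function can be evaluated at other load capacitances, for instance the 10pF load assumed in the timing summary of this section.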

#### Threshold registers group:

In this group, the cells are laid exactly the same way as in the **thdm** cell. Therefore, the node capacitances are added only for the cells that are not found in the **thdm** cell. Polysilicon layer resistances at the output of the **msdff** cells are calculated because of their long run, and included into the **Spice** deck. Simulation results are tabulated below.

| FROM  | WT  | WT   | WT   | WT   | WT   |
|-------|-----|------|------|------|------|
| TO    | A6  | B6   | A7B  | B7   | B7B  |
| DELAY | 6ns | 11ns | 15ns | 16ns | 22ns |

#### Status register group:

No cell reduction is made for this group, because of its few number of transistors. Three simulation runs are made for writing, reading and TEST mode operations. The results are tabulated below.

| FROM  | WS    | WS    | WS    | WS    | WS   | WS    |
|-------|-------|-------|-------|-------|------|-------|
| TO    | CLEAR | SOE   | PRBSB | BPRBS | M/S  | BM/SB |
| DELAY | 22ns  | 43ns  | 34ns  | 47ns  | 39ns | 58ns  |
| FROM  | WS    | WS    | RS    | RS    | RS   |       |
| TO    | BWSH  | BWSHB | D0    | D1    | D2   |       |
| DELAY | 36ns  | 46ns  | 32ns  | 31ns  | 31ns |       |

#### **Integrator group:**

The cells for this group are selected for the maximum delay by using the analysis results of Section 3.2. The simulation results are tabulated below.

| FROM  | SQ  | SQ   | SQ   | SQ    | SQ    | SQ    | SQ    | SQ    |
|-------|-----|------|------|-------|-------|-------|-------|-------|
| TO    | CP  | 1.0  | 2.1  | 3.2   | 4.3   | 5.4   | 6.5   | 7.6   |
| DELAY | 8ns | 13ns | 72ns | 102ns | 135ns | 171ns | 210ns | 234ns |

#### Decision maker group:

This group is divided into two parts; 128-bit correlation and 256-bit correlation. The cells are selected by using the results of Section 3.2. The simulation results for both 128-bit and 256-bit correlation are tabulated below.

|       | 128-bit Correlation |      | 256-bit Correlation |      |      |
|-------|---------------------|------|---------------------|------|------|
| FROM  | 7.6                 | 7.6  | INT                 | 7.6  | 7.6  |
| TO    | B7.6                | D    | D6                  | 8.7  | D    |
| DELAY | 14ns                | 52ns | 30ns                | 42ns | 90ns |

#### Controller group:

The complete **cntl** cell and three **clkbuf** cells are included into this group without any cell reduction. Five simulation runs are made for many number of input/output signals in the group. The results of these simulations are tabulated below.

| FROM  | I-WRB | I-WRB | I-WRB  | I-WRB | I-WRB |
|-------|-------|-------|--------|-------|-------|
| TO    | DPHI  | DWREF | DWMSK  | WT    | WS    |
| DELAY | 41ns  | 44ns  | 44ns   | 38ns  | 32ns  |
| FROM  | A-BUS | A-BUS | I-CLKB | I-RDB | I-RDB |
| TO    | INT   | TEST  | DPHI   | RS    | RSH   |
| DELAY | 34ns  | 53ns  | 34ns   | 40ns  | 40ns  |

#### Input/Output pads group:

This group includes all the types of the I/O pad cells used for input and output signals and the logic circuits related to these cells. Three simulation runs are made. The SOUT signal is not included in this group, since it was simulated in the Shift register group above. The A0B, A1B, A2B signals characteristics are calculated from the I/O pad cells' electrical characteristics. The results are tabulated below.

| FROM  | I-CLKB | I-CI | I-SINB | I-CSB | I-RDB | I-RDB | I-WRB | I-WRB | DPHI |
|-------|--------|------|--------|-------|-------|-------|-------|-------|------|
| TO    | CLK    | CI   | SIN    | CS    | RD    | EDOUT | WR    | DI    | SYNC |
| DELAY | 8ns    | 8ns  | 3ns    | 12ns  | 11ns  | 33ns  | 12ns  | 74ns  | 35ns |

The top level cell simulation results give accurate timing characteristics of the BAC128 chip. These characteristics are summarized under shifting, integrating (which includes decision making), latching, write and read operations. The signals considered are CLK, WR, RD and the 3-bit address bus (A0, A1, A2). The minimum durations of the low level and high level logic states of these signals are determined by tracing the group simulation results for the maximum time required to complete an operation. The results are given in the table below. The I/O pad cell delays are included in the calculations with 10pF load capacitances.

| Operation             | Maximum propagation delay path                                                                                  | Time    |
|-----------------------|-----------------------------------------------------------------------------------------------------------------|---------|
| Shifting (128-bit)    | $\text{CLKpad}(\searrow) \rightarrow \text{SOUTpad}$                                                            | 87nsec  |
| Shifting (256-bit)    | $\text{CLKpad}(\searrow) \rightarrow \text{SOUTpad} \rightarrow \text{SINpad} \rightarrow \text{SD0}$           | 106nsec |
| Integration (128-bit) | $\text{CLKpad}(\searrow) \rightarrow \text{Latch input}$                                                        | 286nsec |
| Integration (256-bit) | $\text{CLKpad}(\searrow) \rightarrow \text{Cpad} \rightarrow \text{Cpad} \rightarrow \text{Latch input}$        | 367nsec |
| Latching              | $\text{CLKpad}(\nearrow) \rightarrow \text{SYNCpad}$                                                            | 96nsec  |
| Write                 | $\text{WRpad}(\searrow) \rightarrow \text{DI} \rightarrow \text{SDI}$                                           | 87nsec  |
| Write                 | $\text{WRpad}(\nearrow) \rightarrow \text{WS} \rightarrow \text{BM/SB}$                                         | 100nsec |
| Read                  | $\text{RDpad}(\searrow) \rightarrow \text{RS} \rightarrow \text{D0pad}$                                         | 92nsec  |
| Address decoding      | $\text{AIpad} \rightarrow \text{TEST}$                                                                          | 63nsec  |

After the CLK falls, 87nsec is required for shifting and outputting the data on the SOUT pad for 128-bit correlation. For 256-bit correlation, the data output by the slave chip must be input by the master chip and be ready at the input of the U0 cell, which requires 106nsec. The valid 128-bit shift register output is then compared and integrated, and a decision is made. This operation is completed in 286nsec for 128-bit correlation. For 256-bit correlation, 367nsec is required because of the additional delays due to the I/O operations through the C pads in the slave chip and the master chip. The decision maker output is latched and output by the SYNC pad in 96nsec as the CLK rises. The maximum write time is required for the shift register and for the master/slave mode selection in the status register. As WR falls, the 8-bit data at the inputs of the shift register is ready in 87nsec. As WR rises, the master/slave mode is determined at the output of the status register in 100nsec. The maximum read time is required for the status register, which is 92nsec. Finally, the address (A0-A2) input to the BAC128 requires up to 63nsec to be decoded for the TEST signal.

Using the table above, the minimum low level and high level logic states of the CLK, WR, RD and A0-A2 signals can be found. The CLK signal should be low for at least 373nsec (87nsec+286nsec) for 128-bit correlation and 473nsec (106nsec+367nsec) for 256-bit correlation, also the CLK signal should be high for at least 96nsec. The WR signal should be low and high for at least 87nsec and 100nsec, respectively. The RD signal should be low for at least 92nsec. Finally, the address bits, A0,A1 and A2, should be valid for at least 63 nsec.
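The minimum CLK low and high durations are sums of entries of the timing table. The sketch below repeats that arithmetic; the clock period and the corresponding frequency bound are derived values that are not stated in the text and are given here only as an illustration under the worst case figures.

```python
# Worst case delays from the timing table (nsec).
DELAYS_NS = {
    "shift_128": 87,
    "shift_256": 106,
    "integrate_128": 286,
    "integrate_256": 367,
    "latch": 96,
}


def clock_limits(shift_ns, integrate_ns, latch_ns):
    """Minimum CLK low/high times and the clock period they imply."""
    low_ns = shift_ns + integrate_ns      # shifting, then integration, while CLK is low
    high_ns = latch_ns                    # latching while CLK is high
    period_ns = low_ns + high_ns
    return {
        "low_ns": low_ns,
        "high_ns": high_ns,
        "period_ns": period_ns,
        "f_max_mhz": 1000.0 / period_ns,  # derived bound, not a quoted specification
    }


limits_128 = clock_limits(DELAYS_NS["shift_128"], DELAYS_NS["integrate_128"], DELAYS_NS["latch"])
limits_256 = clock_limits(DELAYS_NS["shift_256"], DELAYS_NS["integrate_256"], DELAYS_NS["latch"])

for name, limits in (("128-bit", limits_128), ("256-bit", limits_256)):
    print(f"{name}: low >= {limits['low_ns']} ns, high >= {limits['high_ns']} ns, "
          f"period >= {limits['period_ns']} ns ({limits['f_max_mhz']:.2f} MHz)")
```

Both derived periods are shorter than the 1000nsec period of the 1MHz operating frequency assumed throughout the design.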

The timing characteristics obtained here are the worst case characteristics. The real characteristics may be considerably better than the present results. The characteristics are also dependent on the process parameters and on the I/O capacitances, which may be different from the I/O capacitances considered in the calculations. Therefore, the BAC128 chip timing characteristics should be measured after fabrication for precise results.

#### 3.9 Power rails

In complementary static CMOS circuits, there are three current components that cause power dissipation: Leakage current, charging and discharging current of load capacitances, and the overlap current.

The leakage currents are the reverse saturation currents of the parasitic diodes at the n-well, p-substrate, n-diffusion and p-diffusion junctions. They cause static power dissipation during which the inputs and outputs of the circuits are at their stable logic levels. Usually, the static power dissipation is not considered because of very low value of the leakage currents (0.1nA - 0.5nA per gate at room temperature) [7].

Charging and discharging currents are the most effective components of the power consumption. These currents are found during the transient of the output logic level, therefore they cause dynamic power dissipation. The power required to charge and discharge a capacitor at the output of a gate,  $C_{OUT}$ , at a switching frequency, f, is given by,

$$P_d = f C_{OUT} V^2$$

where V is the difference between the high and low voltage on the load capacitor. It is assumed that, the energy on the capacitor is dissipated in each period of switching frequency. For complementary static CMOS circuits, V is equal to VDD.

Dynamic power dissipation also occurs due to the overlap current even if the load capacitor does not exist. Overlap current results when both the p-type and n-type transistors are simultaneously in their 'ON' states. Although this state lasts for a very short period of time and hence the average power dissipation becomes very low, overlap currents can cause power dissipation as much as $fC_{OUT}V^2$ for slowly varying switching inputs (especially in I/O buffers) [11].

Two metal layers, metal-1 and metal-2, are used for both VDD and GND power supply lines in the BAC128 chip. All the internal power supply lines of the top level cells, which are $7\mu m$ wide metal-1 layers, are connected to the $21\mu m$ wide power supply lines around the cells. The connection is established either directly, when the $21\mu m$ wide power line is metal-1, or by three vias, when the $21\mu m$ wide power line is metal-2. Multiple vias are used instead of a single large via in the connections, because the current flowing through a via is proportional to its perimeter and the total perimeter of separate vias is larger than that of a single large via. The vias are separated by $4\mu m$ and the length of the three vias is $21\mu m$. The maximum allowed current through the three vias is 2.52mA [8]. The via resistance is less than $0.1\Omega$ and the voltage drop on the via is less than $84\mu V$ for the maximum allowed current. This value causes a negligible drop on VDD, which is 5V. The $21\mu m$ wide power lines among the top level cells are connected to $60\mu m$ wide power lines around the chip active area. At the connections, 18 vias are used and the vias are again separated by $4\mu m$. The pad cells IOVDD and IOVSS are connected to $60\mu m$ wide metal-1 power lines. The design rule related to the current carrying capability of a layer states that the maximum current density should not exceed $800\mu A/\mu m$, that is $5.6mA/7\mu m$, $16.8mA/21\mu m$ or $48mA/60\mu m$, for the metal-1 layer. The distribution of the power rails is shown in figure 3.16.
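The rail limits and the via voltage drop quoted above follow from the current density rule and Ohm's law. The sketch below reproduces them, assuming that the $84\mu V$ figure corresponds to the 2.52mA maximum current being shared equally among the three vias (an interpretation, not a statement of the text); the names are illustrative.

```python
MAX_DENSITY_MA_PER_UM = 0.8       # 800 uA per um of metal-1 width
RAIL_WIDTHS_UM = (7, 21, 60)      # rail widths used in the chip

MAX_VIA_GROUP_CURRENT_MA = 2.52   # maximum allowed current through three vias
N_VIAS = 3
VIA_RESISTANCE_OHM = 0.1          # upper bound on the resistance of one via


def metal1_limit_ma(width_um):
    """Maximum current of a metal-1 rail of the given width."""
    return MAX_DENSITY_MA_PER_UM * width_um


rail_limits_ma = {w: metal1_limit_ma(w) for w in RAIL_WIDTHS_UM}

# Assumed equal current sharing among the vias of one connection.
current_per_via_ma = MAX_VIA_GROUP_CURRENT_MA / N_VIAS
# mA times ohm gives mV; convert to uV.
via_drop_uv = current_per_via_ma * VIA_RESISTANCE_OHM * 1000.0

for width, limit_ma in rail_limits_ma.items():
    print(f"{width} um rail: {limit_ma:.1f} mA")
print(f"per-via current {current_per_via_ma:.2f} mA, drop {via_drop_uv:.0f} uV")
```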

The power dissipation of the lowest level cells has been simulated for 500fF load capacitances at 1MHz operating frequency (Appendix D). As the routings of the top level cells are completed and the routing capacitances are exactly known, it is seen that some of the load capacitances at the cell outputs are greater than 500fF. Therefore, the power consumption of the cells having load capacitances greater than 500fF is calculated and the power rail widths of all the cells are checked for the possibility of metal migration. For the dynamic power consumption due to the transient output voltage of the cells, the power consumption is assumed to be linearly proportional to the load capacitance. For the power consumption due to the slowly varying inputs of the cells (which cause overlapping currents), the power consumption is found by **Spice** simulation. In the calculation of the current drawn by the core cells (active area cells, excluding the I/O pad cells), it is assumed that all the devices in the core change their output logic levels simultaneously at a 1MHz rate. Actually, this situation can never be met, because only one of the mask, reference, threshold or status registers can be accessed at a time. All the devices can change their output logic levels in the TEST mode, but the rate of change in this case is far below 1MHz. The current drawn by the top level cells and the average dynamic power dissipated in the core are given in the table below. The total current rating of the buffers of various sizes in the core, including the nitbuf cells, is also included in the table.

| CURRENT RATINGS OF THE CORE CELLS AT 1MHz |        |
|-------------------------------------------|--------|
| SRMC cell                                 | 10mA   |
| ADD34 cells                               | 2.4mA  |
| ADD5678 cell                              | 1.5mA  |
| STATUS cell                               | 0.15mA |
| THDM cell                                 | 0.35mA |
| CNTL cell                                 | 0.55mA |
| CLKBUF cells                              | 1mA    |
| Buffers                                   | 0.65mA |
| Total current                             | 16.6mA |
| Total power                               | 83mW   |
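The totals in the table can be cross-checked from the individual entries. The following lines are a small sketch of that check, with VDD taken as 5V as stated earlier in this section; the dictionary keys simply repeat the table rows.

```python
VDD_V = 5.0

# Core cell currents at 1 MHz (mA), copied from the table.
CORE_CURRENTS_MA = {
    "SRMC cell": 10.0,
    "ADD34 cells": 2.4,
    "ADD5678 cell": 1.5,
    "STATUS cell": 0.15,
    "THDM cell": 0.35,
    "CNTL cell": 0.55,
    "CLKBUF cells": 1.0,
    "Buffers": 0.65,
}

total_current_ma = sum(CORE_CURRENTS_MA.values())
# mA times V gives mW.
total_power_mw = total_current_ma * VDD_V

print(f"total core current: {total_current_ma:.1f} mA")
print(f"total core power  : {total_power_mw:.0f} mW")
```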

In the I/O pad cells, maximum current flows through the pads when the bidirectional pads behave as output pads. There can be 18 output pad cells that sink a total current of 57.6mA (each sinks 3.2mA to drive two TTL loads at logic 0) to the GND pad through the $60\mu m$ wide metal-1 power rails connected in a ring shape around the core. These output pad cells can draw 28.8mA source current from the VDD pad (each draws 1.6mA to drive two TTL loads at logic 1) through the $60\mu m$ wide metal-1 power rail. The current drawn by the remaining 8 input pad cells is calculated by using $P_d = fC_{OUT}V^2$, where $C_{OUT}$ is the capacitance at the output of the input pad cell and is assumed to be 1pF. The result is $40\mu A$, which can be neglected in the total maximum current flowing in the I/O pad cells. As a result, the total current that will be supplied to BAC128 via the VDD pad is less than 46mA (28.8mA + 16.6mA) and the total current that will be sunk from BAC128 via the GND pad is less than 75mA (57.6mA + 16.6mA).
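The pad current budget can be written out step by step. The sketch below uses $P_d = fC_{OUT}V^2$ for the input pads, so that the average current per pad is $P_d/V = fC_{OUT}V$, and adds the core current to the output pad currents. All figures are those quoted in the text; the names are illustrative.

```python
F_HZ = 1e6                 # operating frequency
VDD_V = 5.0
C_INPUT_PAD_F = 1e-12      # assumed 1 pF at the output of each input pad cell

N_OUTPUT_PADS = 18
N_INPUT_PADS = 8
SINK_PER_PAD_MA = 3.2      # two TTL loads at logic 0
SOURCE_PER_PAD_MA = 1.6    # two TTL loads at logic 1
CORE_CURRENT_MA = 16.6     # total core current at 1 MHz


def dynamic_power_w(f_hz, c_f, v):
    """Dynamic power of a capacitor switched at frequency f: P = f C V^2."""
    return f_hz * c_f * v * v


def average_current_a(f_hz, c_f, v):
    """Average supply current corresponding to the dynamic power: I = P / V."""
    return dynamic_power_w(f_hz, c_f, v) / v


input_pads_ua = N_INPUT_PADS * average_current_a(F_HZ, C_INPUT_PAD_F, VDD_V) * 1e6
sink_ma = N_OUTPUT_PADS * SINK_PER_PAD_MA
source_ma = N_OUTPUT_PADS * SOURCE_PER_PAD_MA

# The input pad current is neglected here, as in the text.
vdd_total_ma = source_ma + CORE_CURRENT_MA
gnd_total_ma = sink_ma + CORE_CURRENT_MA

print(f"input pads: {input_pads_ua:.0f} uA")
print(f"VDD pad: {vdd_total_ma:.1f} mA, GND pad: {gnd_total_ma:.1f} mA")
```

Both totals are below the 48mA limit per $60\mu m$ wide metal-1 rail only for the VDD pad; the GND figure relies on the ring-shaped rail carrying the current along more than one path, as described in the text.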



Figure 3.16: Power rail distribution of the BAC128.

## 4. CONCLUSION

A microprocessor compatible digital 128-bit correlator, which was designed previously [2], is improved at the system design level and implemented in a VLSI chip using full-custom $3\mu$m double metal CMOS technology. The work was carried out jointly by S. Topcu and the author. The system design details, the logic design schematics and the testing of the chip can be found in [3], whereas this study covers the logic and circuit designs, the layout design and the simulations.

The chip is to be placed in a microprocessor based portable data terminal using HF radio communication. It marks the beginning of a synchronous data stream received from a very noisy channel by detecting the synchronization (sync) word. The sync word can be detected for either an inverted or a non-inverted input data stream. Two chips can be cascaded to make a 256-bit correlator. It is fully programmable by a microprocessor to set the number of tolerable errors in detection and to select the bits of the 128-bit (or 256-bit) data stream to be used in the correlation. The latter feature makes the correlator suitable for the detection of distributed sync words and for pseudo random binary sequence (PRBS) generation.

Full-custom design techniques are applied to the layout design to obtain a high performance chip. The design complexity is reduced by constructing a hierarchical structure for both the logic and the layout designs. In order to obtain a small chip size and a lower cost per chip, the number of gates and the number of transistors are reduced by considering the functional relations among the logic blocks and the functions of the logic blocks at the transistor level. The minimum transistor size is determined from a critical path delay analysis. In addition, a proper floor plan is achieved by merging blocks that require too many interconnections among them into a single block and by placing blocks that have a large number of interconnections close to each other. These full-custom design techniques result in the best trade-off among chip size, speed and power consumption. For the purpose of chip area comparison, the layout of the chip is also generated by the CALMP software (automatic layout editor) using standard cells. The resulting silicon area is about 70 sq.mm., obtained after 4.5 hours of CPU time.

The timing simulations of the chip are done by taking advantage of the hierarchical structure of the layout. The cells at the top level are searched for the critical paths, and then, considering the output loading capacitances of each subcell, they are simulated by **Spice**.
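The search for critical paths described above can be pictured as a longest-path search over a delay graph. The sketch below is illustrative only: the node names and delay values are hypothetical, the graph is assumed to be acyclic, and it is not the procedure or tool actually used in this study.

```python
from functools import lru_cache

# Hypothetical top-level delay graph: node -> list of (successor, delay in ns).
EDGES = {
    "in": [("a", 8), ("b", 4)],
    "a": [("out", 19)],
    "b": [("a", 7), ("out", 20)],
    "out": [],
}

def critical_path(edges, start):
    """Return (total delay, path) of the slowest path leaving `start` in an acyclic graph."""
    @lru_cache(maxsize=None)
    def longest(node):
        best = (0, (node,))
        for successor, delay in edges[node]:
            total, path = longest(successor)
            candidate = (delay + total, (node,) + path)
            if candidate[0] > best[0]:
                best = candidate
        return best
    return longest(start)

delay_ns, path = critical_path(EDGES, "in")
print(delay_ns, " -> ".join(path))
```

Only the paths found this way would then need a detailed circuit-level simulation with their actual load capacitances.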

The clock skew is minimized by laying out the clock lines in the metal-1 and metal-2 layers and, for clock lines with large capacitance, by keeping the lines of the two different phases equal in length. The transistor ratios of the two-phase clock drivers are carefully adjusted by **Spice** simulations. Precautions are also taken against latch-up, body effect, charge sharing, metal migration and noise problems.

Throughout this study, Magic is used as the layout editor; Esim, Rnl and Spice as the simulators; and Spiceview, Spiceplot and CIFplot as the plotting programs. The last three programs were written at Bilkent University.

The bonding pins and bonding pads are shown in Figure 4.1, together with the names and numbers assigned to the pins of the chip. The chip will be fabricated by IMEC. The prototype of the chip will be in a ceramic package, and the bonding from the pads to the pins will be done by hand. The chip characteristics are given in the table below.



# BAC128 CHIP CHARACTERISTICS

| Characteristic                      | Value                                                  |
|-------------------------------------|--------------------------------------------------------|
| Estimated maximum frequency         | 1 MHz                                                  |
| Estimated maximum power consumption | < 83 mW @ 1 MHz                                        |
| Pin count                           | 28                                                     |
| Number of transistors               | 14918                                                  |
| Size of the chip                    | $29.78 \text{ mm}^2$ ($5685 \times 5238\ \mu m^2$)     |
| Active area of the chip             | $21.1 \text{ mm}^2$ ($4825 \times 4378\ \mu m^2$)      |
| Total mask area                     | $23.8 \text{ mm}^2$                                    |
| Number of transistors/Active area   | $706 \text{ mm}^{-2}$                                  |
| Total mask area/Chip size           | 80%                                                    |
| Polysilicon layer length            | 58 cm                                                  |
| Metal-1 layer length                | 158 cm                                                 |
| Metal-2 layer length                | 48 cm                                                  |
| Number of contacts                  | 25729                                                  |
| Number of vias                      | 2759                                                   |
| Level of hierarchy                  | 4                                                      |
| Number of transistors designed/Day  | 26                                                     |
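The derived entries of the table follow from the raw dimensions, and a short check makes the relationships explicit. This is a sketch rather than part of the original design flow; the 5 V supply used for the current estimate is an assumption.

```python
# Raw figures taken from the chip characteristics table.
CHIP_W_UM, CHIP_H_UM = 5685, 5238       # chip size in micrometres
ACTIVE_W_UM, ACTIVE_H_UM = 4825, 4378   # active area in micrometres
N_TRANSISTORS = 14918
MASK_AREA_MM2 = 23.8
P_MAX_W = 83e-3                         # upper bound on power at 1 MHz
VDD = 5.0                               # V, assumed supply voltage

chip_area_mm2 = CHIP_W_UM * CHIP_H_UM / 1e6
active_area_mm2 = ACTIVE_W_UM * ACTIVE_H_UM / 1e6
density_per_mm2 = N_TRANSISTORS / active_area_mm2
mask_ratio = MASK_AREA_MM2 / chip_area_mm2
i_max_a = P_MAX_W / VDD                 # supply current implied by the power bound

print(f"chip area:   {chip_area_mm2:.2f} mm^2")
print(f"active area: {active_area_mm2:.1f} mm^2")
print(f"density:     {density_per_mm2:.0f} transistors/mm^2")
print(f"mask/chip:   {mask_ratio:.0%}")
print(f"I_max:       {i_max_a * 1e3:.1f} mA")
```

With a 5 V supply, the 83 mW bound corresponds to 16.6mA, which matches the term used in the pad current totals of the preceding chapter.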

## REFERENCES

- [1] ASELSAN, 32-bit Correlator Design Report, Ankara, 1986.
- [2] M. S. Toygar, A Microprocessor Compatible 128-bit Correlator, B.S. Project, Middle East Technical University, 1987.
- [3] S. Topcu, Design and Testing of a Microprocessor Compatible 128-bit Correlator, M.S. Thesis, Bilkent University, 1989.
- [4] Berkeley CAD Tools User's Manual, EECS Dep., University of California at Berkeley, 1986.
- [5] VLSI Tools Reference Manual, TR#87-02-01, Release 3.1, NW Lab. Int. Sys., Dep. Computer Sci., University of Washington, Feb.1987.
- [6] C. H. Sequin, "Managing VLSI Complexity: An Outlook", Proc. IEEE, vol. 71, pp. 149-166, Jan. 1983.
- [7] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Reading MA: Addison-Wesley, 1985.
- [8] IMEC, 3.0 micron double poly, double metal n-well CMOS Design Rules Manual, Revision 6.0, 1987.
- [9] D. B. Estreich, The Physics and Modeling of Latch-up and CMOS Integrated Circuits, Ph.D. dissertation, Stanford University, Oct. 1980.
- [10] S. M. Sze, VLSI Technology, McGraw-Hill, 1985.
- [11] L. A. Glasser and D. W. Dobberpuhl, The Design and Analysis of VLSI Circuits, Addison-Wesley, 1985.
- [12] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 1980.

# APPENDIX A

## THE LAYOUT HIERARCHY OF THE BAC128






# APPENDIX B

## CIF & GDS-II CODES OF THE MASKS AND CORRESPONDING PLOT COLORS

| Mask                | CIF | GDS - II | Color     |
|---------------------|-----|----------|-----------|
| n_well              | CW  | 1        | darkgreen |
| active              | CD  | 2        | green     |
| polysilicon         | CP  | 4        | red       |
| $p^+$ implant       | CS  | 5        | darkgreen |
| contacts            | CC  | 8        | black     |
| metal-1             | CM  | 9        | blue      |
| \*passivation       | CG  | 10       |           |
| via                 | CV  | 11       | black     |
| metal-2             | CQ  | 12       | black     |

\*Passivation, which is used in pads, is not shown in the layout drawings.

# APPENDIX C

#### SPICE PARAMETERS FOR N-MOSFET

| Parameter | Nominal  | Minimum   | Maximum   | Unit       |
|-----------|----------|-----------|-----------|------------|
| LEVEL     | 2        | -         | -         | -          |
| VTO       | 0.9      | 0.6       | 1.2       | V          |
| KP        | 57E-6    | 40E-6\*   | 60E-6     | $A/V^2$    |
| GAMMA     | 0.3      | 0.15      | 0.45\*    | $V^{0.5}$  |
| PHI       | 0.7      | 0.68      | 0.71\*    | V          |
| LAMBDA    | 0.05     | 0.03      | 0.07\*    | $V^{-1}$   |
| CGSO      | 1.76E-10 | 1.2E-10\* | 2.4E-10   | F/m        |
| CGDO      | 1.76E-10 | 1.2E-10\* | 2.4E-10   | F/m        |
| RSH       | 25       | 15        | 35\*      | $Ohm/\Box$ |
| CJ        | 0.7E-4   | 0.5E-4    | 1.0E-4\*  | $F/m^2$    |
| MJ        | 0.5      | -         | -         | -          |
| CJSW      | 3.9E-10  | 2.0E-10   | 5.0E-10\* | F/m        |
| MJSW      | 0.33     | -         | -         | -          |
| JS        | 1.0E-3   | -         | -         | $A/m^2$    |
| TOX       | 425E-10  | 390E-10   | 460E-10\* | m          |
| NFS       | 1.0E11   | -         | -         | $cm^{-2}$  |
| LD        | 0.22     | 0.15\*    | 0.30      | $\mu m$    |
| UCRIT     | 1.0E4    | -         | -         | V/cm       |

#### SPICE PARAMETERS FOR P-MOSFET

| Parameter | Nominal  | Minimum   | Maximum   | Unit       |
|-----------|----------|-----------|-----------|------------|
| LEVEL     | 2        | -         | -         | -          |
| VTO       | -0.9     | -0.6      | -1.2\*    | V          |
| KP        | 17E-6    | 11E-6\*   | 20E-6     | $A/V^2$    |
| GAMMA     | 0.5      | 0.35      | 0.65\*    | $V^{0.5}$  |
| PHI       | 0.69     | 0.67      | 0.70\*    | V          |
| LAMBDA    | 0.04     | 0.02      | 0.06\*    | $V^{-1}$   |
| CGSO      | 2.80E-10 | 2.0E-10\* | 3.6E-10   | F/m        |
| CGDO      | 2.80E-10 | 2.0E-10\* | 3.6E-10   | F/m        |
| RSH       | 45       | 25        | 65\*      | $Ohm/\Box$ |
| CJ        | 3.3E-4   | 2.0E-4    | 5.0E-4\*  | $F/m^2$    |
| MJ        | 0.5      | -         | -         | -          |
| CJSW      | 4.4E-10  | 3.5E-10   | 7.0E-10\* | F/m        |
| MJSW      | 0.33     | -         | -         | -          |
| JS        | 1.0E-3   | -         | -         | $A/m^2$    |
| TOX       | 425E-10  | 390E-10   | 460E-10\* | m          |
| NFS       | 1.0E11   | -         | -         | $cm^{-2}$  |
| LD        | 0.35     | 0.25\*    | 0.45      | $\mu m$    |
| UCRIT     | 1.0E4    | -         | -         | V/cm       |

\* The values that are used in the **Spice** input deck for the worst case speed simulations.
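As an illustration of how the starred values could be collected into a **Spice** input deck, the sketch below formats a `.MODEL` line for the P-MOSFET. It is not the deck used in this study: the model name `PWC` is hypothetical, nominal values are used wherever the table has no starred entry, and LD is converted from micrometres to metres on the assumption that the simulator expects SI lengths.

```python
# Worst-case-speed P-MOSFET values: starred table entries where present, nominal otherwise.
PMOS_WORST_CASE = {
    "LEVEL": 2, "VTO": -1.2, "KP": 11e-6, "GAMMA": 0.65, "PHI": 0.70,
    "LAMBDA": 0.06, "CGSO": 2.0e-10, "CGDO": 2.0e-10, "RSH": 65,
    "CJ": 5.0e-4, "MJ": 0.5, "CJSW": 7.0e-10, "MJSW": 0.33, "JS": 1.0e-3,
    "TOX": 460e-10, "NFS": 1.0e11,
    "LD": 0.25e-6,   # table lists micrometres; converted to metres here (assumption)
    "UCRIT": 1.0e4,
}

def model_card(name, mos_type, params):
    """Format a SPICE .MODEL line from a parameter dictionary."""
    body = " ".join(f"{key}={value:g}" for key, value in params.items())
    return f".MODEL {name} {mos_type} ({body})"

card = model_card("PWC", "PMOS", PMOS_WORST_CASE)
print(card)
```

In a real deck a line of this length would normally be wrapped using continuation lines that begin with `+`.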

# APPENDIX D

## ELECTRICAL CHARACTERISTICS, CIRCUIT DIAGRAMS, SIMULATION WAVEFORMS, LAYOUTS OF THE LOWEST LEVEL CELLS

## CARRY OUTPUT INVERTER (INV) CELL

| WORST CASE PROPAGATION DELAYS |                 |                 |                          |
|-------------------------------|-----------------|-----------------|--------------------------|
| Input                         | Output          | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |
| CI                            | $COB \nearrow$  | 3nsec           | 8nsec                    |
|                               | $COB \searrow$  | 1nsec           | 4nsec                    |

| NODE CAPACITANCES |             |
|-------------------|-------------|
| Node              | Capacitance |
| CI                | 59 fF       |
| COB               | 69 fF       |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| COB              | 0          | 6nsec      | 4nsec      |
| COB              | 500 fF     | 15nsec     | 8nsec      |

• Dynamic Power Dissipation:  $15 \,\mu W @ 1MHz$ ,  $C_{OUT} = 569 fF$  at node COB.
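The quoted dissipation can be compared with the load term $P_d = fC_{OUT}V^2$ used earlier for the pad cells. The sketch below assumes a 5 V supply, which is not stated in this appendix, and covers only the output node COB.

```python
F_CLK = 1e6        # Hz, frequency quoted with the power figure
C_OUT = 569e-15    # F, total capacitance at node COB (69 fF node + 500 fF load)
VDD = 5.0          # V, assumed supply voltage

# Dynamic power spent charging and discharging the output node once per cycle.
p_load_w = F_CLK * C_OUT * VDD ** 2
print(f"{p_load_w * 1e6:.1f} uW")
```

Under this assumption the output node alone accounts for about $14.2\,\mu W$ of the $15\,\mu W$ listed; the remainder is presumably dissipated at the input and internal nodes.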

# INV CELL LAYOUT





### 2-TO-1 MULTIPLEXER (MUX21) CELL

|       | WORST CASE PROPAGATION DELAYS |                 |                          |  |
|-------|-------------------------------|-----------------|--------------------------|--|
| Input | Output                        | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |  |
| A     | $Z \nearrow$                  | 2nsec           | 5nsec                    |  |
|       | $Z \searrow$                  | 2nsec           | 4nsec                    |  |
| В     | $Z \nearrow$                  | 2nsec           | 5nsec                    |  |
|       | $Z \searrow$                  | 2nsec           | 4nsec                    |  |
| S     | $Z \nearrow$                  | 4nsec           | 7nsec                    |  |
|       | $Z \searrow$                  | 3nsec           | 5nsec                    |  |

| NODE CAPACITANCES |             |
|-------------------|-------------|
| Node              | Capacitance |
| A                 | 83 fF       |
| B                 | 101 fF      |
| S                 | 52 fF       |
| SB                | 52 fF       |
| Z                 | 141 fF      |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| Z                | 0          | 7nsec      | 6nsec      |
| Z                | 500 fF     | 17nsec     | 8nsec      |
# MUX21 CELL LAYOUT



#### MUX21 CELL SIMULATION



### 3-TO-7 DECODER (DEC37) CELL

|       | WORST CASE PROPAGATION DELAYS |                 |                          |  |
|-------|-------------------------------|-----------------|--------------------------|--|
| Input | Output                        | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |  |
| A0B   | $OB_i \nearrow$               | 7nsec           | 13nsec                   |  |
|       | $OB_i \searrow$               | 4nsec           | 10nsec                   |  |
| A1B   | $OB_i \nearrow$               | 7nsec           | 12nsec                   |  |
|       | $OB_i \searrow$               | 4nsec           | 10nsec                   |  |
| A2B   | $OB_i \nearrow$               | 7nsec           | 13nsec                   |  |
|       | $OB_i \searrow$               | 4nsec           | 10nsec                   |  |

| NODE CAPACITANCES |             |  |
|-------------------|-------------|--|
| Node              | Capacitance |  |
| A0B               | 436 fF      |  |
| A1B               | 428 fF      |  |
| A2B               | 366 fF      |  |
| $OB_i$            | 142 fF      |  |

| DRIVE CAPABILITY |            |            |                   |
|------------------|------------|------------|-------------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$        |
| $OB_i$           | 0          | 5nsec      | 6nsec             |
| $OB_i$           | 500 fF     | 19nsec     | 19nsec            |

- Dynamic Power Dissipation: 78  $\mu W @ 1MHz$ ,  $C_{OUTS} = 642 fF$  at nodes  $OB_i$ .
- The decoder outputs:  $OB_i$   $(i = 0 \cdots 7)$ .



# DEC37 CELL LAYOUT



#### RESETTABLE D-LATCH (DLATCHR) CELL

|       | WORST CASE PROPAGATION DELAYS |                 |                          |  |
|-------|-------------------------------|-----------------|--------------------------|--|
| Input | Output                        | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |  |
| D     | $Q \nearrow$                  | 8nsec           | 20nsec                   |  |
|       | $Q \searrow$                  | 9nsec           | 13nsec                   |  |
|       | $QB \nearrow$                 | 7nsec           | 19nsec                   |  |
|       | $QB\searrow$                  | 3nsec           | 8nsec                    |  |
| EN    | $Q \nearrow$                  | 9nsec           | 22nsec                   |  |
|       | $Q \searrow$                  | 9nsec           | 12nsec                   |  |
|       | $QB \nearrow$                 | 7nsec           | 19nsec                   |  |
|       | $QB \searrow$                 | 4nsec           | 8nsec                    |  |
| R     | $Q \nearrow$                  | 6nsec           | 18nsec                   |  |
|       | $Q \searrow$                  | 2nsec           | 5nsec                    |  |
|       | $QB \nearrow$                 | 8nsec           | 20nsec                   |  |

| NODE CAPACITANCES |             |
|-------------------|-------------|
| Node              | Capacitance |
| D                 | 54 fF       |
| EN                | 131 fF      |
| ENB               | 131 fF      |
| R                 | 54 fF       |
| Q                 | 142 fF      |
| QB                | 147 fF      |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| Q                | 0          | 9nsec      | 5nsec      |
| QB               | 0          | 12nsec     | 6nsec      |
| Q                | 500 fF     | 37nsec     | 9nsec      |
| QB               | 500 fF     | 35nsec     | 14nsec     |

- Dynamic Power Dissipation:  $36 \ \mu W @ 1MHz$ ,  $C_{OUT} = 642 fF$  and  $C_{OUT} = 647 fF$  at nodes Q and QB.
- Set-up Time: 2nsec

# DLATCHR CELL LAYOUT




#### DLATCHR CELL SIMULATION






#### CARRY STAGE OF FULL ADDER (CRRY) CELL

| WORST CASE PROPAGATION DELAYS |               |                 |                          |
|-------------------------------|---------------|-----------------|--------------------------|
| Input                         | Output        | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |
| A, B                          | $CO \nearrow$ | 6nsec           | 19nsec                   |
|                               | $CO \searrow$ | 3nsec           | 7nsec                    |
| CI                            | $CO \nearrow$ | 5nsec           | 17nsec                   |
|                               | $CO \searrow$ | 3nsec           | 7nsec                    |

| NODE CAPACITANCES |             |
|-------------------|-------------|
| Node              | Capacitance |
| A                 | 102 fF      |
| B                 | 102 fF      |
| CI                | 65 fF       |
| CO                | 88 fF       |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| CO               | 0          | 5nsec      |            |
| CO               | 500 fF     | 32nsec     | 14nsec     |

• Dynamic Power Dissipation:  $24 \,\mu W @ 1MHz$ ,  $C_{OUT} = 588 fF$  at node CO.

# CRRY CELL LAYOUT



#### CRRY CELL SIMULATION





#### HALF ADDER-EVEN (HAE) CELL

| WORST CASE PROPAGATION DELAYS |                |                 |                          |
|-------------------------------|----------------|-----------------|--------------------------|
| Input                         | Output         | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |
| A, B                          | $SB \nearrow$  | 6nsec           | 20nsec                   |
|                               | $SB \searrow$  | 7nsec           | 20nsec                   |
|                               | $COB \nearrow$ | 5nsec           | 11nsec                   |
|                               | $COB \searrow$ | 4nsec           | 8nsec                    |

| NODE CAPACITANCES |             |  |
|-------------------|-------------|--|
| Node              | Capacitance |  |
| A                 | 104 fF      |  |
| B                 | 104 fF      |  |
| SB                | 81 fF       |  |
| COB               | 182 fF      |  |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| SB               | 0          | 4nsec      | 4nsec      |
| COB              | 0          | 7nsec      | 6nsec      |
| SB               | 500 fF     | 18nsec     | 16nsec     |
| COB              | 500 fF     | 18nsec     | 14nsec     |

• Dynamic Power Dissipation:  $58 \ \mu W @ 1MHz$ ,  $C_{OUT} = 581 fF$  and  $C_{OUT} = 682 fF$  at nodes SB and COB.



#### HAE CELL SIMULATION





#### HALF ADDER-ODD (HAO) CELL

| WORST CASE PROPAGATION DELAYS |               |                 |                          |
|-------------------------------|---------------|-----------------|--------------------------|
| Input                         | Output        | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |
| A, B                          | $S \nearrow$  | 6nsec           | 22nsec                   |
|                               | $S \searrow$  | 9nsec           | 27nsec                   |
|                               | $CO \nearrow$ | 7nsec           | 20nsec                   |
|                               | $CO \searrow$ | 3nsec           | 5nsec                    |

| NODE CAPACITANCES |             |
|-------------------|-------------|
| Node              | Capacitance |
| A                 | 104 fF      |
| B                 | 106 fF      |
| S                 | 83 fF       |
| CO                | 173 fF      |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| S                | 0          | 7nsec      | 4nsec      |
| CO               | 0          | 11nsec     | 4nsec      |
| S                | 500 fF     | 30nsec     | 17nsec     |
| CO               | 500 fF     | 35nsec     | 8nsec      |

• Dynamic Power Dissipation: 63  $\mu W @ 1MHz$ ,  $C_{OUT} = 583 fF$  and  $C_{OUT} = 673 fF$  at nodes S and CO.



#### HAO CELL SIMULATION





#### MASTER/SLAVE D-FLIPFLOP (MSDFF) CELL

| WORST CASE PROPAGATION DELAYS |               |                 |                          |
|-------------------------------|---------------|-----------------|--------------------------|
| Input                         | Output        | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |
| $PHI \searrow$                | $Q \nearrow$  | 7nsec           | 12nsec                   |
|                               | $Q \searrow$  | 9nsec           | 12nsec                   |
|                               | $QB \nearrow$ | 7nsec           | 7nsec                    |
|                               | $QB \searrow$ | 4nsec           | 4nsec                    |

| NODE CAPACITANCES |             |  |
|-------------------|-------------|--|
| Node              | Capacitance |  |
| D                 | 53 fF       |  |
| PHI               | 177 fF      |  |
| PHIB              | 176 fF      |  |
| 0                 | 275 fF      |  |
| OB                | 157 fF      |  |
| Q                 | 131 fF      |  |
| QB                | 146 fF      |  |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| Q                | 0          | 6nsec      | 5nsec      |
| QB               | 0          | 9nsec      | 7nsec      |
| Q                | 500 fF     | 18nsec     | 9nsec      |
| QB               | 500 fF     | 9nsec      | 7nsec      |

- Dynamic Power Dissipation:  $36 \ \mu W @ 1MHz$ ,  $C_{OUT} = 590 fF$  at node Q.
- Set-up Time: 2nsec.

# MSDFF CELL LAYOUT



#### MSDFF CELL SIMULATION





#### FULL ADDER (FA) CELL

| WORST CASE PROPAGATION DELAYS |               |                 |                          |
|-------------------------------|---------------|-----------------|--------------------------|
| Input                         | Output        | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |
| A, B                          | $S \nearrow$  | 10nsec          | 28nsec                   |
|                               | $S \searrow$  | 12nsec          | 34nsec                   |
|                               | $CO \nearrow$ | 8nsec           | 20nsec                   |
|                               | $CO \searrow$ | 4nsec           | 8nsec                    |
| CI                            | $S \nearrow$  | 9nsec           | 29nsec                   |
|                               | $S \searrow$  | 12nsec          | 33nsec                   |
|                               | $CO \nearrow$ | 7nsec           | 19nsec                   |
|                               | $CO \searrow$ | 4nsec           | 8nsec                    |

| NODE CAPACITANCES |             |
|-------------------|-------------|
| Node              | Capacitance |
| A                 | 208 fF      |
| B                 | 208 fF      |
| CI                | 230 fF      |
| S                 | 135 fF      |
| CO                | 125 fF      |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| S                | 0          | 11nsec     | 7nsec      |
| CO               | 0          | 13nsec     | 7nsec      |
| S                | 500 fF     | 35nsec     | 27nsec     |
| CO               | 500 fF     | 36nsec     | 14nsec     |

• Dynamic Power Dissipation: 100  $\mu W @ 1MHz$ ,  $C_{OUT} = 635 fF$  and  $C_{OUT} = 625 fF$  at nodes S and CO.



# FA CELL LAYOUT

#### FA CELL SIMULATION



### COMPARATOR (COMPARE) CELL

| WORST CASE PROPAGATION DELAYS |               |                 |                          |
|-------------------------------|---------------|-----------------|--------------------------|
| Input                         | Output        | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |
| SQ (RQ:1)                     | $CP \nearrow$ | 9nsec           | 22nsec                   |
|                               | $CP \searrow$ | 5nsec           | 12nsec                   |
| (RQ:0)                        | $CP \nearrow$ | 10nsec          | 22nsec                   |
|                               | $CP \searrow$ | 5nsec           | 12nsec                   |
| RQ (SQ:1)                     | $CP \nearrow$ | 11nsec          | 24nsec                   |
|                               | $CP \searrow$ | 5nsec           | 14nsec                   |
| (SQ:0)                        | $CP \nearrow$ | 8nsec           | 20nsec                   |
|                               | $CP \searrow$ | 5nsec           | 12nsec                   |
| MQ (SQ, RQB:0)                | $CP \nearrow$ | 5nsec           | 12nsec                   |
|                               | $CP \searrow$ | 4nsec           | 12nsec                   |

| NODE CAPACITANCES |             |
|-------------------|-------------|
| Node              | Capacitance |
| SQ                | 55 fF       |
| SQB               | 54 fF       |
| RQ                | 57 fF       |
| RQB               | 54 fF       |
| MQB               | 52 fF       |
| CP                | 153 fF      |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| CP               | 0          | 15nsec     | 9nsec      |
| CP               | 500 fF     | 42nsec     | 22nsec     |

• Dynamic Power Dissipation:  $32 \ \mu W @ 1MHz$ ,  $C_{OUT} = 650 fF$  at node CP.

# COMPARE CELL LAYOUT



#### COMPARE CELL SIMULATION





# NON-INVERTING TRI-STATE BUFFER (NITBUF) CELL

| WORST CASE PROPAGATION DELAYS |                |                 |                          |
|-------------------------------|----------------|-----------------|--------------------------|
| Input                         | Output         | Intrinsic Delay | Delay for $C_L = 0.5 pF$ |
| IN                            | $OUT \nearrow$ | 3nsec           | 8nsec                    |
|                               | $OUT \searrow$ | 1nsec           | 4nsec                    |

| NODE CAPACITANCES |             |  |
|-------------------|-------------|--|
| Node              | Capacitance |  |
| CI                | 59 fF       |  |
| COB               | 69 fF       |  |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| COB              | 0          | 6nsec      | 4nsec      |
| COB              | 500 fF     | 15nsec     | 8nsec      |

• Dynamic Power Dissipation:  $15 \,\mu W @ 1MHz$ ,  $C_{OUT} = 569 fF$  at node COB.

# NITBUF CELL LAYOUT



#### NITBUF CELL SIMULATION




# CLOCK BUFFER/DRIVER (CLKBUF) CELL

| WORST CASE PROPAGATION DELAYS |                 |                 |                         |
|-------------------------------|-----------------|-----------------|-------------------------|
| Input                         | Output          | Intrinsic Delay | Delay for $C_L = 25 pF$ |
| IN                            | $OUT \nearrow$  | 5nsec           | 14nsec                  |
|                               | $OUT \searrow$  | 5nsec           | 14nsec                  |
|                               | $OUTB \nearrow$ | 7nsec           | 15nsec                  |
|                               | $OUTB \searrow$ | 7nsec           | 15nsec                  |

| NODE CAPACITANCES |             |
|-------------------|-------------|
| Node              | Capacitance |
| IN                | 410 fF      |
| OUT               | 900 fF      |
| OUTB              | 900 fF      |

| DRIVE CAPABILITY |            |            |            |
|------------------|------------|------------|------------|
| Output           | $C_{LOAD}$ | $t_{rise}$ | $t_{fall}$ |
| OUT              | 0 fF       | 4nsec      | 4nsec      |
| OUTB             | 0 fF       | 2nsec      | 2nsec      |
| OUT              | 25 pF      | 20nsec     | 20nsec     |
| OUTB             | 25 pF      | 20nsec     | 20nsec     |

• Dynamic Power Dissipation: 1.5 mW @ 1MHz,  $C_{OUT} = 25pF$  and  $C_{OUT} = 25pF$  at nodes OUT and OUTB.

# CLKBUF CELL LAYOUT



## CLKBUF CELL SIMULATION





# STANDARD I/O PAD CELLS ELECTRICAL CHARACTERISTICS

• INVERTING INPUT CELL, TTL COMPATIBLE (IIT):

Input capacitance: 2 pF

Vil: 0.8 V max.

Vih: 2.0 V min.

| WORST CASE PROPAGATION DELAYS |                |                 |               |
|-------------------------------|----------------|-----------------|---------------|
| Input                         | Output         | Intrinsic Delay | Delay/pF Load |
| IN                            | $OUT \nearrow$ | 9.1nsec         | 1.3nsec       |
|                               | $OUT \searrow$ | 5.8nsec         | 1.1nsec       |
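The pad cell tables give an intrinsic delay and an additional delay per pF of load, which suggests a linear delay model. The sketch below applies that model to the IIT cell; the linear form and the 1 pF example load (the value assumed for the input pad output in the power rail discussion) are assumptions, not data from the cell library.

```python
# Inverting TTL input pad (IIT): (intrinsic delay in ns, added delay in ns per pF of load).
IIT_DELAYS = {"rise": (9.1, 1.3), "fall": (5.8, 1.1)}

def pad_delay_ns(edge, load_pf):
    """Linear estimate: intrinsic delay plus the per-pF delay times the load."""
    intrinsic, per_pf = IIT_DELAYS[edge]
    return intrinsic + per_pf * load_pf

rise_ns = pad_delay_ns("rise", 1.0)
fall_ns = pad_delay_ns("fall", 1.0)
print(f"rise: {rise_ns:.1f} ns, fall: {fall_ns:.1f} ns")
```

The same function can be reused for the output cells by substituting their table entries.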

• INVERTING OUTPUT CELL, 2 TTL DRIVE (OI2):

Input capacitance: 0.24 pF Sink capability: 3.2 mA at 0.4 V Source capability: 1.6 mA at 4.6 V

| WORST CASE PROPAGATION DELAYS |                 |                 |               |
|-------------------------------|-----------------|-----------------|---------------|
| Input                         | Output          | Intrinsic Delay | Delay/pF Load |
| IN                            | $OUT \nearrow$  | 6.9nsec         | 0.15nsec      |
|                               | $OUT \searrow$  | 6.5nsec         | 0.09nsec      |
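The two delay columns are presumably meant to be combined linearly: total delay = intrinsic delay + (delay per pF) × load capacitance. A minimal sketch of that reading for the OI2 cell follows. The 50 pF load is an arbitrary example value and the function name is hypothetical.

```python
def pad_delay_ns(intrinsic_ns: float, slope_ns_per_pf: float, load_pf: float) -> float:
    """Worst-case propagation delay under a linear load model."""
    return intrinsic_ns + slope_ns_per_pf * load_pf

# OI2 figures from the table: (intrinsic delay in ns, delay per pF in ns)
OI2_RISE = (6.9, 0.15)
OI2_FALL = (6.5, 0.09)

load_pf = 50.0  # example external load, chosen only for illustration
rise = pad_delay_ns(*OI2_RISE, load_pf)
fall = pad_delay_ns(*OI2_FALL, load_pf)

print(f"OI2 at {load_pf:.0f} pF: rise {rise:.1f} ns, fall {fall:.1f} ns")
```

The same helper applies to the other pad cells by substituting their tabulated intrinsic delay and delay-per-pF values.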

• TRISTATE NON-INVERTING OUTPUT CELL, 2 TTL DRIVE (OT2):

Input capacitance: 0.24 pF

Enable input capacitance: 0.24 pF

Output capacitance: 5 pF

Sink capability: 3.2 mA at 0.4 V

Source capability: 1.6 mA at 4.6 V

| WORST CASE PROPAGATION DELAYS |                 |                 |               |
|-------------------------------|-----------------|-----------------|---------------|
| Input                         | Output          | Intrinsic Delay | Delay/pF Load |
| IN                            | $OUT \nearrow$  | 9.6nsec         | 0.12nsec      |
|                               | $OUT \searrow$  | 8.5nsec         | 0.08nsec      |
| EN                            | $OUT \nearrow$  | 8.6nsec         | 0.16nsec      |
|                               | $OUT \searrow$  | 8.7nsec         | 0.09nsec      |

• BI-DIRECTIONAL I/O CELL, TTL COMPATIBLE INVERTING INPUT, TRISTATE NON-INVERTING OUTPUT WITH 2 TTL DRIVE (BITT2):

Input capacitance: 0.23 pF

Enable input capacitance: 0.17 pF

Output capacitance: 5 pF

Vil: 0.8 V

Vih: 2.0 V

Sink capability: 3.2 mA at 0.4 V

Source capability: 1.6 mA at 4.6 V

| WORST CASE PROPAGATION DELAYS |                 |                 |               |
|-------------------------------|-----------------|-----------------|---------------|
| Input                         | Output          | Intrinsic Delay | Delay/pF Load |
| IN                            | $I/O \nearrow$  | 9.6nsec         | 0.12nsec      |
|                               | $I/O \searrow$  | 8.5nsec         | 0.08nsec      |
| ENABLE                        | $I/O \nearrow$  | 8.6nsec         | 0.16nsec      |
|                               | $I/O \searrow$  | 8.7nsec         | 0.09nsec      |
| I/O                           | $OUT \nearrow$  | 9.1nsec         | 1.3nsec       |
|                               | $OUT \searrow$  | 5.8nsec         | 1.1nsec       |

• POSITIVE SUPPLY VOLTAGE PAD (IOVDD):

Maximum source current: 80 mA

• GROUND CONNECTION PAD (IOVSS):

Maximum sink current: 80 mA

## APPENDIX E

## SIMULATIONS OF THE TOP LEVEL CELLS AND THE TOP LEVEL CELL LAYOUTS

#### REFERENCE (MASK) REGISTER GROUP



#### SHIFT REGISTER GROUP



#### THRESHOLD REGISTERS GROUP





#### STATUS REGISTER GROUP







#### INTEGRATOR GROUP



#### DECISION MAKER GROUP

#### 128-BIT CORRELATION







#### CONTROLLER GROUP











#### INPUT/OUTPUT PADS GROUP










STATUS CELL LAYOUT



ADD34 CELL LAYOUT

CNTL CELL LAYOUT



THDM CELL LAYOUT



SRMC CELL LAYOUT
