# DESIGN AND IMPLEMENTATION OF A SUB-THRESHOLD WIRELESS BFSK TRANSMITTER

A Thesis

by

SUGANTH PAUL

Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

December 2007

Major Subject: Computer Engineering

Approved by:

| Chair of Committee, | Sunil P. Khatri       |
|---------------------|-----------------------|
| Committee Members,  | Peng Li               |
|                     | Duncan M. Walker      |
| Head of Department, | Costas N. Georghiades |


## ABSTRACT

Design and Implementation of a Sub-threshold Wireless BFSK Transmitter. (December 2007)

Suganth Paul, B.E., Anna University, India

Chair of Advisory Committee: Dr. Sunil P. Khatri

Power consumption in Very Large Scale Integrated (VLSI) circuits is currently a major issue in the semiconductor industry, and power is a first-order design constraint in many applications. Several of these applications need extremely low power but do not need high speed. Sub-threshold circuit design can be used in these cases, but at such low supply voltages these circuits exhibit an exponential sensitivity to process, voltage and temperature (PVT) variations. In this thesis we implement and test a robust sub-threshold design flow that uses circuit-level PVT compensation to stabilize circuit performance. This is done by dynamically modulating the delay of a representative signal in the circuit and phase-locking it to an external reference signal.

We design and fabricate a sub-threshold wireless BFSK transmitter chip. The transmitter is specified to transmit baseband signals at data rates of up to 32 kbps over a distance of 1000 m. In addition to the sub-threshold implementation, we implement the BFSK transmitter using a standard cell methodology on the same die, operating at super-threshold voltages in a separate voltage domain. Experiments using the fabricated die show that the sub-threshold circuit consumes 19.4× less power than the traditional standard cell based implementation.

To Mom, Dad and Bro

## ACKNOWLEDGMENTS

I am honored to have been a part of Sunil's group for my Master of Science degree at Aggieland. I would like to acknowledge without exaggeration that working under Sunil has been a life-changing experience. The energy levels maintained in the group are monumental and highly contagious. I am extremely grateful for the kind of involvement that Sunil gives to each of his students. His constant support and guidance, and his ability to provide an environment conducive to scientific research, were invaluable. Thank you Sunil for all your efforts!

I would like to express my heartfelt gratitude to Rajesh, Nikhil and Kanu. They have been a constant source of inspiration and support. I want to thank them for the deep and strong ties of friendship that they have provided. Thank you guys for being there for me. I would like to give a special mention to Rajesh for the insane amount of effort that he put in towards this thesis. His contribution was priceless.

I would also like to profusely thank my parents and brother for their constant encouragement and confidence in me. I will always be indebted to my parents for teaching me the value of hard work and instilling in me a strong sense of moral values. I want to thank them for providing me the opportunity to pursue higher education and for nurturing in me the skills necessary to complete this Master of Science degree.

Finally, I would like to thank all my friends, who directly and indirectly supported and helped me in completing this thesis.

## TABLE OF CONTENTS

- CHAPTER I. INTRODUCTION
  - I-A. The Demand for Low-Power Electronics
  - I-B. Thesis Objectives
  - I-C. Thesis Outline
- CHAPTER II. RELATED PREVIOUS WORK
  - II-A. Advantages and Disadvantages of Sub-threshold Circuit Design
  - II-B. Previous Work in Sub-threshold Circuit Design
  - II-C. Choosing a Sub-threshold Circuit Design Methodology
- CHAPTER III. DESIGN OF THE CHIP
  - III-A. Chapter Overview
  - III-B. Test Vehicle
    - III-B.1. BFSK Radio Transmitter Architecture
  - III-C. System Architecture
    - III-C.1. PLA Basics
    - III-C.2. PLA Operation
    - III-C.3. Network of PLA Operation
    - III-C.4. Dynamic Compensation Circuit
    - III-C.5. The Digital BFSK Modulator
      - III-C.5.a. Phase Accumulator and NCO
    - III-C.6. Digital to Analog Converter (DAC)
    - III-C.7. Common Source Amplifier
  - III-D. Design Specifications
    - III-D.1. Link Budget Analysis
  - III-E. Chapter Summary
- CHAPTER IV. IMPLEMENTATION OF THE CHIP
  - IV-A. Design Flow
  - IV-B. HDL to Netlist Flow
    - IV-B.1. SPICE Verification of Dynamic Compensation
  - IV-C. DAC and Amplifier Design
  - IV-D. Special Considerations
    - IV-D.1. Testability and Redundancy
    - IV-D.2. Voltage Domains
  - IV-E. Standard Cell Based BFSK Design
  - IV-F. IO Pad and ESD Diode Design
  - IV-G. Chip Integration and Pin-out
  - IV-H. Layout
  - IV-I. Summary of Verification Methodologies
  - IV-J. Chapter Summary
- CHAPTER V. EXPERIMENTAL RESULTS
  - V-A. Functional Verification
  - V-B. Dynamic Compensation Circuit
  - V-C. Operating Ranges
  - V-D. Spectrum of Output Sinusoidal Signals
  - V-E. Comparison with Standard Cells
  - V-F. Chapter Summary
- CHAPTER VI. CONCLUSION
- REFERENCES
- VITA

## LIST OF TABLES

| Table | Title                                                |
|-------|------------------------------------------------------|
| II.1  | Comparison of Traditional and Sub-threshold Circuits |
| IV.1  | PLA Configuration                                    |
| IV.2  | Chip Pin-out: Standard Cell BFSK Portion             |
| IV.3  | Chip Pin-out: Sub-threshold BFSK Portion             |
| V.1   | Sub-threshold vs Standard Cell Power Consumption     |

## LIST OF FIGURES

| Figure | Title                                                        |
|--------|--------------------------------------------------------------|
| II.1   | Delay Range with and without Our Dynamic Body Bias Technique |
| III.1  | BFSK Transmitter Architecture                                |
| III.2  | System Architecture                                          |
| III.3  | Schematic View of PLA                                        |
| III.4  | Timing Diagram of a NPLAs                                    |
| III.5  | Phase Detector and Charge Pump Circuit                       |
| III.6  | Phase Detector Waveforms when PLA Delay Lags *BCLK*          |
| III.7  | Phase Detector Waveforms when PLA Delay Leads *BCLK*         |
| III.8  | Digital to Analog Converter                                  |
| III.9  | Common Source Amplifier                                      |
| IV.1   | Design Flow                                                  |
| IV.2   | Dynamic Bulk Node Modulation                                 |
| IV.3   | DAC Output                                                   |
| IV.4   | Amplifier Output                                             |
| IV.5   | PAD Cell Schematic                                           |
| IV.6   | PLA Layout                                                   |
| IV.7   | Die Layout                                                   |
| V.1    | BFSK Modulation                                              |
| V.2    | Bulk Node Voltage Modulation with *VDD*                      |
| V.3    | Bulk Node Voltage Modulation with *BeatClock*                |
| V.4    | Maximum Operating Frequencies                                |
| V.5    | Power Consumed at Maximum Operating Frequency                |
| V.6    | FFT of DAC Output                                            |
| V.7    | FFT of Amplifier Output                                      |

## CHAPTER I. INTRODUCTION

### I-A. The Demand for Low-Power Electronics

The density and speed of Integrated Circuits (ICs) have been increasing almost exponentially for over three decades. This trend is captured by Moore's law, which observes that the number of transistors that can be placed on an IC of a given area has been doubling roughly every two years. A major concern hampering further transistor scaling is power dissipation. A Very Large Scale Integrated (VLSI) chip contains many energy storage elements, mainly capacitors, some of which are required for computation (MOSFET device capacitances) and some of which hinder circuit operation (parasitic capacitances). These capacitors are continually charged and discharged through resistive elements during circuit operation, dissipating energy as heat. The amount of heat dissipated limits the computational performance of the circuit, that is, the number of times transistors in the circuit can switch within a given power budget. One could argue that shrinking devices reduces parasitic capacitance and that this alleviates power dissipation problems. However, the increase in the *number* of devices due to higher device density has more than compensated for the decrease in the parasitic capacitance of a single device.
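The charge-and-discharge argument above is commonly summarized by the textbook first-order switching-power model $P = \alpha C \, VDD^2 f$, which the thesis itself does not state. The minimal sketch below assumes that model, an activity factor `activity`, and illustrative device counts and capacitances (all hypothetical, not measured values). It shows why a 30% drop in per-device capacitance does not offset a doubling of the device count.

```python
def node_switching_power(c_farads, vdd_volts, freq_hz, activity=1.0):
    """First-order dynamic power of one switching node: alpha * C * VDD^2 * f.

    Each full charge/discharge cycle of C through resistive paths dissipates
    C * VDD^2 joules (half while charging, half while discharging).
    """
    return activity * c_farads * vdd_volts ** 2 * freq_hz


def chip_switching_power(n_devices, c_per_device, vdd_volts, freq_hz, activity):
    """Total dynamic power, assuming every device sees the same load (a simplification)."""
    return n_devices * node_switching_power(c_per_device, vdd_volts, freq_hz, activity)


if __name__ == "__main__":
    # Hypothetical generation step: 2x the devices, each with 30% less capacitance.
    old = chip_switching_power(1e8, 1.0e-15, 1.0, 1e9, 0.1)
    new = chip_switching_power(2e8, 0.7e-15, 1.0, 1e9, 0.1)
    print(f"power ratio new/old = {new / old:.2f}")  # 2 * 0.7 = 1.40
```

Because total power scales with the product of device count and per-device capacitance, any capacitance reduction smaller than the growth in device count still raises total dissipation.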

As circuits shrink and more transistors and circuit functions are integrated on a single chip, sub-threshold leakage current is becoming an important determinant of power consumption. This leakage current flows even when transistors are not switching, so it dissipates power even when no useful computation is taking place. In recent designs, leakage power is frequently more than 50% of the power dissipated by the entire chip. Increased power consumption means that more energy is dissipated as heat. The MOSFET threshold voltage $V_T$ decreases as the device junction temperature rises due to this heat. Since sub-threshold leakage currents depend exponentially on $V_T$, increasing as $V_T$ decreases, higher on-chip temperatures raise leakage currents, which in turn dissipate still more power as heat. Elevated on-chip temperatures can also reduce the operating lifespan of the chip [1]. In addition, any chip whose power consumption exceeds a certain level must be cooled. This is a significant problem for commercial electronic products such as servers, desktop computers, graphics processors and some high-performance gaming devices, where the cost of the cooling solution is a further drawback of increased power consumption. High power consumption is thus a growing problem in the ever-expanding world of electronic devices.

The journal model is IEEE Transactions on Automatic Control.

There are many applications of VLSI circuit technology in which low power is essential but the speed of operation is non-critical. Examples include sensor networks, wearable computers and certain portable electronic devices. Here speed is a secondary design goal, whereas low power consumption is a primary design requirement. For example, [2, 3, 4] show that sensor networks can accumulate, process and communicate information under various operating conditions. These networks are spatially distributed and require each sensor in the network to be as maintenance-free as possible. Further, low power consumption in these applications would reduce the battery capacity required. The product would also be lighter, since smaller batteries would suffice and complex cooling solutions would not be needed.

Thus a robust methodology for designing extremely low-power circuits would be useful for a large class of applications in which larger circuit delays are tolerable but low power consumption is a primary design requirement. The circuit design methodology of choice in this case is sub-threshold circuit design. Here the supply voltage *VDD* of the circuit is set at or below the threshold voltage of the process technology, i.e., $VDD \leq V_T$. Because no gate-to-source voltage can then exceed $V_T$, the transistors in the circuit never enter the linear or saturation region, and the circuit operates using only sub-threshold leakage currents. This approach not only results in very low power consumption but also uses leakage currents for computation, thereby capitalizing on the very problem that traditional VLSI design methodologies face (an exponential increase in leakage with successive process generations). In a sub-threshold circuit, all of the power dissipated by the chip is due to sub-threshold leakage currents.
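The claim that the transistors never leave the sub-threshold region when $VDD \leq V_T$ can be checked with a small sketch. It assumes the simple long-channel region boundaries for an NMOS device (sub-threshold for $V_{gs} \leq V_T$, linear for $V_{ds} < V_{gs} - V_T$, saturation otherwise) and sweeps every gate and drain voltage between 0 and *VDD*. The 0.5 V threshold and the two supply values are illustrative assumptions, not parameters of the process used in this thesis.

```python
def nmos_region(v_gs, v_ds, v_th):
    """Classify a long-channel NMOS operating region (simple square-law boundaries)."""
    if v_gs <= v_th:
        return "sub-threshold"
    if v_ds < v_gs - v_th:
        return "linear"
    return "saturation"


def regions_reachable(vdd, v_th, steps=50):
    """Set of regions reachable when V_gs and V_ds both stay within [0, VDD]."""
    grid = [vdd * k / steps for k in range(steps + 1)]
    return {nmos_region(vg, vd, v_th) for vg in grid for vd in grid}


if __name__ == "__main__":
    print(regions_reachable(vdd=0.3, v_th=0.5))          # {'sub-threshold'}
    print(sorted(regions_reachable(vdd=2.5, v_th=0.5)))  # all three regions
```

With a 0.3 V supply only the sub-threshold region is reachable, whereas a 2.5 V supply reaches all three regions.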

Sub-threshold circuits yield significantly lower power consumption than their super-threshold counterparts. However, the sub-threshold current depends exponentially on process, temperature and supply voltage variations. As a result, any practical sub-threshold design methodology must be immune to these PVT variations.

### I-B. Thesis Objectives

The primary goal of this thesis is to demonstrate a sub-threshold circuit design approach for use in designs that demand extremely low power consumption. There are currently no validated design flows or proven design methodologies for designing sub-threshold circuits. The objectives of this thesis include the following:

- To identify a sub-threshold circuit design approach.
- To develop a robust methodology to design and fabricate sub-threshold circuits.
- To choose an application that demonstrates the usefulness of a low-power sub-threshold circuit.
- To design the required circuit, and to fabricate and test the chip.
- To quantitatively compare the post-silicon power consumption of a sub-threshold circuit implementation with that of a traditional standard cell based implementation of the same circuit.

### I-C. Thesis Outline

This thesis is organized as follows. In Chapter II we introduce the concept of sub-threshold circuit design. We also survey various sub-threshold design methodologies and select one of them for the design of the sub-threshold circuits in this thesis. The most important criterion for the chosen approach is that the delay of the circuit must be compensated against PVT (process, voltage and temperature) variations. We use a methodology that provides this compensation.

In Chapter III, we choose a test application to implement using sub-threshold circuits. We present a system level architecture and describe each of the system level blocks in detail. We then discuss the design constraints and optimizations needed for this application. Finally, we develop the design framework that will be used to implement the design.

In Chapter IV, we present a detailed account of the steps involved in implementing the design. We explain the design flow used to implement the sub-threshold circuit, including special features such as testability and redundancy that were added to the design. We also list the verification steps that were performed on the design.

In Chapter V, we describe the experiments performed on the fabricated die and quantitatively report the results obtained. We also show that the sub-threshold circuit consumes 19.4× less power than the standard cell circuit implementing the same function.

Finally, in Chapter VI we conclude this thesis.

## CHAPTER II. RELATED PREVIOUS WORK

### II-A. Advantages and Disadvantages of Sub-threshold Circuit Design

Sub-threshold circuit design is done by setting  $VDD \le V_T$  in the circuit. Under this condition the sub-threshold leakage current of a MOSFET is given by the equation:

$$I_{ds} = \frac{W}{L} I_{D0} \, e^{\left(\frac{V_{gs} - V_T - V_{off}}{n v_t}\right)} \left[1 - e^{-\frac{V_{ds}}{v_t}}\right] \tag{2.1}$$

In Equation 2.1, $V_T$ is the device *threshold voltage*. It depends on process-dependent factors such as gate and insulator materials, insulator thickness and channel doping density. It also depends on operational factors such as $V_{sb}$ (body effect) and temperature ($V_T$ decreases as the device junction temperature increases). *W* and *L* are the device width and length. Also, $I_{D0}$ is a constant, while $v_t = \frac{kT}{q}$ is the thermal voltage, where *k* is Boltzmann's constant, *T* is the absolute temperature and *q* is the electronic charge; $v_t \approx 26$ mV at room temperature. *n* is the sub-threshold swing parameter (a constant). Finally, $V_{off}$ is a constant. Note that in the sub-threshold region a transistor is either *off* or *less off*. For the sake of simplicity, we say that an NMOS transistor is *on* when its gate is at *VDD* and *off* when its gate is at *GND*. Similarly, we say a PMOS transistor is *on* when its gate is at *GND* and *off* when its gate is at *VDD*.
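Equation 2.1 can be evaluated directly to see how strongly the current reacts to a small shift in $V_T$. In this sketch the function names and all device parameters (`i_d0`, `n`, `v_off`, the 0.45 V threshold and the 0.3 V bias) are hypothetical placeholders chosen for illustration, not values from the process used in this thesis; only the form of the equation comes from the text.

```python
import math

BOLTZMANN = 1.380649e-23           # k, J/K
ELECTRON_CHARGE = 1.602176634e-19  # q, C


def thermal_voltage(temp_k):
    """v_t = kT/q, in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def subthreshold_current(w, l, i_d0, v_gs, v_ds, v_th, v_off, n, temp_k=300.0):
    """Equation 2.1: sub-threshold drain current of a MOSFET, in amperes."""
    vt = thermal_voltage(temp_k)
    return ((w / l) * i_d0
            * math.exp((v_gs - v_th - v_off) / (n * vt))
            * (1.0 - math.exp(-v_ds / vt)))


if __name__ == "__main__":
    # Hypothetical device parameters, chosen only to illustrate the trend.
    params = dict(w=1.0e-6, l=0.25e-6, i_d0=1e-7, v_ds=0.3, v_off=0.0, n=1.5)
    nominal = subthreshold_current(v_gs=0.3, v_th=0.45, **params)
    low_vt = subthreshold_current(v_gs=0.3, v_th=0.45 * 0.95, **params)  # V_T 5% lower
    print(f"v_t at 300 K = {thermal_voltage(300.0) * 1e3:.1f} mV")  # 25.9 mV
    print(f"current ratio for a 5% V_T drop = {low_vt / nominal:.2f}")
```

A shift $\Delta V_T$ multiplies the current by $e^{\Delta V_T/(n v_t)}$ regardless of the other parameters, which is the exponential sensitivity discussed below. Once $V_{ds}$ exceeds a few $v_t$, the bracketed term is essentially 1.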

The **advantages** of a sub-threshold circuit design approach that utilizes leakage currents for computation are:

- Power is significantly lower (shown to be 100× to 500× lower for the 0.1 μm and 0.07 μm processes [5]).
- Circuits get *faster* at higher temperature [6].
- Device transconductance has an exponential dependence on $V_{gs}$, as seen in Equation 2.1. This results in a high ratio of ON to OFF current in a device stack. As a consequence, circuit noise margins for distinguishing between ON and OFF values are very good.
- Delay gets worse by 10-25×, but the PDP (Power-Delay Product) improves by 10-20× [5].
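Two of the advantages above follow from simple arithmetic. With $V_{ds}$ held equal, every factor of Equation 2.1 except the exponential cancels in the ratio of ON current ($V_{gs} = VDD$) to OFF current ($V_{gs} = 0$), leaving $e^{VDD/(n v_t)}$; and the PDP improvement is the power reduction factor divided by the delay penalty factor. The sketch assumes $n = 1.5$ and $v_t = 25.9$ mV, which are illustrative values. Pairing the quoted extremes (100× with 10×, 500× with 25×) reproduces the 10-20× PDP range, although [5] does not state that the extremes coincide.

```python
import math


def on_off_ratio(vdd, n=1.5, v_thermal=0.0259):
    """I_on / I_off from Equation 2.1: V_gs = VDD (on) versus V_gs = 0 (off).

    W/L, I_D0, V_T, V_off and the V_ds term cancel when V_ds is equal in both cases.
    """
    return math.exp(vdd / (n * v_thermal))


def pdp_improvement(power_reduction, delay_penalty):
    """PDP improvement factor = power reduction factor / delay increase factor."""
    return power_reduction / delay_penalty


if __name__ == "__main__":
    print(f"I_on/I_off at VDD = 0.3 V: {on_off_ratio(0.3):.0f}")
    print(pdp_improvement(100, 10), pdp_improvement(500, 25))  # 10.0 20.0
```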

The disadvantages of a sub-threshold circuit design approach are:

- Sub-threshold leakage currents are extremely small, resulting in large delays.
- The leakage currents are highly sensitive to process, voltage and temperature (PVT) variations. Equation 2.1 shows that sub-threshold leakage currents depend exponentially on the operating junction temperature of the device, both through the threshold voltage $V_T$, which decreases with temperature, and through $v_t$, which is proportional to temperature. The exponential dependence of the leakage current on $V_{gs}$ and $V_T$ also makes sub-threshold circuits exponentially sensitive to power supply and process variations.
- There are no established design methodologies in use today for sub-threshold logic circuits. A systematic EDA framework for the design of complex digital systems using sub-threshold logic has not been demonstrated. The exponential dependence of sub-threshold circuits on PVT variations has made it difficult to devise such a design methodology.

### II-B. Previous Work in Sub-threshold Circuit Design

Several authors have proposed design techniques to implement sub-threshold circuits. In [7, 8, 9], the authors introduce a sub-threshold logic design approach for ultra-low power circuits. They state that their approach would be useful for any application where low power consumption is a primary goal and circuit performance or speed of operation is less important. In one of the two proposed approaches, they describe circuitry to stabilize the operation of their circuit across process and temperature variations. In these papers, the idea of using sub-threshold circuits is discussed from a device standpoint, and candidate compensation circuits are proposed. However, no systematic design methodology is provided to address the multiple issues of process, temperature and supply variations within an IC die.

In [10], the authors have reported a sub-threshold implementation of a multiplier. The methodology utilizes a leakage monitor, and a circuit which compensates the sub-threshold current across process and temperature variations.

In [11], a dynamic substrate biasing technique is introduced as a means to make a design insensitive to process variations. The approach is described in a bulk CMOS context. Further, the technique of [11] matches the circuit delay to that of the critical paths, which must be identified up-front. The dynamic biasing is not performed on a per-region basis, making it susceptible to intra-die variations.

In [5], the authors describe a methodology to design sub-threshold circuits using a network of Programmable Logic Arrays (PLAs). This approach introduces compensation circuitry to counter the large inter-die and intra-die PVT variations of sub-threshold circuits. The compensation circuit phase-locks the delay of one representative PLA in the design to an external clock by dynamically modulating the bias voltage of the bulk terminal, thereby controlling the circuit delay. Also, since all the PLAs have the same delay, finding the critical path delay is trivial.

### II-C. Choosing a Sub-threshold Circuit Design Methodology

The design methodology given in [5] provides a practical approach, as it directly compensates the circuit delay using an external reference clock. Also, finding the critical circuit delay is easy since each PLA has an identical delay. This gives the designer more control over the performance of the circuit. Results from [5] indicate that a sub-threshold design approach yields a 100-500× reduction in power compared to traditional designs, with a 10-25× delay penalty. This analysis was performed using the Berkeley Predictive Technology Model [12] for the 0.1 μm and 0.07 μm processes. Table II.1 shows the delay, power and power-delay product (P-D-P) for a 21-stage ring oscillator, implemented using both the traditional and sub-threshold design approaches. Note that Table II.1 also indicates that the P-D-P of the sub-threshold design is roughly 8-20× better than that of a traditional design.

Table II.1. Comparison of Traditional and Sub-threshold Circuits

| Process | Traditional: Delay (ps) | Traditional: Power (W) | Traditional: P-D-P (J) | Sub-threshold ($V_b = 0$ V): Delay increase | Sub-threshold ($V_b = 0$ V): Power reduction | Sub-threshold ($V_b = 0$ V): P-D-P improvement | Sub-threshold ($V_b = VDD$): Delay increase | Sub-threshold ($V_b = VDD$): Power reduction | Sub-threshold ($V_b = VDD$): P-D-P improvement |
|---------|--------|----------|----------|--------|---------|--------|--------|---------|--------|
| bsim70  | 14.157 | 4.08e-05 | 5.82e-07 | 17.01× | 308.82× | 18.50× | 9.93×  | 141.10× | 14.43× |
| bsim100 | 17.118 | 6.39e-05 | 1.08e-06 | 24.60× | 497.54× | 20.08× | 12.00× | 100.96× | 8.20×  |

One disadvantage of using sub-threshold circuits is the huge variation in the sub-threshold current (and hence the circuit delay) as a function of $V_T$, temperature and *VDD*. Sub-threshold currents depend exponentially on the circuit's junction temperature, as indicated in Equation 2.1. In fact, the variation of circuit delay over a junction temperature range of 0°C to 100°C is as high as 6×. Sub-threshold circuits are also affected exponentially by process and supply variations. This makes a sub-threshold design approach highly unpredictable and the design hard to center. To equalize the delay across these variations, [5] introduces compensation circuitry that dynamically modulates the bias voltage of the bulk terminal, thereby controlling the circuit delay. This involves *phase-locking* the circuit delay to an external clock called the *beat clk*. This phase locking is done for a group of spatially localized Programmable Logic Arrays (PLAs).

Fig. II.1. Delay Range with and without Our Dynamic Body Bias Technique

The sub-threshold design methodology in [5] uses a network of PLAs as the underlying circuit structure for implementing logic. The working of PLAs will be described in Section III-C.1. PLAs are used here mainly because this circuit block can be designed such that the delay of all outputs is constant, regardless of the input vector applied. Hence, the task of finding the critical delay path (which must be solved in other bulk bias control approaches such as [11]) is avoided. If all the PLAs in the design have the same size, the circuit delay to be monitored can be the delay of any one of the PLAs (used as a representative block). Also, design methodologies using a network of medium-sized PLAs were shown [13] to be a viable way to perform digital design, resulting in improved delay. In a standard cell based flow, there is an intervening technology mapping step, which often negates the benefits of technology-independent logic optimization. A network of PLAs, on the other hand, allows us to carry forward the benefits of technology-independent multilevel logic synthesis. Finally, a design implemented using such a network of PLAs can be easily mapped into a structured ASIC setting [14].

A circuit to be implemented using this methodology is realized as a multi-level network of interconnected, dynamic NOR-NOR PLAs. Spatially localized PLAs are clustered, and each cluster of PLAs shares a common Nbulk node. This Nbulk node is driven by a bulk bias adjustment circuit (one per PLA cluster), whose task is to synchronize the delay of a representative PLA in the cluster to a globally distributed *beat clock (BCLK)*. The beat clock is an external signal, derived from the system clock. If the user would like a higher speed of operation, they increase the duty cycle of *beat clk*, and all PLAs in the design speed up to synchronize to *beat clk*. Conversely, the user can reduce the duty cycle of *beat clk* (when the computational needs are relaxed), and the PLAs slow down and synchronize to *beat clk* again. Importantly, *these adjustments are done dynamically, in a closed-loop manner during circuit operation*. In this way, a synchronous design methodology that is insensitive to inter-die and intra-die process, temperature and voltage variations can be implemented using sub-threshold PLAs.
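To make the dynamic NOR-NOR PLA structure more concrete, the following sketch models a PLA purely functionally: a first NOR plane forms the product terms and a second NOR plane, followed by an output inverter, forms their sum. The personality format (`and_plane`, `or_plane`), the output inverter and the example functions are illustrative assumptions, and the sketch ignores the dynamic precharge and evaluate phases entirely. Every output passes through exactly two NOR planes whatever the input vector, which is consistent with the observation that a PLA can be designed so that all outputs have the same delay.

```python
from typing import List, Sequence, Tuple

# Hypothetical personality format: a product term lists the literals it needs,
# each as (input index, required value).
ProductTerm = List[Tuple[int, bool]]


def nor(bits) -> bool:
    """A NOR line: high only when every connected pull-down input is low."""
    return not any(bits)


def eval_nor_nor_pla(inputs: Sequence[bool],
                     and_plane: List[ProductTerm],
                     or_plane: List[List[int]]) -> List[bool]:
    # First NOR plane: each product line is driven by the complements of its
    # required literals, so it stays high only if every literal is satisfied.
    products = [nor(inputs[i] != want for i, want in term) for term in and_plane]
    # Second NOR plane plus an output inverter: OR of the selected product lines.
    return [not nor(products[p] for p in outs) for outs in or_plane]


if __name__ == "__main__":
    # Hypothetical personality: out0 = a XOR b, out1 = a AND b.
    and_plane = [[(0, True), (1, False)],   # a & ~b
                 [(0, False), (1, True)],   # ~a & b
                 [(0, True), (1, True)]]    # a & b
    or_plane = [[0, 1], [2]]
    for a in (False, True):
        for b in (False, True):
            print(a, b, eval_nor_nor_pla([a, b], and_plane, or_plane))
```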

Figure II.1 illustrates the performance of this process, voltage and temperature (PVT) compensation circuit. It shows the range of delays of a sub-threshold ring oscillator when process, voltage and temperature are varied. $V_T$ was varied by ±5%, *VDD* by ±10% and $l_{eff}$ by ±5%, where each range represents a $3\sigma$ variation around the mean [15]. The range of delays of the uncompensated sub-threshold circuit is shown in the light color, and that of the compensated design in the dark color. Without a compensating circuit, the delay varies by as much as 1-2 orders of magnitude, which effectively makes a sub-threshold design approach infeasible. With the dynamic compensating circuit, the delay variation is extremely low, an improvement of between 1 and 2 orders of magnitude, enabling a viable circuit design approach.

The following observations about this design style are made in [16]:

- The PLAs in this approach operate *just fast enough* to stay synchronized with the *beat clk*, thereby minimizing circuit power for a given speed of operation.
- Adaptive bulk voltage control is not performed for PMOS devices, since there are very few PMOS devices per PLA; they are mostly used for precharging the PLA and do not affect the critical evaluation stage. Bulk voltage control is crucial for the NMOS devices, since they perform the computation during the evaluate phase of the clock.
- The distribution of the power supply and ground signals should be performed using a low-resistance supply distribution methodology such as a layout fabric [17, 13]. The power distribution networks in these papers had significantly lower *iR* drops than existing power distribution approaches (up to 20× lower than traditional approaches [17]). The distribution of a sub-threshold *VDD* signal could be challenging, but this challenge can be averted by using a high quality power distribution grid. Also, the switching currents in the sub-threshold design methodology are up to a couple of orders of magnitude smaller than in traditional designs, alleviating the power supply distribution problem significantly.

This thesis involves the design and implementation of a test application using the sub-threshold design methodology introduced in [5], demonstrating that circuit design in the sub-threshold realm could be a mainstream solution for a large class of applications. Compensation circuitry to combat PVT variations is used to limit the delay variation of the circuit, thus making the design robust. The chip is designed for the TSMC [18] $0.25\mu$m process, which is a triple well CMOS process. The triple well process is needed because the bulk nodes of the variation-compensated transistors must be isolated from the substrate of the chip, which is a requirement of this design methodology.

#### CHAPTER III

## DESIGN OF THE CHIP

#### III-A. Chapter Overview

The main objective of this thesis is to demonstrate the viability of a sub-threshold design methodology. This chapter presents the design of a test application using the circuit design methodology described in [5]. It discusses the criteria used to choose a test application and defines the design constraints that must be taken into account while designing a sub-threshold circuit. The architecture of the system to be implemented is shown, and the various sub-blocks in the system are explained in detail. This chapter also outlines some special considerations, as well as the redundant and fail-safe features that are built into the chip. The chip is designed for the TSMC [18] $0.25\mu$m process, which is a triple well CMOS process.

#### III-B. Test Vehicle

As mentioned before, there is a large and growing application space that requires very low power consumption without the need for high speed. One such application is a wireless radio transmitter where the signal to be transmitted occupies a small bandwidth (such as voice). An ultra-low power implementation of a radio transmitter will have broad implications for the class of applications that demand very low power consumption; for example, such a wireless transmitter can be used in sensor networks. The radio transmitter will be realized with digital circuits as far as possible, since digital circuits are preferable to analog circuits when operating in the sub-threshold region. The digital circuits will be implemented using a Network of PLAs (NPLA) based approach. The immunity of the circuit to variations can be strengthened using the dynamic delay compensation circuitry discussed in Section III-C.4. A simple digital modulation scheme has to be used for the radio transmitter. Binary Frequency Shift Keying (BFSK) and Binary Phase Shift Keying (BPSK) are two well known digital modulation schemes. BPSK is 3dB more power efficient than BFSK; however, BFSK has the advantage of being easy to implement. Hence BFSK will be used as the modulation scheme for our radio transmitter.

Fig. III.1. BFSK Transmitter Architecture

## III-B.1. BFSK Radio Transmitter Architecture

A typical BFSK transmitter generates a frequency tone at the output and shifts the frequency of the output tone between pre-determined values depending on whether the input is a logical HIGH or LOW. A generic digital BFSK transmitter block diagram is shown in Figure III.1. The input to the transmitter is assumed to be digitized and supplied to the transmitter at a rate of $R_B$ bits/s. The frequencies of the two tones produced by the BFSK transmitter are $f_1$ and $f_2$, and $\phi_1$ and $\phi_2$ are the phase offsets of the two tones. Depending on the value of the binary input, one of the tones is multiplexed to the output. A BFSK transmitter can be coherent or non-coherent. In a coherent BFSK modulation scheme, $\phi_1 = \phi_2$, and in a non-coherent BFSK modulation scheme, $\phi_1 \neq \phi_2$. In practice, coherent BFSK is very hard to demodulate, since it requires synchronization between the transmitter and the receiver. Hence we will use a non-coherent modulation scheme.

For non-coherent modulation, if $f_1 - f_2$ is an integer multiple of the input bit rate $R_B$, the modulation is called orthogonal FSK, since the two signals used for modulating the binary data are then orthogonal. If this condition is not met, the FSK scheme is called non-orthogonal. Non-orthogonal FSK requires more transmit power than orthogonal FSK for the same error performance at the receiver. The receiver for both schemes can be constructed using a pair of bandpass filters with their pass bands centered around $f_1$ and $f_2$ respectively.
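To make the orthogonality condition concrete, the following Python sketch numerically correlates two tones over one bit period $T = 1/R_B$ for several phase offsets. It is an illustrative check, not part of the transmitter: frequencies are normalised to $R_B$, and the test values ($f_1 = 2R_B$, with spacings of $R_B$ and $0.5R_B$) are assumptions chosen for clarity.

```python
import math

def tone_correlation(f1, f2, phi1, phi2, bit_period=1.0, steps=20000):
    """Normalised correlation of two tones over one bit period (midpoint rule).

    Returns ~0 when the tones are orthogonal over the bit and values near
    +/-1 when they are strongly correlated.
    """
    dt = bit_period / steps
    acc = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        acc += math.cos(2 * math.pi * f1 * t + phi1) * math.cos(2 * math.pi * f2 * t + phi2)
    # Each tone has energy bit_period / 2 over one bit, so normalise by that.
    return acc * dt / (bit_period / 2)

# Frequencies in units of R_B, so the bit period is 1.
for spacing in (1.0, 0.5):
    worst = max(abs(tone_correlation(2.0, 2.0 + spacing, 0.0, phi))
                for phi in (0.0, math.pi / 4, math.pi / 2))
    print(f"f2 - f1 = {spacing} R_B -> worst |correlation| = {worst:.3f}")
```

With a spacing of $R_B$ the correlation vanishes for every phase offset tried, whereas a spacing of $0.5R_B$ leaves a correlation of about 0.7 for some phases. When $f_1 + f_2$ is not itself an integer multiple of $R_B$, a small residual from the sum-frequency term remains, and it shrinks as the tones move further above $R_B$.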

When designing a BFSK transmitter, the two oscillators in Figure III.1 can be realized digitally as a single Numerically Controlled Oscillator (NCO), which is described in Section III-C.5.a. To transmit the signal wirelessly, we also need a Digital to Analog Converter (DAC) and an antenna. The entire system level architecture is explained in Section III-C.

#### III-C. System Architecture

The BFSK transmitter architecture consists of a digital BFSK modulation circuit, a DAC, an amplifier and an antenna for wireless transmission, as shown in Figure III.2. The BFSK modulator is implemented as a digital circuit, using a network of Programmable Logic Arrays (PLAs). We first give a brief introduction to PLAs and how they are used in a network to perform computations, and then discuss in detail each of the digital and analog components that make up the system.



Fig. III.2. System Architecture

#### III-C.1. PLA Basics

This section describes the structure and operation of PLAs which are the basic circuit modules used in this design. Note that the PLAs in this design operate in their sub-threshold region of conduction. Consider a PLA consisting of *n* input variables  $x_1, x_2, \dots, x_n$ , and *m* output variables  $y_1, y_2, \dots, y_m$ . Let *k* be the number of rows in the PLA. A *literal*  $l_i$  is defined as an input variable or its complement.

Suppose we want to implement a function $f$ represented as a sum of cubes $f = c_1 + c_2 + \dots + c_k$, where each cube $c_i = l_i^1 \cdot l_i^2 \cdots l_i^{r_i}$. We consider PLAs of the *NOR*-*NOR* form, which means that we actually implement $\overline{f}$ as

$$\overline{f} = \overline{\sum_{i=1}^{k} c_i} = \overline{\sum_{i=1}^{k} \overline{\left(\overline{c_i}\right)}} = \overline{\sum_{i=1}^{k} \overline{\left(\overline{l_i^1} + \overline{l_i^2} + \dots + \overline{l_i^{r_i}}\right)}} \tag{3.1}$$

The PLA output  $\overline{f}$  is a logical NOR of a series of expressions, each corresponding to the NOR of the complement of the literals present in the cubes of f. In the PLA, each such expression is implemented using *word lines*, in what is called the *AND plane*. These word lines run horizontally through the core of the PLA. Literals of the PLA are implemented using vertical-running *bit-lines*. For each input variable, there are two bit-lines, one for each of its literals. The outputs of the PLA are implemented by *output lines*, which also run vertically. This portion of the PLA is called the *OR plane*.
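The NOR-NOR evaluation in Equation 3.1 can be checked in software. The Python sketch below models the AND plane (each wordline is the NOR of the complemented literals of one cube) and the OR plane (the output line is the NOR of the wordlines) as ideal Boolean logic, ignoring precharge and timing. The cube encoding and the example function $f = x_0\overline{x_1} + x_2$ are hypothetical.

```python
from itertools import product

def nor(values):
    """NOR of an iterable of booleans."""
    return not any(values)

def literal(inputs, var, positive):
    """Value of literal x_var (positive=True) or its complement (positive=False)."""
    return inputs[var] if positive else not inputs[var]

def nor_nor_pla(cubes, inputs):
    """Evaluate a NOR-NOR PLA and return the output line value, i.e. f_bar.

    cubes: list of cubes, each a list of (var, positive) literals.
    """
    # AND plane: each wordline is the NOR of the complemented literals of one
    # cube, which by De Morgan equals the cube (the AND of its literals).
    wordlines = [nor(not literal(inputs, v, p) for v, p in cube) for cube in cubes]
    # OR plane: the output line is the NOR of the wordlines, i.e. the complement of f.
    return nor(wordlines)

# Hypothetical example f = x0 * x1' + x2, i.e. cubes c1 = x0 x1' and c2 = x2.
CUBES = [[(0, True), (1, False)], [(2, True)]]
for x in product([False, True], repeat=3):
    f = (x[0] and not x[1]) or x[2]
    assert nor_nor_pla(CUBES, x) == (not f)
```

The loop checks all eight input combinations, confirming that the output line carries $\overline{f}$, so an inverter (or the next PLA level) must account for the complement.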

Figure III.3 illustrates the schematic of the PLAs used in this design.



Fig. III.3. Schematic View of PLA

All the PLAs in our design are of the precharged NOR-NOR type, and have a fixed number of inputs (8), outputs (6) and cubes (12). This size was found to work well for the design, based on the logic synthesis results explained in Section IV-B, in which medium sized PLAs (5-15 inputs, 3-8 outputs and 10-20 rows) were considered. A technique called folding is also used to fit more logic into a PLA without increasing its area. This is done by running two unconnected *bit-lines*, corresponding to two different inputs, on the same track. One of the *bit-lines* starts from the top of the PLA and the other starts from the bottom and stops clear of the first *bit-line*. In this way, more cubes can be fitted into the PLA in a compact way.

## III-C.2. PLA Operation

The PLAs enter their precharge state when the CLK signal is low. During this time, the horizontal wordlines are precharged, as is a special wordline (the dummy wordline), which is the most heavily loaded wordline. The signal on the dummy wordline is inverted to generate the delayed clock signal D\_CLK. When the dummy wordline precharges (after all the other wordlines of the PLA have precharged), D\_CLK switches low, cutting off the OR plane from GND. D\_CLK also drives the PMOS pullups at each output line, which precharge the output lines during the precharge phase. A special output line (which is inverted to produce the signal *completion* shown in Figure III.3) is also precharged. The dummy wordline is designed to be the last wordline to switch, by making it the most heavily loaded of all wordlines. Similarly, the *completion* signal is the last output signal to switch, since its output line is also more heavily loaded than the other outputs. The *completion* signal switching low signals the end of the precharge operation; at this point all the wordlines and output lines of the PLA are precharged.

When the CLK signal switches high, the PLA enters the *evaluation* phase. During evaluation, a wordline is pulled low if any of the vertical bitlines connected to it is high. One of the inputs and its complement are both connected to the dummy wordline, so that the dummy wordline switches low during *every* evaluate phase and effectively acts as a timing reference for the PLA. By design, the dummy wordline is the last wordline to switch low. When it does, D\_CLK switches high, which turns on the GND gating transistor in the OR plane. Each output line connected to a wordline that has remained high then switches low. The *completion* signal, whose output line is connected to the complement of the dummy wordline, is the last signal to switch high, which signals the end of the evaluation operation. Since *completion* switches in every cycle, it is used to phase lock the PLA delay to the *beat clock (BCLK)* signal. Initial simulations using HSPICE [19] showed that the precharge and evaluation times for the 8 input, 6 output, 12 cube NOR-NOR PLA were $T_{pchg} = 45ns$ and $T_{eval} = 35ns$.

### III-C.3. Network of PLA Operation

A network of PLAs (NPLA) is a multilevel network of PLAs. Each of the digital components of the digital BFSK modulator in Figure III.2, i.e. the Dynamic Compensation circuit, the NCO and the Binary to Thermometer Code Converter, is built from NPLAs. Each of these blocks is implemented as a combinational circuit, and its outputs are registered using negative edge triggered flip-flops clocked by *CLK*. Negative edge triggered flip-flops are used because their outputs must remain stable while *CLK* is HIGH, which is when the PLAs evaluate. The timing diagram of the NPLAs in a single combinational circuit is shown in Figure III.4. Notice from this figure that all the PLAs in a network precharge at the same time but evaluate one after another in a cascading fashion, so the evaluation period must be long enough for *all* the PLAs to evaluate. Each PLA in the network is clocked by the previous PLA's *CLKOUT* signal, except for the first PLA in the chain, which is clocked by the *CLK* signal. The *CLKOUT* signal of each PLA is the logical AND of its *completion* signal and the *CLK* signal. The maximum throughput that can be achieved is set by the delay of the slowest combinational block. When this block is implemented as a network of PLAs, the throughput of the circuit can be approximately written as:

$$\text{Throughput} = \frac{1}{T_{pchg} + N \cdot T_{eval}} \tag{3.2}$$

Here *N* is the number of levels of PLAs in the multilevel network. We will see in Section IV-B that the slowest combinational block in this design needs a maximum of 19 levels. Using $T_{pchg} = 45ns$ and $T_{eval} = 35ns$ from the previous section, this gives an estimated throughput of $1/(45ns + 19 \times 35ns) = 1/710ns \approx$ **1.4MHz**.
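Equation 3.2 is simple enough to evaluate directly. The short Python sketch below (the function name is illustrative) reproduces the 1.4MHz estimate and shows how quickly throughput falls as the logic depth $N$ grows.

```python
def npla_throughput_hz(t_pchg_s, t_eval_s, levels):
    """Equation 3.2: one shared precharge plus `levels` cascaded evaluations."""
    return 1.0 / (t_pchg_s + levels * t_eval_s)

T_PCHG = 45e-9  # s, HSPICE estimate for the 8-input, 6-output, 12-cube PLA
T_EVAL = 35e-9  # s

for n in (1, 10, 19):
    mhz = npla_throughput_hz(T_PCHG, T_EVAL, n) / 1e6
    print(f"N = {n:2d} levels -> {mhz:.2f} MHz")
```

For $N = 19$ this gives $1/710ns \approx 1.41$MHz, while a single level would allow 12.5MHz, which is why the depth of the slowest block dominates the achievable clock rate.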



Fig. III.4. Timing Diagram of NPLAs

## III-C.4. Dynamic Compensation Circuit

As discussed in Section II-C, the dynamic delay compensation circuit is used to *phase lock* the circuit delay to a *beat clock*. This phase locking is done for a group of spatially localized Programmable Logic Arrays (PLAs). The circuit in the design consists of a multilevel network of interconnected dynamic NOR-NOR PLAs. The total number of PLAs needed for this design is 33, as seen in Section IV-B. These PLAs are placed such that they form a single cluster of PLAs sharing a common Nbulk node. This Nbulk node is driven by a bulk bias adjustment circuit, whose task is to synchronize the delay of a representative PLA in the cluster to a globally distributed *beat clock (BCLK)*. The beat clock is an external signal, derived from the system clock. For a high speed of operation, the duty cycle of *BCLK* is increased, and all PLAs in the design speed up to synchronize to *BCLK*. Conversely, reducing the duty cycle of *BCLK* slows down the PLAs, which again synchronize to *BCLK*. In this way, we can implement a synchronous design methodology using sub-threshold PLAs, in a manner that is insensitive to inter and intra-die process, temperature and voltage variations.

The self-adjusting body bias scheme controls the substrate voltage of the PLAs in a closed-loop fashion, by ensuring that the delay of a representative PLA in the cluster is phase locked to the *BCLK* signal. The phase detector and charge pump circuits used for the design are shown in Figure III.5.



Courtesy: [5]

Fig. III.5. Phase Detector and Charge Pump Circuit

The NAND gate in this figure detects the case when the completion signal is too slow, and generates low-going pulses in such a condition. These pulses turn on the PMOS device of Figure III.5 and increase the *Nbulk* bias voltage, resulting in a speed-up in the PLAs. The waveforms of the signals for this case are shown in Figure III.6.



Fig. III.6. Phase Detector Waveforms when PLA Delay Lags BCLK

Note that in general, *BCLK* may be derived from *CLK*, having coincident falling edges with *CLK* but a rising edge which is delayed by a quantity *D* from the rising edge of *CLK* as shown in Figure III.6. This quantity *D* is the delay which we want for the evaluation of the reference PLA whose *completion* signal is fed to the Phase Detector Circuit shown in Figure III.5.

If the completion has not occurred by the time *BCLK* rises, a downward pulse is generated on the *pullup* signal, which forces charge into the Nbulk node, resulting in faster generation of *completion*. Note that at this time, *pulldown*, the signal which is used to bleed off charge from *Nbulk*, is low.

The NOR gate in Figure III.5 generates high-going pulses to turn on the NMOS transistor when the PLA delay leads *BCLK*. These pulses drive the NMOS device in Figure III.5, bleeding charge out of *Nbulk* and thereby slowing the PLA down. The waveforms of the signals for this case are shown in Figure III.7.



Fig. III.7. Phase Detector Waveforms when PLA Delay Leads BCLK
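The closed-loop behaviour of the phase detector and charge pump can be illustrated with a behavioural model. The Python sketch below is not a circuit simulation: the exponential delay-versus-bulk-bias model, its constants, the charge pump gain, the bias limits and the assumption that the pulse width (and hence the charge moved) is proportional to the timing error are all hypothetical. It only reproduces the sign convention described above: a lagging *completion* raises *Nbulk* and a leading one lowers it, until the PLA delay matches the target delay $D$.

```python
import math

def pla_delay_ns(v_nbulk, t_zero_bias_ns=350.0, speedup_per_volt=4.0):
    """Hypothetical delay model of the reference PLA.

    Raising the NMOS bulk voltage lowers V_T; in sub-threshold the delay depends
    roughly exponentially on V_T, so here it shrinks exponentially with v_nbulk.
    """
    return t_zero_bias_ns * math.exp(-speedup_per_volt * v_nbulk)

def lock_nbulk(target_delay_ns, cycles=100, v_nbulk=0.0,
               volts_per_ns=5e-4, v_min=-0.4, v_max=0.4):
    """Behavioural model of the phase detector and charge pump loop.

    Every CLK cycle the timing error between completion and the BCLK edge is
    turned into a pulse whose width (and so the charge moved) is assumed
    proportional to the error: a lagging PLA gets a pullup pulse (charge into
    Nbulk, speed up) and a leading PLA gets a pulldown pulse (bleed charge,
    slow down).
    """
    for _ in range(cycles):
        error_ns = pla_delay_ns(v_nbulk) - target_delay_ns  # > 0: completion lags BCLK
        v_nbulk += volts_per_ns * error_ns                  # net charge pumped this cycle
        v_nbulk = min(max(v_nbulk, v_min), v_max)           # keep bias within limits
    return v_nbulk

for target in (300.0, 400.0):  # target delay D shorter / longer than the unbiased delay
    v = lock_nbulk(target)
    print(f"D = {target:.0f} ns -> Nbulk = {v:+.4f} V, delay = {pla_delay_ns(v):.2f} ns")
```

With a target $D$ shorter than the unbiased delay, the loop settles at a positive *Nbulk* bias (a faster PLA); with a longer $D$ it settles at a negative bias.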

*BCLK* sets the target evaluation delay of the PLAs. The evaluations of the PLAs in our design happen one after the other, as shown in Figure III.4, so we need to choose a reference PLA from the chain of PLAs in the network. The *completion* signal of this reference PLA is used as the reference circuit delay for the delay compensation circuit. Usually there are many levels of PLAs in the synthesized network. In this scenario, it is best to choose a PLA which completes its evaluation at approximately half the time it takes the entire network of PLAs to evaluate. The completion signal of the reference PLA then switches near the middle of the evaluation time span of the *CLK* signal, which leaves the *BCLK* signal sufficient room on both sides of the completion signal to generate equally long *pullup* or *pulldown* pulses. In our case, we use a PLA at logical depth 10 out of a maximum of 19 as the reference PLA.

### III-C.5. The Digital BFSK Modulator

The function of the digital BFSK modulator, as seen in Section III-B.1, is to produce one of two frequency tones depending on the logical value of a binary input signal. The digital BFSK modulator in Figure III.1 has two oscillators, but we avoid the complexity of two oscillators by using a single Numerically Controlled Oscillator (NCO). The modulator is implemented using three combinational circuits, namely the phase accumulator, the NCO and the binary to thermometer code converter. These combinational circuits have negative edge triggered registers between them, which are clocked by the *CLK* signal. The combinational circuits are described in the next two sections.

#### III-C.5.a. Phase Accumulator and NCO

The NCO is a digital implementation of a sinusoidal oscillator. The advantage of an NCO is that the frequency and phase of the sinusoidal wave it produces can be altered in real time by programming the NCO. The NCO is implemented as a lookup table (LUT) that stores quantized and rounded values of the sinusoidal wave. The index of the LUT represents the angle for which the sinusoidal value needs to be found. If the LUT is addressed by *n* bits, its depth is $2^n$ and it stores $2^n$ equally spaced samples of the sinusoidal wave over an angle of $0^\circ$ to $360^\circ$, one per address. The LUT is addressed by a self-incrementing counter known as the phase accumulator. When the phase accumulator and the NCO are clocked by a clock signal of frequency $f_{clk}$, the phase accumulator causes evenly spaced values of the sinusoidal wave to be read out of the NCO, with a spacing set by the value by which the accumulator increments. The output frequency generated by the NCO is given by the equation:

$$f_{out} = \frac{f_{clk}\Delta\theta}{2^n} \tag{3.3}$$

where $f_{out}$ is the frequency of the digital sinusoidal wave generated by the NCO, $f_{clk}$ is the frequency of the clock driving the phase accumulator and the LUT, and $\Delta \theta$ is the value by which the phase accumulator increments on every clock cycle. To change the frequency produced at the output of the NCO, we control the phase accumulator increment $\Delta \theta$ based on the value of the binary input signal which needs to be modulated. From Equation 3.3, the depth of the LUT is one of the factors that set the granularity or resolution with which output frequencies can be chosen. The width of each word stored in the LUT determines how accurately each sine value is represented. The quality of the output signal is measured by its spectral purity, quantified by a parameter called the Spurious Free Dynamic Range (SFDR). A rule of thumb for the NCO is that the SFDR in dB is about six times the width of the phase accumulator in bits; for example, a 9 bit phase accumulator gives an SFDR of about 54 dB. This holds provided the words stored in the LUT are wide enough; beyond that point, making the LUT words wider does not improve the SFDR further. An advantage of using an NCO to generate the two FSK tones is that continuous phase is guaranteed at the output of the digital modulator: when the binary input changes from a logical "0" to a logical "1", the output frequency of the NCO changes smoothly, without a phase discontinuity (kink) at the output of the modulator.
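A minimal Python sketch of the phase accumulator and LUT follows. It assumes a 9 bit accumulator and an 8 bit output word, an unsigned (offset-binary) output with mid-scale 128 representing zero, and a full 512-entry table (no quarter-wave folding). Which bit value selects which increment is a hypothetical choice; the increments 59 and 177 (three times 59) are used for illustration. Because the input bit only selects the increment and the accumulator is never reset, the generated phase is continuous across bit boundaries.

```python
import math

N_BITS = 9                     # phase accumulator width n
OUT_BITS = 8                   # LUT word width
DEPTH = 1 << N_BITS            # 2**n = 512 equally spaced samples per cycle
MID = 1 << (OUT_BITS - 1)      # 128: mid-scale code represents zero
AMP = MID - 1                  # 127: peak amplitude in LSBs

# Full-depth LUT over 0..360 degrees, rounded to unsigned 8-bit words.
SINE_LUT = [MID + round(AMP * math.sin(2 * math.pi * k / DEPTH)) for k in range(DEPTH)]

def nco_frequency_hz(f_clk_hz, delta_theta):
    """Equation 3.3: f_out = f_clk * delta_theta / 2**n."""
    return f_clk_hz * delta_theta / DEPTH

def bfsk_nco(bits, delta_theta_0, delta_theta_1, samples_per_bit):
    """Continuous-phase BFSK from a single NCO.

    The input bit only selects the phase increment; the accumulator is never
    reset, so switching tones does not introduce a phase jump.
    """
    phase = 0
    samples, phases = [], []
    for bit in bits:
        step = delta_theta_1 if bit else delta_theta_0
        for _ in range(samples_per_bit):
            phases.append(phase)
            samples.append(SINE_LUT[phase])
            phase = (phase + step) % DEPTH   # n-bit accumulator wraps modulo 2**n
    return samples, phases

samples, phases = bfsk_nco([0, 1, 0], delta_theta_0=59, delta_theta_1=177, samples_per_bit=40)
```

Every consecutive phase difference is either 59 or 177, including at the bit boundaries, which is the continuous-phase property described above.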

One optimization that can be made to the NCO is that the LUT need not store sinusoidal values for all input angles. The size of the LUT can be reduced by a factor of four due to the quarter wave symmetry of the sinusoidal wave: depending on the quadrant of the input angle, the sine value can be generated from just a quarter of the samples for a full cycle. A register is required at the output of the phase accumulator, since the previous value of the phase accumulator needs to be stored for it to increment itself. We choose the NCO to have a 9 bit phase accumulator and an 8 bit output. This gives an SFDR of 54dB, which is a reasonable amount of rejection for our application. The estimate of $f_{clk}$ made in Section III-C.3 is 1.4MHz. To transmit wireless data using orthogonal FSK, the difference between $f_1$ and $f_2$ must be an integer multiple of the data rate $R_B$, which is 32kbps. By Nyquist's theorem, the maximum frequency that can be represented without losing information using a clock rate of 1.4MHz is half its value, so $f_1$ and $f_2$ must be less than 700kHz. However, $f_1$ and $f_2$ must also be high enough to be easy to demodulate at the receiver side. Hence we choose the phase accumulator increment $\Delta \theta_1$ as 59, which places the first tone at $59/512 \approx 0.115$ of $f_{clk}$. From Equation 3.3, the frequency of the first tone is

$$f_1 = \frac{f_{clk} \times 59}{512} \tag{3.4}$$

We choose the second tone to have three times the frequency of $f_1$, by choosing the phase accumulator increment $\Delta \theta_2$ as $3 \times 59 = 177$. We choose $f_{clk}$ to be 40 times $R_B$, an integer multiple of $R_B$ that is below the estimated 1.4MHz. In this case $f_1 = 151.04kHz$ and $f_2 = 453.12kHz$. For the orthogonality condition of Section III-B.1 to hold exactly, the resulting spacing $f_2 - f_1 = f_{clk}(\Delta\theta_2 - \Delta\theta_1)/512$ must itself be an integer multiple of $R_B$. Note that the values of $f_1$ and $f_2$ could be left completely programmable, achieving a Software Defined Radio (SDR) transmitter. However, this needs an additional 8 inputs, so it was not done for the sub-threshold IC.
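The tone plan can be checked numerically. The Python sketch below evaluates Equation 3.3 for $\Delta\theta_1 = 59$ and $\Delta\theta_2 = 3 \times 59 = 177$ with $f_{clk} = 40R_B$. It assumes that "32kbps" denotes $32 \times 1024 = 32768$ bit/s, since that reading reproduces the quoted 151.04kHz and 453.12kHz exactly (with $R_B = 32000$ bit/s the tones would be 147.5kHz and 442.5kHz). It also sketches the quarter-wave LUT folding, using a hypothetical signed table of $2^n/4 + 1$ entries so that $90^\circ$ is stored explicitly; a hardware table might instead store $2^n/4$ entries with a half-step phase offset.

```python
import math

N_BITS = 9
DEPTH = 1 << N_BITS                 # 512 phase steps per sine cycle
R_B = 32 * 1024                     # assumption: "32kbps" taken as 32,768 bit/s
F_CLK = 40 * R_B                    # f_clk = 40 R_B = 1.31072 MHz
DTHETA_1, DTHETA_2 = 59, 3 * 59     # second tone at three times the first

def tone_hz(delta_theta, f_clk=F_CLK):
    """Equation 3.3 with n = 9."""
    return f_clk * delta_theta / DEPTH

f1, f2 = tone_hz(DTHETA_1), tone_hz(DTHETA_2)
# f1 = 151.04 kHz, f2 = 453.12 kHz, spacing = 9.21875 R_B
print(f"f1 = {f1 / 1e3:.2f} kHz, f2 = {f2 / 1e3:.2f} kHz, "
      f"(f2 - f1) / R_B = {(f2 - f1) / R_B:.5f}")

# Quarter-wave table: QUARTER + 1 signed samples covering 0..90 degrees.
AMP = 127
QUARTER = DEPTH // 4
QUARTER_LUT = [round(AMP * math.sin(0.5 * math.pi * k / QUARTER)) for k in range(QUARTER + 1)]

def folded_sine(phase):
    """Rebuild a full-cycle sine sample from the quarter-wave table."""
    quadrant, offset = divmod(phase % DEPTH, QUARTER)
    if quadrant in (1, 3):
        offset = QUARTER - offset              # mirror in time (90..180, 270..360 deg)
    value = QUARTER_LUT[offset]
    return -value if quadrant >= 2 else value  # mirror in amplitude (180..360 deg)

full = [round(AMP * math.sin(2 * math.pi * k / DEPTH)) for k in range(DEPTH)]
assert all(abs(folded_sine(k) - full[k]) <= 1 for k in range(DEPTH))
```

For these values $(f_2 - f_1)/R_B$ evaluates to 9.21875, so the tone spacing is close to, but not exactly, an integer multiple of $R_B$. Because both tones span several cycles per bit, the normalised cross-correlation over one bit is bounded by roughly 5%.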

#### III-C.5.b. Binary to Thermometer Code Converter

This circuit block converts a binary encoded digital signal to a thermometer code. A thermometer code is a unary code in which the number of '1's, filled in from the LSB end, equals the unsigned number represented by the binary encoded signal. The thermometer code is used to pre-process the digital signal before it is passed to the Digital to Analog Converter (DAC). The higher order bits of the digital signal are converted to a thermometer code while the lower order bits are left binary encoded. Assuming that the binary encoded signal does not change by large values between samples, the thermometer code changes in very few bits for small changes in the binary code. In contrast, if the binary code is used directly as the input to the DAC, even small increments in value can change many bits in the code. This causes ripples in the output of the DAC, which is undesirable. In our design, we convert the 4 MSBs to thermometer encoded bits and leave the 4 LSBs as binary encoded bits.

Fig. III.8. Digital to Analog Converter
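A Python sketch of the segmented conversion for the Binary to Thermometer Code Converter, splitting an 8 bit code into 15 thermometer bits (from the 4 MSBs) and 4 binary bits. The bit ordering ($T_0$ filled first, binary bits LSB first) and the function names are assumptions for illustration. Comparing toggled lines across the mid-scale transition from 127 to 128 shows the benefit: all 8 lines of a plain binary input change, but only 5 of the 19 segmented lines do.

```python
def to_segmented_thermometer(code, msb_bits=4, lsb_bits=4):
    """Split an unsigned code into thermometer-coded MSBs and binary LSBs.

    Returns (thermometer, binary): 2**msb_bits - 1 thermometer bits T0..
    filled with '1's from T0 upward, and lsb_bits binary bits B0.. (LSB first).
    """
    msb = code >> lsb_bits
    lsb = code & ((1 << lsb_bits) - 1)
    thermometer = [1 if i < msb else 0 for i in range((1 << msb_bits) - 1)]
    binary = [(lsb >> i) & 1 for i in range(lsb_bits)]
    return thermometer, binary

def bits_toggled(a, b):
    """Number of DAC input lines that change between codes a and b."""
    ta, ba = to_segmented_thermometer(a)
    tb, bb = to_segmented_thermometer(b)
    return sum(x != y for x, y in zip(ta + ba, tb + bb))

# Crossing the mid-scale boundary 127 -> 128: plain binary vs segmented code.
print(bin(127 ^ 128).count("1"), bits_toggled(127, 128))   # 8 5
```

Changes confined to the 4 LSBs still toggle up to 4 binary lines, so the benefit comes from the MSB segment, where a step of one thermometer level switches exactly one DAC leg.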

#### III-C.6. Digital to Analog Converter (DAC)

The circuit diagram of the DAC is shown in Figure III.8. The DAC has a reference current mirror transistor M1, biased by resistor Rcm, and one current mirror leg reflecting the reference current for each input bit. The input to the DAC is a 19 bit digital signal: the 15 most significant bits are thermometer encoded and the 4 LSBs are binary encoded, so the DAC has 19 current mirror legs. Figure III.8 shows two of the current mirror legs of the DAC. The inputs $T_i$ and $T_{ib}$ are the $i^{th}$ thermometer encoded bit and its complement, and the inputs $B_i$ and $B_{ib}$ are the $i^{th}$ binary encoded bit and its complement. The DAC works by switching the current mirror legs ON depending on the value of the input bits, and the resulting current develops a voltage across the resistor Rout. The input bits control the NMOS transistors M3, M4, M6 and M7. For any of these legs, if the input bit is LOW, the NMOS on the left (M3 or M6) turns ON and prevents the current mirror leg from conducting current. If the input bit is HIGH, the NMOS on the right turns ON and allows the leg to mirror the current in the reference transistor M1. The current mirror legs for the thermometer code and the binary code differ in the sizes of M2 and M5. The *W/L* ratios of M5 in the legs for the binary encoded bits are 1.3, 2.6, 5.2 and 10.4 from LSB to MSB, doubling for each successive bit. The transistors corresponding to M2 have a *W/L* of 20.8 in all 15 thermometer encoded legs. This allows the DAC to modulate the voltage at OUT based on the weighted current flowing through the different current mirror legs and Rout.

Fig. III.9. Common Source Amplifier
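Assuming ideal mirrors whose current is proportional to $W/L$, the DAC leg sizes can be checked in Python: each thermometer leg ($W/L = 20.8$) carries 16 times the LSB current ($W/L = 1.3$), so the 15 thermometer legs plus the 1.3/2.6/5.2/10.4 binary legs produce 256 equally spaced current levels. The code-to-leg mapping and the function names are hypothetical; the output voltage magnitude would be this relative current times the LSB leg current times Rout, neither of which is specified here.

```python
import math

BINARY_WL = [1.3, 2.6, 5.2, 10.4]   # M5 sizes for B0..B3 (LSB to MSB)
THERM_WL = 20.8                      # M2 size for each thermometer leg
N_THERM = 15                         # thermometer legs for the 4 MSBs

def legs_on(code):
    """Hypothetical mapping of an 8-bit code onto the 19 DAC legs."""
    msb, lsb = code >> 4, code & 0xF
    therm = [i < msb for i in range(N_THERM)]
    binary = [bool((lsb >> i) & 1) for i in range(len(BINARY_WL))]
    return therm, binary

def relative_current(code):
    """Total mirrored current in units of the LSB leg, assuming I proportional to W/L."""
    therm, binary = legs_on(code)
    wl = sum(THERM_WL for on in therm if on)
    wl += sum(w for w, on in zip(BINARY_WL, binary) if on)
    return wl / BINARY_WL[0]

# Each thermometer leg is worth 16 LSB legs (20.8 / 1.3), so the 19 legs
# together reproduce all 256 levels of an 8-bit code.
assert all(math.isclose(relative_current(c), c, abs_tol=1e-9) for c in range(256))
```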

#### III-C.7. Common Source Amplifier

A common source amplifier is needed at the output of the DAC to amplify the signal and drive the antenna. The common source configuration, shown in Figure III.9, is an inverting amplifier. Note that in this configuration there are no bias resistors at the gate of transistor M1. Instead, the gate of M1 is connected directly to the output of the DAC, so it is biased by the DC component of the sinusoidal voltage at the DAC output. The amplifier is powered by a very low *VDD*, under which other configurations such as the source follower (common drain amplifier) do not function correctly. The transient response of the common source amplifier is shown in Section IV-C.

#### III-C.8. Antenna

An on-chip antenna is used to transmit the signal from the amplifier. However, due to the low frequency of operation, the length of the antenna would need to be comparable to half the wavelength of the transmitted signal, which is around 300 meters. Due to area constraints on the chip, we have used an antenna coil only 0.2 meters long. An external antenna will be used to transmit the signal if needed.

### III-D. Design Specifications

#### III-D.1. Link Budget Analysis

A link budget analysis [21] is used in wireless communication systems to calculate the transmit power required at the transmitter, based on certain criteria and assumptions. In this section the link budget analysis is done for a digital non-coherent BFSK transmitter. The assumed design constraints are a transmit distance of 1000 meters and a data rate $R_B$ of 32kbps for the voice signal to be modulated. The link budget analysis proceeds as follows.

- Modulation Technique: The modulation technique used is FSK. With FSK, two separate frequencies are chosen, one representing a logical "zero" and the other a logical "one". For non-coherent FSK the channel bandwidth is typically twice the data rate. In our case we have chosen $f_1$ as 151kHz and $f_2$ as 453kHz, as given in Section III-C.5.a, so the channel bandwidth is taken as $f_2 - f_1 = 302kHz$, well above $2R_B$. The wider tone spacing also makes it easier to design a reliable and robust receiver.
- Noise Floor: The noise power in watts is given by

$$N = kTB \tag{3.5}$$

where $k$ is Boltzmann's constant in J/K, $T$ is the system temperature, usually assumed to be 290K, and $B$ is the channel bandwidth in Hz:

$$\begin{aligned}
N &= 1.38 \times 10^{-23}\,\text{J/K} \times 290\,\text{K} \times 302\,\text{kHz} \\
&= 1.209 \times 10^{-12}\,\text{mW} \\
&= -119.18\,\text{dBm}
\end{aligned}$$

A typical low cost receiver adds about 15dB to the noise floor, so the receiver noise floor is -104.18 dBm.

• **Receiver Sensitivity:** The required signal strength at the receiver input needs to be determined. For non-coherent digital BFSK modulation using orthogonal signals, the probability of bit error at the receiver is given by the following expression [22].

$$P_b = \frac{1}{2}e^{-E_b/2N_0} \tag{3.6}$$

By plotting Equation 3.6 we can find the bit energy to noise ratio,  $E_b/N_0$  required at the receiver for a particular Bit Error Rate (BER). An  $E_b/N_0$  of 100 gives us a BER of  $10^{-19}$ . We can calculate the Signal to Noise Ratio (SNR) required at the input of the receiver using the equation:

$$SNR = \frac{E_b}{N_0} \cdot \frac{R_B}{B} \tag{3.7}$$

Here $R_B$ is the data rate and B is the channel bandwidth. The SNR required at the receiver input is 12.21 dB. The required signal strength at the receiver, or receiver sensitivity, is obtained by adding this SNR to the receiver noise floor: the power required at the receiver for correct demodulation, $P_{rx}$, is -91.97dBm.

• **Path Loss:** The path loss in dB is given by the equation:

$$L = 20 \log_{10}(\frac{4\pi D}{\lambda}) \tag{3.8}$$

where D is the transmit distance and $\lambda$ is the free space wavelength at the carrier frequency, which can be taken as $(f_1 + f_2)/2 \approx 302kHz$. Rounding this carrier to 300kHz gives $\lambda \approx 1000m$, and with $D = 1000m$ the path loss is $L = 20\log_{10}(4\pi) \approx 21.98$ dB. The higher the carrier frequency used, the greater the path loss.

- Antenna Gain: The transmitter antenna gain $G_{tx}$ and the receiver antenna gain $G_{rx}$ can both be taken as 0dB. This is a reasonable assumption for a simple dipole antenna.
- Fade Margin: Signal fading occurs when waves emitted by the transmitter travel along different paths and interfere destructively with waves traveling on the line of sight path. A common rule of thumb for the fade margin is 20dB.

• Link Calculation: The transmit power required,  $P_{tx}$  is given by the expression:

$$\begin{aligned}
P_{tx} &= P_{rx} - G_{tx} - G_{rx} + L + \text{FadeMargin} \\
&= -91.97\,\text{dBm} - 0\,\text{dB} - 0\,\text{dB} + 21.98\,\text{dB} + 20\,\text{dB} \\
&= -49.99\,\text{dBm}
\end{aligned}$$

With a safety margin of 49.99dB on top of this, the chip is designed for a transmit power of 0dBm, or 1mW. If the output signal is a sinusoid with RMS voltage $V_{rms}$, and we assume a $50\Omega$ resistance on the output node, the voltage required for a transmit power of 1mW is given by

$$V_{rms}^2 = 1mW \times 50\Omega \tag{3.9}$$

$$V_{rms} \approx 0.22V, \quad V_P = \sqrt{2}\,V_{rms} \approx 0.32V \tag{3.10}$$

Equation 3.10 needs to be taken into account for the DAC and the amplifier which are going to provide the output signal to the antenna.
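The link budget steps above can be reproduced in a short Python sketch using the constants quoted in the text. Two choices are assumptions made to match the stated numbers: a wavelength of 1000m (a carrier of about 300kHz, close to $(f_1+f_2)/2$), which reproduces the 21.98dB path loss, and the quoted SNR of 12.21dB, which reproduces $P_{tx}$ of about -50dBm. Evaluating Equations 3.6 and 3.7 directly at $E_b/N_0 = 100$ gives $P_b \approx 9.6 \times 10^{-23}$ and an SNR of about 10.25dB, so the quoted $10^{-19}$ and 12.21dB are on the conservative side. Finally, for a sinusoid the 0dBm into $50\Omega$ condition fixes the RMS voltage at about 0.22V, which corresponds to a peak of about 0.32V.

```python
import math

K_B = 1.38e-23  # Boltzmann's constant (J/K), value used in the text

def to_dbm(watts):
    return 10 * math.log10(watts / 1e-3)

def noise_floor_dbm(bandwidth_hz, temp_k=290.0, receiver_noise_db=15.0):
    """Equation 3.5 plus the 15 dB assumed for a low cost receiver."""
    return to_dbm(K_B * temp_k * bandwidth_hz) + receiver_noise_db

def ber_noncoherent_bfsk(eb_n0):
    """Equation 3.6 (eb_n0 as a linear ratio)."""
    return 0.5 * math.exp(-eb_n0 / 2)

def snr_db(eb_n0, r_b, bandwidth_hz):
    """Equation 3.7 expressed in dB."""
    return 10 * math.log10(eb_n0 * r_b / bandwidth_hz)

def path_loss_db(distance_m, wavelength_m):
    """Equation 3.8 (free space)."""
    return 20 * math.log10(4 * math.pi * distance_m / wavelength_m)

B = 302e3
p_rx = noise_floor_dbm(B) + 12.21        # receiver sensitivity with the quoted SNR
loss = path_loss_db(1000.0, 1000.0)      # D = 1000 m, lambda ~ 1000 m (~300 kHz)
p_tx = p_rx - 0.0 - 0.0 + loss + 20.0    # G_tx = G_rx = 0 dB, 20 dB fade margin
v_rms = math.sqrt(1e-3 * 50.0)           # 0 dBm into 50 ohms

print(f"noise floor {noise_floor_dbm(B):.2f} dBm, P_rx {p_rx:.2f} dBm, "
      f"L {loss:.2f} dB, P_tx {p_tx:.2f} dBm")
print(f"Eq. 3.6 at Eb/N0 = 100: {ber_noncoherent_bfsk(100):.1e}; "
      f"Eq. 3.7 SNR: {snr_db(100, 32e3, B):.2f} dB")
print(f"V_rms {v_rms:.3f} V, V_peak {math.sqrt(2) * v_rms:.3f} V")
```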

# III-E. Chapter Summary

In this chapter we covered the design considerations of the wireless BFSK transmitter chip. We presented the architecture of the chip and analyzed each of its modules separately. We also performed a link budget analysis to determine the transmit power needed to transmit a signal over a distance of 1000 meters.

## CHAPTER IV

#### IMPLEMENTATION OF THE CHIP

This chapter discusses the implementation aspects of the sub-threshold chip.

#### IV-A. Design Flow

The steps of the design flow to be used are shown in Figure IV.1, and briefly described in the remainder of this section.

- First, the design specification was determined from user requirements such as the data rate, the available bandwidth and the transmission distance.
- Next, the HDL code implementing the specification was developed in VHDL.
- This code was then synthesized, resulting in a gate-level description of the design.
- The synthesized netlist was verified against the HDL by running functional test vectors.
- Next, the design was mapped to a network of PLAs, using the synthesis code from [23]. The size of the PLAs used in the design was determined at this point, trading off the number of PLAs required (area) against their speed of operation (latency and throughput). At the end of this step, a SPICE-level netlist description of the design is obtained.
- A functional and timing verification is performed on the SPICE-level schematic across all process corners. This partially validates the circuit design, which can be revised based on the results of this step.

Fig. IV.1. Design Flow

- Using the netlist of PLAs from the previous step, the layout of each PLA was drawn in the TSMC $0.25\mu$m process. The layout of the IO pads, ESD cells and analog components was also drawn.
- Layout Versus Schematic (LVS) verification was performed next to ensure that there were no layout errors.
- Finally, the design parasitics were extracted, and the entire design was simulated in SPICE as a final sign-off.

#### IV-B. HDL to Netlist Flow

The HDL description of the digital portion of the circuit was written in VHDL. The external inputs and outputs of the digital BFSK modulator are described in Table IV.3. The output of the binary to thermometer code converter block is a 19-bit wide digital signal. These 19 signals feed the input of the DAC and cannot be observed externally.
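The idea behind binary to thermometer conversion can be illustrated with a generic sketch: a value $k$ maps to a code whose $k$ lowest bits are 1, so a unit-element DAC simply switches on $k$ identical elements. This is a textbook full thermometer code for illustration only; the exact mapping from the 8-bit NCO output to the 19 signals used on this chip is not reproduced here, and the function name and widths are hypothetical.

```python
from typing import List


def binary_to_thermometer(value: int, width: int) -> List[int]:
    """Return a `width`-bit thermometer code (LSB first) with the lowest `value` bits set.

    Bit i of the result is 1 if i < value, else 0.
    """
    if not 0 <= value <= width:
        raise ValueError(f"value {value} out of range for a {width}-bit thermometer code")
    return [1 if i < value else 0 for i in range(width)]


if __name__ == "__main__":
    print(binary_to_thermometer(3, 7))
```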

The HDL description was then synthesized using an FPGA synthesis tool, Xilinx ISE Foundation [24]. The synthesis tool output is a gate-level description of the implemented circuit. This description is then converted into the Berkeley Logic Interchange Format (BLIF) for further optimization using the multi-level logic synthesis tool SIS [25]. Using SIS, the BLIF representation of the digital modulator circuit is mapped into a network of PLAs, using the algorithm given in [23]. The algorithm involves the following steps. First, a technology-independent optimization is performed on the given multi-level circuit. Next, this circuit is decomposed into a network of nodes, each having at most 5 fanins. These nodes are then levelized, meaning that each node is assigned a level one larger than the largest level of all its fanin nodes. Finally, nodes are grouped together and fit into PLAs of the given maximum size. We use folded PLAs, which fit more logic into a PLA than non-folded PLAs. Folded PLAs are explained in [23]; in our case, we fold only the inputs. The logic representation of the multi-level network of PLAs obtained in this step is then used to create a SPICE netlist description of the digital modulator circuit. This SPICE netlist also serves as the golden schematic netlist for LVS verification. All the PLAs used to build the circuit have the same size, so that they have approximately the same delay. This also simplifies the PLA layout, since the metal wiring footprint is the same for all PLAs and only the transistors are changed according to the logic implemented. To determine the PLA size, we performed the following experiment.
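The levelization step described above (each node's level is one more than the largest level among its fanins) can be sketched as a memoized traversal of the node DAG. The dictionary-of-fanins representation and the node names are hypothetical, primary inputs are assigned level 0, and this illustrates only levelization, not the full PLA-packing algorithm of [23].

```python
from functools import lru_cache
from typing import Dict, List


def levelize(fanins: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign each node a level one greater than the largest level of its fanins.

    `fanins` maps each node name to the list of nodes driving it; nodes with no
    fanins (primary inputs) get level 0. The graph must be acyclic.
    """
    @lru_cache(maxsize=None)
    def level(node: str) -> int:
        drivers = fanins.get(node, [])
        if not drivers:
            return 0
        return 1 + max(level(d) for d in drivers)

    return {node: level(node) for node in fanins}


if __name__ == "__main__":
    circuit = {
        "a": [], "b": [], "c": [],
        "n1": ["a", "b"],
        "n2": ["b", "c"],
        "n3": ["n1", "n2", "c"],
        "out": ["n3", "a"],
    }
    print(levelize(circuit))
    # {'a': 0, 'b': 0, 'c': 0, 'n1': 1, 'n2': 1, 'n3': 2, 'out': 3}
```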

We decomposed a set of circuits from the *mcnc91* benchmark suite into multi-level networks of PLAs using the PLA decomposition algorithm, for several PLA sizes. Considering the number of PLAs, the number of levels in the multi-level circuit and the PLA delays, we found that PLAs with 8-12 inputs, 4-6 outputs and 12-18 rows have both low delay and small implementation area.

The PLA size we use for this circuit is 8 inputs, 6 outputs and 12 cubes. From SPICE simulations, the evaluation and precharge periods of a PLA of this size in the TSMC 0.25$\mu$m process were found to be $T_{eval} = 35ns$ and $T_{pchg} = 45ns$.

Each of the three logic blocks that constitute the BFSK modulator shown in Figure III.2 is implemented using combinational logic, realized as a multi-level network of PLAs. The NCO block has the largest delay, since it requires much more logic and more levels of PLAs than the other two blocks. Table IV.1 shows the maximum throughput attainable with this PLA size.

The output of this step is a logical description of the network of PLAs used to implement the digital BFSK modulator. From this logical description a SPICE schematic is created. The next step in the implementation process is to interface the digital circuitry with the dynamic delay compensation circuit described in Section III-C.4.

Table IV.1. PLA Configuration

| PLA (In,Out,Cube) | Tpchg | Teval | Total no. of PLAs | No. of PLA levels for NCO block | Delay | Throughput |
|-------------------|-------|-------|-------------------|---------------------------------|-------|------------|
| (8,6,12)          | 45ns  | 35ns  | 4+24+3            | 19                              | 710ns | 1.4MHz     |
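The delay in Table IV.1 is consistent with the 19 PLA levels of the NCO block evaluating in sequence after a single precharge, i.e. $19 \times 35\,\text{ns} + 45\,\text{ns} = 710\,\text{ns}$, with throughput taken as the reciprocal of this delay. This reading of how the table was computed is our assumption; the sketch below (hypothetical function name) simply reproduces that arithmetic.

```python
from typing import Tuple


def pla_chain_timing(levels: int, t_eval_s: float, t_pchg_s: float) -> Tuple[float, float]:
    """Return (delay, throughput) for `levels` PLAs evaluating in sequence.

    Assumes one shared precharge phase followed by back-to-back evaluation of
    each level, so delay = levels * T_eval + T_pchg and throughput = 1 / delay.
    """
    delay = levels * t_eval_s + t_pchg_s
    return delay, 1.0 / delay


if __name__ == "__main__":
    delay, throughput = pla_chain_timing(19, 35e-9, 45e-9)
    print(f"delay = {delay * 1e9:.0f} ns, throughput = {throughput / 1e6:.2f} MHz")
```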

## IV-B.1. SPICE Verification of Dynamic Compensation

The dynamic delay compensation circuit is interfaced with the digital BFSK modulator circuit. An initial simulation is shown in Figure IV.2. In this case, the beat clock signal is configured to speed up the PLAs. The signal "nandout" in this figure represents the *pullup* signal shown in Figure III.6, which instructs the phase detector and charge pump circuit shown in Figure III.5 to pull up the bulk node. Whenever there is a low-going pulse on "nandout", the bulk node, called "bulkn" in Figure IV.2, is pulled up. However, the "bulkn" node, which is the body terminal of the NMOS transistors in the design, is very noisy, with a ripple close to 100mV on every clock cycle. This ripple does not coincide with the low-going pulses of "nandout", so it is not caused by the charge pump circuit.

From Figure IV.2, it can be seen that the bulk node is pulled up during the precharge period, when the "clk" signal is low, and pulled down during the evaluation period, when "clk" is high. The reason for this effect can be explained using Figure III.3: each PLA has a large parasitic drain-bulk capacitance due to the transistors connected to the *dummy wordline*. During every precharge phase, the *dummy wordline* is pulled up to *VDD*, and during every evaluation phase it is pulled down to *GND*. This transition couples into the Nbulk node, making it noisy. To fix this problem, we added a capacitor to the bulk node of the NMOS transistors to filter out the noise, and made the charge pump devices wider so that they can overcome the effect of this capacitor. The capacitor is realized as the gate capacitance of a MOSFET whose drain, source and body terminals are connected to *GND*. It is a non-linear capacitor, varying from 100pF to 180pF for a bulk node voltage swing of 0V to 0.5V. The lower part of Figure IV.2 shows the modulation on the bulk node after adding this MOSFET capacitor; the ripple on the bulk node is now only 25mV. We also ran SPICE simulations in which the objective was to slow down the PLAs by configuring the *beat clock (BCLK)* signal as shown in Figure III.7. These simulations were run across all corners provided by TSMC for their process.
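A first-order way to see why the added capacitor helps is to treat the bulk node as a capacitive divider: the dummy wordline swing couples through a coupling capacitance $C_c$ onto a node whose capacitance to ground is $C_{bulk}$ (plus any filter capacitance). The sketch below assumes this simple model and ignores the charge pump's own drive. Under this model, the 4× ripple reduction reported above (100mV to 25mV) corresponds to roughly quadrupling the total capacitance seen by the coupled step. The capacitance values in the example are hypothetical, picked only so that the unfiltered ripple matches the observed ~100mV.

```python
def coupled_ripple(v_swing: float, c_couple: float, c_bulk: float, c_filter: float = 0.0) -> float:
    """Ripple coupled onto the bulk node by a capacitive divider.

    A step of v_swing on the dummy wordline couples through c_couple onto a node
    with capacitance c_bulk + c_filter to ground:
        dV = v_swing * c_couple / (c_couple + c_bulk + c_filter)
    """
    return v_swing * c_couple / (c_couple + c_bulk + c_filter)


if __name__ == "__main__":
    # Hypothetical values (volts, farads), not extracted from the layout.
    vdd = 0.6
    c_c, c_b = 10e-12, 50e-12
    before = coupled_ripple(vdd, c_c, c_b)
    after = coupled_ripple(vdd, c_c, c_b, c_filter=180e-12)
    print(f"ripple: {before * 1e3:.0f} mV -> {after * 1e3:.0f} mV")
```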

### IV-C. DAC and Amplifier Design

The DAC and the amplifier driving the antenna use the circuits shown in Section III-C.6 and Section III-C.7, respectively. The following steps were followed to design them.

- The resistors Rcm and Rout of the DAC are surface-mount resistors placed off-chip. This allows them to be tuned on the test board to improve the output signal. Two external pins of the chip are reserved for these resistors.
- The resistors Rs and Rd of the amplifier are also off-chip surface-mount resistors, so the amplifier is also connected to two external pins.
- The output of the amplifier is connected to an on-chip coil antenna. The capacitance of the antenna was estimated by computing the capacitance of a small segment of the antenna structure using Space3d [26] and extrapolating that value to the entire antenna, giving a total of around 80pF.
- The output voltages of the DAC and the amplifier must reach the peak value calculated in Equation 3.10.
- Sample waveforms at the outputs of the DAC and the amplifier are shown in Figure IV.3 and Figure IV.4, respectively, with the amplifier output loaded by an 80pF capacitor. Both outputs alternate between the two frequency tones.
### IV-D. Special Considerations

#### IV-D.1. Testability and Redundancy

Various testability features were built into the design. They allow each component of the chip to be tested individually to verify its functionality, and they also serve as a backup in case one of the components fails. The following testability features are incorporated in the design.



Fig. IV.3. DAC Output

- A standalone PLA is included in the design along with the PLAs that make up the digital modulator circuit. This PLA is designed so that its two outputs toggle continuously when the clock is applied. This test verifies the functionality of the PLAs, which are the basic building blocks of the design.
- The 8-bit output of the NCO block is sent directly to 8 bi-directional I/O pads on the chip. These pads can either be used to observe the digital 8-bit sine wave value at the output of the NCO, or be used as an 8-bit input to the binary to thermometer code converter. This covers the scenario in which only one of the digital modulator and the DAC is functionally correct: the bi-directional pins can then be used to excite the correctly functioning block.



Fig. IV.4. Amplifier Output

- The output of the DAC can be measured with an oscilloscope at the pin that connects the external DAC drive resistor $R_{out}$ to the chip. This allows the DAC to be tuned and tested individually based on its output waveform, and gives us the option of using the DAC directly with an external amplifier and antenna.
- The output of the common source amplifier can also be observed externally, using the pin connected to the $R_D$ resistor. This signal may also drive an off-chip antenna instead of the on-chip antenna.
- The output of the amplifier is connected to the antenna through a pass gate controlled by a signal called *Anton*. This signal can be used to disconnect the on-chip coil antenna by turning off the pass gate.

#### IV-D.2. Voltage Domains

One of the objectives of this thesis is to compare the operation of a sub-threshold circuit with a standard cell based implementation. The two realizations operate at different *VDD* values, so to isolate them we need an extra voltage domain for the standard cell implementation. This is a 2.5V domain, which is the nominal operating voltage of the TSMC $0.25\mu$m process. For the targeted process, we have specified the sub-threshold design to work at a *VDD* of 0.6V. The inputs to the sub-threshold digital modulator circuit cannot be on this same voltage domain, because I/O drivers operating at such a low voltage are not reliable. Hence the inputs to the sub-threshold circuit use another, higher voltage domain, with a *VDD* of 1V. One of the built-in testability features of this chip is that the outputs of the sub-threshold digital modulator circuit can, if needed, be sent directly off-chip to an external DAC and antenna. However, we found no off-the-shelf DAC with an input voltage rating below 2V, so the outputs of the sub-threshold circuit needed to be driven to at least 2V. A further voltage domain, with a *VDD* of 2V, was therefore used.

We thus have four separate *VDD* domains on the chip. All these domains share a common *GND* to simplify power distribution. The following conditions need to be addressed for signals that cross between two voltage domains.

- A higher voltage signal cannot drive a pass gate of a lower voltage domain. In this case we buffer the signal with a buffer operating on the *VDD* of the lower voltage domain before driving the pass gate.
- A higher voltage signal can drive the gate of a transistor in a lower voltage domain.
- To pass a signal from a lower voltage domain to a higher voltage domain, we use custom-designed level shifters.
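The three crossing rules above can be captured as a small design-rule helper, for example to audit the inter-domain nets of a netlist. The function name, its arguments and the returned strings are hypothetical; the logic only encodes the rules listed in this section.

```python
def crossing_action(src_vdd: float, dst_vdd: float, drives_pass_gate: bool) -> str:
    """Return the interface required for a net crossing from src_vdd to dst_vdd.

    Encodes the rules of Section IV-D.2:
      * lower -> higher domain: insert a custom level shifter;
      * higher -> lower domain driving a pass gate: re-buffer in the lower domain;
      * higher -> lower domain driving a transistor gate: connect directly.
    Nets staying within one domain are connected directly.
    """
    if src_vdd < dst_vdd:
        return "level_shifter"
    if src_vdd > dst_vdd and drives_pass_gate:
        return "buffer_in_destination_domain"
    return "direct"


if __name__ == "__main__":
    # A 1 V input net entering the 0.6 V core, assuming it drives a pass gate.
    print(crossing_action(1.0, 0.6, drives_pass_gate=True))
    # A 0.6 V modulator output going to the 2 V output pads.
    print(crossing_action(0.6, 2.0, drives_pass_gate=False))
```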

### IV-E. Standard Cell Based BFSK Design

We also implemented a traditional standard cell based BFSK design on the chip for a head-to-head comparison with the sub-threshold approach. The design flow for the standard cell portion of the design consisted of the following steps.

- We used the same HDL code used for the sub-threshold design.
- The synthesized HDL code was mapped to a library of standard cells consisting of inverters (2×, 12×, 36×, 108×) and NAND gates (2-input, 3-input).
- The standard cell design is not connected to a DAC and an antenna.
- The mapped design was then placed and routed using SEDSM [27] from Cadence.
- The inputs to the standard cell design are 32kinstd, Clkstd and Resetstd.
- The output of the standard cell design is an 8-bit vector, *Stdout*, which represents the 8-bit output of the NCO.

## IV-F. IO Pad and ESD Diode Design

The circuit diagram of a general pad cell with ESD diodes is shown in Figure IV.5. The transistors MP1 and MN1 are the primary ESD diodes. The transistors MP2 and MN2 form the inverter that drives an internal signal through the pad to an off-chip component. MP3 and MN3 are ESD devices that provide further protection. The resistance R has a value of approximately $200\Omega$.



Fig. IV.5. PAD Cell Schematic

The chip uses four separate voltage domains, and the pads used for its signals can be classified as follows.

- **Power Supply Pad:** These pads do not have any I/O drivers. They have the ESD diodes shown in Figure IV.5, and are used for the *VDD* (for all domains) and *GND* signals.
- **Digital Input Pad:** These pads have ESD diodes with input drivers driving the external signal towards the chip.
- **Digital Output Pad:** These pads have ESD diodes with output drivers driving the internal signal towards the pad.
- **Digital I/O Pad:** Along with ESD diodes, these pads have both input and output drivers. The output drivers are tristated when this pad is receiving an input signal.
- **Analog Signal Pad:** Analog signal pads do not have any I/O drivers. Some analog signal pads also omit the ESD diode connected to *VDD*. This is done when the peak value of the analog signal can exceed the *VDD* connected to that ESD diode, which would otherwise forward-bias the diode.

### IV-G. Chip Integration and Pin-out

Chip integration mainly involves deciding the pin-out of the chip. The pin-out for the standard cell implementation of the BFSK transmitter is shown in Table IV.2, and the pin-out for the sub-threshold implementation is shown in Table IV.3. The chip requires 80 pins. Pins 80, 1, 20, 21, 40, 41, 60 and 61 are dummy pins located at the ends of each side of the chip. Some of the sensitive signals are shielded using static signals and/or supply signals. An estimate of the floorplan of the chip is made, and signals are buffered depending on the distance they have to travel. A SPICE-level schematic of the entire chip can thus be constructed by including the digital modulator, the DAC and the amplifier, and connecting their input and output signals to pad cells. The antenna is represented by a large capacitor.

| Pin Number | Pin Name                      | Description                                           |
|------------|-------------------------------|-------------------------------------------------------|
|            | **Domain 4,** $VDD = 2.5V$    |                                                       |
| 39         | GND                           | Ground                                                |
| 42         | VDD                           | Supply                                                |
| 43         | GND                           | Ground                                                |
| 44         | Resetstd                      | Active high, Reset signal for Std Cell BFSK           |
| 45         | 32kinstd                      | Binary input signal for Std Cell BFSK                 |
| 46         | Clkstd                        | Clock signal for Std Cell BFSK                        |
| 47-49      | Stdout< 8 : 6 >               | Digital output of Std Cell BFSK                       |
| 50         | VDD                           | Supply                                                |
| 51         | Anton                         | Active high, Loads the Amplifier with on-chip antenna |
| 52         | GND                           | Ground                                                |
| 53-57      | Stdout< 5 : 1 >               | Digital output of Std Cell BFSK                       |
| 58         | GND                           | Ground                                                |
| 59         | VDD                           | Supply                                                |

Table IV.2. Chip Pin-out: Standard Cell BFSK Portion

| Pin Number | Pin Name                   | Description                                                       |
|------------|----------------------------|-------------------------------------------------------------------|
|            | **Domain 1,** $VDD = 1V$   |                                                                   |
| 7          | Dacin                      | Active high, apply external DAC input to pins 23-25, 28-30, 33-34 |
| 8          | Clk                        | Clock signal to BFSK modulator, shielded by static signals        |
| 9          | VDD                        | Supply                                                            |
| 10         | Reset                      | Active high, Resets the BFSK modulator output                     |
| 11         | GND                        | Ground                                                            |
| 12         | 32kin                      | Binary input to modulator                                         |
| 13         | sdrouten                   | Active high, NCO output sent to pins 23-25, 28-30, 33-34          |
| 14         | VDD                        | Supply                                                            |
| 15         | Beat Clk                   | Reference clock for dynamic compensation, Shielded by VDD         |
| 16         | VDD                        | Supply                                                            |
| 17-18      | GND                        | Ground                                                            |
|            | **Domain 2,** $VDD = 2V$   |                                                                   |
| 19         | GND                        | Ground                                                            |
| 22         | VDD                        | Supply                                                            |
| 23-25      | In2vOut2v < 1:3 >          | NCO output or DAC input                                           |
| 26         | GND                        | Ground                                                            |
| 27         | VDD                        | Supply                                                            |
| 28-30      | In2vOut2v < 4:6 >          | NCO output or DAC input                                           |
| 31         | GND                        | Ground                                                            |
| 32         | VDD                        | Supply                                                            |
| 33-34      | In2vOut2v < 7:8 >          | NCO output or DAC input                                           |
| 35         | Testplaout1                | PLA test signal 1                                                 |
| 36         | GND                        | Ground                                                            |
| 37         | VDD                        | Supply                                                            |
| 38         | Testplaout2                | PLA test signal 2                                                 |
|            | **Domain 3,** $VDD = 0.6V$ |                                                                   |
| 62         | GND                        | Ground                                                            |
| 63         | VDD                        | Supply                                                            |
| 64         | AmpRdRes                   | Drain Resistance of Amplifier, shielded                           |
| 65         | GND                        | Ground                                                            |
| 66         | VDD                        | Supply                                                            |
| 67         | AmpRsRes                   | Source Resistance of Amplifier, shielded                          |
| 68         | GND                        | Ground                                                            |
| 69         | VDD                        | Supply                                                            |
| 70         | DacCmRes                   | DAC current mirror resistance, shielded                           |
| 71         | GND                        | Ground                                                            |
| 72         | VDD                        | Supply                                                            |
| 73         | DacDriveRes                | DAC output resistance, shielded                                   |
| 74         | GND                        | Ground                                                            |
| 75         | VDD                        | Supply                                                            |
| 76         | GND                        | Ground                                                            |
| 77         | Bulkinout                  | Monitor or Force NBulk node                                       |
| 78         | PdKickSupply               | Supply Voltage of Charge Pump                                     |
| 79         | VDD                        | Supply                                                            |
| 2          | VDD                        | Supply                                                            |
| 3          | GND                        | Ground                                                            |
| 4          | VDD                        | Supply                                                            |
| 5          | GND                        | Ground                                                            |
| 6          | VDD                        | Supply                                                            |

Table IV.3. Chip Pin-out: Sub-threshold BFSK Portion

### IV-H. Layout

The layout of the PLA block used in the design is shown in Figure IV.6.



Fig. IV.6. PLA Layout

All the PLAs have the same number of inputs, outputs and cubes; however, each PLA implements different logic. The transistors connected to the bitlines, wordlines and output lines are changed for each PLA depending on the function it implements. The DAC and the amplifier were also laid out. The transistor lengths used in these analog components are three times the minimum length, which increases their tolerance to process variation. The antenna is implemented as a coil made of five metal layers as well as the poly layer, all connected to each other by contacts. The pad cells are laid out in accordance with TSMC's design rules for pads and ESD cells, and guard rings are used to prevent latch-up in the ESD diodes. The resistor R seen in Figure IV.5 is realized using N-type diffusion, with a resistance of around $200\Omega$.

The vacant areas of the chip are then filled with metal to satisfy the fill rules of the process. These metal fills are wired up to act as decoupling capacitance between the supply and ground nodes, which reduces supply voltage noise.

The standard cell layout is done using the SEDSM tool [27]. This layout is merged with the rest of the components to get the entire die layout shown in Figure IV.7.



Fig. IV.7. Die Layout

## IV-I. Summary of Verification Methodologies

The following verification methodologies were used at various stages during the design flow shown in Figure IV.1.

- **Combinational Verification:** This is a verification step done after synthesis. The logical representation of the circuit after optimization is functionally verified against the initial HDL description.
- **SPICE Verification:** SPICE based verification is done after mapping the logic netlist into a multi-level network of PLAs, to verify functional correctness as well as the correct operation of the dynamic bulk node modulation compensation circuit.
- **LVS:** A layout versus schematic step is performed after layout to verify the correctness of the layout. This was performed using the ASSURA LVS tool [28].
- **RC Extraction and Verification:** An RC extraction of the chip is performed after the LVS step. This populates the circuit schematic with various parasitic resistors and capacitors. A SPICE level simulation of this extracted netlist is required to verify that the circuit behavior has not been adversely affected by parasitics. The SPICE level simulation also covers the bulk node modulation by the compensation circuit. This is important because there may be extra parasitic capacitance on the Nbulk node, which would require stronger charge pump devices.

## IV-J. Chapter Summary

In this chapter, we went over the implementation details of the wireless BFSK transmitter. The design flow of the chip was discussed and each step in the flow was explained. The chip was divided into four voltage domains to isolate the standard cell implementation and to provide a higher *VDD* for the inputs and outputs of the sub-threshold circuit. The steps taken to design the layout were also discussed.

### CHAPTER V

### EXPERIMENTAL RESULTS

In this chapter, we present results from the fabricated die. The operating range of the circuit is measured, and the functionality of the dynamic body bias delay compensation circuit is verified. The sub-threshold implementation is also compared with a standard cell based implementation of the BFSK circuit, which was implemented on the same die.

## V-A. Functional Verification

The *VDD* domains 1 and 4, which correspond to the sub-threshold BFSK inputs, and DAC and amplifier outputs are powered ON. The *reset* signal is held LOW. The DAC and Amplifier are biased using resistances determined during the circuit design phase. The output of the DAC for an input signal that makes a LOW to HIGH transition is shown in Figure V.1.



Fig. V.1. BFSK Modulation

Note that the DAC output clearly shows two tones depending on the value of the input.

#### V-B. Dynamic Compensation Circuit

The dynamic compensation circuit stabilizes circuit delay by modulating the bulk node of the NMOS transistors in the design, as explained in Section III-C.4. Figure V.2 shows an oscilloscope plot of the bulk node voltage and the power supply of the sub-threshold circuit, with the external beat clock fixed to a particular delay. When the supply voltage (the bottom signal in the plot) fluctuates from its nominal value, the bulk node voltage (the top signal) is immediately modulated in the opposite direction, compensating the circuit delay for the supply variation. Thus the delay of the reference circuit is kept in phase with the external reference signal.

Fig. V.2. Bulk Node Voltage Modulation with VDD
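The tracking behaviour in Figure V.2 can be illustrated with a behavioural model of a bang-bang loop: each cycle, a reference path delay is compared against a fixed reference period, and the charge pump nudges the NMOS bulk voltage up (forward body bias, faster) when the path is too slow and down when it is too fast. The exponential delay model, all device constants, the step size and the supply trace are hypothetical; this sketches the control principle, not the circuit of Section III-C.4.

```python
import math
from typing import List

# Hypothetical device constants for an exponential (sub-threshold style) delay model.
N_VT = 1.5 * 0.0259   # slope factor n times thermal voltage kT/q (V)
VTH0 = 0.5            # zero body-bias threshold voltage (V)
GAMMA = 0.2           # linearized body effect: Vth drops by GAMMA * Vbulk (V/V)


def path_delay(vdd: float, vbulk: float, c_load: float = 1e-12, i0: float = 1e-6) -> float:
    """Delay ~ C*VDD/I_on, with I_on exponential in (VDD - Vth)."""
    vth = VTH0 - GAMMA * vbulk
    i_on = i0 * math.exp((vdd - vth) / N_VT)
    return c_load * vdd / i_on


def track_bulk(vdd_trace: List[float], t_ref: float, vbulk0: float = 0.2,
               step: float = 2e-3, vmin: float = 0.0, vmax: float = 0.45) -> List[float]:
    """Bang-bang loop: raise Vbulk when the path is slower than t_ref, else lower it."""
    vbulk = vbulk0
    history = []
    for vdd in vdd_trace:
        if path_delay(vdd, vbulk) > t_ref:
            vbulk = min(vmax, vbulk + step)   # too slow: forward bias the bulk
        else:
            vbulk = max(vmin, vbulk - step)   # too fast: lower the bulk voltage
        history.append(vbulk)
    return history


if __name__ == "__main__":
    t_ref = path_delay(0.6, 0.2)              # lock point at nominal VDD
    vdd_trace = [0.6] * 100 + [0.58] * 200    # supply droops by 20 mV
    hist = track_bulk(vdd_trace, t_ref)
    print(f"Vbulk before droop ~ {sum(hist[80:100]) / 20:.3f} V")
    print(f"Vbulk after droop  ~ {sum(hist[-20:]) / 20:.3f} V")
```

In this model the bulk voltage rises when the supply droops, mirroring the opposite-direction modulation seen on the oscilloscope.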

Figure V.3 plots the bulk node voltage in the top half and the external beat clock signal in the bottom half. Here the beat clock is held high for several clock cycles and then held low for several clock cycles. When the beat clock signal is held high, the charge pump forward biases the bulk node and the circuit speeds up; when it is held low, the bulk node is driven low and the circuit slows down. The bulk node is clearly modulated up and down as the phase of the beat clock signal changes, verifying the operation of the dynamic body bias circuit with respect to the external reference signal.

Fig. V.3. Bulk Node Voltage Modulation with BeatClock

### V-C. Operating Ranges

The supply voltage for the digital BFSK modulator circuit was varied from 0.4V to 0.62V. The maximum frequency of operation at these voltages was determined by observing the output of the common source amplifier: when the frequency is too high, the sine wave at the amplifier output becomes distorted. The maximum operating frequencies over a set of supply voltages are plotted in Figure V.4.

This figure shows two curves, corresponding to bulk node voltages of 0V and 0.45V respectively. The region between them is the range of frequencies over which the dynamic compensation circuit can track the reference beat clock. Notice that the maximum speed of operation increases quadratically as the supply voltage increases.

Fig. V.4. Maximum Operating Frequencies



Fig. V.5. Power Consumed at Maximum Operating Frequency

The power consumed by the circuit at these operating voltages and frequencies is shown in Figure V.5, for the maximum and minimum voltages that the bulkn node can take. The power is computed as the product of the supply voltage and the average current drawn from the digital BFSK modulator's voltage source. Note that a separate voltage source is used for the DAC and the amplifier.

### V-D. Spectrum of Output Sinusoidal Signals

The Fast Fourier Transform (FFT) of the output of the DAC is shown in Figure V.6. Here the input bitstream continually alternates between a logical "zero" and a logical "one" at a frequency of 32.25kHz. The clock frequency $f_{clk}$ of the sub-threshold circuit is set to 1MHz, which is an integer multiple of the input bit rate. The FFT shows the two transmitted tones, at 113kHz and 342kHz.

Fig. V.6. FFT of DAC Output
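
A behavioral sketch of the test in Figure V.6 follows. It generates a BFSK waveform from an alternating bitstream and scans its spectrum for the strongest component on each side of the midpoint between the two tones. Several choices are assumptions made for illustration: continuous-phase modulation, the sample rate `FS`, the bit rate `BIT_RATE`, and the mapping of 113kHz to a "zero" and 342kHz to a "one". To keep the example dependency-free, the DFT is evaluated directly at each grid frequency rather than with an FFT.

```python
import cmath
import math
from typing import List, Sequence, Tuple

# Hypothetical simulation parameters (assumptions, not taken from the chip):
FS = 1.6e6          # sample rate, Hz (50 samples per bit at 32 kbps)
BIT_RATE = 32.0e3   # bit rate, Hz
F_ZERO = 113.0e3    # tone assumed to encode a logical "zero", Hz
F_ONE = 342.0e3     # tone assumed to encode a logical "one", Hz


def bfsk_waveform(bits: Sequence[int]) -> List[float]:
    """Continuous-phase BFSK: the phase accumulator carries over between bits,
    so only the instantaneous frequency switches at bit boundaries."""
    samples_per_bit = int(round(FS / BIT_RATE))
    phase, out = 0.0, []
    for bit in bits:
        step = 2.0 * math.pi * (F_ONE if bit else F_ZERO) / FS
        for _ in range(samples_per_bit):
            out.append(math.sin(phase))
            phase += step
    return out


def dft_magnitude(x: Sequence[float], freq: float) -> float:
    """Normalized magnitude of the DFT of x evaluated at a single frequency (Hz)."""
    w = -2j * math.pi * freq / FS
    return abs(sum(v * cmath.exp(w * n) for n, v in enumerate(x))) / len(x)


def strongest_tones(x: Sequence[float], step_hz: float = 1.0e3) -> Tuple[float, float]:
    """Scan a frequency grid and return the strongest component below and
    above the midpoint between the two tones."""
    mid = 0.5 * (F_ZERO + F_ONE)
    grid = [k * step_hz for k in range(1, int(2 * mid / step_hz))]
    low = max((f for f in grid if f < mid), key=lambda f: dft_magnitude(x, f))
    high = max((f for f in grid if f > mid), key=lambda f: dft_magnitude(x, f))
    return low, high


if __name__ == "__main__":
    wave = bfsk_waveform([0, 1] * 16)   # alternating bitstream, 32 bits
    low, high = strongest_tones(wave)
    print(f"strongest components near {low / 1e3:.0f} kHz and {high / 1e3:.0f} kHz")
```

The alternating data spreads each tone into discrete spectral lines spaced by half the bit rate. As a result, the strongest components found this way sit within a few kHz of the nominal tone frequencies rather than exactly on them.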



Fig. V.7. FFT of Amplifier Output

Similarly, Figure V.7 shows the FFT of the amplifier output for the same signal when the amplifier is loaded by the on-chip antenna coil. Notice that the secondary unwanted peak between the two tones is about 11dB below the fundamental tone. We also ran MATLAB simulations for the worst-case noise and attenuation considered in the link budget analysis in Section III-D.1. These simulations showed that a signal whose secondary unwanted peak lies 10dB below the fundamental tone is demodulated correctly at the receiver. Since the measured peak is lower still, the transmitted signal should also be demodulated correctly. The simulated receiver uses a standard architecture for demodulating non-coherent BFSK signals [22].
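
One common form of non-coherent BFSK receiver correlates each bit interval against in-phase and quadrature references at both tone frequencies. It then picks the tone with the larger energy, so no carrier phase recovery is needed. The sketch below is a generic version of that structure, not the MATLAB model used in this work. The sample rate, bit rate, noise level and tone-to-bit mapping are hypothetical. The unwanted spectral peak is crudely modeled as a single sinusoid midway between the tones, 11dB below a tone.

```python
import math
import random
from typing import List, Sequence

# Hypothetical link parameters (assumptions, not the thesis's MATLAB model):
FS = 1.6e6                       # sample rate, Hz
BIT_RATE = 32.0e3                # bit rate, Hz
SPB = int(round(FS / BIT_RATE))  # samples per bit (50)
F_ZERO = 113.0e3                 # tone assumed to encode a logical "zero", Hz
F_ONE = 342.0e3                  # tone assumed to encode a logical "one", Hz


def bfsk_waveform(bits: Sequence[int]) -> List[float]:
    """Continuous-phase, unit-amplitude BFSK test signal."""
    phase, out = 0.0, []
    for bit in bits:
        step = 2.0 * math.pi * (F_ONE if bit else F_ZERO) / FS
        for _ in range(SPB):
            out.append(math.sin(phase))
            phase += step
    return out


def tone_energy(chunk: Sequence[float], freq: float) -> float:
    """I/Q correlation energy at `freq` over one bit. Summing I^2 + Q^2
    discards the unknown carrier phase, which makes the detector non-coherent."""
    w = 2.0 * math.pi * freq / FS
    i = sum(v * math.cos(w * n) for n, v in enumerate(chunk))
    q = sum(v * math.sin(w * n) for n, v in enumerate(chunk))
    return i * i + q * q


def demodulate(samples: Sequence[float]) -> List[int]:
    """Decide each bit by picking the tone branch with the larger energy."""
    bits = []
    for start in range(0, len(samples) - SPB + 1, SPB):
        chunk = samples[start:start + SPB]
        bits.append(1 if tone_energy(chunk, F_ONE) > tone_energy(chunk, F_ZERO) else 0)
    return bits


if __name__ == "__main__":
    rng = random.Random(7)
    tx_bits = [rng.randint(0, 1) for _ in range(64)]
    spur_amp = 10 ** (-11 / 20)   # unwanted component about 11 dB below a tone
    f_spur = 0.5 * (F_ZERO + F_ONE)
    rx = [s + spur_amp * math.sin(2 * math.pi * f_spur * n / FS) + rng.gauss(0.0, 0.5)
          for n, s in enumerate(bfsk_waveform(tx_bits))]
    errors = sum(a != b for a, b in zip(tx_bits, demodulate(rx)))
    print(f"bit errors: {errors} / {len(tx_bits)}")
```

A component midway between the tones leaks roughly equally into both branches. It therefore shifts both energies by a similar amount and has little effect on the comparison, as long as it stays well below the tones.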

### V-E. Comparison with Standard Cells

The power consumed by the sub-threshold BFSK modulator was compared with that of the standard cell BFSK implementation, as shown in Table V.1. The standard cell implementation consumes $19.4 \times$ more power. It is specified to operate at a supply voltage of 2.5V and is capable of operating at higher speeds. However, it has no scheme to compensate its circuit delay for PVT variations, whose impact grows as the supply voltage approaches the sub-threshold region. At a reduced supply voltage it would therefore not function correctly under varying operating conditions, so we do not compare its power at a lower operating voltage.

Table V.1. Sub-threshold vs Standard Cell Power Consumption

| Design Style  | VDD (V) | Clock Frequency | Average Current | Power Dissipation |
|---------------|-----|-----------------|-----------------|-------------------|
| Sub-threshold | 0.6 | 1.05MHz         | 44.7µA          | 26.8µW            |
| Standard Cell | 2.5 | 1.05MHz         | 208.0µA         | 520.0µW           |
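
The power figures in Table V.1 follow directly from $P = V_{DD} \times I_{avg}$, and their ratio gives the reported improvement:

$$
P_{\text{sub}} = 0.6\,\text{V} \times 44.7\,\mu\text{A} \approx 26.8\,\mu\text{W}, \qquad
P_{\text{std}} = 2.5\,\text{V} \times 208.0\,\mu\text{A} = 520.0\,\mu\text{W}
$$

$$
\frac{P_{\text{std}}}{P_{\text{sub}}} = \frac{520.0\,\mu\text{W}}{26.8\,\mu\text{W}} \approx 19.4
$$

Since both designs run at the same 1.05MHz clock, the same ratio also applies to the energy consumed per clock cycle.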

### V-F. Chapter Summary

In this chapter, we presented results from the fabricated wireless BFSK transmitter chip. We verified the functionality of the digital BFSK circuit and of the dynamic delay compensation circuitry. We also analyzed the spectrum of the output signal and showed that it can be demodulated with a standard non-coherent receiver architecture. Finally, we showed that a standard cell implementation of the same circuit on the same die consumes $19.4 \times$ more power.

#### CHAPTER VI

#### CONCLUSION

Power consumption in VLSI circuits is a critical issue in the semiconductor industry today. For many applications, such as portable devices, low power consumption is a first-order design constraint. Several of these applications need extremely low power but do not have high-speed design requirements. In these cases, sub-threshold circuit design techniques can provide extremely low-power solutions by sacrificing some circuit performance. The problem with sub-threshold circuits, however, is that they exhibit an exponential sensitivity to process, voltage and temperature (PVT) variations.

In this thesis we have implemented and tested a robust sub-threshold design flow that uses circuit-level PVT compensation to stabilize circuit performance. The delay of a circuit is compensated over PVT variations using an external reference clock. The compensating circuitry modulates the bulk node of the transistors in the circuit depending on the phase difference between the circuit delay and the reference clock signal. The circuit is implemented as a network of PLAs in which all PLAs are of the same size. Each PLA therefore has the same delay, which ensures that the critical path delay to be compensated is the same across the entire circuit.

We have designed and fabricated a sub-threshold wireless BFSK transmitter chip using this robust sub-threshold design methodology. The chip is specified to broadcast a signal over a distance of up to 1000 meters. For comparison, we have also implemented a BFSK transmitter using a traditional standard cell flow on the same die. We have shown that the sub-threshold approach consumes $19.4 \times$ less power than the standard cell based implementation.

Future work includes constructing an antenna for wireless transmission and a receiver to demodulate the signal sent by the BFSK transmitter. These can be used to test and verify the distance over which the wireless transmitter can operate. In addition, the speed of the sub-threshold circuit could be improved significantly by using heavily pipelined circuits.

#### REFERENCES

- [1] W. Daasch, C. Lim, and G. Cai, "Design of VLSI CMOS Circuits under Thermal Constraint," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 49, no. 8, pp. 589–593, Aug. 2002.
- [2] S.-H. Choi, B.-K. Kim, J. Park, C.-H. Kang, and D.-S. Eom, "An Implementation of Wireless Sensor Network," *IEEE Transactions on Consumer Electronics*, vol. 50, no. 1, pp. 236–244, Feb. 2004.
- [3] "The Multimodal Networks of In-situ Sensors (MANTIS) Project," http://mantis.cs.colorado.edu, 2004.
- [4] A. Abidi, G. Pottie, and W. Kaiser, "Power-conscious Design of Wireless Circuits and Systems," *Proceedings of the IEEE*, vol. 88, no. 10, pp. 1528–1545, Oct. 2000.
- [5] N. Jayakumar and S. Khatri, "A Variation-tolerant Sub-threshold Design Approach," in *Proceedings, Design Automation Conference*, June 2005, pp. 716–719.
- [6] K. Kanda, K. Nose, K. Kawaguchi, and T. Sakurai, "Design Impact of Positive Temperature Dependence on Drain Current in sub-1-V CMOS VLSIs," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 10, pp. 1559–1564, Oct. 2001.
- [7] H. Soeleman, K. Roy, and B. Paul, "Robust Subthreshold Logic for Ultra-low Power Operation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 9, no. 1, pp. 90–99, Feb. 2001.
- [8] H. Soeleman and K. Roy, "Digital CMOS Logic Operation in the Sub-threshold Region," in *Tenth Great Lakes Symposium on VLSI*, Mar. 2000, pp. 107–112.
- [9] H. Soeleman and K. Roy, "Ultra-low Power Digital Subthreshold Logic Circuits," in *International Symposium on Low Power Electronic Design*, 1999, pp. 94–96.
- [10] B. Paul, H. Soeleman, and K. Roy, "An 8X8 Sub-Threshold Digital CMOS Carry Save Array Multiplier," in *European Solid State Circuits Conference*, Sept. 2001, pp. 377–380.
- [11] J. Tschanz, J. Kao, S. Narendra, R. Nair, D. Antoniadis, A. Chandrakasan, and V. De, "Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-die Parameter Variations on Microprocessor Frequency and Leakage," *IEEE Journal of Solid-State Circuits*, vol. 37, pp. 1396–1402, Nov. 2002.
- [12] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, "New Paradigm of Predictive MOSFET and Interconnect Modeling for Early Circuit Design," in *Proc. of IEEE Custom Integrated Circuit Conference*, June 2000, pp. 201–204, http://www-device.eecs.berkeley.edu/~ptm.
- [13] S. P. Khatri, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Cross-talk Immune VLSI Design Using a Network of PLAs Embedded in a Regular Layout Fabric," in *IEEE/ACM International Conference on Computer Aided Design*, Nov. 2000, pp. 412–418.
- [14] N. Jayakumar and S. Khatri, "A METAL and VIA Maskset Programmable VLSI Design Methodology Using PLAs," in *IEEE/ACM International Conference on Computer Aided Design*, Nov. 2004, pp. 590–594.
- [15] P. Zarkesh-Ha, T. Mule, and J. D. Meindl, "Characterization and Modelling of Clock Skew with Process Variation," in *IEEE 1999 Custom Integrated Circuits Conference*, May 1999, pp. 441–444.
- [16] N. Jayakumar, "Minimizing and Exploiting Leakage in VLSI," Ph.D. dissertation, ECE Department, Texas A&M University, College Station, TX, Dec. 2006.
- [17] S. Khatri, A. Mehrotra, R. Brayton, A. Sangiovanni-Vincentelli, and R. Otten, "A Novel VLSI Layout Fabric for Deep Sub-Micron Applications," in *Proceedings of the Design Automation Conference*, New Orleans, June 1999, pp. 491–496.
- [18] "Taiwan Semiconductor Manufacturing Company Ltd.," www.tsmc.com, June 2007.
- [19] "HSPICE," www.synopsys.com/products/mixedsignal/hspice/hspice.html, May 2007.
- [20] N. Jayakumar, R. Garg, B. Gamache, and S. Khatri, "A PLA Based Asynchronous Micropipelining Approach for Subthreshold Circuit Design," in *Proceedings, Design Automation Conference*, July 2006, pp. 419–424.
- [21] J. Proakis, *Digital Communications*, Boston: McGraw-Hill, Aug. 2001.
- [22] F. Xiong, *Digital Modulation Techniques*, 2nd ed. (Artech House Telecommunications Library), Norwood, MA: Artech House, Inc., 2006.
- [23] S. P. Khatri, "Cross-talk Noise Immune VLSI Design Using Regular Layout Fabrics," Ph.D. dissertation, EECS Department, University of California, Berkeley, CA, Dec. 1999.
- [24] Xilinx Inc., "ISE Foundation," http://www.xilinx.com/ise/logic\_design\_prod/foundation.htm, 2007.
- [25] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, "SIS: A System for Sequential Circuit Synthesis," Tech. Rep. UCB/ERL M92/41, Univ. of California, Berkeley, CA 94720, May 1992.
- [26] A. J. van Genderen and N. P. van der Meijs, *Space3d Capacitance Extraction User's Manual*, Delft Univ. of Technology, Delft, The Netherlands, 1997.
- [27] Cadence Design Systems Inc., *Envisia Silicon Ensemble Place-and-route Reference*, 555 River Oaks Parkway, San Jose, CA 95134, Nov. 1999.
- [28] Cadence Design Systems Inc., "ASSURA Layout vs. Schematic Verifier," http://www.cadence.com/products/dfm/assura\_lvs, 2007.

## VITA

Suganth Paul received his Bachelor's degree in Electronics and Communication Engineering from the College of Engineering Guindy, Anna University, India. He is currently pursuing a Master of Science degree in Computer Engineering in the Department of Electrical & Computer Engineering at Texas A&M University, College Station. During his graduate studies he has done research in many aspects of VLSI, including algorithms for fast computer arithmetic in hardware, reconfigurable systems for radar signal processing, high-performance circuit design styles, and VLSI implementation of ultra-low-power circuits.

Suganth Paul may be reached at the Department of Electrical and Computer Engineering, 333F WERC, Texas A&M University, College Station, TX 77843-3259. His email address is spaul\_AT\_ece\_DOT\_tamu\_DOT\_edu.