# DOWNLINK W-CDMA PERFORMANCE ANALYSIS AND RECEIVER IMPLEMENTATION ON SC140 MOTOROLA DSP

A Thesis

by

# KAUSHIK GHOSH

Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

May 2003

Major Subject: Electrical Engineering

Approved as to style and content by:

S. L. Miller (Chair of Committee)

C. Georghiades (Member)

S. Bhattacharyya (Member)

D. Friesen (Member)

C. Singh (Head of Department)

## ABSTRACT

Downlink W-CDMA Performance Analysis and Receiver Implementation on SC140 Motorola DSP. (May 2003)

Kaushik Ghosh, B.Eng., Birla Institute of Technology and Science, Pilani

Chair of Advisory Committee: Dr. S. L. Miller

High data rate applications are the trend in today's wireless technology. The W-CDMA standard was designed to support such high data rates, using a chip rate of 3.84 Mcps. The main purpose of this research was to analyze the feasibility of a fixed-point implementation of the W-CDMA downlink receiver algorithm on a general-purpose digital signal processor (the StarCore SC140 by Motorola). The very long instruction word architecture of the SC140 core is exploited to produce an optimized implementation that meets the real-time requirements of the algorithm. The other main aim of this work was to study and evaluate the performance of the W-CDMA downlink structure with space-time transmit diversity. The effect of the channel estimation algorithm was also studied extensively.

### ACKNOWLEDGMENTS

I am greatly indebted to my advisor, Dr. S. L. Miller, for giving me the opportunity to work under him. I would like to thank him for all the invaluable guidance he provided throughout the course of this research. I really appreciated the freedom he gave me while working on my research.

I would also like to express my sincere appreciation to all my committee members for their interest in my research.

Last but not least, I would like to thank my parents for their absolute confidence in me. They have always been a constant source of inspiration to me.

# **TABLE OF CONTENTS**

- ABSTRACT
- ACKNOWLEDGMENTS
- TABLE OF CONTENTS
- LIST OF TABLES
- LIST OF FIGURES
- 1 INTRODUCTION
- 2 BACKGROUND
  - 2.1 DSP Background
  - 2.2 Communications Theory Background
    - 2.2.1 DS-CDMA
    - 2.2.2 Frequency Selective Channel and Rake Receiver
    - 2.2.3 Space Time Codes
- 3 W-CDMA DOWNLINK MODEL
  - 3.1 Introduction to W-CDMA
  - 3.2 Model
    - 3.2.1 Transmitter Model
      - 3.2.1.1 STTD Encoder
    - 3.2.2 Channel Model
    - 3.2.3 Receiver Model
      - 3.2.3.1 Coupled STTD Rake Receiver
- 4 W-CDMA SYSTEM SIMULATION RESULTS
  - 4.1 Simulation Results
- 5 IMPLEMENTATION ON SC140 DSP CORE
  - 5.1 Introduction
  - 5.2 Core Architectural Features
  - 5.3 Functional Units Implemented in DSP
    - 5.3.1 Channelization and Scrambling Codes
      - 5.3.1.1 Channelization Codes
      - 5.3.1.2 Scrambling Codes
        - 5.3.1.2.1 Downlink Scrambling Code in W-CDMA
        - 5.3.1.2.2 Algorithm on SC140 Core
    - 5.3.2 Channel Estimation
      - 5.3.2.1 Downlink Channel Estimation in W-CDMA
      - 5.3.2.2 Algorithm on SC140 Core
    - 5.3.3 MRC Rake Combining
      - 5.3.3.1 Algorithm on SC140 Core
    - 5.3.4 Channel Coding
      - 5.3.4.1 Viterbi Algorithm on SC140 Core
  - 5.4 Overall Implementation Results
    - 5.4.1 STTD Encoder on DSP
    - 5.4.2 STTD Decoder on DSP
- 6 CONCLUSIONS AND FUTURE WORK
  - 6.1 Conclusion
  - 6.2 Future Work
- REFERENCES
- VITA

# LIST OF TABLES

- Table 1. The Encoding Scheme for Alamouti Codes.
- Table 2. Implementation Results of the Scrambling Code Generator.
- Table 3. Implementation Results for the Channel Estimator Algorithm.
- Table 4. Preferred Tap Length for Channel Estimation Algorithm.
- Table 5. Implementation Results for Each MRC Rake Combiner.
- Table 6. Implementation Results of the Viterbi Decoder.
- Table 7. Cycle Count of the Encoder Implementation in DSP.
- Table 8. Cycle Count of the Decoder Implementation in DSP.
- Table 9. Effect of Spreading Factor and Number of Path in Channel on Chip Rate.

# **LIST OF FIGURES**

- Figure 28. CPICH Channel Estimation for (W=2) Fast Fading.
- Figure 29. CPICH Channel Estimation for (W=8) Fast Fading.
- Figure 30. Symbol Level MRC Combiner.
- Figure 31. Rate Half Rate Convolutional Encoder Structure.
- Figure 32. Comparison of Fixed-Point and Floating-Point Performance.

### 1 INTRODUCTION

The trend in today's wireless communication is towards wireless access of a quality that was not previously possible. Communication using Voice over Internet Protocol (VoIP), multi-megabyte Internet access and greatly increased network capacity are just some of the features that 3G developers are targeting. All such high data rate applications come with an increase in system bandwidth. The increased bandwidth leads to a more distorted channel, which behaves in a frequency selective fashion and is modeled as a multi-path phase and amplitude distortion. This leads to further complexity in receiver design. As mobile communication devices become a part of everyday life, cost-effective and efficient systems that can support such high data rate applications become imperative. Thus, the challenge is to develop more cost-efficient systems that can support high data rates and computationally intensive algorithms.

To address these issues, the 3G evolution of CDMA systems led to cdma2000, while for GSM and IS-136 it led to Wideband CDMA (W-CDMA) [1]. W-CDMA was designed to support high data rates using a chip rate of 3.84 Mcps. It has been shown that using multiple transmit antennas increases diversity and thus downlink system capacity. The Third Generation Partnership Project (3GPP) has chosen Space Time Transmit Diversity (STTD) as its open loop diversity scheme for W-CDMA with two transmit antennas.

To extract the diversity provided by the frequency selective (wideband) channel, a Rake receiver that uses Maximal Ratio Combining (MRC) is used. Since the receiver requires channel knowledge, a channel estimation algorithm forms a part of the receiver design.

This thesis follows the style of the IEEE Journal on Selected Areas in Communications.

The purpose of this work was to study the performance of the above-mentioned system and to implement the transmitter and receiver structure for the downlink W-CDMA on a fixed-point Digital Signal Processor (DSP). It is cost effective to use a general-purpose processor such as a DSP instead of Application Specific Integrated Circuits (ASICs). One reason is that DSPs are highly flexible and programmable and are thus ideally suited for rapidly changing decoding algorithms. Whenever data rate is of prime importance, fixed-point processors, which are faster and cheaper, are usually preferred over their floating-point counterparts, which are more precise in computation, provided that the loss of precision is not significant for the application concerned.

The Motorola SC140 DSP core was selected for this implementation. Its highly flexible core and support for a high degree of parallelism make it an ideal choice for such computationally intensive wireless algorithms, which involve extensive complex arithmetic. The main objective of the work was to evaluate the data rate supported by the receiver implementation and to verify whether it is comparable to that specified by the 3GPP standards for W-CDMA. Another objective was to study and analyze the performance of the algorithm implemented on this fixed-point processor vis-à-vis floating-point simulation results.

The organization of the work is as follows. Section 2 is the background section, which deals with the basic DSP and communication theory concepts necessary for better comprehension of the work involved. The concepts of spread spectrum CDMA, space time coding and the demodulation technique for multi-path fading channels (the Rake receiver) are introduced in this section. In Section 3, the downlink W-CDMA standard specifications for the transmitter and the receiver are introduced. The structures of the base station model and the coupled Rake receiver model are discussed in detail in this section. The channel model used for simulation purposes is also introduced here. Section 4 deals with the software simulation results. The performance of the W-CDMA standard with the space-time transmit diversity encoder is studied and analyzed over the frequency selective fading channel. The effects of different parameters, such as the number of paths in the channel model, the presence of transmit diversity and the modulation format used, on the capacity and the performance of the system are studied extensively through various software simulations. The degradation in bit error rate performance incurred by using a channel estimation algorithm, as compared to the perfect channel knowledge assumption, is also quantified. In Section 5, the DSP (SC140 core) implementation of the downlink W-CDMA standard with STTD encoding is analyzed in detail. Some of the major functional units are discussed in depth with respect to implementation issues and techniques. The overall results of implementing the transmitter and the receiver on the DSP are then discussed in terms of the chip rate supported and the performance loss incurred by using a fixed-point processor as compared to software simulation. Finally, in Section 6, conclusions are drawn and some future work is suggested.

### 2 BACKGROUND

#### 2.1 DSP Background

One of the primary reasons for the wireless industry's phenomenal growth to date has been DSP technology. Originally, the wireless industry was based mainly on "digital voice" data. However, with the burgeoning demand for wireless data transmission of all kinds (internet, video, etc.), the fixed-point programmable DSP has become one of the key semiconductor devices for next generation wireless applications. Traditionally, ASICs have been preferred for wireless applications because of their low power consumption. But the recent trend in DSP design towards low power, Very Long Instruction Word (VLIW) architectures provides a viable alternative to ASICs. The DSP requirements of wireless applications can be summarized as high raw throughput, low power dissipation, low system cost, fast time to market, and small size.

The question arises as to why DSPs would be preferred to ASICs. Wireless systems evolve quickly, it is often difficult to anticipate changes at design time, and systems frequently need to be modified later in the product life cycle. Programmable DSPs offer the flexibility to do so, lowering the risk posed by changing standards and reducing the time to market of a new product. ASICs, on the other hand, offer greater efficiency in terms of both power consumption and processing power. Since recent DSP architectures have narrowed this gap (the power difference between the two is no longer significant), DSPs [2] form a suitable alternative to ASICs for wireless applications. At present, only 10% of the 3G W-CDMA processing power resides in DSPs. The remaining 90% is still in ASICs, but this trend is rapidly changing in favor of DSPs. Several DSPs on the market are designed with wireless applications in mind, such as the Lucent 16000, AD121xx series, TI C6x series, and Motorola Star\*Core.

The Star\*Core SC140 processor core is one of the first highly efficient digital signal processing cores that is ideally suited for wireless communications applications [3]. Some of the features of this new class of architecture are mentioned below (for further details, refer to Section 5.2). It uses a parallel architecture that executes many operations per clock cycle. In addition, it has a large memory and I/O bandwidth to support large application programs and high data rate interfaces (such as video). It can support complex multi-threaded real-time synchronous applications and is scalable enough to meet a wide range of cost or performance targets. These are some of the major reasons why this particular processor core was chosen for the implementation of the W-CDMA receiver algorithm. The very high chip-rate processing requirements of the rate adaptive Rake receiver make it an ideal candidate for software implementation on the SC140 core.

A few application notes have been published on related work already accomplished on the SC140 core [4], [5], [6]. This prior work has a few major shortcomings. Most of it addresses individual functional units of a W-CDMA receiver, and none of it analyzes the overall system performance. Moreover, none of it deals with the receiver complexity associated with an STTD system. The major contributions of this work address some of these issues. Firstly, the entire W-CDMA system, barring a few functional units (the path searching algorithm, synchronization and the error control decoding unit), has been integrated onto a single SC140 DSP core. Secondly, the implementation takes into account the open loop transmit diversity case (STTD) of the W-CDMA receiver structure, which no application note published by Motorola deals with. Finally, and most importantly, the performance of this algorithm on the fixed-point DSP architecture (SC140 core) is quantitatively analyzed against simulation results. Another major purpose of this work is to quantify the chip rate supported by the implementation and to study how it varies with system parameters such as the spreading gain and the number of channel paths.

#### 2.2 Communications Theory Background

In this section some of the vital communications theory concepts necessary for proper comprehension of the W-CDMA system are introduced.

#### 2.2.1 DS-CDMA

Direct Sequence Code Division Multiple Access (DS-CDMA) is a multiple access technique where multiple users concurrently share the same system bandwidth. Communication systems following this concept are based on the spread spectrum technique. The spread spectrum signals are intentionally made to be of much wider bandwidth than the information they carry to make them more noise-like [7]. This noise-like signal is therefore hard to detect and intercept and even harder to jam than normal narrow-band signals. The various advantages of using a spread spectrum signal can be enumerated as given below:

1. It has a good anti-jam property by virtue of its broader bandwidth.
2. In a Spread Spectrum (SS) system, the signal is made to appear wideband and noise-like. This characteristic gives SS signals a 'Low Probability of Intercept'.
3. Applying SS reduces multi-path effects.
4. As the signal is spread over a large frequency band, it has a low power spectral density. Thus, mutual interference with narrowband communication is much lower.

The two common spread spectrum systems are the 'direct sequence' and the 'frequency hopping' types. In Direct Sequence (DS) spread spectrum systems, a high rate pseudo random (not Gaussian noise like) code sequence is added modulo 2 to the binary information signal, which is equivalent to multiplying the bipolar data by the bipolar code. The composite pseudo noise and data is then passed through a data scrambler, which randomizes the output spectrum and thereby removes all discrete spectral lines.

On the other hand, in Frequency Hopping (FH) systems, 'hopping' from one frequency to another over a wide frequency range generates the desired wideband frequency spectrum. The specific order in which frequencies are occupied is a function of the pseudo code sequence generated, and the rate of hopping from one frequency to another is a function of the spreading gain of the system. Spreading gain is a measure of how much more bandwidth is required by the system; it is the ratio of the pseudo code rate ( $R_c$ ) to the symbol rate ( $R_s$ ).
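As a worked illustration of this ratio, take the 3.84 Mcps chip rate quoted earlier for W-CDMA together with a symbol rate of 15 ksps. The symbol rate is an assumed example value chosen for round numbers, not a figure taken from this work.

$$G = \frac{R_c}{R_s} = \frac{3.84 \times 10^{6}}{15 \times 10^{3}} = 256, \qquad 10\log_{10}(256) \approx 24.1\ \text{dB}$$

Under this assumption each data symbol is represented by 256 chips, so the transmitted signal occupies roughly 256 times the bandwidth of the unspread data.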

In the DS-CDMA system, each user is assigned a unique pseudo random code known as the spreading code. A typical DS-SS transmitter using BPSK modulation is shown in Figure 1. The spread signal is given as $s(t) = Ad(t)\cos(\omega_c t + \theta)c(t)$. Here, $d(t)$ is the user data stream and $c(t)$ is the spreading code sequence. Since the spreading sequence is at a much higher rate than the data, it results in bandwidth widening.



Figure 1. Direct Sequence Spread Spectrum Transmitter.

At the receiver, the original transmitted information of each user is retrieved by cross correlating the received signal with the same unique signature spreading code that was used in the transmitter, followed by sampling at the symbol rate. This process of multiplying the received signal with the user's signature code at the receiver is known as de-spreading. The information of all the other users is also spread by their own unique spreading codes, and all of these codes are orthogonal to each other. The multiple access interference (MAI) from all the other users is thereby suppressed. Thus, for a flat fading channel with orthogonal signature codes, this receiver gives optimal performance. The conventional receiver structure for DS-CDMA is shown in Figure 2.



Figure 2. Conventional DS-CDMA Receiver.
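The spreading and de-spreading operations described above can be sketched at baseband in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: the carrier term is omitted, the channel is ideal and noiseless, Walsh-Hadamard codes stand in for the orthogonal signature codes, and the names `walsh_codes`, `spread` and `despread` as well as the data values are hypothetical.

```python
def walsh_codes(order):
    """Return the 2**order Walsh-Hadamard codes as lists of +1/-1 chips."""
    codes = [[1]]
    for _ in range(order):
        # Sylvester construction: H_2n = [[H, H], [H, -H]]
        codes = ([row + row for row in codes]
                 + [row + [-chip for chip in row] for row in codes])
    return codes


def spread(symbols, code):
    """Multiply every data symbol by the full chip sequence c(t)."""
    return [symbol * chip for symbol in symbols for chip in code]


def despread(chips, code):
    """Correlate with the code over each symbol period, then normalize."""
    n = len(code)
    return [sum(chips[start + k] * code[k] for k in range(n)) / n
            for start in range(0, len(chips), n)]


codes = walsh_codes(3)                     # eight orthogonal codes of length 8
user_a, user_b = [1, -1, 1], [-1, -1, 1]   # hypothetical BPSK data d(t)

# Both users transmit at the same time over an ideal, noiseless flat channel.
received = [a + b for a, b in zip(spread(user_a, codes[1]),
                                  spread(user_b, codes[2]))]

recovered_a = despread(received, codes[1])
recovered_b = despread(received, codes[2])
print(recovered_a, recovered_b)
```

Because the two codes are orthogonal over one symbol period, each correlator returns only its own user's data and the other user's contribution cancels.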

However, when the channel is frequency selective in nature, the multi-path characteristic (Section 2.2.2) of the channel destroys the orthogonality among the code sequences. In such a scenario, the MAI can never be totally suppressed. Thus, intelligent code design takes priority in CDMA systems (Section 5.3.1.2).
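The loss of orthogonality can be seen with a small sketch. It assumes Walsh-Hadamard codes as the orthogonal set and models a one-chip multipath delay as a cyclic shift, which is a simplification (a real delayed path also overlaps the neighbouring symbol). The helper names are hypothetical.

```python
def walsh_codes(order):
    """Return the 2**order Walsh-Hadamard codes as lists of +1/-1 chips."""
    codes = [[1]]
    for _ in range(order):
        codes = ([row + row for row in codes]
                 + [row + [-chip for chip in row] for row in codes])
    return codes


def correlate(a, b):
    """Zero-lag correlation of two equal-length chip sequences."""
    return sum(x * y for x, y in zip(a, b))


codes = walsh_codes(3)
code_a, code_b = codes[2], codes[3]

# Simplified one-chip delay of user A's code, as seen on a second path.
delayed_a = code_a[-1:] + code_a[:-1]

aligned = correlate(code_a, code_b)        # time-aligned codes
misaligned = correlate(delayed_a, code_b)  # delayed copy against user B's code
print(aligned, misaligned)
```

The time-aligned correlation is zero, while the delayed copy of one code correlates strongly with the other code, so the delayed path of one user leaks into the correlator of another.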

#### 2.2.2 Frequency Selective Channel and Rake Receiver

For a given signal propagating in a mobile radio channel, the type of fading experienced depends on the nature of the transmitted signal with respect to the characteristics of the channel [8]. The type of fading that a transmitted signal undergoes is determined by the relation between the signal parameters (such as the transmission bandwidth and symbol interval) and the channel parameters (such as the delay spread and the Doppler spread). While the multi-path delay spread leads to time dispersion and frequency selective fading, the Doppler spread leads to frequency dispersion and time selective fading. These two fading mechanisms act independently of each other. If the channel gain and phase remain constant over the system bandwidth, the signal undergoes flat fading. If, on the other hand, the channel has constant gain and phase only over a bandwidth that is smaller than the transmitted bandwidth, the channel introduces frequency selective fading in the received signal. The Doppler spread determines the rate of change of the channel, and accordingly the channel is classified as a fast or a slow fading channel.

In wireless communication in a typical urban environment, there is little possibility of a direct line of sight. In such outdoor environments, the channel is typically modeled as a multipath fading channel. Due to multiple reflections and scattering in the environment, the transmitted signal traverses multiple paths before arriving at the receiver. Each of these paths has its own characteristic magnitude, phase and time delay. If no single path dominates, the fading is termed Rayleigh fading; if the signal from one path is dominant over the others, it is termed Rician fading.

Such a channel introduces Inter-Symbol Interference (ISI) if the maximum time delay between arrivals of signals (the delay spread) is comparable to the symbol time. The channel selectively fades the signal depending on frequency. Thus, frequency selectivity is due to time dispersion of the transmitted signal within the channel. This phenomenon is more prominent in wideband CDMA systems, as they have a broader transmission bandwidth. On the brighter side, frequency selectivity also gives a natural diversity gain to the system.

Now let us introduce the model of the frequency selective fading as a multi-path channel. The time-variant frequency-selective channel can be modeled or represented as a tapped delay line with tap spacing of 1/W and the tap weight coefficients as  $\{h_n(t)\}$ . The minimum time resolution is of the order 1/W, where W is the system bandwidth. Thus, all resolvable paths in the multi-path model are equally time spaced by 1/W. The low pass impulse response of the channel can thus be represented by Equation 2-1.

$$h(\tau;t) = \sum_{n=-\infty}^{\infty} h_n(t) \delta\left(\tau - \frac{n}{W}\right)$$
 Equation 2-1

Since the total multipath spread is $T_m$, for all practical purposes the tapped delay line model of the channel can be truncated at $L = \lfloor WT_m \rfloor + 1$ taps. The channel can then be represented as shown in Equation 2-2.

$$h(\tau;t) = \sum_{n=1}^{L} h_n(t) \delta\left(\tau - \frac{n}{W}\right)$$
 Equation 2-2



Figure 3. The Tap-Delayed Model of a Multipath Rayleigh Fading Channel.

This truncated tapped delay model is shown in Figure 3. The channel parameters $\{h_n(t)\}$ are complex-valued stationary random processes. In the case of Rayleigh fading, the magnitudes $|h_n(t)|$ are Rayleigh distributed and the phases $\phi_n(t)$ are uniformly distributed.
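The truncated tapped delay line of Equation 2-2 can be sketched as follows. This is an illustrative snapshot only: the tap weights are frozen in time, the input is sampled at rate $W$, and the bandwidth, delay spread and tap values are assumed example numbers. The function names `num_taps` and `apply_channel` are hypothetical.

```python
import math


def num_taps(bandwidth_hz, delay_spread_s):
    """Number of resolvable paths, L = floor(W * T_m) + 1."""
    return math.floor(bandwidth_hz * delay_spread_s) + 1


def apply_channel(samples, taps):
    """Pass samples (spaced 1/W apart) through the tapped delay line.

    taps[n - 1] is the weight h_n of the path delayed by n/W, n = 1..L,
    matching the index range of Equation 2-2. The weights are held
    constant here, i.e. a single snapshot of the time-varying channel.
    """
    out = [0j] * (len(samples) + len(taps))
    for i, sample in enumerate(samples):
        for n, weight in enumerate(taps, start=1):
            out[i + n] += weight * sample
    return out


# Assumed values: W = 3.84 MHz and a 1 microsecond multipath spread.
taps_needed = num_taps(3.84e6, 1.0e-6)

# A unit impulse reproduces the tap weights at delays 1/W and 2/W.
impulse_response = apply_channel([1, 0, 0], [0.5, 0.25j])
print(taps_needed, impulse_response)
```

Feeding a unit impulse through the model returns the tap weights at their respective delays, which is a quick check that the delay indexing follows Equation 2-2.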

The tapped delay model of the channel with statistically independent tap weights provides L replicas of the transmitted signal at the receiver. Hence, $L^{th}$ order diversity can be achieved at the receiver. While a conventional demodulation technique requires an equalizer to counter the inter-symbol interference introduced by the multi-path channel, in CDMA systems the codes are designed to have very good correlation properties, thereby minimizing the ISI. Thus, the propagation delay merely introduces multiple copies of the same signal at the receiver. To any one correlator, the other multi-path components appear as uncorrelated noise and are made negligible by the spreading. In such a scenario, a Rake receiver that uses MRC can be used to exploit the diversity offered by the frequency selective channel. It attempts to segregate the time-shifted versions of the original signal and to correlate each multi-path version of the signal separately.



Figure 4. Rake Receiver Structure.

The Rake is basically a diversity receiver designed for a CDMA system, where the diversity is provided by the fact that the multi-path components are practically uncorrelated with each other. The output of each correlator is weighted to provide an estimate of the signal. The structure of the Rake receiver is shown in Figure 4. The output at the receiver can be represented by Equation 2-3.

$$R(n) = \int \left[ \sum_{l=1}^{L} h_l^H y \left( t + \frac{l}{W} + nT_b \right) \right] \phi^H(t) dt$$
 Equation 2-3

Here, $h_l$ are the channel coefficients, $\phi(t)$ is the pulse shaping filter, W is the system bandwidth and $T_b$ is the bit period.
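The weighting in Equation 2-3 can be illustrated at symbol level. The sketch below assumes that each Rake finger has already de-spread its own path, so that finger $l$ outputs the transmitted symbol scaled by $h_l$, and that the channel coefficients are known exactly. Noise is omitted, and the path gains, the symbol and the name `mrc_combine` are hypothetical.

```python
def mrc_combine(finger_outputs, channel_estimates):
    """Weight each finger by the conjugate of its path gain and add."""
    return sum(h.conjugate() * y
               for h, y in zip(channel_estimates, finger_outputs))


path_gains = [0.8 + 0.3j, -0.2 + 0.5j, 0.1 - 0.4j]   # hypothetical h_l
symbol = 1 - 1j                                       # hypothetical QPSK symbol

# Noiseless de-spread output of each finger: the symbol scaled by its gain.
fingers = [h * symbol for h in path_gains]

combined = mrc_combine(fingers, path_gains)
total_gain = sum(abs(h) ** 2 for h in path_gains)

# Hard decision on the in-phase and quadrature components.
decision = complex(1 if combined.real > 0 else -1,
                   1 if combined.imag > 0 else -1)
print(combined, total_gain, decision)
```

The conjugate weights remove the phase rotation of every path, so the contributions add coherently and the combined output equals the symbol scaled by the sum of the path powers.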

#### 2.2.3 Space Time Codes

The major requirements for next generation wireless systems are better quality and coverage together with greater power and bandwidth efficiency. The major impediment to reliable wireless transmission is the time-varying multi-path fading channel. One way to meet these objectives is to improve the overall signal to noise ratio of the system by either transmitting at higher power or increasing the transmission bandwidth, but this is contrary to the requirements of next generation systems. Under such strict constraints, diversity proves to be an effective technique to combat the effect of multi-path fading. In simple terms, diversity can be defined as a method of transmitting multiple copies of the same signal over independent fading channels, so that the probability that all copies are in a deep fade at the same time is much smaller. There are various forms of diversity in the literature [9]. The main forms are:

1. Time diversity
2. Spatial diversity
3. Frequency diversity
4. Polarization diversity

In time diversity, the same signal is transmitted at different times, with the time separation between transmissions being more than the coherence time of the channel. In frequency diversity, the signals are transmitted over different frequency bands such that the separation between the bands is more than the coherence bandwidth of the channel. Transmission of the same signal over multiple antennas separated in space is known as spatial diversity. The problem with time diversity on a slow fading channel is the large delay encountered, while the problem with frequency diversity is the requirement of a large frequency separation between the bands.

Thus, the selection of the diversity scheme for a particular communication system is governed by the channel characteristics. It has been observed, however, that in most scattering environments antenna diversity proves to be a practical and widely used diversity technique.

#### 2.2.3.1 Space Time Transmit Diversity

Space-time codes [10] were introduced as a class of codes that improves performance without sacrificing bandwidth efficiency. They are based on repetition codes (code rate of 1/2) used with multiple antennas to obtain transmit diversity along with coding gain, without any loss of bandwidth efficiency.

The 'Alamouti code' [9] is one such code, and it is used as the space-time coding scheme for open loop transmit diversity in downlink W-CDMA. Figure 5 shows the Alamouti coded transmit diversity scheme. This scheme uses two transmit antennas and one receive antenna. In this section, we introduce the encoding scheme of Alamouti codes along with the combining scheme and the ML decoder associated with it in the receiver.

Figure 5. Two Transmit Antenna Diversity Scheme.

Encoding Scheme - Since this is a two transmit antenna system, two symbols are transmitted simultaneously from the two antennas at any instant. If the symbol transmitted from antenna zero is denoted by $d_0$ and that transmitted from antenna one by $d_1$, then the symbols transmitted during the next symbol period are $-d_1^*$ from antenna zero and $d_0^*$ from antenna one. Here \* denotes the complex conjugate. The signal transmission is summarized in Table 1.

Table 1. The Encoding Scheme for Alamouti Codes.

|            | Antenna 0  | Antenna 1 |
|------------|------------|-----------|
| Time $t$   | $d_0$      | $d_1$     |
| Time $t+T$ | $-d_1^{*}$ | $d_0^*$   |

The fading channel is assumed to be constant across two consecutive symbol intervals. It is modeled as a complex multiplicative vector. The channel for the *l*-th path (multi-path fading), for transmit antenna zero at time interval *t* is given by  $h_{0l}(t)$ , while that for transmit antenna one is given by  $h_{1l}(t)$ . They can be represented in the following form:

$$h_{0l}(t) = h_{0l}(t+T) = \alpha_0 e^{j\theta_0}$$
 Equation 2-4

$$h_{1l}(t) = h_{1l}(t+T) = \alpha_1 e^{j\theta_1}$$
 Equation 2-5

Here, *T* represents the symbol interval. Thus, the received signal of the *l*-th path can be expressed as shown in Equation 2-6 and Equation 2-7.

$$r_{0l}(t) = h_{0l}d_0 + h_{1l}d_1 + n_{0l}$$
 Equation 2-6

$$r_{1l}(t+T) = -h_{0l}d_1^* + h_{1l}d_0^* + n_{1l}$$
 Equation 2-7

Here,  $r_{0l}(t)$  and  $r_{1l}(t+T)$  represent the received signal at time interval t and t+T respectively, while  $n_{0l}$ ,  $n_{1l}$  represent the complex random variable for the corresponding complex noise and interference realizations.

Combining and ML Scheme - The combining scheme is used in accordance with Equation 2-8 and Equation 2-9. The combined signal is then sent to the maximum likelihood decoder.

$$\widetilde{d}_{0l} = h_{0l}^* r_{0l} + h_{1l} r_{1l}^*$$
 Equation 2-8

$$\widetilde{d}_{1l} = h_{1l}^* r_{0l} - h_{0l} r_{1l}^*$$
 Equation 2-9

Substituting  $r_{0l}$  and  $r_{1l}$  in the above equations, the soft symbol output can be represented as:

$$\widetilde{d}_{0l} = \left( \left| h_{0l} \right|^2 + \left| h_{1l} \right|^2 \right) d_0 + h_{0l}^* n_{0l} + h_{1l} n_{1l}^*$$
 Equation 2-10

$$\widetilde{d}_{1l} = \left( \left| h_{0l} \right|^2 + \left| h_{1l} \right|^2 \right) d_1 + h_{1l}^* n_{0l} - h_{0l} n_{1l}^*$$
 Equation 2-11

To generate the overall soft outputs $\tilde{d}_0$ and $\tilde{d}_1$ in the Rake receiver, the soft outputs from all the multipath branches are combined such that $\sum_{l=1}^{L} \tilde{d}_{0l}$ and $\sum_{l=1}^{L} \tilde{d}_{1l}$ are obtained. Note that the total effective path diversity becomes 2L, which is twice the diversity without STTD.
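The encoding and combining steps above can be checked numerically. The following is a minimal sketch for a single path without noise, assuming NumPy; the function names `alamouti_encode` and `alamouti_combine` are illustrative and do not come from the thesis.

```python
import numpy as np

def alamouti_encode(d0, d1):
    """Rows are the symbol periods (t, t+T); columns are antennas (0, 1)."""
    return np.array([[d0, d1],
                     [-np.conj(d1), np.conj(d0)]])

def alamouti_combine(r0, r1, h0, h1):
    """Linear combining of Equations 2-8 and 2-9 for one path."""
    d0_soft = np.conj(h0) * r0 + h1 * np.conj(r1)
    d1_soft = np.conj(h1) * r0 - h0 * np.conj(r1)
    return d0_soft, d1_soft

rng = np.random.default_rng(0)
d0, d1 = (1 + 1j) / np.sqrt(2), (-1 + 1j) / np.sqrt(2)   # two QPSK symbols
h0, h1 = rng.normal(size=2) + 1j * rng.normal(size=2)    # channel, constant over 2T

tx = alamouti_encode(d0, d1)
r0 = h0 * tx[0, 0] + h1 * tx[0, 1]   # Equation 2-6 without noise
r1 = h0 * tx[1, 0] + h1 * tx[1, 1]   # Equation 2-7 without noise

d0_soft, d1_soft = alamouti_combine(r0, r1, h0, h1)
gain = abs(h0) ** 2 + abs(h1) ** 2   # scaling predicted by Equations 2-10 and 2-11
```

In the noiseless case both soft outputs equal the transmitted symbols scaled by `gain`, and the cross terms between $d_0$ and $d_1$ cancel exactly.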

### **3** W-CDMA DOWNLINK MODEL

#### 3.1 Introduction to W-CDMA

Wideband Code Division Multiple Access (W-CDMA) uses a CDMA channel that is four times wider than the channels typically used in current 2G networks in North America. W-CDMA was developed as a joint effort between the European Telecommunications Standards Institute (ETSI) and the Association for Radio Industry and Business (ARIB). It ensures backward compatibility with second-generation technologies like GSM and IS-95, as well as with 2.5G TDMA technologies. Today, W-CDMA is the prime focus of the 3GPP world standards body. It operates on paired bands of 1920-1980 MHz and 2110-2170 MHz with a carrier spacing of 5 MHz. It can support a variety of data services ranging from very low data rates (60 Kbps for voice) to very high data rates (384 Kbps for Internet access), with a constant chip rate of 3.84 Mcps.

It uses a Rake receiver to compensate for the effect of multi-path fading and to obtain diversity gain from it. Recently, the standardization bodies included the use of multiple transmit antennas (downlink transmit diversity) to improve capacity. Transmit diversity is classified into two categories: closed loop transmit diversity, which requires feedback from the mobile to the base station, and open loop transmit diversity, which does not require any such feedback. W-CDMA employs both open and closed loop diversity, while CDMA2000 employs only open loop diversity. The 3GPP has chosen STTD as the open loop transmit diversity technique for the two transmit antenna W-CDMA system. Other transmit diversity techniques such as Space-Time Spreading (STS) and Orthogonal Transmit Diversity (OTD) are used as open loop techniques in the CDMA2000 standard [11]. This work involves the simulation of the W-CDMA system model that employs the STTD technique.

#### 3.2 System Model

The W-CDMA system model is illustrated in Figure 6. This section introduces the various blocks in the system model, while the subsequent sections give detailed overviews of the transmitter structure (Section 3.2.1), the channel model (Section 3.2.2) and the receiver structure (Section 3.2.3).



Figure 6. W-CDMA System Block Diagram.

The functional units in the transmitter implementation are as given below.

1. Error control encoding - The generated bit stream is convolutionally encoded (Section 5.3.4).
2. QPSK signal generation - Modulation of the binary data stream into QPSK symbols.
3. STTD encoding - Mapping of the QPSK signal into the STTD signaling format (Section 3.2.1.1).
4. Spreading & Scrambling - Application of the channelization and scrambling codes to the symbols (Section 5.3.1).
5. RRC filtering - The resulting chip sequence is pulse shaped using an RRC pulse-shaping filter.

The channel simulator (software simulation) consists of the following units.

1. Multiplicative multi-path Rayleigh fading channel - A tapped delay line model of the frequency selective channel, modeled as several independent realizations of a Rayleigh fading process (Section 3.2.2).
2. Additive Gaussian noise - A complex additive Gaussian noise process.

The functional units in the receiver structure implementation are given below.

1. RRC filtering - A low-pass filter matched to the transmitter filter.
2. Channel estimation - A channel estimation algorithm, since the Rake requires channel knowledge (Section 5.3.2).
3. Scrambling code generation - Generation of the scrambling code to de-scramble the received filtered signal (Section 5.3.1.2).
4. De-spreading & De-scrambling - The filtered sampled output is de-spread and de-scrambled.
5. The Rake MRC - The two coupled Rake receiver structure for Maximal Ratio Combining (MRC) (Section 5.3.3).
6. Error control decoding - Viterbi decoding is used to decode the convolutionally encoded bit stream (Section 5.3.4.1).

#### **3.2.1** Transmitter Model

The downlink base station model has *K* users whose modulated signals are transmitted over the same linear multi-path fading channel with additive Gaussian noise [12]. For the STTD scheme, there are two transmit antennas at the base station (*j* = 1, 2). Assume a common spreading factor *N* (same data rate) for all *K* users. Spreading and scrambling codes are applied to every user symbol. Thus every chip is effectively given by the product of the $n^{th}$ symbol of the $k^{th}$ user and an aperiodic sequence $w_{k,l}$, which is the product of the user specific spreading sequence $c_k$ and the base station specific complex scrambling sequence $s_l$.

Let the unit energy Orthogonal Variable Spreading Factor (OVSF) sequence be $c_k = [c_{k,0}, c_{k,1}, \dots, c_{k,N-1}]^T$. The aperiodic sequence $w_{k,l}$ is then given by $c_{k,l \mod N} s_l$. Thus the resultant chip sequence for each of the two antennas is the sum of the chip sequences of all the users, as shown in Equation 3-1, where $\alpha_{k,n}^j$ denotes the $n^{th}$ symbol of the $k^{th}$ user on antenna $j$.

$$b_l^j = \sum_{k=1}^K \alpha_{k, \lfloor \frac{l}{N} \rfloor}^j w_{k, l}, j = 1, 2.$$
 Equation 3-1

This spreading operation is represented as the filtering of an up-sampled symbol sequence with the spreading sequence as the impulse response. A pulse-shaping RRC ( $\alpha$ =0.22) filter is used at the transmitter in order to band-limit the base band signal and then it is imposed onto a carrier waveform to convert it into a continuous time signal. The spreading and modulation associated with the downlink dedicated physical channel are illustrated in Figure 7.
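The chip sequence of Equation 3-1 can be sketched for a single antenna as follows. This is an illustration under simplifying assumptions rather than the simulator used in this work: the helper names `spread` and `despread`, the small spreading factor, the two hard-coded codes and the random unit-modulus scrambling sequence are all chosen here for brevity, and pulse shaping is omitted.

```python
import numpy as np

def spread(symbols, ovsf, scrambling):
    """Equation 3-1 for one antenna: b[l] = sum_k a[k, l // N] * c[k, l % N] * s[l]."""
    _, n_sym = symbols.shape
    N = ovsf.shape[1]
    l = np.arange(n_sym * N)
    w = ovsf[:, l % N] * scrambling[l]          # aperiodic sequences w[k, l]
    return np.sum(symbols[:, l // N] * w, axis=0)

def despread(chips, ovsf_k, scrambling):
    """Correlate the chips with w[k, l] over each symbol period of N chips."""
    N = ovsf_k.size
    l = np.arange(chips.size)
    w = ovsf_k[l % N] * scrambling[l]
    return (chips * np.conj(w)).reshape(-1, N).sum(axis=1)

rng = np.random.default_rng(1)
N, K, n_sym = 4, 2, 3
ovsf = np.array([[1, 1, 1, 1], [1, -1, 1, -1]]) / np.sqrt(N)   # unit energy codes
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
symbols = rng.choice(qpsk, size=(K, n_sym))                    # a[k, n]
scrambling = rng.choice(qpsk, size=n_sym * N)                  # |s[l]| = 1

chips = spread(symbols, ovsf, scrambling)
recovered = np.array([despread(chips, ovsf[k], scrambling) for k in range(K)])
```

Because the codes are orthogonal and the scrambling sequence has unit modulus, `recovered` matches `symbols` on an ideal channel; multi-path breaks this orthogonality, which is the source of the multiple access interference discussed later.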

Figure 7. Downlink Transmitter Structure for W-CDMA.

The receiver has a similar RRC filter. The output of this low-pass filter is given by $\bar{b}_l^j$, an estimate of $b_l^j$. Assume the channel coefficients to be represented by $h_i^j$, discrete complex vectors, with a delay spread of L. The sampled received signal is given by Equation 3-2.

$$y_l = y_l^1 + y_l^2 + v_l$$
 Equation 3-2

where

$$y_l^j = \sum_{k=1}^{K} \sum_{i=0}^{L-1} h_i^j \overline{b}_{k,l-i}^j, \quad j = 1, 2.$$
 Equation 3-3

Here,  $y_l^1$  represents the received signal from the first antenna, while  $y_l^2$  gives the received signal from the second. The sampled complex additive Gaussian noise vector is represented by  $v_l$ .

#### 3.2.1.1 STTD Encoder

The STTD scheme, which employs space-time block coding, was originally proposed for narrowband systems but is now applied to the W-CDMA system. The received signal on path *j* at time instant *T* is given by $r_j^1 = \alpha_j^1 S_1 - \alpha_j^2 S_2^* + n_j^1$, while at time instant 2*T* it is given by $r_j^2 = \alpha_j^1 S_2 + \alpha_j^2 S_1^* + n_j^2$ [13]. The symbol $\alpha_j^i$ denotes the Rayleigh fading channel coefficient of the $j^{th}$ multi-path from the $i^{th}$ transmit antenna to the receiver, while $r_j^i$ and $n_j^i$ denote the received signal on the $j^{th}$ multi-path during the $i^{th}$ symbol interval and the corresponding additive white Gaussian noise, respectively. The block diagram of the STTD encoder is shown below in Figure 8.



Figure 8. Block Diagram of the STTD Encoder.

#### 3.2.2 Channel Model

The channel model used for simulation purposes is the multi-path Rayleigh fading model [14], which in effect consists of multiple delayed realizations of independent fading processes. The tapped delay line model for a frequency selective channel with uniformly spaced taps is selected in this regard. The spacing between adjacent taps is 1/W (W being the transmitted bandwidth), which is equivalent to one chip duration. The frequency domain method is used to generate the channel coefficients, such that the variance of the sampled random processes in the frequency domain maintains the desired bathtub profile. The Doppler frequency ($f_d = f_c(v/c)$) determines the nature of fading in the time domain (slow or fast fading). The energies of the sampled random processes are normalized to maintain overall unit energy as given in Equation 3-4. A uniform power delay profile (PDP) of the chip-spaced tap channel with L paths is used (Equation 3-5).

$$\sum_{l=0}^{L-1} \sum_{k=-M}^{M} \sigma_{l,k}^{2} = 1$$
 Equation 3-4

$$\sum_{k=-M}^{M} \sigma_{l,k}^{2} = \frac{1}{L}$$
 Equation 3-5

Here, 2M+1 represents the number of samples of the complex Gaussian random process.

#### 3.2.3 Receiver Model

The receiver is to be implemented on the DSP processor and is composed of six major blocks: the low-pass (RRC) filter, the scrambling code generator, the channel estimation algorithm, the de-spreading and de-scrambling unit, the modified Rake receiver and, finally, the Viterbi decoder. The received analog signal, sampled at M times the chip rate, is passed through a digital low-pass filter. This low-pass (matched) filter is an FIR RRC filter with coefficients similar to those used in the transmitter.



Figure 9. Receiver Block Diagram.

The receiver then generates the scrambling and spreading sequence (38400 chip length corresponding to 10 ms (Section 5.3.1.2.2)), which is unique to it and dynamic (changes every time a cell makes a new connection with the base station), and stores it in its data memory. A channel estimation algorithm (Section 5.3.2) generates the necessary channel coefficients required for Rake MRC combining (Section 5.3.3).

The filtered output after down sampling (M times) is de-spread and de-scrambled. The symbol rate output of de-spreading and de-scrambling is then combined using a coupled Rake receiver structure (Section 3.2.3.1). The block diagrammatic representation in Figure 9 gives a better insight into the receiver model.

#### 3.2.3.1 Coupled STTD Rake Receiver

The Rake receiver structure for an STTD encoded signal is shown in Figure 10.



Figure 10. STTD Downlink Coupled Rake Receiver Structure.

It consists of two coupled conventional Rake receivers. The linear processing involved in generating the two soft outputs $z_1$ and $z_2$ is given in Equation 3-6 and Equation 3-7.

$$z_1 = \sum_{j=1}^{L} \left( r_{j}^{1} h_{j}^{1*} + r_{j}^{2*} h_{j}^{2} \right)$$
 Equation 3-6

$$z_2 = \sum_{j=1}^{L} \left( -r_{j}^{1*} h_{j}^{2} + r_{j}^{2} h_{j}^{1*} \right)$$
 Equation 3-7

Figure 10 shows that the soft output of the two symbols depends on both the  $\vec{h}_j$  (multipath channel vectors) and the two input signals  $r_{2P}$  and  $r_{2P+1}$ . The filters  $\vec{h}_j$  are causal chip rate filters [12].
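The coupled combining of Equation 3-6 and Equation 3-7 can be illustrated with a short numerical sketch. It assumes ideal de-spread finger outputs that follow the STTD signal model of Section 3.2.1.1, with no noise and no inter-path interference; the function name `sttd_rake` and the variable names are illustrative only.

```python
import numpy as np

def sttd_rake(r1, r2, h1, h2):
    """Coupled Rake combining over all fingers (Equations 3-6 and 3-7)."""
    z1 = np.sum(r1 * np.conj(h1) + np.conj(r2) * h2)
    z2 = np.sum(-np.conj(r1) * h2 + r2 * np.conj(h1))
    return z1, z2

rng = np.random.default_rng(2)
L = 2                                                # number of resolved paths
h1 = rng.normal(size=L) + 1j * rng.normal(size=L)    # antenna 1, one tap per path
h2 = rng.normal(size=L) + 1j * rng.normal(size=L)    # antenna 2, one tap per path
s1, s2 = (1 - 1j) / np.sqrt(2), (-1 - 1j) / np.sqrt(2)

# Noiseless finger outputs in the first and second symbol intervals.
r1 = h1 * s1 - h2 * np.conj(s2)
r2 = h1 * s2 + h2 * np.conj(s1)

z1, z2 = sttd_rake(r1, r2, h1, h2)
gain = np.sum(np.abs(h1) ** 2 + np.abs(h2) ** 2)     # energy of all 2L branches
```

Under these assumptions `z1` and `z2` equal the transmitted symbols scaled by the summed energy of all 2L channel coefficients, which is the maximal ratio combining gain.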

# **4 W-CDMA SYSTEM SIMULATION RESULTS**

#### 4.1 Simulation Results

The software simulation results (using 64-bit precision computation) are introduced in this section. The W-CDMA STTD system model referred to in Section 3 is simulated on a win-32 system using a software tool called MATLAB. The performance results obtained from the simulations were then verified with those given in related literature [15].

The effect of the STTD scheme on the capacity of the downlink W-CDMA system is shown in Figure 11. It is a BPSK system with a spreading factor of N=128. The channel model is a multi-path fading channel (two paths, L=2) with a Doppler frequency of 10 Hz, which corresponds to an indoor environment with a speed of 3.37 miles per hour. The bit error rate performance is evaluated against varying capacity (K/N ratio) with zero AWGN. Here, K represents the number of active users in the system. Note that perfect channel knowledge and perfect receiver synchronization are assumed in this case. As expected, the introduction of transmit diversity increases the system capacity, but the increase is small. The reason is that the diversity gain achieved by using two transmit antennas is counteracted by an increase in MAI, as each symbol is duplicated (two antennas). The resultant capacity increase, as evaluated from the figure, is approximately 7-8 percent for a given bit error rate. On the other hand, the capacity increase is found to be more pronounced when the channel is flat fading. It is also observed that the improvement from downlink space-time block coding is more significant when the fading is slow with high temporal correlation.



Figure 11. The Effect of Transmit Diversity on System Capacity.

The effect of the modulation format on system capacity is shown in Figure 12. It is a comparative performance analysis of the QPSK modulation scheme (used in the 3GPP standard for W-CDMA) against BPSK. As expected, there is a distinct tradeoff between bit rate and system capacity. The QPSK modulation format, which supports twice the bit rate of the BPSK format, has approximately 50 percent less capacity. The system under consideration has a spreading factor of N=128 and the symbols at the transmitter are STTD encoded. The channel model is a multi-path fading channel (two paths, L=2) with a Doppler frequency of 10 Hz (indoor environment with a speed of 3.37 miles per hour). This result also assumes perfect channel knowledge and perfect synchronization of the receiver.



Figure 12. The Effect of Modulation Format on the System Capacity.

The effect of loading on the performance of a QPSK system with space-time coding is shown in Figure 13. The system has a spreading factor of N=64, while the channel model has two paths (L=2) with a Doppler frequency of 10 Hz (indoor environment with a speed of 3.37 miles per hour). As expected, the performance degrades as the system load increases (more users). It can be observed from the figure that as the SNR increases, the bit error rate reaches an error floor.

Figure 13. The Effect of System Loading on the Performance.

The effect of the channel estimation algorithm on the performance of the system is shown in Figure 14. It is a QPSK system (STTD encoded) with a spreading factor of 128, and the multi-path channel model has two paths with a Doppler frequency of 10 Hz (indoor environment). As expected, the performance degrades with this algorithm compared with the perfect channel assumption. The algorithm details are introduced in Section 5.3.2.1. The Data-Pilot Channel (DPCH) with 2 pilot symbols per slot is used in this simulation. It can be concluded from the figure that using this channel estimation algorithm (DPCH channel) reduces the system capacity by 25 percent (for a given bit error rate) compared with the perfect channel knowledge assumption. The number of taps in the Moving Average (MA) filter used in this simulation is 4. The effect of the number of filter taps on the performance of both channel estimation algorithms (CPICH and DPCH), for different Doppler frequencies, is dealt with in detail in Section 5.3.2.2.

Figure 14. The Effect of Channel Estimation Algorithm on Capacity.

The downlink W-CDMA standard employs a rate-half convolutional encoder at the transmitter (Section 5.3.4). The effect of this error control coding algorithm on the overall system performance is demonstrated in Figure 15. It is a QPSK system (STTD encoded) with a spreading factor of 128, and the multi-path channel model has two paths with a Doppler frequency of 10 Hz (indoor environment). The receiver assumes perfect channel knowledge and synchronization. The 'trace-back' length in the simulation is assumed to be 20.



Figure 15. Effect of Channel Encoding on the System Capacity.

# **5 IMPLEMENTATION ON SC140 DSP CORE**

#### 5.1 Introduction

To address the demands of next generation DSP applications, the Star\*Core SC140 digital signal processor was designed with an innovative architecture. This highly flexible core supports the high computational requirements of next generation wireless applications. It offers exceptional performance, low power consumption and high code density. Moreover, it has an optimized power management unit, which dynamically controls the system clock, and it operates at a low supply voltage. Its performance is scalable and can be configured to an application's requirements, and it supports a high level of abstraction for application software. To maximize parallelism by allowing multiple address generation and data arithmetic logic units to execute multiple instructions in a single clock cycle, the SC140 core employs a Variable-Length Execution Set (VLES) execution model. The core is capable of executing up to six instructions in one clock cycle: a maximum of four data ALU instructions and two address generation instructions.

Operating at a maximum clock rate of 300 MHz, it can perform up to 1200 million multiply-accumulate operations (MMACS) and up to 3000 RISC MIPS (Million instructions per second). With the SC140 DSP architecture, there always exists a distinct trade-off between optimizing the code for maximum execution speed (for high data rate applications like the one concerned) and optimizing the code for maximum program code density (applications where memory is at premium) [16]. Thus, one needs to classify the algorithm first. The above-mentioned features make the SC140 DSP processor an ideal choice for computationally intensive, next generation wireless applications.

The SC140 core architecture is introduced in the following section followed by a section, which describes at length the implementation details of the algorithm on the DSP processor.

## 5.2 Core Architectural Features

The main functional parts of the SC140 core [17] are the Data Arithmetic Logic Unit (DALU), the Address Generation Unit (AGU) and the Program Sequencer and Control unit (PSEQ).

The DALU performs the arithmetic and logical data operations on the SC140 core. It consists of four parallel Arithmetic Logic Units (ALU), a register file of sixteen 40-bit registers and eight data bus shifter/limiters. Each ALU consists of a Multiply Accumulator (MAC) and a Bit Field Unit (BFU). The MAC can perform 16-bit by 16-bit fractional or integer multiplication between two's complement signed, unsigned or mixed operands, which is accumulated in 40-bits. The BFU has a 40-bit bi-directional shifter that performs all the logical and bit shifting operations.

All the DALU registers are accessible by all the MAC and BFU units. Each register is partitioned into two 16-bit portions (the low and high portions of the register) and one 8-bit portion (the extension portion). Thus, a register can be accessed as an 8-bit, 16-bit, 32-bit or 40-bit register depending on the instruction. To support a high data transfer rate between the memory and the register file, the architecture has two data buses between the DALU register file and the memory, each of which is 64 bits wide. This gives a maximum data rate of 4.8 Gbytes per second (2 × 64 bits × 300 MHz) between the register file and the memory at a 300 MHz clock speed. The eight data shifter/limiters provide scaling and limiting on data transfers from the register file to the memory.

The AGU is the unit of the SC140 core that computes the effective addresses, using integer arithmetic, needed to address the data operands in memory. The AGU supports four types of addressing modes: linear, modulo, multiple wrap-around modulo and reverse-carry. It consists of two Address Arithmetic Units (AAU), which generate two addresses per clock cycle. The two identical AAUs consist of sixteen 32-bit address registers, eight of which can be used as base address registers for modulo addressing. It also contains two 32-bit address used for offset and modulo calculation. There is a 32-bit modulo control register (MCTL) used to specify the addressing mode.

The core also contains a Bit Mask Unit (BMU) which controls bit operations like setting, clearing, inverting, or testing a selected, but not necessarily adjacent, group of bits in a register or memory location. It operates on all registers in the DALU, AGU and the control registers.

The PSEQ controls operations like instruction fetch, hardware looping and exception processing. It has several control units to generate the Program Counter (PC) for instruction fetch, for controlling the sequential program flow and for hardware looping. It has a register file to perform these operations.

In the SC140 core, the program and data memories are unified. Memory is regarded as a single space and each location can either be data or program information and the exact configuration can be customized. The core memory consists of an on-chip RAM and ROM, which can be expanded using off-chip additional memory units. The memory configuration supports two parallel data access and transfers. Data is byte-addressable and accessed by the two data memory buses.

The SC140 core is a fixed-point processor. Since fixed-point processors are considerably faster and cheaper than floating-point ones, they are ideally suited for the computationally intensive algorithms of next generation wireless communication systems. Loss of accuracy in computation is a major disadvantage of using a fixed-point processor. Thus, for applications where real time high speed computation is of prime importance and the loss in accuracy doesn't significantly affect performance, a fixed-point processor like the SC140 is ideally suited. Thus, for the implementation of the STTD encoder and decoder for W-CDMA, the SC140 DSP processor is used [17], [18], [19], and [20].

## 5.3 Functional Units Implemented in DSP

In this section, some of the major functional units in the encoder and decoder design are introduced along with their implementation issues and results, while in the subsequent section (5.4), the overall implementation results are presented.

## 5.3.1 Channelization and Scrambling Codes

## 5.3.1.1 Channelization Codes

As mentioned earlier, in W-CDMA systems each user is assigned a unique spreading code to encode its signal. Each pair of data bits is first serial-to-parallel converted and then mapped onto the I and Q branches. These are then spread to the 3.84 Mcps (W-CDMA standard) chip rate with the same real valued channelization code, known as the Orthogonal Variable Spreading Factor (OVSF) code [21]. Since the system supports a wide range of data rates (application specific), multiple spreading factors (SF) are required in order to maintain a constant bandwidth for all the users. The OVSF codes maintain the orthogonality between different downlink physical channels, even though they use different spreading factors. Thus the use of OVSF codes provides a great degree of flexibility to the W-CDMA system.

Let $S_N$ be a square matrix of size N that denotes the set of N binary spreading codes, each N chips long, where $N = 2^n$. Let $S_N(i)$ denote the $i^{th}$ row vector of $S_N$, which has N elements. The matrix $S_N$ can be generated from $S_{N/2}$ as shown in Equation 5-1, in which $\overline{S}_{N/2}(i)$ denotes the binary complement of $S_{N/2}(i)$. These variable length codes can be generated using a tree structure, and codes from different layers are also orthogonal except when one of the codes is derived from the other.

$$S_{N} = \begin{bmatrix} S_{N}(1) \\ S_{N}(2) \\ \vdots \\ S_{N}(N-1) \\ S_{N}(N) \end{bmatrix} = \begin{bmatrix} S_{N/2}(1) & S_{N/2}(1) \\ S_{N/2}(1) & \overline{S}_{N/2}(1) \\ \vdots & \vdots \\ S_{N/2}(N/2) & S_{N/2}(N/2) \\ S_{N/2}(N/2) & \overline{S}_{N/2}(N/2) \end{bmatrix}$$

Equation 5-1
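The recursion of Equation 5-1 can be sketched in a few lines. The sketch below assumes NumPy and an antipodal (+1/-1) representation of the codes, so that the binary complement becomes a sign change; the function name `ovsf_codes` is illustrative.

```python
import numpy as np

def ovsf_codes(N):
    """Return an N x N array whose rows are the length-N OVSF codes (N a power of two)."""
    if N < 1 or N & (N - 1):
        raise ValueError("N must be a power of two")
    S = np.array([[1]])
    while S.shape[0] < N:
        same = np.hstack([S, S])        # rows [S(i),  S(i)]
        flipped = np.hstack([S, -S])    # rows [S(i), -S(i)], the complemented half
        # Interleave so that both children of parent i occupy adjacent rows.
        S = np.stack([same, flipped], axis=1).reshape(2 * S.shape[0], -1)
    return S

codes4 = ovsf_codes(4)
codes8 = ovsf_codes(8)
```

All codes of the same length are mutually orthogonal, so `codes8 @ codes8.T` is eight times the identity matrix. A code is not orthogonal to a longer code derived from it: the first half of each row pair in `codes8` repeats a row of `codes4`.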

## 5.3.1.2 Scrambling Codes

The symbols at chip rate are then scrambled using scrambling codes, which are unique to the base station within a given geographic area. The purpose of the scrambling code is to achieve code synchronization and to combat the effect of multi-path interference. The scrambling codes are designed to have good autocorrelation and crosscorrelation properties. A Linear Feedback Shift Register (LFSR) can generate a Pseudo-Noise (PN) code sequence (noise like, but periodic and thus deterministic). For an LFSR of length L, if the sequence generated is of length $N = 2^{L} - 1$, it is known as an m-sequence or maximum-length sequence. This class of codes has the best autocorrelation property but does not possess a good crosscorrelation property (which is imperative for the multi-user case). Gold codes (product codes), which are generated by linearly combining pairs of m-sequences of the same degree, satisfy this requirement (good crosscorrelation). Since the code generated is non-maximal, its autocorrelation is worse than that of the m-sequence.

#### 5.3.1.2.1 Downlink Scrambling Code in W-CDMA

Figure 16. Downlink Scrambling Code Generator.

In this section, the structure of the scrambling code generator in the W-CDMA standard for the downlink is introduced [21]. The codes from the 18-degree generator polynomials are truncated to a 10 ms frame duration, which results in 38400 chips at a chip rate of 3.84 Mcps. The two m-sequences are constructed using the following primitive polynomials over GF(2), as shown in Figure 16.

$$1 + X^7 + X^{18}$$
 Equation 5-2

$$1 + X^5 + X^7 + X^{10} + X^{18}$$
 Equation 5-3

The two m-sequences x and y are constructed with the following initial conditions:

$$x(0) = 1, x(1) = x(2) = \dots = x(17) = 0$$
 Equation 5-4

$$y(0) = y(1) = y(2) = \dots = y(17) = 1$$
 Equation 5-5

For every subsequent symbol the recursive definition is:

$$x(i+18) = x(i+7) + x(i) \mod 2, i = 0,...,2^{18} - 20$$
 Equation 5-6

$$y(i+18) = y(i+10) + y(i+7) + y(i+5) + y(i) \mod 2, i = 0,...,2^{18} - 20$$
 Equation 5-7

The  $n^{th}$  Gold code sequence  $z_n$  is defined as:

$$z_n(i) = \left[ x\big((i+n) \bmod (2^{18}-1)\big) + y(i) \right] \bmod 2, \quad i = 0, \dots, 2^{18}-2.$$
 Equation 5-8

The binary sequence is then converted into real valued antipodal sequence by the following transformation:

$$Z_n(i) = \begin{cases} +1 & \text{if } z_n(i) = 0\\ -1 & \text{if } z_n(i) = 1 \end{cases} \quad \text{for } i = 0, \dots, 2^{18} - 2. \qquad \text{Equation 5-9}$$

Finally, the $n^{th}$ scrambling code sequence $C_n$ is defined as:

$$C_n(i) = Z_n(i) + jZ_n\big((i+131072) \bmod (2^{18}-1)\big), \quad i = 0, 1, \dots, 38399.$$
 Equation 5-10
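The construction in Equation 5-4 through Equation 5-10 can be written out directly as a reference model. The sketch below is a straightforward bit-at-a-time version in Python with NumPy, not the optimized SC140 routine; the function names `m_sequences` and `scrambling_code` are illustrative.

```python
import numpy as np

PERIOD = 2 ** 18 - 1   # length of the two m-sequences

def m_sequences():
    """Generate one period of x and y (Equations 5-4 to 5-7)."""
    x = [1] + [0] * 17                      # x(0) = 1, x(1..17) = 0
    y = [1] * 18                            # y(0..17) = 1
    for i in range(PERIOD - 18):            # i = 0, ..., 2^18 - 20
        x.append(x[i + 7] ^ x[i])
        y.append(y[i + 10] ^ y[i + 7] ^ y[i + 5] ^ y[i])
    return np.array(x, dtype=np.uint8), np.array(y, dtype=np.uint8)

def scrambling_code(n, length=38400):
    """Complex scrambling code number n, truncated to one 10 ms frame."""
    x, y = m_sequences()
    i = np.arange(PERIOD)
    z = x[(i + n) % PERIOD] ^ y             # Gold sequence, Equation 5-8
    Z = 1.0 - 2.0 * z                       # 0 -> +1, 1 -> -1, Equation 5-9
    k = np.arange(length)
    return Z[k] + 1j * Z[(k + 131072) % PERIOD]   # Equation 5-10

code = scrambling_code(0)
```

Each element of `code` takes one of the four values ±1 ± j. For n = 0 the first chip has a real part of +1, because x(0) and y(0) are both 1 and their modulo-2 sum is 0.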

#### 5.3.1.2.2 Algorithm on SC140 Core

This section deals with some of the issues associated with the implementation of the scrambling code sequence on the SC140 core. In order to de-spread and de-scramble the received signal, the receiver needs to generate this code sequence. The sequence is generated for a frame length of 10ms, which corresponds to a code length of 38400-chip segment. The major steps in the implementation are enumerated below.

1. Memory space is allocated for storing the generated complex scrambling code. Since the memory space is 16-bit aligned and 38400 complex values must be stored, the total memory required is $(38400 \times 2 \times 2)$ bytes.
2. The PN code sequences (Figure 16) are generated with an optimized version of the algorithm introduced in Section 5.3.1.2.1. In the optimized algorithm, 16 stacked bits (PN code samples) are generated in one iteration rather than one PN code bit at a time. Two data registers are used as the two 18-stage LFSRs. The most significant 22 bits are set to zero (data registers are 40 bits wide) while the lower 18 bits are loaded with the initialization values (Equation 5-4 and Equation 5-5). Thus the main loop performs 2400 (38400/16) iterations.
3. Finally, the generated PN sequence is mapped into real values to be stored in the memory buffer (Equation 5-9). To prevent data overflow (de-spreading and de-scrambling are accumulation processes), the real values are scaled down (+1 or -1 to +0.5 or -0.5) before they are stored in memory.
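The buffer size, the loop count and the scaled mapping of step 3 can be illustrated as follows. This sketch assumes that the 16-bit values are held in Q15 fractional format, in which 0.5 is represented by 16384; the thesis does not state the number format explicitly, and the names used here are illustrative.

```python
import numpy as np

CHIPS_PER_FRAME = 38400
BYTES_PER_VALUE = 2                                       # 16-bit aligned storage
buffer_bytes = CHIPS_PER_FRAME * 2 * BYTES_PER_VALUE      # I and Q parts per chip
loop_iterations = CHIPS_PER_FRAME // 16                   # 16 PN bits per iteration

def map_to_q15_half(bits):
    """Map bits {0, 1} to {+0.5, -0.5} in Q15 (assumed format): 0 -> 16384, 1 -> -16384."""
    bits = np.asarray(bits, dtype=np.int16)
    return (1 - 2 * bits) * np.int16(1 << 14)

scaled = map_to_q15_half([0, 1, 1, 0])
```

Storing ±0.5 rather than ±1 leaves one bit of headroom in each 16-bit word, and the 40-bit accumulators provide further guard bits during the accumulation over a symbol.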

The assembly code results for the generation of the scrambling code on the SC140 core are shown in Table 2.

| Functional Unit     | Cycles/Frame | Code Size (Bytes) |
|---------------------|--------------|-------------------|
| PN generation       | 124000       | 310               |
| Scaling and Mapping | 3670         | 112               |

Table 2. Implementation Results of the Scrambling Code Generator.

## 5.3.2 Channel Estimation

Rake receivers for DS-CDMA over multi-path fading channels require channel parameters such as the number of paths, their relative locations (time delays), and their complex-valued attenuations. In the absence of perfect channel state knowledge, the receiver must be able to estimate the necessary channel parameters and compensate for the attenuation, the phase shift, and the propagation delay of the channel in each finger. This task of compensation is known as synchronization. The concept of estimating the channel parameters and using them as if they were the actual channel parameters is known as synchronized detection and is used in practically all digital receiver implementations. Generally, channel estimation is performed using one of two methods.

1. Blind Estimation - It depends on variations in the modulation characteristics of the received signal.

2. Pilot-Symbol Aided Estimation - Known transmitted pilot symbols are used at the receiver for estimation purposes.

There is a lot of literature available on channel estimation algorithms [22]. Most of the work involves post-processing the maximum-likelihood (ML) estimates of the channel phasors. Post-processing the ML estimates with a Wiener filter, i.e., $\hat{h} = w^H \tilde{h}_{ML}$, can generate the optimal channel estimator for known pilot and channel statistics. The ML estimates are generated by multiplying the received sampled signal with the conjugate of the known pilot symbol, i.e., $\tilde{h}_{ML} = d_k^* r_n$. The Wiener filter coefficients depend on the mobile velocity and the signal to noise ratio of the channel. However, optimal detection techniques can only be applied when the data rate is not very high. For higher data rate applications, a simpler approach is desired.

## 5.3.2.1 Downlink Channel Estimation in W-CDMA

Channel estimation for downlink W-CDMA can be performed using either the Common Pilot Channel (CPICH) or the time multiplexed Data-Pilot Channel (DPCH). The DPCH is the only downlink dedicated physical channel. The frame structure of the downlink DPCH is shown in Figure 17. Each frame is 10 ms long and divided into 15 slots, each of length $T_{slot} = 2560$ chips, leading to an overall chip rate of 3.84 Mcps [23]. Every slot starts with a group of pilot symbols, which may be 2, 4, or 8 symbols long depending on the channel rate. The parameter k (Figure 17) determines the total number of bits per downlink DPCH slot. It is related to the spreading factor (SF) of the physical channel by $SF = 512/2^k$. Typically, spreading factors range from 512 down to 4. Channel estimates between two consecutive slots can be obtained by interpolation. The W-CDMA downlink also has a Common Pilot Channel (CPICH) [23]. It has a fixed data rate of 30 Kbps and uses a fixed spreading factor (SF=256) with a spreading code of all ones. It carries a fixed predefined QPSK pilot symbol sequence of (*1+j*).

Figure 17. Frame Structure for the Downlink DPCH.

The structure of the CPICH downlink frame is shown in Figure 18. Channel estimation using the CPICH is more accurate, as all the symbols in the frame are used for estimation, as opposed to only a few in the DPCH. Moreover, the higher transmitted power of the CPICH guarantees better signal reception quality. On the other hand, since the dedicated DPCH is power controlled, unlike the CPICH, it provides better channel estimates during hand-offs.

#### 5.3.2.2 Algorithm on SC140 Core

Traditionally, Moving Average (MA) filtering has been used to smooth out rapid signal fluctuations and filter out undesired noise [24]. The W-CDMA standard for channel estimation adopts this particular approach to filter out the noisy ML channel estimates. The performance of this channel estimation algorithm (on both the CPICH and the DPCH) was initially evaluated by software simulations and later implemented on the SC140 core.



Figure 18. Frame Structure for the Downlink CPICH.

The major steps in the implementation are enumerated below.

- 1. Compute the ML channel estimate by de-spreading and de-scrambling the received sampled signal and multiplying the result by the complex conjugate (*1-j*) of the pilot symbol.
- 2. Filter the noisy channel estimate with a variable-length moving average filter. In the DSP implementation, a filter length of N = 8, 16, or 32 is chosen.
- 3. To match the data rate of the DPCH channel (assuming the CPICH is used for channel estimation), the channel estimates have to be either decimated (DPCH rate lower than CPICH rate) or interpolated (DPCH rate higher than CPICH rate). Figure 19 illustrates this estimation algorithm. A simple repeater performs the interpolation. Since the channel parameters are relatively constant over a symbol duration, the additional overhead (execution cycles) of a more elaborate interpolation is avoided. When the DPCH is used for estimation, proper interpolation becomes necessary, as the pilot symbols are sparsely present in a DPCH frame. Since this channel estimation algorithm (MA filtering) does not perform well in the DPCH case, a linear regression method over a window of slots is also suggested.
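Steps 1 and 3 above can be sketched in a few lines of Python (step 2, the MA filter, is left out here). This is a floating-point illustration under stated assumptions, not the SC140 implementation: the function names, the stand-in scrambling pattern, and the channel value are hypothetical, the scrambling chips are assumed to have unit modulus, and the division by the pilot energy is an added normalisation.

```python
def ml_channel_estimates(rx_chips, scrambling, sf=256, pilot=1 + 1j):
    """Step 1: one ML channel estimate per CPICH symbol.

    Assumes an all-ones spreading code and unit-modulus scrambling chips.
    The division by |pilot|^2 is an added normalisation, not from the text.
    """
    estimates = []
    for start in range(0, len(rx_chips) - sf + 1, sf):
        acc = 0j
        for n in range(start, start + sf):
            acc += rx_chips[n] * scrambling[n].conjugate()   # de-scramble
        despread = acc / sf                                   # de-spread (average)
        estimates.append(despread * pilot.conjugate() / abs(pilot) ** 2)
    return estimates

def repeat_estimates(estimates, factor):
    """Step 3 (interpolation case): hold each estimate for `factor` symbols."""
    return [h for h in estimates for _ in range(factor)]

# Noise-free toy check with a constant channel (all values hypothetical).
h_true = 0.6 - 0.3j
pattern = [1 + 0j, 1j, -1 + 0j, -1j]              # unit-modulus stand-in code
scrambling = [pattern[n % 4] for n in range(4 * 256)]
rx = [h_true * (1 + 1j) * c for c in scrambling]  # pilot (1+j) through the channel
est = ml_channel_estimates(rx, scrambling)
held = repeat_estimates(est, 2)    # e.g. DPCH at twice the CPICH symbol rate
print(len(est), len(held))
```

In the noise-free case every estimate equals the channel value, and the repeater simply holds each estimate for the required number of DPCH symbols.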



Figure 19. Channel Estimate Using CPICH (Single Rake Finger).

The MA filtering is defined in Equation 5-11.

$$\hat{h}_{j}(n) = \sum_{k=0}^{M-1} q(k) \widetilde{h}_{j}(n-k)$$
 Equation 5-11

The filter coefficients q(k) are given by 1/M. The final channel estimate is  $\hat{h}(n)$ , while  $\tilde{h}(n)$  represents the noisy estimate. To perform MA filtering all that needs to be done is add the newest scaled sample to the running sum and to remove the oldest scaled sample from it. This can be represented as shown in Equation 5-12.

$$\hat{h}(n) = \hat{h}(n-1) + \frac{1}{M} \left( \tilde{h}(n) - \tilde{h}(n-M) \right)$$
 Equation 5-12

Thus, it reduces the number of computational cycles (fewer MAC operations and memory reads than in conventional filtering). However, computing the running sum for the first time in a frame requires normal filtering operations. Since MA filtering is a simple accumulation with a scaling at the end, data overflow is a critical issue in a fixed-point DSP and needs to be taken care of. Typically, Q-15 format (16-bit representation) is used to represent the CPICH data stream. In case of data overflow, the data needs to be scaled down to a lower Q-format (say, Q-13); the choice depends on parameters such as the number of filter coefficients (M), the signal strength, and the spreading factor (N).
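The equivalence between the direct form (Equation 5-11) and the running-sum form (Equation 5-12) can be checked numerically. The sketch below is a floating-point illustration with hypothetical function names and test data. For simplicity it treats samples before the start of the sequence as zero, whereas the text computes the first running sum of a frame by normal filtering.

```python
def ma_direct(noisy, M):
    """Equation 5-11 with q(k) = 1/M; samples before n = 0 are taken as zero."""
    out = []
    for n in range(len(noisy)):
        acc = 0j
        for k in range(M):
            if n - k >= 0:
                acc += noisy[n - k]
        out.append(acc / M)
    return out

def ma_running(noisy, M):
    """Equation 5-12: previous output plus scaled (newest - oldest) sample."""
    out = []
    prev = 0j
    for n in range(len(noisy)):
        oldest = noisy[n - M] if n - M >= 0 else 0j
        prev = prev + (noisy[n] - oldest) / M
        out.append(prev)
    return out

# Arbitrary complex test sequence standing in for noisy ML estimates.
noisy = [complex(1.0 + 0.1 * ((-1) ** n), 0.5 - 0.05 * n) for n in range(20)]
direct = ma_direct(noisy, 4)
running = ma_running(noisy, 4)
max_diff = max(abs(a - b) for a, b in zip(direct, running))
print(max_diff)
```

The two forms agree up to floating-point rounding, while the running-sum form needs one addition and one subtraction per output instead of M accumulations.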

The implementation results for the channel estimation algorithm are shown in Table 3. For the DPCH case it is assumed that there are two pilot symbols per slot. The number of taps in the MA filter (described above) is assumed to be 4.

| Functional Unit            | Cycles/Frame | Code Size (Bytes) |
|----------------------------|--------------|-------------------|
| Channel Estimation (CPICH) | 6612         | 270               |
| Channel Estimation (DPCH)  | 432          | 248               |

Table 3. Implementation Results for the Channel Estimator Algorithm.

The effect of the number of taps in the MA filter of the estimation algorithm is studied in detail. The study has two parts: first, the effect of the tap length for different mobile speeds (Doppler frequencies), and second, the effect of the tap length for the two estimation approaches (DPCH and CPICH channels). First consider the case when the Doppler frequency is 10 Hz, which corresponds to an indoor environment (3.37 miles per hour). The simulation results for tap lengths W=2, 8, and 16 for the DPCH channel are shown in Figure 20, Figure 21, and Figure 22. The simulation was performed for a QPSK system with STTD encoding. The multi-path channel has two paths (L=2). The system is loaded to a capacity of 25 percent (N=128, K=32). As expected, for the slow fading channel the performance of the estimation algorithm improves with an increase in the tap length for the DPCH channel. Since the channel parameters change slowly, a larger tap length averages out the noise more effectively. Similar results are obtained for the CPICH channel, as shown in Figure 23, Figure 24, and Figure 25. It can also be observed that for the DPCH channel estimation algorithm, an increase in tap length increases the phase lag produced. However, for the CPICH channel estimation algorithm, an increase in the tap length results in better estimation of the channel (phase lag is not a significant issue).

Now consider the case when the Doppler frequency is 200 Hz, which corresponds to a vehicular environment with a speed of 67.4 miles per hour. The simulation was performed for a QPSK system with STTD encoding. The multi-path channel has two paths (L=2). The system is loaded to a capacity of 25 percent (N=128, K=32). The simulation results for tap lengths W=2 and 8 for the DPCH channel are shown in Figure 26 and Figure 27. For a fast fading channel, a smaller tap length produces better results. On the other hand, for the CPICH case (Figure 28 and Figure 29), an increase in tap length produces better estimation results: the channel estimates produced with tap length 8 are better than those produced with tap length 2. However, for higher tap lengths (W=16), the performance degrades. The overall observation is presented in Table 4 below.

| Estimation Algorithm | Doppler(10Hz)        | Doppler(200Hz) |
|----------------------|----------------------|----------------|
| DPCH Channel         | High (W=16, 32)      | Low (W=2)      |
| CPICH Channel        | Very High (W=32, 64) | Medium (W=8)   |

Table 4. Preferred Tap Length for Channel Estimation Algorithm.



Figure 20. DPCH Channel Estimation for (W=2) Slow Fading.



Figure 21. DPCH Channel Estimation for (W=8) Slow Fading.



Figure 22. DPCH Channel Estimation for (W=16) Slow Fading.



Figure 23. CPICH Channel Estimation for (W=2) Slow Fading.



Figure 24. CPICH Channel Estimation for (W=8) Slow Fading.



Figure 25. CPICH Channel Estimation for (W=16) Slow Fading.



Figure 26. DPCH Channel Estimation for (W=2) Fast Fading.



Figure 27. DPCH Channel Estimation for (W=8) Fast Fading.



Figure 28. CPICH Channel Estimation for (W=2) Fast Fading.



Figure 29. CPICH Channel Estimation for (W=8) Fast Fading.

#### 5.3.3 MRC Rake Combining

A Rake receiver uses Maximal Ratio Combining (MRC) to compensate for the effect of multi-path fading (Section 2.2.2). The combining can be done either at chip level or at symbol level. In chip-level combining, the multi-path signals are combined before de-spreading and de-scrambling, while in symbol-level combining, the chip sequences are first de-spread and de-scrambled and then combined. Though the performance of both schemes is practically the same, symbol-level combining is used for implementation and simulation purposes, as it requires fewer computational cycles. Figure 30 shows the block diagram for symbol-level combining.

Figure 30. Symbol Level MRC Combiner.

#### 5.3.3.1 Algorithm on SC140 Core

As mentioned in the previous section, symbol-level combining is computationally more efficient than chip-level combining. The MRC operation in the chip-level case requires 38400 complex multiplications per frame, per path, per Rake, while for the symbol-level case this requirement is reduced to (38400/*SF*). Here, *SF* refers to the system spreading factor. The de-spreading and de-scrambling operation, on the other hand, requires 38400 complex multiplications in both cases. Thus, the computational complexity of the MRC operation in the symbol-rate case is lower by a factor of *SF*. Moreover, the symbol-rate case is also more efficient in terms of memory utilization. In chip-rate combining, each path is stored at chip rate, which amounts to 38400 samples per Rake, while for symbol-level combining the total number of samples stored per Rake is (38400/*SF*)\**L*. Here, *L* refers to the number of paths in the multi-path fading channel. Since *SF* is usually much larger than *L*, symbol-rate combining proves to be more efficient in terms of memory usage than chip-rate combining. Table 5 shows the DSP implementation results for symbol-rate combining.
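To make the comparison concrete, the counts quoted above can be evaluated for a specific configuration. The sketch below only restates the per-frame figures from the paragraph; the function name `mrc_cost` and the dictionary keys are hypothetical.

```python
CHIPS_PER_FRAME = 38400

def mrc_cost(sf, num_paths):
    """Per-frame, per-Rake counts quoted in the text for the two MRC options."""
    return {
        "chip_level_mults_per_path": CHIPS_PER_FRAME,
        "symbol_level_mults_per_path": CHIPS_PER_FRAME // sf,
        "chip_level_storage": CHIPS_PER_FRAME,
        "symbol_level_storage": (CHIPS_PER_FRAME // sf) * num_paths,
    }

cost = mrc_cost(sf=128, num_paths=2)
print(cost)
```

For SF = 128 and L = 2, symbol-level combining needs 300 complex multiplications per path and 600 stored samples per Rake, against 38400 of each at chip level.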

The following optimization techniques are used in the efficient DSP implementation of the de-spreading and the MRC:

- 1. Loop unrolling: The inner loop in both cases corresponds to the L paths in the fading channel. This loop is unrolled L times to avoid loop initialization and branching overhead.
- 2. Scheduling: There are four DALUs in the SC140 core, and each combining operation requires two of them, thus allowing two combining operations to proceed concurrently.

| Functional Unit               | Cycles/Frame | Code Size (Bytes) |
|-------------------------------|--------------|-------------------|
| De-spreading & de-scrambling  | 81000        | 118               |
| MRC combining                 | 2177         | 156               |

Table 5. Implementation Results for Each MRC Rake Combiner.

#### 5.3.4 Channel Coding

Error control coding or channel coding incorporates information into the signal that allows a receiver to detect and correct bit errors occurring in transmission.

The channel coding schemes used in the W-CDMA standard are as follows [25]:

- 1. Turbo coding: The Turbo coder is a Parallel Concatenated Convolutional Code (PCCC) with two 8-state constituent encoders and one Turbo code internal interleaver. The coding rate of the Turbo coder is 1/3.
- 2. Convolutional coding: Convolutional codes with constraint length 9 and coding rates 1/3 and 1/2 are defined in the standard. In this simulation we use the rate 1/2 convolutional encoder, whose configuration is given in Figure 31. A typical convolutional encoder consists of a K-stage shift register (K being the constraint length) and one or more modulo-2 adders. The receiver employs Viterbi decoding. Each output symbol is a function of the current input symbol and the K-1 previous input symbols. The connections between the shift register and the modulo-2 adders are described by generator polynomials. The generator polynomials used in the W-CDMA downlink rate 1/2 encoder are given in Equation 5-13 and Equation 5-14.

$$G_0 = 1 + D^4 + D^5 + D^6 + D^8$$
 Equation 5-13

$$G_1 = 1 + D + D^3 + D^5 + D^6 + D^7 + D^8$$
 Equation 5-14

Here, D refers to one clock delay.
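As an illustration of how the generator polynomials define the encoder, the sketch below implements a rate 1/2 shift-register encoder whose taps are read directly from Equation 5-13 and Equation 5-14. The function name is hypothetical, and the encoder is assumed to start in the all-zero state with no tail bits appended.

```python
G0_TAPS = (0, 4, 5, 6, 8)            # exponents of D in Equation 5-13
G1_TAPS = (0, 1, 3, 5, 6, 7, 8)      # exponents of D in Equation 5-14

def conv_encode(bits, taps_list=(G0_TAPS, G1_TAPS), constraint_length=9):
    """Rate 1/2 encoder: one output bit per generator for every input bit."""
    history = [0] * constraint_length       # history[d] = input delayed by d clocks
    out = []
    for b in bits:
        history = [b] + history[:-1]        # shift the new bit in
        for taps in taps_list:
            parity = 0
            for d in taps:
                parity ^= history[d]        # modulo-2 addition
            out.append(parity)
    return out

impulse = [1] + [0] * 8
coded = conv_encode(impulse)
g0_stream = coded[0::2]    # output of the G0 branch
g1_stream = coded[1::2]    # output of the G1 branch
print(g0_stream, g1_stream)
```

Feeding a single 1 followed by zeros returns the tap patterns themselves on the two output branches, which is a quick way to check that the connections match the polynomials.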



Figure 31. Rate 1/2 Convolutional Encoder Structure.

#### 5.3.4.1 Viterbi Algorithm on SC140 Core

The SC140 core provides special instructions that greatly enhance the implementation and performance of the Viterbi decoding algorithm. Details of these specialized instructions are available in references [17] and [18]. Various optimization techniques are applied in computing the branch metric and in the Viterbi decoder kernel; these are described in the Star\*Core application note [26]. The implementation results are given in Table 6 (for a 40-stage Viterbi decoder).

Table 6. Implementation results of the Viterbi decoder.

| Functional Unit | Cycles/Frame | Code Size (Bytes) |
|-----------------|--------------|-------------------|
| Viterbi Decoder | 5399         | 5120              |

## 5.4 Overall Implementation Results

In the previous section (5.3), the implementation issues and results were introduced and analyzed for some of the major functional blocks. The simulation results for the concerned W-CDMA system have also been introduced in Section 4. This section deals with the overall implementation results of the STTD encoder and decoder.

## 5.4.1 STTD Encoder on DSP

Since the DSP cannot handle complex arithmetic directly, the real and imaginary parts are computed separately for all complex computations. To start with, the sampled (M times higher than the chip rate) input symbols and the pulse-shaping filter (RRC, $\alpha$ = 0.22) coefficients are stored in the data memory of the processor. Since the MAC unit can perform 16-bit by 16-bit multiplication, the Q-format used for fractional representation is Q-15. In this format, the largest positive fractional number that can be represented is 1-$2^{-15}$, while the most negative fraction is -1. However, when dealing with fixed-point processors, one needs to be concerned with the dynamic range of numbers, as it is much narrower than that of floating-point processors. Thus, if the numbers at any point in the algorithm can exceed the dynamic range, the overflow has to be anticipated and all the data scaled down accordingly (divided by a factor of $2^n$ for a suitable n) to a lower-precision Q-format (say, Q-13). This leads to precision loss and thus an overall degradation in performance. Various factors, such as the signal strength, the number of users (K), and the spreading factor (N), determine the degree of overflow in the encoder algorithm.
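The Q-15 range and the cost of scaling down to Q-13 can be illustrated with ordinary integers. The helper names below are hypothetical and the sketch only models the number format; it does not reproduce the SC140 data path or its saturation logic.

```python
Q15_MAX = 2 ** 15 - 1     #  32767 represents 1 - 2^-15
Q15_MIN = -(2 ** 15)      # -32768 represents -1

def float_to_q15(x):
    """Quantise x to a 16-bit Q-15 integer, saturating outside [-1, 1 - 2^-15]."""
    value = int(round(x * 2 ** 15))
    return max(Q15_MIN, min(Q15_MAX, value))

def q15_to_float(q):
    return q / 2 ** 15

def q15_to_q13(q):
    """Scale down by 2^2: two bits of headroom gained, two bits of precision lost."""
    return q >> 2          # arithmetic right shift on Python integers

largest = q15_to_float(float_to_q15(1.0))     # saturates at 1 - 2^-15
smallest = q15_to_float(float_to_q15(-1.0))   # exactly -1
x = float_to_q15(0.3)
precision_loss = abs(q15_to_float(x) - q15_to_q13(x) / 2 ** 13)
print(largest, smallest, precision_loss)
```

After the shift, the same 16-bit word can hold values up to roughly four times larger, at the price of a quantisation step that is four times coarser.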

The first stage in the design of the encoder is mapping the QPSK modulated symbols into space-time codes on the two transmit antennas. This is followed by spreading, scrambling, and finally filtering with the pulse-shaping filter. The implementation issues associated with the generation of scrambling code sequences in the DSP are introduced in Section 5.3.1.2.2. Details of the transmitter (Section 3.2.1) and the STTD encoder (Section 3.2.1.1) structure have been introduced and explained in earlier sections.

The  $n^{th}$  chip output after spreading and scrambling for a single user in the  $j^{th}$  antenna is given in Equation 5-15. Thus, the generation of a chip sequence requires a single complex multiplication.

$$b_n^{(j)} = \alpha_{\lfloor \frac{n}{N} \rfloor}^{(j)} w_n$$
  $j = 1,2$  Equation 5-15

Here,  $\alpha$  represents the bit,  $w_n$  is the spreading and scrambling sequence for the  $n^{th}$  chip, and N is the spreading factor.
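Equation 5-15 amounts to one multiplication per chip, as the short sketch below shows. The function name and the code values are hypothetical; a real-valued stand-in code is used for brevity, whereas the combined spreading and scrambling sequence is complex in general.

```python
def spread_and_scramble(symbols, w, N):
    """Equation 5-15: chip n is symbol floor(n / N) times code chip w[n]."""
    return [symbols[n // N] * w[n] for n in range(len(symbols) * N)]

symbols = [1 + 1j, -1 - 1j]              # two QPSK symbols on one antenna
w = [1, -1, 1, -1, 1, 1, -1, -1]         # hypothetical combined code, N = 4
chips = spread_and_scramble(symbols, w, N=4)
print(chips)
```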

The filter coefficients are sampled M times faster than the chip rate. Thus, the chip sequence is first up-sampled by zero padding (inserting M-1 zeros after each chip) and then filtered. The outputs of the filtering operation in the I and Q channels are given in Equation 5-16 and Equation 5-17.

$$y_n^{(j,I)} = \sum_{t=1}^{T} b_{n-t}^{(j,I)} \cdot h_t \ j = 1,2 \ \text{(I-Channel)}$$
Equation 5-16

$$y_n^{(j,Q)} = \sum_{t=1}^{T} b_{n-t}^{(j,Q)} \cdot h_t \ j = 1,2 \text{ (Q-Channel)}$$
 Equation 5-17

Thus, each complex filtered output takes 2T real multiplications, where T is the number of filter coefficients. The cycle counts required for each of these blocks as a function of the spreading factor (N), the number of filter taps (T), and the frame length ($N_0$, the number of chips per frame) are given in Table 7.
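The up-sampling and filtering steps of Equation 5-16 and Equation 5-17 can be sketched as follows. The function names are hypothetical, and the two taps are placeholders chosen for easy checking; they are not an RRC design.

```python
def upsample(chips, M):
    """Insert M - 1 zeros after every chip."""
    out = []
    for c in chips:
        out.append(c)
        out.extend([0j] * (M - 1))
    return out

def fir(x, h):
    """y[n] = sum over t = 1..T of x[n - t] * h_t, with x zero outside its range."""
    T = len(h)
    y = []
    for n in range(len(x) + T):
        acc = 0.0
        for t in range(1, T + 1):
            if 0 <= n - t < len(x):
                acc += x[n - t] * h[t - 1]
        y.append(acc)
    return y

taps = [1.0, 0.5]                              # hypothetical taps, not an RRC design
samples = upsample([1 + 2j], M=2)
y_i = fir([s.real for s in samples], taps)     # Equation 5-16 (I channel)
y_q = fir([s.imag for s in samples], taps)     # Equation 5-17 (Q channel)
print(y_i, y_q)
```

Filtering the I and Q parts separately with T real taps each is what gives the 2T real multiplications per complex output.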

To get an estimate of the overall chip rate supported by this encoder implementation, let us plug in some standard parameter values. Let N = 128, T = 31 and $N_0$ (number of chips per frame) = 38400. The resultant chip rate turns out to be approximately 13 Mcps (per user) per SC140 core, assuming a clock rate of 300 MHz. However, a commercially available DSP using the SC140 core, such as the Motorola MSC8102, has 4 cores, which would produce a chip rate of 52 Mcps (13\*4). For T=17, the chip rate supported would be 76 Mcps (19\*4).

| Functional Block       | Cycles per Frame (Expression)           | Cycles per Frame* | Chip Rate/Core (Mcps) |
|------------------------|-----------------------------------------|--------|-----------------------|
| STTD Encoder           | 2N <sub>0</sub> /N                      | 600    | 19200                 |
| Spreading & Scrambling | 4+6N <sub>0</sub> /N+3N <sub>0</sub> /2 | 59404  | 193                   |
| Filtering (RRC)        | $(11T+T^2)+N_0(5+T/2)$                  | 788502 | 14                    |

Table 7. Cycle Count of the Encoder Implementation in DSP.

\*Assuming N=128 and T=31.
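The entries of Table 7 and the overall encoder chip rate quoted above can be reproduced from the cycle-count expressions. In the sketch below the function names are hypothetical, and the chip rate is taken as the number of chips per frame multiplied by the number of frames the core can process per second at the assumed 300 MHz clock.

```python
CLOCK_HZ = 300e6          # clock rate assumed in the text
N0 = 38400                # chips per 10 ms frame

def encoder_cycles(N, T):
    """Cycles per frame for each encoder block, using the Table 7 expressions."""
    return {
        "sttd": 2 * N0 / N,
        "spread": 4 + 6 * N0 / N + 3 * N0 / 2,
        "filter": (11 * T + T * T) + N0 * (5 + T / 2),
    }

def chip_rate_mcps(cycles_per_frame):
    """Chips per frame times frames processed per second, in Mcps."""
    return N0 * CLOCK_HZ / cycles_per_frame / 1e6

cycles = encoder_cycles(N=128, T=31)
overall = chip_rate_mcps(sum(cycles.values()))
overall_t17 = chip_rate_mcps(sum(encoder_cycles(N=128, T=17).values()))
print(cycles, overall, overall_t17)
```

The filtering block dominates the total, which is why shortening the filter from 31 to 17 taps raises the supported rate from roughly 13 to roughly 19 Mcps per core.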

## 5.4.2 STTD Decoder on DSP

The receiver structure was introduced in Section 3.2.3. To start with, the received analog signal is sampled at M times the chip rate (the filter sampling rate is M times the chip rate) and the digitized values are stored in the data memory of the processor. Since symbol-level combining is selected for the MRC operation (shown to be more efficient in Section 5.3.3.1), we perform the de-spreading and de-scrambling first and then the MRC combining. Generation of the scrambling code (Section 5.3.1.2) and the channel estimation algorithm (Section 5.3.2) have already been introduced in the earlier sections.

Since section 5.3 deals with the implementation details of the major functional blocks in decoder design, this section is dedicated to the analysis of its overall performance.

The complexity of the filtering operation, whose output is at chip rate, is of the order $N_0$\*T. Here, $N_0$ is the number of chips per frame and T is the number of taps in the RRC filter. After filtering, the output needs to be down-sampled by a factor of M. To avoid the computational overhead of computing all the filtered outputs and then down-sampling, only every $M^{th}$ filtered output is computed.

Generation of one de-spread and de-scrambled symbol (per Rake) involves N complex multiplications and N-1 complex additions. Each complex multiplication involves four real multiplications, and since the SC140 core supports four DALU operations per cycle, it can be performed in one cycle. Since de-spreading is a sum-of-products operation, four MAC (multiply-accumulate) instructions per cycle suffice for the task. Thus, a de-spreading operation with N=128 takes 128 execution cycles plus some additional initialization overhead. For each symbol, the number of such de-spreading operations is directly related to the number of paths in the channel, L. Thus, it has an effective complexity of the order $N_0$\*L.

Likewise, the complexity of the Rake combiner will be of the order ($N_0$\*L/N). The implementation details of the Rake MRC combining have been introduced in Section 5.3.3.1. The combined effect of de-spreading and the Rake for each symbol output is given in Equation 5-18 and Equation 5-19.

$$O_{k,m}^{1} = \sum_{l=0}^{L-1} \left[ \sum_{n=0}^{N-1} r_{m}(n+l) * w_{k,m}^{H}(n) \right] * h_{m}^{1,H}(l) \qquad \text{(For Rake 1)}$$
 Equation 5-18

$$O_{k,m}^{2} = \sum_{l=0}^{L-1} \left[ \sum_{n=0}^{N-1} r_{m}^{H}(n+l) * w_{k,m}(n) \right] * h_{m}^{2}(l) \qquad \text{(For Rake 2)}$$
 Equation 5-19

Here,  $O_{k,m}^1$  represents the  $m^{th}$  bit of the  $k^{th}$  user in the first Rake combiner. In the equation above, w is the combined de-spreading and de-scrambling code and h is the estimated channel vector.

The outputs of the two Rake receivers are then combined to give the resultant symbols (Equation 5-20 and Equation 5-21).

$$Z_{k,m} = O_{k,m}^1 + O_{k,m+1}^2, \quad \forall\, m = 0, 2, 4, \ldots \quad \text{(Even Symbols)}$$
 Equation 5-20

$$Z_{k,m+1} = O_{k,m+1}^1 - O_{k,m}^2, \quad \forall\, m = 0, 2, 4, \ldots \quad \text{(Odd Symbols)}$$
 Equation 5-21

Here,  $Z_{k,m}$  represents the  $m^{th}$  symbol of the  $k^{th}$  user. The overall performance of the decoder is given in Table 8.
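Equations 5-18 through 5-21 can be exercised on a noise-free toy case. The sketch below is a floating-point illustration, not the SC140 code. It assumes the STTD mapping in which antenna 1 sends (s0, s1) and antenna 2 sends (-s1\*, s0\*), a single path per antenna, the same unit-modulus code for both symbols, and perfect channel knowledge; all names and values are hypothetical.

```python
def rake_outputs(r, w, h1, h2, m):
    """Equations 5-18 and 5-19 for symbol m; r needs (m + 1) * N + L - 1 chips."""
    N, L = len(w), len(h1)
    o1 = o2 = 0j
    for l in range(L):
        despread = sum(r[m * N + n + l] * w[n].conjugate() for n in range(N))
        o1 += despread * h1[l].conjugate()     # Rake 1
        o2 += despread.conjugate() * h2[l]     # Rake 2: conj(r) * w = conj(r * conj(w))
    return o1, o2

# Noise-free single-path toy case (all values hypothetical).
s = [1 + 1j, 1 - 1j]                           # QPSK symbol pair
h1, h2 = [0.8 + 0.2j], [-0.3 + 0.5j]           # one tap per transmit antenna
w = [1 + 0j, -1 + 0j, 1j, -1j]                 # unit-modulus code, N = 4
ant1 = [s[0], s[1]]                            # assumed STTD mapping, antenna 1
ant2 = [-s[1].conjugate(), s[0].conjugate()]   # assumed STTD mapping, antenna 2
r = [(h1[0] * ant1[m] + h2[0] * ant2[m]) * c for m in range(2) for c in w]

o1_0, o2_0 = rake_outputs(r, w, h1, h2, 0)
o1_1, o2_1 = rake_outputs(r, w, h1, h2, 1)
z_even = o1_0 + o2_1                           # Equation 5-20
z_odd = o1_1 - o2_0                            # Equation 5-21
gain = len(w) * (abs(h1[0]) ** 2 + abs(h2[0]) ** 2)
print(z_even / gain, z_odd / gain)
```

Under these assumptions the cross terms cancel and each combined output equals the transmitted symbol scaled by N(|h1|² + |h2|²), which is the diversity gain the two-Rake structure is meant to deliver.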

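| Functional Block          | Cycles per Frame (Expression) | Cycles per Frame* | Chip Rate/Core (Mcps) |
|---------------------------|-------------------------------|-------------------|-----------------------|
| RRC filter                | $N_0(3+T)$                    | 1305600           | 8.8                   |
| De-spreading              | $N_0/N(L(N+6)+2)$             | 81000             | 142                   |
| Rake Combiner             | $N_0/N(L+21/4)+2$             | 2177              | 5291                  |
| Channel Estimator (DPCH)  | $L(214+W/2)$                  | 432               | 26667                 |
| Channel Estimator (CPICH) | $L(W/2+11N_0/N)$              | 6612              | 1742                  |
| Viterbi Decoder           | $25+S+S(TB/2)$                | 5399              | 2113                  |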

Table 8. Cycle Count of the Decoder Implementation in DSP.

\*Assuming N=128, L=2, W=4, T=31, K=9, and TB=40

Here, K is the constraint length of the convolutional encoder (K=9 for W-CDMA), while S is the number of states in the Viterbi decoder (S=$2^{K-1}$=256). TB is the trace-back length of the Viterbi trellis and has a value of around 4 or 5 times the constraint length. To get an estimate of the chip rate supported by each of the functional blocks in the decoder, we plug in some standard values for the different parameters in the above expressions. Let N=128, T=31, TB=40, and N<sub>0</sub>=38400 chips per frame. The channel estimation results presented in Table 8 above are for both the DPCH and the CPICH case. For the DPCH case each slot has two pilot symbols. The number of filter taps in the MA filter is assumed to be 4. The effect of the spreading factor and the number of paths in the multi-path channel model on the overall chip rate of the decoder implementation is presented in Table 9 below.

Table 9. Effect of Spreading Factor and Number of Paths in Channel on Chip Rate (Mcps).

| L (rows) / N (columns) | 8      | 32     | 128    | 512    |
|-----|--------|--------|--------|--------|
| 3   | 4.1272 | 4.2832 | 4.3241 | 4.3344 |
| 7   | 3.7401 | 4.0033 | 4.0749 | 4.0933 |
| 11  | 3.4194 | 3.7577 | 3.8529 | 3.8775 |

\* T=63, W=16 (DPCH channel), TB =40.



Figure 32. Comparison of Fixed-Point and Floating-Point Performance.

Thus, it can be observed that the overall supported chip rate by the decoder implementation is around 4 Mcps per SC140 core.
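The entries of Table 9 can be reproduced by summing the Table 8 cycle-count expressions and converting the total to a chip rate, as the sketch below does. The function names are hypothetical, and the 300 MHz clock is the value assumed earlier for the encoder. The Viterbi expression evaluates to 5401 cycles here rather than the tabulated 5399, which does not change the results at four decimal places.

```python
CLOCK_HZ = 300e6
N0 = 38400

def decoder_cycles(N, L, T=63, W=16, K=9, TB=40):
    """Total cycles per frame from the Table 8 expressions (DPCH estimator)."""
    S = 2 ** (K - 1)
    rrc = N0 * (3 + T)
    despread = N0 / N * (L * (N + 6) + 2)
    rake = N0 / N * (L + 21 / 4) + 2
    estimator = L * (214 + W / 2)
    viterbi = 25 + S + S * (TB / 2)
    return rrc + despread + rake + estimator + viterbi

def decoder_chip_rate_mcps(N, L):
    """Chips per frame times frames processed per second, in Mcps."""
    return N0 * CLOCK_HZ / decoder_cycles(N, L) / 1e6

for L in (3, 7, 11):
    print(L, [round(decoder_chip_rate_mcps(N, L), 4) for N in (8, 32, 128, 512)])
```

With T=63 the RRC filter alone accounts for about 2.5 million cycles per frame, so it sets the ceiling near 4.5 Mcps; the spreading factor and the number of paths only move the result between roughly 3.4 and 4.3 Mcps.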

Figure 32 compares the performance of the SC140 fixed-point DSP implementation against the floating-point software simulation results (using 64-bit precision). As can be observed, there is no significant performance degradation using the fixed-point DSP core.

# **6 CONCLUSIONS AND FUTURE WORK**

#### 6.1 Conclusion

The performance of the 3GPP W-CDMA standard using STTD as the open-loop transmit diversity scheme was thoroughly studied and analyzed. The effect of various parameters, such as the modulation scheme, the extent of loading, the channel estimation algorithm, the presence of channel coding, and the presence of transmit diversity, on the overall system performance and capacity was simulated and analyzed.

The implementation issues of such a decoder on a fixed-point DSP were analyzed and its performance evaluated in terms of bit error rate and supported chip rate. The decoder implementation supports a high chip rate of around 4 Mcps per core (within the requirements of the specifications in the standard). Simulation results also show that the performance degradation using a fixed-point processor is not significant compared to the software simulation results. Thus, we can conclude that general-purpose DSPs such as the SC140 core are well suited for high data rate wireless applications like the 3GPP W-CDMA standard.

## 6.2 Future Work

Some factors were not taken into consideration in the implementation. The receiver was designed assuming perfect synchronization with the transmitter. The Rake receiver also requires a path searcher algorithm, which determines the number of fingers in the Rake along with the delay profile in each finger. These two functional units were not included in this implementation. Thus, future work would be to analyze the performance of the DSP implementation with synchronization and a path searcher algorithm taken into consideration. Under these considerations, there will be an additional cycle overhead as well as some performance degradation. However, these algorithms operate at symbol rate and thus will not have a significant effect on the overall cycle count.

Moreover, the performance of other optimum channel estimation algorithms can also be analyzed. Finally, implementing this algorithm on other similar processors (such as Texas Instruments' C6x) would provide a comparative analysis and help identify the best architecture for this implementation.

## REFERENCES

- [1] R. Prasad, "An overview of CDMA Evolution towards wideband CDMA," *IEEE Communications Survey*, vol. 1, no. 1, Fourth Quarter 1998.
- [2] A. Gatherer, T. Stetzler, M. McMahan, E. Auslander, "DSP based architectures for mobile communications: past, present and future," *IEEE Communications Magazine*, Jan. 2000.
- [3] P. D'Arcy and S. Beach, "Star\*Core SC140- A New DSP Architecture for Portable Devices", Star\*Core, Motorola, Atlanta, GA, Sept 1999.
- [4] A. Aziz, "Channel estimation for a WCDMA Rake receiver", Application Note, AN2253/D, Rev 1, Motorola, Atlanta, GA, Sept. 2002.
- [5] K. C. Gan, "Maximum ratio combining for WCDMA Rake receiver", Application Note, AN2251/D, Rev 1, Motorola, Atlanta, GA, Feb. 2002.
- [6] K. C. Gan, "Path searcher for a WCDMA Rake receiver", Application Note, AN2252/D, Rev 1, Motorola, Atlanta, GA, Mar. 2002.
- [7] J. Meel, "Spread Spectrum (SS) An Introduction", I DE NAYER institute, Belgium, Sint Katelijne Waver, 1999.
- [8] T. S. Rappaport, *Wireless Communications Principles and Practice*, 2<sup>nd</sup> ed., Upper Saddle River, NJ, Prentice Hall, 2002.
- [9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications", *IEEE Journal on Selected Areas in Communications*, vol. 16, no. 8, pp.1451-1458, Oct. 1998.

- [10] V. Tarokh, N. Seshadri and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction", *IEEE Transactions on Information Theory*, vol.44, no.2, pp.744-765, Mar. 1998.
- [11] A. G. Dabak, S. Hosur, T. Schmidl and C. Sengupta, "A comparison of the open loop transmit diversity schemes for third generation wireless systems", DSPS R&D Center, Texas Instruments, Dallas, TX, 2000.
- [12] M. Lenardi, A. Medles and D. T. M. Slock, "Comparison of downlink transmit diversity schemes for Rake and SINR maximizing receiver", *IEEE International Conference on Communications*, Mobile Communications Department- Institut Eurecom, Helsinki, Finland, Jun. 2001.
- [13] Dabak, S. Hosur, R. Negi, "Space time block coded transmit antenna diversity schemes for WCDMA", DSPS R&D Center, Texas Instruments, Dallas, TX, 1999.
- [14] J. G. Proakis, *Digital Communication*, 3<sup>rd</sup> ed., New York, McGraw-Hill, 1995.
- [15] G. Fock, J. Baltersee, P. S. Rittich and H. Meyr, "Exploring the UMTS WCDMA-Receiver Design Space Using a Semianalytical Approach", *IEEE Proceedings of the Global Telecommunications Conference GLOBECOM*, Rio de Janeiro, Brasil, Dec. 1999.
- [16] Z. Rozenshein, D. Halahmi, A. Mordoh, and Y. Ronen, "Speed and Code-Size Trade-off with the StarCore SC140", Application Note, AN1838/D, Rev 0, Motorola, Atlanta, GA, Feb. 2000.
- [17] SC140 DSP Core Reference Manual (MNSC 140CORE/D), Motorola, Atlanta, GA, 2001.
- [18] *SC100 Assembly Language Tools User's Manual* (MNSC100ALT/D), Motorola, Atlanta, GA, 2001.

- [19] SC100 C Compiler User's Manual (MNSC100CC/D), Motorola, Atlanta, GA, 2001.
- [20] SC100 Simulator Reference Manual (MNSC100CC/D), Motorola, Atlanta, GA, 2001.
- [21] Technical Specification Group Radio Access Network; "Spreading and Modulation (FDD), 3<sup>rd</sup> Generation Partnership Project"; 3G TS 25.213 V3.4.0, Dec. 2000.
- [22] P. Schultz-Rittich, J. Baltersee and G. Fock, "Channel estimation for DS-CDMA with Transmit Diversity over frequency selective channels", *Proceeding in Aachen Symposium on Signal Theory*, Aachen, Germany, 2001.
- [23] Technical Specification Group Radio Access Network, "Physical channels and mapping of transport channels onto physical channels (FDD)", 3<sup>rd</sup> Generation Partnership Project, 3G TS 25.211 V3.12.0, Sep. 2002.
- [24] K. A. Qaraqe and S. Roe, "Channel Estimation Algorithm for Third Generation W-CDMA Communication Systems", Cisco Systems and Tality Corporation, San Jose, CA, 2000.
- [25] Technical Specification Group Radio Access Network, "Multiplexing and channel coding (FDD)", 3<sup>rd</sup> Generation Partnership Project, 3G TS 25.212 V3.2.0, Mar. 2000.
- [26] Application Note, "How to implement a Viterbi Decoder on the StarCore SC140", ANSC140VIT/D, Alpha Release, Motorola and Lucent Technologies, Jul. 2000.

# VITA

| Name      | Kaushik Ghosh                                     |
|-----------|---------------------------------------------------|
| Address   | Ghosh & Co. Hill Cart Road                        |
|           | Siliguri, West Bengal, India-734401               |
| Education | Master of Science (Aug 2000 - Present)            |
|           | Major - Electrical Engineering                    |
|           | Texas A&M University, College Station, TX-77843   |
|           | GPA - 4.0/4.0                                     |
|           |                                                   |
|           | Bachelor of Engineering (Aug 1995 to Jun 2000)    |
|           | Major - Electrical and Electronics Engineering.   |
|           | Birla Institute of Technology and Science (BITS), |
|           | Pilani, Rajasthan, India – 333031.                |
|           | GPA - 8.36/10                                     |
|           |                                                   |