Abstract
INTRODUCTION
In recent years, there has been explosive growth in the mobile communications industry. The increasing use of wireless voice and data applications demonstrates society's growing reliance on wireless networks. The challenge of delivering higher transmission rates is made harder by, among other factors, the effects of multipath fading.
In addition, the growing importance of data applications for accessing essential information anywhere will make it necessary to build user devices that work across multiple air interface standards and possibly multiple wireless platforms.
Orthogonal Frequency Division Multiplexing (OFDM) has fast become the standard modulation format for high data rate applications because of its robustness to multipath fading [1]. It is presently used in European digital broadcasting systems (DVB-T), as well as the latest WLAN standards and possibly the 802.16 standard for fixed wireless. OFDM takes a high-rate bit stream and divides it into multiple lower-rate bit streams to be transmitted in parallel. As a result, the lower-rate bit streams are more resistant to multipath effects. In addition, the bit streams are transmitted using sub-carriers that are orthogonal to each other. This allows the sub-carriers to be packed closer together, which makes very efficient use of the available spectrum. A block diagram of a typical OFDM system is shown in Figure 1.
Figure 1: OFDM Block diagram
The concept of a software-defined radio (SDR) consists of a simple analog system and a complex digital system [2]. The goal is to limit the analog functions to those that can only be performed with analog components such as the antenna, RF combination, and amplification. The idea is to use generic hardware in both the transmitter and receiver and allow the executing software to determine radio functions. The digital sub-system should be based on a platform that is highly reconfigurable and flexible in order to perform baseband processing that may easily adapt to different system parameters. SDRs present a great advantage because they allow for cost-effective designs with minimal hardware changes.
Signal processing technologies that are available for SDR baseband processing range from application-specific integrated circuits (ASICs) and reconfigurable hardware such as field-programmable gate arrays (FPGAs) to reprogrammable platforms such as digital signal processors (DSPs) and general-purpose processors (GPPs).
ASICs are fixed designs that cannot be altered to perform anything other than the originally intended application, and GPPs are not specialized and therefore not intended to run any one application extremely well. For these reasons, these two technologies are not generally considered as candidates for an SDR platform. FPGAs offer great processing performance in a non-fixed design but are not dynamically reconfigurable. DSPs make an interesting solution because they satisfy one of the most fundamental criteria of SDRs, which is easy reconfigurability and efficient programmability [3].
SOFTWARE IMPLEMENTATION
The following is a description of an OFDM baseband system implemented using a Texas Instruments (TI) DSP platform. A brief introduction to the software suite and hardware used is given first. The software structure is then described, detailing the modular code blocks that implement the system.
The software suite combines coding, debugging, compiling and profiling features in one application. The compiler accepts C source code and produces assembly language. To reduce overall execution time, the project can be built in one of four optimization modes. This relieves the designer of the tedious and error-prone tasks of assigning registers and attending to the pipeline structure [4]. Once the code is downloaded to the board and run, profiling can be used to target excessive latencies and improve code execution.
The board includes a C6201 fixed-point processor. It is capable of 1600 MIPS at 200 MHz (a 5 ns cycle time); however, the EVM supports a maximum clock rate of only 160 MHz. The processor can execute up to eight 32-bit instructions every cycle.
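The relation between the quoted figures can be checked with a short calculation. The sketch below (in Python, for brevity) assumes that peak MIPS is simply the instruction issue width multiplied by the clock rate; the function names are hypothetical and the 160 MHz result is a derived peak figure, not a measurement from the source.

```python
# Peak throughput sketch: MIPS = instructions per cycle x clock rate in MHz.
INSTRUCTIONS_PER_CYCLE = 8  # from the text: up to eight 32-bit instructions per cycle


def peak_mips(clock_mhz):
    """Peak million instructions per second at the given clock (ideal case)."""
    return INSTRUCTIONS_PER_CYCLE * clock_mhz


def cycle_time_ns(clock_mhz):
    """Duration of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


print(peak_mips(200), cycle_time_ns(200))  # rated device clock
print(peak_mips(160), cycle_time_ns(160))  # clock limit quoted for the EVM
```

Under this assumption the 160 MHz limit corresponds to a peak of 1280 MIPS, which is the budget the cycle counts reported later have to fit within.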
On-chip memory consists of 64 kB of IPRAM used as program memory to store code and 64 kB of IDRAM used as data memory to store variables. On-board memory consists of 256 kB of SBSRAM and 8 MB of
The OFDM system is constructed with modularity such that each module in the system represents a particular task. Each module consists of one or more functions that implement it. The entire system was implemented in software using the C language. The build options were chosen to compile for speed at the expense of code size. The modules of the system are executed sequentially such that the data is completely
Error control coding was performed by programming a half-rate convolutional encoder. The module takes as input a 128-bit frame from the generated data. The bits are stepped, one at a time, into an array shft_reg[ ], which represents a shift register of constraint length K=3. A constraint length of K=3 was chosen because of limitations on throughput performance and memory size. However, estimates for a constraint length of K=7 are given in Section 4.
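A minimal sketch of such a half-rate, K=3 encoder is given here in Python rather than C. The generator polynomials are not stated in the text, so the common (7, 5) octal pair is used purely as an assumption; the function name conv_encode is hypothetical, while shft_reg follows the array named above.

```python
K = 3               # constraint length, as stated in the text
G = (0b111, 0b101)  # ASSUMPTION: (7, 5) octal generators; the source does not give them


def conv_encode(bits):
    """Rate-1/2 convolutional encoder: two coded bits per input bit."""
    shft_reg = [0] * K  # shft_reg[0] holds the newest bit
    coded = []
    for b in bits:
        # Step the new bit into the register and drop the oldest one.
        shft_reg = [b] + shft_reg[:-1]
        for g in G:
            # XOR together the register cells selected by this generator.
            taps = [shft_reg[i] for i in range(K) if (g >> (K - 1 - i)) & 1]
            coded.append(sum(taps) % 2)
    return coded


frame = [1, 0, 0] + [0] * 125    # one 128-bit frame
coded_data = conv_encode(frame)  # 256 coded bits
```

With these assumed generators, a 128-bit frame produces 256 coded bits, which matches the 16 x 16 interleaver size used by the next module.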
The convolutional encoder structure is shown in Figure 3. An interleaver was created using the block matrix method by declaring a two-dimensional, square array interlv[n,n], where n is the square root of the size of coded_data[ ] (i.e. n = 16). Interleaving was performed by feeding 16 bits at a time into each column of interlv[n,n]. Once the array is filled, the data is read out one row at a time.
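The column-write, row-read procedure can be sketched as follows. This is an illustrative Python version, not the authors' C routine; the function name interleave is hypothetical, and interlv and coded_data follow the array names in the text.

```python
N = 16  # n = sqrt(256), as in the text


def interleave(coded_data):
    """Block interleaver: write column by column, read row by row."""
    assert len(coded_data) == N * N
    interlv = [[0] * N for _ in range(N)]
    # Feed 16 bits at a time into each column.
    for col in range(N):
        for row in range(N):
            interlv[row][col] = coded_data[col * N + row]
    # Read the filled array out one row at a time.
    return [interlv[row][col] for row in range(N) for col in range(N)]


demo = interleave(list(range(N * N)))
print(demo[:4])  # first entries of row 0, taken from four different columns
```

Bits that were adjacent in the coded stream end up 16 positions apart, so a burst of channel errors is spread across the frame before decoding.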
The QAM mapping module maps the interleaved sequence to complex QAM symbols using a look-up table (LUT). A Gray-encoded, 16-QAM constellation was chosen for this system. To map the bits to QAM symbols efficiently, the incoming bit stream is separated into 4-bit groups and the equivalent decimal value of each group is calculated. A pointer based on the decimal value is generated to point to the location in the LUT where the corresponding in-phase and quadrature values are stored. The in-phase and quadrature values have been scaled upward to maximize precision while still avoiding overflow in the following IFFT and FFT modules.
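The LUT-based mapping can be sketched as below. The exact bit-to-symbol labeling and the scale factor are not given in the text, so both are assumptions here: the first two bits select the in-phase level, the last two the quadrature level, each Gray-coded, and SCALE = 1024 is an arbitrary illustrative value. All names are hypothetical.

```python
# ASSUMPTION: per-axis Gray labeling of the four amplitude levels.
GRAY_LEVEL = {0b00: -3, 0b01: -1, 0b11: 1, 0b10: 3}
SCALE = 1024  # ASSUMPTION: illustrative fixed-point scale factor

# One (in-phase, quadrature) entry per 4-bit decimal value 0..15.
QAM_LUT = [(GRAY_LEVEL[i >> 2] * SCALE, GRAY_LEVEL[i & 0b11] * SCALE)
           for i in range(16)]


def qam_map(bits):
    """Map a bit list (length divisible by 4) to 16-QAM (I, Q) pairs."""
    symbols = []
    for k in range(0, len(bits), 4):
        b3, b2, b1, b0 = bits[k:k + 4]
        index = (b3 << 3) | (b2 << 2) | (b1 << 1) | b0  # decimal value of the group
        symbols.append(QAM_LUT[index])                  # single table look-up
    return symbols


symbols = qam_map([1, 0, 1, 1] + [0] * 252)  # 256 bits become 64 symbols
```

A 256-bit interleaved frame yields 64 symbols, one per input of the 64-point IFFT.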
Ultimately, the output values are scaled by the size of the IFFT (i.e. nx = 64) using the scale function.
Figure 4: Implementation of IFFT/FFT
The FFT module is the first module executed on the receiver side of the OFDM block diagram. It reverses the IFFT operation performed just before transmission. It is essentially the same as the IFFT module, as almost all the functions in the IFFT are called upon to perform the FFT. The difference is that the variable inverse is set to "0" before the call to the coeff function in order to generate the correct twiddle factors. In addition, scaling is not needed; therefore the scale function is not called.
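The shared structure of the two modules can be illustrated with a floating-point sketch. It is a textbook iterative radix-2 transform in Python, not the fixed-point routine used on the DSP; the names coeff, scale and inverse follow the text, while transform, fft and ifft are hypothetical helpers.

```python
import cmath


def coeff(nx, inverse):
    """Twiddle factors; only the sign of the exponent depends on `inverse`."""
    sign = 1 if inverse else -1
    return [cmath.exp(sign * 2j * cmath.pi * k / nx) for k in range(nx // 2)]


def transform(x, inverse):
    """Radix-2 transform shared by both directions (length must be a power of 2)."""
    nx = len(x)
    w = coeff(nx, inverse)
    x = list(x)
    # Reorder the input into bit-reversed order.
    j = 0
    for i in range(1, nx):
        bit = nx >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # Butterfly stages of growing size.
    size = 2
    while size <= nx:
        half, step = size // 2, nx // size
        for start in range(0, nx, size):
            for k in range(half):
                t = w[k * step] * x[start + k + half]
                u = x[start + k]
                x[start + k] = u + t
                x[start + k + half] = u - t
        size *= 2
    return x


def scale(x):
    """Divide by the transform size nx; used on the IFFT side only."""
    return [v / len(x) for v in x]


def ifft(x):
    return scale(transform(x, inverse=1))


def fft(x):
    return transform(x, inverse=0)  # no scaling on the receiver side
```

Because only coeff and the final call to scale differ, the same butterfly code serves both the transmitter and the receiver, which mirrors the code reuse described above.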
Two QAM demapping modules were tested. One was programmed for hard decision and the other for soft decision. Both functions take as input the complex QAM values recovered from the FFT module. However, because of impairments caused by the simulated Gaussian channel, the amplitudes of the QAM data have been affected and no longer equal the discrete values produced by the QAM mapping module. Therefore the original 16-QAM constellation is divided into decision regions. A received QAM value is evaluated to see which region it falls into and is then assigned the 4-bit group associated with that region. For soft decision, each region is further divided into soft regions, creating a layout as demonstrated in Figure 5. The algorithm determines the hard region in which the received value lies and then narrows it further to the soft region. The 4-bit group, in which each element is now a 3-bit soft value, is extracted and used as part of the retrieved bit stream.
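One possible way to realise hard and 3-bit soft decisions is sketched below. The region layout of Figure 5 is not recoverable from the text, so this Python sketch makes its own assumptions: the per-axis Gray labeling -3, -1, +1, +3 to 00, 01, 11, 10, a scale factor of 1024, and a uniform quantisation of the distance to each decision boundary into values 0 (confident 0) to 7 (confident 1). All names are hypothetical.

```python
import math

SCALE = 1024  # ASSUMPTION: illustrative fixed-point scale of the constellation


def _axis_metrics(v):
    """Signed distances to the two decision boundaries of one axis.

    First bit: boundary at 0. Second bit: boundaries at +/-2 (inner = 1).
    """
    x = v / SCALE
    return (x, 2 - abs(x))


def hard_demap(symbol):
    """Return the 4 hard bits (2 for I, 2 for Q) of one received (I, Q) pair."""
    bits = []
    for v in symbol:
        bits += [1 if m >= 0 else 0 for m in _axis_metrics(v)]
    return bits


def soft_demap(symbol):
    """Return four 3-bit soft values; the top bit of each equals the hard bit."""
    soft = []
    for v in symbol:
        soft += [min(7, max(0, math.floor((m + 1) * 4)))
                 for m in _axis_metrics(v)]
    return soft


print(hard_demap((3072, 1024)), soft_demap((3072, 1024)))    # noiseless point
print(hard_demap((100, -2000)), soft_demap((100, -2000)))    # near two boundaries
```

In this sketch, values close to a boundary map to soft levels near the middle of the 0 to 7 range, which lets the Viterbi decoder weight unreliable bits less heavily than confident ones.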
Figure 5: Example of soft decision region assignment
A deinterleaver was created by declaring a two-dimensional, square array deinterlv[n,n], which is the same size as interlv[n,n]. Deinterleaving was performed by feeding 16 bits at a time into each row of deinterlv[n,n]. Once the array is filled, the data is read out one column at a time.
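The row-write, column-read procedure can be sketched as follows, again as an illustrative Python version with a hypothetical function name. The final lines check that it undoes a column-write, row-read interleaver of the same size.

```python
N = 16  # same array size as the interleaver


def deinterleave(received):
    """Block deinterleaver: write row by row, read column by column."""
    assert len(received) == N * N
    deinterlv = [[0] * N for _ in range(N)]
    # Feed 16 values at a time into each row.
    for row in range(N):
        for col in range(N):
            deinterlv[row][col] = received[row * N + col]
    # Read the filled array out one column at a time.
    return [deinterlv[row][col] for col in range(N) for row in range(N)]


# Round trip against a column-write, row-read interleaving of 0..255.
data = list(range(N * N))
interleaved = [data[col * N + row] for row in range(N) for col in range(N)]
assert deinterleave(interleaved) == data
```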
The decoding module performs Viterbi decoding on the soft decision values fed from the deinterleaver. Three LUTs, input, next_state, and output, are created using two-dimensional arrays that contain information representing the convolutional encoder as a state diagram. The viterbi_decoder function operates on two encoded bits at a time. A trellis diagram, represented by the state_history array, is built by comparing the received sequence with all possible combinations using the next_state and output LUTs (see Figure 6). Only comparison results with the lowest branch metric are kept. The accumulated branch error values for each row are stored in the accum_err_metric array. Once state_history has grown to five times the constraint length (i.e. 5K), the traceback to retrieve the first decoded bit begins. The traceback portion of the algorithm chooses the row with the lowest accumulated error as the starting point on the state_history array. The most likely path is simultaneously copied to the state_sequence array, and the last two values are used as pointers on the input LUT to find the first decoded bit. The state_history array is also programmed as a cyclic register to speed up the algorithm.
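The add-compare-select and traceback steps can be illustrated with a simplified sketch. It differs from the decoder described above in several labeled ways: it uses hard-decision (Hamming) branch metrics instead of 3-bit soft values, it traces back once over the whole frame instead of using a 5K window with a cyclic buffer, and it assumes (7, 5) octal generators, which the text does not specify. The array names next_state, output, state_history and accum_err_metric follow the text; everything else is hypothetical.

```python
NUM_STATES = 4  # 2^(K-1) states for K = 3


def _step(state, bit):
    """One encoder transition for the ASSUMED (7, 5) generators."""
    s1, s0 = state >> 1, state & 1
    out = ((bit ^ s1 ^ s0) << 1) | (bit ^ s0)
    return (bit << 1) | s1, out


# LUTs describing the encoder as a state diagram, indexed [state][input bit].
next_state = [[_step(s, b)[0] for b in (0, 1)] for s in range(NUM_STATES)]
output = [[_step(s, b)[1] for b in (0, 1)] for s in range(NUM_STATES)]


def conv_encode(bits):
    """Reference encoder driven by the same LUTs (used for the demo below)."""
    state, coded = 0, []
    for b in bits:
        out = output[state][b]
        coded += [out >> 1, out & 1]
        state = next_state[state][b]
    return coded


def viterbi_decode(coded):
    """Hard-decision Viterbi decoder with a single traceback at the end."""
    inf = float("inf")
    accum_err_metric = [0] + [inf] * (NUM_STATES - 1)  # encoder starts in state 0
    state_history = []  # per step: survivor (previous state, input bit) per state
    for t in range(0, len(coded), 2):  # two encoded bits at a time
        rx = (coded[t] << 1) | coded[t + 1]
        new_metric = [inf] * NUM_STATES
        survivors = [None] * NUM_STATES
        for s in range(NUM_STATES):
            if accum_err_metric[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                ns = next_state[s][b]
                branch = bin(output[s][b] ^ rx).count("1")  # Hamming distance
                metric = accum_err_metric[s] + branch
                if metric < new_metric[ns]:  # keep only the best path into ns
                    new_metric[ns] = metric
                    survivors[ns] = (s, b)
        accum_err_metric = new_metric
        state_history.append(survivors)
    # Traceback from the state with the lowest accumulated error.
    state = min(range(NUM_STATES), key=lambda s: accum_err_metric[s])
    decoded = []
    for survivors in reversed(state_history):
        state, bit = survivors[state]
        decoded.append(bit)
    decoded.reverse()
    return decoded


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]
received = conv_encode(message)
received[5] ^= 1  # inject a single channel bit error
assert viterbi_decode(received) == message
```

A windowed traceback, as described in the text, trades a small loss in decoding reliability for bounded memory and latency; the full-frame traceback here is used only to keep the sketch short.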
Figure 6: Implementation of decoder algorithm
TESTING
The system was tested in a simulated AWGN channel in order to verify its correctness and accuracy. Three extra modules were created to test the system: bitgen, addnoise, and err_chk. Bitgen is responsible for generating random bits for input to the first block of the OFDM system. Addnoise simulates an AWGN channel by adding Gaussian noise to the transmitted signal. Err_chk keeps track of errors occurring in the system. This data is then used to plot the bit-error rate curves of Figure 7.
The rightmost curves of the BER plot of Figure 7 show a comparison between a floating-point 16-QAM system (circles) and the OFDM system tested without forward error correction (crosses). The curves are quite similar except for a 0.3 dB SNR loss occurring at a BER of 1x10~'. This deviation can be accounted for by the precision loss in fixed-point calculations. However, the plot still demonstrates that the proper use of scaling gives an adequate amount of system accuracy.
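The roles of the three test modules can be sketched as below. This is a hypothetical Python harness, not the authors' code: the function signatures, the SNR definition (average signal power over total complex noise power) and the antipodal placeholder mapping in the usage lines are all assumptions made for illustration.

```python
import math
import random


def bitgen(n, rng):
    """Generate n random bits."""
    return [rng.randint(0, 1) for _ in range(n)]


def addnoise(symbols, snr_db, rng):
    """Add complex white Gaussian noise at the requested SNR (in dB)."""
    power = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # standard deviation per real dimension
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for s in symbols]


def err_chk(sent, received):
    """Return (number of bit errors, bit-error rate)."""
    errors = sum(a != b for a, b in zip(sent, received))
    return errors, errors / len(sent)


# Usage with a placeholder antipodal mapping standing in for the OFDM chain.
rng = random.Random(1)
bits = bitgen(1000, rng)
tx = [complex(2 * b - 1, 0) for b in bits]
rx = addnoise(tx, snr_db=6.0, rng=rng)
detected = [1 if s.real > 0 else 0 for s in rx]
errors, ber = err_chk(bits, detected)
print(errors, ber)
```

Repeating the last block over a range of SNR values and plotting ber against SNR gives a curve of the kind shown in Figure 7.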
The system was also tested with the addition of a convolutional encoder and both hard and soft-decision Viterbi decoding. The hard decision plot (asterisks) shows a 4.9 dB gain over the floating-point 16-QAM system. The leftmost curve represents the BER of the system working with soft decision regions (diamonds). We observe an SNR gain of approximately 1.8 dB over the hard decision plot.
OPTIMIZATION AND PROFILE RESULTS
In addition to the efficient coding techniques described above, build options in the compiler as well as memory management were used to optimize the design. Loop unrolling and software pipelining were employed to speed up execution of the code. However, with this build option, the code size was too large to fit in the on-chip IPRAM. Memory management was used to carry over any less frequently used library objects to the external SBSRAM. This allows critical instructions to remain on-chip for quicker access by the CPU.
The profiler, which measures the cycle count of each function, was initially used to investigate any performance bottlenecks. This strategy was used to remove any latencies such as unnecessary calls to C library functions. Initial profile results for the OFDM system are shown without forward error correction (FEC) in Table 1. An increase from 330 kb/s to 2.15 Mb/s can be achieved using the build options set for speed. A further increase to 2.49 Mb/s could be achieved if the FFT module were programmed using a Decimation In Time (DIT) algorithm, which removes the need for bit reversal [6].
When forward error correction is introduced, the maximum bit rate drops to 231 kb/s, as shown in Table 2. This is largely due to the addition of the Viterbi decoder, which takes up by far the most cycles.
However, when using a faster processor running at 1.1 GHz, we can safely estimate a bit rate increase to 1.59 Mb/s. This is an underestimate given that the 1.1 GHz chip is based on the C64 architecture, which is designed with wider data paths and larger registers. Therefore, there would be a further speed-up over the C62 architecture for the same algorithm execution. The C64 DSP chip also has the option of using a Viterbi Coprocessor (VCP) to perform the task of Viterbi decoding. As such, the Viterbi decoding algorithm can be performed in parallel with other algorithms being performed by the CPU core. This frees up resources within the CPU core and speeds up the overall code execution. The cycle count can be reduced from 64387 cycles to 12924, as shown in Table 2. The cycle count can be estimated using the following benchmark calculation: cycle count = ((72 + 2)/6) * (F + K - 1), where K is a constraint length of 7 and F is a frame size of 256 encoded bits. The overall bit rate using the VCP can be estimated to be 4.94 Mb/s.
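The 1.59 Mb/s figure is consistent with scaling the measured bit rate in proportion to the clock rate. The short Python sketch below reproduces that arithmetic under the assumption that the 231 kb/s result was measured at the 160 MHz EVM clock and that the cycle count per frame is unchanged; the function name is hypothetical.

```python
def scaled_bit_rate(rate_kbps, f_old_mhz, f_new_mhz):
    """Clock-proportional estimate: same cycles per frame, faster clock."""
    return rate_kbps * f_new_mhz / f_old_mhz


# ASSUMPTION: 231 kb/s was measured at the 160 MHz EVM clock.
estimate_kbps = scaled_bit_rate(231, 160, 1100)
print(round(estimate_kbps / 1000, 2), "Mb/s")

# Reduction in Viterbi decoding cycles quoted in the text.
vcp_speedup = 64387 / 12924
print(round(vcp_speedup, 2), "x fewer cycles")
```

The estimate ignores architectural gains of the C64 over the C62, which is why the text describes it as an underestimate.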
The C62 and C64 architectures are designed for high performance, not for power efficiency. As such, they may consume too much power for battery-operated mobile devices and may be suitable only for wireless infrastructure applications such as base stations, where power and size are not generally limiting factors.
CONCLUSION
A real-time OFDM system was implemented based on a DSP platform. A DSP platform was used because it is highly reconfigurable and flexible, which satisfies one major criterion for SDRs. Profile results were given along with estimates for higher-performance processors, which are able to handle increasingly computationally intensive tasks. This capability is further enhanced by the potential to include highly programmable coprocessors to aid the CPU core. Given the above, it is believed that DSPs make a strong candidate for inclusion in SDR designs within wireless infrastructure applications.
