The primary goal of this project was educational: to demonstrate Software Defined Radio based prototyping using Visual C++ Express and Code Composer Studio. More specifically, an IEEE802.11a Phy [1] compliant baseband processor was written in C++ and a radio link was demonstrated "live" using a standard PC and the DSK6713 kit from Spectrum Digital [2] for baseband processing at the receiver and transmitter side respectively. To reduce costs without loss of educational value (the algorithms remain the same), the bandwidth was scaled down from 20 MHz to 6 kHz so that cheap narrowband COTS RF frontends operating at an intermediate frequency of only 12 kHz could be used at the transmitter and receiver sides. This was easily achieved by reducing the OFDM symbol rate by a suitable factor. The development process is described in detail, emphasizing development tricks that facilitate debugging of this kind of complex baseband processing. For educational purposes, some other simpler waveforms were implemented as well.
Introduction
For the last two decades or so, SDR (Software Defined Radio) has been the subject of tremendous research 1. Fueled by the enormous advancement in semiconductor technology, SDR is today at the core of established techniques like Cognitive Radio and DSA (Dynamic Spectrum Access) for more effective use of limited spectrum resources. From a stronghold within military applications, SDR technology is migrating into other application domains as well. There are many definitions of SDR; what they have in common is that the waveform is completely defined in software. A typical SDR architecture is depicted in Figure 1. Note that the up/down mixing can in general be either direct conversion or to/from some suitable intermediate frequency.
In this project we wanted to focus on the educational part of SDR prototyping and basic wireless communication concepts. Therefore, the focus has been on low cost and on writing the baseband processing software from scratch. Unless otherwise stated, we did not emphasize optimizing the code for reduced footprint. In a wideband and/or power constrained context this must of course be focused on. In summary, the prime motivating factors were:
• Gaining experience with implementing and debugging digital signal processing software using the free Visual C++ Express [3].
• Studying carrier and symbol timing recovery techniques applicable to IEEE802.11a/g.
• Getting a proper understanding of OFDM (Orthogonal Frequency Division Multiplexing) as well as improving skills with digital modulation and demodulation, filtering and pulse shaping.
There is a myriad of existing SDR development platforms that vary in cost and performance, from high-bandwidth systems requiring expensive development software to tiny, less flexible systems with modest capabilities. The systems may be classified along different dimensions, e.g. cost, bandwidth, processing capabilities, type of RF frontend, development software and flexibility. A detailed overview is considered beyond the scope of this paper. An interesting taxonomy may be found in [5]. Some examples of available systems for SDR prototyping in order of decreasing cost: is supported by this platform.
• GNU Radio [9] is often used together with the USRP device from Ettus Research [10]. This is a popular platform providing the base for several other systems as well [5].
However, the existing systems either didn't fit our budget or we found their flexibility to be insufficient. In addition, taking the educational value into account, we set out to define and develop our own. The characteristics of our platform are as follows, in order of decreasing priority:
• Low cost (uses relatively cheap hardware and mainly free software).
• Developed for educational purposes.
• Flexible, developed entirely in C++.
• Any RF frontend with an IF (Intermediate Frequency) in the vicinity of 12 kHz may be used.
C++ was chosen as the implementation language because it is "always" used within the digital signal processing community for programming DSPs (Digital Signal Processors). Although GNU Radio uses a Python based programming interface, the core signal processing blocks are written in C++. Furthermore, every programmer has some C/C++ knowledge.
Architecture
Our hardware setup for the SDR platform is shown in Figure 2. We utilize two PCs together with relatively cheap RF frontend hardware. On the transmitter side we implemented the baseband processor on a DSP using the DSK6713 from [2] connected to PC-A. The reason for this was twofold: 1) to gain experience in programming a DSP, and 2) to be able to compare TI's CCS (Code Composer Studio) IDE 2 with Visual C++ Express for developing signal processing software. The DSK6713 has a CODEC that we connected to a mixer from [11], up-converting (without image rejection) our 12 kHz IF signal to 10.724 MHz. This mixer was chosen due to its excellent linearity; note that the crest factor 3 for an IEEE802.11a signal is approx. 11 dB. We had a WRG313 receiver [12] from an earlier project and decided to reuse this as the RF frontend at the receiver side. This receiver has its own DSP for demodulation; however, the DSP was bypassed and the IF samples were transferred directly to PC-B for demodulation on the PC itself. Note that WiNRADiO [12] provides an open API that facilitated complete control over the radio from our software developed on PC-B using Visual C++ Express.
To be able to prototype and run the complete IEEE802.11a baseband processing software using this radio HW setup, we had to scale down the 20MHz bandwidth in the standard [ Figure 3 .
We will not go into the OFDM fundamentals here, see e.g. [13]. In summary, at the top we have the transmitter chain consisting of an inner (convolutional) coder, block interleaving (frequency domain spreading of adjacent bits) and sub-carrier mapping, an IFFT transforming the complex OFDM symbol to time-domain samples, guard-interval (cyclic-prefix) insertion, pulse shaping and finally up-conversion to RF. In our case, all these blocks were implemented in C++ using CCS and the executable was then downloaded to the DSK6713 board. The up-conversion to "RF" was done using the mixer mentioned above, converting the 12 kHz IF signal output from the onboard DSK6713 CODEC to a 10.724 MHz signal radiated from a random wire a few feet long. See Figure 2.
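The guard-interval insertion step in the transmitter chain can be illustrated with a minimal sketch. It assumes the IFFT output is available as a vector of complex samples; the function name `addCyclicPrefix` and the type alias `cfloat` are hypothetical and are not taken from the project source. In IEEE802.11a the IFFT length is 64 and the guard interval is 16 samples, giving 80 samples per OFDM symbol.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;  // hypothetical alias for this sketch

// Form one OFDM symbol by prepending the last guardLen samples of the
// IFFT output (the cyclic prefix) to the IFFT output itself.
std::vector<cfloat> addCyclicPrefix(const std::vector<cfloat>& ifftOut,
                                    std::size_t guardLen)
{
    assert(guardLen <= ifftOut.size());
    const auto tailStart = ifftOut.end() - static_cast<std::ptrdiff_t>(guardLen);

    std::vector<cfloat> symbol;
    symbol.reserve(ifftOut.size() + guardLen);
    symbol.insert(symbol.end(), tailStart, ifftOut.end());        // guard interval
    symbol.insert(symbol.end(), ifftOut.begin(), ifftOut.end());  // useful part
    return symbol;
}
```

Because the prefix is a copy of the symbol tail, the receiver can discard it and still see a circularly shifted version of the useful part, which is what makes the per-subcarrier equalization after the FFT possible.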
On the receiver side we have down-conversion from RF to a complex baseband signal. In general this may be done either directly or via one or more intermediate frequencies. The choice is left to the implementer. Each method has its strengths and weaknesses and the relatively complex trade-offs here are beyond the scope of this paper, see e.g. [14]. In our case we used the WRG313e receiver for converting the received 10.724 MHz signal down to a 12 kHz IF signal at 48 ksamples/s, which was then transferred to PC-B via the USB cable.
Please observe that in our low-cost (narrowband) setup, common issues like I/Q mismatch and DC offset are non-existent because the actual I/Q merge/split is done digitally (in the software) with only real IF signals involved. In a system operating at the rated speed and RF frequencies [1], the (broadband) RF frontend will typically be more similar to that in Figure 1 and these issues must of course be dealt with to adequately fulfill the required radio performance parameters 4. Following the complex down-conversion is channel filtering (ensuring proper selectivity) and down-sampling to reduce the computational burden in the downstream signal processing blocks. This is not shown in Figure 3. Then comes a vital block, namely the carrier and timing recovery engine (synchronizer). The task of this block is to estimate the carrier frequency/phase offset and symbol clock from the incoming signal. In a communication system it is of vital importance that this block is carefully designed, as its performance will directly affect the packet error rate for a given demodulator SNR. Assuming that carrier, symbol timing and frame synchronization 5 have been performed, the guard-interval is removed and an FFT is performed to enable individual subcarrier demodulation. After subcarrier demodulation and equalization 6, the raw bits are de-interleaved and passed on to a Viterbi decoder. The bits output from this decoder are then fed to the next protocol level for processing.
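The digital I/Q split mentioned above can be sketched as follows. With a real 12 kHz IF sampled at 48 ksamples/s, the IF sits at exactly one quarter of the sample rate, so the complex mixer sequence reduces to the four values 1, -j, -1, j and no sine table is needed. The function name `mixDownQuarterRate` is hypothetical, and this is a minimal sketch rather than the project's actual code; the channel filter and down-sampler that must follow are not shown.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;  // hypothetical alias for this sketch

// Shift a real IF signal centered at fs/4 down to complex baseband by
// multiplying sample n with e^{-j*pi*n/2}, i.e. the sequence 1, -j, -1, j.
std::vector<cfloat> mixDownQuarterRate(const std::vector<float>& ifSamples)
{
    static const cfloat rotation[4] = {
        cfloat(1.0f, 0.0f), cfloat(0.0f, -1.0f),
        cfloat(-1.0f, 0.0f), cfloat(0.0f, 1.0f)
    };

    std::vector<cfloat> baseband(ifSamples.size());
    for (std::size_t n = 0; n < ifSamples.size(); ++n)
        baseband[n] = ifSamples[n] * rotation[n & 3u];
    return baseband;
}
```

The output still contains the image component of the real input, which is why a low-pass channel filter is needed before reducing the rate to the elementary sample rate.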
The Development Phase
Being faced with such a complex development task, we started out by modeling the whole system in Octave [15]. The role of this system modeling can be summarized as follows:
• First, to get a proper understanding of the OFDM modulation principles, experimenting with different system parameters before implementing. See also the tutorial paper [13].
• Algorithm development: although the key blocks in the processing chain are well defined (Figures 17-12 in [1]), it is left to the designer to choose and implement algorithms for carrier and timing recovery [16, 17]. This is a research field on its own and beyond the scope of this paper. However, we will present our implementation of a proper 7 carrier and timing synchronizer in detail below.
• Having a cycle-accurate reference model during the C++ implementation proved extremely useful throughout the development phase.
We started with generating the complex baseband samples constituting the training symbols, see We then went on with modeling all the blocks in Figure 3 except for the frontends (the connection in the model between the Tx and Rx chains was the 12 kHz IF signal). The most complex blocks to model were the synchronizer and the Viterbi decoder. Since the model was going to be used as an implementation reference for the subsequent C/C++ implementation, we modeled these blocks in an elaborate way (cycle accurate) to make this transition as smooth as possible. It should however be mentioned that we skipped the "normal" floating-point to fixed-point model refinement during modeling 8. For more details about the synchronizer, see Appendix A.
Having the model up and running, we were set for the Tx software development on the DSK6713. Before going into further details, we summarize the hardware and software used in Table 2.
5 Frame synchronization is the task of determining the relative bit position in the received data packet such that we know where the header and payload start. This task is easily done by correlating with a known pilot symbol.
6 Not shown in Figure 2 is a necessary equalizer block which equalizes the channel effect on the OFDM symbol. The 1-tap equalizer coefficients are easily computed based on the long training symbol. They are successively updated based on the pilot symbols. Note that the subcarrier phase coherency is based on the complex rotation done by the equalizer.
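The 1-tap equalizer mentioned above can be sketched as below. The sketch assumes that the FFT of the received long training symbol and the known reference values (+1, -1, or 0 for unused subcarriers) are available per subcarrier; the names `estimateEqualizer` and `equalize` are hypothetical. The pilot-based coefficient update is not shown.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;  // hypothetical alias for this sketch

// One complex coefficient per subcarrier: reference / received, i.e. the
// inverse of the estimated channel response on that subcarrier.
std::vector<cfloat> estimateEqualizer(const std::vector<cfloat>& rxLong,
                                      const std::vector<float>& refLong)
{
    assert(rxLong.size() == refLong.size());
    std::vector<cfloat> coeff(rxLong.size(), cfloat(0.0f, 0.0f));
    for (std::size_t k = 0; k < rxLong.size(); ++k) {
        // Unused subcarriers (reference 0) keep a zero coefficient.
        if (refLong[k] != 0.0f && std::norm(rxLong[k]) > 0.0f)
            coeff[k] = refLong[k] / rxLong[k];
    }
    return coeff;
}

// Apply the coefficients to one received OFDM symbol (in the frequency domain).
void equalize(std::vector<cfloat>& symbol, const std::vector<cfloat>& coeff)
{
    assert(symbol.size() == coeff.size());
    for (std::size_t k = 0; k < symbol.size(); ++k)
        symbol[k] *= coeff[k];
}
```

Since each coefficient is a complex number, the multiplication corrects both the amplitude and the phase rotation of the subcarrier, which is how the equalizer provides the subcarrier phase reference.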
Rather than developing the Tx software from scratch in CCS, we decided to implement this software first in VC++ on PC-B on the receiver side. See Figure 5. The reason for this was mainly twofold:
• This enabled us to dump samples to file from any "probe" point in the Tx software and subsequently verify the samples directly against our model by reading the samples file into Octave. In this way we were able to verify each development "step" in a direct way that wouldn't be possible on the DSK6713.
• During the subsequent Rx software development we utilized the Tx software 9 set up in software loopback for verification. This proved very useful indeed.
Following the development of the Tx part of the software in VC++, we ported this software into a project denoted "OFDM" in CCS on PC-A. This was pretty straightforward. We verified this step by compiling and debugging the software on the DSK, comparing the generated baseband samples against Annex G in [1]. The sample rate was set to 48 kHz, thus an oversampling factor of 8 (the elementary sample rate is 6 kHz, see Table 1). The complex baseband samples were upsampled to 48 kHz using a FIR interpolation filter and then up-converted to real 12 kHz IF samples by multiplying the samples with $e^{j\pi n/2}$ (a complex carrier at 12 kHz, one quarter of the 48 kHz sample rate) and keeping the real part. The samples were then scaled properly before being output to the CODEC on the DSK6713. A screenshot of the CCS GUI is shown in Figure 6. Furthermore, a scope picture of the final 10.724 MHz signal transmitted on-the-air is shown in Figure 7.
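The interpolation and up-conversion to a real IF described here can be sketched as follows. This is a minimal, unoptimized illustration under stated assumptions: the function name `basebandToRealIf` is hypothetical, the interpolation is done by zero-stuffing followed by a direct-form FIR (a polyphase structure would be more efficient), and the filter taps are supplied by the caller with a DC gain equal to the interpolation factor. The final scaling for the CODEC is not shown.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;  // hypothetical alias for this sketch

// Interpolate complex baseband samples by `factor` and shift the result up
// to one quarter of the output sample rate, keeping only the real part.
std::vector<float> basebandToRealIf(const std::vector<cfloat>& baseband,
                                    const std::vector<float>& taps,
                                    std::size_t factor = 8)
{
    assert(factor > 0 && !taps.empty());

    // Zero-stuffing: each input sample is followed by factor-1 zeros.
    std::vector<cfloat> stuffed(baseband.size() * factor, cfloat(0.0f, 0.0f));
    for (std::size_t i = 0; i < baseband.size(); ++i)
        stuffed[i * factor] = baseband[i];

    std::vector<float> ifSamples(stuffed.size());
    for (std::size_t n = 0; n < stuffed.size(); ++n) {
        // Direct-form FIR low-pass removing the spectral images.
        cfloat acc(0.0f, 0.0f);
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k)
            acc += taps[k] * stuffed[n - k];

        // Re{ acc * e^{j*pi*n/2} }: the carrier cycles through 1, j, -1, -j.
        switch (n & 3u) {
        case 0:  ifSamples[n] =  acc.real(); break;
        case 1:  ifSamples[n] = -acc.imag(); break;
        case 2:  ifSamples[n] = -acc.real(); break;
        default: ifSamples[n] =  acc.imag(); break;
        }
    }
    return ifSamples;
}
```

With a 48 kHz output rate the quarter-rate carrier lands at 12 kHz, so the mixer needs no multiplications at all, only sign changes and a choice between the real and imaginary part.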
Before going into details about the Rx development, it should be emphasized that the author prior to this project had written a small application for interfacing to the WRG313 receiver using the G313API SDK. This application contains demodulators for simpler modes (AM, FM, SSB, RTTY, BPSK) and has a GUI based on the Qt SDK [18] and sound output 10 based on the PortAudio API [19] . Thus it was not necessary to start from scratch establishing the various software "infrastructure" (GUI, lower level software interface to APIs) and it was thus possible to concentrate solely on the implementation of the core DSP algorithms in this project.
The class hierarchy for the VC++ "modem" project is depicted in Figure 5. The Tx part consists of the "Wavesynth" class and parts of "FEC" and "OFDM". As mentioned above, this part of the software was ported to CCS on PC-A. The part of the software which is the scope of this work was partitioned into the following classes:
• "FEC": encoding and decoding (Viterbi) according to the standard [1].
• "OFDM": PLCP preamble generation (short and long training symbols), FFT/IFFT, interleaving, packet assembly, windowing.
• "OFDMRX": implements Figure 8, in addition to subcarrier demodulation, equalization, de-interleaving, decoding.
We started the Rx software development by implementing the "synchronizer" (class "OFDMSYNC"), which is described in detail in Appendix A. With the Tx part of the software now in place in the VC++ project, it was very convenient during debugging to loop back the Tx part to the Rx part of the software under development. Thus the Tx part in loopback effectively acted as our Rx testbench. Alternatively we could have read in reference samples from the Octave model, but as mentioned earlier it was beneficial to develop the Tx software itself within the same VC++ project on PC-B.
To increase confidence, we implemented incrementally in small steps. Thus, the "testbench" for each step consisted of:
• Stimuli generation by the previous block(s) in the Rx processing chain.
• Verifying the response by dumping the samples to file and reading them into our Octave model for verification.
Only after gaining sufficient confidence in the current implementation did we move on to the next step in the processing chain.
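The "dump to file and compare in Octave" step can be sketched as follows. The helper name `dumpSamples` and the two-column text format (real part, imaginary part) are assumptions made for this illustration; the paper does not state the file format that was actually used.

```cpp
#include <complex>
#include <fstream>
#include <string>
#include <vector>

using cfloat = std::complex<float>;  // hypothetical alias for this sketch

// Write complex samples from a "probe" point as plain text, one sample per
// line: "<real> <imag>". Returns false if the file could not be written.
bool dumpSamples(const std::string& path, const std::vector<cfloat>& samples)
{
    std::ofstream out(path);
    if (!out)
        return false;

    out.precision(9);  // enough decimal digits to round-trip a float
    for (const cfloat& s : samples)
        out << s.real() << ' ' << s.imag() << '\n';
    return static_cast<bool>(out);
}
```

A file written this way is a plain numeric matrix, so on the Octave side it can be read with `d = load("probe.txt"); x = d(:,1) + 1i*d(:,2);` and compared against the corresponding model vector.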
Implementing OFDMRX was relatively straightforward due to the "processing blocks" being so well defined by the standard [1]. However when implementing the Viterbi decoder (belongs to class "FEC"), the tutorial [20] was of great help. Much of the total debugging time was spent on this decoder. A screenshot of VC++ with the "modem" project during debugging of the Viterbi decoder is shown in Figure 9 . To be able to effectively use the Octave model as a reference, we used the same LFSR (Linear Feedback Shift Register) for payload generation in the Tx software and the Octave model.
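Using one bit-exact LFSR in both the C++ code and the Octave model makes the payload reproducible on both sides. The paper does not state which polynomial was used; the sketch below assumes, purely for illustration, the 7-bit generator x^7 + x^4 + 1 that the standard uses for its data scrambler. The class name `Lfsr7` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 7-bit Fibonacci LFSR with feedback polynomial x^7 + x^4 + 1 (assumed here;
// the same polynomial as the IEEE802.11a data scrambler).
class Lfsr7 {
public:
    explicit Lfsr7(std::uint8_t seed = 0x7F)
        : state_(static_cast<std::uint8_t>(seed & 0x7F))
    {
        assert(state_ != 0);  // the all-zero state would never leave zero
    }

    // Return the next pseudo-random bit and advance the register.
    int nextBit()
    {
        const int bit = ((state_ >> 6) ^ (state_ >> 3)) & 1;  // x7 XOR x4
        state_ = static_cast<std::uint8_t>(((state_ << 1) | bit) & 0x7F);
        return bit;
    }

    std::uint8_t state() const { return state_; }

private:
    std::uint8_t state_;  // bit 0 holds x1, bit 6 holds x7
};
```

With the all-ones seed the sequence starts 0, 0, 0, 0, 1, 1, 1, 0 and repeats after 127 bits, which gives a quick sanity check when the same generator is re-implemented in the model.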
Results
A GUI screenshot of our application running on PC-B is shown in Figure 10 . The main part of the GUI consists of a real time spectrum display of the 12 kHz IF as received from the WRG313 radio. At the bottom is a transcript window logging packet statistics. To ensure adequate SNR we located the transmitter and receiver as shown in Figure 2 within a few meters of each other.
For reference, we started testing with FEC disabled and achieved a PER (Packet Error Rate) of approx. 30%. With FEC enabled the PER dropped to well below 1%. The packet length was fixed at 10 OFDM symbols (including the SIGNAL field, excluding the PLCP preamble).
For such a narrow bandwidth, the processing requirements were modest. On the Rx side, the "modem" project's CPU usage on PC-B was barely noticeable. On the Tx side, we were well within the limits set by the sample rate (48 kHz). However, we struggled a bit with the memory footprint; some minor tweaking was necessary to fit the executable within the 264 kB L1/L2 memory of the TMS320C6713.
A direct comparison of VC++ Express and TI's CCS with respect to DSP software development may be difficult based on one project only. However, here are a couple of observations based on our setup. CCS has better data analysis possibilities (graph plotting in the frequency/time domain), but this was partly outweighed in our project through the use of Octave together with VC++. Implementing the DSP software this way, using VC++ in tandem with Octave, turned out to be surprisingly effective in this project. Another observation is that the "Express" version of VC++ has no profiling support. In addition to extensive profiling support, CCS has other useful analysis capabilities facilitating real time embedded software development.
We found that it was quite convenient to partition the Rx software as discussed earlier: during debug it was e.g. possible to "watch" the whole OFDMSYNC object, thus tracking "key" sync parameters during packet receive.
Approximate software development times are depicted in Table 3 .
The development times listed are based on the number of SVN commits with an average of 4 man hours per commit. The development time of the existing software "infrastructure" (the GUI, etc) is not included here.
Conclusions and Further Work
In this work we have demonstrated the high level modeling and subsequent SDR implementation of an IEEE802.11a Phy compliant baseband processor. The baseband processor was implemented in C++ using MS' VC++ Express and TI's CCS and executed on a standard PC as well as on the DSK6713 board.
To be able to demonstrate functionality utilizing relatively cheap RF frontends, the bandwidth was scaled down to 6 kHz without loss of educational value. We believe we could have put together a similar system running in the 2.4/5 GHz band in a shorter time frame using commercially available prototyping platforms with available reference designs and more sophisticated development tools. But the cost would have been on a completely different scale. Our focus has been on low cost and educational value using only free tools as far as possible.
At a later stage it would be interesting to investigate the possibility of porting some of the developed C++ code to fit one of the available USRP RF frontends from Ettus [10]. Other platforms don't fit our low cost budget.
Appendix: The Synchronization Engine
The location of the synchronizer is shown in more detail in Figure 8. The figure shows the first parts of the receiver chain. Note that the synchronizer is running at the (elementary) sample rate of 6 kHz. The block labeled "Foffset estimator" provides a coarse estimate of the carrier frequency offset based on the short training symbols, see Figure 4. It must be run prior to timing recovery, otherwise the correlator based timing recovery algorithm will not provide qualifying correlation peaks. This will become clear below 11. The coarse frequency offset estimator we chose here is based on Phase Increment Estimation [16]
Figure 11. The "sync controller" state machine. Some details are not shown.
11 Although powerful algorithms for joint frequency offset and timing recovery exist (see e.g. [16]), we chose to split these tasks here.
where s
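As an illustration of how a coarse carrier frequency offset can be obtained from the phase increment between repeated training samples, the sketch below shows one common lag-based form. It is not necessarily the exact estimator of [16] or the one used in this project. It assumes the short training symbols repeat every 16 samples at the elementary sample rate (as in IEEE802.11a); the function name `estimateCoarseOffsetHz` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cdouble = std::complex<double>;  // hypothetical alias for this sketch

// For a signal that repeats every `lag` samples, a frequency offset df turns
// the repetition into a phase increment of 2*pi*df*lag/fs. The increment is
// estimated from the summed products conj(r[n]) * r[n + lag].
double estimateCoarseOffsetHz(const std::vector<cdouble>& rx,
                              std::size_t lag,
                              double sampleRateHz)
{
    assert(lag > 0 && rx.size() > lag);
    const double kPi = 3.14159265358979323846;

    cdouble acc(0.0, 0.0);
    for (std::size_t n = 0; n + lag < rx.size(); ++n)
        acc += std::conj(rx[n]) * rx[n + lag];

    return std::arg(acc) * sampleRateHz / (2.0 * kPi * static_cast<double>(lag));
}
```

The estimate is unambiguous only while the phase increment stays within plus or minus pi, i.e. for offsets below fs/(2*lag); with a 6 kHz sample rate and a lag of 16 this is 187.5 Hz.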
