Wireless personal area networks (WPANs) have gained interest in the last few years, and several air interfaces have been proposed to cover WPAN applications. A multicarrier spread spectrum (MC-SS) air interface specified to achieve 130 Mbps in typical WPAN channels is presented in this paper. It operates in the 5.2 GHz ISM band and achieves a spectral efficiency of 3.25 b·s$^{-1}$·Hz$^{-1}$. Besides the robustness of the MC-SS approach, this air interface yields reasonable implementation complexity. This paper focuses on the hardware design and prototype of this MC-SS air interface. The prototype includes RF, baseband, and IEEE802.15.3 compliant medium access control (MAC) features. Implementation aspects are carefully analyzed for each part of the prototype, and key hardware design issues and solutions are presented. Hardware complexity and implementation loss are compared to theoretical expectations, and flexibility is discussed. Measurement results are provided under real operating conditions.
INTRODUCTION
Personal Networks (PNs) are a recent paradigm that enables an individual to experience connectivity with his or her devices with unrestricted geographic span [1]. This network concept is leveraged by the availability of reliable wireless links between the devices in the user's vicinity. The analysis of users' needs carried out in [2] has shown two typical classes of applications that can be differentiated by their data rate range: low data rate (LDR) applications, below 250 kbps, and high data rate (HDR) applications, up to 100 Mbps. Other studies have shown that, because of the specific channel conditions and more specifically their dynamicity [3, 4], specific attention must be paid to the design of the physical layer (PHY) of these interfaces. New air interfaces have been specified for short-range, very high data rate applications under the framework of the IEEE802.15.3 standard. However, a consensus could not be reached on a single solution among the systems that were proposed. One of the best known systems is probably WiMedia, which targets 480 Mbps using multiband orthogonal frequency-division multiplexing (OFDM) [5]. New trends in regulation (e.g., [6]) indicate that the future worldwide band for ultra-wideband operation will move to higher frequencies, thus leading to more power-consuming and costly implementations. Besides, most of the foreseen applications require either a lower data rate or a far higher one, like wireless high-definition multimedia interface (HDMI).
The air interface presented in this paper targets applications up to 130 Mbps at reasonable implementation cost and power consumption. It combines a multicarrier OFDM-based technique with spreading, as initially proposed in [7]. This approach provides many degrees of diversity on top of the intrinsic advantages of OFDM systems, namely, a potentially low-complexity equalizer and robustness against frequency-selective channels (e.g., [8]), which is strengthened by code spreading. The use of time division multiple access (TDMA) prevents the system from suffering the classical intercode interference experienced in code division multiple access (CDMA) approaches when many asynchronous users share the same band. Moreover, this air interface exhibits a very high degree of flexibility from which link adaptation techniques can benefit [9]. This paper focuses on the design of a hardware platform for up to 130 Mbps operating in the 5.2 GHz ISM band. Its MAC layer is compliant with the IEEE802.15.3 standard [10]. A real-time implementation of the PHY layer runs on an FPGA, and a wideband radio front-end provides over-the-air operation. The remainder of the paper is divided into 6 sections. In Section 2, a short description of the air interface and the related parameters is given. In Section 3, some aspects of the MAC protocol and its implementation are detailed. In Section 4, the baseband processing hardware design is described and complexity issues are discussed. In Section 5, the radio front-end selection is discussed. In Section 6, the global hardware architecture and platform are described. Finally, Section 7 provides measurement results obtained with the prototype.
OVERVIEW OF THE SELECTED MC-SS SYSTEM
The MC-SS air interface detailed in this paper, referred to as the MAGNET HDR (M-HDR), has been optimized for wireless personal area networks (WPANs) in the PN context. Unlike in cellular and wireless LAN systems, peer-to-peer communication (especially from a data traffic point of view) will happen in such a context. In this case, simultaneous communication between different users will yield high interference, for which CDMA would require high-complexity multiuser detectors. This is not compatible with the low complexity requirement of the M-HDR system. Therefore, a TDMA scheme was chosen. This scheme also has the advantage of being compliant with the IEEE802.15.3 standard. Regarding the PHY, the M-HDR air interface is based on multicarrier technology capable of transmitting data from low to high rates in WPAN environments. An overview of the baseband PHY operations is illustrated by the block diagram of Figure 1.
The M-HDR is based on a coded OFDM modulation using a convolutional coder. The data are spread over the subcarriers by the spreading and multicode blocks. This function aims at a better exploitation of channel diversity and thus yields more robustness [7]. Preamble information is then appended in the time domain to build the PHY frame structure described in Figure 2.
At the input of the receiver, automatic gain control (AGC) and time/frequency synchronization are performed in the time domain. The synchronization block, which is critical in OFDM systems, is detailed in Section 4. After the OFDM demodulation, the channel is estimated using a least square estimator over full pilot symbols. This is based on the assumption of low device velocity in the WPAN context. After the despreading, the bits are demapped from the QPSK, 16-QAM, or 64-QAM constellation, according to the mode selected. The range of data rates envisaged is from a few Mbps to 130 Mbps, which corresponds to the HDR-WPAN scenarios identified in the MAGNET project [2]. Two modes of operation using 20 MHz and 40 MHz bandwidths, handling up to 65 and 130 Mbps, respectively, are considered for additional flexibility. The maximal spectral efficiency of 3.25 b·s$^{-1}$·Hz$^{-1}$ is achieved using 64-QAM. The parameters are given in Table 1; a detailed rationale for these choices can be found in [7, 10].
MAC PROTOCOL
The generic MAC architecture for a device capable of supporting M-HDR air interface has been developed with the functional partitioning between the host and the network interface card (NIC). The NIC implements the M-HDR air interface prototype which consists of MAC and PHY layers with an appropriate interface to the host platform. USB is chosen as the default physical interface to the host. The network layer and the applications are implemented on the host platform (Nokia 770 PDA). Figure 3 depicts a high-level MAC architecture for the M-HDR air interface. From the implementation point of view, the following three modules were implemented.
(1) The M-HDR MAC module contains the implementation of the core MAC functionalities, for example, beacon transmission for piconet formation, channel scanning for piconet discovery, synchronization with other devices, association/disassociation requests to join and leave a piconet, and asynchronous/isochronous data transmission. On the data path, this module exchanges logical link control (LLC) frames with the host, while the control path is used to exchange various management commands, for example, to set or fetch configuration parameters. In order to achieve the required real-time performance, the MAC is partitioned into hardware (HW-MAC) and software (SW-MAC). The time-critical and compute-intensive blocks like CRC generation and ciphering are implemented in hardware as part of the HW-MAC. In the following subsections, we elaborate on both the software and hardware parts of the MAC implementation.
(2) The target module of Figure 3 acts as an interpreter for the messages it receives from the host over the USB link. It translates these commands into IEEE-802.15.3 format and forwards them to the M-HDR MAC module for further processing.
(3) The host interface module implements the application programming interfaces (APIs) which are used by the higher layers to access various MAC functionalities.
To facilitate message exchange between the host and the NIC, a frame format has been specified. As shown in Figure 4 , it contains a frame identifier field which uniquely identifies the type of the frame, a payload size field of two bytes which provides the length of the attached payload. The payload field consists of the parameters specified with the commands and can be a maximum of 2048 bytes. As mentioned above, the implementation of the M-HDR MAC conforms to the IEEE802.15.3 standard and consists of the four main building blocks (see Figure 5 ).
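As an illustration of the host/NIC frame format, the following minimal sketch packs and parses such a frame. The two-byte payload size field and the 2048-byte payload limit come from the description above; the one-byte frame identifier and the big-endian byte order are assumptions made only for this example, and the function names are hypothetical.

```python
import struct

# Assumptions (not specified in the text): a 1-byte frame identifier and
# big-endian byte order. The 2-byte payload size field and the 2048-byte
# payload limit come from the frame format description.
HEADER = struct.Struct(">BH")
MAX_PAYLOAD = 2048


def pack_frame(frame_id: int, payload: bytes) -> bytes:
    """Build a host<->NIC frame: identifier, payload size, payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 2048 bytes")
    return HEADER.pack(frame_id, len(payload)) + payload


def unpack_frame(frame: bytes) -> tuple[int, bytes]:
    """Split a frame into (identifier, payload), checking the size field."""
    frame_id, size = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + size]
    if size > MAX_PAYLOAD or len(payload) != size:
        raise ValueError("malformed frame")
    return frame_id, payload
```

Checking the size field against the actual payload length on reception lets the receiving side reject truncated frames before they reach the MAC module.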
SW-MAC design
Non-real-time-critical features of the MAC are implemented in software (SW-MAC) running on an embedded general purpose processor (GPP).
The TX-frame processing block is mainly responsible for the formation of the data and command frames to be transmitted. Upon receiving the data/command request from the frame convergence sublayer (FCSL) or the device management entity (DME), the transmitter chain validates the request, for example, the sourceid, dstid, data length, and stream index parameters. The MAC frame prepared by attaching the MAC header and the payload is then sent to the transmitter for the transmission over the air.
The transmitter block puts these frames into appropriate device driver queues for transmission. The device driver implements two transmission queues: one for transmissions during the contention access period (CAP) and the other for transmissions during the allocated channel time (CTA).
The RX-frame processing module is responsible for receiving the frames from the baseband and forwarding them to the FCSL or DME. Upon receiving a frame from the baseband, the receiver checks whether it is a command frame or a data frame. The command frames are forwarded to the DME block and the data frames are passed to the FCSL block.
The receiver block coordinates the packet reception between the receiver chain and the baseband device driver.
From an implementation point of view, each of these blocks is implemented as a separate thread. These threads communicate with each other using the Linux message queues as the interprocess communication (IPC) mechanism. The synchronization between the threads is achieved by the use of semaphores. The Linux system calls are wrapped in a thin operating system abstraction layer (OSAL). The OSAL implements generic wrapper functions over the OS-dependent system calls.
Dominique Noguet et al.
As shown in Figure 6 , the M-HDR MAC module is implemented as a multithreaded program. The module is activated by a call to the main function which in turn invokes the initMAC() function. The initMac() function initializes the framework by creating the threads for each of the DME, FCSL, transmitter chain, receiver chain, as well as the transmitter and the receiver blocks. The associated message queues, registers, memory pool, and PIB (PAN Information Base) parameters are also initialized.
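The thread-per-block structure can be sketched as follows for the RX-frame processing block alone. This is only an illustration under stated assumptions: Python's `queue.Queue` stands in for the Linux message queues, frames are represented as hypothetical `(kind, body)` tuples, and a `None` sentinel (also hypothetical) stops the thread.

```python
import queue
import threading

STOP = None  # hypothetical sentinel used to end the thread


def rx_frame_processing(rx_q, dme_q, fcsl_q) -> None:
    """Forward each received frame to the DME or the FCSL by frame type."""
    while True:
        frame = rx_q.get()
        if frame is STOP:
            break
        kind, body = frame  # hypothetical (kind, body) representation
        (dme_q if kind == "command" else fcsl_q).put(body)


def run_demo():
    """Push two frames through the RX thread and return both output queues."""
    # queue.Queue stands in for the Linux message queues between threads.
    rx_q, dme_q, fcsl_q = queue.Queue(), queue.Queue(), queue.Queue()
    worker = threading.Thread(target=rx_frame_processing,
                              args=(rx_q, dme_q, fcsl_q))
    worker.start()
    for frame in [("command", "assoc-req"), ("data", b"\x01\x02"), STOP]:
        rx_q.put(frame)
    worker.join()
    return list(dme_q.queue), list(fcsl_q.queue)
```

Because each block only touches its own input and output queues, the blocks need no shared state beyond the queues themselves in this sketch.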
HW-MAC primitives
The hardware MAC (HW-MAC) sits at the interface between the PHY layer and the SW-MAC layer. It inherits some terminal functions of the MAC layer in order to achieve better real-time performance than a software implementation would. The HW-MAC handles all the data processing needed to provide the PHY layer with the required format of the packet to be transmitted. Similarly, the HW-MAC receives packets from the PHY baseband (PHY BB) and transforms them in a consistent way for the SW-MAC. The HW-MAC consists of several blocks, such as the hardware 128-bit advanced encryption standard (AES) unit, which benefits from the implementation described in Figure 7, CRC generation/verification units, and a register address space along with an address decoder. The top-level finite state machine is the intelligence behind the working of the HW-MAC. It schedules and synchronizes the data flow between the SW-MAC and the PHY BB depending on the type of configuration defined by the SW-MAC.
The block diagram of the HW-MAC depicting the flow of data between the SW-MAC and the PHY BB is shown in Figure 7. The presence of the HW-MAC makes the SW-MAC perceive the PHY layer as any other peripheral, because the HW-MAC provides the SW-MAC with an interface similar to a memory. Various configuration and status registers, including the data and header FIFOs, are mapped onto an address space to which the SW-MAC can write. If the SW-MAC needs to transmit a data packet over the air, it writes the configuration into the registers and the data to be transmitted into the FIFOs. The HW-MAC delivers the packet to the PHY layer according to the configuration set by the SW-MAC. Conversely, when a packet is received from the PHY layer and is intended for a device in receive mode, the HW-MAC verifies the integrity of the packet and interrupts the SW-MAC to inform it of the received packet. Besides these scheduling functions, the HW-MAC also implements primitives to speed up the computation of data. This concerns encryption, CRC generation/verification, and other minor functions like packet parsing, packet formatting, timers, and so on.
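The CRC generation/verification primitive can be illustrated by the short sketch below. It is only a software stand-in for the hardware unit: the CRC-32 of Python's `zlib` module and the little-endian placement of the checksum are assumptions chosen for the example, not details taken from the prototype.

```python
import zlib


def append_crc(packet: bytes) -> bytes:
    """TX side: append a 4-byte CRC-32 to the packet."""
    # zlib.crc32 and the little-endian layout are assumptions of this sketch.
    return packet + zlib.crc32(packet).to_bytes(4, "little")


def check_crc(frame: bytes) -> bool:
    """RX side: recompute the CRC and compare it with the received one."""
    body, received = frame[:-4], frame[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == received
```

On reception, a frame that fails `check_crc` would simply not be reported to the SW-MAC in this model.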
BASEBAND PROCESSING
Like any OFDM system, the M-HDR air interface is sensitive to synchronization errors, and particular attention has been paid to robust synchronization at the receiver. Another specific concern for the real-time digital design of the M-HDR air interface is clock-domain management. Finally, the impact of hardware implementation errors (e.g., quantization noise, operator bias, etc.) on processing precision needs to be quantified. The implementation loss induced by the baseband processing is scarcely addressed in the literature. In this section, the error introduced by the digital baseband processing is quantified and its impact is given in terms of an equivalent additive white Gaussian noise (AWGN) signal added to the ideal signal.
Synchronization
The synchronization aims at referencing in time the FFT vector for OFDM demodulation and at estimating the carrier frequency offset (CFO) in the time domain (pre-FFT). CFO corresponds to the TX/RX oscillator frequency shift. Correcting the CFO is of paramount importance for OFDM systems which are very sensitive to such an impairment [8] . Synchronization is processed on the fly and runs continuously once the AGC is locked. It seeks a specific synchronization pattern contained in each frame header [10] . The synchronization process is ruled by a finite state machine (FSM) whose state is updated every received sample. It synchronizes the data flow according to the strongest path of the channel which is used as time reference.
The time synchronization is performed as follows. First, the autocorrelation of the received signal is computed. The periodic nature of the synchronization pattern makes the autocorrelation show a typical flat region when the synchronization symbol is received. When the flat region is detected, the synchronization sample is coarsely indexed. To refine the position detection, a more restricted window is considered and the cross-correlation of the input signal with the known synchronization pattern is analyzed throughout this window. Peaks appear on the cross-correlation profile as soon as the known pattern is completely received, and a criterion is defined to detect those peaks. When the last cross-correlation peak is received, the system can be synchronized accurately. The window is active when the autocorrelation signal stays above the threshold for more than a predetermined time, which is related to the synchronization pattern duration.
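The two-stage principle (autocorrelation plateau for coarse indexing, then cross-correlation in a restricted window) can be sketched as below. This is a simplified illustration, not the prototype's FSM: the pattern (a Zadoff-Chu block of length 16 repeated 4 times), the window size, and the minimum plateau duration are all assumed values, the received signal is assumed to have unit power once the AGC is locked, and the refinement step takes the largest cross-correlation peak instead of tracking the last one.

```python
import cmath
import random

L = 16    # length of one repeated block (assumed value)
REP = 4   # number of repetitions in the pattern (assumed value)

# Hypothetical synchronization pattern: a Zadoff-Chu block repeated REP times.
BLOCK = [cmath.exp(-1j * cmath.pi * n * n / L) for n in range(L)]
PATTERN = BLOCK * REP


def autocorr_metric(r, d):
    """|sum r[d+k] * conj(r[d+k+L])| as a fraction of its maximum value L."""
    acc = sum(r[d + k] * r[d + k + L].conjugate() for k in range(L))
    return abs(acc) / L  # assumes unit signal power (AGC locked)


def coarse_index(r, threshold=0.7, min_len=L):
    """First sample of a run of at least min_len values above the threshold."""
    run = 0
    for d in range(len(r) - 2 * L + 1):
        run = run + 1 if autocorr_metric(r, d) > threshold else 0
        if run >= min_len:
            return d - min_len + 1
    return None


def fine_index(r, coarse, half_window=L):
    """Position of the largest cross-correlation peak near the coarse index."""
    lo = max(0, coarse - half_window)
    hi = min(len(r) - len(PATTERN), coarse + half_window)

    def xcorr(d):
        return abs(sum(r[d + k] * PATTERN[k].conjugate()
                       for k in range(len(PATTERN))))

    return max(range(lo, hi + 1), key=xcorr)


def demo(start=100, noise=0.05, seed=0):
    """Insert the pattern at `start` in light noise and locate it."""
    rng = random.Random(seed)
    r = [complex(rng.gauss(0, noise), rng.gauss(0, noise)) for _ in range(300)]
    for k, s in enumerate(PATTERN):
        r[start + k] += s
    coarse = coarse_index(r)
    return coarse, fine_index(r, coarse)
```

In this sketch the coarse index lands a few samples before the true start, because the autocorrelation crosses the threshold while the pattern is only partially inside the correlation window; the cross-correlation step removes that bias.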
In order to determine the best threshold value, the synchronization probability of false alarm (PFA) and that of misdetection (PMD) are analyzed. The PFA and PMD as a function of the autocorrelation threshold are given in Figure 8 for an AWGN channel. The threshold is represented as a percentage of the maximum value of the autocorrelation. The PFA is defined as the probability of finding a synchronization sample while no synchronization symbol was transmitted. Obviously, it decreases when the autocorrelation threshold increases. The PFA does not depend on the signal-to-noise ratio (SNR). This is due to the fact that the flat region is never detected when no synchronization pattern is sent, whatever the noise level.
The PMD is defined as the probability of missing the synchronization point despite the transmission of a synchronization symbol. For the lowest thresholds, the misdetection is mainly due to bad flat region localization or, as for the false alarm, due to the absence of autocorrelation flat region falling edge. For high thresholds, misdetection is also high but mainly due to nondetection of the flat region. Between these two threshold regions, a minimum is obtained around 0.5.
A PFA versus PMD tradeoff value can be obtained for each SNR as the crossing point of the misdetection and false alarm curves. For instance, at SNR = 8 dB, choosing a threshold of 68% provides PFA < $10^{-5}$ and PMD < $10^{-5}$. When higher SNRs are targeted, increasing the threshold reduces the false alarm probability. At 10 dB, a 70% threshold yields PFA < $10^{-6}$ and PMD < $10^{-6}$.
Clock management for flexible design
Making the baseband flexible in terms of data rate increases the complexity of clock management. This section describes clock management and its impact on hardware architecture tradeoffs. The focus is on the 40 MHz system, but the discussion can easily be transposed to the 20 MHz case. The convolutional encoder is fed with data at frequency f. The coder produces two parallel bits which are serialized before being punctured. Let N be the number of bits per symbol, D the serial output data rate of the convolutional encoder, R the global code rate, P the puncturing rate, and f the working frequency if only one frequency was used in the design. Since each OFDM symbol of 266 samples carries 192 data symbols, the serial bit rate at the output of the coder is $D = 192 \times 40 \times N / 266 \approx 29 \times N$. At the output of the puncturing, the data are at the frequency f. Table 2 recaps the frequency to be used at the coder module according to the MAGNET modulation scheme, which implies different clock frequencies. The solution adopted to change clock frequencies dynamically across these modes is to use the tunable DLL feature of the Xilinx Virtex 4. Although the interleaver processes bits, it uses a parallel architecture whose width is that of a symbol. This results in an operating frequency of D/N = 29 MHz. This parallel approach was chosen because of the frequency requirements for real-time operation. A serial implementation would indeed have had to sustain a 174 MHz operation rate in the worst case. Thus, the parallelization, which is typically performed before the mapper, here sources the input of the interleaver. In order to simplify clock management, the serial-to-parallel converter always works at the highest frequency, and the duty cycle of the data validation signal is adjusted according to the modulation. This choice leads to a very small part of the design working at a high frequency that does not need to be changed according to the modulation.
The mapper and the spreader, that follow, process at the modulation symbol rate, namely, 29 MHz. Then, pilots are inserted increasing the rate up to 40 MHz for the OFDM modulation. Figure 9 shows the resulting clock domains.
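The rates quoted in this section can be checked with a few lines of arithmetic. The sketch below only uses the figures given in the text (192 data symbols per 266-sample OFDM symbol at 40 MHz, and 2, 4, or 6 bits per symbol for QPSK, 16-QAM, and 64-QAM); the function and constant names are illustrative. The exact ratio is about 28.9, which the text rounds to 29, hence the 174 MHz worst case quoted for a serial interleaver.

```python
SAMPLE_RATE_MHZ = 40      # 40 MHz mode
SAMPLES_PER_SYMBOL = 266  # samples per OFDM symbol
DATA_PER_SYMBOL = 192     # data symbols carried by one OFDM symbol
BITS_PER_SYMBOL = {"QPSK": 2, "16-QAM": 4, "64-QAM": 6}


def coder_output_rate(bits_per_symbol: int) -> float:
    """Serial bit rate D (Mbit/s) for N bits per modulation symbol."""
    return (DATA_PER_SYMBOL * SAMPLE_RATE_MHZ * bits_per_symbol
            / SAMPLES_PER_SYMBOL)


for name, n in BITS_PER_SYMBOL.items():
    d = coder_output_rate(n)
    # D/N is the modulation symbol rate seen by the parallel interleaver.
    print(f"{name}: D = {d:.1f} Mbit/s, D/N = {d / n:.1f} MHz")
```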
RF FRONT-END
For the M-HDR platform, several receiver front-end architectures have been considered, two of which have emerged as possible candidates: on the one hand, a classical zero intermediate frequency (zero-IF) architecture and, on the other hand, a modified Weaver architecture [11], which rejects the image frequency generated by the down-conversion of a heterodyne receiver.
In the Weaver architecture, the received signal is first mixed with the quadrature phases of the local oscillator and then lowpass filtered (see Figure 10, in which $\mathrm{IF} = \mathrm{RF}_1 - \mathrm{LO} = \mathrm{LO} - \mathrm{RF}_2$, where $\mathrm{RF}_1$ is the desired signal and $\mathrm{RF}_2$ the image frequency that would lead to the same IF after the synthesis).
One drawback of this architecture is that it introduces the problem of a secondary image, if the second mixer translates the spectrum to a nonzero frequency. With the frequency plan considered for the M-HDR system, this effect may cause UMTS image frequency to interfere with the desired signal.
The performance of the modified Weaver architecture in terms of rejection depends on the phase and gain mismatch between the two reception paths. For a 1-5° phase mismatch or a 0.2-0.6 dB gain mismatch, it was reported that such an architecture achieves 30-40 dB rejection [12].
The parameters of the second approach, the zero-IF-based architecture, are specified in Figure 11. The global noise factor is similar to that of the Weaver architecture.
In this case, the potential interference will come from IEEE802.11 systems because of the direct-conversion nature of this architecture. Therefore, rejection filtering concerns fall on this WLAN system. The filtering contribution is shared between the radio frequency (RF) filter, the analog baseband filter, and the digital filter. On the other hand, since the channel and image coincide due to the direct conversion, the zero-IF architecture does not suffer from image rejection issues. This latter point is more critical for the Weaver architecture, which implements an "explicit" rejection scheme. The conclusion that can be drawn is that, provided the same frequency selectivity is used for the filtering after the low-noise amplifier (LNA), both architectures provide sufficient interferer rejection capability, though the Weaver architecture requires more specific design attention to this phenomenon. The classical drawback of the zero-IF architecture is the DC offset, since this imperfection is translated to the baseband by the direct conversion. However, since the DC subcarrier is not used by the baseband, DC offset is no longer a very critical issue if enough attention is paid to the frequency stability and phase noise. The phase noise is imposed on each OFDM subcarrier by the RF synthesis. It is generated by the phase locked loop (PLL) of the RF frequency synthesis and mixed with the RF signal, thus affecting the downconverted baseband signal by a random phase shift in the time domain (before the FFT).
Figure 10: Weaver RF architecture.
The influence of the phase noise in the OFDM signal appears in two different ways in the frequency domain as reported in [13] .
(1) A common phase error (complex value) multiplies all subcarriers. This error comes from close-to-carrier phase noise. It can be tracked and removed by equalization.
(2) Because of phase noise further from the carrier, subcarriers are mixed together in the FFT process, so that intercarrier interference appears as extra noise in the signal that can hardly be removed.
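The first effect can be illustrated with a minimal sketch in which the common phase error is estimated from known reference symbols and removed by derotating every subcarrier. The data-aided estimator used here is an assumption chosen for the example; the text only states that this error can be tracked and removed by equalization.

```python
import cmath


def estimate_cpe(received, reference):
    """Common phase error (radians) estimated from known reference symbols."""
    acc = sum(r * s.conjugate() for r, s in zip(received, reference))
    return cmath.phase(acc)


def remove_cpe(received, phase):
    """Derotate every subcarrier by the estimated common phase."""
    rot = cmath.exp(-1j * phase)
    return [r * rot for r in received]


# Example: QPSK symbols rotated by a common 0.2 rad phase error.
QPSK = [complex(1, 1), complex(-1, 1), complex(-1, -1), complex(1, -1)]
rotated = [s * cmath.exp(0.2j) for s in QPSK]
phase = estimate_cpe(rotated, QPSK)
corrected = remove_cpe(rotated, phase)
```

The intercarrier interference of the second effect is not a simple rotation and is not addressed by this kind of correction.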
These effects lead to a tradeoff between extra signal processing computation (common phase error tracking) and requirements on the PLL and crystal choices. Innovative design works [14] [15] [16] [17] presented different techniques that improved the reliability and yield of CMOS RF transceivers, which has made the CMOS process, after the proper evolution in the research areas, a real player in the cost-effective radio market. Single-chip solutions also offer several advantages, such as a reduction in manufacturing and packaging costs due to the elimination of the routing between different integrated circuits, which reduces the multilayer complexity of the printed circuit board (PCB). Smaller areas and lower consumption (simplification of internal interfaces between blocks), together with shorter factory test times and higher test yields, are other benefits of single-chip designs. For these reasons, the zero-IF approach was preferred and the MAXIM MAX2829 chip was used as the heart of the RF part of the design. Besides, the included PLL bandwidth and the chosen crystal reference made the extra distortion caused by the phase noise effects negligible.
IMPLEMENTATION PLATFORM
The M-HDR prototype consists of a set of boards that embed the components needed for the implementation of the MAC and PHY layers, namely, an RF board implementing the TX and RX RF functions between the converters and the antenna, and a digital board implementing the digital PHY and MAC functionalities. The latter also includes some host bridging features in order to plug the HDR prototype into a host device. An overview of the M-HDR prototype is illustrated in Figure 12.
The HDR RF subsystem (or board) is based on an off-the-shelf component from MAXIM (MAX2829). The MAX2829 is designed for dual-band 802.11a/g applications covering especially world-band frequencies of 4.9 GHz to 5.875 GHz. The IC includes all circuitry required to implement the RF transceiver function, providing a fully integrated receive path, transmit path, VCO, frequency synthesizer, and baseband/control interface. Only the power amplifier, RF switches, RF bandpass filters (BPFs), RF baluns, and a small number of passive components are needed to form the complete RF front-end solution.
The digital board houses the programmable chips that implement the baseband PHY and MAC functions. For the SW-MAC primitives, an ARM9 has been selected (AT91RM9200). The SW-MAC primitives run on top of a Linux OS. For the HW-MAC and PHY primitives, a Xilinx Virtex 4 (XC4VSX55-10) has been chosen because of its available hardware resources and flexible clock management capability. The complexity analysis that led to these chipset choices is provided in Table 3. The NIC is used by its host as a USB device.
Battery operation was made possible to enable handheld field trials. It offers an autonomy of several hours. Power consumption is mainly due to the FPGA implementation; an equivalent system-on-chip implementation would yield a dramatic decrease in power consumption.
PHY MEASUREMENT RESULTS
This section aims at providing results of the tests performed with the M-HDR prototype. The first tests consist in bit error rate (BER) versus SNR for different configurations of the platform, gradually illustrating the impact of each approximation. Results presented hereafter are all given for AWGN channels for the sake of comparison.
The first step aims at evaluating the impact of the fixed point implementation within the FPGA. It is worth mentioning that the converters (ADC) at the input of the receiver have a 12-bit dynamic range, which introduces a quantization SNR of 72 dB. This means that the conversion noise is negligible within the SNR range addressed by the receiver. In order to see the impact of fixed point computation, the BER versus SNR performance of the prototype was compared with the floating point simulation model. In both cases, measurements are performed under perfect channel estimation and without CFO estimation (both TX and RX share the same clock reference and the CFO estimator is disabled). The results are shown in Figure 14 for an AWGN channel. From these noncoded performance curves, it can be noted that the performance loss is negligible in the SNR range to be considered for the system.
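The 72 dB figure matches the usual rule of thumb of about 6 dB per converter bit, as the short check below shows. It computes only the ratio between full scale and one quantization step; the function name is illustrative.

```python
import math


def quantization_dynamic_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)


adc_dynamic_db = quantization_dynamic_db(12)  # about 72 dB for 12 bits
```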
The results provided hereafter come from prototype measurements only. Figure 15 shows the additional impact of CFO estimation and channel estimation for the M-HDR prototype. The equalizer coefficients use a 12-bit quantization without additional performance impact due to quantization noise. The previous curve obtained with perfect estimation is given as a reference. This reference curve is similar to the one of Figure 14. It can be observed that the major degradation is brought by the channel estimator linked to the zero-forcing equalizer. However, the 3 dB shift is due to the kind of estimator chosen rather than to its implementation, since floating point simulation provides similar results. It can be noticed that the CFO estimation and correction have little impact on the overall performance.
Figure 17: Impact of (a) IQ mismatch (lin/rad) and (b) IQ mismatch correction.
The measurements presented in Figure 16 show the impact of the RF front-end on the global performance for the QPSK-1/2 configuration. It can be seen that the RF front-end introduces some degradation (e.g., around a 1.5 dB shift at a BER of $10^{-6}$). At high SNR, an error floor of $10^{-8}$ cannot be overcome. Such an error floor will impact the performance even further when weaker channel coding schemes are used. One potential explanation for this phenomenon is the impact of IQ mismatch. IQ mismatch compensation schemes have been presented in the literature [13] [14] [15]. Simulations have provided information on the effects of the IQ mismatch and the frequency offset, as well as on the capabilities of the correction algorithms.
In the simulation results presented in Figure 17 (a), the white noise coming from IQ mismatch intercarrier interference is higher than the front-end thermal noise. Error floor effects are then similar to those observed on the prototype. Results show a good BER improvement with IQ mismatch correction, even if IQ mismatch estimation is degraded at low SNR. In Figure 17(b) , the corrected system curve meets the "no mismatch" curve.
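To make the principle of IQ mismatch correction concrete, the sketch below uses a common baseband model in which the received sample is a mix of the ideal sample and its complex conjugate, r = αx + βx*. Both this model and the inversion shown are assumptions for illustration: the text does not specify which correction scheme was simulated, and the sketch assumes that the gain and phase mismatch are already known rather than estimated.

```python
import cmath


def iq_coefficients(gain: float, phase: float):
    """(alpha, beta) such that r = alpha * x + beta * conj(x).

    gain is the linear amplitude ratio between the Q and I branches and
    phase the quadrature error in radians (assumed mismatch model).
    """
    alpha = (1 + gain * cmath.exp(-1j * phase)) / 2
    beta = (1 - gain * cmath.exp(1j * phase)) / 2
    return alpha, beta


def apply_mismatch(x: complex, alpha: complex, beta: complex) -> complex:
    """Distort an ideal baseband sample with the IQ mismatch model."""
    return alpha * x + beta * x.conjugate()


def correct_mismatch(r: complex, alpha: complex, beta: complex) -> complex:
    """Invert r = alpha * x + beta * conj(x) for known alpha and beta."""
    det = abs(alpha) ** 2 - abs(beta) ** 2
    return (alpha.conjugate() * r - beta * r.conjugate()) / det
```

With no mismatch (gain 1, phase 0) the coefficients reduce to α = 1 and β = 0, so the correction leaves the samples unchanged.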
CONCLUSION
The multicarrier spread spectrum prototype presented in this paper achieves data rates that cover the needs of most WPAN applications. Since WPAN transceivers are likely to equip battery-operated devices, it is important that hardware complexity remains reasonable. The MC-SS air interface described herein has a complexity which is close to that of WLAN transceivers while achieving better robustness under WPAN channel conditions. Many hardware-related tradeoffs had to be made for the implementation. The presented choices have shown that the implementation loss remained reasonable while hardware complexity was kept as low as possible. Preliminary measurements have shown that the degradation introduced by the baseband implementation is consistent with simulation results. Measurements including the RF show that an error floor appears at high SNR values. Among the potential sources of degradation is the IQ mismatch. This impairment can be compensated at the baseband by efficient correction schemes that have already proved their effectiveness through simulation.
