In this paper, the potential of reconfigurable hardware in wireless communication systems is evaluated. As most of the published works aim at the usage of reconfigurable architectures as a universal platform for different standards, the optimization capabilities of reconfiguration techniques for hardware modules within one standard are often not considered. This work focuses on the hardware optimization within one transmission scheme, namely OFDM (Orthogonal Frequency Division Multiplexing). By making use of special characteristics of packet-based WLANs and standard specifications, functional blocks of an OFDM receiver can be mapped on the same hardware. Even performance enhancements can be achieved without any additional hardware.
tion. Hence, many research efforts are devoted to reconfigurable architectures supporting different standards based on the same transmission scheme. Such architectures require less flexibility as the basic operations of a transceiver for one specific transmission scheme are mostly the same. For example, OFDMbased (Orthogonal Frequency Division Multiplexing) WLAN standards like IEEE 802.11a and HiperLAN/2 show a huge similarity in the physical layer. The usage of reconfigurable architectures for supporting both standards seems to be a promising application field.
The flexibility required for supporting different standards of the same transmission scheme can also be used to improve the performance within one standard. WLAN standards show certain characteristics which can be used to reduce the hardware effort of receivers. Dynamic reconfiguration can be applied as certain functions within a receiver are not required permanently. By mapping these functions onto the same hardware components, the released resources can be used for the sake of optimizing the receiver performance.
The rest of the paper is organized as follows: in section 2, the basic principles of OFDM-based WLANs are presented. Based on these analyses, reconfiguration capabilities of OFDM receivers are shown in section 3. Section 4 gives a design example for a function-specific dynamically reconfigurable hardware design for the blocks depicted in the last section. Finally, a conclusion is given in section 5.
OFDM-based WLANs
The usage of OFDM as a transmission scheme for emerging WLAN standards like HiperLAN/2 [Hip, 2001] and IEEE 802.1 la[IEEE802. 1 la, 1999] has proved its suitability for high data-rate wireless communication systems. The basic principle behind OFDM consists in splitting the available bandwidth into many narrowband subchannels. A high transmission rate can be achieved by sending a large number of subcarriers in parallel as lower rate data streams. The discrete-time equivalent of a complex OFDM signal can be obtained by means of the IDFT (Inverse Discrete Fourier Transform) [Nee and Prasad, 2000] . Therefore the receiver can demodulate the transmitted signal by applying the DFT (Discrete Fourier Transform). Thus the FFT is essential for an OFDM receiver and is also one of the most computation-intensive blocks besides the Viterbi decoder. The complete structure of an OFDM receiver is shown in figure 1 .
The first functional block in the baseband processing for an OFDM receiver is the synchronization unit. Its aim is to detect an incoming data packet and to compensate for a potential frequency offset. Therefore special preambles at the beginning of a transmission are used. After removing a cyclic prefix which was introduced at the transmitter side in order to avoid intersymbol interference, the signal is transformed from the time-domain back to the frequency-domain by applying a FFT. For the compensation of the channel distortion, a channel estimation is done at the beginning of each transmission burst. The estimates are used by the equalizer for the compensation of the channel distortion by means of a zero-forcing equalization. In addition, channel state information are generated by using the channel estimates. The channel state information are used to weight the equalized soft values from the equalizer. Prior to the Viterbi processing of the soft values, the interleaving and puncturing have to be reversed. Finally the data bits are descrambled.
Transmission Format
The HiperLAN/2 standard defines a basic frame with a length of 2 ms, which comprises five different burst types. For a downlink connection, two types are defined, a BCB (Broadcast Burst) and a DLB (DownLink Burst) [Hip, 2001] . Figure 2 shows the structures of these two burst types. The preamble of the BCB is divided into three symbol groups. The A and B symbols enable frame synchronization, automatic gain control and frequency synchronization. The C symbols are used for channel estimation as well as for fine frequency synchronization. While the C symbols are included in both burst types, the A and B symbols are only used for the BCB. As a DLB is preceded by a BCB, a new frame and frequency synchronization is not required for the DLB.
For the WLAN standards IEEE 802.11a and HiperLAN/2, a channel raster of 20 MHz is used. With a subcarrier spacing of 312.5 kHz, this results in 64 subchannels. For one OFDM symbol channel, 11 guard-carriers are defined in order to achieve a sufficient adjacent channel suppression. Moreover, the DC-carrier is not used either. The remaining 52 subchannels are used for transmission. Out of these 52 subchannels, 4 pilot-carriers are scattered among the frequency spectrum and thus 48 subcarriers are used for data transmission. Figure 3 shows the frequency spectrum allocation. 
Reconfiguration Capabilities
Reconfiguration always comes at cost of hardware efficiency. Due to the required high computational power of modern wireless communication systems, the use of reconfigurable architectures for wireless systems based on different transmission schemes seems not to be advisable. By restricting the flexibility to function-specific reconfigurable designs, the hardware overhead can be reduced significantly [Zhang and Bodersen, 2000] . Such hardware architectures will be specific to a transmission scheme but not to a special standard. In the following, reconfiguration capabilities based on WLAN and standard characteristics are analyzed. Notice that computation-intensive tasks like FFT and Viterbi decoding are not considered in the following. Due to the high data rates up to 54 MBits/s, these tasks are often realized with specialized hardware designs. This specialization hampers the utilization of these hardware blocks for reconfiguration.
Reconfiguration Options based on WLAN Characteristics
The main option enabling reconfiguration is the fact that not all functional blocks within an OFDM receiver are active while processing a burst. During timing and frequency synchronization, no payload processing is done (neglecting a pipelined processing of the last burst). On the other side, while decoding the payload, synchronization has already been achieved. This raises the idea of mapping both tasks onto the same hardware. Figure 4 shows the block diagram of a synchronization unit. The synchronization process can be divided into two steps. In the first step, the time and frequency values are determined. Therefore the repetition peambles within the received data sequence r have to be detected. This process is based on the computation of the complex correlation Ns-l m=0 and the power sum of a synchronization windows of length A^^
The hardware structure for the symbol correlator is given in figure 5 .
There exist several metrics which use the complex correlation and the power sum to determine the synchronization point and the frequency offset. In this work, a combination of the MNC (Maximum-Normalized-Correlation) scheme [Schmidl and Cox, 1997] and the MMSE (Minimum-M^an-Square-Error) scheme [Chevillat et al., 1987] is used, which was proposed in [Kabulepa et al., 2002] . While the MMSE scheme has shown to be efficient in continuous OFDM applications, it tends to increase false alarm probabilities in burst-oriented OFDM systems when no data frame is transmitted. By applying the MNC scheme to detect the existence of a data frame and using the MMSE scheme to determine the exact synchronization point, it was possible to combine the advantages of both schemes. The hardware structure for this synchronization scheme is given in figure 6 . When the synchronization point has been detected, the received data are aligned in a second step. Therefore, FIFOs are used for the time aUgnment and a CORDIC (Coordinate Rotation Digital Computer) is used for the frequency correction. The CORDIC is also required for angle calculation during the frequency offset estimation during the first phase. Thus, only the hardware components realizing the synchronization scheme as depicted in figure 4 can be reused as these modules are not required all the time.
Once synchronization has been achieved, payload processing can be started. Focusing on the functional blocks succeeding the FFT, the payload processing Starts with equalization. Equalization (zero-forcing) can be performed by multiplying the received signal dn{i) with the estimated inverse channel transfer function C~^.
resulting in the transmitted signal dn{i) and a noise term Nn. The channel estimates Cn can be achieved by using the two C-symbols at the beginning of each burst. Equation 4 shows the computation of the channel estimates according to the preamble structure of IEEE 802.1 la and HiperLAN/2.
^ _ d^(0) + dT(l) .,,
In case of a non perfect synchronization, the equalized data may still show a remaining phase rotation. In order to compensate for this impairment, the phases of the pilot-carriers can be monitored. By averaging the phase rotations of the pilot-carriers in an OFDM symbol, a correction value (p{i) can be determined which is used to rotate the equalized data. This value can also be used to update the channel estimates, resulting in an improved equalization for succeeding OFDM symbols. Equation 5 presents a method for averaging the phase rotations of the pilot-carriers under consideration of the channel state. Np gives the number of pilot-carriers, dj'{i) are the equalized pilots and Cj'{i) are the channel estimates for the pilot-carriers. A more detailed description of this method can be found in [Pionteck et al., 2003a] .
E \Cfii) Figure 7 shows the hardware structure of a zero-forcing equalizer with phase rotation compensation. The grey area highlights the components which are used for zero-forcing equalization only. It was possible to avoid the division of the equalized data by the squared absolute values according to equation 3, as the succeeding functional block, the soft value generation, requires a multiplication of the data with this factor.
For the soft value generation, the LLR (Log-Likelihood Ratio) is used as a measure for the reliability of a decision. The sign of the LLR denotes the symbol (±1) while the absolute value represents the realibility of the decision. Especially for higher modulation schemes, the computation of the LLR function is very complex. Therefore, a simplified version is used, which was presented by [Tosato and Bisaglia, 2002] . Let k be the dimension of the modulation scheme and |Gn(i)P be the channel state information of subcarrier n. The half distance between the partition boundaries of the constellation for A; > 1 is given by ak^n-According to [Tosato and Bisaglia, 2002] , an approximation of the LLR function for a square QAM constellation is given by:
The corresponding hardware structure for A: < 3 is given in figure 8 . The multiplication of dn{i) with \Gn{i)\'^ can be avoided by adapting the equalizer as mentioned before.
The hardware structures of the presented functional blocks show similarities to the symbol correlator and the power sum computation module. Due to the non-overlapping execution times, the use of a common hardware platform for these functional blocks seems to be advisable. The usually introduced hardware overhead through dynamic reconfiguration can be avoided as the reconfiguration capabilities are very function-specific. A hardware platform for these functional blocks is presented in section 4. 
Reconfiguration Options based on Standard Characteristics
Another way of finding reconfiguration options is the analysis of the standards to be implemented. Focusing on the HiperLAN/2 standard, the different characteristics of the two burst types (BCB and DLB) for a downlink connection can be exploited. The BCB is used to transmit control information to all users while the DLB contains the payload for one specific user. Due to the importance of the BCB, simple modulation schemes are used to make sure that all users are able to decode the control information. In HiperLAN/2, the BPSK modulation scheme is exclusively defined for the BCB, while for the DLB more complex modulations schemes like 64QAM can be used. It is evident that higher modulation schemes require more complex decoding algorithms. As for the BCB only BPSK is used, hardware required for the demodulation of 64QAM is idle. This hardware can be used to improve the performance of the receiver without any costs.
Based on the functional blocks introduced in section 3, the soft values generation can be simplified. For BPSK, the soft information is proportional to the distance from the decision boundary [Tosato and Bisaglia, 2002] . Thus, the hardware blocks for generating the channel state information and soft values are not required for the BCB. All further hardware blocks presented in section 3 are independent of the modulation scheme and thus are required for both burst types.
The freed hardware resources can be used to improve the timing synchronization. OFDM systems are very sensitive to timing synchronization errors [Belcskei, 2001] . Even a small misalignment leads to a degradation of the system performance. The sensitivity increases with the complexity of the modulation scheme. In the presence of a timing offset At, where At is an integer multiple of the sampling period, the received signal can be written as 
Thus a timing offset results in a phase rotation of the subcarriers, which can easily be detected by monitoring the pilot-carriers. A possibility to estimate a timing offset is to determine the best fit line of the phase rotations of the pilotcarriers. With linear regression, the influence of the channel can be reduced. Figure 9 shows the best fit line for a timimng offset of -1. By monitoring the slope of the best fit line during the BCB, a residual timing offset can be detected reliably [Pionteck et al., 2003b] . A receiver can compensate for the estimated timing offset before starting the processing of the DLB. The required hardware effort is marginal. The calculation of the phase of the pilot-carriers, which is the most complex task, is already done in the equalizer when performing phase rotation compensation. For correcting the offset, no additional hardware is required as only memory positions inside the synchronization module have to be updated. 
where NBCB denotes the number of considered OFDM symbols, Np gives the number of pilot-carriers, Xj and Xk are the subcarriers at position i,j and (Pj{i) is the phase of jth pilot-carrier of the ith OFDM symbol. As some of the pilot-carriers are rotated, an addition of the phase of these pilot-carriers with n is required before determining the slope of the best fit line. The timing offset estimator can be realized with only a few hardware components. This allows the realization of this optimization method onto the hardware blocks of the channel state information and soft value generation. By means of a dynamic reconfiguration between the BCB and DLB, no additional hardware for the timing offset estimator is required. The division in equation 10 can be avoided by adapting the reference values which are used to determine the size of the timing offset.
Hardware Design
The previous section indicates that several possibilities exist where dynamic reconfiguration can reduce the hardware requirements of a receiver within one transmission scheme. These function-specific reconfigurations can be used either as optimizations within a dynamically reconfigurable architecture or for a stand-alone implementation. A possible stand-alone implementation for the usage inside an ASIC is presented in the following. Figure 10 shows the datapath of the proposed hardware architecture. For the sake of simplicity, not all feedback connections and pipeline registers are shown. Memory blocks which can be used at different positions in the datapath are drawn separately. Each of these memory blocks has a capacity of 52 Bytes, capable of storing the real or imaginary part of a complete OFDM symbol. The remaining memory blocks provide a capacity of 104 Bytes. For the CORDIC elements a word-parallel architecture was chosen. This increases the hardware requirements but leads to a reduced latency for the complete system. The same holds true for the architecture of the divider.
The design comprises all arithmetic blocks and datapaths for realizing the functional blocks discussed in the last section, including the timing offset estimator for the BCB. Dynamic reconfiguration enables the switching between different functional blocks at runtime. By making use of timing and scheduling properties of the different tasks, the functional blocks can be grouped into three different configuration mappings. Figure 11 shows the mapping of the functional blocks onto the proposed hardware design. mapping A The first mapping realizes the combination of the MMSE and the MNC scheme as described in section 5. According to the numbering in figure 10 , the result of the MMSE scheme is provided at position (s) and the result of MNC scheme at position (4) and (6) + mapping B The second mapping realizes the channel estimater, equalizer, channel state information and soft values generator. This mapping can be used for payload processing in HiperLAN/2 (DLB and BCB) and IEEE 802.11a. Modulation schemes up to 64QAM are supported. Input (T) provides the C symbols which are used to calculate the channel estimates. These values are stored in memory block 2 for the equalization of the data symbols. The incoming data symbols dn{i) at input (5) are stored in memory block 3. In parallel the first data symbols are equalized. As soon as all pilot-carriers have been received, the computation of the correction value for the phase rotation is started. In the meantime, memory block 1 is used as a buffer for the data symbols. The phase compensation of the equalized data is done in CORDIC block 2, while CORDIC block 1 is used to update the channel estimates. The equalized and rotated data symbols are available at position (s). If only zero-forcing equalization is to be performed, the result is available at position (9). For the soft value generation, the equalized data is fed back to the architecture. According to the used modulation scheme, the soft decision values are provided at position (7) or @ .
mapping C The third mapping realizes the channel estimator, equalizer and timing offset estimator as described in subsection 3.2. Generation of soft values and channel state information are not required as BPSK is used as modulation scheme. Thus this mapping is specific to HiperLAN/2 as it exploits standard characteristics for the BCB of the downlink connection. The result of the timing offset estimation is available at position
JLJ:
Synchronization The configuration memory comprises 48 Bytes and is divided into two sections. In the first section, three different configuration sets can be stored. The second section is used to provide some general settings which are applicable to all of the three configuration sets. These settings include information about the preamble structure (HiperLAN/2 or IEEE 802.11a) and the position of rotated pilot-carriers. The architecture allows a single cycle context-switch between different mappings, as it is required for a continuous data processing during a burst. The hardware architecture is controlled by an external device (e.g., microcontroller), which also manages the reconfiguration process. This external device has to implement functions of the data link layer according to the OSI layer model, as only this layer has knowledge of the type of a burst. The switching between the preamble processing and the payload processing of a burst is controlled by the architecture itself.
The proposed dynamically reconfigurable design was implemented using a 0.35^m standard CMOS process with 3 metal layers. Synthesis was done with the Design Analyzer from Synopsys. Table 1 shows the synthesis results of the different functional blocks and of the proposed function-specific dynamic reconfigurable architecture normalized to the area of an eight bits multiplier. The maximum clock frequency of the timing offset estimator, equalizer and the reconfigurable architecture is limited by the critical path inside the unpipelined word-parallel CORDIC. When using a pipelined or word-seriel CORDIC, the maximum clock frequency of the reconfigurable architecture is about 100 MHz. Compared to a fixed stand-alone implementation of the presented functional blocks, the proposed dynamically reconfigurable design significantly reduces the hardware effort while providing a higher flexibility than a fixed design. The proposed architecture has a normalized area of about 153. Compared to the normalized area of 178.5 for implementing all functional blocks separately, hardware savings of about 20% could be achieved by using the reconfigurable platform. The hardware overhead introduced by dynamic reconfiguration is very low (about 6%), as only three configurations have to be stored. Even performance optimizations by improving the timing synchronization were to be realized without increasing the hardware effort.
5, Conclusion
In this work, reconfiguration capabilities for OFDM-based WLANs are explored. By focusing on WLAN and standard characteristics, a set of functional blocks suitable for a dynamic reconfiguration were extracted. The blocks were chosen according to the division of the data processing into preamble and payload processing. The reconfiguration possibilities could be used either for optimization purposes within a dynamically reconfigurable architecture or for a stand-alone implementation. As an example for a stand-alone implementation, the extracted functional blocks were realized onto a function-specific dynamically reconfigurable hardware design, offering significant savings in terms of area. Making use of characteristics in the HiperLAN/2 standard, it was also possible to use the presented hardware design to improve the receiver performance by optimizing the timing synchronization. The work shows that the use of reconfiguration techniques is not only worthwhile for hardware designs for different standards but also for optimizations within a standard. By offering a restricted set of reconfiguration capabilities reduces the overhead introduced by dynamic reconfiguration and eases the use of reconfigurable hardware designs.
6.
