Abstract: As the number of wireless devices, services, communication standards and respective modes of operation rapidly grows, the design of reconfigurable digital baseband processing systems for radio devices becomes more important and challenging. Long Term Evolution (LTE) is among the most relevant wireless systems in 4G communications and its waveform is OFDM-based. According to the LTE mode of operation, OFDM parameters may change and influence baseband processing operations. This paper presents a dynamically reconfigurable LTE-compliant OFDM modulator for Downlink transmission able to adapt its internal hardware organization on demand according to the digital modulation scheme and OFDM parameters, such as number of data subcarriers, IFFT size, Cyclic Prefix and window length. System reconfiguration is performed by employing FPGA-based Dynamic Partial Reconfiguration (DPR) techniques. The worst-case DPR latencies measured are 895 µs and 1.192 ms for digital modulation and channel bandwidth adaptation, respectively. These results show that the adopted design approach is feasible in wireless baseband processing systems. Power estimations suggest that circuit specialization at run-time can potentially improve system power efficiency.
I. INTRODUCTION
Wireless communications have experienced massive growth, resulting in a near omnipresence in our lives through a wide range of services. This growth is unlikely to stop, as the number of users, wireless protocols and respective modes of operation is continuously increasing. Current and future base stations and mobile terminals should therefore support several standards and, as it is not plausible that all those standards will be simultaneously enabled, radio devices should also be able to switch their operation mode according to the communication needs. Thus, the design of flexible and reconfigurable hardware infrastructures is an important challenge in wireless communications [1]. Concerning baseband processing, OFDM is the dominant waveform in current wireless communications, as well as a strong candidate for 5G waveforms [2]. Depending on the communication environment conditions and available resources, OFDM systems can be adapted by tuning parameters such as FFT size, cyclic prefix (CP) length or modulation scheme. 3GPP Long Term Evolution (LTE) is among the wireless standards that use OFDM, in particular for Downlink transmission. LTE-Advanced is a key player in 4G communications, having introduced techniques such as carrier aggregation [3] to increase available bandwidth (LTE-Advanced Release 10). The LTE Physical layer (PHY) supports six modes of operation for different bandwidths and amounts of communication resources. These aspects influence the OFDM datapath used for baseband processing and, consequently, the amount of resources required for hardware implementation. A straightforward method to handle this variability is to have separate implementations for each mode of operation. However, this is highly inefficient in terms of logic resource usage and circuit area occupation.
Moreover, in the event of standard upgrade and inclusion of new modes of operation, the whole hardware infrastructure needs to be redesigned, causing a negative impact in terms of development time.
Alternatively, the LTE OFDM modulator can be reconfigured at run-time so that the available hardware resources are dynamically allocated to fulfil the immediate system demands. This could lead to improved resource utilization, smaller circuit area and higher degrees of adaptability and upgradability. The implementation of this kind of infrastructure requires the adoption of appropriate hardware platforms. FPGAs are good candidates to implement flexible and reconfigurable structures for baseband processing because they provide a fair compromise between throughput, flexibility and power consumption. Additionally, SRAM-based FPGAs offer the opportunity to exploit Dynamic Partial Reconfiguration (DPR), the reconfiguration of logic fabric portions at run-time, which can further extend the levels of flexibility and adaptability. However, DPR-based design presents challenges related to system operation integrity, reconfiguration latency and power consumption overhead. These factors are important due to energy and cost-efficiency concerns in the design of future wireless communication devices [2]. This paper presents an FPGA-based flexible, resource-efficient and reconfigurable LTE-compliant OFDM modulator for Downlink (DL) transmission. The implemented OFDM modulator supports the modes of operation defined in the LTE standard (for 1.4, 3, 5, 10, 15 and 20 MHz channel bandwidth) and, through DPR, it is able to adapt the mode of operation at run-time. This feature allows computation specialization to the immediate system demands [4] and efficient use of hardware resources. Furthermore, upgrades to the current design can be easily done to accommodate other wireless standards. This work is within the scope of the research activities presented in [5] and extends the exploitation of FPGA-based DPR techniques in Software Defined Radio (SDR) and Cognitive Radio (CR) systems initiated in [6] and [7].
978-1-5090-4565-5/16/$31.00 © 2016 IEEE
The paper is structured as follows: Section II summarizes related work on reconfigurable OFDM modulators based on FPGAs; Section III presents fundamental background about the LTE frame structure and OFDM modulation for Downlink transmission; Section IV describes the implemented reconfigurable LTE-compliant OFDM modulator; Section V evaluates the implemented design and discusses the results; finally, Section VI presents some conclusions.
II. RELATED WORK
There are several works on reconfigurable OFDM modulators based on FPGAs. Zhang et al. [8] presented an architecture for a multi-standard OFDM modulator supporting DAB systems and Chinese Mobile Multimedia Broadcasting (CMMB) transmissions. However, this modulator does not support LTE and reconfiguration is achieved by setting some communication parameters. This means that the actual hardware implemented is static and designed for the worst-case scenario (e.g., Inverse Fast Fourier Transform (IFFT) computation for different sizes is done using an IFFT for the largest size and zero-padding techniques). In [9], the proposed OFDM modulator supports all LTE modes of operation, except for 15 MHz bandwidth. The communication parameters adjusted on-the-fly are frame size, CP length, IFFT size, modulation type and pilot values. The system is reconfigured through multiplexer-based techniques and parameter adjustment, leading to an inefficient use of resources. This work presents only post-synthesis results and does not present any power consumption estimations.
An architecture for SDR transmitter baseband processing and Digital Front-End is proposed in [10] . DPR-based techniques are used to dynamically adapt FFT size, modulation scheme and CP length. The design supports different modes of operation from different standards. Regarding LTE, it only supports two modes of operation (5 and 10 MHz bandwidth). However, the main focus is not on the standard implementation details, but on showing the benefits of using DPR in SDR systems, in terms of flexibility and better resource utilization.
Compared with these approaches, our work aims at designing an OFDM modulator compliant with all LTE modes of operation for Downlink transmission and at achieving high degrees of flexibility, adaptability, power and resource usage efficiency through DPR.
III. FUNDAMENTAL BACKGROUND
Time domain LTE signals are structured into frames of 10 ms. Each frame comprises ten subframes of 1 ms and, in turn, each subframe is composed of two slots with a length of 0.5 ms. Slots consist of seven (for normal CP length) or six (for extended CP length) OFDM symbols. The smallest physical resource in LTE, the resource element (RE), consists of one subcarrier during one OFDM symbol. These REs are grouped into resource blocks (RBs): the smallest resource unit that can be allocated to a user. An RB comprises twelve consecutive subcarriers in the frequency domain and one slot in the time domain. Figure 1 depicts the structure of an LTE frame.
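The frame arithmetic described above can be checked with a few lines of Python. This is only an illustrative sketch: the constants are the values quoted in the text, while the function names are hypothetical helpers introduced here.

```python
# LTE downlink grid arithmetic, using only the values quoted in the text.
SUBFRAMES_PER_FRAME = 10          # 10 ms frame, 1 ms subframes
SLOTS_PER_SUBFRAME = 2            # 0.5 ms slots
SUBCARRIERS_PER_RB = 12           # resource block width in frequency
SYMBOLS_PER_SLOT = {"normal": 7, "extended": 6}


def symbols_per_frame(cp: str) -> int:
    """OFDM symbols in one 10 ms frame for the given CP type."""
    return SUBFRAMES_PER_FRAME * SLOTS_PER_SUBFRAME * SYMBOLS_PER_SLOT[cp]


def res_per_rb(cp: str) -> int:
    """Resource elements in one resource block (12 subcarriers x 1 slot)."""
    return SUBCARRIERS_PER_RB * SYMBOLS_PER_SLOT[cp]


if __name__ == "__main__":
    for cp in ("normal", "extended"):
        print(cp, symbols_per_frame(cp), res_per_rb(cp))
```

With normal CP this gives 140 OFDM symbols per frame and 84 REs per RB; with extended CP, 120 symbols and 72 REs.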
Considering an array of Downlink subframe resources as an input, an LTE OFDM modulator for DL transmission should perform the sequence of operations shown in Figure 2. After modulation with QPSK, 16-QAM or 64-QAM schemes, input data should be mapped to the central subcarriers of the IFFT input array and a null DC component must be inserted (Subcarrier Mapping). Then, the IFFT input array is reordered so that the DC component moves from the middle of the array to its first element, which is the ordering the IFFT expects (IFFT Shift). The actual OFDM modulation is performed through IFFT computation, which can be achieved by using FFT cores and swapping real and imaginary parts of the input and output data arrays. To mitigate Inter-Symbol Interference (ISI) between adjacent OFDM symbols, caused by time dispersion in multipath channels, part of the end of the symbol is copied and prepended to the beginning of the symbol. This operation is called Cyclic Prefix insertion. Additional post-IFFT processing is performed to improve Out-of-Band (OOB) performance: Windowing (multiplication of both symbol head and tail by a window) and Overlap-and-Add (the tail of a symbol is overlapped and added with the head of the next symbol). Among these operations, the IFFT is the most demanding in terms of arithmetic complexity and resource usage. The remaining operations mainly consist of storing and forwarding data with little arithmetic computation.
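As a software illustration of this chain (not the hardware design itself), the sketch below performs Subcarrier Mapping, IFFT Shift, IFFT computation through a forward FFT with swapped real and imaginary parts, and Cyclic Prefix insertion for a single symbol. Several things are assumptions made for the example: the function names are hypothetical, a naive DFT stands in for an FFT core, the centred array is taken to hold DC at index N/2 so that the shift rotates it to index 0, and Windowing and Overlap-and-Add are omitted.

```python
import cmath


def fft(x):
    """Naive O(N^2) forward DFT; a stand-in for a hardware FFT core."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def swap(x):
    """Swap the real and imaginary parts of every sample."""
    return [complex(v.imag, v.real) for v in x]


def ifft_via_fft(x):
    """IFFT obtained from a forward FFT by swapping re/im at input and output."""
    n = len(x)
    return [v / n for v in swap(fft(swap(x)))]


def map_subcarriers(data, n_fft):
    """Place data on the central bins of an n_fft array, leaving DC null.

    Assumes an even number of data subcarriers and DC at index n_fft // 2.
    """
    if len(data) % 2 or len(data) > n_fft - 2:
        raise ValueError("need an even number of subcarriers that fits n_fft")
    half, mid = len(data) // 2, n_fft // 2
    grid = [0j] * n_fft
    grid[mid - half:mid] = data[:half]            # negative frequencies
    grid[mid + 1:mid + 1 + half] = data[half:]    # positive frequencies
    return grid


def ifft_shift(grid):
    """Rotate the array so the bin at index n // 2 lands at index 0 (even n)."""
    mid = len(grid) // 2
    return grid[mid:] + grid[:mid]


def ofdm_symbol(data, n_fft, cp_len):
    """One time-domain OFDM symbol with its cyclic prefix prepended."""
    time = ifft_via_fft(ifft_shift(map_subcarriers(data, n_fft)))
    return time[len(time) - cp_len:] + time       # CP = copy of the symbol tail


if __name__ == "__main__":
    qpsk = [complex(1, 1), complex(-1, 1), complex(1, -1), complex(-1, -1)]
    print(len(ofdm_symbol(qpsk, n_fft=8, cp_len=2)))
```

The swap trick works because swapping real and imaginary parts is equivalent to conjugation followed by multiplication by j, so the two swaps around a forward FFT reproduce the inverse transform up to the 1/N scaling.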
According to the LTE mode of operation, some OFDM parameters may vary and influence the operations previously described. Table I presents these parameters for each LTE mode of operation.
IV. IMPLEMENTATION
In the DPR design flow, parts of the FPGA programmable logic (PL) are defined as run-time reconfigurable, the reconfigurable partitions (RPs), and their functionality can be modified by loading a partial bitstream containing information about the configuration of an RP. For each possible RP configuration, a partial bitstream is generated and its size mainly depends on the RP area. During the partial reconfiguration process, the partial bitstreams are fetched from a certain location (e.g., DDR memory) and sent to the FPGA configuration port. Thus, the reconfiguration latency is proportional to the partial bitstream sizes. In turn, by minimizing the reconfiguration latency, it is possible to diminish the power consumption overhead introduced by DPR [11].
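The proportionality between bitstream size and reconfiguration latency can be expressed as a simple lower-bound model. The sketch below is an assumption-laden illustration: the bitstream sizes are hypothetical, the function name is invented for the example, and the port rate assumes a 32-bit configuration port clocked at 100 MHz with no transfer overhead.

```python
# Assumed ideal port rate: 4 bytes per cycle at 100 MHz (no DMA setup cost).
PORT_BYTES_PER_S = 4 * 100e6


def reconfig_latency_s(bitstream_sizes_bytes, port_bytes_per_s=PORT_BYTES_PER_S):
    """Lower bound on latency when partial bitstreams are streamed back to back."""
    return sum(bitstream_sizes_bytes) / port_bytes_per_s


if __name__ == "__main__":
    # Hypothetical partial bitstream sizes for two partitions, in bytes.
    sizes = [150_000, 250_000]
    print(f"{reconfig_latency_s(sizes) * 1e3:.3f} ms")
```

Under this model, halving the total partial bitstream size halves the latency, which is why the partitioning choices discussed next matter.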
Consequently, the success of DPR application depends on how the system is partitioned and on the granularity level considered for reconfiguration. One possible approach could be to consider an RP for each module in the OFDM datapath from Figure 2. However, this approach would neglect possible common hardware blocks among different configurations and result in a considerable number of partial bitstreams. The opposite approach is to define a single RP embracing the whole OFDM datapath. As in the previous approach, commonalities between configurations would be ignored, but the number of partial bitstreams would be smaller. In this case, each partial bitstream is considerably larger because it contains configuration data for the whole processing chain.
The proposed approach attempts to conveniently partition the system, such that the potential of DPR is explored, while keeping the logical structure of the OFDM modulator processing chain.
A. Reconfigurable OFDM Modulator Architecture
The OFDM datapath was analysed and the nature of the operations performed in each module was studied. Although the Table I parameters vary for all modes of operation, it is possible to identify hardware blocks used in every configuration. This was observed in the IFFT module. In our implementation, the IFFT core follows a Mixed-Radix-2²/2/3 algorithm and a Pipelined Single-Delay Feedback architecture, as in [6]. All IFFT sizes in Table I are multiples of 64, meaning that, for every size, the IFFT pipeline will have a hardware infrastructure to compute 64-FFTs. So, this hardware infrastructure is common to all configurations and can remain in the static part of the system, while the remaining IFFT logic is implemented in one or more RPs.
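The observation that a 64-point block is shared by every configuration can be illustrated by factoring each IFFT size. The sizes below are the standard LTE values for the six channel bandwidths and are assumed here to match Table I, which is not reproduced in this text; the helper names are hypothetical.

```python
# (channel bandwidth in MHz, IFFT size): standard LTE values, assumed to
# correspond to Table I of the paper.
IFFT_SIZES = [(1.4, 128), (3, 256), (5, 512), (10, 1024), (15, 1536), (20, 2048)]
COMMON_CORE = 64  # size of the FFT block kept in the static region


def prime_factors(n):
    """Prime factorisation by trial division (adequate for small n)."""
    factors, p = [], 2
    while n > 1:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    return factors


def residual_stages(n_fft):
    """Radix factors left to the reconfigurable logic once 64 is factored out."""
    if n_fft % COMMON_CORE:
        raise ValueError(f"{n_fft} is not a multiple of {COMMON_CORE}")
    return prime_factors(n_fft // COMMON_CORE)


if __name__ == "__main__":
    for bw, n in IFFT_SIZES:
        print(f"{bw} MHz: {n} = {COMMON_CORE} x {n // COMMON_CORE}, "
              f"extra stages {residual_stages(n)}")
```

Only the 15 MHz mode leaves a radix-3 factor (1536 = 64 × 24 = 64 × 2³ × 3), which is consistent with the mixed-radix algorithm mentioned above.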
The OFDM modulator presented in this paper considers reconfiguration at two levels: LTE channel bandwidth and digital modulation scheme. The former is directly related with the parameters in Table I; the latter refers to the possible digital modulation schemes used for LTE signals: QPSK, 16-QAM and 64-QAM. The channel bandwidth in use does not impose a specific digital modulation scheme. Instead, it is chosen depending on channel conditions such as Signal-to-Noise Ratio (SNR). Typically, low-order modulation schemes (like QPSK) are used in lower SNR scenarios and result in smaller data rates, whereas high-order modulation schemes are chosen when SNR is larger, allowing higher data rates.
[Figure 3: OFDM modulator datapath, with blocks RP1, RP2, FFT-64 (Static), RP3 and RP4.]
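To make the difference between the three schemes concrete, the following sketch maps bit groups to constellation points. It follows the Gray-coded closed-form expressions commonly used for the LTE constellations (3GPP TS 36.211), written from memory and best checked against the specification; the function names are hypothetical and this is a software illustration, not the logic placed in the reconfigurable partition.

```python
from math import sqrt

BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4, "64QAM": 6}
NORM = {2: sqrt(2), 4: sqrt(10), 6: sqrt(42)}  # gives unit average power


def _axis(bits):
    """Map the bits of one axis (sign bit first) to an odd-integer level."""
    sign = 1 - 2 * bits[0]
    if len(bits) == 1:
        return sign
    if len(bits) == 2:
        return sign * (2 - (1 - 2 * bits[1]))
    return sign * (4 - (1 - 2 * bits[1]) * (2 - (1 - 2 * bits[2])))


def modulate(bits, scheme):
    """Map a flat bit list to complex symbols; even-indexed bits drive I, odd Q."""
    m = BITS_PER_SYMBOL[scheme]
    if len(bits) % m:
        raise ValueError(f"{scheme} needs a multiple of {m} bits")
    symbols = []
    for i in range(0, len(bits), m):
        group = bits[i:i + m]
        symbols.append(complex(_axis(group[0::2]), _axis(group[1::2])) / NORM[m])
    return symbols


if __name__ == "__main__":
    print(modulate([0, 0, 1, 1], "QPSK"))
    print(modulate([0, 0, 1, 0], "16QAM"))
```

The same input bit stream yields one third as many symbols with 64-QAM as with QPSK, which is the source of the data-rate difference mentioned above.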
The datapath architecture for the implemented OFDM modulator is illustrated in Figure 3. It is composed of one static entity and four Reconfigurable Partitions (RPs), whose functionality can be modified through DPR. All arithmetic operations along the OFDM datapath are performed with 16-bit fixed-point precision for real and imaginary parts. RP1 is where the digital modulation scheme is implemented and is the only region that needs to be reconfigured to change the modulation scheme. The static entity implements a 64-FFT. To compute IFFTs for the sizes defined in Table I, additional FFT processing has to be performed before and after the 64-FFT module. The IFFT-related operations performed before and after the 64-FFT core are designated here as pre-64-FFT and post-64-FFT computations, respectively. The remaining computations, together with CP insertion, Windowing and Overlap-and-Add operation, are placed in RP4. The reason behind this design decision is the following: for 15 MHz bandwidth, the IFFT size is 1536 (not a power of two), which makes IFFT computation more complex and resource demanding. RP3 dimensions could be expanded to accommodate enough resources for the later stages of the 15 MHz processing chain. However, this would result in the underutilization of RP3 for most modes of operation. RP4 is left blank (without any implemented logic) for configurations other than 15 MHz. This strategy allows the use of RP4 resources for other system tasks whenever they are not being used by the OFDM modulator. As the final result may be the output of RP3 or RP4, a multiplexer is used to select the correct signal. This multiplexer belongs to the static part of the system and is controlled by an output bit from RP3.
The high-level architecture of the implemented design is shown in Figure 4. A Xilinx Zedboard (FPGA device: XC7Z020-CLG484-1) was used for the hardware implementation. It is a hybrid HW/SW platform composed of two main subsystems: Programmable Logic (PL) and Processing System (PS).
The PL subsystem runs at a clock frequency of 100 MHz and includes the OFDM modulator datapath (Figure 3). This datapath is embedded in an AXI4-Stream [12] IP core which, in turn, is connected to a DMA controller. As the OFDM modulator performs data transfers to and from the DDR memory, the DMA controller is used to reduce the PS involvement in DDR read/write operations and thus improve the OFDM modulator throughput. Furthermore, the PL comprises a Xilinx Internal Configuration Access Port (ICAPE2) primitive to provide access to the FPGA configuration memory. A dedicated DMA controller is employed to improve the reconfiguration throughput by accelerating the transfer of partial bitstreams to the ICAPE2 primitive.
The main constituent blocks within the PS are an ARM Cortex-A9 processor (a dual-core general purpose processor) and a DDR memory controller. The ARM processor is responsible for DPR management: it triggers reconfiguration procedures and determines which RPs should be reconfigured and which partial bitstreams should be sent to the configuration memory.
Upon system start-up, partial bitstreams, input data files and a boot image file are copied from the SD card to the DDR memory. The input data files are then fed to the OFDM modulator, which processes them and stores the results back in DDR memory. To reconfigure the OFDM modulator, partial bitstreams are fetched from DDR and sent to the ICAPE2 primitive. Table II presents resource utilization for the static part of the system, as well as for the four RPs considered. The maximum partial bitstream size for each RP is also shown. When generating the partial bitstreams, Vivado's bitstream compression feature was enabled to reduce the amount of data to be transferred during DPR. For this reason, partial bitstreams for the same RP do not necessarily have the same size.
V. RESULTS AND DISCUSSION
The output results from the OFDM modulator were compared with results produced by the MATLAB function lteOFDMModulate() [13], which performs subcarrier mapping, IFFT shift and computation, CP insertion, raised cosine windowing and overlap-and-add of OFDM symbols in a resource array. This function belongs to the LTE System Toolbox, which provides "standard-compliant functions and apps for the design, simulation, and verification of LTE and LTE-Advanced communication systems" [14]. For all modes of operation, the implemented OFDM modulator produces correct results.
Apart from functional correctness, the OFDM modulator must be able to support sampling rates defined in the standard requirements. The most demanding sampling rate is 30.72 MSamples/s and occurs for the 20 MHz mode of operation. The implemented OFDM modulator is fully pipelined, runs at a frequency of 100 MHz and has a throughput of at least 92.5 MSamples/s, in steady-state operation. This performance is compatible with LTE requirements.
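The throughput margin can be checked with a short calculation. The sketch assumes the standard LTE subcarrier spacing of 15 kHz and the standard IFFT sizes per bandwidth (both are assumptions of the example, as Table I is not reproduced here); the 92.5 MSamples/s figure is the one quoted above.

```python
SUBCARRIER_SPACING_HZ = 15_000  # standard LTE spacing (assumption of this sketch)
# Standard LTE IFFT sizes per channel bandwidth in MHz, assumed to match Table I.
IFFT_SIZE = {1.4: 128, 3: 256, 5: 512, 10: 1024, 15: 1536, 20: 2048}
MODULATOR_THROUGHPUT_SPS = 92.5e6  # steady-state throughput quoted in the text


def sampling_rate_sps(n_fft: int) -> int:
    """Required sampling rate: one IFFT of n_fft samples per symbol period."""
    return n_fft * SUBCARRIER_SPACING_HZ


if __name__ == "__main__":
    for bw, n in IFFT_SIZE.items():
        rate = sampling_rate_sps(n)
        margin = MODULATOR_THROUGHPUT_SPS / rate
        print(f"{bw} MHz: {rate / 1e6:.2f} MSamples/s, margin x{margin:.1f}")
```

For the 20 MHz mode this reproduces the 30.72 MSamples/s requirement, roughly one third of the reported modulator throughput.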
As mentioned in Section IV, OFDM reconfiguration occurs at two levels (digital modulation scheme and LTE channel bandwidth) and DPR latency was evaluated for each level. Modifying the modulation scheme involves the reconfiguration of RP1 only. In this scenario, the worst-case reconfiguration latency is 895 µs (for a reconfiguration throughput of 39 MiB/s), while the average latency is about 801 µs. Here, the reconfiguration throughputs observed are very low, taking into account that the ICAP theoretical limit is 400 MiB/s (considering f_clk = 100 MHz and 32-bit data width). These low throughputs are due to the small size of the partial bitstreams involved in reconfiguration (maximum 38 kB), whose transfer times are comparable to the overhead introduced by DMA initialization.
In turn, reconfiguration at the LTE channel bandwidth level introduces a latency which is 1.192 ms in the worst case and 826 µs on average. Here, the reconfiguration throughputs are around 380 MiB/s (95% of the ICAP limit). Compared with digital modulation scheme reconfiguration, the amount of data transferred to the ICAP is larger and the benefits of using a dedicated DMA controller for DDR-ICAP data transfer become more evident.
It is important to evaluate the impact of DPR latency in the context of wireless systems. LTE and LTE-Advanced control plane latency requirements are 100 ms and 50 ms, respectively [15]. Thus, the latency introduced by DPR is acceptable in this application domain. It is pertinent to compare the DPR latency results with the ones obtained in our previous efforts devoted to the implementation of a reconfigurable FFT processor for wireless systems. In [6], a dynamically reconfigurable FFT processor was implemented on a Xilinx Zedboard. System specifications such as clock frequency, data width, DMA controller utilization, FFT algorithm and architecture are similar to the ones adopted for the LTE OFDM modulator. The worst-case reconfiguration latency observed for the FFT processor was 1.63 ms, which is larger than the worst-case latency for reconfiguration at the LTE channel bandwidth level. It is worth noting that reconfiguration at the LTE channel bandwidth level adapts not only the IFFT core, but also the modules responsible for Subcarrier Mapping, IFFT Shift, CP insertion, Windowing and Overlap-and-Add. The main reason for this DPR latency improvement was a more careful system partitioning, based on the analysis of the processing modules to implement and the identification of commonalities between modes of operation.
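The ratios behind these statements are easy to verify. The sketch below uses only figures quoted in the text and keeps the paper's units; the function names are hypothetical and the derived percentages are computed here rather than reported by the authors.

```python
# Figures quoted in the text (units as given in the paper).
ICAP_LIMIT = 400.0                      # MiB/s, theoretical limit
THROUGHPUT = {"modulation": 39.0, "bandwidth": 380.0}       # MiB/s
WORST_LATENCY_MS = {"modulation": 0.895, "bandwidth": 1.192}
CONTROL_PLANE_MS = {"LTE": 100.0, "LTE-Advanced": 50.0}
PREVIOUS_FFT_WORST_MS = 1.63            # reconfigurable FFT processor of [6]


def icap_utilisation(level: str) -> float:
    """Fraction of the theoretical ICAP limit reached at a given level."""
    return THROUGHPUT[level] / ICAP_LIMIT


def budget_share(level: str, system: str) -> float:
    """Worst-case DPR latency as a fraction of the control plane requirement."""
    return WORST_LATENCY_MS[level] / CONTROL_PLANE_MS[system]


def improvement_over_previous() -> float:
    """Relative worst-case latency reduction with respect to [6]."""
    return 1.0 - WORST_LATENCY_MS["bandwidth"] / PREVIOUS_FFT_WORST_MS


if __name__ == "__main__":
    print(f"ICAP utilisation: {icap_utilisation('modulation'):.1%} / "
          f"{icap_utilisation('bandwidth'):.1%}")
    print(f"Share of LTE-Advanced budget: {budget_share('bandwidth', 'LTE-Advanced'):.2%}")
    print(f"Reduction vs. [6]: {improvement_over_previous():.1%}")
```

Even against the stricter LTE-Advanced requirement, the worst-case bandwidth reconfiguration consumes only a small percentage of the control plane latency budget.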
The Xilinx Zedboard provides a current sensor for power measurements. However, this sensor only allows for total board power measurements and it is difficult to perform measurements for individual power rails (e.g., the PL internal supply voltage), as this would require PCB modifications. Due to these limitations for real-time power consumption monitoring, the DPR power consumption overhead was not evaluated. Instead, Vivado's Power Analysis tool was used to estimate the power consumption of the LTE OFDM modulator datapath for each mode of operation and to investigate possible relations between datapath configuration and power consumption. The estimations obtained are shown in Figure 5. From these estimations, it is possible to observe that static power is almost constant across the different modes of operation and that power consumption variations are mostly due to dynamic power consumption. Furthermore, the contribution of dynamic power consumption to the total power consumption increases with the channel bandwidth. Although power measurements were not performed, the comparison between power estimations and measurements performed in [7] shows that the behaviour of power consumption estimations for different modes of operation is similar to what is observed in power measurements. So, based on the presented estimations and considering the modes of operation with the smallest and largest estimated power consumption (1.4 MHz and 20 MHz), the run-time reconfiguration of the OFDM modulator datapath can bring dynamic power savings of around 101 mW compared to a worst-case implementation. This suggests that implementing a multi-mode LTE OFDM modulator by having separate implementations for each mode, or by designing a hardware infrastructure for the worst-case scenario, leads to poor energy efficiency.
DPR can improve system energy efficiency if the time interval between reconfiguration procedures is much larger than the reconfiguration latency [7] , which is a reasonable scenario in the application domain considered for the implemented system.
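This condition can be illustrated with a simple break-even model. The sketch is hypothetical in one important respect: the extra power drawn during reconfiguration was not measured in this work, so the 0.5 W value below is a placeholder chosen purely for illustration. The 101 mW saving and the 1.192 ms latency are the figures quoted in the text, and the function names are invented for the example.

```python
SAVED_POWER_W = 0.101          # estimated dynamic power saving quoted in the text
RECONFIG_LATENCY_S = 1.192e-3  # worst-case bandwidth reconfiguration latency
RECONFIG_POWER_W = 0.5         # HYPOTHETICAL overhead power; not measured in the paper


def break_even_interval_s(reconfig_power_w, reconfig_latency_s, saved_power_w):
    """Time a smaller configuration must stay active to repay one reconfiguration."""
    return reconfig_power_w * reconfig_latency_s / saved_power_w


def net_energy_saving_j(interval_s, reconfig_power_w, reconfig_latency_s, saved_power_w):
    """Energy saved over an interval, minus the energy spent reconfiguring once."""
    return saved_power_w * interval_s - reconfig_power_w * reconfig_latency_s


if __name__ == "__main__":
    t = break_even_interval_s(RECONFIG_POWER_W, RECONFIG_LATENCY_S, SAVED_POWER_W)
    print(f"break-even interval: {t * 1e3:.2f} ms")
    print(f"net saving over 1 s: "
          f"{net_energy_saving_j(1.0, RECONFIG_POWER_W, RECONFIG_LATENCY_S, SAVED_POWER_W):.4f} J")
```

Under these assumptions the break-even interval is a few milliseconds, so any mode that stays active for seconds or longer would yield a net saving; a different overhead power scales the break-even interval linearly.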
VI. CONCLUSION
This work presented a dynamically reconfigurable LTE-compliant OFDM modulator for Downlink transmission implemented on a Xilinx Zedboard (FPGA device: XC7Z020-CLG484-1). The OFDM modulator supports the six modes of operation defined in the LTE protocol (1.4, 3, 5, 10, 15 and 20 MHz channel bandwidths) and, through DPR, the system can adapt its operation characteristics at run-time. Reconfiguration is done at two levels: digital modulation scheme (QPSK, 16-QAM or 64-QAM) and LTE channel bandwidth. The OFDM modulator was partitioned taking advantage of commonalities between modes of operation, in order to achieve a good balance between partial bitstream sizes and number of Reconfigurable Partitions.
The worst-case DPR latencies observed at digital modulation and LTE channel bandwidth levels were 895 µs and 1.192 ms, respectively. These values are acceptable compared with latency requirements imposed by LTE. The computation specialization at run-time makes it possible to dynamically allocate hardware resources, resulting in resource and power savings when less resource-demanding modes of operation are enabled.
