Peak-to-average power ratio (PAPR) is one of the main drawbacks of multicarrier broadband communication systems. In this paper, a new crest factor reduction (CFR) scheme based on an interleaved phase sequence, called Dummy Sequence Insertion Enhanced Partial Transmit Sequence (DSI-EPTS), is proposed. It effectively reduces the PAPR while keeping the total complexity low. Moreover, a prototype of the proposed scheme on a field programmable gate array (FPGA) is demonstrated. In the DSI-EPTS scheme, a new phase sequence matrix is defined, which significantly reduces hardware complexity because fewer search operations are needed to extract the optimum phase sequence. The implementation results are comparable to the simulation results, with a slight difference due to the FPGA constraints. The results show a 5 dB reduction in PAPR when the DSI-EPTS scheme is applied, with low complexity and low power consumption.
Introduction
Orthogonal frequency division multiplexing (OFDM) is applied in many recent broadband communication systems, mainly because of its efficient use of bandwidth. Although the OFDM signal has many advantages, such as robustness against inter-symbol interference (ISI), it exhibits a very high peak-to-average power ratio (PAPR), or crest factor. This high PAPR, which is due to the nature of the Inverse Fast Fourier Transform (IFFT) inside OFDM, degrades power amplifier efficiency and, as a result, increases power consumption. When a signal with high PAPR passes through a power amplifier operating in its nonlinear region, spectral broadening occurs. Another impact is the increase in the required dynamic range of the digital-to-analog converter (DAC). Both effects increase the cost of the system. The main objective of this paper is to investigate a solution that resolves the drawbacks of the conventional methods and to compare the outcomes with the latest works. To prove the feasibility of the proposed scheme, it has been implemented on a hardware platform. Moreover, the power consumption, a critical metric for evaluating the performance of the system, is measured. In this paper, a new metric called hardware complexity reduction ratio (HCRR) is introduced for the first time as an important parameter to quantify the total complexity in a real system. In addition to the power consumption measurement, and in order to have a fair comparison, the complementary cumulative distribution function (CCDF) and the bit error rate (BER) are analyzed. This paper is organized as follows. Section 2 describes the related works. Section 3 gives a definition of crest factor and introduces important parameters. In Sections 4 and 5, the proposed scheme and its prototype are presented, respectively. Sections 6 and 7 provide the simulation results and conclusions, respectively.
Related Work
Despite the advantages of OFDM signals, a main limitation is the high PAPR [1] [2] [3]. The main concern with signals having high PAPR is the loss of efficiency due to the nonlinear characteristics of the power amplifier. The other problem lies in the DAC, where a high PAPR requires a high DAC resolution, which leads to high system cost. To combat the high PAPR problem of OFDM signals, several solutions have been proposed in the literature. The most well-known methods are selected mapping (SLM) [4] [5] and partial transmit sequence (PTS) [6] [7] [8] [9] [10], in the frequency domain and time domain, respectively. There are hybrid approaches that aim to reduce the complexity of the PTS method, as described in [8] [9] [10] [11] [12]. The authors in [8], [9] have proposed a phase sub-block weighting without achieving an improvement in complexity; however, the main drawback of those techniques is that their PAPR performance is insufficient for high PAPR signals, especially in MIMO-OFDM applications. In [10], a new PTS technique based on the insertion of a dummy sequence is proposed, but its main issues are high complexity and degraded bandwidth efficiency. In another previous work [11], a PAPR technique called enhanced partial transmit sequence (EPTS) is proposed. The results show an improvement in PAPR performance while reducing the complexity; however, its PAPR performance remains insufficient.
In this work, a new technique called DSI-EPTS is first proposed. The main feature of this technique is the combination of a new interleaved phase sequence matrix with dummy sequence insertion, which minimizes the complexity and increases the PAPR reduction compared to the prior art. Moreover, the proposed DSI-EPTS scheme has been implemented on a hardware platform. The previous works on this topic focus on the implementation of PAPR techniques [13] [14], but without a comprehensive analysis of other metrics. One of the pivotal factors in implementing an OFDM system is the implementation of the IFFT block. Several algorithms and architectures have been used to implement the IFFT in OFDM systems, such as streaming I/O, pipelined, Radix-2 and Radix-4. Radix-4 is applied here because of its high performance and low complexity compared to the other techniques. The DSI-EPTS is implemented on a hardware platform from Xilinx. Finally, the simulation and implementation results are compared to verify the feasibility of the proposed system in actual systems.
PAPR Definition
Peak-to-average power ratio (PAPR) is the main parameter to evaluate the dynamic range of the OFDM signals. The square root of the PAPR is also called crest factor which is used to measure the signal envelope variation. As an example, the data sequence is defined as follows:
where $X_i$ ($i = 0, 1, \ldots, N-1$) denotes a symbol of the signal. The complex baseband OFDM signal can be expressed as:
where $\omega_0 = 2\pi/T$ and
The PAPR (in dB) which is calculated at the transmitter or from the IFFT output is defined as
where $E[\cdot]$ is the expected value operator, which here gives the average power of $y(t)$. It is sometimes useful to know the maximum peak of the OFDM signal, which can be obtained from:
As the amplitude distribution of the OFDM output signal is known, it is possible to compute the probability of the OFDM signal above a given threshold. This is performed by obtaining the CCDF as follows:
where $PAPR_0$ is the threshold value. Here, additive white Gaussian noise (AWGN) is used as the channel model. The pivotal component in the OFDM block is the IFFT, which creates orthogonal subcarriers from the modulated input signal by applying exponential functions called twiddle factors. The bit error probability $P_{be}$ in an AWGN channel is given by [6]:
where $M$ is the modulation order and $k = \log_2(M)$ [6]:
In this paper, the performance of BER versus energy bit signal to noise ratio (
If y(t) in (2) is sampled at an interval of
where $T_s$ is the symbol duration.
Without loss of generality, setting
where IDFT is the inverse discrete Fourier transform. Here, a Radix-4 algorithm has been applied to implement the IDFT. Fig. 1 shows the block diagram of the DFT (discrete Fourier transform) based on Radix-4. Loading and processing of the input data sequences are not simultaneous: the data is first loaded into the DFT block and saved in RAM (random-access memory). This is because all processing in the FPGA is sample based, and all the input data has to be collected before the IFFT is executed. During the computational process, new data cannot be loaded. As mentioned in (9), the second factor inside the summation is the exponential function, whose values form a matrix called the twiddle factors. The values of the twiddle factors are fixed and can be stored in ROM, from which the IDFT function is built.
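The role of the stored twiddle factors can be illustrated with a minimal sketch. It assumes the common IDFT convention with a $1/N$ scaling and a positive exponent (the normalization of the paper's own equation is not recoverable from this copy), and the names `twiddle_matrix`, `idft` and `W_rom` are hypothetical. The matrix is computed once, as a ROM would hold it, and reused for every symbol.

```python
import numpy as np

def twiddle_matrix(n):
    """N x N table of IDFT twiddle factors exp(+j*2*pi*k*m/N)."""
    k = np.arange(n).reshape(-1, 1)   # output (time) index
    m = np.arange(n).reshape(1, -1)   # input (subcarrier) index
    return np.exp(2j * np.pi * k * m / n)

def idft(X, W=None):
    """Direct IDFT as a matrix-vector product (assumed 1/N scaling)."""
    X = np.asarray(X, dtype=complex)
    if W is None:
        W = twiddle_matrix(X.size)
    return W @ X / X.size

rng = np.random.default_rng(0)
# 16 QPSK symbols as an illustrative input (modulation is an assumption)
X = rng.choice([1, -1], 16) + 1j * rng.choice([1, -1], 16)
W_rom = twiddle_matrix(16)   # fixed values, computed once and reused
x = idft(X, W_rom)
```

This direct form costs on the order of $N^2$ complex multiplications; a Radix-4 decomposition reuses the same twiddle values with far fewer operations.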
Crest Factor Reduction

Conventional Crest Factor Reduction
One of the most advanced techniques to reduce the PAPR is partial transmit sequence (PTS). Because of its high PAPR reduction performance and its lower complexity compared to other PAPR techniques, it is used in this paper as the basis for comparison. In this section, the conventional PTS (C-PTS) technique is discussed first, and then the proposed scheme is introduced. Let X denote a random input signal of length N in the frequency domain. The signal X is partitioned into V disjoint sub-blocks
. These sub-blocks are combined to minimize the PAPR in the time domain. The sub-blocks are partitioned by interleaving because of its lower complexity compared to pseudo-random and adjacent partitioning; however, its PAPR performance is poor [4]. The phase rotation coefficients denote ′ which has the lowest PAPR.
Both b and x can be shown in the matrix form as follows:
According to the C-PTS, the phase sequence of all the symbols is the same. The matrix in (8) is therefore created with the same value in each row. It should be noted that oversampling is imperative to obtain an accurate PAPR value; an oversampling factor of at least 4 is required [7]. Oversampling a signal x inserts zeros into the OFDM signal while the number of phase sequences remains the same. The criterion for searching the optimal vector b from the matrix in (8) is expressed by:
Following the extraction of the optimum coefficient vector b, the signal with the minimum PAPR is transmitted. This is achieved by an exhaustive search over $(V-1)$ phase factors, because one phase factor remains fixed, i.e., $b_1 = 1$.
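The C-PTS search described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes interleaved partitioning, $W = 2$ allowed phase factors $\{+1, -1\}$, QPSK input and no oversampling ($S = 1$). The function names are hypothetical.

```python
import itertools
import numpy as np

def papr_db(x):
    """PAPR in dB: peak power over mean power."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def interleaved_partition(X, V):
    """V disjoint sub-blocks; sub-block v keeps carriers v, v+V, v+2V, ..."""
    blocks = np.zeros((V, X.size), dtype=complex)
    for v in range(V):
        blocks[v, v::V] = X[v::V]
    return blocks

def cpts(X, V=4, phases=(1, -1)):
    """Exhaustive search over the V-1 free phase factors (b_1 fixed to 1)."""
    blocks = interleaved_partition(np.asarray(X, dtype=complex), V)
    x_parts = np.fft.ifft(blocks, axis=1)        # one IFFT per sub-block
    best_papr, best_b, best_x = np.inf, None, None
    for tail in itertools.product(phases, repeat=V - 1):
        b = np.array((1,) + tail, dtype=complex)
        x = b @ x_parts                          # weighted sum of sub-blocks
        p = papr_db(x)
        if p < best_papr:
            best_papr, best_b, best_x = p, b, x
    return best_papr, best_b, best_x

rng = np.random.default_rng(1)
X = (rng.choice([1, -1], 64) + 1j * rng.choice([1, -1], 64)) / np.sqrt(2)
papr_orig = papr_db(np.fft.ifft(X))
papr_pts, b_opt, x_opt = cpts(X, V=4)
```

Because the all-ones phase vector is among the candidates, the selected PAPR can never exceed that of the unmodified symbol; the search visits $W^{V-1}$ candidates, which is the cost the proposed scheme aims to reduce.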
Proposed DSI-EPTS Scheme
Fig. 2 shows the block diagram of the proposed DSI-EPTS. In this scheme, the phase sequence matrix is interleaved, which is different from the one proposed in [11]. Moreover, the insertion of a dummy sequence further reduces the complexity of the PAPR scheme. The size of the phase sequence matrix in this scheme becomes smaller by using adjacent sub-block partitioning. The complexity is analyzed in the next section. According to this scheme, dummy signals in complex form are generated and added to the data sub-carrier vector. As a result, the length of the data is increased. A new vector is obtained from K data and L dummy sub-carriers, where L < K. The new vector U is created as follows:
where
is the data sub-carrier vector and
is the dummy signal vector.
Fig. 2. Block diagram of a PAPR reduction scheme based on DSI-EPTS
Once the PAPR of the OFDM signal has been computed, it is compared with a predefined threshold, which is set according to the corresponding standard. If the PAPR is below the threshold, the OFDM signal is transmitted; otherwise, a new dummy sequence is generated and the process continues, as shown in Fig. 2. This process may be repeated several times. The number of iterations needed to reach the required PAPR ($PAPR_{th}$) is directly related to the processing time: the higher the number of iterations, the longer the processing time, which ultimately degrades the system performance.
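The threshold loop described above can be sketched in a few lines. This covers only the dummy sequence insertion step, with a plain IFFT in place of the interleaved phase sequence matrix, whose definition is not recoverable from this copy. The placement of the dummy carriers at the end of the vector, the QPSK dummy symbols, the iteration cap and all function names are assumptions made for illustration.

```python
import numpy as np

def papr_db(x):
    """PAPR in dB: peak power over mean power."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def random_qpsk(n, rng):
    """n unit-power QPSK symbols (assumed dummy/data alphabet)."""
    return (rng.choice([1, -1], n) + 1j * rng.choice([1, -1], n)) / np.sqrt(2)

def dsi_transmit(data, L, papr_th_db, max_iter=20, seed=0):
    """Regenerate the L dummy carriers until PAPR <= threshold or max_iter."""
    rng = np.random.default_rng(seed)
    best_u, best_papr = None, np.inf
    for n_iter in range(1, max_iter + 1):
        dummy = random_qpsk(L, rng)
        u = np.concatenate([data, dummy])    # U built from K data + L dummy
        p = papr_db(np.fft.ifft(u))
        if p < best_papr:
            best_u, best_papr = u, p
        if p <= papr_th_db:                  # threshold met: stop iterating
            break
    return best_u, best_papr, n_iter

K, L = 457, 55                               # N = K + L = 512
data = random_qpsk(K, np.random.default_rng(42))
papr_th = 8.0                                # hypothetical threshold in dB
u, papr, n_iter = dsi_transmit(data, L, papr_th)
```

The returned iteration count makes the trade-off in the text visible: a tighter threshold lowers the transmitted PAPR but raises the number of IFFT evaluations per symbol.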
According to this figure, X denotes the input signal with length N. At this stage, the signal is 
where $P$ is the number of iterations of the DSI-EPTS, and $N$ and $V$ are the number of samples and the number of sub-blocks, respectively. By multiplying the matrix in (16), the likelihood of low 
where (17) and (18) are the random and adjacent phase sequence matrices, respectively. The interleaved phase sequence has been selected because of its lower complexity compared to the random and adjacent phase sequences. The main aim is to identify the optimum phase sequence matrix with the lowest PAPR. It should be noted that $N = K + L$, which means that the input signal length does not change after the insertion of the dummy sequence. The maximum number of dummy signals in this case is 55, which is equal to the number of zeros in the OFDM symbol according to the IEEE 802.16e standard.
By applying the new phase sequence, the number of searches needed to identify the optimum phase sequence is reduced, which reduces the complexity. These effects are shown in detail later in the complexity analysis. In the next section, the implementation results of applying this scheme are presented, which show a significant saving in power consumption. The value of P depends on two parameters, PAPR reduction and complexity. It is possible to derive the value of P from the following expression:
where D is the coefficient that is defined based on the PAPR and computational complexity requirements. According to (19), P is related to the number of sub-blocks denoted by V, and W is constant. The process is repeated until the optimal phase sequence c is obtained from the phase sequence matrix by the following condition:
Upon identifying the optimum phase sequence, the signal $u(c')$ is transmitted. The PAPR of $u(c')$ has to be within the threshold PAPR ($PAPR_{th}$). This additional processing results in higher complexity, which is discussed further in the next section. If the PAPR meets the threshold, the signal is transmitted; otherwise, a new dummy sequence is generated. The process continues for some iterations until the PAPR becomes lower than $PAPR_{th}$.

System Performance
In the C-PTS method, the OFDM signal does not exhibit any distortion because only its phase is modified; however, if the PAPR is larger than the threshold, the output signal of the power amplifier will be distorted, which forces the power amplifier to back off from its saturation point and degrades system efficiency. In the proposed DSI-EPTS scheme, the signal transmitted through the power amplifier has the same characteristic, in that no distortion is generated. Another factor that needs to be considered in actual communication systems, especially in the uplink, is digital predistortion (DPD) [15] [16] [17] [18] [19] [20]. DPD increases the power amplifier linearity, which allows higher peak signals and increases system efficiency.
Side Information
The number of required side information (SI) bits in the C-PTS is calculated as:
where $v$ is the number of sub-blocks and the sign $\lfloor y \rfloor$ indicates the floor of $y$.
Since the addition of the DSI algorithm in the DSI-EPTS scheme improves the performance of the system, fewer partitions (v) are required. As can be observed later from Fig. 5, the DSI-EPTS method with v=2 performs as well as the C-PTS method with v=4.
In other words, the number of sub-blocks v is halved.
From the above formula, SI will be $\log_2 2 = 1$. This means that transmitting 1 side information bit is sufficient to inform the receiver about the process in the transmitter. This is a notable saving compared to 3 bits in C-PTS, and it results in improved spectral efficiency.
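The bit counts quoted above can be reproduced with a short calculation. The displayed formula is missing from this copy, so the sketch assumes the form $SI = \lfloor \log_2 W^{v-1} \rfloor$ with $W = 2$ allowed phase factors, which is the form that yields 1 bit for $v = 2$ and 3 bits for $v = 4$. The function name is hypothetical.

```python
def side_info_bits(v, w=2):
    """floor(log2(w**(v-1))), evaluated in exact integer arithmetic."""
    candidates = w ** (v - 1)            # number of phase combinations
    return candidates.bit_length() - 1   # floor of log2 for positive ints

bits_cpts = side_info_bits(4)       # C-PTS with v = 4 sub-blocks
bits_dsi_epts = side_info_bits(2)   # DSI-EPTS with v = 2 sub-blocks
```

Integer arithmetic is used instead of a floating-point logarithm so that exact powers of two are never rounded down by one.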
However, the only drawback is the larger memory space required, as shown in (16). This is because the proposed DSI-EPTS scheme has more phase sequences to be multiplied with the data sequence, and the corresponding phase sequence matrix has to be saved in memory to make sure that the received data sequence can be recovered. Hence, more memory is needed to store the phase sequence matrix.
Computational Complexity
Here, the computational complexity of the proposed scheme is compared with that of other methods. The term complexity here refers to hardware resource consumption, which differs from the definition in [6]. The total complexity of the C-PTS when the oversampling factor is $S = 1$ can be computed by:
This complexity accounts for the total number of complex additions and multiplications. In general, a complex multiplication takes four real multiplications and two real additions, whereas a complex addition requires two real additions. For the enhanced PTS, this value is as follows:
where V is the number of sub-blocks. To obtain (22) and (23), the complexity of the IFFT and the complexity of the searching algorithm are considered. Most of the previous work did not take into account the complexity of the searching algorithm and as a consequence, the total complexity was miscalculated.
In [11] , the complexity is calculated for IFFT. Hence, the total complexity of the DSI-PTS method is given by:
The total complexity of the proposed DSI-EPTS scheme can be calculated by:
Here, a new metric called HCRR is defined as follows:
$$\mathrm{HCRR} = \left(1 - \frac{\text{Complexity of DSI-EPTS scheme}}{\text{Complexity of C-PTS}}\right) \times 100\% \qquad (26)$$
Table I presents the hardware complexity of C-PTS and the proposed DSI-EPTS when N=512 and W=2. The total complexity as computed here for DSI-PTS and DSI-EPTS is the same. This is achieved while the simulation shows that DSI-EPTS outperforms DSI-PTS in terms of PAPR performance.
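A minimal sketch of the HCRR metric in (26), together with the real-operation costs stated earlier for complex multiplications and additions, is given here. The operation counts passed in the example are hypothetical placeholders, not values from Table I, and the function names are illustrative.

```python
def real_ops(n_complex_mul, n_complex_add):
    """Real operations: 4 mul + 2 add per complex mul, 2 add per complex add."""
    return {
        "real_mul": 4 * n_complex_mul,
        "real_add": 2 * n_complex_mul + 2 * n_complex_add,
    }

def hcrr(complexity_proposed, complexity_cpts):
    """Hardware complexity reduction ratio of (26), in percent."""
    if complexity_cpts <= 0:
        raise ValueError("reference complexity must be positive")
    return (1.0 - complexity_proposed / complexity_cpts) * 100.0

# Hypothetical totals, chosen only to show the arithmetic
ratio = hcrr(complexity_proposed=2500, complexity_cpts=10000)
ops = real_ops(n_complex_mul=10, n_complex_add=5)
```

A positive HCRR means the proposed scheme needs fewer operations than the C-PTS reference; a value of zero means no reduction.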
FPGA Implementation
Before the implementation process of the DSI-EPTS scheme is introduced, the hardware platform used in the implementation is explained. The implementation is carried out on the XilinxDSP board with the XC4VSX35-10FF668 FPGA. Its other features include a DAC and an Analog to Digital Converter (ADC) for testing the complete OFDM system. In addition, two sets of SDRAM with a size of 512 K are available to capture the I and Q signals and process them to adapt to changes in the power amplifier behavior. The Xilinx board is connected through the PCI slot and is able to transmit and receive signals at a maximum speed of 133 MHz. Fig. 3 shows a photograph of the XilinxDSP, including the FPGA and Microblaze processor. The XilinxDSP board consists of a Virtex 4 FPGA, ADC, DAC, static random-access memory (SRAM) and several I/Os. According to this figure, the data stream is transmitted from the PC running Matlab through the PCI card. By using the system generator in the Xilinx toolbox, the designed DSI-EPTS scheme can be compiled directly to the FPGA, as shown in Fig. 4. The data stream is transmitted to the power amplifier. The power amplifier here is an off-the-shelf model with memory effects [14], which is derived from the captured characteristics of the Mini-Circuit PA. The other important block in Fig. 4 is the adaptation block, in which the optimization of the phase sequences in the PTS method is performed. This task can be done by using the Microblaze processor inside the FPGA. Hence, C code is written to optimize the phase sequence, and the result is passed to the FPGA. The most important part of the implementation of the DSI-EPTS is the implementation of the IFFT, as expressed in (2). To implement the IFFT, a Radix-4 algorithm is applied, which occupies less area than the pipelined algorithm at the cost of a longer processing time. The inverse of the IFFT is the FFT, which is computed by conjugating the phase factors.
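The Radix-4 decomposition and the conjugation relation between the IFFT and the FFT can be illustrated in software. This is a recursive sketch under stated assumptions, not the FPGA architecture: it uses a $1/N$ scaling on the inverse transform and falls back to a single Radix-2 stage when the length is twice a power of four (as for $N = 512$). All function names are hypothetical.

```python
import numpy as np

def _radix4(x, sign):
    """Unscaled transform with kernel exp(sign*j*2*pi*k*n/N), radix-4 DIT."""
    N = x.size
    if N == 1:
        return x.copy()
    if N == 2:                                   # radix-2 fallback stage
        return np.array([x[0] + x[1], x[0] - x[1]])
    if N % 4:
        raise ValueError("length must be a power of two")
    q = N // 4
    f0, f1, f2, f3 = (_radix4(x[r::4], sign) for r in range(4))
    w1 = np.exp(sign * 2j * np.pi * np.arange(q) / N)   # twiddle factors
    a, b, c, d = f0, w1 * f1, w1**2 * f2, w1**3 * f3
    j = sign * 1j
    out = np.empty(N, dtype=complex)
    out[0:q] = a + b + c + d                     # radix-4 butterfly
    out[q:2 * q] = a + j * b - c - j * d
    out[2 * q:3 * q] = a - b + c - d
    out[3 * q:] = a - j * b - c + j * d
    return out

def ifft_radix4(X):
    X = np.asarray(X, dtype=complex)
    return _radix4(X, sign=+1) / X.size

def fft_via_conjugation(x):
    """FFT obtained from the IFFT by conjugating input and output."""
    x = np.asarray(x, dtype=complex)
    return x.size * np.conj(ifft_radix4(np.conj(x)))

rng = np.random.default_rng(3)
X = rng.standard_normal(512) + 1j * rng.standard_normal(512)
x = ifft_radix4(X)
X_back = fft_via_conjugation(x)
```

Conjugating the input and output is equivalent to conjugating every twiddle factor, so one stored twiddle table can serve both transform directions.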
It should be noted that loading and processing of the I/O signals are not simultaneous. This means that the signal has to be loaded into RAM first. A new signal cannot be inserted into the system while the computational process is running. The matrix of the twiddle factors is stored in ROM in order to create the IFFT function. Fig. 4 also shows the implementation of the DSI-EPTS with V=2. It comprises two IFFT blocks, two complex multipliers, and the input and output signals.
The implementation procedure can be explained as follows. Initially, a continuous sampled input signal is generated and stored in a memory that works as a look-up table (LUT). The memory unit is a RAM that transfers the data sequence on a first-in first-out (FIFO) basis.
At the final stage, the PAPR is computed based on the formula given in Section 3. This has to be done for each OFDM symbol. Although the PAPR computation in real systems has to be done in hardware using digital signal processing, in this paper it is calculated offline. Once the minimum PAPR value has been obtained based on the criterion in (13), the corresponding signal is transmitted. In order to recover the transmitted signal at the receiver, the side information, as mentioned in Section 4.4, has to be transmitted along with the input signal. Table II presents the hardware resource consumption of the C-PTS. It can be seen that the resource consumption is considerably high, which results in a higher system cost. Table III gives the hardware resource consumption of the proposed DSI-EPTS. It can be concluded that the resource consumption is low compared to the C-PTS, which keeps the cost of the system low. The most heavily used hardware resource is the DSP48 slice, which is used here as a multiplier. Comparing the two tables, it is fair to say that the DSI-EPTS consumes fewer hardware resources than the C-PTS, mainly because of its smaller number of IFFT operations.
Numerical Analysis
In this section, the simulation results and their comparison with other methods are presented. The parameters used to evaluate the performance and effectiveness of the proposed scheme in reducing the PAPR are N=512 and S=4. To obtain the CCDF, $10^4$ random OFDM symbols are generated.
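An empirical CCDF of this kind can be estimated with a minimal sketch. It uses N=512 and S=4 as stated above, and assumes QPSK symbols and zero-padding in the middle of the spectrum for oversampling. No PAPR reduction is applied here, so it corresponds to the unmodified OFDM baseline only. The symbol count is reduced from $10^4$ to keep the sketch fast, and the function names are hypothetical.

```python
import numpy as np

def oversampled_ifft(X, S=4):
    """Row-wise IFFT after inserting (S-1)*N zeros mid-spectrum."""
    X = np.atleast_2d(X)
    n_sym, N = X.shape
    half = N // 2
    Xp = np.zeros((n_sym, S * N), dtype=complex)
    Xp[:, :half] = X[:, :half]                 # first half of the carriers
    Xp[:, S * N - (N - half):] = X[:, half:]   # second half at the far end
    return np.fft.ifft(Xp, axis=1)

def papr_db(x):
    """Per-symbol PAPR in dB for a 2-D array of time-domain symbols."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max(axis=1) / p.mean(axis=1))

def empirical_ccdf(paprs_db, thresholds_db):
    """Fraction of symbols whose PAPR exceeds each threshold PAPR_0."""
    paprs_db = np.asarray(paprs_db)
    return np.array([(paprs_db > t).mean() for t in thresholds_db])

rng = np.random.default_rng(0)
N, S, n_symbols = 512, 4, 1000    # the paper uses 10^4 symbols
X = (rng.choice([1, -1], (n_symbols, N))
     + 1j * rng.choice([1, -1], (n_symbols, N))) / np.sqrt(2)
paprs = papr_db(oversampled_ifft(X, S))
thresholds = np.arange(0, 21)     # PAPR_0 grid in dB
ccdf = empirical_ccdf(paprs, thresholds)
```

With only 1000 symbols the curve is resolved down to about $10^{-3}$; the $10^4$ symbols used in the paper extend it by one decade.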
In Fig. 5, the CCDF results of DSI-EPTS, DSI-PTS and C-PTS are compared when L=55. In this figure, the number of sub-blocks is V=2 and V=4. Based on the CCDF results, the DSI-EPTS outperforms the DSI-PTS in PAPR reduction for both V=2 and V=4. When D=1, the PAPR reduction of the DSI-EPTS with V=2 and V=4 is almost the same as that of the C-PTS with V=4 and V=8, respectively; however, this value is higher when D=2 and V=4. From Table 2, it can be observed that the complexity reduction is minimum when D=2. Fig. 6 shows the comparison between the simulation and implementation results. It shows that the implementation results are comparable with the simulation results; however, there is a slight difference due to the FPGA resolution constraint. It should be noted that the higher the resolution of the FPGA, the higher the cost, but the better the performance that can be achieved. In Fig. 7, the BER performance of the C-PTS is compared with that of the enhanced PTS (EPTS) [11] and the DSI-EPTS in an AWGN channel. From this figure, it can be observed that the BER performance deteriorates when the DSI-EPTS scheme is applied, compared to the C-PTS.
Conclusion
In this paper, a novel PAPR reduction scheme called DSI-EPTS, based on a new phase sequence, has been proposed. The main features of the DSI-EPTS scheme are lower complexity and better PAPR reduction performance compared to the C-PTS and DSI-PTS. The prototype of the DSI-EPTS scheme shows PAPR performance comparable with the simulation results. By applying this scheme, the power amplifier efficiency can be enhanced. The implementation results show an improvement in power consumption compared to the conventional methods. The proposed scheme can be implemented in recent broadband communication systems, such as WiMAX, LTE-Advanced and 5G.
researcher from Universiti Putra Malaysia (UPM). She has been working in the field of Digital Signal Processing for OFDM based transmission systems applicable for WiMAX, LTE, 4G and beyond technologies.
She has authored and co-authored more than 30 papers in cited journals, books, book chapters, and conferences. She has also conducted a number of conferences, robotic races, workshops, and trainings. She is a technology developer dedicated to advancing technology, education, and science in society.
Ahmed Wasif Reza (Ph.D., M.Eng.Sc., B.Sc. Eng (Hons.), CEng (UK)) is a Senior Lecturer in the Department of Electrical Engineering, Faculty of Engineering, University of Malaya, Malaysia. He has been working in the fields of radio frequency identification (RFID), radio wave propagation, wireless sensor networks, wireless communications, biomedical image processing, cognitive radio and electromagnetics, in both industrial and academic research. He has authored and co-authored a number of Science Citation Index (SCI) journal and conference papers (about 100 papers). He has also served as a reviewer and a committee member for a number of SCI/ISI journals and conferences. He is heavily involved in contributing to societies and professional activities.
