Abstract-In this paper, a new partial transmit sequence (PTS) scheme called enhanced PTS (EPTS) is introduced to reduce the peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems, followed by the procedure for implementing it in a field programmable gate array (FPGA). The new phase sequence significantly reduces complexity because fewer searches are needed to find the optimum phase sequence. The results confirm that the PAPR performance of the FPGA implementation is comparable to that of the simulation.
I. INTRODUCTION
Orthogonal frequency division multiplexing (OFDM) has been adopted in recent broadband communication systems. Despite advantages such as high spectral efficiency and robustness against inter-symbol interference (ISI), OFDM signals suffer from a high PAPR. In the time domain, an OFDM signal is the sum of many narrowband signals; at some time instants this sum is large and at others it is small, so the peak of the signal can be much larger than its average value. In this paper, we focus on how to overcome the PAPR problem [1], [2], which can be done by applying a PAPR reduction method. A second issue is maximizing the efficiency of the power amplifier (PA), which can be addressed by applying digital predistortion to the PA. When a high-PAPR signal is transmitted through a nonlinear PA, it causes spectral broadening, and it also increases the required dynamic range of the digital-to-analog converter (DAC) [3]. The outcome is an increase in system cost and a reduction in efficiency. To overcome this, several PAPR reduction techniques have been proposed. Two of the most important are selected mapping (SLM) [4], [5], which operates in the frequency domain, and partial transmit sequence (PTS) [6]-[10], which operates in the time domain. In a previous work, Varahram et al. proposed the DSI-PTS method, which improves the PAPR performance while significantly reducing the complexity.
In this paper, a combination of dummy sequence insertion (DSI) and enhanced PTS (EPTS) is applied for PAPR reduction. The main advantage of the present method over the method in [11] is its lower complexity at the same PAPR performance. In the EPTS method, a new phase sequence is used to reduce the number of searches needed to find the optimum phase sequence.
This paper also describes the implementation of the PAPR reduction technique in an FPGA. Some previous works have focused on the implementation of PAPR techniques [12], [13]. One of the important parts of the EPTS implementation is the inverse fast Fourier transform (IFFT) block. There are several ways to implement the IFFT block, such as pipelined, streaming I/O, Radix-2 and Radix-4 architectures. In this paper, Radix-4 is adopted for the IFFT implementation. The EPTS is then implemented on the XtremeDSP kit with a Virtex-4 FPGA. Finally, the implementation and simulation results are compared to show the effectiveness of the implementation.
II. PAPR DEFINITION
In OFDM systems, a fixed number of successive input data samples are first modulated (e.g. PSK or QAM) and then jointly combined using the IFFT at the transmitter side. This operation produces orthogonal data subcarriers. The time-domain OFDM signal is given by
$$x_n = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k e^{j 2\pi nk/N}, \quad n = 0, 1, \ldots, N-1 \qquad (1)$$
where $x_n$ is the n-th signal component in the OFDM output symbol, $X_k$ is the k-th data modulated symbol in the OFDM frequency domain, and N is the number of subcarriers.
The PAPR of the transmitted OFDM signal can be defined as [2]:
$$\mathrm{PAPR} = \frac{\max_{0 \le n \le N-1} |x_n|^2}{E[|x_n|^2]}$$
where $E[\cdot]$ is the expectation operator. PAPR is a random variable because it is a function of the input data, which are themselves random. Its statistics can therefore be obtained with the level crossing rate theorem, which gives the average number of times that the envelope of a signal crosses a given level. Knowing the amplitude distribution of the OFDM output signal, it is straightforward to compute the probability that the instantaneous amplitude, and likewise the power, exceeds a given threshold. This is done by evaluating the complementary cumulative distribution function (CCDF) for different PAPR thresholds $\mathrm{PAPR}_0$:
$$\mathrm{CCDF}(\mathrm{PAPR}_0) = \Pr(\mathrm{PAPR} > \mathrm{PAPR}_0)$$
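As an illustration of the PAPR and CCDF definitions, the following Python (NumPy) sketch estimates the CCDF empirically from random QPSK OFDM symbols. It is a minimal sketch under stated assumptions, not the simulation code used in this paper: the function names, the random seed, the number of symbols, the unitary IFFT scaling, and the use of the per-symbol mean power in place of the expectation are all choices made here for illustration.

```python
import numpy as np

def papr_db(x):
    """PAPR of one time-domain symbol in dB: max|x_n|^2 / mean|x_n|^2.

    Assumption: the expectation E[|x_n|^2] is approximated by the
    mean power of the symbol itself.
    """
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())

def empirical_ccdf(papr_values_db, thresholds_db):
    """Fraction of symbols whose PAPR exceeds each threshold PAPR_0."""
    papr_values_db = np.asarray(papr_values_db)
    return np.array([(papr_values_db > t).mean() for t in thresholds_db])

def random_qpsk(num_symbols, n_subcarriers, rng):
    """Unit-energy QPSK symbols, one row per OFDM symbol."""
    bits = rng.integers(0, 2, size=(num_symbols, n_subcarriers, 2))
    return ((2 * bits[..., 0] - 1) + 1j * (2 * bits[..., 1] - 1)) / np.sqrt(2)

rng = np.random.default_rng(0)        # illustrative seed
N = 256                               # number of subcarriers
X = random_qpsk(1000, N, rng)         # frequency-domain symbols
x = np.fft.ifft(X, axis=1) * np.sqrt(N)   # unitary (1/sqrt(N)) scaling, assumed
paprs = np.array([papr_db(row) for row in x])
thresholds = np.arange(4.0, 13.0, 1.0)    # PAPR_0 values in dB
ccdf = empirical_ccdf(paprs, thresholds)
```

The scaling of the IFFT does not affect the PAPR, since it cancels in the ratio of peak to mean power.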
The main part of implementing the EPTS scheme in the FPGA is the IFFT block, which is expressed in discrete form in (1). The algorithm applied for the IFFT is Radix-4 with streaming I/O. The block diagram of the Radix-4 FFT implementation is shown in Fig. 1. The IFFT is the inverse of the FFT and is computed by conjugating the phase (twiddle) factors of the FFT. Input and output are not processed simultaneously: the data are first loaded into the FFT and stored in RAM, and no new signal can be loaded while the calculation is in progress. The FFT is calculated with the dragonfly method, the Radix-4 counterpart of the Radix-2 butterfly; other algorithms, such as the butterfly, can also be used. The twiddle factors are stored in ROM and are later used to create the IFFT function. Radix-4 is smaller in size than the pipelined method but has a longer processing time.
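The relationship between the Radix-4 FFT and the IFFT can be illustrated with a small software model. The sketch below is a recursive decimation-in-time Radix-4 transform in Python (NumPy) in which the inverse is obtained only by conjugating the twiddle factors. It is an assumption-laden illustration, not a description of the FPGA core: the function names are hypothetical, the input length must be a power of 4 (such as 256), and the 1/N scaling of the inverse follows NumPy's convention.

```python
import numpy as np

def radix4_fft(x, inverse=False):
    """Recursive radix-4 DIT transform; len(x) must be a power of 4.

    With inverse=True every twiddle factor is conjugated, which gives
    the unnormalised inverse transform.
    """
    x = np.asarray(x, dtype=complex)
    n = x.size
    if n == 1:
        return x.copy()
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    sign = 1 if inverse else -1            # conjugated twiddles for the IFFT
    q = n // 4
    k = np.arange(q)
    # Four decimated sub-transforms, each multiplied by its twiddle factor.
    t = [np.exp(sign * 2j * np.pi * r * k / n) * radix4_fft(x[r::4], inverse)
         for r in range(4)]
    j = sign * 1j                          # -j for the FFT, +j for the IFFT
    out = np.empty(n, dtype=complex)
    # Radix-4 "dragonfly": four outputs from four inputs.
    out[0:q] = t[0] + t[1] + t[2] + t[3]
    out[q:2 * q] = t[0] + j * t[1] - t[2] - j * t[3]
    out[2 * q:3 * q] = t[0] - t[1] + t[2] - t[3]
    out[3 * q:] = t[0] - j * t[1] - t[2] + j * t[3]
    return out

def radix4_ifft(X):
    """Inverse transform: conjugated twiddles plus 1/N scaling (assumed)."""
    X = np.asarray(X, dtype=complex)
    return radix4_fft(X, inverse=True) / X.size
```

For a 256-point input the recursion has four stages (256, 64, 16, 4), and the result can be checked against `np.fft.fft` and `np.fft.ifft`.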
III. ENHANCED PARTIAL TRANSMIT SEQUENCE (EPTS)

A. Conventional PTS (C-PTS)
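Let X denote a random input signal in the frequency domain with length N. X is partitioned into V disjoint subblocks $X_v$, $v = 1, 2, \ldots, V$, such that
$$X = \sum_{v=1}^{V} X_v$$
and these subblocks are then combined to minimize the PAPR in the time domain. The subblock partition is based on interleaving, which has lower computational complexity than the adjacent and pseudo-random partitions but the worst PAPR performance among them [4]. By applying the phase rotation factor $b_v = e^{j\phi_v}$, $v = 1, 2, \ldots, V$, to the IFFT of the v-th subblock, $x_v$, the time-domain signal after combining is
$$x'(b) = \sum_{v=1}^{V} b_v x_v$$
where
$$x'(b) = [x'_0(b), x'_1(b), \ldots, x'_{NL-1}(b)]$$
and L is the oversampling factor. The objective is to find the optimum signal $x'(b)$ with the lowest PAPR.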
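Both b and x can be written in matrix form as follows:
$$b = \begin{bmatrix} b_1 & \cdots & b_1 \\ \vdots & \ddots & \vdots \\ b_V & \cdots & b_V \end{bmatrix}_{[V \times N]}, \qquad x = \begin{bmatrix} x_{1,0} & \cdots & x_{1,NL-1} \\ \vdots & \ddots & \vdots \\ x_{V,0} & \cdots & x_{V,NL-1} \end{bmatrix}_{[V \times NL]}$$
It should be noted that all elements in each row of matrix b have the same value, in accordance with the C-PTS method. For an accurate PAPR calculation, at least four-times oversampling is necessary [7]. Because oversampling only adds zeros to the vector, the number of phase sequences that multiply matrix x remains the same.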
The process is performed by choosing the optimization parameter b that satisfies the following condition:
$$\tilde{b} = \arg\min_{b} \left( \max \left| \sum_{v=1}^{V} b_v x_v \right| \right)$$
After the optimum b is found, the optimum signal is passed to the next block.
To find the optimum b, an exhaustive search over (V-1) phase factors is needed, since one phase factor can remain fixed, $b_1 = 1$. Hence $W^{V-1}$ iterations are required, where W is the number of allowed phase factors.
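The exhaustive C-PTS search can be sketched in Python (NumPy) as follows. This is an illustrative model under assumptions rather than the authors' code: the helper names are hypothetical, the interleaved partition assigns subcarrier k to subblock k mod V, oversampling is done by zero-padding the middle of the spectrum, and the per-symbol mean power stands in for the expectation in the PAPR.

```python
import itertools
import numpy as np

def papr_db(x):
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())

def interleaved_subblocks(X, V):
    """V disjoint subblocks; subcarrier k is assigned to subblock k mod V."""
    blocks = np.zeros((V, X.size), dtype=complex)
    for v in range(V):
        blocks[v, v::V] = X[v::V]
    return blocks

def oversampled_ifft(Xf, L):
    """L-times oversampled IFFT via zero-padding in the frequency domain."""
    N = Xf.shape[-1]
    half = N // 2
    padded = np.zeros(Xf.shape[:-1] + (L * N,), dtype=complex)
    padded[..., :half] = Xf[..., :half]
    padded[..., L * N - (N - half):] = Xf[..., half:]
    return np.fft.ifft(padded, axis=-1)

def c_pts(X, V=4, phases=(1, -1, 1j, -1j), L=4):
    """Exhaustive search over W**(V-1) phase vectors with b_1 fixed to 1."""
    x_v = oversampled_ifft(interleaved_subblocks(X, V), L)   # V x LN
    best_papr, best_b, searched = np.inf, None, 0
    for tail in itertools.product(phases, repeat=V - 1):
        b = np.array((1,) + tail, dtype=complex)
        candidate = b @ x_v            # sum over v of b_v * x_v
        searched += 1
        p = papr_db(candidate)
        if p < best_papr:
            best_papr, best_b = p, b
    return best_b, best_papr, searched

rng = np.random.default_rng(0)         # illustrative seed
X = (rng.choice([-1, 1], 256) + 1j * rng.choice([-1, 1], 256)) / np.sqrt(2)
b_opt, papr_opt, searched = c_pts(X)   # W = 4, V = 4
papr_orig = papr_db(oversampled_ifft(X, 4))
```

Because the all-ones phase vector is among the candidates, the selected PAPR can never exceed that of the unmodified symbol.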
B. Enhanced Partial Transmit Sequence (EPTS)
In order to decrease the complexity of C-PTS, a new phase sequence is generated for the enhanced partial transmit sequence (EPTS) scheme. The block diagram of the EPTS scheme is shown in Fig. 2. The new phase sequence is based on generating N random values drawn from {1, -1, j, -j} when the number of allowed phase factors is W=4. The phase sequence matrix can be given by:
$$b = \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,N} \\ \vdots & \vdots & \ddots & \vdots \\ b_{P,1} & b_{P,2} & \cdots & b_{P,N} \end{bmatrix}_{[P \times N]} \qquad (8)$$
where P is the number of iterations, which should be set in accordance with the number of iterations of the C-PTS, N is the number of samples (IFFT length), and V is the number of subblocks.
The value of P can be calculated as follows:
$$P = D\,W^{V-1}, \quad D = 1, 2, \ldots, D_N$$
where D is a coefficient chosen according to the desired PAPR reduction and complexity, and $D_N$ is a value specified by the user. The value of P explicitly depends on the number of subblocks V if the number of allowed phase factors is constant.
There is a tradeoff in choosing the value of D: a higher D leads to higher PAPR reduction at the expense of higher complexity, while a lower D gives smaller PAPR reduction with less complexity. For example, if W=2 and V=4, then C-PTS requires 8 iterations and hence P=8D. If D=2, then P=16 and, because each iteration uses two rows of the phase sequence matrix, both methods have the same number of iterations (8). When D=1, P=8 and the number of iterations needed to find the optimum phase factor is reduced to 4, which reduces the complexity. The main advantage of this method over C-PTS is the reduction in complexity while maintaining the same PAPR performance. In C-PTS, each row of the matrix b contains the same phase value while each column is periodic with period V, whereas in the proposed method each element of matrix b takes a different random value.
As an example, assume N=256, with W=4 allowed phase factors and V=4 subblocks. With C-PTS there are $W^{V-1} = 64$ possible iterations, whereas for the proposed method with D=2 the phase sequence is a matrix of [128x256] elements according to (8). In this case 64 iterations are required to find the optimum phase sequence, because every two rows of the matrix in (8) are multiplied point-wise with the time-domain input signal $x_v$ of size [2x256]. The number of subblocks is reduced to 2 because this gives almost the same PAPR reduction as C-PTS with V=4. It should be noted that if D>2 the complexity increases, and if D=1 the PAPR reduction is smaller.
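The iteration counts in the two worked examples can be reproduced with a few lines of Python. The function names are hypothetical, and the relation P = D * W**(V-1), as well as the use of two rows of the phase matrix per iteration, are inferred here from the numbers quoted in the text (P=8D for W=2, V=4, and a [128x256] matrix processed two rows at a time).

```python
def c_pts_iterations(W, V):
    """Exhaustive C-PTS search with b_1 fixed: W**(V-1) candidates."""
    return W ** (V - 1)

def epts_rows(D, W, V):
    """Number of rows P of the EPTS phase matrix (assumed P = D * W**(V-1))."""
    return D * W ** (V - 1)

def epts_iterations(D, W, V, rows_per_iteration=2):
    """EPTS candidates when each iteration consumes two rows (two subblocks)."""
    return epts_rows(D, W, V) // rows_per_iteration

# Example with W=4, V=4, D=2: 64 C-PTS iterations, P=128 rows, 64 EPTS iterations.
example_a = (c_pts_iterations(4, 4), epts_rows(2, 4, 4), epts_iterations(2, 4, 4))

# Example with W=2, V=4: 8 C-PTS iterations; D=2 gives 8 and D=1 gives 4 EPTS iterations.
example_b = (c_pts_iterations(2, 4), epts_iterations(2, 2, 4), epts_iterations(1, 2, 4))
```

Both tuples match the figures in the text, which supports the reading that the EPTS iteration count is P divided by the number of subblocks used.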
Therefore the algorithm can be expressed as follows:
Step 1: Generate the input data stream and map it to M-QAM symbols.
Step 2: Construct a random phase sequence matrix with dimension [PxN].
Step 3: Multiply the signal $x_v$ point-wise with the new phase sequence.
Step 4: Find the optimum phase sequence after P iterations to minimize the PAPR.
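The four steps above can be sketched in Python (NumPy) as follows. This is a minimal sketch under assumptions and not the authors' implementation: the function names and seed are hypothetical, QPSK is used as the mapping, the signal is processed at the Nyquist rate (L=1), the interleaved partition is assumed, and each candidate is formed from V consecutive rows of the phase matrix.

```python
import numpy as np

def papr_db(x):
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())

def interleaved_subblocks(X, V):
    blocks = np.zeros((V, X.size), dtype=complex)
    for v in range(V):
        blocks[v, v::V] = X[v::V]
    return blocks

def epts(X, P, V=2, phases=(1, -1, 1j, -1j), rng=None):
    """Search P/V candidates built from a random [P x N] phase matrix."""
    rng = np.random.default_rng() if rng is None else rng
    if P % V != 0:
        raise ValueError("P must be a multiple of V")
    N = X.size
    x_v = np.fft.ifft(interleaved_subblocks(X, V), axis=1)     # V x N
    phase_matrix = rng.choice(np.array(phases), size=(P, N))   # Step 2
    best_papr, best_index, best_signal = np.inf, -1, None
    for i in range(P // V):
        rows = phase_matrix[i * V:(i + 1) * V]     # V rows per candidate
        candidate = (rows * x_v).sum(axis=0)       # Step 3: point-wise product
        p = papr_db(candidate)
        if p < best_papr:                          # Step 4: keep the minimum
            best_papr, best_index, best_signal = p, i, candidate
    return best_signal, best_index, best_papr, phase_matrix

rng = np.random.default_rng(1)                     # illustrative seed
N = 256
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)  # Step 1
signal, index, best, phase_matrix = epts(X, P=128, V=2, rng=rng)
```

With P=128 and V=2 the loop evaluates 64 candidates, and `index` identifies the pair of rows that would have to be conveyed to the receiver as side information.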
C. FPGA Implementation
Before describing the implementation of the EPTS scheme, we first introduce the FPGA kit used. The FPGA applied for implementing the EPTS scheme is the XC4VSX35-10FF668, which is mounted on the Xilinx DSP kit. The kit also provides an ADC and a DAC, which make it possible to demonstrate the whole transceiver chain of the OFDM system. There are two banks of 512K SRAM to capture the I and Q signals. The board connects through PCI and can transmit and receive signals at a maximum speed of 133 MHz. Fig. 3 shows the Xilinx DSP kit used for implementing the EPTS scheme. This board consists of a Virtex-4 FPGA, ADC, DAC, SRAM and several I/Os. Fig. 4 shows the Xilinx simulation of the EPTS scheme when V=2. It consists of 2 IFFT units, 2 complex multipliers, and the inputs and outputs. The procedure for implementing the EPTS is as follows. First, the input signal, a continuous stream of samples, is generated in the workspace. Because in the EPTS technique the phase sequence must be multiplied with the subblocks, the input samples are replicated so that they can be multiplied with the phase sequence, and they are repeated continuously according to the number of iterations. It should be noted that the input samples and the phase sequence are generated separately. By using the multipliers, the input samples can be divided into two subblocks in order to execute the EPTS scheme.
The phase sequences are reshaped into a one-dimensional vector in order to perform the search. The output of the IFFT is delayed because of the nature of the implementation, in which each block of 256 samples (one OFDM symbol) has to be held in memory until the real and imaginary parts of all samples have been saved. This causes a delay of 1291 samples, which depends on the type of IFFT implementation (Radix-2 or Radix-4, streaming I/O or single). The delay must therefore be compensated so that the IFFT output is synchronized with the phase sequence; this is performed offline. The other important block in the implementation is the complex multiplier, which consists of 4 real multipliers, 1 adder and 1 subtractor. The output can be captured and saved in the workspace. The PAPR is then calculated for each candidate signal, and the candidate with the minimum PAPR is the one that is transmitted. In order to retrieve the original signal at the receiver, side information has to be transmitted with the signal, which degrades the transmission efficiency. Table I shows the hardware resources used by the EPTS implementation in the FPGA.
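The structure of the complex multiplier (4 real multipliers, 1 adder and 1 subtractor) corresponds to the direct form of the complex product. The Python sketch below models it on separate real and imaginary parts, as a hardware datapath would; the function name and the sample values are hypothetical and chosen only for illustration.

```python
def complex_multiply(a_re, a_im, b_re, b_im):
    """(a_re + j*a_im) * (b_re + j*b_im) with 4 multiplies, 1 subtract, 1 add."""
    re = a_re * b_re - a_im * b_im   # two multipliers and the subtractor
    im = a_re * b_im + a_im * b_re   # two multipliers and the adder
    return re, im

# An IFFT output sample 3 + 4j rotated by the phase factor j gives -4 + 3j.
rotated = complex_multiply(3, 4, 0, 1)

# A general product: (1 + 2j) * (3 + 4j) = -5 + 10j.
product = complex_multiply(1, 2, 3, 4)
```

When the phase factors are restricted to {1, -1, j, -j}, the products reduce to sign changes and swaps of the real and imaginary parts, although the general four-multiplier form is the one described in the text.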
D. System Performance
In C-PTS, the OFDM signal itself is not distorted; however, the signal after the power amplifier can exhibit distortion if the PAPR is higher than the expected value. In this case the power amplifier has to be backed off, which degrades the efficiency of the system. In EPTS, the phase sequence likewise does not introduce any distortion. In practical applications where system cost is the main issue, another block also has to be considered: digital predistortion (DPD) [14]-[18]. By applying DPD, it is possible to increase the linearity of the power amplifier; as a result, signals with higher peaks can be passed to the power amplifier and the PAPR performance can be improved. This also increases the efficiency of the power amplifier and decreases the cost of the system.
E. Side Information
Another important factor in studying a PAPR method is the side information, which has to be transmitted to the receiver so that it can extract the original signal. In the EPTS method, the required side information can be calculated from the above formula; however, the only drawback of this method is that, because of the larger phase sequence matrix, more memory is required. It should be noted that the addition of the dummy sequence does not affect the side information, because the dummy signals are discarded at the receiver from the end of the OFDM signal.
IV. NUMERICAL ANALYSIS
To evaluate and compare the performance of the proposed EPTS scheme with C-PTS, simulations were performed. We employed an OFDM signal with N=256 subcarriers, QPSK modulation and an oversampling factor of L=4. To obtain the CCDF, $10^4$ random OFDM symbols were generated. Finally, the implementation results are compared with the simulation results.
V. CONCLUSION
In this paper, a novel PAPR reduction method and its implementation have been described. The method is based on a new phase sequence that reduces the complexity of the system. The EPTS scheme is less complex than C-PTS while its PAPR performance is the same. The FPGA implementation of the method has been studied, and it has been shown that its PAPR performance is comparable with the simulation results. This method can be implemented in WiMAX, DVB and 4G applications.
