Abstract-In spectrally efficient frequency division multiplexing (SEFDM), the separation between subcarriers is reduced below that required by the Nyquist criterion, enhancing bandwidth utilisation in comparison to orthogonal frequency division multiplexing (OFDM). This leads to self-induced inter-carrier interference (ICI) in the SEFDM signal, which requires more sophisticated detectors to retrieve the transmitted data. In previous work, iterative detectors (IDs) have been used to recover the SEFDM signal after a certain number of iterations. However, the sequential iterative process increases the processing time with the number of iterations, which reduces throughput. In this work, ID pipelining is designed and implemented in software defined radio (SDR) to reduce the overall system detection latency and improve the throughput. Symbols are allocated to parallel IDs as they are received, so that the IDs incur no waiting time. Our experimental findings show that throughput improves linearly with the number of parallel ID elements; however, hardware complexity also increases linearly with the number of ID elements.
I. INTRODUCTION
Spectrally efficient frequency division multiplexing (SEFDM) has become the focus of a great deal of research interest in recent years [1]-[6], within the context of non-orthogonal modulation formats, which are gaining popularity for 5G systems [7]. Such interest in SEFDM derives from its ability to save spectrum in comparison to orthogonal frequency division multiplexing (OFDM) by breaking the orthogonality of the subcarriers, spacing them at frequencies that are below the symbol rate [8]. This is timely due to the almost exponentially increasing amount of mobile data traffic [9], expected to exceed 24.3 EB/month by 2019. This will only be compounded by the worldwide roll-out of 5th generation networks in 2020, where data traffic is expected to increase significantly. It is well known that radio spectrum is already heavily subscribed, leading to high-cost premiums, and as such, modulation formats such as SEFDM that save spectrum are highly sought after.
Research into non-orthogonal modulation has increased rapidly over the last decade, and several candidate technologies have been proposed in the literature. Alternatives to SEFDM include faster-than-Nyquist (FTN) pulse shaping [10] and truncated-OFDM (TOFDM) [11], amongst others [12, 13]. FTN is a time-domain technique that reduces the transmission period for each symbol, thus improving spectral efficiency. TOFDM, on the other hand, increases the transmission speed by partial transmission of OFDM symbols in the time-domain.
SEFDM operates slightly differently from FTN and TOFDM: it is a frequency-domain system that saves bandwidth by compressing its symbols in frequency, and this bandwidth gain translates directly into a capacity gain. The bandwidth compression factor is usually denoted α, where (1 − α) × 100% is the amount of bandwidth saved in comparison to traditional OFDM for an equivalent number of bits. When α = 1, there is no compression and OFDM is transmitted [8].
SEFDM is not without disadvantages, and one of the most significant is the computational complexity of the receiver [14]. Normally, sphere decoders (SDs) are utilised to undo the self-induced inter-carrier interference (ICI) caused by exceeding the orthogonality limits of subcarrier spacing [15]. Alternatively, iterative detectors (IDs) have been demonstrated in the literature [2], [16]; these are relatively low in complexity by comparison, but introduce significant latency due to their iterative nature. In this paper, for the first time, we propose a pipelined ID structure to increase throughput at the cost of additional computational complexity. We demonstrate that, with no loss in performance in comparison to traditional implementations of SEFDM with an ID, throughput can be increased linearly with the number of pipelined stages.
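As a quick numerical illustration of the compression factor, the following sketch evaluates the (1 − α) × 100% saving stated above. The function name is hypothetical and chosen only for this example.

```python
def bandwidth_saving_percent(alpha: float) -> float:
    """Bandwidth saved relative to OFDM, in percent, for compression factor alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in the interval (0, 1]")
    return (1.0 - alpha) * 100.0


if __name__ == "__main__":
    # alpha = 1.0 corresponds to OFDM (no compression).
    for alpha in (1.0, 0.8, 0.7, 0.4):
        print(f"alpha = {alpha:.1f}: {bandwidth_saving_percent(alpha):.0f}% saved")
```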
II. SEFDM SIGNAL MODEL
The SEFDM signal consists of N non-orthogonal subcarriers, and each one carries a complex signal, denoted by s. The SEFDM signal, x(t), consisting of m SEFDM data symbols, is represented in the continuous time-domain as:
$$x(t) = \frac{1}{\sqrt{T}} \sum_{m=-\infty}^{\infty} \sum_{n=0}^{N-1} s_{m,n} \exp\left(\frac{j 2\pi n \alpha (t - mT)}{T}\right) \tag{1}$$
where T is the period of an SEFDM symbol, α < 1 is the bandwidth compression factor, N is the number of subcarriers in every symbol, and s_{m,n} is the complex symbol modulated on the n-th subcarrier belonging to the m-th SEFDM symbol. In the discrete time-domain, the same SEFDM symbol can be represented in matrix form as follows [5]:
$$X = \Phi S \tag{2}$$
where X represents a Q-dimensional vector of a sampled SEFDM symbol in the time-domain, S is an N-dimensional vector of a sampled input signal in the frequency-domain and Φ is a Q × N two-dimensional matrix that signifies the sampled carrier matrix [5].
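The matrix form X = ΦS can be sketched numerically as follows. This is a minimal illustration that assumes the carrier matrix entries are Φ[k][n] = exp(j2παnk/Q)/√Q, a form commonly used for sampled SEFDM carriers; the entry definition, the normalisation and the function names are assumptions for this example and are not taken from this paper.

```python
import cmath
import math


def carrier_matrix(q, n_sub, alpha):
    """Q x N sampled carrier matrix (assumed form, see the text above)."""
    scale = 1.0 / math.sqrt(q)
    return [
        [scale * cmath.exp(2j * math.pi * alpha * n * k / q) for n in range(n_sub)]
        for k in range(q)
    ]


def modulate(phi, s):
    """Compute X = Phi S for one SEFDM symbol given as a list of N complex values."""
    return [sum(row[n] * s[n] for n in range(len(s))) for row in phi]


if __name__ == "__main__":
    phi = carrier_matrix(q=8, n_sub=4, alpha=0.8)
    x = modulate(phi, [1, -1, 1, 1])  # four BPSK symbols
    print(len(x), "time samples")
```

With α = 1 and Q = N this reduces to a normalised inverse DFT, which is the OFDM case.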
Consider that the transmitted SEFDM symbols pass through a wireless fading channel H, which leads to a channel-distorted signal contaminated by noise Z, resulting in the received signal to be demodulated. The reception process is expressed as follows:
$$R = \Phi^{*}(HX + Z) \tag{3}$$
where R is the demodulated signal consisting of a vector of symbols of length N and (·)* is the transpose conjugate operation.
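To see why demodulation with the conjugate transpose of the carrier matrix leaves self-induced ICI, the sketch below computes the subcarrier correlation matrix C = Φ*Φ. It assumes the same hypothetical carrier definition Φ[k][n] = exp(j2παnk/Q)/√Q as a working model; names and normalisation are illustrative only. For α = 1 the off-diagonal terms vanish (orthogonal OFDM), whereas for α < 1 they do not.

```python
import cmath
import math


def carrier_matrix(q, n_sub, alpha):
    """Assumed Q x N sampled carrier matrix (not taken from the paper)."""
    scale = 1.0 / math.sqrt(q)
    return [
        [scale * cmath.exp(2j * math.pi * alpha * n * k / q) for n in range(n_sub)]
        for k in range(q)
    ]


def correlation_matrix(q, n_sub, alpha):
    """C = Phi^* Phi; C[m][n] is the correlation between subcarriers m and n."""
    phi = carrier_matrix(q, n_sub, alpha)
    return [
        [sum(phi[k][m].conjugate() * phi[k][n] for k in range(q)) for n in range(n_sub)]
        for m in range(n_sub)
    ]


if __name__ == "__main__":
    for alpha in (1.0, 0.8):
        c = correlation_matrix(4, 4, alpha)
        print(f"alpha = {alpha}: |C[0][1]| = {abs(c[0][1]):.3f}")
```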
III. TESTBED DESCRIPTIONS
This section presents the software and hardware designs of the real-time experiment used to evaluate SEFDM systems over the Long Term Evolution (LTE) Extended Pedestrian A (EPA) channel model. A photograph of the experimental testbed is shown in Fig. 1. The testbed contains several universal software radio peripheral (USRP) transceivers (NI USRP RIO N2395R) programmed using LabVIEW and a Spirent VR5 channel emulator to generate realistic LTE channels. The software designs for signal generation and transmission, signal synchronisation, channel estimation and equalisation, iterative signal detection and the new pipeline processing method, all developed to run in real time on the USRPs, are detailed below.
A. Transmission
At the transmitter, a pseudorandom binary sequence is generated, which is then encoded by a recursive convolutional coder with code rate R c = 1/2, forward polynomial
]. The coded bits are then interleaved by a block interleaver before being mapped onto the appropriate constellation. In this work, we test binary phase shift keying (BPSK), quadrature phase shift keying (QPSK) and 8-phase shift keying (8-PSK). Next, the symbols are converted into a parallel stream which feeds an inverse fast Fourier transform (IFFT), resulting in the generation of SEFDM symbols. The distance between subcarriers is compressed by a factor α ≤ 1, where α = 1 for OFDM. The SEFDM symbols are then converted back into serial streams by a parallel-to-serial (P/S) converter. In order to decrease the effect of inter-symbol interference (ISI) between adjacent symbols in a realistic wireless channel, a cyclic prefix (CP) is added at the beginning of every transmitted symbol. In the final stage of the transmitter, the complex SEFDM signal is fed to the FPGA that drives the USRP RIO, before digital-to-analogue conversion (DAC) and up-conversion by a local oscillator running at 2 GHz. Table I lists the system parameters used in this experiment.
Fig. 1. SEFDM transceiver test-bed setup.
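Three of the transmitter steps described above (block interleaving, PSK mapping and cyclic prefix insertion) can be sketched as follows. This is an illustration only: the interleaver dimensions, the natural-binary (non-Gray) bit-to-phase mapping and all function names are assumptions, since the text does not specify them.

```python
import cmath
import math


def block_interleave(bits, rows, cols):
    """Write bits row by row into a rows x cols block and read them column by column."""
    if len(bits) != rows * cols:
        raise ValueError("block size mismatch")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def block_deinterleave(bits, rows, cols):
    """Inverse of block_interleave."""
    if len(bits) != rows * cols:
        raise ValueError("block size mismatch")
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]


def psk_map(bits, bits_per_symbol):
    """Map bit groups to unit-energy M-PSK points (assumed natural-binary labelling)."""
    order = 2 ** bits_per_symbol
    symbols = []
    for i in range(0, len(bits), bits_per_symbol):
        index = 0
        for b in bits[i:i + bits_per_symbol]:
            index = index * 2 + b
        symbols.append(cmath.exp(2j * math.pi * index / order))
    return symbols


def add_cyclic_prefix(samples, cp_len):
    """Prepend the last cp_len samples of a time-domain symbol to its start."""
    return samples[len(samples) - cp_len:] + samples


if __name__ == "__main__":
    coded = [1, 0, 1, 1, 0, 0]
    mixed = block_interleave(coded, rows=2, cols=3)
    print(mixed, psk_map(mixed, bits_per_symbol=2))
```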
B. LTE Fading Channel Model and Signal Synchronisation
The radio frequency (RF) signal is transmitted through the VR5 channel emulator, which is configured with the LTE EPA5 wireless channel model [17] using the parameters shown in Table II. The output of the VR5 channel emulator is fed back to the receiver of the USRP device, which down-converts the RF signal to baseband before analogue-to-digital conversion. Schmidl and Cox synchronisation [18] is applied in this experiment, where two identical timing sequences are added at the start of each frame to estimate the first sample of the data symbols.
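The idea behind using two identical timing sequences can be illustrated with the classic Schmidl and Cox timing metric, M(d) = |P(d)|² / R(d)², where P(d) correlates two adjacent windows of length L and R(d) is the energy of the second window. The sketch below is a plain textbook form of that metric and is not the testbed's LabVIEW implementation; the function name and the toy preamble are assumptions.

```python
def timing_metric(samples, half_len):
    """Schmidl-Cox style metric for every candidate start index d."""
    metric = []
    for d in range(len(samples) - 2 * half_len + 1):
        first = samples[d:d + half_len]
        second = samples[d + half_len:d + 2 * half_len]
        # P(d): correlation between the two candidate halves.
        p = sum(a.conjugate() * b for a, b in zip(first, second))
        # R(d): energy of the second half, used for normalisation.
        r = sum(abs(b) ** 2 for b in second)
        metric.append(abs(p) ** 2 / r ** 2 if r > 0 else 0.0)
    return metric


if __name__ == "__main__":
    half = [1 + 0j, 1j, -1 + 0j, -1j, 1 + 0j, -1 + 0j, 1j, -1j]  # toy sequence
    frame = [0j] * 16 + half + half + [0j] * 8
    m = timing_metric(frame, len(half))
    print(f"metric at true start (index 16): {m[16]:.2f}")
```

In the noise-free toy frame the metric reaches 1 where the two windows align with the repeated sequence; in practice the frame start is taken near the peak of the metric.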
C. Channel Estimation/Equalisation
In this work, the pilot is sent as an OFDM symbol, but at a lower rate in comparison to SEFDM symbols [19]. We design our OFDM pilot such that its subcarrier frequencies are equivalent to those of the SEFDM subcarriers, but without the inter-carrier interference, since these pilots are orthogonal. This allows the use of a simple one-tap equaliser in the frequency domain to mitigate the effect of the channel. The CP is then removed from the received symbols, and the first symbol of r (i.e. the pilot symbol) is fed to the channel estimator. The resulting channel estimate is then used for channel equalisation to mitigate the phase and amplitude distortion on the signal.
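A one-tap frequency-domain equaliser of the kind mentioned above can be sketched as follows, assuming a least-squares channel estimate obtained by dividing the received pilot by the known pilot on each subcarrier. The estimator choice and the function names are assumptions for illustration; the text does not state which estimator is used.

```python
def estimate_channel(rx_pilot, tx_pilot):
    """Per-subcarrier channel estimate: received pilot divided by known pilot."""
    return [y / p for y, p in zip(rx_pilot, tx_pilot)]


def one_tap_equalise(rx_data, channel):
    """Divide each received subcarrier by its channel estimate (one complex tap)."""
    return [y / h for y, h in zip(rx_data, channel)]


if __name__ == "__main__":
    true_channel = [0.5j, 2 + 0j]            # toy flat gains per subcarrier
    tx_pilot = [1 + 0j, -1 + 0j]
    tx_data = [-1 + 0j, 1 + 0j]
    rx_pilot = [h * p for h, p in zip(true_channel, tx_pilot)]
    rx_data = [h * d for h, d in zip(true_channel, tx_data)]
    h_hat = estimate_channel(rx_pilot, tx_pilot)
    print(one_tap_equalise(rx_data, h_hat))
```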
D. Signal Detection
To recover the transmitted signal, we implement an ID based on the turbo equalisation technique with an interference canceller which is fully detailed in [2, 16] . In every iteration, the interference between the subcarriers is estimated and subtracted from the original received signal R, before being passed to the next iteration.
The equalised data is subsequently de-mapped and deinterleaved at the beginning of each iteration before Viterbi decoding. Using the estimated correlation matrix and the decoded data, the interference generated between SEFDM subcarriers is estimated. After subtracting the estimated interference from the received signal, the result is passed back into the decoding process to improve the interference cancellation; repetition of this process leads to a better estimate of the transmitted data.
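The estimate-and-subtract loop described above can be sketched in a heavily simplified form. The example below is not the turbo-equalisation detector of [2, 16]: the de-mapping, de-interleaving and Viterbi decoding stages are replaced by a plain hard-decision BPSK slicer, the channel is noise-free, and the correlation matrix uses an assumed carrier definition exp(j2παnk/Q)/√Q. All names are hypothetical.

```python
import cmath
import math


def correlation_matrix(q, n_sub, alpha):
    """C = Phi^* Phi for an assumed carrier matrix exp(j*2*pi*alpha*n*k/q)/sqrt(q)."""
    phi = [
        [cmath.exp(2j * math.pi * alpha * n * k / q) / math.sqrt(q) for n in range(n_sub)]
        for k in range(q)
    ]
    return [
        [sum(phi[k][m].conjugate() * phi[k][n] for k in range(q)) for n in range(n_sub)]
        for m in range(n_sub)
    ]


def slice_bpsk(value):
    """Hard decision standing in for the decoder used in the real detector."""
    return 1.0 if value.real >= 0 else -1.0


def iterative_detect(r, c, iterations):
    """Estimate the ICI from the current decisions, subtract it from r, decide again."""
    n = len(r)
    estimate = [slice_bpsk(v) for v in r]
    for _ in range(iterations):
        cleaned = [
            (r[i] - sum(c[i][j] * estimate[j] for j in range(n) if j != i)) / c[i][i]
            for i in range(n)
        ]
        estimate = [slice_bpsk(v) for v in cleaned]
    return estimate


if __name__ == "__main__":
    c = correlation_matrix(4, 4, 0.8)
    sent = [1.0, -1.0, 1.0, 1.0]
    received = [sum(c[i][j] * sent[j] for j in range(4)) for i in range(4)]
    print(iterative_detect(received, c, iterations=3))
```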
E. Pipeline Processing in SDR
As described previously (Section III-D), the SEFDM receiver requires the ID to eliminate inter-subcarrier interference, and one negative impact of this is a significant processing delay that limits system throughput. We therefore introduce a pipeline processing flow on this software defined radio (SDR) testbed to improve the overall throughput of the ID. Pipelining is a well-known concept in real-time SDR processing [20] and FPGA processing flow design [21]. In this work we adopt this signal processing technique by leveraging the power of decentralised multicore processors. The proof of the pipeline design on the SDR platform provides a guideline for implementation on FPGAs. The principle of pipeline flow design is to decompose a long processing sequence into a group of sub-modules. By supplying each sub-module with new data, the pipeline mechanism maximises the efficiency of computing resources by avoiding idle/waiting sub-modules. FPGA designs for SEFDM transmitters and receivers were introduced in [22] and [23], respectively, and a pipelined architecture was proposed for SEFDM transmitters in [24]. For the SEFDM receiver case, this work demonstrates an example pipeline flow design for the ID, as illustrated in Fig. 3.
Fig. 3. Pipeline design inside a single iteration.
The principle for sub-module processing is to distribute the processing delay evenly such that max{τ_i} ≤ τ, where τ_i is the processing delay of the i-th sub-module and τ is the SEFDM symbol duration. Fig. 3 shows that all the sub-modules are fully used after the pipeline setup stage. This improves the system throughput significantly, by up to η times, where η is the number of sub-modules, because the processing delays are evenly distributed across the sub-modules.
The block diagram (Fig. 3) shows the pipeline flow for a single ID. The measurements in Fig. 7 show that a single cancellation iteration is not sufficient to mitigate residual interference from other subcarriers. Thus, it is necessary to perform a certain number of iterations to suppress the interference fully, especially at low S/N values.
An additional advantage of pipeline processing on the SDR is that the software environment provides sufficient flexibility and time budget for precise calculation to balance the load between the sub-modules. The tested load allocation strategy can easily be transplanted to an FPGA pipeline design.
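The timing argument above can be checked with a small model. The sketch below assumes that all symbols are already buffered at time zero and that each sub-module handles one symbol at a time; these assumptions and the function names are illustrative and are not part of the testbed software. With balanced delays, the steady-state gain over sequential processing equals the number of sub-modules η.

```python
def pipeline_speedup(stage_delays):
    """Steady-state throughput gain over sequential processing: sum(tau_i) / max(tau_i)."""
    return sum(stage_delays) / max(stage_delays)


def meets_real_time(stage_delays, symbol_duration):
    """Check the condition max{tau_i} <= tau from the text."""
    return max(stage_delays) <= symbol_duration


def completion_times(stage_delays, n_symbols):
    """Time at which each symbol leaves the last sub-module of the pipeline."""
    stage_free = [0.0] * len(stage_delays)  # time at which each sub-module is next idle
    done = []
    for _ in range(n_symbols):
        t = 0.0  # assumption: every symbol is available from time zero
        for i, delay in enumerate(stage_delays):
            start = max(t, stage_free[i])  # wait for data and for the sub-module
            t = start + delay
            stage_free[i] = t
        done.append(t)
    return done


if __name__ == "__main__":
    delays = [1.0, 1.0, 1.0, 1.0]  # eta = 4 balanced sub-modules
    print(completion_times(delays, 5), pipeline_speedup(delays))
```

After the initial fill of four time units, one symbol completes per time unit, compared with one per four time units when the sub-modules are run sequentially. An unbalanced split such as [1, 3] gives a gain of only 4/3, which is why the load is balanced across sub-modules.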
IV. EXPERIMENTAL RESULTS
The measured BER of BPSK-, QPSK- and 8-PSK-SEFDM is shown in Figs. 4, 5 and 6, respectively. Upon inspection, it is clear that each modulation format approaches the target BER by the third iteration of the ID for varying values of α. It is also possible to infer that a higher ratio of bandwidth compression (decreasing α) is possible with a lower number of bits/symbol, since a value of α = 0.4 can be supported with sufficiently low BER for BPSK with a power penalty of ∼2 dB. For QPSK, on the other hand, α = 0.7 can be supported with performance comparable to OFDM, at a power penalty of approximately 4 dB. Finally, for 8-PSK, α = 0.8 is the lowest value that can be supported over the range tested, with a power penalty of 5 dB, whereas error floors are observed for α ≤ 0.7.
Fig. 7 shows constellations for QPSK with α = 0.7, together with the signal spectrum. The top left constellation shows the received symbols after the FFT (referring to Fig. 2), while the top right constellation shows the same data after channel estimation and equalisation. Clearly, at this stage the data cannot be recovered successfully, hence the requirement for the ID. In Fig. 7, the progressive improvement in the received signal constellation is evident as the number of iterations is increased from one (left) to three (right).
Finally, we note that transforming the ID into a pipelined structure increases throughput linearly, by a factor of η, where η is the number of stages in the structure. However, this comes at a cost in computational complexity, which also increases linearly with η. The convolutional decoder algorithm is the dominant source of computational complexity in the ID, featuring a significant number of additions, which can be calculated following [25]. Fig. 8 illustrates this for η from 1 to 10: the number of addition operations per second increases from 1,613 for η = 1 to 16,130 for η = 10. Given the extensive capabilities of modern digital signal processing units such as field programmable gate arrays (FPGAs), we suggest that this could easily be supported without consuming significant resources required for further processing.
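The linear scaling quoted above can be reproduced with a one-line model. The base figure of 1,613 additions per second for η = 1 is taken from the text; the function and constant names are hypothetical.

```python
BASE_ADDITIONS_PER_SECOND = 1613  # figure quoted in the text for eta = 1


def additions_per_second(eta):
    """Addition operations per second for eta pipelined stages (linear model)."""
    if eta < 1:
        raise ValueError("eta must be at least 1")
    return BASE_ADDITIONS_PER_SECOND * eta


if __name__ == "__main__":
    for eta in (1, 5, 10):
        print(eta, additions_per_second(eta))
```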
V. CONCLUSION
In this work we have experimentally demonstrated a pipelined iterative detector structure for applications of SEFDM. We show that by processing the ID iterations in parallel, SEFDM links can be supported with α = 0.4 (BPSK), α = 0.7 (QPSK) and α = 0.8 (8-PSK), demonstrating no loss in BER performance in comparison to traditional IDs. We further calculate the throughput improvement of the proposed structure, and also discuss the computational complexity. We show that computational complexity increases linearly with throughput.
Fig. 4. BER of BPSK-SEFDM using OFDM pilots.
ACKNOWLEDGMENT
This work was part funded by two EPSRC grants "Impact Acceleration Discovery to Use" and EP/P006280/1: MARVEL. The work was also supported by National Instruments and a donation of the LTE FPGA core through the Xilinx University Donation program. We are grateful for UCL's studentship for Waseem Ozan's PhD studies.
