To meet the exponentially increasing air traffic, the L-band (960-1164 MHz) digital aeronautical communication system (LDACS) has been introduced. LDACS aims to exploit the vacant spectrum between incumbent Distance Measuring Equipment (DME) signals and is envisioned to follow a multi-carrier waveform approach to support high-speed, delay-sensitive multimedia services. This paper deals with the design and implementation of an end-to-end LDACS transceiver on the Zynq System on Chip (ZSoC) platform, consisting of an FPGA as programmable logic (PL) and an ARM processor as processing system (PS). We consider orthogonal frequency division multiplexing (OFDM) based LDACS and improve it further using windowing and/or filtering. We propose a hardware-software co-design approach and analyze various transceiver configurations obtained by dividing the transceiver between PL and PS. We demonstrate the flexibility offered by such a co-design approach to choose the configuration as well as the word-length for given area, delay and power constraints. The transceiver is also integrated with a programmable analog front-end to validate its functionality in the presence of various RF impairments, wireless channels and interference specific to the LDACS environment. To the best of our knowledge, this is the first in-depth analysis of the performance of an end-to-end LDACS transceiver with respect to parameters such as out-of-band attenuation, DME interference, bit-error-rate, word-length, area, delay, and power.
I. INTRODUCTION
The International Civil Aviation Organization (ICAO) envisioned the need for a Future Communication Infrastructure (FCI) for aeronautical systems to support exponentially increasing air traffic and enable a wide range of services from voice data to multimedia [1] [2] [3]. The FCI is expected to be deployed in communications, navigation, and surveillance (CNS) applications as well. Research projects such as Next Generation Air Transportation System (NextGen) and Single European Sky ATM Research (SESAR) [2] have been given the mandate to propose and demonstrate the FCI prototype. As shown in Fig. 1, FCI comprises several data links such as air-to-ground communication (A2GC), air-to-air communication, ground-to-ground communication, satellite-ground communication and vice-versa. The A2GC link enables two-way communication between aircraft and the ground terminal, and it is the most critical data link in the FCI. The ICAO standardization committee has proposed switching the A2GC link from the narrow VHF band (118-137 MHz) to the wider L-band (960-1164 MHz). The L-band spectrum allocation, shown in Fig. 2, indicates that the band is occupied by various incumbent users such as distance measuring equipment (DME), Multi-functional Information Distribution System (MIDS), joint tactical information distribution system, Universal Access Transceiver (UAT), Secondary Surveillance Radar (SSR)/Airborne Collision Avoidance System (ACAS), etc. Based on various spectrum measurement studies, ICAO has identified multiple 1 MHz vacant bands between adjacent DME signals for LDACS. To exploit these bands for the A2GC link, ICAO proposed preliminary LDACS transceiver specifications based on orthogonal frequency division multiplexing (OFDM) transceivers. OFDM has advantages such as low complexity, simple channel equalization, and multi-antenna support.
However, the drawbacks such as large out-of-band emission (OOBE), limited flexibility and stringent synchronization requirements limit the LDACS transmission bandwidth to at most 498 kHz (less than 50% spectrum utilization) due to significant interference to incumbent DME signals. Thus, ICAO expects further research on various windowing and filtering techniques to improve spectrum utilization and feasibility of OFDM based LDACS in complex channel environments encountered in A2GC link during various stages of flight [4, 5] . From the architecture perspective, most of the LDACS transceivers are analyzed via simulations, and their performance analysis on fixed-point hardware in the presence of various RF impairments and wireless channels/interference has not been done yet.
The main objective of the proposed work is to design and implement end-to-end LDACS transceiver on heterogeneous Zynq System on Chip (ZSoC) platform, consisting of FPGA as programmable logic (PL) and ARM as processing system (PS). We also provide a detailed performance analysis with respect to parameters such as windowing, filtering, OOBE, DME interference, bit-error-rate (BER), word-length, area, delay, and power. The contributions of the paper can be summarized as:
1) We design and implement fixed-point OFDM based LDACS and analyze the effect of windowing and/or filtering approaches. Based on the analysis, we suggest enhancements to existing LDACS specifications to improve spectrum efficiency.
2) Since each transceiver block can be realized on PS as well as PL, we provide the architecture for efficient sequential execution on the PS (ARM) and efficient parallel execution on the PL (FPGA).
3) We propose a novel hardware-software co-design approach and implement various transceiver configurations by dividing the transceiver between PL and PS. We demonstrate the flexibility offered by such a co-design approach to choose the configuration, pipelining and word-length for given OOBE, BER, area, delay, and power constraints.
4) Finally, various configurations are integrated with a programmable analog front-end (AFE) to validate the transceiver functionality in the presence of various RF impairments and wireless channels/interference specific to the LDACS environment.
The first three contributions are a significant extension of our work in [6]. In this paper, we design and implement four more transceiver configurations than in [6]. The performance analysis presented here is more detailed, as we consider the effect of word-length, pipelining, LDACS specific channels, and DME interference, compared to only the power spectral density in [6]. Furthermore, we integrate the proposed transceiver with a programmable AFE and analyze the effects of RF impairments.
The remainder of the paper is organized as follows. Section II describes the work done previously in this area. Hardware-software requirements for the transceiver models are discussed in section III. In section IV and V, the transceiver architecture followed by the implementation of its variants using hardware-software co-design on ZSoC, along with pipelining and AD9361 integration, is presented. Experimental results are analyzed in section VI. Section VII concludes the paper.
II. LITERATURE REVIEW
Various works dealing with the performance analysis and feasibility of OFDM based LDACS transceivers for wide range of CNS applications are discussed in [7] [8] [9] . In this section, we focus on the design, implementation, and validation of LDACS transceiver as well as potential alternatives to improve its performance.
The implementation of various blocks in the conventional OFDM based LDACS on homogeneous platforms such as field programmable gate arrays (FPGA) or application specific integrated circuits (ASIC) has been discussed in [10] [11] [12] [13] [14] [15]. The major focus of these works was strictly on the synchronization and channel estimation techniques for the LDACS environment. In [10], a novel correlation based synchronization approach for large carrier frequency offsets is proposed, and its implementation on the FPGA has been shown to consume lower area and power without compromising on the BER performance [11]. In [12], the partial reconfiguration capability of the FPGA is used to design a flexible LDACS transceiver. It offers significant improvement in area and power consumption, but the gains cannot be extended to an ASIC implementation. In [13], a novel method for sensing active LDACS transmissions via a multiplier-less correlation-based approach is proposed. It offers improved performance, especially at low signal-to-noise ratio (SNR), and lower power consumption than other architectures. On the receiver side, reconfigurable low complexity filter and filter bank architectures for channelization and spectrum sensing have been proposed in [14, 15]. Such architectures are based on the frequency response masking approach, and they enable LDACS ground stations to receive and/or sense single as well as multiple frequency bands simultaneously. The major drawback of these works is that they do not consider end-to-end transceiver design.
The homogeneous platforms have limitations of flexibility and scalability and may not be suitable for various real-time decision making tasks. Hence, recently heterogeneous platforms consisting of processor and hardware such as FPGA or ASIC on a single chip are being explored. One such platform is ZSoC consisting of ARM and FPGA on a single chip and it is being envisioned for various wireless communication applications [16] [17] [18] as well as autonomous driving, medical applications. For example, a Cognitive Radio Accelerated with Software and Hardware (CRASH) is introduced in [19] and authors analyzed three possible configurations of spectrum sensing and decision making blocks: 1) Both blocks on the FPGA, 2) Both blocks on the processor and 3) Spectrum sensing on the FPGA and decision making on processor. Their experiments show that the third approach offers superior performance over the others. Similarly, cognitive radio exploiting the partial reconfiguration capability of the FPGA and decision making capability of ARM is demonstrated in [20] . Specifically, processor controls the functionality of the FPGA based on real-time network and spectrum status and allows dynamic switching between channelization and spectrum sensing blocks. Similarly, hardware-software co-design approach for IEEE 802.11a transceiver system is discussed in [21, 22] . However, such study and analysis has not been done yet for LDACS transceivers.
Various alternatives have been discussed to improve the OOBE of the OFDM based LDACS. In [23] , filter bank multi-carrier (FBMC) based LDACS transceiver is presented which offers better OOBE and hence, higher vacant spectrum utilization than OFDM due to sub-carrier filtering approach.
However, the complexity of FBMC is high, and receiver design is challenging due to complex synchronization and channel equalization techniques. Since the architecture of FBMC is significantly different from that of OFDM, the single transceiver cannot support both waveforms on a single chip unless they are stacked in parallel. Furthermore, extension of FBMC for multi-antenna transceiver system, a default configuration offering high data rates and superior performance in challenging environment conditions, is difficult. Generalized Frequency Division Multiplexing (GFDM) [24] is another alternative to OFDM but it has not been analyzed for LDACS yet. Furthermore, due to concern regarding the area and power consumption of the transceiver, ICAO prefers windowing and filtering approaches to improve OOBE of the OFDM based LDACS. In [25] , we proposed a reconfigurable filtered OFDM (Ref-OFDM) using reconfigurable linear phase digital filter. Proposed architecture offers better OOBE than OFDM and GFDM as well as enables dynamical switch between various transmission bandwidths using a single prototype filter. Also, it has lower complexity than FBMC and GFDM making it an attractive solution for next-generation LDACS.
To the best of our knowledge, no existing work deals with the efficient hardware realization of an end-to-end LDACS transceiver on a heterogeneous platform. Also, existing works lack an in-depth analysis of the effect of windowing and filtering on the performance of LDACS in the presence of various RF impairments, realistic LDACS channels and DME interference. The proposed work aims to overcome these drawbacks, thereby contributing to ICAO LDACS standardization activities.
III. TRANSCEIVER ARCHITECTURE
In this section, we present the detailed architecture of the proposed transceiver and extensions via windowing and filtering. We also discuss the design of AFE along with various LDACS specific channels as well as interference. The detailed block diagram of the transceiver is shown in Fig. 3 .
A. Stimulus and Verification Blocks
The stimulus block at the transmitter reads the input data bits to be transmitted. They are either stored on on-board ZSoC memory or they can be transmitted from the laptop over Ethernet (ENET). For illustration, we consider the total 864 data bits divided into 36 distinct frames of 24 bits each. Frame formation is done using simple counters and multiplexers. The verification block receives the frame and reads the corresponding data bits for subsequent performance analysis. Both blocks are implemented on the PS.
B. Digital Baseband Processing Blocks of Transceiver
Various baseband signal processing blocks of the transceiver are shown in the Fig. 3 . The blocks such as scrambler, interleaver, data encoder, data modulator, frame generation, IFFT followed by CP addition and preamble addition are desired signal processing blocks for OFDM transmitter. The receiver consists of similar blocks which perform the operations in the reverse direction. The OOBE performance of the transceiver can be improved further using windowing or filtering or both. For windowing operation, two new blocks, 1) Cyclic suffix addition, and 2) Windowing, are added before preamble addition. Similarly, at the receiver, we need overlap and add block. For filtering operation, new filtering blocks are added at the transmitter as well as receiver.
Each transceiver block can be realized on the PS or PL. In Fig. 3, we consider 10 possible configurations, V1, V2, ..., V10. Each configuration offers a unique boundary between PS and PL. We discuss these configurations in detail later in Section IV. Here, we focus on the functionality and architecture of each block for the serial implementation on the PS as well as the parallel implementation on the PL.
1) Orthogonal Frequency Division Multiplexing (OFDM): The OFDM based transmitter consists of blocks such as scrambler, convolutional encoder, interleaver, binary phase shift keying (BPSK) modulator, Inverse Fast Fourier Transform (IFFT) and cyclic prefix adder. The scrambler performs a bitwise XOR operation on the incoming input data and a random scrambling sequence generated by a linear feedback shift register (LFSR). The same sequence is used to descramble the data at the receiver. This is followed by a convolutional encoder which uses the generator polynomials g0 = 133 and g1 = 171 (octal). These correspond to the maximum free distance rate 1/2 code with constraint length 7. Thus, the output of the convolutional encoder is twice the length of the input. The interleaver performs a two-step permutation on the coded data and is used to handle burst errors. The interleaved data is then converted to complex symbols using the BPSK modulator to obtain 48 symbols. Note that any other modulation scheme such as QPSK, 16 QAM or 64 QAM can also be used. These symbols are then mapped to a 64 point IFFT as shown in Fig. 4. As per LDACS specifications, 64 subcarriers are used, out of which 48 are data subcarriers and four carry pilot symbols in each frame. The remaining subcarriers are null subcarriers in the middle, plus a DC subcarrier at the start. To avoid inter symbol interference, a cyclic prefix (CP) of length 16 is added to the OFDM symbol. Finally, preambles are added to aid the receiver in synchronization. The preamble consists of both a short training sequence (STS) and a long training sequence (LTS). The STS is used for timing acquisition, coarse frequency acquisition and diversity selection, while the LTS is used for channel estimation and fine frequency acquisition [7, 8]. For the length of 160 samples, the LTS is repeated twice while the STS is repeated ten times. The signal is then transmitted over the wireless channel via the AFE and antenna.
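The transmit chain described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used on the ZSoC: the scrambler and interleaver are omitted, the bit-to-symbol mapping (1 to +1, 0 to -1), the pilot value and the subcarrier positions in `PILOT_BINS` and `DATA_BINS` are hypothetical choices that merely match the counts given in the text (48 data, 4 pilot, DC and middle bins null), and a naive inverse DFT stands in for the IFFT.

```python
import cmath

G0, G1 = 0o133, 0o171          # generator polynomials quoted in the text (octal)
N_FFT, CP_LEN = 64, 16
PILOT_BINS = [7, 21, 43, 57]   # assumed pilot positions (hypothetical)
DATA_BINS = [k for k in list(range(1, 27)) + list(range(38, 64))
             if k not in PILOT_BINS]   # 48 bins; bin 0 (DC) and bins 27..37 stay null


def conv_encode(bits):
    """Rate-1/2 convolutional encoder: two coded bits per input bit."""
    state = 0                   # 7-bit shift register, newest bit in the MSB
    out = []
    for b in bits:
        state = ((b & 1) << 6) | (state >> 1)
        out.append(bin(state & G0).count("1") & 1)   # parity over the g0 taps
        out.append(bin(state & G1).count("1") & 1)   # parity over the g1 taps
    return out


def bpsk(bits):
    """Assumed BPSK mapping: bit 1 -> +1, bit 0 -> -1."""
    return [complex(2 * b - 1, 0) for b in bits]


def idft(freq):
    """Naive inverse DFT (O(N^2)); adequate for a 64-point illustration."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ofdm_symbol(data_bits, pilot_value=1 + 0j):
    """Encode 24 bits, map 48 BPSK symbols onto the grid, IFFT, prepend the CP."""
    symbols = bpsk(conv_encode(data_bits))
    grid = [0j] * N_FFT
    for k, s in zip(DATA_BINS, symbols):
        grid[k] = s
    for k in PILOT_BINS:
        grid[k] = pilot_value
    time = idft(grid)
    return time[-CP_LEN:] + time   # 16 + 64 = 80 samples
```

Calling `ofdm_symbol` on a 24-bit frame yields the 80-sample symbol (64 IFFT outputs plus 16 CP samples) that the later sections refer to.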
Difference in processing modes of PL (Sample mode) and PS (frame mode), leads to difference in implementation of each block of the transceiver in the two modes. Due to limited space constraints, we discuss the architecture of few blocks here while remaining blocks are discussed in detail in Supplementary [26] . The PS implementation of the CP addition involves only vector concatenation due to frame-based processing and as shown in Fig. 5 (a), the last 16 symbols of the IFFT output are appended in the beginning as CP.
On the other hand, PL implementation of the same involves additional counter and registers to store the samples to be added as CP. As shown in Fig. 5 (b), we need two registers of length 2CP (32) and N (64) along with Mod-N counter.
For easier understanding, we consider the illustrative example of a frame consisting of 4 samples with 1 CP sample. In this case, we need a first register of size 2 and a second register of size 4. In the first clock cycle, the input sample, $a_0$, is loaded into the first register and hence the contents of the two registers are $\{a_0, 0\}$ and $\{0, 0, 0, 0\}$. At the fifth clock cycle, the contents of the two registers will be $\{a_4, a_3\}$ and $\{a_2, a_1, a_0, 0\}$. In the next clock cycle, a frame reset (reset_in) happens since we have received all samples of a frame, and hence the contents of the two registers will be $\{0, a_3\}$ and $\{a_2, a_1, a_0, 0\}$. From the next cycle onward, the output valid signal is always 1 and we get the first output, which is $a_3$ from the first register, and the contents of the registers become $\{b_0, 0\}$ and $\{a_3, a_2, a_1, a_0\}$. Here, $b_0$ is the first sample of a new frame. Subsequently, the next four outputs are taken from the second register. In this way, we get the output as $a_3, a_0, a_1, a_2, a_3$. Similarly, in the next four clock cycles, the output will be $b_3, b_0, b_1, b_2, b_3$. As discussed before, valid and reset signals are used to synchronize the transfer of data between any two adjacent blocks and need to be handled carefully in each block. For instance, as shown in Fig. 5 (b), the valid signal involves 32 and 64 tapped delays, similar to the ones used in the data signal.
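The contrast between the two processing modes can be illustrated with a short Python sketch. It is a behavioural model only, under the assumption that buffering a full frame is an acceptable stand-in for the registers and Mod-N counter of Fig. 5 (b); it does not reproduce the clock-accurate register contents or the valid/reset handshaking described above, and the function names are hypothetical.

```python
def add_cp_frame(frame, cp_len):
    """Frame-based (PS-style) CP addition: a single vector concatenation."""
    return frame[-cp_len:] + frame


def add_cp_stream(samples, n, cp_len):
    """Sample-based (PL-style) CP addition, modelled behaviourally.

    Samples arrive one at a time; a full frame of n samples is stored
    before the CP followed by the frame itself is emitted.
    """
    buf = []
    for s in samples:
        buf.append(s)
        if len(buf) == n:                     # counter wraps: frame complete
            for out in buf[n - cp_len:] + buf:
                yield out
            buf = []


frame_a = ["a0", "a1", "a2", "a3"]
frame_b = ["b0", "b1", "b2", "b3"]
print(add_cp_frame(frame_a, 1))
print(list(add_cp_stream(frame_a + frame_b, n=4, cp_len=1)))
```

Both paths produce the ordering given in the example, with the last sample of each frame repeated at its start.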
2) WOLA-OFDM: In WOLA-OFDM, the conventional rectangular window is replaced by a windowing pulse with soft edges to improve the out-of-band emission of CP-OFDM [27] . This soft edge windowing is applied in time domain via point-to-point multiplication between the output of CP block and window function. The additional sequence of operations at the transmitter are as follows:
1) Cyclic Extension: The CP addition is slightly different in WOLA-OFDM than in CP-OFDM. As shown in Fig. 6, the CP is formed by appending the last CP + W samples of a given symbol (output of IFFT) to its beginning, and the cyclic suffix (CS) is formed by appending the first W samples of a given symbol to its end. Therefore, the length of the WOLA-OFDM time domain symbol is extended from N to N + CP + 2W as shown in Fig. 6. Such windowing at the transmitter demands additional signal processing at the receiver to suppress the asynchronous inter-user interference. As shown in Fig. 6, the additional steps at the receiver are as follows:
1) The RRC windowing is again applied at the retrieved data, this window is independent to the transmitted one, and its length is equal to N + 2W . 2) Two adjacent received WOLA-OFDM symbols are overlapped with each other and then added to the next symbol to retrieve the data. The overlap and add process is applied to minimize the effects of windowing on the useful data as shown in Fig. 6 . The PS and PL implementation of windowing is shown in Fig. 7 (a) and Fig. 7 (b) , respectively. The PS implementation at the transmitter is straightforward due to a frame based approach in which a time domain multiplication of the input data with the windowing coefficients is performed as shown in Fig. 7 (a) .
In the PL implementation, the data arrives in the form of samples. Therefore, to add the cyclic prefix, suffix and windowing samples, all 64 samples (1 frame) are collected with the help of 63 tapped delays. The input valid signal increments the counter value and the counter counts up to 63, i.e., a total of 64 samples. Once we have received the whole frame of 64 samples (without the cyclic prefix and suffix), the output valid signal becomes one. The output valid signal is generated for one clock cycle for the output frame of size 80 (after addition of the cyclic prefix and suffix).
For the PL implementation of windowing, we exploit parallel operation by dividing the windowing into head and tail sections. Let $P_1$ and $P_2$ denote the windowing coefficients for the head and tail sections, respectively. $P_1$ is of length W + CP, in which the first W samples correspond to the first W coefficients of the RRC window $P$ while the remaining samples are fixed to 1. $P_2$ is of length W and corresponds to the last W coefficients of the RRC window $P$. In the end, the cyclic prefix, cyclic suffix and the data are concatenated and the desired 80 samples are selected for transmission by discarding the samples in the tapered region.
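The cyclic extension and the head/tail split of the window can be sketched as follows. This is a minimal sketch under assumptions: the edge shape in `rrc_edge` is one plausible root-raised-cosine-style taper (the paper's exact window coefficients are not given here), the value W = 4 in the usage lines is hypothetical, and the final selection of 80 samples and the receiver-side overlap-and-add are not modelled.

```python
import math


def rrc_edge(w):
    """Assumed rising edge of the window: w coefficients increasing towards 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * w)) for n in range(w)]


def wola_tx(symbol, cp, w):
    """Cyclically extend an N-sample symbol to N + CP + 2W samples and taper its edges."""
    n = len(symbol)
    # prefix = last CP + W samples, suffix = first W samples
    extended = symbol[n - (cp + w):] + symbol + symbol[:w]
    rise = rrc_edge(w)
    p1 = rise + [1.0] * cp          # head coefficients, length W + CP
    p2 = rise[::-1]                 # tail coefficients, length W
    head = [c * x for c, x in zip(p1, extended[:w + cp])]
    body = extended[w + cp: w + cp + n]
    tail = [c * x for c, x in zip(p2, extended[w + cp + n:])]
    return head + body + tail


symbol = [complex(i, -i) for i in range(64)]
tx = wola_tx(symbol, cp=16, w=4)
print(len(tx))   # N + CP + 2W
```

Because only the first and last W samples are scaled, the CP and the N useful samples pass through unchanged, which is why the head and tail sections can be processed in parallel on the PL.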
At the receiver, windowing is modelled in the same manner as at the transmitter. Additionally, the overlap and add processing is performed by directly extracting the desired samples from the received frame and then concatenating them to the beginning and end of the symbol. The PS and PL implementations of the overlap and add processing are the same, as presented in Fig. 8.
3) Filtered OFDM: FOFDM uses a linear phase finite impulse response (FIR) filter instead of time domain windowing for further improvement in out-of-band emission. In [25], we have shown that FOFDM enables higher transmission bandwidth compared to the 498 kHz bandwidth limit of the OFDM based LDACS system. It also enables transmission in non-contiguous bands and sharing of adjacent frequency bands among asynchronous users. However, the filter needs to be carefully designed and implemented as it may lead to higher inter-symbol and inter-carrier interference. In the proposed FOFDM transceiver, we consider LDACS with 480 kHz of bandwidth and a sampling frequency of 1.1 MHz. Hence, we designed a linear phase low-pass filter of order 150 with a normalized cut-off frequency of 0.86 and a transition bandwidth of 0.02 using the Parks-McClellan approach [28, 29]. The PS and PL implementations of the FIR filter are shown in Fig. 9 (a) and 9 (b), respectively. The filter specifications and implementation are identical at the transmitter and receiver. For the implementation of the filter, we have directly used the HDL optimized model provided by Xilinx. In the case of the PS implementation, we need additional zero padding to handle delay balancing and a selector to choose the desired filtered data. For the PL implementation of the filter, we have studied the effect of word-length on the performance of the transceiver. Please refer to Section V for more details.
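The zero padding and selection mentioned for the frame-based (PS) filter can be illustrated with a small Python sketch. It assumes a symmetric (linear phase) FIR filter with an odd number of taps, so that the group delay is (len(taps) - 1) / 2 samples, which is 75 samples for an order-150 filter. The function name and the 3-tap filter in the usage lines are hypothetical placeholders, not the Parks-McClellan design used in the paper.

```python
def fir_filter_frame(frame, taps):
    """Filter one frame with a linear-phase FIR and remove its group delay.

    The frame is zero padded by the group delay so the filter is flushed,
    and the first `delay` outputs are then dropped (the 'selector'), so the
    output has the same length and alignment as the input frame.
    """
    delay = (len(taps) - 1) // 2
    padded = list(frame) + [0.0] * delay
    out = []
    for n in range(len(padded)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:                  # samples before the frame are zero
                acc += h * padded[n - k]
        out.append(acc)
    return out[delay:]


# A unit impulse at index 2 stays centred at index 2 after filtering.
print(fir_filter_frame([0.0, 0.0, 1.0, 0.0, 0.0], [0.25, 0.5, 0.25]))
```

In a sample-based (PL) implementation the same delay appears as latency on the data path, which is one reason the valid signal has to be delayed alongside the data.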
C. Analog Front End: RF Transmitter and Receiver
The output of the transmitter is passed to the AFE for over-the-air transmission in L-band. The AFE is designed using the RF models provided by Analog Devices for use in MATLAB/Simulink. The transmitter consists of digital up-conversion (DUC) filters, analog filters and an RF front-end as shown in Fig. 11. The digital up-conversion filter is a series of digital FIR filters that converts the baseband signal to an intermediate frequency (IF) signal. The sample rate of the DUC filter should be the same as that of the input signal. The digital filter also introduces a noise floor. The analog filters are used to shape this noise floor and provide a continuous time signal that is processed by the RF front-end. The RF front-end up-converts the IF signal to the RF carrier frequency using the local oscillator (LO), followed by amplification using a power amplifier.
At the receiver, the RF front-end down-converts the signal centered on the same LO frequency to IF using a quadrature demodulator. The RF front-end has three main components: a low noise amplifier (LNA), a quadrature demodulator (mixer) and a transimpedance amplifier (TIA); the chain is referred to as LMT. The gains of each component are tunable and controlled by the AGC. The analog filters provide a continuous time signal to the ADC. The ADC models a high-sampling rate third order delta-sigma modulator. The low-pass digital down conversion filters convert the highly sampled signal at the output of the ADC to baseband. The output of the AFE is passed to the OFDM receiver in Zynq. The integration of the AFE with the transceiver in Fig. 3 and its parameters as per the LDACS specification are discussed in Section V-A.
D. LDACS Specific Wireless Channels and DME Interference
As shown in Fig. 12, three channels which are specific to the LDACS environment are considered: Airport (APT), Terminal Maneuvering Area (TMA), and En-route (ENR). The channels are modeled as wide sense stationary with uncorrelated scattering and characterized using three properties: fading, delay paths, and Doppler frequency [30]. The channel parameters are given in Table I [30] [31] [32] [33]. Note that the Doppler frequency is obtained as

$$f_D = \frac{v f_c}{c},$$

where $v$ is the aircraft speed, $f_c$ is the carrier frequency and $c$ is the speed of light. Along with these LDACS specific channels, DME interference is also taken into account. DME is a measuring equipment used for navigation purposes, and it is the major source of interference to LDACS since LDACS is deployed between two DME channels. The DME signal is composed of Gaussian pulse pairs given as:

$$s(t) = e^{-\alpha t^2/2} + e^{-\alpha (t - \delta t)^2/2},$$

where $\delta t = 12\,\mu s$ denotes the spacing between the pulses and $\alpha = 4.5 \times 10^{11}\,s^{-2}$ sets the pulse width. All the experimental results presented in this paper consider the DME interference.
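For illustration, the Gaussian pulse pair and the Doppler shift can be evaluated numerically. The sketch below assumes the commonly used pulse-pair form s(t) = exp(-αt²/2) + exp(-α(t-δt)²/2) with α = 4.5e11 s^-2 (a value for which each pulse is about 3.5 µs wide at half amplitude) and the usual maximum Doppler relation f_D = v·f_c/c. The function names and the 300 m/s speed in the usage lines are hypothetical and are not taken from Table I.

```python
import math

ALPHA = 4.5e11            # s^-2, assumed pulse-width parameter
DELTA_T = 12e-6           # s, spacing between the two pulses (from the text)
SPEED_OF_LIGHT = 299_792_458.0   # m/s


def dme_pulse_pair(t):
    """Baseband envelope of one DME pulse pair at time t (seconds)."""
    return (math.exp(-ALPHA * t ** 2 / 2)
            + math.exp(-ALPHA * (t - DELTA_T) ** 2 / 2))


def doppler_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift for a given speed and carrier frequency."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT


# Half-amplitude width of a single pulse: 2 * sqrt(2 ln 2 / alpha)
half_width = 2 * math.sqrt(2 * math.log(2) / ALPHA)
print(f"half-amplitude pulse width: {half_width * 1e6:.2f} us")
print(f"Doppler at 300 m/s, 985 MHz: {doppler_hz(300.0, 985e6):.0f} Hz")
```

Sampling `dme_pulse_pair` at the baseband rate and shifting it to the DME channel offset gives an interference waveform that can be added to the received LDACS signal in simulation.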
E. Receiver
At the receiver, the preamble detection block detects the beginning of the data frames using auto-correlation and extracts them for subsequent processing. For cyclic prefix removal, the first 16 samples are discarded out of the 80 incoming samples. The output data symbols are demodulated using the BPSK demodulator. The deinterleaver then deinterleaves the bits using the pre-defined sequence, followed by decoding with a Viterbi decoder that uses the same generator polynomials as the convolutional encoder in the transmitter. The descrambler uses the corresponding descrambling sequence to retrieve the 24 bits of a frame. A similar process is repeated for each frame. The next section presents the HW-SW co-design approach used for the transceiver design and implementation.
IV. HARDWARE-SOFTWARE CO-DESIGN APPROACH
The HW-SW co-design approach gives the flexibility to choose which part of the transceiver is best suited to be implemented on PL and PS of the ZSoC. In this section, we present design details of various transceiver configurations (V1-V10), shown in Fig. 3 realized using the HW-SW codesign approach. The data transfer between PS and PL plays an important role in this approach and corresponding details are summarized in Table II .
We begin with the configuration V1, in which the complete transceiver is implemented on the PS (ARM) as shown in Fig. 13 and hence, there is no data transfer between PS and PL as shown in Table II. The stimulus model generates 32 bit unsigned integers, out of which 24 are data bits (single frame), 2 are valid and reset signals and the remaining are zero padded bits. Each data bit is modulated and processed to obtain an OFDM symbol with 80 samples (64 subcarriers + 16 samples as CP). Each sample can be represented in the form of an 8/16/32-bit fixed-point data type. Each frame of 24 data bits takes $t_{pf}$ = 80 µs, assuming 1 sample takes 1 µs. With 36 data frames, 4 pilot frames and additional delays due to frame synchronization, one simulation runs for a duration of 43 × $t_{pf}$. The performance analysis model compares the transmitted and received bits for subsequent BER and throughput analysis. The realization of this architecture on ZSoC is done using MATLAB HDL Coder and Verifier, along with the Embedded Coder toolbox. Please refer to [6, 26] for the detailed steps involved in the HW-SW co-design.
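The 32-bit stimulus word and the frame timing can be made concrete with a short Python sketch. The bit layout used here (24 data bits in the low bits, valid at bit 24, reset at bit 25, upper bits zero) is purely an assumption for illustration; the text only states how many bits of each kind the word contains. The function names are hypothetical.

```python
T_PF_US = 80            # time per frame: 80 samples at 1 us per sample
FRAMES_PER_RUN = 43     # 36 data + 4 pilot frames + synchronization delays


def pack_word(data_bits, valid, reset):
    """Pack 24 data bits plus valid/reset flags into one 32-bit unsigned word."""
    if len(data_bits) != 24:
        raise ValueError("a frame carries exactly 24 data bits")
    data = 0
    for b in data_bits:                       # first bit ends up most significant
        data = (data << 1) | (b & 1)
    return ((reset & 1) << 25) | ((valid & 1) << 24) | data


def unpack_word(word):
    """Inverse of pack_word: returns (data_bits, valid, reset)."""
    data_bits = [(word >> shift) & 1 for shift in range(23, -1, -1)]
    return data_bits, (word >> 24) & 1, (word >> 25) & 1


frame = [1, 0, 1, 1, 0, 0, 1, 0] * 3
word = pack_word(frame, valid=1, reset=0)
print(hex(word), unpack_word(word)[1:])
print("simulation duration:", FRAMES_PER_RUN * T_PF_US, "us")
```

With these numbers one simulation run spans 43 × 80 µs = 3440 µs.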
In configuration V2, the filtering operation is moved to PL and hence, it is applicable only for FOFDM. As shown in Fig. 14, the transmitter and receiver are each divided into two sections, one for PS and the other for PL. For V2, the output of transmitter 1 is a frame consisting of 80 complex OFDM samples, each of which can be represented in 8/16/32-bit fixed point format. One such frame, along with the valid and reset signals, is interfaced with an AXI-compatible buffer realized in PL. The buffering is necessary for subsequent sample-based processing in PL (FPGA). Similarly, unbuffering is needed while passing the data from PL to PS after the filtering operation of the receiver in PL (Receiver 1). Note that the sampling time of the blocks in PS is 80 µs while the sampling time of the blocks in PL is 1 µs. Configurations V3-V9 are similar to V2, with a few more blocks moved from PS to PL. For instance, in V3, the preamble addition and detection blocks are realized in PL along with filtering (in FOFDM). Configuration V4 realizes the windowing and overlap and add blocks along with the preamble addition and detection in PL, and the rest of the blocks are implemented on PS. This configuration is only applicable to WOLA-OFDM. In configurations V5-V6, the IFFT and CP addition operations are also moved to PL and hence, the frame size is reduced from 80 to 48 as shown in Table II. Similarly, in configuration V7, the data modulation and demodulation blocks are moved to PL, which means Boolean data is transferred between PL and PS. For configurations V8-V10, the number of data elements is reduced from 48 to 24 since the channel encoder and decoder with a coding rate of 1/2 are moved to PL. In the final configuration V10, the entire transceiver is realized on PL except the stimulus block. It can be observed that each configuration needs to be designed carefully to synchronize the data transfer between PS and PL.
Furthermore, the architecture of the block changes when it is moved between PS and PL due to frame and sample based processing. For PL implementation of each block, we have added pipelining inside the block as well as between the blocks. This demands additional synchronization efforts between PS and PL due to change in latency.
V. EXPERIMENTAL SETUP AND RESULT ANALYSIS
In this section, we present the details of experimental setup and analyze various results to compare the performance and complexity of the proposed transceivers.
A. Testbed Setup and Configuration
In this paper, we have used the Xilinx ZSoC ZC706 evaluation board shown in Fig. 15 for implementation of the proposed transceivers, and its specifications are briefly given in Table III [34]. It consists of a dual core Cortex A9 Advanced RISC Machines (ARM) processor as the software component (PS) and Xilinx 28nm Kintex 7-series logic as the hardware component (PL) [35]. It is a processor centered device in which the PS always boots first and is fully autonomous from the PL. Both PS and PL communicate with each other using the Advanced eXtensible Interface (AXI) protocol. There are 9 AXI ports between PS and PL, and in this project, we use four ports for communication between PS and PL. Among various AXI protocols, we use AXI-stream for communication between PS and PL and AXI-Lite for communication between various signal processing blocks realized in the PL. For the design and implementation of the transceivers, we have used MATLAB 2017b and Vivado 2016.4. These are augmented with various MATLAB toolboxes such as Embedded Coder and HDL Coder/Verifier to target the implementation on the PS and PL, respectively. To design and configure the AFE, we have used RF Toolbox along with the communication and signal processing toolboxes, and the hardware support packages provided by MathWorks.
Fig. 15. Xilinx ZC706 evaluation board along with its important architectural features [35].
Programming
The AFE is programmed to meet the desired sampling and carrier frequency requirements of the LDACS. The custom digital and analog filters are designed and configured with the help of the RF Toolbox of MATLAB/Simulink. For the LDACS transceiver, the passband and stopband frequencies are 0.33 MHz and 0.41 MHz, respectively. The stopband attenuation is 80 dB and the desired baseband sampling rate is 1.1 MHz. The filter at the receiver is identical to that at the transmitter. The local oscillator frequency is set to 985 MHz as the LDACS is deployed in the range of 960-1164 MHz, and for such up-conversion, various rate changer blocks are added in the design. The output of the AFE receiver is scaled by an appropriate factor (0.00019 to be exact) so that the power level of the signal at the AFE receiver output closely matches that of the signal at the AFE transmitter input. The AFE transceiver also introduces phase noise due to transmission at RF frequency and hence, it demands phase error estimation and correction at the receiver. For the proposed transceiver, we have used the pilot signals in LDACS for phase estimation, and the correction is accordingly applied to all received samples. Next, we present the experimental results.
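A pilot-aided phase correction of the kind described above can be sketched as follows. This is a generic common-phase-error estimator, shown only as an illustration under assumptions: it estimates a single phase offset per OFDM symbol by correlating the received pilots with their known values, which may differ from the estimator actually used in the transceiver. The function name and the pilot values in the usage lines are hypothetical.

```python
import cmath


def correct_common_phase(rx_symbols, rx_pilots, ref_pilots):
    """Estimate a common phase offset from the pilots and derotate all symbols.

    Returns the corrected symbols and the estimated phase in radians.
    """
    # Correlating received pilots with the known pilots averages out noise
    acc = sum(r * p.conjugate() for r, p in zip(rx_pilots, ref_pilots))
    phase = cmath.phase(acc)
    derotate = cmath.exp(-1j * phase)
    return [s * derotate for s in rx_symbols], phase


ref_pilots = [1 + 0j, 1 + 0j, 1 + 0j, -1 + 0j]     # assumed pilot values
tx_symbols = [1 + 0j, -1 + 0j, -1 + 0j, 1 + 0j]
offset = cmath.exp(1j * 0.3)                       # 0.3 rad phase error
rx_symbols = [s * offset for s in tx_symbols]
rx_pilots = [p * offset for p in ref_pilots]
corrected, estimate = correct_common_phase(rx_symbols, rx_pilots, ref_pilots)
print(round(estimate, 3))
```

In this noise-free example the estimate equals the injected 0.3 rad offset and the corrected symbols coincide with the transmitted ones.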
B. Power Spectral Density (PSD) Comparison
We begin with the PSD comparison of the OFDM, WOLA-OFDM and FOFDM based LDACS transceivers and analyze their out-of-band (OOB) emission. The higher the OOB emission, the higher the interference to legacy DME users. Thus, the transceivers should offer low OOB emission that does not exceed the interference constraints of the DME. Here, we assume that a single LDACS transmitter is active in the 1 MHz spectral gap between adjacent DME channels.
The PSD comparisons of OFDM, FOFDM and WOLA-OFDM for two transmission bandwidths, 1) 732 kHz and 2) 498 kHz, are presented in Fig. 16 (a) and (b), respectively. The legacy DME transmission is shown in orange. Note that 498 kHz is the maximum possible bandwidth of the existing OFDM based LDACS, beyond which it fails to meet the interference constraints of the DME. Although FOFDM can achieve a bandwidth of 800 kHz, we have chosen 732 kHz because it can be achieved using the same frame structure as that of 498 kHz, making it compatible with legacy LDACS [25]. For all the transceivers, the word-length (WL) is fixed at 32 bits. It can be observed that the FOFDM has approximately 40 dB lower OOB emission and hence causes much lower interference to the legacy DME signals. This allows FOFDM to increase the transmission bandwidth from the standard 498 kHz (the maximum possible in OFDM) to 732 kHz, leading to a significant improvement of approximately 50% in spectral efficiency over the existing OFDM based LDACS.
Next, we compare the performance of all transceivers by varying the WL. First, we change the WL of the windowing and filtering blocks of the transceiver to 8 or 16 bits while keeping the WL of the rest of the transceiver at 32 bits. As expected, there is no change in the performance of OFDM, as it does not involve windowing or filtering. The PSDs of FOFDM and WOLA-OFDM for different WLs are shown in Fig. 17 (a) and (b). It can be observed that the PSDs for WLs of 16 and 32 are almost identical, while there is significant degradation when the WL is 8. Thus, it is possible to reduce the WL to 16 without compromising the PSD performance.
Next, we also analysed the PSD performance when the WL of the complete transceiver is reduced from 32 to 8 and 16. For illustration, we show the PSD of the OFDM in Fig. 18. Due to space constraints and to avoid repetitive results, we omit the plots for the FOFDM and WOLA-OFDM transceivers. For all the transceivers, we observed that the PSD is almost identical for WLs of 16 and 32, but there is significant degradation when the WL is reduced to 8.
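As a rough illustration of how the word-length affects a filtering block, the sketch below rounds a set of filter coefficients to signed fixed point and reports the worst-case coefficient error for WLs of 8, 16 and 32 bits. The windowed-sinc prototype, its length and cutoff, and the choice of one sign bit with all remaining bits fractional are assumptions made for the example; this is not the LDACS filter and does not reproduce the measured PSDs.

```python
import math

def quantize(value, word_length, frac_bits):
    """Round to signed fixed point with saturation; return the represented value."""
    scale = 1 << frac_bits
    lo = -(1 << (word_length - 1))
    hi = (1 << (word_length - 1)) - 1
    code = max(lo, min(hi, round(value * scale)))
    return code / scale

# Hypothetical low-pass prototype: Hann-windowed sinc taps (not the paper's filter).
num_taps = 33
cutoff = 0.3  # normalized to the sampling rate (assumption)
mid = (num_taps - 1) / 2
taps = []
for n in range(num_taps):
    x = n - mid
    sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
    hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
    taps.append(sinc * hann)

max_err = {}
for wl in (8, 16, 32):
    frac = wl - 1  # one sign bit, remaining bits fractional (assumption)
    max_err[wl] = max(abs(t - quantize(t, wl, frac)) for t in taps)
    print(f"WL={wl:2d}: max coefficient error = {max_err[wl]:.3e}")
```

The rounding error is bounded by half a least-significant bit, so each removed bit roughly doubles the worst-case coefficient error. This gives one plausible intuition for why an 8-bit WL is visibly worse while 16 and 32 bits behave alike.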
To summarize, we observed that the FOFDM offers a superior PSD and hence lower interference to legacy DME when compared to the other transceivers. This allows FOFDM to have a wider transmission bandwidth, which is desired for future air-to-ground communication. However, a better PSD at the cost of poor BER performance is not acceptable for wireless transceivers. Hence, we study the BER performance of the various transceivers in the next sub-section.
C. Bit Error Rate Comparison
For the BER analysis, we consider the end-to-end transceiver with LDACS channels (ENR, APT and TMA), DME interference and RF impairments due to the AFE. We consider two transmission bandwidths: 1) 732 kHz and 2) 498 kHz. All BER results are obtained from hardware with at least 1000 frames of data, and a single BER plot for one transceiver takes around 120 hours on the ZC706 with a CPU having 16 GB of RAM.
As shown in Fig. 19, FOFDM offers significantly better BER performance than the others over a wide range of SNRs. Note that although the BER performance of WOLA-OFDM and OFDM is acceptable for 732 kHz, they cannot be deployed at this bandwidth due to severe interference to DME. Similar to the PSD analysis, we compare the BER performance for three different WLs: 32, 16 and 8. As shown in Fig. 20, the BER performance degrades with decreasing WL for all the transceivers. However, the FOFDM offers significantly better performance than the others. In fact, the BER of FOFDM with a WL of 16 is significantly better than that of WOLA-OFDM with a WL of 32. Similarly, the BER of FOFDM with a WL of 8 is significantly better than that of OFDM and WOLA-OFDM with WLs of 32 and 16, respectively.
Next, we study the effect of the WL of the windowing and filtering blocks on the BER. Since the PSD and BER performance of the transceivers with WLs of 16 and 32 are comparable, we have used the transceiver with a WL of 16 for the results shown in Fig. 21. It can be observed that the FOFDM with the filtering operation using WLs of 16 and 32 offers similar performance, while its performance degrades when the WL is reduced to 8. A similar trend is also observed for WOLA-OFDM. Thus, the selection of the WL is an important criterion for the transceiver, and a higher WL may not guarantee a higher gain in performance. In terms of BER and PSD, FOFDM not only offers better performance but also enables a higher transmission bandwidth. However, this gain in performance should not come at a significant cost in terms of complexity. To analyse this, we present the area and power complexity of these transceivers in the next sub-section.
D. Resource Utilization and Power Consumption
In this subsection, we compare the resource utilization and power consumption of the proposed OFDM, WOLA-OFDM and FOFDM architectures for 10 different configurations. Since the bandwidth of the transceiver is tunable, the results shown in Table IV correspond to the 732 kHz bandwidth, which has higher complexity than the 498 kHz bandwidth. To begin with, we consider a WL of 16 in Table IV. All results are obtained after realizing the transceiver on the Xilinx ZC706.
As shown in Table IV, the comparison is made in terms of the number of flip-flops, DSP48 slices (embedded multipliers), lookup tables (LUTs) used as memory, LUTs used for logical and arithmetic operations, multiplexers, and the dynamic power consumption of the FPGA. The power consumption of the ARM PS (1.566 W) and the static power consumption of the FPGA PL (0.247 W) are, as expected, identical for all configurations.
In the case of V1, the entire transceiver is in the PS and hence the FPGA resource utilization is zero. In V2, the FOFDM resource utilization is due to the filtering block realized in the FPGA. As expected, the multiply-accumulate operations in the filter are mapped to DSP48 slices to get the best possible performance. In V3, the preamble addition and detection block is moved to the FPGA and, due to its inbuilt auto-correlation operations, it is one of the most complex blocks, as is evident from the resource utilization. Similarly, a significant increase in resource utilization and power consumption is observed in V5, where the FFT/IFFT is moved from the PS to the PL.
To summarize, FOFDM uses 27% more DSP48 slices than the others due to its MAC based filtering, which can be shifted to LUTs as logic if needed. For example, the windowing operation in WOLA-OFDM is realized using a combination of DSP48 slices and LUTs as logic. The utilization of the rest of the resources is almost identical in all three waveforms.
In Fig. 21, we discussed the effect of the WL of the filter coefficients in the filtering block of FOFDM on the BER. In the case of resource utilization, we observed that the utilization increases with the WL, as shown in Fig. 22. For WOLA-OFDM, a different WL for the windowing coefficients is not feasible for air-to-ground communications due to poor PSD and BER performance.
The results discussed above show that the FOFDM offers better side-lobe attenuation and better BER performance at the cost of higher resource utilization. A filter designed with an 8-bit fixed WL performs worse than a 16- or 32-bit filter in terms of PSD and BER, but better in terms of resource utilization. The FOFDM uses more resources than OFDM and WOLA-OFDM but still uses less than 50% of the FPGA resources, except for the DSP48 slices. This makes the FOFDM based LDACS an appealing candidate for future air-to-ground communication.
VI. CONCLUSION
In this paper, we designed and implemented an end-to-end LDACS transceiver on the Xilinx ZC706 ZSoC platform using a HW-SW co-design approach. This co-design approach gives the flexibility to choose the configuration along with the word-length for given area, power and delay constraints. The transceivers are integrated with the AD9361 analog front-end to validate their performance in the presence of various RF impairments, DME interference and LDACS specific wireless channels. We consider OFDM based LDACS and improve its performance using windowing and/or filtering. Detailed experimental results are presented to analyze the area, power, PSD and BER performance of OFDM, WOLA-OFDM and FOFDM for three word-lengths of 8, 16 and 32 bits. The results show that the transceivers with WLs of 16 and 32 bits offer similar performance, while the performance degrades for a WL of 8 bits. The filtered OFDM based LDACS performs much better in terms of out-of-band emission (approximately 40 dB lower) and has significantly better BER performance, which allows it to adopt a wider transmission bandwidth of up to 800 kHz at the cost of higher resource utilization and power consumption. Although FOFDM has higher resource utilization than OFDM and WOLA-OFDM, it still uses less than 50% of the FPGA resources, except for the DSP48 slices. This makes the FOFDM based LDACS an attractive solution for next generation air-to-ground communication.
Transmitter

Scrambler
The 24-bit input stream is scrambled with a predefined constant scrambling sequence using a bit-wise XOR operation. The selector block selects the bit of the scrambling sequence corresponding to each incoming data bit. The PS (Fig. A.1) and PL (Fig. A.2) implementations of the scrambler block differ in the generation of the valid signal, owing to the PS-PL boundary added in the PL model. For the PS implementation, the valid signal is constantly true, whereas for the PL implementation, due to the presence of the PS-PL boundary prior to the scrambling block, an appropriate valid signal is generated once all 24 valid bits have been received.
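The scrambling operation can be sketched in a few lines. The 24-bit sequence below is a made-up placeholder, since the actual constant scrambling sequence is not listed here; only the bit-wise XOR structure follows the description above.

```python
def scramble(bits, sequence):
    """XOR each data bit with the scrambling-sequence bit at the same index."""
    if len(bits) != len(sequence):
        raise ValueError("data and scrambling sequence must have equal length")
    return [b ^ s for b, s in zip(bits, sequence)]

# Hypothetical 24-bit scrambling sequence (placeholder, not the LDACS constant).
SCRAMBLE_SEQ = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0,
                0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1]

data = [0, 1] * 12
scrambled = scramble(data, SCRAMBLE_SEQ)
descrambled = scramble(scrambled, SCRAMBLE_SEQ)  # XOR is its own inverse
```

Because XOR is self-inverse, the receiver can recover the data by applying the same function with the same sequence.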
Convolutional Encoder
The scrambled sequence and the valid sequence are then forwarded to the convolutional encoder block. A rate-1/2 convolutional encoder with p_1 = 133 and p_2 = 171 as the generator polynomials is used to add error detection and correction capability at the receiver. For the PS implementation, the entire sequence is encoded using the Simulink convolutional encoder block (Fig. A.3). Since the PL implementation is sample based, a frame-to-sample conversion is required prior to the convolutional encoder, and a vector concatenation is required after encoding to retrieve the entire frame, as shown in Fig. A.4.
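A sample-based view of the rate-1/2 encoder can be sketched as follows. The sketch assumes that 133 and 171 are octal generator polynomials of a constraint-length-7 code with the current input bit aligned to the most significant tap, and that the encoder starts in the all-zero state; these conventions are assumptions for illustration and should be checked against the configuration of the Simulink block.

```python
G1, G2 = 0o133, 0o171   # generator polynomials, read as octal (assumption)
K = 7                   # constraint length implied by 7-bit generators

def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1

def conv_encode(bits):
    """Encode one bit at a time, emitting two coded bits per input bit."""
    state = 0  # six previous input bits, most recent in the MSB
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state   # current bit plus shift-register contents
        out.append(parity(reg & G1))
        out.append(parity(reg & G2))
        state = reg >> 1               # shift the register by one position
    return out

# Impulse response: the two output streams spell out the generator taps.
impulse = conv_encode([1, 0, 0, 0, 0, 0, 0])
```

Feeding a single 1 followed by zeros makes the two output streams reproduce the binary taps of the two generators, which is a convenient sanity check for the tap ordering.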
BPSK Modulation
The BPSK modulation used in the architecture has −1 and +1 as the constellation points. The pre-defined BPSK baseband modulator block of the Simulink communication toolbox is used in the PS implementation (Fig. A.6), with the phase offset set to zero. Other modulation schemes can be used similarly. For the PL implementation, we have designed our own BPSK modulation model, in which we shift the amplitude of the incoming bit to the respective constellation point using multiplication and subtraction. The complex BPSK symbols are generated by setting the imaginary part to zero (Fig. A.7). However, an HDL Coder block for BPSK modulation is also available.
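The multiply-and-subtract mapping can be written out explicitly. The sketch below assumes the polarity 0 -> +1 and 1 -> −1 (the usual convention for a BPSK modulator with zero phase offset); the opposite polarity would work equally well as long as the demodulator matches.

```python
def bpsk_modulate(bits):
    """Map bit 0 -> +1 and bit 1 -> -1 using one multiply and one subtract."""
    return [complex(1 - 2 * b, 0) for b in bits]  # imaginary part fixed to zero

def bpsk_demodulate(symbols):
    """Hard decision on the real part: negative -> 1, otherwise 0."""
    return [1 if s.real < 0 else 0 for s in symbols]

bits = [0, 1, 1, 0, 1]
symbols = bpsk_modulate(bits)
recovered = bpsk_demodulate(symbols)
```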
Preamble Addition
The preamble sequence is predefined and comprises both a long and a short preamble sequence. For the PS implementation (Fig. A.8), a counter is used to detect the frame number. For the first 4 frames, the preamble (both short and long) is transmitted. The preamble sequence is read from the workspace. Of the 320 samples, the 80 samples to be transmitted are selected depending on the frame number. For the PL implementation, the short and long preambles are each stored in LUTs. The data and the valid signal to be transmitted are then decided depending on the sample number detected using a counter (Fig. A.9).
The valid signal from the transmitter now enables the receiver functionality. The first step in the receiver is preamble detection, which uses auto-correlation to detect the data frames. The auto-correlation is performed using a filter, and a magnitude detector is used to detect the peak (Fig. A.10).
The implementation of preamble detection is the same in both the PS and PL.
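The filter-plus-magnitude-detector structure can be illustrated with a minimal sketch. Note that it swaps in a matched-filter cross-correlation against a known sequence, which is one common way to realize such a detector; the exact correlator in the model may differ. The 13-chip Barker sequence, the threshold and the function names are placeholders, not the LDACS preamble.

```python
def correlate(rx, preamble):
    """Sliding correlation of rx against the known preamble (an FIR matched filter)."""
    n = len(preamble)
    conj = [p.conjugate() for p in preamble]
    return [abs(sum(rx[i + k] * conj[k] for k in range(n)))
            for i in range(len(rx) - n + 1)]

def detect_preamble(rx, preamble, threshold):
    """Return the index of the correlation peak, or None if it is below threshold."""
    mags = correlate(rx, preamble)
    peak = max(range(len(mags)), key=mags.__getitem__)
    return peak if mags[peak] >= threshold else None

# Hypothetical preamble: 13-chip Barker sequence (placeholder for illustration).
preamble = [complex(v) for v in (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)]
rx = [0j] * 20 + preamble + [0j] * 10
start = detect_preamble(rx, preamble, threshold=0.8 * len(preamble))
```

Here the preamble is embedded 20 samples into an otherwise silent stream, so the magnitude peak marks the start of the frame.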
BPSK Demodulator
Using the Simulink BPSK baseband demodulator block, we retrieve the bit stream from the BPSK symbols in both the PS and PL implementations, as shown in Fig. A.11. However, delays are used in the PL implementation to generate an appropriate valid signal that preserves data integrity and yields the complete frame of 64 bits required for the subsequent deinterleaving process (Fig. A.12).
Deinterleaver
The bitstream is deinterleaved using the pre-defined deinterleaver sequence and a selector block, similar to the interleaving process discussed above.
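A selector driven by an index sequence amounts to a permutation, and the deinterleaver simply applies the inverse permutation. The sketch below uses a hypothetical 8x8 row/column block interleaver over a 64-bit frame purely for illustration; the actual pre-defined sequences are not given here.

```python
def permute(bits, order):
    """Selector-style reordering: output[i] = bits[order[i]]."""
    return [bits[j] for j in order]

def inverse_order(order):
    """Build the selector indices that undo `order`."""
    inv = [0] * len(order)
    for i, j in enumerate(order):
        inv[j] = i
    return inv

# Hypothetical 8x8 block interleaver: write by rows, read by columns.
N_ROWS, N_COLS = 8, 8
INTERLEAVE = [r * N_COLS + c for c in range(N_COLS) for r in range(N_ROWS)]
DEINTERLEAVE = inverse_order(INTERLEAVE)

frame = list(range(64))              # positions used as payload for visibility
interleaved = permute(frame, INTERLEAVE)
restored = permute(interleaved, DEINTERLEAVE)
```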
Viterbi Decoder
The Viterbi decoder block from the Communications System Toolbox is used to decode the data. For the PS implementation, the complete frame is fed to the decoder to generate the output frame consisting of the decoded bits (Fig. A.15), while in the PL implementation, a sample counter is used to monitor the incoming bits corresponding to each output sample bit, as shown in Fig. A.15.
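For readers unfamiliar with the decoder, the following is a compact hard-decision Viterbi sketch for a rate-1/2, constraint-length-7 code. It is not the toolbox block: it assumes octal generators 133 and 171, an encoder starting in the all-zero state, hard-decision inputs and a full traceback over the whole frame, all of which are illustrative choices.

```python
G1, G2, K = 0o133, 0o171, 7      # generators read as octal (assumption)
N_STATES = 1 << (K - 1)

def parity(x):
    return bin(x).count("1") & 1

def branch(state, bit):
    """Return (next_state, (out1, out2)) for one trellis transition."""
    reg = (bit << (K - 1)) | state
    return reg >> 1, (parity(reg & G1), parity(reg & G2))

def conv_encode(bits):
    state, out = 0, []
    for b in bits:
        state, pair = branch(state, b)
        out.extend(pair)
    return out

def viterbi_decode(coded):
    """Hard-decision Viterbi decoding; the encoder is assumed to start in state 0."""
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)
    history = []  # per step: for each next state, (previous state, input bit)
    for t in range(0, len(coded), 2):
        r1, r2 = coded[t], coded[t + 1]
        new_metric = [inf] * N_STATES
        back = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                ns, (o1, o2) = branch(s, b)
                m = metric[s] + (o1 != r1) + (o2 != r2)  # Hamming branch metric
                if m < new_metric[ns]:
                    new_metric[ns] = m
                    back[ns] = (s, b)
        metric = new_metric
        history.append(back)
    # Trace back from the best final state.
    state = min(range(N_STATES), key=metric.__getitem__)
    decoded = []
    for back in reversed(history):
        state, bit = back[state]
        decoded.append(bit)
    return decoded[::-1]

# 24 data bits plus six zero tail bits, with two channel bit errors injected.
message = [1, 0, 1, 1, 0, 0, 1, 0] * 3 + [0] * (K - 1)
received = conv_encode(message)
received[4] ^= 1
received[31] ^= 1
decoded = viterbi_decode(received)
```

In this example the two isolated bit errors are corrected and the original message is recovered; a hardware decoder would typically use a bounded traceback depth rather than storing the whole frame.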
Hardware-Software Co-Design Workflow
The hardware-software co-design approach is used to design and simulate the transceiver models. It is an important approach for implementing any algorithm on the ZSoC, as it exploits the heterogeneity of the PS and PL. This approach also gives the flexibility to choose which parts of the system are best suited to the PL and which to the PS. The PS makes decision-making operations easier and faster, whereas the PL reduces power consumption and increases speed. The steps of the hardware-software co-design approach are as follows:
1. Design a Simulink model for the transceivers and set parameters such as the number of samples per frame, sampling frequency, total FFT size, active subcarriers, and subcarrier spacing.
Not all blocks in the Simulink library are hardware synthesizable, so such blocks must be avoided while designing the Simulink model.
2. Identify the subsystems of the model that are to be implemented on the PL; all other subsystems are targeted to the PS. The PL works in sample mode and the PS works in frame mode, which requires appropriate sample-to-frame and frame-to-sample conversions at the PS-PL interface. Fig. A.17 shows a design with N functional blocks. The transmitter subsystem blocks 1_T, 2_T, 3_T, ..., i_T are implemented on the PS, and the remaining blocks are implemented on the PL. A similar process is used for the receiver operations. Note that the output to the host computer comes back through the PS.
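The frame-to-sample and sample-to-frame conversions at the PS-PL boundary can be pictured with a small behavioral sketch. The class and function names, the frame length of 4 and the trivial gain block standing in for a PL subsystem are all hypothetical; the sketch only mimics the data and valid handshake, not the AXI interface itself.

```python
def frame_to_samples(frame):
    """PS -> PL: emit one (sample, valid) pair per clock tick."""
    for sample in frame:
        yield sample, True

class SampleToFrame:
    """PL -> PS: buffer valid samples and release a frame once it is full."""
    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.buffer = []

    def push(self, sample, valid):
        if valid:
            self.buffer.append(sample)
        if len(self.buffer) == self.frame_len:
            frame, self.buffer = self.buffer, []
            return frame
        return None

def pl_block(sample):
    """Stand-in for a sample-based PL block (here: a trivial gain of 2)."""
    return 2 * sample

collector = SampleToFrame(frame_len=4)
frames_out = []
for frame in ([1, 2, 3, 4], [5, 6, 7, 8]):
    for sample, valid in frame_to_samples(frame):
        result = collector.push(pl_block(sample), valid)
        if result is not None:
            frames_out.append(result)
```

Samples arriving with the valid flag low are ignored by the collector, which is the role the valid signal plays in the PL models described above.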
[Figure labels: Host PC; ZSoC ZC706 + FMCOMM Board]
