This paper reports on a complete, fully reconfigurable end-to-end 5G mmWave testbed based on an FPGA architecture. The proposed system is composed of a baseband/low-IF processing unit and a mmWave RF front-end at both the TX and RX ends. In particular, the baseband unit design is based on a typical agile digital IF architecture, enabling on-the-fly modulations up to 256-QAM. The real-time 5G mmWave testbed presented herein adopts OFDM as the transmission waveform, which was assessed over-the-air (OTA) using the key performance indicators EVM and BER. A detailed overview of the system architecture is given, together with the hardware considerations taken into account during the development of the mmWave testbed. It is then demonstrated that the proposed testbed enables real-time multi-stream transmission of UHD video content captured by nine individual cameras, which is one of the killer applications for 5G.
Introduction
Millimeter wave (mmWave) communications are envisaged to be integrated into the upcoming generation of mobile networks, namely in enhanced Mobile Broadband (eMBB), one of the 5G verticals. They are seen as the solution to increase the overall capacity of mobile radio cells, enabling multi-Gigabit/s transmissions towards mobile equipment. Due to the large available bandwidth (55-66 GHz), the mmWave band is very attractive for future 5G wireless communication systems, which are expected to provide data rates over 10 Gbps and network latency below 1 ms. However, even if a 2 GHz channel in the 60 GHz band is used to transmit data with 4- or 16-Quadrature Amplitude Modulation (QAM), the data rate would still be limited to 4 and 8 Gbps, respectively. Therefore, there is demand for improving both system reliability and data rates. In this context, this work introduces a novel software-defined radio (SDR) mmWave testbed, aimed at tackling the 5G communication requirements outlined in [1].
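The 4 and 8 Gbps ceilings quoted above follow from simple arithmetic. The sketch below is a minimal illustration, assuming an idealized link in which the symbol rate equals the occupied bandwidth and no coding, guard-band or pulse-shaping overhead is present; the function name is illustrative and not part of the testbed.

```python
import math

def raw_bit_rate_gbps(bandwidth_ghz: float, qam_order: int) -> float:
    """Idealized bit rate: one symbol per hertz and log2(M) bits per symbol."""
    bits_per_symbol = math.log2(qam_order)
    return bandwidth_ghz * bits_per_symbol

# 2 GHz channel in the 60 GHz band, as in the text (assumed overhead-free)
for order in (4, 16):
    print(f"{order}-QAM over 2 GHz: {raw_bit_rate_gbps(2.0, order):.0f} Gbps")
```

Under the same assumption, reaching 10 Gbps in a 2 GHz channel would require at least 5 bits per symbol, which is why higher-order modulations are of interest.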
With the importance of mmWave testbeds in mind, work at 60 GHz can be found in the literature [2] [3] [4]. However, such works were conducted with universal software radio peripherals (USRPs), which offer limited bandwidths of around 25 MHz and therefore do not fulfill the multi-Gigabit/s scenarios foreseen for 5G. To this end, this work discusses the implementation of a real-time software-defined radio mmWave testbed that can cope with the needs of the next wireless generation.
A software-defined radio (SDR) is a common term given to a system which implements the majority of its physical layer functionalities using digital signal processing algorithms running on an embedded platform. This SDR system (Figure 2) is composed of an embedded platform engine, a Field Programmable Gate Array (FPGA) [8], and at least one DAC/ADC pair responsible for signal translation between the digital and analog domains. An FPGA is an integrated circuit intended for general purpose usage, which contains programmable logic blocks that form higher-level functions (multiplexers, adders, multipliers and RAM memories) and a reconfigurable interconnection between them [9].
FPGA platforms bring three main benefits to the designer: faster time-to-market (no layout for manufacturing is needed), simpler design (the software handles much of the routing, placement, and timing) and re-programmability. Traditionally, these features came at the expense of chip performance, e.g., clock speed. However, in recent Xilinx FPGA technology, digital clock frequencies can exceed the 500 MHz performance barrier [9].
Consequently, FPGA technology was used for the proposed 5G mmWave wireless communication system, especially due to its faster time-to-market and reduced cost. Due to its primary importance in this work, a brief overview of FPGA considerations is given in Section 2.1.
FPGA
One of the main challenges faced by embedded system developers is the diversity of external I/O interfacing requirements, which might be Ethernet, optical, analog conversion or Gigabit serial [10]. Thus, to enable data transfer between the embedded system and other hardware devices, an external I/O interface is needed. Nowadays, FPGA boards come with one or more expansion slots for daughter boards (mezzanine modules), which increase the I/O possibilities. Such expansion slots allow FPGA designers not to rely solely on the I/Os of each FPGA and decrease system development costs. Without them, designers would need to change the FPGA development board whenever system enhancements require additional interfaces [11].
To overcome this, the ANSI/VITA 57 FPGA Mezzanine Card (FMC) connectivity standard was developed in 2008, to provide a modular standard mezzanine I/O interfacing solution to an FPGA located on a carrier board [11]. This brought several benefits, such as a maximum data throughput of 40 Gbps, low latency, reduced design complexity, and reduced system cost [10].
This standard foresees two types of expansion slots, also called FMC, which are usually populated on commercial FPGA solutions: the Low Pin Count (LPC) and the High Pin Count (HPC) connectors. While the former has 160 pins, the latter has 400 available pins. The LPC connector provides 68 single-ended or 34 differential user-defined signals, as well as one serial transceiver pair, clocks, and Joint Test Action Group (JTAG) and Inter-Integrated Circuit (I2C) interfaces. The HPC connector maintains the previously mentioned functionalities while providing twice the user-defined signals of the LPC connector, together with 10 serial transceiver pairs.
At the time of this work, the most recent and powerful Xilinx FPGA available on the market (i.e., the one with the most logic resources) belongs to the Virtex family, namely UltraScale [9]. For example, the XCVU9P-L2FLGA2104E chip provides more than 2000 k logic cells [12]. However, it is relatively expensive, and thus it was decided to acquire the Virtex-7 VC707 evaluation kit instead [13] (see Figure 3). This board is highly flexible, features the XC7VX485T-2FFG1761C chip-set with high-speed serial connectivity, and offers a good compromise between cost and available logic resources.
ADC/DAC
The selection of both the DAC and the ADC was based on speed, bit resolution and compatibility with the VC707 FPGA board. According to the Xilinx documentation, the FMC230 (DAC) [14] and FMC126 (ADC) [15] from 4DSP are the only converters available on the market that meet such requirements. Table 1 summarizes the main features of the converters, where it can be noticed that the maximum bandwidth of the proposed SDR is restricted to 2.5 GHz. The FMC126 is a four-channel 1.25 GSPS 10-bit resolution ADC daughter board compliant with the mentioned HPC VITA 57.1 standard. Its design is based on the EV10AQ190 quad 1.25 GSPS ADC chip-set [18], which is characterized by Double Data Rate (DDR) Low Voltage Differential Signal (LVDS) outputs [16]. Moreover, its four cores can operate individually, in pairs or all interleaved together, which enables sample rates of 1.25, 2.5, and 5 GSPS, respectively. To achieve the gigabit rates reported in this work, the ADC had its reference sample clock configured to 2.5 GHz, which can be locked by either an external clock source or an internal IC clock circuit.
The FMC230 is a two-channel 14-bit DAC daughter board, which is controlled through a single Serial Peripheral Interface (SPI) communication bus [17]. The board design is based on two AD9129 single-channel 14-bit 5.7 GSPS DACs with DDR LVDS inputs [19].
Since such boards are intended to work at gigabit rates, special attention was paid to their clock distribution. The clock distribution is carried out by an AD9517 chip from Analog Devices (AD) [20], which by default has an internal PLL and a 2500 MHz internal VCO. However, as discussed in Section 2.3, the VCO can also be provided externally.
As in the FMC126, the FMC230 sample clock is provided by the mentioned AD9517, but its internal PLL reference is configured at 30.72 MHz rather than 100 MHz, which results in an internal VCO locked at 2457.60 MHz (rather than 2500 MHz).
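The two VCO frequencies mentioned above imply different multiplication ratios in the clock-tree PLL. The sketch below is a minimal illustration, assuming a simple relation f_VCO = N x f_ref in which any reference divider is folded into N; the helper name is hypothetical and the actual AD9517 register programming is not shown.

```python
from fractions import Fraction

def pll_multiplier(f_vco_mhz: str, f_ref_mhz: str) -> Fraction:
    """Ratio between the locked VCO frequency and the PLL reference.

    Frequencies are passed as decimal strings so the ratio is computed exactly.
    """
    return Fraction(f_vco_mhz) / Fraction(f_ref_mhz)

# Values quoted in the text; the board labels are only for readability
print("FMC126:", pll_multiplier("2500", "100"))       # 100 MHz reference
print("FMC230:", pll_multiplier("2457.60", "30.72"))  # 30.72 MHz reference
```

Both ratios are integers, which is consistent with each VCO being locked to its respective reference.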
ADC/DAC Integration with FPGA
The integration of the ADC/DAC with the VC707 board required hardware fine tuning to achieve the desired data rates. This section therefore describes in detail the steps taken to achieve such integration. It is important to mention that 4DSP already offers a VC707 reference design to serve as a basis, but multiple iterations were necessary to arrive at the proposed mmWave SDR testbed.
Firstly, the FMC slots of the reference designs were changed so that both boards work simultaneously on the same FPGA. Secondly, since both boards use the FPGA Ethernet as an initialization and data interface, this physical connection had to be reconsidered in the proposed SDR testbed. Thirdly, sampling rate adjustments were performed, because the ADC and DAC had different default values which would otherwise lead to loss of samples. Lastly, both the DAC and ADC data paths were changed to transmit in real-time rather than using the default memory architecture, which temporarily stores data in memories before transmission.
As mentioned above, the 4DSP reference firmwares use Ethernet communication to perform the initialization of both devices and to transmit/receive data. Consequently, such an architecture demands a host PC to be always connected to the FPGA, which is impractical in a wireless communication transceiver. Therefore, the Ethernet connection must be replaced by a new VHDL entity capable of playing back the initialization commands that were previously sent by the MAC engine to the FMC board entities (stars). This approach is highlighted in the block diagram illustrated in Figure 4, where all the connections in red are the ones that must be removed from the original FMC230 and FMC126 reference designs. With such changes, the MAC engine star is only used to generate clock and reset signals for the other stars and, for this reason, it was decided not to remove it.
In this work, the aforementioned new entity is a processor engine, which allows the configuration commands of both converters to be sent (in real-time) from the FPGA to either the ADC or the DAC. Nevertheless, the design and implementation of a processor hardware engine architecture is not a straightforward task. It is usually composed of a MicroBlaze processor, peripherals, and reset, clocking and debugging blocks, which are connected together using the AXI4-Stream protocol [21]. Since the MicroBlaze is a soft-core processor with a 32-bit RISC architecture and AXI interfaces, the developed entity was designed with AXI in mind for easy, direct access to the fabric and to hardware acceleration [22]. It is worth noting that this interface/protocol is not a bus, and thus a custom user block was necessary to convert the AXI data coming from the processor engine to the bus format given in Figure 4.
The VC707 board has two instances of the FMC HPC VITA 57.1 connector mentioned in Section 2.1. However, the VC707's HPC2 connector is not fully populated, i.e., it has a reduced pin-out in comparison with the standard HPC connector. Consequently, an FMC230 placed on this connector would have only one of its two DAC channels available. On the other hand, if the FMC126 were connected to it, only three of its four ADC channels would be available. Considering that most relevant modern digital modulation techniques employ I/Q modulation to achieve higher spectral efficiency, two channels are required on both the DAC and ADC boards. Therefore, the FMC230 and FMC126 have been connected to the FMC1 and FMC2 slots of the VC707 board, respectively, as can be seen in Figure 3.
The FMC126 board includes not only the quad ADC IC core itself, but also other peripherals, such as a clock tree IC circuit and an I2C interface for ADC control and temperature monitoring [16].
Since the FMC126 reference design foresees its use in the HPC1 connector, the I2C clock (SCL) and data (SDA) signals must now be driven to the HPC2 connector. This can be achieved by reconfiguring the VC707's I2C switch, the PCA9548A [23]. As depicted in Figure 6, the sip_i2c_master entity should route the signals to either the FMC1 or the FMC2 connector, depending on the I2C channel selection.
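The channel selection of a PCA9548A-type switch is made by writing a single control byte in which bit n enables downstream channel n. The sketch below only builds that byte; it is a hedged illustration, not the sip_i2c_master logic. The channel indices assigned to FMC1 and FMC2 are assumptions and should be checked against the VC707 board documentation, and no I2C transaction is performed.

```python
def pca9548a_control_byte(channels):
    """Return the control byte that enables the given downstream channels (0-7)."""
    value = 0
    for ch in channels:
        if not 0 <= ch <= 7:
            raise ValueError(f"channel {ch} is outside the 0-7 range")
        value |= 1 << ch  # one bit per downstream channel
    return value

# Assumed (hypothetical) channel mapping on the carrier board
FMC1_CHANNEL = 1
FMC2_CHANNEL = 2

# Route SCL/SDA to the connector that now hosts the FMC126
print(hex(pca9548a_control_byte([FMC2_CHANNEL])))
```

Writing the byte for one channel at a time keeps the two mezzanine cards on separate I2C segments, which avoids address clashes between devices on the two boards.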
ADC/DAC Sample Rate Synchronisation
With both the ADC and DAC fully integrated with the VC707 FPGA, their data sampling process was studied and addressed. At the receiver, this process might be subject to severe timing imperfections, such as erroneous information acquisition or loss of transmitted samples, depending on the difference between the DAC and ADC sampling rates [5]. Erroneous information acquisition can occur when the ADC rate is higher than the DAC rate, whereas loss of transmitted samples might happen if the ADC rate is lower than the DAC rate. Therefore, the sampling rates of both converter boards should match.
The easiest solution found in this work to meet the above requirement is to configure the clock tree of the FMC230 to drive a 1.25 GHz clock signal into both AD9129 cores. This would be a very straightforward task if the internal VCO of the AD9517-1 could generate a 1.25 GHz operating frequency (the maximum rate achievable by the FMC126 without phase interleaving) or an integer multiple of this frequency. In Figure 7, it is seen that the DAC sample clock, driven by the $f_{\mathrm{DAC\_CLK}}$ path, is provided by the OUT2 output of the AD9517-1. This port provides clock frequencies from 2457.6 MHz down to 12.8 MHz, which result from configuring both block dividers, VCO_div and div1, or, alternatively, from bypassing VCO_div and using div1, or vice versa. However, only integer values are acceptable in such block dividers and, thus, it is not possible to obtain a divider combination for which the OUT2 port is at 1.25 GHz. Therefore, matching the ADC sample clock in the FMC230 is only possible with an external signal as the sample clock reference.
As a workaround, the clock tree of the FMC126 has been configured to enable a Clock Output (CO) signal at 1.25 GHz, with a Low-Voltage Positive-referenced Emitter Coupled Logic (LVPECL) waveform, to be connected to the Clock Input (CI) port of the FMC230. Although the AD9517-3 circuit can be easily configured to enable OUT1 as CO, this port is not connected to any external connector on the FMC126 board. To overcome this, an additional SSMC connector has been soldered to the CO port pad on the reverse side of the device PCB.
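The divider argument above can be checked by brute force. The sketch below is a minimal illustration, assuming VCO_div takes integer values 1 to 6 and div1 takes integer values 1 to 32, with 1 standing for a bypassed divider; these ranges are assumptions chosen because they reproduce the 2457.6 MHz to 12.8 MHz span quoted in the text.

```python
VCO_MHZ = 2457.6      # internal VCO frequency of the FMC230 clock tree
TARGET_MHZ = 1250.0   # sample clock needed to match the FMC126

def reachable_out2_frequencies(vco_mhz, vco_div_values=range(1, 7),
                               div1_values=range(1, 33)):
    """All OUT2 frequencies obtainable with integer VCO_div and div1 values."""
    return sorted({vco_mhz / (v * d) for v in vco_div_values for d in div1_values},
                  reverse=True)

freqs = reachable_out2_frequencies(VCO_MHZ)
hit = any(abs(f - TARGET_MHZ) < 1e-6 for f in freqs)
nearest = min(freqs, key=lambda f: abs(f - TARGET_MHZ))

print("highest / lowest OUT2 frequency (MHz):", max(freqs), min(freqs))
print("1250 MHz reachable:", hit)
print("nearest reachable frequency (MHz):", nearest)
```

Since 2457.6/1250 is not an integer, no divider pair reaches the target, which is what motivates the external sample clock reference.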
FMC126/FMC230 Data Interface
It is well known that operating FPGA logic at GHz clock frequencies is impractical, since the very short rise times involved cause the design to fail its timing requirements, so that data integrity is not ensured. In current solutions, the FPGA clock performance barrier is around 500 MHz. Hence, 4DSP adopted a parallel data path architecture in both original FMC230/FMC126 firmwares, which significantly reduces the required data clock frequencies to the MHz range. For example, in the FMC230 functional block diagram (see Figure 7), two parallel DDR data interface paths come from the FMC connector (DAC0_p0_p/DAC0_p0_n and DAC0_p1_p/DAC0_p1_n, operating at $f_{\mathrm{DACCLK}}/4$) and are then multiplexed into a single data stream by the assembler entity, which is clocked at $f_{\mathrm{DACCLK}}/2$, where $f_{\mathrm{DACCLK}}$ represents the DAC sampling rate ($F_s$).
A detailed block diagram of the FPGA data connectivity with the DAC's firmware is depicted in Figure 8. In the figure, it can be seen that the data source is generated in the Ad9129_wfm_inst0 entity, driven by the wfm_out_data bus signal (16 words of 16 bits each) clocked at txclkdiv8. This clock is equal to DAC0_dco divided by 4, or $F_s/16$. Since the AD9129 bit resolution is 14 bits, the two most significant bits are then discarded in the Ad9129_io_buf_v7_inst0 entity. In this entity, the 256-bit data bus (Odata_reg) is serialised using two sets of fourteen parallel-to-serial OSERDESE2 structures (high-speed source-synchronous output fabric interfaces) operating in DDR mode. This results in two buses of 14 bits (DAC0_p0_p/DAC0_p0_n and DAC0_p1_p/DAC0_p1_n), clocked at txclkdiv2, which are then connected to the FMC230 board through the FMC1 pin-out connection (see Figure 7).
The data interface of the FMC126 with the FPGA is rather similar to that of the FMC230, but here eight parallel data paths are considered instead. That is, data go to the FMC230 on a 256-bit bus, whereas data come from the FMC126 on a 128-bit bus. This leads to a mismatch between the clock values of the DAC and ADC firmware channel paths. For example, considering a target sampling rate of 1.25 GHz, the buses are clocked at 78.125 MHz and 156.25 MHz for the FMC230 and FMC126, respectively. To overcome this issue, the number of data paths in the FMC230 firmware was reduced to 8 (matching the FMC126), and thus the entities ad9129_mmcm_inst, ad9129_phy0_dac0_inst0, and ad9129_io_buf_v7_inst0 have been updated accordingly, where the txclkdiv8 value is now given by $F_s/8$, i.e., 156.25 MHz at the target sampling rate.
As previously stated, the testbed data source is modulated with a real-time OFDM Transceiver (TRX) engine, which was developed in the Xilinx System Generator (SysGen) environment. SysGen is a state-of-the-art tool for the design, testing, and implementation of high-performance DSP algorithms on FPGAs. It enables rapid prototyping of very complex FPGA designs by relying on the MathWorks model-based design environment, namely Simulink, with total abstraction of the chip-set complexity [24, 25]. In this context, the OFDM TRX engine (testbed) was packed in SysGen into a customized IP to be integrated into the previously discussed transceiver firmware.
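The bus clock rates quoted for the two converter interfaces follow directly from the number of parallel paths. The sketch below is a minimal illustration, assuming one 16-bit word per path on both interfaces (which matches the 256-bit and 128-bit bus widths mentioned in the text); the function names are illustrative.

```python
def bus_clock_mhz(sample_rate_msps: float, parallel_paths: int) -> float:
    """Clock needed on a bus that carries `parallel_paths` samples per cycle."""
    return sample_rate_msps / parallel_paths

def bus_width_bits(parallel_paths: int, word_bits: int = 16) -> int:
    """Total bus width, assuming one word of `word_bits` per parallel path."""
    return parallel_paths * word_bits

FS_MSPS = 1250.0  # target sampling rate of 1.25 GHz

for label, paths in (("16 paths (original FMC230)", 16), ("8 paths (FMC126)", 8)):
    print(f"{label}: {bus_width_bits(paths)}-bit bus at "
          f"{bus_clock_mhz(FS_MSPS, paths)} MHz")
```

Halving the number of DAC paths doubles the bus clock, which is how both firmwares end up on a common 156.25 MHz processing clock.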
However, to design and implement the aforementioned communication system, it is necessary to understand how a real-time data source engine can be integrated with the data interface of both ADC/DAC reference designs (eight parallel processing paths at 156.25 MHz). In this context, a DSP interpolation filter is necessary at the transmitter, capable of interpolating the sample rate of 156.25 MHz to the DAC sampling rate of 1.25 GHz. This solution was implemented using Finite Impulse Response (FIR) filter banks (clocked at 156.25 MHz) by following the polyphase decomposition algorithm reported in [26]. It increases the sampling rate and attenuates the spectral images generated by the up-sampling process using a low-pass filter, which also shapes the transmitted signal bandwidth [5]. On the RX side, the counterpart must be implemented. That is, the received signal is first low-pass filtered, to remove undesired components that would otherwise overlap (alias) into the used band, and the sampling rate is then decreased with a down-sampling operation. These two operations together form a decimation filter [5]. It is worth noting that 16 distributed FIR filters are necessary for the I/Q up-/down-conversion.
A filter bank is a parallel arrangement of low-, band-, or high-pass filters, used to decompose the spectral content of a signal into multiple sub-bands [27]. When the filter bank is implemented based on a low-pass prototype filter, which corresponds to the 0th band, the other uniform sub-bands are obtained by frequency shifting the frequency response of the prototype filter [28]. In this case, it is known as a DFT filter bank, and the graphical representation of its frequency response is illustrated in Figure 10. Furthermore, the polyphase decomposition algorithm [26] can be employed to enable an efficient real-time implementation of uniform filter banks, while ensuring perfect reconstruction in the analysis/synthesis process.
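The benefit of the polyphase structure is that every sub-filter runs at the low input rate while the interleaved outputs equal those of a filter running at the full DAC rate. The sketch below illustrates this equivalence in plain Python under stated assumptions: the taps and input samples are arbitrary placeholder values, not the filter used in the testbed, and the functions are illustrative rather than a description of the FPGA implementation.

```python
def upsample_then_filter(x, h, factor):
    """Reference: zero-stuff by `factor`, then convolve with h at the high rate."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0.0] * (factor - 1))
    y = []
    for n in range(len(up)):
        acc = 0.0
        for k, tap in enumerate(h):
            if n - k >= 0:
                acc += tap * up[n - k]
        y.append(acc)
    return y

def polyphase_interpolate(x, h, factor):
    """Same output from `factor` sub-filters that each run at the input rate."""
    branches = [h[p::factor] for p in range(factor)]  # branch p: h[p], h[p+L], ...
    y = []
    for m in range(len(x)):
        for p in range(factor):          # branch p yields output sample m*L + p
            acc = 0.0
            for i, tap in enumerate(branches[p]):
                if m - i >= 0:
                    acc += tap * x[m - i]
            y.append(acc)
    return y

FACTOR = 8                                   # 156.25 MHz -> 1.25 GHz
h = [0.05 * (k + 1) for k in range(24)]      # placeholder taps (3 per branch)
x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5]         # placeholder low-rate samples

reference = upsample_then_filter(x, h, FACTOR)
polyphase = polyphase_interpolate(x, h, FACTOR)
print("max difference:", max(abs(a - b) for a, b in zip(reference, polyphase)))
```

In hardware, the eight branch outputs would be produced simultaneously on each 156.25 MHz clock cycle rather than in a loop, which is what allows the eight-path data interface to be fed directly.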
Since eight parallel signal processing paths are required per I/Q path, eight analysis/synthesis FIR filters were carefully designed and implemented, considering the following procedure:
1. Filter response type: It is well known that pulse shaping with a raised cosine response is characterized by zero Inter-Symbol Interference (ISI) [5]. That is, despite its impulse response extending over several other symbol periods, there is neither constructive nor destructive interference at the decision points. Therefore, in this work, a square root raised cosine matched filter pair was considered, which has the same zero-ISI property as the raised cosine [5].
2. Filter order and frequency specifications: The interpolation/decimation filter was designed with the aid of the MATLAB FDATool. In this tool, the normalized LPF cut-off bandwidth was set to 0.125, which is the ratio between the clock rates of a single processing path and the DAC core. Moreover, the Kaiser window Beta and the filter roll-off were set to 2 and 0.25, respectively, since this configuration was found to be the best compromise between passband ripple, stop-band attenuation, and transition band. Finally, the filter order was chosen to be 62, as a compromise among system complexity, stop-band attenuation, and transition band.
3. Distributed implementation of interpolation/decimation: Both filter banks were implemented using multiple parallel FIR digital filters.
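The procedure above can be illustrated with a textbook square-root raised-cosine design. The sketch below is not the FDATool design used in this work: it assumes 8 samples per symbol (inferred from the 0.125 normalized cut-off), 63 taps (order 62) and a roll-off of 0.25, and it omits the Kaiser window. The function names are illustrative.

```python
import math

def srrc_taps(num_taps: int, samples_per_symbol: int, rolloff: float):
    """Square-root raised-cosine impulse response, centred on the middle tap."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = (n - centre) / samples_per_symbol      # time in symbol periods
        if abs(t) < 1e-12:                         # limit at t = 0
            h = 1.0 - rolloff + 4.0 * rolloff / math.pi
        elif abs(abs(t) - 1.0 / (4.0 * rolloff)) < 1e-12:   # limit at the pole
            h = (rolloff / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * rolloff))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * rolloff)))
        else:
            num = (math.sin(math.pi * t * (1.0 - rolloff))
                   + 4.0 * rolloff * t * math.cos(math.pi * t * (1.0 + rolloff)))
            den = math.pi * t * (1.0 - (4.0 * rolloff * t) ** 2)
            h = num / den
        taps.append(h)
    return taps

def worst_isi_ratio(taps, samples_per_symbol):
    """Largest symbol-spaced sample of the TX/RX cascade, relative to its peak."""
    n = len(taps)
    cascade = [sum(taps[k] * taps[i - k]
                   for k in range(max(0, i - n + 1), min(i, n - 1) + 1))
               for i in range(2 * n - 1)]
    centre = n - 1
    isi = [abs(cascade[i])
           for i in range(centre % samples_per_symbol, len(cascade), samples_per_symbol)
           if i != centre]
    return max(isi) / cascade[centre]

taps = srrc_taps(num_taps=63, samples_per_symbol=8, rolloff=0.25)
print("residual ISI of the matched pair:", worst_isi_ratio(taps, 8))
```

The residual ISI is not exactly zero because the impulse response is truncated to 63 taps, which is one aspect of the complexity versus performance compromise mentioned in step 2.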
Additionally, both IQ up-/down-conversion stages can be performed by multiplying the desired signal by a complex exponential operating at a certain center frequency ($f_c$), as illustrated in Figure 9. At the transmitter, the baseband signal spectrum is frequency shifted to be centered at $f_c$, resulting in a non-symmetric spectrum [5]. Consequently, applying $e^{-j2\pi f_c t}$ at the receiver moves the transmitted signal spectrum back to DC, thus converting the received signal back to baseband. This complex exponential function was implemented using a COordinate Rotation DIgital Computer (CORDIC) DSP algorithm, which implements an IQ-mixer with an angular frequency proportional to the desired $f_c$ value. However, since both the DAC and ADC converters employ eight parallel processing paths, similarly to the FIR filters introduced above, a distributed CORDIC with eight paths operating at 156.25 MHz is also required to perform either the up-conversion or the down-conversion. To this end, the distributed CORDIC must generate samples of the same complex exponential, but at shifted sample instants in each processing path. It is worth noting that each CORDIC path in the TX FPGA firmware should be connected to the parallel OSERDES structure, in order to perform the data serialization required by the data interface of the FMC230. In the RX firmware, a similar distributed CORDIC approach was employed to shift the signal back to baseband, but with a negative phase increment. It was connected to a parallel ISERDES structure, which enables data parallelism and consequently a reduction of the data clock.
Finally, to integrate the developed SysGen OFDM engine with the transceiver firmware, it was necessary to package and export this modulation into two distinct customized TX/RX IP blocks, as can be seen in the high-level block diagram illustrated in Figure 11.
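The idea of generating the same complex exponential at shifted sample instants can be sketched as follows. This is a minimal illustration, assuming eight paths, a 1.25 GHz sample rate and a 312.5 MHz center frequency; the library exponential stands in for the CORDIC rotation, and all function names are hypothetical.

```python
import cmath

def serial_nco(fc_hz, fs_hz, num_samples):
    """Reference oscillator running at the full sample rate."""
    return [cmath.exp(2j * cmath.pi * fc_hz * n / fs_hz) for n in range(num_samples)]

def parallel_nco(fc_hz, fs_hz, num_paths, cycles):
    """Path p produces samples p, p + P, p + 2P, ... of the same exponential."""
    step = 2.0 * cmath.pi * fc_hz / fs_hz        # phase advance per DAC sample
    return [[cmath.exp(1j * step * (m * num_paths + p)) for m in range(cycles)]
            for p in range(num_paths)]

def interleave(paths):
    """Serialize the parallel paths, as the output serializers would."""
    return [paths[p][m] for m in range(len(paths[0])) for p in range(len(paths))]

FS_HZ, FC_HZ, PATHS, CYCLES = 1.25e9, 312.5e6, 8, 4

tx = interleave(parallel_nco(FC_HZ, FS_HZ, PATHS, CYCLES))
rx = interleave(parallel_nco(-FC_HZ, FS_HZ, PATHS, CYCLES))  # negative increment
ref = serial_nco(FC_HZ, FS_HZ, PATHS * CYCLES)

print("max deviation from serial NCO:", max(abs(a - b) for a, b in zip(tx, ref)))
print("max deviation of tx*rx from 1:", max(abs(a * b - 1) for a, b in zip(tx, rx)))
```

Each path only advances its phase once per low-rate clock cycle, by eight times the per-sample increment, with a fixed initial offset that depends on the path index.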

RF Front-End Overview
With the ADC/DAC integration in the FPGA and the OFDM engine working seamlessly, the next step of this work was to develop the mmWave RF front-ends. The front-end used is a fully custom-made solution composed of several connectorized analog components, as illustrated in Figures 12 and 13.
On the transmitter side, a typical direct-conversion architecture, also known as homodyne, was adopted to avoid IQ imbalance effects on the transmitted signals (see Figure 12). It is composed of a PLL operating at 15 GHz, a multiplier by 4, which up-converts the LO signal to 60 GHz, and an up-conversion mixer. An external 10 MHz reference clock is required for the PLL. In addition, to avoid the non-linearities introduced by the Power Amplifier (PA), the amplification stage at RF (60 GHz) has deliberately not been included in this study. Otherwise, it would have required the study of mitigation techniques such as Digital Pre-Distortion (DPD). At the receiver, a two-stage heterodyne architecture is employed. A Low-Noise Amplifier (LNA) at RF is coupled to the antenna, followed by a down-conversion to a 6 GHz IF stage, performed by a first mixing stage with a 54 GHz LO signal. In a second down-conversion stage, the received signals are shifted to IF frequencies. Therefore, unlike the transmission architecture, two PLLs operating at 13.5 and 6 GHz are required, as can be seen in Figure 13. Furthermore, the external clock references for the TX and RX PLLs are generated by two independent PRS10M rubidium oscillators. Such reference clock signals are characterized by very low phase noise and very high frequency stability (≈0.05 ppb), being, therefore, very reliable clock sources.
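The frequency plan described above can be verified with a few lines of arithmetic. The sketch below makes two labeled assumptions: the first RX LO is obtained from the 13.5 GHz PLL with the same multiplication by 4 that is stated for the TX, and the upper sideband is used after the TX mixer. The 312.5 MHz input is one of the IF values used in this work.

```python
LO_MULTIPLIER = 4  # stated for the TX chain; assumed for the first RX stage

def rf_lo_ghz(pll_ghz: float, multiplier: int = LO_MULTIPLIER) -> float:
    """LO frequency after the frequency multiplier."""
    return pll_ghz * multiplier

tx_lo = rf_lo_ghz(15.0)          # TX LO at RF
rx_lo1 = rf_lo_ghz(13.5)         # first RX LO (assumed x4)
rx_lo2 = 6.0                     # second RX LO, taken directly from its PLL

if_in = 0.3125                   # digital IF at the DAC output, in GHz
rf = tx_lo + if_in               # upper sideband assumed
after_stage1 = rf - rx_lo1       # first down-conversion
after_stage2 = after_stage1 - rx_lo2   # second down-conversion

print("TX LO:", tx_lo, "GHz, first RX LO:", rx_lo1, "GHz")
print("signal after stage 1:", after_stage1, "GHz")
print("signal after stage 2:", after_stage2, "GHz")
```

Under these assumptions the signal reaches the ADC at the same IF at which it left the DAC, so the digital down-converter can use the same center frequency as the up-converter.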
Results Discussion
In this section, three sets of measurements are reported and analyzed. First, the performance of the ADC/DAC integration in the FPGA is assessed. Then, a performance comparison of different IF frequencies is conducted using the mentioned SDR system with the OFDM engine. Finally, the performance of the proposed system (mmWave 5G testbed) is addressed through EVM and received constellation analysis.
ADC/DAC Characterization
The FMC126 and FMC230 boards, with their design modifications, were characterized and validated with the usual figures of merit, namely the signal-to-noise-and-distortion ratio (SINAD), the effective number of bits (ENOB) and the spurious-free dynamic range (SFDR).
With that in mind, an oscilloscope and a Spectrum Analyzer (SA) were used to measure the time and frequency responses of the DAC at multiple tones. For the ADC, a signal generator was used to generate sine-wave signals with tuneable amplitude and frequency, which allowed both its time and frequency responses to be assessed.
The DAC performance was evaluated through its SFDR, which is measured as the difference between the RMS power level of the transmitted tone and that of the most significant spur present in the DAC's frequency response. The SFDR measurement result for the FMC230 can be seen in Figure 14, considering the D1 channel and a frequency tone of 134.3 MHz.
Additionally, the FMC230 was also characterized with the SINAD, SFDR, and ENOB figures of merit for various frequency tones. The results are summarized in Table 2 and plotted in Figure 15a,b, considering a sampling rate of 2456 MHz and the DAC's D1 output channel. The average values of the SFDR, SINAD, and ENOB measurements are 56.93 dB, 69.61 dB, and 11.34 bits, respectively, which are within the DAC performance boundaries given in the 4DSP datasheet.
Figure 15. FMC230 performance boundaries versus frequency tone: (a) SFDR; and (b) SINAD versus ENOB.
For the FMC126, a similar experimental setup was used to assess the performance of this ADC board. The reference signal connected to both ADC channels A and B was characterized with an SFDR of 72.5 dB, which is limited by the SNR of the signal generator used (Figure 16a), for a signal power level of −2 dBm (ADC full-scale input value). Moreover, comparing Figure 16a,b, it is verified that the ADC sampling and quantization processes induce a performance degradation of 12.7 dB in terms of SFDR, leading to a maximum dynamic range of 59.8 dB.
Back-to-Back Performance
Following the ADC/DAC assessment, measurements with baseband integration were conducted. It is important to mention that, following the 5G pre-trials specification presented in [1], a multi-Gigabit/s real-time OFDM TRX engine was developed [29]. The architecture of this OFDM baseband engine is extensively discussed in [29] and was based on the previous work reported in [30]. The enhancements to the engine led to a 1 Gbps real-time transmission rate (considering 256-QAM), which fulfills the requirements of the next wireless generation. The main OFDM design operating parameters, particularly the FFT and data block sizes, are given in Table 3. Table 4 outlines the main SDR system specifications accomplished by integrating both the OFDM modulator and demodulator with the digital IF architecture, using the disruptive modular, fully pipelined hardware architecture illustrated in Figure 11.
Firstly, the TRX performance of the SDR was evaluated in a back-to-back (B2B) configuration using EVM as the QoS assessment metric. Figure 17 shows an example of the transmitted OFDM signal spectrum, obtained for a center frequency of 312.5 MHz (IF4) and a selected digital gain of 7. Figure 18 shows the output power transfer curve as a function of the selected gain and IF modulation. It was verified that the OFDM modulation centered at IF2 has less signal attenuation than the remaining IF frequencies, leading to higher DAC output power.
In Table 5, the EVM results are summarized for the maximum digital gain of 7 using four different IF configurations. When only the I or the Q channel is used to transmit and receive data, the EVM results are denoted $\mathrm{EVM}_I$ or $\mathrm{EVM}_Q$, respectively. That is, connecting DAC0 to ADC0 is considered as the I branch, and connecting DAC1 to ADC1 is the Q branch. For a quadrature transmission, both I/Q channel branches are used simultaneously.
These results show that a minimum EVM of approximately −42 dB is obtained for all low-IF modulations, except for zero-IF, in which the synchronization algorithms fail to estimate the beginning of the frame due to the nulls in both the DAC and ADC frequency responses. As expected, for the other IFs, no OFDM performance degradation is verified, even when an IQ transmission configuration is considered, indicating that very low distortion is present in the received signal constellations. Finally, in Figure 19, it can be seen that the 4- and 256-QAM constellations do not present significant scattering of the received symbols when using the IF2 frequency and the maximum digital gain. This qualitative evaluation indicates that the OFDM SDR system is very accurate.
OTA 60 GHz Performance
Next, the effect of RF impairments on the quality of the TRX OFDM system outlined in Section 3.2 is addressed with 60 GHz over-the-air measurements under different conditions of RX SNIR and for different modulations. Bearing in mind the possible OFDM performance degradation due to multipath, the testbed was first set up inside an anechoic chamber under LOS conditions. With such a controlled environment, it is possible to accurately assess the impact of the RF front-end on the previously discussed OFDM communication system. The system QoS was assessed through the EVM figure of merit and a spectral efficiency per stream analysis. Testbed results with 25 dBi antennas show a maximum distance range of 74.5 cm.
A Bit Error Rate (BER) estimation was obtained by using the relation between this metric and EVM, given in Table 6 [30]. The EVM results were computed for each OFDM IF modulated signal under different SNIR values, in either the presence or absence of CFO. For example, Table 7 shows the measured average EVM values for an IF OFDM modulated signal at 312.5 MHz (IF4). The minimum average EVM value, −32.99 dB, is verified for an input power of −14.83 dBm when the TX and RX devices are clocked using different sources. This value, together with Table 6, indicates that the RF front-end can handle QAM modulations up to 256-QAM, meaning that data transmissions of 1 Gbps are possible with the proposed mmWave testbed.
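EVM figures in this work appear both in dB and as percentages, so a conversion helper is useful when comparing values. The sketch below uses the standard relation EVM(%) = 100 · 10^(EVM_dB/20), which holds when EVM is defined as an RMS amplitude ratio; the function names are illustrative.

```python
import math

def evm_db_to_percent(evm_db: float) -> float:
    """Convert an EVM expressed in dB to a percentage (RMS amplitude ratio)."""
    return 100.0 * 10.0 ** (evm_db / 20.0)

def evm_percent_to_db(evm_percent: float) -> float:
    """Convert an EVM expressed as a percentage to dB."""
    return 20.0 * math.log10(evm_percent / 100.0)

# Minimum average EVM measured over the air at IF4, as reported in the text
print(f"-32.99 dB corresponds to {evm_db_to_percent(-32.99):.2f} %")
# Approximate back-to-back EVM floor reported in the text
print(f"-42 dB corresponds to {evm_db_to_percent(-42.0):.2f} %")
```

Note that the factor of 20 applies because EVM is an amplitude ratio; a power-ratio definition would use 10 instead.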
For quality assessment of the proposed testbed the EVM value, 4-, 16-, 64-, and 256-QAM received constellations, are depicted in Figure 20a-d, respectively . These results present a slight scattering distortion on the received symbols when compared with the B2B configuration (see constellations of Figure 19 ). However, even for 256-QAM, all constellations exhibit a well defined point scattering area, which indicates a low probability of decoding erroneous bits. In-Phase amplitude In-Phase amplitude Furthermore, Table 8 shows the minimum average EVM and its degradation for the remaining IF OFDM modulations. For IF values of 312.5 and 468.75 MHz, EVM is below 3%, representing a performance degradation below 2% when compared to a back-to-back configuration. This is quite remarkable, since the EVM values at 60 GHz are lower than the 2.5% required by sub-6 GHz Wi-Fi (IEEE 802.11.ac) to employ 256-QAM OFDM transmissions [31] . To the best of authors' knowledge, this spectral efficiency per stream and SNR results go significantly beyond the state-of-the-art, when compared to the current mmWave testbeds. Finally, for the proposed link budget the received power versus digital modulation, for an BER of 10 −3 , is presented in Table 9 . Finally, in Figure 21 , it is shown the input power operating range of the mmWave RF front-end considering IF4. It can be seen that such system operates with a relatively wide input dynamic power range considering an EVM threshold of −10 dB, which is enough for a error-free QPSK demodulation. From the above results, it is demonstrated that it is possible to successfully use the mmWave spectrum to achieve multi-Gigabit/s, while considering relatively high spectral efficient modulations, and signal bandwidths greater than 100 MHz. 
For example, in the literature, only the 28 GHz testbeds reported in [32] [33] [34] [35] consider a modulation bandwidth higher than 100 MHz with real-time baseband processing and the 5G NR waveform (OFDM) as the transmission scheme. At 60 GHz, on the other hand, there is an evident lack of testbeds that comply with the minimum technical performance requirements of IMT-2020 for 5G. With the proposed OFDM testbed, the 5G peak spectral efficiency value of 7.8 bit/s/Hz [1] was achieved, and thus the gap identified in testbeds operating at 60 GHz has been filled. In other words, no other reported prototype system can process a signal bandwidth of 150 MHz in real time with modulation orders up to 256-QAM, using OFDM as the transmission scheme in an over-the-air scenario. It is worth noting that even the listed 28 GHz testbeds do not meet the DL specification of 7.8 bit/s/Hz.
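The EVM figures in this section are quoted both in dB (e.g., −32.99 dB, the −10 dB threshold) and in percent (e.g., the 2.5% requirement cited from [31]). The following minimal sketch converts between the two, assuming the usual definition EVM_dB = 20·log10(EVM_rms); the helper names are illustrative and not part of the testbed software.

```python
import math


def evm_db_to_percent(evm_db: float) -> float:
    """Convert an EVM in dB to percent, assuming EVM_dB = 20*log10(EVM_rms)."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


def evm_percent_to_db(evm_percent: float) -> float:
    """Convert an EVM in percent to dB (inverse of evm_db_to_percent)."""
    if evm_percent <= 0.0:
        raise ValueError("EVM in percent must be positive")
    return 20.0 * math.log10(evm_percent / 100.0)


if __name__ == "__main__":
    measured_db = -32.99      # minimum average EVM for IF4 (Table 7)
    limit_percent = 2.5       # 256-QAM requirement cited from [31]
    qpsk_threshold_db = -10.0 # EVM threshold used for Figure 21

    measured_percent = evm_db_to_percent(measured_db)
    print(f"measured EVM : {measured_percent:.2f} %")
    print(f"256-QAM limit: {evm_percent_to_db(limit_percent):.2f} dB")
    print(f"QPSK limit   : {evm_db_to_percent(qpsk_threshold_db):.1f} %")
    print("meets 256-QAM limit:", measured_percent < limit_percent)
```

With this definition, −32.99 dB corresponds to about 2.24%, which sits below the 2.5% limit (about −32.04 dB).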
Showcasing: From GbE-Based to UHD Multi-Stream Video
In order to make the testbed appealing to the general public, a video-based demonstrator was put together, as depicted in Figure 22. This showcase was a joint collaboration with the Multimedia group at IT Leiria, which developed a 180° field-of-view system as in [36].
The UHD multi-stream showcase has been presented at several key forums, with great impact on both the scientific community and industry. The video content was generated independently with the 180° field-of-view camera setup [36], which is composed of 9 UHD cameras, each responsible for 20° of video acquisition. Due to the high aggregate data throughput, a Gigabit Ethernet (GbE) connection is required to route the encoded streams of all 9 UHD cameras, output by the Raspberry Pi boards, to the transmitting FPGA. To this end, a GbE switch merges all 9 Raspberry Pi Ethernet connections into the single GbE port available at the TX baseband unit. At the RX side, a high-performance laptop with a GbE interface runs VLC to decode and display, in real time, the video content delivered by the RX baseband unit.
This has proven to be eye-catching and intuitive, since visitors could interact directly with the system, e.g., by blocking the direct radio path and observing the immediate impact on both the received signal constellations and the received video quality. Such user interaction, coupled with the fact that 9 UHD video streams were being displayed in real time on the computer, provided the required validation of a real-time over-the-air user experience. For those with a technical background, a Graphical User Interface (GUI) displays the received signal constellation in real time, as well as the estimated BER values.
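Since the nine camera streams share a single GbE port at the TX baseband unit, each encoded stream has to fit within a fraction of the link rate. The sketch below estimates that per-stream budget. The 1000 Mbps link rate follows the 1 Gbps figure quoted for the testbed, while the 10% protocol overhead is a purely hypothetical value chosen for illustration, as are the function and variable names.

```python
def per_stream_budget_mbps(link_rate_mbps: float,
                           n_streams: int,
                           overhead_fraction: float = 0.0) -> float:
    """Upper bound on the bitrate of each stream sharing one link equally.

    overhead_fraction is the share of the link rate assumed to be lost to
    framing and protocol headers (hypothetical, not measured in the paper).
    """
    if n_streams <= 0:
        raise ValueError("n_streams must be positive")
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return link_rate_mbps * (1.0 - overhead_fraction) / n_streams


if __name__ == "__main__":
    n_cameras = 9            # UHD cameras in the setup of [36]
    fov_per_camera_deg = 20  # field of view covered by each camera
    link_rate_mbps = 1000.0  # single GbE port at the TX baseband unit

    print("total field of view:", n_cameras * fov_per_camera_deg, "degrees")
    print(f"ideal budget per stream: "
          f"{per_stream_budget_mbps(link_rate_mbps, n_cameras):.1f} Mbps")
    # Hypothetical 10% overhead, for illustration only.
    print(f"budget with 10% overhead: "
          f"{per_stream_budget_mbps(link_rate_mbps, n_cameras, 0.10):.1f} Mbps")
```

Under these assumptions, each encoded UHD stream must stay at or below roughly 100 to 111 Mbps for all nine streams to fit in the link.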
Conclusions
In this article, a novel software-defined radio solution for future wireless communication systems is presented and validated as a complete multi-Gigabit/s radio-in-the-loop OFDM transmission scheme, with a 60 GHz mmWave RF front-end, able to cope with UHD multi-stream video.
The SDR hardware was chosen based on the overall criteria of low time-to-market and reduced development costs. The VC707 FPGA was selected for the embedded DSP engine since, at the time, it offered the best compromise among available logic resources, a highly flexible high-speed serial I/O interface, and cost. In addition, the FMC230 and FMC126 were selected for their inherently high sampling rates, their bit resolution, and their compatibility with the VC707 FMC I/O interface.
Moreover, firmware connectivity issues between the embedded system and both the FMC230 and FMC126 have been outlined in this work, as well as the workarounds adopted to enable a GSPS real-time communication system. Currently, the SDR is configured to 1.25 GSPS, which, as discussed, is 5G future-proof. In addition, both the ADC and DAC converters were thoroughly characterized in terms of SNR, ENOB, SINAD, and SFDR, and the advantages of a digital IF architecture have been extensively discussed.
Additionally, a complete multi-Gigabit/s radio-in-the-loop OFDM communication system was implemented and validated for high data rate applications considering 4-, 16-, 64-, and 256-QAM. EVM results in both scenarios were below 2%, which is quite remarkable considering that the current requirement for 256-QAM OFDM transmissions is 2.5% [31].
A complete overview of the mmWave front-end was provided, which together with the baseband unit forms a robust communication system with rates up to 1 Gbps. The chosen use case for this work was UHD video streaming, achieved with a 180° field-of-view setup available at IT.
In summary, this work details the steps required to achieve a real-time testbed that meets 5G requirements in the mmWave frequency range, with 1 Gbps throughput and low latency as its main characteristics.
