Abstract-For emission in the TV white space (TVWS) spectrum, regulators impose strict spectral masks, which can be fulfilled using a DFT-modulated filter-bank multi-carrier system to extract one or several TVWS channels in the 470-790MHz range. Such a system reduces the channel dispersion, but even with near-perfectly reconstructing filter banks, an equaliser is required to at least perform some form of timing synchronisation. In this work, we propose a per-band equalisation and synchronisation approach, performed by a constant modulus (CM) algorithm running concurrently with a decision-directed adaptation process for faster convergence and reduced phase ambiguity. We compare symbol- and fractionally-spaced versions, and investigate their fixed-point implementation on a field programmable gate array. We elaborate on the advantages that this fractionally spaced concurrent system offers over a symbol-spaced equaliser and compare it to equalisers that are updated purely by a CM algorithm.
I. INTRODUCTION
Television white space (TVWS) transceivers may be permitted to access one or more selected channels within a given frequency band depending on a radio's geographical location. For example, in the UK the TVWS spectrum covers the UHF range from 470-790MHz with 40 channels, each with an 8MHz bandwidth. Because of the need for frequency-agile transceivers with strict spectral mask requirements [1], we have previously opted for a filter-bank based multicarrier (FBMC) approach, which relies on an oversampled discrete Fourier transform (DFT) filter bank. This approach is numerically efficient, as it permits the potential up- and downconversion of all 40 channels at the cost of a single channel transceiver with only a small overhead [2].
While the filter bank approach offers more robustness towards synchronisation errors than e.g. orthogonal frequency division multiplexing (OFDM) [3], the resulting subchannels are generally still frequency-selective and require equalisation [4], [5]. Even in the absence of channel dispersion and issues such as carrier frequency and phase offsets, such a system still requires careful timing synchronisation [6]. Therefore, in order to move the transceiver system from [2] closer to realisation, this paper aims to demonstrate that a robust and energy efficient synchronisation / equalisation approach can be incorporated into the transceiver design of [2] running on a Xilinx Virtex 7 field programmable gate array (FPGA).
For a robust synchronisation and equalisation approach, we implement a constant modulus (CM) algorithm combined with a decision-directed (DD) scheme in a fractionally spaced architecture [7]-[12]. This system is known to have performance advantages over symbol-spaced equalisers, but we also compare symbol- and fractionally spaced architectures with respect to area and power requirements when implemented on an FPGA, since this is vital if the TVWS transceiver is to be operated within the confines of a low-power energy-harvesting basestation such as [13].
Thus, in Sec. II, we provide a brief overview of our TVWS transceiver design, while Sec. III focuses on the subchannel equaliser, where we exploit a concurrent CM and DD scheme for faster adaptation [14], [15]. Sec. IV addresses the FPGA implementation, with results presented in Sec. V.
II. TRANSCEIVER SYSTEM
This section outlines the transceiver radio front-end [2], which aims to up- and downconvert any or all of the 40 channels in the UK's TVWS spectrum. The overall design idea is described in Sec. II-A, with its components detailed in Secs. II-B and II-C, and system parameters briefly discussed in Sec. II-D.
A. Overall System Outline
In the FBMC transceiver, as summarised in [2] and shown in Fig. 1, the upper branch represents the transmitter (TX) and the lower branch the receiver (RX). The conversion between baseband and digital RF is performed in two stages. Seen from the receiving antenna, a first stage (stage 1) converts between an RF signal and a lower-frequency intermediate signal whose rate enables it to be handled by an FPGA. Stage 2 is responsible for multiplexing the 40 TVWS channels into a single baseband signal in the TX, and for demultiplexing the equivalent single baseband signal in the RX back into the 40 TVWS channels. This multiplexing and demultiplexing is performed by an oversampled filter bank-based multicarrier system.
B. Stage One: Polyphase Filter
On the RX side (the lower branch in Fig. 1), an analogue-to-digital converter (ADC) acquires data at an RF sampling rate $f_s$ with a word length $R_{\mathrm{rx}}$. A bandpass filter extracts the 320MHz-wide UHF band that contains the 40 8MHz-wide TVWS channels and, due to the band limitation imposed by the filter, enables a reduction of the sampling rate by a factor $K_1^{(i)}$, where the index $i$ denotes different design options. The decimation of the signal implicitly results in a demodulation; a modulation correction ($e^{-j\Omega_c n}$ in Fig. 1) then aligns the 40 TVWS channels in the baseband between DC and 320MHz.
To efficiently implement the RX stage 1, the bandpass filter $h_1^{(i)}[n]$, characterised in Fig. 2, is implemented in a polyphase network, enabling the decimation by $K_1^{(i)}$ to be swapped with the filtering. This reduces the computational cost by a factor of $K_1^{(i)}$. Additionally, the bandpass filter enhances the signal-to-quantisation noise ratio (SQNR) by increasing the effective word length by $\Delta R = \log_4 K_1^{(i)}$ bits. The transmitter operates analogously, and the baseband signal is implicitly upconverted to RF by expansion. To ensure that the resulting RF signal sits between 470 and 790 MHz, the baseband signal is modulation-corrected prior to upsampling, and interpolated by a filter matched to the magnitude response $|H_1^{(i)}(e^{j\Omega})|$ of the stage 1 bandpass filter.
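The swap of filtering and decimation described above can be illustrated with a minimal Python sketch. It is not the implemented design: the short integer test filter, the input sequence and the function names are hypothetical, and only the identity itself (filtering at the high rate and discarding samples gives the same result as running $K$ polyphase branches at the low rate) is taken from the text.

```python
def fir_filter(x, h):
    """Direct-form FIR filter running at the high (input) rate."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def filter_then_decimate(x, h, K):
    """Reference: filter every input sample, then keep every K-th output."""
    return fir_filter(x, h)[::K]


def polyphase_decimate(x, h, K):
    """Same result, but every multiplication contributes to a kept output.

    Branch p uses the polyphase component h[p::K], i.e. h[qK + p], and the
    input phase x[(m - q)K - p], so it only ever runs at the low rate.
    """
    n_out = (len(x) + K - 1) // K
    y = [0.0] * n_out
    for p in range(K):
        h_p = h[p::K]
        for m in range(n_out):
            acc = 0.0
            for q, coeff in enumerate(h_p):
                idx = (m - q) * K - p
                if idx >= 0:
                    acc += coeff * x[idx]
            y[m] += acc
    return y


# Hypothetical toy data (integers keep the comparison exact).
x = list(range(1, 21))
h = [1, 2, 3, 4, 5, 6, 7]
K = 4
same = filter_then_decimate(x, h, K) == polyphase_decimate(x, h, K)
```

The reference version computes $K$ outputs for every one it keeps, which is where the cost reduction by a factor of $K$ comes from.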
C. Stage Two: Filter Bank-Based Multiplexer
The conversion between the 40 TVWS channels and the baseband signal required for stage 1 is performed with the help of an oversampled modulated DFT filter bank with $K_2^{(i)}$ channels, operating as an FBMC transmultiplexer. The design is based on an 8MHz-wide prototype as characterised in Fig. 3, whose transition band depends on the oversampling ratio. In our design, we have decided to sample the TVWS channels at 16MHz, i.e. they are oversampled by a factor of 2. This provides a sufficient transition band, but will also enable the advanced synchronisation and equalisation to be discussed in the next section.
The prototype filter is modulated by a DFT to the $K_2^{(i)}$ different band positions. In the RX, the resulting filters operate as band selection filters to extract bandlimited TVWS channels, which can subsequently be decimated by $K_2^{(i)}/2$. In the TX, these filters follow an expansion by $K_2^{(i)}/2$ and fulfil the purpose of interpolation filters. Similarly to stage 1, the band limitation and expansion/decimation imply a gain in word length.
An efficient polyphase representation of the FBMC blocks ensures that the filtering is always operated at the lower rate. Further, a DFT filter bank enables a factorisation into a polyphase network consisting of operations that only involve the real-valued prototype filter, and a $K_2^{(i)}$-point DFT [16]. As a result, the FBMC implementation for 40 channels is just as costly as the conversion of a single channel, plus the cost of a fast Fourier transform (FFT) operation.
D. Filter Design Example
In the following, we focus on the design with $K_1 = 4$ and $K_2 = 64$ in [2]. A filter $h_1[n]$ for stage 1 based on a minimax design, and a prototype $p_2[n]$ constructed by an iterative weighted least squares approach for oversampled filter bank design [17], require respective lengths of $L_1 = 46$ and $L_2 = 320$ to ensure that the overall design satisfies the mask requirements [1], as demonstrated for the filters' magnitude responses in Fig. 4.
To fit the 16 bit word length accuracy at the DAC/ADC, the up- and downconversion processes are operated with 16-bit words, but only 12 bit accuracy is required for the baseband signals coming into the upconverter and going out of the downconverter. If all TVWS channels are occupied, this results in power spectral densities (PSDs) at RF as shown in Fig. 5(a), which satisfy the required masks. The PSD of the multiplexed signal at the output of the TX stage 2 is shown in Fig. 5(b). The PSDs in Fig. 5 were obtained using bit-true and cycle-accurate simulations as described in [2]. The graph also shows how the lower 40 of the overall $K_2 = 64$ channels are occupied by the TVWS channels, while the upper 24 subbands are unoccupied and create design freedom for the stage 2 prototype filter.
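As a small worked example, the word length relation $\Delta R = \log_4 K$ from Sec. II-B can be evaluated for this design. The stage 1 value follows directly from the text; applying the same rule to the stage 2 rate change of $K_2/2$ is an assumption made here for illustration, since the text only states that a gain arises similarly to stage 1. The function name is hypothetical.

```python
import math


def wordlength_gain_bits(rate_change):
    """Effective word length gain log4(K) in bits for a rate change by K."""
    # log4(K) = log2(K) / 2; log2 is exact for powers of two.
    return 0.5 * math.log2(rate_change)


K1, K2 = 4, 64                                  # design of Sec. II-D
gain_stage1 = wordlength_gain_bits(K1)          # stated in the text
gain_stage2 = wordlength_gain_bits(K2 // 2)     # assumed analogous rule
total_gain = gain_stage1 + gain_stage2
```

Under these assumptions, stage 1 contributes 1 bit and stage 2 contributes 2.5 bits, i.e. 3.5 bits between the RF and baseband word lengths.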
III. PER-BAND SYNCHRONISATION AND EQUALISATION

A. Motivation and Rationale
Compared to OFDM, FBMC is generally expected to be more robust to synchronisation errors [3] . Critically sampled FBMC approaches such as FBMC/OQAM still suffer from inter-carrier interference (ICI), and the advantage of high spectral efficiency comes at the cost of a complicated equaliser requiring cross-terms between at least adjacent channels [4] - [6] , [18] . In contrast, oversampled FBMC systems can include a guard band, and even in a doubly-dispersive channel, ICI can be considered negligible.
Even though the subchannels in this oversampled FBMC system can be treated as independent from each other, i.e. without the use of cross-terms, they are still broadband; as such, they exhibit frequency-selective fading in a dispersive channel, and hence must be equalised. In the absence of dispersive channel conditions, the FBMC system still requires careful synchronisation: with $K_2^{(i)}/2$-band systems [19], the sampling point is critical. Fig. 7 shows the impact of different delays in the RF path on the overall response of one subchannel from TX to RX for the design in Sec. II-D. To fit a simple wideband signal into an 8MHz-wide TVWS channel, we use a 5.3MHz single carrier signal $x[\nu]$ with a 3rd-band square-root Nyquist system $g[n]$. With respect to this baseband signal, a delay at RF translates into a fractional delay [20], which even in a non-dispersive channel can introduce significant inter-symbol interference, hence motivating the use of an equaliser for each individual subcarrier.
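The effect of a fractional delay on a symbol-rate sampled Nyquist system can be sketched numerically. The example below is only illustrative: it uses an ideal sinc pulse as a stand-in for the overall TX-RX response rather than the square-root Nyquist filter $g[n]$ or the filter bank of this design, and the function names and the span of 8 symbols are hypothetical choices.

```python
import math


def sinc(t):
    """Ideal Nyquist pulse, normalised to a symbol period of 1."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def symbol_rate_taps(delay, span=8):
    """Pulse samples at t = k + delay, i.e. the symbol-spaced response."""
    return [sinc(k + delay) for k in range(-span, span + 1)]


def isi_to_signal_ratio(taps):
    """Energy of all taps except the strongest, relative to the strongest."""
    main = max(abs(t) for t in taps)
    total = sum(t * t for t in taps)
    return (total - main * main) / (main * main)


isi_aligned = isi_to_signal_ratio(symbol_rate_taps(0.0))   # ideal timing
isi_half = isi_to_signal_ratio(symbol_rate_taps(0.5))      # half-symbol offset
```

With ideal timing, all taps but one vanish up to rounding. With a half-symbol offset, two equally strong taps appear, so the interference energy is at least as large as that of the wanted tap.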
B. Equaliser Structure
To be robust towards the type of fractional delays shown in Fig. 7 that a dispersive channel and an unsynchronised transceiver system cause at baseband, in this paper we employ a fractionally spaced equaliser. While a symbol-spaced equaliser would generally be required to be of infinite length, for a fractionally spaced equaliser an exact inverse exists for a finite length equaliser, provided that the roll-off of the receiver filter (shown as $g[-n]$ in Fig. 6) is not too steep. For fractionally spaced equalisation, there generally is no advantage in going beyond a twice oversampled, or $T/2$-spaced, system, but because the subchannel data in our implementation is clocked at 5.3 MHz, we here use a thrice oversampled, or $T/3$-spaced, fractional equaliser.
2018 IEEE Wireless Communications and Networking Conference (WCNC): Special Session Workshops
The polyphase representation of this equaliser is shown in Fig. 8, with the three polyphase components $w_m[\nu]$, $m = \{0, 1, 2\}$, followed by a carrier frequency and phase correction to compensate for potential carrier frequency and phase offsets $\Omega_\Delta$ and $\varphi$, respectively. The adjustment of the equaliser coefficients in $w_m[\nu]$ also requires a decision device, which e.g. for quaternary phase shift keying (QPSK) takes the form
$F(a) = \frac{1}{\sqrt{2}} \left( \mathrm{sgn}\{\Re\{a\}\} + j\, \mathrm{sgn}\{\Im\{a\}\} \right)$ (1)
with the signum function $\mathrm{sgn}\{a\} = \pm 1$ for $\mathbb{R} \ni a \gtrless 0$.
C. Concurrent Constant Modulus / Decision Directed Algorithm
In order to implement a robust synchronisation and equalisation scheme, we opt for a blind approach based on the CM algorithm [10], which is applicable to CM constellations such as QPSK. The CM algorithm adapts relatively slowly compared to schemes such as the least mean squares algorithm combined with a DD update approach, whereby a desired signal is derived from decisions made from the output of the equaliser akin to (1). The DD approach however requires the equaliser to be already sufficiently well adjusted such that correct decisions are reached at its output. A suitable approach to combine the strengths of both CM and DD approaches is by means of a concurrent adaptation using both [14], [15].
A concurrent CM/DD algorithm splits the equaliser into two parallel components: for each polyphase branch $i = 0, 1, 2$ of subchannel $m$, the equaliser consists of a CM part, adapted with step size $\mu_{\mathrm{CM}}$, and a DD part, adapted with step size $\mu_{\mathrm{DD}}$.
At the $\nu$th iteration, we first calculate the equaliser output from the tap-delay-line vectors $\mathbf{y}_{m,i}[\nu]$, each of which contains a data window of the polyphase signal $y_{m,i}[\nu]$ in Fig. 8.
If we neglect carrier frequency and phase offsets, then the subcarrier output $\hat{x}_m[\nu]$ is obtained by applying $F(\cdot)$, defined in (1), to this equaliser output. The CM part of the equaliser is updated first, for $i = 0, 1, 2$. With this modification, a new output is calculated. If this output leads to an unaltered decision, this means that the equaliser is well adjusted, and subsequently a DD update step is also executed. This can be expressed by including an indicator function, defined for $a \in \mathbb{C}$, in the update (8) for the DD component of the equaliser, for $i = 0, 1, 2$.
In addition to increasing the convergence rate of the CM algorithm, the inclusion of a DD scheme will also lock the phase ambiguity to integer multiples of $\pi/2$. This equaliser is referred to as FSE-CMA/DD. The concurrent scheme can be modified to a pure FSE-CMA equaliser by setting $\mu_{\mathrm{DD}} = 0$; a symbol-spaced CMA equaliser is obtained for $L_{m,i} = 0$, $i = 1, 2$.
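The sequence of steps in one concurrent CM/DD iteration can be sketched in Python as follows. This is a sketch under stated assumptions in the spirit of [14], [15], not the implemented design: the three polyphase branches are stacked into a single coefficient list, the output is formed as $\mathbf{w}^{\mathrm{H}} \mathbf{y}$, the CM error is taken as $z(1 - |z|^2)$ for a unit-modulus constellation, the DD error as $F(z) - z$, and all function names are hypothetical.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)


def decide_qpsk(a):
    """Unit-norm QPSK decision device: nearest of (+-1 +-j)/sqrt(2)."""
    re = SQRT_HALF if a.real >= 0 else -SQRT_HALF
    im = SQRT_HALF if a.imag >= 0 else -SQRT_HALF
    return complex(re, im)


def equaliser_output(w, y):
    """Inner product w^H y (assumed convention)."""
    return sum(wk.conjugate() * yk for wk, yk in zip(w, y))


def cm_dd_step(w_cm, w_dd, y, mu_cm, mu_dd):
    """One concurrent CM/DD iteration; returns (new w_cm, new w_dd, output)."""
    # Output of the two parallel components before any update.
    z = equaliser_output(w_cm, y) + equaliser_output(w_dd, y)
    decision = decide_qpsk(z)

    # CM update: a stochastic gradient step that pulls |z| towards 1.
    e_cm = z * (1.0 - abs(z) ** 2)
    w_cm_new = [wk + mu_cm * e_cm.conjugate() * yk for wk, yk in zip(w_cm, y)]

    # Recompute the output with the updated CM part only.
    z_new = equaliser_output(w_cm_new, y) + equaliser_output(w_dd, y)

    # Indicator: run the DD (LMS-type) update only if the decision is unaltered.
    if decide_qpsk(z_new) == decision:
        e_dd = decision - z
        w_dd_new = [wk + mu_dd * e_dd.conjugate() * yk for wk, yk in zip(w_dd, y)]
    else:
        w_dd_new = list(w_dd)
    return w_cm_new, w_dd_new, z


# Hypothetical single-tap example: the input has twice the wanted amplitude.
d0 = complex(SQRT_HALF, SQRT_HALF)
w_cm, w_dd, z = cm_dd_step([0.5 + 0j], [0.5 + 0j], [2 * d0], 2 ** -6, 2 ** -4)
```

In this toy case both parts reduce their gain, since the output modulus of 2 exceeds the unit modulus of the constellation and the decision is not changed by the CM step. Setting `mu_dd` to zero leaves only the CM adaptation.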
IV. IMPLEMENTATION
This section presents the FPGA realisation of the equalisers described in Sec. III on a Xilinx Virtex 7, embedded in the basic transceiver system that had previously been elaborated in [2]. Our approach uses the HDL Coder Blockset from Simulink and its code generation ability to export the models to VHDL. Thereafter, the generated files are used for synthesis and implementation by Xilinx Vivado tools.
A. Wordlength Considerations
In [2] it was established that in order to keep the out-of-band emissions to adjacent TVWS channels below the -69dB currently suggested by the regulator [1], a word length of 12 bits must be used at RF. Incorporating the resolution gain in the up- and downconversion stages, samples and coefficients at baseband should be resolved with 16 bits. Thus, for the DSP48E1 block of the Xilinx FPGA, three different word lengths have been selected for the various equalisers:
Input and output. Based on the unit norm of the QPSK signal in (1), it was deemed sufficient to employ 18 bit words with 4 integer and 14 fractional bits, which provided sufficient amplitude and precision during simulations.
CMA and FSE-CMA filter coefficients. With 18 bit representations lacking precision in the feedback loop, 36 bit words had to be used for the adaptive filter coefficients to provide adequate convergence.
FSE-CMA/DD filter coefficients. The inclusion of the DD mode in the coefficient update reduced the dynamic range in the feedback path and permitted a restriction to 18 bit words.
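The 18 bit format with 4 integer and 14 fractional bits can be illustrated by a small quantisation sketch. Several details are assumptions that the text does not specify: a signed two's complement word whose sign bit is counted among the 4 integer bits, rounding to the nearest code, and saturation on overflow. The function names are hypothetical.

```python
import math


def to_fixed(x, word_bits=18, frac_bits=14):
    """Quantise x to a signed fixed-point code (assumed two's complement)."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))          # most negative code
    hi = (1 << (word_bits - 1)) - 1       # most positive code
    code = round(x * scale)               # round to nearest (assumption)
    return max(lo, min(hi, code))         # saturate instead of wrapping


def from_fixed(code, frac_bits=14):
    """Convert a fixed-point code back to a real value."""
    return code / (1 << frac_bits)


# One component of a unit-norm QPSK symbol, 1/sqrt(2).
qpsk_component = 1.0 / math.sqrt(2.0)
code = to_fixed(qpsk_component)
error = abs(from_fixed(code) - qpsk_component)
```

Under these assumptions the representable range is $[-8, 8 - 2^{-14}]$ and the rounding error is at most $2^{-15}$, which leaves ample headroom above the unit-norm QPSK symbols.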
B. Step Size Representation
A delay in the feedback loop of the adaptive algorithms is likely to cause slow convergence, particularly for the DD scheme when based on the LMS algorithm [21] . For a very simple realisation of the update, the step sizes µ CM and µ DD are chosen as powers of two, such that multiplication can be replaced by logic shifts.
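Replacing the step size multiplication by a shift can be sketched as follows, assuming that the update term is held as a signed integer fixed-point code and that the step size is $2^{-s}$ for a non-negative integer $s$. The function name and the example values are hypothetical.

```python
def scale_by_step(update_code, shift):
    """Multiply an integer fixed-point code by the step size 2**-shift.

    Python's >> on integers is an arithmetic shift, so negative codes keep
    their sign and the result is rounded towards minus infinity.
    """
    return update_code >> shift


scaled_pos = scale_by_step(1024, 3)    # 1024 * 2**-3
scaled_neg = scale_by_step(-1024, 3)   # sign is preserved
scaled_odd = scale_by_step(-5, 1)      # -2.5 is floored to -3
```

The floor behaviour for negative values introduces a small bias compared with an exact multiplication, which may need to be considered when choosing the coefficient word length.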
C. Footprint
The resources used for the implementation of the CMA, FSE-CMA and FSE-CMA/DD equalisers are summarised in Tab. I, listing look-up tables (LUTs), flip-flops and DSP48E1 units. The table also shows the area required for a symbol-spaced CMA/DD scheme, which operates both CM and DD parts but with $L_{m,i} = 0$, $i = 1, 2$, w.r.t. Sec. III. For these symbol-spaced systems, the length of the equaliser, $L_{m,0}$, is equal to the sum of the lengths of all polyphase components for the fractionally spaced systems.
Since the overall number of coefficients is the same, there is no cost disadvantage in going from a symbol-spaced to the $T/3$-spaced equaliser. Further, the inclusion of the DD mode is advantageous, as it permits the word length to be shortened: compared to the CM/DD system, the FSE-CM/DD algorithm requires 22% fewer DSP48E1 blocks, 26% fewer flip-flops and 42% fewer LUTs.
Information about the implementation's timing was obtained from Xilinx Vivado. With a longest logical propagation delay of 9.3ns and 12ns for the FSE-CM/DD and CMA/DD algorithms, respectively, the system can easily be executed within the 188ns sampling period of the 5.3MHz system.
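The timing margin can be checked with a short calculation. It assumes that the 5.3MHz symbol rate is exactly one third of the 16MHz subchannel sampling rate, which is consistent with the $T/3$-spaced equaliser but is not stated explicitly; the variable names are hypothetical.

```python
def sample_period_ns(rate_hz):
    """Sampling period in nanoseconds for a given rate in Hz."""
    return 1e9 / rate_hz


symbol_rate_hz = 16e6 / 3                       # assumed: 16 MHz / 3
period_ns = sample_period_ns(symbol_rate_hz)    # approximately 187.5 ns

# Longest logical propagation delays reported by Vivado, in ns.
delays_ns = {"FSE-CM/DD": 9.3, "CMA/DD": 12.0}
margin = {name: period_ns / delay for name, delay in delays_ns.items()}
```

Under this assumption the sampling period exceeds the longest propagation delay by a factor of roughly 20 for the FSE-CM/DD design and more than 15 for the CMA/DD design.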
D. Power Consumption
For the generated VHDL code of the equalisers, Xilinx Vivado allows assessment of the power consumption. In addition to a static 327mW as a baseload of the Virtex 7, Fig. 9 shows the dynamic power consumption. Of the different equalisers, the FSE-CM/DD is lowest, requiring approximately 30% less than the symbol-spaced CM implementation with the same number of coefficients.
The main source of power consumption is the DSP48E1 units, of which approximately the same number are used for all designs. However, the FSE approaches operate three branches in parallel, thus reducing the overall length of the logic path. Since each branch is shorter, the hardware resources for each branch are more compact in area compared to a single branch, permitting shorter interconnections within the FPGA. As interconnections are expensive in terms of energy, reducing these also reduces the system power consumption of the FSE designs.
E. Overall System
In a realistic scenario, regulators will only release a few TVWS channels depending on the geographic location of use. Hence, we want to operate the TVWS transceiver of Sec. II with only a limited number of subchannels on a Virtex 7. The overall FPGA implementation of the transceiver system therefore includes both the 40 channel TX upconverter and RX downconverter as well as equalisation and synchronisation on two subchannels, using adaptive filters with 24 coefficients based on the FSE-CM/DD design. The resource use of this implementation is outlined in Tab. II. The design comfortably fits onto the Virtex 7, with spare resources remaining.
V. SIMULATION RESULTS
In this section, we present performance results for the FSE-CM/DD algorithm using 24 coefficients based on the fixed-point implementation laid out in Sec. IV-A. We compare the equaliser outputs $\hat{x}_m[\nu]$ to the transmit sequence $x_m[\nu]$, and define the error $e[n]$ after identifying the delay and rotation applied for each simulation run, such that at steady state
where $E\{\cdot\}$ is the expectation operation. The convergence curves are averaged over different RF channel realisations with randomised propagation delays, and with different levels of interference caused by additive white Gaussian noise applied to the received signal at RF¹. Fig. 10 shows the performance of the CM/DD scheme w.r.t. the error defined in (9). The ensemble-averaged error converges reasonably quickly due to the concurrent CM/DD scheme, which maintains the robustness of the CM algorithm while offering the enhanced convergence speed of a decision-directed LMS algorithm. The DD process removes some of the phase ambiguity of the CM scheme, and locks the QPSK constellation to rotations by multiples of $\pi/2$ w.r.t. the input signal. Although not shown here, and despite its greater phase ambiguity, a sole CM scheme compares poorly in terms of convergence rate to the CM/DD approach. For CM/DD, the steady-state error generally improves with increased SNR, but has been found to saturate above 5dB here due to a truncation of the ideal equaliser, which generally will require more than 8 coefficients per polyphase component.
¹ Note that the applied noise is wideband at the RF sampling rate; while SNR figures appear low, a large proportion of the noise is out-of-band w.r.t. the considered TVWS channels.
VI. CONCLUSIONS
This paper has focused on a low-complexity, frequency-agile TVWS transceiver system, which implements a radio front-end for up to 40 channels at a slightly higher cost than that of a single-channel transceiver. This system permits a flexible deployment in low-power TVWS basestations of the type in [13]. In order to enable transmission, a per-band equalisation and synchronisation has been introduced. Amongst different options, we have selected a robust and fast blind approach based on a fractionally-spaced concurrent constant modulus and decision-directed algorithm.
The selected approach is capable of synchronising and equalising a frequency-selective channel. The use of a fractionally-spaced architecture has demonstrated advantages in terms of power consumption compared to a symbol-spaced approach, and the concurrency of a decision-directed scheme together with a constant-modulus approach permitted a lower bit resolution compared to a pure CMA approach, hence also resulting in a lower cost. Further reductions may be possible by utilising look-up tables more extensively instead of multiplications in the update equations of the equaliser. Nevertheless, the current approach has demonstrated that a number of equalisers can be operated together with the transceiver system on a Xilinx Virtex 7, and enable low-power base stations such that solar and/or wind energy-harvesting is feasible.
