Low-Complexity Digital Modem Implementation for High-Speed Point-To-Point Wireless Communications by Zhang, H et al.
“© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be 
obtained for all other uses, in any current or future media, including 
reprinting/republishing this material for advertising or promotional purposes, creating 
new collective works, for resale or redistribution to servers or lists, or reuse of any 
copyrighted component of this work in other works.” 
 
Low-Complexity Digital Modem Implementation
for High-Speed Point-to-Point Wireless
Communications
Hao Zhang, Xiaojing Huang and Y. Jay Guo
Global Big Data Technologies Centre
University of Technology Sydney, Australia
Emails: {Hao.Zhang, Xiaojing.Huang, Jay.Guo}@uts.edu.au
Abstract—A low-complexity digital modem is presented in
this paper for achieving high-speed and wideband point-to-
point (P2P) wireless communications. By combining multiple
functionalities into the transmitter and receiver filters, the
signal processing complexity in the digital baseband can be
significantly reduced. The structures of the transmitter and
receiver filters and their implementation on a field programmable
gate array (FPGA) are described in detail. Pre-equalization for
reducing the impact of practical channel frequency response
can be easily incorporated into the transmitter filter structure.
Experimental test results using a 20 Gigabits per second
(Gbps) digital modem prototype demonstrate satisfactory
performance with low FPGA resource usage.
Keywords—Low-Complexity, High-Speed, Wideband, Wireless
Backhaul, and FPGA Implementation.
I. INTRODUCTION
High-speed point-to-point (P2P) wireless links play a pivotal
role in wireless communication systems by handling the
aggregation and distribution of various data flows such as
voice, video, Internet, and other data sources. They provide
cost-effective ways to offer tens or even hundreds of Gbps
data rates and hence can be used as backhaul links in cellular
networks, such as the fifth generation (5G) systems [1] and
aerial backbones in space and terrestrial integrated networks.
A typical backhaul system consists of digital baseband,
intermediate frequency (IF) and/or radio frequency (RF) modules.
Any imperfection in the various filters of these modules, as
well as the wireless multipath fading channel, introduces
inter-symbol interference (ISI) at the receiver. As the
transmission rate increases, ISI becomes more severe and
significantly degrades the data detection performance. Channel
equalization is commonly used at the receiver side to cope
with ISI. At the transmitter side, pre-equalization is also an
effective and low-complexity option, implemented by using
a predefined pulse shaping filter based on off-line channel
sounding. In this way, the imperfection of the RF and/or IF
modules can be pre-compensated and the equalization
complexity at the receiver side can be substantially reduced,
especially for wideband systems.
In-phase and Quadrature-phase (I/Q) imbalance is another
significant factor for a wideband wireless backhaul system
with I/Q modulation architecture where a data symbol is
modulated onto (or demodulated from) an IF or RF carrier
via two separate in-phase (I) and quadrature (Q) channels. The
signal will be distorted if there is any difference between I and
Q channel characteristics. There are a number of methods for
I/Q imbalance compensation in digital domain and/or analog
domain. Most of those methods deal with I/Q imbalance
compensation at the receiver side independently.
In order to suppress the effect of ISI and reduce the I/Q
imbalance, significant research has been conducted and
a number of techniques can be found in the literature. For
example, a complex infinite impulse response (IIR) filter
with digital pre-equalization is proposed in [2]. This pre-
equalization can deal with various linear distortions as well as
the channel distortion. A two-stage iteration-based algorithm
is proposed in [3], which is used at the transmitter to
calculate a pre-filter. The hardware implementation of the blind
matched filter receiver is described in [4], by which the matrix
inversion and matrix multiplication operations are replaced
by a simple recursive algorithm for the filter design. The
channel identification and equalization performance achieve
almost equal mean squared error (MSE) and bit error rate
(BER) levels compared to those of the conventional receivers.
In [5], a simple non-matched receiver is developed for an
orthogonal pulse shape modulation scheme in ultra-wideband
(UWB) communication system. An iterative decision feedback
receiver for single carrier frequency domain equalization (SC-
FDE) is proposed in [6] to compensate the I/Q imbalance.
Compared with conventional linear methods, this receiver
can significantly improve the performance of SC-FDE system
under I/Q imbalance.
However, in all of the above-mentioned techniques, the
channel equalization and I/Q imbalance compensation are
performed separately. With these conventional techniques, it
is very hard to achieve a low-complexity implementation for
wideband systems, which demand significant signal processing
resources. To reduce complexity, it is necessary
to optimize each digital process (such as transmitter filtering,
channel estimation, receiver filtering and detection) so that all
signal processing functionalities can be implemented in FPGA
with minimum power and volume. In doing so, wide
word-length operations on large matrices with more than three
dimensions, as well as recursion in the algorithm, should be avoided.
On the other hand, high-speed P2P wireless systems usually
have much wider bandwidth, such as multiple GHz, in order to
achieve tens or even hundreds of Gbps data rates. With such
wide bandwidth, the IF and/or RF modules are hard to build,
and undesirable behavior such as significant ripple in
the pass band and frequency-dependent I/Q imbalance
appears in practical hardware.
In this paper, a low-complexity digital modem implementation
for high-speed wideband wireless communications
is presented, focusing on efficient transmitter and receiver
filter designs. Each of these filters combines
multiple signal processing tasks. Specifically, the
transmitter filter performs both sample rate conversion (SRC)
and pulse shaping with the capability of pre-equalization.
The receiver filter performs SRC, channel equalization, and
I/Q mismatch compensation at the same time. Therefore, the
overall implementation complexity is significantly reduced
compared to conventional designs. Taking a 20 Gbps single
carrier system as an example, the implementation of the transmitter
and receiver filters in a Xilinx Virtex 7-690T FPGA is described
in detail. The total FPGA resource usage, including the optical
Ethernet interface, data symbol mapping/demapping, and low
density parity check (LDPC) encoding/decoding, is provided.
Real-time experimental test results are also given. To the
authors’ knowledge, similar work that produces such a high data
rate over such a wide bandwidth with such low implementation
complexity has not been reported in the literature.
The rest of the paper is organized as follows. In Section
II, the digital modem architecture and the transmitter and
receiver structures are described. In
Section III, the FPGA implementation of the transmitter and receiver
filters is described in detail and the resource usage is provided.
The test setup and experimental results are shown in Section IV.
Finally, Section V concludes this paper.
II. SYSTEM DESCRIPTION
A. Digital Baseband and IF Architecture
The high-speed digital modem presented in this paper
consists of a digital baseband platform and an IF module.
Depending on the application, this digital modem can
be connected to different RF frontends to form a Ka-band,
millimeter-wave, and/or terahertz (THz) communication
system. Fig. 1 shows the system architecture with the digital
baseband platform and IF module. The digital baseband
platform is composed of two FPGAs, each capable of processing
a 10 Gbps data rate, and an IF module for transmitter and
receiver respectively. When fully operated, the digital modem
can transmit and receive Ethernet traffic at a 20 Gbps data rate
simultaneously. High-speed digital-to-analog converters (DACs)
and analog-to-digital converters (ADCs) with a sampling rate of
2.5 Gsps connect the FPGAs to the IF module.
The functionality of the IF module is to up-convert the
I/Q modulated baseband signals (4 channels) generated by the
baseband platform to the 15.65 GHz IF band at the transmitter,
and down-convert the IF signal to 4 channel baseband signals
Fig. 1. The system architecture.
GHz pilot frequency is also added at the transmitter for carrier
frequency tracking.
The transmitter filter and receiver filter play important roles
in the digital modem and are the focus of this paper.
B. Transmitter Filter Structure
The transmitter filters combine the SRC and pulse shaping
together. The SRC is implemented by using the polyphase
filter bank approach. In this system design, the symbol rate
after modulation (also called data symbol mapping) is 1.875
Giga-samples per second (Gsps) and the signal sampling rate
at the DACs is 2.5 Gsps. That is, 4 samples are generated in
every 3-symbol duration. Since the conversion ratio
is 4/3, the number of filters in the filter bank is three. Each
filter is a root-raised-cosine (RRC) pulse shaping filter sampled
at 2.5 Gsps but with different time offsets. The structure of
the filter bank is shown in Fig. 2.
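The rate relationships in this subsection can be checked with exact arithmetic. The following is a minimal sketch, not the authors' implementation: the variable names are illustrative, the 312.5 MHz FPGA clock is the figure quoted in Section III-A, and reading the three filters as the three fractional positions of a symbol on the 2.5 Gsps output grid is an assumption made here for illustration.

```python
from fractions import Fraction

# Rates quoted in the text, expressed in GHz so the arithmetic stays exact.
SYMBOL_RATE = Fraction(1875, 1000)   # 1.875 G symbols per second
DAC_RATE = Fraction(25, 10)          # 2.5 Gsps at the DACs
FPGA_CLOCK = Fraction(3125, 10000)   # 312.5 MHz (from Section III-A)

# Sample rate conversion ratio: output samples per input symbol.
ratio = DAC_RATE / SYMBOL_RATE

# Parallelism needed per FPGA clock period.
samples_per_clock = DAC_RATE / FPGA_CLOCK
symbols_per_clock = SYMBOL_RATE / FPGA_CLOCK

# Assumption: symbol k sits at k * ratio on the output sample grid, so its
# fractional position (in sample periods) selects one of the time offsets.
symbol_offsets = [(k * ratio) % 1 for k in range(ratio.denominator)]

print(ratio, samples_per_clock, symbols_per_clock, symbol_offsets)
```

Under this reading, the three symbols of each serial-to-parallel group see the same pulse sampled at three different fractional offsets, which is consistent with a bank of three filters.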
 
Fig. 2. Structure of transmitter filter. (Diagram labels: input of symbols
at 1.875 Gsps; serial-to-parallel conversion; output of samples.)
C. Receiver Filter Structure
The receiver filters simultaneously perform the I/Q imbal-
ance compensation, channel equalization, and SRC functions.
The block diagram of the receiver filter bank and the structure
of each polyphase filter are shown in Fig. 3 and Fig. 4 re-
spectively. A polyphase filter has two parts which are used for
filtering the real part and imaginary part of the received signal
at 2.5 Gsps respectively. After channel and I/Q mismatch
estimation, the filter coefficients of these two parts can be
Fig. 3. Structure of receiver filter. (Diagram label: sample at
(1.875/3) Gsps.)
 
Fig. 4. Structure of polyphase filter. (Diagram labels: real part of
received signal; imaginary part of received signal; filter for real part
of received signal; filter for imaginary part of received signal; filter
output.)
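The two-part structure of Fig. 4 can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' design: the function name, the random test data, and the frequency-independent gain/phase mismatch model (`g`, `phi`) are all hypothetical. It shows that filtering the real and imaginary parts with separate complex taps contains an ordinary complex FIR filter as a special case, and that one tap per part is enough to undo a simple I/Q mismatch.

```python
import numpy as np

def two_part_filter(x, coef_real_part, coef_imag_part):
    """Filter Re(x) and Im(x) with separate complex taps and sum the results."""
    x = np.asarray(x, dtype=complex)
    y_re = np.convolve(x.real, coef_real_part)
    y_im = np.convolve(x.imag, coef_imag_part)
    return y_re + y_im

# Hypothetical test data (not from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
w = rng.standard_normal(5) + 1j * rng.standard_normal(5)

# Special case: taps (w, j*w) reproduce an ordinary complex FIR filter.
y_two_part = two_part_filter(x, w, 1j * w)
y_complex = np.convolve(x, w)

# Assumed mismatch model: the Q branch has gain g and phase error phi.
g, phi = 1.1, 0.1
r = x.real + 1j * g * (np.sin(phi) * x.real + np.cos(phi) * x.imag)

# Single-tap coefficients that invert this particular mismatch.
a = np.array([1 - 1j * np.tan(phi)])
b = np.array([1j / (g * np.cos(phi))])
x_hat = two_part_filter(r, a, b)
```

Because the two coefficient sets are independent, the same structure can absorb equalization and mismatch compensation in a single filtering pass.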
III. FPGA IMPLEMENTATION
A. Transmitter Filter
The design of transmitter filters without or with pre-
equalization affects the complexity of channel equalization at
the receiver side. In our design, each transmitter filter is an
RRC pulse shaping filter with pre-equalization, sampled at 2.5
Gsps with a different time offset. When pre-equalization is used,
the filter coefficients vary with the condition of the IF module.
Therefore, the filter coefficients can be configured when the FPGA
bit-file is generated, after channel sounding has determined the
channel response of the IF module. For practical use in a
wideband system, the transmitter filters
cannot be too short, so a length of 32 is chosen
in this design. With different time offsets, each transmitter
filter output sample is generated by 24 addition operations
with 12 bit width per addition operation, and eight samples
are generated in each FPGA high-speed clock period (312.5
MHz). Considering the filter coefficient configurability and the
huge number of addition operations with large word-width, the

















LUT memory 2 
LUT memory 23 













Fig. 5. Structure of Tx filter for one sample.
After setting up the digital modem in the IF loopback
mode, a special training sequence is sent from the digital
baseband platform through the IF module and received by
the digital baseband platform. Coefficients for the transmitter
filters can be calculated from the data captured at the digital
receiver. The updated coefficients for different conditions of
the IF module can be uploaded into the block memory. The
total size of the coefficients is around 24 Kbits. Since the FPGA
has ample block memory [7], it is reasonable to use one 36
Kbit block memory for storing these coefficients. Following
the designed structure of the transmitter filters, one output
sample is generated with 24 time offsets in one FPGA clock
period. Therefore, coefficients stored in the block memory are
divided into 24 small groups, and the size of each group is
1 Kbits. Considering the small size of each small group of
coefficients and the minimum of 18 Kbits for each block
memory, each small group of coefficients is stored in look-
up table (LUT) memory rather than the block memory. At the
same time, the structure of the LUT memory is optimized for a
high clock frequency when routing all cells in the FPGA. Once the
coefficients are downloaded into the 24 LUT memories, samples
can be generated from the input symbols delivered by the
serial-to-parallel converter. The data outputs
from the 24 LUT memories are added together to generate one
output sample. The details of the adder for long bit-width
signals are described in [8].
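The idea of generating one output sample by adding 24 LUT outputs can be sketched as follows. This is an illustrative model only: it assumes that each LUT is addressed by a 16QAM symbol index and holds the precomputed product of one coefficient with each constellation point. The constellation mapping, the random coefficients, and all names are hypothetical, and the actual LUT contents and word lengths in the FPGA are not specified here.

```python
import numpy as np

NUM_TAPS = 24  # symbol contributions per output sample (from the text)

# Assumed 16QAM alphabet: index 0..15 maps to a complex constellation point.
LEVELS = np.array([-3, -1, 1, 3])
ALPHABET = np.array([i + 1j * q for i in LEVELS for q in LEVELS])

def build_luts(coeffs):
    """One table per tap: entry [k, s] holds coeffs[k] * ALPHABET[s]."""
    return np.outer(coeffs, ALPHABET)

def lut_output_sample(luts, symbol_indices):
    """Look up one product per tap and add the values together."""
    taps = np.arange(len(symbol_indices))
    return luts[taps, symbol_indices].sum()

# Hypothetical coefficients and input symbols.
rng = np.random.default_rng(1)
coeffs = rng.standard_normal(NUM_TAPS) + 1j * rng.standard_normal(NUM_TAPS)
symbols = rng.integers(0, len(ALPHABET), size=NUM_TAPS)

luts = build_luts(coeffs)
lut_sample = lut_output_sample(luts, symbols)

# Reference: the same sample computed with explicit multiplications.
direct_sample = np.dot(coeffs, ALPHABET[symbols])
```

Replacing multiplications by table look-ups in this way is consistent with the transmitter filter using no multipliers in Table I, and updating the pre-equalization only requires rewriting the tables.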
B. Receiver Filter
The length of the receiver filters affects both the
performance and the complexity of a practical implementation.
Considering these two factors, this design adopts
receiver filters of length 54. From Fig. 3 and Fig. 4, it is
clear that a large number of multiplications and additions
are necessary for the long receiver filters. Multiplications can
be implemented by the DSP48 slices in the FPGA [7]. For the addition
operations, the output bit width of the DSP48 must be
chosen for the given precision. A wide bit width is needed to
achieve good performance, but the complexity increases with
the bit width. Therefore, an appropriate strategy
is required to deal with the large number of wide
additions. Fig. 6 shows the structure of an efficient addition
tree.
According to the length of receiver filters, there are 54
multiplications for generating 54 data outputs which should be
added together. However, we can not complete all adders in
Fig. 6. Structure of addition tree for the real/imaginary part of one symbol.
(Surviving diagram labels: Sub3_2; real/imaginary part of one symbol output.)
into four levels. Considering the resource usage and the timing
of the system clock, we add three data values in one clock
period. In this way, the same resources are needed as
for adding two data values in one clock period. At the input of the
first level, the bit width of the data after multiplication should
be large enough to satisfy the given precision requirement.
However, the sum at each adder grows after each level,
so one bit can be dropped at the input of the next
level. With this design, the input data widths are 13, 12,
11 and 10 bits for the four levels respectively. In one clock period,
six symbols should be generated and each symbol consists of
real and imaginary parts. This addition tree structure
keeps the precision while reducing the resource
usage.
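The four-level structure follows directly from adding three operands per adder: 54 products reduce to 18, 6, 2 and finally 1 value. The sketch below only models this grouping. The function name and the integer test data are hypothetical, and the per-level bit-width trimming (13, 12, 11 and 10 bits) is deliberately not modeled.

```python
def ternary_adder_tree(values, operands_per_adder=3):
    """Sum `values` level by level, combining up to three operands per adder.

    Returns the final sum and the number of adders used at each level.
    """
    current = list(values)
    adders_per_level = []
    while len(current) > 1:
        current = [
            sum(current[i:i + operands_per_adder])
            for i in range(0, len(current), operands_per_adder)
        ]
        adders_per_level.append(len(current))
    return current[0], adders_per_level

# 54 hypothetical integer products, one per receiver filter tap.
products = list(range(-27, 27))
total, adders_per_level = ternary_adder_tree(products)
print(total, adders_per_level)
```

The last level has only two operands left, so its adder uses two of its three inputs.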
C. Implementation Results
In addition to the above described transmitter and receiver
filters, the digital modem also consists of other necessary
signal processing modules. At the transmitter side, there are
encoding, modulation, transmitter filters and DAC interface
modules. At the receiver side, there are synchronization,
channel estimation, receive filters, demodulation, decoding and
ADC interface modules. Between the digital modem physical
(PHY) layer and the network layer, the 20 Gbps transmitter
and receiver fiber interfaces are also implemented. These
modules are implemented to deal with high throughput for
the wideband system, each being optimized in the FPGA. The
resource usage for the two filters and the whole system is
shown in Table I.
A Xilinx Virtex7-690T device
is used in our digital baseband platform. From Table I, we
can see that the usage of typical resources, including
LUTs, slice registers, block RAMs, and multipliers, is reasonable
for the whole system compared with the total resources in this
device. The two filters occupy only a small proportion of the
total available resources. The remaining resources can be used
for other modules in the whole digital modem.
TABLE I
FPGA USAGE OF TRANSMITTER/RECEIVER FILTERS AND THE WHOLE SYSTEM

Module Name                 Slice LUTs   Slice Registers   Block RAMs   Multipliers
TX Filter                   11200        15400             8            0
RX Filter                   18600        69000             12           1344
The Whole System            245800       368600            590          2220
Total Number                433200       866400            1470         3600
Usage Rate of Two Filters   6.9%         9.7%              1.4%         37.3%
Usage Rate of System        56.7%        42.5%             40%          61.7%
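The usage rates in Table I can be reproduced from the raw counts. The following is a small check using only the numbers in the table; the variable and function names are illustrative.

```python
RESOURCES = ("slice_luts", "slice_registers", "block_rams", "multipliers")

# Raw counts copied from Table I.
tx_filter = (11200, 15400, 8, 0)
rx_filter = (18600, 69000, 12, 1344)
whole_system = (245800, 368600, 590, 2220)
device_total = (433200, 866400, 1470, 3600)

def usage_percent(used, total):
    """Usage rate in percent, rounded to one decimal place."""
    return [round(100 * u / t, 1) for u, t in zip(used, total)]

two_filters = [t + r for t, r in zip(tx_filter, rx_filter)]
filters_rate = dict(zip(RESOURCES, usage_percent(two_filters, device_total)))
system_rate = dict(zip(RESOURCES, usage_percent(whole_system, device_total)))

print(filters_rate)
print(system_rate)
```

The computed values agree with the table to the precision shown; the block RAM usage of the whole system evaluates to 40.1%, which the table reports as 40%.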
IV. TEST SETUP AND EXPERIMENTAL RESULTS
A. Test Setup
The digital modem is composed of two FPGA platforms,
each capable of transmitting and receiving two channels of
baseband I/Q signals. Each FPGA platform is connected to a
control PC via universal serial bus (USB) cable for monitoring
and configuring the state of FPGA. The IF transmitter and
receiver are connected directly at IF frequency via coaxial
cable. A noise generator is used to generate additive Gaussian
noise with on/off switch controlled by a spectrum analyser.
A cascade of two wideband amplifiers is used to amplify the
noise to sufficient power level. Two attenuators (with 1 dB step
and 10 dB step respectively) are used to control the noise level
and a switch driver is connected to them to set the attenuation
manually. The spectrum analyser is also used to monitor the
Fig. 7. Structure of test setup.
At the transmitter side, two 10 Gbps Ethernet traffic streams
generated by the Spirent tester [9] are pseudo-random bit
sequences with 128 byte packet size. At the receiver side, the
two received 10 Gbps Ethernet traffic streams are fed back to
the Spirent tester, and the BER result can be recorded from
the user interface window of Spirent tester. Fig. 7 shows the
block diagram of the test setup. A picture of the 20 Gbps
digital modem prototype hardware is shown in Fig. 8.
 
Fig. 8. Picture of 20 Gbps digital modem and test setup.
B. Experimental Results
There are four 2.5 GHz channels in the whole system. For
simplicity, we just show the result from one typical channel.
Other channels produce very similar results.
Before performing the real-time system loopback test, the DACs
and ADCs should be calibrated. Because of the baseband
I/Q modulation architecture, any difference between the I and Q
paths in delay, phase, or amplitude will introduce I/Q imbalance. After
DAC and ADC calibration, the transmitted signal can be
looped back via a direct DAC-to-ADC connection. The error
vector magnitude (EVM) of the constellation is 4.05% without
pre-equalization. This result confirms that the
signal processing produces satisfactory performance for
16QAM demodulation without the IF module.
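For reference, one common way to compute an EVM figure from received and ideal constellation points is sketched below. The normalization (RMS error over RMS reference power) is an assumption, since the EVM definition used for the measurements is not stated, and the test data are synthetic.

```python
import numpy as np

def evm_percent(received, reference):
    """RMS EVM in percent, normalised to the RMS power of the reference."""
    received = np.asarray(received, dtype=complex)
    reference = np.asarray(reference, dtype=complex)
    error_power = np.mean(np.abs(received - reference) ** 2)
    reference_power = np.mean(np.abs(reference) ** 2)
    return 100.0 * np.sqrt(error_power / reference_power)

# Synthetic example: an ideal 16QAM grid received with a 5% gain error.
levels = np.array([-3, -1, 1, 3])
reference = np.array([i + 1j * q for i in levels for q in levels])
received = 1.05 * reference

evm = evm_percent(received, reference)
```

In this synthetic case the error vector is 5% of each reference point, so the computed EVM is 5% up to floating-point rounding.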
The test for obtaining the channel frequency response of
each channel is performed by transmitting a number of discrete
tones throughout the entire bandwidth of each channel. The
frequencies of the discrete tones are selected such that their
image frequencies appear in-between two original tones, re-
sulting in a shaded area below the frequency response envelope
to show the I/Q imbalance.
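One way to choose tones whose images fall between the original tones is to shift a uniform grid by a quarter of the tone spacing. The sketch below uses this hypothetical grid, since the actual tone frequencies are not given in the text, and relies on the standard baseband model in which I/Q imbalance maps a tone at frequency f to an image at -f.

```python
from fractions import Fraction

def sounding_tones(num_pairs, spacing=Fraction(1)):
    """Tones at (k + 1/4) * spacing for k = -num_pairs .. num_pairs - 1."""
    offset = spacing / 4
    return [k * spacing + offset for k in range(-num_pairs, num_pairs)]

# Frequencies in units of the tone spacing (hypothetical grid).
tones = sounding_tones(4)

# Image of each tone under I/Q imbalance in the baseband model.
images = [-f for f in tones]

print(tones)
print(images)
```

With this offset the images sit at (k + 3/4) times the spacing, midway between neighbouring tones, so the image level can be read from the same spectrum without overlapping the tones themselves.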
The overall channel frequency response including both
transmitter and receiver for each channel can be obtained in
digital baseband at the receiver side. Fig. 9 shows the results
calculated using Matlab software after uploading the test data
from the digital platform. We see that the channel frequency
response fluctuates significantly over the 2.5 GHz bandwidth.
The fluctuation range is around 9 dB. At the same time, the
image components of the IF module are also quite severe.
This poses significant challenges to the signal processing for
recovering the transmitted data.
Fig. 9. Frequency response of one IF channel.
Considering the undesirable performance of the IF hardware
for the wideband system and in order to reduce the complex-
ity of the whole system as much as possible, an effective
way to deal with the significant fluctuation in the channel
frequency responses is channel pre-equalization. Fig. 10 shows
transmitter filters without pre-equalization. The filters have
only a real part, since the influence of the
IF module is not considered. After pre-equalization, however,
the transmitter filters change considerably. Fig. 11 shows the real part and imaginary
part of the transmitter filters with pre-equalization. From this
result, we can see that the characteristics of the signal to be
transmitted via the IF module change significantly after
pre-equalization.
Fig. 10. Tx filters without pre-equalization.
At the receiver side, the characteristics of the channel can
be shown from receiver filters. Fig. 12 shows real part of
receiver filters without and with pre-equalization (considering
the small value of imaginary part, it is not necessary to show
imaginary part). Considering the limited resources in the FPGA,
we can only use receiver filters of length 54. However,
for the filters without pre-equalization, the channel response is
very long and hence the length-54 filters cannot achieve the
required performance. For the filters with pre-equalization, we can
see that the channel impulse response is short and concentrated. Even
if shorter filters are used after pre-equalization,
satisfactory performance is still achieved.
Fig. 11. Tx filters with pre-equalization: (a) real part and (b) imaginary
part.
Fig. 12. Real parts of Rx filters (a) without pre-equalization and (b) with
pre-equalization.
Fig. 13 shows the comparison between the 16QAM constellations
without and with pre-equalization for the selected IF
channel. We see that the EVM is significantly improved after
pre-equalization. Over a large amount of test data, the EVMs calculated
for the selected channel without and with
pre-equalization are 16.3% and 10.3% respectively.
Fig. 13. Constellations (a) without pre-equalization and (b) with
pre-equalization.
From the above results, we see that pre-equalization is an
effective way to reduce the influence of the IF module in the
wideband system. After applying pre-equalization by updating the
transmitter filter coefficients, a large amount of real-time
Ethernet traffic is tested and the resulting BER versus
Eb/N0 curve is shown in Fig. 14. From this figure, we can
see that excellent performance is achieved for this wideband
system.
Fig. 14. BER for the selected channel.
V. CONCLUSION
In this paper, a low-complexity digital modem implementation
for high-speed wideband systems is presented.
Optimized architectures are designed for the transmitter and receiver
filters, which can be implemented with low resource usage on
the FPGA. The experimental test results using the digital and IF
hardware prototype verify the excellent performance of the
low-complexity designs. We also show that pre-equalization
significantly improves the EVM for 16QAM demodulation
with the practical wideband IF module. With the presented
filter design and implementation, high-speed wideband
wireless communications can be achieved with significantly
lower complexity.
REFERENCES
[1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K.
Soong, and J. C. Zhang, “What will 5G be?” IEEE Journal on Selected
Areas in Communications, vol. 32, no. 6, pp. 1065–1082, June 2014.
[2] H. N. Kim, W. J. Kim, Y. S. Lee, J. H. Seo, S. I. Park, and S. C. Kim,
“An adaptive IIR pre-equalizer for terrestrial DTV transmitters,” IEEE
Transactions on Broadcasting, vol. 53, no. 1, pp. 120–126, March 2007.
[3] S. Alizadeh, H. K. Bizaki, and M. Okhovvat, “Effect of channel estima-
tion error on performance of time reversal-UWB communication system
and its compensation by pre-filter,” IET Communications, vol. 6, no. 12,
pp. 1781–1794, August 2012.
[4] A. Coskun and I. Kale, “All-adaptive blind matched filtering for the equal-
ization and identification of multipath channels - a practical approach,”
IEEE Transactions on Circuits and Systems, vol. 60, no. 1, pp. 232–242,
Jan 2013.
[5] S. Sharma and V. Bhatia, “Performance analysis of filtered PSM signal
using non-matched receiver for UWB communication,” in 2016 IEEE
International Conference on Advanced Networks and Telecommunications
Systems (ANTS), Bangalore, India, 2016.
[6] X. Zhang, H. Li, W. Liu, and J. Qiao, “Iterative IQ imbalance com-
pensation receiver for single carrier transmission,” IEEE Transactions on
Vehicular Technology, vol. 66, no. 9, pp. 8238–8248, Sept 2017.
[7] Xilinx, “FPGA Family,” 2010. [Online]. Available:
https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf
[8] H. Zhang, X. Huang, and Y. J. Guo, “A 20 Gbps digital modem for
high speed wireless backhaul applications,” in 2017 IEEE 85th Vehicular
Technology Conference (VTC Spring), Sydney, Australia, 2017, pp. 1–5.
[9] Spirent, “Spirent Campus Overview,” 2017. [Online]. Available:
https://support.spirent.com/SpirentCSC/SC knowledgeView?id=TRN10243
