A 30 Gbps Low-Complexity and Real-Time Digital Modem for Wireless Communications at 0.325 THz by Zhang, H et al.
“© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be 
obtained for all other uses, in any current or future media, including 
reprinting/republishing this material for advertising or promotional purposes, creating 
new collective works, for resale or redistribution to servers or lists, or reuse of any 
copyrighted component of this work in other works.” 
 
A 30 Gbps Low-Complexity and Real-Time Digital
Modem for Wireless Communications at 0.325 THz
Hao Zhang, Xiaojing Huang, Ting Zhang, Jian A. Zhang and Y. Jay Guo
Global Big Data Technologies Centre
University of Technology Sydney, Australia
Emails: {Hao.Zhang, Xiaojing.Huang, Ting.Zhang, Andrew.Zhang, Jay.Guo}@uts.edu.au
Abstract—A high-speed wideband terahertz (THz) communication
system with low-complexity, real-time digital signal
processing (DSP) is presented in this paper. The architectures
of the baseband platform, intermediate frequency (IF) module
and radio frequency (RF) frontend are described. For real-time
DSP implementation on affordable field programmable gate
array (FPGA) devices, some effective strategies are discussed
to reduce resource usage and ensure that the clock constraints
are met. Adopting these strategies, all physical layer DSP
modules are implemented in two FPGAs with a system clock of
more than 300 MHz. Experimental results obtained with the
developed real-time digital modem prototype demonstrate
superb performance for THz wireless communications.
Keywords—Low-Complexity, Real-Time, High-Speed, THz Sys-
tem, and FPGA Implementation.
I. INTRODUCTION
With the ever-increasing demand for wireless data transmission,
future wireless communication systems will require very high
throughput, with data rates of hundreds of gigabits per second
(Gbps) or higher. The bandwidth of such systems will therefore
have to grow to tens of GHz, especially for backhaul systems.
Broadband wireless communication systems using terahertz (THz)
frequencies have recently attracted significant interest
because they open up a very wide untethered bandwidth, from
100 GHz to 10 THz, for new services.
Owing to the relatively low loss at carrier frequencies in the
100–300 GHz range, several THz communication systems have been
reported in recent years. A 2×40 Gbps wireless communication
system using a 0.14 THz band ortho-mode transducer is shown in
[1]. A fully electronic 90 Gbps one-meter wireless link at
230 GHz is presented in [2], and a hardware implementation
with parallel sequence spread spectrum (PSSS) modulation using
a 230 GHz RF frontend is reported in [3]. At a 300 GHz carrier
frequency, a 56 Gbps 16-QAM wireless link is demonstrated in
[4]. In all of these works, the transmitted signals are
generated by an arbitrary waveform generator (AWG) or signal
generator, and the received data are analyzed by a digital
analyzer or high-performance oscilloscope. None of them
includes a baseband platform, connected to the IF module
and/or THz frontend, that exchanges data with the medium
access control (MAC) layer to form a complete communication
system.
Some experimental systems do connect a baseband platform, in
either off-line or real-time mode. In [5], a fixed wireless
link at a 240 GHz carrier frequency is presented with a
sampling rate of 64 giga-samples per second (Gsps). Because of
the high sampling rate, the signal processing has to be
performed off-line. A 300 GHz CMOS transceiver for THz
wireless communication that achieves 20 Gbps with a lower
sampling rate is described in [6]. A real-time wireless
communication system at 0.14 THz, with a throughput of 5 Gbps
and a bandwidth of 1.8 GHz, is presented in [7].
However, none of the above systems achieves a high data rate
and real-time signal processing at the same time. In a
real-time system, the FPGA is the most important part of the
baseband platform: it implements the DSP modules, such as the
encoder/decoder, modulation/demodulation and channel
estimation. There are two typical ways to implement DSP
modules on an FPGA or application-specific integrated circuit
(ASIC): a high-level synthesis (HLS) language, or a low-level
language such as a hardware description language (HDL). HLS
has attracted interest for system implementation in recent
years, and different applications using HLS are shown in [8]
and [9]. In both works, the usage of typical resources such as
multipliers, block RAMs (BRAMs) and look-up tables (LUTs) is
less than 43%, and the clock speed reaches only 200 MHz. To
achieve data rates of tens or hundreds of Gbps in wideband
wireless communication systems, a high-speed system clock of
more than 300 MHz is necessary to reduce the resource usage,
because a faster clock needs fewer parallel processing paths
for the same sample rate. Otherwise, it is very difficult to
implement all DSP modules in one or two affordable FPGA
devices. With low-level languages, a large number of
implementations are available for typical communication
modules, such as low-density parity-check (LDPC) or Turbo
decoders, channel estimation and synchronization. However,
few systems implement all physical layer DSP modules for
wideband communication links.
In this paper, a real-time digital modem implementation for a
high-speed THz wireless communication system is presented.
The THz system consists of a digital baseband platform, an IF
module and a commercial RF frontend, whose architectures are
described. Focusing on the baseband platform, all modules of
the transmitter and receiver of the digital modem are
presented, together with some effective strategies that reduce
the resource usage and improve the timing performance. With
these strategies, it becomes possible to implement a wideband
wireless communication system that handles hundreds of Gbps of
raw data to and from the DACs and ADCs without guard
intervals. The total FPGA resource usage for all DSP modules
is provided. The performance of the baseband platform, IF
module and RF frontend is evaluated, and real-time
experimental results are given. A comparison with recently
published THz systems that include baseband processing modules
is also made.
The rest of this paper is organized as follows. In Section II,
the architectures of the proposed THz communication system and
of its baseband platform, IF module and RF frontend are
presented. In Section III, some important strategies for FPGA
signal processing implementation are described in detail, and
the resource usage for all DSP modules is provided. The test
setup and experimental results are shown in Section IV.
Finally, Section V concludes this paper.
II. SYSTEM DESCRIPTION
The high-speed wideband THz communication system presented in
this paper consists of a baseband platform, an IF module and
an RF frontend. Fig. 1 shows the system architecture with
these three parts. The baseband platform is composed of two
DSP units which process 20 Gbps of information data from the
MAC layer. It transmits and receives four baseband signals to
and from the IF module through high-speed digital-to-analog
converters (DACs) and analog-to-digital converters (ADCs)
sampling at 2.5 Gsps. The four bandpass signals are located in
the upper and lower sidebands of the 15.65 GHz IF carrier.
The IF signal is up-converted to, and down-converted from, the
325.25 GHz carrier frequency by the 12th harmonic of a
25.8 GHz local oscillator (LO). The frequency conversion from
baseband to the THz band is shown in Fig. 2.
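The frequency plan can be checked with a few lines of
arithmetic. The sketch below is illustrative only: it assumes
that the RF carrier is the sum of the 12th LO harmonic and the
IF carrier (the text gives the three frequencies but does not
state which mixing product is used), and the function name is
hypothetical.

```python
from fractions import Fraction

# Values quoted in the text, held as exact fractions of a GHz.
LO_GHZ = Fraction("25.8")    # local oscillator
HARMONIC = 12                # LO harmonic used for conversion
IF_GHZ = Fraction("15.65")   # IF carrier


def rf_carrier_ghz(lo=LO_GHZ, harmonic=HARMONIC, if_carrier=IF_GHZ):
    """RF carrier under the ASSUMPTION of sum mixing: harmonic * LO + IF."""
    return harmonic * lo + if_carrier


print(float(rf_carrier_ghz()))  # 325.25
```

Under this assumption the result matches the 325.25 GHz carrier
quoted above.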
A. Architecture of Baseband Platform
At the transmitter, the typical DSP modules are the LDPC
encoder, modulation, the Tx filter for sample rate conversion
(SRC) and pulse shaping, and the DAC interface. The encoded
bits are mapped onto data symbols using 16QAM, and the data
symbol rate is 1.875 Gsps for each baseband signal. The total
data rate of the four baseband signals is therefore
1.875×4×4 = 30 Gbps. The roll-off factor of the Tx filter is
1/3, so the bandwidth of the modulated signal is
1.875×(1+1/3) = 2.5 GHz.
Fig. 2. Frequency conversion from baseband to RF.
After the DSP modules, two DACs are used for each in-phase and
quadrature (I/Q) modulated baseband signal. The signal
processing diagram of the transmitter is shown in Fig. 3.
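The rate and bandwidth figures quoted for the transmitter
follow directly from the symbol rate, the modulation order and
the roll-off factor. The short sketch below reproduces that
arithmetic; the constant and function names are hypothetical
and only the numeric values come from the text.

```python
from fractions import Fraction

SYMBOL_RATE_GSPS = Fraction("1.875")  # per baseband signal
BITS_PER_SYMBOL = 4                   # 16QAM carries log2(16) = 4 bits
NUM_SIGNALS = 4                       # four baseband signals in total
ROLL_OFF = Fraction(1, 3)             # Tx filter roll-off factor


def total_rate_gbps():
    """Aggregate data rate over all baseband signals."""
    return SYMBOL_RATE_GSPS * BITS_PER_SYMBOL * NUM_SIGNALS


def occupied_bandwidth_ghz():
    """Bandwidth of one pulse-shaped signal: symbol rate * (1 + roll-off)."""
    return SYMBOL_RATE_GSPS * (1 + ROLL_OFF)


print(float(total_rate_gbps()))         # 30.0
print(float(occupied_bandwidth_ghz()))  # 2.5
```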
At the receiver, the modules are the ADC interface,
synchronization, channel estimation, receive filter,
demodulation and decoding. In each DSP unit, two I/Q modulated
baseband signals are captured by four ADCs sampling at
2.5 Gsps. Following frame synchronization, the captured
preamble is used for channel estimation. The coefficients of
the Rx filters are calculated from the channel estimate, and
the received symbols are recovered by the Rx filters. Data
demapping and LDPC decoding follow once the data symbols are
recovered. The signal processing diagram of the receiver is
also shown in Fig. 3.
 
[Figure: block diagram. Legible transmitter blocks: LDPC
Encoding, Modulation, SRC and Pulse Shaping. Legible receiver
blocks: ADC, Synchronization.]
Fig. 3. Transmitter (upper) and receiver (lower) signal processing diagrams.
From Fig. 3, we see that the receiver architecture is simple
and highly efficient. There is no feedback between processing
modules, and no module requires large data storage. All
modules run as parallel pipelines. With this simple structure,
all modules can be implemented in real time on affordable
hardware, and the volume and power consumption of the whole
system can be kept under control.
B. Architectures of IF Module and RF Frontend
At the transmitter, the IF module up-converts the I/Q
modulated baseband signals generated by the baseband platform
to the 15.65 GHz IF carrier. There are four channels of
baseband signals in total, each with 2.5 GHz bandwidth. Two of
the four bands are first combined to form a 5 GHz bandwidth
baseband signal, which is up-converted to the 15.65 GHz IF as
the lower sideband only. The other two bands are combined to
form another 5 GHz bandwidth baseband signal, which is
up-converted to the 15.65 GHz IF as the upper sideband only.
The lower and upper sidebands are finally combined to form a
12.5 GHz bandwidth IF signal centred at 15.65 GHz. A 15.65 GHz
pilot is also added for carrier frequency tracking at the
receiver. The combined IF signal is amplified by a 30 dB
amplifier and then up-converted to the 325.25 GHz RF signal,
as shown in Fig. 2.
At the receiver, the RF signal is down-converted to the
15.65 GHz IF signal by the 12th harmonic of the LO. Because of
the power loss between the transmitter and receiver through
the RF frontend, a 30 dB low noise amplifier (LNA) is used to
increase the power of the IF signal. The 15.65 GHz IF signal
is bandpass filtered to obtain the lower and upper sidebands
separately. Each sideband is then down-converted to a 5 GHz
bandwidth baseband signal, from which two channels of 2.5 GHz
bandwidth baseband signals are finally received by the
baseband platform. The information bits are subsequently
demodulated by the digital modem.
III. DIGITAL SIGNAL PROCESSING IMPLEMENTATION
The baseband platform adopts two Virtex7-690T devices [10],
which were introduced by Xilinx in 2010 and are widely used in
industry and academic research. All modules in Fig. 3 are
implemented in a single device, and one device handles a
15 Gbps data rate carried by two baseband signals. Given the
2.5 Gsps sampling rate of the DACs and ADCs, the number of
parallel pipelines is 8 and the system clock is
2.5 GHz / 8 = 312.5 MHz. Because all DSP modules together use
a large amount of resources, a system clock above 300 MHz
poses a significant challenge for real-time DSP. Some
effective strategies are therefore necessary to reduce the
resource usage and optimize the timing performance of the
FPGA implementation.
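The relation between converter sample rate, number of parallel
pipelines and system clock can be sketched as follows. This is
a software model for illustration, not the FPGA design: the
function names are hypothetical, and grouping consecutive
samples into one word per clock cycle is an assumption about
how the parallel datapath is fed.

```python
def system_clock_mhz(sample_rate_gsps, lanes):
    """Clock needed when `lanes` samples are processed per clock cycle."""
    return sample_rate_gsps * 1000.0 / lanes


def to_parallel_words(samples, lanes=8):
    """Group a serial sample stream into words of `lanes` samples.

    Each word models what one clock cycle of the parallel datapath
    sees. Trailing samples that do not fill a whole word are dropped.
    """
    usable = len(samples) - len(samples) % lanes
    return [samples[i:i + lanes] for i in range(0, usable, lanes)]


print(system_clock_mhz(2.5, 8))  # 312.5
```

With 8 lanes, sixteen consecutive samples become two 8-sample
words, that is, two cycles of the 312.5 MHz clock.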
A. Implementation Strategies
• Strategy 1: Wide Word-Length Addition Operations
Addition operations are essential in DSP algorithms. A large
number of additions with wide word lengths occupies many LUTs
and carry chains in the FPGA. Timing constraints also become
critical when routing these additions, especially at wide word
lengths. It is not possible to perform many operations within
one clock period, so the additions must be spread over several
clock periods to meet the timing requirements of the
high-speed system clock. Because one LUT has two outputs, our
implementation performs three additions in one clock period,
which occupies the same resources as two additions in one
clock period. At the same time, the logic between two
registers is kept simple enough for routing to meet the
high-speed clock timing requirements.
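The effect of doing more additions per clock period on
pipeline depth can be illustrated with a small software model.
The sketch below is not the authors' hardware: it assumes a
pipelined adder tree in which each register stage sums groups
of `fan_in` operands (three two-input additions combine four
operands, two additions combine three), and the function name
is hypothetical.

```python
def pipelined_sum(values, fan_in):
    """Sum `values` with a tree that adds `fan_in` operands per stage.

    Returns (total, stages), where `stages` is the number of register
    stages (clock periods) the tree needs.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(values)
    stages = 0
    while len(level) > 1:
        # One clock period: every group of `fan_in` operands is reduced.
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        stages += 1
    return (level[0] if level else 0), stages


taps = list(range(64))
print(pipelined_sum(taps, 3))  # two additions per stage
print(pipelined_sum(taps, 4))  # three additions per stage
```

In this model, 64 operands need four stages with two additions
per stage and three stages with three additions per stage, so
the wider grouping shortens the pipeline for the same result.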
• Strategy 2: Block RAM vs Distributed RAM
Processing in all modules requires a certain amount of data to
be stored in the internal memory of the FPGA. BRAM can be used
to store a small amount of data. However, the timing
requirements at the BRAM interface are difficult to meet,
especially when a high percentage of the BRAM is occupied and
a high-speed system clock is used. Another kind of memory,
distributed RAM, can therefore replace BRAM when the required
RAM is not very large. Because distributed RAM is built from
6-input LUTs, it is quite efficient for memories with short
addresses, especially RAM with 64 addresses.
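As a rough illustration of why 64-address memories map well
onto distributed RAM: a 6-input LUT has 2^6 = 64 configuration
bits, so one LUT can hold a 64×1-bit memory. The estimate below
is a simplification under stated assumptions (single-port
memory, no output multiplexers or control logic counted), and
the function name is hypothetical.

```python
import math

LUT_RAM_DEPTH = 64  # 2**6 addresses per 6-input LUT, one bit wide


def lutram_luts(depth, width):
    """Approximate LUT count for a `depth` x `width` single-port RAM.

    ASSUMPTION: each LUT stores 64 x 1 bit; the multiplexers needed
    to combine LUTs for depths above 64 are ignored.
    """
    return width * math.ceil(depth / LUT_RAM_DEPTH)


print(lutram_luts(64, 16))  # 16
print(lutram_luts(65, 16))  # 32
```

Under this model, a memory that is one address deeper than 64
already doubles the LUT count, which is why depths of exactly
64 are the most economical case.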
• Strategy 3: Resource Sharing
Some modules process data in parallel and continuously,
without any idle clock cycle, such as the LDPC encoder, Tx
filters, Rx filters and LDPC decoder. The resources they
occupy in each clock period cannot be shared. Other modules,
such as channel estimation, can be processed over a number of
clock periods, and the resources they use can be shared across
different clock intervals. For example, the fast Fourier
transform (FFT) requires a certain amount of FPGA resources;
in our design, each FFT IP core is shared five times. Some
BRAMs and distributed RAMs are also reused by different
functions. Although some logic is needed to control the shared
interfaces, these additional resources are quite small
compared with the resources saved.
• Strategy 4: Optimized Precision
High precision is needed for the digital modem to achieve
good performance. However, higher precision requires many more
resources and makes the timing requirements harder to meet.
Optimized algorithms are therefore needed to reduce the
resource usage while maintaining performance, for example in
synchronization and in the Rx filter. Synchronization consists
of coarse timing and fine timing. Coarse timing captures the
training sequences in a transmission frame. Fine timing then
calculates the exact synchronization point from the training
sequence segment captured by coarse timing. The precision of
coarse timing can therefore be reduced to use fewer resources.
Although the ADCs provide 10 bits, only 6 bits are used for
coarse timing in our design. With this reduced precision,
synchronization is still achieved and there is no adverse
effect on the performance of the whole system. In the Rx
filter, many taps contribute to each output symbol. Each
symbol requires the same number of addition operations, which
can be divided into several levels. Since each addition
contributes only a small portion of the final output, the word
length of the data can be reduced after each addition level
while the whole system still achieves satisfactory
performance. A detailed description of the Rx filtering can be
found in [11]. After algorithm optimization and
implementation, a considerable amount of resources is saved
and the timing requirements are much easier to meet.
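The idea that coarse timing tolerates reduced precision can be
illustrated with a toy example. The sketch below is not the
authors' synchronizer: it assumes that the 10-bit samples are
reduced to 6 bits by dropping the four least significant bits,
uses a Barker-13 code as a stand-in training sequence, and
locates it with a plain sliding correlation. All names and the
test frame are hypothetical.

```python
# Stand-in training sequence (ASSUMPTION: the paper does not specify one).
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def truncate(samples, in_bits=10, out_bits=6):
    """Keep the `out_bits` most significant bits of signed samples."""
    shift = in_bits - out_bits
    return [s >> shift for s in samples]  # arithmetic shift on Python ints


def coarse_peak(samples, ref=BARKER13):
    """Index at which the sliding correlation with `ref` is largest."""
    scores = [sum(r * samples[k + i] for i, r in enumerate(ref))
              for k in range(len(samples) - len(ref) + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


def make_frame(offset=20, length=64, amplitude=400):
    """Deterministic small interference plus the training sequence."""
    frame = [(n * 37) % 31 - 15 for n in range(length)]
    for i, b in enumerate(BARKER13):
        frame[offset + i] += amplitude * b
    return frame


frame = make_frame()
print(coarse_peak(frame), coarse_peak(truncate(frame)))
```

In this toy frame both the 10-bit and the 6-bit versions locate
the sequence at the same offset, which is all that coarse
timing has to deliver before fine timing takes over.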
• Strategy 5: Huge Storage Avoidance
There are four 10-bit ADCs sampling at 2.5 Gsps, which means
that the raw data throughput is 4×10×2.5 = 100 Gbps. The
internal memory of the FPGA is limited and valuable, and
adopting independent external memory would increase the system
complexity. Storing a large amount of data in any module is
very inefficient when performing DSP in real time. Avoiding
large storage is therefore necessary when designing and
implementing all DSP modules.
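To give a feel for the numbers, the sketch below reproduces the
raw throughput figure and converts it into the amount of data
that would accumulate if samples were buffered for a given
time. The function names and the 1 ms example duration are
illustrative assumptions, not values from the design.

```python
def raw_throughput_gbps(num_adcs=4, bits_per_sample=10, sample_rate_gsps=2.5):
    """Raw ADC output rate in Gbps for one FPGA."""
    return num_adcs * bits_per_sample * sample_rate_gsps


def buffer_size_kbit(duration_us, throughput_gbps):
    """Data accumulated in `duration_us` microseconds, in kilobits.

    1 Gbps sustained for 1 microsecond is exactly 1 kilobit.
    """
    return throughput_gbps * duration_us


rate = raw_throughput_gbps()
print(rate)                                 # 100.0
print(buffer_size_kbit(1000, rate) / 1000)  # 100.0 (Mbit for 1 ms)
```

Buffering even one millisecond of raw samples would thus take
on the order of 100 Mbit, which illustrates why the modules are
designed to stream data rather than store it.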
• Strategy 6: Specifying Constraints
Placing efficient constraints on some of the processing logic
is essential for meeting timing requirements, especially when
resource usage is high and/or high-speed system clocks are
used. Even after the DSP algorithms and HDL code have been
optimized, such constraints are still needed to improve timing
when routing resources across the whole device. In our
implementation, constraints are specified to optimize the
routing process and successfully meet the timing requirements.
Fig. 4 shows some essential constraints applied to the FPGA
device. Because a large number of multipliers is used,
constraints are applied to more than half of them. In this
way, it is possible to place and route a high percentage of
the multipliers, whose number is limited in an FPGA device.
The seven green columns on the left side of the device show
the constrained multipliers. In addition, nearly twenty
Physical Block (Pblock) constraints are distributed over the
whole device, each decided after several iterations through
the synthesis and implementation stages of the Vivado tool.
 
Fig. 4. Essential constraints applied to FPGA device.
B. Implementation Results
Adopting the strategies presented in Section III.A, the
resource usage for all DSP modules and the design timing
summary are shown in Fig. 5 and Fig. 6, respectively. Although
the LUT and multiplier utilizations are 66% and 63%,
respectively, the timing requirements for all logic are still
met. Because the interfaces of the ADCs, DACs, GTHs and USB
use several high-speed clocks, the utilization of global clock
buffers (BUFGs) is 62.5%. Because distributed RAM replaces
BRAM where necessary, the LUTRAM and BRAM utilizations are
about 23% and 38%, respectively. The small percentage of BRAM
in this real-time high-speed system effectively reduces the
difficulty of meeting the timing requirements of the
high-speed clocks. Meeting these requirements also needs a
large number of flip-flop (FF) registers, and the FF
utilization is 49%.
 
Fig. 5. The report of resource usage.
 
Fig. 6. The report of design timing summary.
IV. TEST SETUP AND EXPERIMENTAL RESULTS
A. Test Setup
The baseband platform is composed of two FPGA devices, each
capable of transmitting and receiving two channels of baseband
I/Q signals. The IF transmitter and receiver are connected to
the baseband platform and the RF frontend via coaxial cables.
The commercial RF Tx and Rx frontends are connected by a
waveguide. A spectrum analyser is used to monitor the received
signal power. A picture of the THz communication system and
test setup is shown in Fig. 7.
Fig. 7. THz system and test setup.
B. Experimental Results
Before testing the THz communication system, the performance
of the baseband platform is verified. Because of the baseband
I/Q modulation architecture, any difference in delay, phase or
amplitude between the DACs and ADCs introduces I/Q imbalance.
After calibration of the DACs and ADCs, the transmitted signal
is looped back through a direct DAC-to-ADC connection. The
error vector magnitude (EVM) of the resulting constellation is
4.0%. This result confirms that the signal processing performs
satisfactorily for 16QAM demodulation without the IF module
and RF frontend.
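For readers unfamiliar with the metric, the sketch below shows
one common way to compute EVM for a 16QAM constellation. It is
an assumption-laden illustration, not the measurement procedure
used here: it takes the nearest ideal point as the reference
(decision-directed), normalizes by the RMS reference power, and
expects symbols already scaled to the ±1, ±3 grid. All names
are hypothetical.

```python
import math

LEVELS = (-3, -1, 1, 3)  # per-axis levels of an unnormalized 16QAM grid


def nearest_16qam(z):
    """Ideal constellation point closest to the received symbol `z`."""
    re = min(LEVELS, key=lambda level: abs(z.real - level))
    im = min(LEVELS, key=lambda level: abs(z.imag - level))
    return complex(re, im)


def evm_percent(received):
    """RMS error vector magnitude, in percent of RMS reference power."""
    reference = [nearest_16qam(z) for z in received]
    error_power = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    reference_power = sum(abs(s) ** 2 for s in reference)
    return 100.0 * math.sqrt(error_power / reference_power)


# Every ideal point shifted by 0.1 along the in-phase axis.
shifted = [complex(i + 0.1, q) for i in LEVELS for q in LEVELS]
print(round(evm_percent(shifted), 2))  # 3.16
```

Other normalizations exist (for example, relative to the peak
constellation power), so EVM values are only comparable when
the same convention is used.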
 
Fig. 8. Four channels of IF signal.
After the performance of the baseband platform has been
verified, the IF module is integrated with it. Fig. 8 shows
the four channels of the transmitted IF signal. There is some
LO leakage in each channel. In addition, the powers of the
four channels are not balanced even when the same signal is
transmitted from the baseband platform. These practical
impairments make the task of the DSP algorithms more
difficult, so robust algorithms are essential to deal with the
impairments of the IF module. With the optimized algorithms,
the EVM of the received signal constellation in the IF module
loopback test is 10.3%.
After the IF module has been tested, the whole THz system is
connected together with the RF frontend. The performance
deteriorates slightly further: the EVM of the signal
constellation for the whole THz system is 11.6%. Fig. 9
compares the 16QAM constellations for the baseband platform
loopback via direct DAC and ADC connection, the IF module
loopback and the THz system connection. With the IF module
loopback setup, the measured BER versus Eb/N0 curve is shown
in Fig. 10. These results show that excellent performance is
achieved for this wideband system with real-time signal
processing.
(a) EVM = 4.0% (b) EVM = 10.3% (c) EVM = 11.6%
Fig. 9. Measured constellations under various loopback tests: Baseband (a),
IF (b) and RF (c).
Fig. 10. BER for the IF looped back.
Table I shows a summary of recently published THz com-
munication systems with baseband platforms implemented.
The system presented in this paper can achieve 30 Gbps
data rate and, more importantly, all the DSP modules are
implemented in real-time.
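The efficiency row of Table I is the ratio of data rate to
bandwidth. The sketch below recomputes it from the data rate
and bandwidth rows; the dictionary and function names are
hypothetical, and the numbers are copied from the table.

```python
# (data rate in Gbps, bandwidth in GHz), as listed in Table I.
TABLE_I = {
    "[5]": (64.0, 32.0),
    "[6]": (20.0, 18.4),
    "[7]": (5.0, 1.8),
    "This work": (30.0, 2.5 * 4),
}


def spectral_efficiency(rate_gbps, bandwidth_ghz):
    """Spectral efficiency in bit/s/Hz."""
    return rate_gbps / bandwidth_ghz


for name, (rate, bandwidth) in TABLE_I.items():
    print(f"{name}: {spectral_efficiency(rate, bandwidth):.3f} bit/s/Hz")
```

For [6] the quotient is about 1.087, which Table I lists as
1.08; the other entries match the table after rounding to two
decimals.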
TABLE I
SUMMARY OF RECENTLY PUBLISHED THZ COMMUNICATION SYSTEMS
Ref.                    [5]        [6]        [7]         This work
Year                    2015       2018       2017        2019
EVM                     20.4%      12%        –           11.6%
Modulation              8PSK       16QAM      16QAM       16QAM
Symbol Rate (Gbaud)     21.33      5          1.25        1.875×4
Sample Rate (Gsps)      64         –          5           2.5×4
Data Rate (Gbps)        64         20         5           30
Bandwidth (GHz)         32         18.4       1.8         2.5×4
Efficiency (bit/s/Hz)   2          1.08       2.78        3
Type                    Off-line   Off-line   Real-time   Real-time
V. CONCLUSION
In this paper, a high-speed THz system with low-complexity,
real-time digital signal processing is presented. Some
important FPGA implementation strategies are described for
achieving low resource usage while meeting timing requirements
at high-speed system clocks. The experimental results obtained
with the digital and IF hardware prototype verify the
excellent performance of the real-time THz communication
system, which achieves an EVM of 11.6%. A comparison with
recently published THz systems implemented in off-line and/or
real-time mode is also made. It is shown that, by adopting the
proposed low-complexity design and effective implementation
strategies, high-speed wideband wireless communications can be
achieved with high-performance real-time signal processing.
REFERENCES
[1] C. Lin, B. Lu, C. Wang, and Q. Wu, “A 2×40 Gbps wireless
communication system using 0.14 THz band oritho-mode transducer,” in
2015 40th International Conference on Infrared, Millimeter, and
Terahertz waves (IRMMW-THz), Aug 2015, pp. 1–2.
[2] P. Rodríguez-Vázquez, J. Grzyb, N. Sarmah, B. Heinemann, and U. R.
Pfeiffer, “Towards 100 Gbps: A fully electronic 90 Gbps one meter
wireless link at 230 GHz,” in 2018 15th European Radar Conference
(EuRAD), Sep. 2018, pp. 369–372.
[3] K. KrishneGowda, P. Rodríguez-Vázquez, A. C. Wolf, J. Grzyb, U. R.
Pfeiffer, and R. Kraemer, “100 Gbps and beyond: Hardware in the loop
experiments with PSSS modulation using 230 GHz RF frontend,” in
2018 15th Workshop on Positioning, Navigation and Communications
(WPNC), Oct 2018, pp. 1–5.
[4] K. Takano, K. Katayama, S. Amakawa, T. Yoshida, and M. Fujishima,
“56-Gbit/s 16-QAM wireless link with 300-GHz-band CMOS transmit-
ter,” in 2017 IEEE MTT-S International Microwave Symposium (IMS),
June 2017, pp. 793–796.
[5] I. Kallfass, F. Boes, T. Messinger, J. Antes, A. Inam, U. Lewark,
A. Tessmann, and R. Henneberger, “64 Gbit/s transmission over 850 m
fixed wireless link at 240 GHz carrier frequency,” Journal of Infrared,
Millimeter, and Terahertz Waves, vol. 36, no. 2, pp. 221–233, Feb
2015. [Online]. Available: https://doi.org/10.1007/s10762-014-0140-6
[6] S. Hara, K. Takano, K. Katayama, R. Dong, S. Lee, I. Watanabe,
N. Sekine, A. Kasamatsu, T. Yoshida, S. Amakawa, and M. Fujishima,
“300-GHz CMOS transceiver for terahertz wireless communication,” in
2018 Asia-Pacific Microwave Conference (APMC), Nov 2018, pp. 429–
431.
[7] Q. Wu, C. Lin, B. Lu, L. Miao, X. Hao, Z. Wang, Y. Jiang, W. Lei,
X. Den, H. Chen, J. Yao, and J. Zhang, “A 21 km 5 Gbps real
time wireless communication system at 0.14 THz,” in 2017 42nd
International Conference on Infrared, Millimeter, and Terahertz Waves
(IRMMW-THz), Aug 2017, pp. 1–2.
[8] J. Lei, Y. Li, D. Zhao, J. Xie, C.-I. Chang, L. Wu, X. Li,
J. Zhang, and W. Li, “A deep pipelined implementation of
Hyperspectral target detection algorithm on FPGA using HLS,”
Remote Sensing, vol. 10, no. 4, 2018. [Online]. Available:
http://www.mdpi.com/2072-4292/10/4/516
[9] J. Lei, L. Wu, Y. Li, W. Xie, C.-I. Chang, J. Zhang, and B. Huang, “A
novel FPGA-based architecture for fast automatic target detection in
Hyperspectral images,” Remote Sensing, vol. 11, no. 2, 2019. [Online].
Available: http://www.mdpi.com/2072-4292/11/2/146
[10] Xilinx, “FPGA Family,” 2010. [Online]. Available:
https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf
[11] H. Zhang, X. Huang, and Y. J. Guo, “Low-complexity digital modem
implementation for high-speed point-to-point wireless communications,”
in 2018 18th International Symposium on Communications and Infor-
mation Technologies (ISCIT), Sep. 2018, pp. 16–21.
