Opportunities and Challenges of Digital Signal Processing in Deeply Technology-Scaled Transceivers by Li, Chunshu et al.
Opportunities and Challenges of Digital Signal Processing
in Deeply Technology-Scaled Transceivers
Chunshu Li · Min Li · Khaled Khalaf · André Bourdoux · Marian
Verhelst · Mark Ingels · Piet Wambacq · Jan Craninckx · Liesbet Van
Der Perre · Sofie Pollin
Abstract The cost advantages and processing capabilities
of CMOS technology have kept improving in line with the
so-called Moore’s Law. Although digital circuits benefit
significantly from aggressive scaling, its value for analog
circuits is controversial. Nevertheless, analog circuits still
have to follow the scaling trend because single-chip
integration offers key commercial advantages. To achieve the best
performance/power/cost tradeoff with deeply scaled
technology nodes, there is a clear trend and paradigm
shift towards digital intensive and digitally assisted
transceivers. Successes of such transceivers have been
proven for individual transceiver components and narrow
band systems. When targeting emerging communication
standards, higher carrier frequencies, further technology
scaling and reconfigurable radios, the required signal
processing design and implementation are orders of magnitude
more challenging, but the potential gains are promising.
Based on a variety of transceiver designs implementing
emerging architectures for different
sub-6 GHz and 60 GHz communication systems, we will
highlight the key challenges and opportunities experi-
enced using 40 nm and 28 nm technology nodes.
Chunshu Li · Min Li · Khaled Khalaf · Andre Bourdoux ·
Mark Ingels · Piet Wambacq · Jan Craninckx · Liesbet Van
Der Perre · Sofie Pollin
Department of Circuits and Systems, IMEC
Kapeldreef-75, Leuven 3001, Belgium
E-mail: chunshu.li@imec.be
Chunshu Li · Marian Verhelst · Liesbet Van Der Perre · Sofie
Pollin
Department of Electrical Engineering, K.U.Leuven
Leuven 3000, Belgium
Khaled Khalaf · Piet Wambacq
Department of Electronics and Informatics, Vrije Universiteit
Brussels, Brussels 1000, Belgium
Keywords Signal processing · SDR · Transceiver ·
Moore’s law · Digital intensive · Digital assistance
1 Introduction
Wireless systems are integrating different existing and
evolving wireless access systems that complement each
other for different application areas and communica-
tion environments. To enable seamless and transpar-
ent inter-working between these different wireless access
systems, communication systems are moving towards
an era where ubiquitous connectivity and growing lev-
els of integration will be essential for most applications.
This revolution will not slow down its penetration of so-
ciety for the foreseeable future. There is on one hand
a market pull by an increasingly connected world pop-
ulation asking for vast information resources through
the ubiquitously connected devices. On the other hand,
there is a market push from a hundred-billion-dollar in-
dustry delivering all kinds of communication products
and applications. In this context, mobile devices can
represent a real bottleneck as they incorporate several
concurrent constraints (e.g., in battery life, cost, performance,
size and weight) that compromise the flexibility of future
networks. Meanwhile, as a crucial enabler of communication
technology, the cost advantages and processing capabilities of
CMOS technology have kept improving in line with the
so-called Moore’s Law. Mainstream foundries are moving toward
the 14 nm node and beyond. The digital gate density doubles when scaling to
the next technology node and the computation power
of the digital platforms is improved. Such aggressive
improvements, although only achieved after intense re-
search and development efforts in both the academia
and the industry, drive commercial digital circuits to
continuously scale further.
However, the clear benefits of digital scaling cannot be
directly projected onto traditional analog circuits. The value
of scaled technology for analog design is controversial. On
one side, parasitics are lower, and both the intrinsic time
resolution and the transistor speed increase. On the other
side, the voltage resolution is substantially decreased and
accurate modelling of transistors becomes impossible. In
addition, the spread with respect to the nominal corner is
much larger. Moreover, thermal effects, decreased reliability
and aging effects all degrade analog devices. For these
reasons, commercial radio frequency (RF) transceivers often
lag behind digital circuits in adopting leading-edge
technologies. More insight can be found in [1]. Nevertheless,
analog circuits still have to follow the scaling trend of
digital design, because single-chip integration is one of the
key elements of commercial success.
To achieve this challenging goal, instead of following
outdated analog design philosophies, digital intensive and
digitally assisted transceivers have been proposed and
experimented with over the past decade. The concepts of
digital RF and digitally assisted RF were proposed in [2][3],
and many designs following these philosophies have been
presented. Besides circuit architecture aspects, signal
processing aspects have also been considered [4]. Although
the importance of signal processing has been fully recognized,
new challenges keep arriving. For emerging communication
standards and deeply scaled technologies, the demand for
sophisticated signal processing design and implementation
grows by orders of magnitude.
In this paper, we present several concrete examples to
illustrate the challenges we recently experienced and the
optimizations we have made and are still pursuing.
Importantly, these examples include silicon-proven 40 nm and
28 nm transceivers. To give a complete picture, the selected
transceivers cover both sub-6 GHz and 60 GHz communication
systems.
The rest of this paper is structured as four sec-
tions. Section 2 will describe the trends. Section 3 will
describe the signal processing for parameter estima-
tion and calibration of reconfigurable/programmable
transceivers. Section 4 will describe challenges and op-
timizations for high performance signal processing in
signal paths for wide band signals and in emerging
communication standards. Section 5 concludes the paper and
outlines several promising directions.
2 Trends and Signal Processing Challenges
The signal processing challenges in emerging
transceivers are driven by many trends in com-
munication standardization, transceiver requirement
and technology itself.
First of all, communication systems are continu-
ously shifting toward much wider bandwidth and higher
performance requirement at the same time. Previous
designs for GSM systems worked with hundreds of
kHz bandwidth [5]. However, emerging standards target
orders of magnitudes higher bandwidth, e.g., 80 MHz
in IEEE802.11ac, 100 MHz in LTE-Advanced and
1760 MHz for IEEE802.11ad. Meanwhile, 64QAM be-
comes mandatory in the majority of high rate systems.
256QAM is mandatory in standards such as DVB-T2.
This directly translates into tougher transceiver
specifications. Both of the above aspects call for drastically
different transceiver architectures and associated signal
processing.
In addition, communication systems with higher
carrier frequencies (e.g., 60 GHz) are also emerging.
Digital intensive transceivers inherently require signal
processing modules working at sampling frequencies close to
intermediate frequencies (IF) or even RF, so such high carrier
frequencies, often combined with very large bandwidth,
substantially increase signal processing complexity. Although
digital power consumption is often not considered in previous
papers, for real-life systems it is a must-solve issue.
Moreover, it is very important to mention that
highly reconfigurable (or even software defined)
transceivers are becoming more and more popular in
various systems. The large number of wireless standards
combined with the large diversity of modes strongly mo-
tivate more flexible transceivers. Abundant flexibilities
combined with intensive signal processing (for param-
eter estimation, calibration, compensation, etc.) prove
to be crucial in emerging transceivers [6].
Last but not least, further technology scaling continuously
imposes new uncertainties on transceiver design.
Process/Voltage/Temperature (PVT) variations, device
reliability concerns and layout dependent effects are all
growing. Although transceivers
can be digital intensive or even fully digital, certain ana-
log issues of a nanoscale circuit will never disappear.
For instance, the discrete time receiver (Rx) in [7] does
suffer from several spurious clock tones, and from lim-
ited anti-alias filtering due to inaccurate clock timings
and charge leakage even with 90 nm technology, which
in principle exhibits much less “evil analog behaviors”
when compared to 40 nm, 28 nm and further. Inherent
analog non-idealities, when combined with digitization,
are often more difficult for signal processing algorithms
to estimate and compensate. On top of this, the sheer
complexity of how transceiver components interact with
each other makes this even worse.
3 Parameter Estimation and Calibration in
Reconfigurable Transceivers
Highly reconfigurable transceivers can achieve better
power/area/performance tradeoffs for a variety of dif-
ferent scenarios. However, there are often a large num-
ber of configuration bits controlling different circuit
components. In [8], even a Network on Chip (NOC)
controller has been implemented. Efficient signal pro-
cessing algorithms and implementations are crucial for
both static and dynamic estimation, calibration and
compensation of such transceivers. This is especially
challenging for emerging transceivers that have multi-
ple duplicated signal paths on chip (for MIMO com-
munication [9], beamforming [10], harmonics rejection
[11], nonlinearity cancellation [12] etc.). With deeply
scaled technology, devices in different signal paths will
differ significantly from each other, so that tuning and
compensation signal processing will be crucial. A num-
ber of extended concepts such as “self healing” [13] are
popping up for such purposes. Section 3.1 uses a spe-
cific silicon-proven transceiver to show the substantial
challenges in signal processing for parameter calibration
and compensation. In Section 3.2, we describe a signal
processing optimization scheme for on-line parameter
training to reject harmonic interference (HI) in wide
band receivers.
3.1 Low Power 60 GHz Reconfigurable Transceiver
with Beamforming
At millimeter-wave (mm-wave) frequencies, several gigahertz
of unlicensed bandwidth around 60 GHz has recently become
available across the world. This has enabled research on
mm-wave radio chips targeting several gigabits per second
communication for consumer applications. Fig.1 shows a chipset
that combines direct conversion with beamforming [10]. It is
implemented in TSMC 40 nm LP digital CMOS technology. Rx and
transmitter (Tx) are both implemented for 4 antenna paths.
With analog baseband beamforming, signal operations at 60 GHz
are kept to a minimum. Sensitivity to small layout parasitics
is lower at analog baseband than at 60 GHz, and the inevitable
parasitic interconnect capacitances arising from bringing
together the antenna paths can be easily absorbed in the
baseband filter capacitors. The Rx front-end is based on the
front-end from [14], where a 2-stage differential Low Noise
Amplifier (LNA) is preceded by an on-chip balun that provides
ElectroStatic Discharge (ESD) protection. In the Tx, the
in-phase (I)/quadrature (Q) baseband input signal is first
split over 4 antenna paths in which phase shifting is applied
as well as DC offset compensation. The upconversion mixer is
built around a super source follower which yields high
conversion gain. More details of this chip can be found in
[10].
Fig. 1 60 GHz beamforming transceiver (a): Receiver; (b):
Transmitter (40 nm LP).
Although the above chip can achieve satisfactory
communication performance with handset compatible
power consumption, signal processing requirement for
tuning this chip is overwhelmingly complicated. For the
Tx part, there are 115 configuration bits for each of the
4 Tx RF paths, and 135 configuration bits for each of
the 2 PLLs. In total the Tx requires 730 bits. For the
Rx part, there are 62 configuration bits for each of the
4 Rx RF paths, again 135 configuration bits for each of
the 2 PLLs, and 437 configuration bits for Rx baseband.
In total the Rx requires 955 bits.
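As a quick check on these counts, the totals can be recomputed from the per-block figures quoted above. The snippet is only an illustrative tally; the dictionary layout and the names in it are assumptions made for this sketch.

```python
# Configuration-bit tally for the 60 GHz chip, using the counts quoted in the text.
# Each entry maps a block name to (number of instances, bits per instance).
# The data layout and names are illustrative only.
TX_BLOCKS = {"rf_path": (4, 115), "pll": (2, 135)}
RX_BLOCKS = {"rf_path": (4, 62), "pll": (2, 135), "baseband": (1, 437)}


def total_bits(blocks):
    """Sum instances * bits-per-instance over all blocks."""
    return sum(count * bits for count, bits in blocks.values())


tx_bits = total_bits(TX_BLOCKS)
rx_bits = total_bits(RX_BLOCKS)
print(tx_bits, rx_bits, tx_bits + rx_bits)  # 730 955 1685
```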
We can see that, even for a transceiver dedicated to
only 60 GHz communications, a large amount of flexibility is
designed in to combat uncertain technology parameters, aging,
PVT variations, different RF channels and run-time scenarios.
Optimal performance/power for each chip of the above design
requires optimal tuning of the above 1685 bits, which
unfortunately has no straightforward solution. For very stable
and well known technology options, the number of configuration
bits might be cut by a factor of 2 or even 3 by sacrificing
certain tuning ranges. However, even a third of the 1685
configuration bits is still not straightforward to tune. This
imposes substantial challenges for fabrication testing and
also during run-time.
With complicated Matlab programs and interfacing
instruments, optimal calibration and testing of this chip take
hours in the lab. This is not a practical solution for
commercial applications. Due to the high cost of RF testers in
fabrication and testing facilities, a complete Built-In
Self-Test (BIST), or BIST combined with a very short RF tester
occupation, is strongly preferred. In addition, about 30% to
50% of the total configuration bits need to be frequently
tuned online to handle frequency hopping, temperature
variation, channel variation and dynamic power optimizations.
To perform both post-fabrication test and online calibration
in cost efficient ways, smart signal processing and
implementations are crucial.
3.2 Wide Band Transceivers with Harmonic Rejection
Software Defined Radio (SDR) Rxs can receive any
band of interest over a wide frequency range, and hence
require wide-band receiver design. The required flexible
down-mixing is commonly implemented using switch-
ing mixers. One big challenge of this approach is the
harmonic down-mixing problem: odd-order harmonics
caused by the switching mixer will down-mix RF in-
terfering signals present at multiples of the receive fre-
quency band to the baseband in the Rx, distorting the
desired signal [15].
Traditional designs use either multiple parallel dedicated
single-band RF filters or an RF tracking filter, as proposed
in [16] and [17], to filter out HIs; both options are bulky
and power hungry. Multi-path mixing is a promising alternative
solution to handle odd-order harmonic down-mixing [11]. In
this approach, the outputs of multiple switching mixers are
combined, each weighted with an appropriate weighting factor,
so that the aggregate local oscillator (LO) signal
approximates a sinusoid. The closer the aggregate LO signal
approximates a sinusoid, the fewer harmonics it contains. LO2
and LO3 are typically 45° and 90° shifted duplicates of LO1,
and each of them contains many odd-order harmonics. It has
been shown that an aggregate LO with an exact weight of √2 for
LO2 rejects the 3rd and 5th order HIs completely.
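The cancellation can be checked numerically from the Fourier series of a square wave. The sketch below is a minimal illustration, not the circuit of [11]: it assumes ideal 50% duty-cycle square-wave LOs, unit weights on LO1 and LO3 (the text only states the √2 weight for LO2), and the helper names are hypothetical. The last lines apply a hypothetical 2° phase error and 6% gain error to LO2 to show how quickly the ideal cancellation degrades.

```python
import numpy as np


def lo_harmonic(m, phase_deg, gain=1.0):
    """Relative complex Fourier coefficient of the m-th (odd) harmonic of a
    50% duty-cycle square-wave LO delayed by phase_deg of the fundamental."""
    return gain * np.exp(-1j * m * np.deg2rad(phase_deg)) / m


def aggregate_harmonic(m, phases_deg, weights, gains=None):
    """m-th harmonic of the weighted sum of the individual LOs."""
    gains = np.ones(len(phases_deg)) if gains is None else gains
    return sum(w * lo_harmonic(m, p, g)
               for w, p, g in zip(weights, phases_deg, gains))


def rejection_db(m, phases_deg, weights, gains=None):
    """Harmonic rejection ratio: fundamental over m-th harmonic, in dB."""
    fund = abs(aggregate_harmonic(1, phases_deg, weights, gains))
    harm = abs(aggregate_harmonic(m, phases_deg, weights, gains))
    return 20 * np.log10(fund / harm)


phases = [0.0, 45.0, 90.0]            # LO1, LO2, LO3
weights = [1.0, np.sqrt(2.0), 1.0]    # unit weights on LO1/LO3 are an assumption

# Magnitude of the 3rd, 5th and 7th harmonics of the aggregate LO.
for m in (3, 5, 7):
    print(m, abs(aggregate_harmonic(m, phases, weights)))

# Hypothetical mismatch: 2 degrees of phase error and 6% gain error on LO2 only.
hr3_mismatch = rejection_db(3, [0.0, 47.0, 90.0], weights, gains=[1.0, 1.06, 1.0])
print(round(hr3_mismatch, 1))
```

With ideal phases and weights the 3rd and 5th harmonics vanish to numerical precision while the 7th does not; with the hypothetical mismatch the 3rd-order rejection in this simple model drops to a few tens of dB.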
To reject HIs down to the transceiver noise floor, a
harmonic rejection ratio of 60 to 100 dB is normally required.
However, the achievable harmonic rejection of the multi-path
mixing solution greatly depends on the phase and amplitude
accuracy in each path. Phase and gain mismatch in practical
implementations typically limits the harmonic rejection ratio
to 30 to 40 dB [18], even in older technology nodes. Hence,
mismatch estimation and compensation need to be applied to
mitigate this problem. A digital intensive compensation scheme
has been proposed that first estimates the phase and gain
mismatch among the mixing paths, and then computes the optimal
digital recombination weights for the mixing paths. Building
on the mathematical framework presented in [15], the
coefficients for path recombination for interference
estimation can be derived from the conditions in
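Eqn.(1)
\[
\left[
\begin{array}{l}
\text{Cancelling harmonics in I path} \\
\text{Cancelling image frequencies in I path} \\
\text{Cancelling harmonics in Q path} \\
\text{Extracting image frequencies in Q path}
\end{array}
\right]
\tag{1}
\]
which leads to:
\[
\begin{bmatrix}
\sum_{n=1}^{N} S_{In} F_{LO_{nm}} \\
\Re\left( \sum_{n=1}^{N} S_{In} F_{LO_{n1}} \right) \\
\sum_{n=1}^{N} S_{Qn} F_{LO_{nm}} \\
\Im\left( \sum_{n=1}^{N} S_{Qn} F_{LO_{n1}} \right)
\end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
\begin{array}{l}
\forall m \text{ concerned} \\
\\
\forall m \text{ concerned} \\
\\
\end{array}
\tag{2}
\]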
where FLOnm denotes the complex coefficient of the mth
harmonic of the LO in the nth mixing path, which also reflects
the phase and gain mismatches. SIn and SQn are the desired
digital weighting factors for the nth mixing path for I and Q,
N is the total number of mixing paths and m represents the
order of the harmonic interferences that need to be rejected.
The above can be seen as solving a linear system under a least
squares criterion, which usually involves computationally
expensive Orthogonal Triangularization (QR) or Singular Value
Decomposition (SVD) methods.
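To make the cost of the matrix-based route concrete, the following sketch solves a simplified version of the problem with an SVD. It is not the formulation of [15]: it keeps only the harmonic-nulling rows of Eqn.(2), drops the real/imaginary image conditions, uses one complex weight per path, and models each LO as an ideal square wave with a hypothetical phase and gain error. As written, the right-hand side of Eqn.(2) is zero, so a non-trivial solution lies in the null space of the coefficient matrix, which the SVD exposes directly.

```python
import numpy as np


def lo_matrix(harmonics, phases_deg, gains):
    """Row per harmonic m, column per path n: relative coefficient of the m-th
    harmonic of a square-wave LO with the given (mismatched) phase and gain."""
    m = np.asarray(harmonics, dtype=float)[:, None]
    phi = np.deg2rad(np.asarray(phases_deg, dtype=float))[None, :]
    g = np.asarray(gains, dtype=float)[None, :]
    return g * np.exp(-1j * m * phi) / m


def nulling_weights(phases_deg, gains, reject=(3, 5)):
    """Unit-norm complex path weights that null the listed harmonics while
    keeping as much of the fundamental as possible."""
    A = lo_matrix(reject, phases_deg, gains)      # constraints: A @ s = 0
    _, sv, vh = np.linalg.svd(A)                  # full SVD, vh is N x N
    rank = int(np.sum(sv > 1e-12 * sv[0]))
    null = vh[rank:].conj().T                     # columns span the null space of A
    f1 = lo_matrix([1], phases_deg, gains)[0]     # fundamental coefficients
    s = null @ (null.conj().T @ f1.conj())        # project conj(f1) onto the null space
    return s / np.linalg.norm(s)


phases = [0.0, 47.0, 90.5, 134.0]   # hypothetical mismatched LO phases (degrees)
gains = [1.0, 1.06, 0.97, 1.02]     # hypothetical relative path gains
s = nulling_weights(phases, gains)

residual = np.abs(lo_matrix([3, 5], phases, gains) @ s)
fundamental = abs(lo_matrix([1], phases, gains)[0] @ s)
print(residual, fundamental)
```

Even this reduced problem needs a complex SVD every time the mismatch changes, which is the kind of computation an adaptive scheme tries to avoid.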
However, for practical transceivers, loop back testing
requires iterative calibration between Tx and Rx, starting
from an un-calibrated Rx and Tx. In addition, such a
calibration has to be performed very often, simply because the
path mismatches vary a lot depending on frequency range,
signal bandwidth, temperature, supply voltage, etc. When
allocating a latency budget of 1 microsecond for such
calibrations, the estimated area for the required computation
power (based on a processor) is larger than 0.5 mm² in 40 nm
GP technology. We propose a low complexity method with four
mixing paths capable of fully rejecting any single HI, which
is achieved through an adaptively optimizing HI rejection
scheme [19] and avoids the computationally intensive QR or
SVD.
Fig.2 illustrates the system framework for the proposed
harmonic rejection (HR) scheme. After low pass filtering, the
down-converted baseband signal in each path is directly
converted to digital by an A/D converter. Equidistant 45°
shifted LOs (0°, 45°, 90°, 135°) are provided to the four
paths, taking into account unavoidable phase error and gain
mismatch for each LO. Each mixing path provides a baseband
input signal containing rich distortions due to harmonic
down-mixing.
Fig. 2 System framework for the proposed HR scheme
In the digital domain, the primary input, contain-
ing the desired signal and multiple distortions, is con-
structed by a linear recombination of the input paths
[19]. To eliminate the harmonic distortions in this
primary signal, adaptive interference rejection is per-
formed. To this end, a reference input containing the
interference estimation is generated by a second lin-
ear recombination of the input signals, as shown in
Fig.2(b). The working mechanism of the adopted least
mean squares (LMS) adaptive filtering method is to
adaptively adjust the amplitude and phase of the in-
terference estimation to produce an output that is as
close a replica as possible to the distortion components
in the primary input. This output is then subtracted
from the primary input to produce the desired signal
[20].
Two single-tap filters, each multiplying its input by
a complex equalization factor (w1, w2), are implemented
in the adaptive filtering engine (AFE), as shown in
Eqn.(3). We introduce the time index k in the following
equations for better illustration.
\[
I_{in}[k] = R_{in}[k]\, w_1^{*}[k] + R_{in}^{*}[k]\, w_2^{*}[k]
\tag{3}
\]
where Rin is the interference estimation and Iin is the
filtering output, which is a phase and gain adjusted Rin
to approximate the distortion in the primary input.
Adaptive adjustment of w1, w2 is shown in Eqn.(4).
\[
\begin{aligned}
w_1[k+1] &= w_1[k] + \mu\, E_{out}^{*}[k]\, R_{in}[k] \\
w_2[k+1] &= w_2[k] + \mu\, E_{out}^{*}[k]\, R_{in}^{*}[k] \\
E_{out}[k] &= P_{in}[k] - I_{in}[k]
\end{aligned}
\tag{4}
\]
where Pin[k] is the primary input, Eout[k] is the error signal
generated at time k, which is also the system output, and µ is
the LMS step-size parameter.
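The recursion of Eqn.(3) and Eqn.(4) maps directly onto a short loop. The sketch below is a behavioural illustration only: the synthetic signals (a QPSK-like desired signal, a Gaussian reference, the path gains a and b) and the step size are hypothetical choices, not the simulation setup used for Fig.3.

```python
import numpy as np


def lms_cancel(primary, reference, mu):
    """Two-tap complex LMS canceller following Eqn.(3) and Eqn.(4).

    primary   : P_in[k], desired signal plus distortion
    reference : R_in[k], interference estimate
    Returns E_out[k] (the cleaned output) and the final weights (w1, w2).
    """
    w1 = 0j
    w2 = 0j
    e_out = np.empty(len(primary), dtype=complex)
    for k in range(len(primary)):
        r = reference[k]
        i_in = r * np.conj(w1) + np.conj(r) * np.conj(w2)   # Eqn.(3)
        e = primary[k] - i_in                                # Eqn.(4), error
        w1 = w1 + mu * np.conj(e) * r                        # Eqn.(4), w1 update
        w2 = w2 + mu * np.conj(e) * np.conj(r)               # Eqn.(4), w2 update
        e_out[k] = e
    return e_out, (w1, w2)


# Synthetic test case; every value below is a hypothetical choice for illustration.
rng = np.random.default_rng(0)
n = 20000
desired = (rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)) / np.sqrt(2)
reference = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
a, b = 30.0 * np.exp(1j * 0.7), 5.0 * np.exp(-1j * 1.9)     # unknown path gains
primary = desired + a * reference + b * np.conj(reference)  # roughly 30 dB of interference

e_out, (w1, w2) = lms_cancel(primary, reference, mu=0.005)

# Signal to interference ratio over the last 5000 samples, before and after.
tail = slice(n - 5000, n)
sir_before = 10 * np.log10(np.mean(np.abs(desired[tail]) ** 2)
                           / np.mean(np.abs(primary[tail] - desired[tail]) ** 2))
sir_after = 10 * np.log10(np.mean(np.abs(desired[tail]) ** 2)
                          / np.mean(np.abs(e_out[tail] - desired[tail]) ** 2))
print(round(sir_before, 1), round(sir_after, 1))
```

Each sample costs a handful of complex multiplications and no matrix factorization; the price is a convergence time and a residual error that both depend on the step size µ.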
Fig. 3 Simulation result of the 3rd order HI rejection. (a):
Scatter plot of SIR before and after compensation. (b): Prob-
ability of achieved HR
In this HR scheme, the computationally intensive matrix
operation is avoided and the equalization factors can be
computed online. To show the robustness of the proposed
method, the simulation model assumes an unfavorable situation
with a relatively large phase error (2°) and gain mismatch
(6%) in the analog front-end. A random 256-QAM modulated
desired signal and a 3rd order HI with input power varying
from 15 to 65 dB stronger are used as the RF input.
Fig.3 shows the simulated performance of the proposed
adaptive HR scheme for rejection of the 3rd order HI, as a
scatter plot of the signal to interference ratio (SIR) before
and after adaptive compensation and as the probability of the
achieved HR. The simulation was conducted with different input
powers of the 3rd order HI to cover the real RF scenario, and
without considering analog imperfections other than the phase
and gain imbalances. It can be seen that more than 80 dB HR
can be achieved for the RF scenario concerned, which is enough
to provide an SIR of more than 20 dB in the digital domain to
guarantee correct demodulation.
4 Very High Performance Signal Processing in
Signal Paths
Digital intensive transceiver design, as the name implies,
moves digital processing closer to the antennas. Discrete time
receivers [7] and digital transmitters [21] are typical
examples. Recently, 60 GHz systems and wide band sub-6 GHz
systems (e.g., 80 MHz IEEE 802.11ac and 100 MHz LTE-Advanced)
have become targets for such designs. A key challenge is that
signal processing needs to work at a very high sampling
frequency, which may even be close to RF. This very high
sample rate processing, combined with a substantial bit width
requirement (e.g., 10 to 11 bits for LTE), creates substantial
challenges.
Fig. 4 Block diagram of a mm-wave polar transmitter
4.1 Digital Processing for Low Power 60 GHz Polar
Transmitter
The power amplifier (PA) is usually the most power
hungry block in 60 GHz chips. Moreover, in order to overcome
signal losses at 60 GHz, phased arrays are often employed, and
at least the front-ends have to be replicated once per antenna
path. This increases the PA share in the total chip power
consumption. Different applications benefit from improving the
PA power efficiency, such as high datarate short-range
portable applications that require minimal power consumption
for longer battery lifetime, and high datarate backhaul
systems that transmit with high output powers for longer range
communication. Most 60 GHz PAs operate in class-A linear mode
[10][22][23] due to the use of variable envelope modulations
that are required for high datarates and high spectral
efficiency. This causes the PA to work at power efficiency
values of less than 5%, although values up to 30% could be
achieved [10].
In order to improve the PA power efficiency, the PA
needs to work in its nonlinear region to utilize the peak
efficiency. The polar architecture is one interesting solu-
tion that allows the PA to operate in saturation with-
out the need for duplicating the signal path or using
power combiners. As shown in Fig. 4, the phase sig-
nal goes to the PA, while the amplitude is extracted
and applied to the PA through a separate modulation
path. Polar conversion can be done with the digital sig-
nal processing to avoid the need of an RF limiter that
can introduce extra nonlinearity and bandwidth limi-
tations. The amplitude signal can then digitally modu-
late an RF digital-to-analog converter (DAC) working
as a variable-size PA. This eliminates the need to have
an additional RF amplitude detection circuit and also
avoids modulating the supply.
The non-linear transformation from rectangular signals
to polar signals broadens the spectrum. Fig.5 (a) depicts the
power spectral density (PSD) of the rectangular signal, which
is compliant with the spectrum mask of IEEE 802.11ad [24].
After non-linear conversion to polar signals, the spectrum of
the converted signal greatly expands, as shown in Fig.5 (b).
To avoid spectrum overlap due to this expansion, the
rectangular signal first needs to be upsampled and digitally
filtered before conversion to polar signals. The first
residual image after oversampling appears at an offset equal
to the sampling frequency. For a symbol rate of 1760 MS/s
(according to the IEEE 802.11ad standard), an oversampling
ratio (OSR) of at least 6 is normally required to keep the
first residual image out of the RF band of the 802.11ad
standard, which spans from 57 GHz to 66 GHz.
Fig. 5 Signal spectrum expansion due to rectangular to polar
transformation
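One simple reading of this requirement is that the sampling frequency must exceed the 9 GHz width of the band, so that an image one sampling frequency away from any in-band carrier falls outside it. Under that assumption (which is an interpretation, not a statement from the standard), the minimum OSR follows from a one-line calculation; the function name is hypothetical.

```python
import math

SYMBOL_RATE_MSPS = 1760.0         # IEEE 802.11ad symbol rate quoted in the text
BAND_MHZ = (57000.0, 66000.0)     # RF band quoted in the text


def min_osr(symbol_rate_msps, band_mhz):
    """Smallest integer OSR whose sampling frequency exceeds the band width.

    Assumption: the first image sits one sampling frequency away from the
    carrier, so it cannot fall inside the band if fs is wider than the band.
    """
    band_width = band_mhz[1] - band_mhz[0]
    return math.floor(band_width / symbol_rate_msps) + 1


osr = min_osr(SYMBOL_RATE_MSPS, BAND_MHZ)
print(osr, osr * SYMBOL_RATE_MSPS)  # 6 10560.0
```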
To better understand the implementation complexity of the
signal processing involved in the rectangular to polar
conversion, the quantization accuracies of the rectangular
signal and the converted polar signal are analyzed here. A
complete 802.11ad transmission system with 16QAM modulation
was modeled in Matlab. The error vector magnitude (EVM)
results shown in Fig. 6 (a) indicate a notable improvement
when increasing the accuracy of the transmitted rectangular
signal (denoted IQ in the figure) from 6 bits to 7 bits, but
only minor improvement with a further increase. Fig. 6 (b)
then depicts the EVM results with a fixed 7-bit rectangular
signal and for multiple resolutions of the polar signal. Since
the allowable constellation error should not be worse than
−21 dB for 16QAM modulation in the 802.11ad standard, a 7-bit
phase signal and a 5-bit amplitude signal are chosen to obtain
−31 dB EVM performance, which leaves a 10 dB design margin.
Note that although there are multiple choices of quantization
accuracies that achieve −31 dB EVM, the one with the minimum
number of amplitude bits is chosen to ease the layout when
routing the digital amplitude bit-wires to the PA. Fig. 7
shows the PSD of the output signal with 7, 5 and 7 bits for
the rectangular signal, converted amplitude signal and phase
signal respectively, which is compliant with the spectrum
mask.
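The effect of the chosen word lengths can be explored with a much reduced model. The sketch below is not the Matlab system model used for Fig. 6: it works at symbol rate on random 16QAM symbols, omits upsampling, filtering and every other part of the 802.11ad chain, and uses plain uniform quantizers, so its EVM values are not expected to match the figure. All function names are hypothetical.

```python
import numpy as np


def midrise(x, step):
    """Uniform mid-rise quantizer: map x to the centre of its quantization cell."""
    return (np.floor(x / step) + 0.5) * step


def polar_evm_db(symbols, iq_bits, amp_bits, phase_bits):
    """EVM (dB) after I/Q quantization, polar conversion and polar quantization."""
    # Quantize I and Q over [-fs, +fs] with 2**iq_bits levels.
    fs_iq = np.max(np.abs(np.concatenate([symbols.real, symbols.imag])))
    step_iq = 2.0 * fs_iq / 2 ** iq_bits
    lim_iq = fs_iq - step_iq / 2.0
    i_q = np.clip(midrise(symbols.real, step_iq), -lim_iq, lim_iq)
    q_q = np.clip(midrise(symbols.imag, step_iq), -lim_iq, lim_iq)
    iq = i_q + 1j * q_q

    # Rectangular to polar conversion, then quantize amplitude and phase.
    amp_fs = np.max(np.abs(iq))
    step_amp = amp_fs / 2 ** amp_bits
    amp = np.clip(midrise(np.abs(iq), step_amp), 0.0, amp_fs - step_amp / 2.0)
    step_phase = 2.0 * np.pi / 2 ** phase_bits
    phase = midrise(np.angle(iq), step_phase)   # wraps naturally through exp()

    rebuilt = amp * np.exp(1j * phase)
    err = np.mean(np.abs(rebuilt - symbols) ** 2) / np.mean(np.abs(symbols) ** 2)
    return 10.0 * np.log10(err)


rng = np.random.default_rng(1)
levels = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)   # unit-power 16QAM
symbols = rng.choice(levels, 4096) + 1j * rng.choice(levels, 4096)
evm_db = polar_evm_db(symbols, iq_bits=7, amp_bits=5, phase_bits=7)
print(round(evm_db, 1))
```

Sweeping `amp_bits` and `phase_bits` in such a model gives a first feel for the trade-off between amplitude and phase resolution before committing to a full system simulation.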
Table 1 Power Consumption Budget
Scenario          Psat      Pout per FE        PA Pdc       PA PAE @Pout  Total Pout    FE Pdc   Total Pdc  Total eff.
Back-off          14 dBm    5.8 dBm (P5dB)     78 mW        4.9%          17.2 dBm (1)  110 mW   724 mW     7.25%
Polar @same PA    14 dBm    9 dBm (Psat, avg)  43.7 mW (2)  18.2%         20.4 dBm      73.7 mW  578.8 mW   18.94%
Polar @same Pout  10.8 dBm  5.8 dBm            20.9 mW (3)  18.2%         17.2 dBm      50.9 mW  487.6 mW   10.76%
(1) A measured value of 11.4 dB is considered for the 4-antenna paths.
(2) The 5 dB PAPR corresponds to RFDAC size of 0.56× the full size.
(3) Assuming the same PAE@Psat of 32.2%.
Fig. 6 (a): Output EVM results in terms of quantization accuracies of rectangular and polar signals; (b): Output EVM results
in terms of quantization accuracies of polar signals with 7-bit rectangular signals
Fig. 7 Output signal spectrum
In this architecture, the DSP needs to operate at very high
speeds, which can create a bottleneck in the power budget of
the polar solution. The following power consumption
calculations can be used to estimate the power budget for the
DSP. Taking the phased array chip of [10] as a reference,
Fig. 1 (b) shows the top-level Tx architecture with 4 antenna
paths. If the same chip is used in polar mode, the PA will be
replaced by a variable size RFDAC for amplitude modulation.
Table 1 compares the output power, power consumption and
efficiency values for a chip used in polar and non-polar
modes. With a 5 dB Peak-to-Average Power Ratio (PAPR), the
linear PA operates at 5 dB back-off from the 1 dB compression
point of 10.8 dBm, which gives a PA efficiency of 4.9%
compared to the maximum value of 32.2% in saturation. If the
same chip is used in polar mode, the RFDAC input includes only
phase information and is allowed to operate in the saturation
region. With a PAPR of 5 dB, the amplitude will modulate the
RFDAC such that the average output power is 5 dB less than the
peak saturated 14 dBm (see Fig.8). This causes the average
RFDAC size to be 10^(−5/20) = 0.56× the full size, and the
power consumption to reduce by the same factor. The PA
operating efficiency is then 18.2% in the polar mode compared
to 4.9% in the linear mode. The polar-mode total Tx output
power is about 3 dB higher than in the linear mode, and the
total Tx efficiency rises to 18.94% compared to 7.25% in the
linear mode. A fair evaluation of the power consumption
advantage of the polar mode in this chip should include the
same analysis at the same Tx output power. Assuming the same
peak saturated efficiency, the total Tx power consumption in
the polar mode reduces to 487.6 mW compared to 724 mW at the
same output power. In order for the DSP to have a minor
influence on the total power budget, a value of 10% of the
total Tx power consumption should be considered. This leads to
an average budget of about 50 mW for the extra digital
processing required for polar operation.
Fig. 8 RFDAC power characteristics with different sizes
showing the linear and polar operating modes of Table 1
Fig. 9 A typical architecture of digital RF transmitter
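The efficiency figures of Table 1 and the resulting DSP allowance can be re-derived from the quoted power levels with a few dBm conversions. The snippet below is only a worked check of that arithmetic; the scenario keys and variable names are illustrative.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


# (total Tx output power in dBm, total DC power in mW), values taken from Table 1.
SCENARIOS = {
    "back_off":        (17.2, 724.0),
    "polar_same_pa":   (20.4, 578.8),
    "polar_same_pout": (17.2, 487.6),
}


def total_efficiency_percent(p_out_dbm, p_dc_mw):
    """Total Tx efficiency: radiated power over DC power, in percent."""
    return 100.0 * dbm_to_mw(p_out_dbm) / p_dc_mw


efficiency = {name: total_efficiency_percent(*v) for name, v in SCENARIOS.items()}

# Average RFDAC size for a 5 dB PAPR, as a fraction of the full size.
rfdac_scale = 10.0 ** (-5.0 / 20.0)

# DSP allowance: 10% of the polar-mode Tx power at equal output power.
dsp_budget_mw = 0.10 * SCENARIOS["polar_same_pout"][1]

for name, eff in efficiency.items():
    print(name, round(eff, 2))
print(round(rfdac_scale, 2), round(dsp_budget_mw, 1))
```

The 10% rule gives a little under 49 mW, which the text rounds to 50 mW.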
The above analysis puts a challenging task on the
optimization of algorithms and design techniques of the
additional signal processing circuitry. The 50 mW budget for
signal processing power consumption needs to cover an I/Q to
phase/amplitude transformation running at 10560 Msps
(1760 × 6). Aggressive algorithm and circuit level
optimizations are being explored to achieve this target.
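Dividing the power budget by the sample rate turns this target into an energy allowance per converted sample, which is often the more useful number when comparing candidate algorithms. The calculation below uses only the figures quoted above; the variable names are illustrative.

```python
DSP_BUDGET_W = 50e-3        # power budget for the polar-conversion DSP
SYMBOL_RATE_SPS = 1760e6    # IEEE 802.11ad symbol rate
OSR = 6                     # oversampling ratio

sample_rate_sps = SYMBOL_RATE_SPS * OSR
energy_per_sample_pj = DSP_BUDGET_W / sample_rate_sps * 1e12
print(sample_rate_sps / 1e6, round(energy_per_sample_pj, 2))  # 10560.0 4.73
```

Under these numbers, each I/Q to phase/amplitude conversion, including any filtering charged to the same budget, has to fit in roughly 4.7 pJ.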
4.2 Digital Correction for Sub-6 GHz Quadrature
Digital RF Transmitter
A fully flexible digital RF transmitter (DTX), which
explores intensive digital implementation of radio functions,
has been a hot topic in both academia and industry in recent
years. With a DTX, retargeting system requirements can be
achieved by reprogramming the digital circuits rather than by
analog redesign, which is usually costly and time consuming. A
typical high level architecture of a DTX is shown in Fig.9,
which features digital mixing and hence requires an
ultra-wideband DAC. The current-steering (CS) architecture is
widely adopted in wideband DAC design for its advantages in
speed and accuracy. A popular segmented architecture is shown
in Fig.10. Generally, binary scaled current sources are
steered by the least significant bits (LSBs) and an array of
unary current sources is steered by the thermometer-decoded
most significant bits (MSBs).
Fig. 10 Segmented architecture of CS RFDAC
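The segmentation described above can be illustrated with a small behavioural model. This is a sketch of the general thermometer/binary split, not of the specific RFDAC in Fig.10; the function names, the bit ordering and the choice of segmentation point are assumptions.

```python
def segment_code(code, n_bits, n_msb):
    """Split a DAC input code into thermometer-coded MSBs and binary LSBs."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    n_lsb = n_bits - n_msb
    msb_value = code >> n_lsb
    lsb_value = code & ((1 << n_lsb) - 1)
    n_unary = 2 ** n_msb - 1
    # Thermometer code: the first msb_value unary cells are switched on.
    thermometer = [1 if i < msb_value else 0 for i in range(n_unary)]
    # Binary part: index 0 is the least significant bit.
    binary = [(lsb_value >> b) & 1 for b in range(n_lsb)]
    return thermometer, binary


def output_current(code, n_bits, n_msb, i_lsb=1.0):
    """Ideal output current: unary cells weigh 2**n_lsb LSBs, binary cells 2**b."""
    thermometer, binary = segment_code(code, n_bits, n_msb)
    n_lsb = n_bits - n_msb
    i_unary = (2 ** n_lsb) * i_lsb * sum(thermometer)
    i_binary = sum(bit * (2 ** b) * i_lsb for b, bit in enumerate(binary))
    return i_unary + i_binary


thermo, binary = segment_code(37, n_bits=6, n_msb=3)
print(thermo, binary, output_current(37, n_bits=6, n_msb=3))
```

In the ideal case the output equals the input code times the LSB current for every code; the error sources listed next are what break this identity in silicon.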
A big advantage of DTX over the alternative that
consists of multiple single-band transmitters to enable
a multi-mode transmitter is the significantly reduced
size and power consumption. This holds especially when
considering the continuously scaling technology. How-
ever, with the decreasing dimension of transistors to
nano-scale level, the design of this DTX becomes more
and more challenging. Problems pinpointed as perfor-
mance limitations are listed below:
- random errors: Random amplitude errors exist in
the current sources of the CS DAC. These random
errors are caused by manufacturing process variations and
are determined by the dimensions of the current source.
Increasing the active area of each current source is
the most effective method for reducing the random
errors. However, in a high-accuracy DAC, this
approach results in large arrays, which may
in turn lead to significant gradient and systematic errors
[25].
- gradient errors in CS DAC: Gradient errors are
significant in CS DACs with more than 10-bit linearity.
The main sources of gradient errors are the modulated
output impedance and gradient amplitude errors in the
current sources, which are due to voltage drops in the supply lines and
technology-related gradients (e.g., doping, oxide
thickness) [26].
- systematic errors in digital mixing: The mix-
ing function of LO and transmitted signal is shifted
from analog domain to digital domain in the DTX. As
shown in Fig.10, square wave LO signal is generated
by digital circuits and routed to digital mixing block
to up-convert the transmitted signal. Phase and duty
cycle mismatch in the generated I and Q LOs directly
cause quadrature modulation errors in the transmitted
signal. Furthermore, the introduced distortion can
depend on the inner product of the transmitted I and Q
signals [27], which makes existing solutions for IQ
imbalance correction no longer applicable. In addition,
the current cells in the CS DAC are switched by each
specific bit of the I and Q signals. The delay mismatch
among the bit-wires from the digital mixing block to the
RFDAC block leads to I and Q interference errors that
differ significantly from the traditionally studied
I/Q phase/gain imbalance problem. In fact, the
interference occurs between bits, not only between the I and Q
signals, which poses a big challenge to designers
trying to correct this problem.
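For the random errors in the list above, the dependence on device dimensions is commonly described by an area-scaling mismatch model of the Pelgrom type. The relation below is this standard textbook model, not a result taken from [25] or from the test chip, and the symbols $W$, $L$ and $\sigma$ are introduced here only for illustration.

```latex
% Relative current mismatch of a unit current source with gate area W x L
\sigma\!\left(\frac{\Delta I}{I}\right) \propto \frac{1}{\sqrt{W L}}

% Consequence of the proportionality: halving the mismatch needs 4x the area
\frac{\sigma_2}{\sigma_1} = \frac{1}{2}
\quad\Longrightarrow\quad
\frac{(W L)_2}{(W L)_1} = 4
```

Under this assumption, accuracy gained through area alone quickly leads to the large arrays mentioned in the first item, which is where gradient and systematic errors start to matter.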
Besides the common “AM-AM” and “AM-PM” errors
extensively discussed for non-linear PAs, significant
“PM-AM” and “PM-PM” errors are also introduced
by delay mismatch and duty cycle mismatch
[27]. The “PM-AM” and “PM-PM” errors from the
bit-grained I and Q interference increase the compensation
complexity exponentially, since they require I/Q
co-addressing based correction rather than the traditional
amplitude-addressing scheme widely adopted in the
compensation of non-linear PAs. In a recent ISSCC
paper on DTX, this technique is formulated as a “2-D
lookup table” [28]. Importantly, the signal processing
block that compensates the above non-idealities sits
between the filtering block and the digital mixing block.
Due to the upsampling with high OSR, this signal processing
block usually operates at sampling rates in the
range of 800 Msps to 1600 Msps.
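To make the difference between amplitude addressing and I/Q co-addressing concrete, the sketch below compares how the two schemes index a correction table. It is a simplified model under stated assumptions: I and Q are treated as unsigned codes of `bits` bits each, and the helper names `lut_1d_index`, `lut_2d_index` and `lut_sizes` are hypothetical; none of this is taken from the implementation in [28].

```python
import math


def lut_1d_index(i_code, q_code, bits):
    """Amplitude addressing: the index depends only on |I + jQ|."""
    max_mag = math.hypot(2 ** bits - 1, 2 ** bits - 1)
    mag = math.hypot(i_code, q_code)
    return min(int(mag / max_mag * (2 ** bits)), 2 ** bits - 1)


def lut_2d_index(i_code, q_code, bits):
    """I/Q co-addressing: every (I, Q) pair gets its own entry."""
    return (i_code << bits) | q_code


def lut_sizes(bits):
    """Number of entries for the 1-D and the 2-D table."""
    return 2 ** bits, 2 ** (2 * bits)


# Two inputs with equal amplitude but different phase.
print(lut_1d_index(3, 4, 4), lut_1d_index(4, 3, 4))  # same 1-D entry
print(lut_2d_index(3, 4, 4), lut_2d_index(4, 3, 4))  # distinct 2-D entries
print(lut_sizes(8))
```

The two example inputs share one amplitude-addressed entry, so a phase-dependent ("PM-AM" or "PM-PM") error cannot be corrected differently for them, whereas co-addressing separates them at the cost of squaring the table size.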
The philosophy that we propose is similar to [28],
but with much finer grain, and is based on multi-dimensional
polynomial approximations. The originally
transmitted signal ($BB_I$ and $BB_Q$) is pre-distorted
to another value ($BB'_I$ and $BB'_Q$) which corresponds to
an output $OUT'$ with minimum root mean square
(RMS) error with respect to $BB_I + jBB_Q$. The pre-distortion
operation is conducted by a polynomial transformation.
The default polynomial parameters are calculated
from a priori knowledge of the distortion measured
on a test chip implemented in 28 nm CMOS technology.
The polynomial parameters are updated
at predetermined time instances, such as system
setup or an LO frequency switch. To reduce the time
and computational complexity needed for an update, this
procedure is conducted block by block. As shown in
Fig.11, the pool of all possible input data is segmented
into blocks. Each block has its own transforming polynomial
function, whose parameters are updated at one
time instance.
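A minimal sketch of such a segmentation is given below. It assumes a uniform square grid over unsigned I and Q codes, so that the number of blocks is `blocks_per_axis ** 2`; the actual partitioning in Fig.11 may differ, and the function name `block_index` and the 8-bit word length are hypothetical.

```python
def block_index(bb_i, bb_q, blocks_per_axis, bits):
    """Return the index of the block that contains the code pair (bb_i, bb_q).

    Assumes unsigned codes in [0, 2**bits) and a uniform square grid.
    """
    full_scale = 2 ** bits
    if not (0 <= bb_i < full_scale and 0 <= bb_q < full_scale):
        raise ValueError("code out of range")
    row = bb_i * blocks_per_axis // full_scale
    col = bb_q * blocks_per_axis // full_scale
    return row * blocks_per_axis + col


# 6 x 6 grid, i.e. 36 blocks, on hypothetical 8-bit I and Q codes.
print(block_index(0, 0, 6, 8))
print(block_index(255, 255, 6, 8))
```

At run time the returned index would select the polynomial parameters of the corresponding block, so only one small parameter set is needed per input sample.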
The off-line update of each block’s transforming
polynomial function consists of three steps. Firstly, all
the test data within the block are sent out, detected
at the Tx output and fed back to baseband.
Secondly, the mapping relation between $BB_I + jBB_Q$ and
$BB'_I + jBB'_Q$ is created. The bigger the block, the
more candidates of $BB'_I + jBB'_Q$ need to be checked to
find the one with minimum output RMS error with respect
to $BB_I + jBB_Q$. Thirdly, after the mapping relation is
created for each possible $BB_I + jBB_Q$ within one block,
a polynomial function is derived by curve fitting the
mapping. Eqn.(5) shows how a $T$th-order polynomial
function is constructed, where $P_{I(Q)(i,q)}$ is the
polynomial parameter for the term $BB_I^i \times BB_Q^q$.
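\[
\begin{aligned}
BB'_I(t) &= \sum_{i=0,\,q=0}^{i+q \le T} P_{I(i,q)} \times BB_I^i \times BB_Q^q \\
BB'_Q(t) &= \sum_{i=0,\,q=0}^{i+q \le T} P_{Q(i,q)} \times BB_I^i \times BB_Q^q
\end{aligned}
\tag{5}
\]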
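The polynomial transformation of Eqn.(5) can be sketched in a few lines of Python. The ordering of the $(i, q)$ exponent pairs, the function names `exponent_pairs` and `predistort`, and the example coefficients are assumptions made for illustration; the source does not specify a coefficient ordering.

```python
def exponent_pairs(order):
    """All (i, q) with i >= 0, q >= 0 and i + q <= order, in a fixed sequence."""
    return [(i, q) for i in range(order + 1) for q in range(order + 1 - i)]


def predistort(bb_i, bb_q, coeff_i, coeff_q, order):
    """Evaluate Eqn.(5) for one sample; coeff_* follow exponent_pairs(order)."""
    terms = [(bb_i ** i) * (bb_q ** q) for i, q in exponent_pairs(order)]
    out_i = sum(p * t for p, t in zip(coeff_i, terms))
    out_q = sum(p * t for p, t in zip(coeff_q, terms))
    return out_i, out_q


# First-order example: pairs are (0,0), (0,1), (1,0), i.e. terms 1, BB_Q, BB_I.
# These coefficients pass the input through unchanged.
identity_i = [0.0, 0.0, 1.0]
identity_q = [0.0, 1.0, 0.0]
print(predistort(0.5, -0.25, identity_i, identity_q, 1))
```

The length of `exponent_pairs(T)` is the number of parameters per output, which grows as $(T+1)(T+2)/2$.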
The method of least squares is used to find the opti-
mal polynomial parameters based on the created map-
ping relations. Assuming N digital inputs (N mapping
relations) exist in one block, a matrix equation with
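polynomial parameters can then be built as follows:
\[
\underbrace{\begin{bmatrix}
BB'_I(1) \\
BB'_I(2) \\
\vdots \\
BB'_I(N)
\end{bmatrix}}_{b}
=
\underbrace{\begin{bmatrix}
BB_I^i(1)\, BB_Q^q(1) \quad \forall\, i+q \le T \\
\vdots \\
BB_I^i(N)\, BB_Q^q(N) \quad \forall\, i+q \le T
\end{bmatrix}}_{A}
\times
\underbrace{\begin{bmatrix}
P_{I(i,q)} \\
\forall\, i+q \le T
\end{bmatrix}}_{x}
\tag{6}
\]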
The polynomial parameters can then be calculated by
\[
x = (A^T A)^{-1} A^T b. \tag{7}
\]
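The least-squares step of Eqns.(6) and (7) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it solves the normal equations $(A^T A)x = A^T b$ by Gaussian elimination instead of forming the inverse explicitly, and the sample grid, the coefficient ordering and all function names are hypothetical.

```python
def exponent_pairs(order):
    """All (i, q) with i >= 0, q >= 0 and i + q <= order."""
    return [(i, q) for i in range(order + 1) for q in range(order + 1 - i)]


def design_matrix(samples, order):
    """Matrix A of Eqn.(6): one row per (bb_i, bb_q) sample, one column per term."""
    pairs = exponent_pairs(order)
    return [[(bi ** i) * (bq ** q) for i, q in pairs] for bi, bq in samples]


def solve_linear(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def fit_block(samples, targets, order):
    """Least-squares parameters of Eqn.(7) for one block and one output (I or Q)."""
    a = design_matrix(samples, order)
    m = len(a[0])
    ata = [[sum(row[j] * row[k] for row in a) for k in range(m)] for j in range(m)]
    atb = [sum(row[j] * t for row, t in zip(a, targets)) for j in range(m)]
    return solve_linear(ata, atb)


# Hypothetical block: a 9 x 9 grid of normalized inputs and a made-up
# second-order mapping, used only to check that the fit recovers it.
samples = [(i / 4.0, q / 4.0) for i in range(-4, 5) for q in range(-4, 5)]
true_coeffs = [0.01, -0.05, 0.0, 1.02, 0.03, -0.02]
targets = [sum(p * t for p, t in zip(true_coeffs, row))
           for row in design_matrix(samples, 2)]
fitted = fit_block(samples, targets, 2)
print([round(v, 6) for v in fitted])
```

For poorly conditioned blocks a QR-based solver is usually numerically safer than the normal equations; the form above is kept because it mirrors Eqn.(7) directly.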
To give a clearer perspective on the design choices,
Tab.2 summarizes the achieved root mean square (RMS)
error reduction for different design options, i.e. number
of segmented blocks and polynomial order.
Without any segmentation, as expected, the RMS
error reduction is generally much worse than that of the segmented
options. For the segmented options, it can be seen that
the best-performing option uses 36 segmented
blocks, which achieves up to 22.0 dB (I) and 19.7 dB (Q)
RMS error reduction with a third-order polynomial.
Performance becomes worse with even more blocks
due to edge effects. With a large number of blocks, the
needed polynomial order can be reduced, and so can the
computational load. When the number of blocks is more
than 36, only a minor performance improvement can be achieved
by increasing the polynomial order from 1 to 3.
Fig. 11 Block segmentation scheme and self-correction flow

Table 2 RMS error reduction of transmitter output at different design options
        RMS error reduction of I data (dB)    RMS error reduction of Q data (dB)
        segmented block number                segmented block number
order   400   100   36    9     4     1       400   100   36    9     4     1
1       10.6  15.1  16.7  12.8  6.5   3.3     11.6  15.3  16.7  11.2  5.7   2.9
2       10.7  16.2  19.5  18.7  16.2  3.4     11.8  16.5  18.5  17.7  15.5  3.0
3       10.8  16.4  22.0  20.5  19.5  15.4    12.0  16.8  19.7  19.3  17.9  14.2

The computational complexity mainly lies in updating
the polynomial parameters for each block, in
which the pseudo-inverse of a matrix with $N \times M$ elements
needs to be computed, where $N$ is the number of digital inputs in one
block and $M$ is determined by the polynomial order
according to Eqn.(5) ($M = 3, 6, 10$ for $T = 1, 2, 3$).
Assuming that arithmetic on an individual element has
complexity $O(1)$, the computational complexity of updating
the polynomial parameters for one block can be
denoted as $O(M^2 N) + O(M^3)$ [29]. With segmentation,
the block size and the needed polynomial order
can both be made smaller. The computational complexity,
and hence the time and power consumption needed for
one block’s update at one predetermined time instance,
can be greatly reduced as well.
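The effect of segmentation on the per-block update cost can be illustrated with a small calculation. The sketch below evaluates $M^2 N + M^3$ with all constant factors dropped; the total pool size of 14400 inputs is a hypothetical number chosen only because it divides evenly by the block counts of Tab.2, and the function names are illustrative.

```python
def num_terms(order):
    """Number M of polynomial parameters: pairs (i, q) with i + q <= order."""
    return (order + 1) * (order + 2) // 2


def update_cost(n_inputs, order):
    """Operation-count estimate M^2 * N + M^3 for one block update."""
    m = num_terms(order)
    return m * m * n_inputs + m ** 3


POOL_SIZE = 14400  # hypothetical total number of digital inputs

for blocks in (1, 4, 9, 36, 100, 400):
    n_per_block = POOL_SIZE // blocks
    print(blocks, n_per_block, update_cost(n_per_block, 3))
```

Under these assumptions a third-order update of one block out of 36 costs roughly 35 times less than updating a single unsegmented block. The cost summed over all blocks does not shrink in the same way, so the saving applies to the work done at one update instance, as stated in the text.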
5 Conclusions and Future Work
In this paper, we went through several examples to de-
pict the signal processing opportunities and challenges
we experienced in recent and ongoing design activities.
Signal processing together with circuit architecture are
the key enablers for digital intensive and digitally as-
sisted transceivers. When targeting emerging commu-
nication standards and deeply scaled technology nodes,
signal processing design and implementation are much
more challenging than before. For the parameter esti-
mation, calibration and compensation of analog non-
idealities (or self healing), although recent papers [13]
built very dedicated circuits for signal processing, we
see a clear need to have flexible processors on chip run-
ning versatile algorithms that can even perform model
identification by themselves. Importantly, these algorithms
can be reprogrammed at any time. The high
performance signal processing in the signal paths also calls
for disruptive solutions due to the high duty cycle, high
sampling rate and stringent power/area constraints.
Aggressive algorithm/architecture optimizations, ana-
log driven digital design techniques and stochastic com-
putation techniques are considered to be promising in
this context.
References
1. K. Lee, I. Nam, I. Kwon, J. Gil, K. Han, S. Park, and B.I.
Seo, The impact of semiconductor technology scaling on
cmos rf and digital circuits for wireless application, Electron
Devices, IEEE Transactions on, 52(7):1415-1422 (2005).
2. B. Murmann, Digitally assisted analog circuits. Micro,
IEEE, 26(2):38-47 (2006).
3. R. Staszewski, K. Muhammad, D. Leipold, C.-M. Hung,
Y.C. Ho, J. Wallberg, C. Fernando, K. Maggio, R.
Staszewski, T. Jung, J. Koh, S. John, I. Y. Deng, V. Sarda,
O. Moreira-Tamayo, V. Mayega, R. Katz, O. Friedman, O.
Eliezer, E. de Obaldia, and P. Balsara, All-digital tx fre-
quency synthesizer and discrete-time receiver for bluetooth
radio in 130nm cmos, Solid-State Circuits, IEEE Journal of,
39(12):2278-2291 (2004).
4. J. Dabrowski and R. Ramzan, Built-in loopback test for ic
rf transceivers, Very Large Scale Integration (VLSI) Systems,
IEEE Transactions on, 18(6):933-946 (2010).
5. R. Staszewski, R. Staszewski, T. Jung, T. Murphy, I.
Bashir, O. Eliezer, K. Muhammad, and M. Entezari, Soft-
ware assisted digital rf processor for single-chip gsm radio in
90nm cmos, Solid-State Circuits, IEEE Journal of, 45(2):276-
288 (2010).
6. J. Borremans, G. Mandal, V. Giannini, B. Debaillie, M.
Ingels, T. Sano, B. Verbruggen and J. Craninckx, A 40
nm cmos 0.4-6ghz receiver resilient to out-of-band blockers,
Solid-State Circuits, IEEE Journal of, 46(7):1659-1671
(2011).
7. J. Craninckx, Cmos software-defined radio transceivers:
Analog design in digital technology, Communications Mag-
azine, IEEE, 50(4):136-144 (2012).
8. V. Giannini, P. Nuzzo, C. Soens, K. Vengattaramane, M.
Steyaert, J. Ryckaert, M. Goffioul, B. Debaillie, J. Van
Driessche, J. Craninckx, and M. Ingels, A 2mm² 0.1-to-5ghz
sdr receiver in 45nm digital cmos, In Solid-State Circuits
Conference -Digest of Technical Papers (ISSCC), 2009, IEEE
International, pages 408-409 (2009).
9. M. Wickert, U. Mayer, and F. Ellinger, 802.11a compliant
spatial diversity receiver ic in bicmos, Microwave Theory and
Techniques, IEEE Transactions on, 60(4):1097-1104 (2012).
10. V. Vidojkovic, V. Szortyka, K. Khalaf, G. Mangraviti, S.
Brebels, W. v. Thillo, K. Vaesen, B. Parvais, V. Issakov, M.
Libois, M. Matsuo, J. Long, C. Soens, and P. Wambacq, A
low-power radio chipset in 40nm lp cmos with beamforming
for 60ghz high-data-rate wireless communication, In Solid-
State Circuits Conference Digest of Technical Papers (ISSCC),
2013, IEEE International, pages 236-237 (2013).
11. H.-K. Cha, S.-S. Song, H.-T. Kim, and K. Lee, A cmos
harmonic rejection mixer with mismatch calibration cir-
cuitry for digital tv tuner applications, Microwave and Wire-
less Components Letters, IEEE, 18(9):617-619 (2008).
12. E. Klumperink, R. Shrestha, E. Mensink, G. Wienk, Z.
Ru, and B. Nauta, Multipath polyphase circuits and their
application to rf transceivers, In Circuits and Systems (IS-
CAS), 2007, IEEE International Symposium on, pages 273-
276 (2007).
13. A. Tang, F. Hsiao, D. Murphy, I.-N. Ku, J. Liu, S.
D’Souza, N.-Y. Wang, H. Wu, Y.-H. Wang, M. Tang, G. Virbila,
M. Pham, D. Yang, Q. Gu, Y.-C. Wu, Y.-C. Kuan, C.
Chien, and M. Chang, A low-overhead self-healing embed-
ded system for ensuring high yield and long-term sustain-
ability of 60ghz 4gb/s radio-on-a-chip, In Solid-State Cir-
cuits Conference Digest of Technical Papers (ISSCC), 2012,
IEEE International, pages 316-318 (2012).
14. V. Vidojkovic, G. Mangraviti, K. Khalaf, V. Szortyka,
K. Vaesen, W. Van Thillo, B. Parvais, M. Libois, S. Thijs,
J. Long, C. Soens, and P. Wambacq, A low-power 57-to-66ghz
transceiver in 40nm lp cmos with -17db evm at 7gb/s,
In Solid-State Circuits Conference Digest of Technical Papers
(ISSCC), 2012 IEEE International, pages 268-270 (2012).
15. C. Li, M. Li, M. Verhelst, A. Bourdoux, J. Borremans,
S. Pollin, A. Chiumento, L. Van der Perre, and R. Lauw-
ereins, A generic framework for optimizing digital intensive
harmonic rejection receivers, In Signal Processing Systems
(SiPS), 2012, IEEE Workshop on, pages 167-172 (2012).
16. O. Gaborieau et al., “A SAW-less multiband WEDGE
receiver,” IEEE ISSCC Dig. Tech. Papers, pp. 114-115, Feb.
2009.
17. Y. Sun et al., “A 50-300-MHz low power and high linear
active RF tracking filter for digital TV tuner ICs,” IEEE
Custom Integrated Circuits Conference, pp. 1-4, 2010.
18. Z. Ru, N. Moseley, E. A. M. Klumperink, and B. Nauta,
Digitally enhanced software-defined radio receiver robust to
out-of-band interference, Solid-State Circuits, IEEE Journal
of, 44(12):3359-3375 (2009).
19. C. Li et al., “Adaptive filter based low complexity digital
intensive harmonic rejection for SDR receiver, ” Acoustics,
Speech and Signal Processing (ICASSP), 2013 IEEE Interna-
tional Conference on, pp. 2712-2715, May 2013.
20. S. Haykin, “Adaptive Filter Theory,” Upper Saddle River,
NJ: Prentice Hall, 2002.
21. S. Balasubramanian, S. Boumaiza, H. Sarbishaei, T.
Quach, P. Orlando, J. Volakis, G. Creech, J. Wilson, and W.
Khalil, Ultimate Transmission, Microwave Magazine, IEEE,
13(1):64-82 (2012).
22. W. Chen et al., “A 60GHz-Band 2×2 Phased-Array Transmitter
in 65nm CMOS,” In Solid-State Circuits Conference Digest
of Technical Papers (ISSCC), 2013, IEEE International,
pages 42-43 (2010).
23. M. Nariman, et al., “A Compact Millimeter-Wave En-
ergy Transmission System for Wireless Applications,” RFIC
Symposium, pp. 407-410, Jun. 2013.
24. IEEE Std 802.11ad-2012
25. Y. Cong et al., “Switching sequence optimization for gra-
dient error compensation in thermometer-decoded DAC ar-
rays,” IEEE Trans. on Circuits and Systems-II: Analog and
digital signal processing, vol. 47, No. 7, pp. 585-595, July
2000.
26. Geert A. M. Van der Plas et al., “A 14-bit intrinsic accuracy
Q² random walk CMOS DAC,” IEEE J. of Solid-state
Circuits, vol. 34, no. 12, pp. 1708-1718, Dec. 1999.
27. C. Li et al., Efficient self-correction scheme for static non-
idealities in nano-scale quadrature digital RF transmitters,
In Signal Processing Systems (SIPS), 2013, IEEE Workshop
on, pages 71-76 (2013).
28. C. Lu, H. Wang, C. H. Peng, A. Goel, S. W. Son, P.
Liang, A. Niknejad, H. C. Hwang and G. Chien, A 24.7dbm
all-digital rf transmitter for multi-mode broadband appli-
cations in 40nm cmos, In Solid-State Circuits Conference Di-
gest of Technical Papers (ISSCC), 2013, IEEE International,
pages 332-334 (2013).
29. A.J. Stothers, “On the complexity of Matrix Multiplica-
tion,” PhD Thesis, University of Edinburgh, 2010.
