A Programmable Calibration/BIST Engine for RF and Analog Blocks in SoCs Integrated in a 32 nm CMOS WiFi Transceiver by Carballido, J. et al.
  
 
 
 
 
 
 
 
Citation Carballido J., Hermosillo J., Veloz A., Arditti D., Del Rio A., Borrayo E., 
Guzman M., Lakdawala H., Verhelst M. (2013) 
A Programmable Calibration/BIST Engine for RF and Analog Blocks in 
SoCs Integrated in a 32 nm CMOS WiFi Transceiver 
IEEE Journal of Solid-State Circuits; volume 48, issue 7, pages:1669-1679 
Archived version Author manuscript: the content is identical to the content of the published 
paper, but without the final typesetting by the publisher 
 
Published version https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6488888 
Journal homepage https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4. 
Author contact Email: Marian.Verhelst@esat.kuleuven.be 
Phone: + 32 (0)16 328617 
IR url in Lirias https://lirias.kuleuven.be/handle/123456789/437892  
 
A Programmable Calibration/BIST Engine for
RF and Analog Blocks in SoCs Integrated
in a 32nm CMOS WiFi Transceiver
Jorge Carballido∗, Jorge Hermosillo∗, Arturo Veloz∗, David Arditti∗,
Alberto Del Río∗, Edgar Borrayo∗, Manuel E. Guzmán∗, Hasnain Lakdawala†,
Marian Verhelst‡
Abstract
This paper presents a flexible and portable digital framework for Built-in Self-Test (BIST) and
calibration of RF/analog circuitry. The novelty of the proposed testing framework is a reusable, flexible,
drop-in IP core, composed of a centralized custom processing engine with a data path, memory architecture and
instruction set optimized for efficient execution of compute-intensive test and calibration algorithms. The
innovative BIST engine is complemented with a calibration and test sequencing methodology exploiting
the embedded test hardware, to dynamically correct for transceiver imbalances and non-idealities, as
well as to estimate performance parameters such as Error Vector Magnitude (EVM). The engine has
been integrated with a WiFi transceiver in a 32nm SoC test chip to demonstrate the functionality of
this framework. This implementation covers an area of 0.63 mm² and provides similar performance
(e.g. improvements up to 10dB in EVM for Rx IQ imbalance compensation) to off-chip testing without
relying on expensive equipment.
*Authors are with the System Integration and Adaptivity group, Intel Labs, Guadalajara, Jalisco, Mexico
†Author is with the Radio Integration Lab, Intel Labs, Hillsboro, Oregon, USA
‡Author is with the Dept. Elektrotechniek ESAT-MICAS, KULeuven, Leuven, Belgium
Index Terms
calibration, testing, radio, transceiver, IQ imbalance, PAPD, SoC, RF, manufacturing.
I. INTRODUCTION
Demand for the integration on silicon of increasingly complex systems in a constrained
time frame and at reduced cost is constantly pushing designers to stretch the limitations of
technology. Similarly, for wireless communication systems, the ongoing push towards
improved functionality (such as ubiquitous connectivity and higher data rates), the use of advanced
silicon manufacturing technology, reduced Time to Market (TTM) and high power efficiency
severely complicate analog and RF design. Consequently, the major issues to be tackled
in current mixed-signal designs are coming from the increased parametric variations due to
manufacturing process scaling, the integration of multiple Radio Frequency (RF) front-ends
(WiFi, LTE, Bluetooth, GPS, etc.) [1] and analog blocks (Voltage Regulators, Digital Frequency
Generators, Sensors, etc.) [2] into deep submicron CMOS technologies and the increased cost
for testing and calibrating RF/analog circuitry [3]. Different approaches for the aforementioned
problems have been proposed within academia and industry. Typically conservative over-design
[4] is diminished by the insertion of dedicated calibration logic to compensate for some of the
non-idealities in the transceiver [5]. However, such an approach suffers from very limited flexibility
and scalability because it addresses only one impairment at a time. Another way to tackle these issues is
to extend the baseband processor to perform BIST and calibration algorithms [6]. However, this
restricts algorithm execution to the time slots in which the baseband processor is idle,
and relies on a data path and instruction set that are inefficient for this type of operation. A
third existing approach is the integration of a small dedicated test controller within the transceiver
[7], [8]. Although this solution offers improved flexibility, it lacks the computational power required
to perform some of the more complex calibration algorithms, e.g. frequency-selective calibration
approaches [9], [10]. To overcome the limitations of these alternatives, this paper presents a
framework solving these challenges in a more global, robust, flexible and portable manner. The
proposed solution is based on a fully programmable BIST and calibration engine for RF and
analog blocks in SoCs. The engine was integrated into a 32nm SoC with a Dual Atom Core
and a WiFi transceiver and several algorithms have been developed and tested to demonstrate
the overall functionality of the proposed solution.
This paper is organized as follows: Section II describes the different components of the
architecture, whereas Section III describes its hardware implementation. Subsequently, Section
IV provides a description of the framework’s operation principle and the specific calibration
sequence that needs to be executed to decouple imbalances from Tx and Rx. In Section V,
several lab measurement results are presented, demonstrating the engine’s performance. Section
VI concludes the paper.
II. BASIC OPERATING PRINCIPLE AND MACRO ARCHITECTURE
As described in the introduction, RF and analog circuitry testing and calibration have become
very expensive and hence an efficient and cost effective solution is needed. On-chip integration
of these capabilities can offer test time reduction, a reduction of external equipment cost, as
well as allow run-time re-calibration and testing. The proposed Calibration and Test engine
(CaT-engine) serves as a digital IP block to consolidate this run-time and test-time operation
into one central, custom processing core tailored to transceiver calibration and test workloads.
This digital-intensive approach will moreover benefit from ongoing cost scaling through Moore’s
law and does not interfere with the regular operation of the transceiver.
A. Basic Operating Principle
CaT-engine enables on-die self-test and self-calibration by a) generating a stimulus signal
and injecting it into the transmission chain; b) configuring the transceiver front-end to loop
back the signal in the analog or RF domain, or through an envelope detector; c) capturing the
looped-back signal after receiver digitization; d) post-processing the received signal to estimate
impairments of the mixed-signal, analog and RF blocks traversed along the way; and e) computing radio
configuration updates to compensate for these impairments and improve radio performance (in case
of calibration, not needed for test). As indicated in Fig. 1, these configuration updates can
involve setting coefficients of digital pre- or post-distortion filters (e.g. for IQ imbalance, or
PA pre-distortion), as well as updating analog tuning knobs (biases, DACs, etc.). Due to super-
position of all impairments encountered by the signal traversing the loop-back path, individual
calibration algorithms have to be sequenced carefully, exploiting the availability of a variety of
distinct loopbacks present in the radio front-end. A more detailed operation principle description
and proposed calibration sequencing approach is presented in Section IV.
B. Macro Architecture
Fig. 1 shows CaT-engine included in a wireless transceiver front-end. Although CaT-engine is not
part of the normal transceiver signal chain, it is capable of injecting signals into the transmission
chain, as well as probing signals at various places of the Tx and Rx digital paths. Fig. 2 shows
the main components that the engine is composed of. This engine is organized around a central
processing core, whose data path and instruction set have been modified to optimize the execution
of typical operations present in calibration and test algorithms. Data is fed to the processor
through several input buffers, while signals can be generated through a stimuli generation
accelerator coupled to the core. The microprocessor communicates configuration updates to
the transceiver in a memory-mapped way through the configuration and status registers (CSRs).
Due to the generic operating principle of CaT-engine, and inherent flexibility, the presented
architecture is radio-standard agnostic and does not pose specific considerations for the type of
transceiver under test. A more detailed explanation of these blocks is presented in Section III.
III. MICRO ARCHITECTURE AND HARDWARE IMPLEMENTATION
This section describes the details of the different CaT-engine sub-blocks. Integration into a
WiFi transceiver and Dual Core Atom SoC in 32nm CMOS process technology is discussed.
A. Signal Generator
As shown in Fig. 2, the core has been extended with a dedicated stimulus signal generator
block. This component is a memory mapped accelerator for the processor. It relies on an internal
memory to hold a collection of samples, which allows it to generate almost any kind of signal, limited only
by the memory depth. A small set of registers, programmed by the processor, instructs
the accelerator how to play back the samples from the memory. The block can play the
memory samples in forward or backward order, invert their sign, and read out samples
following a simple address pattern, such as using one sample from memory and skipping the
next one. The advantages of this implementation become clear when considering the
generation of a sinusoidal stimulus signal. In this case, only the samples of a quarter of
the sine period are stored in memory, and the remainder of the signal and subsequent cycles can
be generated by combining forward and backward playback of these samples with
inversion of their sign. In general, any periodic signal can be played back with programmable
sampling frequency by only storing one cycle of such signal. In the case of a non-cyclic signal,
the pre-recorded samples are stored in memory and simply reproduced by the hardware such
as in the case of an OFDM (Orthogonal Frequency Division Multiplexing) stream. Finally, the
generator also has the capability of performing a simple interpolation of the signal to flexibly
resample the stimulus signal and make a more efficient use of the storing space.
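The quarter-period playback described above can be illustrated with a short sketch. This is not the hardware's register interface; the function names and the half-sample phase offset (chosen here so that the mirrored playback lines up exactly) are assumptions made for illustration.

```python
import math


def quarter_sine_table(n):
    """Hypothetical sample memory: n samples covering the first quarter period.

    A half-sample phase offset is assumed so that backward playback mirrors
    the table without repeating or skipping a sample.
    """
    return [math.sin(2 * math.pi * (i + 0.5) / (4 * n)) for i in range(n)]


def play_full_period(table):
    """Rebuild one full sine period from the quarter-period table.

    Quarter 1: forward playback.
    Quarter 2: backward playback.
    Quarter 3: forward playback with inverted sign.
    Quarter 4: backward playback with inverted sign.
    """
    forward = list(table)
    backward = list(reversed(table))
    return (forward
            + backward
            + [-s for s in forward]
            + [-s for s in backward])
```

With a 16-entry table, `play_full_period(quarter_sine_table(16))` yields 64 samples of one sine period while storing only a quarter of them.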
B. Configuration and Status Registers
The communication of the CaT-engine with all its peripheral blocks, as well as with the analog front-end
and the pre-/post-distortion blocks, happens in a memory-mapped way through the Configuration
and Status Registers (CSR). As described above, the configuration for the Signal Generator
is provided by means of a register set which is part of the CSR. Other registers inside the CSR
will contain the configuration of the digital distorters, configuration of the digital filtering chains,
and the status of the impairments being monitored during the operation of the radio either in
the factory or in the field.
C. Input buffers
The purpose of the transmission and reception input buffers is to store samples taken from
the data flow at different sampling rates from the filtering chains and decouple such rates from
the CaT-engine frequency. As will be described later in more detail, these samples are
used for the estimation of the non-idealities of the radio.
D. Processor core
The fundamental portion of the CaT-engine is a small 32-bit RISC processor with a 7-stage pipeline,
customized in terms of its instruction set, memory structure and data path. Profiling of the
required set of testing and calibration algorithms for advanced, wideband transceivers was used
to identify the type of operations needing acceleration. As shown in Table I, the data path was
optimized to perform complex algebra operations through the dynamic configuration of parallel
multipliers and adders (top of Fig. 3). Furthermore, the core is connected to four additional
memories besides the instruction and data memory, called the Signal Processing Memories
(SPMs). Each pair of SPMs forms a branch and is designed to store a complex sample. In
other words, each branch will hold the real part of complex samples in one SPM and the
imaginary portion in the other one at the same address. With two branches available, the data
path can receive two complex operands each clock cycle. To achieve true single cycle access, the
processor core is equipped with auxiliary programmable address generators (bottom of Fig. 3)
to compute the next operands' addresses while, in parallel, the current operands are being read from
the SPMs and processed by the data path.
As a result, the described processor modifications make it possible to perform a multiply-accumulate
operation over two complex samples in a single cycle, store the result in an internal register and be
ready in the next cycle to execute the same operation on two new operands. Pipelining of the
accelerated instructions can provide up to 16X speed up for complex array operations when
compared against the regular microprocessor’s data path.
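A behavioural sketch of the complex multiply-accumulate over the two SPM branches may help to picture the data flow. It is a software model only: the argument names, the pointer/step interface and the assignment of real and imaginary parts to particular memories are assumptions, and each loop iteration stands for what the text describes as one clock cycle.

```python
def complex_mac(branch0_re, branch0_im, branch1_re, branch1_im,
                ptr0, step0, ptr1, step1, count):
    """Accumulate the product of two complex arrays held in four memories.

    Each branch stores real and imaginary parts at the same address.
    ptr0/ptr1 and step0/step1 model the two address generators.
    """
    acc_re = 0.0
    acc_im = 0.0
    for _ in range(count):
        a_re, a_im = branch0_re[ptr0], branch0_im[ptr0]
        b_re, b_im = branch1_re[ptr1], branch1_im[ptr1]
        # Four real multiplications and two additions/subtractions
        # form one complex product.
        acc_re += a_re * b_re - a_im * b_im
        acc_im += a_re * b_im + a_im * b_re
        # The address generators advance both pointers for the next operands.
        ptr0 += step0
        ptr1 += step1
    return acc_re, acc_im
```

For example, accumulating (1+2j)(5+6j) + (3+4j)(7+8j) with unit steps returns the real and imaginary parts of -18+68j.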
In the same way that the execution has been accelerated by the hardware extensions to the
processor, the inclusion of simple instructions to configure and use these features produces
a considerable code size reduction. Profiling of a typical algorithm execution has shown that
a reduction in code size of up to 70% can be obtained. As an example, the algorithm for
frequency-selective Rx IQ imbalance compensation [10] shows an improvement of 67% in both
execution time and code size when comparing the core without any
modification or extension against the processor with all the enhancements (Fig. 4).
E. Advanced Encryption Standard (AES) Block
A small, 7k gates, Advanced Encryption Standard (AES) block [11] has been added as an
additional accelerator to the processor core. This accelerator is introduced to enable remote radio
testing and radio calibration. Under this scheme, an operator or OEM can send a new test or
calibration executable to a transceiver in the field. Upon reception the transceiver executes the
program, collects test results and re-transmits these results wirelessly to the operator or OEM.
Because such an approach is very vulnerable to external attacks, the scheme needs the AES block to
both 1) decrypt and verify the authenticity of incoming executable tests, and 2) encrypt and
generate a signature for the authentication of the test results before being transmitted back to
the OEM.
F. 32nm CMOS Implementation
As a proof-of-concept, the discussed CaT-engine has been integrated with a WiFi transceiver
and a Dual Atom Core SoC into silicon built with 32nm CMOS process technology [12] (Fig. 5).
There is no interaction between the Dual Core SoC and the transceiver/CaT-engine combo. The
implementation includes digitally controlled analog knobs in the transceiver, as well as pre-/post-distorters
inside the up/down sampling digital filtering chains (Digital Front Ends, DFEs)
to compensate for the imbalances and non-idealities in the transceiver. CaT-engine has been
designed not to intervene in or be part of the main signal flow, but it has complete control of
the transceiver knobs and configuration as well as of the digital compensators. Therefore, even
though there are direct connections to the ADC, DAC and I2C interfaces of the radio, CaT-engine
works in the background without impeding normal operation of the radio while at the same
time enhancing its performance.
The CaT-engine runs at a clock frequency of 120 MHz and uses 0.63 mm² of silicon area,
whereas the footprints of the Digital Front Ends (DFEs) and the memories are 0.35 mm² and
4.09 mm², respectively. Data, instruction and SPM memory area is intentionally oversized in this
implementation for debugging and experimental purposes (0.87 mm²). It can, however, be tailored
to a specific set of calibration and test algorithms. The power consumption of the CaT-engine
amounts to 106 mW when continuously active at maximum performance; however, taking into
account a realistic 2% duty cycle, after which the engine goes into sleep mode (power gated),
the average consumption is on the order of 2 mW. Table II compares the implementation with
two state-of-the-art solutions discussed in Section I.
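The average power figure quoted above follows from the duty cycle D and the active power, under the assumption (not stated explicitly in the text) that the power-gated sleep mode contributes negligibly:

```latex
P_{\mathrm{avg}} \approx D \cdot P_{\mathrm{active}}
               = 0.02 \times 106\,\mathrm{mW}
               \approx 2.1\,\mathrm{mW}
```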
IV. OPERATION PRINCIPLE AND CALIBRATION SEQUENCE
The execution of a calibration or testing algorithm usually
starts with the generation of a stimulus that is injected into the DFE Tx. Such a stimulus could be
a generic signal, like a standard frame of an OFDM signal, or a very specific training sequence
that was created to stimulate and isolate the imbalance being monitored or compensated. The
stimulus then flows through the DFE and the Tx path of the radio up to the antenna, where
the RF signal is folded back into the reception path, either externally or on-chip. During the time
of these events, the CaT-engine captures through its input buffers the required samples from
either the Tx, the Rx or from both digital filtering chains and starts processing them to estimate
the required parameters or metrics. In the case of a calibration algorithm, once it completes
the estimation of the required parameters, the proper configuration is applied to the distorters or
directly to the Analog Front End (AFE) through its I2C interface. For the case of test algorithms,
once the calculations for the desired estimation have been completed the results are stored in
the CSR and could be reported to a digital tester.
In order to be able to decouple the dependencies among the different imbalances or non-
idealities in the Tx and Rx front-end, a carefully designed sequence of algorithms must be
followed. Fig. 6 shows the sequence designed for operation on CaT-engine to calibrate a WiFi
transceiver. First, signal injection into the receiver path is removed while DC offset cancellation
is done. Once the cancellation in the reception path has been accomplished, a loopback config-
uration (Fig. 7a) is activated and used for the cancellation of the DC offset in the transmission
part. At this point, DC offsets are reduced sufficiently to enable proper estimation of the IQ
imbalances in the transmission path. However, accurate on-chip estimation requires operation
without receive path imbalances or an algorithm capable of differentiating between the transmitter
and receiver imbalances. In the proposed implementation this is solved by activating a second
loopback over an envelope detector, as shown in Fig. 7b, providing the required decoupling
between transmitter and receiver imbalances [13]. Compensation of the final transmission path
impairment, by means of Power Amplifier Pre-Distortion, cannot be accomplished at this stage, since it
requires the reception chain, which has not yet been compensated. However,
by using stimulus signals with relatively low power (in order to remain within the linear
region of the Power Amplifier) and the aforementioned loopback configuration of Fig. 7a, the
remaining calibrations for the reception path can be performed: Second Order Intercept Point
(IP2) cancellation and IQ imbalance compensation. By keeping the same loopback configuration
and with the reception path completely compensated, the power amplifier pre-distortion algorithm
completes the calibration sequence.
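The ordering constraints described in this paragraph can be summarized as a small data structure. This is a sketch based only on the text above: the names `Step`, `CALIBRATION_SEQUENCE` and `run_sequence` are hypothetical, and the relative order of the two Rx steps simply follows the order in which the text lists them.

```python
from collections import namedtuple

# Hypothetical record: calibration name, compensated path, loopback used.
Step = namedtuple("Step", ["name", "path", "loopback"])

CALIBRATION_SEQUENCE = [
    Step("Rx DC offset cancellation", "Rx", "none (no signal injected)"),
    Step("Tx DC offset cancellation", "Tx", "RF loopback (Fig. 7a)"),
    Step("Tx IQ imbalance compensation", "Tx", "envelope detector (Fig. 7b)"),
    Step("Rx IP2 cancellation", "Rx", "RF loopback (Fig. 7a), low power"),
    Step("Rx IQ imbalance compensation", "Rx", "RF loopback (Fig. 7a), low power"),
    Step("PA pre-distortion", "Tx", "RF loopback (Fig. 7a)"),
]


def run_sequence(sequence, run_step):
    """Run each step in order; run_step is a caller-supplied callable."""
    results = []
    for step in sequence:
        results.append((step.name, run_step(step)))
    return results
```

Keeping the sequence as data rather than hard-coded control flow mirrors the programmable nature of the engine: a different transceiver could use a different list without changing the runner.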
In order to verify that compensations have produced the desired performance improvements,
some tests like EVM can be performed. Other parameters like estimation of the transfer function
of the baseband filters could be accomplished with some spectrum sensing algorithm.
It is important to mention that once the chip has initially been calibrated, tracking and periodic
updates for the different compensations can be accomplished thanks to the on-chip, dedicated
capabilities provided by the CaT-engine. The next section discusses the performance improvements
obtained with the execution of these calibrations.
V. EXPERIMENTAL RESULTS
The set of tested algorithms covers three major areas: reception path calibration, transmis-
sion path calibration and performance/test algorithms. Table III provides performance profiling
information for all of them and offers a comparison to state-of-the-art solutions.
A. Reception Path Calibration
As described in Section IV, cancellation of the DC offset in the reception path is the first algorithm
executed when calibrating the transceiver. The DC level estimation is done using an averaging
FIR filter with configurable number of taps. The obtained estimation is translated into the proper
digital word for the transceiver to apply DC correction through a DAC in the mixer. A residual
digital canceller in the DFE takes care of removing residual DC offset in the digital domain.
Fig. 8 shows an example of an OFDM signal at the output of the DFE before and after the
execution of this calibration.
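A minimal sketch of DC estimation with an averaging FIR filter is given below. It only models the arithmetic: the function names are hypothetical, and the translation of the estimate into a DAC word for the mixer is omitted because the text does not specify it.

```python
def estimate_dc(samples, taps):
    """Output of a length-`taps` averaging FIR over the most recent samples."""
    if taps <= 0 or taps > len(samples):
        raise ValueError("taps must be between 1 and len(samples)")
    window = samples[-taps:]
    return sum(window) / taps


def remove_dc(samples, dc):
    """Residual digital cancellation: subtract the estimate from each sample."""
    return [s - dc for s in samples]
```

A larger number of taps averages over more of the signal and gives a less noisy estimate at the cost of a longer capture, which is presumably why the tap count is configurable.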
The second reception algorithm calibrates the IQ imbalances. As is common
in direct conversion architectures, there are two possible sources for these imbalances: the local
oscillator leakage and the differences in the baseband filters. The first source will produce
frequency independent imbalances whereas the latter one will generate frequency dependent
ones. Both types of imbalances are jointly tackled using a blind algorithm based on restoring the
circularity property of a complex signal [10]. The compensation is done through the configuration
of the post-distortion coefficients in the digital domain and has been validated for different
modulation schemes as shown in Table IV. EVM improvements ranged from 6 to 10 dB;
an example for a 64 QAM (Quadrature Amplitude Modulation) constellation is shown in
Fig. 9.
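The idea of restoring circularity can be illustrated with a single-tap sketch. This is a simplification and not the frequency-selective algorithm of [10] used on the chip: it handles only frequency-independent imbalance, and the function names are hypothetical. The compensator forms y = x + w·conj(x) and picks w so that the sample mean of y² becomes zero.

```python
import cmath


def circularity_coefficient(x):
    """Single-tap coefficient w that makes y = x + w*conj(x) circular.

    c is the sample estimate of E[x^2] and p of E[|x|^2]. Setting
    mean(y^2) = 0 gives conj(c)*w^2 + 2*p*w + c = 0; the small-magnitude
    root is returned.
    """
    n = len(x)
    c = sum(v * v for v in x) / n
    p = sum(abs(v) ** 2 for v in x) / n
    if c == 0:
        return 0j
    return (-p + cmath.sqrt(p * p - abs(c) ** 2)) / c.conjugate()


def compensate(x, w):
    """Apply the post-distortion y[n] = x[n] + w * conj(x[n])."""
    return [v + w * v.conjugate() for v in x]
```

The estimator is blind in the sense that it needs no knowledge of the transmitted data, only the assumption that the ideal signal is circular.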
B. Transmission Path Calibration
For the transmission path, two algorithms were implemented and measured in the lab. A
frequency-independent IQ imbalance compensation algorithm was validated for different modulations
with an average improvement in EVM close to 5 dB (see Table V). The algorithm
uses a specific training sequence for the synchronization and identification of the imbalances,
and the compensation is accomplished through a configurable digital pre-distorter [13]. As
previously mentioned, an envelope detector (ED) is used to decouple the transmission path from
the reception one and, as expected, there are also process variations present in this sensor.
However, the algorithm implemented in conjunction with the training signal provides the means
to make an estimation of the parameters of the ED in its linear region of operation (relatively
low input power signals) independently of the variations from chip to chip. Therefore, process
variability of the ED is irrelevant and compensation of the IQ imbalances can be accomplished.
Fig. 10 shows the Image Rejection Ratio (IRR) measurements for a complex tone before and
after the calibration, with an improvement slightly greater than 16 dB.
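For orientation, the IRR of a quadrature modulator can be related to its gain and phase imbalance with the standard textbook expression IRR = (1 + 2g·cos φ + g²) / (1 − 2g·cos φ + g²). The sketch below evaluates it in dB; the formula is general background rather than something taken from the measurements in this paper, and the function name is hypothetical.

```python
import math


def irr_db(gain_ratio, phase_error_rad):
    """Image rejection ratio in dB for gain ratio g and phase error phi."""
    g = gain_ratio
    cos_phi = math.cos(phase_error_rad)
    desired = 1 + 2 * g * cos_phi + g * g
    image = 1 - 2 * g * cos_phi + g * g
    if image == 0:
        # Perfect balance: no image component at all.
        return math.inf
    return 10 * math.log10(desired / image)
```

As an example, a 10% gain mismatch with no phase error (`irr_db(1.1, 0.0)`) corresponds to an IRR of roughly 26 dB.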
The algorithm used for the linearization of the Power Amplifier (PA) does not use a specific
training signal but requires knowledge of the injected stimulus. For the frequency-selective
compensation [9], a truncated Volterra series-based digital pre-distorter [14] was implemented. First,
the complex Volterra kernels are calculated; then the power terms are multiplied with these
coefficients and stored in a LUT. This LUT is indexed by the incoming signal according
to its power level and its output contents are then used to perform the multiplication with the
memory terms of the Volterra series and hence complete the pre-distortion [14]. Performance is
demonstrated by means of a relation between the output power and the EVM of the transmitted
signal as illustrated in Fig. 11. Even though the design of the PA was very linear, net gains
between 1 and 2 dB were obtained for the same output power. Alternatively, pre-distortion can
also be used to enable operation at increased output power while keeping EVM low.
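The LUT mechanism can be sketched with a simplified memory-polynomial pre-distorter. This is explicitly not the truncated Volterra structure of [14]: it only illustrates the idea of precomputing power-dependent complex gains, indexing them by the instantaneous power of the signal, and combining them with delayed (memory) samples. All names, table sizes and coefficients are hypothetical.

```python
def build_lut(coeffs, size, max_power):
    """Entry i holds sum_k coeffs[k] * p_i**k for the quantized power p_i."""
    lut = []
    for i in range(size):
        p = max_power * i / (size - 1)
        lut.append(sum(c * p ** k for k, c in enumerate(coeffs)))
    return lut


def lut_index(sample, size, max_power):
    """Quantize the instantaneous power |sample|^2 to a LUT address."""
    p = min(abs(sample) ** 2, max_power)
    return round(p / max_power * (size - 1))


def predistort(x, luts, max_power):
    """y[n] = sum over memory taps m of luts[m][index(x[n-m])] * x[n-m]."""
    size = len(luts[0])
    y = []
    for n in range(len(x)):
        acc = 0j
        for m, lut in enumerate(luts):
            if n - m >= 0:
                s = x[n - m]
                acc += lut[lut_index(s, size, max_power)] * s
        y.append(acc)
    return y
```

Precomputing the polynomial in a table replaces per-sample power computations with a single lookup and one complex multiplication per memory tap.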
C. Performance/Test Algorithms
On-die performance monitoring algorithms allow for run-time radio performance assessment,
as well as initial HVM (High Volume Manufacturing) on-chip testing. To this end, three different
algorithms were developed and implemented: EVM, Simple Spectrum Analyzer (SSA) or
periodogram, and Root Mean Square (RMS).
The on-die estimated EVM is compared against the values calculated by an external
third-party VSA (Vector Signal Analyzer) in Fig. 12; the two differ by less than 1 dB. The
major advantage to be emphasized is the cost of this estimation. The VSA option
requires expensive equipment and a software license to obtain the measurement, whereas
the CaT-engine option provides accurate on-chip estimates at no additional equipment cost. The implementation of the
algorithm follows the basic procedure outlined in IEEE Standard 802.11a-1999 section 17.3.9.6
[15], where frame synchronization, channel estimation, and frequency offset estimation, are
among the different computational loads that were programmed efficiently by taking advantage
of the CaT-engine capabilities.
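The final step of an EVM estimate, once the received symbols have been synchronized and equalized, can be sketched as follows. The sketch assumes ideal reference symbols are available and leaves out the frame synchronization, channel estimation and frequency offset estimation mentioned above; the function name is hypothetical.

```python
import math


def evm_db(received, reference):
    """RMS error vector magnitude in dB, normalized to the reference power."""
    error_power = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    reference_power = sum(abs(s) ** 2 for s in reference)
    if error_power == 0:
        # Identical symbol sets: no error vector at all.
        return -math.inf
    return 10 * math.log10(error_power / reference_power)
```

For example, a received constellation whose points are all scaled by 1.1 relative to the reference has an error vector of 10% of each symbol, which corresponds to an EVM of -20 dB.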
For the case of the SSA or periodogram measurement [16], Fig. 13 provides an example of
the accuracy of the estimation for a standard WiFi spectrum when compared to a floating point
calculation on a PC equipped with MATLAB. Once again, the estimation is quite accurate and
the capability is on-chip. Among the usage options for these tests, a baseband filter test could
be accomplished by iteratively injecting complex tones of different frequencies. Also, blocker
detection and estimation could be accomplished by means of the SSA.
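A periodogram of the kind referred to above can be sketched with a direct DFT. The on-chip implementation is not described in detail, so this is only a floating-point reference model with a hypothetical function name; a practical version would use an FFT and possibly averaging over several captures.

```python
import cmath


def periodogram(x):
    """Power spectrum estimate |DFT(x)[k]|^2 / N for each frequency bin k."""
    n = len(x)
    spectrum = []
    for k in range(n):
        acc = sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum
```

Injecting a complex tone that falls on bin k produces a single peak at that bin, which is the basis of the baseband filter test mentioned above: sweeping the tone frequency and recording the peak traces out the filter response.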
VI. CONCLUSIONS
This paper presented the first silicon solution to enable flexible, independent, and on-chip ex-
ecution of compute intensive calibration and testing during High Volume Manufacturing (HVM)
and in the field. Reuse of CaT-engine’s capabilities by all executed test and calibration algo-
rithms and the exclusion of any hardware blocks customized for a specific type of transceiver
differentiate this solution from the state-of-the-art [5]–[8]. The framework has been conceived
as a drop-in IP block, which can easily be ported across radio generations and be shared by
different sub-blocks of deep sub-micron SoCs, resulting in improved Time to Market (TTM).
The versatile solution moreover has margin to support the execution of more advanced compute
intensive test and calibration algorithms of future radio generations.
Results reported in this paper have shown improvements in performance for both the trans-
mission and reception paths comparable to off-chip calibration solutions. For example, as shown
in Table IV and Table V, gain imbalances are practically eliminated whereas the quadrature
error is considerably reduced. All enhancements are achieved without any external intervention
or instrumentation and in a fraction of the time needed in manufacturing testing (e.g. < 100ms
for PA linearization). In conclusion, CaT-engine enables the creation of more robust and portable
radios on SoCs with better tolerance against technology variations, cheaper manufacturing tests
and calibrations, and extension of these capabilities for in-the-field operation.
ACKNOWLEDGMENTS
The authors thank Brando Perez, Luis Cuellar, Rodrigo Jaramillo, Juan Carlos Peña, Mariano
Aguirre, Blanca Gea, Israel Arriaga, Paulino Mendoza, Daniel Bonilla, Martín García and
Maynard Falconer from the Intel SIA Lab, and Men Long, Jon Duster, Chang-Tsung Fu, Yulin
Tan, Krishnamurthy Soumyanath, Hossein Alavi and the Intel RIL Lab.
REFERENCES
[1] K. Lim et al, “A 2x2 MIMO Tri-Band Dual-Mode Direct-Conversion CMOS Transceiver for Worldwide WiMAX/WLAN
Applications” IEEE J. of Solid State Circuits, 46(7), 2011.
[2] R. F. Yazicioglu et al, “A configurable and low-power mixed signal SoC for portable ECG monitoring applications,” VLSI
Circuits, 2011.
[3] N. Kupp, H. Huang, P. Drineas, Y. Makris, “Post-production performance calibration in analog/RF devices,” IEEE
International Test Conference (ITC), 2010.
[4] P. M. Ferreira, H. Petit, J.-F. Naviner, “A new synthesis methodology for reliable RF front-end Design,” IEEE ISCAS,
2011.
[5] I. Elahi, K. Muhammad, P. T. Balsara, “I/Q mismatch compensation using adaptive decorrelation in a low-IF receiver in
90-nm CMOS process,” IEEE J. of Solid State Circuits, 41(2), 2006.
[6] H. Gandhi, W. Abbott, “A digital signal processing solution for PA linearization and RF impairment correction for multi-
standard wireless transceiver systems,” European Microwave Conf., Sep 2010
[7] R. Staszewski et al, “Software Assisted Digital RF Processor (DRP) for Single-Chip GSM Radio in 90 nm CMOS,” IEEE
J. of Solid State Circuits, 45(2), 2010.
[8] A. Tang et al, “A Low-Overhead Self-Healing Embedded System for Ensuring High Yield and Long-Term Sustainability
of 60GHz 4Gb/s Radio-on-a-Chip,” IEEE ISSCC 2012.
[9] A. Zhu et al, “Open-Loop Digital Predistorter for RF Power Amplifiers Using Dynamic Deviation Reduction-Based Volterra
Series,” IEEE Trans. on Microw. Theory and Techn., 56(7), 2008.
[10] L. Anttila, M. Valkama, M. Renfors, “Circularity-Based I/Q Imbalance Compensation in Wideband Direct-Conversion
Receivers,” IEEE T. on Vehicular Technology, 57(4), 2008.
[11] J. Daemen and V. Rijmen, The Design of Rijndael. New York: Springer-Verlag, 2002.
[12] H. Lakdawala et al, “32nm x86 OS-compliant PC on-chip with dual-core Atom processor and RF WiFi transceiver,” IEEE
ISSCC, 2012.
[13] James K. Cavers, “New methods for adaptation of quadrature modulators and demodulators in amplifier linearization
circuits,” IEEE Transactions on Vehicular Technology, 46(3), 1997.
[14] L. Guan, A. Zhu, “Low-Cost FPGA Implementation of Volterra Series-Based Digital Predistorter for RF Power Amplifiers,”
IEEE Trans. on Microw. Theory and Techn., 58(4), Apr 2010.
[15] 802.11a-1999 - Supplement to IEEE Standard for Information Technology - Telecommunications and Information Exchange
Between Systems - Local and Metropolitan Area Networks - Specific Requirements. Part 11: Wireless LAN Medium
Access Control (MAC) and Physical Layer (PHY) Specifications: High-Speed Physical Layer in the 5 GHz Band, IEEE
Std 802.11a-1999.
[16] D. Cabric, A. Tkachenko, R. W. Brodersen, “Experimental Study of Spectrum Sensing based on Energy Detection and
Network Cooperation,” Proc. of the first international workshop on TAPAS, 2006
Fig. 1. High level block diagram of CaT-engine integrated into a WiFi transceiver. [Block diagram: Tx and Rx Digital Front Ends (pre-/post-distorters, filters), Tx and Rx analog sections with LPF, DAC and ADC, configuration via I2C, JTAG to/from tester, external bump for tester loopback, connections from/to base band, and the Flexible Calibration and Test Engine at 120 MHz; labeled sample rates: 20 MS/s, 320 MS/s and 480 MS/s.]
Fig. 2. High level block diagram of Flexible Calibration and Test Engine internal architecture. [Block diagram: microprocessor with complex data path, DRAM, IRAM, Tx and Rx input buffers, signal generator (control, memory, address generator, sign change, interpolator), CSR, AES, and SPM0 to SPM3 arranged as Branch0 and Branch1 holding real and imaginary parts, addressed through pointers to arrays in the SPMs.]
Fig. 3. Detailed block diagram of the complex datapath showing the datapath blocks at the top and the SPMs address generation blocks at the bottom. [Diagram: four multipliers followed by adder/subtractor stages, with Weight R and Weight I inputs and Acc R and Acc I accumulators; Array Ptr 1 and Array Ptr 2 registers incremented by Array 1 Step and Array 2 Step; connections to/from the core's general purpose registers and to the SPMs' address bus, read data bus and write data bus.]
Fig. 4. Improvements in execution time and code size when using the accelerated instructions: a) original non-optimized code for the Rx IQ Imbalance algorithm, b) after manual optimization of C code, c) after using the accelerated instructions.
[Panel "Code Size": vertical axis in Bytes (0 to 2500); legend: Read-Only Data, Text Section, Uninitialized Data; bar labels 1974, 1851, 610 for a), b), c). Panel "Execution Time": axes in Cycles/sample (0 to 900) and Mcycles (0.0 to 2.5); legend: Total Cycles, Cycles per sample; bar labels 766, 760, 249 for a), b), c).]
Fig. 5. Silicon implementation of CaT-engine integrated with a WiFi transceiver and a Dual Atom Core SoC in 32nm CMOS
technology.
Fig. 6. Calibration algorithm sequence.
Fig. 7. Transceiver loopback configurations: a) RF loopback b) Envelope Detector loopback.
[Diagram labels, panels a) and b): Digital Front End (Tx Digital Front End with Pre-Distorters and Filters; Rx Digital Front End with Post-Distorters and Filters); WiFi Transceiver Block (Tx Analog, Rx Analog, LPF, DAC, ADC, Configuration); Flexible Calibration and Test Engine; I2C.]
Fig. 8. Measured DC offset cancellation for a standard OFDM WiFi signal.
Fig. 9. Measured improvements in the reception path for a 64QAM constellation after executing the Rx IQ Imbalance
compensation.
Fig. 10. Measured Image Rejection Ratio improvement for a complex tone signal after applying Tx IQ Imbalance compensation.
Fig. 11. Measured EVM improvement after Power Amplifier linearization for a 64QAM modulation scheme.
[Plot: EVM [dB] (-30 to -20) versus Output Power [dBm] (6 to 14); curves: No predistortion, Predistortion.]
Fig. 12. Accurate EVM estimation compared to VSA for different modulation schemes and intentionally induced Rx IQ imbalances.
[Four panels titled "EVM Estimation Performance": 64 QAM, 16 QAM, QPSK, BPSK. Each plots EVM Measured by CaT-engine [dB] (-27 to -15) versus EVM Measured by VSA [dB] (-26 to -15); curves: CaT-engine, VSA.]
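The quantity compared in Fig. 12 can be illustrated with the usual RMS definition of EVM, normalized to the average reference power. The sketch below is only a floating-point reference under that assumption; the function name `evm_db` is hypothetical and the estimation algorithm actually executed by the CaT-engine is not reproduced here.

```python
import math
from typing import Sequence


def evm_db(received: Sequence[complex], reference: Sequence[complex]) -> float:
    """RMS EVM in dB, normalized to the average power of the reference symbols.

    Assumption: `reference` holds the ideal constellation points matching
    `received` one to one (hypothetical helper, not the paper's algorithm).
    """
    if len(received) != len(reference) or len(reference) == 0:
        raise ValueError("received and reference must be non-empty and equal in length")
    error_power = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    reference_power = sum(abs(s) ** 2 for s in reference)
    if reference_power == 0.0:
        raise ValueError("reference power must be non-zero")
    if error_power == 0.0:
        return float("-inf")  # no error at all
    return 10.0 * math.log10(error_power / reference_power)


# Usage: QPSK points with a constant 0.1 offset on the in-phase component.
ideal = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
measured = [s + 0.1 for s in ideal]
print(round(evm_db(measured, ideal), 2))
```

A dB value closer to zero means a larger error vector, which is why the calibrated columns of the later tables show more negative EVM.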
Fig. 13. Accuracy of the SSA algorithm in CaT-engine for a 64-point FFT averaged over 64 iterations, compared to a floating-point FFT-based spectrum calculation in Matlab, for a WiFi signal of 20 MHz bandwidth at a sampling rate of 20 MSps.
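The averaging described in the Fig. 13 caption (a 64-point transform averaged over 64 iterations) can be sketched as an averaged periodogram. This is a floating-point illustration only: the non-overlapping segments, the rectangular window, the 1/N scaling and the names `dft` and `averaged_periodogram` are assumptions, and a direct DFT stands in for the engine's fixed-point FFT.

```python
import cmath
import math
from typing import List, Sequence


def dft(block: Sequence[complex]) -> List[complex]:
    """Direct DFT of one block (O(N^2); adequate for N = 64 in a sketch)."""
    n = len(block)
    # Precompute the N twiddle factors exp(-2*pi*j*m/N) once per call.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [
        sum(block[t] * twiddle[(k * t) % n] for t in range(n))
        for k in range(n)
    ]


def averaged_periodogram(samples: Sequence[complex],
                         nfft: int = 64,
                         n_avg: int = 64) -> List[float]:
    """Average |X[k]|^2 / nfft over n_avg non-overlapping blocks."""
    if len(samples) < nfft * n_avg:
        raise ValueError("need at least nfft * n_avg samples")
    psd = [0.0] * nfft
    for seg in range(n_avg):
        spectrum = dft(samples[seg * nfft:(seg + 1) * nfft])
        for k, value in enumerate(spectrum):
            psd[k] += (value.real ** 2 + value.imag ** 2) / nfft
    return [p / n_avg for p in psd]


# Usage: a complex tone centred on bin 5 concentrates its power in that bin.
tone = [cmath.exp(2j * math.pi * 5 * t / 64) for t in range(64 * 64)]
psd = averaged_periodogram(tone)
print(max(range(64), key=lambda k: psd[k]))
```

Averaging over blocks reduces the variance of the spectral estimate at the cost of observation time, which is the trade-off implied by the 64 iterations in the caption.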
TABLE I
LIST OF COMPLEX TYPE OPERATIONS THAT ARE OPTIMIZED AND ACCELERATED WITH MODIFIED DATAPATH
Operation type | Equation
Dot Product | $x \cdot y = \sum_i (a_i + b_i j)(c_i - d_i j)$
Vector Scaling | $wx = [(k + zj)(a_0 + b_0 j), \ldots, (k + zj)(a_n + b_n j)]$
Vector Addition | $x + y = [(a_0 + b_0 j) + (c_0 + d_0 j), \ldots, (a_n + b_n j) + (c_n + d_n j)]$
Squared Vector Norm | $\|x\|^2 = \sum_i (a_i^2 + b_i^2)$
FFT - Radix 2 Butterfly | $x_0 = (c + dj) + [(a + bj)(k + zj)]$, $x_1 = (c + dj) - [(a + bj)(k + zj)]$
Complex Number Multiplication | $x \times y = (a_0 + b_0 j) \times (c_0 + d_0 j)$
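As a reference for the operations listed in Table I, the sketch below writes several of them out in floating-point Python. The function names are hypothetical, and it assumes the elements of y are c_i + d_i j, so that the dot product multiplies by the conjugate of y as the table's minus sign indicates. It is not the engine's instruction set, only a software model of the arithmetic.

```python
from typing import List, Sequence, Tuple


def dot_product(x: Sequence[complex], y: Sequence[complex]) -> complex:
    """Sum of (a_i + b_i j)(c_i - d_i j), i.e. x_i times the conjugate of y_i."""
    acc = 0j
    for xi, yi in zip(x, y):
        acc += xi * yi.conjugate()
    return acc


def vector_scale(w: complex, x: Sequence[complex]) -> List[complex]:
    """Multiply every element of x by the complex weight w = k + z j."""
    return [w * xi for xi in x]


def vector_add(x: Sequence[complex], y: Sequence[complex]) -> List[complex]:
    """Element-wise complex addition."""
    return [xi + yi for xi, yi in zip(x, y)]


def squared_norm(x: Sequence[complex]) -> float:
    """Sum of a_i^2 + b_i^2 over all elements."""
    return sum(xi.real * xi.real + xi.imag * xi.imag for xi in x)


def radix2_butterfly(upper: complex, lower: complex,
                     twiddle: complex) -> Tuple[complex, complex]:
    """x0 = upper + lower*twiddle, x1 = upper - lower*twiddle."""
    product = lower * twiddle
    return upper + product, upper - product


# Usage with small integer-valued inputs.
x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
print(dot_product(x, y), squared_norm(x), radix2_butterfly(1 + 1j, 2 + 0j, 1j))
```

Every operation reduces to complex multiply-accumulate steps over arrays, which is consistent with Table I grouping them under one modified datapath.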
TABLE II
COMPARISON OF IMPLEMENTATION OF PRESENTED WORK AGAINST TWO STATE-OF-THE-ART SOLUTIONS
 | this work | [7] | [8]
Technology | 32 nm CMOS | 90 nm CMOS | not avail.
Supported radio standards | Flexible (e.g. WiFi, WiMAX) | GSM | 60 GHz, 16 QAM
Supported data BW | < 40 MHz | < 40 kHz | not avail.
Core area | 0.63 mm² | 0.36 mm²* | 1 mm²
Average power consumption** | 2 mW | not avail. | 20 mW
Clock frequency | 120 MHz | not avail. | not avail.
Algorithm coverage | See Table III
*Including memories
**Assuming a 2% duty cycle for the calibration routines
TABLE III
ALGORITHM PERFORMANCE PROFILING SUMMARY WITH RESOURCES REUSE AND COMPARISON TO STATE-OF-THE-ART
Group | Calibration/Test Type | Execution time [ms] | Coded in C | Signal Generator usage | Complex Data path usage | SPM usage [bytes] | SRAM usage [Kbytes] | Covered by [5] | Covered by [7] | Covered by [8] | Covered by [6]
Rx | DC offset | 0.44 | Yes | No | Yes | 256 | 7 | No | Yes | Yes | Yes*
Rx | IQ Imbalance (freq. sel.) | 120 | Yes | No | Yes | 640 | 8 | Yes | No | Part.† | No
Tx | IQ Imbalance | 23 | Yes | Yes | Yes | 65536 | 23 | No | NA | Yes | Yes*
Tx | PA linearization (freq. sel.) | 93 | Yes | Yes | Yes | 368 | 9 | No | No | Part.† | Yes*
Meas. | Periodogram | 130 | Yes | Optional | Yes | 0 | 13 | No | Yes | Yes | No
Meas. | RMS | 0.4 | Yes | Optional | Yes | 0 | 9 | No | Yes | Yes | No
Meas. | EVM | 3000 | Yes | Optional | Yes | 7056 | 11 | No | No | No | No
*Only off-line test possible due to reuse of baseband. Dedicated logic limits flexibility.
†Only non-frequency selective tests due to limited compute power.
TABLE IV
MEASURED IMPROVEMENTS IN THE RX AFTER IQ IMBALANCE CALIBRATION
Modulation Scheme | EVM [dB], No Calib. | EVM [dB], Cal. | Quadrature Error [degrees], No Calib. | Quadrature Error [degrees], Cal. | Gain Imbalance [dB], No Calib. | Gain Imbalance [dB], Cal.
64QAM | -18.8 | -28.3 | 8.3 | 0.244 | 0.06 | 0.01
16QAM | -21.1 | -31.2 | 7.4 | 0.293 | 1.05 | 0.06
QPSK | -20.7 | -27.0 | 6.8 | 2.760 | 1.25 | 0.08
TABLE V
MEASURED IMPROVEMENTS IN THE TX AFTER IQ IMBALANCE CALIBRATION
Modulation Scheme | EVM [dB], No Calib. | EVM [dB], Cal. | Quadrature Error [degrees], No Calib. | Quadrature Error [degrees], Cal. | Gain Imbalance [dB], No Calib. | Gain Imbalance [dB], Cal.
64QAM | -29.5 | -34.4 | 2.4 | -0.327 | -0.36 | 0.06
16QAM | -29.5 | -34.2 | 2.3 | -0.296 | -0.34 | 0.02
QPSK | -29.4 | -34.4 | 2.4 | 0.302 | -0.35 | 0.05
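To relate the gain imbalance and quadrature error columns of Table V to the image rejection ratio shown in Fig. 10, the sketch below applies the textbook single-tone expression IRR = (1 + 2g cos φ + g²) / (1 - 2g cos φ + g²), with g the linear gain ratio and φ the phase error. This is an assumption for illustration: it ignores frequency selectivity and every other impairment, the function name is hypothetical, and it is not the measurement procedure used for the table.

```python
import math


def image_rejection_ratio_db(gain_imbalance_db: float,
                             phase_error_deg: float) -> float:
    """Textbook IRR in dB for a single tone with frequency-flat IQ imbalance."""
    g = 10.0 ** (gain_imbalance_db / 20.0)   # dB gain mismatch to linear ratio
    phi = math.radians(phase_error_deg)
    wanted = 1.0 + 2.0 * g * math.cos(phi) + g * g
    image = 1.0 - 2.0 * g * math.cos(phi) + g * g
    if image <= 0.0:
        return float("inf")  # perfectly balanced: no image tone
    return 10.0 * math.log10(wanted / image)


# Usage with the 64QAM row of Table V (uncalibrated, then calibrated).
before = image_rejection_ratio_db(-0.36, 2.4)
after = image_rejection_ratio_db(0.06, -0.327)
print(f"IRR before: {before:.1f} dB, after: {after:.1f} dB")
```

Because both impairments enter the image term at second order, reducing the quadrature error and gain imbalance together raises the ratio quickly, in line with the direction of the improvement reported in the table.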
