8

# Abdul Rehman Javed, J. Christoph Scheytt

System and Circuit Technology Group, Heinz Nixdorf Institute, University of Paderborn

# Eswara Rao Bammidi, Ingmar Kallfass

Institute of Robust Power Semiconductor Systems, University of Stuttgart

# Karthik Krishnegowda, Rolf Kraemer

System Architectures, IHP – Innovations for High Performance Microelectronics

# CONTENTS

- 8.1 Mixed-Signal PSSS Receiver Baseband
- 8.1.1 Semiconductor Technology
- 8.1.2 PSSS Receiver Baseband Unit-Slice
- 8.1.3 Mixed-Signal Programmable Weighted Code Generator Circuit
- 8.1.3.1 Programmable Differential Current Sources
- 8.1.3.2 One-Hot Pulse Generation Circuit (Mux Select Signal)
- 8.1.3.3 High-Speed Differential Current Switches
- 8.1.3.4 6-to-1 Differential Current Summation Transimpedance Stage
- 8.1.3.5 Differential Transadmittance Stage
- 8.1.3.6 3-to-1 Differential Current Summation Transimpedance Stage
- 8.1.3.7 Characterization of the Programmable Weighted Code Generator Circuit
- 8.1.4 Broadband Analog Correlator with Fast Reset
- 8.1.4.1 Four Quadrant Multiplier
- 8.1.4.2 Broadband Integrator
- 8.1.4.3 Characterization of the Correlator Circuit
- 8.1.5 Characterization of the PSSS Receiver Baseband Unit-Slice Circuit
- 8.1.5.1 M-Sequence Correlation Test

# Wireless 100 Gbps And Beyond

- 8.1.5.2 Test with actual PSSS data with BPSK modulated data
- 8.1.5.3 Test with actual PSSS data with PAM-4 modulated data
- 8.1.6 Summary
- 8.2 Receiver Synchronization
- 8.2.1 BPSK Costas Loop
- 8.2.1.1 BPSK Costas Loop - System Level Simulations
- 8.2.1.2 BPSK Costas loop - Measurements
- 8.2.2 QPSK Costas Loop
- 8.2.2.1 QPSK Costas loop - System Level Simulations
- 8.2.2.2 QPSK Costas loop - Measurements
- 8.2.3 Application to mmW receivers
  - A low sensitivity VCO
  - Synchronization at IF frequency
- 8.2.4 IQ recovery for PSSS modulated signals
- 8.2.5 Summary
- 8.3 Spread
- 8.3.1 Kasami codes for transmission on I/Q channels
- 8.4 Measurement Experiments
- 8.4.1 HiL model for a PSSS-15 Transmitter
- 8.4.2 HiL model for a PSSS-15 Receiver
- 8.4.3 Demonstrator Setup
- 8.4.4 Synchronization in HiL experiments
- 8.4.4.1 Synchronization layers of coherent/non-coherent systems
- 8.4.5 Performance results
- 8.4.5.1 Channel estimation and equalization
- 8.4.5.2 PSSS modulated data with BPSK
- 8.4.5.3 PSSS modulated data with PAM-16
- 8.4.6 Kasami codes transmission on I/Q channels
- 8.4.6.1 I-Q transceiver system with Kasami codes
- 8.4.6.2 Measurement setup of the 230 GHz communication link with Kasami codes
- 8.4.6.3 Channel estimation with Kasami codes
- 8.4.6.4 Kasami codes with PAM-16
- 8.4.7 Summary
- 8.5 Conclusion

# 8.1 Mixed-Signal PSSS Receiver Baseband

The system architecture of a mixed-signal PSSS baseband was outlined in chapter 7. The mixed-signal baseband has a sliced architecture in which a unit slice is repeated N times, where N = 15 is the number of data symbols transmitted in parallel. To investigate the baseband proposed in chapter 7, a single slice of the receiver baseband was implemented as a test chip. The circuit design of the receiver baseband unit-slice and its characterization are presented in the following sub-sections. The transmitter baseband circuit design is not discussed in this chapter. The measurement results presented in this chapter use an arbitrary waveform generator (AWG) with a sufficient effective number of bits (ENOB) to emulate the PSSS transmitter. The input data for the AWG is generated using Matlab.

# 8.1.1 Semiconductor Technology

For the mixed-signal PSSS receiver baseband unit-slice integrated circuit, a semiconductor technology suitable for high-speed circuit design was required. For this purpose, a Silicon Germanium (SiGe) Bipolar Complementary Metal Oxide Semiconductor (BiCMOS) technology from the Institute of High-Performance Microelectronics (IHP) with a minimum feature size of 130 nm was chosen. The technology offers NPN heterojunction bipolar transistors (HBTs) with a very high transit frequency $f_T$ of 250 GHz and a maximum frequency of oscillation $f_{max}$<sup>1</sup> of 340 GHz. It is a self-aligned, single polysilicon technology with 130 nm minimum lithographic emitter width and 5 thin and 2 thick aluminum metallization layers [325]. The breakdown voltage BVCEO<sup>2</sup> is 1.7 V. Additionally, the technology offers CMOS transistors with a BVDSS<sup>3</sup> of 1.2 V for core logic design and 3.3 V for the thick-oxide MOS devices used for inputs and outputs (IOs).

# 8.1.2 PSSS Receiver Baseband Unit-Slice

The receiver baseband circuit consists of a sliced architecture where each slice corresponds to the circuit required to recover one transmitted symbol from the PSSS waveform. It takes the PSSS sequence as the input and returns a recovered data symbol as the output. Each slice uses an individual decoding sequence which is correlated with the incoming PSSS stream.

The block diagram of the unit-slice circuit is shown in fig. 8.1. One of the most important components is the mixed-signal weighted code generator circuit, which generates the weighted decoding sequence chips that are correlated with the incoming PSSS chips. It consists of programmable differential current sources that feed a high-speed broadband analog current switching mux. The mux routes the current input signals to the output sequentially, one after the other. The output of the weighted code generator is multiplied with the PSSS stream using a broadband analog multiplier circuit. The product is integrated for a duration of 15 $T_{chip}$<sup>4</sup>, followed by a reset period of 3 $T_{chip}$ for the integrator circuit. At the end of the correlation phase, the output of the integrator is sampled using a sample and hold circuit. The reset or integrate command signal for the integrator circuit also acts as the clock signal for the sample and hold circuit. The duty cycle of the reset or integrate command signal (and of the sample or hold command signal) is 16.667 %, i.e. the reset phase occupies 3 of the 18 chip periods $T_{chip} = 1/f_{chip}$ in each cycle. The output of the sample and hold circuit is converted to a 4-bit digital output using an analog to digital converter (ADC).

In the next sub-sections, the circuit design of the differential, broadband, fast-resettable correlator circuit and of the high-speed, broadband, mixed-signal weighted code generator is discussed. The direct output of the correlator and the sampled output of the correlator were made available as chip outputs and are also discussed in the next sub-sections. These outputs suffice to verify the concept of the mixed-signal PSSS receiver baseband described in chapter 7. The circuit design and the output of the 4-bit ADC are not discussed in this chapter.

#### Figure 8.1

Block diagram of the mixed-signal PSSS receiver baseband unit-slice. From [318] © 2015 IEEE

<sup>1</sup> Unity unilateral power gain frequency

<sup>2</sup> Breakdown voltage between collector and emitter with base terminal open

<sup>3</sup> Breakdown voltage between drain and source with gate shorted to source

<sup>4</sup> Duration of one chip of the coding sequence
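The integrate and reset timing described above reduces to simple arithmetic. The following sketch works it out under the assumption that the chip rate equals the 30 GHz DFF clock rate quoted in section 8.1.3.2; all variable names are illustrative and not taken from the design.

```python
# Frame timing of one correlation cycle (illustrative sketch).
F_CHIP_HZ = 30e9   # assumed chip rate, taken from the 30 GHz DFF clock
N_INTEGRATE = 15   # chips integrated per symbol
N_RESET = 3        # chips reserved for the integrator reset

t_chip = 1.0 / F_CHIP_HZ                   # duration of one chip
frame_chips = N_INTEGRATE + N_RESET        # 18 chips per correlation cycle
frame_period = frame_chips * t_chip        # integrate + reset duration
reset_duty_cycle = N_RESET / frame_chips   # fraction of the cycle spent in reset
symbol_rate = 1.0 / frame_period           # recovered symbols per second, per slice

print(f"T_chip           = {t_chip * 1e12:.2f} ps")
print(f"frame period     = {frame_period * 1e12:.1f} ps")
print(f"reset duty cycle = {reset_duty_cycle * 100:.3f} %")
print(f"symbol rate      = {symbol_rate / 1e9:.3f} GS/s per slice")
```

At a lower chip rate only `F_CHIP_HZ` changes; the duty cycle of 3/18 is independent of the clock frequency.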

# 8.1.3 Mixed-Signal Programmable Weighted Code Generator Circuit

One of the most important and complex components of the unit-slice shown in fig. 8.1 is the differential, broadband, high-speed, mixed-signal weighted code generator. It consists of a differential, broadband, analog multiplexer (mux) with 18 static inputs in the form of differential scalable currents where each input represents a weighted chip of the decoding sequence. The analog mux uses a select signal to route one of the 18 differential current inputs to a pair of output resistors forming a differential voltage output. The architecture of the mixed-signal code generator is shown in fig. 8.2. Going from the bottom to the top of the figure, the weighted code generator consists of the following components:



#### Figure 8.2

Detailed block diagram of the broadband, mixed-signal weighted code generator circuit. From [326] © 2020 IEEE

- Programmable differential CMOS current sources whose values are defined by the contents of a CMOS shift register.
- One-hot pulse generation circuit to provide the control signals for the differential current switches. The one-hot pulse generation circuit consists of a data flip flop (DFF) chain with a clock tree to supply synchronous clock to the DFFs, and a feedback network to route the output of the last DFF to the first DFF.
- High-speed differential current switches.
- 6-to-1 current summation transimpedance stage circuit to combine the currents of 6 inputs. Three copies of this circuit are needed in total to cater for the 18 current input signals.
- Transadmittance stage to convert the voltage output of the transimpedance stage back to a current signal to perform wired-OR operation with 2 more similar transadmittance stages.
- A final transimpedance stage to combine the current outputs of 3 of the above transadmittance stages to generate a differential output voltage.

#### 8.1.3.1 Programmable Differential Current Sources

The inputs to the weighted code generator are static, differential, scalable current signals which remain constant after their values are set using static 4-bit CMOS registers. A total of 72 bits are required to program the 18 current sources. The shift register consists of CMOS DFFs connected in series. The input to the first DFF and the clock input to all the DFFs were made externally accessible to allow a user-defined pattern to be stored into the shift register. 72 clock cycles are required to fill the shift register by shifting forward the 72 input data bits of the external input signal.

The 4-bit digital weight value provided by the static CMOS shift register is applied to an R-2R DAC to convert it into an analog voltage. Two copies of the R-2R DACs are required to generate one differential rail-to-rail output voltage by providing the inverted version of the digital inputs to one of the DACs.
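The programming scheme can be illustrated with a small behavioral model. The sketch below serializes 18 four-bit weights into the 72-bit stream and evaluates an ideal differential R-2R DAC pair. The MSB-first bit order, the order of the weights in the stream, and the 1.2 V reference are assumptions made for illustration; the source does not specify them.

```python
N_SOURCES = 18        # number of programmable current sources
BITS_PER_WEIGHT = 4   # resolution of each weight


def pack_weights(weights):
    """Serialize 18 four-bit weights into a 72-bit list (MSB first, assumed)."""
    if len(weights) != N_SOURCES:
        raise ValueError("expected 18 weights")
    bits = []
    for w in weights:
        if not 0 <= w < 2 ** BITS_PER_WEIGHT:
            raise ValueError("weight does not fit in 4 bits")
        bits.extend((w >> i) & 1 for i in reversed(range(BITS_PER_WEIGHT)))
    return bits


def r2r_dac(code, v_ref=1.2):
    """Ideal R-2R DAC: output voltage proportional to the input code."""
    return v_ref * code / 2 ** BITS_PER_WEIGHT


def differential_output(code, v_ref=1.2):
    """Two DACs, one fed with the bitwise-inverted code, form a differential pair."""
    inverted = code ^ (2 ** BITS_PER_WEIGHT - 1)
    return r2r_dac(code, v_ref) - r2r_dac(inverted, v_ref)


stream = pack_weights([5] * N_SOURCES)
print(len(stream), "bits, one per shift-register clock cycle")
print([round(differential_output(c), 3) for c in (0, 7, 8, 15)])
```

In this ideal model the differential output is proportional to 2·code − 15, so the 16 codes map onto a range that is symmetric about zero.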

To allow the wired-OR operation that combines the current signals, the differential voltage is converted to a differential current signal using a linear HBT differential amplifier with active PMOS loads. A pair of PMOS common source amplifiers with NMOS active loads is used to mirror the current to an NMOS transistor pair, which acts as the differential current source for the differential current switches explained later in section 8.1.3.3.

#### 8.1.3.2 One-Hot Pulse Generation Circuit (Mux Select Signal)

The starting point of the analog current switching mux design is the generation of the select signal. This is achieved using a chain of DFFs that is initialized with a one-hot code. Each subsequent clock pulse shifts the one-hot pulse forward to the next DFF until it reaches the last DFF, after which it is fed back to the first DFF through a feedback circuit. The feedback pulse must be sampled by the first DFF at the sampling (falling) edge of the clock. If the output of the last DFF is fed back directly to the first DFF, it arrives later than the sampling edge of the clock, so the pulse is missed and cannot be propagated. The reason is that the output of the 18th DFF has to travel at least the combined length of the 18 DFFs, i.e. 1008 $\mu m$.
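The intended behavior of the select signal can be sketched as an ideal ring counter. The model below is purely logical: it assumes zero feedback delay and therefore does not capture the timing problem of the long feedback path. The function name and its parameters are hypothetical.

```python
def one_hot_ring(n_stages=18, n_clocks=36):
    """Return the index of the active DFF after each clock edge (ideal feedback)."""
    state = [1] + [0] * (n_stages - 1)     # one-hot initial condition
    selected = []
    for _ in range(n_clocks):
        selected.append(state.index(1))    # the stage currently holding the pulse
        state = [state[-1]] + state[:-1]   # shift by one stage, last output wraps to first
    return selected


sequence = one_hot_ring()
print(sequence[:18])   # the pulse visits every stage once
print(sequence[18:])   # then the pattern repeats
```

Each entry of the returned list is the mux input selected during that chip period, so one full revolution of the pulse corresponds to the 18 weighted chips.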

The clock rate of the DFF clock signal input is 30 GHz, so the DFF circuit is designed based on the current mode logic (CML) topology. The DFF is equipped with a set or reset functionality to enable the application of an initial condition. The clock signal to the DFFs is gated to allow the use of the set or reset input of the DFFs to provide the desired initial conditions for the DFFs. Note that the external clock-enable signal is first applied to the data input of a D latch clocked with the inverted system clock signal. This makes sure that the internal clock-enable signal changes to logic 1 only at the negative edge of the clock and is stable before the rising edge of the next clock cycle. This ensures that the output of the AND gate has no glitches or half pulses.



#### Figure 8.3

Block diagram of the one-hot pulse generator circuit. From [326] © 2020 IEEE

The DFF output pulses are broadband signals with a width of 1/(30 GHz) or 33.33 ps. A good approximation of the rise/fall time of the 33.33 ps wide pulse is 7.5 ps, which leaves 18.33 ps for the flat top of the pulse. Using a first-order RC filter approximation for the tracks that connect each DFF output to the next DFF input, the bandwidth of the signal can be calculated from its rise/fall time using the rule of thumb $BW = 0.35/t_{rise}$, where $t_{rise}$ is the 10% to 90% rise or fall time of the pulse. The required bandwidth of the DFF pulse is therefore 46.667 GHz. The wavelength of such a signal on chip (with $SiO_2$ as the dielectric, $\epsilon_r = 3.9$ to $4.1$) can be calculated to be 3171 $\mu m$.
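The two figures above can be checked with a few lines of code. The sketch assumes that the signal propagates at $c/\sqrt{\epsilon_r}$ in a homogeneous dielectric, which is a simplification of a real on-chip track; the function names are illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def bandwidth_from_rise_time(t_rise_s):
    """First-order rule of thumb: BW = 0.35 / t_rise (10 % to 90 % rise time)."""
    return 0.35 / t_rise_s


def on_chip_wavelength(freq_hz, eps_r):
    """Wavelength in a homogeneous dielectric with relative permittivity eps_r."""
    return C0 / (freq_hz * math.sqrt(eps_r))


bw = bandwidth_from_rise_time(7.5e-12)
print(f"BW = {bw / 1e9:.3f} GHz")
for eps_r in (3.9, 4.1):
    lam = on_chip_wavelength(bw, eps_r)
    print(f"eps_r = {eps_r}: wavelength = {lam * 1e6:.0f} um")
```

With $\epsilon_r = 4.1$ this simple model gives a wavelength within a few micrometers of the 3171 $\mu m$ quoted in the text; with $\epsilon_r = 3.9$ the result is somewhat longer.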

The feedback path length of approximately 1 mm is roughly one third of the 3.17 mm signal wavelength calculated above for the 46.667 GHz bandwidth, which makes it necessary to consider the wave properties of the signal. A driver cell driving a matched load on either end of a transmission line is used to transmit the broadband pulse from the last DFF to the first DFF. The propagation delay of the driver cell together with the propagation delay of the long feedback path makes the total delay so large that the feedback pulse is not sampled by the sampling clock edge of the first DFF. Thus, instead of the 18th DFF, the output of the 17th or 16th DFF is used to generate the input pulse for the first DFF. Additionally, a selectable delay cell is used to add a delay to the selected pulse (16th or 17th); the delay is selectable in 3 steps, each equal to one third of the pulse width at 30 GHz. The circuit design of the adjustable feedback delay circuit is not discussed in this chapter.



#### Figure 8.4

Active clock tree with 18 output nodes for the distribution of the differential clock signal for the 18 DFF chain.

A clock tree is required to supply the gated clock to the 18 DFFs synchronously. The clock tree should be designed so that the total delay is the same at each output node. A binary tree cannot be used because the number of output nodes, 18, is not a power of two. Among the three branching options for the non-binary tree, i.e. $2 \times 3 \times 3$, $3 \times 2 \times 3$, and $3 \times 3 \times 2$, the clock tree with the branching ratio $2 \times 3 \times 3$ has the minimum difference in path lengths between the different nodes.
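The three branching options follow from the distinct orderings of the factors of 18. The short sketch below enumerates them and lists the number of nodes at each level of the tree. It does not reproduce the path-length comparison, which depends on the layout; the function names are illustrative.

```python
from itertools import permutations
from math import prod


def branchings(factors=(2, 3, 3)):
    """All distinct orderings of the branching factors."""
    return sorted(set(permutations(factors)))


def nodes_per_level(branching):
    """Cumulative number of nodes after each branching level."""
    counts, total = [], 1
    for b in branching:
        total *= b
        counts.append(total)
    return counts


for b in branchings():
    assert prod(b) == 18   # every option must end in 18 output nodes
    print(" x ".join(map(str, b)), "->", nodes_per_level(b))
```

All three options end in 18 leaves; they differ only in how many nodes, and therefore repeaters, sit at the intermediate levels.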

The clock tree branches cause significant capacitive loading. Repeater circuits are used at each node of the clock tree which consist of HBT differential amplifiers with resistive and capacitive peaking. A pair of emitter followers are used as drivers to drive the next branch of the clock tree. Electromagnetic simulation of the clock tree segment tracks is used to find the values of the capacitors and resistors for the capacitive and resistive peaking respectively to ensure an amplitude of approximately 400  $mV_{diff}$  for the clock signal at each branching node of the clock tree.

#### 8.1.3.3 High-Speed Differential Current Switches

The output of the one-hot pulse generation circuit serves as the selection input for the analog current switching mux. The outputs of the DFFs are applied to emitter followers (EFs) that drive the high-speed differential current switches. The current switching quad consists of two differential pairs as shown in fig. 8.5. The input to the switching quad is the differential current input signal described in section 8.1.3.1. The differential current signal is routed either to the open collector output node (when the DFF output is high) or to a dummy load (when the DFF output is low). The differential pairs in the switching quad are made unbalanced by adding degeneration resistors of 15 $\Omega$ to only one side of each differential pair. This allows a smooth transition of the differential current output of the analog mux as the one-hot selection pulse moves on to the next DFF.



#### Figure 8.5

Schematic of the high-speed differential current switch circuit. From [326] © 2020 IEEE

#### 8.1.3.4 6-to-1 Differential Current Summation Transimpedance Stage

The differential current outputs from the 18 current selection cells have to be combined (wired-OR) to form the analog mux current output signal. A transimpedance stage can then be used to convert the summed (wired-OR) current signal to an equivalent voltage. The simplest way to achieve both goals is to connect all the left-hand side (p-side) open-collector outputs to one resistor and all the right-hand side (n-side) open-collector outputs to another resistor. There are two problems with this approach.

Firstly, combining the 18 open-collector outputs at one resistor significantly increases the RC time constant at the output node.

Secondly, the current pulses are broadband signals with a width of 1/(30 GHz) or 33.33 ps. As for the DFF output pulses, a good approximation of the rise/fall time of the 33.33 ps wide pulse is 7.5 ps, which leaves 18.33 ps for the flat part of the pulse. Using a first-order RC filter approximation for the tracks that connect the open-collector outputs to the resistors, the bandwidth of the signal can be calculated from its rise/fall time using the rule of thumb $BW = 0.35/t_{rise}$, where $t_{rise}$ is the 10% to 90% rise or fall time of the pulse. According to this rule, the required bandwidth of the broadband multivalued current pulse is 46.667 GHz. The wavelength of such a signal on chip (with $SiO_2$ as the dielectric) can be calculated to be 3171 $\mu m$. If all 18 open-collector outputs are connected together at the geometric mid-point, the longest track, which connects the leftmost (1st) or the rightmost (18th) output to the mid-point, was measured in the layout to be longer than approximately 1000 $\mu m$, which is 31.7% of the wavelength of the signal. Thus, the wave properties of the signal have to be considered, which requires the use of transmission lines with matched loads to avoid reflections.

The solution to the first problem is to use a common base stage between the node connecting the wired-OR open collector outputs and the output resistor. This reduces the Miller capacitance between the collector and base terminals of the differential current switches. The solution to the second problem is to connect only 6 open collector outputs with a wired-OR connection rather than connecting all 18 of them together. This reduces the difference in track lengths between the farthest output nodes (1st and 6th, 7th and 12th, or 13th and 18th) from 1000 $\mu m$ to 333.33 $\mu m$ and the maximum skew from 6.75 ps to 2.25 ps. Three copies of the 6-to-1 current summation transimpedance stage can be seen in fig. 8.2; each consists of a pair of common base stages and combines 6 input currents.
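The skew figures are consistent with a simple propagation-delay estimate. The sketch below assumes a propagation velocity of $c/\sqrt{\epsilon_r}$ with $\epsilon_r = 4.1$, the upper end of the range given for $SiO_2$ in section 8.1.3.2; this value is an assumption chosen because it reproduces the quoted numbers.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def propagation_delay(length_m, eps_r=4.1):
    """Delay of a track of the given length in a homogeneous dielectric."""
    return length_m * math.sqrt(eps_r) / C0


skew_18_to_1 = propagation_delay(1000e-6)    # all 18 outputs on one node
skew_6_to_1 = propagation_delay(333.33e-6)   # outputs grouped in blocks of 6

print(f"18-to-1 skew: {skew_18_to_1 * 1e12:.2f} ps")
print(f" 6-to-1 skew: {skew_6_to_1 * 1e12:.2f} ps")
```

Grouping the outputs in blocks of six shortens the worst-case track by a factor of three, and the skew scales by the same factor.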

#### 8.1.3.5 Differential Transadmittance Stage

The voltage outputs of the transimpedance stages need to be converted back into currents to combine them using a final wired-OR operation. A linear differential amplifier with open collector outputs is used as a differential transadmittance stage. Emitter follower circuits are used as buffers at the input. Three copies of this circuit are needed to combine the 18 differential current output signals. Fig. 8.6 shows the open collector differential amplifier transadmittance stage.



#### Figure 8.6

Schematic of the transadmittance stage circuit. From [326] © 2020 IEEE

#### 8.1.3.6 3-to-1 Differential Current Summation Transimpedance Stage

The 3 current outputs from the 3 (open-collector) transadmittance stages are wired-ORed to generate the final output of the weighted code generator circuit. This wired-OR (open-collector) connection could be connected directly to a pair of load resistors. However, a common base stage is inserted to mitigate the Miller effect. Note that the DC bias current of all 3 transadmittance stages flows through the output load resistors, which lowers the DC voltage at the collector terminals of the common base stage. To ensure that the collector DC bias voltages of the common base stage are high enough to keep the transistors out of saturation (i.e. in the forward-active region), the load resistors are connected to a supply voltage of 0.3 V instead of 0 V. This is achieved by using the supply voltage of 1.2 V and a diode to drop the voltage to 0.3 V. The differential output voltage of the weighted code generator can be read at the two load resistors.



#### Figure 8.7

Schematic of the 3-to-1 differential current summation transimpedance stage circuit. From [326] © 2020 IEEE

#### 8.1.3.7 Characterization of the Programmable Weighted Code Generator Circuit

The direct output of the programmable weighted code generator was not available as a chip output. The only way to characterize the code generator circuit was to set the PSSS input to a constant DC level and to observe the output of the correlator. If the PSSS input consists of all 1's (i.e. a constant DC level), the output of the multiplier is simply the output of the code generator circuit itself. The code generator functioned reliably up to a clock frequency of 21 GHz. The desired output of the code generator is a bipolar m-sequence of length 15 with 3 additional chips at the end for the guard interval during the reset phase, which can be filled with zeros, i.e. [1,1,1,1,-1,-1,-1,-1,1,1,-1,-1,-1,0,0,0]. Details of the measurement setup and the PCB design follow in section 8.1.5, which discusses the overall PSSS receiver baseband circuit. The measured output of the weighted code generator circuit is shown in fig. 8.8. The results show the very good linearity, wide bandwidth and fast reset operation of the circuit.
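How such a decoding vector can be built is sketched below: a length-15 m-sequence from a 4-stage linear feedback shift register, mapped to bipolar values and padded with three zero guard chips. The recurrence $s_{k+4} = s_{k+1} \oplus s_k$ and the all-ones seed are assumptions made for illustration; the sketch is not claimed to reproduce the sequence programmed into the test chip.

```python
def m_sequence_15(seed=(1, 1, 1, 1)):
    """Length-15 m-sequence from the recurrence s[k+4] = s[k+1] XOR s[k] (assumed)."""
    bits = list(seed)
    while len(bits) < 15:
        bits.append(bits[-3] ^ bits[-4])
    return bits


def decoding_vector(n_guard=3):
    """Bipolar m-sequence followed by zero-valued guard chips for the reset phase."""
    bipolar = [1 if b else -1 for b in m_sequence_15()]
    return bipolar + [0] * n_guard


def cyclic_autocorrelation(seq, shift):
    """Cyclic autocorrelation of a sequence at the given shift."""
    n = len(seq)
    return sum(seq[i] * seq[(i + shift) % n] for i in range(n))


vector = decoding_vector()
print(vector)
print([cyclic_autocorrelation(vector[:15], s) for s in range(15)])
```

The printed autocorrelation shows the two-valued property of a bipolar m-sequence: a peak of 15 at zero shift and −1 at every other cyclic shift.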



#### Figure 8.8

Correlation of the weighted code generator output with a constant DC-level for the PSSS input. From [326] © 2020 IEEE

# 8.1.4 Broadband Analog Correlator with Fast Reset

The most important component of the PSSS receiver baseband is the correlator circuit. The programmable weighted code generator provides the weighted decoding vector which is correlated with the incoming PSSS stream. To perform the correlation, a broadband analog correlator is used that multiplies the chips of the PSSS stream with the output of the on-chip weighted code generator. The process of correlation consists of multiplication and integration and is performed using an analog integrate and dump correlator with a reset mechanism to reset the integrator after completion of the correlation.
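The integrate and dump operation can be illustrated with an idealized discrete-time model that processes one value per chip. The code vector below is an example bipolar m-sequence with three zero guard chips and is an assumption, as are the function and variable names; the model ignores bandwidth limits, offsets and finite reset time.

```python
# Example decoding vector: assumed bipolar m-sequence plus 3 guard chips.
CODE = [1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 0, 0, 0]


def integrate_and_dump(psss_chips, code_chips, n_integrate=15, n_reset=3):
    """Multiply, integrate for n_integrate chips, sample, then reset for n_reset chips."""
    frame = n_integrate + n_reset
    acc = 0.0
    trace, samples = [], []
    for k, x in enumerate(psss_chips):
        phase = k % frame
        if phase < n_integrate:
            acc += x * code_chips[phase]   # multiplier output added to the integrator
            if phase == n_integrate - 1:
                samples.append(acc)        # sample and hold at the end of correlation
        else:
            acc = 0.0                      # reset phase clears the integrator
        trace.append(acc)
    return trace, samples


# Matched input: the received chips equal the decoding sequence.
_, matched = integrate_and_dump(CODE * 2, CODE)
# Constant DC input: the integrator simply accumulates the code itself.
_, dc = integrate_and_dump([1] * 18, CODE)
print(matched, dc)
```

The matched input yields the correlation peak in every frame, while the DC input yields only the sum of the code chips, which mirrors the DC-level test used to observe the code generator output.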

#### 8.1.4.1 Four Quadrant Multiplier

A four-quadrant multiplier circuit based on the Gilbert cell is used as the analog multiplier. The PSSS input of the multiplier (In1) has a linear dynamic range larger than 1 $V_{diff}$, whereas the code generator input has a linear dynamic range of roughly 250 $mV_{diff}$. The 1 $V_{diff}$ dynamic range for PSSS is based on the maximum output dynamic range of the M8195 arbitrary waveform generator (AWG) from Keysight, which was used to emulate the PSSS baseband transmitter for the characterization of the PSSS receiver baseband unit-slice test chip. The 250 $mV_{diff}$ dynamic range of the mux input corresponds to the maximum linear output dynamic range of the mux. The schematic diagram of the four quadrant multiplier is shown in fig. 8.9.

#### 8.1.4.2 Broadband Integrator

For the realization of the broadband integrator, a Gm-C cell is used. The transconductance Gm cell is realized using an HBT differential amplifier which has a pair of resistors as the passive load. A negative resistance generator



# Figure 8.9 Schematic of the four quadrant multiplier circuit.

circuit realized as a cross-coupled HBT differential pair is connected in parallel to the resistive load, as shown in the dashed box in fig. 8.10. The integrating capacitor is connected between the two output nodes of the differential Gm cell. The capacitor is realized as a pair of cross-coupled metal-insulator-metal (MIM) capacitors connected in parallel. The circuit is realized using only NPN transistors because no PNP transistors were available in the chosen fabrication technology. A fast reset circuit and a manual offset correction circuit are also included. The integrator is not only required to be linear, it must also allow fast resetting before the start of the next integration cycle.

One of the most important design parameters for the integrator circuit is the dynamic range both at the input and the output of the integrator. The input dynamic range of the Gm-C integrator depends on the linearity of the Gm cell. The Gm cell being an HBT differential pair in the current design can be linearized with resistive degeneration. The output dynamic range of the integrator circuit depends on the RC time constant of the load. There are, however, two conflicting requirements for the RC time constant in this circuit. During the integration phase, the RC time constant must be very large to allow linear operation whereas during reset operation it must be small enough to allow a swift discharge. The required change in RC time constant during the two phases is obtained by changing the resistance R of the RC load while keeping the capacitance constant during the integration and reset phases.

Figure 8.10 Integrator core with fast reset and manual offset correction circuits. From [327] © 2016 IEEE

To obtain the required very large RC time constant during the integration phase, the negative resistance is designed to be equal in magnitude to the positive load resistance  $(R_{C1}+R_{C2})$, which results in a very large equivalent resistance in parallel with the integrating capacitor during the integration phase. Ignoring the output resistance  $r_o$  and the input resistance  $r_{\pi}$  of the common-emitter stage, the negative resistance of the cross-coupled differential pair  $(Q_5, Q_6)$  with emitter degeneration  $(R_E)$  is given by  $-R_E - 2/g_m$, where  $g_{m5} = g_{m6} = g_m = I_C/V_T$  is the transconductance of the transistors  $Q_5$  and  $Q_6$.  $V_T$  is the thermal voltage defined as  $kT/q$, where  $k = 1.381 \times 10^{-23}$  J/K is Boltzmann's constant,  $T$  is the temperature in Kelvin and  $q = 1.602 \times 10^{-19}$  C is the elementary charge. The collector current  $I_{C5} = I_{C6} = I_C = I_{neg}/2$  can be controlled externally and is used to adjust the  $-2/g_m$  part of the negative resistance so that its magnitude equals the collector resistance  $(R_{C1} + R_{C2})$  of the main differential pair.
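The tuning condition can be made concrete with a short calculation. The sketch below evaluates the expression $-R_E - 2/g_m$ with $g_m = I_C/V_T$ and solves for the bias current at which the negative resistance cancels the load. The resistor values (200 Ω load, 50 Ω degeneration), the 300 K temperature and all function names are hypothetical and chosen only for illustration; they are not the values of the fabricated circuit.

```python
K_BOLTZMANN = 1.381e-23   # J/K
Q_ELECTRON = 1.602e-19    # C


def thermal_voltage(temp_kelvin):
    """V_T = kT/q."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def neg_resistance(i_c, r_e, temp_kelvin=300.0):
    """Negative resistance -R_E - 2/g_m of the cross-coupled pair."""
    g_m = i_c / thermal_voltage(temp_kelvin)
    return -(r_e + 2.0 / g_m)


def bias_for_cancellation(r_load, r_e, temp_kelvin=300.0):
    """Collector current I_C for which |R_neg| equals r_load."""
    if r_load <= r_e:
        raise ValueError("r_load must exceed r_e for cancellation")
    return 2.0 * thermal_voltage(temp_kelvin) / (r_load - r_e)


def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


r_load = 200.0   # hypothetical R_C1 + R_C2 in ohms
r_e = 50.0       # hypothetical emitter degeneration in ohms
i_c = bias_for_cancellation(r_load, r_e)
print(f"I_C for cancellation: {i_c * 1e3:.3f} mA")

# A bias current 1 % below the ideal value leaves a large but finite
# equivalent resistance across the integrating capacitor.
r_eq = parallel(r_load, neg_resistance(0.99 * i_c, r_e))
print(f"equivalent resistance with 1 % detuning: {r_eq:.0f} ohm")
```

The detuned case shows why an external adjustment of $I_{neg}$ is useful: a small bias error changes the equivalent resistance, and therefore the integration time constant, by orders of magnitude.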

The maximum output voltage of the integrator is limited by the base-to-collector voltage of the cross-coupled pair transistors  $(Q_5, Q_6)$, because a large output voltage drives one of the two transistors into saturation. In addition, the negative resistance generated by the cross-coupled pair is a function of the differential voltage across its terminals, which changes the effective RC time constant of the integrator for large output amplitudes. Note that during the correlation process, the output of the integrator can go up to  $\log_2(N + 1)$  times the final correlation result, where  $N$  is the length of the m-sequence. For the chosen case of  $N = 15$, the instantaneous output of the integrator during the correlation can go up to 4 times the final output of the correlator. The final correlator output is, therefore, limited to  $\pm 120\,mV_{diff}$  to restrict the maximum instantaneous output voltage amplitude to  $\pm 480\,mV_{diff}$. The  $\pm 120\,mV_{diff}$  output corresponds to the maximum amplitude symbol ($\pm 15$) of the PAM-16 input data symbol set. The minimum amplitude symbol ($\pm 1$) of the PAM-16 input data symbol set corresponds to a final output voltage amplitude of  $\pm 8\,mV_{diff}$.
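The numbers in this paragraph follow from simple arithmetic, reproduced here as a minimal sketch. The function name `integrator_swing` and its parameter names are illustrative assumptions; the input values are the ones quoted in the text.

```python
import math


def integrator_swing(n_chips=15, max_instantaneous_mv=480.0, max_symbol=15):
    """Relate the instantaneous swing limit to the final correlator output.

    Returns (peak_factor, max_final_mv, mv_per_symbol_step).
    """
    peak_factor = math.log2(n_chips + 1)            # log2(N + 1)
    max_final_mv = max_instantaneous_mv / peak_factor
    mv_per_step = max_final_mv / max_symbol         # output for symbol +/-1
    return peak_factor, max_final_mv, mv_per_step


print(integrator_swing())  # (4.0, 120.0, 8.0)
```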

For the reset operation, a current switch is used that connects both terminals of the integrating capacitor ($C_{integ}$) to a low-impedance discharge path  $(Q_7 - Q_9)$. When the current is steered to transistor  $Q_9$  by the reset signal, the transistors  $Q_7, Q_8$  operate as diodes, forcing the voltages at the nodes  $Integ_p$  and  $Integ_n$  to become equal, and hence  $C_{integ}$  is discharged. Note that the common-mode voltage of the nodes  $Integ_p$  and  $Integ_n$  remains constant during the reset and integration phases. The transition between the integration and reset phases is governed by the differential pair formed by  $Q_9, Q_{10}$  with inputs *intgrt* and *reset*. This command signal keeps the correlator in the integrate mode for the first 15 chips and switches it to the reset mode during the last 3 chips of an 18-chip correlate-and-dump cycle. The inputs *intgrt* and *reset* form a differential signal obtained by taking the logical OR of the  $16^{th}$  and  $17^{th}$  DFF outputs with (a delayed version of) the  $15^{th}$  DFF output from the one-hot pulse generation circuit described in section 8.1.3.2. Note that this differential signal has a long path length in the layout and, therefore, the  $18^{th}$  DFF output cannot be used.

A manual offset correction circuit is added to correct any post-fabrication offset voltages in the integrator circuit  $(Q_{13}, Q_{14}, I_{offset})$ . The circuit allows control of magnitude and sign of the offset correction. Emitter follower stages are added as buffers at both input  $(Q_1, Q_2)$  and output  $(Q_{11}, Q_{12})$  of the integrator core. The current sources  $I_{neg}/2$  are implemented as current mirrors and their value can be adjusted externally to match the value of the negative resistance to that of the collector resistance  $(R_{C1} + R_{C2})$ .

#### 8.1.4.3 Characterization of the Correlator Circuit

The broadband analog correlator circuit with fast reset is the most important component of the PSSS mixed-signal receiver baseband circuit. The input and output dynamic range of the correlator and its fast reset capability are key to the reliable operation of the overall baseband circuit. Therefore, a separate test chip was designed to characterize the performance of the correlator circuit. The block diagram and microphotograph of the correlator test chip are shown in fig. 8.11. The chip is glued into a cavity inside a high-speed PCB substrate. A ground ring was created around the cavity to provide a low-inductance path for the ground connection. More details about the PCB are provided later in section 8.1.5.

The external inputs to the multiplier circuit as well as the reset or integrate command signal are generated using a bit pattern generator with 4 sub-rate differential outputs up to 32 Gbps and 2 full-rate (multiplexed) outputs up to



#### Figure 8.11

Block diagram and microphotograph of the correlator test chip. From [327] © 2016 IEEE

56 Gbps. The direct output of the resettable correlator was measured with a wide-bandwidth (70 GHz) digital sampling oscilloscope from Keysight.

# 8.1.5 Characterization of the PSSS Receiver Baseband Unit-Slice Circuit

To characterize the performance of the PSSS receiver baseband unit-slice test chip, that contains both the broadband correlator and the weighted code generator presented before, a high-speed PCB was designed. For the PCB, a  $127\mu m$  thick high-frequency Isola Astra substrate is used with a 1 mm thick Copper plate at the bottom for heat dissipation. A cavity is created in the center of the PCB which is shorted to both the RF ground plane on the



#### Figure 8.12

Measured correlator output signal with 28 Gbps input signals.

bottom layer of the PCB as well as to the copper plate. The bottom plane (RF ground) of the PCB is at 0 V, whereas the chip substrate is at -4 V. The chip is, therefore, glued to the PCB using a thermally conductive but electrically insulating glue. The depth of the cavity is such that the top of the glued chip is at approximately the same height as the PCB surface. The PCB surface is treated with an electroless nickel immersion gold (ENIG) coating to allow wedge-wedge wire-bonding with aluminum bondwires with a diameter of  $25\mu m$. 2.92 mm prototype connectors are used to connect the PCB to external measurement devices. The DC and control signals are supplied using a separate board, whereas the CMOS input signals, e.g. the contents of the decoding vector, are supplied using an FPGA board. The chip uses the following power supply voltages: 1.2 V for digital and some analog mixed-signal connections, and -4 V.

The measurement setup for the characterization of the receiver baseband unit slice test chip is shown in fig. 8.14. The two main input signals required for the characterization of the receiver baseband test chip are the PSSS input as well as the clock input signals. Both the signals need to be synchronous, so they are generated using a single arbitrary waveform generator (AWG) i.e. the Keysight M8194 AWG with 2 output channels and 2 marker channels



#### Figure 8.13

PCB and chip microphotographs for the PSSS receiver baseband test chip. From [326] © 2020 IEEE

with a sampling rate of 120 GSa/s and the analog bandwidth of 50 GHz. The AWG has a vertical resolution of 8 bits with an ENOB of 4.7 at 15 GHz and 5.5 at 30 GHz [328]. Apart from the PSSS and clock input signals, a synchronous clock enable signal was also required. This signal was generated using the data marker output from the AWG. A wide bandwidth sampling oscilloscope (Keysight 86100D DCA-X) was used to observe the direct output of the correlator and a real-time oscilloscope (Keysight UXR0402) was used to observe the sampled output of the correlator (after the sample and hold circuit) and to plot eye diagrams.



# Figure 8.14

Measurement setup for characterization of the baseband RX unit-slice test chip. From [326] © 2020 IEEE

#### 8.1.5.1 M-Sequence Correlation Test

To characterize the correlator output, a unipolar m-sequence, i.e.  $[1\ 1\ 1\ 1\ 0\ 0\ 0\ 1\ 0\ 0\ 1\ 1\ 0\ 1\ 0]$  followed by  $[0\ 0\ 0]$  during the guard interval, is applied at the PSSS input, whereas the weighted code generator is configured to generate the corresponding bipolar m-sequence, i.e. with -1's instead of 0's. When the PSSS input and the weighted code generator output are perfectly aligned, the output of the multiplier is equal to  $[1\ 1\ 1\ 1\ 0\ 0\ 0\ 1\ 0\ 0\ 1\ 1\ 0\ 1\ 0\ 0\ 0\ 0]$. The output of the correlator corresponding to this case can be seen in fig. 8.15.



#### Figure 8.15

Measured correlation result signal showing correlation of the bipolar msequence generated by the on-chip code generator with a unipolar m-sequence as the PSSS input.

The results show very good linearity and the fast operation of the correlator. Importantly, the correlator output remains flat when the inputs are 0's which shows that there is negligible offset voltage at the correlator output to cause a drift in the output with 0's at the input.

If bipolar m-sequences are used as the PSSS input signal instead of the unipolar m-sequences, the output is a ramp signal for the case when there is no time delay between the PSSS input and the code generator output. For any non-zero delay between the two signals, the correlator output is a small value. If the delay is a non-zero integer multiple of  $T_{chip}$ , the output is equal to -1/N where N represents the code length. The correlator output corresponding to this case is shown in fig. 8.16. The numbers in the figure represent the delay between the two signals in integer multiples of  $T_{chip}$ .
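The two-valued correlation described here can be checked numerically. The sketch below computes the normalized cyclic correlation of a bipolar length-15 m-sequence with delayed copies of itself. The particular sequence and the helper names are assumptions made for illustration, and the model is purely cyclic: it ignores the guard interval and any analog imperfection.

```python
M_SEQ_UNIPOLAR = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0]


def to_bipolar(bits):
    """Map 0/1 chips to -1/+1 chips."""
    return [2 * b - 1 for b in bits]


def cyclic_correlation(x, code, delay):
    """Normalized correlation of `code` with `x` delayed by `delay` chips."""
    n = len(code)
    return sum(x[(i - delay) % n] * code[i] for i in range(n)) / n


code = to_bipolar(M_SEQ_UNIPOLAR)
for delay in range(len(code)):
    print(delay, round(cyclic_correlation(code, code, delay), 4))
```

Only the zero-delay case yields the full-scale result; every other integer delay gives $-1/N$, which is why the end-of-cycle correlator output can be used to find the correct alignment.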

It can be seen that only for the case of no delay i.e. 0  $T_{chip}$  delay the correlator output is a ramp signal with a large amplitude at the end of the correlation cycle. For any other non-zero delay, the output at the end of the correlation cycle is a small amplitude. The output of the correlator at the end of the correlation phase is sampled using the sample and hold circuit. The



#### Figure 8.16

Correlator output corresponding to the correlation of the bipolar m-sequence generated by the on-chip code generator with unipolar m-sequence as the PSSS input for a data rate of 20 Gbps. From [326] © 2020 IEEE

corresponding output of the sample and hold (S/H) circuit is shown in fig. 8.17.

The output of the S/H circuit clearly shows that the correlator output is maximized when the external PSSS signal and the on-chip code generator output are perfectly aligned with the on-chip integrate or reset command signal. For any sub- $T_{chip}$  delay between the external PSSS signal and the code generator circuit, the output of the correlator and the S/H circuit is smaller than the output for the case of fully aligned inputs. The delay can be adjusted with the help of an on-chip voltage-controlled delay line. The output of the S/H circuit can therefore be used to find the optimum alignment between the external PSSS input and the on-chip code generator output.

# 8.1.5.2 Test with actual PSSS data with BPSK modulated data

Following the functional verification of the PSSS receiver baseband unit-slice circuit using m-sequences as PSSS input, the circuit performance with actual PSSS data was tested. If the PSSS data is generated by using BPSK modulated data and the on-chip code generator is configured to generate a bipolar m-sequence, then the sampled value of the correlator output at the S/H output corresponds to the received BPSK symbol. Note that for BPSK data, the PSSS amplitude set has the following values  $(0, \pm 1, \pm 2, \pm 3, \pm 4)$ . An eye diagram for the said case can be seen in fig. 8.18. The PSSS receiver baseband unit-slice circuit works very well with BPSK modulated PSSS data up to 20 Gbps.
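The amplitude set quoted above can be reproduced with a small encode and decode model. The sketch below assumes that a PSSS symbol is formed by superposing 15 cyclically shifted copies of a unipolar m-sequence, each weighted by one BPSK data symbol and scaled by 1/2, and that decoding correlates the result with bipolar copies of the same shifts. This encoding rule, the chosen sequence and the function names are assumptions made for illustration (the PSSS scheme itself is defined elsewhere in the book); under these assumptions the chip amplitudes fall in $(0, \pm 1, \ldots, \pm 4)$.

```python
M_SEQ = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0]
N = len(M_SEQ)


def shift(seq, k):
    """Cyclically delay `seq` by k chips."""
    return [seq[(i - k) % len(seq)] for i in range(len(seq))]


def psss_encode(data):
    """Superpose N shifted unipolar codes, one per data symbol (assumed rule)."""
    chips = [0] * N
    for k, d in enumerate(data):
        code = shift(M_SEQ, k)
        for i in range(N):
            chips[i] += d * code[i]
    return [c / 2 for c in chips]   # scale so that BPSK gives 0, +/-1 ... +/-4


def psss_decode(chips):
    """Correlate with each bipolar shifted code; returns 4 * data here."""
    out = []
    for k in range(N):
        code = [2 * b - 1 for b in shift(M_SEQ, k)]
        out.append(sum(c * w for c, w in zip(chips, code)))
    return out


data = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1]
chips = psss_encode(data)
print(chips)
print(psss_decode(chips))
```

In this model each unit-slice corresponds to one value of `k`: its code generator supplies one shifted bipolar code, and its correlator output is proportional to the data symbol carried by that shift.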



# Figure 8.17

Sample and hold output corresponding to the correlation of the bipolar m-sequence generated by the on-chip code generator with unipolar m-sequence as the PSSS input for the data rate of 20 Gbps. From [326] © 2020 IEEE



# Figure 8.18

Output eye-diagram of the PSSS receiver baseband unit-slice for BPSK modulated data with the data rate of 20 Gbps.

# 8.1.5.3 Test with actual PSSS data with PAM-4 modulated data

If the PSSS data is generated by using PAM-4 modulated data and the on-chip code generator is configured to generate a bipolar m-sequence, then the sampled value of the correlator output at the S/H output corresponds to the received PAM-4 symbol. Note that for PAM-4 data, the PSSS amplitude set has the following values  $(0, \pm 1, \pm 2, \ldots, \pm 12)$. An eye diagram for this case can be seen in fig. 8.19. Note that the Keysight M8194 AWG has an ENOB of 4.7 at 15 GHz and 5.5 at 30 GHz. For PAM-8, the PSSS amplitude set has the following values  $(0, \pm 1, \pm 2, \ldots, \pm 28)$, which cannot be generated with the current AWG owing to its limited ENOB. The PSSS receiver baseband unit-slice circuit works very well with PAM-4 modulated PSSS data up to 20 Gbps.
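The growth of the PSSS amplitude set with the modulation order can be estimated with a short calculation. The sketch below assumes that every chip is the half-scaled sum of 8 data symbols (8 being the number of ones in a length-15 m-sequence) and that the data symbols are $\pm 1, \pm 3, \ldots, \pm (M-1)$. Comparing $\log_2$ of the level count with the ENOB is only a rough indication, and the function name is hypothetical.

```python
import math

ONES_PER_CHIP = 8   # number of ones in a length-15 m-sequence


def psss_levels(pam_order):
    """Return (peak amplitude, number of levels, log2 of level count)."""
    max_symbol = pam_order - 1                  # largest PAM data symbol
    peak = ONES_PER_CHIP * max_symbol // 2      # all 8 terms at maximum, halved
    n_levels = 2 * peak + 1                     # -peak ... 0 ... +peak
    return peak, n_levels, math.log2(n_levels)


for order in (2, 4, 8):
    peak, n_levels, bits = psss_levels(order)
    print(f"PAM-{order}: peak +/-{peak}, {n_levels} levels, {bits:.2f} bits")
```

Under these assumptions PAM-4 needs about 4.6 bits and PAM-8 about 5.8 bits of amplitude resolution, which is consistent with the statement that PAM-8 exceeds what the AWG can deliver at these frequencies.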



#### Figure 8.19

Output eye-diagram of the PSSS receiver baseband unit-slice for PAM-4 modulated data with the data rate of 20 Gbps.

The above results verify the proposed mixed-signal PSSS receiver baseband architecture. The complete PSSS receiver would consist of 15 such unit-slice circuits to generate  $15 \times 2 \times 20 \times 10^9/18 = 33.33$  Gbps net data rate. The use of the I/Q data further increases the data rate by a factor of 2 to 66.66 Gbps.
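The data-rate figures can be reproduced directly. In the sketch below the function and parameter names are illustrative; the factor 2 is interpreted as the 2 bits carried by each PAM-4 symbol, and 18 is the number of chips per correlate-and-dump cycle (15 code chips plus 3 guard chips).

```python
def net_data_rate(n_slices=15, bits_per_symbol=2, chip_rate_hz=20e9,
                  chips_per_cycle=18):
    """Net data rate in bit/s for one baseband channel."""
    return n_slices * bits_per_symbol * chip_rate_hz / chips_per_cycle


single = net_data_rate()
print(f"single channel: {single / 1e9:.2f} Gbps")
print(f"with I/Q:       {2 * single / 1e9:.2f} Gbps")
```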

#### 8.1.6 Summary

The sliced architecture of a mixed-signal PSSS receiver baseband circuit for 100 Gbps wireless communication was presented in chapter 7. In this section, the circuit design of a single receiver baseband unit-slice is described in detail. The most important circuit component of the receiver baseband slice is the ultra broadband, fast resettable, NPN-only correlator circuit designed in 130 nm SiGe BiCMOS technology. The measured results of the correlator circuit exhibit the worldwide highest reported bandwidth and the fastest reset operation with excellent linearity. The second most important component of the receiver baseband slice is the mixed-signal weighted code generator circuit which is used to generate the weighted decoding sequence for correlation with the incoming PSSS stream. Using an arbitrary waveform generator as the source for the PSSS input, the receiver baseband circuit was tested with PAM-4 modulated PSSS data up to a maximum clock frequency of 20 GHz (beyond which the output was not stable). The circuit showed excellent linearity and high speed performance. PSSS with higher modulation orders e.g. PAM-8 could not be tested owing to the limited effective number of bits specification of the AWG.

# 8.2 Receiver Synchronization

In order to down-convert digitally modulated signals at high carrier frequencies, a carrier recovery system is required, as the system becomes very sensitive to small phase differences between the transmitter and the receiver. Since the modulation used in the present work is of the suppressed-carrier type, where the carrier information is not available directly, a receiver synchronization system which can extract this information from the baseband signals can be used [329].

Different carrier synchronization schemes have been developed, either in analog architectures as presented in [320, 330, 331] or in the digital baseband where the data rates are very low [332]. Since operating on RF or IF signals becomes even more difficult with increasing carrier frequencies, Costas loop systems are considered for the synchronization system in this work [301]. The motivation for using the Costas loop is that it extracts the carrier information from the down-converted baseband signals, which makes it possible to maintain the full bandwidth supported by the RF channel, as opposed to the alternative of synchronization at IF in super-heterodyne receiver architectures. However, the architecture of the Costas loop is modulation dependent. Costas loops employ a modulation-specific circuit to extract an error signal from the baseband I and Q signals. The error signal is fed to the voltage-controlled oscillator via a loop filter in a phase-locked loop architecture.

In this section, the design of a differential BPSK Costas loop and a fully integrated QPSK Costas loop is discussed. The design of all the individual circuit blocks required to develop the final system is discussed and the measurements of the complete system are presented. The section concludes with a discussion of the possibility of using Costas-loop-type synchronization for mmW carrier frequencies.

# 8.2.1 BPSK Costas Loop

The BPSK Costas loop can be considered as an extended form of a phase-locked loop (PLL). Fig. 8.20 shows the block diagram of the conventional BPSK Costas loop. It consists of two branches, the I-arm and the Q-arm. These are connected directly at the output of the IQ down-converter. The I and Q signals are split using power dividers. After the power division, one copy of each signal is fed into the Costas loop and the other copy goes to the baseband stages or the measurement device. Inside the Costas loop, the error detector produces an error value which results from the phase error between the LO signals of the transmitter (TX) and the receiver (RX). This error signal, after passing through the loop controller (LC), is used to control the voltage-controlled oscillator (VCO) in order to correct its phase and frequency.



#### Figure 8.20 Block diagram of the BPSK Costas loop. From [324] © 2019 IEEE

The active power divider (APD), which must be broadband enough to support the entire signal bandwidth without distorting the signal, was designed using a differential traveling wave amplifier (TWA) topology [324]. The schematic of the unit cell is shown in Fig. 8.21(a) and the block diagram of



#### Figure 8.21

Circuit schematics of the Active Power Divider (APD). From [324]



# Figure 8.22



the designed active power divider is shown in Fig. 8.21(b). A differential cascode amplifier has been used as a unit gain cell in order to reduce the Miller effect and widen the bandwidth. An emitter follower (EF) stage, formed with transistors Q1 and Q2, is added at the input of the cascode stage. Owing to the impedance transformation characteristics of the EF stage, the input capacitance of Q3 and Q4 seen at its output is transformed to a negative resistance at the EF's input. This helps to compensate for the losses in the input transmission lines at higher frequencies and thus improves the bandwidth. The input and the output transmission lines are terminated with  $R_{t,b}$  and  $R_{t,c}$  of differential 100  $\Omega$, respectively.

As shown in Fig. 8.22(a), the average small signal gain per branch,  $S_{21}$  and  $S_{31}$  is around 10 dB with a gain variation of  $\pm 1 dB$  for the whole 3 dB

bandwidth of 42 GHz in both directions towards the outputs, with good matching of  $S_{21}$  and  $S_{31}$. The group delays (GD<sub>1-2</sub>, GD<sub>1-3</sub>) are around 20 ps with a ripple of  $\pm 10$  ps, which is more than  $\pm 10$ % of the bit duration at 30 Gbps and results in data-dependent jitter. Fig. 8.22(b) shows the measured P<sub>-1dB</sub> point of the APD for a 25 GHz input signal. The power of the input signal is swept from -18 dBm to 10 dBm. The input-referred P<sub>-1dB</sub> of the APD is 2 dBm, corresponding to an output-referred P<sub>-1dB</sub> of 9 dBm.
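The group-delay statement can be put into numbers with a two-line calculation. The helper name below is an illustrative assumption; the inputs are the values quoted in the text.

```python
def ripple_fraction(ripple_ps, bit_rate_gbps):
    """Group-delay ripple as a fraction of one bit duration."""
    bit_duration_ps = 1e3 / bit_rate_gbps   # 1 / (Gbit/s) expressed in ps
    return ripple_ps / bit_duration_ps


print(f"{ripple_fraction(10.0, 30.0):.0%} of the bit duration at 30 Gbps")
```

At 30 Gbps the bit duration is about 33.3 ps, so a ripple of ±10 ps amounts to roughly ±30 % of a bit.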



# Figure 8.23

Error detector circuit for the BPSK Costas loop. From [324] © 2019 IEEE

The error detector (ED) of the BPSK Costas loop, shown in Fig. 8.23, is designed as a four-quadrant multiplier based on the well-known Gilbert cell multiplier [329]. A pre-distortion stage is added at each input of the multiplier in order to improve the linearity and to achieve symmetry between the two inputs. The multiplier is a fully differential circuit with broadband  $50\,\Omega$  input matching. The pre-distortion compensates for the hyperbolic tangent transfer characteristic of the Gilbert cell based multiplier. The circuit consists of three main blocks, viz. the input pre-distortion stages (P1 and P2), the multiplier core and the output stage. The multiplier core is formed by the transistor (Q4) switching quad. An emitter follower stage is used at the output; it is omitted from the schematic for simplicity. This stage acts as a buffer and is simultaneously used to control the DC offset at the output. Symmetry is maintained between the two input paths. The emitter followers formed by Q5 are used at the inputs in order to achieve large bandwidths. The 50  $\Omega$  resistors at the inputs help in matching to the 50  $\Omega$  source impedance. A constant tail current source Q4 is used to achieve a good common-mode rejection ratio (CMRR) over the whole operating bandwidth.

The S-curves, showing the ED output as a function of the LO phase offset,





(a) Simulations and Measurements of the ED output voltage for 1 GBd BPSK signals

(b) Measurements of the ED output voltage with 8 GBd and 16 GBd BPSK signals

#### Figure 8.24

S-curve measurements of the Error Detector. From [324] © 2019 IEEE

for 1 GBd BPSK signals plotted in Fig. 8.24(a), show both in simulations and measurements that the circuit responds to the inputs in a linear fashion. The S-curve fits the requirements of the Costas loop. A DC shift is observed in the measurements, which is due to the fact that the output node of the buffer stage is not held at exactly zero volts when one of the input signals is zero. This is due to the difficulty of achieving zero volts at a node while using dual supply voltages. Fig. 8.24(b) shows the one-sided S-curves for the higher data rates of 8 GBd and 16 GBd; because the S-curves are symmetric, only one side is shown. At these data rates the S-curves maintain the shape of the simulated output characteristic, with small variations in the DC offset.
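The shape of the S-curve can be illustrated with an ideal multiplier model. The sketch below assumes that a phase offset $\phi$ rotates the BPSK symbol so that $I = m\cos\phi$ and $Q = m\sin\phi$, and that the error detector is an ideal multiplier without the tanh compression of a real Gilbert cell. The sign convention, the unit gain and the function name are assumptions made for illustration.

```python
import math


def bpsk_error(phi_rad, symbol=1, gain=1.0):
    """Ideal BPSK Costas error signal: the product of the I and Q arms."""
    i_arm = symbol * math.cos(phi_rad)
    q_arm = symbol * math.sin(phi_rad)
    return gain * i_arm * q_arm          # equals 0.5 * gain * sin(2 * phi)


for deg in range(-90, 91, 15):
    print(f"{deg:4d} deg -> {bpsk_error(math.radians(deg)):+.3f}")
```

The product is independent of the transmitted symbol because $m^2 = 1$, and it repeats every 180°, which reflects the phase ambiguity inherent to BPSK carrier recovery.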

#### 8.2.1.1 BPSK Costas Loop - System Level Simulations

Fig 8.25 shows the envelope simulations performed for the complete BPSK Costas loop in order to test the functionality of the complete system using the circuit-level blocks discussed in the earlier sections. Envelope simulations utilize both time and frequency domain information in order to analyze the feedback system. The transmitter and the receiver blocks are block-level models from Advanced Design System (ADS). The VCO is modeled using one of the ADS models, with the parameters set based on the commercially available VCO. An active proportional-type loop controller with a gain of 2 and a bandwidth (BW) of 10 MHz has been used for the simulations, which is similar to the one used in the measurements. The simulation takes place at a carrier frequency of 40 GHz, a data rate of



(d) Control voltage to the VCO

#### Figure 8.25

Time-domain simulations of the BPSK Costas loop showing the locking transients with f<sub>off</sub>=100 kHz,  $\phi_{off}$ =30 deg

1 GBd, a frequency offset (f<sub>off</sub>) of 100 kHz and a phase offset ( $\phi_{off}$ ) of 30 deg. These conditions have been used in order to show the functionality of the developed system utilizing the designed circuits which are system-specific. Similar simulations with higher carrier frequencies of 240 GHz and data rates up to 30 Gbps were performed successfully.

Fig 8.25(a) and Fig 8.25(b) show the transmitted and the received BPSK in-phase (I-Data) signals. Fig 8.25(c) shows the received quadrature-phase (Q-Data) signal, which goes almost to zero after 300 ns, when the loop reaches the locked state. The control voltage to the VCO, shown in Fig 8.25(d), settles to a constant DC value which reflects the frequency offset. The Q-Data signal still has a small steady-state value due to the steady-state error of the feedback system. This error can be reduced to zero by adding an integrator to the loop controller. However, integrators are not easy to handle because their high gain can make the loop unstable.
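The residual steady-state error of a proportional controller, and its removal by an integrator, can be illustrated with a first-order phase-domain model. The sketch below is not the envelope simulation described above: it assumes an ideal error detector $e = \tfrac{1}{2}\sin 2\phi$, lumps detector, controller and VCO gain into a single hypothetical loop gain `LOOP_GAIN` (in rad/s), and integrates the phase error with a simple Euler step. All names and gain values are assumptions made for illustration.

```python
import math

LOOP_GAIN = 2 * math.pi * 5e6   # hypothetical lumped loop gain in rad/s


def simulate_loop(delta_f_hz, phi0_deg, loop_gain, integ_gain=0.0,
                  t_end=5e-6, dt=1e-10):
    """Return the phase error (rad) after t_end for a P or PI controller."""
    d_omega = 2 * math.pi * delta_f_hz
    phi = math.radians(phi0_deg)
    z = 0.0                                   # integrator state of the controller
    for _ in range(int(round(t_end / dt))):
        err = 0.5 * math.sin(2 * phi)         # ideal BPSK error detector
        z += integ_gain * err * dt            # integral path (zero for P-only)
        phi += (d_omega - loop_gain * err - z) * dt
    return phi


phi_p = simulate_loop(100e3, 30.0, LOOP_GAIN)
phi_pi = simulate_loop(100e3, 30.0, LOOP_GAIN, integ_gain=LOOP_GAIN ** 2 / 8)
print(f"P controller:  residual phase error {math.degrees(phi_p):.3f} deg")
print(f"PI controller: residual phase error {math.degrees(phi_pi):.6f} deg")
```

With the proportional controller the loop settles where $\tfrac{1}{2}K\sin 2\phi$ balances the frequency offset, so a nonzero phase error remains; with the integral path the integrator state takes over the offset and the phase error decays to zero in this idealized model.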

#### 8.2.1.2 BPSK Costas loop - Measurements

Fig. 8.26 shows the measurement setup for the BPSK Costas loop, where different modules of the system are shown in accordance with the block diagram shown in Fig. 8.20. A commercially available IQ-downconverter,



#### Figure 8.26

Measurement set-up for the BPSK Costas loop receiver

HMC951A from Analog Devices was used. The down-converter has 3 GHz of IF bandwidth with 7 GHz center frequency. Similar measurements were performed at higher carrier frequencies of 24 GHz and 34 GHz. The measurement results are shown as constellations in Fig. 8.29.

A Keysight arbitrary waveform generator (AWG) M8195A was used as a transmitter in order to generate the RF signal at 7 GHz with different baud rates. Fig. 8.27 shows the eye plots of the down-converted BPSK signal after synchronization for different baud rates and Fig. 8.28 shows the constellation diagrams. It can be observed that the phase offset increases as the baud rate is increased from 1 GBd to 4 GBd. The degradation of the signal shape can already be observed at a symbol rate of 4 GBd due to the limited IF bandwidth of the IQ-downconverter.

Fig. 8.30 shows the variation of error vector magnitude (EVM) with the data rate for the BPSK Costas loop systems after synchronization. The EVM



#### Figure 8.27

Eye diagram plots of the I channel at carrier frequency  $(f_c) = 7 \text{ GHz}$  and data rates of 1 GBd (a), 2 GBd (b) and 4 GBd (c) with BPSK modulation, with the AWG M8195 as the RF signal generator and the HMC951A as the IQ down-converter

degrades with the data rate and carrier frequency. For a given data rate of 1 GBd, an EVM of -20 dB is achieved for  $f_c$  of 7 GHz while an EVM of -17 dB is achieved for  $f_c$  of 40 GHz. The sensitivity of the VCO (K<sub>VCO</sub>) increases non-linearly with the frequency of operation. The EVM performance can be improved, especially at the lower  $f_c$, by complete integration of the system, as symmetry is not easy to maintain in a measurement set-up built from separate modules. This difference is observed in the QPSK Costas loop performance, as it is a fully integrated IC.
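For readers more used to EVM in percent, the quoted values can be converted with the usual relation $\mathrm{EVM}_{dB} = 20\log_{10}(\mathrm{EVM}_{rms})$. The helper name below is an illustrative assumption.

```python
def evm_percent(evm_db):
    """Convert an EVM value in dB to percent (rms)."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


for evm_db in (-20.0, -17.0):
    print(f"{evm_db:.0f} dB -> {evm_percent(evm_db):.1f} %")
```

The two quoted values correspond to about 10 % and 14 % rms, respectively.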

At higher data rates, it is observed that the EVM performance does not degrade anymore with the carrier frequency. The limiting factor at higher data rates is the IF bandwidth of the receiver system, which also includes the circuit blocks of the Costas loop. This is because of the low-pass effect of the IF bandwidth on the baseband I and Q signals which might degrade the performance of the ED.



#### Figure 8.28

Constellation diagrams with BPSK Costas loop synchronization at symbol rates of 1 GBd (a), 2 GBd (b) and 4 GBd (c)



#### Figure 8.29

BPSK Costas loop constellation diagrams with synchronization for high carrier frequencies of  $f_c = 24$  GHz (a) and  $f_c = 34$  GHz (b)

# 8.2.2 QPSK Costas Loop

The BPSK Costas loop has to regenerate a carrier that is in phase with a single carrier. In case of QPSK, we no longer have to deal with one single carrier, but receive a signal that is the addition of two carriers which are orthogonal to each other in phase and modulated by two different data signals  $m_1(t)$  and  $m_2(t)$ , respectively, as shown in Fig. 8.31. If we assume that  $m_1(t)$  modulates the in-phase carrier and  $m_2(t)$  modulates the quadrature carrier, the combined output signal V<sub>QPSK</sub>(t) of the transmitter can be written as in





(8.1). The locally generated VCO signal  $V_{VCO}$  as in (8.2) is multiplied with the received RF signal in order to down-convert the transmitted data signals  $m_1(t)$  and  $m_2(t)$ .







Figure 8.32 Completely integrated QPSK Costas Loop MMIC. From [333] © 2020 IEEE

$$V_{\text{QPSK}}(t) = m_1(t) \cdot \cos(\omega_1 t + \theta_1) - m_2(t) \cdot \sin(\omega_1 t + \theta_1) \tag{8.1}$$

$$V_{\rm VCO} = \cos(\omega_2 t + \theta_2) \tag{8.2}$$
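As a short worked step, the down-conversion can be written out explicitly. The derivation below takes the in-phase carrier to be modulated by $m_1(t)$ and the quadrature carrier by $m_2(t)$, as stated in the text, and assumes an ideal low-pass filter (LPF) that removes the sum-frequency terms. The abbreviations $\Delta\omega = \omega_1 - \omega_2$ and $\Delta\theta = \theta_1 - \theta_2$, and the choice of $-\sin(\omega_2 t + \theta_2)$ as the quadrature LO, are assumptions introduced here for illustration.

$$V_{\text{QPSK}} \cdot \cos(\omega_2 t + \theta_2) \;\xrightarrow{\text{LPF}}\; I(t) = \tfrac{1}{2}\left[ m_1 \cos(\Delta\omega t + \Delta\theta) - m_2 \sin(\Delta\omega t + \Delta\theta) \right]$$

$$V_{\text{QPSK}} \cdot \left(-\sin(\omega_2 t + \theta_2)\right) \;\xrightarrow{\text{LPF}}\; Q(t) = \tfrac{1}{2}\left[ m_1 \sin(\Delta\omega t + \Delta\theta) + m_2 \cos(\Delta\omega t + \Delta\theta) \right]$$

For $\Delta\omega = 0$ and $\Delta\theta = 0$ the two outputs reduce to $m_1/2$ and $m_2/2$. Any residual offset mixes the two data streams, which is what produces the multi-level baseband signals that the limiters have to handle.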

Fig. 8.31 shows the block diagram of the designed QPSK Costas loop along with the transformation of the down-converted signals at each stage. Fig. 8.32 shows the fabricated MMIC. The down-converted I and Q signals are divided in power twice. In order to convert the multi-level signals, which occur in the presence of phase offsets, to two-level signals, limiters are used as shown in Fig. 8.33(a) before the signals are sent to the QPSK error detector (QPSK-ED). The QPSK-ED generates the control signal for the VCO, which then locks to the frequency and phase of the received RF signal [324]. An interesting fact about the QPSK Costas loop is that it can also be used to synchronize BPSK modulated signals. When the Q-channel goes to zero, which is the case for BPSK modulation, the QPSK Costas loop behaves like a BPSK Costas loop, which is marked as the grey region in Fig. 8.31 [334].

The limiter circuit was designed based on a modified, three-stage Cherry-Hooper topology [335]. The smallest amplitude level depends on how large the phase offset is between the local oscillator (LO) and the received carrier. In the worst case, where the phase offset is at its maximum, the amplitude of the signal is at its minimum. In order to convert a multi-level amplitude to a single-level amplitude, a large amplification factor is required for low-level signals and a small amplification factor is required for high-level signals. Fig. 8.34(b) shows the small-signal measurements of the designed limiter circuit. A small-signal gain of around 16 dB up to 50 GHz has been achieved. Fig. 8.35





Circuit schematics for QPSK Costas loop



(a) Simulated tanh characteristics of the limiter (b) S-parameter measurements of the limiter

# Figure 8.34

Simulated tanh characteristics (a) and S-Parameter measurements (b) of the limiter

shows the limiting of the 1 GBd, four-level input signal to a two-level signal at the output of the limiter.
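The four-to-two-level conversion can be illustrated numerically. The following is a minimal sketch, not the measured circuit: it assumes a normalized I-arm level of $m_1 \cos\phi - m_2 \sin\phi$ for a static phase offset $\phi$, and it models the limiter as a single `tanh` stage with unit saturation level and the 16 dB small-signal gain quoted above. The function names and the `tanh` model are hypothetical choices for illustration.

```python
import itertools
import math

def downconverted_i(m1, m2, phase_offset_deg):
    """Normalized I-arm level for data (m1, m2) at a static phase offset."""
    phi = math.radians(phase_offset_deg)
    return m1 * math.cos(phi) - m2 * math.sin(phi)

def limiter(x, small_signal_gain_db=16.0):
    """Hypothetical tanh limiter model with unit saturation level."""
    gain = 10 ** (small_signal_gain_db / 20)  # dB -> linear voltage gain
    return math.tanh(gain * x)

# All four QPSK symbols (m1, m2 in {-1, +1}) at a 30 degree phase offset.
symbols = list(itertools.product((-1, 1), repeat=2))
levels_in = sorted({round(downconverted_i(m1, m2, 30.0), 3) for m1, m2 in symbols})
levels_out = [limiter(v) for v in levels_in]

print("input levels :", levels_in)                         # four distinct levels
print("output levels:", [round(v, 2) for v in levels_out])  # close to -1 or +1
```

With this model the small input levels receive almost the full gain while the large ones are compressed, so the output magnitudes end up nearly equal, which is the behaviour described for the limiter.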

The complete error detector for QPSK Costas loop (QPSK-ED) is formed by combining the two unit cells (ED) used for the BPSK Costas loop as shown



(b) 1 GBd two-level output signal from limiter

#### Figure 8.35

Measurements of the limiter with multi-level input signals

in Fig. 8.33(b). The four input signals to the QPSK-ED are shown in Fig. 8.31. The output currents of the two unit cells are summed at nodes A and B. An emitter-follower stage at the output provides matching to $50 \Omega$.

#### 8.2.2.1 QPSK Costas loop - System Level Simulations

Fig 8.36 shows the simulations performed for the complete QPSK Costas loop. As in the set-up for the BPSK Costas loop, the transmitter and receiver blocks are models from ADS. The VCO is modeled using one of the ADS models, with parameters set to match the commercially available VCO. An active proportional-type loop controller with a gain of 2 and a bandwidth of 10 MHz, similar to the one used in the measurements, has been used for the simulations. The simulation is carried out at a carrier frequency of 24 GHz, a data rate of 1 GBd, a frequency offset ( $f_{off}$ ) of 100 kHz and a phase offset ( $\phi_{off}$ ) of 30 deg.

Fig 8.36(a) and Fig 8.36(b) show the transmitted I-Data and Q-Data, respectively, and Fig 8.36(c) and Fig 8.36(d) show the received I-Data and Q-Data, respectively. The control voltage to the VCO, shown in Fig 8.36(e), settles to a constant DC value that reflects the frequency offset. The loop reaches the lock state after 200 ns. The amplitude variations observed in the control voltage are more severe than in the BPSK simulation results, which is due to the complexity of the QPSK Costas loop. When compared with the BPSK Costas loop, the QPSK Costas loop consists of double


Simulations of QPSK Costas Loop with $f_{\rm off} = 100$ kHz, $\phi_{\rm off} = 30$ deg

the number of loops and a larger number of circuits. It also involves both I and Q signals for the complete running time of the loop, which is not the case in the BPSK Costas loop. These differences make the output of the QPSK-ED noisier. This is one of the reasons why the QPSK Costas loop operates stably only at lower carrier frequencies than the BPSK Costas loop, as observed in the measurements discussed in the following sections.

# 8.2.2.2 QPSK Costas loop - Measurements

Fig. 8.37 shows the measurement setup for the QPSK Costas loop along with the PCB on which the MMIC has been wire-bonded. Fig. 8.38 shows the down-converted QPSK signals after synchronization for different baud rates. For the initial measurements at a carrier frequency of 7 GHz, the measurement set-up is similar to the one used for the BPSK Costas loop.

The IQ-downconverter HMC751 was used for down-conversion and the arbitrary waveform generator (AWG) M8195A was used as a transmitter in





order to generate the RF signal at 7 GHz with different baud rates. Fig. 8.39 shows the constellations of the received signal. A similar observation can be made regarding the effect of the limited bandwidth of the IQ down-converter on the down-converted baseband signals. In the constellations shown in Fig. 8.40(a) and 8.40(b), the constellation points for 8 Gbps appear to vary less than those for 4 Gbps because of the low-pass filtering and the lower number of collected samples at the higher data rate.

Fig. 8.41 shows the variation of EVM with the data rate for the QPSK Costas loop system after synchronization. As observed for the BPSK system, the EVM degrades with increasing data rate and carrier frequency. At a data rate of 2 GBd, an EVM of -23 dB is achieved for  $f_c$  of 7 GHz, while an EVM of -12.5 dB is achieved for  $f_c$  of 24 GHz. Along with the K<sub>VCO</sub>, the EVM performance is also affected by the complexity of the QPSK Costas loop, which contains two loops instead of the single loop of the BPSK Costas loop system. This makes it more difficult to use the QPSK Costas loop at higher  $f_c$  than the BPSK Costas loop.

As mentioned in the earlier section, measurements were also performed using the QPSK Costas loop to recover a BPSK-modulated signal. The EVM performance is shown in Fig. 8.41. For a BPSK-modulated signal, the QPSK Costas loop achieves a better EVM than the BPSK Costas loop, with an EVM of -35 dB at a 1 GBd data rate and  $f_c$  of 7 GHz. The limiter along the I-arm improves the EVM by clipping the amplitude variations of the recovered I-signal.
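To put the quoted EVM values into perspective, they can be converted from dB to percent. The short sketch below assumes the common amplitude-ratio convention $\text{EVM}_{\text{dB}} = 20 \log_{10}(\text{EVM}_{\text{rms}})$; the text does not state which convention the measurements use, so the percentages are indicative only. The helper name `evm_db_to_percent` is hypothetical.

```python
def evm_db_to_percent(evm_db):
    """Convert EVM in dB to percent, assuming EVM_dB = 20*log10(EVM_rms)."""
    return 100.0 * 10 ** (evm_db / 20.0)

# EVM values quoted in the text for the QPSK Costas loop measurements.
quoted = [
    ("QPSK, 2 GBd, fc = 7 GHz", -23.0),
    ("QPSK, 2 GBd, fc = 24 GHz", -12.5),
    ("BPSK via QPSK loop, 1 GBd, fc = 7 GHz", -35.0),
]

for label, evm_db in quoted:
    print(f"{label}: {evm_db} dB -> {evm_db_to_percent(evm_db):.2f} %")
```

Under this assumption the step from 7 GHz to 24 GHz corresponds to roughly a threefold increase in rms error.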



(b) Q channel with 1 GBd

Figure 8.38 Eye plots for QPSK demodulation. From [333] © 2020 IEEE



Constellation for QPSK demodulation at 7 GHz. From [333] © 2020 IEEE

### 8.2.3 Application to mmW receivers

The phase of the receiver-side VCO during the locking process, given by (8.3), shows that two parameters are directly related to the phase variations inside the loop: the angular carrier frequency ( $\omega_c$ ) and the K<sub>VCO</sub>. V<sub>ctrl</sub> and  $\omega_c$  are both tied to K<sub>VCO</sub>, since V<sub>ctrl</sub> depends on K<sub>VCO</sub> and K<sub>VCO</sub> in turn increases with  $\omega_c$. Thus, improving K<sub>VCO</sub> plays a crucial role in making the Costas loop usable for millimeter-wave (mmW) receivers.

$$\theta_2(t, \omega_c, V_{\text{ctrl}}) = \omega_c t + K_{\text{VCO}}(\omega_c) \int V_{\text{ctrl}}(t) \, dt \tag{8.3}$$




### Figure 8.40

Constellation diagrams for  $f_c = 24$  GHz. From [333] © 2020 IEEE



Figure 8.41 EVM performance of QPSK Costas loop. From [333] © 2020 IEEE

### A low sensitivity VCO

The design of a VCO with lower  $K_{VCO}$  will reduce the sensitivity of the Costas loops and improve the performance with respect to the carrier frequency. However, designing a VCO with lower  $K_{VCO}$  at higher frequencies can be challenging because of the increased sensitivity of the capacitors or varactors that control the oscillation frequency. The  $K_{VCO}$  can be reduced by connecting a number of varactors in parallel, but this would also lower the frequency of operation. An optimal solution to this issue should be investigated.

### Synchronization at IF frequency

As explained in the earlier sections, the Costas loops are limited to operation at low carrier frequencies by the  $K_{VCO}$. They can still be used in mmW systems, for example at 300 GHz carrier frequencies, by placing them in a superheterodyne receiver architecture where the synchronization is performed at a lower IF frequency chosen according to the required RF bandwidth. In the case of a





baseband signal with 33 GBd symbol rate, the RF bandwidth of 40 GHz should suffice. In such a case, the received RF frequency can be down-converted to an IF frequency of 20 GHz where synchronization using Costas loop is still viable.

Here, such a heterodyne system with  $f_{IF}$  of 7 GHz is shown for the BPSK case in Fig 8.42. The measurements are performed in an environment similar to the one explained in the section above, except that the RF frequency is 300 GHz, which is down-converted to 7 GHz by using  $LO_1 = RF - f_{IF}$ . In the second stage, a 7 GHz IQ down-converter is used with a VCO at  $LO_2 = 7$  GHz. The following experiments were performed using a BPSK system as a proof of concept.

### 8.2.4 IQ recovery for PSSS modulated signals

In the PSSS architecture using m-sequences as in [336], the IQ-PSSS modulated signal is a sum of I and Q signals that is up-converted to the RF frequency, so that no quadrature relationship exists between the I and Q signals. In such cases, the QPSK Costas loop cannot be used directly after the down-conversion on the receiver side. To solve this issue, we propose a novel concept that uses an intermediate up- and down-conversion stage to establish the quadrature relationship and thus perform synchronization using the designed QPSK Costas loop. The simulation set-up in MATLAB-Simulink is shown in Fig 8.45. The RF signal is modulated by PSSS data and is frequency- and phase-shifted in order to include the non-linear effects of the channel. The receiver side of the set-up consists of the 240 GHz IQ down-converter, which converts the received RF signal to baseband I and Q data signals. The baseband system shown in Fig 8.1 decodes the PSSS I and Q data signals.

After the sample and hold block in baseband part, the multi-level signals are converted into bi-polar signals. These I and Q signals from the baseband




(b) 2 GBd BPSK

### Figure 8.43

Eye diagrams of synchronization of the super-heterodyne system



### Figure 8.44

Constellation diagrams for synchronization of the super-heterodyne system

part are up-converted and then down-converted at carrier frequency of 80 GHz, in order to establish the frequency and the phase information of the carrier. This is followed by the QPSK Costas loop [324] which acquires the frequency and the phase information. The recovered carrier is used as the LO signal for the receiver and as the clock for the baseband part, thus making it a coherent system. The simulations were performed using a unit matrix based PSSS data with the chip rate of 20 Gcps and f<sub>off</sub> of 1 MHz and  $\phi_{off}$  of 45 deg using the block Phase/FrequencyOffset.

The simulation results are shown in Fig 8.46 where I-Data and Q-Data are shown during the synchronization process. The constellation diagrams in Fig 8.47 show the QPSK signal before and after synchronization. The simulations demonstrate that this method enables the QPSK Costas loop to acquire the frequency and the phase information of the received PSSS modulated signals.

Simulation set-up for IQ PSSS synchronization in MATLAB-Simulink

### Figure 8.46

PSSS Signals during the synchronization process

As mentioned earlier, the QPSK Costas loop cannot be used directly for the down-conversion of PSSS modulated signals with m-sequences. This is due to the existing correlation between the I and Q signals, which does not allow for quadrature up-conversion. Another solution to this problem is to use Kasami codes to generate the PSSS signals in order to have less correlation between the I and Q signals, as suggested in [336]. This allows quadrature modulation with no interference between the I and Q signals. This is discussed in detail in section 1.3.1.




### Figure 8.47

Constellation diagrams showing before and after synchronization for IQ-PSSS data of 20 Gcps

### 8.2.5 Summary

The design and implementation of analog BPSK and QPSK Costas loops have been discussed. All the required custom circuit blocks to build the Costas loops were designed, implemented and measured as millimeter-wave monolithic integrated circuits (MMIC) in SiGe BiCMOS technology. An integrated QPSK Costas loop was also implemented. The measurement results presented here demonstrate the synchronization capability and also show the limitations of the Costas loops. Furthermore, the capability of QPSK Costas loops to synchronize both BPSK- and QPSK-modulated signals is demonstrated through measurements.

The measurements were not successful for high-data-rate links above  $f_c$  of 40 GHz because of the increase in the  $K_{VCO}$. Unlike conventional PLL systems, where the VCOs are followed by frequency dividers, the VCOs in the Costas loops are followed by frequency multipliers, which increase the  $K_{VCO}$  by the multiplication factor. The solution is to design a VCO directly at the carrier frequency with as low a sensitivity as possible, in order to make the loop less sensitive to variations of the control voltage.
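The effect of the multiplier follows directly from the form of (8.3). As a short worked step, assume an ideal frequency multiplier with factor $N$ placed after a VCO with free-running angular frequency $\omega_0$; the symbols $N$, $\omega_0$, $\theta_{\text{VCO}}$, $\theta_{\text{out}}$ and $K_{\text{VCO,eff}}$ are introduced here for illustration only.

$$\theta_{\text{VCO}}(t) = \omega_0 t + K_{\text{VCO}} \int V_{\text{ctrl}}(t) \, dt$$

$$\theta_{\text{out}}(t) = N \cdot \theta_{\text{VCO}}(t) = N \omega_0 t + N K_{\text{VCO}} \int V_{\text{ctrl}}(t) \, dt \quad \Rightarrow \quad K_{\text{VCO,eff}} = N \cdot K_{\text{VCO}}$$

Any ripple on the control voltage is therefore translated into $N$ times more phase deviation at the carrier, whereas a divider in a conventional PLL reduces it by the division factor.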

A novel concept has been introduced at simulation level in order to synchronize the IQ-PSSS signals using the designed QPSK Costas loop. The issue of phase ambiguity in the QPSK Costas loop is not investigated in this work. A well-known solution to the phase ambiguity is to use differential QPSK modulation, which decodes the received symbols based on the phase changes rather than the symbol value.

Although the circuit blocks of the loop are designed with ultra-broadband performance, the data rates shown here are lower because, in addition to the  $K_{VCO}$, the parasitic effects of packaging the MMIC with wire-bonds limit the performance. The broadband performance of each individual circuit block and of the integrated QPSK Costas loop can be further improved by improving the packaging techniques.

### 8.3 Spreading Codes

A pseudo-random spreading code is a crucial component of any spread spectrum system. Ideal spreading codes for data transmission have a delta-function autocorrelation and zero cross-correlation. The ideal spreading code would be an infinite sequence of truly random binary digits. Implementing such a spreading code on both the transmitter and receiver side would require unlimited memory, which is not a practical option. Therefore, we use sub-optimal codes that look like Pseudo-Random Noise (PRN) and pass standard statistical randomness tests. A few examples of such statistical tests are the Wald–Wolfowitz test [337] and the Diehard tests [338].

We are looking for sequences with excellent auto-correlation properties<sup>5</sup> and low cross-correlation properties<sup>6</sup> for data transmission. The choice of spreading codes and the advantages/drawbacks of m-sequences and Kasami sequences are detailed in the thesis submitted by Krishnegowda, 2019 [336]. The generation and correlation properties of Kasami sequences are described in the following section since they were used in our experimental work (see Sec. 8.4).

### 8.3.1 Kasami codes for transmission on I/Q channels

For data transmission on I/Q channels, we have to ensure that the codes have the following properties:

- The chosen code should minimize the I–Q cross-leakage during down-conversion at the receiver. The cross-correlation properties of the selected code ensure this.
- The good autocorrelation properties of the chosen code provide the orthogonality used for data transmission.

Ideally, we want to have a cross-correlation value of 0, i.e. ideal orthogonality. *Welch's lower bound* [339] sets a mathematical lower limit to the cross-correlation value of the chosen sequences.
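For reference, one common form of the Welch bound can be stated as follows. For a set of $M$ binary ($\pm 1$) sequences of period $L$, the largest magnitude $\theta_{\max}$ among all out-of-phase periodic autocorrelation values and all periodic cross-correlation values satisfies the inequality below; $M$ and $\theta_{\max}$ are notation introduced here for illustration.

$$\theta_{\max} \geq L \sqrt{\frac{M - 1}{M L - 1}}$$

As a numerical example, for $L = 15$ and $M = 4$ the bound gives $\theta_{\max} \geq 15 \sqrt{3/59} \approx 3.38$. The small Kasami set of this size is known to reach $\theta_{\max} = 2^{r/2} + 1 = 5$ for $r = 4$, which is the smallest magnitude above the bound among the correlation values $\{-5, -1, 3\}$ that occur for this set.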

The cross-correlation is a measure of the similarity between two sequences as a function of the displacement of one sequence relative to the other. Low cross-correlation values mean that two sequences are not similar. This property is useful for separating the data sent on the transmitter's I/Q channels. In comparison to m-sequences, Kasami codes have acceptable autocorrelation properties and good cross-correlation properties. Their correlation functions are therefore close to those of "ideal codes". Families of Kasami codes are built from m-sequences. Kasami sequence sets are among the most commonly used binary sequence sets because of their low cross-correlation [340, 341, 342].

<sup>5</sup>A clear peak without side lobes

<sup>6</sup>The orthogonality of different sequences





The cross-correlation function of any two sequences x and y of length L is given by

$$R_c(n) = \sum_{k=0}^{L-1} x(k) \times y(n-k). \tag{8.4}$$

A Kasami sequence has a period of  $L = 2^r - 1$ , where r is a non-negative even integer. For a given r, the small set of Kasami codes contains  $2^{r/2}$  unique sequences [340].

Let us consider an example for r = 4. Then we have a Kasami sequence of length L = 15. The small set of Kasami codes contains  $2^{4/2} = 4$  unique sequences, which are periodic over a length of 15. These sequences are shown in Eqn. (8.5).

Since Kasami sequences are built through decimation of *m*-sequences, Seq. 1 in Eqn. (8.5) is an *m*-sequence of length 15. Seq. 1 can, e.g., be used to encode the I-channel and Seq. 2 to encode the Q-channel (see Eqn. (8.5)). Thus, we obtain an I-Q transmission system.
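The construction by decimation can be sketched in a few lines. The example below is an illustration under stated assumptions, not the generator used in [336]: it assumes the primitive polynomial $x^4 + x + 1$, the initial state 1000, the textbook small-set construction (decimation of the m-sequence by $2^{r/2} + 1$), a bipolar mapping 0 → +1, 1 → -1, and the lag convention $y(k+n)$ for the cyclic correlation. The resulting sequences may therefore differ from those in Eqn. (8.5) by a cyclic shift or by the choice of polynomial. All function names are hypothetical.

```python
import numpy as np

def m_sequence(n=4, taps=(0, 1)):
    """One period of an m-sequence from a[k+n] = XOR of a[k+t] for t in taps.

    taps=(0, 1) with n=4 corresponds to x^4 + x + 1 (an assumed polynomial).
    """
    bits = [1] + [0] * (n - 1)  # any non-zero initial state works
    while len(bits) < 2**n - 1:
        k = len(bits) - n
        new_bit = 0
        for t in taps:
            new_bit ^= bits[k + t]
        bits.append(new_bit)
    return np.array(bits)

def kasami_small_set(u, r):
    """Small Kasami set built from the m-sequence u of length 2**r - 1 (r even)."""
    L = len(u)
    half = 2 ** (r // 2)
    w = u[((half + 1) * np.arange(L)) % L]  # decimation by 2^(r/2) + 1
    return np.array([u] + [u ^ np.roll(w, s) for s in range(half - 1)])

def cyclic_correlation(x, y):
    """Cyclic correlation of two 0/1 sequences after bipolar mapping."""
    xb, yb = 1 - 2 * x, 1 - 2 * y
    return np.array([int(np.dot(xb, np.roll(yb, -n))) for n in range(len(x))])

codes = kasami_small_set(m_sequence(4), 4)
for i, seq in enumerate(codes, start=1):
    print(f"Seq. {i}:", "".join(map(str, seq)))
print("auto  Seq. 2  :", cyclic_correlation(codes[1], codes[1]))
print("cross Seq. 1/2:", cyclic_correlation(codes[0], codes[1]))
```

In this sketch the first row is the m-sequence itself, matching the role of Seq. 1 in the text, and all out-of-phase and cross-correlation values fall in the set {-5, -1, 3}.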





Cyclic cross-correlation of a Kasami sequence. From [336].

Figure 8.48 shows the cyclic auto-correlation of Seq. 2 as described in Eqn. (8.5). The difference between the maximum peak and the lowest value is 10.8 dB<sup>7</sup>.

To implement an I–Q system with a spreading code, it is desirable to have a very low cross-correlation between the codes used for the different channels to prevent interference between the two channels. Figure 8.49 shows the cross-correlation between Seq. 1 and Seq. 2. In Fig. 8.49, the difference between the maximum peak and the lowest value is 4.8 dB<sup>8</sup>.

To summarize, the advantages of using Kasami sequences are:

- The "acceptable" autocorrelation properties are used for data transmission. In addition, side-lobes that occur due to non-ideal autocorrelation properties (as seen in Fig. 8.48) can be compensated by channel equalization.
- The I-Q cross-leakage that occurs during down-conversion at the receiver can be minimized as the code domain separates the I-Q channels. The cross-correlation property adds an additional layer of orthogonality to facilitate transmission on I-Q channels.

<sup>7</sup>Processing gain (PG) is given by  $PG = 10 \cdot \log_{10}(A)$ , here A is 16.

<sup>8</sup>Processing gain (PG) is given by  $PG = 10 \cdot \log_{10}(A)$ , here A is 3.



Frame structure used for the HiL experiment. From [336].

### 8.4 Measurement Experiments

In this section, we will discuss the Hardware-in-the-Loop (HiL) measurement experiments with PSSS modulation using RF-frontends operating at 230 GHz.

In a HiL experiment, a PSSS modulated baseband signal is transmitted using an RF-frontend operating at THz frequency. The PSSS transceiver operations are pre/post-processed offline in MATLAB/Simulink. During the pre-processing operation, the PSSS modulated symbols are generated, downloaded to an Arbitrary Waveform Generator (AWG), and transmitted over the air. A Real-Time Oscilloscope (RTO) samples and stores the received signal. In the post-processing steps, the PSSS receiver performs synchronization, channel estimation, and demodulation. We demonstrated a PSSS encoding/decoding scheme transmitting wireless data at up to 80 Gbps [343] over the air across a distance of 1 m using a 230 GHz RF-frontend developed by the University of Wuppertal<sup>9</sup>.

### 8.4.1 HiL model for a PSSS-15 Transmitter

Our mixed-signal PSSS system is explained in chapter 7. All the individual blocks in the box "PSSS transmitter" are designed in Matlab/Simulink. Figure 8.50 shows the frame structure used in the HiL experiments. We take 100 symbols for synchronization and channel estimation. By averaging channel estimates over 50 symbols, the noise is reduced. The preamble consists of a repeated pattern of "m-sequences". In this experiment, 3000 PSSS symbols<sup>10</sup> are modulated with PAM-16 carrying 180 000 bits ( $3000 \cdot 15 \cdot 4$ ).
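The payload figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this chapter (PSSS-15, 3000 symbols, PAM-16, 20 Gcps chip rate); the variable names are illustrative, and the resulting rate is the gross payload rate, ignoring the preamble.

```python
import math

VALUES_PER_SYMBOL = 15    # PSSS-15: 15 parallel data values, 15 chips per symbol
BITS_PER_VALUE = 4        # PAM-16 carries log2(16) = 4 bits per value
N_SYMBOLS = 3000          # payload symbols per frame
CHIP_RATE = 20e9          # chips per second (20 Gcps)

payload_bits = N_SYMBOLS * VALUES_PER_SYMBOL * BITS_PER_VALUE
symbol_time = VALUES_PER_SYMBOL / CHIP_RATE      # seconds per PSSS symbol
payload_duration = N_SYMBOLS * symbol_time       # seconds per payload
gross_bit_rate = payload_bits / payload_duration # bits per second

print(f"payload bits     : {payload_bits}")
print(f"symbol time      : {symbol_time * 1e12:.0f} ps")
print(f"payload duration : {payload_duration * 1e6:.2f} us")
print(f"gross bit rate   : {gross_bit_rate / 1e9:.0f} Gbps")
```

The 750 ps symbol time and the 80 Gbps rate obtained this way agree with the values quoted for the HiL experiment.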

The generation of PSSS symbols to be transmitted by the RF-frontend is shown in Fig. 8.51. The data bits are modulated by PAM-16<sup>11</sup> and then passed on to the PSSS encoder. The output of the encoder is routed through the "Multi-Rate Filter", which adjusts the chip-rate to an appropriate sampling

<sup>9</sup>This was part of the Real100G.RF project [292, 344].

<sup>10</sup>The choice of this number of symbols was limited by the AWG memory.

<sup>11</sup>Here the spectral efficiency is 4 bit/s/Hz.



PSSS transmitter. Post-processing performed in Matlab/Simulink. From [343] © 2018 IEEE



### Figure 8.52

PSSS receiver. Post-processing performed in Matlab/Simulink. From [343] © 2018 IEEE

rate of the AWG<sup>12</sup>. The PSSS signal is up-sampled by a factor of two using a multi-rate filter to match the AWG sampling rate of 40 GS/s.

### 8.4.2 HiL model for a PSSS-15 Receiver

As shown in Fig. 8.52, the received down-converted signal from the RF-frontend is digitally sampled by an RTO. Then the PSSS symbol timing and the carrier phase/frequency are recovered by post-processing in Matlab/Simulink. After synchronization, the baseband processing chain continues with channel estimation and channel deconvolution. Finally, the transmitted bits are recovered, and the BER is evaluated. Here, the RTO operates at a sampling rate of 100 GS/s<sup>13</sup>.

### 8.4.3 Demonstrator Setup

Figure 8.53 shows a photograph of the measurement setup in the lab. The transceiver RF-frontend modules are described in [292]. A transceiver operating at 230 GHz [292, 344] was used to set up a 1-meter link as in Fig. 8.54. IHP SiGe 0.13 µm HBT technology [345] is used to produce the Tx and Rx chip-sets. As shown in Fig. 8.54, the LO receives a reference signal of 14.375 GHz. A  $16 \times$  multiplier chain (i.e.,  $16 \times 14.375 = 230$  GHz) is used to generate a carrier signal at 230 GHz.

The frequency synthesizer is capable of generating signals up to 20 GHz. The output signal from the multiplier chain controls the up-/down conversion

 $<sup>^{12}\</sup>mathrm{We}$  used an AWG from Tektronix.

 $<sup>^{13}\</sup>mathrm{We}$  used an RTO from Tektronix.



Measurement setup of the 230 GHz link: a lab photograph.



### Figure 8.54

Measurement setup of the 230 GHz communication link. From [343] © 2018 IEEE






mixer at the Tx/Rx chip-sets. We have used two phase shifters in Fig. 8.54 to keep the losses balanced in both paths. Both mixers are equipped with baseband buffers to interface the high-speed baseband signals (I/Q signals as in Fig. 8.54) to a 50  $\Omega$  link. Each module has an on-chip linearly polarized ring antenna, which radiates through the silicon substrate into a 9 mm silicon lens. With the help of the focusing lens, the modules achieve a directivity of 25.5 dBi at 230 GHz.

The experimental system shown in Fig. 8.54 was used to achieve high-speed communication with PSSS modulation. A Tektronix AWG70000s (40 GS/s sampling rate each) at the transmitter produces the differential baseband signal that goes through the transmitter's I-channel, while the transmitter's Q-channel is grounded. To ensure that the transmitter operates in its linear region, 10 dB attenuators are placed at the output of the AWG. Two differentially operated RTOs (Tektronix DPO70000SX) digitize the received signal from the RF-frontend receiver chip-set. The LO signal is produced by a single frequency synthesizer operating at 14.375 GHz. Power splitters and phase shifters are available to ensure maximum frequency alignment. The Tektronix oscilloscope runs at its maximum sampling rate of 100 GS/s and captures the received baseband signal.

### 8.4.4 Synchronization in HiL experiments

In our HiL experiments, we performed carrier frequency/phase and chip timing recovery offline using MATLAB / Simulink. Figure 8.55 shows the main functional blocks required to achieve synchronization. The received signal is converted to a baseband signal by an RF frontend. The baseband signal is



Limited coherent communication setup. From [343] © 2018 IEEE

sampled by an oscilloscope at a high sampling rate, as shown in Fig. 8.55. This over-sampled signal is fed to the "Chip timing recovery" block for chip timing and to the "Carrier synchronization" block for carrier frequency/phase recovery. Following these steps, demodulation at the receiver recovers the transmitted data.

We used a chip rate of 20 Gcps in our HiL measurement experiment, and we sampled this signal at a sampling rate of 100 GS/s with an oscilloscope. This results in an over-sampling factor of 5. The over-sampled signal is used by the synchronization algorithms developed in MATLAB for the recovery of carrier frequency, carrier phase, and chip timing.

#### 8.4.4.1 Synchronization layers of coherent/non-coherent systems

In a fully coherent system, the clock signals used to generate the RF signal, the chip clock, and the PSSS symbol clock have to be multiples of the same clock reference. There are several synchronization levels, including carrier frequency synchronization and sampling frequency synchronization. Finally, the PSSS symbol is detected to identify the beginning of the data in the payload.

Figure 8.56 shows a limited coherent setup, where the 230 GHz carrier required at both the transmitter and the receiver is generated from a single LO source at 14.375 GHz (see Fig. 8.54). There is a coherence relationship between

the PSSS symbol synchronization, the chip rate (20 Gcps), and the sampling rate of the AWG (40 GS/s) at the transmitter. It should also be noted that there is no coherence between the AWG sampling rate (40 GS/s) and the carrier frequency on the transmitting side. As the same LO signal is fed to the transmitter and receiver chip-sets (see Fig. 8.54), carrier frequency recovery is not necessary.

There is a coherence relationship between the PSSS symbol synchronization, the chip rate (20 Gcps) and the RTO sampling rate (100 GS/s) in the PSSS receiver, but there is no coherence between the RTO sampling rate (100 GS/s) and the carrier frequency (230 GHz). Since the transmitted PSSS modulated signal at 20 Gcps is sampled by an RTO at 100 GS/s, we have an oversampling ratio of 5 samples per chip.

### 8.4.5 Performance results

In this section, we discuss the measurement results of our HiL experiment (Fig. 8.54). The essential results concern channel estimation and channel equalization. We present the eye diagram for BPSK modulation at a chip rate of 20 Gcps. We have also performed PAM-16 modulation at a chip rate of 20 Gcps.

### 8.4.5.1 Channel estimation and equalization

The core idea of the channel estimation method applied in our experiments is to use the Dirac-delta-like autocorrelation function of m-sequences (refer to chapter 7). This is utilized for channel estimation by sending repeated strings of m-sequences in the preamble.

Figure 8.57 illustrates the channel response measured with *m*-sequences before and after channel deconvolution (Fig. 8.54). Figure 8.57(i) shows the channel response measured in the time domain before deconvolution and Fig. 8.57(ii) shows the channel response in the time domain at 20 Gcps after deconvolution. In Fig. 8.57(i), the channel response is spread over three chips, and we observe strong side lobes due to the impairments caused by the RF-frontends. For example, before the deconvolution (see Fig. 8.57(i)), the first side lobe at chip time 3 has an amplitude of 0.4. After channel deconvolution (see Fig. 8.57(ii)), the channel response is limited to a single chip and has a Dirac-delta-like auto-correlation function. As can be seen in Fig. 8.57(ii), the amplitude of the side lobe at chip time 3 is less than zero. Consequently, the total distortion caused by the "effective channel" could be compensated by deconvolution. Here, the "effective channel" consists of the transfer function of the RTO, the AWG, the signal transmitted over the air and the RF-frontend impairments of the Tx/Rx modules<sup>14</sup>.
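The principle can be sketched on a noiseless toy channel. The example below is not the post-processing chain used in the experiment: it assumes a length-15 m-sequence (polynomial $x^4 + x + 1$), a repeated preamble so that the channel acts as a cyclic convolution, a made-up three-tap channel `h_true`, and an FFT-based division as the deconvolution step. The helper names are hypothetical. It uses the fact that the bipolar m-sequence has a cyclic autocorrelation of $L$ at lag zero and $-1$ elsewhere, so the correlator output equals $(L+1)\,h(n) - \sum_m h(m)$.

```python
import numpy as np

# Unipolar m-sequence of length 15 (assumed polynomial x^4 + x + 1).
u = np.array([1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1])
c = 1 - 2 * u            # bipolar preamble chips: 0 -> +1, 1 -> -1
L = len(c)

def circ_conv(x, h):
    """Cyclic convolution, modelling a repeated preamble in steady state."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))))

def estimate_channel(r, c):
    """Channel estimate from the cyclic correlation of r with the m-sequence c."""
    n_chips = len(c)
    corr = np.array([np.dot(np.roll(r, -n), c) for n in range(n_chips)])
    # corr[n] = (L+1)*h[n] - sum(h) and sum(corr) = sum(h) for an m-sequence.
    return (corr + corr.sum()) / (n_chips + 1)

def deconvolve(y, h_est):
    """Frequency-domain deconvolution (assumes H has no spectral zeros)."""
    return np.real(np.fft.ifft(np.fft.fft(y) / np.fft.fft(h_est, len(y))))

h_true = np.zeros(L)
h_true[:3] = [1.0, 0.4, 0.2]          # illustrative response spread over three chips

received = circ_conv(c, h_true)       # preamble after the toy channel
h_est = estimate_channel(received, c)
effective = deconvolve(h_true, h_est) # channel response after deconvolution

print("estimated taps     :", np.round(h_est[:4], 3))
print("after deconvolution:", np.round(effective[:4], 3))
```

In this idealized setting the response after deconvolution collapses to a single chip. With noise, the estimate would be averaged over several preamble repetitions, as described for the frame structure.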

<sup>14</sup>In general, the deconvolution operation corrects only linear distortions. However, our experiment shows that non-linear distortions are also corrected effectively.



(i) Measured channel response before deconvolution for the 230 GHz transmission link. As indicated by the dotted black line window, the channel response is spread over three chips. (ii) Channel response after performing deconvolution. As indicated by the dotted black line window, the channel response is confined to one chip. From [343] © 2018 IEEE

### 8.4.5.2 PSSS modulated data with BPSK

Typically, for a spectral efficiency of 1 bit/s/Hz, there are only two levels in the eye diagram, but here there are several levels, as shown in Fig. 8.58. Each output level in Fig. 8.58 is the result of one of 15 parallel integrate-and-dump correlators (IDCs), and the levels remain constant over a symbol time of 750 ps.

Ideally, each output level that represents the integration result of a single IDC<sup>15</sup> should be mapped to +1 or -1 values. In our HiL experiment setup (Fig. 8.54), the 3 dB RF bandwidth of the transmitter module is 26 GHz

 $<sup>^{15}{\</sup>rm There}$  are 15 parallel IDCs operating concurrently as described in the baseband receiver architecture. Refer to Fig. 8.1.



Eye-diagram for a PSSS modulated signal transmitted at 20 Gbps with BPSK using a 230 GHz RF-frontend. From [336].

(230–250 GHz). At 230 GHz RF frequency, the measured linear gain of the transmitter module is 16 dB [292]. As can be seen in Fig. 8.58, the eye-opening is 86% due to the high linear transmit gain of the power amplifier. The results of these HiL experiments are published in Ref. [343].

### 8.4.5.3 PSSS modulated data with PAM-16

The number of discrete levels that need to be distinguished at the output of the parallel ADCs (refer to chapter 7 and Fig. 8.1) increases with higher spectral efficiency. E.g., to achieve 1 bit/s/Hz, one needs to detect two levels, requiring a 1-bit resolution ADC. The ADC sample rate should be equal to the PSSS symbol rate, i.e., 1.67 GHz. Similarly, in order to reach 4 bit/s/Hz, the ADC should have a 4-bit resolution<sup>16</sup> with a sampling rate of 1.67 GHz. For PSSS transmission at 20 Gcps with PAM-16, the recorded BER is  $2.072 \times 10^{-3}$  as described in Ref. [343].

As outlined in chapter 7, the PSSS modulation uses only the I-channel of the transmitter. In order to achieve higher spectral efficiency, PAM modulation can be combined with PSSS encoding. E.g., PSSS with PAM-4 results in a bit loading of 2 bits and PSSS with PAM-16 leads to a bit loading of 4 bits [346, 347].

Figure 8.59 shows a modulated PSSS signal with a spectral efficiency of 4 bit/s/Hz (or with a PAM-16 overlay of data bits) at 20 Gcps to reach

 $<sup>^{16}\</sup>mathrm{The}$  number of discrete levels that need to be detected is 16.



PSSS transmission at a chip-rate of 20 Gcps with PAM-16 to achieve 80 Gbps. From [343] © 2018 IEEE

80 Gbps. Figure 8.59(a) shows the transmitted PAM-16 symbols, Fig. 8.59(b) illustrates the received symbols before channel deconvolution at the receiver, and Fig. 8.59(c) shows the received symbols after channel deconvolution. By comparing Fig. 8.59(a) and Fig. 8.59(c), it is evident that we can recover the transmitted PAM-16 symbols. This plot also demonstrates the major impact of channel equalization through deconvolution. The LO leakage at the receiver limits the IF bandwidth to 14.375 GHz. Thus, we were not able to target a chip rate above 20 Gcps.

As described in Ref. [348], using the same RF-frontend as in Fig. 8.54 with 16-QAM modulation the maximum data rate of 100 Gbps was achieved. It should be noted that other baseband operations such as synchronization, channel estimation, and equalization are not addressed in the paper. However,

both I/Q channels of the transmitter were used in Ref. [348]. We demonstrated a data rate of 80 Gbps by using only the I-channel at the PSSS transmitter. If we apply the same methods as outlined using Kasami sequences on the I and Q channels, we can achieve the target of above 100 Gbps.

### 8.4.6 Kasami codes transmission on I/Q channels

As seen in the measurement experiment in the previous section, the PSSS transmitter uses only the I-channel of the transmit RF-frontend to send the modulated data, and the Q-channel is not used (see Fig. 8.54). We investigated methods to double the data rate by utilizing the Q-channel of the transmitter. We adopted the idea of parallel sequence spread spectrum transmission to transmit data on both I/Q channels using Kasami sequences as outlined in Sec. 8.3.1.

For a Kasami code length of 15, there are 4 unique sequences available (Seq 1, ..., Seq 4) (see Sec. 8.3.1), possessing acceptable auto-correlation and good cross-correlation properties. One of the attractive features of Kasami codes is that one of these 4 sequences is an m-sequence<sup>17</sup>. Since m-sequences are known to possess good auto-correlation properties, Seq 1 is the only sequence in the Kasami family that possesses both good auto-correlation and good cross-correlation properties.
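The family of four length-15 sequences can be reproduced with a few lines of Python. The following is a minimal sketch, assuming the small Kasami set built from the m-sequence of the primitive polynomial x^4 + x + 1. The polynomial, the function names, and the order of the non-m-sequence members are our assumptions and need not match the labels Seq 2 to Seq 4 used in this chapter.

```python
import numpy as np

def m_sequence(degree=4, taps=(0, 1)):
    """Binary m-sequence of length 2**degree - 1.

    Recurrence: a[n + degree] = XOR of a[n + t] for t in taps.
    degree=4 with taps=(0, 1) corresponds to x^4 + x + 1 (assumed here).
    """
    a = [1] + [0] * (degree - 1)
    while len(a) < 2**degree - 1:
        n = len(a) - degree
        a.append(sum(a[n + t] for t in taps) % 2)
    return np.array(a, dtype=int)

def kasami_small_set(a, degree=4):
    """Small Kasami set: the m-sequence plus its XOR with every shift of its decimation."""
    n = len(a)
    step = 2 ** (degree // 2) + 1            # decimation factor (5 for degree 4)
    w = a[(step * np.arange(n)) % n]         # decimated sequence, period 2**(degree//2) - 1
    return [a] + [a ^ np.roll(w, -k) for k in range(2 ** (degree // 2) - 1)]

def periodic_corr(x, y):
    """Periodic correlation of two binary sequences after mapping 0/1 to +1/-1."""
    bx, by = 1 - 2 * x, 1 - 2 * y
    return np.array([int(bx @ np.roll(by, -k)) for k in range(len(x))])

family = kasami_small_set(m_sequence())
for i, s in enumerate(family, start=1):
    print(f"member {i}: {''.join(map(str, s))}  autocorrelation: {periodic_corr(s, s).tolist()}")
print("member 1 x member 2:", periodic_corr(family[0], family[1]).tolist())
```

In this construction the m-sequence has an off-peak periodic autocorrelation of -1 at every lag, while all other auto- and cross-correlation values within the family are taken from {-5, -1, 3}. If the 4.8 dB figure quoted later in this section is read as the ratio of the autocorrelation peak (15) to the largest cross-correlation magnitude (5), it would correspond to 10 log10(3) ≈ 4.77 dB; this reading is our assumption and is not stated in the source.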

### 8.4.6.1 I-Q transceiver system with Kasami codes

Fig. 8.60 shows the general concept of how to encode and decode using Kasami codes. The I and Q domains are used as two independent channels. The I-channel operates with a cyclically shifted bipolar encoding matrix generated from "Seq 1"<sup>18</sup>, and the Q-channel operates with a cyclically shifted bipolar encoding matrix generated from "Seq 2"<sup>19</sup>. Before encoding, each channel can be overlaid with a PAM modulation of 1/2/3/4 bit/s/Hz spectral efficiency. The encoded signal from the "Encoder 1" block is connected to the I-channel of the RF-frontend, and the output signal from "Encoder 2" is connected to the Q-channel of the RF-frontend (Fig. 8.61).

The unipolar decoding matrix is generated in the receiver from "Seq 1" for the I-channel and from "Seq 2" for the Q-channel. Using these matrices, the two separate streams of transmitted data are reconstructed in the receiver.
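The encode/decode principle can be illustrated with a small numerical sketch. It assumes an ideal channel with no noise and no I-Q leakage, symmetric PAM-16 levels, and a hypothetical choice of "Seq 2" (one non-m-sequence member of the length-15 small Kasami set); the function names are ours and are not taken from the source.

```python
import numpy as np

SEQ1 = np.array([1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1])  # m-sequence (x^4 + x + 1)
SEQ2 = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0])  # hypothetical second Kasami member

def cyclic_matrix(row):
    """Matrix whose k-th row is `row` cyclically shifted right by k chips."""
    return np.array([np.roll(row, k) for k in range(len(row))])

def encode(data, seq):
    """Sum of the data-weighted, cyclically shifted bipolar codes (one PSSS symbol)."""
    return data @ cyclic_matrix(2 * seq - 1)

def decode(chips, seq):
    """Correlate the chips with every cyclic shift of the unipolar code."""
    return cyclic_matrix(seq) @ chips

rng = np.random.default_rng(0)
pam16 = np.arange(-15, 16, 2)            # assumed symmetric PAM-16 alphabet
d_i = rng.choice(pam16, size=15)         # 15 parallel symbols on the I-channel
d_q = rng.choice(pam16, size=15)         # 15 parallel symbols on the Q-channel

# I-channel: unipolar x bipolar correlation of the m-sequence is 8 at lag 0 and 0 elsewhere
d_i_hat = decode(encode(d_i, SEQ1), SEQ1) / SEQ1.sum()

# Q-channel: the code correlation has sidelobes, but it is known, so it can be inverted
G_q = cyclic_matrix(SEQ2) @ cyclic_matrix(2 * SEQ2 - 1).T
d_q_hat = np.linalg.solve(G_q, decode(encode(d_q, SEQ2), SEQ2))

print("I recovered:", np.allclose(d_i_hat, d_i))
print("Q recovered:", np.allclose(d_q_hat, d_q))
```

The matrix `G_q` is invertible for the sequence chosen here. For a different family member this should be checked (for example with `np.linalg.cond`) before relying on the inversion.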

<sup>17</sup>Here it is named as Seq 1.

<sup>18</sup>This is same as m-sequence of length 15.

<sup>19</sup>Here we take Seq 2 to encode Q-Channel. But it does not matter if we take Seq 3 or Seq 4 because all of these sequences possess the same cross-correlation properties w.r.t. Seq 1.





Orthogonal PAM overlay modulation using a different set of Kasami basecodes on I and Q channel. From [336].

### 8.4.6.2 Measurement setup of the 230 GHz communication link with Kasami codes

The RF-frontend is described in Sec. 8.4.3. The only difference in this configuration is that the Q-channel of the transmit RF module is not grounded (see Fig. 8.61), whereas it is grounded in Fig. 8.54. The AWG produces independent data streams on the I and Q channels using "Seq 1" and "Seq 2", respectively, at a chip rate of 10 Gcps<sup>20</sup> with a spectral efficiency of 4 bit/s/Hz (i.e., PAM-16 overlay modulation). The sampling rate of the AWG is set to 40 GS/s.

The RTO captures the incoming data from the RF-frontend receiver module at a rate of 100 GS/s, and the data is stored in the memory of the receiver. Synchronization, channel estimation, and recovery of the transmitted data symbols are performed by post-processing the data in Matlab/Simulink. Synchronization is performed in the same way as described in Sec. 8.4.4.1 and illustrated in Fig. 8.56.

<sup>20</sup>The restriction is due to LO leakage and I-Q cross leakage at the receiver. More on this later.



Transmission experiments with a 230 GHz link by using Kasami codes. From [336, 344].

### 8.4.6.3 Channel estimation with Kasami codes

The measured channel response on the I-Q channels, before and after deconvolution for the 230 GHz transmission link, is shown in Fig. 8.62. In the transmitted frame of the I-channel, the preamble consists of a repeated vector of "Seq 1", and the Q-channel preamble consists of a repeated vector of "Seq 2".

The primary purpose of using two different sequences on the I and Q channels is to counter cross-channel leakage at the receiver after down-conversion, i.e., part of the energy leaking from I to Q and vice versa. This is where cross-correlation plays a critical role: Kasami codes have a cross-correlation value of around 4.8 dB<sup>21</sup>, which is close to the perfect cross-correlation value of 0 dB. Since the I-channel uses an *m-sequence*, its channel response is spread over only 3 chips, as shown in Fig. 8.62(a)(i), and this is also corrected by channel deconvolution, as depicted in Fig. 8.62(a)(ii).

The excellent cross-correlation between "Seq 1" and "Seq 2" comes at the cost of the non-ideal auto-correlation of "Seq 2". This can be observed in Fig. 8.62(b)(i), where the sidelobes of the Q-channel channel estimate are spread over 13 chips. However, as can be seen in Fig. 8.62(b)(ii), no sidelobes remain after the channel deconvolution scheme is applied. The error caused by the non-ideal auto-correlation is "deterministic" and can therefore be corrected by deconvolution.
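Why a deterministic sidelobe pattern can be removed is easiest to see in the frequency domain. The sketch below is an illustration under stated assumptions, not the authors' Matlab/Simulink processing: it assumes a noiseless, cyclic 3-chip channel `h` with made-up tap values, a repeated preamble so that all operations are cyclic, and a hypothetical choice of "Seq 2".

```python
import numpy as np

SEQ1 = np.array([1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1])  # m-sequence (x^4 + x + 1)
SEQ2 = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0])  # hypothetical second Kasami member

def cyclic_xcorr(x, y):
    """c[m] = sum_n y[n] * x[n + m] for all cyclic lags m, computed via the FFT."""
    return np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y))).real

def estimate_channel(seq, h):
    """Return the channel estimate before and after deconvolution."""
    bipolar = 2 * seq - 1
    rx = np.fft.ifft(np.fft.fft(bipolar) * np.fft.fft(h)).real  # preamble after the cyclic channel
    raw = cyclic_xcorr(rx, seq)        # correlation with the unipolar code
    ref = cyclic_xcorr(bipolar, seq)   # known correlation of the code with itself
    deconv = np.fft.ifft(np.fft.fft(raw) / np.fft.fft(ref)).real
    return raw / seq.sum(), deconv

h = np.zeros(15)
h[:3] = [1.0, 0.45, 0.2]               # assumed channel taps, spread over three chips

for name, seq in [("I / Seq 1", SEQ1), ("Q / Seq 2", SEQ2)]:
    raw, deconv = estimate_channel(seq, h)
    print(name, "before:", np.round(raw, 2).tolist())
    print(name, "after: ", np.round(deconv, 2).tolist())
```

For the m-sequence the raw estimate already equals `h`, because its unipolar/bipolar correlation is a single peak. For the second sequence the raw estimate is `h` convolved with the known sidelobe pattern `ref`, and dividing by the spectrum of `ref` restores `h`. This division requires that the spectrum of `ref` has no zeros, which holds for the sequence chosen here.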

<sup>21</sup>The cross-correlation between Seq 1 and Seq 2 is 4.8 dB.





(a) Measured channel response of the I-channel for the 230 GHz transmission link, using the setup of Fig. 8.61 at a chip rate of 10 Gcps. The I-channel uses "Seq 1", which possesses ideal auto-correlation properties. (i) Before deconvolution: as indicated by the dotted black window, the channel response is spread over three chips. (ii) After channel deconvolution: as indicated by the dotted black window, the channel response is confined to one chip.

(b) Measured channel response of the Q-channel for the 230 GHz transmission link, using the setup of Fig. 8.61 at a chip rate of 10 Gcps. The Q-channel uses "Seq 2", which does not possess ideal auto-correlation properties like an m-sequence. (i) Before deconvolution: the dotted black window indicates that the channel response is spread over 13 chips. (ii) After deconvolution: as indicated by the dotted pink window, the channel response is confined to one chip.

Measured channel response before and after deconvolution for the 230 GHz transmission link. From [336].

### 8.4.6.4 Kasami codes with PAM-16

A PSSS-modulated signal at 10 Gcps with PAM-16 was transmitted independently on each of the I and Q channels, resulting in a cumulative data rate of 80 Gbps.
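The quoted rates follow from simple arithmetic if the spectral efficiency of 4 bit/s/Hz is read as 4 bits per chip. This reading and the symbols $N_{\mathrm{ch}}$, $R_{\mathrm{chip}}$, and $\eta$ are introduced here for illustration only:

$$
R_b = N_{\mathrm{ch}} \cdot R_{\mathrm{chip}} \cdot \eta
$$

$$
R_b^{\mathrm{I/Q}} = 2 \cdot 10\,\mathrm{Gcps} \cdot 4\,\mathrm{bit/chip} = 80\,\mathrm{Gbps},
\qquad
R_b^{\mathrm{I\ only}} = 1 \cdot 20\,\mathrm{Gcps} \cdot 4\,\mathrm{bit/chip} = 80\,\mathrm{Gbps}
$$

Each of the I and Q channels thus carries 40 Gbps at 10 Gcps, and using both channels at half the chip rate gives the same total as the I-only experiment at 20 Gcps.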

Figure 8.63(a) depicts the transmitted PAM-16 symbols on the I-channel of the transmitter at 10 Gcps, Fig. 8.63(b) shows the received symbols before channel deconvolution at the receiver, and Fig. 8.63(c) shows the received symbols after channel deconvolution. Comparing Fig. 8.63(a) and Fig. 8.63(c), it is clear that the transmitted PAM-16 symbols can be recovered.

Independent I-channel of PSSS transmission at a chip-rate of 10 Gcps with PAM-16 to achieve 40 Gbps. From [336].

Figure 8.64(a) depicts the transmitted PAM-16 symbols on the Q-channel of the transmitter at 10 Gcps, Fig. 8.64(b) shows the received symbols before channel deconvolution at the receiver, and Fig. 8.64(c) shows the received symbols after channel deconvolution. Comparing Fig. 8.64(a) and Fig. 8.64(c), it is evident that the transmitted PAM-16 symbols can be recovered.

Independent Q-channel of PSSS transmission at a chip-rate of 10 Gcps with PAM-16 to achieve 40 Gbps. From [336].

Here, we must point out that we carried out the I-Q transmissions using repeated data symbols, and these symbols are reproduced as shown in Fig. 8.63 and Fig. 8.64. In principle, we have shown that two separate symbols can be transmitted independently on the I and Q channels. A further, extensive measurement campaign is still needed to evaluate the BER performance. From the channel equalization plots for the I and Q channels (see Fig. 8.62(a) and Fig. 8.62(b)), we can conclude that the deterministic error occurring on the Q-channel can be completely corrected.



I-Q channel separation measured at the receiver. From [336].

The largest achievable chip rate depends on the I-Q channel cross-leakage and the LO leakage of the 230 GHz receiver module [349]. The LO leakage imposes a sharp limit of 15 GHz on the accessible IF bandwidth. It is caused by feed-through of the LO signal back into the I-Q baseband down-converter circuits.

Figure 8.65 shows the I-Q channel separation at the receiver for different chip rates for the 230 GHz RF-frontend. A larger I-Q channel separation means less I-Q cross-leakage during down-conversion at the receiver. For example, in Fig. 8.65 the I-Q separation is 21 dB at 5 Gcps but only 3 dB at 20 Gcps. This factor therefore limits the chip rate to 10 Gcps, at which the I-Q separation is 17 dB.
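To get a feeling for these numbers, the separation values can be converted to linear leakage ratios. The snippet below is a small helper sketch; whether the separation in Fig. 8.65 is a power ratio or an amplitude ratio is not stated in the text, so both conventions are offered and the default (power) is our assumption.

```python
def leakage_ratio(separation_db, amplitude=False):
    """Linear leakage relative to the wanted channel for an I-Q separation in dB.

    amplitude=False treats the dB value as a power ratio (10*log10),
    amplitude=True as an amplitude ratio (20*log10).
    """
    return 10 ** (-separation_db / (20 if amplitude else 10))

# (chip rate in Gcps, I-Q separation in dB) as quoted in the text
for rate_gcps, sep_db in [(5, 21), (10, 17), (20, 3)]:
    print(f"{rate_gcps:>2} Gcps: {sep_db:>2} dB -> "
          f"power ratio {leakage_ratio(sep_db):.4f}, "
          f"amplitude ratio {leakage_ratio(sep_db, amplitude=True):.4f}")
```

Under the power-ratio reading, a separation of 3 dB means that roughly half of the power of one channel appears in the other, which is consistent with 20 Gcps being unusable for independent I-Q transmission in this setup.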

### 8.4.7 Summary

The PSSS encoded waveform output is a discrete multi-level signal. A PSSS modulated signal always uses *m*-sequences for encoding and is fed only to the I-channel of the transmitter frontend. We have discussed our HiL experimental setup, and the experiments show that PSSS is a suitable candidate for modulation for transmission in the THz frequency range. A PSSS modulated waveform was transmitted over a 230 GHz link to achieve a maximum data rate of 80 Gbps. For the first time, we have adopted the idea of PSSS transmission on both the I and Q channels by using Kasami codes. We transmitted a Kasami-coded signal at 10 Gcps with 4 bit/s/Hz spectral efficiency independently on the I and Q channels, which results in a combined data rate of 80 Gbps.

From our HiL experiments, we can say that the main limitations for targeting higher chip rates are LO leakage, the available RF/IF bandwidth, and I-Q cross-leakage. The key limiting factor for achieving higher spectral efficiency is the linear gain of the transmit power amplifier in the THz radios.

### 8.5 Conclusion

Innovative and efficient wireless technologies for a range of transmission links need to be developed to meet the challenge of attaining higher data rates. We need to focus our thinking on the design of complete "End-2-End Systems" consisting of the baseband, the MAC layer, and the RF-frontend.

In this chapter, we presented research on a PSSS-based transceiver architecture and Costas loop-based RF carrier synchronization that enables a large portion of the baseband signal processing to be effectively implemented in the analog domain. This includes the use of parallel DACs and ADCs operating at the comparatively low PSSS symbol rate, thereby significantly reducing the data converter requirements. SiGe BiCMOS technology was used to implement the critical circuits for receiver signal processing, such as the broadband integrate-and-dump correlator, and their measurement results were presented. The design and implementation of an analog BPSK and QPSK Costas loop have been discussed. All the custom circuit blocks required to build the Costas loops were designed, implemented, and measured as MMICs in SiGe BiCMOS technology.

Finally, we have demonstrated a data rate of 80 Gbps using PSSS modulation with PAM-16 in our HiL measurement experiment by using the RF-frontends from the Real100G.RF project. We have shown that Kasami codes can be used for data transmission on both I/Q channels to achieve a maximum data rate of 80 Gbps.

In the future, the research goal is to implement a full PSSS transmitter and receiver with RF synchronization to investigate the suitability of a PSSS-based transceiver. In addition, we have to integrate the baseband developed in Real100G.COM, the MAC layer shown in End2End, and the RF-frontends built in Real100G.RF, which leads to a complete "100 Gbps demonstrator".