Abstract-This paper presents a nonlinear Q-switching impedance (NLQSI) technique for picosecond pulse radiation in silicon. A prototype chip is designed with four NLQSI-based impulse generation channels, which can produce picosecond pulses with a reconfigurable amplitude. An on-chip impulse-coupling scheme combines the outputs from four channels and delivers the combined signal to an on-chip antenna. In addition, an asynchronous optical-sampling measurement system is used to characterize the radiated picosecond pulses in the time domain. The prototype chip can radiate 4-ps pulses with an SNR > 1 bandwidth of 161 GHz. Furthermore, pulse amplitude modulation is experimentally demonstrated. The prototype chip is fabricated in a 130-nm SiGe BiCMOS process technology with a die area of 1 mm².
I. INTRODUCTION
TRADITIONALLY, signal generation in the millimeter-wave (mm-wave) and terahertz (THz) regimes is performed using continuous-wave (CW) or pulse techniques [1], [2]. As shown in Fig. 1, CW sources produce long pulses in time with a small bandwidth (BW), while pulse sources generate short pulses in time with a large BW. Recently, mm-wave and THz signals have been used in applications such as high-resolution 3-D imaging [3], [4], biomedical sensing [5], high-speed wireless communication links [6], and broadband spectroscopy [7].
Over the past few decades, photonic techniques have been used for signal generation in the mm-wave and THz regimes. These techniques include femtosecond-laser-gated photoconductive antennas (PCAs), photomixing, and quantum cascade lasers [8]-[10]. Photonic-based solutions require laser sources, which makes the whole system bulky and expensive. Recently, fully electronic laser-free sources have been reported that produce CW signals in the mm-wave and THz regimes [11]-[18]. These sources are based on high-speed transistors that can achieve an f t (current gain cutoff frequency) above 300 GHz and an f max (maximum oscillation frequency) exceeding 400 GHz [11], [19]. The majority of the work published on mm-wave and THz sources is based on CW techniques that are narrowband and contain only a few frequency tones. Compared with these methods, far less research has been performed to produce broadband picosecond pulses in the mm-wave and THz regimes. It is important to note that the picosecond pulses discussed in this paper should have a broad frequency spectrum extending above 100 GHz. Pulse-shaping techniques, based on mixing harmonics and nonlinear transmission lines, are beyond this paper's scope [20]-[22]. In 2014, a silicon-based digital-to-impulse (D2I) architecture was reported that radiated sub-10 ps impulses [23], [24]. This technique was based on switching a dc current flowing through a broadband on-chip antenna, which converts stored magnetic energy into a radiated picosecond pulse. In D2I, a pulse-matching circuitry was used to reduce the duration of the radiated pulse. Since then, arrays of D2I radiators have been published in the pursuit of generating short pulses with a large radiated power based on spatial power combining [25], [26].

Manuscript received July 1, 2016; revised September 26, 2016; accepted October 25, 2016. Date of publication November 18, 2016; date of current version December 7, 2016. This work was supported in part by the National Science Foundation and in part by the W. H. Keck Foundation. An earlier version of this paper was presented at the 2016 IEEE MTT-S International Microwave Symposium, San Francisco, CA, USA, May 22-27, 2016.
The authors are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005 USA (e-mail: peiyu.chen@rice.edu; assefzadeh@gmail.com; aydin.babakhani@rice.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMTT.2016.2623700
Apart from the source technology, the accurate detection of picosecond pulses via electronic methods has been a major challenge for the circuits and microwave community. Over the past 20 years, researchers have been using electronic oscilloscopes to perform time-domain characterizations of picosecond pulses that are generated by silicon chips [27]. Unfortunately, the fastest off-the-shelf real-time oscilloscope has a BW of 100 GHz and a rise time of 4.5 ps (Teledyne LeCroy LabMaster 10 Zi-A Oscilloscope), which is not sufficient to characterize sub-10 ps pulses [28]. In addition, these methods require broadband calibrations of antennas/probes, waveguides, coaxial connectors, and coaxial cables, which makes the measurement process complicated and time-consuming [27].

(Figure caption: Step response of a parallel RLC tank.)
In this paper, a novel circuit based on nonlinear Q-switching impedance (NLQSI) is proposed that produces picosecond pulses with a reconfigurable amplitude. A prototype chip is implemented that contains four NLQSI-based impulse generation channels and an on-chip impulse-coupling scheme, which combines the outputs of the four channels and delivers the combined pulses to an on-chip antenna. The chip is characterized with an asynchronous optical sampling (ASOPS) system that provides a measurement BW up to 4 THz. The time-domain measurements demonstrate that the prototype chip radiates 4-ps pulses with an SNR > 1 BW of 161 GHz. In addition, pulse amplitude modulation is experimentally demonstrated. The prototype chip is fabricated in a 130-nm SiGe BiCMOS process technology and occupies a die area of 1 mm².
This paper is an extension of [29] with further simulation analysis and measurement results. The remainder of the text is organized as follows. The NLQSI technique is discussed in Section II, while Section III presents the design of the prototype NLQSI-based picosecond pulse radiator chip. The characterization results of the prototype chip are presented in Section IV, including a brief introduction to the ASOPS time-domain measurement system used in this paper for picosecond pulses, as well as the measured NLQSI-induced tunable pulse amplitudes and measured pulse amplitude modulation. Conclusions are offered in Section V.
II. NONLINEAR Q-SWITCHING IMPEDANCE TECHNIQUE
The idea of NLQSI originates from the step responses of a parallel RLC tank. In order to understand its mechanism, in this section, the step response of a parallel RLC tank is briefly reviewed and then the method of NLQSI is introduced.
A. Step Response of a Parallel RLC Tank
A parallel RLC tank, as shown in Fig. 2, has two types of step responses: an overdamped/critically damped response and an underdamped response.¹ The response of the tank depends on its damping rate α and resonant frequency ω0, which are defined as follows:

$$\alpha = \frac{1}{2RC} \tag{1}$$

$$\omega_0 = \frac{1}{\sqrt{LC}} \tag{2}$$

where R, L, and C are the resistance, inductance, and capacitance of the tank. Equivalently, the step response of a parallel RLC tank can also be determined by investigating the quality factor of the tank (Q tank), which represents the ratio of the stored energy to the energy dissipated in a circuit. Q tank can be expressed as follows:

$$Q_{\mathrm{tank}} = 2\pi\,\frac{\text{maximum energy stored}}{\text{total energy lost per cycle at resonance}} = R\sqrt{\frac{C}{L}} = \frac{\omega_0}{2\alpha} \tag{3}$$

¹ A loss-less response is excluded from this discussion.
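As a numerical illustration of how the damping rate, the resonant frequency, and Q tank determine the type of step response, the following Python sketch evaluates the standard ideal parallel-RLC expressions (α = 1/(2RC), ω0 = 1/√(LC), Q tank = R√(C/L)). The component values are assumptions chosen only for illustration: L = 40 pH is the inductance selected later in this paper, while C = 20 fF and the two resistances are hypothetical.

```python
import math

def tank_parameters(r, l, c):
    """Return (alpha, omega0, q_tank) for an ideal parallel RLC tank."""
    alpha = 1.0 / (2.0 * r * c)        # damping rate, rad/s
    omega0 = 1.0 / math.sqrt(l * c)    # resonant frequency, rad/s
    q_tank = r * math.sqrt(c / l)      # equals omega0 / (2 * alpha)
    return alpha, omega0, q_tank

def regime(r, l, c):
    """Classify the step response from the quality factor."""
    _, _, q_tank = tank_parameters(r, l, c)
    return "underdamped" if q_tank > 0.5 else "overdamped/critically damped"

# Assumed illustrative values (not taken from the paper, except L = 40 pH).
L_TANK, C_TANK = 40e-12, 20e-15
for r in (500.0, 15.0):  # hypothetical high and low load resistances
    alpha, omega0, q = tank_parameters(r, L_TANK, C_TANK)
    print(f"R = {r:5.0f} ohm: Q_tank = {q:.2f}, {regime(r, L_TANK, C_TANK)}")
```

With these assumed values, the larger resistance places the tank in the underdamped regime and the smaller one in the overdamped regime, which is the contrast that the Q-switching idea relies on.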
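To simplify the analysis, the current source is considered to have an ideal step response.

1) Underdamped Response (α² < ω0² or Q tank > 0.5): When the damping rate α is smaller than the resonant frequency ω0, or equivalently, Q tank is larger than 0.5, the tank displays an underdamped response. In this case, the voltage across the tank, V(t), and the current flowing through the inductor L, I L(t), both exhibit exponentially decaying oscillations. Analytical expressions of V(t) and I L(t) can be obtained by solving the following differential equation:

$$\frac{d^2 I_L(t)}{dt^2} + 2\alpha\,\frac{dI_L(t)}{dt} + \omega_0^2\, I_L(t) = 0 \tag{4}$$

with initial conditions of

$$I_L(0^+) = I_0 \tag{5}$$

$$\left.\frac{dI_L(t)}{dt}\right|_{t=0^+} = 0 \tag{6}$$

where I 0 is the steady-state current flowing through the tank before switching off. The underdamped response of the parallel RLC tank can be expressed as

$$I_L(t) = I_0\, e^{-\alpha t}\left[\cos(\omega_d t) + \frac{\alpha}{\omega_d}\sin(\omega_d t)\right] \tag{7}$$

$$V(t) = \frac{\omega_0^2 L I_0}{\omega_d}\, e^{-\alpha t}\sin(\omega_d t) \tag{8}$$

where V(t) is referenced such that its first half-cycle is positive, and ω d is the oscillation frequency, defined as

$$\omega_d = \sqrt{\omega_0^2 - \alpha^2} \tag{9}$$

Therefore, the oscillation period is

$$T = \frac{2\pi}{\omega_d} = \frac{2\pi}{\sqrt{\omega_0^2 - \alpha^2}} \tag{10}$$

From (8) and (9), we observe that reducing the damping rate increases the oscillation frequency but also extends the duration of the decay.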
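The trade-off between oscillation frequency and decay time can be explored numerically. The sketch below evaluates the standard closed-form underdamped solution of an ideal parallel RLC tank whose inductor initially carries I 0 with zero tank voltage. The sign convention (V = -L dI L/dt, so that the first half-cycle is positive) and all component values except L = 40 pH are assumptions made for illustration.

```python
import math

def _rates(r, l, c):
    """Return (alpha, omega_d) and reject tanks that are not underdamped."""
    alpha = 1.0 / (2.0 * r * c)
    omega0 = 1.0 / math.sqrt(l * c)
    if alpha >= omega0:
        raise ValueError("tank is not underdamped (alpha >= omega0)")
    return alpha, math.sqrt(omega0**2 - alpha**2)

def underdamped_response(t, r, l, c, i0):
    """Return (v, i_l) at time t after the source current steps to zero."""
    alpha, omega_d = _rates(r, l, c)
    decay = math.exp(-alpha * t)
    i_l = i0 * decay * (math.cos(omega_d * t)
                        + alpha / omega_d * math.sin(omega_d * t))
    v = i0 / (omega_d * c) * decay * math.sin(omega_d * t)  # v = -L di_l/dt
    return v, i_l

def half_cycle_duration(r, l, c):
    """Duration of the first half-cycle, pi / omega_d."""
    _, omega_d = _rates(r, l, c)
    return math.pi / omega_d

# Assumed values: R = 500 ohm, C = 20 fF, I0 = 20 mA are hypothetical.
R, L, C, I0 = 500.0, 40e-12, 20e-15, 20e-3
print(f"half-cycle duration: {half_cycle_duration(R, L, C) * 1e12:.2f} ps")
```

For these assumed values the half-cycle lasts a few picoseconds, in line with the pulse durations discussed in this paper, although the actual circuit is nonlinear and is not captured by this closed form.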
2) Overdamped/Critically Damped Response (α² ≥ ω0² or Q tank ≤ 0.5): When the damping rate α is larger than or equal to the resonant frequency ω0, or equivalently, Q tank is smaller than or equal to 0.5, the tank presents an overdamped or critically damped response. The current flowing through the inductor L, I L(t), decays to zero exponentially but does not produce a large voltage response V(t) across the tank. In this case, there is no oscillatory behavior in either the current or voltage responses.
B. Switching From Underdamped to Overdamped to Produce Short Pulses
To produce a large voltage response in a short time with minimal ringing, a parallel RLC tank is designed such that it can switch from an underdamped response to an overdamped response by switching the load resistance of the parallel tank or, equivalently, by switching the quality factor of the tank (Q tank). In this case, when the falling edge of the excitation current arrives, the tank responds in the underdamped regime, producing a half-cycle oscillation, which is the desired oscillation shown in Fig. 3. The duration of the first half-cycle is half of the oscillation period, T/2 = π/ω d, which can be as short as several picoseconds, according to simulations. The ringing, labeled "Undesired Oscillations" in Fig. 3, needs to be eliminated by forcing the tank to switch to and stay in an overdamped response by reducing the load resistance R such that α² > ω0². In this switching event, Q tank is decreased from Q U, which sustains an underdamped response, to Q O, which supports an overdamped response. Therefore, there exists a Q-switching event around the transition time t 0, which is shown in Fig. 3.

C. NLQSI Block

1) Overview:
Fig. 4(a) shows the reported NLQSI block for picosecond pulse generation. A bipolar transistor Q 1 acts as a current source. When the base node of transistor Q 1 is pulled to 0 V by a voltage falling edge, the parallel tank starts its step response. A bipolar transistor Q 2 monitors the output voltage V out(t) and autonomously changes the quality factor of the parallel RLC tank (Q tank). The parasitic capacitance and resistance of transistors Q 1 and Q 2 play an important role in the performance of this NLQSI block. During the step response, transistors Q 1 and Q 2 can be modeled as a parallel combination of R Q1 and C Q1 and that of R Q2 and C Q2, respectively, as shown in Fig. 4(a). The proposed NLQSI block is nonlinear because these parasitic elements, especially R Q2 and C Q2, vary significantly depending on the output voltage V out(t). Fig. 4(b) demonstrates the nonlinearity of these parasitic elements when V out is swept from 0.5 to 3 V, V cc is 1.3 V, and V bias is 2.5 V. The values of V cc and V bias are chosen for the proper operation of NLQSI blocks, which will be discussed later. During the step response, transistor Q 1 is OFF because its base node is set to 0 V. Therefore, transistor Q 1 exhibits little nonlinearity with V out.
However, transistor Q 2 turns OFF when V out is larger than V bias − V BE(ON),Q2. As a result, transistor Q 2 is the dominant contributor to the nonlinearity of the NLQSI block and is therefore crucial in shaping the generated ultrashort pulses.
2) Q-Switching Mechanism: The Q-switching mechanism of the NLQSI block is as follows [Fig. 5(a)]: in the steady state, a dc current flows through the inductor L and V out equals V cc. The tank is designed to have an overdamped response by biasing transistor Q 2 to be ON. Stage I: When the current falling edge arrives, the tank exhibits an overdamped response, increasing V out. Consequently, transistor Q 2 is turned OFF and Q tank is increased so that it is larger than 0.5. Therefore, the first Q-switching event is triggered at the transition time t 1, and the tank enters the underdamped regime (Q tank > 0.5). Stage II: In the underdamped response, the "desired oscillation" is generated until V out drops below a certain value, at which point transistor Q 2 turns ON. The small ON resistance of transistor Q 2 (R Q2) reduces Q tank. Q tank should be decreased enough to force the tank to switch to an overdamped response (Q tank < 0.5). Therefore, the second Q-switching event happens at the transition time t 2. Stage III: The tank stays in the overdamped response, and V out decays exponentially to V cc, which eliminates the undesired ringing. Eventually, the tank returns to the steady state, where V out stays at V cc. The time-domain evolution of Q tank(t) is illustrated in Fig. 5(b). It will be shown later that the shape of the generated pulse is dependent on Q tank(t), which is affected by the bias voltage of transistor Q 2 (V bias), the emitter lengths of both transistors Q 1 and Q 2, and the inductance of L.
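The three-stage behavior can be mimicked with a deliberately simplified toy model, sketched below in Python. It is not the transistor-level circuit of this paper: transistor Q 2 is replaced by an ideal resistor that is hard-switched between a low value (Q 2 ON) and a high value (Q 2 OFF) whenever V out crosses V bias − V BE(ON),Q2, and the transistor capacitances are lumped into one fixed C. The values of V_BE_ON, R_ON, R_OFF, C_TANK, and I0 are hypothetical; only V cc = 1.3 V, V bias = 2.3 V, and L = 40 pH are taken from the text.

```python
V_CC, V_BIAS, V_BE_ON = 1.3, 2.3, 0.8        # V_BE_ON is an assumed value
V_TH = V_BIAS - V_BE_ON - V_CC               # Q2 is ON while v = V_out - V_cc < V_TH
R_ON, R_OFF = 15.0, 500.0                    # hypothetical tank resistances (ohm)
L_TANK, C_TANK, I0 = 40e-12, 20e-15, 20e-3   # C_TANK and I0 are hypothetical

def simulate(resistance_of, dt=1e-15, t_end=40e-12):
    """RK4 integration of C dv/dt = i_l - v/R and L di_l/dt = -v."""
    def deriv(v, i_l, r):
        return (i_l - v / r) / C_TANK, -v / L_TANK

    v, i_l = 0.0, I0                 # state just after the current falling edge
    times, volts = [0.0], [0.0]
    for step in range(1, int(round(t_end / dt)) + 1):
        r = resistance_of(v)         # load resistance held constant within a step
        k1v, k1i = deriv(v, i_l, r)
        k2v, k2i = deriv(v + 0.5 * dt * k1v, i_l + 0.5 * dt * k1i, r)
        k3v, k3i = deriv(v + 0.5 * dt * k2v, i_l + 0.5 * dt * k2i, r)
        k4v, k4i = deriv(v + dt * k3v, i_l + dt * k3i, r)
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        i_l += dt * (k1i + 2 * k2i + 2 * k3i + k4i) / 6.0
        times.append(step * dt)
        volts.append(v)
    return times, volts

def tail_amplitude(times, volts, t_min=15e-12):
    """Largest |v| after t_min, used here as a simple measure of ringing."""
    return max(abs(v) for t, v in zip(times, volts) if t >= t_min)

t_sw, v_sw = simulate(lambda v: R_ON if v < V_TH else R_OFF)  # Q-switched tank
t_fx, v_fx = simulate(lambda v: R_OFF)                        # always underdamped

print(f"Q-switched : peak {max(v_sw):.2f} V, tail {tail_amplitude(t_sw, v_sw):.4f} V")
print(f"fixed R_OFF: peak {max(v_fx):.2f} V, tail {tail_amplitude(t_fx, v_fx):.4f} V")
```

In this toy model the Q-switched tank still produces a large first half-cycle, but its late-time ringing is far smaller than that of the tank that stays underdamped, which is the qualitative behavior described for Stages I to III.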
3) Effects of V bias of Transistor Q 2 : V bias of transistor Q 2 has significant effects on the performance of the NLQSI block. Fig. 6 shows the simulation results when V bias is swept from 1.5 to 2.5 V. In this simulation, the emitter lengths of transistors Q 1 and Q 2 are 15 and 2.5 μm, respectively.
As shown in Fig. 6(a), when V bias remains below 2.1 V, which is smaller than the sum of V cc and V BE(ON),Q2, undesired oscillations do not disappear after the transition time t 2. This is because transistor Q 2 turns OFF again and the tank stays in the underdamped condition, which is validated by the simulated time-domain evolution of Q tank(t), as shown in Fig. 6(c) (1)-(3). For the situations where V bias is smaller than 2.1 V, Q tank is always larger than 0.5, meaning that the tank stays in an underdamped response with a time-varying damping rate. Around the transition time t 2, Q tank is reduced along with the decreasing V out, but afterward it returns to a high level that sustains the ringing (undesired oscillations). Another observation from the simulation is that the amplitude of the ringing decreases with increasing V bias. This is because, with increasing V bias, the tank has a smaller Q tank, or equivalently a larger damping rate α, for a longer period of time, as shown in Fig. 6(c) (1)-(3). The amplitude of the ringing depends on the minimum value that Q tank can reach after the transition time t 2. In all these cases, transistor Q 2 adds a very large OFF resistance to the RLC tank, and consequently, the tank has an almost identical Q tank before the transition time t 2, which explains why the peak amplitudes remain almost constant. In summary, in situations where V bias is smaller than the sum of V cc and V BE(ON),Q2, the underdamped response dominates the entire transient response.
When V bias is larger than the sum of V cc and V BE(ON),Q2, for example, when V bias is in the range from 2.3 to 2.5 V, transistor Q 2 is ON as long as V out is smaller than or equal to V cc. Therefore, transistor Q 2 remains ON after the transition time t 2 and forces the tank to remain in the overdamped condition, eliminating undesired ringing, as shown in Fig. 6(b). The simulated Q tank(t) also reflects that the designed Q-switching mechanism is achieved when V bias is larger than V cc + V BE(ON),Q2 [Fig. 6(b) and (c) (5)-(7)]. In these situations, transistor Q 2 turns ON completely in the steady state and Q tank is smaller than 0.5. When the current falling edge arrives, the tank is in the overdamped response (Stage I), where V out increases and turns OFF Q 2, boosting Q tank to be larger than 0.5. The tank enters Stage II and has an underdamped response until around the transition time t 2, where V out is smaller than V bias − V BE(ON),Q2 and transistor Q 2 is turned ON again. At this moment, Q tank becomes smaller than 0.5, and the tank exhibits an overdamped response, switching from Stage II to Stage III. Unlike the previously discussed cases (V bias < 2.1 V), after the transition time t 2, Q tank is always smaller than 0.5 and, therefore, the tank stays in the overdamped response (Stage III), completely eliminating the ringing. As a result, a clean pulse is generated. With increasing V bias, the first Q-switching event happens later and the tank stays in the overdamped response (Stage I) for a longer period of time, causing more energy loss and, therefore, reducing the peak amplitude of the generated pulse. The comparison between the Q tank(t) in these two cases (V bias = 1.5 and 2.5 V) is plotted in Fig. 6(c) (8).
When V bias is 2.1 V, as shown in Fig. 6 (c) (4), the simulated Q tank (t) clearly reflects that this bias condition is in the transition to the proper Q-switching mechanism. In sum, when V bias is larger than the sum of V cc and V BE(ON),Q2 , transistor Q 2 is completely ON in the steady state, and the NLQSI block has a desired Q-switching mechanism to generate a clean picosecond pulse with tunable peak amplitude.
4) Choosing Emitter Lengths of Transistors Q 1 and Q 2 :
The emitter lengths of transistors Q 1 and Q 2 have crucial effects on the performance of the NLQSI block. Under the proper bias condition (V bias = 2.3 V), if the emitter length of transistor Q 1, l e,Q1, is increased, the steady-state current flowing through the RLC tank is increased as well. Fig. 7(a) shows the simulation results where l e,Q1 is swept from 2 to 15 μm. A step voltage is applied to the base of transistor Q 1 to mimic an ideal predriver stage. As expected, a larger transistor Q 1 generates a pulse with a greater peak voltage, but the duration of the pulse becomes larger as well. In this simulation, by choosing a length of 15 μm for transistor Q 1, the full-width-at-half-maximum (FWHM) of the generated pulse becomes 2.7 ps. Fig. 7(b) demonstrates the simulated Q tank(t) with different l e,Q1. Except for the case where l e,Q1 is 2 μm, the NLQSI block has the proper Q-switching mechanism. With a larger transistor Q 1, the tank is in the underdamped response (Stage II) for a longer period of time, causing the duration of the pulse to be larger. In addition, when l e,Q1 is increased, the tank switches from Stage I (overdamped response) to Stage II (underdamped response) earlier, causing less energy loss, and meanwhile, more energy (E = (1/2)L I²) is initially stored in the tank. As a result, the peak amplitude of the generated pulse is increased with l e,Q1. In practice, a larger transistor Q 1 slows down the switching speed due to its large base capacitance, but this issue can be mitigated by designing a proper predriver stage with strong current-discharging capability. In this paper, transistor Q 1 is designed to have an emitter length of 15 μm, which is chosen as a compromise between the pulsewidth (FWHM) and the pulse amplitude.

For transistor Q 2, when its emitter length l e,Q2 is increased, transistor Q 2 adds more parasitic capacitance and less parasitic resistance to the tank. Fig. 8 shows the simulated effects of l e,Q2 on the performance of the NLQSI block. With a larger transistor Q 2, the NLQSI block generates a wider pulse with a smaller peak amplitude, as shown in Fig. 8(a). Fig. 8(b) demonstrates that, with V bias = 2.3 V, the NLQSI blocks with l e,Q2 ranging from 2.5 to 15 μm all have the designed Q-switching mechanism. Furthermore, a larger transistor Q 2 produces a larger Q tank in Stage I (overdamped response) and a smaller Q tank in Stage II (underdamped response). The relationship between the pulse peak amplitude and l e,Q2 can be explained as follows. First, Fig. 8(c) shows that a larger transistor Q 2 causes more energy loss during the pulse generation period (Stages I and II). Second, with a larger transistor Q 2 turning ON in the steady state, less dc current flows through the inductor L, and consequently, less initial energy (E = (1/2)L I²) is stored in the tank. Together, these two effects decrease the peak and dip amplitudes of the generated pulse with a larger transistor Q 2. [...] (Stage II) is decreased with l e,Q2, which explains the observation in Fig. 8(a) that the pulsewidth of the generated pulse is increased with l e,Q2. In this paper, transistor Q 2 is designed to have an emitter length of 2.5 μm to enable Q-switching without degrading the pulse amplitude.
5) Inductance L: According to (8) and (9), the peak amplitude and the pulsewidth of the generated pulse are dependent on the inductance of L. Fig. 9 demonstrates the simulated effects of sweeping L from 30 to 60 pH with the proper bias condition of V bias = 2.3 V. As expected, with increasing L, the generated pulse has a larger peak amplitude but also a greater pulsewidth [Fig. 9(a)]. The effects of the inductor L can also be investigated from the viewpoint of Q tank(t). With the proper bias condition, the NLQSI blocks have the desired Q-switching mechanism. According to (1) and (3), a larger inductor L reduces Q tank without changing the damping rate α. Fig. 9(c) and (d) validates this point and shows that a larger inductor L tends to provide the tank with a longer underdamped response (Stage II), which produces less energy loss in the pulse generation period. In addition, a larger inductor L stores more initial energy (E = (1/2)L I²) in the steady state, and consequently, the peak and dip amplitudes of the generated pulse are increased with a larger inductor L. Equation (2) also indicates that, with a larger inductor L, the resonant frequency ω0 of the tank is reduced. Because the damping rate α is independent of the inductance of L, according to (9), the oscillation frequency in the underdamped response is decreased with L, resulting in the increased pulsewidth.
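These trends can be checked with a short Python sweep over the same inductance range. The sketch assumes an ideal linear parallel RLC tank with fixed R and C, whereas in the actual NLQSI block both vary with V out; the values R = 500 Ω, C = 20 fF, and I0 = 20 mA are hypothetical.

```python
import math

def sweep_inductance(l_values, r=500.0, c=20e-15, i0=20e-3):
    """Tabulate ideal-tank quantities versus L (r, c, i0 are assumed values)."""
    rows = []
    for l in l_values:
        alpha = 1.0 / (2.0 * r * c)              # independent of L
        omega0 = 1.0 / math.sqrt(l * c)          # decreases with L
        omega_d = math.sqrt(omega0**2 - alpha**2)
        rows.append({
            "L": l,
            "alpha": alpha,
            "q_tank": r * math.sqrt(c / l),      # decreases with L
            "energy": 0.5 * l * i0**2,           # initial stored energy, J
            "half_cycle": math.pi / omega_d,     # first half-cycle duration, s
        })
    return rows

rows = sweep_inductance([30e-12, 40e-12, 50e-12, 60e-12])
for row in rows:
    print(f"L = {row['L'] * 1e12:.0f} pH: Q = {row['q_tank']:.1f}, "
          f"E = {row['energy'] * 1e15:.1f} fJ, "
          f"half-cycle = {row['half_cycle'] * 1e12:.2f} ps")
```

Within this idealization, a larger L stores more initial energy and lengthens the half-cycle while leaving α unchanged, consistent with the simulated trade-off between peak amplitude and pulsewidth.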
The design strategy here is to maximize the SNR > 1 BW, because a broad detectable BW is crucial for spectroscopy and imaging applications. As shown in Fig. 9(b), the NLQSI block with a smaller inductor L produces picosecond pulses with a larger SNR > 1 BW. However, the actual measured SNR > 1 BW is also dependent on the sensitivity of the detector. If the peak amplitude of the generated pulse is very weak, the measured SNR > 1 BW will be smaller than the theoretical (simulated) value. Furthermore, in practice, an inductor smaller than 35 pH is difficult to implement in the process used in this paper, because the parasitic effects of the neighboring routings are significant. Therefore, based on the above considerations, the inductance of L is set to 40 pH in this paper.
III. CIRCUIT AND ANTENNA DETAILS
The NLQSI technique reported in this paper is implemented in a 130-nm SiGe BiCMOS process technology. In this section, first, an overview of the system architecture is presented. Second, an impulse-coupling scheme is introduced to resolve the issues caused by the transmission line effects in the NLQSI circuit. Third, the design details of the individual impulse generation channel and the four-way impulse combiner are presented. In addition, bias routings in the NLQSI blocks are specially designed. Finally, the design of the broadband on-chip antenna is discussed.

A. System Architecture

Fig. 10 shows the architecture of the radiating chip, which consists of four impulse generation channels. A trigger signal is fed to the chip and distributed to these channels through an H-tree distribution network. Each channel converts the input trigger into a picosecond pulse train with the same repetition rate as the trigger. An impulse-coupling scheme is introduced to resolve the issues induced by the transmission line effects in implementing NLQSI blocks. Based on this scheme, a four-way microstrip-based impulse combiner is designed to combine the generated picosecond pulses from the four channels and feed the on-chip antenna. The four-way impulse combiner also enables pulse amplitude modulation by controlling the individual impulse generation channels. A highly resistive silicon lens is attached to the backside of the radiator chip in order to increase the radiation efficiency and directivity of the on-chip antenna.

B. Impulse-Coupling Scheme

1) Transmission Line Effects in Implementing NLQSI Blocks:
The chip uses on-chip metal routings, wirebonds, and PCB traces for biasing and power supply, as shown in Fig. 11. The transmission line effects of these wirings need to be carefully considered in the transient analysis. For stability purposes, De-Qing blocks, each consisting of a capacitor and a resistor in series, are placed along the on-chip metal routings. As the supply current varies in a short period of time, the inductance of the metal routings adds a strong low-frequency ringing to the picosecond pulse at V out(t) (Fig. 11). Although the on-chip antenna is optimized for high-frequency radiation, the strong low-frequency ringing will still be radiated, which may cause interference with other low-frequency channels. Therefore, it is important to eliminate the low-frequency ringing before feeding the impulse to the antenna.
2) Impulse-Coupling Block: In this paper, an impulse-coupling block is introduced to isolate the input of the on-chip antenna from the strong low-frequency ringing caused by the transmission line effects of the supply wirings. Fig. 12(a) shows the NLQSI circuit with the impulse-coupling block, which is based on microstrip inductive coupling, consisting of two adjacent 50-μm-long metal lines fabricated on the top metal layer AM, as shown in Fig. 12(b). Inductor L 1 is implemented by a metal line with a width of 4 μm, while inductor L 2 is implemented by another line with a width of 5 μm. The metal layer M3 is used as a ground plane. In order to increase the mutual coupling between the two metal lines, the spacing between them is set to 2.8 μm, which is the minimum value allowed by the DRC rules for the process technology.
A simplified equivalent circuit of the designed inductive coupler is shown in Fig. 12(c). The inductive mutual coupling dominates the coupling mechanism, as the parasitic capacitors between L 1 and L 2 are very small (less than 2 fF). The parasitic capacitors associated with the ground plane are omitted here. EM simulations show that the designed inductive coupler has a broadband performance. As shown in Fig. 12(c), L 1, L 2, and the inductive mutual coupling coefficient k are flat from 10 to 200 GHz. At 100 GHz, L 1 is 40 pH, L 2 is 37 pH, and k is 0.45. To perform impulse coupling in the NLQSI circuit, the inductive coupler is set to the following configuration: the collector node of transistor Q 1 is connected to Port 1, Port 2 connects with the V cc of 1.3 V, the connection of Port 3 is changeable, which will be discussed later, and Port 4 connects with the load. The inductive mutual coupling and the small parasitic capacitors between L 1 and L 2 filter out the low-frequency ringing in the forward-coupling direction, which is from Port 1 to Port 4. Fig. 12(d) shows the simulated output voltage of the NLQSI block with the introduced impulse-coupling block when Port 3 is grounded. At the antenna input node, the low-frequency ringing is filtered out and a clean impulse-like waveform is fed to the on-chip antenna. Compared with the generated impulse without the impulse-coupling scheme, as shown in Fig. 11, the peak amplitude of the coupled pulse at Port 4 is reduced by around 50%, as expected. In addition, the impulse coupler induces very little distortion. The small downward pulses at 0 and 1 ns, shown in Fig. 12(d), are produced by the rising edges of the converted 1 GHz square wave, which switch ON transistor Q 1. These "parasitic" pulses can be eliminated by turning ON transistor Q 1 using a waveform with a slow rising edge, which, however, may limit the repetition rate of the generated pulse train.
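The high-pass nature of the inductive coupling can be illustrated with the quoted 100-GHz values. The Python sketch below computes the mutual inductance from the standard relation M = k√(L 1 L 2) and the mutual reactance 2πfM at two frequencies. It ignores the parasitic capacitances, the ground plane, and the port terminations, so it is only a first-order picture, not the EM-simulated coupler; the 1-GHz comparison frequency is an arbitrary choice.

```python
import math

def mutual_inductance(k, l1, l2):
    """Mutual inductance M = k * sqrt(L1 * L2) of two coupled inductors."""
    return k * math.sqrt(l1 * l2)

def mutual_reactance(f_hz, m):
    """Magnitude of the mutual reactance, 2*pi*f*M, in ohms."""
    return 2.0 * math.pi * f_hz * m

# Values quoted in the text at 100 GHz.
M = mutual_inductance(0.45, 40e-12, 37e-12)
x_low = mutual_reactance(1e9, M)      # low-frequency content (ringing)
x_high = mutual_reactance(100e9, M)   # high-frequency content (impulse)

print(f"M = {M * 1e12:.2f} pH")
print(f"mutual reactance: {x_low:.3f} ohm at 1 GHz, {x_high:.2f} ohm at 100 GHz")
```

Because the induced voltage in the secondary is proportional to M dI/dt, slowly varying supply ringing couples far more weakly than the picosecond impulse, which is consistent with the filtering described above.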
C. Impulse Generation Channel
The schematic of an impulse generation channel is shown in Fig. 13(a). The input trigger is fed into a digital inverter chain through a one-to-four H-tree distribution network. The input impedance of the digital inverter chain is designed to be 200 Ω in order to reduce reflections. The digital inverter chain converts the input sinusoidal trigger into a square wave, which switches transistor Q 1 in the NLQSI block. In order to switch off transistor Q 1 quickly, an 18 μm bipolar transistor Q 3 is added to provide an additional discharging path from the base node of transistor Q 1. The maximum trigger frequency is 1 GHz, which is limited by the BW of the digital inverter chain. V digital of each impulse generation channel needs to be optimized, which will be discussed later. Fig. 13(b) shows the simulated transient waveform applied to the base node of transistor Q 1, which has a fall time of 13.6 ps.
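One plausible reading of the 200-Ω choice is that four such loads in parallel present 50 Ω to the trigger source. The Python sketch below checks this arithmetic. The 50-Ω source impedance and the treatment of the H-tree as an ideal junction without line effects are assumptions that are not stated in the text.

```python
def parallel(*impedances):
    """Equivalent impedance of resistive loads connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)

def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient of z_load seen from a z0 source."""
    return (z_load - z0) / (z_load + z0)

# Four inverter-chain inputs of 200 ohm each; Z0 = 50 ohm is an assumption.
z_in = parallel(200.0, 200.0, 200.0, 200.0)
gamma = reflection_coefficient(z_in, 50.0)
print(f"combined input impedance = {z_in:.1f} ohm, reflection = {gamma:.3f}")
```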
D. Four-Way Impulse Combiner
The four-way impulse combiner designed in this paper consists of four impulse couplers stacked in series, as shown in Fig. 14(a) . The spacing between the neighboring channels is identical. The second winding of the combiner connects with the on-chip antenna at one end and is grounded at the other. The collector nodes of transistor Q 1 in the four channels are connected to Ports 1-4, respectively. The routings in the NLQSI block in each channel are included in the EM simulations of the impulse combiner. Fig. 14(b) illustrates the design methodology of the impulse combiner: the impulse combiner structure, with routings in the NLQSI blocks, is first simulated in a method of moments-based EM simulator (Mentor Graphics HyperLynx Full-Wave Solver) in a wide frequency range from 1 to 400 GHz. The extracted S-parameters are then imported into Cadence Virtuoso as an N-port S-parameter box, which is connected with Assura-RC-extracted circuits of the impulse generation channels and an S-parameter box of the on-chip antenna. To investigate the performance of the designed impulse combiner, transient simulations are performed in Cadence Virtuoso to examine the combined pulse delivered to the on-chip antenna.
By switching on only one impulse generation channel at a time, the transient voltage at the antenna input is simulated and shown in Fig. 15(a). Compared with the simulation result shown in Fig. 12(d), the ringing in the coupled pulse is slightly increased, which is mainly caused by the port connection configuration being different from that in Fig. 12(a) or, equivalently, by the different load impedance seen by the NLQSI block. The coupled pulses from Channels 1 and 2 are almost identical and their arrival times at the antenna's input port are equal. However, the pulses from Channels 3 and 4 have smaller peak amplitudes and arrive at different times at the antenna node. This is because the impulse combiner is not fully electromagnetically symmetrical among all four channels. Channels 1 and 2 see a similar load impedance from the impulse combiner. The value of this impedance varies slightly for Channels 3 and 4. Timing mismatch is more detrimental to the performance of the pulse combiner than the mismatch in the peak amplitudes because it distorts the pulse shape of the combined signal. In this design, the timing mismatch can be compensated by tuning the propagation delay of the trigger in each channel by varying the supply voltage V digital of the digital inverter chain in that channel. The optimized V digital values are as follows: V digital,CH1 is 1.2 V, V digital,CH2 is 1.2 V, V digital,CH3 is 1.21 V, and V digital,CH4 is 1.19 V. Fig. 15(b) presents the aligned coupled pulses after adjusting the supply voltages of the digital inverter chains. The differences in their arrival times are within 0.5 ps. With different ON-OFF combinations of the four channels, the combined pulse can have 16 peak amplitudes, which can be used for pulse amplitude modulation purposes. The simulated peak amplitudes in these 16 combinations are shown in Fig. 16.
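The 16 ON-OFF combinations can be enumerated with a few lines of Python. The sketch assumes that time-aligned pulses add linearly at the antenna input and uses hypothetical normalized per-channel amplitudes (equal for Channels 1 and 2, smaller for Channels 3 and 4). The actual combined amplitudes in Fig. 16 come from circuit simulation and need not follow this simple superposition.

```python
from itertools import product

# Hypothetical normalized peak amplitudes of Channels 1-4 (not from the paper).
CHANNEL_AMPLITUDES = (1.0, 1.0, 0.9, 0.85)

def combined_amplitudes(amplitudes):
    """Map each ON-OFF state (tuple of 0/1) to a linearly combined amplitude."""
    return {
        state: sum(on * a for on, a in zip(state, amplitudes))
        for state in product((0, 1), repeat=len(amplitudes))
    }

levels = combined_amplitudes(CHANNEL_AMPLITUDES)
for state, amplitude in sorted(levels.items(), key=lambda item: item[1]):
    print("".join(str(bit) for bit in state), f"{amplitude:.2f}")
```

Under linear superposition, channels with identical amplitudes yield repeated levels, so the number of distinct amplitudes in this toy picture depends on how much the channels differ.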
E. Bias Routings in the NLQSI Block
Thin and long on-chip bias routings have significant parasitic effects at high frequencies. Because of these effects, the bias nodes of circuit blocks no longer behave as an ideal ground. To mitigate this problem, in this paper, two wide metal planes on the M1 and M2 layers are used as the V bias and V cc planes, respectively, which are placed directly beneath the ground plane (M3) of the impulse combiner, as shown in Fig. 17(a). The advantages of this design are as follows.
1) The self-inductance of a large metal plane is much smaller than that of a long and thin metal line.
2) A large distributed capacitance is formed between these layers and the ground plane. As a result, the two wide metal planes, together with the ground plane, can be considered as two transmission lines with a small characteristic impedance Z 0 = √(L/C). With a modest length, the designed bias routing planes present a broadband low impedance at the bias nodes. Fig. 17(b) shows the impedance of the bias routings at the V cc and V bias nodes of the NLQSI blocks from 1 to 200 GHz.
3) The ground plane on M3 isolates the impulse combiner from the bias routings, eliminating the undesired mutual couplings.
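The small Z 0 of the wide bias planes can be made plausible with the standard parallel-plate approximation. The sketch below assumes a plane of width w that is much larger than its separation h from the ground plane and neglects fringing fields. The symbols w, h, μ, and ε are introduced here for illustration only, and L' and C' denote per-unit-length values.

```latex
\begin{align}
  L' &\approx \mu \,\frac{h}{w}, &
  C' &\approx \varepsilon \,\frac{w}{h}, \\
  Z_0 &= \sqrt{\frac{L'}{C'}} \approx \sqrt{\frac{\mu}{\varepsilon}}\;\frac{h}{w}.
\end{align}
```

Under these assumptions, Z 0 scales with h/w, so a wide plane placed close to the ground plane lowers both the inductance and the characteristic impedance.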
F. On-Chip Antenna
In this paper, a single triangular metal sheet on the top metal layer (AM), together with a slot in the ground plane (M3), is designed to couple the radiation to the silicon substrate. The triangular shape is used to support broadband radiation [30]. For assembly purposes, a silicon slab is placed between the silicon chip and the silicon lens, as shown in Fig. 18(a). All these components are included in the EM simulations during the design phase; the geometric details are noted in Fig. 18(b). An FEM-based 3-D EM simulator, HFSS v13, is used to simulate the antenna in the frequency domain. The on-chip antenna has a relatively flat input impedance from 50 to 200 GHz [Fig. 19(a)]. The simulated radiation efficiency is shown in Fig. 19(b). It will be shown later that the measured radiated picosecond pulses have a peak frequency component around 54 GHz. At this peak frequency, the designed antenna has a 19% simulated radiation efficiency and a 16 dBi simulated peak directivity. The designed antenna has greater radiation efficiencies at higher frequencies, which compensates for the weak high-frequency components of the generated picosecond pulses. Simulated 2-D and 3-D radiation gain patterns at 54 GHz are presented in Fig. 19(c). The main lobe in the E-plane pattern is tilted because the on-chip antenna is not symmetrical in the E-plane. A finite-difference time-domain-based 3-D EM simulator, CST Microwave Studio 2015, is used to simulate the time-domain E-field waveforms of the radiated picosecond pulses. An ideal far-field E-field probe is used in the simulation to capture the radiated pulse waveform. As shown in Fig. 20, at the straight angle (θ = 180°), the radiated waveform has a FWHM of 3.6 ps when V bias is 2.3 V. The peak amplitude of the radiated pulse is reduced by 38% when V bias is increased to 2.5 V. Compared with the simulated excitation signal at the antenna's input node, shown in Fig. 15, the additional ringing in the radiated pulse waveform is caused by the antenna's resonances [31].
IV. CHIP CHARACTERIZATION
Conventionally, electronic oscilloscopes (real-time sampling or equivalent-time sampling) with antennas or probes are used to sample short pulses in the time domain. As discussed in Section I, this method has major drawbacks. First, the fastest off-the-shelf electronic oscilloscopes currently have a rise time of 4.5 ps [32], which is not sufficiently fast to measure picosecond pulses accurately. Second, in this method, the picosecond pulses received by antennas/probes have to be transferred to electronic oscilloscopes through waveguides, coaxial cables, and coaxial adapters. Therefore, these blocks need to be accurately de-embedded by performing a broadband calibration, which is complicated, time-consuming, and prone to error [27].
In this paper, we built a time-domain measurement system based on ASOPS for characterizing the picosecond pulses radiated by the designed silicon chip. ASOPS was originally introduced in THz time-domain spectroscopy (THz-TDS), where ultrashort EM pulses are generated by the ASOPS system and are used to perform spectroscopy analysis of passive or active samples [33]. However, in this paper, the picosecond pulse is produced by the designed silicon chip rather than the ASOPS system, and this demands technical solutions to ensure that the sample (silicon chip) is synchronized with the ASOPS system. Additionally, in the conventional ASOPS-based THz-TDS, the repetition rate of the generated ultrashort pulse is close to the sampling rate. Instead, in this paper, the repetition rate (1 GHz) of the picosecond pulse radiated from the designed silicon chip is much higher than the sampling rate (50 MHz + 5 Hz) of the ASOPS system, which will be discussed later.
In this section, first, the ASOPS system is briefly reviewed. Second, the technical challenges of using this technique to characterize our chip are addressed. Finally, the measurement results are presented.
A. Overview of Asynchronous Optical Sampling
Fig. 21(a) illustrates the schematic of an ASOPS system. Two femtosecond laser sources generate a pump beam and a probe beam, respectively. The pump beam excites a PCA emitter that radiates a THz pulse, while the probe beam excites a PCA detector that samples the received THz pulse. The sampled data are then transferred to data acquisition electronics. The repetition rates of these two beams ( f r1 and f r2 ) are slightly different, which enables the PCA detector to sample the entire pulse waveform quickly, as shown in Fig. 21(b). The two femtosecond laser sources are controlled by two phase-locked loops that share a common frequency reference for frequency stability purposes. A typical rise time of a PCA detector is on the order of 100 fs [34], [35], which is fast enough to measure picosecond pulses. In addition, because the PCA detector samples the THz pulse right at the antenna, calibration requirements are relaxed significantly compared with the conventional method of using electronic oscilloscopes.
B. Measurement Setup for Characterizing the Prototype Chip in the Time Domain
In this paper, we used a commercial ASOPS system (TAS7500TS) from Advantest Corporation. It is capable of measuring a 380 fs THz pulse with an SNR > 1 BW of more than 4 THz [29] . The f r1 and f r2 of this system are 50 MHz and 50 MHz + 5 Hz, respectively. According to the working mechanism of ASOPS, the prototype chip needs to be synchronized with the pump femtosecond laser.
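The timing implied by these two repetition rates can be worked out with the usual ASOPS relations. The sketch below assumes the pump runs at f r1 = 50 MHz and the probe at f r2 = 50 MHz + 5 Hz, as stated above. The function names are illustrative, and the 1-GHz trigger is the chip repetition rate mentioned earlier in this section.

```python
def asops_timing(f_pump_hz, f_offset_hz):
    """Return (equivalent-time step, time window, scan time) in seconds."""
    f_probe_hz = f_pump_hz + f_offset_hz
    # Successive probe pulses slip against the pump by the difference of the
    # two repetition periods: 1/f_pump - 1/f_probe = df / (f_pump * f_probe).
    time_step_s = f_offset_hz / (f_pump_hz * f_probe_hz)
    # One full scan covers exactly one pump period of equivalent time ...
    window_s = 1.0 / f_pump_hz
    # ... and takes one beat period of real time to complete.
    scan_time_s = 1.0 / f_offset_hz
    return time_step_s, window_s, scan_time_s


def pulses_per_window(f_trigger_hz, f_pump_hz):
    """Number of chip pulses that fall inside one equivalent-time window."""
    return f_trigger_hz / f_pump_hz


if __name__ == "__main__":
    step, window, scan = asops_timing(50e6, 5.0)
    print(f"step = {step * 1e15:.2f} fs")
    print(f"window = {window * 1e9:.1f} ns")
    print(f"scan time = {scan:.2f} s")
    print(f"pulses per window at 1 GHz = {pulses_per_window(1e9, 50e6):.0f}")
```

With these numbers the equivalent-time step is about 2 fs, far finer than the picosecond pulse width, and one scan spans a 20-ns window that contains 20 pulses when the chip is triggered at 1 GHz.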
1) 50 MHz Synchronization Configuration: To test this measurement technique, a straightforward synchronization configuration is first examined, as shown in Fig. 22(a). A photodetector is used to convert the 50-MHz pump laser into an electrical trigger, which is fed to the prototype chip. A PCA detector is placed in the far-field region. With the four impulse generation channels ON, the measurement setup captures a 4 ps (FWHM) radiated pulse. Its normalized power spectrum is obtained by applying a discrete Fourier transform to the time-domain waveform. The measured radiated pulse has a peak frequency component at 58 GHz, a 10-dB BW of 60 GHz, and an SNR > 1 BW of 161 GHz [Fig. 22(b)]. Note that the commonly used relation between pulsewidth (T p ) and BW, which is BW = 2/T p , is not valid in this case, because the generated picosecond pulses are not obtained by modulating a sinusoidal carrier signal with a square wave.
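The spectral quantities quoted above (peak frequency and 10-dB BW) follow from a discrete Fourier transform of the sampled waveform. The sketch below uses only the Python standard library. The Gaussian monocycle is a hypothetical stand-in for a carrier-free pulse and is not the measured waveform, and the sample count, time step, and 50-GHz target are arbitrary illustrative choices.

```python
import cmath
import math


def normalized_power_spectrum(samples, dt):
    """One-sided DFT power spectrum, normalized so that its peak equals 1."""
    n = len(samples)
    half = n // 2 + 1
    freqs = [k / (n * dt) for k in range(half)]
    power = []
    for k in range(half):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        power.append(abs(acc) ** 2)
    peak = max(power)
    return freqs, [p / peak for p in power]


def bandwidth_below_peak(freqs, power, drop_db=10.0):
    """Width of the contiguous band around the peak within drop_db of it."""
    threshold = 10.0 ** (-drop_db / 10.0)
    k_peak = power.index(max(power))
    lo = hi = k_peak
    while lo > 0 and power[lo - 1] >= threshold:
        lo -= 1
    while hi < len(power) - 1 and power[hi + 1] >= threshold:
        hi += 1
    return freqs[hi] - freqs[lo]


def gaussian_monocycle(n, dt, f_peak_hz):
    """Hypothetical test pulse: first derivative of a Gaussian, centered in the
    record, whose power spectrum peaks at f_peak_hz."""
    sigma = 1.0 / (2.0 * math.pi * f_peak_hz)
    t0 = 0.5 * n * dt
    return [-(i * dt - t0) / sigma * math.exp(-((i * dt - t0) ** 2) / (2 * sigma ** 2))
            for i in range(n)]


if __name__ == "__main__":
    n, dt = 400, 0.5e-12  # 200-ps record, 5-GHz frequency resolution
    freqs, power = normalized_power_spectrum(gaussian_monocycle(n, dt, 50e9), dt)
    f_peak = freqs[power.index(max(power))]
    print(f"peak at {f_peak / 1e9:.0f} GHz")
    print(f"10-dB BW = {bandwidth_below_peak(freqs, power) / 1e9:.0f} GHz")
```

The frequency resolution is the inverse of the record length, so the reported peak and bandwidth are quantized to that bin spacing.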
2) Custom Synchronization Setup: One of the drawbacks of the TAS7500TS system is that the repetition rate of the pump femtosecond laser is fixed at 50 MHz. However, the prototype chip can radiate picosecond pulses with a repetition rate as high as 1 GHz. Therefore, another configuration is designed to generate an adjustable and synchronized trigger, as shown in Fig. 23(a). In the synchronization circuitry, a broadband divide-by-five frequency divider is used to extract a 10-MHz sinusoidal signal from the 50-MHz pump femtosecond laser. Then, an RF signal generator is locked to the 10-MHz signal and used to generate synchronized triggers with tunable frequencies. RF filters and low-noise amplifiers are used in the synchronization circuitry to achieve low-noise locking of the RF signal generator. Meanwhile, a PCA detector is placed in the far-field region. With a 1-GHz trigger, this measurement setup captures a 4.8 ps (FWHM) radiated pulse. Similar to the simulated result (Fig. 20), some ringing appears after the main pulse. The ringing before the main pulse is caused by the multiple round-trip reflections between the chip and the PCA detector. The maximum distance between them is 4 cm, limited by the low sensitivity of the PCA detector. The normalized power spectrum of the radiated pulse has a peak frequency component at 54 GHz, a 10-dB BW of 53 GHz, and an SNR > 1 BW of 144 GHz [Fig. 23(b)]. Fig. 24(a) presents the radiated time-domain waveforms at different angles. This measurement shows that the pulse duration remains small over a wide range of angles. In Fig. 24(b), the measured radiation patterns of the peak-to-peak amplitude of the radiated pulse waveform are slightly tilted compared with the simulation results, which is mainly caused by a slight misalignment of the silicon lens.
It is important to note that the Advantest TAS7500TS ASOPS system is designed for THz-TDS applications, in which it measures the changes caused by the sample under test. Therefore, the PCA detector and its internal amplifiers are not fully calibrated for their gains and distortion effects. As a result, all the reported time-domain waveforms and spectra are normalized, and the difference between the measured pulsewidth of the radiated pulse and the simulated result is mainly due to the nonideality of the PCA detector. Radiated power characterizations cannot be performed using this system.
C. Demonstrations of NLQSI Effects on Pulse Amplitudes and Pulse Amplitude Modulation
Using the measurement setup shown in Fig. 23(a), pulse amplitude modulation is demonstrated. Fig. 25 shows the results of this measurement. As discussed in Section II, when the bias voltage of transistor Q 2 is set to 2.5 V, the peak of the measured radiated picosecond pulse is reduced by 35% compared with that when V bias is 2.3 V. This relative change is close to the simulated value (38%) shown in Fig. 20. Given the nonideality of the PCA detector and its internal amplifiers, the comparison is made in terms of relative change, which remains accurate.
The same setup is also used to measure pulse amplitude modulation by turning ON/OFF the impulse generation channels. Fig. 26 presents the measured peak amplitudes of the radiated combined pulses in all 16 combinations. Due to the limited sensitivity of the measurement setup, the combined pulses of the first two combinations are too weak to be detected. The differences between the measured and simulated results are due to the nonidealities of the on-chip antenna and the PCA detector.
D. EIRP and Frequency-Domain Radiation Pattern
To characterize the equivalent isotropically radiated power (EIRP) spectrum, a frequency-domain measurement setup is utilized, as shown in Fig. 27(a). Four OML harmonic mixers and four standard-gain horn antennas are used to measure EIRP from 50 to 220 GHz. The RF signal generator provides a 1-GHz trigger for the prototype chip. Fig. 27(b) shows the measured average EIRP spectrum of the radiated picosecond pulse train with a 1-ns period. It has a peak frequency component at 54 GHz with an average EIRP value of −9.4 dBm, which is close to the simulated value. The measured and simulated EIRP spectra have similar decay trends. However, in the mm-wave regime, the simulated EIRP values are larger than the measured results, which is mainly caused by inaccuracies in the extrapolated transistor models. The highest detectable frequency component is at 197 GHz, which is limited by the sensitivity of the measurement setup. Meanwhile, the peak frequency in the measured EIRP spectrum is identical to that obtained using the ASOPS system, as shown in Fig. 23. However, the decay rate after the peak frequency component differs from that of the measured power spectrum in Fig. 23, which is due to the gain effects of the PCA detector and internal amplifiers in the ASOPS system.
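The text does not spell out how EIRP is obtained from the receiver readings. A common approach, assumed here, is the Friis free-space relation applied to the power delivered by a horn of known gain in the far field. All receive-side numbers in the sketch are hypothetical and serve only to exercise the formula; mixer conversion loss is assumed to have been removed from the received power already.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def free_space_path_loss_db(distance_m, freq_hz):
    """Friis free-space path loss, 20*log10(4*pi*d/lambda), far field only."""
    wavelength_m = C0 / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength_m)


def eirp_dbm(p_rx_dbm, g_rx_dbi, distance_m, freq_hz):
    """EIRP inferred from the power delivered by a receive antenna of known gain."""
    return p_rx_dbm - g_rx_dbi + free_space_path_loss_db(distance_m, freq_hz)


def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


if __name__ == "__main__":
    # Hypothetical receive power, horn gain, and distance (not measured values).
    print(f"EIRP = {eirp_dbm(-50.0, 20.0, 0.1, 54e9):.1f} dBm")
    # The reported average EIRP of -9.4 dBm expressed in linear units.
    print(f"-9.4 dBm = {dbm_to_mw(-9.4):.3f} mW")
```

For reference, the reported average EIRP of −9.4 dBm corresponds to roughly 0.115 mW.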
Using the setup in Fig. 27(a), the radiation patterns at the peak frequency of 54 GHz are measured, as shown in Fig. 28. The measured main lobes are slightly tilted compared with the simulation results, which is due to the chip package and a slight misalignment of the silicon lens.
Finally, the chip micrograph is shown in Fig. 29 . The chip is fabricated in a 130-nm SiGe BiCMOS process and occupies a die area of 1 mm × 1 mm. The chip consumes a dc power of 170 mW.
V. CONCLUSION
In this paper, an NLQSI technique is reported for the generation of picosecond pulses with tunable peak amplitudes. A prototype chip is implemented that comprises four NLQSI-based impulse generation channels, an on-chip impulse combiner, and an on-chip antenna. In addition, an on-chip impulse-coupling scheme is introduced to eliminate the undesired ringing caused by the transmission line effects of the supply routings. The on-chip impulse combiner provides a single-chip solution for radiating picosecond pulses with amplitude modulation capability. For the first time, an ASOPS system is used to characterize the picosecond pulses directly radiated by a silicon chip in the time domain. Based on the measurements, the prototype chip radiates 4 ps pulses with an SNR > 1 BW of 161 GHz. The performance of the chip is compared with state-of-the-art picosecond impulse radiators in Table I. 
