Abstract-Real-time analog multiplication of two signals is one of the most important operations in analog signal processing. The multiplier is used not only as a computational building block but also as a programming element in systems such as filters and neural networks, and as a mixer or modulator in communication systems. Although high-performance bipolar junction transistor multipliers have been available for some time, the CMOS multiplier implementation is still a challenging subject, especially for low-voltage and low-power circuit design. Despite the large number of papers proposing new MOS multiplier structures, these structures can be roughly grouped into a few categories. This tutorial provides a complete survey of CMOS multipliers, presents a unified generation of multiplier architectures, and proposes the most recommended MOS multiplier structure. This tutorial could serve as a starting reference point (and metric) for comparison of new CMOS multiplier circuit configurations. An illustrative CMOS chip prototype verifying theoretical results is presented.
I. INTRODUCTION

MULTIPLIERS perform linear products of two signals $x$ and $y$, yielding an output $z = Kxy$, where $K$ is a multiplication constant with suitable dimension. Multipliers are often categorized as single-quadrant ($x$ and $y$ are unipolar), two-quadrant (where $x$ or $y$ can be bipolar), and four-quadrant multipliers (where both $x$ and $y$ can be bipolar). Noise and bandwidth are often not optimized for multipliers. Modulators and mixers are particular cases of multipliers that are designed with noise and frequency constraints. The analog multiplier originated from its use as a mixer and as an amplitude modulator, both of which involve a multiplication of two signals. The basic idea of the multiplier implementation is illustrated in Fig. 1. Two signals, $v_1$ and $v_2$, are applied to a nonlinear device, which can be characterized by a high-order polynomial function. This polynomial function generates terms like $v_1^2$, $v_2^2$, $v_1^3$, $v_2^3$, $v_1^2 v_2$, and many others besides the desired $v_1 v_2$. The undesired components must then be canceled, which is accomplished by a cancellation circuit configuration.
Manuscript received April 11, 1997; revised August 14, 1998. This work was supported in part by the Mixed-Signal Group, Texas Instruments. This paper was recommended by Associate Editor F. Larsen.
G. Han is with the Department of Electronic Engineering, Yonsei University, Seoul, Korea.
E. Sánchez-Sinencio is with the Department of Electrical Engineering, Texas A&M University, College Station, TX 77843-3128 USA (e-mail: sanchez@ee.tamu.edu).
Publisher Item Identifier S 1057-7130(98)09978-9.

A multiplier could be realized using programmable transconductance components. Consider the conceptual transconductance amplifier of Fig. 2(a), where the output current $i_o$ is simply given by $i_o = G_{m1} v_1$ (1), where the transconductance $G_{m1}$ is set by the bias current $I_{bias1}$ as in (2a).
For a bipolar transconductor with bias current $I_{bias1}$, the transconductance $G_{m1}$ becomes $G_{m1} = I_{bias1}/(2V_t)$ (2b), where $V_t$ is the thermal voltage $kT/q$. Next, a small signal $i_2$ is added to the bias current as shown in Fig. 2(b). The second input signal $v_2$ can be converted into a current, $i_2 = G_{m2} v_2$, as illustrated in Fig. 2(c). Then, the output current yields

$$i_o = \frac{I_{bias1} + i_2}{2V_t}\, v_1 \tag{3a}$$

$$i_o = \frac{I_{bias1} + G_{m2} v_2}{2V_t}\, v_1 \tag{3b}$$

or

$$i_o = \frac{I_{bias1}}{2V_t}\, v_1 + \frac{G_{m2}}{2V_t}\, v_1 v_2. \tag{3c}$$

Thus, $i_o$ represents the multiplication of two signals $v_1$ and $v_2$ and an unwanted component proportional to $v_1$. This component can be eliminated as shown in Fig. 2(d). Better cancellation is achieved when the third transconductor becomes a fully differential transconductor, and $v_1$ and $v_2$ are fully differential inputs as illustrated in Fig. 2(e), which yields (4). This is the basic operation principle of a Gilbert cell [1], [2]. Operational transconductance amplifier (OTA)-based implementations are reported in [3]-[4]. The connection to the Gilbert cell can be seen by substituting the transconductors in Fig. 2(e) by bipolar junction transistor (BJT) differential pairs.
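The cancellation idea behind Fig. 2 can be checked numerically. The following is a minimal sketch, assuming the small-signal bipolar law $G_m = I_{bias}/(2V_t)$; the function names, the bias current, and the value of `gm2` are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of a bias-programmed transconductor used as a multiplier.
# Assumptions: small-signal bipolar law Gm = I_bias / (2 * V_t); all names
# and numeric values are illustrative, not taken from the paper.
V_THERMAL = 0.02585  # thermal voltage kT/q near 300 K, in volts


def gm_bipolar(i_bias):
    """Small-signal transconductance of a bipolar transconductor."""
    return i_bias / (2.0 * V_THERMAL)


def raw_output(v1, v2, i_bias=100e-6, gm2=10e-6):
    """Output current when i2 = gm2 * v2 is added to the bias current."""
    i2 = gm2 * v2
    return gm_bipolar(i_bias + i2) * v1


def cancelled_output(v1, v2, i_bias=100e-6, gm2=10e-6):
    """Subtract a reference transconductor that is biased without i2."""
    return raw_output(v1, v2, i_bias, gm2) - gm_bipolar(i_bias) * v1


if __name__ == "__main__":
    v1, v2 = 0.01, 0.2
    print(raw_output(v1, v2))        # product term plus a term proportional to v1
    print(cancelled_output(v1, v2))  # only the product term remains
```

The subtraction removes the term proportional to $v_1$ alone, leaving a current proportional to $v_1 v_2$. The law is a small-signal one, so the sketch says nothing about large-signal distortion.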
As digital technology dominates modern electronics, analog circuits are required to share the same standard CMOS process for low-cost fabrication. Thus, the popular BJT Gilbert cell is not suitable in a standard digital process, and designers must address low power supply voltage requirements. One problem that circuit designers often encounter is how to select the best multiplier architecture for their applications. Unfortunately, designers who propose multipliers in the literature often neither reference nor compare against other multipliers. This lack of comparison causes the same basic multiplier architectures to be reported, from time to time, as "new" architectures.
In this tutorial, we cluster transconductance multipliers into eight types. They can be categorized into two groups based on their MOS operating region: linear [5]-[21] and saturation [22]-[51]. It should be emphasized that the fundamental multiplier circuit topology for many of the multipliers is the same. Besides the above major multiplier structures, multipliers operating in the weak inversion region [52]-[54], dynamic multipliers for sampled-signal systems or neural networks [55]-[61], and voltage-current and current-current multipliers [62]-[64] have been reported.
In what follows, we attempt to classify a number of multiplier architectures according to different criteria, i.e., transistor region of operation, nonlinearity cancellation schemes, and signal injection method. Table I summarizes these results; more details are discussed next.
II. OPERATION MODES AND CIRCUIT TOPOLOGIES
Despite many reported circuits, only two cancellation methods for the four-quadrant multiplication are known. Since a single-ended configuration cannot achieve complete cancellation of nonlinearity and has poor power supply rejection ratio (PSRR), a fully differential configuration is necessary in a sound multiplier topology. The multiplier has two inputs; therefore, there are four combinations of two differential signals, i.e., $(x, y)$, $(-x, y)$, $(-x, -y)$, and $(x, -y)$. The topology of Fig. 3(a) is based on single-quadrant multipliers. Fig. 3(b) is based on square-law devices. These topologies achieve multiplication and simultaneously cancel out all the higher order and common-mode components ($X$ and $Y$) based on the following equalities:

$$[(X + x)(Y + y) + (X - x)(Y - y)] - [(X - x)(Y + y) + (X + x)(Y - y)] = 4xy \tag{5}$$

or

$$[\{(X + x) + (Y + y)\}^2 + \{(X - x) + (Y - y)\}^2] - [\{(X - x) + (Y + y)\}^2 + \{(X + x) + (Y - y)\}^2] = 8xy \tag{6}$$

respectively. Note that, throughout the paper, lowercase letters, i.e., $x$, $y$, $v_1$, and $v_2$, represent small signals.
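Both cancellation equalities can be confirmed with exact integer arithmetic. The sketch below only checks the algebra, assuming the product-based combination of $(X \pm x)$ and $(Y \pm y)$ reduces to $4xy$ and the square-based combination reduces to $8xy$; the function names are hypothetical helpers.

```python
# Sketch: check the two cancellation identities with exact integer arithmetic.
# X, Y are common-mode components and x, y are the differential signals.
# The function names are hypothetical helpers for illustration only.


def product_cancellation(X, Y, x, y):
    """Combination of four single-quadrant products, as in (5)."""
    positive = (X + x) * (Y + y) + (X - x) * (Y - y)
    negative = (X - x) * (Y + y) + (X + x) * (Y - y)
    return positive - negative


def square_cancellation(X, Y, x, y):
    """Combination of four square-law terms, as in (6)."""
    positive = ((X + x) + (Y + y)) ** 2 + ((X - x) + (Y - y)) ** 2
    negative = ((X - x) + (Y + y)) ** 2 + ((X + x) + (Y - y)) ** 2
    return positive - negative


if __name__ == "__main__":
    for X, Y, x, y in [(5, 7, 2, 3), (10, -4, -1, 6), (0, 0, 3, 3)]:
        assert product_cancellation(X, Y, x, y) == 4 * x * y
        assert square_cancellation(X, Y, x, y) == 8 * x * y
    print("both identities hold for the sampled values")
```

In both cases the common-mode components $X$ and $Y$ drop out entirely, which is why the two topologies tolerate arbitrary bias levels in the ideal case.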
MOS transistors can be used to implement these cancellation schemes, and the fundamental operation is a transconductance multiplier because the MOS FET is a transconductance device. The simple MOS transistor model is expressed as

$$I_D = K\left[V_{GS} - V_T - \frac{V_{DS}}{2}\right] V_{DS} \quad \text{for } V_{GS} > V_T,\ V_{DS} < V_{GS} - V_T \tag{7}$$

$$I_D = \frac{K}{2}\left[V_{GS} - V_T\right]^2 \quad \text{for } V_{GS} > V_T,\ V_{DS} > V_{GS} - V_T \tag{8}$$

for an NMOS FET in its linear and saturation regions, respectively. $K$ and $V_T$ are the conventional notation [65] for the transconductance parameter and the threshold voltage of the MOS transistor, respectively. The term $V_{GS}V_{DS}$ in (7) can be used to implement (5), while the terms $V_{DS}^2$ in (7) or $V_{GS}^2$ in (8) can be used to implement (6). More details follow in Sections II.1 and II.2. Fig. 4 shows the application methods for two signals ($x$ and $y$) in a MOS FET (See Table I [66]-[67], although the performance of the summer directly affects the performance of the multiplier. In this tutorial, it is assumed that these signals are available.
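The two-region transistor model can be written as a single function. This is a minimal sketch, assuming the standard simple forms $I_D = K(V_{GS} - V_T - V_{DS}/2)V_{DS}$ in the linear region and $I_D = (K/2)(V_{GS} - V_T)^2$ in saturation; the default values of `k` and `v_t` are illustrative and are not process data from the paper.

```python
# Sketch of the simple NMOS model: linear-region law (7), saturation law (8).
# Assumed forms: I_D = K*(V_GS - V_T - V_DS/2)*V_DS and I_D = (K/2)*(V_GS - V_T)**2.
# The default K and V_T values are illustrative, not process data.


def drain_current(v_gs, v_ds, k=100e-6, v_t=0.7):
    """Return the drain current in amperes for v_ds >= 0."""
    v_ov = v_gs - v_t  # overdrive voltage
    if v_ov <= 0.0:
        return 0.0  # cutoff, outside the range covered by (7) and (8)
    if v_ds < v_ov:
        return k * (v_ov - v_ds / 2.0) * v_ds  # linear region
    return 0.5 * k * v_ov ** 2  # saturation region


if __name__ == "__main__":
    print(drain_current(1.7, 0.2))  # linear region
    print(drain_current(1.7, 2.0))  # saturation region
```

The two expressions agree at the boundary $V_{DS} = V_{GS} - V_T$, so the model is continuous across regions. Higher order effects are deliberately left out.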
In the following subsections, multiplier topologies are categorized based on the signal injection method. All the multiplier types are summarized in Table I.
II.1. MOS MULTIPLIERS OPERATING IN LINEAR REGION
A. Using $V_{GS}V_{DS}$ (TYPE I)
First we introduce a programmable linear transconductor and show how it can be used to yield a multiplier. In Fig. 5(a) [68], one transistor operates in the linear region while the other operates in the saturation region when proper bias voltages are provided. If the transconductance of the saturated transistor is sufficiently large, it behaves as a source follower, and $V_{DS}$ of the linear-region transistor is controlled through this source follower. The source follower can be replaced with a BJT emitter follower [69] [Fig. 5(b)] or a gain-enhanced MOS source follower [70], [71]. The configuration shown in Fig. 5(c) enhances the effective transconductance of the source follower. In the case of the gain-enhanced MOS source follower, the auxiliary feedback may cause stability problems [72] that degrade transient behavior unless the amplifier is properly designed. A multiplier can be realized by combining two programmable transconductors as shown in Fig. 6(a). The output currents, given in (9), are obtained from (7).
The difference of the output currents yields a multiplication as in (10). In Fig. 6(a), the op amps keep the sources of the FETs virtually grounded. This approach has been used in conjunction with switched-capacitor circuits to implement a weighted-sum or a weighted-integrator [5]-[8]. The configuration in Fig. 6(b) uses MOS source followers and achieves multiplication in the same way as in (9), except that one term in (9) is replaced. This configuration is reported in [9]-[13] with a gain-enhanced source follower.
A fully differential configuration improves the linearity and PSRR further because a better nonlinearity cancellation is obtained. The fully differential configuration using four MOS transistors operating in the linear region is shown in Fig. 7 .
These configurations are based on the topology in Fig. 3(a) and correspond to (5), yielding (11) or (12). The op amp in Fig. 7(a) [14]-[16] can be replaced by the source followers shown in Fig. 7(b) [17]. This is possible because the purpose of the op amp is to keep the source potential of the transistors constant. The circuit shown in Fig. 7(c) [18] is the fully (pseudo) differential extension of the circuit shown in Fig. 6(b) for a complete implementation of (5). The signal can also be applied through a terminal voltage of the transistor operating in the linear region, as shown in Fig. 7(d) [19], [20].
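One way to see how the product-based cancellation maps onto four linear-region transistors is the numerical sketch below. It assumes the linear-region law $I_D = K(V_{GS} - V_T - V_{DS}/2)V_{DS}$, gate-source voltages of $X \pm x$, and drain-source voltages of $Y \pm y$. These bias assignments, the function names, and the numeric values are assumptions for illustration and are not claimed to match the exact wiring of Fig. 7.

```python
# Sketch: four linear-region transistors combined as in the product identity (5).
# Assumptions: I_D = K*(V_GS - V_T - V_DS/2)*V_DS, V_GS = X +/- x, V_DS = Y +/- y.
# Names and values are illustrative; they do not reproduce a specific figure.


def linear_current(v_gs, v_ds, k=50e-6, v_t=0.7):
    """Linear-region drain current (valid while v_ds < v_gs - v_t)."""
    return k * (v_gs - v_t - v_ds / 2.0) * v_ds


def type1_output(x, y, X=3.0, Y=0.5, k=50e-6, v_t=0.7):
    """Differential output current of the four-transistor combination."""
    i_pos = linear_current(X + x, Y + y, k, v_t) + linear_current(X - x, Y - y, k, v_t)
    i_neg = linear_current(X - x, Y + y, k, v_t) + linear_current(X + x, Y - y, k, v_t)
    return i_pos - i_neg


if __name__ == "__main__":
    x, y, k = 0.1, 0.05, 50e-6
    print(type1_output(x, y, k=k))  # numerically close to 4*k*x*y
    print(4 * k * x * y)
```

Under these assumptions the threshold term and the $V_{DS}^2$ term cancel along with the common-mode components, leaving $4Kxy$.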
B. Using $V_{DS}^2$ (TYPE II)
The MOS FET operating in the linear region has a square term, $V_{DS}^2$, as in (7). This term can be used to realize the cancellation method in (6). In Fig. 8, the sum and difference of two input signals are applied to the gates of the source followers, which control the drain voltage of the transistor that operates in the linear region. The summer indicated at the gate of each source follower can be implemented using active circuitry or passive components such as resistors or a floating gate [66]. This circuit, based on (6), yields (13). However, the linearity of this configuration is poor.
C. Dual Gate in Linear Region (TYPE III)
Another method to inject signal is by modulation [21] 
II.2. MOS MULTIPLIERS OPERATING IN SATURATION REGION
A. Using $V_{GS}^2$ with Diode Connection (TYPE IV)
The drain current of a diode-connected MOS FET, for which $V_{GS} = V_{DS}$, depends on $V_{GS}^2$ in the saturation region. Thus, this signal can be applied using a source follower whose gate input is the sum and difference of two input signals, as shown in Fig. 10. The topology is similar to type II. However, the linearity of this configuration is often poorer.
B. Using $V_{GS}^2$ with Gate and Source Injection (TYPE V)
A four-quadrant multiplier based on Fig. 3(b) can be realized by four cross-coupled transistors as shown in Fig. 11. The output current yields (16), based on (6) and (8). A variety of source signal application methods are reported in the literature: Fig. 12(a) uses an op amp [22], Fig. 12(b) uses a linear differential amplifier [23], and Fig. 12(c) uses source followers. A separate source follower, as shown in Fig. 12(d) [24], can be provided to each of the cross-coupled transistors. A gain-enhanced source follower [25]-[30] or a BJT emitter follower [31] can be used to apply the source signal. This type is the most widely implemented multiplier structure.
C. Using $V_{GS}^2$ with Substrate Terminal (TYPE VI)
The substrate of a MOS FET can be used as an additional input terminal as long as the substrate-source junction is kept reverse biased. The substrate potential controls the threshold voltage for an NMOS transistor as

$$V_T = V_{T0} + \gamma\left(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F}\right) \tag{17}$$

where $V_{T0}$ is the threshold voltage when $V_{SB} = 0$, $\gamma$ is the body effect coefficient, and $\phi_F$ is the Fermi potential. Substituting $V_T$ in (8) with (17), the configuration shown in Fig. 13, based on the topology of Fig. 3(b) and (6), gives (18). The approximation in (18) is valid only under a restrictive condition. However, the linearity of this configuration is poor.
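The dependence of the threshold voltage on the substrate bias can be explored with a short sketch. It assumes the conventional body-effect expression $V_T = V_{T0} + \gamma(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F})$; the parameter values are illustrative, and the first-order expansion is an assumption made for illustration that is not claimed to be the approximation used in (18).

```python
# Sketch: body-effect threshold voltage and a first-order expansion around a bias.
# Assumptions: conventional body-effect law; V_T0, gamma, phi_F are illustrative.
import math


def threshold_voltage(v_sb, v_t0=0.7, gamma=0.5, phi_f=0.35):
    """Threshold voltage for a source-to-substrate voltage v_sb >= 0."""
    return v_t0 + gamma * (math.sqrt(2.0 * phi_f + v_sb) - math.sqrt(2.0 * phi_f))


def linearized_threshold(v_sb, v_sb0, v_t0=0.7, gamma=0.5, phi_f=0.35):
    """Tangent-line (first-order) approximation around the bias point v_sb0."""
    slope = gamma / (2.0 * math.sqrt(2.0 * phi_f + v_sb0))
    return threshold_voltage(v_sb0, v_t0, gamma, phi_f) + slope * (v_sb - v_sb0)


if __name__ == "__main__":
    for v_sb in (1.0, 1.2, 1.5, 2.0):
        exact = threshold_voltage(v_sb)
        approx = linearized_threshold(v_sb, 1.0)
        print(v_sb, exact, approx, approx - exact)
```

The gap between the exact and linearized values grows with the substrate signal swing, which is consistent with the limited linearity noted for this type.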
D. Using $V_{GS}^2$ with Voltage Adder (TYPE VII)
This multiplier architecture is based on the nonlinearity cancellation of Fig. 3(b) and voltage summing circuits. Four cross-coupled transistors with voltage summers realize a four-quadrant multiplier as shown in Fig. 14(a). The tail current can be removed as shown in Fig. 14(b). The output current is obtained as (19) based on (6) and (8 an active adder. The floating voltage source can be used as shown in Fig. 15 to realize the voltage summation [40]-[42].
Reference [43] provides a summary of this multiplier type. The structure of this type is similar to type IV. As type IV requires an additional transistor, it does not have any advantage over type VII.
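The square-law cancellation used by the saturation-region types can be illustrated with a few lines of code. This is a sketch assuming the saturation law $I_D = (K/2)(V_{GS} - V_T)^2$ and gate-source voltages equal to the four sums $(X \pm x) + (Y \pm y)$ produced by ideal voltage adders; the names and numeric values are hypothetical.

```python
# Sketch: four saturated transistors combined as in the square identity (6).
# Assumptions: I_D = (K/2)*(V_GS - V_T)**2 and ideal voltage adders at the gates.
# Names and values are illustrative only.


def saturation_current(v_gs, k=50e-6, v_t=0.7):
    """Saturation drain current; raises if the transistor would be off."""
    v_ov = v_gs - v_t
    if v_ov <= 0.0:
        raise ValueError("transistor is off; the square law does not apply")
    return 0.5 * k * v_ov ** 2


def square_law_output(x, y, X=1.5, Y=1.0, k=50e-6, v_t=0.7):
    """Differential output current of the four cross-coupled transistors."""
    i_pos = saturation_current((X + x) + (Y + y), k, v_t) \
        + saturation_current((X - x) + (Y - y), k, v_t)
    i_neg = saturation_current((X - x) + (Y + y), k, v_t) \
        + saturation_current((X + x) + (Y - y), k, v_t)
    return i_pos - i_neg


if __name__ == "__main__":
    x, y, k = 0.2, 0.1, 50e-6
    print(square_law_output(x, y, k=k))  # numerically close to 4*k*x*y
    print(4 * k * x * y)
```

Because the threshold voltage only shifts the common-mode level inside the square, it cancels together with $X$ and $Y$, and the ideal output is $4Kxy$.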
E. MOS Gilbert Cell (TYPE VIII)
A MOS differential pair operating in the saturation region generates a differential output current characterized by (20), which holds over a bounded input range and is expressed in terms of the tail current and half of the differential input voltage.
The topology shown in Fig. 16 is the same as the one shown in Fig. 11. However, here one input is a voltage signal and the other is a current signal. The differential output currents from the two differential pairs are subtracted, yielding (21). The input current is generated by another differential pair, as shown in Fig. 17(a), according to (22), which involves the differential input voltage and the transconductance constant of the transistors in the third differential pair. Thus, the MOS Gilbert multiplier shown in Fig. 17(a) yields (23), where both inputs are voltage signals.
Note the similarity of Fig. 17(a) to Fig. 2(e) when the transconductors are replaced by differential pairs. The tail current can be removed [44] as shown in Fig. 17(b). As indicated by (20), a wider input range or higher linearity is obtained with a higher bias current.
Note that this cancellation scheme follows (5). The Gilbert cell is implemented using lateral BJTs in a CMOS process in [45]. The MOS version of the Gilbert multiplier is reported in [46]. As its linearity is poor, several modified versions including linearization schemes [46]-[48], folded structures [48]-[50], and active attenuators [51] have been reported.
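The behavior of the MOS Gilbert cell can be sketched from the square law alone. The closed form below is the textbook result for an ideal square-law differential pair with $I_D = (K/2)(V_{GS} - V_T)^2$ and full differential input `v_id`; it is an assumption for illustration and is not copied from (20). The function names and numeric values are hypothetical.

```python
# Sketch: ideal square-law differential pair and a Gilbert-style combination.
# Assumption: I1 - I2 = (K/2) * v_id * sqrt(4*I_tail/K - v_id**2) while both
# transistors conduct; beyond that range the whole tail current is steered.
import math


def diff_pair_current(v_id, i_tail, k=100e-6):
    """Differential output current I1 - I2 for a differential input v_id."""
    limit = math.sqrt(2.0 * i_tail / k)  # input at which one side turns off
    if abs(v_id) >= limit:
        return math.copysign(i_tail, v_id)
    return 0.5 * k * v_id * math.sqrt(4.0 * i_tail / k - v_id ** 2)


def gilbert_output(v_x, i_y, i_bias=100e-6, k=100e-6):
    """Subtract two pairs with tail currents i_bias + i_y and i_bias - i_y.

    Requires abs(i_y) < i_bias so that both tail currents stay positive.
    """
    return diff_pair_current(v_x, i_bias + i_y, k) - diff_pair_current(v_x, i_bias - i_y, k)


if __name__ == "__main__":
    k, i_bias = 100e-6, 100e-6
    for v_x, i_y in [(0.01, 1e-6), (0.5, 20e-6), (1.0, 50e-6)]:
        ideal = v_x * i_y * math.sqrt(k / i_bias)  # small-signal product
        print(v_x, i_y, gilbert_output(v_x, i_y, i_bias, k), ideal)
```

For small inputs the output tracks the product $v_x i_y \sqrt{K/I_{bias}}$, and the deviation grows with signal amplitude. Raising the bias current widens the range over which both transistors conduct.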
III. REMARKS ON MULTIPLIER STRUCTURES AND FABRICATION RESULT
None of the above analyses includes higher order effects of the MOS device [65] such as the $\lambda$-effect, the $\gamma$-effect, and the mobility degradation effect. Besides the higher order effect of MOS in the multiplier core, the nonidealities of source follower
1) General Comparison:
Measures of multiplier performance can include input range, linearity, common-mode effects, minimum power supply voltage, power consumption, silicon area, frequency range, noise, and so on. Since all these performance measures are strongly application dependent, there is no absolute standard comparison metric. Some limited qualitative comparisons are summarized in Table II. From this table, Fig. 21 shows total harmonic distortion (THD) when the other input signal is fixed at 1 V. In Fig. 21, we can observe that the circuit of Fig. 12(d) shows the worst linearity for both input signals. The circuits of Fig. 7(d) and Fig. 12(d) show poor linearity for one of the input signals.
Note that this tendency is generally true, although the above results are dependent on the transistor size ratio, process parameters, and bias conditions. From the above simulation results summarized in Table III, and the complexity of the circuit topologies, the circuits of Figs. 7(c) and 12(c) might be considered to have better performance than the others because:
1. the circuit of Fig. 7(d) has low transconductance, is sensitive to mismatch, and has poor linearity;
2. the circuit of Fig. 12(b) is sensitive to mismatch and has low transconductance;
3. the circuit of Fig. 12(d) consumes high power and has poor linearity.
Now we focus our attention on these two multipliers with good properties [Figs. 7(c) and 12(c)].
3) Detailed Linearity Simulation: For comparison between the circuits of Figs. 7(c) and 12(c), the effect of the source follower's transistor size is simulated. All other transistors have W/L = 10 µm/10 µm. Fig. 22 shows that the linearity of the circuit of Fig. 7(c) improves as the source follower uses a larger W/L ratio. On the contrary, this effect is not clear in the case of the circuit of Fig. 12(c). This simulation result implies that the circuit of Fig. 7(c) can outperform the circuit of Fig. 12(c) when the source follower's W/L ratio (or its transconductance) is large enough (at least three times larger than that of the other transistors).
4) Comparison by Experimental Result:
Considering the above simulation result, the circuits of Figs. 7(c), 7(d), and 12(c) were fabricated using the Orbit 2-µm N-well process. These multipliers were designed with identical transistor sizes (one size for the source followers and one for all others), transconductance (10 µA/V), and power consumption (360 µW). The input common-mode voltages ($X$ and $Y$) are set to allow an approximately 2-V differential input range for both $x$ and $y$. Fig. 23(a) and (c) show the output differential currents (for simplicity, only one quadrant is shown in the figures) from the three fabricated multipliers. The linearity errors are shown in Fig. 23(b) and (d). The linearity error of the circuit Fig. 7(c) is (24) and they are depicted in Fig. 25(a). The conditions for the circuit of Fig. 12(c) are (25), and they are depicted in Fig. 25(b). For the same input range and output node voltage swing, the circuit of Fig. 7(c) requires a much lower power supply voltage than that of Fig. 12(c). For instance, for a 1-V input range for both input signals, the given threshold voltage, and a 2-V output signal swing, the circuit of Fig. 7(c) requires a 3-V power supply while the circuit of Fig. 12(c) requires more than a 4-V power supply.
6) Remarks on Noise:
Another performance measure of a multiplier is noise, especially for small-signal applications where the input range is not a major concern. The thermal noise current power density of a MOS transistor is conventionally modeled as (26) for a transistor operating in the linear and saturation regions, respectively [73]. In the case of the circuit of Fig. 7(c), the total output noise current is given by (27), with its terms defined in (28) and (29).
In the case of the circuit of Fig. 12(c), if the current source has the same transistor size as the source follower, then the total output noise current is given by (30).
If the stated condition holds, then the circuit of Fig. 12(c) has higher output noise because one term in (30) is much larger than the corresponding term in (27). The output noise floors of the fabricated multipliers are measured with a 1-kΩ resistor at 1 kHz. The circuit of Fig. 7(c) showed a 26 dB lower noise floor than the circuit of Fig. 12(c). Fig. 7(c) as the most recommended analog MOS multiplier structure. The circuit of Fig. 7(c) has a clear tradeoff between noise and linearity.
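For orientation, the conventional long-channel thermal noise model can be written out explicitly. The sketch below assumes the textbook forms $4kTg_{ds}$ for a transistor in the linear region at small $V_{DS}$ and $(8/3)kTg_m$ in saturation, with both conductances approximated by $K(V_{GS} - V_T)$; these forms are assumed to correspond to the conventional model cited above and are not copied from (26). The function name and values are hypothetical.

```python
# Sketch: conventional long-channel thermal noise current power density.
# Assumptions: 4*k_B*T*g_ds in the linear region (small V_DS) and
# (8/3)*k_B*T*g_m in saturation, with g_ds and g_m both taken as K * V_ov.
BOLTZMANN = 1.380649e-23  # J/K


def thermal_noise_psd(k, v_ov, region, temperature=300.0):
    """Return the noise current power density in A^2/Hz."""
    conductance = k * v_ov  # g_ds (linear, small V_DS) or g_m (saturation)
    if region == "linear":
        return 4.0 * BOLTZMANN * temperature * conductance
    if region == "saturation":
        return (8.0 / 3.0) * BOLTZMANN * temperature * conductance
    raise ValueError("region must be 'linear' or 'saturation'")


if __name__ == "__main__":
    k, v_ov = 50e-6, 1.0
    print(thermal_noise_psd(k, v_ov, "linear"))
    print(thermal_noise_psd(k, v_ov, "saturation"))
```

For equal conductance the saturated device contributes two thirds of the linear-region density in this model, so the relative sizes and bias points of the core transistors and source followers determine which devices dominate the output noise.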
The input reflected equivalent noise voltage of the circuit of Fig. 7(c) is obtained by dividing (27) by the square of the transconductance of the multiplier, as given in (11), yielding (31) when the other input is unity. Substituting (28) and (29) into (31) results in (32). This analysis suggests that the source-follower size ratio and the difference of the two input common-mode voltages should be reduced to improve the noise performance for a given design. This is a direct tradeoff with linearity and input range because the ratio should be increased to improve linearity, as illustrated in Fig. 22, and the common-mode voltage difference determines the input range shown in Fig. 25(a).
The noise performance of the circuit of Fig. 7(c) is verified through simulation. In the simulation, the output noise is measured at one of the output nodes with a 50-Ω load resistor and integrated within the 1 MHz-2 MHz range. For all simulations, the transistor length is 10 µm for all transistors. Fig. 26(a) shows that the total output noise is almost a linear function of the swept parameter, as expected from (27), (28), and (29), when the source follower's transconductance is large enough. The input reflected equivalent noise is inversely proportional to the swept parameter, as shown in Fig. 26(b). This result agrees with the analysis in (32). However, if the source follower is not large enough, (32) is no longer valid, as shown in Fig. 26(b), because (32) is based on the assumption that the source follower is an ideal one. The noise performance starts to degrade when the ratio is smaller than around three. Fig. 27 shows the noise dependency on source follower size and suggests a ratio larger than around three. These observations conflict with each other, as follows.
1) From Figs. 26(b) and 28, the ratio should be larger than three to make (32) valid.
2) From (32), the ratio should be minimized for low input reflected noise.
These two observations lead us to the conclusion that the optimal ratio for low noise design is around three for this specific process. Remember that the ratio should be maximized for high linearity, as shown in Fig. 22. Fig. 28 shows that the input noise is almost a linear function of the difference of the two input common-mode voltages, as expected in (32). This difference is the summation of the two input ranges. Therefore, for low noise design, the input range should be sacrificed. In designing the circuit of Fig. 7(c), the ratio and the common-mode voltage difference are the most important design parameters that determine the tradeoff among noise, linearity, and input range.
V. CONCLUSION
Although a large number of transconductance multipliers are reported in the literature, they fall into eight categories described in this tutorial and are summarized in Table I .
Several multiplier architectures do not have any clear advantage over others. As the current trend of circuit design is toward low voltage and low power, the circuit shown in Fig. 7(c) seems to be one of the most attractive low-voltage and high-performance MOS multiplier structures. A BiCMOS version that uses a BJT instead of the source follower will improve its performance. Several design considerations for the circuit of Fig. 7(c) were provided.
A reader should be aware that this comparison might not hold for all cases. The choice of circuit topology is completely dependent on design specifications.
