Abstract-The analysis, design, and implementation of a two-step current-sampling switched-current (S²I) multiplier are presented. The S²I technique has been employed to compensate for analog errors due to charge injection as well as those arising from the finite output impedance. A thorough circuit analysis of the offset sources of the S²I cell and of the multiplier's nonlinearities provides the basis for designing the multiplier effectively and for avoiding feedback or cascode techniques to deal with channel modulation effects. The multiplier has been implemented using a 2-μm n-well CMOS technology. A complete set of detailed experimental results is provided in the paper.
I. INTRODUCTION
ANALOG multipliers are fundamental functional blocks in many circuits and systems [1]-[3]. Sampled-data multipliers were first implemented some time ago using switched-capacitor (SC)-based circuits [4]-[9]. Many different approaches have been investigated in this context, ranging from time-division charging of a capacitor [4] to multiplication of binary numbers by interfacing an analog multiplier with DAC's and ADC's [5]. Other approaches have been based on depletion-mode MOS transistors (DMOST's) operated in the triode region and embedded in SC-based circuits [6], [7]; multivalued-logic [8] and pulse-based arithmetic [9] SC multipliers have been presented as well.
Switched-current (SI) circuits [10] represent a feasible alternative to SC circuits, especially because of their compatibility with digital technology. However, although some nonfiltering applications of SI circuits have been developed [10], [11], to the best of the authors' knowledge only one SI multiplier has been presented so far [1]. That circuit required one extra clock phase besides the normal clock phases used in common second-generation SI cells [10], two explicit capacitors to cope with clock feedthrough problems, and a regulated-cascode architecture to deal with channel modulation effects [1]. In that circuit, the multiplication of any two given currents is accomplished by evaluating their quadratic terms. One attractive aspect of the multiplier of Leenaerts et al. [1] is that every quadratic term is evaluated by the same squarer circuit in different clock phases. This avoids the need for precisely matched squarer circuits, as required in some continuous-time current multipliers [2].

This paper presents an alternative implementation using the S²I switched-current technique, which has already proven effective in filtering and conversion applications [12]-[14]. One of the most important features of S²I is that it compensates for analog errors due to charge injection [10], [14], [15] as well as for those arising from the finite output impedance. The signal-dependent clock feedthrough is sampled and stored in the initial sampling phase of the current copier operation and then algebraically added to the corrupted current to minimize the corresponding error. More sophisticated circuit techniques have also been applied to this elementary circuit architecture [13], [14]; only the basic cell has been considered in the design of the multiplier.

Manuscript received July 8, 1997; revised March 5, 1998.
II. ANALYSIS OF THE S²I MEMORY CELL
The circuit of an S²I cell and its clock phases are shown in Fig. 1.
One memory transistor is referred to as the coarse memory while the other is called the fine memory. Fig. 2 summarizes a small-signal analysis of the cell:
and are the currents stored on and , respectively, while is the clock feedthrough. Current is mainly the offset due to the signal-independent clock feedthrough in , and is the total output impedance of augmented by the feedback effect due to its drain-gate capacitance. Notice that , therefore only the coarse memory error is stored into the fine memory [ Fig. 2(b) ] on . Moreover, most of the input current is nulled by making the actual input resistance smaller than its equivalent on . The dimensionless constant is the voltage gain of a CMOS inverter in which works as the current source and works as the driver. The final output current is
Observe from (1b) that the first term in brackets is close to 0 while the other one is less than 1. Therefore, the second and third terms can be neglected in (1a). Let us now assume that all the MOST's have the same small-signal transconductance . Two cases are distinguished: if the load consists of a diode-connected transistor (e.g., another current-mode stage) then . If, instead, the load is another S²I memory cell then is partitioned into (during which ) and (during which ). In the former case, the information retrieved from the memory is underestimated when in (1):
Taking , (2) can be approximated to (3) where represents the output current offset. Given that the ideal transfer function of an S²I cell is , we can take the z-transform
A perfect signal-dependent clock feedthrough cancellation is obtained considering only the effect of the channel length modulation of and on . The corresponding final expression is (5) where is the total output conductance of the memory cell.
When, instead, the load of the memory cell is constituted by another S²I cell, then the load changes to during . Therefore, dominates over the output impedance of the cell, allowing the output current to be completely transferred to the next stage. This implies that (2) is replaced by (6). The fractional term multiplying in (6) is very close to one. Moreover, taking :
Equation (7) is formally equal to (5) considering that the total output conductance is . If, however, as is commonly assumed [10], the channel modulation is neglected on and but considered on , then we have: 1) there is no current attenuation on and and 2) the Norton equivalent of the S²I cell (with an output conductance equal to ) transfers the retrieved current to a load with conductance equal to on . Hence (8) holds. This analytically proves that the S²I cell not only achieves the feedthrough cancellation but also reduces the error due to the finite ratio as compared with common second-generation cells [12], [13]. Moreover, this analytical result is in complete agreement with the simulation [12] and experimental results [13], [14] obtained by Hughes et al.
The sampling frequency in common second-generation cells is limited by the settling time of the cell sampler [10]. In the case of S²I, the cell treats the coarse-phase settling error like the other errors and so attempts to cancel it during the fine phase. This means that the bandwidth is the same as that of a standard second-generation cell.
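The two-step sampling analyzed in this section can be illustrated with a purely behavioral sketch. It is not the small-signal model of Fig. 2: the error functions, their coefficients, and the names `coarse_error`, `FINE_ERROR`, `one_step_sample`, and `two_step_sample` are hypothetical, and the sign inversion and delay of the real cell are omitted. The sketch only assumes that the coarse memory adds a signal-dependent error and that the fine memory, which handles only the small residual, adds a nearly constant one.

```python
# Behavioral sketch of two-step (coarse/fine) current sampling.
# All error models and numbers are hypothetical, chosen for illustration.

def coarse_error(i_in, offset=0.05, slope=0.02):
    """Assumed signal-dependent error of the coarse memory (same unit as i_in)."""
    return offset + slope * i_in

# Assumed signal-independent error of the fine memory, which only sees
# the small residual left by the coarse step.
FINE_ERROR = 0.01

def one_step_sample(i_in):
    """Single-step sampling: the full signal-dependent error remains."""
    return i_in + coarse_error(i_in)

def two_step_sample(i_in):
    """Coarse sample first, then a fine sample of the residual."""
    i_coarse = i_in + coarse_error(i_in)      # coarse phase: stored with error
    i_fine = (i_in - i_coarse) + FINE_ERROR   # fine phase: residual plus small error
    return i_coarse + i_fine                  # retrieval: both memories contribute

if __name__ == "__main__":
    for i_in in (-30.0, 0.0, 30.0):
        err_one = one_step_sample(i_in) - i_in
        err_two = two_step_sample(i_in) - i_in
        print(f"input {i_in:6.1f}: one-step error {err_one:+.3f}, "
              f"two-step error {err_two:+.3f}")
```

In this model the one-step error changes with the input, while the two-step error collapses to the constant `FINE_ERROR`, which mirrors the statement that the residual error of the cell is essentially a signal-independent offset.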
III. THE S²I MULTIPLIER ARCHITECTURE
Starting from the algorithm introduced by Leenaerts et al. [1] , an alternative circuit implementation is now presented. The product of two currents and fed at the inputs of the multiplier is accomplished by evaluating the left-hand side of (9):
The squarer circuit shown in Fig. 3 has been considered [16] in order to obtain the square of a current. For this circuit, a relationship including the input/output offset errors gives (10) where and are the output and input currents, respectively, is a constant current related to the bias voltage , is the output offset error, and is the input offset error. The approach introduced in [1] consists of using the same squarer circuit many times while storing intermediate results in the SI memory cells.

Fig. 3. The current squarer.
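The idea of reusing one squarer and keeping intermediate results in memory can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of (9): it uses the textbook quarter-square identity (x + y)² − (x − y)² = 4xy, which may differ from the identity used in the paper, and a hypothetical squarer law `k * i**2 + c` whose constants `k` and `c` are invented for the example.

```python
# Sketch: multiplication with a single squarer used in successive steps.
# The squarer law and its constants are hypothetical.

def squarer(i_in, k=0.5, c=2.0):
    """Assumed squarer: a quadratic term plus a constant output term."""
    return k * i_in ** 2 + c

def multiply_with_one_squarer(x, y, k=0.5, c=2.0):
    """Quarter-square product using the same squarer twice."""
    memory = squarer(x - y, k, c)            # step 1: square the difference, store it
    result = squarer(x + y, k, c) - memory   # step 2: square the sum, subtract the stored value
    # (x + y)^2 - (x - y)^2 = 4*x*y, and the constant term c cancels.
    return result / (4.0 * k)

if __name__ == "__main__":
    print(multiply_with_one_squarer(3.0, 4.0))    # prints 12.0
    print(multiply_with_one_squarer(-2.0, 5.0))   # prints -10.0
```

Because both squared terms come from the same function, any constant term of the squarer drops out of the difference, which is the benefit of not needing two matched squarers.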
The block diagram of the whole system realizing the above algorithm is depicted in Fig. 4. The algorithm consists of four main steps, each associated with one of four main clock phases. The circuitry consists of three building blocks: the current squarer, an adjustable current mirror, and two S²I memory cells. From this figure it is seen that, analogously to [1], a complementary version of the squarer circuit shown in Fig. 3 is used. In this way, because the adopted technology is an n-well CMOS process, it is possible to avoid the bulk effects on by connecting its source to the n-well. As recalled in Section II, the S²I cell minimizes the channel length modulation effect and implicitly the problem of the finite ratio in comparison to other SI cells not utilizing cascode or feedback-based approaches. However, the retrieved contents of the memory cells are expected to be attenuated. Among the four steps, the largest imbalance occurs during the last phase because the squared input signal is directly added to the previously calculated results without an equally corresponding attenuation. Therefore, during this last phase, the ratio of the mirror is changed so that the imbalance is compensated. The mirror has unity gain during all the phases but the last. Strictly speaking, similar problems are also expected in the two intermediate phases. Therefore, for better accuracy, a similar ratio tuning can also be applied in those phases with only a small increase in the complexity of the system. However, simulation results have clearly shown that this is not necessary for a satisfactory accuracy [nonlinearity of around 1% full scale (FS)]. The analysis of this behavior is reported in the next section.
The circuit schematic of the complete multiplier is shown in Fig. 5(a). In this figure, it is clearly seen how the current gain of the mirror is slightly changed during the last phase by adding a small-area diode-connected MOST in parallel with the existing one. This, however, causes an extra bias current to flow out of the right branch of the mirror due to the unbalance in . This adds to the constant offset error present in the output of the S²I cells and causes a constant current offset on . This undesired offset can be canceled in two ways. If the multiplier is going to be used alone, an extra current can be added during by using an additional current source and a current steering switch. This is realized by , as shown in Fig. 5(a). This is, for example, what has been done in the fabricated chip whose experimental results are discussed in Section V. If, instead, many multipliers are going to be used in the same chip (as in the case of a neural network or of adaptive filters), then a more suitable technique consists of realizing another multiplier (namely a replica multiplier) without inputs. The output current offset of this multiplier is added to the outputs of all the other multipliers by using current mirrors. In this way, the output offset of the other multipliers can be drastically reduced in spite of process variations.
Seven of the nine control signals used to drive the switches are depicted in Fig. 5(b) . The other two signals are and . The nine control signals are obtained as combinations of the various "master" phases. It can be noticed that while the sub-phases used to control the internal switches of the memory cells are not overlapping (namely, ), the current steering switches are controlled by signals with overlapping rising and falling edges . This minimizes the generation of current spikes without interfering with the operation of the circuit.
IV. ANALYSIS AND DESIGN OF THE S²I MULTIPLIER
In this section, the behavioral analysis of the S²I multiplier is carried out. The approach presented takes into account the nonidealities of the circuits and devices. Finally, a procedure for the circuit design of the multiplier is discussed.
A. Circuit Analysis of the Multiplier
In [12] and [13], the behavior and some of the applications of the S²I cell have been discussed. However, to analyze the proposed multiplier, the effect of the finite conductance ratio in the memory cells, as well as in the other building blocks, has to be taken into account. These topics are discussed below.
For the sake of clarity, it will be assumed that all the 's are equal and that all the 's are equal as well. Strictly speaking, a complete analysis requires the behavioral analysis of the multiplier in seven phases ( and ). However, taking advantage of the analysis of the S²I memory cell carried out in Section II, we can consider its small-signal equivalent circuit as follows. During and , the memory cell working in the sampling phase (composed of the two subphases "a" and "b") has a small-signal equivalent circuit consisting of a conductance equal to . The current stored in the cell is the one flowing into this conductance. During the retrieval phase, the small-signal circuit is composed of an ideal current source supplying the current stored in the previous phase in parallel with a conductance equal to . Notice that, although the cells supply a fixed output offset when the stored current is retrieved, we can nevertheless implicitly consider this offset as part of the squarer offset . During , instead, the circuit load is assumed to be the generic conductance . Therefore, the analysis is carried out in the four main phases and .
Let us now refer to the complete circuit schematic shown in Fig. 5(a) and to the small-signal equivalents shown in Fig. 6 . The current source depicted in Fig. 5(a) is implemented by using a single PMOST supplying a constant current equal to . According to the discussion of Section III, the mirror -supplies the following current (11) in phase :
The corresponding small-signal equivalent is shown in Fig. 6(a). Here, represents the output conductance of the right-hand-side branch of the current mirror. Therefore, following the previous discussion, we have that . Notice that only cell 1 is connected to node A and that the input conductance of the memory cell is . Hence, the current stored in the cell is given by (12).

In phase , the mirror supplies which is added to (retrieved from cell 1) and the result is stored in cell 2. Again, the actual current stored in cell 2 is the one flowing into . Hence, (13) follows from the small-signal equivalent circuit shown in Fig. 6(b).

In phase , the mirror supplies which is added to supplied by cell 2 and the result is stored into cell 1. Thus, (14) follows from the small-signal equivalent circuit depicted in Fig. 6(c).

Finally, in phase , the mirror changes its ratio. This is accomplished by shunting and . The ratio changes from to with . However, because the bias current on the left branch of the mirror is not changed, an extra current is supplied by its right-hand-side branch. This represents an output offset that can be canceled as discussed in the previous section. In terms of the small-signal analysis, the mirror supplies which is added to supplied by cell 1. The result is that flows into the output load . So, (15) follows from the small-signal equivalent circuit of Fig. 6(d).

Substituting (12)-(14) into (15) yields (16). With the definition (17), (16) can be approximated by (18), where is just a current proportional to the actual output current . Let us then analyze by substituting and from (11)-(15) into (18). After some algebra, relationship (19) is obtained.
The first two terms of the right-hand side of (19) represent an offset, the last term is the desired result, while the remaining terms constitute the nonlinear distortion. Note that is very close to but less than 1. Therefore, the third and fifth terms, being multiplied by a factor , can be neglected. Thus, relationship (19) can be rewritten as (20), where is the offset current. The nonlinear error is canceled by setting the current mirror ratio as in (21), which leads to the final expression (22). It is worth noting that, because , the offset is almost canceled as well, which gives (23). The analysis carried out in this section demonstrates the main advantage of the proposed architecture: the nonlinearity can be canceled by selecting the appropriate mirror gain.
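The effect of tuning the mirror gain can be illustrated with a deliberately simplified two-term model. It is a sketch under assumptions and does not reproduce (19)-(23): the quarter-square identity, the attenuation factor `alpha` applied to the retrieved memory content, and the names `product_estimate` and `nonlinear_residual` are all hypothetical. One squared term passes directly through the mirror with gain `gain`, while the other is retrieved from memory attenuated by `alpha`.

```python
# Simplified two-term model of the imbalance between a directly added
# squared term and an attenuated, previously stored one.
# alpha, gain, and k are hypothetical illustration values.

def product_estimate(x, y, alpha, gain, k=1.0):
    """Output when the stored term is attenuated and the direct term is scaled."""
    stored = alpha * k * (x - y) ** 2   # retrieved from memory, attenuated by alpha
    direct = gain * k * (x + y) ** 2    # supplied through the mirror with gain `gain`
    return direct - stored

def nonlinear_residual(x, y, alpha, gain, k=1.0):
    """Part of the output that is not proportional to x*y.

    Expanding the squares gives
    (gain - alpha)*k*(x^2 + y^2) + 2*(gain + alpha)*k*x*y,
    so the first term is the nonlinear error.
    """
    linear_part = 2.0 * (gain + alpha) * k * x * y
    return product_estimate(x, y, alpha, gain, k) - linear_part

if __name__ == "__main__":
    alpha = 0.98  # assumed attenuation of the memory cell
    print(nonlinear_residual(3.0, 4.0, alpha, gain=1.0))    # about 0.5: unity gain leaves x^2 + y^2 terms
    print(nonlinear_residual(3.0, 4.0, alpha, gain=alpha))  # about 0: matched gain removes them
```

With the gain matched to the attenuation, only a scale-factor error remains in this model, which is the qualitative behavior the analysis attributes to the proper choice of the mirror ratio.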
B. Circuit Design
Let us consider the circuit diagram shown in Fig. 5 and, for the sake of clarity, assume a symmetric power supply . Let us consider the two memory cells composed by -and the corresponding switches. The quiescent drain voltages of the MOST's are chosen to stay at ground. To avoid the drain voltage of these transistors jumping when the switches turn on or off, and are designed for (24) Therefore 0 V. The bias current can be chosen to be a safe value for the adopted technology. Alternatively, it can be chosen to satisfy other possible requirements on and/or or for a desired ratio. From these considerations the aspect ratios are determined as (25) There is still a degree of freedom that allows us to choose either or . Two alternatives are possible. If the output impedance of the cell is of major concern then is fixed and is determined accordingly. The second alternative regards the area . Larger areas minimize the clock feedthrough. On the other hand, smaller areas are necessary for high speed.
Both transistors and constitute a current mirror. Hence, to minimize the current error due to the channel modulation effect, the quiescent drain voltage of is chosen to stay at ground. and constitute the squarer circuit and it is assumed that all of them are equal [16] . Hence, from the above considerations, it follows that the quiescent drain voltage of stays at . The square law holds for [16] , [1] . Therefore, a minimum value for is fixed because the maximum value of is essentially given by . Taking into consideration the above assumptions for the node voltages and the fact that and are biased for a drain current equal to , while and are biased for a drain current equal to , the design of these transistors is then straightforward.
In (21), the mirror ratio (current gain) has been obtained. Assuming that , the aspect ratio of is obtained as follows: (26) However, the final design of also depends on the actual voltage drop of the switch in series with . A common measure of the accuracy is given by the difference between the multiplier's actual and ideal output, at full scale, as a percentage of the full scale itself. This is known as internal trim error [3] . This includes the effect of offset, feedthrough, nonlinearity, and scale-factor errors. A plot of versus , obtained by HSPICE simulation for a 100-kHz clock frequency and large signals, is shown in Fig. 7(a) .
Let us now consider the switches. The control voltages for the switches go from rail to rail while their terminals stay at a voltage close to ground during the whole operation of the circuit. Therefore, minimum-size switches can easily be designed by using either NMOS transistors or CMOS transmission gates. For the adopted technology, no appreciable difference has been noticed when the NMOST's are replaced with CMOS switches; thus, single NMOST's have been used in the considered implementation. There is only one exception: because the voltage at the drain of is , the two switches steering the currents and at the input of the squarer are implemented using CMOS switches instead of simple NMOST's.
V. EXPERIMENTAL PERFORMANCE EVALUATION
A prototype of the proposed multiplier has been fabricated in MOSIS Orbit n-well 2-μm technology. In this section, the experimental results obtained by testing the chip are discussed. A summary of the measured parameters is reported in Table I.
The control signals for the switches have been generated on-chip by means of digital circuitry [17], [18] driven by an external master clock of frequency . The multiplier performs multiplications per second since the frequency of the four phases to is . The nominal clock frequency is 400 kHz. Indeed, it has been experimentally verified [19]. The output current of the multiplier is measured by feeding a 2.2-kΩ grounded resistor. The power supply is 3 V and the power consumption with zero inputs is 0.3 mW (essentially constant up to 1.7 MHz). The multiplier's ideal transfer characteristic has a scale factor of 5000 A⁻¹. The measured input current ranges from −35 to 35 μA with a maximum output current of 6 μA. It has also been verified that the multiplier still works for an input range of around ±40 μA but with reduced linearity, as shown in Fig. 8(a). The experimental transfer characteristic of the multiplier is shown in Fig. 7(b). Due to the inherently discrete-time nature of the circuit, the output current is a pulse train and its envelope corresponds to the result of the multiplication. Thus, in order to trace the curves of Fig. 7(b), the voltage swing at the output resistor has been amplified and filtered by a low-pass filter that separates the carrier from the envelope. The curve has been traced using an HP 4145B curve tracer and the corresponding currents are reported close to the original instrument scales.
As previously pointed out, in this realization the constant offset current is internally compensated by inserting during phase a current source in node A (realized by the NMOST). However, due to unavoidable process variations, an offset current of 200 nA has been measured at the output. This can be easily zeroed by a suitable shift of the voltage level of the second terminal of the output resistor (less than 1 mV in our case).
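The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes an ideal characteristic of the form output = K × (input 1) × (input 2) with K = 5000 A⁻¹, a full-scale input of 35 μA, a 2.2-kΩ output resistor, and the 200-nA measured offset; the product form and the variable names are assumptions made for this check.

```python
# Consistency check of the measured figures (values from the text,
# product-form transfer characteristic assumed).

K = 5000.0          # assumed scale factor, in 1/A
I_FULL_SCALE = 35e-6  # full-scale input current, in A
R_LOAD = 2.2e3      # output resistor, in ohms
I_OFFSET = 200e-9   # measured output offset current, in A

i_out_fs = K * I_FULL_SCALE * I_FULL_SCALE   # full-scale output current
v_out_fs = i_out_fs * R_LOAD                 # full-scale voltage on the resistor
v_offset = I_OFFSET * R_LOAD                 # voltage shift needed to null the offset

print(f"full-scale output current: {i_out_fs * 1e6:.3f} uA")  # 6.125 uA
print(f"full-scale output voltage: {v_out_fs * 1e3:.3f} mV")  # 13.475 mV
print(f"offset voltage:            {v_offset * 1e3:.2f} mV")  # 0.44 mV
```

The result of about 6.1 μA agrees with the stated maximum output current of 6 μA, and the 0.44-mV shift agrees with the statement that less than 1 mV suffices to zero the offset.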
An internal trim error of 1.0% FS has been measured at a clock frequency of 400 kHz. This degrades to 1.5% FS at 1.7 MHz. More complete information on the nonlinearity is given by the total harmonic distortion (THD) for a sinusoid at 50 Hz. The trim error, in fact, is referred only to the accuracy at full scale, while the THD involves the entire range of operation of the circuit. In order to measure THD, a 50-Hz sinusoid is fed at one input while the other input is held constant. From the analysis carried out in the previous section, it turns out that the input current that mainly affects the linearity is . Therefore, a worst-case result is obtained if the sinusoidal signal is fed on the input . This is the case herewith considered. The THD has been measured by acquiring the spectrum of the output current using an HP 3588A spectrum analyzer. A plot of the THD for different values of the sinusoid amplitude and of is shown in Fig. 8(a) . The plot includes values outside the normal range of operation as well. Measurements obtained with negative are analogous to the one shown in Fig. 8(a) .
Another important performance parameter is the input feedthrough [3]. In particular, the -feedthrough refers to the case in which the input is zero. Conversely, the -feedthrough refers to the case in which the input is zero.
The maximum values of feedthrough obtained by keeping one of the inputs at zero and varying the other input through the full allowed range are depicted in Fig. 8(b) as a function of the clock frequency. The 3-dB small-signal bandwidth, measured at a clock frequency of 1.7 MHz, is close to 200 kHz. It is worth mentioning that, in practice, higher clock frequencies are hardly applicable because the performance of the circuit rapidly starts to degrade. The full-power bandwidth, at 1.7 MHz, is 150 kHz. Finally, the percentage of THD variation at full scale for a variation of the power supply is 0.213%/% (i.e., −13 dB %/%) while the SNR is 50 dB (again, as for the measurement of the THD, this is the worst case: the signal is fed to while fixes the gain).
A die photograph of the multiplier is shown in Fig. 9. The whole area, including the references for biasing the circuit, is 225 × 250 μm².
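The decibel figure quoted for the supply sensitivity follows from the usual 20·log10 conversion of a ratio, as the short check below shows; the helper name `ratio_to_db` is hypothetical.

```python
import math

def ratio_to_db(ratio):
    """Convert an amplitude-type ratio to decibels using 20*log10."""
    return 20.0 * math.log10(ratio)

sensitivity_db = ratio_to_db(0.213)   # 0.213 %/% from the measurement
print(f"{sensitivity_db:.1f} dB")     # prints -13.4 dB
```

The value is negative because the ratio is below one, so the 13-dB figure corresponds to about −13 dB.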
VI. CONCLUSION
The design, analysis, and experimental results of an S²I switched-current multiplier have been presented. A comprehensive analysis of the sources of offset and nonlinearity in the circuit has been performed. It has been found that, by appropriately setting a current-mirror gain, the nonlinearity can be effectively canceled. An IC prototype was fabricated using MOSIS n-well 2-μm technology. Experimental results are consistent with the theoretical findings. The use of the improved S²I cells described in [13] can enhance the performance further.
