Abstract-We present a continuous time analog VLSI CMOS circuit consisting of resistors and transconductors for computing the best-fit line to a set of data points. The circuit can implement standard least-squares linear fitting, as well as a form of linear fitting that is more robust to outliers. We analyze the static and transient response of the chip, and present design criteria given desired constraints on speed and accuracy. Finally, we describe the transistor level design and measurement results from a 50-input prototype fabricated using a 1.2 µm n-well process.
I. INTRODUCTION
This paper presents a continuous time analog VLSI CMOS circuit consisting of resistors and transconductors for computing the best-fit line to a set of data points. The circuit can implement standard least-squares linear fitting, where the quality of the fit is determined by the sum of the squares of the deviations between the line and the data, as well as a form of linear fitting that is more robust to outliers.
The circuit is targeted toward application in neuromorphic vision sensors, where analog pixel-parallel processing circuits are implemented alongside photosensing circuits [1]. It is motivated along the same lines as circuits in which information from an array of pixels is "summarized" in a few voltages or currents which encode some global property of an image. This includes circuits for global velocity estimation [2], winner-take-all computation [3], [4], orientation computation [5] and centroid computation [6].
The architecture is perhaps most similar to the constraint-solving circuits of Tanner's architecture for global velocity estimation [2]. A set of global wires distributes the current velocity estimates to the pixels. Each pixel checks to see if the velocity estimate satisfies its local constraint and generates a correcting current which charges or discharges the global wires. At steady state, the solution is a least-squares fit to the local constraints.
If the local image velocity estimate can be encoded as a voltage, this work opens the possibility of extending Tanner's approach to the case where the velocity varies linearly over the array, i.e., affine motion. Affine models are used in parametric model-based algorithms for estimation of image motion, e.g., [7], [8]. For small fields of view and smooth changes in viewpoint, the image velocity field can be well approximated by an affine transformation $\mathbf{v}(\mathbf{x}) = \mathbf{v}_0 + A\mathbf{x}$, where $\mathbf{v}$ denotes the image velocity, $\mathbf{v}_0$ is the translational velocity, and $A\mathbf{x}$ is a linear transformation of the image coordinates $\mathbf{x}$ [9]. In the two-dimensional case, the matrix $A$ can be decomposed into independent components: the divergence, curl and deformation. The divergence and deformation can be used to measure surface orientation and time-to-contact from a moving image [10]. In the 1D case as presented here, the divergence and deformation would be identical. However, the time-to-contact can be extracted if the camera translation is purely toward the surface patch (i.e., there is no translational velocity parallel to the image plane).
This work is also similar to work on image filtering using analog VLSI circuits [11]-[15]. The filtering operation can be considered as a more general form of curve fitting. The output at each pixel can be considered to be a compromise between a data fidelity term and a regularization term. The data fidelity term ensures that the output is close to the input. The regularization term constrains the shape (e.g., smoothness) of the solution. It can also be chosen to account for intensity discontinuities [16]. The work here imposes a stricter linear model on the data.
The remainder of this section summarizes the basic concepts of linear fitting. Section II presents the proposed circuit architecture, as well as an analysis of its operation. Section III outlines the procedure by which the components of the architecture can be specified given desired computational characteristics of the array such as the speed and accuracy. Section IV details the transistor level design and test results from a 50-input prototype fabricated using the 1.2 µm AMI process available through MOSIS.
The problem of linear fitting is to find the line which best describes a set of data points for . Here, we restrict , so the data correspond to samples equally spaced along the line. The "best fit" is determined by the choice of the cost function. Express the best-fit line by where , the difference between the values of the line at and , controls the slope of the line and , the value of the line at its midpoint , controls the offset. Define the deviation between linear model and the data point to be (1) then for standard least-squares fitting, we wish to find the coefficients and that minimize
The optimal values of and are
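As a rough numerical illustration of the closed-form least-squares solution, the sketch below fits a line to equally spaced samples. The parameterization is an assumption made here for illustration (samples at positions 1..N, line endpoints at positions 0 and N + 1, `d` the endpoint difference, `c` the midpoint value), and the function names are hypothetical; the expressions in the paper may be scaled differently.

```python
def least_squares_line(y):
    """Return (d, c) for the least-squares line through samples y[0..N-1].

    Assumed parameterization (for illustration only): sample i (1-based)
    sits at position i, the line endpoints sit at positions 0 and N + 1,
    d is the endpoint difference and c is the value at the midpoint.
    Requires at least two samples.
    """
    n = len(y)
    mid = (n + 1) / 2.0
    # Offset of each sample from the midpoint, normalized by the span N + 1.
    w = [(i - mid) / (n + 1) for i in range(1, n + 1)]
    # The offsets sum to zero, so the two normal equations decouple.
    c = sum(y) / n
    d = sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * wi for wi in w)
    return d, c


def line_value(d, c, i, n):
    """Value of the line with parameters (d, c) at 1-based sample position i."""
    return c + d * (i - (n + 1) / 2.0) / (n + 1)
```

Because the normalized positions are symmetric about the midpoint, the offset estimate reduces to the mean of the data and the slope estimate to a weighted sum of the data, which is what makes independent treatment of the two parameters possible.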
One problem with using the sum of squares cost function is the sensitivity of the optimal point to outliers, data points which are far from the linear fit. The sensitivity arises because large deviations between the linear model and the data are heavily penalized by the squaring operation. A cost function which is less sensitive to outliers is given by (4) where and is a positive constant. The contribution by data points which deviate from the linear model by less than the threshold still increases as the square of the deviation. However, the contribution by data points whose deviation exceeds increases only linearly. As , the cost function approaches (2) . Unlike the least-squares case, there is no closed form expression for the optimal values of and .
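A cost with the behaviour described above (quadratic below a threshold, linear beyond it) can be sketched as follows. The specific Huber-type form, the factor of 1/2, and the names `robust_cost`, `robust_influence` and `delta` are assumptions for illustration; the paper's (4) may differ by constant factors.

```python
def robust_cost(e, delta):
    """Huber-type penalty: quadratic for |e| <= delta, linear beyond it."""
    a = abs(e)
    if a <= delta:
        return 0.5 * e * e
    # The linear branch is offset so the penalty is continuous at |e| = delta.
    return delta * (a - 0.5 * delta)


def robust_influence(e, delta):
    """Derivative of robust_cost with respect to e: e clipped to [-delta, delta]."""
    return max(-delta, min(delta, e))
```

The derivative is simply the deviation clipped at the threshold, which is the property a saturating transconductor can provide. For a very large `delta` the penalty reduces to the ordinary squared error.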
II. CIRCUIT ARCHITECTURE
The proposed circuit for linear fitting is shown in Fig. 1 . The output voltages of the two op amps represent the endpoint values of the best-fit line at the origin and at the point .
The upper resistor line computes the linear estimates of the data points by linear interpolation between these two values
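The interpolation performed by the upper resistive line can be checked numerically. The sketch below assumes a chain of N + 1 equal resistors between the two endpoint voltages and assumes that the transconductor inputs draw no current from the interior nodes; both function names are hypothetical.

```python
def interpolated_node_voltages(v_left, v_right, n):
    """Voltages at the n interior nodes of a chain of n + 1 equal resistors."""
    return [v_left + (v_right - v_left) * i / (n + 1) for i in range(1, n + 1)]


def kcl_residuals(v_left, v_right, nodes):
    """Net current into each interior node in units of 1/R (zero if unloaded)."""
    chain = [v_left] + list(nodes) + [v_right]
    return [chain[k - 1] - 2.0 * chain[k] + chain[k + 1]
            for k in range(1, len(chain) - 1)]
```

With no load on the interior nodes, Kirchhoff's current law forces each node voltage to be the average of its two neighbours, and the linear ramp is the only solution that matches both endpoint voltages.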
The row of transconductance amplifiers computes the deviations between the predicted and actual data points. To implement the least-squares cost function, the output currents depend linearly upon the deviation. To implement the robust cost function, the output currents are linear for deviations less than , but saturate for larger deviations. The lower resistive line distributes these error currents appropriately to each capacitor so that its voltage moves in the correct direction to decrease the total squared error. The remainder of this section analyzes the operation of this circuit under the assumption that the op amps are ideal. The effects of a more realistic op-amp model are studied in the next section.
We consider the evolution of the differential and common mode components of the endpoint voltages since their dynamics are independent, while the dynamics of and are coupled. Under the ideal op amp assumption, the voltages at the ends of the lower resistive line are held at virtual ground. This implies where and . The evolution of the differential component depends upon the difference of the currents at the two ends of the lower resistive grid, while the evolution of the common mode component depends upon their average.
By superposition To implement least-squares fitting, we choose . The currents and can each be split into two components, one due to the input data and one due to the best-line estimate where we have substituted (5) and (7) is the effective transconductance from the differential voltage to the differential current between the two ends of the lower resistive line.
is the transconductance from the common mode voltage to the average of the two currents. Combining the above where (8) are time constants determining the response speed of the array. The response speed is determined by the settling time of the differential component. For time varying inputs, , the outputs are low pass filtered versions of the optimal estimates and with transfer function (9) The same array can implement the robust cost function in (4) by exploiting the natural saturation characteristic of many implementations of transconductance amplifiers. This idea has been exploited in other analog computational circuits [17] , [18] . Differentiating (4) with respect to and , we obtain where is given by (1) and
Comparing with (6), if we let the output of the transconductors be then and In other words, the differential and common mode voltages evolve such that the cost function is minimized. Since the cost function is convex, stability is guaranteed.
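The gradient-descent behaviour described in this section can be imitated with a simple forward-Euler simulation. This is a behavioural sketch only: the gains and time scales are normalized and are not meant to reproduce (8), the line parameterization (samples at positions 1..N, endpoints at 0 and N + 1) is an assumption, and the names `transconductor` and `simulate_fit` are hypothetical.

```python
def transconductor(e, g=1.0, e_sat=None):
    """Error 'current' for deviation e: linear, or clipped for |e| > e_sat."""
    if e_sat is not None:
        e = max(-e_sat, min(e_sat, e))
    return g * e


def simulate_fit(y, e_sat=None, g=1.0, dt=0.01, steps=4000):
    """Integrate the error currents until (d, c) settles.

    d is the endpoint difference and c the midpoint value of the line.
    With e_sat=None the flow descends the least-squares cost; with a
    finite e_sat it descends a Huber-type robust cost.
    """
    n = len(y)
    w = [(i - (n + 1) / 2.0) / (n + 1) for i in range(1, n + 1)]
    d, c = 0.0, 0.0
    for _ in range(steps):
        currents = [transconductor(yi - (c + d * wi), g, e_sat)
                    for yi, wi in zip(y, w)]
        # Each parameter moves along the negative gradient of the cost.
        d += dt * sum(wi * ik for wi, ik in zip(w, currents))
        c += dt * sum(currents)
    return d, c
```

On clean data both settings settle to the same line. When a few samples carry large impulses, the clipped currents limit how far those samples can pull the estimate, which is the mechanism the saturating transconductors exploit.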
III. CIRCUIT DESIGN CRITERIA
In this section, we derive constraints upon the circuit elements used to implement the array which are used in the transistor level designs described in the next section. In particular, we replace the ideal op amp model used in the previous analysis with a more realistic model of a voltage-controlled current source (VCCS) in parallel with an output resistance. We assume that the capacitance is large enough that the internal dynamics of the operational amplifiers are negligible. In addition, we consider constraints arising from the finite output range of the transconductance amplifiers. In the end, we find that we can consider the number of input data , the desired speed of the array, the capacitance, the desired accuracy, and the ratio of the output and input ranges of the transconductance amplifiers as the free parameters which determine the component values.
The analysis assumes that the transconductance amplifiers are linear, i.e., the least-squares cost function is implemented. The designed parameters should be effective for the robust cost function since for a small number of outliers, most of the transconductors should be operating in their linear region. In a sense, the linear analysis represents a worst case for the robust cost function, since if some of the transconductors are saturated, the aggregate gains and are reduced, improving the stability of the array at the expense of a decrease in speed.
A. Analysis With Nonideal Op Amp
Replacing the operational amplifier with a VCCS with gain in parallel with an output resistance , we obtain the circuit shown in Fig. 2 . Appendix I shows that where and is the total resistance of the lower resistive line and is the total resistance of the upper resistive line.
is the open-loop voltage gain from the voltage difference between the left and right op-amp inputs to the voltage difference between the op amp outputs.
is the similar open-loop voltage gain, but from output to input
The finite transconductance of the op amp increases the response speed, although the array may be unstable if is too low. There is also less attenuation of high frequency components in the input (e.g., noise) due to an additional zero to the transfer function To limit the effect of this zero, we choose
. If the effect of the op-amp output impedance is negligible, this is equivalent to (10) If the inputs are constant, the steady state values of the differential and common-mode voltages are
The common mode voltage still settles to its ideal value. However, the differential voltage is decreased by a factor which depends upon the product . In order to ensure a given accuracy, say %, we must choose (11) For accuracy, and should be large. Typically, is fixed by the application and is fixed by the desired speed of the array. There is also an upper limit on the determined by the maximum output range of the transconductor. Increasing increases the node voltages , which may exceed the output range of the transconductance amplifiers. We examine this constraint in more detail below.
Given a set of transconductor output currents and, assuming the op amps are ideal, the voltage at node of the lower resistive line is (12) This equation is derived by superposition. The first sum corresponds to the voltage at node due to currents entering nodes to the left and the second corresponds to currents entering nodes to the right.
Below, we derive an estimate of the variance of at steady state assuming the variance of the noise in the data is known. This enables us to upper bound the probability that exceeds the output range of the transconductors. Appendix II derives a stronger bound guaranteeing that does not exceed the output range of the transconductors even during the transient. The bound is based upon the operating condition, where the input is a horizontal line with value , but the current estimate is a horizontal line with value . This condition would be rarely observed in practice and never observed at steady state. That bound is also less convenient from a design point of view since it dictates a smaller value of , which in turn requires larger values of and (i.e., a larger op-amp open-loop voltage gain) to satisfy a given accuracy constraint. Thus, we feel the steady-state variance statistical bound derived below, which should be sufficient to ensure correct operation for slowly varying stimuli under most conditions, is preferable.
Assume that the input is a noisy line where are random variables which satisfy (13) Assume that and for all . At steady state, the differential and common-mode voltages settle to and . The output currents of the transconductors are given by . The variance measures the expected deviation between the data and a line. Assuming that the correlation between and for is negligible, the variance of is which is maximized for
Since it is a weighted sum of random variables , we assume to be Gaussian. If then the probability that exceeds the output range of the transconductance amplifiers is less than 0.3%. This requires that (14) If is uniformly distributed between , then . This implies that (15)
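The two numerical facts used in this argument can be checked directly. The sketch below assumes the bound is a three-standard-deviation criterion on a zero-mean Gaussian and that the uniform noise is distributed on a symmetric interval [-a, a]; the function names are hypothetical.

```python
import math


def gaussian_two_sided_tail(k):
    """Probability that a zero-mean Gaussian deviates by more than k sigma."""
    return math.erfc(k / math.sqrt(2.0))


def uniform_variance(a):
    """Variance of a random variable uniformly distributed on [-a, a]."""
    return a * a / 3.0
```

For k = 3 the tail probability is about 0.27%, consistent with the "less than 0.3%" figure quoted above.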
B. Component Specification Procedure
Combining the results above, we find that we can consider the number of input data points , the desired speed of the array , the capacitance , the desired accuracy of the differential voltage , and the ratio of the output and input ranges of the transconductance amplifiers , as free parameters. The component values and can be determined using the procedure outlined below.
Given and :
1) choose using (8);
2) choose according to (10);
3) choose according to (15);
4) choose and according to (11).
For example, the following component parameters satisfy the requirements s, nf, %, and
IV. TRANSISTOR LEVEL DESIGN AND TEST RESULTS
We have implemented a 50-cell array based on the design specified above on a 2 mm × 2 mm die using the 1.2 µm AMI process provided through MOSIS. The prototype requires ±2.5 V supplies. This section describes the CMOS transistor level design of the circuits as well as the measured results from the prototype.
A. Transconductance Amplifier
We use the standard 5 transistor NMOS input differential pair transconductance amplifier shown in Fig. 3 to implement 
B. Resistors
The resistors in the lower line are implemented using polysilicon lines. Their measured resistance is 227 Ω. The resistors in the upper line are implemented using complementary transistors in parallel biased in their linear region [Fig. 4(a)]. A global bias circuit generates the bias voltages and which are distributed to all circuits in the array.
Although complementary transistors enable large resistances in smaller area than required by polysilicon, the current varies nonlinearly with the terminal voltages. Using the EKV model [19] , the current through the resistors can be approximated to second order by (16) where is the common mode voltage across the resistor and is the differential voltage across the resistor and
The pinch-off voltages and are referenced with respect to ground potential and are functions of the gate voltages and . Note that they have opposite signs. The parameters are given by . The slope parameters and are approximately equal to one, but depend weakly upon the gate voltages. The ratios of the transistors can be sized to provide the desired resistance while decreasing the effect of the common mode voltage. We chose the ratios to minimize based on BSIM3 simulation models. Due to parameter mismatch, the measured variation in current due to common mode offsets is larger than the simulated variation. Fig. 4(b) plots the current through the resistor circuit versus the common mode voltage across the terminals for differential voltage varying between 300 mV. Ideally, the lines should be horizontal. We obtain the parameters A/V ( k ) and /AV by least-squares fitting of (16) to the data for mv mV, which is the expected operating range of the resistors. Over this range, the variation in the conductance is 6.7%. 
C. Operational Amplifier
The operational amplifier is implemented using the conventional two stage op amp shown in Fig. 5 . It is designed to have output resistance k and transconductance mA/V.
D. Input Stage
The inputs to the array are scanned in and stored by a sample and hold circuit implemented at every cell and shown in Fig. 6 [20] . The voltages are connected to a common line and a shift register provides the voltages and which select the capacitor which will store the current voltage on the input line. The capacitor voltage is connected directly to the inverting input of the transconductance amplifier to supply the voltage .
E. Array Performance
To test the chip's ability to perform linear fitting with clean data, inputs of the form with mV and varying between ±600 mV were applied to the chip. The steady-state values of and are plotted versus in Fig. 7(a).
Ideally, the graph of should be a line with slope one passing through the origin and the graph of should be a line with zero slope passing through the origin. However, there are slope and offset errors in both curves. The offset errors are caused by offsets in the op amps, transconductance amplifiers and measurement circuits. The slope error for is partially due to the finite transconductance of the op amp, as described above. However, the observed error is much larger than predicted by this factor alone. Our measurements reveal that the actual resistance of the upper resistors near the right hand side of the array is lower than that of those to the left. For the voltages are larger than those which would be obtained by linearly interpolating between the two op amp outputs, since increases with faster for small than for large . Thus, the best-fit line to has a larger slope than predicted by and a larger (more positive) offset than predicted by . For , the slope and offset are more negative. The magnitudes of the slope and offset errors increase with . The errors described above can be largely compensated by a linear scaling plus constant offset for and an additive offset for , which is an affine function of (17) For this chip, mV, and mV. The result of applying this compensation to the measured data is shown in Fig. 7(b). The remainder of the data reported uses this compensation.
[Figure caption fragment: Transistor sizing is in units of λ = 0.6 µm.]
To test the effect of varying the offset, clean data with mV and varying values of between ±250 mV were applied to the chip. The steady state values of and after compensation are plotted versus in Fig. 8. As expected, is approximately a line with unit slope and is approximately a line with zero slope.
To test the linear fitting capabilities of the circuit for noisy data, similar measurements were taken using noisy data where was generated using independent random numbers uniformly distributed between which were corrected by subtracting a linear and offset term so that (13) is satisfied. The mean absolute error between the chip output and the true line parameters computed over 500 trials is shown in Fig. 9 for two lines with different slopes. Values of ranged from 0 to 200 mV. The error increases with the noise level, with the error in the differential voltage being larger than that in the common mode voltage. For the largest noise level mV corresponding to an input mean absolute error of 100 mV, the mean absolute error in the common mode voltage is about 6 mV. The error in the differential voltage is about 27 mV.
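The noise construction described above can be sketched in a few lines. The sketch assumes that (13) requires the noise to have zero sum and zero first moment about the array midpoint (that is, the noise contains no line component), and that the raw samples are uniform on [-a, a]; the function name and arguments are hypothetical.

```python
import random


def corrected_uniform_noise(n, a, rng):
    """Uniform noise on [-a, a] with its own best-fit line subtracted.

    rng is a random.Random instance. The returned samples have zero sum
    and zero first moment about the midpoint of the 1-based positions 1..n.
    """
    raw = [rng.uniform(-a, a) for _ in range(n)]
    mid = (n + 1) / 2.0
    pos = [i - mid for i in range(1, n + 1)]
    mean = sum(raw) / n
    slope = sum(p * r for p, r in zip(pos, raw)) / sum(p * p for p in pos)
    # Remove the offset and linear terms so the true line is unchanged.
    return [r - mean - slope * p for p, r in zip(pos, raw)]
```

Under this assumption the least-squares line through the noisy data coincides with the underlying clean line, so any error in the chip output reflects the circuit rather than the data. A uniform variable on [-a, a] has mean absolute value a/2, which matches the 200 mV to 100 mV correspondence quoted above.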
Provision for measuring the response speed of the array was made by implementing CMOS transmission gates across the two capacitors. During normal operation, these transmission gates are open. When closed, the output of the op amp is shorted to virtual ground. By first closing then opening the gate and observing the op amp outputs, the speed of the response can be measured. This measurement was performed on clean data with mV and mV. The measured rise time (10%-90%) of the differential component was 15 µs, corresponding to a time constant of 6.8 µs. The rise time of the common mode component was 5.0 µs, corresponding to a time constant of 2.3 µs. As predicted by (8), the common-mode component evolves three times faster than the differential component. The power dissipation of the array was 4.7 mW.
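The conversion from rise time to time constant quoted above follows from a first-order step response, for which the 10%-90% rise time equals the time constant multiplied by ln 9. A minimal check, assuming first-order settling and assuming the quoted times are in microseconds (the function name is hypothetical):

```python
import math


def time_constant_from_rise_time(t_rise):
    """Time constant of a first-order response given its 10%-90% rise time."""
    # 1 - exp(-t/tau) reaches 10% at tau*ln(10/9) and 90% at tau*ln(10).
    return t_rise / math.log(9.0)


tau_differential = time_constant_from_rise_time(15e-6)
tau_common_mode = time_constant_from_rise_time(5.0e-6)
ratio = tau_differential / tau_common_mode
```

The two measured rise times give time constants close to the quoted 6.8 and 2.3, and their ratio is the factor of three stated in the text.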
Robust linear fitting was tested by decreasing the bias current of the transconductance amplifier so that the output current saturated at a differential voltage across the inputs of 100 mV. Clean data with mV and mV was corrupted by additive impulsive noise with a fixed magnitude of 200 mV. The percentage of corrupted data points was varied from 0 to 100, with positive and negative impulses being equally likely. The mean absolute errors between the chip output and the true line parameters are compared for the chip configured for least-squares fitting and for robust fitting in Fig. 10. For fewer than 55% corrupted data points, the mean absolute error for robust fitting is smaller than that for least-squares fitting, since the data points which are corrupted contribute less to the cost function. On the other hand, when most of the data points are corrupted, the least-squares circuit performs better since it better utilizes the information contained in the corrupted data to estimate the underlying slope and offset.
Reducing the bias current for robust fitting decreased the power consumption to 1.9 mW. The speed of the array was also reduced due to the decrease in the transconductance. For clean data with mV and mV, the rise time of the differential component increased to 575 µs and the rise time of the common mode component increased to 266 µs.
V. CONCLUSION
We have described a continuous-time analog CMOS circuit architecture for performing least-squares and robust linear fitting targeted at applications in analog parallel processing arrays. Design criteria in terms of desired computational characteristics of the array, such as speed, accuracy and input range were derived. Test measurements from a 50-input prototype verify the functionality of this architecture.
APPENDIX I
Since we are no longer assuming an ideal op amp, the input voltage at the inverting inputs of the op amps and are no longer held at virtual ground. Define By superposition (18) where and are given by (7) and is the total resistance of the lower resistive line. The factor of 2 arises in the first equation since the current due to the voltage difference which leaves the left side of the lower resistive grid is the negative of the corresponding current leaving the right side, and it is counted twice in the difference.
By KCL, the output currents of the op amps must satisfy where is the total resistance of the upper resistive line. Adding and subtracting these equations and substituting (18), we obtain thus (19) The voltages across the left and right capacitors and can be split into differential and common mode components as well:
Substituting (19) Differentiating (20) Substituting (19) into (18) 
APPENDIX II
Here we derive an upper bound on the value of given the maximum voltage input to the transconductance amplifiers . Unlike the estimate of the steady-state variance, this is a deterministic estimate which is valid for the entire transient. Since all of the coefficients in the sum (12) are positive, is maximized if all of the achieve their maximum value . This corresponds to the operating condition when the input data is a horizontal line with value but the current parameter estimates correspond to a horizontal line with value . Substituting the value of into (12) and evaluating the sum, we obtain
The right-hand side of the inequality reaches its maximum at Let be the maximum output of the transconductance amplifiers with respect to ground. To guarantee proper operation This bound is stronger than the variance-based bound given in (14), assuming that .
