I. INTRODUCTION
Applications of adaptive filtering in portable audio systems call for special-purpose micropower and high-density hardware implementing the filtering and adaptive functions. While digital signal processing (DSP) solutions provide adequate levels of power dissipation for many applications, a micropower approach is needed for applications such as hearing aids and MEMS sensors. This can be achieved using dedicated analog circuits with MOS transistors in subthreshold [1], [2].
Several analog implementations of adaptive filters exist in the literature [3]-[6]. The filtering process itself involves linear multiplication of filter coefficients with a set of time-delayed inputs. Analog multiplication is often implemented using the square-law characteristic of a MOS transistor above threshold [7], [8]. Alternative implementations with subthreshold MOS operation yield potentially lower power dissipation [9]-[12], but are inherently nonlinear in the voltage domain.
We propose an alternative pulse-width modulation scheme with wide-range linear voltage inputs and MOS switched current sources, biased in the subthreshold region for a large (exponential) dynamic range of weight coefficients. A pulsed internal signal representation is also attractive from the perspective of neural models of information and signal processing [13], with area-efficient implementation in VLSI [14]-[16].
For adaptation, the least-mean-square (LMS) algorithm is widely used due to its simplicity [17]. Analog implementation of the LMS rule requires four-quadrant outer-product multiplication of the input and error vectors, typically implemented using Gilbert multipliers. We propose LMS adaptation using pulse arithmetic and charge-based updates, which is less sensitive to analog effects and transistor mismatch.
II. CIRCUIT ARCHITECTURE
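The input-output relationship of an adaptive filter is defined by

$$y(n) = \sum_{k=1}^{N} w_k\, x(n-k) \qquad (1)$$

with filter coefficients adapted through the LMS learning rule:

$$w_k(n+1) = w_k(n) + \eta\, e(n)\, x(n-k) \qquad (2)$$

where $N$ is the number of taps, $e(n)$ is the error signal, and $\eta$ is the adaptation rate for learning. In the following we describe in detail each of the building blocks of an adaptive filter: the delay line, the adaptation cell, and the multiply-and-accumulate circuit with output driver.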
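As a behavioral illustration of the filtering and adaptation steps referred to as (1) and (2), the following Python sketch runs an LMS filter on a small system-identification task. It is a software model only, not the circuit: the error definition e(n) = d(n) - y(n), the function name `lms_fir`, the 3-tap system `h_true`, and the rate `eta = 0.1` are all assumptions chosen for illustration.

```python
import random

def lms_fir(x, d, num_taps, eta):
    """Behavioral LMS FIR filter (software sketch, not the analog circuit).

    taps[j] holds x(n-1-j), so the filter uses delayed inputs only.
    Returns the final weights and the error sequence.
    """
    w = [0.0] * num_taps
    taps = [0.0] * num_taps
    errors = []
    for x_n, d_n in zip(x, d):
        y_n = sum(wk * xk for wk, xk in zip(w, taps))         # filter output
        e_n = d_n - y_n                                       # assumed error definition
        w = [wk + eta * e_n * xk for wk, xk in zip(w, taps)]  # LMS weight update
        errors.append(e_n)
        taps = [x_n] + taps[:-1]                              # shift the delay line
    return w, errors

# Hypothetical test case: identify a known 3-tap system from noise-free data.
rng = random.Random(0)
h_true = [0.5, -0.3, 0.2]
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(h_true[j] * (x[n - 1 - j] if n - 1 - j >= 0 else 0.0)
         for j in range(len(h_true)))
     for n in range(len(x))]

w_est, err = lms_fir(x, d, num_taps=3, eta=0.1)
print([round(v, 4) for v in w_est])
```

With noise-free data and a small adaptation rate, the estimated weights approach `h_true`; a larger `eta` speeds convergence at the cost of stability margin.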
A. Delay Line
Typical audio applications require a large number of taps. In our system, a 64-tap analog delay line is realized. The cumulative effect of offset and linear gain errors in each stage of the delay line results in a sizable offset and scaling at the output. However, offset and gain errors do not disturb the linear filtering operation, and only contribute to the DC component of the output signal and modified filter coefficients. Indeed, assume an additive offset $o_k$ and non-unity gain $a_k$ for each stage. Then, the resulting output of the filter is

$$y(n) = \sum_{k=1}^{N} w'_k\, x(n-k) + y_0, \quad \text{where} \quad w'_k = w_k \prod_{l=1}^{k} a_l$$

and $y_0$ collects the accumulated offsets, which still implements a linear filter with an additive DC offset. Therefore, stringent design constraints on the offset and gain specifications of the delay element can be avoided, and a standard switched capacitor (SC) design can be used.
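The claim that per-stage gain and offset errors leave the filter linear can be checked numerically. The sketch below assumes a stage model u_k(n) = a_k u_{k-1}(n-1) + o_k with u_0(n) = x(n) (offset added after the gain, which is an assumption, not taken from the circuit), and compares the nonideal filter against an ideal one with rescaled weights plus a constant. All numeric values and function names are hypothetical.

```python
import random

def filter_with_nonideal_delay(x, w, a, o):
    """FIR output when each delay stage has gain a[k] and offset o[k]."""
    n_taps = len(w)
    prev = [0.0] * (n_taps + 1)      # prev[k] = u_k(n-1); index 0 is the input
    y = []
    for x_n in x:
        cur = [x_n] + [a[k - 1] * prev[k - 1] + o[k - 1]
                       for k in range(1, n_taps + 1)]
        y.append(sum(w[k - 1] * cur[k] for k in range(1, n_taps + 1)))
        prev = cur
    return y

def equivalent_ideal_filter(w, a, o):
    """Effective weights w'_k = w_k * prod(a_1..a_k) and the DC term y0."""
    w_eff, y0, gain, c = [], 0.0, 1.0, 0.0
    for wk, ak, ok in zip(w, a, o):
        gain *= ak            # cumulative gain up to this tap
        c = ak * c + ok       # cumulative offset at this tap
        w_eff.append(wk * gain)
        y0 += wk * c
    return w_eff, y0

rng = random.Random(1)
w = [0.4, -0.2, 0.1, 0.3]
a = [1.02, 0.98, 1.01, 0.97]
o = [0.01, -0.02, 0.015, 0.005]
x = [rng.uniform(-1.0, 1.0) for _ in range(50)]

y_real = filter_with_nonideal_delay(x, w, a, o)
w_eff, y0 = equivalent_ideal_filter(w, a, o)
n_taps = len(w)
y_model = [sum(w_eff[k - 1] * x[n - k] for k in range(1, n_taps + 1)) + y0
           for n in range(n_taps, len(x))]
max_err = max(abs(r - m) for r, m in zip(y_real[n_taps:], y_model))
print(max_err)
```

The comparison starts at sample index `n_taps` because the first few outputs still contain the transient from the zero initial conditions of the delay line.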
The delay element, shown in Figure 1, is implemented by cascading two sample-and-hold circuits [18]. A cascoded inverter is used for the high-gain amplifier. This delay element occupies a small chip area, is insensitive to parasitics, and is fast. Because the clock rate for audio applications is low, slew rate and settling time pose no problems.
B. Multiply-Accumulate
The multiply-accumulate terms in (1) are implemented by integrating switched currents, controlled by pulse-width modulation of the input signal and by the gate voltages of a pair of CMOS current sources, as shown in Figure 2. The realized multiplication is four-quadrant, with differential weights and a bipolar input signal. The source voltage of the multiplication transistor is pulsed, where the width of the pulse is proportional to the absolute value of the input signal, $x$. The polarity of the input signal, relative to the reference voltage $V_{ref}$, controls the position of the pulse with respect to the reference time $t_0$, counted as negative on one side and positive on the other, as shown in Figure 3(a). The circuit used for pulse-width modulation and for determining the sign of the input signal is shown in Figure 3 [...] from the sum of positive and negative terms.
The circuit for implementing this weighted subtraction is similar to the one used in the algorithmic A/D converter [19] and is given in Figure 2. A fully differential design is adopted throughout, with separate signal paths for the $w^+$ and $w^-$ contributions (each in turn with separate integrations of $I^+ - I^-$).
Following the two differential integration stages is a standard SC subtraction stage.
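The pulse-width multiplication described in this subsection can be sketched behaviorally as charge = (signed pulse width) x (differential current). The model below is an idealization under stated assumptions: the subthreshold current follows the textbook exponential I = I0 exp(kappa V / U_T), and the constants `i0`, `kappa`, `t_per_volt`, and the gate voltages are hypothetical values, not taken from the circuit.

```python
import math

U_T = 0.0258  # thermal voltage near room temperature, in volts

def subthreshold_current(v_gate, i0=1e-15, kappa=0.7):
    """Textbook subthreshold MOS current; i0 and kappa are assumed values."""
    return i0 * math.exp(kappa * v_gate / U_T)

def pulse_mac(xs, v_plus, v_minus, t_per_volt=1e-6, v_ref=0.0):
    """Accumulated charge for a set of taps.

    Each tap contributes sign(x) * pulse_width * (I_plus - I_minus), where the
    pulse width is proportional to |x - v_ref| and the two currents are set by
    the differential weight voltages.
    """
    q = 0.0
    for x, vp, vm in zip(xs, v_plus, v_minus):
        width = t_per_volt * abs(x - v_ref)        # pulse-width modulation
        sign = 1.0 if x >= v_ref else -1.0         # polarity of the input
        i_diff = subthreshold_current(vp) - subthreshold_current(vm)
        q += sign * width * i_diff
    return q

q_pos = pulse_mac([0.1], [0.30], [0.25])
q_double = pulse_mac([0.2], [0.30], [0.25])
q_neg_input = pulse_mac([-0.1], [0.30], [0.25])
q_neg_weight = pulse_mac([0.1], [0.25], [0.30])
print(q_pos, q_double, q_neg_input, q_neg_weight)
```

In this model the result is linear in the input voltage while the weight magnitude depends exponentially on the gate voltages, and all four sign combinations of input and weight are covered.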
C. Adaptation Cell
The adaptation weight cell, capable of providing fine weight changes with both positive and negative increments, is shown in Figure 4(a). Implementation of the learning rule (2) [...] These voltage levels are applied externally and control the value of the adaptation rate $\eta$. During the sgn($x$) pulse, the parasitic capacitor at node A is charged to voltage Vl, [...]
III. SIMULATION RESULTS
The operation of the circuits was verified through SpectreS simulation in Cadence using parameters obtained from a 0.5 µm CMOS process. The step response of the delay line, at different tap positions, is shown in Figure 5. After the transients due to the (zero) initial conditions, the cumulative offset at each tap position settles to a constant over time. Figure 6 shows the trajectory of the differential weights with constant sign of the input and error signals. Figure 7 shows the linear characteristic of the multiplication. Estimated power dissipation for two 64-tap filters, at a 100 kHz sampling rate, is 200 µW, and the energy dissipated per cell per clock cycle is 16 pJ.
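The per-cell energy figure follows from the quoted power, tap count, and sampling rate. The short calculation below reproduces it, assuming that a "cell" is one filter tap, so that the two 64-tap filters contain 128 cells in total.

```python
power_w = 200e-6        # total estimated power, 200 microwatts
n_filters = 2
taps_per_filter = 64
f_sample_hz = 100e3     # 100 kHz sampling rate

# Assumption: one cell per tap, so 2 x 64 = 128 cells share the power.
n_cells = n_filters * taps_per_filter
energy_per_cell_j = power_w / (n_cells * f_sample_hz)

print(f"{energy_per_cell_j * 1e12:.1f} pJ per cell per clock cycle")
```

The result is about 15.6 pJ, which rounds to the 16 pJ stated in the text.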
When update goes high, the charge on the two capacitors is shared. [...] is regulated by the weight decay term on the right side of (11), pulling the values towards the center of the range VAO.
There are two physical mechanisms besides the adaptation that affect the voltage on the weight capacitor. The first is charge injection from transistor M2, and the second is charge leakage due to the n-drain/p-substrate junction of M5. Since both mechanisms affect the voltage in the same direction, we need to compensate for this bias, which is accomplished by applying voltages and Vht that are biased in the opposite direction to VAO. If a longer weight storage time is needed, dynamic refresh of the capacitor memory is necessary [20].
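The charge-sharing update can be illustrated with an idealized two-capacitor model. This is a sketch under assumptions, not the cell's actual equations: a small capacitor `c_p` is precharged to a level `v_pre` and then shorted to a larger storage capacitor `c_w` holding the weight voltage `v_w`; the capacitor values, voltage levels, and names are hypothetical, and charge injection and leakage are ignored.

```python
def charge_share(v_w, v_pre, c_w=1e-12, c_p=10e-15):
    """Weight voltage after ideal charge sharing (total charge is conserved)."""
    return (c_w * v_w + c_p * v_pre) / (c_w + c_p)

# Hypothetical operating point: precharge level sits delta above the center.
c_w, c_p = 1e-12, 10e-15
v_center, delta = 2.5, 0.5
ratio = c_p / (c_w + c_p)            # fraction of the difference transferred

v_w = 2.6
v_new = charge_share(v_w, v_center + delta, c_w, c_p)

# The step splits into a fixed increment and a decay toward the center.
step = v_new - v_w
increment = ratio * delta
decay = -ratio * (v_w - v_center)
print(step, increment + decay)

# Repeated updates with the same sign move the weight toward the precharge level.
trajectory = [v_center]
for _ in range(200):
    trajectory.append(charge_share(trajectory[-1], v_center + delta, c_w, c_p))
print(trajectory[-1])
```

In this idealization the update size is set by the capacitor ratio and the externally applied level, and the decay term appears automatically because the transferred charge depends on the present weight voltage.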
IV. CONCLUSION
An efficient, low-power and high-density analog realization of an FIR adaptive filter is presented, making use of pulse-based charge-mode computation. The circuit operates with subthreshold MOS transistors and achieves a wide linear voltage range. The scheme extends to the design of neural filtering systems, including Independent Component Analysis (ICA) [21].
An efficient charge-mode implementation of the LMS rule is included in the architecture. The delay line can be replaced by more general elements such as all-pass filters for further enhancement. 
