Abstract-A 170-MHz analog finite impulse response (FIR) filter operating from a single 3.3-V supply is described. The design has been fabricated in the HP 1.2-μm CMOS process and has an area of 2.35 mm by 1.97 mm including bonding pads. This 9-tap filter dissipates 70 mW when operating at 170 MHz. The multipliers are implemented using multiplying digital-to-analog converters (MDAC's) with 6-b resolution.
polynomials [e.g., in (1)]. Another implementation used an additional master T/H amplifier before the circular buffer to reduce the fixed pattern noise (FPN) caused by sample time variation in the circular buffers [13]. However, the use of the master T/H amplifier removes the speed advantage of the circular buffer architecture. The filter of [13] was fabricated in a 0.8-μm BiCMOS process, had five taps, and consumed 240 mW when operating at 160 MHz.
In the implementation presented in this paper, digital tap weights are used to provide the programmability needed for a general-purpose FIR filter. We demonstrated the circuit's performance in equalizing a magnetic recording read signal to different PR polynomials. The test also shows that the required accuracy can be achieved without a master T/H amplifier by careful design to improve matching in the parallel T/H array. The multiplying digital-to-analog converter (MDAC) structure used in this design reduces the overall power dissipation. Fabricated in a 1.2-μm CMOS process with 1-μm minimum channel lengths, the IC can work at 170 MHz operating from a single 3.3-V supply while consuming 70 mW. This paper is organized as follows: Section II describes the circular buffer architecture and the resulting FPN. Circuit design issues are discussed in Section III. Section IV shows the chip layout. Section V reports test results and Section VI presents a summary.
II. CIRCULAR BUFFER ARCHITECTURE
The circular buffer architecture was first published in 1992 [5] and has been used elsewhere as well [3], [6], [13]. Fig. 1(a) shows the basic structure of the circular buffer.
In the circular buffer, an array of parallel T/H's is used to track the input. The number of T/H's, $N$, is larger than the number of taps, $M$, so that at any time, $M$ out of $N$ T/H's are used in the calculation of the output samples. By choosing $N$ and $M$ and using proper T/H control clocks, the acquisition and hold settling times of the T/H's can be greater than one sample period, and high-speed operation is facilitated. The T/H control clocks cycle around the array in a circular manner as shown in Fig. 1(b) for $N = 12$ and $M = 9$, which are used in this design.
$T$ is the master clock period, so an output sample is produced every $T$ seconds. In the time domain, each T/H has two complete sampling periods for tracking the input and another period to settle to a stable held value after the track-to-hold transition. The stable held value stays unchanged for nine periods before the next hold-to-track transition. At the end of each period, one T/H cell changes from track mode to hold mode, one changes from hold mode to track mode, and all others remain unchanged. As a result, nine stable T/H output values are available for multiplication at any given time. Another advantage of this structure compared to a conventional transversal filter is that all the T/H cells track the input directly and the errors do not accumulate as they do in a serial T/H delay line.
0018-9200/98$10.00 © 1998 IEEE
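The timing described above can be sketched in a few lines of Python. This is a minimal behavioral model, not the authors' circuit: the ring order of the cells, the function names, and the zero initial state are assumptions made for illustration, while the counts (12 cells, 9 taps, two track periods, one settle period) come from the text. The model checks that routing the nine stable cells to the taps reproduces an ordinary 9-tap FIR response with a two-period latency.

```python
N_CELLS = 12      # parallel T/H cells (value from the text)
N_TAPS = 9        # FIR taps (value from the text)
TRACK = 2         # periods each cell spends tracking (from the text)
SETTLE = 1        # periods allowed for settling after track-to-hold (from the text)


def cell_state(cell, n):
    """State of T/H `cell` during clock period n (ring order is an assumption)."""
    phase = (n - cell) % N_CELLS
    if phase < TRACK:
        return "track"
    if phase < TRACK + SETTLE:
        return "settle"
    return "hold"


def mux_routing(n):
    """Index of the T/H cell routed to each tap k during period n."""
    return [(n - k - TRACK - SETTLE) % N_CELLS for k in range(N_TAPS)]


def circular_buffer_fir(x, w):
    """Filter x with tap weights w using the rotating T/H array."""
    held = [0] * N_CELLS          # assumed initial state: all cells hold zero
    y = []
    for n, sample in enumerate(x):
        # Use only the nine cells whose held values are stable in period n.
        y.append(sum(w[k] * held[c] for k, c in enumerate(mux_routing(n))))
        # End of period n: the cell finishing its second track period holds x[n].
        held[(n - (TRACK - 1)) % N_CELLS] = sample
    return y


def reference_fir(x, w, latency=SETTLE + 1):
    """Direct-form FIR with the same latency, used only for comparison."""
    return [
        sum(w[k] * x[n - latency - k]
            for k in range(N_TAPS) if n - latency - k >= 0)
        for n in range(len(x))
    ]
```

Calling `mux_routing(n)` for successive `n` shows the routing pattern rotating by one cell per period, which is the job of the multiplexer between the T/H array and the MDAC's.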
One complication of this architecture is that when a T/H is in hold mode, its held value ages by one sample period every cycle (the sample that serves as $x(n-k)$ in one cycle becomes $x(n-k-1)$ in the next) and must be rerouted to the appropriate MDAC. This rerouting is realized by a multiplexer with one input per T/H and one output per tap (12 inputs and 9 outputs in this design). Alternatively, the weights to the multipliers could be rotated, but doing that consumes more power because of the large number of bits involved. Another problem with this structure is the FPN due to mismatches among T/H cells. The discussion and analysis are similar to those for parallel A/D converters [12] and will not be repeated here. The main sources of FPN are summarized in Table I.
III. CIRCUITRY
A 3.3-V supply was used for this design. A simplified signal path is shown in Fig. 2, which includes each function shown in the block diagram of Fig. 1(a) plus the input buffers described later in Section III-D. Fig. 2 illustrates that differential signals are used throughout. The design considerations for the individual blocks are discussed below.
A. Multiplication
Each multiplication is carried out by a 6-b MDAC. One bit of the MDAC is shown in Fig. 3(a). An NMOS differential pair is used to convert the input voltage to a differential current output. Multiplication by one or zero is realized by turning the bias current on or off using digital tap weight values. Using the digital inputs to control the bias current instead of stacking differential pairs allows for relatively large input and output voltage swings with a 3.3-V supply. Also, since multiplying by zero is realized by turning the corresponding MDAC cell off, power consumption is reduced.
For high-speed operation with reasonable size transistors in the multiplexer, it is important to minimize the input and output capacitances of the multipliers. For that purpose, partial segmentation is used in the 6-b MDAC [7]; only the four most significant bits use a segmented architecture, while the remaining two bits use a binary-weighted array made by scaling the input transistor sizes and biasing currents as shown in Table II. The current outputs of all nine MDAC's are summed together and amplified by the summer stage shown in Fig. 3(b). This circuit yields a known common-mode output current, which is convenient for designing the next stage.
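As an illustration of partial segmentation, the sketch below splits a 6-b code into cell enables. It assumes that "segmented" means thermometer (unary) coding of the four MSB's into 15 equal cells, each worth four LSB's, and it treats the weight as an unsigned magnitude; neither detail is stated in the text, and the function names are hypothetical.

```python
def mdac_cells(code):
    """Split a 6-b magnitude code (0..63) into cell enables.

    Assumption: the four MSB's drive 15 equal thermometer-coded cells
    (each worth 4 LSB's); the two LSB's drive binary cells worth 2 and 1.
    """
    if not 0 <= code <= 63:
        raise ValueError("code must fit in 6 bits")
    msb = code >> 2                    # upper four bits, 0..15
    lsb = code & 0b11                  # lower two bits, 0..3
    thermometer = [1 if i < msb else 0 for i in range(15)]
    binary = [(lsb >> 1) & 1, lsb & 1]
    return thermometer, binary


def mdac_weight(code):
    """Total weight, in LSB's, of the cells whose bias current is on."""
    thermometer, binary = mdac_cells(code)
    return 4 * sum(thermometer) + 2 * binary[0] + binary[1]
```

In this model a code of zero leaves every cell off, which matches the observation that multiplying by zero is done by switching the bias current off.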
The key parts in the multiplication stage are the differential pairs used in the MDAC and the summer. There are two dominant distortion sources. The first is the mismatch between the two input transistors M1 and M2, which gives rise to second harmonic distortion. The second is that the voltage-to-current relation of the differential pair cannot be treated as linear for large-signal analysis, which mainly contributes to third harmonic distortion. The details are shown in the Appendix; the resulting expressions for the second and third harmonic distortion, HD2 and HD3, are referred to as (2) and (3), respectively. They predict that HD2 grows linearly with the input amplitude and HD3 grows as its square.
The sizes used in the different MDAC cells and the summer are listed in Table II. As the simulation results show, the summer introduces the majority of the distortion due to the large differential pair used to achieve a large $g_m$. The size could (and probably should) have been chosen to be smaller to reduce the distortion; the penalty would be a reduction in the output current magnitude, which would result in reduced dynamic range for the next stage.
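The amplitude dependence of the two distortion terms can be checked numerically. The sketch below is not the paper's derivation: it assumes an ideal square-law differential pair, models the M1/M2 mismatch as a fixed input-referred offset `v_os`, and uses an arbitrary overdrive voltage `v_ov`; all names and values are illustrative.

```python
import math


def diff_pair_current(v, v_ov=0.5, v_os=0.0):
    """Normalized differential output current of a square-law pair.

    v_ov is the overdrive voltage at balance and v_os an input-referred
    offset standing in for device mismatch (both assumed values).
    """
    x = v + v_os
    return x * math.sqrt(4.0 * v_ov ** 2 - x ** 2)


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic over exactly one period of samples."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def harmonic_distortion(amplitude, v_os=0.0, n=1024):
    """Return (HD2, HD3) relative to the fundamental for a sine input."""
    out = [diff_pair_current(amplitude * math.sin(2 * math.pi * i / n), v_os=v_os)
           for i in range(n)]
    h1 = harmonic_amplitude(out, 1)
    return harmonic_amplitude(out, 2) / h1, harmonic_amplitude(out, 3) / h1
```

Under these assumptions, doubling the input amplitude roughly doubles HD2 and roughly quadruples HD3, and HD2 vanishes when `v_os` is zero.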
B. Track-and-Hold Cells
In order to achieve high speed, the T/H cells use the open-loop structure shown in Fig. 4, consisting of a single NMOS switch and a total hold capacitance of about 0.4 pF. The hold capacitance comprises a 0.15-pF capacitor and the input capacitance of the following unity-gain amplifier (UGA). The hold step is mostly a common-mode change or gain error for fully differential signals [8]. The NMOS switches are 9.6 μm/1 μm, and the input common-mode (CM) level is 0.9 V. For the simulations, the gates were driven by clocks with 0.5-ns transition times, and the source impedance was 1 kΩ.
The unity-gain buffer shown in Fig. 4 uses the closed-loop structure for better gain matching. Also, since the output follows the input, the current in each branch of the differential pair does not change with the input. Therefore, the drain voltages of the input PMOS transistors do not change, and relatively large input and output swings are possible. To ensure sufficient matching and output driving ability, large devices are used: [...] μm; [...] μm; [...] μm; all $L$'s are 1 μm. SPICE simulations yield [...] mA/V and [...].
C. Level Shifters
In order to achieve a large track-mode bandwidth, the CM input voltage of the T/H cells and the multiplexer must be kept low because of the NMOS switches used. We used a CM voltage of 0.9 V in this design. On the other hand, the MDAC input pair requires a CM input above 2 V for proper operation. Therefore, the CM level is shifted from 0.9 V to about 2.3 V using PMOS source followers (M5 in Fig. 4) prior to the MDAC's. The analog signal path shown in Fig. 4 is able to handle a voltage swing of 0.5 V with a differential nonlinearity of less than 1%.
D. Input Buffers
The input buffers are identical to the UGA's. For this 9-tap FIR filter, 12 T/H cells are used. Therefore, each cell has two cycles to track the input and a complete cycle for the output to settle after the track-to-hold transition. Switching a T/H from track to hold or vice versa injects charge to the input and the resulting disturbance may not settle out in one clock period, so the 12 T/H cells are divided into two groups driven by separate differential input buffers. The groups are arranged so that the two T/H's being switched on a given clock edge are not in the same group as the T/H that is currently tracking the input.
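One grouping that satisfies this constraint can be checked with a short script. The even/odd split and the ring order of the cells are assumptions made for illustration; the text states only that there are two groups and what property they must have.

```python
N_CELLS = 12  # T/H cells (value from the text)


def cells_at_edge(n):
    """Cells involved at the clock edge that ends period n.

    Assumed ring schedule: cell c starts tracking in period c (mod 12)
    and tracks for two periods.
    """
    to_hold = (n - 1) % N_CELLS      # finishes its second track period
    still_tracking = n % N_CELLS     # halfway through its tracking time
    to_track = (n + 1) % N_CELLS     # leaves hold and starts tracking
    return (to_hold, to_track), still_tracking


def buffer_group(cell):
    """Hypothetical assignment: even cells on one input buffer, odd on the other."""
    return cell % 2
```

With this assignment the two switching cells always share one buffer while the cell that keeps tracking is driven by the other, so the charge injected at the edge does not disturb the sample being acquired.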
E. Control Clocks
The control clocks are produced on chip using a dynamic flip-flop (DFF) loop as shown in Fig. 5. The small steps in the T/H control clocks shown are from coupling between these lines and the master clock. SPICE simulations show a very sharp falling edge with [...] ns, which corresponds to a clock slew rate (SR) of 9 V/ns. Using a simple switch model which assumes that the NMOS transistor opens at a clock level a fixed amount above the input signal level, the variation in the sampling time is $\Delta t = \Delta v/\mathrm{SR}$, where $\Delta v$ is the change in the input signal level, as shown in Fig. 6. For a signal $v(t)$, assuming the maximum slope of $v(t)$ is much less than the SR, the resulting error from $\Delta t$ is given by (4).
The phase of $v(t)$ relative to the sample clock is random, so the error is a random variable, and its variance is given by (5).
For PRML read channels, 6-b accuracy is often sufficient, which imposes the requirement (6) involving the SR.
For an input with peak-to-peak value $V_{pp}$, the maximum speed given by (5) and (6) is expressed by (7). For the input swing used here, the frequency given by (7) is almost three times larger than the Nyquist frequency of this 170-MHz design, so the noise from sampling time variations is negligible for the required accuracy.
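The scaling of this error with frequency and slew rate can be explored numerically. The sketch below is an illustration under stated assumptions rather than a reproduction of (4)-(7): it takes the sampling-time shift to be the instantaneous input divided by the slew rate, uses a sine input, and averages over uniformly spaced phases. The 0.25-V amplitude is an assumed value; the 9-V/ns slew rate comes from the text.

```python
import math


def sampling_error_rms(freq_hz, amplitude_v, slew_v_per_s, n=4096):
    """RMS error caused by signal-dependent sampling instants.

    Assumed model: the switch opens dt = v / SR away from the nominal
    instant, where v is the input level at that instant.
    """
    w = 2 * math.pi * freq_hz
    total = 0.0
    for i in range(n):
        t = i / (n * freq_hz)                 # one full input period
        v = amplitude_v * math.sin(w * t)
        dt = v / slew_v_per_s                 # signal-dependent time shift
        err = amplitude_v * math.sin(w * (t + dt)) - v
        total += err * err
    return math.sqrt(total / n)


def first_order_rms(freq_hz, amplitude_v, slew_v_per_s):
    """First-order estimate: error ~ v'(t) * dt for a sine input."""
    w = 2 * math.pi * freq_hz
    return amplitude_v ** 2 * w / (2 * math.sqrt(2) * slew_v_per_s)
```

In this model the RMS error doubles when the input frequency doubles and halves when the slew rate doubles, which is consistent with the observation that faster clock edges and lower input frequencies reduce the noise from sampling time variation.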
IV. CHIP LAYOUT
The final design was fabricated through MOSIS in the HP 1.2-μm CMOS process, which uses a mask bias to achieve 1-μm minimum channel lengths. A die photo of the final IC is shown in Fig. 7. The analog circuits are separated from the internal clock generating circuits by guard rings to avoid noise coupling. The total die area is 2.35 mm × 1.97 mm including the bonding pads.
V. TEST RESULTS
Fig. 8(a) shows the test setup. A test signal is produced by the arbitrary waveform generator (AWG) and passed through the device under test to give a differential current output. This output current is converted to a voltage by the I-to-V converter shown in Fig. 8(b), and the final output voltage is sampled by the oscilloscope. The collected data are further processed by MATLAB programs. In the test, a sampling rate of 170 MHz and a single power supply of 3.3 V are used except where explicitly stated. With 12 T/H cells in the circular buffer, the corresponding fundamental frequency for FPN terms is therefore 170 MHz/12 ≈ 14.2 MHz. We begin by examining the noise and distortion of the equalizer.
A. Distortion
We first examine distortion versus input signal magnitude. Fig. 9 shows the output spectra for a 10-MHz input signal at two different input amplitudes. A single tap weight is set to a nonzero value while all other weights are set to zero. These settings ensure that the filter does not reduce the magnitude of the harmonic distortion terms. It can be seen that the output harmonic distortion increases with the input signal amplitude. Furthermore, the third harmonic distortion goes up at a faster rate than the second harmonic distortion, as predicted by (2) and (3). To check the dependence of harmonic distortion on the input magnitude, measurements of the second and third harmonic distortion are plotted for different input values in Fig. 10, and the measurements agree well with the linear and square-law relations predicted by (2) and (3). On the other hand, the magnitudes of noise terms relative to the output signal decrease for larger input magnitude. To keep the total of noise and distortion more than 30 dB below the fundamental (−30 dBc), a good tradeoff can be achieved for an input in the range of 0.4-0.5 V. An input of [...] V is used in the following tests.
B. Fixed Pattern and Jitter Noise (FPN)
We next examine the filter's FPN for different signal frequencies. Fig. 11 shows output spectra for inputs of 20 and 40 MHz. As the input frequency is increased, the fixed pattern noise terms at $k f_s/12$ (where $f_s$ is the sampling rate and $k$ is an integer) do not change much, but the terms at $k f_s/12 \pm f_{in}$ (where $f_{in}$ is the input frequency) increase (they are not visible above the noise floor for the 20-MHz input). These terms increase because they are mainly due to variation in the sampling times and, for the same time variation, high frequency inputs yield larger errors.
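The frequencies at which fixed pattern terms can appear are easy to tabulate. The sketch below assumes the usual result for an array of 12 parallel samplers: input-independent terms at multiples of the sampling rate divided by 12, and input-dependent terms offset from those by the input frequency, all folded into the first Nyquist band. The function names are hypothetical.

```python
def fold(freq, fs):
    """Alias a frequency into the first Nyquist band, 0..fs/2."""
    freq = freq % fs
    return fs - freq if freq > fs / 2 else freq


def fpn_tones(fs, n_cells, f_in):
    """Return (input-independent tones, input-dependent tones) in the units of fs."""
    fixed = sorted({round(fold(k * fs / n_cells, fs), 3)
                    for k in range(1, n_cells)})
    dependent = sorted({round(fold(k * fs / n_cells + sign * f_in, fs), 3)
                        for k in range(1, n_cells) for sign in (1, -1)})
    return fixed, dependent


# Values from the text: 170-MHz sampling, 12 T/H cells, 20-MHz input.
fixed_mhz, dependent_mhz = fpn_tones(170.0, 12, 20.0)
```

For a 170-MHz sampling rate the lowest input-independent term falls at about 14.2 MHz, the fundamental FPN frequency quoted in the test setup.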
C. Supply Variation
To test the effect of supply variation on performance, we set the weights to be more representative of what might be used in an application. We chose [...], [...], and all other weights [...]. The spectra for supply voltages of 3.0 and 3.6 V are shown in Fig. 12. It can be seen that the fixed pattern noise is insensitive to the supply change, while harmonic distortion gets worse for lower supply voltages. Fig. 13 shows the curves of harmonic distortion with respect to supply voltage. For a total of noise and distortion of −30 dBc, a nominal supply voltage of 3.3 V is a good choice.
D. Equalization
To check the overall operation, pseudorandom binary data are passed through a linear model of a magnetic recording channel to produce an oversampled response which is then output by the AWG. The magnetic recording channel is modeled using a Lorentzian pulse with a width of [...] at 50% of the peak amplitude [9]. During testing, timing recovery was done manually and the calculated tap values were manually downloaded to the on-chip SRAM's. Using a single 3.3-V power supply, the chip has been tested for the polynomials given by (1) with $n = 1$ (PR4), $n = 2$ (EPR4), and $n = 3$ (EEPR4) and performs well at a sampling frequency of 170 MHz. Resampled output data for PR4, EPR4, and EEPR4 targets at 170 MHz are shown in Fig. 14, and the expected discrete output levels for the different target polynomials can be clearly seen. The deviations of the samples from the ideal discrete levels are the net effect of the distortion and noise introduced by the circuits and also the finite number of the taps and finite resolution of the MDAC's. This net effect is called misequalization here. We define the signal-to-misequalization ratio (SMR) to evaluate the equalization performance quantitatively: $\mathrm{SMR} = 10\log_{10}(P_s/\sigma_m^2)$, where $P_s$ is the signal power and $\sigma_m^2$ is the mean squared error from the ideal discrete levels. Similarly, the SNR is calculated as $\mathrm{SNR} = 10\log_{10}(P_s/\sigma_n^2)$, where $\sigma_n^2$ is the noise variance. Finally, to get a quantitative measure of the performance of the equalizer, we define an effective SNR as the ratio of signal power to noise plus misequalization
$$\mathrm{SNR}_{\mathrm{eff}} = 10\log_{10}\frac{P_s}{\sigma_n^2 + \sigma_m^2} \tag{8}$$
The difference between the SNR and the effective SNR shows how much the filter degrades the SNR and is given by
$$\Delta\mathrm{SNR} = \mathrm{SNR} - \mathrm{SNR}_{\mathrm{eff}} = 10\log_{10}\left(1 + 10^{(\mathrm{SNR} - \mathrm{SMR})/10}\right) \tag{9}$$
Table III shows both the measured SMR and $\Delta$SNR for different equalization targets and sampling rates. A PR4 Viterbi detector requires a detector SNR of about 16 dB for a bit error rate (BER) of $10^{-8}$ [10], [11]. Higher-order partial response polynomials require even lower SNR's for the same BER [9]. A 16-dB SNR value is therefore used in calculating the $\Delta$SNR values. Table IV shows the SMR's of several chips for an EPR4 target.
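The SNR degradation can be computed directly from the definitions. The sketch below assumes that SNR and SMR are both expressed in dB relative to the same signal power and that noise and misequalization add as powers; the function names are hypothetical.

```python
import math


def effective_snr_db(snr_db, smr_db):
    """Effective SNR when noise and misequalization powers add."""
    noise = 10 ** (-snr_db / 10)     # noise power relative to the signal
    miseq = 10 ** (-smr_db / 10)     # misequalization power relative to the signal
    return -10 * math.log10(noise + miseq)


def snr_loss_db(snr_db, smr_db):
    """Degradation of the SNR caused by misequalization, in dB."""
    return snr_db - effective_snr_db(snr_db, smr_db)
```

For example, with the 16-dB detector SNR used in the text, an SMR that is also 16 dB costs about 3 dB, while an SMR 10 dB higher than the SNR costs about 0.4 dB.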
The sensitivity of the SMR to changes in the resampling phase at the output is shown in Fig. 15 . It can be seen that a 5% sampling error will result in an SMR loss of about 1 dB. For reasonable variations in the phase of the resampling following the filter, the performance is quite robust.
The overall performance of this design is summarized in Table V.
VI. CONCLUSION
A 9-tap programmable discrete-time analog FIR filter using the circular buffer architecture has been designed and fabricated in a 1.2-μm CMOS process with 1-μm minimum channel lengths. The equalizer works well for PR4, EPR4, and EEPR4 partial response targets in magnetic disk read channels and achieves a maximum speed of 170 MHz working from a single 3.3-V supply while consuming 70 mW.
APPENDIX
[...] where the offset term is input independent. The nonlinear term can be expanded using a Taylor series, which can be reasonably approximated by its first two terms for sufficiently small inputs. Another distortion source is the mismatch between transistors M3 and M4, which are used as loads in the summer to convert the summed current to a differential voltage, as described by (A8). From (A8), we can see that a mismatch in the loads contributes to harmonic distortion in the same way as a mismatch in the input pair. Assuming the mismatch in M1 and M2 is independent of the mismatch in M3 and M4, the two mismatch terms are independent and their mean-square values add. When the resistor mismatch is included, the second-order harmonic distortion term therefore depends on the sum of the two mean-square mismatch values. The third-order harmonic distortion term is unchanged.
