Abstract: The types of circuits in which analog design techniques are employed typically differ from those in which digital design methods are used, with analog circuits being commonly applied to high speed, low precision functional blocks such as mixers and RF modulators, while digital circuits are chosen for high precision, high complexity blocks that operate at frequencies well below the $f_T$ of the transistors of which the circuits are composed. Yet there still exist applications for which the superior circuit implementation, analog or digital, is unclear. The recent birth of commercial interest in spread-spectrum communications provides the motivation for investigating one such application, that of the parallel programmable matched filter. In this paper, analog and digital circuit realizations of a parallel programmable matched filter are examined. Through wide variations of the design space parameters, the general trend that is observed is that short, fast circuits tend to favor an analog implementation, while longer, slower circuits make a digital implementation more appropriate. A methodology is provided for choosing the preferable circuit-implementing technology when power consumption (as a function of data precision, filter length, operating frequency, technology scaling, and the maturity of the fabrication process) is used as the primary metric of comparison. It is shown that neither the analog nor the digital matched filter implementation is universally more power efficient than the other. Rather, a surface is mapped in the multidimensional design space where, on one side of this surface, a digital solution is preferable, while on the other side of the surface, an analog circuit is appropriate.
Equations are given which delineate the position of this transitional surface in terms of the design space parameters, and example calculations and plots depicting the regions of dominance for the digital and analog matched filters for specific process and system parameters are presented.
settled for quite some time. There is not much question, for example, that digital is the preferred circuit implementation for a multiplier with precision equivalent to 64 bits, nor is there much question that analog is the proper choice for a 2 GHz RF modulator/upbander.
Functions still exist, however, for which the choice between a digital or an analog circuit implementation is unclear. Recent commercial interest in spread-spectrum systems for wireless data networks [1] - [3] provides motivation for evaluating one of these remaining areas, that of programmable parallel matched filters.
In the discussion that follows, two different CMOS realizations of a programmable parallel matched filter, one digital and one analog, are evaluated and compared in order to determine the most appropriate circuit-implementing technology for use in commercial spread-spectrum communication systems. Throughout this paper, power consumption (as a function of data precision, filter length, operating frequency, technology scaling, and the maturity of the fabrication process) is used as the primary metric of comparison. In Section II of this paper, the function and utility of matched filters are reviewed. In Section III, a low power digital matched filter is presented and analyzed, while in Section IV the same is done for an analog matched filter. In Section V, the analog and digital implementations are compared, and a unified result is presented that shows under which set of conditions the digital solution is superior to the analog solution, and vice versa. Last, the conclusions of the paper are found in Section VI.
II. A REVIEW OF MATCHED FILTERS
The sole purpose of a matched filter is to monitor a wireless communication channel for a preselected waveform. The matched filter is a finite impulse response (FIR) filter with an impulse response that is a time-reversed replica of the waveform that is to be detected. If the preselected waveform is present in the data that are input to the matched filter, the matched filter will compress most of the energy of that waveform into a small slot of time, allowing signal detection, even in the presence of large amounts of noise, to be performed by thresholding the matched filter output (see Fig. 1). For an additive white Gaussian noise channel, the matched filter is the ideal receiver, and it can be shown that the signal-to-noise ratio (SNR) is improved by a factor that is proportional to the time-bandwidth product of the transmitted waveform [4]-[7]. That is,

$$\mathrm{SNR}_{\mathrm{out}} \propto T B \cdot \mathrm{SNR}_{\mathrm{in}} \qquad (1)$$

where $T$ is the time-length and $B$ is the one-sided bandwidth of the transmitted signal. In addition to noise rejection, the matched filter is able to select one out of many signals simultaneously transmitted over the same wireless channel and within the same spectrum allocation, provided the signals are designed to have small cross-correlation values with respect to each other [5], [8], [9]. The ability of the matched filter to compress the energy of the signal into a small time slot permits energy from multiple signal paths (e.g., the line-of-sight path and reflections from buildings and/or geographical features) to be combined for a further increase in output SNR. In existing non-spread-spectrum mobile radio systems, this multipath effect is not exploited and, in fact, results in signal degradation through Rayleigh fading.
Proposed spread-spectrum CDMA systems use multipath reflections as diversity branches, combining energy from the multipath branches using a multi-fingered rake receiver in conjunction with the matched filter or correlator [1], [2], [10], [11]. Thus, transmitted signal energy that is normally lost or that normally causes self-interference and fading is used to improve signal quality.
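The energy-compression property described above can be illustrated with a minimal sketch. It assumes a 13-chip Barker code as the preselected waveform and additive Gaussian noise; the code, the function names, and the parameter values are chosen here for illustration only and are not taken from the paper.

```python
import random

# Hypothetical preselected waveform: the 13-chip Barker code.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def matched_filter(samples, waveform):
    """FIR filter whose impulse response is the time-reversed waveform."""
    taps = waveform[::-1]
    out = []
    for k in range(len(samples)):
        acc = 0.0
        for i, tap in enumerate(taps):
            if k - i >= 0:
                acc += tap * samples[k - i]
        out.append(acc)
    return out


def demo(offset=20, total=60, noise_sigma=1.0, seed=1):
    """Bury the waveform in noise at `offset` and locate the output peak."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, noise_sigma) for _ in range(total)]
    for i, chip in enumerate(BARKER13):
        samples[offset + i] += chip
    out = matched_filter(samples, BARKER13)
    peak_index = max(range(total), key=lambda k: out[k])
    return peak_index, out
```

In the noiseless case (`noise_sigma=0.0`) the output peaks at index `offset + 12` with a value of 13, the full energy of the 13 chips; detection then reduces to comparing the output against a threshold.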
In the past, most of the interest in spread-spectrum systems has been confined to military applications and satellite communications. Within the past several years, however, commercial systems have been proposed and implemented which exploit desirable spread-spectrum characteristics such as immunity to the fading effects of multipath, ease of multiple-access overlay, and the ability to transmit at lower radiated power levels [1] - [3] , [11] , [12] . A birth of interest in spread-spectrum communication systems targeted to commercial applications has occurred and is due in part to the steady advances in high density semiconductor fabrication technologies, which allow the complex modulate and demodulate functions of the spread-spectrum transceivers to be realized while still meeting the size, cost, and power budgets imposed by the consumer marketplace.
A matched filter [4], [13] is an integral part of an asynchronous spread-spectrum communication receiver or rangefinding device and may be used to shorten search and synchronization times within receivers of long code systems [3], [14], [15], such as the CDMA cellular proposal, IS-95 [16]. Many evolving areas of commercial use employ portable communication terminals with limited power sources. Thus, low power transmitters and receivers are required if spread-spectrum systems are to become commercially viable. Though the generation and transmission of spread-spectrum signals are well understood and can be realized by technologies which consume relatively small amounts of power [5], [8], the receivers for these systems, including the matched filter block, are considerably more complex [3], [12], [17], and, thus, work is ongoing to develop practical lower power devices.
The matched filter block of a radio, shown in Fig. 2(a), occupies a place between the RF input circuitry and the digital data processing circuitry. As such, the conversion from analog to digital can take place either just before the matched filter input or just after its output, leaving the designer with the freedom to implement the most suitable matched filter in either analog or digital circuitry.
The analysis that follows compares digital and analog integrated circuit implementations of programmable parallel matched filters, using power consumption as the primary metric of comparison. The qualifier programmable is an important one, and it is used here to mean: 1) the sampling frequency can be changed to accommodate various data rates, and 2) the stored coefficients of the tapped delay line can be changed by an external microcontroller or memory. These design constraints allow maximum flexibility of the portable receiver and make it suitable for multimedia applications where, for example, voice or text information may be transmitted over the same wireless channel at different bit rates. 
III. DIGITAL MATCHED FILTER IMPLEMENTATION
Within this section, a low power digital matched filter is presented and analyzed. The issues associated with quantization levels, specifically as these levels relate to the digital matched filter, are discussed in Section III-A. A discussion of the primary subcircuits of the digital matched filter (the multipliers, adders, and registers) is presented in Section III-B. A power estimate of the digital matched filter is formulated in Section III-C.
A. Data and Reference Quantization Levels
A parallel matched filter is identical in structure to a FIR tapped delay line filter. The function of the matched filter, however, permits certain constraints associated with the FIR filter to be relaxed. The FIR filter requires precise representation of the input data and stored tap coefficients in order to suppress the spectral components within the stopband areas of the filter frequency response. This required precision often calls for a data representation of 8 bits, 10 bits, or greater [18] , [19] .
The parallel matched filter does not, however, rely on the precision of the stored coefficients as strongly as the FIR filter. The sign (positive or negative) of the result of each of the $N$ multiplications (where $N$ is the sampled time-length of the transmitted signal) in the parallel structure carries most of the desired information. Turin [4] used this property as the basis for a discussion of all-digital matched filters in which both the stored reference waveform and the input data are represented with only one bit. Degradation in matched filter performance accompanies this coarse level of quantization [19], but, due to the inherent processing gain of spread-spectrum systems, such degradation may be tolerable for some applications given the dramatic reduction in hardware costs. In practice, higher quantization levels may be used (typically up to approximately six bits) to accommodate chirp-type waveforms or continuous-phase fast-frequency-hop codes, or to compensate for an imperfect automatic-gain-control (AGC) amplifier at the receiver input. In the discussion that follows, the quantization levels of the input data and stored reference coefficients are used as design parameters.
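The one-bit case mentioned above can be sketched as follows. Assuming each bit stands for a sign (0 for negative, 1 for positive), the one-bit "multiplication" reduces to an agreement test (an XNOR), and the correlation is the number of agreements minus the number of disagreements. The function names are hypothetical and are introduced only for this illustration.

```python
def quantize_sign(samples):
    """Keep only the sign of each sample: 1 for non-negative, 0 for negative."""
    return [1 if s >= 0 else 0 for s in samples]


def one_bit_correlate(data_bits, ref_bits):
    """Correlate two equal-length one-bit sequences.

    Each pair contributes +1 when the bits agree (product of equal signs)
    and -1 when they disagree, so the result is agreements - disagreements.
    """
    if len(data_bits) != len(ref_bits):
        raise ValueError("sequences must have equal length")
    agreements = sum(1 for d, r in zip(data_bits, ref_bits) if d == r)
    return 2 * agreements - len(ref_bits)
```

A perfect match of an N-bit reference therefore yields N, and a fully inverted input yields -N; any sample amplitude information is discarded, which is the source of the performance degradation noted in the text.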
B. Digital Matched Filter Subcircuits
The digital programmable parallel matched filter structure that is used in this analysis is pictured in block diagram form in Fig. 2(b) . The operation of the filter is such that each new input data sample appears simultaneously at the input of each multiplier block. After multiplication with the stored reference coefficients, the partial sums of products are shifted one cell to the right during each clock cycle, with the final sum at the bottom right being the desired matched filter output. This structure is commonly used in FIR filters (e.g., [20] ) and has several advantages over a direct implementation of Fig. 2(a) , such as ease of layout and no need for a single large adder to generate the output result.
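The data flow described above, in which every multiplier sees the new sample and the partial sums move one cell to the right per clock, corresponds to what is commonly called a transposed-form FIR filter. A minimal behavioral sketch, under the assumption that the output is read from the rightmost cell after each clock, is given below together with a direct-form reference for comparison. The function names are illustrative and do not come from the paper.

```python
def transposed_fir(samples, coeffs):
    """Behavioral model of a pipelined partial-sum (transposed) FIR filter."""
    n = len(coeffs)
    partial = [0] * n  # one partial-sum register per cell
    out = []
    for x in samples:
        nxt = [0] * n
        for j in range(n):
            carried = partial[j - 1] if j > 0 else 0  # sum arriving from the left
            nxt[j] = carried + coeffs[j] * x          # every cell sees the same x
        partial = nxt
        out.append(partial[-1])                       # rightmost cell is the output
    return out


def direct_fir(samples, impulse_response):
    """Direct-form reference: y[k] = sum_i h[i] * x[k - i]."""
    out = []
    for k in range(len(samples)):
        acc = 0
        for i, h in enumerate(impulse_response):
            if k - i >= 0:
                acc += h * samples[k - i]
        out.append(acc)
    return out
```

With this cell ordering, the transposed structure with coefficients `coeffs` produces the same output as a direct-form filter whose impulse response is `coeffs[::-1]`, while each addition involves only two operands, so no single large adder is needed.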
The power consumption of the digital matched filter can be estimated by summing the contribution of the major functional blocks: the registers, adders, and multipliers. In the following analysis, it is assumed that the dynamic charging and discharging of the inherent circuit capacitances is the primary contributor to the total circuit power consumption. The total circuit power may therefore be estimated by computing or measuring the effective switched capacitance $C$ of each of the major logic blocks and calculating the familiar dynamic power dissipation, $P = C V_{dd}^2 f$. However, in the spirit of [21], the supply voltage $V_{dd}$ can vary as a function of operating frequency and quantization level. A single supply voltage is assumed for the entire matched filter chip, with the value for $V_{dd}$ determined by the speed requirements of the most performance-limiting circuit elements, the multipliers. For data and reference quantization levels, $m$ and $n$ respectively, exceeding two to three bits each, the multiplier block is the most complex of the subcircuits of the digital matched filter, and as such it exhibits the longest delay between registers. When a single supply voltage is used for the entire integrated circuit (IC), the multiplier block defines the minimum supply voltage that is required by the IC to meet the operating frequency specifications. In general, the basic $m$-bit multiplier consists of an array of adders. For a straightforward array implementation, the multiplier speed decreases nearly linearly with increasing $m$. A number of schemes have been proposed that reduce the delay and/or the power required to implement the multiplication operation in digital circuitry. For a discussion of these schemes, the reader is referred to [22] and the references therein.
A noteworthy result shown by the analysis of [22] is that, for multipliers with less than eight bits, there is little difference, in terms of power and worst case delay, between the straightforward array implementation and the various recoded multiplier schemes. Because of this characteristic, and because the parallel matched filter uses word sizes smaller than eight bits in width, the choice of multiplier architecture is not critical for these applications. Array-type multipliers are therefore assumed in the power analysis for the remainder of the paper.
The adder blocks within the matched filter also consume a sizable fraction of the total IC power. Reference [23] contains a comparison of six of the most common adder types. In this comparison, there are significant differences in the worst case delay among the adder types, even at bit widths as small as 16 bits, with a carry lookahead adder exhibiting a delay less than one third of that exhibited by a ripple carry adder. Yet, at a fixed supply voltage and operating frequency, the ripple carry adder consumes 25-30% less power. Note again that for $m$ and $n$ greater than one or two bits, the multiplier blocks of the matched filter set the critical delay and, therefore, the power supply voltage level. This attribute implies that, as long as the chosen adder architecture can add two numbers in less time than is required to multiply two numbers, the most power efficient adder should be used, irrespective of the worst case delay. Hence, for the pipelined partial sum adders shown at the bottom of the structure in Fig. 2(b), simple ripple carry adders can be used.
Finally, static CMOS implementations of the multipliers, adders, and registers are assumed throughout this analysis. This assumption is made since, for a given technology, static logic design techniques support a greater degree of voltage scaling than dynamic logic techniques.
C. Power Estimate of the Digital Matched Filter
From [21], an estimate of the gate propagation delay $T_d$ is given as a function of supply voltage $V_{dd}$, threshold voltage $V_t$, load capacitance $C_L$, oxide capacitance per unit area $C_{ox}$, electron mobility $\mu$, and the CMOS gate width-to-length ratio $W/L$:

$$T_d = \frac{C_L V_{dd}}{\mu C_{ox} (W/L) (V_{dd} - V_t)^2}. \qquad (2)$$

Also, for a given technology, the time required to complete an $m$-bit $\times$ $n$-bit multiplication can be approximated by (3) [24], in terms of the delay of a one-bit adder (in units of seconds/bit) and the levels of data and reference quantization, $m$ and $n$, respectively.
Combining (2) and (3) gives (4), in which the constants of (2) and (3) have been merged to form a single technology dependent constant with units of volt·seconds/bit.
The value of $V_{dd}$ that is derived from (4) can be used to estimate the total power consumption of a digital multiplier, given by (5), which contains a technology dependent constant with units of farads/bit$^2$ that characterizes the average effective capacitance that is proportional to the switching activity of the logic primitives from which the multiplier is constructed.
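The chain of reasoning above (the delay constraint fixes the supply voltage, which in turn fixes the dynamic power) can be sketched numerically. The sketch below is not the paper's equations (4) and (5), whose exact forms are not reproduced here. It assumes, for illustration only, that the multiplication time is `kappa * (m + n) * V_dd / (V_dd - V_t)**2` and that the switched capacitance of the multiplier is `c_mult * m * n`; the names `kappa` and `c_mult` and all numerical values are hypothetical.

```python
import math


def min_supply_voltage(f_hz, m_bits, n_bits, kappa, v_t):
    """Smallest V_dd (> V_t) for which the assumed multiplier delay equals 1/f.

    Assumed delay model: kappa * (m + n) * V_dd / (V_dd - V_t)**2 = 1 / f,
    with kappa in volt*seconds/bit. This is a quadratic in V_dd; the larger
    root is the physically meaningful one.
    """
    a = kappa * (m_bits + n_bits) * f_hz  # has units of volts
    return (2.0 * v_t + a + math.sqrt(a * a + 4.0 * a * v_t)) / 2.0


def multiplier_power(f_hz, m_bits, n_bits, kappa, v_t, c_mult):
    """Dynamic power C * V_dd^2 * f with an assumed capacitance c_mult * m * n."""
    v_dd = min_supply_voltage(f_hz, m_bits, n_bits, kappa, v_t)
    return c_mult * m_bits * n_bits * v_dd ** 2 * f_hz
```

Under these assumptions, lowering the operating frequency lowers the required supply voltage toward the threshold voltage, so the power falls faster than linearly with frequency; this is the mechanism by which the digital circuit benefits from slower operation.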
The total power consumption of the digital matched filter is given by (6), whose parameters are the number of filter taps, the operating frequency of the matched filter, the effective switched capacitances of the full adders and one-bit registers, and the numbers of full adders and one-bit registers in the digital matched filter, as given by [...] and [...].

The choice of implementing technology has an enormous effect on the power consumption of a digital matched filter. Assuming the operating frequency requirements of the matched filter can be satisfied by a particular technology, the supply voltage in the scaled technology can be decreased to realize additional power savings [21] beyond that portion of the savings that is achievable by a direct application of classical scaling rules [26], [27]. A reduction in power dissipation depends upon the degree of technology scaling (represented by a scaling constant [26]) and the ratio of the threshold voltage to the supply voltage in the unscaled technology, as defined in (7). The power scaling constant, which is defined in this paper as [...], is given by (8) or (9). Equation (8) is applicable assuming ideal constant electric field scaling [26]; (9) is applicable assuming the threshold voltage does not scale with the other device parameters. The current industry trend lies somewhere between these two schemes, as shown in Fig. 3. In Fig. 3 and the ensuing plots, the values found in Table I are assumed for a 2.0 μm CMOS process.
Applying (8) and (9) to (6) yields (10), an estimate of the power consumption of a digital matched filter as a function of operating frequency, quantization level, filter length (number of taps), and technology scaling, with the power scaling constant taken from (8) or (9) depending on which scaling procedure is used to shrink the device dimensions and operating voltages.
IV. ANALOG MATCHED FILTER IMPLEMENTATION
In this section, a low power analog matched filter is presented and analyzed. A discussion of the primary subcircuits of the analog matched filter is presented in Section IV-A, and a power estimate for the analog circuit is formulated and given in Section IV-B.
A. Analog Matched Filter Subcircuits
The tapped delay line that is required by the parallel matched filter is shown in Fig. 2(a). This delay line can be implemented in analog form via a number of circuit techniques, including bucket-brigade devices (BBD's), surface acoustic wave (SAW) devices, and charge-coupled devices (CCD's). The BBD's are generally inferior to CCD's in terms of power expenditure for a given level of signal integrity [28]. SAW devices do not lend themselves well to programmability of both the sampling rate and the filter tap weights [29], [30]. In addition, the physical size of SAW devices is strongly dependent upon the time-length of the signal to which the filter is matched (requiring approximately 3 mm of piezoelectric substrate per μs of signal length), making the SAW size prohibitively long for most portable data terminal applications (30 cm for a 100 μs signal). The analysis that follows, therefore, employs CCD's as the chosen means of implementing the tapped delay line that is essential to the operation of the matched filter.
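As a check on the SAW size figure quoted above, and taking the quoted substrate requirement of roughly 3 mm per microsecond of signal length at face value, the length needed for a 100 microsecond signal works out as follows (the symbol $\ell_{\mathrm{SAW}}$ is introduced here only for this illustration):

```latex
\ell_{\mathrm{SAW}} \approx 3\,\frac{\mathrm{mm}}{\mu\mathrm{s}} \times 100\,\mu\mathrm{s}
  = 300\,\mathrm{mm} = 30\,\mathrm{cm}
```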
There are two ways to implement CCD's in a silicon CMOS process (see [28], [31], [32] for general references on CCD's). The charge stored under the MOS capacitor can be held either at the interface of the bulk semiconductor and the oxide layer, or the signal charge can be stored away from the oxide interface and inside the bulk semiconductor through the use of a channel implant. The former method is known as a surface-channel CCD (SCCD), and the latter is known as a bulk- or buried-channel CCD (BCCD). BCCD's have been used extensively for image sensing applications, because, for low signal levels, signal degradation due to carrier trapping effects is less severe in the bulk semiconductor than it is at the semiconductor-oxide interface. However, the charge handling capacity of SCCD's is greater than that of BCCD's, and it has been shown by Wong et al. [33] that when the signal charge is electrically injected into the CCD, the SNR of the SCCD's can be made superior to the SNR of the BCCD's. In addition, SCCD's are inherently more linear than BCCD's, and the SCCD devices can be fabricated in a standard double-poly CMOS process, while the BCCD's require an extra mask step for the buried-channel implant. Thus, for reasons related to power consumption, programmability, linearity, signal integrity, and manufacturability, the analog matched filter in Fig. 2(c) analyzed in the following discussion is based on a SCCD delay line. In the configuration depicted in Fig. 2(c), a floating-gate tap [34] is attached to every third CCD gate and is used to nondestructively sense the signal charge packets which pass under these gates. The multiply function and reference coefficient storage are achieved by the two-transistor EEPROM structure shown in Fig. 2(c). The reference coefficient voltage is stored by altering the threshold voltages of the two EEPROM's in a cell via control circuitry not shown in Fig. 2(c).
The resulting thresholds are [...] and [...]. The sources of the two EEPROM's are connected to separate current-summing busses, both of which are held at virtual ground. When operating in the triode (or linear) region, the difference between the drain currents through the two EEPROM's is proportional to the product of the drain voltage and the stored reference voltage.
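The multiplication property stated above can be seen from a short derivation. The following is a sketch under assumptions that are not taken from the paper: an ideal square-law triode model, identical transconductance parameter $\beta$ for both devices, and common gate-source and drain-source voltages $V_{GS}$ and $V_{DS}$. The symbols $V_{T,1}$ and $V_{T,2}$ denote the two programmed thresholds.

```latex
\begin{aligned}
I_{D,k} &= \beta\left[(V_{GS} - V_{T,k})\,V_{DS} - \tfrac{1}{2}V_{DS}^{2}\right],
  \qquad k = 1, 2,\\
I_{D,1} - I_{D,2} &= \beta\,(V_{T,2} - V_{T,1})\,V_{DS}.
\end{aligned}
```

The quadratic term and the common gate voltage cancel in the difference, so if the threshold difference is programmed in proportion to the reference coefficient, the differential current is proportional to the product of the drain voltage and that coefficient.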
The structure and circuit elements pictured in Fig. 2(c) have appeared in the literature in various forms for over two decades [34] - [37] , [44] . This structure is used here because, as far as the authors are aware, it is the most power efficient means to implement an analog parallel programmable matched filter in readily available silicon technology. As such, the structure is suitable for comparison with its digital counterpart.
B. Analog Matched Filter Power Estimate
The power consumption of the analog matched filter is a function of the signal-to-noise ratio (SNR), where the SNR is defined as the ratio of the maximum signal swing to the physical self-noise of the analog electronics. This relationship is analogous to considering the power consumption of the digital matched filter as a function of the quantization level. The analog SNR and the digital quantization level are related via (11), in terms of the same data quantization level that is used in (3). The static current of the multiplier/tap structures and the dynamic switching of the CCD gates are the dominant sources of power dissipation within the analog matched filter. The dominant noise sources are the thermal noise of the tap and multiplier FET's and interface trapping effects under the CCD gates [33], [38], [39]. The problem of flicker noise is assumed to be alleviated by a correlated-double-sampling amplifier [40] on the matched filter output. With these assumptions, and with the aid of the CCD signal capacity and signal quality equations given in [33] and [39] and the CMOS amplifier noise equations given in [38], an estimate of the power dissipation within the entire analog matched filter is given by (12) as a function of the SNR. The parameters of (12) are the thermal energy (in units of joules), the number of filter taps, the operating frequency, the surface state density, the threshold voltage of the EEPROM and PFET's (assumed to be equal in magnitude), the charge of an electron, the oxide capacitance per unit area, and the minimum surface potential under the CCD gates. Note also that one term in (12) is dependent upon the physical parameters of the CCD fabrication process, while another term is dependent upon the bias conditions of the multiplier/tap FET's.
As noted above, an enormous power savings in the digital circuitry can be achieved by scaling the device dimensions and power supply voltage. These significant savings are generally not realized with the analog matched filter implementation shown in Fig. 2(c) . The decreased anneal time for thinner gate oxides may lead to an increase in surface state density [40] . Thus, even though the switched capacitance of a scaled analog matched filter may be smaller than the unscaled filter, the supply voltage may need to be increased to offset the negative effects of the increased surface state density within the scaled device. Thus the effect of scaling on the power consumption of the analog matched filter is process dependent. Similar, more general conclusions pertaining to analog circuits and scaling issues are drawn in [42] .
The choice of supply voltage is more flexible in the analog filter than in the digital filter. This flexibility exists because a matched filter of equivalent performance can be made from a circuit with a higher supply voltage if the CCD gate areas are decreased. For example, in Fig. 4 it is shown that only a small change in the power dissipation of the analog matched filter occurs for wide changes in the supply voltage.
V. COMPARISON OF THE DIGITAL AND ANALOG MATCHED FILTER IMPLEMENTATIONS
Both of the circuit implementations under consideration perform the same operation and use the same physical device, the MOS capacitor, as the primary circuit element. The digital implementation, in treating the MOS capacitor-based devices as binary switches, sacrifices power efficiency for the ability to regenerate weak or degraded signals within the circuit.
The analog implementation uses the fundamental device I-V transistor relationships to efficiently achieve the sum and multiplication functions. However, the analog circuit is unable to regenerate weak or degraded internal signals. Thus, at the cost of additional power consumption, the analog circuit must inject added signal quality (surplus SNR) at the input of the shift register structure, such that the design specifications are satisfied by the degraded signal at the end of the shift register. This difference is depicted in the power consumption equations, which quantify how the power consumption as a function of filter length increases at a quadratic rate for the analog circuit, while at a less harsh, nearly linear rate for the digital circuit.
The digital implementation is more sensitive to changes in circuit speed. This sensitivity is due to the ability of the digital circuit to exploit a decrease in operating frequency by supporting a corresponding decrease in the supply voltage. The digital circuit is also more sensitive to changes in technology. By decreasing the minimum feature size of the CMOS technology, significant power savings may be realized. This behavior is not necessarily true for the analog circuit, since device physics limit the analog circuit more severely than the digital circuit.
Since changes in filter size, technology scaling, and operating frequency have different effects on the power consumption of the analog and digital circuits, it follows that there may be points in the design space where the digital circuit is more power efficient than the analog circuit, and still other points in the design space where the opposite may be true. The plot shown in Fig. 5 confirms this expectation. For a given effective quantization level and for given process parameters, a surface exists in the multidimensional design space that defines the boundary between the regions of superior power efficiency of an analog implementation and a digital implementation of a matched filter. Inside the volume in Fig. 5, the digital implementation is more power efficient. Outside the volume, the analog circuit is more power efficient. Also note that the top flat region in Fig. 5 (and also in the plots of Figs. 6 and 7) is an artifact of the plotting routine; only sampling frequencies [...]. The data in Figs. 5-7 are derived by setting (10) equal to (12) and numerically solving for the sampling frequency at which the equality is satisfied. The surface of equal power dissipation changes position for variations in the quantization level (SNR) and the quality of the analog fabrication process. The plot in Fig. 6 shows that a digital circuit implementation becomes preferable to an analog circuit implementation as the level of signal integrity (that is, the quantization level or effective SNR) is increased. In Fig. 7, a range of positions for the surface of equal power dissipation is shown for four currently available analog processes, with the analog process quality ranging from that of a commercial imaging-CCD foundry in Fig. 7(a) to that of an inexpensive digital process (not optimized for analog circuits) in Fig. 7(d). Specific estimates of the power dissipated by the digital and analog circuit implementations may be obtained with (10) and (12), respectively.
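The numerical procedure described above, equating the two power estimates and solving for the sampling frequency, can be sketched with a simple bisection. The two power models below are deliberately simplified placeholders and are not the paper's equations (10) and (12): the digital model assumes a power linear in filter length with a frequency-dependent supply voltage, and the analog model assumes a power quadratic in filter length at a fixed energy per tap pair. All function names, constants, and default values are hypothetical.

```python
import math


def digital_power(f_hz, n_taps, kappa=1e-9, bits=8, v_t=0.7, c_tap=1e-12):
    """Placeholder digital model: N * C * V_dd(f)^2 * f (illustrative only)."""
    a = kappa * bits * f_hz
    v_dd = (2.0 * v_t + a + math.sqrt(a * a + 4.0 * a * v_t)) / 2.0
    return n_taps * c_tap * v_dd ** 2 * f_hz


def analog_power(f_hz, n_taps, e_tap=2e-14):
    """Placeholder analog model: quadratic in filter length (illustrative only)."""
    return e_tap * n_taps ** 2 * f_hz


def crossover_frequency(n_taps, f_lo=1e3, f_hi=1e9, iters=200):
    """Bisect on a log-frequency axis for the point of equal power.

    Returns None when the two models do not cross inside [f_lo, f_hi].
    """
    def diff(f):
        return digital_power(f, n_taps) - analog_power(f, n_taps)

    if diff(f_lo) * diff(f_hi) > 0:
        return None
    for _ in range(iters):
        mid = math.sqrt(f_lo * f_hi)  # geometric midpoint
        if diff(f_lo) * diff(mid) <= 0:
            f_hi = mid
        else:
            f_lo = mid
    return math.sqrt(f_lo * f_hi)
```

With these placeholder models, the digital circuit is the more power efficient one below the returned frequency and the analog circuit above it, and the crossover frequency rises with filter length. Repeating the solve over a grid of filter lengths and quantization levels would trace out a surface of equal power dissipation of the kind plotted in Figs. 5-7.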
VI. CONCLUSIONS
The analog circuit implementation described in this paper is more power efficient for shorter, faster matched filters, and, conversely, the digital circuit is more power efficient where the filters are longer and slower. These generalizations, when coupled with the specific information contained in the digital and analog power equations, (10) and (12), indicate which circuit-implementing technology yields a more power efficient matched filter for a given set of system and process parameters. These concepts and the preceding analyses may be applied to electronic circuit design in general and provide insight into why the preferred implementation of a 64 bit multiplier or a 2 GHz modulator/upbander is straightforward, while determining the most power efficient means to implement a programmable parallel matched filter requires close examination.
