Sigma-Delta Signal Processing (SDSP) has been proposed as a method for reducing system costs by eliminating the decoding of a bitstream prior to processing. In this paper we examine the design problems inherent in this approach and analyse the tradeoff against the more conventional approach through the study of a bitstream FIR filter. We find that the system imposes particular constraints on the design of the digital modulator used to remodulate the FIR filter output. We also find that the system cost of the SDSP FIR filter is less than that of the decoded PCM filter below a certain number of taps, currently estimated at 50. We also present the design of a VLSI demonstrator that implements 16 FIR taps and a remodulator, has 16 bit dynamic range and is cascadable for higher filter orders.
Introduction
Sigma-Delta (ΣΔ) modulation is frequently used in the conversion of signals between the analogue and digital domains. This is largely because the low resolution format simplifies the analogue circuitry, and decoding is by means of linear filtering.
In a typical signal processing system an analogue input is digitised for subsequent processing, storage and transmission and often, finally, converted back to analogue format.
Conventionally, representation in Pulse Code Modulation (PCM) format is an intermediate stage in this.
As a proposition, let us examine the role of the PCM format in the system. Although the conversion of the ΣΔ stream to PCM is through simple linear low-pass filtering, and there are many standard DSP algorithms to be used for signal conditioning, PCM itself is rarely the target format of the conversion. Instead some processed (e.g. filtered) source or channel coded digital representation or analogue representation is the target. In this case we might question whether decoding to PCM is necessary or whether the bitstream may instead be processed directly to the desired target, and what the saving in system costs, if any, is. This is the foundation of our study of SDSP.
This paper represents a first step in this study; we consider the system in which the target format is analogue and the processing is linear filtering (realised in FIR form). A public address system requiring sophisticated signal processing affords a practical example.
Specifically we examine the replacement of the system in figure 1 with the system in figure 2. There are some important issues involved in this. In figure 1 there are fixed overheads, independent of the complexity of the filtering performed. The cost of converting to PCM can be high, for example a 4096 tap FIR filter in [1] for audio applications. There will also be the cost of the upsampling for the final digital to analogue conversion. However, as will be shown, the VLSI costs, per tap, of a highly oversampled bitstream FIR filter are greater than those of a PCM FIR filter. Thus one issue we address is this tradeoff. Another relates to the modulator following the FIR filter in the SDSP system. There are particular requirements of the modulator for remodulating SDSP signals as opposed to baseband modulation. This modulator is used either immediately prior to D/A conversion or to simplify the interconnection of subsequent stages of bitstream processing (cf. the loop block of figure 2). Because a FIR filter operating on the input bitstream produces a multibit output at the oversampled rate with considerable noise power outside the audio band, a modulator structure with a low-pass signal-transfer function (rather than the more usual all-pass transfer function) is essential to reduce noise and to maintain stability. Additionally, it turns out that the remodulator contains the critical timing path of the digital parts of the system and we therefore seek to reduce the complexity of the realisation of its loop filter.
These matters are discussed in detail in section 3.
Section 2 reviews the basic theory and methods of ΣΔ modulation and relevant prior work on bitstream processing.
In section 4 the VLSI implementation of a bitstream FIR filter is presented. We find that the filter maps into a regular design style that helps to minimise design effort. Following a general evaluation, the design of a VLSI demonstrator is described and some conclusions regarding the tradeoff between SDSP and PCM signal processing are made.
The basic ΣΔ modulator is the first-order coder illustrated in figure 3, where the integrator is implemented using either analogue or digital circuitry depending on the application, and the quantiser is assumed to have a single bit output (i.e. a comparator). The encoder can be thought of as a non-linear control system in which negative feedback and error integration act to force the time-average of the output $y[n]$ to track the input $x[n]$. The output bitstream will evidently contain noise due to the quantisation process; this can be modelled using the linear approximation indicated in figure 4. Simple loop analysis leads to the relation:
$$Y(z) = z^{-1} X(z) + (1 - z^{-1}) E(z)$$
describing the system output in terms of the desired signal $X(z)$ and filtered quantisation noise $E(z)$. Clearly the input signal is left unaltered save for a simple delay, while the noise has been differentiated or high-pass filtered. For a sufficiently high sampling rate, the noise is shaped to lie outside the audio band, and can thus be filtered out to increase the baseband signal-to-quantisation-noise ratio (SQNR).
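The tracking behaviour described above can be illustrated with a minimal sketch of a first-order encoder. The function name, the zero initial integrator state and the constant test input are assumptions made for illustration; the loop places the delay in the integrator path, which is one common arrangement and not necessarily the exact structure of figure 3.

```python
def first_order_encoder(x):
    """Encode samples in [-1, 1] as a +/-1 bitstream with a first-order loop."""
    v = 0.0                # integrator state, assumed to start at zero
    bits = []
    for sample in x:
        y = 1.0 if v >= 0.0 else -1.0   # single-bit quantiser (comparator)
        bits.append(y)
        v += sample - y                 # integrate the error between input and output
    return bits


x = [0.25] * 4096                       # constant (DC) test input, chosen arbitrarily
y = first_order_encoder(x)
mean_error = abs(sum(y) / len(y) - 0.25)
```

Because the integrator state stays bounded for inputs within the quantiser range, the time-average of the bitstream converges on the input value as the averaging window grows.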
Unfortunately, to achieve adequate SQNR for high-fidelity digital audio, the sampling rate required using simple integration is prohibitively high; attention has therefore focussed on higher order loop filters. However, whereas stability in first and second order modulators is assured, in higher order modulators it is a factor in the design of the loop filter.
The behaviour of these loop filters determines both noise performance and stability, the design goal being to maximise the degree of noise-shaping while still maintaining adequate stability margins.
A fundamental problem in analysing high order ΣΔ modulators is the model for the quantiser, which for a single bit system is only weakly approximated by the linear model of figure 4. In [2] the quantiser is modelled by a signal dependent gain and a stochastic noise source. For quantiser inputs that are approximately Gaussian, the authors show that for low-pass inputs (modelled as DC) the added noise variance is determined entirely by the input and quantiser step-size, while the AC loop-gain is inversely proportional to the quantiser input variance. In a standard ΣΔ modulator, the quantiser input variance is determined by the system input and the fed-back quantisation distortion signal; stable operation establishes a balance between these and the quantiser gain. For systems with extra noise sources (e.g. internal dither, or externally generated high-frequency noise), the quantiser input variance is increased. This has a detrimental effect on stability via a reduced AC loop gain, and must be considered in the design of the modulator.
Signal Processing of Bitstreams
Signal processing of bitstreams was first reported in 1973 in [3], which dealt with Delta Modulation (DM) bitstreams.
A useful bibliography for the pre-90s activity on bitstream filtering appears in [4].
Little work to date (see [5] for an exception) has examined coding both the input signal and the filter coefficients in a ΣΔ format.
Since the early 1990s a significant body of work has appeared, dealing with a variety of signal processing algorithms implemented to process a signal bitstream. References [15, 20] present a wide range of the potential applications, though somewhat briefly.
The work described in this paper follows that of Wong and Gray [6], whose SDSP FIR filter is illustrated in figure 5. While they considered the cases when either or both the signal and coefficients are represented by single bit words, we address our attention solely to the (more interesting) case when the input signal is a ΣΔ bitstream and the filter's coefficients are fixed or floating point.
The number and values of the coefficients are identical to the PCM equivalent system; however, the impulse response is zero-interleaved by the $z^{-R}$ delays, where $R$ is the oversampling ratio. For $R = 8$ this results in the frequency response of figure 7 for a filter with the same non-zero coefficients as a Nyquist sampled PCM filter with the frequency response of figure 6. The periodicity in the frequency domain is evident and is the result of zero-interleaving the impulse response. The need for a high order filter (as would be expected when designing for such a restricted part of the processing band) is hence avoided by recognising that the filter response in the bitstream noise band is arbitrary.
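The periodicity that results from zero-interleaving can be checked numerically. The sketch below is an illustration only: the coefficient values and the test frequency are arbitrary assumptions, and the helper names are hypothetical. It verifies that the interleaved response $G(\omega)$ equals $H(R\omega)$ and therefore repeats every $2\pi/R$.

```python
import cmath
import math


def zero_interleave(h, R):
    """Insert R-1 zeros between successive coefficients (each z^-1 becomes z^-R)."""
    out = []
    for i, c in enumerate(h):
        out.append(c)
        if i < len(h) - 1:
            out.extend([0.0] * (R - 1))
    return out


def freq_response(h, omega):
    """Evaluate the FIR frequency response at angular frequency omega (rad/sample)."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))


h = [0.1, 0.2, 0.4, 0.2, 0.1]    # hypothetical Nyquist-rate coefficients
R = 8                            # oversampling ratio used in the text
g = zero_interleave(h, R)        # same non-zero coefficients, spaced R samples apart
w = 0.3                          # arbitrary test frequency

compressed_error = abs(freq_response(g, w) - freq_response(h, R * w))
periodic_error = abs(freq_response(g, w) - freq_response(g, w + 2 * math.pi / R))
```

Both error terms are zero to within floating-point rounding, which is the behaviour visible when comparing figures 6 and 7.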
The key to hardware implementation is that the tapped delay line is fed by a ΣΔ modulator. Each of the multiplies then becomes a multiplexer, at the expense of more adds per sample in the accumulation due to oversampling. One method for implementing the filter [3] uses a ROM storing all possible output words addressed by the tapped delay outputs ($V_n$, $V_{n-R}$, ...). As Wong and Gray pointed out, though, the ROM size grows exponentially with filter order, becoming prohibitively large for high $N$. As a solution, they suggest breaking the convolution down and using smaller ROMs addressed in parallel for the partial sums, and adders to generate the final output. However, the full consequences in terms of VLSI area, speed and power have still to be assessed. Our approach (section 4) avoids this problem by realising the bitstream FIR filter in a more conventional way, with adders.
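A minimal sketch of the multiplexer idea follows, assuming a bitstream with values of +1 and -1 and a delay line that contributes nothing before the first sample arrives. The function names, coefficients and test bitstream are hypothetical. Each tap selects either the coefficient or its negation, so no multiplier is needed, and the result matches a conventional multiply-accumulate reference. The last helper shows the exponential growth of a single lookup ROM addressed by one bit per tap.

```python
def bitstream_fir(bits, coeffs, R):
    """FIR filter on a +/-1 bitstream with taps spaced R samples apart."""
    out = []
    for n in range(len(bits)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            idx = n - k * R
            if idx >= 0:                            # empty delay line adds nothing
                acc += c if bits[idx] > 0 else -c   # multiplexer: pick +c or -c
        out.append(acc)
    return out


def multiply_fir(bits, coeffs, R):
    """Reference version using explicit multiplications."""
    out = []
    for n in range(len(bits)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            idx = n - k * R
            if idx >= 0:
                acc += c * bits[idx]
        out.append(acc)
    return out


def rom_words(num_taps):
    """Words in a single lookup ROM addressed by one bit per tap."""
    return 2 ** num_taps


bits = [1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1]   # hypothetical bitstream
coeffs = [0.5, 0.25, -0.125]                       # hypothetical coefficients
R = 4
mux_out = bitstream_fir(bits, coeffs, R)
ref_out = multiply_fir(bits, coeffs, R)
```

For a 16 tap filter the single-ROM approach would already need `rom_words(16)` entries, which illustrates why an adder-based realisation is attractive.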
A second key issue is the design and implementation of the second modulator. Since the adders of the FIR lter must operate at the oversampled rate, it is desirable for the oversampling ratio to be as low as possible, necessitating the use of a high order modulator.
The principal difference in designing for SDSP FIR output signals as opposed to baseband audio signals is the large amount of high frequency energy present in the input (cf. figure 7). If fed unfiltered to the quantiser, this noise dramatically increases the quantiser's input variance, reducing the AC loop gain and hence the system's stability margin.
To avoid this we propose the use of a ΣΔ modulator which low-pass filters the input signal and has a high-pass error transfer function. The stopband attenuation required of the signal transfer function depends on the power gain of the FIR filter.
Figure 15 shows a tap-slice which has been pipelined at the word level by a partial transposition of the direct form filter of Figure 5; it is clear that this can lead to a very regular layout, with minimal overhead in terms of control and memory access circuitry. Without exploring the PCM architectural possibilities, an indication of the relative VLSI areas of the SDSP and PCM systems can be found by considering the computational requirements, in terms of memory and arithmetic, of both systems through examining the number of additions required per unit sample period $T$. However, it should be noted that, as is well known, actual VLSI area depends not only on logical complexity but also on regularity.
We suppose that the data and coefficient wordlengths are each 16 bits and formulate the memory and arithmetic hardware requirements in each case. In both cases $16N$ bits of memory are required to store the coefficients. The data storage is $16N$ bits in the PCM case and $RN$ bits in the SDSP case.
Assuming that a 16 bit multiplication is executed as 16 additions, the PCM filter is required to perform $16N$ additions in time $T$ and the SDSP filter $N$ additions in time $T/R$, i.e. $NR$ additions in time $T$.
For $R = 16$ the PCM filter, using adders of the same speed and function, implies an increased system cost over SDSP because of the cost of control and memory access circuitry and loss of regularity. By the same token, for $R = 64$ the SDSP filter will be larger than the PCM filter by a factor that is no more than 4 and likely to be significantly less.
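The addition counts used in this comparison can be reproduced with a short sketch. The function names are hypothetical, and the counts cover only the FIR filters themselves, under the stated assumption that one 16 bit multiplication costs 16 additions.

```python
def pcm_additions(num_taps, wordlength=16):
    """Additions per sample period T for the PCM filter (one multiply = wordlength adds)."""
    return wordlength * num_taps


def sdsp_additions(num_taps, oversampling_ratio):
    """Additions per sample period T for the SDSP filter (one add per tap, R times per T)."""
    return num_taps * oversampling_ratio


N = 16                                                   # tap count, as in the demonstrator
ratio_16 = sdsp_additions(N, 16) / pcm_additions(N)      # equal arithmetic load at R = 16
ratio_64 = sdsp_additions(N, 64) / pcm_additions(N)      # upper bound of 4 at R = 64
```

The ratio is $R/16$ regardless of $N$, which is where the bound of 4 at $R = 64$ comes from.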
The full bitstream-in bitstream-out PCM system has VLSI area given by $A_0 + N_c A_1$, where $A_0$ is the area of the decimator and interpolator and $A_1$ is the area per FIR tap.
Given that the VLSI area cost per FIR tap in the PCM system is less than that for the SDSP system, there exists a number of taps $N_c$ above which the area of the SDSP system exceeds that of the PCM system. This value has some error attached to it by virtue of its dependence on architectural and implementation details. A better approximation is derived below.
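The crossover argument can be made concrete under a simple linear area model. In the sketch below the PCM area for $n$ taps is taken as `a0 + n * a1` and the SDSP area as `b0 + n * b1`; the symbols `b0` (fixed SDSP overhead such as the remodulator) and `b1` (SDSP area per tap) are assumptions introduced here, and the numerical values are arbitrary placeholders rather than figures from the design.

```python
def crossover_taps(a0, a1, b0, b1):
    """Tap count at which the PCM area a0 + n*a1 equals the SDSP area b0 + n*b1."""
    if b1 <= a1:
        raise ValueError("a crossover requires the SDSP per-tap area to exceed the PCM one")
    return (a0 - b0) / (b1 - a1)


# Arbitrary illustrative areas in mm^2 (not taken from the paper)
a0, a1 = 10.0, 0.25     # PCM: decimator plus interpolator, and area per tap
b0, b1 = 1.0, 0.5       # SDSP: fixed overhead, and area per tap

n_c = crossover_taps(a0, a1, b0, b1)
pcm_area_at_crossover = a0 + n_c * a1
sdsp_area_at_crossover = b0 + n_c * b1
```

Below `n_c` the fixed overhead of the PCM system dominates and the SDSP system is smaller; above it the higher per-tap cost of the SDSP filter dominates.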
Design of a Demonstrator Chip
In order to improve upon the estimate for $N_c$ and indeed to prove the design and evaluate the speed, power and the objective and subjective noise performance of the system, we have designed a demonstrator chip that contains a low order (16 tap) filter and a fourth order digital modulator as described in section 3. As is shown below, the design is such that higher order filters can be formed by cascading devices, at the cost of unused modulators in all chips but the final one.
The desire for accurate estimates for the area and time requirements of the design in current VLSI technology indicates a full custom design. However, even for a regular structure such as the SDSP FIR filter, the design time for a fully hand crafted design is considerable. Additionally, the modulator is less regular. Therefore a standard cell design style was adopted. The core area was 4.1 mm × 2.8 mm. Of this, 42% is taken up by the FIR filter, whilst the modulator uses 14% and the delays require 44%. The design contains 2826 standard cells and requires a total of 70 I/O pads, with eight dedicated for use as power and ground for pad and core cells. Switch level simulations were performed to fully verify the functionality of the circuit. It is estimated that the design will have a typical sample rate of around 15 MHz, which is easily within the requirements for the design. It will therefore be possible to reduce the power dissipation of the chip by lowering the supply voltage.
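As a quick check on the reported breakdown, the following sketch converts the percentages into absolute areas, reading the core dimensions as 4.1 mm by 2.8 mm. The variable names are hypothetical and the computed values are simple arithmetic on the quoted figures, not additional measurements.

```python
core_width_mm = 4.1
core_height_mm = 2.8
core_area_mm2 = core_width_mm * core_height_mm      # roughly 11.5 mm^2

# Fractions of the core area quoted for each block
shares = {"FIR filter": 0.42, "modulator": 0.14, "delays": 0.44}

# Absolute area of each block in mm^2
areas_mm2 = {name: share * core_area_mm2 for name, share in shares.items()}
```

The three shares account for the whole core, with the FIR filter and the delays together occupying most of it.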
Using the areas from our design in an improved comparison of SDSP and PCM requires some art in estimation, since the areas of the decimator and interpolator known to us are for previous full custom 1.5 µm and 2.0 µm designs respectively, and our SDSP design area is for a 1 µm standard cell design. However, scaling down the decimator and interpolator designs to 1 µm equivalents, by simple scaling, gives an estimate of 20.8 mm². The fact that the resulting estimate of $N_c$ is close to the figure obtained using the complexity analysis above suggests that a very accurate analysis, requiring synthesis of both filter forms with control for design effort, style and technology, while in principle desirable, may not be particularly enlightening.
The VLSI design of an SDSP FIR filter chip has been presented. It has been shown that for an oversampling rate of 64 with 16 taps, the VLSI area of the SDSP system is approximately the same as that of the decimator and interpolator in the PCM system. It has been estimated that the SDSP system requires fewer additions per unit time than the PCM system for fewer than 72 FIR taps. Evaluations based on design and estimation suggest a similar result, showing that the SDSP system requires less VLSI area than the PCM system for FIR filters with fewer than 79 taps.
Our study of FIR filters represents a first but significant step towards a full understanding of the potential for SDSP. We have examined a number of the fundamental and practical 
