The lattice wave digital filter (LTWDF) is known to be especially suitable for high-speed filtering applications because of the low coefficient wordlengths required. The authors present a comparison between several architectures of the LTWDF using redundant and non-redundant arithmetic. It is shown that architectures using non-redundant arithmetic require up to 2.5 times less VLSI area, whilst offering higher sample rates for most practical filtering applications.
Introduction
The lattice wave digital filter (LTWDF) and other filter structures composed of a parallel connection of allpass subfilters (PCAS) have many desirable properties that make them attractive alternatives [4] to other recursive filter types.
High-speed architectures for recursive digital filters, including the LTWDF, have attracted considerable interest recently [2-8]. Unlike non-recursive architectures, which can employ fine-grained pipelining to achieve very high sample rates, recursive filters suffer from a reduction in the sample rate because of the increased latency introduced in the feedback loop. Lookahead techniques [8] are not suitable for use with LTWDFs, as the computational efficiency and low coefficient sensitivity are lost. However, the low coefficient wordlengths in LTWDFs [3, 4] lead to short critical paths, enabling high-speed filtering applications to be targeted.
This letter considers the relative VLSI cost for several high speed architectures of the LTWDF using redundant and non-redundant arithmetic.
Architectures
The main computational block of the LTWDF is the 2nd-order allpass section. This has been conventionally realised with 2-port adaptors, and some high-speed architectures have been proposed [3], [7]. Considerable prior research has focused on the performance of digital filters implemented using many types of arithmetic. Two types have proved to be particularly suitable for the LTWDF. One is based on performing operations least-significant-bit first, using non-redundant arithmetic with a transposed Pezaris carry-save array [3]. The other type uses redundant (signed-digit) arithmetic and performs operations most-significant-bit first [7]. Both schemes are suitable for high-speed applications because the arrays have low latency. The lsb first array has a latency of p + 2 full-adder delays (where p denotes the coefficient wordlength), whilst that of the msb first array is fixed at 9 full-adder delays. The lsb first architecture therefore has a higher sample rate for coefficient wordlengths less than 8 bits.
Low latency can be exploited using a block-level pipelining scheme to yield a critical path that is independent of the data wordlength. The arrays are pipelined between the rows to form blocks, where the block size is equal to the latency. The block-level pipelining scheme in [3] for the 2nd-order allpass section is shown in figure 1(b). This has a critical path equal to the latency of 2 blocks. The cell-level view of a block for the lsb first array is shown in figure 2(b). Data flows from top to bottom and from left to right. The circles and squares represent full-adders optimised for carry-save and carry-ripple characteristics, respectively. Recently it has been shown [1] that by using the 3-port series adaptor instead of two 2-ports, the sample rate can be increased by up to a factor of two. The coefficient wordlength must be increased by one bit to maintain performance [1]. Note that the coefficients in the 3-port structure lie in the range $0 < \gamma_i < 2$, whilst 2-port coefficients lie in the range $-1 < \gamma_i < 1$. An implementation using non-redundant lsb first arithmetic can be shown to have a critical path equal to p + 3 full-adder delays (see figure 2(a)) [2]. This is achieved using the block-level pipelining scheme in figure 1(a).
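The critical-path figures quoted above can be tabulated with a minimal sketch. The function names are illustrative and do not come from the original designs. The 2-port figure assumes the critical path is exactly two blocks of p + 2 full-adder delays each, and the 3-port figure is evaluated with a coefficient wordlength one bit longer, as the text requires.

```python
def lsb_two_port_critical_path(p: int) -> int:
    """Two pipelined blocks, each with a latency of p + 2 full-adder delays."""
    return 2 * (p + 2)


def lsb_three_port_critical_path(p: int) -> int:
    """Critical path quoted for the 3-port lsb first section: p + 3."""
    return p + 3


def three_port_speedup(p_two_port: int) -> float:
    """Ratio of critical paths, 2-port over 3-port.

    The 3-port design is given one extra coefficient bit (assumption
    taken from the text: one more bit is needed to maintain performance).
    """
    p_three_port = p_two_port + 1
    return (lsb_two_port_critical_path(p_two_port)
            / lsb_three_port_critical_path(p_three_port))


if __name__ == "__main__":
    for p in (4, 6, 8, 12, 16):
        print(f"p = {p:2d}: 2-port {lsb_two_port_critical_path(p):2d} FA, "
              f"3-port {lsb_three_port_critical_path(p + 1):2d} FA, "
              f"speed-up {three_port_speedup(p):.2f}")
```

Under these assumptions the ratio is 2(p + 2)/(p + 4), which approaches but never reaches two as p grows, in line with the "up to a factor of two" claim.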
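In order to design an msb first architecture for the 3-port adaptor, it is necessary to determine the computational latency [6]. For the 3-port architecture, $\lceil \log_2(4 \cdot 3 \cdot |\gamma|_{\max} + 4) \rceil = 5$, whilst for the 2-port, $\lceil \log_2(4 \cdot 2 \cdot |\gamma|_{\max} + 4) \rceil = 4$, where $|\gamma|_{\max}$ is the largest coefficient magnitude (2 for the 3-port and 1 for the 2-port). The latency of the msb first array is dependent on the coefficient range but independent of the coefficient wordlength p.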
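The two latency values can be checked numerically. The sketch below reads the expression as the ceiling of log2(4 N c + 4), where N is the number of adaptor ports and c is the largest coefficient magnitude (2 for the 3-port range, 1 for the 2-port range). That reading is an assumption, although it does reproduce both quoted results; the function name is hypothetical.

```python
import math


def msb_computational_latency(num_ports: int, coeff_max: float) -> int:
    """Ceiling of log2(4 * N * c + 4).

    num_ports (N) and coeff_max (c) follow the assumed reading of the
    expression in the text; they are not names used by the authors.
    """
    return math.ceil(math.log2(4 * num_ports * coeff_max + 4))


if __name__ == "__main__":
    # 3-port: coefficients in (0, 2), so the magnitude bound is 2.
    print("3-port:", msb_computational_latency(3, 2.0))
    # 2-port: coefficients in (-1, 1), so the magnitude bound is 1.
    print("2-port:", msb_computational_latency(2, 1.0))
```

Because only the coefficient range enters the expression, the result does not change with the coefficient wordlength p.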
The pipelining scheme for msb first IIR filters recently presented in [5] can be applied to the 3-port msb first array, giving a critical path of 22 full-adders. Application of the pipelining scheme in figure 1(a) to the msb first array yields a superior architecture with a fixed critical path of slightly more than 18 full-adders. The cell-level view is shown in figure 2(c).
In terms of maximum sample rate, a comparison between the 2-port msb and 3-port lsb architectures should be made. The cross-over point at which both architectures have equal sample rate occurs when p + 3 = 18. So the msb first architecture has a superior sample rate only for coefficient wordlengths greater than 15 bits. Because LTWDFs have low coefficient sensitivity, many filter specifications can be met with coefficients of as few as 6 bits. Even for stringent magnitude and delay characteristics, coefficient wordlengths greater than 15 bits are rarely required.
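The cross-over argument can be sketched as a small comparison of critical paths measured in full-adder delays. The constant and function names are illustrative only, and the sketch assumes that the sample rate is set purely by the critical path, ignoring register and wiring overheads.

```python
# Fixed critical path of the 2-port msb first design, in full-adder delays.
MSB_TWO_PORT_CRITICAL_PATH = 18


def lsb_three_port_critical_path(p: int) -> int:
    """Critical path of the 3-port lsb first design: p + 3."""
    return p + 3


def crossover_wordlength() -> int:
    """Coefficient wordlength at which p + 3 equals the msb critical path."""
    return MSB_TWO_PORT_CRITICAL_PATH - 3


def faster_architecture(p: int) -> str:
    """Return which design has the shorter critical path for wordlength p."""
    lsb = lsb_three_port_critical_path(p)
    if lsb < MSB_TWO_PORT_CRITICAL_PATH:
        return "lsb"
    if lsb > MSB_TWO_PORT_CRITICAL_PATH:
        return "msb"
    return "equal"


if __name__ == "__main__":
    print("cross-over at p =", crossover_wordlength())
    for p in (6, 10, 15, 16, 20):
        print(f"p = {p:2d}: {faster_architecture(p)}")
```

For a typical 6-bit coefficient the lsb first path is 9 full-adder delays, half that of the msb first design.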
Relative VLSI cost
The above designs were implemented in ES2 1 µm standard cell CMOS technology using the Preview Place and Route tools in Cadence DFWII, with a data wordlength d = 18 and p = 4 (2-port), p = 5 (3-port). Skew and deskew circuitry was included.
The maximum sample rates of the lsb and msb designs were found to be 60 MHz and 26.3 MHz, respectively. The core area of the lsb 3-port design was 2444 × 2078 µm². The msb 2-port design had a core area of 3430 × 3716 µm², 2.5 times the area of the lsb design. The VLSI requirements for the 3-port design are as follows: adders 44%, delays 8%, partial product generators 19% and overflow circuitry 29%. The VLSI requirements for the 2-port design are as follows: adders 66%, delays 26% and partial product generators 8%.
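The quoted area ratio and breakdowns can be cross-checked with a short calculation. The sketch assumes the core dimensions are in micrometres. It also estimates an implied delay per full-adder by dividing each clock period by the corresponding critical path (p + 3 = 8 for the lsb design with p = 5, and 18 for the msb design); that estimate is a rough consistency check rather than a figure reported by the authors, and all names are illustrative.

```python
LSB_CORE_UM = (2444, 2078)   # lsb first 3-port core, width x height
MSB_CORE_UM = (3430, 3716)   # msb first 2-port core, width x height

# Percentage breakdowns, labelled as in the text.
BREAKDOWN_3_PORT = {"adders": 44, "delays": 8,
                    "partial product generators": 19, "overflow": 29}
BREAKDOWN_2_PORT = {"adders": 66, "delays": 26,
                    "partial product generators": 8}


def core_area_mm2(dims_um: tuple[int, int]) -> float:
    """Core area in square millimetres from dimensions in micrometres."""
    width, height = dims_um
    return width * height / 1e6


def implied_full_adder_delay_ns(rate_mhz: float, critical_path: int) -> float:
    """Clock period divided by the critical path length (overheads ignored)."""
    return 1e3 / (rate_mhz * critical_path)


area_ratio = core_area_mm2(MSB_CORE_UM) / core_area_mm2(LSB_CORE_UM)
lsb_fa_ns = implied_full_adder_delay_ns(60.0, 5 + 3)
msb_fa_ns = implied_full_adder_delay_ns(26.3, 18)

if __name__ == "__main__":
    print(f"lsb core area: {core_area_mm2(LSB_CORE_UM):.2f} mm^2")
    print(f"msb core area: {core_area_mm2(MSB_CORE_UM):.2f} mm^2")
    print(f"area ratio:    {area_ratio:.2f}")
    print(f"implied full-adder delay: lsb {lsb_fa_ns:.2f} ns, "
          f"msb {msb_fa_ns:.2f} ns")
```

The two implied full-adder delays agree to within a few percent, which suggests the measured sample rates scale with the critical-path lengths as the analysis predicts.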
The msb first design therefore requires up to 2.5 times more area mainly because of its more complex partial product generators and its overflow cells (optimised for area), both of which are required by the redundant arithmetic. Table 1 lists estimated cell areas (excluding routing) for both designs for various values of d and p.
Conclusions
It has been shown that the use of msb first arithmetic for LTWDFs is unlikely to be worthwhile. The resulting architectures require up to 2.5 times more VLSI area than lsb first designs, whilst having a lower sample rate for most practical coefficient wordlengths.
