Introduction
Allpass sections have been shown to be useful building blocks for a variety of digital filters [1]. They possess many desirable properties from the points of view of both algorithmic design and implementation. In this paper we concentrate on implementation and on application to structures suitable for use in one-dimensional subband coding.
Implementation
The second order allpass section is the fundamental design unit for filters. Figures 1 and 2 of [1] show how to realise it with two-port adaptors. Such a realisation, in a certain view, minimises the use of computational resources, having one multiplier per filter order. For VLSI, complexity will in general depend on the number and wordlength of the arithmetic operations involved in the filter. Thus one multiplier per filter order, coupled with the known short coefficient wordlengths possible, promises efficient implementation. Because the sections are recursive, the maximum speed (or minimum power through supply voltage scaling [2]) will depend on the critical recursive path. Inspection of the cited figures reveals that this path contains two multipliers and six additions. There is an alternative realisation of the second order allpass section, based on 3-port adaptors [3], which has a much shorter critical path. Here this realisation will be reviewed and the pipelining method for the 3-port [4] will be explained in simple stages.
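As a point of reference for the discussion of the recursive path, the following is a minimal sketch of a second order allpass section in plain direct form. It is not the two-port or 3-port adaptor realisation of [1] or [3], and it does not achieve one multiplier per filter order; the function names and the coefficient values are hypothetical and chosen only for illustration. It shows the unit magnitude response of the section and the feedback terms that form the recursive loop.

```python
import cmath
import math


def allpass2_response(a1, a2, omega):
    """Response of A(z) = (a2 + a1 z^-1 + z^-2) / (1 + a1 z^-1 + a2 z^-2)."""
    z1 = cmath.exp(-1j * omega)  # z^-1 evaluated on the unit circle
    z2 = z1 * z1
    return (a2 + a1 * z1 + z2) / (1 + a1 * z1 + a2 * z2)


def allpass2_filter(a1, a2, samples):
    """Filter a sequence with the direct-form difference equation.

    The feedback terms (a1 * y1 and a2 * y2) form the recursive loop whose
    length limits the sample rate in a hardware realisation.
    """
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = a2 * x0 + a1 * x1 + x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


# Hypothetical coefficients with both poles inside the unit circle.
A1, A2 = -0.5, 0.25
gains = [abs(allpass2_response(A1, A2, k * math.pi / 8)) for k in range(9)]
impulse = allpass2_filter(A1, A2, [1.0] + [0.0] * 199)
energy = sum(v * v for v in impulse)  # an allpass section preserves energy
```

The magnitude is one at every frequency, so only the phase is shaped by the coefficients, which is what makes the section a convenient building block.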
It has proved to be advantageous to transform the signal flow graph (SFG) to that shown in figure 1(c). This allows the graph to be partitioned into two almost identical units for the recursive part and an output unit of two adders. This increased modularity is useful for VLSI implementation. The critical path may be extracted from figure 1(c) and is shown in figure 1(d). The register transfer is through three adders and one multiplier, which compares favourably with the situation for the section realised with two-port adaptors, and this is before pipelining, as will now be shown.
Using simple implementations, the bit level view of an adder is shown in figure 2(a). There are many ways to synthesise multipliers. Space does not permit a full exposition here, but we have found that, of the simple arrays, the lowest latency in practice [5, 6] (which proves to be important) is given by the array shown in figure 2(b). The cells are standard multiplier cells, with the logical AND of the appropriate broadcast bits of x and γ added to the accumulating partial product sum, which runs vertically. Carries run horizontally, and in this array partial products are accumulated most-significant first. This two-dimensional array can be cast as an equivalent one-dimensional array (like an adder) with the partitions shown by the faint dotted lines in figure 2(b). The rule for doing this is that there is no path of greater length than one cell within any one partition. The result is figure 2(c), an array in which, because of the above rule, the delay between any input bit and any output bit is given by the number of cells in the path. The latency, L, of the array is defined as the number of cells which have an input bit but no output bit, and for this array it is p − 1. Other multiplier arrays can be treated similarly, with different values of L resulting. Figure 3 shows the arrays of figure 2 applied to the repeated unit in the critical path of the 3-port, figure 3(a).
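The accumulation order described for the multiplier array can be sketched functionally as follows. This is only an illustrative model, assuming unsigned operands; the function name and the parameters d (data bits) and p (coefficient bits) are hypothetical labels, and the sketch does not model the carry paths, the partitioning or the latency of the array in figure 2.

```python
def msb_first_multiply(x, gamma, d, p):
    """Multiply a d-bit unsigned x by a p-bit unsigned gamma.

    Partial product rows are accumulated most-significant first: the running
    sum is shifted up by one place before each new row is added.
    """
    acc = 0
    for j in range(p - 1, -1, -1):        # coefficient bits, MSB first
        g_bit = (gamma >> j) & 1
        row = 0
        for i in range(d):                # one AND cell per data bit
            row |= (((x >> i) & 1) & g_bit) << i
        acc = (acc << 1) + row            # shift the sum, then add the row
    return acc


product = msb_first_multiply(13, 11, 4, 4)
```

Each row is formed from the AND of one coefficient bit with every data bit, mirroring the broadcast of x and γ to the cells.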
Replacing the adders and the multiplier with equivalent one-dimensional arrays results in figure 3(b). This two-dimensional array can, as with the multiplier, be cast as an equivalent one-dimensional array, with the partitions shown. The result is figure 3(c). Therefore the critical path delay is much less than that estimated above, being p + 2 + d. Applying this structure leads to the observation that the important part of the 3-port, with data wordlength d, is equivalent to a cross-connected, registered pair of one-dimensional arrays with latency p + 2. This is shown in figure 4. The final part of the story is that there is a faster structure than this, obtained by some bit-level retiming. Firstly, it is useful to redraw figure 4 with the recursion unfolded, as in figure 5. In this figure A refers to the left hand array and B to the right hand array. This is just an easy way to see all the paths through the system graph. Functionally equivalent systems can be obtained by repositioning the delays and using them as pipeline registers to break the arrays of cells into blocks. The constraint for equivalence is that the number of delays in a path between cells in the same (A or B) array must always add up to 2 (the equivalence has been confirmed in simulation). This is satisfied by the arrangement shown in figure 6, which is one of a number of possibilities; it is the best found to date and is probably optimal. Because of the staggered arrangement of delays, the critical path is just L + 1 = p + 3. There is now no dependence of maximum sample rate on data wordlength, and short coefficient wordlengths achieved in the design phase are rewarded by high speed in the implementation.
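The two critical path expressions can be compared with a small worked example. The sketch below simply evaluates p + 2 + d and L + 1 = p + 3 in units of cell delays, taking p to be the coefficient wordlength as the closing remark above suggests; the function names and the example values p = 8 and d = 16 are assumptions made for illustration.

```python
def critical_path_before_retiming(p, d):
    """Cells in the critical path of the registered array pair: p + 2 + d."""
    return p + 2 + d


def critical_path_after_retiming(p):
    """Cells in the critical path after bit-level retiming: L + 1, L = p + 2."""
    latency = p + 2
    return latency + 1


P_BITS = 8    # coefficient wordlength (hypothetical example value)
D_BITS = 16   # data wordlength (hypothetical example value)
before = critical_path_before_retiming(P_BITS, D_BITS)  # 26 cell delays
after = critical_path_after_retiming(P_BITS)            # 11 cell delays
```

For these values the retimed structure shortens the critical path from 26 to 11 cell delays, and the second figure would be unchanged for any data wordlength.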
Design and Implementation of Multirate Filterbanks
Signal sub-band coding is achieved by intelligently quantizing a set of signals containing frequencies only within given bands. For example, standard MPEG audio coding splits the signal into 32 uniformly spaced bands (each decimated by a factor of 32). A standard benchmark for candidate filters for this task is the quality with which the signal can be reconstructed by synthesis following analysis into the subbands. Therefore perfect reconstruction FIR quadrature mirror filter (QMF) banks are widely used for this [7]. However, we argue that our IIR QMF bank having the same stopband energy will be of lower order, giving reduced complexity. Additionally, the low overall delay of IIR QMF banks is useful when meeting certain CCITT standards [8]. The allpass-based IIR QMF bank is free from aliasing and amplitude distortion, but has phase distortion (PHD), which can be reduced by using a separate allpass equaliser once the signal has been reconstructed [7]. An alternative, which may produce a lower overall system delay than the equaliser method, is to minimise PHD by using allpass filters having approximately linear phase (ALP). Good ALP has been achieved in single rate filter designs [9].
The prototype structure for our investigations is the two-channel analysis-synthesis (reconstruction) system shown in figure 7. The allpass subfilters A_0(z) and A_1(z) are to be designed with ALP. However, unlike the lattice wave digital filters considered in [9], where ALP is only required in the passband, QMF banks require good ALP across the entire frequency response. In order to achieve this, the subfilter phase responses are split by π/2 either side of the average linear phase of the filter. The branch orders differ by one, and the higher order branch must split below the average phase. The phase specification is shown in figure 8. With this development, designs can be produced with a direct design method followed by finite wordlength optimisation via simulated annealing [9].
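The freedom from amplitude distortion of a two-branch allpass bank can be checked numerically. The sketch below assumes the classical polyphase form H0(z) = [A_0(z^2) + z^-1 A_1(z^2)]/2 and H1(z) = [A_0(z^2) − z^-1 A_1(z^2)]/2, which may differ in detail from figure 7. For simplicity both branches are first-order sections with hypothetical coefficients, whereas the designs discussed here use branches whose orders differ by one. The names allpass1, qmf_pair, C0 and C1 are illustrative only.

```python
import cmath
import math


def allpass1(c, z_inv):
    """First-order allpass (c + z^-1) / (1 + c z^-1) at a given value of z^-1."""
    return (c + z_inv) / (1 + c * z_inv)


def qmf_pair(c0, c1, omega):
    """Lowpass and highpass analysis responses from two allpass branches."""
    z_inv = cmath.exp(-1j * omega)
    z_inv2 = z_inv * z_inv                  # branches operate on z^2
    branch0 = allpass1(c0, z_inv2)
    branch1 = z_inv * allpass1(c1, z_inv2)  # second branch has one extra delay
    return 0.5 * (branch0 + branch1), 0.5 * (branch0 - branch1)


C0, C1 = 0.15, 0.6  # hypothetical coefficients, not a designed filter
grid = [k * math.pi / 16 for k in range(17)]
pairs = [qmf_pair(C0, C1, w) for w in grid]
power_sum = [abs(h0) ** 2 + abs(h1) ** 2 for h0, h1 in pairs]
overall_gain = [abs(h0 * h0 - h1 * h1) for h0, h1 in pairs]
```

In this form the two channel responses are power complementary and H0^2 − H1^2 reduces to z^-1 A_0(z^2) A_1(z^2), which has unit magnitude, so any remaining distortion is in the phase alone.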
In order to establish the relationship between the phase distortion, filter order and coefficient wordlength, a large number of finite wordlength designs were undertaken.
The method is applied to the design of the system of figure 7 with a stopband edge frequency ω = 0.64π and a stopband attenuation of 33 dB. Figure 9 shows the maximum group delay error versus coefficient wordlength for various filter orders. It can be seen that, for a given order, the delay error cannot be significantly reduced further when the coefficient wordlength is increased above 8 bits. Referring to the discussion above, this is good from the implementation point of view.
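The interaction between coefficient wordlength and group delay can be illustrated on a single first-order allpass section. This is only a toy sketch: it measures how far the group delay moves when one coefficient is rounded, not the group delay error of the full system in figure 9, and it involves no optimisation. The rounding rule (signed fixed point with one sign bit), the function names and the coefficient value 0.6 are assumptions.

```python
import math


def quantise(c, bits):
    """Round c to a signed fixed-point value with `bits` bits (one sign bit)."""
    scale = 2 ** (bits - 1)
    return round(c * scale) / scale


def group_delay_allpass1(c, omega):
    """Group delay in samples of the section (c + z^-1) / (1 + c z^-1)."""
    return (1 - c * c) / (1 + 2 * c * math.cos(omega) + c * c)


def max_delay_error(c, bits, points=257):
    """Largest change in group delay over [0, pi] caused by rounding c."""
    cq = quantise(c, bits)
    grid = [k * math.pi / (points - 1) for k in range(points)]
    return max(
        abs(group_delay_allpass1(cq, w) - group_delay_allpass1(c, w))
        for w in grid
    )


error_4_bits = max_delay_error(0.6, 4)
error_8_bits = max_delay_error(0.6, 8)
```

For this section the deviation with 8-bit rounding is much smaller than with 4-bit rounding; in a full design the finite wordlength optimisation would search over the quantised coefficients rather than simply rounding them.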
Higher order QMF banks can have virtually negligible phase distortion. A 47th order design was performed with 8-bit coefficients, which achieves a 44 dB stopband attenuation and a delay ripple of ±0.0269 samples. This compares favourably with the 47th order design obtained in [10] using an FIR prototype filter followed by optimisation, where the maximum group delay distortion was ±0.0309 samples with 42.5 dB stopband attenuation and coefficients implemented using floating point arithmetic with a 12-bit mantissa.
Three-port adaptors designed using the method shown above have been used, together with a configurable sequential architecture, to implement multirate filterbanks in VLSI using the ideas presented here [11].
Other work in progress is examining noise and distortion in multi-channel systems and evaluating the allpass-based filterbank within an MPEG music codec.
[11] S. Summerfield and C. K. 
