CORDIC based IIR digital filters possess desirable properties for VLSI implementation, such as regularity, local connection, low sensitivity to finite word-length implementation, and elimination of limit cycles. Recently, fine-grain pipelined CORDIC based IIR digital filter architectures have been developed that can perform the filtering operations at arbitrarily high sample rates at the cost of a linear increase in hardware complexity. These pipelined architectures consist of only Givens rotations and a few additions, which can be mapped onto CORDIC arithmetic based processors. However, in practical applications, implementations of Givens rotations using traditional CORDIC arithmetic are quite expensive. For example, for 16-bit accuracy, using a floating-point data format with a 16-bit mantissa and a 5-bit exponent, approximately 20 pairs of shift-add operations are required for one Givens rotation. In this paper, we propose an efficient implementation of pipelined CORDIC based IIR digital filters based on fast orthonormal μ-rotations. Using this method, the Givens rotations are approximated by angles corresponding to orthonormal μ-rotations, which are based on the idea of CORDIC and can perform a rotation with a minimal number of shift-add operations. We present various methods of construction for such orthonormal μ-rotations. A significant reduction (over 70%) in the number of required shift-add operations is achieved. All types of fast rotations can be implemented as a cascade of only four basic types of shift-add stages. These stages can be executed on a modified floating-point CORDIC architecture, making the pipelined filter highly suitable for VLSI implementation.
I. INTRODUCTION
CORDIC based cascade orthogonal IIR digital filters [1]–[4] are digital filters whose internal computations consist of only orthogonal transformations. These filters possess desirable properties for VLSI implementation, such as regularity, sharp transition band, low sensitivity to finite precision arithmetic, elimination of limit cycle and overflow oscillations, and stability in spite of parameter quantization. Recently, fine-grain pipelined CORDIC based IIR digital filters were developed. These filters can perform the filtering operations at arbitrarily high sample rates at the cost of a linear increase in hardware complexity [5], [6]. The pipelined filter architectures consist of only Givens rotations and a few additions, which are suitable for CORDIC based VLSI implementations [7].
The implementation complexity is mainly determined by the complexity of the rotation evaluation or angle computation. Different approaches for implementing the rotations, and for modifying the rotations to reduce the complexity, have been proposed.

This paper presents an efficient implementation of pipelined CORDIC based IIR digital filters based on fast orthonormal μ-rotations. Using this method, the Givens rotations are approximated by angles corresponding to orthonormal μ-rotations, which are based on the idea of CORDIC and share the property that the rotation requires a minimal number of shift-add operations. We present various methods of construction for such orthonormal μ-rotations. A significant reduction (over 70%) in the number of required shift-add operations is achieved. All types of fast rotations can be implemented as a cascade of only four basic types of shift-add stages. These stages can be executed on a modified floating-point CORDIC architecture, making the pipelined filter highly suitable for VLSI implementation.
II. Pipelined CORDIC Based Cascade IIR Digital Filters
The CORDIC based IIR digital filters are developed for the realization of any stable, passive digital rational real transfer function as a cascaded interconnection of orthogonal sections. Each orthogonal section realizes one real zero or a pair of complex conjugate zeros of the transfer function. The cascade implementation leads to low sensitivity to finite word-length truncation in the filter stop band, while the orthogonality of the filter guarantees low sensitivity in the filter pass band. Therefore, these filters have good numerical properties over the entire frequency band [4], [5]. A typical fourth-order CORDIC based IIR digital filter architecture is shown in Fig. 1 [6]. Notice that the critical path in Fig. 1 goes forward and backward through the entire filter structure and contains 7 multiplications and 7 additions. The maximum sample rate is limited by the computation time in the feedback loop. In order to further increase the maximum throughput, fine-grain pipelined CORDIC based IIR digital filter architectures have been developed using the constrained filter design and polyphase decomposition technique [6]. A three-level pipelined 12th-order CORDIC IIR digital filter architecture adapted from [6] is shown in Fig. 2; it is pipelined at the fine-grain level with a linear increase in the number of CORDIC units. In practice, implementations of Givens rotations using exact CORDIC arithmetic can be expensive. For example, using a floating-point data format with a 16-bit mantissa and a 5-bit exponent, approximately 20 pairs of shift-add units are required for one Givens rotation. This is mainly because all angles are realized using the same number of μ-rotation stages, which is usually not necessary. In this paper, we present Givens rotation implementations using the so-called fast orthogonal μ-rotations [15], [16]. Using this method, the CORDIC based IIR digital filters can be implemented with significantly lower hardware complexity.
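The fixed cost of conventional CORDIC can be made concrete with a minimal sketch. The function name `cordic_rotate` and the floating-point emulation of shifts (multiplication by `2**-i`) are assumptions made for illustration only; the default of 20 iterations follows the figure quoted above. Every angle passes through all `n_iter` micro-rotation stages, whether or not it needs them.

```python
import math

def cordic_rotate(x, y, angle, n_iter=20):
    """Rotate (x, y) by `angle` radians using n_iter CORDIC micro-rotations.

    Multiplication by 2**-i stands in for a hardware right shift, so each
    loop iteration models one pair of shift-add operations.
    """
    # Accumulated gain of all micro-rotations; in hardware this is a constant.
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(n_iter):
        sigma = 1.0 if z >= 0.0 else -1.0  # direction of this micro-rotation
        t = 2.0 ** -i
        x, y = x - sigma * t * y, y + sigma * t * x
        z -= sigma * math.atan(t)
    return x / gain, y / gain
```

Under these assumptions, `cordic_rotate(1.0, 0.0, math.pi / 5)` agrees with `(cos, sin)` of the same angle to within about 1e-5. The sketch converges only for angles of magnitude below roughly 1.74 rad, the sum of all micro-rotation angles.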
III. Fast Orthonormal μ-Rotations
The Givens rotation is a planar orthonormal rotation defined by the G matrix in (1), where σ ∈ {−1, +1} determines the direction of rotation. The orthonormal μ-rotation, or fast rotation, is an approximation to the angles of rotation defined in G. The μ-rotation is defined by the F matrix in (1), and m̂ represents the scaling factor of the matrix. The ĉ, ŝ are pairwise approximations of a sine/cosine pair, satisfying 0 ≤ ĉ, ŝ ≤ 1, and are chosen such that:
1. The multiplication with ĉ and ŝ is cheap to compute, or the combined evaluation of the rotation has a cheap implementation, limited to only a small number of shift-add operations.
2. The error in scaling is smaller than the required accuracy. When this is the case, the effect of the error in the scaling is overshadowed by the rounding error in the computation.
The angle of rotation α = σ|α| is determined by σ ∈ {−1, +1} and by the absolute angle of rotation |α|, with |α| < π/2, which is fixed through the choice of the ĉ, ŝ pair as |α| = arctan(ŝ/ĉ). We will present two classes of methods, namely the unscaled and scaled orthonormal μ-rotations, and select four methods from them, taking into account the cost involved in performing such rotations.
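As a hedged illustration of the two selection criteria, the sketch below compares an unscaled fast rotation with the exact Givens rotation through the same angle. It assumes the common matrix layout G = [[cos α, −σ sin α], [σ sin α, cos α]] and F = (1/m̂)[[ĉ, −σŝ], [σŝ, ĉ]]; the sign placement, the function names, and the example pair ĉ = 1, ŝ = 2⁻⁹ are assumptions for illustration and are not taken from (1) or Table I.

```python
import math

def givens_rotate(x, y, alpha, sigma):
    """Exact planar rotation of (x, y) by sigma * alpha (assumed sign layout)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return c * x - sigma * s * y, sigma * s * x + c * y

def fast_rotate(x, y, c_hat, s_hat, sigma):
    """Unscaled fast rotation: apply c_hat and s_hat only, treating m_hat as 1."""
    return c_hat * x - sigma * s_hat * y, sigma * s_hat * x + c_hat * y

def rotation_parameters(c_hat, s_hat):
    """Return (absolute angle, scaling factor m_hat) implied by the pair."""
    return math.atan2(s_hat, c_hat), math.hypot(c_hat, s_hat)

# Hypothetical pair: c_hat = 1 costs nothing, s_hat = 2**-9 is a single shift.
C_HAT, S_HAT = 1.0, 2.0 ** -9
ALPHA, M_HAT = rotation_parameters(C_HAT, S_HAT)
```

For this pair, m̂ − 1 is about 2⁻¹⁹, which is below the 2⁻¹⁶ resolution of a 16-bit mantissa, so `fast_rotate` and `givens_rotate` applied to a unit-norm vector differ by less than one unit in the last mantissa bit.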
When condition (2) is satisfied, the scaling effect of the rotation can be neglected. Solving (2) for the working limit W, by substituting the actual value of m̂ for methods I, II, and III from Table I into (2), yields (3). Similar to the unscaled methods, one can derive the working limit W for method IV, which is a function of both the accuracy n_m and the number of scaling steps m, as shown in Table I. In a similar way, one can compute the limit M, which represents the minimum number of scaling steps m necessary to reach the required accuracy for a given mantissa length n_m and angle index k.
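The notion of a working limit can be sketched numerically. The code below searches for the smallest angle index k at which the scaling error |m̂ − 1| drops below 2^(−n_m); this threshold is an assumed reading of the accuracy condition, and the two pair constructions `pair_a` and `pair_b` are hypothetical examples rather than the entries of Table I.

```python
import math

def scaling_error(c_hat, s_hat):
    """Deviation of the scaling factor m_hat = sqrt(c_hat^2 + s_hat^2) from 1."""
    return abs(math.hypot(c_hat, s_hat) - 1.0)

def working_limit(pair, n_m, k_max=64):
    """Smallest angle index k whose scaling error is below 2**-n_m.

    `pair` maps k to a (c_hat, s_hat) tuple; returns None if no k <= k_max works.
    """
    for k in range(k_max + 1):
        c_hat, s_hat = pair(k)
        if scaling_error(c_hat, s_hat) < 2.0 ** -n_m:
            return k
    return None

def pair_a(k):
    """Hypothetical pair: c_hat = 1, s_hat = 2^-k (one shift-add pair)."""
    return 1.0, 2.0 ** -k

def pair_b(k):
    """Hypothetical pair: c_hat = 1 - 2^-(2k+1), s_hat = 2^-k (two pairs)."""
    return 1.0 - 2.0 ** -(2 * k + 1), 2.0 ** -k
```

With n_m = 16, `working_limit(pair_a, 16)` returns 8 and `working_limit(pair_b, 16)` returns 4: spending one more shift-add pair on a better ĉ extends the usable range to larger angles. These values apply to the hypothetical pairs only.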
IV. Realization of orthonormal μ-rotations
Four basic types of shift-add stages, shown in Fig. 3, are sufficient to implement any kind of fast rotation. Each stage implements a pair of shift-add operations. In turn, these four types of stages are combined in the unified stage, also shown in Fig. 3, which forms the basis of the fast orthonormal μ-rotation architecture.
We illustrate the realization architecture for Method II, whose rotation (5) follows from Table I.
The rotation of (5) is realized as a cascade of simple stages, as shown in Fig. 4(b). The realizations of Methods I and III are shown in Fig. 4(a) and 4(c), respectively. The scaled rotation method IV consists of a double rotation, implemented as two rotation stages, followed by a variable-length scaling sequence. This is shown in Fig. 4(d) for a scaling sequence of length 3.
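To make the idea of shift-add stages tangible, here is a minimal fixed-point sketch of one fast rotation built from two shift-add pairs. It assumes the pair ĉ = 1 − 2^(−(2k+1)), ŝ = 2^(−k), which is a hypothetical choice and not necessarily the rotation of (5); the stage arrangement is likewise an illustration and does not reproduce Fig. 4. Inputs are integers in a fixed-point format of the caller's choosing.

```python
import math

def fast_rotate_fixed(x, y, k, sigma):
    """Rotate integer fixed-point (x, y) using two shift-add pairs.

    sigma is +1 or -1. Python's >> is an arithmetic shift that rounds
    toward minus infinity, so each shift loses less than one LSB.
    """
    # Pair 1: the s_hat = 2^-k cross terms.
    x1 = x - sigma * (y >> k)
    y1 = y + sigma * (x >> k)
    # Pair 2: subtract 2^-(2k+1) of the stage inputs, giving c_hat = 1 - 2^-(2k+1).
    shift = 2 * k + 1
    x2 = x1 - (x >> shift)
    y2 = y1 - (y >> shift)
    return x2, y2

def rotation_angle(k):
    """Absolute angle arctan(s_hat / c_hat) realized by the assumed pair."""
    return math.atan2(2.0 ** -k, 1.0 - 2.0 ** -(2 * k + 1))
```

For example, with 20 fractional bits, rotating `(round(0.6 * 2**20), round(0.8 * 2**20))` with `k = 4` stays within two LSBs of the exact product with ĉ and ŝ, using no multiplier.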
V. Fast rotation implementations of pipelined CORDIC IIR filters
In this section, we employ the fast orthonormal μ-rotations to implement the pipelined CORDIC IIR digital filter presented in Fig. 2. Since the μ-rotation is an approximation to the orthonormal Givens rotation, there exists a trade-off between the implemented filter performance and the hardware complexity. In general, the more accurate the approximate angle, the closer the implemented filter performance is to the original, but at the price of higher hardware complexity. The filter spectra for the cases listed in Table II are shown in Fig. 5, where the solid line denotes the original filter spectrum and the dashed line denotes the approximated spectrum. Here, we can see that the implementation cost is dramatically reduced by using the μ-rotation approximation compared to the ordinary CORDIC rotation, while the desired filter performance can still be achieved. For example, in case 3, there is almost no loss in the magnitude of the filter frequency response, but the implementation complexity is reduced by over 70%. In case 4, an approximation error below 1% is achieved with over 65% cost reduction. Fig. 6 shows the relationship between the approximation error and the implementation cost. In general, a large number of shift-add operations is needed to achieve high performance. Notice that the approximation error drops dramatically when the number of shift-add pairs is in the ranges 76 to 82, 109 to 115, and 125 to 139. In particular, when the number of shift-add pairs is beyond 130, the error is below 1% and the filter converges quite fast. This is partly due to the orthogonality of the filter, which guarantees low sensitivity to finite word-length truncation. In practice, the μ-rotation approximations are chosen to meet the requirements of both the hardware complexity and the filter performance.
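The trade-off between angle accuracy and cost can be explored with a simple greedy search. This is only a sketch under stated assumptions: the candidate angle set arctan(2^(−k)), the greedy selection rule, and the function name `approximate_angle` are hypothetical, scaling is ignored, and cost is counted as the number of rotations rather than the shift-add pair counts reported above.

```python
import math

def approximate_angle(target, max_rotations, k_max=16):
    """Greedily approximate `target` (radians) by signed candidate angles.

    Returns (chosen, error), where chosen is a list of (sigma, k) tuples and
    error is the absolute residual angle after applying them.
    """
    candidates = [math.atan(2.0 ** -k) for k in range(k_max + 1)]
    residual = target
    chosen = []
    for _ in range(max_rotations):
        # Candidate whose magnitude is closest to the remaining angle.
        k = min(range(len(candidates)),
                key=lambda i: abs(abs(residual) - candidates[i]))
        if abs(abs(residual) - candidates[k]) >= abs(residual):
            break  # no candidate reduces the residual any further
        sigma = 1 if residual >= 0.0 else -1
        chosen.append((sigma, k))
        residual -= sigma * candidates[k]
    return chosen, abs(residual)
```

Calling `approximate_angle(0.9, n)` for increasing `n` gives a residual error that never increases with the rotation budget, which mirrors the qualitative shape of an error-versus-cost curve: a few well-chosen rotations capture most of the angle, and further rotations buy progressively smaller gains.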
VI. Conclusion
In this paper, an efficient implementation of pipelined CORDIC based IIR digital filters using fast rotations is presented. Different types of orthonormal μ-rotations are used as approximate rotations. Four basic types of shift-add stages are proposed, which can be cascaded to implement all types of fast rotations. The fast rotation implementations of pipelined CORDIC IIR filters lead to a significant reduction (over 70%) in the number of required shift-add operations, which makes these filters highly suitable for VLSI implementation. It is noted that the fast rotation method presented in this paper also applies to many other Jacobi-type algorithms, such as singular value decomposition (SVD) [17], eigenvalue decomposition (EVD) [14], and FIR filter banks for image coding [18].
