High-performance architectures for data-intensive, latency-constrained applications can be achieved by maximizing both parallelism and pipelining. In this paper, CORDIC-based hardware primitives for 3-D rotation with high-throughput 3-D vector interpolation are presented. The proposed 3-D vector interpolator architecture, which is based on redundant CORDIC arithmetic, has been implemented in VLSI.
Introduction
Flexible hardware with precision control is highly desirable for power-aware 3-D graphics rendering applications. In [1], 3-D vector interpolation is required. The 3-D vector interpolator of Euh et al. provides multiple precisions for the design of power-aware systems [2].
The well-known CORDIC algorithm has been applied with great success to hardware implementations of many signal processing tasks, e.g., sine and cosine generation, vector rotation, coordinate transformation, and linear system solving, and it is suitable for implementing 3-D vector interpolation [3], [4]. CORDIC needs only simple shifters and adders, which can be realized on reconfigurable hardware platforms, especially FPGAs [5]. Thus, a CORDIC-based 3-D vector interpolator is more flexible for the interpolation task.
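The shift-and-add structure mentioned above can be illustrated with a small software model. The following is a minimal sketch of a conventional circular rotation-mode CORDIC in fixed point, not the architecture proposed in this paper; the word length `FRAC_BITS`, the iteration count `N_ITER`, and all function names are assumptions chosen for illustration.

```python
import math

FRAC_BITS = 16   # assumed fixed-point fraction width (illustrative choice)
N_ITER = 16      # assumed number of micro-rotations (illustrative choice)
ONE = 1 << FRAC_BITS

# Elementary angles atan(2^-i), stored as fixed-point integers (a small ROM in hardware).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# CORDIC gain: every micro-rotation stretches the vector by sqrt(1 + 2^-2i).
GAIN = 1.0
for i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate_fixed(x, y, z):
    """Rotate integer vector (x, y) by the fixed-point angle z.

    Only shifts, additions, and subtractions are used inside the loop.
    The result is scaled by GAIN; |z| must stay within about 1.74 rad.
    """
    for i in range(N_ITER):
        if z >= 0:   # rotate counter-clockwise, consume a positive angle step
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:        # rotate clockwise, consume a negative angle step
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x, y, z


def rotate(x, y, angle):
    """Floating-point wrapper: quantize, run the integer CORDIC, undo the gain."""
    xf, yf, _ = cordic_rotate_fixed(round(x * ONE), round(y * ONE), round(angle * ONE))
    return xf / ONE / GAIN, yf / ONE / GAIN


if __name__ == "__main__":
    # Rotating the unit vector (1, 0) by 30 degrees approximates (cos 30°, sin 30°).
    print(rotate(1.0, 0.0, math.pi / 6))
```

In this model the gain compensation is a floating-point division for brevity; a hardware design would typically fold it into a constant multiplication or pre-scale the input.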
In this paper, an architecture for a 3-D vector interpolator based on the CORDIC algorithm is proposed. Its computational complexity makes it suitable for VLSI implementation.
The remainder of the paper is organized as follows. Section 2 reviews the conventional CORDIC algorithm. Section 3 gives the 3-D CORDIC algorithm. Section 4 presents the proposed VLSI architecture of the 3-D vector interpolator based on the CORDIC rotation algorithm. Its analysis is given in Section 5, and the conclusion can be found in Section 6.
The CORDIC Algorithm
where m denotes the circular (m = 1), linear (m = 0), or hyperbolic (m = -1) coordinate system, i = 0, 1, 2, …, n-1, and the rotation direction is defined by . The relationship between the Cartesian coordinates and spherical coordinates of R and S is given by
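For orientation, the unified CORDIC iteration in its commonly cited textbook form (Walther's formulation) is reproduced here. The symbols $\sigma_i$ for the rotation direction and $\alpha_{m,i}$ for the elementary angle are notation assumed for this sketch and may differ from the symbols and equation numbers used in this paper; in rotation mode the direction is usually taken as $\sigma_i = \operatorname{sign}(z_i)$.

```latex
\begin{align*}
x_{i+1} &= x_i - m\,\sigma_i\,2^{-i}\,y_i, \\
y_{i+1} &= y_i + \sigma_i\,2^{-i}\,x_i, \\
z_{i+1} &= z_i - \sigma_i\,\alpha_{m,i},
\end{align*}
\[
\alpha_{m,i} =
\begin{cases}
\tan^{-1}\!\left(2^{-i}\right), & m = 1 \ \text{(circular)}, \\
2^{-i}, & m = 0 \ \text{(linear)}, \\
\tanh^{-1}\!\left(2^{-i}\right), & m = -1 \ \text{(hyperbolic)}.
\end{cases}
\]
```

In the hyperbolic case the iteration index conventionally starts at $i = 1$, since $\tanh^{-1}(2^{0})$ is undefined, and certain iterations are repeated to guarantee convergence.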
3-D CORDIC Algorithm
Equations (9), (10) and (11) (6), (7) and (8), equations (12), (13), (14), (18), (19) and (20) can be computed by using the following set of CORDIC rotations. 
, which is shown in Fig. 3. Thus, the proposed architecture is composed of the auxiliary generator $(U_0, V_0, W_0)$, the redundant CORDIC arithmetic unit (for computing the 3-D vector interpolation), and dual memory banks (for storing the coordinates ( ) and ( ), respectively). The hardware description of the proposed system is written in the Verilog hardware description language (HDL) [7]. The system diagram is shown in Figure 4. The chip is synthesized with the TSMC 0.18 µm 1P6M CMOS cell libraries [8]. The layout view of the 16-bit 3-D vector interpolator is shown in Figure 5. The gate count is reported by the Synopsys® Design Analyzer.
The power consumption is reported by PrimePower.
Advantages of New Architectures and Algorithms
The Euler angle method uses a sequence of three rotations [2], [9], each about one of the three orthogonal coordinate axes, and is parameterized by the corresponding Euler angles. In [2], the 3-D rotation is implemented by cascading two 2-D CORDIC processors. Lang and Antelo developed a method that replaces the two 2-D CORDIC processors with one 3-D CORDIC processor [9]; its rotation sequence consists of one 2-D CORDIC rotation followed by one 3-D CORDIC rotation. Both of the aforementioned methods require more than two 2-D CORDIC computations. In the proposed 3-D rotation algorithm, the architecture based on the conventional CORDIC processor requires the time of only one 2-D CORDIC computation, because the auxiliary generator of the coordinates $(U_0, V_0, W_0)$ and the redundant-arithmetic CORDIC for 3-D rotation operate in parallel.
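The Euler angle decomposition described above can be sketched in software as three successive plane rotations, each of which corresponds to one 2-D rotation of the kind a 2-D CORDIC processor performs. This is an illustrative model of the baseline Euler method only, not of the proposed architecture; the x-then-y-then-z rotation order and the names `rotate_plane` and `euler_rotate` are assumptions.

```python
import math


def rotate_plane(a, b, angle):
    """Rotate the 2-D point (a, b) by angle; stands in for one 2-D CORDIC rotation."""
    c, s = math.cos(angle), math.sin(angle)
    return a * c - b * s, a * s + b * c


def euler_rotate(v, angle_x, angle_y, angle_z):
    """Rotate v about the x-axis, then the y-axis, then the z-axis (assumed order)."""
    x, y, z = v
    y, z = rotate_plane(y, z, angle_x)   # rotation about x acts in the (y, z) plane
    z, x = rotate_plane(z, x, angle_y)   # rotation about y acts in the (z, x) plane
    x, y = rotate_plane(x, y, angle_z)   # rotation about z acts in the (x, y) plane
    return x, y, z


if __name__ == "__main__":
    # A quarter turn about z carries the x-axis onto the y-axis.
    print(euler_rotate((1.0, 0.0, 0.0), 0.0, 0.0, math.pi / 2))
```

Because the three plane rotations depend on one another's outputs, a direct hardware mapping executes them in sequence, which is the latency that the comparison above is concerned with.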
Conclusions
A high-throughput architecture for the 3-D vector interpolation task based on the CORDIC algorithm has been presented. It takes only one conventional CORDIC computation time.
The proposed CORDIC-based architecture is simple and regular, and is therefore suitable for VLSI implementation. In power-aware 3-D graphics rendering, the performance of 3-D vector interpolation can be improved by using the proposed algorithm and architecture. Table 1 compares this work with Eberly [10] and Lang and Antelo [9].
