Real-time 3D graphics will be a major power consumer in future portable embedded systems. In this paper we present a 3D CORDIC vector interpolator for power-aware graphics systems. This new interpolator supports dynamic control of computing precision with an output accuracy range of 6 to 10 bits. The output precision control of this interpolator exploits Human Visual Perception (HVP) to mask the image quality degradation resulting from low-precision computation. Implementation and simulation results show energy savings of up to 77% over a non-adaptive system with no noticeable image quality degradation.
INTRODUCTION
3D computer graphics hardware has been receiving great attention recently. Due to intensive computation and high memory bandwidth, as shown in Chen's work [1], 3D graphics applications outrank other applications in terms of power consumption. To ease this power consumption problem, approximate signal processing [2], as used in video signal processing systems, could be well suited to 3D graphics. As argued in our earlier work [3], a power-aware 3D graphics system benefits from hardware blocks whose precision can be controlled flexibly. By varying the precision of a hardware block properly, power savings can be realized while keeping the image quality constant to the human visual system. Three-dimensional vector interpolation, required by high-quality 3D shading such as Phong shading [4], is one functional block where multiple-precision capability could help in power-aware system design.
The CORDIC (Coordinate Rotation DIgital Computer) [5] algorithm is widely recognized as well suited to hardware implementation and is applied to many digital signal processing tasks: sine and cosine computation, vector rotation, coordinate transformation, and even linear functions. This algorithm is especially suitable for power-aware implementation, as precision can be controlled by changing the number of iterations. Additionally, as CORDIC requires only shifters and adders, its realization on reconfigurable hardware platforms, especially on FPGAs (Field Programmable Gate Arrays), results in performance comparable to DSP [6] implementations. Since the vector interpolation algorithm required in 3D graphics can be realized with vector rotation, CORDIC could be used in this function block.
Three-dimensional vector interpolation as used in the Phong shading algorithm requires 5 additions, 9 multiplications, 3 divisions, and 1 square-root computation for each interpolation point, including vector normalization. Not only are these computations nontrivial in terms of power consumption, they also require division and square-root blocks in addition to multipliers and adders. Replacing this interpolation with a CORDIC architecture could reduce power as well as implementation area. The presented CORDIC vector interpolator is designed with 16 adders and 10 shifters, performing up to 140 additions and 80 shift operations for each vector interpolation. The number of computations can vary over a wide range depending on how the CORDIC architecture is designed.
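The operation count quoted above can be reproduced with a small sketch. The breakdown used here (a weighted blend of two vertex normals followed by normalization, with the weight `1 - t` assumed to be precomputed and therefore not counted) is an assumption that happens to match the stated totals; the paper does not itemize the count, and the function name `phong_interpolate` is hypothetical.

```python
import math

def phong_interpolate(n0, n1, t):
    """Blend two 3D normals linearly, renormalize, and count the operations."""
    ops = {"add": 0, "mul": 0, "div": 0, "sqrt": 0}
    s = 1.0 - t  # assumed precomputed per interpolation step, so not counted
    v = []
    for a, b in zip(n0, n1):
        v.append(s * a + t * b)  # 2 multiplications + 1 addition per component
        ops["mul"] += 2
        ops["add"] += 1
    sq = v[0] * v[0] + v[1] * v[1] + v[2] * v[2]  # 3 multiplications + 2 additions
    ops["mul"] += 3
    ops["add"] += 2
    length = math.sqrt(sq)  # 1 square root
    ops["sqrt"] += 1
    unit = tuple(c / length for c in v)  # 3 divisions
    ops["div"] += 3
    return unit, ops

unit, ops = phong_interpolate((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.5)
print(ops)
```

Under these assumptions the normalization part alone accounts for 3 multiplications, 2 additions, 3 divisions and 1 square root.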
To utilize the multiple-precision CORDIC architecture for vector interpolation, human visual perception (HVP) characteristics are studied and applied as a control criterion. Unlike typical digital signal processing algorithms with fixed precision, vector interpolation in 3D shading algorithms can take advantage of the perception limits of the human visual system. The HVP has varying sensitivity to the different temporal and spatial frequencies of an object on the screen. This characteristic makes it possible to run vector interpolation in an adaptive mode that varies the precision of computation.
In this paper we present a 3D CORDIC architecture for vector interpolation that supports dynamic control of computing precision with a range of 6 to 10 bits output accuracy.
The control criterion uses the limits of HVP to accept the lower image quality resulting from lower-precision computation. The introduction of these two concepts, CORDIC and HVP, reduces energy consumption in the 3D vector interpolator by up to 77% without any noticeable image quality degradation.
0-7803-7587-4/02/$17.00 ©2002 IEEE
BACKGROUND

CORDIC algorithm
Due to the simplicity of its hardware implementation, the CORDIC algorithm [5] has been applied to many digital signal processing tasks. The basic theory of the CORDIC algorithm can be explained with a two-dimensional vector rotation example. Eq. 1 shows the equations for the general vector rotation CORDIC algorithm. $x$ and $y$ are the Cartesian components, $z$ is used for angle accumulation, $K_i$ is the scale constant for the $i$th iteration, and $\sigma_i$ is the sign of the accumulated angle for the $i$th iteration.
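Eq. 1 itself does not appear in this text. For reference, the rotation-mode CORDIC iteration is commonly written as follows, using the symbols defined above. The per-iteration form of $K_i$ and the choice $\sigma_i = \operatorname{sign}(z_i)$ are assumptions taken from the standard algorithm, not a transcription of the paper's equation.

```latex
\begin{aligned}
x_{i+1} &= K_i \left( x_i - \sigma_i \, y_i \, 2^{-i} \right) \\
y_{i+1} &= K_i \left( y_i + \sigma_i \, x_i \, 2^{-i} \right) \\
z_{i+1} &= z_i - \sigma_i \tan^{-1}\!\left( 2^{-i} \right),
\qquad K_i = \frac{1}{\sqrt{1 + 2^{-2i}}}
\end{aligned}
```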
To rotate a vector $V(x, y)$ by an angle $z$, $x_0$, $y_0$ and $z_0$ are initialized with $x$, $y$ and $z$ respectively. After the $n$th iteration, if $z_n$ has converged close to 0, $x_n$ and $y_n$ are taken as the components of the rotated vector. The rotation angle for each iteration, $\tan^{-1}(2^{-i})$, can be precomputed and stored in a small ROM before starting the process.
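The rotation procedure described above can be sketched in floating point as follows. This is an illustrative model under stated assumptions, not the paper's hardware: the function name `cordic_rotate` is hypothetical, the sign is taken as $\sigma_i = \operatorname{sign}(z_i)$, the scale constant is applied per iteration as $K_i = 1/\sqrt{1 + 2^{-2i}}$, and the input angle is assumed to lie within the usual CORDIC convergence range (roughly $\pm 1.74$ rad).

```python
import math

def cordic_rotate(x, y, angle, iterations=10):
    """Rotate (x, y) by `angle` radians using `iterations` CORDIC steps."""
    z = angle
    for i in range(iterations):
        sigma = 1.0 if z >= 0.0 else -1.0           # sign of the remaining angle
        k = 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))  # per-iteration scale constant
        shift = 2.0 ** -i                           # a right shift by i bits in hardware
        x, y = k * (x - sigma * y * shift), k * (y + sigma * x * shift)
        z -= sigma * math.atan(shift)               # ROM entry in hardware
    return x, y, z

for n in (6, 8, 10):
    xr, yr, residual = cordic_rotate(1.0, 0.0, math.pi / 6, iterations=n)
    print(n, xr, yr, residual)
```

After $n$ iterations the remaining angle is bounded by $\tan^{-1}(2^{-(n-1)})$, so lowering `iterations` trades accuracy for fewer shift-and-add steps, which is the precision control the text relies on.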
3D vector rotation can be calculated by cascading 2D CORDIC blocks [7], or by introducing redundant variables into the CORDIC algorithm [8] to speed up the computation.
Eq. 2, introduced in [8], is analogous to the 2D CORDIC equation except that three additional variables ($u$, $v$, $w$) are used and $z$ is a coordinate component rather than an angle value.
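$$
\begin{aligned}
x_{i+1} &= \ldots \\
z_{i+1} &= K_i \left( z_i - w_i \, p_i \, 2^{-i} \right)
\end{aligned}
\tag{2}
$$
[The remaining terms of Eq. 2 are illegible in the source.]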
Human visual perception
As described in [9, 10], the human visual perception (HVP) model shows that visual sensitivity to an object varies with its spatial frequency, expressed in cycles per degree (cpd), and its velocity, in degrees per second (dps). These characteristics of the HVP have been used as the basis for video and computer graphics applications to reduce the number of computations and enhance the efficiency of systems. An object's screen velocity and depth from a viewer have been used as the criteria for an adaptive shading algorithm. Fig. 1 shows the contrast sensitivity function graph based on the equation from [9], where the shaded region highlights the spatiotemporal regions of low sensitivity (less than 50).

[...] calculated using spherical interpolation and linear interpolation. It should be noted that the vectors computed by linear interpolation do not quite match the result of spherical interpolation. This computation error, however, is unnoticeable to the human visual system.
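The mismatch between linear and spherical interpolation can be illustrated with a short sketch. The helper names (`nlerp`, `slerp`, `angle_between`) and the test vectors are hypothetical, and linear interpolation is assumed here to be followed by normalization.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def nlerp(a, b, t):
    """Linear interpolation of two unit vectors, followed by normalization."""
    return normalize(tuple((1.0 - t) * p + t * q for p, q in zip(a, b)))

def slerp(a, b, t):
    """Spherical interpolation: constant angular step along the arc from a to b."""
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)
    if omega < 1e-9:  # vectors nearly identical; avoid dividing by sin(0)
        return a
    s = math.sin(omega)
    wa = math.sin((1.0 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return tuple(wa * p + wb * q for p, q in zip(a, b))

def angle_between(u, v):
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(u, v))))
    return math.acos(dot)

a = (1.0, 0.0, 0.0)
for degrees in (10.0, 90.0):
    theta = math.radians(degrees)
    b = (math.cos(theta), math.sin(theta), 0.0)
    err = angle_between(nlerp(a, b, 0.25), slerp(a, b, 0.25))
    print(degrees, math.degrees(err))
```

The two methods agree at the endpoints and at the midpoint, and the angular mismatch in between shrinks as the angle between the endpoint vectors decreases.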
CORDIC vector interpolator
In the context of CORDIC, as well as other steps in the rendering pipeline, interpolating the polar components instead of the Cartesian components has more hardware implementation benefits [17]. If polar coordinate data is not supported by the system, coordinate conversion would be required for the CORDIC vector interpolator. Fortunately, this conversion can be done within the existing CORDIC architecture. In this paper, it is assumed that the vertex data set includes polar coordinates. Notice that the image produced with 10-iteration CORDIC interpolation is not perceptibly different from the one using ideal spherical interpolation. For this reason, the CORDIC vector interpolator presented here is designed for a 10-bit data path (9 data bits and 1 sign bit).
[Figure axis label: Angle between two vectors]
Effect of human visual perception
As shown in Fig. 6, the quality degradation of the teapot moving at 10 pixels per frame is not noticeable when 8-iteration CORDIC is used. Similarly, 6-iteration CORDIC is sufficient to achieve acceptable image quality for the teapot moving at 30 pixels per frame. The relationship between moving speed (pixels per frame) and the number of iterations for this teapot example is shown in Fig. 7. Although the numbers in Fig. 7 depend on the viewer, the number of frames per second, the size of the image and the viewer's distance from the display device, they give a trend for controlling the output precision of the CORDIC vector interpolator. A control map like Fig. 7 could be prepared for each object if the number of objects used in an application is [...]

Fig. 8 shows the presented CORDIC architecture, based on Eq. 2, which supports variable output precision through an appropriate control signal. Although this redundant-arithmetic CORDIC requires more computations than a cascaded 2D CORDIC architecture, it has the advantage of easy precision control. The number of computations can also be reduced by using a small internal memory block, as described later. Since the function block for the first step of the CORDIC vector interpolator is the same as for interpolation of Cartesian components, it is not illustrated. As shown in the 4-input adder block diagram, a data bypass is implemented to halt the operation of an adder after the 5th iteration, as the adder inputs have by then been shifted out. Thus, for the 10-iteration mode, 80 shift operations and 140 additions are required. This total is too many to compete with conventional vector normalization, which requires 3 multiplications, 3 divisions, 1 square root and 2 additions for each vector. However, since there are only two possible initial inputs, a small memory block could be used to eliminate the first several rotation steps.
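Returning to the speed-dependent precision control described for the teapot example, a control map could be sketched as a simple threshold lookup. Only the two operating points stated in the text (8 iterations at 10 pixels per frame, 6 iterations at 30 pixels per frame) and the 10-iteration full-precision mode are used. The step-function shape, the function name `iterations_for_speed` and its parameters are assumptions for illustration; the actual curve of Fig. 7 is not reproduced here.

```python
def iterations_for_speed(pixels_per_frame, control_map=((30, 6), (10, 8)), full=10):
    """Return a CORDIC iteration count for an object's screen speed.

    control_map holds (speed threshold, iterations) pairs in descending
    threshold order; slower objects fall through to the full precision.
    """
    for threshold, iterations in control_map:
        if pixels_per_frame >= threshold:
            return iterations
    return full

for speed in (0, 10, 30):
    print(speed, iterations_for_speed(speed))
```

Since the numbers depend on the viewer, frame rate, image size and viewing distance, a separate `control_map` would presumably be supplied per object or per viewing setup.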
CORDIC architecture for vector interpolation
The internal memory included in many current Field Programmable Gate Array (FPGA) devices can be used to hold pre-computed data sets and so reduce the number of computations. As shown in Table 1 [...]

[Table 1: only the entries 46% and 77% survive in the source.]

[...]tions. The simulation results show that the presented CORDIC vector interpolator is competitive in the 10-iteration mode, and better at lower numbers of iterations, in terms of energy consumption.
CONCLUSION
[...] expected. The architecture and operation of the CORDIC vector interpolator can also be used in other steps of the 3D computer graphics pipeline, such as vector rotation computation in the geometry engine.

This paper presents a CORDIC vector interpolator that improves the power efficiency of a 3D graphics system. The adaptive output precision of the CORDIC is made possible by the introduction of an HVP-based control criterion. In addition to precision control, dynamic voltage scaling could add further energy savings during operation with fewer iterations. This architecture control approach can also be applied to other steps of the 3D computer graphics pipeline, such as vector rotation computation in the geometry engine.
