In this paper, we propose a CoOrdinate Rotation DIgital Computer (CORDIC)-like processor for computing the absolute magnitude of a vector and its corresponding phase angle. It requires neither a scale factor compensation step nor addition/subtraction operations along the z datapath, has a convergence range over the entire coordinate space, and shows error characteristics similar to those of the conventional CORDIC. The synthesis results show that the proposed processor is hardware-economical and suitable for low power applications.
INTRODUCTION
The CORDIC algorithm is used for the elegant computation of several transcendental functions [1]. Two such functions are the absolute magnitude of a vector and the corresponding phase angle (arctangent computation). These functions can be evaluated using the CORDIC in its angle accumulation, or vectoring, mode. In this mode, the y component of the vector is forced to zero by iteratively rotating the vector back and forth through a set of elementary rotation angles. At the end, the magnitude value and the accumulated angle (the phase angle) are available as the x and z components of the output, respectively. However, the main drawback of the traditional CORDIC algorithm is that it generates a scale factor that must be compensated using extra circuitry, which incurs a computational complexity of the same order as that of the CORDIC itself. In addition, several unnecessary iterations are performed while forcing the y component to zero.
In this paper we propose a similar type of algorithm and the corresponding VLSI architecture, which eliminates the requirement of scale factor compensation, simplifies the angle accumulation operation along the z datapath and reduces the hardware cost significantly compared to that of the conventional CORDIC. The algorithm has a convergence range over the entire coordinate space. In essence, this algorithm is based on a scaling free CORDIC algorithm with a limited range of convergence proposed earlier [2, 3]. However, this algorithm is not as versatile as the CORDIC and is only comparable with its vectoring operation in the circular coordinate system. This work resulted from a larger project that targets a single chip implementation of an IEEE 802.11a compatible modem. The new algorithm has been used for the synchronizer section of the targeted modem [4]. The rest of the paper is structured as follows: Section 2 describes the theory of the proposed algorithm, and Section 3 describes the VLSI implementation of the algorithm. The performance of the proposed scheme is evaluated in Section 4 and conclusions are drawn in Section 5.
THEORETICAL BACKGROUND
We develop the algorithm in two steps. First, we show that a CORDIC with a convergence range of [0, π/8] is sufficient to cover the entire coordinate space using a novel scheme called domain folding. Second, we combine the scaling free CORDIC formulation described in reference [2] with the first step to formulate the new algorithm.
Domain folding
We start with the assumption that the CORDIC has a convergence range of [0, π/8]. Note that in the domain folding formulation of Figure 1, the range of convergence needed is always [0, π/8]. Thus, in essence, all the domains are "folded back" into domain A, hence the name domain folding.
It is straightforward to see that the same procedure is also applicable to vectors lying in the other quadrants. In this case, the input vector is first pre-rotated in the clockwise direction by an appropriate angle, viz., by π/2 when in the 2nd quadrant, by π when in the 3rd quadrant and by 3π/2 when in the 4th quadrant. Then the operation proceeds as shown in Figure 1. This pre-rotation amounts only to changing signs and swapping the x and y components. At the output, the pre-rotation angle is added to the accumulated angle to get the final result. Thus, a CORDIC having a convergence range of [0, π/8] is sufficient to cover the entire coordinate space.
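The quadrant pre-rotation described above can be sketched in a few lines. This is only an illustration in floating point: the function name `prerotate_to_first_quadrant` and the treatment of vectors lying exactly on an axis are assumptions, and the subsequent folding of the first-quadrant domains (Figure 1) is not modelled.

```python
import math

def prerotate_to_first_quadrant(x, y):
    """Rotate (x, y) clockwise into the first quadrant by sign change and swap.

    Returns (x', y', offset), where offset is the pre-rotation angle in radians
    that has to be added back to the accumulated angle at the output.
    The handling of points on the axes is an assumption of this sketch.
    """
    if x >= 0 and y >= 0:            # 1st quadrant: nothing to do
        return x, y, 0.0
    if x < 0 and y >= 0:             # 2nd quadrant: clockwise by pi/2
        return y, -x, math.pi / 2
    if x < 0 and y < 0:              # 3rd quadrant: rotation by pi
        return -x, -y, math.pi
    return -y, x, 3 * math.pi / 2    # 4th quadrant: clockwise by 3*pi/2

# Example: a vector in the 2nd quadrant
xr, yr, offset = prerotate_to_first_quadrant(-1.0, 2.0)
print(xr, yr, math.atan2(yr, xr) + offset)
```

No multiplication is involved, which is consistent with the statement that the pre-rotation reduces to sign changes and a swap of the components.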
The scaling free CORDIC
The details of the scaling free CORDIC algorithm are provided in references [2, 3]. The structure that realizes its working equation is shown in Figure 2 by the dotted boundary.
Each elementary rotational stage of the scaling free CORDIC costs two adders and two shifters more than that of the conventional CORDIC. In a pipeline implementation the shifters reduce to wire connections, so the resulting overhead is just two adders. However, for the iteration index (elementary rotational section) i ≥ (b/2) − 1, the hardware cost of a rotational stage is the same as that of the conventional one, since a right shift by (2i+1) bits results in machine zero or in retention of the sign bit only. Furthermore, since this formulation completely eliminates the scale factor compensation circuit, the overall hardware complexity of the scaling free CORDIC is less than that of the conventional one.
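The adder and shifter count above can be made concrete with a small fixed-point sketch of one rotational stage. The update equations used here are an assumption: they follow from the small-angle approximations sin(2^-i) ≈ 2^-i and cos(2^-i) ≈ 1 − 2^-(2i+1), which agree with the (2i+1)-bit shift mentioned in the text; the exact formulation is the one given in [2]. The function name and the integer scaling in the example are hypothetical.

```python
def scaling_free_stage(x, y, i):
    """One clockwise elementary rotation by roughly 2**-i rad on integers.

    Assumed update (not taken verbatim from the paper):
        x' = x - (x >> (2i+1)) + (y >> i)
        y' = y - (y >> (2i+1)) - (x >> i)
    i.e. four additions/subtractions and four shifts per stage.
    """
    xn = x - (x >> (2 * i + 1)) + (y >> i)
    yn = y - (y >> (2 * i + 1)) - (x >> i)
    return xn, yn

# Example: rotate the vector (2**14, 0) by the i = 4 stage
xn, yn = scaling_free_stage(1 << 14, 0, 4)
print(xn, yn, xn * xn + yn * yn, 1 << 28)

# For 16-bit operands and 2i+1 >= 15, the cosine correction term collapses
# to zero (non-negative input) or to the sign bit (negative input).
print(32767 >> 15, -32768 >> 15)
```

In this sketch the squared magnitude changes only marginally across the stage, which illustrates why no separate scale factor compensation is needed, and the last line shows why the two extra adders become redundant for large i.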
The new algorithm
The new CORDIC-like algorithm for computing the absolute magnitude and phase angle of a vector is constructed by combining the scaling free CORDIC algorithm with the domain folding technique. The complete algorithm can be summarized as follows:
1. Detect the quadrant in which the vector lies: This can be easily detected by checking the MSB of the input parameters x and y.
2. Modify the input vector: This step should be done by following the domain folding technique described in Figure 1. The aim of this operation is to bring the actual angle to be accumulated within the range [0, π/8].
3. Use scaling free CORDIC in vectoring mode: This step corresponds to the actual angle accumulation operation and can be carried out as in the conventional CORDIC.
4. Output generation: The correct output can be generated by following the rules described in subsection 2.1.
From an implementation point of view, further optimization is possible by considering only one-sided vector rotation instead of the back-and-forth motion of the vector. Rotating the vector in a single direction means that the accumulated angle can be described as a pure summation of powers of two. In this process, the unnecessary iteration steps are skipped. The final accumulated angle can be described by a bit pattern that contains a logic '1' for each accepted iteration step and a logic '0' for each rejected iteration step. In essence, this technique eliminates all the addition/subtraction operations otherwise required along the z datapath and reduces the hardware cost drastically. This process can be summarized as follows:
1. Compute the intermediate vector (x_{i+1}, y_{i+1}) at the i-th iteration step.
2. If y_{i+1} < 0, discard the result by assigning x_{i+1} = x_i and y_{i+1} = y_i, and enter a logic '0' (d_i in Figure 2) in the appropriate position of the z datapath register. This means that the i-th iteration is ignored.
3. If y_{i+1} > 0, keep the computed values of x_{i+1} and y_{i+1} and enter a logic '1' (d_i in Figure 2) in the appropriate position of the z datapath register. This means that the i-th iteration is accepted.
4. Take the binary value of the z datapath register when y_{i+1} = 0 (this is the final accumulated angle) and process it to generate the final output value following the rules described in subsection 2.1.
Considering these modifications, the final structure of an elementary rotational stage is as shown in Figure 2.
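The accept/reject procedure above can be sketched as follows. This is a floating-point illustration, not the fixed-point datapath: the stage update uses the assumed approximations cos(2^-i) ≈ 1 − 2^-(2i+1) and sin(2^-i) ≈ 2^-i, the case y_{i+1} = 0 is treated as accepted, and the names `one_sided_vectoring` and `bits_to_angle` as well as the stage list in the example are hypothetical.

```python
import math

def one_sided_vectoring(x, y, stages):
    """Drive y towards zero with clockwise-only rotations.

    stages: iteration indices i in the order they are applied.
    Returns (x, y, bits), where bits[k] is 1 if stage k was accepted, else 0.
    """
    bits = []
    for i in stages:
        c = 1.0 - 2.0 ** -(2 * i + 1)        # assumed cosine approximation
        s = 2.0 ** -i                        # assumed sine approximation
        xn, yn = c * x + s * y, c * y - s * x
        if yn < 0:                           # overshoot: ignore this iteration
            bits.append(0)
        else:                                # accept: keep the rotated vector
            x, y = xn, yn
            bits.append(1)
    return x, y, bits

def bits_to_angle(bits, stages):
    """Accumulated angle as a plain sum of powers of two, with no z adders."""
    return sum(b * 2.0 ** -i for b, i in zip(bits, stages))

# Example with a unit vector at 0.3 rad and an illustrative stage schedule
stages = [4] * 6 + list(range(5, 15))
x, y, bits = one_sided_vectoring(math.cos(0.3), math.sin(0.3), stages)
print(x, bits, bits_to_angle(bits, stages))
```

The decision bits are the only information carried along the z path; the angle is recovered from them by weighting, which is why no adder or subtractor is needed there.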
ARCHITECTURE AND IMPLEMENTATION
The complete architecture of the proposed processor consists of three modules, viz. the Domain Detection Circuit, the Basic CORDIC Pipeline and the Output Unit. For convenience, we describe a 16-bit fixed-point pipeline implementation of the proposed processor. Two's complement arithmetic is used throughout the implementation.
The Domain Detection Circuit is responsible for detecting the quadrant and the corresponding domain in which the vector lies. It consists of three comparators, two adders and a √2 scaling circuit. The scaling circuit is realized using the shift-and-add technique and is thus more economical than a full multiplier. The circuit generates two 2-bit signals, namely quad and domain. The quad signal indicates the initial quadrant in which the vector lies, while the domain signal indicates the domain in the first quadrant into which it is folded back.
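As an illustration of the shift-and-add technique mentioned above, the following sketch scales a non-negative integer by approximately √2 without a multiplier. The choice of five terms (√2 ≈ 1 + 2^-2 + 2^-3 + 2^-5 + 2^-7 = 1.4140625) and the name `times_sqrt2` are assumptions made for this example; the number of terms in the actual circuit is not stated here.

```python
def times_sqrt2(v):
    """Approximate v * sqrt(2) for a non-negative integer v.

    Uses the truncated binary expansion 1.0110101b of sqrt(2), so the
    result is built from four shifts and four additions only.
    """
    return v + (v >> 2) + (v >> 3) + (v >> 5) + (v >> 7)

# Example: scale 2**14 and compare with the exact product
v = 1 << 14
print(times_sqrt2(v), v * 2 ** 0.5)
```

Adding further terms of the binary expansion would reduce the approximation error at the cost of extra adders.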
The Basic CORDIC Pipeline has a convergence range of [0, π/8]. For a 16-bit implementation, the value of p is 4 (see subsection 2.2). Thus, the largest right shift allowed in this formulation is by 4 bits. In order to cover the convergence range of [0, π/8], we have used the i = 4 stage six times and the i = 5, 6, …, 14 stages once each. The stage i = 15 is omitted since a right shift of a number by 15 bit positions results in the retention of the sign bit only.
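The choice of six i = 4 stages can be checked with simple arithmetic. The sketch below assumes that every accepted stage i contributes a nominal angle of 2^-i rad, in line with the description of the accumulated angle as a sum of powers of two; the function name `max_reachable_angle` is hypothetical.

```python
import math

def max_reachable_angle(repeats_of_i4):
    """Largest nominal angle reachable when the i = 4 stage is repeated
    `repeats_of_i4` times and stages i = 5..14 are used once each."""
    stages = [4] * repeats_of_i4 + list(range(5, 15))
    return sum(2.0 ** -i for i in stages)

# Compare five and six repetitions against the required range pi/8
for n in (5, 6):
    print(n, max_reachable_angle(n), max_reachable_angle(n) >= math.pi / 8)
```

Under this assumption, five repetitions fall just short of π/8 while six repetitions exceed it, which matches the schedule described above.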
The architecture of the basic CORDIC pipeline is shown in Figure 3. Each of the pipeline stages corresponding to i = 4, 5 and 6 requires four adders, whereas the stages i = 7, 8, …, 14 require two adders each. In order to balance the pipeline, the stages i = (7, 8), (9, 10), (11, 12) and (13, 14) have been merged as shown by the dotted boundary in Figure 3, reducing the total length of the pipeline to 12 stages (index j in Figure 3). An array of registers is associated with the pipeline stages to hold the intermediate binary values of the accumulated angle. Depending on the decision of a particular stage, i.e., whether a rotation operation is accepted or rejected, a logic '1' or '0' is entered at the LSB position of the register array and the value is passed to the next stage as shown in Figure 3. However, a simple combinatorial circuit is necessary to interpret the decisions made by the six i = 4 stages. The decisions made in these stages give the 3 MSBs of the final representation of the accumulated angle. At the end, the basic CORDIC pipeline generates a 13-bit unsigned value for the accumulated angle ϕ, which is further processed by the output unit according to the principle stated in subsection 2.1. The absolute magnitude of the vector is available at the x output. The domain and quad signals generated by the Domain Detection Circuit flow through the pipeline along with the data (not shown in Figure 3). Thus, each data item can be viewed as carrying a token with the information about its initial quadrant and domain, which the output unit processes to generate the final result.
The power consumption of the processor is 6 mW.
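The formation of the 13-bit angle word can be sketched as follows. Two assumptions are made here that the text does not confirm: the combinatorial circuit for the six i = 4 decisions is modelled as a simple count of accepted stages (each worth 2^-4 rad), and the LSB of the word is taken to weigh 2^-14 rad. The function names are hypothetical.

```python
def assemble_angle_word(d4, d_rest):
    """Build the unsigned angle word from the stage decisions.

    d4:     decisions (0/1) of the six i = 4 stages
    d_rest: decisions (0/1) of stages i = 5..14, most significant first
    """
    assert len(d4) == 6 and len(d_rest) == 10
    word = sum(d4)                 # assumed: count of accepted stages, 0..6 (3 bits)
    for d in d_rest:               # each later decision enters at the LSB
        word = (word << 1) | d
    return word

def word_to_radians(word):
    """Interpret the word with an assumed LSB weight of 2**-14 rad."""
    return word * 2.0 ** -14

# Example: all stages accepted gives the largest representable angle
w = assemble_angle_word([1] * 6, [1] * 10)
print(w, w.bit_length(), word_to_radians(w))
```

With these assumptions the count occupies the 3 MSBs and the ten single-stage decisions occupy the remaining bits, giving a 13-bit word as described above.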
PERFORMANCE EVALUATION

Error analysis
The error performance of the algorithm is shown in 
Hardware complexity
A comparison of the hardware complexity of the proposed design with some other CORDIC processors operating in the vectoring mode is provided in Table 1 
CONCLUSIONS
In this paper, we have proposed a CORDIC-like algorithm for computing the magnitude and phase of a vector, and have described a 16-bit VLSI implementation. The proposed algorithm does not need the scale factor compensation step. Its hardware cost is less than that of the conventional CORDIC when the scale factor compensation circuitry is taken into account. The complete elimination of arithmetic processing along the z datapath makes it attractive in terms of hardware cost and for low power applications. The proposed algorithm shows error characteristics similar to those of the conventional CORDIC. The synthesis results show that the proposed design occupies a very small area and consumes very low power.
