A Memory-Efficient and High-Speed Sine/Cosine Generator Based on Parallel CORDIC Rotations
Shen-Fu Hsiao, Member, IEEE, Yu-Hen Hu, Fellow, IEEE, and Tso-Bing Juang, Member, IEEE
Abstract-The sine/cosine function generator is based on parallelization of the original CORDIC algorithm by predicting all the rotation directions directly from the binary bits of the initial input angle. Unlike previous approaches that require complicated circuits or exponentially increased ROM, our proposed architecture has a relatively simple prediction scheme through an efficient angle recoding. The critical path delay is also reduced by utilizing the predicted rotation directions to design an efficient multioperand carry-save addition structure.
Index Terms-CORDIC, microrotation angle recoding, multioperand carry-save addition, parallel sign prediction, sine/cosine function generator.
I. INTRODUCTION
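One of the key components in a direct digital frequency synthesizer (DDFS) system [1] is the sine/cosine function generator, which computes the binary representations of $\sin\theta$ and $\cos\theta$ to a precision of $N$ fractional bits. In this letter, we propose a novel realization of a sine/cosine generator based on the CORDIC algorithm [2]. CORDIC is an arithmetic algorithm developed to compute various elementary functions through a series of iterations of a unified microrotation operation. In particular, in the circular rotation mode, the following microoperations are executed at each iteration $i$:
$$x(i+1) = x(i) - \sigma_i 2^{-i} y(i), \quad y(i+1) = y(i) + \sigma_i 2^{-i} x(i), \quad z(i+1) = z(i) - \sigma_i a_i, \quad \sigma_i = \mathrm{sign}(z(i)). \qquad (1)$$
After $N$ iterations, the accumulated rotation angle is $\theta = \sum_i \sigma_i a_i$. Using the definition $a_i = \tan^{-1} 2^{-i}$, one has (note that $\sigma_i = \pm 1$) $\cos(\sigma_i a_i) = 1/\sqrt{1 + 2^{-2i}}$ and $\sin(\sigma_i a_i) = \sigma_i 2^{-i}/\sqrt{1 + 2^{-2i}}$. Then, it can be easily deduced that
$$\begin{bmatrix} x(N) \\ y(N) \end{bmatrix} = \frac{1}{K} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x(0) \\ y(0) \end{bmatrix}$$
where $K = \prod_i \cos a_i = \prod_i (1 + 2^{-2i})^{-1/2}$ is a constant that can be precomputed in advance. Set $x(0) = K$, $y(0) = 0$, $z(0) = \theta$; then $x(N) = \cos\theta$ and $y(N) = \sin\theta$ are obtained after $N$ iterations.
In conventional CORDIC, the direction sign $\sigma_i$ is determined sequentially, since it depends on the sign of $z(i)$ calculated at the previous iteration. This dependence makes it difficult to execute multiple microrotations in parallel. In this letter, we propose a new method to quickly select the rotation directions in order to speed up the calculation.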
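For reference, the conventional sequential scheme described above can be sketched in floating point as follows. This is only an illustrative model, not the proposed architecture: the function name `cordic_sin_cos`, the iteration index starting at 0, and the use of floating-point rather than fixed-point arithmetic are all assumptions made for the example.

```python
import math

def cordic_sin_cos(theta, n=24):
    """Conventional (sequential) CORDIC, circular rotation mode.

    Assumptions for this sketch: iterations i = 0..n-1, microrotation
    angles atan(2^-i), floating-point arithmetic.
    Returns (sin(theta), cos(theta)) approximately.
    """
    angles = [math.atan(2.0 ** -i) for i in range(n)]
    # Scaling constant K = prod cos(a_i); preloading x(0) = K cancels the gain.
    k = 1.0
    for a in angles:
        k *= math.cos(a)
    x, y, z = k, 0.0, theta
    for i, a in enumerate(angles):
        # The direction depends on the sign of z from the previous iteration,
        # which is what forces the iterations to run one after another.
        sigma = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i
        x, y = x - sigma * shift * y, y + sigma * shift * x
        z -= sigma * a
    return y, x
```

The loop makes the data dependence visible: `sigma` at step `i` cannot be chosen until `z` from step `i - 1` is known, which is the bottleneck the proposed prediction scheme removes.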
II. PREDICTIONS OF ROTATION DIRECTIONS

A. Binary to Bipolar Recoding (BBR)
The initial input angle with is assumed to be in the range as in the application example of DDFS. It has been shown in [3] that to -bit precision if . Thus, the last rotation directions (from to ) can be obtained in parallel after completing the first iterations. As proposed in [3] , we divide the angle into two parts (the higher part and the lower part) (2) The binary bits in the higher part can be recoded into bipolar digits as follows: where
Equation (3) is called BBR for , . 
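The idea of recoding binary bits into bipolar digits can be illustrated with a standard identity: a bit $b_k \in \{0, 1\}$ of weight $2^{-k}$ equals $(r_k + 1) 2^{-k-1}$ with $r_k = 2 b_k - 1 \in \{-1, +1\}$. The sketch below checks this identity exactly with rational arithmetic. The indexing ($k = 1, \ldots, m$), the constant offset, and the helper names are assumptions of the example and need not match the form of (3).

```python
from fractions import Fraction

def binary_to_bipolar(bits):
    """Recode bits b_1..b_m (weights 2^-k) into bipolar digits r_k = 2*b_k - 1.

    Returns (digits, offset) such that
        sum_k b_k 2^-k == offset + sum_k r_k 2^-(k+1),
    where offset = 2^-1 - 2^-(m+1) collects the constant terms.
    """
    m = len(bits)
    digits = [2 * b - 1 for b in bits]
    offset = Fraction(1, 2) - Fraction(1, 2 ** (m + 1))
    return digits, offset

def binary_value(bits):
    """Exact value of the fraction 0.b_1 b_2 ... b_m."""
    return sum(Fraction(b, 2 ** k) for k, b in enumerate(bits, start=1))

def bipolar_value(digits, offset):
    """Exact value of the recoded form, bipolar digit r_k having weight 2^-(k+1)."""
    return offset + sum(Fraction(r, 2 ** (k + 1))
                        for k, r in enumerate(digits, start=1))
```

Because every recoded digit is either $-1$ or $+1$, each one can serve directly as a rotation direction, which is what makes the recoding attractive for sign prediction.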
B. Microrotation Angle Recoding (MAR)
The BBR for with is
The first eight rotation directions are selected concurrently as (6) Then, all the signed error terms , and the last term in (5) are added to , generating the corrected lower part represented in two's complement format, i.e.,
It can be shown that . Since , within precision of 24 fractional bits, the algorithm converges after the above selection of directions for the first several rotations. The directions for the remaining microrotations can be derived immediately from (7) using again the BBR (8) leading to the parallel prediction for the last microrotations.
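The parallel prediction of the later directions rests on the microrotation angle becoming numerically indistinguishable from its power-of-two argument. A minimal numerical check is sketched below, assuming the property in question is the small-angle approximation $\tan^{-1} 2^{-i} \approx 2^{-i}$ and that "to $N$-bit precision" means an error below $2^{-N}$. Both the criterion and the function names are assumptions of the example.

```python
import math

def linear_error(i):
    """Error of approximating atan(2^-i) by 2^-i (always non-negative)."""
    x = 2.0 ** -i
    return x - math.atan(x)

def first_linear_index(n_bits):
    """Smallest i for which 2^-i - atan(2^-i) < 2^-n_bits.

    The error decreases monotonically with i (roughly 2^(-3i) / 3),
    so every later index satisfies the bound as well.
    """
    i = 0
    while linear_error(i) >= 2.0 ** -n_bits:
        i += 1
    return i
```

Under these assumptions, `first_linear_index(24)` evaluates to 8, since the leading error term $2^{-3i}/3$ first drops below $2^{-24}$ at $i = 8$.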
III. IMPLEMENTATION AND COMPARISON
Fig. 1(a) is a 24-bit sine/cosine generator based on an unfolded CORDIC architecture, where each numbered block (a stage) denotes a microrotation performing the recurrence of $x$ and $y$ in (1). Note that the microrotation of angle is repeated once and the microrotation of angle is repeated three times. The scaling factor is still kept constant by taking into account these fixed repetitions. This constant can be precomputed and serves as one of the initial inputs , , . Since the first bits of the input angle are directly used as the directions in the first rotations, the execution of stages 1-12 (including repetition stages) in the left of Fig. 1(a) can be performed in parallel with the prediction adder that generates the directions for the remaining microrotations.
We adopt carry-save addition (CSA) in each stage, where a 4:2 compressor is used to produce the carry-save form (a sum term plus a carry term) for each output, as shown in Fig. 1(b). Assuming the delay of a 4:2 compressor to be where is the delay of a full adder, the delay for the stages in the left of Fig. 1(a) is . Note that the first stage does not need CSA. The prediction adder calculates the sum of the nine operands in (7), using a CSA tree and a fast carry-propagate adder (CPA), leading to a delay of where denotes the delay of a CPA. In general, the prediction adder is not in the critical path as long as a fast CPA is used.
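The carry-save building blocks mentioned above can be modeled behaviorally on non-negative integers. The sketch below is a bit-level illustration only, not the circuit of Fig. 1(b). It assumes a 4:2 compressor built from two cascaded 3:2 carry-save adders and a simple layer-by-layer tree, and all function names are hypothetical.

```python
def csa_3to2(a, b, c):
    """3:2 carry-save adder: a + b + c == s + carry, with no carry propagation."""
    s = a ^ b ^ c                                  # bitwise sum
    carry = ((a & b) | (a & c) | (b & c)) << 1     # bitwise majority, shifted
    return s, carry

def compressor_4to2(a, b, c, d):
    """4:2 compressor modeled as two cascaded 3:2 adders (an assumption)."""
    s1, c1 = csa_3to2(a, b, c)
    return csa_3to2(s1, c1, d)

def csa_tree(operands):
    """Reduce a list of operands to at most two, one 3:2 layer at a time.

    Returns (remaining_operands, number_of_layers).
    """
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        nxt = []
        for j in range(0, full, 3):
            nxt.extend(csa_3to2(*ops[j:j + 3]))
        nxt.extend(ops[full:])      # leftovers pass through to the next layer
        ops = nxt
        levels += 1
    return ops, levels

def multioperand_add(operands):
    """CSA tree followed by one final addition standing in for the CPA."""
    ops, levels = csa_tree(operands)
    return sum(ops), levels
```

Only the last step involves a full carry-propagate addition; every layer before it has a delay independent of the word length, which is why a nine-operand sum such as the one formed by the prediction adder can be computed with a single CPA at the end.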
The derivation in Section II can be easily extended to different bit precisions, where the total number of repetitions can be found to be , , . In some applications (such as the DDFS) where the input angle is further limited in , the first rotation direction is always , and thus the first several stages controlled by can be merged and precomputed along with the constant factor as the initial input to the X/Y datapath. In this situation, the numbers of repetitions in the sine/cosine generation are reduced to , , . The second half of the microrotation stages can be merged into a multioperand carry-save addition architecture by observing that (9) Thus, the microrotation stages numbered from 13-25 in Fig. 1(a) can be merged into a CSA tree performing the parallel addition of 28 operands (14 numbers in carry-save form) with critical path delay . Summing up the delay of all the stages, the total delay of our proposed sine/cosine generator is for 24-bit accuracy, a significant speed improvement compared with previous approaches, as discussed below.
Table I compares our proposed sine/cosine generator with other CORDIC rotation algorithms. To make a fair comparison, we assume that all methods use CSA in the X/Y datapath except for the last stage, where a fast CPA is required to obtain the nonredundant representation of the output ( is not counted in Table I for clarity). In [3], the rotation directions after iterations are derived from the z remainder. However, the first iterations still adopt the conventional sequential approach, where a delay of is assumed for an -bit CPA. In [4], two rotations are executed in a single step at the price of a more complicated Z datapath, where several most significant digits are examined. In [5], the first rotation directions are predicted based on approximation of binary angle input to , similar to our method.
However, the direction prediction in [5] requires several carry-look-ahead adders plus complicated logic circuits, rather than being taken directly from the binary bits of the input angle as in our method. In [6], the first several directions are derived using a ROM whose size increases exponentially with . Unlike the above methods, which require some delay to predict the first rotation directions, our proposed method generates the first rotation directions immediately from the first binary bits of the input angle.
A 16-bit CORDIC-based sine/cosine generator similar to the architecture in Fig. 1 was synthesized and mapped onto a Virtex-300 type FPGA chip. The critical path delay across the entire architecture is less than 75 ns. Compared with the approach in [3], our design achieves more than 25% improvement in speed (due to the parallelization of the sign bit selection) and a 30% saving in hardware cost (due to the elimination of the Z datapath). Compared with the ROM-based approach in [6], which calls for 47 configurable logic blocks (CLBs) for the polarity prediction, our proposed sine/cosine generator needs only 28 CLBs for the sign-bit determination, saving more than 40% of the hardware cost in the prediction of rotation directions.
IV. CONCLUSION
We presented a novel recoding method to predict all the directions of CORDIC microrotations and applied it to the generation of sine and cosine functions. The proposed architecture does not need exponentially increased ROM or complicated prediction hardware. The speed is also improved by implementing the microrotation stages using carry-save addition with a reduced number of operands.
