Introduction
Arithmetic operations in digital systems are usually performed with binary numbers [8]. However, for high-speed digital signal processing, p-nary (p > 2) numbers are often used [1, 5]. Financial computation usually uses decimal numbers instead of binary numbers. In such cases, conversions between binary numbers and p-nary numbers are necessary. This operation is called radix conversion [2, 7].
Various methods exist to convert p-nary numbers into q-nary numbers, where p ≥ 2 and q ≥ 2. Many of them require a large amount of computation. Radix converters can also be implemented by table lookup, that is, by storing the conversion table in memory. This method is fast but requires a large amount of memory; when the number of inputs is large, the memory is too large to implement. Thus, more efficient methods have been developed.
In [6], ROMs and adders are used to implement binary to decimal converters.
In [10], LUT cascades [9] are used to implement binary to ternary converters, ternary to binary converters, binary to decimal converters, and decimal to binary converters.
In [13], weighted-sum functions (WS functions) are introduced to design radix converters with LUT cascades.
In [3], LUT cascades and arithmetic decompositions [12] are used to implement p-nary to binary converters that require less memory.
In this paper, we consider a design method for p-nary to q-nary (q > 2) converters that uses LUT cascades and arithmetic decompositions. To explain the concepts, we use binary to decimal converters as examples, but the method can easily be modified for any values of p and q. A 16-bit binary to decimal converter is designed to illustrate the method.
Radix Converter

Radix Conversion
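Definition 2.1 Let $x = (x_{n-1}, x_{n-2}, \ldots, x_0)_p$ be an n-digit p-nary number, and let $y = (y_{m-1}, y_{m-2}, \ldots, y_0)_q$ be an m-digit q-nary number. Given the vector x, radix conversion is the operation that obtains the y satisfying the relation
$$\sum_{i=0}^{n-1} x_i \, p^i = \sum_{j=0}^{m-1} y_j \, q^j,$$
where $x_i \in \{0, 1, \ldots, p-1\}$ and $y_j \in \{0, 1, \ldots, q-1\}$.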
When p (or q) is not 2, the digits are represented as binary-coded p-nary (q-nary) numbers.
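A minimal Python sketch of Definition 2.1, assuming digit vectors are given most significant digit first as written in the definition. The names `radix_convert` and `to_binary_coded` are illustrative and not from the paper; the second helper shows the binary-coded q-nary form (BCD when q = 10).

```python
def radix_convert(x, p, q, m):
    """Return y = (y_{m-1}, ..., y_0) in radix q with the same value as x in radix p.

    x = (x_{n-1}, ..., x_0) is given most significant digit first.
    """
    value = 0
    for digit in x:
        if not 0 <= digit < p:
            raise ValueError(f"{digit} is not a valid {p}-nary digit")
        value = value * p + digit  # Horner evaluation of sum(x_i * p**i)
    if value >= q ** m:
        raise ValueError(f"{m} digits in radix {q} cannot hold {value}")
    y = []
    for _ in range(m):
        value, digit = divmod(value, q)  # peel off y_0, y_1, ... in turn
        y.append(digit)
    return tuple(reversed(y))


def to_binary_coded(y, q):
    """Encode each q-nary digit with ceil(log2 q) bits, MSB first (BCD for q = 10)."""
    width = (q - 1).bit_length()
    return tuple(tuple((d >> k) & 1 for k in reversed(range(width))) for d in y)


if __name__ == "__main__":
    y = radix_convert((1,) * 16, 2, 10, 5)  # 16-bit all-ones input, i.e. 65535
    print(y)                          # (6, 5, 5, 3, 5)
    print(to_binary_coded(y, 10)[0])  # (0, 1, 1, 0), the BCD code of 6
```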
Definition 2.2 Let i be an integer. BIT (i, j) denotes the j-th bit of the binary representation of i, where the LSB is the 0-th bit.
Example 2.1 The integer 6 is represented by the binary number (1, 1, 0) = (BIT(6, 2), BIT(6, 1), BIT(6, 0)). Thus, BIT(6, 2) = 1, BIT(6, 1) = 1, and BIT(6, 0) = 0.
(End of Example) 
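In code, BIT(i, j) is a shift followed by a mask. A small Python sketch; the lowercase name `bit` is our own, and we assume i is non-negative, since the definition does not fix a binary representation for negative integers.

```python
def bit(i: int, j: int) -> int:
    """BIT(i, j): the j-th bit of the binary representation of i (bit 0 is the LSB)."""
    if i < 0 or j < 0:
        raise ValueError("i and j must be non-negative")
    return (i >> j) & 1


# Example 2.1: 6 = (1, 1, 0) in binary.
print([bit(6, j) for j in (2, 1, 0)])  # [1, 1, 0]
```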
Conventional Realization
Random Logic Realization
Radix converters can be implemented by using comparators, subtracters, and multiplexers. Let $d_{in}$ be the number of bits that represent an input digit. Then, $d_{in} = \lceil \log_2 p \rceil$. Since the number of input digits is n, the total number of input bits is $n \cdot d_{in}$. An n-digit p-nary number takes values from 0 to $p^n - 1$. Let $d_{out}$ be the number of bits in an output digit. Then, $d_{out} = \lceil \log_2 q \rceil$. The number of output digits is $m = \lceil n \cdot \log_q p \rceil$. Note that the most significant digit requires only $\lceil \log_2 \lceil p^n / q^{m-1} \rceil \rceil$ bits. This method is simple and fast; however, when the number of digits of the radix converter is large, the amount of hardware becomes huge.
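The quantities above can be computed exactly with integer arithmetic, which avoids floating-point error in $\lceil n \log_q p \rceil$. The sketch below uses a helper of our own, `converter_size`, which also reports the size of a single-memory (table lookup) realization under the assumption that the memory stores one word of all output bits for every input combination.

```python
from typing import NamedTuple


class ConverterSize(NamedTuple):
    d_in: int                # bits per input digit, ceil(log2 p)
    d_out: int               # bits per output digit, ceil(log2 q)
    m: int                   # number of output digits
    msd_bits: int            # bits needed by the most significant output digit
    out_bits: int            # total number of output bits
    single_memory_bits: int  # 2**(n*d_in) words of out_bits bits each


def converter_size(p: int, q: int, n: int) -> ConverterSize:
    d_in = (p - 1).bit_length()
    d_out = (q - 1).bit_length()
    # Smallest m with q**m >= p**n, i.e. m = ceil(n * log_q p), using exact integers.
    m = 1
    while q ** m < p ** n:
        m += 1
    # The most significant digit takes ceil(p**n / q**(m-1)) distinct values.
    msd_values = (p ** n + q ** (m - 1) - 1) // q ** (m - 1)
    msd_bits = (msd_values - 1).bit_length()
    out_bits = (m - 1) * d_out + msd_bits
    return ConverterSize(d_in, d_out, m, msd_bits, out_bits, 2 ** (n * d_in) * out_bits)


print(converter_size(2, 10, 16))
# ConverterSize(d_in=1, d_out=4, m=5, msd_bits=3, out_bits=19, single_memory_bits=1245184)
```

For the 16-bit binary to decimal converter, the most significant decimal digit takes only the values 0 to 6, so it needs 3 bits, and a single memory would need $2^{16} \times 19 = 1{,}245{,}184$ bits under this model.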
LUT Cascade Realization
In a single memory realization of a p-nary to q-nary converter, the size of memory tends to be too large.
To reduce the amount of hardware, LUT cascade realizations are used in [10], where the outputs are partitioned into groups. By using functional decomposition theory [8], we can predict whether such a realization is feasible. To perform functional decompositions, a Binary Decision Diagram for Characteristic Function (BDD for CF) [10] is used as the data structure. Figure 2.3 shows the 16-digit binary to decimal converter of [10], where an LUT cascade with three cells realizes all the outputs. The LUTs are implemented by memories, and the total memory size is 73,728 bits. This method uses only memories, and the interconnections are limited to adjacent memories. However, since the logic synthesis uses BDDs for CFs, the computation time tends to be excessive when the number of digits is large.
On the other hand, the design method in [3] finds LUT cascades without using BDDs for CFs. By using arithmetic decompositions, it produces circuits with a smaller amount of memory than [10]. Unfortunately, this method can design only p-nary (p > 2) to binary converters, and is not applicable to binary to q-nary (q > 2) converters.
Figure 2.1. 8-digit binary to decimal converter: Random Logic Realization.
Realization by Memories and q-nary Adders
Design methods of binary to decimal converters using memories and adders are shown in [6]. Figure 2.4 shows a 16-digit binary to decimal converter [6]. The features of this circuit are as follows:
1. In the binary to decimal converter, the input with weight $2^0$ is directly connected to the least significant output bit (weight 1).
2. The other 15 inputs are divided into two groups: the upper 9 bits and the lower 6 bits. For all possible input values, each LUT stores the corresponding BCD numbers.
3. The most significant digit (40K, 20K, and 10K) is obtained by a 3-input LUT (originally implemented by gates) and a binary adder.
4. The total amount of memory is 8,216 bits.
5. The middle LUT stores BCD values in excess-6 code.
6. BCD additions are implemented by binary adders and special subtracters. When the carry is added to the next higher-order digit, 6 is subtracted from the BCD digit.
The original circuit in [6] used a 1-of-16 decoder that generates memory selection signals for 16 small-scale memories. Since larger memories are available today, we replaced them by a single LUT.
With this method, a binary to decimal converter is implemented by using LUTs and q-nary (q = 10) adders. This method reduces the total amount of memory by partitioning the inputs into two.
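The input-partitioning idea can be checked exhaustively in software. The Python sketch below is a simplified model, not the circuit of [6]: both LUTs store plain BCD digits (no excess-6 code), an ordinary ripple-carry decimal adder replaces the binary adders and special subtracters, and the most significant digit comes from the upper LUT instead of a separate 3-input LUT. The table and function names are hypothetical.

```python
def decimal_digits(value, m):
    """The m decimal digits of value, least significant first (the BCD words in a LUT)."""
    digits = []
    for _ in range(m):
        value, d = divmod(value, 10)
        digits.append(d)
    return digits


# LUT contents: x6..x1 carry weights 2..64, x15..x7 carry weights 128..32768.
LOW_LUT = [decimal_digits(2 * v, 5) for v in range(2 ** 6)]
HIGH_LUT = [decimal_digits(128 * u, 5) for u in range(2 ** 9)]


def decimal_add(a, b):
    """Ripple-carry decimal (q = 10) addition of two digit lists, least significant first."""
    result, carry = [], 0
    for da, db in zip(a, b):
        s = da + db + carry
        carry = 1 if s >= 10 else 0
        result.append(s - 10 * carry)
    return result, carry


def bin16_to_decimal(x):
    """Convert a 16-bit binary number to five decimal digits, most significant first."""
    if not 0 <= x < 2 ** 16:
        raise ValueError("x must be a 16-bit unsigned integer")
    x0, low, high = x & 1, (x >> 1) & 0x3F, x >> 7
    # The final carry is always 0: the two LUT values add up to at most 65534.
    digits, _ = decimal_add(LOW_LUT[low], HIGH_LUT[high])
    # Bits x15..x1 contribute an even number, so the units digit is even
    # and x0 can be wired straight into its least significant bit.
    digits[0] |= x0
    return tuple(reversed(digits))


assert all(bin16_to_decimal(x) == tuple(map(int, f"{x:05d}")) for x in range(2 ** 16))
print(bin16_to_decimal(40961))  # (4, 0, 9, 6, 1)
```

Because the bits above $x_0$ always contribute an even value, the sketch also shows why the input $2^0$ can bypass both LUTs.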
WS Function
The weighted sum function (WS function) is a mathematical model of radix converters, bit-counting circuits, and convolution operations [12, 13]. In this section, we show some properties of WS functions and give a design method of radix converters that uses them. In this paper, we represent p-nary to q-nary conversions with WS functions. From here on, unless otherwise noted, $w_i$ and $x_i$ denote non-negative integers. Next, we consider the range of WS functions.
Definition 3.1 An n-input WS function [13] is defined as
$$WS(x) = \sum_{i=0}^{n-1} w_i \cdot x_i, \qquad (3.1)$$
Definition 3.4 Range(f (x)) denotes the range of a function f (x).
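The range of a WS function can be computed without enumerating all inputs, by growing the set of partial sums one term at a time. The sketch below assumes every $x_i$ takes values in $\{0, \ldots, p-1\}$ (one p-nary digit); the helper names `ws` and `ws_range` are ours. The size of such a partial-sum range bounds the number of distinct values that must be passed from one cell of an LUT cascade to the next.

```python
from itertools import product


def ws(weights, x):
    """WS(x) = sum_i w_i * x_i (Definition 3.1); weights[i] multiplies x[i] = x_i."""
    return sum(w * xi for w, xi in zip(weights, x))


def ws_range(weights, p):
    """Range(WS(x)) when every x_i takes the values 0, 1, ..., p-1.

    The set of partial sums is grown one term at a time, so the work is bounded
    by the size of that set rather than by p**n.
    """
    values = {0}
    for w in weights:
        values = {v + w * xi for v in values for xi in range(p)}
    return values


# The value of an n-digit p-nary number is the WS function with w_i = p**i.
print(sorted(ws_range([2 ** i for i in range(4)], 2)) == list(range(16)))  # True

# Cross-check against brute-force enumeration for a small ternary example.
weights = [3, 5]
brute = {ws(weights, x) for x in product(range(3), repeat=len(weights))}
print(ws_range(weights, 3) == brute)  # True
```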
Realization Using LUT Cascades and Arithmetic Decompositions
In this section, we consider design methods of radix converters that use LUT cascades and arithmetic decompositions. Figure 4.1(a) shows the conventional circuit, redrawn from Fig. 2.4 with fewer blocks. This circuit is implemented with only three LUTs and five decimal adders. In Fig. 4.1(a), the outputs of the middle LUT are connected to q-nary adders.
On the other hand, Fig. 4.1(b) shows the method proposed in this paper. In this method, each LUT stores BCD values and produces a digit and a carry-out. Consequently, the outputs of each LUT block are connected only to the corresponding q-nary adder and the adjacent q-nary adder. However, this method requires m LUTs. Furthermore, the LUTs for the lower digits have many inputs. Next, we reduce the total amount of memory of the LUT cascades by using the concept of WS functions.
LUT Cascade Realizations of WS Functions
Here, we consider the logic function realized by each LUT in Fig. 4.1(b). These functions are obtained from Table 4.1. For example, the value of the digit for $10000 = 10^4$ can be computed from $x_{14}$, $x_{15}$, and the carry propagation signal from the lower digit. In this way, we obtain equations (4.1)-(4.5), one for each decimal digit. Since equations (4.1)-(4.5) represent WS functions, we can use the properties in Section 3 to construct an LUT cascade with a small amount of memory. We can also reduce the number of levels of the LUT cascade.
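The per-digit WS functions can be generated directly from the radix pair. The Python sketch below assumes that the entry of Table 4.1 in row $10^j$ and column $2^i$ is the j-th decimal digit of $2^i$; this agrees with the coefficients stated in the text ($x_{14}$ and $x_{15}$ for the digit $10^4$). Because $p^i = \sum_j w_{j,i} q^j$, the input value equals $\sum_j WS_j(x) \, q^j$, so propagating carries digit by digit yields y. To keep the sketch short, the carry from the lower digit is added directly into the WS sum instead of modeling the separate q-nary adders of Fig. 4.1(b). The function names are hypothetical.

```python
def ws_weights(p, q, n, m):
    """w[j][i] = j-th q-nary digit of p**i, i.e. the assumed Table 4.1 entry for q**j and p**i."""
    return [[(p ** i // q ** j) % q for i in range(n)] for j in range(m)]


def convert_digitwise(x, p, q, m, w):
    """x[i] is the p-nary digit x_i (least significant first); returns [y_0, ..., y_{m-1}].

    Output digit j is WS_j(x) = sum_i w[j][i] * x_i plus the carry from digit j - 1.
    """
    y, carry = [], 0
    for j in range(m):
        s = sum(w[j][i] * x[i] for i in range(len(x))) + carry
        carry, digit = divmod(s, q)  # the carry may be larger than 1
        y.append(digit)
    if carry:
        raise ValueError("m output digits are not enough for this input")
    return y


w = ws_weights(2, 10, 16, 5)
print(w[4])  # 10^4 row: nonzero only for x14 (weight 1) and x15 (weight 3)
print([i for i in range(16) if w[2][i] == 0])  # [0, 1, 2, 3, 4, 5, 6, 10, 11, 12]

# Check every fifth 16-bit input (including 0 and 65535) against ordinary conversion.
for value in range(0, 2 ** 16, 5):
    bits = [(value >> i) & 1 for i in range(16)]
    assert convert_digitwise(bits, 2, 10, 5, w) == [(value // 10 ** j) % 10 for j in range(5)]
```

The zero positions printed for the $10^2$ row match the zero coefficients listed for equation (4.3). Changing `p` and `q` in `ws_weights` gives the coefficients for other radix pairs.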
The LUT cascade shown in Fig. 4.2(a) realizes equation (4.5). Each LUT has only one external input $x_i$. As shown in Table 4.1, the weight coefficients are non-negative integers. All weight coefficients of equation (4.5) are non-zero. In contrast, equation (4.3), which represents the digit $10^2$, has the zero weight coefficients $w_{12}, w_{11}, w_{10}, w_6, \ldots, w_0$, which multiply $x_{12}, x_{11}, x_{10}, x_6, \ldots, x_0$.
In the circuit realization, terms with zero coefficients are omitted, and the input variables are reordered so that the weights are arranged in increasing order.
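A simplified software model of such a cascade: zero terms are dropped, the remaining terms are sorted by increasing weight, each cell adds one term $w_i x_i$ to a running partial sum, and the last cell splits the sum into a decimal digit and a carry. We assume that the rails carry the partial sum as a plain binary number and estimate the memory of each cell as $2^{\text{inputs}} \times \text{outputs}$ bits; the paper's cascades may encode the rails more compactly, so the totals are illustrative only. We also assume that equation (4.5) is the $10^0$ digit, which is consistent with all its coefficients being non-zero. The names `build_ws_cascade` and `run_cascade` are hypothetical.

```python
def build_ws_cascade(weights, p=2, q=10):
    """Return (cells, memory_bits) for an LUT cascade realizing one digit of a WS function."""
    # Omit zero terms; order the rest by increasing weight.
    terms = sorted((w, i) for i, w in enumerate(weights) if w != 0)
    d = (p - 1).bit_length()  # external inputs per cell: one p-nary digit
    cells, memory, prev_max = [], 0, 0
    for k, (w, i) in enumerate(terms):
        new_max = prev_max + w * (p - 1)
        rails_in = prev_max.bit_length()  # binary-coded partial sum from the previous cell
        if k == len(terms) - 1:
            # Last cell: a q-nary digit plus the carry to the next digit.
            out_bits = (q - 1).bit_length() + (new_max // q).bit_length()
        else:
            out_bits = new_max.bit_length()
        memory += 2 ** (rails_in + d) * out_bits
        # Cell contents: (partial sum, x_i) -> partial sum + w * x_i.
        table = {(s, xi): s + w * xi for s in range(prev_max + 1) for xi in range(p)}
        cells.append((i, table))
        prev_max = new_max
    return cells, memory


def run_cascade(cells, x, q=10):
    """Evaluate the cascade on x (x[i] = x_i); returns (digit, carry)."""
    s = 0
    for i, table in cells:
        s = table[(s, x[i])]
    carry, digit = divmod(s, q)
    return digit, carry


# The 10^0 digit of a 16-bit binary to decimal converter:
# the weights are the last decimal digits of 2**i.
w0 = [(2 ** i) % 10 for i in range(16)]
cells, memory = build_ws_cascade(w0)
for value in range(2 ** 16):
    x = [(value >> i) & 1 for i in range(16)]
    total = sum(w * xi for w, xi in zip(w0, x))
    assert run_cascade(cells, x) == (total % 10, total // 10)
print(len(cells), memory)  # 16 6770
```

Sorting by increasing weight keeps the partial sums, and hence the rail counts, small in the early cells, where the LUTs are cheapest to widen.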
To find the minimum cascade, we consider all cases in which adjacent LUTs are either merged or not. Let s be the number of LUTs in the LUT cascade. Then, the number of different combinations is $2^{s-1}$. In this case, the input variable $x_i$ is incident to $LUT_i$, and each LUT has $d = \lceil \log_2 p \rceil$ two-valued external inputs. Figures 4.4(a)-(d) show all possible LUT cascade realizations for the 3-cell LUT cascade shown in (a).
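The $2^{s-1}$ combinations correspond to choosing, for each of the $s-1$ boundaries between adjacent cells, whether to merge across it. The sketch below enumerates them under a cost model we assume for illustration: a merged LUT keeps the rails entering its first cell and leaving its last cell, and takes all their external inputs. The rail counts in the example are hypothetical, not taken from the paper's figures. In this example, merging cells 1 and 2 removes a level and saves memory, while merging cells 2 and 3 increases it.

```python
from itertools import product


def merge_options(rails, d):
    """Enumerate all 2**(s-1) ways of merging adjacent cells of an s-cell LUT cascade.

    rails[k] is the number of signals leaving cell k (1-based), rails[0] = 0, and every
    original cell has d external inputs. Returns (groups, memory_bits) pairs, where a
    group (a, b) means cells a..b are merged into one LUT of
    2**(rails[a-1] + d*(b-a+1)) words of rails[b] bits.
    """
    s = len(rails) - 1
    options = []
    for merged in product((False, True), repeat=s - 1):
        groups, start = [], 1
        for k in range(1, s + 1):
            # merged[k-1] tells whether cell k is merged with cell k+1.
            if k == s or not merged[k - 1]:
                groups.append((start, k))
                start = k + 1
        memory = sum(2 ** (rails[a - 1] + d * (b - a + 1)) * rails[b] for a, b in groups)
        options.append((groups, memory))
    return options


# Hypothetical 3-cell cascade with d = 2 external inputs per cell.
for groups, memory in merge_options([0, 2, 3, 4], d=2):
    print(f"{len(groups)} level(s): {groups} -> {memory} bits")
# 3 level(s): [(1, 1), (2, 2), (3, 3)] -> 184 bits
# 2 level(s): [(1, 1), (2, 3)] -> 264 bits
# 2 level(s): [(1, 2), (3, 3)] -> 176 bits
# 1 level(s): [(1, 3)] -> 256 bits
```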
The following lemmas show methods to detect mergeable LUTs in an LUT cascade. Lemma 4.2 shows a method to reduce the number of levels by one without increasing the total amount of memory. In other cases, merging adjacent LUTs increases the total amount of memory.
Conclusion
In this paper, we presented design methods of p-nary to q-nary converters. For readability, we showed the concept by using examples with p = 2 and q = 10. However, by replacing $2^i$ by $p^i$ in the top row and $10^j$ by $q^j$ in the leftmost column of Table 4.1, the method can easily be modified for any radix converter. In this case, a p-nary to q-nary converter is implemented by using LUT cascades, binary adders, and q-nary adders.
