Abstract-Constant weight codewords, in which the number of 1's is constant, are essential to combinatorial computing. For example, it is often useful to generate all subsets of a set with a fixed number of elements. In this paper, we show an efficient circuit that converts a constant weight codeword into a unique index of that codeword. This circuit is a necessary part of a circuit that uses constant weight codewords to transmit data on and off chip. Our circuit is based on the combinatorial number system, in which the digits are binomial coefficients $\binom{n}{r}$. Experimental results show the efficiency of our design.
I. INTRODUCTION
A constant weight code to index converter is needed when constant weight Gray codes are used to encode data in flash memory. In local rank modulation [2], data stored in flash memory is viewed as an n-bit constant weight codeword that differs in exactly two bits from an adjacent memory location (because of overlap). All codewords in this encoding have the same weight (number of 1's).
Balanced codes, with as many 0's as 1's, can be used to transfer data on and off VLSI chips so that the current fluctuations are minimized [8]. On the other hand, codes with small weight are desired in this application because they yield faster and more compact circuits [8]. Constant weight codewords can be used to counter "side-channel" attacks against secure systems [4]. Such attacks use data dependent differences in power consumption to extract hidden information. Constant weight codewords have been used in asynchronous logic to implement delay-insensitive codewords [9].
The use of constant weight codewords requires two parts, an index to constant weight code converter and a constant weight code to index converter. We considered the first part in [1]. However, we have not seen a hardware implementation of the second part, except for an implementation that requires $O(2^n)$ complexity [7]. In this paper, we propose an implementation with $O(n^3)$ complexity. In Section II, we discuss the combinatorial number system. We show how it can be used to convert a constant weight codeword to an index, and we present its circuit implementation. Then, in Section III, we show an improvement to this circuit that significantly reduces delay for large n. Finally, in Section IV, we give concluding remarks.
T. Sasao

Department of Computer Science and Electronics
Kyushu Institute of Technology Iizuka, Fukuoka, JAPAN
II. THE COMBINATORIAL NUMBER SYSTEM
A. Introduction
The basis for our constant weight code to index converter is the combinatorial number system [3].
Example 1. Table I shows the representation of integers in the $\binom{6}{3}$ combinatorial number system. The leftmost column shows the integer's value in decimal and its vector representation. The middle column shows how this value is computed according to (1). The rightmost column of Table I shows the corresponding 6-bit constant weight code. Note that the three elements of the vector representation shown in the leftmost column correspond to the positions of the 1's in the constant weight codeword. For example, 19 = (5, 4, 3) corresponds to 111000, there being 1's in positions 5, 4, and 3.
(End of Example)
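The mapping in Example 1 can be sketched in software. The following Python fragment is only an illustrative model, not the circuit itself; the function name `codeword_to_index` and the string representation of the codeword (leftmost character is the highest bit position) are assumptions made for this sketch. It sums one binomial coefficient per 1 bit, with the top argument given by the bit position and the bottom argument by the digit number.

```python
from math import comb

def codeword_to_index(codeword: str) -> int:
    """Map a constant weight codeword (leftmost bit first) to its index."""
    n = len(codeword)
    # Positions of the 1's, numbered n-1 (leftmost) down to 0 (rightmost).
    positions = [n - 1 - k for k, bit in enumerate(codeword) if bit == "1"]
    r = len(positions)
    # The j-th 1 from the left (j = 0, 1, ...) supplies the digit C(position, r - j).
    return sum(comb(p, r - j) for j, p in enumerate(positions))

print(codeword_to_index("111000"))  # C(5,3) + C(4,2) + C(3,1) = 10 + 6 + 3 = 19
```

Because `math.comb(p, k)` returns 0 when `k > p`, a codeword such as 000111 maps to index 0 without special handling.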
B. Circuit Implementation
A major contribution of this paper is to show how the combinatorial number system can be used to realize an efficient circuit that transforms a constant weight codeword to the index for that codeword. Such a circuit has for inputs the values of the rightmost column of Table I (the bits of the constant weight code) and has as outputs the standard binary number representation of the numbers shown in the leftmost column (the values of the index N). As shown in Table I, the 1-bits in the constant weight code contribute a value to its corresponding index depending on the 1 bit's position in the codeword. For example, from Table I, the 1's in the codeword 111000 contribute $\binom{5}{3}$, $\binom{4}{2}$, and $\binom{3}{1}$, from left to right. This can be seen in Fig. 1, which shows a circuit that converts a 3-out-of-6 constant weight codeword into the corresponding index of that codeword.
This circuit contains an array of decoders that control which digits occur in the combinatorial number. Fig. 2 shows the detail of the decoders and the tri-state circuit that provides constants for the combinatorial number. We can make the following observations.
... horizontal outputs. This is because their inputs, $x_5$, $x_4$, and $x_3$, are 1. However, the decoder in the upper right hand corner is driven by $x_2$, which is 0. So, the 1 at its OR gate input is now directed to its vertical output (while its horizontal output is 0). Because $x_1$ and $x_0$ are both 0, this 1 is directed downward (along dotted lines) through two decoders into the 2-input OR gate that drives Valid. That is, when $x_5 x_4 x_3 x_2 x_1 x_0 = 111000$, Valid is 1, indicating the input codeword is a valid 3-out-of-6 codeword.
3) All other valid codewords result in a path of 1's from the upper left hand corner to the lower right hand corner, causing Valid to be 1. Conversely, a non-codeword causes Valid to be 0.
4) All horizontal lines from decoders drive binomial coefficient generators, each of which applies its value to one of three bus lines that drive inputs of an adder whose output is the Index. Specifically, a 1 on the horizontal line causes the corresponding binomial coefficient generator to drive its line. A 0 disconnects the binomial coefficient generator. For example, in the case of $x_5 x_4 x_3 x_2 x_1 x_0 = 111000$, the three horizontal lines driven by decoders cause $\binom{5}{3}$, $\binom{4}{2}$, and $\binom{3}{1}$ to be applied to the three adder inputs, resulting in 19 at the output, which is the index of 111000.
5) As shown in Fig. 1, the adder has three tri-state inputs, and each tri-state adder input is driven by four tri-state outputs from binomial coefficient generators. The first (leftmost) 1 in the constant weight codeword specifies which binomial coefficient generator drives the left input of the adder. The second 1 determines which drives the middle adder input, and the third (rightmost) 1 determines which drives the right adder input.
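The observations above can be mirrored by a small behavioral model. This is a hedged sketch in Python rather than a description of the actual gate-level design: the function name `convert`, the tuple it returns, and the use of two counters to stand for the active decoder's column and row are all assumptions introduced here. A 1 input takes the horizontal output and enables a binomial coefficient generator; a 0 input takes the vertical output; Valid is 1 only if the path stays inside the array and reaches the far corner.

```python
from math import comb

def convert(codeword: str, r: int):
    """Return (valid, index) for an n-bit input, scanning the leftmost bit first."""
    n = len(codeword)
    ones = zeros = 0   # column and row of the decoder currently holding the 1
    index = 0
    for k, bit in enumerate(codeword):
        pos = n - 1 - k
        if bit == "1":
            ones += 1                      # horizontal move
            if ones > r:
                return False, None         # path leaves the array: not a codeword
            # The enabled generator supplies the digit for this 1.
            index += comb(pos, r - ones + 1)
        else:
            zeros += 1                     # vertical move
            if zeros > n - r:
                return False, None
    # All n inputs consumed inside the array, so exactly r ones were seen.
    return True, index

print(convert("111000", 3))  # (True, 19)
print(convert("111100", 3))  # (False, None)
```

Returning `None` for the index of a non-codeword is a choice made for this sketch only; the text does not state what the Index output carries when Valid is 0.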
C. Complexity of Implementation
The complexity of the constant weight code to index converter is dominated by the array of binomial coefficient generators and decoders. This array is an $(r+1)$ by $(n-r+1)$ rectangle, with a total of $(r+1)(n-r+1)$ elements. With $r = n/2$, the (worst case) number of binomial coefficient generators and decoders is $O(n^2)$ for each. The decoder has a complexity that is independent of $n$. However, the binomial coefficient generator requires $O(n)$ tri-state buffers. That is, the binomial coefficient with the most tri-state buffers is the one in the upper left hand corner; it realizes $\binom{n-1}{r}$. This requires no more than $O(n)$ tri-state buffers. Thus, the total complexity is $O(n^3)$ tri-state buffers. And so, the constant weight codeword to index converter has complexity polynomial in $n$.
Table II shows the exact number of tri-state buffers and decoders needed in the proposed constant weight codeword to index converter. In the case of the tri-state buffers, the array cell at the top of each column corresponds to the largest binomial coefficient in that column and thus determines the number of bits needed for that adder input. These binomial coefficients are $\binom{n-r+i-1}{i}$ for $r \ge i \ge 1$, where $i$ represents the $i$-th column ($i = r$ is on the left and $i = 1$ is on the right). The number of bits needed is $\lceil \log_2 \binom{n-r+i-1}{i} \rceil$. Since there are $r + 1$ rows, we have the following.
Theorem 1: The total number of tri-state buffers needed by the constant weight code to index converter is $(r+1) \sum_{i=1}^{r} \lceil \log_2 \binom{n-r+i-1}{i} \rceil$.
The longest path in the circuit is from $x_{n-1}$ through the array to the Valid output, and it is $O(n)$. The delay of the adder can be neglected, since it is $O(\log n)$. Thus, the overall delay is $O(n)$.
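The buffer count in Theorem 1 is easy to evaluate numerically. The Python sketch below simply evaluates the stated expression; the function names are hypothetical, and the ceiling of the base-2 logarithm is computed with exact integer arithmetic (via `int.bit_length`) so that it stays correct for the very large binomial coefficients that arise when n = 128.

```python
from math import comb

def ceil_log2(x: int) -> int:
    """Exact ceil(log2(x)) for an integer x >= 1, without floating point."""
    return (x - 1).bit_length()

def tristate_buffers(n: int, r: int) -> int:
    """Evaluate (r + 1) * sum_{i=1..r} ceil(log2 C(n-r+i-1, i)) from Theorem 1."""
    return (r + 1) * sum(ceil_log2(comb(n - r + i - 1, i)) for i in range(1, r + 1))

# 3-out-of-6: column widths are ceil(log2 3), ceil(log2 6), ceil(log2 10) = 2, 3, 4.
print(tristate_buffers(6, 3))  # 4 * (2 + 3 + 4) = 36
```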
D. FPGA Resources Used
To understand how the complexity of a $\binom{n}{r}$ combinatorial number system constant weight code to index converter depends on n and r, we implemented this system for various n and r on the 40 nm Altera Stratix IV EP4SE530F43C3NES FPGA. Table III shows the delay obtained and the resources used in this implementation. The leftmost column shows the constant weight code as a binomial number. For example, $\binom{128}{64}$ corresponds to a 64-out-of-128 bit code. The second column shows how many bits in the output Index are needed to represent the largest codeword. The third column gives the delay achieved, which is inversely proportional to the frequency of the circuit. The rightmost column gives the number of ALMs needed to realize this circuit, which is a measure of the area. Although this table shows only balanced constant weight code generators where the number of bits is a power of 2, our approach applies to any number of bits and to any weight.
Our circuit was synthesized using Synplify Pro and modeled using ModelSim. A large codeword is achievable; a 64-out-of-128 bit converter uses only 9% of the available ALMs. The large values of n required special Verilog programming. For example, implementing the 64-out-of-128 bit constant weight codeword to index converter requires that the binary value of $\binom{127}{64}$ be applied to the adder circuit. The binary number that represents this value requires 124 bits, which exceeds the 32 bits used by Synplify Pro to represent integers. To overcome this deficiency, we computed the binary value of $\binom{127}{64}$ and other values of $\binom{n}{r}$ in a MATLAB program and wrote them to a header file that was included in the Verilog code.
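The idea of precomputing wide binomial constants outside the HDL can be illustrated as follows. This sketch uses Python in place of the MATLAB program mentioned above, and both the constant naming scheme (`BINOM_127_64`) and the `localparam` line format are assumptions made for illustration; the actual header file layout is not given in the text.

```python
from math import comb

def verilog_constant(n: int, k: int) -> str:
    """Format C(n, k) as one sized hexadecimal Verilog constant declaration."""
    value = comb(n, k)                      # exact arbitrary-precision integer
    width = max(value.bit_length(), 1)      # bits needed to hold the value
    return f"localparam [{width - 1}:0] BINOM_{n}_{k} = {width}'h{value:x};"

line = verilog_constant(127, 64)
print(comb(127, 64).bit_length())  # 124
print(line)
```

The printed bit length agrees with the 124 bits quoted for the largest constant in the 64-out-of-128 converter.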
III. COMPLEX DISJOINT DECOMPOSITION SOLUTION
It can be seen from Fig. 1 that the longest path through the array of the constant weight codeword converter has length n, where n is the number of bits in the constant weight code. In computing the index, each 1 contributes a value that depends on the number of 1's that preceded it. A 1 in the leftmost bit position is an exception to this. This 1 always contributes $\binom{5}{3} = 10$. This can be seen in Table I; the constant weight codewords with a 1 in the leftmost bit correspond to combinatorial numbers in which the most significant digit is $\binom{5}{3}$. However, a 1 in the second bit from the left contributes a different value to its combinatorial number representation depending on whether the leftmost bit is 1 or 0. If 1, then the second 1 contributes $\binom{4}{2}$. If 0, then it contributes $\binom{4}{3}$.
A similar phenomenon exists at the right side. Interestingly, the least significant bit, whether 0 or 1, contributes 0 to the combinatorial number's value. This is because that bit is "forced" to be 0 or 1 depending on whether there are three or two 1 bits to its left. However, note that if the right bit of the constant weight code is 1, then the right digit of the combinatorial number system is 0. This can also be seen in the circuit of Fig. 1. Here, if $x_0$ is 1, then 0 drives the least significant digit if the other five bits have three 0's and two 1's. Thus, none of the other three binomial coefficients can drive this least significant digit. That is, $x_0$ and the only decoder it drives are the "mirror" image of $x_5$ and the only decoder it drives. Similarly, $x_1$ and the two decoders it drives are the mirror image of $x_4$ and the two decoders it drives. Therefore, we can realize the same circuit by reversing the decoders in ...
In the new circuit, the inputs are divided into two parts, $\{x_5, x_4, x_3\}$ and $\{x_2, x_1, x_0\}$, where each part drives a separate triangular-shaped subcircuit. The two subcircuits, in turn, drive inputs to the adder, which, in turn, drives the Index output. Such a circuit is said to have a complex disjoint decomposition (CDD).
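The mirror-image argument can be checked with a behavioral sketch in which the two halves of the input are processed independently. This is an illustrative Python model under stated assumptions, not the CDD circuit itself: the function name `cdd_index` is hypothetical, the input is assumed to be a valid codeword of weight r, and the split point is taken to be the middle of the word. The left half numbers its digits by counting 1's from the left, and the right half by counting 1's from the right, so neither half needs information from the other.

```python
from math import comb

def cdd_index(codeword: str, r: int) -> int:
    """Index of a valid weight-r codeword, computed from two independent halves."""
    n = len(codeword)
    half = n // 2
    bits = [int(b) for b in codeword]       # bits[0] is the leftmost input
    # Left part: the digit number follows from the count of 1's seen from the left.
    left, seen = 0, 0
    for k in range(half):
        if bits[k]:
            seen += 1
            left += comb(n - 1 - k, r - seen + 1)
    # Right part: the digit number follows from the count of 1's seen from the right.
    right, seen = 0, 0
    for pos in range(n - half):
        if bits[n - 1 - pos]:
            seen += 1
            right += comb(pos, seen)
    return left + right                     # the adder combines both subcircuits

print(cdd_index("111000", 3))  # 19
print(cdd_index("100011", 3))  # 10
```

For a valid codeword, the j-th 1 from the right is also the (r - j + 1)-th 1 from the left, which is why the two counting directions assign the same digit number to every 1.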
Table IV shows the delay achieved and the resources used for the CDD circuit. The benefit of the new circuit is its reduced delay, especially in large circuits. This can be seen by comparing Tables IV and III. For example, for 64-out-of-128 codes, the delay of the circuit consisting of two subcircuits is 64% that of the full rectangle circuit.
IV. CONCLUDING REMARKS
Although there is a need for a circuit that computes an index from a constant weight codeword, we have not seen a simple implementation. We show a circuit based on the combinatorial number system that has complexity $O(n^3)$, where n is the number of bits in the code. Our circuit is useful, for example, in the encoding/decoding of data, such as between on-chip and off-chip, and in delay-insensitive logic for asynchronous circuits. It has only $O(n)$ delay. We also show an improvement that reduces the delay by about half while retaining $O(n^3)$ complexity. We have implemented our designs on an Altera Stratix IV EP4SE530F43C3NES FPGA. This has shown that both circuits are efficiently implemented. For a comparison of various circuits that realize a constant-weight code to index converter, please see [6].
V. ACKNOWLEDGMENTS
