Abstract
Introduction
Consider a set of $k$ distinct binary vectors of $n$ bits. An address generation function produces a unique address from 1 to $k$ for each input that matches a vector in the set, and produces 0 for all other inputs. Address generation functions are used in IP address filtering on the Internet, pattern matching, memory patching circuits, and so on. Address generators often need to be reconfigured dynamically, and the functions are often random. Thus, conventional design methods are unsuitable for address generators.
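As a point of reference, the behavior of an address generation function can be sketched in software with a dictionary. This is only a behavioral model, not a hardware design; the function names are hypothetical, and the seven 4-bit vectors are the ones listed in the example table of this paper.

```python
from typing import Dict, Iterable


def make_address_generator(vectors: Iterable[str]) -> Dict[str, int]:
    """Assign addresses 1..k to the registered vectors in listing order."""
    table: Dict[str, int] = {}
    for address, vec in enumerate(vectors, start=1):
        if vec in table:
            raise ValueError(f"duplicate vector {vec}")
        table[vec] = address
    return table


def address_of(table: Dict[str, int], vec: str) -> int:
    """Return the address of a registered vector, or 0 if it is not registered."""
    return table.get(vec, 0)


# Registered vectors taken from the paper's 4-bit example (k = 7).
REGISTERED = ["0010", "0111", "1101", "0101", "0011", "1011", "0001"]
TABLE = make_address_generator(REGISTERED)

if __name__ == "__main__":
    print(address_of(TABLE, "1101"))  # a registered vector
    print(address_of(TABLE, "0000"))  # a non-registered vector
```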
In this paper, we assume that the number of vectors $k$ in the set is much smaller than the number of possible input combinations, $2^n$. For example, consider an address generation function with $n = 32$ and $k = 40{,}000$. The straightforward way to implement this address generation function is to store the truth table in a memory. However, this method requires an unrealistically large memory, since the size of the memory is proportional to $2^n$. Another method is to implement the function as a programmable logic array (PLA). Unfortunately, this method often requires an excessive number of logic elements when it is implemented on an FPGA.
In this paper, we present the super hybrid method, an efficient method to implement an address generation function using hash memories and a reconfigurable PLA.
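To make the size of the truth-table approach concrete, the following sketch computes the memory it would need. It assumes that each memory word stores a $q$-bit address with $q = \lceil \log_2(k+1) \rceil$, which matches the index width used later in the paper; the function names are hypothetical.

```python
def index_bits(k: int) -> int:
    """q = ceil(log2(k + 1)), computed exactly with integer arithmetic."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k.bit_length()


def truth_table_bits(n: int, k: int) -> int:
    """Bits needed to store the full truth table: 2^n words of q bits each."""
    return index_bits(k) * 2**n


if __name__ == "__main__":
    bits = truth_table_bits(32, 40_000)
    print(f"{bits} bits = {bits / 8 / 2**30} GiB")
```

For $n = 32$ and $k = 40{,}000$ this gives $q = 16$ and $2^{36}$ bits, that is, 8 GiB, which is far beyond the embedded memory of an FPGA.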
This method is particularly suitable for FPGAs where both logic elements and embedded memories are available. In this method, hash memories implement about 96% of the vectors, while the reconfigurable PLA implements the remaining 4% of the vectors. Theoretical analysis supports the experimental results.
Besides address generation functions, this design method can implement an $n$-variable function where the number of non-zero outputs $k$ is much smaller than $2^n$.

Address  Vector
1        0010
2        0111
3        1101
4        0101
5        0011
6        1011
7        0001

memory (CAM) [2, 6]. Various methods exist to implement reconfigurable PLAs or CAMs. The register and gates approach uses a register to store the value of each bit. Fig. 3.1 shows a match circuit. A PLA or a CAM can be implemented by adding an encoder consisting of OR gates. With this approach, words of any width can be configured, and fast reconfiguration is possible. Suppose that the reconfigurable PLA is implemented on an Altera Cyclone II FPGA. When the output part is fixed, to implement an $n$-input, $q$-output, $k$-vector PLA, we need
This formula was obtained by designing many reconfigurable PLAs of various sizes on a Cyclone II FPGA. Note that the first term is related to the registers, the second term is related to the comparators and AND gates, and the last term is related to the encoder. This implies that we need approximately $\frac{7}{6}nk$ LEs (see footnote 2). For example, when $n = 40$ and $k = 1730$, we have $q = \lceil \log_2(1730 + 1) \rceil = 11$. Thus, the number of LEs is

Footnote 1: In Altera devices [1], the LE (logic element) denotes the basic building block consisting of a 4-input look-up table (LUT), a register, and additional carry and cascade logic. In Xilinx devices, it corresponds to the LB (logic block), which consists of a 4-input LUT, a register, and carry logic.
Footnote 2: If we use the LUTs of a Xilinx FPGA, we can implement the address generators more efficiently. This requires the SRL16E macro [15].
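The approximation of about $\frac{7}{6}nk$ LEs can be evaluated directly. The sketch below uses only this rough estimate, not the exact three-term formula, and the function name is hypothetical.

```python
from fractions import Fraction


def approx_logic_elements(n: int, k: int) -> Fraction:
    """Rough LE estimate (7/6) * n * k for an n-input, k-vector reconfigurable PLA."""
    return Fraction(7, 6) * n * k


if __name__ == "__main__":
    estimate = approx_logic_elements(40, 1730)
    print(f"about {float(estimate):.0f} LEs")
```

Because the estimate is linear in $k$, a PLA that only has to hold a small fraction of the registered vectors needs roughly the same fraction of the LEs.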
Hash-Based Design
From here on, we study a method to implement an address generation function using memories.
Before explaining the super hybrid method, we introduce the hybrid method, a simpler version of the super hybrid method.
Hybrid Method
In the address generation function, the number of registered vectors $k$ is much smaller than $2^n$, the total number of input combinations. Consider a set of linear hash functions that map $2^n$ elements into $2^p$ elements, where $2^p \geq k + 1$. By using linear hash functions $y_i = x_i \oplus g_i(X_2)$ $(i = 1, 2, \ldots, p)$, we can reduce the $2^n$-element space into a $2^p$-element space. With this, we can implement the address generation function by using a $p$-input memory instead of an $n$-input memory.
Unfortunately, collisions of data occur. That is, two or more registered vectors are mapped into the same element. In such cases, we implement only one registered vector by the hash memory, and the other registered vectors are implemented by another circuit.
Let $f(X_1, X_2)$ be the given address generation function. We can decompose it into two parts: $\hat{f}_1(Y_1, X_2)$, which is implemented by the hash memory, the AUX memory, and the comparator, and $\hat{f}_2(Y_1, X_2)$, which is implemented by the reconfigurable PLA. In the hybrid method, we implement about 90% of the registered vectors by the hash memory. Since the $2^n$-element space is reduced into the $2^p$-element space by a set of linear functions, each output combination of the hash function, that is, each address of the hash memory, corresponds to $2^{n-p}$ input combinations:
1. When all the $2^{n-p}$ input combinations are non-registered, the hash memory stores zero for that input.
2. When only one combination is registered and the other $2^{n-p} - 1$ combinations are non-registered, the hash memory stores the index of the registered vector.
3. If two or more input combinations are registered, the hash memory stores the index of only one registered vector.
Thus, when the output of the hash memory is non-zero, the input vector can be registered or non-registered. To decide whether it is registered or not, we use the AUX memory. The AUX memory has $q$ inputs and $(n - p)$ outputs. It stores the value of $X_2$ for each registered vector. If the input $X_2$ is equal to the output of the AUX memory, then the hash memory produces the correct output. Otherwise, the output of the hash memory is wrong, so 0 is sent to the output. In this way, $\hat{f}_1(Y_1, X_2)$ is implemented by the hash memory, the AUX memory, and the comparator.
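The data path described above can be sketched as a small software model. This is a behavioral sketch under stated assumptions, not the authors' implementation: $X_1$ is taken to be the low $p$ bits of the input, $X_2$ the remaining bits, and $g_i(X_2)$ is chosen (hypothetically) as the XOR of successive $p$-bit slices of $X_2$. All names are illustrative.

```python
import random
from typing import Callable, List, Tuple


def build_hybrid(vectors: List[int], n: int, p: int) -> Tuple[Callable[[int], int], int]:
    """Build a lookup function; vectors[i] is registered with index i + 1."""
    if len(set(vectors)) != len(vectors) or any(not 0 <= v < 2**n for v in vectors):
        raise ValueError("vectors must be distinct n-bit values")
    mask = (1 << p) - 1

    def hash_fn(x: int) -> Tuple[int, int]:
        x1, x2 = x & mask, x >> p      # X1: low p bits, X2: remaining n - p bits
        y, t = x1, x2
        while t:                       # y_i = x_i XOR (XOR of selected X2 bits)
            y ^= t & mask
            t >>= p
        return y, x2

    hash_mem = [0] * (1 << p)          # p inputs, stores an index (0 = empty)
    aux_mem = [0] * (len(vectors) + 1) # indexed by index, stores X2
    pla = {}                           # vectors left over after collisions
    for index, v in enumerate(vectors, start=1):
        y, x2 = hash_fn(v)
        if hash_mem[y] == 0:           # first (minimum) index wins the column
            hash_mem[y] = index
            aux_mem[index] = x2
        else:
            pla[v] = index

    def lookup(x: int) -> int:
        y, x2 = hash_fn(x)
        index = hash_mem[y]
        if index != 0 and aux_mem[index] == x2:  # comparator confirms the hit
            return index
        return pla.get(x, 0)                     # otherwise fall back to the PLA

    return lookup, len(pla)


if __name__ == "__main__":
    rng = random.Random(0)
    vecs = rng.sample(range(1 << 20), 200)
    lookup, leftover = build_hybrid(vecs, n=20, p=10)
    print(lookup(vecs[4]), leftover)
```

Because the hash is linear, a given pair $(Y_1, X_2)$ determines $X_1$ uniquely, so comparing only $X_2$ against the AUX memory is enough to confirm a match.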
A Method to Generate A Hash Function
A hash function is used to scatter the non-zero elements of the address generation function uniformly. In this paper, we use the following function $Y_1 = (y_1, y_2, \ldots, y_p)$, where
Design of Address Generator
For an address generation function $f(X_1, X_2)$ with weight $k$, let $\hat{f}(Y_1, X_2)$ be the function that is obtained by replacing
$B^p$, where $B = \{0, 1\}$, when $\hat{f}(a, X_2)$ has more than one non-zero output, replace the non-zero elements except for the minimum value by 0, to obtain the function $\hat{f}_1(Y_1, X_2)$. Next, let $\hat{f}_2(Y_1, X_2)$ be the function that shows the remaining non-zero elements. Since
takes a non-zero value for at most one non-zero element for each $Y_1$. Next, let
and realize $\hat{h}(Y_1)$ by the hash memory, where $n_2$ denotes the number of variables in $X_2$. Since the value of $\hat{h}(Y_1)$ can be different from the value of $\hat{f}_1(Y_1, X_2)$, we check whether it is correct by using the AUX memory. Also, by trans-
Table 4.1 is a decomposition chart of a 6-variable function $f(X_1, X_2)$ with weight $k = 7$. In this function, transform the variables
is the input that produces the non-zero output. The non-zero output is 4, and its binary representation is (1, 0, 0). This is implemented by ORing the most significant bit of the AND gates. (End of Example)

Numbers of Registered Vectors Realized by Hash Memory
In this part, we assume that the non-zero elements in the address generation function are uniformly distributed in the decomposition chart. In this case, we can estimate the fraction of registered vectors realized by the hash memory.
Theorem 5.1 Let $f$ be an $n$-variable address generation function with weight $k$, and let the non-zero elements be uniformly distributed in the decomposition chart. Then, the fraction of registered vectors realized by the hash memory shown in Fig. 4.1 is given by
$$\delta \simeq 1 - \frac{1}{2}\left(\frac{k}{2^p}\right) + \frac{1}{6}\left(\frac{k}{2^p}\right)^2$$
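The estimate of Theorem 5.1 can be checked numerically. The sketch below evaluates the series and compares it with a small Monte Carlo experiment that, as a modeling assumption, throws $k$ registered vectors into $2^p$ columns independently and uniformly; each non-empty column contributes one vector to the hash memory. The function names and the number of trials are hypothetical.

```python
import random


def delta_series(k: int, p: int) -> float:
    """Fraction realized by the hash memory: 1 - a/2 + a^2/6 with a = k / 2^p."""
    a = k / 2**p
    return 1 - a / 2 + a * a / 6


def simulate_fraction(k: int, p: int, trials: int = 20, seed: int = 0) -> float:
    """Average fraction of vectors that land first in their column."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        occupied = {rng.randrange(1 << p) for _ in range(k)}
        total += len(occupied) / k     # one vector per non-empty column
    return total / trials


if __name__ == "__main__":
    k, p = 1730, 13
    print(f"series:     {delta_series(k, p):.4f}")
    print(f"simulation: {simulate_fraction(k, p):.4f}")
```

For $k = 1730$ and $p = 13$ the series gives about 0.90, in line with the roughly 90% stated for the hybrid method.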
Super Hybrid Method
Principle
In the hybrid method, about 90% of the registered vectors are implemented by the hash memory and the remaining 10% of the registered vectors are implemented by the PLA. When we use two hash memories, we can implement about 96% of the registered vectors, and the remaining 4% of the registered vectors are implemented by the PLA. Such an implementation is called the super hybrid method. The super hybrid method shown in Fig. 6.1 is more complicated than the hybrid method, but requires smaller memories.
Hybrid Method
The hash memory has $(q + 2)$ inputs and $q$ outputs. The AUX memory has $q$ inputs and $(n - q - 2)$ outputs. Therefore, the total amount of memory is $q \cdot 2^{q+2} + (n - q - 2) \cdot 2^{q}$ bits.
Super Hybrid Method
The first hash memory has $(q + 1)$ inputs and $q$ outputs. The first AUX memory has $q$ inputs and $(n - q - 1)$ outputs. The second hash memory has $(q - 1)$ inputs and $(q - 2)$ outputs. The second AUX memory has $(q - 2)$ inputs and $(n - q + 2)$ outputs.
Therefore, the total amount of memory is $q \cdot 2^{q+1} + (n - q - 1) \cdot 2^{q} + (q - 2) \cdot 2^{q-1} + (n - q + 2) \cdot 2^{q-2}$ bits. This implies that when $n \leq 7\lceil \log_2(k + 1) \rceil - 2$, the super hybrid method requires a smaller amount of memory.
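The two memory totals can be compared with a few lines of arithmetic. The sketch below assumes that a memory with $a$ inputs and $b$ outputs takes $b \cdot 2^a$ bits and uses the memory dimensions stated in the text; the function names are hypothetical.

```python
def hybrid_bits(n: int, q: int) -> int:
    """Hash memory (q+2 in, q out) plus AUX memory (q in, n-q-2 out)."""
    return q * 2 ** (q + 2) + (n - q - 2) * 2**q


def super_hybrid_bits(n: int, q: int) -> int:
    """Two hash memories and two AUX memories with the stated dimensions."""
    first_hash = q * 2 ** (q + 1)
    first_aux = (n - q - 1) * 2**q
    second_hash = (q - 2) * 2 ** (q - 1)
    second_aux = (n - q + 2) * 2 ** (q - 2)
    return first_hash + first_aux + second_hash + second_aux


if __name__ == "__main__":
    n, q = 40, 11  # n = 40 and k = 1730 give q = 11
    print(hybrid_bits(n, q), super_hybrid_bits(n, q))
```

For $n = 40$ and $q = 11$, the totals are 145,408 bits for the hybrid method and 127,488 bits for the super hybrid method. Factoring out $2^{q-2}$ gives $(12q + 4n - 8)$ against $(5q + 5n - 6)$, so the two totals are equal at $n = 7q - 2$ and the super hybrid method is strictly smaller below that value.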
Example
Example 6.1 Consider the case of $n = 40$ and $k = 1730$. In this case, $q = \lceil \log_2(k + 1) \rceil = \lceil \log_2(1730 + 1) \rceil = 11$.
A problem in the super hybrid method is that the second hash memory has only $q - 2$ outputs. Thus, the indices of the registered vectors in the second hash memory must be smaller than or equal to $2^{q-2} - 1$. The first hash memory stores registered vectors whose indices are greater than $2^{q-2}$.
Experimental Results
List of English Words
To demonstrate the usefulness of the design method, we first realized lists of frequently used English words. Here, we use three English word lists: List 1, List 2, and List 3. The words in the lists have at most 13 letters, but we consider only the first 8 letters. For English words consisting of fewer than 8 letters, we append blanks to the end of the words to make them 8-letter words. Each letter of the English alphabet is represented by 5 bits. Thus, each English word is represented by 40 bits. The numbers of words in the lists are 1730, 3366, and 4705, respectively. In each word list, each English word has a unique index, an integer from 1 to $k$, where $k$ is the number of words in the list. The numbers of bits for the indices are 11, 12, and 13, respectively.
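The 40-bit word representation can be sketched as follows. The text fixes only the overall scheme (8 letters, 5 bits each, blank padding); the concrete code assignment used here (blank = 0, a = 1, ..., z = 26, first letter in the most significant position) is an assumption made for illustration, and the function name is hypothetical.

```python
def encode_word(word: str, letters: int = 8, bits: int = 5) -> int:
    """Encode the first `letters` characters of a word as a letters*bits-bit integer."""
    padded = word.lower()[:letters].ljust(letters)  # truncate, then pad with blanks
    value = 0
    for ch in padded:
        code = 0 if ch == " " else ord(ch) - ord("a") + 1  # assumed code assignment
        if not 0 <= code <= 26:
            raise ValueError(f"unsupported character {ch!r}")
        value = (value << bits) | code
    return value


if __name__ == "__main__":
    print(f"{encode_word('hash'):040b}")
```

Words that share the same first 8 letters map to the same 40-bit vector under this scheme, so such words cannot be told apart.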
The number of inputs of the hash memory is $\lceil \log_2(k + 1) \rceil + 2$. List 1 consists of $k = 1730$ words. The number of bits for the index is $q = \lceil \log_2(1 + k) \rceil = \lceil \log_2(1 + 1730) \rceil = 11$. The number of bound variables is $p = q + 2 = 13$. The number of columns in the decomposition chart is $2^p = 2^{13} = 8192$. The number of columns that have only one non-zero element is 1389. The number of columns that have two or more non-zero elements is 165. The number of registered vectors that are not realized by the hash memory is 176. In other words, about 90% of the registered vectors are realized by the hash memory, and the remaining 10% of the registered vectors are realized by the reconfigurable PLA. Table 7.1 shows the design results for the three English word lists by the hybrid method. Table 7.2 compares the amount of hardware for the reconfigurable PLA, the hybrid method, and the super hybrid method. It shows that the super hybrid method efficiently uses both the LEs and the M4Ks of the FPGA. In the super hybrid method, the number of vectors realized by the reconfigurable PLA is smaller than 4% of the registered vectors. This is because we optimized the hash functions.
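The figures reported for List 1 are mutually consistent, as the following arithmetic sketch shows. It relies only on the rule that each non-empty column of the decomposition chart contributes exactly one registered vector to the hash memory; the function name is hypothetical.

```python
from typing import Tuple


def hash_memory_split(k: int, single_cols: int, multi_cols: int) -> Tuple[int, int, float]:
    """Return (vectors in hash memory, vectors left for the PLA, fraction in hash memory)."""
    in_hash = single_cols + multi_cols  # one vector kept per non-empty column
    leftover = k - in_hash
    return in_hash, leftover, in_hash / k


if __name__ == "__main__":
    in_hash, leftover, fraction = hash_memory_split(1730, 1389, 165)
    print(in_hash, leftover, f"{fraction:.1%}")
```

With 1389 single-element columns and 165 multi-element columns, 1554 of the 1730 words are realized by the hash memory and 176 remain for the reconfigurable PLA, which is about 89.8%.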
