This paper presents a method to realize index generation functions using multiple Index Generation Units (IGUs). When the number of registered vectors is very large, this architecture implements index generation functions more efficiently than a single IGU. This paper proves that independent linear transformations are necessary in the IGUs for an efficient realization, and experimental results confirm this statement. Finally, it shows a fast update method for IGUs.
Introduction
One of the important tasks in information processing is to find desired data in a large data set. For example, consider a network router, where IP addresses are represented by 32 bits. Assume that a network router stores 40,000 of the $2^{32}$ possible input combinations, and checks whether an input pattern matches a stored pattern. A content addressable memory (CAM) [4] is a device that performs this operation directly. CAMs are also used for virus scanning and spam-mail filters.
An index generation function [10] describes the operation of a CAM. For example, an index generation function can be represented by a registered vector table such as the one shown in Table 1. It can also be implemented by an FPGA [7], or by a combination of memories and logic. Index generation functions are used in address tables in the Internet, terminal access controllers for local area networks, databases, memory patch circuits, dictionaries, password lists, etc. [10].
An efficient method to implement an index generation function is presented in [10]. It uses a module called IGU (Index Generation Unit). Since an IGU uses ordinary memory and a small amount of logic, the cost and the power dissipation are much lower than those of typical CAM-based implementations.
In this paper, we show an efficient method to store many patterns using multiple IGUs. Statistical analysis is used to estimate the size of the IGUs. The rest of the paper is organized as follows: Section 2 defines the index generation function; Section 3 shows a method to reduce the number of variables of incompletely specified index generation functions; Section 4 introduces the IGU, the hardware that implements index generation functions; Section 5 shows a method to estimate the number of vectors realized by an IGU; Section 6 shows a method to implement an index generation function using four IGUs, which is more efficient than a single IGU realization; Section 7 shows that independent linear transformations are essential for an efficient implementation of the functions; Section 8 shows the experimental results; Section 9 shows a fast update method for IGUs; and Section 10 concludes the paper.
* A preliminary version of this paper was presented at ISMVL-2016 [14].
a) E-mail: sasao@cs.meiji.ac.jp
DOI: 10.1587/transinf.2016LOP0001
Index Generation Function
In this part, we introduce index generation functions [10], [11], [13].
Definition 2.1: Consider a set of k different binary vectors of n bits. These vectors are registered vectors. For each registered vector, assign a unique integer from 1 to k. A registered vector table shows the index of each registered vector. An incompletely specified index generation function is a one-to-one mapping $D \to \{1, 2, \ldots, k\}$, where $D \subseteq \{0, 1\}^n$ and $|D| = k$. Since the indices are often greater than two, an index generation function is multiple-valued. It produces the corresponding index if the input matches a registered vector. The weight of the index generation function, k, is usually much smaller than $2^n$, the total number of possible input combinations. Table 1 shows a registered vector table for a 4-variable index generation function with weight k = 4.
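As an illustration of Definition 2.1, the following minimal Python sketch treats a registered vector table as a dictionary. The vectors and indices are hypothetical (they are not the contents of Table 1), and the helper name `index_of` is ours; a non-registered input maps to 0, which is what a hardware realization must output.

```python
# Hypothetical registered vector table: n = 4 bits, weight k = 4.
REGISTERED = {
    (0, 0, 1, 0): 1,
    (0, 1, 1, 1): 2,
    (1, 0, 0, 1): 3,
    (1, 1, 0, 0): 4,
}

def index_of(x: tuple[int, ...]) -> int:
    """Return the index of a registered vector, or 0 for a non-registered vector."""
    return REGISTERED.get(x, 0)

# The mapping must be one-to-one onto {1, 2, ..., k}.
k = len(REGISTERED)
assert sorted(REGISTERED.values()) == list(range(1, k + 1))
```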
Example 2.1:

Number of Variables to Represent an Incompletely Specified Index Generation Function
An incompletely specified index generation function F can often be represented with fewer variables than the original function, when don't care values are properly replaced by 0 or some index [1], [2], [6], [8].
Copyright © 2017 The Institute of Electronics, Information and Communication Engineers
Theorem 3.1: Assume that an incompletely specified function F is represented by a decomposition chart [5]. If each column of the decomposition chart has at most one care element, then the function can be represented by only the column variables.
Example 3.2:
Consider the decomposition chart in Fig. 1. $x_1$ and $x_2$ specify columns, while $x_3$ and $x_4$ specify rows. Also, blank cells denote don't cares. In Fig. 1, each column has at most one care element. Thus, this function can be represented with only the column variables $x_1$ and $x_2$.
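The condition of Theorem 3.1 can be checked mechanically: a set of column variables suffices exactly when no two registered vectors agree on all of them, that is, when projecting the registered vectors onto those variables is injective. The sketch below assumes that reading; the function name `columns_suffice` and the example vectors are hypothetical, not the data of Fig. 1.

```python
from typing import Iterable, Sequence

def columns_suffice(vectors: Iterable[Sequence[int]], column_vars: Sequence[int]) -> bool:
    """True if every column of the decomposition chart whose columns are labelled
    by the variables at 0-based positions column_vars holds at most one care element."""
    seen = set()
    for v in vectors:
        col = tuple(v[i] for i in column_vars)
        if col in seen:           # a second care element in the same column
            return False
        seen.add(col)
    return True

# Hypothetical 4-variable function with three registered vectors.
vecs = [(0, 0, 1, 1), (0, 1, 0, 0), (1, 1, 1, 0)]
print(columns_suffice(vecs, [0, 1]))   # columns labelled by (x1, x2)
print(columns_suffice(vecs, [1]))      # column labelled by x2 alone
```

Here $(x_1, x_2)$ separates all three vectors, while $x_2$ alone does not, since two vectors have $x_2 = 1$.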
As for an upper bound on the number of variables, we have the following: Conjecture 3.1 [10], [11], [13]: When the number of variables n is sufficiently large, most incompletely specified index generation functions with weight k ($\geq 7$) can be represented by $p = 2\lceil \log_2(k+1) \rceil - 3$ variables.
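The bound in Conjecture 3.1 is easy to tabulate. In this small sketch the helper name `conjectured_p` is ours; it uses Python's `int.bit_length()`, which equals $\lceil \log_2(k+1) \rceil$ for $k \geq 1$, to avoid floating-point logarithms.

```python
def conjectured_p(k: int) -> int:
    """p = 2 * ceil(log2(k + 1)) - 3, as in Conjecture 3.1 (stated for k >= 7)."""
    if k < 7:
        raise ValueError("Conjecture 3.1 is stated for k >= 7")
    q = k.bit_length()        # equals ceil(log2(k + 1)) for k >= 1
    return 2 * q - 3

for k in (7, 1000, 40_000, 2**20 - 1):
    print(k, conjectured_p(k))
```

For the router example in the introduction (k = 40,000), this gives p = 29 variables instead of 32.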
For an incompletely specified function F, we need to realize a circuit such that $F(x_1, x_2, \ldots, x_n) = 0$ if $(x_1, x_2, \ldots, x_n)$ is a non-registered vector.
Index Generation Unit (IGU)
In this section, we show an efficient method to implement an index generation function. With this method, the number of input variables to the memory can be reduced. Figure 2 shows the Index Generation Unit (IGU). The linear circuit has n inputs and p outputs, where p < n. It is used to reduce the number of inputs to the main memory. The set of inputs to the linear circuit is partitioned into $X = (X_1, X_2)$, and the output is $Y_1 = (y_1, y_2, \ldots, y_p)$.
We consider two types of linear circuits. The first type is the single-input linear circuit shown in Fig. 3. It produces a function $y_j = x_{\pi(j)}$, where $\pi$ denotes a permutation on n elements. It consists of p multiplexers and p registers, and selects p variables from the n input variables. The data inputs of the multiplexers are $x_1, x_2, \ldots, x_n$, and the registers specify which variable each multiplexer selects.
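A behavioural sketch of the single-input linear circuit: each register holds the position of the variable its multiplexer passes through. The function name and the 0-based register encoding are assumptions of this sketch, not the register format of Fig. 3.

```python
from typing import Sequence

def single_input_linear_circuit(x: Sequence[int], select: Sequence[int]) -> tuple[int, ...]:
    """Model of p multiplexers: register j holds select[j] (0-based),
    so output y_j is x[select[j]]."""
    if len(set(select)) != len(select):
        raise ValueError("each input variable should be selected at most once")
    return tuple(x[s] for s in select)

x = (1, 0, 1, 1, 0, 0)                          # n = 6 inputs (hypothetical values)
y = single_input_linear_circuit(x, [3, 0, 5])   # p = 3 selected variables
```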
The second type is the double-input linear circuit shown in Fig. 4. It performs a linear transformation $y_i = x_i \oplus x_j$ or $y_i = x_{\pi(i)}$, where $x_i \in X_1$ and $x_j \in X_2$. It uses a pair of multiplexers for each variable $y_i$. The upper multiplexers have the inputs $x_1, x_2, \ldots, x_n$. A register with $\lceil \log_2 n \rceil$ bits specifies the variable to be selected by each multiplexer. The lower multiplexers have the inputs $x_1, x_2, \ldots, x_n$, except for $x_i$: for the multiplexer that produces $y_i$, the constant input 0 is connected instead of $x_i$. By setting $y_i = x_i \oplus 0$, we can implement $y_i = x_i$. Note that both types of linear circuits produce a special class of linear functions.

The main memory has p inputs and $q = \lceil \log_2(k+1) \rceil$ outputs. The main memory produces correct indices for registered vectors. However, it may produce incorrect indices for non-registered vectors, because the number of input variables is reduced to p. In an IGU, if the input vector is non-registered, then the outputs must be 0. To check whether the main memory produces the correct index or not, we use the AUX memory. The AUX memory has q inputs and $(n - p)$ outputs: it stores the $X_2$ part of the registered vector for each index. The comparator checks whether the $X_2$ part of the input is the same as the value read from the AUX memory. If they are the same, the main memory produces the correct index. Otherwise, the main memory produces a wrong index, and the input vector is non-registered; the output AND gates then produce 00...0, showing that the input vector is non-registered. Note that the main memory produces the correct indices only for the registered vectors.
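The data path described above (main memory, AUX memory, comparator, and output AND gates) can be modelled in a few lines. This is a behavioural sketch only: to keep it short, the linear circuit is assumed to be the trivial selection $Y_1 = X_1 = (x_1, \ldots, x_p)$, and the class and method names are hypothetical.

```python
from typing import Sequence

def to_int(bits: Sequence[int]) -> int:
    """Interpret a bit sequence (most significant bit first) as a memory address."""
    v = 0
    for b in bits:
        v = (v << 1) | b
    return v

class IGU:
    """Sketch of an IGU whose linear circuit passes X1 = (x1, ..., xp) straight through."""

    def __init__(self, n: int, p: int):
        self.n, self.p = n, p
        self.main = [0] * (1 << p)                  # main memory: address -> index (0 = empty)
        self.aux: dict[int, tuple[int, ...]] = {}   # AUX memory: index -> X2 part

    def register(self, x: Sequence[int], index: int) -> bool:
        """Store x with the given index; return False if its address is already used."""
        addr = to_int(x[: self.p])
        if self.main[addr] != 0:
            return False
        self.main[addr] = index
        self.aux[index] = tuple(x[self.p:])
        return True

    def lookup(self, x: Sequence[int]) -> int:
        idx = self.main[to_int(x[: self.p])]
        if idx == 0:
            return 0
        # Comparator + AND gates: keep the index only if X2 matches the AUX memory.
        return idx if self.aux[idx] == tuple(x[self.p:]) else 0

igu = IGU(n=6, p=3)
ok1 = igu.register((0, 1, 1, 0, 0, 1), 1)
ok2 = igu.register((1, 0, 0, 1, 1, 1), 2)
ok3 = igu.register((0, 1, 1, 1, 1, 1), 3)   # same X1 as the first vector: collision
```

A non-registered input such as (0, 1, 1, 1, 1, 1) reaches the address of index 1, but the comparator rejects it, so the IGU outputs 0.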
Theorem 4.2:
Consider the IGU in Fig. 2. Assume that it realizes the index generation function $F(X_1, X_2)$, where $X_1 = (x_1, x_2, \ldots, x_p)$ and $X_2 = (x_{p+1}, x_{p+2}, \ldots, x_n)$. Also, assume that $Y_1 = (y_1, y_2, \ldots, y_p)$, where $y_i = x_i \oplus x_j$ for some $j \in \{p+1, p+2, \ldots, n\}$, or $y_i = x_i$, is applied to the inputs of the main memory. Then, F can be realized by the circuit where the AUX memory stores only the values of $X_2$.
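The reason behind Theorem 4.2 is that $X_1$ can be recovered from $Y_1$ and $X_2$: if $y_i = x_i \oplus x_j$ with $x_j \in X_2$, then $x_i = y_i \oplus x_j$. So once the address $Y_1$ and the $X_2$ part are known, the whole vector is determined, and comparing $X_2$ alone suffices. The sketch below checks this exhaustively for one hypothetical pairing with n = 6 and p = 3.

```python
from itertools import product
from typing import Optional, Sequence

P = 3
# Hypothetical transformation: y1 = x1 ^ x4, y2 = x2, y3 = x3 ^ x6.
# PAIR[i] is the 0-based position of the X2 partner of y_{i+1}, or None for y_i = x_i.
PAIR: list[Optional[int]] = [3, None, 5]

def forward(x: Sequence[int]) -> tuple[int, ...]:
    """Y1 as produced by the double-input linear circuit."""
    return tuple(x[i] ^ x[j] if j is not None else x[i] for i, j in enumerate(PAIR))

def recover_x1(y1: Sequence[int], x2: Sequence[int]) -> tuple[int, ...]:
    """x_i = y_i ^ x_j with x_j in X2, so X1 needs only Y1 and X2."""
    return tuple(y ^ x2[j - P] if j is not None else y for y, j in zip(y1, PAIR))

# Exhaustive check over all 2^6 inputs: (Y1, X2) determines X1 uniquely.
for x in product((0, 1), repeat=6):
    assert recover_x1(forward(x), x[P:]) == x[:P]
```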
Number of Vectors Realized by an IGU
In this section, we review the expected number of registered vectors realized by an IGU [10].
Lemma 5.1: When $0 < \alpha \ll 1$, $1 - \alpha$ can be approximated by $e^{-\alpha}$.
Lemma 5.2:
Let $F(X)$ be a uniformly distributed random index generation function of n variables with weight k, where $k \ll 2^n$. Consider a decomposition chart [5], where p is the number of variables labelling the columns. Then, the probability that a column of the decomposition chart has all-zero elements is approximately $e^{-\xi}$, where $\xi = k/2^p$.
Theorem 5.3: Consider a set of uniformly distributed index generation functions $F(x_1, x_2, \ldots, x_n)$ with weight k. Consider an IGU whose inputs to the main memory are $x_1, x_2, \ldots, x_p$. Then, the expected number of registered vectors of F that can be realized by the IGU is $2^p(1 - e^{-\xi})$.
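Theorem 5.3 can be sanity-checked numerically. An IGU realizes at most one vector per main-memory address, so the number of realized vectors equals the number of non-empty columns. The sketch below models the uniform-distribution assumption directly by drawing k random addresses out of $2^p$ (rather than generating real vectors) and compares the average count with $2^p(1 - e^{-\xi})$. The parameter values are arbitrary.

```python
import math
import random

def expected_realized(k: int, p: int) -> float:
    """Theorem 5.3: 2^p * (1 - e^{-xi}) with xi = k / 2^p."""
    xi = k / 2**p
    return 2**p * (1 - math.exp(-xi))

def simulated_realized(k: int, p: int, trials: int = 20, seed: int = 1) -> float:
    """Average number of non-empty columns when k vectors land uniformly at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(2**p) for _ in range(k)})
    return total / trials

k, p = 1000, 10
est, sim = expected_realized(k, p), simulated_realized(k, p)
print(round(est, 1), sim)
```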
Realization Using Four IGUs
In an IGU, the main memory has p inputs and $q = \lceil \log_2(k+1) \rceil$ outputs, while the AUX memory has q inputs and $(n - p)$ outputs. Thus, the total amount of memory for an IGU is $q2^p + (n - p)2^q$ bits. Conjecture 3.1 shows that, to implement an index generation function with weight k by an IGU, the number of inputs to the main memory is $p \simeq 2\log_2 k - 3$. Also, note that $q \simeq \log_2 k$ and $n \ll k$. Since $2^p \simeq k^2/8$, the main memory alone needs about $(k^2/8)\log_2 k$ bits, so the size of the memory is $O(k^2 \log k)$. This shows that, when k is large, a single IGU realization of an index generation function is inefficient.
Example 6.3:
Let $k = 2^{20} - 1$. Then, by Conjecture 3.1, we have $p = 2\lceil \log_2(k+1) \rceil - 3 = 37$. Thus, the size of the main memory in a single IGU realization is $q2^p = 20 \times 2^{37} \approx 2.75 \times 10^{12}$ bits. Thus, we need a more efficient method.
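The memory formula $q2^p + (n - p)2^q$ and the arithmetic of Example 6.3 are reproduced by the sketch below. Example 6.3 does not state n; the value n = 40 is borrowed from Example 6.4 and only affects the (small) AUX term. The function name is hypothetical.

```python
def single_igu_bits(n: int, k: int, p: int) -> dict[str, int]:
    """Memory of one IGU: main memory q * 2^p plus AUX memory (n - p) * 2^q bits."""
    q = k.bit_length()        # q = ceil(log2(k + 1)) output bits for the index
    main = q * 2**p           # main memory: p inputs, q outputs
    aux = (n - p) * 2**q      # AUX memory: q inputs, n - p outputs
    return {"main": main, "aux": aux, "total": main + aux}

k = 2**20 - 1
p = 2 * k.bit_length() - 3    # Conjecture 3.1 gives p = 37
sizes = single_igu_bits(n=40, k=k, p=p)
print(p, f"{sizes['main']:.3e}")
```

The main memory dominates: $20 \times 2^{37} \approx 2.75 \times 10^{12}$ bits, against only $3 \times 2^{20}$ bits of AUX memory.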
To reduce the total amount of memory, we partition the registered vectors into m groups, and realize each group independently [3], [9]. Figure 5 shows a network using four IGUs. This architecture is called a 4IGU [9]. In this case, we should use independent linear transformations for different IGUs. The importance of the linear transformations will be discussed in Sect. 7.
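One way to picture the partition into m groups is a greedy pass: IGU 1 keeps one vector for each of its main-memory addresses and passes the colliding vectors on to IGU 2, and so on. In the sketch below, the independent linear transformations are modelled by random $p \times n$ matrices over GF(2), one per IGU. The greedy order, the random matrices, and all function names are assumptions of this sketch, not the paper's design procedure.

```python
import random

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def make_transform(n: int, p: int, rng: random.Random) -> list[int]:
    """Hypothetical random p x n matrix over GF(2), one n-bit row mask per output."""
    return [rng.getrandbits(n) for _ in range(p)]

def address(rows: list[int], x: int) -> int:
    """Main-memory address y = A x over GF(2), first row as the most significant bit."""
    y = 0
    for r in rows:
        y = (y << 1) | parity(x & r)
    return y

def partition(vectors: list[int], n: int, p: int, m: int, seed: int = 0):
    """Greedily assign vectors to m IGUs; return per-IGU tables and the unrealized rest."""
    rng = random.Random(seed)
    remaining = list(vectors)
    tables = []
    for _ in range(m):
        rows = make_transform(n, p, rng)      # an independent transformation per IGU
        table: dict[int, int] = {}            # address -> vector stored in this IGU
        leftover = []
        for v in remaining:
            a = address(rows, v)
            if a in table:
                leftover.append(v)            # collision: pass it to the next IGU
            else:
                table[a] = v
        tables.append((rows, table))
        remaining = leftover
    return tables, remaining

rng = random.Random(42)
vectors = rng.sample(range(2**20), 2**10 - 1)   # scaled down: n = 20, k = 1023
tables, rest = partition(vectors, n=20, p=10, m=4)
print([len(t) for _, t in tables], len(rest))
```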
Next, we show that index generation functions can be realized with a 4IGU. This is more efficient than a single IGU realization when k is large.
Theorem 6.4:
Consider an index generation function with weight k. Then, more than 99.9% of the registered vectors can be realized by a 4IGU, where the number of input variables to the main memory for each IGU is $p = \lceil \log_2(k+1) \rceil$.
(Proof) Let $k_1 = k$. We assume that, for each IGU, the distribution of the vectors is uniform. For the i-th IGU ($i = 1, 2, 3, 4$), let $\xi_i = k_i/2^p$. By Theorem 5.3, the number of realized vectors is $2^p(1 - e^{-\xi_i})$, and the number of remaining vectors is $k_{i+1} = k_i - 2^p(1 - e^{-\xi_i})$.
When $k_1 = 2^p$, the fraction of the original vectors that remain after the fourth IGU, $k_5/k_1$, is about $1.6 \times 10^{-6}$.
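The recursion in the proof can be iterated directly. Measuring k in units of $2^p$ (so that $\xi_{i+1} = \xi_i - (1 - e^{-\xi_i})$) makes the result independent of p. The sketch uses `math.expm1` to avoid cancellation when $\xi_i$ becomes small; the function name is ours.

```python
import math

def remaining_fraction(xi1: float = 1.0, igus: int = 4) -> float:
    """Iterate k_{i+1} = k_i - 2^p (1 - e^{-xi_i}) in units of 2^p and
    return k_{igus+1} / k_1."""
    xi = xi1
    for _ in range(igus):
        xi = xi + math.expm1(-xi)   # xi - (1 - e^{-xi}) without cancellation
    return xi / xi1

frac = remaining_fraction()
print(f"{frac:.2e}", round(frac * 2**20, 2))
```

With $p = 20$, the remaining fraction corresponds to fewer than two vectors, consistent with $k_5 \approx 2$ in the estimate of Example 6.4.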
Note that, in the proof, we assumed that the IGUs have independent linear transformations, so that the distribution of the vectors is uniform.
Example 6.4:
Consider an index generation function with weight $k = 2^{20} - 1 = 1048575$. Let us realize the function by the 4IGU shown in Fig. 5. Suppose that the number of inputs to the main memory in each IGU is p = 20. We assume that, for each IGU, the distribution of the vectors is uniform. Note that, in a 4IGU, the main memory of each IGU has p inputs and p outputs, while the AUX memory has p inputs and $(n - p)$ outputs. Thus, the total amount of memory for each IGU is $p2^p + (n - p)2^p = n2^p$ bits.
Then, the total memory for the 4IGU is $4n2^p$. Thus, when n = 40 and p = 20, the 4IGU requires $4n2^p = 4 \times 40 \times 2^{20} \approx 167.8 \times 10^6$ bits. This is more efficient than the single IGU realization in Example 6.3, which requires $2.75 \times 10^{12}$ bits.
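The comparison in Example 6.4 can be checked directly. This sketch assumes, as in the example, that the index width equals p, so each IGU needs $p2^p + (n - p)2^p = n2^p$ bits; the function name is hypothetical.

```python
def migu_bits(n: int, p: int, m: int = 4) -> int:
    """m IGUs, each with a main memory of p * 2^p bits and an AUX memory of (n - p) * 2^p bits."""
    per_igu = p * 2**p + (n - p) * 2**p    # = n * 2^p
    return m * per_igu

bits = migu_bits(n=40, p=20)
single = 20 * 2**37                        # main memory alone in Example 6.3
print(bits, f"{bits / 1e6:.1f}e6", single // bits)
```

The single IGU realization needs $2^{14} = 16384$ times more main memory than the whole 4IGU.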
Definition 6.2:
Let the linear circuit realize the p compound variables:
Then, the transformation matrix is Thus, the numbers of variables to represent two functions $f_1(x_1, x_2, x_3, x_4)$ and $f_2(y_1, y_2, x_3, x_4)$ are the same, and both are two. Next, consider the decomposition chart, where $Z_1 = (z_1, z_2)$, $z_1 = x_1$ and $z_2 = x_2 \oplus x_3$, are column variables. Figure 6 (right) is the corresponding chart, and let $f_3(z_1, z_2, x_3, x_4)$ be the function. Compared with Fig. 1, the element 3 is moved to the right in Fig. 6 (right). The number of variables to represent $f_3(z_1, z_2, x_3, x_4)$ is different from that of $f_1(x_1, x_2, x_3, x_4)$. Note that $f_1(x_1, x_2, x_3, x_4)$ corresponds to the matrix A, $f_2(y_1, y_2, x_3, x_4)$ corresponds to the matrix B, and $f_3(z_1, z_2, x_3, x_4)$ corresponds to the matrix C, in Example 6.5.
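A transformation matrix acts over GF(2): each compound variable is the XOR of the input variables selected by one row. The sketch encodes the transformation used for Fig. 6 (right), $z_1 = x_1$ and $z_2 = x_2 \oplus x_3$, as a $2 \times 4$ matrix. The layout (rows are compound variables, columns are $x_1, \ldots, x_4$) is an assumption, since the matrices A, B, and C of Example 6.5 are not reproduced here.

```python
from typing import Sequence

def gf2_apply(matrix: Sequence[Sequence[int]], x: Sequence[int]) -> tuple[int, ...]:
    """Multiply a 0/1 matrix by a 0/1 vector over GF(2): AND for products, XOR for sums."""
    out = []
    for row in matrix:
        acc = 0
        for a, b in zip(row, x):
            acc ^= a & b
        out.append(acc)
    return tuple(out)

# z1 = x1, z2 = x2 XOR x3; columns correspond to x1, x2, x3, x4.
Z_ROWS = [[1, 0, 0, 0],
          [0, 1, 1, 0]]
print(gf2_apply(Z_ROWS, (0, 1, 1, 0)))
```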
Selection of Linear Transformations
In the previous sections, we assumed that the IGUs have independent linear transformations. However, when the same linear transformation is used for all the IGUs, the number of registered vectors realized by the IGUs decreases. In this part, we prove this using statistical analysis. First, we illustrate the design method for a 4IGU. When $X_1 = (x_1, x_2, x_3)$ are used for the main memories, four IGUs are necessary to implement the function.
Theorem 7.6:
Let k be the number of registered vectors, and p be the number of inputs to the main memory. Then, the expected number of vectors realized by a 4IGU using the same linear transformations is
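where $\beta = k/2^p$. (Proof) Consider throwing k balls (registered vectors) into $2^p$ bins (columns of the decomposition chart), where each throw lands in a given bin with probability $\alpha = 1/2^p$. Then, $\alpha k = \beta$.
No Ball: The probability that a certain bin has no ball after one throw is $1 - \alpha$.
The probability that a certain bin has no ball after k throws is $(1 - \alpha)^k$, because each throw is an independent event; by Lemma 5.1, $(1 - \alpha)^k \simeq e^{-\alpha k} = e^{-\beta}$.
One Ball: The probability that a certain bin has one ball after one throw is $\alpha$. The probability that a certain bin has exactly one ball after k throws is $k\alpha(1 - \alpha)^{k-1} \simeq \beta e^{-\beta}$.
Two Balls: The probability that a certain bin has two balls after two throws is $\alpha^2$. The probability that a certain bin has exactly two balls after k throws is $\binom{k}{2}\alpha^2(1 - \alpha)^{k-2} \simeq \frac{\beta^2}{2}e^{-\beta}$.
Three Balls: The probability that a certain bin has three balls after three throws is $\alpha^3$. The probability that a certain bin has exactly three balls after k throws is $\binom{k}{3}\alpha^3(1 - \alpha)^{k-3} \simeq \frac{\beta^3}{6}e^{-\beta}$.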
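Under the balls-into-bins model of this proof, four IGUs that share one linear transformation see the same $2^p$ columns, so a column holding j registered vectors contributes at most $\min(j, 4)$ of them. Combining the Poisson-form probabilities above gives $2^p\,[4 - e^{-\beta}(4 + 3\beta + \beta^2 + \beta^3/6)]$ realized vectors. This is our own assembly of the terms, offered as a sketch and not necessarily the exact expression of Theorem 7.6. For $k = 2^{20} - 1$ and $p = 20$ it leaves roughly $4.56 \times 10^3$ vectors unrealized.

```python
import math

def same_transform_realized(k: int, p: int, m: int = 4) -> float:
    """Expected vectors realized by m IGUs sharing one transformation:
    2^p * sum_j min(j, m) * P_j, with P_j Poisson(beta), beta = k / 2^p."""
    beta = k / 2**p
    # sum_j min(j, m) * P_j = m - sum_{j < m} (m - j) * P_j
    deficit = sum((m - j) * beta**j / math.factorial(j) for j in range(m))
    return 2**p * (m - math.exp(-beta) * deficit)

k, p = 2**20 - 1, 20
remaining = k - same_transform_realized(k, p)
print(round(remaining))
```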
In this case, most of the vectors can be realized by a 4IGU as follows: IGU1 uses $X_1 = (x_1, x_2, x_3)$ as inputs to the main memory, while IGU2 uses $X_2 = (x_4, x_5, x_6)$ as inputs to the main memory. The registered vectors are divided into three parts, and realized separately as follows:
1. IGU1 stores one element for each non-empty column.
It realizes the mapping of vectors to index values 20, 10, 6, and 17.
In this case, all the vectors can be realized by three IGUs.
Experimental Results
Realization with 4IGUs
To show the validity of the analysis, we generated 100 random index generation functions with n = 40 and $k = 2^{20} - 1$, and realized them by 4IGUs, where p = 20.
In the experiment, we used the following linear transformations: Let $(x_1, x_2, \ldots, x_n)$ be the input variables. For the i-th IGU, $(y_1, y_2, \ldots, y_p)$ were used as the inputs to the main memory, where $y_j = x_j \oplus x_{p-i+j}$ $(1 \le j \le p)$. Table 2 compares the estimated values with the experimental results. The column labeled Estimated shows the results obtained in Example 6.4. The column labeled Experimental shows the average over the 100 sample functions.
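The transformations used in the experiment are easy to generate. This sketch follows the stated formula with 1-based indices; the range check (which rejects i = p, where every $y_j$ would be 0) and the function name are our additions.

```python
from typing import Sequence

def igu_inputs(x: Sequence[int], i: int, p: int) -> tuple[int, ...]:
    """Main-memory inputs of the i-th IGU: y_j = x_j XOR x_{p-i+j}, 1 <= j <= p (1-based)."""
    n = len(x)
    if not (1 <= i < p and 2 * p - i <= n):
        raise ValueError("need 1 <= i < p and 2p - i <= n")
    return tuple(x[j - 1] ^ x[p - i + j - 1] for j in range(1, p + 1))

x = [0] * 40
x[19] = 1                          # only x_20 = 1
for i in range(1, 5):
    y = igu_inputs(x, i, p=20)
    print(i, [j + 1 for j, b in enumerate(y) if b])
```

For the i-th IGU, $x_{20}$ reaches $y_i$ (as $x_{p-i+j}$ with $j = i$) and $y_{20}$ (as $x_j$), so each IGU mixes the variables differently.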
In the estimation, the number of remaining vectors not realized by the 4IGU is only two, that is, $k_5 = 2$. In the experiment, on the other hand, the number of remaining vectors is 1.82 on average.
The reasons for the disparity may be:
• The approximations in the estimation introduced errors.
• The registered vectors in the experiment were not truly random.
• The number of sample functions was not sufficient.
In practice, we can easily find a good linear transformation for the last IGU using a minimization tool [12]. Thus, each function can be realized by a 4IGU. The total amount of memory is $mn2^p = 4 \times 40 \times 2^{20} = 160 \times 2^{20} \approx 167.8 \times 10^6$ bits.
Effect of Independent Linear Transformations
In Sect. 7, we showed that independent linear transformations should be used for the IGUs. To demonstrate this, we used the same 100 random index generation functions with n = 40 and $k = 2^{20} - 1$, and realized them by 4IGUs, where p = 20. Table 3 compares the two 4IGU realizations. In the column labeled Same, the same linear transformation is used for all four IGUs. In the column labeled Independent, independent linear transformations are used for the different IGUs. The sample functions are the same as those used for Table 2.
The effect is very clear. When the same linear transformation is used for the 4IGU, 4504.88 vectors remain on average, which is not far from the estimated value 4562 in Example 7.8. On the other hand, when independent linear transformations are used for the 4IGU, only 1.82 vectors remain on average, which is close to the estimated value 2.0 in Example 6.4.
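The contrast in Table 3 can be reproduced qualitatively with a scaled-down simulation (n = 24, p = 12, $k = 2^{12} - 1$, five trials; not the paper's setup). Random $p \times n$ matrices over GF(2) stand in for the linear transformations, and each IGU greedily keeps one vector per address. With a shared transformation, the leftover should come out near $2^{12} \times 0.0043 \approx 18$ vectors; with independent transformations it should be close to zero.

```python
import random

def address(rows: list[int], x: int) -> int:
    """y = A x over GF(2); each row is an n-bit mask."""
    y = 0
    for r in rows:
        y = (y << 1) | (bin(x & r).count("1") & 1)
    return y

def leftover(vectors: list[int], transforms: list[list[int]]) -> int:
    """Greedy mIGU: each IGU keeps one vector per address; count the vectors left over."""
    remaining = vectors
    for rows in transforms:
        taken: set[int] = set()
        nxt = []
        for v in remaining:
            a = address(rows, v)
            if a in taken:
                nxt.append(v)
            else:
                taken.add(a)
        remaining = nxt
    return len(remaining)

def trial(n: int, p: int, k: int, rng: random.Random) -> tuple[int, int]:
    vectors = rng.sample(range(2**n), k)
    shared = [rng.getrandbits(n) for _ in range(p)]
    same = leftover(vectors, [shared] * 4)
    indep = leftover(vectors, [[rng.getrandbits(n) for _ in range(p)] for _ in range(4)])
    return same, indep

rng = random.Random(2017)
runs = [trial(n=24, p=12, k=2**12 - 1, rng=rng) for _ in range(5)]
avg_same = sum(s for s, _ in runs) / len(runs)
avg_indep = sum(i for _, i in runs) / len(runs)
print(avg_same, avg_indep)
```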
Fast Update of IGUs
Index generation functions often require quick updates. For example, in Internet routers, registered vectors are often updated every millisecond. In this part, we show a fast method to update registered vectors.
To update the registered vectors quickly, we have to modify the IGUs. Figure 8 shows a fast updatable IGU. Two additional outputs are appended to the original IGU shown in Fig. 2. The first one is the collision detection signal (CD), which shows that the main memory produces a non-zero output. CD can be generated by the OR of the outputs of the main memory. The second one is the match signal (MT), which shows that the input vector matches a registered vector in the IGU. With these two signals, updating the registered vectors becomes quite easy. An update of a registered vector can be done in two steps: When the UPDATE signal is 1, this circuit performs an update. During the update, the Busy signal is 1, and the search operation is forbidden. When the update is successful (i.e., the new vector can be stored in one of the IGUs in Fig. 9), the fail signal is 0; when the update is unsuccessful, the fail signal is 1.
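The two extra signals can be modelled behaviourally: CD is 1 whenever the addressed main-memory word is non-zero, and MT is 1 only if, in addition, the comparator reports a match. In this sketch the AUX memory is replaced by a table of full stored vectors per address (a simplification), and the class and method names are hypothetical; it is not the circuit of Fig. 8.

```python
from typing import Callable

Vector = tuple[int, ...]

class UpdatableIGU:
    """Behavioural sketch of the CD and MT signals of a fast updatable IGU."""

    def __init__(self, p: int, addr_fn: Callable[[Vector], int]):
        self.main = [0] * (1 << p)            # index at each address, 0 = vacant
        self.stored: dict[int, Vector] = {}   # address -> registered vector (stands in for AUX)
        self.addr_fn = addr_fn

    def probe(self, x: Vector) -> tuple[int, bool, bool]:
        """Return (index output, CD, MT) for input x."""
        a = self.addr_fn(x)
        idx = self.main[a]
        cd = idx != 0                          # CD: OR of the main-memory output bits
        mt = cd and self.stored.get(a) == x    # MT: the comparator reports a match
        return (idx if mt else 0), cd, mt

    def write(self, x: Vector, index: int) -> None:
        """Model of a WE pulse: store the index at x's address (index 0 erases it)."""
        a = self.addr_fn(x)
        self.main[a] = index
        if index:
            self.stored[a] = x
        else:
            self.stored.pop(a, None)

# Hypothetical IGU whose main memory is addressed by (x1, x2, x3).
igu = UpdatableIGU(3, lambda x: x[0] * 4 + x[1] * 2 + x[2])
igu.write((0, 1, 1, 0, 0, 0), 5)
```

A probe with a different vector that shares the address, such as (0, 1, 1, 1, 1, 1), gives CD = 1 and MT = 0: the address is occupied, but not by this vector.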
Let $X = (X_1, X_2)$ be the vector to be updated, and let INDEX(X) be its index. According to the values of the collision detection signal (CD) and the match signal (MT), the following operations are done. The controller works as follows: When the UPDATE signal is 1, the controller is in the update mode. When UPDATE is 1 and INDEX is non-zero, it appends a new registered vector …
Table 4 shows an index generation function. It has n = 6 variables, and its weight is k = 15. Let us implement this function by three IGUs, each of which has a main memory with three inputs. IGU1 implements the function shown in Table 5. The inputs are $(x_1, x_2, x_3)$, and its weight is 8. Note that the main memory and the AUX memory are combined together.
IGU2 implements the function shown in Table 6. The inputs are $(x_4, x_5, x_6)$, and its weight is 5. The index 0 denotes that no vector is stored at that address.
IGU3 implements the function shown in Table 7 . The inputs are (y 1 , y 2 , y 3 ), where 
and its weight is 2. Note that these three IGUs implement all the registered vectors shown in Table 4 .
Deletion of a vector
To delete a vector, the UPDATE signal to the controller in Fig. 9 is set to 1. Then, the vector to be deleted is applied to the inputs of the three IGUs. Each IGU checks whether the vector is stored or not. If the vector is stored, the match signal (MT) will be 1. Next, the controller identifies the IGU that stores the input vector using (MT1, MT2, MT3) and (CD1, CD2, CD3).
If an IGU stores the input vector, both its match signal (MT) and its collision detection signal (CD) will be 1.
If there is an IGU that stores the input vector, the controller sends the WE signal to that IGU, and rewrites the index to zero.
For example, to delete the first vector in Table 4, the controller rewrites the index at the address $(x_1, x_2, x_3) = (0, 1, 1)$ in IGU1 to zero.
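The deletion procedure can be sketched as a loop over the IGUs: find the one whose CD and MT are both 1 and clear its entry. Each IGU is modelled as a main memory plus a table of stored vectors per address. The address functions of IGU1 and IGU2 follow the text; the one for IGU3, the stored vector, and all names are hypothetical.

```python
from typing import Callable, Optional

Vector = tuple[int, ...]

class IGU:
    """Minimal model: main memory (address -> index) plus the stored vector per address."""

    def __init__(self, p: int, addr_fn: Callable[[Vector], int]):
        self.main = [0] * (1 << p)
        self.stored: dict[int, Vector] = {}
        self.addr_fn = addr_fn

    def signals(self, x: Vector) -> tuple[bool, bool]:
        """Return (CD, MT) for input x."""
        a = self.addr_fn(x)
        cd = self.main[a] != 0
        return cd, cd and self.stored.get(a) == x

def delete(igus: list[IGU], x: Vector) -> Optional[int]:
    """Clear x from the IGU whose CD and MT are both 1; return its position, or None."""
    for pos, igu in enumerate(igus):
        cd, mt = igu.signals(x)
        if cd and mt:
            a = igu.addr_fn(x)
            igu.main[a] = 0                 # WE: rewrite the index to zero
            igu.stored.pop(a, None)
            return pos
    return None                             # x was not registered

def bits(*b: int) -> int:
    return int("".join(map(str, b)), 2)

igus = [IGU(3, lambda x: bits(x[0], x[1], x[2])),                       # IGU1: (x1, x2, x3)
        IGU(3, lambda x: bits(x[3], x[4], x[5])),                       # IGU2: (x4, x5, x6)
        IGU(3, lambda x: bits(x[0] ^ x[3], x[1] ^ x[4], x[2] ^ x[5]))]  # IGU3: hypothetical
v = (0, 1, 1, 0, 1, 0)        # hypothetical vector with (x1, x2, x3) = (0, 1, 1)
a = igus[0].addr_fn(v)
igus[0].main[a], igus[0].stored[a] = 1, v
```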
Addition of a vector
To add a new vector, the UPDATE signal is set to 1, and the index to be added is set to the INDEX input of the controller. Then, the vector to be added is applied to the inputs of the three IGUs. Each IGU checks whether a collision exists or not. If a collision exists, the collision detection signal (CD) will be 1.
The controller finds an IGU that has no collision with the input vector using the values of (CD1, CD2, CD3). If a vacant location is available in an IGU, the controller writes the index to that IGU by setting WE to 1.
Assume that a new vector $(x_1, x_2, x_3, x_4, x_5, x_6) = (1, 1, 0, 0, 0, 0)$ with index 1 is appended. IGU1 and IGU2 have collisions, but IGU3 has no collision. Thus, the controller writes index 1 to the address $(y_1, y_2, y_3) = (1, 1, 0)$ of IGU3.
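A matching sketch of addition: the controller writes into the first IGU whose CD is 0 (the smallest-index strategy mentioned in the conclusion) and reports failure otherwise. The address function of IGU3 and the two occupying vectors are hypothetical, chosen so that the collisions in IGU1 and IGU2 and the write to address (1, 1, 0) of IGU3 follow the narrative above.

```python
from typing import Callable, Optional

Vector = tuple[int, ...]

class IGU:
    def __init__(self, p: int, addr_fn: Callable[[Vector], int]):
        self.main = [0] * (1 << p)            # index per address, 0 = vacant
        self.stored: dict[int, Vector] = {}
        self.addr_fn = addr_fn

    def cd(self, x: Vector) -> bool:
        """Collision detection: the addressed main-memory word is already non-zero."""
        return self.main[self.addr_fn(x)] != 0

def add(igus: list[IGU], x: Vector, index: int) -> Optional[int]:
    """Write x into the first IGU with CD = 0; return its position, or None (fail = 1)."""
    for pos, igu in enumerate(igus):
        if not igu.cd(x):
            a = igu.addr_fn(x)
            igu.main[a] = index               # WE = 1
            igu.stored[a] = x
            return pos
    return None

def bits(*b: int) -> int:
    return int("".join(map(str, b)), 2)

igus = [IGU(3, lambda x: bits(x[0], x[1], x[2])),
        IGU(3, lambda x: bits(x[3], x[4], x[5])),
        IGU(3, lambda x: bits(x[0] ^ x[3], x[1] ^ x[4], x[2] ^ x[5]))]  # hypothetical IGU3

# Hypothetical occupants that cause the collisions described in the text.
for igu, occupant, idx in [(igus[0], (1, 1, 0, 1, 0, 1), 7), (igus[1], (0, 1, 0, 0, 0, 0), 9)]:
    a = igu.addr_fn(occupant)
    igu.main[a], igu.stored[a] = idx, occupant

new = (1, 1, 0, 0, 0, 0)
pos = add(igus, new, 1)
```

A fuller controller would also check MT first, so that a vector that is already registered is not stored twice.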
Conclusion and Comments
In this paper, we presented a method to implement index generation functions using multiple IGUs. The main results are:
• An index generation function with many registered vectors should be realized by an mIGU rather than a single IGU.
• Most index generation functions with weight k can be realized by a 4IGU, where $p = \lceil \log_2(k+1) \rceil$.
• In an mIGU, the linear transformations should be independent.
With the results of this paper, we can estimate the size of the IGUs necessary to implement a given number of vectors.
In the application to the internet, the registered vectors must be updated frequently, but only a short time is available for reconfiguration. With updatable IGUs, we can quickly update the vectors.
In this paper, to insert a new vector, we used the IGU with the smallest index. Although this strategy works well in most cases, it can fail to store a few vectors in the given IGUs. In such a case, we can use one of the following methods:
• Increase the number of IGUs.
• Use an additional small CAM to store the remaining vectors.
• Optimize the linear transformation [12] of the last IGU to store all the remaining vectors in it. Since the number of registered vectors to store in the last IGU is much smaller than its capacity, all the remaining vectors can be stored in the last IGU. Conjecture 3.1 shows a sufficient number of variables to represent a given index generation function.
