Introduction
Shannon [18] introduced the concepts of confusion and diffusion as fundamental techniques for achieving security in cryptographic systems. The confusion principle is reflected in the nonlinearity of Boolean functions, since most linear systems are easily breakable. There are various criteria that imply nonlinearity, one of them being bentness. Bent functions were first introduced by Rothaus in 1976 [15] as functions having maximum distance from the set of affine functions.
Bent Boolean functions have the highest nonlinearity possible, which makes them useful in the design of block and stream ciphers. Maximum length sequences based on bent functions have cross-correlation and autocorrelation properties that are close to those of Gold and Kasami codes [12], which have applications in spread spectrum communication [6].
While bent functions can be defined precisely, generating them is a different matter. One needs sophisticated mathematical tools (such as invariant theory) and computational tools to list all n-variable bent functions (this has been achieved for n ≤ 8). Some of these methods cannot be easily parallelized and do not offer a significant improvement in a reconfigurable environment.
Using the SRC-6 reconfigurable computer, we have tested millions of Boolean functions. Specific sets of Boolean functions were chosen based on their properties, including degree, homogeneity, and symmetry. These groups were evaluated for relationships between nonlinearity and specific properties. The objective is to find groups of Boolean functions that are rich in bent functions [1]. These groups, if small enough, can be tested exhaustively. Testing across the entire set of functions, even for small numbers of variables, e.g., n = 6 or more, is infeasible because of the large number of functions. The use of the transeunt triangle enables functions to be generated easily in one form, converted to another form, and then tested for certain characteristics. Without the transeunt triangle [2], [4], important groups of functions could not be tested efficiently.
Background and Definitions
Example 2.4. Rothaus [15] showed that bent functions have nonlinearity $2^{n-1} - 2^{n/2-1}$.
(End of Example)
The property of "bentness" depends on the function's truth table representation. However, another representation provides alternative insight into bentness and allows a reduction in the number of functions that must be searched during bent function discovery. The algebraic normal form (ANF) of an n-variable function $f$ is $f(x_1, x_2, \ldots, x_n) = \bigoplus_{a} c_a x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$, where $a = (a_1, a_2, \ldots, a_n)$, $c_a, a_i \in F_2$, $x_i^0 = 1$, and $x_i^1 = x_i$.
Because of $f = x_1 x_2 \oplus x_3 x_4$, Theorem 2.1 does not hold for $n = 4$. Qu, Seberry, and Pieprzyk [14] found 30 homogeneous 6-variable bent functions of degree 3, and so, Theorem 2.1 does not hold for $n = 6$. Therefore, from [14], [20], for $n > 6$, $n$-variable bent functions of degree $n/2$ exist, but none are homogeneous. More recently, Meng et al. [11] showed (purely combinatorially) that, for any nonnegative integer $k$, there exists a positive integer $N$ such that, for $n \ge N$, there do not exist $2n$-variable homogeneous bent functions having degree $n-k$ or more, where $N$ is the least integer satisfying an inequality given in [11].
Architecture of Bent Function Enumerator
A reconfigurable computer allows one to adapt the architecture to the problem. Fig. 1 shows the architecture used to enumerate bent functions based on the ANF of the tested functions. This and other variations yield the data we present later. In all cases, a counter, shown on the left, enumerates prospective functions. Its output is applied to a block labeled Transeunt Triangle. In this case, the counter enumerates ANFs; each bit of the counter determines the presence or absence of a term in the ANF. The transeunt triangle produces the corresponding truth table, which is then applied to a block that computes the function's nonlinearity, NL. If NL is maximum, the function is bent, and it is stored.
Fig. 1. Bent function enumeration circuit
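The data flow of Fig. 1 can be mimicked in software for a small number of variables. The following Python sketch is illustrative only and is not the Verilog/C design described in this paper: the function names are hypothetical, the XOR transform is written as a software loop rather than a circuit, and n = 4 is chosen so that the loop finishes instantly. As in the text, the counter selects ANF terms, and only terms of degree 2 through n/2 are enumerated (one representative per affine class).

```python
from itertools import combinations

def anf_to_truth_table(coeffs, n):
    """XOR (GF(2)) transform from ANF coefficients to a truth table.

    Software stand-in for the Transeunt Triangle block (assumption:
    variable x_k corresponds to bit k-1 of the index).
    """
    t = list(coeffs)
    for i in range(n):
        step = 1 << i
        for j in range(1 << n):
            if j & step:
                t[j] ^= t[j ^ step]
    return t

def nonlinearity(tt, n):
    """Minimum Hamming distance from tt to any affine function."""
    size = 1 << n
    best = size
    for a in range(size):
        # distance to the linear function a.x; size - d covers its complement
        d = sum(tt[x] ^ (bin(a & x).count("1") & 1) for x in range(size))
        best = min(best, d, size - d)
    return best

def enumerate_bent_classes(n):
    """Counter -> ANF -> truth table -> NL -> store if NL is maximum."""
    terms = [sum(1 << v for v in c)
             for d in range(2, n // 2 + 1)
             for c in combinations(range(n), d)]
    max_nl = (1 << (n - 1)) - (1 << (n // 2 - 1))
    found = []
    for counter in range(1 << len(terms)):
        coeffs = [0] * (1 << n)
        for bit, term in enumerate(terms):
            if (counter >> bit) & 1:   # counter bit selects an ANF term
                coeffs[term] = 1
        tt = anf_to_truth_table(coeffs, n)
        if nonlinearity(tt, n) == max_nl:
            found.append(counter)      # the "store" signal
    return found

bent4 = enumerate_bent_classes(4)
print(len(bent4))
```

For n = 4 the counter has only 6 bits, one per degree-2 term. The same loop with n = 6 would need 2^35 iterations, which is why the hardware pipeline matters.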
In the SRC-6 reconfigurable computer, this circuit is implemented on a Xilinx Virtex2 Pro FPGA. It is pipelined and runs at 100 MHz; one function is tested every clock cycle. We used this to enumerate all 6-variable bent functions [16]. If we had to enumerate all $2^{64} \approx 1.8 \times 10^{19}$ 6-variable functions, this would take 5,849 years. However, Rothaus [15] showed that no bent function has degree greater than $n/2$. By eliminating functions with degree greater than $n/2$, it is only necessary to enumerate $2^{\binom{6}{2}+\binom{6}{3}} = 2^{35}$ affine classes. At one class (function) per 100 MHz clock period, this enumeration takes only 5.7 minutes plus 0.5 minutes for data transfer, for a total of 6.2 minutes. That is, by enumerating only the affine classes corresponding to functions of degree 3 or less, we reduce the enumeration time from thousands of years to minutes.
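The enumeration times quoted above follow directly from the clock rate. The short Python check below reproduces them; the 365-day year is an assumption made here, and the names are illustrative.

```python
CLOCK_HZ = 100e6                      # one function per 100 MHz clock period
SECONDS_PER_YEAR = 365 * 24 * 3600    # assumption: 365-day year

def enumeration_seconds(log2_count, clock_hz=CLOCK_HZ):
    """Time to enumerate 2**log2_count functions at one per clock."""
    return 2 ** log2_count / clock_hz

all_years = enumeration_seconds(64) / SECONDS_PER_YEAR   # all 6-variable functions
reduced_minutes = enumeration_seconds(35) / 60           # 2^35 affine classes

print(f"{all_years:.0f} years")
print(f"{reduced_minutes:.1f} minutes")
```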
Fig. 2. Nonlinearity circuit
Both the Ones Count and the Minimum circuit in Fig. 2 are trees. Fig. 3 shows that, in the case of the Ones Count circuit, adders of various sizes form the circuit. Fig. 4 shows that, in the case of the Minimum circuit, two-input one-output minimum circuits are used. 
Fig. 3. Ones Count circuit
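A software model may help in reading Figs. 2 to 4. The sketch below is a behavioral illustration under stated assumptions, not the circuit itself: each affine function is compared with the truth table, a pairwise adder tree counts the mismatches, and a pairwise minimum tree selects the smallest distance. Function names and the bit ordering (x_k as bit k-1 of the index) are choices made here.

```python
def ones_count_tree(bits):
    """Add values pairwise, level by level, like a tree of small adders."""
    level = list(bits)
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

def minimum_tree(values):
    """Reduce with two-input, one-output minimum nodes."""
    level = list(values)
    while len(level) > 1:
        level = [min(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def affine_truth_table(a, c, n):
    """Truth table of the affine function a.x XOR c."""
    return [(bin(a & x).count("1") + c) & 1 for x in range(1 << n)]

def nonlinearity(tt, n):
    """Distance vector to all 2^(n+1) affine functions, then the minimum."""
    distances = []
    for a in range(1 << n):
        for c in (0, 1):
            aff = affine_truth_table(a, c, n)
            distances.append(ones_count_tree([t ^ g for t, g in zip(tt, aff)]))
    return minimum_tree(distances)

# f = x1 x2 XOR x3 x4 on n = 4 variables
n = 4
tt = [((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))
      for x in range(1 << n)]
print(nonlinearity(tt, n))
```

In hardware, all nodes of one tree level operate in the same clock period; the loops here evaluate them one after another.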
The part of the circuit in Fig. 1 that has the ANF as input and the "store" signal as output is combinational. However, its delay is larger than the SRC-6's 100 MHz clock period, and so, it is pipelined. For n = 6, the pipeline stages are shown in Table 1.
Fig. 4. Minimum circuit
All of this is implemented on the FPGA and is described in Verilog. The counter that produces the ANF in Fig. 1 is implemented in C code that is compiled into a circuit on the FPGA. This and overhead circuitry require 6 more pipeline stages. Therefore, there are a total of 14 stages.
As discussed earlier, there are $2^{35} = 3.4 \times 10^{10}$ iterations. Therefore, the 14-clock latency is minuscule in comparison to the total computation time. Even reducing the latency to 0 (no pipeline) would yield no perceptible reduction in computation time. The above discussion applies to the bent function enumeration that is described in Section 5.2. Other enumerations, such as the distribution of nonlinearity of 8-variable rotation symmetric Boolean functions described in Section 5.4, correspond to somewhat different circuits (e.g., they do not use the transeunt triangle). However, the same conclusion holds; the latency has an imperceptible effect on the computation time.
An examination of Figs. 1-4 reveals why a reconfigurable computer is much more efficient than a conventional computer in computing bent Boolean functions. The Ones Count circuit requires many small adders that can be used simultaneously. An FPGA can realize these, albeit at an increased delay, compared to a conventional computer. A conventional computer has only a few large wordwidth adders. The large wordwidth is not used efficiently. Similarly, the Minimum circuit requires many comparators that can be used simultaneously on an FPGA, but are much less abundant on a conventional computer.
The Transeunt Triangle
Definition
Green [9] and others [2], [3], [4], [8], [19] propose the transeunt triangle as a means to derive the ANF from the truth table of a given function and, in so doing, produce compact exclusive-OR sum-of-products circuits. In this paper, we show the benefit of the transeunt triangle in a computational application. Not only can the ANF be computed from the truth table, but the truth table can also be computed from the ANF. Green [9] and others [4] define the transeunt triangle to be the logic values at the inputs and outputs of the 2-input 1-output exclusive-OR gates. We define it to be a circuit of exclusive-OR gates. Green [9] did not prove that the transeunt triangle converts a truth table representation to an ANF representation. We do so now. The following result from [10, p. 68] will be used in our proof.
The Transeunt Triangle Proof
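Theorem 4.2 (Lucas). Let $p$ be a prime number, and let two integers be represented in base $p$, namely $n = n_0 + n_1 p + \cdots + n_t p^t$ and $r = r_0 + r_1 p + \cdots + r_t p^t$, where $0 \le n_i, r_i \le p-1$. Then $\binom{n}{r} \equiv \prod_{i=0}^{t} \binom{n_i}{r_i} \pmod{p}$.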
The main result of this section is as follows.
Theorem 4.3. If the input to the transeunt triangle is the truth table representation of an n-variable function f , then the output is the ANF representation of f . Conversely, if the input to the transeunt triangle is the ANF representation of an n-variable function f , then the output is the truth table representation of f .
Proof: The second statement follows from the first because the logic values in the transeunt triangle are unchanged if all exclusive-OR gates are rotated 120 degrees clockwise (thus exchanging the input with the output). We prove the first statement by induction. Fig. 6a shows that the first statement is true for all functions on n = 1 variable.
Assume the first statement is true for $n$, and consider an $(n+1)$-variable transeunt triangle. Fig. 6b shows that there are two $n$-variable transeunt triangles embedded in this transeunt triangle. Applied as an input to the lower one is $f_{0 \to x_1}$, shown as $f_0$ in Fig. 6b. By the inductive assumption, the output of this transeunt triangle is the ANF representation of $f_{0 \to x_1}$.
We now show that $f_{0 \to x_1} \oplus f_{1 \to x_1}$ is applied as an input to the upper transeunt triangle. Let $\alpha$ be an assignment of values to $x_2, x_3, \ldots$, and $x_n$. Then, each input to the upper triangle is driven by a $(2^{n-1}+1)$-input transeunt triangle whose input assignments range from $0\alpha$ through $1\alpha$. This is shown in Fig. 6b as a dotted-line triangle. For example, the left input is driven by a transeunt triangle whose $2^{n-1}+1$ inputs are $00\ldots000$, $00\ldots001$, $\ldots$, $01\ldots111$, and $10\ldots000$, where $\alpha = 0\ldots000$.
Consider one such triangle, and index its inputs by $i$, for $0 \le i \le 2^{n-1}$. The output of this triangle is the exclusive-OR of some number of its inputs. The number of times an assignment appears in the exclusive-OR expression of the inputs is the number of paths from that input to the output. This is just $\binom{2^{n-1}}{i}$. For $i = 0$ and $i = 2^{n-1}$, there is exactly one path to the triangle's output, and these two inputs appear once in the exclusive-OR expression. Consider $i$ such that $0 < i < 2^{n-1}$. We use Theorem 4.2 with $p = 2$. In the notation of that theorem, since $n = 2^{k-1}$, $n_i = 0$ for all $i$, except that $n_{k-1} = 1$. For $0 < r < n = 2^{k-1}$, there is at least one $j$ such that $r_j = 1$ and $n_j = 0$, so that $\binom{n_j}{r_j} = 0$ and $\binom{n}{r} \equiv 0 \pmod{2}$. Thus, the number of paths from any such assignment of values in the truth table input to the root is even. It follows that the only terms that occur are $0\alpha$ and $1\alpha$. We can conclude, therefore, that the input to the upper transeunt triangle in Fig. 6b is the truth table representation of $f_{0 \to x_1} \oplus f_{1 \to x_1}$, shown in this figure as $f_0 \oplus f_1$.
Fig. 6. Transeunt triangle composition
By the inductive hypothesis, the output of the upper transeunt triangle is the ANF representation of $f_{0 \to x_1} \oplus f_{1 \to x_1}$. The input to the $(n+1)$-variable transeunt triangle in Fig. 6b is the truth table representation of $f_{0 \to x_1}\bar{x}_1 \vee f_{1 \to x_1} x_1$. The output is the ANF representation of $f_{0 \to x_1} \oplus (f_{0 \to x_1} \oplus f_{1 \to x_1})x_1$, which represents the same function.
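Theorem 4.3 can be checked numerically on small cases. The Python sketch below models the full transeunt triangle as successive rows of adjacent-pair exclusive-ORs and reads the result off one edge of the triangle. Which edge counts as the "output" depends on how the figure is drawn, so taking the first element of each row is an assumption here, as are the function name and the bit ordering (x_k as bit k-1 of the index).

```python
def transeunt_triangle(bits):
    """Build the XOR triangle on `bits` and return its left edge.

    Each new row is the XOR of adjacent pairs of the previous row;
    the first element of every row is collected as the output.
    """
    row = list(bits)
    left_edge = []
    while row:
        left_edge.append(row[0])
        row = [row[i] ^ row[i + 1] for i in range(len(row) - 1)]
    return left_edge

# Truth table of f = x1 x2 XOR x3 x4 (n = 4)
tt = [((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))
      for x in range(16)]

anf = transeunt_triangle(tt)       # truth table -> ANF
back = transeunt_triangle(anf)     # ANF -> truth table

print([k for k, c in enumerate(anf) if c])   # indices of ANF terms present
print(back == tt)
```

The two nonzero coefficients sit at indices 0b0011 and 0b1100, that is, the terms x1 x2 and x3 x4, and applying the triangle a second time recovers the truth table.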
Reduced Transeunt Triangle
We note that, in Fig. 6b, only one of the dotted-line triangles embeds a transeunt triangle (the left dotted-line triangle). That is, all but one of these triangles can be replaced by a single 2-input 1-output exclusive-OR gate. Doing this yields the reduced transeunt triangle. Fig. 5b shows the reduced transeunt triangle for $n = 3$. In this case, only 12 2-input 1-output exclusive-OR gates are needed, compared to 28 gates for the full transeunt triangle.
Proof: The number of gates in the full transeunt triangle is $\sum_{i=1}^{2^n-1} i = 2^{n-1}(2^n-1)$. The number of gates $r_n$ in the reduced transeunt triangle is given by the recurrence relation $r_n = 2r_{n-1} + 2^{n-1}$, with initial condition $r_1 = 1$. Solving yields $r_n = n2^{n-1}$. The fact that this is the smallest possible can be seen as follows.
Order the inputs so that they are in lexicographical order, $00\ldots00$, $00\ldots01$, $\ldots$, and $11\ldots11$, and construct a minimal balanced transeunt triangle so that the outputs are in lexicographical order. Each output bit indexed by $o_1 o_2 \ldots o_n$ is the exclusive-OR of all input bits indexed by $i_1 i_2 \ldots i_n$ such that $i_j \le o_j$ for all $j$. For example, output bit $00\ldots00$ is driven by input bit $00\ldots00$, and no gate is needed. Output bit $00\ldots01$ is driven by a gate with input bits $00\ldots00$ and $00\ldots01$, and one gate is needed. Specifically, each output bit is the root of a full binary tree, where the leaves are driven by input bits whose index is the same as the output node's index where some 1's may be changed to 0's. Input bit $00\ldots00$ is in the binary tree of every output. Let $wt(o_1 o_2 \ldots o_n)$ be the number of 1's in $o_1 o_2 \ldots o_n$. It follows that the number of gates in a balanced transeunt triangle is bounded below by the total number of 1's among all binary $n$-tuples, which is $n2^{n-1}$.
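The gate counts in this argument are easy to confirm numerically. The following Python sketch (function names are illustrative) evaluates the row-by-row count for the full triangle, iterates the recurrence for the reduced triangle, and counts the 1's among all binary n-tuples for the lower bound.

```python
def full_gate_count(n):
    """Rows below the 2^n inputs hold 2^n - 1, 2^n - 2, ..., 1 gates."""
    return sum(range(1, 1 << n))

def reduced_gate_count(n):
    """Iterate r_1 = 1, r_k = 2 r_{k-1} + 2^(k-1)."""
    r = 1
    for k in range(2, n + 1):
        r = 2 * r + (1 << (k - 1))
    return r

def total_ones(n):
    """Total number of 1's among all binary n-tuples."""
    return sum(bin(x).count("1") for x in range(1 << n))

for n in range(1, 7):
    print(n, full_gate_count(n), reduced_gate_count(n), total_ones(n))
```

For n = 3 the printed row gives 28 gates for the full triangle and 12 for the reduced one, matching the figures quoted for Fig. 5b.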
In addition, the reduced transeunt triangle yields smaller delay than the full transeunt triangle. It is straightforward to show the following. Since the full and reduced transeunt triangles are balanced, the delay to an output from any of the inputs is identical.
Experimental Results
Speed-up Achievable by the Reconfigurable Computer
We compare the computation time required by an SRC-6 reconfigurable computer with the time required by a conventional computer. In our case, this is an Intel Xeon processor running at 2.8 GHz, which is one of two conventional microprocessors associated with the SRC-6. The program, written in C, computes the nonlinearity of n-variable functions, forming the distribution of functions by nonlinearity. Similarly, the time it takes to do the same calculation on the SRC-6 can be calculated since the throughput is one function per clock period. The results are shown in Table 3. Speed-up factors range from 39.9× for n = 2 to 62,111× for n = 8. Note that the speed-up factor should nearly quadruple for each increase in n by 1. On the PC, the computation time doubles for each increase in n because the number of affine functions doubles. Similarly, the number of Ones Count operations also doubles. On the SRC-6, however, only the circuit size increases; the throughput of one function per clock cycle remains the same.
The computation times for 2 ≤ n ≤ 5 shown in Table 3 were achieved by programs that enumerated all $2^{2^n}$ n-variable functions. The computation times for 6 ≤ n ≤ 8 for the PC were obtained by running the C program over a fraction of the functions and then prorating to compute the time had all functions been enumerated. Although the computation time on the SRC-6 for these values of n is much less, it is still excessive, and this computation could not be done. However, the speed-up applies when we enumerate a sufficiently small subset of all functions. For example, we enumerated all 6-variable functions with degree 3 or less and, in so doing, enumerated all bent functions [16] using the theorem by Rothaus [15]. As discussed in Section 3, this computation required 6.2 minutes. Had this computation been done on the PC, it would have taken 6805.9 × 5.7 minutes, or 27 days.
We achieved the 62,111 speed-up associated with n = 8 in Table 3 in computing the distribution of rotation symmetric functions, as described in Section 5.4.
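The scaling argument and the 27-day estimate in this section can be reproduced with a few lines of arithmetic. The sketch below is a rough model rather than a measurement: it assumes that the software cost per function is proportional to the number of affine functions times the truth table length, and it takes the 6805.9× speed-up and the 5.7-minute SRC-6 time from the text. The function name is hypothetical.

```python
def pc_operations_per_function(n):
    """Assumed cost model: 2^(n+1) affine functions, each compared
    against a truth table of 2^n bits."""
    return (1 << (n + 1)) * (1 << n)

# Growth in per-function software work when n increases by 1
ratio = pc_operations_per_function(7) / pc_operations_per_function(6)

# PC time for the degree <= 3 enumeration of 6-variable functions
speedup = 6805.9          # from the text
src6_minutes = 5.7        # from the text
pc_days = speedup * src6_minutes / (60 * 24)

print(ratio)
print(f"{pc_days:.1f} days")
```

Under this model the per-function work grows by a factor of 4 per added variable while the SRC-6 throughput stays at one function per clock, which is consistent with the near-quadrupling of the speed-up noted above.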
Number of 6-Variable Bent Functions
The computation described in the previous section verified Preneel's [13] result that there are 5,425,430,528 bent functions on 6 variables. We further showed that 1,777,664 of these functions, or 0.03%, have degree 2. All of the remaining functions have degree 3. Table 4 shows the resource usage on the Xilinx Virtex2 Pro.
Nonlinearity of 6-Variable Homogeneous Boolean Functions
In the search for trends in bent function properties, it is useful to examine the nonlinearity distribution of homogeneous Boolean functions. There are $\sum_{k=0}^{6} \left(2^{\binom{6}{k}} - 1\right) = 1{,}114{,}237$ 6-variable homogeneous functions. Fig. 7 shows the distribution of 6-variable homogeneous functions by nonlinearity and degree, as computed on the FPGA. The vertical axis shows $\log_2$ of the number of functions. For example, there are 63 homogeneous functions of nonlinearity 0 and degree 1; these are the linear functions.
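The count of homogeneous functions can be verified directly from the sum. In the Python sketch below, the function name is illustrative; for each degree k there are C(n, k) possible terms, and every nonempty subset of them gives one homogeneous function of that degree.

```python
from math import comb

def homogeneous_count(n):
    """Number of n-variable homogeneous functions, summed over degree k."""
    return sum(2 ** comb(n, k) - 1 for k in range(n + 1))

def homogeneous_count_by_degree(n):
    """Per-degree counts, useful for reading a degree-wise distribution."""
    return {k: 2 ** comb(n, k) - 1 for k in range(n + 1)}

print(homogeneous_count(6))
print(homogeneous_count_by_degree(6)[1])
```

The degree-1 entry is 63, which agrees with the number of linear functions cited in the text.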
The bent functions have nonlinearity 28, and Fig. 7 shows that they occur with two different degrees: 13,888 have degree 2 and 30 have degree 3. The next largest nonlinearity is 23, and again, functions exist with only degrees 2 and 3. For degrees 3, 4, and 5, there is a bell-like distribution across nonlinearity. This information, when combined with the same data for higher n, could lead to further reduction in the number of test functions, resulting in the ability to find more bent functions without increasing computation time.
Fig. 7. Distribution of homogeneous 6-variable functions by nonlinearity and degree
Table 5 shows the resource usage on the Xilinx Virtex2 Pro.
A function $f$ is rotation symmetric if $f(x_1, x_2, x_3, \ldots, x_n) = f(x_n, x_1, x_2, \ldots, x_{n-1})$.
In a rotation symmetric function, "rotating" an assignment of values to the variables leaves the function unchanged. Rotation symmetric functions have interesting properties [6], and there is evidence to suggest that this class is rich in bent functions. It is conjectured [7] that the weight and nonlinearity of a third degree homogeneous rotation symmetric function are identical. Fig. 8 shows the distribution of 8-variable rotation symmetric functions by nonlinearity. This shows that more rotation symmetric functions have nonlinearity around 110 than other values. Relatively few have low nonlinearity (0-75) or high nonlinearity (> 113). This distribution resembles the distribution of nonlinearity over all functions, which is known only for n = 4 [1]. Table 6 shows the resource usage on the Xilinx Virtex2 Pro.
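The rotation symmetry condition groups the input assignments into orbits, and a rotation symmetric function is constant on each orbit. The Python sketch below illustrates this; the function names and the bit ordering are assumptions made here, and the orbit count it reports is computed by the code rather than taken from the text.

```python
def rotate(x, n):
    """Cyclically rotate the n-bit assignment x by one position."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)

def rotation_orbits(n):
    """Partition all n-bit assignments into orbits under rotation."""
    seen = set()
    orbits = []
    for x in range(1 << n):
        if x in seen:
            continue
        orbit = []
        y = x
        while y not in seen:
            seen.add(y)
            orbit.append(y)
            y = rotate(y, n)
        orbits.append(orbit)
    return orbits

def is_rotation_symmetric(tt, n):
    """True if the truth table is unchanged by rotating its input."""
    return all(tt[x] == tt[rotate(x, n)] for x in range(1 << n))

orbits8 = rotation_orbits(8)
print(len(orbits8))   # one free truth-table bit per orbit

# Example: a function that depends only on the number of 1's is rotation symmetric
majority = [1 if bin(x).count("1") > 4 else 0 for x in range(1 << 8)]
print(is_rotation_symmetric(majority, 8))
```

Since one output value can be chosen freely per orbit, the number of orbits fixes the size of the search space: a counter with one bit per orbit would enumerate every rotation symmetric function exactly once.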
