Abstract. This paper presents a heuristic cost minimization approach to synthesizing linear reversible circuits. Two bidirectional linear reversible circuit synthesis methods are introduced: the Alternating Elimination with Cost Minimization method (AECM) and the Multiple CNOT Gate method (MCG). Algorithms, example syntheses, and extensions to these methods are presented. An MCG variant which incorporates line reordering is introduced. Tests comparing the new cost minimization methods with the best known method for large circuits are presented. Results show that, of the three methods, MCG had the lowest average CNOT gate counts for linear reversible circuits of up to 24 lines, and AECM had the lowest counts between 28 and 60 lines.
Introduction
Linear reversible circuits, which employ only controlled-NOT (CNOT) gates, play a fundamental role in both reversible and quantum computing. The most basic form of linear reversible circuit synthesis is built on a GF(2)-based variant of Gaussian Elimination, which takes an invertible n×n Boolean matrix M as its input and produces an n×n linear reversible circuit as its output [1, 2]. Using GF(2) input vector x and output vector y, this circuit performs the function y = Mx. Synthesis using Gaussian Elimination produces circuits with $O(n^2)$ CNOT gates. "Algorithm 1" of [1] applies the "Method of the Four Russians" for GF(2) matrix inversion [3, 4] to the case of linear reversible circuit synthesis. By processing two or more matrix columns simultaneously, "Algorithm 1" produces circuits with $O(n^2/\log_2 n)$ CNOT gates in $O(n^3/\log_2 n)$ time.
Since the introduction of "Algorithm 1", little else has been published about reducing the gate count in linear reversible circuit synthesis. While "Algorithm 1" is "asymptotically optimal up to a multiplicative constant" [1], it is too simplistic to find an exact minimum solution to the function $y = [x_1,\ x_1 \oplus x_2,\ x_1 \oplus x_2 \oplus x_3,\ x_1 \oplus x_2 \oplus x_3 \oplus x_4]^T$.
As an alternative to "Algorithm 1", two bidirectional linear reversible circuit synthesis methods, the Alternating Elimination with Cost Minimization method (AECM) and the Multiple CNOT Gate method (MCG), are introduced herein. These methods were developed to synthesize circuits of up to 64 lines (a.k.a. wires) more efficiently, without the use of ancilla lines. The efficiency of AECM, MCG, and "Algorithm 1" is compared in Section 4.
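The basic Gaussian Elimination synthesis described above can be sketched as follows. This is a minimal illustration, not "Algorithm 1" and not the paper's code. It assumes matrices are lists of 0/1 rows, lines are 0-indexed, and CNOT(c, t) sets x_t ← x_t ⊕ x_c, which adds row c into row t of the circuit matrix. The names `gaussian_synthesis` and `apply_gates` are hypothetical; `apply_gates` also performs the usual verification step of applying a gate list to the identity matrix.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Gate = Tuple[int, int]  # (control, target), 0-indexed lines


def apply_gates(gates: List[Gate], n: int) -> Matrix:
    """Apply a CNOT gate list to the identity; the result M satisfies y = Mx."""
    m = [[int(i == j) for j in range(n)] for i in range(n)]
    for c, t in gates:
        # CNOT(c, t): x_t <- x_t XOR x_c, i.e. row t ^= row c.
        m[t] = [a ^ b for a, b in zip(m[t], m[c])]
    return m


def gaussian_synthesis(mat: Matrix) -> List[Gate]:
    """Plain GF(2) Gauss-Jordan synthesis of an invertible matrix."""
    n = len(mat)
    m = [row[:] for row in mat]
    ops: List[Gate] = []

    def add_row(src: int, dst: int) -> None:
        m[dst] = [a ^ b for a, b in zip(m[dst], m[src])]
        ops.append((src, dst))

    for col in range(n):
        if m[col][col] == 0:
            # Forward substitution: a lower row must hold a 1 in this column.
            pivot = next(r for r in range(col + 1, n) if m[r][col])
            add_row(pivot, col)
        for r in range(n):
            if r != col and m[r][col]:
                add_row(col, r)  # clear every other 1 in column col
    # The row operations reduce M to I; since each CNOT is self-inverse,
    # the circuit is the same operations applied in reverse order.
    return ops[::-1]
```

On the four-line function above, this sketch returns six gates, whereas the cascade CNOT(1, 2), CNOT(2, 3), CNOT(3, 4) (1-indexed, control first) realizes the same function with three. A fixed elimination order can therefore miss the minimum.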
AECM is built on the Alternating Elimination method [2], while MCG is based solely on cost minimization and lies outside the Gaussian Elimination family of methods. Alternating Elimination extends the Gaussian Elimination approach of forward substitution and backward elimination by processing diagonal matrix elements iteratively, solving in each iteration the row and column that intersect at the current diagonal element. Since there are n! possible orderings of the diagonal matrix elements, Alternating Elimination can generate a large number of functionally equivalent circuits with a range of CNOT gate counts.
Both AECM and MCG lend themselves to parallelization and can be extended to perform deeper, albeit slower, syntheses. Whereas "Algorithm 1" solves columns before rows in a fixed sequence, AECM and MCG solve rows and columns in a data-dependent order, not fixed in advance, that provides the greatest cost reduction. Both AECM and MCG determine cost by means of a heuristic function which depends on a GF(2) linear function's matrix and its inverse. When synthesis begins, these matrices correspond to the input function specification; as CNOT gates are synthesized, they correspond to a remainder function that more closely resembles the identity matrix. The main heuristic cost function is built from two counts. The first is the number of differences between a linear function's matrix and the identity matrix. The second is the number of differences between the inverse of that linear function's matrix and the identity matrix. An alternative cost function, which facilitates synthesis of a permutation of an input linear reversible function specification, will be discussed in Section 3.
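One reading of the main heuristic cost can be sketched as below. The surrounding text does not give the formula, so this sketch assumes the two difference counts are simply added; that assumption is consistent with the Section 3 example, where a cost of 0 coincides with both the remainder function and its inverse equaling the identity matrix. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def gf2_inverse(mat: Matrix) -> Matrix:
    """Invert a square GF(2) matrix by Gauss-Jordan elimination on [M | I]."""
    n = len(mat)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def distance_to_identity(mat: Matrix) -> int:
    """Number of entries in which mat differs from the identity matrix."""
    return sum(v != int(i == j) for i, row in enumerate(mat) for j, v in enumerate(row))


def heuristic_cost(mat: Matrix) -> int:
    # Assumption: the matrix count and the inverse count are simply added.
    return distance_to_identity(mat) + distance_to_identity(gf2_inverse(mat))
```

For the four-line function in the Introduction, the matrix differs from the identity in 6 entries and its inverse in 3, giving a cost of 9.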
The organization of this paper is as follows. Section 2 introduces and discusses algorithms for AECM and MCG. Section 3 illustrates the algorithm flow of a nonconvergent MCG synthesis of a 5×5 linear function; next, it compares AECM, MCG, and "Algorithm 1" syntheses of a 6×6 linear function; and lastly, it illustrates line-reordering MCG synthesis. Section 4 shows numerical results of comparative testing of the different synthesis methods. Section 5 briefly discusses additional strategies to improve AECM and MCG. Section 6 concludes the paper.
Algorithms and Discussion
Algorithms
Discussion
Because AECM is based on Alternating Elimination, it always converges to a solution for any linear function matrix given as input [2]. The AECM method iteratively compares O(n) matrix diagonalizations and then commits to the diagonalization which produces the greatest ratio of cost reduction per CNOT gate. In contrast to MCG, which requires the cost of the remainder function to decrease with each iteration, AECM can commit to CNOT gate sequences which cause the cost to increase, and therefore it cannot be trapped in a local minimum.
The AECM diagonalization function has four stages, and in each stage the changes in cost of candidate CNOT gates are compared. The first stage performs preprocessing through an O(n) search to find row and column forward substitutions which lower the cost by at least two. Requiring a cost reduction of at least two is based on testing which showed that, in over half of the syntheses examined, it produced lower CNOT gate counts than either requiring a reduction of at least one or skipping the preprocessing stage. Requiring a reduction of at least three was superior in some instances and inferior in others. Each CNOT gate synthesized in this stage replaces two or more CNOT gates which would otherwise have been synthesized in the third and fourth stages.
If the diagonal matrix element associated with the current iteration is 0, a second stage performs forward substitution. In this stage either a row or a column forward substitution is chosen through an O(n) search for the CNOT gate which places a 1 on the diagonal and results in the lowest-cost remainder function. When a forward substitution is necessary, a check is made to ensure that the same CNOT gate was not already synthesized in the first stage. This situation is unusual but possible. In such cases the CNOT gate list can be rearranged to detect pairs of identical CNOT gates; because CNOT gates are self-inverse [2], all detected identical pairs can be erased.
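The gate-list cleanup mentioned above can be sketched as follows, assuming gates are (control, target) pairs listed in circuit order. The sketch relies on the standard commutation rule for CNOT gates: two CNOTs commute unless the target of one is the control of the other. A repeated gate is erased together with an earlier copy only when every gate in between commutes with it, so the circuit's function is unchanged. This is one illustrative rearrangement strategy, not necessarily the one used in AECM, and the names are hypothetical.

```python
from typing import List, Tuple

Gate = Tuple[int, int]  # (control, target)


def commute(g: Gate, h: Gate) -> bool:
    """Two CNOT gates commute unless one's target is the other's control."""
    return g[1] != h[0] and h[1] != g[0]


def cancel_pairs(gates: List[Gate]) -> List[Gate]:
    """Erase pairs of identical CNOT gates that can be moved next to each other."""
    out: List[Gate] = []
    for g in gates:
        # Scan backwards for an identical gate reachable through commuting gates.
        for k in range(len(out) - 1, -1, -1):
            if out[k] == g:
                del out[k]  # g followed by g is the identity, so both vanish
                break
            if not commute(out[k], g):
                out.append(g)  # blocked: g cannot move past out[k]
                break
        else:
            out.append(g)  # no identical gate found (or list was empty)
    return out
```

The same commutation rule explains why CNOT(1, 2) followed by CNOT(3, 4) is equivalent to the reverse order.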
In the third stage, O(n) row-based backward eliminations are performed to process column elements equal to 1. Unlike Gauss-Jordan Elimination, which performs eliminations using the diagonally intersecting row, here each row elimination employs an O(n) search to find the lowest-cost backward elimination operation. Similarly, in the fourth stage O(n) column-based backward eliminations are performed to process row elements equal to 1, each employing an O(n) search to find the lowest-cost backward elimination operation.
Performing one row or column addition with a cost difference computation takes
In order to support partial syntheses, the AECM algorithm uses the parameter threshold. Using AECM with threshold = 0 causes a complete synthesis to be performed. With larger threshold values, synthesis terminates when the cost of the remainder function, c1, drops below threshold. In the algorithm's outermost loop, CNOT gate selection is performed by comparing gain3 with gain2. for remainder function F at iteration k. The AECM algorithm can be extended to handle cases in which these ratios are equal, thus facilitating extensions such as recursion and probabilistic gate selection.
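Comparing candidates by their ratio of cost reduction per CNOT gate, and detecting ties between equal ratios, can be sketched with exact rational arithmetic. The candidate record and the function name are hypothetical; the point is that ratios such as 4/2 and 6/3 compare as exactly equal, which is what a tie-handling extension needs.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

# Hypothetical candidate record: (label, cost reduction, number of CNOT gates > 0).
Candidate = Tuple[str, int, int]


def best_by_gain(cands: Sequence[Candidate]) -> List[str]:
    """Return every candidate tied for the highest cost reduction per CNOT gate.

    Fractions keep the comparison exact, so equal ratios are reported as ties
    instead of being separated by floating-point rounding."""
    gains = {label: Fraction(delta, gates) for label, delta, gates in cands}
    top = max(gains.values())
    return [label for label, g in gains.items() if g == top]
```

A probabilistic variant could then pick one of the returned labels at random, while a recursive variant could explore each of them.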
The MCG synthesis method performs synthesis with linear functions composed of two CNOT gates; in general, this approach can be extended to three or more CNOT gates at the expense of increased computation time. The two-CNOT-gate functions fall into one of three types: 1) functions of two elementary row operations, corresponding to two CNOT gates synthesized from output towards input; 2) functions of two elementary column operations, corresponding to two CNOT gates synthesized from input towards output; 3) functions of one elementary row operation and one elementary column operation, representing one CNOT gate synthesized from output towards input and another synthesized from input towards output. The MCG method iteratively compares the cost of applying all possible two-CNOT-gate functions and commits to the pair of CNOT gates which produces the greatest reduction in cost. If the cost reaches a local minimum, synthesis temporarily switches to AECM until the cost drops below the local minimum cost; a flag is then set indicating that MCG failed to converge, and MCG synthesis resumes. In each iteration MCG retrieves two-CNOT-gate functions from an O(n
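A single MCG iteration can be sketched by brute force as below. Several assumptions are not taken from the paper. The cost is assumed to be the sum of the two difference counts. Peeling a CNOT off the output side is modeled as a row operation on the remainder M (a column operation on its inverse), and peeling one off the input side as a column operation on M (a row operation on its inverse). All ordered pairs of such operations are tried, which covers the three pair types along with their redundant orderings. The names `apply_op` and `mcg_step` are hypothetical, and precalculation and the AECM fallback are omitted.

```python
from itertools import product
from typing import List, Optional, Tuple

Matrix = List[List[int]]
Op = Tuple[str, int, int]  # ("out" | "in", control, target)


def distance_to_identity(mat: Matrix) -> int:
    return sum(v != int(i == j) for i, row in enumerate(mat) for j, v in enumerate(row))


def apply_op(m: Matrix, minv: Matrix, op: Op) -> Tuple[Matrix, Matrix]:
    """Peel one CNOT(c, t) off the output side (row op) or input side (column op)."""
    side, c, t = op
    m, minv = [r[:] for r in m], [r[:] for r in minv]
    if side == "out":  # M' = E M: row t ^= row c; M'^-1 = M^-1 E
        m[t] = [a ^ b for a, b in zip(m[t], m[c])]
        for row in minv:
            row[c] ^= row[t]
    else:  # M' = M E: column c ^= column t; M'^-1 = E M^-1
        for row in m:
            row[c] ^= row[t]
        minv[t] = [a ^ b for a, b in zip(minv[t], minv[c])]
    return m, minv


def mcg_step(m: Matrix, minv: Matrix) -> Optional[Tuple[Op, Op]]:
    """Return the lowest-cost two-CNOT-gate function, or None when no pair
    lowers the cost (a local minimum)."""
    n = len(m)
    ops = [(s, c, t) for s in ("out", "in") for c in range(n) for t in range(n) if c != t]
    best, best_cost = None, distance_to_identity(m) + distance_to_identity(minv)
    for p, q in product(ops, repeat=2):
        m2, minv2 = apply_op(*apply_op(m, minv, p), q)
        cost = distance_to_identity(m2) + distance_to_identity(minv2)
        if cost < best_cost:
            best, best_cost = (p, q), cost
    return best
```

For the four-line example function from the Introduction (cost 9), the best pair found by this sketch lowers the cost to 2.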
Like AECM, MCG can be extended to perform more sophisticated gate selections in iterations where multiple minimum-cost alternatives exist. This is demonstrated in Section 4 using probabilistic gate selection. MCG's speed can also be improved by using precalculated two-CNOT-gate functions. In the above MCG algorithm all possible CNOT gate sequences are generated, and many are redundant. For instance, the two-CNOT-gate function CNOT(1, 2) followed by CNOT(3, 4) is equivalent to CNOT(3, 4) followed by CNOT(1, 2). If MCG is extended to use three-CNOT-gate functions, a greater variety of redundant sequences will be generated.
The following summarizes MCG synthesis of the linear function in Fig. 2 to produce the linear reversible circuit in Fig. 6. Initially the convergence flag is set to true and the cost is computed to be 20. In the first iteration, the search for a two-CNOT-gate function which would lower the cost to 19 or less fails. The convergence flag is set to false, and AECM is called to perform a partial synthesis with threshold = 19. AECM selects the first row and column to be diagonalized and synthesizes four CNOT gates, a1 through a4, resulting in the remainder function shown in Fig. 3. The cost of this remainder function and its inverse is 16. In the second iteration, gates b1 and b2 are found to reduce the cost to 11, resulting in the second remainder function shown in Fig. 4. In the third iteration, gates c1 and c2 reduce the cost to 5, resulting in the third remainder function shown in Fig. 5. In the fourth iteration, gates d1 and d2 reduce the cost to 0, so that both the remainder function and its inverse equal the identity matrix, indicating that synthesis is complete. The final CNOT gate count of the circuit in Fig. 6 is 10 (four gates from AECM plus three pairs from MCG), which is one gate above the exact minimum.
The convergence flag plays no role in synthesis but was created to be used in statistics that correlate the increase in total CNOT gate count with nonconvergence.
The linear function in Fig. 2 is unusual, as tests show that MCG converges for a majority of linear functions representing circuits of 32 lines or fewer. A linear function, introduced in [1], for which MCG converges is shown in Fig. 7. Fig. 8 illustrates MCG, AECM, and "Algorithm 1" syntheses of this linear function. The total CNOT gate counts are 12 for MCG, 13 for AECM, and 15 for "Algorithm 1". which is an alternative cost function based on the sparseness of a linear function and its inverse. This creates a synthesis method that incrementally approaches a low-cost permutation of the identity matrix as it searches for efficient two-CNOT-gate functions. Synthesizing a permutation of a linear function is useful when the output line order is flexible or when the cost of a SWAP gate is negligible in comparison to the cost of a CNOT gate. The resulting line-reordered MCG circuit shown in Fig. 9 employs only eight CNOT gates. The circuit realization of the linear function in Fig. 10 can be described by the output vector $y_{Permutation} = [y_2, y_1, y_4, y_6, y_3, y_5]^T$.
There are some complications in using line-reordering MCG not described in the above MCG algorithm. The last operation in the MCG algorithm transfers CNOT gates in the output-side CNOT gate list to the input-side CNOT gate list. In the line-reordering MCG variant this transfer must take into account the final permutation remainder function. Therefore, for each CNOT gate matrix $M_{output}$, a CNOT gate matrix $M_{input}$ is computed as $M_{input} = P \cdot M_{output} \cdot P^{-1}$. Permutation matrices must be stored with CNOT gate lists for verification and circuit integration purposes. Verification of linear reversible circuit synthesis is usually a straightforward task of applying a CNOT gate list to the identity matrix and testing the result and the input linear reversible function specification for equivalence.
In line-reordering MCG verification, each CNOT gate list must apply the associated permutation to the input linear function before testing.
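The transfer rule M_input = P · M_output · P⁻¹ can be checked numerically for single CNOT gate matrices. This sketch assumes P is defined by P·e_j = e_π(j) for a permutation π of 0-indexed lines; the paper's convention for P is not restated here, and the helper names are hypothetical. Under that convention, conjugating CNOT(c, t) by P yields CNOT(π(c), π(t)), so transferring a gate amounts to relabeling its lines.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def gf2_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product over GF(2)."""
    cols = len(b[0])
    return [[sum(row[k] & b[k][j] for k in range(len(b))) % 2 for j in range(cols)] for row in a]


def cnot_matrix(c: int, t: int, n: int) -> Matrix:
    """E(c, t) = I + e_t e_c^T: the matrix of CNOT(control=c, target=t)."""
    e = [[int(i == j) for j in range(n)] for i in range(n)]
    e[t][c] = 1
    return e


def perm_matrix(pi: Sequence[int]) -> Matrix:
    """P with P e_j = e_{pi[j]} (assumed convention): column j has its 1 in row pi[j]."""
    n = len(pi)
    return [[int(pi[j] == i) for j in range(n)] for i in range(n)]


def transfer(c: int, t: int, pi: Sequence[int]) -> Matrix:
    """M_input = P . M_output . P^-1 for a single CNOT gate matrix."""
    p = perm_matrix(pi)
    p_inv = [list(col) for col in zip(*p)]  # a permutation matrix's inverse is its transpose
    return gf2_matmul(gf2_matmul(p, cnot_matrix(c, t, len(pi))), p_inv)
```

With this convention, a stored permutation plus a relabeling of each gate's lines is enough to move gates across the permutation remainder.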
Tests
The first set of tests was performed for circuits with 8 to 64 lines; the results are shown in Table 1. For each number of lines, 100 randomized linear reversible circuits were synthesized with multiple methods. The n-line circuit randomization function applied $2n^2$ operations to the identity matrix, each representing either a random distant CNOT gate or a random distant SWAP gate. The MCG method became increasingly slow as the number of lines increased, so MCG testing was stopped at 40 lines. The average CNOT gate counts showed that MCG tended to outperform AECM for functions of up to 24 lines, though which method was best was data dependent. AECM tended to outperform MCG from 28 through 60 lines. At 64 lines "Algorithm 1" tended to outperform AECM, though again which method was best was data dependent.
The second set of tests compared the three synthesis methods with exact minimum syntheses of all 9999360 linear functions of size 5×5 [5]. Table 2 shows the frequency distribution of total CNOT gate counts from exact minimum synthesis of all 5×5 linear functions, a majority of which require either eight or nine CNOT gates. MCG achieved the exact minimum gate count 7175807 times (71.76%), AECM 5886350 times (58.87%), and "Algorithm 1" 474738 times (4.75%). MCG failed to converge for 89 (< 0.001%) linear functions, each time producing a gate count above the exact minimum. This provides strong evidence that nonconvergence in MCG is correlated with increased CNOT gate counts.
The last set of tests compared 1000 syntheses of the 16×16 linear function shown in Fig. 11 using probabilistic variants of AECM and MCG, named AECMP and MCGP respectively. In these variants, whenever two candidate CNOT gate sequences are compared and found to have equal cost or gain, one sequence is chosen at random and the other discarded.
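The randomization procedure and the size of the exhaustive 5×5 test set can be sketched as follows. The paper does not specify how lines or gate types are drawn, so this sketch assumes a uniformly random pair of distinct lines and an equal chance of CNOT or SWAP at each of the 2n² steps; "distant" is not modeled. The count of invertible n×n GF(2) matrices, the product of (2ⁿ − 2ⁱ) for i = 0, …, n − 1, gives the number of n-line linear reversible functions. Function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[int]]


def random_linear_function(n: int, rng: random.Random) -> Matrix:
    """Apply 2n^2 random CNOT or SWAP operations to the n x n identity matrix.

    Assumption: each step draws two distinct lines uniformly and picks CNOT
    or SWAP with equal probability."""
    m = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(2 * n * n):
        a, b = rng.sample(range(n), 2)
        if rng.random() < 0.5:
            m[b] = [x ^ y for x, y in zip(m[b], m[a])]  # CNOT(a, b)
        else:
            m[a], m[b] = m[b], m[a]  # SWAP(a, b)
    return m


def count_invertible(n: int) -> int:
    """|GL(n, 2)|: the number of invertible n x n matrices over GF(2)."""
    total = 1
    for i in range(n):
        total *= 2 ** n - 2 ** i
    return total
```

For n = 5 the count is 31 · 30 · 28 · 24 · 16 = 9999360, matching the size of the exhaustive test set above.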
The purpose of this test was to examine the typical range of total CNOT gate counts produced by AECM and MCG and to determine the possible benefits of a multiple-pass approach. The results in Table 3 indicate that MCGP showed a difference of 18 CNOT gates between the best and worst syntheses, with a typical spread around the median synthesis of just under four CNOT gates. AECMP showed a difference of 11 CNOT gates between the best and worst syntheses, with a typical spread around the median of just above three CNOT gates. Comparing these results, it appears that both MCGP and, to a lesser degree, AECMP are likely to benefit from a multiple-pass approach, which avoids the potentially high total CNOT gate count of a single-pass synthesis.
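A multiple-pass approach can be sketched with a deliberately simple randomized synthesizer: row-only Gauss-Jordan elimination whose diagonal processing order is drawn at random, echoing the observation that different orderings of the diagonal elements yield different gate counts. This is a stand-in for illustration, not AECMP or MCGP, and all names are hypothetical. Each pass is independent, so passes could also run in parallel.

```python
import random
from typing import List, Sequence, Set, Tuple

Matrix = List[List[int]]
Gate = Tuple[int, int]  # (control, target), 0-indexed


def synthesize_in_order(mat: Matrix, order: Sequence[int]) -> List[Gate]:
    """Gauss-Jordan over GF(2), diagonalizing columns in the given order."""
    m = [row[:] for row in mat]
    ops: List[Gate] = []
    done: Set[int] = set()

    def add_row(src: int, dst: int) -> None:
        m[dst] = [a ^ b for a, b in zip(m[dst], m[src])]
        ops.append((src, dst))

    for k in order:
        if m[k][k] == 0:
            # Pivot from an unprocessed row: it has zeros in processed columns,
            # so adding it does not disturb columns already diagonalized.
            pivot = next(r for r in range(len(m)) if r not in done and r != k and m[r][k])
            add_row(pivot, k)
        for r in range(len(m)):
            if r != k and m[r][k]:
                add_row(k, r)
        done.add(k)
    return ops[::-1]  # reverse the row operations to obtain the circuit


def multipass(mat: Matrix, passes: int, seed: int = 0) -> List[Gate]:
    """Run `passes` (>= 1) randomized passes and keep the shortest gate list."""
    rng = random.Random(seed)
    best: List[Gate] = []
    for i in range(passes):
        order = list(range(len(mat)))
        rng.shuffle(order)
        gates = synthesize_in_order(mat, order)
        if i == 0 or len(gates) < len(best):
            best = gates
    return best
```

For the four-line example from the Introduction, the order [3, 2, 1, 0] yields the three-gate cascade, while the natural order [0, 1, 2, 3] yields six gates, so keeping the best of several passes can pay off.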
Additional Strategies
A preprocessing strategy can be employed for some linear functions to speed up synthesis. For instance, functions with linearly separable components can be mapped into multiple smaller matrices, synthesized individually, and then mapped back to the entire circuit. Any of these smaller matrices that are permutation matrices can be synthesized quickly and optimally.
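Detecting linearly separable components can be sketched with a union-find over lines: lines i and j are grouped whenever M[i][j] = 1, so each group's outputs depend only on that group's inputs, and the matrix is block diagonal after relabeling lines. Each block can then be synthesized on its own lines. The helper names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[int]]


def separable_components(mat: Matrix) -> List[List[int]]:
    """Group lines into independent blocks (union-find over nonzero entries)."""
    n = len(mat)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if mat[i][j]:
                parent[find(i)] = find(j)  # output i depends on input j
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def submatrix(mat: Matrix, lines: List[int]) -> Matrix:
    """Restrict the matrix to one block of lines (rows and columns alike)."""
    return [[mat[i][j] for j in lines] for i in lines]
```

A block whose submatrix is a permutation matrix, such as lines {1, 3, 4} in a matrix where those lines only exchange values, can then be handed to a fast special-case routine.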
A postprocessing strategy can be employed with all linear reversible circuits. Any section of a synthesized linear reversible circuit that spans at most five lines can be mapped to a 5×5 matrix and resynthesized using an exact minimum table. AECM lends itself to this kind of optimization because at some point during processing it will have exactly five diagonal matrix elements left to process, whereas MCG may partially process all rows and columns before completing a single diagonalization.
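Extracting a circuit window of at most five lines for exact resynthesis can be sketched as below. The sketch only builds the window's matrix on the lines it touches; the exact minimum lookup table itself is assumed and not shown, and the function name is hypothetical. A window touching fewer than five lines yields a smaller matrix, which could be embedded in a 5×5 identity before lookup.

```python
from typing import List, Optional, Tuple

Gate = Tuple[int, int]  # (control, target)
Matrix = List[List[int]]


def window_matrix(gates: List[Gate], start: int, end: int,
                  max_lines: int = 5) -> Optional[Tuple[List[int], Matrix]]:
    """Return (lines, M) for gates[start:end] if the window touches at most
    max_lines lines, else None. M is expressed on the touched lines only."""
    window = gates[start:end]
    lines = sorted({q for g in window for q in g})
    if len(lines) > max_lines:
        return None
    pos = {line: k for k, line in enumerate(lines)}  # global line -> local index
    k = len(lines)
    m = [[int(i == j) for j in range(k)] for i in range(k)]
    for c, t in window:
        # CNOT(c, t) adds local row c into local row t.
        m[pos[t]] = [a ^ b for a, b in zip(m[pos[t]], m[pos[c]])]
    return lines, m
```

If a table lookup for the returned matrix gives fewer gates than `end - start`, the window can be replaced by the shorter sequence on the same lines.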
Conclusion
The bidirectional linear reversible circuit synthesis methods AECM and MCG were introduced. The main heuristic function used to represent cost in AECM and MCG was introduced, as was an alternative cost function. The alternative cost function was used to synthesize a permutation of an input linear reversible function specification, thus eliminating the need for subsequent line reordering. Both of these methods outperformed "Algorithm 1" in the majority of synthesis tests for circuits with fewer than 64 lines. All test results were verified to be accurate. Probabilistic versions of AECM and MCG were introduced and shown to benefit from a multiple-pass approach.
Although use of "Algorithm 1" seems ideal when the goal is to quickly synthesize thousands of large circuits, the test results indicate that other methods such as AECM and MCG are recommended for smaller circuits, especially when given a significant amount of processing time. Future work in this area will be to use elements from the "Method of the Four Russians" GF(2) matrix inversion approach and other search strategies with the cost minimization approaches introduced here.
