The synthesis of reversible functions has been an intensively studied research area in the last decade. Since almost all proposed approaches rely on representations of exponential size (such as truth tables and permutations), they cannot be applied efficiently to reversible functions with more than 15 variables.
Introduction and Background
Given a bijective function f : IB n → IB n , also called a reversible function, synthesis describes the problem of determining a circuit composed of reversible gates that realizes f . If this circuit consists of exactly n signal lines, the synthesis is called ancilla-free. In the last decade, ancilla-free synthesis approaches have been presented that start from a reversible function represented e.g. as truth tables (Miller et al., 2003) , permutations (Shende et al., 2003) , or Reed-Muller spectras (Maslov et al., 2007) . Since all these representations are always of exponential size with respect to n, the respective algorithms do not scale well and are thus not efficiently applicable to large reversible functions. Reversible functions and circuits play an important role in quantum computing (Saeedi and Markov, 2013) and low-power computing (Landauer, 1961; De Vos, 2010; Bérut et al., 2012) .
One of these truth table based algorithms was presented in (De Vos and Van Rentergem, 2008) . . , x n ) = (y 1 , . . . , y n ) updates the value of the input variable x i with respect to a Boolean control function c : IB n−1 → IB that is defined on all other variables. That is, the target gate computes a new value at the target output y i = x i ⊕ c(x 1 , . . . , x i−1 , x i+1 , . . . , x n ) and leaves all other output variables unaltered, i.e. y j = x j for j = i. Figure 1 shows the diagrammatic representation of a single-target gate based on Feynman's notation (1985) .
The algorithm described in (De Vos and Van Rentergem, 2008 ) makes use of the property that each reversible function f : IB n → IB n can be decomposed into three IB n → IB
(1) such that f ′ does not change in x i . Based on the truth table representation of f , the algorithm determines two control functions l and r from which f ′ = T [l, i] • f • T [r, i] can be determined, since single-target gates are self-inverse. Applying the decomposition in (1) for each variable results in 2n − 1 single-target gates 1 which composed as a circuit realize f . The algorithm always traverses the whole truth table and is therefore exponential with respect to the number of variables. Consequently, it cannot efficiently be applied to large functions.
The contributions described in this paper are as follows:
• We propose an algorithm based on a symbolic function representation to determine the control functions l and r from (1). The algorithm makes use of Boolean operations that can efficiently be implemented using binary decision diagrams (BDDs). Since BDDs allow for a more compact representation of many practical functions, the efficiency of the proposed algorithm is increased and can therefore be applied to larger functions.
• In (De Vos and Van Rentergem, 2008) only one variable ordering for the sequential application of (1) has been considered, i.e. the natural ordering x 1 , x 2 , . . . , x n . We investigate different heuristics to find variable orderings that allow for more compact synthesis results.
• The original algorithm is not applicable to partial functions, i.e. functions for which not all input/output mappings are specified. Our algorithm can be adjusted in order to support partial functions.
• Finally, we provide open source implementations for both the original truth table based approach and our proposed approach.
The majority of the proposed synthesis algorithms use Toffoli gates (Toffoli, 1980) as underlying gate library. Our proposed method uses single-target gates which can be transformed into a cascade of Toffoli gates. However, since the lower bound for required Toffoli gates in a reversible circuit is exponential with respect to the number of lines (Maslov et al., 2005; Soeken et al., 2014a) , the use of Toffoli gates in large reversible circuits may not be convenient. In contrast, the upper bound for required single-target gates in a reversible circuit is linear as described above. The algorithms presented in this paper do not necessarily consider a specific target technology. Hence, circuits obtained from our synthesis method are a reasonable intermediate gate-level representation for large reversible functions (Abdessaied et al., 2014) . It remains for future work to show how appropriate technology mappings can be derived from single-target gates. In the experimental evaluation we are making use of straight-forward mapping techniques.
Related work
In (Soeken et al., 2012b) another algorithm for synthesizing large reversible functions without adding ancilla lines has been proposed that is based on quantum multiple-valued decision diagrams (Miller and Thornton, 2006) . Compared to the approach presented in the present paper, this algorithm uses a different method to determine Toffoli gates. Our experimental evaluations show that with our technique smaller circuits with respect to the number of Toffoli gates and quantum cost can be found. The algorithm proposed in (Wille and Drechsler, 2009 ) also uses BDDs to find reversible circuits. However, the algorithm uses irreversible functions as input and embeds them into reversible functions implicitly using a hierarchical approach. The approach produces an enormous number of additional helper lines which are still far beyond from the theoretical upper bound. In contrast, the algorithms presented in this paper make use of a scalable exact embedding algorithm (Soeken et al., 2014b) and hence guarantee synthesis without adding ancilla lines.
In (Saeedi et al., 2010) a cycle-based approach to synthesize reversible functions is proposed. In this approach a decomposition algorithm is first used to extract building blocks from the input specification of the function. The input specification is represented as a permutation and is decomposed into smaller permutations, so called k-cycles, until building blocks can be used for synthesis. However, this method needs all input assignments to derive the required k-cycles and as a consequence it has the same limitations as truth table based approaches and cannot be applied to large functions. In our approach the function is decomposed symbolically using the co-factor representation of the function obtained from a BDD, which allows us to find cycles without necessarily traversing the whole input space. A similar approach has been presented in (Sasanian et al., 2009 ) but shares the same limitations as it is also based on an exponential function representation.
Outline
The remainder of the paper is organized as follows. Preliminaries are given in the next section and Sect. 3 reviews the truth table based decomposition technique from (De Vos and Van Rentergem, 2008) . Section 4 illustrates the general idea of the proposed algorithm while Sect. 5 shows special BDD operations that are commonly used in the description of the algorithm in Sect. 6. Optimization techniques targeting efficiency and the number of gates are presented in Sects. 7 and 8, respectively. The algorithm is extended for partial functions in Sect. 9. Experimental evaluations are presented in Sect. 10 before the paper is concluded in Sect. 11.
x 1 x 2 y 1 y 2 0 0 0 0 0 1 1 0 1 0 1 0 1 1 0 1 Fig. 2 . Truth table for the function f (x1, x2) = (x1 ⊕ x2, x1 ∧ x2)
Preliminaries
In this section, we introduce notation for Boolean functions, binary decision diagrams, reversible functions, and reversible circuits.
Boolean Functions
Definition 1 (Boolean function). Let IB def = {0, 1} denote the Boolean values. Then we refer to
as the set of all Boolean multiple-output functions with n inputs and m outputs, where m, n ≥ 1.
We write B n def = B n,1 for each n ≥ 1 and assume that each f ∈ B n is represented by a propositional formula over the input variables x 1 , . . . , x n . Furthermore, we assume that each function f ∈ B n,m is represented as a tuple f = (f 1 , . . . , f m ) where f i ∈ B n for each i ∈ {1, . . . , m} and hence f ( x) = (f 1 ( x), . . . , f m ( x)) for each x ∈ IB n . Output variables of a function are denoted y 1 , . . . , y m .
Example 1. Figure 2 shows the truth table of the Boolean multiple-output function
that has two input and two output variables. It represents the functionality of a halfadder.
Boolean matrices can be represented by a Boolean function as described by the following definition.
Definition 2 (Boolean matrix). A Boolean-valued 2 n × 2 m matrix A can be represented by a Boolean function f A ∈ B m+n with
where A r,c denotes the element at row r = 
Definition 3 (Co-factors). Given a Boolean function f ∈ B n over the variables x 1 , . . . , x n and a variable x i we define the positive co-factor f xi ∈ B n−1 and the negative cofactor fx i ∈ B n−1 as
and
respectively.
Definition 4 (Smoothing operator). Given a Boolean function f ∈ B n and an input x i of f , the smoothing operator ∃x i (Touati et al., 1990 ) is defined as the disjunction of both co-factors, i.e.
That is, the smoothing operator returns a function that does not depend on the variable x i anymore. Informally one can describe the smoothing operator by replacing all occurrences of x i andx i with don't cares. The smoothing operator can e.g. be used for matrix multiplication as illustrated by the following lemma.
Lemma 1 (Touati et al., 1990) . Let A be a 2 k × 2 n Boolean matrix and B be a 2 m × 2 k Boolean matrix that are represented by Boolean functions f A (x 1 , . . . , x n , y 1 , . . . , y k ) and f B (y 1 , . . . , y k , z 1 , . . . , z m ) respectively. Let
Then f C is the Boolean function representation for the 2 m ×2 n Boolean matrix C = B ·A where '·' is the matrix multiplication in the Galois field IF 2 . ✷ Definition 5 (ON-set and OFF-set). Given a Boolean function f ∈ B n the sets
are called ON-set and OFF-set of f . It can easily be seen that on(f ) ∩ off(f ) = ∅ and on(f ) ∪ off(f ) = IB n .
Definition 6 (Characteristic function). Given a function f = (f 1 , . . . , f m ) ∈ B n,m its characteristic function χ f ∈ B n+m is defined as
for each x ∈ IB n and each y ∈ IB m .
The characteristic function allows to represent any multiple-output function as a singleoutput function. It can be computed from a multiple-output function by adding to the variables {x 1 , . . . , x n } the additional output variables {y 1 , . . . , y m }:
In the remainder of this paper, we denote the characteristic function χ f of a function f by a capital letter, i.e. F .
Example 3. The function f A (x 1 , x 2 , y 1 , y 2 ) in (5) is the characteristic function of f (x 1 , x 2 ) = (y 1 , y 2 ) in (3).
Binary Decision Diagrams
Binary decision diagrams (BDD) are an established data structure for representing Boolean functions. While the general concepts are briefly outlined in this section, the reader is referred to the literature for a comprehensive overview (Bryant, 1986; Knuth, 2011) . Let x = x 1 , . . . , x n be the variables of a Boolean function f ∈ B n . A BDD representing the function f is a directed acyclic graph with non-terminal vertices N and terminal vertices T ⊆ { ⊥ , ⊤ } where N ∩ T = ∅ and T = ∅. Each non-terminal vertex v ∈ N is labeled by a variable from x and has exactly two children, low v and high v. The directed edges to these children are called low-edge and high-edge and are drawn dashed and solid, respectively. A non-terminal vertex v labeled x i represents a function denoted σ(v) given by the Shannon decomposition (Shannon, 1938) 
where σ(low v) and σ(high v) are the functions represented by the children of v with σ( ⊥ ) = 0 and σ( ⊤ ) = 1. The BDD has a single start vertex s with σ(s) = f . A BDD is ordered if the variables of the vertices on every path from the start vertex to a terminal vertex adhere to a specific ordering. Not all of the variables need to appear on a particular path, but a variable can appear at most once on any path. A BDD is reduced if there are no two non-terminal vertices representing the same function, hence the representation of common subfunctions is shared. In the following only reduced, ordered BDDs are considered and for briefness referred to as BDDs.
Multiple-output functions can be represented by a single BDD that has more than one start vertex. Common subfunctions that can be shared among the functions decrease the overall size of the BDD. In fact, many practical Boolean functions can efficiently be represented using BDDs, and efficient manipulations and evaluations are possible.
Example 4. Figure 3 (a) illustrates the Shannon decomposition from (12). A binary decision diagram for the function in Fig. 2 is given in Fig. 3(b) .
Reversible Boolean Functions
Definition 7 (Reversible function). A function f ∈ B n,m is called reversible if f is bijective, otherwise it is called irreversible. Clearly, if f is reversible, then n = m.
A reversible function f ∈ B n,n can also be represented by a permutation of {0, 1, . . . , Fig. 2 
Fig. 3. Binary decision diagrams
where nat : IB n → {0, 1, . . . , 2 n − 1} maps a bit-vector to its natural number representation. Further, f can be represented as a 2 × 2 Boolean permutation matrix Π f where
The following lemma relates the characteristic function to the permutation matrix of a reversible function.
Lemma 2. Let f ∈ B n,n be a reversible function, Π f the permutation matrix of f , and F the characteristic function of f . Then Π f is the matrix representation of F according to Definition 2. We have | on(F )| = 2 n .
Reversible Circuits
Reversible functions can be realized by reversible circuits that consist of at least n lines and are constructed as cascades of reversible gates that belong to a certain gate library. Single-target gates have already been defined in the introduction. The most common gate library consists of Toffoli gates.
Definition 8 (Toffoli gate). Mixed-polarity multiple-control Toffoli (MPMCT) gates are a subset of the single-target gates in which the control function c can be represented with one product term or c = 1. We refer to MPMCT gates as Toffoli gates in the following. The literals in the control function are also referred to as controls or control lines.
In (Shende et al., 2003) , it has been shown that any reversible function f ∈ B n,n can be realized by a reversible circuit with n lines when using Toffoli gates. That is, it is not necessary to add any temporary lines (ancilla) to realize the circuit. Note that each single-target gate can be expressed in terms of a cascade of Toffoli gates, which can be obtained from an ESOP expression (Sasao, 1993) , respectively. For drawing circuits, we follow the established conventions of using the symbol ⊕ to denote the target line, solid black circles to indicate positive control lines and white circles to indicate negative control lines. The annotated values demonstrate the computation of the gate for a given input assignment.
(b) Toffoli gate with negative literal 
Truth table based Decomposition
In (De Vos and Van Rentergem, 2008) an algorithm was presented that determines control functions l and r as in (1) to decompose a reversible function f : IB n → IB n that is represented as a truth table. When applying the algorithm for each circuit line one can obtain a reversible circuit composed of single-target gates that realizes f . The underlying theory for the algorithm is based on Young subgroups, however, for the scope of this paper it is sufficient to explain the algorithm by means of an example. Consider the truth table in Fig. 5 (a) that is described by the variables x 1 , x 2 , x 3 and y 1 , y 2 , y 3 . First, the aim is to determine two control functions l and r for single-target gates that act on the first circuit line which maps x 1 to y 1 . All values of the variables x 2 , x 3 and y 2 , y 3 are copied into a new truth table that will eventually represent f ′ and is illustrated by the two inner blocks in Fig (1) The functions defined by the truth table described by the first two blocks and by the last two blocks must be reversible, i.e. each pattern must occur exactly once. (2) The function f ′ , defined by the inner two blocks, must ensure that x ′ 1 = y ′ 1 . The reversibility of f ′ follows from both constraints and since both x 2 , x 3 and y 2 , y 3 were copied. It also follows that the functions described by the first and last two blocks can be described by single-target gates.
The values for x ′ 1 and y ′ 1 can be filled using the following procedure. We start by inserting a value at the first empty cell of x ′ 1 , in our example we choose the value 1 indicated by a rectangle in Fig. 5 (c). To satisfy the first constraint, next a 0 is inserted where the pattern repeats for x 2 and x 3 . In order to satisfy the second constraint, the 0 has to be assigned to y ′ 1 in the same row. Consequently, a 1 has to be filled for y ′ 1 where the pattern repeats. This procedure is repeated until a cell is met that has already been filled. If there are still empty cells, the overall procedure is repeated from the beginning. All numbers have been filled in Fig. 5(d) .
Remark 1. The first entry position in the procedure above can freely be chosen. It can easily be seen, that the corresponding cycle will not change, it is just entered at a different position. Also the value of the first entry can be changed without violating the constraints, if the other values are adjusted accordingly. Hence, one obtains up to 2 k different functions for l and r where k is the number of cycles. 
(f) Equalizing x 2 and y 2
Fig. 5. Truth table based decomposition
After all values for x ′ 1 and y ′ 1 have been assigned, the control functions can be determined as illustrated by means of Fig. 5(e) . The first control function l can be determined from the first two blocks by inspecting the assignments for x 2 and x 3 where x 1 and x ′ 1 are different. These are 00 and 01, hence l =x 2x3 ∨x 2 x 3 =x 2 . This can be done analogously for r and one obtains r =ȳ 2 .
This process is called variable equalization in the following. The two blocks in the middle of Fig. 5 (e) are a truth table representation of f ′ which can now be used to obtain two further single-target gates by equalizing the variables x 2 and y 2 . This process is illustrated in Fig. 5(f) where the starting points for filling numbers are again indicated
x 2 x 3 x 2 x 3 x 2 x 3 00 01 11 00 01 10 10 11
Outputs y 2 y 3 y 2 y 3 y 2 y 3 y 2 y 3 00 11 00 01 01 11 10 10 by rectangles. The control functions for the single-target gates to equalize x 2 and y 2 arex ′ 1x 3 andȳ ′ 1 , respectively. Note that the last single-target gate to equalize x 3 and y 3 can now directly be read from the truth table described by the middle two blocks in When putting all single-target gates together, one obtains a circuit that is depicted in Fig. 5(g ). Note that in this particular case, all single-target gates are also Toffoli gates.
General Idea
This section illustrates how the decomposition procedure can be described symbolically with Boolean function operations that can efficiently be implemented using BDDs. The problem with the current synthesis approach is the underlying truth table representation which grows exponentially with respect to the number of variables. As a result, the processing time of the algorithm depends essentially only on the size of the function and therefore impedes an efficient processing of functions with many variables.
Aiming for reducing the complexity of the algorithm, we propose to implement the algorithm based on BDDs, which provide a compact representation for many reversible functions of practical interest. Using BDDs allows for reducing the required memory as a first consequence. Moreover, our proposed algorithm for decomposing a function according to (1) is constructed in a manner such that it does not necessarily traverse the whole truth table; therefore reducing the run-time.
In order to better illustrate our algorithm we represent the reversible function f by its co-factor table. This table represents f by means of the co-factors of the characteristic function F of f . For each variable x i of the function there are four co-factors
yi where x i = 0 maps to y i = 1 in f , and • F pi = (F xi ) yi where x i = 1 maps to y i = 1 in f .
Remark 2. Note that this co-factor table representation is used only for illustrative purposes in this paper. In the implementation for the algorithm the reversible function is stored using BDDs and therefore in a more compact way.
Example 6. The co-factor table of the function in Fig. 5(a) is depicted in Fig. 6 and shows all four co-factors with respect to the decomposition of x 1 to y 1 and separates each of the eight function entries by its input and output assignments. Hence, the order of the entries in the co-factor table matters since input and output assignments are linked according to their position. As an example, the entry highlighted using a rectangle corresponds to the mapping 101 → 011 and the entry highlighted using a rounded rectangle corresponds to the mapping 110 → 111.
We can now describe the principle of the algorithm based on the co-factor table representation. The aim of the algorithm is to find the two control functions l and r that equalize the variables x 1 and y 1 . As a result, after applying single-target gates controlled by these functions, no pattern remains in the middle part of the co-factor table, i.e. in columns F p ′ 1 and F n ′
1
. In order to understand the moving of the entries with respect to the control functions, consider the following scenario. If l(0, 1) = 1 and r(1, 1) = 0 (entry highlighted by an rectangle in Fig. 6 ), the entry moves to the column F n1 , because the input pattern is inverted at x 1 , therefore changing the pattern to 001 → 011. Figure 7 shows all possible combinations of the evaluations of l and r and their effect on the entries in
, and F pi referred to as n, p ′ , n ′ , and p, respectively.
Example 7. Hence, when moving the highlighted entry as described in the previous example, i.e. l(0, 1) = 1, also the second pattern in column F n1 , i.e. 001 → 001 will move to column F p ′ 1 since r(0, 1) = 0 and each input and output subpattern occurs twice in f .
In order to move all entries to the outer columns, the following two conditions must hold:
(1) For the outer entries either none or both control functions must evaluate to true, hence the entry remains at the same position or is transferred to the other side. (2) For the inner entries exactly one of the control functions must evaluate to true.
It is not obvious how to determine l and r immediately from the given co-factors. In the following, we present a method which moves one entry from the middle blocks to the outside. Repeating this process until eventually all entries from the middle blocks have been moved to the outside will guarantee an equalization of the considered variable.
Example 8. We explain the method based on the co-factor table as previously presented. As pattern from the middle blocks we pick 101 → 011, which is located in column (1) 101 → 011 To determine l(x 2 , x 3 ) and r(y 2 , y 3 ), we first set l ←x 2 x 3 according to the assignment to x 2 and x 3 and r ← ⊥. This will imply the following steps:
• Because l(01) = 1, the highlighted pattern in the column F p ′ 1 is moved to F n1 .
• The original mapping 101 → 011 changes to 001 → 011, because the input at x 1 is inverted. However, at the same time also the pattern 001 → 001, currently in column F n1 , would change to 101 → 001 and be passed to column F p ′ . We call this pattern the implied pattern.
• In order to avoid this, one must additionally set r ←ȳ 2 y 3 thereby passing the implied pattern over to F p1 instead, since now both l and r evaluate to true for this pattern. The assignment of r will in turn affect another pattern, hence this process is repeated until the implied pattern is in F n ′
. Figure 8 illustrates the process for this example. The initial pattern is given in the first row and all implied patterns are given in the rows below. Input patterns are shown on the left and output patterns are shown on the right. The implication of a pattern is indicated by '⇓'. The last column shows a Boolean variable d which indicates whether function l or r is updated. It is used in the algorithm, which is described in the Sect. 6. Remark 3. In this example only one cycle had to be resolved and therefore the control functions l and r could directly be determined. If more than one cycle needs to be resolved, the illustrated process is repeated and all obtained functions are composed using the '⊕' operation for the overall control function. This procedure is described in Sect. 6 in more detail.
Characteristic Representation of Reversible Functions
The algorithm that is proposed in the following section heavily relies on the characteristic function representation F ∈ B 2n of the given reversible function f ∈ B n,n for which a circuit should be determined. The BDD representation of F allows for efficient function manipulation and evaluation, which is described in more detail in this section.
Example 9. Figure 9 illustrates how the co-factors of a characteristic function can be obtained from its BDD representation. Further, a BDD representing the characteristic 
is depicted in Fig. 9 (b).
As pointed out in Lemma 2 the characteristic function of a reversible function corresponds to its Boolean matrix representation. Since gate application of reversible functions corresponds to matrix multiplication in the permutation matrix representation, Lemma 1 tells us that this operation can be carried out efficiently using BDD manipulation.
Since a reversible function is a 1-to-1 mapping of inputs to outputs, each path to ⊤ in the BDD for F visits all variables. Hence, picking one such path c represents one input/output mapping of f and has a valuation for each input variable and output variable. In order to obtain the input pattern from c one can remove all outputs using the smoothing operator. This can be done analogously to get the output pattern. In summary we have x = ∃ y c and y = ∃ x c if, and only if
Example 10. If f ∈ B 3,3 is a reversible function with f (1, 0, 1) = (1, 1, 0) then the pattern c = x 1x2 x 3 y 1 y 2ȳ3 is contained in F . We have ∃ y c = x 1x2 x 3 and ∃ x c = y 1 y 2ȳ3 .
Besides co-factors this operation will be our most powerful tool in order to describe the synthesis algorithm. The smoothing operator can also be used on F to check whether f is reversible. We have: f is reversible, if and only if | on(F )| = 2 n and ∃ y F = ⊤ and ∃ x F = ⊤,
i.e. F has 2 n paths to ⊤ , and contains all input and output patterns.
The Algorithm
Based on the procedure exemplarily illustrated in the previous section, we first explain the basic algorithm to decompose a function as in (1) based on a symbolic function representation. Afterwards, different techniques for optimization are illustrated in this and successive sections.
Algorithm D (Symbolic Decomposition). Given a reversible function f : IB n → IB n that is symbolically represented by its characteristic function F ( x, y) = F (x 1 , . . . , x n , y 1 , . . . , y n ) using BDDs and a variable x i ∈ {x 1 , . . . , x n }, this algorithm finds two control functions l and r from which a reversible function
there is no cube in which x i differs from y i , terminate. Otherwise, set l ′ ← ⊥, r ′ ← ⊥.
D3.
[Pick an inner pattern.] Pick one pattern c from F ( x, y) ∧ x i ∧ȳ i , i.e. c represents a mapping in which x i = 1 is mapped to y i = 0. Also, set the direction bit d ← 0.
e. the newly derived cube c represents a mapping in which x i equals y i , return to step 4. Otherwise, continue.
D7.
[Update gate functions.] Update l and r using l ′ and r ′ and recompute f . Return to step 2.
The algorithm creates the control functions l and r in an iterative manner. Being initially assigned ⊥ in step 1, in each iteration of the outer loop (steps 2 to 7), the algorithm resolves one cycle as described in Fig. 8 and stores the resulting functions in l ′ and r ′ . To resolve a cycle first one input/output pattern c is picked in which x i = 1 and y i = 0 (step 3). Based on this pattern successive patterns are implied as described in the previous section and l ′ and r ′ are updated accordingly (steps 4 to 6). This inner loop is repeated as long as the implied pattern is neither contained in
Eventually, after a cycle has been resolved and l ′ and r ′ are determined, the control functions l and r are updated and f is recomputed. For this purpose, we first set
This changes r ′ such that all variables y j are substituted by corresponding x j . Afterwards, we set l ← l ⊕ l ′ , r ← r ⊕ r ′ , and
Note that throughout the whole procedure of Algorithm D the reversible function will only be represented as its characteristic function using BDDs and all operations will be carried out on this symbolic representation. This includes (17) for which also the singletarget functions are represented by its characteristic functions. Function composition then corresponds to Boolean matrix multiplication which can efficiently be implemented using the smoothing operator (Knuth, 2011) . Theorem 1. Algorithm D is sound and complete.
Proof. The inner loop will always terminate as illustrated in Fig. 8 . Also, by resolving a cycle Algorithm D will always move at least one pattern from the inner co-factors F p ′ i and F n ′ i to the outer co-factors F ni and F pi . Hence, the termination condition for the outer loop in step 2 will eventually hold. The algorithm is sound since the input function is updated using the same extracted gates that are returned. ✷
Increasing Efficiency
This section describes two strategies to increase the efficiency of Algorithm D by considering certain special cases explicitly.
Only one single-target gate required
It turns out that in many non-random functions only one single-target gate is sufficient in order to equalize a variable, i.e either l = ⊥ or r = ⊥. Also, this case can easily be checked and if it applies, the control functions can efficiently be computed.
We consider the case in which r may be ⊥ and hence only one single-target gate needs to be added for the control function l. The check first removes the variable x i from the characteristic function and then reassigns x i such that it equals y i . The result is assigned to F ′ :
′ is reversible, only one single-target gate is required. Reversibility of F ′ can be checked by the condition
i.e. F ′ still consists of all possible input patterns. The operation in (19) can efficiently be performed using BDDs.
Example 11. A negative and positive example of this optimization step is illustrated in Fig. 10 . In both cases the variable x 1 is updated to equal the value of y 1 . Figure 10(a) shows the truth table from Fig. 5(a) and it can be seen that the resulting truth table represents an irreversible function since the input patterns 001 and 111 occur twice. For the function in Fig. 5(b) one single-target gate is sufficient to equalize x 1 and y 1 since the resulting truth table represents a reversible function.
Once F ′ has been determined one can determine the control function l by inspecting the input patterns which differ from the original function F . For this purpose one needs to remove all output variables and x i from their intersection, i.e. Example 12. For the function in Fig. 10 the application of (20) yields
In order to consider this special case, we added a preprocessing step to Algorithm D before step 2:
The same can be done for the case that l = ⊥ analogously:
and terminate.
Cycles of length 1 and 2
To resolve a cycle in Algorithm D one computes implied patterns iteratively in each step. As a result, cycles need to be resolved one after the other, which becomes time consuming if many cycles need to be resolved. Alternatively, one can also express cycles of a fixed length in one Boolean formula and resolve them all at once. However, the size of the formula grows with respect to the length of the cycles and is therefore only applicable to small lengths. In our implementation we considered cycles of length 1 and 2.
Example 13. The columns F p ′ 1 and F n ′ 1 of the co-factor table in Fig. 11 both contain the entries 01 and 10. A single-target gate controlled by l ′ = x 2 ⊕ x 3 can resolve both of them at once without the computationally more expensive iteration steps in Algorithm D.
In general such control functions for cycles of length 1 are expressed by
i.e. the intersection of the co-factors is computed after dropping the outputs or inputs, respectively. It is important not to compute these two control functions at the same time as they may depend on each other. Furthermore, in our experimental evaluation we discovered that the best results are obtained when f gets recomputed after each
11 01 01 11 10 10 11 00
Outputs y 2 y 3 y 2 y 3 y 2 y 3 y 2 y 3 00 11 00 01 10 11 01 10
(a) Length 1
Inputs
x 2 x 3 x 2 x 3 x 2 x 3 x 2 x 3 00 01 11 00 01 10 10 11
Outputs y 2 y 3 y 2 y 3 y 2 y 3 y 2 y 3 00 11 10 00 10 11 01 01 control function has been determined. Cycles of length 1 are therefore handled using the following additional step:
. Update r using r ′ and recompute f .
The 'updates' correspond to the same procedure as referred to in step 7 in Algorithm D. This idea can be extended to cycles of length 2, as illustrated by means of the following example.
Example 14. Consider the co-factor table in Fig. 11(b) . Starting with the pattern 101
, Algorithm D will first imply the pattern 001 → 010 (column F n1 ) and afterwards pattern 011 → 110 (column F n ′ 1 ). Then, the cycle is resolved. In these three patterns the input part 01 of the middle pattern 001 → 010 matches the input part of the first pattern that is contained in column F p ′
1
. At the same time, the output part 10 matches the output part of the last pattern that is contained in column F n ′ 1 and therefore l ′ =x 2 x 3 and r ′ = y 2ȳ3 .
In general, cycles of length 2 are obtained by
Cycles of length 2 are therefore handled using the following additional step:
. Set l ′ ← ∃ y g and r ′ ← ∃ x g. Update l and r using l ′ and r ′ and recompute
. Set l ′ ← ∃ y g and r ′ ← ∃ x g. Update l and r using l ′ and r ′ and recompute f .
Variable Ordering
In order to obtain a circuit from a given reversible function f : IB n → IB n , one needs to apply Algorithm D for each of the n variables. As a result, Algorithm D is called n times. However, the order in which variables are equalized by Algorithm D does not necessarily need to be the natural variable ordering x 1 , x 2 , . . . , x n , which is e.g. being used in (De Vos and Van Rentergem, 2008) . Instead, any of the possible n! different variable orderings can be used and each ordering may lead to a different circuit realization with respect to the number of gates. Since the number of different variable orderings is large, finding the smallest circuit cannot be done efficiently. In this section we are motivating two heuristics that may lead to cheaper circuits.
Greedy Heuristic
The first heuristic follows a greedy approach. In each step the two single-target gates for all variables that have not yet been processed are computed. The single-target gatepair is chosen that leads to the cheapest realization after expansion to Toffoli gates. Using this heuristic the algorithm is executed n(n + 1)/2 times for a function with n variables. Consequently, using this heuristic will probably increase the overall run-time.
Hamming Distance Heuristic
Algorithm D computes control functions l and r by equalizing the values for the variables x i and y i . Consequently, it seems plausible to use variables x i and y i that already agree on many of the input/output mappings. This heuristic first counts this number for each variable and then chooses the one with the maximum number of such patterns.
Note that the patterns do not have to be counted explicitly. The number of such patterns is equal to (23) is large. The assumption is that in this case less cycles need to be resolved. However, the experimental results will not confirm this assumption.
Partial Functions
A partial reversible function on n variables is a function that does not contain all input/output mappings. The original algorithm presented in (De Vos and Van Rentergem, 2008 ) cannot be applied to such functions in its original form since then cycles may not be complete. Algorithm D can however be extended in order to support partial functions by extending the function on demand.
Example 15. A partial reversible function is given in terms of its truth table representation in Fig. 12 . Its representation as characteristic function is F (x 1 , x 2 , x 3 , y 1 , y 2 , y 3 ) =x 1x2x3ȳ1 y 2ȳ3 ∨x 1 x 2x3 y 1ȳ2ȳ3 ∨x 1 x 2 x 3ȳ1 y 2 y 3 ∨ x 1x2x3ȳ1ȳ2ȳ3 ∨ x 1x2 x 3 y 1 y 2ȳ3 .
Inputs
x 2 x 3 x 2 x 3 x 2 x 3 x 2 x 3 00 00 10 01 11
Outputs y 2 y 3 y 2 y 3 y 2 y 3 y 2 y 3 10 00 00 10 11 (a) Co-factor table When dealing with partial functions, Algorithm D may run into the problem that the implied pattern c may be assigned ⊥ after step 4 when a corresponding pattern is not defined by the original function.
Example 16. Figure 13 demonstrates this problem. In Fig. 13 (a) the co-factor table of the function in Fig. 12 is shown to aid the comprehension of the resolving steps in Fig. 13(b) . As can be seen, two further patterns are applied from the initial one that have been picked from the co-factor F p ′
1
. The iteration step is not completed after the third pattern, however, there is no pattern available to continue that has the input 001.
The idea is to extend the function in such situations. All not specified patterns can be chosen arbitrarily as long as reversibility of f is ensured. For illustration purposes, let us consider the case in which after step 4 we have c = ⊥ and d = 0. That is, there is no pattern specified in f for the input patternx i c ′ . It is sufficient to set c ←x i c ′ c ′′ where c ′′ is a cube randomly picked from ∃ xF , i.e. all output patterns that are not specified in f . Afterwards, the new pattern is added to f by setting f ← f ∨ c. Consequently, it is not contained in ∃ xF any longer.
Example 17. For Example 16 we have x i = x 1 , c ′ =x 2 x 3 , and ∃ xF =ȳ 1ȳ2 y 3 ∨ y 1ȳ2 y 3 ∨ y 1 y 2 y 3 .
In order to further increase the efficiency, our implementation first tries to find an output pattern c ′′ that is in y i ∧ ∃ xF , i.e. patterns in which y i is assigned 1. In this case, one can ensure that the condition in step 6 fails and the cycle has been resolved immediately.
Remark 4. Note that of course both algorithms can deal with partial functions if they are made total prior to synthesis by randomly filling all non-specified mappings. However, that technique may not always be efficient. In contrast, our proposed technique does not necessarily add all non-specified mappings. from (Soeken et al., 2012b) . The truth table based approach has been re-implemented in the RevKit program 'young subgroup synthesis'. The considered functions are taken from the LGSynth'93 benchmarks (www.cbl.ncsu.edu:16080/benchmarks/lgsynth93/) and from www.cs.uvic.ca/~dmaslov/. These functions are mainly irreversible and provided in terms of their sum-of-product representation saved as PLA files. We have used the embedding algorithm proposed in (Soeken et al., 2014b) to embed them as a reversible function that is represented by means of the binary decision diagram of the characteristic function. The time required for the embedding is not accounted for in the reported runtimes but can be obtained from (Soeken et al., 2014b) . Since the embedding algorithm produces a partial function we first applied our proposed approach as this is the only approach out of the three ones that supports partial functions. Based on the resulting circuit we created a truth table representation using the RevKit program 'circuit to truth table' which then was used as input to the truth table based algorithm. For the QMDD based algorithm we used the circuit as input to construct the QMDD from which a different circuit is created. Since the truth table based approach and the proposed approach create single-target gates we used exorcism (Mishchenko and Perkowski, 2001 ) to translate them into mixed-polarity multiple-control Toffoli gates. The time required for this translation is included in the overall reported run-times. The QMDD based synthesis approach directly creates MPMCT gates. Table 1 presents the results. The first block of columns denotes the name of the benchmark together with the number of circuit lines (n). For each algorithm a block lists the number of Toffoli gates (d), the quantum costs in terms of T gates in a Clifford+T mapping according to (Amy et al., 2013) (q) , and the run-time (in seconds) required for the whole synthesis (t).
It can be seen that our approach scales well and the performance it is comparable to the QMDD based synthesis approach (fastest run-times are set in bold face). The truth table based approach was never the fastest one and moreover, if the function has more than 15 variables, the truth table based algorithm was not able to finish before the timeout. It can be seen that the run-time correlates much more to the number of variables for the truth table based approach compared to the proposed approach.
In addition to the better run-times, the proposed approach also leads to better results with respect to the number of Toffoli gates and quantum costs. For the latter, in the best case, we get an improvement of over 83% compared to the original truth table based approach and over 91% compared to the QMDD based approach.
Remark 5. Due to Remark 1 there must exist a choice of assignments such that the truth table based approach leads to the same results in terms of Toffoli costs as the proposed approach. Hence, it seems that due to the symbolic decomposition described in Algorithm D a better choice for the assignments is implicitly taken.
Remark 6. Most of the benchmarks used in Table 1 originally describe irreversible functions and were embedded as reversible functions. We have used a different embedding approach (Soeken et al., 2014b) . This explains that in some cases a different number of lines and hence also to a different number of gates is obtained compared to the values given in (Soeken et al., 2012b) .
Evaluating Variable Ordering Heuristics
We have evaluated the heuristics for variable orderings as discussed in Sect. 8 for the same benchmarks as in the previous section and listed the results in Table 2 . Both the Greedy approach and the approach based on the hamming distance can achieve significant improvement in quantum costs. The maximum improvement is over 92% (co14 ) for the Greedy approach and over 80% (dk27 ) for the Hamming approach.
The quantum cost is not always improved. However, in case of the Greedy approach the difference to the better solution is not too large. In fact, in these cases the gate count has actually decreased (which is also the cost criteria in the implementation of the heuristics). One can overcome this problem by changing the cost criteria in the implementation which leads to a higher run-time of the algorithm. In case of the approach based on the hamming distance, the quantum cost can also increase significantly (e.g. C7752 ).
The run-time for the Greedy approach is much higher whereas the run-time for the Hamming approach is comparable to the original approach and usually correlates with the achieved improvement. If a higher improvement can be achieved, usually less cycles need to be resolved and hence the run-time decreases.
x (i+2) mod n+1 if i is even.
(invert or rotate) 
Evaluating Scalability
In order to further evaluate the scalability of the proposed approach we have created BDDs of characteristic functions that represent reversible functions directly in memory. Figure 14 lists the functions that have been used for this experiment and are parameterized by the number of lines.
Note that for invert or rotate and bitwise-xor the number of lines n needs to be even. We have implemented the experiment as the RevKit test-case 'rcbdd scalability'. The results are given by means of plots in Fig. 15 . The values of the x-axis and y-axis denote the number of lines and run-time in seconds, respectively. As can be seen functions with a large number of variables can be synthesized with the proposed approach. The run-time increases rapidly when the problem instances get larger and the effect is more noticeable when the function is complex.
Conclusions
In this paper we presented an algorithm for ancilla-free synthesis of large reversible functions. The reversible function is represented by its characteristic function using binary decision diagrams. This enables an efficient symbolic function manipulation. We formalized a decomposition technique that has formerly been used in a truth table based synthesis approach by means of co-factors on the characteristic function. This enabled a synthesis approach that is applicable to significantly larger functions. Additionally, we proposed heuristics to reduce gate cost and provide extensions to apply the algorithm to partial functions.
An experimental evaluation demonstrates the applicability of the proposed approach to large functions and also shows that the realized circuits lead to smaller circuits compared to state-the-art synthesis approaches.
Most run-time is spent on resolving cycles. We are currently investigating how this process can be done more efficiently. One way to overcome the efficiency problem is to synthesize transpositions directly. This would however lead to circuits that are not structured in a V-shape and as a result a linear number of single-target gates for their representation is no longer guaranteed. An increase in gate count and circuit cost is also 
