Ancilla-free synthesis of large reversible functions using binary
  decision diagrams by Soeken, Mathias et al.
ar
X
iv
:1
40
8.
39
55
v1
  [
cs
.E
T]
  1
8 A
ug
 20
14
Ancilla-free synthesis of large reversible
functions using binary decision diagrams
Mathias Soeken a,b Laura Tague a Gerhard W. Dueck c
Rolf Drechsler a,b
aDepartment of Mathematics and Computer Science, University of Bremen, Bremen, Germany
bCyber-Physical Systems, DFKI GmbH, Bremen, Germany
cFaculty of Computer Science, University of New Brunswick, Fredericton, Canada
Abstract
The synthesis of reversible functions has been an intensively studied research area in the last
decade. Since almost all proposed approaches rely on representations of exponential size (such
as truth tables and permutations), they cannot be applied efficiently to reversible functions with
more than 15 variables.
In this paper, we propose an ancilla-free synthesis approach based on Young subgroups us-
ing symbolic function representations that can efficiently be implemented with binary decision
diagrams (BDDs). As a result, the algorithm not only allows to synthesize large reversible func-
tions without adding extra lines, called ancilla, but also leads to significantly smaller circuits
compared to existing approaches.
Key words: Reversible functions, binary decision diagrams, synthesis
1. Introduction and Background
Given a bijective function f : IBn → IBn, also called a reversible function, synthesis de-
scribes the problem of determining a circuit composed of reversible gates that realizes f . If
this circuit consists of exactly n signal lines, the synthesis is called ancilla-free. In the last
decade, ancilla-free synthesis approaches have been presented that start from a reversible
function represented e.g. as truth tables (Miller et al., 2003), permutations (Shende et al.,
2003), or Reed-Muller spectras (Maslov et al., 2007). Since all these representations are
always of exponential size with respect to n, the respective algorithms do not scale well
and are thus not efficiently applicable to large reversible functions. Reversible functions
and circuits play an important role in quantum computing (Saeedi and Markov, 2013)
and low-power computing (Landauer, 1961; De Vos, 2010; Be´rut et al., 2012).
One of these truth table based algorithms was presented in (De Vos and Van Rentergem,
2008) and uses single-target gates as gate library. A single-target gate T[c, i] : IBn →
Preprint submitted to Journal of Symbolic Computation 12 August 2018
x1 y1 = x1
xi−1 yi−1 = xi−1
xi yi = xi ⊕ c(x1, . . . , xi−1, xi+1, . . . , xn)
xi+1 yi+1 = xi+1
xn yn = xn
c
Fig. 1. Single-target gate
IBn with T[c, i](x1, . . . , xn) = (y1, . . . , yn) updates the value of the input variable xi
with respect to a Boolean control function c : IBn−1 → IB that is defined on all
other variables. That is, the target gate computes a new value at the target output
yi = xi ⊕ c(x1, . . . , xi−1, xi+1, . . . , xn) and leaves all other output variables unaltered,
i.e. yj = xj for j 6= i. Figure 1 shows the diagrammatic representation of a single-target
gate based on Feynman’s notation (1985).
The algorithm described in (De Vos and Van Rentergem, 2008) makes use of the prop-
erty that each reversible function f : IBn → IBn can be decomposed into three IBn → IBn
reversible functions
f = T[l, i] ◦ f ′ ◦ T[r, i] (1)
such that f ′ does not change in xi. Based on the truth table representation of f , the
algorithm determines two control functions l and r from which f ′ = T [l, i] ◦ f ◦ T [r, i]
can be determined, since single-target gates are self-inverse. Applying the decomposition
in (1) for each variable results in 2n − 1 single-target gates 1 which composed as a
circuit realize f . The algorithm always traverses the whole truth table and is therefore
exponential with respect to the number of variables. Consequently, it cannot efficiently
be applied to large functions.
The contributions described in this paper are as follows:
• We propose an algorithm based on a symbolic function representation to determine the
control functions l and r from (1). The algorithm makes use of Boolean operations that
can efficiently be implemented using binary decision diagrams (BDDs). Since BDDs
allow for a more compact representation of many practical functions, the efficiency of
the proposed algorithm is increased and can therefore be applied to larger functions.
• In (De Vos and Van Rentergem, 2008) only one variable ordering for the sequential
application of (1) has been considered, i.e. the natural ordering x1, x2, . . . , xn. We
investigate different heuristics to find variable orderings that allow for more compact
synthesis results.
• The original algorithm is not applicable to partial functions, i.e. functions for which
not all input/output mappings are specified. Our algorithm can be adjusted in order
to support partial functions.
• Finally, we provide open source implementations for both the original truth table based
approach and our proposed approach.
The majority of the proposed synthesis algorithms use Toffoli gates (Toffoli, 1980) as
underlying gate library. Our proposed method uses single-target gates which can be
transformed into a cascade of Toffoli gates. However, since the lower bound for re-
quired Toffoli gates in a reversible circuit is exponential with respect to the number of
1 For the last variable only one single-target gate is required. The middle part of (1) degenerates to the
identity.
lines (Maslov et al., 2005; Soeken et al., 2014a), the use of Toffoli gates in large reversible
circuits may not be convenient. In contrast, the upper bound for required single-target
gates in a reversible circuit is linear as described above. The algorithms presented in this
paper do not necessarily consider a specific target technology. Hence, circuits obtained
from our synthesis method are a reasonable intermediate gate-level representation for
large reversible functions (Abdessaied et al., 2014). It remains for future work to show
how appropriate technology mappings can be derived from single-target gates. In the
experimental evaluation we are making use of straight-forward mapping techniques.
1.1. Related work
In (Soeken et al., 2012b) another algorithm for synthesizing large reversible functions
without adding ancilla lines has been proposed that is based on quantum multiple-valued
decision diagrams (Miller and Thornton, 2006). Compared to the approach presented in
the present paper, this algorithm uses a different method to determine Toffoli gates. Our
experimental evaluations show that with our technique smaller circuits with respect to
the number of Toffoli gates and quantum cost can be found. The algorithm proposed
in (Wille and Drechsler, 2009) also uses BDDs to find reversible circuits. However, the
algorithm uses irreversible functions as input and embeds them into reversible functions
implicitly using a hierarchical approach. The approach produces an enormous number of
additional helper lines which are still far beyond from the theoretical upper bound. In
contrast, the algorithms presented in this paper make use of a scalable exact embedding
algorithm (Soeken et al., 2014b) and hence guarantee synthesis without adding ancilla
lines.
In (Saeedi et al., 2010) a cycle-based approach to synthesize reversible functions is
proposed. In this approach a decomposition algorithm is first used to extract building
blocks from the input specification of the function. The input specification is represented
as a permutation and is decomposed into smaller permutations, so called k-cycles, until
building blocks can be used for synthesis. However, this method needs all input assign-
ments to derive the required k-cycles and as a consequence it has the same limitations as
truth table based approaches and cannot be applied to large functions. In our approach
the function is decomposed symbolically using the co-factor representation of the function
obtained from a BDD, which allows us to find cycles without necessarily traversing the
whole input space. A similar approach has been presented in (Sasanian et al., 2009) but
shares the same limitations as it is also based on an exponential function representation.
1.2. Outline
The remainder of the paper is organized as follows. Preliminaries are given in the
next section and Sect. 3 reviews the truth table based decomposition technique from
(De Vos and Van Rentergem, 2008). Section 4 illustrates the general idea of the pro-
posed algorithm while Sect. 5 shows special BDD operations that are commonly used in
the description of the algorithm in Sect. 6. Optimization techniques targeting efficiency
and the number of gates are presented in Sects. 7 and 8, respectively. The algorithm
is extended for partial functions in Sect. 9. Experimental evaluations are presented in
Sect. 10 before the paper is concluded in Sect. 11.
3
x1 x2 y1 y2
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
Fig. 2. Truth table for the function f(x1, x2) = (x1 ⊕ x2, x1 ∧ x2)
2. Preliminaries
In this section, we introduce notation for Boolean functions, binary decision diagrams,
reversible functions, and reversible circuits.
2.1. Boolean Functions
Definition 1 (Boolean function). Let IB
def
= {0, 1} denote the Boolean values. Then we
refer to
Bn,m
def
= {f | f : IBn → IBm} (2)
as the set of all Boolean multiple-output functions with n inputs and m outputs, where
m,n ≥ 1.
We write Bn
def
= Bn,1 for each n ≥ 1 and assume that each f ∈ Bn is represented by a
propositional formula over the input variables x1, . . . , xn. Furthermore, we assume that
each function f ∈ Bn,m is represented as a tuple f = (f1, . . . , fm) where fi ∈ Bn for
each i ∈ {1, . . . ,m} and hence f(~x) = (f1(~x), . . . , fm(~x)) for each ~x ∈ IB
n. Output
variables of a function are denoted y1, . . . , ym.
Example 1. Figure 2 shows the truth table of the Boolean multiple-output function
f(x1, x2) = (x1 ⊕ x2, x1 ∧ x2) (3)
that has two input and two output variables. It represents the functionality of a half-
adder.
Boolean matrices can be represented by a Boolean function as described by the fol-
lowing definition.
Definition 2 (Boolean matrix). A Boolean-valued 2n×2m matrix A can be represented
by a Boolean function fA ∈ Bm+n with
fA(c1, . . . , cm, r1, . . . , rn)
def
= Ar,c (4)
where Ar,c denotes the element at row r =
∑n
i=1 2
n−iri and column c =
∑m
i=1 2
m−1ci.
Example 2. The Boolean matrix
A =


1 0 0 0
0 0 0 1
0 1 1 0
0 0 0 0


4
is represented by the Boolean function
fA(c1, c2, r1, r2) = c¯1c¯2r¯1r¯2 ∨ c¯1c2r1r¯2 ∨ c1c¯2r1r¯2 ∨ c1c2r¯1r2. (5)
Definition 3 (Co-factors). Given a Boolean function f ∈ Bn over the variables x1, . . . , xn
and a variable xi we define the positive co-factor fxi ∈ Bn−1 and the negative co-
factor fx¯i ∈ Bn−1 as
fxi
def
= f(x1, . . . , xi−1, 1, xi+1, . . . , xn) (6)
and
fx¯i
def
= f(x1, . . . , xi−1, 0, xi+1, . . . , xn), (7)
respectively.
Definition 4 (Smoothing operator). Given a Boolean function f ∈ Bn and an input xi
of f , the smoothing operator ∃xi (Touati et al., 1990) is defined as the disjunction of
both co-factors, i.e.
∃xi f
def
= fx¯i ∨ fxi (8)
We denote ∃~x f
def
= ∃x1 · · · ∃xn f .
That is, the smoothing operator returns a function that does not depend on the variable xi
anymore. Informally one can describe the smoothing operator by replacing all occurrences
of xi and x¯i with don’t cares. The smoothing operator can e.g. be used for matrix
multiplication as illustrated by the following lemma.
Lemma 1 (Touati et al., 1990). Let A be a 2k × 2n Boolean matrix and B be a 2m ×
2k Boolean matrix that are represented by Boolean functions fA(x1, . . . , xn, y1, . . . , yk)
and fB(y1, . . . , yk, z1, . . . , zm) respectively. Let
fC(x1, . . . , xn, z1, . . . , zm) = ∃y1 . . .∃yn (fA ∧ fB).
Then fC is the Boolean function representation for the 2
m×2n Boolean matrix C = B ·A
where ‘·’ is the matrix multiplication in the Galois field IF2. ✷
Definition 5 (ON-set and OFF-set). Given a Boolean function f ∈ Bn the sets
on(f)
def
= {~x ∈ IBn | f(~x) = 1} and off(f)
def
= {~x ∈ IBn | f(~x) = 0} (9)
are called ON-set and OFF-set of f . It can easily be seen that on(f) ∩ off(f) = ∅ and
on(f) ∪ off(f) = IBn.
Definition 6 (Characteristic function). Given a function f = (f1, . . . , fm) ∈ Bn,m its
characteristic function χf ∈ Bn+m is defined as
χf (~x, ~y)
def
=
{
1 f(~x) = ~y
0 otherwise
(10)
for each ~x ∈ IBn and each ~y ∈ IBm.
The characteristic function allows to represent any multiple-output function as a single-
output function. It can be computed from a multiple-output function by adding to the
5
variables {x1, . . . , xn} the additional output variables {y1, . . . , ym}:
m∧
i=1
(yi ↔ fi(x1, . . . , xn)) (11)
In the remainder of this paper, we denote the characteristic function χf of a function f
by a capital letter, i.e. F .
Example 3. The function fA(x1, x2, y1, y2) in (5) is the characteristic function of
f(x1, x2) = (y1, y2) in (3).
2.2. Binary Decision Diagrams
Binary decision diagrams (BDD) are an established data structure for representing
Boolean functions. While the general concepts are briefly outlined in this section, the
reader is referred to the literature for a comprehensive overview (Bryant, 1986; Knuth,
2011).
Let ~x = x1, . . . , xn be the variables of a Boolean function f ∈ Bn. A BDD representing
the function f is a directed acyclic graph with non-terminal vertices N and terminal
vertices T ⊆ {⊥ , ⊤} where N ∩ T = ∅ and T 6= ∅. Each non-terminal vertex v ∈ N is
labeled by a variable from ~x and has exactly two children, low v and high v. The directed
edges to these children are called low-edge and high-edge and are drawn dashed and solid,
respectively. A non-terminal vertex v labeled xi represents a function denoted σ(v) given
by the Shannon decomposition (Shannon, 1938)
σ(v) = x¯iσ(low v) + xiσ(high v) (12)
where σ(low v) and σ(high v) are the functions represented by the children of v with
σ(⊥ ) = 0 and σ(⊤ ) = 1. The BDD has a single start vertex s with σ(s) = f .
A BDD is ordered if the variables of the vertices on every path from the start vertex
to a terminal vertex adhere to a specific ordering. Not all of the variables need to appear
on a particular path, but a variable can appear at most once on any path. A BDD is
reduced if there are no two non-terminal vertices representing the same function, hence
the representation of common subfunctions is shared. In the following only reduced,
ordered BDDs are considered and for briefness referred to as BDDs.
Multiple-output functions can be represented by a single BDD that has more than
one start vertex. Common subfunctions that can be shared among the functions decrease
the overall size of the BDD. In fact, many practical Boolean functions can efficiently be
represented using BDDs, and efficient manipulations and evaluations are possible.
Example 4. Figure 3(a) illustrates the Shannon decomposition from (12). A binary
decision diagram for the function in Fig. 2 is given in Fig. 3(b).
2.3. Reversible Boolean Functions
Definition 7 (Reversible function). A function f ∈ Bn,m is called reversible if f is
bijective, otherwise it is called irreversible. Clearly, if f is reversible, then n = m.
A reversible function f ∈ Bn,n can also be represented by a permutation of {0, 1, . . . , 2n−
1}, i.e.
πf
def
= (nat(f(0, . . . , 0, 0)), . . . , nat(f(1, . . . , 1, 1))) , (13)
6
xi
low v high v
σ(v) = x¯iσ(low v) + xiσ(high v)
(a) Shannon decomposition
x1
x2 x2
x1
⊤ ⊥
y1 y2
(b) BDD for function in Fig. 2
Fig. 3. Binary decision diagrams
where nat : IBn → {0, 1, . . . , 2n − 1} maps a bit-vector to its natural number representa-
tion. Further, f can be represented as a 2× 2 Boolean permutation matrix Πf where
(Πf )r,c
def
= (πf (c) ≡ r) (14)
The following lemma relates the characteristic function to the permutation matrix of a
reversible function.
Lemma 2. Let f ∈ Bn,n be a reversible function, Πf the permutation matrix of f , and F
the characteristic function of f . Then Πf is the matrix representation of F according to
Definition 2. We have | on(F )| = 2n.
2.4. Reversible Circuits
Reversible functions can be realized by reversible circuits that consist of at least n lines
and are constructed as cascades of reversible gates that belong to a certain gate library.
Single-target gates have already been defined in the introduction. The most common
gate library consists of Toffoli gates.
Definition 8 (Toffoli gate). Mixed-polarity multiple-control Toffoli (MPMCT) gates are
a subset of the single-target gates in which the control function c can be represented with
one product term or c = 1. We refer to MPMCT gates as Toffoli gates in the following.
The literals in the control function are also referred to as controls or control lines.
In (Shende et al., 2003), it has been shown that any reversible function f ∈ Bn,n can
be realized by a reversible circuit with n lines when using Toffoli gates. That is, it is
not necessary to add any temporary lines (ancilla) to realize the circuit. Note that each
single-target gate can be expressed in terms of a cascade of Toffoli gates, which can
be obtained from an ESOP expression (Sasao, 1993), respectively. For drawing circuits,
we follow the established conventions of using the symbol ⊕ to denote the target line,
solid black circles to indicate positive control lines and white circles to indicate negative
control lines.
Example 5. Figure 4(a) shows a Toffoli gate with two positive literals, while Fig. 4(b)
shows a Toffoli gate with mixed polarities. Figure 4(c) shows four Toffoli gates in a
cascade forming a reversible circuit. The annotated values demonstrate the computation
of the gate for a given input assignment.
7
x1 y1 = x1
x2 y2 = x2
x3 y3 = x3 ⊕ x1x2
(a) Toffoli gate
x1 y1 = x1
x2 y2 = x2
x3 y3 = x3 ⊕ x1x¯2
(b) Toffoli gate with negative literal
x1 = 0 y1 = 1
x2 = 1 y2 = 1
x3 = 0 y3 = 1
g1
1
1
0
g2
1
1
1
g3
1
0
1
g4
(c) Toffoli circuit
Fig. 4. Reversible circuitry
3. Truth table based Decomposition
In (De Vos and Van Rentergem, 2008) an algorithm was presented that determines
control functions l and r as in (1) to decompose a reversible function f : IBn → IBn
that is represented as a truth table. When applying the algorithm for each circuit line
one can obtain a reversible circuit composed of single-target gates that realizes f . The
underlying theory for the algorithm is based on Young subgroups, however, for the scope
of this paper it is sufficient to explain the algorithm by means of an example. Consider
the truth table in Fig. 5(a) that is described by the variables x1, x2, x3 and y1, y2, y3.
First, the aim is to determine two control functions l and r for single-target gates
that act on the first circuit line which maps x1 to y1. All values of the variables x2, x3
and y2, y3 are copied into a new truth table that will eventually represent f
′ and is
illustrated by the two inner blocks in Fig. 5(b). Afterwards the values of x′1 and y
′
1 are
assigned new values such that the following two constraints are met:
(1) The functions defined by the truth table described by the first two blocks and by
the last two blocks must be reversible, i.e. each pattern must occur exactly once.
(2) The function f ′, defined by the inner two blocks, must ensure that x′1 = y
′
1.
The reversibility of f ′ follows from both constraints and since both x2, x3 and y2, y3 were
copied. It also follows that the functions described by the first and last two blocks can
be described by single-target gates.
The values for x′1 and y
′
1 can be filled using the following procedure. We start by
inserting a value at the first empty cell of x′1, in our example we choose the value 1
indicated by a rectangle in Fig. 5(c). To satisfy the first constraint, next a 0 is inserted
where the pattern repeats for x2 and x3. In order to satisfy the second constraint, the 0
has to be assigned to y′1 in the same row. Consequently, a 1 has to be filled for y
′
1 where
the pattern repeats. This procedure is repeated until a cell is met that has already been
filled. If there are still empty cells, the overall procedure is repeated from the beginning.
All numbers have been filled in Fig. 5(d).
Remark 1. The first entry position in the procedure above can freely be chosen. It
can easily be seen, that the corresponding cycle will not change, it is just entered at a
different position. Also the value of the first entry can be changed without violating the
constraints, if the other values are adjusted accordingly. Hence, one obtains up to 2k
different functions for l and r where k is the number of cycles.
8
x1 x2 x3 y1 y2 y3
0 0 0 0 0 0
0 0 1 0 0 1
0 1 0 0 1 0
0 1 1 1 0 0
1 0 0 1 0 1
1 0 1 0 1 1
1 1 0 1 1 1
1 1 1 1 1 0
(a) Initial truth table
x1 x2 x3 x
′
1 x2 x3 y
′
1 y2 y3 y1 y2 y3
0 0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 1 0 0 1
0 1 0 1 0 1 0 0 1 0
0 1 1 1 1 0 0 1 0 0
1 0 0 0 0 0 1 1 0 1
1 0 1 0 1 1 1 0 1 1
1 1 0 1 0 1 1 1 1 1
1 1 1 1 1 1 0 1 1 0
(b) Copy second and third variable
x1 x2 x3 x
′
1 x2 x3 y
′
1 y2 y3 y1 y2 y3
0 0 0 1 0 0 0 0 0 0 0
0 0 1 0 1 1 0 1 0 0 1
0 1 0 1 0 1 0 0 1 0
0 1 1 1 1 0 0 1 0 0
1 0 0 0 0 0 0 0 1 1 0 1
1 0 1 0 1 1 1 0 1 1
1 1 0 1 0 1 1 1 1 1
1 1 1 1 1 1 0 1 1 0
(c) Fill in first four numbers
x1 x2 x3 x
′
1 x2 x3 y
′
1 y2 y3 y1 y2 y3
0 0 0 1 0 0 1 0 0 0 0 0
0 0 1 1 0 1 1 0 1 0 0 1
0 1 0 0 1 0 0 1 0 0 1 0
0 1 1 0 1 1 0 0 0 1 0 0
1 0 0 0 0 0 0 0 1 1 0 1
1 0 1 0 0 1 0 1 1 0 1 1
1 1 0 1 1 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 1 1 0
(d) Fill in remaining numbers
x1 x2 x3 x
′
1 x2 x3 y
′
1 y2 y3 y1 y2 y3
0 0 0 1 0 0 1 0 0 0 0 0
0 0 1 1 0 1 1 0 1 0 0 1
0 1 0 0 1 0 0 1 0 0 1 0
0 1 1 0 1 1 0 0 0 1 0 0
1 0 0 0 0 0 0 0 1 1 0 1
1 0 1 0 0 1 0 1 1 0 1 1
1 1 0 1 1 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 1 1 0
(e) Identifying cubes
x1 x2 x3 x
′
1 x2 x3 x
′
1 x
′
2 x3 y
′
1 y
′
2 y3 y
′
1 y2 y3 y1 y2 y3
0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0
0 0 1 1 0 1 1 0 1 1 0 1 1 0 1 0 0 1
0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0
0 1 1 0 1 1 0 1 1 0 1 0 0 0 0 1 0 0
1 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1
1 0 1 0 0 1 0 0 1 0 0 1 0 1 1 0 1 1
1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 0
T[x¯2, 1] T[x¯′1x¯3, 2] T[x
′
2
, 3] T[y¯′
1
, 2] T[y¯2, 1]
(f) Equalizing x2 and y2
x1 y1
x2 y2
x3 y3
x′
1
x′
2
y′
2
y′
1
(g) Resulting circuit
Fig. 5. Truth table based decomposition
After all values for x′1 and y
′
1 have been assigned, the control functions can be determined
as illustrated by means of Fig. 5(e). The first control function l can be determined from
the first two blocks by inspecting the assignments for x2 and x3 where x1 and x
′
1 are
different. These are 00 and 01, hence l = x¯2x¯3∨ x¯2x3 = x¯2. This can be done analogously
for r and one obtains r = y¯2.
This process is called variable equalization in the following. The two blocks in the
middle of Fig. 5(e) are a truth table representation of f ′ which can now be used to
obtain two further single-target gates by equalizing the variables x2 and y2. This process
is illustrated in Fig. 5(f) where the starting points for filling numbers are again indicated
9
Fn1 Fp′1 Fn′1 Fp1
x1 7→ y1 0 7→ 0 1 7→ 0 0 7→ 1 1 7→ 1
In
p
u
ts
x2x3 x2x3 x2x3 x2x3
00 01 11 00
01 10
10 11
O
u
tp
u
ts
y2y3 y2y3 y2y3 y2y3
00 11 00 01
01 11
10 10
Fig. 6. Co-factor table representation
by rectangles. The control functions for the single-target gates to equalize x2 and y2
are x¯′1x¯3 and y¯
′
1, respectively. Note that the last single-target gate to equalize x3 and y3
can now directly be read from the truth table described by the middle two blocks in
Fig. 5(f), since the values in the first and second variables are already equal. The control
function of the last single-target gate is x¯′2.
When putting all single-target gates together, one obtains a circuit that is depicted in
Fig. 5(g). Note that in this particular case, all single-target gates are also Toffoli gates.
4. General Idea
This section illustrates how the decomposition procedure can be described symbolically
with Boolean function operations that can efficiently be implemented using BDDs. The
problem with the current synthesis approach is the underlying truth table representation
which grows exponentially with respect to the number of variables. As a result, the
processing time of the algorithm depends essentially only on the size of the function and
therefore impedes an efficient processing of functions with many variables.
Aiming for reducing the complexity of the algorithm, we propose to implement the
algorithm based on BDDs, which provide a compact representation for many reversible
functions of practical interest. Using BDDs allows for reducing the required memory
as a first consequence. Moreover, our proposed algorithm for decomposing a function
according to (1) is constructed in a manner such that it does not necessarily traverse the
whole truth table; therefore reducing the run-time.
In order to better illustrate our algorithm we represent the reversible function f by its
co-factor table. This table represents f by means of the co-factors of the characteristic
function F of f . For each variable xi of the function there are four co-factors
• Fni = (Fx¯i)y¯i where xi = 0 maps to yi = 0 in f ,
• Fp′
i
= (Fxi)y¯i where xi = 1 maps to yi = 0 in f ,
• Fn′
i
= (Fx¯i)yi where xi = 0 maps to yi = 1 in f , and
• Fpi = (Fxi)yi where xi = 1 maps to yi = 1 in f .
10
Remark 2. Note that this co-factor table representation is used only for illustrative
purposes in this paper. In the implementation for the algorithm the reversible function
is stored using BDDs and therefore in a more compact way.
Example 6. The co-factor table of the function in Fig. 5(a) is depicted in Fig. 6 and
shows all four co-factors with respect to the decomposition of x1 to y1 and separates each
of the eight function entries by its input and output assignments. Hence, the order of
the entries in the co-factor table matters since input and output assignments are linked
according to their position. As an example, the entry highlighted using a rectangle cor-
responds to the mapping 101 7→ 011 and the entry highlighted using a rounded rectangle
corresponds to the mapping 110 7→ 111.
We can now describe the principle of the algorithm based on the co-factor table rep-
resentation. The aim of the algorithm is to find the two control functions l and r that
equalize the variables x1 and y1. As a result, after applying single-target gates controlled
by these functions, no pattern remains in the middle part of the co-factor table, i.e. in
columns Fp′
1
and Fn′
1
. In order to understand the moving of the entries with respect to
the control functions, consider the following scenario. If l(0, 1) = 1 and r(1, 1) = 0 (entry
highlighted by an rectangle in Fig. 6), the entry moves to the column Fn1 , because the
input pattern is inverted at x1, therefore changing the pattern to 001 7→ 011.
Figure 7 shows all possible combinations of the evaluations of l and r and their effect on
the entries in Fni , Fp′
i
, Fn′
i
, and Fpi referred to as n, p
′, n′, and p, respectively.
Example 7. Hence, when moving the highlighted entry as described in the previous
example, i.e. l(0, 1) = 1, also the second pattern in column Fn1 , i.e. 001 7→ 001 will move
to column Fp′
1
since r(0, 1) = 0 and each input and output subpattern occurs twice in f .
In order to move all entries to the outer columns, the following two conditions must hold:
(1) For the outer entries either none or both control functions must evaluate to true,
hence the entry remains at the same position or is transferred to the other side.
(2) For the inner entries exactly one of the control functions must evaluate to true.
It is not obvious how to determine l and r immediately from the given co-factors. In the
following, we present a method which moves one entry from the middle blocks to the
outside. Repeating this process until eventually all entries from the middle blocks have
been moved to the outside will guarantee an equalization of the considered variable.
Example 8. We explain the method based on the co-factor table as previously presented.
As pattern from the middle blocks we pick 101 7→ 011, which is located in column Fp′
1
.
l¯(~x) ∧ r¯(~y) l(~x) ∧ r¯(~y) l¯(~x) ∧ r(~y) l(~x) ∧ r(~y)
n n p′ n′ p
p′ p′ n p n′
n′ n′ p n p′
p p n′ p′ n
Fig. 7. Movings of entries with respect to l and r
11
(1) 101 7→ 011 l← x¯2x3 d = 0
⇓
(2) 001 7→ 001 r ← y¯2y3 d = 1
⇓
(3) 100 7→ 101 l← l ∨ x¯2x¯3 = x¯2 d = 0
⇓
(4) 000 7→ 000 r ← r ∨ y¯2y¯3 = y¯2 d = 1
⇓
(5) 011 7→ 100
Fig. 8. Resolving a cycle
To determine l(x2, x3) and r(y2, y3), we first set l ← x¯2x3 according to the assignment
to x2 and x3 and r ← ⊥. This will imply the following steps:
• Because l(01) = 1, the highlighted pattern in the column Fp′
1
is moved to Fn1 .
• The original mapping 101 7→ 011 changes to 001 7→ 011, because the input at x1 is
inverted. However, at the same time also the pattern 001 7→ 001, currently in col-
umn Fn1 , would change to 101 7→ 001 and be passed to column Fp′ . We call this
pattern the implied pattern.
• In order to avoid this, one must additionally set r ← y¯2y3 thereby passing the implied
pattern over to Fp1 instead, since now both l and r evaluate to true for this pattern.
The assignment of r will in turn affect another pattern, hence this process is repeated
until the implied pattern is in Fn′
1
. Figure 8 illustrates the process for this example.
The initial pattern is given in the first row and all implied patterns are given in the
rows below. Input patterns are shown on the left and output patterns are shown on the
right. The implication of a pattern is indicated by ‘⇓’. The last column shows a Boolean
variable d which indicates whether function l or r is updated. It is used in the algorithm,
which is described in the Sect. 6.
Remark 3. In this example only one cycle had to be resolved and therefore the control
functions l and r could directly be determined. If more than one cycle needs to be resolved,
the illustrated process is repeated and all obtained functions are composed using the ‘⊕’
operation for the overall control function. This procedure is described in Sect. 6 in more
detail.
5. Characteristic Representation of Reversible Functions
The algorithm that is proposed in the following section heavily relies on the charac-
teristic function representation F ∈ B2n of the given reversible function f ∈ Bn,n for
which a circuit should be determined. The BDD representation of F allows for efficient
function manipulation and evaluation, which is described in more detail in this section.
Example 9. Figure 9 illustrates how the co-factors of a characteristic function can be
obtained from its BDD representation. Further, a BDD representing the characteristic
12
xi
yi yi
Fni Fn′
i
Fp′
i
Fpi
F
(a) Co-factors of the charac-
teristic function
y1
y2
x1
y2
y1
x2 x2x2
⊥ ⊤
χf
(b) BDD for characteristic
function in Fig. 2
Fig. 9. Characteristic representation of reversible functions
function of
f(x1, x2) = (x1 ⊕ x2, x1 ∧ x2)
is depicted in Fig. 9(b).
As pointed out in Lemma 2 the characteristic function of a reversible function corre-
sponds to its Boolean matrix representation. Since gate application of reversible functions
corresponds to matrix multiplication in the permutation matrix representation, Lemma 1
tells us that this operation can be carried out efficiently using BDD manipulation.
Since a reversible function is a 1-to-1 mapping of inputs to outputs, each path to ⊤ in
the BDD for F visits all variables. Hence, picking one such path c represents one in-
put/output mapping of f and has a valuation for each input variable and output variable.
In order to obtain the input pattern from c one can remove all outputs using the smooth-
ing operator. This can be done analogously to get the output pattern. In summary we
have
~x = ∃~y c and ~y = ∃~x c if, and only if f(~x) = ~y. (15)
Example 10. If f ∈ B3,3 is a reversible function with f(1, 0, 1) = (1, 1, 0) then the
pattern c = x1x¯2x3y1y2y¯3 is contained in F . We have
∃~y c = x1x¯2x3 and ∃~x c = y1y2y¯3.
Besides co-factors this operation will be our most powerful tool in order to describe the
synthesis algorithm. The smoothing operator can also be used on F to check whether f
is reversible. We have: f is reversible, if and only if
| on(F )| = 2n and ∃~y F = ⊤ and ∃~xF = ⊤, (16)
i.e. F has 2n paths to ⊤ , and contains all input and output patterns.
13
6. The Algorithm
Based on the procedure exemplarily illustrated in the previous section, we first explain
the basic algorithm to decompose a function as in (1) based on a symbolic function
representation. Afterwards, different techniques for optimization are illustrated in this
and successive sections.
Algorithm D (Symbolic Decomposition). Given a reversible function f : IBn → IBn
that is symbolically represented by its characteristic function
F (~x, ~y) = F (x1, . . . , xn, y1, . . . , yn)
using BDDs and a variable xi ∈ {x1, . . . , xn}, this algorithm finds two control functions l
and r from which a reversible function f ′ = T[l, i] ◦ f ◦T[r, i] can be obtained that does
not change in xi.
D1. [Initialization.] Set l← ⊥ and r ← ⊥.
D2. [Resolved all cycles?] If F (~x, ~y) ∧ (xi ⊕ yi) = ⊥, i.e. there is no cube in which xi
differs from yi, terminate. Otherwise, set l
′ ← ⊥, r′ ← ⊥.
D3. [Pick an inner pattern.] Pick one pattern c from F (~x, ~y) ∧ xi ∧ y¯i, i.e. c represents a
mapping in which xi = 1 is mapped to yi = 0. Also, set the direction bit d← 0.
D4. [Update l′ or r′.] Whether l′ or r′ is updated depends on the value of d:
1) If d = 0, let c′ = ∃xi ∃~y c and set l′ ← l′ ∨ c′ and c← F (~x, ~y) ∧ x¯i ∧ c′.
2) If d = 1, let c′ = ∃yi ∃~x c and set r′ ← r′ ∨ c′ and c← F (~x, ~y) ∧ yi ∧ c′.
D5. [Change direction.] Set d← 1− d.
D6. [Repeat iteration?] If F (~x, ~y) ∧ c ∧ (xi ⊕ yi) = ⊥, i.e. the newly derived cube c
represents a mapping in which xi equals yi, return to step 4. Otherwise, continue.
D7. [Update gate functions.] Update l and r using l′ and r′ and recompute f . Return to
step 2.
The algorithm creates the control functions l and r in an iterative manner. Being initially
assigned ⊥ in step 1, in each iteration of the outer loop (steps 2 to 7), the algorithm
resolves one cycle as described in Fig. 8 and stores the resulting functions in l′ and r′.
To resolve a cycle first one input/output pattern c is picked in which xi = 1 and yi =
0 (step 3). Based on this pattern successive patterns are implied as described in the
previous section and l′ and r′ are updated accordingly (steps 4 to 6). This inner loop is
repeated as long as the implied pattern is neither contained in Fn′
i
nor in Fp′
i
, i.e.
F (~x, ~y) ∧ c ∧ (xi ⊕ yi) = ⊥.
Eventually, after a cycle has been resolved and l′ and r′ are determined, the control
functions l and r are updated and f is recomputed. For this purpose, we first set
r′ ← ∃~y

r′ ∧ n∧
j=1
(xj ↔ yj)

 .
This changes r′ such that all variables yj are substituted by corresponding xj . Afterwards,
we set l← l ⊕ l′, r ← r ⊕ r′, and
f ← T[l′, i] ◦ f ◦ T[r′, i]. (17)
14
Note that throughout the whole procedure of Algorithm D the reversible function will
only be represented as its characteristic function using BDDs and all operations will be
carried out on this symbolic representation. This includes (17) for which also the single-
target functions are represented by its characteristic functions. Function composition
then corresponds to Boolean matrix multiplication which can efficiently be implemented
using the smoothing operator (Knuth, 2011).
Theorem 1. Algorithm D is sound and complete.
Proof. The inner loop will always terminate as illustrated in Fig. 8. Also, by resolving a
cycle Algorithm D will always move at least one pattern from the inner co-factors Fp′
i
and Fn′
i
to the outer co-factors Fni and Fpi . Hence, the termination condition for the
outer loop in step 2 will eventually hold. The algorithm is sound since the input function
is updated using the same extracted gates that are returned. ✷
7. Increasing Efficiency
This section describes two strategies to increase the efficiency of Algorithm D by
considering certain special cases explicitly.
7.1. Only one single-target gate required
It turns out that in many non-random functions only one single-target gate is sufficient
in order to equalize a variable, i.e either l = ⊥ or r = ⊥. Also, this case can easily be
checked and if it applies, the control functions can efficiently be computed.
We consider the case in which r may be ⊥ and hence only one single-target gate needs
to be added for the control function l. The check first removes the variable xi from the
characteristic function and then reassigns xi such that it equals yi. The result is assigned
to F ′:
F ′ = ∃xi F ∧ (xi ↔ yi) (18)
If F ′ is reversible, only one single-target gate is required. Reversibility of F ′ can be
checked by the condition
∃~xF ′ = ⊤, (19)
i.e. F ′ still consists of all possible input patterns. The operation in (19) can efficiently
be performed using BDDs.
Example 11. A negative and positive example of this optimization step is illustrated
in Fig. 10. In both cases the variable x1 is updated to equal the value of y1. Figure 10(a)
shows the truth table from Fig. 5(a) and it can be seen that the resulting truth table
represents an irreversible function since the input patterns 001 and 111 occur twice. For
the function in Fig. 5(b) one single-target gate is sufficient to equalize x1 and y1 since
the resulting truth table represents a reversible function.
Once F ′ has been determined one can determine the control function l by inspecting the
input patterns which differ from the original function F . For this purpose one needs to
remove all output variables and xi from their intersection, i.e.
∃xi ∃~y (F ∧ F
′). (20)
15
x1 x2 x3 y1 y2 y3
0 0 0 0 0 0
0 0 1 0 0 1
0 1 0 0 1 0
0 1 1 1 0 0
1 0 0 1 0 1
1 0 1 0 1 1
1 1 0 1 1 1
1 1 1 1 1 0
x1 x2 x3 y1 y2 y3
0 0 0 0 0 0
0 0 1 0 0 1
0 1 0 0 1 0
1 1 1 1 0 0
1 0 0 1 0 1
0 0 1 0 1 1
1 1 0 1 1 1
1 1 1 1 1 0
(a) Negative example
x1 x2 x3 y1 y2 y3
0 0 0 0 0 1
0 0 1 0 0 0
0 1 0 0 1 1
0 1 1 1 0 0
1 0 0 1 1 1
1 0 1 1 1 0
1 1 0 1 0 1
1 1 1 0 1 0
x1 x2 x3 y1 y2 y3
0 0 0 0 0 1
0 0 1 0 0 0
0 1 0 0 1 1
1 1 1 1 0 0
1 0 0 1 1 1
1 0 1 1 1 0
1 1 0 1 0 1
0 1 1 0 1 0
(b) Positive example
Fig. 10. Negative and positive example for checking whether one single-target gate is sufficient.
The original truth table is always on the left-hand side and an updated version in which x1 is
set equal to y1 on the right-hand side.
Example 12. For the function in Fig. 10 the application of (20) yields
∃x1 ∃~y (F ∧ F
′) = x2x3.
In order to consider this special case, we added a preprocessing step to Algorithm D
before step 2:
D1a. [Is r = ⊥?] Let F ′ = ∃xi F ∧ (xi ↔ yi). If ∃~xF ′ = ⊤, set l ← ∃xi ∃~y (F ∧ F ′) and
terminate.
The same can be done for the case that l = ⊥ analogously:
D1b. [Is l = ⊥?] Let F ′ = ∃yi F ∧ (xi ↔ yi). If ∃~y F ′ = ⊤, set r ← ∃yi ∃~x (F ∧ F ′) and
terminate.
7.2. Cycles of length 1 and 2
To resolve a cycle in Algorithm D one computes implied patterns iteratively in each
step. As a result, cycles need to be resolved one after the other, which becomes time
consuming if many cycles need to be resolved. Alternatively, one can also express cycles
of a fixed length in one Boolean formula and resolve them all at once. However, the size of
the formula grows with respect to the length of the cycles and is therefore only applicable
to small lengths. In our implementation we considered cycles of length 1 and 2.
Example 13. The columns Fp′
1
and Fn′
1
of the co-factor table in Fig. 11 both contain the
entries 01 and 10. A single-target gate controlled by l′ = x2⊕x3 can resolve both of them
at once without the computationally more expensive iteration steps in Algorithm D.
In general such control functions for cycles of length 1 are expressed by
l′ ← ∃~y Fp′
i
∧ ∃~y Fn′
i
and r′ ← ∃~xFp′
i
∧ ∃~xFn′
i
, (21)
i.e. the intersection of the co-factors is computed after dropping the outputs or inputs,
respectively. It is important not to compute these two control functions at the same
time as they may depend on each other. Furthermore, in our experimental evaluation
we discovered that the best results are obtained when f gets recomputed after each
16
Fn1 Fp′1 Fn′1 Fp1
x1 7→ y1 0 7→ 0 1 7→ 0 0 7→ 1 1 7→ 1
In
p
u
ts
x2x3 x2x3 x2x3 x2x3
11 01 01 11
10 10
11 00
O
u
tp
u
ts
y2y3 y2y3 y2y3 y2y3
00 11 00 01
10 11
01 10
(a) Length 1
Fn1 Fp′1 Fn′1 Fp1
x1 7→ y1 0 7→ 0 1 7→ 0 0 7→ 1 1 7→ 1
In
p
u
ts
x2x3 x2x3 x2x3 x2x3
00 01 11 00
01 10
10 11
O
u
tp
u
ts
y2y3 y2y3 y2y3 y2y3
00 11 10 00
10 11
01 01
(b) Length 2
Fig. 11. Cycles of length 1 and 2
control function has been determined. Cycles of length 1 are therefore handled using the
following additional step:
D1c. [1-cycles.] Set l′ ← ∃~y Fp′
i
∧ ∃~y Fn′
i
. Update l using l′ and recompute f . Set l′ ← ⊥
and set r′ ← ∃~xFp′
i
∧ ∃~xFn′
i
. Update r using r′ and recompute f .
The ‘updates’ correspond to the same procedure as referred to in step 7 in Algorithm D.
This idea can be extended to cycles of length 2, as illustrated by means of the following
example.
Example 14. Consider the co-factor table in Fig. 11(b). Starting with the pattern 101 7→
011 (column Fp′
1
), Algorithm D will first imply the pattern 001 7→ 010 (column Fn1) and
afterwards pattern 011 7→ 110 (column Fn′
1
). Then, the cycle is resolved. In these three
patterns the input part 01 of the middle pattern 001 7→ 010 matches the input part of
the first pattern that is contained in column Fp′
1
. At the same time, the output part 10
matches the output part of the last pattern that is contained in column Fn′
1
and therefore
l′ = x¯2x3 and r
′ = y2y¯3.
In general, cycles of length 2 are obtained by
l′ ← ∃~y g and r′ ← ∃~x g, where g = Fni ∧ ∃~y Fp′
i
∧ ∃ ~xFn′
i
. (22)
Cycles of length 2 are therefore handled using the following additional step:
D1d. [2-cycles.] Let g = Fni ∧ ∃~y Fp′
i
∧ ∃ ~xFn′
i
. Set l′ ← ∃~y g and r′ ← ∃~x g. Update l
and r using l′ and r′ and recompute f . Let g = Fpi ∧∃~y Fn′
i
∧∃ ~xFp′
i
. Set l′ ← ∃~y g
and r′ ← ∃~x g. Update l and r using l′ and r′ and recompute f .
8. Variable Ordering
In order to obtain a circuit from a given reversible function f : IBn → IBn, one needs
to apply Algorithm D for each of the n variables. As a result, Algorithm D is called n
times. However, the order in which variables are equalized by Algorithm D does not
necessarily need to be the natural variable ordering x1, x2, . . . , xn, which is e.g. being used
in (De Vos and Van Rentergem, 2008). Instead, any of the possible n! different variable
17
x1 x2 x3 y1 y2 y3
0 0 0 0 1 0
0 1 0 1 0 0
0 1 1 0 1 1
1 0 0 0 0 0
1 0 1 1 1 0
Fig. 12. Partial function represented as truth table
orderings can be used and each ordering may lead to a different circuit realization with
respect to the number of gates. Since the number of different variable orderings is large,
finding the smallest circuit cannot be done efficiently. In this section we are motivating
two heuristics that may lead to cheaper circuits.
8.1. Greedy Heuristic
The first heuristic follows a greedy approach. In each step the two single-target gates
for all variables that have not yet been processed are computed. The single-target gate-
pair is chosen that leads to the cheapest realization after expansion to Toffoli gates. Using
this heuristic the algorithm is executed n(n+1)/2 times for a function with n variables.
Consequently, using this heuristic will probably increase the overall run-time.
8.2. Hamming Distance Heuristic
Algorithm D computes control functions l and r by equalizing the values for the
variables xi and yi. Consequently, it seems plausible to use variables xi and yi that
already agree on many of the input/output mappings. This heuristic first counts this
number for each variable and then chooses the one with the maximum number of such
patterns.
Note that the patterns do not have to be counted explicitly. The number of such
patterns is equal to
| on(F (~x, ~y) ∧ (xi ⊕ yi))|. (23)
Obviously the number of paths in the co-factors Fp′
i
and Fn′
i
are small when the result
of (23) is large. The assumption is that in this case less cycles need to be resolved.
However, the experimental results will not confirm this assumption.
9. Partial Functions
A partial reversible function on n variables is a function that does not contain all in-
put/output mappings. The original algorithm presented in (De Vos and Van Rentergem,
2008) cannot be applied to such functions in its original form since then cycles may not
be complete. Algorithm D can however be extended in order to support partial functions
by extending the function on demand.
Example 15. A partial reversible function is given in terms of its truth table represen-
tation in Fig. 12. Its representation as characteristic function is
F (x1, x2, x3, y1, y2, y3) = x¯1x¯2x¯3y¯1y2y¯3 ∨ x¯1x2x¯3y1y¯2y¯3 ∨ x¯1x2x3y¯1y2y3
∨ x1x¯2x¯3y¯1y¯2y¯3 ∨ x1x¯2x3y1y2y¯3.
18
Fn1 Fp′1 Fn′1 Fp1
x1 7→ y1 0 7→ 0 1 7→ 0 0 7→ 1 1 7→ 1
In
p
u
ts
x2x3 x2x3 x2x3 x2x3
00 00 10 01
11
O
u
tp
u
ts y2y3 y2y3 y2y3 y2y3
10 00 00 10
11
(a) Co-factor table
(1) 100 7→ 000 l′ ← x¯2x¯3
⇓
(2) 000 7→ 010 r′ ← y2y¯3
⇓
(3) 101 7→ 110 l′ ← l′ ∨ x¯2x3 = x¯2
⇓
(4) ?
(b) Pattern implication
Fig. 13. Resolving a cube in a partial function
When dealing with partial functions, Algorithm D may run into the problem that the
implied pattern c may be assigned ⊥ after step 4 when a corresponding pattern is not
defined by the original function.
Example 16. Figure 13 demonstrates this problem. In Fig. 13(a) the co-factor table
of the function in Fig. 12 is shown to aid the comprehension of the resolving steps in
Fig. 13(b). As can be seen, two further patterns are applied from the initial one that
have been picked from the co-factor Fp′
1
. The iteration step is not completed after the
third pattern, however, there is no pattern available to continue that has the input 001.
The idea is to extend the function in such situations. All not specified patterns can be
chosen arbitrarily as long as reversibility of f is ensured. For illustration purposes, let
us consider the case in which after step 4 we have c = ⊥ and d = 0. That is, there is no
pattern specified in f for the input pattern x¯ic
′. It is sufficient to set c← x¯ic′c′′ where c′′
is a cube randomly picked from ∃~x F¯ , i.e. all output patterns that are not specified in f .
Afterwards, the new pattern is added to f by setting f ← f ∨ c. Consequently, it is not
contained in ∃~x F¯ any longer.
Example 17. For Example 16 we have xi = x1, c
′ = x¯2x3, and
∃~x F¯ = y¯1y¯2y3 ∨ y1y¯2y3 ∨ y1y2y3.
In order to further increase the efficiency, our implementation first tries to find an output
pattern c′′ that is in yi ∧ ∃~x F¯ , i.e. patterns in which yi is assigned 1. In this case, one
can ensure that the condition in step 6 fails and the cycle has been resolved immediately.
Remark 4. Note that of course both algorithms can deal with partial functions if they
are made total prior to synthesis by randomly filling all non-specified mappings. However,
that technique may not always be efficient. In contrast, our proposed technique does not
necessarily add all non-specified mappings.
19
Table 1. Comparison to related algorithms
Truth table based QMDD based BDD based
(De Vos et al., 2008) (Soeken et al., 2012b) Proposed approach (Sect. 6)
Name n dT qT tT dQ qQ tQ d q t Imp. T Imp. Q
sym6 7 172 12852 0.48 262 41552 0.02 124 8911 0.11 30.66 78.55
urf2 8 279 26257 0.15 763 165697 0.14 268 24066 0.19 8.34 85.48
con1 8 327 32102 2.06 659 139118 0.09 233 22988 0.23 28.39 83.48
hwb9 9 797 107219 9.26 2294 629433 0.91 584 73465 0.79 31.48 88.33
urf1 9 812 106176 6.74 1957 533680 0.71 563 74858 0.66 29.50 85.97
urf5 9 388 61649 2.78 693 185752 0.13 213 32676 0.16 47.00 82.41
adr4 9 681 101577 7.16 1130 290997 0.26 459 64309 0.78 36.69 77.90
sym9 10 1653 255990 15.34 3895 1276583 2.94 1175 174678 3.19 31.76 86.32
urf3 10 1690 268828 16.86 4071 1347318 3.76 1081 162225 2.15 39.65 87.96
5xp1 10 1507 232995 11.19 2231 724052 1.07 837 134267 3.99 42.37 81.46
rd84 11 3641 689283 31.53 6389 2445443 10.95 2063 401660 20.60 41.73 83.58
sym10 11 3657 664202 33.31 9812 3780469 23.05 2467 461538 12.01 30.51 87.79
urf4 11 3911 713328 37.11 11684 4508910 40.74 2641 491645 12.10 31.08 89.10
clip 11 3542 685118 35.55 7913 3036313 16.88 2271 434952 18.06 36.51 85.67
cycle10 2 12 27 4200 8.41 36 6286 0.07 27 4200 0.05 0.00 33.18
dc2 13 7999 2706179 119.73 11346 5612922 74.78 4102 1395422 224.06 48.44 75.14
misex1 14 34671 12014870 539.30 18412 10115630 274.82 6867 2733073 1046.12 77.25 72.98
co14 15 83652 30502311 2032.06 63640 38678808 2677.82 25065 10028634 730.96 67.12 74.07
urf6 15 15197 7275366 2855.35 23497 14432936 336.52 2164 1215312 3.96 83.30 91.58
dk27 15 50807 19144930 2074.22 57619 35018963 3773.21 23882 11254565 8598.83 41.21 67.86
C7552 20 — — TO 356 309008 133.89 257 180894 8.53 — 41.46
bw 32 — — TO — — MO 2585 3766784 2076.51 — —
The percentage improvements are with respect to the values for quantum costs in the blocks Truth table based
and QMDD based.
10. Experimental Evaluations
The synthesis approach we propose has been implemented in C++ using BDDs on
top of RevKit (Soeken et al., 2012a) and CUDD (Somenzi, 2001). 2 The experimental
evaluation has been carried out on a 3.4 GHz Quad-Core Intel Xeon Processor with 32 GB
of main memory running Linux 3.14. The experimental results in Sects. 10.1 and 10.2
have been generated with the RevKit program ‘rcbdd synthesis ’. For all experiments, we
have set the time-out to 10000 seconds. We have verified all our results using the tool
abc (Brayton and Mishchenko, 2010). 3 The following sections discuss the experiments.
10.1. Comparison to Related Algorithms
In this section, we compare our proposed algorithm to the original truth table based
variant from (De Vos and Van Rentergem, 2008) and the QMDD based synthesis method
2 The source code that has been used to perform this evaluation is available at www.revkit.org (version
2.0).
3 The tool abc can be downloaded from bitbucket.org/alanmi/abc. The read routine for reversible circuit
files is available at bitbucket.org/msoeken/abc
20
Table 2. Evaluation of variable heuristics
BDD based (Sect. 6) Greedy (Sect. 8.1) Hamming (Sect. 8.2)
Name n d q t d q t Imp. d q t Imp.
sym6 7 124 8911 0.11 113 8540 0.56 4.16 107 7973 0.11 10.53
urf2 8 268 24066 0.19 259 24528 1.59 -1.92 263 24535 0.27 -1.95
con1 8 233 22988 0.23 201 20104 1.58 12.55 234 20447 0.34 11.05
hwb9 9 584 73465 0.79 572 74690 8.29 -1.67 584 73465 0.96 0.00
urf1 9 563 74858 0.66 556 74116 6.79 0.99 573 73094 0.92 2.36
urf5 9 213 32676 0.16 188 33243 2.11 -1.74 307 46053 0.34 -40.94
adr4 9 459 64309 0.78 171 23401 2.93 63.61 195 26537 0.42 58.74
sym9 10 1175 174678 3.19 814 148295 27.45 15.10 1014 164388 3.95 5.89
urf3 10 1081 162225 2.15 1051 162267 29.27 -0.03 1058 165242 2.92 -1.86
5xp1 10 837 134267 3.99 838 138159 26.73 -2.90 847 147315 5.38 -9.72
rd84 11 2063 401660 20.60 1996 392350 178.86 2.32 1868 401940 26.01 -0.07
sym10 11 2467 461538 12.01 1405 334313 123.13 27.57 2068 437864 17.63 5.13
urf4 11 2641 491645 12.10 2629 496356 187.30 -0.96 2629 490826 22.21 0.17
clip 11 2271 434952 18.06 2214 423724 193.36 2.58 2344 455392 32.86 -4.70
cycle10 2 12 27 4200 0.05 27 4200 0.39 0.00 27 4200 0.05 0.00
dc2 13 4102 1395422 224.06 3645 1274742 1131.20 8.65 3230 1190154 185.02 14.71
misex1 14 6867 2733073 1046.12 7937 3111290 8044.99 -13.84 3949 1714174 480.96 37.28
co14 15 25065 10028634 730.96 1603 714287 671.89 92.88 — — TO —
urf6 15 2164 1215312 3.96 2215 1253000 69.89 -3.10 2215 1244264 7.66 -2.38
dk27 15 23882 11254565 8598.83 8549 3998148 15257.82 64.48 4338 2141622 795.90 80.97
C7552 20 257 180894 8.53 216 154448 27.52 14.62 367 274456 12.49 -51.72
bw 32 2585 3766784 2076.51 — — TO — — — TO —
The percentage improvements in the blocks Greedy and Hamming are with respect to the values for quantum
costs (q) in the block BDD based.
from (Soeken et al., 2012b). The truth table based approach has been re-implemented in
the RevKit program ‘young subgroup synthesis ’. The considered functions are taken from
the LGSynth’93 benchmarks (www.cbl.ncsu.edu:16080/benchmarks/lgsynth93/) and
from www.cs.uvic.ca/~dmaslov/. These functions are mainly irreversible and provided
in terms of their sum-of-product representation saved as PLA files. We have used the
embedding algorithm proposed in (Soeken et al., 2014b) to embed them as a reversible
function that is represented by means of the binary decision diagram of the characteristic
function. The time required for the embedding is not accounted for in the reported run-
times but can be obtained from (Soeken et al., 2014b). Since the embedding algorithm
produces a partial function we first applied our proposed approach as this is the only ap-
proach out of the three ones that supports partial functions. Based on the resulting circuit
we created a truth table representation using the RevKit program ‘circuit to truth table’
which then was used as input to the truth table based algorithm. For the QMDD based
algorithm we used the circuit as input to construct the QMDD from which a different
circuit is created. Since the truth table based approach and the proposed approach cre-
ate single-target gates we used exorcism (Mishchenko and Perkowski, 2001) to translate
them into mixed-polarity multiple-control Toffoli gates. The time required for this trans-
lation is included in the overall reported run-times. The QMDD based synthesis approach
directly creates MPMCT gates.
21
Table 1 presents the results. The first block of columns denotes the name of the
benchmark together with the number of circuit lines (n). For each algorithm a block lists
the number of Toffoli gates (d), the quantum costs in terms of T gates in a Clifford+T
mapping according to (Amy et al., 2013) (q), and the run-time (in seconds) required for
the whole synthesis (t).
It can be seen that our approach scales well and the performance it is comparable to
the QMDD based synthesis approach (fastest run-times are set in bold face). The truth
table based approach was never the fastest one and moreover, if the function has more
than 15 variables, the truth table based algorithm was not able to finish before the time-
out. It can be seen that the run-time correlates much more to the number of variables
for the truth table based approach compared to the proposed approach.
In addition to the better run-times, the proposed approach also leads to better results
with respect to the number of Toffoli gates and quantum costs. For the latter, in the
best case, we get an improvement of over 83% compared to the original truth table based
approach and over 91% compared to the QMDD based approach.
Remark 5. Due to Remark 1 there must exist a choice of assignments such that the
truth table based approach leads to the same results in terms of Toffoli costs as the
proposed approach. Hence, it seems that due to the symbolic decomposition described
in Algorithm D a better choice for the assignments is implicitly taken.
Remark 6. Most of the benchmarks used in Table 1 originally describe irreversible
functions and were embedded as reversible functions. We have used a different embedding
approach (Soeken et al., 2014b). This explains that in some cases a different number of
lines and hence also to a different number of gates is obtained compared to the values
given in (Soeken et al., 2012b).
10.2. Evaluating Variable Ordering Heuristics
We have evaluated the heuristics for variable orderings as discussed in Sect. 8 for the
same benchmarks as in the previous section and listed the results in Table 2. Both the
Greedy approach and the approach based on the hamming distance can achieve significant
improvement in quantum costs. The maximum improvement is over 92% (co14 ) for the
Greedy approach and over 80% (dk27 ) for the Hamming approach.
The quantum cost is not always improved. However, in case of the Greedy approach the
difference to the better solution is not too large. In fact, in these cases the gate count has
actually decreased (which is also the cost criteria in the implementation of the heuristics).
One can overcome this problem by changing the cost criteria in the implementation which
leads to a higher run-time of the algorithm. In case of the approach based on the hamming
distance, the quantum cost can also increase significantly (e.g. C7752 ).
The run-time for the Greedy approach is much higher whereas the run-time for the
Hamming approach is comparable to the original approach and usually correlates with
the achieved improvement. If a higher improvement can be achieved, usually less cycles
need to be resolved and hence the run-time decreases.
22
F =
n∧
i=1
yi ↔ xi (identity)
F =
n∧
i=1
yi ⊕ xi (invert)
F =
n∧
i=1
yi ↔ x(i+k) modn+1 (rotate)
F =
n∧
i=1
yi ↔


x¯i if i is odd,
x(i+2)modn+1 if i is even.
(invert or rotate)
F =
n/2∧
i=1
yi ↔ xi ∧ y2i ↔ (xi ⊕ x2i) (bitwise-xor)
Fig. 14. Functions for evaluating scalability
10.3. Evaluating Scalability
In order to further evaluate the scalability of the proposed approach we have created
BDDs of characteristic functions that represent reversible functions directly in memory.
Figure 14 lists the functions that have been used for this experiment and are parameter-
ized by the number of lines.
Note that for invert or rotate and bitwise-xor the number of lines n needs to be even.
We have implemented the experiment as the RevKit test-case ‘rcbdd scalability’. The
results are given by means of plots in Fig. 15. The values of the x-axis and y-axis denote
the number of lines and run-time in seconds, respectively. As can be seen functions with a
large number of variables can be synthesized with the proposed approach. The run-time
increases rapidly when the problem instances get larger and the effect is more noticeable
when the function is complex.
11. Conclusions
In this paper we presented an algorithm for ancilla-free synthesis of large reversible
functions. The reversible function is represented by its characteristic function using bi-
nary decision diagrams. This enables an efficient symbolic function manipulation. We
formalized a decomposition technique that has formerly been used in a truth table based
synthesis approach by means of co-factors on the characteristic function. This enabled
a synthesis approach that is applicable to significantly larger functions. Additionally, we
proposed heuristics to reduce gate cost and provide extensions to apply the algorithm to
partial functions.
An experimental evaluation demonstrates the applicability of the proposed approach to
large functions and also shows that the realized circuits lead to smaller circuits compared
to state-the-art synthesis approaches.
Most run-time is spent on resolving cycles. We are currently investigating how this
process can be done more efficiently. One way to overcome the efficiency problem is
to synthesize transpositions directly. This would however lead to circuits that are not
structured in a V-shape and as a result a linear number of single-target gates for their
representation is no longer guaranteed. An increase in gate count and circuit cost is also
23
100 200
0
20
40
Lines
T
im
e
(s
ec
o
n
d
s) identity
100 200
0
100
200
300
Lines
T
im
e
(s
ec
o
n
d
s) invert
100 200 300
0
500
1,000
Lines
T
im
e
(s
ec
o
n
d
s) invert or rotate
10 20
0
0.5
1
Lines
T
im
e
(s
ec
o
n
d
s) rotate k = 3
10 20
0
10
20
Lines
T
im
e
(s
ec
o
n
d
s) rotate k = 5
10 20
0
100
200
Lines
T
im
e
(s
ec
o
n
d
s) rotate k = 7
10 20 30
0
200
400
600
Lines
T
im
e
(s
ec
o
n
d
s) bitwise-xor
Fig. 15. Evaluating scalability
expected. Further future work considers an efficient representation of single-target gates
and also a direct mapping of them to quantum gates.
References
Abdessaied, N., Soeken, M., Thomsen, M. K., Drechsler, R., 2014. Upper bounds for
reversible circuits based on young subgroups. Inf. Process. Lett. 114 (6), 282–286.
Amy, M., Maslov, D., Mosca, M., Roetteler, M., 2013. A meet-in-the-middle algorithm
for fast synthesis of depth-optimal quantum circuits. IEEE Trans. on CAD 32 (6),
818–830.
Be´rut, A., Arakelyan, A., Petrosyan, A., Ciliberto, S., Dillenschneider, R., Lutz, E.,
2012. Experimental verification of Landauers principle linking information and ther-
modynamics. Nature 483, 187–189.
Brayton, R. K., Mishchenko, A., 2010. ABC: An academic industrial-strength verification
tool. In: Computer Aided Verification. pp. 24–40.
Bryant, R. E., 1986. Graph-based algorithms for Boolean function manipulation. IEEE
Trans. on Comp. 35 (8), 677–691.
De Vos, A., 2010. Reversible Computing - Fundamentals, Quantum Computing, and
Applications. Wiley, Weinheim.
De Vos, A., Van Rentergem, Y., 2008. Young subgroups for reversible computers. Ad-
vances in Mathematics of Communications 2 (2), 183–200.
24
Feynman, R. P., 1985. Quantum mechanical computers. Optics News 11, 11–20.
Knuth, D. E., 2011. The Art of Computer Programming. Vol. 4A. Addison-Wesley, Upper
Saddle River, New Jersey.
Landauer, R., 1961. Irreversibility and Heat Generation in the Computing Process. IBM
Journal of Research and Development 5 (3), 183–191.
Maslov, D., Dueck, G. W., Miller, D. M., 2005. Toffoli network synthesis with templates.
IEEE Trans. on CAD 24 (6), 807–817.
Maslov, D., Dueck, G. W., Miller, D. M., 2007. Techniques for the synthesis of reversible
Toffoli networks. ACM Trans. Design Autom. Electr. Syst. 12 (4), 42:1–42:28.
Miller, D. M., Maslov, D., Dueck, G. W., 2003. A transformation based algorithm for
reversible logic synthesis. In: Design Automation Conference. Vol. 40. pp. 318–323.
Miller, D. M., Thornton, M. A., 2006. QMDD: A decision diagram structure for reversible
and quantum circuits. In: Int’l Symp. on Multiple-Valued Logic. Vol. 36. p. 30.
Mishchenko, A., Perkowski, M., 2001. Fast heuristic minimization of exclusive sum-of-
products. In: Int’l Reed-Muller Workshop.
Saeedi, M., Markov, I. L., 2013. Synthesis and optimization of reversible circuits - a
survey. ACM Computing Surveys 45 (2), 21:1–21:34.
Saeedi, M., Zamani, M. S., Sedighi, M., Sasanian, Z., 2010. Reversible circuit synthesis
using a cycle-based approach. ACM Journal on Emerging Technologies 6 (4), 13.
Sasanian, Z., Saeedi, M., Sedighi, M., Zamani, M. S., 2009. A cycle-based synthesis algo-
rithm for reversible logic. In: Asia and South Pacific Design Automation Conference.
pp. 745–750.
Sasao, T., 1993. AND-EXOR expressions and their optimization. In: Sasao, T. (Ed.),
Logic Synthesis and Optimization. Kluwer Academic Publisher, pp. 287–312.
Shannon, C. E., 1938. A Symbolic Analysis of Relay and Switching Circuits. Trans.
American Institute of Electrical Engineers 57 (38–80), 713–723.
Shende, V. V., Prasad, A. K., Markov, I. L., Hayes, J. P., 2003. Synthesis of reversible
logic circuits. IEEE Trans. on CAD 22 (6), 710–722.
Soeken, M., Abdessaied, N., Drechsler, R., 2014a. A framework for reversible circuit
complexity. arXiv 1407.5878.
Soeken, M., Frehse, S., Wille, R., Drechsler, R., 2012a. RevKit: An open source toolkit
for the design of reversible circuits. Lecture Notes in Computer Science 7165, 64–76,
selected Papers from the Third International Workshop on Reversible Computation.
Soeken, M., Wille, R., Go¨ller, S., Keszocze, O., Miller, D. M., Drechsler, R., 2014b.
Embedding of large Boolean functions for reversible logic. arXiv 1408.3586.
Soeken, M., Wille, R., Hilken, C., Przigoda, N., Drechsler, R., 2012b. Synthesis of re-
versible circuits with minimal lines for large functions. In: Asia and South Pacific
Design Automation Conference. Vol. 17. pp. 85–92.
Somenzi, F., 2001. Efficient manipulation of decision diagrams. STTT 3 (2), 171–181.
Toffoli, T., 1980. Reversible computing. In: Colloquium on Automata, Languages and
Programming. Vol. 7. pp. 632–644.
Touati, H. J., Savoj, H., Lin, B., Brayton, R. K., Sangiovanni-Vincentelli, A. L., 1990.
Implicit state enumeration of finite state machines using BDDs. In: Int’l Conf. on
Computer Aided Design. Vol. 9. pp. 130–133.
Wille, R., Drechsler, R., 2009. BDD-based synthesis of reversible logic for large functions.
In: Design Automation Conference. pp. 270–275.
25
