Design Automation for Adiabatic Circuits by Zulehner, Alwin et al.
arXiv:1809.02421v2 [cs.ET] 5 Nov 2018
Design Automation for Adiabatic Circuits
Alwin Zulehner¹, Michael P. Frank², Robert Wille¹
¹ Institute for Integrated Circuits, Johannes Kepler University Linz, Austria
² Center for Computing Research, Sandia National Laboratories, Albuquerque, USA
alwin.zulehner@jku.at, mpfrank@sandia.gov, robert.wille@jku.at
ABSTRACT
Adiabatic circuits are heavily investigated since they allow for com-
putations with an asymptotically close to zero energy dissipation
per operation—serving as an alternative technology for many sce-
narios where energy efficiency is preferred over fast execution.
Their concepts are motivated by the fact that the information lost
in conventional circuits results in an entropy increase, which
causes energy dissipation. To overcome this issue, computations
are performed in a (conditionally) reversible fashion and must
additionally satisfy switching rules that differ from those of
conventional circuitry, which calls for dedicated design automation
solutions. While previous approaches focus either on the electrical
realization (resulting in small, hand-crafted circuits only) or
on designing fully reversible building blocks (an unnecessary
overhead), this work aims to provide an automatic and dedicated
design scheme that explicitly takes the recent findings in this domain
into account. To this end, we review the theoretical and technical
background of adiabatic circuits and present automated methods
dedicated to realizing the desired function as an adiabatic circuit.
The resulting methods are further optimized, leading to an automatic
and efficient design flow for this promising technology.
Evaluations confirm the benefits and applicability of the proposed
solution.
1 INTRODUCTION
As we approach the end of the semiconductor roadmap [1], we
are entering a regime in which fundamental thermodynamic
considerations limit the sub-threshold slope, practical switching
voltages, and gate energies. This implies that further downscaling of
device sizes and gate capacitances will soon no longer yield
improvements in energy efficiency for conventional logic. Industry’s shift
towards 3D geometries [1] will somewhat reduce parasitic energy
losses in circuit structures, but once that line of improvements is
played out, the only remaining approach to further increase energy
efficiency will be to begin applying techniques of energy recovery.
In this regard, resonant circuit techniques that recycle and reuse
logic signal energies, rather than dissipating the entire (1/2)CV^2
circuit node energy on each logic-level transition, are promising.
Unlike for all other options, no fundamental theoretical limits on the
ultimate energy efficiency are currently known for energy recovery,
which offers a path towards future growth of computing
performance within any given energy dissipation constraints.
However, the ideal of 100% energy recovery implies that all
switching activity of a device must be carried out in a manner that
is asymptotically adiabatic, i.e. avoiding any abrupt loss of signal
energy to heat. This motivated the consideration of adiabatic circuits,
which allow for computations with an asymptotically close to zero
energy dissipation (at the expense of a slower execution). Due to
Landauer’s limit [11], this in turn implies that the computational
function of the switching circuit must be logically reversible, in the
appropriately generalized sense discussed in [6]. Otherwise, the
information lost from a conventional circuit leads to an entropy
increase and, therefore, to an irreducible energy dissipation. This
was recently also advocated to a larger community in [7], which states
that the future of computing depends on reversible computations.
While these concepts have been around for a while (general
techniques for designing fully-adiabatic and reversible circuits
were introduced in the 1990s and resulted in a large body of
literature, see e.g. [8, 9, 13, 17]), most of the adiabatic design
families that have been proposed contain flaws preventing
them from being truly adiabatic [5]. In this regard, two-level
adiabatic logic (2LAL, as proposed in [2]) represents a very
promising, fully-adiabatic transmission-gate logic family that relies on
simple but rather efficient building blocks. However, realizing
correct adiabatic and reversible circuit designs that could truly
approach arbitrarily low levels of energy dissipation requires
satisfying certain switching rules which differ from those of
conventional circuit design. This calls for automated approaches to the
design of such adiabatic circuits. This direction has recently
also gained relevance in industry, triggered e.g. by investments of
funding agencies and national departments [7]. Accordingly,
researchers have started to work towards such solutions.
However, previously proposed approaches focus either on the
electrical realization (see e.g. [2, 17]) or on designing purely
reversible building blocks like Toffoli gates (see e.g. [15, 16] in
combination with corresponding synthesis approaches such as [18, 19]).
While the former approaches are restricted to small, hand-crafted
circuits, relying on purely reversible building blocks results
in an unnecessarily large overhead. Instead, recent findings
(summarized in [6]) show that conditional reversibility is sufficient for
adiabatic circuits. But thus far, no design automation approach for
adiabatic circuits exists that exploits this in an automatic fashion.
In this work, we overcome this issue by combining expertise
from both adiabatic circuits and design automation. More
precisely, we review the theoretical and technical background of
adiabatic circuits and, based on that, propose an automatic and
dedicated design flow for this promising technology. Two
complementary design styles (namely retractile and fully-pipelined) are
considered, which allow for the generation of adiabatic circuits
focusing either on reducing the number of gates or on keeping
the number of so-called power clocks small. Furthermore,
optimizations for both design styles are proposed which utilize
application-specific properties and, by this, allow e.g. for a reduction in the
number of gates by approx. 37% and 30% on average for the
retractile and fully-pipelined design styles, respectively. Evaluations
confirm the benefits and applicability of the proposed solution.
The remainder of this work is structured as follows: Section 2
provides a review of the theoretical and technical background of
adiabatic circuits. Based on that, the proposed design flow is
introduced in Section 3, followed by the descriptions of the respective
mapping methods for the retractile and fully-pipelined
design styles in Section 4 and Section 5, respectively. Finally, a
summary of the results from our evaluations is given in Section 6 and
the paper is concluded in Section 7.
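Figure 1: Transmission gate for dual-rail signals (compact symbol with terminals A, B and control P; expanded form with rails AN, BN, AP, BP and controls PN, PP)
Figure 2: Adiabatic gates: (a) OR gate, Y = A + B; (b) AND gate, Y1 = A · B, Y2 = A (both driven by power clock ϕ_i)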
2 ADIABATIC CIRCUITS
In this work, we consider design automation for adiabatic circuits
according to the two-level adiabatic logic (2LAL, [2]) circuit family.
This type of adiabatic circuit uses only two different voltage levels
and heavily relies on transmission gates. Furthermore, a dual-rail
encoding is used for the signals of the circuit, i.e. each signal occurs
in uncomplemented as well as in complemented form.¹
Fig. 1 provides the notation for transmission gates: If the signal
P is 1 (the gate is turned on), A and B are connected.² Otherwise
(the gate is turned off), A and B are not connected. Since A and B
are both encoded in a dual-rail fashion and, thus, have an
uncomplemented as well as a complemented form, two transmission gates
as shown on the right-hand side of Fig. 1 are required.³ The
general switching rules for transistors in adiabatic circuits (e.g. outlined
in [2, 5]) imply that a transmission gate shall never be turned on if
A and B have different values.
Besides that, so-called power clocks (denoted ϕ_i) are additionally
utilized to realize typical functions such as OR or AND. More
precisely, the inputs of the gate control a network of transmission
gates which connects the output Y of the gate to one of the power
clocks ϕ_i in case the function to be realized evaluates to 1. To
obey the switching rules, the output Y of the gate as well as the
power clock ϕ_i are assumed to be 0 initially. By transitioning the
power clock to 1, the output of the gate is set to the desired value.
Moreover, when resetting all inputs of a gate to 0 (and, thus,
disconnecting ϕ_i and Y) while ϕ_i is still 1, the output preserves its
value (even if ϕ_i is reset to 0 afterwards). This allows for an
inherent latching of an output value to be used by following gates.
An example illustrates the idea:
Example 1. Fig. 2 shows the 2LAL realization of an OR gate and
an AND gate. The OR gate is composed of two parallel transmission
gates whose outputs are connected. In case A = 1 (B = 1), the
upper (the lower) transmission gate is turned on and connects the
power clock ϕ_i to the output Y. Consequently, Y is connected to ϕ_i
if A + B = 1. Transitioning ϕ_i to 1 sets Y to the desired value. If we
now reset the inputs A and B to 0, the output Y is latched: its value is
preserved even when setting ϕ_i back to 0 afterwards. The AND gate
is realized similarly as a sequence of two transmission gates. Note
that a second output Y2 = A is required in this case to operate the
gate in an adiabatic fashion in case A = 1 and B = 0 when used in a
fully-pipelined circuit (cf. Section 5).
¹ The uncomplemented form of a signal is labeled with a subscript N and the
complemented form is labeled with a subscript P.
² Note that logic 1 (i.e. X = 1) is realized by XN = 1 and XP = 0 since a dual-rail
encoding is employed.
³ For the sake of simplicity, we abstract the two transmission gates in the following
illustrations and use the more compact form shown on the left-hand side of Fig. 1
instead.
Once the output of a gate is not needed anymore (e.g. by a
following gate), an essential step for adiabatic circuits is the ability
to decompute it, i.e. to feed charge back to the power clocks. In case
the output was not latched (i.e. the output is still connected to
the power clock), it is decomputable by simply resetting the power
clock to 0 (as discussed above). If the output is latched (i.e. it was
disconnected from the power clock by setting the inputs back to 0),
the power clock has to be transitioned to 1 as well before the
inputs are applied, in order to obey the switching rules. Then, the
output is decomputable by transitioning the power clock back to
0.
Example 2. Consider again the 2LAL realization of an OR gate
(cf. Fig. 2a). Assume that the output Y = A + B of the gate is latched
and that all other signals are set to 0. To unlatch the output Y, we first
have to set the power clock ϕ_i to 1. By this, ϕ_i and Y have the same
value when they get connected by resetting the inputs to their original
value. Then, Y is decomputable by changing the power clock ϕ_i back
to 0: the charge representing Y = 1 is fed back to the power supply.
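
The compute, latch, and decompute sequence of Examples 1 and 2 can be traced with a small digital model. The following Python sketch is only an idealized abstraction under stated assumptions: every transmission gate is a triple (control, clock, node), a node follows its clock while the gate is on, and the switching rule is checked after each change. The names `set_signal`, `OR_GATE`, `compute_and_latch`, and `decompute` are hypothetical and do not come from the paper.

```python
# Idealized digital model of the 2LAL OR gate (assumption: no analog effects).
# Each transmission gate is (control, clock, node): while control == 1,
# the node is tied to the clock.

class SwitchingRuleViolation(Exception):
    pass

def set_signal(state, gates, name, value):
    """Change one signal and let connected nodes follow it."""
    # Nodes tied to `name` through gates that are currently on follow it.
    followers = [node for ctrl, clk, node in gates
                 if clk == name and state[ctrl] == 1]
    state[name] = value
    for node in followers:
        state[node] = value
    # Switching rule: a gate that is on must connect equal values.
    for ctrl, clk, node in gates:
        if state[ctrl] == 1 and state[clk] != state[node]:
            raise SwitchingRuleViolation(f"{clk} != {node} while {ctrl} = 1")

OR_GATE = [("A", "phi", "Y"), ("B", "phi", "Y")]

def compute_and_latch(a, b):
    state = {"A": 0, "B": 0, "phi": 0, "Y": 0}
    set_signal(state, OR_GATE, "A", a)    # apply inputs while phi = Y = 0
    set_signal(state, OR_GATE, "B", b)
    set_signal(state, OR_GATE, "phi", 1)  # Y follows phi if A + B = 1
    set_signal(state, OR_GATE, "A", 0)    # reset inputs: Y is disconnected
    set_signal(state, OR_GATE, "B", 0)
    set_signal(state, OR_GATE, "phi", 0)  # Y keeps its value (latched)
    return state

def decompute(state, a, b):
    set_signal(state, OR_GATE, "phi", 1)  # raise the clock first
    set_signal(state, OR_GATE, "A", a)    # then re-apply the original inputs
    set_signal(state, OR_GATE, "B", b)
    set_signal(state, OR_GATE, "phi", 0)  # Y follows phi back to 0
    set_signal(state, OR_GATE, "A", 0)
    set_signal(state, OR_GATE, "B", 0)
    return state
```

In this model, re-applying an input to a latched gate before raising the clock raises `SwitchingRuleViolation`, which mirrors the ordering constraint described in Example 2.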
Following this main principle allows for conducting operations
with an asymptotically close to zero energy dissipation (at the ex-
pense of a slower execution since more steps have to be conducted).
In fact, in contrast to conventional circuits in which energy is fre-
quently “grounded”, adiabatic circuits allow for feeding energy back
to the clocks providing the power supply.
However, this concept of feeding back charge to the power clocks
by decomputing signals demands logical reversibility of the
underlying computations. This is because, in order to not violate
the switching rules, the original input assignments have to be
applied so that signals with different values are never connected
(cf. Example 2). While in the past, a purely reversible scheme has been
assumed (see e.g. [15, 16]), findings recently summarized in [6]
showed that conditional reversibility is actually sufficient for
adiabatic circuits. Again, this is illustrated by means of an example:
Example 3. Consider again the OR gate shown in Fig. 2a.
Considering the state of the signals A, B, and Y, the gate describes a function
f : 𝔹^3 → 𝔹^3, (A, B, Y) ↦ (A, B, A + B). This function is not
reversible in general, since the initial value of Y cannot be computed
from the output values. However, the function is conditionally
reversible under the precondition that the value of Y is initially set to
0, i.e. an input combination like (1, 0, 1) can never occur.
Conditional reversibility is a much weaker constraint than unconditional
reversibility (as e.g. considered in [15, 16]), which makes it possible to
realize adiabatic gates such as those shown in Fig. 2.
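
The distinction made in Example 3 can be checked by enumeration. The following Python sketch treats the OR gate as the map (A, B, Y) ↦ (A, B, A + B) and tests injectivity on all of 𝔹^3 and on the subset with Y = 0; the helper names are illustrative assumptions.

```python
from itertools import product

def or_gate(a, b, y):
    # State transition of the OR gate on the signals (A, B, Y).
    return (a, b, a | b)

def is_injective(f, domain):
    # A function is reversible on a domain iff no two inputs share an image.
    images = [f(*x) for x in domain]
    return len(set(images)) == len(images)

all_states = list(product((0, 1), repeat=3))
y_is_zero = [s for s in all_states if s[2] == 0]

unconditional = is_injective(or_gate, all_states)  # all 8 input states
conditional = is_injective(or_gate, y_is_zero)     # only states with Y = 0
```

Here `unconditional` evaluates to False because (A, B, 0) and (A, B, 1) share the same image, while `conditional` evaluates to True.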
Obviously, conducting computations in such a fashion requires
the corresponding circuits to be designed in a significantly
different fashion than conventional circuitry. Besides the generation of
a proper netlist composed of transmission gates, this additionally
requires dedicated power clocks which trigger the
required operations at the correct point in time. Moreover, the
design objectives change as well. While the number of required
(transmission) gates is still a factor (e.g. to approximate the required area),
their impact on energy consumption is smaller than for
conventional circuits. This is because energy is never grounded in
adiabatic circuits but frequently fed back to the power supply as
described above. In contrast, the number of power clocks is much
more crucial as they are the entities which actually require
energy and whose waveforms might be hard to generate. Besides that,
more clocks usually also entail longer execution times.
3 PROPOSED DESIGN FLOW
As discussed above, previous methods for designing adiabatic
circuits (e.g. [15, 16]) assumed the requirement of full
reversibility. As recently discussed in [6], this leads to a significant
overhead and is not necessarily needed. In fact, conditional
reversibility as reviewed above is sufficient and constitutes a much
weaker constraint. However, thus far, no design automation for
this kind of adiabatic circuit exists. Also, solely employing
conventional design solutions is not an option since, besides the pure
functionality, a dedicated mapping and clocking scheme is required.
In this work, we present different methods which address these
issues. All of them employ a two-stage process. The first
step is similar to the design of conventional circuits: We realize
the function to be synthesized with respect to a certain logic gate
library. Afterwards, the resulting netlist is mapped to an adiabatic
circuit which satisfies the rules and optimizes the objectives
reviewed in Section 2.
Figure 3: Graph representations for Boolean functions: (a) AND-Inverter graph, (b) OR-Inverter graph (inputs x2, x1, x0; nodes 1 to 7; outputs y1, y0)
For the first part, we utilize a solution based on AND-Inverter
Graphs (AIGs [10]) which realize the function to be synthesized in
terms of NAND gates.⁴ AIGs allow for a graph-based
representation of Boolean functions. The graph has one root node for each
output of the function. The inputs of the function are provided as
terminals. Each intermediate node of an AIG represents an AND
operation and, thus, has two successors. To gain universality,
the inputs of the AND operation can be inverted. This is denoted
by black circles on the respective edges. Equal nodes occur
frequently and can be shared, which allows for a compact representation
of the function to be realized.
Example 4. Fig. 3a shows the AIG of a 3-input, 2-output Boolean
function with inputs x2, x1, and x0 as well as outputs y1 and y0, which
represents y1 = x2x1 + x2x0 + x1x0 and y0 = x2x1 + x1x0 + x2x1x0
in terms of an AIG and, hence, NAND operations.
How to determine and optimize an AIG (e.g. minimizing its
number of nodes/gates) has been considered intensely in the literature
(see e.g. [14]) and, hence, is not covered further in the following.
Instead, we focus on the second step, i.e. how to map the resulting
NAND netlist to an adiabatic circuit, i.e. a network of transmission
gates and the corresponding power clocks. To this end, we
translate the AIG into an OR-Inverter graph (OIG) so that a NOR gate
netlist results. An OIG can easily be derived from an AIG by
applying De Morgan’s laws, i.e. by relabeling the inner nodes
from AND to OR and inverting the polarity of the edges to the
terminals and the edges to the root nodes (cf. Fig. 3b).
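
The relabeling step can be sketched in a few lines of Python. The graph below is a hypothetical AIG for the three-input majority function, chosen only for illustration and not claimed to match Fig. 3a; the data layout (each node maps to two `(child, inverted)` edges) and the function names are assumptions as well. The loop at the end checks that the AIG and the derived OIG agree on every input assignment.

```python
from itertools import product

INPUTS = ("x2", "x1", "x0")

# Hypothetical AIG for the majority function x2x1 + x2x0 + x1x0.
# Each node maps to two edges (child, inverted); outputs are edges as well.
AIG_NODES = {
    "n1": (("x2", False), ("x1", False)),
    "n2": (("x2", False), ("x0", False)),
    "n3": (("x1", False), ("x0", False)),
    "n4": (("n1", True), ("n2", True)),
    "n5": (("n4", False), ("n3", True)),
}
AIG_OUTPUTS = {"y": ("n5", True)}

def aig_to_oig(nodes, outputs):
    """Relabel AND nodes as OR nodes; flip terminal edges and root edges."""
    def flip_if_terminal(edge):
        child, inv = edge
        return (child, not inv) if child in INPUTS else (child, inv)
    oig_nodes = {n: tuple(flip_if_terminal(e) for e in edges)
                 for n, edges in nodes.items()}
    # A root edge ending directly in a terminal would be flipped twice,
    # so it is left unchanged.
    oig_outputs = {o: (child, inv if child in INPUTS else not inv)
                   for o, (child, inv) in outputs.items()}
    return oig_nodes, oig_outputs

def evaluate(nodes, outputs, assignment, op):
    """Evaluate all outputs, using `op` as the node operation."""
    values = dict(assignment)
    def value(edge):
        child, inv = edge
        if child not in values:
            left, right = nodes[child]
            values[child] = op(value(left), value(right))
        return values[child] ^ inv
    return {o: value(e) for o, e in outputs.items()}

OIG_NODES, OIG_OUTPUTS = aig_to_oig(AIG_NODES, AIG_OUTPUTS)
for bits in product((0, 1), repeat=3):
    assignment = dict(zip(INPUTS, bits))
    aig = evaluate(AIG_NODES, AIG_OUTPUTS, assignment, lambda a, b: a & b)
    oig = evaluate(OIG_NODES, OIG_OUTPUTS, assignment, lambda a, b: a | b)
    assert aig == oig
```

Edges between inner nodes keep their polarity because each OR node computes the complement of the AND node it replaces.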
Now, the nodes of an OIG can directly be mapped to the
adiabatic OR gates introduced before in Fig. 2a. However, it remains
open and non-trivial how to connect these gates to the power clocks
and how to generate the corresponding waveforms of these clocks
(again, following the switching rules and optimization objectives
reviewed in Section 2). To this end, two (complementary) design
styles are considered: retractile circuits (cf. [8]) as well as fully-pipelined
circuits (cf. [2, 6, 17]). Note that for both design styles the
conditional reversibility is inherently satisfied by preserving the inputs
of the signals throughout the whole computation and by assuming
that all additional (intermediate) signals are initially set to 0. In
the following sections, we discuss advantages and disadvantages
of both design styles and present corresponding (automatic) mapping
schemes. More precisely, for each design style we first describe a
straightforward mapping scheme (conveying the main idea of the
design style) followed by an advanced mapping scheme (which
results in a significantly smaller number of gates as well as, in the case
of retractile circuits, in a smaller number of power clocks). These
considerations motivate the implementation of different
methods for design automation of adiabatic circuits, whose
performance is discussed in Section 6.
⁴ Note that the design methods proposed in this work can correspondingly be adjusted
to any other synthesis solution and, hence, logic gate library as well.
Figure 4: Synthesized retractile circuit: (a) Circuit (inputs x2, x1, x0; gates g1 to g7; outputs y1, y0; stages s0 to s3), (b) Power clocks (ϕ0 to ϕ3 over time steps t = 0 to 9)
4 RETRACTILE CIRCUITS
4.1 Straightforward Solution
The straightforward mapping for retractile circuits is similar to
conventional circuitry, where an AIG or OIG is directly mapped
to the target technology. In fact, we can realize each node of the
OIG with an OR gate and negations with inverters. Moreover, in
the case of adiabatic circuits, the inverters come “for free” since we
are operating on dual-rail signals and, hence, an inverted input
can easily be realized with no further hardware by swapping the
rails of the signal.
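
A minimal Python sketch of this rail swap follows. It assumes that logic 0 is encoded as the complement of the logic 1 encoding given in the footnotes (X_N = 0, X_P = 1); the type and function names are illustrative.

```python
from typing import NamedTuple

class DualRail(NamedTuple):
    n: int  # uncomplemented rail (X_N)
    p: int  # complemented rail (X_P)

def encode(bit):
    # Logic 1 is X_N = 1, X_P = 0; logic 0 is assumed to be the opposite.
    return DualRail(n=bit, p=1 - bit)

def invert(signal):
    # Inversion needs no gate: the two rails are simply swapped.
    return DualRail(n=signal.p, p=signal.n)
```

Applying `invert` twice returns the original signal, which is why an inverted edge in the OIG adds no transmission gates in this model.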
Example 5. Consider again the OIG depicted in Fig. 3b. Mapping
the OIG to conventional gates results in the circuit shown in Fig. 4a.
When this mapping is done for adiabatic circuits following the retractile
design style, each OR gate is realized with two transmission gates as
discussed in Section 2.
To operate the circuit in an adiabatic fashion, all intermediate
signals are first initialized with 0. Furthermore, each stage s_i (0 ≤
i < N) of the circuit with depth N has an associated dual-rail
encoded clock ϕ_i, which allows the individual stages to be computed
sequentially. Then, the computations are started by transitioning
the 0th clock from 0 to 1, which triggers the desired operations of the
first stage. Once stable, the operations of the next stages are
sequentially triggered. To allow for decomputing the intermediate
results, the clocks transition back to 0 in reverse order, i.e. first the
(N−1)-th clock is set back to 0, then the other ones. This way, all
intermediate results are decomputed and restored back to 0. Overall,
this requires 2N + 1 time steps for a single computation (assuming
one additional time step is required to process the outputs of
the circuit). During these time steps, the inputs have to remain
constant, which yields a rather low throughput.
Example 5 (continued). Since the resulting circuit has four stages
(the OIG has a depth of 4), we need four different clocks (eight if we
take the dual-rail encoding into account). The waveforms of these
clocks are shown in Fig. 4b. Overall, a single computation
of this circuit requires 9 time steps.
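
The clocking scheme described above can be written down as a small schedule generator. The Python sketch below assumes one particular timing that is consistent with the 2N + 1 count: clock i rises during step i, the outputs are processed during step N, and clock i falls during step 2N − i. The function name and the phase labels are illustrative assumptions.

```python
def retractile_schedule(num_stages):
    """Return, per time step, the phase of each power clock.

    Phases: "0" (low), "rise", "1" (high), "fall".
    """
    n = num_stages
    total_steps = 2 * n + 1
    schedule = []
    for t in range(total_steps):
        phases = []
        for i in range(n):
            rise_step, fall_step = i, 2 * n - i
            if t == rise_step:
                phases.append("rise")
            elif t == fall_step:
                phases.append("fall")
            elif rise_step < t < fall_step:
                phases.append("1")   # clock held high
            else:
                phases.append("0")   # clock held low
        schedule.append(phases)
    return schedule
```

For `num_stages = 4` the schedule has 9 steps, in line with Example 5; in step 4 all clocks are high, which corresponds to the extra step for processing the outputs.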
Figure 5: Rules for optimization (Rule 1: both children have fanout = 1; Rule 2: one child has fanout ≠ 1, the other has fanout = 1)
Figure 6: Optimized retractile circuit (inputs x2, x1, x0; gates g1 to g6; outputs y0, y1; power clocks ϕ0, ϕ1 over time steps t = 0 to 5)
4.2 Advanced Solution
The straightforward mapping described above can be optimized
significantly to reduce the number of required transmission gates and
power clocks. The optimized mapping scheme is motivated by an
analysis of the realization of an OR gate, which is composed of
two parallel buffers (i.e. transmission gates) whose outputs are
connected (cf. Fig. 2a). Consequently, an OR gate with multiple
inputs can be generated by adding further buffers in parallel. This
way, each OIG node whose children both have a fanout of 1 (and
which, thus, represents a 4-input OR gate) can be realized in a single stage
of the circuit composed of two 2-input OR gates whose outputs are
connected. A similar optimization can be performed for OIG nodes
where only one of the children has a fanout of 1. Here, one buffer
is required for the child which has a fanout larger than one (in
order to avoid sneak paths). Additionally, the gate representing
the child with fanout 1 has to be lifted to the next stage of the
circuit since both the buffer and the gate have to be operated
by the same power clock to allow for an adiabatic computation.
The optimization rules are shown in Fig. 5 and denoted Rule 1 and
Rule 2 in the following.
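
Deciding where the two rules apply only requires the fanout of every signal. The Python sketch below works on a hypothetical OIG (not the one from Fig. 3b) and ignores edge polarities, so the handling of inverted inputs is not covered; it also assumes that a primary output counts as one use of a signal. All names are illustrative.

```python
from collections import Counter

# Hypothetical OIG: node -> (left child, right child); edge polarity omitted.
OIG = {
    "g1": ("x1", "x0"),
    "g2": ("x2", "x1"),
    "g3": ("g1", "g2"),
    "g4": ("g2", "x2"),
    "g5": ("g3", "g4"),
}
OUTPUTS = ["g5"]

def fanout(oig, outputs):
    """Count how often each signal is used by a node or as an output."""
    counts = Counter(outputs)
    for children in oig.values():
        counts.update(children)
    return counts

def applicable_rule(node, oig, counts):
    """Return "Rule 1", "Rule 2", or None for the given OIG node."""
    single = [counts[child] == 1 for child in oig[node]]
    if all(single):
        return "Rule 1"   # both children have fanout 1
    if any(single):
        return "Rule 2"   # exactly one child has fanout 1
    return None

counts = fanout(OIG, OUTPUTS)
rules = {node: applicable_rule(node, OIG, counts) for node in OIG}
```

In this hypothetical graph, `g5` qualifies for Rule 1, `g1` and `g3` for Rule 2, and `g2` and `g4` for neither.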
Note that one has to be careful when applying the rules if the
corresponding input of the gate is inverted. In this case, the
inversion has to be pushed towards the inputs. This is possible by
applying De Morgan’s law (¬(a + b) = ¬a · ¬b). Consequently, we have
to invert the inputs on this level and exchange the OR gate with
an AND gate.⁵
Example 6. Consider again the circuit shown in Fig. 4. The
children of gate g7 (i.e. g4 and g5) both have a fanout of 1. Consequently,
we can apply Rule 1 to remove g7. Furthermore, one child of gate
g3 has a fanout of 1 (i.e. the input x0). Consequently, we can apply
Rule 2 for gate g3. The resulting (optimized) circuit is shown in Fig. 6.
Since both inputs of g7 and one input of g3 are inverted, we have to
apply De Morgan’s law. Consequently, the gates g1, g4, and g5 are
transformed into AND gates. The resulting circuit requires only
11 transmission gates and has only two stages (and, thus,
requires only two different dual-rail encoded clocks).
5 FULLY-PIPELINED CIRCUITS
The main disadvantages of the retractile circuits considered in
Section 4 are that many different power clocks are required (one for
each stage) and that a computation can be conducted only every
2N + 1 time steps, resulting in a rather low throughput. These
issues can be avoided by using fully-pipelined circuits. In
conventional design, this would require a register after each stage of the
circuit. For the adiabatic circuits considered here, however, this is
not necessary, because the gates inherently allow for latching their
output (cf. Section 2). In fact, we only have to compute the outputs
of a stage s_i while decomputing the signals of stage s_{i-1} (i.e.
resetting them back to 0). This way, only two different power clocks
(four if we take the dual-rail encoding into account) are required
(independent of the circuit depth) and computations can be
conducted in a pipelined fashion (leading to a much higher
throughput).
⁵ Note that this is also possible if there are several subsequent nodes for which the
rules can be applied.
Figure 7: Buffer element for fully-pipelined circuits: (a) Transmission gates (signals x_{t-1}, x_t; clocks ϕ0, ϕ1), (b) Clocks (ϕ0, ϕ1 over time steps t = 0 to 4)
To realize this, however, the functions computed in the
individual stages have to be (conditionally) reversible. This can easily be
achieved by forwarding all the input signals of stage s_{i-1} to
stage s_i by using buffers. The following example illustrates the
idea of such buffers.
Example 7. Fig. 7a shows the structure of a buffer that sets x_t = x_{t-1}
while decomputing x_{t-1} (i.e. while resetting x_{t-1} back to 0). Initially,
both clocks ϕ0 and ϕ1 as well as x_t are set to 0. If x_{t-1} = 1, the
transmission gate on the right connects ϕ1 with x_t. In the first time
step, ϕ0 transitions to 1 (cf. Fig. 7b). Afterwards, ϕ1 transitions to 1,
setting x_t = x_{t-1}. If x_t = x_{t-1} = 1, the transmission gate on the
left-hand side in Fig. 7a connects ϕ0 with x_{t-1}. This does not violate
the switching rules discussed in Section 2 since ϕ0 is also 1. In the
next time step, ϕ0 transitions back to 0, decomputing x_{t-1} and, thus,
disconnecting ϕ1 and x_t. Consequently, the output x_t remains at its
voltage level when ϕ1 eventually transitions back to 0: the output
is latched.
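
The clock sequence of Example 7 can be replayed with an idealized digital model. In the Python sketch below, each transmission gate is a triple (control, clock, node), a node follows its clock while the gate is on, and the switching rule is checked after every transition. This abstraction and the names `set_signal`, `BUFFER`, and `run_buffer` are assumptions made for illustration, not the paper's implementation.

```python
def set_signal(state, gates, name, value):
    """Change one signal, let connected nodes follow, check the switching rule."""
    followers = [node for ctrl, clk, node in gates
                 if clk == name and state[ctrl] == 1]
    state[name] = value
    for node in followers:
        state[node] = value
    for ctrl, clk, node in gates:
        if state[ctrl] == 1 and state[clk] != state[node]:
            raise RuntimeError(f"switching rule violated at {node}")

BUFFER = [
    ("x_prev", "phi1", "x_t"),   # right gate: x_{t-1} ties x_t to phi1
    ("x_t", "phi0", "x_prev"),   # left gate: x_t ties x_{t-1} to phi0
]

def run_buffer(x_prev):
    """Replay the four clock transitions; return the state after each one."""
    state = {"phi0": 0, "phi1": 0, "x_prev": x_prev, "x_t": 0}
    trace = []
    for clock, value in [("phi0", 1), ("phi1", 1), ("phi0", 0), ("phi1", 0)]:
        set_signal(state, BUFFER, clock, value)
        trace.append(dict(state))
    return trace
```

For `x_prev = 1`, the final state has `x_t = 1` and `x_prev = 0` with both clocks back at 0, i.e. the value has moved one stage forward and the previous stage has been decomputed.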
To allow for inverted inputs of gates, a quad-rail encoding is
required for the signals to properly decompute the inputs [2]. Here,
each signal X is represented by two dual-rail signals (one for X = 1
and one for X = 0). Initially, both dual-rail signals are set to 0. This
again allows inverters to be realized without any transmission gates
by just swapping the two dual-rail signals for X = 1 and X = 0. In the
following, we again abstract from this fact when illustrating the required
transmission gates.
5.1 Straightforward Solution
As for retractile circuits, we again map the OIG nodes to an
adiabatic realization of an OR gate. As mentioned above, this requires
realizing each OR gate as shown in Fig. 8a.⁶ This way, the
signals from stage s_{t-1} (e.g. A_{t-1} and B_{t-1}) serve as input to compute
(A+B)_t = A_{t-1} + B_{t-1}. Since (A+B)_t is driven by clock ϕ1, its value
is inherently latched. In fact, the input signals A_{t-1} and B_{t-1} are
reset to 0 by the corresponding buffers (disconnecting ϕ1 and (A+B)_t)
before the clock ϕ1 transitions back to 0.
Now, in contrast to retractile circuits, new hardware is required
to decompute the result (after e.g. copying it elsewhere) since the
stages of the pipeline already contain the values of the next
computation. The (conditionally) reversible function calculated by the
pipeline is F = f_{N-1} ◦ f_{N-2} ◦ ··· ◦ f_0, where f_i is the conditionally
reversible function computed by stage s_i.⁷ Since the function f_i
computed by each stage is conditionally reversible, the inverse of
F (i.e. F^{-1}) exists and is determined by
F^{-1} = f_0^{-1} ◦ f_1^{-1} ◦ ··· ◦ f_{N-1}^{-1}.
The inverse f_i^{-1} of the function f_i computed by stage s_i can be
easily realized by duplicating the hardware for stage s_i and
connecting the power clocks ϕ0 and ϕ1 in opposite fashion (as shown
for an OR gate in Fig. 8b). Consequently, decomputing the results
requires doubling the depth of the pipeline and, thus, doubles the
number of required transmission gates.
⁶ Signals with fanout do not have to be buffered multiple times.
⁷ Note that ◦ denotes functional composition, i.e. (g ◦ f)(x) = g(f(x)).
Figure 8: OR gate for fully-pipelined circuits: (a) Computing OR (inputs A_{t-1}, B_{t-1}; outputs A_t, B_t, (A + B)_t; clocks ϕ0, ϕ1), (b) Decomputing OR (clocks ϕ0 and ϕ1 exchanged)
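
The order of the inverses can be made concrete with a small functional sketch in Python. Each stage below forwards all of its input signals and appends one OR result, so it is invertible on its image; the two-stage pipeline and the names `make_or_stage` and `compose` are hypothetical and serve only to illustrate F^{-1} = f_0^{-1} ◦ f_1^{-1}.

```python
from functools import reduce
from itertools import product

def make_or_stage(i, j):
    """Stage that forwards all signals and appends signals[i] OR signals[j]."""
    def forward(signals):
        return signals + (signals[i] | signals[j],)
    def inverse(signals):
        *rest, y = signals
        # The inverse is only defined on states the stage can produce.
        assert y == (rest[i] | rest[j]), "state outside the stage's image"
        return tuple(rest)
    return forward, inverse

def compose(functions):
    # compose([f0, f1, ...]) applies f0 first, then f1, and so on.
    return lambda x: reduce(lambda acc, f: f(acc), functions, x)

# Hypothetical two-stage pipeline on inputs (x2, x1, x0).
stages = [make_or_stage(1, 2), make_or_stage(0, 3)]
F = compose([fwd for fwd, _ in stages])                # F = f_1 ◦ f_0
F_inv = compose([inv for _, inv in reversed(stages)])  # F^{-1} = f_0^{-1} ◦ f_1^{-1}

for x in product((0, 1), repeat=3):
    assert F_inv(F(x)) == x
```

The inverse stages are applied in reverse order, which corresponds to appending the mirrored pipeline after the forward one.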
Example 8. Consider again the circuit shown in Fig. 4. The first
stage contains a single OR gate. Additionally, three buffers are
required to forward the inputs x2, x1, and x0 to stage s1 (while
decomputing them in stage s0). Consequently, (1 + 3) · 2 = 8 transmission
gates are required. The second stage has four input signals and
requires two OR gates. Therefore, (4 + 2) · 2 = 12 transmission gates
are required to realize stage s1. The third stage then has 6 inputs and
requires 16 transmission gates. Finally, the last stage has 8 inputs
and requires 20 transmission gates. Overall, this sums up to 56
transmission gates. The reverse cascade of the stages again requires 56
transmission gates. Consequently, a total of 112 transmission gates
are required (448 if we take the quad-rail encoding into account) to
realize the function in a fully-pipelined fashion, which is a huge overhead
compared to the retractile design methodology. However, the circuit
has a higher throughput and requires only two different clocks to be
operated (four if we take into account that their complement is also
needed due to the dual-rail encoding).
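
The arithmetic of Example 8 follows a simple recurrence: a stage with n input signals and g OR gates needs (n + g) · 2 transmission gates and passes n + g signals on to the next stage. The Python sketch below reproduces the numbers of the example; the function name and its parameters are illustrative assumptions.

```python
def pipelined_gate_count(num_inputs, or_gates_per_stage, rails=4):
    """Count transmission gates of the straightforward pipelined mapping."""
    signals = num_inputs
    forward = 0
    for gates in or_gates_per_stage:
        # One buffer per forwarded signal plus the OR gates,
        # each realized with two transmission gates.
        forward += (signals + gates) * 2
        signals += gates
    total = 2 * forward   # forward pipeline plus its reverse cascade
    return forward, total, total * rails

forward, total, with_quad_rail = pipelined_gate_count(3, [1, 2, 2, 2])
```

With three inputs and 1, 2, 2, and 2 OR gates in stages s0 to s3, this yields 56, 112, and 448, matching the example.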
5.2 Advanced Solution
The mapping scheme discussed above yields circuits with a huge
overhead since many signals are pushed through the whole pipeline
even though they are required neither as outputs nor to obtain
reversibility of a stage. Hence, we propose to decompute such
unnecessary signals as soon as possible. As shown in Fig. 8b, the inputs
of a gate have to be present until its output is decomputed. This
means that the signals resulting from the gates in the next-to-last stage
can be decomputed while computing the outputs of the function to
be realized. Afterwards, the signals generated in the stage before
can be decomputed, which eventually results in the mapping scheme
discussed in the previous subsection. Hence, no signal can be
decomputed before the final outputs of the function to be realized are
determined.
However, we can easily circumvent this problem by choosing
some signals that shall not be decomputed.⁸ To this end, we mark
the corresponding OIG nodes that generate these signals. This
allows several other signals to be decomputed earlier, while continuing
to compute the outputs of the function. Consequently, fewer
signals are pushed through the pipeline, which reduces the number of
required transmission gates.
Recall that each node v of the OIG is translated to an OR gate
on a certain stage of the circuit. To determine when the signal
resulting from v can be decomputed, we traverse all parents
(denoted p_j in the following). For each parent node p_j we determine
the earliest stage in which the signal generated by v can be
decomputed. Then, we take the stage with the largest index, since
the constraints for all parents have to be satisfied. If p_j is a node
that is marked, we can immediately decompute the signal
generated by v in the same stage (since the signal computed by p_j is not
decomputed afterwards). If p_j is not marked, we can decompute
the signal generated by v at the earliest one stage after the signal
generated by p_j can be decomputed (because the signal generated
by v is required to decompute the signal generated by p_j).
⁸ Note that, in the end, all signals are decomputed since each stage is duplicated as
discussed in Section 5.1.
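
The rule for determining the decompute stage can be sketched as a short recursive function in Python. The sketch assumes that every unmarked node has at least one parent (outputs are always marked) and uses a hypothetical seven-node graph with four stages; the dictionary layout and the function name are illustrative.

```python
from functools import lru_cache

def decompute_stages(stage, parents, marked):
    """Earliest stage in which each unmarked node's signal can be decomputed."""
    @lru_cache(maxsize=None)
    def earliest(v):
        candidates = []
        for p in parents[v]:
            if p in marked:
                # Marked parent: decompute v in the stage that computes p.
                candidates.append(stage[p])
            else:
                # Unmarked parent: v is needed until p has been decomputed.
                candidates.append(earliest(p) + 1)
        # All parents' constraints must hold, so take the latest stage.
        return max(candidates)
    return {v: earliest(v) for v in stage if v not in marked}

# Hypothetical OIG: stage index per node and list of parents per node.
stage = {"n1": 0, "n2": 1, "n3": 1, "n4": 2, "n5": 2, "n6": 3, "n7": 3}
parents = {"n1": ["n3"], "n2": ["n4", "n6"], "n3": ["n4", "n5"],
           "n4": ["n7"], "n5": ["n7"], "n6": [], "n7": []}

with_marking = decompute_stages(stage, parents, {"n2", "n3", "n6", "n7"})
outputs_only = decompute_stages(stage, parents, {"n6", "n7"})
```

With `n2` and `n3` marked, `n1` can be decomputed in stage 1 and `n4`, `n5` in stage 3. With only the outputs marked, `n1` can be decomputed no earlier than stage 5, i.e. after the four forward stages.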
Example 9. Consider again the OIG shown in Fig. 3b (as well as
the corresponding circuit shown in Fig. 4). Assume that we marked
the nodes labeled ∨2 and ∨3 (the nodes labeled ∨6 and ∨7 are inher-
ently marked since they are directly connected to an output). Con-
sequently, we want to decompute the signals generated by the OIG
nodes labeled ∨1, ∨4, and ∨5 as soon as possible. In the second stage
(i.e. s1) of the circuit, we compute the result of the nodes labeled ∨2
and ∨3. Since the signal generated by node ∨1 is not required any-
more (its single parent labeled ∨3 is marked), it can be decomputed
in the second stage as well. Consequently, we can save the buffers for
this signal in the third and fourth stage of the circuit. Furthermore,
the signals generated by nodes labeled∨4 and ∨5 can be decomputed
while computing the outputs of the function (in stage s3). Since this is
the last stage of the circuit, no buffers can be saved. However, fewer
output signals result. Considering the fact that each pipeline stage
has to be duplicated, a reduction of four buffers (i.e. 8 transmission
gates) can be obtained.
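The decomputation rule described above (marked parent: same stage as the parent; unmarked parent: one stage after the parent's own decomputation; then the largest index over all parents) can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation used for the evaluation: the function name, the dictionary-based graph encoding, and the chain-shaped graph are assumptions made for this illustration and are not taken from Fig. 3b.

```python
def earliest_decompute_stages(parents, stage, marked):
    """Map each unmarked node to the earliest stage in which its signal
    can be decomputed.

    parents: dict node -> list of parent nodes (the nodes reading its signal)
    stage:   dict node -> index of the stage in which the node is computed
    marked:  set of nodes whose signals are not decomputed early
    Assumes every unmarked node has at least one parent (output nodes
    are inherently marked).
    """
    result = {}

    def visit(v):
        if v not in result:
            candidates = []
            for p in parents[v]:
                if p in marked:
                    # p is kept, so v can be decomputed in p's stage
                    candidates.append(stage[p])
                else:
                    # v is still required to decompute p
                    candidates.append(visit(p) + 1)
            # all parent constraints must hold: take the largest index
            result[v] = max(candidates)
        return result[v]

    for v in parents:
        if v not in marked:
            visit(v)
    return result


# Hypothetical chain n1 -> n2 -> n3 -> n4, one node per stage s0..s3.
parents = {"n1": ["n2"], "n2": ["n3"], "n3": ["n4"], "n4": []}
stage = {"n1": 0, "n2": 1, "n3": 2, "n4": 3}
marked = {"n3", "n4"}  # n4 is an output and thus inherently marked

stages = earliest_decompute_stages(parents, stage, marked)
print(stages)
```

In this hypothetical chain, n2 has the marked parent n3 and can therefore be decomputed in stage 2, while n1 has to wait one further stage (stage 3) because it is needed to decompute n2.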
This leads to the question of how to determine a suitable marking
scheme for the nodes, i.e. a marking scheme that results in a circuit
with a smaller number of transmission gates. A very simple but
also effective marking scheme is to mark all nodes of the OIG with
a depth that is a multiple of a constant k ∈ N. For k = 2, this
means marking all nodes with an even depth (as done in Example 9).
The experimental evaluations summarized in Section 6 show that
significant improvements can be obtained by using this marking
scheme.
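A minimal Python sketch of this depth-based marking scheme is given below. It assumes that the depth of a node is the length of the longest path from a primary input (primary inputs having depth 0), which is consistent with Example 9 but is not spelled out here; the function name, the graph encoding, and the four-node graph are hypothetical.

```python
def mark_by_depth(fanins, outputs, k):
    """Return (marked, depth) for the 'every k-th depth' marking scheme.

    fanins:  dict node -> list of its inputs; names that are not keys
             are treated as primary inputs with depth 0 (an assumption)
    outputs: nodes directly connected to an output (inherently marked)
    k:       positive integer; nodes with depth % k == 0 are marked
    """
    depth = {}

    def node_depth(v):
        if v not in fanins:
            return 0  # primary input
        if v not in depth:
            depth[v] = 1 + max(node_depth(u) for u in fanins[v])
        return depth[v]

    marked = {v for v in fanins if node_depth(v) % k == 0}
    return marked | set(outputs), depth


# Hypothetical four-level OIG over primary inputs x0..x3.
fanins = {
    "n1": ["x0", "x1"],
    "n2": ["n1", "x2"],
    "n3": ["n2", "x3"],
    "n4": ["n3", "n1"],
}
marked, depth = mark_by_depth(fanins, outputs=["n4"], k=2)
print(sorted(marked), depth)
```

For k = 2, the nodes at depths 2 and 4 (n2 and n4) are marked in this example; a larger k marks fewer nodes, so the choice of k is a tuning parameter.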
6 EVALUATION
In this section, we summarize and discuss the results obtained by
our evaluations of the proposed design methods for adiabatic
circuits. To this end, we implemented the approaches discussed in
Section 4 and Section 5 in C++ and used the tool ABC [3] to
generate the initially required AIGs/OIGs (to reduce the number of
AIG nodes, we used the synthesis command dc2). Afterwards, we
evaluated the resulting methods using benchmarks taken from the
ISCAS [4] and the IWLS benchmark suite [12].
Table 1 summarizes the obtained results. The first columns show
the name of the benchmark as well as the number of primary
inputs PI and primary outputs PO. Then, we list the results obtained
for retractile and fully-pipelined adiabatic circuits. For each
design style, we list the number of required transmission gates
(denoted |tg|) and the number of required power clocks (denoted |ϕ|)
of the straightforward solution as well as the advanced solution
(columns denoted Straight-forward and Advanced, respectively).
The numbers listed for the required transmission gates take the
dual-rail encoding (for retractile circuits) or quad-rail encoding (for
fully-pipelined circuits) into account, as well as the fact that each
power clock has to be supplied in two polarities (i.e. a power clock
is dual-rail encoded for both types of circuits). For the sake of
completeness, we also list the parameter k used in the solution
discussed in Section 5.2. The runtime is not listed in Table 1 since all
methods are capable of producing these results in negligible runtime
(i.e. a fraction of a second).
Table 1: Evaluation
                  Retractile (Section 4)              Fully-pipelined (Section 5)
                  Straight-forward  Advanced          Straight-forward  Advanced
                  (Section 4.1)     (Section 4.2)     (Section 5.1)     (Section 5.2)
Name     PI   PO     |ϕ|      |tg|     |ϕ|      |tg|     |ϕ|      |tg|    k  |ϕ|      |tg|
apex5   117   88      26      1826      24      1181       4    150544    2    4    112208
ex4p    128   28      28      2020      20      1260       4    205968    3    4    136080
o64     130    1      16       258       2       130       4     30928    2    4     23936
i3      132    6      12       252       2       132       4     23024    2    4     18528
i5      133   66      36       264      12       160       4     60096    4    4     50384
i8      133   81      30      1804      24      1079       4    147104    3    4    117504
apex6   135   99      26      1210      20       793       4    123104    2    4     87216
x3      135   99      26      1230      18       773       4    131120    4    4     88112
rot     135  107      50      1110      34       773       4    221936    2    4    161504
i6      138   67      12       766       8       461       4     36320    4    4     29984
frg2    143  139      24      1508      20       958       4    116496    3    4     92288
pair    173  137      44      2608      40      1693       4    373552    3    4    245584
c5315   178  123      56      2754      38      1722       4    559472    3    4    293152
i4      192    6      28       372       2       192       4     75200    3    4     59776
i7      199   67      12      1012       8       586       4     51392    4    4     42384
i2      201    1      26       416       6       210       4     79056    3    4     58864
c7552   207  108      76      2940      62      2085       4    868272    4    4    474912
c2670   233  140      44      1076      20       663       4    232496    3    4    152032
des     256  245      36      6784      26      4495       4    752016    5    4    499184
i10     257  224      66      3440      48      2292       4    776400    3    4    503184
PI: #primary inputs   PO: #primary outputs   |ϕ|: #required clocks
|tg|: #transmission gates   k: parameter discussed in Section 5.2
First, the results show the impact of the chosen design style.
Retractile circuits are clearly the better choice
when it comes to reducing the number of gates, while pipelined
circuits are efficient with respect to the number of power clocks
and, following that, also the throughput. At first glance, the cost
of having fewer power clocks in pipelined circuits might not look
acceptable (in fact, orders of magnitude more gates are required).
However, if area is not an issue, this might still be acceptable since, as
discussed in Section 2, gates in adiabatic circuits do not affect the
energy consumption as much as they do in conventional circuits.
Hence, each design style has its own advantages and disadvantages
and, eventually, the user is presented with complementary
solutions out of which the most suitable one can be chosen.
Besides that, the results clearly show the improvement of the
advanced schemes. For retractile circuits, we obtain an average
improvement of approx. 42% in the number of required power clocks
and of approx. 37% in the number of required transmission gates.
For the fully-pipelined circuits, we observe a reduction in the
number of transmission gates of approx. 30% on average. Overall,
these results clearly confirm the benefit and applicability of the
proposed design automation techniques for this kind of circuit.
While previously considered circuits were either handcrafted
(following approaches proposed e.g. in [2, 17]) or relied on fully
reversible realizations which led to an unnecessarily large overhead
(as conducted in [15, 16] and discussed in [6]), the proposed design flow
allows for generating the desired adiabatic circuits in an automatic
fashion while, at the same time, satisfying the switching rules by
conditional reversibility only. The improvements obtained by the
advanced schemes additionally show the further potential that can
be exploited following this direction.
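The per-benchmark improvement behind such percentages is simply the relative reduction from the straightforward to the advanced solution. The short Python sketch below illustrates this for three rows copied from Table 1; the helper name and the data layout are assumptions made for this illustration, and three rows are of course not expected to reproduce the averages reported above.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# (straight-forward, advanced) pairs copied from Table 1.
rows = {
    "apex5": {"clocks (retractile)": (26, 24),
              "gates (retractile)": (1826, 1181),
              "gates (pipelined)": (150544, 112208)},
    "i3":    {"clocks (retractile)": (12, 2),
              "gates (retractile)": (252, 132),
              "gates (pipelined)": (23024, 18528)},
    "des":   {"clocks (retractile)": (36, 26),
              "gates (retractile)": (6784, 4495),
              "gates (pipelined)": (752016, 499184)},
}

for name, columns in rows.items():
    for column, (before, after) in columns.items():
        percent = 100 * relative_reduction(before, after)
        print(f"{name:6s} {column:20s} {percent:5.1f}%")
```

The spread across benchmarks is considerable (for example, the number of power clocks of i3 drops from 12 to 2, while that of apex5 only drops from 26 to 24), which is why the averages should be read together with the individual rows.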
7 CONCLUSIONS
In this work, we proposed an automatic and dedicated design flow
for adiabatic circuits which explicitly takes recent findings in this
domain (namely that conditional reversibility is sufficient for
adiabatic circuits) into account. The proposed flow first realizes the
desired functionality in terms of an AIG/OIG and, afterwards,
maps the resulting structure to an adiabatic description.
For the latter step, two complementary schemes (namely retractile
and fully-pipelined) are considered which allow the designer to
either focus on reducing the number of gates or keeping the number
of power clocks small. Furthermore, optimizations are proposed
which reduce the number of gates by approx. 37% for retractile and
approx. 30% for fully-pipelined circuits on average. By this,
expertise from both adiabatic circuits and design automation is
combined, yielding an automatic and dedicated design scheme for
this promising technology. This eventually provides the basis for
further studies including, among others, more sophisticated
optimizations, the design and use of larger building blocks, as well as
the application of the proposed design flow in the physical
implementation of adiabatic circuits.
ACKNOWLEDGMENTS
This work has partially been supported by the European Union
through the COST Action IC1405. M. Frank was supported by the
Laboratory Directed Research and Development program at Sandia
National Laboratories and by the Advanced Simulation and
Computing program under the U.S. Department of Energy's
National Nuclear Security Administration (NNSA). Sandia National
Laboratories is a multimission laboratory managed and operated
by National Technology and Engineering Solutions of Sandia, LLC.,
a wholly owned subsidiary of Honeywell International, Inc., for
NNSA under contract DE-NA0003525. Approved for public release,
SAND2018-9936 O.
REFERENCES
[1] Semiconductor Industry Association 2.0. 2015. International
Technology Road-map for Semiconductors.
https://www.semiconductors.org/main/2015_international_technology_roadmap_for_semiconductors_itrs.
Accessed: 2017-09-15.
[2] Venkiteswaran Anantharam, Maojiao He, Krishna Natarajan, Huikai Xie, and
Michael P Frank. 2004. Driving Fully-Adiabatic Logic Circuits Using Custom
High-Q MEMS Resonators. In ESA/VLSI. 5–11.
[3] Robert K. Brayton and Alan Mishchenko. 2010. ABC: An Academic
Industrial-Strength Verification Tool. In Computer Aided Verification. 24–40.
https://doi.org/10.1007/978-3-642-14295-6_5
[4] F. Brglez, D. Bryan, and K. Kozminski. 1989. Combinational Profiles of Sequen-
tial Benchmark Circuits. In Int’l Symp. Circ. and Systems. 1929–1934.
[5] Michael P. Frank. 2003. Common Mistakes in Adiabatic Logic Design and How
to Avoid Them. In Int’l Conf. on Embedded Systems and Applications. 216–222.
[6] Michael P. Frank. 2017. Foundations of Generalized Reversible
Computing. In Int’l Conf. on Reversible Computation.
https://doi.org/10.1007/978-3-319-59936-6_2
[7] Michael P. Frank. 2017. Throwing computing into reverse. IEEE Spectrum
September 2017 (2017).
[8] J Storrs Hall. 1992. An electroid switching model for reversible computer archi-
tectures. In Proceedings of Physics of Computation Workshop, Dallas Texas.
[9] Jeffrey G Koller and William C Athas. 1992. Adiabatic switching, low energy
computing, and the physics of storing and erasing information. In Physics of
Computation Workshop. 267–270.
[10] A. Kuehlmann, V. Paruthi, F. Krohm, and M.K. Ganai. 2002. Robust Boolean
Reasoning for Equivalence Checking and Functional Property Verification. IEEE
Trans. on CAD of Integrated Circuits and Systems 21, 12 (2002), 1377–1394.
[11] R. Landauer. 1961. Irreversibility and Heat Generation in the Computing Process.
IBM Journal of Research and Development 5, 3 (July 1961), 183–191.
[12] K. McElvain. 1993. IWLS’93 Benchmark Set: Version 4.0. In Int’l Workshop on
Logic Synth.
[13] Ralph C Merkle. 1992. Towards practical reversible logic. In Physics and Com-
putation. 227–228.
[14] Alan Mishchenko, Satrajit Chatterjee, and Robert K. Brayton. 2006. DAG-aware
AIG rewriting: a fresh look at combinational logic synthesis. In Design
Automation Conf. https://doi.org/10.1145/1146909.1147048
[15] Matthew Morrison and Nagarajan Ranganathan. 2014. Synthesis of
Dual-Rail Adiabatic Logic for Low Power Security Applications. IEEE
Trans. on CAD of Integrated Circuits and Systems 33, 7 (2014), 975–988.
https://doi.org/10.1109/TCAD.2014.2313454
[16] Andreas Rauchenecker, Timm Ostermann, and Robert Wille. 2017. Exploiting
reversible logic design for implementing adiabatic circuits. In Mixed Design of
Integrated Circuits and Systems. 264–270.
[17] Saed G Younis and Thomas F Knight Jr. 1993. Practical implementation of charge
recovering asymptotically zero power CMOS. In Proceedings of the 1993
symposium on Research on integrated systems. MIT Press, 234–250.
[18] Alwin Zulehner and Robert Wille. 2018. Exploiting Coding Techniques for Logic
Synthesis of Reversible Circuits. In Asia and South Pacific Design Automation
Conf. 670–675. https://doi.org/10.1109/ASPDAC.2018.8297399
[19] Alwin Zulehner and Robert Wille. 2018. One-pass Design of Reversible Circuits:
Combining Embedding and Synthesis for Reversible Logic. IEEE Trans. on CAD
of Integrated Circuits and Systems 37, 5 (2018), 996–1008.
