Reversible Computing with Fast, Fully Static, Fully Adiabatic CMOS by Frank, Michael P. et al.
PREPRINT MANUSCRIPT SUBMITTED TO ICRC 2020 SEP. 2, 2020 
Version 0.6 1 arXiv:2009.00448 [cs.AR] 
Reversible Computing with Fast, Fully Static, Fully 
Adiabatic CMOS 
Michael P. Frank  
Cognitive & Emerging 
Computing Department 
Sandia National Laboratories 
Albuquerque, NM, USA 
mpfrank@sandia.gov  
Robert W. Brocato 
RF Microsystems Department 
Sandia National Laboratories 
Albuquerque, NM, USA 
orcid:0000-0001-9751-1234 
Brian D. Tierney 
Radiation Hard CMOS 
Technology Dept. 
Sandia National Laboratories 
Albuquerque, NM, USA 
bdtiern@sandia.gov 
Nancy A. Missert 
Nanoscale Sciences Department 
Sandia National Laboratories 
Albuquerque, NM, USA 
orcid:0000-0003-2082-2282 
Alexander H. Hsia 
Sandia National Laboratories 
Albuquerque, NM, USA 
Requiescat in pace 
Abstract— To advance the energy efficiency of general digital 
computing far beyond the thermodynamic limits that apply to con-
ventional digital circuits will require utilizing the principles of re-
versible computing.  It has been known since the early 1990s that 
reversible computing based on adiabatic switching is possible in 
CMOS, although almost all of the “adiabatic” CMOS logic fami-
lies in the literature are actually not fully adiabatic, which limits 
their achievable energy savings.  The first CMOS logic style that 
achieved truly, fully adiabatic operation if leakage was negligible 
(CRL) is not fully static, which leads to a number of practical en-
gineering difficulties in the presence of certain nonidealities.  La-
ter, “static” adiabatic logic families were described, but they were 
not actually fully adiabatic, or fully static, and were much slower. 
In this paper, we describe a new logic family, Static 2-Level Adi-
abatic Logic (S2LAL), which is, to our knowledge, the first CMOS 
logic family that is both fully static, and truly, fully adiabatic (mo-
dulo leakage).  In addition, S2LAL is, we think, the fastest possible 
such family (among fully pipelined sequential circuits), having a 
latency per logic stage of one “tick” (transition time), and a mini-
mum clock period (initiation interval) of 8 ticks.  S2LAL requires 
8 phases of a trapezoidal power-clock waveform (plus constant po-
wer and ground references) to be supplied.  We argue that, if im-
plemented in a suitable fabrication process designed to aggressive-
ly minimize leakage, S2LAL should be capable of demonstrating a 
greater level of energy efficiency than any other semiconductor-
based digital logic family known today. 
Keywords—Reversible computing, adiabatic circuits, adiabatic 
CMOS, adiabatic logic, adiabatic switching, energy-efficient compu-
ting, unconventional computation, DPA resistance. 
I. INTRODUCTION 
Continuing to improve the computational energy efficiency 
of conventional CMOS technology is becoming increasingly 
difficult, due to numerous physical effects that emerge as device 
dimensions shrink and per-area densities increase [1].  This is, 
however, unsurprising, since the laws of physics guarantee that 
the energy efficiency of the conventional non-reversible comp-
uting paradigm is fundamentally limited [2][3].  But, an alternate 
paradigm for general digital computing called reversible comp-
uting [4][5] offers an alternative path for improving computer 
energy efficiency which bypasses the many barriers to further 
improvement that the conventional approach faces.   
Although the reversible approach imposes a number of over-
heads, we can foresee that, in the long run, as the underlying 
technology improves, ultimately it must win out, as the benefits 
will eventually come to outweigh the overheads [6][7].  Maxim-
izing the benefits will likely require new classes of devices that 
exploit quantum phenomena to minimize dissipation as a func-
tion of delay [8], but we can begin to gain experience today with 
the issues and principles of reversible design by prototyping re-
versible circuits in CMOS.  As manufacturing processes move 
into the 3rd dimension and cost per device continues to decrease, 
even implementations of reversible computing that are based on 
CMOS could come to outperform conventional approaches, in 
terms of aggregate parallel performance within such system con-
straints as size, weight, cost, and power. 
The fact that reversible computing can be implemented using 
a fully adiabatic style of sequential CMOS was first shown by 
Younis and Knight in 1993, in their Charge Recovery Logic 
(CRL) family [9].  However, CRL is a dynamic logic family, i.e., 
some circuit nodes are left floating in some circumstances.  This 
leads to various design difficulties which we will discuss.  To 
avoid those problems, there is a need for a fully static family of 
adiabatic CMOS.  Younis described a static version of his SCRL 
(split-level CRL) family in 1994 [10], but it was relatively slow, 
requiring 24 adiabatic transition times (“ticks”) per clock cycle.  
Also, the original version of SCRL was not quite fully adiabatic 
either [7], or even fully static—it had some floating nodes. 
We would thus like to have an adiabatic CMOS logic family 
that is both fully static and fully adiabatic, and further that opti-
mizes the following key performance characteristics: 
1) Minimal latency: The number of ticks of latency per lay-
er of logic should be as small as possible (1 tick). 
This work is supported by the Advanced Simulation and Computing 
(ASC) program at the U.S. Department of Energy’s National Nuclear Security 
Administration (NNSA).  Sandia National Laboratories is a multi-mission lab-
oratory managed and operated by National Technology and Engineering Solu-
tions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, 
Inc., for NNSA under contract DE-NA0003525.  This document describes ob-
jective technical results and analysis.  Any subjective views or opinions that 
might be expressed in this document do not necessarily represent the views of 
the U.S. Department of Energy or the United States Government.  Approved 
for public release, SAND2020-9140 O. 
 Version 0.6 2 arXiv:2009.00448 [cs.AR] 
2) Maximum throughput: The number of ticks per clock pe-
riod (initiation interval) should also be minimal. 
In this paper, we describe a candidate for this, a fully static 
and fully adiabatic CMOS logic family that we call S2LAL, for 
“static, 2-level adiabatic logic,” that requires only 8 ticks per 
clock period (i.e., 3× faster than static SCRL), which we think is 
minimal for a fully static, fully adiabatic CMOS logic. 
The remainder of this paper is organized as follows:  In Sec. 
II, we describe in more detail our motivation for developing a 
fully static adiabatic CMOS logic family, and briefly describe 
the limitations of the existing approaches, and the requirements 
that a correct solution needs to meet.  In Sec. III, we describe 
S2LAL, and explain its operation in some detail.  In Sec. IV, we 
summarize ideas for future work and conclude. 
II. REQUIREMENTS FOR FULLY STATIC ADIABATIC CMOS 
In this section, we review our motivation for designing an 
adiabatic CMOS logic family that is also fully static, explain the 
reasons why existing adiabatic CMOS logic families are not yet 
fully satisfactory, and we set forth formal technical requirements 
that any fully satisfactory solution will need to meet. 
A. What is fully static operation, and why is it needed? 
Briefly, when we call a particular design discipline fully sta-
tic, this means that every node in the circuit is at all times con-
nected to some external voltage reference, either directly via a 
low-impedance path, or at least indirectly, through turned-on 
switches of moderate impedance (e.g., on the order of 1–10 kΩ 
is typical for a turned-on logic FET).  Implicitly, there should 
also not exist competing paths of comparable impedance to sou-
rces at different reference levels, since this would imply a sub-
stantial “short circuit” current with associated power dissipation. 
This concept of “static” is the gist of what is meant when we 
talk about “static CMOS” or “static RAM” (SRAM), in contrast 
to dynamic design families for logic and memory, where circuit 
nodes are allowed to float at a dynamic level (i.e., without low-
impedance paths to any sources).  The voltage on a floating node 
can spontaneously vary over time, in various ways, including: 
1) Voltage drift, due to leakage (e.g., subthreshold leakage 
or gate leakage) to/from adjacent nodes in the circuit; 
2) Voltage “sag,” due to capacitive voltage-divider effects 
from parasitic capacitances with nearby nodes in the circuit; 
3) AC coupling via capacitive paths to nearby nodes, which 
become low-impedance at high frequencies. 
Whatever their cause, deviations in the dynamic voltage le-
vel of a floating node can lead to various difficulties in an adia-
batic circuit, including the following: 
1) Sudden energy dissipation (which we informally term 
“sparking”) when the node is reconnected to a reference source; 
if the node’s capacitance is 𝐶, and its voltage disparity from the 
reference level is 𝛥𝑉, then the energy dissipated is 𝐶(𝛥𝑉)2/2.  
Such sparking events are particular anathema for applications 
needing low and steady instantaneous power for privacy, e.g., 
in the context of an adversary analyzing EM emissions. 
2) Logical errors in the case of circuit nodes whose voltage 
deviation becomes large enough to affect the digital behavior 
of the circuit.  This can easily occur in any scenario where the 
clock frequency is slowed or paused for any reason, or even at 
relatively high frequencies, if a node remains floating for suffi-
ciently many cycles due to low activity in that part of the circuit; 
e.g., a DRAM cell will go invalid if not refreshed. 
In contrast, in a fully static, fully adiabatic CMOS logic, po-
wer dissipation will be low and relatively steady, and logical er-
rors will be essentially impossible, no matter how low the clock 
speed, even if the clocks are paused completely for system-level 
purposes (e.g., flow control, standby mode, or diagnostics). 
Thus, we can see that there are significant advantages to be 
gained, in terms of both efficiency and usability, from the devel-
opment of fully static versions of truly, fully adiabatic logic. 
Due to these many advantages, we will henceforth use the 
phrase “perfectly adiabatic” as a technical term referring to a 
CMOS circuit that is simultaneously fully static, and truly, fully, 
adiabatic.  In such circuits, minimum energy dissipation is only 
limited by leakage, and can be made as low as desired if the un-
derlying technology is refined to reduce leakage sufficiently. 
B. Limitations of some existing adiabatic CMOS families. 
To our knowledge, no previously published description of an 
adiabatic CMOS logic family for general sequential digital com-
puting quite achieves our goal of “perfectly adiabatic” operation.  
Here we briefly review how some of the previously described 
adiabatic or quasi-adiabatic logic styles depart from this ideal.   
Early efforts in the direction of adiabatic CMOS from 1978–
1992 by Fredkin & Toffoli [11], Seitz et al. [12], Koller & Athas 
[13], Hall [14], and Merkle [15] were incomplete, either not qua-
lifying as fully adiabatic, or not illustrating general sequential 
operation.  As mentioned earlier, Younis and Knight’s CRL [9], 
the first fully adiabatic sequential CMOS logic family, was not 
static.  Athas et al. attempted to describe an 8 tick/cycle static 
adiabatic logic family in 1994 [16], but that design was in fact 
neither fully adiabatic nor fully static (having some internal 
nodes that are left floating for part of the cycle, and non-adiaba-
tically overwritten).  Younis [10] described a 24-tick static vari-
ant of SCRL, but it was not quite fully adiabatic, due to some 
transistors that gradually turn themselves off as a threshold level 
is approached.  And, neither was it fully static, since it, too, in-
cludes internal nodes that become isolated in some cases.  How-
ever, both of these problems with static SCRL would be elimin-
ated if a modification described in [7] is applied. 
In any case, the novel perfectly adiabatic CMOS logic family 
described in this document (S2LAL) offers the following advan-
tages versus even a repaired version of static SCRL: 
1) 8× lower latency, 1 tick of latency per layer of logic in 
S2LAL, as opposed to 8 ticks/layer in static (“3 phase”) SCRL; 
2) 3× higher throughput, due to a 3× shorter clock period, 
specifically 8 ticks/cycle in S2LAL, as opposed to 24 ticks/ 
cycle in static SCRL; 
3) 3.75× fewer distinct clock waveforms; that is, only 8 dis-
tinct clocks, as opposed to 30 in (single-rail) static SCRL. 
 Version 0.6 3 arXiv:2009.00448 [cs.AR] 
S2LAL has one disadvantage versus static SCRL: Namely, 
it is a quad-rail logic family, and thus requires more transistors 
in total per logic stage than static SCRL, which can operate as a 
single-rail logic.  But this advantage of SCRL is at least 
somewhat negated by the fact of its longer clock period, which 
inflates both the minimum energy dissipation per device cycle, 
and the hardware cost to attain any required aggregate level of 
throughput.  Finally, the large number of clock rails required by 
static SCRL can be expected to contribute substantially to its 
layout area, and its long clock period makes the design of high-
quality energy-recovering resonant oscillators, vital for achie-
ving good system-level energy efficiency, more difficult.  How-
ever, a much more detailed design study would be needed to di-
rectly compare S2LAL against static SCRL for use in specific 
applications in specific fabrication processes.  For now, we just 
focus on giving a careful description of the new logic style. 
C. Requirements for Fully Static, Fully Adiabatic Operation 
The requirements for truly, fully adiabatic operation (when 
ignoring off-state leakage) have been described elsewhere [16] 
and we summarize them only briefly here: 
1) No diodes. Diodes must be avoided in the main paths for 
logic node charging and discharging, due to their built-in vol-
tage drop. 
2) No “spark” events.  This means, do not turn a switch 
(e.g., a FET) “on” when there is a non-negligible voltage dif-
ference between its source and drain terminals.  More generally, 
do not rapidly produce a large voltage change by any means. 
3) No “squelch” events.  This means, do not turn a switch 
“off” when there is a non-negligible current between its source 
and drain terminals (unless there is an alternative, comparably 
low-impedance path for that current). 
4) Quasi-trapezoidal power-clocks.  Supply voltage wave-
forms should rise and fall gradually, and should have flat tops 
and bottoms.  This rule is implied by the previous two. 
 
To guarantee fully static operation in the presence of these 
constraints, we must ensure that every internal circuit node (i.e., 
every circuit node that is not directly connected by a low-impe-
dance path to an externally supplied voltage reference) must at 
all times be, at least, less-directly connected to a voltage refer-
ence (that is, via a medium-impedance path through turned-on 
devices) in one of the following manners: 
1) Connected via a medium-impedance path through 
turned-on transistor(s) to a single constant-voltage reference; 
2) Connected via a medium-impedance path through 
turned-on transistor(s) to a single variable-voltage reference; 
3) Connected in a way that is actively transitioning (in ei-
ther direction) between conditions 1 & 2 above, with one path 
in the process of being connected while the other is in the pro-
cess of being disconnected, and with at least one path having 
no more than medium impedance at all times throughout the 
transition, and where, throughout the transition period, the le-
vel of the variable-voltage reference in question is being held 
constant at the same level as the constant-voltage rail; 
4) Connected in a way that is similarly actively transition-
ing between two different paths to a single supply reference 
(whether it is constant-voltage or variable-voltage). 
Obeying the above rule ensures that no appreciable voltage 
sag or drift can ever occur on any node in the circuit, since, at all 
times, every node enjoys at least one path of no more than med-
ium impedance to a reference node at its nominal voltage level.   
To be slightly more specific, although the following values 
are somewhat arbitrary, for purposes of this document we will 
consider “medium impedance” to refer to charging paths having 
~100–100,000 Ω impedance in the DC limit.  We will assume 
that, in any “ordinary” CMOS process, a minimum-width MOS-
FET will typically have an impedance value within this range 
whenever it is considered to be in the “on” state (e.g., when VGS 
> Vth in the case of an nFET), and will ideally have some much 
larger impedance value (but at minimum, >1 MΩ) whenever it 
is in the “off” state (e.g., VGS = 0). 
This concludes our discussion of the requirements that we 
consider necessary for achieving what we call “perfectly adia-
batic” operation.  With these requirements in mind, careful in-
spection of a given logic family can allow the reader to verify 
whether it qualifies.  We now proceed to describe S2LAL, which 
we conjecture is in fact the fastest possible qualifying family. 
III. DESCRIPTION OF THE S2LAL LOGIC FAMILY 
In this section, we give a detailed technical description of the 
new S2LAL logic family.  We start by presenting a concise for-
mal notation for clock and data signals, then describe the basic 
circuit elements of S2LAL in a hierarchical fashion. 
A. Important Notations 
A semi-formal mathematical notation which will be helpful 
for describing signals (both clock and data signals) will now be 
introduced.  As the name of S2LAL suggests, we will use just 
two nominal voltage levels, 0 (low) and 𝑉dd ≳ 2|𝑉t|  (high) 
(where |𝑉t| is the absolute FET threshold), in contrast to SCRL, 
which also uses a third level 𝑉dd/2.  As previously suggested, 
we divide time into discrete intervals we call ticks, each of the 
same duration 𝜏tr, which is also the maximum time allowed for 
any adiabatic transition between nominal levels, driven by the 
rise (or fall) of an externally supplied voltage reference between 
the levels over some actual period 𝜏tr ≤ 𝜏tr.  These transitions 
ideally have a linear ramp shape; if not exactly linear, dissipation 
per transition is multiplied by a shape factor 𝜉tr.  To first order 
(ignoring resistance and capacitance variations, wire inductan-
ces, and exponential tails at the start and end of the ramp), the 
active dissipation from adiabatically driving a load 𝐶L between 
nominal levels (in either direction) is then 
 𝐸a = 𝜉tr𝐶L𝑉dd
2
𝑅𝐶L
𝜏tr
, (1) 
where 𝑅  is the effective DC resistance of the charging path.  
This classic expression (cf. [16]) illustrates that, setting aside 
losses from leakage, the energy dissipation due to active transi-
tions can be made as low as desired, as the ramp time 𝜏tr is made 
longer. 
 Version 0.6 4 arXiv:2009.00448 [cs.AR] 
Within a given operation sequence for the circuit, we iden-
tify consecutive ticks using respective integers 𝑡 = 0, 1, 2, …, for 
example.  For periodic AC supplies, the clock period is taken to 
be an integral number of ticks, 𝜏p = 𝑛𝜏tr.  Thus, the clock fre-
quency is 𝑓 = 1/𝜏p = (𝑛𝜏tr)
−1. 
In S2LAL (as with other quad-rail styles, including CRL and 
2LAL [18]), different possible logic values (symbols) are repre-
sented by independent signal pairs.  Ordinarily, we use binary 
logic with two possible symbols 𝐿 = 0,1.  However, multi-val-
ued logics are easily supported as well (e.g., ternary logic would 
use six signals).  In general, to represent any of 𝑘 distinct logic 
symbols, we can use a bus of 2𝑘 signals. 
For any given logic symbol 𝐿 (e.g., logic 1), during any gi-
ven tick when the two signals corresponding to that symbol are 
being held at nominal levels, those signals can be in either of 
two possible conditions:  Namely, they are either, together, act-
ively representing 𝐿 (“active”), or not actively representing L 
(“inactive”).  During other ticks, the two signals simultaneously 
transition (in either direction) between active and inactive states. 
Further, the two signals making up a given signal pair cor-
responding to a logic symbol 𝐿 must always have complemen-
tary levels:  That is, one is high whenever the other is low, and 
vice-versa.  We say that one of the two signals is active-high 
(i.e., it is high when 𝐿 is being actively represented), and the 
other one is active-low (i.e., low when 𝐿 is being actively repre-
sented).  Normally, at most one of the 𝑛 various alternative sym-
bols that can be represented on a given signal bus may be active 
at any given time. 
We will use the following notation, and simplified variations 
thereof, to denote the signal pair for any given logic symbol 𝐿. 
 𝑆𝑡b,𝑡e
𝐿 = (?̂?𝑡b,𝑡e
𝐿 , ?̌?𝑡b,𝑡e
𝐿 ) () 
Here, 𝑆 can be any letter repre-
senting the name and/or seman-
tic purpose of this overall signal 
bus within the larger architec-
ture.  The “hat” (   ̂) and “cup” ( 
  ̌) accents mark the active-high 
and active-low signals in the 
pair, respectively.  The super-
script 𝐿, if present, indicates the 
logic symbol being represented; 
if absent, it is implicit and/or 
unimportant. The subscripts 𝑡b 
and 𝑡e, if present, are  the “be-
gin” and “end” ticks of the valid 
period for the symbol; that is, 
the signal may transition from 
inactive to active in tick 𝑡b, and 
(if it did) then it will transition 
back to inactive in tick 𝑡e.  Usu-
ally, the active period is given in 
context and so the second subscript 𝑡e may be omitted.   
The presence of the symbol 𝐿 in some period is represented 
by a temporary pulse on its signal pair, in which ?̂?𝐿 pulses high 
and ?̌?𝐿 pulses low.  Fig. 1 illustrates, for the ordinary case with 
𝑘 = 2 logic symbols and a minimum valid period of 1 tick. 
Our notation for periodic clock signals will be the same, but 
with 𝑆 being 𝜙, and stipulating that in the context of an 𝑛-tick 
clock period, the beginning and end tick numbers 𝑡b and 𝑡e shall 
always be considered to be taken modulo 𝑛.  In other words, any 
values of 𝑡b  and 𝑡e  are implicitly mapped down to the range 
0,1, … , 𝑛 − 1, and they denote that the clock signal will trans-
ition to active during all ticks 𝑡 ≡ 𝑡b (mod 𝑛), and to inactive in 
all ticks 𝑡 ≡ 𝑡e (mod 𝑛).   
Note a simple optimization applies given symmetrical duty 
cycles, i.e., if 𝑛  is even and 𝑡e − 𝑡b ≡ 𝑛/2  (mod 𝑛),  by 
leveraging the observation that, in this case, ?̂?𝑖 = ?̌?𝑖+𝑛/2; thus, 
𝑛 distinct clock networks suffice to distribute both active-high 
and active-low clock signals for all 𝑛 phases 0 ≤ 𝑖 < 𝑛.   
In S2LAL, the clock period is 𝑛 = 8 ticks, and 𝑡e − 𝑡b ≡ 4  
(mod 8), so we can use this optimization, and require only 8 
clock distribution networks to provide 8 pairs 𝜙𝑖 = (?̂?𝑖 , ?̌?𝑖) of 
complementary clock signals covering all 8 of the possible ti-
ming phases.  See Fig. 2 above for an illustration. 
We now describe the basic circuit elements of S2LAL. 
B. CMOS Transmission Gates 
S2LAL, like the other 2-level adiabatic logic styles before it 
(such as 2LAL and CRL), makes extensive use of CMOS trans-
mission gates.  Because we use these so widely, we introduce a 
simplified graphical notation for them; see Fig. 3 below.   
It’s important to note that each FET device in the transmis-
sion gate must be a 4-terminal device, having an implicit body 
or back-gate terminal (not shown) which shall not be connected 
to either channel terminal of the FET (what would normally be 
called the “source” and “drain” terminals; but, in bidrectional T-
gates, such labels should be avoided, since the source/drain roles 
Fig. 1. Example of signal waveforms on a quad-rail bus for representing the 
abstract logic signal 𝑆.  In this example, the bus stays valid for only one tick.  
This abstract signal could be denoted 𝑆0,2 since it goes valid on tick #0 and in-
valid on tick #2.  Note the active-high signals ?̂?𝐿 pulse high when active, and 
the active-low signals ?̌? pulse low.  There is a separate signal pair to convey 
each possible logic symbol, and no two symbols are simultaneously active. 
Fig. 2.  The active-high clock sig-
nals ?̂?𝑖 for the 8 phases of S2LAL.  
(The 8 corresponding active-low 
signals are relabelings of these.)  
 Version 0.6 5 arXiv:2009.00448 [cs.AR] 
will be alternating).  Instead, the body (or back-gate) terminal of 
each nFET should be connected to GND (fixed 0 V supply), and 
that of each pFET to a fixed logic high reference, 𝑉dd.   
Alternatively, one may also bias the body terminal levels by 
an additional offset (+𝑉B for nFETs and −𝑉B for pFETs), which 
will effectively increase the absolute device threshold, if desired, 
towards a larger magnitude (where the precise increase depends 
on device geometry); this will reduce off-state subthreshold cur-
rents.  Note, however, that if we do this, in order to retain decent 
on-state conductance of transmission gates over the entire signal 
range [0, 𝑉dd], we must still respect the constraint 𝑉dd ≳ 2|𝑉t|, 
where 𝑉t is now the body-effected threshold.  Also, it’s impor-
tant to be cognizant that reducing subthreshold leakage may not 
reduce total off-state leakage by very much, and thus may not be 
worthwhile, in cases where the total leakage is already domina-
ted by tunneling current through the gate oxide.  Detailed tech-
niques to reduce gate leakage are a topic for a later paper. 
C. Unlatched Static Adiabatic Buffers 
Armed with FETs and transmission gates, as we described 
above, we can now construct the most basic element of S2LAL, 
which we call the (unlatched) static adiabatic buffer (see Fig. 4).  
This circuit was first described by Athas et al. in 1994 [16], al-
though called an adiabatic amplifier in that paper. 
The unlatched static adiabatic buffer takes just 6 transistors, 
and its mechanism of operation is easy to follow.  Its dual-rail 
(complementary) output 𝑄 = (?̂?, ?̌?) is driven by a correspond-
ing pair 𝜙 = (̂?̂?, ?̌?) of clock rails, which hold at active for a pe-
riod 𝑡𝜙act (in our case, 3 ticks).  Its dual-rail input signal 𝐴 =
(?̂?, ?̌?) is constrained by specification to remain stable (whether 
in an active, or inactive state) throughout this driving pulse, in-
cluding its begin and end transitions, thus for at least 5 ticks.  So 
𝐴 can only transition during the other 3 ticks of the cycle, which 
is the period when 𝜙 is inactive.  Note that whenever 𝐴 is in the 
inactive state, 𝑄 is tied by single FETs to the rails holding it in 
the inactive state, whereas if 𝐴 is in the active state, 𝑄 is tied to 
the clock pulse through the transmission gate.  But, when 𝐴 
transitions (goes active or inactive), we have 𝜙 = (0, 𝑉dd) at 
that time, and therefore we’re only reconnecting 𝑄 between two 
references at identical levels, which obeys the Sec. II.C rules. 
We will show how to generalize this basic buffer structure to 
create more general adiabatic logic gates later, in Sec. III.F. 
But first, an obvious question to ask about the buffer struc-
ture is:  Given that the input has to remain valid for two ticks 
longer than the output, how can we chain such gates together 
while still maintaining a constant clock period?  This is solved 
(similarly to static SCRL) using latching logic and careful se-
quencing of handoffs, described in the next two subsections. 
D. Latching Static Adiabatic Buffers 
The first step in fleshing out S2LAL as a full sequential logic 
style is to turn the basic static adiabatic buffer from the previous 
subsection into a latching buffer.  To do this we simply run its 
output through another T-gate, controlled by an out-of-phase 
clock 𝜙𝑗 (Fig. 5).  As long as 𝜙𝑗 is already active when 𝜙𝑖 goes 
active, this will have the effect of latching the output 𝑄, so that 
it will remain at a valid level after 𝜙𝑖 goes inactive.  To prevent 
this from leading to a dynamic (rather than fully static) logic fa-
mily, while also maintaining fully adiabatic operation, requires 
further structure, which we will discuss in the next subsection. 
Fig. 3.  (Top) Simplified icon for CMOS transmission gates (T-gates).  The 
control input is implicitly a pair of complementary active-high and active-low 
signals.  The active-high signal controls the gate of the nFET and the active-
low signal controls the gate of the pFET.  Thus, when the input pair is “active,” 
the T-gate is turned “on” (conducting).  (Bottom) When the T-gate icon is de-
picted with an n-rail parallel bus on its channel terminals, this just implies that 
we have n parallel T-gates with identical controls. 
Fig. 4.  Unlatched static adiabatic buffer.  It takes a (dual-rail) input 𝐴 = (?̂?, ?̌?) 
and is driven by an externally-supplied (dual-rail) clock pair 𝜙 = (?̂?, ?̌?) which 
goes active on ticks #𝑖 (mod 8) and inactive on ticks #𝑖 + 4 (mod 8).  An im-
plicit constraint on 𝐴 is that it must be stable throughout the active pulse on 𝜙, 
which means (given that the full pulse on 𝜙, including transitions, takes 5 ticks) 
that 𝐴 must go valid for at least 5 ticks, so a complete pulse on 𝐴 takes up 7 out 
of the 8 ticks in the cycle (Fig. 7).  In the 8th tick, 𝐴 rests in the inactive state. 
Fig. 5.  Latching static adiabatic buffer, created by inserting a T-gate on each 
output line from the unlatched buffer.  This structure requires two clocks, in 
phases 𝑖 and 𝑗, where 𝑗 is different from both 𝑖 and 𝑖 + 4 (mod 8).  If 𝑗 is any 
of 𝑖 − 1, 𝑖 − 2, 𝑖 − 3, then the T-gate will be active (conducting) while 𝜙𝑖 is go-
ing active, but will be inactive (non-conducting) while 𝜙𝑖  is going inactive, 
thereby leaving 𝑄𝑖 latched in a valid state.  If 𝑗 is any of 𝑖 + 1, 𝑖 + 2, 𝑖 + 3, the 
identical structure can be used to decompute a previously valid output; we will 
see how to use this in Sec. III.E. 
 Version 0.6 6 arXiv:2009.00448 [cs.AR] 
E. Reversible Pipeline Sequencing 
To permit arbitrary sequential logic while retaining fully adi-
abatic (and fully static) operation requires a reversible pipeline.  
An example of this (for a simple shift register) is shown in Fig. 
6, with corresponding signal timing diagrams in Fig. 7.   
As is usual for fully adiabatic circuits, such a pipeline con-
sists of paired forward and reverse stages.  In the general (multi-
signal) case, if a forward stage F𝑖 computes some particular in-
vertible function 𝑆𝑖 = 𝐺𝑖(𝑆𝑖−1) of its input vector 𝑆𝑖−1, then the 
matching reverse stage R𝑖−1 must be designed to reconstruct (a 
copy of) the input of F𝑖 using 𝐺𝑖
−1, the inverse function to 𝐺𝑖. 
I.e., R𝑖−1 reconstructs a copy of 𝑆𝑖−1 = 𝐺𝑖
−1(𝑆𝑖), and then R𝑖−1 
is used to decompute 𝑆𝑖−1 through its copy.     
Note that each signal stays 
valid (in a data-carrying state) 
for 5 ticks out of the 8 and is in-
valid (carrying no data) for 1 
tick.  During the middle tick of 
the valid period, and during the 
invalid tick, control of the signal 
is handed off between the for-
ward and reverse gates that drive 
it. 
The reason why each signal 
𝑆𝑖  must stay valid for ≥ 5 ticks 
(thus why the complete clock 
period must be ≥ 8 ticks) is that 
before the handoff from F𝑖 to R𝑖 
can occur, two transitions must 
occur, in sequence: Namely, the 
next forward stage F𝑖+1  must 
compute its output 𝑆𝑖+1, then R𝑖 
must use that to reconstruct a 
copy of 𝑆𝑖 at its (tristated) output port before its output T-gate is 
enabled.  Then, we need another 2-step sequence after the hand-
off, while decomputing 𝑆𝑖−1 from 𝑆𝑖, before 𝑆𝑖 itself can finally 
be decomputed by R𝑖.  This is why we posit that S2LAL is the 
fastest possible perfectly adiabatic (i.e., fully static and fully 
adiabatic) pipeline technique for reversible CMOS. 
F. Logic Functions in S2LAL 
Implementing general logic in S2LAL is of course more in-
volved, but still doable.  The T-gates in Fig. 4 are replaced with 
more generalized activation networks, and the individual FETs 
in Fig. 4 are replaced with more general hold-inactive networks.  
Both networks must be carefully designed so as to never provide 
a path to both supplies simultaneously, except when the input is 
switching, since then the clock matches the constant supply.  We 
also must take care to ensure that no internal nodes are ever left 
floating; there must always be a path from each internal node to 
at least one supply.  These rules imply the minimal structure of 
each network, given its logical function.  Minimal circuits for 
AND/OR functions are shown in Figs. 8 and 9. 
The S2LAL unlatched AND gate takes 14 transistors.  For a 
latching version, one simply inserts a T-gate in each output line, 
as with the buffer.  Note that the internal nodes can never float, 
because at every moment there is a path from each internal node 
Fig. 7.  The valid periods for 
signals 𝑆𝑖 computed on phases 𝑖 of 
an S2LAL pipeline.  (Dotted lines 
show the level if the signal is not 
activated in the given cycle.)  
Fig. 6.  Basic structure of a reversible pipeline in S2LAL.  This one comprises 
a shift register, but pipelines of arbitrary logical complexity follow the same 
basic timing pattern.  Each forward stage F𝑖 for computing an output signal 𝑆𝑖 
from an input 𝑆𝑖−1 is paired with a corresponding reverse stage R𝑖−1 which de-
computes F𝑖’s input signal 𝑆𝑖−1  from the output 𝑆𝑖 .  Note each F𝑖  is enabled 
(connected to its output) in phase 𝑖 − 1 and activated (computing its output) in 
phase 𝑖.  Meanwhile, each reverse gate R𝑖 is activated (matching its output  𝑆𝑖) 
in phase 𝑖 + 2, and then enabled (connecting it to its output) in 𝑖 + 3, at the 
same time that F𝑖  is being disabled (its output terminal is going high-impe-
dance).  Thus, 𝑆𝑖 is, at every moment, connected to some source.  Later, in 𝑖 +
6, as soon as R𝑖−1 is no longer actively using signal 𝑆𝑖, it gets decomputed by 
R𝑖, and then in 𝑖 + 7, control of 𝑆𝑖 switches back to F𝑖 in preparation to initiate 
another cycle on 𝑖 + 8.  Careful inspection shows that there is no slack in this 
cycle; i.e., it cannot be shortened at all without violating our requirements. 
Fig. 8.  S2LAL static AND gate icon (top) and circuit (bottom).   
 Version 0.6 7 arXiv:2009.00448 [cs.AR] 
to either a constant or a variable supply reference (when input A 
is inactive, the internal node is held inactive as well). 
The unlatched OR gate takes 16 transistors.  Note that two 
more transistors are required because when both inputs are act-
ive, we require a full transmission gate to tie the internal node of 
the hold-inactive network to the clock to ensure it does not float. 
Inverting gates such as NOT, NAND and NOR are straight-
forward using quad-rail signaling, since pulsed complementary 
(active-high and -low) signal pairs for the logical complements 
of any given signal are also available.  That is, if the signal pair 
we’ve been calling “𝐴” is shorthand for 𝐴1 (meaning 𝐴 = 1), 
then there is also a signal pair 𝐴0 (meaning 𝐴 = 0) which pulses 
active on a given cycle if and only if 𝐴1 does not, and similarly 
for 𝐵.  So, we can implement NOT/NAND/NOR gates using the 
buffer and AND/OR gates, thusly (invoking DeMorgan): 
 NOT(𝐴1) = BUFFER(𝐴0) () 
 NAND(𝐴1, B1) = OR(𝐴0, 𝐵0) () 
 
1 See https://fossi-foundation.org/2020/06/30/skywater-pdk.  
 NOR(𝐴1, B1) = AND(𝐴0, 𝐵0) () 
Note if we wish to always maintain the quad-rail invariant that, 
for any signal pair 𝑆𝐿  representing a logic symbol 𝐿 = 0,1, a 
logically complementary signal pair 𝑆1−𝐿  is also available, 
which pulses active on all and only those cycles that 𝑆𝐿 does not 
(as illustrated in Fig. 1), then we have to ensure that our design 
includes additional gates as needed to compute those signals.  
But if subsequent logic does not require the logical complements 
of all signals, this extra complexity may be reduced. 
We should also note that the need to utilize invertible func-
tions at each stage of the reversible pipeline in general adds an 
additional layer of complexity, but this gets into a higher-level 
topic, regarding the design and analysis of reversible algorithms 
(realized here as sequential pipelines).  We consider that topic to 
be beyond the scope of this particular paper. 
This concludes our presentation of S2LAL. 
IV. FUTURE WORK AND CONCLUSION 
In this section, we briefly discuss some ideas for future work 
and conclude the paper. 
1)  Simulation studies.  S2LAL circuits of substantial com-
plexity should be designed and simulated using well-validated 
device models for a range of available processes.  We hypothe-
size that, despite the increased complexity of S2LAL compared 
to other “fully” adiabatic logic styles such as 2LAL, it may ul-
timately prove capable of reaching much lower levels of energy 
dissipation per function in practice, due to the elimination of 
any sparking events that (in dynamic styles) might be caused 
upon reconnection of floating nodes which may have been sub-
jected to, e.g., capacitively induced voltage sag. 
2) Fabrication and power dissipation measurement of test 
chips, to validate simulation results. 
3) Open-source hardware.  Due to the fully-static nature of 
S2LAL, it could be a relatively accessible technology for logic 
designers who are used to working in static CMOS, and thus 
are accustomed to not having to worry about whether floating 
nodes will sag or drift.  So, S2LAL seems an appropriate target 
for producing an open-source library of reference cells and ex-
ample designs for adiabatic CMOS, to encourage broader expe-
rimentation with adiabatic circuit concepts.  To be accessible to 
the broadest community, such a library could ideally be devel-
oped for an open PDK and using open-source EDA tools.  En-
vironments of this nature are presently taking shape; one exam-
ple being the open-source PDK release for SkyWater’s 130 nm 
process; free shuttle runs based on this PDK and various open-
source toolsets are currently being sponsored by Google and 
Efabless with outreach from FOSSI.1 
4) Cryogenic technologies. In recent years, it has become 
apparent that modern CMOS technologies perform quite well 
at deep cryogenic temperatures (e.g., ~4 K, and perhaps lower), 
spurring an increasing amount of research activity in cryogenic 
CMOS.  In particular, at these temperatures, subthreshold slope 
becomes much steeper, and subthreshold leakage currents ex-
ponentially lower.  This suggests the existence of a highly syn-
Fig. 9.  S2LAL static OR gate icon (top) and circuit (bottom). 
 Version 0.6 8 arXiv:2009.00448 [cs.AR] 
ergistic opportunity to explore impactful applications for perf-
ectly adiabatic logic styles such as S2LAL in the deep-cryoge-
nic regime, for a couple of reasons: 
a) Ultra-low dissipation. In-situ dissipation of a perfectly 
adiabatic logic style such as S2LAL is only limited by leakage. 
With subthreshold leakage becoming negligible at 4K, there is 
an opportunity to re-optimize the device structure to bring total 
leakage to such extreme low levels as to allow S2LAL to easily 
set a new record for energy efficiency of semiconductor-based 
logic; this can then potentially allow higher levels of aggregate 
parallel performance to be obtained in power-limited deep-cryo 
environments than has previously been possible. 
b) Power supply decoupling. Highly efficient operation 
of adiabatic circuits requires precise power supply waveforms, 
and efficient operation at room temperature requires that these 
AC signals be supplied with high-Q resonant energy recovery, 
which is quite challenging to achieve.  But, for cryogenic appli-
cations, we can simply bring in supply lines from room-temper-
ature sources, so the efficiency of the source is decoupled from 
the dissipation in the cryogenic environment. This can allow 
high-efficiency adiabatic circuits to be useful for in situ cryo-
genic applications, even before high-Q resonant supplies be-
come widely available. 
5) High-Q resonant supplies.  In a present project at Sandia, 
we are developing high-Q resonant power supplies suitable for 
driving fully adiabatic circuits [19]; these can allow adiabatic 
logic styles such as S2LAL to be viable even for room-temper-
ature applications.  Completing this development effort and ex-
ercising our resonator designs in combination with S2LAL is 
another important direction to pursue.  Also, in the longer term, 
design of appropriate superconducting resonators offers a way 
to eventually incorporate extremely high-Q resonant supplies 
more directly within low-T environments. 
In conclusion, we have shown in detail how to construct a 
novel and highly perfected form of adiabatic CMOS that exhi-
bits significant potential to enable practical applications and dra-
matic demonstrations which will illustrate the vast magnitude of 
future efficiency gains that can still be achieved for general di-
gital computing. 
ACKNOWLEDGMENT 
One of us (M. P. F.) would like to thank 
Mr. Richard Magnano, who helped inspire 
this innovation. On June 24th, when M. was 
creatively stalled in constructing S2LAL, 
R. showed M. the “Seed of Life” (right), an 
ancient sacred geometry figure, and indica-
ted that others found it inspiring.  Remoti-
vated, M. resumed work on the problem, and determined within 
a day that this particular clocking scheme (Figs. 2, 6, and 7) with 
8 out-of-phase cycles (echoing the 8 overlapping circles in the 
figure) in fact solves the problem and is as fast as possible. 
Thanks also are due to Erik DeBenedictis and Tom Conte for 
their encouragement in this work. 
 
 
 
REFERENCES 
[1] A. S. Mutschler, “Preparing for a Barrage of Physical Effects,” Semicon-
ductor Engineering, Aug. 13th, 2020.  https://semiengineering.com/ 
preparing-for-a-barrage-of-physical-effects/  
[2] R. Landauer, “Irreversibility and heat generation in the computing pro-
cess,” IBM J. Res. Dev., vol. 5, no. 3, pp. 183–191, Jul. 1961.  doi:10. 
1147/rd.53.0183 
[3] M. Frank, “Physical foundations of Landauer’s principle,” in Reversible 
Computation, RC 2018, J. Kari and I. Ulidowski, Eds., Lecture Notes in 
Computer Science, vol. 11106  Cham: Springer, 2018, pp. 3–33.  doi:10. 
1007/978-3-319-99498-7_1.  See also: arXiv:1901.10327 [cs.ET] 
[4] C. H. Bennett, “Logical reversibility of computation,” IBM J. Res. Dev., 
vol. 17, no. 6, pp. 525–532, Nov. 1973.  doi:10.1147/rd.176.0525  
[5] M. P. Frank, “Foundations of generalized reversible computing,” in Rev-
ersible Computation, RC 2017, I. Phillips and H. Rahaman, Eds., Lecture 
Notes in Computer Science, vol. 10301  Cham: Springer, 2017, pp. 19–
34.  doi:10.1007/978-3-319-59936-6_2.  See also arXiv:1806.10183 
[6] M. P. Frank and T. F. Knight, Jr., “Ultimate theoretical models of nano-
computers,” Nanotechnology, vol. 9, no. 3, pp. 162–176, Jan. 1998. 
[7] M. P. Frank, “Reversibility for efficient computing,” Ph.D. thesis, MIT, 
June 1999.  https://dspace.mit.edu/ handle/1721.1/9464 
[8] M. P. Frank, R. M. Lewis, K. Shukla, and T. M. Conte, “Analyzing the 
Fundamental Physical Limits of Dissipation-Delay Tradeoffs in Thermo-
dynamically Reversible Mechanisms for Classical Digital Computing and 
Quantum Computing,” LDRD Proposal, Sandia National Laboratories, 
2020. 
[9] S. G. Younis and T. F. Knight, Jr., “Practical implementation of charge 
recovering asymptotically zero power CMOS,” in Research on Integrated 
Systems: Proc. 1993 Symp., C. Ebeling and G. Borriello, Eds. Cambridge: 
MIT Press, Feb. 1993, pp. 234–250. 
[10] S. G. Younis, “Asymptotically Zero Energy Computing Using Split-Level 
Charge Recovery Logic,” Ph.D. thesis, Massachusetts Institute of Tech-
nology, June 1994.  https://dspace.mit.edu/handle/1721.1/11620  
[11] E. F. Fredkin and T. Toffoli, “Design principles for achieving high-per-
formance submicron digital technologies,” proposal to DARPA, Nov. 
1978; reprinted in Collision-Based Computing, A. Adamatzky, Ed.  Lon-
don: Springer, 2002, pp. 27–46.  doi:10.1007/978-1-4471-0129-1_2 
[12] C. L. Seitz, A. H. Frey, S. Mattison, S. D. Rabin, D. A. Speck, and J. L. 
van de Snepscheut, “Hot clock nMOS,” Proc. Chapel Hill Conference on 
VLSI, 1985.  See 5177:TR:85, California Institute of Technology, 1984. 
[13] J. G. Koller and W. C. Athas, “Adiabatic Switching, Low Energy Comp-
uting, and the Physics of Storing and Erasing Information,” Proc. Wkshp. 
on Physics and Computation, Dallas, TX, IEEE, Oct. 1992, pp. 267–270.  
doi: 10.1109/PHYCMP.1992.615554 
[14] J. S. Hall, “An Electroid Switching Model for Reversible Computer Ar-
chitectures,” Proc. Wkshp. on Physics and Computation, Dallas, TX, 
IEEE, Oct. 1992, pp. 237–247.  doi:10.1109/PHYCMP.1992.615549 
[15] R. C. Merkle, “Towards Practical Reversible Logic,” Proc. Wkshp. on 
Physics and Computation, Dallas, TX, IEEE, Oct. 1992, pp. 227–228.  
doi:10.1109/PHYCMP.1992.615546 
[16] W. C. Athas, L. “J.” Svensson, J. G. Koller, N. Tzartzanis, and E. Y.-C. 
Chou, “Low-Power Digital Systems Based on Adiabatic-Switching Prin-
ciples,” IEEE Trans. VLSI Sys., vol. 2, no. 4, pp. 398–407, Dec. 1994. 
[17] M. P. Frank, “Common Mistakes in Adiabatic Logic Design and How to 
Avoid Them,” in Proc. Int’l. Conf. on Embedded Systems and Applica-
tions, ESA '03, CSREA Press, June 2003, pp. 216–222. http://revcomp. 
info/legacy/revcomp/MLPD03-Mistakes-paper.pdf 
[18] V. Anantharam, M. He, K. Natarajan, H. Xie, and M. P. Frank, “Driving 
Fully-Adiabatic Logic Circuits Using Custom High-Q MEMS Resona-
tors,” in Proc. Int’l. Conf. on Embedded Systems and Applications, 2004, 
pp. 5–11.  revcomp.info/legacy/revcomp/AdiaMEMS/MLPD-04.pdf 
[19] M. P. Frank, R. W. Brocato, T. M. Conte, A. H. Hsia, A. Jain, N. A. Mis-
sert, K. Shukla, and B. D. Tierney, “Special Session: Exploring the Limits 
of Adiabatic Circuits,” invited paper submitted to the 38th IEEE Int’l. 
Conf. on Computer Design (ICCD), Oct. 2020. 
   
