Asynchronous Nano-Electronics: Preliminary Investigation by Martin, Alain J. & Prakash, Piyush
Asynchronous Nano-electronics: Preliminary Investigation
Alain J. Martin & Piyush Prakash
Department of Computer Science
California Institute of Technology
Pasadena, CA 91125, USA
Abstract
This paper is a preliminary investigation into implementing
asynchronous QDI logic in molecular nano-electronics,
taking into account the restricted geometry, the lack of control
over transistor strengths, and the high timing variations. We
show that the main building blocks of QDI logic can be suc-
cessfully implemented; we illustrate the approach with the
layout of an adder stage. The proposed techniques to im-
prove the reliability of QDI apply to nano-CMOS as well.
1. Introduction
Currently, molecular nano-electronics is considered a
plausible successor to CMOS. Although immense fabrica-
tion difficulties still lie ahead, at least experimental devices
are feasible today. Molecular nano-electronics is a mostly
non-lithographic bottom-up fabrication technology whose
main advantage over CMOS is to break the limit of lithog-
raphy in terms of feature size and device density. (Densities
in the range of to devices per are mentioned
in the literature [2, 3].)
All technologies at the nanoscale level, including nano-
CMOS, will face great fabrication challenges that will trans-
late into important parameter variations and decreased relia-
bility. Timing will be difficult to control and predict. Wires
will be short, by necessity and by choice: the technology
does not allow long wires to be grown reliably, and wire delay is
quadratic in the length. Consequently, it will be impossible
to build a useful global clocking network with those tech-
nologies. For those two main reasons—difficulty in con-
trolling and predicting timing and impossibility of building
a clock network—asynchronous logic seems an ideal and
perhaps unavoidable choice for digital circuits in this technology.
QDI (quasi-delay-insensitive) logic, which among all types of
asynchronous methods relies on the weakest timing assumption (the
isochronic fork), seems particularly well suited.
This paper is a preliminary investigation into implementing
asynchronous (QDI) logic in molecular nanotechnology.
It is first and foremost an assessment, and further improve-
ment, of QDI’s robustness to extreme parameter variations
and restricted geometry. The paper is organized as fol-
lows. First, we briefly describe the main features of this new
technology. Among several possible alternatives we choose
a somewhat idealized one based on complementary FETs.
We define its main layout rules. Next, we show how to de-
sign basic combinational gates. State-holding elements like
C-elements and precharge function-blocks require some at-
tention because the traditional “staticizer” implementation
must be avoided. We give an implementation for those
cells that avoids staticizers. We show how the logic family
of the Caltech Asynchronous Microprocessor (CAM) and
the logic family of the MiniMIPS can both be implemented
without the use of staticizers. We illustrate the design style with
the implementation of a ripple-carry adder. Finally, the im-
plementation of isochronic forks with its implied timing as-
sumption is analyzed.
2. Molecular Nanotechnology
Our target technology is a somewhat advanced version of
the devices produced in the Heath Lab at Caltech, the Lieber
Group at Harvard, and the HP group [5, 6, 1]. Chemists
are able to grow silicon nanowires (NW) with diameters
around 5nm and up to ten microns in length. These wires
are aligned in one direction, either by a flow process or
by nano-imprint. Because of the difficulty in arranging the
wires in a regular pattern, the feasible geometry is restricted
to a crossbar: a collection of parallel wires in one direction
superimposed with a collection of parallel wires in the or-
thogonal direction. However, the grid is not perfectly regu-
lar and several wires may be broken.
Interesting things may be made to happen at the intersec-
tion of two orthogonal wires. Special molecules (e.g. ro-
taxane) can be placed at the crosspoint between two wires
to connect the two wires. Such a junction is originally of
high enough resistance that the two wires are not electri-
cally connected. But, if a high voltage is applied between
14th IEEE International Symposium on Asynchronous Circuits and Systems
1522-8681/08 $25.00 © 2008 IEEE
DOI 10.1109/ASYNC.2008.22
the two wires, the molecule will begin conducting and con-
tinue to do so when the voltage is lowered. In other words,
by applying voltage at selected junctions, the junctions can
be programmed to conduct. The junctions can also be made
to rectify, i.e., we can program both resistors and diodes.
Resistors and diodes are sufficient to build a complete
logic family and have been used to build PLAs and mem-
ories, but they have no gain (amplification) and therefore
signal degradation is unavoidable, leading to failure in any
computation containing a significant number of transitions
in series. In order to implement general computations, tran-
sistors are needed to provide amplification, which in turn
requires the ability to make semiconductors out of silicon
NWs. NWs can be doped during the growth in order to con-
trol their conductivity. Heavily doped regions of a wire are
conducting; some regions can be kept lightly doped so as to
control their conductivity via an electrical field created by
applying a voltage on the crossing metal wire. If a dielectric
can be placed at the intersection of (and between) the two
wires, one has effectively created a FET at the junction. (So
far no technology offers programmable FETs, unlike resis-
tive junctions and diode junctions.)
Technologies with one type of transistor (usually
pFET), diodes, and resistive pullups or pulldowns have been
announced and demonstrated. The transistors have enough
gain to build restoring logic. Recently, all three groups
mentioned have announced that they will soon (or already
do) fabricate FETs of both n- and p-types, with appropriate
threshold voltages and acceptable gain, making it possible
to design complementary logic circuits.
Although the characteristics of the transistors are still
shrouded in mystery, we take as working hypothesis that
such devices will be built. We also assume the availabil-
ity of low-resistance metal wires for the grid direction that
provides the gates of transistors. (Such metal wires are
obtained either by coating silicon wires with metal during
crystal growth, or by imprinting pure metal wires using the
nano-imprint technique.)
Let us summarize the components of our (slightly pre-
cursory) target technology. As already mentioned, we be-
lieve that most properties of this technology will apply to
nano-CMOS. The basic building block is the tile. A tile
is a crossbar array of nanowires: the (top) horizontal wires
are metal conducting wires that are used as gates of tran-
sistors; the (bottom) vertical wires are semiconducting sili-
con nanowires (NW) used as channels of transistors. Wire
length is strictly limited, say around m, with a wire-
to-wire pitch of approximately nm. (Metal wires can be
packed more densely and can be longer as they are less re-
sistant. But we will ignore those differences.) We assume
that we can implement tiles of wires.
A tile may be a routing tile or a compute tile. A routing
tile contains only programmable junctions and is used to
connect two compute tiles. A compute tile consists of an n-
plane, a p-plane, and a routing plane. An n-plane contains
n-transistors only; a p-plane contains p-transistors only; a
routing plane contains programmable connections only.
An important advantage of this technology is that the dif-
ferent active crosspoints (transistors, resistors, diodes) fit
exactly under the area of the crosspoint and therefore do
not increase the area of the array, hence contributing to the
density advantage of the technology.
A few important restrictions and properties must be men-
tioned. (1) It is not possible to mix the different components
(n-transistor, p-transistor, resistor) inside the same plane.
Each plane is homogeneous. (2) It is not possible to con-
nect wires end-to-end in one direction. Connections are
made through orthogonal wires. (3) Connection resistance
is high (100 kΩ) and highly variable. (4) The drive of transistors
is not well characterized. We assume that the limit
to the number of transistors in series is similar to CMOS.
We have chosen 3 as a hard limit. (5) Up to 10% of the
wires and connections may be broken or stuck open. (6)
Finally, the collection of nano-tiles is placed on top of a
standard silicon layer. Power distribution and input/output
are implemented in the silicon layer. All vertical nanowires
in a p-plane are connected to a micro-wire distributing Vdd,
and all vertical nanowires in an n-plane are connected to a
micro-wire distributing GND.
3. Implementing QDI Logic
Among all asynchronous logic families, QDI is the most
robust to parameter variations because it relies on the weak-
est timing assumption: the isochronic fork. The main ques-
tion addressed in this paper is the following. In the presence
of extreme PVT (process, voltage, temperature) parameter
variations, how will a QDI circuit (implemented in molecu-
lar nano) fail, and what can be done to avoid the failure or
at least significantly improve the circuit’s robustness?
The implementation of combinational cells does not
present any particular problem besides the usual restriction
on the length of transistor chains, but it gives us the oppor-
tunity to fix a general layout scheme. Figure 1 shows two
alternative implementations for the inverter and the nand2.
(The nor2 gate is derived from the nand2 by exchanging
Vdd and GND, and p- and n-transistors.) A dot at the intersection
of two wires indicates a connection; a diagonal bar
(/) indicates an n-transistor; a diagonal bar (\) indicates a
p-transistor. In the first type of layout, the n- and p-planes
are vertically aligned w.r.t. the metal wires; in the second
type, they are horizontally aligned w.r.t. the metal wires.
Figure 1. Two possible layouts for the inverter and nand2. (Key: pFET, nFET, connection, metal wire, silicon channel.)
It is clear that the second choice of layout presents an important
area advantage: the area of the nand-gate is 6x3 in the
first case and 2x3 in the second case. As shown in Figure 2,
the area penalty of the first layout for an n-input combina-
tional gate is . But there is another, more important, ad-
vantage to the second layout choice. If an input is shared by
several transistor gates, then in the second layout choice, the
input signals to the different gates are carried by the same
metal wire, resulting in very similar characteristics for
the different paths. In the first layout choice, the paths from
one input to several transistor gates are very dissimilar, with
the paths to one type of transistors going through two resis-
tive contacts and the paths to the other type of transistors
being a single metal wire. As we shall see, this difference
is important for the implementation of isochronic forks.
The general scheme for a compute tile is shown in Fig-
ure 3. The two transistor planes are side by side with the
input signals running horizontally on metal wires. Each ver-
tical semiconductor wire of the transistor planes can be used
(inside a transistor plane) as a pullup or pulldown transistor
chain if transistors are placed at some of the cross-points.
Because of the restricted geometry, parallel chains cannot
share transistors, and therefore all boolean expressions are
implemented in the disjunctive normal form: The expres-
sion can only be implemented as .
The general shape of the routing plane is that of an inverted
T as shown on the figure. The part of the routing plane be-
tween the two transistor planes is used only for the feedback
connections needed in the implementation of state-holding
gates. (We will give examples shortly.) The bottom part of
the routing plane connects the pullup and pulldown chains
to the output(s) of the logic gate. (A similar scheme has
been proposed in [1].) A single compute tile contains sev-
eral cells.
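The restriction to disjunctive normal form can be illustrated with a small sketch. The Python fragment below is not part of the design flow described here; it assumes a hypothetical nested-tuple representation of boolean expressions and a hypothetical example expression, a AND (b OR c). It expands an expression into parallel chains of series transistors and checks each chain against the limit of 3 transistors in series chosen in Section 2.

```python
from itertools import product

MAX_SERIES = 3  # hard limit on transistors in series, as chosen in the text


def to_dnf(expr):
    """Expand an expression into parallel chains (hypothetical representation).

    A literal is a string; a compound expression is a tuple
    ("and", e1, e2, ...) or ("or", e1, e2, ...).
    The result is a list of chains, each chain a tuple of literals in series.
    """
    if isinstance(expr, str):
        return [(expr,)]
    op, *args = expr
    parts = [to_dnf(arg) for arg in args]
    if op == "or":
        # Alternatives become separate parallel chains.
        return [chain for part in parts for chain in part]
    if op == "and":
        # Series composition: distribute over every choice of alternatives.
        chains = []
        for combo in product(*parts):
            merged = []
            for chain in combo:
                for literal in chain:
                    if literal not in merged:
                        merged.append(literal)
            chains.append(tuple(merged))
        return chains
    raise ValueError(f"unknown operator: {op}")


def fits(chains, limit=MAX_SERIES):
    """True if no chain needs more than `limit` transistors in series."""
    return all(len(chain) <= limit for chain in chains)


# Hypothetical example: a AND (b OR c) cannot share the transistor gated by a.
chains = to_dnf(("and", "a", ("or", "b", "c")))
print(chains, fits(chains))
```

For the example expression, the expansion yields the two chains (a, b) and (a, c), so input a gates one transistor in each chain instead of a single shared one.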
Figure 2. Two possible layouts for an n-input cell. (Panel labels: "2n² overhead", "no overhead".)
Figure 3. General layout scheme for a compute tile. (Labels: n-plane, p-plane, routing plane.)
The standard CMOS implementation of state-holding
operators in this technology does present a serious problem:
the weak inverter of the staticizer. Consider the operator de-
fined by the two production rules:
Without loss of generality, let’s assume that it is imple-
mented as
By definition of a state-holding element, there exist states in
the computation where holds. In those “floating”
states, the path to Vdd through the pullup implementing ,
and the path to GND through the pulldown are both cut.
The voltage value of the physical node implementing has
to be maintained in other ways. The general approach is to
find another path to Vdd and another path to GND to main-
tain the current value of . The simplest and crudest so-
lution is to add a “staticizer” (or “keeper”). The staticizer
implementation consists of always maintaining the current
value of by adding the pullup and the pull-
down , giving the gate:
The added circuitry is a feedback inverter with input and
output . The difficulty with this solution is that when
the value of has to be changed, for instance from true
to false, there is a conflict (a “fight”) between
and . Symmetrically, when the value of has
to be changed from false to true, there is a “fight” between
and . There is no logical resolution to
those conflicts; they can only be resolved by physical means
by making sure that the current through the pullup imple-
menting and through the pulldown implementing is
stronger than the current through the transistors of the feed-
back inverter. This is usually achieved by implementing the
two transistors of the feedback inverter as highly resistive
(“weak”), making the currents through the pullup and the
pulldown of the feedback much smaller than the currents
through the pullup and pulldown . But the weak cur-
rents cannot be too weak since they have to maintain the
voltage on node in the presence of possibly important
leakage.
Hence, the correct behavior of the staticizer relies on a
two-sided inequality requirement on the feedback inverter’s
current. In a technology where the physical parameters of
the design are very hard to control and the transistors cannot
be sized, it would be very risky to rely on the relative
strengths of conducting paths. We conclude that we will not
be able to use staticizers in nano-technology.
Figure 4. A two-input C-element implemented as a majority gate.
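To make the two-sided requirement concrete, the following sketch computes the window of acceptable nominal keeper currents under a simple worst-case model. The model is an assumption made for illustration only: every current (leakage, keeper, and driving chain) may deviate from its nominal value by a multiplicative factor `spread` in either direction. The function name and all numbers are hypothetical.

```python
def keeper_window(i_leak, i_drive, spread):
    """Worst-case window (lo, hi) for the nominal keeper current.

    Assumed model: each current may be off by a factor `spread` either way.
    The keeper must still beat leakage when it is weakest and leakage is
    strongest, and still lose to the driver when it is strongest and the
    driver is weakest. Returns None when no nominal value satisfies both.
    """
    lo = i_leak * spread ** 2   # keeper / spread > leakage * spread
    hi = i_drive / spread ** 2  # keeper * spread < drive / spread
    return (lo, hi) if lo < hi else None


# Hypothetical currents in arbitrary units.
print(keeper_window(1.0, 100.0, 2.0))  # a window exists
print(keeper_window(1.0, 100.0, 4.0))  # the window has closed
```

Under this model the window is empty as soon as the ratio of drive current to leakage falls below the fourth power of `spread`, which illustrates why the scheme becomes fragile when variations are large and transistors cannot be sized.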
We have to use another solution to maintain the value of
the output nodes in the floating states. The general method
consists in identifying the floating states in which the value
of the output has to be maintained and using the feedback
inverter only in those states. Without additional informa-
tion, the floating states are characterized by , lead-
ing to the general solution:
The Muller C-element is an essential building block of
asynchronous logic. The 2-input C-element with inputs
and and output is defined as:
The above transformation applied to this pair of PRs leads
to the well-known majority-gate implementation (see Fig-
ure 4):
For the three-input C-element, the transformation gives:
The layout of the C-elements in Figure 5 shows the use
of the midsection of the routing plane for feedback connec-
tions (using the output as gate of some transistors).
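At the logic level, the transformation can be sketched as follows. The fragment assumes that the operator is given by two mutually exclusive guards, here called `up` and `down` (hypothetical names), and builds a combinational function that holds the current output exactly in the states where neither guard is true. It is a behavioral sketch only: it does not model the inverted output or the individual pullup and pulldown chains. Applied to the guards of the 2-input C-element (a AND b to set, NOT a AND NOT b to reset), it can be checked exhaustively against the majority function.

```python
from itertools import product


def staticize(up, down):
    """Return a combinational next-value function for a state-holding operator.

    `up` and `down` are the set and reset guards (assumed mutually exclusive).
    The output is set when `up` holds, reset when `down` holds, and keeps its
    current value x in the floating states where neither guard holds.
    """
    def next_value(x, *inputs):
        u, d = up(*inputs), down(*inputs)
        assert not (u and d), "guards must be mutually exclusive"
        return u or (x and not d)
    return next_value


def majority(a, b, x):
    return (a and b) or (a and x) or (b and x)


# 2-input C-element: set when both inputs are true, reset when both are false.
c2 = staticize(lambda a, b: a and b, lambda a, b: not a and not b)
same = all(
    c2(x, a, b) == majority(a, b, x)
    for a, b, x in product([False, True], repeat=3)
)
print(same)

# 3-input C-element: the output is held whenever the inputs disagree.
c3 = staticize(
    lambda a, b, c: a and b and c,
    lambda a, b, c: not a and not b and not c,
)
print(c3(True, True, False, True), c3(False, True, False, True))
```

The exhaustive comparison over all eight input combinations confirms that, for the 2-input C-element, holding the output in the floating states is the same boolean function as the majority of the two inputs and the fed-back output.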
4. Read/Write Register
The read/write register is used in a standard pipeline
stage of the Caltech Asynchronous Microprocessor (CAM).
Figure 5. Schematics for a 2-input and a 3-input C-element. The central part is used for feedback paths.
Figure 6. Dual-rail single-bit register with read and write interfaces.
It is also used in a slightly different form to implement
general-purpose registers in the MiniMIPS and Lutonium.
As shown in Figure 6, in its simplest form, the register
consists of three parts: the set/reset latch maintaining the
current value of the register bit , the two nand-gates
that constitute the read part, and the write-acknowledge
(wack) cell that generates the acknowledge signal for the
write handshake. The implementation of the read part
presents no difficulty. The set/reset latch is implemented as
cross-coupled nor-gates, which do not need relative sizing,
unlike the implementation with cross-coupled inverters (see
[7]). Only the wack needs some attention. Its specification
is
The floating states are and . In
both states, the write interface is in the process of changing
the values of and and therefore the output should
be kept high in the floating states. The transformation then
gives
Now the two guards are complementary. The write-
acknowledge is a combinational gate. Altogether the above
design leads to a compact nano layout for the register. No
feedback inverter is needed, the only feedback being that
of the cross-coupled nor-gates, and there are at most two
transistors in series on all paths. See Figure 7.
We know that we can implement all computations that
don’t require arbitration with just standard combinational
gates and the 2-input C-element. But performance will suf-
fer, and it takes just a few additional cells to allow efficient
circuit implementations. For instance, the C-element, the
set-reset latch and the precharge function are enough to im-
plement the circuits of the Caltech MiniMIPS as well as
the circuits of the CAM even though the two design styles
are different. Again, the set-reset latch can be implemented
with cross-coupled nor-gates. The precharge function is a
state-holding cell with two sets of inputs: the input is a
control signal used to precharge the output node high in
the neutral state ( ). The other set of inputs is used to
compute the boolean function when is high. Function
is necessarily small enough to fit in a single pulldown,
and therefore the number of inputs of is also limited—
rarely more than four. The PRS for the precharge function
cell is:
The floating state is . Applying the transformation
and after simplification, we get:
Figure 7. Schematic for the dual-rail read/write register.
Figure 8. Function block with single output. Both boolean expressions F and ~F are used to make the gate combinational.
A symbolic layout is shown in Figure 8. For example, if
is the dual-rail equality of and , i.e. ,
the complement of has to be implemented as
, since unfortunately
we can only implement parallel or-terms. This leads to 5
parallel pullups. The added complexity may be significant.
But it should be observed that the added pullups and pull-
downs are used only to maintain the value of the output in
the floating state(s). Therefore, their possible complexity,
which translates into a weak drive ( current), might be
acceptable since those extra terms are not used to switch
the output but only to maintain the current value of the out-
put. However, the following simplification significantly
improves the circuit. It applies whenever it is required to
compute the dual-rail version of the function, i.e.:
If the only floating states are for the block,
and for the block, then we can staticize
the PRS as:
Figure 9. Schematic for a dual-rail function block.
with the pulldowns unchanged. This simplification requires
that be an invariant of the system. In other
words, the condition for applying the transformation is that
the function inputs are not reset when is set. A symbolic
nano-layout for this solution is shown in Figure 9. Certain
QDI templates already satisfy this condition. It is the case
for the control/data decomposition scheme of the pipeline
stage used in the CAM. (It relies on an easily satisfiable
isochronicity assumption.) Other QDI templates can eas-
ily be transformed to satisfy the invariant. It is the case for
the PCHB (“precharge half-buffer”) of the MiniMIPS de-
sign style. Since the validity (usually called ) of the
data input is always computed, it suffices to replace the en-
able signal with defined as (i.e. is the
output of the C-element with and as inputs).
5. Implementing an Adder
Next, we describe the implementation of an n-bit adder
stage in nano. The adder is designed in the same style as
the one used in the CAM.
Figure 10. Control/data decomposition of a pipeline stage.
Figure 11. A control/data pipeline stage with half-buffer control.
The basic QDI pipeline stage
can be implemented as in Figure 10 us-
ing the control/data decomposition approach of the CAM.
(See, for instance, [7].) As shown in Figure 11, the con-
trol part can be implemented as a simple half-buffer (which
can be as simple as a C-element in this case); the datapath
consists of a dual-rail register per bit of input with the two
nand-gates of the read part replaced by the precharge func-
tions computing the carry and sum for every pair of input
bits.
In this design, the condition for removing staticizers is
satisfied provided a mild isochronicity assumption is ful-
filled, and therefore the function blocks for sum and carry
can use the derived transformation. The isochronicity as-
sumption is that the delay for the valid inputs to propagate
from the outputs of the registers to the input of the function
blocks should be less than the delay of the adversary path
consisting of the wacks (in parallel), the completion tree,
and the C-element in the control.
Furthermore, in order to avoid propagating the control
signal (which is also in this case) to all bits of the
datapath, we can replace with in all cells but the
least significant one, where and are the pair of bits
implementing the carry-in. The nano-layouts for the carry
and sum functions are shown in Figure 12 and Figure 13.
The production rules describing the pullups and pulldowns
of the sum and carry cells are as follows. For the sum:
For the carry-out:
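As a hedged illustration of the logic that such cells compute (a sketch, not the production rules of the actual cells), the fragment below enumerates dual-rail series chains for the sum and carry of three dual-rail inputs. The rail names at, af, bt, bf, ct, cf follow the labels of Figures 12 and 13; the list names and helper functions are hypothetical. Each chain is a set of rails that must all be high, and the chains of one output rail are in parallel, in line with the disjunctive normal form imposed by the layout.

```python
from itertools import product


def rails(bits):
    """High rails for a valid input (a, b, c): 't' rail if the bit is 1."""
    return tuple(name + ("t" if bit else "f") for name, bit in zip("abc", bits))


def minterms(f):
    """All input triples (a, b, c) for which boolean function f is true."""
    return [bits for bits in product([0, 1], repeat=3) if f(*bits)]


# Sum rails: one three-transistor chain per minterm (parity has no shorter form).
sum_t = [rails(m) for m in minterms(lambda a, b, c: (a + b + c) % 2 == 1)]
sum_f = [rails(m) for m in minterms(lambda a, b, c: (a + b + c) % 2 == 0)]

# Carry rails: majority of the true rails, respectively of the false rails.
carry_t = [("at", "bt"), ("at", "ct"), ("bt", "ct")]
carry_f = [("af", "bf"), ("af", "cf"), ("bf", "cf")]


def evaluate(chains, a, b, c):
    """True if at least one chain conducts for the valid input (a, b, c)."""
    high = set(rails((a, b, c)))
    return any(set(chain) <= high for chain in chains)


longest = max(len(chain) for chain in sum_t + sum_f + carry_t + carry_f)
print(len(sum_t), len(carry_t), longest)
```

In this sketch no chain is longer than three transistors, so the sum and carry functions fit the series limit assumed in Section 2 without further decomposition.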
6. Isochronic Fork
Once all QDI building blocks have been defined, the
next step is to compose them into a working system. We
leave aside the issue of assembling them into a layout and concentrate
on the critical issue of isochronic forks: Given
the wide timing variations to be expected in this technol-
ogy and in nano-CMOS, can we satisfy the timing require-
ments of isochronic forks? Although the Async reader
should be familiar with the concept of isochronic fork, there
still exists enough misunderstanding about the nature of
the isochronicity requirement that a discussion is in order.
Figure 12. Carry-out computation.
Figure 13. Sum computation.
In the past, a simple-to-explain timing condition has often
been used to implement the isochronic fork; but, in the presence
of important timing variations, this sufficient timing
condition is difficult to implement as it relies on a two-
sided timing inequality. The designer should switch to a
more complex but easier to satisfy necessary and sufficient
condition. This condition relies on the notion of adversary
path. It is a one-sided inequality stating that a single tran-
sition should be shorter in time than a sequence of multiple
transitions.
Let us first review how the need for isochronic forks
arises. A computation implements a partial order of tran-
sitions. In the absence of timing assumptions, this partial
order is based on a causality relation. For example, tran-
sition causes transition in state if and only if
makes guard of true in . Transition is said to
acknowledge transition . We do not have to be more spe-
cific about the precise ordering in time of transitions and
. The acknowledgment relation is enough to introduce
the desired partial order among transitions, and to conclude
that precedes . In an implementation of the circuit,
gate with output is directly connected to gate with
output , i.e., is an input of .
Hence, a necessary condition for an asynchronous cir-
cuit to be delay-insensitive is that all transitions are ac-
knowledged.
Unfortunately, the class of computations in which all
transitions are acknowledged is very limited, as we proved
in [8]. Consider the following example in which the specifi-
cation of the computation requires an ordering of transitions
on variables , , and defined by the sequence:
Implementing this sequence of transitions requires intro-
ducing at least one control variable to distinguish the
states in which causes from the states in which
causes , leading to PRs of the form:
As shown in Figure 14, the output x of a gate is forked
to x1, an input of gate Gy with output y, and to x2, an
input of gate Gz with output z. A transition on x when c
holds is followed by a transition on y, but not by a transition
on z, i.e. the transition on x1 is acknowledged but the transition
on x2 is not, and vice versa when c does not hold. Hence, in either
case, a transition on one output of the fork is not acknowledged.
In order to guarantee that the unacknowledged
transition completes without violating the specified order, a
timing assumption called the isochronicity assumption has
to be introduced, and the forks that require that assumption
are called isochronic forks[8]. The branch of the fork with
the non-acknowledged transition is called an isochronic
branch. (Not all forks in a QDI circuit are isochronic.)
First, it is important to realize that the weakest timing
assumption associated with an isochronic fork is not a re-
lation between the delays on the different branches of the
fork—which would be a very tight assumption indeed. For
example, in the case of a fork with two branches, if
and are the delays of transition on one branch and
transition on the other branch, the timing assumption is
NOT that . As already mentioned, such
a timing requirement has been used because of its simplic-
ity. It is sufficient but not necessary, and is very difficult to
fulfill in the presence of large parameter variations, in par-
ticular threshold voltages.
In order to characterize a weaker timing condition for an
isochronic fork to behave properly, let us examine how a
circuit will fail when a transition on an isochronic branch
fails to complete.
Figure 14. The fork (x,x1,x2) is isochronic: a transition on x1 causes a transition on y only when c is true, and a transition on x2 causes a transition on z only when c is false. Hence, certain transitions on x1 and on x2 are not acknowledged, and therefore a timing assumption must be used to guarantee the proper completion of those unacknowledged transitions.
In our previous example, assume that
causes , but does not complete. Since does not
change the value of , initially the failure of to com-
plete does not cause a failure of gate . But causes
a sequence of transitions that will eventually change an-
other input of , say transition since we do not know
the sense of this transition. At this point, transition not
having completed will cause a misfiring of , since in the
partial-order specification of the circuit, should have
completed.
Hence, the isochronicity assumption is that transition
completes before the sequence of transitions starting
at and ending at . This sequence of transitions defines
a path in the circuit called the adversary path of isochronic
branch . The isochronicity assumption states that the de-
lay for unacknowledged transition to complete
be always smaller than the sum of the delays of the transi-
tions on the adversary path (including acknowledged tran-
sition ).
This is a one-sided inequality that can always be satisfied
by making the adversary path longer.
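The one-sided nature of the condition can be sketched in a few lines. The fragment below is an illustration under stated assumptions: delays are plain numbers in arbitrary units, the function names are hypothetical, and lengthening the adversary path is modeled as appending identical gates of a given positive delay.

```python
def isochronic_ok(branch_delay, adversary_delays):
    """One-sided check: the unacknowledged transition on the isochronic
    branch must complete before the whole adversary path does."""
    return branch_delay < sum(adversary_delays)


def padding_needed(branch_delay, adversary_delays, gate_delay):
    """Smallest number of extra gates (each of delay `gate_delay` > 0)
    to append to the adversary path so that the check passes."""
    if gate_delay <= 0:
        raise ValueError("gate_delay must be positive")
    total = sum(adversary_delays)
    extra = 0
    while not branch_delay < total + extra * gate_delay:
        extra += 1
    return extra


# Hypothetical delays, arbitrary units.
print(isochronic_ok(1.0, [0.8, 0.9]))         # branch is faster than the path
print(padding_needed(3.0, [1.0, 1.0], 0.5))   # gates to add when it is not
```

Because the inequality bounds the branch delay from one side only, a violation can always be repaired by adding delay to the adversary path, at the cost of performance.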
Estimating transition delays can be tricky. Transition de-
lay has essentially two components: the diffusion delay of
a voltage level propagating along a wire, and the switching
delay which is the time for a voltage change to cut or tie a
transistor chain connecting the output node of an operator
to Vdd or ground. (In our implementation of QDI, an in-
put of a logical operator is always the gate of one or several
transistors.) The switching delay is going to be different depending
on whether the transition is a “cut” or a “tie”: to cut an
n-type transistor chain requires a voltage change from Vdd
down to below the threshold voltage Vtn, almost a full voltage
swing. To tie an n-type transistor chain requires a voltage
change from zero to just Vtn. The same holds, symmetrically, for
p-type transistors. Obviously, in the presence of slow slew rates
(the “slope” of the voltage change), a cut may take much
longer than a tie. The isochronicity assumption can be of
two types: “tie-before-tie” or “cut-before-tie,” the first one
being easier to satisfy than the second one.
Figure 15. The fork (li,li1,li2) is isochronic, with li2 as isochronic branch for the transition 0 to 1 on li.
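The asymmetry between a cut and a tie can be made concrete with a deliberately crude model. The sketch below assumes a linear voltage ramp between 0 and Vdd lasting `ramp` time units and an n-type chain that switches exactly when its gate voltage crosses the threshold; both assumptions, the function names, and the numbers are hypothetical.

```python
def tie_time(vdd, vt, ramp):
    """Time for a rising gate ramp (0 to vdd over `ramp`) to reach vt,
    turning an n-type chain on."""
    return ramp * vt / vdd


def cut_time(vdd, vt, ramp):
    """Time for a falling gate ramp (vdd to 0 over `ramp`) to drop to vt,
    turning an n-type chain off."""
    return ramp * (vdd - vt) / vdd


# Hypothetical values: threshold at a quarter of the supply, slow ramp.
vdd, vtn, ramp = 1.0, 0.25, 4.0
print(tie_time(vdd, vtn, ramp), cut_time(vdd, vtn, ramp))
```

With the threshold at a quarter of the supply, the cut takes three times as long as the tie in this model, and the gap grows in absolute terms as the ramp gets slower, which is why a cut-before-tie requirement is harder to meet than a tie-before-tie one.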
For instance, the isochronic fork in the control/data
pipeline of Figure 11 is of the type “tie-before-tie.” It is
required that the valid data and tie the transistors in
and before the adversary paths tie the enable signal
through . On the other hand, the isochronic fork in the
PCHB (precharge half-buffer) which is the main template
in the MiniMIPS (see [7, 9]), is of the type “cut-before-tie.”
The data inputs to the function block have to return to neu-
tral (low voltage) hence cutting the pulldown chain of the
function block, before the neutrality is detected and the en-
able signal is raised (a tie).
In the proposed layout for molecular nano, the restricted
wire layout and highly resistive connections make the im-
plementation of the isochronicity assumption somewhat
easier. Let us look at a worst-case example, where the ad-
versary path contains just one gate. The example is the Cal-
tech Q-element or the active-active buffer (see for instance
[7]) shown in Figure 15. We analyze the isochronic fork
. Initially, is true and is false. In order
to satisfy the handshake specification, transition should
not change the value of . But since the inverted output
of the C-element is the input of the nor-gate the ad-
versary path ( , C-element, ) causes transition ,
which could cause transition , unless the isochronic-
branch transition terminates before . This is one
of the tightest isochronicity requirements since the adver-
sary path contains only one gate, the C-element. (Since the
C-element is used with an inverted output in this cell, in
CMOS the output could be taken before the inverter if the
Figure 16. Partial layout showing the adversary path in dotted line (signal labels in the figure: li, ri_, x, x’, x2_, ro)
C-element is implemented with a staticizer. However, in an
implementation of the C-element as a majority gate, the in-
verted output has to be taken after two inverters, making the
adversary path safer.)
The nano-layout sketch of Figure 16 shows the imple-
mentation of the C-element with the two inverters, and the
nor-gate. (The nand-gate is omitted.) The isochronicity re-
quirement translates as follows. Initially the voltage on
is high and there is a path from GND to along the pull-
down . When the voltage on rises, the pulldown from
GND to must be tied by the n-transistor with gate and
the pullup must be cut by the p-transistor with gate
before the adversary path (marked by a dotted line on the
figure) cuts pulldown by the n-transistor with gate
and ties pullup by the p-transistor with gate .
Keeping the isochronic-branch delay short requires that
the layout discipline does not add extra delay through long
and/or resistive wires or by connection resistances. The lay-
out rule that places the n-plane and p-plane side by side al-
lows the same metal wire to be used for all branches of the
same isochronic fork provided it all fits within the length
constraint of a metal nano-wire. This layout rule elimi-
nates all extra delays on the isochronic branch besides the
transition delay. Furthermore, the transformation that elim-
inates the staticizer helps make all transition delays of simi-
lar length by pushing the transition voltage towards .
With this precaution, the timing requirement for this ex-
ample is that a single transition delay (a tie) on a single
metal wire of the isochronic branch must be shorter than 3
transition delays plus 9 delays of contacts and wires for
the adversary path. This timing requirement should not be
difficult to fulfill.
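As a rough illustration of the requirement above, the following sketch compares one isochronic-branch delay against an adversary path of 3 transition delays plus 9 contact/wire delays, with every delay varied at random. The uniform distribution, the `spread` parameter, and the nominal wire delay of 0.1 transition units are assumptions made for this example only; the paper gives no such figures.

```python
import random


def isochronic_margin(branch_delay, gate_delays, wire_delays):
    """Adversary-path delay minus isochronic-branch delay (positive = safe)."""
    return sum(gate_delays) + sum(wire_delays) - branch_delay


def failure_fraction(trials, spread, seed=0):
    """Fraction of random trials in which the timing requirement is violated.

    Each transition delay is drawn uniformly from [1/spread, spread] (in
    units of a nominal transition delay). Each of the 9 contact/wire delays
    is drawn the same way and scaled by an assumed nominal value of 0.1.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        branch = rng.uniform(1.0 / spread, spread)
        gates = [rng.uniform(1.0 / spread, spread) for _ in range(3)]
        wires = [0.1 * rng.uniform(1.0 / spread, spread) for _ in range(9)]
        if isochronic_margin(branch, gates, wires) <= 0:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    for spread in (1.0, 2.0, 4.0):
        print(spread, failure_fraction(10_000, spread))
```

With no variation (`spread` of 1.0) the margin is always positive; raising `spread` shows how much delay variation the three-gate adversary path can absorb under these assumed distributions.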
7. Oscillating Rings
An important point concerning the implementability of
QDI is to make sure that the rings of restoring gates com-
posing a QDI system keep oscillating, since a QDI system
is nothing but an interlocking of oscillating rings.
Since hardware computations are non-terminating, each
transition is followed, after a number of other transitions,
by transition , and vice versa. Since the guards and
of those transitions are mutually exclusive, the chain
of transitions between and must contain a transition
that invalidates . Hence, transition invalidates itself
through a sequence of intermediate transitions. Can we still
say that is stable? Is it possible that the effect of
propagates through the cycle of gates fast enough to inval-
idate itself? At the electrical level, could the voltage
stabilize at an intermediate value close to ?
Arguing that such a ring of operators is not self-
invalidating is equivalent to arguing that the ring oscillates.
This is an electrical property of the circuit that relates the
slew rates of transitions, the gain of the operators, and the
number of operators on the ring. We will have to require
that any cycle of operators be implemented with a number
of stages at least equal to a chosen minimum to guaran-
tee that the cycle is not self-invalidating. In CMOS, three
restoring operators with good gain are usually sufficient,
although we usually require five to be safe. In molecu-
lar nano-technology, the minimal number of operators on
each ring will have to be determined based on the above-
mentioned electrical properties of the technology.
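The dependence on gain and number of stages can be sketched with a standard small-signal model, which is an assumption introduced here and not the analysis used in this paper: a ring of n identical inverting stages, each a single-pole amplifier of gain A, sustains oscillation (Barkhausen criterion) only if A is at least 1/cos(pi/n). The function name `min_stage_gain` is hypothetical.

```python
import math


def min_stage_gain(n_stages):
    """Minimum small-signal gain per stage for a ring of n identical
    single-pole inverting stages to oscillate (linearized model).

    Each stage must contribute pi/n of phase lag at the oscillation
    frequency, which attenuates it by cos(pi/n); the gain must make up
    for that attenuation.
    """
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("model applies to odd rings of at least 3 stages")
    return 1.0 / math.cos(math.pi / n_stages)


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, round(min_stage_gain(n), 3))
```

In this simplified model a three-stage ring needs a gain of about 2 per stage while a five-stage ring needs only about 1.24, which is in line with three high-gain operators being sufficient and five leaving a safety margin. Real operators are nonlinear, so the actual minimum would have to come from electrical characterization of the technology.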
8. Conclusion
The main purpose of this paper was to establish the ex-
istence of an efficient QDI logic family given the strict
design rules and large parameter variations of our target
nano-technology. A contribution of the research has been
to show that there exist efficient static implementations of
state-holding elements without weak feedback and with no
more than two transistors in series in the pullups. Such an
approach will be of interest for “extreme nano-CMOS” as
well. Also, the isochronicity assumption has been further
refined.
Several issues have not been addressed: (1) As up to
10% of the devices will be unusable (broken wire, stuck-open
contact, etc.), defect tolerance is an integral part of the
design. We have not presented any solution for defect toler-
ance, although preliminary investigation indicates that sat-
isfactory strategies exist. (2) For the usual density reasons,
memory requires specialized design. (Small memories have
already been demonstrated.) (3) Input and output, as well
as power supply, will be done at a microlevel, e.g. CMOS.
How to connect nanowires to the microlevel CMOS connec-
tions without increasing the nano-pitch is a separate issue.
Several solutions exist. (See [4, 1].)
Acknowledgments
Acknowledgment is due to my colleagues Jim Heath and
André DeHon for sharing some of their knowledge about
molecular nanotechnology, and to my students Sean Keller
and Chris Moore for their excellent comments and criti-
cisms. The research described in this paper was supported
by a grant from the National Science Foundation.
References
[1] Greg Snider, Philip Kuekes and R Stanley Williams.
CMOS-like logic in defective, nanoscale crossbars.
Nanotechnology, 15 881-891, 2004.
[2] M.R. Stan et al. Molecular Electronics: From Devices
and Interconnect to Circuits and Architecture. Pro-
ceedings of the IEEE, 91, 11, 1940-1957, 2003.
[3] Andre DeHon. Nanowire-Based Programmable Ar-
chitectures. J. of Emerging Technologies in Computer
Systems, 2005.
[4] Andre DeHon and Helia Naeimi. Seven strategies for
tolerating highly defective fabrication. IEEE Design
and Test of Computers, 22(4), 2005.
[5] C.M. Lieber et al. Design and hierarchical assembly of
nanowire-based moletronics. DARPA Moletronics PI
Meeting. 2002.
[6] N.A. Melosh et al. Ultrahigh-density nanowire lattices
and circuits. Science, 300, 112-115, 2003.
[7] A.J. Martin, M. Nyström. Asynchronous Techniques
for System-on-Chip Design. Proceedings of the IEEE,
Special Issue on Systems-on-Chip, 94, 6, 1089-1120.
2006.
[8] Alain J. Martin. The Limitations to Delay-
Insensitivity in Asynchronous Circuits. Sixth MIT
Conf. on ARVLSI, ed. W.J. Dally, 263-278, MIT Press,
1990.
[9] Alain J. Martin et al. The Design of an Asynchronous
MIPS R3000 Microprocessor. Proc. 17th Conf. on
ARVLSI. Los Alamitos, Calif.: IEEE Computer Soci-
ety Press, pp. 164–181, 1997.
