NanoMagnet Logic: an Architectural Viewpoint
M. Vacca, M. Graziano and M. Zamboni
Department of Electronics and Telecommunications, Politecnico di Torino, Italy
Email: marco.vacca@polito.it, mariagrazia.graziano@polito.it, maurizio.zamboni@polito.it
Abstract—Among the possible implementations of Field-Coupled
devices, NanoMagnet Logic is attractive for its low power
consumption and the possibility to combine memory and logic
in the same device. However, the nature of these technologies is
so different from CMOS transistors that the implications on the
circuit architecture must be taken carefully into account.
In this work we analyze the most important issues related to
the design of complex circuits using this technology. We discuss
how they influence the architectural level. We propose detailed
solutions that address these problems and improve the overall
performance. As a result of this analysis the type of circuits and
applications that constitute the best target for this technology are
identified. The analysis is performed on NanoMagnet Logic but
the results can be applied to any QCA technology.
I. INTRODUCTION
Field-Coupled devices like Quantum dot Cellular Automata
(QCA) are based on a completely different approach with
respect to CMOS transistors [1]. In this technology, bistable
cells are used to represent the logic values '0' and '1'. Logic
computation is obtained through coupling among neighboring
cells [2]. NanoMagnet Logic (NML) is an implementation of the
QCA principle, where single-domain nanomagnets with two stable
states are used as basic cells [3]. Its main advantages are
low power consumption and the possibility to combine
memory and logic in the same device [4]. To switch magnets
from one state to the other, a clock mechanism is necessary:
magnets are forced into an intermediate unstable state through
an external stimulus, and when this stimulus is removed
the magnets reorder themselves following the input magnet. The
clock can be implemented using a current-generated magnetic field
[5], through spin-torque coupling in multilayered structures
[6] or by applying an electric field to multiferroic nanomagnets
[7]. Since the number of magnets that can be cascaded is low
[8], a multiphase clock must be used. Three clock signals with
a phase difference of 120 degrees (Figure 1.A) are applied to
small circuit areas called clock zones. Every zone is composed
of a limited number of magnets. In each time step (Figure 1.B),
when the magnets of a clock zone are switching (SWITCH state),
the magnets on their left are in the HOLD state and act as an
input, while the magnets on their right are in the RESET state
and have no influence. In this way signals propagate through the
circuit without errors [9].
Due to this clock system, every group of three consecutive
clock zones has a delay of one clock cycle and is therefore
equivalent to a CMOS register. This means that an NML wire
composed of many clock zones has a delay of many clock
cycles, which gives the technology an intrinsically pipelined
nature. This has a huge impact on circuit architectures. In this
work we analyze the consequences of this behavior, the problems
that it generates, and ways to solve them and to improve
circuit performance. On the basis of our conclusions we can
understand what kind of circuits and applications are best
suited for NanoMagnet Logic and Field-Coupled devices.
Fig. 1. A) 3-phase clock signals (CLOCK PHASE 1, 2 and 3). B) Signal propagation in a clock period: in each time step (1), (2), (3) the clock zones are in the HOLD, SWITCH or RESET state.
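The clocking scheme described in this section can be sketched in a few lines of Python. This is only an illustrative model, not part of the original work: the function names `zone_state` and `wire_delay_cycles` are hypothetical, zones are numbered from left to right starting at 0, and one time step is assumed to be one third of a clock period.

```python
# Illustrative model of the three-phase clock (all names are hypothetical).
STATES = ("SWITCH", "HOLD", "RESET")


def zone_state(zone: int, step: int) -> str:
    """State of clock zone `zone` (0 = leftmost) at time step `step`.

    When a zone is in SWITCH, its left neighbor is in HOLD and its
    right neighbor is in RESET, as described in the text.
    """
    return STATES[(step - zone) % 3]


def wire_delay_cycles(num_zones: int) -> float:
    """Delay of a wire in clock cycles: three consecutive zones per cycle."""
    return num_zones / 3


if __name__ == "__main__":
    # Print the state of six consecutive zones over one clock period.
    for step in range(3):
        print(step, [zone_state(z, step) for z in range(6)])
    print("delay of a 9-zone wire:", wire_delay_cycles(9), "clock cycles")
```

Under these assumptions a signal advances by one zone per time step, so a wire of 9 clock zones behaves like a chain of three registers.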
II. PIPELINING AND INTERCONNECTIONS
We refer to Figure 2 to support the discussion in this section.
It shows, as an example, a 2-bit Multiply and Accumulate (MAC)
unit made of a multiplier and an adder, where the output
of the adder is connected to one of its inputs. The MAC is the
most important unit of Digital Signal Processors (DSP) and is
therefore worth studying, as it is used in many applications.
The pipelined nature of this technology has two main
consequences: the so-called “Layout=Timing” problem [10] and the
feedback problem [11]. The “Layout=Timing” problem means
that, if the input wires of a logic gate have different lengths
in terms of clock zones, the input signals of the gate can have
different propagation delays and the logic operation is not
correct. This problem is reduced if a regular layout is used. In
Figure 2 the layout is obtained considering clock zones formed
by parallel wires (in the picture, zones are separated by
vertical stripes). Here we consider this case as it is currently
the only layout experimentally verified for NanoMagnet Logic
[4]. Using this organization the “Layout=Timing” problem is
automatically solved, as all signals are affected by the same
delay in terms of the number of clock zones they traverse. If
another clock zone organization is used, this problem remains
and requires careful layout; for complicated circuits this
comes at the cost of a large area overhead.
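The “Layout=Timing” constraint can be expressed as a simple check on the number of clock zones each input wire of a gate traverses. The sketch below is a hedged illustration only: the dictionary representation and the names `input_skew`, `is_synchronized` and `padding_needed` are assumptions made for this example, not a tool or data format from the cited works.

```python
from typing import Dict


def input_skew(zones_per_input: Dict[str, int]) -> int:
    """Difference, in clock zones, between the longest and shortest input wire."""
    lengths = zones_per_input.values()
    return max(lengths) - min(lengths)


def is_synchronized(zones_per_input: Dict[str, int]) -> bool:
    """A gate computes correctly only if all inputs traverse the same number of zones."""
    return input_skew(zones_per_input) == 0


def padding_needed(zones_per_input: Dict[str, int]) -> Dict[str, int]:
    """Extra clock zones to add to each input wire to equalize the delays."""
    longest = max(zones_per_input.values())
    return {name: longest - n for name, n in zones_per_input.items()}


if __name__ == "__main__":
    gate_inputs = {"a": 6, "b": 4}  # hypothetical wire lengths in clock zones
    print(is_synchronized(gate_inputs))
    print(padding_needed(gate_inputs))
```

The padding computed by `padding_needed` corresponds to the extra wire length, and hence area, that a non-regular layout has to spend to keep signals aligned.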
The feedback problem, instead, is a problem that also exists
in CMOS, for example in superscalar microprocessors. Due to
the pipelined nature of this technology, new data can be sent
every clock cycle. For example, every adder in Figure 2 can
receive a new input every clock cycle. However, the second
input of the adder is also its own output. The feedback of the
output toward the input requires many clock cycles to propagate,
and this has consequences in terms of performance. When
data are sent to the circuit, the result is evaluated and
propagates back to the input of the adder. The signal requires N
clock cycles to propagate through the loop, where N is determined
by the number of clock zones it traverses, at three zones per
clock cycle. As a consequence, if new data are sent after only
one clock cycle, as it would be natural to do, the feedback
signal is still propagating through the circuit.
The addition is therefore performed in this case between the
new data and the result of the addition calculated N-1 clock
cycles before, and the obtained value is not correct. In order
to synchronize signals, inputs must be fed to the circuit every
N clock cycles, where N is the delay of the longest loop of the
circuit, and kept frozen in the meantime. As a consequence,
circuit throughput is reduced by a factor of N, severely
reducing performance.
Fig. 2. Circuit example: 2-bit Multiply and Accumulate (MAC) unit computing F = (A*B)+F', made of a multiplier, an adder and a feedback loop with a delay of N clock cycles.
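The effect of the loop delay can be reproduced with a small behavioral model. The following sketch is an assumption-laden illustration, not the authors' simulator: the feedback path is modeled as a FIFO holding N values, the input values stand for the products A*B already computed by the multiplier, and the names `run_loop`, `feed_every_cycle` and `feed_every_n_cycles` are hypothetical.

```python
from collections import deque


def run_loop(inputs, loop_delay):
    """Adder with feedback: out[t] = inputs[t] + out[t - loop_delay].

    The feedback path initially contains zeros.
    """
    loop = deque([0] * loop_delay)  # values in flight in the feedback path
    outputs = []
    for x in inputs:
        result = x + loop.popleft()  # add the value leaving the loop
        loop.append(result)          # the new result enters the loop
        outputs.append(result)
    return outputs


def feed_every_cycle(data, loop_delay):
    """Naive use: one new input per clock cycle (wrong if loop_delay > 1)."""
    return run_loop(data, loop_delay)[-1]


def feed_every_n_cycles(data, loop_delay):
    """Each input is held frozen for loop_delay cycles; returns (result, cycles)."""
    held = [x for x in data for _ in range(loop_delay)]
    return run_loop(held, loop_delay)[-1], len(held)


if __name__ == "__main__":
    data = [1, 2, 3, 4]  # expected accumulated value: 10
    print(feed_every_cycle(data, 3))     # not the running sum
    print(feed_every_n_cycles(data, 3))  # correct sum, but 3x more cycles
```

In this model, holding each input for N cycles yields the correct accumulation but takes N times as many clock cycles, which is the throughput penalty discussed above.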
One possible solution to this problem is to exploit
interleaving to maximize performance. The MAC unit described
here is the key unit in linear filtering operations for
signal analysis, so it is a good example to use to describe
the interleaving technique. Running N linear filtering
operations at the same time keeps the pipeline full. Every linear
filtering operation is composed of many “multiply and accumulate”
operations. At the first clock cycle the first data of the first
linear filtering operation (OPA-1) is sent. At the second clock
cycle the first data of the second linear filtering operation
(OPB-1) is sent. In this case the operation is correct because
data are sent every clock cycle but there is no data dependency
between them. After N clock cycles the second data of the
first linear filtering operation (OPA-2) is sent, and so on. The
distance between two data that are part of the same linear
filtering operation is therefore N clock cycles, so that signals
are correctly synchronized. Running N operations in parallel
and interleaving them keeps the pipeline always full
and maximizes the throughput.
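The interleaving schedule can be illustrated with the same kind of behavioral model. This is a minimal sketch under stated assumptions: the loop is modeled as a FIFO of N values, each stream holds the products A*B of one independent filtering operation, all streams have the same length, and the names `run_loop` and `interleaved_mac` are hypothetical.

```python
from collections import deque


def run_loop(inputs, loop_delay):
    """Adder with feedback: out[t] = inputs[t] + out[t - loop_delay]."""
    loop = deque([0] * loop_delay)  # feedback path, initially zeros
    outputs = []
    for x in inputs:
        result = x + loop.popleft()
        loop.append(result)
        outputs.append(result)
    return outputs


def interleaved_mac(streams, loop_delay):
    """Interleave N independent streams, one new input per clock cycle.

    At cycle t the input comes from stream t % N, so two data of the
    same stream are exactly N clock cycles apart.
    """
    if len(streams) != loop_delay:
        raise ValueError("this sketch needs exactly N streams for a loop delay of N")
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same length")
    schedule = [streams[t % loop_delay][t // loop_delay]
                for t in range(length * loop_delay)]
    outputs = run_loop(schedule, loop_delay)
    return outputs[-loop_delay:]  # final accumulated value of each stream


if __name__ == "__main__":
    op_a = [1, 2, 3]        # OPA-1, OPA-2, OPA-3
    op_b = [10, 20, 30]     # OPB-1, OPB-2, OPB-3
    op_c = [100, 200, 300]
    print(interleaved_mac([op_a, op_b, op_c], loop_delay=3))
```

With three streams and a loop delay of three cycles, nine inputs are consumed in nine clock cycles and every stream accumulates only its own data, so no cycle is wasted waiting for the feedback signal.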
Another problem is the impact of interconnections. In the
circuit in Figure 2 a large part of the area is filled by wires,
wasting area and increasing power consumption. To reduce the
impact of this problem it is important to design circuits using
systolic-like architectures [12], which have a regular layout and
avoid long interconnection wires.
Conclusions. The discussion presented here leads to two
important results. First, the delay of loops can be very high,
so massive interleaving is needed to maximize throughput;
second, systolic-like architectures are necessary to minimize
interconnections. From these two points we can say that
Field-Coupled devices are best suited for Massive Data Analysis
applications, like Digital Signal Processors, where it is
possible to take advantage of parallel systolic architectures
and massive parallelism can be used to maximize the circuit
throughput. General purpose applications, like microprocessors,
are instead not well suited for this technology because
they can suffer from a severe performance penalty.
REFERENCES
[1] C.S. Lent, P.D. Tougaw, W. Porod, and G.H. Bernstein. Quantum cellular
automata. Nanotechnology, 4:49–57, 1993.
[2] A.I. Csurgay, W. Porod, and C.S. Lent. Signal processing with
near-neighbor-coupled time-varying quantum-dot arrays. IEEE Transactions
on Circuits and Systems, 47(8):1212–1223, 2000.
[3] W. Porod. Magnetic Logic Devices Based on Field-Coupled
Nanomagnets. Nano & Giga, 2007.
[4] M.T. Niemier, G.H. Bernstein, G. Csaba, A. Dingler, X.S. Hu, S. Kurtz,
S. Liu, J. Nahas, W. Porod, M. Siddiq, and E. Varga. Nanomagnet logic:
progress toward system-level integration. J. Phys.: Condens. Matter,
23:34, November 2011.
[5] M.T. Niemier, X.S. Hu, M. Alam, G. Bernstein, M. Putney, W. Porod, and
J. DeAngelis. Clocking Structures and Power Analysis for
Nanomagnet-Based Logic Devices. In Int. Symp. on Low Power Electronics and
Design, pages 26–31, Portland-Oregon, USA, 2007. IEEE.
[6] J. Das, S.M. Alam, and S. Bhanja. Low Power Magnetic Quantum
Cellular Automata Realization Using Magnetic Multi-Layer Structures.
J. on Em. Sel. Topics in Circ. and Sys., 1(3):267–276, September.
[7] M. S. Fashami, J. Atulasimha, and S. Bandyopadhyay. Magnetization
Dynamics, Throughput and Energy Dissipation in a Universal
Multiferroic Nanomagnetic Logic Gate with Fan-in and Fan-out.
Nanotechnology, 23(10), February 2012.
[8] G. Csaba and W. Porod. Behavior of Nanomagnet Logic in the
Presence of Thermal Noise. In International Workshop on Computational
Electronics, pages 1–4, Pisa, Italy, 2010. IEEE.
[9] M. Graziano, M. Vacca, A. Chiolerio, and M. Zamboni. A NCL-HDL
Snake-Clock Based Magnetic QCA Architecture. IEEE Transaction on
Nanotechnology, (10):DOI:10.1109/TNANO.2011.2118229.
[10] M. Graziano, M. Vacca, D. Blua, and M. Zamboni. Asynchrony in
Quantum-Dot Cellular Automata Nanocomputation: Elixir or Poison?
IEEE Design & Test of Computers, 2011.
[11] M. Vacca, M. Graziano, and M. Zamboni. Asynchronous Solutions for
Nano-Magnetic Logic Circuits. ACM Journal on Emerging Technologies
in Computing Systems, 7(4), December 2011.
[12] M. Crocker, M. Niemier, and X.S. Hu. A Reconfigurable PLA
Architecture for Nanomagnet Logic. ACM Journal on Emerging Technologies
in Computing Systems, 8(1), February 2012.
