A pNML Compact Model Enabling the Exploration of 3D Architectures by Turvani, Giovanna et al.
Politecnico di Torino
Porto Institutional Repository
[Article] A pNML Compact Model Enabling the Exploration of 3D
Architectures
Original Citation:
Turvani, Giovanna; Riente, Fabrizio; Plozner, Elisa; Vacca, Marco; Graziano, Mariagrazia; Stephan,
Breitkreutz-V. Gamm (2017). A pNML Compact Model Enabling the Exploration of 3D Architectures.
In: IEEE TRANSACTIONS ON NANOTECHNOLOGY, p. 1. - ISSN 1536-125X
Availability:
This version is available at : http://porto.polito.it/2666391/ since: March 2017
Publisher:
IEEE
Published version:
DOI:10.1109/TNANO.2017.2657822
Terms of use:
This article is made available under terms and conditions applicable to Open Access Policy Article
("Public - All rights reserved") , as described at http://porto.polito.it/terms_and_conditions.
html
A pNML Compact Model Enabling the Exploration
of 3D Architectures
G. Turvani, F. Riente, E. Plozner, M. Vacca, M. Graziano and S. Breitkreutz-v. Gamm
Abstract—In Nano Magnetic Logic (NML), single-domain
nanomagnets enable logic operations. Binary information can
be encoded thanks to their bistable magnetization. Many
implementations are currently discussed in the literature; among them,
one promising candidate is perpendicular-Nano Magnetic Logic
(pNML). It features several advantages like the controllability of
the switching mechanism, the simplicity of design and the natural
predisposition of being integrated in 3D architectures.
Here we show how this technology can be adopted in the
design of 3D logic architectures. Physical equations and quantities
have been gathered from experimental demonstrations of pNML
devices; formulas have then been fitted and implemented in
VHDL (VHSIC Hardware Description Language). In this paper
we present an analysis of pNML circuits: initially a Multiplexer
has been manufactured and characterized, then our compact
model has been tested through simulations. Moreover, the MUX
has been adopted to design a generic n-bit accumulator.
Our results demonstrate that the compact model makes it
possible to perform fast simulations, while maintaining a fine level
of accuracy. Thanks to its flexibility, novel materials, geometric
variations and other technological improvements can be easily
integrated in order to be tested at circuit level. We anticipate
this work to be a starting point for the exploration of large 3D
digital circuits.
Index Terms—perpendicular Nano Magnets Logic, pNML, 3D
Architectures, Innovative Technology
I. INTRODUCTION
According to the International Technology Roadmap for
Semiconductors [1], CMOS technology is rapidly reaching its
technological and economical limits. Among emerging
technologies currently under investigation, Nano Magnetic Logic
(NML) [2] based devices seem to be very promising [3]. NML
belongs to the so-called beyond-CMOS technologies, where
information transportation is accomplished through magneto-
dynamic interactions among devices. Different implementa-
tions have been studied in recent years, two of the most
interesting being in-plane Nano Magnetic Logic (iNML) [4]
[5] and perpendicular Nano Magnetic Logic (pNML) [6].
The main characteristics of this technology are the non-
volatility of stored information, the possibility to store binary
information, the absence of interconnections and the possibil-
ity of being integrated with standard CMOS technology. iNML
rectangular-shaped nanomagnets (depicted in Fig. 1.A), with
typical dimensions of (50x100)nm, are placed side-by-side
Copyright (c) 2016 —-. Personal use of this material is permitted. However,
permission to use this material for any other purposes must be obtained
from the — by sending a request to pubs-permissions@——-.
G. Turvani, F. Riente, and S. Breitkreutz-v. Gamm are with Institute for
Technical Electronics, Technical University of Munich. E. Plozner, M. Vacca
and M. Graziano are with the Electronics and Telecommunications Depart-
ment of Politecnico di Torino, Italy.
Figure 1. A) iNML cells: the logic state is encoded in the in-plane
magnetization. B) pNML cells: the logic state is encoded in the
perpendicular magnetization. C) 3D via for vertical interconnections.
D) ANC creation through localized FIB irradiation, which defines a
unique propagation direction. E) Memory element. (Labels in the figure:
FIB irradiation, ANC, domain wall, notch, 3D via, coupling field, logic
states "1" and "0".)
creating wires and logic devices [7] [8]. Indeed, the coupling
field acting among neighboring cells makes it possible to
transfer a magnetic charge according to the ferromagnetic and
antiferromagnetic interaction. The number of iNML
elements which can be cascaded is limited to 4 or 5 because of
physical non-idealities such as thermal noise. For this reason,
logic circuits must be divided into clock zones, in which the
maximum number of cells is limited. Furthermore, information
is propagated among different clock zones according to a
multiphase clocking system [9].
pNML (Fig. 1.B) overcomes this limitation thanks to its
intrinsic physical properties. Here, only one clock signal is
applied to the whole circuit [10] [11]. This has a remarkable
impact in terms of circuit compactness and also simplifies the
design process. There are several improvements introduced
by this implementation: the switching mechanism is tunable
through the manufacturing process, the propagation direction
of signals is controllable and it can be adopted in monolithic
3D structures [12] (Fig. 1.C). In pNML technology, single
domain nanomagnets with perpendicular magnetic anisotropy
are used. Two stable magnetization states are possible, which
encode the binary values 1 and 0.
However, to guarantee the signal flow directionality, one
side of the magnet should be more sensitive to magnetic
field changes. This highly-sensitive region is called artificial
nucleation center (ANC) and it is obtained by a partial Focused
Ion Beam (FIB) irradiation, as shown in Fig. 1.D [6]. Hence,
signal propagation is achieved in two steps: i) the domain
wall nucleation in the ANC and ii) its subsequent motion.
As shown in Fig. 1.D, the magnet is partially irradiated on
the left side entailing a reduction of the switching field.
Only a magnet placed near the ANC can influence the
magnetization, which defines a unique propagation direction [13].
Logic computation is performed by means of basic pNML
gates, namely inverters and majority voters, as experimentally
demonstrated in [10]. Signal synchronization can be achieved by
controlling domain wall (DW) propagation. Geometrical de-
formations (Fig. 1.E) in the magnetic nanowire (notches) make
it possible to block the propagating information. However, to
restore the information flow, short in-plane field pulses can be
used [14]. Usually, they are generated by a buried wire placed
just below the magnetic notch. In addition, this technology en-
ables the fabrication of monolithically 3D-integrated devices,
as demonstrated in [12]. The first structures have already been
experimentally demonstrated [6]; subsequently, several micromagnetic
simulations have been performed in order to study the physical
and logic behavior of such devices. Nevertheless, micromagnetic
simulators have very high computational costs and,
consequently, a lot of time is required to simulate larger
circuits. Different models have been presented to perform
lighter simulations. One of the most interesting has been
presented in [15] and it is implemented in Verilog-A. This
model enables the reduction of the simulation time but lacks
flexibility when considering the description of complex
architectures. Here, we present our physical compact model
entirely developed in VHDL (VHSIC Hardware Description
Language) and able to perform very fast simulations of logic
architectures, while preserving a fine level of accuracy. From
a methodological point of view, several experiments have been
carried out in order to characterize pNML devices. Initially,
a complete study of characterization has been pursued, then,
physical data has been extracted and fitted into equations. A
detailed description of this model will be given in Sec. II.
The novelty of this model lies in the ability to enable fast
simulations of complex logic architectures.
With this paper we validate our compact model by present-
ing how the physical equations have been fitted from exper-
iments. As benchmarks, we choose to report the simulation
results obtained with the architectures of a Multiplexer and an
Accumulator. Both circuits have been described completely
in VHDL with a generic bit parallelism. These circuits must
be understood as samples aiming to verify the validity of the
presented model. Nevertheless, exactly the same approach
can be applied to circuits of any complexity.
Experiments on 3D integration of pNML devices are currently
being carried out; with our approach it is possible to test
multilayered circuits with very low computational costs. Fur-
thermore, since the model is organized in libraries containing
technological quantities and physical equations, it is possible
to test circuits by changing any parameter (such as geometry
or materials).
II. THE MODEL
To characterize complex architectures, a compact model has
been implemented using the hardware description language
VHDL. It gives a full characterization of the analyzed circuit
mixing the description of the switching behavior and all the
physical and technological features which characterize pNML
technology. From an implementation point of view, with this
new model, pNML devices are divided into a few elementary
blocks characterized by different functionalities. In this way,
circuits of any complexity can be composed using a library of
elementary components.
Figure 2. Proposed methodology: the model of basic pNML elements is
implemented into a VHDL library. The combination of those components is
used to design pNML devices, and then architectures. Circuits can be tested
through standard VHDL simulators. (Blocks in the figure: Library of pNML
Elementary Components, containing Nucleation Center, Domain Wall, Notch
and 3D Via; Devices and Circuit Design; Simulations; Result Elaboration.)
In order to implement the compact model, several physical
parameters must be taken into account: geometry, material
chosen for the realization of the magnetic layers and delay
parameters such as the nucleation time and the propagation
time. Indeed, the focus is on the nucleation time, and thus
also on the probability that the domain wall is nucleated,
and on the propagation time, both of which are used to
define the duration of the clocking pulse. In order to implement
the model, two packages have been created. The first one in-
cludes the main physical quantities which characterize pNML
devices. Beyond that, different formulas have been conceived
in order to recreate the switching behavior of the ANC and
the propagation along the DW. The second package defines
the specific functionality of each elementary block needed to
realize circuits.
The flexibility of this model resides in its intrinsic ability
to be adapted by simply modifying the library parameters.
This represents a key point in the study of complex logic
architectures. Now it is possible to obtain fast simulations of
pNML circuits exploring how small modifications of physical
quantities (like geometry) can have a remarkable impact on
the performance of pNML circuits. Furthermore, the investigation
of geometric deformations such as notches is
becoming increasingly attractive, since they can be used to
design innovative memories. A description of each
elementary block follows.
A. Nucleation Center
The creation of a Domain Wall (DW) starts with the
nucleation of the magnetic structure in correspondence with
the ANC. This irradiated spot enables the control of the
switching properties of pNML devices [16]. The propagation
is supported by applying an external clock field called Hclock.
The nucleation and the propagation of the new domain define
the basis for the signal transmission in this technology. The
propagation can be planar, i.e., through neighboring magnets in
the same plane, or perpendicular, i.e., through overlapping magnets
belonging to different layers interconnected by magnetic
vias [12]. Hence, signal transmission can take place among
different layers making it possible to realize 3D structures.
In order to guarantee a correct propagation, the switching
process should be completed during the clock field pulse
time tclock. As a consequence, all the operations required
to switch the magnet, like the nucleation of the domain
wall and the propagation of the new magnetization, must be
completed before the end of the clock period [16]. This defines
a constraint on the pulse time tclock:
$$ t_{clock} > t_{nuc} + t_{prop}. \tag{1} $$
Here, tnuc represents the time required to nucleate the struc-
ture, while tprop is the time required to propagate the magne-
tization through the entire structure.
To nucleate a nanomagnetic structure means to reverse the
current state of magnetization in the ANC. The field required
to do this can be extracted considering the anisotropy field,
modelled by the Stoner-Wohlfarth model [16] [15]:
$$ H_{ani} = \frac{2 K_{eff,ANC}}{\mu_0 M_S}, \tag{2} $$
where MS is the saturation magnetization of the magnet
and Keff,ANC is the effective anisotropy in the ANC. In
fact, each nanomagnet is characterized by a unique anisotropy
term (called effective anisotropy Keff ) which depends on the
crystal structure, the geometry and the material. Hnuc, the
nucleation field required to switch the DW, is equal to
Hani (Eq. 2). The main role of the nucleation field is to reduce
the energy barrier Ebarrier of the structure.
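As a rough numerical illustration of Eq. (2), the sketch below evaluates the anisotropy field in SI units and converts it to Oersted. It assumes that the effective anisotropy and saturation magnetization listed in Tab. II can be used for the ANC, which the text does not state explicitly; the function and constant names are hypothetical and are not taken from the VHDL library.

```python
import math

MU_0 = 4.0e-7 * math.pi           # vacuum permeability [T*m/A]
OE_PER_A_PER_M = 4.0e-3 * math.pi  # 1 A/m expressed in Oe


def anisotropy_field(k_eff, m_s):
    """Eq. (2): H_ani = 2*K_eff / (mu_0 * M_S), returned in A/m."""
    return 2.0 * k_eff / (MU_0 * m_s)


# Assumption: Tab. II values are taken as representative of the ANC.
K_EFF = 2.0e5  # effective anisotropy [J/m^3]
M_S = 1.4e6    # saturation magnetization of Co [A/m]

h_ani = anisotropy_field(K_EFF, M_S)  # [A/m]
h_ani_oe = h_ani * OE_PER_A_PER_M     # same field in Oe
print(f"H_ani = {h_ani:.0f} A/m = {h_ani_oe:.1f} Oe")
```

The conversion factor is kept explicit because the field parameters in Tab. II are given in Oe, while Eq. (2) with the SI value of the vacuum permeability yields A/m.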
The nucleation field required to nucleate a device is influenced
by the superposition of the coupling fields of the neighboring
cells:
$$ H_{eff} = H_{clock} - C_{eff}, \tag{3} $$
where $C_{eff}$ is:
$$ C_{eff} = \sum_{i=1}^{N} C_i M_i. \tag{4} $$
Here, Ci represents the coupling fields from the inputs with
magnetization Mi ∈ {−1; 1} [16]. Mi influences the energy
barrier which increases or decreases according to the parallel
or antiparallel state of input magnetization with respect to
the current state. Whether nucleation occurs or not can be
estimated by exploiting the probability of nucleation Pnuc,
expressed in terms of applied field Hz and of the duration time
of the field t by using the Arrhenius model [16]. To nucleate
the ANC of the nanomagnet, the clock field pulse of amplitude
Hpulse and its effective pulse time teff are considered:
$$ P_{nuc}(t_{eff}, H_{pulse}) = 1 - \exp\left(-\frac{t_{eff}}{\tau(H_{pulse})}\right) \tag{5} $$
$$ \tau(H_{pulse}) = f_0^{-1} \cdot \exp\left(\frac{E_0 \left(1 - \frac{H_{pulse}}{H_0}\right)^2}{K_B T}\right) \tag{6} $$
Two constraints on the nucleation probability can be ex-
pressed:
$$ P_{nuc,support} = P_{nuc}(t_{nuc}, H_{clock} + C) \to 1, \tag{7} $$
$$ P_{nuc,prevent} = P_{nuc}(t_{clock}, H_{clock} - C) \to 0. \tag{8} $$
Eq. (7) supports the nucleation, while eq. (8) prevents it. The
error rate of a device during one clock cycle is given by:
$$ E_{device} = 1 - P_{nuc,support} \cdot [1 - P_{nuc,prevent}] \tag{9} $$
The nucleation probability, and therefore the reliability of the
whole pNML circuit, strongly depends on the nucleation time
tnuc, the clocking pulse time tclock, the field amplitude Hclock
and the coupling fields C [16]. The time required to nucleate
a DW can be expressed in terms of the desired probability of
nucleating it:
$$ t_{nuc} = -\tau(H_{eff}) \cdot \ln(1 - P_{nuc}). \tag{10} $$
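The chain of Eqs. (3) to (10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fit parameters f0, E0 and H0 are not given in the text, so the values in `FIT` are hypothetical placeholders, and all names are invented for this example. The clock and inverter coupling fields are taken from Tab. II, and fields are kept in Oe because only the ratio to H0 enters Eq. (6).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]


def effective_field(h_clock, couplings, magnetizations):
    """Eqs. (3)-(4): H_eff = H_clock - sum_i C_i * M_i, with M_i in {-1, +1}."""
    c_eff = sum(c * m for c, m in zip(couplings, magnetizations))
    return h_clock - c_eff


def tau(h_pulse, f0, e0, h0, temperature):
    """Eq. (6): Arrhenius time constant for a pulse of amplitude h_pulse."""
    barrier = e0 * (1.0 - h_pulse / h0) ** 2
    return math.exp(barrier / (K_B * temperature)) / f0


def nucleation_probability(t_eff, h_pulse, **fit):
    """Eq. (5): probability of nucleation within the effective pulse time."""
    return 1.0 - math.exp(-t_eff / tau(h_pulse, **fit))


def nucleation_time(p_target, h_eff, **fit):
    """Eq. (10): time needed to reach the target nucleation probability."""
    return -tau(h_eff, **fit) * math.log(1.0 - p_target)


def device_error(p_support, p_prevent):
    """Eq. (9): error rate of one device during one clock cycle."""
    return 1.0 - p_support * (1.0 - p_prevent)


# Hypothetical fit parameters (NOT from the paper), chosen only for illustration.
FIT = dict(f0=2.0e9, e0=5.0e-19, h0=800.0, temperature=300.0)
H_CLOCK, C_INV = 560.0, 153.0  # clock and inverter coupling fields [Oe], Tab. II

# With the sign convention of Eq. (3), M = -1 adds the coupling to the clock field.
h_support = effective_field(H_CLOCK, [C_INV], [-1])  # H_clock + C, as in Eq. (7)
h_prevent = effective_field(H_CLOCK, [C_INV], [+1])  # H_clock - C, as in Eq. (8)

p_support = nucleation_probability(600e-9, h_support, **FIT)
p_prevent = nucleation_probability(850e-9, h_prevent, **FIT)
error = device_error(p_support, p_prevent)
```

With these placeholder values the supporting case nucleates almost surely and the preventing case almost never, which is the behavior required by Eqs. (7) and (8); real values would have to come from fitted measurements.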
B. Domain Wall
Once the nanowire is nucleated, the magnetic charge is
propagated through the entire structure. The propagation is
characterized by a speed of motion, the DW velocity vDW ,
which depends on the applied field [16]. Three main regimes
can be identified in the propagation of the new state, in thin
multilayer structures. These regimes are modelled according
to the external field Hz applied to the magnetic structure [16].
In the first two regimes, where Hz is less than or comparable
to the intrinsic pinning field, vDW strictly depends on the
temperature with an exponential relation. In the flow regime
(Hz ≫ Hint) instead, the velocity depends linearly on the
applied field and can be modelled according to the following
equation:
$$ v_{DW}(H_z \gg H_{int}) = v_0 + \mu_w (H_z - H_{int}) \tag{11} $$
where v0 is a numerical prefactor and µw is the domain wall
mobility. From a theoretical point of view, the velocity has a
linear dependency on the applied field and consequently the
adopted working regime is the flow regime. The propagation
time tprop can be defined as follows:
$$ t_{prop} = \frac{l_{mag}}{v_{DW}(H_{clock})} \tag{12} $$
where lmag is the length of the magnet.
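Eqs. (11) and (12) translate directly into two small functions. In the sketch below the prefactor v0 and the mobility are hypothetical placeholder values, since the text does not report them; the clock field, the intrinsic pinning field and the magnet length come from Tab. I and II. The function names are invented for this example.

```python
def dw_velocity_flow(h_z, h_int, v0, mu_w):
    """Eq. (11): domain wall velocity in the flow regime.

    Only h_z > h_int is checked here; the caller is responsible for
    making sure that h_z is much larger than h_int, as Eq. (11) requires.
    """
    if h_z <= h_int:
        raise ValueError("Eq. (11) is only valid in the flow regime (H_z >> H_int)")
    return v0 + mu_w * (h_z - h_int)


def propagation_time(l_mag, v_dw):
    """Eq. (12): time needed by the domain wall to cross a magnet of length l_mag."""
    return l_mag / v_dw


H_CLOCK = 560.0  # clock field amplitude [Oe], Tab. II
H_INT = 190.0    # intrinsic pinning field [Oe], Tab. II
L_MAG = 3.5e-7   # length of the basic DW block [m], Tab. I

# Hypothetical values (NOT from the paper), used only to make the example run.
V0 = 1.0         # velocity prefactor [m/s]
MU_W = 0.02      # domain wall mobility [m/(s*Oe)]

v_dw = dw_velocity_flow(H_CLOCK, H_INT, V0, MU_W)
t_prop = propagation_time(L_MAG, v_dw)
print(f"v_DW = {v_dw:.2f} m/s, t_prop = {t_prop * 1e9:.1f} ns")
```

Because the velocity is linear in the applied field, halving the magnet length or doubling the velocity halves the propagation time, which is why smaller devices are expected to run faster.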
C. Notch
The propagation of DWs can be controlled by a geometric
modification [14]. With notches it is possible to pin the
magnetization propagation. Here, the pinning and the depin-
ning operations play a fundamental role. In other words, a
notch defines an energy barrier able to block a magnetic
transmission. The depinning field Hdep, which is required to
depin a DW, is [14]:
$$ H_{dep} = H_{int} + \frac{\sigma_w \sin\alpha}{2 M_s \left(h + \frac{1}{2} \delta_w \sin\alpha\right)}, \tag{13} $$
where α is the notch apex angle and h the notch width.
The depinning time (tdep) is the time required to depin a
DW from a notch, and it is described by the following equation:
$$ t_{dep} = \tau_0 \cdot \exp\left(\frac{M_s V_a (H_{dep} - H)}{K_B T}\right). \tag{14} $$
Here, Va is the activation volume and τ0 is the inverse of the
attempt frequency f0.
In order to depin the magnetic domain from the notch, an
external in-plane field is applied. In fact, the out of plane
clock field applied externally, which is required to switch
correctly the domain wall and propagate the new domain, has
an amplitude which is not high enough to depin the notch. For
this reason, an additional in-plane field is used to depin it since
this reduces the required depinning field. Since temperature
plays an important role, and can help to overcome the energy
barrier of the notch, it is important to define the probability
of depinning [14]:
$$ P_{dep} = 1 - \exp\left(-\frac{t}{\tau(H_{eff})}\right), \tag{15} $$
where Heff represents the applied effective magnetic field,
which is a combination of the in-plane and out-of-plane fields.
t instead, represents its duration, while τ(Heff ) is the time
constant which describes the switching of the magnetization
[14].
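A minimal sketch of Eqs. (14) and (15) is given below. Several points are assumptions rather than statements from the text: fields are passed in Oe and converted to A/m, a factor of the vacuum permeability is included so that the exponent of Eq. (14) is an energy in SI units, the attempt time τ0 is a hypothetical placeholder, and the probability is written with a decaying exponential, as in Eq. (5), so that it stays between 0 and 1. The material and geometry values are taken from Tab. I and II.

```python
import math

K_B = 1.380649e-23                     # Boltzmann constant [J/K]
MU_0 = 4.0e-7 * math.pi                # vacuum permeability [T*m/A]
A_PER_M_PER_OE = 1000.0 / (4.0 * math.pi)  # 1 Oe expressed in A/m


def depinning_time(h_oe, h_dep_oe, m_s, v_a, tau0, temperature):
    """Eq. (14): mean time needed to depin a DW under an applied field h_oe."""
    delta_h = (h_dep_oe - h_oe) * A_PER_M_PER_OE  # field difference [A/m]
    energy = MU_0 * m_s * v_a * delta_h           # energy barrier [J] (SI assumption)
    return tau0 * math.exp(energy / (K_B * temperature))


def depinning_probability(t, tau):
    """Eq. (15): probability that the DW has depinned after a time t."""
    return 1.0 - math.exp(-t / tau)


M_S = 1.4e6      # saturation magnetization of Co [A/m], Tab. II
V_A = 1.26e-23   # activation volume [m^3], Tab. I
H_DEP = 736.6    # depinning field [Oe], Tab. II
TAU_0 = 1.0e-9   # hypothetical attempt time [s] (NOT from the paper)
T = 300.0        # temperature [K] (assumed room temperature)

t_at_barrier = depinning_time(H_DEP, H_DEP, M_S, V_A, TAU_0, T)
t_below = depinning_time(700.0, H_DEP, M_S, V_A, TAU_0, T)
p_50ns = depinning_probability(50e-9, t_below)
```

The exponential dependence on the field difference is what makes a short additional in-plane pulse effective: a small reduction of the barrier shortens the depinning time by orders of magnitude.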
D. Vias
Vertical interconnections (vias) represent the key element
for the realization of signal crossings. This element enables
the design of 3D architectures with pNML. In these novel
structures, information is carried among different layers by
using such magnetic vias [12]. A via acts as an ANC, but since
the nucleation occurs within vertically aligned magnets, the
coupling is ferromagnetic. This defines a new constraint on the
clock pulse duration. Indeed, tclock should be long enough to
guarantee both the nucleation of all vias and the propagation
of the domain walls, in all layers.
E. VHDL Implementation Hints
All the equations here reported are implemented in VHDL
by using the math and numeric libraries. The idea behind this
compact model is to have a single entity for each basic component.
For example, the interconnection of a Nucleation Center
entity and a Domain Wall entity defines the structure of the device
represented in Fig. 1.D. Hence, similar devices can be arranged
and connected together, composing logic gates and, finally, circuits.
The model is structured in two parts: I) a library containing
the implementation of the physical equations and all parameters
(such as constants and geometrical information) and II) the entity
declaration of each component with the implementation of its
physical behavior. Examples of the parameters contained in
the library file are given in Tab. I and II, where all the
geometrical and field parameters are listed together with
their values. Designers are free to modify all these quantities
according to their specific needs; in this way it is possible
to verify how the performance of pNML devices varies and to
refine the technology accordingly. Equations are implemented
through several
procedures. The main implemented functions are:
• Evaluation of the propagation time tprop
• Evaluation of the nucleation time tnuc
• Evaluation of the DW velocity
• Computation of the nucleation probability
• Extraction of the critical path
• Evaluation of the minimum nucleation time
All the procedures are invoked by the components’ entities.
For example, the implementation of the Nucleation Center consists
of a list of calls to the needed procedures. In this case,
the first function invoked is Computation of the nucleation
probability. Here the effective field Heff, the switching time
τeff, etc. are evaluated by using the equations reported
in the previous paragraphs. Finally, the nucleation probability
according to the clock signals and the status of the neighboring
magnets.
III. CIRCUIT ANALYSIS
The compact model presented here makes it possible to
simulate logic architectures while preserving technological
information needed to study the behavior of pNML circuits.
Different experiments have been conducted on a Full Adder
(FA) [10] circuit in order to extract the physical quantities
and fit the equations which must be inserted within the
implemented libraries. The following circuits have been tested
by using the technological parameters listed in Tab. I and II.
Table I
GEOMETRICAL PARAMETERS SET ACCORDING TO [16], [10].
Geometrical parameter             Value
Length of the basic block DW      3.5 · 10^-7 m
Width of the domain wall          2.0 · 10^-7 m
Thickness of the Co               3.2 · 10^-9 m
Thickness of the stack            6.2 · 10^-9 m
ANC volume                        1.68 · 10^-23 m^3
Apex angle                        51.5°
Notch height                      54.0 · 10^-9 m
Activation volume                 1.26 · 10^-23 m^3
Table II
FIELD PARAMETERS SET FOR THE SIMULATION, ACCORDING TO [10], [16].
Field parameter                                   Value
Clock field amplitude                             560 Oe
Intrinsic pinning field                           190 Oe
Coupling field strength for the inverter          153 Oe
Coupling field strength for the majority voter    48 Oe
Coupling field strength for the magnetic via      75 Oe
Effective anisotropy                              2.0 · 10^5 J/m^3
Saturation magnetization of Co                    1.4 · 10^6 A/m
Depinning field                                   736.6 Oe
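The role of the parameter library can be mimicked outside VHDL with a small immutable record. The sketch below is not the authors' package: the class name, field names and the use of Python are assumptions made for illustration, while the numerical values are those of Tab. I and II. It shows how a single parameter can be overridden to explore a variant of the technology without touching the rest of the set.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PnmlParameters:
    """Hypothetical container for the values of Tab. I and II."""
    # Geometrical parameters (Tab. I)
    dw_block_length_m: float = 3.5e-7
    dw_width_m: float = 2.0e-7
    co_thickness_m: float = 3.2e-9
    stack_thickness_m: float = 6.2e-9
    anc_volume_m3: float = 1.68e-23
    apex_angle_deg: float = 51.5
    notch_height_m: float = 54.0e-9
    activation_volume_m3: float = 1.26e-23
    # Field parameters (Tab. II)
    clock_field_oe: float = 560.0
    intrinsic_pinning_field_oe: float = 190.0
    coupling_inverter_oe: float = 153.0
    coupling_majority_oe: float = 48.0
    coupling_via_oe: float = 75.0
    effective_anisotropy_j_m3: float = 2.0e5
    saturation_magnetization_a_m: float = 1.4e6
    depinning_field_oe: float = 736.6


default_params = PnmlParameters()

# Exploring a geometric variation: same technology, shorter basic DW block.
# The new length is an arbitrary example value, not a measured one.
shorter_block = replace(default_params, dw_block_length_m=1.75e-7)
```

Keeping the record frozen means every experiment works on its own copy, so a sweep over geometry or material values cannot silently alter the reference set.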
In this paper we present how the behavior of a multiplexer
(MUX) realized experimentally has been mirrored by using
our model. This circuit has then been adopted in order to
realize an accumulator. Performance studies are here proposed,
also considering a generic implementation which allows the
investigation of the n-bit accumulator. Hence, this architecture
combines the use of both the MUX and the FA, also exploiting
the novel 3D topology.
Figure 3. A) SEM image of the fabricated 2-to-1 1-bit multiplexer; B) Wide-
field-MOKE image of the fabricated multiplexer when the input signals are
equal to A=1, B=0 and Sel=1. C) Implemented 2-to-1 MUX with D) truth
table. E) NAND and NOR behaviors can be programmed according to the
input pattern of the F) Minority Gate.
Truth table of the 2-to-1 MUX (panel D):
A  B  Sel  Z
0  0  0    0
0  0  1    0
0  1  0    0
0  1  1    1
1  0  0    1
1  0  1    0
1  1  0    1
1  1  1    1
Truth table of the Minority Gate (panel E):
In1  In2  In3  Out
0    0    0    1    (NAND behavior)
0    0    1    1    (NAND behavior)
0    1    0    1    (NAND behavior)
0    1    1    0    (NAND behavior)
1    0    0    1    (NOR behavior)
1    0    1    0    (NOR behavior)
1    1    0    0    (NOR behavior)
1    1    1    0    (NOR behavior)
A. Multiplexer
In general, the functionality of the considered 2-to-1 MUX
can be described as:
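$$ Z = (A \cdot \bar{S}) + (B \cdot S). \tag{16} $$
Eq. (16) is then modified as follows in order to be synthesized
by using only NAND gates:
$$ Z = \overline{\overline{(A \cdot \bar{S}) + (B \cdot S)}} $$
$$ Z = \overline{\overline{(A \cdot \bar{S})} \cdot \overline{(B \cdot S)}} \tag{17} $$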
The NAND operation is realized by using the majority voter
gate, with one input fixed to 0. The multiplexer needs three
majority voter gates and one inverter. The realized structure is
depicted in Fig. 3.E and F.
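The NAND-only decomposition described above can be checked with a short behavioral sketch. It models only the Boolean function, not the magnetic timing: the three-input gate is written as a minority function, following the truth table of Fig. 3, a NAND is obtained by fixing one input to 0, and the inverter is modeled as a plain complement. All function names are invented for this example.

```python
from itertools import product


def minority(in1, in2, in3):
    """Three-input minority gate: output is the inverse of the majority."""
    return 0 if (in1 + in2 + in3) >= 2 else 1


def nand(a, b):
    """NAND obtained from the minority gate with one input fixed to 0."""
    return minority(0, a, b)


def inverter(a):
    """Behavioral model of the pNML inverter."""
    return 1 - a


def mux(a, b, sel):
    """2-to-1 MUX built from three NAND gates and one inverter.

    Sel = 0 selects A and Sel = 1 selects B, as in the truth table of Fig. 3.
    """
    return nand(nand(a, inverter(sel)), nand(b, sel))


# Truth table of Fig. 3, keyed by (A, B, Sel).
EXPECTED = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 0, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 1,
}

mux_table = {inputs: mux(*inputs) for inputs in product((0, 1), repeat=3)}
```

Fixing the first minority input to 1 instead of 0 gives the NOR behavior listed in the same figure, which is what makes the gate programmable.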
The fabrication process started from a cleaned
silicon wafer. The multilayer stack, which enables the
perpendicular magnetic anisotropy, has been obtained
by RF magnetron sputtering. An ultra-thin film of
Ta(1.7 nm)/Pt(4 nm)/[Co(0.75 nm)/Pt(1.4 nm)]x4/Pt(2.75 nm) has been deposited.
The adhesion layer (Ta) and Pt have been sputtered at 2 µbar,
whereas Co was sputtered at 4 µbar. After spin coating a
thin layer of PMMA resist, the magnetic stack has been
patterned by using Focused Ion Beam (FIB) lithography with a
dose of 5 · 10^12 ions/cm^2. A computer-defined mask of the
multiplexer has been used to pattern the multilayer stack.
During the lithography, the ANCs are defined by increasing
the Ga+ dose to 5 · 10^13 ions/cm^2 over a specific spot of
30x30 nm^2. Afterwards, the exposed resist is developed for
15 s. By electron beam physical vapor deposition, a thin layer
of Ti is deposited on top to protect the magnetic stack. Then,
the remaining resist is lifted off and the magnetic devices are
structured by Ion Beam Etching.
Fig. 3.A shows the SEM image of the fabricated device.
Here, magnetic nanowires are in the range of 400nm. Wide
nanomagnets have been fabricated to get sharper images
during the measurement phase at the Wide-field-MOKE. We
have verified the logic behavior of the 2-to-1 multiplexer by
using a Wide-field-MOKE microscope. The out-of-plane field
was generated by an external electro-magnet placed just below
the circuit. In Fig. 3.B we report the correct ordering of the
magnetic circuit for the input combination: A=1, B=0 and
Sel=1. The Wide-field-MOKE image shows that the proper
input B is selected and transferred to the output (Z). The
measured coupling field of the inverter gate is 10 mT.
The functionality of the circuit depicted in Fig. 3.C and D is
analyzed by using the VHDL model both in terms of switching
behavior and of timing performance.
In the first instance, the simulation extracts the maximum
clock frequency that can be achieved for the considered circuit.
This value is strictly related to the critical propagation time,
i.e., the propagation delay introduced by the longest domain
wall, and the minimum nucleation time required to nucleate
the ANC. The extracted minimum nucleation time is 558 ns,
while the longest propagation time is 73.7 ns. According to
these values, the clock period is evaluated by setting in the
VHDL package the values of tnuc = 600 ns and tprop = 80
ns. The resulting tclock is 850 ns considering value rounding
and the non-idealities introduced by Hclock; indeed, the rise
time must be taken into account, and it is calculated as
20% of the pulse duration in the worst possible case. This
timing analysis is related to the geometric characteristic of the
examined sample. Indeed, in this example larger device sizes
have been adopted in order to improve the testability during
measurements. Nevertheless, smaller dimensions can lead to
significantly higher performance.
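The timing figures above can be tied together with a short calculation. The formula used here is an interpretation, not one stated in the text: it assumes that the rise time occupies 20% of the whole clock pulse, so that the remaining 80% must cover the rounded nucleation and propagation times. Under that assumption the reported 850 ns is recovered from 600 ns and 80 ns. The function names are hypothetical.

```python
def clock_period(t_nuc, t_prop, rise_fraction=0.2):
    """Clock pulse duration that leaves t_nuc + t_prop after the rise time.

    Assumption: the rise time is rise_fraction of the full pulse duration.
    """
    if not 0.0 <= rise_fraction < 1.0:
        raise ValueError("rise_fraction must be in [0, 1)")
    return (t_nuc + t_prop) / (1.0 - rise_fraction)


def satisfies_eq1(t_clock, t_nuc, t_prop):
    """Eq. (1): the clock pulse must be longer than t_nuc + t_prop."""
    return t_clock > t_nuc + t_prop


# Rounded values set in the package for the multiplexer [ns].
T_NUC_NS = 600.0
T_PROP_NS = 80.0

t_clock_ns = clock_period(T_NUC_NS, T_PROP_NS)
print(f"t_clock = {t_clock_ns:.0f} ns")
```

The extracted minimum values (558 ns and 73.7 ns) also satisfy Eq. (1) with this pulse duration, since the rounding only adds margin.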
Considering these parameters, the circuit latency can be ex-
tracted considering an exhaustive analysis of the input patterns.
Results are reported in Tab. III. Generally, the output is valid
during the second clock cycle. Only the input combination
1-0-0 has the longest delay, since it passes through all the
chains: the majority voter that computes $A \cdot \bar{S}$ needs
the inverted selector to discriminate the output. Moreover, this
value is also needed by the last majority voter to compute the
final result.
Table III
LATENCY FOR THE OUTPUT OF THE 2 TO 1 MULTIPLEXER.
A  B  Selector  Z  Z latency
0  0  0         0  3.172 µs
0  0  1         0  3.172 µs
0  1  0         0  3.172 µs
0  1  1         1  2.322 µs
1  0  0         1  4.022 µs
1  0  1         0  3.172 µs
1  1  0         1  2.322 µs
1  1  1         1  2.322 µs
B. 2-bit accumulator
The accumulator architecture considered here is depicted in
Fig. 4. The arithmetic unit is an adder realized by using the FA
presented in [10]. The MUX is used to select one of the inputs
of the adder unit. The choice is made between the previous
computed sum, and the current applied input. The storing of
the information is accomplished by using the notch structures.
The structure is organized in a 3D topology. It is divided in
three planes, exploiting the LIM (Logic In Memory) approach
[17]: the memory plane, the logic plane and the routing
plane. The pNML technology offers the possibility to be
easily adopted in 3D organizations, since the different layers
can communicate by ferromagnetic coupling. In this way no
complex interconnections are needed, and moreover, area can
be saved by overlapping different layers.
The logic plane contains the computational units: the 2-to-1
MUX and the FA. FAs can be arranged in order to compose
a Ripple Carry Adder (RCA). In this organization, each carry
is passed to the next FA and only when the last terminates its
computation the final result is valid.
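The ripple-carry organization can be illustrated with a purely logical sketch. It does not reproduce the internal structure of the FA of [10]: the carry is written as a majority function and the sum as a parity function, which is the standard Boolean definition of a full adder, and no magnetic timing is modeled. Bit lists are least significant bit first, and all names are invented for this example.

```python
def majority(a, b, c):
    """Three-input majority function."""
    return 1 if (a + b + c) >= 2 else 0


def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ carry_in, majority(a, b, carry_in)


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """n-bit Ripple Carry Adder; bit lists are LSB first.

    Each carry is passed to the next full adder, so the result is
    complete only after the last stage has been evaluated.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """Unsigned integer to an LSB-first list of n bits."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """LSB-first bit list to an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


# 2-bit example: 3 + 2 = 5, i.e. sum bits "01" with carry out 1.
sum_bits, carry_out = ripple_carry_add(to_bits(3, 2), to_bits(2, 2))
```

The sequential loop mirrors the carry chain: the most significant sum bit depends on every previous stage, which is why more significant inputs have to be delayed in the pNML layout.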
The routing plane connects inputs with the logical plane. It
is used also to connect the MUX with the FA and the carries.
While designing the accumulator, paths must be balanced in
order to guarantee a correct signal synchronization. Hence,
two inverters in cascade must be added in order to compensate
the delay introduced by the MUX. This guarantees inputs to
reach the full adder in the same clock cycle. Moreover, each
FA belonging to the RCA introduces a delay. For this reason,
more significant bits must be properly delayed.
Within the memory plane, notches have a twofold functionality:
they can be used as memory elements able to store
the current state of magnetization, and they can be used as
programmable inputs for majority/minority gates, in order to
realize logic programmable architectures.
The logic plane communicates both with the routing plane
and the memory plane. In this topology, external inputs are
connected to logic through the routing plane. Indeed, partial
results of the accumulator are stored thanks to the notches
belonging to the memory plane.
Similarly to what is presented for the MUX, the accumulator
has also been tested with our physical compact model. Here,
tnuc,min is equal to 558 ns and the longest propagation time
(tprop) is 622 ns. This critical value comes from the feedback
route. By adding the rise time, which is equal to 20% of
tclock, the final duration of the applied Hclock is 1.625 µs.
Table IV
RESULTS OF THE N-BIT ACCUMULATOR: tnuc,min, tprop, CRITICAL PATH
(C.P.), tclk AND LATENCY (IN TERMS OF CLOCK CYCLES).
         tnuc,min [ns]  tprop [ns]  C.P. [µs]  tclk [µs]  Lat. [cc]
2 bit    558            622         1.22       1.625      6
4 bit    558            622         1.22       1.625      8
8 bit    558            622         1.22       1.625      12
16 bit   558            842         1.442      1.875      16
32 bit   558            1.314       1.914      2.5        32
For the 2-bit accumulator, the extracted critical path is:
tcritical = 1.22 µs.
The various input combinations are tested. After the sixth
clock cycle for any combination the outputs are ordered
correctly. The critical output bit is the one that comes from
the sum of the last full adder, so the second bit of the final
result.
The selector is enabled at 26.0 µs in order to guarantee that
the first result is valid for any combination.
The in-plane field that depins the notch structures in the
memory layer is activated with two pulses with a duration of
50 ns. The first one is activated during the positive phase of
the clock field, while the other in the negative phase. The
pair of pulses is activated every 21.0 µs. After six clock
cycles the new computed result is valid.
C. Generic Accumulator
The design of the 2-bit accumulator was intended to be
flexible from the first steps of development. As a consequence,
the layout has been developed considering the repeatability
of the structure. The generic VHDL implementation of the
architecture makes it possible to automatically scale the input
parallelism. For each configuration, values of tnuc,min, tprop
and tclock are evaluated (Tab. IV).
It can be noticed that, for the 4-bit and 8-bit structures, the
critical path is the same as for the 2-bit accumulator. For the
other structures, the critical path is related to the routing of
the inputs to the FA, which increases with the number of bits.
A summary of how the latency varies depending on the number
of bits is reported in Tab. IV. The selector and the pulses of
the in-plane field are enabled according to the number of bits
of the tested architecture. As for the 2-bit case, the selector is
set to one after the first activation of the two pulses of in-plane
field.
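As a minimal sketch of how the figures in Tab. IV can be read together, the following Python snippet converts the latency from clock cycles to microseconds and computes the margin between the clock period and the critical path. The values are transcribed from Tab. IV. The function names and the "slack" quantity (tclk minus C.P.) are illustrative assumptions and are not part of the presented VHDL model.

```python
# Values transcribed from Tab. IV:
# (number of bits, critical path [us], tclk [us], latency [clock cycles]).
TABLE_IV = [
    (2, 1.22, 1.625, 6),
    (4, 1.22, 1.625, 8),
    (8, 1.22, 1.625, 12),
    (16, 1.442, 1.875, 16),
    (32, 1.914, 2.5, 32),
]


def latency_us(tclk_us, latency_cycles):
    """Latency in microseconds: number of clock cycles times the clock period."""
    return tclk_us * latency_cycles


def clock_slack_us(critical_path_us, tclk_us):
    """Illustrative margin (assumption): clock period minus critical path."""
    return tclk_us - critical_path_us


def summarize(rows=TABLE_IV):
    """Map each bit width to (latency [us], slack [us], rounded to 1 ns)."""
    return {
        bits: (latency_us(tclk, cycles), round(clock_slack_us(cp, tclk), 3))
        for bits, cp, tclk, cycles in rows
    }


if __name__ == "__main__":
    for bits, (lat, slack) in summarize().items():
        print(f"{bits:>2} bit: latency = {lat} us, tclk - C.P. = {slack} us")
```

Under these assumptions, the 2-bit accumulator delivers its first valid result after 6 x 1.625 µs = 9.75 µs, while the 32-bit version needs 32 x 2.5 µs = 80 µs.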
IV. FUTURE DEVELOPMENTS AND CONCLUSIONS
The reported results have been obtained according to the
physical quantities listed in Tab. I and II. However,
further technological improvements are currently under
investigation [18].
As already discussed, dataflow directionality depends on FIB
irradiation, which creates the artificial nucleation center (ANC).
The size of the ANC would limit the scaling of pNML technology.
However, we are trying to overcome this limitation by tuning the
tip geometry and simplifying the fabrication process. The idea
is to remove the FIB irradiation from the fabrication process
and thus avoid the ANC scaling limitation.
[Figure 4. Labels in the figure: MUX, Adder, Memory, Selector, Full Adder, Multiplexer; signals A, B, Sum, Cout, A(0), A(1), B(0), B(1), Sel, Out(0), Out(1), Out(2); Interconnection Layer, Logic Layer, Memory Layer.]
Figure 4. A) The Accumulator architecture is composed of an adder and a multiplexer. The former is implemented as a chain of FAs. B) Accumulator layout.
C) FA layout, already verified in [10].
Our process usually starts from a cleaned and oxidized silicon
wafer. On top of it, we sputter the Co/Pt multilayer
and spin coat a thin layer of PMMA resist. During the
lithography we pattern the magnet geometry and define the
ANC by FIB irradiation. Then, the exposed resist is developed
and a thin layer of Ti is evaporated to protect the magnetic
layer during the lift-off process. Finally, the magnets are
structured by Ion Beam Etching.
As reported by Kimling et al. in [19], modifying the process in
the following way could lead to a 60% reduction of the
nucleation field. In this process, a thin layer of PMMA is
spin coated on top of the cleaned silicon wafer before the
lithography. Afterwards, the irradiated resist is developed
and the multilayer stack (Co/Pt) is sputtered. As a final step,
the non-irradiated resist is lifted off, concluding the fabrication
process.
This process exploits the undercut profile obtained after the
development to obtain a non-homogeneous thickness of the Co/Pt
stack. In particular, the magnetic stack will be thinner on the
sides of the magnet and on the tip. Where the multilayer stack is
thinner, the magnetic anisotropy is locally reduced. Therefore,
by shaping the tip (input) of the nanowire it is possible to
define the ANC without the need for the FIB irradiation and
the Ion Beam Etching steps.
In this paper, we have presented a physical compact
model enabling fast simulations of complex logic architectures.
The model has been validated through experiments; the
performance of a Multiplexer and of a generic n-bit Accumulator
has been analyzed as an example of how our model
can be used for further investigations. Its flexibility allows
it to be easily adapted by simply modifying
the technological parameters. As an example, scaling the
size of domain walls results in an improvement of timing
performance. Moreover, it is possible to study how different
fabrication processes might lead to more reliable circuits. In
other words, with the presented compact model it is possible
to verify how physical parameters can influence performance.
As a future step, we are now integrating the model presented
here into our ToPoliNano suite. This will enable the automatic
generation of the VHDL code making it possible to simply
describe circuits through a graphical representation of a set of
basic elements.
The investigation of logic architectures can also be carried out
by exploiting the novel concept of Logic-in-Memory, which sees
the integration of memory elements (notches) and logic onto
the same device. Indeed, the introduction of notches and vias
enables the design of 3D circuits.
REFERENCES
[1] “International Technology Roadmap of Semiconductors 2.0. Beyond
CMOS.” 2015, http://public.itrs.net.
[2] R. L. Stamps, S. Breitkreutz, J. Åkerman, A. V. Chumak, Y. Otani,
G. E. W. Bauer, J.-U. Thiele, M. Bowen, S. A. Majetich, M. Kläui, I. L.
Prejbeanu, B. Dieny, N. M. Dempsey, and B. Hillebrands, “The 2014
magnetism roadmap,” Journal of Physics D: Applied Physics, vol. 47,
no. 33, p. 333001, 2014.
[3] C. Augustine, X. Fong, B. Behin-Aein, and K. Roy, “Ultra-Low Power
Nano-Magnet Based Computing: A System-Level Perspective,” IEEE
Transactions on Nanotechnology, vol. 10, no. 4, pp. 778–788, 2011.
[4] M. Niemier et al., “Nanomagnet logic: progress toward system-level
integration,” J. Phys.: Condens. Matter, vol. 23, p. 34, Nov. 2011.
[5] G. Turvani, F. Riente, F. Cairo, M. Vacca, U. Garlando, M. Zamboni,
and M. Graziano, “Efficient and reliable fault analysis methodology
for nanomagnetic circuits,” International Journal of Circuit Theory and
Applications.
[6] M. Becherer, G. Csaba, W. Porod, R. Emling, P. Lugli, and D. Schmitt-
Landsiedel, “Magnetic ordering of focused-ion-beam structured cobalt-
platinum dots for field-coupled computing,” IEEE Transactions on
Nanotechnology, vol. 7, no. 3, pp. 316–320, May 2008.
[7] M. Niemier, G. Csaba, A. Dingler, X. S. Hu, W. Porod, X. Ju,
M. Becherer, D. Schmitt-Landsiedel, and P. Lugli, “Boolean and non-
boolean nearest neighbor architectures for out-of-plane nanomagnet
logic,” in 2012 13th International Workshop on Cellular Nanoscale
Networks and their Applications, Aug 2012, pp. 1–6.
[8] F. Cairo, G. Turvani, F. Riente, M. Vacca, S. B. v. Gamm, M. Becherer,
M. Graziano, and M. Zamboni, “Out-of-plane nml modeling and ar-
chitectural exploration,” in Nanotechnology (IEEE-NANO) , 2015 IEEE
15th International Conference on, July 2015, pp. 1037–1040.
[9] M. Alam, J. DeAngelis, M. Putney, X. Hu, W. Porod, M. Niemier, and
G. Bernstein, “Clock Scheme for Nanomagnet QCA,” in International
Conference on Nanotechnology. Hong Kong: IEEE, 2007, pp. 403–408.
[10] S. Breitkreutz, J. Kiermaier, I. Eichwald, C. Hildbrand, G. Csaba,
D. Schmitt-Landsiedel, and M. Becherer, “Experimental demonstration
of a 1-bit full adder in perpendicular nanomagnetic logic,” IEEE
Transactions on Magnetics, vol. 49, no. 7, pp. 4464–4467, July 2013.
[11] M. Becherer, J. Kiermaier, S. Breitkreutz, I. Eichwald, G. Csaba,
and D. Schmitt-Landsiedel, “Nanomagnetic logic clocked in the mhz
regime,” in 2013 Proceedings of the European Solid-State Device
Research Conference (ESSDERC), Sept 2013, pp. 276–279.
[12] “A monolithic 3d integrated nanomagnetic co-processing unit,” Solid-
State Electronics, vol. 115, Part B, pp. 74 – 80, 2016, selected papers
from the EUROSOI-ULIS conference.
[13] S. Breitkreutz, I. Eichwald, J. Kiermaier, A. Papp, G. Csaba,
M. Niemier, W. Porod, D. Schmitt-Landsiedel, and M. Becherer,
“1-bit full adder in perpendicular nanomagnetic logic using a novel
5-input majority gate,” EPJ Web of Conferences, vol. 75, p. 05001,
2014. [Online]. Available:
http://dx.doi.org/10.1051/epjconf/20147505001
[14] J. J. W. Goertz, G. Ziemys, I. Eichwald, M. Becherer, H. J. M. Swagten,
and S. Breitkreutz-v. Gamm, “Domain wall depinning from notches
using combined in- and out-of-plane magnetic fields,” AIP Advances,
vol. 6, no. 5, 2016.
[15] “Modeling and simulation of nanomagnetic logic with cadence virtuoso
using verilog-a,” Solid-State Electronics, pp. –, 2016.
[16] S. Breitkreutz, I. Eichwald, G. Ziemys, D. Schmitt-Landsiedel, and
M. Becherer, “Influence of the domain wall nucleation time on the relia-
bility of perpendicular nanomagnetic logic,” in 14th IEEE International
Conference on Nanotechnology, Aug 2014, pp. 104–107.
[17] D. Pala, G. Causapruno, M. Vacca, F. Riente, G. Turvani, M. Graziano,
and M. Zamboni, “Logic-in-memory architecture made real,” in 2015
IEEE International Symposium on Circuits and Systems (ISCAS), May
2015, pp. 1542–1545.
[18] M. Becherer, J. Kiermaier, S. Breitkreutz, I. Eichwald, G. Csaba,
and D. Schmitt-Landsiedel, “Nanomagnetic logic clocked in the mhz
regime,” in 2013 Proceedings of the European Solid-State Device
Research Conference (ESSDERC), Sept 2013, pp. 276–279.
[19] J. Kimling, T. Gerhardt, A. Kobs, A. Vogel, S. Wintz, M.-Y. Im,
P. Fischer, H. P. Oepen, U. Merkt, and G. Meier, “Tuning of the
nucleation field in nanowires with perpendicular magnetic anisotropy,”
Journal of Applied Physics, vol. 113, no. 16, p. 163902, 2013. [Online].
Available: http://aip.scitation.org/doi/abs/10.1063/1.4802687
Giovanna Turvani received the M.Sc. degree with
honors (Magna Cum Laude) in Electronic Engineer-
ing in 2012 and the Ph.D. degree from the Po-
litecnico di Torino. She was Postdoctoral Research
Associate at the Technical University of Munich
since December 2016. She is currently Postdoc-
toral Research Associate at Politecnico di Torino.
Her interests include CAD Tools development for
non-CMOS nanocomputing, architectural design for
nanomagnetic computing and device modeling.
Fabrizio Riente received the M.Sc. degree with
honors (Magna Cum Laude) in Electronic Engineering
in 2012 and the Ph.D. degree in 2016 from the
Politecnico di Torino. He is currently Postdoctoral
Research Associate at the Technical University of
Munich. His primary research interests are device
modeling and circuit design for nano-computing, with
particular interest in magnetic QCA. His interests
also cover the development of EDA tools for
beyond-CMOS technologies, with the main focus on the
physical design.
Elisa Plozner received the M. Sc. degree in Elec-
tronic Engineering in 2016. Her interests include
low-power circuits, architectural design and emerg-
ing technologies.
Marco Vacca received the Dr. Eng. degree in Electronics
engineering from the Politecnico di Torino,
Turin, Italy, in 2008. In 2013, he received the Ph.D.
degree in Electronics and Communications engineering
and he is now a Research Assistant at Politecnico di
Torino. His research interests include Nanomagnet
Logic and other beyond-CMOS technologies. He
is also an expert in innovative and unconventional
computer architectures.
Mariagrazia Graziano received the Dr.Eng. degree
and the Ph.D. in Electronics Engineering from the
Politecnico di Torino, Italy, in 1997 and 2001,
respectively. Since 2002 she has been Assistant Professor at
the Politecnico di Torino. Since 2008 she has been adjunct
Faculty at the University of Illinois at Chicago and
since 2014 she has been a Marie-Curie fellow at the London
Centre for Nanoelectronics. She works on "beyond
CMOS" devices, circuits and architectures.
Stephan Breitkreutz-v. Gamm received the
Diploma in electrical engineering in 2009 and the
Doctor’s degree in 2015 at the Technical University
of Munich (TUM). He was awarded
the Dr. Wilhelmy-Stiftungspreis for his excellent
dissertation and worked as head of the research
group for Nanomagnetic Logic (NML) at TUM until
2017. He has authored and co-authored more than
60 publications and worked as a reviewer for several
leading international journals. In 2017, he joined
Infineon Technologies AG, Germany, where he
focuses on the dielectric reliability of advanced CMOS devices.
