Shannon-inspired Statistical Computing to Enable Spintronics by Patil, Ameya D. et al.
Ameya D. Patil¹, Sasikanth Manipatruni², Dmitri E. Nikonov², Ian A. Young², Naresh R. Shanbhag¹.
¹ Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
² Components Research, Intel, Hillsboro, OR 97124, USA.
Summary: 
Modern computing systems based on the von Neumann architecture are built from silicon complementary
metal oxide semiconductor (CMOS) transistors that need to operate under practically error-free conditions,
with 1 error in 10¹⁵ switching events. The physical dimensions of CMOS transistors have scaled down over
the past five decades, leading to exponential improvements in functional density and energy consumption. Today,
the energy and delay reductions from scaling have stagnated, motivating the search for a CMOS
replacement. Among the candidates, spintronics offers a path for enhancing the functional density and scaling the
energy down to fundamental thermodynamic limits of 100𝑘𝑇 to 1000𝑘𝑇. However, spintronic devices exhibit
high error rates of 1 in 10 or more when operating at these limits, rendering them incompatible with the
deterministic nature of the von Neumann architecture. We show that a Shannon-inspired statistical
computing framework can be leveraged to design a computer made from such stochastic spintronic logic
gates that provides a computational accuracy close to that of a deterministic computer. This extraordinary
result, which allows a 10¹³-fold relaxation in acceptable error rates, is obtained by engineering the error
distribution coupled with statistical error compensation.
 
 
Computing’s origins in Turing’s abstract universal machine¹ and its realization through von Neumann’s
stored program architecture² have required electronic devices (transistors) that switch nearly error free, with
less than 1 error in 10¹⁵ switching events. The geometric dimensions (feature sizes) of these devices have
been scaling relentlessly over the past five decades, leading to improved density of transistors (10⁹/cm²), and
a reduction in energy (10000kT/event) and delay (<10 ps), while preserving their nearly error-free switching
behavior. This reduction in energy is due to a commensurate scaling down of the supply voltage, which has 
accompanied the scaling of transistor feature size. This form of scaling, referred to popularly as Moore’s 
Law3, has required an aggressive adoption of a number of semiconductor process technology innovations 
such as superior electrostatic control4, new gate oxide materials5 and the manipulation of the carrier 
transport via the use of strained semiconductors6. However, in spite of successful scaling of the transistor 
geometric dimensions, the supply voltage and frequency of operation have stagnated due to the fundamental
limits to current modulation (60 mV/decade of change in current magnitude) imposed by the Boltzmann
distribution of electrons at room temperature. Hence, it is of great interest to explore new computational
devices and new models of computation that leverage the unique properties of such devices to enable 
continued computational scaling. In other words, the exploration of new computational devices needs to be 
conducted in concert with an exploration of new models of computation. 
In particular, spin-based computational devices built with materials with magnetic order and spin polarized 
transport have emerged as a viable beyond CMOS option, due to their potential to operate at energy levels 
between 100𝑘𝑇 and 1000𝑘𝑇⁷. These devices are a subset of the beyond-CMOS devices, which include devices
based on electron spin8,9, electron tunneling10, ferro-electric11, magneto-electric12 and multi-ferroic13 
phenomena. Furthermore, these devices also possess favorable attributes such as: 1) non-volatility (ability 
to retain the information in the absence of power); 2) higher logical efficiency (i.e., fewer devices required 
to realize a logic function); and 3) high integration density due to these devices being all-metallic and 
therefore compatible with the state-of-art back-end electronics manufacturing processes.  However, spin-
based devices have high error rates of 1 in 10 or more precisely due to the need to operate at an energy 
significantly below the present state-of-the-art electronic switches14,15. Hence, in order to design reliable 
spin-based computing systems there is a need to investigate computational frameworks that can compensate 
for or tolerate errors. In sharp contrast to computing systems, communication systems today transmit data
reliably, i.e., with error rates 𝑝𝑒 < 10⁻¹², even when the error rate 𝜖 of the physical channel is as high as
10⁻². This enormous reduction in error rates was enabled by the foundational work of Shannon¹⁶.
However, Shannon theory has not had much impact in the field of computing even though the potential for
such an impact was predicted by von Neumann¹⁷, an aspiration that remains unfulfilled to this day. In this
article, we describe a Shannon-inspired approach to designing reliable computing systems using high error-rate
spin devices.
The problem of designing reliable computing systems using erroneous components was first addressed by 
von Neumann17 who defined a reliable logic network as one whose output exhibits a probability of error 
𝑝𝑒 < 0.5 when designed using 𝜖-noisy logic gates, i.e., gates whose outputs are in error with probability 𝜖. 
Von Neumann showed that a reliable logic network can be designed for any logic function provided 𝜖 ≤ 0.0073
and that it is impossible to do so if 𝜖 ≥ 1/6. Later, upper bounds on 𝜖 were obtained in a series of
papers¹⁸,¹⁹ culminating with that of Evans and Schulman²⁰. These works proved a precise formula for the
upper bound 𝜖o such that any logic function can be reliably computed using 𝑘-input 𝜖-noisy gates provided 
𝜖 < 𝜖o. In these works, the design method for constructing reliable logic networks was to replicate 
individual logic gates and then majority vote the output – the 𝑁-modular redundancy (NMR) method. NMR 
is both area and energy expensive. Other fault-tolerant techniques such as check-pointing21 also incur a 
significant energy cost. Thus, the problem of designing a reliable and energy efficient computing system 
using dynamic stochastic switching elements remains an open problem to this day. 
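To make the cost of NMR concrete, the following minimal Python sketch computes the output error probability of an 𝑁-way majority vote. It assumes independent replica errors with identical probability 𝜖, a single binary output, and an error-free voter; these are simplifying assumptions of this illustration (the cited analyses also treat noisy voters), and the function name majority_error is hypothetical.

```python
from math import comb


def majority_error(eps: float, n: int) -> float:
    """Probability that a majority vote over n replicas is wrong.

    Assumes each replica errs independently with probability eps and
    that the voter itself is error free (an idealization).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the vote cannot tie")
    k_min = n // 2 + 1  # smallest number of wrong replicas that flips the vote
    return sum(
        comb(n, k) * eps**k * (1 - eps) ** (n - k) for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, majority_error(0.1, n))
```

Under these assumptions, three replicas at 𝜖 = 0.1 reduce the output error probability only to 0.028 while tripling the gate count, which illustrates why replication is area and energy expensive.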
In this article, we show that the Shannon-inspired statistical computing (SISC) framework (shown in Fig.
1) enables the design of reliable all-spin logic (ASL) networks operating in the stochastic regime (error rate
𝜖 ≈ 10%) and achieves an energy efficiency improvement over an equivalent CMOS implementation. The
ASL gate is operated in a stochastic regime – the Shannon ASL (S-ASL) gate – and used as a primitive for 
constructing Shannon-inspired computing systems. In the language of Shannon theory, in Figure 1, the 
main computational block is a channel processing information bits while the low-complexity estimator is a 
side channel transmitting parity bits. The fusion block acts as a decoder that generates its best estimate of 
the correct output by observing the outputs of the main computational block and the estimator.  
As shown in Fig. 1, a high-complexity main block is constructed using high 𝜖, low energy, S-ASL gates 
which result in computational errors 𝜂. To compensate for 𝜂, a low-complexity statistical error compensator 
comprising an estimator with estimation errors 𝑒 and a fusion block, is constructed from low 𝜖 S-ASL gates. 
A central concept in Shannon-inspired statistical computing is to transform the network topology of the 
main block and the estimator algorithm so as to generate a disparity between the probability mass functions 
(PMFs) 𝑃𝜂(𝜂) and 𝑃𝑒(𝑒). This disparity is the key to enhancing the robustness of the architecture with 
minimal overhead error compensation. Our prior work22 has shown that compensating for errors with low 
overhead is possible when the error PMF 𝑃𝜂(𝜂) is sparse and 𝑃𝑒(𝑒) is dense.  Obtaining a dense 𝑃𝑒(𝑒) is 
not a problem as such PMFs arise in most estimation algorithms. In this article, we show that sparse error 
PMFs 𝑃𝜂(𝜂) can be engineered through logic transformations applied to S-ASL networks. Furthermore, we 
show that compensating for errors with NMR techniques incurs a large overhead (by a factor of 𝑁), which
nullifies the energy efficiency benefits of spintronics.
A Stochastic Spin-based Switch and a Spin-based Logic Gate 
An S-ASL gate, configurable as inverting or non-inverting, is shown in Fig. 1 performing a logic inversion
with directional signal (information) flow. It comprises two nanomagnets whose magnetic moment directions
represent information bits. Each nanomagnet is shared by two spin conduction channels. The logic gate
operates via an asymmetric injection of spin current into the spin conduction channel from the nanomagnets.
This spin current, in turn, exerts a torque on the output magnet, forcing it to switch (see Methods for the details).
The output magnet, however, fails to switch with a finite probability 𝜖 due to the presence of Langevin
thermal noise, producing a logical error at the gate output (see Methods for the details). A complete
combinatorial spin logic family including spin majority gates23, spin interconnects24,25, spin state elements 
and random access spintronic memory can be developed and employed to construct a general-purpose 
Turing-complete computer26.  
We observe that there is a trade-off between the gate-level error rate 𝜖(𝐸, 𝑇𝑔), the switching energy 𝐸, and the
delay 𝑇𝑔 in S-ASL gates, as shown in Fig. 2A. While communication systems comprehend the trade-off
between channel error rate and transmit energy, the trade-off between 𝜖(𝐸, 𝑇𝑔), 𝐸, and 𝑇𝑔 is novel for
computation²⁷. In order to capture this trade-off, the stochastic behavior of the S-ASL gate, the switching
of the magnetic moment of the output magnet in particular, is modeled as a Wiener process with a stationary
noise term calibrated by the fluctuation-dissipation theorem²⁸. The corresponding Fokker-Planck (FP) equation²⁹
of the switching magnetic layer (assuming uniaxial anisotropy, commonly used in scaled nanomagnetic
devices) is solved numerically to capture the time evolution of the probability distribution 𝜌(𝜃) of the
angle 𝜃 made by the magnetization vector with the anisotropy direction. We validated the stochastic
properties using extensive time-domain Monte Carlo simulations with a Stratonovich implementation and the FP
equation. Fig. 2A shows the energy-delay contours of the S-ASL inverter at different error rates 𝜖. While
competitive with CMOS at 𝜖 = 0.5, i.e., at the average switching delay, S-ASL is not energy competitive at
the very low error rates of 𝜖 ≈ 10⁻¹⁵ needed by traditional von Neumann architectures.
We develop a modified 𝜖-noisy gate (Fig. 2B) to describe the S-ASL gate, which comprehends the underlying
physical stochastic behavior while being sufficiently abstract to permit the design and analysis of complex
ASL networks. The modified 𝜖-noisy gate comprehends: a) the multiplicative noise in the dynamics of the
nanomagnets undergoing field and spin transfer switching, and b) the input dependence of S-ASL errors
(the S-ASL gate makes an error only when the output needs to switch and does not, implying a dependence
of the error event on the input data). Thermal noise appears as an antipodal (±𝑉) variable and has a
multiplicative effect on the output of an S-ASL gate. It is assumed that Boolean inputs 𝐴𝑡 and 𝐵𝑡 are provided
at time 𝑡 and the S-ASL gate generates its output 𝐶𝑡+𝑇𝑔 at time 𝑡 + 𝑇𝑔, where 𝑇𝑔 is the delay assigned to the
S-ASL gate. The model is composed of an ideal noise-free Boolean gate whose output 𝐶𝑜 = 𝐴𝑡𝐵𝑡 is
EXORed with a Bernoulli noise random variable 𝜃 ∈ {0,1} with parameter 𝜖, i.e., Pr{𝜃 = 1} = 𝜖 = 1 −
Pr{𝜃 = 0}. The output selector computes the final output 𝐶𝑡+𝑇𝑔 by choosing either the output of the EXOR
gate 𝐶𝑒 = 𝐶𝑜 ⊕ 𝜃 or the error-free output 𝐶𝑜. The EXOR gate output is chosen only if 𝐶𝑜 ≠ 𝐶𝑡, where 𝐶𝑡
is the S-ASL gate output at time 𝑡. The 𝜖-noisy gate models noise as being additive in the Galois field of order 2,
𝐺𝐹(2), which is equivalent to an antipodal multiplicative noise variable ∈ {±1}.
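The behavior of the modified 𝜖-noisy gate described above can be sketched in a few lines of Python. This is an illustrative model only: it takes the ideal gate to be a two-input AND (matching 𝐶𝑜 = 𝐴𝑡𝐵𝑡), and the function names and the use of Python's random module are assumptions of this sketch rather than part of the original simulation flow.

```python
import random


def modified_noisy_gate(a: int, b: int, c_prev: int, eps: float,
                        rng: random.Random) -> int:
    """One evaluation of the modified eps-noisy gate (ideal gate = AND)."""
    c_ideal = a & b                          # noise-free output C_o
    theta = 1 if rng.random() < eps else 0   # Bernoulli noise, Pr{theta=1}=eps
    c_err = c_ideal ^ theta                  # EXOR output C_e = C_o xor theta
    # Output selector: noise can act only when the output has to switch.
    return c_err if c_ideal != c_prev else c_ideal


def empirical_error_rate(a: int, b: int, c_prev: int, eps: float,
                         trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of evaluations whose output differs from the ideal output."""
    rng = random.Random(seed)
    c_ideal = a & b
    errors = sum(
        modified_noisy_gate(a, b, c_prev, eps, rng) != c_ideal
        for _ in range(trials)
    )
    return errors / trials


if __name__ == "__main__":
    print(empirical_error_rate(1, 1, c_prev=0, eps=0.1))  # output must switch
    print(empirical_error_rate(1, 1, c_prev=1, eps=0.1))  # no switch needed
```

Because all signals are binary, an erroneous output 𝐶𝑜 ⊕ 1 equals the previous output 𝐶𝑡, so an error in this model corresponds to the magnet failing to switch.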
Shannon-inspired Statistical Computing 
The Shannon-inspired statistical computing architecture shown in Fig. 3A includes: 1) a main block that 
does the bulk of the computation using energy efficient but high error rate circuits, 2) a low-complexity 
estimator that computes an estimate 𝑦𝑒 of the main block output 𝑦𝑎, and 3) a low-complexity fusion block
that combines 𝑦𝑎 and 𝑦𝑒 to generate an error-compensated output ŷ that is ‘close’, in a statistical
sense, to the correct (but unknown) output 𝑦𝑜. Both the estimator and the fusion blocks are designed with
low error rate, and hence energy inefficient, circuits. For example, in this article, we design the main block
using S-ASL gates with 𝜖 = 0.01, and the estimator and fusion blocks using S-ASL gates with
𝜖 = 10⁻⁶. The main block and estimator outputs are described by an additive noise model given by
𝑦𝑎 = 𝑦𝑜 + 𝜂
𝑦𝑒 = 𝑦𝑜 + 𝑒
where 𝜂 is the architectural level hardware error caused by the 𝜖-noisy S-ASL gates in the main block, and
𝑒 is the estimation error. Shannon-inspired statistical computing employs the statistics of the input data 𝑥,
the functionality of the main block, and the statistics of 𝜂 and 𝑒 to obtain the error compensated output ŷ
efficiently. For example, the knowledge of the precision and functionality of the main block allows one to
employ as the estimator a low-precision version of the main block (𝑚 < 𝑘 in Fig. 3A).
As the estimation error 𝑒 is algorithmic in nature, it is independent of the hardware error 𝜂 whose source is 
Langevin thermal noise. This independence between the two error sources allows for the realization of a 
low complexity fusion block which computes a statistical estimate ŷ𝑜 of 𝑦𝑜 in terms of 𝑦𝑎 and 𝑦𝑒. Though
several methods to realize estimators and the fusion block have been proposed³⁰, in this work we consider
the method of algorithmic error cancellation (AEC)²², where the fusion block operates on the difference
𝑦𝑎 − 𝑦𝑒 = 𝜂 − 𝑒. As this is a difference between two independent random variables 𝜂 and 𝑒, the probability
mass function of this difference is a convolution of 𝑃𝜂(𝜂) and 𝑃𝑒(−𝑒), where 𝑃𝜂 and 𝑃𝑒 are the PMFs of 𝜂 and 𝑒, respectively.
By the central limit theorem, the PMF of 𝑒 is typically dense and centered around zero, as shown in Fig. 3B.
If the PMF of 𝜂 is sparse, as shown in Fig. 3B, then it can be shown²² that the fusion block can compute an
approximation to the maximum a posteriori (MAP) rule, which chooses ŷ to maximize the probability 𝑃(ŷ = 𝑦𝑜 | 𝑦𝑎, 𝑦𝑒):
$$\hat{y} = y_a - 2^{l-s} \left\lfloor \frac{y_a - y_e}{2^{l-s}} + \frac{1}{2} \right\rfloor$$
where 𝑙 is the precision of the main block output, and 𝑠 = ⌊log₂ 𝑝𝑘⌋ with 𝑝𝑘 denoting the number of distinct
peaks in the sparse 𝜂 PMF. This results in a low complexity fusion block architecture as shown in Fig. 3C.
The second term in the equation above is a MAP estimate η̂ of the hardware error 𝜂. Such a fusion block
needs three adders and this complexity is independent of the complexity of the main block and the estimator.
Thus, the fusion block overhead will be a small fraction of the main block as the latter’s complexity 
increases. 
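The fusion rule above amounts to rounding the difference 𝑦𝑎 − 𝑦𝑒 to the nearest multiple of 2^(𝑙−𝑠) and subtracting the result from 𝑦𝑎. A minimal integer-arithmetic sketch in Python follows. It assumes integer-valued outputs, hardware errors 𝜂 that are multiples of 2^(𝑙−𝑠), and estimation errors smaller in magnitude than 2^(𝑙−𝑠−1); the function name aec_fuse and the numerical values are illustrative and not taken from the original design.

```python
def aec_fuse(y_a: int, y_e: int, l: int, s: int) -> int:
    """Algorithmic error cancellation: y_hat = y_a - eta_hat.

    eta_hat = 2^(l-s) * floor((y_a - y_e) / 2^(l-s) + 1/2), i.e. the
    difference y_a - y_e rounded to the nearest multiple of 2^(l-s).
    """
    if l < s:
        raise ValueError("l must be at least s")
    step = 1 << (l - s)
    diff = y_a - y_e
    # floor(diff/step + 1/2) == floor((2*diff + step) / (2*step)); Python's
    # // operator floors toward minus infinity, as the formula requires.
    eta_hat = step * ((2 * diff + step) // (2 * step))
    return y_a - eta_hat


if __name__ == "__main__":
    y_o = 100                    # correct (normally unknown) output
    eta, e = 64, 3               # sparse hardware error, small estimation error
    print(aec_fuse(y_o + eta, y_o + e, l=8, s=2))
```

With 𝑙 = 8 and 𝑠 = 2 the rounding step is 64, so a hardware error of 64 combined with an estimation error of 3 is cancelled exactly in this example. The sketch does not model errors inside the fusion block itself.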
The key to low complexity statistical error compensation is the existence of a low complexity estimator and
our ability to ensure a strong disparity between the PMFs of 𝜂 and 𝑒. One way of ensuring this strong
disparity, as described above, is to engineer a sparse 𝑃𝜂(𝜂) (𝜂 takes few widely separated large magnitude
values with high probability) so that it is very different from 𝑃𝑒(𝑒), which is typically dense (its region of
support is concentrated in a small neighborhood around 0).
Engineering Sparse Error PMFs in Computation 
We show that the requirement of sparse PMFs can be imposed on the outputs of a complex S-ASL network 
in order to enable SISC techniques. The distribution 𝑃𝜂(𝜂) is sparse when 𝜂 is restricted to take few large 
magnitude values with high probability. We show how sparsity (large magnitude 𝜂) is enforced via two 
logic transformations: 1) inter path delay balancing (I-PDB), and 2) intra path delay redistribution (I-PDR). 
Figure 4A shows an S-ASL network of an 8-bit ripple carry adder (RCA) which adds two 8-bit inputs 𝑥1
and 𝑥2 to obtain an 8-bit output 𝑦𝑎 = 𝑥1 + 𝑥2 + 𝜂, and an output carry. When all S-ASL gates have the same
delay 𝑇𝑔,𝑢 and energy 𝐸𝑔,𝑢, and hence, same error rate 𝜖𝑔,𝑢 = 𝜖(𝐸𝑔,𝑢, 𝑇𝑔,𝑢), the resulting error PMF 𝑃𝜂(𝜂) 
is dense at 𝜖𝑔,𝑢 = 0.1. In this case, paths with the largest number of gates 𝑁𝑐𝑝, i.e., the critical paths, have 
the highest path delay 𝑇cp,𝑢 = 𝑁𝑐𝑝𝑇𝑔,𝑢, which determines the overall throughput. In I-PDB, delays of gates 
lying on the shorter paths are increased, at a constant energy 𝐸𝑔,𝑢, such that every gate lies on at least one 
critical path, i.e., the path with path delay 𝑇cp,𝑢. Thus, I-PDB reduces error rate of many gates on shorter 
paths, while it leaves the paths having 𝑁𝑐𝑝 gates untouched with all their gates operating at high error rate 
as shown in Fig. 4B. This enhances the error PMF sparsity since the number of gates in a path computing 
an output bit increases with the significance of the bit in least significant bit (LSB)-first arithmetic 
architectures such as the RCA. Sparsity can be increased further by using I-PDR, where the S-ASL gate
delays along each path are redistributed while keeping the total path delay, and hence the throughput, constant, as
shown in Fig. 4C. This redistribution is done such that the error rate of the top few MSBs is greater than
those of the other bits. In particular, the S-ASL gates at the start and at the end of the critical path are 
assigned lower delay (higher 𝜖) as compared to those that are in the middle. Doing so results in a highly 
sparse PMF 𝑃𝜂(𝜂) as indicated in Fig. 4C. Note that all the delay reassignments in both I-PDB and I-PDR 
techniques are done at a constant switching energy of 𝐸𝑔,𝑢 (moving vertically in Fig. 2A) and at constant 
throughput (identical critical path delay 𝑇cp,𝑢). We define the average device error rate of the architecture
as 𝜖cp−avg = 𝜖(𝐸𝑔,𝑢, 𝑇cp−avg), where 𝑇cp−avg = 𝑇cp,𝑢/𝑁𝑐𝑝. Note that 𝑇cp−avg = 𝑇𝑔,𝑢 when all gates have equal
delay. All three architectures and corresponding 𝜂 PMFs (which are obtained for equivalent 15-bit RCAs)
in Fig. 4 are compared at 𝜖cp−avg = 0.1 and at the same energy and throughput. The supplementary information
elaborates general algorithms for I-PDB and I-PDR applicable to any logic network. Thus, the I-PDB and 
I-PDR are delay manipulation techniques at the gate-level to achieve effective error statistics shaping at the 
output of a given multi-bit output logic network. They can also be applied at the level of clusters of
logic gates such that all the logic gates in any given cluster have identical supply current and delay. The
size (average gate count) of such clusters can be chosen appropriately to trade off the design complexity
against the effectiveness of error statistics shaping, and hence system-level performance.
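The effect of concentrating errors in the most significant bits can be illustrated with a toy Monte Carlo model in Python. This sketch is not a simulation of the S-ASL adder: it assumes that each output bit flips independently with a given probability, ignores carry propagation of errors, and uses illustrative per-bit error rates. It only shows why confining errors to the top 𝑠 bits of an 𝑙-bit output restricts 𝜂 to multiples of 2^(𝑙−𝑠).

```python
import random
from collections import Counter


def error_pmf(bit_error_rates, trials=20_000, seed=0):
    """Empirical PMF of eta = y_a - y_o for independent output-bit flips.

    bit_error_rates[i] is the (assumed) flip probability of output bit i,
    listed LSB first. Returns a dict mapping eta to its relative frequency.
    """
    rng = random.Random(seed)
    width = len(bit_error_rates)
    counts = Counter()
    for _ in range(trials):
        y_o = rng.getrandbits(width)        # random error-free output
        y_a = y_o
        for i, p in enumerate(bit_error_rates):
            if rng.random() < p:
                y_a ^= 1 << i               # flip bit i: eta changes by +/- 2^i
        counts[y_a - y_o] += 1
    return {eta: c / trials for eta, c in sorted(counts.items())}


if __name__ == "__main__":
    uniform = error_pmf([0.1] * 8)               # every bit equally error prone
    shaped = error_pmf([0.0] * 6 + [0.1, 0.1])   # errors only in the top 2 bits
    print(len(uniform), "distinct eta values without shaping")
    print(len(shaped), "distinct eta values with errors confined to the MSBs")
    print(shaped)
```

In the shaped case the support of the PMF collapses to a handful of widely separated values (multiples of 64 for this 8-bit example), which is the kind of sparsity that the fusion block exploits.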
A Support Vector Machine (SVM) classifier using S-ASL Gates 
We demonstrate the benefits of SISC for an S-ASL implementation of an SVM classifier used for
electroencephalogram (EEG) based seizure detection. The SVM implementation is presented with input
data 𝒙 ∈ ℝᴺ, which is an 𝑁(= 120) dimensional feature vector extracted from the EEG signal. It assigns a
label ŷ ∈ {−1,1}, where ŷ = 1 indicates the presence of seizure. The label ŷ is calculated according to the
decision rule ŷ = sign(𝒘ᵀ𝒙 + 𝑏), where 𝒘 ∈ ℝᴺ and 𝑏 ∈ ℝ are the 𝑁-dimensional weight vector and a
scalar bias, respectively. Let 𝑧 ∈ {−1,1} denote the true label for the input vector 𝒙. The accuracy of the
classifier implementation is captured in terms of the true positive (TP) rate 𝑝𝑇𝑃 and the false alarm (FA) rate 𝑝𝐹𝐴,
where 𝑝𝑇𝑃 = Pr{ŷ = 1|𝑧 = 1} and 𝑝𝐹𝐴 = Pr{ŷ = 1|𝑧 = −1}, and the probabilities are estimated empirically
(via leave-one-out cross-validation³¹) for the MIT-CHB EEG dataset³² by running extensive Monte Carlo
simulations.
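The decision rule and the two accuracy metrics can be written down directly. The Python sketch below is a floating-point illustration with made-up data, not the fixed-point S-ASL implementation; the function names and the convention of mapping a zero score to +1 are assumptions of this sketch.

```python
def svm_label(w, x, b):
    """Linear SVM decision rule: sign(w^T x + b), with values in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1   # assumption: a zero score maps to +1


def tp_fa_rates(predicted, truth):
    """Empirical p_TP = Pr{label=1 | z=1} and p_FA = Pr{label=1 | z=-1}."""
    on_seizure = [p for p, z in zip(predicted, truth) if z == 1]
    on_normal = [p for p, z in zip(predicted, truth) if z == -1]
    p_tp = sum(p == 1 for p in on_seizure) / len(on_seizure)
    p_fa = sum(p == 1 for p in on_normal) / len(on_normal)
    return p_tp, p_fa


if __name__ == "__main__":
    # Hypothetical 2-dimensional example; the classifier in the text uses N = 120.
    w, b = [1.0, -2.0], -0.5
    features = [[3.0, 1.0], [1.0, 1.0], [4.0, 0.0], [0.0, 2.0]]
    truth = [1, 1, -1, -1]
    predicted = [svm_label(w, x, b) for x in features]
    print(predicted, tp_fa_rates(predicted, truth))
```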
Figure 5A shows the conventional serial architecture employing 𝑁 Baugh-Wooley multipliers (BWM) and
a carry save adder (CSA). All gates in this architecture operate at identical error rates. The Shannon-inspired
architecture in Fig. 5B employs the serial architecture as the main block (MB), and applies I-PDB and I-PDR
to it in order to shape its output error distribution. Since the I-PDB and I-PDR techniques make some
gates operate at a lower error rate, a few reliable intermediate signals in the BWMs can be employed as
estimates of their outputs, indicated via green RPE-EST blocks in the BWMs (see Methods for the details). We
add an error compensation (EC) block consisting of a CSA and a fusion block computing the final output ŷ.
The overhead of the EC block amounts to ~11% of the gate complexity of the MB. To keep the EC design simple,
we assume the same low error rate (𝜖 ≈ 10⁻⁴𝜖cp−avg) for all of its gates. We assume that the
fusion block computation can be pipelined since it operates only on the final outputs of the MB and the
estimator. This allows the gates in the fusion block to operate at much lower energy since its critical path is
much shorter than that of the main block.
We show that SISC outperforms the traditional serial architectures and 𝑁𝑚 modular redundancy (𝑁𝑚-MR) 
architectures, which replicate the conventional serial architecture 𝑁𝑚 times and take bitwise majority vote 
on their outputs. We compare the 𝑝𝑇𝑃 of the serial architecture (figure S9), the Shannon-inspired 
architecture (figure S10) and a 3-MR architecture at a constant 𝑝𝐹𝐴 = 1%. We observe in Fig. 6A that 
Shannon-inspired architecture can tolerate 1000 × higher device error rate (𝜖cp−avg) compared to the serial 
architecture while maintaining the 𝑝𝑇𝑃 close to that of the fixed point ideal error-free implementation. In 
particular, the TP rate for Shannon-inspired architecture is close to 93% even though the device error rate 
𝜖 is as high as 1%. The 3-MR architecture tolerates a device error rate (up to 0.01%) that is greater than 
that of the serial architecture but lower by 100 × when compared to the Shannon-inspired architecture. 
Conventional 20nm LV CMOS based von Neumann architecture is designed to operate very reliably (𝜖 =
10⁻¹⁵) by carefully budgeting the impact of process, voltage and temperature variations. This Shannon-inspired
approach demonstrates a 10¹³-fold increase in tolerable device error rates, while maintaining the
system-level performance.
The ability of Shannon-inspired architecture to tolerate high device error rate translates into gains in energy-
efficiency when compared to both serial and 3-MR architectures. We compare total energy/decision for 
fixed decision delay of 9.7155ns. The Shannon-inspired architecture achieves 3 × lower energy compared 
to the serial architecture (Fig. 6B) while maintaining 𝑝𝑇𝑃 = 93%, thanks to its 1000 × higher device error 
rate tolerance. The 3-MR architecture, however, consumes 2.3 × more energy even after tolerating a
marginally higher device error rate. This is because the energy overhead of replication outweighs the
energy reduction achieved via tolerating a higher device error rate. Thus, Shannon-inspired computing
outperforms the conventional architectures in both energy and accuracy with substantial margins. This
improvement in energy makes the Shannon-inspired architecture competitive with CMOS. None of the
energy numbers reported here include the leakage energy or the energy consumed in the interconnects. For
S-ASL based implementations, we also exclude the energy consumed in the clocking network for all three
architectures.
We have demonstrated the benefits of designing reliable inference systems on stochastic spintronic devices
by leveraging Shannon theory for error compensation. The ability to perform reliable computation on
stochastic device fabrics can enable the use of highly error-prone but scalable physical devices.
 
 
 
 
 
 
 
Methods: 
Description of spin torque logic device, its operation, and estimation of its nominal delay 
and switching energy: 
As an example spintronic logic device amenable to Shannon inspired computing, we describe a 
representative nanoscale logic device, referred to as the All Spin Logic (ASL) device, which operates via 
generation and interaction of spin currents with nanomagnets. The intrinsic ASL device consists of two 
nanomagnets (with magnetization pointing out of plane) sharing a spin conduction channel as shown in 
Extended Data (ED) Fig. 1A. The information bit is represented by the direction of the magnetization vector 
of the nanomagnet. Each nanomagnetic node connects to two spin conduction channels for a) receiving spin 
information and b) regenerating the signal information. For positive supply voltages, the intrinsic ASL device
operates as a Boolean inverter with the directionality of signal (information) flow shown in ED Fig. 1A. The
input and output nanomagnets inject electrons into the spin conduction channel, where the electron spin
direction is decided by the orientation of the nanomagnets (for example, a nanomagnet with spin pointing in
the +X direction injects electrons oriented in the -X direction). The dominant magnet, therefore, sets up a
larger spin polarization with orientation opposite to its own magnetic moment. In ED Fig. 1A and 1B, the
nanomagnet M1 is made dominant by allocating a larger overlap area with the spin channel compared to M2.
A spin current flows in the channel from the higher spin potential⁸ (M1) to the nanomagnet M2, creating a
spin torque that orients its magnetic moment in the direction opposite to that of M1. For negative supply
voltage, a converse process happens, forcing M2 to align parallel to M1, and the device acts as a buffer. The
ASL device is also amenable to integration into a microchip (ED Fig. 1B). Majority gates can be created 
by having more input magnets sharing the conduction channel.  
Example materials for forming the nanomagnets are CoFeB, CoFe-based Heusler alloys, the Mn_xGa_y
class of materials, and L1_0 metals (FePt, FePd). Heusler alloys having a large range of magnetic anisotropy
(Hk) and low saturation magnetization (Ms) are particularly suitable⁹. Example channel materials are Si²⁴,
Cu²⁵ and graphene³³ which exhibit excellent room temperature
We evaluate the energy-delay product of the nominal ASL device (comprising magnets with a 52kT energy
barrier, corresponding to more than 7-year retention time for 1 bit) with nominal existing device parameter
choices⁹ (ED Table 1). Materials with existing proof-of-concept integration in CMOS are chosen. At
nominal operating voltage of 10 mV, the ASL device operates with a response time ~0.5 ns, which is 
dominated by the delay of switching. The estimated energy-delay product of the device is given by the total 
joule energy supplied by the supply voltage. An example time domain switching dynamics of ASL device 
is shown in ED Fig. 1C, while corresponding spin current magnitudes are shown in ED Fig. 1D, and 
instantaneous power consumption in ED Fig. 1E. These plots are obtained by performing simulations of
SPICE-based models³⁴ of ASL. An exemplary equivalent circuit³⁴ is shown in ED Fig. 1F.
Since the nanomagnets are non-volatile, the information bit is preserved even when the power supply of the
ASL gate is turned off after its operation. It has already been shown²³ that it is energy-efficient to clock the
ASL gates such that they are turned ON only when they need to process information. Such a clocking
scheme largely eliminates static power consumption. Here we assume that the ASL gates are
clocked. Hence, the delay of any ASL gate is determined by the time duration for which the gate is turned
ON. We denote the gate delay by 𝑇𝑔.
The energy consumption in the ASL based Boolean inverter is given by
$$E_{\mathrm{inv}} = I_{\mathrm{supply}}^{2} R_{\mathrm{spin}} T_g \qquad (A1)$$
where 𝐼supply denotes the charge current supplied to the magnet and 𝑅spin is the series electrical resistance
of the magnet and the channel. The energy consumption of a 3-majority gate is given by
$$E_{\mathrm{MAJ3}} = 3 I_{\mathrm{supply}}^{2} R_{\mathrm{spin}} T_g$$
since the current 𝐼supply needs to be passed through three identical magnets to operate a 3-majority gate.
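As a quick numerical aid, the two energy expressions can be evaluated with a short Python sketch. The operating point in the example is hypothetical and chosen only to exercise the formulas; it is not taken from ED Table 1. The helper energy_delay_product is likewise an illustrative addition.

```python
def inverter_energy(i_supply: float, r_spin: float, t_gate: float) -> float:
    """Equation (A1): E_inv = I_supply^2 * R_spin * T_g (joules for SI inputs)."""
    return i_supply**2 * r_spin * t_gate


def majority3_energy(i_supply: float, r_spin: float, t_gate: float) -> float:
    """3-majority gate: the supply current drives three identical magnets."""
    return 3.0 * inverter_energy(i_supply, r_spin, t_gate)


def energy_delay_product(energy: float, t_gate: float) -> float:
    """Energy-delay product of a gate, E_g * T_g."""
    return energy * t_gate


if __name__ == "__main__":
    # Hypothetical values: 20 uA supply current, 500 ohm, 0.5 ns gate delay.
    e_inv = inverter_energy(20e-6, 500.0, 0.5e-9)
    print(e_inv, majority3_energy(20e-6, 500.0, 0.5e-9))
    print(energy_delay_product(e_inv, 0.5e-9))
```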
Estimation of Logic Error Rates using Fokker-Planck equation 
The response time of the ASL device is dominated by the switching time of the output nanomagnet, which is a
random variable due to the randomness in the initial direction of the magnetic moment when the switching
process starts. The phenomenological equation describing the dynamics of a nanomagnet with a magnetic
moment unit vector m̂, the modified Landau-Lifshitz-Gilbert (LLG) equation, is (see ED Table 1 for
parameters)
$$\frac{\partial \hat{m}}{\partial t} = -\gamma \mu_0 \left[\hat{m} \times \bar{H}_{\mathrm{eff}}\right] + \alpha \left[\hat{m} \times \frac{\partial \hat{m}}{\partial t}\right] + \frac{I_{\perp}}{e N_s} \qquad (A2)$$
where 𝛾 is the electron gyromagnetic ratio, 𝜇0 is the free space permeability, H̄eff is the effective magnetic
field due to material/geometric/surface anisotropy, 𝛼 is the Gilbert damping of the material, 𝐼⊥ is the
component of the vector spin current perpendicular to the magnetization m̂ leaving the nanomagnet, and 𝑁𝑠 is the
total number of Bohr magnetons per magnet. It has been proposed that stochasticity in the initial direction
can be equivalently modeled as an additive random noise field¹⁵. The noise field acts isotropically on the
magnet. The internal field is described as
$$\bar{H}_{\mathrm{eff}} = \bar{H}_{\mathrm{eff,m}} + H_{n,i}\,\hat{x} + H_{n,j}\,\hat{y} + H_{n,k}\,\hat{z}$$
where H̄eff,m denotes the mean effective magnetic field due to material/geometric/surface anisotropy,
while the first and second order moments of the random noise field components 𝐻𝑛,𝑖, 𝐻𝑛,𝑗 and 𝐻𝑛,𝑘 as a
function of time are given as
$$\langle H_{n,l}(t) \rangle = 0$$
$$\langle H_{n,l}(t)\, H_{n,m}(t') \rangle = \frac{2 \alpha k_B T}{\mu_0^{2} \gamma M_s V}\, \delta(t - t')\, \delta_{lm}$$
for 𝑚, 𝑙 ∈ {𝑖, 𝑗, 𝑘}, where 𝑀𝑠 and 𝑉 denote the saturation magnetization and volume of the nanomagnet,
respectively, and 𝑘𝐵 and 𝑇 denote Boltzmann constant and temperature, respectively. The initial conditions 
of the magnets should also be randomized to be consistent with the distribution of initial angles of magnet 
moments in a large collection of magnets. We used a mid-point integration method35 to apply the 
Stratonovich calculus while integrating the LLG equation to compute orientation of m̂ as a function of time. 
The ASL device is said to have made a switching error if the orientation of m̂ does not change appropriately 
within the time duration 𝑇𝑔 even though the charge current 𝐼supply is passed through the magnet to achieve 
the switching. Such switching errors at the device level cause logic errors in the computation. One can
empirically compute the probability of switching error (𝜖) via Monte Carlo simulations of the magnetization
by numerically integrating the LLG equation and randomly sampling the noise field.
Alternatively, here we use the Fokker-Planck equation, a dynamic equation governing the time-domain
evolution of the probability distribution of the direction of the magnetic moment¹⁵,
𝜕𝜌(𝜃, 𝜏)
𝜕𝑡
= −𝛻. 𝐽(𝜃, 𝜏) =  −
1
sin 𝜃
𝜕
𝜕𝜃
[sin𝜃  𝐽𝜃(𝜃, 𝜏)] (𝐴3) 
where 𝜌(𝜃, 𝜏) is probability density of m̂ (with angle and time as variables, see ED Fig. 2A & 2B, for 
example) and 𝐽𝜃(𝜃, 𝜏) is the flow of probability given by drift and diffusion components  
𝐽𝜃(𝜃, 𝜏) = 𝜌(𝜃, 𝜏)
𝜕𝜃
𝜕𝜏
− 𝐷
𝜕𝜌(𝜃,𝜏)
𝜕𝜏
 (𝐴4)  
The flow term is $\frac{\partial\theta}{\partial\tau} = (i - h - \cos\theta)\sin\theta$, where $i = I_{\text{supply}}/I_{\text{crit}}$ and 
$h = H/H_{kc}$ are the current and field driving terms for the probability, $D = kT/(2E_b)$ is the diffusion 
constant in terms of the thermal barrier $E_b$ of the magnet, and $I_{\text{crit}}$ denotes the minimum current 
required to switch the nanomagnet, referred to as the critical current of the nanomagnet. Also,  
$$\tau = \frac{\alpha\gamma\mu_0 H_{\text{eff}}}{1+\alpha^2}\, t$$
where 𝑡 denotes time. We have compared the Fokker-Planck model with Monte Carlo simulations of the 
magnetization obtained by numerical integration of the Landau-Lifshitz-Gilbert (LLG) equation (see validation 
in ED Fig. 2C). The logic error rate (𝜖) at any time 𝜏 can be obtained by integrating 𝜌(𝜃, 𝜏) with respect to 𝜃 
from 0 to 𝜋/2, assuming the initial orientation of the magnetic moment m̂ is 𝜃 = 0.  
One can derive an approximate analytical expression for the error rate 𝜖 as14  
$$\epsilon(i, T_g) = 1 - \exp\left[\frac{-\pi^2 (i-1)\,\dfrac{E_b}{4kT}}{i\,e^{\frac{2\alpha\gamma H_k\mu_0 T_g (i-1)}{1+\alpha^2}} - 1}\right] \tag{A5}$$
by assuming that the delay of the ASL gate is dominated by the time required for the output nanomagnet to 
switch.  
If 𝑖 ≫ 1, the 𝜖(𝑖, 𝑇𝑔) expression can be approximated as 
$$\epsilon(i, T_g) \approx 1 - \exp\left[\frac{-\pi^2\,\dfrac{E_b}{4kT}}{e^{\frac{2\alpha\gamma H_k\mu_0 T_g i}{1+\alpha^2}}}\right]$$
Denoting the energy delay product of the gate as Κ𝑔, we have, 
𝐸𝑔𝑇𝑔 = Κ𝑔 
and using equation (A1), we get,  
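$$\epsilon(\mathrm{K}_g) \approx 1 - \exp\left[\frac{-\pi^2\,\dfrac{E_b}{4kT}}{e^{\frac{2\alpha\gamma H_k\mu_0\sqrt{\mathrm{K}_g/\left(R_{\text{spin}} I_{\text{crit}}^2\right)}}{1+\alpha^2}}}\right] \tag{A6}$$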
Thus, the device error rate 𝜖 remains approximately constant for constant energy delay product Κ𝑔 for a 
given gate 𝑔. This is the reason why iso-𝜖 contours in Fig. 2A appear approximately as straight lines. For a 
given device, its error rate can be reduced by increasing its energy-delay product either via increasing its 
current or increasing its delay or some combination of the two. This insight is exploited by logic 
transformation techniques I-PDB and I-PDR to achieve error statistics shaping as described in the main 
text.  
The above analysis is for an ASL-based Boolean inverter. For a 3-input majority gate, the error rate corresponds 
to a supply current of either 3𝐼supply or 𝐼supply, depending on whether or not all three input nanomagnets have 
parallel magnetic moments. Here, however, we conservatively assume a constant error rate corresponding to 
𝐼supply, since such input dependence of the error rate is too complex to track in Monte Carlo simulations.  
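Equation (A5) and its large-𝑖 approximation can be explored numerically. The following is a minimal sketch, not the simulation code used in this work: the parameter values (damping, gyromagnetic ratio, anisotropy field, thermal barrier) are illustrative assumptions and are not taken from ED Table 1, and the function names are hypothetical.

```python
import math

# Illustrative (assumed) device parameters; not the values of ED Table 1.
ALPHA = 0.01        # Gilbert damping constant
GAMMA = 1.76e11     # gyromagnetic ratio in rad/(s*T)
MU0_HK = 0.1        # anisotropy field mu_0 * H_k in tesla
EB_OVER_KT = 40.0   # thermal barrier E_b / kT


def _rate():
    """Prefactor 2*alpha*gamma*mu_0*H_k / (1 + alpha^2) in 1/s."""
    return 2.0 * ALPHA * GAMMA * MU0_HK / (1.0 + ALPHA ** 2)


def error_rate(i, t_g):
    """Equation (A5): error rate for normalized current i > 1 and delay t_g (s)."""
    denom = i * math.exp(_rate() * t_g * (i - 1.0)) - 1.0
    x = math.pi ** 2 * (i - 1.0) * EB_OVER_KT / (4.0 * denom)
    return -math.expm1(-x)          # 1 - exp(-x), accurate for small x


def error_rate_large_i(i, t_g):
    """Large-i approximation: depends on i and t_g only through i * t_g."""
    x = math.pi ** 2 * EB_OVER_KT / 4.0 * math.exp(-_rate() * t_g * i)
    return -math.expm1(-x)
```

In the approximation the error rate depends on the current and the delay only through their product, which is what makes it a function of the energy-delay product alone in (A6).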
Design of Support Vector Machine (SVM) classifier  
Here we elaborate some of the techniques applied in the Shannon-inspired architecture of the SVM 
implementation. 
Constrained I-PDB: As described in the main text, in I-PDB the gates on the non-critical paths are made to 
operate at a correspondingly larger delay but at constant energy, which reduces their error rate. 
However, if the main block is sufficiently complex, a few primary paths may be very short compared to the 
critical paths and hence allow a very large increase in gate delay. In inference applications, reducing the 
error rate below $10^{-6}$ to $10^{-7}$ may not lead to any further robustness benefits (since the feature 
noise starts dominating the final system-level performance). In such cases, one can constrain the I-PDB 
technique to increase the gate delay while decreasing the switching energy and keeping the error rate 
sufficiently low. The impact of constrained I-PDB is illustrated in ED Fig. 3A. Suppose a given gate $g_1$ is 
initially at $(E_{g_1}, T_{g_1}, \epsilon_{g_1})$ and application of I-PDB results in a delay of $T^{*}_{g_1}$ and a 
reduction of its error rate to $\epsilon^{*}_{g_1}$. Now, if the corresponding inference kernel implementation 
preserves its system-level performance even if all its gates have error rates of $\epsilon'_{g_1}$, the I-PDB can 
be constrained to increase the delay to $T^{*}_{g_1}$ while holding the error rate at $\epsilon'_{g_1}$ and reducing 
the energy to $E'_{g_1}$ ($< E_{g_1}$). Thus constrained I-PDB can achieve a more energy-efficient design than 
I-PDB while maintaining the system-level performance.  
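The trade made by constrained I-PDB can be sketched with equation (A6). The code below is only an illustration under stated assumptions: it uses normalized units in which the exponent of (A6) reduces to the square root of the energy-delay product, it assumes 𝐸𝑏 = 40𝑘𝑇, and it does not check that the gate stays in the 𝑖 ≫ 1 regime in which (A6) holds. All function names are hypothetical.

```python
import math

# pi^2 * E_b / (4 kT) with an assumed thermal barrier E_b = 40 kT.
C = math.pi ** 2 * 40.0 / 4.0


def error_from_k(k):
    """Equation (A6) in normalized units: error rate for energy-delay product k."""
    return -math.expm1(-C * math.exp(-math.sqrt(k)))


def k_for_error(eps):
    """Invert (A6): the energy-delay product that yields error rate eps."""
    return math.log(C / -math.log1p(-eps)) ** 2


def plain_ipdb(energy, delay, chi):
    """Stretch the delay by chi at constant energy; returns (energy, delay, error)."""
    new_delay = chi * delay
    return energy, new_delay, error_from_k(energy * new_delay)


def constrained_ipdb(energy, delay, chi, eps_target):
    """Stretch the delay by chi, hold the error at eps_target, and lower the energy."""
    new_delay = chi * delay
    _, _, eps_star = plain_ipdb(energy, delay, chi)
    if eps_star >= eps_target:
        # Plain I-PDB does not overshoot the target, so there is no energy to recover.
        return energy, new_delay, eps_star
    new_energy = k_for_error(eps_target) / new_delay
    return new_energy, new_delay, eps_target
```

For example, `constrained_ipdb(18.0, 2.0, 12.0, 1e-6)` returns the same stretched delay as `plain_ipdb(18.0, 2.0, 12.0)` but a lower energy, because the plain variant drives the error rate below the target.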
Reduced-precision embedded estimator (RPE-EST):  Consider a multiplication of two 8-bit binary 
operands, 𝑎8𝑎7 …𝑎1 and 𝑏8𝑏7 …𝑏1, where 𝑎𝑖 , 𝑏𝑖 ∈ {0,1}∀𝑖 ∈ {1,… ,8} as follows:  
𝑚𝑜 = [𝑐16 …𝑐2 𝑎1𝑏1] = (𝑎8𝑎7 …𝑎1) × (𝑏8𝑏7 …𝑏1) 
where 𝑚𝑜 is the 16-bit binary output with its individual bits denoted as 𝑐16, … , 𝑐2 ∈ {0,1}. The corresponding 
architecture of the Baugh-Wooley multiplier is shown in ED Fig. 3B, where $p_{ij} = a_i b_j$, $\bar{p}_{i8} = a_i \bar{b}_8$ and 
$\bar{p}_{8i} = \bar{a}_8 b_i$ ∀𝑖, 𝑗 ∈ {1,… ,7}. If the input binary numbers are quantized to 5 bits, we get,  
𝑚𝑜5 = [𝑑10 …𝑑2 𝑎5𝑏5] =  (𝑎8𝑎7𝑎6𝑎5𝑎4) × (𝑏8𝑏7𝑏6𝑏5𝑏4) 
where 𝑚𝑜5 is 10-bit binary output with its individual bits denoted as 𝑑10, … , 𝑑2 ∈ {0,1}. 
Now, we use 𝑚𝑜5 as an estimate of 𝑚𝑜. However, instead of adding a separate 5-bit multiplier, we observe 
that 𝑚𝑜5 can be obtained by taking intermediate signals from 8 × 8 multiplier architecture as shown in ED 
Fig. 3B. We need to add four more full adders (shown by dotted green lines in ED Fig. 3B) in order to 
compute appropriate carry ripple. Thus, the estimate of actual output 𝑚𝑜 is obtained at minimal  overhead. 
In addition, since most of the green-colored full adders are on the non-critical paths, the error in 𝑚𝑜5 is 
dominated by quantization error having a dense distribution over a shorter range, as desired for effective 
error compensation. The output 𝑚𝑜5 obtained this way is exact if both (𝑎8𝑎7 …𝑎1) and (𝑏8𝑏7 …𝑏1) are 
non-negative. 
In this SVM implementation, the feature vector 𝒙 is unsigned while the weight vector 𝒘 is signed. Hence, 
an appropriate constant (independent of component magnitudes of vector 𝒘) is added in the final estimator 
output via the additional CSA, which adds individual estimates of BWM outputs in the EC block. 
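The arithmetic behind the reduced-precision estimate can be illustrated at the integer level. This is a sketch under assumptions, not a model of the Baugh-Wooley hardware: it covers non-negative operands only, and it assumes the 10-bit product of the truncated operands is rescaled by a left shift of 6 bits so that it can be compared with the full 16-bit product. The function names are hypothetical.

```python
def reduced_precision_estimate(a, b, width=8, kept_bits=5):
    """Estimate a*b from the kept_bits most significant bits of each operand."""
    limit = 2 ** (width - 1)
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("sketch covers non-negative two's complement operands only")
    shift = width - kept_bits
    m_o5 = (a >> shift) * (b >> shift)   # product of the truncated operands
    return m_o5 << (2 * shift)           # assumed rescaling to the full-product weight


def worst_case_error(width=8, kept_bits=5):
    """Largest quantization error a*b - estimate over all non-negative operands."""
    limit = 2 ** (width - 1)
    return max(
        a * b - reduced_precision_estimate(a, b, width, kept_bits)
        for a in range(limit)
        for b in range(limit)
    )
```

For 𝑎 = 100 and 𝑏 = 50 the estimate is 4608 against an exact product of 5000. The quantization error is never negative and is bounded, which is the dense, short-range error distribution the estimator relies on.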
Current redistribution in CSA and dimension reordering in the MB: Since the effectiveness of the I-
PDB and I-PDR techniques increases with increasing path delay diversity in the logic network, we choose 
serial CSA in this design. We reorder the dimensions of weight vector 𝒘 such that the products 𝑤𝑖𝑥𝑖 are 
added in the ascending order of the value of the dimension 𝑤𝑖, i.e. 𝑤𝑗𝑥𝑗 is added earlier than 𝑤𝑖𝑥𝑖 if 𝑤𝑗 <
𝑤𝑖. This enables I-PDB to compute the products of larger dimensions of 𝒘 with more reliability (since 
corresponding primary paths are shorter and hence consist of gates operating at lower error rate). Note that 
this dimension reordering needs to be performed only once. In serial CSA, since all paths have very similar 
path delays (path delay difference only arises due to the RCA at the last stage), we apply supply current 
redistribution to the CSA in the MB. In particular, we lower the supply current for gates computing the 
MSBs while increasing it for the gates computing LSBs in the CSA. It is to be noted that, unlike delay 
redistribution, there is no constraint on increasing the supply current in the current redistribution. This 
effectively enables increase in the switching energy of gates in the paths computing LSBs in the CSA until 
sufficiently good error distribution sparsity is achieved. 
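The one-time dimension reordering described above amounts to sorting the weight vector and applying the same permutation to every feature vector. A minimal sketch, with a hypothetical function name and toy values:

```python
def reorder_dimensions(weights, features):
    """Sort dimensions by ascending weight and permute the features to match."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    sorted_weights = [weights[i] for i in order]
    sorted_features = [features[i] for i in order]
    return sorted_weights, sorted_features, order


w = [3, -2, 5, 0]
x = [1, 2, 3, 4]
w_sorted, x_sorted, order = reorder_dimensions(w, x)
# The dot product, and hence the SVM decision, is unchanged by the permutation.
dot_before = sum(wi * xi for wi, xi in zip(w, x))
dot_after = sum(wi * xi for wi, xi in zip(w_sorted, x_sorted))
```

Because the weights are fixed after training, `order` can be computed once and reused for every input.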
Simulation methodology for S-ASL based architectures and systems:  
We carry out simulations of Fokker-Planck equations to estimate 𝜖(𝐸, 𝑇𝑔) for different values of 𝐸 and 𝑇𝑔 
for an ASL-based Boolean inverter, and generate iso-𝜖 contours in Fig. 2A using MATLAB. Once this data 
is obtained, all subsequent architecture and system level simulations are carried out in MATLAB. We then 
separately estimate the normalized delay factors after applying I-PDB and I-PDR (as described in 
supplementary information) for different blocks, such as Ripple Carry Adder (RCA), Baugh-Wooley 
multiplier (BWM), Carry Save Adder (CSA). For example, the normalized delay factors for RCA 
(architecture in ED Fig. 4) are shown in ED Table 2, where the delay factor of 1 corresponds to the error 
rate of 10%. We then design S-ASL based Shannon-inspired architecture implementing SVM using the 
BWM and CSA as shown in Fig. 5 and apply I-PDB and I-PDR techniques as described earlier in this 
section, and in supplementary information. We thus obtain (𝜖, 𝑇𝑔, 𝐸) triplet for every gate in the 
architecture. Then, we carry out Monte Carlo simulations by passing extracted input feature vectors through 
the architecture (with each gate making error randomly with corresponding 𝜖 probability) to estimate the 
final system-level performance in terms of true positive rate 𝑝𝑇𝑃 and false alarm rate 𝑝𝐹𝐴. The decision 
delay is the sum of gate delays on the critical path, while total energy/decision is the sum of switching 
energies of all gates. Similarly, we estimate the system-level performance of the serial architecture, where 
all gates have identical delay and error rate. The 𝑝𝑇𝑃 and 𝑝𝐹𝐴 for 20 nm LV CMOS are the same as those of the 
error-free computation. We follow the benchmarking methodology36 to estimate the energy and delay of 
the CMOS implementation and consider the energy-delay curve of a 20 nm LV CMOS FO4 inverter9, as shown 
in Fig. 2A. We assume an activity factor of 50% for the CMOS implementation. 
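The Monte Carlo step of this methodology rests on the 𝜖-noisy gate model: each gate computes its ideal output and then flips it with its own probability 𝜖. The sketch below shows this for a hypothetical two-gate majority network written in Python for illustration; the simulations reported here were carried out in MATLAB on the full architecture, and the network, names, and seed are assumptions.

```python
import random


def majority3(a, b, c):
    """Error-free 3-input majority gate."""
    return int(a + b + c >= 2)


def noisy_gate(ideal_output, eps, rng):
    """Epsilon-noisy gate model: flip the ideal output bit with probability eps."""
    return ideal_output ^ (rng.random() < eps)


def estimate_output_error(eps_by_gate, trials, seed=0):
    """Empirical output error rate of MAJ(MAJ(a, b, c), d, e) with noisy gates."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        a, b, c, d, e = (rng.randint(0, 1) for _ in range(5))
        ideal = majority3(majority3(a, b, c), d, e)
        g1 = noisy_gate(majority3(a, b, c), eps_by_gate[0], rng)
        g2 = noisy_gate(majority3(g1, d, e), eps_by_gate[1], rng)
        errors += (g2 != ideal)
    return errors / trials
```

An error in the first gate reaches the output only when the other two inputs of the second gate disagree, so the same 𝜖 placed on different gates yields different output error rates. This is the kind of dependence the gate-level (𝜖, 𝑇𝑔, 𝐸) triplets are meant to capture.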
References:  
1. Turing, A. M. On computable numbers, with an application to the Entscheidungsproblem. J. of Math 
58, 5 (1936). 
2. Von Neumann, J. & Godfrey, M. D. First Draft of a Report on the EDVAC. IEEE Annals of the 
History of Computing 15, 27–75 (1993). 
3. Moore, G. E. Cramming more components onto integrated circuits. Readings in computer 
architecture 56, (2000). 
4. Kuhn, K. J. & others Considerations for ultimate CMOS scaling. IEEE Trans. Electron Devices 59, 
1813–1828 (2012). 
5. Mannhart, J. & Schlom, D. Oxide interfaces—an opportunity for electronics. Science 327, 1607–
1611 (2010). 
6. Ieong, M., Doris, B., Kedzierski, J., Rim, K. & Yang, M. Silicon device scaling to the sub-10-nm 
regime. Science 306, 2057–2060 (2004). 
7. Wolf, S. et al. Spintronics: a spin-based electronics vision for the future. Science 294, 1488–1495 
(2001). 
8. Behin-Aein, B., Datta, D., Salahuddin, S. & Datta, S. Proposal for an all-spin logic device with built-
in memory. Nature nanotechnology 5, 266–270 (2010). 
9. Manipatruni, S., Nikonov, D. E. & Young, I. A. Material Targets for Scaling All-Spin Logic. 
Physical Review Applied 5, 014002 (2016). 
10. Ionescu, A. M. & Riel, H. Tunnel field-effect transistors as energy-efficient electronic switches. 
Nature 479, 329–337 (2011). 
11. Salahuddin, S. & Datta, S. Use of negative capacitance to provide voltage amplification for low 
power nanoscale devices. Nano letters 8, 405–410 (2008). 
12. Manipatruni, S., Nikonov, D. E. & Young, I. A. Spin-Orbit Logic with Magnetoelectric Nodes: A 
Scalable Charge Mediated Nonvolatile Spintronic Logic. arXiv preprint arXiv:1512.05428 (2015). 
13. Wang, J. et al. Epitaxial BiFeO3 multiferroic thin film heterostructures. Science 299, 1719–1722 
(2003). 
14. Butler, W. et al. Switching distributions for perpendicular spin-torque devices within the macrospin 
approximation. IEEE Transactions on Magnetics 48, 4684–4700 (2012). 
15. Brown Jr, W. F. Thermal fluctuations of a single-domain particle. Journal of Applied Physics 34, 
1319–1320 (1963). 
16. Shannon, C. E. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing 
and Communications Review 5, 3–55 (2001). 
17. Von Neumann, J. Probabilistic logics and the synthesis of reliable organisms from unreliable 
components. Automata studies 34, 43–98 (1956). 
18. Hajek, B. & Weller, T. On the maximum tolerable noise for reliable computation by formulas. IEEE 
Transactions on Information theory 37, 388–391 (1991). 
19. Pippenger, N. Reliable computation by formulas in the presence of noise. IEEE Transactions on 
Information Theory 34, (1988). 
20. Evans, W. S. & Schulman, L. J. On the maximum tolerable noise of k-input gates for reliable 
computation by formulas. IEEE Transactions on Information Theory 49, 3094–3098 (2003). 
21. Tamir, Y., Tremblay, M. & Rennels, D. A. The Implementation and Application of Micro Rollbacks 
in Fault Tolerant VLSI Systems. (UCLA, Computer Science Department: 1988). 
22. Gonugondla, S. K., Shim, B. & Shanbhag, N. R. Perfect error compensation via algorithmic error 
cancellation. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing 
(ICASSP) 966–970 (2016). 
23. Calayir, V., Nikonov, D. E., Manipatruni, S. & Young, I. A. Static and clocked spintronic circuit 
design and simulation with performance analysis relative to CMOS. IEEE Transactions on Circuits 
and Systems I: Regular Papers 61, 393–406 (2014). 
24. Chang, S.-C., Manipatruni, S., Nikonov, D. E., Young, I. A. & Naeemi, A. Design and analysis of si 
interconnects for all-spin logic. IEEE Transactions on Magnetics 50, 1–13 (2014). 
25. Chang, S.-C. et al. Design and analysis of copper and aluminum interconnects for all-spin logic. 
IEEE Transactions on Electron Devices 61, 2905–2911 (2014). 
26. Manipatruni, S., Nikonov, D. E. & Young, I. A. All-spin nanomagnetic state elements. Applied 
Physics Letters 103, 063503 (2013). 
27. Sarpeshkar, R. Analog versus digital: extrapolating from electronics to neurobiology. Neural 
computation 10, 1601–1638 (1998). 
28. Kubo, R. The fluctuation-dissipation theorem. Reports on progress in physics 29, 255 (1966). 
29. Apalkov, D. & Visscher, P. Spin-torque switching: Fokker-Planck rate calculation. Physical Review 
B 72, 180405 (2005). 
30. Shim, B. Error-Tolerant Digital Signal Processing. (2005). 
31. Shoeb, A. H. Application of machine learning to epileptic seizure onset detection and treatment. 
(2009). 
32. Goldberger, A. L. et al. Physiobank, physiotoolkit, and physionet components of a new research 
resource for complex physiologic signals. Circulation 101, e215–e220 (2000). 
33. Tombros, N., Jozsa, C., Popinciuc, M., Jonkman, H. T. & Van Wees, B. J. Electronic spin transport 
and spin precession in single graphene layers at room temperature. Nature 448, 571–574 (2007). 
34. Bonhomme, P. et al. Circuit simulation of magnetization dynamics and spin transport. IEEE 
Transactions on Electron Devices 61, 1553–1560 (2014). 
35. d'Aquino, M., Serpico, C. & Miano, G. Geometrical integration of Landau–Lifshitz–Gilbert 
equation based on the mid-point rule. Journal of Computational Physics 209, 730–753 (2005). 
36. Nikonov, D. & Young, I. Benchmarking of Beyond-CMOS Exploratory Devices for Logic 
Integrated Circuits. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 
1, 3–11 (2015). 
  
Acknowledgments: This work was supported in part by Systems on Nanoscale Information fabriCs 
(SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.  
 
 
 
 
 
 
 
Figure 1: Shannon all-spin logic (S-ASL) gates trade-off the switching energy 𝐸 with delay 𝑇𝑔 and gate-
level error rate 𝜖.  A Shannon-inspired computing system is able to operate with very low system-
level error rates even when the bulk of the computation (main block) is designed using high 𝜖  (low 
𝐸)  S-ASL gates. We shape the architectural level error distribution at the output of the main block 
using logic transformations and the 𝐸 vs.  𝑇𝑔 vs. 𝜖 trade-off in S-ASL gates so as to enable a low-
complexity statistical error compensator (an estimator and a fusion block) to be designed. The error 
compensator is able to reduce the system-level error rate by more than two orders of magnitude by 
correcting the computational errors at the main block output. In doing so, substantial gains in both 
energy consumption and system accuracy are obtained over conventional ASL implementations. 
 
 
Figure 2: The Shannon ASL (S-ASL) gate: (A) a plot showing the trade-off between the gate error rate 𝜖, 
the switching energy 𝐸, and the delay 𝑇𝑔.  The energy gap between an (error-free) ASL gate and an S-ASL 
gate with 𝜖 = 0.4 is approximately 40 ×. Error-free ($\epsilon < 10^{-14}$) ASL gates needed by the von Neumann 
architecture can be obtained from S-ASL gates operating at 𝜖 = 0.4 by either increasing the switching 
energy by 40 × (keeping delay fixed) or by increasing the delay by 100 × (keeping the switching energy 
fixed) or some combination of the two. The proposed Shannon-inspired statistical computing framework 
enables operation using S-ASL gates with 𝜖 = 0.4. (B) An 𝜖-noisy gate model of an S-ASL AND gate 
showing it as being composed of an error-free AND gate followed by a virtual gate that emulates the 
stochastic behavior of the spin device. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3: Shannon-inspired statistical computing: (A) an architecture showing statistical error 
compensation, (B) an illustration of the disparity between the PMFs of 𝜂 and 𝑒 necessary for low complexity 
error compensation, and (C) the architecture of the fusion block. 
 
 
 
 
 
 
 
 
 
Figure 4: Engineering the distribution of error in a ripple carry adder (RCA) constructed from S-ASL gates: 
(A) with uniform delay assignment, (B) with inter-path delay balancing (I-PDB) technique that increases 
the delay of gates in the non-critical path (at constant energy), (C) with intra-path delay redistribution (I-
PDR) technique to reassign gate delays along the critical path (at constant energy). In each subfigure, the 
color of the box corresponds to a particular error rate regime in Fig. 2A, in which the gate is operating. All 
three architectures consume identical switching energy and operate at identical throughput corresponding to 
average device error rate 𝜖cp−avg of 10%. The 𝜂 distributions are obtained for equivalent 15-bit RCA 
architecture, and corresponding actual delay factors are given in ED Table 2. 
 
 
 
Figure 5: The spin-based support vector machine (SVM) classifier using S-ASL gates: (A) the conventional 
serial architecture with uniform delay assignment, and (B) a Shannon-inspired architecture. 
 
 
 
 
 
 
 
 
 
 
 
Figure 6: True positive (TP) rate of the spin-based support vector machine (SVM) classifier with respect to: (A) 
𝜖cp−avg, which is the average device error rate, and (B) total energy consumption per decision at constant 
false alarm rate of 1%.  
 
 
 
 
 
 
 
Extended Data: Figures and Tables 
Extended Data Figure 1: All Spin Logic (ASL) inverter: (A) A schematic consisting of two nanomagnets 
interacting via spin channels, (B) cross section schematic, (C) switching of input and output magnetization 
at supply voltage of 10mV, (D) corresponding spin currents during the switching operation, (E) 
instantaneous power consumption during the switching operation, and (F) equivalent RC circuit. 
 
 
 
 
 
 
 
 
 
 
 
Extended Data Figure 2: Time domain evolution of probability distribution 𝑝(𝜃, 𝑡) of magnetic moment 
of output magnet, and corresponding error rates (𝜖) observed using Fokker-Planck (FP) equation : (A) 
Probability function evolution for Eb = 40kT, for time t ≤ 10ns, (B) Probability function evolution for 
Eb = 20kT, for time t ≤ 0.1 ns, and (C) Comparison of the 𝜖 for various values of 𝐸𝑏 obtained using direct 
integration of the stochastic LLG equation and FP equation. 
 
 
 
Extended Data Table 1: Device parameters considered in the simulations. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Extended Data Figure 3: Details about the design of support vector machine (SVM) implementation: (A) 
Illustration of the difference between the I-PDB and the constrained I-PDB, and (B) Architecture of 8 × 8 
Baugh-Wooley multiplier, where each 𝑝𝑖𝑗 denotes a partial product 𝑎𝑖𝑏𝑗, and 𝑎𝑖 and 𝑏𝑖 denote the bits of 
8-bit binary operands. The green shaded area is a 5 × 5 Baugh-Wooley multiplier operating on first 5 input 
MSBs (after including 4 additional full adder blocks shown in green dotted line).  The output 
[𝑑10 𝑑9 𝑑8 𝑑7 𝑑6 𝑑5 𝑑4 𝑑3 𝑑2 𝑎5𝑏5] is the output of RPE-EST block as denoted in the main text Fig. 5.  
 
 
 
 
 
 
 
 
 
Extended Data Figure 4: Architecture of ripple carry adder (RCA). 
 
 
 
 
 
 
 
 
 
 
 
 
 
Extended Data Table 2: The normalized delay factors for all gates in RCA (indexed as shown in ED Fig. 
4) after (A) I-PDB, and (B) I-PDB followed by I-PDR. Here, normalized delay factor of 1 corresponds to 
error rate of 10%, and all majority gates and inverters consume same energy 𝐸maj and 𝐸inv, respectively, 
with 𝐸maj = 3𝐸inv. These relative delay numbers were used to obtain distributions in main text Fig. 4. 
 
Extended Data Figure 5: Illustration of the intra-path delay redistribution (I-PDR) technique: (A) an exemplary path 
of logic gates, and (B) a bar plot of the corresponding final redistributed gate delays, where all gates had equal delay 𝑇 
before the redistribution. 
 
 
 
Supplementary Information: 
Error Statistics Shaping via I-PDB and I-PDR techniques 
In this section, we describe general algorithms for I-PDB and I-PDR, and explain how those can be applied 
to achieve error statistics shaping for any given logic network.  
Preliminary Definitions and Notation:  
Consider a combinational logic network (𝑆) consisting of ASL gates and having an 𝑙-bit output. Let the total 
number of gates be 𝑁𝑔 and let each gate be denoted as 𝑔𝑗, 𝑗 ∈ {1,… ,𝑁𝑔}. Let the delay of each gate be denoted 
as 𝑇𝑔𝑗 and its switching energy consumption as 𝐸𝑔𝑗. For the given logic network 𝑆, the total switching energy 
consumption $\mathcal{E}$ to compute one 𝑙-bit output word is given as:  
$$\mathcal{E}\left(\mathrm{K}_{g_1},\dots,\mathrm{K}_{g_{N_g}}, T_{g_1},\dots,T_{g_{N_g}}\right) = \sum_{j=1}^{N_g} E_{g_j} = \sum_{j=1}^{N_g} \frac{\mathrm{K}_{g_j}}{T_{g_j}}$$
Since energy can always be reduced at the expense of higher delay, we enforce an additional throughput 
constraint (𝑇𝑡ℎ) at the system-level. Hence all output bits need to be computed within 𝑇𝑡ℎ delay after the 
input bits are available.  
In combinational circuits, there is no feedback. When each gate is considered as a node and each connection 
as an edge, the combinational circuit forms a directed acyclic graph (DAG). Thus 𝑆 =  (𝑉, 𝐸), where 𝑉 is the 
set of nodes (gates) and 𝐸 is the set of edges (connections). While each logic gate has its own inputs and output, 
we refer to the inputs and outputs of the network as primary inputs and primary outputs and assume that they 
are latched at a single clock edge. All logic gates having one of the primary inputs as an input are 
referred to as primary input nodes. Similarly, primary output nodes are the logic gates whose output is a 
primary output. 
We define a primary path in this DAG as a chain of cascaded nodes starting at one of the primary input 
nodes and terminating at one of the primary output nodes, while a path as any chain of cascaded nodes. 
Each path has a source node (a logic gate at the starting of the path) and a destination node (the last logic 
gate in the path). Each primary path is a subgraph 𝜌𝑛 = (𝐺𝑛, 𝐶𝑛) and hence has an associated set of nodes 
𝐺𝑛 ⊂ 𝑉 defined as follows: 
𝐺𝑛 = {𝑔𝑖| 𝑔𝑖 is in path 𝜌𝑛 ∀ 𝑖 } ∀ 𝑛 ∈ {1,… ,𝑁𝑝}. 
where 𝑁𝑝 is the total number of primary paths. Similarly, 𝐶𝑛 is the set of connections between the gates 𝑔𝑖 that 
constitute the primary path 𝜌𝑛. Let $m = \arg\max_n |G_n|$. Then 𝜌𝑚 is referred to as a critical path. Without 
loss of generality, let us assume that 𝜌1 is a critical path, i.e. 𝜌1 = 𝜌cp. We define the path delay as the sum of the 
delays of the nodes in the path. We denote the critical path delay by 𝑇𝑐𝑝. If all nodes have equal delays, 𝑇𝑐𝑝 is 
the maximum path delay.  
We define all set operations on paths as corresponding operations on their individual node and edge sets. 
For example, 𝜌𝑛 ∩ 𝜌𝑚 ≜ (𝐺𝑛 ∩ 𝐺𝑚, 𝐶𝑛 ∩ 𝐶𝑚 ), 𝜌𝑛 ⊂ 𝜌𝑚 ≜ (𝐺𝑛 ⊂ 𝐺𝑚, 𝐶𝑛 ⊂ 𝐶𝑚 ) 
Each primary path 𝜌𝑛 can be partitioned into 𝐿 disjoint subsets as follows:  
𝜌𝑛 = {𝜌𝑛,1|𝜌𝑛,2| … |𝜌𝑛,𝐿} 
where each partition 𝜌𝑛,𝑙  ∀ 𝑙 ∈ {1,… , 𝐿} has following properties  
(1) Each partition 𝜌𝑛,𝑙 is a path that entirely lies in the primary path 𝜌𝑛.  
(2) Either 𝜌𝑛,𝑙 ⊂ 𝜌1 or 𝜌𝑛,𝑙 ∩ 𝜌1 = 𝜙 (empty set) 
 
Without loss of generality, let us assume that partitions are indexed such that output of path 𝜌𝑛,1 is primary 
output and output of each path 𝜌𝑛,𝑙 is input to path 𝜌𝑛,𝑙−1. We refer to this indexing as ‘ordered indexing’. 
Lemma 1:  The number of partitions of a primary path 𝜌𝑛 is minimized to 𝐿 if the following condition is 
satisfied: 
If 𝜌𝑛,𝑙 ⊂ 𝜌1 then 𝜌𝑛,𝑙+1 ∩ 𝜌1 = 𝜙 ∀ 𝑙 ∈ {1,… , 𝐿 − 1} 
Proof:  
By contradiction, suppose 𝜌𝑛,𝑙 ⊂ 𝜌1 and 𝜌𝑛,𝑙+1 ∩ 𝜌1 ≠ 𝜙 for some 𝑙.  
By property (2) above, it follows that 𝜌𝑛,𝑙+1 ⊂ 𝜌1.  
Then the number of partitions can be reduced to 𝐿 − 1, since a new partition $\rho^{*}_{n,l} = (\rho_{n,l} \cup \rho_{n,l+1})$ 
can be defined by merging 𝜌𝑛,𝑙 and 𝜌𝑛,𝑙+1, which contradicts the minimality of 𝐿. 
Now we define an input critical path ($\rho_S^{\text{imax}}(g_i)$) for each node 𝑔𝑖 in DAG 𝑆 as the path having the largest 
number of nodes, such that its source node is a primary input node and its destination node is the node 𝑔𝑖. 
Similarly, the output critical path ($\rho_S^{\text{omax}}(g_i)$) for node 𝑔𝑖 is defined as the path having the largest number 
of nodes, such that its source node is 𝑔𝑖 and its destination node is a primary output node. If there are multiple 
input (output) critical paths for a certain gate 𝑔𝑖, they are denoted as $\rho_{j,S}^{\text{imax}}(g_i)$ ($\rho_{j,S}^{\text{omax}}(g_i)$), with the index 𝑗 
used to distinguish between them. This terminology will be used in subsequent subsections to describe 
the I-PDB and I-PDR algorithms, which can be applied to any given logic network, as well as their properties. 
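The delays of the input and output critical paths of every gate can be obtained with two passes over a topological ordering of the DAG. The sketch below is an illustration under assumptions: the network is given as a hypothetical dictionary mapping each gate to its fan-out gates, gates without fan-in are treated as primary input nodes and gates without fan-out as primary output nodes, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_delays(successors, delay):
    """Return (t_imax, t_omax): delays of the input and output critical paths.

    successors maps every gate to the list of gates it drives; every gate,
    including those with no fan-out, must appear as a key.
    """
    predecessors = {g: [] for g in successors}
    for g, outs in successors.items():
        for h in outs:
            predecessors[h].append(g)
    order = list(TopologicalSorter(predecessors).static_order())
    t_imax, t_omax = {}, {}
    for g in order:                       # fan-in side first
        t_imax[g] = delay[g] + max((t_imax[p] for p in predecessors[g]), default=0)
    for g in reversed(order):             # then the fan-out side
        t_omax[g] = delay[g] + max((t_omax[s] for s in successors[g]), default=0)
    return t_imax, t_omax


# Toy network: a four-gate chain g1 -> g2 -> g3 -> g4 with a side input g5 -> g4.
net = {"g1": ["g2"], "g2": ["g3"], "g3": ["g4"], "g4": [], "g5": ["g4"]}
unit = {g: 1 for g in net}
t_imax, t_omax = critical_path_delays(net, unit)
t_cp = max(t_imax.values())
```

Both delays include the delay of the gate itself, so the longest primary path through a gate has delay `t_imax[g] + t_omax[g] - delay[g]`.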
I-PDB algorithm and its properties:  
In I-PDB, the delays of gates on the non-critical path are increased such that every gate lies on at least one 
path having delay equal to the critical path delay, i.e. 𝑇𝑐𝑝. After applying I-PDB, we refer to the resulting 
delay assignment as a balanced delay assignment. Its mathematical definition is as follows: 
Definition 1:  Any given delay assignment $(T'_{g_1}, \dots, T'_{g_{N_g}})$ is referred to as a balanced delay assignment if, 
for every gate $g_i$, its gate delay $T'_{g_i}$ is given as: 
$$T'_{g_i} = T_{cp} - T_S^{\text{imax}}(g_i) - T_S^{\text{omax}}(g_i) + 2T'_{g_i}$$
or equivalently 
$$T'_{g_i} = T_S^{\text{imax}}(g_i) + T_S^{\text{omax}}(g_i) - T_{cp} \tag{A7}$$
where $T_S^{\text{imax}}(g_i)$ and $T_S^{\text{omax}}(g_i)$ are the delays of $\rho_S^{\text{imax}}(g_i)$ and $\rho_S^{\text{omax}}(g_i)$, respectively. 
If a delay assignment of a logic network is balanced, it is impossible to increase the delay of a gate without 
decreasing the delay of another gate while keeping the critical path delay at 𝑇𝑐𝑝. There are many possible 
balanced delay assignments for a given logic network.  
For logic networks consisting of a large number of gates, finding a balanced delay state directly by 
inspection may not be possible. Hence, we derive a general I-PDB algorithm that can be applied to any 
logic network. This algorithm finds a balanced delay assignment by starting with a delay assignment in which 
all gate delays are equal and then updating individual gate delays. Lemma 2 investigates the properties of 
$\rho_S^{\text{imax}}(g_i)$ and $\rho_S^{\text{omax}}(g_i)$ for any gate 𝑔𝑖 in order to determine one particular sequence for updating gate 
delays such that later gate delay updates do not violate the condition in (𝐴7) for gates whose delays 
have been assigned or updated earlier. Thus, one can find a balanced delay assignment in one iteration of 
updating individual gate delays in the I-PDB algorithm.  
Lemma 2: Let the delays of all gates be equal and let gate 𝑔𝑖 be any gate in a path 𝜌𝑛 in the logic network 𝑆. 
Then,  
$$\rho_S^{\text{imax}}(g_i) \subset \varrho(\rho_n) \quad \text{and} \quad \rho_S^{\text{omax}}(g_i) \subset \varrho(\rho_n)$$
where a set of paths 𝜚(𝜌𝑛) is defined as 
𝜚(𝜌𝑛) = {primary path 𝜌𝑟 such that |𝐺𝑟| ≥ |𝐺𝑛| ∀ 𝑟 ∈ {1,… ,𝑁𝑝}}. 
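Proof:   
First, we will prove $\rho_S^{\text{imax}}(g_i) \subset \varrho(\rho_n)$, where 
$$\rho_S^{\text{imax}}(g_i) \subset \varrho(\rho_n) \equiv \left[\exists k \text{ such that } \rho_k \in \varrho(\rho_n),\ G_S^{\text{imax}}(g_i) \subset G_k \text{ and } C_S^{\text{imax}}(g_i) \subset C_k\right]$$
Suppose that $\rho_S^{\text{imax}}(g_i) \not\subset \varrho(\rho_n)$. Let the primary path $\rho_m$ be a concatenation of path $\rho_S^{\text{imax}}(g_i)$ and path 
$\{\rho_S^{\text{omax}}(g_i) \setminus g_i\}$. Thus, $\rho_m \notin \varrho(\rho_n)$.  
Now $|G_m|$ is the maximum value among all the paths that contain gate 𝑔𝑖, by the definitions of the input critical 
path and the output critical path. Hence, $|G_m| \geq |G_n|$.  
Thus, we have $\rho_m \notin \varrho(\rho_n)$ even though $|G_m| \geq |G_n|$. This contradicts the definition of 𝜚(𝜌𝑛). Hence, 
we prove that $\rho_S^{\text{imax}}(g_i) \subset \varrho(\rho_n)$. The proof of $\rho_S^{\text{omax}}(g_i) \subset \varrho(\rho_n)$ follows by the same argument.  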
 
If there are multiple input or output critical paths for the gate 𝑔𝑖, lemma 2 holds for each of them. 
Without loss of generality, let us index primary paths such that |𝐺1| ≥ |𝐺2| ≥ |𝐺3| ≥ ⋯ ≥ |𝐺𝑁𝑝|. If, for 
any 𝑚 and 𝑛, |𝐺𝑚| = |𝐺𝑛|, then 𝑛 < 𝑚 if |𝐺1 ∩ 𝐺𝑛| > |𝐺1 ∩ 𝐺𝑚| 
Let us assume that there are in total 𝑁𝑐𝑝 critical paths in a given logic network 𝑆. The I-PDB algorithm is 
then applied to all the gates on the non-critical paths as follows:   
 
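Algorithm 1:   
Input: logic network 𝑆 
Output: balanced delay assignment $T_{g_i}$ ∀ 𝑖 
$T_{g_i} \leftarrow 1$ ∀ 𝑖  
$\varrho_{cp} = \bigcup_{r=1}^{N_{cp}} \rho_r$  
$T_{cp} = |G_1|$  
for $n = N_{cp} + 1 : 1 : N_p$  
    Partition $\rho_n = \{\rho_{n,1} | \rho_{n,2} | \dots | \rho_{n,L}\}$ 
    $G_{cu} = \bigcup_{r=1}^{n-1} G_r$  
    for $k = L : -1 : 1$  
        if $(\rho_{n,k} \cap \varrho_{cp}) = \phi$ 
            for $q = 1 : 1 : |G_{n,k}|$ 
                if $g_q \in (G_{n,k} \cap G_{cu}^{c})$  
                    compute $T_{\varrho(\rho_n)}^{\text{imax}}(g_q)$, $T_{\varrho(\rho_n)}^{\text{omax}}(g_q)$  
                    $T_{g_q} = T_{\varrho(\rho_n)}^{\text{imax}}(g_q) + T_{\varrho(\rho_n)}^{\text{omax}}(g_q) - T_{cp}$  
                end 
            end 
        end 
    end 
end 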
Balanced delay state puts the condition (𝐴7) on the delay of every gate and delays of its input and output 
critical paths. When individual primary paths are considered in sequence, Lemma 2 establishes that input 
and output critical paths for any gate on a given path 𝜌𝑛 lie on one of the paths having more or equal number 
of gates than the path 𝜌𝑛 itself. This fact and condition (𝐴7) are used together to derive delay update rule 
in the above I-PDB algorithm.  
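The balanced-delay condition can also be reached with a simpler greedy procedure, sketched below. This is explicitly not Algorithm 1: it ignores the path partitioning and ordering, recomputes the critical paths after every update (quadratic cost in the number of gates), and simply gives each gate in turn all of the slack on the longest primary path through it. The network representation and function names are assumptions, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_through(successors, delay):
    """Delay of the longest primary path through each gate: T_imax + T_omax - T_g."""
    preds = {g: [] for g in successors}
    for g, outs in successors.items():
        for h in outs:
            preds[h].append(g)
    order = list(TopologicalSorter(preds).static_order())
    t_in, t_out = {}, {}
    for g in order:
        t_in[g] = delay[g] + max((t_in[p] for p in preds[g]), default=0)
    for g in reversed(order):
        t_out[g] = delay[g] + max((t_out[s] for s in successors[g]), default=0)
    return {g: t_in[g] + t_out[g] - delay[g] for g in successors}


def balance_delays(successors):
    """Greedy balancing from unit delays; returns (delay, t_cp)."""
    delay = {g: 1 for g in successors}
    t_cp = max(longest_through(successors, delay).values())
    for g in successors:
        # Slack of the longest path through g; zero for gates on a critical path.
        slack = t_cp - longest_through(successors, delay)[g]
        delay[g] += slack
    return delay, t_cp


def is_balanced(successors, delay, t_cp):
    """Check that every gate lies on at least one path of delay t_cp."""
    return all(v == t_cp for v in longest_through(successors, delay).values())


net = {"g1": ["g2"], "g2": ["g3"], "g3": ["g4"], "g4": [], "g5": ["g4"]}
balanced, t_cp = balance_delays(net)
```

The visiting order decides which of the many balanced assignments is produced. On this toy network only the side-input gate `g5` has slack, so its delay grows from 1 to 3.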
Remark 1: If the delay of a given gate is increased by a factor of 𝜒 via I-PDB, the switching energy 
of that gate can be kept constant by reducing its supply current by a factor of √𝜒. Even then, the 
energy-delay product Κ𝑔 increases by a factor of 𝜒, reducing the switching error rate 𝜖(Κ𝑔) of that gate as 
indicated in equation (𝐴6).  
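The scaling in Remark 1 can be checked in two lines, assuming the switching energy has the form $E_g = I_{\text{supply}}^2 R_{\text{spin}} T_g$. This form is an assumption made here for illustration; it is the one consistent with the square-root term in (𝐴6), with primes denoting the values after the delay is stretched by 𝜒.

```latex
\begin{align*}
E_g' &= \left(\frac{I_{\text{supply}}}{\sqrt{\chi}}\right)^{2} R_{\text{spin}}\,(\chi T_g)
      = I_{\text{supply}}^{2} R_{\text{spin}} T_g = E_g, \\
\mathrm{K}_g' &= E_g'\,(\chi T_g) = \chi\, E_g T_g = \chi\, \mathrm{K}_g .
\end{align*}
```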
I-PDR algorithm and its properties:  
Now we show how I-PDR can be applied to find any other balanced delay state starting with one balanced 
delay state.  
In I-PDR, the gate delays along any path are redistributed such that the total path delay remains same. For 
example, as illustrated in ED figure 5A, if all gates have equal delay 𝑇, corresponding path delay will be 
4𝑇. However, the gate delays can be redistributed such that gates 𝑔1 and 𝑔2 are operated at delay 𝑇 + 𝑡 
while gates 𝑔3 and 𝑔4 at 𝑇 − 𝑡 (ED figure 5B), thus maintaining the path delay equal to 4𝑇. The choice of 
the value of 𝑡, in this case, controls the extent of delay redistribution.  
A general rule of applying I-PDR systematically to any given logic network can be stated as follows:  
Algorithm 2:  
Input: balanced delay assignment $T_{g_i}$ ∀ 𝑖 
Output: another balanced delay assignment $T^{*}_{g_i}$ ∀ 𝑖 
- Select a path $\rho_k$ and a gate $g_n$ on that path.  
- $T^{*}_{g_n} \leftarrow T_{g_n} - T_1$.  
- Find gates $g_{q_1}, \dots, g_{q_N}$ such that for each gate $g_{q_i}$ one of the following two conditions holds: 
  [$g_{q_i} \in G_{i,S}^{\text{imax}}(g_n)$ and $g_n \in G_S^{\text{omax}}(g_{q_i})$]  
        or  
  [$g_{q_i} \in G_{i,S}^{\text{omax}}(g_n)$ and $g_n \in G_S^{\text{imax}}(g_{q_i})$] 
- For each gate $g_{q_i}$, assign $T^{*}_{g_{q_i}} \leftarrow T_{g_{q_i}} + T_1$.  
Thus, above procedure allows one to redistribute delays while maintaining total path delay constant. 
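The four-gate example of ED figure 5 can be written out directly. The sketch below covers only that single-path case, where equal numbers of gates are slowed down and sped up by the same amount; it is not an implementation of Algorithm 2, and the function name is hypothetical.

```python
def redistribute_delay(path_delays, slower, faster, t):
    """Add t to the gates indexed by `slower` and remove t from those in `faster`.

    The two index lists must have equal length so that the path delay is unchanged.
    """
    if len(slower) != len(faster):
        raise ValueError("need as many slowed-down gates as sped-up gates")
    new_delays = list(path_delays)
    for idx in slower:
        new_delays[idx] += t
    for idx in faster:
        new_delays[idx] -= t
    if min(new_delays) <= 0:
        raise ValueError("redistribution would leave a gate with non-positive delay")
    return new_delays


# Gates g1, g2 run at T + t and gates g3, g4 at T - t, with T = 1 and t = 0.25.
new_delays = redistribute_delay([1.0, 1.0, 1.0, 1.0], slower=[0, 1], faster=[2, 3], t=0.25)
```

The path delay stays at 4𝑇 while the first two gates become more reliable and the last two less reliable, which is the degree of freedom used for error statistics shaping.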
Remark 2: If all gates are operating at the same supply current, the application of I-PDR does not change 
the total energy consumption of the logic network. Analogously, if all the gates are operating at the same 
delay, similar redistribution can be made for the supply current i.e., some gates can be operated at lower 
supply current and others at higher supply current, while making sure that the total energy consumption 
remains same.  
Error statistics shaping:  
In this work, the I-PDR is selectively applied such that the error distribution at the output of the logic 
network becomes sparse. When I-PDB is applied, the gates on all non-critical paths are slowed down and 
hence are made to operate at lower error rate. Then, I-PDR can be applied to make outputs of certain paths 
more error prone than others by reducing delays of appropriate gates and distributing them among others. 
For example, in ED Fig. 5, the outputs of gates 𝑔1 and 𝑔2 are less error prone than the output of gate 𝑔4. 
Once we have such control over the error rates of outputs of different paths, following insight can be used 
as an empirical guidance towards achieving sparse error PMF.  
Let us denote the error prone output of the logic network by 𝑦𝑎 . We define correct output as the output of 
the logic network when ϵ=0 for all logic gates, i.e. when all the logic gates are reliable. The correct output 
is denoted by 𝑦𝑜. The error prone output 𝑦𝑎 can be represented, in 2’s complement form, as 
$$y_a = -y_{a,0} + \sum_{i=1}^{l-1} 2^{-i} y_{a,i} \tag{A8}$$
where 𝑦𝑎,𝑖 ∈ {0,1} denotes 𝑖th bit of 𝑦𝑎 and 𝑦𝑎,0 denotes its sign bit. Denoting 𝑖th bit of the correct output 
by 𝑦𝑜,𝑖, one can modify equation (𝐴8) as follows:  
$$y_a = -(y_{o,0} \oplus \beta_0) + \sum_{i=1}^{l} 2^{-i} (y_{o,i} \oplus \beta_i) \tag{A9}$$
where 𝛽𝑖 is a Bernoulli random variable ∀𝑖 ∈ {0,1,… , 𝑙} that captures the impact of 𝜖-noisy gates on the 
output bits. It is to be noted that Pr{𝛽𝑖 = 1} = Pr{𝑖th output bit is in error}. Simplifying equation (A9), we 
get,   
$$y_a = \left(-y_{o,0} + \sum_{i=1}^{l} 2^{-i} y_{o,i}\right) + \left(-\beta_0 (-1)^{y_{o,0}} + \sum_{i=1}^{l} 2^{-i} \beta_i (-1)^{y_{o,i}}\right) \tag{A10}$$
The second term in equation (𝐴10) is the additive error in the correct output due to the component gates 
being 𝜖-noisy.  
In order to shape the additive error distribution to be sparse, we make high magnitude errors to be more 
probable than low magnitude errors. This is achieved by meeting the following condition: 
$$\Pr\{\beta_i = 1\}\ \forall\, i \in \{0, 1, \dots, p-1\} \ \gg\ \Pr\{\beta_j = 1\}\ \forall\, j \in \{p, \dots, l\}, \quad p < l \tag{A11}$$
where 𝑝 is the number of most significant bits (MSBs) for which the error rate is chosen to be high. 
The condition in equation (A11) indicates that for any logic network having a multi-bit output, a sparse 
output error PMF can always be achieved by choosing appropriate error rates for its output bits. We apply 
I-PDB and I-PDR such that the MSBs of the output become more error prone than the LSBs in order to 
make the output error PMF sparse.  
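The decomposition in (A10) and the effect of condition (A11) can be checked numerically. The sketch below is an illustration under assumptions: bits are stored as a list with the sign bit first, the flip probabilities are arbitrary example values, and the function names are hypothetical.

```python
import itertools
import random


def value(bits):
    """Two's complement fractional value as in (A8); bits[0] is the sign bit."""
    return -bits[0] + sum(2.0 ** -i * b for i, b in enumerate(bits) if i > 0)


def additive_error(y_o, beta):
    """Second term of (A10) for correct bits y_o and bit-flip indicators beta."""
    eta = -beta[0] * (-1) ** y_o[0]
    eta += sum(2.0 ** -i * beta[i] * (-1) ** y_o[i] for i in range(1, len(y_o)))
    return eta


def identity_holds(n_bits):
    """Exhaustively check value(y_o XOR beta) - value(y_o) == additive_error."""
    for y_o in itertools.product([0, 1], repeat=n_bits):
        for beta in itertools.product([0, 1], repeat=n_bits):
            y_a = [u ^ v for u, v in zip(y_o, beta)]
            if value(y_a) - value(y_o) != additive_error(y_o, beta):
                return False
    return True


def sample_errors(flip_probs, trials, seed=0):
    """Draw additive errors when bit i flips independently with flip_probs[i]."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        y_o = [rng.randint(0, 1) for _ in flip_probs]
        beta = [int(rng.random() < p) for p in flip_probs]
        errors.append(additive_error(y_o, beta))
    return errors
```

When only the two most significant bits are error prone, for example `sample_errors([0.2, 0.2, 0, 0, 0, 0], 2000)`, the error can take at most nine distinct values, all multiples of 0.5. This is a small instance of the sparse error PMF that the condition in (A11) targets.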
