Digital Biologically Plausible Implementation of Binarized Neural
  Networks with Differential Hafnium Oxide Resistive Memory Arrays by Hirtzlin, Tifenn et al.
Tifenn Hirtzlin 1,†, Marc Bocquet 2,†, Bogdan Penkovsky 1,
Jacques-Olivier Klein 1, Etienne Nowak 3, Elisa Vianello 3, Jean-Michel Portal 2
and Damien Querlioz 1,∗
1 C2N, Univ Paris-Sud, Université Paris-Saclay, CNRS, Palaiseau, France
2 Aix Marseille Univ, Université de Toulon, CNRS, IM2NP, Marseille, France
3 CEA, LETI, Grenoble, France
† Tifenn Hirtzlin and Marc Bocquet contributed equally to this work.
Correspondence*:
Damien Querlioz
damien.querlioz@c2n.upsaclay.fr
ABSTRACT
Brains perform intelligent tasks with extremely low energy consumption, probably to a large extent
because they merge computation and memory entirely, and because they function using low
precision computation. The emergence of resistive memory technologies provides an opportunity
to integrate logic and memory tightly in hardware. In parallel, the recently proposed concept
of Binarized Neural Networks, where multiplications are replaced by exclusive NOR logic gates,
shows a way to implement artificial intelligence using very low precision computation. In this
work, we therefore propose a strategy to implement low energy Binarized Neural Networks, which
employs these two ideas, while retaining energy benefits from digital electronics. We design,
fabricate and test a memory array, including periphery and sensing circuits, optimized for this
in-memory computing scheme. Our circuit employs hafnium oxide resistive memory integrated
in the back end of line of a 130 nanometer CMOS process, in a two transistors - two resistors
cell, which allows performing the exclusive NOR operations of the neural network directly within
the sense amplifiers. We show, based on extensive electrical measurements, that our design
reduces the number of bit errors on the synaptic weights, without the use of formal error
correcting codes. We design a whole system using this memory array. We show on standard
machine learning tasks (MNIST, CIFAR-10, ImageNet and an ECG task) that the system has
an inherent resilience to bit errors. We show that its energy consumption is attractive with
regard to more standard approaches, and that it can use the memory devices in regimes where
they exhibit particularly low programming energy and high endurance. We conclude the work
by discussing how it associates biologically plausible ideas with more traditional digital
electronics concepts.
Keywords: binarized neural networks, resistive memory, memristor, in-memory computing, biologically plausible digital electronics,
ASICs.
arXiv:1908.04066v1 [cs.ET] 12 Aug 2019
Hirtzlin et al. Digital Biologically Plausible Implementation...
1 INTRODUCTION
Through the progress of deep learning, artificial intelligence has made tremendous achievements in recent
years. Its energy consumption on graphics or central processing units (GPUs and CPUs) remains, however,
a considerable challenge, limiting its use at the edge and raising the question of the sustainability of large
scale artificial intelligence-based services. Brains, by contrast, manage intelligent tasks with highly reduced
energy usage.
One key difference between GPUs and CPUs on one side, and brains on the other side, is their
relationship with memory. In GPUs and CPUs, memory and arithmetic units are separated, both physically
and conceptually. In artificial intelligence algorithms, which require high amounts of memory access,
considerably more energy is spent moving data between logic and memory than doing actual arithmetic
(Pedram et al., 2017). In brains, by contrast, neurons, which implement most of the arithmetic, and synapses,
which are believed to store long-term memory, are entirely colocated. A major lead for reducing the energy
consumption of artificial intelligence is therefore to imitate this strategy and design non-von Neumann
systems where logic and memory are merged (Yu, 2018; Editorial, 2018; Querlioz et al., 2015; Indiveri
and Liu, 2015). This idea takes on special meaning today with the emergence of novel nanotechnology-based
non-volatile memories, which are compact and fast, and can be embedded at the core of the Complementary
Metal Oxide Semiconductor (CMOS) process (Yu, 2018; Ambrogio et al., 2018; Prezioso et al., 2015; Serb
et al., 2016; Saïghi et al., 2015; Covi et al., 2016; Wang et al., 2015).
Another key difference between processors and the brain is the basic nature of computations. GPUs and
CPUs typically perform all neural network computations with 32- or 64-bit floating point arithmetic (and
sometimes 16 bits in recent designs (Markidis et al., 2018)), relying mostly on multiplications of neural
states and synaptic weights, and additions. In brains, most of the computation is done in a low precision
analog fashion within the neurons, which produce asynchronous spikes as output, i.e., a binary output. A
second idea for reducing the energy consumption of artificial intelligence therefore is to design systems
that function with much lower precision computation.
In recent years, considerable research has proposed to implement neural networks using analog resistive
memory as synapses – the device conductance implementing the synaptic weights. The neural network
computation can then be done to a large extent using analog electronics: weight / neuron multiplication is
performed using Ohm's law, and addition is done using Kirchhoff's current law (Ambrogio et al., 2018;
Prezioso et al., 2015; Serb et al., 2016; Li et al., 2018; Wang et al., 2018; Shafiee et al., 2016). This type
of implementation is to a certain extent very biologically plausible, as it reproduces the two strategies
mentioned above. Its challenge is that it needs to be associated with relatively heavy analog or
mixed-signal CMOS circuitry, such as operational amplifiers or analog-to-digital converters, which add
significant area and energy overhead.
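To make the analog principle concrete, the following minimal Python sketch models an idealized crossbar: each conductance stores a weight, Ohm's law produces one current per device, and Kirchhoff's current law sums them on each column. The function name and the conductance and voltage values are illustrative assumptions, and the model deliberately ignores wire resistance, device nonlinearity and the readout circuitry.

```python
# Idealized resistive crossbar: no wire resistance, device nonlinearity or ADC.
# conductances[i][j] is the conductance (siemens) of the device at row i, column j.

def crossbar_column_currents(conductances, voltages):
    """Return the current collected on each column wire of an ideal crossbar."""
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):        # input voltage applied to row i
        for j in range(n_cols):
            # Ohm's law gives the device current G * V; Kirchhoff's current law
            # sums the currents of all devices connected to column j.
            currents[j] += conductances[i][j] * v
    return currents

# Illustrative values: two rows (inputs) and two columns (output neurons).
G = [[10e-6, 20e-6],
     [30e-6, 40e-6]]
V = [0.1, 0.2]   # read voltages in volts
print(crossbar_column_currents(G, V))  # approximately [7e-06, 1e-05] amperes
```

In a real circuit, each column current must still be converted back to a digital value, which is where the operational amplifiers and converters mentioned above come in.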
In parallel, a novel class of neural networks has recently been proposed – Binarized Neural Networks
(or the closely related XNOR-NETs) (Courbariaux et al., 2016; Rastegari et al., 2016). In these neural
networks, once trained, synapses as well as neurons assume only binary values, +1 or −1. These
neural networks have therefore limited memory requirements, and also rely on highly simplified arithmetic.
In particular, multiplications are replaced by one-bit exclusive NOR (XNOR) operations. Nevertheless,
they can achieve near state of the art performance on vision tasks (Courbariaux et al., 2016; Rastegari
et al., 2016; Lin et al., 2017). Binarized Neural Networks are therefore extremely attractive for realizing
inference hardware. The low precision of Binarized Neural Networks, and in particular the binary nature of
neurons – which is reminiscent of the spikes of biological neurons – also gives them biological plausibility. They
can even be seen as a simplification of spiking neural networks.
Many efforts have been made to propose hardware implementations of Binarized Neural Networks. Using
nanodevices, it is natural to adopt the same strategy proposed for conventional neural networks and perform
arithmetic in an analog fashion using Kirchhoff's law (Yu, 2018; Yu et al., 2016). However, Binarized Neural
Networks are very digital in nature, and are multiplication-less. As they are also biologically plausible, this
is an opportunity to benefit both from bioinspiration and from the achievements of
Moore's law and digital electronics. Therefore, in this work we propose a fully digital implementation
of binarized neural networks mixing CMOS and nanodevices, keeping the biological concepts of tight
memory and logic integration, as well as low precision computing. As memory nanodevices, we use
hafnium oxide-based resistive random access memory (OxRAM), a compact non-volatile and fast memory
cell highly compatible with the CMOS process (Grossi et al., 2016).
One considerable challenge, however, to implement a digital system with memory nanodevices is
their inherent variability (Ielmini and Wong, 2018; Ly et al., 2018), which causes bit errors. Traditional
applications employ error correcting codes (ECCs) to solve this issue. Unfortunately, this solution
is difficult to implement in a context where memory and logic are tightly integrated, as ECC decoding
circuits have high area and energy consumption (Gregori et al., 2003), and would need to be replicated many
times on a die. The arithmetic operations of error syndrome computation are actually more complicated than
those of a Binarized Neural Network. State-of-the-art RRAM-based in-memory computing does not
correct errors and is not compatible with technologies that exhibit bit errors (Chen et al., 2018, 2017).
In this paper, we introduce our solution. We design, fabricate, and test a differential OxRAM memory
array, including all peripheral and sensing circuitry. This array, based on a two-transistor / two-resistor
(2T2R) bit cell, inherently reduces bit errors without the use of ECC, and we show that it is particularly
well adapted to in-memory computing. We then design and simulate a full binarized neural network based
on this memory array. We show that the XNOR operations can be integrated directly within the sense
operation of the memory array, and that the resulting system can be highly energy efficient. Using neural
networks on multiple datasets (MNIST, CIFAR-10, ImageNet and ECG data analysis), we evaluate the
bit error rate in the memory that can be tolerated by the system. Based on this information, we show
that the memory nanodevices can be used in an unconventional programming regime, where they feature
low programming energy (less than five picojoules per bit) and outstanding endurance (billions of cycles).
Partial and preliminary results of this work have been presented at a conference (Bocquet et al., 2018).
This paper adds measurements of OxRAMs with shorter programming pulses, an analysis of
the impact of bit errors on more datasets (ImageNet and ECG data analysis), and a detailed comparison
and benchmarking of our approach with processors, non-binarized ASIC neural networks and analog
RRAM-based neural networks.
2 MATERIALS AND METHODS
2.1 Differential Memory Array for In-Memory Computing
In this work, we fabricated a memory array for in-memory computing with its associated periphery
and sensing circuits. The memory cell relies on hafnium oxide (HfO2)-based resistive random
access memory (OxRAM). The stack of the device is composed of a HfO2 layer and a titanium layer, both
ten nanometers thick, in-between two titanium nitride (TiN) electrodes. Our devices are embedded
within the back-end-of-line of a commercial 130 nanometer CMOS logic process (Fig. 1(a)), allowing tight
integration of logic and non-volatile memory (Grossi et al., 2016). The devices are integrated on top of the
fourth (copper) metallic layer.
Figure 1. (a) Scanning Electron Microscopy image of the back-end-of-line of the CMOS process integrating
an OxRAM device. (b) Photograph and (c) simplified schematic of the one kilobit in-memory computing-targeted
memory array characterized in this work.
We chose hafnium oxide OxRAMs as they are known to provide non-volatile memories compatible with
modern CMOS processes, and only involve foundry-friendly materials and process steps. These memories are
also considerably faster than Flash memory, as bit cells can be programmed in less than one microsecond,
and are programmed at much lower voltage. After a one-time forming process, such devices can switch
between low resistance and high resistance states (LRS and HRS) by applying positive or negative electrical
pulses respectively. Nevertheless, our work could be reproduced with other types of emerging memories
such as phase change memory or spin torque magnetoresistive memory (Chen, 2016).
Conventionally, OxRAMs are organized in a “One Transistor - One Resistor” structure (1T1R), where
each nanodevice is associated with one access transistor (Chen, 2016). The LRS encodes the zero
logic value, and the HRS encodes the one logic value. The read operation is then achieved by
comparing the electrical resistance of the nanodevice to a reference value intermediate between typical
values of resistances in HRS and LRS.
Unfortunately, due to device variability, OxRAMs are prone to bit errors: the HRS value can become
lower than the reference resistance, and the LRS value can be higher than the reference resistance. The
device variability includes both device-to-device mismatch, as well as the fact that within the same device,
the precise value of HRS and LRS resistance changes at each programming cycle (Grossi et al., 2018).
To limit the amount of bit errors, in this work, we fabricated a memory array with a “Two Transistors -
Two Resistors” structure (2T2R), where each bit of information is stored in a pair of 1T1R structures. A
photograph of the die is presented in Fig. 1(b) and its simplified schematic in Fig. 1(c). Information is
then stored in a differential fashion: the pair LRS/HRS means logic value zero, while the pair HRS/LRS
means logic value one. In this situation, readout is performed by comparing the resistance values of the two
devices. We therefore expect bit errors to be less frequent, as a bit error only occurs if a device programmed
in LRS ends up more resistive than its complementary device programmed in HRS. This concept of
2T2R memory arrays has already been proposed (Hsieh et al., 2017; Shih et al., 2017), but its benefit in
terms of bit error rate had not been demonstrated before this work.
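The argument that differential storage reduces bit errors can be checked with a small Monte Carlo sketch. It assumes log-normally distributed resistances with hypothetical medians (5 kOhm for LRS, 50 kOhm for HRS) and an arbitrary spread chosen so that the two states overlap; none of these numbers are measurements from this work, and all helper names are hypothetical. The 1T1R read compares one device to a fixed reference, while the 2T2R read only compares the two devices of a pair.

```python
import math
import random

# Illustrative assumptions, not measured values from this work: device resistances
# are log-normal, with medians of 5 kOhm (LRS) and 50 kOhm (HRS) and a spread
# SIGMA_LN of ln(R) chosen large enough for the two distributions to overlap.
LRS_MEDIAN, HRS_MEDIAN = 5e3, 50e3
SIGMA_LN = 0.8
R_REF = math.sqrt(LRS_MEDIAN * HRS_MEDIAN)  # 1T1R reference, geometric mean of medians

def sample(median, rng):
    """Draw one programmed resistance from the assumed log-normal distribution."""
    return median * math.exp(rng.gauss(0.0, SIGMA_LN))

def read_1t1r(bit, rng):
    """1T1R: LRS stores 0, HRS stores 1; read against a fixed reference."""
    r = sample(HRS_MEDIAN if bit else LRS_MEDIAN, rng)
    return 1 if r > R_REF else 0

def read_2t2r(bit, rng):
    """2T2R: (BL, BLb) = (LRS, HRS) stores 0, (HRS, LRS) stores 1; read by comparison."""
    r_lrs, r_hrs = sample(LRS_MEDIAN, rng), sample(HRS_MEDIAN, rng)
    r_bl, r_blb = (r_hrs, r_lrs) if bit else (r_lrs, r_hrs)
    return 1 if r_bl > r_blb else 0

def bit_error_rate(read, trials=100_000, seed=0):
    """Monte Carlo BER over alternating (checkerboard-like) data, reprogrammed each time."""
    rng = random.Random(seed)
    errors = sum(read(k % 2, rng) != k % 2 for k in range(trials))
    return errors / trials

print("1T1R BER:", bit_error_rate(read_1t1r))
print("2T2R BER:", bit_error_rate(read_2t2r))
```

With these assumed parameters, the 1T1R rate comes out at roughly 7% and the 2T2R rate at roughly 2%: the pair comparison avoids the case where a single device merely crosses a global reference.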
Figure 2. (a) Schematic of the precharge sense amplifier used in this work to read 2T2R memory cells. (b)
Schematic of the precharge sense amplifier augmented with an XNOR logic operation.
In our fabricated circuit, this readout operation is performed with precharge sense amplifiers (PCSA)
(Zhao et al., 2009, 2014) (Fig. 2(a)). These circuits are highly energy efficient due to their operation in
two phases, namely precharge and discharge, avoiding any direct path between supply voltage and ground.
First, the sense signal (SL) is set to ground, which precharges the two selected complementary nanodevices,
as well as the comparing latch, to the same voltage. In the second phase, the sense signal is set to the
supply voltage, and the voltages on the complementary devices are discharged to ground. The branch with
the lower resistance discharges faster, and causes its associated inverter output to discharge to ground,
which latches the complementary inverter output to the supply voltage. The two output voltages therefore
represent the comparison of the two complementary resistance values.
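The two-phase sense operation can be summarized by a behavioral model: both branches start at the precharge voltage, each discharges through its device, and the first branch to cross the inverter threshold wins the latch. The sketch below uses a simple RC discharge with hypothetical values for the supply, capacitance and threshold; it is not a circuit-level model of the fabricated PCSA, and it treats the sense amplifier as ideal, whereas the measured amplifiers can err when the resistance ratio is small.

```python
import math

# Behavioral assumptions (not circuit-level values from this work): both branches
# start precharged at VDD, discharge through their device like an RC circuit, and
# the latch resolves as soon as one branch crosses the inverter threshold.
VDD = 1.2            # supply voltage in volts (assumed)
C_NODE = 10e-15      # branch capacitance in farads (assumed, identical for both)
V_TRIP = VDD / 2     # inverter switching threshold (assumed)

def time_to_trip(resistance):
    """Time for V(t) = VDD * exp(-t / (R * C)) to fall to V_TRIP."""
    return resistance * C_NODE * math.log(VDD / V_TRIP)

def pcsa_read(r_bl, r_blb):
    """Return (out_bl, out_blb) after the discharge phase.

    The faster (less resistive) branch pulls its own output to ground (0) and the
    cross-coupled latch drives the other output to the supply (1).
    A perfect tie is metastable in a real circuit; here it resolves arbitrarily.
    """
    if time_to_trip(r_bl) < time_to_trip(r_blb):
        return 0, 1
    return 1, 0

# (LRS, HRS) stores 0 and (HRS, LRS) stores 1, so out_bl equals the stored bit.
print(pcsa_read(5e3, 50e3))   # (0, 1)
print(pcsa_read(50e3, 5e3))   # (1, 0)
```

The RC time constants only make the race explicit: the decision depends solely on which of the two resistances is lower, which is why no external reference is needed.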
We fabricated a differential memory with 2048 (2K) devices, therefore implementing a kilobit memory.
Each column of complementary nanodevices features a precharge sense amplifier, and rows and columns are
accessed through integrated CMOS digital decoders. The pads of the dies are not protected against electrostatic
discharge, and the dies were tested with commercial 22-pad probe cards. In all the experiments, voltages
are set using a homemade printed circuit board, and voltage pulses are generated using Keysight B1530A
pulse generators. In the design, the precharge sense amplifiers can optionally be deactivated and bypassed,
which allows measuring the nanodevice resistances directly through external precision source monitor
units (Keysight B1517A).
2.2 Design of In-Memory Binarized Neural Network Based on the Differential Memory Building Block
In this work, we aim at implementing Binarized Neural Networks in hardware. In these neural networks,
the synaptic weights, as well as the neuronal states, can take only two values, +1 and −1, while these
parameters assume real values in conventional neural networks. The equation for the neuronal value A in a
usual neural network,

$$A = f\left(\sum_i W_i X_i\right), \tag{1}$$

where $X_i$ are the neuron inputs, $W_i$ the synaptic weight values and $f$ the neuronal activation function, is
then transformed into the much simpler equation:

$$A = \operatorname{sign}\left(\operatorname{POPCOUNT}_i\left(\operatorname{XNOR}(W_i, X_i)\right) - T\right). \tag{2}$$

In this equation, T is the threshold of the neuron and is learned. POPCOUNT is the function that counts
the number of ones in a series of bits, and sign is the sign function.
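The step from eq. (1) to eq. (2) can be verified numerically. Encoding +1 as bit 1 and −1 as bit 0, the product of two ±1 values equals 2·XNOR − 1, so a sum over N inputs equals 2·POPCOUNT − N. The sketch below assumes, for the comparison only, that f is a sign function applied after subtracting a bias θ; under that assumption eq. (2) with T = (N + θ)/2 gives identical outputs. All function names are hypothetical.

```python
import random

def to_bit(v):
    """Encode a binarized value: +1 -> 1, -1 -> 0."""
    return 1 if v > 0 else 0

def xnor(a, b):
    """One-bit exclusive NOR: 1 when the two bits agree."""
    return 1 - (a ^ b)

def sign(v):
    """Sign function; the value at 0 is mapped to +1 (an assumed convention)."""
    return 1 if v >= 0 else -1

def neuron_eq1(weights, inputs, theta):
    """Eq. (1) restricted to +/-1 values, with f = sign and an assumed bias theta."""
    return sign(sum(w * x for w, x in zip(weights, inputs)) - theta)

def neuron_eq2(weights, inputs, threshold):
    """Binarized neuron of eq. (2): sign(POPCOUNT(XNOR(W_i, X_i)) - T)."""
    popcount = sum(xnor(to_bit(w), to_bit(x)) for w, x in zip(weights, inputs))
    return sign(popcount - threshold)

# Since sum W_i X_i = 2 * POPCOUNT - N, eq. (2) with T = (N + theta) / 2
# reproduces eq. (1) with bias theta on every input vector.
rng = random.Random(1)
N, theta = 64, 4
for _ in range(1000):
    w = [rng.choice((-1, 1)) for _ in range(N)]
    x = [rng.choice((-1, 1)) for _ in range(N)]
    assert neuron_eq1(w, x, theta) == neuron_eq2(w, x, (N + theta) / 2)

# A POPCOUNT over N inputs ranges from 0 to N and needs N.bit_length() bits.
print((1024).bit_length())  # 11 bits for a 1024-input neuron
```

The last line illustrates why the accumulations remain low-bit-width integers: even a 1024-input neuron only needs an 11-bit counter.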
The training process of binarized neural networks differs from that of conventional neural networks. During
training, the weights also assume real values, and the binarized weights are equal to the sign of the real
weights. Training employs the classical error backpropagation equations, with several adaptations. The
binarized weights are used in the equations of both the forward and the backward passes, but it is the real
weights that are changed by the learning rule (Courbariaux et al., 2016). Additionally, as the activation
function of binarized neural networks is the sign function, whose derivative is zero almost everywhere,
the clip function is used as the activation function in the backward pass. With these changes, binarized neural networks function
surprisingly well. They can achieve near state of the art performance on image recognition tasks such as
CIFAR-10 and ImageNet (Lin et al., 2017).
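The training mechanics described above can be sketched on a single toy neuron: the forward pass uses sign(W), the backward pass replaces the derivative of sign by that of clip(x, −1, 1), and the resulting gradient updates the real-valued weights. This is a minimal illustration, not the training code used in this work. The squared-error loss, the 1/N scaling of the pre-activation (standing in for the normalization a real network would use), the clipping of real weights to [−1, 1] (a common choice in this family of methods) and the toy target are all assumptions.

```python
import itertools

def sign(v):
    """Binarization used in the forward pass (0 mapped to +1)."""
    return 1.0 if v >= 0 else -1.0

def ste_backward(pre_activation, grad_output):
    """Backward pass through sign(): derivative of clip(x, -1, 1), i.e. 1 if |x| <= 1 else 0."""
    return grad_output if abs(pre_activation) <= 1.0 else 0.0

def train_binary_neuron(real_w, samples, lr=0.15, epochs=5):
    """Train one binarized neuron; returns the final binarized weights."""
    n = len(real_w)
    for _ in range(epochs):
        for x, y in samples:
            w_bin = [sign(w) for w in real_w]                    # forward uses sign(W)
            s = sum(wb * xi for wb, xi in zip(w_bin, x)) / n     # scaled pre-activation
            a = sign(s)                                          # binarized output
            grad_s = ste_backward(s, a - y)                      # loss 0.5 * (a - y)**2
            for i in range(n):
                # Gradient w.r.t. the binarized weight is applied to the real weight.
                real_w[i] -= lr * grad_s * x[i] / n
                real_w[i] = max(-1.0, min(1.0, real_w[i]))       # keep real weights bounded
    return [sign(w) for w in real_w]

# Toy target: y = sign(x0 - x1 + x2), reachable with binarized weights (+1, -1, +1).
patterns = list(itertools.product((-1.0, 1.0), repeat=3))
samples = [(x, sign(x[0] - x[1] + x[2])) for x in patterns]
print(train_binary_neuron([-0.05, 0.05, -0.05], samples))  # [1.0, -1.0, 1.0]
```

Starting from weights with all signs wrong, the real weights drift until their signs match the target; once the binarized weights are correct, the loss gradient is zero and training stops moving them. After training, only the signs would need to be stored.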
After learning, the real weights no longer serve any purpose and can be discarded. This makes binarized
neural networks exceptional candidates for hardware implementation of neural network inference. Not
only are their memory requirements minimal (one bit per neuron and synapse), but their arithmetic is
vastly simplified. Energy- and area-expensive multiplications in eq. (1) are replaced by one-bit exclusive
NOR (XNOR) operations in eq. (2). Additionally, the real sums in eq. (1) are replaced by POPCOUNT
operations, which are equivalent to integer sums with a low bit width.
It is possible to implement ASIC Binarized Neural Networks with solely CMOS (Ando et al., 2017).
However, the most attractive prospect is to rely on emerging non-volatile memories, and to associate logic
and memory as closely as possible. This approach can provide non-volatile neural networks, and eliminate
the von Neumann bottleneck entirely, as weights are stationary and only neuron states are transmitted.
In these approaches, the nanodevices implement the synaptic weights, while the arithmetic is done in
CMOS. Most of the literature on using emerging memory as synapses relies on a clever way to perform
the multiplications and additions of eq. (1) using analog electronics. The multiplications are done
relying on Ohm's law, and the additions on Kirchhoff's current law (Ambrogio et al., 2018; Yu et al., 2016).
This analog approach can be transposed to binarized neural networks, and this idea has received high
interest recently (Sun et al., 2018a,b; Tang et al., 2017; Yu, 2018). However, binarized neural networks are
inherently digital objects that rely, as previously remarked, on simple logic operations: XNORs and low
bit width sums. Therefore, we investigate their implementation with purely digital circuitry. This concept has
also recently appeared in (Natsui et al., 2018; Giacomin et al., 2019) and in our preliminary version of this
work (Bocquet et al., 2018). Our work is the first one to rely on measurements of a physical memory array,
and thus to include the effect of bit errors.
A first realization is that the XNOR operations can be performed directly within the sense amplifiers. For
this, we follow the pioneering work of Zhao et al. (2014), which showed that precharge sense amplifiers can
be enriched with any logic operation. In our case, we can add four additional transistors in the discharge
branches of a precharge sense amplifier (Fig. 2(b)). These transistors can prevent the discharge and allow
implementing the XNOR operation between the input voltage X and the value stored in the complementary
OxRAM devices in a single operation.
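At the behavioral level, the effect of the XNOR-enriched sense amplifier can be pictured as follows: depending on the input X, the two stored devices are compared either straight or crossed, so the output is either the stored bit or its complement. This straight/crossed model is an assumption used only to illustrate the resulting truth table; it is not a transistor-level description of the four added transistors in Fig. 2(b), and the function name and resistance values are hypothetical.

```python
def xnor_pcsa_read(r_bl, r_blb, x):
    """Behavioral XNOR-enriched sense (assumed model of the added transistors).

    x = 1: devices are compared straight, so the output is the stored bit.
    x = 0: the comparison is crossed, so the output is the complemented stored bit.
    """
    left, right = (r_bl, r_blb) if x == 1 else (r_blb, r_bl)
    return 1 if left > right else 0

LRS, HRS = 5e3, 50e3                      # illustrative resistances in ohms
STORED = {0: (LRS, HRS), 1: (HRS, LRS)}   # 2T2R encoding (BL, BLb) used in this work

for w in (0, 1):
    for x in (0, 1):
        out = xnor_pcsa_read(*STORED[w], x)
        assert out == 1 - (w ^ x)          # output matches XNOR(w, x)
        print(f"W={w} X={x} -> {out}")
```

The point of the exercise is that the synaptic multiplication of a binarized network costs no extra cycle: it is folded into the read itself.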
Figure 3. Schematization of the full architecture to implement Binarized Neural Network, in the “parallel
to sequential” configuration. The system assembles kilobit memory blocks surrounded by logic circuits, and
moves minimal data between the blocks.
Based on the basic memory array with PCSAs enriched with XNOR, we designed the whole system
implementing a Binarized Neural Network. The overall architecture is presented in Fig. 3, and is inspired
by the purely CMOS architecture proposed in (Ando et al., 2017), adapted to the constraints of OxRAM.
The design is made of the repetition of basic cells incorporating a kilobit OxRAM memory block with
XNOR-enriched PCSAs and POPCOUNT logic. The system features a degree of reconfigurability to
adapt to different neural network topologies: it can be used either in a parallel to sequential configuration, or
in a sequential to parallel configuration. The sequential to parallel configuration deals with long input
data, and outputs reduced parallel output data. The parallel to sequential configuration (presented in Fig. 3)
deals with reduced parallel input data, and outputs long sequence data. The POPCOUNT computation
differs depending on the configuration. In the first configuration, each cell performs a partial POPCOUNT,
which can be looped back to the same cell to compute the whole POPCOUNT sequentially, or, in the second
configuration, can be passed to the column to perform the POPCOUNT through a “popcount tree”.
In the first configuration, the activation function is obtained by subtracting in each cell the threshold value
T: the sign bit of the result gives the activation value. In the second configuration, the same operation is
performed on the output of the popcount tree shared along the column.
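The two ways of completing the POPCOUNT, and the use of the sign bit as activation, can be sketched in a few lines. The sketch assumes a 1024-input neuron split into 64-bit slices (a hypothetical split chosen for illustration, not the block organization of the design) and a two's complement subtraction wide enough to hold any value of POPCOUNT − T. Both accumulation orders necessarily give the same total; they differ in hardware latency and wiring, not in result.

```python
import random

def partial_popcounts(xnor_bits, block_size):
    """POPCOUNT of each memory block's slice of XNOR outputs."""
    return [sum(xnor_bits[i:i + block_size]) for i in range(0, len(xnor_bits), block_size)]

def sequential_popcount(partials):
    """First option: one partial result per step, accumulated by looping back into the same cell."""
    total = 0
    for p in partials:
        total += p
    return total

def popcount_tree(partials):
    """Second option: pairwise adder tree along the column."""
    level = list(partials)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd element is forwarded to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

def activation_from_sign_bit(popcount, threshold, width):
    """Compute popcount - T as a width-bit two's complement word and read its sign bit."""
    diff = (popcount - threshold) & ((1 << width) - 1)
    sign_bit = diff >> (width - 1)
    return -1 if sign_bit else 1

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1024)]   # XNOR outputs of a 1024-input neuron
parts = partial_popcounts(bits, 64)               # hypothetical 64-bit slices
assert sequential_popcount(parts) == popcount_tree(parts) == sum(bits)
width = (1024).bit_length() + 1                   # 12 bits hold any value in [-1024, 1024]
print(activation_from_sign_bit(sum(bits), 512, width))
```

Reading only the most significant bit of the subtraction is what makes the activation essentially free once the threshold has been subtracted.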
The whole system was designed using synthesizable SystemVerilog. The memory blocks are described
in behavioral SystemVerilog. We synthesized the system using the 130 nanometer design kit used for
fabrication, as well as using the design kit of an advanced commercial 28 nm process for scaling projection.
All simulations reported in the Results section were performed using the Cadence Incisive simulator.
The estimates for system-level energy consumption were obtained using the Cadence Encounter tool. We
used Value Change Dump (VCD) files extracted from simulations of practical tasks so that the obtained
energy values reflect the operation of the system realistically.
3 RESULTS
3.1 Differential Memory Allows Memory Operation at Reduced Bit Error Rate
Figure 4. (a) Distribution of the LRS and the HRS of the OxRAM devices in an array programmed
with a checkerboard pattern. RESET voltage of 2.5 V, SET current of 55 µA and programming time of
1 µs. (b-c) Failure rate over 100 whole-array programmings of a memory array, for the two complementary
checkerboard configurations. (d) Rate of programming failure indicated by the precharge sense amplifier
circuits as a function of the ratio between HRS and LRS resistance (measured by a source monitor unit), in
the same configuration as (a-c).
This section first presents the electrical characterization results of the differential OxRAM arrays. We
program the array with checkerboard-type data, alternating zero and one bits, using programming times
of one microsecond. For programming devices in HRS (RESET operation), the access transistor is fully
opened and a reset voltage of 2.5 V is used. For programming devices in LRS (SET operation), the gate
voltage of the access transistor is chosen to ensure a compliance current of 55 µA. Fig. 4(a) shows the
statistical distribution of the LRS and HRS of programmed cells, based on 100 programmings of the full
array. This graph uses a standard representation in the memory field, where the y axis is expressed as
number of standard deviations of the distribution (Ly et al., 2018). The figure superimposes distributions of
left (BL) and right (BLb) columns of the array, and no difference is seen. The LRS and HRS distributions
are well separated but overlap at a value of three standard deviations, making bit errors possible. If a 1T1R
structure were used, a bit error rate of 0.012 (1.2%) would be seen with this distribution. But at the output
of the precharge sense amplifiers, a bit error rate of 0.002 (0.2%) is seen, giving a first suggestion of the
benefits of the 2T2R approach. Figs. 4(b) and 4(c) show the mean error rate (using the 2T2R configuration) on
the whole array, for the two types of checkerboards. We see that all devices can be programmed in HRS
and LRS. A few devices have increased bit error rate. This graph highlights the existence of both cycle to
cycle and device to device variability, and the absence of dead cells.
We now validate in detail the functionality of the precharge sense amplifiers. The precise resistance of
devices is first measured by deactivating the precharge sense amplifiers, and using the external source
monitor units. Then, the precharge sense amplifiers are reactivated and a sense operation is performed.
Fig. 4(d) plots the mean measurement of the sense amplifiers as a function of the ratio between the two
resistances that are being compared, superimposed with the ideal behavior of a sense amplifier. The sense
amplifiers show excellent functionality, but can make mistakes if the two resistances differ by less than a
factor of five.
Figure 5. Bit error rate for different programming conditions, as measured by the precharge sense amplifier,
for 2T2R configuration on a kilobit memory array.
The bit error rates strongly depend on the programming conditions. Fig. 5 shows the mean
number of incorrect bits on a whole array for various combinations of programming times (from 1 µs to
100 µs), RESET voltages (between 1.5 and 2.5 V) and SET compliance currents (between 28 and 200 µA). We see
that the bit error rate depends on the three programming parameters, the SET compliance current
having the most significant impact. For the rest of the section, we use the shortest programming time
(1 µs), as this parameter has limited impact on the bit error rate.
In Fig. 6, we look more precisely at the effects of cycle-to-cycle device variability and device aging. A
device and its complementary device were programmed for 700 million cycles. Figs. 6(a) and 6(b) show the
distributions of LRS and HRS of the device under test and of its complementary device, after different numbers
of cycles ranging from the first to the last. It is apparent that when the devices are cycled, the LRS
and HRS distributions become less separated and start to overlap at lower numbers of standard deviations.
This translates directly into the mean resistances of the devices in HRS and LRS (Figs. 6(c) and 6(d)), which
become closer as the device ages. More importantly, the aging process translates into the device bit
error rate (Fig. 6(e)): the bit error rates of the device and of its complementary device increase by several orders
of magnitude over the lifetime of the device. The same trend is seen on the bit error rate resulting from the
precharge sense amplifier (2T2R), but it remains at a much lower level: while the 1T1R bit error rate goes
above 10⁻³ after a few million cycles, the 2T2R bit error rate remains below this value over the 700 million cycles.
This result highlights that the concept of cyclability depends on the acceptable bit error rate, and that the
cyclability at constant bit error rate can be considerably extended when using the 2T2R structure.
Figure 6. (a-b) Distribution of the resistance values, (c-d) mean resistance value and (e) mean bit error rate
over 10 million cycles measured by the precharge sense amplifier, in the 2T2R configuration, as a function of
the number of cycles that a device has been programmed. RESET voltage of 2.5 V, SET current of 200 µA
and programming time of 1 µs.
We now aim at quantifying and benchmarking more precisely the benefits of the 2T2R structure. We
performed extensive characterization of bit error rates (BERs) on the memory array in various regimes. Fig. 7(a) presents
different experiments where the 2T2R bit error rate is plotted as a function of the bit error rate that would
be obtained using a single device programmed in the same conditions. The graph associates two types
of experiments:
• The points marked as “Low Ic” are obtained using whole-array measurements where devices are
programmed with low SET compliance current to ensure high error rates.
• The points marked as “High Ic” are obtained by single-device measurements (i.e. selecting one device
in an array) programmed ten million times using high SET compliance current to reach low bit error
rates.
We see that the 2T2R bit error rate is always lower than the 1T1R one. The difference is larger for lower
bit error rates, and reaches four orders of magnitude for a 2T2R bit error rate of 10⁻⁸. The black line
presents a calculation in which the precharge sense amplifier is assumed to be ideal (i.e. to follow the idealized
dotted characteristic of Fig. 4(d)).
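One simple way to relate the two error rates under an ideal sense amplifier is sketched below; it is our own illustrative model, not necessarily the calculation behind the black line. It assumes that ln(R) is Gaussian with the same standard deviation σ in LRS and HRS, and that the 1T1R reference sits midway between the two means, separated by a distance d. Then the 1T1R error rate is Q(d/2σ), while a 2T2R error requires R_LRS > R_HRS, whose probability is Q(√2·d/2σ), where Q is the standard Gaussian tail function.

```python
import math
from statistics import NormalDist

def ideal_2t2r_ber(ber_1t1r):
    """2T2R BER with an ideal sense amplifier, given the single-device (1T1R) BER.

    Assumes ln(R) is Gaussian with the same standard deviation sigma in LRS and HRS,
    and that the 1T1R reference sits midway between the two means (distance d).
    Then ber_1t1r = Q(z) with z = d / (2 sigma), and the 2T2R error
    P(R_LRS > R_HRS) = Q(d / (sqrt(2) sigma)) = Q(sqrt(2) * z) = 0.5 * erfc(z),
    where Q(t) = 0.5 * erfc(t / sqrt(2)) is the standard Gaussian tail.
    """
    z = -NormalDist().inv_cdf(ber_1t1r)   # z = Q^-1(ber_1t1r)
    return 0.5 * math.erfc(z)

for p in (1e-2, 1e-3, 1e-4, 1e-5):
    q = ideal_2t2r_ber(p)
    print(f"1T1R {p:.0e} -> ideal 2T2R {q:.1e} (gain x{p / q:.0f})")
```

Under this assumption, the gain grows as the single-device error rate decreases, which is qualitatively the trend described above: the differential comparison effectively stretches the separation between the states by a factor √2 in units of σ, and Gaussian tails shrink faster the further out they are evaluated.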
Figure 7. (a) Experimental bit error rate of the 2T2R array, measured by the precharge sense amplifiers,
as a function of the bit error rate obtained with individual (1T1R) RRAM devices in the same programming
conditions. The detailed methodology for obtaining this graph is presented in the body text. Bit error rate
obtained with (b) Single Error Correcting (SEC) and (c) Single Error Correcting Double Error Detecting
(SECDED) ECC as a function of the error rate on the individual devices.
To interpret the results of the 2T2R approach with more perspective, we benchmark them against standard
error correcting codes. Figs. 7(b) and 7(c) show the benefits of two codes, using the same plotting format
as Fig. 7(a): a Single Error Correction (SEC) and a Single Error Correction Double Error Detection
(SECDED) code, presented with different degrees of redundancy. These simple codes, formally known as
Hamming and extended Hamming codes, are widely used in the memory field. Interestingly, we see that
the benefit of these codes is very similar to the benefit of our 2T2R approach with an ideal sense amplifier, at
equivalent memory redundancy (e.g. SECDED(8,4)), although our approach uses no decoding circuit and
the equivalent of error correction is performed directly within the sense amplifier. By contrast, ECCs can
also reduce bit errors, to a lesser extent, with less redundancy. But the required decoding circuits need
hundreds to thousands of logic gates (Gregori et al., 2003). In a context where logic and memory are tightly
integrated, these decoding circuits would need to be repeated many times. And as their logic is much more
complicated than that of binarized neural networks, they would be the dominant source of computation
and energy consumption. ECC circuits are also incompatible with the idea of integrating XNOR operations
within the sense amplifiers, and add significant read latency.
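The decoding work that the 2T2R cell avoids can be made concrete with the smallest single error correcting Hamming code, Hamming(7,4), chosen here for brevity rather than because it is one of the exact configurations benchmarked above. On every read, the decoder must compute a three-bit syndrome with XOR trees and then conditionally flip the indicated bit before any neural network arithmetic can start. The function names are illustrative, and the failure probability assumes independent bit errors.

```python
def hamming74_encode(d):
    """SEC Hamming(7,4): data bits d = [d1, d2, d3, d4] -> 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Compute the syndrome, flip the indicated bit (if any), and return the data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4    # 1-based position of a single error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def uncorrectable_word_probability(p, n=7):
    """Probability that an n-bit SEC codeword holds two or more independent bit errors."""
    return 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)

# Every single-bit error in every codeword is corrected.
for value in range(16):
    data = [(value >> k) & 1 for k in range(4)]
    word = hamming74_encode(data)
    for pos in range(7):
        corrupted = list(word)
        corrupted[pos] ^= 1
        assert hamming74_decode(corrupted) == data
print(uncorrectable_word_probability(1e-3))
```

Even this minimal code needs twelve two-input XOR gates for the syndrome plus a 3-to-7 position decoder per word read, which illustrates why replicating such decoders next to every memory block would outweigh the XNOR and POPCOUNT logic of a binarized network.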
3.2 Do All Errors Need to Be Corrected?
Based on the results of electrical measurements, and before designing the whole system, it is important
to know how low the OxRAM bit error rate needs to be for applications. To answer this question, we
performed simulations of binarized neural networks on four different tasks:
• MNIST handwritten digit classification (LeCun et al., 1998), the canonical task of machine learning. We use a fully connected neural network with two hidden layers of 1024 neurons.
• The CIFAR-10 image recognition task (Krizhevsky and Hinton, 2009), which consists of recognizing 32 × 32 color images belonging to ten categories of vehicles and animals. We use a deep convolutional network with six convolutional layers using 3 × 3 kernels and a stride of one, followed by three fully connected layers.
• The ImageNet recognition task, which consists of recognizing 128 × 128 color images out of 1000 classes. This task is considerably more difficult than MNIST and CIFAR-10. We use the historic AlexNet deep convolutional neural network (Krizhevsky et al., 2012).
• A medical task involving the analysis of electrocardiography (ECG) signals: automatic detection of electrode misplacement. This binary classification task takes as input the ECG signals of twelve electrodes. The recordings are sampled at 250 Hz and each lasts three seconds. To solve this task, we employed a convolutional neural network composed of five convolutional layers and two fully connected layers. The convolutional kernel (sliding window) sizes decrease layer by layer from 13 to 5. Each convolutional layer produces 64 filters detecting different features of the signal.
Figure 8. Recognition rate on the validation dataset, as a function of the bit error rate on the weights during inference, for the fully connected neural network on MNIST, the convolutional neural network on CIFAR-10, AlexNet on ImageNet (Top-5 and Top-1 accuracies), and the ECG analysis task. Each experiment was repeated five times; the mean recognition rate is shown.
Fully binarized neural networks were trained on these tasks on Nvidia Tesla GPUs using Python and the PyTorch deep learning framework. Once the neural networks were trained, we ran them on the validation sets of the datasets, artificially introducing errors into the neural network weights (some +1 weights are replaced by −1 weights, and vice versa). In this way, we emulate the impact of OxRAM bit errors.
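The error-injection procedure described above can be sketched as follows. This is a minimal NumPy illustration, not the PyTorch code used for this work: the function name `inject_bit_errors` is hypothetical, and it assumes that each binary weight is flipped independently with a probability equal to the bit error rate.

```python
import numpy as np


def inject_bit_errors(weights: np.ndarray, ber: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of a {-1, +1} weight array in which each weight is
    independently flipped (+1 -> -1, -1 -> +1) with probability `ber`."""
    flip = rng.random(weights.shape) < ber
    return np.where(flip, -weights, weights)


rng = np.random.default_rng(0)
# A random binary weight matrix standing in for one trained layer.
w = rng.choice(np.array([-1, 1], dtype=np.int8), size=(1024, 1024))
w_faulty = inject_bit_errors(w, ber=1e-3, rng=rng)
print("observed error rate:", float(np.mean(w_faulty != w)))
```

In a full evaluation, the same function would be applied to every layer's weights before running the validation set, and the run repeated with several random seeds.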
Fig. 8 shows the resulting validation accuracy as a function of the introduced bit error rate for the four considered tasks. In the case of ImageNet, both the Top-1 accuracy (proportion of validation images where the correct label is the top choice of the neural network) and the Top-5 accuracy (proportion of validation images where the correct label is within the top five choices of the neural network) are shown.
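The Top-1 and Top-5 definitions above are both instances of Top-k accuracy. A minimal NumPy sketch, with a hypothetical function name and toy scores chosen for illustration:

```python
import numpy as np


def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores.

    scores: (n_samples, n_classes) network outputs; labels: (n_samples,) class ids.
    """
    # Column indices of the k largest scores in each row (unordered).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = np.any(top_k == labels[:, None], axis=1)
    return float(np.mean(hits))


# Toy example with 2 samples and 5 classes.
scores = np.array([[0.10, 0.50, 0.20, 0.15, 0.05],
                   [0.30, 0.10, 0.20, 0.25, 0.15]])
labels = np.array([2, 0])
print("top-1:", top_k_accuracy(scores, labels, 1))
print("top-2:", top_k_accuracy(scores, labels, 2))
```

Here the first sample's correct class is ranked second, so it counts as a Top-2 hit but not a Top-1 hit; Top-5 on ImageNet is the same computation with k = 5 over 1000 classes.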
On the three vision tasks (MNIST, CIFAR-10, ImageNet), we see that extremely high levels of bit errors can be tolerated: up to a bit error rate of 10^-4, the network performs as well as with no errors. A minimal performance reduction starts to appear at a bit error rate of 10^-3 (the Top-5 accuracy on ImageNet is degraded from 69.7% to 69.5%). At a bit error rate of 0.01, the performance reduction becomes significant, and it is more substantial for harder tasks: MNIST accuracy is only degraded from 98.3% to 98.1%, CIFAR-10 accuracy is degraded from 87.5% to 86.9%, while ImageNet Top-5 accuracy is degraded from 69.7% to 67.9%.
The ECG task also shows extremely high bit error tolerance, but bit errors have an effect sooner than in the vision tasks. At a bit error rate of 10^-3, the validation accuracy is reduced from 82.1% to 78.7%, and at a bit error rate of 0.01, to 68.4%. This difference between vision and ECG tasks originates in the fact that ECG signals carry much less redundant information than images. Nevertheless, even for the ECG task, bit error rates that are high by the standards of conventional digital electronics can be accepted.
4 DISCUSSION
4.1 Projection at the System Level
4.1.1 Impact of In-Memory Computation
We now use the results of this paper to discuss the potential of our approach. Based on our ASIC design, and using the energy evaluation technique described at the end of the Methods section, we find that our system would consume 25 nJ to recognize one handwritten digit, using a fully connected neural network with two hidden layers of 1024 neurons. This is considerably less than processor-based options. Lane et al. (2016) analyze the energy consumption of inference on CPUs and GPUs: operating a fully connected neural network with two hidden layers of 1000 neurons requires 7 to 100 millijoules on a low-power CPU (from the Nvidia Tegra K1 or Qualcomm Snapdragon 800 systems on chip), and 1.3 millijoules on a low-power GPU (Nvidia Tegra K1).
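The size of this gap follows directly from the figures quoted above. The short computation below only divides those numbers; since the two networks differ slightly (1024 versus 1000 hidden neurons), the ratios should be read as orders of magnitude.

```python
# Energy figures quoted in the text (joules per MNIST digit).
E_BNN = 25e-9             # proposed design, 2 hidden layers of 1024 neurons
E_CPU = (7e-3, 100e-3)    # low-power CPU range, 2 hidden layers of 1000 neurons
E_GPU = 1.3e-3            # low-power GPU (Nvidia Tegra K1)

gpu_ratio = E_GPU / E_BNN
cpu_ratios = (E_CPU[0] / E_BNN, E_CPU[1] / E_BNN)

print(f"GPU needs about {gpu_ratio:.2g} times more energy")
print(f"CPU needs about {cpu_ratios[0]:.2g} to {cpu_ratios[1]:.2g} times more energy")
```

This gives roughly 5 × 10^4 for the GPU and about 3 × 10^5 to 4 × 10^6 for the CPU.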
These results are not surprising given the considerable overhead of memory access in modern computers. For example, Pedram et al. (2017) show that accessing data in a static RAM cache consumes around fifty times more energy than the integer addition of this data. If the data is stored in external dynamic RAM, the ratio increases to more than 3000. Binarized Neural Networks require minimal arithmetic: no multiplication, and only integer addition with a low number of bits. When a Binarized Neural Network is operated on a CPU or GPU, almost all of the energy is used to move data, and the inherent topology of the neural network is not exploited to reduce data movement. Switching to in-memory or near-memory computing approaches therefore has the potential to reduce energy consumption drastically for such tasks. This is especially true as, in inference hardware, synaptic weights are static and can be programmed into memory only once if the circuit does not need to change function.
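To make the "no multiplication" point concrete, the sketch below shows the standard XNOR-popcount formulation of a binarized dot product (as described by Courbariaux et al., 2016). Encoding +1 as bit 1 and −1 as bit 0 is an assumption of this illustration, as are the helper names `to_bits`, `xnor_popcount_dot` and `binary_neuron`; the `threshold` argument stands in for whatever bias or normalization a trained network folds into the neuron.

```python
def to_bits(values: list[int]) -> int:
    """Pack a {-1, +1} vector into an integer: element i -> bit i (+1 -> 1, -1 -> 0)."""
    return sum(1 << i for i, v in enumerate(values) if v == 1)


def xnor_popcount_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two packed length-n {-1, +1} vectors, without multiplications."""
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask   # XNOR: bit is 1 where the signs agree
    matches = bin(agree).count("1")     # popcount: a small integer sum
    return 2 * matches - n              # (#agreements) - (#disagreements)


def binary_neuron(x_bits: int, w_bits: int, n: int, threshold: int = 0) -> int:
    # +1 if the integer sum reaches the threshold, -1 otherwise.
    return 1 if xnor_popcount_dot(x_bits, w_bits, n) >= threshold else -1


x = [1, -1, -1, 1, 1]
w = [1, 1, -1, -1, 1]
print(xnor_popcount_dot(to_bits(x), to_bits(w), len(x)))  # equals sum(a * b for a, b in zip(x, w))
```

Only bitwise logic and a small counter are involved, which is why the energy of a conventional processor running such a network is dominated by fetching the weights rather than by the arithmetic itself.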
4.1.2 Impact of Binarization
We now look specifically at the benefits of relying on Binarized Neural Networks rather than real-valued digital ones. Binarized Neural Networks feature a considerably simpler architecture than conventional neural networks, but also require an increased number of neurons and synapses to achieve equivalent accuracy. It is therefore essential to compare the binarized and real-valued approaches.
Most digital ASIC implementations of neural network inference use eight-bit fixed-point arithmetic, the most famous example being the tensor processing unit developed by Google (Jouppi et al., 2017). At this precision, no degradation is usually seen for inference with regard to 32- and 64-bit floating-point arithmetic.
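For comparison, eight-bit fixed-point inference computes each dot product with integer multiply-accumulate operations. The sketch below uses symmetric per-tensor quantization with 32-bit accumulation, a common generic scheme assumed here for illustration; it is not claimed to be the exact arithmetic of the TPU or of the baseline used in Fig. 9, and the function names are hypothetical. Each of the n products is a genuine 8-bit multiplication, which is precisely what the binarized formulation avoids.

```python
import numpy as np


def quantize_int8(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization to int8, so that v ~= scale * q."""
    max_abs = float(np.max(np.abs(v)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
    return q, scale


def int8_dot(x: np.ndarray, w: np.ndarray) -> float:
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    # int8 x int8 products are accumulated in int32 to avoid overflow.
    acc = np.dot(qx.astype(np.int32), qw.astype(np.int32))
    return float(acc) * sx * sw


rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
w = rng.standard_normal(1024)
print("floating-point dot:", float(np.dot(x, w)), " int8 dot:", int8_dot(x, w))
```

The int8 result stays close to the floating-point one, consistent with the observation that eight bits are usually sufficient for inference.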
To investigate the benefits of Binarized Neural Networks, Fig. 9 looks at the energy consumption for inference on a single MNIST digit. We consider two architectures, a neural network with a single hidden layer (Fig. 9(a)) and another one with two hidden layers (Fig. 9(b)), and we vary the number of hidden neurons. Figs. 9(a) and 9(b) plot on the x axis the estimated energy consumption of Binarized Neural Networks using our architecture, based on the flow presented in the Methods section. They also plot the energy required for the arithmetic operations (sums and products) of an eight-bit fixed-point regular neural network, neglecting the overhead that is considered for the Binarized Neural Network. For both types of networks, the y axis shows the resulting accuracy on the MNIST task. We see that at equivalent accuracy, the Binarized Neural Network always consumes much less energy than the arithmetic operations of the real-valued one. Remarkably, the energy benefit depends very significantly on the targeted accuracy, and should therefore be investigated on a case-by-case basis. The highest energy benefits, a little less than a factor of ten, are seen at lower targeted accuracy.
Figure 9. Red circles: MNIST validation accuracy as a function of the inference energy of our Binarized Neural Network hardware design. Blue crosses: same, as a function of the energy used for arithmetic operations in a real-valued neural network employing eight-bit fixed-point arithmetic. The different points are obtained by varying the number of hidden neurons in (a) a one hidden layer neural network and (b) a two hidden layers neural network. Insets: number of synapses in each situation.
Binarized Neural Networks have other benefits with regard to real-valued digital networks: if the weights are stored in RRAM, the programming energy is reduced due to the lower memory requirements of Binarized Neural Networks. The area of the overall circuit is also expected to be reduced due to the absence of multipliers, which are large-area circuits.
4.1.3 Comparison with Analog Approaches
As mentioned in the introduction, a widely studied approach for implementing neural networks with RRAM is to rely on an analog electronics strategy, where Ohm’s law is exploited for implementing multiplications, and Kirchhoff’s current law for implementing additions (Ambrogio et al., 2018; Prezioso et al., 2015; Serb et al., 2016; Li et al., 2018; Wang et al., 2018). It is not straightforward to compare the digital approach presented in this paper with the analog approach, as the detailed performance of the latter depends strongly on the implementation details, device specifics and size of the neural networks. Nevertheless, several points can be made.
First, programming the devices is much simpler in our approach than in the analog one: one only needs to program a device and its complementary device into LRS and HRS, which can be achieved with two programming pulses. It is not necessary to verify the programming operation, as the neural network has inherent bit error tolerance. Programming RRAM for analog operation is a more challenging task, and usually requires a sequence of multiple pulses (Prezioso et al., 2015), which leads to higher programming energy and device aging.
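The two-pulse programming and differential read described above can be modeled behaviorally as follows. This is a sketch under explicit assumptions: the resistance values are illustrative placeholders rather than measured data, the convention that +1 places the bit-line device in LRS is hypothetical, and the names `Cell2T2R`, `program_2t2r` and `read_2t2r` are invented for this example. It shows why no verify step is needed: the read only compares the two devices, so a weakly programmed pair still reads correctly as long as its LRS device stays below its HRS device.

```python
from dataclasses import dataclass

# Illustrative resistance values (assumptions, not measured data).
R_LRS_OHM = 5e3
R_HRS_OHM = 100e3


@dataclass
class Cell2T2R:
    r_bl: float   # resistance of the device on bit line BL
    r_blb: float  # resistance of the complementary device on BLb


def program_2t2r(weight: int) -> tuple[list[str], Cell2T2R]:
    """Write a binary weight (+1/-1) with one SET and one RESET pulse, no verify."""
    if weight == 1:
        return ["SET(BL)", "RESET(BLb)"], Cell2T2R(r_bl=R_LRS_OHM, r_blb=R_HRS_OHM)
    if weight == -1:
        return ["RESET(BL)", "SET(BLb)"], Cell2T2R(r_bl=R_HRS_OHM, r_blb=R_LRS_OHM)
    raise ValueError("weight must be +1 or -1")


def read_2t2r(cell: Cell2T2R) -> int:
    # Differential read: only the ordering of the two resistances matters,
    # not their values relative to a fixed reference.
    return 1 if cell.r_bl < cell.r_blb else -1


pulses, cell = program_2t2r(-1)
print(pulses, read_2t2r(cell))
# A weakly programmed pair: LRS far above nominal, HRS far below, order preserved.
print(read_2t2r(Cell2T2R(r_bl=40e3, r_blb=60e3)))
```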
For neural network operation, the analog approach and ours function differently. Our approach reads synaptic values using the sense amplifier, which is a highly energy-efficient and fast circuit that can operate in hundreds of picoseconds in advanced CMOS nodes (Zhao et al., 2014). This sense amplifier inherently performs the multiplication operation, and the addition then needs to be performed by a low-bit-width digital integer addition circuit. In our estimates for an advanced node, a read operation and the corresponding addition together typically consume fourteen femtojoules. In the analog approach, reading is performed by applying a voltage pulse, and inherently produces the multiplication through Ohm’s law, but also the addition through Kirchhoff’s current law. It is therefore attractive, but on the other hand requires analog CMOS overhead circuitry such as operational amplifiers, which can bring large energy and area overhead. Which of the two approaches is the more energy efficient will probably depend strongly on memory size, application and targeted accuracy.
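As a back-of-the-envelope illustration of the per-synapse figure above, the sketch below multiplies the number of synapses of a fully connected MNIST network by fourteen femtojoules. The layer sizes (784 inputs for 28 × 28 images, two hidden layers of 1024 neurons, 10 outputs) are assumed for illustration, and the estimate ignores neuron circuits, control and data movement, so it is an order-of-magnitude sketch rather than a reproduction of the system-level estimate given earlier.

```python
E_READ_PLUS_ADD = 14e-15  # J per synapse (read + addition), figure quoted in the text


def fc_synapse_count(layer_sizes: list[int]) -> int:
    """Number of weights in a fully connected network with the given layer widths."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))


layers = [784, 1024, 1024, 10]  # ASSUMPTION: MNIST input size and 10 output classes
n_syn = fc_synapse_count(layers)
energy_nj = n_syn * E_READ_PLUS_ADD * 1e9
print(f"{n_syn} synapses, about {energy_nj:.1f} nJ for reads and additions")
```

This gives about 1.86 million synapses and roughly 26 nJ per inference for the reads and additions alone.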
Another advantage of the digital approach is that it is much simpler to design, test and verify, as it relies entirely on standard VLSI design tools. On the other hand, an advantage of the analog approach is that, for small memory sizes, it may function without access transistors, resulting in higher memory densities (Prezioso et al., 2015).
4.1.4 Impact in Terms of Programming Energy and Device Aging
Table 1. RRAM Properties with Different Programming Conditions
Programming condition             Very strong    Strong         Weak
SET compliance current            600 µA         55 µA          20 µA
RESET voltage                     2.5 V          2.5 V          1.5 V
Programming time                  100 ns         100 ns         100 ns
1T1R bit error rate               < 10^-6        9.7 × 10^-5    3.3 × 10^-2
2T2R bit error rate               < 10^-10       < 10^-10       2 × 10^-3
Programming energy (SET/RESET)    120/150 pJ     11/14 pJ       4/5 pJ
Cyclability (cycles)              100            > 10,000       > 10^10
A final comment is that the bit error tolerance of binarized neural networks can have considerable benefits at the system level. Table 1 summarizes the measured properties of RRAM cells in different programming conditions. We see that weaker programming conditions (lower SET compliance current and RESET voltage) lead to higher bit error rates, but to reduced programming energy and much higher cyclability. They can also be used with smaller memory cells, as the transistor in the 1T1R structure needs to drive lower currents. With the benefits of the 2T2R structure combined with the inherent resilience of Binarized Neural Networks seen in the previous section, it is conceivable to use very weak programming conditions. For example, Table 1 shows that using a SET compliance current of 20 µA, a RESET voltage of 1.5 V and a programming time of 100 ns, we get a 2T2R bit error rate of 2 × 10^-3, which would only marginally reduce the performance of a binarized neural network on vision tasks (Fig. 8). On the other hand, devices in such conditions need no more than five picojoules per programming operation, and show outstanding endurance of more than ten billion cycles.
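The energy trade-off in Table 1 can be made explicit per weight. Assuming, as in Section 4.1.3, that a 2T2R weight is written with one SET pulse on one device and one RESET pulse on its complement, the sketch below adds the SET and RESET energies of each programming condition. The dictionary layout and function name are hypothetical; the numbers are those of Table 1.

```python
# Programming energies (SET, RESET) from Table 1, in picojoules.
SET_RESET_ENERGY_PJ = {
    "very strong": (120, 150),
    "strong": (11, 14),
    "weak": (4, 5),
}


def weight_write_energy_pj(condition: str) -> int:
    # ASSUMPTION: one SET pulse plus one RESET pulse per 2T2R weight.
    set_pj, reset_pj = SET_RESET_ENERGY_PJ[condition]
    return set_pj + reset_pj


for cond in SET_RESET_ENERGY_PJ:
    print(f"{cond:>11}: {weight_write_energy_pj(cond)} pJ per 2T2R weight")

ratio = weight_write_energy_pj("very strong") / weight_write_energy_pj("weak")
print(f"very strong / weak: {ratio:.0f}x")
```

Under this assumption, moving from the very strong to the weak condition divides the write energy per weight by thirty, while Table 1 shows cyclability rising from about 100 to more than 10^10 cycles.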
4.2 Conclusion
In this work, we have proposed an architecture for implementing binarized neural networks with RRAMs. Our approach incorporates several biologically plausible ideas:
• Fully co-locating logic and memory,
• Relying only on low precision computation (through the Binarized Neural Network concept),
• Avoiding multiplication altogether,
• Accepting some errors without formal error correction.
At the same time, it relies on conventional microelectronics ideas that are non-biological in nature:
• Relying on fixed-point arithmetic to compute sums, whereas brains use analog computation,
• Using sense amplifier circuits, which are not brain-inspired,
• Using a differential structure to reduce errors, a traditional electrical engineering strategy.
Based on these ideas, we designed, fabricated and extensively tested a memory structure with its periphery circuitry, and designed and simulated a full digital system based on these measurements. We see that our structure allows implementing neural networks without the Error Correcting Codes that are usually used with emerging memories, features very attractive properties in terms of energy consumption, and allows using RRAM devices in a “weak” programming regime where they have low programming energy and outstanding endurance. These results highlight that, although in-memory computing cannot rely on Error Correcting Codes, this does not have to translate into stringent requirements on device variability if a differential memory architecture is chosen.
When working on bioinspiration, drawing the line between biological plausibility and embracing the differences between the brain and electronic devices is always a challenging question. In this project, we try to show that digital electronics can be enriched by biologically plausible ideas. When working with nanodevices, it can be beneficial to incorporate device physics questions into the design, and not to seek the level of determinism to which CMOS has accustomed us.
This work opens multiple prospects. On the device front, it could be possible to develop more integrated 2T2R structures, to increase the density of the memories. The concept could also be adapted to other kinds of emerging memories, such as phase change memories or spin torque magnetoresistive memories. At the system level, we are now in a position to fabricate larger systems, and to investigate the extension of the approach to more varied neural network architectures, such as convolutional and recurrent ones. In the case of convolutional networks, the dilemma is between keeping the in-memory computing approach to its fullest, by physically replicating convolutional kernels, and implementing some sequential computation to minimize resources, a trade-off that some works have already started to evaluate. These considerations open the way for truly low energy artificial intelligence for both servers and embedded systems.
CONFLICT OF INTEREST STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial
relationships that could be construed as a potential conflict of interest.
AUTHOR CONTRIBUTIONS
EV and EN were in charge of fabrication, and of the initial RRAM characterization. JMP performed the
CMOS design of the memory array. MB performed the electrical characterization. TH designed the BNN
systems. TH, BP and DQ performed the BNN simulations. DQ wrote the initial version of the paper. JOK,
EV, JMP and DQ planned and supervised the project. All authors participated in data analysis and in the writing of the paper.
FUNDING
This work is supported by the European Research Council Starting Grant NANOINFER (reference: 715872)
and the Agence Nationale de la Recherche grant NEURONIC (ANR-18-CE24-0009).
DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.
REFERENCES
Ambrogio, S., Narayanan, P., Tsai, H., Shelby, R. M., Boybat, I., Nolfo, C., et al. (2018). Equivalent-
accuracy accelerated neural-network training using analogue memory. Nature 558, 60
Ando, K., Ueyoshi, K., Orimo, K., Yonekawa, H., Sato, S., Nakahara, H., et al. (2017). Brein memory:
A 13-layer 4.2 k neuron/0.8 m synapse binary/ternary reconfigurable in-memory deep neural network
accelerator in 65 nm cmos. In Proc. VLSI Symp. on Circuits (IEEE), C24–C25
Bocquet, M., Hirtzlin, T., Klein, J.-O., Nowak, E., Vianello, E., Portal, J.-M., et al. (2018). In-memory and
error-immune differential rram implementation of binarized deep neural networks. In IEDM Tech. Dig.
(IEEE), 20.6.1
Chen, A. (2016). A review of emerging non-volatile memory (nvm) technologies and applications.
Solid-State Electronics 125, 25–38
Chen, W. H., Li, K. X., Lin, W. Y., Hsu, K. H., Li, P. Y., Yang, C. H., et al. (2018). A 65nm 1mb nonvolatile
computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge
processors. In Proc. ISSCC. 494–496. doi:10.1109/ISSCC.2018.8310400
Chen, W. H., Lin, W. J., Lai, L. Y., Li, S., Hsu, C. H., Lin, H. T., et al. (2017). A 16mb dual-mode ReRAM
macro with sub-14ns computing-in-memory and memory functions enabled by self-write termination
scheme. In IEDM Tech. Dig. 28.2.1–28.2.4. doi:10.1109/IEDM.2017.8268468
Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., and Bengio, Y. (2016). Binarized neural networks:
Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint
arXiv:1602.02830
Covi, E., Brivio, S., Serb, A., Prodromakis, T., Fanciulli, M., and Spiga, S. (2016). Analog memristive
synapse in spiking networks implementing unsupervised learning. Frontiers in neuroscience 10, 482
Editorial (2018). Big data needs a hardware revolution. Nature 554, 145. doi:10.1038/d41586-018-01683-1
Giacomin, E., Greenberg-Toledo, T., Kvatinsky, S., and Gaillardon, P.-E. (2019). A robust digital rram-
based convolutional block for low-power image processing and learning applications. IEEE Transactions
on Circuits and Systems I: Regular Papers 66, 643–654
Gregori, S., Cabrini, A., Khouri, O., and Torelli, G. (2003). On-chip error correcting techniques for
new-generation flash memories. Proc. IEEE 91, 602–616
Grossi, A., Nowak, E., Zambelli, C., Pellissier, C., Bernasconi, S., Cibrario, G., et al. (2016). Fundamental
variability limits of filament-based rram. In IEDM Tech. Dig. (IEEE), 4–7
Grossi, A., Vianello, E., Zambelli, C., Royer, P., Noel, J.-P., Giraud, B., et al. (2018). Experimental
investigation of 4-kb rram arrays programming conditions suitable for tcam. IEEE Transactions on Very
Large Scale Integration (VLSI) Systems 26, 2599–2607
Hsieh, W.-T., Chih, Y.-D., Chang, J., Lin, C.-J., and King, Y.-C. (2017). Differential Contact RRAM Pair
for Advanced CMOS Logic NVM applications. Proc. SSDM, 171
Ielmini, D. and Wong, H.-S. P. (2018). In-memory computing with resistive switching devices. Nature
Electronics 1, 333
Indiveri, G. and Liu, S.-C. (2015). Memory and information processing in neuromorphic systems. Proc.
IEEE 103, 1379–1397
Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., et al. (2017). In-datacenter
performance analysis of a tensor processing unit. In Proc. ISCA (IEEE), 1–12
Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep.,
Citeseer
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional
neural networks. In Advances in neural information processing systems. 1097–1105
Lane, N. D., Bhattacharya, S., Georgiev, P., Forlivesi, C., Jiao, L., Qendro, L., et al. (2016). Deepx: A
software accelerator for low-power deep learning inference on mobile devices. In Proceedings of the
15th International Conference on Information Processing in Sensor Networks (IEEE Press), 23
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document
recognition. Proc. IEEE 86, 2278–2324
Li, C., Belkin, D., Li, Y., Yan, P., Hu, M., Ge, N., et al. (2018). Efficient and self-adaptive in-situ learning
in multilayer memristor neural networks. Nature communications 9, 2385
Lin, X., Zhao, C., and Pan, W. (2017). Towards accurate binary convolutional neural network. In Advances
in Neural Information Processing Systems. 345–353
Ly, D. R. B., Grossi, A., Fenouillet-Beranger, C., Nowak, E., Querlioz, D., and Vianello, E. (2018). Role
of synaptic variability in resistive memory-based spiking neural networks with unsupervised learning. J.
Phys. D: Applied Physics
Markidis, S., Der Chien, S. W., Laure, E., Peng, I. B., and Vetter, J. S. (2018). Nvidia tensor core
programmability, performance & precision. In 2018 IEEE International Parallel and Distributed
Processing Symposium Workshops (IPDPSW) (IEEE), 522–531
Natsui, M., Chiba, T., and Hanyu, T. (2018). Design of mtj-based nonvolatile logic gates for quantized
neural networks. Microelectronics journal 82, 13–21
Pedram, A., Richardson, S., Horowitz, M., Galal, S., and Kvatinsky, S. (2017). Dark memory and
accelerator-rich system optimization in the dark silicon era. IEEE Design & Test 34, 39–50
Prezioso, M., Merrikh-Bayat, F., Hoskins, B., Adam, G. C., Likharev, K. K., and Strukov, D. B. (2015).
Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature
521, 61
Querlioz, D., Bichler, O., Vincent, A. F., and Gamrat, C. (2015). Bioinspired programming of memory
devices for implementing an inference engine. Proc. IEEE 103, 1398–1416
Rastegari, M., Ordonez, V., Redmon, J., and Farhadi, A. (2016). Xnor-net: Imagenet classification using
binary convolutional neural networks. In Proc. ECCV (Springer), 525–542
Saïghi, S., Mayr, C. G., Serrano-Gotarredona, T., Schmidt, H., Lecerf, G., Tomas, J., et al. (2015). Plasticity
in memristive devices for spiking neural networks. Frontiers in neuroscience 9, 51
Serb, A., Bill, J., Khiat, A., Berdan, R., Legenstein, R., and Prodromakis, T. (2016). Unsupervised
learning in probabilistic neural networks with multi-state metal-oxide memristive synapses. Nature
communications 7, 12611
Shafiee, A., Nag, A., Muralimanohar, N., Balasubramonian, R., Strachan, J. P., Hu, M., et al. (2016). Isaac:
A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH
Computer Architecture News 44, 14–26
Shih, Y.-H., Hsu, M.-Y., Lin, C. J., and King, Y.-C. (2017). Twin-bit Via RRAM in 16nm FinFET Logic
Technologies. Proc. SSDM, 137
Sun, X., Peng, X., Chen, P.-Y., Liu, R., Seo, J.-s., and Yu, S. (2018a). Fully parallel rram synaptic array for
implementing binary neural network with (+1, -1) weights and (+1, 0) neurons. In Proc. ASP-DAC
(IEEE Press), 574–579
Sun, X., Yin, S., Peng, X., Liu, R., Seo, J.-s., and Yu, S. (2018b). Xnor-rram: A scalable and parallel
resistive synaptic architecture for binary neural networks. algorithms 2, 3
Tang, T., Xia, L., Li, B., Wang, Y., and Yang, H. (2017). Binary convolutional neural network on rram. In
Proc. ASP-DAC (IEEE), 782–787
Wang, Z., Ambrogio, S., Balatti, S., and Ielmini, D. (2015). A 2-transistor/1-resistor artificial synapse
capable of communication and stochastic learning in neuromorphic systems. Frontiers in neuroscience
8, 438
Wang, Z., Joshi, S., Savel’ev, S., Song, W., Midya, R., Li, Y., et al. (2018). Fully memristive neural
networks for pattern classification with unsupervised learning. Nature Electronics 1, 137
Yu, S. (2018). Neuro-inspired computing with emerging nonvolatile memorys. Proc. IEEE 106, 260–285
Yu, S., Li, Z., Chen, P.-Y., Wu, H., Gao, B., Wang, D., et al. (2016). Binary neural network with 16 mb
rram macro chip for classification and online training. In IEDM Tech. Dig. (IEEE), 16–2
Zhao, W., Chappert, C., Javerliac, V., and Noziere, J.-P. (2009). High speed, high stability and low power
sensing amplifier for mtj/cmos hybrid logic circuits. IEEE Transactions on Magnetics 45, 3784–3787
Zhao, W., Moreau, M., Deng, E., Zhang, Y., Portal, J.-M., Klein, J.-O., et al. (2014). Synchronous
non-volatile logic gate design based on resistive switching memories. IEEE Transactions on Circuits
and Systems I: Regular Papers 61, 443–454
