One-step regression and classification with crosspoint resistive memory
  arrays by Sun, Zhong et al.
1 
 
One-step regression and classification with crosspoint resistive memory arrays 
Authors: Zhong Sun, Giacomo Pedretti, Alessandro Bricalli, Daniele Ielmini* 
Affiliations: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano 
and IU.NET, Piazza L. da Vinci 32 – 20133 Milano, Italy. 
*Correspondence to: daniele.ielmini@polimi.it 
 
One Sentence Summary: Machine learning algorithms such as linear regression and logistic 
regression are trained in one step with crosspoint resistive memory arrays. 
  
2 
 
Abstract: Machine learning has been getting a large attention in the recent years, as a tool to 
process big data generated by ubiquitous sensors in our daily life. High speed, low energy 
computing machines are in demand to enable real-time artificial intelligence at the edge, i.e., 
without the support of a remote frame server in the cloud. Such requirements challenge the 
complementary metal-oxide-semiconductor (CMOS) technology, which is limited by the 
Moore’s law approaching its end and the communication bottleneck in conventional computing 
architecture. Novel computing concepts, architectures and devices are thus strongly needed to 
accelerate data-intensive applications. Here we show a crosspoint resistive memory circuit with 
feedback configuration can execute linear regression and logistic regression in just one step by 
computing the pseudoinverse matrix of the data within the memory. The most elementary 
learning operation, that is the regression of a sequence of data and the classification of a set of 
data, can thus be executed in one single computational step by the novel technology. One-step 
learning is further supported by simulations of the prediction of the cost of a house in Boston and 
the training of a 2-layer neural network for MNIST digit recognition. The results are all obtained 
in one computational step, thanks to the physical, parallel, and analog computing within the 
crosspoint array. 
  
3 
 
INTRODUCTION  
Resistive memories, also known as memristors (1), including resistive switching memory (RRAM) and 
phase change memory (PCM), are emerging as a novel technology for high density storage (2,3), 
neuromorphic hardware (4,5) and stochastic security primitives, such as random number generators (6,7). 
Thanks to their ability to store analog values and to their excellent programming speed, resistive 
memories have also been demonstrated for executing in-memory computing (8-17), which eliminates the 
data transfer between the memory and the processing unit to improve the time and energy efficiency of 
computation. With a crosspoint architecture, resistive memories can be naturally utilized to perform 
matrix-vector multiplication (MVM), by exploiting fundamental physical laws such as the Ohm’s law and 
the Kirchhoff’s law of electric circuits (8). Crosspoint MVM has been shown to accelerate various data-
intensive tasks, such as training and inference of deep neural networks (11-14), signal and image 
processing (15), and the iterative solution of a system of linear equations (16) or a differential equation 
(17). With a feedback circuit configuration, the crosspoint array has been shown to solve systems of linear 
equations and calculate matrix eigenvectors in one step (18). Such a low computational complexity is 
attributed to the massive parallelism within the crosspoint array, and to the analog storage and 
computation with physical MVM. Here, we show that a crosspoint resistive memory circuit with feedback 
configuration is able to accelerate fundamental learning functions, such as predicting the next point of a 
sequence by linear regression, or attributing a new input to either one of two classes of objects by logistic 
regression. These operations are completed in just one step in the circuit, in contrast to the iterative 
algorithms running on conventional digital computers, which approach the solution with a polynomial 
time complexity. 
RESULTS 
Linear regression in one step 
Linear regression is a fundamental machine learning (ML) model for regressive and predictive analysis in 
various disciplines, such as biology, social science, economics and management (19-21). Logistic 
4 
 
regression, instead, is a typical tool for classification tasks (22), e.g., acting as the last classification layer 
in a deep neural network (23,24). Due to their simplicity, interpretability and well-known properties, 
linear and logistic regression stand out as the most popular ML algorithms across many fields (25). A 
linear regression model is described by an overdetermined linear system given by:  
𝑿𝒘 = 𝒚, (1) 
where 𝑿 is an 𝑁 × 𝑀 matrix (𝑁 > 𝑀), 𝒚 is a known vector with a size of 𝑁 × 1, and 𝒘 is the 
unknown weight vector (𝑀 × 1) to be solved. As the problem is overdetermined, there is typically no 
exact solution 𝒘 to Eq. (1). The best solution of Eq. (1) can be obtained by the least squares error (LSE) 
approach, which minimizes the norm of error 𝜺 = 𝑿𝒘 − 𝒚, namely ‖𝜺‖ = ‖𝑿𝒘 − 𝒚‖2 where ‖∙‖2 is 
the Euclidean norm. The vector 𝒘 minimizing ‖𝜺‖ is obtained by the pseudoinverse (21,23,24) [or 
Moore-Penrose inverse (26)] 𝑿+, given by: 
𝒘 = 𝑿+𝒚 = (𝑿𝑻𝑿)
−𝟏
𝑿𝑻𝒚, (2) 
where 𝑿𝑻 is the transpose of matrix 𝑿.  
To obtain the solution in Eq. (2), we propose a crosspoint resistive memory circuit in Fig. 1A, where the 
matrix 𝑿 is mapped by the conductance matrix 𝑮𝑿 in a pair of crosspoint arrays of analog resistive 
memories, the vector 𝒚 corresponds to the opposite of the input current vector 𝒊 = [𝐼1; 𝐼2; … ; 𝐼𝑁], and 𝒘 
is represented by the output voltage vector 𝒗 = [𝑉1; 𝑉2; … ; 𝑉𝑀] (𝑀 = 2 in Fig. 1A).  
For a practical demonstration of this concept, we adopted arrays of RRAM devices composed of a HfO2 
dielectric layer sandwiched between a Ti top electrode and a C bottom electrode (27). This type of RRAM 
device can be programmed to any arbitrary analog conductance within a certain range, thus allowing to 
represent the matrix elements 𝑋𝑖𝑗 of the matrix 𝑿 with sufficient accuracy (18). Representative analog 
conductance levels were programmed by controlling the compliance current during the set transition, as 
shown in Fig. S1. The crosspoint arrays are connected within a nested feedback loop (28) by 𝑁 
operational amplifiers (OAs) from the left array to the right array, and 𝑀 OAs from the right array to the 
5 
 
left array. Briefly, the first set of OAs gives a negative transfer of the current, while the second one gives 
a positive transfer, resulting in an overall negative feedback, hence stable operation of the circuit with 
virtual ground inputs of all OAs. A detailed analysis of the circuit stability is reported in Supplementary 
text 1. 
According to Ohm’s law and Kirchhoff’s law in Fig. 1A, the input currents at the OAs from the left 
crosspoint array are 𝑮𝑿𝒗 + 𝒊, thus the output voltages applied to the right crosspoint array are 𝒗𝒓 =
−
(𝑮𝑿𝒗+𝒊)
𝐺𝑇𝐼
, where 𝐺𝑇𝐼 is the feedback conductance of transimpedance amplifiers (TIAs). The right 
crosspoint array operates another MVM between the voltage vector 𝒗𝒓 and the transpose conductance 
matrix 𝑮𝑿
𝑻, resulting in a current vector −𝑮𝑿
𝑻 (𝑮𝑿𝒗+𝒊)
𝐺𝑇𝐼
, which is forced to zero at the input nodes of the 
second set of OAs, namely:  
𝑮𝑿
𝑻(𝑮𝑿𝒗 + 𝒊) = 𝟎. (3) 
The steady-state voltages 𝒗 at the left array are thus given by: 
𝒗 = −(𝑮𝑿
𝑻𝑮𝑿)
−1
𝑮𝑿
𝑻𝒊,    (4) 
which is Eq. (2) with 𝑮𝑿, 𝒊 and 𝒗 representing 𝑿, −𝒚 and 𝒘, respectively. The crosspoint array 
circuit of Fig. 1A thus solves the linear regression problem in just one step. 
The circuit of Fig. 1A was implemented in hardware using RRAM devices arranged within a crosspoint 
architecture on a printed circuit board (see Materials and Methods with Fig. S2). As a basic model, we 
considered the simple linear regression of points (𝑥𝑖, 𝑦𝑖), where 𝑖 = 1,2, … , 𝑁, to be fitted by a linear 
model 𝑤0 + 𝑤1𝑥𝑖 = 𝑦𝑖, where 𝑤0 and 𝑤1 are the intercept with axis 𝑦 and the slope, respectively, of 
the best fitting line. To solve this problem in hardware, we encoded the matrix 𝑿:  
𝑿 = [
1
1
𝑥1
𝑥2
⋮
1
⋮
𝑥𝑁
], (5) 
6 
 
in the crosspoint arrays. A column of discrete resistors with G = 100 S was used to represent the first 
column of 𝑿 in Eq. (5), which is identically equal to 1. The second columns of both arrays were 
implemented with reconfigurable RRAM devices. A total number of 𝑁 = 6 data points was considered, 
with each 𝑥𝑖 implemented as a RRAM conductance with unit 100 S. The unit of conductance was 
chosen according to the range of linear conduction of the device (18), thus ensuring a good computation 
accuracy. Other aspects such as the current limit of the OAs and the power consumption should also be 
considered to select the best memory devices in the circuit. Although the conductance values in the two 
crosspoint arrays should be identical, some mismatch can be tolerated for practical implementations 
(Supplementary text 2). A program/verify technique was used to minimize the relative error (less than 
5%) between the values of 𝑿 in the two crosspoint arrays (Fig. S3). The data ordinates −𝒚 were instead 
applied as input currents. The input currents should be kept relatively small so that the resulting output 
signal is low enough to prevent disturbance of the device states in the stored matrix. 
Given the matrix 𝑿 stored in the crosspoint arrays and an input current vector 𝒚, the corresponding 
linear system was then solved by the circuit in one step. Fig. 1B shows the resulting dataset for an input 
current vector 𝒊 = [0.3; 0.4; 0.4; 0.5; 0.5; 0.6]𝐼0 with 𝐼0 = 100 A to align with the conductance 
transformation unit. Fig. 1B also shows the regression line, obtained by the circuit output voltages 
representing weights 𝑤0 and 𝑤1. The comparison with the analytical regression line shows a relative 
error of -4.86% and 0.82% for 𝑤0 and 𝑤1, respectively. The simulated transient behavior of the circuit 
is shown in Fig. S4, evidencing that the linear regression weights are computed within about 1 s. By 
changing the input vector, a different linear system was formed and solved by the circuit, as shown in 
Fig. 1C for 𝒊 = [0.3; 0.3; 0.5; 0.4; 0.5; 0.7]𝐼0. The result evidences that a more scattered dataset can also 
be correctly fitted by the circuit. 
The crosspoint circuit also naturally yields the prediction of the value 𝑦∗ in respondence of a new point 
at position 𝑥∗. This is obtained by adding an extra row in the left crosspoint, where an additional RRAM 
element is used to implement the new coordinate 𝑥∗ (Fig. S5). The results are shown in Fig. 1C, 
7 
 
indicating a prediction by the circuit which is only 1% smaller compared to the analytical prediction. 
Fig. S6 reports more linear regression and prediction results of various datasets. Linear regression with 2 
independent variables was also demonstrated by a crosspoint array of 3 columns, with results shown in 
Fig. S7. These results support the crosspoint circuit for the solution of linear regression models in various 
dimensions. The linear regression concept can also be extended to nonlinear regression models, e.g., 
polynomial regression (29), to better fit a dataset and thus make better predictions. By loading the 
polynomial terms in crosspoint arrays, the circuit can also realize polynomial regression in one step 
(Supplementary text 3 with Fig. S8). 
Logistic regression 
Logistic regression is a binary model that is extensively used for object classification and pattern 
recognition. Different from linear regression which is a fully linear model, logistic regression also 
includes a nonlinear sigmoid function to generate the binary output. A logistic regression model can be 
viewed as the single-layer feed-forward neural network in Fig. 2A. Here, the weighted summation vector 
𝒔 of input signals to the nonlinear neuron is given by:  
𝒔 = 𝑿𝒘. (6) 
where 𝑿 is the matrix containing the independent-variable values of all samples, and 𝒘 is the vector of 
the synaptic weights. The neuron outputs are thus simply given by the vector 𝒚 = 𝑓(𝒔), where 𝑓 is the 
nonlinear function of the neuron. To compute the weights of a logistic regression model with a sample 
matrix 𝑿 and a label vector 𝒚, the logit transformation can be first executed (30). By applying the 
inverse of sigmoid function, the label vector 𝒚 is converted to a summation vector 𝒔, namely 𝒔 =
𝑓−1(𝒚). As a result, the logistic regression is reduced to a linear regression problem, where the weights 
can be obtained in one step by the pseudoinverse concept:  
𝒘 = 𝑿+𝒔. (7) 
For simplicity, we assumed that the nonlinear neuron function is instead a step function and that the 
8 
 
summation vector 𝒔 in Fig. 2A is binarized according to: 
𝑠𝑖 = {
𝑎, 𝑖𝑓 𝑦𝑖 = 1
−𝑎, 𝑖𝑓 𝑦𝑖 = 0
, (8) 
where 𝑎 is a positive constant for regulating the output voltage in the circuit. After this transformation, 
the weights can be computed directly with the pseudoinverse circuit of Fig. 1A. 
Fig. 2B shows a set of 6 data points with coordinates (𝑥1, 𝑥2) divided into two classes, namely 𝑦 = 0 
(open) and 𝑦 = 1 (full). Fig. 2C shows the matrix 𝑿 where the first column is equal to 𝟏, while the 
other columns represent the coordinates 𝑥1 and 𝑥2 of the dataset. The sample matrix 𝑿 was mapped in 
the two crosspoint arrays of Fig. 1A, and input current was applied to each row to represent 𝒔 with 𝑎 =
0.2, according to Eq. (8). The circuit schematic is reported in Fig. S9 together with experimental results 
and relative errors of logistic regression. The simulated transient behavior of the circuit is shown in Fig. 
S10, with a computing time around 0.6 s. The output voltage yields the weights 𝒘 = [𝑤0; 𝑤1; 𝑤2] with 
𝑠 = 𝑤0 + 𝑤1𝑥1 + 𝑤2𝑥2 = 0 representing the decision boundary for classification, where 𝑠 ≥ 0 
indicates the domain of class ‘1’ and 𝑠 < 0 the domain of class ‘0’. This is shown as a line in Fig. 2B, 
displaying a tight agreement with the analytical solution. The crosspoint circuit enables a one-step 
solution of logistic regression with datasets of various dimensionalities and sizes accommodated by the 
crosspoint arrays. Similar to linear regression, the circuit can also provide one-step classification of any 
new (unlabeled) point, which is stored in a grounded additional row of the left crosspoint array. The 
current flowing in the row yields the class of the new data point. Though here we consider two cases 
containing only positive independent-variable values for linear/logistic regression, datasets containing 
negative values can also be addressed by simply translating the entire data to be positive, as explained in 
Supplementary Fig. S11.  
Linear regression of Boston housing dataset 
While the circuit capability has been demonstrated in experiment for small models, the matrix size is an 
obvious concern that needs to be addressed for real-world applications. To study the circuit scalability, we 
9 
 
considered a large dataset, namely the Boston housing price list for linear regression (31,32). The dataset 
collects 13 attributes and the prices of 506 houses, 333 of which are used for training the linear regression 
model while the rest are used for testing the model. The attributes are summarized in the Supplementary 
text 4. We performed linear regression with the training set to compute the weights with the crosspoint 
circuit, applied the regression model to predict house prices of the test set.  
Fig. 3A shows the matrix 𝑿 for the training set, including a first column of 1, and the other columns 
recording the 13 attributes, and the input vector 𝒚, representing the corresponding prices. The matrix 𝑿 
was rescaled to make the conductance values in crosspoint arrays uniform, and the vector 𝒚 was also 
scaled down to prevent excessive output voltage 𝒘 (see Fig. S12). We simulated the linear regression 
circuit with SPICE (Simulation Program with Integrated Circuit Emphasis, see Materials and Methods), 
where the RRAM devices were assumed to accurately map the matrix values within 8-bit precision. 
Fig. 3B shows the calculated 𝒘 obtained from the output voltage in the simulated circuit, with the 
relative errors remaining within ±1%, thus demonstrating the good accuracy and scalability of the circuit.  
Fig. 3C shows the obtained regression results compared with the real house prices of the training set. A 
standard deviation P of 4,733$ is obtained from SPICE simulations, which is in line with the analytical 
solution P’ = 4,732$. Fig. 3D shows the predicted prices of the test set compared to the real prices. The 
standard deviation from the circuit simulation is P = 4,779$, in good agreement with the analytical 
results P’ = 4,769$. The resulting standard deviation is only slightly larger than the training set, which 
supports the ability for generalization of the model. One-step price prediction for test samples is possible 
by storing the unlabeled attributes in additional rows of the left crosspoint array and measuring the 
corresponding current, as indicated in Fig. S13. 
2-layer neural network training 
Logistic regression is widely employed in the last fully-connected classification layer in deep neural 
networks. The crosspoint circuit thus provides a hardware acceleration of the computation of the last-
10 
 
layer weights for training of a neural network. To test the crosspoint circuit as an accelerator for training 
neural networks, we considered the 2-layer perceptron in Fig. 4A, where the first-layer weights are set 
randomly (33,34), while the second-layer weights can be obtained by the pseudoinverse method in the 
crosspoint circuit. Note that a standard technique to train a 2-layer neural network is the backpropagation 
algorithm which reduces the squares error iteratively (35). In contrast to iterative backpropagation, the 
pseudoinverse approach can reach the LSE solution in just one step, thus providing a fast and energy-
efficient acceleration of network training. 
As a case study for neural network training, we adopted the MNIST dataset (36). To reduce the circuit 
size in the simulations, we used only 3,000 out of 50,000 samples to train the neural network. Also, to 
provide an efficient fan-out (for instance 4) for the first layer (34), the image size was down-sampled to 
be 14×14, resulting in a network of 196 input neurons, 784 hidden neurons and 10 output neurons for the 
classifications of the digits from 0 to 9. The training matrix 𝑻 is with a size of 3,000×196, and the first-
layer weights 𝑾(1) were randomly generated in the range between -0.5 and 0.5 with a uniform 
distribution (Fig. S14). The matrix 𝑿 can thus be obtained by:  
𝑿 = 𝑓(𝑻𝑾(1)), (9) 
while the weights of the second layer 𝑾(2) can be obtained by the pseudoinverse model of Eq. (2), with 
𝒀 containing all the known labels of training samples transformed according to Eq. (8) with 𝑎 = 0.05. 
For each training sample, the neuron corresponding to the digit is labelled 1, while the other 9 neurons are 
0. Note that the matrix 𝑿 results from the output of a sigmoid function of hidden neuron, thus is 
restricted in the range between 0 and 1 (Fig. S15). 
Fig. 4B shows the second-layer weights 𝑾(2) obtained by the simulation of the crosspoint circuit, where 
𝑿 was stored in the RRAM devices and each column of matrix 𝒀 was applied as input current. The 
weights were obtained in ten steps, one for each classification output (from digit ‘0’ to digit ‘9’). With the 
computed weights 𝑾(2), the network can recognize 500 handwritten digits with accuracy of 94.2%, 
11 
 
which is identical to the analytical pseudoinverse solution. For the whole test set (10,000 digits), the 
recognition accuracy is 92.15% using the simulated 𝑾(2), compared to 92.14% using the analytical 
solution. The crosspoint array can thus be used to accelerate the training of typical neural networks with 
ideal accuracy. The computed weights can then be stored in one or more open-loop crosspoint array for 
accelerating the neural network in the inference mode by exploiting in-memory MVM (Fig. S16) 
(8,11,12). 
Fig. 4B also shows the LSEs obtained from both the circuit simulation and analytical study. Note that the 
LSEs are different among the 10 digits, due to the dependence of LSEs on weight values (Supplementary 
text 5). Fig. 4C shows the simulated weights as a function of the analytical values for each output neuron, 
showing a good consistency except for the bias weight 𝑤0. The bias acts as a regulator to the summation 
of an output neuron, thus the deviated bias weight guarantees that the simulated LSE is close to the 
analytical one in Fig. 4B. It should be noted that, although a random 𝑾(1) was assumed in this study, 
𝑾(1) can be further optimized by gradient descent methods (37) to improve the accuracy. The same 
approach might be applied to pre-trained deep networks by the concept of transfer learning (38), thus 
enabling the one-step training capability for a generalized range of learning tasks. 
DISCUSSION 
Although the crosspoint circuit is inherently accurate and scalable, the imperfections of RRAM devices 
such as conductance discretization and stochastic variation (18) might affect the solution. To study the 
impact of these issues on the solution accuracy, we assumed a RRAM model with 32 discrete 
conductance levels, including 31 uniformly-spaced levels and one deep high resistance state (HRS), 
which is achievable in many resistive memory devices (39-41). The ratio between the maximum 
conductance 𝐺𝑚𝑎𝑥 and the minimum conductance 𝐺𝑚𝑖𝑛 is assumed to be 
𝐺𝑚𝑎𝑥
𝐺𝑚𝑖𝑛
= 103, in line with 
previous reports (42,43). To describe conductance variations, we assumed a standard deviation 𝜎 =
∆𝐺/6, ∆𝐺/4 or ∆𝐺/2, where ∆𝐺 is the nominal difference between 2 adjacent conductance levels. The 
12 
 
simulation results for the Boston housing benchmark (Fig. S17) shows that the resulting regression and 
prediction remains accurate for all cases. For the worst case (𝜎 = ∆𝐺/2), the standard deviation P of 
training set is equal to 4756$, compared to the ideal result of 4732$. The P of test set for prediction is 
even closer to the ideal one, namely 4765$ compared to 4769$. These results highlight the suitability of 
the crosspoint resistive memory circuit for machine learning tasks, where the device variations can be 
tolerated for regression, prediction and classification.  
Another concern for large scale circuits is the parasitic wire resistance. To study its impact on the 
accuracy of linear regression for Boston housing dataset, we adopted interconnect parameters at 65 nm 
technology obtained from the ITRS table (44), together with the RRAM model. The results in Fig. S18 
show an increased P for both regression and prediction, with the latter being more insignificant, which is 
consistent with the impact of device variation. Specifically, the P of prediction becomes merely 4809$, 
compared to the ideal 4769$, thus supporting the robustness of the linear regression circuit for predictive 
analysis.  
The circuit stability analysis in Supplementary text 1 reveals that the poles of the system all lie in the left 
half plane, thus the circuit is stable and the computing time is limited by the bandwidth corresponding to 
the first pole, which is the minimal eigenvalue (or real part of eigenvalue) 𝜆𝑚𝑖𝑛 (absolute value) of a 
quadratic eigenvalue problem (45). As 𝜆𝑚𝑖𝑛 becomes larger, the computation of the circuit gets faster, 
with no direct dependence on the size of the dataset. To support this scaling property of the circuit speed, 
we have simulated the transient dynamics of linear regression of the Boston housing dataset and its 
subsets for increasing size of the training samples (Fig. S19). The results show that the computing time 
may even decrease as the number of samples increases, which can be explained by the different 𝜆𝑚𝑖𝑛 of 
the datasets (Fig. S20). These results evidence that the time complexity of the crosspoint circuit for linear 
regression substantially differs from its counterparts of classical digital algorithms, with a potential of 
approaching size-independent time complexity to significantly speed up large scale ML problems. Note 
that, as the circuit size increases, a larger current is also required to sustain the circuit operation, which 
13 
 
might be limited by the capability of the OAs. To control the maximum current consumption in the 
circuit, the memory element should be carefully optimized by materials and device engineering (46) or by 
novel device concepts such as electrochemical transistor (47,48), to provide a low-conductance 
implementation. The impact of device variations and the energy efficiency of the circuit are studied for 
the 2-layer neural network for MNIST dataset training (Supplementary text 6 with Fig. S21). The results 
support the robustness of the circuit against device variations for classification applications, and an 
energy efficiency of 45.3 tera-operations per second per Watt (TOPS/W), which is 19.7 and 6.5 times 
better than the Google’s tensor processing unit (TPU) (49) and a highly optimized application-specific 
integrated circuit (ASIC) system (50), respectively. 
In conclusion, the crosspoint circuit has been shown to provide a one-step solution to linear regression 
and logistic regression, which is demonstrated in experiments with RRAM devices. The one-step learning 
capability relies on the high parallelism of analog computing by physical Ohm’s law and Kirchhoff’s law 
within the circuit, as well as physical iteration within the nested feedback architecture. Scalability of the 
crosspoint computing is demonstrated with large problems, such as the Boston housing dataset and the 
MNIST dataset. The results evidence that in-memory computing is significantly promising for 
accelerating machine learning tasks with high latency/energy performance, in a wide range of data-
intensive applications. 
  
14 
 
Materials and Methods 
RRAM device fabrication 
The RRAM devices in this work use a 5-nm HfO2 thin film as the dielectric layer, which was deposited 
by e-beam evaporation on a confined graphitic C bottom electrode (BE). Without breaking the vacuum, a 
Ti layer was deposited on top of the HfO2 layer as top electrode (TE). The forming process was operated 
by applying a DC voltage sweep from 0 to 5 V, where with the voltage was applied to the TE and the BE 
was grounded. After the forming process, the set and reset transitions take place under positive and 
negative voltages applied to the TE, respectively. 
Circuit experiment 
For all the experiments, the devices were arranged in the crosspoint configuration on a custom Printed 
Circuit Board (PCB, see Fig. S2), and an Agilent B2902A Precision Source/Measure Unit was employed 
to program the devices to different conductance states. 
Linear and logistic regression pseudoinverse experiments were carried out on a custom PCB with 
operational amplifiers (OAs) of model AD823 (Analog Devices) for the Negative Feedback Amplifiers 
(NFA) and OP2177 (Analog Devices) for Positive Feedback Amplifiers (PFA). RRAM devices of left 
matrix were connected with the BE to the NFAs’ inverting-input nodes and with the TE to the PFAs’ 
output terminals. RRAM devices of right matrix were connected with the BE to the PFAs’ inverting-input 
nodes and with the TE to the NFAs’ output terminals. A BAS40-04 diode is connected between every 
amplifier and ground, to limit the voltages within ±0.7 V, avoiding conductance changes of RRAM 
devices. 
All the input signals were given by a 4-channels arbitrary waveform generator (Aim-TTi TGA12104) and 
applied to fixed input resistances, which were connected between the input and the NFAs’ inverting-input 
nodes. The PFAs’ output voltages were monitored by an oscilloscope (LeCroy Wavesurfer 3024). 
The board was powered by a BK Precision 1761 DC power supply. 
SPICE simulation 
15 
 
Simulations of the crosspoint circuit for Boston housing case and MNIST training were carried out using 
LTSPICE (https://www.linear.com/solutions/1066). Linear resistors with defined conductance values 
were used to map a matrix in the crosspoint arrays. A universal op-amp model was used for all 
operational amplifiers, while positive and negative feedback amplifiers have different parameters. 
 
  
16 
 
References 
1. D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, The missing memristor found. Nature 453, 
80-83 (2008). 
2. D. Kau, et al., A stackable cross point phase change memory. In IEEE Int. Electron Devices Meet. 
(IEDM), pp 27.1.1-27.1.4 (2009). 
3. M. -J. Lee, et al., A fast, high-endurance and scalable non-volatile memory device made from 
asymmetric Ta2O5−x/TaO2−x bilayer structures. Nat. Mater. 10, 625-630 (2011). 
4. T. Tuma, A. Pantazi, M. Le Gallo, A. Sebastian, E. Eleftheriou, Stochastic phase-change neurons. Nat. 
Nanotech. 11, 693-699 (2016). 
5. G. Pedretti, V. Milo, S. Ambrogio, R. Carboni, S. Bianchi, A. Calderoni, N. Ramaswamy, A. S. 
Spinelli, D. Ielmini, Memristive neural network for on-line learning and tracking with brain-inspired 
spike timing dependent plasticity. Sci. Rep. 7, 5288 (2017). 
6. S. Balatti, S. Ambrogio, Z. -Q. Wang, D. Ielmini, True Random Number Generation by Variability of 
Resistive Switching in Oxide-Based Devices. IEEE J. Emerging Topics in Circuits and Systems 
(JETCAS) 5, 214-221 (2015). 
7. H. Jiang, et al., A novel true random number generator based on a stochastic diffusive memristor. Nat. 
Commun. 8, 882 (2017). 
8. D. Ielmini, H. -S. P. Wong, In-memory computing with resistive switching devices. Nat. Electron. 1, 
333-343 (2018). 
9. M. Cassinerio, N. Ciocchini, D. Ielmini, Logic computation in phase change materials by threshold and 
memory switching. Adv. Mater. 25, 5975-5980 (2013). 
10. Z. Sun, E. Ambrosi, A. Bricalli, D. Ielmini, Logic Computing with Stateful Neural Networks of 
Resistive Switches. Adv. Mater. 30, 1802554 (2018). 
11. G. W. Burr, et al., Experimental demonstration and tolerancing of a large-scale neural network (165 
000 synapses) using phase-change memory as the synaptic weight element. IEEE Trans. Electron 
Devices 62, 3498-3507 (2015). 
17 
 
12. T. Gokmen, Y. Vlasov, Acceleration of Deep Neural Network Training with Resistive Cross-Point 
Devices: Design Considerations. Front. Neurosci. 10, 333 (2016). 
13. P. Chi, et al., PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation 
in ReRAM-based Main Memory. In ACM/IEEE 43rd Ann. Int. Symp. Computer Architecture, pp. 27-
39 (2016). 
14. S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. di Nolfo, S. Sidler, M. Giordano, M. 
Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi, G. W. Burr, Equivalent-accuracy accelerated 
neural-network training using analogue memory. Nature 558, 60-67 (2018). 
15. C. Li, et al., Analogue signal and image processing with large memristor crossbars. Nat. Electron. 1, 
52-59 (2018). 
16. M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni, E. 
Eleftheriou, Mixed-precision in-memory computing. Nat. Electron. 1, 246-253 (2018). 
17. M. A. Zidan, Y. Jeong, J. Lee, B. Chen, S. Huang, M. J. Kushner, W. D. Lu, A general memristor-based 
partial differential equation solver. Nat. Electron. 1, 411-420 (2018). 
18. Z. Sun, G. Pedretti, E. Ambrosi, A. Bricalli, W. Wang, D. Ielmini, Solving matrix equations in one step 
with cross-point resistive arrays. Proc. Natl Acad. Sci. USA 116, 4123-4128 (2019). 
19. M. H. Kutner, C. J. Nachtsheim, J. Neter, W. Li, Applied linear statistical models 5th ed. (McGraw-
Hill, New York, 2004). 
20. S. Weisberg, Applied Linear Regression 3rd ed. (John Wiley & Sons, Hoboken, 2005). 
21. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016). 
22. D. W. Hosmer, S. Lemeshow, Applied logistic regression 2nd ed. (John Wiley & Sons, Hoboken, 2000). 
23. C. M. Bishop, Pattern recognition and machine learning (Springer, New York, 2006). 
24. R. Rojas, Neural Networks: A Systematic Introduction (Springer-Verlag, Berlin, 1996). 
25. https://www.kaggle.com/surveys/2017. 
26. R. Penrose, A generalized inverse for matrices. Math. Proc. Cambridge Philos. Soc. 51, 406-413 
(1955). 
18 
 
27. E. Ambrosi, A. Bricalli, M. Laudato, D. Ielmini, Impact of oxide and electrode materials on the 
switching characteristics of oxide ReRAM devices. Faraday Discuss. 213, 87-98 (2019). 
28. E. M. Cherry, A new result in negative-feedback theory, and its application to audio power amplifiers. 
Int. J. Circuit Theory Appl. 6, 265-288 (1978). 
29. J. Fan, I. Gijbels, Local Polynomial Modelling and Its Applications (Chapman & Hall, London, 1996). 
30. W. Ashton, The Logit Transformation (Charles Griffin & Co., London, 1972). 
31. D. Harrison, D. L. Rubinfeld, Hedonic prices and the demand for clean air. J. Environ. Econ. Manag. 
5, 81-102 (1978). 
32. https://www.kaggle.com/c/boston-housing. 
33. W. F. Schmidt, M. A. Kraaijveld, R. P. Duin, Feed Forward Neural Networks With Random Weights. 
In 11th IAPR International Conference on Pattern Recognition, pp. 1-4 (IEEE, Piscataway, 1992). 
34. G. B. Huang, Q. Y. Zhu, C. K. Siew, Extreme learning machine: Theory and applications. 
Neurocomputing 70, 489-501 (2006). 
35. D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors. 
Nature 323, 533-536 (1986) 
36. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. 
Proc. IEEE 86, 2278-2324 (1998). 
37. D. Yu, L. Deng, Efficient and effective algorithms for training single-hidden-layer neural networks. 
Pattern Recognit. Lett. 33, 554-558 (2012). 
38. D. C. Ciresan, U. Meier, J. Schmidhuber, Transfer Learning for Latin and Chinese Characters with 
Deep Neural Networks. Proc. Int. Joint Conf. Neural Netw. (IJCNN) 20, 1-6 (2012). 
39. K. Seo, et al., Analog memory and spike-timing-dependent plasticity characteristics of a nanoscale 
titanium oxide bilayer resistive switching device. Nanotechnol. 22, 254023 (2011). 
40. J. Park, M. Kwak, K. Moon, J. Woo, D. Lee, H. Hwang, TiOx-based RRAM Synapse with 64-levels 
of Conductance and Symmetric Conductance Change by Adopting a Hybrid Pulse Scheme for 
Neuromorphic Computing. IEEE Electron Device Lett. 37, 1559-1562 (2016). 
19 
 
41. J. Tang, et al., ECRAM as Scalable Synaptic Cell for High-Speed, Low-Power Neuromorphic 
Computing. In 2018 IEEE International Electron Devices Meeting (IEDM 2018), pp. 13.1.1-13.1.4 
(2018). 
42. T. -C. Chang, K. -C. Chang, T. -M. Tsai, T. -J. Chu, S. M. Sze, Resistance random access memory. 
Mater. Today 19, 254-264 (2016). 
43. A. Mehonic, A. L. Shluger, D. Gao, I. Valov, E. Miranda, D. Ielmini, A. Bricalli, E. Ambrosi, C. Li, J. 
J. Yang, Q. Xia, A. J. Kenyon, Silicon Oxide (SiOx): A Promising Material for Resistance Switching? 
Adv. Mater. 30, 1801187 (2018). 
44. International Technology Roadmap for Semiconductors (ITRS), http://www.itrs.net/reports.html. 
45. F. Tisseur, K. Meerbergen, The Quadratic Eigenvalue Problem. SIAM Rev. 43, 235-286 (2001). 
46. K. Moon, A. Fumarola, S. Sidler, J. Jang, P. Narayanan, R. M. Shelby, G. W. Burr, H. Hwang, 
Bidirectional non-filamentary RRAM as an analog neuromorphic synapse, Part I: 
Al/Mo/Pr0.7Ca0.3MnO3 material improvements and device measurements. IEEE J. Electron Devices 
Soc. 6, 146 (2018). 
47. C. S. Yang, D. S. Shang, N. Liu, G. Shi, X. Shen, R. C. Yu, Y. Q. Li, Y. Sun, A Synaptic Transistor 
based on Quasi-2D Molybdenum Oxide. Adv. Mater. 29, 1700906 (2017). 
48. E. J. Fuller, S. T. Keene, A. Melianas, Z. Wang, S. Agarwal, Y. Li, Y. Tuchman, C. D. James, M. J. 
Marinella, J. J. Yang, A. Salleo, A. A. Talin, Parallel programming of an ionic floating-gate memory 
array for scalable neuromorphic computing. Science 364, 570-574 (2019). 
49. N. P. Jouppi, et al., In-Datacenter Performance Analysis of a Tensor Processing Unit. International 
Symposium on Computer Architecture (ISCA), pp. 1-12 (2017). 
50. P. M. Sheridan, F. Cai, C. Du, W. Ma, Z. Zhang, W. D. Lu, Sparse coding with memristor networks. 
Nat. Nanotechnol. 12, 784-789 (2017). 
  
20 
 
Acknowledgments: This article has received funding from the European Research Council 
(ERC) under the European Union’s Horizon 2020 research and innovation programme (grant 
agreement No 648635). This work was partially performed at Polifab, the micro- and 
nanofabrication facility of Politecnico di Milano. Author contributions: Z.S. conceived the idea 
and designed the circuit. G.P. designed the printed circuit board. G.P. and Z.S. conducted the 
experiments. A.B. fabricated the devices. All the authors discussed the experimental and 
simulation results. Z.S. and D.I. wrote the manuscript with input from all authors. D.I. supervised 
the research. Competing interests: The authors declare that they have no competing interests. 
Data and materials availability: All data needed to evaluate the conclusions in the paper are 
present in the paper and/or the Supplementary Materials. Additional data available from authors 
upon request. 
 
21 
 
 
Fig. 1. Linear regression circuit and experiments. (A) Schematic illustration of the crosspoint circuit 
for solving linear regression with the pseudoinverse method. The conductance transformation unit is G0 = 
100 S. The feedback conductance GTI of TIA is equal to G0. A representative matrix 𝑿 for simple linear 
regression of 6 data points is also shown. (B) Linear regression of a 6-point dataset defined by the second 
column of 𝑿 in (A) on the x-axis and the input currents on the y-axis. The figure also shows the 
analytical and experimental regression lines, the latter being obtained as the measured voltages 𝒗 in the 
crosspoint circuit as regression weights. (C) A second 6-point regression experiment with the same vector 
𝑿 in (A) and a different set of input currents. A new input value x* = 4.91 was stored in an additional line 
of the left crosspoint array, thus enabling the one-step prediction along a sequence. The measured 
prediction y* = 0.727 is consistent with experimental and analytical regression lines. 
 
  
Analytical
Experiment
Date set
A
B C
(x*,y*)
X =
I1
I2
IN
V1
V2
GTI
GTI
GTI
GX
GX
22 
 
   
Fig. 2. Logistic regression experiments. (A) Illustration of a logistic regression model, consisting of the 
summation of weighted input signals being processed by a nonlinear activation function such as the 
sigmoid function (dash line) or the step function (solid line). The backward logit transformation is 
indicated by the bottom arrow. (B) A logistic regression of 6 data points divided in 2 classes. The input 
matrix 𝑿 is also shown, including a first column of discrete resistors, and a second and third columns 
storing the independent variables x1 and x2. The regression lines obtained from analytical and 
experimental results are also shown. The line provides the boundary line for data classification. 
 
  
A B
X =
∑
1
w1
w2
w0
xi1
xi2
s0
y
1
yi
s = f-1(y), w = X+s
s = Xw, y = f (s)
23 
 
  
Fig. 3. Linear regression of the Boston housing dataset. (A) Matrix 𝑿 including the 13 attributes for 
the 333 houses in the training set, and input vector 𝒚 of house prices. The same color bar was used for 
clarity, while the conductance and current units are assumed equal to 10 S and 10 A, respectively. (B) 
Calculated weights of the linear regression obtained by simulation of the crosspoint circuit and the 
relative errors with respect to the analytical results. (C) Correlation plot of the regression price of the 
training samples obtained by the simulated weights, as a function of the real dataset price. The small 
standard deviation P = 4,733$ supports the accuracy of the regression. (D) Same as (C), but for the test 
samples. A slightly larger standard deviation P = 4,779$ is obtained. 
 
  
X = y =
test set
A B
C D
training set
24 
 
  
Fig. 4. Training of a 2-layer neural network for MNIST digit recognition. (A) Illustration of the 2-
layer neural network, where the first-layer weights are random, while the second-layer weights are 
computed by the pseudoinverse method in circuit simulations. (B) Color plot of the second-layer weight 
matrix 𝑾(2) obtained by circuit simulations. Each column contains the weights of synapses connected to 
an output neuron, and was computed in one step by the crosspoint circuit. As a result, only 10 operations 
were needed to train the network. The LSEs of simulated and analytical weights are also shown for each 
neuron. (C) Correlation plots of the simulated weights as a function of the analytical weights for each of 
the 10 output neurons. Only the bias weight 𝑤0 shows a deviation from the analytical results, though not 
affecting the recognition accuracy. 
 
Label matrix 
Y: N×10
Input matrix 
T: N×196
Hidden matrix 
X = f(TW(1))
. .
 .
. .
 .Random W
(1)
(196×784)
W(2) = X+Y
14×14
0
1
2
3
4
5
6
7
8
9
N images
Analytical Simulation
Neuron 0 Neuron 1 Neuron 2 Neuron 3 Neuron 4
Neuron 5 Neuron 6 Neuron 7 Neuron 8 Neuron 9
Analytical weights
Si
m
u
la
te
d
 w
e
ig
h
ts w0
A B
C
