Abstract-The resistive cross-point array architecture has been proposed for on-chip implementation of weighted sum and weight update operations in neuro-inspired learning algorithms. However, several limiting factors potentially hamper the learning accuracy, including the nonlinearity and device variations in weight update, and the read noise, limited ON/OFF weight ratio and array parasitics in weighted sum. With unsupervised sparse coding as a case study algorithm, this paper employs device-algorithm co-design methodologies to quantify and mitigate the impact of these non-ideal properties on the accuracy. Our analysis shows that the realistic properties in weight update are tolerable, while those in weighted sum are detrimental to the accuracy. With calibration of realistic synaptic behaviors from experimental data, our study shows that the recognition accuracy of MNIST handwriting digits degrades from ~96 to ~30 percent. The strategies to mitigate this accuracy loss include 1) redundant cells to alleviate the impact of device variations; 2) a dummy column to eliminate the off-state current; and 3) selector and larger wire width to reduce IR drop along interconnects. The selector also reduces the leakage power in weight update. With improved properties by these strategies, the accuracy increases back to ~95 percent, enabling reliable integration of realistic synaptic devices in neuromorphic systems.
INTRODUCTION
Neuro-inspired computing has attracted considerable interest because conventional von Neumann computation is limited by the bottleneck between the processor and the memory. The primary goal of neuro-inspired computing is to develop biologically inspired artificial systems that enable better interaction with the natural environment for problems such as image recognition, while achieving significant computation speed-up and low power consumption. Numerous large-scale neuro-inspired hardware platforms have been developed in recent years, e.g., FACETs [1], Caviar [2], SpiNNaker [3], and TrueNorth [4]. However, constructing these neuro-inspired systems with a massive number of synapses unavoidably incurs an excessive hardware cost, because a large volume of CMOS-based memory is needed to implement the synapses. Current progress in nanotechnology is paving the way toward low-cost and ultra-high-density memory arrays. Owing to its maturity, floating-gate memory technology has been successfully implemented on a single chip as synapses for neuromorphic computation [5], [6]. To achieve even higher integration density, faster speed and lower programming voltage, compact synaptic devices based on emerging non-volatile memory have been proposed for neuromorphic systems [7]. Resistive synaptic devices in this paper refer to a special subset of resistive memory devices whose conductance can be continuously tuned into multi-level states [8]. To implement the learning algorithm on chip, the conductance of a synaptic device represents a weight element. Prior works suggest that the cross-point array architecture can represent the weight matrix and perform the weighted sum and weight update in a parallel fashion [9], [10], [11]. Prior work has also evaluated the accuracy of the weighted sum under the IR drop along interconnects [12].
At the device level, some non-ideal characteristics of resistive and phase change synaptic devices have been experimentally calibrated to evaluate the learning performance [13], [14]. In this paper, we focus on the impact of non-ideal device properties and propose potential solutions to mitigate the accuracy loss. This paper is an extension and improvement of our conference work [15], which lacks the circuit-level perspective needed to bridge the learning algorithm and the synaptic device properties. The new material added to this journal version includes the discussion of the programming schemes for weight update and the proposal of a selector to improve the performance of weighted sum and weight update. We also redefine the device variations into three categories and analyze their impact on the learning accuracy accordingly. In addition, we add a new section on the impact of array-level interconnect resistance on the learning accuracy.
An ideal synaptic device behavior assumes a linear update of the weight with the input stimulus [13], [14], e.g., the number of voltage pulses. However, in reality, this assumption may not hold. Fig. 1a shows the microscopic top-view image and device structure of the fabricated TaOx/TiO2 based synaptic device [16]. Fig. 1b shows that applying positive (negative) pulses to the device induces long-term potentiation (LTP) (long-term depression (LTD)), which increases (decreases) the conductance. Figs. 1c and 1d show other representative LTP/LTD behaviors in the PCMO based [13] and Ag:a-Si based [17] synaptic devices. It is observed that nonlinear LTP and LTD behavior commonly exists in today's synaptic devices, possibly due to the inherent drift and diffusion dynamics of the ions/vacancies in these materials. Besides the nonlinearity in weight update, another realistic characteristic that deviates from an ideal device is the limited ON/OFF weight ratio. The off-state conductance is not perfectly zero in any of the three examples in Fig. 1, while the ideal synaptic device assumes that the minimum weight is zero. In addition, device variation is always a concern for emerging devices at the nanoscale. The synaptic device variation has two aspects: one is the spatial variation from device to device, and the other is the temporal variation from pulse to pulse. All of these realistic device characteristics may degrade the learning accuracy of neuromorphic systems. In this work, array design methodologies are proposed for the co-optimization of realistic synaptic device properties and array architecture to mitigate these undesirable effects.
SPARSE CODING (SC) ALGORITHM FOR ON-CHIP LEARNING
Sparse Coding Algorithm
The sparse coding algorithm [18] is selected in this work for on-chip implementation with synaptic devices because of its simplicity. Despite using a simple network with two layers of neurons and one synaptic weight matrix, it can still achieve reasonably high learning accuracy with invariance to spatial shift and rotation of the pattern. Sparse coding is also found to be a bio-physiologically plausible model: neurons in the mammalian primary visual cortex can form a sparse representation of natural scenes [19], [20], which is believed to emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent features such as lines, edges and corners. For real-world applications, the sparse coding algorithm has demonstrated its power in numerous domains such as audio processing, text mining and image recognition. In this work, we aim at evaluating and optimizing the synaptic device properties and cross-point architecture for fast and compact on-chip sparse feature learning as a case study. Fig. 2a shows the simplified process flow of the sparse coding algorithm (SC module), which is obtained from [21] with optimization of the algorithm parameters. In the training phase, with a given input vector set {X} (braces denote a collection of objects), the corresponding feature vector set {Z} and the dictionary matrix (D) are trained iteratively by minimizing the objective error function (E):
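Eq. (1) does not appear in this copy of the text. As a hedged sketch only, a common form of the sparse coding objective that is consistent with the two-term description given for Eq. (1) (a reconstruction term plus a sparsity constraint on the feature vector) is written out here. The squared L2 reconstruction error, the L1 sparsity penalty, the patch index j and the weighting symbol \lambda are assumptions and may differ from the original formulation.

```latex
\min_{D,\,\{Z\}} \; E \;=\; \sum_{j} \Big( \lVert D Z_j - X_j \rVert_2^2 \;+\; \lambda \, \lVert Z_j \rVert_1 \Big)
```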
As each X is a sparse linear combination of Z via D, the first term of Eq. (1) generally measures how well the dictionary reconstructs the input data. The second term of Eq. (1) imposes a sparsity constraint on the feature vector. Since both D and Z are unknown, the above optimization problem is non-convex. It is proposed to alternately optimize Z with fixed D by the coordinate descent (CD) method and optimize D with fixed Z by the stochastic gradient descent (SGD) method, which converts each step into a convex optimization problem. Compared to traditional full gradient descent, SGD is more computation-efficient on large-scale datasets [21]. Using SGD, the D weight update process can be expressed as:
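Eq. (2) does not appear in this copy of the text. A minimal Python sketch of one SGD step on D, following the description of a delta-rule update proportional to the product of the learning rate, the reconstruction error R and the transposed feature vector Z, might look as follows. The sign convention R = X - DZ, the function names and the toy numbers are assumptions for illustration, not the authors' code.

```python
def matvec(D, z):
    """Return the product D z for a matrix stored as a list of rows."""
    return [sum(d * zj for d, zj in zip(row, z)) for row in D]


def sgd_dictionary_step(D, x, z, eta):
    """One delta-rule step D <- D + eta * R z^T, with R = x - D z (assumed sign)."""
    r = [xi - yi for xi, yi in zip(x, matvec(D, z))]
    return [[d + eta * ri * zj for d, zj in zip(row, z)] for row, ri in zip(D, r)]


def squared_error(D, x, z):
    """Squared reconstruction error ||x - D z||^2 for a single input patch."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, matvec(D, z)))


# Toy example: 2 input elements, 2 features.
D0 = [[0.5, 0.1], [0.2, 0.4]]
x0 = [1.0, 0.0]
z0 = [1.0, 0.5]
D1 = sgd_dictionary_step(D0, x0, z0, eta=0.1)
err_before = squared_error(D0, x0, z0)
err_after = squared_error(D1, x0, z0)
```

For a sufficiently small learning rate the reconstruction error of the sampled patch shrinks after the step. On chip, the computed increment would still have to be translated into a number of programming pulses, which this sketch does not model.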
It can be seen that D is modulated by the product ηRZ^T, where R is the reconstruction error and η is the learning rate, which is essentially the delta rule. For the algorithm ideally implemented in software, the exact value of ηRZ^T can be calculated and applied to the update of D. However, the D update implemented on-chip needs to be translated into the number of pulses applied to the synaptic devices, and the effect of the programming pulses on the conductance of the devices may not represent the exact value of ηRZ^T because of the realistic properties of synaptic devices mentioned above. In this work, we model the weight update curve and incorporate this model in the D update code in the SC algorithm. Fig. 2b describes the entire process flow, which includes dictionary learning (training phase) and classification (testing phase). In this work, the MNIST handwriting digits [22] are used as the training and testing data set, where the raw images are densely sampled into small patches of 10 × 10 pixels, each forming an X input vector with a dimension of 100, as shown in Fig. 3. In the later analyses, a set of 40k images is used for training and a different set of 5k images is used for testing, as we have found that using the entire 60k training images gives no noticeable increase in accuracy (only 1 percent) while making the simulation much slower. Fig. 4 shows the learning accuracy as a function of the Z vector dimension. The learning accuracy does not increase much beyond a dimension of 200. In this work, we fix the Z dimension to be 300, thus the size of the D matrix is 100 × 300 (X × Z).

Fig. 1. (a) … [16] and (b) the measured experimental data of conductance modulation for the weight update. Other reported measured experimental data in literature: (c) PCMO [13] and (d) Ag:a-Si [17]. Exponential function is used to fit the LTP and LTD curves.
After the training process, the trained dictionary D_train is used as a fixed D in the testing phase to generate the testing features {Z_test}. Before the classification process, a simple maximum pooling operation is employed on both the trained and testing features for each image to select the most active neuron of each feature node:

$$Z_i = \max\{Z_{1,i}, Z_{2,i}, \ldots, Z_{k,i}\} \qquad (3)$$

where $Z_{1,i}, Z_{2,i}, \ldots, Z_{k,i}$ are the ith elements of the feature vectors of the total k small image patches per image. The maximum pooling merges all the feature vectors of small image patches into one feature vector per image by selecting the maximum value of each ith element. Finally, to classify the 10 digits, the support vector machine (SVM) [23] is used. With the input of testing labels, SVM performs classification and outputs the recognition accuracy.
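The pooling step described above can be sketched in a few lines of Python. The helper name `max_pool` and the toy feature values are hypothetical; the sketch assumes every patch of an image yields a feature vector of the same length.

```python
def max_pool(patch_features):
    """Merge k per-patch feature vectors into one vector per image.

    patch_features is a list of k feature vectors of equal length; the
    i-th output element is the maximum of the i-th elements over all patches.
    """
    return [max(column) for column in zip(*patch_features)]


# Three patches from one image, four feature nodes each.
patches = [
    [0.1, 0.0, 0.7, 0.2],
    [0.4, 0.3, 0.1, 0.2],
    [0.0, 0.9, 0.5, 0.1],
]
pooled = max_pool(patches)
```

The pooled vector has one entry per feature node regardless of how many patches were sampled, which is what lets a fixed-size classifier such as the SVM operate on it.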
Limited On-Chip Precision of SC
To implement the SC algorithm on-chip, it is necessary to limit the precision of D and Z in the algorithm, as the chip cannot afford floating-point computation. In the cross-point architecture, the values in the Z vector are stored in local memories in the peripheral circuitry, and the values in the D matrix are represented by the synaptic weights in the array. Fig. 5 shows the learning accuracy with different precisions obtained by truncating the bits in the SC algorithm. It suggests that a 4-bit Z is sufficient for high learning accuracy and that limited precision of D has more impact on the accuracy. For example, D should be at least 6 bits to achieve an accuracy >95 percent. This requirement of high precision in the weight update for learning (in back-propagation) is also reported in other recent works [24], [25]. As the training of these algorithms (both this work and other works based on back-propagation) is error-driven, high precision is needed to preserve the error information. Since the number of bits of D is related to how many conductance levels the synaptic device can achieve, a 6-bit D (64 levels) is chosen for later analysis based on the number of multi-level states available in today's synaptic devices (see Fig. 1).
CROSS-POINT ARRAY FOR ON-CHIP LEARNING
Fig. 6a shows the schematic of the proposed architecture of the resistive cross-point array. There is one selector in series with one synaptic device at each cross-point. The selector introduces nonlinear I-V characteristics for the synaptic device and is helpful for both weight update and weighted sum operations, which will be discussed later in this section. To compute the weighted sum (DZ) in the read operation, a read voltage (V_R) is applied in parallel to each row for every non-zero element of Z. V_R is then multiplied by the conductance of the synaptic device at each cross-point, and the weighted sum appears as the output current at the end of the columns. The read peripheral circuitry for each column then converts the analog current output into digital numbers. It should be noted that the sneak path problem of the unselected cells in the array for conventional memory application does not exist in the weighted sum operation. This is because the conventional memory requires reading out data by bit or by row, while the weighted sum operation here reads the entire array in parallel, thus all the cells in the array participate in the computation according to Kirchhoff's law. It is preferred that the value of the Z elements is encoded by the number of V_R pulses (4 bits = 16 pulses), which causes less distortion on the DZ product compared to the analog encoding scheme with varying voltage amplitude [12]. In the analog encoding scheme, it is also difficult to split V_R (typically <1 V) into 16 voltage levels due to noise considerations and practical bias circuit design constraints.
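As an illustration of the pulse-encoded read described for the weighted sum, the following Python sketch accumulates the column outputs of an idealized array one read pulse at a time and compares the result with the direct matrix-vector product. It assumes ideal conditions (no selector, wire resistance, read noise or ADC quantization), and the function names, conductance values and pulse counts are hypothetical.

```python
def weighted_sum_by_pulses(G, pulses, v_read):
    """Accumulate column currents over read pulses.

    G[r][c] is the conductance (S) at row r, column c. Row r receives
    pulses[r] read pulses of amplitude v_read; a row with a zero count
    is never driven. Returns the current summed over all pulses per column.
    """
    n_cols = len(G[0])
    total = [0.0] * n_cols
    for t in range(max(pulses, default=0)):
        for row, count in zip(G, pulses):
            if t < count:  # this row is still being pulsed
                for c in range(n_cols):
                    total[c] += v_read * row[c]
    return total


def weighted_sum_direct(G, pulses, v_read):
    """Reference result: v_read * sum_r pulses[r] * G[r][c] for each column."""
    n_cols = len(G[0])
    return [v_read * sum(p * row[c] for row, p in zip(G, pulses))
            for c in range(n_cols)]


G_demo = [[1e-6, 2e-6], [3e-6, 4e-6], [5e-6, 6e-6]]  # 3 rows (Z), 2 columns (X)
z_pulses = [3, 0, 2]                                  # pulse-encoded Z elements
by_pulses = weighted_sum_by_pulses(G_demo, z_pulses, v_read=1.0)
direct = weighted_sum_direct(G_demo, z_pulses, v_read=1.0)
```

Because every pulse uses the same amplitude, each device is always read at the same operating point, which is one way to see why pulse-count encoding distorts the product less than amplitude encoding when the cell I-V curve is nonlinear.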
For the weight update operation, a fully parallel write scheme was developed in prior work to update the entire array at once using the product of Z and R (Eq. (2)) [26]; however, it requires complex peripheral circuit design, so the hardware cost will be tremendous. As shown in Fig. 6b, a simpler write scheme is proposed here: to perform the LTP (LTD) weight update, one row at a time is selected with the write voltage V_W (0 V) applied at the edge, while the other rows are biased at an intermediate voltage V_X to prevent write disturbance. The V_X-0-V_X negative (V_X-V_W-V_X positive) write pulses are then applied at all columns to perform the weight update. Similar to the fully parallel scheme, a write cycle of each row has two phases, one for the LTP and one for the LTD weight update. This write scheme is essentially a row-by-row operation with less speed-up but much lower hardware cost than the fully parallel one.
Unlike the weighted sum operation, the proposed weight update scheme suffers from the sneak path problem, as the cross-point array is partially selected and sneak paths exist through the half-selected cells on the unselected rows or columns. The half-selected cells can see a voltage drop of V_X (LTP) or V_W - V_X (LTD) during the weight update. Therefore, a selector is proposed to be connected in series with the synaptic device to suppress the leakage current at these voltages. Fig. 7 shows the I-V characteristics of a TaOx/TiO2 based synaptic device in the ON state, the selector, and the series combination of these two devices. In this study, we use the mixed-ionic-electronic-conduction (MIEC)-based selector with high nonlinearity (~85 mV/dec) [27] and set the original V_W = 2 V and V_R = 1 V for a single synaptic device. Without the selector, V_X is designed to be 1 V, which is the V/2 write scheme in conventional memory applications [28]. With the selector, the overall cell resistance is increased, which reduces the IR drop along interconnects in the weighted sum while only slightly affecting the mapping from device conductance to weight values, because the conductance of the selector is relatively higher than the conductance range of the synaptic device at 1 V. Also, the selector reduces the leakage through the half-selected cells in weight update, and it does not affect the weight update itself because it is already turned on at a sufficiently large voltage. In this case, V_W should be increased to 3 V, and the V_X for the LTP and LTD weight update are then 1 V and 2 V, respectively. This ensures that the voltage drop on the selected cells is 2 V, which is the same as the original write condition for a single synaptic device. Also, the voltage drop on the half-selected cells will then be 1 V, where the leakage reduction is ~10X as shown in Fig. 7. Since most of the cells during the weight update are half-selected cells, the energy consumption is greatly reduced compared to the traditional V/2 write scheme, where the voltage drop on the half-selected cells is V_W/2 = 1.5 V.
REALISTIC PROPERTIES IN SYNAPTIC ARRAY
As previously shown in Fig. 1, realistic synaptic behaviors include 1) the nonlinearity and 2) device variations in weight update, and 3) the read noise and 4) the limited ON/OFF weight ratio in weighted sum. The circuit model of the synaptic device is also considered in the array-level analysis. In this section, these realistic properties are modeled individually in the sparse coding algorithm and their impact on the learning accuracy is investigated. As a baseline, the limited precision of the synaptic devices (64 levels) is considered.
Nonlinear Weight Update
To analyze the impact of nonlinear weight update on the learning, a general behavior that models the conductance change of LTP (G_LTP) and LTD (G_LTD) with the number of pulses (P) is described by Eqs. (4) and (5), where G_max, G_min and P_max can be directly extracted from the experimental data and represent the maximum conductance, the minimum conductance and the maximum pulse number required to switch the device between the minimum and maximum conductance states. A is the parameter that controls the nonlinear behavior of the weight update, and B is simply a function of A that fits the functions within the range of G_max, G_min and P_max. A and B may be different in (4) and (5). A set of nonlinear LTP and LTD behaviors can be obtained by adjusting A, as illustrated in Fig. 8a, where each nonlinear curve is labeled with a nonlinearity value from +6 to -6. Here the plus and minus signs merely label LTP and LTD, respectively. We then apply these nonlinear functions to the weight update in the SC algorithm. Fig. 8b shows that the learning accuracy slightly decreases in the high nonlinearity region of LTP and LTD, and a relatively larger drop from ~96 to ~92 percent occurs at the maximum nonlinearities (+6/-6 curves). For today's synaptic devices (in Fig. 1), the nonlinearities of LTP and LTD are also labeled in Fig. 8b. It is shown that the nonlinearity in the weight update has a moderate impact on the learning performance. Smart programming schemes have been designed in prior work to improve the nonlinearity [15]; however, they require additional circuitry and thereby incur overhead in area, latency and energy consumption. Given the accuracy loss of only ~4 percent at the maximum nonlinearities, we consider it unnecessary to apply the smart programming schemes, which saves those overheads.
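Eqs. (4) and (5) do not appear in this copy of the text. The Python sketch below shows one exponential form that satisfies every property stated for them: the curves are exponential in the pulse number, A sets the degree of nonlinearity, and B is determined by A so that the conductance spans G_min to G_max over P_max pulses. The exact expressions, the function names and the parameter values are assumptions, and the mapping from A to the nonlinearity labels of Fig. 8a is not modeled.

```python
import math


def fit_b(a, g_min, g_max, p_max):
    """B is fixed by A so that the curve spans [g_min, g_max] over p_max pulses."""
    return (g_max - g_min) / (1.0 - math.exp(-p_max / a))


def g_ltp(p, a, g_min, g_max, p_max):
    """Conductance after p potentiation pulses, starting from g_min."""
    return g_min + fit_b(a, g_min, g_max, p_max) * (1.0 - math.exp(-p / a))


def g_ltd(p, a, g_min, g_max, p_max):
    """Conductance after p depression pulses, starting from g_max."""
    return g_max - fit_b(a, g_min, g_max, p_max) * (1.0 - math.exp(-p / a))


# Normalized conductance range; 64 pulses for a full sweep is an assumed value.
G_MIN, G_MAX, P_MAX = 0.0, 1.0, 64
strong = g_ltp(P_MAX // 2, 16.0, G_MIN, G_MAX, P_MAX)   # small A: strongly nonlinear
weak = g_ltp(P_MAX // 2, 1.0e4, G_MIN, G_MAX, P_MAX)    # large A: nearly linear
```

With a small A, most of the conductance change happens in the first few pulses, so the same pulse count produces very different weight increments at different starting states. With a large A the update approaches the ideal linear behavior.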
Device Variations
It is well known that the synaptic devices involving drift and diffusion of the ions/vacancies show considerable variation from device to device, and even from pulse to pulse within one device. Owing to the device-to-device weight update variation, different devices in the array will follow different nonlinearity baselines. Owing to the cycle-to-cycle weight update variation, there will be pulse to pulse noise on top of the nonlinearity baseline. Owing to the read noise, the read-out current of a weight state will have some temporal fluctuation.
Device-to-Device Weight Update Variation
The effect of device-to-device variation can be analyzed by introducing the variation into the nonlinearity baseline for each synaptic device, as illustrated in Fig. 9a. For example, if a synaptic device has a +100 percent device-to-device variation, there will be a +1 deviation of the nonlinearity. As shown in Fig. 9b, the learning accuracy is insignificantly affected by the device-to-device variation even with 30 percent standard deviation from the baseline.
Cycle-to-Cycle Weight Update Variation
The cycle-to-cycle variation of the conductance occurs at every write pulse operation on the synaptic device, as illustrated in Fig. 10a. In this work, the cycle-to-cycle weight update variation is defined as the variation of the weight change with one applied write pulse. As shown in Fig. 10b, the learning accuracy does not degrade with larger cycle-to-cycle weight update variation. Instead, for the nonlinearity baseline (LTP, LTD) = (6, -6), the learning accuracy slightly improves with larger variation, as the randomness in the pulse amplitude may partially compensate for the high nonlinearity.
Read Noise in Weighted Sum
Similar to the cycle-to-cycle weight update variation, the read noise occurs at every read access to the synaptic device, but the average conductance state is not disturbed. As illustrated in Fig. 11a, the read-out current fluctuates at different conductance states with different numbers of read pulses. Fig. 11b shows significant degradation of learning accuracy due to the read noise. The impact is even more critical with the nonlinearity baseline (6, -6). We have measured a variation of ~2.89 percent in the read noise of our TaOx/TiO2 based synaptic device, which could cause the accuracy to drop below 90 percent considering this read noise effect only.
To alleviate the impact of device variations, we propose using multiple cells as one D weight element. This approach statistically averages out the conductance variations of the synaptic devices. If n cells are used as one weight element, the standard deviation of the variations will be reduced by a factor of 1/√n, assuming that the variations are normally distributed. Fig. 12 shows an example of the reduction in variation using 9 cells compared to using only 1 cell. This strategy is expected to considerably reduce the accuracy loss due to device read-out noise, and it does not incur a large overhead in the array area, as the area is determined by the pitch of the peripheral circuits in the logic design rule. For example, the array cell height should be aligned with the standard cell height of the array row driver. We estimate that the layout area of 9 resistive synaptic cells is increased by ~20 percent compared to that of 1 cell at the 65 nm technology node and 200 nm wire width. It should be noted that although this layout area of 9 resistive synaptic cells may be comparable to that of a floating-gate cell at the same technology node, part of the peripheral circuitry can be placed underneath the synaptic array to save total area, as the synaptic devices are integrated on top of the CMOS circuits at the interconnect level. However, using multiple cells inevitably increases the energy consumption by n times.
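The 1/√n reduction can be checked numerically. The Python sketch below averages n independently perturbed reads and compares the empirical spread with the analytic value, using the ~2.89 percent read noise quoted in the text. The Gaussian noise model, the independence between cells, the function names, the seed and the trial count are assumptions for illustration.

```python
import math
import random
import statistics


def averaged_read(n_cells, sigma, rng):
    """Mean of n_cells normalized reads, each perturbed by Gaussian noise."""
    return sum(rng.gauss(1.0, sigma) for _ in range(n_cells)) / n_cells


def empirical_sigma(n_cells, sigma, trials=20000, seed=0):
    """Standard deviation of the averaged read over many Monte Carlo trials."""
    rng = random.Random(seed)
    samples = [averaged_read(n_cells, sigma, rng) for _ in range(trials)]
    return statistics.pstdev(samples)


SIGMA_ONE_CELL = 0.0289                       # ~2.89 percent read noise, 1 cell
analytic_9 = SIGMA_ONE_CELL / math.sqrt(9)    # expected spread with 9 cells
simulated_9 = empirical_sigma(9, SIGMA_ONE_CELL)
```

For 9 cells the analytic value is about 0.96 percent, one third of the single-cell noise. Correlated variations between neighboring cells would make the actual reduction smaller than this estimate.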
Limited ON/OFF Weight Ratio
Ideally the D values in the SC algorithm are represented by a normalized conductance of synaptic devices, and the range of the D value is from 0 to 1. However, the minimum conductance can be regarded as D = 0 only when the ratio between the maximum and minimum conductance (ON/OFF ratio) approaches infinity, which is not feasible in today's synaptic devices. Fig. 13 shows the learning accuracy with different ON/OFF ratios. The learning accuracy dramatically decreases when the ON/OFF ratio shrinks below 25, because the calculations involving small values of D in the algorithm will be significantly distorted. The Ag:a-Si device exhibits the largest ON/OFF ratio of ~15 among the devices in Fig. 1, while the other devices show even smaller ON/OFF ratios. This means that without any optimization, none of these synaptic devices can lead to high recognition accuracy when used in on-chip implementation of sparse learning.
One approach to remedy this situation is to eliminate the effect of the off-state current in every weight element with the aid of a dummy column. The cross-point array architecture with a dummy column is illustrated in Fig. 14. The synaptic devices in the dummy column remain in their minimum conductance states, such that the readout value at the output of the dummy column represents the weighted sum of the Z vector and the off-state conductance. In the peripheral circuitry, we subtract the off-state weighted sum from all the partial weighted sums, D_i Z, performed along the columns. Except for spatial variation between the synaptic devices in the same row, this virtually eliminates the effect of off-state current in the sparse learning task. An additional column will give 1 percent overhead on the array area as there are 100 columns in total (X = 100), and the area of the subtractors is estimated to be ~7.84 percent of the array area with 9 cells at the 65 nm technology node and 200 nm wire width. However, as the array is able to partially hide the subtractors, their area overhead can be further reduced.
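The dummy-column subtraction can be illustrated with a small Python sketch. It maps normalized weights onto a finite conductance window, computes the column currents, and subtracts the dummy-column current before rescaling. It assumes identical off-state conductance in every device (no spatial variation), no wire resistance and no read noise; the function names and numerical values are hypothetical, with an ON/OFF ratio of 12.5 chosen to match the value used later in the text.

```python
def to_conductance(w, g_min, g_max):
    """Map a normalized weight in [0, 1] onto the window [g_min, g_max]."""
    return g_min + w * (g_max - g_min)


def column_currents(G, z, v_read):
    """Ideal column currents: v_read * sum_r z[r] * G[r][c]."""
    return [v_read * sum(zr * row[c] for row, zr in zip(G, z))
            for c in range(len(G[0]))]


def weighted_sum_with_dummy(W, z, g_min, g_max, v_read):
    """Recover sum_r z[r] * W[r][c] by subtracting the dummy-column current."""
    G = [[to_conductance(w, g_min, g_max) for w in row] for row in W]
    dummy = [[g_min] for _ in W]             # every dummy cell stays in the off state
    i_dummy = column_currents(dummy, z, v_read)[0]
    scale = v_read * (g_max - g_min)
    return [(i - i_dummy) / scale for i in column_currents(G, z, v_read)]


W_demo = [[0.0, 1.0], [0.5, 0.25]]           # 2 rows (Z) x 2 columns (X)
z_demo = [1.0, 2.0]
G_OFF, G_ON = 1.0e-6, 12.5e-6                # ON/OFF ratio of 12.5
corrected = weighted_sum_with_dummy(W_demo, z_demo, G_OFF, G_ON, v_read=1.0)

# Without subtraction, the off-state current inflates the normalized result.
G_raw = [[to_conductance(w, G_OFF, G_ON) for w in row] for row in W_demo]
uncorrected = [i / (1.0 * G_ON) for i in column_currents(G_raw, z_demo, 1.0)]
```

Under these idealized assumptions the corrected outputs equal the ideal weighted sums of 1.0 and 1.5, while the uncorrected first column reads 1.16. Device-to-device spread in the off-state conductance would leave a residual error, which is why the text treats the off-state current as only partially removed.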
Synaptic Device Model in Cross-Point Array
To simulate the weighted sum operation in SPICE, we model the synaptic device as a resistor in parallel with a capacitor. The synaptic device is in series with a selector as mentioned earlier. The wire resistances and parasitic capacitances are also considered. The interconnect parameters are obtained from the ITRS table [29]. Fig. 15a shows a sub-circuit module of a cross-point, and this module is duplicated for the entire array in SPICE. We extract statistical D, Z and R data at different learning stages from the SC algorithm run in software, and use these values to simulate the weighted sums DZ and DR (in the CD method in Fig. 2a) in SPICE using the read scheme described in Section 3. The deviation of the weighted sum obtained by SPICE is then calculated and incorporated back into the SC algorithm to evaluate its impact on the learning accuracy. Fig. 15b shows the learning accuracy with different wire widths. Wires with smaller width have larger wire resistance, thus the weighted sum becomes inaccurate and the learning accuracy is greatly reduced. To alleviate this, we propose reverse scaling of the wire's geometrical dimension, preferably with a wire width larger than 100 nm. Such reverse scaling, together with the redundant cells for reduction of device variations, dramatically increases the array area, but this may be acceptable considering that the peripheral logic gates are complicated and their size is thus comparable to the cell pitch of a synaptic cell in the array design.
ACCURACY IMPROVEMENT BY PROPOSED STRATEGIES
If we combine all the non-ideal device effects and array parasitics mentioned above, the learning accuracy of the system drops to ~30 percent. We now implement the proposed mitigation strategies in the SC algorithm. Specifically, it is assumed that the following improvements on the realistic properties are achieved: 1) the ON/OFF weight ratio is increased by 4X from 12.5 (within the range of the Ag:a-Si device) to 50, using a dummy column but assuming that the off-state current is not completely removed due to device-to-device variation; 2) 9 cells are used as a weight element to reduce the variation of read noise from ~2.89 to ~0.96 percent. It is also assumed that the nonlinearity remains the same ((4.7, -4.7) for the TaOx/TiO2 based synaptic device) and the array wire width is relaxed to 200 nm. As shown in Fig. 16, the recognition accuracy with synaptic devices can closely approach that of the ideal algorithm, achieving an accuracy improvement of >65 percent. However, the proposed strategies bring some overhead in chip area, latency and energy. Compared to the design without these strategies, the area overhead mainly comes from the redundant cells with relaxed wire width (~20 percent for 9 cells and 200 nm wire). The area overhead of the subtractors can be smaller (<7.84 percent) if they are partially hidden underneath the array. The total latency of the weighted sum operation will be similar if the weighted sum current readout is based on the principle of the integrate-and-fire neuron model [26], where both the weighted sum current and the parasitic column capacitance are increased by 9X and these two effects cancel each other out. The total latency of the weight update will also be similar, as the 9 cells are physically wired together and programmed simultaneously. However, the energy consumption of both the weighted sum and weight update will be increased by ~9X because 9 cells are used.
CONCLUSION
Synapses are the core elements of a neuromorphic system to establish communication between groups of neurons. Synaptic devices available today exhibit non-ideal device properties, e.g., the nonlinearity in weight update, device variations, read noise and limited ON/OFF weight ratio. The wire parasitics in nanoscale cross-point architecture also cannot be ignored. Sparse coding algorithm is used to provide a platform to evaluate the performance of unsupervised learning using realistic synaptic devices and arrays for image applications. It is found that the non-ideal synaptic device properties and the wire parasitics can lead to significant degradation of image recognition accuracy from ~96 percent to ~30 percent. Mitigation strategies to remedy this issue are proposed, including 1) the use of multiple cells for each weight element to alleviate the impact of device variations and read noise; 2) a dummy column to eliminate the off-state current; and 3) the use of a selector and larger wire width to reduce the IR drop along interconnects and thereby increase the accuracy of the weighted sum. By applying these strategies with tolerable trade-offs on chip area, latency and energy, the synaptic behavior is greatly improved and the recognition accuracy returns to ~95 percent, viably enabling the synaptic devices for practical hardware implementation of the sparse learning algorithm on chip.
The device-algorithm co-design methodologies presented in this work can also be applied to other neuro-inspired learning algorithms in general.

Fig. 15. … Smaller wire width will degrade the learning accuracy due to interconnect effect.

Fig. 16. Comparison of the recognition accuracy of the MNIST handwriting digits trained by the sparse coding algorithm running in software and implemented on the hardware architecture with realistic synaptic devices and arrays. With the proposed design methodologies, the recognition accuracy can approach the ideal value of the algorithm.
Pai-Yu Chen (S'15) received the MSE degree in electrical engineering from The University of Texas at Austin, Austin, TX, in 2013. He is currently working towards the PhD degree at Arizona State University, Tempe, AZ. His research interests include emerging nonvolatile memory device and architecture design, new computing paradigm exploration, and hardware design for security system. He is a student member of the IEEE.
Ligang Gao received the PhD degree in materials science from Nanjing University, China, in 2009. He is currently a research scientist in electrical engineering with Arizona State University, Tempe, AZ, where he is involved in emerging memory and its applications in neurocomputing and hardware security.
Shimeng Yu (S'10-M'14) received the MS and PhD degrees in electrical engineering from Stanford University, Stanford, CA, in 2011 and 2013, respectively. He is currently an assistant professor of electrical engineering in Arizona State University, Tempe, AZ. His research interests are emerging nano-devices and circuits with focus on the resistive switching memories, and new computing paradigms with focus on the neuro-inspired computing. He is a member of the IEEE.
