Metal oxide resistive random access memory based synaptic devices for brain-inspired computing
The traditional Boolean computing paradigm based on the von Neumann architecture is facing great challenges for future information technology applications such as big data, the Internet of Things (IoT), and wearable devices, owing to limitations in processing capability such as binary data storage and computing, non-parallel data processing, and the need for buses between memory units and logic units. The brain-inspired neuromorphic computing paradigm is believed to be one of the promising solutions for realizing more complex functions at a lower cost. To perform such brain-inspired computing with low cost and low power consumption, novel devices for use as electronic synapses are needed. Metal oxide resistive random access memory (ReRAM) devices have emerged as the leading candidate for electronic synapses. This paper comprehensively addresses recent work on the design and optimization of metal oxide ReRAM-based synaptic devices. A performance enhancement methodology and an optimized operation scheme to achieve analog resistive switching and low-energy training behavior are provided. A three-dimensional vertical synapse network architecture is proposed for high-density integration and low-cost fabrication. The impacts of the ReRAM synaptic device features on the performance of neuromorphic systems are also discussed on the basis of a constructed neuromorphic visual system with a pattern recognition function. Possible solutions for achieving high recognition accuracy and efficiency in neuromorphic systems are presented.
Introduction
In the coming era of big data, the traditional Boolean computing paradigm based on the von Neumann architecture cannot meet the requirements of future cloud computing and wearable device applications, owing to its limited processing capability and the large power overhead it incurs beyond that needed for computing itself. Brain-inspired neuromorphic computing is an attractive paradigm that complements the traditional von Neumann architecture based computing paradigm. 1-3) The salient features of neuromorphic computing are massive parallelism, unseparated data computing and storage, adaptivity to complex input information, and tolerance to errors. 2, 4) In recent years, scientists have made great efforts to imitate the human brain and thus enhance the computing capability of neuromorphic computers. 5) Although a software-based neural network computing approach has demonstrated superior application potential such as the recognition capability of animals, 6) the huge costs in terms of supercomputer use and energy consumption are unacceptable. Such costs are several orders of magnitude larger than those of the human brain, which consumes only approximately 20 W of power in a volume of 10^-3 m^3. 7, 8) Therefore, developing a hardware-based artificial neural network that directly imitates the human brain is the best way to realize neuromorphic computing with low cost and low energy consumption. 9-11) The basic structure of such a hardware-based brain-inspired artificial neural network consists of a number of neurons and synapses, as illustrated in Fig. 1. One of the key challenges for a neural network is the realization of the synaptic function using electronic devices. As a crucial functional element in a neural network, synapses connect different neurons and are capable of changing the strength of the connection. The number of synapses in a typical neural network (including our brain) is several orders of magnitude larger than the number of neurons. 8)
In this case, the area, power, and computing efficiency of a brain-inspired chip depend mainly on the structure and performance of the electronic synapses. Meanwhile, synapses have been demonstrated to have complex functions in a biological neural system, such as spike-time-dependent plasticity (STDP), long-term potentiation (LTP), long-term depression (LTD), and short-term memory (STM). It is difficult to realize all these functions in a traditional device or a simple circuit. Therefore, it is highly desirable to emulate synapses with a novel device that has high density and low energy consumption.
IBM reported a 5.4-billion-transistor brain-inspired chip with 1 M spiking neurons and 256 M synapses. 10) Each synapse consisted of several SRAM cells; therefore, the area was relatively large and the storage was volatile. IBM also reported a large-scale neural network using phase-change memory (PCM) devices as synapses. 12) Since for a PCM it is easy to control the multilevel storage behavior through the SET process but difficult through the RESET process, [...] 14) The LTP, LTD, and STDP functions were all realized in this device. The device had a ca. 1 mA RESET current, 0.8 V operation voltage, and 50 ns speed, and therefore, the energy consumption was approximately 40 pJ. Zhu et al. demonstrated an IZO-transistor-based synapse. Short-term plasticity, including paired-pulse facilitation (PPF), a dynamic filter function, and spatiotemporal signal processing, was mimicked. 15) Lee et al. also proposed a silicon-based charge-trap memory using an Al/HfO2/Al2O3/Si3N4/Si structure to mimic short-term memory functions in a biological synapse. 16) Among the various possible candidates for synaptic devices, metal oxide resistive random access memory (ReRAM) devices are promising owing to their nonvolatile data storage capability, simple device structure, and superior performance, such as low power, robust endurance, and high speed. 17-21) However, a specific device design is still required to minimize parameter variability, uncontrolled switching behavior, and other non-ideal effects. In this paper, we review the recent contributions to the development of metal oxide ReRAM-based synaptic devices, focusing especially on the design and optimization issues of metal oxide synapses demonstrated by our group and collaborators.
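The roughly 40 pJ figure quoted above follows from treating one programming event as a rectangular pulse, so that the energy is the product of current, voltage, and duration. The short sketch below reproduces that arithmetic; the function name is illustrative only, and the rectangular-pulse model is a simplifying assumption rather than a description of the measured waveform.

```python
def pulse_energy_joules(current_a: float, voltage_v: float, duration_s: float) -> float:
    """Energy of one rectangular programming pulse, E = I * V * t (assumed model)."""
    return current_a * voltage_v * duration_s


# Values quoted in the text: ca. 1 mA RESET current, 0.8 V, 50 ns.
energy_j = pulse_energy_joules(1e-3, 0.8, 50e-9)
print(f"{energy_j * 1e12:.1f} pJ per programming event")
```

The same helper can be used to compare other operating points mentioned in this paper, as long as the current and voltage are roughly constant over the pulse.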
Metal oxide ReRAM
ReRAM was first proposed for application as a future nonvolatile memory. A typical device structure can be modeled as a metal oxide resistive switching layer sandwiched between two metal electrodes, 22, 23) as illustrated in Fig. 1. A "SET" voltage can switch the device from a high-resistance state (HRS) to a low-resistance state (LRS) through the electrically induced generation of oxygen vacancies (V_O) in the metal oxide layer, whereas a "RESET" voltage can switch the device from the LRS to the HRS through the recovery of V_O. 24, 25) The concentration and distribution of V_O determine the resistance level. Generally, one or more conductive filaments consisting of V_O form in the oxide layer after the SET process, and part of the filaments rupture after the RESET process. [...] Signal transmission from pre-neurons to post-neurons was well controlled using such a three-terminal structure. 32) Although various synaptic functions have been demonstrated on metal oxide ReRAM-based synapses, more attention should be paid to the relation between biological functions and real applications, since the goal of research on synaptic devices is to realize a brain-inspired chip with high performance. There are still several key challenges for the application of ReRAM synapses in the integrated circuit industry, including the realization of multilevel switching control, improved retention and uniformity, and reduced power consumption. In our study, we found that HfOx-based ReRAM showed excellent repeatable synaptic behavior. Therefore, we designed a high-performance HfOx ReRAM device for future chip integration. We also studied system-level issues and the relationship between system requirements and practical device performance to promote the application of ReRAM-based synapses in brain-inspired chips.
Device design
In contrast to memory applications, analog resistive switching behavior, which requires gradual changes in resistance during the switching process, is of great importance for ReRAM-based synaptic devices. 17, 33) The number of resistance levels directly determines the processing ability and efficiency of a neuromorphic computing system. To obtain gradual multilevel resistive switching behavior, the concentration or number of V_O in the oxide layer should be effectively modulated by at least one measurable electrical parameter during both the SET and RESET processes. 12, 26, 34) However, the generation and recombination of V_O are usually related to a random avalanche process, resulting in the formation of clustered oxygen vacancy filaments and abrupt switching behavior. 15, 16, 35) In particular, during the SET process, if one oxygen vacancy is generated in the filament gap, the local electric field near the generated vacancy is enhanced and the local temperature is increased. Therefore, the generation probability of V_O in the region near the as-generated oxygen vacancy increases, 28, 36) which results in the formation of a strong but thin conductive filament, as illustrated in Fig. 2(a). When the strong filament is connected or ruptured, the resistance decreases or increases abruptly, leading to digital resistive switching behavior.
To solve this problem, it is necessary to disperse the generation of V_O through material design. One way to form multiple filaments is a "doping" approach. 35) When suitable atoms are doped into the oxide layer (e.g., trivalent dopants into HfOx), 37, 38) the generation probability of V_O near the dopants can be increased significantly, and thus new V_O will be preferentially generated near the dopants rather than near the previously generated V_O. Since the dopants are distributed uniformly throughout the oxide layer, multiple and relatively weak filaments are formed, and multilevel resistive switching is much easier to realize in this case, as illustrated in Fig. 2(b).
To demonstrate this methodology, HfOx synaptic devices doped with Al or Gd were fabricated. 15, 35, 39-41) For the Gd-doped HfOx sample, a HfOx layer with a thickness of 20 nm was deposited on a Pt/Ti bottom electrode by reactive sputtering, which was followed by furnace annealing at 600°C in O2 ambient for 20 min. Then Gd ions were implanted into the HfOx layer with an energy of 80 keV, prior to annealing at 800°C in N2 ambient for 5 min to activate the dopants. Finally, a TiN top electrode with a thickness of 100 nm was deposited by sputtering. Figure 2(c) shows the typical switching current-voltage (I-V) curves of the control sample (undoped HfOx) and the Gd-doped HfOx sample. Both the SET and RESET processes show analog multilevel switching behavior after Gd doping, in contrast to the abrupt switching observed in the undoped sample. Meanwhile, it was found that both the switching voltage and the switching current decrease after doping owing to the effective control of the generation/recombination of V_O. Figure 2(d) shows the synaptic training process of the Gd-doped HfOx device. 42) Identical pulses are applied to the device to imitate external stimulation, and the resistance of the device changes as the pulse number increases. Multilevel synaptic training processes were observed. The resistance can be modulated by the pulse amplitude or duration, indicating excellent controllability after optimization. Figure 2(e) shows the results of an endurance test for four different states. 35) Although variation exists, the different resistance levels can be distinguished after at least 10^6 cycles.
Another approach to tuning analog switching characteristics is to shift the switching mechanism from filamentary switching to interfacial switching. 30) For interfacial switching, an energy barrier for electrons forms at the interface between the oxide layer and one electrode. The barrier height or barrier width changes during the SET and RESET processes owing to the migration of oxygen ions, 29, 30, 43) and thus modulates the resistance level. Interfacial switching is area-dependent, which means that the resistance is approximately inversely proportional to the device area. Since the switching occurs over the whole area, multilevel switching is easy to achieve by controlling the concentration of V_O at the interfacial layer. Most of the oxide ReRAM synapses demonstrated in previous works involve interfacial switching. Although the multilevel performance is improved in interfacial-switching-type ReRAM devices, they suffer from severe retention degradation. Generally, changes in resistance can be observed in less than 10 min even at room temperature. Further studies are strongly required to improve the retention of this type of device. Moreover, the speed of interfacial switching must also be improved, since most of the devices can only switch in a time on the order of 10-1000 µs. Therefore, there is a long way to go before the practical application of interfacial-switching ReRAM devices, despite their excellent multilevel switching ability.
In addition to multilevel ability, energy consumption and density are also crucial parameters for synaptic devices. Our brain consists of 10^15 synapses with a power consumption of only 20 W. 9) Even though optimization of the system design may reduce the required number of synapses, 10) it will still be challenging to realize a highly intelligent neuromorphic computing system on a chip. To increase the integration density, a 3D ReRAM array with a vertical device structure is utilized, 31, 41) as illustrated in Fig. 3(a). Stacked planar Pt layers of 22 nm were deposited with SiO2 layers as isolation layers. Then holes were formed by etching the layers, and the holes were filled with a HfOx layer of 5 nm or AlOy/HfOx layers of 3 nm each by atomic layer deposition (ALD). The uniformity and multilevel characteristics are much improved by embedding the AlOy layer. 44) The oxide layers conformally cover the sidewalls of the holes. Finally, TiN was deposited to fill the holes and form the vertical electrode. To [...] Figure 3(d) shows the synaptic training during the RESET process. 41) At the beginning of the training process, the energy consumption per spike is largest, since the resistance of the device changes from low to high during the RESET process. It was found that a significant reduction of the energy consumption was realized after introducing the interfacial layer. When a current compliance was applied during the SET process to control the resistance level, an energy consumption of lower than 1 fJ per spike was achieved. Simulation results show that the array size and access speed can be improved by introducing the 3D vertical structure, whereas the power consumption of the array increases owing to the increase in the number of sneak paths. 45, 46)
Operation scheme design
Different kinds of learning rules have been demonstrated on metal oxide ReRAM-based synapses, such as STDP and spike-rate-dependent plasticity (SRDP). 27, 47-49) However, such behaviors have mainly been simulated and transferred from biological synapses. In fact, it is crucial to design learning rules based on the device characteristics and system requirements.
For a typical synaptic training process, only slight changes in resistance are necessary with each spike. The intermediate states can be modulated by varying the pulse width, pulse amplitude, or pulse number. 35) The last option is preferred since it significantly simplifies the design of the control circuit and the learning algorithm. In this case, a series of identical pulses is applied to the device sequentially, and the device conductance increases or decreases as the pulse number increases. In this study, forming-free Pt/HfOx/TiOy/HfOx/TiOy/TiN (from bottom to top) ReRAM devices with excellent uniformity were fabricated to explore the synaptic behavior with the resistance as a function of pulse number. 50, 51) During the RESET process, which corresponds to LTD, a gradual increase in resistance with increasing pulse number is observed, as shown in Fig. 4(a), indicating that the switching is of the analog type. However, the SET process, which corresponds to LTP, shows an abrupt transition, as shown in Fig. 4(b). The critical number of pulses required to switch the device is stochastic and differs from cycle to cycle. This random behavior can be exploited to implement a binary synapse. The increase in the switching probability with the pulse amplitude was measured, as shown in Figs. 4(c) and 4(d). [...] 53)
Neuromorphic system design
In our study, in addition to the demonstration of neuromorphic computing functions, we also investigated the relationship between system requirements and practical ReRAM device performance. To achieve this aim, we established a neuromorphic visual architecture for system-level study. Figure 5(a) illustrates our neuromorphic visual system with a winner-take-all architecture implemented by integrate-and-fire neuron circuits, as shown in Fig. 5(b), and ReRAM synapses. 9) The 32 × 32 neurons in the first layer (representing the retina) are connected to several neurons in the second layer (representing the primary visual cortex) through 32 × 32 × n synapses, where n is the number of neurons in the second layer. The neurons in the first layer fire according to the light intensity of the input pattern and send a pulse to the neurons in the second layer through the synapses. The cortex neurons sum and integrate the input currents, and the neuron with the largest input current fires first, which inhibits all the other neurons from firing. Then the winner neuron sends a pulse back to all the retina neurons to modulate the weighting of the synapses. Using the system, one task was to classify the face orientation of different people in camera images, 41) as shown in Fig. 5(c). Figure 5(d) shows the resistance map of different faces on the ReRAM arrays. Owing to the resistance fluctuation during the training process, noise can be observed in the resistance maps, which degrades the pattern recognition accuracy. The fluctuation is attributed to the random nature of oxygen migration. Figure 5(e) shows the measured and simulated resistance distributions during a RESET training process. 54) In the simulation, the relative resistance variation (δR/R) was set to 9% on the basis of the experimental results. Figure 5(f) shows the recognition accuracy as a function of the relative resistance variation. It was found that the recognition accuracy decreases markedly as the variation increases.
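The winner-take-all flow described above (sum the synaptic currents, let the neuron with the largest current fire, then update the winner's synapses) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the circuit or simulator used in this work: the integrate-and-fire race is reduced to picking the largest summed current, and the update rule, `step`, `threshold`, and the conductance bounds are all hypothetical.

```python
def output_currents(inputs, conductances):
    """Summed current into each second-layer neuron: I_j = sum_i v_i * G_ji."""
    return [sum(v * g for v, g in zip(inputs, row)) for row in conductances]


def winner_take_all(inputs, conductances):
    """Index of the neuron with the largest input current (it fires first)."""
    currents = output_currents(inputs, conductances)
    return max(range(len(currents)), key=lambda j: currents[j])


def train_step(inputs, conductances, step=0.05, g_min=0.0, g_max=1.0, threshold=0.5):
    """Update only the winner's synapses (hypothetical rule, for illustration).

    Synapses from active inputs are strengthened and the others weakened,
    with conductances clipped to [g_min, g_max].
    """
    winner = winner_take_all(inputs, conductances)
    row = conductances[winner]
    for i, v in enumerate(inputs):
        delta = step if v > threshold else -step
        row[i] = min(g_max, max(g_min, row[i] + delta))
    return winner


# Toy example with 4 inputs and 2 output neurons (the system in the text
# uses 32 x 32 inputs and n output neurons).
demo_inputs = [1.0, 0.0, 1.0, 0.0]
demo_g = [[0.5, 0.5, 0.5, 0.5], [0.6, 0.1, 0.6, 0.1]]
demo_winner = train_step(demo_inputs, demo_g)
print(demo_winner, demo_g[demo_winner])
```

Only the winner's row is modified, which is what makes repeated presentations of similar patterns pull one output neuron toward one class of input.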
To suppress the degradation caused by resistance variation, we propose a methodology in which a group of devices connected in parallel mimics a single synapse. 41) This group of devices is regarded as a single synaptic device whose conductance is the mean value of all the devices in the group. Figure 5(f) shows the simulated recognition accuracy of the system when using this method. A significant improvement is achieved even when only two or three devices are used in parallel, and the accuracy increases with the number of devices. Using the 3D ReRAM array, the proposed methodology is easy to realize. As shown in Fig. 3(a), ReRAM devices on different layers of the same pillar can be viewed as a synaptic cell. These devices receive the same training pulses at the same time. On the basis of the measured training characteristics, the recognition accuracy increases significantly from 56% (for a single device) to 90% (for two devices in parallel).
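One way to see why grouping devices helps is a small Monte Carlo estimate of the spread of the group's mean conductance. The sketch below assumes independent Gaussian device-to-device fluctuations and applies the 9% relative variation quoted above directly to the conductance; both are simplifying assumptions, and the function names are illustrative. Under these assumptions the relative spread of the mean falls roughly as one over the square root of the number of devices.

```python
import random
import statistics


def synapse_conductance(target_g, rel_sigma, n_devices, rng):
    """Mean conductance of a group of n_devices devices.

    Each device is drawn independently from a Gaussian centered on target_g
    with standard deviation rel_sigma * target_g (assumed noise model).
    """
    samples = [rng.gauss(target_g, rel_sigma * target_g) for _ in range(n_devices)]
    return sum(samples) / n_devices


def relative_spread(n_devices, rel_sigma=0.09, trials=5000, seed=0):
    """Estimated relative standard deviation of the group's mean conductance."""
    rng = random.Random(seed)
    values = [synapse_conductance(1.0, rel_sigma, n_devices, rng) for _ in range(trials)]
    return statistics.pstdev(values) / statistics.fmean(values)


for n in (1, 2, 3, 4):
    print(n, round(relative_spread(n), 4))
```

This only models the variation of the synaptic weight itself; how that variation translates into recognition accuracy depends on the full system simulation described in the text.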
Since edge detection and classification are basic processes in complex pattern recognition, 1) we also performed another task, namely, classifying sticks oriented in different directions. 50) Sixteen neurons were used in the second layer to sense different orientations. During the training phase, 1000 grayscale images of a 2D Gaussian bar with a random center position and random orientation were fed into the first-layer neurons. After the training, a specific neuron in the second layer responded to a specific orientation of the input image. Figures 6(a)-6(f) show normalized resistance maps before and after training for binary and analog synapses. Before training, the resistance is distributed randomly. After training, the map becomes orientation-selective. Both the binary and analog synapses can recognize the orientation correctly under the optimized training scheme. Figures 6(g)-6(j) show the simulated orientation selectivity, orientation storage capacity, and energy consumption as functions of pulse amplitude. For binary synapses, both the selectivity and the energy increase with the pulse amplitude. Therefore, there is a trade-off between high efficiency and low power. An optimized training voltage of 1.6 V is suggested on the basis of the simulation results. For analog synapses, the selectivity decreases as the pulse amplitude increases. However, there is a turning point in the dependence of the energy consumption on the pulse amplitude. When the voltage is small, the energy consumption decreases as the voltage increases owing to the reduction in the number of intermediate states. Once the voltage exceeds the turning point, the energy consumption increases with the voltage. These results indicate that a suitable scheme should be carefully designed on the basis of the device characteristics and the system requirements.
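The two training modes compared above differ in how a single pulse changes the device: an analog RESET pulse moves the resistance gradually, whereas a binary SET pulse either switches the device or leaves it unchanged, with a probability that grows with pulse amplitude. The sketch below contrasts the two update rules. It is a toy model under stated assumptions: the saturating resistance update, the logistic probability curve, and every parameter value (`r_max`, `fraction`, `v_half`, `slope_v`) are hypothetical and not fitted to the measured devices.

```python
import math
import random


def analog_reset_pulse(resistance, r_max=1e6, fraction=0.1):
    """Gradual LTD: move a fixed fraction of the remaining way toward r_max."""
    return resistance + fraction * (r_max - resistance)


def set_probability(amplitude_v, v_half=1.5, slope_v=0.1):
    """Hypothetical logistic switching probability that rises with amplitude."""
    return 1.0 / (1.0 + math.exp(-(amplitude_v - v_half) / slope_v))


def binary_set_pulse(is_low_resistance, amplitude_v, rng):
    """Stochastic LTP: an HRS device switches to LRS with set_probability."""
    if is_low_resistance:
        return True
    return rng.random() < set_probability(amplitude_v)


def pulses_until_set(amplitude_v, rng, max_pulses=1000):
    """Count identical SET pulses until the device switches (stochastic)."""
    state = False
    for n in range(1, max_pulses + 1):
        state = binary_set_pulse(state, amplitude_v, rng)
        if state:
            return n
    return max_pulses


# Analog trace: resistance after each of ten identical RESET pulses.
r = 1e4
trace = []
for _ in range(10):
    r = analog_reset_pulse(r)
    trace.append(r)

rng = random.Random(0)
counts = [pulses_until_set(1.4, rng) for _ in range(5)]
print([round(x) for x in trace])
print(counts)
```

In the binary case the number of pulses needed varies from run to run, which mirrors the cycle-to-cycle randomness of the SET transition described in the operation scheme section.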
Conclusions
This review article summarizes the recent advancements in the design and fabrication of metal oxide ReRAM-based synaptic devices for brain-inspired neuromorphic computing applications. A material-oriented design methodology is proposed to improve the performance of ReRAM synapses. Optimized doping and a suitable interfacial layer are introduced to improve the analog training behavior and reduce the power consumption. A 3D vertical ReRAM synapse array is developed for high-density, low-cost, and high-calculation-efficiency applications. In addition to material optimization, a customized training scheme is also recommended for design based on the practical characteristics of metal oxide ReRAM devices. Binary synapses are also acceptable for neuromorphic visual systems. A simulator is developed to investigate the link between practical devices and neuromorphic systems. Using the simulator, performance metrics, such as the calculation accuracy and energy consumption, can be evaluated, which are useful for improving the design of the training scheme and system architecture. Possible solutions are presented for suppressing the impact of intrinsic variations of metal oxide ReRAM synapses. This paper demonstrates the feasibility of metal oxide ReRAM synapses for brain-inspired computing to promote the application of synaptic devices and neuromorphic systems.
Fig. 6. [...] The normalization is with respect to a reference with the highest conductance in the synapse array. (a, d) Initially, the resistances of all the devices are randomly distributed. After training, four distinct orientations emerge. Under +1.6 V/10 ns SET pulses for the binary synapses (b) and −1.1 V/10 ns RESET pulses for the analog synapses (e), the orientations are well detected. However, if the training scheme is not optimized, only three distinct orientations can be recognized, for example, for binary synapses using +2 V/10 ns SET pulses (c) and for analog synapses using −1.4 V/10 ns RESET pulses (f). (g)-(j) Simulated system performance metrics as a function of programming conditions. (g) Orientation selectivity and orientation storage capacity for binary synapses and (i) for analog synapses. (h) Energy consumption of the synapse array during the whole training (200 training images) process for binary synapses and (j) for analog synapses. The average values for 100 independent simulations are shown.
