ABSTRACT An inference system using gated Schottky diodes (GSDs) is proposed for highly reliable hardware-based neural networks (HNNs). We explain the characteristics of the GSD and present circuits that take these characteristics into account. The reverse current of the GSD, which serves as the synaptic current, saturates with respect to the input voltage. This provides immunity to input and output noise and overcomes the IR drop problem in metal wires. To take advantage of this saturated I-V characteristic, pulse-width modulation (PWM) of the input data is proposed instead of amplitude modulation. In addition, the synaptic current of the GSD increases linearly when identical pulses are applied to the bottom gate, which makes it easy to transfer the calculated weights to the conductances of the GSDs. Considering these characteristics, we design electronic circuits for PWM, current summation, and the activation function. Through SPICE simulation, we evaluate the inference accuracy of a 2-layer neural network. The classification accuracy on 100 images from the MNIST test set is 94%, which is comparable to the reference accuracy obtained with software.
I. INTRODUCTION
Neuromorphic computing with electronic synaptic devices, i.e., hardware-based neural networks (HNNs), is being widely researched because of its potential for low-power, massively parallel operation [1]. Various electronic devices, such as resistive change memory (RRAM), phase change memory (PCRAM), and FET-based devices, have been studied as synaptic devices to implement HNNs [2], [3]. There are two main approaches to training HNNs: on-chip and off-chip training. Whether on-chip or off-chip training is preferable is still debated, but on-chip training has advantages in terms of power and device variation [4]. Non-ideal characteristics such as the finite number of conductance levels, the finite dynamic range, the asymmetry and nonlinearity of the conductance response, and inter-device variations must be considered in on-chip training [5], [6]. Methods for handling such non-ideal properties in the weight update phase have been reported [6]-[8], but challenges remain.
In addition, power-efficient neuron circuits that accurately represent an activation function and its derivative are needed [9]. So far, small networks capable of on-chip training with synaptic device arrays have been implemented [10]-[12]. By contrast, off-chip training offers flexibility in the size of the neural network, so various neural network techniques can be applied. In off-chip training, weights pre-trained by software algorithms are transferred to the hardware synapse array, and the electronic synaptic devices are used for weight mapping and for vector-by-matrix multiplication (VMM) in the inference phase. There have also been many reports on this off-chip approach [13]-[16]. Because an HNN based on off-chip training has no weight update phase, it is relatively free of the non-ideal features associated with weight updates. However, since the non-ideal characteristics of synaptic devices cannot be considered in the software training phase, it is important to map the weights precisely to the conductances of the synaptic devices. In other words, an off-chip-trained HNN can be vulnerable to noise and inter-device variations, so the synaptic device array must be reliable. In this work, we focus on off-chip training using gated Schottky diodes (GSDs). First, the characteristics of our GSD are introduced, and the peripheral circuits compatible with GSDs are described. We then perform SPICE simulations to evaluate how well the designed inference system classifies the MNIST handwritten digit set.
II. INFERENCE CIRCUITS USING GATED SCHOTTKY DIODES

A. GATED SCHOTTKY DIODE (GSD)
When performing the forward propagation of neural networks using electronic synaptic devices, the characteristics of the synaptic device must be considered. In earlier work, we reported the gated Schottky diode (GSD) as an electronic synaptic device [17]. The reverse current of the GSD, which operates as the synaptic current, is modulated by the charge stored in the charge storage layer, which is set by applying program or erase pulses. In this paper, we devise a bias scheme that eliminates the forward current of the GSD. When a VMM is performed using a synaptic device array, current must flow through the GSDs only in the reverse direction. Otherwise, the synaptic currents summed by Kirchhoff's current law (KCL) include unintended current, which results in malfunction of the neural network [18].
When the GSDs in the synapse array are connected to the circuits for current summation, subtraction, and the activation function to form a system, some of the GSDs in the array can become forward-biased. For the VMM to be performed correctly, no current must flow through these forward-biased GSDs. Fig. 1 shows the device structure and a modified bias scheme that prevents the forward current of the Schottky diode. S and O denote the aluminum electrodes forming the Schottky junction and the ohmic-like junction, respectively. The input voltage is applied to the O node, and the current flows through the undoped silicon channel between the S and O nodes. The bottom gates under S and O (BG_S and BG_O) modulate the Schottky barrier height between each electrode and the undoped Si channel. Depending on the modulated barrier height, the junction between an electrode and the Si channel becomes either a Schottky junction (S) or an ohmic-like junction (O). The bottom gate between BG_S and BG_O (BG_C in Fig. 1) is used to prevent the forward current of the Schottky diode. Applying a bias or pulses to these bottom gates induces electrons or holes in the Si channel. In the read operation of the n-type GSD, applying −4 V to BG_O makes the Si channel above BG_O an electrically p+-doped region, and applying 4 V to BG_C makes the Si channel above BG_C an electrically n+-doped region. A PN junction is thus formed internally between the Si channel above BG_C and the Si channel above BG_O. Therefore, when a negative bias (−2 V) is applied to O, no current flows because this PN junction is reverse-biased; that is, the forward current of the GSD is effectively blocked. Conversely, when a positive bias (2 V) is applied to O, the internally formed PN junction is forward-biased, so the reverse current of the Schottky diode flows.
This reverse current of the Schottky diode can be modulated by the charge stored in the Si3N4 layer, which is set by applying program or erase pulses to the BG_S node. The operation of the p-type GSD can be explained in the same way. Fig. 2 shows the simulated band diagrams along the red line A-B-C-D for n- and p-type GSDs in the operating and cutoff modes; the PN junction is formed near point C. Fig. 3 shows the diode current of the p-type GSD as a function of the number of applied e⁻ programming pulses. As shown in Fig. 3(a), when −4 V is applied to BG_O, the forward current of the Schottky diode flows at positive V_O. However, the forward current is blocked when 4 V is applied to BG_O (Fig. 3(b)). The reverse current of the Schottky diode is modulated by applying program pulses (8.5 V for 10 µs) to BG_S. In addition, the reverse current starts to saturate when V_O reaches −1.5 V. The saturated reverse current increases linearly with the number of applied program pulses, as shown in the inset. The physical mechanisms and operation of the GSD were described in our previous work [17]. The saturation behavior and the linear conductance response are taken into account in designing the forward-propagation circuits. These characteristics have several advantages, which are discussed below. We use the I-V curves of p-type GSDs with 64 conductance levels in the SPICE simulation and verify that the designed peripheral circuits and system work properly.
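The two device properties used in the rest of the paper, a reverse current that saturates with bias and a saturated current that grows linearly with the number of program pulses, can be captured in a simple behavioral model. The Python sketch below is only illustrative: the tanh-shaped saturation, the per-pulse current step `I_UNIT`, and the voltage scale `V_KNEE` are hypothetical assumptions, not fits to the GSD curves used in the SPICE simulation.

```python
import math

# Hypothetical behavioral model of the GSD synaptic (reverse) current.
# I_UNIT and V_KNEE are illustrative assumptions, not measured values.
N_LEVELS = 64      # conductance levels used in the simulation (from the text)
I_UNIT = 1e-9      # assumed increase of the saturated current per program pulse [A]
V_KNEE = 0.5       # assumed voltage scale over which the current saturates [V]


def saturated_current(n_pulses: int) -> float:
    """Saturated reverse current, assumed to grow linearly with the pulse count."""
    if not 0 <= n_pulses < N_LEVELS:
        raise ValueError(f"n_pulses must be in 0..{N_LEVELS - 1}")
    return I_UNIT * n_pulses


def synaptic_current(v_read: float, n_pulses: int) -> float:
    """Current for a read-bias magnitude v_read on the O node.

    v_read > 0 stands for the polarity that forward-biases the internal PN
    junction (the Schottky diode conducts in reverse); v_read <= 0 stands for
    the blocked polarity, where the reverse-biased PN junction stops the current.
    """
    if v_read <= 0:
        return 0.0
    # tanh rises and then flattens, mimicking the onset of saturation
    return saturated_current(n_pulses) * math.tanh(v_read / V_KNEE)


# Once saturated, the current barely depends on the exact bias...
print(synaptic_current(2.0, 32) / synaptic_current(1.6, 32))  # approximately 1.003
# ...but scales linearly with the number of program pulses.
print(synaptic_current(2.0, 40) / synaptic_current(2.0, 20))  # approximately 2
```

These two properties are what the PWM input scheme and the linear weight mapping described in the following sections rely on.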
B. PULSE-WIDTH MODULATION (PWM) FOR INPUT BIAS
Hardware-based neural networks composed of electronic synaptic devices commonly assume that the I-V characteristic of the synaptic device is linear [19].
If an electronic synaptic device shows a linear I-V characteristic, the product of a weight and an input can simply be expressed as the product of the conductance and the input voltage of the synaptic device, which is the synaptic current. A negative weight is represented by the difference in conductance between a pair of synaptic devices. The product is then given by

$$W x \propto \left(G^{+} - G^{-}\right) V = I^{+} - I^{-} \qquad (1)$$
where W and x are the weight and the input of the neural network, and G±, V, and I± are the conductance, input voltage, and synaptic current of the synaptic device, respectively. Since the conductance of the synaptic device is independent of the input voltage, the synaptic current is simply the product of the input voltage and the conductance. In this case, the input data can be mapped linearly to the amplitude of the input voltage of the synaptic devices. However, most electronic devices exhibit nonlinear I-V characteristics; that is, the conductance of the synaptic device depends on the input voltage, as expressed in (2):

$$W x \propto \left[G^{+}(V) - G^{-}(V)\right] V = I^{+}(V) - I^{-}(V) \qquad (2)$$
In (2), the synaptic weight, represented by the conductance of the synaptic device, can change with the input voltage during the propagation phase. If so, expressing the synaptic current simply as the product of the input voltage and the conductance introduces an error that depends on the I-V nonlinearity. Therefore, the input data should be converted to a voltage suited to the electronic synaptic devices. To this end, nonlinear mapping, nonlinear synaptic transmission functions, and even improving the I-V linearity of the device have been attempted [19]-[21]. These approaches are based on amplitude modulation of the input voltage. In another approach, time modulation of the input voltage was reported [22]. Whereas nonlinear mapping and nonlinear synaptic transmission apply only to specific synaptic devices, time modulation can be applied to any synaptic device and is easily implemented with electronic circuits. As shown in Fig. 3(b), our GSD has a saturation region in its I-V characteristic. In this saturation region, where a nearly linear conductance response with respect to the number of applied pulses is observed, both the synaptic current and the conductance are independent of the input voltage. The input signal can therefore be converted into a pulse of fixed amplitude whose width alone is modulated. That is, the synapse output is the product of the pulse width and the unit synaptic current, as expressed in (3):

$$W x \propto Q^{+} - Q^{-} = \left(I^{+} - I^{-}\right) t \qquad (3)$$
where Q± and t are the charge and the pulse width while a voltage pulse is applied, respectively. The unit synaptic current, which represents a specific conductance value (equivalently, a synaptic weight), can be changed only by applying a program or erase pulse to BG_S in the weight update phase. Because it is fixed in the propagation phase, the input must be encoded by time modulation of the voltage pulse. For this purpose, we design pulse-width modulation (PWM) circuits for the inference system using GSDs. Note that rate modulation of the input information could also be used instead of PWM, because both are forms of time modulation. Fig. 4(a) shows a simple PWM circuit. For circuit simplicity, a positive operating voltage is used, and the circuit symbols of reverse-biased n-type GSDs are drawn. The amplitude of the analog input voltage applied to node X (V_X) is compared with that of a sawtooth wave (V_sawtooth), and the difference between the two is amplified. A level shifter then yields a voltage pulse (V_in,s) with a modulated pulse width and the desired amplitude. Fig. 4(b) shows, as an example, the voltage pulses obtained from the PWM circuit for input voltages of 0.6 V and 0.8 V. Since the maximum amplitude and the period of the sawtooth wave are 1 V and 10 µs, respectively, the modulated pulse widths for 0.6 V and 0.8 V are 6 µs and 8 µs. The synaptic current flows while the voltage pulse (V_in,s) is applied, as shown in Fig. 4(c). The voltage pulse from the PWM circuit is applied to the input node of the synaptic device, and this is repeated in every layer of the neural network.
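The comparator-plus-level-shifter behavior and the charge relation Q = I·t can be checked with a short time-stepped Python sketch. It idealizes the comparator as an infinite-gain device that outputs the pulse amplitude whenever V_X exceeds a rising 0-1 V sawtooth with a 10 µs period (both values from the text); the 2 V pulse level, the 1 ns time step, and the unit current are assumptions for illustration only.

```python
# Idealized PWM: comparator (V_X vs. rising sawtooth) followed by a level shifter.
V_SAW_MAX = 1.0    # sawtooth peak [V] (from the text)
T_SAW = 10e-6      # sawtooth period [s] (from the text)
V_PULSE = 2.0      # assumed level-shifted pulse amplitude [V]
N_STEPS = 10_000   # assumed number of time steps per period (1 ns each)


def pwm_waveform(v_x: float) -> list[float]:
    """Sampled pulse V_in,s over one sawtooth period for analog input v_x."""
    waveform = []
    for k in range(N_STEPS):
        v_saw = V_SAW_MAX * k / N_STEPS          # linear ramp from 0 toward V_SAW_MAX
        waveform.append(V_PULSE if v_x > v_saw else 0.0)
    return waveform


def pulse_width(waveform: list[float]) -> float:
    """Total time the pulse is high [s]."""
    dt = T_SAW / N_STEPS
    return sum(1 for v in waveform if v > 0) * dt


def synaptic_charge(i_unit: float, v_x: float) -> float:
    """Charge = saturated unit current x modulated pulse width."""
    return i_unit * pulse_width(pwm_waveform(v_x))


for v_x in (0.6, 0.8):
    print(f"V_X = {v_x} V -> pulse width = {pulse_width(pwm_waveform(v_x)) * 1e6:.2f} us")
# V_X = 0.6 V -> pulse width = 6.00 us
# V_X = 0.8 V -> pulse width = 8.00 us
```

Because the current during the pulse is constant, the delivered charge is proportional to V_X: in this sketch the charge for 0.8 V is 4/3 of that for 0.6 V, independent of the pulse amplitude chosen for the level shifter.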
C. VECTOR-BY-MATRIX MULTIPLICATION IN SYNAPSE ARRAY
The VMM in software-based deep neural networks consumes a large amount of power. When this VMM is performed by a large electronic synaptic device array, it is important to represent accurately the sum of the currents of the synaptic devices connected to a neuron. Since the summed current at the output of the connected synaptic devices changes over time, the voltage at their output node also varies with time. As a result, if the synaptic current is not saturated, it changes as the output node voltage changes. This change in the output node voltage therefore degrades the inference accuracy [23].
A simple solution to this problem is an operational amplifier with high gain [24]. However, such an amplifier generally requires a large area and consumes considerable power. If the current of each synapse stays constant (saturated) as the voltage at the synapse output changes, the operational amplifier is unnecessary. Because our GSD maintains a constant current even when the voltage across the diode varies over the operating voltage range, a simple 2-stage current mirror circuit can be used for current summation and subtraction [25]. Note that such a simple current mirror circuit cannot be used when a change in the output node voltage changes the synaptic current. Fig. 5(a) shows the current mirror circuits for current summation, subtraction, and the activation function. A single neuron is connected to 785 synapses, and each synapse consists of two GSDs to represent both positive and negative weights. The nMOSFETs in the current mirror circuit have sufficient current drivability (W/L = 5) to accommodate the current from all synaptic devices connected to one neuron. As mentioned above, the output node voltage (V_out,s) of the synaptic devices can change with time, but the synaptic current does not, because it is saturated. As an example, Fig. 5(b) shows that the current of a synaptic device remains constant regardless of the voltage difference between the input node voltage (V_in,s_i) and the output node voltage (V_out,s+). The input node voltage (V_in,s_i) of the synaptic device is a 2 V pulse with a modulated width, whereas the output node voltage (V_out,s+) changes from about 0.3 V to 0.6 V. In other words, as the synaptic current flows, the voltage across the synaptic device (V_in,s_i − V_out,s+) changes with time, but the synaptic current (I+) does not. This property is important when performing the VMM with analog operations in electronic synaptic device arrays.
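Why a saturated synaptic current removes the need for a high-gain amplifier can be illustrated numerically. The Python sketch below sums the G+ and G− column currents by KCL and compares an idealized saturated device with a hypothetical ohmic device whose current is proportional to V_in,s − V_out,s. The 2 V input and the 0.3 V to 0.6 V output-node swing are the values quoted for Fig. 5(b); the unit currents, the ohmic comparison device, and the use of one shared output-node voltage for both columns are assumptions.

```python
# KCL current sum for one neuron: saturated (GSD-like) vs. hypothetical ohmic devices.
# Simplification: the same output-node voltage is used for the G+ and G- columns.
V_IN = 2.0    # input pulse amplitude [V] (value quoted for Fig. 5(b))
V_REF = 0.3   # output-node voltage at which the two device models agree [V] (assumption)


def device_current(i_sat: float, v_out: float, ohmic: bool) -> float:
    """Current of one device while its input pulse is high."""
    if ohmic:
        # Hypothetical linear device calibrated to carry i_sat at v_out = V_REF
        return i_sat * (V_IN - v_out) / (V_IN - V_REF)
    return i_sat  # saturated device: independent of the output-node voltage


def net_current(i_pos, i_neg, pulses_high, v_out, ohmic=False):
    """I_tot+ - I_tot-, summed by KCL over the synapses whose input pulse is high."""
    i_tot_pos = sum(device_current(i, v_out, ohmic) for i, high in zip(i_pos, pulses_high) if high)
    i_tot_neg = sum(device_current(i, v_out, ohmic) for i, high in zip(i_neg, pulses_high) if high)
    return i_tot_pos - i_tot_neg


# Hypothetical unit currents [A] of three synapses (three G+/G- pairs).
i_pos = [30e-9, 0.0, 12e-9]
i_neg = [0.0, 20e-9, 0.0]
pulses_high = [True, True, True]

for ohmic in (False, True):
    start = net_current(i_pos, i_neg, pulses_high, v_out=0.3, ohmic=ohmic)
    end = net_current(i_pos, i_neg, pulses_high, v_out=0.6, ohmic=ohmic)
    print(f"ohmic={ohmic}: relative change of net current = {(end - start) / start:+.1%}")
```

With the saturated model the net current is unchanged as the output node moves from 0.3 V to 0.6 V, whereas the ohmic model loses about 17.6% (the ratio 1.4 V / 1.7 V). This is the kind of error that a high-gain amplifier would otherwise have to suppress.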
Furthermore, when the synaptic devices are configured as a crossbar array, the IR drop along the metal wires can make the VMM computation inaccurate [26]. However, since our GSD operates in the saturation region, noise on the input and output node voltages and the IR drop along the metal wires do not affect the synaptic current, so the VMM can be performed accurately. We use an integration capacitor (1.5 pF) to convert the current to a voltage, so the activation function is a hard-sigmoid function. Unlike the sigmoid or hyperbolic tangent function, this piecewise-linear function can be implemented with a single capacitor and no additional circuitry. Fig. 5(c) shows the hard-sigmoid output as a function of the net charge (Q_tot+ − Q_tot−) delivered by the summed synaptic currents (I_tot+ and I_tot−). The output of the hard-sigmoid function is then applied to the X node of the PWM circuit of the next layer to generate a voltage pulse with a modulated width.
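The current-to-voltage conversion and the hard-sigmoid activation can be summarized in a behavioral model of one layer. In the Python sketch below, only the 1.5 pF capacitor and the 0-1 V, 10 µs sawtooth come from the text; the 0.5 V output for zero net charge, the 0 V and 1 V clipping limits, and the unit current per program pulse are hypothetical choices, made so that each layer's output voltage can be fed directly into the next layer's PWM circuit.

```python
# Behavioral model of one layer: PWM inputs -> saturated synaptic charges
# -> integration capacitor -> hard sigmoid (clipped capacitor voltage).
C_INT = 1.5e-12   # integration capacitor [F] (from the text)
T_SAW = 10e-6     # sawtooth period [s] (from the text)
V_MAX = 1.0       # sawtooth peak; assumed upper clipping limit [V]
V_MID = 0.5       # assumed output voltage for zero net charge [V]


def hard_sigmoid(q_net: float) -> float:
    """Capacitor voltage V_MID + Q/C, clipped to the range [0, V_MAX]."""
    return min(max(V_MID + q_net / C_INT, 0.0), V_MAX)


def layer_forward(v_x, n_pos, n_neg, i_unit):
    """One layer of the inference system.

    v_x:          analog inputs (0..V_MAX) applied to the X nodes of the PWM circuits
    n_pos, n_neg: per-neuron rows of program-pulse counts for the G+ and G- devices
    i_unit:       assumed saturated current per program pulse [A]
    """
    widths = [T_SAW * v / V_MAX for v in v_x]          # PWM: width proportional to V_X
    outputs = []
    for row_pos, row_neg in zip(n_pos, n_neg):
        q_pos = sum(i_unit * n * t for n, t in zip(row_pos, widths))  # Q_tot+
        q_neg = sum(i_unit * n * t for n, t in zip(row_neg, widths))  # Q_tot-
        outputs.append(hard_sigmoid(q_pos - q_neg))
    return outputs  # these voltages drive the X nodes of the next layer


hidden = layer_forward([0.6, 0.8], n_pos=[[10, 0], [63, 63]], n_neg=[[0, 5], [0, 0]], i_unit=1e-9)
print(hidden)  # roughly [0.513, 1.0]; the second neuron is clipped at V_MAX
```

Chaining `layer_forward` twice, with the first layer's outputs as the second layer's `v_x`, mirrors the way the hard-sigmoid output is reapplied to the X node of the next PWM circuit.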
III. SIMULATION RESULTS
A. NEURAL NETWORKS
We design a neural network consisting of 784 input, 50 hidden, and 10 output neurons. Including bias neurons, there are a total of 40545 synapses. For training in software, 60000 training images are used, and the mini-batch size, number of training epochs, and learning rate are 100, 10, and 0.5, respectively. The classification accuracy on the 10000 test images is 96.33%. After the weights are calculated by the software algorithm, they are normalized and linearly quantized to 64 discrete levels. After quantization, the classification accuracy on the test set is 96.30%. To map the pre-trained weights to the synaptic devices, the 64-level quantized weights are transferred to the synaptic device array. Because our GSD shows a linear conductance response in potentiation, the linearly quantized weights can be applied to the synaptic device array. As shown in Fig. 6, normalization and quantization yield an index number representing how many pulses should be applied to the synaptic device to reach the target weight. Although a read-write-verify scheme can also be used for precise tuning in off-chip training [27], it is more complicated than the index-number method [28]. Since the conductance response is linear, the only quantity to consider is the index number, i.e., the number of pulses to be applied. Based on these index numbers, each synaptic device is programmed to its own conductance value.
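The normalization, 64-level quantization, and index-number step of Fig. 6 can be sketched in Python as follows. The text does not state how the normalization is done or how signed weights are split across the G+/G− pair, so two assumptions are made and labeled in the code: weights are normalized by the largest absolute weight, and a positive (negative) weight is programmed only into the G+ (G−) device while the partner device stays at index 0. The function names `weights_to_pulse_indices` and `dequantize` are hypothetical.

```python
# Hypothetical sketch: pre-trained weights -> pulse indices for G+/G- device pairs.
N_LEVELS = 64  # conductance levels per device (from the text)


def weights_to_pulse_indices(weights):
    """Return (idx_pos, idx_neg, w_max); indices are pulse counts in 0..N_LEVELS-1.

    Assumptions: normalization by max |w|; positive weights program only G+,
    negative weights only G-. Because the conductance response is linear,
    the pulse count is taken as proportional to the target conductance.
    """
    top = N_LEVELS - 1
    w_max = max(abs(w) for row in weights for w in row)
    if w_max == 0:
        raise ValueError("all weights are zero")
    idx_pos = [[round(max(w, 0.0) / w_max * top) for w in row] for row in weights]
    idx_neg = [[round(max(-w, 0.0) / w_max * top) for w in row] for row in weights]
    return idx_pos, idx_neg, w_max


def dequantize(idx_pos, idx_neg, w_max):
    """Weight represented by each programmed pair: (n+ - n-) / top * w_max."""
    top = N_LEVELS - 1
    return [[(p - n) / top * w_max for p, n in zip(rp, rn)]
            for rp, rn in zip(idx_pos, idx_neg)]


w = [[0.25, -1.0, 0.0], [0.6, -0.1, 0.9]]
pos, neg, w_max = weights_to_pulse_indices(w)
print(pos)  # [[16, 0, 0], [38, 0, 57]]  pulses applied to the G+ devices
print(neg)  # [[0, 63, 0], [0, 6, 0]]    pulses applied to the G- devices
```

Under this assumed mapping the quantization error of each weight is at most half a step, w_max / 126, and no read-verify loop is needed because each index translates directly into a fixed number of identical program pulses.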
B. MNIST CLASSIFICATION BY SPICE SIMULATION
The designed neural network is evaluated by SPICE simulation. The circuit that performs the activation function and PWM can be regarded as a neuron circuit. We obtained the MNIST classification accuracy for 100 images randomly selected from the test set. As an example, Fig. 7 shows the output voltages after the hard-sigmoid function of the last layer when images of the digits '0' to '9' are applied to the input neurons one by one over time. When the input image is '0', the output voltage of the last-layer neuron corresponding to '0' is the largest. The same holds for the other images, and the classification accuracy for the 100 test images is 94%, which is similar to the software-based result. The network is evaluated on only 100 test images because of the time required for SPICE simulation, but this result confirms that the designed neural network works well.
IV. CONCLUSION
In this work, we have evaluated the inference accuracy of a neural network consisting of gated Schottky diodes (GSDs) and supporting circuits. The current of the GSD saturates with respect to changes in the input voltage, so the GSD is highly immune to noise voltage and to IR drop along metal wires. A pulse-width modulation (PWM) scheme compatible with this saturation characteristic was introduced, and circuits supporting it were designed. Because the GSD has a highly linear conductance response, the weights optimized in software can easily be copied to the synaptic devices. Using the MNIST test set in SPICE simulation, we obtained a classification accuracy of 94%, which is very similar to the accuracy obtained by software-based learning.
