
    Efficient ConvNets for Analog Arrays

    Analog arrays are a promising upcoming hardware technology with the potential to drastically speed up deep learning. Their main advantage is that they compute matrix-vector products in constant time, irrespective of the size of the matrix. However, early convolution layers in ConvNets map very unfavorably onto analog arrays, because kernel matrices are typically small and the constant-time operation needs to be iterated sequentially a large number of times, reducing the speed-up advantage for ConvNets. Here, we propose to replicate the kernel matrix of a convolution layer on distinct analog arrays and to randomly divide parts of the compute among them, so that multiple kernel matrices are trained in parallel. With this modification, analog arrays execute ConvNets with an acceleration factor that is proportional to the number of kernel matrices used per layer (here tested 16-128). Despite having more free parameters, we show analytically and in numerical experiments that this convolution architecture is self-regularizing and implicitly learns similar filters across arrays. We also report superior performance on a number of datasets and increased robustness to adversarial attacks. Our investigation suggests revising the notion that mixed analog-digital hardware is not suitable for ConvNets.
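
    A minimal NumPy sketch of the replication idea described above, assuming an im2col view of the convolution in which each patch is routed to one randomly chosen kernel replica; the function and variable names (replicated_conv_forward, patches, kernels) are illustrative assumptions, not the paper's implementation.

        # Minimal sketch of replicated kernel matrices with random patch routing.
        # Shapes and names are illustrative assumptions, not the paper's code.
        import numpy as np

        def replicated_conv_forward(patches, kernels, rng):
            """patches: (n_patches, k_size) im2col columns of one image.
            kernels:  (n_replicas, k_size, n_out) one kernel matrix per analog array.
            Each patch is routed to a randomly chosen replica, so the replicas
            process disjoint subsets of the compute in parallel."""
            n_patches = patches.shape[0]
            n_replicas = kernels.shape[0]
            assignment = rng.integers(0, n_replicas, size=n_patches)  # random routing
            out = np.empty((n_patches, kernels.shape[2]))
            for r in range(n_replicas):
                idx = np.where(assignment == r)[0]
                out[idx] = patches[idx] @ kernels[r]  # constant-time on an analog array
            return out, assignment

        rng = np.random.default_rng(0)
        patches = rng.normal(size=(64, 27))      # e.g. 3x3x3 kernels -> 27 inputs per patch
        kernels = rng.normal(size=(16, 27, 8))   # 16 replicas, 8 output channels
        out, assignment = replicated_conv_forward(patches, kernels, rng)
        print(out.shape)                          # (64, 8)

    Because each replica only sees the patches routed to it in a given step, updates are spread across the replicated kernel matrices, which is the mechanism the abstract describes as self-regularizing.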

    Training LSTM Networks with Resistive Cross-Point Devices

    In our previous work we have shown that resistive cross-point devices, so-called Resistive Processing Unit (RPU) devices, can provide significant power and speed benefits when training deep fully connected networks as well as convolutional neural networks. In this work, we further extend the RPU concept to training recurrent neural networks (RNNs), namely LSTMs. We show that the mapping of recurrent layers is very similar to the mapping of fully connected layers, and therefore the RPU concept can potentially provide large acceleration factors for RNNs as well. In addition, we study the effect of various device imperfections and system parameters on training performance. Symmetry of updates becomes even more crucial for RNNs; already a few percent asymmetry results in an increase in the test error compared to the ideal case trained with floating point numbers. Furthermore, the input signal resolution to the device arrays needs to be at least 7 bits for successful training. However, we show that a stochastic rounding scheme can reduce the required input signal resolution back to 5 bits. Further, we find that RPU device variations and hardware noise are enough to mitigate overfitting, so that there is less need for using dropout. We note that the models trained here are roughly 1500 times larger than the fully connected network trained on the MNIST dataset in terms of the total number of multiplication and summation operations performed per epoch. Thus, here we attempt to study the validity of the RPU approach for large-scale networks. Comment: 17 pages, 5 figures
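
    Stochastic rounding, the scheme mentioned above for lowering the required input resolution, can be illustrated with a minimal NumPy sketch; the bit width, input range, and function name (stochastic_round) are illustrative assumptions, not the paper's exact scheme.

        # Minimal sketch of stochastic rounding to a fixed input bit resolution.
        # Constants are illustrative assumptions.
        import numpy as np

        def stochastic_round(x, n_bits, x_max, rng):
            """Quantize x in [-x_max, x_max] onto 2**n_bits uniformly spaced levels,
            rounding up or down with probability given by the remainder, so the
            quantization is unbiased in expectation."""
            n_levels = 2 ** n_bits - 1
            scale = n_levels / (2 * x_max)
            scaled = (np.clip(x, -x_max, x_max) + x_max) * scale
            floor = np.floor(scaled)
            prob_up = scaled - floor
            rounded = floor + (rng.random(x.shape) < prob_up)
            return rounded / scale - x_max

        rng = np.random.default_rng(0)
        x = rng.normal(size=10_000)
        xq = stochastic_round(x, n_bits=5, x_max=3.0, rng=rng)
        print(np.abs(x.clip(-3, 3) - xq).mean())  # small, zero-mean quantization error

    The point of rounding stochastically rather than to the nearest level is that the quantization error averages out over many updates, which is why a coarser (5-bit) input resolution can still support training.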

    Training large-scale ANNs on simulated resistive crossbar arrays

    Accelerating the training of artificial neural networks (ANNs) with analog resistive crossbar arrays is a promising idea. While the concept has been verified on very small ANNs and toy datasets (such as MNIST), more realistically sized ANNs and datasets have not yet been tackled. However, it is to be expected that device material and hardware design constraints, such as noisy computations, a finite number of resistive states of the device materials, saturating weight and activation ranges, and limited precision of analog-to-digital converters, will pose significant challenges to the successful training of state-of-the-art ANNs. Using analog-hardware-aware ANN training simulations, we here explore a number of simple algorithmic compensatory measures for coping with analog noise and limited weight and output ranges and resolutions, which dramatically improve the simulated training performance on RPU arrays for intermediate- to large-scale ANNs.
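
    A minimal sketch of what a hardware-aware forward pass with two such constraints (bounded weights, noisy and range-limited analog outputs) could look like; all constants and names are illustrative assumptions, not the simulator actually used in the work.

        # Minimal sketch of a hardware-aware analog matrix-vector product:
        # bounded device weights, additive analog noise, clipped output range.
        # All constants are illustrative assumptions.
        import numpy as np

        def noisy_clipped_forward(x, w, w_max=0.6, out_max=10.0, noise_std=0.06, rng=None):
            """Weights saturate at +-w_max, the analog output carries additive
            Gaussian noise, and the output stage clips the result to +-out_max."""
            rng = rng or np.random.default_rng()
            w_eff = np.clip(w, -w_max, w_max)             # bounded device weights
            y = x @ w_eff
            y = y + noise_std * rng.normal(size=y.shape)  # analog compute noise
            return np.clip(y, -out_max, out_max)          # limited output range

        rng = np.random.default_rng(1)
        x = rng.normal(size=(32, 256))
        w = 0.1 * rng.normal(size=(256, 128))
        y = noisy_clipped_forward(x, w, rng=rng)
        print(y.shape, float(y.min()), float(y.max()))

    Training against a forward pass of this kind is what makes the compensatory measures (e.g. rescaling activations or keeping weights away from the saturation bounds) necessary and testable in simulation.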

    On the possibility of obtaining MOSFET-like performance and sub-60 mV/decade swing in 1D broken-gap tunnel transistors

    Tunneling field-effect transistors (TFETs) have gained a great deal of recent interest due to their potential to reduce power dissipation in integrated circuits. One major challenge for TFETs so far has been achieving high drive currents, which is a prerequisite for high-performance operation. In this paper we explore the performance potential of a 1D TFET with a broken-gap heterojunction source injector using dissipative quantum transport simulations based on the nonequilibrium Green's function formalism, with the carbon nanotube bandstructure as the model 1D material system. We provide detailed insights into broken-gap TFET (BG-TFET) operation and show that it can indeed produce a subthreshold swing of less than 60 mV/decade at room temperature even in the presence of electron-phonon scattering. The 1D geometry is recognized to be uniquely favorable due to its superior electrostatic control, reduced carrier thermalization rate, and beneficial quantum confinement effects that reduce the off-state leakage below the thermionic limit. Because of higher source injection compared to staggered-gap and homojunction geometries, the BG-TFET delivers superior performance that is comparable to that of MOSFETs. The BG-TFET even exceeds MOSFET performance at lower supply voltages (VDD), showing promise for low-power/high-performance applications. Comment: 34 pages, 11 figures
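
    For context, the 60 mV/decade figure is the room-temperature thermionic limit on subthreshold swing for conventional MOSFETs, SS = ln(10) * kT/q; the short snippet below simply evaluates this constant and is not part of the paper's simulations.

        # Room-temperature thermionic limit on subthreshold swing: SS = ln(10) * kT / q.
        import math

        k_B = 1.380649e-23    # Boltzmann constant, J/K
        q = 1.602176634e-19   # elementary charge, C
        T = 300.0             # room temperature, K
        ss_limit = math.log(10) * k_B * T / q * 1e3   # in mV/decade
        print(f"{ss_limit:.1f} mV/decade")            # ~59.6 mV/decade; TFETs can go below this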

    High-Performance Air-Stable n-Type Carbon Nanotube Transistors with Erbium Contacts

    Over the past few decades, the continued down-scaling of the physical dimensions of silicon field-effect transistors (FETs) has been the main drive for achieving higher device density while improving transistor performance in complementary metal-oxide-semiconductor (CMOS) circuits. One of the principal benefits of the conventional scaling trend, namely reducing the power consumption per computation, has diminished in recent years. In particular, power management is increasingly becoming a major challenge because of the inability to further decrease the operating voltage without compromising the performance of silicon FETs. Incorporating alternative channel materials with superior carrier transport properties, as presently conceived, is a favorable strategy for the semiconductor industry to complement or replace silicon FETs. Among the promising candidates, carbon nanotubes (CNTs) are predicted to offer the most energy-efficient solution for computation compared with other channel materials [1], owing to their unique properties such as an ultrathin body and ballistic carrier transport in the channel.
    ABSTRACT: So far, realization of reproducible n-type carbon nanotube (CNT) transistors suitable for integrated digital applications has been a difficult task. In this work, hundreds of n-type CNT transistors with three different low-work-function metal contacts (erbium, lanthanum, and yttrium) are studied and benchmarked against p-type devices with palladium contacts. The crucial role of metal type and deposition conditions is elucidated with respect to the overall yield and performance of the n-type devices. It is found that high oxidation rates and sensitivity to deposition conditions are the major causes of the lower yield and large variation in performance of n-type CNT devices with low-work-function metal contacts. Considerable improvement in device yield is attained using erbium contacts evaporated at high deposition rates. Furthermore, the air stability of our n-type transistors is studied in light of the extreme sensitivity of these metals to oxidation.