Efficient ConvNets for Analog Arrays
Analog arrays are a promising upcoming hardware technology with the potential
to drastically speed up deep learning. Their main advantage is that they
compute matrix-vector products in constant time, irrespective of the size of
the matrix. However, early convolution layers in ConvNets map very unfavorably
onto analog arrays, because kernel matrices are typically small and the
constant time operation needs to be sequentially iterated a large number of
times, reducing the speed-up advantage for ConvNets. Here, we propose to
replicate the kernel matrix of a convolution layer on distinct analog arrays,
and randomly divide parts of the compute among them, so that multiple kernel
matrices are trained in parallel. With this modification, analog arrays execute
ConvNets with an acceleration factor that is proportional to the number of
kernel matrices used per layer (tested here with 16 to 128). Despite having more free
parameters, we show analytically and in numerical experiments that this
convolution architecture is self-regularizing and implicitly learns similar
filters across arrays. We also report superior performance on a number of
datasets and increased robustness to adversarial attacks. Our investigation
suggests revising the notion that mixed analog-digital hardware is unsuitable
for ConvNets.
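The replication-and-routing scheme above lends itself to a compact software model. The following PyTorch-style sketch is only illustrative: it assumes per-output-pixel random routing during training and plain averaging of the replicas at test time, and the class name, replica count, and routing granularity are assumptions rather than the authors' implementation.

    import torch
    import torch.nn as nn

    class ReplicatedConv2d(nn.Module):
        # Convolution whose kernel matrix is held in several replicas, a
        # software stand-in for distinct analog arrays. During training each
        # output position is served by one randomly chosen replica; at test
        # time the replicas (which implicitly learn similar filters) are averaged.
        def __init__(self, in_ch, out_ch, kernel_size, n_replicas=16, **kw):
            super().__init__()
            self.replicas = nn.ModuleList(
                [nn.Conv2d(in_ch, out_ch, kernel_size, **kw) for _ in range(n_replicas)]
            )

        def forward(self, x):
            outs = torch.stack([conv(x) for conv in self.replicas])  # (R, N, C, H, W)
            if not self.training:
                return outs.mean(dim=0)
            r, n, c, h, w = outs.shape
            # Random per-pixel routing: each output location draws one replica.
            idx = torch.randint(r, (1, n, 1, h, w), device=x.device).expand(1, n, c, h, w)
            return outs.gather(0, idx).squeeze(0)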
Training LSTM Networks with Resistive Cross-Point Devices
In our previous work we have shown that resistive cross-point devices, so-called
Resistive Processing Unit (RPU) devices, can provide significant power
and speed benefits when training deep fully connected networks as well as
convolutional neural networks. In this work, we further extend the RPU concept
for training recurrent neural networks (RNNs), namely LSTMs. We show that the
mapping of recurrent layers is very similar to the mapping of fully connected
layers and therefore the RPU concept can potentially provide large acceleration
factors for RNNs as well. In addition, we study the effect of various device
imperfections and system parameters on training performance. Update symmetry
becomes even more crucial for RNNs: an asymmetry of only a few percent already
increases the test error relative to the ideal case trained with floating-point
numbers. Furthermore, the input signal resolution to the device
arrays needs to be at least 7 bits for successful training. However, we show
that a stochastic rounding scheme can reduce the input signal resolution back
to 5 bits. Further, we find that RPU device variations and hardware noise are
enough to mitigate overfitting, so that there is less need for using dropout.
We note that the models trained here are roughly 1500 times larger than the
fully connected network trained on the MNIST dataset, in terms of the total number
of multiplication and summation operations performed per epoch. Thus, here we
attempt to assess the validity of the RPU approach for large-scale networks.
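The abstract does not spell out the exact rounding scheme, so the NumPy sketch below only illustrates generic stochastic rounding of an input signal to a limited number of levels; the 5-bit width, input range, and function name are assumptions made for the example.

    import numpy as np

    def stochastic_round(x, n_bits=5, x_max=1.0):
        # Quantize x to 2**n_bits - 1 levels in [-x_max, x_max], rounding up or
        # down with probability given by the fractional part, so that the
        # quantization error is zero-mean.
        n_levels = 2 ** n_bits - 1
        step = 2.0 * x_max / n_levels
        scaled = (np.clip(x, -x_max, x_max) + x_max) / step
        floor = np.floor(scaled)
        rounded = floor + (np.random.rand(*np.shape(x)) < scaled - floor)
        return rounded * step - x_max

    # Averaged over many draws, the quantized value is unbiased:
    print(np.mean([stochastic_round(0.3) for _ in range(10000)]))  # ~0.3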
Training large-scale ANNs on simulated resistive crossbar arrays
Accelerating training of artificial neural networks (ANNs) with analog
resistive crossbar arrays is a promising idea. While the concept has been
verified on very small ANNs and toy data sets (such as MNIST), more
realistically sized ANNs and datasets have not yet been tackled. However, it is
to be expected that device materials and hardware design constraints, such as
noisy computations, finite number of resistive states of the device materials,
saturating weight and activation ranges, and limited precision of
analog-to-digital converters, will cause significant challenges to the
successful training of state-of-the-art ANNs. Using analog-hardware-aware
ANN training simulations, we here explore a number of simple algorithmic
compensatory measures that cope with analog noise and with limited weight and
output ranges and resolutions, and that dramatically improve the simulated
training performance on RPU arrays for intermediate- to large-scale ANNs.
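As a rough picture of what such hardware-aware simulations model, the PyTorch sketch below injects crossbar-like non-idealities into a fully connected layer: a bounded weight range, additive read noise on the analog matrix-vector product, and a saturating, limited-precision output. All constants, and the straight-through trick for the quantizer gradient, are illustrative placeholders rather than the paper's actual settings or compensatory measures.

    import torch
    import torch.nn as nn

    class AnalogLinear(nn.Module):
        # Fully connected layer mimicking analog crossbar non-idealities.
        def __init__(self, in_f, out_f, w_max=0.6, noise_std=0.02,
                     out_max=10.0, adc_bits=9):
            super().__init__()
            self.weight = nn.Parameter(torch.empty(out_f, in_f).uniform_(-0.1, 0.1))
            self.w_max, self.noise_std = w_max, noise_std
            self.out_max, self.adc_step = out_max, 2.0 * out_max / 2 ** adc_bits

        def forward(self, x):
            w = self.weight.clamp(-self.w_max, self.w_max)        # saturating weight range
            y = x @ w.t()
            y = y + self.noise_std * torch.randn_like(y)          # analog read noise
            y = y.clamp(-self.out_max, self.out_max)              # bounded output range
            y_q = torch.round(y / self.adc_step) * self.adc_step  # coarse ADC
            return y + (y_q - y).detach()                         # straight-through gradient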
On the possibility of obtaining MOSFET-like performance and sub-60 mV/decade swing in 1D broken-gap tunnel transistors
Tunneling field-effect transistors (TFETs) have gained a great deal of recent
interest due to their potential to reduce power dissipation in integrated
circuits. One major challenge for TFETs so far has been achieving high drive
currents, which is a prerequisite for high-performance operation. In this paper
we explore the performance potential of a 1D TFET with a broken-gap
heterojunction source injector using dissipative quantum transport simulations
based on the nonequilibrium Green's function formalism, and the carbon nanotube
bandstructure as the model 1D material system. We provide detailed insights
into broken-gap TFET (BG-TFET) operation, and show that it can indeed produce
less than 60 mV/decade subthreshold swing at room temperature even in the
presence of electron-phonon scattering. The 1D geometry is recognized to be
uniquely favorable due to its superior electrostatic control, reduced carrier
thermalization rate, and beneficial quantum confinement effects that reduce the
off-state leakage below the thermionic limit. Because of higher source
injection compared to staggered-gap and homojunction geometries, BG-TFET
delivers superior performance that is comparable to that of the MOSFET. The
BG-TFET even exceeds MOSFET performance at lower supply voltages (VDD), showing
promise for low-power, high-performance applications.
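For orientation, the subthreshold swing quoted above is the gate-voltage increment needed for a tenfold increase in drain current, SS = dV_GS / d(log10 I_D). The short NumPy sketch below extracts it from a transfer curve; the current-voltage data are synthetic placeholders rather than the paper's NEGF results.

    import numpy as np

    def subthreshold_swing(v_gs, i_d):
        # Minimum point-to-point swing in mV/decade: dV_gs / d(log10 I_d).
        ss = np.diff(v_gs) / np.diff(np.log10(i_d)) * 1000.0
        return ss.min()

    # Synthetic curve with an ideal 60 mV/decade slope (thermionic limit at 300 K).
    v = np.linspace(0.0, 0.3, 61)
    i = 1e-12 * 10 ** (v / 0.060)
    print(subthreshold_swing(v, i))  # ~60.0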
High-Performance Air-Stable n-Type Carbon Nanotube Transistors with Erbium Contacts
Over the past few decades, the continued down-scaling of the physical dimensions of silicon field-effect transistors (FETs) has been the main driver for achieving higher device density while improving transistor performance in complementary metal-oxide-semiconductor (CMOS) circuits. One of the principal benefits of the conventional scaling trend, namely reducing the power consumption per computation, has diminished in recent years. In particular, power management is increasingly becoming a major challenge because of the inability to further decrease the operating voltage without compromising the performance of silicon FETs. Incorporation of alternative channel materials with superior carrier transport properties, as presently conceived, is a favorable strategy for the semiconductor industry to complement or replace silicon FETs. Among the promising candidates, carbon nanotubes (CNTs) are predicted to offer the most energy-efficient solution for computation compared with other channel materials [1], owing to their unique properties such as an ultrathin body and ballistic carrier transport in the channel.

Abstract: So far, realization of reproducible n-type carbon nanotube (CNT) transistors suitable for integrated digital applications has been a difficult task. In this work, hundreds of n-type CNT transistors with three different low work function metal contacts (erbium, lanthanum, and yttrium) are studied and benchmarked against p-type devices with palladium contacts. The crucial role of metal type and deposition conditions is elucidated with respect to the overall yield and performance of the n-type devices. It is found that high oxidation rates and sensitivity to deposition conditions are the major causes of the lower yield and large variation in performance of n-type CNT devices with low work function metal contacts. Considerable improvement in device yield is attained using erbium contacts evaporated at high deposition rates. Furthermore, the air stability of our n-type transistors is studied in light of the extreme sensitivity of these metals to oxidation.