vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design
The most widely used machine learning frameworks require users to carefully
tune their memory usage so that the deep neural network (DNN) fits into the
DRAM capacity of a GPU. This restriction hampers a researcher's flexibility to
study different machine learning algorithms, forcing them to either use a less
desirable network architecture or parallelize the processing across multiple
GPUs. We propose a runtime memory manager that virtualizes the memory usage of
DNNs such that both GPU and CPU memory can simultaneously be utilized for
training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory
usage of AlexNet by up to 89%, OverFeat by 91%, and GoogLeNet by 95%, a
significant reduction in memory requirements of DNNs. Similar experiments on
VGG-16, one of the deepest and most memory-hungry DNNs to date, demonstrate the
memory-efficiency of our proposal. vDNN enables VGG-16 with batch size 256
(requiring 28 GB of memory) to be trained on a single NVIDIA Titan X GPU card
containing 12 GB of memory, with 18% performance loss compared to a
hypothetical, oracular GPU with enough memory to hold the entire DNN.
Comment: Published as a conference paper at the 49th IEEE/ACM International
Symposium on Microarchitecture (MICRO-49), 201
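The idea of keeping only part of a network's state on the GPU can be illustrated with a toy memory model. The sketch below is not the vDNN implementation. It assumes a simple feed-forward chain, hypothetical per-layer feature-map sizes, and a simplified policy in which only the current layer's input and output feature maps stay on the GPU while the rest are held in CPU memory. The function names are illustrative only.

```python
def baseline_peak(feature_map_mb):
    """Peak GPU memory when every feature map stays resident on the GPU.

    Assumption: all feature maps are kept until backward propagation.
    """
    return sum(feature_map_mb)


def offloaded_peak(feature_map_mb):
    """Peak GPU memory under the simplified offload policy.

    Assumption: while a layer runs, only its input and output feature maps
    are on the GPU; all other feature maps live in CPU memory.
    """
    return max(a + b for a, b in zip(feature_map_mb, feature_map_mb[1:]))


# Hypothetical feature-map sizes in MB for a five-stage chain.
sizes = [400, 300, 200, 100, 50]

base = baseline_peak(sizes)
virt = offloaded_peak(sizes)
print(f"baseline peak:  {base} MB")
print(f"offloaded peak: {virt} MB")
print(f"reduction:      {1 - virt / base:.0%}")
```

In this toy model the saving grows with network depth, because the baseline scales with the sum of all feature maps while the offloaded peak depends only on the largest adjacent pair. The model ignores weights, workspace buffers, and transfer latency, which a real memory manager has to account for.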
The difference between memory and prediction in linear recurrent networks
Recurrent networks are often trained to memorize their input better, in the
hope that such training will increase the ability of the network to predict.
We show that networks designed to memorize input can be arbitrarily bad at
prediction. We also find, for several types of inputs, that one-node networks
optimized for prediction are nearly at upper bounds on predictive capacity
given by Wiener filters, and are roughly equivalent in performance to randomly
generated five-node networks. Our results suggest that maximizing memory
capacity leads to very different networks than maximizing predictive capacity,
and that optimizing recurrent weights can decrease reservoir size by half an
order of magnitude.