Energy-efficient Amortized Inference with Cascaded Deep Classifiers
Deep neural networks have been remarkably successful in various AI tasks, but
they often incur high computation and energy costs, which is problematic for
energy-constrained applications such as mobile sensing. We address this problem
by proposing a novel framework that optimizes prediction accuracy and energy
cost simultaneously, thus
enabling effective cost-accuracy trade-off at test time. In our framework, each
data instance is pushed into a cascade of deep neural networks with increasing
sizes, and a selection module is used to sequentially determine when a
sufficiently accurate classifier can be used for this data instance. The
cascade of neural networks and the selection module are jointly trained in an
end-to-end fashion by the REINFORCE algorithm to optimize a trade-off between
the computational cost and the predictive accuracy. Our method is able to
simultaneously improve accuracy and efficiency by learning to assign easy
instances to fast yet sufficiently accurate classifiers, saving computation and
energy, while assigning harder instances to deeper, more powerful classifiers
to ensure satisfactory accuracy. In extensive experiments on several image
classification datasets using cascaded ResNet classifiers, we demonstrate that
our method outperforms standard well-trained ResNets in accuracy while
requiring less than 20% and 50% of the FLOPs on the CIFAR-10 and CIFAR-100
datasets, respectively, and 66% on the ImageNet dataset.
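The cascade idea can be sketched in a few lines. The snippet below is an illustrative, self-contained mock-up, not the paper's implementation: the three "classifiers" are stubs standing in for ResNets of increasing depth and cost, and a simple max-probability threshold stands in for the learned selection module (which the paper trains with REINFORCE). All names and costs here are hypothetical.

```python
import numpy as np

# Hypothetical cascade: each "classifier" returns (class probabilities, cost).
# The `confidence` parameter makes the softmax more peaked, standing in for
# the greater capacity of a deeper network.
def make_stub_classifier(confidence, cost):
    def classify(x):
        logits = np.array([confidence if i == int(x) % 3 else 0.0
                           for i in range(3)])
        probs = np.exp(logits) / np.exp(logits).sum()
        return probs, cost
    return classify

cascade = [
    make_stub_classifier(confidence=1.0, cost=1.0),   # small, cheap model
    make_stub_classifier(confidence=2.0, cost=4.0),   # medium model
    make_stub_classifier(confidence=4.0, cost=16.0),  # large, expensive model
]

def cascaded_predict(x, threshold=0.7):
    """Push x through the cascade; stop at the first confident classifier.

    The paper learns *when to stop* with a selection module trained by
    REINFORCE; a fixed confidence threshold is used here for illustration.
    """
    total_cost = 0.0
    for clf in cascade:
        probs, cost = clf(x)
        total_cost += cost
        if probs.max() >= threshold:           # easy instance: exit early
            return int(probs.argmax()), total_cost
    return int(probs.argmax()), total_cost      # hard instance: largest model
```

Easy inputs exit after the cheap models and so pay only a fraction of the full cascade's cost, which is the source of the FLOPs savings the abstract reports.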
Learning to Skim Text
Recurrent Neural Networks are showing much promise in many sub-areas of
natural language processing, ranging from document classification to machine
translation to automatic question answering. Despite their promise, many
recurrent models have to read the whole text word by word, making it slow to
handle long documents. For example, it is difficult to use a recurrent network
to read a book and answer questions about it. In this paper, we present an
approach of reading text while skipping irrelevant information if needed. The
underlying model is a recurrent network that learns how far to jump after
reading a few words of the input text. We employ a standard policy gradient
method to train the model to make discrete jumping decisions. In our benchmarks
on four different tasks, including number prediction, sentiment analysis, news
article classification and automatic Q&A, our proposed model, a modified LSTM
with jumping, is up to 6 times faster than the standard sequential LSTM while
maintaining the same or even better accuracy.
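The control flow of such a skimming reader can be sketched as follows. This is a hypothetical mock-up of the read-then-jump loop only: in the paper the jump size is sampled from a softmax head on the LSTM state and trained with policy gradients, whereas here a stub rule (jump far after seeing an assumed keyword) takes its place, and the window/jump constants are illustrative.

```python
READ_WINDOW = 2   # tokens read before each jump decision (assumed value)
MAX_JUMP = 5      # maximum number of tokens the model may skip (assumed)

def stub_jump_policy(window):
    # Stand-in for the learned policy: skim quickly once the (hypothetical)
    # keyword has been seen, otherwise advance one token at a time.
    return MAX_JUMP if "keyword" in window else 1

def skim(tokens):
    """Read a few tokens, decide how far to jump, repeat until the end.

    Returns the list of tokens actually read, which is typically much
    shorter than the full input -- the source of the reported speedup.
    """
    pos, read = 0, []
    while pos < len(tokens):
        window = tokens[pos:pos + READ_WINDOW]
        read.extend(window)            # these tokens are fed to the LSTM
        jump = stub_jump_policy(window)
        pos += READ_WINDOW + jump      # skip `jump` tokens entirely
    return read
```

Because the jump decision is discrete, it cannot be trained by ordinary backpropagation, which is why the paper resorts to a policy gradient (REINFORCE-style) estimator.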
Probabilistic Adaptive Computation Time
We present a probabilistic model with discrete latent variables that control
the computation time in deep learning models such as ResNets and LSTMs. A prior
on the latent variables expresses the preference for faster computation. The
amount of computation for an input is determined via amortized maximum a
posteriori (MAP) inference. MAP inference is performed using a novel stochastic
variational optimization method. The recently proposed Adaptive Computation
Time mechanism can be seen as an ad-hoc relaxation of this model. We
demonstrate training using the general-purpose Concrete relaxation of discrete
variables. Evaluation on ResNet shows that our method matches the
speed-accuracy trade-off of Adaptive Computation Time, while allowing for
evaluation with a simple deterministic procedure that has a lower memory
footprint.