10,513 research outputs found
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to an excessive increase in circuit latency. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This setup allows us to study the effects of our undervolting technique under both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by the FPGA vendor to ensure correct functionality under worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.
Comment: To appear at the DSN 2020 conference.
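The reported efficiency numbers can be reasoned about with the first-order dynamic-power relation P_dyn ≈ C·V²·f. The sketch below is a minimal illustration in plain Python of how undervolting into and below the guardband, plus frequency underscaling, changes GOPs/W; all numbers and the fixed static-power assumption are illustrative, not measurements from the paper.

```python
# Minimal sketch of the reliability-power trade-off behind undervolting.
# All numbers below are illustrative assumptions, not results from the paper.

def gops_per_watt(gops_at_nominal_freq, freq_scale, v_nominal, v_scaled,
                  p_dynamic_nominal, p_static):
    """Estimate GOPs/W under voltage/frequency scaling.

    Uses the first-order model P_dyn ~ C * V^2 * f; static power is held
    fixed here for simplicity, although it also drops with voltage in practice.
    """
    throughput = gops_at_nominal_freq * freq_scale           # GOPs scale with clock
    p_dyn = p_dynamic_nominal * (v_scaled / v_nominal) ** 2 * freq_scale
    return throughput / (p_dyn + p_static)

nominal   = gops_per_watt(200, 1.0, v_nominal=0.85, v_scaled=0.85,
                          p_dynamic_nominal=8.0, p_static=2.0)
# Undervolting within the guardband: same frequency, lower voltage.
guardband = gops_per_watt(200, 1.0, v_nominal=0.85, v_scaled=0.60,
                          p_dynamic_nominal=8.0, p_static=2.0)
# Below the guardband, frequency underscaling avoids timing faults
# at the cost of part of the power-efficiency gain.
below_gb  = gops_per_watt(200, 0.8, v_nominal=0.85, v_scaled=0.55,
                          p_dynamic_nominal=8.0, p_static=2.0)
print(nominal, guardband, below_gb)
```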
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
Deep neural network architecture design has evolved rapidly in recent years. These advances greatly facilitate progress in areas such as computer vision and natural language processing. However, along with their extraordinary performance, these state-of-the-art models also incur high computational cost, so directly deploying them in applications with real-time requirements remains infeasible.
Recently, Hinton et al. have shown that the dark knowledge within a powerful teacher model can significantly help the training of a smaller and faster student network. This knowledge is highly beneficial for improving the generalization ability of the student model. Inspired by their work, we introduce a new type of knowledge -- cross-sample similarities -- for model compression and acceleration. This knowledge can be naturally derived from a deep metric learning model. To transfer it, we bring the "learning to rank" technique into the deep metric learning formulation. We test our proposed DarkRank
method on various metric learning tasks including pedestrian re-identification,
image retrieval and image clustering. The results are encouraging: our method improves over the baseline by a large margin. Moreover, it is fully compatible with other existing methods, and combining them boosts performance further.
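To make the "learning to rank" transfer concrete, here is a minimal ListNet-style sketch in PyTorch: the student's ranking distribution over a mini-batch of candidates is matched to the teacher's. The temperature, distance choice, and query/candidate split are assumptions for illustration, not DarkRank's exact formulation.

```python
# A minimal, ListNet-style sketch of transferring cross-sample similarities
# from a teacher embedding to a student embedding. Hyperparameters here are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def rank_transfer_loss(student_emb, teacher_emb, temperature=4.0):
    """student_emb, teacher_emb: (batch, dim) L2-normalized embeddings.

    Treats the first sample in the batch as the query and the rest as
    candidates, and matches the student's ranking distribution over the
    candidates to the teacher's. KL divergence is used; it differs from the
    ListNet cross-entropy only by a constant (the teacher's entropy).
    """
    q_s, c_s = student_emb[:1], student_emb[1:]
    q_t, c_t = teacher_emb[:1], teacher_emb[1:]
    # Negative Euclidean distance as a similarity score.
    s_scores = -torch.cdist(q_s, c_s).squeeze(0) / temperature
    t_scores = -torch.cdist(q_t, c_t).squeeze(0) / temperature
    return F.kl_div(F.log_softmax(s_scores, dim=0),
                    F.softmax(t_scores, dim=0), reduction='sum')

# Example: a batch of 8 embeddings from each network. Since similarities are
# computed within each embedding space, the two dimensions need not match.
s = F.normalize(torch.randn(8, 128), dim=1)
t = F.normalize(torch.randn(8, 256), dim=1)
loss = rank_transfer_loss(s, t)
```

In training, a term like this would typically be added, with a weighting coefficient, to the student's own metric learning objective (e.g., a triplet or contrastive loss).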
GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
Distributed data-parallel (DDP) training improves overall application
throughput as multiple devices train on a subset of data and aggregate updates
to produce a globally shared model. The periodic synchronization at each
iteration incurs considerable overhead, exacerbated by the increasing size and
complexity of state-of-the-art neural networks. Although many gradient compression techniques have been proposed to reduce communication cost, the ideal compression factor that leads to maximum speedup or minimum data exchange remains an open problem, since it varies with the quality of compression, model size and structure, hardware, network topology, and bandwidth. We propose
GraVAC, a framework to dynamically adjust compression factor throughout
training by evaluating model progress and assessing gradient information loss
associated with compression. GraVAC works in an online, black-box manner
without any prior assumptions about a model or its hyperparameters, while
achieving the same or better accuracy than dense SGD (i.e., no compression) in
the same number of iterations/epochs. As opposed to using a static compression
factor, GraVAC reduces end-to-end training time for ResNet101, VGG16 and LSTM
by 4.32x, 1.95x, and 6.67x, respectively. Compared to other adaptive schemes, our framework provides 1.94x to 5.63x overall speedup.
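As a rough illustration of the idea, the sketch below adjusts a top-k compression factor based on a simple measure of how much gradient information the compressed tensor retains. The gain metric, threshold, and doubling/halving schedule are assumptions for illustration; GraVAC's actual controller differs in its details.

```python
# Simplified sketch of adapting a top-k compression factor from gradient
# information loss, in the spirit of GraVAC. Not the paper's exact algorithm.
import torch

def topk_compress(grad, compression_factor):
    """Keep the largest-magnitude 1/compression_factor fraction of entries."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() / compression_factor))
    _, idx = torch.topk(flat.abs(), k)
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(grad)

def adjust_compression(grad, compression_factor, min_gain=0.9,
                       cf_low=2.0, cf_high=1024.0):
    """One controller step: compress harder while little gradient energy is lost,
    otherwise back off."""
    compressed = topk_compress(grad, compression_factor)
    # Fraction of gradient energy retained relative to the dense gradient.
    gain = (compressed.norm() ** 2 / (grad.norm() ** 2 + 1e-12)).item()
    if gain >= min_gain:
        compression_factor = min(compression_factor * 2, cf_high)
    else:
        compression_factor = max(compression_factor / 2, cf_low)
    return compressed, compression_factor

# Usage: exchange the sparse gradient, carry the updated factor to the next step.
g = torch.randn(1_000_000)
sparse_g, cf = adjust_compression(g, compression_factor=32.0)
```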