Partial Weight Adaptation for Robust DNN Inference
Mainstream video analytics uses a pre-trained DNN model under the assumption
that inference inputs and training data follow the same probability
distribution. However, this assumption does not always hold in the wild:
autonomous vehicles may capture video with varying brightness; unstable
wireless bandwidth calls for adaptive bitrate streaming of video; and,
inference servers may serve inputs from heterogeneous IoT devices/cameras. In
such situations, the level of input distortion changes rapidly, thus reshaping
the probability distribution of the input.
We present GearNN, an adaptive inference architecture that accommodates
heterogeneous DNN inputs. GearNN employs an optimization algorithm to identify
a small set of "distortion-sensitive" DNN parameters, given a memory budget.
Based on the distortion level of the input, GearNN then adapts only the
distortion-sensitive parameters, while reusing the remaining parameters, which
stay constant across all input qualities. In our evaluation of DNN inference
under dynamic input distortions, GearNN improves accuracy (mIoU) by an average
of 18.12% over a DNN trained on the undistorted dataset, and by 4.84% over
Google's stability training, with only 1.8% extra memory overhead.
Comment: To appear in CVPR 2020
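
For intuition, here is a minimal PyTorch-style sketch of the
partial-weight-adaptation idea, assuming a base checkpoint plus one fine-tuned
state dict per distortion level. The ranking heuristic and all names are
illustrative stand-ins, not GearNN's actual optimization algorithm:

```python
import torch

def select_sensitive(base_state, tuned_states, budget_bytes):
    """Rank parameters by how far fine-tuning for each distortion level
    moves them from the base weights, then keep the highest-ranked ones
    that fit in the memory budget (a stand-in for GearNN's optimizer)."""
    scores = {}
    for name, base in base_state.items():
        if not base.is_floating_point():
            continue  # skip integer buffers such as batch counters
        scores[name] = sum(
            (tuned[name] - base).norm().item() for tuned in tuned_states.values()
        )
    chosen, used = [], 0
    for name in sorted(scores, key=scores.get, reverse=True):
        size = base_state[name].numel() * base_state[name].element_size()
        if used + size <= budget_bytes:
            chosen.append(name)
            used += size
    return chosen

def adapt(model, tuned_states, sensitive, distortion_level):
    """Swap in only the distortion-sensitive tensors for the current input
    quality; every other parameter stays constant across qualities."""
    tuned = tuned_states[distortion_level]
    model.load_state_dict({n: tuned[n] for n in sensitive}, strict=False)
```

Only the selected tensors need one copy per distortion level, which is how the
extra memory overhead stays small.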
Autodidactic Neurosurgeon: Collaborative Deep Inference for Mobile Edge Intelligence via Online Learning
Recent breakthroughs in deep learning (DL) have led to the emergence of many
intelligent mobile applications and services, but at the same time pose
unprecedented computing challenges for resource-constrained mobile devices.
This paper builds a collaborative deep inference system between a
resource-constrained mobile device and a powerful edge server, aiming to
combine the strengths of on-device processing and computation offloading. The
basic idea is to partition a deep neural network (DNN) into a front-end part
running on the mobile device and a back-end part running on the edge server;
the key challenge is locating the optimal partition point that minimizes the
end-to-end inference delay.
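
As a rough illustration, a minimal sketch of the underlying optimization,
assuming the per-layer compute times and feature sizes are known up front
(ANS itself learns these quantities online instead of profiling them):

```python
def best_partition(device_ms, server_ms, feat_bytes, bandwidth_bps):
    """Pick the cut point k minimizing end-to-end delay: layers [0, k)
    run on the device, the tensor crossing the cut is uploaded, and
    layers [k, n) run on the edge server. feat_bytes[k] is the size of
    that tensor (feat_bytes[0] = raw input, i.e. full offloading)."""
    n = len(device_ms)

    def delay(k):
        tx_ms = 8.0 * feat_bytes[k] / bandwidth_bps * 1000.0  # uplink time
        return sum(device_ms[:k]) + tx_ms + sum(server_ms[k:])

    return min(range(n + 1), key=delay)
```

In practice the bandwidth and server load vary over time, which is exactly why
a static, profile-based choice goes stale and an online learner is needed.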
Unlike existing efforts on DNN partitioning that rely heavily on a dedicated
offline profiling stage to search for the optimal partition point, our system
has a built-in online learning module, called Autodidactic Neurosurgeon (ANS),
that learns the optimal partition point on the fly. ANS can therefore closely
track changes in the system environment, generating new knowledge for adaptive
decision making. The core of ANS is a contextual bandit learning algorithm
built on LinUCB, which not only has a provable theoretical learning guarantee
but is also ultra-lightweight, making it easy to implement in real systems.
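
The sketch below shows a generic disjoint LinUCB with one arm per candidate
partition point; the context vector might hold measurements such as uplink
bandwidth and device load, and the reward is the negative observed delay. This
is the textbook algorithm, not the paper's exact lightweight variant:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (candidate
    partition point), plus an upper-confidence bonus for exploration."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted sums

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                  # per-arm parameter estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)   # optimistic reward estimate
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

Each round, the system would call choose(x) to pick a partition point, run the
inference, then call update(arm, x, -measured_delay) with the observed delay.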
We implement our system on a video stream object detection testbed to validate
the design of ANS and evaluate its performance. The experiments show that ANS
significantly outperforms state-of-the-art benchmarks in tracking system
changes and reducing the end-to-end inference delay.