Gradients as Features for Deep Representation Learning
We address the challenging problem of deep representation learning: the
efficient adaptation of a pre-trained deep network to different tasks.
Specifically, we propose to explore gradient-based features: the gradients of a
task-specific loss with respect to the model parameters, evaluated at a given
input sample. Our key innovation is the design of a linear model that
incorporates both the gradients and the activations of the pre-trained network. We show
that our model provides a local linear approximation to an underlying deep
model, and discuss important theoretical insights. Moreover, we present an
efficient algorithm for the training and inference of our model without
computing the actual gradient. Our method is evaluated across a number of
representation-learning tasks on several datasets and using different network
architectures. Strong results are obtained in all settings, and are
well-aligned with our theoretical insights.
Comment: ICLR 2020 conference paper
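
As a rough illustration of the idea, the sketch below (my own simplification, not the authors' code) builds a per-sample feature by concatenating a network activation with the gradient of a task loss taken with respect to the parameters of a linear head. Names such as backbone, head, and gradient_feature are hypothetical, and the paper's exact construction differs in details such as how the loss is evaluated.

import torch
import torch.nn as nn

# Illustrative pre-trained body (kept fixed) and a task-specific linear head.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
head = nn.Linear(128, 10)

def gradient_feature(x, y):
    # Concatenate the activation with the gradient of the task loss
    # with respect to the head parameters, for a single input sample.
    act = backbone(x)
    loss = nn.functional.cross_entropy(head(act), y)
    grads = torch.autograd.grad(loss, list(head.parameters()))
    grad_vec = torch.cat([g.flatten() for g in grads])
    return torch.cat([act.detach().flatten(), grad_vec.detach()])

x = torch.randn(1, 1, 28, 28)   # dummy MNIST-sized input
y = torch.tensor([3])           # assumed task label used in the loss
feat = gradient_feature(x, y)   # feature vector for a downstream linear model
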
Fast Adaptation with Linearized Neural Networks
The inductive biases of trained neural networks are difficult to understand
and, consequently, to adapt to new settings. We study the inductive biases of
linearizations of neural networks, which we show to be surprisingly good
summaries of the full network functions. Inspired by this finding, we propose a
technique for embedding these inductive biases into Gaussian processes through
a kernel designed from the Jacobian of the network. In this setting, domain
adaptation takes the form of interpretable posterior inference, with
accompanying uncertainty estimation. This inference is analytic and free of
local optima issues found in standard techniques such as fine-tuning neural
network weights to a new task. We develop significant computational speed-ups
based on matrix multiplies, including a novel implementation for scalable
Fisher vector products. Our experiments on both image classification and
regression demonstrate the promise and convenience of this framework for
transfer learning, compared to neural network fine-tuning. Code is available at
https://github.com/amzn/xfer/tree/master/finite_ntk
Comment: AISTATS 2021
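
A minimal sketch of the underlying idea (my own illustration, not the finite_ntk API): the kernel induced by the linearized network is the Gram matrix of parameter Jacobians, k(x, x') = J(x) J(x')^T, which can then serve as a Gaussian-process covariance. The network and function names below are placeholders.

import torch
import torch.nn as nn

# Stand-in for a trained network with a scalar output.
net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

def param_jacobian(x):
    # Gradient of the scalar network output at x with respect to all parameters.
    out = net(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, list(net.parameters()))
    return torch.cat([g.flatten() for g in grads])

def linearized_kernel(X1, X2):
    # Empirical NTK-style Gram matrix J(X1) J(X2)^T of the linearization.
    J1 = torch.stack([param_jacobian(x) for x in X1])
    J2 = torch.stack([param_jacobian(x) for x in X2])
    return J1 @ J2.T

X_train = torch.randn(5, 2)
K = linearized_kernel(X_train, X_train)   # 5 x 5 covariance matrix for GP inference
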