The Information Complexity of Learning Tasks, their Structure and their Distance
We introduce an asymmetric distance in the space of learning tasks, and a
framework to compute their complexity. These concepts are foundational for the
practice of transfer learning, whereby a parametric model is pre-trained for a
task, and then fine-tuned for another. The framework we develop is
non-asymptotic, captures the finite nature of the training dataset, and allows
distinguishing learning from memorization. It encompasses, as special cases,
classical notions from Kolmogorov complexity, Shannon, and Fisher Information.
However, unlike some of those frameworks, it can be applied to large-scale
models and real-world datasets. Our framework is the first to measure
complexity in a way that accounts for the effect of the optimization scheme,
which is critical in Deep Learning.
Estimating Example Difficulty using Variance of Gradients
In machine learning, a question of great interest is understanding what
examples are challenging for a model to classify. Identifying atypical examples
helps inform safe deployment of models, isolates examples that require further
human inspection, and provides interpretability into model behavior. In this
work, we propose Variance of Gradients (VOG) as a proxy metric for detecting
outliers in the data distribution. We provide quantitative and qualitative
support that VOG is a meaningful way to rank data by difficulty and to surface
a tractable subset of the most challenging examples for human-in-the-loop
auditing. Data points with high VOG scores are more difficult for the model to
classify and over-index on examples that require memorization.
Comment: Accepted to Workshop on Human Interpretability in Machine Learning
(WHI), ICML, 202
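As a rough illustration of the idea in this abstract, the sketch below scores each example by how much its gradient varies across training checkpoints. The abstract does not give the exact formula, so this is an assumption: the gradients are taken to be precomputed with respect to each input at K checkpoints, the variance is computed per input dimension and then averaged, and the name variance_of_gradients and the toy data are hypothetical.

```python
import numpy as np

def variance_of_gradients(grads):
    """Per-example VOG-style score (illustrative assumption, not the paper's code).

    grads: array of shape (K, N, D) holding, for K checkpoints and N examples,
           a D-dimensional gradient (e.g. with respect to the input).
    Returns an array of shape (N,) with one score per example.
    """
    grads = np.asarray(grads, dtype=float)
    mean_grad = grads.mean(axis=0)                          # (N, D) mean over checkpoints
    per_dim_var = ((grads - mean_grad) ** 2).mean(axis=0)   # (N, D) variance over checkpoints
    return per_dim_var.mean(axis=1)                         # (N,) average over input dims

# Toy data: example 0 has stable gradients, example 1 has noisy gradients.
rng = np.random.default_rng(0)
stable = rng.normal(0.0, 0.01, size=(5, 1, 8))
unstable = rng.normal(0.0, 1.0, size=(5, 1, 8))
grads = np.concatenate([stable, unstable], axis=1)          # shape (5, 2, 8)

scores = variance_of_gradients(grads)
ranking = np.argsort(-scores)                               # highest score first
```

Sorting by descending score puts the examples with the least stable gradients first, which is the kind of shortlist the abstract suggests handing to a human auditor.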
Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning
In Federated Learning, a global model is learned by aggregating model updates
computed at a set of independent client nodes. To reduce communication costs,
multiple gradient steps are performed at each node prior to aggregation. A key
challenge in this setting is data heterogeneity across clients, which results in
differing local objectives and can lead clients to over-minimize their own
local objective, diverging from the global solution. We demonstrate that
individual client models experience catastrophic forgetting with respect to
data from other clients and propose an efficient approach that modifies the
cross-entropy objective on a per-client basis by re-weighting the softmax
logits prior to computing the loss. This approach shields classes outside a
client's label set from abrupt representation change, and we empirically
demonstrate it can alleviate client forgetting and provide consistent
improvements to standard federated learning algorithms. Our method is
particularly beneficial under the most challenging federated learning settings
where data heterogeneity is high and client participation in each round is low.
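The per-client re-weighting described in this abstract can be sketched roughly as follows. The abstract does not state the exact weighting scheme, so this is an assumption: each class's exponentiated logit is scaled by a per-client weight before normalization, with classes absent from the client's label set given a small weight. The function name, the class_weights argument, and the value 0.1 are hypothetical.

```python
import numpy as np

def reweighted_softmax_cross_entropy(logits, labels, class_weights):
    """Cross-entropy with per-class weights inside the softmax (illustrative sketch).

    logits:        (N, C) array of model outputs.
    labels:        (N,) integer class labels.
    class_weights: (C,) strictly positive weights; all ones gives standard CE.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(class_weights, dtype=float)
    # Adding log-weights to the logits equals scaling exp(logit) by the weight.
    shifted = logits + np.log(weights)
    shifted = shifted - shifted.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy client that only holds classes 0 and 1 out of 4 (assumed setup).
logits = np.array([[2.0, 0.5, 1.0, -1.0],
                   [0.1, 1.5, 0.3, 2.0]])
labels = np.array([0, 1])
weights = np.array([1.0, 1.0, 0.1, 0.1])   # hypothetical down-weighting of absent classes

standard = reweighted_softmax_cross_entropy(logits, labels, np.ones(4))
reweighted = reweighted_softmax_cross_entropy(logits, labels, weights)
```

Under this assumed scheme, down-weighting absent classes shrinks their share of the softmax normalizer, so local updates push less strongly against the logits of classes the client never sees.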