Team Performance with Test Scores
Team performance is a ubiquitous area of inquiry in the social sciences, and
it motivates the problem of team selection -- choosing the members of a team
for maximum performance. Influential work of Hong and Page has argued that
testing individuals in isolation and then assembling the highest-scoring ones
into a team is not an effective method for team selection. For a broad class of
performance measures, based on the expected maximum of random variables
representing individual candidates, we show that tests directly measuring
individual performance are indeed ineffective, but that a more subtle family of
tests used in isolation can provide a constant-factor approximation for team
performance. These new tests measure the "potential" of individuals, in a
precise sense, rather than their performance; to our knowledge, they represent the
first time that individual tests have been shown to produce near-optimal teams
for a non-trivial team performance measure. We also show families of
submodular and supermodular team performance functions for which no test
applied to individuals can produce near-optimal teams, and discuss implications
for submodular maximization via hill-climbing.
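The expected-maximum performance measure above is easy to simulate. The sketch below is a minimal illustration, assuming a made-up pool of independent Gaussian candidates: it estimates the expected maximum of a team's scores by Monte Carlo and compares picking the top individual means against greedy hill-climbing on the team objective. It illustrates the performance measure only, not the paper's "potential" tests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate pool: candidate i contributes a random score
# X_i ~ Normal(mean_i, std_i). Means and standard deviations are made up.
n_candidates, team_size, n_samples = 20, 5, 50_000
means = rng.uniform(0.0, 1.0, n_candidates)
stds = rng.uniform(0.1, 2.0, n_candidates)

def expected_max(team):
    """Monte Carlo estimate of E[max_{i in team} X_i]."""
    draws = rng.normal(means[team], stds[team], size=(n_samples, len(team)))
    return draws.max(axis=1).mean()

# Baseline: pick the candidates with the highest individual mean scores.
by_mean = np.argsort(means)[-team_size:]

# Greedy hill-climbing directly on the team performance function.
greedy, remaining = [], list(range(n_candidates))
for _ in range(team_size):
    best = max(remaining, key=lambda c: expected_max(np.array(greedy + [c])))
    greedy.append(best)
    remaining.remove(best)

print("individual-test team:", expected_max(by_mean))
print("greedy team:         ", expected_max(np.array(greedy)))
```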
Transfusion: Understanding Transfer Learning for Medical Imaging
Transfer learning from natural image datasets, particularly ImageNet, using
standard large models and corresponding pretrained weights has become a
de facto method for deep learning applications to medical imaging. However,
there are fundamental differences in data sizes, features and task
specifications between natural image classification and the target medical
tasks, and there is little understanding of the effects of transfer. In this
paper, we explore properties of transfer learning for medical imaging. A
performance evaluation on two large scale medical imaging tasks shows that
surprisingly, transfer offers little benefit to performance, and simple,
lightweight models can perform comparably to ImageNet architectures.
Investigating the learned representations and features, we find that some of
the differences from transfer learning are due to the over-parametrization of
standard models rather than sophisticated feature reuse. We isolate where
useful feature reuse occurs, and outline the implications for more efficient
model exploration. We also explore feature-independent benefits of transfer
arising from weight scalings.
Comment: NeurIPS 2019
SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
We propose a new technique, Singular Vector Canonical Correlation Analysis
(SVCCA), a tool for quickly comparing two representations in a way that is both
invariant to affine transformations (allowing comparison between different layers and
networks) and fast to compute (allowing more comparisons to be calculated than
with previous methods). We deploy this tool to measure the intrinsic
dimensionality of layers, showing in some cases needless over-parameterization;
to probe learning dynamics throughout training, finding that networks converge
to final representations from the bottom up; to show where class-specific
information in networks is formed; and to suggest new training regimes that
simultaneously save computation and overfit less. Code:
https://github.com/google/svcca/
Comment: Accepted to NIPS 2017; new plots on ImageNet
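A rough sketch of the SVCCA procedure the abstract describes, written here from scratch in NumPy rather than taken from the linked repository: reduce each representation to its top singular directions, then compute canonical correlations between the reduced subspaces. The input shapes, the variance threshold, and the toy data are assumptions for illustration only.

```python
import numpy as np

def svcca(acts1, acts2, keep_var=0.99):
    """Rough SVCCA similarity between two representations of shape (neurons, datapoints)."""
    def svd_reduce(acts):
        # Center each neuron, then keep the top singular directions
        # explaining `keep_var` of the variance.
        acts = acts - acts.mean(axis=1, keepdims=True)
        u, s, vt = np.linalg.svd(acts, full_matrices=False)
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep_var) + 1
        return np.diag(s[:k]) @ vt[:k]              # reduced (k, datapoints)

    def whiten(z):
        cov = z @ z.T / z.shape[1]
        evals, evecs = np.linalg.eigh(cov)
        inv_sqrt = evecs @ np.diag(np.maximum(evals, 1e-12) ** -0.5) @ evecs.T
        return inv_sqrt @ z

    x, y = whiten(svd_reduce(acts1)), whiten(svd_reduce(acts2))
    # Canonical correlations are the singular values of the whitened cross-covariance.
    corrs = np.linalg.svd(x @ y.T / x.shape[1], compute_uv=False)
    return corrs.mean()                              # mean canonical correlation in [0, 1]

# Example: compare two random "layers" evaluated on the same 1000 inputs.
rng = np.random.default_rng(0)
a = rng.normal(size=(64, 1000))
b = 0.5 * a[:32] + rng.normal(size=(32, 1000))       # partially shared structure
print("SVCCA similarity:", round(svcca(a[:32], b), 3))
```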
Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
An important research direction in machine learning has centered around
developing meta-learning algorithms to tackle few-shot learning. An especially
successful algorithm has been Model Agnostic Meta-Learning (MAML), a method
that consists of two optimization loops, with the outer loop finding a
meta-initialization, from which the inner loop can efficiently learn new tasks.
Despite MAML's popularity, a fundamental open question remains -- is the
effectiveness of MAML due to the meta-initialization being primed for rapid
learning (large, efficient changes in the representations) or due to feature
reuse, with the meta initialization already containing high quality features?
We investigate this question, via ablation studies and analysis of the latent
representations, finding that feature reuse is the dominant factor. This leads
to the ANIL (Almost No Inner Loop) algorithm, a simplification of MAML where we
remove the inner loop for all but the (task-specific) head of a MAML-trained
network. ANIL matches MAML's performance on benchmark few-shot image
classification and RL and offers computational improvements over MAML. We
further study the precise contributions of the head and body of the network,
showing that performance on the test tasks is entirely determined by the
quality of the learned features, and we can remove even the head of the network
(the NIL algorithm). We conclude with a discussion of the rapid learning vs
feature reuse question for meta-learning algorithms more broadly.
Comment: ICLR 2020
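A minimal PyTorch-style sketch of the ANIL adaptation step as the abstract describes it: the inner loop updates only the task-specific head on the support set, while the meta-trained body's features are reused unchanged. The network sizes and inner-loop hyperparameters here are placeholders, not the authors' settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical meta-trained network: a feature "body" and a task-specific "head".
body = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 5)   # 5-way few-shot classification head

def anil_adapt(body, head, support_x, support_y, inner_steps=5, inner_lr=0.01):
    """ANIL-style inner loop: adapt only the head, reuse the body's features."""
    adapted_head = nn.Linear(head.in_features, head.out_features)
    adapted_head.load_state_dict(head.state_dict())
    opt = torch.optim.SGD(adapted_head.parameters(), lr=inner_lr)

    feats = body(support_x).detach()          # body stays frozen: features reused as-is
    for _ in range(inner_steps):
        loss = F.cross_entropy(adapted_head(feats), support_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted_head

# Toy 5-way, 1-shot "task" with random data, just to show the call pattern.
support_x, support_y = torch.randn(5, 32), torch.arange(5)
query_x = torch.randn(10, 32)
adapted_head = anil_adapt(body, head, support_x, support_y)
print(adapted_head(body(query_x)).shape)      # torch.Size([10, 5])
```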
Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
A key factor in the success of deep neural networks is the ability to scale
models to improve performance by varying the architecture depth and width. This
simple property of neural network design has resulted in highly effective
architectures for a variety of tasks. Nevertheless, there is limited
understanding of the effects of depth and width on the learned representations. In
this paper, we study this fundamental question. We begin by investigating how
varying depth and width affects model hidden representations, finding a
characteristic block structure in the hidden representations of larger capacity
(wider or deeper) models. We demonstrate that this block structure arises when
model capacity is large relative to the size of the training set, and is
indicative of the underlying layers preserving and propagating the dominant
principal component of their representations. This discovery has important
ramifications for features learned by different models, namely, representations
outside the block structure are often similar across architectures with varying
widths and depths, but the block structure is unique to each model. We analyze
the output predictions of different model architectures, finding that even when
the overall accuracy is similar, wide and deep models exhibit distinctive error
patterns and variations across classes.
Comment: ICLR 2021
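Layer-to-layer representation comparisons of the kind described above can be sketched with a similarity index such as linear CKA, used here as an assumed stand-in for the paper's analysis tooling: compute the similarity between every pair of layer activations and look for contiguous blocks of high similarity. The toy random network below only shows the mechanics.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA similarity between two representations of shape (examples, features)."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(x.T @ y, "fro") ** 2
    return hsic / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

# Toy "network": a stack of random ReLU layers; compare every pair of layers.
rng = np.random.default_rng(0)
h = rng.normal(size=(512, 32))
layers = []
for _ in range(8):
    w = rng.normal(size=(h.shape[1], 32)) / np.sqrt(h.shape[1])
    h = np.maximum(h @ w, 0)
    layers.append(h)

similarity = np.array([[linear_cka(a, b) for b in layers] for a in layers])
print(np.round(similarity, 2))   # a contiguous block of high values would indicate
                                 # layers preserving and propagating similar representations
```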
Exponential expressivity in deep neural networks through transient chaos
We combine Riemannian geometry with the mean field theory of high dimensional
chaos to study the nature of signal propagation in generic, deep neural
networks with random weights. Our results reveal an order-to-chaos expressivity
phase transition, with networks in the chaotic phase computing nonlinear
functions whose global curvature grows exponentially with depth but not width.
We prove this generic class of deep random functions cannot be efficiently
computed by any shallow network, going beyond prior work restricted to the
analysis of single functions. Moreover, we formalize and quantitatively
demonstrate the long conjectured idea that deep networks can disentangle highly
curved manifolds in input space into flat manifolds in hidden space. Our
theoretical analysis of the expressive power of deep networks broadly applies
to arbitrary nonlinearities, and provides a quantitative underpinning for
previously abstract notions about the geometry of deep functions.
Teaching with Commentaries
Effective training of deep neural networks can be challenging, and there
remain many open questions on how to best learn these models. Recently
developed methods to improve neural network training examine teaching:
providing learned information during the training process to improve downstream
model performance. In this paper, we take steps towards extending the scope of
teaching. We propose a flexible teaching framework using commentaries, learned
meta-information helpful for training on a particular task. We present
gradient-based methods to learn commentaries, leveraging recent work on
implicit differentiation for scalability. We explore diverse applications of
commentaries, from weighting training examples, to parameterising
label-dependent data augmentation policies, to representing attention masks
that highlight salient image regions. We find that commentaries can improve
training speed and/or performance, and provide insights about the dataset and
training process. We also observe that commentaries generalise: they can be
reused when training new models to obtain performance benefits, suggesting a
use-case where commentaries are stored with a dataset and leveraged in future
for improved model training.
Comment: ICLR 2021
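As a concrete illustration of the example-weighting commentary described above, the sketch below learns per-example weights by differentiating a validation loss through a single unrolled training step. This one-step approximation is a stand-in for the implicit-differentiation machinery the paper uses for scalability; the model, data, and hyperparameters are all toy placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
train_x, train_y = torch.randn(16, 8), torch.randint(0, 2, (16,))
val_x, val_y = torch.randn(32, 8), torch.randint(0, 2, (32,))

w = (0.1 * torch.randn(8, 2)).requires_grad_()       # toy linear model parameters
commentary = torch.zeros(16, requires_grad=True)      # per-example weight logits
commentary_opt = torch.optim.Adam([commentary], lr=0.1)

for step in range(100):
    # Inner step: weighted training loss, kept differentiable w.r.t. the commentary.
    per_example = F.cross_entropy(train_x @ w, train_y, reduction="none")
    weights = torch.softmax(commentary, dim=0)
    inner_loss = (weights * per_example).sum()
    (grad_w,) = torch.autograd.grad(inner_loss, w, create_graph=True)
    w_updated = w - 0.5 * grad_w

    # Outer step: validation loss of the updated model, backprop into the commentary.
    outer_loss = F.cross_entropy(val_x @ w_updated, val_y)
    commentary_opt.zero_grad()
    outer_loss.backward()
    commentary_opt.step()

    # Apply the inner update to the model parameters themselves.
    with torch.no_grad():
        w -= 0.5 * grad_w

print(torch.softmax(commentary, dim=0))   # learned per-example weights
```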
On the Expressive Power of Deep Neural Networks
We propose a new approach to the problem of neural network expressivity,
which seeks to characterize how structural properties of a neural network
family affect the functions it is able to compute. Our approach is based on an
interrelated set of measures of expressivity, unified by the novel notion of
trajectory length, which measures how the output of a network changes as the
input sweeps along a one-dimensional path. Our findings can be summarized as
follows:
(1) The complexity of the computed function grows exponentially with depth.
(2) All weights are not equal: trained networks are more sensitive to their
lower (initial) layer weights.
(3) Regularizing on trajectory length (trajectory regularization) is a
simpler alternative to batch normalization, with the same performance.
Comment: Accepted to ICML 2017
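Trajectory length can be measured directly: sweep the input along a one-dimensional path, push each point through the network, and sum the distances between consecutive outputs. Below is a minimal NumPy sketch under assumed settings (a random fully connected ReLU network and a circular input path), not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def trajectory_lengths(depth=10, width=100, n_points=2000, weight_std=2.0):
    """Output trajectory length per layer as the input sweeps a circle.

    Assumed setup: a fully connected ReLU network with i.i.d. Gaussian
    weights of standard deviation weight_std / sqrt(width).
    """
    # One-dimensional input path: a circle embedded in the input space.
    theta = np.linspace(0, 2 * np.pi, n_points)
    h = np.zeros((n_points, width))
    h[:, 0], h[:, 1] = np.cos(theta), np.sin(theta)

    lengths = []
    for _ in range(depth):
        w = rng.normal(scale=weight_std / np.sqrt(width), size=(width, width))
        h = np.maximum(h @ w, 0)                                  # ReLU layer
        # Arc length: sum of distances between consecutive points on the path.
        lengths.append(np.linalg.norm(np.diff(h, axis=0), axis=1).sum())
    return lengths

for layer, length in enumerate(trajectory_lengths(), start=1):
    print(f"layer {layer:2d}: trajectory length ~ {length:.1f}")
```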
Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
A central challenge in developing versatile machine learning systems is
catastrophic forgetting: a model trained on tasks in sequence will suffer
significant performance drops on earlier tasks. Despite the ubiquity of
catastrophic forgetting, there is limited understanding of the underlying
process and its causes. In this paper, we address this important knowledge gap,
investigating how forgetting affects representations in neural network models.
Through representational analysis techniques, we find that deeper layers are
disproportionately the source of forgetting. Supporting this, a study of
methods to mitigate forgetting illustrates that they act to stabilize deeper
layers. These insights enable the development of an analytic argument and
empirical picture relating the degree of forgetting to representational
similarity between tasks. Consistent with this picture, we observe maximal
forgetting occurs for task sequences with intermediate similarity. We perform
empirical studies on the standard split CIFAR-10 setup and also introduce a
novel CIFAR-100-based task approximating realistic input distribution shift.
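For reference, the standard split CIFAR-10 setup mentioned above can be assembled in a few lines with torchvision: the ten classes are divided into five two-class tasks that a model sees in sequence. The class grouping below is one common convention and an assumption, not necessarily the paper's exact split.

```python
import numpy as np
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Split CIFAR-10: divide the ten classes into five two-class tasks trained in sequence.
cifar = datasets.CIFAR10(root="./data", train=True, download=True,
                         transform=transforms.ToTensor())
targets = np.array(cifar.targets)

tasks = []
for task_id in range(5):
    classes = [2 * task_id, 2 * task_id + 1]           # e.g. task 0 = classes {0, 1}
    idx = np.where(np.isin(targets, classes))[0]
    tasks.append(Subset(cifar, idx))

for i, task in enumerate(tasks):
    print(f"task {i}: {len(task)} training examples")  # 10,000 each
```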
A Survey of Deep Learning for Scientific Discovery
Over the past few years, we have seen fundamental breakthroughs in core
problems in machine learning, largely driven by advances in deep neural
networks. At the same time, the amount of data collected in a wide array of
scientific domains is dramatically increasing in both size and complexity.
Taken together, this suggests many exciting opportunities for deep learning
applications in scientific settings. But a significant challenge to this is
simply knowing where to start. The sheer breadth and diversity of different
deep learning techniques makes it difficult to determine what scientific
problems might be most amenable to these methods, or which specific combination
of methods might offer the most promising first approach. In this survey, we
focus on addressing this central issue, providing an overview of many widely
used deep learning models, spanning visual, sequential and graph structured
data, associated tasks and different training methods, along with techniques to
use deep learning with less data and better interpret these complex models --
two central considerations for many scientific use cases. We also include
overviews of the full design process, implementation tips, and links to a
plethora of tutorials, research summaries and open-sourced deep learning
pipelines and pretrained models, developed by the community. We hope that this
survey will help accelerate the use of deep learning across different
scientific domains.