On The Specialization of Neural Modules
A number of machine learning models have been proposed with the goal of achieving systematic generalization: the ability to reason about new situations by combining aspects of previous experiences. These models leverage compositional
architectures which aim to learn specialized modules dedicated to structures in a
task that can be composed to solve novel problems with similar structures. While
the compositionality of these architectures is guaranteed by design, the
specialization of the modules is not. Here we theoretically study the ability of network modules
to specialize to useful structures in a dataset and achieve systematic generalization. To this end, we introduce a minimal space of datasets motivated by practical
systematic generalization benchmarks. From this space of datasets we present a
mathematical definition of systematicity and study the learning dynamics of linear
neural modules when solving components of the task. Our results shed light on the
difficulty of module specialization, what is required for modules to successfully
specialize, and the necessity of modular architectures to achieve systematicity.
Finally, we confirm that the theoretical results in our tractable setting generalize to
more complex datasets and non-linear architectures.
FiLM: Visual Reasoning with a General Conditioning Layer
We introduce a general-purpose conditioning method for neural networks called
FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network
computation via a simple, feature-wise affine transformation based on
conditioning information. We show that FiLM layers are highly effective for
visual reasoning - answering image-related questions which require a
multi-step, high-level process - a task which has proven difficult for standard
deep learning methods that do not explicitly model reasoning. Specifically, we
show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error
for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are
robust to ablations and architectural modifications, and 4) generalize well to
challenging, new data from few examples or even zero-shot.
Comment: AAAI 2018. Code available at http://github.com/ethanjperez/film.
Extends arXiv:1707.0301
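The feature-wise affine transformation described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the per-channel scale `gamma` and shift `beta` are predicted from a conditioning input (e.g. a question embedding) and broadcast over the spatial dimensions of the feature maps. The single-layer `condition` network, the weight matrix `W`, and all shapes here are hypothetical choices for the sketch.

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each channel of the
    feature maps by conditioning-derived coefficients, broadcast over
    the spatial dimensions."""
    # features: (batch, channels, height, width)
    # gamma, beta: (batch, channels)
    return gamma[:, :, None, None] * features + beta[:, :, None, None]

# Hypothetical conditioning network: one linear map from a question
# embedding to per-channel (gamma, beta) pairs.
rng = np.random.default_rng(0)
channels, embed_dim = 4, 8
W = rng.normal(size=(embed_dim, 2 * channels))

def condition(embedding):
    out = embedding @ W                      # (batch, 2 * channels)
    return out[:, :channels], out[:, channels:]

x = rng.normal(size=(2, channels, 5, 5))     # image feature maps
q = rng.normal(size=(2, embed_dim))          # question embeddings
gamma, beta = condition(q)
y = film(x, gamma, beta)
print(y.shape)  # (2, 4, 5, 5)
```

In the full model the conditioning network is deeper and a FiLM layer is inserted into each residual block, but the modulation itself is exactly this per-channel affine transform.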
Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review
The paper characterizes classes of functions for which deep learning can be
exponentially better than shallow learning. Deep convolutional networks are a
special case of these conditions, though weight sharing is not the main reason
for their exponential advantage.