On The Specialization of Neural Modules
A number of machine learning models have been proposed with the goal of achieving systematic generalization: the ability to reason about new situations by combining aspects of previous experiences. These models leverage compositional
architectures which aim to learn specialized modules dedicated to structures in a
task that can be composed to solve novel problems with similar structures. While
the compositionality of these architectures is guaranteed by design, the
specialization of the modules is not. Here we theoretically study the ability of network modules
to specialize to useful structures in a dataset and achieve systematic generalization. To this end, we introduce a minimal space of datasets motivated by practical
systematic generalization benchmarks. From this space of datasets we present a
mathematical definition of systematicity and study the learning dynamics of linear
neural modules when solving components of the task. Our results shed light on the
difficulty of module specialization, what is required for modules to successfully
specialize, and the necessity of modular architectures to achieve systematicity.
Finally, we confirm that the theoretical results in our tractable setting generalize to
more complex datasets and non-linear architectures.
FiLM: Visual Reasoning with a General Conditioning Layer
We introduce a general-purpose conditioning method for neural networks called
FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network
computation via a simple, feature-wise affine transformation based on
conditioning information. We show that FiLM layers are highly effective for
visual reasoning - answering image-related questions which require a
multi-step, high-level process - a task which has proven difficult for standard
deep learning methods that do not explicitly model reasoning. Specifically, we
show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error
for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are
robust to ablations and architectural modifications, and 4) generalize well to
challenging, new data from few examples or even zero-shot.
Comment: AAAI 2018. Code available at http://github.com/ethanjperez/film.
Extends arXiv:1707.0301
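The feature-wise affine transformation described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the per-channel scale `gamma` and shift `beta` are predicted from a conditioning input (e.g. a question embedding) and broadcast over the spatial dimensions of the feature maps. The single-layer `condition` network, the weight matrix `W`, and all shapes here are hypothetical choices for the sketch.

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each channel of the
    feature maps by conditioning-derived coefficients, broadcast over
    the spatial dimensions."""
    # features: (batch, channels, height, width)
    # gamma, beta: (batch, channels)
    return gamma[:, :, None, None] * features + beta[:, :, None, None]

# Hypothetical conditioning network: one linear map from a question
# embedding to per-channel (gamma, beta) pairs.
rng = np.random.default_rng(0)
channels, embed_dim = 4, 8
W = rng.normal(size=(embed_dim, 2 * channels))

def condition(embedding):
    out = embedding @ W                      # (batch, 2 * channels)
    return out[:, :channels], out[:, channels:]

x = rng.normal(size=(2, channels, 5, 5))     # image feature maps
q = rng.normal(size=(2, embed_dim))          # question embeddings
gamma, beta = condition(q)
y = film(x, gamma, beta)
print(y.shape)  # (2, 4, 5, 5)
```

In the full model the conditioning network is deeper and a FiLM layer is inserted into each residual block, but the modulation itself is exactly this per-channel affine transform.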
Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review
The paper characterizes classes of functions for which deep learning can be
exponentially better than shallow learning. Deep convolutional networks are a
special case of these conditions, though weight sharing is not the main reason
for their exponential advantage.