Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

Abstract

Feature learning, i.e. extracting meaningful representations of data, is quintessential to the practical success of neural networks trained with gradient descent, yet it is notoriously difficult to explain how and why it occurs. Recent theoretical studies have shown that shallow neural networks optimized on a single task with gradient-based methods can learn meaningful features, extending our understanding beyond the neural tangent kernel or random feature regime in which negligible feature learning occurs. But in practice, neural networks are increasingly often trained on {\em many} tasks simultaneously with differing loss functions, and these prior analyses do not generalize to such settings. In the multi-task learning setting, a variety of studies have shown effective feature learning by simple linear models. However, multi-task learning via {\em nonlinear} models, arguably the most common learning paradigm in practice, remains largely mysterious. In this work, we present the first results proving feature learning occurs in a multi-task setting with a nonlinear model. We show that when the tasks are binary classification problems with labels depending on only $r$ directions within the ambient $d \gg r$-dimensional input space, executing a simple gradient-based multitask learning algorithm on a two-layer ReLU neural network learns the ground-truth $r$ directions. In particular, any downstream task on the $r$ ground-truth coordinates can be solved by learning a linear classifier with sample and neuron complexity independent of the ambient dimension $d$, while a random feature model requires exponential complexity in $d$ for such a guarantee.
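The setting described above can be sketched in a few lines of numpy: data in $\mathbb{R}^d$ whose binary labels depend only on the first $r$ coordinates, a two-layer ReLU network with a shared first layer and per-task linear heads, and one gradient step on the shared layer under a summed per-task loss. This is a minimal illustration, not the paper's algorithm: the specific task family (random linear thresholds on the relevant coordinates), the squared loss, the width, and the step size are all assumptions made for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes (illustrative only; the paper's regime is d >> r).
d, r, n_tasks, n = 50, 3, 8, 200
m = 64  # hidden width of the two-layer ReLU network

# Gaussian inputs; labels for each task depend only on the first r
# coordinates (WLOG the ground-truth r-dimensional subspace of R^d).
X = rng.standard_normal((n, d))

# Hypothetical task family: each binary task is a linear threshold
# on the r relevant coordinates, with labels in {-1, +1}.
task_vecs = rng.standard_normal((n_tasks, r))
Y = np.sign(X[:, :r] @ task_vecs.T)  # shape (n, n_tasks)

# Two-layer ReLU net with shared first layer W and per-task heads A:
# f_t(x) = A[t] @ relu(W @ x).
W = rng.standard_normal((m, d)) / np.sqrt(d)
A = rng.standard_normal((n_tasks, m)) / np.sqrt(m)

def forward(X, W, A):
    H = np.maximum(X @ W.T, 0.0)  # shared ReLU features, shape (n, m)
    return H @ A.T                # per-task outputs, shape (n, n_tasks)

# One gradient step on the shared layer for the summed squared losses,
# a stand-in for the gradient-based multitask algorithm in the paper.
H = np.maximum(X @ W.T, 0.0)
G = forward(X, W, A) - Y                      # residuals, (n, n_tasks)
grad_W = ((G @ A) * (H > 0)).T @ X / n        # backprop through ReLU
W -= 0.1 * grad_W

print(W.shape, Y.shape)
```

The point of the shared first layer is that its gradient aggregates signal across all tasks: since every task's labels depend only on the first $r$ coordinates, the averaged update pushes the rows of `W` toward that subspace, which is the mechanism the paper's analysis makes rigorous.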
