Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks
We provide several new results on the sample complexity of vector-valued
linear predictors (parameterized by a matrix), and more generally neural
networks. Focusing on size-independent bounds, where only the Frobenius norm
distance of the parameters from some fixed reference matrix is
controlled, we show that the sample complexity behavior can be surprisingly
different from what one might expect given the well-studied setting of
scalar-valued linear predictors. This also leads to new sample complexity
bounds for feed-forward neural networks, tackling some open questions in the
literature, and establishing a new convex linear prediction problem that is
provably learnable without uniform convergence. Comment: 30 pages
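As an illustrative sketch (the notation is not taken from the abstract itself), the size-independent setting can be read as learning over a class of vector-valued linear predictors in which only the Frobenius-norm distance to a fixed reference matrix W_0 is bounded, e.g.

  \mathcal{F}_B = \{\, x \mapsto W x \;:\; W \in \mathbb{R}^{k \times d},\ \|W - W_0\|_F \le B \,\},

so that the radius B, rather than the dimensions k and d, is what may enter the sample complexity.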
On Size-Independent Sample Complexity of ReLU Networks
We study the sample complexity of learning ReLU neural networks from the
point of view of generalization. Given norm constraints on the weight matrices,
a common approach is to estimate the Rademacher complexity of the associated
function class. Previously, Golowich-Rakhlin-Shamir (2020) obtained a bound
independent of the network size (scaling with a product of Frobenius norms),
except for a factor of the square root of the depth. We give a refinement which often
has no explicit depth-dependence at all. Comment: 4 pages
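As a rough sketch of the type of bound being refined (constants and the exact form are as in the cited paper, not this note), the earlier size-independent Rademacher complexity bound for a depth-L ReLU network with weight matrices W_1, ..., W_L, on n samples of Euclidean norm at most B, scales as

  \mathcal{R}_n \;\lesssim\; \frac{B \,\sqrt{L}\, \prod_{j=1}^{L} \|W_j\|_F}{\sqrt{n}},

and the refinement removes the explicit \sqrt{L} factor in many regimes.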