Faculty of Science, School of Mathematics and Statistics
Abstract
Deep neural networks (DNNs) have become central to modern machine learning due to their strong
empirical performance. However, their theoretical understanding—especially regarding
generalization—remains limited. This thesis advances the theory of deep ReLU networks through two
lenses: pairwise learning tasks and gradient descent methods.
For pairwise learning, we study generalization in non-parametric estimation without relying on
restrictive convexity or VC-class assumptions. We establish sharp oracle inequalities for empirical
minimizers under general hypothesis spaces and Lipschitz pairwise losses. Applied to pairwise least
squares regression, our bounds match known minimax rates up to logarithmic factors. A key innovation is the construction of a structured deep ReLU network that approximates the true predictor, yielding a target hypothesis space with controlled complexity. This framework handles problems beyond the reach of existing theories.
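As a schematic of the setting (the notation here is illustrative rather than quoted from the thesis), a pairwise loss $\ell$ assigns a risk to each predictor $f$, and the empirical minimizer is taken over all pairs in the sample:
$$\mathcal{E}(f)=\mathbb{E}\big[\ell\big(f(X,X'),Y,Y'\big)\big],\qquad f_{\mathbf z}=\arg\min_{f\in\mathcal H}\ \frac{2}{n(n-1)}\sum_{1\le i<j\le n}\ell\big(f(X_i,X_j),Y_i,Y_j\big).$$
The oracle inequalities bound the excess risk $\mathcal{E}(f_{\mathbf z})-\inf_{f}\mathcal{E}(f)$ in terms of the complexity of $\mathcal H$; for pairwise least squares regression one common convention takes $\ell(t,y,y')=\big(t-(y-y')\big)^2$.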
For metric and similarity learning, we exploit the structure of the true metric. By deriving its explicit form under the hinge loss, we approximate it with structured deep ReLU networks and analyze the excess generalization error by bounding the approximation and estimation errors separately. An optimal excess risk rate is achieved, which to our knowledge is the first such analysis for metric and similarity learning. We also explore
extensions to general losses.
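The error analysis rests on the standard decomposition (written schematically; $f_{\mathbf z}$ denotes the empirical minimizer, $f_{\mathcal H}$ a risk minimizer over the structured deep ReLU class $\mathcal H$, and $f^*$ the optimal metric or similarity function):
$$\mathcal{E}(f_{\mathbf z})-\mathcal{E}(f^*)\;\le\;\underbrace{\mathcal{E}(f_{\mathbf z})-\mathcal{E}_{\mathbf z}(f_{\mathbf z})+\mathcal{E}_{\mathbf z}(f_{\mathcal H})-\mathcal{E}(f_{\mathcal H})}_{\text{estimation error}}\;+\;\underbrace{\mathcal{E}(f_{\mathcal H})-\mathcal{E}(f^*)}_{\text{approximation error}}.$$
The approximation term is controlled by how well the structured network class captures the derived form of $f^*$, and the estimation term by the capacity of that class.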
For gradient descent methods, we study gradient descent (GD) and stochastic gradient descent (SGD) for overparameterized deep ReLU networks in the neural tangent kernel (NTK) regime. Prior work mainly covers shallow networks; we fill this gap by establishing the first minimax-optimal generalization rates for GD and SGD with deep architectures. Under polynomial width scaling, our results show that these methods can match the generalization performance of kernel methods.
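As a brief reminder of the regime (a standard formulation, not quoted from the thesis), for a deep ReLU network $f(x;\theta)$ of width $m$, the empirical neural tangent kernel at initialization $\theta_0$ is
$$K_m(x,x')=\big\langle\nabla_\theta f(x;\theta_0),\,\nabla_\theta f(x';\theta_0)\big\rangle,$$
and in the NTK regime the GD/SGD iterates stay close to kernel regression with the limiting kernel $K=\lim_{m\to\infty}K_m$, which is why the rates above can be benchmarked against those of kernel methods.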