We study the performance of empirical risk minimization on the p-norm
linear regression problem for p ∈ (1, ∞). We show that, in the
realizable case, under no moment assumptions, and up to a
distribution-dependent constant, O(d) samples are enough to exactly recover
the target. Otherwise, for p ∈ [2, ∞), and under weak moment
assumptions on the target and the covariates, we prove a high probability
excess risk bound on the empirical risk minimizer whose leading term matches,
up to a constant that depends only on p, the asymptotically exact rate. We
extend this result to the case p ∈ (1, 2) under mild assumptions that
guarantee the existence of the Hessian of the risk at its minimizer.
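For concreteness, the objective studied here can be written as follows (a standard formulation of p-norm linear regression; the sample notation is assumed, not taken from the abstract):

```latex
% Empirical risk minimization for p-norm linear regression.
% Given i.i.d. samples (X_i, Y_i), i = 1, ..., n, with covariates X_i in R^d,
% the empirical risk minimizer is
\[
  \hat{w} \in \argmin_{w \in \mathbb{R}^d}
    \frac{1}{n} \sum_{i=1}^{n} \bigl| \langle w, X_i \rangle - Y_i \bigr|^{p},
\]
% and the excess risk of \hat{w} is
\[
  R(\hat{w}) - \min_{w \in \mathbb{R}^d} R(w),
  \qquad
  R(w) = \mathbb{E}\bigl[ \, \bigl| \langle w, X \rangle - Y \bigr|^{p} \, \bigr].
\]
```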