
    Stability and Deviation Optimal Risk Bounds with Convergence Rate $O(1/n)$

    The sharpest known high probability generalization bounds for uniformly stable algorithms (Feldman, Vondrák, 2018, 2019; Bousquet, Klochkov, Zhivotovskiy, 2020) contain a generally inevitable sampling error term of order $\Theta(1/\sqrt{n})$. When applied to excess risk bounds, this leads to suboptimal results in several standard stochastic convex optimization problems. We show that if the so-called Bernstein condition is satisfied, the term $\Theta(1/\sqrt{n})$ can be avoided, and high probability excess risk bounds of order up to $O(1/n)$ are possible via uniform stability. Using this result, we show a high probability excess risk bound with the rate $O(\log n/n)$ for strongly convex and Lipschitz losses, valid for \emph{any} empirical risk minimization method. This resolves a question of Shalev-Shwartz, Shamir, Srebro, and Sridharan (2009). We discuss how $O(\log n/n)$ high probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses without the usual smoothness assumption.
    Comment: 12 pages; presented at NeurIPS
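    For context, a minimal sketch of the projected gradient descent routine the abstract refers to, for a strongly convex and Lipschitz objective. The `grad`, `project`, and toy objective below are illustrative choices, and the $1/(\lambda t)$ step size is the textbook schedule for $\lambda$-strongly convex losses, not a detail taken from the paper:

```python
import numpy as np

def projected_gradient_descent(grad, project, w0, lam, n_steps):
    """Projected (sub)gradient descent with the classical 1/(lam*t)
    step size for a lam-strongly convex, Lipschitz objective."""
    w = w0
    for t in range(1, n_steps + 1):
        eta = 1.0 / (lam * t)           # standard strongly convex step size
        w = project(w - eta * grad(w))  # gradient step, then project back
    return w

# Toy usage: minimize 0.5*lam*||w||^2 + c.w over the unit Euclidean ball.
lam, c = 1.0, np.array([2.0, -1.0])
grad = lambda w: lam * w + c
project = lambda w: w / max(1.0, np.linalg.norm(w))  # projection onto unit ball
w_hat = projected_gradient_descent(grad, project, np.zeros(2), lam, 1000)
```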

    Non-Euclidean Differentially Private Stochastic Convex Optimization

    Differentially private (DP) stochastic convex optimization (SCO) is a fundamental problem, where the goal is to approximately minimize the population risk with respect to a convex loss function, given a dataset of i.i.d. samples from a distribution, while satisfying differential privacy with respect to the dataset. Most of the existing works in the literature on private convex optimization focus on the Euclidean (i.e., $\ell_2$) setting, where the loss is assumed to be Lipschitz (and possibly smooth) w.r.t. the $\ell_2$ norm over a constraint set with bounded $\ell_2$ diameter. Algorithms based on noisy stochastic gradient descent (SGD) are known to attain the optimal excess risk in this setting. In this work, we conduct a systematic study of DP-SCO for $\ell_p$ setups. For $p=1$, under a standard smoothness assumption, we give a new algorithm with nearly optimal excess risk. This result also extends to general polyhedral norms and feasible sets. For $p\in(1,2)$, we give two new algorithms, whose central building block is a novel privacy mechanism, which generalizes the Gaussian mechanism. Moreover, we establish a lower bound on the excess risk for this range of $p$, showing a necessary dependence on $\sqrt{d}$, where $d$ is the dimension of the space. Our lower bound implies a sudden transition of the excess risk at $p=1$, where the dependence on $d$ changes from logarithmic to polynomial, resolving an open question in prior work [TTZ15]. For $p\in(2,\infty)$, noisy SGD attains optimal excess risk in the low-dimensional regime; in particular, this proves the optimality of noisy SGD for $p=\infty$. Our work draws upon concepts from the geometry of normed spaces, such as the notions of regularity, uniform convexity, and uniform smoothness.
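    As background, a schematic of the noisy SGD template the abstract says attains optimal excess risk in the $\ell_2$ setting: a stochastic gradient perturbed by Gaussian noise, then projected onto the feasible set. All names are illustrative, and the noise scale `sigma` is left as a free parameter; in an actual DP analysis it must be calibrated to the privacy budget $(\varepsilon, \delta)$, which this sketch omits:

```python
import numpy as np

def noisy_sgd(stoch_grad, project, w0, sigma, eta, n_steps, seed=0):
    """Schematic noisy projected SGD: stochastic gradient plus Gaussian
    noise, followed by a projection onto the feasible set."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        g = stoch_grad(w, rng)                        # one-sample (sub)gradient
        noise = sigma * rng.standard_normal(w.shape)  # Gaussian mechanism noise
        w = project(w - eta * (g + noise))
    return w
```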

    A Full Characterization of Excess Risk via Empirical Risk Landscape

    In this paper, we provide a unified analysis of the excess risk of models trained by a proper algorithm, for both smooth convex and non-convex loss functions. In contrast to existing bounds in the literature that depend on the number of iteration steps, our bounds on the excess risk do not diverge with the number of iterations. This underscores that, at least for smooth loss functions, the excess risk can be guaranteed after training. To obtain the bounds on the excess risk, we develop a technique based on algorithmic stability and a non-asymptotic characterization of the empirical risk landscape. With this technique, the model obtained by a proper algorithm is proved to generalize. Specifically, for non-convex losses, the conclusion is obtained via the technique together with an analysis of the stability of a constructed auxiliary algorithm. Combining this with properties of the empirical risk landscape, we derive convergent upper bounds on the excess risk in both the convex and non-convex regimes with the help of classical optimization results.
    Comment: 38 pages
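    For readers unfamiliar with this style of argument, the classical decomposition that stability-plus-landscape analyses typically start from is sketched below. This is standard background, not a formula quoted from the paper: $\hat{w}$ is the trained model, $w^*$ the population risk minimizer, $F$ the population risk, and $\hat{F}$ the empirical risk.

```latex
% Stability controls the first term, the empirical risk landscape the
% second; the third vanishes since w^* is independent of the sample.
\mathbb{E}\,F(\hat w) - F(w^*)
  = \underbrace{\mathbb{E}\big[F(\hat w) - \hat F(\hat w)\big]}_{\text{generalization gap: stability}}
  + \underbrace{\mathbb{E}\big[\hat F(\hat w) - \hat F(w^*)\big]}_{\text{optimization error: landscape}}
  + \underbrace{\mathbb{E}\,\hat F(w^*) - F(w^*)}_{=\,0}
```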