8 research outputs found

    Five Proofs of Chernoff's Bound with Applications

    We discuss five ways of proving Chernoff's bound and show how they lead to different extensions of the basic bound. Comment: 16 pages, no figures. This revision slightly updates the presentation of the IK-method.
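
    As a quick illustration of the bound being discussed, the following sketch compares the empirical tail probability of a Bernoulli sample mean with the Chernoff-Hoeffding tail exp(-2nε²). The parameters are arbitrary and the snippet is not taken from the paper.

```python
# Numerical sanity check of the Chernoff-Hoeffding tail bound
# P(mean - p >= eps) <= exp(-2 * n * eps^2) for i.i.d. Bernoulli(p) samples.
# Illustrative sketch only; parameters are arbitrary.
import math
import random

def empirical_tail(n, p, eps, trials=20000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if mean - p >= eps:
            hits += 1
    return hits / trials

n, p, eps = 200, 0.5, 0.1
print("empirical tail :", empirical_tail(n, p, eps))
print("Chernoff bound :", math.exp(-2 * n * eps ** 2))
```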

    Generalization Bounds for Uniformly Stable Algorithms

    Uniform stability of a learning algorithm is a classical notion of algorithmic stability introduced to derive high-probability bounds on the generalization error (Bousquet and Elisseeff, 2002). Specifically, for a loss function with range bounded in $[0,1]$, the generalization error of a $\gamma$-uniformly stable learning algorithm on $n$ samples is known to be within $O((\gamma + 1/n)\sqrt{n \log(1/\delta)})$ of the empirical error with probability at least $1-\delta$. Unfortunately, this bound does not lead to meaningful generalization bounds in many common settings where $\gamma \geq 1/\sqrt{n}$. At the same time the bound is known to be tight only when $\gamma = O(1/n)$. We substantially improve generalization bounds for uniformly stable algorithms without making any additional assumptions. First, we show that the bound in this setting is $O(\sqrt{(\gamma + 1/n)\log(1/\delta)})$ with probability at least $1-\delta$. In addition, we prove a tight bound of $O(\gamma^2 + 1/n)$ on the second moment of the estimation error. The best previous bound on the second moment is $O(\gamma + 1/n)$. Our proofs are based on new analysis techniques and our results imply substantially stronger generalization guarantees for several well-studied algorithms. Comment: Appeared in Neural Information Processing Systems (NeurIPS), 2018.
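
    A back-of-the-envelope comparison (constants dropped, since both statements are O(·) bounds) of the classical high-probability term against the improved one claimed in the abstract; the choices of n, δ, and γ below are arbitrary.

```python
# Compare the classical term (gamma + 1/n) * sqrt(n * log(1/delta)) with the
# improved sqrt((gamma + 1/n) * log(1/delta)) quoted in the abstract.
# Constants are ignored; values of n, delta, gamma are arbitrary.
import math

n, delta = 10_000, 0.01
for gamma in (1 / n, 1 / math.sqrt(n), 0.05):
    old = (gamma + 1 / n) * math.sqrt(n * math.log(1 / delta))
    new = math.sqrt((gamma + 1 / n) * math.log(1 / delta))
    print(f"gamma = {gamma:.5f}   classical ~ {old:.4f}   improved ~ {new:.4f}")
```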

    Concentration Bounds for High Sensitivity Functions Through Differential Privacy

    A new line of work [Dwork et al. STOC 2015], [Hardt and Ullman FOCS 2014], [Steinke and Ullman COLT 2015], [Bassily et al. STOC 2016] demonstrates how differential privacy [Dwork et al. TCC 2006] can be used as a mathematical tool for guaranteeing generalization in adaptive data analysis. Specifically, if a differentially private analysis is applied on a sample S of i.i.d. examples to select a low-sensitivity function f, then w.h.p. f(S) is close to its expectation, although f is being chosen based on the data. Very recently, Steinke and Ullman observed that these generalization guarantees can be used for proving concentration bounds in the non-adaptive setting, where the low-sensitivity function is fixed beforehand. In particular, they obtain alternative proofs for classical concentration bounds for low-sensitivity functions, such as the Chernoff bound and McDiarmid's Inequality. In this work, we set out to examine the situation for functions with high sensitivity, for which differential privacy does not imply generalization guarantees under adaptive analysis. We show that differential privacy can be used to prove concentration bounds for such functions in the non-adaptive setting.
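
    The differential-privacy primitive these arguments build on can be illustrated in the simplest non-adaptive case: a Laplace-noised sample mean (a low-sensitivity function) stays close to the population mean with high probability. This is only a toy sketch of the standard Laplace mechanism with arbitrary parameters, not the paper's construction for high-sensitivity functions.

```python
# Toy Laplace mechanism: an epsilon-DP release of the mean of values in [0, 1]
# (sensitivity 1/n). The noisy estimate concentrates around the true mean.
# Hypothetical parameters; not code from the paper.
import numpy as np

def dp_mean(sample, epsilon, rng):
    n = len(sample)
    return float(np.mean(sample) + rng.laplace(scale=1.0 / (n * epsilon)))

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=5000)          # population mean is 0.5
print("true mean  :", float(np.mean(data)))
print("DP estimate:", dp_mean(data, epsilon=1.0, rng=rng))
```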

    A New Analysis of Differential Privacy's Generalization Guarantees

    We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the posterior distribution on datasets induced by the transcript of the interaction is close to its true value on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to its posterior expectation with high probability. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the posterior distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids the "monitor argument" used to derive high probability bounds in prior work. An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds, even though the improvements are in the constants, rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform the naive "sample-splitting" baseline at dramatically smaller dataset sizes compared to the previous state of the art, bringing techniques from this literature closer to practicality.
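
    For context, the "sample-splitting" baseline mentioned at the end can be sketched in a few lines: each of k statistical queries is answered on its own disjoint chunk of the data, so only n/k samples back each answer. The query functions and sizes below are arbitrary, and the queries appear as a fixed list rather than being chosen adaptively.

```python
# Naive sample-splitting baseline: answer each statistical query (a function
# x -> [0, 1]) on a fresh, disjoint chunk of the dataset. Each answer is then
# an ordinary (non-adaptive) empirical mean over n / k samples.
# Illustrative sketch with arbitrary queries and data.
import numpy as np

def sample_splitting_answers(data, queries, seed=0):
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(data)
    chunks = np.array_split(shuffled, len(queries))
    return [float(np.mean(q(chunk))) for q, chunk in zip(queries, chunks)]

rng = np.random.default_rng(1)
data = rng.uniform(size=3000)
queries = [lambda x, t=t: (x > t).astype(float) for t in (0.25, 0.5, 0.75)]
print(sample_splitting_answers(data, queries))
```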

    Hypothesis Set Stability and Generalization

    We present a study of generalization for data-dependent hypothesis sets. We give a general learning guarantee for data-dependent hypothesis sets based on a notion of transductive Rademacher complexity. Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce. This bound admits as special cases both standard Rademacher complexity bounds and algorithm-dependent uniform stability bounds. We also illustrate the use of these learning bounds in the analysis of several scenarios. Comment: Published in NeurIPS 2019. This version is equivalent to the camera-ready version but also includes the supplementary material.
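
    The classical quantity that these data-dependent bounds generalize, the empirical Rademacher complexity of a hypothesis set on a fixed sample, can be estimated by Monte Carlo as below. The hypothesis outputs are random placeholders, and this is only a sketch of the standard definition, not of the paper's transductive variant.

```python
# Monte Carlo estimate of the empirical Rademacher complexity
#   R_S(H) = E_sigma [ sup_{h in H} (1/n) * sum_i sigma_i * h(x_i) ]
# for a finite hypothesis set given as a matrix of outputs in [-1, 1].
# Random placeholder data; sketch of the standard definition only.
import numpy as np

def empirical_rademacher(outputs, n_draws=2000, seed=0):
    """outputs: array of shape (|H|, n) with h(x_i) in [-1, 1]."""
    rng = np.random.default_rng(seed)
    n = outputs.shape[1]
    sups = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)
        sups.append(float(np.max(outputs @ sigma)) / n)
    return float(np.mean(sups))

rng = np.random.default_rng(2)
outputs = np.sign(rng.standard_normal((50, 200)))   # 50 hypotheses, 200 points
print(empirical_rademacher(outputs))
```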

    Generalization for Adaptively-chosen Estimators via Stable Median

    Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this problem in which the goal is to design an algorithm that, given a limited number of i.i.d. samples from an unknown distribution, can answer adaptively-chosen queries about that distribution. We present an algorithm that estimates the expectations of $k$ arbitrary adaptively-chosen real-valued estimators using a number of samples that scales as $\sqrt{k}$. The answers given by our algorithm are essentially as accurate as if fresh samples were used to evaluate each estimator. In contrast, prior work yields error guarantees that scale with the worst-case sensitivity of each estimator. We also give a version of our algorithm that can be used to verify answers to such queries where the sample complexity depends logarithmically on the number of queries $k$ (as in the reusable holdout technique). Our algorithm is based on a simple approximate median algorithm that satisfies the strong stability guarantees of differential privacy. Our techniques provide a new approach for analyzing the generalization guarantees of differentially private algorithms. Comment: To appear in Conference on Learning Theory (COLT) 2017.
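
    The abstract only names its building block: an approximate median computed with differential-privacy-style stability. A common way to obtain a DP approximate median is the exponential mechanism over a finite grid of candidates, scored by how far their rank is from n/2. The sketch below shows that generic construction (grid, ε, and data are arbitrary), not the paper's specific algorithm.

```python
# Differentially private approximate median via the exponential mechanism:
# score each candidate by minus its rank distance to n/2 (sensitivity 1) and
# sample a candidate with probability proportional to exp(eps * score / 2).
# Generic construction with arbitrary parameters; not the paper's algorithm.
import numpy as np

def dp_median(values, candidates, epsilon, rng):
    values = np.asarray(values)
    n = len(values)
    scores = np.array([-abs(float(np.sum(values < c)) - n / 2) for c in candidates])
    weights = np.exp(epsilon * (scores - scores.max()) / 2)   # shift for numerical stability
    return rng.choice(candidates, p=weights / weights.sum())

rng = np.random.default_rng(3)
data = rng.normal(loc=0.3, scale=1.0, size=1000)
grid = np.linspace(-3.0, 3.0, 121)
print("true median:", float(np.median(data)))
print("DP median  :", float(dp_median(data, grid, epsilon=1.0, rng=rng)))
```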

    PAC learning with stable and private predictions

    We study binary classification algorithms for which the prediction on any point is not too sensitive to individual examples in the dataset. Specifically, we consider the notions of uniform stability (Bousquet and Elisseeff, 2001) and prediction privacy (Dwork and Feldman, 2018). Previous work on these notions shows how they can be achieved in the standard PAC model via simple aggregation of models trained on disjoint subsets of data. Unfortunately, this approach leads to a significant overhead in terms of sample complexity. Here we demonstrate several general approaches to stable and private prediction that either eliminate or significantly reduce the overhead. Specifically, we demonstrate that for any class $C$ of VC dimension $d$ there exists a $\gamma$-uniformly stable algorithm for learning $C$ with excess error $\alpha$ using $\tilde O(d/(\alpha\gamma) + d/\alpha^2)$ samples. We also show that this bound is nearly tight. For $\epsilon$-differentially private prediction we give two new algorithms: one using $\tilde O(d/(\alpha^2\epsilon))$ samples and another one using $\tilde O(d^2/(\alpha\epsilon) + d/\alpha^2)$ samples. The best previously known bounds for these problems are $O(d/(\alpha^2\gamma))$ and $O(d/(\alpha^3\epsilon))$, respectively.
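
    The "aggregation of models trained on disjoint subsets" that the abstract refers to can be sketched as follows: split the data into k chunks, train one model per chunk, and output the average vote. Changing a single example touches only one sub-model, so the averaged vote moves by at most 1/k, which is the source of the uniform-stability guarantee. The toy one-dimensional base learner below is a placeholder; this is the baseline construction, not the paper's improved algorithms.

```python
# Disjoint-subsets aggregation: k sub-models, each trained on its own chunk;
# the prediction is the fraction of sub-models voting 1. One changed example
# affects one sub-model, so the vote fraction changes by at most 1/k.
# Toy threshold base learner and synthetic data; illustrative sketch only.
import numpy as np

def train_threshold(x, y):
    """Toy 1-D learner: threshold at the midpoint of the two class means."""
    t = (x[y == 0].mean() + x[y == 1].mean()) / 2.0
    return lambda q: (q > t).astype(float)

def aggregate_predictor(x, y, k, seed=0):
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), k)
    models = [train_threshold(x[idx], y[idx]) for idx in parts]
    return lambda q: np.mean([m(q) for m in models], axis=0)   # vote fraction in [0, 1]

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
y = np.concatenate([np.zeros(500), np.ones(500)])
predict = aggregate_predictor(x, y, k=10)
print(predict(np.array([-1.0, 1.0, 3.0])))
```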

    High probability generalization bounds for uniformly stable algorithms with nearly optimal rate

    Algorithmic stability is a classical approach to understanding and analysis of the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability has been established in a landmark paper of Bousquet and Elisseeff (2002), albeit at the expense of an additional $\sqrt{n}$ factor in the bound. Specifically, their bound on the estimation error of any $\gamma$-uniformly stable learning algorithm on $n$ samples and range in $[0,1]$ is $O(\gamma \sqrt{n \log(1/\delta)} + \sqrt{\log(1/\delta)/n})$ with probability $\geq 1-\delta$. The $\sqrt{n}$ overhead makes the bound vacuous in the common settings where $\gamma \geq 1/\sqrt{n}$. A stronger bound was recently proved by the authors (Feldman and Vondrak, 2018) that reduces the overhead to at most $O(n^{1/4})$. Still, both of these results give optimal generalization bounds only when $\gamma = O(1/n)$. We prove a nearly tight bound of $O(\gamma \log(n)\log(n/\delta) + \sqrt{\log(1/\delta)/n})$ on the estimation error of any $\gamma$-uniformly stable algorithm. It implies that for algorithms that are uniformly stable with $\gamma = O(1/\sqrt{n})$, the estimation error is essentially the same as the sampling error. Our result leads to the first high-probability generalization bounds for multi-pass stochastic gradient descent and regularized ERM for stochastic convex problems with nearly optimal rate, resolving open problems in prior work. Our proof technique is new and we introduce several analysis tools that might find additional applications. Comment: This is a follow-up to and has minor text overlap with arXiv:1812.09859; v2: minor revision following acceptance for presentation at COLT 2019.
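
    A rough numeric comparison of the estimation-error terms quoted above, at γ = 1/√n and with constants dropped: the Bousquet-Elisseeff term stays bounded away from zero while the new term decays with n. Purely illustrative; δ and the range of n are arbitrary.

```python
# Compare (constants ignored) the Bousquet-Elisseeff high-probability term
# gamma * sqrt(n * log(1/delta)) + sqrt(log(1/delta) / n) with the new term
# gamma * log(n) * log(n/delta) + sqrt(log(1/delta) / n), at gamma = 1/sqrt(n).
# Arbitrary delta and n; illustrative only.
import math

delta = 0.01
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    gamma = 1.0 / math.sqrt(n)
    sampling = math.sqrt(math.log(1 / delta) / n)
    be02 = gamma * math.sqrt(n * math.log(1 / delta)) + sampling
    new = gamma * math.log(n) * math.log(n / delta) + sampling
    print(f"n = {n:>8}   BE (2002) ~ {be02:.3f}   this paper ~ {new:.3f}")
```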