8 research outputs found

    Five Proofs of Chernoff's Bound with Applications

    We discuss five ways of proving Chernoff's bound and show how they lead to different extensions of the basic bound. Comment: 16 pages, no figures. This revision slightly updates the presentation of the IK-method.
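
    As a quick illustration of the bound being discussed, the following sketch compares the empirical tail probability of a Bernoulli sample mean with the Chernoff-Hoeffding tail exp(-2nε²). The parameters are arbitrary and the snippet is not taken from the paper.

```python
# Numerical sanity check of the Chernoff-Hoeffding tail bound
# P(mean - p >= eps) <= exp(-2 * n * eps^2) for i.i.d. Bernoulli(p) samples.
# Illustrative sketch only; parameters are arbitrary.
import math
import random

def empirical_tail(n, p, eps, trials=20000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if mean - p >= eps:
            hits += 1
    return hits / trials

n, p, eps = 200, 0.5, 0.1
print("empirical tail :", empirical_tail(n, p, eps))
print("Chernoff bound :", math.exp(-2 * n * eps ** 2))
```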

    Generalization Bounds for Uniformly Stable Algorithms

    Uniform stability of a learning algorithm is a classical notion of algorithmic stability introduced to derive high-probability bounds on the generalization error (Bousquet and Elisseeff, 2002). Specifically, for a loss function with range bounded in $[0,1]$, the generalization error of a $\gamma$-uniformly stable learning algorithm on $n$ samples is known to be within $O((\gamma + 1/n)\sqrt{n \log(1/\delta)})$ of the empirical error with probability at least $1-\delta$. Unfortunately, this bound does not lead to meaningful generalization bounds in many common settings where $\gamma \geq 1/\sqrt{n}$. At the same time the bound is known to be tight only when $\gamma = O(1/n)$. We substantially improve generalization bounds for uniformly stable algorithms without making any additional assumptions. First, we show that the bound in this setting is $O(\sqrt{(\gamma + 1/n)\log(1/\delta)})$ with probability at least $1-\delta$. In addition, we prove a tight bound of $O(\gamma^2 + 1/n)$ on the second moment of the estimation error. The best previous bound on the second moment is $O(\gamma + 1/n)$. Our proofs are based on new analysis techniques and our results imply substantially stronger generalization guarantees for several well-studied algorithms. Comment: Appeared in Neural Information Processing Systems (NeurIPS), 2018.
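
    A back-of-the-envelope comparison (constants dropped, since both statements are O(·) bounds) of the classical high-probability term against the improved one claimed in the abstract; the choices of n, δ, and γ below are arbitrary.

```python
# Compare the classical term (gamma + 1/n) * sqrt(n * log(1/delta)) with the
# improved sqrt((gamma + 1/n) * log(1/delta)) quoted in the abstract.
# Constants are ignored; values of n, delta, gamma are arbitrary.
import math

n, delta = 10_000, 0.01
for gamma in (1 / n, 1 / math.sqrt(n), 0.05):
    old = (gamma + 1 / n) * math.sqrt(n * math.log(1 / delta))
    new = math.sqrt((gamma + 1 / n) * math.log(1 / delta))
    print(f"gamma = {gamma:.5f}   classical ~ {old:.4f}   improved ~ {new:.4f}")
```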

    Concentration Bounds for High Sensitivity Functions Through Differential Privacy

    A new line of work [Dwork et al. STOC 2015], [Hardt and Ullman FOCS 2014], [Steinke and Ullman COLT 2015], [Bassily et al. STOC 2016] demonstrates how differential privacy [Dwork et al. TCC 2006] can be used as a mathematical tool for guaranteeing generalization in adaptive data analysis. Specifically, if a differentially private analysis is applied on a sample S of i.i.d. examples to select a low-sensitivity function f, then w.h.p. f(S) is close to its expectation, although f is being chosen based on the data. Very recently, Steinke and Ullman observed that these generalization guarantees can be used for proving concentration bounds in the non-adaptive setting, where the low-sensitivity function is fixed beforehand. In particular, they obtain alternative proofs for classical concentration bounds for low-sensitivity functions, such as the Chernoff bound and McDiarmid's Inequality. In this work, we set out to examine the situation for functions with high sensitivity, for which differential privacy does not imply generalization guarantees under adaptive analysis. We show that differential privacy can be used to prove concentration bounds for such functions in the non-adaptive setting.
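
    The differential-privacy primitive these arguments build on can be illustrated in the simplest non-adaptive case: a Laplace-noised sample mean (a low-sensitivity function) stays close to the population mean with high probability. This is only a toy sketch of the standard Laplace mechanism with arbitrary parameters, not the paper's construction for high-sensitivity functions.

```python
# Toy Laplace mechanism: an epsilon-DP release of the mean of values in [0, 1]
# (sensitivity 1/n). The noisy estimate concentrates around the true mean.
# Hypothetical parameters; not code from the paper.
import numpy as np

def dp_mean(sample, epsilon, rng):
    n = len(sample)
    return float(np.mean(sample) + rng.laplace(scale=1.0 / (n * epsilon)))

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=5000)          # population mean is 0.5
print("true mean  :", float(np.mean(data)))
print("DP estimate:", dp_mean(data, epsilon=1.0, rng=rng))
```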

    A New Analysis of Differential Privacy's Generalization Guarantees

    We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the posterior distribution on datasets induced by the transcript of the interaction is close to its true value on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to its posterior expectation with high probability. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the posterior distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids the "monitor argument" used to derive high probability bounds in prior work. An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds, even though the improvements are in the constants, rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform the naive "sample-splitting" baseline at dramatically smaller dataset sizes compared to the previous state of the art, bringing techniques from this literature closer to practicality.
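
    For context, the "sample-splitting" baseline mentioned at the end can be sketched in a few lines: each of k statistical queries is answered on its own disjoint chunk of the data, so only n/k samples back each answer. The query functions and sizes below are arbitrary, and the queries appear as a fixed list rather than being chosen adaptively.

```python
# Naive sample-splitting baseline: answer each statistical query (a function
# x -> [0, 1]) on a fresh, disjoint chunk of the dataset. Each answer is then
# an ordinary (non-adaptive) empirical mean over n / k samples.
# Illustrative sketch with arbitrary queries and data.
import numpy as np

def sample_splitting_answers(data, queries, seed=0):
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(data)
    chunks = np.array_split(shuffled, len(queries))
    return [float(np.mean(q(chunk))) for q, chunk in zip(queries, chunks)]

rng = np.random.default_rng(1)
data = rng.uniform(size=3000)
queries = [lambda x, t=t: (x > t).astype(float) for t in (0.25, 0.5, 0.75)]
print(sample_splitting_answers(data, queries))
```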

    Hypothesis Set Stability and Generalization

    We present a study of generalization for data-dependent hypothesis sets. We give a general learning guarantee for data-dependent hypothesis sets based on a notion of transductive Rademacher complexity. Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce. This bound admits as special cases both standard Rademacher complexity bounds and algorithm-dependent uniform stability bounds. We also illustrate the use of these learning bounds in the analysis of several scenarios. Comment: Published in NeurIPS 2019. This version is equivalent to the camera-ready version but also includes the supplementary material.
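
    The classical quantity that these data-dependent bounds generalize, the empirical Rademacher complexity of a hypothesis set on a fixed sample, can be estimated by Monte Carlo as below. The hypothesis outputs are random placeholders, and this is only a sketch of the standard definition, not of the paper's transductive variant.

```python
# Monte Carlo estimate of the empirical Rademacher complexity
#   R_S(H) = E_sigma [ sup_{h in H} (1/n) * sum_i sigma_i * h(x_i) ]
# for a finite hypothesis set given as a matrix of outputs in [-1, 1].
# Random placeholder data; sketch of the standard definition only.
import numpy as np

def empirical_rademacher(outputs, n_draws=2000, seed=0):
    """outputs: array of shape (|H|, n) with h(x_i) in [-1, 1]."""
    rng = np.random.default_rng(seed)
    n = outputs.shape[1]
    sups = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)
        sups.append(float(np.max(outputs @ sigma)) / n)
    return float(np.mean(sups))

rng = np.random.default_rng(2)
outputs = np.sign(rng.standard_normal((50, 200)))   # 50 hypotheses, 200 points
print(empirical_rademacher(outputs))
```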

    Generalization for Adaptively-chosen Estimators via Stable Median

    Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this problem in which the goal is to design an algorithm that, given a limited number of i.i.d. samples from an unknown distribution, can answer adaptively-chosen queries about that distribution. We present an algorithm that estimates the expectations of $k$ arbitrary adaptively-chosen real-valued estimators using a number of samples that scales as $\sqrt{k}$. The answers given by our algorithm are essentially as accurate as if fresh samples were used to evaluate each estimator. In contrast, prior work yields error guarantees that scale with the worst-case sensitivity of each estimator. We also give a version of our algorithm that can be used to verify answers to such queries where the sample complexity depends logarithmically on the number of queries $k$ (as in the reusable holdout technique). Our algorithm is based on a simple approximate median algorithm that satisfies the strong stability guarantees of differential privacy. Our techniques provide a new approach for analyzing the generalization guarantees of differentially private algorithms. Comment: To appear in Conference on Learning Theory (COLT) 2017.
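
    The abstract only names its building block: an approximate median computed with differential-privacy-style stability. A common way to obtain a DP approximate median is the exponential mechanism over a finite grid of candidates, scored by how far their rank is from n/2. The sketch below shows that generic construction (grid, ε, and data are arbitrary), not the paper's specific algorithm.

```python
# Differentially private approximate median via the exponential mechanism:
# score each candidate by minus its rank distance to n/2 (sensitivity 1) and
# sample a candidate with probability proportional to exp(eps * score / 2).
# Generic construction with arbitrary parameters; not the paper's algorithm.
import numpy as np

def dp_median(values, candidates, epsilon, rng):
    values = np.asarray(values)
    n = len(values)
    scores = np.array([-abs(float(np.sum(values < c)) - n / 2) for c in candidates])
    weights = np.exp(epsilon * (scores - scores.max()) / 2)   # shift for numerical stability
    return rng.choice(candidates, p=weights / weights.sum())

rng = np.random.default_rng(3)
data = rng.normal(loc=0.3, scale=1.0, size=1000)
grid = np.linspace(-3.0, 3.0, 121)
print("true median:", float(np.median(data)))
print("DP median  :", float(dp_median(data, grid, epsilon=1.0, rng=rng)))
```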

    PAC learning with stable and private predictions

    We study binary classification algorithms for which the prediction on any point is not too sensitive to individual examples in the dataset. Specifically, we consider the notions of uniform stability (Bousquet and Elisseeff, 2001) and prediction privacy (Dwork and Feldman, 2018). Previous work on these notions shows how they can be achieved in the standard PAC model via simple aggregation of models trained on disjoint subsets of data. Unfortunately, this approach leads to a significant overhead in terms of sample complexity. Here we demonstrate several general approaches to stable and private prediction that either eliminate or significantly reduce the overhead. Specifically, we demonstrate that for any class $C$ of VC dimension $d$ there exists a $\gamma$-uniformly stable algorithm for learning $C$ with excess error $\alpha$ using $\tilde O(d/(\alpha\gamma) + d/\alpha^2)$ samples. We also show that this bound is nearly tight. For $\epsilon$-differentially private prediction we give two new algorithms: one using $\tilde O(d/(\alpha^2\epsilon))$ samples and another one using $\tilde O(d^2/(\alpha\epsilon) + d/\alpha^2)$ samples. The best previously known bounds for these problems are $O(d/(\alpha^2\gamma))$ and $O(d/(\alpha^3\epsilon))$, respectively.
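
    The "aggregation of models trained on disjoint subsets" that the abstract refers to can be sketched as follows: split the data into k chunks, train one model per chunk, and output the average vote. Changing a single example touches only one sub-model, so the averaged vote moves by at most 1/k, which is the source of the uniform-stability guarantee. The toy one-dimensional base learner below is a placeholder; this is the baseline construction, not the paper's improved algorithms.

```python
# Disjoint-subsets aggregation: k sub-models, each trained on its own chunk;
# the prediction is the fraction of sub-models voting 1. One changed example
# affects one sub-model, so the vote fraction changes by at most 1/k.
# Toy threshold base learner and synthetic data; illustrative sketch only.
import numpy as np

def train_threshold(x, y):
    """Toy 1-D learner: threshold at the midpoint of the two class means."""
    t = (x[y == 0].mean() + x[y == 1].mean()) / 2.0
    return lambda q: (q > t).astype(float)

def aggregate_predictor(x, y, k, seed=0):
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), k)
    models = [train_threshold(x[idx], y[idx]) for idx in parts]
    return lambda q: np.mean([m(q) for m in models], axis=0)   # vote fraction in [0, 1]

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
y = np.concatenate([np.zeros(500), np.ones(500)])
predict = aggregate_predictor(x, y, k=10)
print(predict(np.array([-1.0, 1.0, 3.0])))
```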

    High probability generalization bounds for uniformly stable algorithms with nearly optimal rate

    Algorithmic stability is a classical approach to understanding and analysis of the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability has been established in a landmark paper of Bousquet and Elisseeff (2002), albeit at the expense of an additional $\sqrt{n}$ factor in the bound. Specifically, their bound on the estimation error of any $\gamma$-uniformly stable learning algorithm on $n$ samples and range in $[0,1]$ is $O(\gamma \sqrt{n \log(1/\delta)} + \sqrt{\log(1/\delta)/n})$ with probability $\geq 1-\delta$. The $\sqrt{n}$ overhead makes the bound vacuous in the common settings where $\gamma \geq 1/\sqrt{n}$. A stronger bound was recently proved by the authors (Feldman and Vondrak, 2018) that reduces the overhead to at most $O(n^{1/4})$. Still, both of these results give optimal generalization bounds only when $\gamma = O(1/n)$. We prove a nearly tight bound of $O(\gamma \log(n)\log(n/\delta) + \sqrt{\log(1/\delta)/n})$ on the estimation error of any $\gamma$-uniformly stable algorithm. It implies that for algorithms that are uniformly stable with $\gamma = O(1/\sqrt{n})$, the estimation error is essentially the same as the sampling error. Our result leads to the first high-probability generalization bounds for multi-pass stochastic gradient descent and regularized ERM for stochastic convex problems with nearly optimal rate, resolving open problems in prior work. Our proof technique is new and we introduce several analysis tools that might find additional applications. Comment: This is a follow-up to and has minor text overlap with arXiv:1812.09859; v2: minor revision following acceptance for presentation at COLT 2019.
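
    A rough numeric comparison of the estimation-error terms quoted above, at γ = 1/√n and with constants dropped: the Bousquet-Elisseeff term stays bounded away from zero while the new term decays with n. Purely illustrative; δ and the range of n are arbitrary.

```python
# Compare (constants ignored) the Bousquet-Elisseeff high-probability term
# gamma * sqrt(n * log(1/delta)) + sqrt(log(1/delta) / n) with the new term
# gamma * log(n) * log(n/delta) + sqrt(log(1/delta) / n), at gamma = 1/sqrt(n).
# Arbitrary delta and n; illustrative only.
import math

delta = 0.01
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    gamma = 1.0 / math.sqrt(n)
    sampling = math.sqrt(math.log(1 / delta) / n)
    be02 = gamma * math.sqrt(n * math.log(1 / delta)) + sampling
    new = gamma * math.log(n) * math.log(n / delta) + sampling
    print(f"n = {n:>8}   BE (2002) ~ {be02:.3f}   this paper ~ {new:.3f}")
```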