13 research outputs found
An equivalence between private classification and online prediction
https://arxiv.org/pdf/2003.00563.pdf
Online Agnostic Boosting via Regret Minimization
Boosting is a widely used machine learning approach based on the idea of
aggregating weak learning rules. While in statistical learning numerous
boosting methods exist both in the realizable and agnostic settings, in online
learning they exist only in the realizable case. In this work we provide the
first agnostic online boosting algorithm; that is, given a weak learner with
only marginally-better-than-trivial regret guarantees, our algorithm boosts it
to a strong learner with sublinear regret.
Our algorithm is based on an abstract (and simple) reduction to online convex
optimization, which efficiently converts an arbitrary online convex optimizer
to an online booster.
Moreover, this reduction extends to the statistical as well as the online
realizable settings, thus unifying the four cases of statistical/online and
agnostic/realizable boosting.
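The abstract does not spell out the reduction, so the following is a minimal illustrative sketch of the general pattern it describes: several weak online learners combined by a weighted vote, with the mixture weights maintained by an arbitrary online convex optimizer on a convex surrogate loss. All names here (OnlineBooster, OnlineGradientDescent, project_simplex), the hinge surrogate, and the weak-learner interface are hypothetical choices for illustration; the paper's actual reduction differs in how the optimizer's iterates are fed back to the weak learners.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

class OnlineGradientDescent:
    """A generic online convex optimizer: projected OGD over the simplex."""
    def __init__(self, n, lr=0.1):
        self.w = np.full(n, 1.0 / n)
        self.lr = lr

    def step(self, grad):
        # One OGD step followed by projection back onto the simplex.
        self.w = project_simplex(self.w - self.lr * grad)
        return self.w

class OnlineBooster:
    """Hypothetical booster: a weighted vote over weak online learners,
    with mixture weights driven by an online convex optimizer."""
    def __init__(self, weak_learners, lr=0.1):
        self.weak = weak_learners
        self.oco = OnlineGradientDescent(len(weak_learners), lr=lr)

    def predict(self, x):
        # Cache the weak predictions (each in {-1, +1}) for the update step.
        self._preds = np.array([wl.predict(x) for wl in self.weak])
        vote = float(self.oco.w @ self._preds)
        return 1.0 if vote >= 0 else -1.0

    def update(self, x, y):
        # Subgradient of the hinge loss of the vote w.r.t. the mixture
        # weights; the OCO step gives sublinear regret against the best
        # fixed mixture of weak learners.
        margin = y * float(self.oco.w @ self._preds)
        grad = -y * self._preds if margin < 1.0 else np.zeros(len(self.weak))
        self.oco.step(grad)
        for wl in self.weak:
            wl.update(x, y)
```

Swapping OnlineGradientDescent for any other regret-minimizing OCO algorithm leaves the booster unchanged, which is the interface the abstract's "arbitrary online convex optimizer" claim refers to.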
Information Theoretic Lower Bounds for Information Theoretic Upper Bounds
We examine how the mutual information between the output model and the
empirical sample relates to the generalization of the algorithm in the context
of stochastic convex optimization. Despite increasing interest in
information-theoretic generalization bounds, it remains unclear whether these
bounds can provide insight into the exceptional performance of various learning
algorithms. Our study of stochastic convex optimization reveals that, for true
risk minimization, dimension-dependent mutual information is necessary. This
indicates that existing information-theoretic generalization bounds fall short
in capturing the generalization capabilities of algorithms like SGD and
regularized ERM, which have dimension-independent sample complexity.
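For context, the canonical bound in this line of work (not quoted in the abstract, but standard since Xu and Raginsky, 2017) controls the expected generalization gap by the mutual information between the learned model W and the n-point sample S:

```latex
% For an n-point sample S, output model W, true risk L_D, empirical risk L_S,
% and a loss that is \sigma-subgaussian under the data distribution:
\[
  \bigl|\,\mathbb{E}\bigl[L_{\mathcal{D}}(W) - L_{S}(W)\bigr]\bigr|
  \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W;S)}{n}}
\]
```

Read against this bound, the abstract's claim is that in stochastic convex optimization any true risk minimizer must have I(W;S) grow with the dimension, so the right-hand side cannot explain the dimension-independent sample complexity of SGD or regularized ERM.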
How unfair is private learning?
As machine learning algorithms are deployed on sensitive data in critical
decision making processes, it is becoming increasingly important that they are
also private and fair. In this paper, we show that, when the data has a
long-tailed structure, it is not possible to build accurate learning algorithms
that are both private and achieve high accuracy on minority
subpopulations. We further show that relaxing overall accuracy can lead to good
fairness even with strict privacy requirements. To corroborate our theoretical
results in practice, we provide an extensive set of experimental results using
a variety of synthetic, vision (CIFAR10 and CelebA), and tabular (Law School)
datasets and learning algorithms.
Comment: Accepted as an oral paper at UAI '2022; major update on 23 Dec, 2022.