Private Incremental Regression
Data is continuously generated by modern data sources, and a recent challenge
in machine learning has been to develop techniques that perform well in an
incremental (streaming) setting. In this paper, we investigate the problem of
private machine learning where, as is common in practice, the data is not given
all at once but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental
regression where the general goal is to always maintain a good empirical risk
minimizer for the history observed under differential privacy. Our first
contribution is a generic transformation of private batch ERM mechanisms into
private incremental ERM mechanisms, based on a simple idea of invoking the
private batch ERM procedure at some regular time intervals. We take this
construction as a baseline for comparison. We then provide two mechanisms for
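The baseline transformation can be sketched as follows. This is a minimal illustration, not the paper's construction: `private_batch_erm` is a hypothetical stand-in (here, a 1-D private mean via Laplace output perturbation), and the even split of the privacy budget across reruns is an assumed composition strategy.

```python
import math
import random

def private_batch_erm(data, epsilon):
    # Hypothetical stand-in for any private batch ERM mechanism:
    # a 1-D mean released with Laplace noise (output perturbation).
    # For data in [0, 1], the mean has sensitivity 1/n.
    mean = sum(data) / len(data)
    scale = 1.0 / (len(data) * epsilon)
    return mean + random.expovariate(1.0 / scale) * random.choice([-1.0, 1.0])

def incremental_erm(stream, epsilon, interval):
    """Invoke the private batch mechanism on the observed history every
    `interval` timesteps, splitting the total budget evenly (an assumed
    sequential-composition accounting) across the reruns."""
    history, model = [], None
    budget_per_run = epsilon / math.ceil(len(stream) / interval)
    for t, point in enumerate(stream, start=1):
        history.append(point)
        if t % interval == 0 or t == len(stream):
            model = private_batch_erm(history, budget_per_run)
        yield model
```

Between reruns the mechanism simply serves the last released model, which is what makes this a natural baseline to compare against.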
the private incremental regression problem. Our first mechanism is based on
privately constructing a noisy incremental gradient function, which is then
used in a modified projected gradient procedure at every timestep. This
mechanism has an excess empirical risk of , where is the
dimensionality of the data. While from the results of [Bassily et al. 2014]
this bound is tight in the worst-case, we show that certain geometric
properties of the input and constraint set can be used to derive significantly
better results for certain interesting regression problems.
Comment: To appear in PODS 2017
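The noisy-gradient mechanism described above can be illustrated with a toy sketch. This is not the paper's private gradient construction: it assumes squared loss, projection onto an L2 ball as the constraint set, and plain Gaussian perturbation of each gradient as an illustrative stand-in for the privately constructed incremental gradient function.

```python
import random

def project_l2(theta, radius):
    # Euclidean projection onto the L2 ball of the given radius.
    norm = sum(x * x for x in theta) ** 0.5
    if norm <= radius:
        return theta
    return [x * radius / norm for x in theta]

def noisy_projected_gradient(points, radius=1.0, eta=0.1, sigma=0.05):
    """One pass over a stream of (x, y) pairs for least-squares regression.
    At each timestep the gradient is perturbed with Gaussian noise (an
    illustrative stand-in for a private gradient oracle), and a projected
    gradient step keeps the iterate inside the constraint set."""
    d = len(points[0][0])
    theta = [0.0] * d
    for x, y in points:
        pred = sum(t * xi for t, xi in zip(theta, x))
        grad = [2.0 * (pred - y) * xi for xi in x]
        noisy = [g + random.gauss(0.0, sigma) for g in grad]
        theta = project_l2([t - eta * g for t, g in zip(theta, noisy)], radius)
    return theta
```

The projection step is what makes the geometry of the constraint set matter, which is where the improved bounds mentioned above come from.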
On the Power of Multiple Anonymous Messages
An exciting new development in differential privacy is the shuffled model, in
which an anonymous channel enables non-interactive, differentially private
protocols with error much smaller than what is possible in the local model,
while relying on weaker trust assumptions than in the central model. In this
paper, we study basic counting problems in the shuffled model and establish
separations between the error that can be achieved in the single-message
shuffled model and in the shuffled model with multiple messages per user.
For the problem of frequency estimation for users and a domain of size
, we obtain:
- A nearly tight lower bound of on the error in the single-message shuffled model. This implies
that the protocols obtained from the amplification via shuffling work of
Erlingsson et al. (SODA 2019) and Balle et al. (Crypto 2019) are essentially
optimal for single-message protocols. A key ingredient in the proof is a lower
bound on the error of locally-private frequency estimation in the low-privacy
(aka high ) regime.
- Protocols in the multi-message shuffled model with
bits of communication per user and error, which provide an
exponential improvement on the error compared to what is possible with
single-message algorithms.
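As a toy illustration of the single-message shuffled model discussed above, the sketch below does binary counting rather than frequency estimation over a large domain: each user sends one randomized-response bit, the shuffler retains only the multiset of messages (modeled by a random permutation), and the analyzer debiases the sum. The flip probability `p` is an illustrative parameter, not a bound from the paper.

```python
import random

def local_randomizer(bit, p=0.75):
    # Binary randomized response: report the true bit with
    # probability p, the flipped bit otherwise.
    return bit if random.random() < p else 1 - bit

def shuffled_count(bits, p=0.75):
    """Single-message shuffled protocol for counting: users randomize
    locally, the shuffler discards order, and the analyzer debiases
    the noisy sum using E[sum] = p*k + (1-p)*(n-k) for true count k."""
    messages = [local_randomizer(b, p) for b in bits]
    random.shuffle(messages)  # anonymity: only the multiset survives
    n = len(bits)
    return (sum(messages) - (1 - p) * n) / (2 * p - 1)
```

The point of the shuffled model is that the anonymity of the channel amplifies the privacy of this local randomizer, allowing much smaller error than the local model for the same privacy level.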
For the related selection problem on a domain of size , we prove:
- A nearly tight lower bound of on the number of users in the
single-message shuffled model. This significantly improves on the
lower bound obtained by Cheu et al. (Eurocrypt 2019), and
when combined with their -error multi-message protocol,
implies the first separation between single-message and multi-message protocols
for this problem.
Comment: 70 pages, 2 figures, 3 tables