    Private Incremental Regression

    Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning where, as is common in practice, the data is not given at once but arrives incrementally over time. We introduce the problems of private incremental ERM and private incremental regression, where the general goal is to always maintain a good empirical risk minimizer for the history observed so far, under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on the simple idea of invoking the private batch ERM procedure at regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx \sqrt{d}$, where $d$ is the dimensionality of the data. Although the results of [Bassily et al. 2014] imply that this bound is tight in the worst case, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems. Comment: To appear in PODS 201
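
    The baseline transformation described in the abstract (invoking a batch ERM procedure at regular intervals) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's construction: the names `PeriodicIncrementalERM`, `tau`, `horizon`, and `nonprivate_slope` are hypothetical, the even split of the privacy budget relies only on basic composition, and the stand-in solver is deliberately NOT private.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[Sequence[float], float]  # (covariates x, response y)
BatchSolver = Callable[[List[Point], float], List[float]]  # (data, epsilon) -> parameter


class PeriodicIncrementalERM:
    """Re-run a batch ERM solver every `tau` arrivals; reuse the last output otherwise.

    Hypothetical wrapper for illustration only.
    """

    def __init__(self, batch_solver: BatchSolver, tau: int, horizon: int,
                 total_epsilon: float, dim: int) -> None:
        self.batch_solver = batch_solver
        self.tau = tau
        # Basic composition: k invocations at total_epsilon / k each
        # consume total_epsilon overall.
        self.num_calls = horizon // tau
        self.eps_per_call = total_epsilon / max(1, self.num_calls)
        self.history: List[Point] = []
        self.theta: List[float] = [0.0] * dim
        self.calls_made = 0

    def observe(self, point: Point) -> List[float]:
        """Record one arrival and return the current parameter estimate."""
        self.history.append(point)
        due = len(self.history) % self.tau == 0
        if due and self.calls_made < self.num_calls:
            self.theta = self.batch_solver(self.history, self.eps_per_call)
            self.calls_made += 1
        return self.theta


def nonprivate_slope(data: List[Point], epsilon: float) -> List[float]:
    """Placeholder solver: 1-D least squares through the origin.

    NOT differentially private; `epsilon` is ignored. A real private batch
    ERM mechanism would be substituted here.
    """
    num = sum(x[0] * y for x, y in data)
    den = sum(x[0] * x[0] for x, _ in data)
    return [num / den] if den else [0.0]
```

    Between invocations the wrapper simply returns the most recent output, so the interval `tau` trades staleness of the estimate against how thinly the privacy budget is spread.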

    On the Power of Multiple Anonymous Messages

    An exciting new development in differential privacy is the shuffled model, in which an anonymous channel enables non-interactive, differentially private protocols with error much smaller than what is possible in the local model, while relying on weaker trust assumptions than in the central model. In this paper, we study basic counting problems in the shuffled model and establish separations between the error that can be achieved in the single-message shuffled model and in the shuffled model with multiple messages per user.

    For the problem of frequency estimation for $n$ users and a domain of size $B$, we obtain:
    - A nearly tight lower bound of $\tilde{\Omega}(\min(\sqrt[4]{n}, \sqrt{B}))$ on the error in the single-message shuffled model. This implies that the protocols obtained from the amplification via shuffling work of Erlingsson et al. (SODA 2019) and Balle et al. (Crypto 2019) are essentially optimal for single-message protocols. A key ingredient in the proof is a lower bound on the error of locally-private frequency estimation in the low-privacy (aka high $\epsilon$) regime.
    - Protocols in the multi-message shuffled model with $\mathrm{poly}(\log B, \log n)$ bits of communication per user and $\mathrm{poly}\log B$ error, which provide an exponential improvement on the error compared to what is possible with single-message algorithms.

    For the related selection problem on a domain of size $B$, we prove:
    - A nearly tight lower bound of $\Omega(B)$ on the number of users in the single-message shuffled model. This significantly improves on the $\Omega(B^{1/17})$ lower bound obtained by Cheu et al. (Eurocrypt 2019), and when combined with their $\tilde{O}(\sqrt{B})$-error multi-message protocol, implies the first separation between single-message and multi-message protocols for this problem.

    Comment: 70 pages, 2 figures, 3 table
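
    To make the single-message setting concrete, here is a minimal sketch of frequency estimation in which each user sends one message produced by B-ary randomized response, the messages are shuffled, and the analyzer debiases the observed counts. It is not the protocol analyzed in the paper: the function names are hypothetical, `epsilon` is assumed to be positive and is the local randomizer's parameter, and the privacy amplification obtained from shuffling is not computed here.

```python
import math
import random
from collections import Counter
from typing import List


def randomize(value: int, B: int, epsilon: float, rng: random.Random) -> int:
    """B-ary randomized response: keep the true value with probability
    e^eps / (e^eps + B - 1), otherwise report a uniformly random other value."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + B - 1)
    if rng.random() < p_keep:
        return value
    other = rng.randrange(B - 1)  # uniform over the B - 1 values != value
    return other if other < value else other + 1


def shuffle(messages: List[int], rng: random.Random) -> List[int]:
    """Stand-in for the anonymous channel: apply a uniformly random permutation."""
    shuffled = list(messages)
    rng.shuffle(shuffled)
    return shuffled


def estimate_counts(messages: List[int], B: int, epsilon: float) -> List[float]:
    """Unbiased count estimates: (observed_v - n*q) / (p - q) for each value v."""
    n = len(messages)
    p = math.exp(epsilon) / (math.exp(epsilon) + B - 1)  # P[report v | true v]
    q = 1.0 / (math.exp(epsilon) + B - 1)                # P[report v | true u != v]
    observed = Counter(messages)
    return [(observed[v] - n * q) / (p - q) for v in range(B)]
```

    Because the analyzer only uses the histogram of messages, the shuffle does not change the estimates; its role in the model is to hide which user sent which message.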