
    Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data

    Get PDF
    In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from the classical setting, our model allows the server to access additional public but unlabeled data. By using Stein's lemma and its variants, we first show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLM (under some mild assumptions) if each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Then, with high probability, the sample complexity of the public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$-norm), is $O(p^2\alpha^{-2})$ and $O(p^2\alpha^{-2}\epsilon^{-2})$, respectively, if $\alpha$ is not too small (i.e., $\alpha \geq \Omega(\frac{1}{\sqrt{p}})$), where $p$ is the dimensionality of the data. This is a significant improvement over the previously known quasi-polynomial (in $\alpha$) or exponential (in $p$) complexity of GLM with no public data. Our algorithm can also answer multiple (at most $\exp(O(p))$) GLM queries with the same sample complexities as in the single-query case, with at least constant probability. We then extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.
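    To make the Stein's-lemma idea concrete, here is a minimal numerical sketch of a one-round (non-interactive) estimator of this flavor: each user releases a Gaussian-mechanism-perturbed version of $x_i y_i$, and the server combines the noisy average with a covariance estimate built from the public unlabeled data. The function names, the boundedness parameter `B`, and the absorption of the scalar $\mathbb{E}[f'(\langle w, x\rangle)]$ into the estimate are illustrative assumptions, not the authors' exact algorithm.

    ```python
    import numpy as np

    def gaussian_noise_scale(eps, delta, sensitivity):
        # Standard noise calibration for the (eps, delta) Gaussian mechanism.
        return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    def nldp_glm_sketch(X_priv, y_priv, X_pub, eps, delta, B=1.0):
        """Hypothetical one-round NLDP sketch: privatize x_i * y_i locally,
        then use public unlabeled data for the covariance. Assumes each
        coordinate of x_i * y_i lies in [-B, B], so L2 sensitivity <= 2*B*sqrt(p)."""
        n, p = X_priv.shape
        sigma = gaussian_noise_scale(eps, delta, sensitivity=2.0 * B * np.sqrt(p))
        # Each user adds independent Gaussian noise before sending (local DP).
        Z = X_priv * y_priv[:, None] + np.random.normal(0.0, sigma, size=(n, p))
        t_hat = Z.mean(axis=0)                        # private estimate of E[x y]
        Sigma_hat = X_pub.T @ X_pub / X_pub.shape[0]  # covariance from public data
        # Stein's lemma: E[x y] = c * Sigma * w, with c = E[f'(<w, x>)] absorbed.
        return np.linalg.solve(Sigma_hat, t_hat)
    ```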

    Estimating smooth GLM in non-interactive local differential privacy model with public unlabeled data

    Get PDF
    In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from the classical setting, our model allows the server to access additional public but unlabeled data. Firstly, motivated by Stein's lemma, we show that if each data record is i.i.d. sampled from a zero-mean Gaussian distribution, then there exists an $(\epsilon, \delta)$-NLDP algorithm for GLM. The sample complexity of the public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_2$-norm) with high probability, is $O(p\alpha^{-2})$ and $O(p^3\alpha^{-2}\epsilon^{-2})$, respectively. This is a significant improvement over the previously known sample complexity of GLM with no public data, which is exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$. Then, by a variant of Stein's lemma, we show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLM (under some mild assumptions) if each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. The sample complexity of the public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$-norm) with high probability, is then $O(p^2\alpha^{-2})$ and $O(p^2\alpha^{-2}\epsilon^{-2})$, respectively, if $\alpha$ is not too small (i.e., $\alpha \geq \Omega(1/\sqrt{p})$), where $p$ is the dimensionality of the data. We also extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.
    http://proceedings.mlr.press/v132/wang21a/wang21a.pdf
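    For context, the Gaussian-case reduction rests on Stein's lemma; a standard statement of the identity being exploited (our paraphrase, not quoted from the paper) is:

    ```latex
    % Stein's lemma for x ~ N(0, \Sigma) applied to a GLM link f:
    % differentiating under the expectation gives
    \mathbb{E}[x\, f(\langle w, x\rangle)]
      = \Sigma\, w\, \mathbb{E}[f'(\langle w, x\rangle)],
    % and since E[y | x] = f(<w, x>) for a GLM, the parameter w is
    % recoverable (up to the scalar c = E[f'(<w, x>)]) from the moment
    % E[x y] and the covariance \Sigma, which the server can estimate
    % from the public unlabeled data:
    w \;\propto\; \Sigma^{-1}\, \mathbb{E}[x\, y].
    ```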

    On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data

    Full text link
    In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy (NLDP) model. To break the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue in this direction. Specifically, we consider the problem under the standard setting instead of the large-margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches, based on the Massart noise model and on self-supervised learning, and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improves on previous results. Our methods could also be used for other private PAC learning problems.
    Comment: To appear in The 14th Asian Conference on Machine Learning (ACML 2022).
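    As a toy illustration of the public-unlabeled-data template (our own hypothetical sketch, not the paper's construction): users privately release noisy label-weighted features, the server forms a rough direction from them, pseudo-labels the public points with that direction, and then refits a halfspace non-privately on the pseudo-labeled public data.

    ```python
    import numpy as np

    def nldp_halfspace_sketch(X_priv, y_priv, X_pub, eps, delta, B=1.0):
        """Hypothetical one-round sketch: NOT the authors' algorithm.
        Assumes ||x_i||_2 <= B and labels y_i in {-1, +1}."""
        n, p = X_priv.shape
        sigma = 2.0 * B * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        # Step 1 (private): each user perturbs y_i * x_i with Gaussian noise.
        Z = y_priv[:, None] * X_priv + np.random.normal(0.0, sigma, size=(n, p))
        w_rough = Z.mean(axis=0)  # noisy estimate of E[y x], correlated with w*
        # Step 2 (public, non-private): pseudo-label public data and refit,
        # here by averaging pseudo-label-weighted public points.
        y_pseudo = np.sign(X_pub @ w_rough)
        w = (y_pseudo[:, None] * X_pub).mean(axis=0)
        return w / np.linalg.norm(w)
    ```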

    Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data

    Full text link
    We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study general convex loss functions with the assumption that the distribution of gradients has bounded $k$-th moments. We provide improved upper bounds on the excess population risk under concentrated DP for convex and strongly convex loss functions. Along the way, we derive new algorithms for private mean estimation of heavy-tailed distributions, under both pure and concentrated DP. Finally, we prove nearly-matching lower bounds for private stochastic convex optimization with strongly convex losses and for mean estimation, showing new separations between pure and concentrated DP.
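    A standard recipe behind private heavy-tailed mean estimators of this kind (a generic sketch under assumed bounds, not the paper's exact algorithm) is to truncate each sample to a radius tau, average, and add noise calibrated to the truncated sensitivity; tau trades the truncation bias, controlled by the $k$-th moment, against the privacy noise.

    ```python
    import numpy as np

    def private_truncated_mean(x, eps, tau):
        """Pure eps-DP sketch of a 1-D heavy-tailed mean estimator:
        clip each sample to [-tau, tau], average, and add Laplace noise
        scaled to the clipped average's sensitivity 2*tau/n.
        Truncation bias is O(M_k / tau^(k-1)) under a bounded k-th moment
        M_k, so tau is tuned to balance bias against noise ~ tau/(n*eps)."""
        n = x.shape[0]
        clipped = np.clip(x, -tau, tau)
        noise = np.random.laplace(scale=2.0 * tau / (n * eps))
        return clipped.mean() + noise
    ```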

    Efficient Private SCO for Heavy-Tailed Data via Clipping

    Full text link
    We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP). Prior work on this problem is restricted to the gradient descent (GD) method, which is inefficient for large-scale problems. In this paper, we resolve this issue and derive the first high-probability bounds for the private stochastic method with clipping. For general convex problems, we derive excess population risks $\tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n\epsilon)^2}{\beta d}}}{(n\epsilon)^{2/7}}\right)$ and $\tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta d}}{(n\epsilon)^{2/7}}\right)$ under the bounded and unbounded domain assumptions, respectively (here $n$ is the sample size, $d$ is the dimension of the data, $\beta$ is the confidence level, and $\epsilon$ is the privacy level). We then extend our analysis to the strongly convex case and the non-smooth case (which works for generalized smooth objectives with Hölder-continuous gradients). We establish new excess risk bounds without the bounded domain assumption. These results achieve lower excess risks and gradient complexities than existing methods in their corresponding cases. Numerical experiments are conducted to justify the theoretical improvement.
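    The clipping mechanism at the core of such private stochastic methods can be sketched as follows (a generic per-sample-clipped DP-SGD step; the batch size, step size, clip threshold, and composition-based noise calibration here are illustrative, not the paper's tuned choices):

    ```python
    import numpy as np

    def dp_sgd_clipped(grad_fn, w0, data, eps_total, delta, T, lr, C):
        """Generic DP-SGD sketch with per-sample gradient clipping.
        grad_fn(w, x) returns the per-sample gradient; C is the clip norm.
        The noise scale below is a crude composition-based calibration."""
        w = w0.copy()
        sigma = C * np.sqrt(2.0 * T * np.log(1.25 / delta)) / eps_total
        for _ in range(T):
            batch = data[np.random.choice(len(data), size=32)]
            grads = np.stack([grad_fn(w, x) for x in batch])
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))  # clip
            g = grads.mean(axis=0) + np.random.normal(0.0, sigma / len(batch), w.shape)
            w -= lr * g
        return w
    ```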

    Practical Differentially Private and Byzantine-resilient Federated Learning

    Full text link
    Privacy and Byzantine resilience are two indispensable requirements for a federated learning (FL) system. Although there have been extensive studies on privacy and Byzantine security in their own tracks, solutions that consider both remain sparse. This is due to the difficulty of reconciling privacy-preserving and Byzantine-resilient algorithms. In this work, we propose a solution to this two-fold issue. We use our version of the differentially private stochastic gradient descent (DP-SGD) algorithm to preserve privacy and then apply our Byzantine-resilient algorithms. We note that while existing works follow this general approach, an in-depth analysis of the interplay between DP and Byzantine resilience has been missing, leading to unsatisfactory performance. Specifically, for the random noise introduced by DP, previous works strive to reduce its impact on Byzantine aggregation. In contrast, we leverage the random noise to construct an aggregation that effectively rejects many existing Byzantine attacks. We provide both theoretical proofs and empirical experiments to show that our protocol is effective: it retains high accuracy while preserving the DP guarantee and Byzantine resilience. Compared with previous work, our protocol 1) achieves significantly higher accuracy even in a high-privacy regime; and 2) works well even when up to 90% of the distributed workers are Byzantine.
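    To fix ideas, here is a schematic of the general pipeline the abstract describes, with coordinate-wise median standing in for the robust aggregator (the paper's actual noise-aware aggregation rule is not reproduced here):

    ```python
    import numpy as np

    def worker_update(grad, eps, delta, C):
        """Each worker clips its gradient and adds DP noise before sending."""
        g = grad * min(1.0, C / max(np.linalg.norm(grad), 1e-12))
        sigma = C * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        return g + np.random.normal(0.0, sigma, g.shape)

    def robust_aggregate(updates):
        """Stand-in Byzantine-resilient rule: coordinate-wise median of the
        noisy worker updates (the paper designs a noise-aware alternative)."""
        return np.median(np.stack(updates), axis=0)
    ```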

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    The 11th Conference of PhD Students in Computer Science

    Get PDF