
    Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data

    Get PDF
    In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from the classical setting, our model allows the server to access additional public but unlabeled data. By using Stein's lemma and its variants, we first show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLM (under some mild assumptions) if each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Then, with high probability, the sample complexity of the public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$-norm), is $O(p^2\alpha^{-2})$ and $O(p^2\alpha^{-2}\epsilon^{-2})$, respectively, if $\alpha$ is not too small (i.e., $\alpha \geq \Omega(\frac{1}{\sqrt{p}})$), where $p$ is the dimensionality of the data. This is a significant improvement over the previously known quasi-polynomial (in $\alpha$) or exponential (in $p$) complexity of GLM with no public data. Our algorithm can also answer multiple (at most $\exp(O(p))$) GLM queries with the same sample complexities as in the single-query case, with at least constant probability. We then extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.
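    To make the Stein's-lemma idea concrete, here is a minimal numerical sketch of a one-round (non-interactive) estimator of this flavor: each user releases a Gaussian-mechanism-perturbed version of $x_i y_i$, and the server combines the noisy average with a covariance estimate built from the public unlabeled data. The function names, the boundedness parameter `B`, and the absorption of the scalar $\mathbb{E}[f'(\langle w, x\rangle)]$ into the estimate are illustrative assumptions, not the authors' exact algorithm.

    ```python
    import numpy as np

    def gaussian_noise_scale(eps, delta, sensitivity):
        # Standard noise calibration for the (eps, delta) Gaussian mechanism.
        return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    def nldp_glm_sketch(X_priv, y_priv, X_pub, eps, delta, B=1.0):
        """Hypothetical one-round NLDP sketch: privatize x_i * y_i locally,
        then use public unlabeled data for the covariance. Assumes each
        coordinate of x_i * y_i lies in [-B, B], so L2 sensitivity <= 2*B*sqrt(p)."""
        n, p = X_priv.shape
        sigma = gaussian_noise_scale(eps, delta, sensitivity=2.0 * B * np.sqrt(p))
        # Each user adds independent Gaussian noise before sending (local DP).
        Z = X_priv * y_priv[:, None] + np.random.normal(0.0, sigma, size=(n, p))
        t_hat = Z.mean(axis=0)                        # private estimate of E[x y]
        Sigma_hat = X_pub.T @ X_pub / X_pub.shape[0]  # covariance from public data
        # Stein's lemma: E[x y] = c * Sigma * w, with c = E[f'(<w, x>)] absorbed.
        return np.linalg.solve(Sigma_hat, t_hat)
    ```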

    Estimating smooth GLM in non-interactive local differential privacy model with public unlabeled data

    Get PDF
    In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from the classical setting, our model allows the server to access additional public but unlabeled data. Firstly, motivated by Stein's lemma, we show that if each data record is i.i.d. sampled from a zero-mean Gaussian distribution, then there exists an $(\epsilon, \delta)$-NLDP algorithm for GLM. The sample complexity of the public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_2$-norm) with high probability, is $O(p\alpha^{-2})$ and $O(p^3\alpha^{-2}\epsilon^{-2})$, respectively. This is a significant improvement over the previously known sample complexity of GLM with no public data, which is exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$. Then, by a variant of Stein's lemma, we show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLM (under some mild assumptions) if each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. The sample complexity of the public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$-norm) with high probability, is then $O(p^2\alpha^{-2})$ and $O(p^2\alpha^{-2}\epsilon^{-2})$, respectively, if $\alpha$ is not too small (i.e., $\alpha \geq \Omega(1/\sqrt{p})$), where $p$ is the dimensionality of the data. We also extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.
    http://proceedings.mlr.press/v132/wang21a/wang21a.pdf
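    For context, the Gaussian-case reduction rests on Stein's lemma; a standard statement of the identity being exploited (our paraphrase, not quoted from the paper) is:

    ```latex
    % Stein's lemma for x ~ N(0, \Sigma) applied to a GLM link f:
    % differentiating under the expectation gives
    \mathbb{E}[x\, f(\langle w, x\rangle)]
      = \Sigma\, w\, \mathbb{E}[f'(\langle w, x\rangle)],
    % and since E[y | x] = f(<w, x>) for a GLM, the parameter w is
    % recoverable (up to the scalar c = E[f'(<w, x>)]) from the moment
    % E[x y] and the covariance \Sigma, which the server can estimate
    % from the public unlabeled data:
    w \;\propto\; \Sigma^{-1}\, \mathbb{E}[x\, y].
    ```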

    On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data

    Full text link
    In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy (NLDP) model. To break the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue in this direction. Specifically, we consider the problem under the standard setting instead of the large-margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches, based on the Massart noise model and on self-supervised learning, and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improves on previous results. Our methods could also be used for other private PAC learning problems.
    Comment: To appear in The 14th Asian Conference on Machine Learning (ACML 2022).
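    As a toy illustration of the public-unlabeled-data template (our own hypothetical sketch, not the paper's construction): users privately release noisy label-weighted features, the server forms a rough direction from them, pseudo-labels the public points with that direction, and then refits a halfspace non-privately on the pseudo-labeled public data.

    ```python
    import numpy as np

    def nldp_halfspace_sketch(X_priv, y_priv, X_pub, eps, delta, B=1.0):
        """Hypothetical one-round sketch: NOT the authors' algorithm.
        Assumes ||x_i||_2 <= B and labels y_i in {-1, +1}."""
        n, p = X_priv.shape
        sigma = 2.0 * B * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        # Step 1 (private): each user perturbs y_i * x_i with Gaussian noise.
        Z = y_priv[:, None] * X_priv + np.random.normal(0.0, sigma, size=(n, p))
        w_rough = Z.mean(axis=0)  # noisy estimate of E[y x], correlated with w*
        # Step 2 (public, non-private): pseudo-label public data and refit,
        # here by averaging pseudo-label-weighted public points.
        y_pseudo = np.sign(X_pub @ w_rough)
        w = (y_pseudo[:, None] * X_pub).mean(axis=0)
        return w / np.linalg.norm(w)
    ```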

    Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data

    Full text link
    We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study general convex loss functions with the assumption that the distribution of gradients has bounded $k$-th moments. We provide improved upper bounds on the excess population risk under concentrated DP for convex and strongly convex loss functions. Along the way, we derive new algorithms for private mean estimation of heavy-tailed distributions, under both pure and concentrated DP. Finally, we prove nearly-matching lower bounds for private stochastic convex optimization with strongly convex losses and for mean estimation, showing new separations between pure and concentrated DP.
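    A standard recipe behind private heavy-tailed mean estimators of this kind (a generic sketch under assumed bounds, not the paper's exact algorithm) is to truncate each sample to a radius tau, average, and add noise calibrated to the truncated sensitivity; tau trades the truncation bias, controlled by the $k$-th moment, against the privacy noise.

    ```python
    import numpy as np

    def private_truncated_mean(x, eps, tau):
        """Pure eps-DP sketch of a 1-D heavy-tailed mean estimator:
        clip each sample to [-tau, tau], average, and add Laplace noise
        scaled to the clipped average's sensitivity 2*tau/n.
        Truncation bias is O(M_k / tau^(k-1)) under a bounded k-th moment
        M_k, so tau is tuned to balance bias against noise ~ tau/(n*eps)."""
        n = x.shape[0]
        clipped = np.clip(x, -tau, tau)
        noise = np.random.laplace(scale=2.0 * tau / (n * eps))
        return clipped.mean() + noise
    ```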

    Efficient Private SCO for Heavy-Tailed Data via Clipping

    Full text link
    We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP). Prior work on this problem is restricted to the gradient descent (GD) method, which is inefficient for large-scale problems. In this paper, we resolve this issue and derive the first high-probability bounds for the private stochastic method with clipping. For general convex problems, we derive excess population risks $\tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n\epsilon)^2}{\beta d}}}{(n\epsilon)^{2/7}}\right)$ and $\tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta d}}{(n\epsilon)^{2/7}}\right)$ under the bounded and unbounded domain assumptions, respectively (here $n$ is the sample size, $d$ is the dimension of the data, $\beta$ is the confidence level, and $\epsilon$ is the privacy level). We then extend our analysis to the strongly convex case and the non-smooth case (which works for generalized smooth objectives with Hölder-continuous gradients). We establish new excess risk bounds without the bounded domain assumption. These results achieve lower excess risks and gradient complexities than existing methods in their corresponding cases. Numerical experiments are conducted to justify the theoretical improvement.
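    The clipping mechanism at the core of such private stochastic methods can be sketched as follows (a generic per-sample-clipped DP-SGD step; the batch size, step size, clip threshold, and composition-based noise calibration here are illustrative, not the paper's tuned choices):

    ```python
    import numpy as np

    def dp_sgd_clipped(grad_fn, w0, data, eps_total, delta, T, lr, C):
        """Generic DP-SGD sketch with per-sample gradient clipping.
        grad_fn(w, x) returns the per-sample gradient; C is the clip norm.
        The noise scale below is a crude composition-based calibration."""
        w = w0.copy()
        sigma = C * np.sqrt(2.0 * T * np.log(1.25 / delta)) / eps_total
        for _ in range(T):
            batch = data[np.random.choice(len(data), size=32)]
            grads = np.stack([grad_fn(w, x) for x in batch])
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))  # clip
            g = grads.mean(axis=0) + np.random.normal(0.0, sigma / len(batch), w.shape)
            w -= lr * g
        return w
    ```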

    Practical Differentially Private and Byzantine-resilient Federated Learning

    Full text link
    Privacy and Byzantine resilience are two indispensable requirements for a federated learning (FL) system. Although there have been extensive studies on privacy and Byzantine security in their own tracks, solutions that consider both remain sparse. This is due to the difficulty of reconciling privacy-preserving and Byzantine-resilient algorithms. In this work, we propose a solution to this two-fold issue. We use our version of the differentially private stochastic gradient descent (DP-SGD) algorithm to preserve privacy and then apply our Byzantine-resilient algorithms. We note that while existing works follow this general approach, an in-depth analysis of the interplay between DP and Byzantine resilience has been missing, leading to unsatisfactory performance. Specifically, for the random noise introduced by DP, previous works strive to reduce its impact on Byzantine aggregation. In contrast, we leverage the random noise to construct an aggregation that effectively rejects many existing Byzantine attacks. We provide both theoretical proofs and empirical experiments to show that our protocol is effective: it retains high accuracy while preserving the DP guarantee and Byzantine resilience. Compared with previous work, our protocol 1) achieves significantly higher accuracy even in a high-privacy regime; and 2) works well even when up to 90% of the distributed workers are Byzantine.
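    To fix ideas, here is a schematic of the general pipeline the abstract describes, with coordinate-wise median standing in for the robust aggregator (the paper's actual noise-aware aggregation rule is not reproduced here):

    ```python
    import numpy as np

    def worker_update(grad, eps, delta, C):
        """Each worker clips its gradient and adds DP noise before sending."""
        g = grad * min(1.0, C / max(np.linalg.norm(grad), 1e-12))
        sigma = C * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        return g + np.random.normal(0.0, sigma, g.shape)

    def robust_aggregate(updates):
        """Stand-in Byzantine-resilient rule: coordinate-wise median of the
        noisy worker updates (the paper designs a noise-aware alternative)."""
        return np.median(np.stack(updates), axis=0)
    ```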

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    The 11th Conference of PhD Students in Computer Science

    Get PDF