Contraction of Locally Differentially Private Mechanisms
We investigate the contraction properties of locally differentially private
mechanisms. More specifically, we derive tight upper bounds on the divergence
between the output distributions $PK$ and $QK$ of an
$\epsilon$-LDP mechanism $K$ in terms of a divergence between the
corresponding input distributions $P$ and $Q$, respectively. Our first main
technical result presents a sharp upper bound on the $\chi^2$-divergence
$\chi^2(PK\|QK)$ in terms of $\chi^2(P\|Q)$ and
$\epsilon$. We also show that the same result holds for a large family of
divergences, including KL-divergence and squared Hellinger distance. The second
main technical result gives an upper bound on the divergence between the
output distributions in terms of the total variation distance $\mathrm{TV}(P,Q)$
and $\epsilon$. We then use these bounds to
establish locally private versions of the van Trees inequality and of Le Cam's,
Assouad's, and mutual information methods, which are powerful tools for
bounding minimax estimation risks. These results are shown to lead to better
privacy analyses than the state of the art in several statistical problems,
such as entropy and discrete distribution estimation, non-parametric density
estimation, and hypothesis testing.
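The contraction idea is easiest to see for total variation. The sketch below is not the paper's $\chi^2$-divergence bound. It checks the classical fact that any channel contracts TV distance by its Dobrushin coefficient, meaning the largest TV distance between two of its rows. The example mechanism is $k$-ary randomized response, an assumed choice. For this mechanism the coefficient equals $(e^\epsilon - 1)/(e^\epsilon + k - 1)$, and for any $\epsilon$-LDP mechanism it is at most $(e^\epsilon - 1)/(e^\epsilon + 1) = \tanh(\epsilon/2)$. The helper names are hypothetical.

```python
import itertools
import math
import random


def randomized_response(k, eps):
    """k-ary randomized response as a row-stochastic matrix K[x][y]; it is eps-LDP."""
    denom = math.exp(eps) + k - 1
    # Report the true symbol with weight e^eps, every other symbol with weight 1.
    return [[(math.exp(eps) if x == y else 1.0) / denom for y in range(k)]
            for x in range(k)]


def push_forward(p, kernel):
    """Output distribution PK obtained by sending input distribution p through the channel."""
    return [sum(p[x] * kernel[x][y] for x in range(len(p)))
            for y in range(len(kernel[0]))]


def tv(p, q):
    """Total variation distance between two finite distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def dobrushin(kernel):
    """TV contraction coefficient: the largest TV distance between two rows of the channel."""
    return max(tv(r1, r2) for r1, r2 in itertools.combinations(kernel, 2))


def random_dist(k, rng):
    """A random probability vector of length k (for testing only)."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]
```

For example, with $\epsilon = 1$ and $k = 4$ the coefficient is $(e - 1)/(e + 3) \approx 0.30$. So `tv(push_forward(p, K), push_forward(q, K))` is at most about 30% of `tv(p, q)` for every pair of inputs. Binary randomized response ($k = 2$) attains the general $\tanh(\epsilon/2)$ bound.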
Crystal structure of the outer membrane protein OmpU from Vibrio cholerae at 2.2 Å resolution
Vibrio cholerae causes a severe disease that kills thousands of people annually. OmpU, the most abundant outer membrane protein in V. cholerae, has been identified as an important virulence factor: it is involved in host-cell interaction and recognition and is critical for the survival of pathogenic V. cholerae in the host body and in harsh environments. The mechanisms of these processes are not well understood owing to the lack of a structure of V. cholerae OmpU. Here, the crystal structure of the V. cholerae OmpU trimer is reported at 2.2 Å resolution. The protomer forms a 16-stranded β-barrel with a noncanonical N-terminal coil, consisting of residues Gly32–Ser42, that lies in the lumen of the barrel and is observed to participate in forming the second gate in the pore. Mapping published functional data onto the OmpU structure reinforces the notion that the long extracellular loop L4, with its β-hairpin-like motif, may be critical for host-cell binding and invasion, while loops L3, L4 and L8 are crucially implicated in phage recognition by V. cholerae.
Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data
We study stochastic convex optimization with heavy-tailed data under the
constraint of differential privacy (DP). Most prior work on this problem is
restricted to the case where the loss function is Lipschitz. Instead, following
the setting introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study
general convex loss functions under the assumption that the distribution of
gradients has bounded $k$-th moments. We provide improved upper bounds on the excess
population risk under concentrated DP for convex and strongly convex loss
functions. Along the way, we derive new algorithms for private mean estimation
of heavy-tailed distributions, under both pure and concentrated DP. Finally, we
prove nearly-matching lower bounds for private stochastic convex optimization
with strongly convex losses and mean estimation, showing new separations
between pure and concentrated DP.
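The private mean estimation step can be illustrated with the standard clip-and-noise baseline under concentrated DP. This is a minimal sketch, not the authors' improved estimator. It assumes scalar data, $\rho$-zero-concentrated DP (zCDP), and the Gaussian mechanism. That mechanism satisfies $\rho$-zCDP when $\sigma = \Delta/\sqrt{2\rho}$ for $L_2$ sensitivity $\Delta$. The function names and the Student-t test distribution are hypothetical choices for illustration.

```python
import math
import random


def zcdp_gaussian_sigma(sensitivity, rho):
    """Gaussian noise scale that gives rho-zCDP for a query with this L2 sensitivity."""
    return sensitivity / math.sqrt(2.0 * rho)


def private_clipped_mean(xs, clip, rho, rng):
    """rho-zCDP estimate of the mean of the scalars xs: clip, average, add Gaussian noise."""
    n = len(xs)
    # Clipping to [-clip, clip] bounds each sample's influence on the average.
    clipped_mean = sum(min(max(x, -clip), clip) for x in xs) / n
    # Replacing one sample moves the clipped mean by at most 2 * clip / n.
    sigma = zcdp_gaussian_sigma(2.0 * clip / n, rho)
    return clipped_mean + rng.gauss(0.0, sigma)


def student_t3(rng):
    """One draw from a Student-t distribution with 3 degrees of freedom (heavy-tailed)."""
    chi2 = rng.gammavariate(1.5, 2.0)  # chi-square with 3 degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0)
```

The threshold `clip` trades bias against noise. Large samples are truncated, which adds bias, while the noise scale grows linearly in `clip`. Choosing it well when only a bounded $k$-th moment is known is where the heavy-tailed setting gets delicate.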
Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data
In this paper, we study the problem of estimating smooth Generalized Linear
Models (GLM) in the Non-interactive Local Differential Privacy (NLDP) model.
Unlike the classical setting, our model allows the server to access
some additional public but unlabeled data. By using Stein's lemma and its
variants, we first show that there is an -NLDP algorithm
for GLM (under some mild assumptions), if each data record is sampled i.i.d.
from some sub-Gaussian distribution with bounded -norm. Then with high
probability, the sample complexity of the public and private data, for the
algorithm to achieve an estimation error (in -norm), is
and , respectively, if
is not too small ({\em i.e.,} ), where is the dimensionality of the data. This
is a significant improvement over the previously known quasi-polynomial (in
) or exponential (in ) complexity of GLM with no public data. Also,
our algorithm can answer multiple (at most ) GLM queries with the
same sample complexities as in the single-query case, with at least constant
probability. We then extend our idea to the non-linear regression problem and
show that a similar phenomenon holds there. Finally, we demonstrate the
effectiveness of our algorithms through experiments on both synthetic and
real-world datasets. To the best of our knowledge, this is the first paper
showing the existence of efficient and effective algorithms for GLM and
non-linear regression in the NLDP model with public unlabeled data.
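The abstract credits Stein's lemma. A standard form says that for Gaussian covariates $x \sim N(0, \Sigma)$ and a differentiable link $f$, $E[x f(\langle w, x \rangle)] = \Sigma w \, E[f'(\langle w, x \rangle)]$. Hence $\Sigma^{-1} E[xy]$ points along $w$. The sketch below illustrates this idea under stated assumptions and is not the authors' algorithm. The assumptions are:

- 2-D Gaussian covariates and a logistic link;
- the second-moment matrix is estimated from public unlabeled data;
- each private user releases a clipped $x y$ perturbed by the classical Gaussian mechanism;
- only the direction of $w$ is recovered.

All function names and data parameters are hypothetical.

```python
import math
import random


def gaussian_sigma(eps, delta, sensitivity):
    """Classical Gaussian-mechanism noise scale; (eps, delta)-DP when 0 < eps < 1."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def clip(v, c):
    """Rescale the 2-vector v so that its L2 norm is at most c."""
    norm = math.hypot(v[0], v[1])
    return v if norm <= c else (v[0] * c / norm, v[1] * c / norm)


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def simulate(n, w, rng):
    """Hypothetical data: x ~ N(0, [[2, .5], [.5, 1]]), y ~ Bernoulli(sigmoid(<w, x>))."""
    l21 = 0.5 / math.sqrt(2.0)
    l22 = math.sqrt(1.0 - l21 ** 2)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (math.sqrt(2.0) * z1, l21 * z1 + l22 * z2)  # Cholesky factor times z
        p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1])))
        data.append((x, 1.0 if rng.random() < p else 0.0))
    return data


def estimate_direction(public_x, private_data, sigma, clip_norm, rng):
    """Unit-norm estimate of w / ||w|| via Stein's lemma (Gaussian covariates assumed)."""
    # Server side, public unlabeled data: second-moment matrix (mean-zero x assumed).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in public_x:
        for i in range(2):
            for j in range(2):
                s[i][j] += x[i] * x[j] / len(public_x)
    # User side: each user sends clip(x * y) plus N(0, sigma^2) noise per coordinate.
    m = [0.0, 0.0]
    for x, y in private_data:
        z = clip((x[0] * y, x[1] * y), clip_norm)
        for i in range(2):
            m[i] += (z[i] + rng.gauss(0.0, sigma)) / len(private_data)
    # Server side: Sigma^{-1} E[x y] estimates a positive multiple of w.
    inv = inv2(s)
    v = (inv[0][0] * m[0] + inv[0][1] * m[1], inv[1][0] * m[0] + inv[1][1] * m[1])
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)
```

For a user-level guarantee, clipped vectors lie in a ball of radius `clip_norm`, so the $L_2$ sensitivity passed to `gaussian_sigma` is `2 * clip_norm`. The public data is used only for $\Sigma$, which matches the role the abstract gives to unlabeled public records.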