Privacy-Utility Tradeoff of OLS with Random Projections
We study the differential privacy (DP) of a core ML problem, linear ordinary
least squares (OLS), a.k.a. $\ell_2$-regression. Our key result is that the
approximate LS algorithm (ALS) (Sarlos, 2006), a randomized solution to the OLS
problem primarily used to improve performance on large datasets, also preserves
privacy. ALS achieves a better privacy/utility tradeoff, without modifications
or further noising, when compared to alternative private OLS algorithms which
modify and/or noise OLS. We give the first tight DP-analysis for the ALS
algorithm and the standard Gaussian mechanism (Dwork et al., 2014) applied to
OLS. Our methodology directly improves the privacy analysis of (Blocki et al.,
2012) and (Sheffet, 2019) and introduces new tools which may be of independent
interest: (1) the exact spectrum of $(\epsilon, \delta)$-DP parameters ("DP
spectrum") for mechanisms whose output is a $d$-dimensional Gaussian, and (2)
an improved DP spectrum for random projection (compared to (Blocki et al.,
2012) and (Sheffet, 2019)).
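For reference, the standard definition behind the $(\epsilon, \delta)$ parameters can be written as below. Reading the "DP spectrum" as the smallest $\delta$ attainable at each $\epsilon$ is an interpretation assumed here, not a definition quoted from the abstract; the symbols $M$, $D$, $D'$ and $S$ are notation introduced only for illustration.

```latex
% M: randomized mechanism; D, D': neighboring databases; S: any measurable output set
\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S] + \delta

% Assumed reading of the "DP spectrum": the smallest delta that works for each epsilon
\delta(\epsilon) = \sup_{D \sim D'} \, \sup_{S}
  \left( \Pr[M(D) \in S] - e^{\epsilon} \, \Pr[M(D') \in S] \right)
```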
All methods for private OLS (including ours) assume, often implicitly,
restrictions on the input database, such as bounds on leverage and residuals.
We prove that such restrictions are necessary. Hence, computing the privacy of
mechanisms such as ALS requires estimating these database parameters, which can
be infeasible for big datasets. For more complex ML models, DP bounds may not even
be tractable. There is a need for blackbox DP-estimators (Lu et al., 2022)
which empirically estimate a data-dependent privacy. We demonstrate the
effectiveness of such a DP-estimator by empirically recovering a DP-spectrum
that matches our theory for OLS. This validates the DP-estimator in a
nontrivial ML application, opening the door to its use in more complex
nonlinear ML settings where theory is unavailable.
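To make the idea of solving OLS through a random projection concrete, here is a minimal sketch in Python with NumPy. It assumes a dense Gaussian projection matrix, which is one common choice and not necessarily the transform analyzed in the paper. The function name `sketched_ols`, the sketch size `r` and the synthetic data are all hypothetical, and the code makes no claim about the privacy level achieved.

```python
import numpy as np


def sketched_ols(X, y, r, rng):
    """Approximate least squares: project the n rows down to r rows, then solve OLS.

    Assumption: S has i.i.d. N(0, 1/r) entries (a Gaussian sketch).
    """
    n = X.shape[0]
    S = rng.standard_normal((r, n)) / np.sqrt(r)
    # Solve the smaller problem min_beta ||S X beta - S y||_2.
    beta, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return beta


# Synthetic data (hypothetical): n samples, d features, small Gaussian noise.
rng = np.random.default_rng(0)
n, d, r = 2000, 3, 200
X = rng.standard_normal((n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # exact OLS on the full data
beta_als = sketched_ols(X, y, r, rng)             # OLS on the projected data

print("exact OLS   :", beta_ols)
print("sketched OLS:", beta_als)
```

The sketched solution only sees `S @ X` and `S @ y`, so its randomness comes entirely from the projection; how much privacy that randomness buys depends on database properties such as leverage and residuals, as the abstract notes.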
Extremal Mechanisms for Local Differential Privacy
Local differential privacy has recently surfaced as a strong measure of
privacy in contexts where personal information remains private even from data
analysts. Working in a setting where both the data providers and data analysts
want to maximize the utility of statistical analyses performed on the released
data, we study the fundamental trade-off between local differential privacy and
utility. This trade-off is formulated as a constrained optimization problem:
maximize utility subject to local differential privacy constraints. We
introduce a combinatorial family of extremal privatization mechanisms, which we
call staircase mechanisms, and show that it contains the optimal privatization
mechanisms for a broad class of information theoretic utilities such as mutual
information and $f$-divergences. We further prove that for any utility function
and any privacy level, solving the privacy-utility maximization problem is
equivalent to solving a finite-dimensional linear program, the outcome of which
is the optimal staircase mechanism. However, solving this linear program can be
computationally expensive since it has a number of variables that is
exponential in the size of the alphabet the data lives in. To account for this,
we show that two simple privatization mechanisms, the binary and randomized
response mechanisms, are universally optimal in the low and high privacy
regimes, and well approximate the intermediate regime.
Comment: 52 pages, 10 figures in JMLR 201
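As an illustration of the randomized response mechanism named in this abstract, the following sketch implements the standard k-ary variant and checks its local differential privacy constraint numerically. The function names and the integer encoding of the alphabet as `0..k-1` are assumptions made for the example; the staircase family and the linear program from the paper are not implemented here.

```python
import math
import random


def randomized_response(x, k, epsilon, rng):
    """Report x with probability e^eps / (e^eps + k - 1), else another symbol uniformly."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return x
    other = rng.randrange(k - 1)  # uniform over the k - 1 symbols different from x
    return other if other < x else other + 1


def channel_matrix(k, epsilon):
    """Q[x][y] = probability of reporting y when the true symbol is x."""
    e = math.exp(epsilon)
    return [[(e if y == x else 1.0) / (e + k - 1) for y in range(k)]
            for x in range(k)]


def max_privacy_ratio(Q):
    """Largest Q[x][y] / Q[x2][y] over all inputs x, x2 and outputs y."""
    k = len(Q)
    return max(Q[x][y] / Q[x2][y]
               for y in range(k) for x in range(k) for x2 in range(k))


k, epsilon = 5, 1.0
Q = channel_matrix(k, epsilon)
ratio = max_privacy_ratio(Q)
print("max likelihood ratio:", ratio, "bound e^eps:", math.exp(epsilon))

rng = random.Random(0)
sample = [randomized_response(2, k, epsilon, rng) for _ in range(10)]
print("ten reports for true symbol 2:", sample)
```

The mechanism meets the epsilon-local-DP constraint with equality: every likelihood ratio between two inputs is either 1 or e^epsilon.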
Improving Frequency Estimation under Local Differential Privacy
Local Differential Privacy protocols are stochastic protocols used in data
aggregation when individual users do not trust the data aggregator with their
private data. In such protocols there is a fundamental tradeoff between user
privacy and aggregator utility. In the setting of frequency estimation,
established bounds on this tradeoff are either nonquantitative, or far from
what is known to be attainable. In this paper, we use information-theoretical
methods to significantly improve established bounds. We also show that the new
bounds are attainable for binary inputs. Furthermore, our methods lead to
improved frequency estimators, which we experimentally show to outperform
state-of-the-art methods.
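For context, frequency estimation under local differential privacy can be sketched as follows: each user perturbs their value locally, and the aggregator inverts the known perturbation probabilities. This example uses the baseline debiasing estimator for k-ary randomized response, not the improved estimators proposed in the paper. The names `perturb` and `estimate_frequencies` and the synthetic population are hypothetical.

```python
import math
import random


def perturb(x, k, epsilon, rng):
    """k-ary randomized response: keep x with probability p, else pick another symbol."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return x
    other = rng.randrange(k - 1)
    return other if other < x else other + 1


def estimate_frequencies(reports, k, epsilon):
    """Unbiased estimate of true frequencies from perturbed reports.

    A symbol with true frequency f is observed with rate f * (p - q) + q,
    so the aggregator inverts that affine map.
    """
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true symbol
    q = 1.0 / (e + k - 1)  # probability of reporting one specific other symbol
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return [(c / n - q) / (p - q) for c in counts]


# Synthetic population (hypothetical) with known frequencies.
rng = random.Random(0)
k, epsilon = 4, 1.0
data = [0] * 10000 + [1] * 6000 + [2] * 3000 + [3] * 1000
true_freq = [0.5, 0.3, 0.15, 0.05]

reports = [perturb(x, k, epsilon, rng) for x in data]
est = estimate_frequencies(reports, k, epsilon)
print("true     :", true_freq)
print("estimated:", [round(f, 3) for f in est])
```

The estimates always sum to one, but individual entries can fall slightly below zero for rare symbols. The estimator's variance grows as epsilon shrinks, which is one face of the privacy-utility tradeoff the abstract describes.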