306 research outputs found
Near-Optimal Algorithms for Differentially-Private Principal Components
Principal components analysis (PCA) is a standard tool for identifying good
low-dimensional approximations to data in high dimension. Many data sets of
interest contain private or sensitive information about individuals. Algorithms
which operate on such data should be sensitive to the privacy risks in
publishing their outputs. Differential privacy is a framework for developing
tradeoffs between privacy and the utility of these outputs. In this paper we
investigate the theory and empirical performance of differentially private
approximations to PCA and propose a new method which explicitly optimizes the
utility of the output. We show that the sample complexity of the proposed
method differs from the existing procedure in the scaling with the data
dimension, and that our method is nearly optimal in terms of this scaling. We
furthermore illustrate our results, showing that on real data there is a large
performance gap between the existing method and our method.Comment: 37 pages, 8 figures; final version to appear in the Journal of
Machine Learning Research, preliminary version was at NIPS 201
Pseudorandomness via the discrete Fourier transform
We present a new approach to constructing unconditional pseudorandom
generators against classes of functions that involve computing a linear
function of the inputs. We give an explicit construction of a pseudorandom
generator that fools the discrete Fourier transforms of linear functions with
seed-length that is nearly logarithmic (up to polyloglog factors) in the input
size and the desired error parameter. Our result gives a single pseudorandom
generator that fools several important classes of tests computable in logspace
that have been considered in the literature, including halfspaces (over general
domains), modular tests and combinatorial shapes. For all these classes, our
generator is the first that achieves near logarithmic seed-length in both the
input length and the error parameter. Getting such a seed-length is a natural
challenge in its own right, which needs to be overcome in order to derandomize
RL - a central question in complexity theory.
Our construction combines ideas from a large body of prior work, ranging from
a classical construction of [NN93] to the recent gradually increasing
independence paradigm of [KMN11, CRSW13, GMRTV12], while also introducing some
novel analytic machinery which might find other applications
- …