Nearly Optimal Private Convolution
We study computing the convolution of a private input x with a public input h,
while satisfying the guarantees of (epsilon, delta)-differential
privacy. Convolution is a fundamental operation, intimately related to Fourier
Transforms. In our setting, the private input may represent a time series of
sensitive events or a histogram of a database of confidential personal
information. Convolution then captures important primitives including linear
filtering, which is an essential tool in time series analysis, and aggregation
queries on projections of the data.
We give a nearly optimal algorithm for computing convolutions while
satisfying (epsilon, delta)-differential privacy. Surprisingly, we follow
the simple strategy of adding independent Laplacian noise to each Fourier
coefficient and bounding the privacy loss using the composition theorem of
Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal
noise to add to each Fourier coefficient using convex programming duality. Our
algorithm is very efficient -- it is essentially no more computationally
expensive than a Fast Fourier Transform.
To prove near optimality, we use the recent discrepancy lower bounds of
Muthukrishnan and Nikolov and derive a spectral lower bound using a
characterization of discrepancy in terms of determinants.
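The strategy described in this abstract (perturb each Fourier coefficient of the private input, multiply by the public input's spectrum, then invert) can be sketched as follows. This is only an illustration of the data flow under stated assumptions: the function names, the circular-convolution setting, and the per-coefficient `scales` argument are hypothetical, the closed-form optimal noise allocation from the paper is not reproduced, and the scales are not calibrated to any privacy guarantee. A naive DFT stands in for a Fast Fourier Transform to keep the example dependency-free.

```python
import cmath
import random


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out


def laplace(scale, rng):
    """Sample Laplace(0, scale) as a difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_circular_convolution(x, h, scales, rng):
    """Convolve private x with public h, adding noise in the Fourier domain.

    scales[k] is an assumed, caller-supplied Laplace scale for coefficient k;
    choosing these scales well is the subject of the paper and is not done here.
    """
    x_hat = dft(x)
    h_hat = dft(h)
    # Independent Laplace noise on the real and imaginary parts (an assumption).
    noisy = [
        xk + complex(laplace(b, rng), laplace(b, rng))
        for xk, b in zip(x_hat, scales)
    ]
    y_hat = [hk * xk for hk, xk in zip(h_hat, noisy)]
    # Keep the real part; the noise is not forced to be conjugate-symmetric.
    return [v.real for v in dft(y_hat, inverse=True)]
```

With all scales set to zero the function reduces to an exact circular convolution, which is a convenient sanity check before experimenting with nonzero noise.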
Recovering Structured Probability Matrices
We consider the problem of accurately recovering a matrix B of size M by M,
which represents a probability distribution over M^2 outcomes, given access to
an observed matrix of "counts" generated by taking independent samples from the
distribution B. How can structural properties of the underlying matrix B be
leveraged to yield computationally efficient and information theoretically
optimal reconstruction algorithms? When can accurate reconstruction be
accomplished in the sparse data regime? This basic problem lies at the core of
a number of questions that are currently being considered by different
communities, including building recommendation systems and collaborative
filtering in the sparse data regime, community detection in sparse random
graphs, learning structured models such as topic models or hidden Markov
models, and the efforts from the natural language processing community to
compute "word embeddings".
Our results apply to the setting where B has a low rank structure. For this
setting, we propose an efficient algorithm that accurately recovers the
underlying M by M matrix using Theta(M) samples. This result easily translates
to Theta(M) sample algorithms for learning topic models and learning hidden
Markov Models. These linear sample complexities are optimal, up to constant
factors, in an extremely strong sense: even testing basic properties of the
underlying matrix (such as whether it has rank 1 or 2) requires Omega(M)
samples. We provide an even stronger lower bound where distinguishing whether a
sequence of observations was drawn from the uniform distribution over M
observations versus being generated by an HMM with two hidden states requires
Omega(M) observations. This precludes sublinear-sample hypothesis tests for
basic properties, such as identity or uniformity, as well as sublinear sample
estimators for quantities such as the entropy rate of HMMs.
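To make the role of low-rank structure concrete, here is a minimal sketch for the simplest case only: a rank-1 probability matrix, which equals the outer product of its row and column marginals. The function name and the toy counts are hypothetical, and this is not the algorithm proposed in the abstract, which handles general low rank in the sparse data regime. The point is only that a rank-1 estimate needs 2M marginal entries rather than M^2 individual cells.

```python
def rank_one_estimate(counts):
    """Estimate a rank-1 probability matrix from a matrix of sample counts.

    Assumes the true matrix B is rank 1, so B[i][j] = row[i] * col[j] where
    row and col are the marginals. Illustrative only.
    """
    total = sum(sum(r) for r in counts)
    if total == 0:
        raise ValueError("counts matrix contains no samples")
    n_cols = len(counts[0])
    # Empirical marginals: fraction of samples in each row and each column.
    row = [sum(r) / total for r in counts]
    col = [sum(r[j] for r in counts) / total for j in range(n_cols)]
    return [[ri * cj for cj in col] for ri in row]


# Toy counts that happen to be exactly rank 1 (hypothetical data).
counts = [[2, 4], [4, 8]]
estimate = rank_one_estimate(counts)
```

On these toy counts the estimate coincides with the normalized counts, because the counts matrix is itself rank 1; with sparse, noisy counts the two would differ.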
On Integer Programming, Discrepancy, and Convolution
Integer programs with a constant number of constraints are solvable in
pseudo-polynomial time. We give a new algorithm with a better pseudo-polynomial
running time than previous results. Moreover, we establish a strong connection
to the problem (min, +)-convolution. (min, +)-convolution has a trivial
quadratic time algorithm and it has been conjectured that this cannot be
improved significantly. We show that further improvements to our
pseudo-polynomial algorithm for any fixed number of constraints are equivalent
to improvements for (min, +)-convolution. This is strong evidence that our
algorithm's running time is the best possible. We also present a faster
specialized algorithm for testing feasibility of an integer program with few
constraints and for this we also give a tight lower bound, which is based on
the SETH.
Comment: A preliminary version appeared in the proceedings of ITCS 201
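For reference, the trivial quadratic-time algorithm for (min, +)-convolution mentioned in this abstract can be written in a few lines. The function name is hypothetical; the definition used is c[k] = min over i + j = k of a[i] + b[j].

```python
def min_plus_convolution(a, b):
    """Quadratic-time (min, +)-convolution of two non-empty sequences."""
    n, m = len(a), len(b)
    c = [float("inf")] * (n + m - 1)
    for i in range(n):
        for j in range(m):
            # Candidate value for output index i + j; keep the minimum.
            c[i + j] = min(c[i + j], a[i] + b[j])
    return c


result = min_plus_convolution([0, 2, 5], [1, 3])  # [1, 3, 5, 8]
```

The two nested loops visit every pair (i, j) once, which is the quadratic running time that the conjecture says cannot be improved significantly.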