20 research outputs found
A Unifying Framework for Differentially Private Sums under Continual Observation
We study the problem of maintaining a differentially private decaying sum
under continual observation. We give a unifying framework and an efficient
algorithm for this problem for \emph{any sufficiently smooth} function. Our
algorithm is the first differentially private algorithm that does not have a
multiplicative error for polynomially-decaying weights. Our algorithm improves
on all prior works on differentially private decaying sums under continual
observation and recovers exactly the additive error for the special case of
continual counting from Henzinger et al. (SODA 2023) as a corollary.
Our algorithm is a variant of the factorization mechanism whose error depends
on the $\gamma_2$ and $\gamma_F$ norms of the underlying matrix. We give a
constructive proof for an almost exact upper bound on the $\gamma_2$ and
$\gamma_F$ norms and an almost tight lower bound on the $\gamma_2$ norm for a
large class of lower-triangular matrices. This is the first non-trivial lower
bound for lower-triangular matrices whose non-zero entries are not all the
same. It includes matrices for all continual decaying sums problems, resulting
in an upper bound on the additive error of any differentially private decaying
sums algorithm under continual observation.
We also explore some implications of our result in discrepancy theory and
operator algebra. Given the importance of the $\gamma_2$ norm in computer
science and the extensive work in mathematics, we believe our result will have
further applications.
Comment: 32 pages
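The factorization mechanism mentioned in this abstract can be illustrated with a minimal sketch. It assumes NumPy, a hypothetical weight function `1/(k+1)` as one example of polynomial decay, and a trivial factorization `M = M @ I` chosen only to keep the example short. The noise multiplier `sigma` is a placeholder and is not calibrated to any privacy budget. None of these names come from the paper.

```python
import numpy as np

def decaying_sum_matrix(T, weight):
    """Lower-triangular Toeplitz matrix with M[t, s] = weight(t - s) for s <= t."""
    M = np.zeros((T, T))
    for t in range(T):
        for s in range(t + 1):
            M[t, s] = weight(t - s)
    return M

def factorization_mechanism(L, R, x, sigma, rng):
    """Release L @ (R @ x + z) for a factorization M = L @ R.

    The Gaussian noise z is scaled by the largest column norm of R, which
    bounds how much R @ x can change when one stream entry changes by 1.
    sigma is a hypothetical multiplier, not calibrated to (epsilon, delta).
    """
    sensitivity = np.linalg.norm(R, axis=0).max()
    z = rng.normal(0.0, sigma * sensitivity, size=R.shape[0])
    return L @ (R @ x + z)

T = 8
M = decaying_sum_matrix(T, lambda k: 1.0 / (k + 1))  # assumed example weights
L_factor, R_factor = M, np.eye(T)  # trivial factorization, for illustration only
x = np.ones(T)                     # toy input stream
rng = np.random.default_rng(0)

exact = M @ x                      # exact decaying sums at every time step
noisy = factorization_mechanism(L_factor, R_factor, x, sigma=1.0, rng=rng)
```

Better factorizations trade the row norms of `L` against the column norms of `R`; keeping both factors lower-triangular lets the output at time t depend only on inputs seen so far.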
Differentially Private Linear Algebra in the Streaming Model
The focus of this paper is a systematic study of differential privacy on streaming data using sketch-based algorithms. Previous works, like Dwork {\it et al.} (ICS 2010, STOC 2010), explored streaming algorithms based on random sampling. We work in the well-studied streaming model of computation, where the database is stored in the form of a matrix and a curator can access the database row-wise or column-wise. Dwork {\it et al.} (STOC 2010) gave an impossibility result for any non-trivial query on streamed data with respect to user-level privacy. Therefore, in this paper, we work with event-level privacy. We provide data structures in the streaming model whose space is optimal, up to a logarithmic factor, for three basic linear algebraic tasks in a differentially private manner: matrix multiplication, linear regression, and low-rank approximation, while incurring significantly less additive error.
The mechanisms for matrix multiplication and linear regression can be seen as the private analogues of known non-private algorithms, and superficially resemble those of Blocki {\it et al.} (FOCS 2012) and Upadhyay (ASIACRYPT 2013), but there are some subtle differences. For example, they perform an affine transformation to convert the private matrix into a set of vectors for some appropriate parameter, while we perform a perturbation that raises the singular values of the private matrix. In order to get a streaming algorithm for low-rank approximation, we have to reuse the random Gaussian matrix in a specific way. We prove that the resulting distribution also preserves differential privacy. We do not make any assumptions, like singular value separation, as made in the earlier works of Hardt and Roth (STOC 2013) and Kapralov and Talwar (SODA 2013). Further, we do not assume normalized rows as in the work of Dwork {\it et al.} (STOC 2014). All our mechanisms, in the form presented, can also be computed in the distributed setting of Beimel, Nissim, and Omri (CRYPTO 2008).
Integrity and Privacy of Large Data
There has been considerable recent interest in "cloud storage", wherein a user asks a server to store a large file. One issue is whether the user can verify that the server is actually storing the file; typically, a challenge-response protocol is employed to convince the user that the file is indeed being stored correctly. The security of these schemes is phrased in terms of an extractor that will recover the file given any "proving algorithm" that has a sufficiently high success probability. This forms the basis of proof-of-retrievability (PoR) and proof-of-data-possession (PDP) systems. The contributions of this thesis in secure cloud storage are as follows.
1. We provide a general analytical framework for various known PoR schemes that yields exact reductions that precisely quantify conditions for extraction to succeed as a function of the success probability of a proving algorithm. We apply this analysis to several archetypal schemes. In addition, we provide a new methodology for the analysis of keyed PoR schemes in an unconditionally secure setting, and use it to prove the security of a modified version of a scheme due to Shacham and Waters (ASIACRYPT, 2009) under a slightly restricted attack model, thus providing the first example of a keyed PoR scheme with unconditional security. We also show how classical statistical techniques can be used to evaluate whether the responses of the prover on the storage are accurate enough to permit successful extraction. Finally, we prove a new lower bound on the storage and communication complexity of PoR schemes.
2. We propose a new type of scheme that we term a proof-of-data-observability scheme. Our definition tries to capture the stronger requirement that the server must have an actual copy of M in its memory space while it executes the challenge-response protocol. We give some examples of schemes that satisfy this new security definition. We also analyze the efficiency and security of the protocols we present, and we prove some necessary conditions for the existence of these kinds of protocols.
3. We study secure storage on multiple servers. Our contribution in multiple-server PoR systems is twofold. We formalize security definitions for two possible scenarios: (i) when a threshold of servers succeeds with high enough probability (worst-case) and (ii) when the average of the success probabilities of all the servers is above a threshold (average-case). Using coding theory, we show instances of protocols that are secure in both the average-case and the worst-case scenarios.
Generic Attacks on Hash Functions
The subject of this thesis is a security property of hash functions, called chosen-target forced-prefix preimage (CTFP) resistance, and the generic attack on this property, called the herding attack. The study of CTFP resistance started when Kelsey and Kohno introduced a new data structure, called a diamond structure, in order to show the strength of the CTFP resistance property of a hash function.
In this thesis, we concentrate on the complexity of the diamond structure and its application in the herding attack. We review the analysis done by Kelsey and Kohno and point out a subtle flaw in their analysis. We propose a correction of their analysis and, based on our revised analysis, calculate the message complexity and the computational complexity of the generic attacks that are based on the diamond structure. As an application of the diamond structure to generic attacks, we propose a multiple herding attack on a special generalization of iterated hash functions proposed by Nandi and Stinson.
Langevin Diffusion: An Almost Universal Algorithm for Private Euclidean (Convex) Optimization
In this paper we revisit the problem of differentially private empirical risk
minimization (DP-ERM) and stochastic convex optimization (DP-SCO). We show that
a well-studied continuous time algorithm from statistical physics called
Langevin diffusion (LD) simultaneously provides optimal privacy/utility
tradeoffs for both DP-ERM and DP-SCO under $\epsilon$-DP and
$(\epsilon,\delta)$-DP. Using the uniform stability properties of LD, we
provide the optimal excess population risk guarantee for $\ell_2$-Lipschitz
convex losses under $\epsilon$-DP (even up to $\log$ factors), thus improving
on Asi et al.
Along the way we provide various technical tools which can be of independent
interest: i) A new R\'enyi divergence bound for LD when run on loss functions
over two neighboring data sets, ii) Excess empirical risk bounds for
last-iterate LD analogous to that of Shamir and Zhang for noisy stochastic
gradient descent (SGD), and iii) A two phase excess risk analysis of LD, where
the first phase is when the diffusion has not converged in any reasonable sense
to a stationary distribution, and in the second phase when the diffusion has
converged to a variant of Gibbs distribution. Our universality results
crucially rely on the dynamics of LD. When it has converged to a stationary
distribution, we obtain the optimal bounds under $\epsilon$-DP. When it is run
only for a very short time $t \propto 1/p$, we obtain the optimal bounds under
$(\epsilon,\delta)$-DP. Here, $p$ is the dimensionality of the model space.
Our work initiates a systematic study of DP continuous time optimization. We
believe this may have ramifications in the design of discrete time DP
optimization algorithms analogous to that in the non-private setting, where
continuous time dynamical viewpoints have helped in designing new algorithms,
including the celebrated mirror-descent and Polyak's momentum method.
Comment: Added a comparison to the work of Asi et al.
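To make the two regimes described in this abstract concrete, here is a minimal sketch of an Euler-Maruyama discretization of Langevin diffusion, written in one common parametrization, $d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{2/\beta}\,dW_t$. The toy quadratic loss, the inverse temperature `beta`, the step size, and all function names are assumptions made for illustration. The sketch is unconstrained and makes no privacy claim; the paper's own parametrization and analysis may differ.

```python
import numpy as np

def langevin_euler_maruyama(grad, theta0, beta, step, n_steps, rng):
    """Simulate d(theta) = -grad(theta) dt + sqrt(2 / beta) dW with a fixed step."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=theta.shape)
        theta = theta - step * grad(theta) + np.sqrt(2.0 * step / beta) * noise
    return theta

# Toy loss L(theta) = 0.5 * ||theta - target||^2 (assumed for illustration).
target = np.array([1.0, -2.0])
grad = lambda theta: theta - target

rng = np.random.default_rng(0)
# Long runs: the iterates approximately follow the Gibbs distribution
# proportional to exp(-beta * L(theta)), which is centered at the minimizer.
samples = np.array([
    langevin_euler_maruyama(grad, np.zeros(2), beta=50.0, step=0.02,
                            n_steps=500, rng=rng)
    for _ in range(100)
])
# Short run: the diffusion is still far from stationarity.
early = langevin_euler_maruyama(grad, np.zeros(2), beta=50.0, step=0.02,
                                n_steps=1, rng=rng)
```

Running for many steps corresponds to the converged phase in the abstract, while a very small `n_steps` corresponds to the short-time phase.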
Constant matters: Fine-grained Complexity of Differentially Private Continual Observation Using Completely Bounded Norms
We study fine-grained error bounds for differentially private algorithms for averaging and counting in the continual observation model. For this, we use the completely bounded spectral norm (cb norm) from operator algebra. For a matrix $W$, its cb norm is defined as
\[ \|W\|_{\mathsf{cb}} = \max_{Q} \left\{ \frac{\|Q \bullet W\|}{\|Q\|} \right\}, \]
where $Q \bullet W$ denotes the Schur product and $\|\cdot\|$ denotes the spectral norm. We bound the cb norm of two fundamental matrices studied in differential privacy under the continual observation model: the counting matrix $M_{\mathsf{counting}}$ and the averaging matrix $M_{\mathsf{average}}$. For $M_{\mathsf{counting}}$, we give lower and upper bounds whose additive gap is $1 + \frac{1}{\pi}$. Our factorization also has two desirable properties sufficient for the streaming setting: the factorization consists of lower-triangular matrices, and the number of distinct entries in the factorization is exactly $T$. This allows us to compute the factorization on the fly while requiring the curator to store a $T$-dimensional vector. For $M_{\mathsf{average}}$, we show an additive gap between the lower and upper bound of $\approx 0.64$.
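A lower-triangular factorization of the counting matrix with only T distinct entries can be sketched as follows. The sketch uses the coefficients of the power series of $(1-x)^{-1/2}$, namely $f(k) = \binom{2k}{k}/4^k$, to build a Toeplitz square root of the all-ones lower-triangular matrix. Treating this as the factorization the abstract refers to is an assumption, and the function names are hypothetical.

```python
import numpy as np

def sqrt_counting_coefficients(T):
    """f[0] = 1 and f[k] = f[k-1] * (2k - 1) / (2k), i.e. binom(2k, k) / 4**k."""
    f = np.ones(T)
    for k in range(1, T):
        f[k] = f[k - 1] * (2 * k - 1) / (2 * k)
    return f

def lower_toeplitz(f):
    """Lower-triangular Toeplitz matrix with A[t, s] = f[t - s] for s <= t."""
    T = len(f)
    A = np.zeros((T, T))
    for t in range(T):
        A[t, : t + 1] = f[t::-1]
    return A

T = 16
f = sqrt_counting_coefficients(T)      # the only T numbers that need storing
L = lower_toeplitz(f)
M_counting = np.tril(np.ones((T, T)))  # prefix-sum (continual counting) matrix
product = L @ L                        # should reproduce M_counting
```

Because every row of `L` is a reversed prefix of `f`, row t can be produced at time t from the stored vector alone, which matches the on-the-fly computation described above.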