20 research outputs found

    A Unifying Framework for Differentially Private Sums under Continual Observation

    We study the problem of maintaining a differentially private decaying sum under continual observation. We give a unifying framework and an efficient algorithm for this problem for \emph{any sufficiently smooth} function. Our algorithm is the first differentially private algorithm that does not have a multiplicative error for polynomially-decaying weights. Our algorithm improves on all prior works on differentially private decaying sums under continual observation and recovers exactly the additive error for the special case of continual counting from Henzinger et al. (SODA 2023) as a corollary. Our algorithm is a variant of the factorization mechanism whose error depends on the $\gamma_2$ and $\gamma_F$ norms of the underlying matrix. We give a constructive proof for an almost exact upper bound on the $\gamma_2$ and $\gamma_F$ norms and an almost tight lower bound on the $\gamma_2$ norm for a large class of lower-triangular matrices. This is the first non-trivial lower bound for lower-triangular matrices whose non-zero entries are not all the same. It includes matrices for all continual decaying sums problems, resulting in a lower bound on the additive error of any differentially private decaying sums algorithm under continual observation. We also explore some implications of our result in discrepancy theory and operator algebra. Given the importance of the $\gamma_2$ norm in computer science and the extensive related work in mathematics, we believe our result will have further applications.
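
    The factorization-mechanism viewpoint above can be illustrated with a small sketch: factor the lower-triangular weight matrix as M = L·R, perturb the transformed stream with Gaussian noise calibrated to the column norms of R, and map back through L. The polynomial decay (t-j+1)^(-alpha), the square-root factorization L = R = M^(1/2), and the Gaussian-mechanism calibration below are illustrative assumptions, not the construction analyzed in the paper.

```python
# Illustrative sketch of a factorization mechanism for a decaying sum under
# continual observation. The decay (t - j + 1)^(-alpha), the square-root
# factorization, and the noise calibration are assumptions for illustration,
# not the algorithm from the paper.
import numpy as np
from scipy.linalg import sqrtm

def decaying_sum_matrix(T, alpha=1.0):
    """Lower-triangular matrix M with M[t, j] = (t - j + 1)^(-alpha) for j <= t."""
    M = np.zeros((T, T))
    for t in range(T):
        for j in range(t + 1):
            M[t, j] = (t - j + 1) ** (-alpha)
    return M

def factorization_mechanism(x, M, eps=1.0, delta=1e-6, rng=None):
    rng = np.random.default_rng(rng)
    L = np.real(sqrtm(M))      # principal square root, so M = L @ L
    R = L
    # L2 sensitivity of R @ x when one stream entry changes by at most 1
    # is the largest column norm of R.
    sens = np.linalg.norm(R, axis=0).max()
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps   # Gaussian mechanism
    z = rng.normal(0.0, sigma, size=x.shape)
    return L @ (R @ x + z)     # equals M @ x + L @ z

T = 64
M = decaying_sum_matrix(T, alpha=1.0)
x = np.random.default_rng(0).integers(0, 2, size=T).astype(float)
noisy = factorization_mechanism(x, M, eps=1.0, delta=1e-6, rng=1)
print("max additive error:", np.abs(noisy - M @ x).max())
```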

    Differentially Private Linear Algebra in the Streaming Model

    The focus of this paper is a systematic study of differential privacy on streaming data using sketch-based algorithms. Previous works, like Dwork et al. (ICS 2010, STOC 2010), explored random-sampling-based streaming algorithms. We work in the well-studied streaming model of computation, where the database is stored in the form of a matrix and a curator can access the database row-wise or column-wise. Dwork et al. (STOC 2010) gave an impossibility result for any non-trivial query on streamed data with respect to user-level privacy. Therefore, in this paper, we work with event-level privacy. We provide space-optimal (up to logarithmic factors) data structures in the streaming model for three basic linear algebraic tasks in a differentially private manner: matrix multiplication, linear regression, and low-rank approximation, while incurring significantly less additive error. The mechanisms for matrix multiplication and linear regression can be seen as private analogues of known non-private algorithms, and bear superficial similarities to Blocki et al. (FOCS 2012) and Upadhyay (ASIACRYPT 2013), but there are some subtle differences. For example, they perform an affine transformation to convert the private matrix into a set of $\{\sqrt{w/n},1\}^n$ vectors for some appropriate $w$, while we perform a perturbation that raises the singular values of the private matrix. In order to get a streaming algorithm for low-rank approximation, we have to reuse the random Gaussian matrix in a specific way. We prove that the resulting distribution also preserves differential privacy. We do not make any assumptions, such as singular-value separation, made in the earlier works of Hardt and Roth (STOC 2013) and Kapralov and Talwar (SODA 2013). Further, we do not assume normalized rows as in the work of Dwork et al. (STOC 2014). All our mechanisms, in the form presented, can also be computed in the distributed setting of Beimel, Nissim, and Omri (CRYPTO 2008).
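
    The "raise the singular values, then sketch" idea mentioned above can be sketched as follows; the lifting by an appended sigma_min·I block, the sketch dimension, and the parameter values are illustrative assumptions and do not carry the paper's privacy calibration or proof.

```python
# Illustrative sketch of lifting the singular values of a private matrix
# before applying a random Gaussian sketch, here for approximating the Gram
# matrix A^T A. The lift block, sketch size, and sigma_min are assumptions
# for illustration, not the calibrated parameters from the paper.
import numpy as np

def lifted_gaussian_sketch(A, sigma_min, sketch_rows, rng=None):
    rng = np.random.default_rng(rng)
    n, d = A.shape
    # Append sigma_min * I so every singular value of the lifted matrix
    # is at least sigma_min.
    A_lift = np.vstack([A, sigma_min * np.eye(d)])
    S = rng.normal(0.0, 1.0 / np.sqrt(sketch_rows), size=(sketch_rows, n + d))
    return S @ A_lift                      # compact (sketch_rows x d) summary

def approx_gram(sketch, sigma_min):
    d = sketch.shape[1]
    # (S A_lift)^T (S A_lift) is close to A^T A + sigma_min^2 I in expectation,
    # so subtract the lift contribution.
    return sketch.T @ sketch - sigma_min**2 * np.eye(d)

rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 20))
sk = lifted_gaussian_sketch(A, sigma_min=5.0, sketch_rows=400, rng=1)
err = np.linalg.norm(approx_gram(sk, 5.0) - A.T @ A) / np.linalg.norm(A.T @ A)
print(f"relative error of sketched Gram matrix: {err:.3f}")
```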

    Integrity and Privacy of Large Data

    There has been considerable recent interest in "cloud storage", wherein a user asks a server to store a large file. One issue is whether the user can verify that the server is actually storing the file; typically a challenge-response protocol is employed to convince the user that the file is indeed being stored correctly. The security of these schemes is phrased in terms of an extractor which will recover the file given any "proving algorithm" that has a sufficiently high success probability. This forms the basis of proof-of-retrievability (PoR) and proof-of-data-possession (PDP) systems. The contributions of this thesis in secure cloud storage are as follows; a minimal code sketch of the challenge-response idea appears after the list.
    1. We provide a general analytical framework for various known PoR schemes that yields exact reductions, precisely quantifying the conditions for extraction to succeed as a function of the success probability of a proving algorithm, and we apply this analysis to several archetypal schemes. In addition, we provide a new methodology for the analysis of keyed PoR schemes in an unconditionally secure setting, and use it to prove the security of a modified version of a scheme due to Shacham and Waters (ASIACRYPT 2009) under a slightly restricted attack model, thus providing the first example of a keyed PoR scheme with unconditional security. We also show how classical statistical techniques can be used to evaluate whether the responses of the prover on the storage are accurate enough to permit successful extraction. Finally, we prove a new lower bound on the storage and communication complexity of PoR schemes.
    2. We propose a new type of scheme that we term a proof-of-data-observability scheme. Our definition tries to capture the stronger requirement that the server must hold an actual copy of the file M in its memory space while it executes the challenge-response protocol. We give examples of schemes that satisfy this new security definition, analyze their efficiency and security, and prove some necessary conditions for the existence of such protocols.
    3. We study secure storage on multiple servers. Our contribution in multiple-server PoR systems is twofold. We formalize security definitions for two possible scenarios: (i) when a threshold of servers succeed with high enough probability (worst case), and (ii) when the average success probability of all the servers is above a threshold (average case). Using coding theory, we show instances of protocols that are secure in both the average-case and worst-case scenarios.
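
    The challenge-response flavor of a keyed PoR/PDP audit can be conveyed with a minimal sketch: the client tags each file block with an HMAC before outsourcing and later spot-checks a few random blocks. The block size, challenge size, and the HMAC-SHA256 construction are illustrative choices, not the Shacham-Waters scheme analyzed in the thesis.

```python
# Minimal sketch of a keyed challenge-response audit in the spirit of PoR/PDP:
# the client tags each block with an HMAC before outsourcing the file and later
# spot-checks random blocks. Block size, challenge size, and HMAC-SHA256 are
# illustrative assumptions, not the schemes analyzed in the thesis.
import hmac, hashlib, os, secrets

BLOCK = 4096  # bytes per block (illustrative)

def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def tag_blocks(key, blocks):
    # Bind each tag to the block index to prevent block swapping.
    return [hmac.new(key, i.to_bytes(8, "big") + b, hashlib.sha256).digest()
            for i, b in enumerate(blocks)]

def challenge(num_blocks, sample_size):
    return sorted(secrets.SystemRandom().sample(range(num_blocks), sample_size))

def respond(blocks, tags, indices):          # run by the storage server
    return [(i, blocks[i], tags[i]) for i in indices]

def verify(key, response, indices):          # run by the client
    if [i for i, _, _ in response] != list(indices):
        return False
    return all(hmac.compare_digest(
                   hmac.new(key, i.to_bytes(8, "big") + b, hashlib.sha256).digest(), t)
               for i, b, t in response)

key = secrets.token_bytes(32)
blocks = split_blocks(os.urandom(1 << 20))    # a 1 MiB test file
tags = tag_blocks(key, blocks)
idx = challenge(len(blocks), sample_size=10)
print("audit passed:", verify(key, respond(blocks, tags, idx), idx))
```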

    Generic Attacks on Hash Functions

    The subject of this thesis is a security property of hash functions, called chosen-target forced-prefix preimage (CTFP) resistance, and the generic attack on this property, called the herding attack. The study of CTFP resistance started when Kelsey and Kohno introduced a new data structure, called a diamond structure, in order to assess the strength of the CTFP resistance property of a hash function. In this thesis, we concentrate on the complexity of the diamond structure and its application in the herding attack. We review the analysis done by Kelsey and Kohno and point out a subtle flaw in it. We propose a correction to their analysis and, based on our revised analysis, calculate the message complexity and the computational complexity of the generic attacks that are based on the diamond structure. As an application of the diamond structure to generic attacks, we propose a multiple herding attack on a special generalization of iterated hash functions proposed by Nandi and Stinson.
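
    A toy construction conveys the diamond-structure idea: start from 2^k hash states, pair them up at each level, and search for message blocks that make each pair collide, halving the number of states until a single root remains. The 16-bit truncated SHA-256 compression step and the brute-force collision search below are illustrative simplifications, not the parameters whose complexity is analyzed in the thesis.

```python
# Toy illustration of the diamond structure behind the herding attack.
# A real attack targets a full-width hash; here the compression function is a
# 16-bit truncation of SHA-256 so the collision searches finish instantly.
# The width, pairing strategy, and message encoding are illustrative choices.
import hashlib, itertools, os

WIDTH = 2  # bytes of hash state (toy width)

def f(state, block):
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:WIDTH]

def collide(a, b):
    """Find blocks (ma, mb) with f(a, ma) == f(b, mb) via a birthday search."""
    table = {}
    for i in itertools.count():
        ma = i.to_bytes(4, "big")
        table[f(a, ma)] = ma
        mb = os.urandom(4)
        h = f(b, mb)
        if h in table:
            return table[h], mb, h

def build_diamond(leaves):
    """Pairwise-collide states level by level until a single root remains."""
    level, edges = list(leaves), {}
    while len(level) > 1:
        nxt = []
        for a, b in zip(level[::2], level[1::2]):
            ma, mb, h = collide(a, b)
            edges[a], edges[b] = (ma, h), (mb, h)
            nxt.append(h)
        level = nxt
    return level[0], edges

leaves = [os.urandom(WIDTH) for _ in range(2 ** 4)]   # 2^k starting states
root, edges = build_diamond(leaves)
print("diamond root (the value the attacker commits to):", root.hex())
```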

    Langevin Diffusion: An Almost Universal Algorithm for Private Euclidean (Convex) Optimization

    In this paper we revisit the problem of differentially private empirical risk minimization (DP-ERM) and stochastic convex optimization (DP-SCO). We show that a well-studied continuous-time algorithm from statistical physics, called Langevin diffusion (LD), simultaneously provides optimal privacy/utility tradeoffs for both DP-ERM and DP-SCO under $\epsilon$-DP and $(\epsilon,\delta)$-DP. Using the uniform stability properties of LD, we provide the optimal excess population risk guarantee for $\ell_2$-Lipschitz convex losses under $\epsilon$-DP (even up to $\log n$ factors), thus improving on Asi et al. Along the way we provide various technical tools which can be of independent interest: i) a new Rényi divergence bound for LD when run on loss functions over two neighboring data sets, ii) excess empirical risk bounds for last-iterate LD analogous to those of Shamir and Zhang for noisy stochastic gradient descent (SGD), and iii) a two-phase excess risk analysis of LD, where in the first phase the diffusion has not converged in any reasonable sense to a stationary distribution, and in the second phase it has converged to a variant of the Gibbs distribution. Our universality results crucially rely on the dynamics of LD. When it has converged to a stationary distribution, we obtain the optimal bounds under $\epsilon$-DP. When it is run only for a very short time $\propto 1/p$, we obtain the optimal bounds under $(\epsilon,\delta)$-DP. Here, $p$ is the dimensionality of the model space. Our work initiates a systematic study of DP continuous-time optimization. We believe this may have ramifications for the design of discrete-time DP optimization algorithms, analogous to the non-private setting, where continuous-time dynamical viewpoints have helped in designing new algorithms, including the celebrated mirror descent and Polyak's momentum method.
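
    A discretization of Langevin diffusion on a toy DP-ERM instance can be sketched as noisy full-batch gradient descent, with Gaussian noise scaled to the step size and an inverse temperature. The step size, temperature, iteration count, and the regularized logistic loss below are illustrative assumptions; relating them to a concrete $(\epsilon,\delta)$ guarantee requires the analysis in the paper.

```python
# Hedged sketch of discretized Langevin diffusion (noisy full-batch gradient
# descent) on a regularized logistic-regression ERM objective. The step size,
# inverse temperature, and iteration count are illustrative assumptions;
# calibrating them to a privacy guarantee needs the paper's analysis.
import numpy as np

def loss_grad(theta, X, y, lam):
    """Gradient of (1/n) sum_i log(1 + exp(-y_i x_i^T theta)) + (lam/2)||theta||^2."""
    margins = y * (X @ theta)
    coef = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))   # = -y * sigmoid(-margin)
    return X.T @ coef / len(y) + lam * theta

def langevin_erm(X, y, lam=0.1, eta=0.05, beta=1e4, steps=2000, rng=None):
    rng = np.random.default_rng(rng)
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        # Gaussian increment with variance 2 * eta / beta, as in Langevin dynamics.
        noise = rng.normal(size=theta.shape) * np.sqrt(2.0 * eta / beta)
        theta = theta - eta * loss_grad(theta, X, y, lam) + noise
    return theta

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
true_theta = rng.normal(size=d)
y = np.sign(X @ true_theta + 0.1 * rng.normal(size=n))
theta_hat = langevin_erm(X, y, rng=1)
print("train accuracy:", np.mean(np.sign(X @ theta_hat) == y))
```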

    Constant matters: Fine-grained Complexity of Differentially Private Continual Observation Using Completely Bounded Norms

    We study fine-grained error bounds for differentially private algorithms for averaging and counting in the continual observation model. For this, we use the completely bounded spectral norm (cb norm) from operator algebra. For a matrix $W$, its cb norm is defined as $\|W\|_{\mathsf{cb}} = \max_{Q} \left\{ \frac{\|Q \bullet W\|}{\|Q\|} \right\}$, where $Q \bullet W$ denotes the Schur product and $\|\cdot\|$ denotes the spectral norm. We bound the cb norm of two fundamental matrices studied in differential privacy under the continual observation model: the counting matrix $M_{\mathsf{counting}}$ and the averaging matrix $M_{\mathsf{average}}$. For $M_{\mathsf{counting}}$, we give lower and upper bounds whose additive gap is $1 + \frac{1}{\pi}$. Our factorization also has two desirable properties sufficient for the streaming setting: the factorization consists of lower-triangular matrices, and the number of distinct entries in the factorization is exactly $T$. This allows us to compute the factorization on the fly while requiring the curator to store only a $T$-dimensional vector. For $M_{\mathsf{average}}$, we show an additive gap between the lower and upper bounds of $\approx 0.64$.
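
    The structural point about lower-triangular factorizations with $T$ distinct entries can be illustrated with the standard square-root factorization of the counting matrix: the lower-triangular all-ones matrix equals L·L, where L is lower-triangular Toeplitz and its first column holds the power-series coefficients of 1/sqrt(1-x). The sketch below shows that structure (and that the curator only needs a T-dimensional coefficient vector); it is not claimed to be the cb-norm-optimal factorization from the paper.

```python
# Sketch of a lower-triangular Toeplitz factorization of the counting matrix:
# M_counting (lower-triangular all-ones) equals L @ L, where the first column
# of L holds the power-series coefficients of 1/sqrt(1 - x). Each factor has
# T distinct nonzero entries, so it can be applied in streaming fashion while
# storing only a T-dimensional coefficient vector. This illustrates the
# structural point from the abstract, not the cb-norm-optimal factorization.
import numpy as np
from math import comb
from scipy.linalg import toeplitz

def sqrt_counting_factor(T):
    """Coefficients c_k = binom(2k, k) / 4^k of (1 - x)^(-1/2), for k = 0..T-1."""
    c = np.array([comb(2 * k, k) / 4.0**k for k in range(T)])
    return np.tril(toeplitz(c))          # lower-triangular Toeplitz factor L

T = 8
L = sqrt_counting_factor(T)
M_counting = np.tril(np.ones((T, T)))
print("L @ L == M_counting:", np.allclose(L @ L, M_counting))
print("distinct nonzero entries in L:", len(np.unique(np.round(L[L != 0], 12))))
```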