8,889 research outputs found

    The Geometry of Differential Privacy: the Sparse and Approximate Cases

    Full text link
    In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of dd linear queries over a database xRNx \in \R^N, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an O(log2d)O(\log^2 d) approximation to the optimal mechanism is known. Our first contribution is to give an O(log2d)O(\log^2 d) approximation guarantee for the case of (\eps,\delta)-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when d>nx1d > n \triangleq \|x\|_1. It is known that better mechanisms exist in this setting. Our second main contribution is to give an (\eps,\delta)-differentially private mechanism which is optimal up to a \polylog(d,N) factor for any given query set AA and any given upper bound nn on x1\|x\|_1. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the \eps-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size nn by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix AA

    Perspects in astrophysical databases

    Full text link
    Astrophysics has become a domain extremely rich of scientific data. Data mining tools are needed for information extraction from such large datasets. This asks for an approach to data management emphasizing the efficiency and simplicity of data access; efficiency is obtained using multidimensional access methods and simplicity is achieved by properly handling metadata. Moreover, clustering and classification techniques on large datasets pose additional requirements in terms of computation and memory scalability and interpretability of results. In this study we review some possible solutions

    Event Stream Processing with Multiple Threads

    Full text link
    Current runtime verification tools seldom make use of multi-threading to speed up the evaluation of a property on a large event trace. In this paper, we present an extension to the BeepBeep 3 event stream engine that allows the use of multiple threads during the evaluation of a query. Various parallelization strategies are presented and described on simple examples. The implementation of these strategies is then evaluated empirically on a sample of problems. Compared to the previous, single-threaded version of the BeepBeep engine, the allocation of just a few threads to specific portions of a query provides dramatic improvement in terms of running time
    corecore