2,697 research outputs found

    MVG Mechanism: Differential Privacy under Matrix-Valued Query

    Full text link
    Differential privacy mechanism design has traditionally been tailored for a scalar-valued query function. Although many mechanisms such as the Laplace and Gaussian mechanisms can be extended to a matrix-valued query function by adding i.i.d. noise to each element of the matrix, this method is often suboptimal as it forfeits an opportunity to exploit the structural characteristics typically associated with matrix analysis. To address this challenge, we propose a novel differential privacy mechanism called the Matrix-Variate Gaussian (MVG) mechanism, which adds a matrix-valued noise drawn from a matrix-variate Gaussian distribution, and we rigorously prove that the MVG mechanism preserves (ϵ,δ)(\epsilon,\delta)-differential privacy. Furthermore, we introduce the concept of directional noise made possible by the design of the MVG mechanism. Directional noise allows the impact of the noise on the utility of the matrix-valued query function to be moderated. Finally, we experimentally demonstrate the performance of our mechanism using three matrix-valued queries on three privacy-sensitive datasets. We find that the MVG mechanism notably outperforms four previous state-of-the-art approaches, and provides comparable utility to the non-private baseline.Comment: Appeared in CCS'1

    Experiences with Some Benchmarks for Deductive Databases and Implementations of Bottom-Up Evaluation

    Full text link
    OpenRuleBench is a large benchmark suite for rule engines, which includes deductive databases. We previously proposed a translation of Datalog to C++ based on a method that "pushes" derived tuples immediately to places where they are used. In this paper, we report performance results of various implementation variants of this method compared to XSB, YAP and DLV. We study only a fraction of the OpenRuleBench problems, but we give a quite detailed analysis of each such task and the factors which influence performance. The results not only show the potential of our method and implementation approach, but could be valuable for anybody implementing systems which should be able to execute tasks of the discussed types.Comment: In Proceedings WLP'15/'16/WFLP'16, arXiv:1701.0014

    Managing data through the lens of an ontology

    Get PDF
    Ontology-based data management aims at managing data through the lens of an ontology, that is, a conceptual representation of the domain of interest in the underlying information system. This new paradigm provides several interesting features, many of which have already been proved effective in managing complex information systems. This article introduces the notion of ontology-based data management, illustrating the main ideas underlying the paradigm, and pointing out the importance of knowledge representation and automated reasoning for addressing the technical challenges it introduces

    Tight Lower Bounds for Differentially Private Selection

    Full text link
    A pervasive task in the differential privacy literature is to select the kk items of "highest quality" out of a set of dd items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algorithms. The standard approaches to these tasks---repeated use of the exponential mechanism or the sparse vector technique---approximately solve this problem given a dataset of n=O(klogd)n = O(\sqrt{k}\log d) samples. We provide a tight lower bound for some very simple variants of the private selection problem. Our lower bound shows that a sample of size n=Ω(klogd)n = \Omega(\sqrt{k} \log d) is required even to achieve a very minimal accuracy guarantee. Our results are based on an extension of the fingerprinting method to sparse selection problems. Previously, the fingerprinting method has been used to provide tight lower bounds for answering an entire set of dd queries, but often only some much smaller set of kk queries are relevant. Our extension allows us to prove lower bounds that depend on both the number of relevant queries and the total number of queries

    Learning Models over Relational Data using Sparse Tensors and Functional Dependencies

    Full text link
    Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool. These integrated solutions are also a fertile ground of theoretically fundamental and challenging problems at the intersection of relational and statistical data models. This article introduces a unified framework for training and evaluating a class of statistical learning models over relational databases. This class includes ridge linear regression, polynomial regression, factorization machines, and principal component analysis. We show that, by synergizing key tools from database theory such as schema information, query structure, functional dependencies, recent advances in query evaluation algorithms, and from linear algebra such as tensor and matrix operations, one can formulate relational analytics problems and design efficient (query and data) structure-aware algorithms to solve them. This theoretical development informed the design and implementation of the AC/DC system for structure-aware learning. We benchmark the performance of AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting and advertisement planning applications, AC/DC can learn polynomial regression models and factorization machines with at least the same accuracy as its competitors and up to three orders of magnitude faster than its competitors whenever they do not run out of memory, exceed 24-hour timeout, or encounter internal design limitations.Comment: 61 pages, 9 figures, 2 table

    Answering Conjunctive Queries under Updates

    Full text link
    We consider the task of enumerating and counting answers to kk-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. We exhibit a new notion of q-hierarchical conjunctive queries and show that these can be maintained efficiently in the following sense. During a linear time preprocessing phase, we can build a data structure that enables constant delay enumeration of the query results; and when the database is updated, we can update the data structure and restart the enumeration phase within constant time. For the special case of self-join free conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical, then query enumeration with sublinear^\ast delay and sublinear update time (and arbitrary preprocessing time) is impossible. For answering Boolean conjunctive queries and for the more general problem of counting the number of solutions of k-ary queries we obtain complete dichotomies: if the query's homomorphic core is q-hierarchical, then size of the the query result can be computed in linear time and maintained with constant update time. Otherwise, the size of the query result cannot be maintained with sublinear update time. All our lower bounds rely on the OMv-conjecture, a conjecture on the hardness of online matrix-vector multiplication that has recently emerged in the field of fine-grained complexity to characterise the hardness of dynamic problems. The lower bound for the counting problem additionally relies on the orthogonal vectors conjecture, which in turn is implied by the strong exponential time hypothesis. )^\ast) By sublinear we mean O(n1ε)O(n^{1-\varepsilon}) for some ε>0\varepsilon>0, where nn is the size of the active domain of the current database
    corecore