4,463 research outputs found

    Functorial Data Migration

    Get PDF
    In this paper we present a simple database definition language: that of categories and functors. A database schema is a small category and an instance is a set-valued functor on it. We show that morphisms of schemas induce three "data migration functors", which translate instances from one schema to the other in canonical ways. These functors parameterize projections, unions, and joins over all tables simultaneously and can be used in place of conjunctive and disjunctive queries. We also show how to connect a database and a functional programming language by introducing a functorial connection between the schema and the category of types for that language. We begin the paper with a multitude of examples to motivate the definitions, and near the end we provide a dictionary whereby one can translate database concepts into category-theoretic concepts and vice-versa.Comment: 30 page

    Near-Optimal Algorithms for Differentially-Private Principal Components

    Full text link
    Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.Comment: 37 pages, 8 figures; final version to appear in the Journal of Machine Learning Research, preliminary version was at NIPS 201

    Simple and Deterministic Matrix Sketching

    Full text link
    We adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives the rows of a large matrix ARn×mA \in \R^{n \times m} one after the other in a streaming fashion. It maintains a sketch matrix B \in \R^ {1/\eps \times m} such that for any unit vector xx [\|Ax\|^2 \ge \|Bx\|^2 \ge \|Ax\|^2 - \eps \|A\|_{f}^2 \.] Sketch updates per row in AA require O(m/\eps^2) operations in the worst case. A slight modification of the algorithm allows for an amortized update time of O(m/\eps) operations per row. The presented algorithm stands out in that it is: deterministic, simple to implement, and elementary to prove. It also experimentally produces more accurate sketches than widely used approaches while still being computationally competitive

    MVG Mechanism: Differential Privacy under Matrix-Valued Query

    Full text link
    Differential privacy mechanism design has traditionally been tailored for a scalar-valued query function. Although many mechanisms such as the Laplace and Gaussian mechanisms can be extended to a matrix-valued query function by adding i.i.d. noise to each element of the matrix, this method is often suboptimal as it forfeits an opportunity to exploit the structural characteristics typically associated with matrix analysis. To address this challenge, we propose a novel differential privacy mechanism called the Matrix-Variate Gaussian (MVG) mechanism, which adds a matrix-valued noise drawn from a matrix-variate Gaussian distribution, and we rigorously prove that the MVG mechanism preserves (ϵ,δ)(\epsilon,\delta)-differential privacy. Furthermore, we introduce the concept of directional noise made possible by the design of the MVG mechanism. Directional noise allows the impact of the noise on the utility of the matrix-valued query function to be moderated. Finally, we experimentally demonstrate the performance of our mechanism using three matrix-valued queries on three privacy-sensitive datasets. We find that the MVG mechanism notably outperforms four previous state-of-the-art approaches, and provides comparable utility to the non-private baseline.Comment: Appeared in CCS'1

    Experiences with Some Benchmarks for Deductive Databases and Implementations of Bottom-Up Evaluation

    Full text link
    OpenRuleBench is a large benchmark suite for rule engines, which includes deductive databases. We previously proposed a translation of Datalog to C++ based on a method that "pushes" derived tuples immediately to places where they are used. In this paper, we report performance results of various implementation variants of this method compared to XSB, YAP and DLV. We study only a fraction of the OpenRuleBench problems, but we give a quite detailed analysis of each such task and the factors which influence performance. The results not only show the potential of our method and implementation approach, but could be valuable for anybody implementing systems which should be able to execute tasks of the discussed types.Comment: In Proceedings WLP'15/'16/WFLP'16, arXiv:1701.0014
    corecore