10 research outputs found

    Sensitivity of Counting Queries

    In the context of statistical databases, the release of accurate statistical information about the collected data often puts at risk the privacy of the individual contributors. The goal of differential privacy is to maximise the utility of a query while protecting the individual records in the database. A natural way to achieve differential privacy is to add statistical noise to the result of the query. In this context, a mechanism for releasing statistical information is thus a trade-off between utility and privacy. In order to balance these two "conflicting" requirements, privacy-preserving mechanisms calibrate the added noise to the so-called sensitivity of the query, and thus a precise estimate of the sensitivity of the query is necessary to determine the amplitude of the noise to be added. In this paper, we initiate a systematic study of the sensitivity of counting queries over relational databases. We first observe that the sensitivity of a Relational Algebra query with counting is not computable in general, and that while the sensitivity of Conjunctive Queries with counting is computable, it becomes unbounded as soon as the query includes a join. We then consider restricted classes of databases (databases with constraints) and study the problem of computing the sensitivity of a query given such constraints. We are able to establish bounds on the sensitivity of counting conjunctive queries over constrained databases. The kinds of constraints studied here are functional dependencies and cardinality dependencies; the latter are a natural generalisation of functional dependencies that allows us to provide tight bounds on the sensitivity of counting conjunctive queries.
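
    As a rough illustration of how noise is calibrated to sensitivity, the sketch below (a toy example of my own, not taken from the paper) applies the standard Laplace mechanism to a plain counting query, whose global sensitivity is 1 because adding or removing one record changes the count by at most 1; as the abstract notes, this calibration becomes problematic once joins make the sensitivity large or unbounded.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with Laplace noise of scale sensitivity/epsilon,
    the standard calibration for epsilon-differential privacy."""
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Toy database and counting query: count the records with age >= 30.
db = [("alice", 34), ("bob", 29), ("carol", 41)]
true_count = sum(1 for _, age in db if age >= 30)

# A single count has sensitivity 1, so the noise scale is 1/epsilon.
print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
```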

    Computing Local Sensitivities of Counting Queries with Joins

    Local sensitivity of a query Q given a database instance D, i.e. how much the output Q(D) changes when a tuple is added to D or deleted from D, has many applications, including query analysis, outlier detection, and differential privacy. However, it is NP-hard to find the local sensitivity of a conjunctive query in terms of the size of the query, even for the class of acyclic queries. Although the complexity is polynomial when the query size is fixed, the naive algorithms are not efficient for large databases and queries involving multiple joins. In this paper, we present a novel approach to compute the local sensitivity of counting queries involving join operations by tracking and summarizing tuple sensitivities -- the maximum change a tuple can cause in the query result when it is added or removed. We give algorithms for the sensitivity problem for full acyclic join queries using join trees that run in polynomial time in both the size of the database and the query for an interesting sub-class of queries, which we call 'doubly acyclic queries' and which includes path queries, and in polynomial time in combined complexity when the maximum degree in the join tree is bounded. Our algorithms can be extended to certain non-acyclic queries using generalized hypertree decompositions. We evaluate our approach experimentally, and show applications of our algorithms to obtain better results for differential privacy by orders of magnitude. Comment: To be published in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.
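
    To make the notion concrete, the following toy sketch (my own illustration, not the paper's join-tree algorithm) computes the local sensitivity of the counting query Q(D) = |R JOIN S| for two binary relations joined on a shared attribute: deleting a tuple removes as many answers as its join key's degree on the other side, and inserting a fresh tuple can add at most the maximum such degree.

```python
from collections import Counter
from itertools import product

# Toy instance for Q(D) = |R JOIN S| on the shared attribute b.
R = [(1, "x"), (2, "x"), (3, "y")]     # tuples (a, b)
S = [("x", 10), ("x", 11), ("y", 12)]  # tuples (b, c)

def count_join(r, s):
    """Naive nested-loop evaluation of the counting join query."""
    return sum(1 for (_, b), (b2, _) in product(r, s) if b == b2)

def local_sensitivity(r, s):
    """Maximum change in Q(D) caused by adding or deleting a single tuple.

    Deleting an R-tuple with join key b removes deg_S(b) answers (and
    symmetrically for S); inserting a fresh tuple adds at most the maximum
    degree found in the opposite relation.
    """
    deg_S = Counter(b for b, _ in s)
    deg_R = Counter(b for _, b in r)
    delete = max([deg_S[b] for _, b in r] + [deg_R[b] for b, _ in s], default=0)
    insert = max(max(deg_S.values(), default=0), max(deg_R.values(), default=0))
    return max(delete, insert)

print(count_join(R, S), local_sensitivity(R, S))  # 5 answers, local sensitivity 2
```

    The paper's approach avoids exactly this kind of per-instance recomputation on large databases by summarizing tuple sensitivities along a join tree.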

    Enumeration Complexity of Conjunctive Queries with Functional Dependencies

    We study the complexity of enumerating the answers of Conjunctive Queries (CQs) in the presence of Functional Dependencies (FDs). Our focus is on the ability to list output tuples with a constant delay in between, following a linear-time preprocessing. A known dichotomy classifies the acyclic self-join-free CQs into those that admit such enumeration, and those that do not. However, this classification no longer holds in the common case where the database exhibits dependencies among attributes. That is, some queries that are classified as hard are in fact tractable if dependencies are accounted for. We establish a generalization of the dichotomy to accommodate FDs; hence, our classification determines which combination of a CQ and a set of FDs admits constant-delay enumeration with a linear-time preprocessing. In addition, we generalize a hardness result for cyclic CQs to accommodate a common type of FDs. Further conclusions of our development include a dichotomy for enumeration with linear delay, and a dichotomy for CQs with disequalities. Finally, we show that all our results apply to the known class of "cardinality dependencies" that generalize FDs (e.g., by stating an upper bound on the number of genres per movie, or friends per person).
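
    For intuition, here is a minimal sketch (my own toy example, not from the paper) of what "linear-time preprocessing plus constant delay" means for the full acyclic join Q(x, y, z) :- R(x, y), S(y, z): a semi-join reduction discards dangling R-tuples so that every remaining partial match is guaranteed to extend to an answer, and a generator then emits answers with constant work in between. The paper's dichotomy, and its refinement under functional dependencies, determines which CQs admit this behaviour in general.

```python
from collections import defaultdict

# Toy relations for the full acyclic CQ  Q(x, y, z) :- R(x, y), S(y, z).
R = [(1, "a"), (2, "a"), (3, "b"), (4, "c")]  # (x, y)
S = [("a", 10), ("a", 11), ("b", 12)]         # (y, z)

def preprocess(r, s):
    """Linear-time preprocessing: index S by y and semi-join-reduce R,
    so every surviving R-tuple has at least one matching S-tuple."""
    s_index = defaultdict(list)
    for y, z in s:
        s_index[y].append(z)
    r_reduced = [(x, y) for x, y in r if y in s_index]  # drop dangling tuples
    return r_reduced, s_index

def enumerate_answers(r_reduced, s_index):
    """Constant delay: only a bounded amount of work happens between outputs,
    because every R-tuple we visit is known to produce at least one answer."""
    for x, y in r_reduced:
        for z in s_index[y]:
            yield (x, y, z)

r_reduced, s_index = preprocess(R, S)
for answer in enumerate_answers(r_reduced, s_index):
    print(answer)
```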

    Differentially private data publishing via cross-moment microaggregation

    Differential privacy is one of the most prominent privacy notions in the field of anonymization. However, its strong privacy guarantees very often come at the expense of significantly degrading the utility of the protected data. To cope with this, numerous mechanisms have been studied that reduce the sensitivity of the data and hence the noise required to satisfy this notion. In this paper, we present a generalization of classical microaggregation, where the aggregated records are replaced by the group mean and additional statistical measures, with the purpose of evaluating it as a sensitivity reduction mechanism. We propose an anonymization methodology for numerical microdata in which the target of protection is a data set microaggregated in this generalized way, and the disclosure risk limitation is guaranteed through differential privacy via record-level perturbation. Specifically, we describe three anonymization algorithms where microaggregation can be applied to either entire records or groups of attributes independently. Our theoretical analysis computes the sensitivities of the first two central cross moments; we apply fundamental results from matrix perturbation theory to derive sensitivity bounds on the eigenvalues and eigenvectors of the covariance and coskewness matrices. Our extensive experimental evaluation shows that data utility can be enhanced significantly for medium to large sizes of the microaggregation groups. For this range of group sizes, we find experimental evidence that our approach can provide not only higher utility but also higher privacy than traditional microaggregation. The authors are thankful to A. Azzalini for his clarifications on the sampling of multivariate skew-normal distributions. Partial support to this work has been received from the European Commission (projects H2020-644024 "CLARUS" and H2020-700540 "CANVAS"), the Government of Catalonia (ICREA Academia Prize to J. Domingo-Ferrer), and the Spanish Government (projects TIN2014-57364-C2-1-R "Smart-Glacis" and TIN2016-80250-R "Sec-MCloud"). J. Parra-Arnau is the recipient of a Juan de la Cierva postdoctoral fellowship, FJCI-2014-19703, from the Spanish Ministry of Economy and Competitiveness. The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.
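
    The sketch below is a heavily simplified, hypothetical variant of the idea (not one of the paper's three algorithms): numerical records are clipped to a known range, grouped into cells of k, replaced by the group mean, and the released means are perturbed with Laplace noise scaled to the reduced per-attribute sensitivity (range)/k under bounded differential privacy; privacy-budget accounting across attributes and the paper's cross-moment (covariance and coskewness) bounds are omitted.

```python
import numpy as np

def microaggregate_and_perturb(data, k, epsilon, lo, hi):
    """Toy microaggregation + differential privacy sketch.

    Records are clipped to [lo, hi], sorted along the first attribute (a crude
    one-dimensional grouping heuristic), cut into groups of k, replaced by the
    group means, and each mean is perturbed with Laplace noise of scale
    ((hi - lo) / k) / epsilon. Budget composition across attributes is ignored
    for brevity.
    """
    data = np.clip(np.asarray(data, dtype=float), lo, hi)
    order = np.argsort(data[:, 0])
    usable = len(order) - len(order) % k
    groups = [order[i:i + k] for i in range(0, usable, k)]
    sensitivity = (hi - lo) / k  # changing one record moves a group mean by at most this
    means = np.array([data[g].mean(axis=0) for g in groups])
    return means + np.random.laplace(0.0, sensitivity / epsilon, size=means.shape)

rng = np.random.default_rng(0)
sample = rng.normal(50, 10, size=(30, 2))
print(microaggregate_and_perturb(sample, k=5, epsilon=1.0, lo=0.0, hi=100.0))
```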

    Multi-level Policy-aware Privacy Analysis

    The NAPLES (Novel Tools for Analysing Privacy Leakages) project is a research initiative conducted as a collaboration between Cybernetica AS and the University of Tartu, funded by the Brandeis program of the Defense Advanced Research Projects Agency (DARPA). The project has produced the theory and a set of tools for the analysis of privacy-related concerns, to determine the potential leakage of data from information systems. Specifically, PLEAK is a tool that takes as input business processes specified with the Business Process Model and Notation (BPMN), where model entities are annotated with computational details and privacy-enhancing technologies, in order to enable the analysis of privacy concerns at different levels of granularity. Over time, the NAPLES project has produced several analyzers. These analyzers target SQL collaborative workflows, that is, BPMN collaborative models whose steps of computation correspond to SQL manipulation statements over the data objects representing the SQL data sources. The simple disclosure analysis performs a high-level data-reachability analysis that reveals potential data leakages in the privacy-enhanced model of a business process: it tells whether a data object is visible to a given party.
    Other analyzers, such as the Leaks-When and the Guessing Advantage ones, provide finer-grained qualitative and quantitative measures of data leakage to stakeholders. My work was part of the NAPLES project and my contributions are manifold. First, I added the concept of Global and Local privacy policies to SQL collaborative workflows, which grant a party of the business process access rights to selected SQL entities under defined constraints. Second, I designed an integrated multi-level approach to the disclosure analysis: from the high-level declarative disclosure (What data might leak?) to the conditional disclosure (When does data leak?) and a quantitative measure (How much data leaks?). This approach builds on the existing PLEAK analyzers; however, I refined these tools to accept a more unified set of inputs and integrated the privacy policies with the Leaks-When and Guessing Advantage analyzers. Finally, I developed a case study that showcases the aforementioned integrated multi-level approach to the disclosure analysis and serves as a proof of concept for the NAPLES tools.
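
    As a purely illustrative sketch of the first analysis level, the toy code below (using a hypothetical data model and policy format, not PLEAK's actual input language) computes a binary disclosure matrix for an SQL collaborative workflow: which data objects each party can see, and where that visibility exceeds the party's local privacy policy. The Leaks-When and Guessing Advantage analyses would then refine such findings qualitatively and quantitatively.

```python
# Hypothetical, simplified workflow model: each task belongs to a party and
# reads/writes named data objects (standing in for SQL tables).
tasks = [
    {"party": "Hospital", "reads": ["patients"],   "writes": ["aggregates"]},
    {"party": "Ministry", "reads": ["aggregates"], "writes": ["report"]},
    {"party": "Research", "reads": ["report"],     "writes": []},
]

# Hypothetical local privacy policies: objects each party is allowed to access.
policy = {
    "Hospital": {"patients", "aggregates"},
    "Ministry": {"aggregates", "report"},
    "Research": {"report"},
}

def simple_disclosure(tasks):
    """Binary ('simple') disclosure: which data objects are visible to which party."""
    visible = {}
    for t in tasks:
        visible.setdefault(t["party"], set()).update(t["reads"], t["writes"])
    return visible

def policy_violations(visible, policy):
    """Objects a party sees in the workflow but is not granted by its policy."""
    return {p: objs - policy.get(p, set())
            for p, objs in visible.items() if objs - policy.get(p, set())}

visible = simple_disclosure(tasks)
print(visible)
print(policy_violations(visible, policy))
```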