10 research outputs found

    Sensitivity of Counting Queries

    In the context of statistical databases, the release of accurate statistical information about the collected data often puts at risk the privacy of the individual contributors. The goal of differential privacy is to maximise the utility of a query while protecting the individual records in the database. A natural way to achieve differential privacy is to add statistical noise to the result of the query. In this context, a mechanism for releasing statistical information is thus a trade-off between utility and privacy. In order to balance these two "conflicting" requirements, privacy-preserving mechanisms calibrate the added noise to the so-called sensitivity of the query, and thus a precise estimate of the sensitivity of the query is necessary to determine the amplitude of the noise to be added. In this paper, we initiate a systematic study of the sensitivity of counting queries over relational databases. We first observe that the sensitivity of a Relational Algebra query with counting is not computable in general, and that while the sensitivity of Conjunctive Queries with counting is computable, it becomes unbounded as soon as the query includes a join. We then consider restricted classes of databases (databases with constraints) and study the problem of computing the sensitivity of a query given such constraints. We are able to establish bounds on the sensitivity of counting conjunctive queries over constrained databases. The kinds of constraints studied here are functional dependencies and cardinality dependencies; the latter are a natural generalisation of functional dependencies that allows us to provide tight bounds on the sensitivity of counting conjunctive queries.
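
    As a rough illustration of how noise is calibrated to sensitivity, the sketch below (a toy example of my own, not taken from the paper) applies the standard Laplace mechanism to a plain counting query, whose global sensitivity is 1 because adding or removing one record changes the count by at most 1; as the abstract notes, this calibration becomes problematic once joins make the sensitivity large or unbounded.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with Laplace noise of scale sensitivity/epsilon,
    the standard calibration for epsilon-differential privacy."""
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Toy database and counting query: count the records with age >= 30.
db = [("alice", 34), ("bob", 29), ("carol", 41)]
true_count = sum(1 for _, age in db if age >= 30)

# A single count has sensitivity 1, so the noise scale is 1/epsilon.
print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
```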

    Computing Local Sensitivities of Counting Queries with Joins

    Local sensitivity of a query Q given a database instance D, i.e. how much the output Q(D) changes when a tuple is added to D or deleted from D, has many applications, including query analysis, outlier detection, and differential privacy. However, it is NP-hard to find the local sensitivity of a conjunctive query in terms of the size of the query, even for the class of acyclic queries. Although the complexity is polynomial when the query size is fixed, the naive algorithms are not efficient for large databases and queries involving multiple joins. In this paper, we present a novel approach to compute the local sensitivity of counting queries involving join operations by tracking and summarizing tuple sensitivities -- the maximum change a tuple can cause in the query result when it is added or removed. We give algorithms for the sensitivity problem for full acyclic join queries using join trees that run in polynomial time in both the size of the database and the query for an interesting sub-class of queries, which we call 'doubly acyclic queries' and which includes path queries, and in polynomial time in combined complexity when the maximum degree in the join tree is bounded. Our algorithms can be extended to certain non-acyclic queries using generalized hypertree decompositions. We evaluate our approach experimentally, and show applications of our algorithms to obtain better results for differential privacy by orders of magnitude. Comment: To be published in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.
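
    To make the notion concrete, the following toy sketch (my own illustration, not the paper's join-tree algorithm) computes the local sensitivity of the counting query Q(D) = |R JOIN S| for two binary relations joined on a shared attribute: deleting a tuple removes as many answers as its join key's degree on the other side, and inserting a fresh tuple can add at most the maximum such degree.

```python
from collections import Counter
from itertools import product

# Toy instance for Q(D) = |R JOIN S| on the shared attribute b.
R = [(1, "x"), (2, "x"), (3, "y")]     # tuples (a, b)
S = [("x", 10), ("x", 11), ("y", 12)]  # tuples (b, c)

def count_join(r, s):
    """Naive nested-loop evaluation of the counting join query."""
    return sum(1 for (_, b), (b2, _) in product(r, s) if b == b2)

def local_sensitivity(r, s):
    """Maximum change in Q(D) caused by adding or deleting a single tuple.

    Deleting an R-tuple with join key b removes deg_S(b) answers (and
    symmetrically for S); inserting a fresh tuple adds at most the maximum
    degree found in the opposite relation.
    """
    deg_S = Counter(b for b, _ in s)
    deg_R = Counter(b for _, b in r)
    delete = max([deg_S[b] for _, b in r] + [deg_R[b] for b, _ in s], default=0)
    insert = max(max(deg_S.values(), default=0), max(deg_R.values(), default=0))
    return max(delete, insert)

print(count_join(R, S), local_sensitivity(R, S))  # 5 answers, local sensitivity 2
```

    The paper's approach avoids exactly this kind of per-instance recomputation on large databases by summarizing tuple sensitivities along a join tree.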

    Enumeration Complexity of Conjunctive Queries with Functional Dependencies

    We study the complexity of enumerating the answers of Conjunctive Queries (CQs) in the presence of Functional Dependencies (FDs). Our focus is on the ability to list output tuples with a constant delay in between, following a linear-time preprocessing. A known dichotomy classifies the acyclic self-join-free CQs into those that admit such enumeration, and those that do not. However, this classification no longer holds in the common case where the database exhibits dependencies among attributes. That is, some queries that are classified as hard are in fact tractable if dependencies are accounted for. We establish a generalization of the dichotomy to accommodate FDs; hence, our classification determines which combination of a CQ and a set of FDs admits constant-delay enumeration with a linear-time preprocessing. In addition, we generalize a hardness result for cyclic CQs to accommodate a common type of FDs. Further conclusions of our development include a dichotomy for enumeration with linear delay, and a dichotomy for CQs with disequalities. Finally, we show that all our results apply to the known class of "cardinality dependencies" that generalize FDs (e.g., by stating an upper bound on the number of genres per movie, or friends per person).
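
    For intuition, here is a minimal sketch (my own toy example, not from the paper) of what "linear-time preprocessing plus constant delay" means for the full acyclic join Q(x, y, z) :- R(x, y), S(y, z): a semi-join reduction discards dangling R-tuples so that every remaining partial match is guaranteed to extend to an answer, and a generator then emits answers with constant work in between. The paper's dichotomy, and its refinement under functional dependencies, determines which CQs admit this behaviour in general.

```python
from collections import defaultdict

# Toy relations for the full acyclic CQ  Q(x, y, z) :- R(x, y), S(y, z).
R = [(1, "a"), (2, "a"), (3, "b"), (4, "c")]  # (x, y)
S = [("a", 10), ("a", 11), ("b", 12)]         # (y, z)

def preprocess(r, s):
    """Linear-time preprocessing: index S by y and semi-join-reduce R,
    so every surviving R-tuple has at least one matching S-tuple."""
    s_index = defaultdict(list)
    for y, z in s:
        s_index[y].append(z)
    r_reduced = [(x, y) for x, y in r if y in s_index]  # drop dangling tuples
    return r_reduced, s_index

def enumerate_answers(r_reduced, s_index):
    """Constant delay: only a bounded amount of work happens between outputs,
    because every R-tuple we visit is known to produce at least one answer."""
    for x, y in r_reduced:
        for z in s_index[y]:
            yield (x, y, z)

r_reduced, s_index = preprocess(R, S)
for answer in enumerate_answers(r_reduced, s_index):
    print(answer)
```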

    Differentially private data publishing via cross-moment microaggregation

    Differential privacy is one of the most prominent privacy notions in the field of anonymization. However, its strong privacy guarantees very often come at the expense of significantly degrading the utility of the protected data. To cope with this, numerous mechanisms have been studied that reduce the sensitivity of the data and hence the noise required to satisfy this notion. In this paper, we present a generalization of classical microaggregation, where the aggregated records are replaced by the group mean and additional statistical measures, with the purpose of evaluating it as a sensitivity reduction mechanism. We propose an anonymization methodology for numerical microdata in which the target of protection is a data set microaggregated in this generalized way, and the disclosure risk limitation is guaranteed through differential privacy via record-level perturbation. Specifically, we describe three anonymization algorithms where microaggregation can be applied to either entire records or groups of attributes independently. Our theoretical analysis computes the sensitivities of the first two central cross moments; we apply fundamental results from matrix perturbation theory to derive sensitivity bounds on the eigenvalues and eigenvectors of the covariance and coskewness matrices. Our extensive experimental evaluation shows that data utility can be enhanced significantly for medium to large sizes of the microaggregation groups. For this range of group sizes, we find experimental evidence that our approach can provide not only higher utility but also higher privacy than traditional microaggregation. The authors are thankful to A. Azzalini for his clarifications on the sampling of multivariate skew-normal distributions. Partial support to this work has been received from the European Commission (projects H2020-644024 "CLARUS" and H2020-700540 "CANVAS"), the Government of Catalonia (ICREA Academia Prize to J. Domingo-Ferrer), and the Spanish Government (projects TIN2014-57364-C2-1-R "Smart-Glacis" and TIN2016-80250-R "Sec-MCloud"). J. Parra-Arnau is the recipient of a Juan de la Cierva postdoctoral fellowship, FJCI-2014-19703, from the Spanish Ministry of Economy and Competitiveness. The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.
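
    The sketch below is a heavily simplified, hypothetical variant of the idea (not one of the paper's three algorithms): numerical records are clipped to a known range, grouped into cells of k, replaced by the group mean, and the released means are perturbed with Laplace noise scaled to the reduced per-attribute sensitivity (range)/k under bounded differential privacy; privacy-budget accounting across attributes and the paper's cross-moment (covariance and coskewness) bounds are omitted.

```python
import numpy as np

def microaggregate_and_perturb(data, k, epsilon, lo, hi):
    """Toy microaggregation + differential privacy sketch.

    Records are clipped to [lo, hi], sorted along the first attribute (a crude
    one-dimensional grouping heuristic), cut into groups of k, replaced by the
    group means, and each mean is perturbed with Laplace noise of scale
    ((hi - lo) / k) / epsilon. Budget composition across attributes is ignored
    for brevity.
    """
    data = np.clip(np.asarray(data, dtype=float), lo, hi)
    order = np.argsort(data[:, 0])
    usable = len(order) - len(order) % k
    groups = [order[i:i + k] for i in range(0, usable, k)]
    sensitivity = (hi - lo) / k  # changing one record moves a group mean by at most this
    means = np.array([data[g].mean(axis=0) for g in groups])
    return means + np.random.laplace(0.0, sensitivity / epsilon, size=means.shape)

rng = np.random.default_rng(0)
sample = rng.normal(50, 10, size=(30, 2))
print(microaggregate_and_perturb(sample, k=5, epsilon=1.0, lo=0.0, hi=100.0))
```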

    Multi-level Policy-aware Privacy Analysis

    The NAPLES (Novel Tools for Analysing Privacy Leakages) project is a research initiative conducted as a collaboration between Cybernetica AS and the University of Tartu, funded by the Brandeis program of the Defense Advanced Research Projects Agency (DARPA). The project has produced the theory and a set of tools for the analysis of privacy-related concerns, to determine the potential leakage of data from information systems. Specifically, PLEAK is a tool that takes as input business processes specified with the Business Process Model and Notation (BPMN), where model entities are annotated with computational details and privacy-enhancing technologies, in order to enable the analysis of privacy concerns at different levels of granularity. Over time, the NAPLES project has produced several analyzers. These analyzers target SQL collaborative workflows, that is, BPMN collaborative models whose steps of computation correspond to SQL manipulation statements over the data objects representing the SQL data sources. The simple disclosure analysis performs a high-level data-reachability analysis that reveals potential data leakages in the privacy-enhanced model of a business process: it tells whether a data object is visible to a given party.
    Other analyzers, such as the Leaks-When and the Guessing Advantage ones, provide finer-grained qualitative and quantitative measures of data leakage to stakeholders. My work was part of the NAPLES project and my contributions are manifold. First, I added the concept of Global and Local privacy policies to SQL collaborative workflows, which grant a party of the business process access rights to selected SQL entities under defined constraints. Second, I designed an integrated multi-level approach to the disclosure analysis: from the high-level declarative disclosure (What data might leak?) to the conditional disclosure (When does data leak?) and a quantitative measure (How much data leaks?). This approach builds on the existing PLEAK analyzers; however, I refined these tools to accept a more unified set of inputs and integrated the privacy policies with the Leaks-When and Guessing Advantage analyzers. Finally, I developed a case study that showcases the aforementioned integrated multi-level approach to the disclosure analysis and serves as a proof of concept for the NAPLES tools.
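
    As a purely illustrative sketch of the first analysis level, the toy code below (using a hypothetical data model and policy format, not PLEAK's actual input language) computes a binary disclosure matrix for an SQL collaborative workflow: which data objects each party can see, and where that visibility exceeds the party's local privacy policy. The Leaks-When and Guessing Advantage analyses would then refine such findings qualitatively and quantitatively.

```python
# Hypothetical, simplified workflow model: each task belongs to a party and
# reads/writes named data objects (standing in for SQL tables).
tasks = [
    {"party": "Hospital", "reads": ["patients"],   "writes": ["aggregates"]},
    {"party": "Ministry", "reads": ["aggregates"], "writes": ["report"]},
    {"party": "Research", "reads": ["report"],     "writes": []},
]

# Hypothetical local privacy policies: objects each party is allowed to access.
policy = {
    "Hospital": {"patients", "aggregates"},
    "Ministry": {"aggregates", "report"},
    "Research": {"report"},
}

def simple_disclosure(tasks):
    """Binary ('simple') disclosure: which data objects are visible to which party."""
    visible = {}
    for t in tasks:
        visible.setdefault(t["party"], set()).update(t["reads"], t["writes"])
    return visible

def policy_violations(visible, policy):
    """Objects a party sees in the workflow but is not granted by its policy."""
    return {p: objs - policy.get(p, set())
            for p, objs in visible.items() if objs - policy.get(p, set())}

visible = simple_disclosure(tasks)
print(visible)
print(policy_violations(visible, policy))
```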