Search CORE

4 research outputs found

Data Sanitisation Protocols for the Privacy Funnel with Differential Privacy Guarantees

Author: Lopuhaä-Zwakenberg Milan
Tong Haochen
Škorić Boris
Publication date: 30/08/2020

In the Open Data approach, governments and other public organisations want to share their datasets with the public, for accountability and to support participation. Data must be opened in such a way that individual privacy is safeguarded. The Privacy Funnel is a mathematical approach that produces a sanitised database that does not leak private data beyond a chosen threshold. The downsides to this approach are that it does not give worst-case privacy guarantees, and that finding optimal sanitisation protocols can be computationally prohibitive. We tackle these problems by using differential privacy metrics, and by considering local protocols which operate on one entry at a time. We show that under both the Local Differential Privacy and Local Information Privacy leakage metrics, one can efficiently obtain optimal protocols. Furthermore, Local Information Privacy is both more closely aligned to the privacy requirements of the Privacy Funnel scenario, and more efficiently computable. We also consider the scenario where each user has multiple attributes, for which we define Side-channel Resistant Local Information Privacy, and we give efficient methods to find protocols satisfying this criterion while still offering good utility. Finally, we introduce Conditional Reporting, an explicit LIP protocol that can be used when the optimal protocol is infeasible to compute, and we test this protocol on real-world and synthetic data. Experiments on real-world and synthetic data confirm the validity of these methods.

Comment: This preprint is an extended version of arXiv:2002.01501 (Fourteenth International Conference on the Digital Society, 2020).
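The two leakage metrics named in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the private attribute is the protocol input itself, uses the commonly stated definitions (LDP bounds the ratio Q(y|x)/Q(y|x'), LIP bounds the ratio Q(y|x)/P(y) from both sides), and uses k-ary randomized response as a stand-in protocol. The function names and the prior are hypothetical.

```python
import math

def ldp_epsilon(Q):
    """Smallest eps with Q[y][x] <= e^eps * Q[y][x'] for all y, x, x'.
    Assumes every entry of Q is strictly positive."""
    return max(math.log(max(row) / min(row)) for row in Q)

def lip_epsilon(Q, prior):
    """Smallest eps with e^-eps <= Q[y][x] / P(y) <= e^eps for all x, y,
    where P(y) is the output distribution induced by the prior."""
    eps = 0.0
    for row in Q:
        p_y = sum(q * p for q, p in zip(row, prior))
        for q in row:
            eps = max(eps, abs(math.log(q / p_y)))
    return eps

def randomized_response(k, eps):
    """k-ary randomized response: Q[y][x] = P(report y | true value x)."""
    keep = math.exp(eps) / (math.exp(eps) + k - 1)
    flip = 1.0 / (math.exp(eps) + k - 1)
    return [[keep if y == x else flip for x in range(k)] for y in range(k)]

Q = randomized_response(3, 1.0)
eps_ldp = ldp_epsilon(Q)                     # worst case over pairs of inputs
eps_lip = lip_epsilon(Q, [0.5, 0.3, 0.2])    # depends on the assumed prior
```

Under these definitions the LIP level never exceeds the LDP level, because P(y) is a weighted average of the entries in row y.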

arXiv.org e-Print Archive

Pure OAI Repository

Bottleneck Problems: Information and Estimation-Theoretic View

Author: Asoodeh Shahab
Calmon Flavio
Publication venue: MDPI AG
Publication date: 12/11/2020

Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, design of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding and dependence dilution. Leveraging these connections, we prove a new cardinality bound for the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed bottleneck problems, by replacing mutual information in IB and PF with other notions of mutual information, namely f-information and Arimoto's mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. Although the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of the lower convex or upper concave envelope of certain functions. By applying this technique to the binary case, we derive closed-form expressions for several bottleneck problems.
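For orientation, the two problems are usually stated as follows. This formulation is not quoted from the listing; the symbols X (observed data), Y (correlated variable), T (released representation), and the thresholds R and r are assumptions following common usage, with T generated from X alone so that Y, X, T form a Markov chain.

```latex
\mathrm{IB}(R) = \sup_{P_{T|X}\,:\; I(X;T) \le R} I(Y;T),
\qquad
\mathrm{PF}(r) = \inf_{P_{T|X}\,:\; I(X;T) \ge r} I(Y;T).
```

Read this way, IB keeps as much information about Y as a compression budget allows, while PF leaks as little about Y as a utility floor allows.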

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Achievable Schemes and Performance Bounds for Centralized and Distributed Index Coding

Author: Liu Yucheng
Publication date: 01/01/2021

Index coding studies the efficient broadcast problem where a server broadcasts multiple messages to a group of receivers with side information. Through exploiting the receiver side information, the amount of required communication from the server can be significantly reduced. Thanks to its basic yet highly nontrivial model, index coding has been recognized as a canonical problem in network information theory, which is fundamentally connected with many other problems such as network coding, distributed storage, coded computation, and coded caching. In this thesis, we study the index coding problem both in its classic setting where the messages are stored at a centralized server, and also in a more general and practical setting where different subsets of messages are stored at multiple servers. In both scenarios the ultimate goal is to establish the capacity region, which contains all the communication rates simultaneously achievable for all the messages. While finding the index coding capacity region remains open in general, we characterize it through developing various inner and outer bounds. The inner bounds we propose on the capacity region are achievable rate regions, each associated with a concrete coding scheme. Our proposed coding schemes are built upon a two-layer random coding scheme referred to as composite coding, introduced by Arbabjolfaei et al. in 2013 for the classic centralized index coding problem. We first propose a series of simplifications for the composite coding scheme, and then enhance it through utilizing more flexible fractional allocation of the broadcast channel capacity. We also show that one can strictly improve composite coding by adding one more layer of random coding into the coding scheme. For the multi-server scenario, we generalize composite coding to a distributed version. The outer bounds characterize the fundamental performance limits enforced by the problem setup that hold generally for any valid coding scheme. 
The performance bounds we propose are based on Shannon-type inequalities. For the centralized index coding problem, we define a series of interfering message structures based on the receiver side information. Such structures lead to nontrivial generalizations of the alignment chain model in the literature, based upon which we propose a series of novel iterative performance bounds. For the multi-server scenario, our main result is a general outer bound built upon the polymatroidal axioms of the entropy function. This outer bound utilizes general groupings of servers of different levels of granularity, allowing a natural tradeoff between tightness and computational complexity. The security aspect of the index coding problem is also studied, for which a number of achievability and performance bounds on the optimal secure communication rate are established. To conclude this thesis, we investigate a privacy-preserving data publishing problem, whose model is inspired by index coding, and characterize its optimal privacy-utility tradeoff.
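The saving from receiver side information can be seen in the textbook two-receiver case, sketched below. This is an illustrative toy, not a scheme from the thesis: each receiver wants one message and already holds the other, so a single XOR broadcast serves both. The function names and message values are hypothetical.

```python
def broadcast(m1: int, m2: int) -> int:
    """Server sends one coded symbol instead of two separate messages."""
    return m1 ^ m2

def decode(coded: int, side_info: int) -> int:
    """A receiver cancels the message it already knows to recover the other."""
    return coded ^ side_info

m1, m2 = 0b1011, 0b0110
coded = broadcast(m1, m2)

r1 = decode(coded, m2)  # receiver 1 wants m1 and holds m2 as side information
r2 = decode(coded, m1)  # receiver 2 wants m2 and holds m1 as side information
```

Without side information the server would need two transmissions of this size; here one suffices.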

The Australian National University

LIPIcs, Volume 244, ESA 2022, Complete Volume

Author: Chechik Shiri
Herman Grzegorz
Navarro Gonzalo
Rotenberg Eva
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 01/01/2022

LIPIcs, Volume 244, ESA 2022, Complete Volume

Dagstuhl Research Online Publication Server