4 research outputs found

    Data Sanitisation Protocols for the Privacy Funnel with Differential Privacy Guarantees

    In the Open Data approach, governments and other public organisations want to share their datasets with the public, for accountability and to support participation. Data must be opened in such a way that individual privacy is safeguarded. The Privacy Funnel is a mathematical approach that produces a sanitised database that does not leak private data beyond a chosen threshold. The downsides to this approach are that it does not give worst-case privacy guarantees, and that finding optimal sanitisation protocols can be computationally prohibitive. We tackle these problems by using differential privacy metrics, and by considering local protocols which operate on one entry at a time. We show that under both the Local Differential Privacy (LDP) and Local Information Privacy (LIP) leakage metrics, one can efficiently obtain optimal protocols. Furthermore, Local Information Privacy is both more closely aligned to the privacy requirements of the Privacy Funnel scenario, and more efficiently computable. We also consider the scenario where each user has multiple attributes, for which we define Side-channel Resistant Local Information Privacy, and we give efficient methods to find protocols satisfying this criterion while still offering good utility. Finally, we introduce Conditional Reporting, an explicit LIP protocol that can be used when the optimal protocol is infeasible to compute. Experiments on real-world and synthetic data confirm the validity of these methods. Comment: This preprint is an extended version of arXiv:2002.01501 (Fourteenth International Conference on the Digital Society, 2020).
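
The two leakage metrics named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions, which the abstract itself does not spell out: a protocol Q(y|x) has LDP leakage eps if Q(y|x) <= exp(eps) * Q(y|x') for all x, x', y, and LIP leakage eps if exp(-eps) <= P(y|s)/P(y) <= exp(eps) for all secret values s and outputs y. The function names, the toy joint distribution and the randomised-response protocol are hypothetical and are not taken from the paper.

```python
import math

def ldp_epsilon(Q):
    """Smallest eps with Q[x][y] <= exp(eps) * Q[x2][y] for all x, x2, y."""
    eps = 0.0
    for y in range(len(Q[0])):
        column = [row[y] for row in Q]
        if max(column) == 0:
            continue  # output y is never produced by any input
        if min(column) == 0:
            return math.inf  # some input rules out y entirely
        eps = max(eps, math.log(max(column) / min(column)))
    return eps

def lip_epsilon(joint_sx, Q):
    """Smallest eps with exp(-eps) <= P(y|s) / P(y) <= exp(eps) for all s, y.

    joint_sx[s][x] = P(S=s, X=x). Assumes (for simplicity) that every
    pair (s, y) has strictly positive probability.
    """
    n_y = len(Q[0])
    # P(S=s, Y=y) = sum_x P(s, x) * Q(y | x)
    p_sy = [[sum(p * Q[x][y] for x, p in enumerate(row)) for y in range(n_y)]
            for row in joint_sx]
    p_s = [sum(row) for row in joint_sx]
    p_y = [sum(row[y] for row in p_sy) for y in range(n_y)]
    return max(abs(math.log(p_sy[s][y] / (p_s[s] * p_y[y])))
               for s in range(len(joint_sx)) for y in range(n_y))

# Hypothetical toy data: binary secret S correlated with binary attribute X.
joint_sx = [[0.4, 0.1],
            [0.1, 0.4]]
# Randomised response on X: report the true value with probability 0.75.
Q = [[0.75, 0.25],
     [0.25, 0.75]]

print("LDP leakage:", ldp_epsilon(Q))
print("LIP leakage:", lip_epsilon(joint_sx, Q))
```

Under these assumed definitions the LDP value depends only on the protocol, while the LIP value also depends on the joint distribution of the secret and the data.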

    Bottleneck Problems: Information and Estimation-Theoretic View

    Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, design of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding and dependence dilution. Leveraging these connections, we prove a new cardinality bound for the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed bottleneck problems, by replacing mutual information in IB and PF with other notions of mutual information, namely f-information and Arimoto's mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. Although the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of the lower convex or upper concave envelope of certain functions. By applying this technique to the binary case, we derive closed-form expressions for several bottleneck problems.
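
As a rough illustration of the two quantities that IB and PF trade off, the sketch below evaluates I(X;T) and I(Y;T) for a single candidate channel P(T|X). It assumes the usual formulation, which the abstract does not state explicitly: T is generated from X alone (Markov chain Y - X - T), IB maximises I(Y;T) subject to an upper bound on I(X;T), and PF minimises I(Y;T) subject to a lower bound on I(X;T). The helper names and the numbers are hypothetical, and the code does not solve either optimisation problem.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits, where joint[a][b] = P(A=a, B=b)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0)

def push_through(joint_yx, channel):
    """Joint laws of (X, T) and (Y, T) for the Markov chain Y - X - T.

    joint_yx[y][x] = P(Y=y, X=x); channel[x][t] = P(T=t | X=x).
    """
    p_x = [sum(col) for col in zip(*joint_yx)]
    n_t = len(channel[0])
    joint_xt = [[p_x[x] * w for w in channel[x]] for x in range(len(p_x))]
    joint_yt = [[sum(row[x] * channel[x][t] for x in range(len(row)))
                 for t in range(n_t)] for row in joint_yx]
    return joint_xt, joint_yt

# Hypothetical example: binary Y and X, and a binary symmetric channel
# with crossover probability 0.1 from X to T.
joint_yx = [[0.4, 0.1],
            [0.1, 0.4]]
channel = [[0.9, 0.1],
           [0.1, 0.9]]

joint_xt, joint_yt = push_through(joint_yx, channel)
i_xt = mutual_information(joint_xt)
i_yt = mutual_information(joint_yt)
print("I(X;T) =", i_xt, "bits")
print("I(Y;T) =", i_yt, "bits")
```

Sweeping over many candidate channels and recording the pairs (I(X;T), I(Y;T)) would trace out the region whose upper and lower boundaries correspond to IB and PF under this formulation.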

    Achievable Schemes and Performance Bounds for Centralized and Distributed Index Coding

    Index coding studies the efficient broadcast problem where a server broadcasts multiple messages to a group of receivers with side information. Through exploiting the receiver side information, the amount of required communication from the server can be significantly reduced. Thanks to its basic yet highly nontrivial model, index coding has been recognized as a canonical problem in network information theory, which is fundamentally connected with many other problems such as network coding, distributed storage, coded computation, and coded caching. In this thesis, we study the index coding problem both in its classic setting where the messages are stored at a centralized server, and also in a more general and practical setting where different subsets of messages are stored at multiple servers. In both scenarios the ultimate goal is to establish the capacity region, which contains all the communication rates simultaneously achievable for all the messages. While finding the index coding capacity region remains open in general, we characterize it through developing various inner and outer bounds. The inner bounds we propose on the capacity region are achievable rate regions, each associated with a concrete coding scheme. Our proposed coding schemes are built upon a two-layer random coding scheme referred to as composite coding, introduced by Arbabjolfaei et al. in 2013 for the classic centralized index coding problem. We first propose a series of simplifications for the composite coding scheme, and then enhance it through utilizing more flexible fractional allocation of the broadcast channel capacity. We also show that one can strictly improve composite coding by adding one more layer of random coding into the coding scheme. For the multi-server scenario, we generalize composite coding to a distributed version. The outer bounds characterize the fundamental performance limits enforced by the problem setup that hold generally for any valid coding scheme. 
    The performance bounds we propose are based on Shannon-type inequalities. For the centralized index coding problem, we define a series of interfering message structures based on the receiver side information. Such structures lead to nontrivial generalizations of the alignment chain model in the literature, based upon which we propose a series of novel iterative performance bounds. For the multi-server scenario, our main result is a general outer bound built upon the polymatroidal axioms of the entropy function. This outer bound utilizes general groupings of servers of different levels of granularity, allowing a natural tradeoff between tightness and computational complexity. The security aspect of the index coding problem is also studied, for which a number of achievability and performance bounds on the optimal secure communication rate are established. To conclude this thesis, we investigate a privacy-preserving data publishing problem, whose model is inspired by index coding, and characterize its optimal privacy-utility tradeoff.
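
The saving that receiver side information makes possible can be seen in the standard two-receiver textbook example, sketched below. This is not one of the thesis's coding schemes: the message values and names are hypothetical, and a single XOR broadcast stands in for the far more general composite coding discussed in the abstract.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("messages must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical messages held by a single (centralized) server.
messages = {1: b"\x0f\xa0", 2: b"\x33\x11"}

# Receiver 1 wants message 1 and already knows message 2;
# receiver 2 wants message 2 and already knows message 1.
side_info = {1: messages[2], 2: messages[1]}

# One coded broadcast replaces two uncoded transmissions.
broadcast = xor_bytes(messages[1], messages[2])

# Each receiver cancels its side information to recover its own message.
decoded = {r: xor_bytes(broadcast, side_info[r]) for r in (1, 2)}
print(decoded[1] == messages[1], decoded[2] == messages[2])
```

Here the server sends one message-length transmission instead of two, which is the kind of reduction in required communication that the capacity region quantifies in general.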

    LIPIcs, Volume 244, ESA 2022, Complete Volume
