13 research outputs found

    Data Sanitisation Protocols for the Privacy Funnel with Differential Privacy Guarantees

    Get PDF
    In the Open Data approach, governments and other public organisations want to share their datasets with the public, for accountability and to support participation. Data must be opened in such a way that individual privacy is safeguarded. The Privacy Funnel is a mathematical approach that produces a sanitised database that does not leak private data beyond a chosen threshold. The downsides to this approach are that it does not give worst-case privacy guarantees, and that finding optimal sanitisation protocols can be computationally prohibitive. We tackle these problems by using differential privacy metrics, and by considering local protocols which operate on one entry at a time. We show that under both the Local Differential Privacy and Local Information Privacy leakage metrics, one can efficiently obtain optimal protocols. Furthermore, Local Information Privacy is both more closely aligned to the privacy requirements of the Privacy Funnel scenario and more efficiently computable. We also consider the scenario where each user has multiple attributes, for which we define Side-channel Resistant Local Information Privacy, and we give efficient methods to find protocols satisfying this criterion while still offering good utility. Finally, we introduce Conditional Reporting, an explicit LIP protocol that can be used when the optimal protocol is infeasible to compute. Experiments on real-world and synthetic data confirm the validity of these methods. Comment: This preprint is an extended version of arXiv:2002.01501 (Fourteenth International Conference on the Digital Society, 2020).
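    For intuition about local protocols that sanitise one entry at a time, here is a minimal Python sketch of a generic k-ary randomized response mechanism satisfying epsilon-Local Differential Privacy. It is a standard baseline shown only for illustration; it is not the paper's optimal protocol nor its Conditional Reporting mechanism, and the column and domain values are made up.

```python
import math
import random

def k_randomized_response(value, domain, epsilon):
    """Generic k-ary randomized response: an epsilon-LDP local protocol that
    sanitises a single database entry. With probability e^eps / (e^eps + k - 1)
    the true value is reported; otherwise a uniformly random other value is sent."""
    k = len(domain)
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return value
    return random.choice([v for v in domain if v != value])

# Example: sanitise an attribute column entry by entry (hypothetical data).
column = ["A", "B", "A", "C"]
sanitised = [k_randomized_response(v, ["A", "B", "C"], epsilon=1.0) for v in column]
print(sanitised)
```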

    Bottleneck Problems: Information and Estimation-Theoretic View

    Full text link
    Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, design of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding, and dependence dilution. Leveraging these connections, we prove a new cardinality bound for the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed bottleneck problems, by replacing mutual information in IB and PF with other notions of mutual information, namely f-information and Arimoto's mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. Although the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of lower convex or upper concave envelopes of certain functions. By applying this technique to the binary case, we derive closed-form expressions for several bottleneck problems.
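    As background for the objectives that IB and PF trade off, here is a minimal NumPy sketch that, given a joint distribution p(x, y) and a candidate stochastic map p(t|x) under the Markov chain Y - X - T, evaluates I(X;T) and I(Y;T): IB maximizes I(Y;T) subject to an upper bound on I(X;T), while PF minimizes the leakage I(Y;T) subject to a lower bound on I(X;T). The helper names and toy distributions are assumptions for illustration; the sketch does not reproduce the paper's envelope-based closed-form evaluation technique.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in nats for a joint distribution given as a 2-D array."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])))

def ib_pf_objectives(p_xy, p_t_given_x):
    """p_xy: joint p(x, y) with full support on x, shape |X| x |Y|.
    p_t_given_x: stochastic map p(t|x), rows summing to 1, shape |X| x |T|.
    Returns (I(X;T), I(Y;T)) under the Markov chain Y - X - T."""
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * p_t_given_x        # joint p(x, t)
    p_y_given_x = p_xy / p_x[:, None]        # conditional p(y|x)
    p_yt = p_y_given_x.T @ p_xt              # joint p(y, t)
    return mutual_information(p_xt), mutual_information(p_yt)

# Toy binary example with a symmetric "compression" channel from X to T.
p_xy = np.array([[0.4, 0.1], [0.1, 0.4]])
p_t_given_x = np.array([[0.9, 0.1], [0.1, 0.9]])
print(ib_pf_objectives(p_xy, p_t_given_x))
```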

    Achievable Schemes and Performance Bounds for Centralized and Distributed Index Coding

    Get PDF
    Index coding studies the efficient broadcast problem where a server broadcasts multiple messages to a group of receivers with side information. Through exploiting the receiver side information, the amount of required communication from the server can be significantly reduced. Thanks to its basic yet highly nontrivial model, index coding has been recognized as a canonical problem in network information theory, which is fundamentally connected with many other problems such as network coding, distributed storage, coded computation, and coded caching.

    In this thesis, we study the index coding problem both in its classic setting where the messages are stored at a centralized server, and also in a more general and practical setting where different subsets of messages are stored at multiple servers. In both scenarios the ultimate goal is to establish the capacity region, which contains all the communication rates simultaneously achievable for all the messages. While finding the index coding capacity region remains open in general, we characterize it through developing various inner and outer bounds.

    The inner bounds we propose on the capacity region are achievable rate regions, each associated with a concrete coding scheme. Our proposed coding schemes are built upon a two-layer random coding scheme referred to as composite coding, introduced by Arbabjolfaei et al. in 2013 for the classic centralized index coding problem. We first propose a series of simplifications for the composite coding scheme, and then enhance it through utilizing more flexible fractional allocation of the broadcast channel capacity. We also show that one can strictly improve composite coding by adding one more layer of random coding into the coding scheme. For the multi-server scenario, we generalize composite coding to a distributed version.

    The outer bounds characterize the fundamental performance limits enforced by the problem setup that hold generally for any valid coding scheme. The performance bounds we propose are based on Shannon-type inequalities. For the centralized index coding problem, we define a series of interfering message structures based on the receiver side information. Such structures lead to nontrivial generalizations of the alignment chain model in the literature, based upon which we propose a series of novel iterative performance bounds. For the multi-server scenario, our main result is a general outer bound built upon the polymatroidal axioms of the entropy function. This outer bound utilizes general groupings of servers of different levels of granularity, allowing a natural tradeoff between tightness and computational complexity.

    The security aspect of the index coding problem is also studied, for which a number of achievability and performance bounds on the optimal secure communication rate are established. To conclude this thesis, we investigate a privacy-preserving data publishing problem, whose model is inspired by index coding, and characterize its optimal privacy-utility tradeoff.
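    To make the broadcast-savings idea concrete, here is a minimal Python sketch of a classic baseline achievable scheme for single-unicast centralized index coding: greedily cover the side-information graph with cliques and serve each clique with one XOR of its messages. This is a textbook illustration on assumed toy data, not the composite coding scheme developed in the thesis.

```python
def greedy_clique_cover_code(wants, knows):
    """wants[i]: the message requested by receiver i.
    knows[i]: the set of messages receiver i already has as side information.
    Receivers i and j can share one XOR transmission iff each knows the other's
    requested message (i.e., they are adjacent in the side-information graph).
    Returns a list of message sets, one XOR transmission per set."""
    adjacent = lambda i, j: wants[j] in knows[i] and wants[i] in knows[j]
    unassigned = set(range(len(wants)))
    transmissions = []
    while unassigned:
        i = min(unassigned)
        clique = [i]
        for j in sorted(unassigned - {i}):
            if all(adjacent(j, k) for k in clique):
                clique.append(j)
        unassigned -= set(clique)
        transmissions.append({wants[k] for k in clique})
    return transmissions

# Three receivers, each wanting x_i and knowing the other two messages:
# one XOR (x1 ^ x2 ^ x3) replaces three uncoded transmissions.
wants = ["x1", "x2", "x3"]
knows = [{"x2", "x3"}, {"x1", "x3"}, {"x1", "x2"}]
print(greedy_clique_cover_code(wants, knows))
```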

    Deep Learning-Enabled Semantic Communication Systems with Task-Unaware Transmitter and Dynamic Data

    Full text link
    Existing deep learning-enabled semantic communication systems often rely on shared background knowledge between the transmitter and receiver that includes empirical data and their associated semantic information. In practice, the semantic information is defined by the pragmatic task of the receiver and cannot be known to the transmitter. The actual observable data at the transmitter can also have a distribution different from that of the empirical data in the shared background knowledge library. To address these practical issues, this paper proposes a new neural network-based semantic communication system for image transmission, where the transmitter is unaware of the task and the data environment is dynamic. The system consists of two main parts, namely the semantic coding (SC) network and the data adaptation (DA) network. The SC network learns how to extract and transmit the semantic information using a receiver-leading training process. By using the domain adaptation technique from transfer learning, the DA network learns how to convert the observed data into a form similar to the empirical data, which the SC network can process without retraining. Numerical experiments show that the proposed method adapts to observable datasets while maintaining high performance in terms of both data recovery and task execution.
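    As a rough structural sketch of a semantic coding pipeline of this kind, the PyTorch snippet below wires an encoder, an additive-noise channel, a reconstruction decoder, and a receiver-side task head. All layer sizes, the AWGN channel, and the module names are assumptions made for illustration; the paper's actual SC and DA architectures and its receiver-leading training procedure are not reproduced here.

```python
import torch
import torch.nn as nn

class SemanticCodingSketch(nn.Module):
    """Illustrative semantic coding pipeline: encoder -> noisy channel ->
    decoder for image reconstruction, plus a task head at the receiver."""

    def __init__(self, image_dim=784, latent_dim=32, num_classes=10, noise_std=0.1):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(image_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, image_dim))
        self.task_head = nn.Linear(latent_dim, num_classes)  # receiver-side pragmatic task
        self.noise_std = noise_std

    def forward(self, x):
        z = self.encoder(x)                                  # transmitted semantic features
        z_noisy = z + self.noise_std * torch.randn_like(z)   # assumed AWGN channel
        return self.decoder(z_noisy), self.task_head(z_noisy)

# Example forward pass on a batch of flattened 28x28 images (random data).
model = SemanticCodingSketch()
reconstruction, task_logits = model(torch.rand(8, 784))
```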

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Get PDF

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 244, ESA 2022, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 248, ISAAC 2022, Complete Volume