Data Sanitisation Protocols for the Privacy Funnel with Differential Privacy Guarantees
In the Open Data approach, governments and other public organisations want to
share their datasets with the public, for accountability and to support
participation. Data must be opened in such a way that individual privacy is
safeguarded. The Privacy Funnel is a mathematical approach that produces a
sanitised database that does not leak private data beyond a chosen threshold.
The downsides to this approach are that it does not give worst-case privacy
guarantees, and that finding optimal sanitisation protocols can be
computationally prohibitive. We tackle these problems by using differential
privacy metrics, and by considering local protocols which operate on one entry
at a time. We show that under both the Local Differential Privacy and Local
Information Privacy leakage metrics, one can efficiently obtain optimal
protocols. Furthermore, Local Information Privacy is both more closely aligned
to the privacy requirements of the Privacy Funnel scenario, and more
efficiently computable. We also consider the scenario where each user has
multiple attributes, for which we define Side-channel Resistant Local
Information Privacy, and we give efficient methods to find protocols satisfying
this criterion while still offering good utility. Finally, we introduce
Conditional Reporting, an explicit LIP protocol that can be used when the
optimal protocol is infeasible to compute, and we test this protocol on
real-world and synthetic data. Experiments on real-world and synthetic data
confirm the validity of these methods.
Comment: This preprint is an extended version of arXiv:2002.01501 (Fourteenth
International Conference on the Digital Society, 2020).
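The two leakage metrics named in this abstract can be illustrated on a small local protocol. The sketch below is not the paper's method: it uses k-ary randomized response as a stand-in protocol and assumes the commonly stated definitions, namely that a protocol Q satisfies eps-LDP when Q(y|x) <= e^eps * Q(y|x') for all x, x', y, and eps-LIP when e^-eps <= Q(y|x) / P(y) <= e^eps for all x, y. The function names `randomized_response`, `ldp_level` and `lip_level` are hypothetical, and the paper's own definitions (which involve a separate sensitive variable) may differ in detail.

```python
import math

def randomized_response(k, eps):
    """k-ary randomized response as a row-stochastic matrix Q[x][y].

    The true value is kept with probability e^eps / (e^eps + k - 1);
    each of the other k - 1 values is reported with probability
    1 / (e^eps + k - 1).
    """
    denom = math.exp(eps) + k - 1
    keep = math.exp(eps) / denom
    flip = 1.0 / denom
    return [[keep if y == x else flip for y in range(k)] for x in range(k)]

def ldp_level(Q):
    """Smallest eps such that Q[x][y] <= e^eps * Q[x2][y] for all x, x2, y.

    Assumes all entries of Q are strictly positive.
    """
    worst = 0.0
    for y in range(len(Q[0])):
        column = [row[y] for row in Q]
        worst = max(worst, math.log(max(column) / min(column)))
    return worst

def lip_level(Q, prior):
    """Smallest eps such that e^-eps <= Q[x][y] / P(y) <= e^eps for all x, y,
    where P(y) is the output distribution induced by the prior on x."""
    worst = 0.0
    for y in range(len(Q[0])):
        p_y = sum(prior[x] * Q[x][y] for x in range(len(Q)))
        for x in range(len(Q)):
            worst = max(worst, abs(math.log(Q[x][y] / p_y)))
    return worst

# Illustrative values only: a 3-valued attribute with an assumed prior.
Q = randomized_response(3, 1.0)
prior = [0.5, 0.3, 0.2]
print("LDP level:", ldp_level(Q))
print("LIP level:", lip_level(Q, prior))
```

Under these definitions the LIP level never exceeds the LDP level, because P(y) is a weighted average of the entries Q(y|x) and so lies between their minimum and maximum.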
Bottleneck Problems: Information and Estimation-Theoretic View
Information bottleneck (IB) and privacy funnel (PF) are two closely related
optimization problems which have found applications in machine learning, design
of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), strong
data processing inequalities, among others. In this work, we first investigate
the functional properties of IB and PF through a unified theoretical framework.
We then connect them to three information-theoretic coding problems, namely
hypothesis testing against independence, noisy source coding and dependence
dilution. Leveraging these connections, we prove a new cardinality bound for
the auxiliary variable in IB, making its computation more tractable for
discrete random variables.
In the second part, we introduce a general family of optimization problems,
termed bottleneck problems, by replacing mutual information in IB
and PF with other notions of mutual information, namely f-information and
Arimoto's mutual information. We then argue that, unlike IB and PF, these
problems lead to easily interpretable guarantees in a variety of inference tasks
with statistical constraints on accuracy and privacy. Although the underlying
optimization problems are non-convex, we develop a technique to evaluate
bottleneck problems in closed form by equivalently expressing them in terms of
lower convex or upper concave envelope of certain functions. By applying this
technique to the binary case, we derive closed-form expressions for several
bottleneck problems.
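For small discrete distributions, the two mutual-information terms that the privacy funnel trades off can be computed directly. The sketch below is only an illustration under assumed notation that does not come from this abstract: S is a sensitive variable, X the observed data, and Y the released output, with S - X - Y forming a Markov chain. It evaluates the leakage I(S;Y) and the utility I(X;Y) for a given channel; it does not solve the optimization itself. The helper names `mutual_information` and `push_through` are hypothetical.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    total = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                total += p * math.log2(p / (p_a[a] * p_b[b]))
    return total

def push_through(joint_sx, channel):
    """Joint pmfs of (S, Y) and (X, Y) when Y is drawn from channel[x][y]
    and depends on S only through X."""
    n_s, n_x, n_y = len(joint_sx), len(channel), len(channel[0])
    p_x = [sum(joint_sx[s][x] for s in range(n_s)) for x in range(n_x)]
    joint_sy = [
        [sum(joint_sx[s][x] * channel[x][y] for x in range(n_x))
         for y in range(n_y)]
        for s in range(n_s)
    ]
    joint_xy = [[p_x[x] * channel[x][y] for y in range(n_y)]
                for x in range(n_x)]
    return joint_sy, joint_xy

# Illustrative numbers only: a correlated binary pair and a noisy channel.
joint_sx = [[0.4, 0.1],
            [0.1, 0.4]]
channel = [[0.8, 0.2],
           [0.2, 0.8]]
joint_sy, joint_xy = push_through(joint_sx, channel)
print("leakage I(S;Y):", mutual_information(joint_sy))
print("utility I(X;Y):", mutual_information(joint_xy))
```

By the data processing inequality, the leakage computed this way cannot exceed I(S;X), whatever channel is chosen.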
Achievable Schemes and Performance Bounds for Centralized and Distributed Index Coding
Index coding studies the efficient broadcast problem where a server broadcasts multiple messages to a group of receivers with side information. Through exploiting the receiver side information, the amount of required communication from the server can be significantly reduced. Thanks to its basic yet highly nontrivial model, index coding has been recognized as a canonical problem in network information theory, which is fundamentally connected with many other problems such as network coding, distributed storage, coded computation, and coded caching.
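A minimal textbook-style instance shows how receiver side information reduces the required communication. It is not one of the coding schemes from this thesis, and the two-receiver setup and the message values are assumptions chosen for illustration: receiver 1 wants message x1 and already holds x2, while receiver 2 wants x2 and already holds x1. A single broadcast of x1 XOR x2 then serves both receivers, where sending the messages separately would take two transmissions.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("messages must have equal length")
    return bytes(p ^ q for p, q in zip(a, b))

# Messages stored at the server (illustrative values).
x1 = b"\x0f\xa5"
x2 = b"\x33\x5a"

# One coded broadcast instead of two uncoded ones.
broadcast = xor_bytes(x1, x2)

# Each receiver cancels the message it already holds as side information.
decoded_at_receiver_1 = xor_bytes(broadcast, x2)  # recovers x1
decoded_at_receiver_2 = xor_bytes(broadcast, x1)  # recovers x2
```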
In this thesis, we study the index coding problem both in its classic setting where the messages are stored at a centralized server, and also in a more general and practical setting where different subsets of messages are stored at multiple servers. In both scenarios the ultimate goal is to establish the capacity region, which contains all the communication rates simultaneously achievable for all the messages. While finding the index coding capacity region remains open in general, we characterize it through developing various inner and outer bounds. The inner bounds we propose on the capacity region are achievable rate regions, each associated with a concrete coding scheme. Our proposed coding schemes are built upon a two-layer random coding scheme referred to as composite coding, introduced by Arbabjolfaei et al. in 2013 for the classic centralized index coding problem. We first propose a series of simplifications for the composite coding scheme, and then enhance it through utilizing more flexible fractional allocation of the broadcast channel capacity. We also show that one can strictly improve composite coding by adding one more layer of random coding into the coding scheme. For the multi-server scenario, we generalize composite coding to a distributed version.
The outer bounds characterize the fundamental performance limits enforced by the problem setup that hold generally for any valid coding scheme. The performance bounds we propose are based on Shannon-type inequalities. For the centralized index coding problem, we define a series of interfering message structures based on the receiver side information. Such structures lead to nontrivial generalizations of the alignment chain model in the literature, based upon which we propose a series of novel iterative performance bounds. For the multi-server scenario, our main result is a general outer bound built upon the polymatroidal axioms of the entropy function. This outer bound utilizes general groupings of servers of different levels of granularity, allowing a natural tradeoff between tightness and computational complexity.
The security aspect of the index coding problem is also studied, for which a number of achievability and performance bounds on the optimal secure communication rate are established. To conclude this thesis, we investigate a privacy-preserving data publishing problem, whose model is inspired by index coding, and characterize its optimal privacy-utility tradeoff.
LIPIcs, Volume 244, ESA 2022, Complete Volume