The Capacity of Private Information Retrieval from Heterogeneous Uncoded Caching Databases
We consider private information retrieval (PIR) of a single file out of a
library of files from non-colluding databases with heterogeneous storage
constraints. The aim of this work is to jointly design the
content placement phase and the information retrieval phase in order to
minimize the download cost in the PIR phase. We characterize the optimal PIR
download cost as a linear program. By analyzing the structure of the optimal
solution of this linear program, we show that, surprisingly, the optimal
download cost in our heterogeneous case matches its homogeneous counterpart,
in which every database has a storage constraint equal to the average of the
heterogeneous constraints. Thus, there is no loss in PIR capacity due to the
heterogeneity of the databases' storage spaces. We also provide the optimum
content placement explicitly for a special case.
Comment: Submitted for publication, February 201
Improved Storage for Efficient Private Information Retrieval
We consider the problem of private information retrieval from
storage-constrained databases. In this problem, a user wishes to
retrieve a single message out of a set of equal-size messages without revealing
any information about the identity of the message to individual databases. Each
database stores only a fraction of the entire library. Our goal is to
characterize the optimal tradeoff curve between the storage cost (captured by
this fraction) and the normalized download cost. We show that the download cost
can be reduced by employing a hybrid storage scheme that combines MDS coding
ideas with uncoded partial replication ideas. When there is no coding, our scheme
reduces to the Attia-Kumar-Tandon storage scheme, which was initially introduced by
Maddah-Ali-Niesen in the context of the caching problem; when there is no
uncoded partial replication, our scheme reduces to the Banawan-Ulukus storage
scheme. In general, our scheme outperforms both.
Comment: ITW 201
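For reference, the uncoded baseline that this hybrid scheme is compared against can be tabulated. The sketch below assumes the Attia-Kumar-Tandon result as we recall it: with N databases, K messages, and storage fraction mu = t/N for an integer t, the optimal normalized download cost is 1 + 1/t + ... + 1/t^(K-1), and intermediate values of mu are handled by memory sharing between the adjacent integer points. The symbols N, K, t, and mu are our own notation rather than the abstract's, and the formula should be checked against the original paper before being relied on.

```python
import math
from fractions import Fraction


def akt_download_cost_at_t(t: int, K: int) -> Fraction:
    """Normalized download cost 1 + 1/t + ... + 1/t^(K-1) (assumed AKT formula)."""
    if t < 1 or K < 1:
        raise ValueError("need t >= 1 and K >= 1")
    return sum((Fraction(1, t ** k) for k in range(K)), Fraction(0))


def akt_download_cost(mu, N: int, K: int) -> Fraction:
    """Download cost at storage fraction mu via memory sharing.

    A fraction (1 - theta) of every message is placed with replication factor
    floor(mu * N) and a fraction theta with replication factor floor(mu * N) + 1,
    so the per-database storage is exactly mu.
    """
    x = Fraction(mu) * N  # effective replication factor t = mu * N
    if x < 1 or x > N:
        raise ValueError("need 1/N <= mu <= 1")
    lo = math.floor(x)
    if lo == x:
        return akt_download_cost_at_t(lo, K)
    theta = x - lo  # weight on the upper integer point
    return (1 - theta) * akt_download_cost_at_t(lo, K) + theta * akt_download_cost_at_t(lo + 1, K)
```

At mu = 1 (full replication) this reduces to the classical replicated-PIR download cost, and at mu = 1/N (t = 1) it equals K, i.e., the user downloads everything. Exact `Fraction` arithmetic keeps the tradeoff points free of rounding error.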
Latent-variable Private Information Retrieval
In many applications, content accessed by users (movies, videos, news
articles, etc.) can leak sensitive latent attributes, such as religious and
political views, sexual orientation, ethnicity, gender, and others. To prevent
such information leakage, classical PIR hides the identity of
the content/message being accessed, which in turn also hides the latent
attributes. This solution, while private, can be too costly, particularly when
perfect (information-theoretic) privacy constraints are imposed. For instance,
for a single database holding multiple messages, privately retrieving one message is
possible if and only if the user downloads the entire database of messages.
Retrieving content privately, however, may not be necessary to perfectly hide
the latent attributes.
Motivated by the above, we formulate and study the problem of latent-variable
private information retrieval (LV-PIR), which aims at allowing the user to
efficiently retrieve one of the messages without revealing any information
about the latent variable. We focus on the practically relevant setting of a
single database and show that one can significantly reduce the download cost
of LV-PIR (compared to classical PIR) by exploiting the correlation between the
message index and the latent variable. We present a general scheme for LV-PIR
as a function of the statistical relationship between the message index and the
latent variable, and also provide new results on the capacity/download cost of
LV-PIR. Several open problems and new directions are also discussed.
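To make the idea concrete, here is a toy single-database illustration under strong, hypothetical assumptions that are not taken from the paper: the latent variable is a deterministic label of each message (e.g., its genre, so the labels partition the messages into classes), and, given the label, the desired message is uniformly distributed within its class. The user downloads one message from every class: the desired one from its own class and a uniformly random one from each other class. The downloaded tuple is then uniform over all one-per-class selections whatever the label is, so the database learns nothing about the latent variable, while the download cost is the number of classes rather than the number of messages. This is only a sketch of the principle, not the paper's general LV-PIR scheme.

```python
import random
from fractions import Fraction
from itertools import product


def one_per_class_query(desired, classes, rng):
    """Hypothetical toy query: the desired index from its own class and a
    uniformly random index from every other class (classes must be disjoint)."""
    return tuple(desired if desired in members else rng.choice(members)
                 for members in classes)


def query_distribution(label, classes):
    """Exact distribution of the query given the latent label, assuming the
    desired message is uniform within the class carrying that label."""
    dist = {}
    p_desired = Fraction(1, len(classes[label]))
    for desired in classes[label]:
        # The label's class contributes the desired index; others are random.
        options = [[desired] if i == label else members
                   for i, members in enumerate(classes)]
        for combo in product(*options):
            p = p_desired
            for i, members in enumerate(classes):
                if i != label:
                    p *= Fraction(1, len(members))
            dist[combo] = dist.get(combo, Fraction(0)) + p
    return dist
```

For example, with the hypothetical partition `[[0, 1], [2, 3, 4]]`, a request for message 3 downloads two messages instead of five, and `query_distribution(0, ...)` equals `query_distribution(1, ...)`, which is exactly the "no information about the label" condition. The query can still reveal partial information about the message index itself, which is the point of relaxing classical PIR.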
Breaking the MDS-PIR Capacity Barrier via Joint Storage Coding
The capacity of private information retrieval (PIR) from databases coded
using maximum distance separable (MDS) codes was previously characterized
by Banawan and Ulukus under the assumption that the messages are encoded and
stored separately in the databases. This assumption is also common in
other related works in the literature, and the resulting capacity is
colloquially referred to as the MDS-PIR capacity. In this work, we consider
whether and when this capacity barrier can be broken through joint encoding and
storage of the messages. Our main results are two classes of novel code
constructions that allow joint encoding, together with corresponding PIR
protocols that indeed outperform separately MDS-coded systems. Moreover, we
show that a simple but novel expansion technique generalizes these
two classes of codes, widening the range of cases in which this
capacity barrier can be broken.
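For orientation, the separate-coding benchmark can be evaluated directly. The sketch assumes the Banawan-Ulukus MDS-PIR capacity for M messages stored with an (N, K) MDS code across N databases, C = (1 + K/N + ... + (K/N)^(M-1))^(-1); the symbols N, K, and M are our notation. Setting K = 1 recovers the capacity of PIR from replicated databases.

```python
from fractions import Fraction


def mds_pir_capacity(N: int, K: int, M: int) -> Fraction:
    """MDS-PIR capacity with an (N, K) MDS code and M separately coded messages."""
    if not (1 <= K <= N) or M < 1:
        raise ValueError("need 1 <= K <= N and M >= 1")
    r = Fraction(K, N)  # code rate K/N
    return 1 / sum((r ** m for m in range(M)), Fraction(0))
```

This function only gives the separate-coding benchmark; the joint-coding constructions described above are claimed to exceed it for some parameters, and they are not reproduced here.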
Private Information Retrieval from Heterogeneous Uncoded Storage Constrained Databases with Reduced Sub-Messages
We propose capacity-achieving schemes for private information retrieval (PIR)
from uncoded databases (DBs) with both homogeneous and heterogeneous storage
constraints. In the PIR setting, a user queries a set of DBs to privately
download a message, where privacy implies that no single DB can infer which
message the user desires. In general, a PIR scheme consists of storage
placement and delivery designs. Previous works have derived the capacity
(equivalently, the infimum normalized download cost) of PIR with uncoded storage
placement, as well as sufficient conditions for a storage placement design to achieve capacity. However,
the currently proposed storage placement designs require splitting each message
into an exponential number of sub-messages with respect to the number of DBs.
In this work, when DBs have the same storage constraint, we propose two simple
storage placement designs that satisfy the capacity conditions. Then, for more
general heterogeneous storage constraints, we translate the storage placement
design process into a "filling problem". We design an iterative algorithm to
solve the filling problem where, in each iteration, messages are partitioned
into sub-messages and stored at subsets of DBs. All of our proposed storage
placement designs require a number of sub-messages per message at most equal to
the number of DBs.
Comment: arXiv admin note: text overlap with arXiv:1901.0749
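One simple way to picture a "filling problem" is McNaughton's wrap-around rule from scheduling. We swap it in here purely for illustration: it is not claimed to be the paper's iterative algorithm, and we do not claim it satisfies the paper's capacity conditions. Assume that each DB's storage is given as a fraction mu[n] <= 1 of the library and that the fractions sum to an integer t. Lay the DBs' storage intervals end to end on [0, t) and wrap them around [0, 1). Every point of each message is then covered by exactly t distinct DBs, each DB stores exactly its fraction, and the breakpoints split each message into at most N sub-messages.

```python
import math
from fractions import Fraction


def wraparound_filling(mu):
    """Return (start, end, holders) segments of [0, 1) describing one message.

    mu[n] is the (hypothetical) storage fraction of DB n; each value must lie in
    [0, 1], and sum(mu) must be an integer t >= 1.
    """
    mu = [Fraction(m) for m in mu]
    total = sum(mu, Fraction(0))
    if total < 1 or total.denominator != 1 or any(m < 0 or m > 1 for m in mu):
        raise ValueError("need 0 <= mu[n] <= 1 and an integer sum >= 1")
    pieces = []  # (db, a, b): DB stores the part [a, b) of every message
    pos = Fraction(0)
    for n, m in enumerate(mu):
        start, end = pos, pos + m
        pos = end
        if m == 0:
            continue
        shift = math.floor(start)
        a, b = start - shift, end - shift
        if b <= 1:
            pieces.append((n, a, b))
        else:  # the interval wraps past 1: split it into two disjoint pieces
            pieces.append((n, a, Fraction(1)))
            pieces.append((n, Fraction(0), b - 1))
    # Every piece boundary becomes a sub-message boundary.
    points = sorted({Fraction(0), Fraction(1)} | {p for _, a, b in pieces for p in (a, b)})
    segments = []
    for lo, hi in zip(points, points[1:]):
        holders = frozenset(n for n, a, b in pieces if a <= lo and hi <= b)
        segments.append((lo, hi, holders))
    return segments
```

For three DBs with storage fraction 2/3 each (t = 2), this yields three sub-messages of size 1/3, each stored on a different pair of DBs. Because mu[n] <= 1, a DB's wrapped interval never overlaps itself, which is what guarantees that the t copies of each sub-message land on distinct DBs.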
-Secure -Private Federated Submodel Learning with Elastic Dropout Resilience
Motivated by recent interest in federated submodel learning, this work
explores the fundamental problem of privately reading from and writing to a
database of files (submodels) that are stored across distributed servers
according to a threshold secret sharing scheme that is secure against a bounded
number of colluding servers. One after another, various users wish to retrieve
their desired file, locally process the information, and then update the file
in the distributed database while keeping the identity of their desired file
private from any set of colluding servers up to a given size. The availability
of servers changes over time, so elastic dropout resilience is required. The
main contribution of this work is
an adaptive scheme, called ACSA-RW, that takes advantage of all currently
available servers to reduce its communication costs, fully updates the database
after each write operation even though the database is only partially
accessible due to server dropouts, and ensures a memoryless operation of the
network in the sense that the storage structure is preserved and future users
may remain oblivious of the past history of server dropouts. The ACSA-RW
construction builds upon CSA codes that were originally introduced for XSTPIR
and have been shown to be natural solutions for secure distributed matrix
multiplication problems. ACSA-RW achieves the desired private read and write
functionality with elastic dropout resilience, matches the best results for
private reads from the PIR literature, improves significantly upon available
baselines for private writes, reveals a striking symmetry between upload and
download costs, and exploits redundant storage dimensions to accommodate
arbitrary read and write dropout servers up to certain threshold values. It
also answers in the affirmative an open question posed by Kairouz et al. by
exploiting synergistic gains from the joint design of private read and write
operations.
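The secure threshold storage mentioned above can be illustrated with textbook Shamir secret sharing, which we use only as a stand-in: ACSA-RW builds on cross-subspace alignment (CSA) codes, which this sketch does not implement. A secret symbol over a prime field is hidden as the constant term of a random polynomial of degree X, so any X shares reveal nothing about it, while any X + 1 shares recover it by Lagrange interpolation at zero. The prime `P`, the parameter name `x`, and the evaluation points 1..n are illustrative choices of ours.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; illustrative field size


def share(secret: int, x: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any x of them reveal nothing about it."""
    if not (0 <= secret < P) or not (0 <= x < n):
        raise ValueError("need 0 <= secret < P and 0 <= x < n")
    # Random degree-x polynomial whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(x)]
    shares = []
    for i in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule evaluated at point i
            y = (y * i + c) % P
        shares.append((i, y))
    return shares


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Recover the secret from at least x + 1 distinct shares (Lagrange at 0)."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total
```

Any subset of x + 1 or more shares reconstructs the same value, since they all interpolate the same degree-x polynomial; this threshold behavior is what allows reads to proceed while some servers have dropped out.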
Multi-Party Private Set Intersection: An Information-Theoretic Approach
We investigate the problem of multi-party private set intersection (MP-PSI).
In MP-PSI, there are multiple parties, each storing a data set over
replicated and non-colluding databases, and we want to calculate the
intersection of the data sets without leaking any
information beyond the set intersection to any of the parties. We consider a
specific communication protocol where one of the parties, called the leader
party, initiates the MP-PSI protocol by sending queries to the remaining
parties, which are called client parties. The client parties are not allowed to
communicate with each other. We propose an information-theoretic scheme that
privately calculates the intersection, and we characterize its download cost.
Similar to the 2-party PSI problem, our scheme builds on the connection between
the PSI problem and the multi-message symmetric private information retrieval
(MM-SPIR) problem. Our scheme is a non-trivial generalization of the 2-party
PSI scheme as it needs an intricate design of the shared common randomness.
Interestingly, in terms of the download cost, our scheme does not incur any
penalty due to the more stringent privacy constraints in the MP-PSI problem
compared to the 2-party PSI problem.
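The following non-private reference computation shows the functionality that the MP-PSI scheme realizes. As an assumption for this illustration, it uses the incidence-vector representation common in the PSI-via-SPIR line of work: each set over a finite alphabet becomes a 0/1 vector, and the leader keeps each of its own elements whose entry is 1 in every client's vector. In the actual protocol, these entries are fetched through MM-SPIR so that no party learns more than the intersection; that privacy layer is not modeled here, and `alphabet_size` is a hypothetical parameter.

```python
def incidence_vector(s: set[int], alphabet_size: int) -> list[int]:
    """0/1 vector whose entry e is 1 exactly when element e is in the set."""
    if any(not 0 <= e < alphabet_size for e in s):
        raise ValueError("element outside the alphabet")
    return [1 if e in s else 0 for e in range(alphabet_size)]


def leader_intersection(leader_set: set[int], client_vectors: list[list[int]]) -> set[int]:
    """The leader reads only the client entries indexed by its own elements."""
    return {e for e in leader_set if all(v[e] == 1 for v in client_vectors)}
```

The leader touches only as many entries per client as it has elements, which is why the download cost in this line of work scales with the leader's set size rather than with the alphabet size.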
Semantic Private Information Retrieval
We investigate the problem of semantic private information retrieval
(semantic PIR). In semantic PIR, a user retrieves a message out of a set of
independent messages stored in replicated and non-colluding databases
without revealing the identity of the desired message to any individual
database. The messages come with different semantics, i.e., the messages
are allowed to have non-uniform a priori probabilities, which are a proxy for
their respective popularity of retrieval, and arbitrary message sizes. This is a
generalization of the classical private information retrieval (PIR) problem,
where messages are assumed to have equal a priori probabilities and equal
message sizes. We derive the semantic PIR capacity for general numbers of
databases and messages. The results show that the semantic PIR capacity depends
on the number of databases, the number of messages, the a priori probability
distribution of the messages, and the message sizes. We present two achievable semantic
PIR schemes: The first one is a deterministic scheme which is based on message
asymmetry. This scheme employs non-uniform subpacketization. The second scheme
is probabilistic and is based on choosing one query set out of multiple options
at random to retrieve the required message without the need for exponential
subpacketization. We derive necessary and sufficient conditions for the
semantic PIR capacity to exceed the classical PIR capacity with equal priors
and sizes. Our results show that the semantic PIR capacity can be larger than
the classical PIR capacity when longer messages have higher popularities.
However, when messages are equal-length, the non-uniform priors cannot be
exploited to improve the retrieval rate over the classical PIR capacity.
Comment: submitted for publication
Two-Level Private Information Retrieval
In the conventional robust private information retrieval (PIR) system with
colluding servers, the user needs to retrieve one of the possible messages while
keeping the identity of the requested message private from any set of colluding
servers up to a fixed size. Motivated by the possibly heterogeneous privacy
requirements of different messages, we consider the two-level PIR system, in
which a designated subset of the messages carries a separate privacy
requirement. Any one of the messages in this subset needs to be retrieved
privately against one collusion threshold, and any one of the full set of
messages needs to be retrieved privately against a second collusion threshold.
We obtain a lower bound on the capacity by proposing two novel coding schemes,
namely the non-uniform successive cancellation scheme and the non-uniform block
cancellation scheme. A capacity upper bound is also derived. The gap between the
upper bound and the lower bounds is analyzed and shown to vanish under a
condition on the system parameters. Lastly, we show that the upper bound is in
general not tight by providing a stronger bound for a special setting.
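A simple sanity baseline, which is not one of the paper's schemes, follows from a monotonicity argument. If the two collusion thresholds are T1 and T2 (our notation), running a single T-private scheme with T = max(T1, T2) on all K messages satisfies both requirements, because privacy against every set of T servers implies privacy against every smaller set. Assuming the Sun-Jafar capacity of T-colluding PIR from N replicated servers, C = (1 + T/N + ... + (T/N)^(K-1))^(-1), this gives a rate below which the two-level capacity cannot fall.

```python
from fractions import Fraction


def tpir_capacity(N: int, T: int, K: int) -> Fraction:
    """Capacity of T-colluding PIR with K messages on N replicated servers."""
    if not (1 <= T <= N) or K < 1:
        raise ValueError("need 1 <= T <= N and K >= 1")
    r = Fraction(T, N)
    return 1 / sum((r ** k for k in range(K)), Fraction(0))


def two_level_baseline(N: int, T1: int, T2: int, K: int) -> Fraction:
    """Achievable rate from protecting every message against the stricter threshold."""
    return tpir_capacity(N, max(T1, T2), K)
```

The gap between this baseline and the capacity bounds above indicates how much the non-uniform schemes gain by not treating every message as if it needed the strictest protection.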
Asymmetric Leaky Private Information Retrieval
Information-theoretic formulations of the private information retrieval (PIR)
problem have been investigated under a variety of scenarios. Symmetric private
information retrieval (SPIR) is a variant where a user is able to privately
retrieve one out of a set of messages from non-colluding replicated databases
without learning anything about the remaining messages. However, the goal
of perfect privacy can be too taxing for certain applications. In this paper,
we investigate whether the information-theoretic capacity of SPIR (equivalently, the
inverse of the minimum download cost) can be increased by relaxing both user
and DB privacy definitions. Such relaxation is relevant in applications where
privacy can be traded for communication efficiency. We introduce and
investigate the Asymmetric Leaky PIR (AL-PIR) model with different privacy
leakage budgets in each direction. For user privacy leakage, we bound the
probability ratios between all possible realizations of DB queries by a
function of a non-negative leakage constant. For DB privacy, we bound the
mutual information between the undesired messages, the queries, and the
answers by a function of a second non-negative leakage constant. We propose a
general AL-PIR scheme for arbitrary values of both constants, which yields an
upper bound on the optimal download cost of AL-PIR. We also obtain an
information-theoretic lower bound on the download cost. The gap
analysis between the two bounds shows that our AL-PIR scheme is optimal when
the user-privacy leakage constant is zero, i.e., under perfect user privacy, and
that it is optimal within a bounded multiplicative gap for arbitrary leakage constants.
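The user-privacy side of AL-PIR bounds ratios of query probabilities across desired-message indices. As an assumption for illustration, take that bound to be e^epsilon, the form familiar from differential privacy; the abstract itself only says "a function of a non-negative constant." The hypothetical helper below computes the smallest such epsilon for a given table of query distributions. Here epsilon = 0 corresponds to perfect user privacy, and a query realization that is possible for one message index but impossible for another makes the leakage unbounded.

```python
import math


def user_privacy_epsilon(query_probs: list[list[float]]) -> float:
    """Smallest epsilon with P(q | k) <= e^epsilon * P(q | k') for all q, k, k'.

    query_probs[k][q] is the (hypothetical) probability that the database sees
    query realization q when the user wants message k.
    """
    n_queries = len(query_probs[0])
    if any(len(row) != n_queries for row in query_probs):
        raise ValueError("all rows must cover the same query realizations")
    eps = 0.0
    for q in range(n_queries):
        column = [row[q] for row in query_probs]
        hi, lo = max(column), min(column)
        if hi == 0:
            continue  # realization never occurs, so it imposes no constraint
        if lo == 0:
            return math.inf  # possible for one index, impossible for another
        eps = max(eps, math.log(hi / lo))
    return eps
```

Taking the maximum and minimum of each column is enough, because the largest ratio over all pairs of message indices for a fixed query realization is always attained by that pair.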