The Capacity of Private Information Retrieval from Heterogeneous Uncoded Caching Databases

Author: Arasli Batuhan
Banawan Karim
Ulukus Sennur
Wei Yi-Peng
Publication date: 25/02/2019

We consider private information retrieval (PIR) of a single file out of $K$ files from $N$ non-colluding databases with heterogeneous storage constraints $\mathbf{m}=(m_1, \cdots, m_N)$. The aim of this work is to jointly design the content placement phase and the information retrieval phase in order to minimize the download cost in the PIR phase. We characterize the optimal PIR download cost as a linear program. By analyzing the structure of the optimal solution of this linear program, we show that, surprisingly, the optimal download cost in our heterogeneous case matches its homogeneous counterpart where all databases have the same average storage constraint $\mu=\frac{1}{N} \sum_{n=1}^{N} m_n$. Thus, we show that there is no loss in the PIR capacity due to heterogeneity of storage spaces of the databases. We provide the optimum content placement explicitly for $N=3$.

Comment: Submitted for publication, February 201

arXiv.org e-Print Archive
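
A minimal sketch of what the heterogeneous-equals-homogeneous result means numerically, under stated assumptions. The helper names `average_storage` and `homogeneous_uncoded_cost` are hypothetical. The homogeneous cost is not given in this abstract. It is taken here, as we understand the Attia-Kumar-Tandon result for uncoded storage, to be the lower convex envelope of the points $(t/N, \sum_{\ell=0}^{K-1} t^{-\ell})$, $t = 1, \dots, N$. Storage values are fractions of the library.

```python
from fractions import Fraction
import math


def average_storage(m):
    """mu = (1/N) * sum_n m_n for a storage vector m = (m_1, ..., m_N)."""
    return Fraction(sum(m)) / len(m)


def homogeneous_uncoded_cost(mu, N, K):
    """Assumed homogeneous download cost D/L for K files, N databases, storage mu.

    Corner points t = 1..N (each bit replicated on t databases) cost
    sum_{l=0}^{K-1} t^{-l}; between corners we memory-share linearly.
    """
    if not Fraction(1, N) <= mu <= 1:
        raise ValueError("storage must satisfy 1/N <= mu <= 1")
    x = mu * N             # effective replication factor
    t = math.floor(x)
    alpha = x - t          # weight given to the corner t + 1

    def corner(r):
        return sum(Fraction(1, r ** l) for l in range(K))

    if alpha == 0:
        return corner(t)
    return (1 - alpha) * corner(t) + alpha * corner(t + 1)


# Heterogeneous storage m = (1/2, 1/2, 1) has the same average as (2/3, 2/3, 2/3).
m = (Fraction(1, 2), Fraction(1, 2), Fraction(1))
mu = average_storage(m)
print(mu, homogeneous_uncoded_cost(mu, N=3, K=2))
```

By the abstract's result, the heterogeneous system with this `m` should reach the same cost as the homogeneous one with $\mu = 2/3$. The sketch does not construct the placement itself.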

Improved Storage for Efficient Private Information Retrieval

Author: Arasli Batuhan
Banawan Karim
Ulukus Sennur
Publication date: 29/08/2019

We consider the problem of private information retrieval from $N$ \emph{storage-constrained} databases. In this problem, a user wishes to retrieve a single message out of $M$ messages (of size $L$) without revealing any information about the identity of the message to individual databases. Each database stores $\mu ML$ symbols, i.e., a $\mu$ fraction of the entire library, where $\frac{1}{N} \leq \mu \leq 1$. Our goal is to characterize the optimal tradeoff curve for the storage cost (captured by $\mu$) and the normalized download cost ($D/L$). We show that the download cost can be reduced by employing a hybrid storage scheme that combines \emph{MDS coding} ideas with \emph{uncoded partial replication} ideas. When there is no coding, our scheme reduces to the Attia-Kumar-Tandon storage scheme, which was initially introduced by Maddah-Ali-Niesen in the context of the caching problem, and when there is no uncoded partial replication, our scheme reduces to the Banawan-Ulukus storage scheme; in general, our scheme outperforms both.

Comment: ITW 201

arXiv.org e-Print Archive
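
To see why combining the two ingredients can help, the sketch below compares the two baselines named in the abstract at one storage point. Neither formula is stated in this abstract, so both are assumptions. The uncoded (Attia-Kumar-Tandon) cost is modeled as memory sharing between integer replication levels $t$ near $N\mu$, with corner cost $\sum_{i=0}^{M-1} t^{-i}$. The Banawan-Ulukus cost for an $(N, K_c)$ MDS code, which gives $\mu = 1/K_c$, is modeled as $\sum_{i=0}^{M-1} (K_c/N)^i$. The symbol `K_c` and all helper names are hypothetical, and the hybrid scheme itself is not reproduced.

```python
from fractions import Fraction
import math


def replication_cost(x, M):
    """sum_{i=0}^{M-1} x^{-i}: normalized download cost at effective replication x."""
    x = Fraction(x)
    return sum(x ** -i for i in range(M))


def uncoded_cost(mu, N, M):
    """Assumed uncoded partial-replication cost: memory sharing between t and t + 1."""
    mu = Fraction(mu)
    if not Fraction(1, N) <= mu <= 1:
        raise ValueError("storage must satisfy 1/N <= mu <= 1")
    x = mu * N
    t = math.floor(x)
    alpha = x - t
    if alpha == 0:
        return replication_cost(t, M)
    return (1 - alpha) * replication_cost(t, M) + alpha * replication_cost(t + 1, M)


def mds_cost(N, K_c, M):
    """Assumed MDS-coded cost for an (N, K_c) code; storage is mu = 1/K_c."""
    return replication_cost(Fraction(N, K_c), M)


N, M, K_c = 3, 2, 2
mu = Fraction(1, K_c)
print(uncoded_cost(mu, N, M), mds_cost(N, K_c, M))  # 7/4 5/3 at mu = 1/2
```

Under these assumed formulas, the MDS point is cheaper than uncoded memory sharing at $\mu = 1/2$, $N = 3$. The two formulas agree whenever $N/K_c$ is an integer.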

Latent-variable Private Information Retrieval

Author: Attia Mohamed A.
Lazos Loukas
Samy Islam
Tandon Ravi
Publication date: 14/05/2020

In many applications, content accessed by users (movies, videos, news articles, etc.) can leak sensitive latent attributes, such as religious and political views, sexual orientation, ethnicity, gender, and others. To prevent such information leakage, the goal of classical PIR is to hide the identity of the content/message being accessed, which subsequently also hides the latent attributes. This solution, while private, can be too costly, particularly when perfect (information-theoretic) privacy constraints are imposed. For instance, for a single database holding $K$ messages, privately retrieving one message is possible if and only if the user downloads the entire database of $K$ messages. Retrieving content privately, however, may not be necessary to perfectly hide the latent attributes. Motivated by the above, we formulate and study the problem of latent-variable private information retrieval (LV-PIR), which aims at allowing the user to efficiently retrieve one out of $K$ messages (indexed by $\theta$) without revealing any information about the latent variable (modeled by $S$). We focus on the practically relevant setting of a single database and show that one can significantly reduce the download cost of LV-PIR (compared to classical PIR) based on the correlation between $\theta$ and $S$. We present a general scheme for LV-PIR as a function of the statistical relationship between $\theta$ and $S$, and also provide new results on the capacity/download cost of LV-PIR. Several open problems and new directions are also discussed.

arXiv.org e-Print Archive
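
Here is a toy illustration of how correlation between $\theta$ and $S$ can cut the single-database download cost. It is our own construction for exposition, not the paper's general scheme. Assume, hypothetically, that $S$ is a deterministic label splitting the $K$ messages into equal-size classes and that $\theta$ is uniform within each class. Downloading the message at the same in-class position from every class then yields a query whose distribution does not depend on $S$. The cost is one message per class instead of all $K$. The position is exposed, so this hides only $S$, not $\theta$, which is the relaxation LV-PIR permits. Function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def toy_query(classes, theta):
    """Indices to download: the message at theta's in-class position, from every class."""
    if len({len(c) for c in classes}) != 1:
        raise ValueError("this toy scheme assumes equal-size classes")
    for cls in classes:
        if theta in cls:
            j = cls.index(theta)
            return frozenset(c[j] for c in classes)
    raise ValueError("theta does not belong to any class")


def query_law_given_latent(classes, s):
    """Distribution of the query when S = s and theta is uniform over class s."""
    counts = Counter(toy_query(classes, theta) for theta in classes[s])
    return {q: Fraction(n, len(classes[s])) for q, n in counts.items()}


# K = 6 messages; the latent attribute S takes two values (classes of size 3).
classes = [[0, 1, 2], [3, 4, 5]]
print(sorted(toy_query(classes, 4)))  # [1, 4]: 2 downloads instead of 6
print(query_law_given_latent(classes, 0) == query_law_given_latent(classes, 1))  # True
```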

Breaking the MDS-PIR Capacity Barrier via Joint Storage Coding

Author: Sun Hua
Tian Chao
Publication date: 19/08/2019

The capacity of private information retrieval (PIR) from databases coded using maximum distance separable (MDS) codes has been previously characterized by Banawan and Ulukus, where it was assumed that the messages are encoded and stored separably into the databases. This assumption was also commonly made in other related works in the literature, and this capacity is colloquially referred to as the MDS-PIR capacity. In this work, we consider the question of whether and when this capacity barrier can be broken through joint encoding and storing of the messages. Our main results are two classes of novel code constructions that allow joint encoding, together with the corresponding PIR protocols, which indeed outperform the separate MDS-coded systems. Moreover, we show that a simple but novel expansion technique allows us to generalize these two classes of codes, resulting in a wider range of cases where this capacity barrier can be broken.

arXiv.org e-Print Archive
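
For reference, here is the separate-coding baseline that the abstract calls the MDS-PIR capacity. To our understanding it is the Banawan-Ulukus expression below for $M$ messages stored with an $(N, K_c)$ MDS code across $N$ databases. The symbols $M$ and $K_c$ are introduced here and do not appear in the abstract. The block also works one small instance:

```latex
C_{\text{MDS-PIR}} = \left(1 + \frac{K_c}{N} + \left(\frac{K_c}{N}\right)^{2} + \cdots + \left(\frac{K_c}{N}\right)^{M-1}\right)^{-1}
                   = \frac{1 - K_c/N}{1 - (K_c/N)^{M}},
\qquad
N = 4,\; K_c = 2,\; M = 3:\quad
C_{\text{MDS-PIR}} = \left(1 + \tfrac{1}{2} + \tfrac{1}{4}\right)^{-1} = \frac{4}{7}.
```

Breaking the barrier, in the abstract's sense, means a jointly coded system exceeding this rate.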

Private Information Retrieval from Heterogeneous Uncoded Storage Constrained Databases with Reduced Sub-Messages

Author: Chen Rong-Rong
Ji Mingyue
Woolsey Nicholas
Publication date: 16/10/2019

We propose capacity-achieving schemes for private information retrieval (PIR) from uncoded databases (DBs) with both homogeneous and heterogeneous storage constraints. In the PIR setting, a user queries a set of DBs to privately download a message, where privacy implies that no individual DB can infer which message the user desires. In general, a PIR scheme consists of storage placement and delivery designs. Previous works have derived the capacity, or infimum download cost, of PIR with uncoded storage placement, as well as sufficient conditions for a storage placement design to meet capacity. However, the currently proposed storage placement designs require splitting each message into a number of sub-messages that is exponential in the number of DBs. In this work, when DBs have the same storage constraint, we propose two simple storage placement designs that satisfy the capacity conditions. Then, for more general heterogeneous storage constraints, we translate the storage placement design process into a "filling problem". We design an iterative algorithm to solve the filling problem where, in each iteration, messages are partitioned into sub-messages and stored at subsets of DBs. All of our proposed storage placement designs require a number of sub-messages per message at most equal to the number of DBs.

Comment: arXiv admin note: text overlap with arXiv:1901.0749

arXiv.org e-Print Archive
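
The storage accounting behind "at most $N$ sub-messages per message" can be made concrete with a hypothetical cyclic placement for the homogeneous case. Each message is split into $N$ equal sub-messages, and sub-message $i$ is stored on the $t = N\mu$ consecutive DBs $i, i+1, \dots, i+t-1 \pmod N$. This is an illustration only. It is not presented as either of the paper's two designs, and it is not claimed to satisfy the capacity conditions mentioned above. Helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def cyclic_placement(N, t):
    """Map DB index -> list of sub-message indices it stores (same for every message)."""
    if not 1 <= t <= N:
        raise ValueError("need 1 <= t <= N")
    placement = {n: [] for n in range(N)}
    for i in range(N):          # N sub-messages per message
        for r in range(t):      # sub-message i is copied to t consecutive DBs
            placement[(i + r) % N].append(i)
    return placement


def storage_fractions(placement, N):
    """Fraction of every message stored by each DB (each sub-message is 1/N of a message)."""
    return {n: Fraction(len(subs), N) for n, subs in placement.items()}


placement = cyclic_placement(N=4, t=2)
print(placement)  # {0: [0, 3], 1: [0, 1], 2: [1, 2], 3: [2, 3]}
print(set(storage_fractions(placement, 4).values()))  # {Fraction(1, 2)}, i.e. mu = t/N
copies = Counter(i for subs in placement.values() for i in subs)
print(set(copies.values()))  # {2}: every sub-message sits on t DBs
```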

$X$-Secure $T$-Private Federated Submodel Learning with Elastic Dropout Resilience

Author: Jafar Syed A.
Jia Zhuqing
Publication date: 22/03/2021

Motivated by recent interest in federated submodel learning, this work explores the fundamental problem of privately reading from and writing to a database consisting of $K$ files (submodels) that are stored across $N$ distributed servers according to an $X$-secure threshold secret sharing scheme. One after another, various users wish to retrieve their desired file, locally process the information, and then update the file in the distributed database while keeping the identity of their desired file private from any set of up to $T$ colluding servers. The availability of servers changes over time, so elastic dropout resilience is required. The main contribution of this work is an adaptive scheme, called ACSA-RW, that takes advantage of all currently available servers to reduce its communication costs, fully updates the database after each write operation even though the database is only partially accessible due to server dropouts, and ensures a memoryless operation of the network in the sense that the storage structure is preserved and future users may remain oblivious of the past history of server dropouts. The ACSA-RW construction builds upon CSA codes that were originally introduced for XSTPIR and have been shown to be natural solutions for secure distributed matrix multiplication problems. ACSA-RW achieves the desired private read and write functionality with elastic dropout resilience, matches the best results for private-read from the PIR literature, improves significantly upon available baselines for private-write, reveals a striking symmetry between upload and download costs, and exploits redundant storage dimensions to accommodate arbitrary read and write dropout servers up to certain threshold values. It also answers in the affirmative an open question by Kairouz et al. by exploiting synergistic gains from the joint design of private read and write operations.

arXiv.org e-Print Archive
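
The "$X$-secure threshold secret sharing" storage mentioned above can be illustrated in its simplest textbook form, Shamir sharing over a prime field. A file symbol $W$ is hidden as the constant term of a random polynomial of degree $X$, and server $n$ stores the evaluation at a distinct nonzero point. Any $X$ shares reveal nothing about $W$, and any $X+1$ recover it by Lagrange interpolation. This is an assumption-laden sketch, not the CSA or ACSA-RW construction, and it ignores private reads, writes, and dropouts. The field size and helper names are hypothetical choices.

```python
import random

P = 2**61 - 1  # a Mersenne prime, used as an illustrative field size


def share(secret, X, points, rng):
    """Evaluate W + Z_1 z + ... + Z_X z^X (random Z_k) at each server's point."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(X)]

    def poly(z):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * z + c) % P
        return acc

    return {z: poly(z) for z in points}


def reconstruct(shares):
    """Lagrange interpolation at z = 0 from a dict {point: share} of at least X + 1 shares."""
    secret = 0
    for zj, yj in shares.items():
        num, den = 1, 1
        for zm in shares:
            if zm != zj:
                num = num * (-zm) % P
                den = den * (zj - zm) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P  # modular inverse (Python 3.8+)
    return secret


rng = random.Random(2021)
shares = share(secret=42, X=2, points=[1, 2, 3, 4, 5], rng=rng)  # N = 5 servers
print(reconstruct({z: shares[z] for z in (1, 3, 5)}))  # 42 from any X + 1 = 3 shares
```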

Multi-Party Private Set Intersection: An Information-Theoretic Approach

Author: Banawan Karim
Ulukus Sennur
Wang Zhusheng
Publication date: 17/08/2020

We investigate the problem of multi-party private set intersection (MP-PSI). In MP-PSI, there are $M$ parties, each storing a data set $\mathcal{p}_i$ over $N_i$ replicated and non-colluding databases, and we want to calculate the intersection of the data sets $\cap_{i=1}^M \mathcal{p}_i$ without leaking any information beyond the set intersection to any of the parties. We consider a specific communication protocol where one of the parties, called the leader party, initiates the MP-PSI protocol by sending queries to the remaining parties, which are called client parties. The client parties are not allowed to communicate with each other. We propose an information-theoretic scheme that privately calculates the intersection $\cap_{i=1}^M \mathcal{p}_i$ with a download cost of $D = \min_{t \in \{1, \cdots, M\}} \sum_{i \in \{1, \cdots, M\}\setminus \{t\}} \left\lceil \frac{|\mathcal{p}_t|N_i}{N_i-1}\right\rceil$. Similar to the 2-party PSI problem, our scheme builds on the connection between the PSI problem and the multi-message symmetric private information retrieval (MM-SPIR) problem. Our scheme is a non-trivial generalization of the 2-party PSI scheme, as it needs an intricate design of the shared common randomness. Interestingly, in terms of the download cost, our scheme does not incur any penalty due to the more stringent privacy constraints in the MP-PSI problem compared to the 2-party PSI problem.

arXiv.org e-Print Archive
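
The download-cost expression can be evaluated directly. For each candidate leader $t$, sum the per-client costs $\lceil |\mathcal{p}_t| N_i/(N_i-1) \rceil$ over the other parties, then take the minimum over $t$. Below is a minimal sketch, assuming set sizes and database counts are given as Python lists indexed from 0. The function name is hypothetical, and $N_i \geq 2$ is required so that the denominator is nonzero.

```python
def mp_psi_download_cost(set_sizes, num_dbs):
    """D = min_t sum_{i != t} ceil(|P_t| * N_i / (N_i - 1)), parties indexed from 0."""
    if len(set_sizes) != len(num_dbs):
        raise ValueError("one set size and one database count per party")
    if any(n < 2 for n in num_dbs):
        raise ValueError("each party needs N_i >= 2 databases")
    best = None
    for t, size_t in enumerate(set_sizes):  # candidate leader party t
        cost = sum(
            -(-size_t * n_i // (n_i - 1))  # integer ceiling of size_t * N_i / (N_i - 1)
            for i, n_i in enumerate(num_dbs)
            if i != t
        )
        if best is None or cost < best[0]:
            best = (cost, t)
    return best  # (download cost, index of the minimizing leader)


print(mp_psi_download_cost([4, 10, 6], [2, 3, 2]))  # (14, 0)
```

Integer ceiling division avoids floating-point rounding when $|\mathcal{p}_t| N_i$ is large.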

Semantic Private Information Retrieval

Author: Banawan Karim
Ulukus Sennur
Vithana Sajani
Publication date: 30/03/2020

We investigate the problem of semantic private information retrieval (semantic PIR). In semantic PIR, a user retrieves a message out of $K$ independent messages stored in $N$ replicated and non-colluding databases without revealing the identity of the desired message to any individual database. The messages come with \emph{different semantics}, i.e., the messages are allowed to have \emph{non-uniform a priori probabilities} denoted by $(p_i>0,\: i \in [K])$, which are a proxy for their respective popularity of retrieval, and \emph{arbitrary message sizes} $(L_i,\: i \in [K])$. This is a generalization of the classical private information retrieval (PIR) problem, where messages are assumed to have equal a priori probabilities and equal message sizes. We derive the semantic PIR capacity for general $K$ and $N$. The results show that the semantic PIR capacity depends on the number of databases $N$, the number of messages $K$, the a priori probability distribution of messages $p_i$, and the message sizes $L_i$. We present two achievable semantic PIR schemes. The first is a deterministic scheme based on message asymmetry; it employs non-uniform subpacketization. The second scheme is probabilistic and is based on choosing one query set out of multiple options at random to retrieve the required message without the need for exponential subpacketization. We derive necessary and sufficient conditions for the semantic PIR capacity to exceed the classical PIR capacity with equal priors and sizes. Our results show that the semantic PIR capacity can be larger than the classical PIR capacity when longer messages have higher popularities. However, when messages are equal-length, the non-uniform priors cannot be exploited to improve the retrieval rate over the classical PIR capacity.

Comment: submitted for publication

arXiv.org e-Print Archive
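
The classical baseline against which semantic PIR is compared (equal priors, equal sizes) is, to our knowledge, the Sun-Jafar capacity of replicated PIR, $C = \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right)^{-1}$. This formula is not restated in the abstract, and the semantic capacity expression itself is not reproduced here. The sketch only evaluates the baseline exactly with rational arithmetic, and the function name is hypothetical.

```python
from fractions import Fraction


def classical_pir_capacity(N, K):
    """(1 + 1/N + ... + 1/N^(K-1))^(-1): replicated, non-colluding PIR with K equal messages."""
    if N < 1 or K < 1:
        raise ValueError("need N >= 1 and K >= 1")
    return 1 / sum(Fraction(1, N ** i) for i in range(K))


for N, K in [(2, 2), (2, 3), (3, 2)]:
    print(N, K, classical_pir_capacity(N, K))  # 2/3, 4/7, 3/4
```

With $N = 1$ the expression gives $1/K$. This is consistent with the single-database statement in the latent-variable PIR abstract above that the whole library must be downloaded.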

Two-Level Private Information Retrieval

Author: Plank James
Sun Hua
Tian Chao
Zhou Ruida
Publication date: 10/12/2021

In the conventional robust $T$-colluding private information retrieval (PIR) system, the user needs to retrieve one of the possible messages while keeping the identity of the requested message private from any $T$ colluding servers. Motivated by the possible heterogeneous privacy requirements for different messages, we consider the $(N, T_1:K_1, T_2:K_2)$ two-level PIR system with a total of $K_2$ messages in the system, where $T_1\geq T_2$ and $K_1\leq K_2$. Any one of the $K_1$ messages needs to be retrieved privately against $T_1$ colluding servers, and any one of the full set of $K_2$ messages needs to be retrieved privately against $T_2$ colluding servers. We obtain a lower bound on the capacity by proposing two novel coding schemes, namely the non-uniform successive cancellation scheme and the non-uniform block cancellation scheme. A capacity upper bound is also derived. The gap between the upper bound and the lower bounds is analyzed and shown to vanish when $T_1=T_2$. Lastly, we show that the upper bound is in general not tight by providing a stronger bound for a special setting.

arXiv.org e-Print Archive
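
Two elementary reference points bracket the two-level capacity. They rely on the Sun-Jafar capacity of $T$-colluding PIR, $C_T(N, K) = \left(1 + \frac{T}{N} + \cdots + \left(\frac{T}{N}\right)^{K-1}\right)^{-1}$, which is assumed here and not stated in the abstract. First, protecting all $K_2$ messages against $T_1$ colluding servers is a valid but naive two-level scheme, because $T_1 \geq T_2$. Second, any two-level scheme is in particular $T_2$-private over all $K_2$ messages, so its rate cannot exceed $C_{T_2}(N, K_2)$. These bounds are for orientation only; they are not the paper's schemes or its upper bound. Helper names are hypothetical.

```python
from fractions import Fraction


def tpir_capacity(N, T, K):
    """Assumed T-colluding PIR capacity (1 + T/N + ... + (T/N)^(K-1))^(-1)."""
    if not 1 <= T <= N:
        raise ValueError("need 1 <= T <= N")
    ratio = Fraction(T, N)
    return 1 / sum(ratio ** i for i in range(K))


def two_level_bracket(N, T1, K1, T2, K2):
    """Elementary (lower, upper) bounds on the (N, T1:K1, T2:K2) two-level capacity."""
    if not (T1 >= T2 and K1 <= K2):
        raise ValueError("two-level system requires T1 >= T2 and K1 <= K2")
    naive = tpir_capacity(N, T1, K2)  # stronger privacy T1 applied to all K2 messages
    upper = tpir_capacity(N, T2, K2)  # any two-level scheme is T2-private over all K2
    return naive, upper


print(two_level_bracket(N=4, T1=2, K1=1, T2=1, K2=2))  # (Fraction(2, 3), Fraction(4, 5))
print(two_level_bracket(N=4, T1=1, K1=1, T2=1, K2=2))  # equal when T1 = T2
```

$K_1$ does not enter these crude bounds. With $T_1 = T_2$ the bracket closes, consistent with the abstract's statement that the gap vanishes in that case.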

Asymmetric Leaky Private Information Retrieval

Author: Attia Mohamed A.
Lazos Loukas
Samy Islam
Tandon Ravi
Publication date: 04/06/2020

Information-theoretic formulations of the private information retrieval (PIR) problem have been investigated under a variety of scenarios. Symmetric private information retrieval (SPIR) is a variant where a user is able to privately retrieve one out of $K$ messages from $N$ non-colluding replicated databases without learning anything about the remaining $K-1$ messages. However, the goal of perfect privacy can be too taxing for certain applications. In this paper, we investigate whether the information-theoretic capacity of SPIR (equivalently, the inverse of the minimum download cost) can be increased by relaxing both user and DB privacy definitions. Such relaxation is relevant in applications where privacy can be traded for communication efficiency. We introduce and investigate the Asymmetric Leaky PIR (AL-PIR) model with different privacy leakage budgets in each direction. For user privacy leakage, we bound the probability ratios between all possible realizations of DB queries by a function of a non-negative constant $\epsilon$. For DB privacy, we bound the mutual information between the undesired messages, the queries, and the answers by a function of a non-negative constant $\delta$. We propose a general AL-PIR scheme that achieves an upper bound on the optimal download cost for arbitrary $\epsilon$ and $\delta$. We show that the optimal download cost of AL-PIR is upper-bounded as $D^{*}(\epsilon,\delta)\leq 1+\frac{1}{N-1}-\frac{\delta e^{\epsilon}}{N^{K-1}-1}$. Furthermore, we obtain an information-theoretic lower bound on the download cost as $D^{*}(\epsilon,\delta)\geq 1+\frac{1}{Ne^{\epsilon}-1}-\frac{\delta}{(Ne^{\epsilon})^{K-1}-1}$. The gap analysis between the two bounds shows that our AL-PIR scheme is optimal when $\epsilon =0$, i.e., under perfect user privacy, and it is optimal within a maximum multiplicative gap of $\frac{N-e^{-\epsilon}}{N-1}$ for any $(\epsilon,\delta)$.

arXiv.org e-Print Archive
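
The two bounds can be evaluated numerically to see the gap. The minimal sketch below transcribes the abstract's formulas as written. It assumes $N \geq 2$, $K \geq 2$, and that $(\epsilon, \delta)$ lie in the admissible range the paper specifies. That range is not given in the abstract, and for large $\delta$ the expressions alone need not be meaningful. Function names are hypothetical.

```python
import math


def al_pir_upper(N, K, eps, delta):
    """Achievable bound: 1 + 1/(N-1) - delta * e^eps / (N^(K-1) - 1)."""
    return 1 + 1 / (N - 1) - delta * math.exp(eps) / (N ** (K - 1) - 1)


def al_pir_lower(N, K, eps, delta):
    """Converse bound: 1 + 1/(N e^eps - 1) - delta / ((N e^eps)^(K-1) - 1)."""
    a = N * math.exp(eps)
    return 1 + 1 / (a - 1) - delta / (a ** (K - 1) - 1)


def gap_bound(N, eps):
    """Stated maximum multiplicative gap (N - e^-eps) / (N - 1)."""
    return (N - math.exp(-eps)) / (N - 1)


N, K = 3, 2
for eps, delta in [(0.0, 0.0), (0.0, 0.5), (1.0, 0.0)]:
    up, lo = al_pir_upper(N, K, eps, delta), al_pir_lower(N, K, eps, delta)
    print(f"eps={eps}, delta={delta}: {lo:.4f} <= D* <= {up:.4f}, "
          f"ratio {up / lo:.4f}, stated gap {gap_bound(N, eps):.4f}")
```

At $\epsilon = 0$ the two expressions coincide for any $\delta$, matching the optimality claim. At $\delta = 0$ their ratio simplifies to exactly $\frac{N-e^{-\epsilon}}{N-1}$.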