
31,060 research outputs found

Private Information Retrieval from MDS Coded Databases with Colluding Servers under Several Variant Models

Authors: Ge Gennian; Zhang Yiwei
Publication date: 10/10/2017

Private information retrieval (PIR) has received renewed attention due to its information-theoretic reformulation and its application in distributed storage systems (DSS). The general PIR model considers a coded database containing $N$ servers storing $M$ files. Each file is stored independently via the same arbitrary $(N,K)$-MDS code. A user wants to retrieve a specific file from the database privately against an arbitrary set of $T$ colluding servers. A key problem is to analyze the PIR capacity, defined as the maximal number of bits privately retrieved per downloaded bit. Several extensions of the general model arise from additional constraints. In this paper, we propose a general PIR scheme for several variant PIR models, including PIR with robust servers, PIR with Byzantine servers, the multi-file PIR model, and PIR with arbitrary collusion patterns.

Comment: The current draft is extended by considering several PIR models. The original version, named "Multi-file Private Information Retrieval from MDS Coded Databases with Colluding Servers", is abridged into a section within the current draft. arXiv admin note: text overlap with arXiv:1704.0678

arXiv.org e-Print Archive

The Capacity of Private Information Retrieval from Heterogeneous Uncoded Caching Databases

Authors: Arasli Batuhan; Banawan Karim; Ulukus Sennur; Wei Yi-Peng
Publication date: 25/02/2019

We consider private information retrieval (PIR) of a single file out of $K$ files from $N$ non-colluding databases with heterogeneous storage constraints $\mathbf{m}=(m_1, \cdots, m_N)$. The aim of this work is to jointly design the content placement phase and the information retrieval phase in order to minimize the download cost in the PIR phase. We characterize the optimal PIR download cost as a linear program. By analyzing the structure of the optimal solution of this linear program, we show that, surprisingly, the optimal download cost in our heterogeneous case matches its homogeneous counterpart where all databases have the same average storage constraint $\mu=\frac{1}{N} \sum_{n=1}^{N} m_n$. Thus, we show that there is no loss in the PIR capacity due to heterogeneity of storage spaces of the databases. We provide the optimum content placement explicitly for $N=3$.

Comment: Submitted for publication, February 201

arXiv.org e-Print Archive

Secure Symmetric Private Information Retrieval from Colluding Databases with Adversaries

Authors: Skoglund Mikael; Wang Qiwen
Publication date: 07/07/2017

The problem of symmetric private information retrieval (SPIR) from replicated databases with colluding servers and adversaries is studied. Specifically, the database comprises $K$ files, which are replicatively stored among $N$ servers. A user wants to retrieve one file from the database by communicating with the $N$ servers, without revealing the identity of the desired file to any server. Furthermore, the user shall learn nothing about the other $K-1$ files. Any $T$ out of $N$ servers may collude, that is, they may communicate their interactions with the user to guess the identity of the requested file. An adversary in the system can tap in on or even try to corrupt the communication. Three types of adversaries are considered: a Byzantine adversary who can overwrite the transmission of any $B$ servers to the user; a passive eavesdropper who can tap in on the incoming and outgoing transmissions of any $E$ servers; and a combination of both, namely an adversary who can tap in on a set of any $E$ nodes and overwrite the transmission of a set of any $B$ nodes. The problems of SPIR with colluding servers and the three types of adversaries are named T-BSPIR, T-ESPIR and T-BESPIR respectively. The capacity of the problem is defined as the maximum number of information bits of the desired file retrieved per downloaded bit. We show that the information-theoretical capacity of T-BSPIR equals $1-\frac{2B+T}{N}$, if the servers share common randomness (unavailable at the user) with amount at least $\frac{2B+T}{N-2B-T}$ times the file size. Otherwise, the capacity equals zero. The capacity of T-ESPIR is proved to equal $1-\frac{\max(T,E)}{N}$, with common randomness at least $\frac{\max(T,E)}{N-\max(T,E)}$ times the file size. Finally, the capacity of T-BESPIR is proved to be $1-\frac{2B+\max(T,E)}{N}$, with common randomness at least $\frac{2B+\max(T,E)}{N-2B-\max(T,E)}$ times the file size.

arXiv.org e-Print Archive
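
The three SPIR capacity expressions in the abstract above share one pattern: the rate loss is $2B+\max(T,E)$ out of $N$. A minimal Python sketch of that arithmetic follows. The function name `spir_capacity` is hypothetical (not from the paper), and the check that the loss stays below $N$ is an assumption added here so that the randomness ratio is well defined.

```python
from fractions import Fraction


def spir_capacity(N, T, B=0, E=0):
    """Return (capacity, minimum common randomness per unit of file size).

    B=0 gives the T-ESPIR expression, E=0 gives the T-BSPIR expression,
    and the general case gives the T-BESPIR expression from the abstract.
    """
    loss = 2 * B + max(T, E)
    # Assumption for this sketch: the loss must be strictly below N,
    # otherwise the randomness ratio loss / (N - loss) is undefined.
    if not 0 < loss < N:
        raise ValueError("need 0 < 2B + max(T, E) < N")
    capacity = 1 - Fraction(loss, N)
    randomness = Fraction(loss, N - loss)
    return capacity, randomness


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    print(spir_capacity(10, 2, B=1, E=3))
```

Exact fractions are used instead of floats so that the closed-form values can be compared directly.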

A general private information retrieval scheme for MDS coded databases with colluding servers

Authors: Ge Gennian; Zhang Yiwei
Publication date: 22/04/2017

The problem of private information retrieval has received renewed attention in recent years due to its information-theoretic reformulation and applications in distributed storage systems. PIR capacity is the maximal number of bits privately retrieved per downloaded bit. The capacity has been fully solved for some degenerate cases. For the general case, where the database is coded and the servers collude, the exact capacity remains unknown. We build a general private information retrieval scheme for MDS coded databases with colluding servers. Our scheme achieves the rate $(1+R+R^2+\cdots+R^{M-1})^{-1}$, where $R=1-\frac{{{N-T}\choose K}}{{N\choose K}}$. Compared to existing PIR schemes, our scheme performs better for a certain range of parameters and is suitable for any underlying MDS code used in the distributed storage system.

Comment: Submitted to IEEE Transactions on Information Theory

arXiv.org e-Print Archive
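
The rate expression in the abstract above can be evaluated numerically. The sketch below assumes the rate is the reciprocal of the geometric sum $1+R+\cdots+R^{M-1}$, since a PIR rate (bits retrieved per downloaded bit) cannot exceed 1. The function name `mds_tpir_scheme_rate` is hypothetical.

```python
from fractions import Fraction
from math import comb


def mds_tpir_scheme_rate(N, K, T, M):
    """Rate 1 / (1 + R + ... + R^(M-1)) with R = 1 - C(N-T, K) / C(N, K).

    N servers, (N, K)-MDS code, T colluding servers, M files.
    Assumption for this sketch: the rate is the reciprocal of the sum.
    """
    if not (1 <= K <= N and 0 <= T < N and M >= 1):
        raise ValueError("need 1 <= K <= N, 0 <= T < N, M >= 1")
    # math.comb returns 0 when N - T < K, which gives R = 1.
    R = 1 - Fraction(comb(N - T, K), comb(N, K))
    geometric_sum = sum(R**i for i in range(M))
    return 1 / geometric_sum


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    print(mds_tpir_scheme_rate(N=4, K=2, T=1, M=2))
```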

Linear Symmetric Private Information Retrieval for MDS Coded Distributed Storage with Colluding Servers

Authors: Skoglund Mikael; Wang Qiwen
Publication date: 17/08/2017

The problem of symmetric private information retrieval (SPIR) from a coded database which is distributively stored among colluding servers is studied. Specifically, the database comprises $K$ files, which are stored among $N$ servers using an $(N,M)$-MDS storage code. A user wants to retrieve one file from the database by communicating with the $N$ servers, without revealing the identity of the desired file to any server. Furthermore, the user shall learn nothing about the other $K-1$ files in the database. In the $T$-colluding SPIR problem (hence called TSPIR), any $T$ out of $N$ servers may collude, that is, they may communicate their interactions with the user to guess the identity of the requested file. We show that for linear schemes, the information-theoretic capacity of the MDS-TSPIR problem, defined as the maximum number of information bits of the desired file retrieved per downloaded bit, equals $1-\frac{M+T-1}{N}$, if the servers share common randomness (unavailable at the user) with amount at least $\frac{M+T-1}{N-M-T+1}$ times the file size. Otherwise, the capacity equals zero. We conjecture that our capacity holds also for general MDS-TSPIR schemes.

Comment: arXiv admin note: text overlap with arXiv:1707.0215

arXiv.org e-Print Archive
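
As a worked instance of the MDS-TSPIR capacity expression above, take $N=6$, $M=2$, $T=2$. These values are chosen here purely for illustration and do not come from the paper; $\rho_{\min}$ is shorthand introduced for this example for the minimum common randomness relative to the file size.

```latex
\begin{align*}
C &= 1-\frac{M+T-1}{N} = 1-\frac{3}{6} = \frac{1}{2}, \\
\rho_{\min} &= \frac{M+T-1}{N-M-T+1} = \frac{3}{3} = 1 .
\end{align*}
```

Setting $M=1$ in the same expression gives $1-\frac{T}{N}$.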

Secure Private Information Retrieval from Colluding Databases with Eavesdroppers

Authors: Skoglund Mikael; Wang Qiwen
Publication date: 03/10/2017

The problem of private information retrieval (PIR) is to retrieve one message out of $K$ messages replicated at $N$ databases, without revealing the identity of the desired message to the databases. We consider the problem of PIR with colluding servers and eavesdroppers, named T-EPIR. Specifically, any $T$ out of $N$ databases may collude, i.e. they may communicate their interactions with the user to guess the identity of the requested message. An eavesdropper is curious to know the database and can tap in on the incoming and outgoing transmissions of any $E$ databases. The databases share some common randomness unknown to the eavesdropper and the user, and use the common randomness to generate the answers, such that the eavesdropper can learn no information about the $K$ messages. Define $R^*$ as the optimal ratio of the number of the desired message information bits to the number of total downloaded bits, and $\rho^*$ to be the optimal ratio of the information bits of the shared common randomness to the information bits of the desired file. In our previous work, we found that when $E \geq T$, the optimal ratio that can be achieved equals $1-\frac{E}{N}$. In this work, we focus on the case when $E \leq T$. We derive an outer bound $R^* \leq (1-\frac{T}{N}) \frac{1-\frac{E}{N} \cdot (\frac{T}{N})^{K-1}}{1-(\frac{T}{N})^K}$. We also obtain a lower bound of $\rho^* \geq \frac{\frac{E}{N}(1-(\frac{T}{N})^K)}{(1-\frac{T}{N})(1-\frac{E}{N} \cdot (\frac{T}{N})^{K-1})}$. For the achievability, we propose a scheme which achieves the rate (inner bound) $R=\frac{1-\frac{T}{N}}{1-(\frac{T}{N})^K}-\frac{E}{KN}$. The amount of shared common randomness used in the achievable scheme is $\frac{\frac{E}{N}(1-(\frac{T}{N})^K)}{1-\frac{T}{N}-\frac{E}{KN}(1-(\frac{T}{N})^K)}$ times the file size. The gap between the derived inner and outer bounds vanishes as the number of messages $K$ tends to infinity.

arXiv.org e-Print Archive
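
The inner and outer rate bounds stated in the T-EPIR abstract above can be compared numerically. The following Python sketch evaluates both expressions exactly and shows the gap shrinking as $K$ grows. The function name `tepir_rate_bounds` and the parameter values are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def tepir_rate_bounds(N, T, E, K):
    """Return (inner, outer) rate bounds for T-EPIR in the regime E <= T.

    inner = (1 - t) / (1 - t^K) - E / (K N)
    outer = (1 - t) * (1 - e * t^(K-1)) / (1 - t^K)
    with t = T / N and e = E / N.
    """
    # Assumption for this sketch: T < N so that 1 - t^K is nonzero.
    if not (0 <= E <= T < N and K >= 2):
        raise ValueError("need 0 <= E <= T < N and K >= 2")
    t = Fraction(T, N)
    e = Fraction(E, N)
    base = (1 - t) / (1 - t**K)
    outer = base * (1 - e * t ** (K - 1))
    inner = base - e / K
    return inner, outer


if __name__ == "__main__":
    for K in (2, 5, 20):
        inner, outer = tepir_rate_bounds(N=4, T=2, E=1, K=K)
        print(K, float(inner), float(outer), float(outer - inner))
```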

Private Information Retrieval Through Wiretap Channel II: Privacy Meets Security

Authors: Banawan Karim; Ulukus Sennur
Publication date: 18/01/2018

We consider the problem of private information retrieval through wiretap channel II (PIR-WTC-II). In PIR-WTC-II, a user wants to retrieve a single message (file) privately out of $M$ messages, which are stored in $N$ replicated and non-communicating databases. An external eavesdropper observes a fraction $\mu_n$ (of its choice) of the traffic exchanged between the $n$th database and the user. In addition to the privacy constraint, the databases should encode the returned answer strings such that the eavesdropper learns absolutely nothing about the \emph{contents} of the databases. We aim at characterizing the capacity of the PIR-WTC-II under the combined privacy and security constraints. We obtain a general upper bound for the problem in the form of a max-min optimization problem, which extends the converse proof of the PIR problem under asymmetric traffic constraints. We propose an achievability scheme that satisfies the security constraint by encoding a secret key, which is generated securely at each database, into an artificial noise vector using an MDS code. The user and the databases operate at one of the corner points of the achievable scheme for the PIR under asymmetric traffic constraints such that the retrieval rate is maximized under the imposed security constraint. The upper bound and the lower bound match for the case of $M=2$ and $M=3$ messages, for any $N$, and any $\boldsymbol{\mu}=(\mu_1, \cdots, \mu_N)$.

Comment: Submitted to IEEE Transactions on Information Theory, January 201

arXiv.org e-Print Archive

Cache-Aided Private Information Retrieval with Partially Known Uncoded Prefetching: Fundamental Limits

Authors: Banawan Karim; Ulukus Sennur; Wei Yi-Peng
Publication date: 18/12/2017

We consider the problem of private information retrieval (PIR) from $N$ non-colluding and replicated databases, when the user is equipped with a cache that holds an uncoded fraction $r$ of the symbols from each of the $K$ stored messages in the databases. This model operates in a two-phase scheme, namely, the prefetching phase where the user acquires side information and the retrieval phase where the user privately downloads the desired message. In the prefetching phase, the user receives an uncoded fraction $\frac{r}{N}$ of each message from the $n$th database. This side information is known only to the $n$th database and unknown to the remaining databases, i.e., the user possesses \emph{partially known} side information. We investigate the optimal normalized download cost $D^*(r)$ in the retrieval phase as a function of $K$, $N$, and $r$. We develop lower and upper bounds for the optimal download cost. The bounds match in general for the cases of very low caching ratio ($r \leq \frac{1}{N^{K-1}}$) and very high caching ratio ($r \geq \frac{K-2}{N^2-3N+KN}$). We fully characterize the optimal download cost caching ratio tradeoff for $K=3$. For general $K$, $N$, and $r$, we show that the largest gap between the achievability and the converse bounds is $\frac{5}{32}$.

Comment: Submitted for publication, December 2017. arXiv admin note: substantial text overlap with arXiv:1709.0105

arXiv.org e-Print Archive
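
The two caching-ratio thresholds quoted in the abstract above are easy to tabulate. The sketch below only classifies a given $r$ against those thresholds; it does not compute the download cost $D^*(r)$, which the abstract does not state. The function names and regime labels are hypothetical.

```python
from fractions import Fraction


def caching_thresholds(N, K):
    """Return (low, high): bounds match for r <= low and for r >= high."""
    if N < 2 or K < 2:
        raise ValueError("need N >= 2 and K >= 2")
    low = Fraction(1, N ** (K - 1))
    high = Fraction(K - 2, N * N - 3 * N + K * N)
    return low, high


def caching_regime(N, K, r):
    """Classify caching ratio r (a Fraction in [0, 1]) for this model."""
    r = Fraction(r)
    if not 0 <= r <= 1:
        raise ValueError("r must lie in [0, 1]")
    low, high = caching_thresholds(N, K)
    if r <= low:
        return "very low"
    if r >= high:
        return "very high"
    return "intermediate"


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    print(caching_thresholds(2, 4), caching_regime(2, 4, Fraction(1, 4)))
```

For $K=3$ both thresholds evaluate to $\frac{1}{N^2}$, so no intermediate regime remains in that case.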

The Capacity of Private Information Retrieval from Byzantine and Colluding Databases

Authors: Banawan Karim; Ulukus Sennur
Publication date: 05/06/2017

We consider the problem of single-round private information retrieval (PIR) from $N$ replicated databases. We consider the case when $B$ databases are outdated (unsynchronized), or even worse, adversarial (Byzantine), and therefore, can return incorrect answers. In the PIR problem with Byzantine databases (BPIR), a user wishes to retrieve a specific message from a set of $M$ messages with zero-error, irrespective of the actions performed by the Byzantine databases. We consider the $T$-privacy constraint in this paper, where any $T$ databases can collude, and exchange the queries submitted by the user. We derive the information-theoretic capacity of this problem, which is the maximum number of \emph{correct symbols} that can be retrieved privately (under the $T$-privacy constraint) for every symbol of the downloaded data. We determine the exact BPIR capacity to be $C=\frac{N-2B}{N}\cdot\frac{1-\frac{T}{N-2B}}{1-(\frac{T}{N-2B})^M}$, if $2B+T < N$. This capacity expression shows that the effect of Byzantine databases on the retrieval rate is equivalent to removing $2B$ databases from the system, with a penalty factor of $\frac{N-2B}{N}$, which signifies that even though the number of databases needed for PIR is effectively $N-2B$, the user still needs to access the entire $N$ databases. The result shows that for the unsynchronized PIR problem, if the user does not have any knowledge about the fraction of the messages that are mis-synchronized, the single-round capacity is the same as the BPIR capacity. Our achievable scheme extends the optimal achievable scheme for the robust PIR (RPIR) problem to correct the \emph{errors} introduced by the Byzantine databases as opposed to \emph{erasures} in the RPIR problem. Our converse proof uses the idea of the cut-set bound in the network coding problem against adversarial nodes.

Comment: Submitted to IEEE Transactions on Information Theory, June 201

arXiv.org e-Print Archive
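
The BPIR capacity expression above factors into the penalty $\frac{N-2B}{N}$ times a term that depends on the effective number of databases $N-2B$. A minimal Python sketch of that evaluation follows. The function name `bpir_capacity` is hypothetical, and requiring $T \geq 1$ is an assumption of this sketch.

```python
from fractions import Fraction


def bpir_capacity(N, T, B, M):
    """Evaluate C = (N-2B)/N * (1 - T/(N-2B)) / (1 - (T/(N-2B))^M)."""
    # The abstract states the formula under the condition 2B + T < N.
    if not (T >= 1 and B >= 0 and M >= 1 and 2 * B + T < N):
        raise ValueError("need T >= 1, B >= 0, M >= 1 and 2B + T < N")
    effective = N - 2 * B              # databases left after the 2B penalty
    ratio = Fraction(T, effective)
    penalty = Fraction(effective, N)   # user still downloads from all N
    return penalty * (1 - ratio) / (1 - ratio**M)


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    print(bpir_capacity(N=5, T=1, B=1, M=2))
```

With `B=0` the penalty factor is 1 and the expression depends only on `T / N` and `M`.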

The Capacity of Private Information Retrieval with Partially Known Private Side Information

Authors: Banawan Karim; Ulukus Sennur; Wei Yi-Peng
Publication date: 26/11/2017

We consider the problem of private information retrieval (PIR) of a single message out of $K$ messages from $N$ replicated and non-colluding databases where a cache-enabled user (retriever) of cache-size $M$ possesses side information in the form of full messages that are partially known to the databases. In this model, the user and the databases engage in a two-phase scheme, namely, the prefetching phase where the user acquires side information and the retrieval phase where the user downloads desired information. In the prefetching phase, the user receives $m_n$ full messages from the $n$th database, under the cache memory size constraint $\sum_{n=1}^N m_n \leq M$. In the retrieval phase, the user wishes to retrieve a message such that no individual database learns anything about the identity of the desired message. In addition, the identities of the side information messages that the user did not prefetch from a database must remain private against that database. Since the side information provided by each database in the prefetching phase is known by the providing database and the side information must be kept private against the remaining databases, we coin this model as \textit{partially known private side information}. We characterize the capacity of the PIR with partially known private side information to be $C=\left(1+\frac{1}{N}+\cdots+\frac{1}{N^{K-M-1}}\right)^{-1}=\frac{1-\frac{1}{N}}{1-(\frac{1}{N})^{K-M}}$. Interestingly, this result is the same if none of the databases knows any of the prefetched side information, i.e., when the side information is obtained externally, a problem posed by Kadhe et al. and settled by Chen-Wang-Jafar recently. Thus, our result implies that there is no loss in using the same databases for both prefetching and retrieval phases.

Comment: Submitted to IEEE Transactions on Information Theory, November 201

arXiv.org e-Print Archive
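
The capacity in the abstract above is given in two forms, a reciprocal geometric sum and a closed form. The Python sketch below evaluates both with exact fractions and checks that they agree. The function name `pir_pkpsi_capacity` is hypothetical, and the parameter checks are assumptions of this sketch.

```python
from fractions import Fraction


def pir_pkpsi_capacity(N, K, M):
    """Capacity with N databases, K messages and cache size M (full messages)."""
    # Assumption for this sketch: at least one message is not cached (M < K).
    if not (N >= 2 and 0 <= M < K):
        raise ValueError("need N >= 2 and 0 <= M < K")
    inv = Fraction(1, N)
    # Series form: (1 + 1/N + ... + 1/N^(K-M-1))^(-1)
    series_form = 1 / sum(inv**i for i in range(K - M))
    # Closed form: (1 - 1/N) / (1 - (1/N)^(K-M))
    closed_form = (1 - inv) / (1 - inv ** (K - M))
    assert series_form == closed_form
    return closed_form


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    print(pir_pkpsi_capacity(N=2, K=3, M=1))
```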