11 research outputs found

    The Capacity of Private Information Retrieval from Heterogeneous Uncoded Caching Databases

    We consider private information retrieval (PIR) of a single file out of $K$ files from $N$ non-colluding databases with heterogeneous storage constraints $\mathbf{m}=(m_1, \cdots, m_N)$. The aim of this work is to jointly design the content placement phase and the information retrieval phase in order to minimize the download cost in the PIR phase. We characterize the optimal PIR download cost as a linear program. By analyzing the structure of the optimal solution of this linear program, we show that, surprisingly, the optimal download cost in our heterogeneous case matches its homogeneous counterpart where all databases have the same average storage constraint $\mu=\frac{1}{N} \sum_{n=1}^{N} m_n$. Thus, we show that there is no loss in the PIR capacity due to heterogeneity of storage spaces of the databases. We provide the optimum content placement explicitly for $N=3$. Comment: Submitted for publication, February 201
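
The equivalent homogeneous storage constraint from the abstract above can be sketched as follows. This is only an illustration: the function name `average_storage` is hypothetical, and the sketch assumes each $m_n$ is given as a fraction of the full library, which the abstract does not state explicitly.

```python
from fractions import Fraction

def average_storage(m):
    """Return mu = (1/N) * sum(m_n) for a list of per-database storage constraints.

    Assumption: each m_n is a normalized storage fraction (hypothetical convention).
    """
    if not m:
        raise ValueError("need at least one database")
    return sum(m, Fraction(0)) / len(m)

# Example: three databases with different storage constraints.
heterogeneous = [Fraction(1, 2), Fraction(3, 4), Fraction(1)]
mu = average_storage(heterogeneous)
print(mu)  # the homogeneous counterpart uses this mu at every database
```

According to the abstract, any storage profile with this same average attains the same optimal download cost as the homogeneous system with constraint $\mu$ at every database.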

    Improved Storage for Efficient Private Information Retrieval

    We consider the problem of private information retrieval from $N$ \emph{storage-constrained} databases. In this problem, a user wishes to retrieve a single message out of $M$ messages (of size $L$) without revealing any information about the identity of the message to individual databases. Each database stores $\mu ML$ symbols, i.e., a $\mu$ fraction of the entire library, where $\frac{1}{N} \leq \mu \leq 1$. Our goal is to characterize the optimal tradeoff curve for the storage cost (captured by $\mu$) and the normalized download cost ($D/L$). We show that the download cost can be reduced by employing a hybrid storage scheme that combines \emph{MDS coding} ideas with \emph{uncoded partial replication} ideas. When there is no coding, our scheme reduces to the Attia-Kumar-Tandon storage scheme, which was initially introduced by Maddah-Ali-Niesen in the context of the caching problem, and when there is no uncoded partial replication, our scheme reduces to the Banawan-Ulukus storage scheme; in general, our scheme outperforms both. Comment: ITW 201

    Latent-variable Private Information Retrieval

    In many applications, content accessed by users (movies, videos, news articles, etc.) can leak sensitive latent attributes, such as religious and political views, sexual orientation, ethnicity, gender, and others. To prevent such information leakage, the goal of classical PIR is to hide the identity of the content/message being accessed, which subsequently also hides the latent attributes. This solution, while private, can be too costly, particularly when perfect (information-theoretic) privacy constraints are imposed. For instance, for a single database holding $K$ messages, privately retrieving one message is possible if and only if the user downloads the entire database of $K$ messages. Retrieving content privately, however, may not be necessary to perfectly hide the latent attributes. Motivated by the above, we formulate and study the problem of latent-variable private information retrieval (LV-PIR), which aims at allowing the user to efficiently retrieve one out of $K$ messages (indexed by $\theta$) without revealing any information about the latent variable (modeled by $S$). We focus on the practically relevant setting of a single database and show that one can significantly reduce the download cost of LV-PIR (compared to the classical PIR) based on the correlation between $\theta$ and $S$. We present a general scheme for LV-PIR as a function of the statistical relationship between $\theta$ and $S$, and also provide new results on the capacity/download cost of LV-PIR. Several open problems and new directions are also discussed.

    Breaking the MDS-PIR Capacity Barrier via Joint Storage Coding

    The capacity of private information retrieval (PIR) from databases coded using maximum distance separable (MDS) codes has been previously characterized by Banawan and Ulukus, where it was assumed that the messages are encoded and stored separably into the databases. This assumption was also commonly made in other related works in the literature, and this capacity is colloquially referred to as the MDS-PIR capacity. In this work, we consider the question of whether and when this capacity barrier can be broken through joint encoding and storing of the messages. Our main results are two classes of novel code constructions which allow joint encoding, together with the corresponding PIR protocols, which indeed outperform the separate MDS-coded systems. Moreover, we show that a simple but novel expansion technique allows us to generalize these two classes of codes, resulting in a wider range of cases where this capacity barrier can be broken.

    Private Information Retrieval from Heterogeneous Uncoded Storage Constrained Databases with Reduced Sub-Messages

    We propose capacity-achieving schemes for private information retrieval (PIR) from uncoded databases (DBs) with both homogeneous and heterogeneous storage constraints. In the PIR setting, a user queries a set of DBs to privately download a message, where privacy implies that no one DB can infer which message the user desires. In general, a PIR scheme is comprised of storage placement and delivery designs. Previous works have derived the capacity, or infimum download cost, of PIR with uncoded storage placement and also sufficient conditions of a storage placement design to meet capacity. However, the currently proposed storage placement designs require splitting each message into an exponential number of sub-messages with respect to the number of DBs. In this work, when DBs have the same storage constraint, we propose two simple storage placement designs that satisfy the capacity conditions. Then, for more general heterogeneous storage constraints, we translate the storage placement design process into a "filling problem". We design an iterative algorithm to solve the filling problem where, in each iteration, messages are partitioned into sub-messages and stored at subsets of DBs. All of our proposed storage placement designs require a number of sub-messages per message at most equal to the number of DBs. Comment: arXiv admin note: text overlap with arXiv:1901.0749

    $X$-Secure $T$-Private Federated Submodel Learning with Elastic Dropout Resilience

    Motivated by recent interest in federated submodel learning, this work explores the fundamental problem of privately reading from and writing to a database comprised of $K$ files (submodels) that are stored across $N$ distributed servers according to an $X$-secure threshold secret sharing scheme. One after another, various users wish to retrieve their desired file, locally process the information and then update the file in the distributed database while keeping the identity of their desired file private from any set of up to $T$ colluding servers. The availability of servers changes over time, so elastic dropout resilience is required. The main contribution of this work is an adaptive scheme, called ACSA-RW, that takes advantage of all currently available servers to reduce its communication costs, fully updates the database after each write operation even though the database is only partially accessible due to server dropouts, and ensures a memoryless operation of the network in the sense that the storage structure is preserved and future users may remain oblivious of the past history of server dropouts. The ACSA-RW construction builds upon CSA codes that were originally introduced for XSTPIR and have been shown to be natural solutions for secure distributed matrix multiplication problems. ACSA-RW achieves the desired private read and write functionality with elastic dropout resilience, matches the best results for private-read from PIR literature, improves significantly upon available baselines for private-write, reveals a striking symmetry between upload and download costs, and exploits redundant storage dimensions to accommodate arbitrary read and write dropout servers up to certain threshold values. It also answers in the affirmative an open question by Kairouz et al. by exploiting synergistic gains from the joint design of private read and write operations.

    Multi-Party Private Set Intersection: An Information-Theoretic Approach

    We investigate the problem of multi-party private set intersection (MP-PSI). In MP-PSI, there are $M$ parties, each storing a data set $\mathcal{P}_i$ over $N_i$ replicated and non-colluding databases, and we want to calculate the intersection of the data sets $\cap_{i=1}^M \mathcal{P}_i$ without leaking any information beyond the set intersection to any of the parties. We consider a specific communication protocol where one of the parties, called the leader party, initiates the MP-PSI protocol by sending queries to the remaining parties, which are called client parties. The client parties are not allowed to communicate with each other. We propose an information-theoretic scheme that privately calculates the intersection $\cap_{i=1}^M \mathcal{P}_i$ with a download cost of $D = \min_{t \in \{1, \cdots, M\}} \sum_{i \in \{1, \cdots, M\}\setminus \{t\}} \left\lceil \frac{|\mathcal{P}_t| N_i}{N_i-1}\right\rceil$. Similar to the 2-party PSI problem, our scheme builds on the connection between the PSI problem and the multi-message symmetric private information retrieval (MM-SPIR) problem. Our scheme is a non-trivial generalization of the 2-party PSI scheme as it needs an intricate design of the shared common randomness. Interestingly, in terms of the download cost, our scheme does not incur any penalty due to the more stringent privacy constraints in the MP-PSI problem compared to the 2-party PSI problem.
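
As a rough numerical illustration of the download cost expression in the abstract above, the sketch below evaluates the minimum over candidate leader parties $t$. The function name `mp_psi_download_cost` and the example inputs are hypothetical; the code only evaluates the stated formula and does not implement the retrieval scheme itself.

```python
def mp_psi_download_cost(set_sizes, num_databases):
    """Evaluate D = min_t sum_{i != t} ceil(|P_t| * N_i / (N_i - 1)).

    set_sizes[i]     -- |P_i|, size of party i's data set
    num_databases[i] -- N_i, number of replicated databases of party i (must be >= 2)
    """
    if len(set_sizes) != len(num_databases) or len(set_sizes) < 2:
        raise ValueError("need matching lists for at least two parties")
    if any(n < 2 for n in num_databases):
        raise ValueError("each N_i must be at least 2 so that N_i - 1 > 0")

    def ceil_div(a, b):
        # Exact integer ceiling of a / b for positive b.
        return -(-a // b)

    costs = []
    for t, p_t in enumerate(set_sizes):
        cost = sum(
            ceil_div(p_t * n_i, n_i - 1)
            for i, n_i in enumerate(num_databases)
            if i != t
        )
        costs.append(cost)
    return min(costs)

# Hypothetical example: three parties with set sizes 3, 5, 4 and 2, 3, 2 databases.
print(mp_psi_download_cost([3, 5, 4], [2, 3, 2]))
```

In this example the minimum is attained by choosing the party with the smallest set as the leader, although the formula lets the database counts $N_i$ influence that choice as well.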

    Semantic Private Information Retrieval

    We investigate the problem of semantic private information retrieval (semantic PIR). In semantic PIR, a user retrieves a message out of $K$ independent messages stored in $N$ replicated and non-colluding databases without revealing the identity of the desired message to any individual database. The messages come with \emph{different semantics}, i.e., the messages are allowed to have \emph{non-uniform a priori probabilities} denoted by $(p_i>0,\: i \in [K])$, which are a proxy for their respective popularity of retrieval, and \emph{arbitrary message sizes} $(L_i,\: i \in [K])$. This is a generalization of the classical private information retrieval (PIR) problem, where messages are assumed to have equal a priori probabilities and equal message sizes. We derive the semantic PIR capacity for general $K$, $N$. The results show that the semantic PIR capacity depends on the number of databases $N$, the number of messages $K$, the a priori probability distribution of messages $p_i$, and the message sizes $L_i$. We present two achievable semantic PIR schemes: The first one is a deterministic scheme which is based on message asymmetry. This scheme employs non-uniform subpacketization. The second scheme is probabilistic and is based on choosing one query set out of multiple options at random to retrieve the required message without the need for exponential subpacketization. We derive necessary and sufficient conditions for the semantic PIR capacity to exceed the classical PIR capacity with equal priors and sizes. Our results show that the semantic PIR capacity can be larger than the classical PIR capacity when longer messages have higher popularities. However, when messages are equal-length, the non-uniform priors cannot be exploited to improve the retrieval rate over the classical PIR capacity. Comment: submitted for publicatio

    Two-Level Private Information Retrieval

    In the conventional robust $T$-colluding private information retrieval (PIR) system, the user needs to retrieve one of the possible messages while keeping the identity of the requested message private from any $T$ colluding servers. Motivated by the possible heterogeneous privacy requirements for different messages, we consider the $(N, T_1:K_1, T_2:K_2)$ two-level PIR system with a total of $K_2$ messages in the system, where $T_1\geq T_2$ and $K_1\leq K_2$. Any one of the $K_1$ messages needs to be retrieved privately against $T_1$ colluding servers, and any one of the full set of $K_2$ messages needs to be retrieved privately against $T_2$ colluding servers. We obtain a lower bound to the capacity by proposing two novel coding schemes, namely the non-uniform successive cancellation scheme and the non-uniform block cancellation scheme. A capacity upper bound is also derived. The gap between the upper bound and the lower bounds is analyzed, and shown to vanish when $T_1=T_2$. Lastly, we show that the upper bound is in general not tight by providing a stronger bound for a special setting.

    Asymmetric Leaky Private Information Retrieval

    Information-theoretic formulations of the private information retrieval (PIR) problem have been investigated under a variety of scenarios. Symmetric private information retrieval (SPIR) is a variant where a user is able to privately retrieve one out of $K$ messages from $N$ non-colluding replicated databases without learning anything about the remaining $K-1$ messages. However, the goal of perfect privacy can be too taxing for certain applications. In this paper, we investigate if the information-theoretic capacity of SPIR (equivalently, the inverse of the minimum download cost) can be increased by relaxing both user and DB privacy definitions. Such relaxation is relevant in applications where privacy can be traded for communication efficiency. We introduce and investigate the Asymmetric Leaky PIR (AL-PIR) model with different privacy leakage budgets in each direction. For user privacy leakage, we bound the probability ratios between all possible realizations of DB queries by a function of a non-negative constant $\epsilon$. For DB privacy, we bound the mutual information between the undesired messages, the queries, and the answers, by a function of a non-negative constant $\delta$. We propose a general AL-PIR scheme that achieves an upper bound on the optimal download cost for arbitrary $\epsilon$ and $\delta$. We show that the optimal download cost of AL-PIR is upper-bounded as $D^{*}(\epsilon,\delta)\leq 1+\frac{1}{N-1}-\frac{\delta e^{\epsilon}}{N^{K-1}-1}$. Second, we obtain an information-theoretic lower bound on the download cost as $D^{*}(\epsilon,\delta)\geq 1+\frac{1}{Ne^{\epsilon}-1}-\frac{\delta}{(Ne^{\epsilon})^{K-1}-1}$. The gap analysis between the two bounds shows that our AL-PIR scheme is optimal when $\epsilon =0$, i.e., under perfect user privacy, and it is optimal within a maximum multiplicative gap of $\frac{N-e^{-\epsilon}}{N-1}$ for any $(\epsilon,\delta)$.
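
The two AL-PIR bounds quoted in the abstract above can be evaluated numerically with a short sketch. The function names are hypothetical, and the sketch assumes $N \geq 2$ and $K \geq 2$ so that every denominator is positive; the admissible range of $\delta$ is not stated in the abstract and is not checked here.

```python
import math

def al_pir_upper_bound(n, k, eps, delta):
    """Upper bound: 1 + 1/(N-1) - delta*e^eps / (N^(K-1) - 1)."""
    return 1 + 1 / (n - 1) - delta * math.exp(eps) / (n ** (k - 1) - 1)

def al_pir_lower_bound(n, k, eps, delta):
    """Lower bound: 1 + 1/(N e^eps - 1) - delta / ((N e^eps)^(K-1) - 1)."""
    ne = n * math.exp(eps)
    return 1 + 1 / (ne - 1) - delta / (ne ** (k - 1) - 1)

def max_multiplicative_gap(n, eps):
    """Maximum multiplicative gap stated in the abstract: (N - e^(-eps)) / (N - 1)."""
    return (n - math.exp(-eps)) / (n - 1)

# Hypothetical parameters: N = 3 databases, K = 2 messages.
n, k = 3, 2
for eps, delta in [(0.0, 0.5), (0.5, 0.0)]:
    upper = al_pir_upper_bound(n, k, eps, delta)
    lower = al_pir_lower_bound(n, k, eps, delta)
    print(eps, delta, lower, upper, max_multiplicative_gap(n, eps))
```

With $\epsilon = 0$ the two expressions coincide term by term, which is consistent with the optimality claim under perfect user privacy; with $\delta = 0$ the ratio of the two bounds equals the stated gap $\frac{N-e^{-\epsilon}}{N-1}$.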