Private Computation of Systematically Encoded Data with Colluding Servers
Private Computation (PC), recently introduced by Sun and Jafar, is a
generalization of Private Information Retrieval (PIR) in which a user wishes to
privately compute an arbitrary function of data stored across several servers.
We construct a PC scheme which accounts for server collusion, coded data, and
non-linear functions. For data replicated over several possibly colluding
servers, our scheme computes arbitrary functions of the data with rate equal to
the asymptotic capacity of PIR for this setup. For systematically encoded data
stored over colluding servers, we privately compute arbitrary functions of the
columns of the data matrix and calculate the rate explicitly for polynomial
functions. The scheme is a generalization of previously studied star-product
PIR schemes. Comment: Submitted to IEEE International Symposium on Information Theory 2018.
Version 2 fixes some typos and adds some clarifying remarks.
Private Polynomial Computation from Lagrange Encoding
Private computation is a generalization of private information retrieval, in which a user is able to compute a function on a distributed dataset without revealing the identity of that function to the servers that store the dataset. In this paper it is shown that Lagrange encoding, a recently suggested powerful technique for encoding Reed-Solomon codes, enables private computation in many cases of interest. In particular, we present a scheme that enables private computation of polynomials of any degree on Lagrange encoded data, while being robust to Byzantine and straggling servers, and to servers that collude in an attempt to deduce the identities of the functions to be evaluated. Moreover, incorporating ideas from the well-known Shamir secret sharing scheme allows the data itself to be concealed from the servers as well. Our results extend private computation to non-linear polynomials and to data-privacy, and reveal a tight connection between private computation and coded computation.
Private Polynomial Computation from Lagrange Encoding
Private computation is a generalization of private information retrieval, in
which a user is able to compute a function on a distributed dataset without
revealing the identity of that function to the servers. In this paper it is
shown that Lagrange encoding, a powerful technique for encoding Reed-Solomon
codes, enables private computation in many cases of interest. In particular, we
present a scheme that enables private computation of polynomials of any degree
on Lagrange encoded data, while being robust to Byzantine and straggling
servers, and to servers colluding to attempt to deduce the identities of the
functions to be evaluated. Moreover, incorporating ideas from the well-known
Shamir secret sharing scheme allows the data itself to be concealed from the
servers as well. Our results extend private computation to high degree
polynomials and to data-privacy, and reveal a tight connection between private
computation and coded computation. Comment: To appear in Transactions on Information Forensics and Security.
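To make the link between Lagrange encoding and coded computation concrete, the following is a minimal sketch of plain Lagrange coded computing. It is not the private scheme of the abstract above: it omits function privacy, Byzantine servers, and Shamir-style data hiding, and only shows how a polynomial of the data can be recovered despite stragglers. The field size `P`, the points `betas` and `alphas`, the polynomial `f`, and the choice of straggling servers are all assumptions made for illustration.

```python
# Sketch of Lagrange coded computing over a small prime field.
# Assumed parameters (illustration only): P, betas, alphas, f, and the stragglers.
P = 97  # prime field size; requires Python 3.8+ for pow(x, -1, P)

def lagrange_eval(xs, ys, z, p=P):
    """Evaluate at z the unique polynomial of degree < len(xs) through (xs[i], ys[i]) mod p."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (z - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def f(x, p=P):
    """Polynomial of degree 2 that every server applies to its coded share."""
    return (x * x + 3 * x + 1) % p

data = [5, 11, 42]                     # K = 3 data symbols
betas = [1, 2, 3]                      # u(betas[j]) = data[j] defines the encoding polynomial u
alphas = [10, 11, 12, 13, 14, 15, 16]  # N = 7 servers; server i stores u(alphas[i])

# Encoding: each server receives one evaluation of u.
shares = [lagrange_eval(betas, data, a) for a in alphas]

# Each server applies f locally; f(u(z)) has degree deg(f) * (K - 1) = 4.
answers = [f(s) for s in shares]
threshold = 2 * (len(data) - 1) + 1    # 5 evaluations determine f(u(z))

# Suppose servers 1 and 4 straggle: any 5 of the 7 answers still suffice.
alive = [0, 2, 3, 5, 6]
xs = [alphas[i] for i in alive]
ys = [answers[i] for i in alive]

# Interpolate f(u(z)) from the surviving answers and read it off at the betas.
recovered = [lagrange_eval(xs, ys, b) for b in betas]
print(recovered == [f(x) for x in data])
```

The design point this illustrates is that the number of answers needed grows with the degree of `f` times `K - 1`, which is why higher-degree polynomials require more servers for the same straggler tolerance.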
On the Asymptotic Capacity of $X$-Secure $T$-Private Information Retrieval with Graph Based Replicated Storage
The problem of private information retrieval with graph-based replicated
storage was recently introduced by Raviv, Tamo and Yaakobi. Its capacity
remains open in almost all cases. In this work the asymptotic (large number of
messages) capacity of this problem is studied along with its generalizations to
include arbitrary $T$-privacy and $X$-security constraints, where the privacy
of the user must be protected against any set of up to $T$ colluding servers
and the security of the stored data must be protected against any set of up to
$X$ colluding servers. A general achievable scheme for arbitrary storage
patterns is presented that achieves the rate $(\rho_{\min} - X - T)/N$, where
$N$ is the total number of servers, and each message is replicated at least
$\rho_{\min}$ times. Notably, the scheme makes use of a special structure
inspired by dual Generalized Reed Solomon (GRS) codes. A general converse is
also presented. The two bounds are shown to match for many settings, including
symmetric storage patterns. Finally, the asymptotic capacity is fully
characterized for the case without security constraints ($X = 0$) for arbitrary
storage patterns provided that each message is replicated no more than $T + 2$
times. As an example of this result, consider PIR with arbitrary graph based
storage ($T = 1$, $X = 0$) where every message is replicated at exactly $3$ servers.
For this $3$-replicated storage setting, the asymptotic capacity is equal to
$2/\nu_2(G)$, where $\nu_2(G)$ is the maximum size of a $2$-matching in a
storage graph $G[V, E]$. In this undirected graph, the vertices $V$ correspond
to the set of servers, and there is an edge $uv \in E$ between vertices $u, v$
only if a subset of messages is replicated at both servers $u$ and $v$.
Coding against stragglers in distributed computation scenarios
Data and analytics capabilities have made a leap forward in recent years. The volume of available data has grown exponentially, and this data must be transferred and stored with extremely high reliability. Coded computing, a distributed computing paradigm that uses coding theory to inject and exploit data and computation redundancy in distributed computing systems, mitigates the fundamental performance bottlenecks of large-scale data analytics.
In this dissertation, a distributed computing framework is studied first for input files stored in a distributed fashion on the uplink of a cloud radio access network architecture. The focus is on decoding at the cloud, which takes place via network function virtualization on commercial off-the-shelf servers. In order to mitigate the impact of straggling decoders in this platform, a novel coding strategy is proposed, whereby the cloud re-encodes the received frames via a linear code before distributing them to the decoding processors. Transmission of a single frame is considered first, and upper bounds on the resulting frame unavailability probability as a function of the decoding latency are derived by assuming a binary symmetric channel for uplink communications. Then, the analysis is extended to account for random frame arrival times. In this case, the trade-off between the average decoding latency and the frame error rate is studied for two different queuing policies, whereby the servers carry out per-frame decoding or continuous decoding, respectively. Numerical examples demonstrate that the bounds are useful tools for code design and that coding is instrumental in obtaining a desirable compromise between decoding latency and reliability.
In the second part of this dissertation, large matrix multiplications, which are central to large-scale machine learning applications, are considered. These operations are often carried out on a distributed computing platform with a master server and multiple workers in the cloud operating in parallel. For such distributed platforms, it has been recently shown that coding over the input data matrices can reduce the computational delay, yielding a trade-off between the recovery threshold, i.e., the number of workers required to recover the matrix product, and the communication load, i.e., the total amount of data to be downloaded from the workers. In addition to exact recovery requirements, security and privacy constraints on the data matrices are imposed, and the recovery threshold as a function of the communication load is studied. First, it is assumed that both matrices contain private information and that workers can collude to eavesdrop on the content of these data matrices. For this problem, a novel class of secure codes is introduced, referred to as secure generalized PolyDot codes, that generalize state-of-the-art non-secure codes for matrix multiplication. Secure generalized PolyDot codes allow a flexible trade-off between recovery threshold and communication load for a fixed maximum number of colluding workers while providing perfect secrecy for the two data matrices. Then, a connection between secure matrix multiplication and private information retrieval is studied. It is assumed that one of the data matrices is taken from a public set known to all the workers. In this setup, the identity of the matrix of interest should be kept private from the workers. For this model, a variant of generalized PolyDot codes is presented that can guarantee both secrecy of one matrix and privacy for the identity of the other matrix for the case of no colluding servers.
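The notion of a recovery threshold can be illustrated with a much simpler code than the PolyDot family discussed above. The sketch below is an assumption-laden toy, not the dissertation's construction: it splits A into two row blocks, gives each worker one coded combination A1 + x * A2, and shows that any 2 of 3 workers suffice to recover A * B. It provides no secrecy or privacy. The matrices, the evaluation points, and the helper names `matmul`, `encode_rows`, and `decode` are all hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def encode_rows(A1, A2, x):
    """Coded block A1 + x * A2 handed to the worker with evaluation point x."""
    return [[a1 + x * a2 for a1, a2 in zip(r1, r2)] for r1, r2 in zip(A1, A2)]

def decode(x1, Y1, x2, Y2):
    """Recover [A1 B; A2 B] from two results Y_i = A1 B + x_i * A2 B."""
    A2B = [[Fraction(y2 - y1, x2 - x1) for y1, y2 in zip(r1, r2)]
           for r1, r2 in zip(Y1, Y2)]
    A1B = [[y1 - x1 * c for y1, c in zip(r1, rc)] for r1, rc in zip(Y1, A2B)]
    return A1B + A2B  # stack the two row blocks

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0, 2], [0, 1, 3]]
A1, A2 = A[:2], A[2:]          # two row blocks of A

points = [1, 2, 3]             # one distinct evaluation point per worker
results = {x: matmul(encode_rows(A1, A2, x), B) for x in points}

# Suppose the worker at x = 2 straggles: the other two results are enough,
# so the recovery threshold of this toy code is 2 out of 3 workers.
C = decode(1, results[1], 3, results[3])
print(C == matmul(A, B))
```

Each worker here returns half of the rows of the product, so lowering the recovery threshold further would require larger per-worker outputs; this is the kind of threshold versus communication trade-off the abstract refers to.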
The Asymptotic Capacity of $X$-Secure $T$-Private Linear Computation with Graph Based Replicated Storage
The problem of $X$-secure $T$-private linear computation with graph based
replicated storage (GXSTPLC) is to enable the user to retrieve a linear
combination of messages privately from a set of distributed servers, where
each message may be stored only among a subset of servers subject to an
$X$-security constraint, i.e., any group of up to $X$ colluding servers must
learn nothing about the messages. In addition, any group of up to $T$ servers
cannot learn anything about the coefficients of the linear combination
retrieved by the user. In this work, we completely characterize the asymptotic
capacity of GXSTPLC, i.e., the supremum of average number of desired symbols
retrieved per downloaded symbol, in the limit as the number of messages
approaches infinity. Specifically, it is shown that a prior linear programming
based upper bound on the asymptotic capacity of GXSTPLC due to Jia and Jafar is
tight by constructing achievability schemes. Notably, our achievability scheme
also settles the exact capacity (i.e., for a finite number of messages) of $X$-secure linear
combination with graph based replicated storage (GXSLC). Our achievability
proof builds upon an achievability scheme for a closely related problem named
asymmetric $X$-secure $T$-private linear computation with
graph based replicated storage (Asymm-GXSTPLC) that guarantees non-uniform
security and privacy levels across messages and coefficients. In particular, by
carefully designing Asymm-GXSTPLC settings for GXSTPLC problems, the
corresponding Asymm-GXSTPLC schemes can be reduced to asymptotic capacity
achieving schemes for GXSTPLC. In regard to the achievability scheme for
Asymm-GXSTPLC, interesting aspects of our construction include a novel query
and answer design which makes use of a Vandermonde decomposition of Cauchy
matrices, and a trade-off among message replication, security and privacy
thresholds. Comment: 39 pages, 2 figures.
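The security constraint described above, under which a bounded group of colluding servers must learn nothing about a stored message, is commonly realized with Shamir-style secret sharing. The sketch below shows only that generic building block, not the scheme of this paper: a secret is hidden in the constant term of a random polynomial of degree `threshold`, so any `threshold` shares reveal nothing while any `threshold + 1` shares reconstruct it. The field size `P`, the parameter names, and the seeded `random.Random` generator are assumptions for illustration; a real deployment would need a cryptographically secure source of randomness.

```python
import random

P = 2**31 - 1  # assumed prime field size (a Mersenne prime); needs Python 3.8+

def share(secret, threshold, n, rng):
    """Split `secret` into n shares; any `threshold` shares reveal nothing about it."""
    # Random polynomial of degree `threshold` with the secret as constant term.
    coeffs = [secret] + [rng.randrange(P) for _ in range(threshold)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(points):
    """Lagrange interpolation at 0 from at least threshold + 1 shares."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

rng = random.Random(0)  # NOT cryptographically secure; used here for repeatability
shares = share(123456, threshold=2, n=5, rng=rng)

# Any 3 = threshold + 1 of the 5 shares recover the secret.
print(reconstruct(shares[:3]), reconstruct(shares[2:5]))
```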