138 research outputs found
PIR Codes with Short Block Length
In this work private information retrieval (PIR) codes are studied. In a
-PIR code, information bits are encoded in such a way that every
information bit has mutually disjoint recovery sets. The main problem under
this paradigm is to minimize the number of encoded bits given the values of
and , where this value is denoted by . The main focus of this work
is to analyze for a large range of parameters of and . In
particular, we improve upon several of the existing results on this value.Comment: 10 pages, 1 tabl
Providing Private and Fast Data Access for Cloud Systems
Cloud storage and computing systems have become the backbone of many applications such as streaming (Netflix, YouTube), storage (Dropbox, Google Drive), and computing (Amazon Elastic Computing, Microsoft Azure). To address the ever growing demand for storage and computing requirements of these applications, cloud services are typically im-plemented over a large-scale distributed data storage system. Cloud systems are expected to provide the following two pivotal services for the users: 1) private content access and 2) fast content access. The goal of this thesis is to understand and address some of the challenges that need to be overcome to provide these two services.
The first part of this thesis focuses on private data access in distributed systems. In particular, we contribute to the areas of Private Information Retrieval (PIR) and Private Computation (PC). In the PIR problem, there is a user who wishes to privately retrieve a subset of files belonging to a database stored on a single or multiple remote server(s). In the PC problem, the user wants to privately compute functions of a subset of files in the database. The PIR and PC problems seek the most efficient solutions with the minimum download cost that enable the user to retrieve or compute what it wants privately.
We establish fundamental bounds on the minimum download cost required for guaran-teeing the privacy requirement in some practical and realistic settings of the PIR and PC problems and develop novel and efficient privacy-preserving algorithms for these settings. In particular, we study the single-server and multi-server settings of PIR in which the user initially has a random linear combination of a subset of files in the database as side in-formation, referred to as PIR with coded side information. We also study the multi-server setting of the PC in which the user wants to privately compute multiple linear combinations of a subset of files in the database, referred to as Private Linear Transformation.
The second part of this thesis focuses on fast content access in distributed systems. In particular, we study the use of erasure coding to handle data access requests in distributed storage and computing systems. Service rate region is an important performance metric for coded distributed systems, which expresses the set of all data access request rates that can be simultaneously served by the system. In this context, two classes of problems arise: 1) characterizing the service rate region of a given storage scheme and finding the optimal request allocation, and 2) designing the underlying erasure code to handle a given desired service rate region.
As contributions along the first class of problems, we characterize the service rate region of systems with some common coding schemes such as Simplex codes and Reed-Muller codes by introducing two novel techniques: 1) fractional matching and vertex cover on graph representation of codes, and 2) geometric representations of codes. Moreover, along the second class of code design, we establish some lower bounds on the minimum storage required to handle a desired service rate region for a coded distributed system and in some regimes, we design efficient storage schemes that provide the desired service rate region while minimizing the storage requirements
Bounds and Constructions for Generalized Batch Codes
Private information retrieval (PIR) codes and batch codes are two important
types of codes that are designed for coded distributed storage systems and
private information retrieval protocols. These codes have been the focus of
much attention in recent years, as they enable efficient and secure storage and
retrieval of data in distributed systems.
In this paper, we introduce a new class of codes called \emph{-batch
codes}. These codes are a type of storage codes that can handle any multi-set
of requests, comprised of distinct information symbols. Importantly,
PIR codes and batch codes are special cases of -batch codes.
The main goal of this paper is to explore the relationship between the number
of redundancy symbols and the -batch code property. Specifically, we
establish a lower bound on the number of redundancy symbols required and
present several constructions of -batch codes. Furthermore, we extend
this property to the case where each request is a linear combination of
information symbols, which we refer to as \emph{functional -batch
codes}. Specifically, we demonstrate that simplex codes are asymptotically
optimal functional -batch codes, in terms of the number of redundancy
symbols required, under certain parameter regime.Comment: 25 page
How proofs are prepared at Camelot
We study a design framework for robust, independently verifiable, and
workload-balanced distributed algorithms working on a common input. An
algorithm based on the framework is essentially a distributed encoding
procedure for a Reed--Solomon code, which enables (a) robustness against
byzantine failures with intrinsic error-correction and identification of failed
nodes, and (b) independent randomized verification to check the entire
computation for correctness, which takes essentially no more resources than
each node individually contributes to the computation. The framework builds on
recent Merlin--Arthur proofs of batch evaluation of Williams~[{\em Electron.\
Colloq.\ Comput.\ Complexity}, Report TR16-002, January 2016] with the
observation that {\em Merlin's magic is not needed} for batch evaluation---mere
Knights can prepare the proof, in parallel, and with intrinsic
error-correction.
The contribution of this paper is to show that in many cases the verifiable
batch evaluation framework admits algorithms that match in total resource
consumption the best known sequential algorithm for solving the problem. As our
main result, we show that the -cliques in an -vertex graph can be counted
{\em and} verified in per-node time and space on
compute nodes, for any constant and
positive integer divisible by , where is the
exponent of matrix multiplication. This matches in total running time the best
known sequential algorithm, due to Ne{\v{s}}et{\v{r}}il and Poljak [{\em
Comment.~Math.~Univ.~Carolin.}~26 (1985) 415--419], and considerably improves
its space usage and parallelizability. Further results include novel algorithms
for counting triangles in sparse graphs, computing the chromatic polynomial of
a graph, and computing the Tutte polynomial of a graph.Comment: 42 p
Algorithmic Solutions for Combinatorial Problems in Resource Management of Manufacturing Environments
This thesis studies the use of heuristic algorithms in a number of combinatorial problems that occur in various resource constrained environments. Such problems occur, for example, in manufacturing, where a restricted number of resources (tools, machines, feeder slots) are needed to perform some operations. Many of these problems turn out to be computationally intractable, and heuristic algorithms are used to provide efficient, yet sub-optimal solutions. The main goal of the present study is to build upon existing methods to create new heuristics that provide improved solutions for some of these problems. All of these problems occur in practice, and one of the motivations of our study was the request for improvements from industrial sources. We approach three different resource constrained problems. The first is the tool switching and loading problem, and occurs especially in the assembly of printed circuit boards. This problem has to be solved when an efficient, yet small primary storage is used to access resources (tools) from a less efficient (but unlimited) secondary storage area. We study various forms of the problem and provide improved heuristics for its solution. Second, the nozzle assignment problem is concerned with selecting a suitable set of vacuum nozzles for the arms of a robotic assembly machine. It turns out that this is a specialized formulation of the MINMAX resource allocation formulation of the apportionment problem and it can be solved efficiently and optimally. We construct an exact algorithm specialized for the nozzle selection and provide a proof of its optimality. Third, the problem of feeder assignment and component tape construction occurs when electronic components are inserted and certain component types cause tape movement delays that can significantly impact the efficiency of printed circuit board assembly. Here, careful selection of component slots in the feeder improves the tape movement speed. We provide a formal proof that this problem is of the same complexity as the turnpike problem (a well studied geometric optimization problem), and provide a heuristic algorithm for this problem.Siirretty Doriast
- …