
Cloud storage and computing systems have become the backbone of many applications such as streaming (Netflix, YouTube), storage (Dropbox, Google Drive), and computing (Amazon Elastic Computing, Microsoft Azure). To address the ever-growing storage and computing demands of these applications, cloud services are typically implemented over large-scale distributed data storage systems. Cloud systems are expected to provide two pivotal services to their users: 1) private content access and 2) fast content access. The goal of this thesis is to understand and address some of the challenges that must be overcome to provide these two services. The first part of this thesis focuses on private data access in distributed systems. In particular, we contribute to the areas of Private Information Retrieval (PIR) and Private Computation (PC). In the PIR problem, a user wishes to privately retrieve a subset of files belonging to a database stored on one or more remote servers. In the PC problem, the user wants to privately compute functions of a subset of files in the database. The PIR and PC problems seek the most efficient solutions, i.e., those with the minimum download cost, that enable the user to retrieve or compute what it wants privately. We establish fundamental bounds on the minimum download cost required to guarantee privacy in several practical and realistic settings of the PIR and PC problems, and we develop novel and efficient privacy-preserving algorithms for these settings. In particular, we study the single-server and multi-server settings of PIR in which the user initially has, as side information, a random linear combination of a subset of files in the database; we refer to this setting as PIR with coded side information.
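To illustrate the basic PIR setting, here is a minimal sketch of the classic two-server XOR scheme of Chor, Goldreich, Kushilevitz, and Sudan. It is a textbook baseline, not one of the thesis's schemes: it assumes two non-colluding servers that each hold a full replica of a database of integer-valued records, and a user with no side information. The function names are hypothetical.

```python
import secrets
from functools import reduce
from typing import List, Tuple


def server_answer(db: List[int], query: List[int]) -> int:
    """Server side: XOR of every record whose query bit is 1 (the server sees only `query`)."""
    return reduce(lambda acc, rec: acc ^ rec, (rec for rec, bit in zip(db, query) if bit), 0)


def make_queries(n: int, i: int) -> Tuple[List[int], List[int]]:
    """User side: a uniformly random subset for server 1, the same subset with index i toggled for server 2."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[i] ^= 1
    return q1, q2


def retrieve(db: List[int], i: int) -> int:
    """The two answers share every record except record i, so XOR-ing them leaves db[i]."""
    q1, q2 = make_queries(len(db), i)
    # In a real deployment each query goes to a different, non-colluding server holding a replica of db.
    return server_answer(db, q1) ^ server_answer(db, q2)
```

Each server's query is, on its own, a uniformly random bit vector whatever the index `i`, so neither server alone learns which record was requested. The user downloads two records and uploads `n` bits to each server.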
We also study the multi-server setting of PC in which the user wants to privately compute multiple linear combinations of a subset of files in the database, referred to as Private Linear Transformation. The second part of this thesis focuses on fast content access in distributed systems. In particular, we study the use of erasure coding to handle data access requests in distributed storage and computing systems. The service rate region is an important performance metric for coded distributed systems: it is the set of all data access request rates that the system can serve simultaneously. In this context, two classes of problems arise: 1) characterizing the service rate region of a given storage scheme and finding the optimal request allocation, and 2) designing the underlying erasure code to support a given desired service rate region. For the first class of problems, we characterize the service rate region of systems using common coding schemes such as Simplex codes and Reed-Muller codes by introducing two novel techniques: 1) fractional matching and vertex cover on a graph representation of codes, and 2) geometric representations of codes. For the second class, code design, we establish lower bounds on the minimum storage required to support a desired service rate region in a coded distributed system and, in some regimes, we design efficient storage schemes that provide the desired service rate region while minimizing the storage requirements.
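A small worked example, which is our own and not taken from the thesis, makes the service rate region concrete. Assume a three-node system storing $a$, $b$, and $a+b$, where each node serves requests at rate $\mu$ and users request only $a$ or $b$. A request for $a$ is served either by node 1 alone or jointly by nodes 2 and 3; symmetrically, a request for $b$ is served by node 2 alone or by nodes 1 and 3. The split rates $x_a, y_a, x_b, y_b \ge 0$ are our own notation.

```latex
\begin{aligned}
&\lambda_a = x_a + y_a, \qquad \lambda_b = x_b + y_b,\\
&x_a + y_b \le \mu \quad (\text{node 1, stores } a),\\
&x_b + y_a \le \mu \quad (\text{node 2, stores } b),\\
&y_a + y_b \le \mu \quad (\text{node 3, stores } a+b),\\
&\lambda_a + \lambda_b = (x_a + y_b) + (x_b + y_a) \le 2\mu
\;\Longrightarrow\;
\mathcal{S} = \{(\lambda_a,\lambda_b) \in \mathbb{R}_{\ge 0}^2 : \lambda_a + \lambda_b \le 2\mu\}.
\end{aligned}
```

The bound is tight. The corner $(2\mu, 0)$ is reached with $x_a = y_a = \mu$, and $(0, 2\mu)$ is reached symmetrically. The set of achievable rate pairs is convex because it is a projection of a polytope, so it contains the whole triangle.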

Texas A&M Repository

Bounds and Constructions for Generalized Batch Codes

Author: Elishco Ohad
Kong Xiangliang
Publication date: 13/09/2023

Private information retrieval (PIR) codes and batch codes are two important types of codes designed for coded distributed storage systems and private information retrieval protocols. These codes have attracted much attention in recent years, as they enable efficient and secure storage and retrieval of data in distributed systems. In this paper, we introduce a new class of codes called $(s,t)$-batch codes. These are storage codes that can handle any multiset of $t$ requests comprised of $s$ distinct information symbols. Importantly, PIR codes and batch codes are special cases of $(s,t)$-batch codes. The main goal of this paper is to explore the relationship between the number of redundancy symbols and the $(s,t)$-batch code property. Specifically, we establish a lower bound on the number of redundancy symbols required and present several constructions of $(s,t)$-batch codes. Furthermore, we extend this property to the case where each request is a linear combination of information symbols, which we refer to as functional $(s,t)$-batch codes. In particular, we demonstrate that simplex codes are asymptotically optimal functional $(s,t)$-batch codes, in terms of the number of redundancy symbols required, in a certain parameter regime. Comment: 25 page
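A brute-force check makes the $(s,t)$-batch property concrete for tiny binary linear codes. This is a minimal sketch, not a construction from the paper. It assumes a code given by its generator-matrix columns, encoded as integer bitmasks over GF(2), and it treats a request for information symbol $i$ as the unit vector $e_i$ (bitmask `1 << i`). It also reads "comprised of $s$ distinct information symbols" as "at most $s$ distinct", so that PIR codes ($s=1$) and batch codes ($s=t$) come out as special cases. A recovery set is a set of coded positions whose columns sum to the requested vector, and a multiset of requests is served when its recovery sets can be chosen pairwise disjoint. All function names are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Optional


def subsets_by_xor(columns: List[int]) -> Dict[int, List[int]]:
    """Map each vector (bitmask) to every non-empty set of coded positions whose columns XOR to it."""
    n = len(columns)
    table: Dict[int, List[int]] = {}
    for subset in range(1, 1 << n):  # subset is a bitmask over coded positions
        acc = 0
        for pos in range(n):
            if (subset >> pos) & 1:
                acc ^= columns[pos]  # GF(2) sum of the chosen generator columns
        table.setdefault(acc, []).append(subset)
    return table


def find_recovery_sets(table: Dict[int, List[int]], requests: List[int]) -> Optional[List[int]]:
    """Exhaustive backtracking: pairwise-disjoint recovery sets, one per request, or None."""
    def search(i: int, used: int) -> Optional[List[int]]:
        if i == len(requests):
            return []
        for subset in table.get(requests[i], []):
            if (subset & used) == 0:  # no coded symbol may be read twice
                rest = search(i + 1, used | subset)
                if rest is not None:
                    return [subset] + rest
        return None

    return search(0, 0)


def is_st_batch(columns: List[int], k: int, s: int, t: int) -> bool:
    """True if every multiset of t requests over k symbols, with at most s distinct symbols, is servable."""
    table = subsets_by_xor(columns)
    for multiset in combinations_with_replacement(range(k), t):
        if len(set(multiset)) > s:
            continue
        if find_recovery_sets(table, [1 << i for i in multiset]) is None:
            return False
    return True
```

Take the $[7,3]$ simplex code, whose columns are all seven nonzero vectors of length 3. For this code, `is_st_batch(list(range(1, 8)), 3, 1, 4)` holds, because each unit vector is recovered by itself and by three disjoint pairs. Five requests for one symbol cannot be served: one singleton plus four pairs would need nine positions. Passing arbitrary nonzero masks to `find_recovery_sets` gives the functional variant. The search is exponential in the code length, so it only suits toy examples.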

arXiv.org e-Print Archive

How proofs are prepared at Camelot

Author: Freivalds R.
Gao S.
Nešetřil J.
Publication date: 01/01/2016

We study a design framework for robust, independently verifiable, and workload-balanced distributed algorithms working on a common input. An algorithm based on the framework is essentially a distributed encoding procedure for a Reed–Solomon code, which enables (a) robustness against Byzantine failures with intrinsic error correction and identification of failed nodes, and (b) independent randomized verification to check the entire computation for correctness, which takes essentially no more resources than each node individually contributes to the computation. The framework builds on recent Merlin–Arthur proofs of batch evaluation of Williams [Electron. Colloq. Comput. Complexity, Report TR16-002, January 2016] with the observation that Merlin's magic is not needed for batch evaluation: mere Knights can prepare the proof, in parallel, and with intrinsic error correction. The contribution of this paper is to show that in many cases the verifiable batch evaluation framework admits algorithms that match in total resource consumption the best known sequential algorithm for solving the problem. As our main result, we show that the $k$-cliques in an $n$-vertex graph can be counted and verified in per-node $O(n^{(\omega+\epsilon)k/6})$ time and space on $O(n^{(\omega+\epsilon)k/6})$ compute nodes, for any constant $\epsilon>0$ and positive integer $k$ divisible by $6$, where $2\leq\omega<2.3728639$ is the exponent of matrix multiplication. This matches in total running time the best known sequential algorithm, due to Nešetřil and Poljak [Comment. Math. Univ. Carolin. 26 (1985) 415–419], and considerably improves its space usage and parallelizability. Further results include novel algorithms for counting triangles in sparse graphs, computing the chromatic polynomial of a graph, and computing the Tutte polynomial of a graph. Comment: 42 p
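The batch-evaluation idea behind the framework can be illustrated with a toy sketch. The setup is our own and not taken from the paper. To evaluate a polynomial $f$ on inputs $a_1,\dots,a_K$, let $A(z)$ be the polynomial of degree at most $K-1$ with $A(i)=a_i$. Then $Q(z)=f(A(z))$ has degree at most $\deg f\cdot(K-1)$ and satisfies $Q(i)=f(a_i)$. If node $j$ computes $Q(j)$, the vector of node outputs is a Reed–Solomon codeword, so extra nodes provide redundancy, and a verifier can spot-check $Q(r)=f(A(r))$ at a random field element $r$. The prime $2^{31}-1$, the function `f`, the evaluation points $1,\dots,N$, and all helper names are hypothetical. This is not the paper's $k$-clique algorithm.

```python
import random
from typing import List

P = 2_147_483_647  # the Mersenne prime 2^31 - 1; an assumed field size


def lagrange_eval(xs: List[int], ys: List[int], r: int) -> int:
    """Value at r of the unique polynomial of degree < len(xs) through (xs[i], ys[i]) mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (r - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # modular inverse needs Python 3.8+
    return total


def f(a: int) -> int:
    """Hypothetical degree-2 function to batch-evaluate."""
    return (a * a + 3 * a + 1) % P


DEG_F = 2


def node_output(j: int, inputs: List[int]) -> int:
    """Node j evaluates Q(z) = f(A(z)) at z = j, where A interpolates A(i) = inputs[i-1]."""
    points = list(range(1, len(inputs) + 1))
    return f(lagrange_eval(points, inputs, j))


def verify(inputs: List[int], outputs: List[int], rng: random.Random) -> bool:
    """Check that node outputs form a Reed-Solomon codeword, then spot-check Q at a random point."""
    degree = DEG_F * (len(inputs) - 1)  # deg Q <= deg f * deg A
    xs = list(range(1, len(outputs) + 1))
    base_x, base_y = xs[: degree + 1], outputs[: degree + 1]
    # Every redundant output must lie on the polynomial through the first degree+1 outputs.
    for x, y in zip(xs[degree + 1 :], outputs[degree + 1 :]):
        if lagrange_eval(base_x, base_y, x) != y:
            return False
    r = rng.randrange(P)
    points = list(range(1, len(inputs) + 1))
    return lagrange_eval(base_x, base_y, r) == f(lagrange_eval(points, inputs, r))
```

Consider four inputs and eight nodes, one more than the seven evaluations that determine a polynomial of degree 6. Changing any single node output then makes `verify` return `False`, since the outputs no longer form a codeword. The first four outputs are exactly $f(a_1),\dots,f(a_4)$. The sketch only detects faults. Identifying the failed nodes, as the abstract describes, would additionally need a Reed–Solomon decoder such as Berlekamp–Welch.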

arXiv.org e-Print Archive

Lund University Publications

Crossref

Proceedings of the 17th Cologne-Twente Workshop on Graphs and Combinatorial Optimization

Publication venue: University Library/University of Twente
Publication date: 01/01/2019

University of Twente Research Information

Algorithmic Solutions for Combinatorial Problems in Resource Management of Manufacturing Environments

Author: Ráduly-Baka Csaba
Publication venue: Turku Centre for Computer Science
Publication date: 07/10/2011

This thesis studies the use of heuristic algorithms for a number of combinatorial problems that arise in various resource-constrained environments. Such problems occur, for example, in manufacturing, where a limited number of resources (tools, machines, feeder slots) is needed to perform certain operations. Many of these problems turn out to be computationally intractable, and heuristic algorithms are used to provide efficient, yet sub-optimal, solutions. The main goal of the present study is to build upon existing methods to create new heuristics that provide improved solutions for some of these problems. All of these problems occur in practice, and one of the motivations for our study was requests for improvements from industrial sources. We address three different resource-constrained problems. The first is the tool switching and loading problem, which occurs especially in the assembly of printed circuit boards. This problem has to be solved when an efficient but small primary storage is used to access resources (tools) from a less efficient (but unlimited) secondary storage area. We study various forms of the problem and provide improved heuristics for its solution. Second, the nozzle assignment problem is concerned with selecting a suitable set of vacuum nozzles for the arms of a robotic assembly machine. It turns out that this is a specialized formulation of the MINMAX resource allocation formulation of the apportionment problem, and it can be solved efficiently and optimally. We construct an exact algorithm specialized for nozzle selection and provide a proof of its optimality. Third, the problem of feeder assignment and component tape construction occurs when electronic components are inserted and certain component types cause tape movement delays that can significantly impact the efficiency of printed circuit board assembly. Here, careful selection of component slots in the feeder improves the tape movement speed.
We provide a formal proof that this problem is of the same complexity as the turnpike problem (a well-studied geometric optimization problem), and we provide a heuristic algorithm for this problem.
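For the tool switching problem with a fixed job sequence, a classic exact loading policy is Keep Tool Needed Soonest (KTNS), due to Tang and Denardo. On each forced removal, KTNS evicts the loaded tool whose next use lies farthest in the future. The sketch below is that textbook policy, not one of the thesis's heuristics. It assumes unit-size tools, a fixed job order, and a switch counted as each tool removal (filling an empty slot is free). The function names are hypothetical.

```python
from typing import List, Set


def _next_use(tool: str, jobs: List[Set[str]], start: int) -> int:
    """Index of the first job at or after `start` that needs `tool`; len(jobs) if none does."""
    for k in range(start, len(jobs)):
        if tool in jobs[k]:
            return k
    return len(jobs)


def ktns_switches(jobs: List[Set[str]], capacity: int) -> int:
    """Count tool removals when jobs run in the given order under the KTNS policy."""
    magazine: Set[str] = set()
    switches = 0
    for idx, needed in enumerate(jobs):
        if len(needed) > capacity:
            raise ValueError(f"job {idx} needs {len(needed)} tools but the magazine holds {capacity}")
        for tool in needed - magazine:
            if len(magazine) == capacity:
                # Evict a loaded tool this job does not need, choosing the one used farthest in the future.
                victim = max(magazine - needed, key=lambda t: _next_use(t, jobs, idx + 1))
                magazine.remove(victim)
                switches += 1
            magazine.add(tool)  # loading into an empty slot is not counted as a switch
    return switches
```

For example, `ktns_switches([{"a"}, {"b"}, {"c"}, {"a"}, {"b"}], 2)` returns 2. When `c` arrives, KTNS evicts `b` (next needed at job 4) rather than `a` (needed at job 3).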

UTUPub