60 research outputs found

    Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

    Get PDF
    We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to 13.43×13.43\times, and also achieves a 2.36×2.36\times-12.65×12.65\times speedup over the state-of-the-art straggler mitigation strategies

    Randomized Polar Codes for Anytime Distributed Machine Learning

    Full text link
    We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations. The proposed mechanism integrates the concepts of randomized sketching and polar codes in the context of coded computation. We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery. Additionally, we provide an anytime estimator that can generate provably accurate estimates even when the set of available node outputs is not decodable. We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization. We present the implementation of these methods on a serverless cloud computing system and provide numerical results to demonstrate their scalability in practice, including ImageNet scale computations

    On the Existence of Optimal Exact-Repair MDS Codes for Distributed Storage

    Full text link
    The high repair cost of (n,k) Maximum Distance Separable (MDS) erasure codes has recently motivated a new class of codes, called Regenerating Codes, that optimally trade off storage cost for repair bandwidth. In this paper, we address bandwidth-optimal (n,k,d) Exact-Repair MDS codes, which allow for any failed node to be repaired exactly with access to arbitrary d survivor nodes, where k<=d<=n-1. We show the existence of Exact-Repair MDS codes that achieve minimum repair bandwidth (matching the cutset lower bound) for arbitrary admissible (n,k,d), i.e., k<n and k<=d<=n-1. Our approach is based on interference alignment techniques and uses vector linear codes which allow to split symbols into arbitrarily small subsymbols.Comment: 20 pages, 6 figure

    Secure and Reliable Data Outsourcing in Cloud Computing

    Get PDF
    The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging which are dedicated to offering more accessible and user friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm on cloud computing, however, concerns on data security with cloud data storage are arising in terms of reliability and privacy which raise as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee the data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform the public integrity verification, data owners are significantly released from the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics - keyword search and graph query. For protecting data privacy, sensitive data has to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. We define and solve the challenging problem of privacy-preserving multi- keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate some further enhancements of our ranked search mechanism, including supporting more search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as the personal social network, the relational database, XML documents and chemical compounds. In the case that these data contains sensitive information and need to be encrypted before outsourcing to the cloud, it is a very challenging task to effectively utilize such graph-structured data after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure

    Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage

    Get PDF
    Erasure codes are typically used in large-scale distributed storage systems to provide durability of data in the face of failures. In this setting, a set of k blocks to be stored is encoded using an [n, k] code to generate n blocks that are then stored on different storage nodes. A recent work by Kadekodi et al. [Kadekodi et al., 2019] shows that the failure rate of storage devices vary significantly over time, and that changing the rate of the code (via a change in the parameters n and k) in response to such variations provides significant reduction in storage space requirement. However, the resource overhead of realizing such a change in the code rate on already encoded data in traditional codes is prohibitively high. Motivated by this application, in this work we first present a new framework to formalize the notion of code conversion - the process of converting data encoded with an [n^I, k^I] code into data encoded with an [n^F, k^F] code while maintaining desired decodability properties, such as the maximum-distance-separable (MDS) property. We then introduce convertible codes, a new class of code pairs that allow for code conversions in a resource-efficient manner. For an important parameter regime (which we call the merge regime) along with the widely used linearity and MDS decodability constraint, we prove tight bounds on the number of nodes accessed during code conversion. In particular, our achievability result is an explicit construction of MDS convertible codes that are optimal for all parameter values in the merge regime albeit with a high field size. We then present explicit low-field-size constructions of optimal MDS convertible codes for a broad range of parameters in the merge regime. Our results thus show that it is indeed possible to achieve code conversions with significantly lesser resources as compared to the default approach of re-encoding

    Instantly Decodable Network Coding: From Centralized to Device-to-Device Communications

    Get PDF
    From its introduction to its quindecennial, network coding has built a strong reputation for enhancing packet recovery and achieving maximum information flow in both wired and wireless networks. Traditional studies focused on optimizing the throughput of the system by proposing elaborate schemes able to reach the network capacity. With the shift toward distributed computing on mobile devices, performance and complexity become both critical factors that affect the efficiency of a coding strategy. Instantly decodable network coding presents itself as a new paradigm in network coding that trades off these two aspects. This paper review instantly decodable network coding schemes by identifying, categorizing, and evaluating various algorithms proposed in the literature. The first part of the manuscript investigates the conventional centralized systems, in which all decisions are carried out by a central unit, e.g., a base-station. In particular, two successful approaches known as the strict and generalized instantly decodable network are compared in terms of reliability, performance, complexity, and packet selection methodology. The second part considers the use of instantly decodable codes in a device-to-device communication network, in which devices speed up the recovery of the missing packets by exchanging network coded packets. Although the performance improvements are directly proportional to the computational complexity increases, numerous successful schemes from both the performance and complexity viewpoints are identified

    Random coding for sharing bosonic quantum secrets

    Get PDF
    We consider a protocol for sharing quantum states using continuous variable systems. Specifically we introduce an encoding procedure where bosonic modes in arbitrary secret states are mixed with several ancillary squeezed modes through a passive interferometer. We derive simple conditions on the interferometer for this encoding to define a secret sharing protocol and we prove that they are satisfied by almost any interferometer. This implies that, if the interferometer is chosen uniformly at random, the probability that it may not be used to implement a quantum secret sharing protocol is zero. Furthermore, we show that the decoding operation can be obtained and implemented efficiently with a Gaussian unitary using a number of single-mode squeezers that is at most twice the number of modes of the secret, regardless of the number of players. We benchmark the quality of the reconstructed state by computing the fidelity with the secret state as a function of the input squeezing.Comment: Updated figure 1, added figure 2, closer to published versio
    corecore