8 research outputs found

    Coding Schemes for Distributed Storage Systems

    This thesis is devoted to problems in error-correcting codes motivated by data integrity problems arising in large-scale distributed storage systems. We study properties and constructions of Maximum Distance Separable (MDS) codes, which are widely used in storage applications since they provide the maximum failure tolerance for a given amount of storage overhead. Among the parameters of the code that are important for storage applications are the amount of data transferred in the system during node repair (the repair bandwidth), which characterizes the network usage, and the volume of accessed data, which corresponds to the number of disk I/O operations. Therefore, recent research on MDS codes for distributed storage has focused on codes that can minimize these two quantities. A lower bound on the repair bandwidth of a code, called the cut-set bound, was proved by Dimakis et al. in 2010, and codes that attain this bound are said to have the optimal repair property. Explicit optimal-repair low-rate (rate ≤ 1/2) MDS codes were constructed by Rashmi et al. in 2011. At the same time, large-scale distributed systems such as the Google File System and Hadoop Distributed File System employ high-rate (rate > 1/2) MDS codes because they need to reduce storage overhead. Until recently, except for some particular cases, no general explicit constructions of high-rate optimal-repair MDS codes were known. In this thesis, we present the first explicit constructions of optimal-repair MDS codes, thereby providing a solution to the general construction problem of such codes for the high-rate regime. More specifically, we construct explicit MDS codes that can repair any number of failed nodes from any number of helper nodes with the smallest possible amount of downloaded/accessed data. For the particular case of repairing a single node failure, we further present an explicit family of MDS codes that minimize the amount of accessed data during the repair. This family of codes has the additional favorable property that the node size (the amount of information stored in the node) is also the smallest possible. Reducing the node size directly translates into reducing the complexity of storage systems. While most studies on MDS codes with optimal repair bandwidth focus on array codes, the repair problem of widely used scalar codes such as Reed-Solomon (RS) codes has also recently attracted researchers' attention. It has been an open problem whether scalar linear MDS codes can achieve the cut-set bound. In this thesis, we answer this question in the affirmative by giving explicit constructions of Reed-Solomon codes that can be repaired at the cut-set bound. We also prove a lower bound on the node size of optimally repairable scalar MDS codes, showing that the node size of our RS codes is close to the best possible for scalar linear codes. Finally, we extend the concept of repair bandwidth from erasure correction to error correction, which forms a new problem in coding theory. We prove a bound on the amount of downloaded information for this problem and present explicit code families that attain this bound for a wide range of parameters.
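
    For reference, the cut-set bound named in the abstract above can be written out for the single-failure case. The symbols are notation assumed here rather than taken from the listing: an (n, k) MDS array code stores ℓ symbols per node, one failed node is repaired with the help of d surviving nodes, and B(d) is the total number of symbols downloaded.

```latex
\[
  B(d) \;\ge\; \frac{d\,\ell}{d-k+1}, \qquad k \le d \le n-1 .
\]
% Illustrative numbers (assumed, not from the thesis): n = 14, k = 10, d = 13
% gives B \ge 13\ell/4 = 3.25\,\ell, whereas naive repair that first decodes
% the whole codeword downloads k\ell = 10\,\ell.
```

    The bound decreases as d grows, which is why contacting more helpers and downloading a fraction of each node can cost less than reading k nodes in full.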

    Coding for Security and Reliability in Distributed Systems

    This dissertation studies the use of coding techniques to improve the reliability and security of distributed systems. The first three parts focus on distributed storage systems, and study schemes that encode a message into n shares, assigned to n nodes, such that any n - r nodes can decode the message (reliability) and any z colluding nodes cannot infer any information about the message (security). The objective is to optimize the computational, implementation, communication and access complexity of the schemes during encoding, decoding and repair. These are the key metrics of the schemes, so that when they are applied in practical distributed storage systems, the systems are not only reliable and secure, but also fast and cost-effective. Schemes with highly efficient computation and implementation are studied in Part I. For the practical high-rate case of r ≤ 3 and z ≤ 3, we construct schemes that require only r + z XORs to encode and z XORs to decode each message bit, based on practical erasure codes including the B, EVENODD and STAR codes. This encoding and decoding complexity is shown to be optimal. For general r and z, we design schemes over a special ring from Cauchy matrices and Vandermonde matrices. Both schemes can be efficiently encoded and decoded due to the structure of the ring. We also discuss methods to shorten the proposed schemes. Part II studies schemes that are efficient in terms of communication and access complexity. We derive a lower bound on the decoding bandwidth, and design schemes achieving the optimal decoding bandwidth and access. We then design schemes that achieve the optimal bandwidth and access not only for decoding, but also for repair. Furthermore, we present a family of Shamir's schemes with asymptotically optimal decoding bandwidth. Part III studies the problem of secure repair, i.e., reconstructing the share of a (failed) node without leaking any information about the message. We present generic secure repair protocols that can securely repair any linear scheme. We derive a lower bound on the secure repair bandwidth and show that the proposed protocols are essentially optimal in terms of bandwidth. In the final part of the dissertation, we study the use of coding techniques to improve the reliability and security of network communication. Specifically, in Part IV we draw connections between several important problems in network coding. We present reductions that map an arbitrary multiple-unicast network coding instance to a unicast secure network coding instance in which at most one link is eavesdropped, or a unicast network error correction instance in which at most one link is erroneous, such that a rate tuple is achievable in the multiple-unicast network coding instance if and only if a corresponding rate is achievable in the unicast secure network coding instance, or in the unicast network error correction instance. Conversely, we show that an arbitrary unicast secure network coding instance in which at most one link is eavesdropped can be reduced back to a multiple-unicast network coding instance. Additionally, we show that the capacity of a unicast network error correction instance in general is not (exactly) achievable. We derive upper bounds on the secrecy capacity for the secure network coding problem, based on cut-sets and the connectivity of links. Finally, we study optimal coding schemes for the network error correction problem, in the setting where the network and adversary parameters are not known a priori.
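
    The reliability and security requirements described in the abstract above (enough shares decode the message, too few reveal nothing) can be illustrated with the classic Shamir threshold scheme, which the abstract mentions by name. What follows is a minimal sketch of textbook Shamir sharing, not the dissertation's XOR-based or bandwidth-optimal constructions. The function names `split` and `recover` and the choice of field size are assumptions made for illustration.

```python
import secrets

# Field size is an illustrative assumption: 2**127 - 1 is a Mersenne prime.
PRIME = 2**127 - 1

def split(secret, n, t):
    """Split `secret` into n shares such that any t of them recover it."""
    if not (1 <= t <= n < PRIME and 0 <= secret < PRIME):
        raise ValueError("invalid parameters")
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover(shares):
    """Lagrange-interpolate the sharing polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

    In the abstract's notation this corresponds to the special case where the decoding threshold n - r equals z + 1: calling `split(m, n, t)` with t = z + 1 gives shares from which any t nodes recover m via `recover`, while any z shares are statistically independent of m.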

    Enhanced Threshold Schemes and their Applications


    ClusterRAID: Architecture and Prototype of a Distributed Fault-Tolerant Mass Storage System for Clusters

    During the past few years, clusters built from commodity off-the-shelf (COTS) components have emerged as the predominant supercomputer architecture. Typically comprising a collection of standard PCs or workstations and an interconnection network, they have replaced the traditionally used integrated systems due to their better price/performance ratio. As paradigms shift from purely compute-intensive to I/O-intensive applications, mass storage solutions for cluster installations become an increasingly crucial aspect of these systems. The inherent unreliability of the underlying components is one of the reasons why no system has yet been established as a standard storage solution for clusters. This thesis sets out the architecture and prototype implementation of a novel distributed mass storage system for commodity off-the-shelf clusters and addresses the issue of the unreliable constituent components. The key concept of the presented system is the conversion of the local hard disk drive of a cluster node into a reliable device while preserving the block device interface. By deploying sophisticated erasure-correcting codes, the system allows the number of tolerable failures, and thus the overall reliability, to be adjusted. In addition, the applied data layout considers the access behaviour of a broad range of applications and minimizes the number of required network transactions. Extensive measurements and functionality tests of the prototype, both stand-alone and in conjunction with local or distributed file systems, show the validity of the concept.
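
    The idea of erasure-correcting redundancy across nodes can be illustrated with the simplest possible case, a single XOR parity block that tolerates one lost block per stripe. This is only a sketch of the general principle. The abstract does not say which codes ClusterRAID uses, and a system with an adjustable number of tolerable failures needs a more general code than this one. The names `xor_blocks`, `encode` and `repair` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a non-empty list of equal-length byte strings."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """Return the stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def repair(stripe, lost):
    """Rebuild the block at index `lost` from all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)
```

    Because the XOR of all blocks in a stripe is zero, any single missing block, data or parity, equals the XOR of the remaining ones. Tolerating more than one failure requires additional, independent redundancy blocks.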

    Assuming Data Integrity and Empirical Evidence to The Contrary

    Background: Not all respondents to surveys apply their minds or understand the posed questions, and as such they provide answers which lack coherence, which threatens the integrity of the research. Casual inspection and limited research of the 10-item Big Five Inventory (BFI-10), included in the dataset of the World Values Survey (WVS), suggested that random responses may be common. Objective: To specify the percentage of cases in the BFI-10 which include incoherent or contradictory responses and to test the extent to which the removal of these cases will improve the quality of the dataset. Method: The WVS data on the BFI-10, measuring the Big Five Personality (B5P), in South Africa (N=3 531), were used. Incoherent or contradictory responses were removed. Then the cases from the cleaned-up dataset were analysed for their theoretical validity. Results: Only 1 612 (45.7%) cases were identified as not including incoherent or contradictory responses. The cleaned-up data did not mirror the B5P structure, as was envisaged. The test for common method bias was negative. Conclusion: In most cases the responses were incoherent. Cleaning up the data did not improve the psychometric properties of the BFI-10. This raises concerns about the quality of the WVS data, the BFI-10, and the universality of B5P theory. Given these results, it would be unwise to use the BFI-10 in South Africa. Researchers are alerted to do a proper assessment of the psychometric properties of instruments before they use them, particularly in a cross-cultural setting.

    Leading Towards Voice and Innovation: The Role of Psychological Contract

    Background: Empirical evidence generally suggests that psychological contract breach (PCB) leads to negative outcomes. However, some literature argues that, occasionally, PCB leads to positive outcomes. Aim: To empirically determine when these positive outcomes occur, focusing on the role of psychological contract (PC) and leadership style (LS), and outcomes such as employee voice (EV) and innovative work behaviour (IWB). Method: A cross-sectional survey design was adopted, using reputable questionnaires on PC, PCB, EV, IWB, and leadership styles. Correlation analyses were used to test direct links within the model, while regression analyses were used to test for the moderation effects. Results: Data with acceptable psychometric properties were collected from 11 organisations (N=620). The results revealed that PCB does not lead to substantial changes in IWB. PCB correlated positively with prohibitive EV, but did not influence promotive EV, which was a significant driver of IWB. Leadership styles were weak predictors of EV and IWB, and LS only partially moderated the PCB-EV relationship. Conclusion: PCB did not lead to positive outcomes. Nor did LS influence the relationships between PCB and EV or IWB. Further, LS only partially influenced the relationships between variables, and not in a manner which positively influenced IWB.

    Proceedings of the II International Congress on Interdisciplinarity in Social and Human Sciences

    Interdisciplinarity is the main topic and the main goal of this conference. Since the sixteenth century, with the creation of the first Academy of Sciences in Naples (Italy) (1568), and before that with the creation of the Fine Arts Academies, the worlds of science and the arts began to work independently, in contrast to the Academy of Plato in Classical Antiquity, where science, art and sport were interconnected. Over time, specific sciences began to be independent, and the specificity of sciences caused an increased difficulty in mutual understanding. The same trend has affected the Human and Social Sciences. Each of the specific sciences gave rise to a wide range of particular fields. This has the advantage of allowing the deepening of specialised knowledge, but it means that there is often only a piecemeal approach to the research object, not taking into account its overall complexity. So, it is important to work for a better understanding of scientific phenomena through the complementarity of the different sciences, in an interdisciplinary perspective. With this growing specialisation of sciences, Interdisciplinarity acquired more relevance for scientists seeking more encompassing and useful answers to their research questions. CIEO (Research Centre for Spatial and Organizational Dynamics) organises this conference because Interdisciplinarity is an important issue.