7 research outputs found

    RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

    Full text link
    The RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity of small disks increased 100-fold in the 1990s, the production of large disks was discontinued. Storage dependability is increased via replication or erasure coding. Cloud storage providers store multiple copies of data, obviating the need for further redundancy. Variations of RAID based on local recovery codes and partial MDS codes reduce recovery cost. NAND flash Solid State Disks (SSDs) have lower latency and higher bandwidth, are more reliable, consume less power, and have a lower TCO than Hard Disk Drives, making them more viable for hyperscalers. Comment: Submitted to ACM Computing Surveys. arXiv admin note: substantial text overlap with arXiv:2306.0876
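
    As a minimal illustration of the redundancy techniques surveyed in this tutorial, the sketch below rebuilds a lost block from single XOR parity, the scheme underlying RAID-4/5; the block contents and the index of the failed disk are hypothetical.

        # Minimal sketch (hypothetical data): single-parity redundancy as in RAID-4/5.
        # The parity block is the byte-wise XOR of the data blocks, so any single
        # lost block can be rebuilt from the surviving blocks plus parity.
        from functools import reduce

        def xor_blocks(blocks):
            """Byte-wise XOR of equally sized blocks."""
            return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

        data = [b"AAAA", b"BBBB", b"CCCC"]   # hypothetical data blocks, one per disk
        parity = xor_blocks(data)            # stored on the parity disk

        failed = 1                           # assume disk 1 is lost
        survivors = [blk for i, blk in enumerate(data) if i != failed]
        rebuilt = xor_blocks(survivors + [parity])
        assert rebuilt == data[failed]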

    Design and implementation of periodic broadcast video servers

    Get PDF
    Periodic broadcast is an effective paradigm for large-scale dissemination of popular videos. In the periodic broadcast paradigm, a video file is logically partitioned into a number of segments. These segments are periodically broadcast (using multicast) on the server channels. A client tunes into one or more channels at the proper times to download the video segments into the client disk buffer. The client typically switches channels to download subsequent segments while playing out one of the buffered segments. Periodic broadcast guarantees a bounded service delay, equal to the length of time to broadcast the first segment, regardless of the number of concurrent requests, which makes it suitable for popular videos. Considerable research effort has gone into designing periodic broadcast protocols that minimize the server network bandwidth and the client resources. However, only a few implementations of periodic broadcast protocols are available, probably because little has been documented on how the memory and disk bandwidth resources of a periodic broadcast server should be allocated. In this thesis, we present a Generalized Periodic Broadcast Server (GPBS) model that supports any periodic broadcast protocol. Based on this model, we formulate and solve a new optimization problem whose solution provides insights into the server's memory and disk resource allocation. We use our analysis to estimate (i) the effect of keeping some video segments in the server memory during the entire broadcast of the video, and (ii) the effect of data placement on disk in periodic broadcast servers. We also discuss our prototype implementation of GPBS. Our work facilitates future implementation and deployment of many existing periodic broadcast protocols.
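
    As an illustration of the bounded service delay argument, the sketch below computes a geometric segment schedule in the style of pyramid-broadcasting protocols; the function name, channel count, and growth ratio are illustrative assumptions, not the GPBS model from the thesis.

        # Minimal sketch (illustrative, not the thesis's GPBS model): with geometrically
        # increasing segment lengths broadcast on dedicated channels, the worst-case
        # startup delay equals the broadcast period of the first segment, independent
        # of how many clients request the video.
        def segment_lengths(video_len_s: float, n_channels: int, ratio: float = 2.0):
            """Split a video into n segments whose lengths grow by `ratio` per channel."""
            weights = [ratio ** i for i in range(n_channels)]
            unit = video_len_s / sum(weights)
            return [unit * w for w in weights]

        segments = segment_lengths(video_len_s=7200, n_channels=6)  # 2-hour video, 6 channels
        startup_delay = segments[0]   # each channel rebroadcasts its segment back to back
        print([round(s) for s in segments], "worst-case startup delay ~", round(startup_delay), "s")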

    DISK DESIGN-SPACE EXPLORATION IN TERMS OF SYSTEM-LEVEL PERFORMANCE, POWER, AND ENERGY CONSUMPTION

    Get PDF
    To make the common case fast, most studies focus on the computation phase of applications, in which most instructions are executed. However, many programs spend significant time in the I/O-intensive phase due to I/O latency. To obtain a system with more balanced phases, we require greater insight into the effects of the I/O configuration on the entire system, in both the performance and power dissipation domains. Due to the lack of public tools that capture the complete picture of the entire memory hierarchy, we developed SYSim. SYSim is a complete-system simulator aimed at complete memory hierarchy studies in both the performance and power consumption domains. In this dissertation, we used SYSim to investigate the system-level impacts of several disk enhancements and technology improvements on the detailed interaction in the memory hierarchy during the I/O-intensive phase. The experimental results are reported in terms of both total system performance and power/energy consumption. With SYSim, we conducted complete-system experiments and revealed intriguing behaviors including, but not limited to, the following: During the I/O-intensive phase, which consists of both disk reads and writes, the average system CPI tracks only the average disk read response time, and not the overall average disk response time, which is the widely accepted metric in disk drive research. In disk read-dominated applications, disk prefetching is more important than increasing the disk RPM. On the other hand, in applications with both disk reads and writes, the disk RPM matters. The execution time can be improved by an order of magnitude by applying some disk enhancements. Disk caching and prefetching can improve the performance by a factor of 2, and write-buffering can improve the performance by a factor of 10. Moreover, using disk caching/prefetching and write-buffering in conjunction can improve the total system performance by at least an order of magnitude. Increasing the disk RPM and the number of disks in a RAID system also yields an impressive improvement in total system performance. However, employing such techniques requires careful consideration of trade-offs in power/energy consumption.
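
    These system-level findings lend themselves to a back-of-envelope calculation; the sketch below is a toy two-phase model using an assumed compute/I/O split and the 2x and 10x factors quoted in the abstract, illustrating why the overall gain depends on how I/O-bound the workload is.

        # Toy two-phase model (assumed phase split; speedup factors quoted from the
        # abstract). Only the I/O-intensive phase benefits, so the system-level gain
        # depends on how I/O-bound the workload is.
        def total_time(compute_s: float, io_s: float, io_speedup: float) -> float:
            return compute_s + io_s / io_speedup

        compute_s, io_s = 10.0, 90.0                        # assumed breakdown (seconds)
        baseline = total_time(compute_s, io_s, 1.0)         # 100.0 s
        caching = total_time(compute_s, io_s, 2.0)          # caching + prefetching: 55.0 s
        combined = total_time(compute_s, io_s, 2.0 * 10.0)  # plus write-buffering: 14.5 s
        print(baseline, caching, combined)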

    An erasure-resilient and compute-efficient coding scheme for storage applications

    Get PDF
    Driven by rapid technological advancements, the amount of data that is created, captured, communicated, and stored worldwide has grown exponentially over the past decades. Along with this development, it has become critical for many disciplines of science and business to be able to gather and analyze large amounts of data. The sheer volume of the data often exceeds the capabilities of classical storage systems, with the result that current large-scale storage systems are highly distributed and composed of a large number of individual storage components. As with any other electronic device, the reliability of storage hardware is governed by certain probability distributions, which in turn are influenced by the physical processes used to store the information. The traditional way to deal with the inherent unreliability of combined storage systems is to replicate the data several times. Another popular approach to achieve failure tolerance is to calculate the block-wise parity in one or more dimensions. With better understanding of the different failure modes of storage components, it has become evident that sophisticated high-level error detection and correction techniques are indispensable for the ever-growing distributed systems. The use of powerful cyclic error-correcting codes, however, comes with a high computational penalty, since the required operations over finite fields do not map very well onto current commodity processors. This thesis introduces a versatile coding scheme with fully adjustable fault tolerance that is tailored specifically to modern processor architectures. To reduce stress on the memory subsystem, the conventional table-based algorithm for multiplication over finite fields has been replaced with a polynomial version. This arithmetically intense algorithm is better suited to the wide SIMD units of currently available general purpose processors, but also shows significant benefits when used with modern many-core accelerator devices (for instance the popular general purpose graphics processing units). A CPU implementation using SSE and a GPU version using CUDA are presented. The performance of the multiplication depends on the distribution of the polynomial coefficients in the finite field elements. This property has been used to create suitable matrices that generate a linear systematic erasure-correcting code with significantly increased multiplication performance for the relevant matrix elements. Several approaches to obtaining the optimized generator matrices are elaborated and their implications are discussed. A Monte-Carlo-based construction method makes it possible to influence the specific shape of the generator matrices and thus to adapt them to special storage and archiving workloads. Extensive benchmarks on CPU and GPU demonstrate the superior performance and the future application scenarios of this novel erasure-resilient coding scheme.
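
    To make the table-free approach concrete, here is a minimal sketch of polynomial (shift-and-XOR) multiplication in GF(2^8), reduced by the AES polynomial 0x11B; it only illustrates the kind of arithmetic the thesis vectorizes with SSE/CUDA, not the actual implementation.

        # Minimal sketch (not the thesis's SSE/CUDA code): carry-less polynomial
        # multiplication in GF(2^8) with reduction by x^8 + x^4 + x^3 + x + 1 (0x11B).
        # Replacing log/antilog table lookups with shift/XOR arithmetic like this is
        # what maps well onto wide SIMD units.
        def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
            result = 0
            while b:
                if b & 1:
                    result ^= a      # add (XOR) the current shifted copy of a
                a <<= 1
                if a & 0x100:
                    a ^= poly        # reduce modulo the field polynomial
                b >>= 1
            return result

        assert gf256_mul(0x53, 0xCA) == 0x01   # well-known inverse pair in GF(2^8)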

    Autonomic management of virtualized resources in cloud computing

    Get PDF
    The last five years have witnessed a rapid growth of cloud computing in business, governmental and educational IT deployment. The success of cloud services depends critically on the effective management of virtualized resources. A key requirement of cloud management is the ability to dynamically match resource allocations to actual demands. To this end, we aim to design and implement a cloud resource management mechanism that manages underlying complexity, automates resource provisioning and controls client-perceived quality of service (QoS) while still achieving resource efficiency. The design of automatic resource management centers on two questions: when to adjust resource allocations and how much to adjust. In a cloud, applications have different definitions of capacity, and cloud dynamics make it difficult to determine a static resource-to-performance relationship. In this dissertation, we have proposed a generic metric that measures application capacity, designed model-independent and adaptive approaches to manage resources and built a cloud management system scalable to a cluster of machines. To understand web system capacity, we propose to use a productivity index (PI) metric, defined as the ratio of yield to cost, to measure the system processing capability online. PI is a generic concept that can be applied at different levels to monitor system progress in order to identify whether more capacity is needed. We applied the concept of PI to the problem of overload prevention in multi-tier websites. The overload predictor built on the PI metric shows more accurate and responsive overload prevention compared to conventional approaches. To address the lack of an accurate server model, we propose a model-independent fuzzy control based approach for CPU allocation. For adaptive and stable control performance, we embed the controller with self-tuning output amplification and flexible rule selection. Finally, we build a QoS provisioning framework that supports multi-objective QoS control and service differentiation. Experiments on a virtual cluster with two service classes show the effectiveness of our approach in both performance and power control. To address the problems of complex interplay between resources and process delays in fine-grained multi-resource allocation, we consider capacity management as a decision-making problem and employ reinforcement learning (RL) to optimize the process. The optimization depends on trial-and-error interactions with the cloud system. In order to improve the initial management performance, we propose a model-based RL algorithm. The neural network based environment model, which is learned from previous management history, generates simulated resource allocations for the RL agent. Experimental results on heterogeneous applications show that our approach makes efficient use of limited interactions and finds near-optimal resource configurations within 7 steps. Finally, we present a distributed reinforcement learning approach to cluster-wide cloud resource management. We decompose the cluster-wide resource allocation problem into sub-problems concerning individual VM resource configurations. The cluster-wide allocation is optimized if individual VMs meet their SLAs with high resource utilization. For scalability, we develop an efficient reinforcement learning approach with continuous state space. For adaptability, we use VM low-level runtime statistics to accommodate workload dynamics. Prototyped in the iBalloon system, the distributed learning approach successfully manages 128 VMs on a 16-node closely correlated cluster.
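
    As a small illustration of the productivity index idea, the sketch below computes PI = yield/cost from a hypothetical request log and applies an assumed threshold rule to flag overload; the data structures and the 0.8 threshold are illustrative, not taken from the dissertation.

        # Minimal sketch (hypothetical request log, not the dissertation's controller):
        # the productivity index PI = yield / cost, sampled online, with a simple
        # threshold check that flags when more capacity is likely needed.
        from dataclasses import dataclass

        @dataclass
        class Sample:
            completed: int      # requests finished within their SLA in this interval
            dropped: int        # requests rejected or violating the SLA (not counted as yield)
            cpu_seconds: float  # resource cost spent in this interval

        def productivity_index(s: Sample) -> float:
            yield_ = s.completed            # "yield": useful work delivered
            cost = s.cpu_seconds + 1e-9     # "cost": resources consumed (avoid div by zero)
            return yield_ / cost

        history = [Sample(900, 10, 8.0), Sample(950, 40, 9.5), Sample(800, 200, 9.8)]
        pis = [productivity_index(s) for s in history]
        overloaded = pis[-1] < 0.8 * max(pis)   # assumed rule: PI falling well below its peak
        print([round(p, 1) for p in pis], "overload suspected:", overloaded)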

    Service Replication in Wireless Mobile Ad Hoc Networks

    Get PDF
    This thesis addresses the management of services in mobile ad hoc networks (MANETs). MANETs are wireless networks of mobile units that manage themselves in a decentralized fashion, without any superordinate organization. The network topology of a MANET changes dynamically with the movement of its autonomous participants. Sensor networks, personal area networks, and satellite networks are typical examples of such MANETs. With the growing importance of wireless networking of mobile devices, MANETs have developed into an important research area in recent years. In disaster management, civil rescue operations, or military scenarios, their infrastructure-less self-organization can make MANETs the only feasible means of communication. The mobile nodes of a MANET cooperate to jointly provide essential network services such as routing and data transport. Resources such as the bandwidth between nodes, the computing power of the mobile devices, and their battery capacity are typically severely limited and, moreover, fluctuating. Sharing the available resources is therefore a necessity for the efficient operation of a MANET. Service-oriented architectures (SOAs) constitute a suitable paradigm for managing shared resources. If available resources are regarded as services, their use can be handled as a service request. In this context, SOAs provide abstraction, encapsulation, loose coupling, discoverability of resources, and the autonomy essential for MANETs. The application of SOAs to MANETs is therefore receiving increasing attention in research.

    DEFINING DIGITAL PRESERVATION WORK: A CASE STUDY OF THE DEVELOPMENT OF THE REFERENCE MODEL FOR AN OPEN ARCHIVAL INFORMATION SYSTEM

    Full text link
    I report on a multi-method case study of the development of a standard called the Reference Model for an Open Archival Information System (OAIS), which describes components and services required to develop and maintain archives in order to support long-term access and understanding of the information in those archives. The development of the OAIS took place within a standards development organization called the Consultative Committee for Space Data Systems (CCSDS), whose formal purview is the work of space agencies, but the effort reached far beyond the traditional CCSDS interests and stakeholders. It has become a fundamental component of digital archive research and development in a variety of disciplines and sectors. Through document analysis, social network analysis and qualitative analysis of interview data, I explain how and why the OAIS development effort, which took place within a space data standards body, was transformed into a standard of much wider scope, relevant to a diverse set of actors. The OAIS development process involved substantial enrollment of resources from the environment, including skills and expertise; social ties; documentary artifacts; structures and routines; physical facilities and proximity; and funding streams. Enrollment from the environment did not occur automatically. It was based on concerted efforts by actors who searched for relevant literature, framed the process as open, and promoted it at professional events. Their acts of participation also helped to enroll resources, contributing to what structuration theory calls the signification and legitimation of the Reference Model, i.e. enactment of what the document means, and why and to whom it is important. Documentary artifacts were most successfully incorporated into the OAIS when they were perceived to support modularity and to be at an appropriate level of abstraction. The content of the Reference Model was subject to stabilization over time, making changes less likely and more limited in scope. A major factor in the success of the OAIS was the timing of its development. Actors within several streams of activity related to digital preservation perceived the need for a high-level model but had not themselves developed one. At the same time, several actors now felt they had knowledge from their own recent digital archiving efforts, which could inform the development of the OAIS. This study has important implications for research on standardization, and it provides many lessons for those engaged in future standards development efforts. Ph.D., Information, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/39372/2/dissertation_callee.pd