1,503 research outputs found

    Computing in the RAIN: a reliable array of independent nodes

    The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating-system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focused on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault-management techniques based on group membership, and 3) data-storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
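
    The data-storage contribution above is built on computationally efficient error-control codes. As a minimal, hedged sketch of the general idea (not the codes actually used in RAIN), the Python snippet below tolerates the loss of a single node using simple XOR parity; the block contents and node count are invented for the example.

```python
# Minimal sketch of parity-based redundancy across storage nodes (illustrative
# only; RAIN's storage schemes use more general error-control codes).

def encode(data_blocks: list[bytes]) -> bytes:
    """Compute a single XOR parity block over equal-sized data blocks."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Reconstruct one missing data block from the survivors and the parity."""
    return encode(surviving_blocks + [parity])

if __name__ == "__main__":
    blocks = [b"node0data", b"node1data", b"node2data"]   # one block per node
    parity = encode(blocks)                               # stored on a parity node
    lost = blocks.pop(1)                                  # simulate a node failure
    assert recover(blocks, parity) == lost                # single failure tolerated
```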

    Power Management Techniques for Data Centers: A Survey

    With the growing use of the Internet and the exponential growth in the amount of data to be stored and processed (known as 'big data'), the size of data centers has greatly increased. This, however, has resulted in a significant increase in the power consumption of data centers. For this reason, managing the power consumption of data centers has become essential. In this paper, we highlight the need for achieving energy efficiency in data centers and survey several recent architectural techniques designed for power management of data centers. We also present a classification of these techniques based on their characteristics. This paper aims to provide insights into techniques for improving the energy efficiency of data centers and to encourage designers to invent novel solutions for managing the large power dissipation of data centers.
    Keywords: Data Centers, Power Management, Low-power Design, Energy Efficiency, Green Computing, DVFS, Server Consolidation
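
    One family of techniques such surveys cover is dynamic voltage and frequency scaling (DVFS). The snippet below is a toy illustration of the standard dynamic-power approximation P ≈ C·V²·f; the operating points and the capacitance constant are invented for the example, not taken from the paper.

```python
# Toy DVFS illustration: dynamic CPU power is commonly modeled as P = C * V^2 * f.
# The capacitance constant and the voltage/frequency operating points below are
# invented for the example.

OPERATING_POINTS = [            # (frequency in GHz, voltage in volts)
    (1.2, 0.9),
    (2.0, 1.0),
    (3.0, 1.2),
]
C = 2.0  # effective switched capacitance (arbitrary units)

def dynamic_power(freq_ghz: float, volts: float) -> float:
    return C * volts ** 2 * freq_ghz

if __name__ == "__main__":
    for f, v in OPERATING_POINTS:
        print(f"{f:.1f} GHz @ {v:.1f} V -> {dynamic_power(f, v):.2f} (arb. power units)")
```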

    Dynamic Physiological Partitioning on a Shared-nothing Database Cluster

    Traditional DBMS servers are usually over-provisioned for most of their daily workloads and, because they do not exhibit good enough energy proportionality, waste a lot of energy while underutilized. A cluster of small (wimpy) servers, whose size can be dynamically adjusted to the current workload, offers better energy characteristics for these workloads. Yet data migration, necessary to balance utilization among the nodes, is a non-trivial and time-consuming task that may consume the energy saved. For this reason, a sophisticated and easy-to-adjust partitioning scheme fostering dynamic reorganization is needed. In this paper, we adapt a technique originally created for SMP systems, called physiological partitioning, to distribute data among nodes, which allows data to be easily repartitioned without interrupting transactions. We dynamically partition DB tables based on the nodes' utilization and given energy constraints, and we compare our approach with physical and logical partitioning methods. To quantify the possible energy savings and their conceivable drawback on query runtimes, we evaluate our implementation on an experimental cluster and compare the results w.r.t. performance and energy consumption. Depending on the workload, we can substantially save energy without sacrificing too much performance.
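
    As a rough illustration of utilization-driven repartitioning, the sketch below moves ownership of logical partitions from overloaded nodes to the least-loaded node. The watermark, the assumed per-partition load, and the mapping structure are invented and only gesture at the idea; they are not the paper's physiological partitioning implementation.

```python
# Hypothetical sketch of utilization-driven repartitioning: logical partitions are
# reassigned from overloaded nodes to the least-loaded node. Only the partition
# mapping changes here; data movement and transaction handling are out of scope.

def rebalance(assignment: dict[str, list[int]],
              utilization: dict[str, float],
              high_watermark: float = 0.8) -> dict[str, list[int]]:
    new_assignment = {node: parts[:] for node, parts in assignment.items()}
    for node in list(utilization):
        while utilization[node] > high_watermark and new_assignment[node]:
            target = min(utilization, key=utilization.get)   # least-loaded node
            if target == node:
                break
            moved = new_assignment[node].pop()                # hand off one partition
            new_assignment[target].append(moved)
            utilization[node] -= 0.1                          # assumed per-partition load
            utilization[target] += 0.1
    return new_assignment

if __name__ == "__main__":
    assignment = {"n1": [0, 1, 2, 3], "n2": [4, 5]}
    utilization = {"n1": 0.95, "n2": 0.30}
    print(rebalance(assignment, utilization))
```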

    QCDOC: A 10-teraflops scale computer for lattice QCD

    The architecture of a new class of computers, optimized for lattice QCD calculations, is described. An individual node is based on a single integrated circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit IEEE floating-point unit, 4 Mbyte of memory, 8 Gbit/sec nearest-neighbor communications, and additional control and diagnostic circuitry. The machine's name, QCDOC, derives from "QCD On a Chip".
    Comment: Lattice 2000 (machines), 8 pages, 4 figures
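
    QCDOC nodes communicate with their nearest neighbors. As an illustrative sketch (not the machine's actual network configuration), the snippet below computes nearest-neighbor node addresses on a periodic torus grid; the grid shape is an arbitrary example.

```python
# Illustrative sketch: nearest-neighbor node addresses on a periodic (torus)
# grid, the kind of communication pattern lattice QCD machines rely on.
# The grid shape below is an arbitrary example, not QCDOC's actual dimensions.

def neighbors(coord: tuple[int, ...], shape: tuple[int, ...]):
    """Yield the 2*d nearest neighbors of `coord` with periodic wrap-around."""
    for dim in range(len(shape)):
        for step in (-1, +1):
            nbr = list(coord)
            nbr[dim] = (nbr[dim] + step) % shape[dim]
            yield tuple(nbr)

if __name__ == "__main__":
    shape = (4, 4, 4, 2)                 # example 4-D torus of 128 nodes
    node = (0, 3, 2, 1)
    print(sorted(set(neighbors(node, shape))))
```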

    Reliable and randomized data distribution strategies for large scale storage systems

    The ever-growing amount of data requires highly scalable storage solutions. The most flexible approach is to use storage pools that can be expanded and scaled down by adding or removing storage devices. To make this approach usable, it is necessary to provide a solution to locate data items in such a dynamic environment. This paper presents and evaluates the Random Slicing strategy, which incorporates lessons learned from table-based, rule-based, and pseudo-randomized hashing strategies and is able to provide a simple and efficient strategy that scales up to handle exascale data. Random Slicing keeps a small table with information about previous storage system insert and remove operations, drastically reducing the required amount of randomness while delivering a perfect load distribution.
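
    Random Slicing places a data item by mapping it pseudo-randomly into the unit range and looking up which device owns the containing interval. The sketch below illustrates that lookup; the hand-made slice table and the SHA-256 hash are assumptions for the example, whereas the paper derives its table from the history of insert and remove operations to keep the load perfectly balanced.

```python
# Minimal sketch of interval-based data placement in the spirit of Random Slicing:
# the unit range [0, 1) is cut into slices, each owned by a storage device, and a
# data item is placed by hashing its name into [0, 1) and finding the owning slice.
# The slice table below is hand-made for the example.

import bisect
import hashlib

BOUNDS = [0.25, 0.50, 0.75, 1.00]                 # sorted upper ends of the slices
OWNERS = ["disk0", "disk1", "disk2", "disk3"]     # device owning each slice

def place(item: str) -> str:
    h = int(hashlib.sha256(item.encode()).hexdigest(), 16)
    x = h / 2 ** 256                              # pseudo-random point in [0, 1)
    return OWNERS[bisect.bisect_right(BOUNDS, x)]

if __name__ == "__main__":
    for name in ("block-17", "block-18", "block-19"):
        print(name, "->", place(name))
```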

    Cloud computing: survey on energy efficiency

    Cloud computing is today's most emphasized Information and Communications Technology (ICT) paradigm, directly or indirectly used by almost every online user. However, such great significance comes with the support of a great infrastructure that includes large data centers comprising thousands of server units and other supporting equipment. Their share in power consumption generates between 1.1% and 1.5% of the total electricity use worldwide and is projected to rise even more. Such alarming numbers demand rethinking the energy efficiency of such infrastructures. However, before making any changes to infrastructure, an analysis of the current status is required. In this article, we perform a comprehensive analysis of an infrastructure supporting the cloud computing paradigm with regard to energy efficiency. First, we define a systematic approach for analyzing the energy efficiency of the most important data center domains, including server and network equipment, as well as cloud management systems and appliances consisting of software utilized by end users. Second, we utilize this approach to analyze the available scientific and industrial literature on state-of-the-art practices in data centers and their equipment. Finally, we extract existing challenges and highlight future research directions.
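
    A standard metric in this kind of data-center energy-efficiency analysis is Power Usage Effectiveness (PUE), the ratio of total facility power to IT equipment power. The worked example below uses invented numbers and is not taken from the article.

```python
# Worked example of Power Usage Effectiveness (PUE): total facility power divided
# by IT equipment power. The numbers are invented for illustration.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    return total_facility_kw / it_equipment_kw

if __name__ == "__main__":
    # 1200 kW drawn by the whole facility, of which 800 kW reaches IT equipment:
    # PUE = 1200 / 800 = 1.5, i.e. 0.5 W of overhead per watt of useful IT load.
    print(f"PUE = {pue(1200, 800):.2f}")
```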

    The expandable network disk

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 91-96).
    This thesis presents a virtual disk cluster called END, the Expandable Network Disk. END aggregates storage on a cluster of servers into a single virtual disk. END's main goals are to offer good performance during normal operation, and efficiently handle changes in the cluster membership. END achieves these goals using a two-layer design, in which storage "bricks," servers that consist of CPU, memory, and hard disks, hold two kinds of information. The lower layer stores replicated immutable chunks of data, each indexed by a unique key. The upper layer maps each block address to the key of its current data chunk; each mapping is held on two bricks using primary-backup replication. This separation allows END flexibility in where it stores chunks and thus efficiency: it writes new chunks to bricks chosen for speed; it moves only address mappings (not data) when bricks fail and recover, which results in fast recovery; it fully replicates new writes during temporary brick failures; and it uses chunks on a recovered brick without risk of staleness. The END prototype's write throughput on a cluster of 16 PC-based bricks is 150 megabytes/s with 2x replication. END continues after a single brick failure, re-incorporates a rebooting brick, and expands to include a new brick after a few seconds of reduced performance during each change. The results show that END's two-layer design maintains good performance, resumes operation quickly after changes in the cluster, and maintains full replication of new writes even during a brick failure.
    by Athicha Muthitacharoen. Ph.D.
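
    A minimal sketch of the two-layer split described above is given below, with Python dictionaries standing in for bricks. The content-hash key and the single-machine setting are simplifications for illustration; END replicates both layers across bricks and chooses chunk placement for speed.

```python
# Hedged sketch of END's two-layer idea: the lower layer stores immutable chunks
# under a unique key, the upper layer maps a block address to the key of its
# current chunk. Dictionaries stand in for bricks; replication is omitted.

import hashlib

chunk_store: dict[str, bytes] = {}   # lower layer: key -> immutable chunk
address_map: dict[int, str] = {}     # upper layer: block address -> chunk key

def write_block(addr: int, data: bytes) -> None:
    key = hashlib.sha1(data).hexdigest()   # unique key for the chunk (assumed scheme)
    chunk_store[key] = data                # chunks are never overwritten
    address_map[addr] = key                # only the mapping is updated in place

def read_block(addr: int) -> bytes:
    return chunk_store[address_map[addr]]

if __name__ == "__main__":
    write_block(7, b"version 1")
    write_block(7, b"version 2")           # new chunk; old one stays immutable
    assert read_block(7) == b"version 2"
```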

    Parallel replication for distributed video-on-demand systems.

    Lie, Wai-Kwok Peter. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 79-83).
    Abstract --- p.i
    Acknowledgments --- p.ii
    Chapter 1 --- Introduction --- p.1
    Chapter 2 --- Background & Related Work --- p.5
    Chapter 2.1 --- Early Work on Multimedia Servers --- p.6
    Chapter 2.2 --- Compression of Multimedia Data --- p.6
    Chapter 2.3 --- Multimedia File Systems --- p.7
    Chapter 2.4 --- Scheduling Support for Multimedia Systems --- p.8
    Chapter 2.5 --- Inter-media Synchronization --- p.9
    Chapter 2.6 --- Related Work on Replication in VOD Systems --- p.9
    Chapter 3 --- System Model --- p.12
    Chapter 4 --- Replication Methodology --- p.15
    Chapter 4.1 --- Replication Triggering Policy --- p.16
    Chapter 4.2 --- Source & Target Nodes Selection Policies --- p.17
    Chapter 4.3 --- Replication Policies --- p.18
    Chapter 4.3.1 --- Policy 1: Injected Sequential Replication --- p.20
    Chapter 4.3.2 --- Policy 2: Piggybacked Sequential Replication --- p.22
    Chapter 4.3.3 --- Policy 3: Injected Parallel Replication --- p.25
    Chapter 4.3.4 --- Policy 4: Piggybacked Parallel Replication --- p.28
    Chapter 4.3.5 --- Policy 5: Injected & Piggybacked Parallel Replication --- p.34
    Chapter 4.3.6 --- Policy 6: Multi-Source Injected & Piggybacked Parallel Replication --- p.36
    Chapter 4.4 --- Dereplication Policy --- p.37
    Chapter 5 --- Distributed Architecture for VOD Server --- p.39
    Chapter 5.1 --- Server Node --- p.40
    Chapter 5.2 --- Movie Manager --- p.42
    Chapter 5.3 --- Metadata Manager --- p.42
    Chapter 5.4 --- Protocols for Distributed VOD Architecture --- p.43
    Chapter 5.4.1 --- Protocol for Servicing New Customers --- p.43
    Chapter 5.4.2 --- Protocol for Servicing Existing Customers --- p.45
    Chapter 5.4.3 --- Protocol for Single/Multi-Source Injected & Parallel Replication --- p.46
    Chapter 5.4.4 --- Protocol for Dereplication --- p.48
    Chapter 5.5 --- Failure Handling --- p.49
    Chapter 5.5.1 --- Handling of Server Node Failures --- p.50
    Chapter 5.5.2 --- Handling of Movie Manager Failures --- p.52
    Chapter 6 --- Results --- p.55
    Chapter 6.1 --- Performance Metric --- p.56
    Chapter 6.2 --- Simulation Environment --- p.58
    Chapter 6.3 --- Results of Experiments without Dereplication --- p.59
    Chapter 6.3.1 --- Comparison of Different Replication Policies --- p.60
    Chapter 6.3.2 --- Effect of Early Acceptance/Migration --- p.61
    Chapter 6.3.3 --- Answer to the Resources Consumption Tradeoff Issue --- p.62
    Chapter 6.3.4 --- Effect of Varying Movie Popularity Skewness --- p.64
    Chapter 6.3.5 --- Effect of Varying Replication Threshold --- p.64
    Chapter 6.3.6 --- Comparison of Different Target Node Selection Policies --- p.65
    Chapter 6.4 --- Overall Impact of Dynamic Replication --- p.66
    Chapter 7 --- Comparison with BSR-based Policy --- p.71
    Chapter 8 --- Conclusions --- p.75
    Chapter 8.1 --- Summary --- p.75
    Chapter 8.2 --- Future Research Directions --- p.76
    Bibliography --- p.7
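
    Chapter 4.1 in the outline above concerns a replication triggering policy. The snippet below is a hypothetical illustration of one such trigger, replicating a movie when demand nears the capacity of the nodes currently holding it; the per-node capacity, the threshold, and the node names are invented and are not the thesis's actual policies.

```python
# Hypothetical sketch of threshold-triggered movie replication in a distributed
# VOD system: when demand for a movie approaches the total streaming capacity of
# the nodes currently holding it, a copy is scheduled on another node.

from typing import Optional

STREAMS_PER_NODE = 100          # assumed streaming capacity per server node
REPLICATION_THRESHOLD = 0.8     # replicate when 80% of that capacity is in use

def should_replicate(active_streams: int, replica_nodes: list[str]) -> bool:
    capacity = STREAMS_PER_NODE * len(replica_nodes)
    return active_streams >= REPLICATION_THRESHOLD * capacity

def pick_target(replica_nodes: list[str], all_nodes: dict[str, int]) -> Optional[str]:
    """Choose the least-loaded node that does not already hold the movie."""
    candidates = {n: load for n, load in all_nodes.items() if n not in replica_nodes}
    return min(candidates, key=candidates.get) if candidates else None

if __name__ == "__main__":
    nodes = {"s1": 95, "s2": 40, "s3": 10}       # current streams per node
    replicas = ["s1"]                            # movie currently held only on s1
    if should_replicate(active_streams=85, replica_nodes=replicas):
        print("replicate to", pick_target(replicas, nodes))   # -> s3
```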