
    Computing in the RAIN: a reliable array of independent nodes

    The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
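
    As a hedged aside on contribution 3 (data storage based on computationally efficient error-control codes), the following Python sketch shows the principle with a single XOR parity block striped across nodes; RAIN's actual codes are more capable, and all names here are illustrative.

        # Stripe data across nodes plus an XOR parity block, so the loss of
        # any single node can be reconstructed from the survivors. Single
        # XOR parity is a simplification of RAIN's error-control codes.
        from functools import reduce

        def xor_blocks(blocks):
            return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

        def encode(data_blocks):
            # Append a parity block: byte-wise XOR of all data blocks.
            return data_blocks + [xor_blocks(data_blocks)]

        def recover(stripes, lost_index):
            # XOR of all surviving blocks reconstructs the lost one.
            survivors = [b for i, b in enumerate(stripes) if i != lost_index]
            return xor_blocks(survivors)

        data = [b"node0dat", b"node1dat", b"node2dat"]
        stripes = encode(data)                      # 3 data blocks + 1 parity
        assert recover(stripes, 1) == b"node1dat"   # node 1 lost; rebuilt
        print("recovered:", recover(stripes, 1))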

    Performance comparison of hierarchical checkpoint protocols in grid computing

    A grid infrastructure is a large set of nodes, geographically distributed and connected by a communication network. In this context, fault tolerance is a necessity imposed by the distribution, which raises a number of problems related to the heterogeneity of hardware, operating systems, networks, middleware, and applications, as well as dynamic resource availability, scalability, the lack of common memory, the lack of a common clock, and asynchronous communication between processes. To improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to such system faults. Fault tolerance is intended to allow the system to provide service as specified in spite of the occurrence of faults, and it is an indispensable element of distributed systems. To meet this need, several techniques have been proposed in the literature. We study protocols based on rollback recovery. These protocols are classified into two categories: coordinated checkpointing and rollback protocols, and log-based independent checkpointing protocols (message logging protocols). However, the performance of a protocol depends on the characteristics of the system, the network, and the running applications. Faced with the constraints of large-scale environments, many algorithms from the literature have proven inadequate. Given an application environment and a system, it is not easy to identify the recovery protocol that is most appropriate for a cluster or a hierarchical environment such as grid computing. While some protocols have been used successfully at small scale, they are not suitable for use at large scale. Hence there is a need to implement these protocols in a hierarchical fashion and compare their performance in grid computing. In this paper, we propose hierarchical versions of four well-known protocols. We have implemented these protocols and compared their performance in clusters and grid computing using the OMNeT++ simulator.
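
    As a hedged illustration (not from the paper), the following Python sketch shows the two-phase structure that coordinated checkpointing protocols are built on; all class and method names are hypothetical, and failure handling is reduced to a single commit-or-rollback decision.

        # Minimal sketch of coordinated (blocking) checkpointing: a
        # coordinator asks every process for a tentative checkpoint and
        # commits only if all succeed; otherwise everyone rolls back.
        class Process:
            def __init__(self, pid, state=0):
                self.pid = pid
                self.state = state          # application state (simplified)
                self.stable = None          # last committed checkpoint
                self.tentative = None

            def take_tentative(self):
                self.tentative = self.state # save state without committing yet
                return True                 # a real process could fail here

            def commit(self):
                self.stable = self.tentative

            def rollback(self):
                self.state = self.stable

        def coordinated_checkpoint(processes):
            # Phase 1: every process takes a tentative checkpoint.
            if all(p.take_tentative() for p in processes):
                # Phase 2: the coordinator tells everyone to commit.
                for p in processes:
                    p.commit()
                return True
            # Any failure: all processes roll back to the committed state.
            for p in processes:
                p.rollback()
            return False

        procs = [Process(i, state=i * 10) for i in range(4)]
        print(coordinated_checkpoint(procs))   # True: consistent global checkpoint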

    APUS: Fast and Scalable PAXOS on RDMA

    State machine replication (SMR) uses Paxos to enforce the same inputs for a program (e.g., Redis) replicated on a number of hosts, tolerating various types of failures. Unfortunately, traditional Paxos protocols incur prohibitive performance overhead on server programs due to their high consensus latency on TCP/IP. Worse, the consensus latency of extant Paxos protocols increases drastically when more concurrent client connections or hosts are added. This paper presents APUS, the first RDMA-based Paxos protocol that aims to be fast and scalable to client connections and hosts. APUS intercepts inbound socket calls of an unmodified server program, assigns a total order for all input requests, and uses fast RDMA primitives to replicate these requests concurrently. We evaluated APUS on nine widely-used server programs (e.g., Redis and MySQL). APUS incurred a mean overhead of 4.3% in response time and 4.2% in throughput. We integrated APUS with an SMR system, Calvin. Our Calvin-APUS integration was 8.2X faster than the extant Calvin-ZooKeeper integration. The consensus latency of APUS outperformed an RDMA-based consensus protocol by 4.9X. APUS source code and raw results are released at github.com/hku-systems/apus.
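
    As a hedged aside, the following Python sketch illustrates the general idea behind SMR input ordering: a leader assigns consecutive sequence numbers to intercepted requests, and each replica executes them strictly in that order. The RDMA primitives and the Paxos machinery itself are abstracted away, and all names are hypothetical.

        # Total-order request replication, the core idea behind SMR: the
        # leader numbers requests; replicas apply them strictly in order,
        # so every replica's state machine stays identical.
        import heapq

        class Leader:
            def __init__(self):
                self.next_seq = 0

            def order(self, request):
                seq = self.next_seq
                self.next_seq += 1
                return seq, request          # broadcast (seq, request) to replicas

        class Replica:
            def __init__(self):
                self.expected = 0
                self.pending = []            # min-heap of out-of-order arrivals
                self.log = []

            def deliver(self, seq, request):
                heapq.heappush(self.pending, (seq, request))
                # Execute only the next in-order request(s).
                while self.pending and self.pending[0][0] == self.expected:
                    _, req = heapq.heappop(self.pending)
                    self.log.append(req)     # apply to the state machine here
                    self.expected += 1

        leader, replica = Leader(), Replica()
        msgs = [leader.order(r) for r in ["SET x 1", "INCR x", "GET x"]]
        for seq, req in reversed(msgs):      # arrivals may be out of order
            replica.deliver(seq, req)
        print(replica.log)                   # ['SET x 1', 'INCR x', 'GET x']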

    Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks

    Technology scaling, accompanied by higher operating frequencies and the ability to integrate more functionality into the same chip, has been the driving force behind delivering higher-performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as a Multiprocessor System-on-Chip). However, these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, the growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing increases. In this thesis, we take on these challenges within the context of streaming applications running on network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote-memory-access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, and (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at system level, in particular by exploiting redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design-time adoption), as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on lifetime reliability. We propose two recovery schemes, based on a checkpoint-and-rollback technique and a rollforward technique. For the latter, we propose two variants of a monitor-controller-adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that this can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library, or a hardware core to be added to the basic architecture.
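
    As a hedged illustration of the task-remapping problem (not the thesis's actual ILP formulation or heuristics), the following Python sketch greedily reassigns the tasks of a failed processing element to the least-loaded surviving cores; the names and the unit task cost are assumptions.

        # Fault-aware task remapping, reduced to a greedy heuristic: when a
        # core fails, its orphaned tasks move to the least-loaded survivors,
        # exploiting the redundancy available in a multi-core platform.
        def remap_on_fault(mapping, loads, failed_core):
            # mapping: task -> core; loads: core -> current load
            orphans = [t for t, c in mapping.items() if c == failed_core]
            del loads[failed_core]                   # core is gone
            for task in orphans:
                target = min(loads, key=loads.get)   # least-loaded survivor
                mapping[task] = target
                loads[target] += 1                   # unit task cost, simplified
            return mapping

        mapping = {"t0": 0, "t1": 0, "t2": 1, "t3": 2}
        loads = {0: 2, 1: 1, 2: 1}
        print(remap_on_fault(mapping, loads, failed_core=0))
        # {'t0': 1, 't1': 2, 't2': 1, 't3': 2}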

    ALGORITHMS FOR FAULT TOLERANCE IN DISTRIBUTED SYSTEMS AND ROUTING IN AD HOC NETWORKS

    Checkpointing and rollback recovery are well-known techniques for coping with failures in distributed systems. Future-generation supercomputers will be message-passing distributed systems consisting of millions of processors. As the number of processors grows, the failure rate also grows. Thus, designing efficient checkpointing and recovery algorithms for coping with failures in such large systems is important for these systems to be fully utilized. We present a novel communication-induced checkpointing algorithm which helps in reducing contention for accessing stable storage to store checkpoints. Under our algorithm, a process involved in a distributed computation can independently initiate consistent global checkpointing by saving its current state, called a tentative checkpoint. Other processes involved in the computation come to know about the consistent global checkpoint initiation through information piggybacked with the application messages or, if necessary, limited control messages. When a process comes to know about a new consistent global checkpoint initiation, it takes a tentative checkpoint after processing the message. The tentative checkpoints taken can be flushed to stable storage when there is no contention for accessing stable storage. The tentative checkpoints, together with the message logs stored in stable storage, form a consistent global checkpoint. Ad hoc networks consist of a set of nodes that can form a network for communicating with each other without the aid of any infrastructure or human intervention. Nodes are energy-constrained, and hence routing algorithms designed for these networks should take this into consideration. We propose two routing protocols for mobile ad hoc networks which prevent nodes from broadcasting route requests unnecessarily during the route discovery phase and hence conserve energy and prevent contention in the network. One is called the Triangle Based Routing (TBR) protocol. The other is called the Routing Protocol with Selective Forwarding (RPSF). Both routing protocols greatly reduce the number of control packets needed to establish routes between pairs of source and destination nodes. As a result, they reduce the energy consumed for route discovery. Moreover, these protocols reduce congestion and collision of packets due to the limited number of nodes retransmitting the route requests.
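
    As a hedged sketch of the piggybacking idea described above (a simplification, not the dissertation's exact algorithm), the following Python code stamps outgoing messages with the highest checkpoint-initiation number a process has seen; a receiver that observes a newer number takes a tentative checkpoint after processing the message. All names are hypothetical.

        # Communication-induced checkpointing via piggybacking: checkpoint
        # initiations propagate on application messages, so processes join
        # a consistent global checkpoint without extra coordination rounds.
        class Process:
            def __init__(self, pid):
                self.pid = pid
                self.ckpt_num = 0           # latest initiation number seen
                self.tentative = []         # tentative checkpoints (in memory)

            def initiate_checkpoint(self):
                self.ckpt_num += 1
                self.tentative.append(("state", self.ckpt_num))

            def send(self, payload):
                return (payload, self.ckpt_num)    # piggyback the number

            def receive(self, message):
                payload, sender_num = message
                # ... process the payload first ...
                if sender_num > self.ckpt_num:     # learn of a new initiation
                    self.ckpt_num = sender_num
                    self.tentative.append(("state", self.ckpt_num))

        p, q = Process(0), Process(1)
        p.initiate_checkpoint()       # p independently initiates
        q.receive(p.send("hello"))    # q learns via the piggybacked number
        print(q.ckpt_num)             # 1 -- q has joined the global checkpoint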

    Effective Node Clustering and Data Dissemination In Large-Scale Wireless Sensor Networks

    The denseness and random distribution of large-scale WSNs make it quite difficult to replace or recharge nodes. Energy efficiency and management is therefore a major design goal in these networks. In addition, reliability and scalability are two other major goals that researchers have identified as necessary to further expand the deployment of such networks in various applications. This thesis aims to provide an energy-efficient and effective node clustering and data dissemination algorithm for large-scale wireless sensor networks. In the area of clustering, the proposed research prolongs the lifetime of the network by saving energy through the use of node ranking to elect cluster heads, contrary to other existing cluster-based work that selects a random node, or the node with the highest energy at a particular time instance, as the new cluster head. Moreover, a global knowledge strategy is used to maintain a level of universal awareness of existing nodes in the subject area and to avoid the problem of disconnected or forgotten nodes. In the area of data dissemination, the aim of this research is to effectively manage data collection by developing an efficient collection scheme using a ferry node and applying a selective duty-cycle strategy to the sensor nodes. Depending on the application, mobile ferries can be used for collecting data in a WSN, especially one that is large in scale with delay-tolerant applications. Unlike data collection via multi-hop forwarding among the sensing nodes, ferries travel across the sensing field to collect data. A ferry-based approach thus eliminates, or minimizes, the need for multi-hop forwarding of data, and as a result, energy consumption at the nodes is significantly reduced. This is especially true for nodes near the base station, as they are used by other nodes to forward data to the base station. MATLAB is used to design, simulate, and evaluate the proposed work against existing work using various performance criteria.
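
    As a hedged illustration of ranking-based cluster-head election (the thesis's actual ranking criteria are not given here, so the formula and weights below are assumptions), the following Python sketch scores each node by residual energy and distance to the cluster centroid and elects the top-ranked node.

        # Rank-based cluster-head election: rather than a random node or
        # simply the highest-energy node, each node gets a composite rank
        # and the top-ranked node in the cluster becomes head.
        def rank(node):
            # Higher residual energy raises the rank; distance to the
            # centroid lowers it. The 0.7/0.3 weights are illustrative only.
            return 0.7 * node["energy"] - 0.3 * node["dist_to_centroid"]

        def elect_cluster_head(cluster):
            return max(cluster, key=rank)

        cluster = [
            {"id": "n1", "energy": 0.9, "dist_to_centroid": 5.0},
            {"id": "n2", "energy": 0.8, "dist_to_centroid": 1.0},
            {"id": "n3", "energy": 0.4, "dist_to_centroid": 0.5},
        ]
        print(elect_cluster_head(cluster)["id"])   # 'n2'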

    Mitosis based speculative multithreaded architectures

    In the last decade, industry made a right-hand turn and shifted towards multi-core processor designs, also known as Chip Multi-Processors (CMPs), in order to provide further performance improvements under a reasonable power budget, design complexity, and validation cost. Over the years, several processor vendors have come out with multi-core chips in their product lines, and these have become mainstream, with the number of cores increasing in each processor generation. Multi-core processors improve the performance of applications by exploiting Thread-Level Parallelism (TLP), while the Instruction-Level Parallelism (ILP) exploited by each individual core is limited. These architectures are very efficient when multiple threads are available for execution. However, single-threaded sections of code (single-threaded applications and serial sections of parallel applications) pose important constraints on the benefits achieved by parallel execution, as pointed out by Amdahl’s law. Parallel programming, even with the help of recently proposed techniques like transactional memory, has proven to be a very challenging task. On the other hand, automatically partitioning applications into threads may be a straightforward task in regular applications, but becomes much harder for irregular programs, where compilers usually fail to discover sufficient TLP. In this scenario, two main directions have been followed in the research community to benefit from multi-core platforms: Speculative Multithreading (SpMT) and non-speculative clustered architectures. The former splits a sequential application into speculative threads, while the latter partitions the instructions among the cores based on data dependences while avoiding a large degree of speculation. Despite the large amount of research on both of these approaches, the techniques proposed so far have shown marginal performance improvements. In this thesis we propose novel schemes to speed up sequential or lightly threaded applications on multi-core processors that effectively address the main unresolved challenges of previous approaches. In particular, we propose an SpMT architecture, called Mitosis, that leverages a powerful software value-prediction technique to manage inter-thread dependences, based on pre-computation slices (p-slices). Thanks to the accuracy and low cost of this technique, Mitosis is able to effectively parallelize applications even in the presence of frequent dependences among threads. We also propose a novel architecture, called Anaphase, that combines the best of SpMT schemes and clustered architectures. Anaphase effectively exploits ILP, TLP, and Memory-Level Parallelism (MLP), thanks to its unique fine-grain thread decomposition algorithm that adapts to the available parallelism in the application.
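
    As a hedged, software-level toy (real SpMT operates at the microarchitecture level), the following Python sketch mimics the p-slice idea: a distilled slice predicts a speculative thread's live-in value, the thread runs ahead on the prediction, and its work is squashed and redone if validation against the actual value fails. All names are hypothetical.

        # Pre-computation slices (p-slices) in miniature: a short slice
        # predicts a live-in, the speculative thread executes ahead on it,
        # and the result commits only if the prediction was correct.
        def p_slice(x):
            # Distilled subset of the producer code: fast, possibly wrong.
            return x * 2

        def producer(x):
            # Full computation that actually produces the live-in value.
            return x * 2 + (1 if x > 100 else 0)

        def speculative_thread(live_in):
            return live_in + 7                       # downstream work

        def run_speculatively(x):
            predicted = p_slice(x)
            result = speculative_thread(predicted)   # runs ahead, in parallel
            actual = producer(x)                     # eventually known
            if predicted == actual:
                return result                        # validation passed: commit
            return speculative_thread(actual)        # mispredicted: squash, redo

        print(run_speculatively(10))    # prediction correct -> committed
        print(run_speculatively(200))   # prediction wrong -> squashed, redone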

    Run-time Variability with Roles

    Adaptability is an intrinsic property of software systems that must adapt to cope with dynamically changing environments. Achieving adaptability is challenging. Variability is a key solution, as it enables a software system to change its behavior to match a specific need. The abstraction behind variability is the management of variants, dynamic parts that are composed into the base system. Run-time variability realizes these variant compositions dynamically at run time to enable adaptation. Adaptation relying on variants specified at build time is called anticipated adaptation, which allows the system behavior to change with respect to a set of predefined execution environments. This implies an inability to solve practical problems in which the execution environment is not completely fixed and is often unknown until run time. Enabling unanticipated adaptation, which allows variants to be dynamically added at run time, alleviates this inability, but it carries several risks of system instability, such as inconsistency and run-time failures. Adaptation should be performed only when a system has reached a consistent state, to avoid inconsistency. Inconsistency arises when adaptation changes the system's state and behavior while a series of method invocations is still in progress. A software bug is another source of system instability. It often appears in a variant composition and is brought into the system during adaptation. The problem is even more critical for unanticipated adaptation, as the system has no prior knowledge of the new variants. This dissertation aims to achieve anticipated and unanticipated adaptation. In achieving adaptation, the issues of inconsistency and software failures, which may happen as a consequence of run-time adaptation, are addressed as well. Roles encapsulate dynamic behavior used to adapt players representing the base system, which is the rationale for selecting roles as the software system's variants. Based on the role concept, this dissertation presents three mechanisms to comprehensively address adaptation. First, a dynamic instance binding mechanism is proposed to loosely bind players and roles. Dynamic binding of roles enables anticipated and unanticipated adaptation. Second, an object-level tranquility mechanism is proposed to avoid inconsistency by allowing a player object to adapt only when it has reached a consistent state. Last, a rollback recovery mechanism is proposed as a proactive mechanism to embrace and handle failures resulting from a defective composition of variants. A checkpoint of a system configuration is created before adaptation; if a specialized bug sensor detects a failure, the system rolls back to the most recent checkpoint. These mechanisms are integrated into a role-based runtime called LyRT. LyRT was validated with three case studies to demonstrate its practical feasibility. This validation showed that LyRT is more advanced than existing variability approaches with respect to adaptation, thanks to its consistency control and failure handling. In addition, several benchmarks were set up to quantify the overhead of LyRT in terms of the execution time of adaptation. The results revealed the overhead introduced to achieve anticipated and unanticipated adaptation to be small enough for practical use in adaptive software systems. Thus, LyRT is suitable for adaptive software systems that frequently require the adaptation of large sets of objects.
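
    As a hedged sketch of the checkpoint-and-rollback mechanism described above (a generic illustration with hypothetical names, not LyRT's actual API), the following Python code snapshots the role-to-player bindings before an adaptation and restores them if a stand-in for the bug sensor reports a failure.

        # Checkpoint-and-rollback for safe run-time adaptation: snapshot the
        # composition before binding a new role; if the new composition
        # fails, restore the most recent checkpoint.
        import copy

        class Runtime:
            def __init__(self):
                self.bindings = {}           # player -> list of bound roles
                self._checkpoint = None

            def checkpoint(self):
                # Snapshot the current composition before adapting.
                self._checkpoint = copy.deepcopy(self.bindings)

            def rollback(self):
                self.bindings = self._checkpoint

            def adapt(self, player, role, smoke_test):
                self.checkpoint()
                self.bindings.setdefault(player, []).append(role)
                try:
                    smoke_test()             # stand-in for the "bug sensor"
                except Exception:
                    self.rollback()          # defective variant: undo binding

        def bad_smoke_test():
            raise RuntimeError("composition failed")

        rt = Runtime()
        rt.adapt("server", "CompressionRole", smoke_test=lambda: None)
        rt.adapt("server", "BuggyRole", smoke_test=bad_smoke_test)
        print(rt.bindings)                   # {'server': ['CompressionRole']}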