
    Interactive Consistency Algorithms Based on Voting and Error-Correcting Codes

    This paper presents a new class of synchronous, deterministic, non-authenticated algorithms for reaching interactive consistency (Byzantine agreement). The algorithms are based on voting and error-correcting codes and require considerably less data communication than the original algorithm, while the number of rounds and the number of modules meet the minimum bounds. These voting-and-coding algorithms are defined and proved on the basis of a class of algorithms called dispersed joined communication algorithms.
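The voting primitive at the heart of such algorithms can be illustrated as a strict-majority decision over the values reported by the modules. The following is a minimal sketch of that primitive only, not the paper's dispersed joined communication algorithms; the example values are invented:

```python
from collections import Counter

def majority_vote(values, default=None):
    """Return the value reported by a strict majority, else default.

    Voting is the basic primitive of interactive-consistency
    algorithms: every non-faulty module must decide the same value
    even when some modules report arbitrary (Byzantine) values.
    """
    if not values:
        return default
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else default

# With n = 4 modules and t = 1 Byzantine fault (n >= 3t + 1),
# a strict majority of correct reports outvotes the faulty one.
print(majority_vote([1, 1, 1, 0]))  # -> 1
```

When no strict majority exists the function falls back to a default, mirroring the use of a pre-agreed default value in agreement protocols.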

    Computing in the RAIN: a reliable array of independent nodes

    The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating-system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focused on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault-management techniques based on group membership, and 3) data-storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
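The data-storage schemes mentioned above rest on erasure codes. A minimal single-parity sketch (far simpler than the computationally efficient array codes the project actually describes) shows how one lost node's block can be rebuilt from the survivors; the node data below is invented for illustration:

```python
def xor_parity(blocks):
    """XOR a list of equal-length data blocks into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving_blocks):
    """Rebuild the single missing block from the survivors.

    With one parity block, any one missing block equals the XOR of
    all remaining blocks (data and parity alike), since each byte
    XORs to zero across the full set.
    """
    return xor_parity(surviving_blocks)

data = [b"node0", b"node1", b"node2"]
parity = xor_parity(data)
# Lose node1; rebuild it from node0, node2, and the parity block.
assert recover([data[0], data[2], parity]) == data[1]
```

This tolerates a single node failure; tolerating multiple simultaneous failures, as RAIN does, requires stronger codes than plain parity.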

    Robust Sensor Fusion Algorithms: Calibration and Cost Minimization.

    A system reacting to its environment requires sensor input to model the environment. Unfortunately, sensors are electromechanical devices subject to physical limitations. It is challenging for a system to robustly evaluate sensor data of questionable accuracy and dependability. Sensor fusion addresses this problem by taking inputs from several sensors and merging the individual sensor readings into a single logical reading. The use of heterogeneous physical sensors allows a logical sensor to be less sensitive to the limitations of any single sensor technology, and the use of multiple identical sensors allows the system to tolerate failures of some of its component physical sensors. These are examples of fault masking, or N-modular redundancy. This research addresses two problems of fault-masking systems: the automatic calibration of systems that return partially redundant image data, and the potentially prohibitive cost of installing redundant system components. Both are presented in mathematical terms as optimization problems. To combine inputs from multiple independent sensors, readings must be registered to a common coordinate system. This problem is complex when functions equating the readings are not known a priori. It is even more difficult in the case of sensor readings, where data contains noise and may have a sizable periodic component. A practical method must find a near-optimal answer in the presence of large amounts of noise. The first part of this research derives a computational scheme capable of registering partially overlapping noisy sensor readings. Another problem with redundant systems is the cost incurred by redundancy. The trade-off between reliability and system cost is most evident in fault-tolerant systems.
Given several component types with known dependability statistics, it is possible to determine the combinations of components that fulfill dependability constraints by modeling the system using Markov chains. When unit costs are known, it is desirable to use low-cost combinations of components to fulfill the reliability constraints. The second part of this dissertation develops a methodology for designing sensor systems with redundant components that satisfy dependability constraints at near-minimal cost. Open problems are also listed.
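The cost-minimization idea can be illustrated with a simplified model: for independent, non-repairable components, the Markov-chain dependability analysis reduces to a binomial reliability calculation, and a search over a component catalogue picks the cheapest majority-voted configuration meeting a target. This is a sketch of the idea only, not the dissertation's methodology; the component names, reliabilities, and unit costs are invented:

```python
from math import comb

def majority_reliability(r, n):
    """Probability that a strict majority of n independent components,
    each working with probability r, is functioning (N-modular
    redundancy with a perfect voter)."""
    k = n // 2 + 1
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

def cheapest_config(components, target):
    """Return the lowest-cost (total_cost, type, count) whose
    majority-vote reliability meets the target.

    components maps a type name to (reliability, unit_cost).
    Only odd counts are tried, so the vote is always decisive.
    """
    best = None
    for name, (r, cost) in components.items():
        for n in range(1, 10, 2):
            if majority_reliability(r, n) >= target:
                total = cost * n
                if best is None or total < best[0]:
                    best = (total, name, n)
                break
    return best

# Hypothetical catalogue: cheap/less-reliable vs. expensive/reliable.
print(cheapest_config({"A": (0.95, 10), "B": (0.999, 60)}, 0.999))
# -> (60, 'B', 1): seven A units (cost 70) also meet the target,
#    but a single B unit is cheaper.
```

The dissertation's Markov-chain treatment additionally handles repair and coverage effects that this binomial simplification ignores.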

    Combining Error-Correcting Codes and Decision Diagrams for the Design of Fault-Tolerant Logic

    In modern logic circuits, fault tolerance is increasingly important, since even atomic-scale imperfections can result in circuit failures as the size of the components shrinks. Therefore, in addition to existing techniques for providing fault tolerance to logic circuits, it is important to develop new techniques for detecting and correcting errors resulting from faults in the circuitry. Error-correcting codes are typically used in data transmission for error detection and correction. Their theory is well developed, and linear codes in particular have many useful properties and fast decoding algorithms. Existing fault-tolerance techniques utilizing error-correcting codes require less redundancy than other error detection and correction schemes, and such techniques are usually implemented using special decoding circuits. Decision diagrams are an efficient graphical representation of logic functions which, depending on the technology, directly determines the complexity and layout of the circuit; they are therefore easy to implement. In this thesis, error-correcting codes are combined with decision diagrams to obtain a new method for providing fault tolerance in logic circuits. The resulting method, error-correcting decision diagrams, introduces redundancy already at the level of the representation of a logic function; as a consequence, no additional checker circuits are needed in the circuit layouts obtained with the new method. The purpose of the thesis is to introduce this original concept and provide a fault-tolerance analysis of the obtained decision diagrams. The fault-tolerance analysis of error-correcting decision diagrams carried out in this thesis shows that the obtained robust diagrams have a significantly reduced probability of an incorrect output in comparison with non-redundant diagrams.
However, these useful properties come at a cost: adding redundancy also adds complexity, so better error-correcting properties result in a more complex circuit layout.
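As a concrete example of the linear codes the thesis builds on, a Hamming(7,4) encoder and single-error corrector can be sketched as follows. This illustrates the code family only, not the thesis's actual diagram construction:

```python
# Hamming(7,4): generator rows G and parity-check rows H over GF(2).
# Every codeword c = m*G satisfies H*c = 0; a single flipped bit
# produces a nonzero syndrome equal to the H-column at its position.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def encode(msg):
    """Multiply the 4-bit message by G over GF(2)."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]

def correct(word):
    """Correct at most one flipped bit using the syndrome."""
    syndrome = [sum(h * w for h, w in zip(row, word)) % 2 for row in H]
    if any(syndrome):
        # Locate the H-column that matches the syndrome; flip that bit.
        pos = [list(col) for col in zip(*H)].index(syndrome)
        word = word.copy()
        word[pos] ^= 1
    return word

codeword = encode([1, 0, 1, 1])           # -> [1, 0, 1, 1, 0, 1, 0]
faulty = codeword.copy()
faulty[2] ^= 1                            # a single fault flips one bit
assert correct(faulty) == codeword
```

In the thesis's setting, the redundancy of such a code is folded into the decision-diagram representation itself, so a single faulty path can be tolerated without a separate checker circuit.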