63,324 research outputs found

    Computing in the RAIN: a reliable array of independent nodes

    Get PDF
    The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN-technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly-available video server, a highly-available Web server, and a distributed checkpointing system. Also, we describe a commercial product, Rainwall, built with the RAIN technology

    The Raincore Distributed Session Service for Networking Elements

    Get PDF
    Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer protocols for networking elements. These protocols are designed to enable a network cluster to share the state information necessary for balancing network traffic and computation load among a group of networking elements. In addition, in the presence of failures, they allow network traffic to fail-over from failed networking elements to healthy ones. To maximize the overall network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity’s first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validates our design assumptions and goals

    Development of a selftriggered high counting rate ASIC for readout of 2D gas microstrip neutron detectors

    Get PDF
    In the frame of the DETNI project a 32-channel ASIC suitable for readout of a novel 2D thermal neutron detector based on a hybrid low-pressure Micro-Strip Gas Chamber with solid 157Gd converter has been developed. Each channel delivers position information, a fast time stamp of 2 ns resolution and the signal amplitude (called energy below). The time stamp is used for correlating the signals from X and Y strips while the amplitude is used for finding the center of gravity of a cluster of strips. The timing and energy information are stored in derandomizing buffers and read out via token ring architecture

    Telemetry downlink interfaces and level-zero processing

    Get PDF
    The technical areas being investigated are as follows: (1) processing of space to ground data frames; (2) parallel architecture performance studies; and (3) parallel programming techniques. Additionally, the University administrative details and the technical liaison between New Mexico State University and Goddard Space Flight Center are addressed

    Algebraic Models for Contextual Nets

    No full text
    We extend the algebraic approach of Meseguer and Montanari from ordinary place/transition Petri nets to contextual nets, covering both the collective and the individual token philosophy uniformly along the two interpretations of net behaviors

    Spacelab system analysis: A study of the Marshall Avionics System Testbed (MAST)

    Get PDF
    An analysis of the Marshall Avionics Systems Testbed (MAST) communications requirements is presented. The average offered load for typical nodes is estimated. Suitable local area networks are determined

    Parameterized Synthesis

    Full text link
    We study the synthesis problem for distributed architectures with a parametric number of finite-state components. Parameterized specifications arise naturally in a synthesis setting, but thus far it was unclear how to detect realizability and how to perform synthesis in a parameterized setting. Using a classical result from verification, we show that for a class of specifications in indexed LTL\X, parameterized synthesis in token ring networks is equivalent to distributed synthesis in a network consisting of a few copies of a single process. Adapting a well-known result from distributed synthesis, we show that the latter problem is undecidable. We describe a semi-decision procedure for the parameterized synthesis problem in token rings, based on bounded synthesis. We extend the approach to parameterized synthesis in token-passing networks with arbitrary topologies, and show applicability on a simple case study. Finally, we sketch a general framework for parameterized synthesis based on cutoffs and other parameterized verification techniques.Comment: Extended version of TACAS 2012 paper, 29 page

    Crux: Locality-Preserving Distributed Services

    Full text link
    Distributed systems achieve scalability by distributing load across many machines, but wide-area deployments can introduce worst-case response latencies proportional to the network's diameter. Crux is a general framework to build locality-preserving distributed systems, by transforming an existing scalable distributed algorithm A into a new locality-preserving algorithm ALP, which guarantees for any two clients u and v interacting via ALP that their interactions exhibit worst-case response latencies proportional to the network latency between u and v. Crux builds on compact-routing theory, but generalizes these techniques beyond routing applications. Crux provides weak and strong consistency flavors, and shows latency improvements for localized interactions in both cases, specifically up to several orders of magnitude for weakly-consistent Crux (from roughly 900ms to 1ms). We deployed on PlanetLab locality-preserving versions of a Memcached distributed cache, a Bamboo distributed hash table, and a Redis publish/subscribe. Our results indicate that Crux is effective and applicable to a variety of existing distributed algorithms.Comment: 11 figure
    corecore