63,324 research outputs found
Computing in the RAIN: a reliable array of independent nodes
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
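The abstract's third contribution, storage schemes built on computationally efficient error-control codes, can be illustrated with the simplest such code. The sketch below is an assumption for illustration, not the code used in RAIN: it is a single-parity XOR scheme that stores one block per node and tolerates the loss of any one node. The function names `xor_blocks`, `encode`, and `recover` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one XOR parity block (one per node)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, lost_index):
    """Rebuild the block at lost_index by XORing all surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


# Three data nodes plus one parity node; node 1 is then assumed lost.
stored = encode([b"node", b"fail", b"safe"])
rebuilt = recover(stored, 1)
```

Encoding and recovery use only XOR operations, which is what makes codes of this family cheap to compute. Tolerating more than one simultaneous failure requires a stronger code than this single-parity example.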
The Raincore Distributed Session Service for Networking Elements
Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer protocols for networking elements. These protocols are designed to enable a network cluster to share the state information necessary for balancing network traffic and computation load among a group of networking elements. In addition, in the presence of failures, they allow network traffic to fail over from failed networking elements to healthy ones. To maximize the overall network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that the Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity’s first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validate our design assumptions and goals.
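To show how a single circulating token can give both mutual exclusion and consistently ordered multicast over unicast links, here is a minimal in-process sketch. It is an assumption for illustration only and not the Raincore protocol itself: it omits failure detection, token regeneration, and membership changes, and the `Node` class, `run_ring` function, and token layout are hypothetical.

```python
from collections import deque


class Node:
    """A cluster member with messages waiting to be sent and a delivery log."""

    def __init__(self, name):
        self.name = name
        self.outbox = deque()  # messages waiting for the token
        self.delivered = []    # (sequence, sender, payload) in delivery order


def run_ring(nodes, rounds):
    """Pass one token around the ring for `rounds` full circuits."""
    token = {"seq": 0}  # hypothetical token: carries the next sequence number
    for _ in range(rounds):
        for holder in nodes:
            # Only the token holder may send, which gives mutual exclusion.
            while holder.outbox:
                msg = (token["seq"], holder.name, holder.outbox.popleft())
                token["seq"] += 1
                # Unicast to each member in turn; no broadcast medium is assumed.
                for member in nodes:
                    member.delivered.append(msg)
    return token["seq"]


a, b, c = Node("a"), Node("b"), Node("c")
a.outbox.append("x")
c.outbox.append("y")
b.outbox.append("z")
sent = run_ring([a, b, c], rounds=1)
```

Because sequence numbers are assigned only while a node holds the token, every member ends up with the same delivery order regardless of when each message was queued.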
Development of a selftriggered high counting rate ASIC for readout of 2D gas microstrip neutron detectors
Within the DETNI project, a 32-channel ASIC has been developed for readout of a novel 2D thermal neutron detector based on a hybrid low-pressure Micro-Strip Gas Chamber with a solid 157Gd converter. Each channel delivers position information, a fast time stamp of 2 ns resolution and the signal amplitude (called energy below). The time stamp is used for correlating the signals from X and Y strips, while the amplitude is used for finding the center of gravity of a cluster of strips. The timing and energy information are stored in derandomizing buffers and read out via a token ring architecture.
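The center-of-gravity step mentioned above can be sketched as an amplitude-weighted mean of strip positions. This is a minimal illustration under assumed inputs, not the DETNI readout code: the function name `center_of_gravity` and the `(strip_index, amplitude)` tuple layout are hypothetical.

```python
def center_of_gravity(strips):
    """Return the amplitude-weighted mean strip position of one cluster.

    strips: iterable of (strip_index, amplitude) pairs for a single cluster.
    """
    strips = list(strips)
    total = sum(amplitude for _, amplitude in strips)
    if total <= 0:
        raise ValueError("cluster has no positive total amplitude")
    return sum(index * amplitude for index, amplitude in strips) / total


# A symmetric three-strip cluster centered on strip 11.
position = center_of_gravity([(10, 1.0), (11, 2.0), (12, 1.0)])
```

Weighting by amplitude lets the reconstructed position fall between strips, so the position resolution can be finer than the strip pitch.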
Telemetry downlink interfaces and level-zero processing
The technical areas being investigated are as follows: (1) processing of space-to-ground data frames; (2) parallel architecture performance studies; and (3) parallel programming techniques. Additionally, the University administrative details and the technical liaison between New Mexico State University and Goddard Space Flight Center are addressed.
Algebraic Models for Contextual Nets
We extend the algebraic approach of Meseguer and Montanari from ordinary place/transition Petri nets to contextual nets, covering both the collective and the individual token philosophy uniformly along the two interpretations of net behaviors.
Spacelab system analysis: A study of the Marshall Avionics System Testbed (MAST)
An analysis of the Marshall Avionics Systems Testbed (MAST) communications requirements is presented. The average offered load for typical nodes is estimated. Suitable local area networks are determined.
Parameterized Synthesis
We study the synthesis problem for distributed architectures with a
parametric number of finite-state components. Parameterized specifications
arise naturally in a synthesis setting, but thus far it was unclear how to
detect realizability and how to perform synthesis in a parameterized setting.
Using a classical result from verification, we show that for a class of
specifications in indexed LTL\X, parameterized synthesis in token ring networks
is equivalent to distributed synthesis in a network consisting of a few copies
of a single process. Adapting a well-known result from distributed synthesis,
we show that the latter problem is undecidable. We describe a semi-decision
procedure for the parameterized synthesis problem in token rings, based on
bounded synthesis. We extend the approach to parameterized synthesis in
token-passing networks with arbitrary topologies, and show applicability on a
simple case study. Finally, we sketch a general framework for parameterized
synthesis based on cutoffs and other parameterized verification techniques.
Comment: Extended version of TACAS 2012 paper, 29 pages
Crux: Locality-Preserving Distributed Services
Distributed systems achieve scalability by distributing load across many
machines, but wide-area deployments can introduce worst-case response latencies
proportional to the network's diameter. Crux is a general framework to build
locality-preserving distributed systems, by transforming an existing scalable
distributed algorithm A into a new locality-preserving algorithm ALP, which
guarantees for any two clients u and v interacting via ALP that their
interactions exhibit worst-case response latencies proportional to the network
latency between u and v. Crux builds on compact-routing theory, but generalizes
these techniques beyond routing applications. Crux provides weak and strong
consistency flavors, and shows latency improvements for localized interactions
in both cases, specifically up to several orders of magnitude for
weakly-consistent Crux (from roughly 900ms to 1ms). We deployed on PlanetLab
locality-preserving versions of a Memcached distributed cache, a Bamboo
distributed hash table, and a Redis publish/subscribe service. Our results indicate
that Crux is effective and applicable to a variety of existing distributed
algorithms.
Comment: 11 figures