Computing in the RAIN: a reliable array of independent nodes
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on clustered solutions that improve the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
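The abstract does not name the error-control codes RAIN uses for storage. As a hedged illustration of the general idea only, the sketch below swaps in the simplest erasure code, a single XOR parity block (RAID-5 style), which lets any one lost node's block be rebuilt from the survivors. It tolerates only one failure, unlike the multi-failure array codes a system such as RAIN would need; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Store the data blocks plus one parity block (one block per node)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, lost_index):
    """Rebuild the block held by a single failed node from all other nodes."""
    survivors = [block for i, block in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


stored = encode([b"abcd", b"efgh", b"ijkl"])  # hypothetical 4-node layout
rebuilt = recover(stored, 1)                  # node 1 has failed
```

Because XOR is its own inverse, XOR-ing every surviving block (data plus parity) cancels everything except the missing block.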
Collision Avoidance Tree networks
The Collision Avoidance Tree is a new local area network based on a hardware device called the collision avoidance switch, which arbitrates random access to a shared communications channel. The Collision Avoidance Tree combines the benefits of random access (low delay when traffic is light; simple, distributed, and therefore robust protocols) with concurrent transmission, excellent network utilization, and suitability for high-speed optical networking. The Collision Avoidance Tree comes in two classes: the Collision Avoidance Single Broadcast (CASB) Tree and the Collision Avoidance Multiple Broadcast (CAMB) Tree. The CASB Tree allows only a single transmission on the network at a given time, while the CAMB Tree is more general and allows concurrent transmissions. This paper describes the network architectures (e.g., station and switch protocols) and the designs and implementations of the CASB and CAMB Trees. Performance results derived from analyses, simulations, and measurements of experimental networks are also presented.
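A toy model can make the arbitration idea concrete. The sketch below is hypothetical and is not the paper's switch protocol: each switch in the tree passes exactly one contending input upward and marks the others as blocked (rather than letting them collide), so a single transmission reaches the root, as in the CASB Tree. Port order stands in for arrival order here; timing, the broadcast back down the tree, and CAMB-style concurrency are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Station:
    name: str
    wants_to_send: bool = False


@dataclass
class Switch:
    ports: List[Union["Switch", Station]] = field(default_factory=list)


def arbitrate(node, blocked: List[str]) -> Optional[str]:
    """Return the station whose transmission passes through this node.

    Losing contenders are appended to `blocked` instead of colliding.
    Assumption: the lowest-numbered active port wins (a stand-in for
    "first to arrive").
    """
    if isinstance(node, Station):
        return node.name if node.wants_to_send else None
    winner = None
    for port in node.ports:
        candidate = arbitrate(port, blocked)
        if candidate is None:
            continue
        if winner is None:
            winner = candidate          # this input gets the upward link
        else:
            blocked.append(candidate)   # blocked; the station would retry later
    return winner


tree = Switch([Switch([Station("a"), Station("b", True)]),
               Switch([Station("c", True), Station("d", True)])])
blocked: List[str] = []
winner = arbitrate(tree, blocked)
```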
Analysis of a class of distributed queues with application
Recently we have developed a class of media access control algorithms for different types of local area networks (LANs). A common feature of these LAN algorithms is that they represent various strategies by which the processors in the LAN can simulate the availability of a centralized packet transport facility whose service incorporates a particular type of changeover time known as 'moving server' overhead. First, we describe the operation of moving server systems in general, for both First-Come-First-Served and Head-of-the-Line orders of service, together with an approach to their delay analysis in which we transform the moving server queueing system into a conventional queueing system having proportional waiting times. Then we describe how the various LAN algorithms may be obtained from the ideal moving server system, and how a significant component of their performance characteristics is determined by the performance characteristics of that ideal system. Finally, we evaluate the compatibility of such LAN algorithms with separable queueing network models of distributed systems by computing the interdeparture time distribution for M/M/1 in the presence of moving server overhead. Although this distribution is not exponential, except in the limits of low server utilization or low overhead, it is a weighted sum of exponential terms with a coefficient of variation not much smaller than unity. Thus, we conjecture that a service centre with moving server overhead could be used to represent one of these LAN algorithms in a product-form queueing network model of a distributed system without introducing significant approximation errors.
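For reference, the baseline the abstract compares against is a plain M/M/1 queue with no moving server overhead: by Burke's theorem its stationary departure process is Poisson, so interdeparture times are exponential with a coefficient of variation (CV) of exactly 1. The sketch below estimates that CV by simulation. It does not model the paper's overhead; the function name and parameters are hypothetical.

```python
import random
import statistics


def mm1_interdeparture_cv(lam, mu, n_customers=200_000, seed=7):
    """Estimate the CV of interdeparture times in an FCFS M/M/1 queue."""
    rng = random.Random(seed)
    arrival = 0.0
    departure = 0.0
    departures = []
    for _ in range(n_customers):
        arrival += rng.expovariate(lam)          # Poisson arrivals, rate lam
        start = max(arrival, departure)          # wait while the server is busy
        departure = start + rng.expovariate(mu)  # exponential service, rate mu
        departures.append(departure)
    gaps = [b - a for a, b in zip(departures, departures[1:])]
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


cv = mm1_interdeparture_cv(lam=0.5, mu=1.0)  # utilization 0.5
```

The estimate should land close to 1. A CV "not much smaller than unity", as reported for the overhead case, is what makes the product-form approximation plausible.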
Crux: Locality-Preserving Distributed Services
Distributed systems achieve scalability by distributing load across many machines, but wide-area deployments can introduce worst-case response latencies proportional to the network's diameter. Crux is a general framework for building locality-preserving distributed systems. It transforms an existing scalable distributed algorithm A into a new locality-preserving algorithm A_LP, which guarantees that, for any two clients u and v interacting via A_LP, their interactions exhibit worst-case response latencies proportional to the network latency between u and v. Crux builds on compact-routing theory but generalizes these techniques beyond routing applications. Crux provides weakly and strongly consistent flavors and shows latency improvements for localized interactions in both cases, of up to several orders of magnitude for weakly consistent Crux (from roughly 900 ms to 1 ms). We deployed locality-preserving versions of a Memcached distributed cache, a Bamboo distributed hash table, and a Redis publish/subscribe service on PlanetLab. Our results indicate that Crux is effective and applicable to a variety of existing distributed algorithms. Comment: 11 figures
Queueing models for token and slotted ring networks
Currently, the end-to-end delay characteristics of very high speed local area networks are not well understood. The transmission speed of computer networks is increasing, and local area networks in particular are finding increasing use in real-time systems. Ring network operation is generally well understood for both token rings and slotted rings. There is, however, a severe lack of queueing models for higher-layer operation. Several factors contribute to the processing delay of a packet, as opposed to its transmission delay, e.g., packet priority, packet length, the user load, the processor load, the use of priority preemption, the use of preemption at packet reception, the number of processors, the number of protocol processing layers, the speed of each processor, and queue length limitations. Existing medium access queueing models are extended by adding modeling techniques that handle exhaustive limited service both with and without priority traffic, and the modeling capabilities are extended into the upper layers of the OSI model. Some of the models are given as solution methods rather than parameterized solutions, since it is shown that certain models do not exist as parameterized solutions.
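To illustrate why the service discipline matters in a token ring, the sketch below simulates a symmetric polling model. This is not the paper's model. Each of `n_stations` stations receives Poisson traffic, the token walks between stations in a fixed time `walk`, and each visit serves at most `limit` packets (k-limited service). `limit=None` gives exhaustive service, where the station transmits until its queue is empty. All names and parameter values are hypothetical.

```python
import math
import random


def token_ring_mean_wait(n_stations, lam, service, walk, limit=None,
                         horizon=100_000.0, seed=1):
    """Mean packet waiting time in a symmetric token-ring (polling) model.

    limit=None: exhaustive service; limit=k: at most k packets per token visit.
    """
    rng = random.Random(seed)
    arrivals = []  # Poisson arrival times per station, generated up front
    for _ in range(n_stations):
        times, t = [], 0.0
        while True:
            t += rng.expovariate(lam)
            if t > horizon:
                break
            times.append(t)
        arrivals.append(times)

    head = [0] * n_stations  # index of the oldest unserved packet per station
    budget_per_visit = math.inf if limit is None else limit
    t, station = 0.0, 0
    total_wait, served = 0.0, 0
    while t < horizon:
        queue, i, budget = arrivals[station], head[station], budget_per_visit
        # Serve packets that have already arrived, up to the visit budget.
        while budget > 0 and i < len(queue) and queue[i] <= t:
            total_wait += t - queue[i]   # time the packet waited for transmission
            t += service                 # fixed transmission time
            served += 1
            budget -= 1
            i += 1
        head[station] = i
        t += walk                        # token passes to the next station
        station = (station + 1) % n_stations
    return total_wait / served


# Hypothetical load: 4 stations, total utilization 0.6, walk time 0.1.
w_exhaustive = token_ring_mean_wait(4, 0.15, 1.0, 0.1)
w_one_limited = token_ring_mean_wait(4, 0.15, 1.0, 0.1, limit=1)
```

Both runs use the same seed, so they see identical arrivals (common random numbers). Any difference in mean wait therefore comes from the discipline alone. For symmetric rings, exhaustive service is expected to wait less than 1-limited service.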
Spacelab system analysis: A study of communications systems for advanced launch systems
An analysis of the required performance of internal avionics data bases for future launch vehicles is presented. Suitable local area networks that can service these requirements are determined.
Communicating Personal Gadgets
This paper focuses on communication in personal area networks. A personal area network (PAN) is characterized as an informal collection, or community, of connected small, lightweight, and resource-lean devices, or gadgets. Two basic concepts are visible in the development of PANs: the distributed concept and the centralized concept. The paper introduces a real-time communication protocol that is suitable for both concepts. The protocol can handle several types of traffic: real-time or non-real-time, bursty or isochronous, high or low bitrate. The protocol is undemanding in terms of resources, so even simple devices can participate in the network. The network is simulated and a prototype is realized.
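The abstract does not say how the protocol mixes isochronous and bursty traffic. One common way to serve both is a slotted frame, and the sketch below illustrates that approach under stated assumptions, not as the paper's protocol. Isochronous devices get a fixed number of reserved slots in every frame, and the leftover slots are shared round-robin among devices with pending non-real-time data. `build_frame` and the device names are hypothetical.

```python
def build_frame(slots_per_frame, isochronous, best_effort):
    """Assign the slots of one frame to devices.

    isochronous: dict device -> slots reserved in every frame (real-time traffic).
    best_effort: list of (device, pending_slots) for bursty non-real-time traffic.
    Unused slots are returned as None.
    """
    reserved = sum(isochronous.values())
    if reserved > slots_per_frame:
        raise ValueError("isochronous demand exceeds frame capacity")
    frame = []
    for device, n in isochronous.items():  # reserved slots come first
        frame.extend([device] * n)
    free = slots_per_frame - reserved
    pending = dict(best_effort)             # preserves the given order
    while free > 0 and any(v > 0 for v in pending.values()):
        for device in pending:              # round-robin over leftover slots
            if free == 0:
                break
            if pending[device] > 0:
                frame.append(device)
                pending[device] -= 1
                free -= 1
    frame.extend([None] * free)
    return frame


frame = build_frame(8, {"headset": 2, "camera": 3}, [("sensor", 5), ("pda", 1)])
```

Admission control is the `ValueError`. A device that asks for more reserved capacity than one frame holds is refused instead of degrading the other real-time flows.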
Scalability of broadcast performance in wireless network-on-chip
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) in which all cores share a single broadband channel is presented. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the remaining communication flows. To assess the feasibility of this approach, the network performance of the WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC. Peer Reviewed. Postprint (published version).
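The dependence on system size and channel capacity can be seen with back-of-the-envelope arithmetic. On a single shared channel, one transmission reaches every core, so a broadcast costs one packet time. The channel's total capacity is fixed, however, so the broadcast rate each core can sustain shrinks as 1/N. The sketch below assumes an ideal MAC with no collisions or guard times, and the channel rate and packet size are hypothetical values, not figures from the paper.

```python
def broadcast_serialization_time(packet_bits, channel_bps):
    """Seconds one broadcast occupies the shared channel (ideal MAC)."""
    return packet_bits / channel_bps


def per_core_broadcast_rate(channel_bps, packet_bits, n_cores):
    """Broadcasts per second each core can inject if all cores share the channel equally."""
    return channel_bps / packet_bits / n_cores


CHANNEL_BPS = 64e9   # hypothetical shared-channel capacity
PACKET_BITS = 512    # hypothetical broadcast packet size

t_bcast = broadcast_serialization_time(PACKET_BITS, CHANNEL_BPS)
for n in (64, 256, 1024):
    print(n, per_core_broadcast_rate(CHANNEL_BPS, PACKET_BITS, n))
```

This ideal bound is why the MAC protocol design matters. Any arbitration overhead or collisions cut directly into a per-core budget that is already shrinking with the core count.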
Spacelab system analysis: A study of the Marshall Avionics System Testbed (MAST)
An analysis of the Marshall Avionics Systems Testbed (MAST) communications requirements is presented. The average offered load for typical nodes is estimated. Suitable local area networks are determined.
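Average offered load is the traffic the nodes try to send, expressed as a fraction of the network's capacity. The sketch below computes it as the sum of message rate times message size over all nodes, divided by the link bit rate. The node figures and link rate in the example are hypothetical and are not the MAST estimates.

```python
def offered_load(nodes, link_bps):
    """Average offered load as a fraction of link capacity.

    nodes: iterable of (messages_per_second, bits_per_message), one pair per node.
    """
    return sum(rate * bits for rate, bits in nodes) / link_bps


# Hypothetical: two nodes on a 1 Mbit/s link.
load = offered_load([(100, 1000), (50, 2000)], 1_000_000)
```

A candidate network is only suitable if this value stays comfortably below 1, leaving headroom for protocol overhead and bursts.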