    Tolerating multiple faults in multistage interconnection networks with minimal extra stages

    Adams and Siegel (1982) proposed an extra stage cube interconnection network that tolerates one switch failure with one extra stage. We extend their results and discover a class of extra stage interconnection networks that tolerate multiple switch failures with a minimal number of extra stages. Adopting the same fault model as Adams and Siegel, the faulty switches can be bypassed by a pair of demultiplexer/multiplexer combinations. It is easy to show that, to maintain point-to-point and broadcast connectivity, there must be at least f extra stages to tolerate f switch failures. We present the first known construction of an extra stage interconnection network that meets this lower bound. This n-dimensional multistage interconnection network has n+f stages and tolerates f switch failures. An n-bit label called a mask is used for each stage; it indicates the bit differences between the two inputs coming into a common switch. We designed the fault-tolerant construction such that it repeatedly uses the singleton basis of the n-dimensional vector space as the stage mask vectors. This construction is further generalized, and we prove that an n-dimensional multistage interconnection network is optimally fault-tolerant if and only if the mask vectors of every n consecutive stages span the n-dimensional vector space.
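
    The spanning condition at the end of this abstract can be checked mechanically. The sketch below is an illustration rather than the authors' code: it assumes each stage mask is an n-bit integer and that "span" means rank n over GF(2); the helper names `gf2_rank`, `every_window_spans`, and `singleton_masks` are hypothetical.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of bit vectors encoded as non-negative ints."""
    pivots = {}  # leading-bit position -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v  # v is independent of the basis so far
                break
            v ^= pivots[top]     # eliminate the leading bit and retry
    return len(pivots)


def every_window_spans(masks, n):
    """True if the masks of every n consecutive stages have rank n over GF(2)."""
    return all(
        gf2_rank(masks[i:i + n]) == n
        for i in range(len(masks) - n + 1)
    )


def singleton_masks(n, f):
    """Masks for n + f stages that cycle through the singleton basis e_0..e_{n-1}."""
    return [1 << (stage % n) for stage in range(n + f)]


if __name__ == "__main__":
    masks = singleton_masks(3, 2)           # stages use e_0, e_1, e_2, e_0, e_1
    print(every_window_spans(masks, 3))     # cyclic singleton basis passes
    print(every_window_spans([1, 2, 4, 2, 1], 3))  # window (2, 4, 2) fails
```

    The check covers only the algebraic condition stated in the abstract; it does not model switches, multiplexers, or routing.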

    Fault-tolerant onboard digital information switching and routing for communications satellites

    The NASA Lewis Research Center is developing an information-switching processor for future meshed very-small-aperture terminal (VSAT) communications satellites. The information-switching processor will switch and route baseband user data onboard the VSAT satellite to connect thousands of Earth terminals. Fault tolerance is a critical issue in developing information-switching processor circuitry that will provide and maintain reliable communications services. In parallel with the conceptual development of the meshed VSAT satellite network architecture, NASA designed and built a simple test bed for developing and demonstrating baseband switch architectures and fault-tolerance techniques. The meshed VSAT architecture and the switching demonstration test bed are described, and the initial switching architecture and the fault-tolerance techniques that were developed and tested are discussed

    Multicast in DKS(N, k, f) Overlay Networks

    Recent developments in the area of peer-to-peer computing show that structured overlay networks implementing distributed hash tables scale well and can serve as infrastructures for Internet-scale applications. We are developing a family of infrastructures, DKS(N, k, f), for the construction of peer-to-peer applications. An instance of DKS(N, k, f) is an overlay network that implements a distributed hash table and has a number of desirable properties: low cost of communication, scalability, logarithmic lookup length, fault tolerance, and strong guarantees of locating any data item that was inserted in the system. In this paper, we show how multicast is achieved in DKS(N, k, f) overlay networks. The design presented here is attractive in three main respects. First, members of a multicast group self-organize in an instance of DKS(N, k, f) in a way that allows co-existence of groups of different sizes, degrees of fault tolerance, and maintenance costs, thereby providing flexibility. Second, each member of a group can multicast, rather than having single-source multicast. Third, within a group, dissemination of a multicast message is optimal under normal system operation in the sense that there are no redundant messages despite the presence of outdated routing information.

    Self-Correcting Broadcast in Distributed Hash Tables

    We present two broadcast algorithms that can be used on top of distributed hash tables (DHTs) to perform group communication and arbitrary queries. Unlike other P2P group communication mechanisms, which either embed extra information in the DHTs or use random overlay networks, our algorithms take advantage of the structured DHT overlay networks without maintaining additional information. The proposed algorithms do not send any redundant messages. Furthermore, the two algorithms ensure 100% coverage of the nodes in the system even when routing information is outdated as a result of dynamism in the network. The first algorithm performs some correction of outdated routing table entries with a low cost of correction traffic. The second algorithm exploits the nature of the broadcasts to extensively update erroneous routing information at the cost of higher correction traffic. The algorithms are validated and evaluated in our stochastic distributed-algorithms simulator.
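
    The abstract does not spell out the algorithms, so the following is only a sketch of one common way to broadcast over a structured overlay without redundant messages: each node forwards to its routing-table entries and delegates a disjoint slice of the identifier ring to each. It assumes a Chord-like ring with correct, power-of-two finger tables and omits the self-correction step entirely; all names (`build_fingers`, `broadcast`, `in_open_interval`) are hypothetical.

```python
from collections import deque


def in_open_interval(x, a, b, size):
    """True if x lies strictly between a and b, moving clockwise on the ring."""
    span = (b - a) % size or size  # a == b means the whole ring except a
    return 0 < (x - a) % size < span


def build_fingers(nodes, size):
    """Finger table per node: successors of n + 1, n + 2, n + 4, ... (deduplicated)."""
    ring = sorted(nodes)

    def successor(ident):
        return next((n for n in ring if n >= ident), ring[0])

    fingers = {}
    for n in ring:
        table = []
        step = 1
        while step < size:
            s = successor((n + step) % size)
            if s != n and s not in table:
                table.append(s)  # entries end up in clockwise order
            step *= 2
        fingers[n] = table
    return fingers


def broadcast(fingers, source, size):
    """Deliver one message to every node; return (deliveries per node, messages sent)."""
    deliveries = {n: 0 for n in fingers}
    deliveries[source] = 1
    messages = 0
    queue = deque([(source, source)])  # (node, limit): node covers (node, limit)
    while queue:
        node, limit = queue.popleft()
        table = fingers[node]
        for i, finger in enumerate(table):
            if not in_open_interval(finger, node, limit, size):
                break  # remaining fingers lie outside this node's slice
            nxt = table[i + 1] if i + 1 < len(table) else limit
            # The finger covers the ring up to the next finger, or up to limit.
            new_limit = nxt if in_open_interval(nxt, node, limit, size) else limit
            deliveries[finger] += 1
            messages += 1
            queue.append((finger, new_limit))
    return deliveries, messages


if __name__ == "__main__":
    nodes = [1, 5, 9, 14, 20, 27, 33, 40, 52, 60]
    deliveries, messages = broadcast(build_fingers(nodes, 64), 9, 64)
    print(deliveries, messages)
```

    With accurate tables, every node is reached exactly once and the number of messages is one less than the number of nodes, which is the "no redundant messages" property in its simplest setting.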

    Exploiting the Synergy Between Gossiping and Structured Overlays

    In this position paper we argue for exploiting the synergy between gossip-based algorithms and structured overlay networks (SONs). These two strands of research have both aimed at building fault-tolerant, dynamic, self-managing, and large-scale distributed systems. Despite the common goals, the two areas have been relatively isolated. We focus on three problem domains where there is untapped potential in combining gossiping with SONs. We argue for applying gossip-based membership to ring-based SONs (such as Chord and Bamboo) so that they can handle partition mergers and loopy networks. We argue that small-world SONs (such as Accordion and Mercury) are particularly well suited to gossip-based membership management. The benefits would be better graph-theoretic properties. Finally, we argue that gossip-based algorithms could use the overlay constructed by SONs. For example, many unreliable broadcast algorithms for SONs could be augmented with anti-entropy protocols. Similarly, gossip-based aggregation could be used in SONs for network size estimation and load-balancing purposes.
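
    As a rough illustration of gossip-based aggregation for network size estimation, the sketch below uses pairwise averaging: one node starts with the value 1, all others with 0, and repeated averaging drives every value toward 1/N. It assumes idealized uniform peer sampling and no churn, which a real overlay would only approximate; the function name `gossip_size_estimates` is hypothetical and the paper does not prescribe this protocol.

```python
import random


def gossip_size_estimates(n_nodes, rounds, rng):
    """Simulate pairwise averaging and return each node's estimate of n_nodes."""
    values = [0.0] * n_nodes
    values[0] = 1.0  # a single initiator holds all the "mass"
    for _ in range(rounds * n_nodes):
        a, b = rng.sample(range(n_nodes), 2)  # two distinct peers exchange state
        values[a] = values[b] = (values[a] + values[b]) / 2
    # The sum of values stays 1, so each value approaches 1 / n_nodes.
    return [1 / v if v > 0 else float("inf") for v in values]


if __name__ == "__main__":
    estimates = gossip_size_estimates(50, 60, random.Random(0))
    print(min(estimates), max(estimates))
```

    In a structured overlay, the peer for each exchange could be drawn from the routing table instead of a global sample, which is the kind of combination the abstract argues for.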

    Self-stabilizing tree algorithms

    Designers of distributed algorithms have to contend with the problem of making the algorithms tolerant to several forms of coordination loss, primarily faulty initialization. The processes in a distributed system do not share a global memory and can only get a partial view of the global state. Transient failures in one part of the system may go unnoticed in other parts and thus cause the system to go into an illegal state. If the system is self-stabilizing, however, it is guaranteed to return to a legal state after a finite number of state transitions. This thesis presents and proves self-stabilizing algorithms for calculating tree metrics and for achieving mutual exclusion on a tree-structured distributed system.
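
    To make the idea of self-stabilization concrete, here is a minimal sketch for one tree metric, node height. It is not taken from the thesis: it assumes a fixed rooted tree, a central daemon that activates one inconsistent node at a time, and the local rule "height is 0 for a leaf, otherwise 1 plus the largest child height". The names `local_height` and `stabilize` are hypothetical.

```python
import random


def local_height(children, heights, v):
    """Height that node v should hold, judged only from its children's values."""
    kids = children.get(v, [])
    return 1 + max(heights[c] for c in kids) if kids else 0


def stabilize(children, heights, rng, max_moves=10_000):
    """Apply the local rule until no node is inconsistent; return the move count."""
    nodes = list(heights)
    for moves in range(max_moves):
        privileged = [
            v for v in nodes
            if heights[v] != local_height(children, heights, v)
        ]
        if not privileged:
            return moves  # legal state: every node satisfies the rule
        v = rng.choice(privileged)  # the daemon picks any inconsistent node
        heights[v] = local_height(children, heights, v)
    raise RuntimeError("did not stabilize within max_moves")


if __name__ == "__main__":
    children = {0: [1, 2], 1: [3, 4], 2: [], 3: [5], 4: [], 5: []}
    heights = {0: 7, 1: 0, 2: 3, 3: 9, 4: 1, 5: 4}  # arbitrary corrupted start
    stabilize(children, heights, random.Random(1))
    print(heights)
```

    Starting from any assignment of heights, the leaves settle first and correct values propagate upward, so the system reaches the legal state without any global reset.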
