74 research outputs found

    Reliability Analysis of the Hypercube Architecture.

    This dissertation presents improved techniques for analyzing network-connected (NCF), 2-connected (2CF), task-based (TBF), and subcube (SF) functionality measures in a hypercube multiprocessor with faulty processing elements (PEs) and/or communication elements (CEs). These measures help in studying system-level fault tolerance issues and relate to various application modes of the hypercube. The solutions discussed fall into probabilistic and deterministic models. The probabilistic model assumes a stochastic graph of the hypercube in which PEs and/or CEs may fail with certain probabilities, while the deterministic model assumes that some system components have already failed and aims to determine the remaining system functionality. For the probabilistic model, MIL-HDBK-217F is used to predict PE and CE failure rates for an Intel iPSC system. First, a technique called CAREL is presented; a proof of its correctness is included in an appendix. Using the shelling ordering concept, CAREL is shown to solve the exact probabilistic NCF measure for a hypercube in time polynomial in the number of spanning trees. However, this number increases exponentially with the hypercube dimension. The dissertation therefore aims to obtain lower and upper bounds on the measures more efficiently. The algorithms presented generate tighter bounds than had been obtained previously and run in time polynomial in the cube dimension. The proposed algorithms for the probabilistic 2CF measure consider PE and/or CE failures. For the deterministic measures, a hybrid method for fault-tolerant broadcasting in the hypercube is proposed. This method combines the favorable features of redundant and non-redundant techniques. A generalized result on the deterministic TBF measure for the hypercube is then described. Two distributed algorithms are proposed to identify the largest operational subcubes in a hypercube C_n with faulty PEs. Method 1, called LOS1, requires a list of faulty components and uses the CMB operator of CAREL to solve the problem. For cases where the number of unavailable nodes (faulty or busy) increases, an alternative distributed approach, called LOS2, processes m available nodes in O(mn) time. The proposed techniques are simple and efficient.
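
To make the "largest operational subcube" problem concrete, here is a minimal centralized brute-force sketch. It is not LOS1 or LOS2 (the abstract does not give their details), and it takes exponential time. The function name and the `*` pattern notation for free dimensions are assumptions made for illustration.

```python
from itertools import combinations, product

def largest_fault_free_subcube(n, faulty):
    """Brute-force search for a largest fault-free subcube of the n-cube.

    Nodes are integers 0 .. 2**n - 1. A subcube is written as a pattern
    over {0, 1, *}, most significant bit first; '*' marks a free dimension.
    Returns (dimension, pattern), or None if every node is faulty.
    """
    faulty = set(faulty)
    for k in range(n, -1, -1):                       # try large subcubes first
        for free in combinations(range(n), k):       # which dimensions are free
            fixed = [i for i in range(n) if i not in free]
            mask = sum(1 << i for i in fixed)        # bits that must match
            for bits in product((0, 1), repeat=len(fixed)):
                base = sum(b << i for i, b in zip(fixed, bits))
                # The subcube is fault-free iff no faulty node matches it
                # on all of the fixed bit positions.
                if all((f & mask) != base for f in faulty):
                    pattern = "".join(
                        "*" if i in free else str((base >> i) & 1)
                        for i in range(n - 1, -1, -1)
                    )
                    return k, pattern
    return None

if __name__ == "__main__":
    print(largest_fault_free_subcube(3, {0}))
```

With a single faulty node in a 3-cube, the search finds a 2-dimensional subcube that avoids it. A distributed method such as those the abstract describes would avoid this exhaustive enumeration.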

    Efficient fault-tolerant routing in multihop optical WDM networks

    This paper addresses the problem of efficient routing in unreliable multihop optical networks supported by Wavelength Division Multiplexing (WDM). We first define a new cost model for routing in optical WDM networks that is more general than the existing models. Our model takes into consideration not only the cost of wavelength access and conversion but also the queuing delay for signals that arrive at different input channels and share the same output channel at the same node. We then propose a set of efficient algorithms for a reliable WDM network under the new cost model, one for each of the three most important communication patterns: multiple point-to-point routing, multicast, and multiple multicast. Finally, we show how to obtain a set of efficient algorithms for an unreliable WDM network with up to f faulty optical channels and wavelength conversion gates. Our strategy is to first enhance the physical paths constructed by the algorithms for reliable networks to ensure the success of fault-tolerant routing, and then to route among the enhanced paths to establish a set of fault-free physical routes that complete the routing request for each of the communication patterns.

    Fault-tolerant adaptive and minimal routing in mesh-connected multicomputers using extended safety levels

    Load Redistribution on Hypercubes in the Presence of Faults

    In this paper, we present load redistribution algorithms for hypercubes in the presence of faults. Our algorithms complete in time that is a low-order polynomial in the number of faulty nodes and exhibit excellent experimental performance. These algorithms are topology independent and can be applied to a wide variety of networks.

    Efficient hypercube communications

    Hypercube algorithms may be developed for a variety of communication-intensive tasks such as sending a message from one node to another, broadcasting a message from one node to all others, broadcasting a message from each node to all others, all-to-all personalized communication, one-to-all personalized communication, and exchanging messages between nodes via fixed permutations. All these communication patterns are special cases of many-to-many personalized communication, which is the problem investigated here. Two routing algorithms for many-to-many personalized communication are presented. The proposed algorithms yield very high performance with respect to the number of time steps and packet transmissions. The first algorithm achieves this by attempting to equibalance the number of messages at intermediate nodes. This technique tries to avoid creating a bottleneck at any node and thus reduces the total communication time. The second algorithm achieves it through one-step time-lookahead equibalancing: from the candidate intermediate nodes, it chooses the one that will probably have the minimum number of messages in the next cycle.
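
The equibalancing idea can be sketched as a next-hop choice. This is a minimal illustration, not the paper's algorithm: it assumes messages only move along shortest paths, and the `next_hop` function and the `queue_len` dictionary of per-node message counts are hypothetical names.

```python
def next_hop(current, dest, queue_len):
    """Pick the next node on a shortest hypercube path from current to dest.

    Candidates are the neighbors of `current` that correct one differing
    address bit. Among them, the node with the fewest queued messages is
    chosen; ties go to the lowest dimension. `queue_len` maps node -> count,
    and missing nodes are treated as having an empty queue.
    """
    diff = current ^ dest
    if diff == 0:
        return current                      # already at the destination
    candidates = [
        current ^ (1 << i)                  # flip one differing bit
        for i in range(diff.bit_length())
        if (diff >> i) & 1
    ]
    return min(candidates, key=lambda v: queue_len.get(v, 0))

if __name__ == "__main__":
    # From node 000 to node 011 the candidates are 001 and 010.
    print(next_hop(0b000, 0b011, {0b001: 5, 0b010: 1}))
```

A one-step lookahead variant, as described for the second algorithm, would replace the current queue lengths with an estimate of each candidate's queue length in the next cycle; how that estimate is formed is not stated in the abstract.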

    High Performance Software Reconfiguration in the Context of Distributed Systems and Interconnection Networks.

    This work designs algorithms that are useful for developing protocols and supporting tools for fault tolerance, dynamic load balancing, and distributed monitoring in loosely coupled multiprocessor systems. Four efficient algorithms are developed to learn network topology and reconfigure distributed application programs in execution using the available tools for replication and process migration. The first algorithm provides techniques for transparent software reconfiguration based on process migration in the context of quadtree embeddings in hypercubes. Our novel approach provides efficient reconfiguration for some classes of faults that may be identified easily. We provide a theoretical characterization that uses graph matching, quadratic assignment, and a variety of branch and bound techniques to recover from general faults at run time and maintain load balance. The second algorithm provides distributed recognition of articulation points, biconnected components, and bridges. Since the removal of an articulation point disconnects the network, knowledge about it may be used for selective replication. We have obtained the most efficient distributed algorithms, with linear message complexity, for the recognition of these properties. The third algorithm is an optimal, linear message complexity distributed solution for recognizing graph planarity, which is one of the most celebrated problems in graph theory and algorithm design. Recently, efficient shortest path algorithms have been developed for planar graphs, whose efficient recognition itself was left open. Our algorithm also leads to an efficient distributed algorithm for recognizing outer-planar graphs, with applications in Hamiltonian path, shortest path routing, and graph coloring. It is shown that efficient routing of information and distributing the stack needed for planarity testing permit local computations, leading to an efficient distributed algorithm. The fourth algorithm provides software redundancy techniques that give fault tolerance to program structures. We consider the problem of mapping replicated program structures to provide efficient communication between modules in multiple replicas. We have obtained an optimal mapping of 2-replicated binary trees into hypercubes. For replication numbers greater than two, we provide heuristic simulation results that efficiently support both the 'N-version programming' and 'Recovery block' approaches to software replication.
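
For readers unfamiliar with articulation points, the property recognized by the second algorithm above, here is a minimal sequential sketch using the standard depth-first search low-link method. It is not the distributed algorithm of the thesis, it assumes a simple undirected graph given as an adjacency dictionary, and it uses recursion, so it suits only small graphs.

```python
def articulation_points(adj):
    """Return the set of articulation points of an undirected simple graph.

    `adj` maps each node to an iterable of its neighbors. A node is an
    articulation point if removing it disconnects its component.
    """
    disc, low, result = {}, {}, set()
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer            # discovery time and low-link
        timer += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:                   # back edge to an ancestor
                low[u] = min(low[u], disc[v])
            else:                           # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree cannot reach above u without passing through u
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
        if parent is None and children > 1: # root with several DFS subtrees
            result.add(u)

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return result

if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(articulation_points(path))
```

On the three-node path, the middle node is the only articulation point; on a cycle there are none, which is why replicating the services of such nodes selectively, as the abstract suggests, targets exactly the single points of failure.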

    More Improvement by Helping Ant to Fault-Tolerant Heuristic Routing Algorithm in Mesh Networks

    Routing with fault-tolerant mechanisms has a crucial effect on the fast exchange of information in a variety of networks, including mesh networks. This study attempts to choose an optimal path, in terms of fault tolerance, for transmitting messages from source to destination while taking into account faulty nodes in such mesh networks. We take advantage of the ant colony optimization algorithm to propose Adaptive Heuristic Routing algorithms for this problem. We use color pheromone ants to overcome the problem of fail-recover behavior of network components. The proposed method is compared with a fault-tolerant routing algorithm in mesh networks that uses the balanced ring. Simulation results show that this method reacts quickly to network faults, while at each time step the data can choose the optimal path to reach its destination. We further improve the performance of the proposed method by using update ants to inform other nodes about the discovered shortest path. Simulation results show that the proposed method dramatically increases the efficiency of the routing mechanism in mesh networks.