5,997 research outputs found

    Software-based fault-tolerant routing algorithm in multidimensional networks

    Get PDF
    Massively parallel computing systems are being built with hundreds or thousands of components such as nodes, links, memories, and connectors. The failure of a component in such systems will not only reduce the computational power but also alter the network's topology. The software-based fault-tolerant routing algorithm is a popular routing to achieve fault-tolerance capability in networks. This algorithm is initially proposed only for two dimensional networks (Suh et al., 2000). Since, higher dimensional networks have been widely employed in many contemporary massively parallel systems; this paper proposes an approach to extend this routing scheme to these indispensable higher dimensional networks. Deadlock and livelock freedom and the performance of presented algorithm, have been investigated for networks with different dimensionality and various fault regions. Furthermore, performance results have been presented through simulation experiments

    On quantifying fault patterns of the mesh interconnect networks

    Get PDF
    One of the key issues in the design of Multiprocessors System-on-Chip (MP-SoCs), multicomputers, and peerto- peer networks is the development of an efficient communication network to provide high throughput and low latency and its ability to survive beyond the failure of individual components. Generally, the faulty components may be coalesced into fault regions, which are classified into convex and concave shapes. In this paper, we propose a mathematical solution for counting the number of common fault patterns in a 2-D mesh interconnect network including both convex (|-shape, | |-shape, ý-shape) and concave (L-shape, Ushape, T-shape, +-shape, H-shape) regions. The results presented in this paper which have been validated through simulation experiments can play a key role when studying, particularly, the performance analysis of fault-tolerant routing algorithms and measure of a network fault-tolerance expressed as the probability of a disconnection

    Unattended network operations technology assessment study. Technical support for defining advanced satellite systems concepts

    Get PDF
    The results are summarized of an unattended network operations technology assessment study for the Space Exploration Initiative (SEI). The scope of the work included: (1) identified possible enhancements due to the proposed Mars communications network; (2) identified network operations on Mars; (3) performed a technology assessment of possible supporting technologies based on current and future approaches to network operations; and (4) developed a plan for the testing and development of these technologies. The most important results obtained are as follows: (1) addition of a third Mars Relay Satellite (MRS) and MRS cross link capabilities will enhance the network's fault tolerance capabilities through improved connectivity; (2) network functions can be divided into the six basic ISO network functional groups; (3) distributed artificial intelligence technologies will augment more traditional network management technologies to form the technological infrastructure of a virtually unattended network; and (4) a great effort is required to bring the current network technology levels for manned space communications up to the level needed for an automated fault tolerance Mars communications network

    A fault-tolerant routing strategy for k-ary n-direct s-indirect topologies based on intermediate nodes

    Full text link
    [EN] Exascale computing systems are being built with thousands of nodes. The high number of components of these systems significantly increases the probability of failure. A key component for them is the interconnection network. If failures occur in the interconnection network, they may isolate a large fraction of the machine. For this reason, an efficient fault-tolerant mechanism is needed to keep the system interconnected, even in the presence of faults. A recently proposed topology for these large systems is the hybrid k-ary n-direct s-indirect family that provides optimal performance and connectivity at a reduced hardware cost. This paper presents a fault-tolerant routing methodology for the k-ary n-direct s-indirect topology that degrades performance gracefully in presence of faults and tolerates a large number of faults without disabling any healthy computing node. In order to tolerate network failures, the methodology uses a simple mechanism. For any source-destination pair, if necessary, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network) with the aim of circumventing faults. The evaluation results shows that the proposed methodology tolerates a large number of faults. For instance, it is able to tolerate more than 99.5% of fault combinations when there are 10 faults in a 3-D network with 1000 nodes using only 1 intermediate node and more than 99.98% if 2 intermediate nodes are used. Furthermore, the methodology offers a gracious performance degradation. As an example, performance degrades only by 1% for a 2-D network with 1024 nodes and 1% faulty links.This work was supported by the Spanish Ministerio de Economía y Competitividad (MINECO), by FEDER funds under Grant TIN2015-66972-C5-1-R, by Programa de Ayudas de Investigación y Desarrollo (PAID) from Universitat Politècnica de alència and by the financial support of the FP7 HiPEAC Network of Excellence under grant agreement 287759Peñaranda Cebrián, R.; Gómez Requena, ME.; López Rodríguez, PJ.; Gran, EG.; Skeie, T. (2017). A fault-tolerant routing strategy for k-ary n-direct s-indirect topologies based on intermediate nodes. Concurrency and Computation Practice and Experience. 29(13):1-11. https://doi.org/10.1002/cpe.4065S111291

    Improved Fair-Zone technique using Mobility Prediction in WSN

    Full text link
    The self-organizational ability of ad-hoc Wireless Sensor Networks (WSNs) has led them to be the most popular choice in ubiquitous computing. Clustering sensor nodes organizing them hierarchically have proven to be an effective method to provide better data aggregation and scalability for the sensor network while conserving limited energy. It has some limitation in energy and mobility of nodes. In this paper we propose a mobility prediction technique which tries overcoming above mentioned problems and improves the life time of the network. The technique used here is Exponential Moving Average for online updates of nodal contact probability in cluster based network.Comment: 10 pages, 7 figures, Published in International Journal Of Advanced Smart Sensor Network Systems (IJASSN

    Validation of a software dependability tool via fault injection experiments

    Get PDF
    Presents the validation of the strategies employed in the RECCO tool to analyze a C/C++ software; the RECCO compiler scans C/C++ source code to extract information about the significance of the variables that populate the program and the code structure itself. Experimental results gathered on an Open Source Router are used to compare and correlate two sets of critical variables, one obtained by fault injection experiments, and the other applying the RECCO tool, respectively. Then the two sets are analyzed, compared, and correlated to prove the effectiveness of RECCO's methodology

    Interconnection Networks for Scalable Quantum Computers

    Full text link
    We show that the problem of communication in a quantum computer reduces to constructing reliable quantum channels by distributing high-fidelity EPR pairs. We develop analytical models of the latency, bandwidth, error rate and resource utilization of such channels, and show that 100s of qubits must be distributed to accommodate a single data communication. Next, we show that a grid of teleportation nodes forms a good substrate on which to distribute EPR pairs. We also explore the control requirements for such a network. Finally, we propose a specific routing architecture and simulate the communication patterns of the Quantum Fourier Transform to demonstrate the impact of resource contention.Comment: To appear in International Symposium on Computer Architecture 2006 (ISCA 2006

    Magic-State Functional Units: Mapping and Scheduling Multi-Level Distillation Circuits for Fault-Tolerant Quantum Architectures

    Full text link
    Quantum computers have recently made great strides and are on a long-term path towards useful fault-tolerant computation. A dominant overhead in fault-tolerant quantum computation is the production of high-fidelity encoded qubits, called magic states, which enable reliable error-corrected computation. We present the first detailed designs of hardware functional units that implement space-time optimized magic-state factories for surface code error-corrected machines. Interactions among distant qubits require surface code braids (physical pathways on chip) which must be routed. Magic-state factories are circuits comprised of a complex set of braids that is more difficult to route than quantum circuits considered in previous work [1]. This paper explores the impact of scheduling techniques, such as gate reordering and qubit renaming, and we propose two novel mapping techniques: braid repulsion and dipole moment braid rotation. We combine these techniques with graph partitioning and community detection algorithms, and further introduce a stitching algorithm for mapping subgraphs onto a physical machine. Our results show a factor of 5.64 reduction in space-time volume compared to the best-known previous designs for magic-state factories.Comment: 13 pages, 10 figure

    Network-on-Chip

    Get PDF
    Limitations of bus-based interconnections related to scalability, latency, bandwidth, and power consumption for supporting the related huge number of on-chip resources result in a communication bottleneck. These challenges can be efficiently addressed with the implementation of a network-on-chip (NoC) system. This book gives a detailed analysis of various on-chip communication architectures and covers different areas of NoCs such as potentials, architecture, technical challenges, optimization, design explorations, and research directions. In addition, it discusses current and future trends that could make an impactful and meaningful contribution to the research and design of on-chip communications and NoC systems
    corecore