369 research outputs found

    Fault-tolerant meshes and hypercubes with minimal numbers of spares

    Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single faulty processor or communication link can seriously affect the performance of these machines. This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. Our approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. We optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k = 1, we present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. We also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, we give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.
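    The degree figures quoted in this abstract can be checked with a short sketch. It only enumerates neighbors in a plain (non-fault-tolerant) mesh and hypercube; it does not reproduce the paper's spare-processor constructions. The function names are hypothetical, and the 2d maximum assumes every side length is at least 3 so that interior nodes exist.

```python
from itertools import product


def mesh_neighbors(node, shape):
    """Yield the neighbors of `node` in a mesh without wraparound links.

    `node` is a coordinate tuple and `shape` gives the side length per axis.
    """
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            coord = node[axis] + step
            if 0 <= coord < size:
                yield node[:axis] + (coord,) + node[axis + 1:]


def max_degree(shape):
    """Return the largest number of links attached to any mesh processor."""
    nodes = product(*(range(size) for size in shape))
    return max(sum(1 for _ in mesh_neighbors(node, shape)) for node in nodes)


def hypercube_neighbors(node, d):
    """Return the d neighbors of `node` in a d-dimensional hypercube.

    Nodes are integers in [0, 2**d); neighbors differ in exactly one bit.
    """
    return [node ^ (1 << bit) for bit in range(d)]


if __name__ == "__main__":
    print(max_degree((4, 4)))         # two-dimensional mesh
    print(max_degree((3, 3, 3)))      # three-dimensional mesh
    print(hypercube_neighbors(0, 4))  # neighbors of node 0 in a 4-cube
```

    Any fault-tolerant design has to route around a failed node or link while keeping this per-processor link count low, which is why the abstract treats maximum degree as the cost to minimize.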

    Compilation Optimizations to Enhance Resilience of Big Data Programs and Quantum Processors

    Modern computers can experience a variety of transient errors due to the surrounding environment, known as soft faults. Although the frequency of these faults is low enough to not be noticeable on personal computers, they become a considerable concern during large-scale distributed computations or systems in more vulnerable environments like satellites. These faults occur as a bit flip of some value in a register, operation, or memory during execution. They surface as either program crashes, hangs, or silent data corruption (SDC), each of which can waste time, money, and resources. Hardware methods, such as shielding or error correcting memory (ECM), exist, though they can be difficult to implement, expensive, and may be limited to only protecting against errors in specific locations. Researchers have been exploring software detection and correction methods as an alternative, commonly trading either overhead in execution time or memory usage to protect against faults. Quantum computers, a relatively recent advancement in computing technology, experience similar errors on a much more severe scale. The errors are more frequent, costly, and difficult to detect and correct. Error correction algorithms like Shor’s code promise to completely remove errors, but they cannot be implemented on current noisy intermediate-scale quantum (NISQ) systems due to the low number of available qubits. Until the physical systems become large enough to support error correction, researchers instead have been studying other methods to reduce and compensate for errors. In this work, we present two methods for improving the resilience of classical processes, both single- and multi-threaded. We then introduce quantum computing and compare the nature of errors and correction methods to previous classical methods. We further discuss two designs for improving compilation of quantum circuits. 
    One method, focused on quantum neural networks (QNNs), takes advantage of partial compilation to avoid recompiling the entire circuit each time. The other method is a new approach to compiling quantum circuits using graph neural networks (GNNs) to improve the resilience of quantum circuits and increase fidelity. By using GNNs with reinforcement learning, we can train a compiler to provide improved qubit allocation that improves the success rate of quantum circuits.
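    The classical soft-fault model described in this abstract (a single bit flip that may surface as silent data corruption) can be illustrated with a generic redundancy check. This is a minimal sketch of duplicated execution, a common software detection idea, and not either of the two methods the work presents; `flip_bit`, `checked_sum`, and the `fault` parameter are hypothetical names used only for illustration.

```python
def flip_bit(value, bit):
    """Return `value` with the given bit position inverted (a simulated soft fault)."""
    return value ^ (1 << bit)


def checked_sum(values, fault=None):
    """Sum `values` twice and compare the two results.

    `fault`, if given, is an (index, bit) pair: the bit is flipped in one
    input of the first copy only, mimicking a transient error that hits a
    single execution. The cost of detection is roughly doubled work.
    """
    primary = list(values)
    if fault is not None:
        index, bit = fault
        primary[index] = flip_bit(primary[index], bit)
    first = sum(primary)   # possibly corrupted run
    second = sum(values)   # redundant run
    if first != second:
        raise RuntimeError("soft fault detected: redundant results disagree")
    return first


if __name__ == "__main__":
    print(checked_sum([1, 2, 3]))
    try:
        checked_sum([1, 2, 3], fault=(1, 4))
    except RuntimeError as err:
        print(err)
```

    Without the comparison, the corrupted run would simply return a wrong total, which is the silent data corruption case; the check converts it into a detectable failure at the price of extra execution time.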

    Composites for Advanced Drive Systems, A Systems Analysis-Revolutionary Vertical Lift Technology (RVLT)

    Rotorcraft propulsion system designers continually seek to improve power density, that is, to reduce weight and increase power throughput. In order to advance rotorcraft propulsion system technology, NASA Glenn Research Center (NGRC) contracted Boeing Vertical Lift (Contract #NNA15AB12B, Task Order NNA16BE07T) to perform system-level benefit assessments for incorporation of composite materials into rotorcraft transmission gear and shaft systems, in the rotating frame. In general, the environment inside a typical rotorcraft transmission is aggressive for typical composite materials. Design challenges in the rotating frame and related safety risks must be understood and accounted for in the design. Boeing developed a technical approach that evaluated a relatively large population of rotorcraft main transmissions. This technical approach took rotorcraft from various size classes and configurations and applied parametric weight estimating principles to assess the performance impact of composite hybrid technologies inside transmissions, in the rotating frame. Parametric weight estimates showed that composite hybrid technologies account for an average 9% weight savings over the baseline transmissions. More weight savings may be observed when accounting for quantity of transmissions in an aircraft configuration and benefits to airframe, landing gear, and fuel systems. A weight reduction of 595 lbs was calculated for NASA's Large Civil Tilt Rotor (LCTR2) by utilizing composite hybrid components inside the Prop-Rotor Transmission in the rotating frame and accounting for design changes to the airframe, landing gear, and fuel system. In order to develop composite hybrid technologies, sub-scale and full-scale testing should continue, building on the work that NGRC has begun. Design and testing efforts should focus on technical challenges, such as joint and attachment interfaces, temperature effects, inspection procedures, and fault detection.
    It is recommended to address technical challenges with targeted research and development efforts, conducted at relevant scale, prior to incorporating composite hybrid technologies within the rotating frame of helicopter transmissions.

    Multiple Bus Networks for Binary-Tree Algorithms.

    Multiple bus networks (MBN) connect processors via buses. This dissertation addresses issues related to running binary-tree algorithms on MBNs. These algorithms are of a fundamental nature, and reduce inputs at leaves of a binary tree to a result at the root. We study the relationships between running time, degree (maximum number of connections per processor) and loading (maximum number of connections per bus). We also investigate fault-tolerance, meshes enhanced with MBNs, and VLSI layouts for binary-tree MBNs. We prove that the loading of optimal-time, degree-2, binary-tree MBNs is non-constant. In establishing this result, we derive three loading lower bounds, Ω(√n), Ω(n^(2/3)) and Ω(n/log n), each tighter than the previous one. We also show that if the degree is increased to 3, then the loading can be a constant. A constant loading degree-2 MBN exists, if the algorithm is allowed to run slower than the optimal. We introduce a new enhanced mesh architecture (employing binary-tree MBNs) that captures features of all existing enhanced meshes. This architecture is more flexible, allowing all existing enhanced mesh results to be ported to a more implementable platform. We present two methods for imparting tolerance to bus and processor faults in binary-tree MBNs. One of the methods is general, and can be used with any MBN and for both processor and bus faults. A key feature of this method is that it permits the network designer to designate a set of buses as unimportant and consider all faulty buses as unimportant. This minimizes the impact of faulty elements on the MBN. The second method is specific to bus faults in binary-tree MBNs, whose features it exploits to produce faster solutions. We also derive a series of results that distill the lower bound on the perimeter layout area of optimal-time, binary-tree MBNs to a single conjecture.
    Based on this we believe that optimal-time, binary-tree MBNs require no less area than a balanced tree topology even though such MBNs can reuse buses over various steps of the algorithm.
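    The binary-tree algorithms this abstract refers to reduce inputs at the leaves to a single result at the root. A minimal sketch of that reduction pattern follows; it counts the rounds a level-by-level schedule takes and does not model buses, degree, or loading. The name `tree_reduce` is hypothetical.

```python
def tree_reduce(leaves, op):
    """Combine `leaves` pairwise, level by level, as in a binary tree.

    Returns (result, rounds). All pairs within one level are independent,
    so each round could run in parallel; an unpaired element is carried up
    unchanged to the next level.
    """
    if not leaves:
        raise ValueError("tree_reduce needs at least one leaf")
    level = list(leaves)
    rounds = 0
    while len(level) > 1:
        next_level = [op(level[i], level[i + 1])
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            next_level.append(level[-1])
        level = next_level
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    print(tree_reduce(list(range(1, 9)), lambda a, b: a + b))
    print(tree_reduce([3, 1, 4, 1, 5], max))
```

    With n leaves the loop runs ceil(log2 n) rounds, the step count that an optimal-time schedule on a network must match; how many processors share each bus in those rounds is the loading question the dissertation studies.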

    Combinatorial Structures in Hypercubes

    Some studies on the multi-mesh architecture.

    In this thesis, we have reported our investigations on interconnection network architectures based on the idea of a recently proposed multi-processor architecture, the Multi-Mesh network. This includes the development of a new interconnection architecture, study of its topological properties and a proposal for implementing Multi-Mesh using optical technology. We have presented a new network topology, called the 3D Multi-Mesh (3D MM), that is an extension of the Multi-Mesh architecture [DDS99]. This network consists of n^3 three-dimensional meshes (termed 3D blocks), each having n^3 processors, interconnected in a suitable manner so that the resulting topology is 6-regular with n^6 processors and a diameter of only 3n. We have shown that the connectivity of this network is 6. We have explored an algorithm for point-to-point communication on the 3D MM. It is expected that this architecture will enable more efficient algorithm mapping compared to existing architectures. We have also proposed some implementations of the multi-mesh that avoid the electronic bottleneck due to long copper wires for communication between some processors. Our implementation considers a number of realistic scenarios based on hybrid (optical and electronic) communication. One unique feature of this investigation is our use of WDM wavelength routing and the protection scheme. We are not aware of any implementation of interconnection networks using these techniques. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .A32. Source: Masters Abstracts International, Volume: 43-03, page: 0868. Adviser: Subir Bandyopadhyay. Thesis (M.Sc.)--University of Windsor (Canada), 2004.
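    The size and diameter figures in this abstract can be put side by side with a plain three-dimensional mesh holding the same number of processors. The sketch below only does that arithmetic from the stated formulas; it does not build the 3D MM interconnection. The function names are hypothetical, and the comparison mesh (side n^2, no wraparound, diameter 3(side - 1)) is an assumption introduced here for illustration.

```python
def mm3d_summary(n):
    """Return the processor count, degree, and diameter stated for the 3D MM."""
    blocks = n ** 3           # number of 3D blocks
    per_block = n ** 3        # processors inside each block
    return {
        "processors": blocks * per_block,  # n**6 in total
        "degree": 6,                       # the topology is 6-regular
        "diameter": 3 * n,
    }


def plain_mesh_diameter(n):
    """Diameter of a cubic 3D mesh with n**6 processors and no wraparound.

    Such a mesh has side length n**2, and its farthest pair of corners is
    side - 1 hops apart along each of the three axes.
    """
    side = n ** 2
    return 3 * (side - 1)


if __name__ == "__main__":
    print(mm3d_summary(4))
    print(plain_mesh_diameter(4))
```

    Under these assumptions the plain mesh diameter grows with n^2 while the stated 3D MM diameter grows with n, which is the sense in which the abstract calls the diameter "only 3n".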