2,796 research outputs found

    Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

    Get PDF
    Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified

    Improving Scalability and Usability of Parallel Runtime Environments for High Availability and High Performance Systems

    Get PDF
    The number of processors embedded in high performance computing platforms is growing daily to solve larger and more complex problems. Hence, parallel runtime environments have to support and adapt to the underlying platforms that require scalability and fault management in more and more dynamic environments. This dissertation aims to analyze, understand and improve the state of the art mechanisms for managing highly dynamic, large scale applications. This dissertation demonstrates that the use of new scalable and fault-tolerant topologies, combined with rerouting techniques, builds parallel runtime environments, which are able to efficiently and reliably deliver sets of information to a large number of processes. Several important graph properties are provided to illustrate the theoretical capability of these topologies in terms of both scalability and fault-tolerance, such as reasonable degree, regular graph, low diameter, symmetric graph, low cost factor, low message traffic density, optimal connectivity, low fault-diameter and strongly resilient. The dissertation builds a communication framework based on these topologies to support parallel runtime environments. Such a framework can handle multiple types of messages, e.g., unicast, multicast, broadcast and all-gather. Additionally, the communication framework has been formally verified to work in both normal and failure circumstances without creating any of the common problems such as broadcast storm, deadlock and non-progress cycle

    Topology design for time-varying networks

    Get PDF
    Traditional wireless networks seek to support end-to-end communication through either a single-hop wireless link to infrastructure or multi-hop wireless path to some destination. However, in some wireless networks (such as delay tolerant networks, or mobile social networks), due to sparse node distribution, node mobility, and time-varying network topology, end-to-end paths between the source and destination are not always available. In such networks, the lack of continuous connectivity, network partitioning, and long delays make design of network protocols very challenging. Previous DTN or time-varying network research mainly focuses on routing and information propagation. However, with large number of wireless devices' participation, and a lot of network functionality depends on the topology, how to maintain efficient and dynamic topology of a time-varying network becomes crucial. In this dissertation, I model a time-evolving network as a directed time-space graph which includes both spacial and temporal information of the network, then I study various topology control problems with such time-space graphs. First, I study the basic topology design problem where the links of the network are reliable. It aims to build a sparse structure from the original time-space graph such that (1) the network is still connected over time and/or supports efficient routing between any two nodes; (2) the total cost of the structure is minimized. I first prove that this problem is NP-hard, and then propose several greedy-based methods as solutions. Second, I further study a cost-efficient topology design problem, which not only requires the above two objective, but also guarantees that the spanning ratio of the topology is bounded by a given threshold. This problem is also NP-hard, and I give several greedy algorithms to solve it. Last, I consider a new topology design problem by relaxing the assumption of reliable links. Notice that in wireless networks the topologies are not quit predictable and the links are often unreliable. In this new model, each link has a probability to reflect its reliability. The new reliable topology design problem aims to build a sparse structure from the original space-time graph such that (1) for any pair of devices, there is a space-time path connecting them with the reliability larger than a required threshold; (2) the total cost of the structure is minimized. Several heuristics are proposed, which can significantly reduce the total cost of the topology while maintain the connectivity or reliability over time. Extensive simulations on both random networks and real-life tracing data have been conducted, and results demonstrate the efficiency of the proposed methods

    Adaptive Routing Approaches for Networked Many-Core Systems

    Get PDF
    Through advances in technology, System-on-Chip design is moving towards integrating tens to hundreds of intellectual property blocks into a single chip. In such a many-core system, on-chip communication becomes a performance bottleneck for high performance designs. Network-on-Chip (NoC) has emerged as a viable solution for the communication challenges in highly complex chips. The NoC architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication challenges such as wiring complexity, communication latency, and bandwidth. Furthermore, the combined benefits of 3D IC and NoC schemes provide the possibility of designing a high performance system in a limited chip area. The major advantages of 3D NoCs are the considerable reductions in average latency and power consumption. There are several factors degrading the performance of NoCs. In this thesis, we investigate three main performance-limiting factors: network congestion, faults, and the lack of efficient multicast support. We address these issues by the means of routing algorithms. Congestion of data packets may lead to increased network latency and power consumption. Thus, we propose three different approaches for alleviating such congestion in the network. The first approach is based on measuring the congestion information in different regions of the network, distributing the information over the network, and utilizing this information when making a routing decision. The second approach employs a learning method to dynamically find the less congested routes according to the underlying traffic. The third approach is based on a fuzzy-logic technique to perform better routing decisions when traffic information of different routes is available. Faults affect performance significantly, as then packets should take longer paths in order to be routed around the faults, which in turn increases congestion around the faulty regions. We propose four methods to tolerate faults at the link and switch level by using only the shortest paths as long as such path exists. The unique characteristic among these methods is the toleration of faults while also maintaining the performance of NoCs. To the best of our knowledge, these algorithms are the first approaches to bypassing faults prior to reaching them while avoiding unnecessary misrouting of packets. Current implementations of multicast communication result in a significant performance loss for unicast traffic. This is due to the fact that the routing rules of multicast packets limit the adaptivity of unicast packets. We present an approach in which both unicast and multicast packets can be efficiently routed within the network. While suggesting a more efficient multicast support, the proposed approach does not affect the performance of unicast routing at all. In addition, in order to reduce the overall path length of multicast packets, we present several partitioning methods along with their analytical models for latency measurement. This approach is discussed in the context of 3D mesh networks.Siirretty Doriast

    New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

    Get PDF
    Distributed-memory systems are a key to achieve high performance computing and the most favorable architectures used in advanced research problems. Mesh connected multicomputer are one of the most popular architectures that have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique has been widely used in design of distributed-memory systems in which the packet is divided into small flits. Also, the multicast communication has been widely used in distributed-memory systems which is one source node sends the same message to several destination nodes. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Development of fault tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents, new fault tolerant multicast routing algorithms for distributed-memory systems performance using wormhole routed 2D mesh. These algorithms are described for fault tolerant routing in 2D mesh networks, but it can also be extended to other topologies. These algorithms are a combination of a unicast-based multicast algorithm and tree-based multicast algorithms. These algorithms works effectively for the most commonly encountered faults in mesh networks, f-rings, f-chains and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large size of fault region. These algorithms are proved to be deadlock-free. Also, the problem of fault regions overlap is solved. Four essential performance metrics in mesh networks will be considered and calculated; also these algorithms are a limited-global-information-based multicasting which is a compromise of local-information-based approach and global-information-based approach. Data mining is used to validate the results and to enlarge the sample. The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms

    A new fault-tolerant configuration for the Cambridge Ring: the Hierarchical Ring-Star

    Get PDF
    The primary objective of this research is to look at ways of resolving the reliability problems of the Cambridge Ring local area network system. The result is a novel design to enhance the Cambridge Ring with fault tolerance by introducing redundant communication paths with dynamic reconfiguration. The proposed Ring-Star system combines the advantages of ring and star networks to create a network which is topologically resilient while retaining the efficient communication advantage of rings. [Continues.

    An Improvement in Congestion Control Using Multipath Routing in Manet

    Get PDF
    The ad hoc connections, which opens many opportunities for MANET applications. In ad hoc network nodes are movable and there is no centralised management. Routing is an important factor in mobile ad hoc network which not only works well with a small network, but also it can also work well if network get expanded dynamically. Routing in Manets is a main factor considered among all the issues. Mobile nodes in Manet have limited transmission capacity, they intercommunicate by multi hop relay. Multi hop routing have many challenges such as limited wireless bandwidth, low device power, dynamically changing network topology, and high vulnerability to Failure. To answer those challenges, many routing algorithms in Manets were proposed. But one of the problems in routing algorithm is congestion which decreases the overall performance of the network so in this paper we are trying to identify the best routing algorithm which will improve the congestion control mechanism among all the Multipath routing protocols

    High-level services for networks-on-chip

    Get PDF
    Future technology trends envision that next-generation Multiprocessors Systems-on- Chip (MPSoCs) will be composed of a combination of a large number of processing and storage elements interconnected by complex communication architectures. Communication and interconnection between these basic blocks play a role of crucial importance when the number of these elements increases. Enabling reliable communication channels between cores becomes therefore a challenge for system designers. Networks-on-Chip (NoCs) appeared as a strategy for connecting and managing the communication between several design elements and IP blocks, as required in complex Systems-on-Chip (SoCs). The topic can be considered as a multidisciplinary synthesis of multiprocessing, parallel computing, networking, and on- chip communication domains. Networks-on-Chip, in addition to standard communication services, can be employed for providing support for the implementation of system-level services. This dissertation will demonstrate how high-level services can be added to an MPSoC platform by embedding appropriate hardware/software support in the network interfaces (NIs) of the NoC. In this dissertation, the implementation of innovative modules acting in parallel with protocol translation and data transmission in NIs is proposed and evaluated. The modules can support the execution of the high-level services in the NoC at a relatively low cost in terms of area and energy consumption. Three types of services will be addressed and discussed: security, monitoring, and fault tolerance. With respect to the security aspect, this dissertation will discuss the implementation of an innovative data protection mechanism for detecting and preventing illegal accesses to protected memory blocks and/or memory mapped peripherals. The second aspect will be addressed by proposing the implementation of a monitoring system based on programmable multipurpose monitoring probes aimed at detecting NoC internal events and run-time characteristics. As last topic, new architectural solutions for the design of fault tolerant network interfaces will be presented and discussed
    corecore