766 research outputs found

    On quantifying fault patterns of the mesh interconnect networks

    Get PDF
    One of the key issues in the design of Multiprocessors System-on-Chip (MP-SoCs), multicomputers, and peerto- peer networks is the development of an efficient communication network to provide high throughput and low latency and its ability to survive beyond the failure of individual components. Generally, the faulty components may be coalesced into fault regions, which are classified into convex and concave shapes. In this paper, we propose a mathematical solution for counting the number of common fault patterns in a 2-D mesh interconnect network including both convex (|-shape, | |-shape, ý-shape) and concave (L-shape, Ushape, T-shape, +-shape, H-shape) regions. The results presented in this paper which have been validated through simulation experiments can play a key role when studying, particularly, the performance analysis of fault-tolerant routing algorithms and measure of a network fault-tolerance expressed as the probability of a disconnection

    LAPSES: A Recipe for High-Performance Adaptive Router Design

    Get PDF
    Earlier research has shown that adaptive routing can help in improving network performance. However, it has not received adequate attention in commercial routers mainly due to the additional hardware complexity, and the perceived cost and performance degradation that may result from this complexity. These concerns can be mitigated if one can design a cost-effective router that can support adaptive routing. This paper proposes a three step recipe — Look-Ahead routing, intelligent Path Selection, and an Economic Storage implementation, called the LAPSES approach — for cost-effective high performance pipelined adaptive router design. The first step, look-ahead routing, reduces a pipeline stage in the router by making table lookup and arbitration concurrent. Next, three new traffic-sensitive path selection heuristics (LRU, LFU and MAX-CREDIT) are proposed to select one of the available alternate paths. Finally, two techniques for reducing routing table size of the adaptive router are presented. These are called meta-table routing and economical storage. The proposed economical storage needs a routing table with only 9 and 27 entries for two and three dimensional meshes, respectively. All these design ideas are evaluated on a (16 16) mesh network via simulation. A fully adaptive algorithm and various traffic patterns are used to examine the performance benefits. Performance results show that the look-ahead design as well as the path selection heuristics boost network performance, while the economical storage approach turns out to be an ideal choice in comparison to full-table and meta-table options. We believe the router resulting from these three design enhancements can make adaptive routing a viable choice for interconnects.

    Embedded dynamic programming networks for networks-on-chip

    Get PDF
    PhD ThesisRelentless technology downscaling and recent technological advancements in three dimensional integrated circuit (3D-IC) provide a promising prospect to realize heterogeneous system-on-chip (SoC) and homogeneous chip multiprocessor (CMP) based on the networks-onchip (NoCs) paradigm with augmented scalability, modularity and performance. In many cases in such systems, scheduling and managing communication resources are the major design and implementation challenges instead of the computing resources. Past research efforts were mainly focused on complex design-time or simple heuristic run-time approaches to deal with the on-chip network resource management with only local or partial information about the network. This could yield poor communication resource utilizations and amortize the benefits of the emerging technologies and design methods. Thus, the provision for efficient run-time resource management in large-scale on-chip systems becomes critical. This thesis proposes a design methodology for a novel run-time resource management infrastructure that can be realized efficiently using a distributed architecture, which closely couples with the distributed NoC infrastructure. The proposed infrastructure exploits the global information and status of the network to optimize and manage the on-chip communication resources at run-time. There are four major contributions in this thesis. First, it presents a novel deadlock detection method that utilizes run-time transitive closure (TC) computation to discover the existence of deadlock-equivalence sets, which imply loops of requests in NoCs. This detection scheme, TC-network, guarantees the discovery of all true-deadlocks without false alarms in contrast to state-of-the-art approximation and heuristic approaches. Second, it investigates the advantages of implementing future on-chip systems using three dimensional (3D) integration and presents the design, fabrication and testing results of a TC-network implemented in a fully stacked three-layer 3D architecture using a through-silicon via (TSV) complementary metal-oxide semiconductor (CMOS) technology. Testing results demonstrate the effectiveness of such a TC-network for deadlock detection with minimal computational delay in a large-scale network. Third, it introduces an adaptive strategy to effectively diffuse heat throughout the three dimensional network-on-chip (3D-NoC) geometry. This strategy employs a dynamic programming technique to select and optimize the direction of data manoeuvre in NoC. It leads to a tool, which is based on the accurate HotSpot thermal model and SystemC cycle accurate model, to simulate the thermal system and evaluate the proposed approach. Fourth, it presents a new dynamic programming-based run-time thermal management (DPRTM) system, including reactive and proactive schemes, to effectively diffuse heat throughout NoC-based CMPs by routing packets through the coolest paths, when the temperature does not exceed chip’s thermal limit. When the thermal limit is exceeded, throttling is employed to mitigate heat in the chip and DPRTM changes its course to avoid throttled paths and to minimize the impact of throttling on chip performance. This thesis enables a new avenue to explore a novel run-time resource management infrastructure for NoCs, in which new methodologies and concepts are proposed to enhance the on-chip networks for future large-scale 3D integration.Iraqi Ministry of Higher Education and Scientific Research (MOHESR)

    Automating Topology Aware Mapping for Supercomputers

    Get PDF
    Petascale machines with hundreds of thousands of cores are being built. These machines have varying interconnect topologies and large network diameters. Computation is cheap and communication on the network is becoming the bottleneck for scaling of parallel applications. Network contention, specifically, is becoming an increasingly important factor affecting overall performance. The broad goal of this dissertation is performance optimization of parallel applications through reduction of network contention. Most parallel applications have a certain communication topology. Mapping of tasks in a parallel application based on their communication graph, to the physical processors on a machine can potentially lead to performance improvements. Mapping of the communication graph for an application on to the interconnect topology of a machine while trying to localize communication is the research problem under consideration. The farther different messages travel on the network, greater is the chance of resource sharing between messages. This can create contention on the network for networks commonly used today. Evaluative studies in this dissertation show that on IBM Blue Gene and Cray XT machines, message latencies can be severely affected under contention. Realizing this fact, application developers have started paying attention to the mapping of tasks to physical processors to minimize contention. Placement of communicating tasks on nearby physical processors can minimize the distance traveled by messages and reduce the chances of contention. Performance improvements through topology aware placement for applications such as NAMD and OpenAtom are used to motivate this work. Building on these ideas, the dissertation proposes algorithms and techniques for automatic mapping of parallel applications to relieve the application developers of this burden. The effect of contention on message latencies is studied in depth to guide the design of mapping algorithms. The hop-bytes metric is proposed for the evaluation of mapping algorithms as a better metric than the previously used maximum dilation metric. The main focus of this dissertation is on developing topology aware mapping algorithms for parallel applications with regular and irregular communication patterns. The automatic mapping framework is a suite of such algorithms with capabilities to choose the best mapping for a problem with a given communication graph. The dissertation also briefly discusses completely distributed mapping techniques which will be imperative for machines of the future.published or submitted for publicationnot peer reviewe
    corecore