154 research outputs found

    Hierarchical gate-array routing on a hypercube multiprocessor

    Full text link
    Gate-arrays are the most common design style for semicustom VLSI integrated circuits. An important part of the gate-array design process is the routing of wires between the logic elements, which is an extremely compute-intensive operation. This paper presents an algorithm for routing gate-arrays that uses a hypercube connected parallel processor to provide the necessary computation power. In order to make optimal use of the hypercube, the routing algorithm is organized so that interprocessor communication is kept to minimum. It occurs only during the global routing and crossing placement phases of the algorithm, which constitute less than 15% of the total running time of the algorithm. On the basis of the results of executing the algorithm on two gate-array benchmarks the case is made for using hypercube multiprocessors as accelerators for compute-intensive CAD operations.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/28650/1/0000466.pd

    A general analytical model of adaptive wormhole routing in k-ary n-cubes

    Get PDF
    Several analytical models of fully adaptive routing have recently been proposed for k-ary n-cubes and hypercube networks under the uniform traffic pattern. Although,hypercube is a special case of k-ary n-cubes topology, the modeling approach for hypercube is more accurate than karyn-cubes due to its simpler structure. This paper proposes a general analytical model to predict message latency in wormhole-routed k-ary n-cubes with fully adaptive routing that uses a similar modeling approach to hypercube. The analysis focuses Duato's fully adaptive routing algorithm [12], which is widely accepted as the most general algorithm for achieving adaptivity in wormhole-routed networks while allowing for an efficient router implementation. The proposed model is general enough that it can be used for hypercube and other fully adaptive routing algorithms

    The Effect Of Hot Spots On The Performance Of Mesh--Based Networks

    Get PDF
    Direct network performance is affected by different design parameters which include number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hot spot location on the performance of mesh-based networks. Simulations are run on two network topologies, both the mesh and torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall packet switching proved to be more efficient in terms of throughput when compared to wormhole switching. In the case of uniform random traffic, it is shown that the differences between chaotic and oblivious routing are indistinguishable. Networks with low number of hotspots show better performance. As the number of hotspots increases network latency tends to increase. It is shown that when the hotspot factor increases, performance of packet switching is better than that of wormhole switching. It is also shown that the location of hotspots affects network performance particularly with the oblivious routers since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the size of the network the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori

    Performance evaluation of distributed crossbar switch hypermesh

    Get PDF
    The interconnection network is one of the most crucial components in any multicomputer as it greatly influences the overall system performance. Several recent studies have suggested that hypergraph networks, such as the Distributed Crossbar Switch Hypermesh (DCSH), exhibit superior topological and performance characteristics over many traditional graph networks, e.g. k-ary n-cubes. Previous work on the DCSH has focused on issues related to implementation and performance comparisons with existing networks. These comparisons have so far been confined to deterministic routing and unicast (one-to-one) communication. Using analytical models validated through simulation experiments, this thesis extends that analysis to include adaptive routing and broadcast communication. The study concentrates on wormhole switching, which has been widely adopted in practical multicomputers, thanks to its low buffering requirement and the reduced dependence of latency on distance under low traffic. Adaptive routing has recently been proposed as a means of improving network performance, but while the comparative evaluation of adaptive and deterministic routing has been widely reported in the literature, the focus has been on graph networks. The first part of this thesis deals with adaptive routing, developing an analytical model to measure latency in the DCSH, and which is used throughout the rest of the work for performance comparisons. Also, an investigation of different routing algorithms in this network is presented. Conventional k-ary n-cubes have been the underlying topology of contemporary multicomputers, but it is only recently that adaptive routing has been incorporated into such systems. The thesis studies the relative performance merits of the DCSH and k-ary n-cubes under adaptive routing strategy. The analysis takes into consideration real-world factors, such as router complexity and bandwidth constraints imposed by implementation technology. However, in any network, the routing of unicast messages is not the only factor in traffic control. In many situations (for example, parallel iterative algorithms, memory update and invalidation procedures in shared memory systems, global notification of network errors), there is a significant requirement for broadcast traffic. The DCSH, by virtue of its use of hypergraph links, can implement broadcast operations particularly efficiently. The second part of the thesis examines how the DCSH and k-ary n-cube performance is affected by the presence of a broadcast traffic component. In general, these studies demonstrate that because of their relatively high diameter, k-ary n-cubes perform poorly when message lengths are short. This is consistent with earlier more simplistic analyses which led to the proposal for the express-cube, an enhancement of the basic k-ary n-cube structure, which provides additional express channels, allowing messages to bypass groups of nodes along their paths. The final part of the thesis investigates whether this "partial bypassing" can compete with the "total bypassing" capability provided inherently by the DCSH topology

    Towards the Teraflop CFD

    Get PDF
    We are surveying current projects in the area of parallel supercomputers. The machines considered here will become commercially available in the 1990 - 1992 time frame. All are suitable for exploring the critical issues in applying parallel processors to large scale scientific computations, in particular CFD calculations. This chapter presents an overview of the surveyed machines, and a detailed analysis of the various architectural and technology approaches taken. Particular emphasis is placed on the feasibility of a Teraflops capability following the paths proposed by various developers

    Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems

    Get PDF
    Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up. As knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a smart memory ). This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree-machine), and the Butterfly (a coarse-grained MIMD Butterflyswitch machine)

    Parallel Processing for VLSI CAD Applications a Tutorial

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems LaboratorySemiconductor Research CorporationAuthor's name appears in front matter as Prithviraj Banerje

    Master of Science

    Get PDF
    thesisIntegrated circuits often consist of multiple processing elements that are regularly tiled across the two-dimensional surface of a die. This work presents the design and integration of high speed relative timed routers for asynchronous network-on-chip. It researches NoC's efficiency through simplicity by directly translating simple T-router, source-routing, single-flit packet to higher radix routers. This work is intended to study performance and power trade-offs adding higher radix routers, 3D topologies, Virtual Channels, Accurate NoC modeling, and Transmission line communication links. Routers with and without virtual channels are designed and integrated to arrayed communication networks. Furthermore, the work investigates 3D networks with diffusive RC wires and transmission lines on long wrap interconnects
    • …
    corecore