231 research outputs found

    Deterministic 1-k routing on meshes with applications to worm-hole routing

    Get PDF
    In 11-kk routing each of the n2n^2 processing units of an n×nn \times n mesh connected computer initially holds 11 packet which must be routed such that any processor is the destination of at most kk packets. This problem reflects practical desire for routing better than the popular routing of permutations. 11-kk routing also has implications for hot-potato worm-hole routing, which is of great importance for real world systems. We present a near-optimal deterministic algorithm running in \sqrt{k} \cdot n / 2 + \go{n} steps. We give a second algorithm with slightly worse routing time but working queue size three. Applying this algorithm considerably reduces the routing time of hot-potato worm-hole routing. Non-trivial extensions are given to the general ll-kk routing problem and for routing on higher dimensional meshes. Finally we show that kk-kk routing can be performed in \go{k \cdot n} steps with working queue size four. Hereby the hot-potato worm-hole routing problem can be solved in \go{k^{3/2} \cdot n} steps

    Online Permutation Routing in Partitioned Optical Passive Star Networks

    Full text link
    This paper establishes the state of the art in both deterministic and randomized online permutation routing in the POPS network. Indeed, we show that any permutation can be routed online on a POPS network either with O(dglogg)O(\frac{d}{g}\log g) deterministic slots, or, with high probability, with 5cd/g+o(d/g)+O(loglogg)5c\lceil d/g\rceil+o(d/g)+O(\log\log g) randomized slots, where constant c=exp(1+e1)3.927c=\exp (1+e^{-1})\approx 3.927. When d=Θ(g)d=\Theta(g), that we claim to be the "interesting" case, the randomized algorithm is exponentially faster than any other algorithm in the literature, both deterministic and randomized ones. This is true in practice as well. Indeed, experiments show that it outperforms its rivals even starting from as small a network as a POPS(2,2), and the gap grows exponentially with the size of the network. We can also show that, under proper hypothesis, no deterministic algorithm can asymptotically match its performance

    Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

    Get PDF
    Simulators are very important in computer architecture research as they enable the exploration of new architectures to obtain detailed performance evaluation without building costly physical hardware. Simulation is even more critical to study future many-core architectures as it provides the opportunity to assess currently non-existing computer systems. In this thesis, a multiprocessor simulator is presented based on a cycle accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure similar to the idea of fat-tree is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links for connections with heavy traffic (e.g., near the center) and fewer links for lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. A hybrid network consisting of one packet switching plane and multiple circuit switching planes is constructed as the second enhancement. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment

    Towards practical permutation routing on meshes

    Get PDF
    We consider the permutation routing problem on two-dimensional n×nn \times n meshes. To be practical, a routing algorithm is required to ensure very small queue sizes QQ, and very low running time TT, not only asymptotically but particularly also for the practically important nn up to 10001000. With a technique inspired by a scheme of Kaklamanis/Krizanc/Rao, we obtain a near-optimal result: T=2n+O(1)T = 2 \cdot n + {\cal O}(1) with Q=2Q = 2. Although QQ is very attractive now, the lower order terms in TT make this algorithm highly impractical. Therefore we present simple schemes which are asymptotically slower, but have TT around 3n3 \cdot n for {\em all} nn and QQ between 2 and 8

    Turbo NOC: a framework for the design of Network On Chip based turbo decoder architectures

    Get PDF
    This work proposes a general framework for the design and simulation of network on chip based turbo decoder architectures. Several parameters in the design space are investigated, namely the network topology, the parallelism degree, the rate at which messages are sent by processing nodes over the network and the routing strategy. The main results of this analysis are: i) the most suited topologies to achieve high throughput with a limited complexity overhead are generalized de-Bruijn and generalized Kautz topologies; ii) depending on the throughput requirements different parallelism degrees, message injection rates and routing algorithms can be used to minimize the network area overhead.Comment: submitted to IEEE Trans. on Circuits and Systems I (submission date 27 may 2009

    Randomized Routing and Sorting on the Reconfigurable Mesh

    Get PDF
    In this paper we demonstrate the power of reconfiguration by presenting efficient randomized algorithms for both packet routing and sorting on a reconfigurable mesh connected computer (referred to simply as the mesh from hereon). The run times of these algorithms are better than the best achievable time bounds on a conventional mesh. In particular, we show that permutation routing problem can be solved on a linear array of size n in 3/4n steps, whereas n-1 is the best possible run time without reconfiguration. We also show that permutation routing on an n x n reconfigurable mesh can be done in time n + o(n)using a randomized algorithm or in time 1.25n + o(n) deterministically. In contrast, 2n-2 is the diameter of a conventional mesh and hence routing and sorting will need at least 2n-2 steps on a conventional mesh. In addition we show that the problem of sorting can be solved in time n+ o(n). All these time bounds hold with high probability. The bisection lower bound for both sorting and routing on the mesh is n/2, and hence our algorithms have nearly optimal time bounds

    The Effect Of Hot Spots On The Performance Of Mesh--Based Networks

    Get PDF
    Direct network performance is affected by different design parameters which include number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hot spot location on the performance of mesh-based networks. Simulations are run on two network topologies, both the mesh and torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall packet switching proved to be more efficient in terms of throughput when compared to wormhole switching. In the case of uniform random traffic, it is shown that the differences between chaotic and oblivious routing are indistinguishable. Networks with low number of hotspots show better performance. As the number of hotspots increases network latency tends to increase. It is shown that when the hotspot factor increases, performance of packet switching is better than that of wormhole switching. It is also shown that the location of hotspots affects network performance particularly with the oblivious routers since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the size of the network the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori
    corecore