231 research outputs found
Deterministic 1-k routing on meshes with applications to worm-hole routing
In - routing each of the processing units of an mesh connected computer initially holds packet which must be routed such that any processor is the destination of at most packets. This problem reflects practical desire for routing better than the popular routing of permutations. - routing also has implications for hot-potato worm-hole routing, which is of great importance for real world systems. We present a near-optimal deterministic algorithm running in \sqrt{k} \cdot n / 2 + \go{n} steps. We give a second algorithm with slightly worse routing time but working queue size three. Applying this algorithm considerably reduces the routing time of hot-potato worm-hole routing. Non-trivial extensions are given to the general - routing problem and for routing on higher dimensional meshes. Finally we show that - routing can be performed in \go{k \cdot n} steps with working queue size four. Hereby the hot-potato worm-hole routing problem can be solved in \go{k^{3/2} \cdot n} steps
Online Permutation Routing in Partitioned Optical Passive Star Networks
This paper establishes the state of the art in both deterministic and
randomized online permutation routing in the POPS network. Indeed, we show that
any permutation can be routed online on a POPS network either with
deterministic slots, or, with high probability, with
randomized slots, where constant
. When , that we claim to be the
"interesting" case, the randomized algorithm is exponentially faster than any
other algorithm in the literature, both deterministic and randomized ones. This
is true in practice as well. Indeed, experiments show that it outperforms its
rivals even starting from as small a network as a POPS(2,2), and the gap grows
exponentially with the size of the network. We can also show that, under proper
hypothesis, no deterministic algorithm can asymptotically match its
performance
Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks
Simulators are very important in computer architecture research as they enable the exploration of new architectures to obtain detailed performance evaluation without building costly physical hardware. Simulation is even more critical to study future many-core architectures as it provides the opportunity to assess currently non-existing computer systems. In this thesis, a multiprocessor simulator is presented based on a cycle accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure similar to the idea of fat-tree is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links for connections with heavy traffic (e.g., near the center) and fewer links for lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. A hybrid network consisting of one packet switching plane and multiple circuit switching planes is constructed as the second enhancement. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment
Towards practical permutation routing on meshes
We consider the permutation routing problem on two-dimensional meshes. To be practical, a routing algorithm is required to ensure very small queue sizes , and very low running time , not only asymptotically but particularly also for the practically important up to . With a technique inspired by a scheme of Kaklamanis/Krizanc/Rao, we obtain a near-optimal result: with . Although is very attractive now, the lower order terms in make this algorithm highly impractical. Therefore we present simple schemes which are asymptotically slower, but have around for {\em all} and between 2 and 8
Turbo NOC: a framework for the design of Network On Chip based turbo decoder architectures
This work proposes a general framework for the design and simulation of
network on chip based turbo decoder architectures. Several parameters in the
design space are investigated, namely the network topology, the parallelism
degree, the rate at which messages are sent by processing nodes over the
network and the routing strategy. The main results of this analysis are: i) the
most suited topologies to achieve high throughput with a limited complexity
overhead are generalized de-Bruijn and generalized Kautz topologies; ii)
depending on the throughput requirements different parallelism degrees, message
injection rates and routing algorithms can be used to minimize the network area
overhead.Comment: submitted to IEEE Trans. on Circuits and Systems I (submission date
27 may 2009
Randomized Routing and Sorting on the Reconfigurable Mesh
In this paper we demonstrate the power of reconfiguration by presenting efficient randomized algorithms for both packet routing and sorting on a reconfigurable mesh connected computer (referred to simply as the mesh from hereon). The run times of these algorithms are better than the best achievable time bounds on a conventional mesh.
In particular, we show that permutation routing problem can be solved on a linear array of size n in 3/4n steps, whereas n-1 is the best possible run time without reconfiguration. We also show that permutation routing on an n x n reconfigurable mesh can be done in time n + o(n)using a randomized algorithm or in time 1.25n + o(n) deterministically. In contrast, 2n-2 is the diameter of a conventional mesh and hence routing and sorting will need at least 2n-2 steps on a conventional mesh. In addition we show that the problem of sorting can be solved in time n+ o(n). All these time bounds hold with high probability. The bisection lower bound for both sorting and routing on the mesh is n/2, and hence our algorithms have nearly optimal time bounds
The Effect Of Hot Spots On The Performance Of Mesh--Based Networks
Direct network performance is affected by different design parameters which include number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hot spot location on the performance of mesh-based networks. Simulations are run on two network topologies, both the mesh and torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall packet switching proved to be more efficient in terms of throughput when compared to wormhole switching. In the case of uniform random traffic, it is shown that the differences between chaotic and oblivious routing are indistinguishable. Networks with low number of hotspots show better performance. As the number of hotspots increases network latency tends to increase. It is shown that when the hotspot factor increases, performance of packet switching is better than that of wormhole switching. It is also shown that the location of hotspots affects network performance particularly with the oblivious routers since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the size of the network the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori
- …