113 research outputs found
The Effect Of Hot Spots On The Performance Of Mesh--Based Networks
Direct network performance is affected by different design parameters which include number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hot spot location on the performance of mesh-based networks. Simulations are run on two network topologies, both the mesh and torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall packet switching proved to be more efficient in terms of throughput when compared to wormhole switching. In the case of uniform random traffic, it is shown that the differences between chaotic and oblivious routing are indistinguishable. Networks with low number of hotspots show better performance. As the number of hotspots increases network latency tends to increase. It is shown that when the hotspot factor increases, performance of packet switching is better than that of wormhole switching. It is also shown that the location of hotspots affects network performance particularly with the oblivious routers since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the size of the network the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the
availability of chip manufacturing processes allowing for the integration of increasing
numbers of computation units onto a single die. The resulting outcome,
especially in the embedded domain, has often been called SYSTEM-ON-CHIP
(SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC).
MPSoC design brings to the foreground a large number of challenges, one of
the most prominent of which is the design of the chip interconnection. With a
number of on-chip blocks presently ranging in the tens, and quickly approaching
the hundreds, the novel issue of how to best provide on-chip communication
resources is clearly felt.
NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable
answer to this design concern. By bringing large-scale networking concepts to
the on-chip domain, they guarantee a structured answer to present and future
communication requirements. The point-to-point connection and packet switching
paradigms they involve are also of great help in minimizing wiring overhead
and physical routing issues. However, as with any technology of recent inception,
NoC design is still an evolving discipline. Several main areas of interest
require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among
performance, features and the tight area and power constraints of the onchip
domain.
• Simulation and verification infrastructure must be put in place to explore,
validate and optimize the NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in
terms of topology and architectural parameters. Design tools are needed
to prune this space and pick the best solutions.
• Even more so given their global, distributed nature, it is essential to evaluate
the physical implementation of NoCs to evaluate their suitability for
next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures,
in order to point-out the trade-offs associated with the design of
each individual network building blocks and with the design of network topology
overall. The design space exploration is preceded by a comparative analysis
of state-of-the-art interconnect fabrics with themselves and with early networkon-
chip prototypes. The ultimate objective is to point out the key advantages
that NoC realizations provide with respect to state-of-the-art communication
infrastructures and to point out the challenges that lie ahead in order to make
this new interconnect technology come true. Among these latter, technologyrelated
challenges are emerging that call for dedicated design techniques at all
levels of the design hierarchy. In particular, leakage power dissipation, containment
of process variations and of their effects. The achievement of the above
objectives was enabled by means of a NoC simulation environment for cycleaccurate
modelling and simulation and by means of a back-end facility for the
study of NoC physical implementation effects. Overall, all the results provided
by this work have been validated on actual silicon layout
New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance
Distributed-memory systems are a key to achieve high performance computing and the most favorable architectures used in advanced research problems. Mesh connected multicomputer are one of the most popular architectures that have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique has been widely used in design of distributed-memory systems in which the packet is divided into small flits. Also, the multicast communication has been widely used in distributed-memory systems which is one source node sends the same message to several destination nodes. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Development of fault tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents, new fault tolerant multicast routing algorithms for distributed-memory systems performance using wormhole routed 2D mesh. These algorithms are described for fault tolerant routing in 2D mesh networks, but it can also be extended to other topologies. These algorithms are a combination of a unicast-based multicast algorithm and tree-based multicast algorithms. These algorithms works effectively for the most commonly encountered faults in mesh networks, f-rings, f-chains and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large size of fault region. These algorithms are proved to be deadlock-free. Also, the problem of fault regions overlap is solved. Four essential performance metrics in mesh networks will be considered and calculated; also these algorithms are a limited-global-information-based multicasting which is a compromise of local-information-based approach and global-information-based approach. Data mining is used to validate the results and to enlarge the sample. The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms
Recommended from our members
A survey of routing techniques in store-and-forward and wormhole interconnects.
This paper presents an overview of algorithms for directing messages through networks of varying topology. These are commonly referred to as routing algorithms in the literature that is presented. In addition to providing background on networking terminology and router basics, the paper explains the issues of deadlock and livelock as they apply to routing. After this, there is a discussion of routing algorithms for both store-and-forward and wormhole-switched networks. The paper covers both algorithms that do and do not adapt to conditions in the network. Techniques targeting structured as well as irregular topologies are discussed. Following this, strategies for routing in the presence of faulty nodes and links in the network are described
Task allocation and migration on a star-network
Modern day applications require computational power which cannot be satisfied with uniprocessor systems. So the use of multiprocessor systems in such jobs becomes necessary. This thesis presents an approach of allocating the tasks to a multiprocessor system called the star network. Generally, an incoming task requires only a part of the star network, and not the whole network, for its execution. So, we need a task allocation strategy which can identify the free processors forming a substar and allocate tasks to these substars. The task executes for a time equal to task residence time and then relinquishes the substar. Sometimes there might be enough free processors forming a substar in the network which can host the next incoming task. But the allocation strategy may not recognize the free processors as a substar. To create a substar of free processors to host the next task, task migration has to be performed such that the free processors are grouped into a substar. In this work, three processor allocation strategies: static, dynamic and dynamic work task migration are presented. Using simulations, a comparison of these strategies is done to obtain the percentage improvement of one strategy over the other. Also a comparative study of the working of these strategies in star-networks and hypercubes is done. A saving of 5-11% is achieved by for both the networks incorporating task-migration in dynamic allocation over simple dynamic allocation
Analysis of wormhole routings in cayley graphs of permutation groups.
Over a decade, a new class of switching technology, called wormhole routing, has been investigated in the multicomputer interconnection network field. Several classes of wormhole routing algorithms have been proposed. Most of the algorithms have been centered on the traditional binary hypercube, k-ary n-cube mesh, and torus networks. In the design of a wormhole routing algorithm, deadlock avoidance scheme is the main concern. Recently, new classes of networks called Cayley graphs of permutation groups are considered very promising alternatives. Although proposed Cayley networks have superior topological properties over the traditional network topologies, the design of the deadlock-free wormhole routing algorithm in these networks is not simple. In this dissertation, we investigate deadlock free wormhole routing algorithms in the several classes of Cayley networks, such as complete-transposition and star networks. We evaluate several classes of routing algorithms on these networks, and compare the performance of each algorithm to the simulation study. Also, the performances of these networks are compared to the traditional networks. Through extensive simulation we found that adaptive algorithm outperformed deterministic algorithm in general with more virtual channels. On the network performance comparison, the complete transposition network showed the best performance among the similar sized networks, and the binary hypercube performed better compared to the star graph
Recommended from our members
Investigation into the wafer-scale integration of fine-grain parallel processing computer systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.This thesis investigates the potential of wafer-scale integration (WSI) for the implementation of low-cost fine-grain parallel processing computer systems. As WSI is a relatively new subject, there was little work on which to base investigations. Indeed, most WSI architectures existed only as untried and sometimes vague proposals. Accordingly, the research strategy approached this problem by identifying a representative WSI structure and architecture on which to base investigations. An analysis of architectural proposals identified associative memory to be general purpose parallel processing component used in a wide range of WSI architectures. Furthermore, this analysis provided a set of WSI-level design requirements to evaluate the sustainability of different architectures as research vehicles. The WSI-ASP (WASP) device, which has a large associative memory as its main component is shown to meet these requirements and hence was chosen as the research vehicle. Consequently, this thesis addresses WSI potential through an in-depth investigation into the feasibility of implementing a large associative memory for the WASP device that meets the demanding technological constraints of WSI. Overall, the thesis concludes that WSI offers significant potential for the implementation of low-cost fine-grain parallel processing computer systems. However, due to the dual constraints of thermal management and the area required for the power distribution network, power density is a major design constraint in WSI. Indeed, it is shown that WSI power densities need to be an order of magnitude lower than VLSI power densities. The thesis demonstrates that for associative memories at least, VLSI designs are unsuited to implementation in WSI. Rather, it is shown that WSI circuits must be closely matched to the operational environment to assure suitable power densities. These circuits are significantly larger than their VLSI equivalents. Nonetheless, the thesis demonstrates that by concentrating on the most power intensive circuits, it is possible to achieve acceptable power densities with only a modest increase in area overheads.SER
Efficient processor management strategies for multicomputer systems
Multicomputers are cost-effective alternatives to the conventional supercomputers. Contemporary processor management schemes tend to underutilize the processors and leave many of the processors in the system idle while jobs are waiting for execution;Instead of designing faster processors or interconnection networks, a substantial performance improvement can be obtained by implementing better processor management strategies. This dissertation studies the performance issues related to the processor management schemes and proposes several ways to enhance the multicomputer systems by means of processor management. The proposed schemes incorporate the concepts of size-reduction, non-contiguous allocation, as well as job migration. Job scheduling using a bypass-queue is also studied. All the proposed schemes are proven effective in improving the system performance via extensive simulations. Each proposed scheme has different implementation cost and constraints. In order to take advantage of these schemes, judicious selection of system parameters is important and is discussed
- …