
    Performance Assessment of Schedulers in Optical Interconnection Networks

    With ever-increasing demand for high-performance computing systems, interconnection networks, which serve as the communication links in multicore architectures, have become a key element in guaranteeing system performance. Compared with bandwidth-limited, power-hungry electrical interconnection networks, optical integrated interconnection networks, also referred to as optical networks-on-chip (ONoC), are emerging as a promising alternative for future computing performance. In ONoC architectures, scheduling algorithms are needed to avoid packet collisions while achieving high throughput, low latency, and good fairness. Scheduling algorithms already exist for non-blocking electrical NoCs; they can be applied to ONoC provided that the additional constraints arising from optical component limitations are taken into account. In this thesis, various scheduling algorithms for ONoC with bus and ring topologies are simulated in C++ in order to compare their latency and throughput. An optimal scheduler based on a two-step scheduling (TSS) technique is proposed. The first step is matching: the node pairs are represented as a bipartite graph between input and output ports, on which the matching is computed. The second step assigns a wavelength to each matched node pair while avoiding collisions and respecting wavelength continuity. Two matching algorithms are considered within this two-step approach: iSLIP and maximum weighted matching (MWM). The proposed optimal TSS is simulated and its performance is evaluated. The optimal scheduler using the MWM policy, with weights based on queue length, achieves better results than the iSLIP policy under any packet arrival process, and it does so for both bus and ring topologies.
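The two-step idea can be illustrated with a small sketch. This is not the thesis's C++ simulator: it assumes virtual output queues `voq[i][j]` (packets at input i for output j), computes step 1 as a maximum weight matching by exhaustive search over permutations (only practical for a handful of ports), and uses first-fit wavelength assignment on a unidirectional ring for step 2. The function names and the first-fit rule are hypothetical choices made for illustration.

```python
from itertools import permutations


def mwm_match(voq):
    """Step 1: maximum weight matching by exhaustive search (small N only).

    voq[i][j] is the queue length at input i for output j (used as the weight).
    Returns {input: output} containing only pairs with a non-empty queue.
    """
    n = len(voq)
    best_weight, best_perm = -1, None
    for perm in permutations(range(n)):
        weight = sum(voq[i][perm[i]] for i in range(n))
        if weight > best_weight:  # strict '>' keeps the first optimum found
            best_weight, best_perm = weight, perm
    return {i: best_perm[i] for i in range(n) if voq[i][best_perm[i]] > 0}


def assign_wavelengths(matching, n_nodes, n_wavelengths):
    """Step 2: first-fit wavelength assignment on a unidirectional ring.

    A transmission s -> d occupies ring links s, s+1, ..., d-1 (mod n_nodes)
    and must use the same wavelength on all of them (wavelength continuity).
    Returns {(s, d): wavelength}; pairs that cannot be served are omitted.
    """
    used = [set() for _ in range(n_nodes)]  # used[link] = busy wavelengths
    assignment = {}
    for s, d in sorted(matching.items()):
        links = [(s + k) % n_nodes for k in range((d - s) % n_nodes)]
        for w in range(n_wavelengths):
            if all(w not in used[link] for link in links):
                for link in links:
                    used[link].add(w)
                assignment[(s, d)] = w
                break
    return assignment


voq = [[0, 3, 1, 0],
       [0, 0, 2, 5],
       [4, 0, 0, 1],
       [2, 1, 0, 0]]
matching = mwm_match(voq)
print(matching)                              # {0: 1, 1: 3, 2: 0}
print(assign_wavelengths(matching, 4, 2))    # {(0, 1): 0, (1, 3): 0, (2, 0): 1}
```

With only one wavelength, the pair (2, 0) collides with (1, 3) on link 2 and stays unscheduled in that slot, which is the kind of optical constraint the second step has to handle on top of an electrical-style matching.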
The main result is that the unidirectional ring topology outperforms the bus topology for any number of wavelengths less than or equal to the number of ONoC ports, even though its average path length is longer. The reason is that in the bus topology half of the wavelengths are allocated to each direction, which fixes the maximum number of packets in each direction. Using two transceivers per node can compensate for this issue, allowing the bus to reach better performance than the ring.
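To put "even though its average path length is longer" in numbers, the sketch below computes the mean hop count under an assumed uniform traffic model (every source sends to every other node with equal probability); the thesis abstract does not state which traffic pattern it used. For N nodes the results match the closed forms (N + 1)/3 for a bidirectional bus and N/2 for a unidirectional ring.

```python
from fractions import Fraction


def avg_hops_bus(n):
    """Mean hop count on a bidirectional bus of n nodes (uniform traffic, no self-traffic)."""
    total = sum(abs(i - j) for i in range(n) for j in range(n) if i != j)
    return Fraction(total, n * (n - 1))


def avg_hops_unidirectional_ring(n):
    """Mean hop count on a unidirectional ring of n nodes (same traffic model)."""
    total = sum((j - i) % n for i in range(n) for j in range(n) if i != j)
    return Fraction(total, n * (n - 1))


for n in (4, 8, 16):
    print(n, avg_hops_bus(n), avg_hops_unidirectional_ring(n))
# 4 5/3 2
# 8 3 4
# 16 17/3 8
```

The ring's paths are longer, so its advantage in the thesis must come from elsewhere; the abstract attributes it to the bus splitting its wavelengths between the two directions.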

    Bandwidth Requirements of GPU Architectures

    A new trend in chip multiprocessor (CMP) design is to incorporate graphics processing unit (GPU) cores, making CMPs heterogeneous. GPU cores have a higher bandwidth requirement than CPU cores because they tend to generate many more memory requests. To achieve good performance, there must be sufficient bandwidth between the GPU shader cores and main memory to service these memory requests in a timely manner. However, designing for the highest possible bandwidth leads to high energy costs. The communication requirements of GPU cores must therefore be determined in order to choose a proper interconnect. To this end, we simulated several CUDA benchmarks with varying bandwidths using the GPGPU-Sim simulator. Our results show that the communication requirements of GPUs vary from workload to workload. We suggest that cores be connected using a photonic interconnect capable of supporting different bandwidths in order to reduce power consumption. For each transmission, the bandwidth used depends on how bandwidth affects the application's performance. We determined that the ratio of interconnect-shader stalls to the total number of execution cycles is a good indicator of whether an application will be bandwidth-sensitive. We used this finding to develop a bandwidth selection policy for GPU applications using a photonic NoC. With our policy, the photonic interconnect used 12.5% less power than a photonic interconnect using the best-performing bandwidth choices, which improved performance by only 1.37% over our policy. The photonic interconnect with our policy also had the lowest energy-delay product of all the interconnects we compared it against.
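A bandwidth selection policy driven by the stall ratio can be sketched as a simple threshold test. The threshold value, the two bandwidth levels, and the `KernelStats` record are hypothetical: the abstract reports that the stall ratio is a good indicator but does not give the decision rule or its parameters. The energy-delay product helper uses the standard definition EDP = energy × delay = power × delay².

```python
from dataclasses import dataclass

# Hypothetical threshold; the value used by the authors is not stated.
STALL_RATIO_THRESHOLD = 0.05


@dataclass
class KernelStats:
    interconnect_shader_stalls: int  # cycles shaders were stalled on the interconnect
    total_cycles: int                # total execution cycles


def stall_ratio(stats):
    return stats.interconnect_shader_stalls / stats.total_cycles


def select_bandwidth(stats, low_bw, high_bw, threshold=STALL_RATIO_THRESHOLD):
    """Use the high-bandwidth (higher-power) mode only for bandwidth-sensitive applications."""
    return high_bw if stall_ratio(stats) > threshold else low_bw


def energy_delay_product(power_watts, delay_seconds):
    # EDP = energy * delay = (power * delay) * delay
    return power_watts * delay_seconds ** 2


print(select_bandwidth(KernelStats(200, 1000), low_bw=32, high_bw=128))  # 128
print(select_bandwidth(KernelStats(10, 1000), low_bw=32, high_bw=128))   # 32
```

The design choice is to trade a small performance loss on stall-heavy workloads for power savings on the many workloads that do not benefit from extra bandwidth.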

    T-WAS and T-XAS algorithms for fiber-loop optical buffers

    In optical packet/burst-switched networks, fiber loops provide a viable and compact means of contention resolution. For fixed-size packets, it is known that a basic void-avoiding schedule (VAS) can vastly outperform a more classical pre-reservation algorithm such as FCFS. For the setting of a uniformly distributed packet size and a restricted buffer size, we proposed two novel forward-looking algorithms, WAS and XAS, that in specific settings outperform VAS by up to 20% in terms of packet loss. This contribution extends the usage and improves the performance of the WAS and XAS algorithms by introducing an additional threshold variable. By optimizing this threshold, the process of selectively delaying packets longer than strictly necessary can be made more or less strict, and thus fitted to each setting. Monte Carlo simulation shows that the resulting T-WAS and T-XAS algorithms are most effective in those instances where the algorithms without a threshold offer no or only limited performance improvement.
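For context, the classical FCFS pre-reservation baseline mentioned above can be sketched for a single output channel. The sketch assumes that a fiber loop can only delay a packet by whole circulations of length `loop_delay`, and that the restricted buffer size is modelled as a maximum number of circulations. It does not reproduce VAS, WAS, XAS, or the threshold variants; it only shows where the voids (idle gaps) and losses that those algorithms try to reduce come from.

```python
import math


def fcfs_fiber_loop(packets, loop_delay, max_circulations):
    """FCFS pre-reservation for one output channel with a fiber-loop buffer.

    packets: list of (arrival_time, size), sorted by arrival_time.
    Delays are whole multiples of loop_delay, at most max_circulations of them.
    Returns (start times with None for lost packets, total void time).
    """
    horizon = 0.0  # time at which the channel becomes free
    starts, voids = [], 0.0
    for arrival, size in packets:
        if arrival >= horizon:
            start = arrival  # channel idle: no buffering needed
        else:
            k = math.ceil((horizon - arrival) / loop_delay)
            if k > max_circulations:
                starts.append(None)  # needs too many circulations: packet lost
                continue
            start = arrival + k * loop_delay
            voids += start - horizon  # idle gap caused by coarse delay granularity
        starts.append(start)
        horizon = start + size
    return starts, voids


packets = [(0.0, 1.5), (0.5, 1.0), (1.0, 2.0), (1.5, 1.0)]
print(fcfs_fiber_loop(packets, loop_delay=1.0, max_circulations=2))
# ([0.0, 1.5, 3.0, None], 0.5)
```

Because the third packet must wait two full circulations, the channel sits idle from 2.5 to 3.0, and the fourth packet would need four circulations and is lost. Variable packet sizes make such voids unavoidable under FCFS, which is the motivation for forward-looking schedules.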

    Design and analysis of a 3-dimensional cluster multicomputer architecture using optical interconnection for petaFLOP computing

    In this dissertation, the design and analysis of an extremely scalable distributed multicomputer architecture using optical interconnects, with the potential to deliver performance on the order of a petaFLOP, are presented in detail. The design takes advantage of optical technologies, harnessing the features inherent in optics, to produce a 3D stack that efficiently implements a large, fully connected system of nodes forming a true 3D architecture. To adopt optics in large-scale multiprocessor cluster systems, efficient routing and scheduling techniques are needed. To this end, novel self-routing strategies for all-optical packet-switched networks and online scheduling methods are proposed that can provide collision-free communication and achieve real-time operation in high-speed multiprocessor systems. The system is designed to allow failed or faulty nodes to stay in place without appreciable performance degradation. The approach is to develop a dynamic communication environment that can effectively adapt and evolve with a high density of missing units or nodes. A joint CPU/bandwidth controller that maximizes resource allocation in this dynamic computing environment is introduced, with the objective of optimizing the distributed cluster architecture and preventing performance or system degradation in the presence of failed or faulty nodes. A thorough analysis, a feasibility study, and a description of the characteristics of a 3-dimensional multicomputer system capable of achieving 100 teraFLOP performance are also discussed in detail. The dissertation includes a throughput analysis of the routing schemes using methods from discrete-time queuing systems, as well as computer simulation results for the proposed algorithms.
A prototype of the proposed 3D architecture is built and a test bed is developed to obtain experimental results that further demonstrate the feasibility of the design and validate the initial assumptions, algorithms, simulations, and the optimized distributed resource allocation scheme. Finally, as a prelude to further research, an efficient data routing strategy for highly scalable distributed mobile multiprocessor networks is introduced.
