12 research outputs found

    Blocking performance of extended pruned vertically stacked optical banyan structure under different link failure conditions

    The blocking performance of extended pruned vertically stacked optical banyan (VSOB) networks under different link failure conditions is analyzed in this paper. In our simulation, we applied two algorithms to route the optical data through the network: plane fixed routing with linear search and plane fixed routing with random search. Our simulation results show that adding one or two extra planes to the pruned VSOB network reduces the blocking probability significantly. Beyond two extra planes, the further decrease in blocking probability is small. A close approximation of the minimum number of planes required to make the extended pruned VSOB networks nonblocking is also presented.

    A new scheme to realize crosstalk-free permutations in optical MINs with vertical stacking

    ©2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Vertical stacking is an alternative for constructing nonblocking multistage interconnection networks (MINs). In this paper, we study the crosstalk-free permutation in rearrangeable, self-routing Banyan-type optical MINs built on vertical stacking and propose a new scheme for realizing permutations in this class of optical MINs crosstalk-free. The basic idea of the new scheme is to classify permutations into permutation classes such that all permutations in one class share the same crosstalk-free decomposition pattern. By running the Euler-Split based crosstalk-free decomposition only once for a permutation class and applying the obtained crosstalk-free decomposition pattern to all permutations in the class, crosstalk-free decomposition of permutations can be realized in a more efficient way. We show that the number of permutations in a permutation class is huge, enabling the average time complexity of the new scheme to realize a crosstalk-free permutation in an N by N network to be reduced to O(N) from previously O(N log N). Xiaohong Jiang, Hong Shen, Md. Mamun-ur-Rashid Khandker, Susumu Horiguch

    Center for Aeronautics and Space Information Sciences

    This report summarizes the research done during 1991/92 under the Center for Aeronautics and Space Information Sciences (CASIS) program. The topics covered are computer architecture, networking, and neural nets.

    Switching techniques for broadband ISDN

    The properties of switching techniques suitable for use in broadband networks have been investigated. Methods for evaluating the performance of such switches have been reviewed. A notation has been introduced to describe a class of binary self-routing networks. Hence, a technique has been developed for determining the nature of the equivalence between two networks drawn from this class. The necessary and sufficient condition for two packets not to collide in a binary self-routing network has been obtained. This has been used to prove the non-blocking property of the Batcher-banyan switch. A condition for a three-stage network with channel grouping and link speed-up to be nonblocking has been obtained, of which previous conditions are special cases. A new three-stage switch architecture has been proposed, based upon a novel cell-level algorithm for path allocation in the intermediate stage of the switch. The algorithm is suited to hardware implementation using parallelism to achieve a very short execution time. An array of processors is required to implement the algorithm. The processor has been shown to be of simple design. It must be initialised with a count representing the number of cells requesting a given output module. A fast method has been described for performing the request counting using a non-blocking binary self-routing network. Hardware is also required to forward routing tags from the processors to the appropriate data cells once they have been allocated a path through the intermediate stage. A method of distributing these routing tags by means of a non-blocking copy network has been presented. The performance of the new path allocation algorithm has been determined by simulation. The rate of cell loss can increase substantially in a three-stage switch when the output modules are non-uniformly loaded.
It has been shown that the appropriate use of channel grouping in the intermediate stage of the switch can reduce the effect of non-uniform loading on performance.
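
The collision condition for a binary self-routing network can be illustrated with a minimal Python sketch. It assumes an omega (shuffle-exchange) network as one representative of the class; the abstract does not give the thesis's notation or topology, so the link-labelling formula and the names `link_after_stage` and `has_collision` are illustrative assumptions. The sketch checks whether any two packets share an internal link, and shows that destination-sorted packets on adjacent inputs (the kind of ordering a Batcher sorter produces) pass without collision, while a gap between active inputs can cause one.

```python
from itertools import combinations


def link_after_stage(src: int, dst: int, stage: int, n: int) -> int:
    """Link occupied after `stage` (1..n) in an N = 2**n omega network.

    Assumed model: each stage shuffles (rotates left) the n-bit position
    and replaces the lowest bit with the next destination bit, MSB first.
    The result is the low (n - stage) bits of src followed by the top
    `stage` bits of dst.
    """
    mask = (1 << n) - 1
    return ((src << stage) | (dst >> (n - stage))) & mask


def has_collision(packets, n):
    """Return True if any two (src, dst) packets share a link at any stage."""
    for (s1, d1), (s2, d2) in combinations(packets, 2):
        for stage in range(1, n + 1):
            if link_after_stage(s1, d1, stage, n) == link_after_stage(s2, d2, stage, n):
                return True
    return False


# Sorted destinations on adjacent inputs of an 8x8 network.
sorted_compact = [(0, 1), (1, 3), (2, 4), (3, 7)]
# Sorted destinations on a 4x4 network, but with an idle input in between.
with_gap = [(0, 0), (2, 1)]

print(has_collision(sorted_compact, n=3))  # False
print(has_collision(with_gap, n=2))        # True
```

In this model, two packets collide at a stage exactly when their sources agree in the low bits and their destinations agree in the high bits that make up the link label, which is why sorting and concentrating the inputs removes all internal conflicts.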

    Improving the Scalability of High Performance Computer Systems

    Improving the performance of future computing systems will depend on the ability to increase the scalability of current technology. New paths need to be explored, as operating principles that have been applied up to now are becoming irrelevant for upcoming computer architectures. It appears that scaling the number of cores, processors and nodes within a system represents the only feasible alternative to achieve Exascale performance. To accomplish this goal, we propose three novel techniques addressing different layers of computer systems. The Tightly Coupled Cluster technique significantly improves inter-node communication within compute clusters. By improving the latency by an order of magnitude over existing solutions, the cost of communication is considerably reduced. This makes it possible to exploit fine-grained parallelism within applications, thereby extending the scalability considerably. The mechanism virtually moves the network interconnect into the processor, bypassing the latency of the I/O interface and rendering protocol conversions unnecessary. The technique is implemented entirely through firmware and kernel layer software utilizing off-the-shelf AMD processors. We present a proof-of-concept implementation and real world benchmarks to demonstrate the superior performance of our technique. In particular, our approach achieves a software-to-software communication latency of 240 ns between two remote compute nodes. The second part of the dissertation introduces a new framework for scalable Networks-on-Chip. A novel rapid prototyping methodology is proposed that accelerates the design and implementation substantially. Due to its flexibility and modularity, a large application space is covered, ranging from Systems-on-Chip to high performance many-core processors. The Network-on-Chip compiler generates complex networks in the form of synthesizable register transfer level code from an abstract design description. 
Our engine supports different target technologies including Field Programmable Gate Arrays and Application Specific Integrated Circuits. The framework makes it possible to build large designs while minimizing development and verification efforts. Many topologies and routing algorithms are supported by partitioning the tasks into several layers and by the introduction of a protocol agnostic architecture. We provide a thorough evaluation of the design that shows excellent results regarding performance and scalability. The third part of the dissertation addresses the Processor-Memory Interface within computer architectures. The increasing compute power of many-core processors leads to an equally growing demand for more memory bandwidth and capacity. Current processor designs exhibit physical limitations that restrict the scalability of main memory. To address this issue, we propose a memory extension technique that attaches large amounts of DRAM memory to the processor via a low pin count interface using high speed serial transceivers. Our technique transparently integrates the extension memory into the system architecture by providing full cache coherency. Therefore, applications can utilize the memory extension by applying regular shared memory programming techniques. By supporting daisy chained memory extension devices and by introducing the asymmetric probing approach, the proposed mechanism ensures high scalability. We furthermore propose a DMA offloading technique to improve the performance of the processor memory interface. The design has been implemented in a Field Programmable Gate Array based prototype. Driver software and firmware modifications have been developed to bring up the prototype in a Linux based system. We show microbenchmarks that demonstrate the feasibility of our design.

    Joint optimization of topology, switching, routing and wavelength assignment

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 279-285). To provide end users with economic access to high bandwidth, the architecture of the next generation metropolitan area networks (MANs) needs to be judiciously designed from the cost perspective. In addition to a low initial capital investment, the ultimate goal is to design networks that exhibit excellent scalability - a decreasing cost-per-node-per-unit-traffic as user number and transaction size increase. As an effort to achieve this goal, in this thesis we search for the scalable network architectures over the solution space that embodies the key aspects of optical networks: fiber connection topology, switching architecture selection and resource dimensioning, routing and wavelength assignment (RWA). Due to the inter-related nature of these design elements, we aim to solve the design problem jointly in the optimization process in order to achieve good overall performance. To evaluate how the cost drives architectural tradeoffs, an analytical approach is taken in most parts of the thesis by first focusing on networks with symmetric and well defined structures (i.e., regular networks) and symmetric traffic patterns (i.e., all-to-all uniform traffic), which are fair representations that give us suggestions of trends, etc. We start with an examination of various measures of regular topologies. The average minimum hop distance plays a crucial role in evaluating the efficiency of network architecture. From the perspective of designing optical networks, the amount of switching resources used at nodes is proportional to the average minimum hop distance. Thus a smaller average minimum hop distance translates into a lower fraction of pass-through traffic and less switching resources required. 
Next, a first-order cost model is set up and an optimization problem is formulated for the purpose of characterizing the tradeoffs between fiber and switching resources. Via convex optimization techniques, the joint optimization problem is solved analytically for (static) uniform traffic and symmetric networks. Two classes of regular graphs - Generalized Moore Graphs and A-nearest Neighbors Graphs - are identified to yield lower and upper cost bounds, respectively. The investigation of the cost scalability further demonstrates the advantage of the Generalized Moore Graphs as benchmark topologies: with linear switching cost structure, the minimal normalized cost per unit traffic decreases with increasing network size for the Generalized Moore Graphs and their relatives. In comparison, for less efficient fiber topologies (e.g., A-nearest Neighbors) and switching cost structures (e.g., quadratic cost), the minimal normalized cost per unit traffic plateaus or even increases with increasing network size. The study also reveals other attractive properties of Generalized Moore Graphs in conjunction with minimum hop routing - the aggregate network load is evenly distributed over each fiber. Thus, Generalized Moore Graphs also require the minimum number of wavelengths to support a given uniform traffic demand. Furthermore, the theoretical work on the Generalized Moore Graphs and their close relatives is extended to study more realistic design scenarios in two aspects. One aspect addresses the irregular topologies and (static) non-uniform traffic, for which the results of Generalized Moore networks are used to provide useful estimates of network cost, and thus offer good references for cost-efficient optical networks. The other aspect deals with network design under random demands. Two optimization formulations that incorporate the traffic variability are presented. 
The results show that, as a physical architecture, Generalized Moore Graphs are the most robust (in cost) to the demand uncertainties. Analytical results also provide design guidelines on how optimum dimensioning, network connectivity, and network costs vary as functions of risk aversion, service level requirements, and probability distributions of demands. by Kyle Chi Guan. Ph.D.
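
The role of the average minimum hop distance can be made concrete with a small Python sketch. It computes the metric by breadth-first search for two 10-node, degree-3 regular graphs: the Petersen graph, which is a Moore graph, and a ring with diametric chords. The chordal ring is only an arbitrary comparison topology chosen here, not one taken from the thesis, and the function names are illustrative. The sketch assumes connected, undirected graphs.

```python
from collections import deque


def average_min_hop_distance(adj):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes."""
    nodes = list(adj)
    total = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # plain breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (len(nodes) * (len(nodes) - 1))


def petersen():
    """Petersen graph: 10 nodes, degree 3, diameter 2 (a Moore graph)."""
    adj = {i: [] for i in range(10)}

    def link(u, v):
        adj[u].append(v)
        adj[v].append(u)

    for i in range(5):
        link(i, (i + 1) % 5)          # outer cycle
        link(i, i + 5)                # spokes
        link(5 + i, 5 + (i + 2) % 5)  # inner pentagram
    return adj


def chordal_ring(n):
    """Ring of n nodes (n even) where each node also links to its opposite."""
    return {i: [(i - 1) % n, (i + 1) % n, (i + n // 2) % n] for i in range(n)}


print(round(average_min_hop_distance(petersen()), 3))        # 1.667
print(round(average_min_hop_distance(chordal_ring(10)), 3))  # 1.889
```

With the same number of nodes and the same node degree, the Moore graph reaches every other node in fewer hops on average, which under the abstract's proportionality argument means less pass-through traffic and fewer switching resources per node.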

    Routing and Wavelength Assignment for Multicast Communication in Optical Network-on-Chip

    An Optical Network-on-Chip (ONoC) is an emerging chip-level optical interconnection technology to realise high-performance and power-efficient inter-core communication for many-core processors. Within the field, multicast communication is one of the most important inter-core communication forms. It is not only widely used in parallel computing applications in Chip Multi-Processors (CMPs), but also common in emerging areas such as neuromorphic computing. While many studies have been conducted on designing ONoC architectures and routing schemes to support multicast communication, most existing solutions adopt methods that were initially proposed for electrical interconnects. These solutions can neither fully take advantage of optical communication nor address the special requirements of an ONoC. Moreover, most of them focus only on the optimisation of one multicast, which limits their practical application because real systems often have to handle multiple multicasts requested by various applications. Hence, this thesis addresses the design of a high-performance communication scheme for multiple multicasts by taking into account the unique characteristics and constraints of an ONoC. This thesis studies the problem from a network-level perspective. The design methodology is to optimally route all multicasts requested simultaneously by the applications in an ONoC, with the objective of efficiently utilising available wavelengths. The novelty is to adopt multicast-splitting strategies, where a multicast can be split into several sub-multicasts according to the distribution of multicast nodes, in order to reduce the conflicts between different multicasts. As the routing and wavelength assignment problem is NP-hard, heuristic approaches that use the multicast-splitting strategy are proposed in this thesis. Specifically, three routing and wavelength assignment schemes for multiple multicasts in an ONoC are proposed for different problem domains. 
Firstly, PRWAMM, a Path-based Routing and Wavelength Assignment for Multiple Multicasts in an ONoC, is proposed. Due to the low manufacturing complexity requirement of an ONoC, e.g., no splitters, path-based routing is studied in PRWAMM. Two wavelength-assignment strategies for multiple multicasts under path-based routing are proposed. One is an intra-multicast wavelength assignment, which assigns wavelength(s) for one multicast. The other is an inter-multicast wavelength assignment, which assigns wavelength(s) for different multicasts, according to the distributions of multicasts. Simulation results show that PRWAMM can reduce the average number of wavelengths by 15% compared to other path-based schemes. Secondly, RWADMM, a Routing and Wavelength Assignment scheme for Distribution-based Multiple Multicasts in a 2D ONoC, is proposed. Because path-based routing lacks flexibility, it cannot reduce the link conflicts effectively. Hence, RWADMM is designed based on the distribution of different multicasts, and includes two algorithms. One is an optimal routing and wavelength assignment algorithm for special distributions of multicast nodes. The other is a heuristic routing and wavelength assignment algorithm for random distributions of multicast nodes. Simulation results show that RWADMM can reduce the number of wavelengths by 21.85% on average, compared to the state-of-the-art solutions in a 2D ONoC. Thirdly, CRRWAMM, a Cluster-based Routing and Reusable Wavelength Assignment scheme for Multiple Multicasts in a 3D ONoC, is proposed. Because the architecture of a 3D ONoC differs from that of a 2D ONoC (e.g., the layout of nodes, optical routers), the methods designed for a 2D ONoC cannot be simply extended to a 3D ONoC. In CRRWAMM, the distribution of multicast nodes in a mesh-based 3D ONoC is analysed first. Then, routing theorems for special instances are derived. 
Based on the theorems, a general routing scheme, which includes a cluster-based routing method and a reusable wavelength assignment method, is proposed. Simulation results show that CRRWAMM can reduce the number of wavelengths by 33.2% on average, compared to other schemes in a 3D ONoC. Overall, the three routing and wavelength assignment schemes can achieve high-performance multicast communication for multiple multicasts in their respective problem domains in an ONoC. Compared to their counterparts, they all have the advantages of a low routing complexity, a low wavelength requirement, and good scalability. These methods make an ONoC a flexible high-performance computing platform to execute various parallel applications with different multicast requirements. As future work, I will investigate the power consumption of various routing schemes for multicasts. Using a multicast-splitting strategy may increase power consumption since it needs different wavelengths to send packets to different destinations for one multicast, though the reduction of wavelengths used in the schemes can also potentially decrease overall power consumption. Therefore, how to achieve the best trade-off between the total number of wavelengths used and the number of sub-multicasts in order to reduce power consumption will be interesting future research.
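
The wavelength-assignment side of the problem can be illustrated with a generic first-fit heuristic in Python. This is not PRWAMM, RWADMM, or CRRWAMM, whose details the abstract does not give; it is a standard greedy baseline, and the route names, the link tuples, and the function `first_fit_wavelengths` are hypothetical. The sketch assumes that routing has already been done, so each multicast is described by the set of links it occupies, and that two multicasts sharing a link must use different wavelengths.

```python
def first_fit_wavelengths(routes):
    """Assign each route the lowest wavelength index unused on all its links.

    `routes` maps a multicast name to the set of links it occupies.
    Returns (assignment, number_of_wavelengths_used).
    """
    used_on_link = {}  # link -> set of wavelength indices already taken
    assignment = {}
    # Process routes that occupy the most links first; ties broken by name.
    for name in sorted(routes, key=lambda r: (-len(routes[r]), r)):
        links = routes[name]
        taken = set()
        for link in links:
            taken |= used_on_link.get(link, set())
        wavelength = 0
        while wavelength in taken:
            wavelength += 1
        assignment[name] = wavelength
        for link in links:
            used_on_link.setdefault(link, set()).add(wavelength)
    count = max(assignment.values()) + 1 if assignment else 0
    return assignment, count


# Hypothetical routes: m1 and m2 share link (1, 2); m3 is disjoint from both.
routes = {
    "m1": {(0, 1), (1, 2), (2, 3)},
    "m2": {(1, 2), (2, 6)},
    "m3": {(4, 5)},
}
assignment, count = first_fit_wavelengths(routes)
print(assignment)  # {'m1': 0, 'm2': 1, 'm3': 0}
print(count)       # 2
```

Because m3 conflicts with neither of the other routes, it reuses wavelength 0, which is the kind of reuse that keeps the total wavelength count low.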