28 research outputs found
Wildcard dimensions, coding theory and fault-tolerant meshes and hypercubes
Hypercubes, meshes and tori are well known interconnection networks for parallel computers. The sets of edges in those graphs can be partitioned into dimensions. It is well known that the hypercube can be extended by adding a wildcard dimension, resulting in a folded hypercube that has better fault-tolerance and communication capabilities. First we prove that the folded hypercube is optimal in the sense that only a single wildcard dimension can be added to the hypercube. We then investigate the idea of adding wildcard dimensions to d-dimensional meshes and tori. Using techniques from error correcting codes we construct d-dimensional meshes and tori with wildcard dimensions. Finally, we show how these constructions can be used to tolerate edge and node faults in mesh and torus networks
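As a rough illustration of the folded hypercube mentioned in the abstract above, the following Python sketch builds an n-dimensional hypercube, adds the extra edges that join each node to its bitwise complement (the standard folded-hypercube construction, assumed here to correspond to the single wildcard dimension), and compares diameters. The function names are hypothetical and not taken from the paper.

```python
from collections import deque


def hypercube_edges(n):
    """Edges of the n-dimensional hypercube: nodes differ in exactly one bit."""
    edges = set()
    for v in range(1 << n):
        for d in range(n):
            u = v ^ (1 << d)
            edges.add((min(u, v), max(u, v)))
    return edges


def folded_hypercube_edges(n):
    """Hypercube edges plus one extra edge from each node to its complement."""
    edges = hypercube_edges(n)
    mask = (1 << n) - 1
    for v in range(1 << n):
        u = v ^ mask
        edges.add((min(u, v), max(u, v)))
    return edges


def diameter(n, edges):
    """Longest shortest-path length, found by BFS from every node."""
    adj = {v: [] for v in range(1 << n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    n = 4
    print("hypercube diameter:", diameter(n, hypercube_edges(n)))
    print("folded hypercube diameter:", diameter(n, folded_hypercube_edges(n)))
```

For n = 4 the extra edges reduce the diameter from 4 to 2, which is one way to see the improved communication capability the abstract refers to.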
Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems
This paper presents a semi-distributed approach to load balancing in large parallel and distributed systems, which differs from the conventional centralized and fully distributed approaches. The proposed strategy uses a two-level hierarchical control by partitioning the interconnection structure of a distributed or multiprocessor system into independent symmetric regions (spheres) centered at some control points. The central points, called schedulers, optimally schedule tasks within their spheres and maintain state information with low overhead. We consider interconnection structures belonging to a number of families of distance transitive graphs for evaluation, and using their algebraic characteristics, show that identification of spheres and their scheduling points is, in general, an NP-complete problem. An efficient solution for this problem is presented by making exclusive use of a combinatorial structure known as the Hadamard matrix. The performance of the proposed strategy has been evaluated and compared with an efficient fully distributed strategy, through an extensive simulation study. In addition to yielding high performance in terms of response time and better resource utilization, the proposed strategy incurs less overhead in terms of control messages. It is also shown to be less sensitive to the communication delay of the underlying network
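The abstract above does not say how the Hadamard matrix is mapped to scheduler locations, so the following Python sketch only shows the combinatorial object itself: a Hadamard matrix of order 2^k built with the Sylvester doubling construction (an assumed choice, the paper may use another construction), together with a check that its rows are mutually orthogonal. The function names are hypothetical.

```python
def sylvester_hadamard(k):
    """Return a 2^k x 2^k Hadamard matrix (entries +1/-1) as a list of rows."""
    h = [[1]]
    for _ in range(k):
        # Doubling step: [[H, H], [H, -H]]
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def is_hadamard(h):
    """Check H * H^T == n * I, i.e. distinct rows are orthogonal."""
    n = len(h)
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(h[i], h[j]))
            if dot != (n if i == j else 0):
                return False
    return True


if __name__ == "__main__":
    for row in sylvester_hadamard(3):
        print(" ".join(f"{x:+d}" for x in row))
```

Any two distinct rows of such a matrix differ in exactly half of their positions, which is the kind of regular spacing that makes these matrices useful for placing evenly separated points in hypercube-like graphs.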
Fault-Tolerant Load Management for Real-Time Distributed Computer Systems
This paper presents a fault-tolerant scheme applicable to any decentralized load balancing algorithm used in soft real-time distributed systems. Using the theory of distance-transitive graphs for representing topologies of these systems, the proposed strategy partitions these systems into independent symmetric regions (spheres) centered at some control points. These central points, called fault-control points, provide a two-level task redundancy and efficiently re-distribute the load of failed nodes within their spheres. Using the algebraic characteristics of these topologies, it is shown that the identification of spheres and fault-control points is, in general, an NP-complete problem. An efficient solution for this problem is presented by making exclusive use of a combinatorial structure known as the Hadamard matrix. Assuming a realistic failure-repair system environment, the performance of the proposed strategy has been evaluated and compared with a fault-free environment, through an extensive and detailed simulation. For our fault-tolerant strategy, we propose two measures of goodness, namely, the percentage of re-scheduled tasks which meet their deadlines and the overhead incurred for fault management. It is shown that using the proposed strategy, up to 80% of the tasks can still meet their deadlines. The proposed strategy is general enough to be applicable to many networks, belonging to a number of families of distance transitive graphs. Through simulation, we have analyzed the sensitivity of this strategy to various system parameters and have shown that the performance degradation due to failures does not depend on these parameters. Also, the probability of a task being lost altogether due to multiple failures has been shown to be extremely low
Space Shuffle: A Scalable, Flexible, and High-Bandwidth Data Center Network
Data center applications require the network to be scalable and
bandwidth-rich. Current data center network architectures often use rigid
topologies to increase network bandwidth. A major limitation is that they can
hardly support incremental network growth. Recent work proposes to use random
interconnects to provide growth flexibility. However, routing on a random
topology suffers from control and data plane scalability problems, because
routing decisions require global information and forwarding state cannot be
aggregated. In this paper we design a novel flexible data center network
architecture, Space Shuffle (S2), which applies greedy routing on multiple ring
spaces to achieve high-throughput, scalability, and flexibility. The proposed
greedy routing protocol of S2 effectively exploits the path diversity of
densely connected topologies and enables key-based routing. Extensive
experimental studies show that S2 provides high bisectional bandwidth and
throughput, near-optimal routing path lengths, extremely small forwarding
state, fairness among concurrent data flows, and resiliency to network
failures
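The idea of greedy routing on multiple ring spaces can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the S2 protocol itself: each node gets one uniformly random coordinate in [0, 1) per ring space, nodes are linked to their two ring-adjacent nodes in every space, and a packet is forwarded to the neighbor whose minimum circular distance to the destination (taken over all spaces) is smallest. The function names, the coordinate assignment and the exact distance metric are assumptions made for this sketch.

```python
import random


def circular_distance(a, b):
    """Distance between two coordinates on a ring of circumference 1."""
    d = abs(a - b)
    return min(d, 1.0 - d)


def build_topology(num_nodes, num_spaces, seed=0):
    """Assign random ring coordinates and link ring-adjacent nodes in each space."""
    rng = random.Random(seed)
    coords = [[rng.random() for _ in range(num_spaces)] for _ in range(num_nodes)]
    neighbors = {v: set() for v in range(num_nodes)}
    for s in range(num_spaces):
        ring = sorted(range(num_nodes), key=lambda v: coords[v][s])
        for i, v in enumerate(ring):
            w = ring[(i + 1) % num_nodes]
            neighbors[v].add(w)
            neighbors[w].add(v)
    return coords, neighbors


def min_circular_distance(coords, u, v):
    """Smallest circular distance between u and v over all ring spaces."""
    return min(circular_distance(a, b) for a, b in zip(coords[u], coords[v]))


def greedy_route(coords, neighbors, src, dst):
    """Forward hop by hop to the neighbor closest to dst; return the path."""
    path = [src]
    cur = src
    while cur != dst:
        nxt = min(neighbors[cur], key=lambda w: min_circular_distance(coords, w, dst))
        if min_circular_distance(coords, nxt, dst) >= min_circular_distance(coords, cur, dst):
            raise RuntimeError("no neighbor is closer to the destination")
        cur = nxt
        path.append(cur)
    return path


if __name__ == "__main__":
    coords, neighbors = build_topology(num_nodes=20, num_spaces=3, seed=1)
    print(greedy_route(coords, neighbors, src=0, dst=13))
```

In this sketch a node only needs its own neighbors' coordinates and the destination's coordinates to make a forwarding decision, which matches the abstract's point that greedy routing avoids global routing information and keeps forwarding state small.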
Asynchronous Bypass Channels Improving Performance for Multi-synchronous Network-on-chips
Dr. Paul V. Gratz Network-on-Chip (NoC) designs have emerged as a replacement for traditional shared-bus designs for on-chip communications. As with all current VLSI design, however, reducing power consumption in NoCs is a critical challenge. One approach to reduce power is to dynamically scale the voltage and frequency of each network node or groups of nodes (DVFS). Another approach to reduce power consumption is to replace the balanced clock tree with a globally-asynchronous, locally-synchronous (GALS) clocking scheme. NoCs implemented with either of these schemes, however, tend to have high latencies as packets must be synchronized at the intermediate nodes between source and destination. In this work, we propose a novel router microarchitecture which offers superior performance versus typical synchronizing router designs. Our approach features Asynchronous Bypass Channels (ABCs) at intermediate nodes thus avoiding synchronization delay. We also propose a new network topology and routing algorithm that leverage the advantages of the bypass channel offered by our router design. Our experiments show that our design improves the performance of a conventional synchronizing design with similar resources by up to 26 percent at low loads and increases saturation throughput by up to 50 percent
Performance Modelling and Evaluation of Network On Chip Under Bursty Traffic. Performance evaluation of communication networks using analytical and simulation models in NOCs with Fat tree topology under Bursty Traffic with virtual channels.
Physical constraints of integrated circuits (commonly called chips) with regard to size and the finite number of wires have made the design of System-on-Chip (SoC) more interesting to study in terms of finding better solutions for the complexity of the chip interconnections. The SoC has hundreds of Processing Elements (PEs), and a single shared bus can no longer be acceptable due to poor scalability with the system size. Networks on Chip (NoC) have been proposed as a solution to mitigate complex on-chip communication problems for complex SoCs. They consist of computational resources in the form of PE cores and switching nodes which allow PEs to communicate with each other.
In the design and development of Networks on Chip, performance modelling and analysis has great theoretical and practical importance. This research is devoted to developing efficient and cost-effective analytical tools for the performance analysis and enhancement of NoCs with m-port n-tree topology under bursty traffic.
Recent measurement studies have strongly verified that the traffic generated by many real-world applications in communication networks exhibits bursty and self-similar properties in nature and the message destinations are uniformly distributed. NoC's performance is generally affected by different traffic patterns generated by the processing elements. As the first step in the research, a new analytical model is developed to capture the burstiness and self-similarity characteristics of the traffic within NoCs through the use of Markov Modulated Poisson Process. The performance results of the developed model highlight the importance of accurate traffic modelling in the study and performance evaluation of NoCs.
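To make the traffic model named above more concrete, here is a minimal Python sketch of a two-state Markov Modulated Poisson Process (MMPP). It is an illustration under assumptions, not the thesis's analytical model: the two-state form, the parameter values and the function names are all chosen for this example. A hidden state alternates between a low-rate and a high-rate phase, and arrivals are Poisson with the rate of the current phase, which produces bursts.

```python
import random


def simulate_mmpp2(arrival_rates, switch_rates, horizon, seed=0):
    """Simulate a two-state MMPP and return the sorted arrival times.

    arrival_rates[s]: Poisson arrival rate while in state s.
    switch_rates[s]:  rate of leaving state s for the other state.
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while True:
        # Competing exponentials: the next event is an arrival or a state switch.
        total = arrival_rates[state] + switch_rates[state]
        t += rng.expovariate(total)
        if t >= horizon:
            return arrivals
        if rng.random() < arrival_rates[state] / total:
            arrivals.append(t)
        else:
            state = 1 - state


def mean_rate(arrival_rates, switch_rates):
    """Long-run average arrival rate from the stationary state probabilities."""
    r01, r10 = switch_rates
    p0 = r10 / (r01 + r10)
    return p0 * arrival_rates[0] + (1.0 - p0) * arrival_rates[1]


if __name__ == "__main__":
    rates, switches = (1.0, 10.0), (1.0, 1.0)
    arrivals = simulate_mmpp2(rates, switches, horizon=5000.0, seed=1)
    print("empirical rate:", len(arrivals) / 5000.0)
    print("analytical rate:", mean_rate(rates, switches))
```

The empirical rate should sit close to the analytical mean for a long enough horizon, while the inter-arrival times are far more variable than those of a plain Poisson process with the same mean rate.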
Having developed an efficient analytical tool to capture the traffic behaviour with a higher accuracy, in the next step, the research focuses on the effect of topology on the performance of NoCs. Many important challenges still remain as vulnerabilities within the design of NoCs with topology being the most important. Therefore a new analytical model is developed to investigate the performance of NoCs with the m-port n-tree topology under bursty traffic. Even though it is broadly proved in practice that fat-tree topology and its varieties result in lower latency, higher throughput and bandwidth, still most studies on NoCs adopt Mesh, Torus and Spidergon topologies. The results gained from the developed model and advanced simulation experiments significantly show the effect of fat-tree topology in reducing latency and increasing the throughput of NoCs.
In order to obtain deeper understanding of NoCs performance attributes and for further improvement, in the final stage of the research, the developed analytical model was extended to consider the use of virtual channels within the architecture of NoCs. Extensive simulation experiments were carried out which show satisfactory improvements in the throughput of NoCs with fat-tree topology and VCs under bursty traffic. The analytical results and those obtained from extensive simulation experiments have shown a good degree of accuracy for predicting the network performance under different design alternatives and various traffic conditions. Libyan Ministry of Higher Educatio
VCSEL Techniques for Wavelength-Multiplexed Optical Interconnects
The majority of global data communication is taking place within data centers where data is stored and processed and where the largest part of the power used for global networking is consumed. With the rapidly increasing use of Internet-based applications and services, data centers are equipped with a larger number of servers and switches requiring higher bandwidth connectivity. Optical interconnects (OIs) are used to provide the connectivity needed. Short-reach OIs are dominated by 850 nm GaAs-based vertical-cavity surface-emitting lasers (VCSELs) due to their low fabrication cost, low power consumption, high modulation speed, and circular output beam. With the need for even higher bandwidth connectivity, large efforts have been invested in the development of VCSEL-based OIs offering higher aggregate capacity. Until now, higher capacity has been achieved mostly through an increase of the lane rate by higher speed VCSELs and higher order modulation formats. Furthermore, spatial division multiplexing (SDM), using parallel fibers or multicore fibers, has proven effective for increasing the aggregate capacity. With these techniques, it is expected that the OI capacity will saturate at the 1 Tbit/s level. Capacity beyond the limits of current technologies is expected by also exploring the wavelength dimension, referred to as wavelength division multiplexing (WDM). This calls for the development of high-speed VCSELs at multiple wavelengths. To also enable the very small footprint transceivers and high bandwidth density needed as transceivers move closer to the switch ASIC, the multiple wavelength VCSELs should be in a monolithic array. This requires a VCSEL technology where the wavelength of individual VCSELs can be precisely set in a post-growth fabrication process. As an integration platform for multiplexing and fiber coupling we envision a photonic circuit on Si with Si3N4 waveguides and grating couplers for VCSEL integration.
With such waveguides being single mode and the grating couplers being polarization sensitive, the VCSELs in the array should be single transverse and polarization mode, in addition to having a high modulation bandwidth. In this thesis, an intra-cavity phase tuning technique, based on an Ar ion-beam etching process with sub-nm precision, is demonstrated for setting the resonance wavelength of VCSEL resonators with <2 nm precision in the wavelength range 1040-1070 nm. Single transverse and polarization mode VCSELs with a record output power of 6 mW are also demonstrated. Suppression of higher order transverse modes and the orthogonal polarization state is achieved by etching a shallow mode filter in the surface of the VCSEL
VCSEL and Integration Techniques for Wavelength-Multiplexed Optical Interconnects
GaAs-based vertical-cavity surface-emitting lasers (VCSELs) are dominating short-reach optical interconnects (OIs) due to their high modulation speed, low power consumption, circular output beam and low fabrication cost. Such OIs provide the high bandwidth connectivity needed for interconnecting servers and switches in data centers. With the rapidly increasing use of Internet-based applications and services, higher bandwidth connectivity and higher aggregate capacity VCSEL-based OIs are needed. Until now, this has been achieved mostly through an increase of the lane rate by higher speed VCSELs and higher order modulation formats. Furthermore, spatial-division multiplexing has proven effective for increasing the aggregate capacity. Much higher capacity can be achieved by multiple wavelengths per fiber, known as wavelength-division multiplexing (WDM). Moreover, smaller footprint and higher bandwidth density WDM transceivers can be built using monolithic multi-wavelength VCSEL arrays with densely spaced VCSELs. This requires a VCSEL technology where the wavelength of individual VCSELs can be precisely set in a post-epitaxial growth fabrication process and a photonic integrated circuit (PIC) for multiplexing and fiber coupling. Flip-chip integration over grating couplers (GCs) is considered for interfacing VCSELs with waveguides on the PIC. In this thesis, an intra-cavity phase tuning technique is demonstrated for setting the resonance wavelength of VCSELs in a monolithic array with an accuracy in spacing of <1 nm. Uniform performance over the array is achieved by spectral matching and balancing of mirror reflectances, optical confinement factor and optical gain. Single transverse and polarization mode VCSELs, as required for flip-chip integration over GCs, with a record output power of 6 mW are also demonstrated. Finally, an investigation of angled flip-chip integration of a VCSEL over a GC on a silicon photonic integrated circuit (Si-PIC) is presented.
Dependencies of coupling efficiency and optical feedback on flip-chip angle and size of the VCSEL die are studied using numerical FDTD simulations. Moreover, flip-chip integration of a VCSEL over a GC on a Si-PIC is experimentally demonstrated. The insertion loss from the VCSEL at the input GC to a single-mode fiber, multimode fiber or flip-chip integrated photodetector over the output GC was measured and quantified. The latter forms an on-PIC optical link